
Three-Dimensional Object Detection

Updated: Aug 30, 2023

◉Background

  • Thanks to advances in sensing technology, 3D data has become much easier to obtain. Lidar in particular is now used in a wide range of situations.

  • Compared to 2D data, 3D data adds a dimension of distance (depth) information, making it possible to understand the real world more deeply.

  • Combining 3D data with deep learning has solved a variety of real-world problems.

◉Applications of 3D point clouds

  1. A 3D point cloud can be acquired with lidar.

  2. Combining 3D point clouds with deep learning is important for autonomous driving (cars, robots).

  3. Deep learning enables automatic processing of 3D point cloud data. It can be applied to tasks such as:

  • 3D Point Cloud Segmentation

  • 3D Object Detection

  • 3D Object Classification

  • 3D Object Tracking

【Autonomous Driving Case: Object Detection with Lidar Data】


【Augmented Reality case study】


【Example of Quality Analysis: Comparing Design with Reality】


【3D Environment Reconstruction Case Study: Reconstructing an Environment】


【SLAM Case Study: Calculating Your Location in Real Time】


【Distance measurement example: Measure distance remotely】


◉Application of 3D point cloud: 3D object detection

When driving autonomously, the vehicle must detect surrounding objects (such as cars). This allows the car to “understand” its surroundings.


■Technique

  • Performs 3D object detection based on data obtained from sensors mounted on the vehicle

  • 3D object detection estimates each object's type, size, and distance

  • 3D object detection allows the car to “understand” its environment and act accordingly


【Object detection: detect the bounding box of a 3D object】


※A three-dimensional bounding box consists of eight points surrounding an object.
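
As an illustration, the sketch below computes the eight corner points of a 3D bounding box from a (center, size, yaw) parameterization, a common convention in 3D detection. The helper name and exact axis ordering here are assumptions for illustration, not a specific dataset's definition.

import numpy as np

def box_corners(center, size, yaw):
    """Return the 8 corner points of a 3D bounding box given its center
    (x, y, z), size (length, width, height), and yaw rotation in radians.
    The (center, size, yaw) parameterization is one common convention."""
    l, w, h = size
    # Corner offsets in the box's local frame, centered on the origin
    x = np.array([ 1,  1,  1,  1, -1, -1, -1, -1]) * l / 2
    y = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * w / 2
    z = np.array([ 1, -1,  1, -1,  1, -1,  1, -1]) * h / 2
    corners = np.stack([x, y, z], axis=1)               # (8, 3)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # rotation around z
    return corners @ rot.T + np.asarray(center)

print(box_corners((10.0, 2.0, 0.5), (4.2, 1.8, 1.6), 0.3).shape)  # (8, 3)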


■Sensor in car

  • CAMERA: Camera for acquiring RGB data

  • LIDAR: Sensor for acquiring 3D point cloud using Laser

  • RADAR: A sensor for measuring distance using radio waves

LIDAR directly provides three-dimensional information, which is useful for 3D object detection. CAMERA provides 2D data, so 3D object detection from camera data is less straightforward than from a point cloud.


■Sensors for acquiring data


【3D object detection: Model】

After the point cloud is acquired, 3D object detection is performed with deep learning. Some representative deep learning models for 3D object detection are:

  • PointNet

  • VoxelNet

  • PointPillars

  • IA-SSD

  • BEVFusion

【3D object detection: PointNet】

◉PointNet handles point cloud data directly


■PointNet processing pipeline

  • Point cloud as input

  • Transform the input with the Transform Module for easier processing

  • Multilayer perceptron (mlp) layers to expand feature dimensionality

  • Transform again with Transform Module

  • Combine mlp and max-pooling layers to aggregate a global feature and produce the result
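
The pipeline above can be sketched in a few dozen lines of PyTorch. The 64/1024 feature widths follow the original paper, but the module names and the simplified TNet below are illustrative assumptions, not the reference implementation.

import torch
import torch.nn as nn

class TNet(nn.Module):
    """Transform module: predicts a k x k alignment matrix (minimal version)."""
    def __init__(self, k=3):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Conv1d(k, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, k * k),
        )

    def forward(self, x):                    # x: (B, k, N)
        f = self.mlp(x).max(dim=2).values    # global feature (B, 1024)
        m = self.fc(f).view(-1, self.k, self.k)
        # Start from the identity so the initial transform is harmless
        return m + torch.eye(self.k, device=x.device)

class PointNetBackbone(nn.Module):
    """Point cloud in -> global feature out, following the steps listed above."""
    def __init__(self):
        super().__init__()
        self.tnet1 = TNet(k=3)               # input transform
        self.mlp1 = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU())
        self.tnet2 = TNet(k=64)              # feature transform
        self.mlp2 = nn.Sequential(
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )

    def forward(self, pts):                  # pts: (B, N, 3)
        x = pts.transpose(1, 2)              # (B, 3, N)
        x = torch.bmm(self.tnet1(x), x)      # align input points
        x = self.mlp1(x)                     # expand features to 64 dims
        x = torch.bmm(self.tnet2(x), x)      # align features
        x = self.mlp2(x)                     # expand to 1024 dims
        return x.max(dim=2).values           # max pooling -> (B, 1024)

feat = PointNetBackbone()(torch.randn(2, 1024, 3))  # 2 clouds of 1024 points
print(feat.shape)                                    # torch.Size([2, 1024])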


◉Transform Module

  1. PointNet was the first milestone model to process 3D point clouds directly with deep learning

  2. PointNet uses the Transform module to appropriately transform the point cloud before processing (similar to how humans change their viewpoint to observe an object)

  3. Because PointNet consumes the raw point cloud directly, its accuracy and efficiency are limited

  • A raw point cloud on its own has weak expressive power

  • The large number of points leads to much wasteful processing


【3D object detection: VoxelNet】


■Processing point clouds with voxels as the basic unit

  • Use voxels to partition space

  • Each voxel generates voxel features from its interior points

  • Process the features with a 3D Convolutional Neural Network (CNN)

  • Output the target's bounding box

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection (CVPR 2018)
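
As a rough illustration of the grouping step, the following Python sketch assigns points to voxels by integer grid coordinates. The voxel size and the 35-point cap per voxel follow the paper, but the function itself is a hypothetical minimal version (a real implementation would also cap the number of voxels and batch the result).

import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_pts=35):
    """Group points (N, 3) into voxels keyed by integer grid coordinates."""
    coords = np.floor(points / np.asarray(voxel_size)).astype(np.int32)
    voxels = {}
    for pt, c in zip(points, coords):
        voxels.setdefault(tuple(c), []).append(pt)
    # Keep at most max_pts points per voxel, as in the paper
    return {c: np.stack(p[:max_pts]) for c, p in voxels.items()}

pts = np.random.uniform(0, 10, size=(5000, 3)).astype(np.float32)
vox = voxelize(pts)
print(len(vox), "non-empty voxels")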

  • Efficiency is higher than PointNet because processing is done in voxel units

  • The voxel feature extraction module is carefully designed

  • Unlike PointNet, a Transform module is no longer needed

  • The voxel size is fixed, so large variations in point cloud density (e.g. lidar data) reduce VoxelNet's accuracy

  • A 3D CNN is used, so the processing speed is not high


【3D object detection: PointPillars】


◉The shape of lidar data is similar to a cake: wide and flat


◉PointPillars: Use pillars (long and thin voxels) as basic units


■Processing point clouds with pillars as the basic unit

  • Use pillars to partition space

  • Each pillar generates pillar features from its interior points

  • Process the features with a 2D Convolutional Neural Network (CNN)

  • Output the target's bounding box; processing is fast (40 FPS or more) because a 2D CNN is used
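
The key trick that enables a 2D CNN is scattering per-pillar features back onto a BEV (bird's-eye-view) grid to form a pseudo-image. The sketch below shows only this scatter step; the grid size and channel count are illustrative assumptions.

import torch

def pillars_to_bev(pillar_feats, pillar_xy, grid=(432, 496), channels=64):
    """Scatter per-pillar features onto a 2D BEV grid so an ordinary
    2D CNN can process them (minimal sketch of the scatter step).

    pillar_feats: (P, C) one feature vector per non-empty pillar
    pillar_xy:    (P, 2) integer (x, y) grid coordinates of each pillar
    """
    bev = torch.zeros(channels, grid[1], grid[0])
    # Duplicate coordinates simply overwrite in this sketch
    bev[:, pillar_xy[:, 1], pillar_xy[:, 0]] = pillar_feats.t()
    return bev  # (C, H, W) pseudo-image, ready for a 2D CNN

feats = torch.randn(1200, 64)              # 1200 non-empty pillars
xy = torch.randint(0, 432, (1200, 2))      # hypothetical grid coordinates
xy[:, 1] = torch.randint(0, 496, (1200,))  # keep y within the grid height
print(pillars_to_bev(feats, xy).shape)     # torch.Size([64, 496, 432])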


◉Processing using a Neural Network with a 2D Pyramid structure

  • PointPillars is a model designed for Lidar

  • 2D CNN is used, so the processing speed is high (40 FPS or more)

  • High accuracy of object detection on lidar data


【3D object detection: IA-SSD】


■Not all points in the lidar point cloud are equally important

  • Points on target objects are highly important

  • Background points are less important

■Sampling according to importance

  • Critical points should be sampled frequently

  • Insignificant points should be sampled infrequently

■Proposed sampling method

  • Train the sampling module to perform the best sampling

  • An MLP layer predicts each point's class, and points of the target class are sampled more frequently

  • The closer a point is to the object center, the higher its sampling frequency

■Advantages of this method

  • You can oversample the points you want and undersample the points you don't.

  • Result: High-speed and high-precision object detection (80 FPS or more)
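
A minimal sketch of the sampling idea: given per-point foreground scores (which in IA-SSD come from a learned MLP), keep the top-k points. The function below is an illustrative simplification, not the paper's exact sampling procedure.

import torch

def class_aware_sampling(points, fg_scores, k):
    """Keep the k points with the highest predicted foreground scores.
    fg_scores would come from an MLP trained with per-point class labels."""
    idx = torch.topk(fg_scores, k).indices
    return points[idx]

pts = torch.randn(16384, 3)
scores = torch.rand(16384)          # stand-in for MLP foreground scores
print(class_aware_sampling(pts, scores, 4096).shape)  # torch.Size([4096, 3])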


【3D object detection: BEVFusion】

  • BEVFusion: Object detection by fusing camera data and lidar data

  • The methods above use only lidar, but in autonomous driving camera data is also available

  • Advantage: with more information, improved accuracy can be expected

  • Challenge: camera RGB data and lidar point cloud data have different characteristics, so fusing them is difficult


◉BEVFusion: Fusion of camera and lidar data in BEV space


◉BEVFusion model structure


◉Project camera data into BEV space


※2D RGB data is projected into BEV (bird's-eye-view) space, a top-down view of the 3D scene
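
Once both modalities are in BEV space, the fusion step itself can be as simple as channel-wise concatenation followed by convolution. The sketch below shows that step only; the channel counts and module structure are illustrative assumptions, and the actual BEVFusion fuser is more elaborate.

import torch
import torch.nn as nn

class BEVFuser(nn.Module):
    """Fuse camera-BEV and lidar-BEV feature maps by concatenation + conv
    (a minimal sketch, not the actual BEVFusion module)."""
    def __init__(self, cam_ch=80, lidar_ch=256, out_ch=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(),
        )

    def forward(self, cam_bev, lidar_bev):  # both (B, C, H, W) in BEV space
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))

fused = BEVFuser()(torch.randn(1, 80, 180, 180), torch.randn(1, 256, 180, 180))
print(fused.shape)  # torch.Size([1, 256, 180, 180])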



※Object detection is performed using all cameras and the point cloud simultaneously

  • RGB images and point clouds have different properties and are difficult to fuse

  • BEVFusion proposes a practical method for fusing them

  • BEVFusion is more accurate than lidar-only methods


【Inquiry】

Please contact Tengun-label for consultation on various sensing and data analysis, including AI and 3D image processing.


Tengun-label Co., Ltd.

  • Headquarters: Room 1014, Nagatani Revure Shinjuku, 4-31-3 Nishi-Shinjuku, Shinjuku-ku, Tokyo

  • Branch office: 6-6-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo Shinjuku Kokusai Building 4F Computer Mind

  • E-mail:info@tengun-label.com

  • TEL:080-5179-4874

