PointPillars: Fast Encoders for Object Detection from Point Clouds

2022.11.22

PointPillars paper review for a 3D object detection competition
Keywords: #3dobjectdetection #3dmodel #pointcloud

0. Abstract

Proposal: PointPillars, a novel encoder that uses PointNets to learn a representation of point clouds organized in vertical columns (pillars), plus a lean downstream network that operates on the encoded features
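To make "organized in vertical columns (pillars)" concrete, here is a minimal NumPy sketch of the grouping step only. The function name `assign_pillars`, the grid extent, and the 1.0 m pillar size are assumptions chosen for illustration, not values from the paper, and the PointNet feature learning itself is not shown.

```python
import numpy as np

def assign_pillars(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), pillar_size=1.0):
    """Group points of shape (N, 3) into vertical pillars on the x-y plane.

    Returns a dict mapping a grid cell (ix, iy) to the array of points in it.
    z is ignored when binning, so each pillar spans the full height.
    """
    points = np.asarray(points, dtype=np.float64)
    # Drop points that fall outside the (hypothetical) grid extent.
    inside = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
        & (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    points = points[inside]
    # Integer cell index along x and y for every remaining point.
    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    pillars = {}
    for i, j, p in zip(ix, iy, points):
        pillars.setdefault((int(i), int(j)), []).append(p)
    return {cell: np.stack(pts) for cell, pts in pillars.items()}

points = np.array([
    [0.2, 0.3, 0.1],
    [0.4, 0.1, 1.5],  # same pillar as the first point, different height
    [2.5, 3.1, 0.7],
    [9.0, 9.0, 0.0],  # outside the grid, dropped
])
pillars = assign_pillars(points)
for cell, pts in sorted(pillars.items()):
    print(cell, pts.shape)
```

The first two points land in the same pillar even though their heights differ, which is the key difference from a voxel grid that also bins along z.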

Differences between 2D computer vision and lidar point cloud object detection
1. The point cloud is a sparse representation, while an image is dense
2. The point cloud is 3D, while an image is 2D
Past Literature
- 3D convolution → projection of the point cloud onto an image → projection of the point cloud to a bird’s eye view (BEV)
- The BEV tends to be extremely sparse → VoxelNet and SECOND use 3D convolutional middle layers
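A rough way to see why a BEV grid ends up sparse is to count occupied cells. The sketch below uses synthetic uniformly random points, not a real lidar scan, and the 100 m × 100 m extent, 0.1 m cell size, and function name `bev_occupancy` are all assumptions for illustration.

```python
import numpy as np

def bev_occupancy(points, x_range=(0.0, 100.0), y_range=(0.0, 100.0), cell=0.1):
    """Return a boolean (nx, ny) grid marking BEV cells that hold at least one point."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    inside = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
        & (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    xy = points[inside, :2]
    # Clip guards against floating-point round-up at the upper grid edge.
    ix = np.clip(np.floor((xy[:, 0] - x_range[0]) / cell).astype(int), 0, nx - 1)
    iy = np.clip(np.floor((xy[:, 1] - y_range[0]) / cell).astype(int), 0, ny - 1)
    grid = np.zeros((nx, ny), dtype=bool)
    grid[ix, iy] = True
    return grid

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 100.0, size=(1000, 3))  # synthetic stand-in for a scan
grid = bev_occupancy(points)
occupied_fraction = grid.mean()
print(f"{grid.sum()} of {grid.size} cells occupied ({occupied_fraction:.4%})")
```

With 1,000 points and 1,000,000 cells, at most 0.1% of the cells can be occupied, so a dense convolution over this grid would spend almost all of its work on empty cells.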
Proposal
- PointPillars: a method for object detection in 3D that enables end-to-end learning with only 2D convolutional layers

The network has three main stages:
1. A feature encoder network that converts the point cloud to a sparse pseudo-image
2. A 2D convolutional backbone that processes the pseudo-image into a high-level representation
3. A detection head that detects and regresses 3D boxes
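The step that turns per-pillar features into a pseudo-image can be sketched as a scatter back to grid positions. This is a minimal NumPy illustration under assumptions: the name `scatter_to_pseudo_image`, the (P, C) feature layout, and the toy sizes are hypothetical, and the features here are hand-written numbers rather than PointNet outputs.

```python
import numpy as np

def scatter_to_pseudo_image(pillar_features, pillar_coords, height, width):
    """Place per-pillar feature vectors on a dense (C, H, W) canvas.

    pillar_features: (P, C) array, one feature vector per non-empty pillar.
    pillar_coords:   (P, 2) integer array of (row, col) grid positions.
    Cells without a pillar stay zero.
    """
    num_pillars, channels = pillar_features.shape
    canvas = np.zeros((channels, height, width), dtype=pillar_features.dtype)
    rows, cols = pillar_coords[:, 0], pillar_coords[:, 1]
    # canvas[:, rows, cols] has shape (C, P), so the features are transposed.
    canvas[:, rows, cols] = pillar_features.T
    return canvas

features = np.array([[1.0, 2.0],
                     [3.0, 4.0]])   # P = 2 pillars, C = 2 channels
coords = np.array([[0, 1],
                   [2, 0]])         # (row, col) of each pillar
image = scatter_to_pseudo_image(features, coords, height=3, width=2)
print(image.shape)
```

Because the result has an image-like (C, H, W) layout, it can be passed to an ordinary 2D convolutional backbone, which is what allows the rest of the pipeline to avoid 3D convolutions.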