Recovering 6D pose of rigid object from point cloud at the level of instance and category

Chen, Wei ORCID: 0000-0001-6314-5600 (2022). Recovering 6D pose of rigid object from point cloud at the level of instance and category. University of Birmingham. Ph.D.

Text - Accepted Version
Available under License All rights reserved.



Estimating the 3D orientation and 3D position, i.e. the 6D pose, of rigid objects plays an essential role in computer vision tasks. The field has made huge progress with the development of deep learning techniques. However, some challenges still need to be addressed, such as occlusion, viewpoint variation, and intra-class variation in category-level pose estimation. This thesis is philosophically built upon addressing these problems in 3D space via point cloud representation. In addressing them, the thesis arrives at three main findings: point cloud representation in 3D space is more suitable for 6D object pose estimation; feature design is essential to pose estimation tasks; and rotation representation has an important impact on pose estimation results. For the first finding, all three proposed pipelines use RGB information to locate the target object in 2D and then estimate the 6D pose of the object in the detected region from point cloud input. The experimental results show that this approach focuses the network on learning useful 3D information from the point cloud, which benefits pose estimation. As to the second finding, we design different features for different challenges. For the occlusion challenge at the instance level, we propose to extract dense local features by regressing point-wise vectors for pose hypothesis generation, and to select the best pose candidate based on 3D geometry constraints using RANSAC. In this way, the network can better utilize the local 3D information from the point cloud. However, due to the large number of hypotheses, this generation and verification strategy is time-consuming. To reduce the time cost and address the viewpoint variation problem, we then propose the embedding vector feature. With this newly designed feature, the proposed method can effectively extract viewpoint information from the training dataset, which leads to fast (over 20 fps) 6D pose estimation of the target object.
However, a large amount of labelled data is still needed to train the model. To make the model less dependent on labelled data, this thesis then addresses the category-level pose estimation problem. To handle the intra-class variation in the categorical 6D pose estimation task, we propose to use 3D graph convolution for category-level latent rotation feature learning. Finally, to fully decode the rotation information from the latent feature, we employ two decoders based on a newly designed rotation representation. With this new rotation representation and the learned feature, the proposed method achieves state-of-the-art performance at almost real-time speed at the category level.
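
The hypothesis generation and verification idea mentioned in the abstract can be illustrated with a generic sketch. This is not the thesis's pipeline: the thesis regresses point-wise vectors with a network, whereas the sketch below assumes that 3D-3D correspondences between model points and scene points are already given, fits a rigid pose to random minimal subsets with the standard Kabsch (SVD) solution, and keeps the hypothesis with the most inliers. The function names `kabsch` and `ransac_pose`, the iteration count, and the inlier threshold are all illustrative assumptions.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_pose(model_pts, scene_pts, n_iters=200, inlier_thresh=0.01, seed=0):
    """Generate pose hypotheses from 3-point samples and keep the best one.

    model_pts, scene_pts: (N, 3) arrays of assumed correspondences.
    n_iters, inlier_thresh, seed: illustrative values, not from the thesis.
    """
    rng = np.random.default_rng(seed)
    n = len(model_pts)
    best_R, best_t = np.eye(3), np.zeros(3)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        # Hypothesis generation: three correspondences define a rigid pose.
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(model_pts[idx], scene_pts[idx])
        # Verification: count correspondences consistent with this pose.
        residuals = np.linalg.norm(model_pts @ R.T + t - scene_pts, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_R, best_t, best_inliers = R, t, inliers
    # Refine the winning hypothesis on all of its inliers.
    if best_inliers.sum() >= 3:
        best_R, best_t = kabsch(model_pts[best_inliers], scene_pts[best_inliers])
    return best_R, best_t, best_inliers
```

Because every hypothesis is checked against all correspondences, the cost grows with the number of hypotheses, which is consistent with the abstract's remark that this kind of strategy is time-consuming.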

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Licence: All rights reserved
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: None/not applicable
Subjects: Q Science > Q Science (General)
T Technology > T Technology (General)

