Yang, Jinyu (2024). Towards more flexible and efficient RGBD object tracking. University of Birmingham. Ph.D.
Yang2024PhD.pdf
Text - Accepted Version. Available under License: All rights reserved. (31MB)
Abstract
Object tracking is a fundamental task in computer vision. Recently, RGBD (RGB+Depth) object tracking has attracted considerable attention thanks to the development of depth cameras. Compared to RGB-only tracking, RGBD tracking opens more opportunities for accurate and robust object tracking in complex scenarios, such as background clutter and dark scenes, because depth cues help separate the object from background interference, especially when color information fails. However, the development of RGBD tracking lags severely behind its RGB counterpart and remains far from sufficient for real-world applications due to various challenges. On the one hand, current RGBD tracking is limited to 2D settings, which constrains it to 2D bounding box descriptions and neglects the flexibility that depth information could bring. On the other hand, the efficiency of RGBD trackers has been ignored, which impedes practical applications of RGBD tracking. This thesis addresses these issues and contributes to more flexible and efficient RGBD object tracking. In particular, a series of works is presented to explore and demonstrate the power of RGBD tracking.
The four main contributions of this thesis are:
The first contribution introduces a novel paradigm, i.e., weakly-supervised RGBD video object segmentation, which achieves pixel-level RGBD object tracking under weak supervision. By exploring robust cross-modal fusion, the proposed FusedCDNet performs RGBD tracking at the pixel level with only bounding-box-level supervision in both training and testing.
The second contribution introduces generic 3D object tracking in RGBD videos. Specifically, a novel Track-it-in-3D dataset is proposed with rotated 3D bounding box (BBox) annotations, which bridges the gap between RGBD tracking and point cloud tracking. In addition, a strong baseline is provided for generic 3D object tracking, based on color and depth fusion and 3D-level cross-correlation.
The third contribution presents a study of a training-efficient tracking paradigm, which addresses the high training cost of RGBD tracking by applying prompt learning. With the proposed cross-modal prompts, both the large-scale RGB knowledge from pre-trained large models and the complementary information from depth sensors can be well exploited. Moreover, the effectiveness of the prompting framework is verified on different multi-modal tracking scenarios, including RGB+D, RGB+T, and RGB+Event tasks.
Finally, we present an efficient and lightweight approach for RGBD tracking, which is the first study on efficient RGBD object tracking. With efficient modality-aware fusion and a lightweight backbone, the proposed RGBD tracker EMT runs at over 100 fps. We also provide on-board scenarios and a newly defined overhead space for RGBD aerial tracking. In such a scenario, many more categories (34 classes) can be considered for multi-modal aerial tracking than in existing aerial tracking datasets. The corresponding on-board tests demonstrate that the proposed EMT can achieve real-time tracking on edge platforms.
In conclusion, by fully exploiting depth cues to solve the problems in current RGBD tracking, more flexible and efficient RGBD object tracking can be achieved.
Type of Work: | Thesis (Doctorates > Ph.D.) |
---|---|
Award Type: | Doctorates > Ph.D. |
Supervisor(s): | |
Licence: | All rights reserved |
College/Faculty: | Colleges > College of Engineering & Physical Sciences |
School or Department: | School of Computer Science |
Funders: | None/not applicable |
Subjects: | Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
URI: | http://etheses.bham.ac.uk/id/eprint/14729 |