Transferrable learning from synthetic data: novel texture synthesis using Domain Randomization for visual scene understanding

Ani, Mohammad Kh M H M (2022). Transferrable learning from synthetic data: novel texture synthesis using Domain Randomization for visual scene understanding. University of Birmingham. Ph.D.

Preview

Ani2022PhD.pdf
Text - Accepted Version
Available under License All rights reserved.
Download (75MB) | Preview

Abstract

Modern supervised deep learning-based approaches typically rely on vast quantities of annotated data for training computer vision and robotics tasks. A key challenge is acquiring data that encompasses the diversity encountered in the real world. The use of synthetic or computer-generated data for solving these tasks has recently garnered attention for several reasons. The first being the efficiency of producing large amounts of annotated data at a fraction of the time required in reality, addressing the time expense of manually annotated data. The second addresses the inaccuracies and mistakes arising from the laborious task of manual annotations. Thirdly, it addresses the need for vast amounts of data typically required by data-driven state-of-the-art computer vision and robotics systems. Due to domain shift, models trained on synthetic data typically underperform those trained on real-world data when deployed in the real world. Domain Randomization is a data generation approach for the synthesis of artificial data. The Domain Randomization process can generate diverse synthetic images by randomizing rendering parameters in a simulator, such as the objects, their visual appearance, the lighting, and where they appear in the picture. This synthetic data can be used to train systems capable of performing well in reality. However, it is unclear how to best approach selecting Domain Randomization parameters such as the types of textures, object poses, or types of backgrounds. Furthermore, it is unclear how Domain Randomization generalizes across various vision tasks or whether there are potential improvements to the technique. This thesis explores novel Domain Randomization techniques to solve object localization, detection, and semantic segmentation in cluttered and occluded real-world scenarios. In particular, the four main contributions of this dissertation are:
(i) The first contribution of the thesis proposes a novel method for quantifying the differences between Domain Randomized and realistic data distributions using a small number of samples. The approach ranks all commonly applied Domain Randomization texture techniques in the existing literature and finds that the ranking is reflected in the task-based performance of an object localization task.
(ii) The second contribution of this work introduces the SRDR dataset - a large domain randomized dataset containing 291K frames of household objects widely used in robotics andvision benchmarking [23]. SRDR builds on the YCB-M [67] dataset by generating syntheticversions for images in YCB-M using a variety of domain randomized texture types and in 5 unique environments with varying scene complexity. The SRDR dataset is highly beneficial in cross-domain training, evaluation, and comparison investigations.
(iii) The third contribution presents a study evaluating Domain Randomization’s generalizabilityand robustness in sim-to-real in complex scenes for object detection and semantic segmentation. We find that the performance ranking is largely similar across the two tasks when evaluating models trained on Domain Randomized synthetic data and evaluating on real-world data, indicating Domain Randomization performs similarly across multiple tasks.
(iv) Finally, we present a fast, easy to execute, novel approach for conditionally generating domain randomized textures. The textures are generated by randomly sampling patches from real-world images to apply to objects of interest. This approach outperforms the most commonly used Domain Randomization texture method from 13.157 AP to 21.287 AP and 8.950 AP to 19.481 AP in object detection and semantic segmentation tasks. The technique eliminates manually defining texture distributions to sample Domain Randomized textures. We propose a further improvement to address low texture diversity when using a small number of real-world images. We propose to use a conditional GAN-based texture generator trained on a few real-world image patches to increase the texture diversity and outperform the most commonly applied Domain Randomization texture method from 13.157 AP to 20.287 AP and 8.950 AP to 17.636 AP in object detection and semantic segmentation tasks.

Type of Work:

Thesis (Doctorates > Ph.D.)

Award Type:

Doctorates > Ph.D.

Supervisor(s):