A comprehensive analysis of vision deep learning methods for object detection and 6D pose estimation: Real-time applications

Sapienza, Davide

The popularity of Artificial Intelligence (AI) systems is growing rapidly, both in academia and society. In recent years, advances in computer vision and machine learning have enabled AI systems to be applied to a variety of scenarios, such as autonomous driving, robotics, and augmented reality applications. An obstacle detection system allows a car to detect and avoid potential hazards, or to brake in time to prevent an accident. Augmented reality can assist a surgeon in finding the most efficient way to make an incision, leading to better outcomes for patients. Automation of industrial processes can help reduce the risk of on-the-job injuries, by reducing the amount of wear and tear work.%of manual labor needed. These applications require the detection, identification and pose estimation of objects, to improve people's quality of life. In order to obtain a working system, many factors must be taken into account, including the choice of data in the learning process, the choice of the learning method, and the choice of hardware platforms. The current research focuses on examining various techniques to enhance accuracy, speed, and stability in two key applications: Object Detection and 6D Pose Estimation. This thesis will mainly delve into deep learning methods, which have led to breakthroughs in these fields. i) We will analyze the difficulties and characteristics of embedded Object Detection methods in detail, focusing on latencies, throughput, accuracy, memory and power consumption. We will evaluate the impact of each of these factors on the performance of the object detection system. ii) We will discuss the challenges and biases related to datasets and methods, as well as the possible solutions to address them. The importance of awareness of the inherent limitations of a given problem will be addressed. iii) Finally, a real-world case study of Object Detection and 6D Pose Estimation in underwater environments is presented, highlighting the challenges, pitfalls, and best choices for this particular scenario. The results of the experiments, on both simulated and real-world scenarios, will demonstrate that the proposed solutions are reliable and effective in detecting objects and estimating their 6D pose. The findings of this research could be used to improve accuracy and efficiency for 2D Object Detection and 6D Pose Estimation methods.

A comprehensive analysis of vision deep learning methods for object detection and 6D pose estimation: Real-time applications / Sapienza, D.. - (2023).