Self-Supervised Category-level 6D Object Pose Estimation With Optical Flow Consistency / Zaccaria, M.; Manhardt, F.; Di, Y.; Tombari, F.; Aleotti, J.; Giorgini, M. - In: IEEE Robotics and Automation Letters. - ISSN 2377-3766. - 8:5 (2023), pp. 1-8. [10.1109/LRA.2023.3254463]
Self-Supervised Category-level 6D Object Pose Estimation With Optical Flow Consistency
Zaccaria M.; Aleotti J.; Giorgini M.
2023-01-01
Abstract
Category-level 6D object pose estimation aims to determine the pose of an object of a given category. Most current state-of-the-art methods require a significant amount of real training data to supervise their models. Moreover, annotating 6D poses is time-consuming and error-prone, and it does not scale well to a large number of object classes. Therefore, a handful of methods have recently been proposed that use unlabelled data to establish weak supervision. In this letter, we propose a self-supervised method that leverages 2D optical flow as a proxy for supervising the 6D pose. To this end, we estimate the 2D optical flow between consecutive frames from the estimated poses. Then, we harness an off-the-shelf optical flow method to enable weak supervision through a 2D-3D optical flow-based consistency loss. Experiments show that our self-supervised approach yields state-of-the-art performance on the NOCS benchmark and reaches results comparable to some fully-supervised approaches.
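The core idea of comparing pose-induced flow with flow from an external estimator can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics `(fx, fy, cx, cy)`, a rigid pose `(R, t)` per frame (the per-category scale is omitted), 3D points expressed in the object frame, and an `observed_flow(u, v)` callable standing in for the output of an off-the-shelf optical flow network. All function names are hypothetical, and the mean endpoint error used as the loss is an assumption; the letter's exact loss formulation, point sampling, and masking are not reproduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pose = Tuple[Mat3, Vec3]  # (rotation R, translation t); scale omitted (assumption)
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map an object-frame point into the camera frame: p' = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p: Vec3, K: Intrinsics) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    fx, fy, cx, cy = K
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def pose_induced_flow(points: Sequence[Vec3], pose_a: Pose, pose_b: Pose,
                      K: Intrinsics) -> List[Tuple[Tuple[float, float], Tuple[float, float]]]:
    """For each 3D point, return (pixel in frame a, 2D flow from frame a to frame b)
    implied by the two estimated poses (hypothetical helper)."""
    result = []
    for p in points:
        ua, va = project(transform(*pose_a, p), K)
        ub, vb = project(transform(*pose_b, p), K)
        result.append(((ua, va), (ub - ua, vb - va)))
    return result


def flow_consistency_loss(points: Sequence[Vec3], pose_a: Pose, pose_b: Pose, K: Intrinsics,
                          observed_flow: Callable[[float, float], Tuple[float, float]]) -> float:
    """Mean endpoint error between pose-induced flow and externally estimated flow
    sampled at the same pixels (loss choice is an assumption)."""
    pairs = pose_induced_flow(points, pose_a, pose_b, K)
    total = 0.0
    for (u, v), (du, dv) in pairs:
        ou, ov = observed_flow(u, v)
        total += math.hypot(du - ou, dv - ov)
    return total / len(pairs)
```

For example, shifting an object 0.01 units along x at depth 1 with `fx = 500` induces a flow of about 5 pixels at every point, so an observed flow of `(5.0, 0.0)` gives a loss near zero while `(4.0, 0.0)` gives a loss near 1. In an actual training setup the same computation would run inside an autodiff framework so that the loss can back-propagate into the pose network.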