We present CalibBEV, a novel Bird’s Eye View (BEV) alignment approach for LiDAR-camera calibration. Our method unifies LiDAR and camera data into a shared 3D spatial representation, enabling accurate and robust cross-modal calibration. CalibBEV extracts sensor-wise BEV features from each modality using domain-specific architectures and estimates the calibration matrix through a two-step alignment process. First, we perform an implicit alignment by regressing a coarse calibration matrix directly from the BEV features. To ease this alignment, we enforce semantic consistency between BEV representations across modalities using a contrastive loss inspired by CLIP, guiding both networks toward a unified feature space. In the second step, we leverage our BEV formulation to explicitly align the features of one modality with the other, refining the initial coarse estimate into a final, more accurate calibration matrix. CalibBEV significantly outperforms prior point-to-pixel matching methods, achieving state-of-the-art calibration accuracy. On the KITTI and nuScenes benchmarks, our method reduces the Relative Rotation Error (RRE) by 51% and 68%, and the Relative Translation Error (RTE) by 80% and 91%, respectively, compared to previous methods.
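
The abstract mentions a contrastive loss inspired by CLIP that pulls the LiDAR and camera BEV representations toward a shared feature space. As a rough illustration only, and not the authors' implementation, the sketch below computes a symmetric CLIP-style contrastive loss over a batch of paired embeddings. It assumes each BEV feature map has already been pooled into one vector per sample, and that row i of each list comes from the same scene. The function names, the pooling assumption, and the temperature value of 0.07 are all hypothetical choices made for this example. It uses only the standard library so that it runs without extra dependencies.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (small epsilon avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(lidar_emb, camera_emb, temperature=0.07):
    """Symmetric contrastive loss over paired embeddings (assumed pooled BEV features)."""
    lidar = [l2_normalize(v) for v in lidar_emb]
    camera = [l2_normalize(v) for v in camera_emb]
    # Cosine similarity of every LiDAR sample with every camera sample,
    # sharpened by the (assumed) temperature.
    logits = [
        [sum(a * b for a, b in zip(l, c)) / temperature for c in camera]
        for l in lidar
    ]
    logits_t = [list(col) for col in zip(*logits)]  # camera-to-LiDAR direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch of three scenes: matched pairs should score a lower loss
# than the same embeddings paired with the wrong scene.
lidar_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = clip_style_loss(lidar_emb, lidar_emb)
shuffled = clip_style_loss(lidar_emb, [lidar_emb[1], lidar_emb[2], lidar_emb[0]])
print(aligned < shuffled)
```

In a training setup this term would typically be added to the calibration regression loss, so that both encoders are encouraged to produce comparable features before the matrix is estimated. How the paper weights or applies it is not stated in the abstract.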

CalibBEV: LiDAR-Camera Calibration via BEV Alignment / D'Addeo, F., Cipelli, L., Cardace, A., Ghelfi, E., Zinelli, A., Bertozzi, M. - (2026), pp. 4345-4354. (IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2026, Tucson, AZ, USA) [10.1109/WACV61042.2026.00423].

CalibBEV: LiDAR-Camera Calibration via BEV Alignment

D'Addeo F.; Cipelli L.; Zinelli A.; Bertozzi M.
Supervision
2026-01-01

2026
ISBN: 979-8-3315-5511-5

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/3064274