Current state-of-the-art two-stage models for the instance segmentation task suffer from several types of imbalance. In this paper, we address the Intersection over Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, introduces new loop mechanisms for bounding box and mask refinement. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, which originates from a non-uniform integration of low- and high-level features from the backbone layers. In addition, redesigning the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and offers more insight into the connection between the task to be solved and the layers used. Moreover, our SBR-CNN model shows the same or even better improvements when adopted in conjunction with other state-of-the-art models. In fact, with a lightweight ResNet-50 backbone, evaluated on the COCO minival 2017 dataset, our model reaches 45.3% and 41.5% AP for object detection and instance segmentation, respectively, with 12 epochs and without extra tricks. The code is available at https://github.com/IMPLabUniPr/mmdetection/tree/sbr_cnn.
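To make the quantity discussed in the abstract concrete, the following is a minimal sketch, not taken from the paper or its repository, of how the IoU between a RoI and a ground-truth box can be computed and how the IoU values of positive RoIs can be binned to inspect their distribution. The function names `box_iou` and `positive_iou_histogram`, the `(x1, y1, x2, y2)` box format, the positive threshold of 0.5 and the ten bins are all assumptions made for illustration only.

```python
from collections import Counter


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_iou_histogram(rois, gt_box, pos_thr=0.5, n_bins=10):
    """Count positive RoIs (IoU >= pos_thr) per IoU bin.

    Keys are bin indices: bin k covers [k / n_bins, (k + 1) / n_bins),
    and IoU == 1.0 is placed in the last bin.
    Illustrative helper only; the threshold and binning are assumptions.
    """
    counts = Counter()
    for roi in rois:
        iou = box_iou(roi, gt_box)
        if iou >= pos_thr:
            counts[min(int(iou * n_bins), n_bins - 1)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    # Hypothetical RoIs: the ground-truth box shifted right by 0..5 pixels.
    rois = [(s, 0, 10 + s, 10) for s in range(6)]
    print(positive_iou_histogram(rois, gt))
```

A histogram that is heavily skewed toward the lowest positive bins would be one symptom of the kind of IoU distribution imbalance the abstract describes; how SBR-CNN actually rebalances it is covered in the paper, not in this sketch.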

Self-Balanced R-CNN for instance segmentation / Rossi, L.; Karimi, A.; Prati, A. - In: JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION. - ISSN 1047-3203. - 87:(2022). [10.1016/j.jvcir.2022.103595]

Self-Balanced R-CNN for instance segmentation

Rossi L.; Karimi A.; Prati A.
2022-01-01

Files in this record:

1-s2.0-S1047320322001201-main.pdf
Access: authorized users only
Type: Publisher's version (PDF)
License: NOT PUBLIC - private/restricted access
Size: 2.73 MB
Format: Adobe PDF

JVCI-21-2051_R1_postprint.pdf
Access: open access
Type: Post-print document
License: Creative Commons
Size: 1.73 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2973952
Citations
  • Scopus 7
  • Web of Science (ISI) 4