Probability guided maxout

Ferrari, C.; Berretti, S.; Del Bimbo, A.
2020-01-01

Abstract

In this paper, we propose an original CNN training strategy that brings together ideas from dropout-like regularization methods and from solutions that learn discriminative features. We propose a dropping criterion that, unlike dropout and its variants, is deterministic rather than random. It is grounded in the empirical evidence that feature descriptors with larger L2-norm and highly active nodes are strongly correlated with confident class predictions. Our criterion therefore drops a percentage of the most active nodes of the descriptor, in proportion to the estimated class probability. We simultaneously train a per-sample scaling factor to balance the expected output between training and inference. This also allows us to keep the descriptor's L2-norm high, which we show enforces confident predictions. The combination of these two strategies results in our "Probability Guided Maxout" solution, which acts as a training regularizer. We demonstrate these behaviors through extensive image classification results on the CIFAR10, CIFAR100, and Caltech256 datasets. Code is available at https://github.com/clferrari/probability-guided-maxout.
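
To make the mechanism concrete, below is a minimal PyTorch sketch of the dropping criterion as the abstract describes it: the most active units of the feature descriptor are zeroed deterministically, in proportion to the predicted class probability, and the result is rescaled by a learned factor. The module name ProbabilityGuidedMaxout, the linear drop schedule max_drop * confidence, and the single learned scale parameter are illustrative assumptions, not the authors' exact implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn

class ProbabilityGuidedMaxout(nn.Module):
    """Sketch of the deterministic dropping criterion from the abstract:
    drop a fraction of the MOST active units of the feature descriptor,
    in proportion to the estimated class probability, then rescale.
    The drop schedule and the form of the scaling factor are assumptions
    for illustration, not the authors' exact implementation."""

    def __init__(self, max_drop=0.5):
        super().__init__()
        self.max_drop = max_drop  # upper bound on the dropped fraction (assumed)
        # Learned scaling used to balance the expected output between
        # training and inference (simplified to a single scalar here;
        # the paper describes a per-sample factor).
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, features, class_probs):
        # features:    (B, D) descriptors, e.g. from the penultimate layer
        # class_probs: (B, C) softmax probabilities from the classifier head
        if not self.training:
            return features  # identity at inference time

        num_units = features.shape[1]
        # Confidence of the current prediction drives the drop rate:
        # the more confident the network is, the more of its most
        # active units get dropped.
        confidence = class_probs.max(dim=1).values                 # (B,)
        n_drop = (self.max_drop * confidence * num_units).long()   # per sample

        out = features.clone()
        for i in range(features.shape[0]):
            k = int(n_drop[i])
            if k > 0:
                # Zero out the k most active units of this descriptor.
                _, top_idx = features[i].topk(k)
                out[i, top_idx] = 0.0
        # Rescale so the descriptor keeps a large L2-norm despite dropping.
        return out * self.scale
```

In a training loop this module would sit between the backbone and the classifier; since the drop rate depends on the class probabilities, one would typically obtain class_probs from an initial forward pass before applying it. How exactly those probabilities are estimated is an implementation detail resolved in the paper and its repository.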
2020
978-1-7281-8808-9
Probability guided maxout / Ferrari, C.; Berretti, S.; Del Bimbo, A. - (2020), pp. 6517-6523, article no. 9412994. (Paper presented at the 25th International Conference on Pattern Recognition, ICPR 2020, held in Italy in 2021) [DOI: 10.1109/ICPR48806.2021.9412994].
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2905597