Probability guided maxout / Ferrari, C.; Berretti, S.; Del Bimbo, A. - (2020), pp. 6517-6523. (Paper presented at the 25th International Conference on Pattern Recognition, ICPR 2020, held in Italy in 2021) [10.1109/ICPR48806.2021.9412994].
Probability guided maxout
Ferrari C.; Berretti S.; Del Bimbo A.
2020-01-01
Abstract
In this paper, we propose an original CNN training strategy that brings together ideas from dropout-like regularization methods and from solutions that learn discriminative features. We propose a dropping criterion that, unlike dropout and its variants, is deterministic rather than random. It is grounded on the empirical evidence that feature descriptors with a larger L2-norm and highly active nodes are strongly correlated with confident class predictions. Our criterion therefore drops a percentage of the most active nodes of the descriptor, proportionally to the estimated class probability. We simultaneously train a per-sample scaling factor to balance the expected output between training and inference. This further allows us to keep the descriptor's L2-norm high, which we show enforces confident predictions. The combination of these two strategies results in our "Probability Guided Maxout" solution, which acts as a training regularizer. We demonstrate the above behaviors by reporting extensive image classification results on the CIFAR10, CIFAR100, and Caltech256 datasets. Code is available at https://github.com/clferrari/probability-guided-maxout.
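To make the mechanism described above concrete, the following is a minimal PyTorch-style sketch of a probability-guided dropping step combined with a learned scaling factor. It is not the authors' reference implementation (available at the GitHub link above); the class name, the max_drop parameter, and the use of a single learned scale are illustrative assumptions.

```python
# Hypothetical sketch of the mechanism summarized in the abstract; names and
# hyper-parameters are assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilityGuidedDrop(nn.Module):
    """Deterministically zeroes the most active units of a feature descriptor.
    The dropped fraction grows with the estimated probability of the
    ground-truth class; a learned scale rebalances the surviving activations
    so the descriptor's L2-norm does not collapse during training."""

    def __init__(self, max_drop=0.5):
        super().__init__()
        self.max_drop = max_drop          # upper bound on the dropped fraction (assumed)
        self.scale = nn.Parameter(torch.ones(1))  # learned rescaling factor

    def forward(self, features, logits, targets):
        # features: (B, D) descriptors, logits: (B, C), targets: (B,) class ids
        if not self.training:
            return features               # identity at inference time
        batch_size, dim = features.shape
        # Estimated probability of the correct class for each sample.
        p_true = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        # Number of most-active units to drop, proportional to that probability.
        n_drop = (self.max_drop * p_true * dim).long()
        # Rank units by activation magnitude, highest first, and build the mask.
        order = features.abs().argsort(dim=1, descending=True)
        mask = torch.ones_like(features)
        for i in range(batch_size):
            mask[i, order[i, : int(n_drop[i])]] = 0.0
        # Learned scaling balances the expected output between train and test.
        return features * mask * self.scale
```

In use, such a layer would sit between the CNN backbone and the classifier, receiving the current logits and labels at training time and acting as a pass-through at inference, in the spirit of a dropout-like regularizer.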