Probability guided maxout / Ferrari, C.; Berretti, S.; Del Bimbo, A. - (2020), pp. 6517-6523. (Paper presented at the 25th International Conference on Pattern Recognition, ICPR 2020, held in Italy in 2021) [10.1109/ICPR48806.2021.9412994].
Probability guided maxout
Ferrari C.; Berretti S.; Del Bimbo A.
2020-01-01
Abstract
In this paper, we propose an original CNN training strategy that brings together ideas from dropout-like regularization methods and from solutions that learn discriminative features. We propose a dropping criterion that, unlike dropout and its variants, is deterministic rather than random. It is grounded on the empirical evidence that feature descriptors with a larger L2-norm and highly active nodes are strongly correlated with confident class predictions. Our criterion therefore drops a percentage of the most active nodes of the descriptor, proportionally to the estimated class probability. We simultaneously train a per-sample scaling factor to balance the expected output between training and inference. This further allows us to keep the descriptor's L2-norm high, which we show enforces confident predictions. The combination of these two strategies results in our "Probability Guided Maxout" solution, which acts as a training regularizer. We demonstrate the above behaviors by reporting extensive image classification results on the CIFAR10, CIFAR100, and Caltech256 datasets. Code is available at https://github.com/clferrari/probability-guided-maxout.
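To make the mechanism described above concrete, the following is a minimal PyTorch-style sketch of a probability-guided dropping step combined with a learned scaling factor. It is not the authors' reference implementation (available at the GitHub link above); the class name, the max_drop parameter, and the use of a single learned scale are illustrative assumptions.

```python
# Hypothetical sketch of the mechanism summarized in the abstract; names and
# hyper-parameters are assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilityGuidedDrop(nn.Module):
    """Deterministically zeroes the most active units of a feature descriptor.
    The dropped fraction grows with the estimated probability of the
    ground-truth class; a learned scale rebalances the surviving activations
    so the descriptor's L2-norm does not collapse during training."""

    def __init__(self, max_drop=0.5):
        super().__init__()
        self.max_drop = max_drop          # upper bound on the dropped fraction (assumed)
        self.scale = nn.Parameter(torch.ones(1))  # learned rescaling factor

    def forward(self, features, logits, targets):
        # features: (B, D) descriptors, logits: (B, C), targets: (B,) class ids
        if not self.training:
            return features               # identity at inference time
        batch_size, dim = features.shape
        # Estimated probability of the correct class for each sample.
        p_true = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        # Number of most-active units to drop, proportional to that probability.
        n_drop = (self.max_drop * p_true * dim).long()
        # Rank units by activation magnitude, highest first, and build the mask.
        order = features.abs().argsort(dim=1, descending=True)
        mask = torch.ones_like(features)
        for i in range(batch_size):
            mask[i, order[i, : int(n_drop[i])]] = 0.0
        # Learned scaling balances the expected output between train and test.
        return features * mask * self.scale
```

In use, such a layer would sit between the CNN backbone and the classifier, receiving the current logits and labels at training time and acting as a pass-through at inference, in the spirit of a dropout-like regularizer.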