On the hyperparameters setting for first order stochastic optimization methods in machine learning

Trombini, Ilaria

Finite-sum problems may be regarded as sample average approximations of stochastic optimization problems. The number of components in the finite sum is typically very large, which makes computing the full gradient impractical. Consequently, stochastic gradient methods are a widely accepted approach for addressing finite-sum problems. It is well established that an appropriate strategy for selecting the hyperparameters of these methods (i.e., the set of predefined parameters), particularly the learning rate and the mini-batch size, is crucial to ensure convergence and to achieve satisfactory practical performance, while avoiding expensive trial-and-error procedures to determine appropriate values. In this thesis, we develop several novel methods that leverage different techniques. Specifically, we introduce new algorithms based on adaptive stepsize selection rules borrowed from the deterministic framework. By employing a stochastic version of the Barzilai-Borwein rules that exploits Ritz-type values to dynamically select the stepsize, we propose the so-called AA-R-BB method. Furthermore, by combining a recent approach that selects the learning rate as a diagonal matrix with an adaptive rule for setting the mini-batch size, we develop the ASM-DIAG method. A different approach is developed in the LISA method, which relies on a non-monotone line-search technique to adaptively select the learning rate and controls the variance of the stochastic gradient by increasing the mini-batch size. We also explore a practical implementation of LISA, named Deep-LISA, which provides a predetermined procedure for selecting the mini-batch size, aimed at reducing the computational cost and managing hierarchical memory resources, also in the presence of hardware accelerators. Moreover, within the class of spectral stochastic methods, we investigate an additional sampling technique, which plays a crucial role in accepting the trial iterate and regulating the increase of the mini-batch size.
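The following Python sketch illustrates two ingredients that recur in the methods described above: a Barzilai-Borwein stepsize computed from stochastic gradients, and a mini-batch size that grows to reduce the variance of the stochastic gradient. It is a generic, hypothetical example on a synthetic noise-free least-squares finite sum f(w) = (1/N) sum_i 0.5 (a_i^T w - b_i)^2. It does not implement AA-R-BB (no Ritz-type values are used), ASM-DIAG, or LISA's line search. The function names, the safeguard interval [alpha_min, alpha_max], the geometric growth factor, and the choice of evaluating both gradients in the BB difference on the same mini-batch are all assumptions made for illustration.

```python
import random


def make_problem(n, w_true, seed=0):
    """Hypothetical noise-free data: b_i = a_i . w_true with a_i ~ N(0, I)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in w_true]
        data.append((a, sum(ai * wi for ai, wi in zip(a, w_true))))
    return data


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def batch_grad(w, batch):
    """Gradient of (1/|B|) * sum_{i in B} 0.5 * (a_i . w - b_i)^2."""
    g = [0.0] * len(w)
    for a, b in batch:
        r = dot(a, w) - b
        for j, aj in enumerate(a):
            g[j] += r * aj
    return [gj / len(batch) for gj in g]


def sgd_bb(data, dim, iters=40, batch_size=32, growth=1.5,
           alpha0=0.1, alpha_min=1e-4, alpha_max=1.0, seed=1):
    """Mini-batch SGD with a safeguarded stochastic BB1 stepsize (sketch)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    w_prev = None
    alpha = alpha0  # initial stepsize, used until a BB pair is available
    for _ in range(iters):
        batch = rng.sample(data, batch_size)
        g = batch_grad(w, batch)
        if w_prev is not None:
            # BB1 stepsize s^T s / s^T y, with s = w_k - w_{k-1} and y the
            # gradient difference evaluated on the SAME mini-batch (assumption).
            s = [wi - pi for wi, pi in zip(w, w_prev)]
            y = [gi - hi for gi, hi in zip(g, batch_grad(w_prev, batch))]
            sy = dot(s, y)
            if sy > 0.0:  # keep the previous stepsize if curvature is not positive
                alpha = min(alpha_max, max(alpha_min, dot(s, s) / sy))
        w_prev = w
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
        # Simple variance control: grow the mini-batch geometrically, capped at N.
        batch_size = min(len(data), int(batch_size * growth))
    return w


if __name__ == "__main__":
    data = make_problem(400, [2.0, -1.0])
    print(sgd_bb(data, dim=2))  # should be close to [2.0, -1.0]
```

Evaluating both gradients on the same mini-batch doubles the gradient cost per iteration, but for this quadratic it makes y = H_B s an exact curvature sample of the mini-batch Hessian H_B, so the BB step is positive and bounded by the reciprocal extreme eigenvalues of H_B; with independent mini-batches, y would also carry sampling noise.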
Building on this technique, we propose a spectral iterative method, named LSNM-BB, for the minimization of finite or infinite sums of functions. Another significant aspect we address is the use of proximal methods in scenarios where the regularization terms are not differentiable. For all the proposed methods, we report the results of numerical experiments on binary or multiclass classification problems, comparing them against state-of-the-art techniques from the literature. The results highlight that the proposed methods are competitive and robust with respect to the hyperparameter setting.
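For a composite objective F(w) = f(w) + lam * ||w||_1, where the l1 norm is taken here only as an assumed example of a non-differentiable regularizer, a proximal stochastic gradient step replaces the plain update with w_{k+1} = prox_{alpha_k lam ||.||_1}(w_k - alpha_k g_k), where g_k is a mini-batch gradient of the smooth part f. The minimal sketch below shows this single step; the helper names are hypothetical, and the stepsize alpha_k could in principle come from any adaptive rule, not necessarily the ones proposed in the thesis.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def prox_sgd_step(w, grad, alpha, lam):
    """One proximal stochastic gradient step for f(w) + lam * ||w||_1.

    `grad` is a (mini-batch) estimate of the gradient of the smooth part f;
    the non-smooth term is handled exactly through its proximal operator.
    """
    forward = [wi - alpha * gi for wi, gi in zip(w, grad)]  # gradient step on f
    return soft_threshold(forward, alpha * lam)            # prox step on lam*||.||_1


if __name__ == "__main__":
    # Components whose magnitude after the gradient step is below alpha*lam
    # are set exactly to zero, which is how the l1 term induces sparsity.
    print(prox_sgd_step([1.0, 0.2], [0.5, -0.1], alpha=0.5, lam=1.0))  # -> [0.25, 0.0]
```

Because the proximal operator of the l1 norm has this closed form, the non-differentiable term never needs a (sub)gradient estimate; only the smooth part is sampled.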

On the hyperparameters setting for first order stochastic optimization methods in machine learning / Trombini, I. - (2025 Jan 21).