Comparison of the Stochastic Gradient Descent Based Optimization Techniques


Yazan E., TALU M. F.

2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Türkiye, 16-17 September 2017

Abstract

Stochastic gradient descent (SGD) is a popular optimization technique that updates each parameter θ_k along the direction of the partial derivative ∂J(θ)/∂θ_k in order to minimize or maximize the cost function J(θ). The technique is widely used in current machine learning methods such as convolutional neural networks and autoencoders. In this study, five SGD-based approaches to updating the θ parameters (Momentum, Adagrad, Adadelta, RMSprop, and Adam) were investigated. Using selected test functions, the advantages and disadvantages of each approach are compared in terms of the number of oscillations, the parameter update rate, and the minimum cost reached. The comparison results are presented graphically.
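
As a rough illustration of how the compared update rules differ, the following minimal sketch runs plain SGD and the five variants on a small quadratic cost. The test function, the starting point, the hyperparameters, and the names `grad`, `cost`, and `run` are all assumptions chosen for this example; they are not the test functions or settings used in the study, and the gradient here is exact rather than stochastic.

```python
import numpy as np

# Hypothetical test function (an assumption, not taken from the paper):
# J(theta) = 0.5 * (theta_0^2 + 10 * theta_1^2), minimum at the origin.
def cost(theta):
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)

def grad(theta):
    # Exact gradient of the quadratic above.
    return np.array([theta[0], 10.0 * theta[1]])

def run(name, steps=200, lr=0.05, beta=0.9, beta2=0.999, eps=1e-8):
    """Apply one update rule for `steps` iterations and return the final cost."""
    theta = np.array([2.0, 1.0])   # assumed starting point
    m = np.zeros(2)                # velocity / first-moment estimate
    v = np.zeros(2)                # accumulated squared gradients
    d = np.zeros(2)                # Adadelta: accumulated squared updates
    for t in range(1, steps + 1):
        g = grad(theta)
        if name == "sgd":
            step = lr * g
        elif name == "momentum":
            m = beta * m + lr * g
            step = m
        elif name == "adagrad":
            v = v + g ** 2
            step = lr * g / (np.sqrt(v) + eps)
        elif name == "rmsprop":
            v = beta * v + (1 - beta) * g ** 2
            step = lr * g / (np.sqrt(v) + eps)
        elif name == "adadelta":
            # No learning rate: the step is scaled by a ratio of running RMS values.
            v = beta * v + (1 - beta) * g ** 2
            step = np.sqrt(d + 1e-6) / np.sqrt(v + 1e-6) * g
            d = beta * d + (1 - beta) * step ** 2
        elif name == "adam":
            m = beta * m + (1 - beta) * g
            v = beta2 * v + (1 - beta2) * g ** 2
            m_hat = m / (1 - beta ** t)    # bias correction
            v_hat = v / (1 - beta2 ** t)
            step = lr * m_hat / (np.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown optimizer: {name}")
        theta = theta - step               # descend to minimize J
    return cost(theta)

if __name__ == "__main__":
    for name in ["sgd", "momentum", "adagrad", "rmsprop", "adadelta", "adam"]:
        print(f"{name:9s} final cost = {run(name):.3e}")
```

The final costs printed by this sketch depend heavily on the assumed learning rate and step count, so they should not be read as a ranking of the methods.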