Title: FIRST ORDER OPTIMIZATION METHODS FOR DEEP LEARNING.
Authors: BOUANANE, KHADRA
DOKKAR, BASMA
MEDDOUR, BOUTHAYNA
Keywords: Deep learning
optimization
first order optimization
Adam
CNN
U-Net
Issue Date: 2023
Publisher: UNIVERSITY OF KASDI MERBAH OUARGLA
Abstract: Deep learning has emerged as a transformative technology across domains ranging from computer vision to natural language processing, and the success of deep learning models relies heavily on effective optimization algorithms. This thesis presents two main contributions. Contribution 1 is a two-fold comparative study. We first explore the impact of several first-order optimization techniques on the training of U-Net for the task of change detection, namely gradient descent with momentum (Momentum GD), Nesterov Accelerated Gradient (NAG), Adaptive Gradient (AdaGrad), Root Mean Square Propagation (RMSProp), and adaptive moment estimation (Adam). The results show that RMSProp, NAG, and AdaGrad reached the highest validation accuracies of 0.976, 0.978, and 0.979 with learning rates of 10^-2, 10^-3, and 10^-4 respectively, while Adam converged fastest and scored the lowest validation loss. Moreover, Adam achieved the highest precision and F1 score across all learning rate values, with 0.491 and 0.376 respectively. Nevertheless, we noticed that Adam's performance could be significantly influenced by data sparsity. In light of this hypothesis, the second part of Contribution 1 investigates the impact of sparsity on the performance of the Adam optimizer. We compare models with different sparsity levels, U-Net, Dense U-Net, and DenseNet, trained with Adam under BCE and focal Tversky losses, on dense and sparse datasets for three ML tasks: change detection, image segmentation, and object recognition. According to the obtained results, the Adam optimizer appears to be more sensitive to model sparsity than to data sparsity. In Contribution 2, we propose a new method that aims to improve Adam's performance: we combine a simulated annealing strategy with a dynamic learning rate to overcome the generalization gap that characterizes adaptive methods. We assess several variants of the proposed approach against Adam, stochastic gradient descent (SGD), and AdaBound. For this purpose, a simple 3-layer CNN is trained on two datasets, MNIST and CIFAR-10.
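
As an illustration of the kind of experimental setup the abstract describes, the following is a minimal, hypothetical sketch (not taken from the thesis) of how such an optimizer comparison can be wired up in PyTorch: a small 3-layer CNN trained on MNIST, with the compared first-order methods exposed behind a single factory function. All names (SmallCNN, make_optimizer) and hyperparameter values are illustrative assumptions; the proposed simulated-annealing variant of Adam from Contribution 2 is not implemented here, it would simply replace the optimizer returned by the factory.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SmallCNN(nn.Module):
    """A simple 3-layer CNN in the spirit of the MNIST/CIFAR-10 experiments."""
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc = nn.Linear(64 * 7 * 7, num_classes)  # 28x28 -> 7x7 after two 2x2 poolings

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(torch.flatten(x, 1))

def make_optimizer(name, params, lr=1e-3):
    # The first-order methods compared in Contribution 1, via torch.optim.
    if name == "momentum":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "nag":
        return torch.optim.SGD(params, lr=lr, momentum=0.9, nesterov=True)
    if name == "adagrad":
        return torch.optim.Adagrad(params, lr=lr)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=lr)
    return torch.optim.Adam(params, lr=lr)  # default: Adam

def train_one_epoch(model, loader, optimizer, device="cpu"):
    # One pass over the data: forward, cross-entropy loss, backward, update.
    model.train()
    criterion = nn.CrossEntropyLoss()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=64, shuffle=True)
    model = SmallCNN()
    optimizer = make_optimizer("adam", model.parameters(), lr=1e-3)
    train_one_epoch(model, loader, optimizer)

Swapping the string passed to make_optimizer (and sweeping the learning rate) reproduces the structure of the comparison; validation accuracy, loss, precision, and F1 would then be measured per optimizer and per learning rate, as reported in the abstract.
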
URI: https://dspace.univ-ouargla.dz/jspui/handle/123456789/35043
Appears in Collections: Département d'informatique et technologie de l'information - Master

Files in This Item:
File: DOKKAR-MEDDOUR.pdf
Size: 5,4 MB
Format: Adobe PDF

