TY - GEN
T1 - Comparatively Studying Modern Optimizers Capability for Fitting Vision Transformers
AU - Abdullah, Abdullah Nazhat
AU - Aydin, Tarkan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Transformer architectures have made great strides in both research and industry, gaining wide adoption due to their versatility and generality. These qualities, combined with the availability of internet-scale datasets, open the path to building deep learning systems that can target many modalities and several tasks within each modality. Over the years, many optimization algorithms have been proposed and used to fit deep learning models. Although many comparative assessments have analyzed and selected the best optimizers for architectures that predate Transformers, the literature lacks an equally extensive assessment for optimizing Transformer-based deep learning models. In this paper, we investigate modern and recently introduced deep learning optimizers and apply a comparative assessment to multiple Transformer architectures implemented for the task of image classification. Our comparative study shows experimentally that the novel LION optimizer provides the best performance on the target task and datasets, demonstrating that algorithmically designed optimizers can compete with and surpass the handcrafted optimization schemes normally used to fit Transformer architectures.
KW - Computer vision
KW - Optimization
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85202288487&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-64495-5_6
DO - 10.1007/978-3-031-64495-5_6
M3 - Conference contribution
AN - SCOPUS:85202288487
SN - 9783031644948
T3 - EAI/Springer Innovations in Communication and Computing
SP - 77
EP - 87
BT - 7th EAI International Conference on Robotic Sensor Networks - EAI ROSENET 2023
A2 - Gül, Ömer Melih
A2 - Fiorini, Paolo
A2 - Kadry, Seifedine Nimer
PB - Springer Science and Business Media Deutschland GmbH
T2 - 7th EAI International Conference on Robotics and Networks, ROSENET 2023
Y2 - 15 December 2023 through 16 December 2023
ER -