TY - GEN
T1 - Hybrid CNN+Transformer for Diabetic Retinopathy Recognition and Grading
AU - Sadeghzadeh, Arezoo
AU - Junayed, Masum Shah
AU - Aydin, Tarkan
AU - Islam, Md Baharul
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Diabetic retinopathy (DR) can cause blindness if it is not treated in time. Therefore, automatic DR detection and grading systems play a significant role in early diagnosis and treatment. However, the accuracy of existing computer-aided systems is still insufficient for clinical applications, and they need large-scale training datasets to perform well. This paper proposes a hybrid CNN+Transformer DR recognition and grading system to improve performance competitively even when trained directly on small datasets. First, a deep CNN-based EfficientNet-B0 backbone is used as the feature extractor. Then, global dependencies are drawn between the input and output by a Transformer encoder-decoder (TE-TD), interleaved with multi-head self-attention (MHSA) for feature encoding. It is followed by a Residual Spatial Module (RSM) that further improves the model's performance while stabilizing training. A prediction feed-forward network (PFFN) is used as the classifier. The contribution of each module to system performance and the superiority of the combined CNN and Transformer over plain individual architectures are investigated through comprehensive ablation studies. Our approach generalizes well, obtaining state-of-the-art performance in both recognition and grading on five benchmark datasets: EyePACS, APTOS, DDR, Messidor-1, and Messidor-2.
AB - Diabetic retinopathy (DR) can cause blindness if it is not treated in time. Therefore, automatic DR detection and grading systems play a significant role in early diagnosis and treatment. However, the accuracy of existing computer-aided systems is still insufficient for clinical applications, and they need large-scale training datasets to perform well. This paper proposes a hybrid CNN+Transformer DR recognition and grading system to improve performance competitively even when trained directly on small datasets. First, a deep CNN-based EfficientNet-B0 backbone is used as the feature extractor. Then, global dependencies are drawn between the input and output by a Transformer encoder-decoder (TE-TD), interleaved with multi-head self-attention (MHSA) for feature encoding. It is followed by a Residual Spatial Module (RSM) that further improves the model's performance while stabilizing training. A prediction feed-forward network (PFFN) is used as the classifier. The contribution of each module to system performance and the superiority of the combined CNN and Transformer over plain individual architectures are investigated through comprehensive ablation studies. Our approach generalizes well, obtaining state-of-the-art performance in both recognition and grading on five benchmark datasets: EyePACS, APTOS, DDR, Messidor-1, and Messidor-2.
KW - Diabetic Retinopathy
KW - EfficientNet-B0
KW - Fundus images
KW - Severity grading
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85178254154&partnerID=8YFLogxK
U2 - 10.1109/ASYU58738.2023.10296789
DO - 10.1109/ASYU58738.2023.10296789
M3 - Conference contribution
AN - SCOPUS:85178254154
T3 - 2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023
BT - 2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023
Y2 - 11 October 2023 through 13 October 2023
ER -