TY - JOUR
T1 - PDS-Net
T2 - A novel point and depth-wise separable convolution for real-time object detection
AU - Junayed, Masum Shah
AU - Islam, Md Baharul
AU - Imani, Hassan
AU - Aydin, Tarkan
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
PY - 2022/6
Y1 - 2022/6
AB - Numerous object detectors and classifiers have shown acceptable performance in recent years by using convolutional neural networks and other efficient architectures. However, most of them continue to encounter difficulties such as overfitting, increased computational cost, and low efficiency and performance in real-time scenarios. This paper proposes a new lightweight model for detecting and classifying objects in images. The model comprises a backbone for extracting in-depth features and a spatial feature pyramid network (SFPN) for accurately detecting and categorizing objects. The proposed backbone uses point-wise separable (PWS) and depth-wise separable convolutions, which are more efficient than standard convolution. The PWS convolution utilizes a residual shortcut link to reduce computation time. We also propose an SFPN that comprises concatenation, transformer encoder–decoder, and feature fusion modules, which enables the simultaneous processing of multi-scale features, the extraction of low-level characteristics, and the creation of a pyramid of features to increase the effectiveness of the proposed model. The proposed model outperforms all of the existing backbones for object detection and classification on three publicly accessible datasets: PASCAL VOC 2007, PASCAL VOC 2012, and MS-COCO. Our extensive experiments show that the proposed model outperforms state-of-the-art detectors, with mAP improvements of 2.4% and 2.5% on VOC 2007, 3.0% and 2.6% on VOC 2012, and 2.5% and 3.6% on MS-COCO for the small and large image sizes, respectively. On the MS-COCO dataset, our model achieves 39.4 and 33.1 FPS on a single GPU for the small (320 × 320) and large (512 × 512) image sizes, respectively, which shows that our method can run in real time.
KW - Classification
KW - Computer vision
KW - DWS Convolution
KW - Object detection
KW - PWS convolution
KW - Transformer encoder–decoder
UR - http://www.scopus.com/inward/record.url?scp=85127490589&partnerID=8YFLogxK
U2 - 10.1007/s13735-022-00229-6
DO - 10.1007/s13735-022-00229-6
M3 - Article
AN - SCOPUS:85127490589
SN - 2192-6611
VL - 11
SP - 171
EP - 188
JO - International Journal of Multimedia Information Retrieval
JF - International Journal of Multimedia Information Retrieval
IS - 2
ER -