Edge devices friendly multi-human parsing with lightweight encoding and multi-scale self-attention based decoding

Md Imran Hosen, Tarkan Aydin

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-human parsing has received considerable research attention in recent years, and deep learning-based methods have demonstrated promising results. In practice, however, most methods struggle on edge devices because of their large network architectures and low inference speed. Moreover, inadequate modeling of long-range feature dependencies leads to suboptimal representations of discriminative features across semantic classes. To address these challenges and enable real-time deployment on edge devices, we design a deep yet lightweight encoder and a multi-scale self-attention-based decoder that captures long-range dependencies and spatial relationships. We further optimize the model through half-precision quantization, improving its efficiency on edge devices. Experiments on the publicly available Crowd Instance-level Human Parsing (CIHP) and Look into Person (LIP) datasets show that our framework parses multiple humans effectively at a high inference speed of 55.6 FPS. Additionally, real-world testing on Jetson Nano edge devices shows competitive performance. An extensive ablation study of the individual modules validates our network design.
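
As a rough illustration of what half-precision storage implies, the sketch below rounds a few values to IEEE 754 half precision using only the Python standard library and converts the reported 55.6 FPS into a per-frame time budget. It is not the authors' pipeline: the `weights` list and the `to_half` helper are hypothetical names introduced here, and a real deployment would convert the model inside a deep learning framework rather than by hand.

```python
import struct


def to_half(values):
    """Round floats to IEEE 754 half precision.

    Returns the packed 16-bit bytes and the values as read back from them.
    Hypothetical helper for illustration only.
    """
    fmt = f"<{len(values)}e"  # 'e' is the struct code for a 2-byte half float
    packed = struct.pack(fmt, *values)
    rounded = list(struct.unpack(fmt, packed))
    return packed, rounded


# Hypothetical weights; real model parameters are not given in the abstract.
weights = [0.1, -1.5, 3.14159, 1024.0]

# The same values stored as 4-byte single-precision floats, for comparison.
fp32_bytes = struct.pack(f"<{len(weights)}f", *weights)
fp16_bytes, fp16_values = to_half(weights)

# Half precision uses half the storage of single precision.
size_ratio = len(fp16_bytes) / len(fp32_bytes)

# Largest rounding error introduced by the narrower format.
max_abs_error = max(abs(a - b) for a, b in zip(weights, fp16_values))

# Per-frame time budget implied by the reported 55.6 FPS.
frame_time_ms = 1000.0 / 55.6

print(f"storage ratio fp16/fp32: {size_ratio}")
print(f"max rounding error: {max_abs_error:.6f}")
print(f"time per frame at 55.6 FPS: {frame_time_ms:.2f} ms")
```

The storage halves while values that are exactly representable in 16 bits, such as -1.5 and 1024.0, survive unchanged; other values pick up a small rounding error, which is the trade-off that half-precision optimization accepts in exchange for speed and memory savings.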

Original language: English
Journal: Multimedia Tools and Applications
Publication status: Accepted/In press - 2024
Externally published: Yes
