Edge devices friendly multi-human parsing with lightweight encoding and multi-scale self-attention based decoding

Md Imran Hosen, Tarkan Aydin

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-human parsing has received considerable research attention in recent years, and deep learning-based multi-human parsing methods have demonstrated promising results. In practice, however, most methods perform poorly on edge devices because of their large network architectures and low inference speed. Moreover, inadequate modeling of long-range feature dependencies leads to suboptimal representations of discriminative features across semantic classes. To address these challenges and facilitate real-time implementation on edge devices, we design a deep yet lightweight encoder and a multi-scale self-attention based decoder that captures long-range dependencies and spatial relationships. Furthermore, we optimize our model through half-precision quantization, improving its efficiency on edge devices. Experiments on the publicly available Crowd Instance-level Human Parsing (CIHP) and Look into Person (LIP) datasets show that our framework parses multiple humans effectively at a high inference speed of 55.6 FPS. Additionally, real-world testing on Jetson Nano edge devices shows competitive performance. An extensive ablation study of the individual modules validates our network design.
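
To make the half-precision step and the reported frame rate concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the weight values and the helper names `to_half` and `storage_bytes` are hypothetical, chosen only for illustration. It shows that storing values as IEEE 754 half-precision numbers halves the storage relative to single precision at the cost of a small rounding error, and it converts the reported 55.6 FPS into a per-frame time budget.

```python
import struct


def to_half(values):
    """Round each value to the nearest IEEE 754 half-precision number."""
    # The struct format character "e" is a 2-byte half-precision float.
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed))


def storage_bytes(count, fmt):
    """Bytes needed to store `count` values in the given struct format."""
    return struct.calcsize(f"<{count}{fmt}")


# Hypothetical weights, for illustration only (not taken from the paper).
weights = [0.1, -0.25, 1.5, 3.14159]

half = to_half(weights)
max_err = max(abs(w - h) for w, h in zip(weights, half))

fp32_bytes = storage_bytes(len(weights), "f")  # single precision, 4 bytes each
fp16_bytes = storage_bytes(len(weights), "e")  # half precision, 2 bytes each

# Per-frame time budget implied by the reported 55.6 FPS.
frame_ms = 1000.0 / 55.6

print(f"fp32: {fp32_bytes} bytes, fp16: {fp16_bytes} bytes")
print(f"largest rounding error: {max_err:.6f}")
print(f"time per frame at 55.6 FPS: {frame_ms:.2f} ms")
```

In a deep learning framework the same idea is usually applied by casting model weights and activations to a 16-bit floating-point type; the abstract does not specify which toolchain was used.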

Original language: English
Pages (from-to): 25027-25047
Number of pages: 21
Journal: Multimedia Tools and Applications
Volume: 84
Issue number: 22
Publication status: Published - Jul 2025

Keywords

  • Edge devices
  • Inverted residual block
  • Multi-human parsing
  • Self-Attention
