Comparing LSTM and Transformer for Video Depth Estimation

Rozhin Fani, Berke Gur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Accurate depth estimation from monocular video is critical for robotics applications such as simultaneous localization and mapping (SLAM) and navigation, and it can be improved by incorporating temporal information across frames. Two sequence modeling techniques, recurrent long short-term memory (LSTM) networks and Transformer architectures, offer potential approaches for aggregating temporal cues. This work presents a comparative study of LSTM and Transformer modules for video depth prediction. The proposed depth pipeline extracts optical flow features between frames and passes them to either an LSTM or a Transformer encoder before decoding them into a depth map prediction. Compared to the LSTM, the Transformer's ability to capture long-range dependencies allows it to propagate information more effectively across long sequences. The Transformer is shown to outperform LSTM models by a factor of five to six in depth map estimation on standard metrics. This analysis provides insight into the advantages of the Transformer over recurrent LSTM models for aggregating temporal signals in depth estimation and similar sequence prediction tasks. The Transformer's ability to aggregate motion across sequences holds promise for more robust spatial perception.
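The abstract refers to "standard metrics" without naming them. As a rough illustration only, the sketch below computes three error measures that are widely used in the monocular depth literature: absolute relative error, root mean squared error, and the threshold accuracy (the fraction of pixels whose prediction-to-truth ratio is below 1.25). Which metrics the paper actually reports is an assumption here, and the function name `depth_metrics` and its parameters are hypothetical.

```python
import math


def depth_metrics(pred, gt, delta_threshold=1.25):
    """Compute common depth-error metrics over valid pixels.

    pred, gt: flat sequences of predicted and ground-truth depths.
    Pairs where either value is non-positive are skipped as invalid
    (an assumption of this sketch, not something stated in the abstract).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # Absolute relative error: mean of |pred - gt| / gt (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Root mean squared error in depth units (lower is better).
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Threshold accuracy: share of pixels with max(pred/gt, gt/pred)
    # below the threshold (higher is better).
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < delta_threshold) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    ground_truth = [1.0, 2.0, 4.0, 0.0]  # last pixel has no valid depth
    prediction = [1.1, 1.8, 6.0, 3.0]
    print(depth_metrics(prediction, ground_truth))
```

Under these definitions, a five- to sixfold improvement would most naturally mean an error metric such as absolute relative error or RMSE dropping to roughly one fifth or one sixth of its value. The abstract does not say which quantity the factor applies to.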

Original language: English
Title of host publication: 7th EAI International Conference on Robotic Sensor Networks - EAI ROSENET 2023
Editors: Ömer Melih Gül, Paolo Fiorini, Seifedine Nimer Kadry
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 89-99
Number of pages: 11
ISBN (Print): 9783031644948
Publication status: Published - 2024
Event: 7th EAI International Conference on Robotics and Networks, ROSENET 2023 - Istanbul, Turkey
Duration: 15 Dec 2023 to 16 Dec 2023

Publication series

Name: EAI/Springer Innovations in Communication and Computing
ISSN (Print): 2522-8595
ISSN (Electronic): 2522-8609

Conference

Conference: 7th EAI International Conference on Robotics and Networks, ROSENET 2023
Country/Region: Turkey
City: Istanbul
Period: 15/12/23 to 16/12/23
