Toward Automatic Streetside Building Identification With an Integrated YOLO Model for Building Detection and a Vision Transformer for Identification

Ossama Krawi, Lavdie Rada

Research output: Contribution to journal › Article › peer-review

Abstract

In an era of widespread digital imagery and advancing machine vision, automated methods for precisely locating photographed buildings are valuable across many sectors. This research begins the development of an automated Streetside Building Identification System (SBIS). Leveraging the coverage of Google Street View imagery across major cities worldwide, it integrates a YOLO model for building detection with a Vision Transformer (ViT) model for building identification, supported by transfer learning. The approach aims to pinpoint exact building coordinates in urban environments, overcoming challenges associated with insufficient supporting data. Because Google Street View datasets cover entire urban landscapes, the proposed method is efficient and scalable: it simplifies data acquisition and avoids the logistical complexity of manually collecting targeted images. It also gives a more inclusive representation of diverse urban environments, recognizing buildings of every shape and architectural style. The system can scan any area covered by Google Street View, even where no commercial or business data is available, which marks a significant advance over previous studies. The paper describes the implementation of the initial system version and discusses potential improvements. Tests against privately collected images show a current accuracy of 94.23%, a promising foundation for further refinement. The primary objective is an automated solution capable of building a comprehensive database for building recognition tasks, eliminating the laborious manual search for extensive datasets.
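The two-stage structure described in the abstract (detect buildings, then identify each detection) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how an identified building is mapped to coordinates, so the nearest-neighbour lookup against a georeferenced reference set is an assumption made here for illustration. The names `identify_buildings`, `ReferenceBuilding`, `toy_detect`, and `toy_embed` are hypothetical, the coordinates are placeholders, and the toy functions stand in for a trained YOLO detector and a ViT feature extractor.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates
Image = List[List[float]]        # toy grayscale image: rows of pixel values


@dataclass
class ReferenceBuilding:
    """One georeferenced building, e.g. harvested from street-level imagery."""
    building_id: str
    lat: float
    lon: float
    embedding: Sequence[float]


def crop(image: Image, box: Box) -> Image:
    """Cut the region inside the bounding box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_buildings(
    image: Image,
    detect: Callable[[Image], List[Box]],       # stage 1: stand-in for a YOLO detector
    embed: Callable[[Image], Sequence[float]],  # stage 2: stand-in for a ViT encoder
    references: List[ReferenceBuilding],
    min_similarity: float = 0.8,                # assumed threshold, not from the paper
) -> List[Tuple[Box, Optional[ReferenceBuilding]]]:
    """Detect buildings, then match each crop to its most similar reference."""
    results = []
    for box in detect(image):
        vector = embed(crop(image, box))
        best = max(references, key=lambda r: cosine(vector, r.embedding), default=None)
        if best is not None and cosine(vector, best.embedding) >= min_similarity:
            results.append((box, best))
        else:
            results.append((box, None))  # detected, but no confident match
    return results


# Toy stand-ins so the sketch runs without any trained model.
def toy_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 4, 2), (4, 0, 6, 2)]


def toy_embed(patch: Image) -> List[float]:
    return [pixel for row in patch for pixel in row]  # flatten the pixels


image = [
    [1, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 1],
]
references = [
    ReferenceBuilding("bldg-A", 41.0, 29.0, [1, 0, 0, 1]),  # placeholder coordinates
    ReferenceBuilding("bldg-B", 41.1, 29.1, [0, 1, 1, 0]),  # placeholder coordinates
]
results = identify_buildings(image, toy_detect, toy_embed, references)
for box, match in results:
    print(box, (match.building_id, match.lat, match.lon) if match else "no match")
```

Keeping the detector and the encoder as plain callables means either stage could be swapped for a real model without changing the matching logic. The third toy detection falls below the similarity threshold and is reported as unmatched rather than being forced onto the nearest reference.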

Original language: English
Pages (from-to): 52901-52911
Number of pages: 11
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

Keywords

  • Building identification
  • Vision Transformer
  • YOLO
  • computer vision
  • deep learning
