Publication Details

Vision UFormer: Long-Range Monocular Absolute Depth Estimation

POLÁŠEK Tomáš, ČADÍK Martin, KELLER Yosi and BENEŠ Bedřich. Vision UFormer: Long-Range Monocular Absolute Depth Estimation. Computers and Graphics, vol. 111, no. 4, 2023, pp. 180-189. ISSN 0097-8493. Available from: https://www.sciencedirect.com/science/article/pii/S0097849323000262

Czech title

Vision UFormer: Absolutní Predikce Hloubek na Dlouhé Vzdálenosti

Type

journal article

Language

English

Authors

Polášek Tomáš, Ing. (DCGM FIT BUT)
Čadík Martin, doc. Ing., Ph.D. (DCGM FIT BUT)
Keller Yosi, prof. MSc., Ph.D. (BIU)
Beneš Bedřich, prof., Ph.D. (PU)

URL

https://www.sciencedirect.com/science/article/pii/S0097849323000262

Keywords

Absolute Depth Estimation, Monocular Depth Prediction, Long Range Distance, Transformer, UNet, Staged Training

Abstract

We introduce Vision UFormer (ViUT), a novel deep neural long-range monocular depth estimator. The input is an RGB image, and the output is an image that stores the absolute distance of the object in the scene as its per-pixel values. ViUT consists of a Transformer encoder and a ResNet decoder combined with UNet-style skip connections. It is trained on 1M images across ten datasets in a staged regime that starts with easier-to-predict data such as indoor photographs and continues to more complex long-range outdoor scenes. We show that ViUT provides comparable results for normalized relative distances and short-range classical datasets such as NYUv2 and KITTI. We further show that it successfully estimates absolute long-range depth in meters. We validate ViUT on a wide variety of long-range scenes showing its high estimation capabilities with a relative improvement of up to 23%. Absolute depth estimation finds application in many areas, and we show its usability in image composition, range annotation, defocus, and scene reconstruction.

Published

2023

Pages

180-189

Journal

Computers and Graphics, vol. 111, no. 4, ISSN 0097-8493

Publisher

Elsevier Science

DOI

10.1016/j.cag.2023.02.003

UT WoS

000954860700001

EID Scopus

2-s2.0-85149382691

BibTeX

@ARTICLE{FITPUB12743,
   author = "Tom\'{a}\v{s} Pol\'{a}\v{s}ek and Martin \v{C}ad\'{i}k and Yosi Keller and Bed\v{r}ich Bene\v{s}",
   title = "Vision UFormer: Long-Range Monocular Absolute Depth Estimation",
   pages = "180--189",
   journal = "Computers and Graphics",
   volume = 111,
   number = 4,
   year = 2023,
   ISSN = "0097-8493",
   doi = "10.1016/j.cag.2023.02.003",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12743"
}