The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals may appear as "XNUMX".
We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual block followed by a pair of up-sampling layers. The simple decoder network, a stack of five convolution layers, takes the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning rate hyperparameter to improve performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting significantly reduces the number of network parameters while yielding a more accurate depth map. We evaluate our approach through quantitative and qualitative comparisons with several related prior methods on the public NYU and KITTI datasets.
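The abstract names three loss ingredients (a pixel-wise term, a gradient-direction term, and a structural-similarity term) but does not give their formulas or weights. The following is a minimal sketch of how such a composite loss could be assembled, not the authors' implementation. Everything specific here is an assumption made for illustration: the L1 pixel term, the gradient-direction term computed as 1 - cosine between the vectors (gx, gy, 1), the single-window (global) SSIM with constants chosen for depths normalized to [0, 1], and the hypothetical weights `w_pixel`, `w_grad`, and `w_ssim`.

```python
import math

def _flatten(depth):
    """Flatten a depth map given as a list of rows."""
    return [v for row in depth for v in row]

def pixel_loss(pred, target):
    """Assumed pixel-wise term: mean absolute difference."""
    p, t = _flatten(pred), _flatten(target)
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)

def _gradients(depth):
    """Forward-difference gradients (gx, gy); needs at least a 2x2 map."""
    h, w = len(depth), len(depth[0])
    return [(depth[y][x + 1] - depth[y][x], depth[y + 1][x] - depth[y][x])
            for y in range(h - 1) for x in range(w - 1)]

def gradient_direction_loss(pred, target):
    """Assumed gradient-direction term: mean of 1 - cos between (gx, gy, 1) vectors.

    The constant third component keeps the direction defined in flat regions.
    """
    pairs = list(zip(_gradients(pred), _gradients(target)))
    total = 0.0
    for (px, py), (tx, ty) in pairs:
        dot = px * tx + py * ty + 1.0
        norm = math.sqrt(px * px + py * py + 1.0) * math.sqrt(tx * tx + ty * ty + 1.0)
        total += 1.0 - dot / norm
    return total / len(pairs)

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """Assumed structural term: (1 - SSIM) / 2 over one global window.

    c1 and c2 assume depth values normalized to the range [0, 1].
    """
    p, t = _flatten(pred), _flatten(target)
    n = len(p)
    mp, mt = sum(p) / n, sum(t) / n
    vp = sum((v - mp) ** 2 for v in p) / n
    vt = sum((v - mt) ** 2 for v in t) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(p, t)) / n
    ssim = ((2 * mp * mt + c1) * (2 * cov + c2)) / ((mp * mp + mt * mt + c1) * (vp + vt + c2))
    return (1.0 - ssim) / 2.0

def combined_loss(pred, target, w_pixel=1.0, w_grad=1.0, w_ssim=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_pixel * pixel_loss(pred, target)
            + w_grad * gradient_direction_loss(pred, target)
            + w_ssim * ssim_loss(pred, target))
```

Under these assumptions, `combined_loss(d, d)` is zero (up to floating-point rounding) for any depth map `d`, and each term grows as the prediction departs from the target in value, local slope, or overall structure. A practical implementation would typically use a tensor library and a windowed SSIM rather than nested lists.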
Andi HENDRA
Toyohashi University of Technology
Yasushi KANAZAWA
Toyohashi University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Andi HENDRA, Yasushi KANAZAWA, "Smaller Residual Network for Single Image Depth Estimation" in IEICE TRANSACTIONS on Information and Systems,
vol. E104-D, no. 11, pp. 1992-2001, November 2021, doi: 10.1587/transinf.2021EDP7076.
Abstract: We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual block followed by a pair of up-sampling layers. The simple decoder network, a stack of five convolution layers, takes the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning rate hyperparameter to improve performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting significantly reduces the number of network parameters while yielding a more accurate depth map. We evaluate our approach through quantitative and qualitative comparisons with several related prior methods on the public NYU and KITTI datasets.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7076/_p
@ARTICLE{e104-d_11_1992,
author={Andi HENDRA and Yasushi KANAZAWA},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Smaller Residual Network for Single Image Depth Estimation},
year={2021},
volume={E104-D},
number={11},
pages={1992--2001},
abstract={We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual block followed by a pair of up-sampling layers. The simple decoder network, a stack of five convolution layers, takes the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning rate hyperparameter to improve performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting significantly reduces the number of network parameters while yielding a more accurate depth map. We evaluate our approach through quantitative and qualitative comparisons with several related prior methods on the public NYU and KITTI datasets.},
keywords={},
doi={10.1587/transinf.2021EDP7076},
ISSN={1745-1361},
month={November},}
TY - JOUR
TI - Smaller Residual Network for Single Image Depth Estimation
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1992
EP - 2001
AU - Andi HENDRA
AU - Yasushi KANAZAWA
PY - 2021
DO - 10.1587/transinf.2021EDP7076
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 11
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - November 2021
AB - We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual block followed by a pair of up-sampling layers. The simple decoder network, a stack of five convolution layers, takes the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning rate hyperparameter to improve performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting significantly reduces the number of network parameters while yielding a more accurate depth map. We evaluate our approach through quantitative and qualitative comparisons with several related prior methods on the public NYU and KITTI datasets.
ER -