TY - GEN
T1 - Multi-stream Convolutional Networks for Indoor Scene Recognition
AU - Anwer, Rao Muhammad
AU - Khan, Fahad Shahbaz
AU - Laaksonen, Jorma
AU - Zaki, Nazar
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Convolutional neural networks (CNNs) have recently achieved outstanding results for various vision tasks, including indoor scene understanding. The de facto practice employed by state-of-the-art indoor scene recognition approaches is to use RGB pixel values as input to CNN models that are trained on large amounts of labeled data (ImageNet or Places). Here, we investigate CNN architectures that augment RGB images with estimated depth and texture information, as multiple streams, for monocular indoor scene recognition. First, we exploit recent advances in depth estimation from monocular images and use the estimated depth information to train a CNN model for learning deep depth features. Second, we train a CNN model to exploit the successful Local Binary Patterns (LBP) by using mapped coded images with an explicit LBP encoding to capture texture information available in indoor scenes. We further investigate different fusion strategies to combine the learned deep depth and texture streams with the traditional RGB stream. Comprehensive experiments are performed on three indoor scene classification benchmarks: MIT-67, OCIS, and SUN-397. The proposed multi-stream network significantly outperforms the standard RGB network, achieving absolute gains of 9.3%, 4.7%, and 7.3% on the MIT-67, OCIS, and SUN-397 datasets, respectively.
AB - Convolutional neural networks (CNNs) have recently achieved outstanding results for various vision tasks, including indoor scene understanding. The de facto practice employed by state-of-the-art indoor scene recognition approaches is to use RGB pixel values as input to CNN models that are trained on large amounts of labeled data (ImageNet or Places). Here, we investigate CNN architectures that augment RGB images with estimated depth and texture information, as multiple streams, for monocular indoor scene recognition. First, we exploit recent advances in depth estimation from monocular images and use the estimated depth information to train a CNN model for learning deep depth features. Second, we train a CNN model to exploit the successful Local Binary Patterns (LBP) by using mapped coded images with an explicit LBP encoding to capture texture information available in indoor scenes. We further investigate different fusion strategies to combine the learned deep depth and texture streams with the traditional RGB stream. Comprehensive experiments are performed on three indoor scene classification benchmarks: MIT-67, OCIS, and SUN-397. The proposed multi-stream network significantly outperforms the standard RGB network, achieving absolute gains of 9.3%, 4.7%, and 7.3% on the MIT-67, OCIS, and SUN-397 datasets, respectively.
KW - Depth features
KW - Scene recognition
KW - Texture features
UR - http://www.scopus.com/inward/record.url?scp=85072854163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072854163&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-29888-3_16
DO - 10.1007/978-3-030-29888-3_16
M3 - Conference contribution
AN - SCOPUS:85072854163
SN - 9783030298876
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 196
EP - 208
BT - Computer Analysis of Images and Patterns - 18th International Conference, CAIP 2019, Proceedings
A2 - Vento, Mario
A2 - Percannella, Gennaro
PB - Springer Verlag
T2 - 18th International Conference on Computer Analysis of Images and Patterns, CAIP 2019
Y2 - 3 September 2019 through 5 September 2019
ER -