Recent technological advances have enabled volumetric media techniques to capture, encode, decode, and render video with six degrees of freedom (6DoF), making objects within the scene highly immersive, interactive, and expressive. This is achieved by placing multiple cameras around the object(s). However, streaming 6DoF video requires enormous bandwidth and computational processing. Because the end user sees only the current viewport, a large portion of the consumed bandwidth is wasted on scene content that is never viewed. To close this gap, it is essential to predict the end user's future head movement (future viewport), thereby avoiding wasted network bandwidth and reducing computational processing load. In this paper, we propose a holistic architecture for future viewport prediction using a deep neural network (DNN)-based model. Specifically, our solution uses a residual long short-term memory (RLSTM) architecture for accurate future viewport prediction. We confirm the effectiveness of our solution through trace-driven streaming experiments on a popular public dataset against four categories of DNN models: linear, dense, convolutional, and long short-term memory (LSTM). Experimental results show that our solution achieves the lowest mean absolute error, approximately 0.01, among the compared models.
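The abstract names a residual LSTM as the core predictor. The paper's exact architecture is not given here, so the following is a minimal, hypothetical numpy sketch of the general idea: stacked LSTM layers where each layer adds a skip (residual) connection to its input, fed with a short history of viewport samples (a toy 2-D yaw/pitch stand-in for real head-orientation traces). All dimensions, layer counts, and the readout are illustrative assumptions, not the authors' model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    """Minimal LSTM cell (forward pass only, random untrained weights)."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One stacked weight matrix for the 4 gates: input, forget, cell, output.
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell state
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def residual_lstm_forward(cells, seq):
    """Run a stack of LSTM cells; each layer adds a residual skip connection."""
    out = seq
    for cell in cells:
        h = np.zeros(cell.hidden_dim)
        c = np.zeros(cell.hidden_dim)
        layer_out = []
        for x in out:
            h, c = cell.step(x, h, c)
            layer_out.append(h)
        layer_out = np.stack(layer_out)
        # Residual connection: add the layer input back to its output
        # (only possible when input and output dimensions match).
        out = layer_out + out if layer_out.shape == out.shape else layer_out
    return out  # per-timestep hidden states of the top layer

# Usage: predict the next viewport angles (toy yaw/pitch) from a 10-step history.
rng = np.random.default_rng(0)
hidden = 8
cells = [LSTMCell(2, hidden, rng), LSTMCell(hidden, hidden, rng)]
history = rng.standard_normal((10, 2))          # 10 past viewport samples
states = residual_lstm_forward(cells, history)
W_out = rng.standard_normal((2, hidden)) * 0.1  # hypothetical linear readout
prediction = W_out @ states[-1]                  # predicted (yaw, pitch)
```

In practice such a model would be trained on head-movement traces (e.g. minimizing mean absolute error between predicted and actual viewports); the residual connections mainly help gradients flow through deeper LSTM stacks.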