DEEP LEARNING APPROACHES TO PREDICT FUTURE FRAMES IN VIDEOS

Tariqul Islam(), Tariqul Islam(); Hafizul  Imran; Md. Ramim  Hossain; Tamjeed  Monshi; Himanish  Debnath Himu,; Md.  Ashikur Rahman; Gourob  Saha Surjo

Authors

Tariqul Islam () Daffodil International University, Dhaka, Bangladesh
Hafizul Imran Daffodil International University, Dhaka, Bangladesh
Md. Ramim Hossain Daffodil International University, Dhaka, Bangladesh
4Tamjeed Monshi Daffodil International University, Dhaka, Bangladesh7
Himanish Debnath Himu, Daffodil International University, Dhaka, Bangladesh
6Md. Ashikur Rahman Daffodil International University, Dhaka, Bangladesh
Gourob Saha Surjo Daffodil International University, Dhaka, Bangladesh

Abstract

The field of artificial intelligence (AI) has long pursued the goal of creating machines that can replicate human-like thinking and behavior. While early AI focused on solving complex problems that were difficult for humans, it became evident that tasks requiring intuition, such as image recognition or video understanding, posed significant challenges for traditional algorithmic approaches. This led to the emergence of machine learning (ML), where knowledge is acquired by extracting patterns from raw data. However, ML still requires manual feature selection and does not effectively handle multi-dimensional data like images and videos.

To address these limitations, representation learning and deep learning have been introduced. Representation learning aims to automatically construct meaningful representations of data, while deep learning leverages artificial neural networks (ANNs) inspired by the human brain to learn hierarchies of representations. The rise of computational power has enabled the development of deep neural networks that can handle increasingly complex tasks. Deep learning approaches have been applied to video analysis, including action recognition, video classification, and optical flow prediction, but they often require large amounts of labeled data, which is time-consuming and limits their applicability.

This paper makes several contributions to the field. Firstly, it provides an extensive overview of existing deep learning approaches for future frame prediction in videos. Secondly, it presents a novel neural network architecture that combines batch normalization, convolutional LSTM, and scheduled sampling to improve the training of recurrent models. This architecture achieves significantly reduced prediction errors compared to state-of-the-art models on the Moving MNIST dataset. Moreover, the paper makes all TensorFlow implementations, including the specialized components and evaluation metrics, freely available to the research community. Additionally, a lightweight, high-level, open-source framework for TensorFlow is introduced, simplifying the development of deep learning applications by providing abstractions for common tasks.

DEEP LEARNING APPROACHES TO PREDICT FUTURE FRAMES IN VIDEOS

Authors

Abstract

Keywords:

Downloads

Published

Issue

Section

How to Cite

Similar Articles

submission

SidebarMenu