Video Description and Collision Detection for Visually Impaired

Saini, Vinay Kumar; Kwatra, Hitesh; Narang, Himanshu

doi:https://dx.doi.org/10.21275/ART20197124

Vinay Kumar Saini, Hitesh Kwatra, Himanshu Narang

Abstract: Using deep learning techniques, we propose an approach that analyses a video and then describes it in understandable language using NLP techniques. For most people, watching a brief video and describing what happened is an easy task. For machines, extracting meaning from video pixels and generating natural-sounding language is a very complex problem. Solutions have been proposed for narrow domains with a small set of known actions and objects. We plan to extract features for each frame, mean-pool the features across the entire video, and input the pooled vector to the LSTM network at every time step. The LSTM outputs one word at each time step, based on the video features, until it emits the end-of-sentence tag, and in this way generates sentences describing events in the video. A sequence model, specifically a recurrent neural network (RNN), thus decodes the visual vector into a sentence. In this work, we plan to show that interpreting a visual vector as a sequence of English words works for videos as well as for static images. We did this in all the experiments, and it helped considerably with generalization. Another set of weights that could be sensibly initialized is W_e, the word embeddings. We tried initializing them from a large news corpus, but no significant gains were observed, so we left them uninitialized for simplicity. Lastly, we applied some model-level techniques to avoid overfitting: we tried dropout and model ensembling, and explored the size (i.e., capacity) of the model by trading off the number of hidden units against depth. We also propose a collision detection system so that, along with a description of what is happening around the person, the user receives a collision warning if the distance between an object and the person becomes smaller than a certain threshold.
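
The pipeline outlined in the abstract (mean-pool per-frame features, decode one word per time step until the end-of-sentence tag, and warn when an object is closer than a threshold) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `mean_pool`, `greedy_decode`, `toy_step`, and `collision_warning`, the `<eos>` tag string, the 1.5 m default threshold, and the 20-word cap are all assumptions made for this example, and `toy_step` is a scripted stand-in for a trained LSTM step.

```python
from typing import Callable, List, Optional, Sequence, Tuple

EOS = "<eos>"  # hypothetical end-of-sentence tag


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[i] for frame in frame_features) / n for i in range(dim)]


# A decoder step takes the pooled video vector and the previous state,
# and returns the next word together with the new state.
Step = Callable[[List[float], Optional[int]], Tuple[str, int]]


def greedy_decode(step: Step, video_vec: List[float], max_len: int = 20) -> List[str]:
    """Feed the same video vector at every time step until EOS or max_len."""
    words: List[str] = []
    state: Optional[int] = None
    for _ in range(max_len):
        word, state = step(video_vec, state)
        if word == EOS:
            break
        words.append(word)
    return words


def toy_step(video_vec: List[float], state: Optional[int]) -> Tuple[str, int]:
    """Scripted stand-in for a trained LSTM step (ignores video_vec)."""
    script = ["a", "person", "is", "walking", EOS]
    i = 0 if state is None else state
    return script[i], i + 1


def collision_warning(distance_m: float, threshold_m: float = 1.5) -> bool:
    """Warn when the object is closer than the (assumed) threshold."""
    return distance_m < threshold_m


if __name__ == "__main__":
    pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
    print(pooled)
    print(" ".join(greedy_decode(toy_step, pooled)))
    print(collision_warning(1.0))
```

In a real system the scripted step would be replaced by a trained LSTM cell and the distance would come from a depth or object-detection module; neither is specified in the abstract, so both are left abstract here.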

Keywords: NLP, Computer Vision, Deep Learning, RNN, LSTM, Collision Detection, Object Detection

How to Cite?: Vinay Kumar Saini, Hitesh Kwatra, Himanshu Narang, "Video Description and Collision Detection for Visually Impaired", Volume 8 Issue 4, April 2019, International Journal of Science and Research (IJSR), Pages: 1306-1308, https://www.ijsr.net/getabstract.php?paperid=ART20197124, DOI: https://dx.doi.org/10.21275/ART20197124