M.Tech / M.E / PhD Thesis | Electronics & Communication Engineering | India | Volume 5 Issue 6, June 2016
Discriminating Speech and Nonspeech from Video Signals using SFF VAD
Avani S Babu | Amrutha V Nair 
Abstract: An image processing approach is used for speech/nonspeech discrimination. The approach combines single frequency filtering (SFF) with visual voice activity detection (VAD). In SFF, the amplitude envelope of the signal is obtained at each frequency with high temporal and spectral resolution, whereas visual VAD is a classifier that determines from the associated video signal whether a speaker is silent or speaking in a given frame. The high resolution of SFF makes it possible to exploit the resulting high signal-to-noise ratio (SNR) regions in time and frequency. In some situations, however, the SFF method also labels nonspeech in the audio signal as speech. To avoid this issue, a technique combining SFF and visual VAD is proposed, in which speech is identified from the video signal through lip movement. The method uses lip shape and degree of lip opening as visual features representing a subject's lip motion. After the lip movement analysis, the outputs of the audio and video analyses are combined to distinguish voiced from unvoiced regions using an SVM classifier.
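The abstract does not give the SFF equations, so the following is only a minimal sketch of the per-frequency amplitude envelope, based on the commonly described formulation of single frequency filtering: the differenced signal is frequency-shifted so that the frequency of interest lands at half the sampling rate, then passed through a single-pole filter with its pole near the unit circle, and the magnitude of the output is taken as the envelope. The function name `sff_envelopes`, the pole radius `r = 0.99`, and the choice of frequencies are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def sff_envelopes(signal, fs, freqs_hz, r=0.99):
    """Sketch of SFF amplitude envelopes (assumed formulation, not the paper's code).

    Returns an array of shape (len(freqs_hz), len(signal)) holding the
    envelope of the signal at each requested frequency, sample by sample.
    """
    s = np.asarray(signal, dtype=float)
    # Difference the signal to reduce low-frequency trends (first sample becomes 0).
    x = np.diff(s, prepend=s[0])
    n = np.arange(len(x))
    env = np.empty((len(freqs_hz), len(x)))
    for k, f in enumerate(freqs_hz):
        # Shift the frequency of interest f to fs/2 (normalized frequency pi).
        w_shift = np.pi - 2.0 * np.pi * f / fs
        xk = x * np.exp(1j * w_shift * n)
        # Single-pole filter H(z) = 1 / (1 + r z^-1), pole at z = -r (assumed r).
        y = np.zeros(len(x), dtype=complex)
        prev = 0j
        for i in range(len(x)):
            prev = xk[i] - r * prev
            y[i] = prev
        env[k] = np.abs(y)
    return env

# Usage: envelopes of a 1 kHz test tone at three analysis frequencies.
fs = 8000
t = np.arange(800) / fs
tone = np.sin(2 * np.pi * 1000 * t)
env = sff_envelopes(tone, fs, [500, 1000, 2000])
```

In a sketch like this, the envelope at the analysis frequency matching the tone dominates the others, which illustrates the frequency selectivity that the abstract refers to. How the envelopes are turned into a speech/nonspeech decision, and how they are fused with the lip features, is not specified in the abstract and is not modeled here.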
Keywords: Single Frequency Filtering (SFF), Voice Activity Detection (VAD), spectral resolution, lip motion, Support Vector Machine (SVM)
Edition: Volume 5 Issue 6, June 2016
Pages: 1669 - 1672
How to Cite this Article?
Avani S Babu, Amrutha V Nair, "Discriminating Speech and Nonspeech from Video Signals using SFF VAD", International Journal of Science and Research (IJSR), https://www.ijsr.net/get_abstract.php?paper_id=NOV164643, Volume 5 Issue 6, June 2016, 1669 - 1672