International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



India | Computer Science Engineering | Volume 13 Issue 8, August 2024 | Pages: 1118 - 1122


Enhancing Speech-to-Text Conversion with Convolutional Reinforcement Learning Algorithms

Pichika Ravikiran, Midhun Chakkaravarthy

Abstract: Speech-to-Text (STT) conversion has become a critical component in various applications, ranging from virtual assistants to real-time transcription services. Traditional models, while effective, often struggle with accuracy and robustness in diverse acoustic environments. This paper introduces a novel approach to STT conversion by leveraging Convolutional Neural Networks (CNNs) for feature extraction and Reinforcement Learning (RL) for optimizing transcription accuracy. Our proposed method employs CNNs to capture local temporal and spectral features from raw audio signals, transforming them into high-dimensional representations suitable for sequential processing. These features are then fed into a Sequence-to-Sequence (Seq2Seq) model, which translates the audio features into textual output. To enhance the performance of the Seq2Seq model, we integrate a reinforcement learning agent that dynamically adjusts model parameters based on a reward function that incentivizes correct transcriptions. We evaluate our model on a benchmark speech recognition dataset, demonstrating significant improvements in accuracy and robustness compared to traditional STT systems. Our results indicate that the convolutional reinforcement learning approach not only enhances the model's ability to generalize across different speakers and acoustic conditions but also reduces the error rate in noisy environments. This study underscores the potential of combining CNNs and RL to create more efficient and accurate speech recognition systems, paving the way for future advancements in voice-activated technologies and applications.
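The abstract mentions a reward function that incentivizes correct transcriptions but does not define it. As a purely illustrative sketch, one common choice in speech recognition is to base the reward on word error rate (WER). The function names below (`edit_distance`, `word_error_rate`, `transcription_reward`) and the `1 - WER` formulation are assumptions made for illustration, not the authors' stated method.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def transcription_reward(reference, hypothesis):
    """Hypothetical reward (an assumption): 1 - WER, clipped to [0, 1]."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

Under this assumed formulation, a perfect transcription earns a reward of 1, and the reward falls toward 0 as word-level errors accumulate. An RL agent of the kind described could use such a scalar signal to score sampled transcriptions.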

Keywords: Speech-to-Text (STT), Convolutional Neural Networks (CNNs), Reinforcement Learning (RL), Sequence-to-Sequence (Seq2Seq) model

How to Cite?: Pichika Ravikiran, Midhun Chakkaravarthy, "Enhancing Speech-to-Text Conversion with Convolutional Reinforcement Learning Algorithms", Volume 13 Issue 8, August 2024, International Journal of Science and Research (IJSR), Pages: 1118-1122, https://www.ijsr.net/getabstract.php?paperid=SR24515225027, DOI: https://dx.doi.org/10.21275/SR24515225027

