Speech emotion recognition using speech processing and machine learning

Afrah Sultana Sher

Thesis

Open access

California State University, Sacramento

Master of Science (MS), California State University, Sacramento

12/23/2025

Handle:

https://hdl.handle.net/20.500.12741/rep:13824

Keywords: Audio; Emotion; MFCC; RAVDESS; Recognition; Speech

Abstract

Speech recognition is one of the fastest-developing engineering technologies today, with applications across many areas. Speech Emotion Recognition (SER), which aims to predict human emotions from speech, is likewise one of the fastest-growing fields in technology. Predicting emotion from audio alone is difficult, and SER addresses exactly that task. Speech features such as tone, pitch, and volume help detect the emotion contained in speech. This project would contribute to an advanced Emotional Voice Conversion (EVC) system that builds on users' vocal emotional expression, serving as both an emotion recognition tool and a speech processing tool. The system provides emotion recognition by combining machine learning with speech processing. The machine learning technique used here is prediction: the system determines the emotion of a speech sample from features such as volume and pitch. Previous SER studies took more traditional approaches to detecting emotion in speech, which yield a high error rate. To address this inaccuracy, the project uses a modern CNN algorithm, which raises accuracy to about 93%, compared with about 70% in previous work. Machine learning is applied as prediction and elimination: based on the pitch and volume of the audio file, the system predicts the emotion of the file, eliminates the less likely matches, and outputs the emotion that best matches the input. The RAVDESS dataset is used as the input for this project.
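
The "prediction and elimination" step described in the abstract can be illustrated with a minimal sketch. This is not the project's CNN or its code: it assumes a classifier has already produced one raw score per emotion, and the function names (`softmax`, `predict_emotion`), the `keep` parameter, and the example scores are all hypothetical, chosen only for illustration. The label list follows the eight emotion categories of the RAVDESS speech recordings.

```python
import math

# The eight emotion categories of the RAVDESS speech recordings.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(scores, keep=3):
    """Rank emotions by probability, eliminate all but the `keep` most
    likely candidates, and return (best_label, shortlist)."""
    if len(scores) != len(EMOTIONS):
        raise ValueError("expected one score per emotion")
    probs = softmax(scores)
    ranked = sorted(zip(EMOTIONS, probs),
                    key=lambda pair: pair[1], reverse=True)
    shortlist = ranked[:keep]  # less likely matches are dropped here
    return shortlist[0][0], shortlist


# Hypothetical scores, made up for illustration (not real model output).
example_scores = [0.2, 0.1, 2.5, -0.3, 1.1, 0.0, -1.0, 0.7]
label, shortlist = predict_emotion(example_scores)
print(label)
for name, prob in shortlist:
    print(f"{name}: {prob:.3f}")
```

In a full system the scores would come from a model trained on features such as MFCCs, pitch, and volume; here they are fixed by hand so the ranking and elimination logic can be read in isolation.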

Files and links (1)

SherAfrahS_Spring2025 - Final Paper Document (PDF, 1.19 MB, Open Access)

Details

Title: Speech emotion recognition using speech processing and machine learning
Creators: Afrah Sultana Sher
Contributors: Preetham B Kumar (Committee Member); Neal Frederick Levine (Advisor) - California State University, Sacramento, Electrical Engineering
Academic Unit: Electrical Engineering
Theses and Dissertations: Master of Science (MS); Electrical and Electronic Engineering; California State University, Sacramento; 05/03/2025; 2025
Publisher: California State University, Sacramento
Publication Details: 12/23/2025
Identifiers: 99258271633201671; https://hdl.handle.net/20.500.12741/rep:13824
Resource Type: Masters Project
Language: English
Number of pages: 58
Comment: The accessibility of this document has been verified by Sacramento State University Library. For questions, please contact lib-508Accessibility@csus.edu.