Jasdeep Garcha, Karan Sabharwal
Voice recognition has gained massive popularity in recent years. Computer interfaces that convert speech to text, voice dialing, learning tools, and related technologies have gained prominence because of their effectiveness. Implementation, however, is not simple, and the challenges are numerous; among them is building a robust system that identifies common sounds across populations.
This project set out to explore the difference between normal and modified speech and to determine whether feature selection is consistent across the two sets of data. Two specific problem areas were addressed:
- Variable compression and expansion of data required for transmission over certain mediums, and
- Making a robust system to identify common sounds (such as consonants and vowels) across populations.
The compression/expansion problem was addressed by applying a time-scale modification algorithm known as Waveform Similarity Overlap and Add (WSOLA). This algorithm was chosen for its capability of, in theory, preserving the pitch and frequency content of the original signal. Unfortunately, this experiment confirmed that although WSOLA is an improvement over simply up/down-sampling a signal, it still leaves much room for improvement. The spectrograms of the modified signals show that WSOLA has two effects on signals:
- It actually distorts the pitch of the original signal, and
- It augments the signal with higher frequencies.
Given the complexity demanded of algorithms deployed in large-scale applications such as VoIP, the WSOLA algorithm is insufficient.
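To make the time-scale modification concrete, the following is a minimal sketch of a WSOLA-style stretcher, not the project's actual implementation. It is written in Python/NumPy for illustration (the project itself used Matlab), and the frame, hop, and search-radius parameters are illustrative choices: each output frame is taken from the analysis position whose waveform best matches the natural continuation of the previous frame, then overlap-added with a Hann window.

```python
import numpy as np

def wsola(x, alpha, frame=512, hop=None, search=128):
    """Stretch signal x in time by factor alpha (alpha > 1 lengthens it)
    using a simple WSOLA scheme: similarity search + windowed overlap-add."""
    if hop is None:
        hop = frame // 2
    win = np.hanning(frame)
    n_out = int(len(x) * alpha)
    y = np.zeros(n_out + frame)      # output accumulator
    norm = np.zeros(n_out + frame)   # window-sum, for amplitude normalization
    pos = 0.0                        # analysis position in x for each synthesis hop
    prev_tail = None                 # "natural continuation" of the previous frame
    out = 0
    while out + frame <= n_out and int(pos) + frame + search + hop <= len(x):
        center = int(pos)
        if prev_tail is None:
            best = center
        else:
            # pick the offset near `center` whose frame best matches the
            # expected continuation (maximum cross-correlation)
            lo = max(center - search, 0)
            best, best_score = lo, -np.inf
            for k in range(lo, center + search):
                score = np.dot(x[k:k + frame], prev_tail)
                if score > best_score:
                    best, best_score = k, score
        y[out:out + frame] += x[best:best + frame] * win
        norm[out:out + frame] += win
        prev_tail = x[best + hop:best + hop + frame]
        out += hop
        pos += hop / alpha           # analysis advances slower/faster than synthesis
    norm[norm == 0] = 1.0
    return y[:n_out] / norm[:n_out]
```

Applied to a pure tone, a stretcher of this kind changes the duration while leaving the dominant frequency near its original value; the pitch distortion and high-frequency artifacts noted above show up as spectral leakage around that peak in real speech.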
For identifying common sounds, we developed a classification system based on Linear Discriminant Analysis (LDA). By selecting the most important features for the different phoneme classes, we achieved a 90% success rate in identifying the correct phoneme on a set of test sample data.
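As an illustration of the classification step, and not the project's actual implementation, here is a minimal two-class Fisher LDA in NumPy: the discriminant direction is w = Sw⁻¹(μ₁ − μ₀), where Sw is the pooled within-class scatter, and samples are classified by thresholding their projection at the midpoint of the projected class means. The two-dimensional synthetic "feature" data in the usage note stands in for real phoneme features.

```python
import numpy as np

def lda_fit(X, y):
    """Fit a two-class Fisher LDA: w = Sw^-1 (mu1 - mu0), midpoint threshold b."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class scatter matrix
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = -0.5 * (w @ mu0 + w @ mu1)   # decision boundary at projected midpoint
    return w, b

def lda_predict(X, w, b):
    """Label 1 if the projection falls on the mu1 side of the midpoint."""
    return (X @ w + b > 0).astype(int)
```

With well-separated feature clusters (e.g. two Gaussians), this classifier easily exceeds 90% accuracy, which is the regime the phoneme experiment operated in after feature selection; a multi-class phoneme system would extend this with one discriminant per class pair or a shared projection.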
Keywords: Digital Signal Processing, Matlab, Modeling and Simulation