Course description

Course website: www.cs136a.mmeteer.com

Speech recognition has had a huge resurgence in the past few years, both commercially and in the underlying technology. We can talk to Google, Siri, and Alexa to get information and carry out routine tasks. However, the technology still falls well short of human performance, and although it surrounds us, most people don’t use it.

This course covers speech recognition from both applied and theoretical perspectives. We will cover the core algorithms used in speech recognition and use the open-source Kaldi speech recognizer to explore how the algorithms perform and how changes in the parameters and training data affect performance.

There will be a large “experiential learning” focus in the course. In the first project, students will work in groups to train and test the Kaldi recognizer on a joint task, competing to get the best performance in the style of a “challenge” typical of research in the field. In the second project, students will work in groups on a dataset of their choice to improve some specific aspect of recognizer performance, for example for a specific task, language, or speaker population (e.g., non-native English).

There will also be a combination of lectures and student-led presentations and discussions on topics such as phonetics, weighted finite state transducers, Hidden Markov Models, statistical language models, neural nets, and conversational systems.


Topics and assignments for each class are posted on the schedule page. Please check this regularly, as it may change throughout the year.

Details on the assignments are posted on the assignments page. Again, please check this regularly; I’ll update it as the assignments get closer.


The class has the following types of graded elements. Due dates will be posted on the schedule page and announced in class. Extensions will be considered only for well-articulated reasons, and never after the assignment’s due date. If the reason is that you didn’t understand the problem or weren’t able to access data, etc., then you need to raise it well in advance of the actual due date. Bottom line: Start early and communicate.

Policy on working together: Unless the assignment specifically states otherwise, all assignments must be done independently. However, when working with third-party toolsets, you may collaborate on getting the tools installed and running. To make this collaboration fair for everyone, you must post questions and answers on the class Latte blog, even if it’s just a summary of a hallway conversation. If it was helpful, share it.

Programming Assignments (60%): There will be two group projects using the Kaldi speech recognizer. In the first, you will work with an assigned test and training set within the Kaldi toolkit to improve performance. In the second, you will use Kaldi on a dataset of your choice to address some specific problem in speech recognition. Look at the Prog Assignments page for details.
Reading, Quizzes, Presentations (30%): Readings may involve some submissions before class. Quizzes are in class or take home, with questions on the material covered in class. If you miss a quiz, you need to make it up. In addition, there will be some reading with presentations, usually in groups. Look at the RQP Assignments page for details.
Class Participation (10%): Attendance, paying attention, answering questions, and participation in class Latte discussions. Throughout the semester, I will post questions about the reading or class material. You should make at least one substantive comment per post.

If you are a student with a documented disability on record at Brandeis University and wish to have reasonable accommodations made for you in this class, please see me immediately.

Success in this 4 credit hour course is based on the expectation that students will spend a minimum of 9 hours of study time per week in preparation for class (readings, papers, discussion sections, preparation for exams, etc.).