Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Updated Oct 19, 2023
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Command line utility for forced alignment using Kaldi
My-Voice Analysis is a Python library for analyzing voice (simultaneous speech, high entropy) without the need for a transcription. It breaks utterances down and detects syllable boundaries, fundamental frequency contours, and formants.
A Python library for measuring the acoustic features of speech (simultaneous speech, high entropy) and comparing them with those of native speech.
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams (Interspeech'19)
PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi
Official code for "Learning Neural Acoustic Fields" (NeurIPS 2022)
Acoustic mosquito detection code with Bayesian Neural Networks
A simple automatic speech recognition (ASR) model in TensorFlow that lets you focus solely on the deep neural network. It makes it easy to test popular cells (mostly LSTM and its variants) and models (unidirectional RNN, bidirectional RNN, ResNet, and so on). You can also experiment with your own custom cells or models.
A sub-repository, still under construction, for building the acoustic model of a Mandarin speech recognition system.
[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS
A crash course for training speech recognition models using DeepSpeech.
SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems
Code for the paper: Audio to Score Matching by Combining Phonetic and Duration Information
PyTorch implementation of automatic speech recognition models.
Sequential adaptive elastic net (SAEN) approach, complex-valued LARS solver for weighted Lasso/elastic-net problems, and sparsity (or model) order detection with an application to single-snapshot source localization.
A voice user interface that uses the Sphinx library to recognize the user's voice and execute commands. The system responds with a computer-generated voice and sound clips. It also includes a server for storing and reacting to the data, and a client for connecting to the system.
Deep-learning approaches to building the acoustic model for an end-to-end automatic speech recognition (ASR) pipeline.
🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.