An aligner based on Wav2Vec2 and CTC segmentation. Most of the code was created by following this tutorial, but it uses g2p to preserve the input text, and it is packaged with a command-line interface and a method for exporting alignments to a Praat TextGrid.
Create a conda environment, then install the package in editable mode:
pip install -e .
To align a text file with its audio recording, run:
ctc-segmenter align-single sample.txt sample.wav
This outputs a Praat TextGrid with word-level and sentence-level alignments.
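If you want to inspect the alignments programmatically before opening them in Praat, a TextGrid can be read with a few lines of Python. The sketch below is not part of ctc-segmenter. It assumes the file is in Praat's long text format with single-line labels, and the tier names in the sample ("words" and "sentences") are placeholders, not necessarily the names the tool writes. read_intervals is a hypothetical helper.

```python
import re

# Hand-written sample in Praat's long text format (assumed layout).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.0
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1.0
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "hello"
        intervals [2]:
            xmin = 0.5
            xmax = 1.0
            text = "world"
    item [2]:
        class = "IntervalTier"
        name = "sentences"
        xmin = 0
        xmax = 1.0
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1.0
            text = "hello world"
'''


def read_intervals(textgrid_text):
    """Return {tier_name: [(start, end, label), ...]} for interval tiers."""
    tiers = {}
    tier = None      # list of intervals for the tier being read
    current = None   # partially read interval, or None outside an interval
    for raw in textgrid_text.splitlines():
        line = raw.strip()
        m = re.match(r'name = "(.*)"$', line)
        if m:
            tier = tiers.setdefault(m.group(1), [])
            current = None
            continue
        if re.match(r'intervals \[\d+\]:?$', line):
            current = {}
            continue
        if current is None or tier is None:
            continue  # header lines and tier-level xmin/xmax
        m = re.match(r'(xmin|xmax) = (\S+)$', line)
        if m:
            current[m.group(1)] = float(m.group(2))
            continue
        m = re.match(r'text = "(.*)"$', line)
        if m:
            # Praat escapes a double quote inside a label by doubling it.
            label = m.group(1).replace('""', '"')
            tier.append((current["xmin"], current["xmax"], label))
            current = None
    return tiers


if __name__ == "__main__":
    for name, intervals in read_intervals(SAMPLE).items():
        print(name, intervals)
```

For a real file, pass the file contents instead of SAMPLE, for example Path("sample.TextGrid").read_text(encoding="utf-8") if the file is UTF-8 encoded.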
You can then adjust the Praat TextGrid as necessary and run:
ctc-segmenter extract-segments-from-textgrid sample.TextGrid sample.wav outdir
This extracts the segments and writes them to the outdir directory along with a metadata file.
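To make the extraction step concrete, here is a minimal sketch of cutting a WAV file into segments from a list of (start, end, text) intervals using only the Python standard library. It illustrates the general idea and is not the package's implementation: the function name extract_segments, the segment file naming, and the pipe-separated metadata.psv layout are all assumptions made for this example.

```python
import csv
import wave
from pathlib import Path


def extract_segments(wav_path, intervals, outdir):
    """Write one WAV file per non-empty interval plus a metadata file.

    intervals is a list of (start_seconds, end_seconds, text) tuples.
    Returns the (file_name, text) rows written to the metadata file.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    rows = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start, end, text) in enumerate(intervals):
            if not text.strip():
                continue  # skip silences and unlabeled intervals
            # Convert seconds to frame indices, clamped to the audio length.
            first = min(max(round(start * rate), 0), total)
            last = min(max(round(end * rate), first), total)
            src.setpos(first)
            frames = src.readframes(last - first)
            name = f"segment{i:04d}.wav"  # hypothetical naming scheme
            with wave.open(str(outdir / name), "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(frames)
            rows.append((name, text))
    # Hypothetical metadata layout: file_name|text, one segment per line.
    with open(outdir / "metadata.psv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="|").writerows(rows)
    return rows
```

The wave module only handles uncompressed PCM WAV files, so other audio formats would need converting first or a different audio library.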