
Conformer: Convolution-augmented Transformer for Speech Recognition

Reference: https://arxiv.org/abs/2005.08100

Figure: Conformer architecture
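To make the block structure concrete, here is a minimal NumPy sketch of one Conformer block as described in the paper. Two half-step feed-forward modules sandwich multi-head self-attention and a convolution module, and a final LayerNorm follows. The half-step weight corresponds to the config's fc_factor: 0.5. This is an illustration with random weights, not the TensorFlowASR implementation. The following simplifications are assumptions:

- The relative-position attention (mha_type: relmha) is replaced by plain scaled dot-product attention.
- BatchNorm and dropout are omitted.
- Every helper name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis; learnable scale/shift omitted in this sketch.
    mean = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def swish(x):
    return x / (1.0 + np.exp(-x))

def feed_forward(x, w1, w2):
    # LayerNorm -> Linear (4x expansion, as in the paper) -> Swish -> Linear back to dmodel.
    return swish(layer_norm(x) @ w1) @ w2

def self_attention(x, wq, wk, wv, wo, num_heads):
    # Plain multi-head attention; relmha (relative positions) is NOT modeled here.
    t, _ = x.shape
    h = layer_norm(x)

    def split(m):
        # (T, dmodel) -> (heads, T, head_size)
        return (h @ m).reshape(t, num_heads, -1).transpose(1, 0, 2)

    q, k, v = split(wq), split(wk), split(wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(-1, keepdims=True)
    out = (attn @ v).transpose(1, 0, 2).reshape(t, -1)  # merge heads
    return out @ wo

def conv_module(x, w_pw1, w_dw, w_pw2):
    # LayerNorm -> pointwise (2x channels) -> GLU -> depthwise conv -> Swish -> pointwise.
    # BatchNorm after the depthwise conv is omitted in this sketch.
    h = layer_norm(x) @ w_pw1
    a, b = np.split(h, 2, axis=-1)
    h = a * (1.0 / (1.0 + np.exp(-b)))  # GLU
    k = w_dw.shape[0]
    pad_l = (k - 1) // 2
    hp = np.pad(h, ((pad_l, k - 1 - pad_l), (0, 0)))  # 'same' padding in time
    dw = np.stack([(hp[i:i + k] * w_dw).sum(0) for i in range(h.shape[0])])
    return swish(dw) @ w_pw2

def conformer_block(x, p, fc_factor=0.5, num_heads=4):
    x = x + fc_factor * feed_forward(x, *p["ffn1"])  # half-step FFN
    x = x + self_attention(x, *p["mhsa"], num_heads=num_heads)
    x = x + conv_module(x, *p["conv"])
    x = x + fc_factor * feed_forward(x, *p["ffn2"])  # half-step FFN
    return layer_norm(x)

def init_params(d=144, kernel_size=32, ffn_mult=4):
    def w(*shape):
        return rng.normal(0.0, 0.02, shape)
    return {
        "ffn1": (w(d, ffn_mult * d), w(ffn_mult * d, d)),
        "mhsa": (w(d, d), w(d, d), w(d, d), w(d, d)),
        "conv": (w(d, 2 * d), w(kernel_size, d), w(d, d)),
        "ffn2": (w(d, ffn_mult * d), w(ffn_mult * d, d)),
    }

x = rng.normal(size=(50, 144))  # 50 encoder frames, dmodel = 144
y = conformer_block(x, init_params())
print(y.shape)  # (50, 144)
```

The sizes mirror the example config (dmodel: 144, num_heads: 4, kernel_size: 32). The real encoder stacks num_blocks: 16 of these blocks after the subsampling layer.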

Example Model YAML Config

speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  feature_type: log_mel_spectrogram
  num_feature_bins: 80
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_feature: False

decoder_config:
  vocabulary: null
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: conformer
  subsampling:
    type: conv2
    kernel_size: 3
    strides: 2
    filters: 144
  positional_encoding: sinusoid_concat
  dmodel: 144
  num_blocks: 16
  head_size: 36
  num_heads: 4
  mha_type: relmha
  kernel_size: 32
  fc_factor: 0.5
  dropout: 0.1
  embed_dim: 320
  embed_dropout: 0.0
  num_rnns: 1
  rnn_units: 320
  rnn_type: lstm
  layer_norm: True
  joint_dim: 320

learning_config:
  augmentations:
    after:
      time_masking:
        num_masks: 10
        mask_factor: 100
        p_upperbound: 0.2
      freq_masking:
        num_masks: 1
        mask_factor: 27

  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...

  optimizer_config:
    warmup_steps: 10000
    beta1: 0.9
    beta2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 4
    num_epochs: 22
    outdir: ...
    log_interval_steps: 400
    save_interval_steps: 400
    eval_interval_steps: 1000
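Several quantities follow from the config above: window and hop length in samples, how many encoder frames a clip produces, and the learning rate during warmup. The sketch below computes them under stated assumptions:

- Framing does not pad the end of the signal.
- conv2 subsampling is assumed to be two stride-2 convolutions with 'same' padding, i.e. 4x time reduction, so one encoder frame spans 40 ms.
- The learning-rate schedule is assumed to be the standard Transformer warmup schedule. The Adam settings (beta1 0.9, beta2 0.98, epsilon 1e-9) match that setup, but the config does not name a schedule.

All function names are hypothetical.

```python
import math

# Values copied from the example config above.
SAMPLE_RATE = 16000
FRAME_MS = 25
STRIDE_MS = 10
DMODEL = 144
NUM_HEADS = 4
HEAD_SIZE = 36
WARMUP_STEPS = 10000

frame_length = SAMPLE_RATE * FRAME_MS // 1000  # samples per analysis window
frame_step = SAMPLE_RATE * STRIDE_MS // 1000   # hop between windows

def num_feature_frames(num_samples):
    # Whole windows that fit inside the signal (no end padding assumed).
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_step

def num_encoder_frames(num_frames, num_conv_layers=2, stride=2):
    # Assumed conv2 subsampling: each stride-2 'same'-padded conv halves time, rounding up.
    t = num_frames
    for _ in range(num_conv_layers):
        t = math.ceil(t / stride)
    return t

def transformer_lr(step, d_model=DMODEL, warmup=WARMUP_STEPS):
    # Assumed Transformer schedule: linear warmup, then decay proportional to 1/sqrt(step).
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

assert NUM_HEADS * HEAD_SIZE == DMODEL  # 4 heads x 36 = 144

feat = num_feature_frames(SAMPLE_RATE)  # 1 second of audio
enc = num_encoder_frames(feat)
print(frame_length, frame_step)  # 400 160
print(feat, enc)                 # 98 25
print(f"{transformer_lr(WARMUP_STEPS):.3e}")  # 8.333e-04 (peak, at the end of warmup)
```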

Usage

Training: see python examples/conformer/train_conformer.py --help

Testing: see python examples/conformer/train_conformer.py --help

TFLite conversion: see python examples/conformer/tflite_conformer.py --help

Conformer Subwords - Results on LibriSpeech

Summary

  • Number of subwords: 1031
  • Maximum length of a subword: 4
  • Subwords corpus: all training sets, dev sets and test-clean
  • Number of parameters: 10,341,639
  • Positional Encoding Type: sinusoid concatenation
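The summary lists sinusoid concatenation as the positional encoding. The sketch below shows the usual reading of that name:

- The Vaswani-style sinusoidal encoding is appended to the features as extra channels rather than added element-wise.
- How TensorFlowASR consumes the concatenated tensor afterwards (for example, a projection back to dmodel) is not shown and is not assumed here.
- The function names are hypothetical.

```python
import numpy as np

def sinusoid_encoding(length, dim):
    # Sinusoidal encoding: even channels use sin, odd channels use cos.
    assert dim % 2 == 0, "this sketch assumes an even feature dimension"
    pos = np.arange(length)[:, None]
    i = np.arange(0, dim, 2)[None, :]
    angles = pos / np.power(10000.0, i / dim)
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def concat_positional_encoding(x):
    # "Concat" variant: append the encoding as extra channels instead of adding it to x.
    t, d = x.shape
    return np.concatenate([x, sinusoid_encoding(t, d)], axis=-1)

x = np.random.default_rng(0).normal(size=(25, 144))  # 25 encoder frames, dmodel = 144
y = concat_positional_encoding(x)
print(y.shape)  # (25, 288)
```

Concatenation keeps the content features untouched and lets later layers learn how to mix in position. The cost is that the feature width doubles.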

Pretrained model and config: go to drive

Transducer Loss

Figure: conformer_subword transducer loss
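For one utterance, the transducer (RNN-T) loss is the negative log of the target's total probability, summed over all monotonic alignments of output tokens to encoder frames. Below is a minimal single-utterance NumPy sketch of the standard forward (alpha) recursion. It assumes the joint network's log-softmax output is already available as a (T, U+1, V) array, with blank at index 0 to match blank_at_zero: True. `rnnt_loss` is a hypothetical helper, not TensorFlowASR's batched TensorFlow implementation.

```python
import numpy as np
from math import comb

def rnnt_loss(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under an RNN-T output lattice.

    log_probs: (T, U + 1, V) log-softmax outputs of the joint network,
               T = encoder frames, U = len(labels), V = vocabulary size.
    blank:     index of the blank symbol (0 here, matching blank_at_zero).
    """
    T, U1, _ = log_probs.shape
    U = len(labels)
    assert U1 == U + 1
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Reach (t, u) by emitting blank at (t-1, u): advance one frame.
            from_blank = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else -np.inf
            # Reach (t, u) by emitting label u at (t, u-1): advance one token.
            from_label = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(from_blank, from_label)
    # Every alignment ends with a final blank emitted from the last lattice node.
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])

# Sanity check with a uniform joint output: every alignment emits T blanks + U labels,
# and there are C(T + U - 1, U) alignments.
T, labels, V = 3, [2, 3], 4
uniform = np.full((T, len(labels) + 1, V), -np.log(V))
loss = rnnt_loss(uniform, labels)
U = len(labels)
expected = -(np.log(comb(T + U - 1, U)) - (T + U) * np.log(V))
print(np.isclose(loss, expected))  # True
```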

Error Rates

Test-clean    WER (%)      CER (%)
Greedy        6.4476862    2.51828337
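WER and CER are edit-distance rates: substitutions, insertions, and deletions, divided by reference length, at the word or character level. The sketch below computes corpus-level rates as total edits over total reference length. The assumption here is that spaces count as characters for CER; implementations differ on that and on per-utterance averaging, so this may not match the exact scoring behind the numbers above. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    # Levenshtein distance with unit costs, computed row by row (O(len(hyp)) memory).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def error_rates(refs, hyps):
    # Corpus-level WER and CER in percent: total edits / total reference length.
    word_err = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    word_tot = sum(len(r.split()) for r in refs)
    char_err = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    char_tot = sum(len(r) for r in refs)
    return 100.0 * word_err / word_tot, 100.0 * char_err / char_tot

wer, cer = error_rates(["the cat sat"], ["the cat sit"])
print(f"{wer:.2f} {cer:.2f}")  # 33.33 9.09
```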