LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework
LibriSQA, built on LibriSpeech [1], is the first free-form and open-ended spoken question answering (SQA) dataset tailored for end-to-end SQA training of large language models (LLMs), featuring genuine speech and utterance lengths suitable for LLMs. It consists of two parts: Part I contains natural, free-form question-answer dialogues, while Part II follows a multiple-choice format and provides the correct answer together with an analysis. Building on LibriSQA, we introduce a speech-text multimodal training framework that handles tasks such as common-sense question answering, automatic speech recognition (ASR), free-form SQA, and multiple-choice SQA, showing that models trained with LibriSQA achieve strong speech-text alignment and leverage multimodal data efficiently.
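For a concrete picture of the two parts, the sketch below shows one plausible way to represent LibriSQA samples in Python. The field names (speech_path, question, answer, options, analysis) and the example values are illustrative assumptions, not the dataset's official schema.

```python
# Illustrative only: field names and values are assumptions, not the official LibriSQA schema.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LibriSQASample:
    speech_path: str                     # path to the LibriSpeech audio clip (e.g., a .flac file)
    question: str                        # question about the spoken utterance
    answer: str                          # reference answer (free-form for Part I, correct choice for Part II)
    options: Optional[List[str]] = None  # four candidate answers (Part II only)
    analysis: Optional[str] = None       # explanation of the correct choice (Part II only)

# Part I: free-form, open-ended SQA
part1 = LibriSQASample(
    speech_path="path/to/librispeech_utterance.flac",
    question="What is the speaker mainly talking about?",
    answer="The speaker describes ...",
)

# Part II: multiple-choice SQA with the correct answer and an analysis
part2 = LibriSQASample(
    speech_path="path/to/another_librispeech_utterance.flac",
    question="Which statement best matches the utterance?",
    options=["A. ...", "B. ...", "C. ...", "D. ..."],
    answer="B",
    analysis="Option B matches the content of the utterance because ...",
)
```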
We have released the dataset and will release the code soon. The LibriSQA dataset is available on Hugging Face, and the LibriSpeech audio it builds on can be downloaded from:
Training: https://www.openslr.org/resources/12/train-clean-360.tar.gz
Testing: https://www.openslr.org/resources/12/test-clean.tar.gz
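If you prefer to script the download, the following minimal Python sketch fetches and unpacks both archives listed above; the local directory name LibriSpeech_data is only an example.

```python
# Minimal sketch: download and extract the LibriSpeech subsets used by LibriSQA.
# The target directory name "LibriSpeech_data" is an arbitrary example.
import tarfile
import urllib.request
from pathlib import Path

ARCHIVES = [
    "https://www.openslr.org/resources/12/train-clean-360.tar.gz",  # training speech
    "https://www.openslr.org/resources/12/test-clean.tar.gz",       # testing speech
]

target = Path("LibriSpeech_data")
target.mkdir(exist_ok=True)

for url in ARCHIVES:
    archive_path = target / url.rsplit("/", 1)[-1]
    if not archive_path.exists():
        print(f"Downloading {url} ...")
        urllib.request.urlretrieve(url, archive_path)
    print(f"Extracting {archive_path} ...")
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=target)
```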
The framework has been trained under the following settings:
Trained with LibriSpeech [1] in SQA format.
Trained with our LibriSQA without any speech-text pairs.
Trained with LibriSQA Part I.
Trained with LibriSQA Part II.
[1] LibriSpeech: An ASR corpus based on public domain audio books -- https://ieeexplore.ieee.org/abstract/document/7178964
[2] LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971
[3] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention -- https://arxiv.org/abs/2303.16199
[4] PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering -- https://arxiv.org/abs/2305.10415
We thank the authors of these works for their ideas and open-source code, which helped us with this paper.
Please raise an issue if you need help; any contributions are welcome.
If you use LibriSQA in your research, please cite our paper:
@article{zhao2023librisqa,
  title={LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework},
  author={Zhao, Zihan and Jiang, Yiyang and Liu, Heyang and Wang, Yanfeng and Wang, Yu},
  journal={arXiv preprint arXiv:2308.10390},
  year={2023}
}