Cross-situational learning in computational models of visually grounded speech (VGS_XSL)

Python and MATLAB scripts for the experiments reported in the manuscript "Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? - A computational investigation" by Khazar Khorrami and Okko Räsänen.

Feature extraction, model training, and semantic retrieval evaluation scripts are written in Python and are available in their respective folders (feature_extraction/, model_training/, and sr_analyses/). Scripts for analyzing hidden-layer activations are written mostly in MATLAB and can be found under selectivity_analyses/.

Models and model activation data are available for download at Zenodo: https://doi.org/10.5281/zenodo.4564283

The manuscript is available at: https://ldr.lps.library.cmu.edu/article/id/434/

Model

Data used in the experiments

The Brent-Siskind corpus is available at
https://childes.talkbank.org/access/Eng-NA/Brent.html

Places audio captions (English) are available at
https://groups.csail.mit.edu/sls/downloads/placesaudio/downloads.cgi

Places205 images are available at
http://places.csail.mit.edu/downloadData.html

SPEECH-COCO audio captions are available at
https://zenodo.org/record/4282267

MSCOCO images are available at
https://cocodataset.org/#download

The derived version of "Large-Brent", which contains utterance-level waveforms
together with their phone-, syllable-, and word-level transcripts (based on
Rytting et al., 2010, and Räsänen et al., 2018), is available from the second
author upon request ([email protected]).
The data cannot be shared publicly as it would require redistribution of modified
Brent-Siskind audio files. Annotations corresponding to the derived model activations
are included in the model activation package shared through Zenodo (link above).
