Porting VAD to an Android application

Setting up

1. Setting up the PyTorch Android dependency

Add the following line to the dependencies block of the app-level build.gradle file:

 implementation("org.pytorch:pytorch_android:1.9.0")
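
For context, here is a minimal sketch of where that line sits. The surrounding block is an assumption about a typical app-level build file, and your file will contain other entries as well. The syntax shown is accepted by both the Groovy and the Kotlin Gradle DSL.

```kotlin
dependencies {
    // ... existing dependencies stay as they are ...

    // PyTorch Android runtime used to load and run the VAD model
    implementation("org.pytorch:pytorch_android:1.9.0")
}
```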

2. Downloading the model

Download the model file (vad.jit in this example) and place it in the src/main/assets directory.

Usage

Initializing model

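To use the model, we must first create a Module object for it.

Module.load() expects a path on the file system, but assets are packed inside the APK, so the model file has to be copied to the app's internal storage first. Two helper functions handle this: loadModule() and assetFilePath().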

import android.content.Context
import java.io.File
import java.io.FileOutputStream
import org.pytorch.Module

// Loads a TorchScript module stored in the app's assets.
// `context` is assumed to be available in the enclosing class (for example, an Activity).
private fun loadModule(path: String): Module {
    // Copy the asset to internal storage and load it from there.
    val modulePath = assetFilePath(context, path)
    return Module.load(modulePath)
}

// Copies the asset to the app's files directory (unless a copy already exists)
// and returns the absolute path of that copy.
private fun assetFilePath(context: Context, assetName: String): String {
    val file = File(context.filesDir, assetName)
    // Reuse the copy made on a previous launch.
    if (file.exists() && file.length() > 0) {
        return file.absolutePath
    }
    context.assets.open(assetName).use { inputStream ->
        FileOutputStream(file).use { outputStream ->
            inputStream.copyTo(outputStream)
        }
    }
    return file.absolutePath
}

Usage:

private val vadModule: Module by lazy {
    loadModule("vad.jit").also {
        Log.d("PyTorch", "Vad module has been initialized")
    }
}

Getting result from model

To get a result from the initialized model, we can use the following getResult() extension function:

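import org.pytorch.IValue
import org.pytorch.Tensor

// Runs the model on one chunk of audio samples and returns the raw output.
private fun Module.getResult(floatInputBuffer: FloatArray): IValue {
    // Copy the samples into a buffer that PyTorch can read.
    val inTensorBuffer = Tensor.allocateFloatBuffer(floatInputBuffer.size)
    inTensorBuffer.put(floatInputBuffer)
    // Shape [1, N]: a batch of one chunk with N samples.
    val inTensor =
        Tensor.fromBlob(inTensorBuffer, longArrayOf(1, floatInputBuffer.size.toLong()))
    return forward(IValue.from(inTensor))
}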

Usage:

vadModule.getResult(audioInFloatArray)

where audioInFloatArray is a FloatArray holding one chunk of audio samples.
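
This README does not show how audioInFloatArray is produced. As a minimal sketch, assuming the audio was recorded as 16-bit PCM and that the model expects samples scaled to the range [-1, 1), a chunk could be converted as shown here. The name pcm16ToFloat is hypothetical, and the expected scaling should be checked against the documentation of the model you use.

```kotlin
// Hypothetical helper: converts 16-bit PCM samples to floats in [-1.0, 1.0).
// Assumes the model expects normalized float samples; verify this for your model.
fun pcm16ToFloat(samples: ShortArray): FloatArray =
    FloatArray(samples.size) { i -> samples[i] / 32768.0f }
```

With this helper, a recorded chunk pcmChunk (a ShortArray) could be passed on as vadModule.getResult(pcm16ToFloat(pcmChunk)).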

Getting probability of speech

In our case, the model returns an array [a, b], where a is the probability that the chunk contains no speech and b is the probability that it contains speech. We therefore need the second element of this array:

val result = vadModule.getResult(audioInFloatArray)
val probabilityOfSpeech = result.toTensor().dataAsFloatArray[1]
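
To turn the probability into a yes/no decision, it can be compared with a threshold. The following is a minimal sketch: the helper isSpeech and its default threshold of 0.5 are assumptions made for illustration, not part of the original example, and the threshold should be tuned for your audio.

```kotlin
// Hypothetical helper: decides whether a chunk contains speech.
// `output` is the model output [pNoSpeech, pSpeech] described above.
// The 0.5 default threshold is an assumption; tune it for your data.
fun isSpeech(output: FloatArray, threshold: Float = 0.5f): Boolean {
    require(output.size == 2) { "Expected [pNoSpeech, pSpeech], got ${output.size} values" }
    return output[1] >= threshold
}
```

It could be called as isSpeech(result.toTensor().dataAsFloatArray), using the result value obtained above.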
