KHR_audio_emitter #2137

Open
wants to merge 16 commits into
base: main

`extensions/2.0/Khronos/KHR_audio/README.md`: 314 additions, 0 deletions

# KHR_audio

## Contributors

* Robert Long, Element Inc.
* Anthony Burchell, Individual Contributor
* K. S. Ernest (iFire) Lee, Individual Contributor
* Michael Nisbet, Individual Contributor
* humbletim, Individual Contributor
* Norbert Nopper, UX3D [@UX3DGpuSoftware](https://twitter.com/UX3DGpuSoftware)

## Status

Draft

## Dependencies

Written against the glTF 2.0 spec.

## Overview

This extension allows for the addition of spatialized and non-spatialized audio to glTF scenes.

Audio emitter objects may be added to 3D nodes for positional audio or to the scene for environmental or ambient audio such as background music.

### Example:

```json
{
  "extensions": {
    "KHR_audio": {
      "emitters": [
        {
          "name": "Positional Emitter",
          "type": "positional",
          "gain": 0.8,
          "sources": [0],
          "positional": {
            "coneInnerAngle": 6.283185307179586,
            "coneOuterAngle": 6.283185307179586,
            "coneOuterGain": 0.0,
            "distanceModel": "inverse",
            "maxDistance": 10.0,
            "refDistance": 1.0,
            "rolloffFactor": 0.8
          }
        },
        {
          "name": "Global Emitter",
          "type": "global",
          "gain": 1.0,
          "sources": [1]
        }
      ],
      "sources": [
        {
          "name": "Clip 1",
          "gain": 0.6,
          "autoPlay": true,
          "loop": true,
          "audio": 0
        },
        {
          "name": "Clip 2",
          "gain": 0.6,
          "autoPlay": true,
          "loop": true,
          "audio": 1
        }
      ],
      "audio": [
        {
          "uri": "audio1.mp3"
        },
        {
          "bufferView": 0,
          "mimeType": "audio/mpeg"
        }
      ]
    }
  },
  "scenes": [
    {
      "name": "Default Scene",
      "extensions": {
        "KHR_audio": {
          "emitters": [1]
        }
      }
    }
  ],
  "nodes": [
    {
      "name": "Duck",
      "translation": [1.0, 2.0, 3.0],
      "extensions": {
        "KHR_audio": {
          "emitter": 0
        }
      }
    }
  ]
}
```

> **Review comment (Contributor), on the source properties in this example:** These 3 properties are in the example and in the schema, but they are not defined in the README.

## glTF Schema Updates

This extension consists of three primary data structures: Audio Data, Audio Sources, and Audio Emitters. Data, sources, and emitters are defined on a `KHR_audio` object added to the `extensions` object on the document root.

The extension must be added to the file's `extensionsUsed` array. Because it is optional, it does not need to be added to the `extensionsRequired` array.

#### Example:

```json
{
  "asset": {
    "version": "2.0"
  },
  "extensionsUsed": [
    "KHR_audio"
  ],
  "scenes": [...],
  "nodes": [...],
  "extensions": {
    "KHR_audio": {
      "audio": [...],
      "sources": [...],
      "emitters": [...]
    }
  }
}
```
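The `extensionsUsed` rule can be applied mechanically by an exporter. A minimal TypeScript sketch, assuming a simplified document type; `GltfDocument` and `declareKhrAudio` are hypothetical names, not part of any glTF library:

```typescript
// Hypothetical, simplified view of a glTF document: only the fields this sketch touches.
interface GltfDocument {
  asset: { version: string };
  extensionsUsed?: string[];
  extensionsRequired?: string[];
  extensions?: { [name: string]: unknown };
}

const KHR_AUDIO = "KHR_audio";

// Adds KHR_audio to extensionsUsed when the root carries a KHR_audio object.
// The extension is optional, so extensionsRequired is deliberately left untouched.
function declareKhrAudio(doc: GltfDocument): GltfDocument {
  const hasAudio = doc.extensions !== undefined && doc.extensions[KHR_AUDIO] !== undefined;
  if (!hasAudio) {
    return doc;
  }
  const used = doc.extensionsUsed ?? [];
  if (used.indexOf(KHR_AUDIO) !== -1) {
    return doc; // already declared
  }
  return { ...doc, extensionsUsed: [...used, KHR_AUDIO] };
}

// Usage: a document with a KHR_audio root object but no extensionsUsed entry yet.
const exampleDoc: GltfDocument = {
  asset: { version: "2.0" },
  extensions: { KHR_audio: { audio: [], sources: [], emitters: [] } },
};
const declared = declareKhrAudio(exampleDoc);
console.log(declared.extensionsUsed);
```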

### Audio Data

Audio data objects define where audio data is located. Data is accessed either via a `bufferView` or via a `uri`.

When storing audio data in a buffer view, the `mimeType` field must be specified. Currently the only supported MIME type is `audio/mpeg`, for use with MP3 files. MP3 was chosen due to its wide support across browsers and 3D engines as well as its lossy compression with variable bitrate. Other audio formats may be added via another extension.

Note that tools that process glTF files but do not implement the `KHR_audio` extension may not properly copy external files referenced via the `uri` field to their final destination, or bake them into the final binary glTF file. In these cases, using the `bufferView` property may be a better choice, assuming the tool does not change the referenced `bufferView` index. The `uri` field may be the better choice when you want to be able to quickly swap the referenced audio asset.

#### `bufferView`

The index of the bufferView that contains the audio data. Use this instead of the audio data's `uri` property.

#### `mimeType`

The audio's MIME type. Required if `bufferView` is defined. Unless specified by another extension, the only supported mimeType is `audio/mpeg`.

#### `uri`

The URI of the audio file. Relative paths are relative to the `.gltf` file.
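The rules for locating audio data can be checked at load time. A minimal sketch, assuming that a defined `bufferView` takes precedence over `uri` (this draft does not say what happens when both are present); `AudioData`, `AudioLocation`, and `resolveAudioData` are hypothetical names:

```typescript
// Hypothetical type mirroring the audio data properties described above.
interface AudioData {
  uri?: string;
  bufferView?: number;
  mimeType?: string;
}

// Where the bytes for one audio data object come from.
type AudioLocation =
  | { kind: "bufferView"; bufferView: number; mimeType: string }
  | { kind: "uri"; uri: string };

// Only MP3 is supported unless another extension adds more MIME types.
const SUPPORTED_MIME_TYPES: string[] = ["audio/mpeg"];

function resolveAudioData(data: AudioData): AudioLocation {
  // Assumption: bufferView wins when both bufferView and uri are set.
  if (data.bufferView !== undefined) {
    if (data.mimeType === undefined) {
      throw new Error("mimeType is required when bufferView is defined");
    }
    if (SUPPORTED_MIME_TYPES.indexOf(data.mimeType) === -1) {
      throw new Error(`unsupported mimeType: ${data.mimeType}`);
    }
    return { kind: "bufferView", bufferView: data.bufferView, mimeType: data.mimeType };
  }
  if (data.uri !== undefined) {
    // A relative uri still needs resolving against the .gltf file's location.
    return { kind: "uri", uri: data.uri };
  }
  throw new Error("audio data must define either bufferView or uri");
}
```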

### Audio Sources

Audio sources define the playing state for a given audio data object. Each source connects one audio data object to zero or more audio emitters.

#### `gain`

Unitless multiplier against original audio file volume for determining audio source loudness.

> **Review comment (Contributor):** Unitless, but like, what scale? Is this linear or decibels? This should be specified. Same for emitter gain.
>
> **Reply (Contributor Author):** This is supposed to mirror the gain value in the WebAudio's GainNode. However, it seems like they haven't given any details as to how it should be implemented. https://webaudio.github.io/web-audio-api/#GainOptions Other nodes are explicitly defined.
>
> **Comment:** todo: @antpb explicitly state that the calculated value is linear against the volume in the audio file


#### `loop`

Whether or not to loop the specified audio when finished.

#### `autoPlay`

Whether or not to play the specified audio when the glTF is loaded.

#### `audio`

The index of the audio data object assigned to this source.
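The `autoPlay`, `loop`, and `gain` properties give a loader enough to decide what starts when the glTF is loaded. A minimal sketch; the fallbacks for a missing `loop` or `autoPlay` (both `false`) are assumptions, while the gain fallback of `1.0` follows the Audio Gain Units section. `AudioSource`, `PlaybackRequest`, and `sourcesToAutoPlay` are hypothetical names:

```typescript
// Hypothetical shape of an audio source as described in this section.
interface AudioSource {
  name?: string;
  gain?: number;
  loop?: boolean;
  autoPlay?: boolean;
  audio: number;
}

// What a player needs in order to start one source right after loading.
interface PlaybackRequest {
  sourceIndex: number;
  audioIndex: number;
  gain: number;
  loop: boolean;
}

// Collects the sources flagged with autoPlay and checks each audio index.
function sourcesToAutoPlay(sources: AudioSource[], audioCount: number): PlaybackRequest[] {
  const requests: PlaybackRequest[] = [];
  sources.forEach((source, sourceIndex) => {
    if (source.audio < 0 || source.audio >= audioCount) {
      throw new Error(`source ${sourceIndex} references missing audio ${source.audio}`);
    }
    if (source.autoPlay === true) {
      requests.push({
        sourceIndex,
        audioIndex: source.audio,
        gain: source.gain ?? 1.0, // documented default gain
        loop: source.loop ?? false, // assumed default
      });
    }
  });
  return requests;
}
```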

### Audio Emitter

Positional or global sinks for playing back audio sources.

#### `type`

Specifies the audio emitter type.

- `positional` Positional audio emitters. For sound cones, the emitter's forward direction is `+Z`, matching the front side of a [glTF asset](https://www.khronos.org/registry/glTF/specs/2.0/glTF-2.0.html#coordinate-system-and-units).

  > **Review comment (@aaronfranke, Contributor, May 4, 2023):** This is inconsistent with KHR_lights_punctual which defines spotlights as facing -Z. Is it expected that conical audio should be pointing the opposite direction as conical lights?
  >
  > **Reply (Contributor Author):** Agreed, -Z forward makes the most sense. It's also in line with how the WebAudio spec defines audio listeners. I can't find the details on panner node, besides it seemingly defaulting to +X forward? This will break existing files but better to do it now.
  >
  > **Comment:** todo: @antpb change to negative Z

- `global` Global audio emitters are not affected by the position of audio listeners. `coneInnerAngle`, `coneOuterAngle`, `coneOuterGain`, `distanceModel`, `maxDistance`, `refDistance`, and `rolloffFactor` should all be ignored when set.

#### `gain`

Unitless multiplier against original source volume for determining emitter loudness.

#### `loop`

Whether or not to loop the specified audio clip when finished.

#### `playing`

Whether or not the specified audio clip is playing. Setting this property to `true` makes the audio clip play on load (autoplay).

> **Review comment:** Is there a particular reason for naming this like a state ("playing") vs. naming it as what it does ("autoplay")? Seems that hints at possible implementation details ("changing this should change play state") but that isn't mentioned.
>
> **Reply (Contributor Author):** This is a bigger discussion about how animation, audio, etc. should be treated on load. Animations are currently just keyframe data and it's up to the implementation to figure out how to play the animations. https://www.khronos.org/registry/glTF/specs/2.0/glTF-2.0.html#animations
>
> So this begs the question if playing or autoPlay or even loop should be included in the spec.
>
> @najadojo @bghgary this also goes against the direction of the MSFT_audio_emitter where there are ways to specify runtime behavior.
>
> Maybe runtime behavior should be left out of this spec and implemented in another?
>
> **Reply (@hybridherbst, Apr 29, 2022):** I see your point regarding playback behaviour!
>
> If the glTF is purely seen as "data setup", then it might still be desirable to have a way to connect audio clips to animations - e.g. saying "this audio emitter belongs to that animation" (potentially: "at that point in time") would be pure data. This would be similar to how a mesh + material are connected to a node.
>
> What do you think in which direction this connection should be made? Attaching an emitter to a clip or vice versa? I think on the clip would make somewhat more sense:
>
>     "animations": [
>       {
>         "channels": [...],
>         "samplers": [...],
>         "name": "SimpleCubeAnim",
>         "extensions": {
>           "KHR_audio": {
>             "emitters": [
>               {
>                 "id": 0,
>                 "offset": 0.0
>               },
>               {
>                 "id": 1,
>                 "offset": 1.3
>               }
>             ]
>           }
>         }
>       }
>     ],
>
> In this example, two emitters belong to this animation clip, viewers would interpret that as they "loop together", and one emitter has a slight offset. Note having the same emitter play multiple times during the loop would be possible by having multiple references to the same ID with separate offsets.
>
> What do you think? I think that would preserve the "glTF is purely data" aspect by giving hints about how things are connected, not how things should be playing.
>
> **Reply (Contributor Author):** During the OMI glTF meeting we discussed this a bit. I think synchronizing audio with an animation should also be made part of another spec. AFAIK synchronizing audio with animations isn't necessarily the best way to deal with this scenario. We should talk more about autoPlay and loop though. Should this be included in the spec? Should we go into more depth on playing/mixing audio? It'd be good to get some feedback from others on this.
>
> **Reply (@bghgary, Contributor, May 12, 2022):** I think, at a minimum, we need to define both how the glTF carries the audio payload (including metadata) and how to play the audio, whether that is one or more specs. If this spec only defines the former and that's all we define, it will be hard to know if the spec works and it will be hard to demo.
>
> **Reply (@hybridherbst, May 13, 2022):** Would you agree that the options are
>
> - adding info about play behaviour to the audio data itself
> - connecting audio and animations in some way (either from audio to animation or from animation to audio)
>
> or do you have another one in mind?
>
> I agree that without any means of knowing how the audio is relevant to the scene, viewers won't be able to do anything with it - e.g. at a minimum I think tools like model-viewer should have a way to infer what to do with a file that has multiple animation clips and multiple audio assets (could be 1:1, could be different). A counter-argument would be saying "well for this case, if there's one animation and one audio clip that will be played back, everything else is undefined" (still allowing for cases such as this) but I'm not a huge fan of undefined-by-design...
>
> **Reply (Contributor Author):** During the OMI glTF meeting we agreed that playing behavior (audioPlay and loop) should be in this spec to define the minimum behavior to play sounds. However, connecting audio and animations should be delegated to an extension like #2147
>
> **Reply (@bghgary, Contributor, May 19, 2022):** I think we need to think about the corresponding spec(s) that define how the audio will play before completing this spec. KHR_animation_pointer spec will be able to cover some things (like modifying parameters for looping sounds), but it's probably not enough (e.g. one-shot sounds triggered from an animation).


#### `sources`

An array of audio source indices used by the audio emitter. This array may be empty.

#### `positional`

An object containing the positional audio emitter properties. This may only be defined if `type` is set to `positional`.
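A loader can enforce the emitter rules above before building an audio graph: `positional` may only appear on positional emitters, and every entry in `sources` must point at an existing audio source. A minimal sketch with hypothetical names (`AudioEmitter`, `checkEmitter`); reporting problems instead of throwing is a design choice of this sketch:

```typescript
type EmitterType = "positional" | "global";

// Hypothetical types mirroring the emitter properties described above.
interface PositionalProperties {
  coneInnerAngle?: number;
  coneOuterAngle?: number;
  coneOuterGain?: number;
  distanceModel?: "linear" | "inverse" | "exponential";
  maxDistance?: number;
  refDistance?: number;
  rolloffFactor?: number;
}

interface AudioEmitter {
  name?: string;
  type: EmitterType;
  gain?: number;
  sources?: number[];
  positional?: PositionalProperties;
}

// Returns human-readable problems; an empty array means the emitter passes these checks.
function checkEmitter(emitter: AudioEmitter, sourceCount: number): string[] {
  const problems: string[] = [];
  if (emitter.type !== "positional" && emitter.positional !== undefined) {
    problems.push("`positional` may only be defined when type is \"positional\"");
  }
  // An empty (or missing) sources array is allowed.
  for (const index of emitter.sources ?? []) {
    if (index < 0 || index >= sourceCount) {
      problems.push(`source index ${index} is out of range`);
    }
  }
  return problems;
}
```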

### Positional Audio Emitter Properties

> **Review comment (@hybridherbst, Apr 25, 2022):** Are doppler effects considered implementation details? Might want to explicitly call this out here (e.g. animating a node with an audio source, fast viewer movement, ... might cause doppler effects on positional sources).
>
> **Reply (Contributor Author):** We may want to add a note, yeah. But perhaps audio-listener oriented properties / effects should be defined in another series of extensions?
>
> **Reply:** I wouldn't think these should be specified here, and probably also not in another extension, as its very application-specific. "Doppler effect for audio" is kind of in the same realm as "bloom effect for rendering" in my opinion. The note would be enough.
>
> **Reply (Contributor Author):** I think we can add language such as "Implementors may choose to add effects to spatial audio such as simulating the doppler effect." 👍


#### `coneInnerAngle`

The angle, in radians, of a cone inside of which there will be no volume reduction.

#### `coneOuterAngle`

The angle, in radians, of a cone outside of which the volume will be reduced to a constant value of `coneOuterGain`.

> **Review comment:** Might want to add a note here that setting this to some value > 2 * PI (the max allowed value) will turn this into a spherical audio source. It's implicit from the defaults but could be explained explicitly here.
>
> **Reply (Contributor Author):** I agree it should act as a point audio source / non-directional source when set. It's worth noting the WebAudio API Specification doesn't specify this detail though. What's also missing is the behavior when the coneOuterAngle is less than the coneInnerAngle or when the coneInnerAngle is greater than the coneOuterAngle. We should check on this before adding these details.
>
> **Review comment (Contributor):** Godot only has one cone angle, in that case which of these values should we use? The behavior should be defined.
>
> **Reply (Contributor Author):** You can calculate the cone gain using this function defined in the WebAudio spec: https://webaudio.github.io/web-audio-api/#Spatialization-sound-cones
>
> It shouldn't be too costly to do at runtime.


#### `coneOuterGain`

The gain of the audio emitter set when outside the cone defined by the `coneOuterAngle` property. It is a linear value (not dB).

#### `distanceModel`

Specifies the distance model for the audio emitter.

- `linear` A linear distance model calculating the gain induced by the distance according to:
  `1.0 - rolloffFactor * (Math.max(Math.min(distance, maxDistance), refDistance) - refDistance) / (maxDistance - refDistance)`
- `inverse` (default) An inverse distance model calculating the gain induced by the distance according to:
  `refDistance / (refDistance + rolloffFactor * (Math.max(distance, refDistance) - refDistance))`
- `exponential` An exponential distance model calculating the gain induced by the distance according to:
  `Math.pow(Math.max(distance, refDistance) / refDistance, -rolloffFactor)`
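The three distance models translate directly into code. A minimal sketch; for the `linear` model it clamps the distance to `[refDistance, maxDistance]` as the Web Audio API does, an assumption consistent with `maxDistance` being the distance after which the volume is not reduced further. `DistanceParams` and `distanceGain` are hypothetical names:

```typescript
type DistanceModel = "linear" | "inverse" | "exponential";

interface DistanceParams {
  distanceModel: DistanceModel;
  refDistance: number;
  maxDistance: number;
  rolloffFactor: number;
}

// Linear gain induced by the emitter-listener distance.
function distanceGain(distance: number, p: DistanceParams): number {
  switch (p.distanceModel) {
    case "linear": {
      // Assumption (mirrors Web Audio): clamp distance to [refDistance, maxDistance].
      const d = Math.max(Math.min(distance, p.maxDistance), p.refDistance);
      return 1.0 - (p.rolloffFactor * (d - p.refDistance)) / (p.maxDistance - p.refDistance);
    }
    case "inverse": {
      const d = Math.max(distance, p.refDistance);
      return p.refDistance / (p.refDistance + p.rolloffFactor * (d - p.refDistance));
    }
    case "exponential": {
      const d = Math.max(distance, p.refDistance);
      return Math.pow(d / p.refDistance, -p.rolloffFactor);
    }
    default:
      throw new Error("unknown distance model");
  }
}
```

For the positional emitter in the first example (`inverse`, `refDistance` 1.0, `rolloffFactor` 0.8), a listener 3 units away gets a distance gain of 1 / (1 + 0.8 × 2) ≈ 0.385.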

#### `maxDistance`

The maximum distance between the emitter and listener, after which the volume will not be reduced any further. `maxDistance` may only be applied when `distanceModel` is set to `linear`; otherwise, it should be ignored.

#### `refDistance`

A reference distance for reducing volume as the emitter moves further from the listener. For distances less than this, the volume is not reduced.

#### `rolloffFactor`

Describes how quickly the volume is reduced as the emitter moves away from the listener. When `distanceModel` is set to `linear`, the maximum value is 1; otherwise, there is no upper limit.
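The value constraints on `rolloffFactor` and `maxDistance` can be validated up front. A minimal sketch: the `(0.0, +∞)` range and the linear-model limit of 1 come from this document, while requiring `maxDistance > refDistance` for `linear` and `refDistance > 0` for `exponential` are assumptions added here so the formulas never divide by zero. `DistanceSettings` and `checkDistanceParams` are hypothetical names:

```typescript
interface DistanceSettings {
  distanceModel: "linear" | "inverse" | "exponential";
  refDistance: number;
  maxDistance: number;
  rolloffFactor: number;
}

// Returns human-readable problems; an empty array means the settings pass these checks.
function checkDistanceParams(p: DistanceSettings): string[] {
  const problems: string[] = [];
  // Documented range for rolloffFactor: (0.0, +Infinity).
  if (!(p.rolloffFactor > 0)) {
    problems.push("rolloffFactor must be greater than 0");
  }
  if (p.distanceModel === "linear") {
    // Documented: with the linear model the maximum rolloffFactor is 1.
    if (p.rolloffFactor > 1) {
      problems.push("rolloffFactor must not exceed 1 with the linear model");
    }
    // Assumption: avoids a zero or negative denominator in the linear formula.
    if (!(p.maxDistance > p.refDistance)) {
      problems.push("maxDistance must be greater than refDistance with the linear model");
    }
  } else if (p.distanceModel === "exponential" && !(p.refDistance > 0)) {
    // Assumption: the exponential formula divides by refDistance.
    problems.push("refDistance must be greater than 0 with the exponential model");
  }
  // maxDistance is ignored for inverse and exponential, so it is not checked there.
  return problems;
}
```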

### Using Audio Emitters

Audio emitters of type `global` may be added to scenes using the following syntax:

```json
{
  "scenes": [
    {
      "extensions": {
        "KHR_audio": {
          "emitters": [0, 1]
        }
      }
    }
  ]
}
```

Audio emitters of type `positional` may be added to nodes using the following syntax:

```json
{
  "nodes": [
    {
      "extensions": {
        "KHR_audio": {
          "emitter": 2
        }
      }
    }
  ]
}
```

Note that multiple global audio emitters are allowed on the scene, but only a single audio emitter may be added to a node.
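Because global emitters hang off the scene and positional emitters hang off nodes, a player can gather everything audible in a scene with one pass over the scene's node hierarchy. A minimal sketch; rejecting a scene entry that is not `global` (or a node entry that is not `positional`) is a strict reading chosen here, and all type and function names are hypothetical:

```typescript
interface SceneAudio { emitters?: number[] }
interface NodeAudio { emitter?: number }
interface Scene { nodes?: number[]; extensions?: { KHR_audio?: SceneAudio } }
interface GltfNode { children?: number[]; extensions?: { KHR_audio?: NodeAudio } }
interface EmitterInfo { type: "positional" | "global" }

// One audible emitter; nodeIndex is set only for positional emitters.
interface ActiveEmitter { emitterIndex: number; nodeIndex?: number }

function collectSceneEmitters(scene: Scene, nodes: GltfNode[], emitters: EmitterInfo[]): ActiveEmitter[] {
  const active: ActiveEmitter[] = [];

  // Any number of global emitters may be listed on the scene.
  for (const emitterIndex of scene.extensions?.KHR_audio?.emitters ?? []) {
    if (emitters[emitterIndex]?.type !== "global") {
      throw new Error(`scene emitter ${emitterIndex} is not a global emitter`);
    }
    active.push({ emitterIndex });
  }

  // Each node carries at most one positional emitter; walk the hierarchy depth-first.
  const stack: number[] = (scene.nodes ?? []).slice();
  while (stack.length > 0) {
    const nodeIndex = stack.pop() as number;
    const node = nodes[nodeIndex];
    const emitterIndex = node.extensions?.KHR_audio?.emitter;
    if (emitterIndex !== undefined) {
      if (emitters[emitterIndex]?.type !== "positional") {
        throw new Error(`node ${nodeIndex} references non-positional emitter ${emitterIndex}`);
      }
      active.push({ emitterIndex, nodeIndex });
    }
    for (const child of node.children ?? []) {
      stack.push(child);
    }
  }
  return active;
}
```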

### Audio Rolloff Formula

The `rolloffFactor` range is `(0.0, +∞)`. The default is `1.0`.

The rolloff formula is dependent on the distance model defined. The available distance models are `linear`, `inverse`, and `exponential`.

- linear formula: `1.0 - rolloffFactor * (Math.max(Math.min(distance, maxDistance), refDistance) - refDistance) / (maxDistance - refDistance)`
- inverse formula: `refDistance / (refDistance + rolloffFactor * (Math.max(distance, refDistance) - refDistance))`
- exponential formula: `Math.pow(Math.max(distance, refDistance) / refDistance, -rolloffFactor)`

### Audio Gain Units
The gain unit range is `(0.0, +∞)`. The default is `1.0`.
- gain formula: `originalVolume * gain`
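Every gain in this extension is a linear multiplier, so the loudness of one source played through one emitter is a simple product. A minimal sketch: multiplying in distance and cone gains for positional emitters is an assumption modeled on a Web Audio graph (source gain, then emitter gain, then a `PannerNode`), and the decibel helper is only there for engines whose volume controls expect dB. `effectiveGain` and `gainToDecibels` are hypothetical names:

```typescript
// Linear gain chain: original volume -> source gain -> emitter gain,
// then (assumption, positional emitters only) distance gain and cone gain.
function effectiveGain(
  originalVolume: number,
  sourceGain = 1.0,
  emitterGain = 1.0,
  distanceGain = 1.0,
  coneGain = 1.0,
): number {
  return originalVolume * sourceGain * emitterGain * distanceGain * coneGain;
}

// Converts a linear gain to decibels (20 * log10(gain)); a gain of 0 maps to -Infinity.
function gainToDecibels(gain: number): number {
  return (20 * Math.log(gain)) / Math.LN10;
}

// Usage: "Clip 1" (source gain 0.6) through the first example's positional emitter (gain 0.8).
console.log(effectiveGain(1.0, 0.6, 0.8));
console.log(gainToDecibels(0.5));
```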

### Audio Cone Visualized
<img alt="Audio cone showing how cone parameters impact volume based on relative distance to the source." src="./figures/cone-diagram.svg" width="500px" />

Figure 1. A modified graphic based on the <a href="https://webaudio.github.io/web-audio-api/#Spatialization-sound-cones" target="_blank">W3C Web Audio API Audio cone Figure</a>

The cone properties relate to the Web Audio `PannerNode` interface and determine the volume relative to a listener's position within the defined cone area.

The gain relative to the cone properties is determined as described in the Web Audio API, except that this audio emitter extension uses radians in place of degrees. [Cone Gain Algorithm Example](https://webaudio.github.io/web-audio-api/#Spatialization-sound-cones)
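The cone gain algorithm ports directly once angles stay in radians. A minimal sketch of the Web Audio sound-cone computation; the caller passes the emitter's forward vector already in world space (this draft specifies `+Z` in node space), and giving unity gain to a listener exactly at the emitter is an assumption. `Vec3`, `ConeParams`, and `coneGain` are hypothetical names:

```typescript
type Vec3 = [number, number, number];

function sub(a: Vec3, b: Vec3): Vec3 {
  return [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
}

function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Full cone angles in radians, as stored in the positional emitter properties.
interface ConeParams {
  coneInnerAngle: number;
  coneOuterAngle: number;
  coneOuterGain: number;
}

const FULL_CIRCLE = 2 * Math.PI;

// Port of the Web Audio sound-cone gain algorithm, using radians instead of degrees.
function coneGain(emitterPos: Vec3, emitterForward: Vec3, listenerPos: Vec3, cone: ConeParams): number {
  const toListener = sub(listenerPos, emitterPos);
  const forwardLength = Math.sqrt(dot(emitterForward, emitterForward));
  const toListenerLength = Math.sqrt(dot(toListener, toListener));
  // No orientation, listener at the emitter, or both cones cover the full sphere: unity gain.
  if (
    forwardLength === 0 ||
    toListenerLength === 0 ||
    (cone.coneInnerAngle >= FULL_CIRCLE && cone.coneOuterAngle >= FULL_CIRCLE)
  ) {
    return 1;
  }
  // Angle between the emitter's forward axis and the direction to the listener.
  const cosine = dot(emitterForward, toListener) / (forwardLength * toListenerLength);
  const angle = Math.acos(Math.min(1, Math.max(-1, cosine))); // clamp guards rounding
  // The properties are full cone angles, so compare against half-angles.
  const innerHalf = Math.abs(cone.coneInnerAngle) / 2;
  const outerHalf = Math.abs(cone.coneOuterAngle) / 2;
  if (angle <= innerHalf) {
    return 1;
  }
  if (angle >= outerHalf) {
    return cone.coneOuterGain;
  }
  // Linear blend from 1 at the inner cone to coneOuterGain at the outer cone.
  const x = (angle - innerHalf) / (outerHalf - innerHalf);
  return (1 - x) + cone.coneOuterGain * x;
}
```

Engines with a single cone angle can still evaluate this function per listener position and apply the result as an extra gain.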

### Units for Rotations

Radians are used for rotations, matching glTF 2.0.
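Because the Web Audio API expresses cone angles in degrees, a browser implementation has to convert the radians stored by this extension. A minimal sketch that maps the positional properties onto an object shaped like a subset of the Web Audio `PannerOptions` dictionary; apart from the documented `inverse` model and `rolloffFactor` of `1.0`, the fallbacks for missing values are Web Audio defaults and are assumptions, not values defined by this extension. `KhrPositional`, `PannerOptionsSubset`, and `toPannerOptions` are hypothetical names:

```typescript
// Positional emitter properties as stored by KHR_audio (angles in radians).
interface KhrPositional {
  coneInnerAngle?: number;
  coneOuterAngle?: number;
  coneOuterGain?: number;
  distanceModel?: "linear" | "inverse" | "exponential";
  maxDistance?: number;
  refDistance?: number;
  rolloffFactor?: number;
}

// Subset of the Web Audio PannerOptions dictionary (angles in degrees).
interface PannerOptionsSubset {
  distanceModel: "linear" | "inverse" | "exponential";
  refDistance: number;
  maxDistance: number;
  rolloffFactor: number;
  coneInnerAngle: number;
  coneOuterAngle: number;
  coneOuterGain: number;
}

function radiansToDegrees(radians: number): number {
  return (radians * 180) / Math.PI;
}

function toPannerOptions(p: KhrPositional): PannerOptionsSubset {
  return {
    distanceModel: p.distanceModel ?? "inverse", // documented default
    rolloffFactor: p.rolloffFactor ?? 1.0, // documented default
    // The remaining fallbacks are Web Audio PannerNode defaults (assumption).
    refDistance: p.refDistance ?? 1,
    maxDistance: p.maxDistance ?? 10000,
    coneInnerAngle: radiansToDegrees(p.coneInnerAngle ?? 2 * Math.PI),
    coneOuterAngle: radiansToDegrees(p.coneOuterAngle ?? 2 * Math.PI),
    coneOuterGain: p.coneOuterGain ?? 0,
  };
}
```

In a browser, the returned object could be passed as the options argument of `new PannerNode(audioContext, options)`, with position and orientation taken from the emitter's node.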

### JSON Schema

[glTF.KHR_audio.schema.json](/extensions/2.0/KHR_audio/schema/glTF.KHR_audio.schema.json)

## Known Implementations

* Third Room - https://github.com/thirdroom/thirdroom
* Three Object Viewer (WordPress Plugin) - https://wordpress.org/plugins/three-object-viewer/
* UX3D Experimental C++ implementation - https://github.com/ux3d/OMI

## Resources

Prior Art:
* [W3C Web Audio API](https://www.w3.org/TR/webaudio/)
* [MSFT_audio_emitter](https://github.com/KhronosGroup/glTF/pull/1400)
* [MOZ_hubs_components Audio](https://github.com/MozillaReality/hubs-blender-exporter/blob/04fc1d1/default-config.json#L298-L324)
