# KHR_audio

## Contributors

* Robert Long, Element Inc.
* Anthony Burchell, Individual Contributor
* K. S. Ernest (iFire) Lee, Individual Contributor
* Michael Nisbet, Individual Contributor
* humbletim, Individual Contributor
* Norbert Nopper, UX3D [@UX3DGpuSoftware](https://twitter.com/UX3DGpuSoftware)

## Status

Draft

## Dependencies

Written against the glTF 2.0 spec.

## Overview

This extension allows for the addition of spatialized and non-spatialized audio to glTF scenes.

Audio emitter objects may be added to 3D nodes for positional audio, or to the scene for environmental or ambient audio such as background music.

### Example:

```json
{
    "extensions": {
        "KHR_audio": {
            "emitters": [
                {
                    "name": "Positional Emitter",
                    "type": "positional",
                    "gain": 0.8,
                    "sources": [0, 1],
                    "positional": {
                        "coneInnerAngle": 6.283185307179586,
                        "coneOuterAngle": 6.283185307179586,
                        "coneOuterGain": 0.0,
                        "distanceModel": "inverse",
                        "maxDistance": 10.0,
                        "refDistance": 1.0,
                        "rolloffFactor": 0.8
                    }
                }
            ],
            "sources": [
                {
                    "name": "Clip 1",
                    "gain": 0.6,
                    "autoPlay": true,
                    "loop": true,
                    "audio": 0
                },
                {
                    "name": "Clip 2",
                    "gain": 0.6,
                    "autoPlay": true,
                    "loop": true,
                    "audio": 1
                }
            ],
            "audio": [
                {
                    "uri": "audio1.mp3"
                },
                {
                    "bufferView": 0,
                    "mimeType": "audio/mpeg"
                }
            ]
        }
    },
    "scenes": [
        {
            "name": "Default Scene",
            "extensions": {
                "KHR_audio": {
                    "emitters": [0]
                }
            }
        }
    ],
    "nodes": [
        {
            "name": "Duck",
            "translation": [1.0, 2.0, 3.0],
            "extensions": {
                "KHR_audio": {
                    "emitter": 1
                }
            }
        }
    ]
}
```

## glTF Schema Updates

This extension consists of three primary data structures: Audio Data, Audio Sources, and Audio Emitters. Data, sources, and emitters are defined on a `KHR_audio` object added to the `extensions` object on the document root.

The extension must be added to the file's `extensionsUsed` array. Because it is optional, it does not need to be added to the `extensionsRequired` array.

#### Example:

```json
{
    "asset": {
        "version": "2.0"
    },
    "extensionsUsed": [
        "KHR_audio"
    ],
    "scenes": [...],
    "nodes": [...],
    "extensions": {
        "KHR_audio": {
            "audio": [...],
            "sources": [...],
            "emitters": [...]
        }
    }
}
```

### Audio Data

Audio data objects define where audio data is located. Data is accessed either via a `bufferView` or a `uri`.

When storing audio data in a buffer view, the `mimeType` field must be specified. Currently the only supported MIME type is `audio/mpeg`, for use with MP3 files. MP3 was chosen due to its wide support across browsers and 3D engines, as well as its lossy compression with variable bitrate. Other audio formats may be added via another extension.

Note that tools that process glTF files, but do not implement the KHR_audio extension, may not properly copy external files referenced via the `uri` field to their final destination or bake them into the final binary glTF file. In these cases, the `bufferView` property may be a better choice, assuming the referenced `bufferView` index is not changed by the tool. The `uri` field may be a better choice when you want to be able to quickly swap the referenced audio asset.

#### `bufferView`

The index of the bufferView that contains the audio data. Use this instead of the audio data's `uri` property.

#### `mimeType`

The audio's MIME type. Required if `bufferView` is defined. Unless specified by another extension, the only supported MIME type is `audio/mpeg`.

#### `uri`

The URI of the audio file. Relative paths are relative to the .gltf file.
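
The two storage forms can be illustrated side by side; this fragment repeats the `audio` array from the overview example:

```json
{
    "audio": [
        {
            "uri": "audio1.mp3"
        },
        {
            "bufferView": 0,
            "mimeType": "audio/mpeg"
        }
    ]
}
```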

### Audio Sources

Audio sources define the playback state for a given audio data object. They connect one audio data object to zero or more audio emitters.

#### `gain`

A unitless linear multiplier (not decibels) applied to the original audio data's volume, determining the audio source's loudness.

#### `loop`

Whether or not to loop the specified audio when finished.

#### `autoPlay`

Whether or not to play the specified audio when the glTF is loaded.

#### `audio`

The index of the audio data assigned to this audio source.

### Audio Emitter

Audio emitters are positional or global sinks for playing back audio sources.

#### `type`

Specifies the audio emitter type.

- `positional` Positional audio emitters. When using sound cones, the emitter's orientation is `-Z`, matching the direction of spotlights in `KHR_lights_punctual` (see the [glTF coordinate system](https://www.khronos.org/registry/glTF/specs/2.0/glTF-2.0.html#coordinate-system-and-units)).
- `global` Global audio emitters are not affected by the position of audio listeners. `coneInnerAngle`, `coneOuterAngle`, `coneOuterGain`, `distanceModel`, `maxDistance`, `refDistance`, and `rolloffFactor` should all be ignored when set.

#### `gain`

A unitless linear multiplier (not decibels) applied to the original source volume, determining the emitter's loudness.

#### `loop`

Whether or not to loop the specified audio clip when finished.

#### `playing`

Whether or not the specified audio clip is playing. Setting this property to `true` will set the audio clip to play on load (autoplay).

#### `sources`

An array of audio source indices used by the audio emitter. This array may be empty.

#### `positional`

An object containing the positional audio emitter properties. This may only be defined if `type` is set to `positional`.
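
As a sketch, an `emitters` array could hold one emitter of each type; the `global` emitter below and its source index are illustrative, not taken from the overview example:

```json
{
    "emitters": [
        {
            "name": "Background Music",
            "type": "global",
            "gain": 1.0,
            "sources": [0]
        },
        {
            "name": "Positional Emitter",
            "type": "positional",
            "gain": 0.8,
            "sources": [1],
            "positional": {
                "distanceModel": "inverse",
                "refDistance": 1.0,
                "rolloffFactor": 0.8
            }
        }
    ]
}
```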

### Positional Audio Emitter Properties

Implementors may choose to add effects to spatial audio, such as simulating the doppler effect for fast-moving emitters or listeners.

#### `coneInnerAngle`

The angle, in radians, of a cone inside of which there will be no volume reduction.

#### `coneOuterAngle`

The angle, in radians, of a cone outside of which the volume will be reduced to a constant value of `coneOuterGain`.

#### `coneOuterGain`

The gain of the audio emitter set when outside the cone defined by the `coneOuterAngle` property. It is a linear value (not dB).

#### `distanceModel`

Specifies the distance model for the audio emitter.

- `linear` A linear distance model calculating the gain induced by the distance according to:
  `1.0 - rolloffFactor * (distance - refDistance) / (maxDistance - refDistance)`
- `inverse` (default) An inverse distance model calculating the gain induced by the distance according to:
  `refDistance / (refDistance + rolloffFactor * (Math.max(distance, refDistance) - refDistance))`
- `exponential` An exponential distance model calculating the gain induced by the distance according to:
  `pow(Math.max(distance, refDistance) / refDistance, -rolloffFactor)`
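
The three distance models can be sketched as a single helper function. This is a non-normative illustration: the function name and parameter defaults are assumptions, and for the `linear` model `distance` is clamped to the `[refDistance, maxDistance]` range, as the Web Audio API does.

```python
def distance_gain(distance, distance_model="inverse",
                  ref_distance=1.0, max_distance=10000.0, rolloff_factor=1.0):
    """Gain induced by the emitter-listener distance for each distance model."""
    if distance_model == "linear":
        # Clamp into [refDistance, maxDistance]; the Web Audio API does the
        # same, though the draft text leaves out-of-range behavior implicit.
        d = min(max(distance, ref_distance), max_distance)
        return 1.0 - rolloff_factor * (d - ref_distance) / (max_distance - ref_distance)
    if distance_model == "inverse":
        return ref_distance / (
            ref_distance + rolloff_factor * (max(distance, ref_distance) - ref_distance))
    if distance_model == "exponential":
        return (max(distance, ref_distance) / ref_distance) ** -rolloff_factor
    raise ValueError(f"unknown distanceModel: {distance_model!r}")
```

For example, with the defaults, the `inverse` model halves the gain at twice the reference distance: `distance_gain(2.0)` returns `0.5`.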

#### `maxDistance`

The maximum distance between the emitter and listener, after which the volume is not reduced any further. `maxDistance` may only be applied when `distanceModel` is set to `linear`; otherwise, it should be ignored.

#### `refDistance`

A reference distance for reducing volume as the emitter moves further from the listener. For distances less than this, the volume is not reduced.

#### `rolloffFactor`

Describes how quickly the volume is reduced as the emitter moves away from the listener. When `distanceModel` is set to `linear`, the maximum value is `1.0`; otherwise there is no upper limit.

### Using Audio Emitters

Audio emitters of type `global` may be added to scenes using the following syntax:

```json
{
    "scenes": [
        {
            "extensions": {
                "KHR_audio": {
                    "emitters": [0, 1]
                }
            }
        }
    ]
}
```

Audio emitters of type `positional` may be added to nodes using the following syntax:

```json
{
    "nodes": [
        {
            "extensions": {
                "KHR_audio": {
                    "emitter": 2
                }
            }
        }
    ]
}
```

Note that multiple global audio emitters are allowed on a scene, but only a single audio emitter may be added to a node.

### Audio Rolloff Formula

The rolloff factor range is `(0.0, +∞)`. The default is `1.0`.

The rolloff formula depends on the distance model defined. The available distance models are `linear`, `inverse`, and `exponential`.

- linear formula: `1.0 - rolloffFactor * (distance - refDistance) / (maxDistance - refDistance)`
- inverse formula: `refDistance / (refDistance + rolloffFactor * (Math.max(distance, refDistance) - refDistance))`
- exponential formula: `pow(Math.max(distance, refDistance) / refDistance, -rolloffFactor)`

### Audio Gain Units

The gain unit range is `(0.0, +∞)`. The default is `1.0`. Gain is a linear multiplier (not decibels).

- gain formula: `originalVolume * gain`
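
Since every gain in this extension is a linear multiplier, the stages can be combined by multiplication. This is a sketch only; the draft does not pin down the full gain chain, and the function name is an assumption:

```python
def effective_gain(source_gain, emitter_gain, distance_gain, cone_gain):
    """Combine the linear gain stages multiplicatively.

    Assumption for illustration: Web Audio-style implementations multiply
    the source gain, emitter gain, distance attenuation, and cone
    attenuation together to get the final playback volume.
    """
    return source_gain * emitter_gain * distance_gain * cone_gain

# e.g. a source at gain 0.6 played through an emitter at gain 0.8,
# with no distance or cone attenuation:
print(effective_gain(0.6, 0.8, 1.0, 1.0))
```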

### Audio Cone Visualized

<img alt="Audio cone showing how cone parameters impact volume based on relative distance to the source." src="./figures/cone-diagram.svg" width="500px" />

Figure 1. A modified graphic based on the <a href="https://webaudio.github.io/web-audio-api/#Spatialization-sound-cones" target="_blank">W3C Web Audio API audio cone figure</a>.

The cone properties relate to the `PannerNode` interface and determine the volume relative to the listener's position within the defined cone area.

The gain relative to the cone properties is determined in a similar way as described in the Web Audio API, with the difference that this audio emitter extension uses radians in place of degrees. See the [cone gain algorithm example](https://webaudio.github.io/web-audio-api/#Spatialization-sound-cones).
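
Adapted to radians, the Web Audio cone gain algorithm linked above can be sketched as follows. This is a non-normative illustration; the function and parameter names are assumptions, and `angle` is the angle in radians between the emitter's forward direction and the direction to the listener:

```python
def cone_gain(angle, cone_inner_angle, cone_outer_angle, cone_outer_gain):
    """Directional attenuation from the emitter's sound cones (radians)."""
    abs_angle = abs(angle)
    half_inner = cone_inner_angle / 2.0
    half_outer = cone_outer_angle / 2.0
    if abs_angle <= half_inner:
        return 1.0                 # inside the inner cone: no volume reduction
    if abs_angle >= half_outer:
        return cone_outer_gain     # outside the outer cone: constant outer gain
    # Between the cones, interpolate linearly toward coneOuterGain.
    x = (abs_angle - half_inner) / (half_outer - half_inner)
    return (1.0 - x) + cone_outer_gain * x
```

With the values from the overview example (`coneInnerAngle` and `coneOuterAngle` both 2π), every angle falls inside the inner cone, so the emitter behaves as an omnidirectional source.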
|
||
### Units for Rotations | ||
|
||
Radians are used for rotations matching glTF2. | ||
|
||
### JSON Schema | ||
|
||
[glTF.KHR_audio.schema.json](/extensions/2.0/KHR_audio/schema/glTF.KHR_audio.schema.json) | ||
|
||
## Known Implementations | ||
|
||
* Third Room - https://github.com/thirdroom/thirdroom | ||
* Three Object Viewer (WordPress Plugin) - https://wordpress.org/plugins/three-object-viewer/ | ||
* UX3D Experimental C++ implementation - https://github.com/ux3d/OMI | ||
|
||
## Resources | ||
|
||
Prior Art: | ||
* [W3C Web Audio API](https://www.w3.org/TR/webaudio/) | ||
* [MSFT_audio_emitter](https://github.com/KhronosGroup/glTF/pull/1400) | ||
* [MOZ_hubs_components Audio](https://github.com/MozillaReality/hubs-blender-exporter/blob/04fc1d1/default-config.json#L298-L324) | ||
|