Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rendered MIDI file's audio export files are too long #1408

Open
ReinholdH opened this issue Oct 14, 2024 · 12 comments
Open

Rendered MIDI file's audio export files are too long #1408

ReinholdH opened this issue Oct 14, 2024 · 12 comments

Comments

@ReinholdH
Copy link

FluidSynth version

fluidsynth 2.3.1 and higher

Describe the bug

Rendered MIDI file's audio export files are too long

Expected behavior

This is a regression which came in with fluidsynth 2.3.1. The MIDI file Dance.mid is 1:36 long. But with fuidsynth 2.3.1 onwards an audio export file is 2:05 long. Between 1:36 and 2:05 there is silence.
At a first glance my gut feeling tells me that the regression came in with #1159 in fluidsynth 2.3.1 but I need to further investigate.

Steps to reproduce

Render the file Dance.mid from here into a wav file. Listen the exported wav files "dance - 16bit.wav" and "dance - 32bit.flac" from here

Additional context

Between 1:36 and 2:05 there is silence. In previous fluidsynth versions before 2.3.1, the exported wav files have the correct duration. The demos were created by es20490446e as part of #1405

@ReinholdH ReinholdH added the bug label Oct 14, 2024
@ReinholdH
Copy link
Author

Yes, the culprit is #1159.

Extending the playing in fluid_midi.c by (Code snippet from fluid_midi.c of fluidsynth 2.3.1)

    /* Once we've run out of MIDI events, keep playing until no voices are active */
    if(status == FLUID_PLAYER_DONE && fluid_synth_get_active_voice_count(player->synth) > 0)
    {
        /* The first time we notice we've run out of MIDI events but there are still active voices, disable all hold pedals */
        if(!player->end_pedals_disabled)
        {
            for(i = 0; i < synth->midi_channels; i++)
            {
                fluid_synth_cc(player->synth, i, SUSTAIN_SWITCH, 0);
                fluid_synth_cc(player->synth, i, SOSTENUTO_SWITCH, 0);
            }

            player->end_pedals_disabled = 1;
        }

        status = FLUID_PLAYER_PLAYING;
    }

causes the problem. Since 2.3.1 there have been further changes in this area which need to be validated.

The demo MIDI file Dance.mid is not only causing the problem. I tested other MIDI files, too. For all the tested MIDI files the duration is not properly determined. Some are 12 seconds longer, some 18 seconds, some even 1/2 minute like Dance.mid.
I come to the conclusion that this solution here is not appropriate. This regression bug is a showstopper for us.

@ReinholdH
Copy link
Author

For the MIDI file Dance.mid I did a deep dive in fluid_player_callback. Here are my findings:

At FLUID_PLAYER_DONE: msec = 94000 = 1:34 which is the correct end of the MIDI file
At no voices active: msec = 123251 = 2:03
Final end: 2:03 + FLUID_PLAYER_STOP_GRACE_MS = 2:05 what we see as the duration of the audio export.

Looks like that the criteria "no voices active" isn't the right criteria to keep player running. My 2 cents...

@ReinholdH
Copy link
Author

For completeness because the link above to download the MIDI file Dance.mid is off-line meanwhile. Dance.mid can be downloaded as part of the GeneralUse GS 2.0.0 package from here.

@ReinholdH
Copy link
Author

The issue is a general fluidsynth MIDI Player one not only relevant for file export. The basic MIDI Player from here (w/o reverb and chorus) plays for 2:05 long and not for 1:36. File rendering for audio export just makes the bug obvious.

@derselbst
Copy link
Member

I found some time to look into this.

The problem is that a sample "Standard Tom 5" of the General User v2 soundfont is playing for about 35 seconds in release phase. The change introduced in #1159 consciously waits for all voices to stop before quitting playback. But since the release phase is so long, it takes a while to get there.

Unfortunately, I don't see a quick solution to this. A proper way to deal with this would be to measure the amplitude of the generated audio at the very end, and if it falls below a certain amount, send an ALL_SOUND_OFF event to the synth.


Also, a few general thoughts: I cannot recommend using fluidsynth's MIDI player in any serious software. Originally, the player was intended for quick audio rehearsals from the commandline. It has grown since then. Various contributors have made changes, yet there is not a single unit test for it.

@ReinholdH
Copy link
Author

The problem is that a sample "Standard Tom 5" of the General User v2 soundfont is playing for about 35 seconds in release phase. The change introduced in #1159 consciously waits for all voices to stop before quitting playback. But since the release phase is so long, it takes a while to get there.

Thank you for your precise analysis. Changing something the the soundfont General User v2 won't probably help because there may be other soundfont with a similar behavior.

Unfortunately, I don't see a quick solution to this. A proper way to deal with this would be to measure the amplitude of the generated audio at the very end, and if it falls below a certain amount, send an ALL_SOUND_OFF event to the synth.

I understand. Therefore I recommend to add a similar statement ...

Also, a few general thoughts: I cannot recommend using fluidsynth's MIDI player in any serious software. Originally, the player was intended for quick audio rehearsals from the commandline. It has grown since then. Various contributors have made changes, yet there is not a single unit test for it.

... either here or here or another adequate place.

I leave it up to you @derselbst either to close #1408 or leave it open.

@mrbumpy409
Copy link
Contributor

One possible solution to this would be to add a configurable cutoff in seconds where rendering would cease cutoff seconds after the last received MIDI event.

The GeneralUser GS Standard Tom 5 is 2.3 seconds long and has a loop at the end. For melodic presets, I would handle this scenario by using the hold envelope phase for 2.3 seconds and then the decay phase would fade the loop out afterward. However, for percussion presets, many MIDI files use zero-length notes, so any and all sound shaping happens in the release phase. Too short of a release phase, and the sample fades out too quickly and gets buried in a mix.

The 40 second release time on the Standard Tom 5 is to keep that primary 2.3 seconds of sample playing reasonably loudly, but is mostly spent on a barely audible loop. This doesn't just cause a delay to the end of the audio file rendering, but also unnecessarily chews up polyphony during song playback. I will look at a more elegant solution to this problem for the next GeneralUser GS revision.

@derselbst
Copy link
Member

Thanks Chris for the insights. I can absolutely comprehend your approach.

One possible solution to this would be to add a configurable cutoff in seconds where rendering would cease cutoff se
conds after the last received MIDI event.

The contributor who first brought up this issue and had the same idea. I was and I am still against introducing additional parameters, because they don't "fix" this issue. I.e. you either end up ending the player too early or too late. It can and should be solved dynamically, depending on the generated amplitude at the very end.

@spessasus
Copy link
Contributor

I honestly don't know what the original reporter expects fluidsynth to do.
First of all,

what would you define as the duration of a MIDI file?

  1. If it's the last event received, then the exported audio will cut off at the last note-off, leaving no room for the audio to fade away. This is technically correct (we reached the end of the file) but it makes no sense musically.

  2. Then there's the current approach: wait until all voices are silent.

Now this approach begs the question:

what is defined as silence?

Perceived silence

According to @ReinholdH, it should not last until there actually is silence, but instead until it's inaudible. Because, when zooming into the waveform, you can see that:
image
There still is noise produced during the 40s release phase. It's effectively silent, but it technically isn't.

But that approach of killing a voice when it's not actually silent would lead to problems:
Haunt Muskie.rmi.zip
Note: remove .zip from the name, this is done to get this file accepted by github.

This MIDI file is super silent. It uses velocities around 10 for the subtle piano and strings, and to actually hear it properly, you have to set the gain to around 50 (!). Playing this file with the stock gain leads to pretty much no sound, when that's actually not the case.

SF2 silence

SF2 clearly defines what silence is. See section 9.1.7:

When 96 dB of attenuation is reached in the final gain amplifier, an abrupt jump to zero gain (infinite dB
of attenuation) occurs.

Fluidsynth follows that and stops the voice when such attenuation occurs, i.e. after the 40 seconds, hence the long export.

So this is not a bug IMO. Fluidsynth correctly waits for the voices to be silent in the sf2 terms and then ends the file. It's up to the external implementation to trim the file until the user thinks that it's silent.

@ReinholdH
Copy link
Author

Of course, a final solution needs to work for super silent midi files with low velocities as well.

Here is the motivation for the report of this issue:

The images below show the duration of the song Dance.mid from here for various soundfont.

GeneralUserGS-sf2

Musyng Kite-sf2

FluidR3_GM2-2 SF2

TimGM6mb-sf2

MS Basic-sf3

The length of the song strongly depends on the used soundfont.

We can now debate whether this is a bug nor not. All in all, in turns down to what the purpose of the fluidsynth's MIDI player is. If the objective of the fluidsynth's MIDI player is to validate and test the behavior of a certain combination of a MIDI file with a soundfont, whether it fulfills the sf2 spec etc, in a kind of clean room then, of course, the observation is not a bug. But if the objective of the fluidsynth's MIDI player is to support musicians in their various musical activities independent of what the soundfonts and settings are then a dependency of the duration of a song on the used soundfont is an issue. No end user (musician) understands this dependency. No musician understands that when a song is inaudible any longer that the song is not yet finished. For example an applause to a rock song starts earlier than to a classical symphony. But also a classical symphony is finished when it is inaudible for humans.

Whether the observation is a bug or enhancement I don't care but wanted to bring up my observation after #1159 was introduced. (btw. #1159 in a playlist only effects the last MIDI file).

IMO the statement

It's up to the external implementation to trim the file until the user thinks that it's silent.

would mean that a musician has to modify his/her song (MIDI file) based on what soundfont is used in fluidsynth, and may modify the song for other soundfonts or external MIDI devices. I am of an opposite opinion.

As for now I agree to the statement of @derselbst that

Also, a few general thoughts: I cannot recommend using fluidsynth's MIDI player in any serious software. Originally, the player was intended for quick audio rehearsals from the commandline. It has grown since then. Various contributors have made changes, yet there is not a single unit test for it.

I hope my report can help to further improve fluidsynth (which is a brilliant and professionally managed software - many thanks to @derselbst and everybody who contributes)'s MIDI player.

@mrbumpy409
Copy link
Contributor

In my case, I always need to edit the rendered audio anyway to trim out silence at the beginning and adjust volume, so also having to trim the end is a non-issue to me.

@derselbst
Copy link
Member

A proper way of addressing this issue might be the following:

  1. Continuously feed the rendered to libebur128
  2. When the last event has been received and all the pedals are lifted, query libebur128 about the loudness.
  3. If it's below 96dB (recommended by SF spec), send ALL_SOUND_OFF to quit rendering.

The advantage of using something like libebur128 would be that multiple loudness measurements are available out of the box. One could easily tweak around to see what works best.

The obvious disadvantage would be yet another dependency. In case that's not desired, one might duplicate the logic of whatever loudness metrics has been decided for in the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants