-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
immensely high amount of memory usage? #24
Comments
This is using oss output from wildmidi.c If I use a libtimidity (modified version from uhexen2 or the original v0.1 release) + its playmidi.c test app (modified for oss output), valgrind reports only 1.6 mb upon playing the same midi with the same patchset. The difference is, well, ugh.... |
Is this coming from the wildmidi or libwildmidi? I'm sure there is more room for improvement. Have you also tried with ALSA? Basically we need more profiling to figure out what is going on. |
On 2/22/14, Bret Curtis [email protected] wrote:
Seems to be coming from the library. (Verified that by replacing libtimidity ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 54 from 1) LEAK SUMMARY: |
I just used the WAV writer function and got this: ==13135== HEAP SUMMARY: We do a lot of allocate/free so we never get to the total usage size (thankfully), but you're right in that we are abusing the heap instead of reusing are already available buffers. This would gain us some speed too if we could do that. Also.. I see that we are not freeing everything, there is still one pesky buffer somewhere that isn't being freed before wildmidi (or lib) exit. |
Also.. I seem to also get this twice: ;) |
444 mb: ouch ouch ouch! (depends on the midi and especially on the gus patchset, but still..) And yeah, the duplicated msg is annoying too :) |
On 2/23/14, Bret Curtis [email protected] wrote:
It is in wildmidi.c:main() and is harmless (IIRC filename's copy, |
Just committed a fix for the duplicate message and a few other cosmetic things. I'll be adding a README.MD file soon to iterate over all contributers (including yourself). :) For the sake of good housekeeping, we should free those too, harmless or not. |
Also.. note, after your commits for fixing the audio auto-detection, we now get these: [ 85%] Building C object src/CMakeFiles/wildmidi_dynamic.dir/gus_pat.c.o We should probably put a guard around that header and look at include header looping. |
On 2/23/14, Bret Curtis [email protected] wrote:
I think guarding config.h would be a bad idea. What you have is a stale config.h in /include/config.h against |
Testing a patch for this |
OK, does #27 help with the config.h issues? |
The malloc madness seems to have started with wildmidi-0.2.3 With wildmidi-0.2.0: With wildmidi-0.2.2: With wildmidi-0.2.3: |
using sf.net svn, running under valgrind (linux x86_64, fedora 16) : svn r51 : svn r52-53: doesn't build (option macro changes, I think) http://sourceforge.net/p/wildmidi/svn/53/ : svn r 54 : (without the memory errors) So, basically, the madness begins with r52 (http://sourceforge.net/p/wildmidi/svn/52/), where WM_LC_Tokenize_Line() was added. I don't know exactly who is responsible yet. (I'd appreciate the help, of course.) |
Have a look at the code in WM_LC_Tokenize_Line... (char **token_data = NULL;) for every token found, the code does a realloc of the original pointer. A relloc will either take the existing pointer and shrink it, or create a new allocation, memcpy the old data and then free the old one. Since he is adding more to the token_data, it will create the amount of allocations and frees. We should either be allocating a proper buffer (as in a 4KiB) and only when it is full, realloc with an additional 4KiB instead of reallocating every iteration through the loop. I took time to think about the issue in general with some colleagues yesterday, the code is littered with on-the-fly allocations instead of using "large-enough" pre-allocated buffers (like pointed out above). |
On 2/27/14, Bret Curtis [email protected] wrote:
This was my initial suspicion too
Do you have patch so I can test too? |
I said think, I have no code yet... ;) If you haven't gotten to it yet, I'm going to dedicate some time to it this weekend. |
Hmmm, I would be the blame for the issue, looking at those svn commits to Ask if you need further explanation. Chris On Fri, Feb 28, 2014 at 12:56 AM, Bret Curtis [email protected]:
|
There is also a bug in svn r52, see if you can spot it :) will check githib If you want to implement our own malloc/realloc/free to use our own buffer Chris On Fri, Feb 28, 2014 at 2:57 PM, Chris Ison [email protected]:
|
The bug I saw in svn r52 has been fixed at some stage, github source reads On Fri, Feb 28, 2014 at 3:06 PM, Chris Ison [email protected]:
|
Working on a branch "memoryallocation" I managed to trim a bit from the tokenize function, but it hasn't helped much. Going to go through other areas as well. Curren output (in comparison with previous output): That is 147 less allocs/frees. :) |
I have something in the works here... we go from 11,000 allocs/frees to ~3000, from 400MiB allocated down to 92MiB... stay tuned. This is exciting :) |
Have a look at the "memoryallocation" branch, check it out and let me know what you think. There are still a few things I want to clean up... for instance, the 1 strdup. I think we can replace this with either a strcpy or a strdupa, but how everything works needs to be replaced. This is the last problem we have from valgrind and seems rather stupid. |
On 3/3/14, Bret Curtis [email protected] wrote:
Great. Will look when I get home,
The device name copy, yes? I think we can make a top level diff --git a/src/wildmidi.c b/src/wildmidi.c index 8edbff9..8f3c190 100644 --- a/src/wildmidi.c +++ b/src/wildmidi.c @@ -217,7 +217,7 @@ static int midi_test_max = 1; */ static unsigned int rate = 32072; -static char *pcmname = NULL; +static char pcmname[128]; static int (*send_output)(char * output_data, int output_size); static void (*close_output)(void); @@ -493,8 +493,8 @@ static int open_alsa_output(void) { unsigned int alsa_buffer_time; unsigned int alsa_period_time; - if (!pcmname) { - pcmname = strdup("default"); + if (!pcmname[0]) { + strcpy(pcmname, "default"); } if ((err = snd_pcm_open(&pcm, pcmname, SND_PCM_STREAM_PLAYBACK, 0)) < 0) { @@ -561,9 +561,6 @@ static int open_alsa_output(void) { send_output = write_alsa_output; close_output = close_alsa_output; pause_output = pause_output_nop; - if (pcmname != NULL) { - free(pcmname); - } return (0); fail: close_alsa_output(); @@ -635,8 +632,8 @@ static int open_oss_output(void) { int caps, rc, tmp; unsigned long int sz = sysconf(_SC_PAGESIZE); - if (!pcmname) { - pcmname = strdup("/dev/dsp"); + if (!pcmname[0]) { + strcpy(pcmname, "/dev/dsp"); } if ((audio_fd = open(pcmname, O_RDWR)) < 0) { @@ -994,6 +991,8 @@ int main(int argc, char **argv) { (void) tcsetattr(my_tty, TCSADRAIN, &_tty)) #endif /* !_WIN32, !__DJGPP__ */ + pcmname[0] = '\0'; + do_version(); while (1) { i = getopt_long(argc, argv, "vho:lr:c:m:btk:p:ed:wn", long_options, @@ -1036,7 +1035,8 @@ int main(int argc, char **argv) { fprintf(stderr, "Error: empty device name.\n"); return (0); } - pcmname = strdup(optarg); + strncpy(pcmname, optarg, 127); + pcmname[127] = '\0'; break; case 'e': /* Enhanced Resampling */ mixer_options |= WM_MO_ENHANCED_RESAMPLING; |
strdup in wildmidi_lib.c, config_dir But if we can get rid of stdup overall, that would be great. One less thing to support. |
@chris: Gauss, even if not used, still generates about 1024 reallocs... would be nice to only do that if/when gauss is used. I didn't see much usage from Linear. @sezero: I'm not liking the strdup stuff, I backed out that patch on my end because it was dubious. There is nothing wrong with strdup and it saves a few lines of code and allows us not to use arbitrary (magic) numbers on the length of a string. |
Weird that WM_GetOutput_Gauss uses ram even when its not called ... Anyways, I saw the way you changed things for the event struct memory usage and was wondering if you added a mix_buffer + mix_buffer_size to the mdi struct if you could do the same where the malloc is for both Linear and Gauss, taking out the free() at the bottom of both those and placing a free(mdi->mix_buffer) in the WildMidi_Close function; replacing the reference to out_buffer with mdi->mix_buffer and taking out the defining of out_buffer. example: in WM_GetOutput_Linear/Gauss() where it says tmp_buffer = malloc((size / 2) * sizeof(signed long int)); If you change it to say if ((size /2) > mdi->mix_buffer_size) { That may reduce the memory usage in wildmidi_lib a lot more; Chris |
On 3/3/14, Ozkan Sezer [email protected] wrote:
Tried slightly modifying your approach (you possibly considered it diff --git a/src/wildmidi_lib.c b/src/wildmidi_lib.c index 6799487..6de9f32 100644 --- a/src/wildmidi_lib.c +++ b/src/wildmidi_lib.c @@ -148,13 +148,14 @@ struct _event_data { unsigned long int data; }; +#define EVENTS_MEMCHUNK 8192 struct _mdi { int lock; unsigned long int samples_to_mix; struct _event *events; struct _event *current_event; unsigned long int event_count; - unsigned int events_size; // try to stay optimally ahead to prevent reallocs + unsigned int events_size; /* try to stay optimally ahead to prevent reallocs */ unsigned short midi_master_vol; struct _WM_Info info; @@ -511,7 +512,7 @@ static unsigned long int freq_table[] = { 837201792, 837685632, 838169728, static void WM_CheckEventMemoryPool(struct _mdi *mdi){ if (mdi->event_count >= mdi->events_size) { - mdi->events_size *= 2; + mdi->events_size += EVENTS_MEMCHUNK; mdi->events = realloc(mdi->events, (mdi->events_size * sizeof(struct _event))); } @@ -2289,8 +2290,8 @@ Init_MDI(void) { load_patch(mdi, 0x0000); - mdi->events_size = 4096; - mdi->events = malloc(mdi->events_size * sizeof(struct _event)); + mdi->events_size = EVENTS_MEMCHUNK; + mdi->events = malloc(EVENTS_MEMCHUNK * sizeof(struct _event)); mdi->events[0].do_event = NULL; mdi->events[0].event_data.channel = 0; mdi->events[0].event_data.data = 0; Result not much different: |
Yeah... mea culpa on magic numbers there. I'll take what you've done, pound away at it and smelt it into the branch. Thanks! :) I'll also setup a trigger for the gauss issue, that we only alloc when we want to use it and only once. I want to also try to do the gauss allocation in one go. @chrisisonwildcode: can you comment why in some parts of your code you do this: 1 << 10 |
Latest commit brings us down to: This is because I told libwildmidi not to init gauss (nor free) unless gauss was asked for. I also used part of your patch @sezero , thanks! :) @chrisisonwildcode : ^-- that wasn't the cause. Can you still tell me why you do 1 << 10 instead of 1024 or even use a define? |
@chrisisonwildcode I took your idea, ran with it a bit... and it is so full of win! default (linear): (gauss) I think it is safe to say, we're done here. :) I'm sure we could do a bit more work on gauss, but we've nailed this problem pretty good. Thanks guys! |
@psi29a the <<10 is part of the fixed point math thats done ... avoids using floats costing CPU cycles ... the <<10 was left in there even for 1<<10 to remind me that it is 22.10 fixed point. If you could refresh my memory as to what line number you are referring to I can explain it a little better maybe. And how does the performance compare with those changes in WM_GetOutput_*? |
In this case: As for profiling before and after, I haven't done so. I've just done valgrind at this point. If you could have a look at init_gauss() and see about cleaning up the realloc mess involving gauss_table That would get our allocs/frees down to the same level as linear. Thanks! :) |
Not at home at the moment but I will check this when I get home. I will be Chris Sent from my iPhone On 4 Mar 2014, at 9:38 pm, Bret Curtis [email protected] wrote: In this case: As for profiling before and after, I haven't done so. I've just done If you could have a look at init_gauss() and see about cleaning up the That would get our allocs/frees down to the same level as linear. Thanks! :) Reply to this email directly or view it on |
ok, here is what I am thinking. gauss_table is an array of 1024 pointers that points to a double array of so instead of doing the realloc this code: x_inc = 1.0 / (1 << 10); for (m = 0, x = 0.0; m < (1 << 10); m++, x += x_inc) { xz = (x + n_half) / (4 * M_PI); gptr = gauss_table[m] = realloc(gauss_table[m], (n + 1) * sizeof(double)); might be better looking like gauss_table = malloc(1024 * ((n+1) * sizeof(double))); x_inc = 1.0 / (1 << 10); for (m = 0, x = 0.0; m < (1 << 10); m++, x += x_inc) { xz = (x + n_half) / (4 * M_PI); gptr = &gauss_table[m * ((n + 1) * sizeof(double))]; This would mean the definition of gauss_table would need to change from static double gauss_table[(1 << 10)] = { 0 }; / don't need doubles */ to static double *gauss_table = NULL; and static void free_gauss(void) { int m; for (m = 0; m < (1 << 10); m++) { free(gauss_table[m]); gauss_table[m] = NULL; } } can just be replaced with free(gauss_table); and last but not least gptr = gauss_table[note_data->sample_pos & FPMASK]; now becomes gptr = &gauss_table[(note_data->sample_pos & FPMASK)*((gauss_n + 1) * If I missed anything feel free to scream at me, and this will need testing Chris On Wed, Mar 5, 2014 at 10:00 AM, Chris Ison [email protected]:
|
It is (very) similar to the approach I was working on over lunch. Great minds stink alike? ;) Do you have this in code? Feel free to make branch, commit. When finished, we can merge it in. I'm wondering if this even needs to be generated? If the values are stable, why not just generate it once and just set in a 2d array like with other tables? |
Getting into a similar issue as me: ==427== Invalid write of size 8 [ ] [100] [ 0m 44s Processed] [27%] - Line 241: ^-- is the one that is failing. |
Hmm, could try manually creating the 2d array, that would certainly make Chris Sent from my iPhone On 5 Mar 2014, at 7:17 pm, Bret Curtis [email protected] wrote: Getting into a similar issue as me: ==427== Invalid write of size 8 [ ] [100] [ 0m 44s Processed] [27%] - ==427== at 0x4E36363: init_gauss (wildmidi_lib.c:241) Line 241: ^-- is the one that is failing. Reply to this email directly or view it on |
Closing this as we've taken care of all the major offenders. |
On 3/6/14, Bret Curtis [email protected] wrote:
Chris' suggestion about gauss_table generation looked nice |
I agree... black magic. This particular problem will be dealt with one way or another. Either we pre-allocate the entire gauss_table with malloc or we craft the gauss_table by hand. I refuse to believe that we have to realloc for each entry. |
On 3/6/14, Bret Curtis [email protected] wrote:
OK, regarding Chris' suggestion for avoiding 1024 times realloc() Patch attached: please test, and please also forward to Chris for Regarding my concerns about commit 8356363 not being thread-safe (CC'ing Bret, because github unfathomably doesn't handle patch |
Latest commit brings me to: |
With latest patch you sent per email: Please go ahead and apply it. We only have 1 more alloc and we are now thread safe again. Great work! |
On 3/8/14, Bret Curtis [email protected] wrote:
Pushed the changes. |
Great! Thanks! |
Are we going to backport the malloc changes to 0.3, or are we going to leave it as is? |
I'm not sure, if you feel that it would have little impact (not cause any breakage), we can try to cherry-pick what we have over to 0.3 Feel free to do so. I do not see 0.4 being stabilized any time soon. :) |
On 3/8/14, Bret Curtis [email protected] wrote:
Malloc changes seem stable so far, so I pushed b519a70 backporting
What else do we have in store for it? |
Running wildmidi under valgrind, playing elise.mid (http://www.ee.iitb.ac.in/uma/~janas/music/Beethoven's%20Fur%20Elise.RMI, using instruments from here: http://sourceforge.net/project/downloading.php?group_id=124987&filename=timidity_patches.tar.gz), reports that 220 mb of memory allocated (during its lifetime of course, not necessarily all being held at the same time. ) I guess this is a bit too much?
The text was updated successfully, but these errors were encountered: