Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

immensely high amount of memory usage? #24

Closed
sezero opened this issue Feb 22, 2014 · 54 comments
Closed

immensely high amount of memory usage? #24

sezero opened this issue Feb 22, 2014 · 54 comments
Assignees

Comments

@sezero
Copy link
Contributor

sezero commented Feb 22, 2014

Running wildmidi under valgrind, playing elise.mid (http://www.ee.iitb.ac.in/uma/~janas/music/Beethoven's%20Fur%20Elise.RMI, using instruments from here: http://sourceforge.net/project/downloading.php?group_id=124987&filename=timidity_patches.tar.gz), reports that 220 mb of memory allocated (during its lifetime of course, not necessarily all being held at the same time. ) I guess this is a bit too much?

@sezero
Copy link
Contributor Author

sezero commented Feb 22, 2014

This is using oss output from wildmidi.c If I use a libtimidity (modified version from uhexen2 or the original v0.1 release) + its playmidi.c test app (modified for oss output), valgrind reports only 1.6 mb upon playing the same midi with the same patchset. The difference is, well, ugh....

@psi29a
Copy link
Member

psi29a commented Feb 22, 2014

Is this coming from the wildmidi or libwildmidi? I'm sure there is more room for improvement. Have you also tried with ALSA? Basically we need more profiling to figure out what is going on.

@sezero
Copy link
Contributor Author

sezero commented Feb 22, 2014

On 2/22/14, Bret Curtis [email protected] wrote:

Is this coming from the wildmidi or libwildmidi? I'm sure there is more room
for improvement. Have you also tried with ALSA? Basically we need more
profiling to figure out what is going on.

Seems to be coming from the library. (Verified that by replacing libtimidity
calls with libWildMidi calls in the test app from libtimidity and re-running
under valgrind.) Using alsa instead of OSS doesn't change the result.

ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 54 from 1)
malloc/free: in use at exit: 48,919 bytes in 1,290 blocks.
malloc/free: 9,063 allocs, 7,773 frees, 220,744,128 bytes allocated.
For counts of detected errors, rerun with: -v
searching for pointers to 1,290 not-freed blocks.
checked 235,544 bytes.

LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks.
possibly lost: 10,748 bytes in 299 blocks.
still reachable: 38,171 bytes in 991 blocks.
suppressed: 0 bytes in 0 blocks.

@psi29a
Copy link
Member

psi29a commented Feb 23, 2014

I just used the WAV writer function and got this:

==13135== HEAP SUMMARY:
==13135== in use at exit: 28 bytes in 1 blocks
==13135== total heap usage: 11,094 allocs, 11,093 frees, 444,296,561 bytes allocated
==13135==
==13135== LEAK SUMMARY:
==13135== definitely lost: 0 bytes in 0 blocks
==13135== indirectly lost: 0 bytes in 0 blocks
==13135== possibly lost: 0 bytes in 0 blocks
==13135== still reachable: 28 bytes in 1 blocks
==13135== suppressed: 0 bytes in 0 blocks
==13135== Rerun with --leak-check=full to see details of leaked memory
==13135==
==13135== For counts of detected and suppressed errors, rerun with: -v
==13135== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

We do a lot of allocate/free so we never get to the total usage size (thankfully), but you're right in that we are abusing the heap instead of reusing are already available buffers. This would gain us some speed too if we could do that.

Also.. I see that we are not freeing everything, there is still one pesky buffer somewhere that isn't being freed before wildmidi (or lib) exit.

@psi29a
Copy link
Member

psi29a commented Feb 23, 2014

Also.. I seem to also get this twice:
Shutting down Sound System
Shutting Down Sound System

;)

@sezero
Copy link
Contributor Author

sezero commented Feb 23, 2014

444 mb: ouch ouch ouch! (depends on the midi and especially on the gus patchset, but still..)

And yeah, the duplicated msg is annoying too :)

@sezero
Copy link
Contributor Author

sezero commented Feb 23, 2014

On 2/23/14, Bret Curtis [email protected] wrote:

Also.. I see that we are not freeing everything, there is still one pesky
buffer somewhere that isn't being freed before wildmidi (or lib) exit.

It is in wildmidi.c:main() and is harmless (IIRC filename's copy,
device name's copy, etc are left unfreed.)

@psi29a
Copy link
Member

psi29a commented Feb 23, 2014

Just committed a fix for the duplicate message and a few other cosmetic things. I'll be adding a README.MD file soon to iterate over all contributers (including yourself). :)

For the sake of good housekeeping, we should free those too, harmless or not.

@psi29a
Copy link
Member

psi29a commented Feb 23, 2014

Also.. note, after your commits for fixing the audio auto-detection, we now get these:

[ 85%] Building C object src/CMakeFiles/wildmidi_dynamic.dir/gus_pat.c.o
In file included from /home/bcurtis/workspace/Wildmidi/wildmidi/src/wildmidi_lib.c:62:0:
/home/bcurtis/workspace/Wildmidi/build/include/config.h:10:0: warning: "PACKAGE_BUGREPORT" redefined [enabled by default]
In file included from /home/bcurtis/workspace/Wildmidi/wildmidi/include/common.h:26:0,
from /home/bcurtis/workspace/Wildmidi/wildmidi/src/wildmidi_lib.c:61:
/home/bcurtis/workspace/Wildmidi/wildmidi/include/config.h:10:0: note: this is the location of the previous definition

We should probably put a guard around that header and look at include header looping.

@sezero
Copy link
Contributor Author

sezero commented Feb 23, 2014

On 2/23/14, Bret Curtis [email protected] wrote:

Also.. note, after your commits for fixing the audio auto-detection, we now
get these:

[ 85%] Building C object src/CMakeFiles/wildmidi_dynamic.dir/gus_pat.c.o
In file included from
/home/bcurtis/workspace/Wildmidi/wildmidi/src/wildmidi_lib.c:62:0:
/home/bcurtis/workspace/Wildmidi/build/include/config.h:10:0: warning:
"PACKAGE_BUGREPORT" redefined [enabled by default]
In file included from
/home/bcurtis/workspace/Wildmidi/wildmidi/include/common.h:26:0,
from
/home/bcurtis/workspace/Wildmidi/wildmidi/src/wildmidi_lib.c:61:
/home/bcurtis/workspace/Wildmidi/wildmidi/include/config.h:10:0: note: this
is the location of the previous definition

We should probably put a guard around that header and look at include header
looping.

I think guarding config.h would be a bad idea.

What you have is a stale config.h in /include/config.h against
freshly generated /include/config.h. However, how does it
happen that both of them are getting included? /include/ is
above /include/ in the cmake file, but it isn't preventing it.

@sezero
Copy link
Contributor Author

sezero commented Feb 23, 2014

What you have is a stale config.h in /include/config.h against
freshly generated /include/config.h. However, how does it
happen that both of them are getting included? /include/ is
above /include/ in the cmake file, but it isn't preventing it.

Testing a patch for this

@sezero
Copy link
Contributor Author

sezero commented Feb 23, 2014

OK, does #27 help with the config.h issues?

This was referenced Feb 23, 2014
@sezero
Copy link
Contributor Author

sezero commented Feb 26, 2014

The malloc madness seems to have started with wildmidi-0.2.3

With wildmidi-0.2.0:
==5454== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 13 from 1)
==5454== malloc/free: in use at exit: 68,404 bytes in 8 blocks.
==5454== malloc/free: 613 allocs, 605 frees, 2,308,259 bytes allocated.
==5454== For counts of detected errors, rerun with: -v
==5454== searching for pointers to 8 not-freed blocks.
==5454== checked 60,592 bytes.
==5454==
==5454== LEAK SUMMARY:
==5454== definitely lost: 68,404 bytes in 8 blocks.
==5454== possibly lost: 0 bytes in 0 blocks.
==5454== still reachable: 0 bytes in 0 blocks.
==5454== suppressed: 0 bytes in 0 blocks.
==5454== Rerun with --leak-check=full to see details of leaked memory.

With wildmidi-0.2.2:
==5452== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 13 from 1)
==5452== malloc/free: in use at exit: 143,373 bytes in 1,025 blocks.
==5452== malloc/free: 1,638 allocs, 613 frees, 2,451,632 bytes allocated.
==5452== For counts of detected errors, rerun with: -v
==5452== searching for pointers to 1,025 not-freed blocks.
==5452== checked 236,732 bytes.
==5452==
==5452== LEAK SUMMARY:
==5452== definitely lost: 0 bytes in 0 blocks.
==5452== possibly lost: 0 bytes in 0 blocks.
==5452== still reachable: 143,373 bytes in 1,025 blocks.
==5452== suppressed: 0 bytes in 0 blocks.
==5452== Rerun with --leak-check=full to see details of leaked memory.

With wildmidi-0.2.3:
==9867== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 13 from 1)
==9867== malloc/free: in use at exit: 13 bytes in 1 blocks.
==9867== malloc/free: 8,203 allocs, 8,202 frees, 260,711,580 bytes allocated.
==9867== For counts of detected errors, rerun with: -v
==9867== searching for pointers to 1 not-freed blocks.
==9867== checked 93,372 bytes.
==9867==
==9867== LEAK SUMMARY:
==9867== definitely lost: 0 bytes in 0 blocks.
==9867== possibly lost: 0 bytes in 0 blocks.
==9867== still reachable: 13 bytes in 1 blocks.
==9867== suppressed: 0 bytes in 0 blocks.
==9867== Rerun with --leak-check=full to see details of leaked memory.

@sezero
Copy link
Contributor Author

sezero commented Feb 27, 2014

using sf.net svn, running under valgrind (linux x86_64, fedora 16) :

svn r51 :
==13541== total heap usage: 1,630 allocs, 1,629 frees, 2,755,338 bytes allocated

svn r52-53: doesn't build (option macro changes, I think)
http://sourceforge.net/p/wildmidi/svn/52/ :
modified midi event handling
modified options and error messages

http://sourceforge.net/p/wildmidi/svn/53/ :
http://sourceforge.net/p/wildmidi/svn/54/ :
modified options

svn r 54 : (without the memory errors)
==15728== total heap usage: 8,200 allocs, 8,197 frees, 519,193,574 bytes allocated

So, basically, the madness begins with r52 (http://sourceforge.net/p/wildmidi/svn/52/), where WM_LC_Tokenize_Line() was added. I don't know exactly who is responsible yet. (I'd appreciate the help, of course.)

@psi29a
Copy link
Member

psi29a commented Feb 27, 2014

Have a look at the code in WM_LC_Tokenize_Line... (char **token_data = NULL;) for every token found, the code does a realloc of the original pointer. A relloc will either take the existing pointer and shrink it, or create a new allocation, memcpy the old data and then free the old one.

Since he is adding more to the token_data, it will create the amount of allocations and frees.

We should either be allocating a proper buffer (as in a 4KiB) and only when it is full, realloc with an additional 4KiB instead of reallocating every iteration through the loop.

I took time to think about the issue in general with some colleagues yesterday, the code is littered with on-the-fly allocations instead of using "large-enough" pre-allocated buffers (like pointed out above).

@sezero
Copy link
Contributor Author

sezero commented Feb 27, 2014

On 2/27/14, Bret Curtis [email protected] wrote:

Have a look at the code in WM_LC_Tokenize_Line... for every token found, the
code does an realloc of the original pointer. A relloc will either take the
existing pointer and shrink it, or create a new allocation and free the old
one.

Since he is adding more to the token_data, it will create the amount of
allocations and frees.

This was my initial suspicion too

We should either be allocating a proper buffer (as in a 4KiB) and only when
it is full, realloc instead of every since iteration through the loop.

I took time to think about the issue in general with some colleagues
yesterday, the code is littered with on-the-fly allocations instead of using
"large-enough" pre-allocated buffers (like pointed out above).

Do you have patch so I can test too?

@psi29a
Copy link
Member

psi29a commented Feb 27, 2014

I said think, I have no code yet... ;) If you haven't gotten to it yet, I'm going to dedicate some time to it this weekend.

@chrisisonwildcode
Copy link
Contributor

Hmmm, I would be the blame for the issue, looking at those svn commits to
see if anything else stands out as well. One thing that it could be is that
a temp buffer was put into the getoutput functions. This was to fix an
issue where the the mix of samples would go over the 16bit signed limit and
create really bad artefacts due to the unpredictable clipping results of
going over the 16bit signed limits. The buffer was implemented allow the
mix to go over the 16bit signed limits and to predictably clip the value
back to the limits.

Ask if you need further explanation.

Chris

On Fri, Feb 28, 2014 at 12:56 AM, Bret Curtis [email protected]:

I said think, I have no code yet... ;) If you haven't gotten to it yet,
I'm going to dedicate some time to it this weekend.

Reply to this email directly or view it on GitHubhttps://github.com//issues/24#issuecomment-36249558
.

@chrisisonwildcode
Copy link
Contributor

There is also a bug in svn r52, see if you can spot it :) will check githib
to see if it was ever spotted and fixed.

If you want to implement our own malloc/realloc/free to use our own buffer
to reduce the amount of malloc/realloc/free calls to the system it would be
great. This is something I thought about myself from time to time but was
not sure I could do as efficiently as the system calls.

Chris

On Fri, Feb 28, 2014 at 2:57 PM, Chris Ison [email protected]:

Hmmm, I would be the blame for the issue, looking at those svn commits to
see if anything else stands out as well. One thing that it could be is that
a temp buffer was put into the getoutput functions. This was to fix an
issue where the the mix of samples would go over the 16bit signed limit and
create really bad artefacts due to the unpredictable clipping results of
going over the 16bit signed limits. The buffer was implemented allow the
mix to go over the 16bit signed limits and to predictably clip the value
back to the limits.

Ask if you need further explanation.

Chris

On Fri, Feb 28, 2014 at 12:56 AM, Bret Curtis [email protected]:

I said think, I have no code yet... ;) If you haven't gotten to it yet,
I'm going to dedicate some time to it this weekend.

Reply to this email directly or view it on GitHubhttps://github.com//issues/24#issuecomment-36249558
.

@chrisisonwildcode
Copy link
Contributor

The bug I saw in svn r52 has been fixed at some stage, github source reads
as it should (in the getoutput functions).

On Fri, Feb 28, 2014 at 3:06 PM, Chris Ison [email protected]:

There is also a bug in svn r52, see if you can spot it :) will check
githib to see if it was ever spotted and fixed.

If you want to implement our own malloc/realloc/free to use our own buffer
to reduce the amount of malloc/realloc/free calls to the system it would be
great. This is something I thought about myself from time to time but was
not sure I could do as efficiently as the system calls.

Chris

On Fri, Feb 28, 2014 at 2:57 PM, Chris Ison [email protected]:

Hmmm, I would be the blame for the issue, looking at those svn commits to
see if anything else stands out as well. One thing that it could be is that
a temp buffer was put into the getoutput functions. This was to fix an
issue where the the mix of samples would go over the 16bit signed limit and
create really bad artefacts due to the unpredictable clipping results of
going over the 16bit signed limits. The buffer was implemented allow the
mix to go over the 16bit signed limits and to predictably clip the value
back to the limits.

Ask if you need further explanation.

Chris

On Fri, Feb 28, 2014 at 12:56 AM, Bret Curtis [email protected]:

I said think, I have no code yet... ;) If you haven't gotten to it yet,
I'm going to dedicate some time to it this weekend.

Reply to this email directly or view it on GitHubhttps://github.com//issues/24#issuecomment-36249558
.

@psi29a
Copy link
Member

psi29a commented Mar 1, 2014

Working on a branch "memoryallocation"

I managed to trim a bit from the tokenize function, but it hasn't helped much. Going to go through other areas as well.

Curren output (in comparison with previous output):
==15597== total heap usage: 10,947 allocs, 10,946 frees, 444,328,337 bytes allocated

That is 147 less allocs/frees. :)

@psi29a
Copy link
Member

psi29a commented Mar 3, 2014

I have something in the works here... we go from 11,000 allocs/frees to ~3000, from 400MiB allocated down to 92MiB... stay tuned. This is exciting :)

@psi29a
Copy link
Member

psi29a commented Mar 3, 2014

Have a look at the "memoryallocation" branch, check it out and let me know what you think.

There are still a few things I want to clean up... for instance, the 1 strdup. I think we can replace this with either a strcpy or a strdupa, but how everything works needs to be replaced. This is the last problem we have from valgrind and seems rather stupid.

@sezero
Copy link
Contributor Author

sezero commented Mar 3, 2014

On 3/3/14, Bret Curtis [email protected] wrote:

Have a look at the "memoryallocation" branch, check it out and let me know
what you think.

Great. Will look when I get home,

There are still a few things I want to clean up... for instance, the 1
strdup. I think we can replace this with either a strcpy or a strdupa, but
how everything works needs to be replaced. This is the last problem we have
from valgrind and seems rather stupid.

The device name copy, yes? I think we can make a top level
static char pcmname[128]; do a pcmname[0] = '\0' early in main(),
and if the user does a -d option then strcpy it, or in case of
no -d, then let the drivers strcpy their default, and get rid of
the strdup. Like:

diff --git a/src/wildmidi.c b/src/wildmidi.c
index 8edbff9..8f3c190 100644
--- a/src/wildmidi.c
+++ b/src/wildmidi.c
@@ -217,7 +217,7 @@ static int midi_test_max = 1;
  */
 static unsigned int rate = 32072;
-static char *pcmname = NULL;
+static char pcmname[128];
 static int (*send_output)(char * output_data, int output_size);
 static void (*close_output)(void);
@@ -493,8 +493,8 @@ static int open_alsa_output(void) {
    unsigned int alsa_buffer_time;
    unsigned int alsa_period_time;
-   if (!pcmname) {
-       pcmname = strdup("default");
+   if (!pcmname[0]) {
+       strcpy(pcmname, "default");
    }
    if ((err = snd_pcm_open(&pcm, pcmname, SND_PCM_STREAM_PLAYBACK, 0)) < 0) {
@@ -561,9 +561,6 @@ static int open_alsa_output(void) {
    send_output = write_alsa_output;
    close_output = close_alsa_output;
    pause_output = pause_output_nop;
-   if (pcmname != NULL) {
-       free(pcmname);
-   }
    return (0);
 fail:  close_alsa_output();
@@ -635,8 +632,8 @@ static int open_oss_output(void) {
    int caps, rc, tmp;
    unsigned long int sz = sysconf(_SC_PAGESIZE);
-   if (!pcmname) {
-       pcmname = strdup("/dev/dsp");
+   if (!pcmname[0]) {
+       strcpy(pcmname, "/dev/dsp");
    }
    if ((audio_fd = open(pcmname, O_RDWR)) < 0) {
@@ -994,6 +991,8 @@ int main(int argc, char **argv) {
        (void) tcsetattr(my_tty, TCSADRAIN, &_tty))
 #endif /* !_WIN32, !__DJGPP__ */
+   pcmname[0] = '\0';
+
    do_version();
    while (1) {
        i = getopt_long(argc, argv, "vho:lr:c:m:btk:p:ed:wn", long_options,
@@ -1036,7 +1035,8 @@ int main(int argc, char **argv) {
                fprintf(stderr, "Error: empty device name.\n");
                return (0);
            }
-           pcmname = strdup(optarg);
+           strncpy(pcmname, optarg, 127);
+           pcmname[127] = '\0';
            break;
        case 'e': /* Enhanced Resampling */
            mixer_options |= WM_MO_ENHANCED_RESAMPLING;

@psi29a
Copy link
Member

psi29a commented Mar 3, 2014

strdup in wildmidi_lib.c, config_dir

But if we can get rid of stdup overall, that would be great. One less thing to support.

@psi29a
Copy link
Member

psi29a commented Mar 3, 2014

@chris: Gauss, even if not used, still generates about 1024 reallocs... would be nice to only do that if/when gauss is used. I didn't see much usage from Linear.

@sezero: I'm not liking the strdup stuff, I backed out that patch on my end because it was dubious. There is nothing wrong with strdup and it saves a few lines of code and allows us not to use arbitrary (magic) numbers on the length of a string.

@psi29a psi29a added this to the 0.4 - Now with more support! milestone Mar 3, 2014
@psi29a psi29a self-assigned this Mar 3, 2014
@chrisisonwildcode
Copy link
Contributor

Weird that WM_GetOutput_Gauss uses ram even when its not called ...

Anyways, I saw the way you changed things for the event struct memory usage and was wondering if you added a mix_buffer + mix_buffer_size to the mdi struct if you could do the same where the malloc is for both Linear and Gauss, taking out the free() at the bottom of both those and placing a free(mdi->mix_buffer) in the WildMidi_Close function; replacing the reference to out_buffer with mdi->mix_buffer and taking out the defining of out_buffer.

example:

in WM_GetOutput_Linear/Gauss() where it says

tmp_buffer = malloc((size / 2) * sizeof(signed long int));

If you change it to say

if ((size /2) > mdi->mix_buffer_size) {
if ((size / 2) <= (mdi->mix_buffer_size * 2) {
mdi->mix_buffer_size *= 2;
} else {
mdi->mix_buffer_size = size / 2;
}
mdi->mix_buffer = realloc(mdi->mix_buffer, mdi->mix_buffer_size * sizeof(signed long int));
}
tmp_buffer = mdi->mix_buffer;

That may reduce the memory usage in wildmidi_lib a lot more;

Chris

@sezero
Copy link
Contributor Author

sezero commented Mar 4, 2014

On 3/3/14, Ozkan Sezer [email protected] wrote:

On 3/3/14, Bret Curtis [email protected] wrote:

Master branch:
==26271== HEAP SUMMARY:
==26271== in use at exit: 28 bytes in 1 blocks
==26271== total heap usage: 11,094 allocs, 11,093 frees, 444,296,561
bytes
allocated

memoryallocation branch:
==29941== HEAP SUMMARY:
==29941== in use at exit: 0 bytes in 0 blocks
==29941== total heap usage: 2,920 allocs, 2,920 frees, 92,775,456 bytes
allocated

BOOM! Headshot!

Fedora-16 (x86_64), wav generation:

==3243== in use at exit: 13 bytes in 1 blocks
==3243== total heap usage: 8,173 allocs, 8,172 frees, 517,534,081
bytes allocated

==3250== in use at exit: 0 bytes in 0 blocks
==3250== total heap usage: 3,229 allocs, 3,229 frees, 82,720,609
bytes allocated

Didn't look at the changes themselves, but the result seems to be
going at a good direction! Still needs more work though, esp. when
compared against wildmidi-0.2.2 (or wildmidi-svn-r51.)

Tried slightly modifying your approach (you possibly considered it
yourself, but anyway.) Like: (patch against your branch)

diff --git a/src/wildmidi_lib.c b/src/wildmidi_lib.c
index 6799487..6de9f32 100644
--- a/src/wildmidi_lib.c
+++ b/src/wildmidi_lib.c
@@ -148,13 +148,14 @@ struct _event_data {
    unsigned long int data;
 };
+#define EVENTS_MEMCHUNK 8192
 struct _mdi {
    int lock;
    unsigned long int samples_to_mix;
    struct _event *events;
    struct _event *current_event;
    unsigned long int event_count;
-   unsigned int events_size;   // try to stay optimally ahead to prevent reallocs
+   unsigned int events_size;   /* try to stay optimally ahead to prevent
reallocs */
    unsigned short midi_master_vol;
    struct _WM_Info info;
@@ -511,7 +512,7 @@ static unsigned long int freq_table[] = {
837201792, 837685632, 838169728,
 static void WM_CheckEventMemoryPool(struct _mdi *mdi){
    if (mdi->event_count >= mdi->events_size) {
-       mdi->events_size *= 2;
+       mdi->events_size += EVENTS_MEMCHUNK;
        mdi->events = realloc(mdi->events,
            (mdi->events_size * sizeof(struct _event)));
    }
@@ -2289,8 +2290,8 @@ Init_MDI(void) {
    load_patch(mdi, 0x0000);
-   mdi->events_size = 4096;
-   mdi->events = malloc(mdi->events_size * sizeof(struct _event));
+   mdi->events_size = EVENTS_MEMCHUNK;
+   mdi->events = malloc(EVENTS_MEMCHUNK * sizeof(struct _event));
    mdi->events[0].do_event = NULL;
    mdi->events[0].event_data.channel = 0;
    mdi->events[0].event_data.data = 0;

Result not much different:
==4204== in use at exit: 0 bytes in 0 blocks
==4204== total heap usage: 3,228 allocs, 3,228 frees, 82,556,769
bytes allocated

@psi29a
Copy link
Member

psi29a commented Mar 4, 2014

Yeah... mea culpa on magic numbers there. I'll take what you've done, pound away at it and smelt it into the branch. Thanks! :)

I'll also setup a trigger for the gauss issue, that we only alloc when we want to use it and only once.

I want to also try to do the gauss allocation in one go.

@chrisisonwildcode: can you comment why in some parts of your code you do this: 1 << 10
This always gives 1024, so why the extra computation?

@psi29a
Copy link
Member

psi29a commented Mar 4, 2014

Latest commit brings us down to:
==8544== HEAP SUMMARY:
==8544== in use at exit: 0 bytes in 0 blocks
==8544== total heap usage: 1,895 allocs, 1,895 frees, 92,324,896 bytes allocated

This is because I told libwildmidi not to init gauss (nor free) unless gauss was asked for. I also used part of your patch @sezero , thanks! :)

@chrisisonwildcode : ^-- that wasn't the cause. Can you still tell me why you do 1 << 10 instead of 1024 or even use a define?

@psi29a
Copy link
Member

psi29a commented Mar 4, 2014

@chrisisonwildcode I took your idea, ran with it a bit... and it is so full of win!

default (linear):
==28041== total heap usage: 617 allocs, 617 frees, 8,571,152 bytes allocated

(gauss)
==27547== total heap usage: 1,641 allocs, 1,641 frees, 8,857,872 bytes allocated

I think it is safe to say, we're done here. :) I'm sure we could do a bit more work on gauss, but we've nailed this problem pretty good. Thanks guys!

@chrisisonwildcode
Copy link
Contributor

@psi29a the <<10 is part of the fixed point math thats done ... avoids using floats costing CPU cycles ... the <<10 was left in there even for 1<<10 to remind me that it is 22.10 fixed point. If you could refresh my memory as to what line number you are referring to I can explain it a little better maybe.

And how does the performance compare with those changes in WM_GetOutput_*?

@psi29a
Copy link
Member

psi29a commented Mar 4, 2014

In this case:
static double gauss_table[(1 << 10)] = { 0 }; / don't need doubles */
^-- 1 << 10 will resolve to 1024
Not a big deal as this only runs once.

As for profiling before and after, I haven't done so. I've just done valgrind at this point.

If you could have a look at init_gauss() and see about cleaning up the realloc mess involving gauss_table

That would get our allocs/frees down to the same level as linear. Thanks! :)

@chrisisonwildcode
Copy link
Contributor

Not at home at the moment but I will check this when I get home. I will be
looking at reducing the realloc calls if possible however I don't know how
much memory usage would be reduced by due to the table of info that is also
created in this function. Will look closely at it tonight when I get home
and post any suggestion I can come up with.

Chris

Sent from my iPhone

On 4 Mar 2014, at 9:38 pm, Bret Curtis [email protected] wrote:

In this case:
static double gauss_table[(1 << 10)] = { 0 }; / don't need doubles */
^-- 1 << 10 will resolve to 1024
Not a big deal as this only runs once.

As for profiling before and after, I haven't done so. I've just done
valgrind at this point.

If you could have a look at init_gauss() and see about cleaning up the
realloc mess involving gauss_table

That would get our allocs/frees down to the same level as linear. Thanks! :)

Reply to this email directly or view it on
GitHubhttps://github.com//issues/24#issuecomment-36615799
.

@chrisisonwildcode
Copy link
Contributor

ok, here is what I am thinking.

gauss_table is an array of 1024 pointers that points to a double array of
35 doubles ... this array is essential ...

so instead of doing the realloc this code:

x_inc = 1.0 / (1 << 10);

for (m = 0, x = 0.0; m < (1 << 10); m++, x += x_inc) {

xz = (x + n_half) / (4 * M_PI);

gptr = gauss_table[m] = realloc(gauss_table[m],

(n + 1) * sizeof(double));

might be better looking like

gauss_table = malloc(1024 * ((n+1) * sizeof(double)));

x_inc = 1.0 / (1 << 10);

for (m = 0, x = 0.0; m < (1 << 10); m++, x += x_inc) {

xz = (x + n_half) / (4 * M_PI);

gptr = &gauss_table[m * ((n + 1) * sizeof(double))];

This would mean the definition of gauss_table would need to change from

static double gauss_table[(1 << 10)] = { 0 }; / don't need doubles */

to

static double *gauss_table = NULL;

and

static void free_gauss(void) {

int m;

for (m = 0; m < (1 << 10); m++) {

free(gauss_table[m]);

gauss_table[m] = NULL;

}

}

can just be replaced with

free(gauss_table);

and last but not least

gptr = gauss_table[note_data->sample_pos & FPMASK];

now becomes

gptr = &gauss_table[(note_data->sample_pos & FPMASK)*((gauss_n + 1) *
sizeof(double))];

If I missed anything feel free to scream at me, and this will need testing
of course to make sure I worked out the changes correctly.

Chris

On Wed, Mar 5, 2014 at 10:00 AM, Chris Ison [email protected]:

Not at home at the moment but I will check this when I get home. I will be
looking at reducing the realloc calls if possible however I don't know how
much memory usage would be reduced by due to the table of info that is also
created in this function. Will look closely at it tonight when I get home
and post any suggestion I can come up with.

Chris

Sent from my iPhone

On 4 Mar 2014, at 9:38 pm, Bret Curtis [email protected] wrote:

In this case:
static double gauss_table[(1 << 10)] = { 0 }; / don't need doubles */
^-- 1 << 10 will resolve to 1024
Not a big deal as this only runs once.

As for profiling before and after, I haven't done so. I've just done
valgrind at this point.

If you could have a look at init_gauss() and see about cleaning up the
realloc mess involving gauss_table

That would get our allocs/frees down to the same level as linear. Thanks!
:)

Reply to this email directly or view it on GitHubhttps://github.com//issues/24#issuecomment-36615799
.

@psi29a
Copy link
Member

psi29a commented Mar 5, 2014

It is (very) similar to the approach I was working on over lunch. Great minds stink alike? ;)

Do you have this in code? Feel free to make branch, commit. When finished, we can merge it in.

I'm wondering if this even needs to be generated? If the values are stable, why not just generate it once and just set in a 2d array like with other tables?

@psi29a
Copy link
Member

psi29a commented Mar 5, 2014

Getting into a similar issue as me:

==427== Invalid write of size 8 [ ] [100] [ 0m 44s Processed] [27%] -
==427== at 0x4E36363: init_gauss (wildmidi_lib.c:241)
==427== by 0x4E41A77: WildMidi_GetOutput (wildmidi_lib.c:3894)
==427== by 0x40344B: main (wildmidi.c:1262)
==427== Address 0x68bd160 is 0 bytes after a block of size 286,720 alloc'd
==427== at 0x4C2A2DB: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

Line 241:
*gptr++ = ck;

^-- is the one that is failing.

@chrisisonwildcode
Copy link
Contributor

Hmm, could try manually creating the 2d array, that would certainly make
things easier ...

Chris

Sent from my iPhone

On 5 Mar 2014, at 7:17 pm, Bret Curtis [email protected] wrote:

Getting into a similar issue as me:

==427== Invalid write of size 8 [ ] [100] [ 0m 44s Processed] [27%] -

==427== at 0x4E36363: init_gauss (wildmidi_lib.c:241)
==427== by 0x4E41A77: WildMidi_GetOutput (wildmidi_lib.c:3894)
==427== by 0x40344B: main (wildmidi.c:1262)
==427== Address 0x68bd160 is 0 bytes after a block of size 286,720 alloc'd
==427== at 0x4C2A2DB: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

Line 241:
*gptr++ = ck;

^-- is the one that is failing.

Reply to this email directly or view it on
GitHubhttps://github.com//issues/24#issuecomment-36723161
.

@psi29a
Copy link
Member

psi29a commented Mar 5, 2014

Closing this as we've taken care of all the major offenders.

@psi29a psi29a closed this as completed Mar 5, 2014
@sezero
Copy link
Contributor Author

sezero commented Mar 6, 2014

On 3/6/14, Bret Curtis [email protected] wrote:

Closing this as we've taken care of all the major offenders.

Chris' suggestion about gauss_table generation looked nice
but I was able to reproduce the same segv you reported.
The code there is evil, using some magic numbers, I am trying
to follow it but having a hard time yet.

@psi29a
Copy link
Member

psi29a commented Mar 6, 2014

I agree... black magic. This particular problem will be dealt with one way or another. Either we pre-allocate the entire gauss_table with malloc or we craft the gauss_table by hand. I refuse to believe that we have to realloc for each entry.

@sezero
Copy link
Contributor Author

sezero commented Mar 7, 2014

On 3/6/14, Bret Curtis [email protected] wrote:

I agree... black magic. This particular problem will be dealt with one
way or another. Either we pre-allocate the entire gauss_table with malloc
or we craft the gauss_table by hand. I refuse to believe that we have to
realloc for each entry.

OK, regarding Chris' suggestion for avoiding 1024 times realloc()
from init_gauss(), found the reason for segfault and it seems to
work now: the gptr assignment has an accidental sizeof(double)
multiplier in the index value (possibly copy/paste error from
malloc size calculation).

Patch attached: please test, and please also forward to Chris for
an extra review/test if he's able to (don't have his mail address.)

Regarding my concerns about commit 8356363 not being thread-safe
8356363#commitcomment-5591372
.. the patch also reverts it but that part is optional.

(CC'ing Bret, because github unfathomably doesn't handle patch
attachments..)

@psi29a
Copy link
Member

psi29a commented Mar 8, 2014

Latest commit brings me to:
total heap usage: 486 allocs, 486 frees, 8,210,884 bytes allocated
great job! :)

@psi29a
Copy link
Member

psi29a commented Mar 8, 2014

With latest patch you sent per email:
==29053== total heap usage: 487 allocs, 487 frees, 8,497,604 bytes allocated

Please go ahead and apply it. We only have 1 more alloc and we are now thread safe again. Great work!

@sezero
Copy link
Contributor Author

sezero commented Mar 8, 2014

On 3/8/14, Bret Curtis [email protected] wrote:

With latest patch you sent per email:
==29053== total heap usage: 487 allocs, 487 frees, 8,497,604 bytes
allocated

Please go ahead and apply it. We only have 1 more alloc and we are now
thread safe again. Great work!

Pushed the changes.

@psi29a
Copy link
Member

psi29a commented Mar 8, 2014

Great! Thanks!

@sezero
Copy link
Contributor Author

sezero commented Mar 8, 2014

Are we going to backport the malloc changes to 0.3, or are we going to leave it as is?

@psi29a
Copy link
Member

psi29a commented Mar 8, 2014

I'm not sure, if you feel that it would have little impact (not cause any breakage), we can try to cherry-pick what we have over to 0.3 Feel free to do so.

I do not see 0.4 being stabilized any time soon. :)

sezero added a commit that referenced this issue Mar 8, 2014
and 0366cc9 from master to fix the heap abuse, i.e. issue #24.
@sezero
Copy link
Contributor Author

sezero commented Mar 8, 2014

On 3/8/14, Bret Curtis [email protected] wrote:

I'm not sure, if you feel that it would have little impact (not cause any
breakage), we can try to cherry-pick what we have over to 0.3 Feel free to
do so.

Malloc changes seem stable so far, so I pushed b519a70 backporting
the relevant changesets.

I do not see 0.4 being stabilized any time soon. :)

What else do we have in store for it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants