Skip to content
This repository has been archived by the owner on Dec 1, 2024. It is now read-only.

0.10.0-wip (was 0.9.3-wip) #73

Merged
merged 24 commits into from
Nov 18, 2013
Merged

0.10.0-wip (was 0.9.3-wip) #73

merged 24 commits into from
Nov 18, 2013

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Nov 12, 2013

No description provided.

@rvagg
Copy link
Member Author

rvagg commented Nov 12, 2013

With [email protected] and this we should have MakeCallback with all our v8 function calls, which means we are properly domains compatible and @tjfontaine's dtrace script should work on this. Just need some test code for pushing the batch "leak", anyone?

@rvagg
Copy link
Member Author

rvagg commented Nov 12, 2013

check out test/leak-tester-batch.js.

chained batch:

writeCount = 1 , rss = 127% 12M ["0","0","0","0","0","0","0"]
writeCount = 101 , rss = 606% 56M ["2","1","1","0","0","0","0"]
writeCount = 201 , rss = 1117% 103M ["7","1","1","0","0","0","0"]
writeCount = 301 , rss = 1083% 99M ["8","10","1","0","0","0","0"]
writeCount = 401 , rss = 1109% 102M ["1","25","12","0","0","0","0"]
writeCount = 501 , rss = 1116% 103M ["6","22","15","0","0","0","0"]
writeCount = 601 , rss = 1118% 103M ["11","20","17","0","0","0","0"]
writeCount = 701 , rss = 1074% 99M ["4","38","23","0","0","0","0"]
writeCount = 801 , rss = 1109% 102M ["9","36","25","0","0","0","0"]
writeCount = 901 , rss = 1135% 104M ["2","38","49","0","0","0","0"]
writeCount = 1001 , rss = 1162% 107M ["7","36","51","0","0","0","0"]
writeCount = 1101 , rss = 1148% 105M ["12","32","54","0","0","0","0"]
writeCount = 1201 , rss = 1237% 114M ["5","41","70","0","0","0","0"]
writeCount = 1301 , rss = 1125% 103M ["10","38","73","0","0","0","0"]
writeCount = 1401 , rss = 1162% 107M ["3","44","95","0","0","0","0"]
writeCount = 1501 , rss = 1149% 105M ["8","41","98","0","0","0","0"]
writeCount = 1601 , rss = 1160% 107M ["1","42","120","0","0","0","0"]
writeCount = 1701 , rss = 1130% 104M ["6","41","121","0","0","0","0"]
writeCount = 1801 , rss = 1228% 113M ["11","39","123","0","0","0","0"]
writeCount = 1901 , rss = 1275% 117M ["4","43","145","0","0","0","0"]
writeCount = 2001 , rss = 1241% 114M ["9","41","147","0","0","0","0"]
writeCount = 2101 , rss = 1216% 112M ["2","48","154","16","0","0","0"]
writeCount = 2201 , rss = 1227% 113M ["7","46","155","16","0","0","0"]
writeCount = 2301 , rss = 1194% 110M ["12","44","157","16","0","0","0"]

array batch:

writeCount = 100 , rss = 690% 63M ["2","1","1","0","0","0","0"]
writeCount = 200 , rss = 1254% 115M ["7","1","1","0","0","0","0"]
writeCount = 300 , rss = 1856% 170M ["12","1","1","0","0","0","0"]
writeCount = 400 , rss = 2143% 196M ["1","25","12","0","0","0","0"]
writeCount = 500 , rss = 2612% 239M ["6","23","14","0","0","0","0"]
writeCount = 600 , rss = 3073% 281M ["11","21","16","0","0","0","0"]
writeCount = 700 , rss = 3557% 326M ["4","39","23","0","0","0","0"]
writeCount = 800 , rss = 4038% 370M ["9","37","25","0","0","0","0"]
writeCount = 900 , rss = 4523% 414M ["2","39","48","0","0","0","0"]
writeCount = 1000 , rss = 5008% 458M ["7","37","50","0","0","0","0"]
writeCount = 1100 , rss = 5459% 500M ["12","33","53","0","0","0","0"]
writeCount = 1200 , rss = 5884% 539M ["5","39","71","0","0","0","0"]
writeCount = 1300 , rss = 6405% 586M ["10","37","73","0","0","0","0"]
writeCount = 1400 , rss = 6869% 629M ["3","43","95","0","0","0","0"]
writeCount = 1500 , rss = 7310% 669M ["8","40","98","0","0","0","0"]
writeCount = 1600 , rss = 7786% 713M ["1","42","121","0","0","0","0"]
writeCount = 1700 , rss = 8280% 758M ["6","41","122","0","0","0","0"]
writeCount = 1800 , rss = 8742% 800M ["11","38","124","0","0","0","0"]
writeCount = 1900 , rss = 9297% 851M ["4","43","145","0","0","0","0"]
writeCount = 2000 , rss = 9720% 890M ["9","42","146","0","0","0","0"]
writeCount = 2100 , rss = 10223% 936M ["2","48","153","16","0","0","0"]
writeCount = 2200 , rss = 10708% 980M ["7","47","154","16","0","0","0"]
writeCount = 2300 , rss = 11190% 1024M ["12","45","156","16","0","0","0"]

is this the problem we're looking for? is chained batch immune to the leak in people's experience?

/cc @brycebaril @mcollina

@juliangruber
Copy link
Member

as far as i remember i think i've seen similar memory growth in leveled

@rvagg
Copy link
Member Author

rvagg commented Nov 12, 2013

yeah baby! I pulled in #70 and did a complete job with it (you also need to DisposeBufferOrSlice() to clean up after you've made the Slice). I also found a missing delete batch; in the post-async batch() operation so the LevelDB WriteBatch object was persisting when you used an array-form batch() operation. The delete batch; was present for chained-form, so this explains the difference in memory ballooning between the two.

I'm currently running both chained and array style batch operations in my batch leak tester and I'm up over 2G and memory usage is consistent around the 70M mark, so WOOHOO!

This is going to be 0.10.0 I think, too big a change with the Persistent removal to be a patch version, thoughts?

@rvagg
Copy link
Member Author

rvagg commented Nov 12, 2013

oh, and can someone who has a segfault or leak situation please try this branch and report back?

@juliangruber
Copy link
Member

aren't those changes only fixes?

@juliangruber
Copy link
Member

congrats!!

@mcollina
Copy link
Member

Congrats @rvagg for putting it all together!

👯 and 🍻 to everybody!

I'm up for 0.10, as its a too-heavy fix to be a patch release. This can have bugs, and if you are not experiencing the issue you do not want a possible unsafe upgrade.

I'll give this a go in my segfault case to see if I still get problems.

@juliangruber
Copy link
Member

ok, 0.10 sounds good then

@mcollina
Copy link
Member

This solves my segfault on node v0.10. 🍻

@mcollina
Copy link
Member

I spoke too early. I still have a segfault in LevelGraph which I am not able to reproduce using just levelup/leveldown :( using chained batch.

@rvagg
Copy link
Member Author

rvagg commented Nov 12, 2013

@mcollina can you do a node-gyp rebuild --debug, run levelgraph case with gdb and try and cause the segfault and then give us a complete backtrace so we can see where it's having trouble?

@mcollina
Copy link
Member

First run (without segfault):

(gdb) run writesStream.js
Starting program: /Users/matteo/.nvm/v0.10.20/bin/node writesStream.js
Reading symbols for shared libraries . done

events.js:72
        throw er; // Unhandled 'error' event
              ^
WriteError: Corruption: WriteBatch has wrong count
    at /Users/matteo/Repositories/levelgraph/node_modules/levelup/lib/batch.js:68:39

Program exited with code 010.

Second run, with segfault:

(gdb) run writesStream.js
Starting program: /Users/matteo/.nvm/v0.10.20/bin/node writesStream.js

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
[Switching to process 12220 thread 0x1a03]
0x0000000100ec77ec in leveldb::DecodeFixed32 (ptr=0xe00000001023000a <Address 0xe00000001023000a out of bounds>) at coding.h:62
62          memcpy(&result, ptr, sizeof(result));  // gcc optimizes this to a plain load
(gdb) backtrace
#0  0x0000000100ec77ec in leveldb::DecodeFixed32 (ptr=0xe00000001023000a <Address 0xe00000001023000a out of bounds>) at coding.h:62
#1  0x0000000100ec70d3 in leveldb::WriteBatchInternal::Count (b=0x102300c20) at ../deps/leveldb/leveldb-1.14.0/db/write_batch.cc:83
#2  0x0000000100ec7041 in leveldb::WriteBatch::Iterate (this=0x102300c20, handler=0x102209c68) at ../deps/leveldb/leveldb-1.14.0/db/write_batch.cc:75
#3  0x0000000100ec72b0 in leveldb::WriteBatchInternal::InsertInto (b=0x102300c20, memtable=0x102300400) at ../deps/leveldb/leveldb-1.14.0/db/write_batch.cc:133
#4  0x0000000100ea2ce7 in leveldb::DBImpl::Write (this=0x102300060, options=@0x102300d60, my_batch=0x102300c20) at ../deps/leveldb/leveldb-1.14.0/db/db_impl.cc:1192
#5  0x0000000100e88ebe in leveldown::Database::WriteBatchToDatabase (this=0x100c00800, options=0x102300d60, batch=0x102300c20) at ../src/database.cc:74
#6  0x0000000100e83e03 in leveldown::Batch::Write (this=0x102300ad0) at ../src/batch.cc:28
#7  0x0000000100e881da in leveldown::BatchWriteWorker::Execute (this=0x102300c40) at ../src/batch_async.cc:25
#8  0x0000000100e878d1 in NanAsyncExecute (req=0x102300c48) at nan.h:752
#9  0x000000010012c8d0 in worker (arg=<value temporarily unavailable, due to optimizations>) at ../deps/uv/src/unix/threadpool.c:74
#10 0x000000010012330b in uv__thread_start (ctx_v=<value temporarily unavailable, due to optimizations>) at ../deps/uv/src/uv-common.c:322
#11 0x00007fff8a752899 in _pthread_body ()
#12 0x00007fff8a75272a in _pthread_start ()
#13 0x00007fff8a756fc9 in thread_start ()

@rvagg
Copy link
Member Author

rvagg commented Nov 12, 2013

from what I can see, the only possible way this can happen is if the WriteBatch object we're operating on has been deleted before Write() is called, at every stage the internal rep_ representation of the batch data is initialised ore reset, it is set to 12 bytes long which should be enough to encode the "count" value that's causing this segfault.

@mcollina perhaps we have some odd V8 GC thing happening, can you try holding on to a reference to the batch you're working on in your code and reference it after you get the callback from the write()? Something like this ought to suffice I think:

var batch = db.batch()
batch.put('foo', 'bar')
batch.write(function (err) {
  batch.dummy = 'foobar'
})

this is to ensure that V8 doesn't decide that batch can be garbage collected until after we're finished with it, perhaps it's getting cleaned up before the batch is completed.

@rvagg
Copy link
Member Author

rvagg commented Nov 13, 2013

so @KyotoWeb helpfully picked up that we're not actually using the bloom filter on this branch! ooops, that's added now.

@rvagg
Copy link
Member Author

rvagg commented Nov 13, 2013

Could I please get a contributor who is on OSX to check out the latest on this branch, go in to deps/snappy/snappy-1.1.1/ and run ./configure, then copy the config.h generated into deps/snappy/mac/, commit and push back to this branch?

@mcollina
Copy link
Member

Done :).

@mcollina
Copy link
Member

I think the segfault happens when LevelDB do compaction, the batch is 'stopped' and then the GC has time to collect it. I was removing a reference to the batch after starting the write.
Level/level-ws@e252652

The fact that this is triggered by LevelGraph is due to the kind of data: LG stores six very different sets of keys in the same batch.

@mcollina
Copy link
Member

The travis build fails with my config.h commit. Did I do something wrong?

@rvagg
Copy link
Member Author

rvagg commented Nov 13, 2013

sorry, my fault, I haven't fully wired up snappy-1.1.1 yet, just needs a snappy.gyp change which you're welcome to do or I'll do when I get back to the computer.

@mcollina
Copy link
Member

Done!

@rvagg
Copy link
Member Author

rvagg commented Nov 14, 2013

this won't compile for me on Windows, because of the Snappy upgrade, and I can't remember what I had to do to make it work last time ... so .. still "work in progress"

@rvagg
Copy link
Member Author

rvagg commented Nov 15, 2013

My current thoughts on the GC issue: I think what we might do is assign a Persistent handle for the parent object each time we do an async operation, when the async is finished then the Persistent can be freed. This should happen for Database, Iterator and Batch. Each of these could be dropped in JS-land while in an async operation and it's only the fact that most of the time the operation is faster than GC can get to it that we survive.

I'll investigate extending NanAsyncWorker in NAN to more easily support this but it should just be a matter of writing a Persistent to the worker each time we make one and then ensuring that any Persistents in the worker are freed when it is cleaned up.

@mcollina
Copy link
Member

I agree with you... this leveldown release will be huge.

@rvagg
Copy link
Member Author

rvagg commented Nov 15, 2013

need to wait for nodejs/nan#41 but as soon as that's released I'll push my latest changes to do what I described above.

@No9
Copy link
Contributor

No9 commented Nov 15, 2013

So Snappy build looks broken on the windows anyhoo
https://code.google.com/p/snappy/issues/detail?id=79

The lines in configure.ac that are mentioned are added in our one too.
https://github.com/rvagg/node-leveldown/blob/0.9.3-wip/deps/snappy/snappy-1.1.1/configure.ac#L21

By adding

#include <BaseTsd.h>
typedef SSIZE_T ssize_t;

Here
https://github.com/rvagg/node-leveldown/blob/0.9.3-wip/deps/snappy/snappy-1.1.1/snappy-stubs-public.h#L95
Edit:
Should be within the windows if condition

https://github.com/rvagg/node-leveldown/blob/0.9.3-wip/deps/snappy/snappy-1.1.1/snappy-stubs-public.h#L89

Fixes it.
Seems like we might need some additional gyp foo to copy this in from oursource tree?

Thoughts?

@rvagg
Copy link
Member Author

rvagg commented Nov 17, 2013

@No9 my thoughts are that this worked fine on windows with 1.1.0 and the changes for 1.1.1 are relatively minor so we should back up and see what's changed, perhaps I had a fix manually in 1.1.0 for Windows or perhaps I'm not quite doing something right now!

@rvagg
Copy link
Member Author

rvagg commented Nov 17, 2013

@No9 sorry, I've followed up yourl inks and understand what you're talking about now! the ssize_t thing indeed is new in 1.1.1, I started off just throwing a -Dssize_t=int at it but there's a suggestion that there's a perf impact from that (I seriously doubt it'd be a measurable perf difference!). So I've gone ahead and added the SSIZE_T -> ssize_t thing and I've also fixed some other things up. Hopefully we have a more complete and efficient snappy build for all operating systems now.

Anyone got anything else that needs to go in to a 0.10.0? I think we might be done here. Final comments?

@mcollina
Copy link
Member

I'm for getting leveldown 0.10 out. It's already a big release, with lots of new stuff and improvements.
I'm not sure about version handling in levelup/level, but we can discuss it there!

@rvagg
Copy link
Member Author

rvagg commented Nov 17, 2013

@mcollina what we've done for leveldown minors is to bump levelup & level minors too even if there aren't any changes there. I think that's reasonable and so far nobody has suggested otherwise.

@mcollina
Copy link
Member

Cool! Ok. I thought 0.18 was the one without WriteStream, but he PR is still not merged, so we should be OK.

rvagg added a commit that referenced this pull request Nov 18, 2013
0.10.0-wip (was 0.9.3-wip)
@rvagg rvagg merged commit a937180 into master Nov 18, 2013
@rvagg rvagg deleted the 0.9.3-wip branch November 18, 2013 00:09
@rvagg
Copy link
Member Author

rvagg commented Nov 18, 2013

published as 0.10.0!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants