Can't send large batch requests to GrapheneDB #128

Closed
ayosec opened this issue Dec 7, 2013 · 9 comments

Comments

@ayosec

ayosec commented Dec 7, 2013

When I generate a big batch request, like

require "neography"
batch_commands = (1..2000).map {|n| [:create_unique_node, "foo", :_id, n, { _id: n }] }
neo = Neography::Rest.new("http://user:token@db0000.sb01.stations.graphenedb.com:24789")

puts "Send batch..."
result = neo.batch(*batch_commands)
puts result

I get the following error:

.../gems/httpclient-2.3.3/lib/httpclient/http.rb:498:in `write': execution expired (HTTPClient::SendTimeoutError)
        from .../gems/httpclient-2.3.3/lib/httpclient/http.rb:498:in `<<'
        from .../gems/httpclient-2.3.3/lib/httpclient/http.rb:498:in `dump'
        from .../gems/httpclient-2.3.3/lib/httpclient/http.rb:924:in `dump'
        from .../gems/httpclient-2.3.3/lib/httpclient/session.rb:615:in `block in query'
        from .../gems/httpclient-2.3.3/lib/httpclient/session.rb:613:in `query'
        from .../gems/httpclient-2.3.3/lib/httpclient/session.rb:164:in `query'
        from .../gems/httpclient-2.3.3/lib/httpclient.rb:1083:in `do_get_block'
        from .../gems/httpclient-2.3.3/lib/httpclient.rb:887:in `block in do_request'
        from .../gems/httpclient-2.3.3/lib/httpclient.rb:981:in `protect_keep_alive_disconnected'
        from .../gems/httpclient-2.3.3/lib/httpclient.rb:886:in `do_request'
        from .../gems/httpclient-2.3.3/lib/httpclient.rb:774:in `request'
        from .../gems/httpclient-2.3.3/lib/httpclient.rb:684:in `post'
        from .../gems/neography-1.0.9/lib/neography/connection.rb:42:in `post'
        from .../gems/neography-1.0.9/lib/neography/rest/batch.rb:33:in `batch'
        from .../gems/neography-1.0.9/lib/neography/rest/batch.rb:14:in `execute'
        from .../gems/neography-1.0.9/lib/neography/rest.rb:351:in `batch'

With other drivers it works.

For example, with Py2neo I can do

from py2neo import neo4j, node

neo4j.authenticate("db0000.sb01.stations.graphenedb.com:24789",
                   "user", "token")

graph_db = neo4j.GraphDatabaseService("http://db0000.sb01.stations.graphenedb.com:24789/db/data/")

batch = neo4j.WriteBatch(graph_db)
for n in range(1,5000):
  batch.create(node(id = n))

print len(batch.submit())

And, with cURL, using a batch file with 999 operations, I can do

$ curl -i http://user:token@db0000.sb01.stations.graphenedb.com:24789/db/data/batch \
    -H "Content-Type: application/json" \
    -H "Accept: application/json;stream=true" \
    -d @batch1.json \
    -o output1
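For reference, a file shaped like batch1.json can be generated with a short Ruby script. This is a sketch under assumptions: the issue does not show the real file, so the jobs below (plain `POST /node` with an `_id` property) are hypothetical. They follow the Neo4j REST batch format, where each job carries a `method`, a `to` path relative to /db/data, an optional `body` and a client-chosen `id`.

```ruby
require "json"

# Hypothetical contents: the original batch1.json is not shown in the issue.
# Each job creates one node with an _id property; "id" lets the caller match
# each result in the batch response back to its job.
def batch_jobs(count)
  (1..count).map do |n|
    { "method" => "POST", "to" => "/node", "body" => { "_id" => n }, "id" => n - 1 }
  end
end

# Write a 999-operation batch file usable with: curl ... -d @batch1.json
File.write("batch1.json", JSON.generate(batch_jobs(999)))
```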

Possible cause

The HTTP client sends the first request without authorization headers, so GrapheneDB rejects it with a 401 Unauthorized error and closes the connection. However, after capturing the traffic with tcpdump, it seems that the HTTP client keeps sending the request body even though the server has already replied: the server closes the connection as soon as it has received the headers.

In the following capture (extracted with tcpdump and chaosreader), the server's response (in blue) appears almost at the beginning of the connection.

Working versions

With Neography 1.0.5 it works, because the Authorization header is sent with the first request.
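To illustrate what "sending the Authorization header with the first request" (preemptive basic auth) means at the HTTP level, here is a minimal sketch using Ruby's standard Net::HTTP rather than httpclient or neography. The URL reuses the GrapheneDB host from the Py2neo example; `user`/`token` and the empty JSON body are placeholders, and the helper name `preauthenticated_batch_request` is hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds a batch POST whose very first attempt already
# carries "Authorization: Basic ...", so no 401 round trip is needed.
def preauthenticated_batch_request(url, user, password, json_body)
  uri = URI(url)
  req = Net::HTTP::Post.new(uri.request_uri)
  req.basic_auth(user, password) # sets the Authorization header up front
  req["Content-Type"] = "application/json"
  req["Accept"] = "application/json"
  req.body = json_body
  req
end

req = preauthenticated_batch_request(
  "http://db0000.sb01.stations.graphenedb.com:24789/db/data/batch",
  "user", "token", "[]"
)
# Sending it (not done here) would look like:
#   Net::HTTP.start(host, port) { |http| http.request(req) }
```

Because the credentials travel with the headers, the server has no reason to reject the request before the (possibly very large) body is uploaded.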

@ayosec
Author

ayosec commented Dec 7, 2013

I'm sorry. I forgot to attach the capture.

[Attached capture image: fail]

@maxdemarzi
Owner

I think if we add
authenticate(nil)
to the connection.rb initialize method it may work.
I'm crunched for time for a while; can one of you try it out and report back?

@maxdemarzi
Owner

Ok, so I found some time.
Can you change your gem to this and try it out?

gem 'neography', :git => 'git://github.com/maxdemarzi/neography.git'

@larskluge

Awesome, that works for me--thank you!

@ayosec
Author

ayosec commented Dec 13, 2013

@maxdemarzi, thanks for dedicating time to this issue.

Unfortunately, the latest Git version has the same issue.

The first request is still sent without authorization headers, and I'm getting the same error with a very large batch.

I'm using this Gemfile:

source "https://rubygems.org"

gem "debugger"
gem "neography", path: "/mnt/coding/vendor/neography/"

And this script:

require "debugger"
require "neography"

batch_commands = (1..2000).map {|n| [:create_unique_node, "foo", :_id, n, { _id: n }] }
neo = Neography::Rest.new("http://....:....@....:24789")
result = neo.batch(*batch_commands)
p result

The connection is still frozen.

I replaced the neo.batch statement with

10.times {|i| neo.create_node(value: i) }

Then I captured the traffic with ngrep. As you can see, the first request is unauthenticated.

@leosoto
Contributor

leosoto commented Feb 7, 2014

@ayosec @maxdemarzi I just stumbled over this issue coming from a different problem (neography is working for us but making tons of useless unauthenticated requests that introduce unnecessary latency).

The underlying reason for neography's behaviour is in httpclient's code. Basically, it triggers auth only after receiving a challenge from the server (see https://github.com/nahi/httpclient/blob/master/lib/httpclient/auth.rb#L136) and (most importantly for my own problem) it will keep doing so for subsequent requests that aren't "sub-URIs" of a URI for which a challenge has been seen (see https://github.com/nahi/httpclient/blob/master/lib/httpclient/auth.rb#L274). Since neography doesn't explicitly hit "/", "/db/node" or any such "root URI", it never gets a challenge for the entire URI hierarchy, and thus httpclient keeps asking for a challenge on the first hit to every node, index, etc.
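To make the "sub-URI" rule concrete, here is a simplified, hypothetical Ruby model. It is not httpclient's real implementation: `ChallengeTracker`, `covered?`, `request` and `mark_challenged` are invented for illustration, and the localhost:7474 URLs are placeholders. It counts HTTP round trips when credentials are only sent under a previously challenged URI, and compares that with pretending a challenge was already seen on the root, which is the idea behind the hack below.

```ruby
require "uri"

# Hypothetical, simplified model (NOT httpclient's code): credentials are sent
# only when the request URI equals, or sits below, an already challenged URI.
class ChallengeTracker
  attr_reader :round_trips

  def initialize
    @challenged = []
    @round_trips = 0
  end

  # Same scheme/host/port as a challenged URI, and a path equal to or under it.
  def covered?(target)
    t = URI(target)
    @challenged.any? do |c|
      base = c.path.end_with?("/") ? c.path : "#{c.path}/"
      t.scheme == c.scheme && t.host == c.host && t.port == c.port &&
        (t.path == c.path || t.path.start_with?(base))
    end
  end

  # One logical request: an uncovered URI first costs an unauthenticated
  # round trip (answered with a 401 challenge), then the authenticated retry.
  def request(target)
    unless covered?(target)
      @round_trips += 1
      @challenged << URI(target)
    end
    @round_trips += 1
  end

  # The hack's idea: pretend a challenge has already been seen on +root+.
  def mark_challenged(root)
    @challenged << URI(root)
  end
end

urls = (1..3).map { |i| "http://localhost:7474/db/data/node/#{i}" }

lazy = ChallengeTracker.new
urls.each { |u| lazy.request(u) }
puts lazy.round_trips  # 6: each node URI gets its own 401 first

eager = ChallengeTracker.new
eager.mark_challenged("http://localhost:7474/")
urls.each { |u| eager.request(u) }
puts eager.round_trips # 3: credentials go out on every first attempt
```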

I ended up writing the following hack for my case:

# Hack to avoid repeating requests to enable HTTP Auth.
# Basically httpclient only presents its credentials in the
# request after it has been challenged by the server on the
# request URL or any 'parent' URL. So let's make it think that
# it has been challenged on '/'.
# neo is an instance of Neography::Rest 
root_url = neo.connection.configuration
neo.connection.client.www_auth.basic_auth.challenge(root_url)

Of course, the above hack only works for basic auth. From a quick glance at httpclient's auth.rb, I don't think it is easy to generalize to the other auth methods (digest, OAuth). But if you think that a config.force_http_basic_auth = true option (or something along the lines) would be useful in neography itself, I can work on a pull request once I get some free time.

@ayosec
Author

ayosec commented Feb 7, 2014

Hi @leosoto,

Thanks for the hack. It is working on my tests.

> But if you think that a config.force_http_basic_auth = true option (or something along the lines) would be useful in neography itself, I can work on a pull request once I get some free time.

IMO, it should always be enabled. If Neo4j is configured with a user/password, there is no reason to try unauthenticated requests.

@maxdemarzi
Owner

I like the idea: on initialization it'll execute the authentication. Send me a pull request :)

@leosoto
Contributor

leosoto commented Feb 13, 2014

@maxdemarzi There you go :)

willkessler pushed a commit to willkessler/neography that referenced this issue Apr 21, 2014
ayosec closed this as completed Jun 28, 2017

4 participants