Parsing Problem when Node Property has Endline Char #103

Closed
pboling opened this issue Jul 24, 2013 · 17 comments

@pboling
Contributor

pboling commented Jul 24, 2013

Update: The failing node has an end-of-line sequence in it: \r\n. When I fetch the nodes (including this bad one), or data with this property, in small batches (size 500) using SKIP and LIMIT, so that the node with \r\n falls in the second batch, and concatenate the results as I iterate, everything works. When I try to fetch the full set of nodes (759 in total), or the data, all at once, it fails.

There doesn't seem to be any obvious reason why a larger data set would trigger the problem, but that is what is happening. 759 nodes doesn't seem to be excessively large to me.

Original:

@neo = Neography::Rest.new
cypher = "START me = node({node_id})
                  MATCH (me)-[:friends]->(friend)
                  RETURN friend.first_name, friend.fb_uid"
@neo.execute_query(cypher, {:node_id => 535})

That query works fine, both in the Neo4j native web query interface, and in Neography.

@neo = Neography::Rest.new
cypher = "START me = node({node_id})
                  MATCH (me)-[:friends]->(friend)
                  RETURN friend.fb_uid, friend.first_name?, friend.last_name"
@neo.execute_query(cypher, {:node_id => 535})

That query works fine for every node except one (node 535, which happens to have the most friends in my test DB), both in the Neo4j native web query interface and in Neography. For the node on which it fails, I get back a giant string that looks like JSON. When I try to parse the string with JSON (in Ruby), I get a parse error.

Still looking into this. Is there a place in the code where a parse error is caught instead of raised, so that a string is returned instead of the array of results marshalled as Ruby objects? Getting a string back instead of an array forces some really messy hacks.
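
Until the cause is known, one way to contain the messy hacks is to normalize the return value in a single place. This is a minimal sketch, assuming only what is observed above: `execute_query` hands back either a parsed Hash or an unparsed String. `normalize_cypher_result` is a hypothetical helper, not part of Neography.

```ruby
require 'json'

# Hypothetical helper (not part of Neography): returns a parsed result,
# or raises, so callers never have to deal with a raw String.
def normalize_cypher_result(result)
  # Already parsed: pass it through untouched.
  return result if result.is_a?(Hash)

  unless result.is_a?(String)
    raise TypeError, "unexpected Cypher result type: #{result.class}"
  end

  begin
    JSON.parse(result)
  rescue JSON::ParserError
    # Fail loudly here instead of letting a String leak into calling code.
    raise ArgumentError, "Cypher result came back as a String that is not valid JSON"
  end
end
```

Wrapping the call, for example `normalize_cypher_result(@neo.execute_query(cypher, {:node_id => 535}))`, would at least turn the silent string into an exception at the point where it occurs.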

@maxdemarzi
Owner

Can we get a dump of the string you get back?

Do all of your nodes have fb_uid, first_name, and last_name fields?

Does this work?

cypher = "START me = node({node_id})
MATCH (me)-[:friends]->(friend)
RETURN friend.fb_uid?, friend.first_name?, friend.last_name?"

How about:

cypher = "START me = node({node_id})
MATCH (me)-[:friends]->(friend)
WHERE has(friend.fb_uid) AND has(friend.first_name) AND has(friend.last_name)
RETURN friend.fb_uid, friend.first_name, friend.last_name"

@pboling
Contributor Author

pboling commented Jul 25, 2013

Here is a bit more:

When I go through all of this user's friends and fetch each of their nodes individually, every node that has a tilde in a string field such as last_name produces an error in the log, which prints but doesn't raise (Facebook UIDs anonymized):

log writing failed. "\xC3" from ASCII-8BIT to UTF-8
node: #<Neography::Node birthday="1981-09-22T00:00:00Z", fb_uid=##########, ar_id=860, wants_male=true, ar_type="FacebookProfile", relationship_status="Single", user_state="browsing", first_name="ATeia", wants_female=false, neoid_unique_id="FacebookProfile:860", usable=true, photos_count=3, last_name="da Razão", gender="female", interested_in=0.0, profile_state="non_member">

This is perhaps related to the fact that ASCII-8BIT is a fake string encoding that Ruby uses when dealing with binary sequences.

There are a few other characters that have the same issue, but the most common is the tilde, as I have a lot of Brazilian friends.

Here is another failing node:

log writing failed. "\xC3" from ASCII-8BIT to UTF-8
node: #<Neography::Node birthday="1987-08-05T00:00:00Z", fb_uid=##########, ar_id=880, wants_male=false, ar_type="FacebookProfile", relationship_status="It's Complicated", user_state="browsing", first_name="Ateísmo Sureño", wants_female=true, neoid_unique_id="FacebookProfile:880", usable=true, photos_count=5, last_name="Vive la Vida", gender="male", interested_in=1.0, profile_state="non_member">

Seems like ASCII-8BIT is not an appropriate encoding. Not sure where it is coming from.
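
The log message has the same shape as the one Ruby produces when a binary-tagged string containing non-ASCII bytes is transcoded to UTF-8. A minimal sketch of that behaviour in plain Ruby, using the bytes of "da Razão" from the node above; where the ASCII-8BIT tag comes from in the real stack is not established here.

```ruby
# The UTF-8 bytes of "da Razão", tagged as binary (ASCII-8BIT), as a string
# read from a socket might be. This is an illustration, not Neography code.
last_name = "da Raz\xC3\xA3o".b

# Transcoding fails: ASCII-8BIT has no mapping for bytes above 0x7F.
conversion_error = nil
begin
  last_name.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  conversion_error = e.message
end

# Re-tagging the same bytes as UTF-8 changes no data and yields valid text.
fixed = last_name.dup.force_encoding(Encoding::UTF_8)

puts conversion_error
puts fixed.valid_encoding?
```

`force_encoding` only relabels the bytes, so it is safe here because the data really is UTF-8; `encode` tries to convert and is what raises.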

@maxdemarzi
Owner

Try the latest commit:

Gemfile
gem 'neography' , :git => 'git://github.com/maxdemarzi/neography.git'

@pboling
Contributor Author

pboling commented Jul 26, 2013

In the past I have been unable to move much past the 1.0.9 release (ref f800179 is the last commit I know I can use) due to a change in Neography which rendered Neoid incompatible. My app is heavily dependent on Neoid. See: https://github.com/elado/neoid/issues/23

I can remove Neoid dependent code locally to see if that resolves this issue, but I won't be able to deploy it that way.

Neography and Neoid seem to be happy together now. I have no idea what has fixed it, but, yay!

@pboling
Contributor Author

pboling commented Jul 26, 2013

Other notes:

This happens with Neo4j 1.8.2 (in production) and 1.9 (locally).

Locally I am using ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin12.4.0]

Using Ruby 2.0.0 (whatever that means) on Heroku.

Ruby 2.0.0 changed the way string encodings are handled in significant ways that I am not expert on.

Using Neography 1.0.9 release version, and Neoid 0.1.2 in all environments.

Update: I'm really confused - Looks like I did switch to 1.1.0 and now it is working for some reason. So try HEAD now.

@pboling
Contributor Author

pboling commented Jul 26, 2013

This is the trimmed result of the query (I removed all the parts that did not have any odd characters) which comes back as a string instead of being marshalled into a normal array:

http://pastie.org/private/bwjnwf1eovv32m1z2uwsiw

@pboling
Contributor Author

pboling commented Jul 26, 2013

I have upgraded to latest HEAD, turned off Neoid, and have the same problem. I also tried your alternate queries:

Does this work?
cypher = "START me = node({node_id})
MATCH (me)-[:friends]->(friend)
RETURN friend.fb_uid?, friend.first_name?, friend.last_name?"

No, exact same result as the one in the pastie above (which is trimmed).

How about:
cypher = "START me = node({node_id})
MATCH (me)-[:friends]->(friend)
WHERE has(friend.fb_uid) AND has(friend.first_name) AND has(friend.last_name)
RETURN friend.fb_uid, friend.first_name, friend.last_name"

No, exact same result as the one in the pastie above (which is trimmed).

@pboling
Contributor Author

pboling commented Jul 26, 2013

Of potential interest, when I run the test suite for the gem using the master branch at HEAD locally I get (trimmed):

These are unrelated to this issue, per @maxdemarzi's comment.

@maxdemarzi
Owner

Travis runs Neo4j 2.0.0-M03, so these tests pass there; they test functionality for REST API transactions and labels.

@pboling
Contributor Author

pboling commented Jul 26, 2013

OK, so not related to my original issue at all then. As an aside, is there a REST api for determining version which could be used to automatically switch tests on and off? It would also be useful from a 'tests as documentation' standpoint.

@pboling
Contributor Author

pboling commented Jul 26, 2013

OK, so I've turned Neoid back on, and when I load a facebook profile in active record, and then grab the node with Neoid I get properly encoded data, but I do still see the error printing in the output:

[3] pry(main)> i = FacebookProfile.find_by_fb_uid(##################)
=> #<FacebookProfile id: 681, first_name: "美都", last_name: "池水" ... >
[4] pry(main)> i.neo_node
log writing failed. "\xE7" from ASCII-8BIT to UTF-8
=> #<Neography::Node ar_id=681, first_name="美都", last_name="池水" ... >

This is for the same data that came back from the cypher query (see the pastie in previous comment) as:

"
...
[##################,\"\xE7\xBE\x8E\xE9\x83\xBD\",\"\xE6\xB1\xA0\xE6\xB0\xB4\"]
...
"

So when getting a full node back things work. When getting columns of data back the encoding must be handled differently.
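
For comparison, the escaped bytes in the column data above are plain UTF-8. A minimal sketch, assuming the returned string is the raw response body tagged as binary: re-tagging it as UTF-8 before parsing recovers the same names that the full node shows. The two-element array is a cut-down stand-in for one row, with the anonymized UID left out.

```ruby
require 'json'

# Stand-in for part of the raw body: the two name columns of the row above,
# tagged as binary (ASCII-8BIT).
raw_body = "[\"\xE7\xBE\x8E\xE9\x83\xBD\",\"\xE6\xB1\xA0\xE6\xB0\xB4\"]".b

# Relabel the bytes as UTF-8 (no data changes), then parse as JSON.
names = JSON.parse(raw_body.dup.force_encoding(Encoding::UTF_8))

puts names.inspect
puts names.map(&:encoding).inspect
```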

@pboling
Contributor Author

pboling commented Jul 27, 2013

I am writing some spec tests for this issue right now. :)

@pboling pboling mentioned this issue Jul 27, 2013
@pboling
Contributor Author

pboling commented Jul 29, 2013

I figured out the exact node that is causing this failure.

When I request three columns of data (the Facebook UID, first_name and last_name), I get back a giant string if I fetch all nodes. I narrowed down the exact node causing the problem by fetching the nodes in batches. If I fetch the first 748 nodes, I get back an array of arrays, as it should be. When I fetch 749, the result comes back as a string (full result on pastie.org).

START me = node({node_id})
                    MATCH (me)-[:friends]->(friend)
                    RETURN DISTINCT friend.fb_uid, friend.first_name, friend.last_name
SKIP 0
LIMIT 749;
"
...
[##########,\"Cris\r\npy\",\"Sea\"]]"

Strangely, when I ask for the set starting at the node that breaks this result, the same node comes back fine.

START me = node({node_id})
                    MATCH (me)-[:friends]->(friend)
                    RETURN DISTINCT friend.fb_uid, friend.first_name, friend.last_name
SKIP 748
=> {"columns"=>["friend.fb_uid", "friend.first_name", "friend.last_name"],
 "data"=>
  [
    [##########, "Crispy", "Sea"],
    ...
  ]

When I request two columns of data, first_name and last_name, I get back a regular array of data as expected, and the \r\n is gone:

START me = node({node_id})
                    MATCH (me)-[:friends]->(friend)
                    RETURN friend.first_name, friend.last_name
SKIP 0
LIMIT 748;
=> {"columns"=>["friend.first_name", "friend.last_name"],
 "data"=>
  [
    ...
    ["Crispy", "Sea"]
  ]

The key, I think, is that there is an end-line character in the first_name field of this node.
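
As background for that suspicion: the JSON grammar does not allow a raw CR or LF inside a string, they have to be written as the escapes `\r` and `\n`. The sketch below only illustrates that rule with Ruby's stdlib parser, using `1` as a stand-in for the anonymized UID. It does not explain why the smaller batches parse, and whether the body reaches the parser in the raw form is not established in this thread.

```ruby
require 'json'

row = [1, "Cris\r\npy", "Sea"] # 1 stands in for the anonymized fb_uid

# A correctly encoded body escapes the CR LF, and the value round-trips intact.
escaped_body = JSON.generate([row])
round_tripped = JSON.parse(escaped_body)

# A body with the CR LF bytes sitting raw inside the string is not valid JSON.
raw_body = "[[1,\"Cris\r\npy\",\"Sea\"]]"
raw_parse_failed =
  begin
    JSON.parse(raw_body)
    false
  rescue JSON::ParserError
    true
  end

puts escaped_body
puts round_tripped == [row]
puts raw_parse_failed
```

Note that a correct round trip keeps the \r\n in the value, so a result of "Crispy" means the end-of-line sequence was lost somewhere along the way.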

I am also still seeing the message log writing failed. "\xC3" from ASCII-8BIT to UTF-8 very frequently.

@pboling
Contributor Author

pboling commented Aug 2, 2013

Update: The failing node has an end-of-line sequence in it: \r\n. When I fetch the node, or data with this property, in small batches (size 500) using SKIP and LIMIT, so that the node with \r\n falls in the second batch, and concatenate the results as I iterate, everything works. When I try to fetch the full set of nodes (759 in total), or the data, all at once, it fails.

There doesn't seem to be any obvious reason why a larger data set would trigger the problem, but that is what is happening. 759 nodes doesn't seem to be excessively large to me.

@maxdemarzi
Owner

Did you ever figure this one out?

@pboling
Contributor Author

pboling commented Sep 1, 2013

No, I had to work around it by using small batches and iterating through with SKIP and LIMIT.
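
The workaround can be sketched roughly as follows. `fetch_all_rows` is a hypothetical helper, not part of Neography, and the block stands in for the real query so the example runs without a database.

```ruby
# Hypothetical helper: collects all rows of a paged query. The block receives
# (skip, limit) and must return a Hash with a "data" array, like the parsed
# results shown earlier in this thread.
def fetch_all_rows(batch_size = 500)
  rows = []
  skip = 0
  loop do
    page = yield(skip, batch_size)
    # Stop early if a page comes back unparsed (the failure described here).
    unless page.is_a?(Hash)
      raise TypeError, "expected a parsed result at skip=#{skip}, got #{page.class}"
    end
    data = page.fetch("data")
    rows.concat(data)
    break if data.size < batch_size # a short page means nothing is left
    skip += batch_size
  end
  rows
end

# Stand-in for the database: 759 fake rows, sliced the way SKIP/LIMIT would.
fake_rows = (1..759).map { |i| [i, "first#{i}", "last#{i}"] }
calls = []
all_rows = fetch_all_rows(500) do |skip, limit|
  calls << [skip, limit]
  { "data" => fake_rows[skip, limit] || [] }
end

puts all_rows.size
puts calls.inspect
```

With Neography, the block would wrap `@neo.execute_query` with a query ending in `SKIP {skip} LIMIT {limit}`, assuming the server accepts parameters in those positions. Without an ORDER BY, pages are probably not guaranteed to be stable between calls.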

There is definitely still a problem with nodes whose properties contain an end-of-line sequence. Because it doesn't always fail, it seems I would need a large data set in a test example to reproduce it.

Of note though, when the property is parsed, and I get back the set of nodes I expect (instead of the giant string), the endline, \r\n, is not in the property. So when it works it is still failing in a sense, since it doesn't reproduce the actual content of the property correctly.

Also, I experience the issue on Neo4j 1.8 and 1.9 both, and am using Neography v1.1.0. I need to try with the latest version still.

@maxdemarzi
Owner

Not sure what to do about this one... so I'm closing it.... please reopen if anyone else runs into this too.

willkessler pushed a commit to willkessler/neography that referenced this issue Apr 21, 2014