Fixed bnode collision bug. #506
Merged: 1 commit, Aug 10, 2015
9 changes: 6 additions & 3 deletions rdflib/compare.py
@@ -204,7 +204,9 @@ def stringify(x):
                 return unicode(x)
         if isinstance(color, Node):
             return stringify(color)
-        value = sum(map(self.hashfunc, ' '.join([stringify(x) for x in color])))
+        value = 0
+        for triple in color:
+            value += self.hashfunc(' '.join([stringify(x) for x in triple]))
         val = u"%x" % value
         self._hash_cache[color] = val
         return val
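For context on why the deleted line produced collisions: `' '.join(...)` builds one big string, and `map(self.hashfunc, <that string>)` then applies the hash function to each individual character, so the sum depends only on the multiset of characters in the colour. The patched loop hashes one string per triple instead. A minimal sketch of the difference, with a hypothetical md5-based `hashfunc` standing in for the real one and simplified stringification:

```python
# Sketch only: an assumed md5-based hashfunc, not the rdflib internals verbatim.
import hashlib

def hashfunc(s):
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

# Two different "colors": same characters overall, but different triples.
color_a = [("b0", "p", "x"), ("b1", "q", "y")]
color_b = [("b0", "q", "x"), ("b1", "p", "y")]  # predicates swapped

def old_hash(color):
    # map() over the joined *string* hashes each character, and sum() is
    # order-insensitive, so anagram-like colors collide.
    return sum(map(hashfunc, ' '.join(' '.join(t) for t in color)))

def new_hash(color):
    # One hash per triple, as in the patched code.
    return sum(hashfunc(' '.join(t)) for t in color)

assert old_hash(color_a) == old_hash(color_b)   # collision
assert new_hash(color_a) != new_hash(color_b)   # the fix distinguishes them
```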
@@ -290,7 +292,7 @@ def _initial_color(self):

     def _individuate(self, color, individual):
         new_color = list(color.color)
-        new_color.append((len(color.nodes)))
+        new_color.append(tuple([len(color.nodes)]))

Jörn Hees (Member) commented:
sorry for not spotting this one before :-/

could also be written as new_color.append((len(color.nodes),)), which is a bit faster, but maybe premature optimization:

In [1]: %timeit tuple([1337])
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 345 ns per loop

In [2]: %timeit (1337,)
The slowest run took 47.08 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 25.3 ns per loop

wwaites (Member) replied:
On Mon, 10 Aug 2015 10:33:29 -0700, Jörn Hees [email protected] said:

> could also be written as
> `new_color.append((len(color.nodes),))`, which is a bit faster,
> but maybe premature optimization:

Creating a value with the right datatype instead of creating one with
the wrong datatype and then converting it isn't optimisation, it's
just doing things correctly.

Or so I think anyways...

-w

The PR author (Contributor) replied:

I wouldn't object to the change. I couldn't remember the syntax for a single-element tuple.

Jörn Hees (Member) replied:
@wwaites: well, the double-bracket removal (extra parentheses alone don't create a tuple) is one of the stumbling blocks in Python, so I can somewhat understand why some people prefer tuple([...]) instead of (...,). Without seeing it as much of a problem, I still changed it in b8df01f ;)
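To spell out the single-element-tuple pitfall the thread is about (a quick illustration, not part of the patch): parentheses alone are just grouping; only a trailing comma or an explicit `tuple(...)` call creates a tuple.

```python
n = 3

a = (n)          # still just the int 3: the parentheses are plain grouping
b = (n,)         # a one-element tuple: the trailing comma makes the tuple
c = tuple([n])   # also a one-element tuple, built from a one-element list

print(type(a).__name__, type(b).__name__, type(c).__name__)  # int tuple tuple
assert b == c == (3,)
```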


         color.nodes.remove(individual)
         c = Color([individual], self.hashfunc, tuple(new_color),
@@ -320,6 +322,7 @@ def _refine(self, coloring, sequence):
                     sequence = sequence[:si] + colors + sequence[si+1:]
                 except ValueError:
                     sequence = colors[1:] + sequence
+
         return coloring

     @_runtime("to_hash_runtime")
@@ -407,7 +410,6 @@ def _traces(self, coloring, stats=None, depth=[0]):
                 stats['prunings'] += 1
         discrete = [x for x in best if self._discrete(x)]
         if len(discrete) == 0:
-            very_best = None
             best_score = None
             best_depth = None
             for coloring in best:
@@ -434,6 +436,7 @@ def canonical_triples(self, stats=None):
         if stats is not None:
             stats['initial_coloring_runtime'] = _total_seconds(datetime.now() - start_coloring)
             stats['initial_color_count'] = len(coloring)
+
         if not self._discrete(coloring):
             depth = [0]
             coloring = self._traces(coloring, stats=stats, depth=depth)
2 changes: 2 additions & 0 deletions test/test_canonicalization.py
@@ -144,7 +144,9 @@ def negative_graph_match_test():
     def fn(rdf1, rdf2, identical):
         digest1 = get_digest_value(rdf1,"text/turtle")
         digest2 = get_digest_value(rdf2,"text/turtle")
+        print rdf1
         print digest1
+        print rdf2
         print digest2
         assert (digest1 == digest2) == identical
     for inputs in testInputs:
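For completeness, this is roughly how the canonicalization code exercised by these tests is reached through rdflib's public API (a sketch using `rdflib.compare`, not taken from this PR): two graphs that differ only in blank-node labels should compare equal once canonicalized, and the hash fix above makes structurally different graphs far less likely to collide.

```python
from rdflib import Graph
from rdflib.compare import isomorphic, to_isomorphic

# Two Turtle documents that are the same graph up to blank-node relabelling.
data_a = """
@prefix ex: <http://example.org/> .
_:x ex:p _:y .
"""
data_b = """
@prefix ex: <http://example.org/> .
_:a ex:p _:b .
"""

g1 = Graph().parse(data=data_a, format="turtle")
g2 = Graph().parse(data=data_b, format="turtle")

# Both helpers go through the coloring/hashing code touched by this PR;
# graphs differing only in bnode labels compare as equal.
assert isomorphic(g1, g2)
assert to_isomorphic(g1) == to_isomorphic(g2)
```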