Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process ends unexpectedly on certain path queries #95

Closed
jeswr opened this issue Feb 24, 2023 · 15 comments
Closed

Process ends unexpectedly on certain path queries #95

jeswr opened this issue Feb 24, 2023 · 15 comments
Labels

Comments

@jeswr
Copy link
Member

jeswr commented Feb 24, 2023

Running the following on the latest version of the link-traversal package causes the process to end without logging anything. Contrastingly the empty array is produced if using the latest version of @comunica/query-sparql

import { QueryEngine } from '@comunica/query-sparql-link-traversal';

const query = `
PREFIX : <http://example.org/>

CONSTRUCT {
  :s :p ?rules .
} WHERE {
  <http://example.org/> (:p1|^:p2)/:p3 ?rules .
}
`

async function main() {
  const engine = new QueryEngine();
  const quads = await engine.queryQuads(query, { lenient: true, sources: [ 'http://example.org/' ] });
  console.log(await quads.toArray())
}

main();
@rubensworks
Copy link
Member

Could you follow the issue templates for future issues? Otherwise our auto-labeling and project pipeline doesn't catch it properly.

@rubensworks
Copy link
Member

Is there a public source over which this could be reproduced?

@jeswr
Copy link
Member Author

jeswr commented Feb 24, 2023

Invalid/empty resources such as http://example.org/ actually reproduce the results - and will terminate you simplify the path (for instance by removing the last part of the sequence).

I'll look into also putting up a minimal document to reproduce where there is at least one result.

@jeswr
Copy link
Member Author

jeswr commented Feb 28, 2023

Here is an example against your WebID

import { QueryEngine } from '@comunica/query-sparql-link-traversal';

const query = `
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://example.org/>

CONSTRUCT {
  <https://www.rubensworks.net/#me> :knowsSomeoneWithName ?name .
} WHERE {
  <https://www.rubensworks.net/#me> (foaf:knows|^foaf:knows)/foaf:name ?name .
}
`

async function main() {
  const engine = new QueryEngine();
  const quads = await engine.queryQuads(query, { lenient: true });
  console.log(await quads.toArray())
}

main();

It will go about a minute (presumably traversing) before eventually stopping withoutputting the quads array. If you listen to the data and end events you will see that a bunch of results are produced; it is just the end event that is not produced.

@rubensworks
Copy link
Member

Thanks!

@rubensworks
Copy link
Member

Can't reproduce the problem on my end it seems.
It's always going out of memory, even after increasing max memory.
So it looks like it just keeps on running on my end.

@rubensworks
Copy link
Member

Ok, can reproduce the problem with just http://example.org/ as source.

@rubensworks
Copy link
Member

It's definitely related to the streaming source, as setting aggregateStore to false fixes the problem.

@rubensworks
Copy link
Member

Just looked into this (a bit), and it looks like the problem is that the 'end' event is not emitted upwards to the top-level bindingsStream when lower-level streams are destroyed.
Modifying this line https://github.com/RubenVerborgh/AsyncIterator/blob/main/asynciterator.ts#L112
to newState >= exports.ENDED fixes the problem.
But this may be not the solution we want.

@jeswr
Copy link
Member Author

jeswr commented Mar 3, 2023

The lack of an end event seems to be expected behavior (see https://github.com/RubenVerborgh/AsyncIterator/blob/3171521759194b3ce463f989d8253e35978ef690/asynciterator.ts#L74).

I also ran a test against node streams and they also do not emit an end event (though they do emit a close event).

import { Readable } from 'stream';
const iterator = new Readable();

iterator.on('data', () => { console.log('data') });
iterator.on('end', () => { console.log('end') });
iterator.on('error', () => { console.log('error') });
iterator.on('close', () => { console.log('close') });

iterator.push('a');
iterator.push(null);

gives

data
end
close

wheras

import { Readable } from 'stream';
const iterator = new Readable();

iterator.on('data', () => { console.log('data') });
iterator.on('end', () => { console.log('end') });
iterator.on('error', () => { console.log('error') });
iterator.on('close', () => { console.log('close') });

iterator.push('a');
iterator.destroy();

gives

data
close

@jeswr
Copy link
Member Author

jeswr commented Mar 3, 2023

I think the greater issue here is that there is an error event that does not get forwarded somewhere (probably one created by a destruction in comunica/comunica#1164); hence why I suggested here that an error be thrown in the next mver of asynciterator if an error event is emitted without any error listeners

@jeswr
Copy link
Member Author

jeswr commented Mar 3, 2023

For this particular bug - I've narrowed the problem down to the fact that the stream returned by StreamingStore#match never emits an end or error event.

If you add the followng to the end of the match method in the StreamingStore then the results will end, and you will observe that no data end or error events are called in the http://example.org/ case

stream.on('data', () => { console.log('data') })
        stream.on('end', () => { console.log('end called in streaming store') })
        stream.on('error', () => { console.log('error called in streaming store') });

        const stream2 = new readable_stream_1.Readable({ objectMode: true });
        stream2.push(null)
        return stream2;

In turn this seems to be because #end is never called on the streaming store.

@rubensworks
Copy link
Member

I've narrowed the problem down to the fact that the stream returned by StreamingStore#match never emits an end or error event.

I don't think this is the cause of the bug.
It's simply due to the circular dependency with the MediatedLinked...Iterator, which will be closed together with the streams created by StreamingStore#match.

@rubensworks
Copy link
Member

One possible solution would be to make AsyncIterator also emit close event, like in Node streams.

@rubensworks
Copy link
Member

Problem is fixed (not released yet).
A close event was not needed after all. The semantics of destroy are compatible with what we need in Comunica.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

No branches or pull requests

2 participants