The Citation Extractor Service (CES) is a Service Node in a Value-Adding Network.
CES watches an LDN Inbox for as:Offer notification messages about newly published PDF documents. For each PDF document, a citation extraction process is executed. The results of this citation extraction process are sent back to the requestor of the offer.
CES is composed of the following components:
- LDN Inbox - The LDN inbox endpoint from which CES receives incoming notifications about new publications
- Koreografeye - The koreografeye reasoning engine
- Citation processors - Software to fetch PDF files and extract citations and mentions
- N3 rules - Rules that define what to process and how to respond
CES has the following dependencies:
- Node v16.18.1
- OAI-Bridge - https://github.com/MellonScholarlyCommunication/OAI-Bridge
Install the dependencies and build the project:
yarn install
yarn build
Example of an as:Offer notification addressed to CES:
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "urn:uuid:775adc40-3cdf-4a84-827a-b74ca4485800",
  "type": [
    "Offer",
    "http://example.org/CitationExtraction"
  ],
  "actor": {
    "id": "https://biblio.ugent.be/profile/card#me",
    "type": "Service",
    "inbox": "http://n062-07.wall2.ilabt.iminds.be:3000/inbox/",
    "name": "Ghent University Academic Bibliography"
  },
  "object": {
    "id": "https://biblio.ugent.be/publication/8655843/file/8655844.pdf",
    "type": [
      "Article",
      "https://schema.org/ScholarlyArticle"
    ]
  },
  "origin": {
    "id": "https://github.com/MellonScholarlyCommunication/OAI-Bridge/profile/card#me",
    "type": "Service",
    "name": "OAI-Bridge Demo Service"
  },
  "target": {
    "id": "http://n062-07.wall2.ilabt.iminds.be:3001/profile/card#me",
    "type": "Service",
    "inbox": "http://n062-07.wall2.ilabt.iminds.be:3001/inbox/",
    "name": "Citation Extraction Service"
  }
}
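To make the structure of this notification concrete, the following is a minimal TypeScript sketch of how a consumer might pull out the fields needed for citation extraction. It is not taken from the CES code base: the type names, the function `toExtractionRequest`, and the choice to reply to `actor.inbox` are assumptions made for illustration. In CES itself these decisions are expressed in N3 rules.

```typescript
// Hypothetical types covering only the fields used in this sketch.
interface ServiceAgent {
  id: string;
  type?: string;
  inbox?: string;
  name?: string;
}

interface OfferNotification {
  id: string;
  type: string | string[];
  actor: ServiceAgent;
  object: { id: string; type?: string | string[] };
}

interface ExtractionRequest {
  offerId: string;
  pdfUrl: string;
  replyInbox: string;
}

// JSON-LD allows "type" to be a single string or an array of strings.
function typesOf(value: string | string[]): string[] {
  return Array.isArray(value) ? value : [value];
}

// Returns null when the notification is not an Offer or names no reply inbox.
// Assumption: the reply goes to the inbox of the actor that made the offer.
function toExtractionRequest(n: OfferNotification): ExtractionRequest | null {
  if (typesOf(n.type).indexOf("Offer") === -1) return null;
  const inbox = n.actor.inbox;
  if (!inbox) return null;
  return { offerId: n.id, pdfUrl: n.object.id, replyInbox: inbox };
}

// Values copied from the example notification.
const sampleOffer: OfferNotification = {
  id: "urn:uuid:775adc40-3cdf-4a84-827a-b74ca4485800",
  type: ["Offer", "http://example.org/CitationExtraction"],
  actor: {
    id: "https://biblio.ugent.be/profile/card#me",
    type: "Service",
    inbox: "http://n062-07.wall2.ilabt.iminds.be:3000/inbox/",
    name: "Ghent University Academic Bibliography",
  },
  object: {
    id: "https://biblio.ugent.be/publication/8655843/file/8655844.pdf",
    type: ["Article", "https://schema.org/ScholarlyArticle"],
  },
};

console.log(toExtractionRequest(sampleOffer));
```

Normalising `type` to an array first keeps the check working for notifications that carry a single type string as well as for those that list several types.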
Definition of all orchestration and policy execution plugins that will be used.
- urn:koreografeye:reasonerInstance - Definition of the N3 reasoner component
- http://example.org/sendNotification - Definition of the LDN sender component
- http://example.org/extractCitations - Definition of the PDF citation extraction component
- http://example.org/serializeAs - Definition of the N3 store serialization component
An N3 rule file that requests:
- for each PDF file that is found in as:object/as:url, the extraction of citations and mentions
- the discovery of the LDN inbox for each of these citations
- the generation of a new input file for reasoning
An N3 rule file that:
- sends the parsed citations as a service result to the requesting data node
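As an illustration of what such a service result could look like, here is a hedged TypeScript sketch that builds a reply to an as:Offer. It assumes the reply is an Activity Streams `Announce` that points back to the offer via `inReplyTo`, which is a common pattern in Event Notifications. The actual shape of the CES result is determined by its N3 rules and may differ. The function `buildReply`, the type names, and the result URL are hypothetical.

```typescript
// Hypothetical types for this sketch only.
interface ServiceRef {
  id: string;
  type: string;
  inbox: string;
  name?: string;
}

interface OfferSummary {
  id: string;          // id of the original as:Offer
  actor: ServiceRef;   // the requestor that made the offer
  objectId: string;    // the PDF the offer was about
}

interface ReplyNotification {
  "@context": string;
  id: string;
  type: string;
  actor: ServiceRef;
  origin: ServiceRef;
  context: string;
  inReplyTo: string;
  object: { id: string; type: string };
  target: ServiceRef;
}

// Builds a reply addressed to the requestor of the offer.
// Assumption: the result is published at resultUrl and announced by the service.
function buildReply(
  offer: OfferSummary,
  service: ServiceRef,
  replyId: string,
  resultUrl: string
): ReplyNotification {
  return {
    "@context": "https://www.w3.org/ns/activitystreams",
    id: replyId,
    type: "Announce",
    actor: service,
    origin: service,
    context: offer.objectId,
    inReplyTo: offer.id,
    object: { id: resultUrl, type: "Document" },
    target: offer.actor,
  };
}

const reply = buildReply(
  {
    id: "urn:uuid:775adc40-3cdf-4a84-827a-b74ca4485800",
    actor: {
      id: "https://biblio.ugent.be/profile/card#me",
      type: "Service",
      inbox: "http://n062-07.wall2.ilabt.iminds.be:3000/inbox/",
    },
    objectId: "https://biblio.ugent.be/publication/8655843/file/8655844.pdf",
  },
  {
    id: "http://n062-07.wall2.ilabt.iminds.be:3001/profile/card#me",
    type: "Service",
    inbox: "http://n062-07.wall2.ilabt.iminds.be:3001/inbox/",
    name: "Citation Extraction Service",
  },
  "urn:uuid:00000000-0000-4000-8000-000000000001", // placeholder id
  "http://example.org/results/1"                   // placeholder result URL
);

console.log(JSON.stringify(reply, null, 2));
```

The reply id is passed in as a parameter so that the sketch stays free of any particular UUID library.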
By starting the CES Solid CSS server, we create an LDN endpoint on port 3000 on localhost.
Open a new terminal and type:
yarn solid
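Once the server is running, any LDN sender can deliver a notification to the inbox. The sketch below shows, under stated assumptions, what such a delivery consists of: per the Linked Data Notifications specification, a sender issues an HTTP POST with a JSON-LD body and the `application/ld+json` content type. The helper `buildInboxPost` and its return type are hypothetical and only describe the request. Actually sending it is left to whichever HTTP client is available.

```typescript
// Hypothetical description of an LDN delivery request.
interface InboxPost {
  url: string;
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

// Describes the HTTP request an LDN sender would make to an inbox.
function buildInboxPost(inboxUrl: string, notification: object): InboxPost {
  return {
    url: inboxUrl,
    method: "POST",
    headers: { "Content-Type": "application/ld+json" },
    body: JSON.stringify(notification),
  };
}

// Inbox URL from the text above; the notification is trimmed for brevity.
const inboxPost = buildInboxPost("http://localhost:3000/inbox/", {
  "@context": "https://www.w3.org/ns/activitystreams",
  id: "urn:uuid:775adc40-3cdf-4a84-827a-b74ca4485800",
  type: "Offer",
});

console.log(inboxPost.method, inboxPost.url);
console.log(inboxPost.body);
```

An LDN receiver that accepts the notification answers with `201 Created` and a `Location` header pointing at the stored notification.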
OAI-Bridge is a project that bridges the OAI-PMH protocol and the Event Notifications in Value-Adding Networks protocol.
In our examples data from https://biblio.ugent.be (Biblio) will be used.
yarn bridge:demo
After this step, the Solid server at http://localhost:3000/inbox/ contains some incoming as:Announce notifications about PDF resources published in the Biblio repository.
In the next step, CES will read the LDN Inbox of the Solid instance. For each incoming as:Announce, the N3 rules in rules/extractCitations will define what the next processing steps will be for that notification message. The results will be written to the pre/ directory.
yarn extract:prepare
The pre/ directory will now contain the required processing steps for each as:Announce notification. These processing steps ("policies") will be executed in the next step.
In this processing step, CES will do the actual PDF extraction of citations and generate a new output file in the in directory for each processed notification.
yarn extract:run
In this processing step, CES will use the N3 rules in biblio/sendCitationNotifications.n3 to decide what to do with the citations found in the previous step.
yarn send:prepare
The results of this step will be in the out directory.
In this processing step, the results in the out directory will be executed by the policy executor. In our demo, the citations will be sent back to the requestor of the as:Offer.
yarn send:run
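The four processing commands form a chain in which each step consumes what the previous one produced. The following TypeScript sketch records that chain as data and checks that it is unbroken. The `reads` and `writes` values reflect one reading of the description above (in particular, that `yarn send:prepare` reads the in directory), and the `Stage` type and `isChained` function are hypothetical.

```typescript
// Hypothetical record of one pipeline step.
interface Stage {
  command: string;
  reads: string;
  writes: string;
}

// Directory names as used in the text; the first input and last output are endpoints.
const stages: Stage[] = [
  { command: "yarn extract:prepare", reads: "LDN inbox", writes: "pre" },
  { command: "yarn extract:run", reads: "pre", writes: "in" },
  { command: "yarn send:prepare", reads: "in", writes: "out" },
  { command: "yarn send:run", reads: "out", writes: "requestor inbox" },
];

// True when every stage reads what the previous stage wrote.
function isChained(list: Stage[]): boolean {
  for (let i = 1; i < list.length; i++) {
    if (list[i].reads !== list[i - 1].writes) return false;
  }
  return true;
}

for (const s of stages) {
  console.log(s.command + ": " + s.reads + " -> " + s.writes);
}
console.log("chained:", isChained(stages));
```

Running the commands out of order breaks this chain, for example by executing policies from a pre/ directory that has not been populated yet.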