API
ConceptNet 5.7 has a REST API at api.conceptnet.io where you can get data from ConceptNet in JSON-LD format. This is the easiest way to start using ConceptNet.
The API is read-only, so you interact with all of these URLs by retrieving them with the HTTP GET command -- you can do this in your Web browser or in your preferred programming language.
If you visit the API in a Web browser, you get a formatted, linked version of the JSON structures, which makes it convenient to browse. Try going to http://api.conceptnet.io/c/en/example right now to start learning about the API by example.
At the end of this page, we'll also describe the local Python API, which you can use if you've installed the code and run the build process.
To use ConceptNet's REST API from a programming language, all you need to do is make an HTTP request and parse the result from JSON. Every reasonable programming language has libraries that make this easy.
In Python, we recommend you use the requests library, which will make API requests a one-liner. Here's an example of exploring the API in Python:
>>> import requests
>>> obj = requests.get('http://api.conceptnet.io/c/en/example').json()
>>> obj.keys()
dict_keys(['view', '@context', '@id', 'edges'])
>>> len(obj['edges'])
20
>>> obj['edges'][2]
{'@id': '/a/[/r/IsA/,/c/en/example/n/,/c/en/information/n/]',
'dataset': '/d/wordnet/3.1',
'end': {'@id': '/c/en/information/n',
'label': 'information',
'language': 'en',
'sense_label': 'n',
'term': '/c/en/information'},
'license': 'cc:by/4.0',
'rel': {'@id': '/r/IsA', 'label': 'IsA'},
'sources': [{'@id': '/s/resource/wordnet/rdf/3.1',
'contributor': '/s/resource/wordnet/rdf/3.1'}],
'start': {'@id': '/c/en/example/n',
'label': 'example',
'language': 'en',
'sense_label': 'n',
'term': '/c/en/example'},
'surfaceText': '[[example]] is a type of [[information]]',
'weight': 2.0}
The nodes of ConceptNet are words and phrases of natural language. Each node has a URI within ConceptNet that starts with /c/ and a language code, such as /c/en/example.
If you know one of these URIs, then you also know the complete URL where you can look it up: just prepend http://api.conceptnet.io to the URI. The node for the word "example" can be found at http://api.conceptnet.io/c/en/example.
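As a small illustration of that rule, here is a sketch of a helper that turns a ConceptNet URI into its API URL. The names API_ROOT and api_url are made up for this example and are not part of ConceptNet's code.

```python
API_ROOT = "http://api.conceptnet.io"


def api_url(uri: str) -> str:
    """Return the API URL for a ConceptNet URI such as '/c/en/example'."""
    if not uri.startswith("/"):
        raise ValueError(f"ConceptNet URIs start with '/': {uri!r}")
    # The rule from the text: prepend the API root to the URI.
    return API_ROOT + uri


print(api_url("/c/en/example"))  # http://api.conceptnet.io/c/en/example
```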
When you look up a node, you get a structure that looks like this:
{
"@context": [
"http://api.conceptnet.io/ld/conceptnet5.7/context.ld.json"
],
"@id": "/c/en/example",
"edges": [...],
"view": {
"@id": "/c/en/example?offset=0&limit=20",
"firstPage": "/c/en/example?offset=0&limit=20",
"nextPage": "/c/en/example?offset=20&limit=20",
"paginatedProperty": "edges"
}
}
The actually interesting information is inside the edges list, which we'll discuss in a moment.
This is standard JSON, but some of the properties have @ at the beginning. These are properties that are there for the benefit of JSON-LD, a standard for Linked Data APIs. By using JSON-LD, we provide information that comes with metadata about what it means and how to retrieve it, and this information can be transformed, reused, and linked to other systems.
@context links to a file of information that helps JSON-LD tools understand the API, and also comes with comments that may be helpful to humans. The context file explains, in RDF and in English, what the @-less property names like "edges" and "view" mean.
Most of the objects returned from this API have an @id, which tells you the URI of that object. The @id at the top level is the URI you just looked up. You'll find more @ids inside the "edges" list that you can use to browse to related information.
Some objects come with a lot of information, and we don't want to return it all at once. The view object describes how a long list is paginated: it has an @id that links to the particular page of results that you're seeing, and firstPage, nextPage, previousPage, and lastPage values that link to various pages. "paginatedProperty": "edges" tells you that "edges" is the list that you're browsing page by page.
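To collect more than one page, you can keep following the nextPage link in the view object. The sketch below shows one way to do that. It assumes that the last page has no nextPage value (and that a short result may have no view at all), which this page does not state explicitly. The function name iter_edges and its fetch argument are hypothetical; fetch is any callable that takes a URL and returns the parsed JSON.

```python
from typing import Callable, Iterator, Optional

API_ROOT = "http://api.conceptnet.io"


def iter_edges(uri: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield the edges for `uri`, page by page, following view.nextPage."""
    next_uri: Optional[str] = uri
    while next_uri is not None:
        page = fetch(API_ROOT + next_uri)
        yield from page.get("edges", [])
        # Assumption: when there are no further pages, "nextPage" is absent.
        next_uri = page.get("view", {}).get("nextPage")


# With the real API (not run here), fetch could be built on requests:
#   import requests
#   for edge in iter_edges("/c/en/example", lambda url: requests.get(url).json()):
#       print(edge["@id"])
```

Passing fetch in as an argument keeps the paging logic separate from the HTTP library, so you can wrap it with whatever rate limiting or caching you use.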
There are three methods for accessing data through the ConceptNet 5 API: lookup, search, and association.
- Lookup is for when you know the URI of an object in ConceptNet, and want to see a list of edges that include it.
- Search finds a list of edges that match certain criteria.
- Association is for finding concepts similar to a particular concept or a list of concepts.
Inside the "edges" list, you'll find objects representing the edges of the graph -- units of knowledge that link this node to other nodes. Here's one:
{
"@id": "/a/[/r/UsedFor/,/c/en/example/,/c/en/explain/]",
"dataset": "/d/conceptnet/4/en",
"end": {
"@id": "/c/en/explain",
"label": "explain something",
"language": "en",
"term": "/c/en/explain"
},
"license": "cc:by-sa/4.0",
"rel": {
"@id": "/r/UsedFor",
"label": "UsedFor"
},
"sources": [
{
"@id": "/and/[/s/activity/omcs/omcs1_possibly_free_text/,/s/contributor/omcs/pavlos/]",
"activity": "/s/activity/omcs/omcs1_possibly_free_text",
"contributor": "/s/contributor/omcs/pavlos"
}
],
"start": {
"@id": "/c/en/example",
"label": "an example",
"language": "en",
"term": "/c/en/example"
},
"surfaceText": "You can use [[an example]] to [[explain something]]",
"weight": 1.0,
"@context": [
"http://api.conceptnet.io/ld/conceptnet5.5/context.ld.json",
"http://api.conceptnet.io/ld/conceptnet5.5/pagination.ld.json"
]
}
The @id of this edge is /a/[/r/UsedFor/,/c/en/example/,/c/en/explain/]. This complex-looking URI uniquely describes this edge in terms of the nodes it connects and how it connects them. You don't have to pull information out of this URI -- you'll find it in a more convenient form in the start, end, and rel properties.
"start": {
"@id": "/c/en/example",
"label": "an example",
"language": "en",
"term": "/c/en/example"
}
start and end point to nodes of ConceptNet. They contain an @id where you can look up all the information about that node. They also provide:
- A human-readable label, which may be a more complete phrase such as "an example" instead of just the word "example" that appears in the URI.
- language, the language code for what language the label is in (this is always the same as the language code that appears in its URI).
- term, a link to the most general version of this term. In many cases this is just the same URI. If you've looked up a particular sense, such as the noun sense of "example" at /c/en/example/n, this links to the more general /c/en/example.
"rel": {
"@id": "/r/UsedFor",
"label": "UsedFor"
}
rel describes one of the 40-ish defined relations that connect the nodes of ConceptNet. Relations are labeled with artificial names such as "UsedFor", which stay the same even as they describe information in different languages or from different data sources.
"surfaceText": "You can use [[an example]] to [[explain something]]"
Some of ConceptNet's data is extracted from natural-language text. The surfaceText value shows you what this text was.
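The properties described so far are enough to turn an edge into something compact. Here is a sketch that reads start, rel, and end into a triple and strips the [[...]] markers from surfaceText. The helper names are invented for this example, and treating a missing or empty surfaceText as None is an assumption, since only some edges come from natural-language text.

```python
import re
from typing import Optional, Tuple


def edge_triple(edge: dict) -> Tuple[str, str, str]:
    """Return (start URI, relation URI, end URI) for an edge object."""
    return (edge["start"]["@id"], edge["rel"]["@id"], edge["end"]["@id"])


def plain_surface_text(edge: dict) -> Optional[str]:
    """Return surfaceText without its [[...]] markers, or None if absent."""
    text = edge.get("surfaceText")
    if not text:
        return None  # assumption: edges without source text lack this value
    return re.sub(r"\[\[(.*?)\]\]", r"\1", text)


# An abbreviated copy of the edge shown above.
edge = {
    "start": {"@id": "/c/en/example", "label": "an example"},
    "rel": {"@id": "/r/UsedFor", "label": "UsedFor"},
    "end": {"@id": "/c/en/explain", "label": "explain something"},
    "surfaceText": "You can use [[an example]] to [[explain something]]",
}
print(edge_triple(edge))         # ('/c/en/example', '/r/UsedFor', '/c/en/explain')
print(plain_surface_text(edge))  # You can use an example to explain something
```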
"sources": [
{
"@id": "/and/[/s/activity/omcs/omcs1_possibly_free_text/,/s/contributor/omcs/pavlos/]",
"activity": "/s/activity/omcs/omcs1_possibly_free_text",
"contributor": "/s/contributor/omcs/pavlos"
}
]
sources tells you why ConceptNet believes this information. Each edge comes from one or more sources. Each of these sources is an object with its own @id, but this ID just contains the information in the rest of the object (a redundancy we put up with to make the RDF data nicer).
A source can describe various factors that combined to provide this knowledge, which can include: a contributor, representing a person participating in a crowd-sourced site; an activity, indicating what in particular they were doing; or an automated process that extracted knowledge.
In this case, the source is a contributor named "pavlos", who was typing information into the original Open Mind Common Sense (OMCS) Web site long ago. Our bookkeeping from back then isn't perfect, but we believe they were typing a sentence into OMCS's free-form text box, so we summarize this as omcs1_possibly_free_text.
"license": "cc:by-sa/4.0"
The license value tells you how the information you retrieved may be reused. It isn't a link to our API -- instead, it's a link to Creative Commons's linked-data API. The prefix cc: is defined in the context file to refer to http://creativecommons.org/licenses/ . This means you can find the exact license information for this edge, readable by humans and computers, at http://creativecommons.org/licenses/by-sa/4.0.
(Some edges are available without the ShareAlike requirement, but you have to follow the ShareAlike requirement to re-use ConceptNet as a whole. See Copying and sharing ConceptNet.)
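If you want the full license URL in code, you can expand the cc: prefix yourself. This sketch hardcodes the prefix mapping quoted above instead of reading it from the context file, which a real JSON-LD tool would do. The function name license_url is made up for the example.

```python
CC_PREFIX = "http://creativecommons.org/licenses/"


def license_url(value: str) -> str:
    """Expand a 'cc:' license value from an edge into a full URL."""
    prefix, sep, rest = value.partition(":")
    if sep and prefix == "cc":
        return CC_PREFIX + rest
    # Assumption: values with any other prefix are returned unchanged.
    return value


print(license_url("cc:by-sa/4.0"))  # http://creativecommons.org/licenses/by-sa/4.0
```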
"weight": 1.0
The weight value says how believable the information is. A typical weight is 1.0, and the number is higher when the information comes from more sources or more reliable sources.
The dataset value is used in bookkeeping for how ConceptNet is built. It's not very important.
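One common use of the weight value is to rank edges, or to drop the weakest ones. The following is a minimal sketch, assuming every edge carries a numeric weight as in the examples on this page. The default threshold of 1.0 is an arbitrary choice for illustration, not a recommendation from ConceptNet.

```python
from typing import Iterable, List


def strongest_edges(edges: Iterable[dict], min_weight: float = 1.0) -> List[dict]:
    """Return edges with weight >= min_weight, most believable first."""
    kept = [edge for edge in edges if edge["weight"] >= min_weight]
    return sorted(kept, key=lambda edge: edge["weight"], reverse=True)


sample = [
    {"@id": "a", "weight": 1.0},
    {"@id": "b", "weight": 2.0},
    {"@id": "c", "weight": 0.5},
]
print([edge["@id"] for edge in strongest_edges(sample)])  # ['b', 'a']
```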
The URIs for terms (also known as "concepts") start with /c/, and follow a hierarchy from languages, to terms, to senses of terms with a particular part of speech.
Consider the term /c/it/esempio/n. This represents the Italian noun "esempio".
/c/it/esempio represents the term "esempio" in Italian, whether it is a noun or not. You'll still find the results from /c/it/esempio/n when you browse it. Any term URI implicitly contains all its more specific URIs.
/c/it represents all of ConceptNet's knowledge in Italian. You can browse there for a sample of things ConceptNet knows in Italian. Somewhere in the long, paginated list will be facts about /c/it/esempio/n.
The hierarchy stops there: /c is not a URI that you can use.
For more information about what various URIs mean, see URI hierarchy.
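The hierarchy described above can be walked by dropping one path component at a time. This sketch lists a term URI together with each more general URI that contains it, stopping at the language level. The function name more_general_uris is hypothetical.

```python
from typing import List


def more_general_uris(uri: str) -> List[str]:
    """List a term URI and the more general URIs above it, most specific first."""
    parts = uri.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "c":
        raise ValueError(f"expected a term URI like /c/it/esempio, got {uri!r}")
    # Stop at /c/<language>: plain /c is not a usable URI.
    return ["/" + "/".join(parts[:n]) for n in range(len(parts), 1, -1)]


print(more_general_uris("/c/it/esempio/n"))
# ['/c/it/esempio/n', '/c/it/esempio', '/c/it']
```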
Given a word or phrase of natural language text, you can probably figure out by now what its ConceptNet URI is. You replace spaces with _, and you stick a language code on the front. The phrase "french toast" is at /c/en/french_toast.
This used to be more complicated in previous versions of ConceptNet.
If you want this spelled out for you, perhaps because you're a computer consuming a Linked Data API and you're very literal-minded, you can send a GET request to http://api.conceptnet.io/uri with these parameters:
- text: the text
- language: the language this text is in.
Because this is a GET request, these parameters end up encoded in the URL: http://api.conceptnet.io/uri?language=en&text=french+toast
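Both routes can be sketched in a few lines of Python. The first function applies only the rule stated above (spaces become underscores, language code on the front) and does not attempt any other normalization the server might apply. The second builds the /uri request URL so that the API can give the authoritative answer. Both function names are invented for this example.

```python
from urllib.parse import urlencode

API_ROOT = "http://api.conceptnet.io"


def simple_term_uri(text: str, language: str) -> str:
    """Apply only the rule described above: spaces become underscores."""
    return f"/c/{language}/{text.replace(' ', '_')}"


def uri_lookup_url(text: str, language: str) -> str:
    """Build the /uri request URL that asks the API for the term's URI."""
    # urlencode escapes the values; a space is encoded as '+'.
    query = urlencode({"language": language, "text": text})
    return f"{API_ROOT}/uri?{query}"


print(simple_term_uri("french toast", "en"))
# /c/en/french_toast
print(uri_lookup_url("french toast", "en"))
# http://api.conceptnet.io/uri?language=en&text=french+toast
```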
Other objects in ConceptNet also have URIs.
As indicated before, you can look up a single edge by its URI: /a/[/r/UsedFor/,/c/en/example/,/c/en/explain/]
You can look up a relation to see examples of edges that use it: /r/Antonym
You can look up a source to see the edges that they contributed: /s/contributor/omcs/dev
To filter for specific information, you can give parameters to http://api.conceptnet.io/query, which gives you a list of matching edges.
You can specify any of the following parameters:
- start: a URI that the "start" or "subject" position must match.
- end: a URI that the "end" or "object" position must match.
- rel: a relation.
- node: a URI that must match either the start or the end.
- other: a URI that must match either the start or the end, and be different from node.
- sources: a URI that must match one of the sources of the edge.
To see all relations that connect "dog" and "bark": /query?node=/c/en/dog&other=/c/en/bark
To see what the original OMCS dev team said about ferrets: /query?node=/c/en/ferret&sources=/s/contributor/omcs/dev
To see assertions about cats (猫) that are entirely in Japanese: /query?node=/c/ja/猫&other=/c/ja
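When you build these query URLs in code, it helps to let a library handle the escaping. The sketch below assembles a /query URL from keyword arguments and rejects parameter names that are not in the list above. The name query_url is hypothetical. Accepting limit and offset as well is an assumption based on the limit parameter and the offset/limit page links that appear elsewhere on this page.

```python
from urllib.parse import urlencode

API_ROOT = "http://api.conceptnet.io"
FILTER_PARAMS = {"start", "end", "rel", "node", "other", "sources"}
PAGING_PARAMS = {"offset", "limit"}  # assumption: accepted by /query as well


def query_url(**criteria) -> str:
    """Build a /query URL from the filter parameters listed above."""
    unknown = set(criteria) - FILTER_PARAMS - PAGING_PARAMS
    if unknown:
        raise ValueError(f"unsupported query parameters: {sorted(unknown)}")
    # safe='/' keeps ConceptNet URIs readable instead of escaping each slash.
    return f"{API_ROOT}/query?{urlencode(criteria, safe='/')}"


print(query_url(node="/c/en/dog", other="/c/en/bark"))
# http://api.conceptnet.io/query?node=/c/en/dog&other=/c/en/bark
```

Non-ASCII terms such as /c/ja/猫 are percent-encoded as UTF-8 by urlencode, which is the usual form for sending them in a URL.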
Here's an example in Python of using the API to get all external Linked Data items that are connected to the ConceptNet term "apple":
>>> import requests
>>> response = requests.get('http://api.conceptnet.io/query?start=/c/en/apple&rel=/r/ExternalURL&limit=1000')
>>> obj = response.json()
>>> [edge['end']['@id'] for edge in obj['edges']]
['http://dbpedia.org/resource/Apple',
'http://wikidata.dbpedia.org/resource/Q89',
'http://sw.opencyc.org/2012/05/10/concept/en/Apple',
'http://wordnet-rdf.princeton.edu/wn31/107755101-n',
'http://wordnet-rdf.princeton.edu/wn31/112654755-n',
'http://en.wiktionary.org/wiki/apple',
'http://en.wiktionary.org/wiki/Apple',
'http://fr.wiktionary.org/wiki/apple']
The /related endpoint uses word embeddings built from ConceptNet and other inputs to find related terms. The embeddings are a version of ConceptNet Numberbatch, with a reduced vocabulary that makes it more reasonable to load on the server.
To see terms related to "tea kettle": /related/c/en/tea_kettle
{
"@id": "/c/en/tea_kettle",
"related": [
{
"@id": "/c/en/tea_kettle",
"weight": 1.0
},
{
"@id": "/c/en/teakettle",
"weight": 0.771
},
{
"@id": "/c/nl/ketel",
"weight": 0.723
},
{
"@id": "/c/ja/釜",
"weight": 0.718
},
{
"@id": "/c/zh/水壶",
"weight": 0.712
},
...
]
}
By default, this will return results in any of the 10 core languages of ConceptNet. You can filter it for a specific language using the filter parameter, which specifies a URI that results must match. Thus, the closest English terms to "tea kettle" can be found at /related/c/en/tea_kettle?filter=/c/en.
The /relatedness API is like /related, but instead of ranking the top related terms to your query, it returns the relatedness value for a particular pair of terms.
Example: /relatedness?node1=/c/en/tea_kettle&node2=/c/en/coffee_pot
{
"@id": "/relatedness?node1=/c/en/tea_kettle&node2=/c/en/coffee_pot",
"value": 0.543
}
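Here is a sketch of how the two endpoints might be used from Python. It builds the request URLs and reads a /related response, skipping the entry for the query term itself (which, in the example above, comes back with weight 1.0). All three function names are made up for this example, and the response layout is taken only from the samples shown on this page.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

API_ROOT = "http://api.conceptnet.io"


def related_url(uri: str, language_filter: Optional[str] = None) -> str:
    """Build a /related URL, optionally restricted with the filter parameter."""
    url = f"{API_ROOT}/related{uri}"
    if language_filter is not None:
        url += "?" + urlencode({"filter": language_filter}, safe="/")
    return url


def relatedness_url(node1: str, node2: str) -> str:
    """Build a /relatedness URL for a pair of terms."""
    return f"{API_ROOT}/relatedness?" + urlencode(
        {"node1": node1, "node2": node2}, safe="/"
    )


def other_related_terms(response: dict) -> List[Tuple[str, float]]:
    """Return (URI, weight) pairs from a /related response, minus the query term."""
    return [
        (item["@id"], item["weight"])
        for item in response["related"]
        if item["@id"] != response["@id"]
    ]


print(related_url("/c/en/tea_kettle", "/c/en"))
# http://api.conceptnet.io/related/c/en/tea_kettle?filter=/c/en
print(relatedness_url("/c/en/tea_kettle", "/c/en/coffee_pot"))
# http://api.conceptnet.io/relatedness?node1=/c/en/tea_kettle&node2=/c/en/coffee_pot
```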
If you have the ConceptNet code and data installed on your computer, you can also access this API through Python, using the AssertionFinder.lookup() and AssertionFinder.query() methods.
>>> from conceptnet5.db.query import AssertionFinder
>>> cnfinder = AssertionFinder()
>>> cnfinder.lookup('/c/en/example')
[... lots of edges ...]
>>> cnfinder.query({'node': '/c/en/example'})
[... the same edges ...]
In ConceptNet 5.5, we refactored the API to follow JSON-LD format and to make it obvious what the URI of every term is. ConceptNet 5.1 through 5.4 used a slightly different API, whose historical documentation you can find here.
You can make 3600 requests per hour to the ConceptNet API, with bursts of 120 requests per minute allowed. The /related and /relatedness endpoints count as two requests when you call them.
This means you should design your usage of the API to average less than 1 request per second.
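One simple way to stay under that average is to space requests out on the client side. The class below is a sketch under stated assumptions, not an official client: it charges one unit per request (two for /related and /relatedness), waits until enough time has passed, and deliberately ignores the burst allowance so that it errs on the cautious side. RequestBudget and wait_for are hypothetical names.

```python
import time


class RequestBudget:
    """Space out calls so the average stays at 3600 request units per hour."""

    def __init__(self, units_per_hour=3600, clock=time.monotonic, sleep=time.sleep):
        self.seconds_per_unit = 3600.0 / units_per_hour
        self.clock = clock
        self.sleep = sleep
        self.next_free = clock()  # earliest time the next request may start

    def wait_for(self, path: str) -> float:
        """Block until a request to `path` is allowed; return the seconds slept."""
        # /related and /relatedness (same prefix) count as two requests each.
        cost = 2 if path.startswith("/related") else 1
        now = self.clock()
        delay = max(0.0, self.next_free - now)
        if delay:
            self.sleep(delay)
        # Reserve the time this request uses up before the next one may start.
        self.next_free = max(now, self.next_free) + cost * self.seconds_per_unit
        return delay


# Usage (not run here): call budget.wait_for(path) before each HTTP request.
#   budget = RequestBudget()
#   budget.wait_for("/c/en/example")
```

The clock and sleep arguments exist so that the timing logic can be exercised without actually waiting.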