-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathREADME.bioc
58 lines (46 loc) · 2.38 KB
/
README.bioc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
## A BioC XML to A BioC XML implementation of MetaMapLite
The class gov.nih.nlm.nls.metamap.lite.BioCProcess allows MetaMapLite
to process BioC XML input and write the results in BioC XML.
### Using from Java
Below is an example using BioC Processing with MetaMapLite
// Initialize MetaMapLite
Properties defaultConfiguration = getDefaultConfiguration();
String configPropertiesFilename =
System.getProperty("metamaplite.propertyfile",
"config/metamaplite.properties");
Properties configProperties = new Properties();
// set any in-line properties here.
FileReader fr = new FileReader(configPropertiesFilename);
configProperties.load(fr);
fr.close();
configProperties.setProperty("metamaplite.semanticgroup",
"acab,anab,bact,cgab,dsyn,emod,inpo,mobd,neop,patf,sosy");
Properties properties =
Configuration.mergeConfiguration(configProperties,
defaultConfiguration);
BioCProcess process = new BioCProcess(properties);
// read BioC XML collection
Reader inputReader = new FileReader(inputFile);
BioCFactory bioCFactory = BioCFactory.newFactory("STANDARD");
BioCCollectionReader collectionReader =
bioCFactory.createBioCCollectionReader(inputReader);
BioCCollection collection = collectionReader.readCollection();
// Run named entity recognition on collection
BioCCollection newCollection = process.processCollection(collection);
// write out the annotated collection
File outputFile = new File(outputFilename);
Writer outputWriter = new PrintWriter(outputFile, "UTF-8");
BioCCollectionWriter collectionWriter = bioCFactory.createBioCCollectionWriter(outputWriter);
collectionWriter.writeCollection(newCollection);
outputWriter.close();
This process attempts to preserve any annotation present in the
original BioC XML input. Some annotations applied to BioC
structures before entity lookup, including `tokenization and
part-of-speech-tagging, are discarded before the writing out the final
annotated collection. This occurs in the entity lookup class
BioCEntityLookup (java package: gov.nih.nlm.nls.metamap.lite).
### Omissions in BioC version
The current version of the BioCProcess class does not call the
abbreviation detector or the negation detector. This should be
relatively simple to add (particularly the abbreviation detector) and
will probably be added in the next release.