TCF output of web services not recognised by WebLicht as TCF #9
Comments
Should be text/tcf+xml now.
The change does not seem to be recognised by WebLicht. The monitoring page says that "7 services were retained" at the last harvest (https://weblicht.sfs.uni-tuebingen.de/harvester/resources/report). I suspect some action has to be taken so that the services are updated instead of just retained.
Asked a question on the list...
I guess the TCF converter is somehow underspecified in the CMDI. We may need to add lang etc., see http://weblicht.sfs.uni-tuebingen.de/comet/editor.jsp?id=1541449788338
The links have expired. I added the lang parameter de, but I haven't forced re-indexing yet.
This one should be a model for specifying the output parameters: http://weblicht.sfs.uni-tuebingen.de/fedora/objects/WLWS:3/datastreams/CMDI/content
OK, I copy-pasted that for a test.
I think it would be more efficient if HZSK could test the changes directly: (0) Modify the CMDI and wait until WebLicht has harvested it (should take around 2h according to Tübingen). What we want is that WebLicht then offers TCF-based services for the next step. Currently, no services are offered.
Excellent idea. I've played around a bit now. I think it might be the language thing, but I still can't get the languages to work: in other chains the boxes contain languages, but here it just goes from deutsch to nothing to unknown, even though I copied the input and output parameters. I will continue experimenting...
The language didn't fix it (alone), but adding version or "text" did.
Better, but not quite there yet. What WebLicht now offers is a bunch of tokenizers, although the TCF is already tokenized. We'll probably have to add "sentences" and "tokens" to the output as well...
Now it's text, sentences and tokens, and IMS morphology works, at least for a trivial small file.
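The behaviour described above (no services offered, then only tokenizers, then IMS morphology) is consistent with a simple subset check between what a service declares as output and what the next service requires as input. Below is a toy sketch of that idea, not WebLicht's actual matching logic. The layer names "text", "sentences" and "tokens" come from this thread; the class `ChainSketch`, the method `canFollow` and the assumed input requirements of a morphology service are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ChainSketch {
    /**
     * Toy rule (an assumption, not WebLicht's real algorithm): a service can be
     * offered as the next step if every layer it requires is declared upstream.
     */
    static boolean canFollow(Set<String> declaredOutput, Set<String> requiredInput) {
        return declaredOutput.containsAll(requiredInput);
    }

    static Set<String> layers(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // Hypothetical requirement: a morphology service needs tokenized input.
        Set<String> morphologyNeeds = layers("text", "tokens");

        // Converter metadata declaring only "text": morphology is not offered.
        System.out.println(canFollow(layers("text"), morphologyNeeds));

        // Converter metadata declaring text, sentences and tokens: it is offered.
        System.out.println(canFollow(layers("text", "sentences", "tokens"), morphologyNeeds));
    }
}
```

Under this model, declaring only "text" would leave tokenizers as the only plausible next step, which matches what was observed before "sentences" and "tokens" were added.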
It also works for my favourite test files, so I'd venture to say this issue can be closed. However, there is a similar issue in the mirror operation, so I am opening a mirror issue: issue #10
... and this makes it impossible to really use the TCF services on converted ISO/TEI data.
I suspect the reason is the mime type. The metadata for isotei2tcf (https://corpora.uni-hamburg.de/hzsk/de/islandora/object/webservice:isotei2tcfconverter-0.9/datastream/CMDI) specifies the following as output:
application/xml;format-variant=weblicht-tcf
This is what we wanted, but didn't get (see issue #6). In HZSK-CLARIN-Services/src/main/java/de/uni_hamburg/converters/IsoTeiConverter.java (line 287 in 2bc7e9e), the output is:
text/tcf+xml
I think this is what the metadata should use as mime type for the output. Can somebody change that?
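To make the mismatch concrete, here is a minimal sketch that compares the two strings quoted above by their bare media type, ignoring parameters such as `format-variant`. The class `MimeTypeCheck` and its methods are hypothetical helpers written for illustration. They are not part of HZSK-CLARIN-Services, and whether WebLicht compares types this way is an assumption.

```java
import java.util.Locale;

public class MimeTypeCheck {
    /** Returns the bare media type (type/subtype) without parameters, lower-cased. */
    static String baseType(String mediaType) {
        int semicolon = mediaType.indexOf(';');
        String base = semicolon >= 0 ? mediaType.substring(0, semicolon) : mediaType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** True if the type declared in the metadata names the same media type as the served one. */
    static boolean sameBaseType(String declared, String served) {
        return baseType(declared).equals(baseType(served));
    }

    public static void main(String[] args) {
        String declared = "application/xml;format-variant=weblicht-tcf"; // from the CMDI
        String served = "text/tcf+xml";                                   // from IsoTeiConverter.java

        System.out.println(sameBaseType(declared, served));       // false: the types differ
        System.out.println(sameBaseType("text/tcf+xml", served)); // true once the metadata is changed
    }
}
```

Even with the parameter stripped, `application/xml` and `text/tcf+xml` are different media types, so a consumer that matches on the type string has no way to treat the declared output as TCF.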
Likewise, in...
https://corpora.uni-hamburg.de/hzsk/de/islandora/object/webservice:tcf2isoteiconverter-0.9/datastream/CMDI
... the input mime type should change.