TCF output of web services not recognised by WebLicht as TCF #9

Open
berndmoos opened this issue Oct 26, 2018 · 16 comments

@berndmoos
Collaborator

... and this makes it impossible to really use the TCF services on converted ISO/TEI data.

I suspect the reason is the MIME type. The metadata for isotei2tcf (https://corpora.uni-hamburg.de/hzsk/de/islandora/object/webservice:isotei2tcfconverter-0.9/datastream/CMDI) specifies the following as output:

application/xml;format-variant=weblicht-tcf

This is what we wanted, but didn't get (see issue #6). In […], @produces is given as:

text/tcf+xml

I think this is what the metadata should use as the MIME type for the output. Can somebody change that?

Likewise, in...

https://corpora.uni-hamburg.de/hzsk/de/islandora/object/webservice:tcf2isoteiconverter-0.9/datastream/CMDI

... the input MIME type should be changed accordingly.
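To check which of the two MIME types a CMDI record actually declares, a small Python sketch along these lines could help. It deliberately makes no assumption about CMDI element names: it just scans every element's text and attribute values for the two candidate strings. The helper name `find_mime_types` and the sample XML are hypothetical; for a real check, pass in the XML downloaded from one of the CMDI datastream URLs above.

```python
import xml.etree.ElementTree as ET

# The two candidate MIME types discussed in this issue.
CANDIDATES = ("application/xml;format-variant=weblicht-tcf", "text/tcf+xml")


def find_mime_types(cmdi_xml: str) -> set[str]:
    """Return the candidate MIME types that occur anywhere in a CMDI record.

    Hypothetical helper: it does not rely on the CMDI schema, it only
    compares element text and attribute values against CANDIDATES.
    """
    found = set()
    root = ET.fromstring(cmdi_xml)
    for elem in root.iter():
        values = list(elem.attrib.values())
        if elem.text:
            values.append(elem.text.strip())
        for value in values:
            if value in CANDIDATES:
                found.add(value)
    return found


if __name__ == "__main__":
    # Made-up minimal record for illustration, NOT the real CMDI schema.
    sample = '<CMD><Output><Parameter name="type" value="text/tcf+xml"/></Output></CMD>'
    print(find_mime_types(sample))  # prints {'text/tcf+xml'}
```

If the result still contains `application/xml;format-variant=weblicht-tcf`, the record has not been updated (or not yet re-harvested).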

@flammie
Contributor

flammie commented Oct 29, 2018

Should be text/tcf+xml now.

@berndmoos
Collaborator Author

Waiting for the change to take effect... Stay tuned.

@berndmoos
Collaborator Author

The change does not seem to be recognised by WebLicht. The monitoring page says that "7 services were retained" at the last harvest (https://weblicht.sfs.uni-tuebingen.de/harvester/resources/report). I suspect some action has to be taken so that the services are updated instead of just retained.

@berndmoos
Collaborator Author

Asked a question on the list...

@berndmoos
Collaborator Author

The output mime type is changed now...

... and it is the same as for other services with TCF as an output...

... but WebLicht still does not offer other services with TCF as input.

@berndmoos
Collaborator Author

I guess the TCF converter is somehow underspecified in the CMDI. We may need to add lang etc., see http://weblicht.sfs.uni-tuebingen.de/comet/editor.jsp?id=1541449788338

@berndmoos
Collaborator Author

@flammie
Contributor

flammie commented Nov 6, 2018

The links have expired. I added the lang parameter "de", but I haven't forced re-indexing yet.

@berndmoos
Collaborator Author

This one should be a model for specifying the output parameters:

http://weblicht.sfs.uni-tuebingen.de/fedora/objects/WLWS:3/datastreams/CMDI/content

@flammie
Contributor

flammie commented Nov 12, 2018

OK, I copy-pasted that for a test.

@berndmoos
Collaborator Author

I think it would be more efficient if HZSK could test the changes directly.
Here's a recipe for testing:

(0) Modify the CMDI and wait until WebLicht has harvested it (this should take around 2 hours, according to Tübingen)
(1) Go to WebLicht at https://weblicht.sfs.uni-tuebingen.de/
(2) Click "Start", log in, then click "Start" again
(3) Choose "Upload a file" and pick an EXMARaLDA Basic Transcription (*.exb); I use RudiVoellerWutausbruch.exb
(4) Pick the appropriate segmentation algorithm and language; in my case: "hiat" and "deutsch"
(5) Check "Show tools with status: development"
(6) Add the service "IDS, HZSK: EXMARaLDA to ISO/TEI converter" to the chain
(7) Add the service "IDS, HZSK: ISO/TEI to TCF" to the chain

What we want is for WebLicht to then offer TCF-based services for the next step. Currently, no services are offered.

@flammie
Contributor

flammie commented Nov 12, 2018

Excellent idea. I've played around a bit now, and I think it might be the language thing, but I still can't get the languages to work: with other chains the boxes contain languages, but here it just goes from "deutsch" to nothing to "unknown", even though I copied the input and output parameters. I will continue experimenting...

@flammie
Contributor

flammie commented Nov 13, 2018

The language alone didn't fix it, but adding version or "text" did.

@berndmoos
Collaborator Author

Better, but not quite there yet. What WebLicht now offers is a bunch of tokenizers, although the TCF is already tokenized. We'll probably have to add "sentences" and "tokens" to the output as well...

@flammie
Contributor

flammie commented Nov 14, 2018

Now the output is text, sentences, tokens, and IMS morphology works, at least for a small trivial file.
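Judging from this thread, WebLicht picks the next services based on the layers declared in the CMDI output parameters, which is why only tokenizers were offered while "sentences" and "tokens" were missing. To cross-check the declaration against what the converter actually produces, a Python sketch like the following could list the layers of a converted file. It assumes the usual TCF layout (a `TextCorpus` element whose children are the layers, e.g. `text`, `tokens`, `sentences`) and matches on local names so namespace URIs are not hard-coded; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def tcf_layers(tcf_xml: str) -> list[str]:
    """Return the names of the layers under TextCorpus, in document order.

    Assumes the common TCF layout D-Spin/TextCorpus/<layer>; returns an
    empty list if no TextCorpus element is found.
    """
    root = ET.fromstring(tcf_xml)
    for elem in root.iter():
        if local_name(elem.tag) == "TextCorpus":
            return [local_name(child.tag) for child in elem]
    return []


if __name__ == "__main__":
    # Made-up minimal TCF-like document, for illustration only.
    sample = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
      <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
        <text>Hallo Welt.</text>
        <tokens><token ID="t1">Hallo</token><token ID="t2">Welt</token><token ID="t3">.</token></tokens>
        <sentences><sentence tokenIDs="t1 t2 t3"/></sentences>
      </TextCorpus>
    </D-Spin>"""
    layers = tcf_layers(sample)
    print(layers)  # ['text', 'tokens', 'sentences']
    # Layers this thread ended up declaring as output in the CMDI.
    missing = {"text", "sentences", "tokens"} - set(layers)
    print("missing:", sorted(missing))  # missing: []
```

Any layer reported as present in the file but absent from the CMDI output parameters is a candidate for the kind of underspecification discussed above.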

@berndmoos
Collaborator Author

It also works for my favourite test files, so I'd venture to say, this issue can be closed. However, there is a similar issue in the mirror operation, so I am opening a mirror issue: issue #10
