error because of wrong file name? #1

Closed
JanOdijk opened this issue Dec 1, 2023 · 4 comments

JanOdijk commented Dec 1, 2023

In project
https://webservices.cls.ru.nl/alpino/1130/
an error is reported, perhaps because the filename 1130.txt is not OK?

Relevant part of the log file (https://webservices.cls.ru.nl/alpino/1130/output/error.log):

**** parsed 3 (line number 3)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/alpino_webservice/alpino_wrapper.py", line 102, in <module>
    doc = alpino2folia.makefoliadoc(foliafile)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/foliatools/alpino2folia.py", line 55, in makefoliadoc
    foliadoc = folia.Document(id=baseid, processor=processor)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/folia/main.py", line 7397, in __init__
    isncname(kwargs['id'])
  File "/usr/local/lib/python3.11/dist-packages/folia/main.py", line 9296, in isncname
    raise ValueError('Invalid XML NCName identifier: ' + name + ' (at position ' + str(i+1)+')')
ValueError: Invalid XML NCName identifier: 1130.txt (at position 1)
[CLAM Dispatcher] Process ended (2023-12-01 10:57:43, 6.50318s)

Owner

proycon commented Dec 1, 2023

Indeed, filenames that start with a numeral are problematic, because the filename is used as the identifier for the XML (FoLiA) output and an XML identifier (NCName) cannot begin with a digit. If you rename your files to start with an alphabetic character, it should be okay.
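
To illustrate the rule, here is a minimal sketch of an NCName check. It is a simplified, ASCII-only approximation and not the `isncname` function from the folia library; the name `is_ascii_ncname` is made up for this example.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName rule:
# the first character must be a letter or underscore; later characters
# may also be digits, hyphens, or periods. The real rule additionally
# admits many non-ASCII characters, which this sketch rejects.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_ascii_ncname(name: str) -> bool:
    """Return True if `name` matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(name) is not None


print(is_ascii_ncname("1130.txt"))     # False: starts with a digit
print(is_ascii_ncname("doc1130.txt"))  # True: starts with a letter
```

Note that the period in `1130.txt` is not the problem; only the leading digit is, which matches the "at position 1" in the error message.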

@sanmai-NL

@proycon Where is that folia package on GitHub? I propose to sanitize the identifier automatically.
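
A rough sketch of what such sanitizing could look like, assuming an ASCII-only view of NCName. The helper `sanitize_id` and its `prefix` parameter are hypothetical and are not taken from foliapy or foliatools.

```python
import re


def sanitize_id(name: str, prefix: str = "doc-") -> str:
    """Hypothetical helper: make a filename-derived string usable as an ID.

    Assumes `prefix` itself starts with an ASCII letter or underscore.
    """
    # Replace every character outside the ASCII NCName set with "_".
    cleaned = re.sub(r"[^A-Za-z0-9_.\-]", "_", name)
    # An NCName must start with a letter or underscore, so prepend the
    # prefix when the first character is a digit, hyphen, or period,
    # or when the string is empty.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = prefix + cleaned
    return cleaned


print(sanitize_id("1130.txt"))     # doc-1130.txt
print(sanitize_id("my file.txt"))  # my_file.txt
```

One trade-off: different inputs can map to the same identifier (for example `a b` and `a_b`), so a real implementation might need to handle collisions.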

Owner

proycon commented Oct 17, 2024

It's https://github.com/proycon/foliapy, but this is mediated via the alpino2folia converter in https://github.com/proycon/foliatools, so I'll solve it at that level.

proycon added a commit to proycon/foliatools that referenced this issue Oct 17, 2024
Owner

proycon commented Oct 17, 2024

Implemented, will deploy later tonight/tomorrow

proycon closed this as completed Oct 17, 2024