folia2stam: do not include leading whitespace in token/structure offsets
proycon committed Mar 19, 2024
1 parent 1f7d6d7 commit 406f279
1 changed file: foliatools/folia2stam.py (2 additions, 0 deletions)
@@ -89,8 +89,10 @@ def convert_tokens(doc: folia.Document, annotationstore: stam.AnnotationStore, *
 if delimiters:
 delimiters.sort(key= lambda x: len(x), reverse=True)
 text += delimiters[0]
+textstart += len(delimiters[0])
 elif prevword.space:
 text += " "
+textstart += 1
 try:
 text += word.text()
 except folia.NoSuchText:
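To illustrate what the two added lines change, here is a minimal, self-contained sketch in plain Python. It does not use the real `folia` or `stam` libraries: `Word`, its `delimiters` field and `build_text` are hypothetical stand-ins. It also assumes, as the diff suggests but does not show, that `textstart` is set to the current text length before any delimiter or space is appended, and that a separator is only considered when there is a previous word.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical stand-in for a FoLiA word (not the folia library's class)."""
    text: str
    space: bool = True  # True if a space should follow this word
    # Hypothetical: candidate delimiters to emit before this word
    delimiters: List[str] = field(default_factory=list)


def build_text(words: List[Word]) -> Tuple[str, List[Tuple[int, int]]]:
    """Concatenate words into one text; return (start, end) offsets per word.

    Offsets exclude any leading delimiter or space, which is the effect
    of the two `textstart` increments added in this commit.
    """
    text = ""
    offsets: List[Tuple[int, int]] = []
    prevword: Optional[Word] = None
    for word in words:
        # Assumption: the start offset is taken before any separator is added
        textstart = len(text)
        if prevword is not None:
            if word.delimiters:
                # Use the longest candidate delimiter, as the sort in the diff does
                delimiter = max(word.delimiters, key=len)
                text += delimiter
                textstart += len(delimiter)  # skip past the delimiter
            elif prevword.space:
                text += " "
                textstart += 1  # skip past the single space
        text += word.text
        offsets.append((textstart, len(text)))
        prevword = word
    return text, offsets


text, offsets = build_text([Word("Hello"), Word("world", space=False), Word("!")])
print(text, offsets)
```

With this input, "world" gets the offsets (6, 11); without the two `textstart` increments it would be reported as (5, 11), i.e. including the preceding space.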