New validator rule: leaf-det (and det vs. nmod) #1059

nschneid · 2024-10-08T19:31:36Z

I notice that the leaf-det-clf rule introduced in UniversalDependencies/tools@1e4debd and then revised in UniversalDependencies/tools@759c5ae has invalidated quite a lot (a majority?) of treebanks.

Is further revision necessary? For example, EWT is still experiencing some errors that look like they should be valid:

det + nmod e.g. "at least some reports" (det(reports, some), nmod(some, least)). "at least" is admittedly ADV-like, so another option is to make it ExtPos=ADV and advmod.
"such"/det licensing an advcl, as in these results. The guidelines on sufficiency and excess for "so" and similar say the advcl should attach to the adjective or adverb, not the noun in a case like sufficient flour. In such a high price that nobody could afford it, I suppose "such" should have an advcl dependent?


mr-martian · 2024-10-08T19:48:41Z

The errors in Hebrew are due to things like

# x- so the RTL text doesn't make this unreadable
32	x-ה	x-ה	DET	art	PronType=Art	33	det	_	Gloss=the|Ref=GEN_19.8
33	x-אֲנָשִׁ֤ים	x-אישׁ	NOUN	subs	Gender=Masc|Number=Plur	38	obl	_	Gloss=man|Ref=GEN_19.8
34-35	x-הָאֵל֙	x-_	_	_	_	_	_	_	_
34	x-הָ	x-ה	DET	art	PronType=Art	35	det	_	Gloss=the|Ref=GEN_19.8
35	x-אֵל֙	x-אל	PRON	prde	Number=Plur|PronType=Dem	33	det	_	Gloss=these|Ref=GEN_19.8

where demonstrative pronouns have their own determiners. (I'm open to other means of annotating this.)

amir-zeldes · 2024-10-08T20:04:53Z

@mr-martian this is also the analysis used in the modern Hebrew TBs, so I would be inclined to accept and keep it (it's also parallel to how adjectival modification works in Hebrew)

mr-martian · 2024-10-08T20:13:05Z

If I were doing Hebrew from scratch, the one alternative I'd consider is treating ה as an inflectional prefix rather than a syntactic word.

amir-zeldes · 2024-10-08T21:53:13Z

I would vote against that TBH, it's not how other languages with repeating articles do it either (e.g. Greek) and it complicates lemmatization, type counts, and a bunch of other things.

colinbatchelor · 2024-10-10T13:52:45Z

I have one remaining error:
[(in gd_arcosg-ud-train.conllu) Line 55940 Sent p01_033h Node 79]: [L3 Syntax leaf-det-clf] 'det' not expected to have children (79:a:det --> 81:h-uile:compound)

a h-uile 'every' is treated in the source material as a determiner, which seems reasonable.
I've also been following other treebanks like Turkish in using compound for reduplication: https://universaldependencies.org/gd/dep/compound.html

The offending tree has someone emphasising 'every' by saying a h-uile h-uile. Is there maybe a better way I should be doing this or could it be an exception?
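For anyone who wants to list such cases before running the validator, here is a minimal sketch that prints every child of a det node in CoNLL-U input. The function names `read_sentences` and `det_children` are hypothetical, and the toy sentence is invented for illustration (it is not copied from any treebank, and its annotation is not a recommendation).

```python
from typing import Iterator, List, Tuple

Row = List[str]


def read_sentences(text: str) -> Iterator[List[Row]]:
    """Yield each sentence as a list of 10-column CoNLL-U token rows."""
    rows: List[Row] = []
    for line in text.splitlines():
        if not line.strip():
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain word lines only; skip ranges (1-2) and empty nodes (1.1).
            if cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows


def det_children(rows: List[Row]) -> List[Tuple[str, str, str, str, str]]:
    """Return (parent_id, parent_form, child_id, child_form, child_deprel)
    for every node whose parent is attached as det (subtypes included)."""
    deprel_of = {r[0]: r[7] for r in rows}
    form_of = {r[0]: r[1] for r in rows}
    hits = []
    for r in rows:
        head = r[6]
        if deprel_of.get(head, "").split(":")[0] == "det":
            hits.append((head, form_of[head], r[0], r[1], r[7]))
    return hits


# Toy sentence with invented annotation, for illustration only.
TOY = "\n".join(
    "\t".join(cols)
    for cols in [
        ["1", "a", "_", "DET", "_", "_", "4", "det", "_", "_"],
        ["2", "h-uile", "_", "DET", "_", "_", "1", "fixed", "_", "_"],
        ["3", "h-uile", "_", "DET", "_", "_", "1", "compound", "_", "_"],
        ["4", "duine", "_", "NOUN", "_", "_", "0", "root", "_", "_"],
    ]
)

for sent in read_sentences(TOY):
    for parent, pform, child, cform, rel in det_children(sent):
        print(f"{parent}:{pform}:det --> {child}:{cform}:{rel}")
```

This lists all children, including ones the validator accepts, so the output is a superset of what the leaf-det-clf test would report.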

nschneid · 2024-10-10T18:52:04Z

Repetition for emphasis: would flat be a good option instead of compound? Cf. https://universaldependencies.org/u/dep/flat.html#iconic-sequences (though I can't speak to how languages are dealing with reduplication in general).

The validator currently allows fixed, but not flat, it seems.

LeonieWeissweiler · 2024-10-10T18:57:30Z

This invalidated both HDT and GSD for German, mostly because of vor allem (mainly) and unter anderem (among others). For both, the first word is an ADP and the second is a DET that depends on it with the case relation.

How should we handle this better?

nschneid · 2024-10-10T19:03:50Z

unter anderem is sometimes treated as a fixed expression. Here is a case triggering the error:

I assume this means "among other teachers"—is there a reason not to analyze it as "among [other teachers]", with unter attaching to Lehrer?

amir-zeldes · 2024-10-10T19:25:16Z

No, for the German case it's not "among other teachers", notice "other" is dative but "teacher" is not - it's "among others, teachers". I think the mistake is the deprel det - this is not a determiner but an oblique modifier, just like English "among others".

FedeIure · 2024-10-11T08:07:44Z

Repetition for emphasis: would flat be a good option instead of compound? Cf. https://universaldependencies.org/u/dep/flat.html#iconic-sequences (though I can't speak to how languages are dealing with reduplication in general).

The validator currently allows fixed, but not flat, it seems.

What about flat:redup to mark repetition for emphasis?

Here are two examples in one sentence from Roman tragedies in UD Latin-CIRCSE:

sylvainkahane · 2024-10-11T09:41:40Z

For spoken data, we need three relations to be added to the validator:

discourse, which is very common between two determiners in false starts: "a, uh, a gap", "my, uh, our friend"
parataxis for cases such as "a, I don't know how to call that, a kiosk, …": here we have a reparandum link between the two "a"s and we would like to attach the parenthesis to the first "a". More exactly we use parataxis:parenth in our spoken French treebanks.
dep for false starts such as "the last, the last day": here "the last" forms a phrase the head of which is missing and we decided to have dep(the, last). I am not against another solution, as long as "the last" is still a phrase.

lrituma · 2024-10-15T10:17:56Z

In Latvian, we have several expressions that traditional Latvian grammar considers compound pronouns, each consisting of one particle and one pronoun. For example, kaut kāds, where kaut is a particle and kāds is a pronoun (this expression roughly means 'some kind of'). Currently, we annotate the particle as discourse, dependent on the pronoun, and the pronoun occasionally becomes det if the expression describes a noun. This leads to a validation error.

The particles in these expressions usually are kaut, diez, diezin, nez, nezin, and they all have very fuzzy, hard to pin down semantics so we feel uncomfortable annotating them as adverbs.

We would like to annotate these expressions as compound (instead of fixed) because the pronoun is the second element in the phrase and we feel that it is the head of the phrase: the pronoun inflects together with a noun and bears most of the semantic meaning of the expression.

Would you please consider allowing compound in this construction or is there any other option appropriate here?

nschneid · 2024-10-15T17:29:24Z

@dan-zeman What about relaxing the error to a warning while we figure out the contours of the rule?

Stormur · 2024-10-17T15:45:50Z

I think that this new rule is fine, even if, while correcting, my colleagues and I have encountered a couple of cases which, unlike all the others, really do not look reducible to a trivial correction.

The already mentioned reduplication, which is treated through flat:redup in Latin treebanks. One example is quot quot from quot: while the latter means 'as many as', the reduplication has a distributive sense as in 'for each possible one...' (this expression is sometimes even univerbated). I think to annotate them separately, each depending on the head, is not the right way to deal with them: here we do not have two or more different terms, but really the same one "clonating" itself. On the other hand, flat is really the closest relation we have to fixed, which would cause no problem, but is not a correct choice (well, in my opinion it is never the correct choice)
- Problem: horizontal relation
The phrase nostra qui remansissemus caede 'the murder of us who are left (behind)', but more literally 'our who are left murder', since nostra is the inflected possessive determiner for the 1st person plural. What happens here is that the possessive adds a nominal person, as it were, and this person is another referent beyond the noun caede 'murder' in this phrase; as such, the relative can target it (or at least, Cicero pleases himself in doing so). We could not really justify an analysis where we shift the relative under the head noun, since the murder is not one of its arguments.
- Problem: the relative clause dependent of the determiner cannot be traced back to the referent of its head

To summarise the above discussion, my two proposals are to deactivate this validation rule if:

the child of det is a flat relation
the head element has the feature Person, at least for acl:relcl
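A minimal sketch of how these two exemptions could be expressed as a predicate. The function name `is_exempt` and the feature parsing are hypothetical, and I am reading "the head element" as the det node itself (the parent of the offending child); this is not the validator's actual code.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a CoNLL-U FEATS string such as 'Case=Abl|Person=1' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def is_exempt(det_feats: str, child_deprel: str) -> bool:
    """True if a child of a det node falls under one of the two proposals."""
    # Proposal 1: the child is attached with flat (any subtype, e.g. flat:redup).
    if child_deprel.split(":")[0] == "flat":
        return True
    # Proposal 2: the det carries Person and the child is a relative clause.
    if child_deprel == "acl:relcl" and "Person" in parse_feats(det_feats):
        return True
    return False


print(is_exempt("_", "flat:redup"))
print(is_exempt("Number=Plur|Person=1|Poss=Yes", "acl:relcl"))
print(is_exempt("PronType=Dem", "acl:relcl"))
```

Under these assumptions the first two calls are exempt and the third is not, so a relative clause under a plain demonstrative would still be reported.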

amir-zeldes · 2024-10-17T15:58:34Z

We have something similar to the case in 1. in Coptic where a word is repeated for distributive meaning:

one one = "one by one"
two two = "two by two, in pairs"
color color = "color for color, every color"

Etc. 1-2 also work fine in modern Hebrew BTW, and 3. would work in the plural. What we did in UD Coptic was interpret them as nominal modifiers without a preposition (i.e. "one one" is the same as "one by one" with the word "by" suppressed). We then used the nmod:unmarked relation, which is a subtype of nmod used without a case marker.

jasiewert · 2024-10-20T11:43:22Z

This new rule invalidates an analysis in my Low Saxon dataset that I just presented last spring in my LREC-COLING paper and discussed with other UD people at the conference, even with @dan-zeman himself, if I remember correctly. It is explained in Section 5.1 here: https://aclanthology.org/2024.lrec-main.1388.pdf The gloss and translation of the sentence can be found in Section 4.3.

Attaching the possessor in dative case to the possessee instead of the determiner does not represent the way this construction works because 1) the dative possessor cannot be attached to the possessee without the determiner and 2) the possessee can be dropped while the determiner cannot. E.g., in the example in my paper, "In der Gemoene iarem." (literally "in the parish hers") is a valid answer to a specification question in whose service the person stands. (A note to German speakers: Masculine and neuter nouns show that this is indeed a dative, not a genitive.)
The alternative to change the determiners' tags to PRON in Low Saxon would go against UD's own definition of determiners. I would therefore join @nschneid in asking you to relax the error to a warning or ask for language-specific exceptions to the rule.

@ftyers

@ftyers @jonorthwash Is there a way to get around Pronoun det with appos in (). This is something that might show up in a text «his (John's) text is strange.» I would have: det(text, his) appos(his, John's)

lauma · 2024-10-21T11:25:48Z

Also, in Latvian we struggle with constructions similar to "such a high price that nobody could afford it" from the original post as well.

rueter · 2024-10-21T15:15:06Z

Yes, @nschneid, I think the problem encountered in UD_Erzya-JR should be made explicit, here.
In Erzya (myv), Moksha (mdf) and Skolt Saami (sms), genitive forms of personal pronouns are regularly connected to their possessa with a ‹det› dependency.

sent_id = EKS:2011:39:15:ČesnokovF
Конат-конат сонзэ (Степан Иваныч) ладсо сырелгадсть...
Konat-konat    sonze    (Stepan Ivanych)  ladso    syrelgadstʹ...
such-such.Pl  his/her  (St. I.)                    in.way   become.older.3Pl

`some like him (Stepan Ivanich) had gotten older...'

obl(syrelgadstʹ, ladso)
det(ladso, sonze)
appos(sonze, Stepan)

This could also be dealt with as a postposition, where the noun ‹lad› `way' in the Inessive case would contribute to the same ‹obl› dependency

obl(syrelgadstʹ, sonze)
case(sonze, ladso)
appos(sonze, Stepan)

Departing from a ‹det› dependency, however, we could approach English (but this is not what EWT does).

His friends come from all over.
det(friends, his)

In linguistics, such a sentence might be quoted with an inserted identifier for contextual clarity, e.g.,

His (Fred's) friends come from all over.
det(friends, his)
appos(His, Fred's)

Authors themselves [their very selves], might do the same thing with commas:
His, Fred's, friends come from all over.
det(friends, his)
appos(His, Fred's)

Since the validator does not allow words with a ‹det› dependency to take children, one might opt to follow a Swedish lead and change all instances of genitive-case personal pronoun ‹det› to ‹nmod:poss/nmod:det›, but wouldn't that go against the established norm?

Here is an example of Swedish hennes ‹her› given with ‹nmod:poss› dependency
The genitive form of a third person singular personal pronoun 'her'

# sent_id = sv-ud-dev-78
# text = Börjar hennes jobb att delas av den moderne mannen?
1	Börjar	börja	VERB	VB|PRS|AKT	Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act	0	root	0:root	_
2	hennes	hon	PRON	PS|UTR/NEU|SIN/PLU|DEF	Definite=Def|Poss=Yes|PronType=Prs	3	nmod:poss	3:nmod:poss	_
3	jobb	jobb	NOUN	NN|NEU|SIN|IND|NOM	Case=Nom|Definite=Ind|Gender=Neut|Number=Sing	1	nsubj	1:nsubj|5:nsubj	_
4	att	att	PART	IE	_	5	mark	5:mark	_
5	delas	dela	VERB	VB|INF|SFO	VerbForm=Inf|Voice=Pass	1	xcomp	1:xcomp	_
6	av	av	ADP	PP	_	9	case	9:case	_
7	den	den	DET	DT|UTR|SIN|DEF	Definite=Def|Gender=Com|Number=Sing|PronType=Art	9	det	9:det	_
8	moderne	modern	ADJ	JJ|POS|MAS|SIN|DEF|NOM	Case=Nom|Definite=Def|Degree=Pos|Gender=Com|Number=Sing	9	amod	9:amod	_
9	mannen	man	NOUN	NN|UTR|SIN|DEF|NOM	Case=Nom|Definite=Def|Gender=Com|Number=Sing	5	obl:agent	5:obl:agent	SpaceAfter=No
10	?	?	PUNCT	MAD	_	1	punct	1:punct	_

In Swedish, the first and second person pronouns are associated with distinct determiners that are called pronouns in UD vår, min, er, din. These words inflect according to their possessa, and therefore they might be seen as analogically the same phenomena as the Czech possessive determiners.

`possessive determiners (which modify a nominal) (note that some languages use PRON for similar words): [cs] můj, tvůj, jeho, její, náš, váš, jejich'
See also
https://universaldependencies.org/cs/dep/nmod.html
The Czech is consistent.

https://universaldependencies.org/ru/dep/nmod.html
I note that Russian also ‹его карта› amod(карта, его)
translated as English ‹his card› amod(card, his)
Syntag appears to contradict this in ‹его мнению› `his opinion' det(мнению, его) but also `в его (и не только его, но и нашем) случае' ‹в его случае› `in his case' nmod(случае, его)

https://universaldependencies.org/en/dep/nmod.html
I note that the English provides ‹my office› nmod:poss(office, my)
which is the same coding as in EWT.

So it looks like there might be a Swedish–English consensus for nmod:poss use with possessive pronouns, and genitive personal pronouns.

There is disparity within the Russian corpora alongside a consistent Czech.

johnnymoretti · 2024-10-22T09:52:04Z

211 treebanks are invalidated by this new rule, and we need guidance on what to do before the freeze!!! Please provide brief and clear instructions, as aligning the treebanks with this rule requires a lot of work.

KoichiYasuoka · 2024-10-23T23:54:03Z

In Classical Chinese 彼此兵 (those and these soldiers) is invalidated by this new rule. How do we solve it?

# sent_id = KR2b0041_018_par8_1550-1557
# text = 訂彼此兵不得過關
1	訂	訂	VERB	v,動詞,行為,動作	_	0	root	_	Gloss=settle|SpaceAfter=No
2	彼	彼	PRON	n,代名詞,指示,*	PronType=Dem	4	det	_	Gloss=that|SpaceAfter=No
3	此	此	PRON	n,代名詞,指示,*	PronType=Dem	2	flat	_	Gloss=this|SpaceAfter=No
4	兵	兵	NOUN	n,名詞,人,役割	_	7	nsubj	_	Gloss=soldier|SpaceAfter=No
5	不	不	ADV	v,副詞,否定,無界	Polarity=Neg	6	advmod	_	Gloss=not|SpaceAfter=No
6	得	得	AUX	v,助動詞,可能,*	Mood=Pot	7	aux	_	Gloss=must|SpaceAfter=No
7	過	過	VERB	v,動詞,行為,移動	_	1	ccomp	_	Gloss=pass|SpaceAfter=No
8	關	關	NOUN	n,名詞,固定物,建造物	Case=Loc	7	obj	_	Gloss=bar|SpaceAfter=No

rueter · 2024-10-24T11:58:38Z

@nschneid, hi! the UD_Finnish-FTB has an interesting construction

# sent_id = j7hnk-6227
«Viron presidentti Lennart Meri on yksi niitä [ilmeisen harvoja valtiomiehiä], jotka laativat puheensa itse.»
«The President of Estonia, Lennart Meri, is one of the [apparently few statesmen] who write their own speeches.»

8       ilmeisen        ilmeinen        ADJ     A,Sg,Gen        Case=Gen|Number=Sing    9       amod    _       _
9       harvoja harva   DET     Pron,Qnt,Pl,Par Case=Par|Number=Plur|PronType=Ind       10      det     _       _
10      valtiomiehiä    valtiomies      NOUN    N,Pl,Par        Case=Par|Number=Plur    0       root    _       _

ilmeisen harvoja valtiomiehiä
The genitive-case adjective ‹apparent› modifies the determiner ‹few›.
This same construction with a genitive-case adjective is observed with expressions of color, e.g.

sininen ‹blue›
vaalean + sininen ‹light + blue›
tumman + punainen ‹dark + red›
NB! in Finnish these are written as one word, i.e., vaaleansininen, tummanpunainen, except, perhaps, when saying ‹especially dark red› erittäin tumman punainen
amod(red, dark)
advmod(dark, especially)
The Finnish grammar might make reference to an instructive case in -n, but this instance ilmeisen harvoja valtiomiehiä does not seem to fall into that category: https://kaino.kotus.fi/visk/sisallys.php?p=389.
What do you think @fginter, @flammie, @jpiitula?


johnnymoretti · 2024-10-24T12:36:08Z

The rule in the validator script is something like this:

if re.match(r"^(det|clf)$", pdeprel) and not re.match(r"^(advmod|obl|goeswith|fixed|reparandum|conj|cc|punct)$", cdeprel):

If I understand correctly, we are allowed to use only obl and not obl:cmp, right? If so, why? Shouldn't the main dependency relation also cover its subtypes?
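To make the subtype question concrete, here is a minimal sketch of the condition as quoted above. The names `pdeprel` and `cdeprel` come from the quoted line; the helpers `violates_exact`, `universal` and `violates_universal` are hypothetical. I have not checked whether the real validator strips subtypes before reaching this test, so this only shows what the regular expressions do with the strings they are given.

```python
import re

# Relations the quoted condition accepts as children of det/clf.
ALLOWED_CHILD = r"^(advmod|obl|goeswith|fixed|reparandum|conj|cc|punct)$"


def violates_exact(pdeprel: str, cdeprel: str) -> bool:
    """The quoted condition, applied to the relation strings exactly as given."""
    return bool(re.match(r"^(det|clf)$", pdeprel)) and not re.match(ALLOWED_CHILD, cdeprel)


def universal(deprel: str) -> str:
    """Drop a subtype: 'obl:cmp' -> 'obl'."""
    return deprel.split(":", 1)[0]


def violates_universal(pdeprel: str, cdeprel: str) -> bool:
    """Same condition after stripping subtypes from both relations."""
    return violates_exact(universal(pdeprel), universal(cdeprel))


print(violates_exact("det", "obl"))          # False
print(violates_exact("det", "obl:cmp"))      # True
print(violates_universal("det", "obl:cmp"))  # False
```

Because the patterns are anchored with `^` and `$`, a subtyped relation only passes if the caller hands over the universal part. If `cdeprel` already holds the universal relation at that point, obl:cmp would be accepted just like obl.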

KoichiYasuoka · 2024-10-24T12:47:27Z

Thank you @johnnymoretti but I think that det and clf cannot be treated in the same way. In Thai clf can be modified by ADJ or PRON (whose). On the other hand det can be linked by flat or conj...

johnnymoretti · 2024-10-24T12:59:58Z

@KoichiYasuoka For sure, I'm not going into detail about the language, I've just reported what the rule says. At the moment det and clf are in the same rule.

Stormur · 2024-10-24T15:36:24Z

In Classical Chinese 彼此兵 (those and these soldiers) is invalidated by this new rule. How do we solve it?

Why not conj here?

lauma · 2024-10-24T15:46:32Z

In Latvian we have an occasional subordinate clause problem as well - tās somas, ko atrada vakar 'those bags which were found yesterday', because in this situation we might as well talk about various kinds of bags, some were found yesterday, and some not. We struggle with applying the concept of determiners to Latvian in general, but this seems to be a determiner situation, right?

Stormur · 2024-10-24T15:55:41Z

https://universaldependencies.org/en/dep/nmod.html I note that the English provides ‹my office› nmod:poss(office, my) which is the same coding as in EWT.

So it looks like there might be a Swedish–English consensus for nmod:poss use with possessive pronouns, and genitive personal pronouns.

But eng. my is not a pronoun... actually, I do not understand how my office can use nmod in English in the current standard.

The case you report

sent_id = EKS:2011:39:15:ČesnokovF
Конат-конат сонзэ (Степан Иваныч) ладсо сырелгадсть...
Konat-konat    sonze    (Stepan Ivanych)  ladso    syrelgadstʹ...
such-such.Pl  his/her  (St. I.)                    in.way   become.older.3Pl

looks very similar to the Latin one I discussed: you have one element referring to the Person which is in сонзэ and which cannot go with ладсо. A kind of "double referent" phrase.

But we might be up to something regarding elements adding Persons. The case you report lets me wonder if indeed any element like this warrants nmod even when they look like other DETs.

dan-zeman · 2024-11-03T12:13:43Z

@pkocharov : In Classical Armenian, prepositions and articles can be repeated with modifiers, including demonstrative pronominal adjectives, within NP, cf.

i kʻarancʻ y ayscʻanē from stone.ABL.PL from this.ABL.PL "from these (from) stones"

det(kʻarancʻ, ayscʻanē) case(kʻarancʻ, i) case(ayscʻanē, y)

Shall I change the UPOS of demonstrative pronominal adjectives from DET to PRON or ADJ and replace det with nmod or amod, accordingly? Would not this contradict the guidelines for annotating demonstratives?

I just added an exception to the test because I think we can hardly require that the demonstrative's case is attached as a second copy of the case to the head. I'm not particularly happy with it, as it also opens the door for other cases that are really errors (there was one Chinese example in this thread). Ideally the exception should be more focused and maybe specific to Classical Armenian.
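The trade-off between a general exception and a focused one can be sketched as follows. This is only an illustration: the base set is taken from the condition quoted earlier in this thread, `allowed_children` is a hypothetical helper, and the language code "xcl" for Classical Armenian is my assumption.

```python
from typing import Dict, FrozenSet

# Child relations accepted under det/clf, per the condition quoted in this thread.
BASE: FrozenSet[str] = frozenset(
    {"advmod", "obl", "goeswith", "fixed", "reparandum", "conj", "cc", "punct"}
)

# Hypothetical per-language additions; "xcl" is an assumed language code.
EXTRA_BY_LANG: Dict[str, FrozenSet[str]] = {"xcl": frozenset({"case"})}


def allowed_children(lang: str, scoped: bool = True) -> FrozenSet[str]:
    """Return the child relations accepted under det for a given language.

    scoped=True  -> 'case' is accepted only for languages listed in EXTRA_BY_LANG
    scoped=False -> 'case' is accepted everywhere (the broader exception)
    """
    if scoped:
        return BASE | EXTRA_BY_LANG.get(lang, frozenset())
    return BASE | frozenset({"case"})


print("case" in allowed_children("xcl"))                # True
print("case" in allowed_children("de"))                 # False
print("case" in allowed_children("de", scoped=False))   # True
```

With the unscoped variant, a case child under det would pass in every treebank, which is the "opens the door" concern; the scoped variant keeps the test strict elsewhere.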

verenablaschke · 2024-11-03T12:18:32Z

@dan-zeman

@jasiewert forgive me if I forgot the discussion of this example at LREC (I am not saying it did not occur!) But attaching Gemoene as nmod:poss of iarem seems odd to me because it suggests that the parish is the possessor of "she", while in fact (if I understand it correctly), the parish and she are coreferential. Attaching them both directly to Denste would not seem so bad to me (and if the possessee is dropped, it could be solved using the standard approach to ellipsis: promotion).

FWIW, this is similar to the structure we decided to go with for the Bavarian treebank, e.g. "in Lutha seina Iwasetzung" ("Luther's translation") is annotated as follows:

It might make sense to discuss adnominal possessive (dative) constructions in a separate Github issue, since lots of Germanic languages appear to have similar constructions (albeit with some idiosyncrasies). The UD annotations are different for every language. (Norwegian "[possessor_PROPN] sin/sitt [possessed]" is annotated as "[possessor] <-obl- sin <-det- [possessed]", and Afrikaans "[possessor] se [possessed]" as "case([possessor], se); nmod([possessed], [possessor])" with se tagged as PART. The Dutch treebanks contain few cases of "[possessor] z'n/d'r [possessed]" and those are annotated differently (one as "[possessor] <-amod- z'n <-nmod:poss- [possessed]" and one as separate phrases ("[possessor]" and "z'n [possessed]") that are independently attached to the root.))

dan-zeman · 2024-11-03T15:35:30Z

If you want to keep a constrained rule, we can allow discourse only when the determiner is a reparandum.

@sylvainkahane I am not sure I understand. If the discourse child is attached to a parent that itself is a reparandum, we do not have a problem at all (regardless whether the UPOS tag of the parent is DET). We would have a problem only if the determiner parent were attached as det.

dan-zeman · 2024-11-03T16:07:25Z

parataxis for cases such as "a, I don't know how to call that, a kiosk, …": here we have a reparandum link between the two "a"s and we would like to attach the parenthesis to the first "a". More exactly we use parataxis:parenth in our spoken French treebanks.

Is attaching the parenthetical to the first determiner better than attaching it to the noun (kiosk)? Apart from the non-projectivity – I realize that there is similarity between this and the discourse point above.

Yes it is similar to the discourse marker case and I propose the same solution. Moreover, in this case, "a, I don't know how to call that" forms a kind of semantic and prosodic unit, which is not the case of "I don't know how to call that, a kiosk". I really want to attach the parenthesis to the first determiner.

Now I realize that here, too, we shouldn't have a problem because the first determiner should be attached to the kiosk as a reparandum:

a , I do n't know how to call that , a kiosk

reparandum(kiosk, a-1)
det(kiosk, a-12)
parataxis(a-1, know)
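A small sketch of why this tree passes: the test keys on the parent's incoming relation, not on its UPOS. The arcs below are the three given above; the `flagged` helper and the allowed set (copied from the condition quoted earlier in the thread) are assumptions, and subtypes are stripped here for simplicity.

```python
from typing import Dict, List, Tuple

ALLOWED = {"advmod", "obl", "goeswith", "fixed", "reparandum", "conj", "cc", "punct"}


def flagged(arcs: Dict[int, Tuple[int, str]]) -> List[int]:
    """arcs maps child id -> (head id, deprel).
    Return ids of children whose parent is attached as det or clf
    and whose own relation is not in ALLOWED."""
    out = []
    for child, (head, deprel) in sorted(arcs.items()):
        parent = arcs.get(head)
        if (
            parent is not None
            and parent[1].split(":")[0] in {"det", "clf"}
            and deprel.split(":")[0] not in ALLOWED
        ):
            out.append(child)
    return out


# a(1) , I do n't know(6) how to call that , a(12) kiosk(13)
arcs = {
    1: (13, "reparandum"),  # reparandum(kiosk, a-1)
    12: (13, "det"),        # det(kiosk, a-12)
    6: (1, "parataxis"),    # parataxis(a-1, know)
}
print(flagged(arcs))  # []

# Counterfactual: if a-1 were attached as det, its parataxis child would be reported.
as_det = dict(arcs)
as_det[1] = (13, "det")
print(flagged(as_det))  # [6]
```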


dan-zeman · 2024-11-03T17:01:37Z

The problem is not with the last day, the reparandum starts from day. The problem is the analysis of the last, the false start itself. I don't like the idea to analyze it as a correct phrase, for instance with det(last,the). I want to keep the information that it is a false start and not a complete phrase. That is why we chose dep(the, last), but I am ok to use another relation.

It is not a complete phrase but I would still find it natural to apply the UD rules for ellipsis (=> for incomplete phrases), promote last and draw the relation det(last, the). I think the information that it is a false start is already encoded in the incoming reparandum relation (which could be further subtyped to reparandum:falsestart if it is needed).

I give you another example of false start, which I would like to analyze similarly:

les gens qu'on, qu'on voit pour la première fois
'people who we, who we see for the first time' (In French the relative pronoun cannot be omitted)

Here we have two pronouns qu' and on, which form the false start together, and I don't see what could be the link between them apart from dep.

I understand your concern, although I would claim that the UD ellipsis policy provides a possible solution here, too: orphan(on, qu').

Maybe a dedicated relation such as flat:disfluency?

Right now I think I prefer the ellipsis solution sketched above, but flat would probably still be better than dep. None of the two solutions should trigger the leaf-det validation test, so I think further discussion, if desirable, can take place in a new issue.

sylvainkahane · 2024-11-03T18:21:28Z

@dan-zeman My mistake! Your validator doesn't forbid discourse, parataxis, or orphan depending on a DET which is reparandum. And it is good like this.

dan-zeman · 2024-11-03T19:21:44Z

To summarise the above discussion, my two proposals are to deactivate this validation rule if:

the child of det is a flat relation

the head element has the feature Person, at least for acl:relcl

Or maybe Poss=Yes would be more to the point than non-empty Person.

pkocharov · 2024-11-03T23:15:15Z

@dan-zeman thank you very much for an update.

I just added an exception to the test because I think we can hardly require that the demonstrative's case is attached as a second copy of the case to the head. I'm not particularly happy with it, as it also opens the door for other cases that are really errors (there was one Chinese example in this thread). Ideally the exception should be more focused and maybe specific to Classical Armenian.

Would the use of "dislocated" relation leading from the head to the copy adposition be entirely out of question here?
Otherwise, shall I revert "DET...amod" back to "DET...det" before the release?

dan-zeman · 2024-11-04T09:48:29Z

The dislocated relation is intended for dependents of clauses, which is not the case here (and we would need an exception for it anyway). As for modifying the data during data freeze – I will answer that via e-mail in order not to spam the readers of this thread.

dan-zeman · 2024-11-04T10:00:34Z

It would be really, really beneficial if, when the situation stabilizes, there would be some kind of summary about the results of all this. With 100 comments I got lost in the end, but the core part of this discussion was very useful, at least for me, to understand det better. Maybe even clarification/additions in guidelines?

I wrote a summary here. I'm leaving this issue open as a reminder that after the release, the warnings should be turned back to errors (but with the exceptions I have added).

Stormur · 2024-11-04T15:35:03Z

det + nmod e.g. "at least some reports" (det(reports, some), nmod(some, least)). "at least" is admittedly ADV-like, so another option is to make it ExtPos=ADV and advmod.

I don't think at least is a nominal. Its head is not a NOUN, nor can it be replaced by a noun. If it is not a nominal, then it probably should not have nmod as its incoming relation. Least is an adjective and it is acceptable to use adjectives as advmod, so I would be inclined to use advmod here. The validator should be happy with it.

Is it acceptable? According to the guidelines, advmod should entail ADV, and vice versa. So this seems to be tolerated by the validator, but it is not standard, and in fact avoided by many (most?) treebanks. Or has something changed in the meantime? Not that I am against allowing adjectives to take advmod in certain conditions...

But in this case, the annotation of at least as advmod opens a can of worms. It would mean that as a consequence we can decide to annotate any kind of adverbial expression as advmod even if it is nominal like this one (for very good reasons: least is nominal; it is introduced by an adposition; it can take modifiers like in at the very least, etc.). We already have the triplet obl/advcl/advmod which covers these different cases: obl can be explained as "adverbial nominal phrase". Otherwise there is no reason not to annotate expressions such as by force as advmod, so let's be careful.

Stormur · 2024-11-04T15:45:48Z

We shouldn't have a universal page for nmod:poss.

In hindsight, I tend to agree. It was me who created the page (as well as the universal page for det:poss). I did this when I was introducing the requirement that all deprel subtypes are documented and I wanted to reduce the number of treebanks that would immediately become invalid, so for some widely adopted subtypes I picked a language-specific documentation page and made it available at the universal level. That's also the reason why I'm not going to remove these pages now: Many treebanks would become invalid.

It would be better if somebody proposed an improvement of the universal page that will explain the misunderstandings discovered in this thread. Because non-existence of the universal documentation of nmod:poss and det:poss will not prevent people from using it. They have seen it in English or elsewhere, so they assume they should use it too, whenever they find anything "possessive" in their language.

I want to make clear that I do not want that page removed. There was an implication: if nmod:poss was created to describe the Saxon genitive, then we must not have a universal page for it.

Of course people see this tag and decide to use it in their treebanks, and then the problem arises that language-specific subtypes must make clear that they are language-specific. So :poss as a subtype cannot be used for Saxon's genitive because "possessive" is a generic concept that is realised by any language in many different ways, and then the use of this subtype across treebanks becomes incommensurable (as is already the case with many misleadingly similar or generic tags).

In short, my point was that we need clear policies about subtype labels. And for those already existing, a reworking might be needed, as with nmod:poss. Is it then the case that maintainers of a treebank decide they cannot mark every possessive construction? Good, they will give up this specific tag, possibly create a very language-specific one, and the whole documentation will gain in clarity.

Stormur · 2024-11-04T15:59:30Z

@Stormur : 1. The already mentioned reduplication, which is treated through flat:redup in Latin treebanks. One example is quot quot from quot: while the latter means 'as many as', the reduplication has a distributive sense as in 'for each possible one...' (this expression is sometimes even univerbated). I think to annotate them separately, each depending on the head, is not the right way to deal with them: here we do not have two or more different terms, but really the same one "clonating" itself. On the other hand, flat is really the closest relation we have to fixed, which would cause no problem, but is not a correct choice (well, in my opinion it is never the correct choice)

Problem: horizontal relation

Why is fixed not the correct choice? It looks like the correct choice to me. (Although I'm not in principle against relaxing the test also for flat.)

Because this is not a kind of "lexical crystallisation", but it is in fact a sort of derivational, to the limit of inflectional (if there is any difference...), process that can in principle be applied to any word. The exact effect of reduplication can be different (I'd have to look for more literature about that, but for Latin there does not seem to be so much), but it does look to be systematic. Latin just does not seem to use it so often as other languages. Here a quantitative meaning leads to a distributional reading. The deprel fixed, on the contrary, points to an idiosyncratic "merging" of the two words (by the way, I do not see it so distant from a dep...).

More general questions: in languages which express plurality by means of reduplication, what relation would you use for this kind of "extra-word inflection"? In my opinion, surely not fixed.

Anyway, I think it is just logical to allow any horizontal relation in this context, if one of them is accepted.

Stormur · 2024-11-04T16:05:52Z

@pkocharov : In Classical Armenian, prepositions and articles can be repeated with modifiers, including demonstrative pronominal adjectives, within NP, cf.
i kʻarancʻ y ayscʻanē from stone.ABL.PL from this.ABL.PL "from these (from) stones"
det(kʻarancʻ, ayscʻanē) case(kʻarancʻ, i) case(ayscʻanē, y)
Shall I change the UPOS of demonstrative pronominal adjectives from DET to PRON or ADJ and replace det with nmod or amod, accordingly? Would not this contradict the guidelines for annotating demonstratives?

I just added an exception to the test because I think we can hardly require that the demonstrative's case is attached as a second copy of the case to the head. I'm not particularly happy with it, as it also opens the door for other cases that are really errors (there was one Chinese example in this thread). Ideally the exception should be more focused and maybe specific to Classical Armenian.

What do you think of treating the second "from" as an expl, a repetition needed close to the head when the head is separated from the adposition by other material, while the first one is treated as the "main" adposition? It seems to be implied that this is a regular occurrence.
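For readers who want to see the structure under discussion, the Classical Armenian example can be written out as a small tree. This is only an illustrative sketch, not validator code: the tuple layout and the function name are made up here, and kʻarancʻ is attached to 0 as a placeholder because its real head is not given in the example.

```python
# Hypothetical encoding of the example: (id, form, head, deprel).
# Heads follow the relations quoted above; attaching kʻarancʻ to 0 as "root"
# is an assumption made only so that the fragment is self-contained.
TOKENS = [
    (1, "i", 2, "case"),
    (2, "kʻarancʻ", 0, "root"),
    (3, "y", 4, "case"),
    (4, "ayscʻanē", 2, "det"),
]


def children_of_det(tokens):
    """Map each det-attached word form to the deprels of its own children."""
    det_forms = {tid: form for tid, form, _, rel in tokens if rel == "det"}
    result = {form: [] for form in det_forms.values()}
    for _, _, head, rel in tokens:
        if head in det_forms:
            result[det_forms[head]].append(rel)
    return result


print(children_of_det(TOKENS))
```

The demonstrative ayscʻanē is the only det here, and its single child is the repeated preposition attached as case, which is the configuration the exception mentioned above has to allow.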

Stormur · 2024-11-04T16:09:37Z

To summarise the above discussion, my two proposals are to deactivate this validation rule if:

the child of det is a flat relation

the head element has the feature Person, at least for acl:relcl

Or maybe Poss=Yes would be more to the point than non-empty Person.

As also @sylvainkahane pointed out, it seems that Poss=Yes is quite redundant. Are there cases where we have Poss=Yes but the Person is not marked / cannot be specified? Because in the cases that were discussed we always have an explicit person that is pointed to by the dependent.
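The two proposed exceptions can be sketched as a small predicate. This is a hedged illustration only: the function name and signature are hypothetical, it is not how the UD validator is implemented, and it assumes that "the head element" means the det-attached word that governs the child in question.

```python
def exempt_from_leaf_det(child_rel, det_feats):
    """Return True if a child of a det-attached word falls under one of the
    two exceptions proposed in this comment (illustrative, not validator code).

    child_rel: deprel of the child, e.g. "flat:redup" or "acl:relcl"
    det_feats: feature dict of the det-attached word, e.g. {"Person": "3"}
    """
    # Proposal 1: the child is attached via flat (subtypes such as
    # flat:redup are treated as flat here, which is an assumption).
    if child_rel.split(":")[0] == "flat":
        return True
    # Proposal 2: the det-attached word carries Person and the child is a
    # relative clause. Whether Poss=Yes should be tested instead is left open.
    if child_rel == "acl:relcl" and "Person" in det_feats:
        return True
    return False
```

For instance, `exempt_from_leaf_det("flat:redup", {})` is true, while `exempt_from_leaf_det("acl:relcl", {})` is false because no Person feature is present.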

dan-zeman · 2024-11-04T16:17:56Z

Why is fixed not the correct choice? It looks like the correct choice to me. (Although I'm not in principle against relaxing the test also for flat.)

Because this is not a kind of "lexical crystallisation"; it is in fact a sort of derivational, bordering on inflectional (if there is any difference...), process that can in principle be applied to any word. The exact effect of reduplication can vary (I'd have to look for more literature about that, but for Latin there does not seem to be much), but it does look systematic. Latin just does not seem to use it as often as other languages. Here a quantitative meaning leads to a distributive reading. The deprel fixed, on the contrary, points to an idiosyncratic "merging" of the two words (by the way, I do not see it as so distant from a dep...).

A more general question: in languages which express plurality by means of reduplication, what relation would you use for this kind of "extra-word inflection"? In my opinion, surely not fixed.

I am definitely not saying that every type of reduplication should be fixed; certainly not plural nouns (in Indonesian, these are even written as one token, http://hdl.handle.net/11346/PMLTQ-YXCN).

But if a fixed sequence of words works like a function word, e.g., a determiner, then I find it appropriate, and no proof of "lexical crystallisation" is necessary; the fact that the multiword expression has a different function than its members individually seems enough to me. After all, I also remember a discussion whose result was that more than should be a fixed expression.

dan-zeman · 2024-11-04T16:20:20Z

@pkocharov : In Classical Armenian, prepositions and articles can be repeated with modifiers, including demonstrative pronominal adjectives, within NP, cf.
i kʻarancʻ y ayscʻanē from stone.ABL.PL from this.ABL.PL "from these (from) stones"
det(kʻarancʻ, ayscʻanē) case(kʻarancʻ, i) case(ayscʻanē, y)
Shall I change the UPOS of demonstrative pronominal adjectives from DET to PRON or ADJ and replace det with nmod or amod, accordingly? Would not this contradict the guidelines for annotating demonstratives?

I just added an exception to the test because I think we can hardly require that the demonstrative's case is attached as a second copy of the case to the head. I'm not particularly happy with it, as it also opens the door for other cases that are really errors (there was one Chinese example in this thread). Ideally the exception should be more focused and maybe specific to Classical Armenian.

What do you think of treating the second "from" as an expl, a repetition needed close to the head when the head is separated from the adposition by other material, while the first one is treated as the "main" adposition? It seems to be implied that this is a regular occurrence.

As of now, I think expl is used only for pronouns in UD. The taxonomy of UD relations places it among the non-core dependents of clauses, and the first sentence of its definition says that it is for nominals, so it does not fit adpositions.

dan-zeman · 2024-11-04T16:21:19Z

Or maybe Poss=Yes would be more to the point than non-empty Person.

As also @sylvainkahane pointed out, it seems that Poss=Yes is quite redundant. Are there cases where we have Poss=Yes but the Person is not marked / cannot be specified?

Yes.

Stormur · 2024-11-04T16:51:16Z

Why is fixed not the correct choice? It looks like the correct choice to me. (Although I'm not in principle against relaxing the test also for flat.)

Because this is not a kind of "lexical crystallisation"; it is in fact a sort of derivational, bordering on inflectional (if there is any difference...), process that can in principle be applied to any word. The exact effect of reduplication can vary (I'd have to look for more literature about that, but for Latin there does not seem to be much), but it does look systematic. Latin just does not seem to use it as often as other languages. Here a quantitative meaning leads to a distributive reading. The deprel fixed, on the contrary, points to an idiosyncratic "merging" of the two words (by the way, I do not see it as so distant from a dep...).

A more general question: in languages which express plurality by means of reduplication, what relation would you use for this kind of "extra-word inflection"? In my opinion, surely not fixed.

I am definitely not saying that every type of reduplication should be fixed; certainly not plural nouns (in Indonesian, these are even written as one token, http://hdl.handle.net/11346/PMLTQ-YXCN).

But if a fixed sequence of words works like a function word, e.g., a determiner, then I find it appropriate, and no proof of "lexical crystallisation" is necessary; the fact that the multiword expression has a different function than its members individually seems enough to me. After all, I also remember a discussion whose result was that more than should be a fixed expression.

Why should it be more appropriate for function words than for content ones? I do not see a difference. A reduplicated pluralising expression, too, as a multiword has a different function than its corresponding simple one. This is not the "end phase of a process of grammaticalization", it is just a tool that the language can use. We are very much in the direction of an iconic sequence. On the contrary, fixed seems to imply some kind of non-transparency which cannot be found here.

quot quot is really keeping everything of quot intact, just implying a distributive reading which descends from its (indefinite) quantitative value.

As for more than, I do not think it has analogies with this case. But to put it very briefly I do not understand its treatment as fixed, especially not in the light of this discussion.

dan-zeman · 2024-11-04T16:56:08Z

Why should it be more appropriate for function words than for content ones?

Because it is what fixed was designed for. Its documentation also says "Such expressions tend to behave like function words."

Stormur · 2024-11-04T17:09:10Z

Why should it be more appropriate for function words than for content ones?

Because it is what fixed was designed for. Its documentation also says "Such expressions tend to behave like function words."

I can understand this point of view, but it does not seem to be explicit in the guidelines. There, the quality of being functional is attributed to the output of the fixed expression, not to its components, which are very often content words. On the contrary, there does not seem to be the implication that flat combinations of function words require fixed (I don't think quack is a content word 🙂 ).

I mean, in this particular case there is no grammaticalisation: quot was functional and still stays functional when reduplicated. So the logic behind fixed does not seem applicable here.

Also, it is productive and regular (even if possibly confined to a very small class), which again does not seem to be represented by fixed, which seems to imply "each grammaticalisation has its own story".

dan-zeman · 2024-11-04T19:38:49Z

I can understand this point of view, but it does not seem to be explicit from the guidelines.

It is as explicit as the sentence I cited; but it is not defined sharply, and I suppose it is meant to encompass less functional items such as the "light adverbials" that can modify function words. But while the boundary may be blurry, I think it is clear that the fixed expression should not be the main predicate of a clause, or a referential nominal.

There, the quality of being functional is referred to the output of the fixed expression, not to its components,

Exactly. I did not say that the nature of the components matters.

which are very often content words. On the contrary, there does not seem to be the implication that flat combinations of function words require fixed (I don't think quack is a content word 🙂 ).

I mean, in this particular case there is no grammaticalisation: quot was functional and still stays functional when reduplicated. So the logic behind fixed does not seem applicable here.

Also, it is productive and regular (even if possibly confined to a very small class), which again does not seem to be represented by fixed, which seems to imply "each grammaticalisation has its own story".

Fair enough. I tend to take fixed as a convenient tool when there is a functional expression which needs to be kept as one constituent, even without a clear grammaticalization path, but I realize that such a stance is at odds with the recommendation in the description of fixed.

UniversalDependencies/docs#1059 (comment)

gossebouma · 2024-11-08T10:31:59Z

It might make sense to discuss adnominal possessive (dative) constructions in a separate Github issue, since lots of Germanic languages appear to have similar constructions (albeit with some idiosyncrasies). The UD annotations are different for every language. (Norwegian "[possessor_PROPN] sin/sitt [possessed]" is annotated as "[possessor] <-obl- sin <-det- [possessed]", and Afrikaans "[possessor] se [possessed]" as "case([possessor], se); nmod([possessed], [possessor])" with se tagged as PART. The Dutch treebanks contain few cases of "[possessor] z'n/d'r [possessed]" and those are annotated differently (one as "[possessor] <-amod- z'n <-nmod:poss- [possessed]" and one as separate phrases ("[possessor]" and "z'n [possessed]") that are independently attached to the root.))

FWIW, I just noted the Luxembourgish LuxBank also adopts this analysis (their Fig 6)

Dan objects to the analysis proposed for Lower Saxon by pointing out that both are in fact coreferential, and the nmod:poss relation would be misleading here. One could perhaps save that analysis by annotating it as an apposition (given that a classical criterion for appositions is that they refer to the same entity), although it would mean that the apposition precedes the head.

As for the Dutch examples, they are clearly a mess, and need to be corrected ;-)

verenablaschke mentioned this issue Oct 17, 2024

"unter anderem" and "vor allem" in German #1060

Closed

rueter added a commit to UniversalDependencies/UD_Finnish-FTB that referenced this issue Oct 24, 2024

Correct det dependencies

47a4922

@jpiitula sent_id = j7hnk-6227 is problematic. See UniversalDependencies/docs#1059

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 3, 2024

Maybe determiners could allow outgoing dependencies of the type discourse or parataxis? UniversalDependencies/docs#1059 (comment)

113ec15

sylvainkahane mentioned this issue Nov 3, 2024

UD's fundations: functionalism vs distributionalism #1063

Open

sylvainkahane mentioned this issue Nov 3, 2024

nmod:poss and det:poss #1064

Open

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 4, 2024

flat

3d49ce3

UniversalDependencies/docs#1059 (comment)

dan-zeman closed this as completed in UniversalDependencies/tools@39feccd Dec 3, 2024

nschneid mentioned this issue Dec 3, 2024

ExtPos/advmod for quantitative "at (the) least/most" UniversalDependencies/UD_English-EWT#553

Open

2 tasks
