Noun+Number constructions #654

Closed
nschneid opened this issue Oct 12, 2019 · 29 comments

Comments

@nschneid
Contributor

In constructions like the following, should the number or the preceding noun be the head, and if the latter, should nummod apply?

  • This is the number one restaurant in town.: nummod(number, one)?
    • our # 1 priority
  • This restaurant is number one.: nummod(number, one)?
    • These restaurants are number(s) one and two.
  • He works at station number 3: appos(station, number)?
    • *He works at the station number 3
    • He works at new station number 3 = 'He works at the 3rd new station', not 'He works at the new station, which is station number 3'
  • on day 2 of the trip: nummod(day, 2)?
    • *on the day 2 of the trip
    • Pluralization: on days 2 and 3 of the trip—suggests "day(s)" is the head

(encountered in investigating UniversalDependencies/UD_English-EWT#79)
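To check how a treebank currently annotates these constructions, a scan along the following lines might help. It is only a minimal sketch: it assumes standard ten-column, tab-separated CoNLL-U input, the helper names `parse_conllu_sentence` and `find_noun_number` are hypothetical, and the sample annotation of "on day 2 of the trip" is hand-built for illustration rather than copied from EWT.

```python
def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "upos": cols[3],
            "head": int(cols[6]), "deprel": cols[7],
        })
    return tokens


def find_noun_number(tokens):
    """Return (noun, number, deprel, direction) for adjacent NOUN + NUM pairs
    that are directly linked by a dependency in either direction."""
    hits = []
    for left, right in zip(tokens, tokens[1:]):
        if left["upos"] == "NOUN" and right["upos"] == "NUM":
            if right["head"] == left["id"]:
                hits.append((left["form"], right["form"], right["deprel"], "noun-headed"))
            elif left["head"] == right["id"]:
                hits.append((left["form"], right["form"], left["deprel"], "number-headed"))
    return hits


# Hand-built annotation of "on day 2 of the trip" (an assumption, not from EWT).
ROWS = [
    ("1", "on", "on", "ADP", "_", "_", "2", "case", "_", "_"),
    ("2", "day", "day", "NOUN", "_", "_", "0", "root", "_", "_"),
    ("3", "2", "2", "NUM", "_", "_", "2", "nummod", "_", "_"),
    ("4", "of", "of", "ADP", "_", "_", "6", "case", "_", "_"),
    ("5", "the", "the", "DET", "_", "_", "6", "det", "_", "_"),
    ("6", "trip", "trip", "NOUN", "_", "_", "2", "nmod", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(find_noun_number(parse_conllu_sentence(SAMPLE)))
```

The scan only looks at tokens tagged NOUN, so cases like "# 1" would need the tag of "#" added to the condition.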

@colinbatchelor
Contributor

colinbatchelor commented Oct 13, 2019

I have a similar question for Scottish Gaelic. In the sports commentary subcorpus the commentator says things like "Alba neoni Yugoslavia neoni" 'Scotland nil Yugoslavia nil'.

This isn't nummod, though, is it? Tentatively going for obj(Alba, neoni) as if it's short for Scotland [have scored] nil [goals].

@sylvainkahane
Contributor

@colinbatchelor In this case neoni is the predicate/focus/rheme. I would make it the root, with Alba as dislocated (or maybe nsubj).

@colinbatchelor
Contributor

@sylvainkahane That makes sense. I think I prefer dislocated because Gaelic is VSO. Many thanks!

@dan-zeman
Member

I think the current practice is to use nummod(number, one). Personally I think it would be more useful to distinguish it from expressions like one number but I am not aware of a guideline that would say what to do here. (On the other hand, the nummod guideline says that the modifier specifies quantity, so it should not apply to number one.)

@dan-zeman dan-zeman added this to the v2.5 milestone Oct 14, 2019
@dan-zeman
Member

A similar older issue: #466

Perhaps we really need to put something about this in the guidelines.

@nschneid
Contributor Author

What about subtyping, e.g. nummod:post (or nummod:index to distinguish from proper quantities)?
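As a rough sketch of what such a subtype could look like in practice, the snippet below relabels every `nummod` dependent that follows its head. Note the assumptions: `nummod:index` is only the label floated in this thread, not an adopted UD subtype; the function name `subtype_postposed_nummod` is hypothetical; and using word order as the criterion is an English-specific simplification.

```python
def subtype_postposed_nummod(tokens, subtype="nummod:index"):
    """Return a copy of the tokens in which nummod dependents that come
    after their head receive the given (hypothetical) subtype."""
    out = []
    for tok in tokens:
        tok = dict(tok)  # copy so the input stays untouched
        if tok["deprel"] == "nummod" and tok["id"] > tok["head"] > 0:
            tok["deprel"] = subtype
        out.append(tok)
    return out


# "number one": the numeral follows the noun it is attached to.
number_one = [
    {"id": 1, "form": "number", "head": 0, "deprel": "root"},
    {"id": 2, "form": "one", "head": 1, "deprel": "nummod"},
]
# "one number": an ordinary quantity, numeral before the noun.
one_number = [
    {"id": 1, "form": "one", "head": 2, "deprel": "nummod"},
    {"id": 2, "form": "number", "head": 0, "deprel": "root"},
]

print([t["deprel"] for t in subtype_postposed_nummod(number_one)])
print([t["deprel"] for t in subtype_postposed_nummod(one_number)])
```

Only the postposed numeral in "number one" is relabeled; the quantity reading in "one number" keeps plain `nummod`.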

@amir-zeldes
Contributor

amir-zeldes commented Oct 15, 2019

I'm not sure the number is actually the modifier here, despite the pluralization of days... My expectation is that if only one of two things can be omitted then the one we can't omit is the head:

Example question "Which one do you want?". Possible answers:

  • Number 2
  • 2
  • * Number

Couldn't 'number' be a little like a title or modifier, expanding what kind of entity '2' refers to?

Intuitively I like appos better than nummod here, but I almost like compound(2,number) better (possibly subtyped somehow)

@nschneid
Contributor Author

nschneid commented Oct 15, 2019

One way to interpret this is that "number" turns a cardinal number into an ordinal number. Are there languages where ordinality is regularly expressed compositionally, with two syntactic words, one being a cardinal number?

@nschneid
Contributor Author

Or rather, turning a cardinal into an ordinal is one function of this construction. Another is to express a conventional label for something: "room number 45B".

@sylvainkahane
Contributor

I think that in such constructions, numbers behave like nouns (if not as nouns). The analysis might be the same as in N N constructions, such as President Trump (cf. #503). Which also means that the solutions depend on the language: if in English the head seems to be the number (number 3 can commute with 3 but not with number), in French, a determiner is obligatory and I would choose the noun as the head (le nombre 3 commutes with le nombre and almost all noun modifiers are on the right of the noun).

@rueter
Contributor

rueter commented Oct 15, 2019

@nschneid This is an interesting idea, but I have a problem with it

Or rather, turning a cardinal into an ordinal is one function of this construction. Another is to express a conventional label for something: "room number 45B".

I presume this is analogous to your question above:

  • (a) He works at new station number 3 = (b) 'He works at the 3rd new station', not (c) 'He works at the new station, which is station number 3'
    

I see a contrast in (a) new station number 3 and old station number 3,
but I fail to see an equivalence in English between (a) new station number 3 and (b) the 3rd new station. The equivalence might, by the way, work in Russian: streetcar **number 3** might be translated using the ordinal третий ('third') alone, if the words for streetcar and number are absent.

In English the matter seems to be semantically scaled.
I could see an alignment between

  • He lives on floor four. and He lives on the fourth floor.
    This is an ordinal-ish reading for items conceptualized in a specific order; the floors are one on top of the other and therefore can be counted in order according to a one-dimensional setting (but you can start counting anywhere on the line: bottom, top or somewhere in between).

In a two- or three-dimensional setting, however, there is no intuitive starting point.

  • She works at/in building three, i.e. the one whose identity is three and is a building
  • The teacher is coming on streetcar number three or streetcar letter T,

Building or streetcar identities are in no logical/intuitive relationship to ordinals:

  • She works in the third building, which might be building fifty-four for all we know.
  • The teacher is coming on the third streetcar, which means we count the how-manieth.

@nschneid
Contributor Author

@rueter

I see a contrast in (a) new station number 3 and old station number 3,
but I fail to see an equivalence in English between (a) new station number 3 and (b) the 3rd new station

If it is a new station officially designated with the number 3, I would say "I work at the new station number 3". Without the determiner, I read it as equivalent to "the 3rd new station". The bracketing is presumably different: [[new station] [number 3]] being the 3rd of the new stations, vs. new [station number 3] for the officially designated Station 3.

@nschneid
Contributor Author

@manning, @sebschu: Any thoughts? While I fix the POS tags of "#" I'd like to make the dependencies consistent as well.

@sebschu
Member

sebschu commented Oct 21, 2019

I don't have a strong opinion on this, and I agree with a lot of the issues pointed out in the comments here. However, since the current convention is to use nummod(number, one) and there doesn't seem to be a substantially better solution, I would suggest we stick with that convention.

@amir-zeldes
Contributor

Is nummod(number,one) really better than compound(one,number)? I still think 'one' is the head. We don't have this case in GUM yet, but other 'non-counting numbers' are not annotated as nummod already (we've used 'dep' for some similar constructions in the past)

@parapluirevel

[image; content not preserved]

@nschneid
Contributor Author

@parapluirevel Yes, I agree that depending on context, "number X" can act more like an ordinal or more like a way to identify something by a proper name (that happens to involve a numeric designator). In the latter case "number" may be optional. But I'm not sure whether this ambiguity should or should not be taken into account in terms of the dependency relation.

I think it's important that the designator in the name construction be a label of some sort and not something with its own independent referent: "Room GHC 5713" is fine, but I would not say "*Room Gates Hillman", but rather "the Gates-Hillman Room" or just "Gates-Hillman" for short. The article is mandatory with "Room" as the head: "*I went to Gates-Hillman Room". Same with "Building", but "Hall" (as on a university campus) prefers not to have an article: "I went to Gates-Hillman Hall".

In any case, UD is far from providing a complete account of the internal structure of and constraints on proper names and other constructions involving dates and values.

@dan-zeman dan-zeman modified the milestones: v2.5, v2.6 Nov 9, 2019
@arademaker
Contributor

In the EWT corpus I found cases where a year is nummod of a month in a date.

I had a surgery date of July 17, 2008.

Does it make sense? According to https://universaldependencies.org/u/dep/nummod.html:

A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun with a quantity.

the year is not modifying the meaning of the month with a quantity, is it?

@amir-zeldes
Contributor

I think I've posted about this in the past (not sure about the issue number(s)). In my opinion, dates in English should have the day as the head, and the month and year as modifiers of the day.

In UD_English-GUM we do:

compound(17,July)
nmod:tmod(17,2008)

With the idea that "July 17" is a subtype of days of the kind "17th day of the month", and this is "the July one" out of those (so treat it like a compound). 2008 is an adverbially used noun, meaning something like 'in 2008', but since it doesn't have a preposition and carries temporal modification meaning, we don't use the normal nmod relation, and use the nmod:tmod subtype, which is inherited from Stanford Dependencies. This makes the usage of 2008 distinguishable from cases like "July 17 in 2008 was particularly hot", where you have the usual nmod+case construction.
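To make the two relations concrete, here is a small sketch that encodes them as tokens and prints them back in relation(head, dependent) notation. It is illustrative only: the helper name `edges` is hypothetical, the comma is left out so that only the two relations given above are shown, and the day is marked as root simply because the date is treated in isolation.

```python
# "July 17 2008" with the day as head (comma omitted, see note above).
DATE = [
    {"id": 1, "form": "July", "head": 2, "deprel": "compound"},
    {"id": 2, "form": "17", "head": 0, "deprel": "root"},
    {"id": 3, "form": "2008", "head": 2, "deprel": "nmod:tmod"},
]


def edges(tokens):
    """List dependencies as 'relation(head, dependent)', skipping the root."""
    by_id = {tok["id"]: tok for tok in tokens}
    return [
        f'{tok["deprel"]}({by_id[tok["head"]]["form"]}, {tok["form"]})'
        for tok in tokens
        if tok["head"] != 0
    ]


for edge in edges(DATE):
    print(edge)
```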

@nschneid
Contributor Author

nschneid commented Apr 2, 2020

@arademaker See #455 and linked issues.

@arademaker
Contributor

arademaker commented Apr 12, 2020

Hi, in Bush earned 253 points in his first year (EWT corpus) the first is amod of year. Is this the right analysis? If so, would it be different with a numeral () instead of the word?

@amir-zeldes
Contributor

I think generally we treat words 'as they are pronounced', so we would treat "first" and "1st" the same. Therefore if a language has an explicit ordinal marker (e.g. in German we can say "der 1. Versuch" - 'the first try', and the period means it is pronounced like 'first'), then I would also tag it ADJ and attach as amod.
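A minimal sketch of that "as pronounced" rule for English might look like the following. The function name `ordinal_annotation` and the short word list are assumptions made for illustration; the German "1." pattern is deliberately not handled, since a digit followed by a period is ambiguous without context.

```python
import re

# Illustrative, not exhaustive.
ORDINAL_WORDS = {"first", "second", "third", "fourth", "fifth"}
# Digits followed by an English ordinal suffix, e.g. "1st", "22nd", "3rd", "4th".
ORDINAL_DIGITS = re.compile(r"\d+(st|nd|rd|th)", re.IGNORECASE)


def ordinal_annotation(form):
    """Return (UPOS, deprel) for ordinals, whether spelled out or written
    with digits; return None for anything else."""
    if form.lower() in ORDINAL_WORDS or ORDINAL_DIGITS.fullmatch(form):
        return ("ADJ", "amod")
    return None


print(ordinal_annotation("first"))
print(ordinal_annotation("1st"))
print(ordinal_annotation("1"))
```

Under this sketch "first" and "1st" receive the same treatment, while a bare "1" is left for the cardinal analysis.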

@arademaker
Contributor

In #455 nothing was said about the nmod:tmod suggested above. These two issues are very related but still with some open questions.

@dan-zeman
Member

The decision for number one and similar is that they are treated as NOUN-NOUN constructions, see example (47) in de Marneffe et al. 2021.

@nschneid
Contributor Author

Pasting for convenience:

[image: the passage from de Marneffe et al. 2021 referenced above; content not preserved]

There is no example tree but I take it the passage is saying it should be nmod(number/NOUN, one/NOUN).

For English, where plain nmod is reserved for PPs, I take it this would be nmod:npmod.

However, we may want to revisit this in the mischievous nominals discussion.

@dan-zeman
Member

the passage is saying it should be nmod(number/NOUN, one/NOUN)

Actually I believe it should be nmod(number/NOUN, one/NUM). The number one is treated as a nominal modifying another nominal, but the word itself is not a noun.

@nschneid
Contributor Author

"UD analyzes the number as a noun"—you're saying this is a statement about the deprel and not the UPOS?

@dan-zeman
Member

"UD analyzes the number as a noun"—you're saying this is a statement about the deprel and not the UPOS?

Hmm, now that I see the wording again, it certainly doesn't sound so—although that was how I understood the consensus when we were writing the paper! Too bad that we already had such a huge number of annotated examples that we didn't bother to add one here, too.

I was convinced we were talking about the deprels; more specifically, that nummod would be inappropriate in these situations, hence nmod is the substitute. This is also the only explicitly mentioned annotation rule in that paragraph ("...via a nmod relation, unless..."). I don't think it will help to also retag a number as a noun (it would be surprising especially when expressed in digits; it also does not behave morphologically like a noun). On the other hand, if the label is a letter (A) or a mix (102L), the NOUN tag definitely makes more sense than NUM.
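The tagging preference described here could be sketched as below. This is one reading of the comment, not an agreed guideline: the function name `label_upos` and the small set of number words are hypothetical, and the deprel question (nmod versus a subtype) is left aside.

```python
import re

# Illustrative, not exhaustive.
NUMBER_WORDS = {"one", "two", "three", "four", "five"}
# Plain digit strings, optionally with separators such as "1,000" or "3.5".
DIGITS = re.compile(r"\d+(?:[.,]\d+)*")


def label_upos(form):
    """Guess the UPOS of a postnominal label: numbers stay NUM, while
    letters and mixed labels are tagged NOUN."""
    if DIGITS.fullmatch(form) or form.lower() in NUMBER_WORDS:
        return "NUM"
    return "NOUN"


for label in ["3", "one", "A", "102L", "45B"]:
    print(label, label_upos(label))
```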

@amir-zeldes
Contributor

I think this is exactly the construction we discussed in the mischievous nominals paper, no? As of now I think it's nummod in EWT and dep in GUM, which just means 'awaiting decision'. The way English uses nmod it would have to be nmod:npmod, but Nathan and I were intending to put this under the proposed nmod:desc subtype. I'm also fine with dep TBH, which, as you know, I take to mean "we know what's the head but none of our other labels fit at the moment".
