Names of subgenera don't get parsed if subgen. is included in the scientific name value #232

KatjaSchulz · 2022-07-07T20:32:08Z

Psammophanes subgen. Psammophrynopsis Koch, 1953 parses fine, with a warning.

subgen. Psammophrynopsis Koch, 1953 doesn't get parsed at all.

It would be good if names preceded by subgen. got parsed. They do occur in the wild, e.g. in current Catalogue of Life files.

dimus · 2022-08-17T18:48:02Z

Hm, interesting. @gdower, do you know how often such names occur?

gdower · 2022-08-17T19:01:58Z

It's not super-common in the datasets that I work with, but I do see it occasionally. In World Plants/World Ferns, 0.83% of the lines include "subgen." Often in that source it's included like this: subgen. Filago (without the genus included).

KatjaSchulz · 2022-08-18T19:19:28Z

Sorry, I should have provided more context. There are 38 subgenera with this scientificName structure in the current version of COL (2022-07-12).

colSubgen.txt

These are derived from two different sources, both entomological:
World catalogue of the tribe Sepidiini (Tenebrionidae, Coleoptera) and Lygaeoidea Species File

dimus · 2022-08-19T16:02:58Z

@KatjaSchulz and @gdower, thank you for the information! I am on the fence about this particular parsing. If there are so few of them, does it make sense to slow down parsing for the vast majority of other names by checking this specific case?

I'll try to figure out a faster approach to check the first word.
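For illustration only, here is a minimal Python sketch of what a cheap first-word check could look like. It is not how gnparser does it: the function name `split_rank_prefix` and the `RANK_PREFIXES` list are hypothetical. The idea is that names starting with an uppercase letter, which is the common case, are rejected after inspecting a single character, so the extra cost for ordinary names stays small.

```python
from typing import Optional, Tuple

# Hypothetical list of rank markers; an assumption for this sketch,
# not taken from gnparser.
RANK_PREFIXES = ("subgen.",)


def split_rank_prefix(name: str) -> Tuple[Optional[str], str]:
    """Return (rank, rest) if the name starts with a known rank marker,
    otherwise (None, name) with the input unchanged."""
    stripped = name.lstrip()
    # Fast path: most names start with an uppercase genus, so they
    # leave here after looking at one character.
    if not stripped or not stripped[0].islower():
        return None, name
    for prefix in RANK_PREFIXES:
        # Require a space after the marker so "subgen.X" is not matched.
        if stripped.startswith(prefix + " "):
            return prefix, stripped[len(prefix):].lstrip()
    return None, name


print(split_rank_prefix("subgen. Psammophrynopsis Koch, 1953"))
print(split_rank_prefix("Psammophanes subgen. Psammophrynopsis Koch, 1953"))
```

The second call returns the name untouched because the rank marker is not the first word; that form already parses, per the original report.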

KatjaSchulz · 2022-08-19T17:23:50Z

I'm not really in a position to evaluate whether it would be worth it. For the time being, we can handle these through post-processing. We can revisit if we encounter more cases with this usage.

gdower · 2022-08-19T18:34:42Z

@yroskov and I are aiming to fix those names in the Sept release of CoL.

dimus · 2024-04-15T20:29:02Z

@gdower @yroskov, is this fixed in CoL? Can this issue be closed?

dimus · 2024-05-02T16:03:43Z

According to @yroskov and @gdower, such names are allowed by the botanical code, so I am going to figure out how to parse them.

dimus · 2024-06-04T18:23:10Z

Parsing it now like this:

{
  "parsed": true,
  "quality": 2,
  "qualityWarnings": [
    {
      "quality": 2,
      "warning": "Uninomial prepended by its rank"
    }
  ],
  "verbatim": "subgen. Psammophrynopsis Koch, 1953",
  "normalized": "subgen. Psammophrynopsis Koch 1953",
  "canonical": {
    "stemmed": "Psammophrynopsis",
    "simple": "Psammophrynopsis",
    "full": "subgen. Psammophrynopsis"
  },
  "cardinality": 1,
  "rank": "subgen.",
  "authorship": {
    "verbatim": "Koch, 1953",
    "normalized": "Koch 1953",
    "year": "1953",
    "authors": [
      "Koch"
    ],
    "originalAuth": {
      "authors": [
        "Koch"
      ],
      "year": {
        "year": "1953"
      }
    }
  },
  "details": {
    "uninomial": {
      "uninomial": "Psammophrynopsis",
      "rank": "subgen.",
      "authorship": {
        "verbatim": "Koch, 1953",
        "normalized": "Koch 1953",
        "year": "1953",
        "authors": [
          "Koch"
        ],
        "originalAuth": {
          "authors": [
            "Koch"
          ],
          "year": {
            "year": "1953"
          }
        }
      }
    }
  },
  "words": [
    {
      "verbatim": "subgen.",
      "normalized": "subgen.",
      "wordType": "RANK",
      "start": 0,
      "end": 7
    },
    {
      "verbatim": "Psammophrynopsis",
      "normalized": "Psammophrynopsis",
      "wordType": "UNINOMIAL",
      "start": 8,
      "end": 24
    },
    {
      "verbatim": "Koch",
      "normalized": "Koch",
      "wordType": "AUTHOR_WORD",
      "start": 25,
      "end": 29
    },
    {
      "verbatim": "1953",
      "normalized": "1953",
      "wordType": "YEAR",
      "start": 31,
      "end": 35
    }
  ],
  "id": "1b8f7c8c-16c8-5411-a992-f7945f0e3838",
  "parserVersion": "v1.9.2-5-g93e2782"
}
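As a rough sketch of how a consumer might read this output, the Python snippet below loads a trimmed copy of the JSON above and pulls out the rank, the bare name, the authorship, and the warning. The helper names `summarize` and `words_match_offsets` are hypothetical. Treating `start`/`end` as character offsets into `verbatim` is an inference from this one sample, not a documented guarantee.

```python
import json

# Trimmed copy of the parser output shown above; values are unchanged.
OUTPUT = """
{
  "parsed": true,
  "quality": 2,
  "qualityWarnings": [
    {"quality": 2, "warning": "Uninomial prepended by its rank"}
  ],
  "verbatim": "subgen. Psammophrynopsis Koch, 1953",
  "canonical": {
    "stemmed": "Psammophrynopsis",
    "simple": "Psammophrynopsis",
    "full": "subgen. Psammophrynopsis"
  },
  "cardinality": 1,
  "rank": "subgen.",
  "authorship": {"normalized": "Koch 1953", "year": "1953", "authors": ["Koch"]},
  "words": [
    {"verbatim": "subgen.", "wordType": "RANK", "start": 0, "end": 7},
    {"verbatim": "Psammophrynopsis", "wordType": "UNINOMIAL", "start": 8, "end": 24},
    {"verbatim": "Koch", "wordType": "AUTHOR_WORD", "start": 25, "end": 29},
    {"verbatim": "1953", "wordType": "YEAR", "start": 31, "end": 35}
  ]
}
"""


def summarize(record: dict) -> dict:
    """Collect the fields a downstream user is most likely to need."""
    return {
        "rank": record.get("rank"),
        "name": record["canonical"]["simple"],
        "authorship": record.get("authorship", {}).get("normalized"),
        "warnings": [w["warning"] for w in record.get("qualityWarnings", [])],
    }


def words_match_offsets(record: dict) -> bool:
    """Check that each word equals verbatim[start:end] (assumed offsets)."""
    text = record["verbatim"]
    return all(text[w["start"]:w["end"]] == w["verbatim"] for w in record["words"])


record = json.loads(OUTPUT)
summary = summarize(record)
print(summary)
print(words_match_offsets(record))
```

Note that `canonical.simple` drops the rank marker while `canonical.full` keeps it, so callers that only want the subgenus name can use the former.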

dimus · 2024-06-04T18:53:24Z

This is part of the v1.10.0 release now.

Archilegt mentioned this issue Aug 22, 2022

Consider parsing "Untergattung" gnames/gnfinder#126

Closed

dimus closed this as completed Jun 4, 2024
