Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(resolver): reference strings of int as object #812

Closed
wants to merge 1 commit into from

Conversation

ericabrauer
Copy link

For strings of numbers (with length 2 or more)
that should maintain leading zeroes, add an
implicit resolver to preserve the object and
return as string.

Examples:
0138 = '0138'
0149 = '0149'

Whereas 0145 doesn't have the issue.

Test mapping:
{
"test_strs": [
{"test_str": "0138000000000"},
{"test_str": "0149000000000"},
{"test_str": "0145000000000"},
{"test_str": "5945000000000"},
]
}

For strings of numbers (with length 2 or more)
that should maintain leading zeroes, add an
implicit resolver to preserve the object and
return as string.

Examples:
0138 = '0138'
0149 = '0149'

Whereas 0145 doesn't have the issue.

Test mapping:
{
    "test_strs": [
        {"test_str": "0138000000000"},
        {"test_str": "0149000000000"},
        {"test_str": "0145000000000"},
        {"test_str": "5945000000000"},
    ]
}

Signed-off-by: erica.brauer <[email protected]>
@ericabrauer
Copy link
Author

If it's not desirable to add this resolver by default, anyone looking to reuse the solution can do so with the following

yaml.SafeDumper.add_implicit_resolver(
    "tag:yaml.org,2002:object",
    re.compile(r"\d{2,}"),
    first=list("0")
)

@ericabrauer
Copy link
Author

Adding test file for reference
test-yaml-dump-str.txt

@ericabrauer
Copy link
Author

Now that I understand it happens when there's an 8 or 9 in the str I think I'm starting to follow a little better.

I tested updating resolver.py line 189 from |[-+]?0[0-7_]+ to |[-+]?0[0-9_]+ and it failed with invalid literal for int() with base 8: '008' which makes sense having read the issues with YAML 1.1 compliance for this repo.

For reference, the int implicit resolver:

186: Resolver.add_implicit_resolver(
187:        'tag:yaml.org,2002:int',
188:        re.compile(r'''^(?:[-+]?0b[0-1_]+
189:                    |[-+]?0[0-9_]+
190:                    |[-+]?(?:0|[1-9][0-7_]*)
191:                    |[-+]?0x[0-9a-fA-F_]+
192:                    |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$''', re.X),
193:        list('-+0123456789'))

Testing the current state (prior to this PR) with the following:

[{'strId': '005', 'intId': 005},
 {'strId': '006', 'intId': 006},
 {'strId': '007', 'intId': 007},
 {'strId': '008', 'intId': '008'},
 {'strId': '009', 'intId': '009'},
 {'strId': '010', 'intId': 008},
 {'strId': '011', 'intId': 009},
 {'strId': '012', 'intId': 010},
 {'strId': '013', 'intId': 011},
 {'strId': '014', 'intId': 012}]

SafeLoader loads as:

strId 005 <class 'str'>
intId 5 <class 'int'>
strId 006 <class 'str'>
intId 6 <class 'int'>
strId 007 <class 'str'>
intId 7 <class 'int'>
strId 008 <class 'str'>
intId 008 <class 'str'>
strId 009 <class 'str'>
intId 009 <class 'str'>
strId 010 <class 'str'>
intId 8 <class 'int'>
strId 011 <class 'str'>
intId 9 <class 'int'>
strId 012 <class 'str'>
intId 10 <class 'int'>
strId 013 <class 'str'>
intId 11 <class 'int'>
strId 014 <class 'str'>
intId 12 <class 'int'>

SafeDumper with canonical returns:

!!seq [
  !!map {
    ? !!str "intId"
    : !!int "5",
    ? !!str "strId"
    : !!str "005",
  },
  !!map {
    ? !!str "intId"
    : !!int "6",
    ? !!str "strId"
    : !!str "006",
  },
  !!map {
    ? !!str "intId"
    : !!int "7",
    ? !!str "strId"
    : !!str "007",
  },
  !!map {
    ? !!str "intId"
    : !!str "008",
    ? !!str "strId"
    : !!str "008",
  },
  !!map {
    ? !!str "intId"
    : !!str "009",
    ? !!str "strId"
    : !!str "009",
  },
  !!map {
    ? !!str "intId"
    : !!int "8",
    ? !!str "strId"
    : !!str "010",
  },
  !!map {
    ? !!str "intId"
    : !!int "9",
    ? !!str "strId"
    : !!str "011",
  },
  !!map {
    ? !!str "intId"
    : !!int "10",
    ? !!str "strId"
    : !!str "012",
  },
  !!map {
    ? !!str "intId"
    : !!int "11",
    ? !!str "strId"
    : !!str "013",
  },
  !!map {
    ? !!str "intId"
    : !!int "12",
    ? !!str "strId"
    : !!str "014",
  },
]

and SafeDumper without canonical looks like:

- intId: 5
  strId: '005'
- intId: 6
  strId: '006'
- intId: 7
  strId: '007'
- intId: 008
  strId: 008
- intId: 009
  strId: 009
- intId: 8
  strId: '010'
- intId: 9
  strId: '011'
- intId: 10
  strId: '012'
- intId: 11
  strId: '013'
- intId: 12
  strId: '014'

Which seems like, if it's a quoted string of ints, without an 8 or 9, then it dumps as a single-quoted scalar.

If it has an 8 or 9, regardless if it's quoted or not, then it dumps as a non-quoted scalar.

Is there a way to have the resolver identify the quoted strings of integers first, and then identify those that are not quoted? Would that still align with YAML 1.1?

And now it looks like my testing isn't working so well so I guess I'll just close this out for now.

@perlpunk
Copy link
Member

The question is rather, why do you need the quotes? You never told us the reason.

# YAML 1.1
string: 09
octal: 07

This gets loaded as {'string': '09', 'octal': 7}
So 09 stays a string. Everything working correctly.

However, If you are working with a different loader than pyyaml that reads that 09 as an integer, then it either

  1. implements YAML 1.1 and has a bug
  2. it implements the YAML 1.2 core schema

In case of 2) you can just use https://pypi.org/project/yamlcore/ for dumping. It will dump 09 with quotes because in YAML 1.2 Core schema integers can start with zeroes (and octals have to start with 0o):

>>> import yaml
>>> import yamlcore
>>> d = {'string': '09', 'octal': 7}
>>> y = yaml.dump(d, Dumper=yamlcore.CoreDumper)
>>> print(y)
octal: 7
string: '09'

In case of 1) instead of monkeypatching pyyaml's loader, it would be better to monkeypatch the dumper to add quotes around strings that start with a zero.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants