Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbolize keys and freeze values when loading from YAML #583

Merged
merged 2 commits into from
Jan 26, 2022

Conversation

paarthmadan
Copy link
Contributor

Changes introduced

Pysch, Ruby's YAML parser, now supports configuring freeze and symbolize_names as options for YAML#unsafe_load_file and YAML#load_file.

This PR sets both of these parameters to true.

Intended outcomes

In setting freeze: true, all values returned will now be frozen objects. Further, string values will be deduplicated.

In setting symbolize_names: true, keys will be directly parsed as symbols. This is particularly beneficial, as we already eventually perform a call to #deep_symbolize_keys. This way, they'll take on their Symbol representation at parse-time.

@paarthmadan paarthmadan force-pushed the pm/yaml-load-dedup-and-symbolize-keys branch from ea368bf to 0090775 Compare November 5, 2021 14:58
# are supported in the current version of Pysch.
def yaml_load_options
@yaml_load_options ||= YAML_LOAD_OPTIONS.select do |k, v|
YAML.load_file('test/test_data/locales/en.yml', k => v)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
YAML.load_file('test/test_data/locales/en.yml', k => v)
YAML.load('', **{k => v})

Two things here:

  • That load_file wouldn't work once loaded in an app because it is relative to the current working directory ($PWD)). But you don't need to load a file anyway, load take the same options.
  • It's best to use **{} to make sure it's passed as keyword arguments, not a positional hash.

Copy link
Contributor Author

@paarthmadan paarthmadan Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, I was trying to use #load, but it turns out load's signature wasn't always a good proxy for the signature of load_file.

Until this commit (by yourself), ruby/psych#463, not all kwargs on load were available to load_file.

This is problematic, as for instance on Ruby 2.7.4, load accepts :symbolize_names but load_file doesn't.

Any suggestions? I certainly agree that using load would be optimal, and didn't fully think through the consequence of using load_file as you pointed out above.

Copy link
Contributor

@codealchemy codealchemy Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could accessing the methods args directly help here? Something along the lines of

YAML_LOAD_OPTIONS = YAML.method(:load_file).parameters.each_with_object({}) do |(context, param_name), args|
  args[param_name] = true if context == :key && %i[symbolize_names freeze].include?(param_name)
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could accessing the methods args directly help here?

In this instance no, because for a very long time these methods weren't taking keyword args, but one big option hash.

@paarthmadan paarthmadan force-pushed the pm/yaml-load-dedup-and-symbolize-keys branch from 0090775 to 2a15f69 Compare November 5, 2021 15:46
@@ -236,14 +236,23 @@ def load_rb(filename)
eval(IO.read(filename), binding, filename)
end

# Feature tests YAML load options and only selects those that
# are supported in the current version of Psych.
YAML_LOAD_OPTIONS = { symbolize_names: true, freeze: true }.select do |k, v|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than rescuing on an ArgumentError, I would rather a version check (either of Ruby, or of whatever's added the support for these options).

Pseudocode like:

if version >= 3.0.0
  <symbolise code>
else
  <old path>
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that bundled gem sometimes "lie" about their version because of https://bugs.ruby-lang.org/issues/18169, it happened to me previously hence I I now default to feature testing.

That being said, we could conflate this with the feature testing for unsafe_load_file:

            if YAML.respond_to?(:unsafe_load_file) # Psych 4.0 way
              YAML.unsafe_load_file(filename, symbolize_names: true, freeze: true)
            else
              YAML.load_file(filename)
            end

We'll miss a few versions but that's probably fine.

Copy link
Collaborator

@radar radar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not rescue from ArgumentError.

@casperisfine casperisfine force-pushed the pm/yaml-load-dedup-and-symbolize-keys branch 2 times, most recently from b52fa23 to 0547768 Compare November 8, 2021 15:58
@casperisfine
Copy link

@paarthmadan I took the liberty to update your PR. Now it only applies for psych >= 3.3.2. We could theoretically include a few slightly older versions but it's not worth the trouble.

@paarthmadan
Copy link
Contributor Author

Thanks @casperisfine, I appreciate it! FWIW, I agree with the proposed strategy 👍 We scope the change to a smaller (but incomplete) subset of versions to avoid needing to rescue from an ArgumentError.

@paarthmadan paarthmadan requested a review from radar November 8, 2021 18:00
Copy link

@Sarkar44 Sarkar44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request change

@ruby-i18n ruby-i18n deleted a comment from Sarkar44 Nov 9, 2021
@ruby-i18n ruby-i18n deleted a comment from Sarkar44 Nov 9, 2021
@paarthmadan
Copy link
Contributor Author

@radar Just bumping this! Addressed your feedback, now we simply append the arguments when we're certain they're supported.

@casperisfine
Copy link

casperisfine commented Nov 22, 2021

A couple note (which doesn't mean they should be added to this PR):

The same could be done for .json files:

  • Recent json gems also have symbolize_names and freeze
  • They also have load_file which allows bootsnap to really speed up parsing.

That being said, I've never seen a .json translation file in the wild, so...

As for performance, with the keys already parsed as symbols, deep_symbolize_keys will be faster, but it still need to walk the object graph which is quite slow. It would be even faster to not call it because we know it's useless.

The problem is that it's called in load_file, so it would need a bit of a refactoring to achieve this, but that may be better done in a followup.

@@ -72,7 +72,11 @@ def setup

test "simple load_yml: loads data from a YAML file" do
data = I18n.backend.send(:load_yml, "#{locales_dir}/en.yml")
assert_equal({ 'en' => { 'foo' => { 'bar' => 'baz' } } }, data)
if ::YAML.respond_to?(:unsafe_load_file)
assert_equal({ :en => { :foo => { :bar => 'baz' } } }, data)
Copy link
Collaborator

@radar radar Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a check for frozen values here too, please. I think this might work here:

assert data.dig(:en, :foo, :bar).frozen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added!

Copy link
Collaborator

@radar radar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add check for frozen

@paarthmadan paarthmadan force-pushed the pm/yaml-load-dedup-and-symbolize-keys branch from 0547768 to 5e5a092 Compare December 13, 2021 19:15
@paarthmadan paarthmadan requested a review from radar December 13, 2021 19:16
@radar
Copy link
Collaborator

radar commented Jan 25, 2022

Please fix the conflict on this branch.

@paarthmadan paarthmadan force-pushed the pm/yaml-load-dedup-and-symbolize-keys branch from 5e5a092 to 787bf0d Compare January 26, 2022 15:24
@paarthmadan paarthmadan force-pushed the pm/yaml-load-dedup-and-symbolize-keys branch from 787bf0d to 0fda789 Compare January 26, 2022 15:51
@radar radar merged commit b48b0bc into ruby-i18n:master Jan 26, 2022
@paarthmadan paarthmadan deleted the pm/yaml-load-dedup-and-symbolize-keys branch February 18, 2022 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants