Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune the data before exporting to YAML ✂️ 📉 #86

Merged
merged 1 commit into from
Nov 17, 2021

Conversation

movermeyer
Copy link
Collaborator

@movermeyer movermeyer commented Nov 12, 2021

What are you trying to accomplish?

Fixes #85

There are many places in the code that are using a select(...).inject({}) pattern for producing data, which produces empty Hash's whenever there is no data in the XML.

What approach did you choose and why?

Prune the empty hashes recursively before outputting to YAML.
Since yaml.rb will already not output if data.empty?, it will not output a file in cases where there is no relevant data to output (i.e., the file is entirely empty hashes).

What should reviewers focus on?

Could there be any place in the code where an empty Hash is appropriate to output?
I contend that there is not.

The impact of these changes

Thousands of fewer files are output, and those that are output are smaller since they no longer contain empty nested key structures.

Before After Ratio
# of .yml files 8332 6005 0.79
Total bytes 30464452 (29.05 MiB) 30188555 (28.79 MiB) 0.99

Testing

You can test it yourself:

git checkout master
thor cldr:export --target=./data_from_master

git checkout movermeyer:mo.prune_yaml_files
thor cldr:export --target=./data_from_pr

# Compare the file counts:
find data_from_master | grep "\.yml$" | wc -l
find data_from_pr | grep "\.yml$" | wc -l

# Compare the byte counts:
# Aside: I use `wc -c` since MacOS doesn't have an easy `du -cb`
find data_from_master | grep "\.yml$" | xargs -I{} cat {} | wc -c
find data_from_pr | grep "\.yml$" | xargs -I{} cat {} | wc -c

# Compare the list of files output:
find data_from_master | grep "\.yml$" | sort | sed 's/data_from_master/data/' > files_from_master.txt
find data_from_pr | grep "\.yml$" | sort | sed 's/data_from_pr/data/' > files_from_pr.txt
diff files_from_master.txt files_from_pr.txt

image

@movermeyer movermeyer marked this pull request as ready for review November 16, 2021 14:58
Comment on lines -27 to -28
- Only export plural keys for currencies that have pluralization data, [80](https://github.com/ruby-i18n/ruby-cldr/pull/80)
- Sort the exported data by key, [82](https://github.com/ruby-i18n/ruby-cldr/pull/82)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding the missing # in the link text

@movermeyer movermeyer changed the title Prune the data before exporting to YAML Prune the data before exporting to YAML 📉 Nov 16, 2021
@movermeyer movermeyer changed the title Prune the data before exporting to YAML 📉 Prune the data before exporting to YAML ✂️ 📉 Nov 16, 2021
test/core_ext/deep_prune_test.rb Outdated Show resolved Hide resolved
test/core_ext/deep_prune_test.rb Outdated Show resolved Hide resolved
test/core_ext/deep_prune_test.rb Outdated Show resolved Hide resolved
@movermeyer movermeyer force-pushed the mo.prune_yaml_files branch 3 times, most recently from 7ab4802 to c043130 Compare November 16, 2021 15:47
test/core_ext/deep_prune_test.rb Outdated Show resolved Hide resolved
test/core_ext/deep_prune_test.rb Show resolved Hide resolved
@movermeyer movermeyer requested a review from Korri November 16, 2021 17:12
Copy link
Collaborator

@trishrempel trishrempel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't explicitly approve, but looks good to me! 👍

@Korri Korri merged commit a4f6cbd into ruby-i18n:master Nov 17, 2021
@movermeyer movermeyer mentioned this pull request Jun 20, 2022
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't export files when for locales if there is no relevant data in CLDR
3 participants