-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge data files before lookup #98
Merge data files before lookup #98
Conversation
grouping_nodes = select("rbnf/rulesetGrouping") | ||
return {} if grouping_nodes.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change since it didn't make sense to check for the existence of path
anymore.
This had a side-effect of cleaning up a edge-case bug:
common/rbnf/en_001.xml
exists, but is empty. So this code was returning []
instead of the expected {}
. Now that it has been fixed to return {}
, en-001/rbnf.yml
is no longer output since we don't output empty hashes.
53747d7
to
ecb1dd3
Compare
# and the <identity> elements are not important to us / make no sense when combined together. | ||
return Nokogiri::XML('') if paths_to_merge.empty? | ||
|
||
rest = paths_to_merge[1..paths_to_merge.size - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't just use paths_to_merge[1..]
since that's not available in Ruby 2.3, and we haven't officially dropped support for the old versions of Ruby.
rest.inject(Nokogiri::XML(File.read(paths_to_merge.first))) do |result, path| | ||
next_doc = Nokogiri::XML(File.read(path)) | ||
|
||
next_doc.root.children.each do |child| | ||
result.root.add_child(child) | ||
end | ||
|
||
result | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there is a cleaner/clearer way to do this part, but quickly playing around couldn't quickly figure out something better 🤷♂️
Some parts (`ldml`, `ldmlBCP47` amd `supplementalData`) of CLDR data require that you merge all the files with the same root element before doing lookups. This gives you a mechanism to lookup the paths you need to merge together based on the root element. Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format
ecb1dd3
to
d71b940
Compare
What are you trying to accomplish?
Fixes #96.
Some parts (
ldml
,ldmlBCP47
amdsupplementalData
) of CLDR data require that you merge all the files with the same root element before doing lookups.Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format
That way, CLDR can move data between files, or split files, for purposes of organization without affecting the actual values.
However,
ruby-cldr
was hard-coding the paths where particular data should be read from.This lead to data being no longer exported when it was moved in the upstream CLDR.
What approach did you choose and why?
I changed
Cldr::Export::Data::Base#doc
to return a doc that has all of the related XML files merged into one Nokogiri document.Merging all of these files is expensive, so I use a class variable as a cache. This might not be the absolute ideal way to do things from a patterns perspective, but it saves us from having to do a much more intense refactor. I like it well enough. 🤷
I use
@locale
as a flag for whether or not the data is supplemental or locale dependent. This will not work if we ever want to use this for the other types of data in CLDR (keyboard
,platform
, orldmlBCP47
). Perhaps a future refactor would pull this out into subclasses ofBase
instead (Supplemental
andLanguageDependent
?)? I have this idea thatlib/cldr/export.rb
could be radically simpler if it didn't have to askis_shared_component?
andTransforms
was less of a special snowflake. But that's a much larger refactor than what I'm attempting right now in this PR. Again, good enough for now. 🤷What should reviewers focus on?
What do you think about the changes?
The impact of these changes
Fixes #96.
Testing
You could run
thor cldr:export
formaster
and this branch, then rundiff -r
to compare the outputs:Which will show 3 classes of diffs:
Only in data_from_master/en-001: rbnf.yml
. See comment.data/transforms/*
files all change slightly since they no longer have a@doc
instance variable. This should not be important to any consumer.Only in data_after: variables.yml
, this is the result of Variables stopped being output #96 getting fixed.