-
-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --combined option #37
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't feel very useful to me. The only one of these that has meaningful output at all is the SCRIPT table, and that's because we don't include Unknown
/Zzzz
in there (which might be a bug, IDK) -- The tables are &[(0, 0x10ffff)]
. I guess maybe the idea is you could use it with --exclude
/ --include
? Anyway, @BurntSushi's call.
If we're going to add it, it should be supported for all the other cases where we currently support --enum, probably...
Just in terms of the code itself I think it would be better if it tried harder to use the existing output functions in writer.rs.
I also suspect it would be better if the flag were on the writer.opts structure...
src/writer.rs
Outdated
@@ -417,6 +458,29 @@ impl Writer { | |||
Ok(()) | |||
} | |||
|
|||
fn range_set_slice( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is the same as ranges_slice
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/writer.rs
Outdated
/// Write a set that contains ranges of codepoints. | ||
/// | ||
/// The smallest numeric type is used when applicable. | ||
pub fn range_set(&mut self, name: &str, set: &[u32]) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you just turn the slice into BTreeSet
instead you could use the existing ranges
function here. This would make supporting trie output easy, as well as the --split-ranges option in #35, for example.
Actually supporting these might make the help text for the flags confusing though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/app.rs
Outdated
@@ -221,6 +221,11 @@ pub fn app() -> App<'static, 'static> { | |||
.arg(Arg::with_name("rust-enum").long("rust-enum").help( | |||
"Emit a Rust enum and a table that maps codepoints to bidi class.", | |||
)) | |||
.arg( | |||
Arg::with_name("combined").long("combined").help( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better if you put this up with the other args and just clone it in. I don't know why --enum and --rust-enum don't, but they also have slightly varying help text for each one, which is probably why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I was using --rust-enum as a reference.
I would only be using this for the
You're right. I didn't expect these functions to exist. It's much better to use them.
I don't think this would make sense, but you know the code better than I do. This option is similar to |
Hmm... For a while I've wanted the ability to generate a custom table based on a combination of property bools / property values. For example, Anyway, I'm going to file a bug for adding something like that -- it would have solved your use case, and some of mine. I think there are still open questions around that though, so I'm not saying that filing the bug makes this obsolete. |
@dylni This looks like a simple addition so I'm inclined to accept it if it's useful, but I don't quite understand the use case. Can you say more about why this is useful to you? It sounds like you want to use it with |
@BurntSushi Yes, I use it to generate this file. That crate only needs to check if characters belong to one of the "other" or "separator" categories, so searching one table is easier and more efficient. If you remove the
Yes, I could search both tables, and that would very likely have a negligible impact on performance. However, since a smaller and simpler table can be generated, that is what I would prefer to include in my crate. |
@dylni Ah I see. I think the thing missing from your explanation is that you were only generating a subset of general category tables via the Can you add a little extra docs to the |
Oh, I can see how this seems very strange without that context.
Sure, the last commit extends the documentation. |
"Emit a single table with all included codepoint ranges. You might \ | ||
want to use this option when checking if characters belong to a \ | ||
subset of categories, since only one table will need to be checked. \ | ||
Searching the combined table should be simpler and more efficient.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This PR is on crates.io in |
Thanks! It's working very well |
Thanks for creating this very useful crate.
This adds a
--combined
option that works the same way as not adding the option, except all tables are combined into a single table. It can create smaller tables than--enum
when it's only necessary to check if a code point is part of any variant.It also supports related categories, which
--enum
doesn't.