UnicodeSet L1 Work #122

Merged
merged 30 commits into from
Jul 23, 2020
Changes from 27 commits
Commits
02bd23b
initial import of char_collection
Jun 3, 2020
c66b770
anyhow::Error dependency removed and std::error::Error added
Jun 3, 2020
e4b8f56
std imports made consistent and unic-ucd-block dependency removed
Jun 3, 2020
4963cc1
Replaced CharRange, passing 36/40 tests
Jun 10, 2020
01ca5ef
Fixed bug, pass all 40 tests
Jun 10, 2020
b57d18f
Remove dependency file
Jun 12, 2020
dedebe2
github actions and README fixes
Jun 16, 2020
a1c2d6a
L1 initial completion, unit tests not complete
Jun 23, 2020
11bdaec
UnicodeSet tests
Jun 23, 2020
dc41bdf
Complete contains test and docs
Jun 23, 2020
0a55933
formatting
Jun 23, 2020
4a04a96
added is_empty() and size()
Jun 23, 2020
e8c2b1a
proposed changes
Jun 23, 2020
1471213
Closure for contains and docs
Jun 23, 2020
8e2a34a
Removed unnecessary files and formatting changes
Jun 23, 2020
cc781c8
Update to repo
Jun 23, 2020
cfd9edf
formatting and cleaning up changes
Jun 24, 2020
c41fa76
replace u32 with char and fix typos and optimizations
Jun 26, 2020
da5eecf
remove unnecessary imports, made more rusty
Jul 9, 2020
22b4fe0
clipply fix
Jul 9, 2020
8d9138e
more clippy lint fixes
Jul 9, 2020
dac8a4a
Architecture checks minus benchmarks
Jul 14, 2020
7fd9082
added benchmarks and fixed surrogate code points in iter
Jul 14, 2020
0f5a021
fix to is_valid
Jul 15, 2020
c7c4330
bench changes and other minor fixes
Jul 15, 2020
a8a4b50
forgot to run fmt
Jul 15, 2020
845bc35
change to std::char, and unreachable!() optimizations
Jul 16, 2020
dbf3100
size() is now constant check, ranges() temp removed
Jul 17, 2020
a9acfa2
fixed bench
Jul 17, 2020
2c54b63
clippy checks that cargo clippy doesn't catch locally
Jul 17, 2020
1 change: 1 addition & 0 deletions Cargo.toml
@@ -3,5 +3,6 @@
members = [
    "components/icu",
    "components/icu4x",
    "components/uniset",
    "components/locale",
]
21 changes: 21 additions & 0 deletions components/uniset/Cargo.toml
@@ -0,0 +1,21 @@
[package]
name = "icu-unicodeset"
description = "API for managing Unicode Language and Locale Identifiers"
version = "0.0.1"
authors = ["The ICU4X Project Developers"]
edition = "2018"
readme = "README.md"
repository = "https://github.com/unicode-org/icu4x"
license-file = "../../LICENSE"
categories = ["internationalization"]
include = [
    "src/**/*",
    "Cargo.toml",
]

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "inv_list"
harness = false
12 changes: 12 additions & 0 deletions components/uniset/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# ICU4X

ICU4X is a set of internationalization components for Unicode.

# Status [![crates.io](http://meritbadge.herokuapp.com/icu4x)](https://crates.io/crates/icu4x)

The project is in an incubation period.

# Authors

The project is managed by a subcommittee of ICU-TC in the Unicode Consortium focused on providing solutions for client-side internationalization.

43 changes: 43 additions & 0 deletions components/uniset/benches/inv_list.rs
@@ -0,0 +1,43 @@
use criterion::{criterion_group, criterion_main, Criterion};
use icu_unicodeset::UnicodeSet;
use std::{char, convert::TryFrom};

fn contains_bench(c: &mut Criterion) {
    let best_ex = vec![65, 70];
    let best_sample = UnicodeSet::try_from(best_ex).unwrap();
    let worst_ex: Vec<u32> = (0..((char::MAX as u32) + 1)).collect();
    let worst_sample = UnicodeSet::try_from(worst_ex).unwrap();

    let mut group = c.benchmark_group("uniset/contains");
    group.bench_with_input("best", &best_sample, |b, sample| {
        b.iter(|| sample.iter().map(|ch| sample.contains(ch)))
    });
    group.bench_with_input("worst", &worst_sample, |b, sample| {
        b.iter(|| sample.iter().take(100).map(|ch| sample.contains(ch)))
    });
    group.finish();
}

fn contains_range_bench(c: &mut Criterion) {
    let best_ex = vec![65, 70];
    let best_sample = UnicodeSet::try_from(best_ex).unwrap();
    let worst_ex: Vec<u32> = (0..((char::MAX as u32) + 1)).collect();
    let worst_sample = UnicodeSet::try_from(worst_ex).unwrap();

    let mut group = c.benchmark_group("uniset/contains_range");
    group.bench_with_input("best", &best_sample, |b, sample| {
        b.iter(|| sample.iter().map(|ch| sample.contains_range(&('A'..ch))))
    });
    group.bench_with_input("worst", &worst_sample, |b, sample| {
        b.iter(|| {
            sample
                .iter()
                .take(100)
                .map(|ch| sample.contains_range(&(char::from_u32(0).unwrap()..ch)))
        })
    });
    group.finish();
}

criterion_group!(benches, contains_bench, contains_range_bench);
criterion_main!(benches);
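
The "best" and "worst" inputs above differ only in how many boundaries the set stores: one run (`[65, 70]`) versus one run per code point. As a rough illustration of why that matters, here is a minimal sketch of an inversion-list lookup using binary search. `InvList` and its `bounds` field are hypothetical names for this sketch; the diff does not show how `UnicodeSet::contains` is implemented, so this should not be read as the crate's actual code.

```rust
/// Hypothetical stand-in for an inversion list: sorted boundaries where
/// each pair `[start, end)` marks one run of contained code points.
struct InvList {
    bounds: Vec<u32>,
}

impl InvList {
    /// Returns true when `query` falls inside one of the half-open runs.
    fn contains(&self, query: char) -> bool {
        let q = query as u32;
        match self.bounds.binary_search(&q) {
            // Exact hit on a boundary: an even index is the start of a run
            // (inside), an odd index is the exclusive end of a run (outside).
            Ok(i) => i % 2 == 0,
            // Miss: `i` is the insertion point; an odd insertion point means
            // the query sits between a start and its matching end.
            Err(i) => i % 2 == 1,
        }
    }
}

fn main() {
    // Same shape as the "best" benchmark input: a single run 'A'..'F'.
    let best = InvList { bounds: vec![65, 70] };
    assert!(best.contains('A'));
    assert!(best.contains('E'));
    assert!(!best.contains('F')); // 70 is the exclusive end

    // Iterator adapters are lazy, so the lookups only run once the
    // iterator is consumed, here by `count()`.
    let hits = ('A'..='Z').filter(|&ch| best.contains(ch)).count();
    assert_eq!(hits, 5);
}
```

With this layout a lookup costs one binary search over the boundary vector, so the number of stored runs, not the number of code points covered, drives the cost. Note also that a closure returning an unconsumed `map(...)` iterator performs no lookups; a benchmark body generally needs a consuming call such as `count()` for the work to run.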
121 changes: 121 additions & 0 deletions components/uniset/src/conversions.rs
@@ -0,0 +1,121 @@
use super::USetError;
use crate::utils::deconstruct_range;
use crate::UnicodeSet;
use std::{
    convert::TryFrom,
    ops::{Range, RangeBounds, RangeFrom, RangeFull, RangeInclusive, RangeTo, RangeToInclusive},
};

fn try_from_range(range: &impl RangeBounds<char>) -> Result<UnicodeSet, USetError> {
    let (from, till) = deconstruct_range(range);
    if from < till {
        let set = vec![from, till];
        Ok(UnicodeSet::try_from(set).unwrap())
    } else {
        Err(USetError::InvalidRange(from, till))
    }
}

impl TryFrom<&Range<char>> for UnicodeSet {
    type Error = USetError;

    fn try_from(range: &Range<char>) -> Result<Self, Self::Error> {
        try_from_range(range)
    }
}

impl TryFrom<&RangeFrom<char>> for UnicodeSet {
    type Error = USetError;

    fn try_from(range: &RangeFrom<char>) -> Result<Self, Self::Error> {
        try_from_range(range)
    }
}

impl TryFrom<&RangeFull> for UnicodeSet {
    type Error = USetError;

    fn try_from(_: &RangeFull) -> Result<Self, Self::Error> {
        Ok(UnicodeSet::all())
    }
}

impl TryFrom<&RangeInclusive<char>> for UnicodeSet {
    type Error = USetError;

    fn try_from(range: &RangeInclusive<char>) -> Result<Self, Self::Error> {
        try_from_range(range)
    }
}

impl TryFrom<&RangeTo<char>> for UnicodeSet {
    type Error = USetError;

    fn try_from(range: &RangeTo<char>) -> Result<Self, Self::Error> {
        try_from_range(range)
    }
}

impl TryFrom<&RangeToInclusive<char>> for UnicodeSet {
    type Error = USetError;

    fn try_from(range: &RangeToInclusive<char>) -> Result<Self, Self::Error> {
        try_from_range(range)
    }
}

#[cfg(test)]
mod tests {
    use super::USetError;
    use crate::UnicodeSet;
    use std::convert::TryFrom;
    #[test]
    fn test_try_from_range() {
        let check: Vec<char> = UnicodeSet::try_from(&('A'..'B')).unwrap().iter().collect();
        assert_eq!(vec!['A'], check);
    }
    #[test]
    fn test_try_from_range_error() {
        let check = UnicodeSet::try_from(&('A'..'A'));
        assert_eq!(Err(USetError::InvalidRange(65, 65)), check);
    }
    #[test]
    fn test_try_from_range_inclusive() {
        let check: Vec<char> = UnicodeSet::try_from(&('A'..='A')).unwrap().iter().collect();
        assert_eq!(vec!['A'], check);
    }
    #[test]
    fn test_try_from_range_inclusive_err() {
        let check = UnicodeSet::try_from(&('B'..'A'));
        assert_eq!(Err(USetError::InvalidRange(66, 65)), check);
    }
    #[test]
    fn test_try_from_range_from() {
        let uset = UnicodeSet::try_from(&('A'..)).unwrap();
        let check: Vec<&u32> = uset.ranges().collect();
        assert_eq!(vec![&65, &((std::char::MAX as u32) + 1)], check);
    }
    #[test]
    fn test_try_from_range_to() {
        let uset = UnicodeSet::try_from(&(..'A')).unwrap();
        let check: Vec<&u32> = uset.ranges().collect();
        assert_eq!(vec![&0, &65], check);
    }
    #[test]
    fn test_try_from_range_to_err() {
        let check = UnicodeSet::try_from(&(..(0 as char)));
        assert_eq!(Err(USetError::InvalidRange(0, 0)), check);
    }
    #[test]
    fn test_try_from_range_to_inclusive() {
        let uset = UnicodeSet::try_from(&(..='A')).unwrap();
        let check: Vec<&u32> = uset.ranges().collect();
        assert_eq!(vec![&0, &66], check);
    }
    #[test]
    fn test_try_from_range_full() {
        let uset = UnicodeSet::try_from(&(..)).unwrap();
        let check: Vec<&u32> = uset.ranges().collect();
        assert_eq!(vec![&0, &((std::char::MAX as u32) + 1)], check);
    }
}
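
The tests above pin down how each range type maps to a half-open `[from, till)` pair of `u32` values, but `utils::deconstruct_range` itself is not part of this excerpt. The following is a sketch of one way such a helper could work, using only `std::ops::RangeBounds`. The name `to_half_open` is hypothetical, and the real `deconstruct_range` may differ in detail.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical helper: maps any `char` range to half-open `[from, till)`
/// bounds expressed as `u32` code point values.
fn to_half_open(range: &impl RangeBounds<char>) -> (u32, u32) {
    let from = match range.start_bound() {
        Bound::Included(&c) => c as u32,
        Bound::Excluded(&c) => c as u32 + 1,
        Bound::Unbounded => 0,
    };
    let till = match range.end_bound() {
        // An inclusive end becomes exclusive by moving one code point up.
        Bound::Included(&c) => c as u32 + 1,
        Bound::Excluded(&c) => c as u32,
        // No upper bound: one past the largest valid `char`.
        Bound::Unbounded => char::MAX as u32 + 1,
    };
    (from, till)
}

fn main() {
    // Expected pairs mirror the values asserted in the tests above.
    assert_eq!(to_half_open(&('A'..'B')), (65, 66));
    assert_eq!(to_half_open(&('A'..='A')), (65, 66));
    assert_eq!(to_half_open(&(..'A')), (0, 65));
    assert_eq!(to_half_open(&(..='A')), (0, 66));
    assert_eq!(to_half_open(&('A'..)), (65, 0x110000));
    assert_eq!(to_half_open(&(..)), (0, 0x110000));

    // An empty or reversed range yields `from >= till`, the case that
    // `try_from_range` reports as `InvalidRange`.
    let (from, till) = to_half_open(&('B'..'A'));
    assert!(from >= till);
}
```

Normalizing every range to the half-open form means the `from < till` check in `try_from_range` is the only validation needed, regardless of which range syntax the caller used.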
15 changes: 15 additions & 0 deletions components/uniset/src/lib.rs
@@ -0,0 +1,15 @@
#[macro_use]
mod uniset;
mod conversions;
mod utils;

pub use conversions::*;
pub use uniset::UnicodeSet;
pub use utils::*;

/// Custom Errors for UnicodeSet.
#[derive(Debug, PartialEq)]
pub enum USetError {
    InvalidSet(Vec<u32>),
    InvalidRange(u32, u32),
}
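
One of the commits in this PR mentions adding `std::error::Error`, but the excerpt of `lib.rs` above only shows the `Debug` and `PartialEq` derives. As a sketch of what wiring an enum of this shape into the standard error machinery could look like, the block below defines a local copy of the enum and implements `Display` and `Error` for it. The message wording and the `check_range` helper are assumptions made for illustration, not taken from the PR.

```rust
use std::{error::Error, fmt};

/// Local copy of the enum shape shown above, so the sketch compiles on its own.
#[derive(Debug, PartialEq)]
enum USetError {
    InvalidSet(Vec<u32>),
    InvalidRange(u32, u32),
}

impl fmt::Display for USetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Message wording is illustrative only.
            USetError::InvalidSet(set) => write!(f, "invalid set: {:?}", set),
            USetError::InvalidRange(from, till) => {
                write!(f, "invalid range: {}..{}", from, till)
            }
        }
    }
}

// `Error` needs `Debug + Display`; no methods have to be overridden.
impl Error for USetError {}

/// Hypothetical caller that returns the error behind a trait object.
fn check_range(from: u32, till: u32) -> Result<(), Box<dyn Error>> {
    if from < till {
        Ok(())
    } else {
        Err(Box::new(USetError::InvalidRange(from, till)))
    }
}

fn main() {
    let err = check_range(66, 65).unwrap_err();
    assert_eq!(err.to_string(), "invalid range: 66..65");
    assert!(check_range(65, 66).is_ok());
}
```

Implementing `Error` lets callers propagate the type with `?` into `Box<dyn Error>` without pulling in an external error crate.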