Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Quick Check #2769

Merged
merged 1 commit into from
Jan 29, 2024
Merged

Implement Quick Check #2769

merged 1 commit into from
Jan 29, 2024

Conversation

tamaroning
Copy link
Contributor

@tamaroning tamaroning commented Dec 2, 2023

Addresses #2379

I implemented Unicode Quick Check.
This checks if strings (usually identifiers) should be normalized or not before performing Unicode normalization.
QuickCheckResult::Yes shows the given string is already normalized, saving time and memory.
Strings should be normalized if the result is QuickCheckResult::No or QuickCheckResult::Maybe.

Also, I added Unicode data files which are compatible with GPL.
they are used to generate rust-unicode-data.h
See https://www.gnu.org/licenses/license-list.en.html#Unicode

gcc/rust/ChangeLog:

	* rust-system.h: Add <algorithm>.
	* util/make-rust-unicode.py: Output NFC_Quick_Check table.
	* util/rust-codepoint.h (struct Codepoint): Add is_supplementary
	method.
	* util/rust-unicode-data.h: Generated.
	* util/rust-unicode.cc (lookup_cc): Modified to use
	std::lower_bound.
	(is_alphabetic): Likewise.
	(nfc_quick_check): New function.
	(nfc_normalize): Use nfc_quick_check.
	(is_nfc_qc_maybe): New function.
	(is_nfc_qc_no): New function.
	(rust_nfc_qc_test): New test.
	* util/rust-unicode.h (is_nfc_qc_no): New function.
	(is_nfc_qc_maybe): New function.
	(enum class): New enum class.
	(nfc_quick_check): New function.
	(rust_nfc_qc_test): New test.
	* util/DerivedCoreProperties.txt: New file.
	* util/DerivedNormalizationProps.txt: New file.
	* util/UnicodeData.txt: New file.

Signed-off-by: Raiki Tamura <[email protected]>

@@ -16,7 +16,15 @@
# along with GCC; see the file COPYING3. If not see
# <http://www.gnu.org/licenses/>.

# Run this program as
# First, download the folloqing files from unicode.org
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# First, download the folloqing files from unicode.org
# First, download the following files from unicode.org

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks😆

@tamaroning tamaroning force-pushed the quick-check branch 2 times, most recently from 10a9f78 to 4a9fc52 Compare December 12, 2023 16:20
@tamaroning tamaroning marked this pull request as ready for review December 12, 2023 16:32
Copy link
Member

@CohenArthur CohenArthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, thank you :) is it possible to maybe add some unit testing? otherwise this looks good to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so these files are not present in any other GCC frontend? this is surprising. I guess it'll be a nice thing to bring higher in the GCC file hierarchy, so that other languages can benefit from this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omg, I completely missed the fact that contrib/unicode contains these files 😮😮
https://github.com/tamaroning/gccrs/tree/2a72592cf111f8f3dcc024d1f34ccd7fe6d36878/contrib/unicode

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, no problem! better late than never :D well done finding them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed them and pushed just now

@tamaroning tamaroning force-pushed the quick-check branch 4 times, most recently from 37125bc to b6c3c3d Compare December 14, 2023 11:39
Comment on lines +345 to +349
void
rust_nfc_qc_test ()
{
ASSERT_EQ (Rust::nfc_quick_check ({0x1e0a /* NFC_QC_YES */}),
Rust::QuickCheckResult::YES);
ASSERT_EQ (Rust::nfc_quick_check (
{0x1e0a /* NFC_QC_YES */, 0x0323 /* NFC_QC_MAYBE */}),
Rust::QuickCheckResult::MAYBE);
ASSERT_EQ (Rust::nfc_quick_check ({0x0340 /* NFC_QC_NO */}),
Rust::QuickCheckResult::NO);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CohenArthur
Here are tests. (but very few...)
Sorry, I failed to commit gcc/rust/rust-lang.cc for some reason.

Copy link
Member

@CohenArthur CohenArthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great :) thanks @tamaroning, good work!

}

template <std::size_t SIZE>
int64_t
bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace this function with a direct call to std::binary_search instead ?

Copy link
Contributor Author

@tamaroning tamaroning Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes.
I will fix it later

gcc/rust/ChangeLog:

	* rust-lang.cc (run_rust_tests): Add test.
	* rust-system.h: Add <algorithm>.
	* util/make-rust-unicode.py: Output NFC_Quick_Check table.
	* util/rust-codepoint.h (struct Codepoint): Add is_supplementary
	method.
	* util/rust-unicode-data.h: Generated.
	* util/rust-unicode.cc (binary_search_sorted_array): Removed.
	(lookup_cc): Remove namespace.
	(is_alphabetic): Use std::binary_search
	(nfc_quick_check): New function.
	(nfc_normalize): Use nfc_quick_check.
	(is_nfc_qc_maybe): New function.
	(is_nfc_qc_no): New function.
	(rust_nfc_qc_test): New test.
	* util/rust-unicode.h (is_nfc_qc_no): New function.
	(is_nfc_qc_maybe): New function.
	(enum class): New enum class.
	(nfc_quick_check): New function.
	(rust_nfc_qc_test): New test.

Signed-off-by: Raiki Tamura <[email protected]>
@tamaroning
Copy link
Contributor Author

tamaroning commented Jan 26, 2024

@P-E-P
Sorry for super late response!
I've been busy for my university exams
i fixed everything.

Copy link
Member

@P-E-P P-E-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@P-E-P P-E-P added this pull request to the merge queue Jan 29, 2024
Merged via the queue into Rust-GCC:master with commit d6e10d2 Jan 29, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants