Unicode-preserving mutators #1542

Merged
domenukk merged 31 commits into main from unicode-mutator
Nov 20, 2023

Conversation

addisoncrump
Collaborator

@addisoncrump addisoncrump commented Sep 22, 2023

This PR adds mutators which preserve the unicode categories of mutated regions.
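As a rough illustration of what "category-preserving" means here, the sketch below swaps one character for another from the same coarse class. It is not the PR's implementation: `CoarseClass`, `classify`, and `mutate_preserving` are hypothetical names, and the classes are built from `char` methods in the Rust standard library rather than real Unicode general categories.

```rust
/// Coarse stand-in for a Unicode category (assumption: the real mutators
/// work with proper Unicode categories, which std does not expose).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CoarseClass {
    Digit,
    Upper,
    Lower,
    Whitespace,
    Other,
}

/// Classify a character using only standard-library predicates.
fn classify(c: char) -> CoarseClass {
    if c.is_numeric() {
        CoarseClass::Digit
    } else if c.is_uppercase() {
        CoarseClass::Upper
    } else if c.is_lowercase() {
        CoarseClass::Lower
    } else if c.is_whitespace() {
        CoarseClass::Whitespace
    } else {
        CoarseClass::Other
    }
}

/// Replace the character at char index `idx` with a character from `pool`
/// that belongs to the same coarse class. `pick` stands in for a random
/// choice so that the example stays deterministic.
fn mutate_preserving(input: &str, idx: usize, pool: &[char], pick: usize) -> String {
    let mut chars: Vec<char> = input.chars().collect();
    if let Some(&target) = chars.get(idx) {
        let class = classify(target);
        let candidates: Vec<char> = pool
            .iter()
            .copied()
            .filter(|&c| classify(c) == class)
            .collect();
        // Leave the input untouched if the pool has no same-class character.
        if !candidates.is_empty() {
            chars[idx] = candidates[pick % candidates.len()];
        }
    }
    chars.into_iter().collect()
}
```

Because the replacement is always drawn from the same class, a digit stays a digit and a lowercase letter stays lowercase, so the mutated string keeps the shape a parser expects.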

@addisoncrump
Collaborator Author

addisoncrump commented Sep 23, 2023

This needs to be modified so that we don't require the whole input to be UTF-8 -- this requirement proves to be too strong in practice. (Completed.)

@tokatoka
Member

Perhaps the next step could be adding a token's category info when it is added to the dictionary?

@tokatoka
Member

I will try this tomorrow.

@tokatoka
Member

When I add "real" UTF-8 characters to the test, I get lots of Utf8Error.

@tokatoka
Member

tokatoka commented Sep 25, 2023

I have now fixed the tests so that they return a proper Utf8Error.
cargo test mutators::string::test::mutate_hex is the command (with unicode enabled).

But in its current state it already panics. I'll look into it tomorrow as well.

@addisoncrump
Collaborator Author

Right -- I think it is not strictly possible to sanely select UTF-8 data from an input, because there's not a clear indicator for the start of a code point. This makes some mutations lead to non-UTF-8 data, which is less than optimal. Not sure how to get around this without a "this region is UTF-8" pass of some kind. That should be fairly cheap to implement, so maybe we can try this.
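A "this region is UTF-8" pass of the kind suggested above could look roughly like the following sketch. It is not the code that landed in this PR, and the helper name `utf8_regions` is hypothetical. It uses only `std::str::from_utf8` together with `Utf8Error::valid_up_to` and `Utf8Error::error_len` from the Rust standard library to collect the byte ranges that decode cleanly, so that a mutator could restrict itself to those ranges.

```rust
use std::ops::Range;

/// Hypothetical helper: return the byte ranges of `bytes` that are valid
/// UTF-8, skipping over any invalid bytes in between.
fn utf8_regions(bytes: &[u8]) -> Vec<Range<usize>> {
    let mut regions = Vec::new();
    let mut start = 0;
    while start < bytes.len() {
        match std::str::from_utf8(&bytes[start..]) {
            Ok(_) => {
                // Everything from `start` to the end is valid.
                regions.push(start..bytes.len());
                break;
            }
            Err(e) => {
                let valid = e.valid_up_to();
                if valid > 0 {
                    regions.push(start..start + valid);
                }
                // `error_len()` is `None` when the input ends in the middle
                // of a multi-byte sequence; in that case skip the remainder.
                let skip = e.error_len().unwrap_or(bytes.len() - start - valid);
                start += valid + skip;
            }
        }
    }
    regions
}
```

Each returned range starts and ends on a code point boundary, which sidesteps the problem of guessing where a code point begins in arbitrary input. The cost is one linear scan over the input.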

@addisoncrump addisoncrump marked this pull request as ready for review September 27, 2023 02:36
@addisoncrump
Collaborator Author

@tokatoka more CI failures... 😢

@andreafioraldi
Member

Status?

@addisoncrump addisoncrump force-pushed the unicode-mutator branch 2 times, most recently from a41121d to 1e4b04c Compare November 20, 2023 14:01
@domenukk
Member

This is cool!

@domenukk domenukk merged commit 281524d into main Nov 20, 2023
17 checks passed
@domenukk domenukk deleted the unicode-mutator branch November 20, 2023 23:41
tokatoka added a commit that referenced this pull request Nov 21, 2023
* create the string classification stage

* modify API to pre-group

* preserving mutator

* more meaningful test

* subproperty mutators + some fixes

* document, finalise, integrate with libafl_libfuzzer

* add example, fix for weird range select

* fix for introspection

* fix fuzzer build

* speed optimisation: allow, but do not require, stacking

* property => category

* token replacement

* fixup: rare case where rust does not agree on valid character

* fix CI again

* again again

* take two: dynamic unicode discovery

* oops

* fix: last byte is never selected

* opt: bias to smaller unicode categories

* fix test

* opt: precompute regions and fix tests

* cache and allow stacking

* document and update libafl_libfuzzer

* oops, use reverse

* fix bolts clippy error

* fixup part 2

* clippy

* part 2

* clippy warning allow

* clippy complaint

* use alloc not std

---------

Co-authored-by: toka <[email protected]>