rollup: bump MSRV to Rust 1.65, bug fixes, memory usage reductions, API improvements, more word boundary assertions and more #1098

Merged
merged 33 commits into from
Oct 9, 2023
e711f8e
automata: clean up regression test
BurntSushi Oct 3, 2023
c12a7df
automata: fix line wrapping
BurntSushi Oct 3, 2023
9c8796a
automata: fix word boundary bug
BurntSushi Oct 3, 2023
e1fae8b
automata/onepass: future proof bit packing
BurntSushi Oct 3, 2023
355dd3e
syntax: make Ast the size of a pointer
BurntSushi Oct 3, 2023
8b0b0b0
syntax: box each AstKind variant
BurntSushi Oct 3, 2023
db214e5
syntax: unbox Ast and remove AstKind
BurntSushi Oct 3, 2023
ad2cfd6
syntax: remove guarantees in the HIR related to 'u' flag
BurntSushi Oct 6, 2023
f0147f8
automata: rejigger DFA start state computation
BurntSushi Oct 6, 2023
ac51c5c
automata: fix doc links
BurntSushi Oct 6, 2023
2e67b6f
automata: fix one outdated regex-cli test command
Licheam Jul 23, 2023
5e9204f
automata: fix more out-dated regex-cli commands
BurntSushi Oct 6, 2023
82d7153
syntax: optimize most of the IntervalSet routines
Licheam Jul 21, 2023
a5aa233
syntax and automata: bump LookSet representation from u16 to u32
BurntSushi Oct 7, 2023
8f77e22
syntax/ast: add support for additional word boundary assertions
BurntSushi Oct 7, 2023
37faa6e
syntax/hir: add new special word boundaries to HIR
BurntSushi Oct 7, 2023
97f0205
automata: add special word boundaries to regex-automata
BurntSushi Oct 7, 2023
915a154
doc: explain the new word boundary assertions
BurntSushi Oct 8, 2023
bd36c6f
lite: add special word boundaries to regex-lite
BurntSushi Oct 8, 2023
048b6f8
doc: remove HACKING document
BurntSushi Oct 8, 2023
2d7b355
changelog: add note about decreasing memory usage
BurntSushi Oct 8, 2023
f68f59a
test: disable some tests on non-64-bit
BurntSushi Oct 8, 2023
a85c72e
syntax: fix panics that occur with non-sensical Ast values
BurntSushi Oct 8, 2023
1de1a37
changelog: start filling out the 1.10 release
BurntSushi Oct 8, 2023
0cc1b4d
automata: fix subtle DFA performance bug
BurntSushi Oct 9, 2023
1a50eaa
msrv: bump to Rust 1.65
BurntSushi Oct 9, 2023
9e503cd
fuzz: institute sane limits for arbitrary-based fuzzers
addisoncrump Jul 15, 2023
ee58904
automata: remove 'is_quit_state' debug assertions
BurntSushi Oct 9, 2023
62ce812
automata: fix invalid accelerators
BurntSushi Oct 9, 2023
e378b4d
lite: reduce size limit to avoid timeouts
BurntSushi Oct 9, 2023
c8e4c2e
regex: reject large patterns when fuzzing
BurntSushi Oct 9, 2023
24d08d5
automata: improve sparse DFA validation
BurntSushi Oct 9, 2023
109c8c4
fuzz: add regression test for AST roundtripping
BurntSushi Oct 9, 2023
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -141,7 +141,7 @@ jobs:
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
- toolchain: 1.60.0
+ toolchain: 1.65.0
# The memchr 2.6 release purportedly bumped its MSRV to Rust 1.60, but it
# turned out that on aarch64, it was using something that wasn't stabilized
# until Rust 1.61[1]. (This was an oversight on my part. I had previously
78 changes: 78 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,81 @@
1.10.0 (2023-10-09)
===================
This is a new minor release of `regex` that adds support for start and end
word boundary assertions. That is, `\<` and `\>`. The minimum supported Rust
version has also been raised to 1.65, which was released about one year ago.

The new word boundary assertions are:

* `\<` or `\b{start}`: a Unicode start-of-word boundary (`\W|\A` on the left,
`\w` on the right).
* `\>` or `\b{end}`: a Unicode end-of-word boundary (`\w` on the left, `\W|\z`
on the right).
* `\b{start-half}`: half of a Unicode start-of-word boundary (`\W|\A` on the
left).
* `\b{end-half}`: half of a Unicode end-of-word boundary (`\W|\z` on the
right).
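
The two full assertions in the list above can be read as plain predicates on a position in the haystack. The following is a minimal sketch of that reading, not the crate's implementation: the function names are hypothetical, and `\w` is simplified to ASCII word bytes, whereas the assertions described here are Unicode-aware.

```rust
/// ASCII-only stand-in for `\w`. This simplification is an assumption of
/// the sketch; the assertions in the changelog are Unicode-aware.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `\<` or `\b{start}`: `\W|\A` on the left and `\w` on the right.
/// `at` must be at most `haystack.len()`.
fn is_word_start(haystack: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(haystack[at - 1]);
    let word_after = at < haystack.len() && is_word_byte(haystack[at]);
    !word_before && word_after
}

/// `\>` or `\b{end}`: `\w` on the left and `\W|\z` on the right.
/// `at` must be at most `haystack.len()`.
fn is_word_end(haystack: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_word_byte(haystack[at - 1]);
    let word_after = at < haystack.len() && is_word_byte(haystack[at]);
    word_before && !word_after
}
```

For the haystack `foo bar`, this sketch reports word starts at offsets 0 and 4 and word ends at offsets 3 and 7.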

The `\<` and `\>` are GNU extensions to POSIX regexes. They have been added
to the `regex` crate because they enjoy somewhat broad support in other regex
engines as well (for example, vim). The `\b{start}` and `\b{end}` assertions
are aliases for `\<` and `\>`, respectively.

The `\b{start-half}` and `\b{end-half}` assertions are not found in any
other regex engine (although regex engines with general look-around support
can certainly express them). They were added principally to support the
implementation of word matching in grep programs, where one generally wants to
be a bit more flexible in what is considered a word boundary.
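
To make the "half" assertions concrete, here is a small sketch of how a grep-style word match could use them. It is an illustration under assumptions: the helper names are hypothetical, `\w` is reduced to ASCII word bytes, and the needle is a literal rather than a regex.

```rust
/// ASCII-only stand-in for `\w` (an assumption of this sketch).
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `\b{start-half}`: only requires `\W|\A` on the left of `at`.
/// `at` must be at most `haystack.len()`.
fn is_word_start_half(haystack: &[u8], at: usize) -> bool {
    at == 0 || !is_word_byte(haystack[at - 1])
}

/// `\b{end-half}`: only requires `\W|\z` on the right of `at`.
fn is_word_end_half(haystack: &[u8], at: usize) -> bool {
    at >= haystack.len() || !is_word_byte(haystack[at])
}

/// Hypothetical helper: does the literal `needle` occur at `start`
/// with a half boundary on each side?
fn matches_as_word(haystack: &[u8], needle: &[u8], start: usize) -> bool {
    let end = start + needle.len();
    end <= haystack.len()
        && &haystack[start..end] == needle
        && is_word_start_half(haystack, start)
        && is_word_end_half(haystack, end)
}
```

With the needle `-foo`, this matches at offset 2 of `x -foo y` but not at offset 1 of `x-foo`. A full `\<` could not match directly before `-foo` at all, since it also requires `\w` on its right and `-` is not a word character.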

New features:

* [FEATURE #469](https://github.com/rust-lang/regex/issues/469):
Add support for `\<` and `\>` word boundary assertions.
* [FEATURE(regex-automata) #1031](https://github.com/rust-lang/regex/pull/1031):
DFAs now have a `start_state` method that doesn't use an `Input`.

Performance improvements:

* [PERF #1051](https://github.com/rust-lang/regex/pull/1051):
Unicode character class operations have been optimized in `regex-syntax`.
* [PERF #1090](https://github.com/rust-lang/regex/issues/1090):
Make patterns containing lots of literal characters use less memory.

Bug fixes:

* [BUG #1046](https://github.com/rust-lang/regex/issues/1046):
Fix a bug that could result in incorrect match spans when using a Unicode word
boundary and searching non-ASCII strings.
* [BUG(regex-syntax) #1047](https://github.com/rust-lang/regex/issues/1047):
Fix panics that can occur in `Ast->Hir` translation (not reachable from `regex`
crate).
* [BUG(regex-syntax) #1088](https://github.com/rust-lang/regex/issues/1088):
Remove guarantees in the API that connect the `u` flag with a specific HIR
representation.

`regex-automata` breaking change release:

This release includes a `regex-automata 0.4.0` breaking change release, which
was necessary in order to support the new word boundary assertions. For
example, the `Look` enum has new variants and the `LookSet` type now uses `u32`
instead of `u16` to represent a bitset of look-around assertions. These are
overall very minor changes, and most users of `regex-automata` should be able
to move to `0.4` from `0.3` without any changes at all.
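
As a rough illustration of why the representation had to widen, consider a bitset that spends one bit per look-around assertion. The type below is a hypothetical stand-in, not the `LookSet` API, and it assumes only that the new variants push the count past 16; the exact number is not stated in this changelog.

```rust
/// Hypothetical bitset with one bit per look-around assertion.
/// The real `LookSet` may differ in layout and methods.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct LookSetSketch {
    bits: u32,
}

impl LookSetSketch {
    /// Returns a copy of this set with the assertion at `index` added.
    fn insert(self, index: u32) -> LookSetSketch {
        assert!(index < 32, "a u32 holds at most 32 assertions");
        LookSetSketch { bits: self.bits | (1 << index) }
    }

    /// Returns true if the assertion at `index` is in the set.
    fn contains(self, index: u32) -> bool {
        index < 32 && (self.bits & (1 << index)) != 0
    }

    /// Returns the number of assertions in the set.
    fn len(self) -> u32 {
        self.bits.count_ones()
    }
}
```

A bit index of 16 or higher has no place in a `u16`, so any assertion beyond the sixteenth needs the wider integer.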

`regex-syntax` breaking change release:

This release also includes a `regex-syntax 0.8.0` breaking change release,
which, like `regex-automata`, was necessary in order to support the new word
boundary assertions. This release also includes some changes to the `Ast`
type to reduce heap usage in some cases. If you are using the `Ast` type
directly, your code may require some minor modifications. Otherwise, users of
`regex-syntax 0.7` should be able to migrate to `0.8` without any code changes.
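
The commit titles in this PR ("syntax: box each AstKind variant", "syntax: unbox Ast and remove AstKind") suggest the general technique of boxing enum payloads so that each node stays small. The sketch below shows that technique with hypothetical types; it is not the actual `Ast` definition.

```rust
use std::mem::size_of;

/// Hypothetical payloads standing in for AST node data.
#[allow(dead_code)]
struct Literal {
    span: [usize; 6],
    c: char,
}

#[allow(dead_code)]
struct Repetition {
    span: [usize; 6],
    min: u32,
    max: u32,
}

/// Payloads stored inline: every node is as large as the largest variant.
#[allow(dead_code)]
enum InlineNode {
    Literal(Literal),
    Repetition(Repetition),
}

/// Payloads behind a `Box`: every node is a tag plus a pointer.
#[allow(dead_code)]
enum BoxedNode {
    Literal(Box<Literal>),
    Repetition(Box<Repetition>),
}

/// Size in bytes of one inline node.
fn inline_size() -> usize {
    size_of::<InlineNode>()
}

/// Size in bytes of one boxed node.
fn boxed_size() -> usize {
    size_of::<BoxedNode>()
}
```

The trade-off is one extra allocation per node in exchange for smaller elements in any `Vec` of nodes, which helps most when small variants would otherwise pay for the largest one.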

`regex-lite` release:

The `regex-lite 0.1.1` release contains support for the new word boundary
assertions. There are no breaking changes.


1.9.6 (2023-09-30)
==================
This is a patch release that fixes a panic that can occur when the default
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -15,7 +15,7 @@ categories = ["text-processing"]
autotests = false
exclude = ["/scripts/*", "/.github/*"]
edition = "2021"
- rust-version = "1.60.0"
+ rust-version = "1.65"

[workspace]
members = [