1.44 triggers a stack overflow on Windows #751

Closed
jdm opened this issue Mar 14, 2021 · 1 comment · Fixed by #752

jdm commented Mar 14, 2021

What version of regex are you using?

1.4.4

Describe the bug at a high level.

Running bindgen as part of the mozjs build script triggers a stack overflow on Windows.

What are the steps to reproduce the behavior?

N.B. I recognize that from a reproducibility and isolation perspective, this is the worst possible testcase. I'm filing this so I don't lose track of it.

Build Servo in a Windows CI environment. With regex 1.4.3 the build completes; with 1.4.4 it hits a stack overflow. More details on this in servo/servo#28265.

I have not yet been able to reproduce this by building only mozjs with GitHub Actions.

In servo/servo#28269, I verified that e040c1b is the commit that triggers this change in behaviour.

jdm changed the title from "1.44" to "1.44 triggers a stack overflow on Windows" on Mar 14, 2021
@BurntSushi
Member

This is a dupe of #750.

BurntSushi added a commit that referenced this issue Mar 14, 2021
This commit fixes a fairly large regression in the stack size of a Regex
introduced in regex 1.4.4. When I dropped thread_local and replaced it
with Pool, it turned out that Pool inlined a T into its struct and a
Regex in turn had Pool inlined into itself. It further turns out that
the T=ProgramCache is itself quite large.

We fix this by introducing an indirection in the inner regex type. That
is, we use a Box<Pool> instead of a Pool. This shrinks the size of a
Regex from 856 bytes to 16 bytes.
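
To illustrate the effect of that indirection, here is a minimal sketch. It does not use the regex crate's real internals: `BigCache`, `Pool`, `InlineRegex` and `BoxedRegex` are hypothetical stand-ins, and the 512-byte scratch buffer is an arbitrary size chosen only to make the difference visible.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large per-search cache (the commit's
// ProgramCache plays this role in the real crate).
struct BigCache {
    _scratch: [u8; 512],
}

// Hypothetical pool that keeps one value inline in the struct itself,
// plus a stack of spare values on the heap.
struct Pool<T> {
    _owner_value: T,
    _spares: Vec<Box<T>>,
}

// Regex-like handle with the pool stored inline: the whole cache
// travels with every value of this type, including on the stack.
struct InlineRegex {
    _pool: Pool<BigCache>,
}

// Regex-like handle with the pool behind a Box: only a pointer is
// stored in the handle, and the cache lives on the heap.
struct BoxedRegex {
    _pool: Box<Pool<BigCache>>,
}

fn inline_size() -> usize {
    size_of::<InlineRegex>()
}

fn boxed_size() -> usize {
    size_of::<BoxedRegex>()
}

fn main() {
    println!("inline handle: {} bytes", inline_size());
    println!("boxed handle:  {} bytes", boxed_size());
}
```

The inline variant is at least as large as the cache it embeds, while the boxed variant is one pointer wide. The sketch models only the pooled field, so its absolute numbers will differ from the 856 and 16 bytes quoted above.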

Interestingly, prior to regex 1.4.4, a Regex was still quite substantial
in size, coming in at around 552 bytes. So it looks like the 1.4.4
release didn't dramatically increase it, but it increased it enough that
folks started experiencing real problems: stack overflows.

Since indirection can lead to worse locality and performance loss, I did
run the benchmark suite. I couldn't see any measurable difference. This
is generally what I would expect. This is an indirection at a fairly
high level. There's lots of other indirection already, and this
indirection isn't accessed in a hot path. (The regex cache itself is of
course used in hot paths, but by the time we get there, we have already
followed this particular pointer.)

We also include a regression test that asserts a Regex (and company) are
16 bytes in size. While this isn't an API guarantee, it at least means
that increasing the size of Regex will be an intentional thing in the
future and not an accidental leakage of implementation details.
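
A size regression test of that kind might look roughly like the sketch below. This is not the test added in the commit: `Handle`, `ReadOnly` and `Cache` are hypothetical types, and the assumption that the handle consists of exactly two pointer-sized fields is mine, chosen so the check is portable across 32-bit and 64-bit targets.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical shared, immutable state.
struct ReadOnly {
    _program: Vec<u8>,
}

// Hypothetical large mutable scratch space.
struct Cache {
    _scratch: [u8; 512],
}

// Hypothetical regex-like handle: two pointers, nothing inline.
struct Handle {
    _ro: Arc<ReadOnly>,
    _cache: Box<Cache>,
}

fn handle_size() -> usize {
    size_of::<Handle>()
}

// Runs under `cargo test`. It fails if someone inlines a large field
// into Handle, so that any growth has to be a deliberate change.
#[test]
fn handle_stays_two_pointers_wide() {
    assert_eq!(handle_size(), 2 * size_of::<usize>());
}

fn main() {
    println!("Handle is {} bytes", handle_size());
}
```

Expressing the expected size in terms of `size_of::<usize>()` rather than a literal 16 is a design choice for this sketch; a literal is simpler when only 64-bit targets are tested.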

Fixes #750, Fixes #751

Ref servo/servo#28269