(WIP) v0.4 release #173

Merged
merged 124 commits into from
Aug 15, 2023


shwestrick
Collaborator

@shwestrick shwestrick commented May 2, 2023

Working on a v0.4 release which incorporates entanglement management (paper to appear at PLDI'23) as well as other improvements along the way. Details to follow...

Here are some potential TODO items for v0.4. These aren't essential, but having the list is helpful regardless. We can discuss which we want to include and which we can leave for a future release.

Small things:

  • Do `-disable-pass splitTypes1 -disable-pass splitTypes2` by default? IIRC, these passes haven't been updated for polymorphic CAS. (We should open an issue about this.)
  • We should change the wording in this error message (link). It's very old, and (with new entanglement management techniques) is misleading.

Big-ish things:

  • #172 (Update gdtoa to be thread-safe) fixed the `Real.toString` bug. We should also consider fixing similar bugs, e.g. `Real.fromString`, `Int.toString`, and some of the `Time.{to,from}X` functions, all of which are inherited from MLton and are not thread-safe by default.
  • #169 (CGC performance improvements) discusses some performance problems with CGC. The second problem in particular (where an ancestor heap is pushed into the CGC chain, causing a space explosion) looks like it could be an easy fix with a big impact.

shwestrick and others added 24 commits October 12, 2022 10:54
…gement); significantly less overhead on entangled read barriers
Be warned! `#ifdef ASSERT` is true in all builds.

This was causing the debug version of `traverseAndCheck` to run
in all builds, with significant performance degradation in entangled
benchmarks.

I cleaned up the header and definition a little here, too.
Clear candidates (suspects) in parallel: entanglement management perf improvement (and other fixes)
This fixes a subtle bug where, when scheduling a CGC, an unhandled
exception can skip the cleanup code that happens after checking if
the CGC-task can be popped and executed locally.

In the scheduler, in `forkGC`, there is a call to `fork ... (f, g)`
between the `push` and `popDiscard` for the CGC-task. Previously,
if this call raised an exception, then the `popDiscard` (and other
associated cleanup) was skipped.

The solution is straightforward, similar to how exceptions are
normally caught and re-raised across task boundaries:
  1. catch the exception if needed (via `result (fn () => ...)`)
  2. perform the cleanup
  3. re-raise the exception if needed (via `extractResult`)
merge real.to_String changes
basis-library/util/one.sml defines a structure called `One` which
is used in the basis library to optimize memory usage of a few
functions, including:
  * Int.{fmt,toString}
  * Word.{fmt,toString}
  * Real.{split,toManExp}

This patch fixes a buggy race condition in the implementation of
`One`. With this patch, the above functions should be safe for
parallelism and concurrency.

The idea behind `One` is straightforward: a static buffer or mutable
cell is allocated to be shared across calls. When a call is made, if
the shared buffer is not in use, then the shared buffer can be claimed
and used for the duration of that call, and then released.

The mechanism for claiming the buffer (inherited from MLton) was
previously not thread-safe, because it was not atomic at the
hardware level. For MLton, it didn't need to be. But for MPL
this is no longer correct, hence the bug.

This patch switches to using an atomic compare-and-swap (CAS) to
claim the buffer.
bugfix: update structure One to be thread-safe
@shwestrick
Collaborator Author

Collecting a list of what we've done so far:

  • bugfix: various toString/fromString/fmt functions are now thread-safe. (Used to be racy in parallel.)
  • bugfix: disabled splitTypes1/splitTypes2 by default. These passes don't support polymorphic primitive CAS.
  • bugfix: exception handling during CGC scheduling
  • bugfix: polymorphic CAS on Real32.real array

@shwestrick shwestrick merged commit f10dab4 into master Aug 15, 2023
@shwestrick shwestrick deleted the prep-v0.4-release branch September 11, 2024 14:48