Commit

Merge branch 'master' into performance-eval
shwestrick committed Sep 29, 2022
2 parents 6646063 + 0a1b9a9 commit 11e7202
Showing 14 changed files with 108 additions and 27 deletions.
12 changes: 9 additions & 3 deletions Dockerfile
@@ -1,10 +1,16 @@
FROM ubuntu:latest
FROM ubuntu:20.04

# Install the dependencies. We'll use the ubuntu provided mlton to bootstrap our local build.
RUN apt-get update -qq \
&& apt-get install -qq git build-essential libgmp-dev mlton mlton-tools vim
&& apt-get install -qq git build-essential libgmp-dev mlton mlton-tools vim \
&& git clone https://github.com/mlton/mlton.git /root/mlton \
&& cd /root/mlton \
&& git checkout on-20210117-release \
&& make

# Copy the current directory (MLton source root) to a location within the container & move there
ENV PATH /root/mlton/build/bin:$PATH

# Copy the current directory (MPL source root) to a location within the container & move there
COPY . /root/mpl
WORKDIR /root/mpl

52 changes: 36 additions & 16 deletions README.md
@@ -5,15 +5,15 @@ compiler for Standard ML which implements support for
nested (fork-join) parallelism. MPL generates executables with
excellent multicore performance, utilizing a novel approach to
memory management based on the theory of disentanglement
[[1](#rmab16),[2](#gwraf18),[3](#wyfa20),[4](#awa21)].
[[1](#rmab16),[2](#gwraf18),[3](#wyfa20),[4](#awa21),[5](#waa22)].

MPL is research software and is being actively developed.

If you are interested in using MPL, consider checking
out the [tutorial](https://github.com/MPLLang/mpl-tutorial).
You might also be interested in exploring
[`mpllib`](https://github.com/MPLLang/mpllib)
(a standard library for MPL) and the
(a library for MPL) and the
[Parallel ML benchmark suite](https://github.com/MPLLang/parallel-ml-bench).

## Docker
@@ -25,6 +25,20 @@ $ docker run -it shwestrick/mpl /bin/bash
...# examples/bin/primes @mpl procs 4 --
```

If you want to try out MPL by writing and compiling your own code, we recommend
mounting a local directory inside the container. For example, here's how you
can use MPL to compile and run your own `main.mlb` in the current directory.
(To mount some other directory, replace `$(pwd -P)` with a different path.)
```
$ ls
main.mlb
$ docker run -it -v $(pwd -P):/root/mycode shwestrick/mpl /bin/bash
...# cd /root/mycode
...# mpl main.mlb
...# ./main @mpl procs 4 --
```


## Build and Install (from source)

### Requirements
@@ -211,22 +225,23 @@ to an object allocated concurrently by some other thread, then we say that
the two threads are **entangled**. This is a violation of disentanglement,
which MPL currently does not allow.

To check if your program has entanglement, MPL has a built-in dynamic
entanglement detector. You can enable the detector by using
`-detect-entanglement true` at compile time.
When entanglement detection is enabled, MPL will monitors individual reads
and writes during execution; if entanglement is found, the program will
terminate with an error message.
MPL has a built-in dynamic entanglement detector which is enabled by default.
The entanglement detector monitors individual reads and writes during execution;
if entanglement is found, the program will terminate with an error message.

Entanglement detection is highly optimized, and often does not have a
significant impact on performance. We recommend using entanglement detection
liberally.
The entanglement detector is both "sound" and "complete": there are neither
false negatives nor false positives. In other words, the detector always raises
an alarm when entanglement occurs, and never raises an alarm otherwise. Note
however that entanglement (and therefore also entanglement detection) can
be execution-dependent: if your program is non-deterministic (e.g. racy),
then entanglement may or may not occur depending on the outcome of a race
condition. Similarly, entanglement could be input-dependent.

Note that the detector is execution-dependent: if your program
is non-deterministic (e.g. racy), then entanglement may or may not
occur depending on the outcome of a race condition. Similarly, entanglement
could be input-dependent. Therefore, we recommend testing multiple inputs,
as well as running on varying number of processors.
Entanglement detection is highly optimized, and typically has negligible
overhead (see [[5](#waa22)]). It can be disabled at compile-time by passing
`-detect-entanglement false`; however, we recommend against doing so. MPL
relies on entanglement detection to ensure memory safety. We recommend leaving
entanglement detection enabled at all times.

## Bugs and Known Issues

@@ -280,3 +295,8 @@ POPL 2020.
[Provably Space-Efficient Parallel Functional Programming](http://www.cs.cmu.edu/~swestric/21/popl.pdf).
Jatin Arora, Sam Westrick, and Umut A. Acar.
POPL 2021.

[<a name="waa22">5</a>]
[Entanglement Detection with Near-Zero Cost](http://www.cs.cmu.edu/~swestric/22/icfp-detect.pdf).
Sam Westrick, Jatin Arora, and Umut A. Acar.
ICFP 2022.
2 changes: 2 additions & 0 deletions basis-library/mpl/gc.sig
@@ -25,6 +25,8 @@ sig
val numberSuspectsCleared: unit -> IntInf.int
val bytesPinnedEntangled: unit -> IntInf.int

val getControlMaxCCDepth: unit -> int

(* The following are all cumulative statistics (initially 0, and only
* increase throughout execution).
*
3 changes: 3 additions & 0 deletions basis-library/mpl/gc.sml
@@ -43,6 +43,9 @@ struct
fun approxRaceFactor () =
(GC.approxRaceFactor (gcState ()))

fun getControlMaxCCDepth () =
Word32.toInt (GC.getControlMaxCCDepth (gcState ()))

fun numberSuspectsMarked () =
C_UIntmax.toLargeInt (GC.numberSuspectsMarked (gcState ()))

2 changes: 2 additions & 0 deletions basis-library/primitive/prim-mlton.sml
@@ -149,6 +149,8 @@ structure GC =
val setSummary = _import "GC_setControlsSummary" private: GCState.t * bool -> unit;
val unpack = _import "GC_unpack" runtime private: GCState.t -> unit;

val getControlMaxCCDepth = _import "GC_getControlMaxCCDepth" runtime private: GCState.t -> Word32.word;

(* SAM_NOTE: TODO: move these to prim-mpl.sml *)
val getLocalGCMillisecondsOfProc = _import "GC_getLocalGCMillisecondsOfProc" runtime private : GCState.t * Word32.word -> C_UIntmax.t;
val getPromoMillisecondsOfProc = _import "GC_getPromoMillisecondsOfProc" runtime private : GCState.t * Word32.word -> C_UIntmax.t;
4 changes: 3 additions & 1 deletion basis-library/schedulers/shh/Scheduler.sml
@@ -43,6 +43,8 @@ struct
| SOME m => depth < m
end

val maxCCDepth = MPL.GC.getControlMaxCCDepth ()

val P = MLton.Parallel.numberOfProcessors
val internalGCThresh = Real.toInt IEEEReal.TO_POSINF
((Math.log10(Real.fromInt P)) / (Math.log10 (2.0)))
@@ -426,7 +428,7 @@ struct
val depth = HH.getDepth thread
in
(* if ccOkayAtThisDepth andalso depth = 1 then *)
if ccOkayAtThisDepth andalso depth >= 1 andalso depth <= 3 then
if ccOkayAtThisDepth andalso depth >= 1 andalso depth <= maxCCDepth then
forkGC thread depth (f, g)
else if depth < Queue.capacity andalso depthOkayForDECheck depth then
parfork thread depth (f, g)
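The scheduler hunk replaces the hard-coded bound `depth <= 3` with `maxCCDepth`, read once through `MPL.GC.getControlMaxCCDepth`. Since `GC_init` sets `maxCCDepth` to 3 by default, default behaviour should be unchanged. The minimal C sketch below restates that guard. `should_fork_with_cc` is a hypothetical name, and `ccOkayAtThisDepth` is passed in as a plain flag because its definition is not shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C rendering of the SML guard
 *   ccOkayAtThisDepth andalso depth >= 1 andalso depth <= maxCCDepth
 * Returns true when the scheduler would take the forkGC path. */
bool should_fork_with_cc(bool cc_okay_at_this_depth,
                         uint32_t depth,
                         uint32_t max_cc_depth)
{
  return cc_okay_at_this_depth && depth >= 1 && depth <= max_cc_depth;
}
```

`init.c` accepts `max-cc-depth 0`, and no depth satisfies both `depth >= 1` and `depth <= 0`. Under this reading, a value of 0 would therefore turn off the `forkGC` path entirely.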
1 change: 1 addition & 0 deletions basis-library/schedulers/shh/sources.mlb
@@ -1,6 +1,7 @@
local
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/basis/mlton.mlb
$(SML_LIB)/basis/mpl.mlb
$(SML_LIB)/basis/unsafe.mlb

local
6 changes: 3 additions & 3 deletions runtime/gc/concurrent-collection.c
@@ -1115,9 +1115,9 @@ size_t CC_collectWithRoots(
}
#endif

uint64_t bytesSaved = HM_getChunkListSize(repList);
uint64_t bytesScanned = HM_getChunkListSize(repList)
+ HM_getChunkListSize(origList);
uint64_t bytesSaved = HM_getChunkListUsedSize(repList);
uint64_t bytesScanned = HM_getChunkListUsedSize(repList)
+ HM_getChunkListUsedSize(origList);

cp->bytesSurvivedLastCollection = bytesSaved;
cp->bytesAllocatedSinceLastCollection = 0;
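This hunk and the two in `hierarchical-heap-collection.c` switch from `HM_getChunkListSize` to `HM_getChunkListUsedSize`. The names suggest that the former counts every byte reserved by a chunk list's chunks, while the latter counts only the bytes actually occupied. If that reading is right, `bytesSurvivedLastCollection` no longer includes the unused tail of partially filled chunks. The sketch below models that distinction. `struct chunk` and both functions are hypothetical and are not MPL's actual chunk representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk list: each chunk reserves `capacity` bytes, of
 * which only `used` bytes hold objects. Not MPL's real HM_chunk layout. */
struct chunk {
  size_t capacity;
  size_t used;
  struct chunk *next;
};

/* Analogue of a "size" query: total bytes reserved by all chunks. */
size_t chunk_list_size(const struct chunk *c) {
  size_t total = 0;
  for (; c != NULL; c = c->next)
    total += c->capacity;
  return total;
}

/* Analogue of a "used size" query: bytes actually occupied by objects. */
size_t chunk_list_used_size(const struct chunk *c) {
  size_t total = 0;
  for (; c != NULL; c = c->next) {
    assert(c->used <= c->capacity); /* a chunk cannot be overfull */
    total += c->used;
  }
  return total;
}
```

Consider a list with one full 4096-byte chunk and one 4096-byte chunk holding only 100 bytes. The first query reports 8192 bytes, the second 4196.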
4 changes: 4 additions & 0 deletions runtime/gc/controls.h
@@ -34,6 +34,10 @@ struct HM_HierarchicalHeapConfig {
/* smallest amount for a CC */
size_t minCCSize;

size_t maxCCChainLength;
double ccThresholdRatio;
uint32_t maxCCDepth;

/* the shallowest depth that will be claimed for a local
* collection. */
uint32_t minLocalDepth;
4 changes: 4 additions & 0 deletions runtime/gc/gc_state.c
@@ -77,6 +77,10 @@ void GC_setControlsRusageMeasureGC (GC_state s, Bool_t b) {
s->controls->rusageMeasureGC = (bool)b;
}

uint32_t GC_getControlMaxCCDepth(GC_state s) {
return (uint32_t)s->controls->hhConfig.maxCCDepth;
}

// SAM_NOTE: TODO: remove this and replace with blocks statistics
size_t GC_getMaxChunkPoolOccupancy (void) {
return 0;
2 changes: 2 additions & 0 deletions runtime/gc/gc_state.h
@@ -145,6 +145,8 @@ PRIVATE uintmax_t GC_numSuspectsMarked(GC_state s);
PRIVATE uintmax_t GC_numSuspectsCleared(GC_state s);
PRIVATE uintmax_t GC_bytesPinnedEntangled(GC_state s);

PRIVATE uint32_t GC_getControlMaxCCDepth(GC_state s);

PRIVATE pointer GC_getCallFromCHandlerThread (GC_state s);
PRIVATE void GC_setCallFromCHandlerThreads (GC_state s, pointer p);
PRIVATE pointer GC_getCurrentThread (GC_state s);
4 changes: 2 additions & 2 deletions runtime/gc/hierarchical-heap-collection.c
@@ -386,7 +386,7 @@ void HM_HHC_collectLocal(uint32_t desiredScope)
// traverseEachObjInChunkList(s, HM_HH_getChunkList(cursor));
#endif
uint32_t d = HM_HH_getDepth(cursor);
size_t sz = HM_getChunkListSize(HM_HH_getChunkList(cursor));
size_t sz = HM_getChunkListUsedSize(HM_HH_getChunkList(cursor));
sizesBefore[d] = sz;
totalSizeBefore += sz;
}
@@ -920,7 +920,7 @@ void HM_HHC_collectLocal(uint32_t desiredScope)
uint32_t i = HM_HH_getDepth(cursor);

HM_chunkList lev = HM_HH_getChunkList(cursor);
size_t sizeAfter = HM_getChunkListSize(lev);
size_t sizeAfter = HM_getChunkListUsedSize(lev);
totalSizeAfter += sizeAfter;

#if ASSERT
4 changes: 2 additions & 2 deletions runtime/gc/hierarchical-heap.c
@@ -820,7 +820,7 @@ bool checkPolicyforRoot(
cursor = cursor->subHeapForCC)
{
chainLen++;
if (chainLen > 2)
if (chainLen > s->controls->hhConfig.maxCCChainLength)
return FALSE;
}

@@ -833,7 +833,7 @@ bool checkPolicyforRoot(
HM_HH_getConcurrentPack(cursor)->bytesSurvivedLastCollection;
}

if((2*bytesSurvived) >
if((s->controls->hhConfig.ccThresholdRatio * bytesSurvived) >
(HM_HH_getConcurrentPack(hh)->bytesAllocatedSinceLastCollection)
|| bytesSurvived == 0) {
// if (!HM_HH_getConcurrentPack(hh)->shouldCollect) {
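In `checkPolicyforRoot`, two previously hard-coded constants now come from runtime controls:

- the chain-length cap of 2 now comes from `maxCCChainLength`;
- the factor 2 in the survival test now comes from `ccThresholdRatio`.

`GC_init` defaults them to 2 and 2.0, matching the old values. The body of the threshold branch is cut off in this view, so the sketch below only evaluates the two conditions. It makes no claim about what the runtime does when they hold, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Chain-length cutoff: checkPolicyforRoot returns FALSE as soon as
 * the counted chain length exceeds maxCCChainLength. */
bool cc_chain_too_long(size_t chain_len, size_t max_cc_chain_length) {
  return chain_len > max_cc_chain_length;
}

/* Threshold condition from checkPolicyforRoot, with the old literal 2
 * replaced by ccThresholdRatio:
 *   ratio * bytesSurvived > bytesAllocatedSinceLastCollection
 *   || bytesSurvived == 0 */
bool cc_threshold_condition(double cc_threshold_ratio,
                            uint64_t bytes_survived,
                            uint64_t bytes_allocated_since_last)
{
  return (cc_threshold_ratio * (double)bytes_survived
            > (double)bytes_allocated_since_last)
         || bytes_survived == 0;
}
```

Suppose 1 MiB survived the last collection and the default ratio of 2.0 applies. The threshold condition holds while less than 2 MiB has been allocated since then (for example 1.5 MiB) and fails at 3 MiB. With `cc-threshold-ratio 4.0`, it still holds at 3 MiB.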
35 changes: 35 additions & 0 deletions runtime/gc/init.c
@@ -355,6 +355,16 @@ int processAtMLton (GC_state s, int start, int argc, char **argv,
if (s->controls->hhConfig.collectionThresholdRatio < 1.0) {
die("%s collection-threshold-ratio must be at least 1.0", atName);
}
} else if (0 == strcmp (arg, "cc-threshold-ratio")) {
i++;
if (i == argc || (0 == strcmp (argv[i], "--"))) {
die ("%s cc-threshold-ratio missing argument.", atName);
}

s->controls->hhConfig.ccThresholdRatio = stringToFloat(argv[i++]);
if (s->controls->hhConfig.ccThresholdRatio <= 1.0) {
die("%s cc-threshold-ratio must be > 1.0", atName);
}
} else if (0 == strcmp(arg, "min-collection-size")) {
i++;
if (i == argc || (0 == strcmp (argv[i], "--"))) {
@@ -369,6 +379,17 @@ int processAtMLton (GC_state s, int start, int argc, char **argv,
}

s->controls->hhConfig.minCCSize = stringToBytes(argv[i++]);
} else if (0 == strcmp(arg, "max-cc-chain-length")) {
i++;
if (i == argc || (0 == strcmp (argv[i], "--"))) {
die ("%s max-cc-chain-length missing argument.", atName);
}

int len = stringToInt(argv[i++]);
if (len <= 0) {
die ("%s max-cc-chain-length must be >= 1", atName);
}
s->controls->hhConfig.maxCCChainLength = len;
} else if (0 == strcmp(arg, "min-collection-depth")) {
i++;
if (i == argc || (0 == strcmp (argv[i], "--"))) {
@@ -380,6 +401,17 @@ int processAtMLton (GC_state s, int start, int argc, char **argv,
die ("%s min-collection-depth must be > 0", atName);
}
s->controls->hhConfig.minLocalDepth = minDepth;
} else if (0 == strcmp(arg, "max-cc-depth")) {
i++;
if (i == argc || (0 == strcmp (argv[i], "--"))) {
die ("%s max-cc-depth missing argument.", atName);
}

int maxd = stringToInt(argv[i++]);
if (maxd < 0) {
die ("%s max-cc-depth must be >= 0", atName);
}
s->controls->hhConfig.maxCCDepth = maxd;
} else if (0 == strcmp(arg, "trace-buffer-size")) {
i++;
if (i == argc || (0 == strcmp (argv[i], "--"))) {
@@ -434,6 +466,9 @@ int GC_init (GC_state s, int argc, char **argv) {
s->controls->hhConfig.collectionThresholdRatio = 8.0;
s->controls->hhConfig.minCollectionSize = 1024L * 1024L;
s->controls->hhConfig.minCCSize = 1024L * 1024L;
s->controls->hhConfig.maxCCChainLength = 2;
s->controls->hhConfig.ccThresholdRatio = 2.0f;
s->controls->hhConfig.maxCCDepth = 3;
s->controls->hhConfig.minLocalDepth = 2;
s->controls->rusageMeasureGC = FALSE;
s->controls->summary = FALSE;
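The three new runtime options follow the same pattern as the existing ones in `processAtMLton`:

1. Read the next argument.
2. Reject a missing value or a bare `--`.
3. Range-check the value.

Below is a self-contained sketch of that logic using the defaults set in `GC_init`. It makes a few substitutions, all assumptions for illustration:

- the struct and function names are hypothetical;
- `strtod`/`strtol` stand in for the runtime's `stringToFloat`/`stringToInt`;
- errors return -1 instead of calling `die`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the hhConfig fields added in this commit,
 * initialised with the same defaults GC_init uses. */
struct cc_controls {
  size_t max_cc_chain_length;
  double cc_threshold_ratio;
  uint32_t max_cc_depth;
};

void cc_controls_init(struct cc_controls *c) {
  c->max_cc_chain_length = 2;
  c->cc_threshold_ratio = 2.0;
  c->max_cc_depth = 3;
}

/* Scans tokens up to "--". Returns 0 on success, -1 on a missing or
 * out-of-range value (the runtime calls die() instead). Unrecognised
 * tokens are skipped one at a time to keep the sketch short. */
int parse_cc_options(struct cc_controls *c, int argc, char **argv) {
  int i = 0;
  while (i < argc && strcmp(argv[i], "--") != 0) {
    const char *arg = argv[i++];
    int is_ratio = strcmp(arg, "cc-threshold-ratio") == 0;
    int is_chain = strcmp(arg, "max-cc-chain-length") == 0;
    int is_depth = strcmp(arg, "max-cc-depth") == 0;
    if (!is_ratio && !is_chain && !is_depth)
      continue;
    if (i == argc || strcmp(argv[i], "--") == 0)
      return -1;                        /* "missing argument" */
    const char *val = argv[i++];
    if (is_ratio) {
      double r = strtod(val, NULL);
      if (r <= 1.0) return -1;          /* must be > 1.0 */
      c->cc_threshold_ratio = r;
    } else {
      long n = strtol(val, NULL, 10);
      if (is_chain) {
        if (n <= 0) return -1;          /* must be >= 1 */
        c->max_cc_chain_length = (size_t)n;
      } else {
        if (n < 0) return -1;           /* must be >= 0 */
        c->max_cc_depth = (uint32_t)n;
      }
    }
  }
  return 0;
}
```

Presumably these options are passed the same way as `procs` in the README's `./main @mpl procs 4 --`. For example, `./main @mpl procs 4 cc-threshold-ratio 4.0 max-cc-depth 2 --` would raise the ratio and lower the depth cap. The sketch accepts the same token sequence.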
