Test installation in CI #59

Merged: 9 commits, Oct 15, 2024
24 changes: 22 additions & 2 deletions .github/workflows/ci.yml
@@ -9,7 +9,17 @@ on:
jobs:
build:

runs-on: ubuntu-latest
strategy:
matrix:
os:
- ubuntu-20.04
#- ubuntu-22.04
# We can test on a Matrix of Ubuntu versions
# But Nix does get us pretty good reproducibility
# So we will test just one in CI.
# More full tests can be run by docker_os_matrix.py

runs-on: ${{ matrix.os }}

steps:
- name: Checkout repository
@@ -21,5 +31,15 @@ jobs:
extra_nix_config: |
experimental-features = nix-command flakes

- name: Run Just in development shell
- name: Test existence of secrets
run: |
# printf avoids the trailing newline that would inflate the count by one
printf 'CACHIX_AUTH_TOKEN length: '
printf '%s' "${{ secrets.CACHIX_AUTH_TOKEN }}" | wc -c

- uses: cachix/cachix-action@v15
with:
name: charmonium
authToken: ${{ secrets.CACHIX_AUTH_TOKEN }}

- name: Run Just on-push stuff in development shell
run: nix develop --command just on-push
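
The "Test existence of secrets" step only needs to confirm that the token is non-empty. A minimal shell sketch of that check follows, assuming a hypothetical helper named `secret_length` (not part of this PR) that counts characters without printing the value:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): print the length of a secret
# without echoing the secret itself.
secret_length() {
  local value="${1-}"
  # ${#value} is the character count; no trailing newline is counted,
  # unlike piping `echo "$value"` into `wc -c`.
  printf '%s\n' "${#value}"
}

# Falls back to an empty string when the variable is unset;
# in CI this would be the real secret.
CACHIX_AUTH_TOKEN="${CACHIX_AUTH_TOKEN-}"
echo "CACHIX_AUTH_TOKEN length: $(secret_length "$CACHIX_AUTH_TOKEN")"
```

A length of 0 would indicate that the secret is not available to the workflow run.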
29 changes: 22 additions & 7 deletions Justfile
@@ -1,8 +1,8 @@
fix-format-nix:
#alejandra .
alejandra .

check-format-nix:
#alejandra --check . # TODO: uncomment
alejandra --check .

fix-ruff:
#ruff format probe_src # TODO: uncomment
@@ -22,6 +22,7 @@ check-clippy:
env --chdir probe_src/frontend cargo clippy

fix-clippy:
git add -A
env --chdir probe_src/frontend cargo clippy --fix --allow-staged

check-mypy:
@@ -40,15 +41,29 @@ compile-tests:

compile: compile-lib compile-cli compile-tests

test-ci: compile-lib
test-ci: compile
pytest probe_src

test-dev: compile-lib
test-dev: compile
pytest probe_src --failed-first --maxfail=1

check-flake:
nix flake check --all-systems

pre-commit: fix-format-nix fix-ruff fix-format-rust fix-clippy compile check-mypy test-dev

on-push: check-format-nix check-ruff check-format-rust check-clippy compile check-mypy check-flake test-ci
user-facing-build: check-flake
# `just compile` is great, but it's the _dev-facing_ build.
# Users will build PROBE following the `README.md`
# which says `nix profile install github:charmoniumQ/PROBE#probe-bundled`
# Which should be equivalent to this:
nix build .#probe-bundled .#probe-py

upload-cachix: user-facing-build
#!/usr/bin/env bash
if [ -z "$CACHIX_AUTH_TOKEN" ]; then
echo "CACHIX_AUTH_TOKEN not set"
exit 1
fi
nix-store -qR --include-outputs $(nix-store -qd $(nix build --print-out-paths --no-link .#probe-bundled .#probe-py)) | grep -v '\.drv$' | cachix push charmonium

pre-commit: fix-format-nix fix-ruff fix-format-rust fix-clippy compile check-mypy test-dev
on-push: check-format-nix check-ruff check-format-rust check-clippy compile check-mypy test-ci check-flake user-facing-build
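
The `upload-cachix` recipe combines two pure-shell pieces: a guard that fails when the token is missing, and a filter that drops `.drv` files so only realised store paths are pushed. A rough sketch of both in isolation, assuming hypothetical function names (`require_env`, `drop_derivations`) and made-up store paths, none of which appear in the Justfile:

```bash
#!/usr/bin/env bash
# Sketch of the two pure-shell pieces of `upload-cachix`.
# Both function names are hypothetical, not from the Justfile.

# Succeed only if the named environment variable is set and non-empty.
require_env() {
  local name="$1"
  # ${!name} is bash indirect expansion: the value of the variable
  # whose name is stored in $name.
  if [ -z "${!name}" ]; then
    echo "$name not set" >&2
    return 1
  fi
}

# Read store paths on stdin and drop derivation files (*.drv),
# as `grep -v '\.drv$'` does in the recipe. `|| true` keeps the
# function from failing when every input line is filtered out.
drop_derivations() {
  grep -v '\.drv$' || true
}

# Made-up store paths, for illustration only.
printf '%s\n' \
  '/nix/store/aaaa-probe.drv' \
  '/nix/store/bbbb-probe' \
  | drop_derivations
```

In the actual recipe the filtered list is piped into `cachix push charmonium`; the sketch stops before that step so it can run without Nix or Cachix installed.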
43 changes: 29 additions & 14 deletions README.md
@@ -8,16 +8,6 @@ The provenance graph tells us where a particular file came from.

The provenance graph can help us re-execute the program, containerize the program, turn it into a workflow, or tell us which version of the data this program used.

## Reading list

- [_Provenance for Computational Tasks: A Survey_ by Freire, et al. in CiSE 2008](https://sci.utah.edu/~csilva/papers/cise2008a.pdf) for an overview of provenance in general.
- [_Transparent Result Caching_ by Vahdat and Anderson in USENIX ATC 1998](https://www.usenix.org/legacy/publications/library/proceedings/usenix98/full_papers/vahdat/vahdat.pdf) for an early system-level provenance tracer in Solaris using the `/proc` fs. Linux's `/proc` fs doesn't have the same functionality. However, this paper discusses two interesting applications of provenance: unmake (query lineage information) and transparent Make (more generally, incremental computation).
- [_CDE: Using System Call Interposition to Automatically Create Portable Software Packages_ by Guo and Engler in USENIX ATC 2011](https://www.usenix.org/legacy/events/atc11/tech/final_files/GuoEngler.pdf) for an early system-level provenance tracer. Their only application is software execution replay, but replay is quite an important application.
- [_Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?_ by Thain, Meng, and Ivie in 2015](https://curate.nd.edu/articles/journal_contribution/Techniques_for_Preserving_Scientific_Software_Executions_Preserve_the_Mess_or_Encourage_Cleanliness_/24824439?file=43664937) discusses whether enabling automatic-replay is actually a good idea. A cursory glance makes PROBE seem more like "preserving the mess", but I think, with some care in the design choices, it actually can be more like "encouraging cleanliness", for example, by having heuristics that help cull/simplify provenance and generating human readable/editable package-manager recipes.
- [_SoK: History is a Vast Early Warning System: Auditing the Provenance of System Intrusions_ by Inam et al. in IEEE Symposium on Security and Privacy 2023](https://adambates.org/documents/Inam_Oakland23.pdf) see specifically Inam's survey of different possibilities for the "Capture layer", "Reduction layer", and "Infrastructure layer". Although provenance-for-security has different constraints than provenance for other purposes, the taxonomy that Inam lays out is still useful. PROBE operates by intercepting libc calls, which is essentially a "middleware" in Table I (platform modification, no program modification, no config change, incomplete mediation, not tamperproof, inter-process tracing, etc.).
- [_System-Level Provenance Tracers_ by me et al. in ACM REP 2023](./docs/acm-rep-pres.pdf) for a motivation of this work. It surveys prior work, identifies potential gaps, and explains why I think library interposition is a promising path for future research.
- [_Computational Experiment Comprehension using Provenance Summarization_ by Bufford et al. in ACM REP 2023](https://dl.acm.org/doi/pdf/10.1145/3641525.3663617) discusses how to implement an interface for querying provenance information. They compare classical graph-based visualization with an interactive LLM in a user-study.

## Installing PROBE

1. Install Nix with flakes. This can be done on any Linux (including Ubuntu, RedHat, Arch Linux, not just NixOS), MacOS X, or even Windows Subsystem for Linux.
@@ -32,13 +22,22 @@ The provenance graph can help us re-execute the program, containerize the progra

- If you already have Nix and are running NixOS, enable flakes by adding `nix.settings.experimental-features = [ "nix-command" "flakes" ];` to your configuration.

2. Run `nix env -i github:charmoniumQ/PROBE#probe-bundled`.
2. If you want to avoid a time-consuming build, add our public cache.

```bash
nix profile install --accept-flake-config nixpkgs#cachix
cachix use charmonium
```

If you want to build from source (e.g., for security reasons), skip this step.

3. Now you should be able to run `probe record [-f] [-o probe_log] <cmd...>`, e.g., `probe record ./script.py --foo bar.txt`. See below for more details.
3. Run `nix profile install github:charmoniumQ/PROBE#probe-bundled`.

4. To view the provenance, run `probe dump [-i probe_log]`. See below for more details.
4. Now you should be able to run `probe record [-f] [-o probe_log] <cmd...>`, e.g., `probe record ./script.py --foo bar.txt`. See below for more details.

5. Run `probe --help` for more details.
5. To view the provenance, run `probe dump [-i probe_log]`. See below for more details.

6. Run `probe --help` for more details.
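
The installation steps can be sketched as a small dry-run script. This is only an illustration under stated assumptions: `run` and `install_probe` are hypothetical names, and the install command is the one quoted in the Justfile comment for `user-facing-build`. By default the script prints the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the install steps; `run` and `install_probe` are
# hypothetical names. Set DRY_RUN=0 to actually execute the commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_probe() {
  local use_cache="${1:-yes}"
  if [ "$use_cache" = yes ]; then
    # Optional: use the public binary cache to avoid a long build.
    run nix profile install --accept-flake-config 'nixpkgs#cachix'
    run cachix use charmonium
  fi
  run nix profile install 'github:charmoniumQ/PROBE#probe-bundled'
}

install_probe yes
```

Calling `install_probe no` skips the cache setup, which corresponds to building from source.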

## What does `probe record` do?

@@ -109,6 +108,22 @@ nix shell nixpkgs#graphviz github:charmoniumQ/PROBE#probe-py-manual \

7. **Before submitting a PR**, run `just pre-commit` which will run pre-commit checks.

## Research reading list

- [_Provenance for Computational Tasks: A Survey_ by Freire, et al. in CiSE 2008](https://sci.utah.edu/~csilva/papers/cise2008a.pdf) for an overview of provenance in general.

- [_Transparent Result Caching_ by Vahdat and Anderson in USENIX ATC 1998](https://www.usenix.org/legacy/publications/library/proceedings/usenix98/full_papers/vahdat/vahdat.pdf) for an early system-level provenance tracer in Solaris using the `/proc` fs. Linux's `/proc` fs doesn't have the same functionality. However, this paper discusses two interesting applications of provenance: unmake (query lineage information) and transparent Make (more generally, incremental computation).

- [_CDE: Using System Call Interposition to Automatically Create Portable Software Packages_ by Guo and Engler in USENIX ATC 2011](https://www.usenix.org/legacy/events/atc11/tech/final_files/GuoEngler.pdf) for an early system-level provenance tracer. Their only application is software execution replay, but replay is quite an important application.

- [_Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?_ by Thain, Meng, and Ivie in 2015](https://curate.nd.edu/articles/journal_contribution/Techniques_for_Preserving_Scientific_Software_Executions_Preserve_the_Mess_or_Encourage_Cleanliness_/24824439?file=43664937) discusses whether enabling automatic-replay is actually a good idea. A cursory glance makes PROBE seem more like "preserving the mess", but I think, with some care in the design choices, it actually can be more like "encouraging cleanliness", for example, by having heuristics that help cull/simplify provenance and generating human readable/editable package-manager recipes.

- [_SoK: History is a Vast Early Warning System: Auditing the Provenance of System Intrusions_ by Inam et al. in IEEE Symposium on Security and Privacy 2023](https://adambates.org/documents/Inam_Oakland23.pdf) see specifically Inam's survey of different possibilities for the "Capture layer", "Reduction layer", and "Infrastructure layer". Although provenance-for-security has different constraints than provenance for other purposes, the taxonomy that Inam lays out is still useful. PROBE operates by intercepting libc calls, which is essentially a "middleware" in Table I (platform modification, no program modification, no config change, incomplete mediation, not tamperproof, inter-process tracing, etc.).

- [_System-Level Provenance Tracers_ by me et al. in ACM REP 2023](./docs/acm-rep-pres.pdf) for a motivation of this work. It surveys prior work, identifies potential gaps, and explains why I think library interposition is a promising path for future research.

- [_Computational Experiment Comprehension using Provenance Summarization_ by Bufford et al. in ACM REP 2023](https://dl.acm.org/doi/pdf/10.1145/3641525.3663617) discusses how to implement an interface for querying provenance information. They compare classical graph-based visualization with an interactive LLM in a user-study.

## Prior art

- [RR-debugger](https://github.com/rr-debugger/rr), which is much slower but captures more completely; it lets you replay an execution but doesn't support any other analysis.