Published persistent provenance #44
Conversation
@Ex-32 We will have to implement this in Rust at transcribe-time eventually. This is a mock-up of what the Python-side data structures will be. Please especially review the rationale (markdown).
```python
@dataclasses.dataclass(frozen=True)
class InodeMetadataVersion:
    inode_version: InodeVersion
    stat_results: bytes
```
Probably not a big deal, but we should have a dataclass to represent a stat in `probe_py.generated`.
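For concreteness, a minimal sketch of what such a dataclass might look like; the field set mirrors `os.stat_result` and is an assumption, not the final design:

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class StatResult:
    # Hypothetical subset of os.stat_result; the real probe_py.generated
    # dataclass would be generated to match what the Rust side records.
    st_mode: int      # file type and permission bits
    st_ino: int       # inode number
    st_dev: int       # device ID
    st_uid: int       # owner user ID
    st_gid: int       # owner group ID
    st_size: int      # size in bytes
    st_mtime_ns: int  # modification time in nanoseconds
```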
- `inode_version_writes` maps an inode-and-version (filename) to the process ID (in the file contents) of the process that created it.
- `processes` maps a process ID (filename) to a Process object (in the file contents).
Note: multi-process sqlite is either not performant (globally locks the entire database) or not easy (lock each row somehow), but the filesystem will work fine for our case, since we just need a key-value store.
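A minimal sketch of that file-based key-value layout; the store location and the use of pickle for serialization are illustrative assumptions, not the actual PROBE scheme:

```python
import pathlib
import pickle

STORE = pathlib.Path("/tmp/probe-store")  # hypothetical location


def record(inode_version: str, pid: int, process: object) -> None:
    # Key is the filename; value is the file contents.
    (STORE / "inode_version_writes").mkdir(parents=True, exist_ok=True)
    (STORE / "processes").mkdir(parents=True, exist_ok=True)
    (STORE / "inode_version_writes" / inode_version).write_text(str(pid))
    (STORE / "processes" / str(pid)).write_bytes(pickle.dumps(process))


def writer_of(inode_version: str) -> object:
    # Follow the filename -> file-contents indirection described above.
    pid = (STORE / "inode_version_writes" / inode_version).read_text()
    return pickle.loads((STORE / "processes" / pid).read_bytes())
```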
sqlite has built-in serializable ACID transactions; as long as there are no network filesystems involved, we can just open the database from multiple processes and have it just work™. sqlite's query optimizer is good enough that we could probably just `SELECT * FROM Processes WHERE inode_version=...`
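A minimal sketch of that multi-process usage, assuming a `Processes` table with an `inode_version` column (the schema is hypothetical); each process opens its own connection and relies on sqlite's locking:

```python
import sqlite3


def processes_writing(db_path: str, inode_version: str) -> list[tuple]:
    # timeout= makes the connection wait out other processes' write locks
    # instead of raising "database is locked" immediately.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # one serializable ACID transaction
            return conn.execute(
                "SELECT * FROM Processes WHERE inode_version = ?",
                (inode_version,),
            ).fetchall()
    finally:
        conn.close()
```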
That being said, we probably don't want to just dump `probe_log` blobs into sqlite, since it has a hardcoded limit of 1 GB for strings and blobs (ask me how I know 🙃).
Whether to use sqlite or raw files is an important design choice. We wouldn't be storing the entire `probe_log`; we would be disassembling `probe_log` into a set of rows: one for each process, one for each inode at each version (versioned at open/close-time).
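A minimal sketch of what those rows might look like; table and column names are assumptions for illustration:

```python
import sqlite3

# Hypothetical schema: one row per process, one row per inode at each version,
# plus the read edges needed to traverse the graph.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processes (
    pid     INTEGER PRIMARY KEY,
    process BLOB NOT NULL  -- one serialized Process, well under the 1 GB limit
);
CREATE TABLE IF NOT EXISTS inode_version_writes (
    inode_version TEXT PRIMARY KEY,  -- versioned at open/close-time
    writer_pid    INTEGER NOT NULL REFERENCES processes(pid)
);
CREATE TABLE IF NOT EXISTS reads (
    pid           INTEGER NOT NULL REFERENCES processes(pid),
    inode_version TEXT NOT NULL
);
"""


def open_store(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn
```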
As far as I understand, serializing the transactions is a bottleneck. In principle, the transactions should be independent, so we don't need to serialize them at all. They update (theoretically) different files in the current design, because each inode/version is written by exactly one process. However, there may be future tables that do require locking (e.g., maintaining a table of which processes use each inode/version would be racy).
There is also the issue of transferring: when we run SCP or Rsync, we will need to transfer specific key-value pairs over to the remote. With files, this is easy, although it results in a lot of files, which I think is ok. With sqlite, it would mean selecting a number of rows from a number of tables. I'm not sure which is better.
Another point is that traversing the graph requires "pointer-chasing" through multiple rows (each row tells you where to find the next one). This kind of access pattern results in O(N) file reads, which may be a lot, or O(N) select-queries, which is still a lot of queries, but a new query is much cheaper than opening and closing a new file descriptor.
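To make that access pattern concrete, a sketch of the per-edge traversal, using the hypothetical schema above; each loop iteration is one query (or, in the file-based design, one open/read/close):

```python
import sqlite3


def ancestors(conn: sqlite3.Connection, inode_version: str) -> set[str]:
    seen: set[str] = set()
    frontier = [inode_version]
    while frontier:  # O(N) queries: one per node visited
        iv = frontier.pop()
        if iv in seen:
            continue
        seen.add(iv)
        # Each row tells us where to find the next ones: the writing
        # process, then everything that process read.
        frontier.extend(
            parent
            for (parent,) in conn.execute(
                "SELECT r.inode_version FROM inode_version_writes w "
                "JOIN reads r ON r.pid = w.writer_pid "
                "WHERE w.inode_version = ?",
                (iv,),
            )
        )
    return seen
```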
(No response required, since Ex-32 is already busy; just thinking out loud)
I need to expand on the transactions we will be doing, and at what frequency. Basically, the persistent store should be a bipartite graph between processes and inode-versions.
- At transcription-time (which probably happens no more than once per minute): insert nodes and edges.
- When the user asks for it (less than once per minute): traverse the graph "up" from a specific inode/version: "what inputs were used to make this output?" (aka pull-based updating). This query is used in applications like "Make-without-Makefile" (see the sketch after this list).
- When the user asks for it (less than once per minute): traverse the graph "down" from a specific inode/version: "what outputs were dependent on this input?" (aka push-based updating). This query is used much less than the pull-based version, but could still be useful if the user does a system upgrade and doesn't know which of their projects need to be recompiled/recomputed.
- When the user does an SCP or Rsync (less than once per minute): traverse the graph "up", so we can transfer the "relevant" bits of provenance to the remote, so a user at the destination machine (which could be local or remote) can query the provenance of the files we are sending.
- When the user issues "garbage collect" (less than once per minute): iterate over every inode/version; mark the ones that still exist as "not deletable". Mark any ancestor of "not deletable" as "not deletable" (search "up" in the provenance graph from each still-existing inode/version).
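For the pull-based "up" query, sqlite can also do the whole traversal in a single recursive query, avoiding a Python round-trip per edge; this sketch reuses the same hypothetical schema as above:

```python
import sqlite3

UP_QUERY = """
WITH RECURSIVE up(inode_version) AS (
    VALUES (?)
    UNION  -- UNION (not UNION ALL) deduplicates, so shared ancestors terminate
    SELECT r.inode_version
    FROM up
    JOIN inode_version_writes w ON w.inode_version = up.inode_version
    JOIN reads r ON r.pid = w.writer_pid
)
SELECT inode_version FROM up;
"""


def inputs_of(conn: sqlite3.Connection, target: str) -> list[str]:
    # "What inputs were used to make this output?"
    return [iv for (iv,) in conn.execute(UP_QUERY, (target,))]
```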
All in all, the operations are slow enough that the parallelism between readers and writers, which is potentially available in the filesystem but not in sqlite (which uses a readers-writer lock), is not that important. On the other hand, traversing up or down the graph is quite important. For a file-based solution, each edge-traversal requires the open/close of a file; sqlite loads the graph into memory (if it fits), so each edge-traversal is a pointer dereference. If the graph does not fit in memory, sqlite will load blocks of the table (presumably LRU-cached), which is still much more efficient than open/close.

While a graph database (e.g., Neo4J) would be even better, Neo4J is not "embeddable" in a Python application (only in Java). The most popular deployments of Neo4J are as a daemon process, which would be annoying. With sqlite, PROBE can be "daemonless".
Therefore, I think persistent provenance should be held in sqlite in the future. @Shofiya2003, the logic regarding the SCP/Rsync wrapper won't change: still call `get_prov_upstream`; the output will be a set of objects, which you can pass to a new function, `transfer(objects, host)`, which will figure out how to transfer the objects to the remote host, whether they are filenames (the current scheme) or objects/sqlite-rows (the future scheme).
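A sketch of what that `transfer(objects, host)` seam could look like; the dispatch on object kind and the rsync invocation are assumptions about the design, not existing PROBE code:

```python
import pathlib
import subprocess


def transfer(objects: list[object], host: str) -> None:
    # Current scheme: objects are filenames; copy them as files.
    paths = [str(o) for o in objects if isinstance(o, (str, pathlib.Path))]
    if paths:
        subprocess.run(["rsync", "-a", *paths, f"{host}:.probe/"], check=True)
    # Future scheme: objects are sqlite rows; serialize them and INSERT
    # into the remote database (left unimplemented in this sketch).
    rows = [o for o in objects if not isinstance(o, (str, pathlib.Path))]
    if rows:
        raise NotImplementedError("sqlite-row transfer not designed yet")
```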
> At transcription-time (which probably happens no more than once per minute): insert nodes and edges.
While I agree with your overall rationale, some tasks, like compiling C/C++ code, where each file spawns several processes (compiler, assembler, etc.), could potentially produce higher transcription loads (in the range of once per second).
Transcription should take place when the PROBEd process-tree completes, not when a single PROBEd process completes. I'm assuming users will do `probe make` instead of `make CC="probe gcc"` (but maybe they won't). Ideally, we need to explain that one should try to PROBE the greatest unborn ancestor process (when it gets born) 😆
Closed in favor of #75.