
Published persistent provenance #44

Closed · wants to merge 2 commits into from
Conversation

charmoniumQ (Owner)

No description provided.

@charmoniumQ charmoniumQ requested a review from Ex-32 August 7, 2024 20:34
charmoniumQ (Owner, Author)

@Ex-32 We will have to implement this in Rust at transcribe-time eventually. This is a mock-up of what the Python-side data structures will be. Please especially review the rationale (markdown).

```python
@dataclasses.dataclass(frozen=True)
class InodeMetadataVersion:
    inode_version: InodeVersion
    stat_results: bytes
```
Ex-32 (Collaborator) commented Aug 9, 2024

Probably not a big deal, but we should have a dataclass to represent a stat in probe_py.generated.
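For concreteness, a minimal sketch of what such a dataclass might look like (the name `StatResult` and the choice of fields, mirroring `os.stat_result`, are assumptions, not the actual probe_py.generated definition):

```python
import dataclasses

# Hypothetical sketch: a typed stand-in for the raw `stat_results: bytes`,
# mirroring the fields of os.stat_result.
@dataclasses.dataclass(frozen=True)
class StatResult:
    st_mode: int
    st_ino: int
    st_dev: int
    st_nlink: int
    st_uid: int
    st_gid: int
    st_size: int
    st_atime_ns: int
    st_mtime_ns: int
    st_ctime_ns: int
```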

- `inode_version_writes` maps an inode-and-version (the filename) to the process ID (in the file contents) of the process that created it.
- `processes` maps a process ID (the filename) to a Process object (in the file contents).

Note: multi-process sqlite is either not performant (globally locking the entire database) or not easy (locking each row somehow), but the filesystem will work fine for our case, since we just need a key-value store.
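For illustration, a minimal sketch of that file-based key-value layout (the store location and helper names are hypothetical):

```python
import pathlib

STORE = pathlib.Path("~/.probe/provenance").expanduser()  # hypothetical location

def record_inode_version_write(inode_version: str, pid: int) -> None:
    # inode_version_writes: the filename is the inode-and-version;
    # the file contents are the ID of the process that created it.
    path = STORE / "inode_version_writes" / inode_version
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(pid))

def lookup_writer(inode_version: str) -> int:
    return int((STORE / "inode_version_writes" / inode_version).read_text())
```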
Ex-32 (Collaborator)

sqlite has built-in serializable ACID transactions; as long as there are no network filesystems involved, we can just open the database from multiple processes and have it just work™. sqlite's query optimization engine is good enough that we could probably just `SELECT * FROM Processes WHERE inode_version = ...`
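For example, a sketch of what multi-process access could look like (using WAL journal mode, one common way to let readers run alongside a writer; the database path and table are placeholders):

```python
import sqlite3

# Sketch: every process opens the same database file independently.
# WAL journal mode lets readers run concurrently with a single writer;
# sqlite serializes the writes itself.
conn = sqlite3.connect("provenance.db", timeout=30.0)
conn.execute("PRAGMA journal_mode=WAL")
rows = conn.execute(
    "SELECT * FROM Processes WHERE inode_version = ?",
    ("some-inode-version",),  # placeholder key
).fetchall()
```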

Ex-32 (Collaborator)

That being said, we probably don't want to just dump probe_log blobs into sqlite, since it has a hardcoded limit of 1 GB for strings and blobs (ask me how I know 🙃).

charmoniumQ (Owner, Author)

Whether to use sqlite or raw files is an important design choice. We wouldn't be storing the entire probe_log; we would be disassembling probe_log into a set of rows: one for each process, one for each inode at each version (versioned at open/close-time).

As far as I understand, serializing the transactions is a bottleneck. In principle, the transactions should be independent, so we don't need to serialize them at all. In the current design, concurrent transactions (theoretically) update different files, because each inode/version is written by exactly one process. However, there may be future tables that do require locking (e.g., maintaining a table of what processes use each inode/version would be racy).

There is also the issue of transferring: when we run SCP or Rsync, we will need to transfer specific key-value pairs over to the remote. With files, this is easy, though it results in a lot of files, which I think is OK. With sqlite, it would result in a number of rows from a number of tables. I'm not sure which is better.
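For concreteness, a sketch of one possible disassembly into rows (table and column names are assumptions, not a settled schema):

```python
import sqlite3

conn = sqlite3.connect("provenance.db")
# Hypothetical schema: one row per process, one row per inode-at-a-version;
# the "written-by" edge lives on the inode_versions row, since each
# inode/version has exactly one writer.
conn.executescript("""
    CREATE TABLE IF NOT EXISTS processes (
        pid     INTEGER PRIMARY KEY,
        process BLOB NOT NULL               -- serialized Process object
    );
    CREATE TABLE IF NOT EXISTS inode_versions (
        inode_version TEXT PRIMARY KEY,     -- versioned at open/close-time
        writer_pid    INTEGER NOT NULL REFERENCES processes(pid)
    );
""")
```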

charmoniumQ (Owner, Author)

Another point is that traversing the graph requires "pointer-chasing" through multiple rows (each row tells you where to find the next one). This kind of access pattern results in O(N) file reads, which may be a lot, or O(N) select-queries, which is still a lot of queries, but a new query is much cheaper than opening and closing a new file descriptor.
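As an aside, sqlite can also do this pointer-chasing inside a single query via a recursive common table expression; a sketch, assuming a hypothetical `edges(child, parent)` table:

```python
import sqlite3

conn = sqlite3.connect("provenance.db")
# Sketch: walk "up" from one inode/version to all transitive ancestors
# in one query, rather than issuing O(N) separate SELECTs.
ancestors = conn.execute("""
    WITH RECURSIVE up(node) AS (
        SELECT ?                      -- the starting inode/version
        UNION
        SELECT edges.parent FROM edges JOIN up ON edges.child = up.node
    )
    SELECT node FROM up
""", ("some-inode-version",)).fetchall()
```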

charmoniumQ (Owner, Author)

(No response required, since Ex-32 is already busy; just thinking out loud)

I need to expand on the transactions we will be doing, and at what frequency. Basically, the persistent store should be a bipartite graph between processes and inode-versions.

  • At transcription-time (which probably happens no more than once per minute): insert nodes and edges.
  • When the user asks for it (less than once per minute): traverse the graph "up" from a specific inode/version: "what inputs were used to make this output?" (aka pull-based updating). This query is used in applications like "Make-without-Makefile" (see the traversal sketch after this list).
  • When the user asks for it (less than once per minute): traverse the graph "down" from a specific inode/version: "what outputs were dependent on this input?" (aka push-based updating). This query is used much less than the pull-based version, but could still be useful if the user does a system upgrade and doesn't know which of their projects need to be recompiled/recomputed.
  • When the user does an SCP or Rsync (less than once per minute): traverse the graph "up", so we can transfer the "relevant" bits of provenance to the remote, so a user at the destination machine (which could be local or remote) can query the provenance of the files we are sending.
  • When the user issues a "garbage collect" (less than once per minute): iterate over every inode/version; mark the ones that still exist as "not deletable", then mark every ancestor of a "not deletable" node as "not deletable" (searching "up" the provenance graph from each extant inode/version).
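A minimal sketch of the "up" traversal over the bipartite graph (a pure-Python BFS over hypothetical in-memory mappings; all names here are assumptions):

```python
from collections import deque

def get_prov_upstream(
    start: str,
    writer_of: dict[str, int],        # inode/version -> pid that created it
    inputs_of: dict[int, list[str]],  # pid -> inode/versions that process read
) -> set[str]:
    # BFS "up" the bipartite graph: alternate between an inode/version and
    # the process that produced it, collecting every transitive input.
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        iv = queue.popleft()
        if iv in seen:
            continue
        seen.add(iv)
        pid = writer_of.get(iv)
        if pid is not None:
            queue.extend(inputs_of.get(pid, []))
    return seen
```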

All in all, the operations are slow enough that the parallelism between readers and writers that is potentially available with the filesystem but not with sqlite (which uses a readers-writer lock) is not that important. On the other hand, traversing up or down the graph is quite important. In a file-based solution, each edge-traversal requires opening and closing a file; sqlite loads the graph into memory (if it fits), so each edge-traversal is a pointer dereference. If the graph does not fit in memory, sqlite loads blocks of the table (presumably LRU-cached), which is still much more efficient than open/close. While a graph database (e.g., Neo4j) would be even better, Neo4j is not embeddable in a Python application (only in Java), and the most popular deployments of Neo4j run as a daemon process, which would be annoying. With sqlite, PROBE can be "daemonless".

Therefore, I think persistent provenance should be held in sqlite in the future. @Shofiya2003, the logic regarding the SCP/Rsync wrapper won't change: still call `get_prov_upstream`; the output will be a set of objects, which you can pass to a new function, `transfer(objects, host)`, which will figure out how to transfer the objects to the remote host, whether they are filenames (the current scheme) or objects/sqlite rows (the future scheme).
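A sketch of what `transfer` might look like under the current filename scheme (the remote directory and the rsync invocation are assumptions):

```python
import subprocess

def transfer(objects: set[str], host: str) -> None:
    # Hypothetical sketch: in the current scheme the objects are filenames,
    # so rsync them; in a future sqlite scheme this function would serialize
    # rows and insert them into the remote database instead.
    filenames = sorted(objects)
    if filenames:
        subprocess.run(
            ["rsync", "-a", *filenames, f"{host}:.probe/provenance/"],
            check=True,
        )
```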

Ex-32 (Collaborator)

> At transcription-time (which probably happens no more than once per minute): insert nodes and edges.

While I agree with your overall rationale, some tasks, like compiling C/C++ code, where each file spawns several processes (compiler, assembler, etc.), could potentially produce much higher transcription loads (in the range of once per second).

charmoniumQ (Owner, Author)

Transcription should take place when the PROBEd process-tree completes, not when a single PROBEd process completes. I'm assuming users will do `probe make` instead of `make CC="probe gcc"` (but maybe they won't). Ideally, we need to explain that one should try to PROBE the greatest unborn ancestor process (when it gets born) 😆

@charmoniumQ charmoniumQ marked this pull request as draft August 14, 2024 19:11
@Shofiya2003 Shofiya2003 mentioned this pull request Aug 15, 2024
charmoniumQ (Owner, Author)

Closed in favor of #75.
