feat: ucan stream endpoint and upload api with invocations and receipts #172

vasco-santos · 2023-03-22T14:59:48Z

This PR includes two significant changes:

upload-api ucanto server now follows same pattern as access-api proposal from @Gozala's feat: write invocations and receipts into ucan log w3up#592
- saves CAR file with UCAN invocations before executing them
- sends one receipt per invocation when they are executed
modifies POST /ucan endpoint to accept both receipts and CAR invocations as discussed sync yesterday
- it identifies by ContentType wether is a CAR of invocations or a Receipt (processUcanLogRequest)
- for CAR of invocations, everything is quite similar as before:
  - parse CAR from bytes, persist CAR and send a view of it to Kinesis
  - more than persisting the CARs, we also persist a map of invocation CIDs to CARs where they live in same fashion as we do with DUDEWHERE
  - kinesis now also receives a type property, which can either be Invocation, or receipt like the actual endpoint
  - NOTE: removed proofs as they can get quite big and we already talked about getting rid of it before
- for receipts:
  - we parse the receipt CBOR, verify if given receipt is from an Invocation that we previously got. If so, we can persist the Receipt
  - Kinesis stream receives quite similar content to the invocation. Main differences are that a receipt type is used, and we also include the inovcationCid. In other words, we provide value from invocation so that Kinesis consumers can know about invocation and nb from invocation to filter out and account metrics

Other notes:

upload-api hooks in the same functions for integrating with UCAN Stream as the /ucan endpoint, avoiding the HTTP calls
Storing receipts and invocations Bucket keys were decided yesterday.
Like feat: write invocations and receipts into ucan log w3up#592, we aim to get to ucanto part of the code that we currently inline to create the receipts
Will do a follow up PR for metric consumers to update to new world

seed-deploy · 2023-03-22T15:12:37Z

View stack outputs

pr172-w3infra-CarparkStack

Name Value

BucketName carpark-pr172-0

Region us-east-2
pr172-w3infra-SatnavStack

Name Value

BucketName satnav-pr172-0

Region us-east-2
pr172-w3infra-UploadApiStack

Name Value

ApiEndpoint https://6obv024to0.execute-api.us-east-2.amazonaws.com

CustomDomain https://pr172.up.web3.storage
pr172-w3infra-BusStack
pr172-w3infra-ReplicatorStack
pr172-w3infra-UcanInvocationStack
pr172-w3infra-UploadDbStack

alanshaw · 2023-03-22T17:03:13Z

upload-api/buckets/ucan-store.js

+    putInvocation: async (invocationCid, carCid) => {
+      const putCmd = new PutObjectCommand({
+        Bucket: bucketName,
+        Key: `${invocationCid}/${carCid}.invocation`,


🙏 Please can you document the bucket structure on a README somewhere?

So if I understand this correctly, we get a single CAR file (an invocation), and each root is the CID of a task

I think we've been calling tasks => invocations and invocations => requests, but we should probably orient more around invocation+task.

I think we just need to rename invocationCid to taskCid and carCid to invocationCid...right?

Our mapping would be:

# CAR file of invocation, roots are tasks /invocation-cid/invocation-cid.car # empty file, pointer back to invocation /task-cid/invocation-cid.invocation # CBOR bytes of receipt /task-cid/receipt-cid.receipt

Yes, I will add docs for that!

I was under the impression that a CAR file had multiple invocations, that is a batch of invocations. So carCid is CID of a batch, each invocationCid is the taskCid within the batch.

But yes, would love @Gozala thoughts to converge to UCAN flavour naming

I think we've been calling tasks => invocations and invocations => requests, but we should probably orient more around invocation+task.

Lets not overload invocation plz because of the invocation spec according to which invocation is essentially a singed task. What I was calling a request would be a set or a batch of invocations, or a query in GraphQL terms.

I just checked with IPVM team and they call those workflows, so I suggest we embrace than name as well.

From above, looks like there is no reason for us to put receipt.cid in receipt name as this PR has. But rather just /${task.cid}/${invocation.cid}.receipt

There might be a reason to capture receipt cid in the path e.g. to signal encoding so we could do following instead

/${workflow.cid}/${workflow.cid}.workflow /${task.cid}/${invocation.cid}/${workflow.cid}.workflow /${task.cid}/${invocation.cid}/${receipt.cid}.receipt

Wondering if we should just suffix as "extension" for now?

I mean we already have a tag in CID for this so why not just embrace it

Me and @alanshaw had a chat and came up with a following plan

This is the layout that makes most sense in the future and should map well to what we have now:

# Request blobs likely in CAR but cid codec should tell if it's something else /workflow/${workflow.cid} # Pseudo symlink to `/workflow/${workflow.cid}` /invocation/${invocation.cid}/${workflow.cid}.workflow # Receipt block issued for a specific task invocation. /invocation/${invocation.cid}.receipt # Block containing `out` field of the receipt. So that when we get an invocation # with the same task we can read the result and issue receipt without rerunning # a task. Could be written on first receipt. /task/${task.cid}.result # Pseudo symlinks to `/invocation/${invocation.cid}/` # So we can lookup invocations & receipts by task /task/${task.cid}/${invocation.cid}.invocation

For now we'll use delegation CID as both invocation and task CID. Delegation CIDs are the roots in the received CAR and CID in the .ran field of the received receipt.

When we implement invocation spec we'll change endpoint such that receipts will be send in CARs that will also contain corresponding tasks. That way both invocation and task CID will be known without doing additional lookups.

@Gozala we decided to put workflow, invocation and task items in separate buckets to avoid S3 rate limits on a common prefix and still retain the grouping.

Thanks for syncing and reporting back last night. As talked with Alan today;s morning, I like the format but that brings rate limit issues in buckets. Therefore, moved forward with:

# Request blobs likely in CAR but cid codec should tell if it's something else ${workflow.cid}/${workflow.cid} # Pseudo symlink to `/workflow/${workflow.cid}` ${invocation.cid}/${workflow.cid}.workflow # Receipt block issued for a specific task invocation. ${invocation.cid}/${invocation.cid}.receipt # Block containing `out` field of the receipt. So that when we get an invocation # with the same task we can read the result and issue receipt without rerunning # a task. Could be written on first receipt. ${task.cid}/${task.cid}.result # Pseudo symlinks to `/invocation/${invocation.cid}/` # So we can lookup invocations & receipts by task ${task.cid}/${invocation.cid}.invocation

Note that we could also do:

# Pseudo symlink to `/workflow/${workflow.cid}` ${invocation.cid}/${invocation.cid}/${workflow.cid}.workflow # Pseudo symlinks to `/invocation/${invocation.cid}/` # So we can lookup invocations & receipts by task ${task.cid}/${task.cid}/${invocation.cid}.invocation

but I feel that it is not needed? Thoughts?

alanshaw · 2023-03-23T13:14:43Z

stacks/ucan-invocation-stack.js

+    }
+  })
+
+  // TODO: keep for historical content that we might want to process


Does this definitely keep it around? The act of instantiating a class registers it?

Yes, it keeps in Cloudformation. in reality we could even remove it... In prod/staging it won't get deleted per https://github.com/web3-storage/w3infra/blob/main/stacks/config.js#L38

But better to keep it around for now to hook up consumers if we want to still read from there. Otherwise, we will need to hook it up with ARN id

alanshaw · 2023-03-23T13:18:48Z

upload-api/buckets/invocation-store.js

+      await pRetry(() => s3client.send(putCmd))
+    },
+    /**
+     * Get CAR bytes for a given invocation.


Suggested change

* Get CAR bytes for a given invocation.

* Get the workflow CID for an invocation.

alanshaw · 2023-03-23T13:20:26Z

upload-api/buckets/task-store.js

+export const useTaskStore = (s3client, bucketName) => {
+  return {
+    /**
+     * Put block containing `out` filed of the receipt. So that when we get an invocation


Suggested change

* Put block containing `out` filed of the receipt. So that when we get an invocation

* Put block containing `out` field of the receipt, so that when we get an invocation

alanshaw · 2023-03-23T13:22:35Z

upload-api/buckets/task-store.js

+     * @param {string} cid
+     * @param {Uint8Array} bytes
+     */
+    putResult: async (cid, bytes) => {
+      const putCmd = new PutObjectCommand({
+        Bucket: bucketName,
+        Key: `${cid}/${cid}.result`,


I'd explicitly name this so it is not confusing what this CID is referring to.

Suggested change

* @param {string} cid

* @param {Uint8Array} bytes

*/

putResult: async (cid, bytes) => {

const putCmd = new PutObjectCommand({

Bucket: bucketName,

Key: `${cid}/${cid}.result`,

* @param {string} taskCid

* @param {Uint8Array} bytes

*/

putResult: async (taskCid, bytes) => {

const putCmd = new PutObjectCommand({

Bucket: bucketName,

Key: `${taskCid}/${taskCid}.result`,

alanshaw · 2023-03-23T13:23:00Z

upload-api/buckets/task-store.js

+     * @param {string} cid 
+     * @param {string} invocationCid 
+     */
+    putIndex: async (cid, invocationCid) => {
+      const putCmd = new PutObjectCommand({
+        Bucket: bucketName,
+        Key: `${cid}/${invocationCid}.invocation`,


Suggested change

* @param {string} cid

* @param {string} invocationCid

*/

putIndex: async (cid, invocationCid) => {

const putCmd = new PutObjectCommand({

Bucket: bucketName,

Key: `${cid}/${invocationCid}.invocation`,

* @param {string} taskCid

* @param {string} invocationCid

*/

putIndex: async (taskCid, invocationCid) => {

const putCmd = new PutObjectCommand({

Bucket: bucketName,

Key: `${taskCid}/${invocationCid}.invocation`,

alanshaw · 2023-03-23T13:24:33Z

upload-api/buckets/invocation-store.js

+     * @param {string} invocationCid
+     * @param {string} workflowCid
+     */
+    putIndex: async (invocationCid, workflowCid) => {


Suggested change

putIndex: async (invocationCid, workflowCid) => {

putWorkflowIndex: async (invocationCid, workflowCid) => {

...might be better to name this "link" rather than "index"

alanshaw · 2023-03-23T13:32:26Z

upload-api/ucan-invocation.js

+  if (contentType === CONTENT_TYPE.WORKFLOW) {
+    return await processWorkflow(bytes, ctx)
+  } else if (contentType === CONTENT_TYPE.RECEIPT) {
+    return await processTaskReceipt(bytes, ctx)


This is an invocation receipt.

Suggested change

return await processTaskReceipt(bytes, ctx)

return await processInvocationReceipt(bytes, ctx)

seed-deploy bot temporarily deployed to pr172 March 22, 2023 14:59 Inactive

vasco-santos force-pushed the feat/ucan-stream-endpoint-and-upload-api-with-invocations-and-receipts branch from 852f765 to c2a0004 Compare March 22, 2023 15:33

seed-deploy bot temporarily deployed to pr172 March 22, 2023 15:33 Inactive

feat: ucan stream endpoint and upload api with invocations and receipts

2337bd4

vasco-santos force-pushed the feat/ucan-stream-endpoint-and-upload-api-with-invocations-and-receipts branch from c2a0004 to 2337bd4 Compare March 22, 2023 16:34

seed-deploy bot temporarily deployed to pr172 March 22, 2023 16:34 Inactive

vasco-santos marked this pull request as ready for review March 22, 2023 16:45

vasco-santos requested review from Gozala and alanshaw March 22, 2023 16:45

alanshaw reviewed Mar 22, 2023

View reviewed changes

vasco-santos added this to the w3up phase 3 milestone Mar 22, 2023

fix: renameto workflow, invocation and task storing the new information

6ab9c1e

seed-deploy bot temporarily deployed to pr172 March 23, 2023 10:30 Inactive

test: receipt full flow with persisted in bucket validated

f151530

seed-deploy bot temporarily deployed to pr172 March 23, 2023 10:38 Inactive

test: ucan stream content

4a26d44

seed-deploy bot temporarily deployed to pr172 March 23, 2023 11:11 Inactive

fix: use multiple buckets

4cd0a7d

seed-deploy bot temporarily deployed to pr172 March 23, 2023 11:55 Inactive

fix: make bucket keys not be rate limited

aac4cad

seed-deploy bot temporarily deployed to pr172 March 23, 2023 12:03 Inactive

fix: drop ucan replicator

62e4c5a

seed-deploy bot temporarily deployed to pr172 March 23, 2023 12:06 Inactive

vasco-santos requested a review from alanshaw March 23, 2023 12:14

vasco-santos mentioned this pull request Mar 23, 2023

Archive (and process?) alfa stream content #173

Open

alanshaw requested changes Mar 23, 2023

View reviewed changes

chore: address review comments

e9ade23

seed-deploy bot temporarily deployed to pr172 March 23, 2023 13:50 Inactive

alanshaw approved these changes Mar 23, 2023

View reviewed changes

vasco-santos merged commit 2f4b293 into main Mar 23, 2023

vasco-santos deleted the feat/ucan-stream-endpoint-and-upload-api-with-invocations-and-receipts branch March 23, 2023 13:55

vasco-santos mentioned this pull request Mar 23, 2023

Enable storage of UCAN invocations and piping them to UCAN Stream #168

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: ucan stream endpoint and upload api with invocations and receipts #172

feat: ucan stream endpoint and upload api with invocations and receipts #172

vasco-santos commented Mar 22, 2023 •

edited

Loading

seed-deploy bot commented Mar 22, 2023

alanshaw Mar 22, 2023

alanshaw Mar 22, 2023

alanshaw Mar 22, 2023

vasco-santos Mar 22, 2023

Gozala Mar 22, 2023 •

edited

Loading

Gozala Mar 22, 2023

Gozala Mar 22, 2023

Gozala Mar 22, 2023

alanshaw Mar 23, 2023

vasco-santos Mar 23, 2023

alanshaw Mar 23, 2023

vasco-santos Mar 23, 2023

alanshaw Mar 23, 2023

alanshaw Mar 23, 2023

alanshaw Mar 23, 2023

alanshaw Mar 23, 2023

alanshaw Mar 23, 2023

alanshaw Mar 23, 2023

	* Get CAR bytes for a given invocation.
	* Get the workflow CID for an invocation.

	* Put block containing `out` filed of the receipt. So that when we get an invocation
	* Put block containing `out` field of the receipt, so that when we get an invocation

	putIndex: async (invocationCid, workflowCid) => {
	putWorkflowIndex: async (invocationCid, workflowCid) => {

	return await processTaskReceipt(bytes, ctx)
	return await processInvocationReceipt(bytes, ctx)

feat: ucan stream endpoint and upload api with invocations and receipts #172

feat: ucan stream endpoint and upload api with invocations and receipts #172

Conversation

vasco-santos commented Mar 22, 2023 • edited Loading

seed-deploy bot commented Mar 22, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gozala Mar 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vasco-santos commented Mar 22, 2023 •

edited

Loading

Gozala Mar 22, 2023 •

edited

Loading