-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ucan stream endpoint and upload api with invocations and receipts #172
feat: ucan stream endpoint and upload api with invocations and receipts #172
Conversation
View stack outputs
|
852f765
to
c2a0004
Compare
c2a0004
to
2337bd4
Compare
upload-api/buckets/ucan-store.js
Outdated
putInvocation: async (invocationCid, carCid) => { | ||
const putCmd = new PutObjectCommand({ | ||
Bucket: bucketName, | ||
Key: `${invocationCid}/${carCid}.invocation`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏 Please can you document the bucket structure on a README somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand this correctly, we get a single CAR file (an invocation), and each root is the CID of a task
I think we've been calling tasks => invocations and invocations => requests, but we should probably orient more around invocation+task.
I think we just need to rename invocationCid
to taskCid
and carCid
to invocationCid
...right?
Our mapping would be:
# CAR file of invocation, roots are tasks
/invocation-cid/invocation-cid.car
# empty file, pointer back to invocation
/task-cid/invocation-cid.invocation
# CBOR bytes of receipt
/task-cid/receipt-cid.receipt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Gozala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will add docs for that!
I was under the impression that a CAR file had multiple invocations, that is a batch of invocations. So carCid
is CID of a batch, each invocationCid is the taskCid within the batch.
But yes, would love @Gozala thoughts to converge to UCAN flavour naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've been calling tasks => invocations and invocations => requests, but we should probably orient more around invocation+task.
Lets not overload invocation plz because of the invocation spec according to which invocation is essentially a singed task. What I was calling a request would be a set or a batch of invocations, or a query in GraphQL terms.
I just checked with IPVM team and they call those workflows, so I suggest we embrace than name as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From above, looks like there is no reason for us to put receipt.cid in receipt name as this PR has. But rather just /${task.cid}/${invocation.cid}.receipt
There might be a reason to capture receipt cid in the path e.g. to signal encoding so we could do following instead
/${workflow.cid}/${workflow.cid}.workflow
/${task.cid}/${invocation.cid}/${workflow.cid}.workflow
/${task.cid}/${invocation.cid}/${receipt.cid}.receipt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we should just suffix as "extension" for now?
I mean we already have a tag in CID for this so why not just embrace it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me and @alanshaw had a chat and came up with a following plan
- This is the layout that makes most sense in the future and should map well to what we have now:
# Request blobs likely in CAR but cid codec should tell if it's something else
/workflow/${workflow.cid}
# Pseudo symlink to `/workflow/${workflow.cid}`
/invocation/${invocation.cid}/${workflow.cid}.workflow
# Receipt block issued for a specific task invocation.
/invocation/${invocation.cid}.receipt
# Block containing `out` field of the receipt. So that when we get an invocation
# with the same task we can read the result and issue receipt without rerunning
# a task. Could be written on first receipt.
/task/${task.cid}.result
# Pseudo symlinks to `/invocation/${invocation.cid}/`
# So we can lookup invocations & receipts by task
/task/${task.cid}/${invocation.cid}.invocation
-
For now we'll use delegation CID as both invocation and task CID. Delegation CIDs are the roots in the received CAR and CID in the
.ran
field of the received receipt. -
When we implement invocation spec we'll change endpoint such that receipts will be send in CARs that will also contain corresponding tasks. That way both invocation and task CID will be known without doing additional lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gozala we decided to put workflow, invocation and task items in separate buckets to avoid S3 rate limits on a common prefix and still retain the grouping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for syncing and reporting back last night. As talked with Alan today;s morning, I like the format but that brings rate limit issues in buckets. Therefore, moved forward with:
# Request blobs likely in CAR but cid codec should tell if it's something else
${workflow.cid}/${workflow.cid}
# Pseudo symlink to `/workflow/${workflow.cid}`
${invocation.cid}/${workflow.cid}.workflow
# Receipt block issued for a specific task invocation.
${invocation.cid}/${invocation.cid}.receipt
# Block containing `out` field of the receipt. So that when we get an invocation
# with the same task we can read the result and issue receipt without rerunning
# a task. Could be written on first receipt.
${task.cid}/${task.cid}.result
# Pseudo symlinks to `/invocation/${invocation.cid}/`
# So we can lookup invocations & receipts by task
${task.cid}/${invocation.cid}.invocation
Note that we could also do:
# Pseudo symlink to `/workflow/${workflow.cid}`
${invocation.cid}/${invocation.cid}/${workflow.cid}.workflow
# Pseudo symlinks to `/invocation/${invocation.cid}/`
# So we can lookup invocations & receipts by task
${task.cid}/${task.cid}/${invocation.cid}.invocation
but I feel that it is not needed? Thoughts?
} | ||
}) | ||
|
||
// TODO: keep for historical content that we might want to process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this definitely keep it around? The act of instantiating a class registers it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it keeps in Cloudformation. in reality we could even remove it... In prod/staging it won't get deleted per https://github.com/web3-storage/w3infra/blob/main/stacks/config.js#L38
But better to keep it around for now to hook up consumers if we want to still read from there. Otherwise, we will need to hook it up with ARN id
await pRetry(() => s3client.send(putCmd)) | ||
}, | ||
/** | ||
* Get CAR bytes for a given invocation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Get CAR bytes for a given invocation. | |
* Get the workflow CID for an invocation. |
upload-api/buckets/task-store.js
Outdated
export const useTaskStore = (s3client, bucketName) => { | ||
return { | ||
/** | ||
* Put block containing `out` filed of the receipt. So that when we get an invocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Put block containing `out` filed of the receipt. So that when we get an invocation | |
* Put block containing `out` field of the receipt, so that when we get an invocation |
upload-api/buckets/task-store.js
Outdated
* @param {string} cid | ||
* @param {Uint8Array} bytes | ||
*/ | ||
putResult: async (cid, bytes) => { | ||
const putCmd = new PutObjectCommand({ | ||
Bucket: bucketName, | ||
Key: `${cid}/${cid}.result`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd explicitly name this so it is not confusing what this CID is referring to.
* @param {string} cid | |
* @param {Uint8Array} bytes | |
*/ | |
putResult: async (cid, bytes) => { | |
const putCmd = new PutObjectCommand({ | |
Bucket: bucketName, | |
Key: `${cid}/${cid}.result`, | |
* @param {string} taskCid | |
* @param {Uint8Array} bytes | |
*/ | |
putResult: async (taskCid, bytes) => { | |
const putCmd = new PutObjectCommand({ | |
Bucket: bucketName, | |
Key: `${taskCid}/${taskCid}.result`, |
upload-api/buckets/task-store.js
Outdated
* @param {string} cid | ||
* @param {string} invocationCid | ||
*/ | ||
putIndex: async (cid, invocationCid) => { | ||
const putCmd = new PutObjectCommand({ | ||
Bucket: bucketName, | ||
Key: `${cid}/${invocationCid}.invocation`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @param {string} cid | |
* @param {string} invocationCid | |
*/ | |
putIndex: async (cid, invocationCid) => { | |
const putCmd = new PutObjectCommand({ | |
Bucket: bucketName, | |
Key: `${cid}/${invocationCid}.invocation`, | |
* @param {string} taskCid | |
* @param {string} invocationCid | |
*/ | |
putIndex: async (taskCid, invocationCid) => { | |
const putCmd = new PutObjectCommand({ | |
Bucket: bucketName, | |
Key: `${taskCid}/${invocationCid}.invocation`, |
* @param {string} invocationCid | ||
* @param {string} workflowCid | ||
*/ | ||
putIndex: async (invocationCid, workflowCid) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
putIndex: async (invocationCid, workflowCid) => { | |
putWorkflowIndex: async (invocationCid, workflowCid) => { |
...might be better to name this "link" rather than "index"
upload-api/ucan-invocation.js
Outdated
if (contentType === CONTENT_TYPE.WORKFLOW) { | ||
return await processWorkflow(bytes, ctx) | ||
} else if (contentType === CONTENT_TYPE.RECEIPT) { | ||
return await processTaskReceipt(bytes, ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an invocation receipt.
return await processTaskReceipt(bytes, ctx) | |
return await processInvocationReceipt(bytes, ctx) |
This PR includes two significant changes:
upload-api
ucanto server now follows same pattern asaccess-api
proposal from @Gozala's feat: write invocations and receipts into ucan log w3up#592POST /ucan
endpoint to accept both receipts and CAR invocations as discussed sync yesterdayprocessUcanLogRequest
)proofs
as they can get quite big and we already talked about getting rid of it beforeinovcationCid
. In other words, we provide value from invocation so that Kinesis consumers can know aboutinvocation
andnb
from invocation to filter out and account metricsOther notes:
upload-api
hooks in the same functions for integrating with UCAN Stream as the/ucan
endpoint, avoiding the HTTP calls