Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ucan stream endpoint and upload api with invocations and receipts #172

Conversation

vasco-santos
Copy link
Contributor

@vasco-santos vasco-santos commented Mar 22, 2023

This PR includes two significant changes:

  • upload-api ucanto server now follows same pattern as access-api proposal from @Gozala's feat: write invocations and receipts into ucan log w3up#592
    • saves CAR file with UCAN invocations before executing them
    • sends one receipt per invocation when they are executed
  • modifies POST /ucan endpoint to accept both receipts and CAR invocations as discussed sync yesterday
    • it identifies by ContentType wether is a CAR of invocations or a Receipt (processUcanLogRequest)
    • for CAR of invocations, everything is quite similar as before:
      • parse CAR from bytes, persist CAR and send a view of it to Kinesis
      • more than persisting the CARs, we also persist a map of invocation CIDs to CARs where they live in same fashion as we do with DUDEWHERE
      • kinesis now also receives a type property, which can either be Invocation, or receipt like the actual endpoint
      • NOTE: removed proofs as they can get quite big and we already talked about getting rid of it before
    • for receipts:
      • we parse the receipt CBOR, verify if given receipt is from an Invocation that we previously got. If so, we can persist the Receipt
      • Kinesis stream receives quite similar content to the invocation. Main differences are that a receipt type is used, and we also include the inovcationCid. In other words, we provide value from invocation so that Kinesis consumers can know about invocation and nb from invocation to filter out and account metrics

Other notes:

  • upload-api hooks in the same functions for integrating with UCAN Stream as the /ucan endpoint, avoiding the HTTP calls
  • Storing receipts and invocations Bucket keys were decided yesterday.
  • Like feat: write invocations and receipts into ucan log w3up#592, we aim to get to ucanto part of the code that we currently inline to create the receipts
  • Will do a follow up PR for metric consumers to update to new world

@seed-deploy seed-deploy bot temporarily deployed to pr172 March 22, 2023 14:59 Inactive
@seed-deploy
Copy link

seed-deploy bot commented Mar 22, 2023

View stack outputs
  • pr172-w3infra-CarparkStack

    Name Value
    BucketName carpark-pr172-0
    Region us-east-2
  • pr172-w3infra-SatnavStack

    Name Value
    BucketName satnav-pr172-0
    Region us-east-2
  • pr172-w3infra-UploadApiStack

    Name Value
    ApiEndpoint https://6obv024to0.execute-api.us-east-2.amazonaws.com
    CustomDomain https://pr172.up.web3.storage
  • pr172-w3infra-BusStack

  • pr172-w3infra-ReplicatorStack

  • pr172-w3infra-UcanInvocationStack

  • pr172-w3infra-UploadDbStack

@vasco-santos vasco-santos force-pushed the feat/ucan-stream-endpoint-and-upload-api-with-invocations-and-receipts branch from 852f765 to c2a0004 Compare March 22, 2023 15:33
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 22, 2023 15:33 Inactive
@vasco-santos vasco-santos force-pushed the feat/ucan-stream-endpoint-and-upload-api-with-invocations-and-receipts branch from c2a0004 to 2337bd4 Compare March 22, 2023 16:34
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 22, 2023 16:34 Inactive
@vasco-santos vasco-santos marked this pull request as ready for review March 22, 2023 16:45
@vasco-santos vasco-santos requested review from Gozala and alanshaw March 22, 2023 16:45
putInvocation: async (invocationCid, carCid) => {
const putCmd = new PutObjectCommand({
Bucket: bucketName,
Key: `${invocationCid}/${carCid}.invocation`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 Please can you document the bucket structure on a README somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand this correctly, we get a single CAR file (an invocation), and each root is the CID of a task

I think we've been calling tasks => invocations and invocations => requests, but we should probably orient more around invocation+task.

I think we just need to rename invocationCid to taskCid and carCid to invocationCid...right?

Our mapping would be:

# CAR file of invocation, roots are tasks
/invocation-cid/invocation-cid.car

# empty file, pointer back to invocation
/task-cid/invocation-cid.invocation

# CBOR bytes of receipt
/task-cid/receipt-cid.receipt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @Gozala

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add docs for that!

I was under the impression that a CAR file had multiple invocations, that is a batch of invocations. So carCid is CID of a batch, each invocationCid is the taskCid within the batch.

But yes, would love @Gozala thoughts to converge to UCAN flavour naming

Copy link
Contributor

@Gozala Gozala Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've been calling tasks => invocations and invocations => requests, but we should probably orient more around invocation+task.

Lets not overload invocation plz because of the invocation spec according to which invocation is essentially a singed task. What I was calling a request would be a set or a batch of invocations, or a query in GraphQL terms.

I just checked with IPVM team and they call those workflows, so I suggest we embrace than name as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From above, looks like there is no reason for us to put receipt.cid in receipt name as this PR has. But rather just /${task.cid}/${invocation.cid}.receipt

There might be a reason to capture receipt cid in the path e.g. to signal encoding so we could do following instead

/${workflow.cid}/${workflow.cid}.workflow
/${task.cid}/${invocation.cid}/${workflow.cid}.workflow
/${task.cid}/${invocation.cid}/${receipt.cid}.receipt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we should just suffix as "extension" for now?

I mean we already have a tag in CID for this so why not just embrace it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me and @alanshaw had a chat and came up with a following plan

  1. This is the layout that makes most sense in the future and should map well to what we have now:
# Request blobs likely in CAR but cid codec should tell if it's something else
/workflow/${workflow.cid}

# Pseudo symlink to `/workflow/${workflow.cid}`
/invocation/${invocation.cid}/${workflow.cid}.workflow

# Receipt block issued for a specific task invocation.
/invocation/${invocation.cid}.receipt

# Block containing `out` field of the receipt. So that when we get an invocation
# with the same task we can read the result and issue receipt without rerunning
# a task. Could be written on first receipt.
/task/${task.cid}.result

# Pseudo symlinks to `/invocation/${invocation.cid}/`
# So we can lookup invocations & receipts by task
/task/${task.cid}/${invocation.cid}.invocation
  1. For now we'll use delegation CID as both invocation and task CID. Delegation CIDs are the roots in the received CAR and CID in the .ran field of the received receipt.

  2. When we implement invocation spec we'll change endpoint such that receipts will be send in CARs that will also contain corresponding tasks. That way both invocation and task CID will be known without doing additional lookups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gozala we decided to put workflow, invocation and task items in separate buckets to avoid S3 rate limits on a common prefix and still retain the grouping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for syncing and reporting back last night. As talked with Alan today;s morning, I like the format but that brings rate limit issues in buckets. Therefore, moved forward with:

# Request blobs likely in CAR but cid codec should tell if it's something else
${workflow.cid}/${workflow.cid}

# Pseudo symlink to `/workflow/${workflow.cid}`
${invocation.cid}/${workflow.cid}.workflow

# Receipt block issued for a specific task invocation.
${invocation.cid}/${invocation.cid}.receipt

# Block containing `out` field of the receipt. So that when we get an invocation
# with the same task we can read the result and issue receipt without rerunning
# a task. Could be written on first receipt.
${task.cid}/${task.cid}.result

# Pseudo symlinks to `/invocation/${invocation.cid}/`
# So we can lookup invocations & receipts by task
${task.cid}/${invocation.cid}.invocation

Note that we could also do:

# Pseudo symlink to `/workflow/${workflow.cid}`
${invocation.cid}/${invocation.cid}/${workflow.cid}.workflow

# Pseudo symlinks to `/invocation/${invocation.cid}/`
# So we can lookup invocations & receipts by task
${task.cid}/${task.cid}/${invocation.cid}.invocation

but I feel that it is not needed? Thoughts?

@vasco-santos vasco-santos added this to the w3up phase 3 milestone Mar 22, 2023
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 23, 2023 10:30 Inactive
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 23, 2023 10:38 Inactive
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 23, 2023 11:11 Inactive
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 23, 2023 11:55 Inactive
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 23, 2023 12:03 Inactive
@seed-deploy seed-deploy bot temporarily deployed to pr172 March 23, 2023 12:06 Inactive
@vasco-santos vasco-santos requested a review from alanshaw March 23, 2023 12:14
}
})

// TODO: keep for historical content that we might want to process
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this definitely keep it around? The act of instantiating a class registers it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it keeps in Cloudformation. in reality we could even remove it... In prod/staging it won't get deleted per https://github.com/web3-storage/w3infra/blob/main/stacks/config.js#L38

But better to keep it around for now to hook up consumers if we want to still read from there. Otherwise, we will need to hook it up with ARN id

await pRetry(() => s3client.send(putCmd))
},
/**
* Get CAR bytes for a given invocation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Get CAR bytes for a given invocation.
* Get the workflow CID for an invocation.

export const useTaskStore = (s3client, bucketName) => {
return {
/**
* Put block containing `out` filed of the receipt. So that when we get an invocation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Put block containing `out` filed of the receipt. So that when we get an invocation
* Put block containing `out` field of the receipt, so that when we get an invocation

Comment on lines 35 to 41
* @param {string} cid
* @param {Uint8Array} bytes
*/
putResult: async (cid, bytes) => {
const putCmd = new PutObjectCommand({
Bucket: bucketName,
Key: `${cid}/${cid}.result`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd explicitly name this so it is not confusing what this CID is referring to.

Suggested change
* @param {string} cid
* @param {Uint8Array} bytes
*/
putResult: async (cid, bytes) => {
const putCmd = new PutObjectCommand({
Bucket: bucketName,
Key: `${cid}/${cid}.result`,
* @param {string} taskCid
* @param {Uint8Array} bytes
*/
putResult: async (taskCid, bytes) => {
const putCmd = new PutObjectCommand({
Bucket: bucketName,
Key: `${taskCid}/${taskCid}.result`,

Comment on lines 50 to 56
* @param {string} cid
* @param {string} invocationCid
*/
putIndex: async (cid, invocationCid) => {
const putCmd = new PutObjectCommand({
Bucket: bucketName,
Key: `${cid}/${invocationCid}.invocation`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param {string} cid
* @param {string} invocationCid
*/
putIndex: async (cid, invocationCid) => {
const putCmd = new PutObjectCommand({
Bucket: bucketName,
Key: `${cid}/${invocationCid}.invocation`,
* @param {string} taskCid
* @param {string} invocationCid
*/
putIndex: async (taskCid, invocationCid) => {
const putCmd = new PutObjectCommand({
Bucket: bucketName,
Key: `${taskCid}/${invocationCid}.invocation`,

* @param {string} invocationCid
* @param {string} workflowCid
*/
putIndex: async (invocationCid, workflowCid) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
putIndex: async (invocationCid, workflowCid) => {
putWorkflowIndex: async (invocationCid, workflowCid) => {

...might be better to name this "link" rather than "index"

if (contentType === CONTENT_TYPE.WORKFLOW) {
return await processWorkflow(bytes, ctx)
} else if (contentType === CONTENT_TYPE.RECEIPT) {
return await processTaskReceipt(bytes, ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an invocation receipt.

Suggested change
return await processTaskReceipt(bytes, ctx)
return await processInvocationReceipt(bytes, ctx)

@seed-deploy seed-deploy bot temporarily deployed to pr172 March 23, 2023 13:50 Inactive
@vasco-santos vasco-santos merged commit 2f4b293 into main Mar 23, 2023
@vasco-santos vasco-santos deleted the feat/ucan-stream-endpoint-and-upload-api-with-invocations-and-receipts branch March 23, 2023 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants