Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(vdp): upload raw inputs for run log #904

Merged
merged 9 commits into from
Dec 5, 2024

Conversation

chuang8511
Copy link
Contributor

Because

  • now, the run log inputs are raw data but not blob storage
  • we want the inputs are download urls rather than dataURI

This commit

  • move upload pipeline run inputs from worker to service
    • remove the worker upload inputs function
    • add the service upload inputs function

Copy link

linear bot commented Dec 2, 2024

@@ -1064,36 +1103,44 @@ func (s *service) preTriggerPipeline(ctx context.Context, ns resource.Namespace,
if err != nil {
return err
}
uploadingPipelineData[idx][k] = v.GetStringValue()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chuang8511
The URL might still be a dataURI (base64). If the user continues using dataURI or base64 in the input, we should save it to our blob storage and replace it with the corresponding blob URL. Could you check if this change can be implemented quickly? I guess we can modify lines L868–L907 to handle this.

Copy link
Contributor Author

@chuang8511 chuang8511 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL might still be a dataURI (base64). If the user continues using dataURI or base64 in the input

Do you mean when the users call API to trigger pipeline but not from Console?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that’s one of the possibilities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in this commit.

@chuang8511 chuang8511 marked this pull request as draft December 2, 2024 18:24
@chuang8511 chuang8511 marked this pull request as ready for review December 3, 2024 11:27
@chuang8511 chuang8511 requested a review from donch1989 December 3, 2024 14:32
Copy link
Collaborator

@jvallesm jvallesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to follow the changes as best as I could. You detailed the proposed flow in the ticket, could you use that to document what preTriggerActivity does? A comment in the method description and some comments along the function would be very helpful 🙏.

pkg/service/pipelinerun.go Outdated Show resolved Hide resolved
pkg/service/pipelinerun.go Outdated Show resolved Hide resolved
pkg/service/pipelinerun.go Outdated Show resolved Hide resolved
pkg/service/pipelinerun.go Show resolved Hide resolved
pkg/service/blob_storage.go Outdated Show resolved Hide resolved
pkg/service/main.go Show resolved Hide resolved
pkg/service/blob_storage.go Outdated Show resolved Hide resolved
pkg/service/blob_storage.go Outdated Show resolved Hide resolved
pkg/service/pipeline.go Show resolved Hide resolved
@chuang8511 chuang8511 requested a review from jvallesm December 4, 2024 14:35
Comment on lines 950 to +951
variable[k] = data.NewNull()
uploadingPipelineData[idx][k] = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small question: what will be the difference between variable and uploadingPipelineData[idx] at the end of each iteration (besides the type data.Map and map[string]any, I mean in terms of data)? If it's the same, could we transform one to the other (I think map[string]any -> data.Map would be easiest but I haven't checked the details) at the end of the iteration and handle a single variable along the way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the format is unstructured data, the data will be different.

e.g. format image

  • variable -> will be blob data
  • uploadingPipelineData[idx] -> will be the public URL that the console / users can access to.

Originally, I plan to write this part of code by

  • Loop to set uploadingPipelineData.
  • Upload data
  • Loop to set variable

But, I think the current way is easier, so I did not choose my original plan.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep my suggestion was only if the data was the same. The function is quite long so I missed these small details 😅 Thanks for the clarification, in that case the code makes sense as it is (we can work on it later to make it more readable and compact).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recorded in ins-7071

pkg/service/blob_storage.go Outdated Show resolved Hide resolved
pkg/service/blob_storage.go Outdated Show resolved Hide resolved
@chuang8511 chuang8511 requested a review from jvallesm December 4, 2024 17:41
@chuang8511 chuang8511 merged commit 960f4c2 into main Dec 5, 2024
11 checks passed
@chuang8511 chuang8511 deleted the chunhao/ins-7036-run-log-blob-storage branch December 5, 2024 09:54
donch1989 pushed a commit that referenced this pull request Dec 9, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.48.4-beta](v0.48.3-beta...v0.48.4-beta)
(2024-12-09)


### Features

* **pinecone:** Add rerank task for Pinecone component
([#773](#773))
([e1fd611](e1fd611))
* **vdp:** upload raw inputs for run log
([#904](#904))
([960f4c2](960f4c2))


### Bug Fixes

* **component,http:** fix the request body marshalling
([#928](#928))
([b47cc71](b47cc71))


### Miscellaneous Chores

* release v0.48.4-beta
([b08878e](b08878e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
jvallesm pushed a commit that referenced this pull request Dec 10, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.48.4-beta](v0.48.3-beta...v0.48.4-beta)
(2024-12-09)


### Features

* **pinecone:** Add rerank task for Pinecone component
([#773](#773))
([e1fd611](e1fd611))
* **vdp:** upload raw inputs for run log
([#904](#904))
([960f4c2](960f4c2))


### Bug Fixes

* **component,http:** fix the request body marshalling
([#928](#928))
([b47cc71](b47cc71))


### Miscellaneous Chores

* release v0.48.4-beta
([b08878e](b08878e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants