Check if model folder exists on startup and request processing #3044

eliteprox · 2024-05-06T13:49:33Z

What does this pull request do? Explain your changes. (required)

This PR depends on livepeer/ai-runner#79.

This change checks whether the requested model folder exists when a model is loaded at startup (warm models only) and gracefully handles gateway requests for a model whose folder is missing.

This improves response times on the network: the orchestrator immediately returns a 503 API error code when it is missing the model, which is primarily useful for cold models.
This improves orchestrator onboarding: when a model is not found, the exact path the container searched is logged, both on startup and on individual requests.

Gateway error log:

I0506 09:29:28.120307 1985227 discovery.go:180] Done fetching orch info numOrch=1 responses=1/1 timedOut=false
I0506 09:29:30.600500 1985227 ai_process.go:344] clientIP=127.0.0.1 request_id=14b57a61 Error submitting request cap=27 modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1 try=1 orch=https://0.0.0.0:8936 err=Insufficient capacity for modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1
E0506 09:29:30.600545 1985227 handlers.go:1479] clientIP=127.0.0.1 request_id=14b57a61 Error with API code=503 err=no orchestrators available within 2s timeout

AI Core error log on cold model request:

I0506 09:29:28.121922 1984042 ai_http.go:198] manifestID=27_stabilityai/stable-video-diffusion-img2vid-xt-1-1 orchSessionID=8983c425 clientIP=127.0.0.1 Received request id=6156387e cap=27 modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1
2024/05/06 09:29:30 ERROR model stabilityai/stable-video-diffusion-img2vid-xt-1-1 does not exist at /livepeer/ai-core/arbitrum-one-mainnet/models/models--stabilityai--stable-video-diffusion-img2vid-xt-1-1
E0506 09:29:30.600020 1984042 handlers.go:1511] HTTP Response Error 503: Insufficient capacity for modelID=stabilityai/stable-video-diffusion-img2vid-xt-1-1

AI Core error log on startup:

2024/05/06 10:04:25 ERROR model stabilityai/stable-video-diffusion-img2vid-xt-1-1 does not exist at /livepeer/ai-core/arbitrum-one-mainnet/models/models--stabilityai--stable-video-diffusion-img2vid-xt-1-1
E0506 10:04:25.144208 2005927 starter.go:549] Error AI worker warming text-to-image container: model stabilityai/stable-video-diffusion-img2vid-xt-1-1 does not exist
I0506 10:04:25.144224 2005927 db.go:368] Closing DB

Specific updates (required)

This code checks whether the given model exists on startup and when processing requests.
It uses a new `ModelExists` method in ai-worker that returns a boolean indicating whether the specific model folder exists.

How did you test each of these updates (required)

Started go-livepeer with an aiModels.json config containing a model that does not exist, with warm set to true.
Started go-livepeer with an aiModels.json config containing a model that does not exist, with warm set to false.
Sent an AI request through the gateway to go-livepeer for a cold model that does not exist; received an immediate 503 error response from the orchestrator.
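
The warm and cold cases tested above could be modeled roughly as follows. This is only a sketch: the `modelConfig` type, its fields, and `checkWarmModels` are hypothetical stand-ins rather than go-livepeer code, and the error text merely imitates the startup log in this PR.

```go
package main

import "fmt"

// modelConfig is a hypothetical, simplified view of one configured model.
type modelConfig struct {
	Pipeline string
	ModelID  string
	Warm     bool
}

// checkWarmModels returns an error for the first warm model whose folder is
// missing. Cold models are skipped here because they are only checked when a
// request for them arrives.
func checkWarmModels(configs []modelConfig, exists func(modelID string) bool) error {
	for _, c := range configs {
		if !c.Warm {
			continue
		}
		if !exists(c.ModelID) {
			return fmt.Errorf("warming %s container: model %s does not exist", c.Pipeline, c.ModelID)
		}
	}
	return nil
}
```

Under this sketch, a missing warm model stops startup with an error, while a missing cold model passes startup and surfaces later as a 503 on the first request.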

Does this pull request close any open issues?
Addresses LIV-117

Checklist:

Read the contribution guide
make runs successfully
All tests in ./test.sh pass
README and other documentation updated
Pending changelog updated

)
* Remove -pricePerUnit requirement for orchestrator with -AIWorker flag
* refactor: add PricePerUnit comment
  This commit reintroduces the previously omitted comment for the PricePerUnit variable, improving code readability and maintainability.
* refactor: simplify PricePerUnit flag check condition
  This commit simplifies the conditional check used to check if the `PricePerUnit` flag is needed.
---------
Co-authored-by: Rick Staa <[email protected]>

This commit updates the https://github.com/livepeer/ai-worker to the latest version so that Orchestrators can enable the [DeepCache](https://github.com/horseee/DeepCache) optimization. This optimization will provide a 50% speedup for multi-step inference requests.

This commit enables the NSFW filter on the AI Subnet that has been implemented on the runner side in livepeer/ai-runner#76. BREAKING CHANGE: Depending on how dApps interact with the subnet, this could be a breaking change, given that we return an extra `nsfw` property.

* refactor(census): rename Broadcaster metrics to Gateway
  This commit renames the metrics related to Broadcaster to Gateway, following a team decision. More details can be found in the discussion here: [Team Discussion Link](https://discord.com/channels/423160867534929930/1051963444598943784/1210356864643109004).
* chore: update pending changelog

…vepeer#3061) This commit adds the `pricePerGateway` flag and deprecates the `pricePerBroadcaster` flag per core team decision (details: https://discord.com/channels/423160867534929930/1051963444598943784/1210356864643109004).

This commit introduces a safeguard to ensure that the Docker image tagged as 'stable' is only pushed when a new tag is created on the stable branch. This prevents unintended updates to the stable Docker image, ensuring consistency and reliability for users relying on the stable tag.

…vepeer#3059)
* Fix nil baseprice when pricePerUnit is unused in aiWorker
* fix: fix priceInfo 'nil' error on discovery
  This commit ensures that when the `transcodePrice` is not set by the AI orchestrator, no `nil` error is thrown when a Gateway requests the orchestrator's OrchInfo.
* fix(ai): fix incorrect transcodePrice condition
  This commit fixes the check that determines whether transcodePrice is set.
---------
Co-authored-by: Rick Staa <[email protected]>

* add safety check to image-to-video input image
* refactor(ai): improve code syntax
  This commit improves the code syntax by making the output format generation step consistent between pipelines. It also updates the ai-worker to the latest version.
---------
Co-authored-by: Brad P <[email protected]>

…re calculation (livepeer#3074)
* fix(ai): Fix accuracy of T2I latency score when num_inference_steps provided
* refactor(ai): update numInferenceSteps default
  This commit ensures that the same numInferenceSteps default value is used as the one set in https://github.com/livepeer/ai-worker/blob/31fe460a45e1d9e908d3a1bdcfdd8822c3889214/runner/app/routes/text_to_image.py#L28.
---------
Co-authored-by: Elite Encoder <[email protected]>

* add upscale image support using stabilityai/stable-diffusion-x4-upscaler model
* fix(ai): fix ai-worker client bindings
  This commit ensures that the right golang client bindings response and request types are used. It also cleans up the codebase a bit.
---------
Co-authored-by: Mike Zupper <[email protected]>

* Add speech-to-text pipeline, refactor processAIRequest and handleAIRequest to allow for various response types
* Pin gomod to ai-runner for testing
* Revert "Pin gomod to ai-runner for testing"
  This reverts commit d4ba500.
* Update go mod dep for ai-worker
* Calculate pixel value of audio file
* fix go-mod deps
* Adjust price calculation
* one second per pixel
* cleanup, fix missing duration
* Add supported file types, calculate price by milliseconds
* Add bad request response for unsupported file types
* Update name of function
* Update go mod to ai-runner
* Use ffmpeg to get duration
* update install_ffmpeg.sh to parse audio better
* Check for audio codec instead of video codec
* gomod edits
* add docker file
* Update install_ffmpeg.sh to improve audio support, Add duration validation and logging, pin lpms
* rename speech-to-text to audio-to-text
* Update go-mod
* cleanup
* update go mod
* remove comment
* update gomod
* Update lpms mod
* Update to latest lpms
* Update lpms
* feat(ai): apply code improvements to AudioToText pipeline
  This commit applies several code improvements to the AudioToText codebase.
* Remove unnecessary logic
* Remove unused error
* Fix missing err
* Update go.mod and tidy
* chore(ai): update ai-worker and lpms to latest version
  This commit ensures that the ai-worker and lpms are at the latest versions which contain the changes needed for the audio-to-text pipeline.
---------
Co-authored-by: 0xb79orch <[email protected]>
Co-authored-by: Rick Staa <[email protected]>

* Add gateway metric for roundtrip ai times by model and pipeline
* Rename metrics and add unique manifest
* Fix name mismatch
* modelsRequested not working correctly
* feat: add initial POC AI gateway metrics
  This commit adds the initial AI gateway metrics so that they can be reviewed by others. The code still needs to be cleaned up and the buckets adjusted.
* feat: improve AI metrics
  This commit improves the AI metrics so that they are easier to work with.
* feat(ai): log no capacity error to metrics
  This commit ensures that an error is logged when the Gateway could not find orchestrators for a given model and capability.
* feat(ai): add TicketValueSent and TicketsSent metrics
  This commit ensures that the `ticket_value_sent` and `tickets_sent` metrics are also created for an AI Gateway.
* fix(ai): ensure that AI metrics have orch address label
  This commit ensures that the AI gateway metrics contain the orch address label.
* fix(ai): fix incorrect Gateway pricing metric
  This commit ensures that the AI job pricing is calculated correctly and cleans up the codebase.
* refactor(ai): remove Orch label from ai_request_price metric
  This commit removes the Orch label from the ai_request_price metrics since that information is better retrieved from another endpoint.
---------
Co-authored-by: Elite Encoder <[email protected]>

* Add gateway metric for roundtrip ai times by model and pipeline
* Rename metrics and add unique manifest
* Fix name mismatch
* modelsRequested not working correctly
* feat: add initial POC AI gateway metrics
  This commit adds the initial AI gateway metrics so that they can be reviewed by others. The code still needs to be cleaned up and the buckets adjusted.
* feat: improve AI metrics
  This commit improves the AI metrics so that they are easier to work with.
* feat(ai): log no capacity error to metrics
  This commit ensures that an error is logged when the Gateway could not find orchestrators for a given model and capability.
* feat(ai): add TicketValueSent and TicketsSent metrics
  This commit ensures that the `ticket_value_sent` and `tickets_sent` metrics are also created for an AI Gateway.
* fix(ai): ensure that AI metrics have orch address label
  This commit ensures that the AI gateway metrics contain the orch address label.
* feat(ai): add orchestrator AI census metrics
  This commit introduces a suite of AI orchestrator metrics to the census module, mirroring those received by the Gateway. The newly added metrics include `ai_models_requested`, `ai_request_latency_score`, `ai_request_price`, and `ai_request_errors`, facilitating comprehensive tracking and analysis of AI request handling performance on the orchestrator side.
* refactor: improve orchestrator metrics tags
  This commit ensures that the right tags are attached to the Orchestrator AI metrics.
* refactor(ai): improve latency score calculations
  This commit ensures that no divide-by-zero errors can occur in the latency score calculations.
---------
Co-authored-by: Elite Encoder <[email protected]>

Check for model folder when creating container or processing request

f0c5604

github-actions bot added the AI label (Issues and PR related to the AI-video branch.) May 6, 2024

eliteprox changed the title ~~Check-model-folder~~ Check if model folder exists on startup and request processing May 6, 2024

eliteprox commented May 6, 2024

server/ai_http.go (outdated)

Merge branch 'ai-video' into check-model-folder

41ecad3

eliteprox marked this pull request as ready for review May 6, 2024 14:36

eliteprox requested a review from rickstaa as a code owner May 6, 2024 14:36

eliteprox and others added 23 commits May 7, 2024 07:36

Remove comment

6a650eb

Patching go.mod for ci

4f2878e

Adding go sum

6d8aa0c

chore(ai): update 'ai-worker' dependency

2c2c954

This commit updates the 'ai-worker' dependency to the latest commit.

feat: add '-gateway' and deprecate '-broadcaster' (livepeer#3048)

1477aa8

This commit adds the `gateway` flag and deprecates the `broadcaster` flag per core team decision (details: https://discord.com/channels/423160867534929930/1051963444598943784/1210356864643109004).

chore: fix Mockgen dependency error

470f241

This commit ensures that the global https://pkg.go.dev/github.com/golang/mock/Mockgen package is correctly found when the binary is built using the makescript.

ci(ai): ensure docker builder is build and pushed

37f60f2

chore(ai): update ai-worker version

9e29834

This commit updates the ai-worker so that the right go bindings are available and no nil errors are thrown.

ci(ai): ensure livepeer builder builds on AI version tags

aa6abeb

This commit ensures that the livepeer builder is triggered when AI-version tags are used (e.g., `v0.7.2-ai-video-1`).

fix: apply runner nil error fix (livepeer#3058)

540e6f5

This commit ensures that the ai-worker is up to date so that no `nil` pointer runtime error is thrown when the runner container returns an empty response.

ci: fix syntax error in Docker action tags

e61eae6

This commit addresses a syntax error in the Docker image tag creation step.

fix(ai): fix cli prices nil error (livepeer#3063)

c5fe561

This commit ensures that the livepeer_cli does not throw a `nil` error when it tries to retrieve the orchestrator base price.

feat: add -aiRunnerImage flag to pin docker image ver (livepeer#3064)

a178197

This commit allows orchestrators to pin the https://hub.docker.com/r/livepeer/ai-runner image, preventing disruptions from breaking changes in the latest tag.

chore(ai): update ai-worker dependency

4424cb6

This commit updates the https://github.com/livepeer/ai-worker to the latest commit.

ci(docker): ensure stable tag is created on master branch

2cb0021

This commit ensures that the stable tag is created on the master branch.

rickstaa and others added 15 commits July 26, 2024 14:57

chore(ai): update ai-worker version

78ad6b3

This commit updates the ai-worker dependency to the latest version (i.e. v0.0.4).

chore(ai): update ai-worker to v0.0.5

ad63992

This commit updates the AI worker to v0.0.5 so that people can use the new I2I pix2pix model.

chore(ai): update ai-worker to latest version

8cf06a5

This commit updates the ai-worker to the latest version (i.e. v0.0.6) in order to fix a syntax error that was introduced due to an upstream dependency in v0.0.4 and v0.0.5.

chore(ai): update to latest ai-worker

fca00e7

This commit ensures that the go-livepeer ai-video branch uses the latest ai-worker dependency (i.e. v0.0.7).

chore: update Image2Image and Upscale OS storage to use requestID sim…

0fb62b7

…ilar to Text2Image and Image2Video (livepeer#3092)

fix(ai): account for number of images in I2I latency score (livepeer#…

d3881a6

…3093) This commit ensures that the I2I pipeline latency score calculation now considers the number of images.

feat(ai): add 'num_inference_steps' to I2I,I2V and upscale pipeliens (l…

3077092

…ivepeer#3099) This commit adds support for the `num_inference_steps` parameter to the I2I, I2V and upscale pipelines. It also fixes an incorrect latencyScore calculation for the bytedance model.

feat(ai): add A2T gateway metrics (livepeer#3100)

c57a53b

This commit adds the gateway metrics to the Audio-to-text pipeline.

ci(ai): improve ci comments

3f35c4f

This commit applies some small comment changes to ease the conflicts between the main and ai-video branch.

Adding go sum

cab3198

eliteprox closed this Jul 26, 2024
