[core] ConsisID #10140

Merged
merged 72 commits into from
Jan 19, 2025

Conversation

SHYuanBest
Contributor

@SHYuanBest SHYuanBest commented Dec 6, 2024

What does this PR do?

Add support for ConsisID (#10100)

Paper: https://arxiv.org/abs/2411.17440
Project: https://pku-yuangroup.github.io/ConsisID
Code: https://github.com/PKU-YuanGroup/ConsisID
Demo: https://huggingface.co/spaces/BestWishYsh/ConsisID-preview-Space

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SHYuanBest
Contributor Author

@a-r-r-o-w Do we need to create a ConsisID branch under huggingface, or can I just use SHYuanBest:main?

@a-r-r-o-w
Member

SHYuanBest:main works. This is just a branch from your diffusers fork to HF diffusers library, so you are free to make any changes you'd like here. Looking forward to the ConsisID changes!

@SHYuanBest
Contributor Author

SHYuanBest commented Dec 10, 2024

@a-r-r-o-w @HuggingFaceDocBuilderDev Hi, I have added ConsisID to this branch. Could you help us review the code? Is there anything else I missed?

import torch
from diffusers import ConsisIDPipeline
from diffusers.pipelines.consisid.consisid_utils import prepare_face_models, process_face_embeddings_infer
from diffusers.utils import export_to_video
from huggingface_hub import snapshot_download

# Download the checkpoint to a local directory of the same name.
snapshot_download(repo_id="BestWishYsh/ConsisID-preview", local_dir="BestWishYsh/ConsisID-preview")

# Load the face helper models and the normalization statistics used for identity extraction.
face_helper_1, face_helper_2, face_clip_model, face_main_model, eva_transform_mean, eva_transform_std = (
    prepare_face_models("BestWishYsh/ConsisID-preview", device="cuda", dtype=torch.bfloat16)
)

# Load the pipeline in bfloat16 and move it to the GPU.
pipe = ConsisIDPipeline.from_pretrained("BestWishYsh/ConsisID-preview", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel."
image = "https://github.com/PKU-YuanGroup/ConsisID/blob/main/asserts/example_images/2.png?raw=true"

# Extract identity features and face keypoints from the reference image.
id_cond, id_vit_hidden, image, face_kps = process_face_embeddings_infer(
    face_helper_1, face_clip_model, face_helper_2, eva_transform_mean, eva_transform_std,
    face_main_model, "cuda", torch.bfloat16, image, is_align_face=True,
)

# Generate the video with a fixed seed, then write it to disk.
video = pipe(
    image=image, prompt=prompt, use_dynamic_cfg=False,
    id_vit_hidden=id_vit_hidden, id_cond=id_cond, kps_cond=face_kps,
    generator=torch.Generator("cuda").manual_seed(42),
)
export_to_video(video.frames[0], "output.mp4", fps=8)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SHYuanBest SHYuanBest requested a review from hlky December 10, 2024 09:04
@SHYuanBest
Contributor Author

SHYuanBest commented Dec 11, 2024

@a-r-r-o-w @hlky hi, what should I do next?

@SHYuanBest
Contributor Author

SHYuanBest commented Dec 22, 2024

to do:

  • Make the test scripts very small and get all of them passing (model, pipeline, LoRA).
  • Check whether test_vae_tiling requires expected_max_diff == 0.35.
  • Provide a conversion script for nn.Sequential.
  • Merge https://huggingface.co/datasets/huggingface/documentation-images/discussions/406 and update the Doc links.
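For readers unfamiliar with why a tiling test needs a tolerance at all, here is a minimal sketch of the idea. It is not the diffusers test: `smooth`, `smooth_tiled`, and `max_abs_diff` are hypothetical helpers, and a 3-point moving average stands in for a decoder. Only the name `expected_max_diff` and the value 0.35 come from the to-do item above.

```python
def smooth(xs):
    """3-point moving average with clamped edges (hypothetical stand-in for a decoder)."""
    n = len(xs)
    return [(xs[max(i - 1, 0)] + xs[i] + xs[min(i + 1, n - 1)]) / 3 for i in range(n)]


def smooth_tiled(xs, tile):
    """Apply `smooth` to consecutive tiles independently, with no overlap or blending."""
    out = []
    for start in range(0, len(xs), tile):
        out.extend(smooth(xs[start:start + tile]))
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length outputs."""
    return max(abs(x - y) for x, y in zip(a, b))


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
full = smooth(signal)                  # processed in one pass
tiled = smooth_tiled(signal, tile=4)   # processed in two independent tiles

diff = max_abs_diff(full, tiled)
expected_max_diff = 0.35               # tolerance value taken from the to-do item
passes = diff < expected_max_diff
```

The two results differ only near the tile seam, because each tile cannot see its neighbor's values. A tolerance-based check accepts that seam error as long as it stays below the chosen threshold.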

@a-r-r-o-w
Member

@SHYuanBest Great work on the changes! We will try and integrate this soon and target it for next diffusers release (we have one this week, which is why we've been very busy). On your end, I think we are mostly good with the changes, and just need to address some minor concerns for diffusers-side integration. I will let YiYi comment and do her review first and then we can tackle the remaining things

@SHYuanBest
Contributor Author

@a-r-r-o-w @yiyixuxu That's great, many thanks for your support! Looking forward to the merge.

@a-r-r-o-w
Member

Gentle ping @yiyixuxu

@a-r-r-o-w a-r-r-o-w requested a review from yiyixuxu January 4, 2025 21:50
@SHYuanBest
Contributor Author

Could you help review the code and merge? Thanks @yiyixuxu

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks!

@SHYuanBest
Contributor Author

> @SHYuanBest Great work on the changes! We will try and integrate this soon and target it for next diffusers release (we have one this week, which is why we've been very busy). On your end, I think we are mostly good with the changes, and just need to address some minor concerns for diffusers-side integration. I will let YiYi comment and do her review first and then we can tackle the remaining things

@a-r-r-o-w Hi, it seems that yiyixuxu has approved the changes. Could you help merge https://huggingface.co/datasets/huggingface/documentation-images/discussions/406 (so that I can update the doc links) and tackle the remaining things? Thanks!

@a-r-r-o-w
Member

@SHYuanBest I've merged the doc PR just now :) Will do some last refactors after your changes and proceed to merge. I think it's okay to not have a conversion script in this specific case, so please don't worry about that

@SHYuanBest
Contributor Author

@a-r-r-o-w Thanks a lot! I have updated the doc links.

@a-r-r-o-w
Member

@SHYuanBest Could you give the latest changes a look? It seems to be working for me locally as expected.

The major changes in the refactor are:

  • Removed the LoRA loader specific to ConsisID. We can reuse the CogVideoX LoRA loader here because the underlying transformer architecture is the same, and there is a very low probability that users will train LoRAs for other modeling components
  • Removed helper functions where not required
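The loader reuse described in the first bullet can be pictured with a small sketch. This is plain Python, not diffusers code: `SharedLoraLoaderMixin`, `BaseVideoPipeline`, `IdentityVideoPipeline`, the `transformer.` key prefix, and the checkpoint keys are all hypothetical names chosen for illustration. The assumption is simply that two pipelines with the same transformer can inherit one loader that only touches transformer weights.

```python
class SharedLoraLoaderMixin:
    """Hypothetical stand-in for a LoRA loader shared by several pipelines."""

    # Only keys under this prefix are treated as LoRA weights (illustrative convention).
    lora_prefix = "transformer."

    def load_lora_weights(self, state_dict):
        # Keep the keys that target the transformer and strip the prefix.
        selected = {
            key[len(self.lora_prefix):]: value
            for key, value in state_dict.items()
            if key.startswith(self.lora_prefix)
        }
        self.lora_weights = selected
        return sorted(selected)


class BaseVideoPipeline(SharedLoraLoaderMixin):
    """Hypothetical pipeline built on the shared transformer architecture."""


class IdentityVideoPipeline(SharedLoraLoaderMixin):
    """Hypothetical second pipeline: same transformer, plus extra face modules."""


checkpoint = {
    "transformer.blocks.0.attn.lora_A": [0.1, 0.2],
    "transformer.blocks.0.attn.lora_B": [0.3, 0.4],
    "face_encoder.proj.lora_A": [0.5],  # skipped: not a transformer weight
}

base_keys = BaseVideoPipeline().load_lora_weights(checkpoint)
identity_keys = IdentityVideoPipeline().load_lora_weights(checkpoint)
```

Both pipelines end up with the same transformer keys, which is why a second, pipeline-specific loader adds nothing unless LoRAs for the other components need to be supported.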

@SHYuanBest
Contributor Author

SHYuanBest commented Jan 18, 2025

@a-r-r-o-w Thanks, I have looked at the latest changes. They look good to me, and the code runs normally as expected.

@a-r-r-o-w a-r-r-o-w merged commit 23b467c into huggingface:main Jan 19, 2025
11 of 12 checks passed
Labels
roadmap Add to current release roadmap