
[Bugfix] Fix test_long_context.py and activation kernels #12111

Closed

Conversation

jeejeelee
Collaborator

@jeejeelee jeejeelee commented Jan 16, 2025

test_long_context.py Failure

test_long_context.py is currently failing (see the failure details at: test_long_context failure). The issue can be reproduced with the following code:

from vllm import LLM, SamplingParams
from vllm.distributed import cleanup_dist_env_and_memory


# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95)

model_name = "model_path"
llm = LLM(
    model=model_name,
    trust_remote_code=True,
    max_model_len=128,
    max_num_seqs=16,
    enforce_eager=True,
    tensor_parallel_size=1,
)
del llm
cleanup_dist_env_and_memory()
llm = LLM(
    model=model_name,
    trust_remote_code=True,
    max_model_len=128,
    max_num_seqs=16,
    enforce_eager=True,
    tensor_parallel_size=2,
)

outputs = llm.generate(prompts, sampling_params)

The error occurs because, during the platform check, CUDA_VISIBLE_DEVICES is written to the environment variables, and this variable is not cleaned up in cleanup_dist_env_and_memory, which results in the error mentioned above. @youkaichao Since I'm not sure whether this is the expected behavior, I'm only deleting it in test_long_context.py.
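
For illustration, here is a minimal workaround sketch of the cleanup described above, continuing the repro script; it is not the actual patch in this PR, and it assumes the leftover CUDA_VISIBLE_DEVICES value from the first initialization is what breaks the tensor_parallel_size=2 re-initialization:

import os

del llm
cleanup_dist_env_and_memory()
# Assumption: the platform check during the first LLM construction left a stale
# CUDA_VISIBLE_DEVICES behind; dropping it here mirrors the cleanup this PR
# applies in test_long_context.py.
os.environ.pop("CUDA_VISIBLE_DEVICES", None)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    max_model_len=128,
    max_num_seqs=16,
    enforce_eager=True,
    tensor_parallel_size=2,
)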

This PR also fixes the activation kernel bug.

Signed-off-by: Jee Jee Li <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@jeejeelee jeejeelee requested review from rkooo567 and youkaichao and removed request for rkooo567 January 16, 2025 08:50
@jeejeelee jeejeelee changed the title from "[[Bugfix] Fix test_long_context.py and activation kernels" to "[Bugfix] Fix test_long_context.py and activation kernels" Jan 16, 2025
@@ -30,7 +30,7 @@ class FatreluAndMul(CustomOp):
     def __init__(self, threshold: float = 0.):
         super().__init__()
         self.threshold = threshold
-        if current_platform.is_cuda_alike() or current_platform.is_cpu():
+        if current_platform.is_cuda_alike():
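
For context on what this guard selects: FatreluAndMul applies a thresholded (FATReLU-style) activation to the gate half of the last dimension and multiplies it with the other half; on CUDA-alike platforms the guard presumably enables the fused custom kernel instead of a plain PyTorch fallback. A rough sketch of that fallback, written here as an assumption rather than vLLM's exact implementation, is:

import torch
import torch.nn.functional as F


def fatrelu_and_mul_native(x: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    # Split the last dim into [gate, up]; zero out gate values <= threshold,
    # then gate the second half elementwise (assumed semantics).
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    return F.threshold(x1, threshold, 0.0) * x2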
Member

This is also solved in #12150. I prefer to merge that PR to fix the CPU CI test.

Collaborator Author

OK, I think it makes more sense to fix this LoRA failure in #12102, which can save one full round of CI. So I closed this PR; what do you think?

@youkaichao
Member

An update of the test script; it should be:

if __name__ == "__main__":
    from vllm import LLM, SamplingParams
    from vllm.distributed import cleanup_dist_env_and_memory


    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.0, top_p=0.95)

    model_name = "facebook/opt-125m"
    llm = LLM(
        model=model_name,
        trust_remote_code=True,
        max_model_len=128,
        max_num_seqs=16,
        enforce_eager=True,
        tensor_parallel_size=1,
    )
    del llm
    cleanup_dist_env_and_memory()
    llm = LLM(
        model=model_name,
        trust_remote_code=True,
        max_model_len=128,
        max_num_seqs=16,
        enforce_eager=True,
        tensor_parallel_size=2,
    )

    outputs = llm.generate(prompts, sampling_params)
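
The notable change from the original repro is the if __name__ == "__main__": guard, presumably needed because initializing with tensor_parallel_size=2 can spawn worker processes that re-import the main module; without the guard, the module-level code would run again in each worker.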

@jeejeelee jeejeelee closed this Jan 17, 2025
@jeejeelee jeejeelee deleted the fix-test branch January 20, 2025 02:41