[Bugfix] Fix test_long_context.py and activation kernels #12111
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
```diff
@@ -30,7 +30,7 @@ class FatreluAndMul(CustomOp):

     def __init__(self, threshold: float = 0.):
         super().__init__()
         self.threshold = threshold
-        if current_platform.is_cuda_alike() or current_platform.is_cpu():
+        if current_platform.is_cuda_alike():
```
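For context, the change restricts the custom (compiled) activation path to CUDA-like platforms, so CPU falls back to the native implementation. A minimal native-PyTorch sketch of what `FatreluAndMul` computes (an illustration of the op's semantics, not the actual kernel; the reference helper name is mine):

```python
import torch


def fatrelu_and_mul_ref(x: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    # Split the last dimension in half: the first half goes through FATReLU
    # (values at or below the threshold are zeroed) and gates the second half.
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    x1 = torch.where(x1 > threshold, x1, torch.zeros_like(x1))
    return x1 * x2
```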
This is also solved in #12150. I prefer to merge that PR to fix the CPU CI test.
OK, I think it makes more sense to fix this LoRA failure in #12102, which saves one full round of CI. So I closed this PR; what do you think?
An update of the test script; it should be wrapped in an `if __name__ == "__main__":` guard:

```python
from vllm import LLM, SamplingParams
from vllm.distributed import cleanup_dist_env_and_memory

if __name__ == "__main__":
    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.0, top_p=0.95)
    model_name = "facebook/opt-125m"

    # First instance: single GPU.
    llm = LLM(
        model=model_name,
        trust_remote_code=True,
        max_model_len=128,
        max_num_seqs=16,
        enforce_eager=True,
        tensor_parallel_size=1,
    )
    del llm
    cleanup_dist_env_and_memory()

    # Second instance: tensor parallelism across two GPUs.
    llm = LLM(
        model=model_name,
        trust_remote_code=True,
        max_model_len=128,
        max_num_seqs=16,
        enforce_eager=True,
        tensor_parallel_size=2,
    )
    outputs = llm.generate(prompts, sampling_params)
```
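For reference: the script builds a single-GPU `LLM`, tears it down with `cleanup_dist_env_and_memory`, and then builds a `tensor_parallel_size=2` instance; per the description below, it is presumably this second construction that hits the stale `CUDA_VISIBLE_DEVICES` value and fails.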
test_long_context.py Failure

`test_long_context.py` is currently failing (see failure details at: test_long_context failure). The issue can be reproduced with the script shown above. The error occurs because the platform check writes `CUDA_VISIBLE_DEVICES` into the environment variables, and the variable is not properly cleaned up by `cleanup_dist_env_and_memory`, resulting in the error mentioned above. @youkaichao Since I'm not sure whether this is the expected behavior, I'm only deleting it in `test_long_context.py`.

Also fixed the activation kernel bug.
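A minimal sketch of the kind of cleanup the description points at, assuming the stale value lives in `os.environ` (the helper below is hypothetical; the exact variables vLLM sets and where it should reset them are not confirmed by this thread):

```python
import os


def cleanup_visible_devices() -> None:
    # Hypothetical helper: drop a CUDA_VISIBLE_DEVICES value that an earlier
    # platform check may have written into the process environment, so that a
    # later tensor_parallel_size=2 run can see all GPUs again.
    os.environ.pop("CUDA_VISIBLE_DEVICES", None)
```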