
[Feature]: Does VLLM only support MistralModel Architecture for embedding? #7915

Closed · hahmad2008 opened this issue Aug 27, 2024 · 4 comments


hahmad2008 commented Aug 27, 2024

🚀 The feature, motivation and pitch

Does VLLM only support MistralModel Architecture for embedding?

```python
_EMBEDDING_MODELS = {
    "MistralModel": ("llama_embedding", "LlamaEmbeddingModel"),
}
```

I tried to force embedding mode on by setting `model_config.embedding_mode = True`, and this error was raised:

Activating the server engine with embedding enabled.
```
INFO 08-27 14:54:06 async_llm_engine.py:173] Added request embd-69a08211c22a4db9baa14c2da3db9dcd-0.
ERROR 08-27 14:54:06 async_llm_engine.py:56] Engine background task failed
ERROR 08-27 14:54:06 async_llm_engine.py:56] Traceback (most recent call last):
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 46, in _log_task_completion
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return_value = task.result()
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 637, in run_engine_loop
ERROR 08-27 14:54:06 async_llm_engine.py:56]     result = task.result()
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 578, in engine_step
ERROR 08-27 14:54:06 async_llm_engine.py:56]     request_outputs = await self.engine.step.remote()  # type: ignore
ERROR 08-27 14:54:06 async_llm_engine.py:56] ray.exceptions.RayTaskError(AttributeError): ray::_AsyncLLMEngine.step() (pid=40485, ip=10.5.8.112, actor_id=da65e597172ea5f7dea0a8b601000000, repr=<vllm.engine.async_llm_engine._AsyncLLMEngine object at 0x7fd4c6e27250>)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/concurrent/futures/_base.py", line 439, in result
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return self.__get_result()
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
ERROR 08-27 14:54:06 async_llm_engine.py:56]     raise self._exception
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 911, in step
ERROR 08-27 14:54:06 async_llm_engine.py:56]     output = self.model_executor.execute_model(
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/executor/ray_gpu_executor.py", line 273, in execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return super().execute_model(execute_model_req)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/executor/distributed_gpu_executor.py", line 76, in execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/executor/ray_gpu_executor.py", line 266, in _driver_execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return self.driver_worker.execute_method("execute_model",
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 383, in execute_method
ERROR 08-27 14:54:06 async_llm_engine.py:56]     raise e
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 374, in execute_method
ERROR 08-27 14:54:06 async_llm_engine.py:56]     return executor(*args, **kwargs)
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 236, in execute_model
ERROR 08-27 14:54:06 async_llm_engine.py:56]     self.model_runner.prepare_model_input(
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1227, in prepare_model_input
ERROR 08-27 14:54:06 async_llm_engine.py:56]     sampling_metadata = SamplingMetadata.prepare(seq_group_metadata_list,
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/model_executor/sampling_metadata.py", line 126, in prepare
ERROR 08-27 14:54:06 async_llm_engine.py:56]     ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
ERROR 08-27 14:54:06 async_llm_engine.py:56]   File "myenv/lib/python3.9/site-packages/vllm/model_executor/sampling_metadata.py", line 218, in _prepare_seq_groups
ERROR 08-27 14:54:06 async_llm_engine.py:56]     if sampling_params.seed is not None:
ERROR 08-27 14:54:06 async_llm_engine.py:56] AttributeError: 'NoneType' object has no attribute 'seed'
```
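For context on the traceback: forcing `embedding_mode = True` only flips a config flag, while the request still flows down the generation path, so `_prepare_seq_groups` dereferences the `sampling_params` that embedding requests never carry, hence the `AttributeError`. For comparison, a minimal sketch of the embedding path that was registered at the time, assuming a vLLM ~0.5.x install and the `intfloat/e5-mistral-7b-instruct` checkpoint (whose `MistralModel` architecture matches the registry above):

```python
from vllm import LLM

# Minimal sketch, not a definitive recipe. e5-mistral-7b-instruct reports
# the "MistralModel" architecture, so it matches _EMBEDDING_MODELS above
# and vLLM loads it as LlamaEmbeddingModel in embedding mode automatically.
llm = LLM(model="intfloat/e5-mistral-7b-instruct")

# encode() returns EmbeddingRequestOutput objects rather than generations.
outputs = llm.encode(["What is the capital of France?"])
print(len(outputs[0].outputs.embedding))  # embedding dimension
```

A model whose `config.json` reports any other architecture falls through this registry, gets loaded as a generation model, and ends up in the failing code path shown in the traceback.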

mgoin commented Aug 27, 2024

Yes, support for other model architectures for embedding still needs to be added.

hahmad2008 (Author) commented

@mgoin Does this need a separate implementation, or does it share the implementation with the current code?
Also, can I use Meta-Llama-3-8B-Instruct for embedding, or only models that are designed to be embedding models?

github-actions bot commented Nov 27, 2024

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Nov 27, 2024

mgoin commented Nov 27, 2024

Support for embedding models has greatly improved since then! https://docs.vllm.ai/en/latest/models/supported_models.html#text-embedding
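For reference, a minimal sketch of current usage, assuming a vLLM release new enough to accept the `task` argument (older releases spelled the value `"embedding"` rather than `"embed"`):

```python
from vllm import LLM

# Sketch under the assumption above: the task argument routes the model
# through the pooling/embedding runner instead of the generation runner,
# so supported embedding architectures beyond MistralModel work as well.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

outputs = llm.encode(["What is the capital of France?"])
print(len(outputs[0].outputs.embedding))
```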

mgoin closed this as completed Nov 27, 2024