
nomic-ai/nomic-embed-text-v1.5 fails #1680

Closed
Muennighoff opened this issue Jan 2, 2025 · 2 comments · Fixed by #1683

Comments

@Muennighoff
Contributor

2025-01-02 01:24:48.412894: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-02 01:24:48.427287: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-02 01:24:48.431289: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='nomic-ai/nomic-embed-text-v1.5', task_types=None, categories=None, tasks=['NorwegianCourtsBitextMining'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=64, overwrite=False, save_predictions=False, func=<function run at 0x7fa3287d7d00>)
WARNING:transformers_modules.nomic-ai.nomic-bert-2048.40b98394640e630d5276807046089b233113aa87.modeling_hf_nomic_bert:<All keys matched successfully>
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
─────────────────────────────── Selected tasks  ────────────────────────────────
BitextMining
    - NorwegianCourtsBitextMining, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating NorwegianCourtsBitextMining **********************
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/kardosdrur___norwegian-courts/default/0.0.0/d79af07e969a6678fcbbe819956840425816468f
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/kardosdrur___norwegian-courts/default/0.0.0/d79af07e969a6678fcbbe819956840425816468f
Found cached dataset norwegian-courts (/data/huggingface/datasets/kardosdrur___norwegian-courts/default/0.0.0/d79af07e969a6678fcbbe819956840425816468f)
INFO:datasets.builder:Found cached dataset norwegian-courts (/data/huggingface/datasets/kardosdrur___norwegian-courts/default/0.0.0/d79af07e969a6678fcbbe819956840425816468f)
Loading Dataset info from /data/huggingface/datasets/kardosdrur___norwegian-courts/default/0.0.0/d79af07e969a6678fcbbe819956840425816468f
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/kardosdrur___norwegian-courts/default/0.0.0/d79af07e969a6678fcbbe819956840425816468f
INFO:mteb.abstasks.AbsTaskBitextMining:
Task: NorwegianCourtsBitextMining, split: test, subset: default. Running...

Encoding 2x228 sentences:   0%|          | 0/2 [00:00<?, ?it/s]INFO:mteb.models.wrapper:No combination of task name and prompt type was found in model prompts.

Encoding 2x228 sentences:  50%|█████     | 1/2 [00:00<00:00,  3.17it/s]INFO:mteb.models.wrapper:No combination of task name and prompt type was found in model prompts.

Encoding 2x228 sentences: 100%|██████████| 2/2 [00:00<00:00,  5.03it/s]
Encoding 2x228 sentences: 100%|██████████| 2/2 [00:00<00:00,  4.62it/s]

Matching sentences:   0%|          | 0/1 [00:00<?, ?it/s]INFO:mteb.evaluation.evaluators.BitextMiningEvaluator:Finding nearest neighbors...

Matching sentences:   0%|          | 0/1 [00:00<?, ?it/s]
ERROR:mteb.evaluation.MTEB:Error while evaluating NorwegianCourtsBitextMining: expected np.ndarray (got Tensor)
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 387, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 145, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 623, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 562, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mteb/mteb/abstasks/AbsTaskBitextMining.py", line 104, in evaluate
    scores[hf_subet] = self._evaluate_subset(
  File "/data/niklas/mteb/mteb/abstasks/AbsTaskBitextMining.py", line 137, in _evaluate_subset
    metrics = evaluator(model, encode_kwargs=encode_kwargs)
  File "/data/niklas/mteb/mteb/evaluation/evaluators/BitextMiningEvaluator.py", line 42, in __call__
    scores = self.compute_metrics(model, encode_kwargs=encode_kwargs)
  File "/data/niklas/mteb/mteb/evaluation/evaluators/BitextMiningEvaluator.py", line 64, in compute_metrics
    scores[f"{key1}-{key2}"] = self._compute_metrics(
  File "/data/niklas/mteb/mteb/evaluation/evaluators/BitextMiningEvaluator.py", line 83, in _compute_metrics
    nearest_neighbors = self._similarity_search(
  File "/data/niklas/mteb/mteb/evaluation/evaluators/BitextMiningEvaluator.py", line 131, in _similarity_search
    query_embeddings = torch.from_numpy(query_embeddings)
TypeError: expected np.ndarray (got Tensor)
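The failing call in the traceback is `torch.from_numpy(query_embeddings)` on an input that is already a `torch.Tensor`. A minimal sketch of a guard that avoids the error by only converting when the input is not already a tensor (the helper name `to_tensor` is illustrative, not part of mteb):

```python
import numpy as np
import torch

def to_tensor(embeddings):
    """Accept either np.ndarray or torch.Tensor.

    torch.from_numpy raises `TypeError: expected np.ndarray (got Tensor)`
    when handed a tensor, which is the failure reported above.
    """
    if isinstance(embeddings, torch.Tensor):
        return embeddings
    return torch.from_numpy(np.asarray(embeddings))

# Either input type now yields a torch.Tensor.
query_embeddings = torch.randn(4, 8)                       # model returned a Tensor
corpus_embeddings = np.random.rand(4, 8).astype(np.float32)
assert isinstance(to_tensor(query_embeddings), torch.Tensor)
assert isinstance(to_tensor(corpus_embeddings), torch.Tensor)
```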

@Samoed mentioned this issue Jan 2, 2025
@isaac-chung
Collaborator

@gowitheflow-1998 @KennethEnevoldsen related observation: I believe most (if not all) models added for MIEB return embeddings as tensors. The interface for an ImageEncoder specifies np.ndarray instead, so as long as we follow that (or any agreed-upon) interface, we should be good.
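One way to follow an np.ndarray interface regardless of what the underlying model returns is a thin output-converting wrapper. This is a hedged sketch, not mteb's actual wrapper code; the class and the inner model's `encode` signature are assumptions for illustration:

```python
import numpy as np
import torch

class NumpyOutputWrapper:
    """Illustrative sketch: coerce a model's encode() output to np.ndarray
    so it matches an np.ndarray-returning encoder interface."""

    def __init__(self, model):
        self.model = model

    def encode(self, sentences, **kwargs):
        out = self.model.encode(sentences, **kwargs)
        if isinstance(out, torch.Tensor):
            # Move off the GPU and drop the autograd graph before converting.
            out = out.detach().cpu().numpy()
        return np.asarray(out)

class FakeTensorModel:
    """Stand-in for a model that returns torch tensors."""
    def encode(self, sentences, **kwargs):
        return torch.ones(len(sentences), 4)

wrapped = NumpyOutputWrapper(FakeTensorModel())
emb = wrapped.encode(["a", "b"])
assert isinstance(emb, np.ndarray) and emb.shape == (2, 4)
```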

@KennethEnevoldsen
Contributor

Yep, generally agree. However, I would love to have an update that ensures we can also work with torch tensors (#941)
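Supporting torch tensors directly, as #941 asks, could look like a similarity search that normalizes both input types up front. A sketch under those assumptions; the function name and signature are illustrative, not mteb's actual `_similarity_search`:

```python
import numpy as np
import torch

def cos_sim_topk(query, corpus, k=1):
    """Cosine-similarity top-k that accepts np.ndarray or torch.Tensor
    for either argument."""
    def as_tensor(x):
        return x if isinstance(x, torch.Tensor) else torch.from_numpy(np.asarray(x))

    q = torch.nn.functional.normalize(as_tensor(query).float(), dim=-1)
    c = torch.nn.functional.normalize(as_tensor(corpus).float(), dim=-1)
    scores = q @ c.T
    top = torch.topk(scores, k=min(k, c.shape[0]), dim=-1)
    return top.values, top.indices

# Mixed inputs: numpy queries against a torch corpus.
q = np.eye(3, dtype=np.float32)
vals, idx = cos_sim_topk(q, torch.from_numpy(q), k=1)
assert idx.squeeze(-1).tolist() == [0, 1, 2]
```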
