Workflow runs · stanford-crfm/helm


Showing runs from all workflows

4,105 workflow runs

Add Deepseek-R1 model (#3305) Test #7961: Commit cd9e516 pushed by yifanmai

main

February 5, 2025 00:09

9m 57s

Add multiple annotators to Omni-MATH and rename shared modules Test #7960: Pull request #3291 synchronize by yifanmai

jialiang/multiple_annotator

February 4, 2025 23:41

9m 58s

Add ECHR Judgment Classification scenario Test #7959: Pull request #3311 opened by yifanmai

yifanmai/fix-echr-judge

February 4, 2025 22:59

10m 34s

Add QwQ model on Together AI (#3307) Test #7958: Commit f62ea62 pushed by yifanmai

main

February 4, 2025 22:17

9m 50s

Add QwQ model on Together AI Test #7957: Pull request #3307 synchronize by yifanmai

yifanmai/qwq

February 4, 2025 22:16

10m 8s

Fixes for BigCodeBench Evaluator Test #7956: Pull request #3310 opened by yifanmai

yifanmai/bigcodebench-evaluator

February 4, 2025 22:15

10m 35s

Merged main Frontend #705: Commit dcca97a pushed by MiguelAFH

med-helm

February 4, 2025 19:38

1m 6s

Add Spider 1.0 scenario (#3300) Test #7955: Commit 5a50569 pushed by yifanmai

main

February 4, 2025 17:17

9m 53s

Switch aggregation for tables benchmark from win rate to mean (#3309) Test #7954: Commit 6fb429e pushed by yifanmai

main

February 4, 2025 17:17

10m 34s

Scenario tests Scenario tests #259: Scheduled

main

February 4, 2025 15:34

8m 17s

Switch aggregation for tables benchmark from win rate to mean Test #7953: Pull request #3309 synchronize by yifanmai

yifanmai/mean-tables

February 4, 2025 01:46

9m 57s

Switch aggregation for tables benchmark from win rate to mean Test #7952: Pull request #3309 opened by yifanmai

yifanmai/mean-tables

February 4, 2025 01:22

10m 29s

Add support to redact model outputs (#3301) Test #7951: Commit 714a97d pushed by MiguelAFH

main

February 3, 2025 22:40

9m 59s

Add support to redact model outputs Test #7950: Pull request #3301 synchronize by MiguelAFH

redact-output

February 3, 2025 22:27

9m 54s

Add Mistral Small 3 model (#3308) Test #7949: Commit 2401e5e pushed by yifanmai

main

February 3, 2025 17:56

9m 44s

Scenario tests Scenario tests #258: Scheduled

main

February 3, 2025 15:34

8m 13s

Scenario tests Scenario tests #257: Scheduled

main

February 2, 2025 15:34

7m 15s

Scenario tests Scenario tests #256: Scheduled

main

February 1, 2025 15:34

7m 20s

Add Phi 3.5 models (#3306) Test #7948: Commit 228e0f1 pushed by yifanmai

main

February 1, 2025 04:01

9m 39s

Add Mistral Small 3 model Test #7947: Pull request #3308 opened by yifanmai

yifanmai/mistral-small-3

February 1, 2025 01:24

10m 41s

Add QwQ model on Together AI Test #7946: Pull request #3307 opened by yifanmai

yifanmai/qwq

February 1, 2025 01:13

9m 46s

Add Phi 3.5 models Test #7945: Pull request #3306 synchronize by yifanmai

yifanmai/phi-3.5

February 1, 2025 01:13

9m 46s

Add Phi 3.5 models Test #7944: Pull request #3306 opened by yifanmai

yifanmai/phi-3.5

February 1, 2025 00:58

10m 7s

Add Deepseek-R1 model Test #7943: Pull request #3305 opened by yifanmai

yifanmai/deepseek-r1

February 1, 2025 00:18

9m 46s

Add o3-mini model Test #7942: Pull request #3304 opened by yifanmai

yifanmai/openai-o3

January 31, 2025 23:08

9m 43s
