
Actions: stanford-crfm/helm

Showing runs from all workflows
4,105 workflow runs

Add Deepseek-R1 model (#3305)
Test #7961: Commit cd9e516 pushed by yifanmai
February 5, 2025 00:09 9m 57s main
Add ECHR Judgment Classification scenario
Test #7959: Pull request #3311 opened by yifanmai
February 4, 2025 22:59 10m 34s yifanmai/fix-echr-judge
Add QwQ model on Together AI (#3307)
Test #7958: Commit f62ea62 pushed by yifanmai
February 4, 2025 22:17 9m 50s main
Add QwQ model on Together AI
Test #7957: Pull request #3307 synchronize by yifanmai
February 4, 2025 22:16 10m 8s yifanmai/qwq
Fixes for BigCodeBench Evaluator
Test #7956: Pull request #3310 opened by yifanmai
February 4, 2025 22:15 10m 35s yifanmai/bigcodebench-evaluator
Merged main
Frontend #705: Commit dcca97a pushed by MiguelAFH
February 4, 2025 19:38 1m 6s med-helm
Add Spider 1.0 scenario (#3300)
Test #7955: Commit 5a50569 pushed by yifanmai
February 4, 2025 17:17 9m 53s main
Switch aggregation for tables benchmark from win rate to mean (#3309)
Test #7954: Commit 6fb429e pushed by yifanmai
February 4, 2025 17:17 10m 34s main
Scenario tests
Scenario tests #259: Scheduled
February 4, 2025 15:34 8m 17s main
Switch aggregation for tables benchmark from win rate to mean
Test #7953: Pull request #3309 synchronize by yifanmai
February 4, 2025 01:46 9m 57s yifanmai/mean-tables
Switch aggregation for tables benchmark from win rate to mean
Test #7952: Pull request #3309 opened by yifanmai
February 4, 2025 01:22 10m 29s yifanmai/mean-tables
Add support to redact model outputs (#3301)
Test #7951: Commit 714a97d pushed by MiguelAFH
February 3, 2025 22:40 9m 59s main
Add support to redact model outputs
Test #7950: Pull request #3301 synchronize by MiguelAFH
February 3, 2025 22:27 9m 54s redact-output
Add Mistral Small 3 model (#3308)
Test #7949: Commit 2401e5e pushed by yifanmai
February 3, 2025 17:56 9m 44s main
Scenario tests
Scenario tests #258: Scheduled
February 3, 2025 15:34 8m 13s main
Scenario tests
Scenario tests #257: Scheduled
February 2, 2025 15:34 7m 15s main
Scenario tests
Scenario tests #256: Scheduled
February 1, 2025 15:34 7m 20s main
Add Phi 3.5 models (#3306)
Test #7948: Commit 228e0f1 pushed by yifanmai
February 1, 2025 04:01 9m 39s main
Add Mistral Small 3 model
Test #7947: Pull request #3308 opened by yifanmai
February 1, 2025 01:24 10m 41s yifanmai/mistral-small-3
Add QwQ model on Together AI
Test #7946: Pull request #3307 opened by yifanmai
February 1, 2025 01:13 9m 46s yifanmai/qwq
Add Phi 3.5 models
Test #7945: Pull request #3306 synchronize by yifanmai
February 1, 2025 01:13 9m 46s yifanmai/phi-3.5
Add Phi 3.5 models
Test #7944: Pull request #3306 opened by yifanmai
February 1, 2025 00:58 10m 7s yifanmai/phi-3.5
Add Deepseek-R1 model
Test #7943: Pull request #3305 opened by yifanmai
February 1, 2025 00:18 9m 46s yifanmai/deepseek-r1
Add o3-mini model
Test #7942: Pull request #3304 opened by yifanmai
January 31, 2025 23:08 9m 43s yifanmai/openai-o3