Actions: huggingface/trl

Hugging Face Issue Labeler

75 workflow runs

OOM for 7B model on A100 80Gb
Hugging Face Issue Labeler #75: Issue #2719 opened by JohnConnor123
January 31, 2025 13:17 36s
AttributeError: 'AutoModelForCausalLMWithValueHead' object has no attribute 'base_model_prefix'
Hugging Face Issue Labeler #74: Issue #2718 opened by Tarak200
January 31, 2025 10:08 52s
GRPO for RL on agent trajectories
Hugging Face Issue Labeler #73: Issue #2715 opened by korbinian-hoermann
January 31, 2025 09:09 51s
Isn't the reward *minimized* when len(completion)==20 if this is the reward function?
Hugging Face Issue Labeler #72: Issue #2714 opened by cfpark00
January 31, 2025 09:03 22s
GRPO with tool calling
Hugging Face Issue Labeler #71: Issue #2712 opened by accupham
January 31, 2025 07:25 26s
LoRA 'trainable params: 0'
Hugging Face Issue Labeler #70: Issue #2711 opened by shannonruxin
January 31, 2025 04:50 28s
Examples in training VDPO on llava1.6
Hugging Face Issue Labeler #69: Issue #2710 opened by lucasjinreal
January 31, 2025 04:22 42s
GRPO memory bottleneck from num_generations in compute_loss
Hugging Face Issue Labeler #68: Issue #2709 opened by willccbb
January 31, 2025 03:54 40s
PPOTrainer + LoRA and Continued Training
Hugging Face Issue Labeler #67: Issue #2707 opened by kooryan
January 30, 2025 20:19 37s
Multi-GPU sampling for vLLM in GRPO Trainer
Hugging Face Issue Labeler #66: Issue #2706 opened by nch0w
January 30, 2025 20:09 25s
January 30, 2025 19:09 34s
GRPO: Why does loss start at 0 for first K steps and then increase over time?
Hugging Face Issue Labeler #64: Issue #2703 opened by arnavgarg1
January 30, 2025 18:27 28s
Exposing GenerationConfig in the GRPO Trainer
Hugging Face Issue Labeler #63: Issue #2702 opened by Superskyyy
January 30, 2025 18:00 28s
Allow pretokenized dataset in GRPO Trainer
Hugging Face Issue Labeler #62: Issue #2701 opened by Superskyyy
January 30, 2025 17:57 27s
GRPO VLLM does not work with Lora
Hugging Face Issue Labeler #61: Issue #2698 opened by gagan3012
January 30, 2025 16:03 44s
I cannot launch PPOTrainning script with accelerate launch
Hugging Face Issue Labeler #60: Issue #2696 opened by daehuikim
January 30, 2025 15:38 30s
OOM 8xH100 using latest GRPO code with vLLM
Hugging Face Issue Labeler #59: Issue #2688 opened by abacaj
January 30, 2025 05:55 31s
empty Cache after logps_per_token
Hugging Face Issue Labeler #58: Issue #2686 opened by shirinyamani
January 29, 2025 23:02 36s
rewards_funcs set to eval mode
Hugging Face Issue Labeler #57: Issue #2685 opened by shirinyamani
January 29, 2025 22:29 31s
Support iterative GRPO
Hugging Face Issue Labeler #56: Issue #2684 opened by howardzhou
January 29, 2025 19:37 33s
About the Implementation of GRPO
Hugging Face Issue Labeler #55: Issue #2681 opened by macheng6
January 29, 2025 08:26 30s
Ability to provide a static completion for GRPO
Hugging Face Issue Labeler #54: Issue #2680 opened by Palmik
January 29, 2025 08:25 31s
How is this Possible?
Hugging Face Issue Labeler #52: Issue #2675 opened by August-murr
January 28, 2025 12:51 30s
TypeError: type list doesn't define __round__ method - why I am getting this error
Hugging Face Issue Labeler #51: Issue #2674 opened by Tarak200
January 28, 2025 11:54 24s