v4.49.0: Helium, Qwen2.5-VL, SuperGlue, Granite Vision, Zamba2, GOT-OCR 2.0, DAB-DETR, Depth Pro, RT-DETRv2, GPTQModel
New models
Helium
Helium-1 preview is a lightweight language model with 2B parameters, targeting edge and mobile devices. It supports the following languages: English, French, German, Italian, Portuguese, Spanish.

- Add-helium by @ArthurZucker in #35669
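A minimal generation sketch follows; the checkpoint id is an assumption, so check the Hub for the exact Helium-1 preview repository name.

```python
# Minimal sketch, assuming "kyutai/helium-1-preview-2b" is the released checkpoint id.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="kyutai/helium-1-preview-2b",  # assumed checkpoint name
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(generator("Bonjour, je m'appelle", max_new_tokens=30)[0]["generated_text"])
```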
Qwen2.5-VL
The Qwen2.5-VL model is an update to Qwen2-VL from Qwen team, Alibaba Group.
The abstract from this update is the following:
Qwen2.5-VL marks a major step forward from Qwen2-VL, built upon the latest Qwen2.5 LLM. We’ve accelerated training and testing through the strategic implementation of window attention within the ViT. The ViT architecture itself has been refined with SwiGLU and RMSNorm, aligning it more closely with the LLM’s structure. A key innovation is the expansion of native dynamic resolution to encompass the temporal dimension, in addition to spatial aspects. Furthermore, we’ve upgraded MRoPE, incorporating absolute time alignment on the time axis to allow the model to effectively capture temporal dynamics, regardless of frame rate, leading to superior video understanding.
- add qwen2.5vl by @ShuaiBai623 in #35569
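A hedged single-image chat sketch is below; the model class matches this addition, while the checkpoint id and image URL are assumptions.

```python
# Sketch of single-image chat with Qwen2.5-VL; checkpoint id and image URL are assumptions.
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```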
SuperGlue
The SuperGlue model was proposed in SuperGlue: Learning Feature Matching with Graph Neural Networks by Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
This model matches two sets of interest points detected across a pair of images. Paired with the SuperPoint model, it can be used to match two images and estimate the pose between them. This model is useful for tasks such as image matching, homography estimation, etc.

- Add SuperGlue model by @sbucaille in #29886
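The sketch below matches keypoints across an image pair; the checkpoint id and post-processing call follow the model documentation, but treat both as assumptions.

```python
# Sketch: match keypoints between two images with SuperGlue (checkpoint id assumed).
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

url1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
url2 = "http://images.cocodataset.org/val2017/000000039769.jpg"  # placeholder second view
images = [Image.open(requests.get(u, stream=True).raw) for u in (url1, url2)]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")
inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)

# Keep matches above a confidence threshold; returns keypoints and scores per image pair.
image_sizes = [[(im.height, im.width) for im in images]]
matches = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
```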
Granite Vision Support
The Granite Vision model is a variant of LLaVA-NeXT, leveraging a Granite language model alongside a SigLIP visual encoder. It utilizes multiple concatenated vision hidden states as its image features, similar to VipLlava. It also uses a larger set of image grid pinpoints than the original LLaVA-NeXT models to support additional aspect ratios.
- Granite Vision Support by @alex-jw-brooks in #35579
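Because Granite Vision is a LLaVA-NeXT variant, it should load through the existing LLaVA-NeXT classes; a minimal sketch, assuming the checkpoint id:

```python
# Sketch: Granite Vision rides on the LLaVA-NeXT implementation (checkpoint id assumed).
from transformers import AutoProcessor, LlavaNextForConditionalGeneration

model_id = "ibm-granite/granite-vision-3.1-2b-preview"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id, device_map="auto")
```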
Zamba2
Zamba2 is a large language model (LLM) trained by Zyphra and made available under an Apache 2.0 license.
Zamba2-1.2B, Zamba2-2.7B and Zamba2-7B are hybrid models combining state-space models (specifically Mamba) and transformer blocks, and were trained using next-token prediction. Zamba2 uses shared transformer layers after every 6 Mamba blocks, and it uses the Mistral v0.1 tokenizer. Zyphra arrived at this architecture after a series of ablations at small scales. The models were pre-trained on 2T to 3T tokens, depending on the model size.
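A minimal causal-LM sketch, assuming the checkpoint id:

```python
# Minimal causal-LM sketch for Zamba2; checkpoint id assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-2.7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
inputs = tokenizer("Hybrid Mamba-transformer models are", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```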
GOT-OCR 2.0
GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music. While this implementation of the model will only output plain text, the outputs can be further processed to render the desired format, with packages like pdftex, mathpix, matplotlib, tikz, verovio or pyecharts. The model can also be used for interactive OCR, where the user can specify the region to be recognized by providing the coordinates or the color of the region’s bounding box.
- Add GOT-OCR 2.0 to Transformers by @yonigozlan in #34721
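A hedged plain-text OCR sketch; the checkpoint id and the local image path are assumptions.

```python
# Sketch: plain-text OCR with GOT-OCR 2.0; checkpoint id and input path are assumptions.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "stepfun-ai/GOT-OCR-2.0-hf"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

inputs = processor("path/to/document.png", return_tensors="pt").to(model.device)
ids = model.generate(**inputs, do_sample=False, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```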
DAB-DETR
DAB-DETR is an enhanced variant of Conditional DETR. It utilizes dynamically updated anchor boxes to provide both a reference query point (x, y) and a reference anchor size (w, h), improving cross-attention computation. This new approach achieves 45.7% AP when trained for 50 epochs with a single ResNet-50 model as the backbone.
- Add DAB-DETR for object detection by @conditionedstimulus in #30803
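A detection sketch follows; the checkpoint id is an assumption.

```python
# Sketch: object detection with DAB-DETR; checkpoint id assumed.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, DabDetrForObjectDetection

model_id = "IDEA-Research/dab-detr-resnet-50"  # assumed checkpoint name
processor = AutoImageProcessor.from_pretrained(model_id)
model = DabDetrForObjectDetection.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_object_detection(
    outputs, target_sizes=[(image.height, image.width)], threshold=0.3
)[0]
for score, label in zip(results["scores"], results["labels"]):
    print(model.config.id2label[label.item()], round(score.item(), 2))
```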
Depth Pro
DepthPro is a foundation model for zero-shot metric monocular depth estimation, designed to generate high-resolution depth maps with remarkable sharpness and fine-grained details. It employs a multi-scale Vision Transformer (ViT)-based architecture, where images are downsampled, divided into patches, and processed using a shared DINOv2 encoder. The extracted patch-level features are merged, upsampled, and refined using a DPT-like fusion stage, enabling precise depth estimation.

- Add Apple's Depth-Pro for depth estimation by @geetu040 in #34583
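A depth-estimation sketch follows; the checkpoint id is an assumption.

```python
# Sketch: zero-shot metric depth with Depth Pro; checkpoint id assumed.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, DepthProForDepthEstimation

model_id = "apple/DepthPro-hf"  # assumed checkpoint name
processor = AutoImageProcessor.from_pretrained(model_id)
model = DepthProForDepthEstimation.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
post = processor.post_process_depth_estimation(outputs, target_sizes=[(image.height, image.width)])
depth = post[0]["predicted_depth"]  # metric depth map at the original resolution
```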
RT-DETRv2
An improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction and a discrete sampling operator for broader deployment compatibility. These improvements yield a 0.3 to 1.4 point increase in mAP on the COCO dataset while maintaining the same parameter count and frames-per-second (FPS) performance.
- Adding RTDETRv2 by @jadechoghari in #34773
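A detection sketch; the class name matches this addition, while the checkpoint id is an assumption.

```python
# Sketch: RT-DETRv2 object detection; checkpoint id assumed.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, RTDetrV2ForObjectDetection

model_id = "PekingU/rtdetr_v2_r50vd"  # assumed checkpoint name
processor = AutoImageProcessor.from_pretrained(model_id)
model = RTDetrV2ForObjectDetection.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_object_detection(
    outputs, target_sizes=[(image.height, image.width)], threshold=0.5
)
```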
Transformers-CLI
Transformers' CLI welcomes a new command: `chat`. This command starts a conversation with the model of your choosing directly in your terminal. This feature exists in TRL and has been migrated to `transformers` for easier usage.
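For example (a hedged sketch: the flag name is assumed from the TRL command this replaces, and the model id is an arbitrary instruct checkpoint):

```bash
# Start a terminal chat session with a model of your choice.
transformers-cli chat --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct
```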
Processor Standardization
Ongoing work aims to standardize the image processors so that their APIs are equivalent. Additionally, the processors are being given fast variants so that they are never a bottleneck in image processing pipelines.
In this release, several processors have been standardized and have received fast variants; a short sketch of opting into a fast processor follows the list below.
- OwlViT/Owlv2 post processing standardization by @qubvel in #34929
- OmDet Turbo processor standardization by @qubvel in #34937
- Grounding DINO Processor standardization by @qubvel in #34853
- Refactoring of ImageProcessorFast by @yonigozlan in #35069
- add Qwen2-VL image processor fast by @yonigozlan in #35733
- Remove Multi-threaded image conversion for fast image processors by @yonigozlan in #36105
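A minimal sketch of requesting a fast variant explicitly (the checkpoint shown is just one of the models that received a fast processor in this release):

```python
# Opt into a fast image processor when the checkpoint has one.
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", use_fast=True)
print(type(processor).__name__)  # a *ImageProcessorFast class when available
```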
Breaking changes
DPT segmentation maps
DPT image processors did not support `segmentation_maps`, instead only requiring `images`. This has been fixed.
This adds an argument to the `preprocess` method; users passing arguments positionally to that method may therefore see changed behavior. We recommend using keyword arguments with such methods so that newly added parameters cannot shift your existing ones, as shown in the sketch after the PR link below.
- 🔴 🔴 🔴 Added `segmentation maps` support for DPT image processor by @simonreise in #34345
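A minimal sketch of the keyword-argument style (dummy inputs; the call signature is assumed to follow the other segmentation image processors):

```python
# Keyword arguments keep calls robust to newly added parameters like `segmentation_maps`.
import numpy as np
from transformers import DPTImageProcessor

processor = DPTImageProcessor()
image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # dummy RGB image
mask = np.random.randint(0, 2, (480, 640), dtype=np.uint8)        # dummy segmentation map
encoded = processor(images=image, segmentation_maps=mask, return_tensors="pt")
```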
Image classification pipeline and single vs multi-label
The `problem_type` in the config.json file was read incorrectly by the pipeline, which mapped single-label to multi-label losses, and vice-versa. This has been fixed.
- 🚨🚨🚨 image-classification pipeline single-label and multi-label prob type squashing fns (sigmoid vs softmax) are backwards by @rwightman in #35848
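The distinction the fix restores, in a few lines of plain torch:

```python
# Single-label scores come from softmax (mutually exclusive classes);
# multi-label scores come from sigmoid (independent per-class probabilities).
import torch

logits = torch.tensor([2.0, 0.5, -1.0])
single_label = logits.softmax(dim=-1)  # sums to 1
multi_label = logits.sigmoid()         # each value in [0, 1] independently
```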
Fixing the LayerNorm beta/gamma renames
The description of the pull request is the easiest way to understand the problem, why it exists, and how it is solved; please read the description below:
- 🚨🚨🚨 An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope, optimize string search. by @rwightman in #35615
VLM cleanup
The `ignore_index` property of the llava configuration has been removed, as it was not serving a purpose.
- 🔴 VLM: compile compatibility by @zucchini-nlp in #35724
Quantization
Quantization has received several improvements and fixes, including the contribution of FP8 quantization and the HIGGS quantization interface.
Additionally, we're replacing the AutoGPTQ implementation with GPTQModel from ModelCloud (see the repository here). GPTQModel originated as a major refactor of AutoGPTQ but is now a full stand-in replacement, with a cleaner API, up-to-date model support, faster inference, and higher-quality quants. A short usage sketch follows the list below.
- Enable gptqmodel by @jiqing-feng in #35012
- Split and clean up GGUF quantization tests by @Isotr0py in #35502
- Display warning for unknown quants config instead of an error by @SunMarc in #35963
- Adding FP8 Quantization to transformers by @MekkCyber in #36026
- New HIGGS quantization interfaces, JIT kernel compilation support. by @BlackSamorez in #36148
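A minimal 4-bit GPTQ sketch; with `gptqmodel` installed, it is used in place of auto-gptq (the small model id is an arbitrary choice for illustration):

```python
# Quantize a model to 4-bit GPTQ on load.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model chosen arbitrarily for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=config
)
```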
Generate
- [generate] revert change in Aria: the maximum cache length must match `max_length` by @gante in #36120
- 🧹 remove `generate`-related objects and methods scheduled for removal in v4.48 by @gante in #35677
- [generate] can instantiate `GenerationConfig(cache_implementation="static")` by @gante in #35679 (see the sketch after this list)
- [generate] return Cache object even if passed in a legacy format by @gante in #35673
- [generate] update docstring of `SequenceBiasLogitsProcessor` by @gante in #35699
- Test: generate with `torch.compile(model.forward)` as a fast test by @gante in #34544
- [generate] move max time tests by @gante in #35962
- [generate] shape checks in tests compatible with fixed-length caches (+ some minor fixes) by @gante in #35993
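A minimal sketch of the new `GenerationConfig` instantiation from #35679:

```python
# A GenerationConfig can now be built directly with a static cache.
from transformers import GenerationConfig

gen_config = GenerationConfig(cache_implementation="static", max_new_tokens=20)
# Pass it to `model.generate(**inputs, generation_config=gen_config)` on a model
# whose architecture supports static caches.
```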
Pipelines
Pipelines have received several bug fixes and improvements which are detailed below.
- Stop mutating input dicts in audio classification pipeline by @Rocketknight1 in #35754
- fix document qa bf16 pipeline by @jiqing-feng in #35456
- fix low-precision audio classification pipeline by @jiqing-feng in #35435
- [pipeline] missing import regarding assisted generation by @gante in #35752
- Output dicts support in text generation pipeline by @jonasrohw in #35092
- Fix Audio Classification Pipeline top_k Documentation Mismatch and Bug #35736 by @sambhavnoobcoder in #35771
Bugfixes and improvements
- Fix flaky `test_custom_4d_attention_mask` by @ydshieh in #35606
- Use inherit tempdir makers for tests + fix failing DS tests by @muellerzr in #35600
- Added error when sequence length is bigger than max_position_embeddings by @Taha1506 in #32156
- Let `EarlyStoppingCallback` not require `load_best_model_at_end` by @muellerzr in #35101
- Fix flaky `test_beam_search_low_memory` by @ydshieh in #35611
- Skip `MobileNetV1ModelTest::test_batching_equivalence` for now by @ydshieh in #35614
- Update codeowners with individual model owners by @Rocketknight1 in #35595
- Fix device in rope module when using dynamic updates by @Cyrilvallez in #35608
- Fix whisper compile by @jiqing-feng in #35413
- Removed some duplicated code by @Sai-Suraj-27 in #35637
- [`Phi`] bias should be True by @ArthurZucker in #35650
- Enable different torch dtype in sub models by @zucchini-nlp in #34873
- [`Compile`] Only test compiling model forward pass by @ArthurZucker in #35658
- [tests] make cuda-only tests device-agnostic by @faaany in #35607
- [i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic by @AhmedAlmaghz in #35193
- Fix `zero_shot_image_classification` documentation guide link in SigLIP by @aretrace in #35671
- Fix : adding einops lib in the CI docker for some bitsandbytes tests by @MekkCyber in #35652
- Update torchao.md: use auto-compilation by @martin0258 in #35490
- Fix : HQQ config when hqq not available by @MekkCyber in #35655
- Fix expected output for ggml test by @MekkCyber in #35686
- Fix : add require_read_token for gemma2 gated model by @MekkCyber in #35687
- Enhanced Installation Section in README.md by @egojoseph in #35094
- Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities by @mahdibaghbanzadeh in #35251
- Clean-up composite configs by @zucchini-nlp in #34603
- Add future import for Py < 3.10 by @Rocketknight1 in #35666
- Enable gptqmodel by @jiqing-feng in #35012
- Fix : Nemotron Processor in GGUF conversion by @MekkCyber in #35708
- Fix typo in /docs/source/ja/model_doc/decision_transformer.md URL by @hiroaki222 in #35705
- Replace deprecated batch_size with max_batch_size when using HybridCache by @mtreinik in #35498
- Fix: Falcon tie_word_embeddings in GGUF by @MekkCyber in #35715
- Fix condition when GA loss bug fix is not performed by @techkang in #35651
- Fix the bug that `Trainer` cannot correctly call `torch_jit_model_eval` by @Wanguy in #35722
- [generation] fix type hint by @gante in #35725
- Add proper jinja2 error by @Rocketknight1 in #35533
- Optimize ForCausalLMLoss by removing unnecessary contiguous() call to reduce memory overhead by @efsotr in #35646
- Modular: support for importing functions from any file by @Cyrilvallez in #35692
- Remove batch size argument warning when unjustified by @quintenroets in #35519
- [cache] add a test to confirm we can use cache at train time by @gante in #35709
- Remove `pt_to_tf` by @gante in #35672
- Added resource class configuration option for `check_circleci_user` job by @Sai-Suraj-27 in #32866
- Fix some tests by @Cyrilvallez in #35682
- Unable to use `MimiModel` with DeepSpeed ZeRO-3 by @anferico in #34735
- check is added for the report_to variable in TrainingArguments by @alpertunga-bile in #35403
- Added liger_kernel compatibility with `PeftModel` by @ambroser53 in #35680
- Restore is_torch_greater_or_equal_than for backward compatibility by @tlrmchlsmth in #35734
- Revert "Unable to use
MimiModel
with DeepSpeed ZeRO-3" by @eustlb in #35755 - ci: fix xpu skip condition for test_model_parallel_beam_search by @dvrogozh in #35742
- Use AMD CI workflow defined in hf-workflows by @ivarflakstad in #35058
- Fix CI for VLMs by @zucchini-nlp in #35690
- Security fix for `self-comment-ci.yml` by @ydshieh in #35548
- [ViTPose] Convert more checkpoints by @NielsRogge in #35638
- fix register_buffer in MimiEuclideanCodebook by @anferico in #35759
- remove code owners as it was generating too much noise BUT by @ArthurZucker in #35784
- Skip Falcon 7B GGML Test by @MekkCyber in #35783
- [fix] cannot import name 'Pop2PianoFeatureExtractor' from 'transformers' by @faaany in #35604
- transformers.image_transforms.normalize wrong types by @CalOmnie in #35773
- Patch moonshine by @eustlb in #35731
- Don't import torch.distributed when it's not available by @booxter in #35777
- Fix vits low-precision dtype by @jiqing-feng in #35418
- Tool calling: support more types by @aymeric-roucher in #35776
- Fixes, improvements to `timm` import behaviour by @rwightman in #35800
- modular_model_converter bugfix on assignments by @nikosanto13 in #35642
- Deterministic sorting in modular converter when adding new functions by @Cyrilvallez in #35795
- Fix "test_chat_template_dict" in video LLMs by @zucchini-nlp in #35660
- Update AMD Docker image by @ivarflakstad in #35804
- Add LlavaImageProcessor by @NielsRogge in #33191
- Byebye `test_batching_equivalence`'s flakiness by @ydshieh in #35729
- [Doc] Adding blog post to model doc for `TimmWrapper` by @ariG23498 in #35744
- add a new flax example for Bert model inference by @louie-tsai in #34794
- Support adamw_torch_8bit by @fzyzcjy in #34993
- Auto-add `timm` tag to timm-wrapper models. by @pcuenca in #35794
- Fix : BLOOM tie_word_embeddings in GGUF by @MekkCyber in #35812
- Fixed typo in autoawq version number in an error message for IPEX backend requirements. by @InfroLab in #35815
- Remove deprecated `get_cached_models` by @Wauplin in #35809
- Optimized set_initialized_submodules. by @LagPixelLOL in #35493
- [i18n-ar] Translated file: `docs/source/ar/tasks/masked_language_modeling.md` into Arabic by @AhmedAlmaghz in #35198
- move fastspeech to audio models by @eustlb in #35788
- Improve modular documentation by @Cyrilvallez in #35737
- [Mimi] update test expected values for t4 runners by @eustlb in #35696
- Remove old `benchmark` code by @gante in #35730
- Remove pyav pin to allow python 3.11 to be used by @CalOmnie in #35823
- Another security patch for `self-comment-ci.yml` by @ydshieh in #35816
- Init cache on meta device by @zucchini-nlp in #35164
- Hotfix: missing `working-directory` in `self-comment-ci.yml` by @ydshieh in #35833
- [gpt2] fix generation tests by @gante in #35822
- Fix : Nemotron tokenizer for GGUF format by @MekkCyber in #35836
- Fix `head_dim` in config extracted from Gemma2 GGUF model by @Isotr0py in #35818
- [chat] docs fix by @gante in #35840
- Fix compatibility issues when using auto_gptq with these older versions by @LRL-ModelCloud in #35830
- Add PyTorch version check for FA backend on AMD GPUs by @mht-sharma in #35813
- Fix NoneType type as it requires py>=3.10 by @SunMarc in #35843
- [`tests`] remove some flash attention class tests by @ArthurZucker in #35817
- [Backend support] Allow `num_logits_to_keep` as Tensor + add flag by @Cyrilvallez in #35757
- Fix GA loss for Deepspeed by @timjeffrey10 in #35808
- Fix uploading processors/tokenizers to WandB on train end by @jack89roberts in #35701
- Fix more CI tests by @ArthurZucker in #35661
- [DOC] Fix contamination and missing paragraph in translation by @Yosshi999 in #35851
- Fix typo by @SilverSoldier in #35854
- fix apply_chat_template() padding choice by @baoyf4244 in #35828
- Fix `test_pipelines_video_classification` that was always failing by @CalOmnie in #35842
- Fix Llava-NeXT / Llava-NeXT Video / Llava-OneVision's token unpadding mismatch by @sheryc in #35779
- use torch.testing.assertclose instead to get more details about error in cis by @ArthurZucker in #35659
- add xpu device check in device_placement by @faaany in #35865
- Add `Rocketknight1` to `self-comment-ci.yml` by @ydshieh in #35881
- [doctest] Fixes by @stevhliu in #35863
- Fix fast image processor warnings in object detection examples by @sugendran in #35892
- Update deepspeed amd image by @ivarflakstad in #35906
- Fix typing in audio_utils.chroma_filter_bank by @CalOmnie in #35888
- [docs] uv install by @stevhliu in #35821
- Fix the config class comparison for remote code models by @Rocketknight1 in #35592
- Close Zamba2Config code block by @Rocketknight1 in #35914
- [docs] Fix Zamba2 by @stevhliu in #35916
- Remove `_supports_static_cache = True` for some model classes by @ydshieh in #34975
- Use rocm6.2 for AMD images by @ivarflakstad in #35930
- Add default TP plan for all models with backend support by @Cyrilvallez in #35870
- Fix: loading DBRX back from saved path by @zucchini-nlp in #35728
- Fix mask slicing for models with HybridCache by @Cyrilvallez in #35681
- Qwen-2-5-VL: fix CI by @zucchini-nlp in #35935
- Fix TP initialization by @Cyrilvallez in #35860
- fix(FA): QKV not being casted to target_dtype for FA with dpo lora by @NanoCode012 in #35834
- Remove INC notebook reference in documentation by @echarlaix in #35936
- use torch constraints to check if covariance is positive definite during mean resizing. by @abuelnasr0 in #35693
- fix `test_generated_length_assisted_generation` by @keyboardAnt in #34935
- Update `unwrap_and_save_reload_schedule` to use `weights_only=False` by @ydshieh in #35952
- Update `squad_convert_example_to_features` to work with numpy v2 by @ydshieh in #35955
- Fix flaky `test_assisted_decoding_matches_greedy_search` by @ydshieh in #35951
- Trainer Refactor: Part 1 by @muellerzr in #35567
- update docker file `transformers-pytorch-deepspeed-latest-gpu` by @ydshieh in #35940
- [tests] further fix `Tester object has no attribute '_testMethodName'` by @faaany in #35781
- Update README.md by @BlessedTatonka in #35958
- fix iterator overflow when gradient accumulation is 1 by @winglian in #35960
- Fix is_causal being a tensor by @IlyasMoutawwakil in #35791
- [bart] minor test fixes by @gante in #35965
- Pixtral: vectorize patch embeddings and enable tests by @zucchini-nlp in #35122
- Whisper: fix static cache CI by @zucchini-nlp in #35852
- Less flaky for `TimmBackboneModelTest::test_batching_equivalence` by @ydshieh in #35971
- Support batching for UsefulSensors Moonshine by @njeffrie in #35922
- not to use A100 for `benchmark.yml` by @ydshieh in #35974
- Handle empty change indices in SAM's mask to rle conversion by @MSt-10 in #35665
- Add support for nested images to LLava and VipLLava by @yonigozlan in #35558
- [Moonshine] compute head_dim_padding at init by @eustlb in #35984
- [Moshi] disable automatic compilation if the model can't compile by @gante in #35992
- use torch 2.6 for daily CI by @ydshieh in #35985
- Update-tp test by @ArthurZucker in #35844
- Add mean_resizing for every VLMs' resizing_token_embeddings() by @YenFuLin in #35717
- Update Granite Vision Model Path / Tests by @alex-jw-brooks in #35998
- Qwen2-VL: fix rope delta calculation by @zucchini-nlp in #36013
- Fix custom kernel for DeformableDetr, RT-Detr, GroundingDINO, OmDet-Turbo in Pytorch 2.6.0 by @qubvel in #35979
- apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True by @mrsndmn in #35582
- layernorm_decay_fix by @Ryoo72 in #35927
- Update Mistral converter by @Cyrilvallez in #35967
- Refactor (and fix) gpt_neox by @Cyrilvallez in #35610
- Fix device mismatch error in Whisper model during feature extraction by @thedebugger in #35866
- Fix RMSNormGated in Zamba2 by @pglorio in #35943
- Comment bot CI for other jobs (`generation` / `quantization`) by @ydshieh in #35341
- Hotfix for `self-comment-ci.yml` by @ydshieh in #36030
- feat(ci): ignore trufflehog unverified results by @McPatate in #36031
- CircleCI with python 3.9 by @ydshieh in #36027
- Update tests regarding attention types after #35235 by @ydshieh in #36024
- Fix Gemma2 synced multi-GPU generation by @ManukyanD in #35232
- Fix synced multi-GPU generation with LLMs and VLMs by @ManukyanD in #35893
- Add XPU type for work-around -inf mask causing sdpa NaN issue in modeling files by @Liangliang-Ma in #35647
- add support for empty list as input to create_model_card by @ROZBEH in #36042
- DeepSpeed github repo move sync by @stas00 in #36021
- [docs] no hard coding cuda as bnb has multi-backend support by @faaany in #35867
- [docs] fix bugs in the bitsandbytes documentation by @faaany in #35868
- [docs] no hard-coding cuda by @faaany in #36043
- Fix how we compute the final non-padding token for ForSequenceClassification models by @Rocketknight1 in #35911
- Add `Qwen2VLImageProcessorFast` into `Qwen2VLProcessor` by @yeliudev in #35987
- Iterative generation using Input embeds and `past_key_values` by @yaswanth19 in #35890
- Fix usage of unpad_input function by @pavelgein in #35925
- Fix repo consistency by @ydshieh in #36063
- Update `test_flash_attn_2_can_dispatch_composite_models` by @ydshieh in #36050
- Paligemma: fix generation with Gemma2 by @zucchini-nlp in #36044
- Save checkpoint to temporary directory to handle partial saves during failures by @SilverSoldier in #35580
- Nail in edge case of torch dtype being overridden permanently in the case of an error by @muellerzr in #35845
- Fix words typos in ggml test. by @zhanluxianshen in #36060
- Fix model kwargs by @muellerzr in #35875
- Fix StopStringCriteria to handle tokens above len(tokenizer) by @Rocketknight1 in #35797
- [docs] fix outdated example code in `trainer.md` by @faaany in #36066
- Adding RT-DETRv2 for object detection by @jadechoghari in #34773
- Fix bug in apply_rotary_pos_emb_flashatt: in Qwen2-5-VL by @DeepWaved in #36065
- Move audio top_k tests to the right file and add slow decorator by @Rocketknight1 in #36072
- Fix OS err by @muellerzr in #36094
- [docs] fix model checkpoint name by @faaany in #36075
- [docs] fix typo by @faaany in #36080
- [docs] fix not-working example code in `perf_infer_gpu_one.md` by @faaany in #36087
- fix MllamaVisionAttention typehint by @kylesayrs in #35975
- Processors: allow tuples of images when checking by @zucchini-nlp in #36084
- Chat template: update for processor by @zucchini-nlp in #35953
- Paligemma: revert #36084 by @zucchini-nlp in #36113
- Support constant lr with cooldown by @LoserCheems in #35453
- Enable pytest live log and show warning logs on GitHub Actions CI runs by @ydshieh in #35912
- Refactor OPT model by @jiqing-feng in #36101
- Revert checkpoint tmp dir by @SunMarc in #36112
- [Bugfix] fix file name of docstring in utils/check_table.py by @kkscilife in #36108
- fix bnb warning by @SunMarc in #36116
- AutoformerForPrediction test add atol by @ivarflakstad in #36017
- Fix nightly CIs: missing atols by @ArthurZucker in #35903
- Add common test for `torch.export` and fix some vision models by @qubvel in #35124
- fix: typos in documentation files by @maximevtush in #36122
- update awesome-transformers.md. by @zhanluxianshen in #36115
- Fix max size deprecated warning by @HichTala in #34998
- Fix CI issues by @molbap in #35662
- update tiktoken integ to use converted by @ArthurZucker in #36135
- Make `output_dir` Optional in `TrainingArguments` #27866 by @sambhavnoobcoder in #35735
- [docs] minor doc fix by @faaany in #36127
- [docs] update awq doc by @faaany in #36079
- Add pipeline parallel plan to `PretrainedConfig` and `PreTrainedModel` by @hmellor in #36091
- add RAdamScheduleFree optimizer by @nhamanasu in #35313
- added warning to Trainer when label_names is not specified for PeftModel by @MilkClouds in #32085
- Whisper: remove redundant assisted generation tests by @gante in #34814
- Add utility for Reload Transformers imports cache for development workflow #35508 by @sambhavnoobcoder in #35858
- VLM: enable skipped tests by @zucchini-nlp in #35746
- [commands] remove deprecated/inoperational commands by @gante in #35718
- Fix Gradient Checkpointing for Deberta & Deberta-V2 using PEFT / Adapters by @lenglaender in #35898
- 🚨 Remove cache migration script by @Wauplin in #35810
- multi-gpu: fix tensor device placements for various models by @dvrogozh in #35763
- Optim: APOLLO optimizer integration by @zhuhanqing in #36062
- Fix multi gpu loss sync condition, add doc and test by @techkang in #35743
- adding option to save/reload scaler by @hsilva664 in #34932
- Update doc re list of models supporting TP by @kwen2501 in #35864
- Add more rigorous non-slow grad accum tests by @muellerzr in #35668
- Fix test fetcher by @ydshieh in #36129
- skip `test_initialization` for `VitPoseBackboneModelTest` for now by @ydshieh in #36154
- Add git LFS to AMD docker image by @ivarflakstad in #36016
- Mllama fsdp by @blbadger in #36000
- Fix PaliGemma Pad Token Masking During Training #35855 by @sambhavnoobcoder in #35859
- Add reminder config to issue template and print DS version in env by @Ben-Schneider-code in #35156
- Fix Gemma2 dtype issue when storing weights in float16 precision by @Nerogar in #35398
- Replace deprecated update_repo_visibility by @Wauplin in #35970
- Fix tests for vision models by @qubvel in #35654
- qwen2.5vl: fix bugs when using flash2+bf16 or num_return_sequences>1 by @gewenbin0992 in #36083
- docs: fix return type annotation of `get_default_model_revision` by @MarcoGorelli in #35982
- Fix PretrainedTokenizerFast check => Fix PretrainedTokenizerFast Save by @CL-ModelCloud in #35835
- Move `DataCollatorForMultipleChoice` from the docs to the package by @bauwenst in #34763
- Helium documentation fixes by @LysandreJik in #36170
- Remove loading custom kernel for RT-DETRv2 by @qubvel in #36098
- [Modular] skip modular checks based on diff by @gante in #36130
- Fix red CI by @ArthurZucker in #36174
- Fix : fix doc fp8 by @MekkCyber in #36173
- Efficient Inference Kernel for SpQR by @elvircrn in #34976
- fix training issues by @ArthurZucker in #36158
- add disable compile option by @ArthurZucker in #36161
- CI: avoid human error, automatically infer generative models by @gante in #33212
- Use tqdm auto by @SmartManoj in #35726
- Optimize Qwen2VL vision model by precomputing cos/sin embeds before ViT blocks by @li-plus in #35837
- Make `check_repository_consistency` run faster by MP by @ydshieh in #36175
- Fix the key name for _load_rng_state under torch.cuda by @wizyoung in #36138
- Follow up to SpQR integration by @MekkCyber in #36176
- Fix a mistake in #36175 by @ydshieh in #36179
- Fix make_batched_videos and add tests by @yonigozlan in #36143
- Uniformize OwlViT and Owlv2 processors by @yonigozlan in #35700
- Add support for partial rotary embeddings in Phi3 model by @garg-amit in #35947
- CI: fix `test-save-trainer` by @zucchini-nlp in #36191
- Chat template docs by @zucchini-nlp in #36163
- Add ImageProcessorFast to Qwen2.5-VL processor by @Isotr0py in #36164
- Prepare processors for VideoLLMs by @zucchini-nlp in #36149
- Add require_read_token to fp8 tests by @MekkCyber in #36189
- Revert qwen2 breaking changes related to attention refactor by @ArthurZucker in #36162
- Guard against unset resolved_archive_file by @dmlap in #35628
- [Bugfix] Fix reloading of pixtral/llava configs by @kylesayrs in #36077
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @jiqing-feng
- @AhmedAlmaghz
- @sbucaille
- Add SuperGlue model (#29886)
- @Isotr0py
- @ShuaiBai623
- add qwen2.5vl (#35569)
- @alex-jw-brooks
- @pglorio
- @conditionedstimulus
- Add DAB-DETR for object detection (#30803)
- @jadechoghari
- Adding RT-DETRv2 for object detection (#34773)
- @geetu040
- Add Apple's Depth-Pro for depth estimation (#34583)
- @zhuhanqing
- Optim: APOLLO optimizer integration (#36062)
- @bauwenst
- Move `DataCollatorForMultipleChoice` from the docs to the package (#34763)
- @elvircrn
- Efficient Inference Kernel for SpQR (#34976)