hongpeng-guo


Tracking Ray Enhancement Proposals

50 stars · 29 forks · Updated Feb 26, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 3,908 stars · 349 forks · Updated Feb 27, 2025

Official Repo for Open-Reasoner-Zero

Python · 1,373 stars · 56 forks · Updated Feb 25, 2025

FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs

C++ · 10,543 stars · 671 forks · Updated Feb 27, 2025

A high-performance deep learning training platform with task-level, time-shared scheduling of GPU compute

Python · 534 stars · 67 forks · Updated Oct 24, 2023

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

5,307 stars · 89 forks · Updated Feb 27, 2025

The NetHack Learning Environment

C · 951 stars · 112 forks · Updated May 6, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 11,800 stars · 765 forks · Updated Feb 27, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python · 1,331 stars · 110 forks · Updated Feb 27, 2025

PyTorch per-step fault tolerance (actively under development)

Python · 244 stars · 20 forks · Updated Feb 25, 2025

A curated list of reinforcement learning with human feedback resources (continually updated)

3,757 stars · 232 forks · Updated Feb 19, 2025

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Python · 196 stars · 18 forks · Updated Nov 18, 2024

A Balatro Modding Framework

Lua · 432 stars · 87 forks · Updated Feb 27, 2025

Where Zhao Chenyang (赵晨阳) writes

2 stars · Updated Jan 30, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python · 16,417 stars · 2,158 forks · Updated Feb 1, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python · 10,762 stars · 1,382 forks · Updated Feb 1, 2025

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)

Python · 987 stars · 109 forks · Updated Feb 18, 2025

A Python implementation of global optimization with Gaussian processes.

Python · 8,092 stars · 1,558 forks · Updated Feb 27, 2025

Ultra | Ultimate | Unified CCL

C++ · 34 stars · 2 forks · Updated Feb 14, 2025

A self-learning tutorial for high-performance CUDA programming.

JavaScript · 392 stars · 43 forks · Updated Dec 17, 2024

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3.

Python · 145 stars · 20 forks · Updated Feb 24, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 42,309 stars · 5,176 forks · Updated Feb 27, 2025

How to optimize some algorithms in CUDA.

Cuda · 1,923 stars · 170 forks · Updated Feb 26, 2025

Mainly records knowledge and interview questions for large language model (LLM) algorithm (application) engineers

HTML · 5,655 stars · 648 forks · Updated Oct 22, 2024

My learning notes/codes for ML SYS.

Python · 1,121 stars · 54 forks · Updated Feb 27, 2025

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python · 58,835 stars · 5,977 forks · Updated Aug 24, 2024

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also …

C++ · 297 stars · 52 forks · Updated Feb 25, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 866 stars · 102 forks · Updated Feb 27, 2025

Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL

C++ · 543 stars · 85 forks · Updated Nov 18, 2020

10x Faster Long-Context LLM By Smart KV Cache Optimizations

Python · 504 stars · 51 forks · Updated Feb 27, 2025