
Anyscale - San Francisco - https://www.hongpeng-guo.com/
Stars
verl: Volcano Engine Reinforcement Learning for LLMs
Official Repo for Open-Reasoner-Zero
FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
PyTorch per step fault tolerance (actively under development)
A curated list of reinforcement learning with human feedback resources (continually updated)
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
Janus-Series: Unified Multimodal Understanding and Generation Models
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
A Python implementation of global optimization with Gaussian processes (a minimal usage sketch follows this list).
A self-learning tutorial for CUDA High Performance Programming.
The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
How to optimize some algorithms in CUDA.
My learning notes and code for ML SYS (machine learning systems).
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also …
A highly optimized LLM inference acceleration engine for Llama and its variants.
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
10x Faster Long-Context LLM By Smart KV Cache Optimizations
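
The Gaussian-process item in the list above describes a global-optimization library; as a rough, non-authoritative illustration of that technique, here is a minimal sketch assuming the bayes_opt package (the toy objective black_box, its bounds, and the iteration counts are illustrative assumptions, not taken from the starred repository):

```python
# Minimal sketch of global optimization with Gaussian processes,
# assuming the bayes_opt package (pip install bayesian-optimization).
from bayes_opt import BayesianOptimization

# Toy black-box objective (assumption for illustration): maximum at (x=0, y=1).
def black_box(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1

optimizer = BayesianOptimization(
    f=black_box,                           # function to maximize
    pbounds={"x": (-2, 2), "y": (-3, 3)},  # search bounds per parameter
    random_state=1,
)

# A few random probes, then GP-guided acquisition steps.
optimizer.maximize(init_points=2, n_iter=10)
print(optimizer.max)  # best parameters and target value found
```

The Gaussian-process surrogate is fit to the points probed so far, and an acquisition function picks the next point to evaluate, which is why this approach suits expensive black-box objectives.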
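
The item above on smart KV-cache optimizations for long-context LLMs presupposes the KV cache mechanism itself; the following is a generic toy sketch of that mechanism in PyTorch (random tensors stand in for a real model's projections, and this is not the starred repository's method):

```python
# Generic illustration of a KV cache for autoregressive attention.
import torch

def attend(q, k_cache, v_cache):
    # q: (1, d); caches: (t, d). Scaled dot-product attention over
    # everything generated so far.
    scores = q @ k_cache.T / k_cache.shape[-1] ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ v_cache

d = 8
k_cache = torch.empty(0, d)
v_cache = torch.empty(0, d)
for step in range(4):
    # New token's query/key/value (random stand-ins for a real model's projections).
    q, k, v = torch.randn(1, d), torch.randn(1, d), torch.randn(1, d)
    # Append to the cache instead of recomputing K/V for the whole prefix.
    k_cache = torch.cat([k_cache, k])
    v_cache = torch.cat([v_cache, v])
    out = attend(q, k_cache, v_cache)
print(out.shape)  # (1, d)
```

Appending to the cache avoids recomputing the prefix's key/value projections at every step; long-context optimizations like the starred project typically work on shrinking or reorganizing exactly this cache.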