🔭 I’m currently working on model serving optimization and training acceleration.
Systems for AI is my main research topic. vLLM, FlashAttention, and Megatron-LM are on my watch list LOL.
I'm also dissecting best-practice CUDA implementations, and I'm really interested in parallel programming & low-level (OS-level) programming.
📫 How to reach me: email me at [email protected] :) ...