I'm currently the Lead ML Engineer for Drug Discovery at Deloitte, where I develop, orchestrate, and deploy deep learning models that accelerate pharmaceutical research and development.
In my free time, I build tools powered by large language models (LLMs), develop open-source models and datasets, and contribute to LLM research projects.
I've worked on several drug discovery & LLM-focused projects featured on GitHub, including:
- QLoRA for Masked Language Modeling - Extended QLoRA to support the masked language modeling objective, enabling efficient finetuning of BERT-family models (a minimal sketch of this setup follows this list)
- Multi-GPU QLoRA - Extended QLoRA to support distributed data parallel finetuning, significantly accelerating finetuning workloads across multiple GPUs
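For context, here's a minimal sketch of what QLoRA-style finetuning with a masked-LM objective looks like. The model name, hyperparameters, and target modules below are illustrative assumptions, not the exact configuration used in the repositories above.

```python
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "bert-base-uncased"  # any BERT-family checkpoint; illustrative choice

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; only these weights are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query", "value"],  # module names in BERT-family attention layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The masked-LM objective: the collator randomly masks tokens and the model learns
# to reconstruct them, rather than predicting the next token as in causal LMs.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```

For the multi-GPU variant, the same training setup can be launched under `torchrun --nproc_per_node=<num_gpus>`, so that each process holds a model replica and the LoRA adapter gradients are synchronized via distributed data parallel.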
I have also open-sourced some of my LLM models and datasets on HuggingFace:
- ChrisHayduk/Llama-2-SQL-and-Code-Dataset - Curated a SQL-focused code instruction dataset for LLaMA 2. The eval split includes dummy tables so that a finetuned model can be evaluated on SQL execution accuracy rather than token-prediction accuracy (a sketch of this evaluation idea follows the list). The data was processed in several ways, including curriculum learning ordering, fixing table inputs, and instruction filtering.
- ChrisHayduk/OpenGuanaco-13B - Created an open source recreation of Guanaco using OpenLLaMA.
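To illustrate the execution-accuracy idea, here's a minimal sketch of checking generated SQL against dummy tables in an in-memory SQLite database. The function name and example schema are hypothetical, not taken from the dataset itself.

```python
import sqlite3

def execution_match(setup_sql: str, gold_sql: str, generated_sql: str) -> bool:
    """Return True if the generated query produces the same rows as the gold query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)              # build the dummy tables
        gold = conn.execute(gold_sql).fetchall()
        try:
            pred = conn.execute(generated_sql).fetchall()
        except sqlite3.Error:
            return False                           # invalid generated SQL counts as a miss
        return sorted(map(tuple, gold)) == sorted(map(tuple, pred))
    finally:
        conn.close()

# Tiny hypothetical example: both queries return the same single row, so this prints True.
setup = "CREATE TABLE users (id INTEGER, name TEXT); INSERT INTO users VALUES (1, 'Ada');"
print(execution_match(setup, "SELECT name FROM users", "SELECT name FROM users WHERE id = 1"))
```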
You can also find me here:
- Twitter: https://twitter.com/chris_hayduk1
- LinkedIn: https://www.linkedin.com/in/chrishayduk/
- Substack: https://www.chrishayduk.com/