Hi, I'm Shuo Wang. I have 2 years of experience in AI/ML, including large language models (LLMs), natural language processing (NLP), deep learning, and model development and deployment. I also have 8 years of experience leading teams and managing projects and communications. I am currently seeking to transition into a Machine Learning Engineer or Data Science role.
- Description: Develop an open-source Transformer-based tool to optimize portfolio weights and recommend hedging strategies.
- Technologies Used: Python (sklearn, pandas, numpy, seaborn), Amazon SageMaker, S3, Kubernetes, PyTorch, Azure, JupyterLab
- Link: https://very-intelligent-portfolio.github.io
- Description: Utilize advanced NLP techniques to build transformer-based models (including BART, BERT, DistilBERT, ALBERT, and RoBERTa) for toxic comment detection.
- Technologies Used: Google Colab, TensorFlow, Neural Networks, Keras, Scikit-Learn
- Link: https://github.com/Shuo-Wang-UCBerkeley/Text-Classification
- Description: Design and implement a FastAPI application that serves sentiment-analysis predictions from a HuggingFace DistilBERT model. Orchestrate the deployment of the application on Azure Kubernetes Service (AKS).
- Technologies Used: FastAPI, Docker, Azure, Kubernetes, Redis, Poetry, Istio, Kustomize, Grafana, CI/CD
- Link: https://github.com/Shuo-Wang-UCBerkeley/DistilBERT-API-Deployment
- Description: Predict flight departure delays greater than 15 minutes, using only information available 2 hours before takeoff.
- Technologies Used: PySpark (ml, sql), Python (matplotlib, pandas, numpy, seaborn, datetime), Databricks, MapReduce
- Link: https://github.com/Shuo-Wang-UCBerkeley/FlightDelay
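
To illustrate the prediction target described above, here is a minimal Python sketch of the labeling rule and the 2-hour information cutoff. The function and constant names are hypothetical and are not taken from the project repository; the project itself lists PySpark, so this plain-Python version only shows the logic under that assumption.

```python
from datetime import datetime, timedelta

# Both values come from the project description; the names are hypothetical.
DELAY_THRESHOLD_MIN = 15                # a flight is "delayed" above this many minutes
PREDICTION_LEAD = timedelta(hours=2)    # predictions are made this long before takeoff


def delay_label(dep_delay_minutes: float) -> int:
    """Return 1 if the departure delay is greater than 15 minutes, else 0."""
    return int(dep_delay_minutes > DELAY_THRESHOLD_MIN)


def usable_at_prediction_time(observed_at: datetime,
                              scheduled_departure: datetime) -> bool:
    """Return True if a data point was observed at or before the 2-hour cutoff.

    Anything observed later would leak information that is not available
    when the prediction has to be made.
    """
    return observed_at <= scheduled_departure - PREDICTION_LEAD
```

Filtering features with a check like `usable_at_prediction_time` is one way to avoid leakage: a weather report logged 90 minutes before departure would be excluded, while one logged 3 hours before would be kept.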
- Description: Develop automated genre classification models and enhance user app experience.
- Technologies Used: Kaggle, Python, Scikit-Learn, Keras
- Link: https://github.com/Shuo-Wang-UCBerkeley/Song-Genre
- Description: Develop a neural network model for multi-class classification of Fashion MNIST images.
  - Processed 60K data points from the Fashion MNIST dataset, each with 784 features (a 28×28 grayscale image) and a label from 10 classes. Visualized summary statistics of the features using Python Seaborn.
  - Developed a neural network model with three hidden layers and tuned it with various activation functions (e.g., Tanh, ReLU) and optimizers (e.g., Adam, SGD) using Python TensorFlow, achieving 99% testing accuracy.
- Technologies Used: Python, TensorFlow, Neural Networks
- Description: Increase brand awareness of Acme Gourmet Meals beyond the local Berkeley neighborhood by identifying top customers, calculating the nearest BART station, and deploying a pickup/delivery service.
- Technologies Used: SQL, Neo4j, Relational Databases
- Link: https://github.com/Shuo-Wang-UCBerkeley/AGM-Brand
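
The nearest-station step can be sketched in a few lines of Python using the haversine great-circle distance. This is an assumption-laden illustration, not the project's implementation (which lists SQL and Neo4j): the `STATIONS` dictionary, its approximate coordinates, and the function names are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Approximate coordinates for illustration only (hypothetical sample data).
STATIONS = {
    "Downtown Berkeley": (37.8701, -122.2681),
    "North Berkeley": (37.8740, -122.2834),
    "Ashby": (37.8529, -122.2700),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_station(customer, stations=STATIONS):
    """Return (station name, distance in km) for the station closest to the customer."""
    name = min(stations, key=lambda s: haversine_km(customer, stations[s]))
    return name, haversine_km(customer, stations[name])
```

Calling `nearest_station((37.8535, -122.2705))` picks the closest of the three sample stations for a customer at that location. Straight-line distance ignores the street network, so it is only a first approximation of travel distance.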
- Description: Car Transmission vs. CO2 Emissions
  - Conducted an extensive analysis of how car transmission type affects CO2 emissions, using Vehicle Certification Agency (VCA) data (2000-2013).
  - Analyzed over 7,000 entries after data cleaning to establish correlations between transmission types, fuel types, and engine capacities with CO2 emissions. Consolidated data by car model to remove bias due to repetition.
  - Employed linear regression to demonstrate that diesel-fueled and manual transmission vehicles emit significantly less CO2.
  - Identified potential model limitations, including collinearity and omitted variables such as the drag coefficient and improvements in transmission technology.
  - Explored the impact of regulatory and technological changes over time, highlighting potential shifts in the relationship between transmission type and emissions in recent years.
- Technologies Used: R (tidyverse, tsibble, forecast, ggplot), Python
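
As a small illustration of the regression idea in this project, the Python sketch below fits ordinary least squares with a single 0/1 indicator for manual transmission. The data values and names are made up for the example, and the actual analysis used more predictors (fuel type, engine capacity). With one binary predictor, the fitted slope equals the difference between the two group means, which is what makes the coefficient easy to interpret.

```python
def ols_simple(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: 1 = manual, 0 = automatic; CO2 in g/km (invented values).
is_manual = [1, 1, 1, 0, 0, 0]
co2 = [150, 160, 155, 170, 180, 175]

intercept, slope = ols_simple(is_manual, co2)
# intercept is the mean CO2 of automatics; slope is the manual-minus-automatic gap.
print(intercept, slope)
```

A negative slope in this toy setup would correspond to manual vehicles emitting less on average, but a single-predictor fit cannot address the collinearity and omitted-variable concerns noted above.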
- Description: Evaluate which party's voters experienced more difficulty voting in the 2020 election.
  - Led a statistical investigation into voting difficulties among Democratic and Republican voters during the 2020 U.S. election, using American National Election Studies data consisting of over 8,000 survey responses.
  - Defined a novel metric for voter difficulty that included both voters and those who intended but failed to vote, enhancing the breadth of the analysis.
  - Applied a non-parametric Wilcoxon rank-sum test to rigorously compare voting difficulties.
  - Identified a meaningful difference in difficulty levels that may influence election outcomes, advocating for targeted political strategies to alleviate voting barriers.
- Technologies Used: R (tidyverse, tsibble, forecast, ggplot), Python, Statistical Analysis, Wilcoxon Test
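
For readers unfamiliar with the Wilcoxon rank-sum test, here is a minimal pure-Python sketch. It assumes a two-sided test with the normal approximation, average ranks for ties, a tie-corrected variance, and no continuity correction; the function name and the sample scores are hypothetical, and the project's own analysis is listed as using R rather than this code.

```python
import math
from collections import Counter


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, z score, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))

    # Assign each distinct value the average of the ranks it occupies.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    w = sum(ranks[v] for v in x)          # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2             # Mann-Whitney U for the first sample
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        raise ValueError("all observations are identical; the test is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, z, p


# Hypothetical difficulty scores on a 1-5 scale (invented for illustration).
group_a = [1, 1, 2, 1, 3, 2, 1]
group_b = [2, 3, 1, 4, 2, 5, 3]
u_stat, z_score, p_value = rank_sum_test(group_a, group_b)
print(u_stat, z_score, p_value)
```

A rank-based test suits ordinal survey responses like these because it compares the ordering of the two groups without assuming the scores are normally distributed. The normal approximation is rough for samples this small, so the sketch is meant to show the mechanics rather than produce publishable p-values.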
Feel free to reach out to me via email or connect with me on LinkedIn (https://www.linkedin.com/in/Shuo-Wang-PE).
Check out my resume for an overview of my skills, work experience, and education.