A collection of my favorite tech-related blog posts from companies.
Inspiration for this README comes from @kilimchoi's repo
Note: I am a Data Engineer, so the majority of these links will be data-related. However, there are plenty of general ones in here as well.
Taking Query Optimizations to the Next Level with Iceberg - As someone looking to learn more about Iceberg, this was a good primer on how to optimize queries
Iceberg at Adobe - A good overview on how Iceberg is used at a huge company
What Is CI/CD² (CI/CD Squared)? Continuous Integration and Continuous Delivery - Continuous destruction isn't a concept we think about often, but it's definitely a useful point to consider
Adobe Customer Journey Management’s Journey into the World of GitOps - It's great to see more and more companies embracing GitOps
Data Quality at Airbnb - The gold standard when it comes to data quality and how it should be followed.
Visualizing Data Timeliness at Airbnb - Having the insight to properly track SLAs is helpful for the operational side of things.
Achieving Insights and Savings with Cost Data - Cost dashboards are a big one to identify the biggest pain points for a team.
How Airbnb Built “Wall” to prevent data bugs - A great framework for holding up data quality.
Metis: Building Airbnb’s Next Generation Data Management Platform - A very good implementation of a data management platform
Riverbed: Optimizing Data Access at Airbnb’s Scale - A very sensible way to combine Lambda and Kappa architectures.
Data Quality Score: The next chapter of data quality at Airbnb - A great way to surface the need for data quality
Personal Data Classification - A good shift-left approach for data governance
Sandcastle: data/AI apps for everyone - A very good approach for bringing life to data applications
From Data to Insights: Segmenting Airbnb’s Supply - Good overview on the importance of insights at Airbnb
The Benefits of a Spiking Phase in Agile Development - How spiking helps for proper planning in Agile.
Quality engineering for a shared codebase - Quality engineering is always an important topic.
Should you move into management as a software engineer? - Crossing over into management from the engineering world is a difficult choice but usually has to be made at some point.
AWS Fargate for Data Engineering - Fargate is still a relevant processor for big data jobs.
How to Fight Retrospective Fatigue - Retrospectives are an important part of any sprint review and should be done properly.
Evolution of Babbel’s data pipeline on AWS: from SQS to Kinesis - A good look at the evolution of a data platform.
Evolving to Enterprise-Grade Permissions - The principle of least access can never be mentioned enough.
Why the way we look at technical debt is wrong - Tech debt needs to be accepted and not be viewed as a negative.
Developer Engagement One Code Review at a Time - A good overview of how to really handle code review in an efficient manner
Domain-Driven Asset Management - Asset management is a discussion that doesn't get a lot of hype but is very relevant in the context of systems like the one at BlackRock
Citizen Developer Cookbook: Python Multiprocessing - I've always wanted to know more about multiprocessing in Python, so this was a helpful tutorial
Telemetry and Observability at BlackRock - A great primer for those wanting to understand more about telemetry and observability
How Reliability and Product Teams Collaborate at Booking.com - Collaborating with the product team needs to become more commonplace for development teams.
What can Dungeons & Dragons teach us about User Experience? - A very good analogy about UX.
Empowering Data with Design - A post stressing the importance of visualization.
Hourglass into Pyramid: how you can improve the structure of your tests - E2E testing needs to stop being an afterthought and become part of the actual development process.
Service-aligned Data Platform Architecture - A good overview of Canvas's data platform and CDC.
Our journey to Snowflake monitoring mastery - Effective practices for managing Snowflake costs
The evolution of issue tracking - Issue tracking is an important part of staying on top of everything, and this covers how it has evolved at Capital One
Data Profiler: Data Drift Model Monitoring Tool - Data drift is an underappreciated topic sometimes, so this was a good overview of how it can be properly monitored
Serverless Computing Reduces Collaboration Costs - Capital One is really all-in on the serverless revolution, and this is a good explanation of why
5 reasons to use ML for better data quality - A good post on how ML can be used to enhance data quality
Serverless architecture technology - A good overview on different serverless technologies in AWS
So You Want to Be a Techie? - Tech is not only for people from a CS background. Anyone can thrive
How Machine Learning Can Help Fight Money Laundering - Applications of ML in the banking industry
Giving a Fast-Changing Data Ecosystem Room to Grow - A good post on how an always-evolving domain is allowed to properly grow
The “Why” of Inner Source - As someone who's big on inner-source, this is a good post on why it's useful
From the CIO’s View: Building a Nimble Learning Organization - Learning done right within an organization
Batch and Streaming in the World of Data Science and Data Engineering - A good overview of both batch and stream processing as it relates to DS/DE
The Journey from Batch to Real-time with Change Data Capture - An explanation of the different technologies that can be used in CDC
CICD and Data - An older post that helps give rise to DataOps
Doing The Hard Things First — Lessons From Our Cloud Journey - The financial industry is usually the last when it comes to bigger migration efforts, so this was a good overview of Capital One's cloud migration
Scaling to Billions of Requests-The Serverless Way at Capital One - How Capital One properly scales to support all the transactions it handles
Guardrails for AWS Event-Driven Serverless Architectures - Best practices for using serverless technologies in AWS
The 3 R’s of SREs: Resiliency, Recovery & Reliability - Even though this post is about SRE, many of the same principles apply to DRE as well
3 Considerations for Containers & Serverless Compute Options - Things to consider when making the migration to serverless
Serverless Stream Consumers — Common Pitfalls and Best Practices - Best practices for properly ingesting stream data
4 Serverless Myths to Understand Before Getting Started with AWS - A good overview on misconceptions to ignore before getting started with serverless
Embrace the Chaos … Engineering - A good overview on how to properly do chaos engineering
6 Principles of a Well Managed Change - Good principles to consider when it comes to bigger changes
Governance in a DevOps Environment - Properly integrating DevOps with governance
4 Steps for Pairing the Cloud and DevOps to Improve Resiliency - A very good post on how DevOps can be used to improve overall resiliency of an architecture
https://medium.com/capital-one-tech/focusing-on-the-devops-pipeline-topo-pal-833d15edf0bd - The intersection of Agile and DevOps
Continuous Chaos — Introducing Chaos Engineering into DevOps Practices - How chaos engineering and DevOps can feed off of one another
Continuous Delivery and What the Heck Happened to QA? - Why CD works and the importance of having a QA environment
DevOps is a State of Mind, Not Just a Role - I 100% agree with the premise of this post, as DevOps is a bigger adjustment than just learning a set of principles
No Testing Strategy, No DevOps - A great overview on why proper testing is needed to successfully pull off DevOps
The Mon-ifesto Part 2: Alerting and Graphing - Proper alerting/graphing principles in application monitoring
The Mon-ifesto Part 3: Alert Response and Post-Mortem - Postmortems are an underappreciated aspect of monitoring and incident management, but they're key to ensuring that the same issues don't recur in the future
Enterprise Architecture at Chick-fil-A - A great view of how important architecture can be in the restaurant industry.
Site Reliability Engineering at Chick-fil-A - An interesting overview of SRE with CFA.
Decentralized Model Ops Platform w/ Apache Airflow - A good overview of how Airflow is used to power MLOps at CFA.
Defining Clever’s Engineering Culture - The principles mentioned in this engineering culture should be the standard.
DevOps Best Practices: Opinionated Software That Drives a Successful DevOps Culture - A solid collection of best practices when it comes to DevOps.
DevOps Has Evolved Beyond Shift Left - Shift left isn't enough when it comes to DevOps and other key practices. Time to think bigger.
What is data partitioning, and how to do it right - Partitioning can make or break data applications, so it's important to know how to set it up properly.
Establishing Communication - A good post on the importance of communication in effective graphic design.
Databricks cost management at Coinbase - A good overview on various Databricks cost-savings practices, many of which I've implemented with successful outcomes.
Product & Tech — Better Together! - Another post stressing the importance of the interaction between tech and product teams.
Staff and Principal Engineers: why do we need them now? (Part 2) - Principal/staff engineers are paramount to larger organizations, so this was a good summary of how they made a difference at Commercetools.
How we Roadmap in 2021 - Effective roadmapping makes the Agile process a lot easier than it tends to be.
It’s done! Or is it? - A proper definition of done removes the guesswork from sprint planning.
Building Great Products at Compass IDC - A good guide on what it takes to build "great" products.
Repositories — One or Many - This is worth more of a discussion than you think. The monorepo vs. multiple repo debate can have cascading effects depending on which approach you choose.
The Engineering Manager Guide: Spinning Up a Results Oriented Team - Teams need to be focused on impact/results. This was a great overview on the practices needed to get there.
Writing Good Commit Messages - Leaving this as more of a reminder to myself as my commit messages are lazy (and occasionally involve curse words)
Is Apache Kafka a Database? With ksqlDB, Most Definitely - A good overview of ksql and how it compares to traditional databases.
Effectively attending a tech conference - Making the most out of a tech conference is harder than it seems.
Credit Karma Data Explorer - A good overview of how Credit Karma is making data discovery easier.
How Engineering Rotation Programs Can Help Teams Scale - As someone who started out in a rotational program, I can say these can be really effective for developing the future of your workforce.
How Much Can Bad Data Cost Us? - Bad data has many side effects, and that is why data quality is a fight you must always take on.
DataDoc — The Criteo Data Observability Platform - A wonderful overview of an effective data observability platform and how it met all of its use cases.
Technical Data Roadmap: Why and how to build it using a maturity matrix? - An effective technique for successful roadmapping.
Big Data Quality at Criteo - You can never mention data quality enough.
Data Governance at Criteo - A good overview of how an effective data governance process can be set in place.
The Analytics Development Lifecycle (ADLC) - A good summary of how SDLC can be applied to analytics
All Together Now: FinOps, Kubernetes, and Platform Engineering - Applying FinOps to Kubernetes savings
The Best Way to Control Kubernetes and Cloud Costs - More FinOps with Kubernetes
Dagster at all 5 Steps of the Development Lifecycle - Having an orchestrator involved at all steps of the development lifecycle is really helpful for data engineers. This is how Dagster tries to accomplish just that.
Declarative Scheduling for Data Assets - Declarative scheduling just makes sense when it comes to managing data assets.
Partitions in Data Pipelines - Thinking in partitions helps set Dagster apart from its competitors
What Dagster Believes About Data Platforms - "Data engineering is software engineering". Preach!
Balancing the Data Scales: Centralization vs. Decentralization - Centralization vs. decentralization is a never-ending topic, but this is a good primer on the pros and cons of each.
How to Make Data a Team Sport - Data democratization can be a big challenge at organizations, but when it's done well, it helps everyone.
Data Visibility -- A Primer - Data quality, lineage, and more are all key to proper data visibility.
Data Observability and Monitoring with DataOps - A very useful overview of DataOps and a platform that supports it
Use DataOps With Your Data Mesh to Prevent Data Mush - A good post on how DataOps and data mesh are intertwined
The Emergence and Evolution of Analytics Engineering at Deliveroo - Analytics engineering is definitely on the rise as of late, so this is a good introduction to what exactly that is
CloudFormation To Terraform - I've always been a proponent of Terraform, and it was good to see that Deliveroo agrees with me
Data Sink - An application of data sinks
How Discord Stores Billions of Messages - Discord is one of the fastest growing communities out there, so it's interesting to see how they manage to hold onto all those messages
How Discord Creates Insights From Trillions Of Data Points - Dealing with a lot of data isn't easy, so we can all take a page from Discord
How Data Science Informs Strategy Innovation At Discord - A good post about how relevant Data Science needs to be in the grand scheme of things
How Discord Stores Trillions of Messages - An impressive view of how Discord has continued to grow and managed to sustain that growth
How Discord Uses Open-Source Tools for Scalable Data Orchestration & Transformation - A good overview of how Discord overhauled their orchestration platform to Dagster and dbt
Building a Source of Truth for an Inventory with Disparate Data Sources - Bringing together a single source of truth in a massive organization is definitely a challenge
Meet Sibyl – DoorDash’s New Prediction Service – Learn about its Ideation, Implementation and Rollout - I've always been impressed by the role ML plays in the food service industry, so this was a cool implementation from DoorDash
Lifecycle of a Successful ML Product: Reducing Dasher Wait Times - Another good overview of the role that ML plays in the food service industry
Ship to Production, Darkly: Moving Fast, Staying Safe with ML Deployments - DevOps meets ML
Organizing Machine Learning: Every Flavor Welcome! - A solid set of principles for growing ML
Using Metrics Layer to Standardize and Scale Experimentation at DoorDash - As someone wanting to know more about the metrics layer, this was a great post with a very detailed overview
How DoorDash Defines Great Engineering Management - I love the transparency behind how DoorDash wants to deliver on their management practices. This is a good template to follow.
How DoorDash Fosters Meaningful Engineering Career Development - A great model to follow for engineering development.
Five Common Data Quality Gotchas in Machine Learning and How to Detect Them Quickly - A good primer on a proper data quality framework
Finding Joy in Git Conflict Resolution - A cool way to make merge conflicts a lot easier
Data Science & Analytics: Practitioner Insights - A good set of principles for bringing the most out of data
Stars and Dimensions - For those who want to learn more about data modeling, this post has a nice refresher
Balancing quality and coverage with our data validation framework - Data validation frameworks are very helpful, and I like the way Dropbox has implemented theirs
Why we chose Apache Superset as our data exploration platform - Superset seems to fly under the radar from time to time but is always a dependable choice
Lessons learned in incident management - Oncall can be a messy process sometimes, but Dropbox has made it much more streamlined
How we reduced our cloud spending by 20% - A good overview on various ways you can reduce your Cloud spend
Meaningful metrics: How data sharpened the focus of product teams - A good review on the importance of metrics to product teams
Why You Should Make Everyone a Project Lead - Leading a project is a great way to further your leadership skills
Towards Machine Learning Observability at Etsy - A good overview on how Etsy is keeping all their ML models within scope
Software Architectural Patterns in Data Engineering - A helpful explanation of the software patterns underlying all our popular DE tools
Rethinking Data Visualization - Product thinking applied to data visualization
Unified Machine Learning Platforms At Expedia Group - Awesome overview of Expedia's ML journey
The Importance of Being a Code Reviewer - A good set of practices to follow when it comes to code review.
Enhancing Data Reliability With An SLO Platform - SLO platforms are very helpful for monitoring
Enabling static analysis of SQL queries at Meta - A really neat overview of how FB handles SQL linting, amongst other things
Move faster, wait less: Improving code review time at Meta - FB's code review, especially considering it's a monorepo, is extremely well done
Tulip: Schematizing Meta’s data platform - Logging is very important to FB, so this is good insight into how that performance is maintained
Scaling data ingestion for machine learning training at Meta - I didn't necessarily understand everything here, but this was an interesting read nonetheless
Improving Meta’s SLO workflows with data annotations - Annotations can certainly give more insight into observability
Introducing Zelos: A ZooKeeper API leveraging Delos - Interesting overview of how FB plans on moving from ZooKeeper to something better suited to their scale
BellJar: A new framework for testing system recoverability at scale - Recovering from an outage can't be easy for something the scale of FB, so this was a good overview of how they accomplish it
SLICK: Adopting SLOs for improved reliability - FB's monitoring is top-notch, and it's overviews like these that show why
Nemo: Data discovery at Facebook - FB's data discovery, speaking from personal experience, is immensely impressive
Aria Presto: Making table scan more efficient - Table scans are a painful activity, so making them more efficient holds a lot of weight in SQL engines
Getafix: How Facebook tools learn to fix bugs automatically - Obviously treading into dangerous territory, but automating bug squashing could be very useful for many places
Migrating Messenger storage to optimize performance - How a service the size of Messenger is able to stay afloat
Rapid release at massive scale - DevOps applied to FB
Facebook Chef cookbooks - How Facebook (although an older post) puts CI/CD to use
Engineering Culture: Code ownership - Code ownership is certainly a debatable topic
Scaling Mercurial at Facebook - The Mercurial monorepo at FB is gigantic, so this is an interesting insight into how it's actually serving the thousands of engineers who work on it.
Presto: Interacting with petabytes of data at Facebook - Presto laid the foundation for what's Trino now, so understanding how Presto is as efficient as it is will help explain Starburst Galaxy and the like
Join Optimization in Apache Hive - Older article, but join optimization in Hive is still a relevant topic
Scaling Out - An earlier post before FB was the FB we know today, but still a good lesson to be learned
Scheduling Jupyter Notebooks at Meta - A bit specific to Meta due to Bento not being open-source, but good principles nonetheless
Data engineering at Meta: High-Level Overview of the internal tech stack - Best thing you'll ever read on comparing the Meta DE tech stack to an open-source one
The future of the data engineer — Part I - A great read on the future of DE
Four Analytics Best Practices We Adopted — and Why You should Too - Good practices to follow for a successful analytics implementation
Analytics Career Development at Meta - What career advancement at Meta looks like
Automating data removal - A good system to remove data with reduced risk
What it takes to be a Senior IC at Meta - A good breakdown of senior vs. other levels of IC
Composable data management at Meta - A good introduction to setting up a composable data stack, which is becoming more and more relevant
How Meta discovers data flows via lineage at scale - Nice overview of how Meta is able to successfully capture lineage across their gigantic codebase
How we manage documentation at Funding Circle for our Data Platform - A great guide on properly handling documentation
Data science and data analytics – know the difference - These terms sometimes are used interchangeably but have their differences, so it's important to distinguish them
7 common Big Data security issues - Security is sometimes an afterthought when it comes to big data, so it's important to be aware of the various issues you may encounter while setting up these applications
Introducing Developer-less Data Workbench — Making business analysts, Masters of the data! - Considering this was written in 2015, this is an impressive overview of data enablement with automation
Apache Airflow on AWS ECS - Many different implementations of Airflow are available, but I haven't seen too many leveraging ECS before
Let me automate that for you - Obviously, we want automation wherever we can have it, so this was a simple walkthrough of how it's done at Gamechanger
Data Interruption Process - This was a strange way to word on-call, but it's an effective (albeit older) approach nonetheless
What Good Engineers Do - A solid set of principles for what makes a good engineer
How we store and process millions of orders daily - For those who want to know more about DynamoDB, this is helpful
Embracing a Docs-as-Code approach - Documentation is an often overlooked area, but this is a good approach to making sure it remains a chief priority
Real-time data ingestion in Grab - How food service handles real-time data ingestion
Trident - Real-time Event Processing at Scale - I was not too familiar with IFTTT (if this, then that) design before, so this was an interesting read
The Accidental Tech Lead - Growing into being a tech lead, which can sometimes happen by accident just on account of experience
Cultivating Engineering Growth - Good tips on how to enable engineers for success through mentorship
Kubernetes Production Best Practices - Part I - For those using Kubernetes in their workflows, a solid set of best practices
How to Build Event-Driven Architecture on AWS - A good tutorial on setting up event-driven architectures in AWS, including the different routes that can be taken
How I Learned to Stop Worrying and Love Tech Debt - The term "papercuts" is definitely a reasonable way to pull in tech debt items into planning
How HelloTech’s working and knowledge sharing culture supports a company on scale - Companies with good knowledge sharing cultures are the ones whose employees succeed the most IMO
SLOs for everyone with Sloth - A very well-detailed explanation of how HelloFresh has full-scale monitoring for their SLOs in place
How HelloFresh establishes Data Quality with an in-house tool - A very nice implementation of data quality and attempting to shift left with it as well
Data driven Snowflake optimisation at HelloFresh - It's no secret in the DE world that Snowflake can be expensive. A good guide on how to tune down those costs.
Building a Data warehouse with Hive at Helpshift — Part 1 - A little more outdated, but still a useful overview of how you can build a warehouse with Hive as your backbone
Building for Balance - A very thorough overview of how Instacart finds the balance between fast deliveries and high-earning opportunities for their drivers
The Next Era of Data at Instacart - Good post on the future of the data org at Instacart
Adopting dbt as the Data Transformation Tool at Instacart - Good to see bigger companies starting to adopt dbt
Democratizing AI to Accelerate ML Model Development in Weeks vs. Months - A good overview on how the ML development process has been sped up at Intuit
How to Ensure Release Candidates are Good2Go? Automated Performance Pipelines. - Proper performance testing as a part of the CI/CD process is not something that's done enough, but this is a good set of principles to employ to accomplish that
A story of introducing data lineage into LINE's large-scale data platform - A good implementation of lineage in many different capacities at LINE
Super Tables: The road to building reliable and discoverable data products - LinkedIn's overview of "super tables" helps bring out the best in their data products
Open Sourcing Venice – LinkedIn’s Derived Data Platform - An impressive data platform implementation from LinkedIn
Scalable Automated Config-Driven Data Validation with ValiData - A nice way to automate data validation
LakeChime: A Data Trigger Service for Modern Data Lakes - A great idea of how to ingest data as soon as it's available
Right-sizing Spark executor memory - A good overview on Spark tuning
Practical text-to-SQL for data analytics - Effective guide/overview of building an SQL bot and why it'd be helpful in a larger organization
Securing Apache Airflow UI With DAG Level Access - DAG-level access may be the next evolution of Airflow UI access combined with RBAC
Open Sourcing Amundsen: A Data Discovery And Metadata Platform - Amundsen is becoming a very popular data discoverability platform, and for good reason
Running Apache Airflow At Lyft - Lyft is one of the big "power" users of Airflow, and their model can serve as a template for many
Big Savings On Big Data - A nice overview of how Lyft managed to bring down their costs in their processing
Gotchas of Streaming Pipelines: Profiling & Performance Improvements - Good tips on optimizing streaming pipelines
From Big Data to Better Data: Ensuring Data Quality with Verity - A very thorough overview of a great data quality platform
ETA (Estimated Time of Arrival) Reliability at Lyft - A thorough overview of how Lyft tries to calculate ETA
Searching for quality and speed? Observability can help - How observability is helping keep McDonald's development moving quickly
A single source of truth: Building a design system library - This provides a good template for those who want to ensure they provide a consistent user experience
Proactive monitoring: The why, what and how - Proactive monitoring helps prevent bigger incidents from ever arising. It's the best way to pull off proper monitoring.
Data Products Reliability: The Power of Metadata - A good overview of how Miro is implementing data contracts
Navigating the Netflix Data Deluge: The Imperative of Effective Data Management - A great post on how Netflix manages storage costs at scale
ETL development life-cycle with Dataflow - A very good overview of the E2E ETL process with Dataflow at Netflix
Part 1: A Survey of Analytics Engineering Work at Netflix - A good overview of how analytics engineering is applied at Netflix
Congrats, You’re On Call! Now What? - How to effectively handle an on-call rotation
Engineering Principles (v1) at Nextdoor - A gold standard for engineering principles
Coordinated Cost Savings - Cost savings is a team effort and takes a village, as this post details
The next generation of Data Platforms is the Data Mesh - A very solid explanation of why data mesh is needed in data platforms
Next-Gen Data Movement Platform at PayPal - Lots of parts in play, but detailed insight into everything that drives PayPal
The Journey of Metadata at PayPal - Bringing data ownership and discoverability to the masses at PayPal
Gimel: PayPal’s Analytics Data Processing Platform - The coolest part of this blog was realizing Romit now works in a related team at Disney :). But this platform is certainly impressive nonetheless.
Improving efficiency and reducing runtime using S3 read optimization - Reducing runtime with S3 reads is every data engineer's dream
How Pinterest runs Kafka at scale - A good overview on how Kafka can be effectively scaled within an organization
How (and Why) Postman Created a Data-Driven Hiring Process - I have never really liked the interviewing process, on both sides. Postman has a good model in place here.
The Postman Data Team’s Hub-and-Spoke Model - A good explanation of how the hub-and-spoke model works for Postman and its data teams
How Postman Does Data Democratization - A very thorough overview of how Postman enhances their data with proper democratization
Trino at Quora Scale: Cost, Speed, and Reliability - For those using Trino/Presto, a good overview on how it's done in a larger environment
Accelerating experimentation with MLOps - A great resource for those who want to know more about best practices in MLOps
Data Science: Principles for Success - A solid set of principles for enabling success in a Data Science team
Data Discovery - A sensible implementation of Amundsen
Reflections On Designing An Enterprise Data Warehouse - Tips on how to design an effective data warehouse
The Ops Dojo - I'm all for the term "dojo" to better describe more of what we need to be doing
The 25 Percent Rule for Tackling Technical Debt - 25% allotment for tackling technical debt would be a dream, but Shopify raises a very valid point on why it's necessary
The Hardest Part of Writing Tests is Getting Started - A very truthful title. TDD is needed but actually getting to that initial state can be a challenge.
How Good Documentation Can Improve Productivity - As someone who very much values good documentation, I couldn't agree more
Three Essential Remote Work Practices for Engineering Teams - Some of these are easier said than done, but very much true for remote work these days
Reducing BigQuery Costs: How We Fixed A $1 Million Query - Good tips on how to keep your costs low
A Software Engineer's Guide to Working Across Time Zones - As someone who works on a team with teammates halfway around the world, very relatable points
How to Structure Your Data Team for Maximum Influence - The "Diamond Defense" is not one I've ever seen before, but it makes sense as a team structure
On the Importance of Pull Request Discipline - Good practices to follow for raising PRs
When Culture and Code Reviews Collide, Communication is Key - More relevant points than you might think
Six Tips for Staying Technical as a CTO - My fear when getting into management is not being technical, so it's cool to see this advice on how to "stay in the game"
5 Steps to Bounce Back from a Negative Performance Review - A bad performance review isn't the end of the world. It provides an opportunity to really grow.
Lessons Learned From Running Apache Airflow at Scale - Shopify has a good model in place for running Airflow
Asynchronous Communication is the Great Leveler in Engineering - Asynchronous communication is absolutely necessary in our current state of work
Data Is An Art, Not Just A Science—And Storytelling Is The Key - Absolutely agree with the title here. Telling a story with data is critical.
The Magic of Merlin: Shopify's New Machine Learning Platform - Merlin is a very cool implementation of ML
A Data Scientist’s Guide To Measuring Product Success - Good tips on how to better enable product success
Using Terraform to Manage Infrastructure - As a big Terraform proponent, this is a good overview on how Shopify is using it
Shopify's Playbook for Scaling Machine Learning - A good model to follow for ML
Search at Shopify—Range in Data and Engineering is the Future - A great post on why range is necessary for future development
Shopify’s Unique Data Science Hierarchy Of Needs - Shopify has a good model in place here for Data Science
Five Tips for Growing Your Engineering Career - A good set of tips for elevating your career
The AWARE Development Plan - A very good acronym to follow for a successful career
5 Steps for Building Machine Learning Models for Business - Good tips on getting ML into the picture
Modelling Developer Infrastructure Teams - A good explanation of the difference between horizontal and vertical teams
Bridging the Gap Between Developers and End Users - Very good tips on how to bring product and tech closer together
A Guide to Running an Engineering Program - Not sure if I'll ever get to this stage, but this seems like a very sensible guide if that day ever were to come
Other Driven Developments - Developments we'd never think about, but they're totally out there
How I Define My Boundaries to Prevent Burnout - Good tips here, including ones I need to follow more honestly
4 Tips for Shipping Data Products Fast - As someone who works with data products, I can attest following these will make things go much smoother
How to Make Dashboards Using a Product Thinking Approach - Good principles to follow for getting the most out of dashboards
How to Reliably Scale Your Data Platform for High Volumes - I feel like this isn't used as often as it should be, but it totally makes sense for making sure platforms scale
Software Release Culture at Shopify - This should set a standard for proper release culture
Great Code Reviews—The Superpower Your Team Needs - Good practices to follow for successful code reviews
Successfully Merging the Work of 1000+ Developers - A good set of proper CI standards
How Shopify Scales Up Its Development Teams - I very much agree with the points listed here on upleveling your team
Five Common Data Stores and When to Use Them - For those who need to evaluate what type of data store to go with, this is a good reference
Implementing ChatOps into our Incident Management Procedure - I very much agree with the role of ChatOps in incident management
Code Style Consistency for Shopify’s Decade-Old Codebase - Code style is something that I try to preach and uphold for our team
Why Shopify Moved to The Production Engineering Model - Having a model in place like this makes everyone's lives easier
Developer Onboarding at Shopify - Proper onboarding can make a world of difference for engineers, and it seems like Shopify has it down pat
Unlocking Real-time Predictions with Shopify's Machine Learning Platform - Very well-done explanation of how Merlin is being used at scale today
What Being a Staff Developer Means at Shopify - Being a staff developer is considered the ultimate rank, but what does it take to get there? This is a helpful guide to getting to that point.
Team Size and Why It Matters - A good breakdown of how smaller vs. bigger teams differ
Automating cloud governance at scale - For those who work in governance, this is a good way to keep guardrails on resource provisioning
Using engineering principles to create autonomous teams at scale - A good set of principles for ensuring teams are successful
Monoliths and Microservices - How to move away from monoliths to microservices
BuildRock: A Build Platform at Slack - Proper CI/CD platforms help unblock many teams, so it's imperative to do it right
Infrastructure Observability for Changing the Spend Curve - Generally, it's not CI infrastructure that hogs costs, but it's always good to be aware of everything
Data Lineage at Slack - Effective implementation of data lineage, especially with Slack notifications involved
How We Design Our APIs at Slack - For those interested in API design, this is a good set of principles to follow
Starting an Initiative - Finding impact can be difficult at first, but persistence is key
How Big Technical Changes Happen at Slack - Good discussion on when the hype is real and when to join the trend
Deploys at Slack - A very solid CI/CD implementation
Disasterpiece Theater: Slack’s process for approachable Chaos Engineering - Chaos engineering helps keep websites like Slack up around the clock
Data Wrangling at Slack - An older article, but an effective implementation for data wrangling
Data Consistency Checks - An older article, but still covers valuable points related to data quality
Service Delivery Index: A Driver for Reliability - For those in SRE, this is a good primer.
Executing Cron Scripts Reliably At Scale - A bit strange not to see Slack using a service like Airflow to handle all of this, but a good overview nonetheless.
Unlocking Efficiency and Performance: Navigating the Spark 3 and EMR 6 Upgrade Journey at Slack - A walkthrough of how Slack upgraded all of their processes to use more recent versions of EMR/Spark.
Cloud Trends: A Mainstream Evolution to DataOps - A good overview on the relevance of DataOps in this current era
The Building Blocks of Success: Is Data Mesh Right for My Organization? - Data mesh is (rightfully) a buzzword right now, but that doesn't mean it's for everyone. This is a good guide on when data mesh makes sense.
Data Is Everywhere. Is Yours Under Control? - A good post on the relevance of data governance
Data Modelling is More than Documentation - A good explanation on the different types of data models
Deconstructing Data Mesh Principles - A good overview on the different key principles of a data mesh
Data Mesh: is the argument a strawman? - A post battling the hype of data meshes
Building a Culture of Data and Insights - A nice overview of how to enable a data-driven culture
Building a Healthy On-Call Culture - Tips for helping ensure a smooth on-call process
How (Not) to Build Datasets and Consume Data at Your Company - An effective approach towards ensuring healthy data usage
Getting a Team Back on Track - This is an underdiscussed topic that should be mentioned more. A helpful set of tips for helping keep teams afloat amidst change.
A Better Model of Data Ownership - A helpful definition of what exactly ownership means in relation to data
Why We Switched Our Data Orchestration Service - Flyte isn't necessarily on Airflow or Prefect level yet, but Spotify's explanation of why they're doing it makes total sense
Achieving Team Purpose and Pride with Scrum - Getting the most out of scrum, done the right way
Managing Clouds from the Ground Up: Cost Engineering at Spotify - We all could benefit from a dashboard tool like this (and many companies are now realizing how relevant it is)
How We Improved Data Discovery for Data Scientists at Spotify - A very thorough overview of how Spotify has implemented data discovery
TC4D: Data Quality By Engineers, For Engineers - A fun initiative for bringing out the best in testing
Qualities of Quality - A very solid set of principles for holding up quality
Analytics at Spotify - Old post but that only goes to show how much Spotify embraces data
Agile à la Spotify - You don't see many places rewriting the Agile manifesto, but the principles Spotify's outlining make sense
Fleet Management at Spotify (Part 1): Spotify’s Shift to a Fleet-First Mindset - Maintaining a lot of components is extremely difficult, but Spotify makes it look easy with this approach.
Getting More from Your Team Health Checks - How to get the most out of your team health/pulse checks, something that's not done enough.
Data Platform Explained - A data platform at a company that handles data like Spotify is bound to be interesting. I look forward to the continuation of this series.
Data Platform Explained Part II - A continuation of the data platform series
Unlocking Insights with High-Quality Dashboards at Scale - A good set of criteria for high-quality dashboards
Are You a Dalia? How We Created Data Science Personas for Spotify’s Analytics Platform - Persona usage for making sure a platform is built appropriately is a smart model
Creating a Code Review Culture, Part 1: Organizations and Authors - Good tips on how to more effectively put code together for review
Creating a Code Review Culture, Part 2: Code Reviewers - Good tips on how to be an effective code reviewer
Data Traceability and Lineage - A bit older, but it sets the foundations for effective data lineage
Why Devs (Should) Like Estimates - Good tips on how to more effectively estimate when it comes to planning
A Culture of Trust - Trust is one of the most important things you need to have within a team, and I totally agree with Stack Overflow's discussion on it
Developer Turned Manager - A good retrospective on transitioning from development to the management side of things
Migrating Spark from EMR on EC2 to EMR on EKS - EKS is the "new" standard for Spark processing, so this is a helpful tutorial on moving Spark from EC2 to EKS
Aggressively Helpful Platform Teams - "Aggressively helpful" is exactly what platform teams need to be in order to better enable success within an organization
What is DevOps? - A well-done primer on DevOps
Creating Core Values that Actually Stick - Core values are often brushed away, but the organizations that really put time and effort into them are the ones that stand out amongst the crowd
Chaos Leads to Resilience - Chaos engineering can better protect your system in the long run, so it's cool to see how Target is preparing themselves for those scenarios
Review Scrutiny - Code review etiquette is an underappreciated topic but a good one to go back to from time to time
Making the data dream a reality - The origins of data mesh and how it can better enable data-driven thinking
Database Management: Behind-the-Scenes Lessons From a Data Architect - For those who want to learn more about data centers and the ins and outs of big data, this is definitely a good post
Big Data Architecture for the Masses: A ksqlDB and Kubernetes Tutorial - A good overview of ksqlDB
SRE: On-Call Procedure at trivago - On-call procedures would be a lot better for everyone if they followed how Trivago is doing it
Remastering Guilds After Five Years - Guilds are a great way to bring out more collaboration within an organization
Creating a Culture of Quality - A good post on proper quality when it comes to CI/CD
Technical Decision-Making - A good guide to help standardize the technical decision-making process
What Have I Even Been Doing Today? - How to come to terms with moving from an IC into a management role
Twitch Engineering: An Introduction and Overview - Older post, but still a cool overview of how Twitch is set up
Data Quality Automation at Twitter - For those using Great Expectations, this is an effective look at how Twitter is doing it
Powering real-time data analytics with Druid at Twitter - Druid may not be the most relevant platform anymore, but it's cool to see how Twitter is using it to power their use cases
Next generation data insights using natural language queries - This implementation of Qurious looks really, really cool
Advancing Jupyter Notebooks at Twitter - Part 1 - How Twitter leverages Jupyter notebooks for true data-driven analysis
Processing billions of events in real time at Twitter - 400 billion events per day is insane, so to see how Twitter's able to do it under the hood is very interesting
Kafka as a storage system - You don't really think of Kafka being used for storage, but Twitter seems to have done it effectively
Building Twitter’s ad platform architecture for the future - An AdServer per product is a lot, but it definitely does better enable proper scale
Democratizing data analysis with Google BigQuery - A very sensible approach to proper data democratization at Twitter
Interactive Analytics at MoPub: Querying Terabytes of Data in Seconds - An effective use of Druid and microservices to power interactive analytics
ZooKeeper at Twitter - Similar to FB, a detailed breakdown on how a big platform is using ZooKeeper to stay afloat
Productionizing ML with workflows at Twitter - How Twitter uses Airflow to solve their ML use cases
Using Deep Learning at Scale in Twitter’s Timelines - This is a really cool overview of how deep learning is used to power what we see on our Twitter timelines
The Infrastructure Behind Twitter: Scale - This is a lot of context on how Twitter manages to scale, and you know it's only gotten more complex since then
Discovery and Consumption of Analytics Data at Twitter - A pretty detailed discussion on data discovery, especially given that this was in 2016
Introducing WorkflowGuard: The Workflow Governance and Observability System That Oversees over 120,000 Data Workflows - Automated tools like these will become more of a reality, especially in the larger organizations
Crane: Uber’s Next-Gen Infrastructure Stack - The future of big data processing at Uber
Cost Efficiency @ Scale in Big Data File Format - A bit advanced, but a nice overview on how Uber keeps their costs in check
Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework - A good implementation of real-time analytics
How Data Shapes the Uber Rider App - A good overview on what role data plays in the Uber app
How Uber Achieves Operational Excellence in the Data Quality Experience - +1 for operational excellence and proper data quality
Continuous Integration and Deployment for Machine Learning Online Serving and Models - How Uber tackles some of their MLOps challenges
Uber’s Journey Toward Better Data Culture From First Principles - I'm a big fan of the principles mentioned in this page
Turning Metadata Into Insights with Databook - A data discovery/observability platform that can be the gold standard for others
Monitoring Data Quality at Scale with Statistical Modeling - Very useful applications of modeling for proper DQM
Uber’s Data Platform in 2019: Transforming Information to Intelligence - A bit outdated by DE standards, but valuable insight into how Uber manages to continue to perform at scale
Solving Big Data Challenges with Data Science at Uber - Fun applications of Data Science within Uber
Managing Uber’s Data Workflows at Scale - Eliminating single points of failure and converging to unified products when possible are very solid principles to consider for larger platforms
Databook: Turning Big Data into Knowledge with Metadata at Uber - Cool overview on how Uber brings out their data discovery
Turbocharging Analytics at Uber with our Data Science Workbench - Self-serve analytics platforms like what Uber has built are the backbone of larger organizations
Engineering Data Analytics with Presto and Apache Parquet at Uber - How Uber uses Presto and Parquet for an efficient SQL engine
ETA Phone Home: How Uber Engineers an Efficient Route - An interesting read on how Uber puts routes together
Identifying Outages with Argos, Uber Engineering’s Real-Time Monitoring and Root-Cause Exploration Tool - An earlier but still extremely relevant post on anomaly detection and the role it plays in monitoring
The Pulse of a City: How People Move Using Uber Engineering - For those into data visualization, a nice view into Uber transport in big cities
Evolution of Data Lifecycle Management at Uber - DLM is a very relevant topic these days, especially with an increased focus on costs. How Uber handles it is a good standard to follow.
Dynamic Executor Core Resizing in Spark - OOM errors in Spark are the worst. This is a good method to make that issue easier to deal with.
Attribute-Based Access Control at Uber - Proper access control is tricky when it comes to tables, so this is a good foundation for others to follow.
Announcing Cadence 1.0: The Powerful Workflow Platform Built for Scale and Reliability - There's always more room for workflow engines, so it's cool to see what Cadence can bring to the table.
Sparkle: Standardizing Modular ETL at Uber - I am all for standardizing ETL development wherever it can be. Sparkle seems like a very smart approach.
Preon: Presto Query Analysis for Intelligent and Efficient Analytics - Excellent approach to query optimization/analysis
Genie: Uber’s Gen AI On-Call Copilot - Excellent use of LLMs to cut down on manual on-call effort
Presto® Express: Speeding up Query Processing with Minimal Resources - Good read on chunking and speeding up query processing
Designing for Data - Telling a story with data is underrated
Rapid & Reliable ML Experiments using MLOps Best Practices. - A good application of MLOps principles
element: Walmart’s Machine Learning Platform - A very good overview of the ML platform that's in place at Walmart
Unsung Saga of MLOps - This is a good set of principles to really kick ML engineering into high gear
MLOps — Is it a Buzzword??? Part -1 - MLOps is more than just a buzzword or a trend. It's a cultural change.
The Importance of Good Data - For those who like to sleep on data quality, this one's for you
Pillars of Walmart’s Demand Forecasting - The pillars used to accomplish proper demand forecasting make sense for any company in the same line of work
DataBathing — A Framework for Transferring the Query to Spark Code - We actually use a similar process to simplify SparkSQL queries. It's good to see others do the same.
Engineering Acceleration with InnerSource Culture - Inner-source culture is a big one in companies.
Unified Monitoring of ETL Performance with BumbleBee - A good overview of how to do effective ETL monitoring
Resiliency Through Message-Driven Architecture - The message/event-driven architecture is definitely a sensible one based on the internals of your application
Cloud Native Architecture Fundamentals - A good overview of what it really means to be Cloud-native
Data as a Service - A lot of these same concepts are a part of data products as we know them now
Auditing Airflow Job Runs - Auditing Airflow job runs is crucial as a part of proper observability
The Keystone of Happy Teams - Psychological safety is a great term to use when distinguishing the average team from happy teams
Building a Platform Team — Laying the Foundations - A great overview on how to really set up a proper platform team
Product Management 101: 8 Steps to Design Better Products - Even as engineers, we should be familiar with many of these concepts so we can help our product stakeholders accordingly
The Power of an Invisible Leader - I've never heard the term "invisible leader" before, but it's a sensible one based on the description
Work Got You Stressed? Here Is My Secret To Controlling The Chaos. - A very applicable guide for me, as I struggle with work stress all the time
5 Principles Guaranteed to Help Build a Strong Team Culture - As someone who is big on team culture, I thought this was a great read.
Introducing our Machine Learning and Data Platforms Team - Platform teams enable a lot of success within an organization
Enabling Supplier Sales Through Real-time Data - How real-time data unlocks more potential for Wayfair
Rolling Back an Airflow Upgrade - Things are never perfect, so this is a good post on how to recover from a failed Airflow upgrade.
Effective Software Design Documents - Effective design documents are super helpful when iterating on a product
Improving Airflow UI Security - A good model for ensuring proper security within the Airflow UI
Knowledge Transfer in Engineering: How to make it go smoothly - Effective KTs are a gamechanger for engineers, so this is a solid set of principles for better enabling them
Spark Data Lineage - I've never really seen Spark being used for lineage purposes, but this is a cool implementation from Yelp on how it can be accomplished
Engineering Career Series: How we onboard engineers across the world at Yelp - Effective onboarding programs make the process so much easier for engineers as they start their journey
Growth Engineering at Zalando - Mentoring and role frameworks help enable growth and success for engineers
Accelerate testing in Apache Airflow through DAG versioning - DAG versioning makes complete sense for dealing with testing alongside ongoing production processes leveraging the same DAGs
Principal Engineering at Zalando - A good primer on what principal engineers mean to an organization
A Systematic Approach to Reducing Technical Debt - The concept of a tech debt rotation isn't a bad idea to help keep that area in check
The Product Playbook - The 4 D's is a sensible approach for product design
Four Pillars Of Leading People - Good principles to follow for those in leadership roles.
The Democratization of ‘Data Science As A Service’ - I'm always a fan of promoting data science/engineering as a service
Discovering Design Sprints - Sometimes the war room doesn't have to be such a bad thing
Data Analysis with Spark - For those newer to DE, a basic overview of Spark.
Dedicated Ownership for Teams at Zalon - A good model for team structure
Thinking Fast and Estimating Wrong - Estimation never seems to be right when it comes to planning in software development, so I totally agree with the message of this post.
Building a Data Streaming Platform - How Zillow Sends Data to its Data Lake - An interesting look at how Zillow combines all its data sources into the lake
Airflow at Zillow: Easily Authoring and Managing ETL Pipelines - Zillow has always had a strong Airflow presence, and this article from 2017 still holds up.
Building a strong foundation to accelerate StreetEasy’s data science efforts - A great post on what it takes to build a data foundation
The Deep Tech Behind Estimating Food Preparation Time - Interesting look into the logic behind food service.
Learning to Be a Tech Lead - Becoming a tech lead is not a simple change and requires shifting your priorities/frame of reference.