Blog: How we improved feature flag resiliency #5546
Okay, rough structure is there. I could use some feedback now @andyvan-ph @joethreepwood @liyiy @EDsCODE on content, structure, and if this all makes sense. And @ellie @hazzadous on accuracy of the small infra things I've mentioned here.
Co-authored-by: Ellie Huxtable <[email protected]>
A bunch of edits.
I think it is important to tighten up the intro and get to the "meat" of the article faster (as it is really interesting and I want to make sure as many people as possible get to that point)
So, when thinking about reliability, we want to prioritise defending against things that happen frequently, or have a high chance of occurring over time. This includes things like redis, postgres, or pgbouncer going down. Then, if we have the resources and nothing better to prioritise, we can focus on defending against asteroids.

Today, we can't yet defend against asteroids, nor the entire infrastructure going down, but for other things, like postgres, we've found ways to defend against this, leveraging our special problem constraints.
I like the light touch that a small joke about asteroids brings to the piece. I think it'd be even better if it was only contained in the last 1-2 sentences or only referenced 1-2x at most instead of 3-4x? 🤗
LOL agree I think I liked it so much I forgot I already included it 😂
Have removed and combined the above sentence into one.
Overall awesome work! Really helps to paint a picture of all the work the team has done in the past couple of months to improve feature flags 🎉
I'm not sure what images we would put here, maybe some screenshots of latency improvements from our Grafana as an example? Just to split up the text a bit and make it more (visual) reader-friendly
Co-authored-by: Andy Vandervell <[email protected]>
Hmmm, good question, thinking about this, but nothing great comes to mind. Maybe a flow chart of how things are set up and where the borkages happen? Like, a sample request flow? Request comes in ---> Django server for feature flags ---> fetch feature flag definitions from redis
And the arrows going to redis and pgbouncer are sources of problems & latency.
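To make the flow-chart idea a bit more concrete, here's a rough sketch of the same request flow in Python. It is illustrative only, not the real service code: `trace_request`, `healthy`, and `down` are hypothetical names, the dependencies are stubs, and what the pgbouncer hop is actually used for is an assumption on my part.

```python
from typing import Callable, List, Tuple

# A hop is a label plus a callable standing in for one arrow in the diagram.
Hop = Tuple[str, Callable[[], None]]


def trace_request(hops: List[Hop]) -> List[str]:
    """Run each hop in order and record where the request breaks."""
    trace = ["request comes in"]
    for name, call in hops:
        try:
            call()
        except ConnectionError:
            # This arrow is where the borkage happened; stop here.
            trace.append(f"{name}: FAILED")
            break
        trace.append(f"{name}: ok")
    return trace


def healthy() -> None:
    """Stub for a dependency that answers normally."""


def down() -> None:
    """Stub for a dependency that is unreachable."""
    raise ConnectionError("dependency unavailable")


# Hop labels mirror the proposed diagram; the database read is assumed.
for step in trace_request([
    ("redis (flag definitions)", healthy),
    ("pgbouncer (database read)", down),
]):
    print(step)
```

Swapping which stub is `down` shows which arrow in the chart a given outage would light up.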
Ooh yes this is great!
yes 2nd option is very nice
@neilkakkar + @ivanagas: Thanks both for the graphic stuff. Have added both and done another light polish pass on the copy. I was doing a mental Hacker News pre-mortem and I think the only thing this is missing is... evidence. We say we've made flags faster / more reliable, but there's nothing to back that up. @neilkakkar is there something we can add near the end here? I don't think it needs to be super in-depth, it just needs something to prove it out.
hmm, we don't have our latency logs anymore, because this was > 3 weeks old. So we can't show before & after, but I guess we can show current latency times and then the status page: https://status.posthog.com/uptime/1t4b8gf5psbc?page=3 and how incident rate has gone down. Will add a blurb to the end when I'm back tomorrow, thanks!
just added this section as an appendix