
Formalize a process for bulk changes #4469

Closed
3 tasks done
ddbeck opened this issue Jul 8, 2019 · 11 comments · Fixed by #4755
Labels
docs Issues or pull requests regarding the documentation of this project. infra Infrastructure issues (npm, GitHub Actions, releases) of this project KR: Real BCD Key Result: Eliminate true/null values and replace them with "real" values.

Comments

@ddbeck
Collaborator

ddbeck commented Jul 8, 2019

Some recent changes, such as #3427 (enforcing feature orders) and #3888 (removing Edge Mobile), have highlighted that our process for making bulk changes to BCD is not ideal: it takes many PRs, reviewing is slow and tedious, and the difficulty has delayed many desirable changes. We can and should:

  • come up with a better process
  • try it out
  • document it for future use
@ddbeck ddbeck added infra Infrastructure issues (npm, GitHub Actions, releases) of this project docs Issues or pull requests regarding the documentation of this project. labels Jul 8, 2019
@ddbeck
Collaborator Author

ddbeck commented Jul 8, 2019

I did some reading on how projects do large code changes (e.g., how projects use tools like codemod). After taking a look, it sounds like we might benefit from treating bulk changes a bit like database migrations. Our process might look something like this:

  1. Through the normal PR process, add a script with tests that demonstrates the change to be made. Critically, the script should change one and only one thing about the data. Special cases should get their own script (and go through this process separately); manual edits should go through the ordinary change-and-review process.

  2. Conduct the migration. This should be a tightly scheduled process, in which someone (presumably the script author) runs the script and opens a PR and a designated reviewer reviews and merges it. Ideally, this review should be easy because it's already been tested and the resulting diff will contain no manual edits.

  3. Open and review a new PR, which introduces a linter change that enforces the changes applied by the script.
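To make step 1 concrete, a migration script could be a pure transform with no file I/O, so it can be tested against fixtures before ever touching real data. This is only a sketch: the browser key (`edge_mob`) and the tree shape are assumptions standing in for BCD's actual identifiers, and a real script would add a thin wrapper that walks the JSON files and rewrites them.

```javascript
'use strict';

// Assumed key for the retired browser; a real migration would use
// whatever identifier the BCD schema actually defines.
const BROWSER = 'edge_mob';

// Pure transform: returns a new tree with the browser's entries
// removed from every "support" block. The input is never mutated,
// which keeps the function easy to test on fixtures.
function removeBrowser(node) {
  if (node === null || typeof node !== 'object' || Array.isArray(node)) {
    return node;
  }
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === 'support' && value !== null && typeof value === 'object') {
      // Drop the retired browser's key, keep everything else.
      const { [BROWSER]: _dropped, ...rest } = value;
      out[key] = removeBrowser(rest);
    } else {
      out[key] = removeBrowser(value);
    }
  }
  return out;
}

module.exports = { removeBrowser };
```

Because the transform is pure, the migration PR's diff is exactly `removeBrowser(data)` applied to each file, with no manual edits mixed in.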

A lot of this tends to happen already (e.g., @vinyldarkscratch has written and shared fix scripts before). The main difference seems to be in writing tests and scheduling the application of the change. It seems like doing it this way would reduce the number of PRs needed to apply a change, though it may increase the chance that uninvolved PR authors have to rebase their PRs.

If anyone has ideas for improving on this, I'm all ears. Similarly, we need to find a test case (e.g., @vinyldarkscratch's feature sorting might be a good candidate, if she's interested in testing the process), though there are plenty of bulk changes on our wish list.

@queengooborg
Contributor

I think that treating bulk changes like database migrations is just the way to go about it, and I’m in favor of this new review process. I would be more than happy to offer my PR and assist in this!

@ddbeck
Collaborator Author

ddbeck commented Jul 8, 2019

Thanks, @vinyldarkscratch!

I missed a couple of things: one, I didn't quite make it explicit that the change script should come with tests, independent of the real data. Basically, we should make it abundantly clear what the change script does and validate that it does that thing, before running it on real data.

Also, I should probably tag @Elchi3 on this, since we talked a little bit about this recently.

@jpmedley
Contributor

jpmedley commented Jul 8, 2019

How long would this process take from end to end? I'm close to being able to automate most updates for api/ for a full version of Chrome. I ask because I would need to use this process every six weeks.

@queengooborg
Contributor

Personally, I would say that changes like yours aren't really the bulk changes we're thinking about in this discussion, @jpmedley. What we're mostly referring to is things like massive, automated schema changes, or feature sorting, or the removal of a deprecated browser. So, I think you're in the clear!

I didn't quite make it explicit that the change script should come with tests, independent of the real data.

I'd assume that we would want to keep those tests around, correct? This also has me thinking about the linter scripts: perhaps we should have tests for each of the linter functions as well, to make sure that any changes to them will still produce results as expected. Just a little extra thought I figure I'd throw out!

@ddbeck
Collaborator Author

ddbeck commented Jul 9, 2019

@jpmedley Yeah, I don't think your case is impacted by this. I was thinking more of cases where we're transforming existing data. We know what the JSON looks like before and what it should look like after. Maybe I should have used the term "transformations" instead of "changes." Your automation would, presumably, introduce wholly new data and I don't think a process like I've described here would be helpful in such cases.

@vinyldarkscratch It wouldn't be a bad idea to have tests for the existing linter behavior, though I suspect the code itself is well exercised anyway, if how many npm test failures I've seen is any indication. Either way, it's probably better as a separate issue; I don't think we need those tests to start using a process for bulk transformations.

@Elchi3
Member

Elchi3 commented Jul 9, 2019

I'm totally in favor of having a bit more formal process around bulk changes. Once we finally switch to semantic versioning (secretly aiming for 1.0.0 after 0.0.99), such changes can be reflected in the minor or (if breaking) major release versions and be properly announced in release notes as well. If we formalize this process by 1.0.0, I'm sure data consumers would appreciate it :)

I also agree Joe's case isn't necessarily affected here. It's more about structural things. "Transformations" is a good word for it, as is thinking about it like database migrations.

@jpmedley
Contributor

jpmedley commented Jul 9, 2019

(Might I suggest the term 'schema migration'.)

@queengooborg
Contributor

Some conversation regarding milestones led me to an idea that I figured I’d place here. Milestones have due dates, and are used to help track big breakthroughs. Currently, only milestones for sprints exist, but it was mentioned in #4417 that peers can create their own milestones. Perhaps it would be useful here? (Just some 5:27-in-the-morning thoughts to throw out there!)

@ddbeck
Collaborator Author

ddbeck commented Jul 11, 2019

Yeah, milestones might be useful for tracking these migrations, since there will probably be at least three or four issues/PRs for each one: the proposal issue, the PR adding the migration script and test (this might be combined with the proposal itself), the PR running the migration, and the PR enabling the linter check (if applicable). We can try that out too and see if it's seen as helpful or essential.

@ddbeck
Collaborator Author

ddbeck commented Aug 28, 2019

OK, we've merged #3427, the first attempt to go through this new process. My thanks to @vinyldarkscratch for doing the hard parts and @Elchi3 for supporting this experiment.

The good news is that this experiment shows this process is better. Despite some minor hiccups, I can't say I'd like to go back to the old way of doing things. In detail, the outline I described in #4469 (comment) seems to hold up. I did discover a couple of ways we might improve for the next migration:

  • It would be a good idea to have a long-lived tracking issue for each migration. We managed next steps in each successive PR of the process, which caused process discussion in each PR to delay progress a little (e.g., leaving a PR open until some task, like scheduling a migration, was completed). We could have that discussion in the issue that provokes a migration, or in a standalone migration tracking issue, but keeping things organized in the PRs themselves was a bit noisy.
  • The actual migration PR and review should probably be completed by two people who share more than a couple of overlapping daylight hours, even if that means deputizing someone to do the migration PR with a Co-authored-by annotation. A rigid schedule and a lot of time zones between us put a needless burden on @vinyldarkscratch and I'd like to avoid that in the future.

The next step toward completing this is to open a PR documenting the process. I plan to do this in the next few days. Watch this space.

@jmswisher jmswisher added the KR: Real BCD Key Result: Eliminate true/null values and replace them with "real" values. label Aug 30, 2019
@jmswisher jmswisher added this to the Odetta (S3 Q3 2019) milestone Aug 30, 2019