Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors the profile route and related utils for increased clarity #68

Merged
1 commit merged into from
Apr 22, 2019

Conversation

ghost
Copy link

@ghost ghost commented Apr 15, 2019

This PR overhauls the backend of factfinder, hopefully making the process a little clearer.
Data fetches were split out into separate SQL calls, but in my testing none took more than 10s of millisecs, so I don't think we should see huge performance hits. Still the larger lag time is from the call to mongo.

might be easier to look at the new branch, as opposed to the diff, because so much changes. However, I tried to be pretty thorough in commenting. If at any point it is hard to follow, please PLEASE throw a comment in the PR. I want this to be maintainable code! :)

related front end changes are necessary: NYCPlanning/labs-factfinder#709

Copy link
Collaborator

@allthesignals allthesignals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANK YOU

(Review pt. 1 / 2)

Just some assertions about how this works generally, not value judgements:

  • profile and decennial query helpers have been simplified to accept a boolean argument that determines which release year to source for the calculations, reducing the complexity of the queries. Now, the helpers are called multiple times and configured to change year.
    (image 😍)
  • The API's concept of "previous_" and "comparison_" namespaces are now handled more cleanly through a single class definition with one public method, processRaw, which accepts an argument for determining whether it's previous or comparison. This seems to take care of any "special calculations" that must happen after the SQL query. Additionally, the change and difference "calculations" - which I think refer to the reliability calculations - are now neatly contained in explicit utility functions, and applied on a row-by-row basis.
  • Constants for calculations are sanely stored and managed in special-calculations
  • Configurations for special calculations are sanely organized by profile rather than section names.
  • The main events in this refactor are: data-ingestor, change, and difference, with some adjacent utilities that get pulled into these.
  • All formulas are handled with an Excel formula parser so that they can all be found in one place using a syntax that's at least mostly familiar to experts in population division.

Possible bikesheddy Qs:

  • What is newquery and notes for? Can we remove?
  • profile-single.js was edited then renamed to oldquery - does it still need to be in there?
  • What is file q for? Notes?

I would love to take one more and dig more into the 3 "main events" in a second review, but just gonna gush a lil more: this is awesome, and I now feel I can confidently work with this source code without Xanax.

package.json Outdated Show resolved Hide resolved
routes/profile.js Outdated Show resolved Hide resolved
routes/profile.js Outdated Show resolved Hide resolved
routes/profile.js Outdated Show resolved Hide resolved
routes/profile.js Outdated Show resolved Hide resolved
routes/selection.js Outdated Show resolved Hide resolved
schema/schema.sql Outdated Show resolved Hide resolved
schema/schema.sql Outdated Show resolved Hide resolved
Copy link
Collaborator

@allthesignals allthesignals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Review pt. 2 / 2)

Rando thoughts:

  • I can't tell for sure, but maybe you can confirm: does some of the same "math" happen in both the SQL query and the data-ingestor (via the formulas exports)? Like, is this duplicated across? I might have answered my own question: difference and change are handled separately from data-ingestor, these calculations are what happen separately from the initial SQL queries and downstream of the special calculations. Yes?
  • Formulas vs JavaScript: how did you make the distinction between what kind of logic should be handled in the Excel formula parser and what should be handled in JavaScript utilities? I get interpolate, for example, but things like the diff and change calcs include some boolean logic before calling formulas. Is it worth migrating some of that boolean logic into the formulas?
  • Can I help with some of the unit testing?
  • This is amazing.

utils/data-ingestor.js Show resolved Hide resolved
utils/data-ingestor.js Outdated Show resolved Hide resolved
utils/data-ingestor.js Outdated Show resolved Hide resolved
utils/data-ingestor.js Show resolved Hide resolved
utils/data-ingestor.js Show resolved Hide resolved
utils/data-ingestor.js Show resolved Hide resolved
utils/difference.js Show resolved Hide resolved
utils/difference.js Outdated Show resolved Hide resolved
utils/difference.js Show resolved Hide resolved
utils/difference.js Outdated Show resolved Hide resolved
/* is aggregate */ false,
/* is previous */ false,
/* is comparison */ true,
).processRaw(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One API enhancement we could do is change the parameters of DataIngestor to expect an object for optional params like is aggregate, previous, comparision, etc:

new DataIngestor(compareProfileData, profileName, {
  isAggregate: false,
  isPrevious: false,
  isComparison: true,
}).processRaw(),

doChangeCalculations(row);
doDifferenceCalculations(row);
return row;
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!!

@ghost ghost force-pushed the backend-refactor branch 2 times, most recently from 8922781 to a7f6180 Compare April 18, 2019 16:20
@ghost ghost force-pushed the backend-refactor branch from a7f6180 to 14ca69f Compare April 18, 2019 21:17
@ghost ghost merged commit 08e20dc into develop Apr 22, 2019
@ghost ghost removed the in progress label Apr 22, 2019
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant