Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate incident targets into journalists and institutions #776

Merged
merged 33 commits into from
Feb 18, 2020

Conversation

chigby
Copy link
Contributor

@chigby chigby commented May 1, 2019

This pull request:

  1. Begins the process of replacing IncidentPage.targets with Institution and Journalist objects.
  2. Adds models for Institution and Journalist, as well as a "through" model of TargetedJournalist relating a journalist to an incident as well as an institution (representing which institution they were with at the time of the incident, if any).
  3. Adds pages in the admin for merging these two new model objects.
  4. Adds a data migration to copy data from the Target table into the Journalist and Institution tables as appropriate. Due to a bug in the modelcluster package, this must be done prior to converting the Incident.targeted_institutions relationship to a ParentalManyToManyField, so there are is a secondary migration file to do that conversion after the data migration is done.
  5. Adds a filter class for handling many-to-many-through relationships for the journalist-incident situation. This allows filtering to work on both the new fields if those fields are enabled as filters in the admin.
  6. Updates the get_summary method on incidents to report counts of both journalists affected and institutions affected (instead of just targets).
  7. Updates the incident page sidebar to display counts of both journalists affected and institutions affected (instead of just targets).

Notably, this pull request does not remove the Target model or its relationship to IncidentPage, only places where that data is displayed publically. I feel that when this pull request is live and working, only then should we feel confident enough to remove the old tables and code.

Note that in order for the new fields to be filterable on the production site, as it stands now, the incident filter settings will have to be manually changed from filtering by targets to filtering by institutions and journalists.

Also notable: this PR doesn't touch the relationship IncidentPage.targets_whose_communications_were_obtained

Addenda 2020-02-12:

  1. Adds another category to replace items in the Targets table with a blank kind field: GovernmentWorker. It functions like Institution above in that its relationship to IncidentPage is many-to-many without a through model. I've added a relationship to IncidentPage named workers_whose_communications_were_obtained and a data migration to copy items from targets_whose_communications_were_obtained into this new scheme.

  2. The old targets_whose... relationship is left intact. It should be removed when we remove Target and its relationships, as mentioned above.

  3. Added human-readable "stringify" values for Journalist and Institution (and GovernmentWorker).

Refs #707

@chigby chigby requested a review from harrislapiroff as a code owner May 1, 2019 21:11
@chigby chigby force-pushed the separate-incident-targets branch from b7be61f to c641eb3 Compare May 28, 2019 14:33
@conorsch
Copy link
Contributor

This one fell out of sync with master, which is blocking deploys to sandbox given lack of logic from #780. @harrislapiroff if you can rebase this in lieu of @chigby as part of review, we can get you a sandbox deploy to aid in review, if that's helpful.

@conorsch conorsch force-pushed the separate-incident-targets branch from 9f69acb to efcea7e Compare June 13, 2019 00:40
@conorsch
Copy link
Contributor

Rebased on top of latest master (05befcf) 😇

@chigby chigby force-pushed the separate-incident-targets branch from efcea7e to 8a2fb04 Compare July 23, 2019 18:35
@chigby chigby force-pushed the separate-incident-targets branch from 320d88c to 19fd40c Compare September 9, 2019 20:14
@chigby chigby changed the base branch from master to csv-header-test-readability September 9, 2019 20:33
@chigby chigby force-pushed the csv-header-test-readability branch from fd9081f to f8b2118 Compare October 21, 2019 14:10
@chigby chigby changed the base branch from csv-header-test-readability to master January 27, 2020 17:06
@chigby chigby force-pushed the separate-incident-targets branch 2 times, most recently from 292d77c to 7dc29de Compare January 27, 2020 18:32
Copy link
Contributor

@SaptakS SaptakS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the code and checked it in my local. It seems to do what it is supposed to do based on the description. I am just a little picky about too many merge migrations. I mean I know that the PR has been out there for a pretty long time and hence merge migrations, so it is not a blocker according to me for getting the PR merged. Also, @harrislapiroff might want to give a second check on the UX and implementation. But I checked and it works.

@harrislapiroff harrislapiroff force-pushed the separate-incident-targets branch 2 times, most recently from e080e68 to 70ac60e Compare February 10, 2020 22:01
Copy link
Contributor

@harrislapiroff harrislapiroff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks great. I think this is absolutely the right approach. Going to test it now.

@@ -50,7 +50,7 @@ class FiltersList extends PureComponent {
</ul>
)
} else {
renderedValue = filterValues[filter.name].label
renderedValue = filterValues[filter.name].title
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

Targeted Journalists
</th>
<td class="data-table__value">
{% with page.targeted_journalists.all as journalists%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a space

Suggested change
{% with page.targeted_journalists.all as journalists%}
{% with page.targeted_journalists.all as journalists %}

summary = (
('Total Results', total_queryset.count()),
('Journalists affected', Target.objects.filter(targets_incidents__in=total_queryset).distinct().count()),
('Journalists affected', Journalist.objects.filter(targeted_incidents__in=tj_queryset).distinct().count()),
('Institutions affected', Institution.objects.filter(institutions_incidents__in=total_queryset).distinct().count()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should read "Organizations affected" instead or something. Will check with @KAMcCudden

@harrislapiroff
Copy link
Contributor

One thing that worries me is that if we're not careful, we're going to silently lose targets that don't have a "Kind" set. I noticed in my local testing, targets that were uncategorized weren't migrated at all.

I'm not sure what a good solution to this would be. Having the migration throw and error if there are unmarked targets in the db? Writing a management command to migrate unmigrated targets? Just making absolutely sure all targets are marked before we deploy?

Since this PR leaves the data in place I'm not necessarily considering this a blocker, but I'd like a good idea for how we plan to address this before we merge it. Accepting suggestions

@KAMcCudden
Copy link

I will make sure the DB is up to date from my perspective this week (give me deadline if need sooner) and then there are all the leak case targets, where the target is neither an org nor a journalist. those have intentionally been left blank

@harrislapiroff
Copy link
Contributor

harrislapiroff commented Feb 11, 2020

Oooooh we might need a third category for them then 😬 The notion here is to make a clean separation of different target types in the database (such that we can accurately calculate, e.g., "Number of journalist subpoenas" vs "Number of organization subpoenas") and eventually get rid of the "Target" data type altogether...

We could create a new data type for those called "Non-journalist targets"? Would that make sense as a category @KAMcCudden? Something more specific?

@chigby
Copy link
Contributor Author

chigby commented Feb 11, 2020

I actually think a third category is good. As in, if our concept of a target has three different varieties that need to be treated/counted differently, our database layout should express that. We don't even necessarily need to remove the "Target" table, we could move journalists and institutions out of it (as done in this PR) and leave the rest of the contents there as the data referred to by leak case targets_whose_communications_were_obtained. Then we could either remove "kind" or add some different choices there.

Copy link
Contributor

@harrislapiroff harrislapiroff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting with @chigby and @KAMcCudden, we've agreed that we need a third data type for non-journalist targets (potentially GovernmentEmployee). Since @KAMcCudden has been leaving these unmarked in the database (there's 8 of them, currently), let's migrate all unmarked targets to that object type.

@chigby chigby force-pushed the separate-incident-targets branch 3 times, most recently from 74db844 to 9533e25 Compare February 12, 2020 17:34
Copy link
Contributor

@harrislapiroff harrislapiroff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change and I still have to do some more local testing. This is looking good though, otherwise!

<li class="data-table__list-item">
<a href="{% pageurl page.get_parent %}?targets={{ target.pk }}">{{ target.title }}</a>
<a href="{% pageurl page.get_parent %}?workers_whose_communications_were_obtained={{ target.pk }}">{{ target.title }}</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch on this behavior. I agree that the old behavior was likely unintended.

@@ -68,7 +68,7 @@ <h2 class="sr-only section-heading">Incident Data</h2>
Targeted Journalists
</th>
<td class="data-table__value">
{% with page.targeted_journalists.all as journalists%}
{% with page.targeted_journalists.all as journalists %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

]

operations = [
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol we're really gonna wanna squash migrations when we're done here

Copy link
Contributor

@harrislapiroff harrislapiroff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Ready to merge as soon as @chigby prepares a deployment plan.

Also add a migration to copy data from the old field to the new
fields.
The `targeted_institutions` field was only a `ManyToMany` field to
accomodate a data migration, we actually want it to be a
`ParentalManyToMany`.

See discussion for more information:

#707 (comment)
This was something that seems to have slipped through the cracks when
we upgraded to the latest "official" version of the autocomplete
plugin.

It seems fine to hard-code the locations of the css and js files,
given that we don't really expect them to change anytime soon.
We're pretty sure it's easy enough to add journalists to the site that
we're going to want some mechanism for combining erroneously added
ones, i.e. the times when an editor adds a journalist when they
intended to reference an existing one.
This is required due to rebasing on top of the wagtail 2.0 upgrade code.
These tests were added in master, which we've now rebased on, so let's
make them work for us.
This I think is needed because the choices for the incident filters
and statistics items are dynmically generated, and we've altered the
potential options for them by adding the journalists/institution
fields, so these need to be updated to take that into account.
This new model is intended to be a classification of targets whose
communications were obtained during leak investigations.  A data
migration has been added to copy all data from
`IncidentPage.targets_whose_communications_were_obtained` into this
new model.  The target data is not deleted.

I've also added the usual "Incident M2Ms" items for managing and
merging government workers.
In leak cases, we now show the new, more specific classification
Government Worker instead of the old Target information.

It actually looks like there's a change in behavior here (possibly
fixing something unintended) where the link on a target's name now
goes to the search page with the filter for other incidents in which
that target's comms were obtained.  Before, it linked to the filter of
all incidents *targeting* this person *generally*, which is not the
same thing under our schema.  The new behavior seems more correct to
me, as the linked-to filter is now the same context as the link.
@harrislapiroff harrislapiroff force-pushed the separate-incident-targets branch from 513ec67 to 7b06c47 Compare February 18, 2020 15:25
@chigby
Copy link
Contributor Author

chigby commented Feb 18, 2020

Broadly speaking, here is the outline of what changes will occur or will be required when this branch is merged into master and deployed.

  1. Automatic change: The "Targets" data will be copied from the combined Journalists/Institutions field to two new, separated fields. Here is how it will look in the "Details" section of the incident admin:

image

The page above has two targets in the old field, which now exist alongside one target in each of the new fields. The old field can still be modified, but I do not recommend it, as changes there will not automatically be copied to the new fields.

  1. Manual change required: update general incident filters to use new data fields.

The new target data exists, but the site will not allow searching with it until it has been explicitly chosen for such a purpose. Incident pages will display the institution and journalist target names in their right-sidebars with links to search.

image

These links will not work correctly until the filters have been updated.

To update them, from the admin, go to "Settings", then "General incident filters," then click the Add Filters button. The new filters are called "Targeted Institutions" and "Targeted and of these journalists." It should look something like this when you are done:
image

Making this change will also add these options to the main search menu.

  1. Manual change required: update general incident filters to remove old data field.

The old data field will still be searchable. This is not an immediate problem as the old data is the same for the time being. Still, it ought to be removed for being redundant, and the old data will also eventually be removed as well.

The old field is called "Targets (Organizations/Journalists)" and can be removed with the trash can icon:
image

  1. Manual change required: update statistics tags from {% num_targets ... %} to either
  • {% num_journalist_targets ... %} or
  • {% num_institution_targets ... %}

depending on what type of statistic is appropriate.

The old num_targets tag will still work, but it will refer to the old data, and will eventually be removed.

These tags can appear almost anywhere on the site, including statboxes and most rich text fields. There is not a good way that I know of to easily search for these.

  1. Automatic change: For leak cases, targets_whose_communications_were_obtained copied to workers_whose_communications_were_obtained

For incidents in the leak cases category, there is one additional field that we must handle, which I will call target_comms for brevity. Any Targets that were associated with leak cases via the target_comms will be copied to a new field, worker_comms for short. This is a third group in addition to institutions and journalists. There is no overlap between them.

  1. Manual change required: Update category incident filters.

As above, in order for the sidebar search links to work, we must add this new filter to the category filter for Leak Case incidents. This can be done by editing the Leak Case categoty page in the admin, and finding the in the "fields to include in filters" section.

The new field name is still "targets whose..." to keep the new wording consistent with the old on the public site. When this is done, it should look like this:

image

Conclusion

I believe this is all the changes that will happen. In spite of there being a fair number of things that must be adjusted manually, it will make it possible to, in the event of a problem with the new fields or data, undo references to the new fields without deploying new code.

The "hard-coded" bits are the incident page sidebar search links, which cannot be changed directly in the admin. What data these reference and what search filters they link to cannot be controlled by the admin. This is means there will be a window of time when they will not work properly immediately after the code is deployed, and also will require code changes to revert back, should the need arise. This in my opinion is unfortunate, but given the technical constraints of how the site works, I don't see any way around it without additional work to restructure the sidebar.

@harrislapiroff
Copy link
Contributor

Going to merge this, but I would like to coordinate the deploy for this feature a little more carefully, which maybe means holding off on the next release till Monday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants