Only keep four months of raw filings data in PG #3789

Closed
1 of 2 tasks
PaulClark2 opened this issue May 24, 2019 · 12 comments

@PaulClark2
Contributor

PaulClark2 commented May 24, 2019

What we're after: We need to develop an overall strategy to improve raw filing performance.

Ideas:

  1. Reduce volume of data: We only need to keep raw filing data for a limited retention period. We need clarification on exactly what that period is. Once we determine how far back we need to go, we should clear out any raw filing data older than that.
  2. Add indexes

Raw vs. processed filters documentation: 🔒 https://docs.google.com/document/d/16DWIquTYpbdtCCDrXYLXXRrHQXS51sX_8cLY75spLA0/ 🔒

Related issues:
#3627
#3636

Things to consider:

  • How much raw filings data should we keep in the API?
  • Make a new ticket to check whether database performance for efiling improves after removing legacy data. Include a list of datatables that display raw data.
@lbeaufort lbeaufort added this to the Sprint 9.5 milestone Jul 15, 2019
@lbeaufort lbeaufort changed the title Develop strategy for raw filings data Develop strategy for raw filings data performance Jul 16, 2019
@AmyKort

AmyKort commented Jul 17, 2019

@PaulClark2 and I met to discuss. Paul will reach out to the Press Office and Public Records to confirm we aren't aware of situations where users would look at raw filings that are more than one year old.

@PaulClark2
Contributor Author

PaulClark2 commented Jul 18, 2019

We only need to keep three months of raw electronic data.

To do:

  • Ask Salient to modify their upload process to delete data older than three months
  • Will this cause short-term problems for the read replicas?
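
As a rough illustration of the "older than three months" rule, the cutoff date could be computed as below. This is only a sketch: `retention_cutoff` is a hypothetical helper, not part of Salient's upload process, and it assumes "three months" means calendar months counted back from the day the job runs.

```python
import calendar
from datetime import date


def retention_cutoff(today: date, months: int = 3) -> date:
    """Return the date `months` calendar months before `today`.

    Hypothetical helper: rows dated before the returned value would be
    candidates for deletion. The day is clamped to the last day of the
    target month (e.g. May 31 minus 3 months lands on Feb 28 or 29).
    """
    # Work in "months since year 0" so the subtraction can cross year boundaries.
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))
```

Run on July 18, 2019, for example, this gives April 18, 2019 as the three-month cutoff.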

@PaulClark2
Contributor Author

PaulClark2 commented Jul 23, 2019

Discussed this with the Press Office. They are good with keeping as little as 3 months of electronic filing data.

I'm going to move this ticket to 9.6 so that EFO and the DB team can work on it after the mid-year filing deadline.

@PaulClark2 PaulClark2 modified the milestones: Sprint 9.5, Sprint 9.6 Jul 23, 2019
@lbeaufort
Member

Thanks @paul! Re: read-replica issues, we could possibly do this off-hours to minimize performance impact.

@lbeaufort
Member

From @rjayasekera

Today I talked with Mahi about deleting eFile/pFile records prior to 5/1/2019 from the AWS-DEV instance. He is currently busy processing a large number of ActBlue amendments (about 25 million records). He can get to this task after that, probably in the 1st or 2nd week of August.

@pkfec
Contributor

pkfec commented Aug 27, 2019

@paul will follow up with the Salient team.

@PaulClark2
Contributor Author

PaulClark2 commented Aug 30, 2019

Salient will start working on this Tuesday, 9/3

@PaulClark2 PaulClark2 changed the title Develop strategy for raw filings data performance Only keep four months of raw filings data in PG Sep 11, 2019
@PaulClark2
Contributor Author

Salient is working on this.

@PaulClark2
Contributor Author

Mahi and Rohan will start testing the delete process on DEV. We'll delete 5,000 reports per day during this test.
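
A capped delete like the one being tested could look roughly like the sketch below. It uses an in-memory SQLite database purely so the example is runnable; the real data lives in PostgreSQL. The table name `raw_filings`, the column `receipt_date`, and the functions `make_demo_db` and `delete_old_reports` are all assumptions for illustration, not the actual schema or Salient's process.

```python
import sqlite3


def make_demo_db(old_rows: int, recent_rows: int) -> sqlite3.Connection:
    """Build a throwaway database with a hypothetical raw_filings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_filings (id INTEGER PRIMARY KEY, receipt_date TEXT NOT NULL)"
    )
    # Index the date column so finding old rows does not scan the whole table.
    conn.execute(
        "CREATE INDEX idx_raw_filings_receipt_date ON raw_filings (receipt_date)"
    )
    rows = [("2019-01-15",)] * old_rows + [("2019-08-15",)] * recent_rows
    conn.executemany("INSERT INTO raw_filings (receipt_date) VALUES (?)", rows)
    conn.commit()
    return conn


def delete_old_reports(conn: sqlite3.Connection, cutoff: str, limit: int = 5000) -> int:
    """Delete at most `limit` rows dated before `cutoff`, oldest first.

    `cutoff` is an ISO date string (YYYY-MM-DD), which sorts correctly as text.
    Returns the number of rows removed, so a daily job can log progress
    and stop once the result is zero.
    """
    cur = conn.execute(
        """
        DELETE FROM raw_filings
        WHERE id IN (
            SELECT id FROM raw_filings
            WHERE receipt_date < ?
            ORDER BY receipt_date, id
            LIMIT ?
        )
        """,
        (cutoff, limit),
    )
    conn.commit()
    return cur.rowcount
```

Each call removes at most 5,000 rows, so a backlog of 12,000 old rows would take three daily runs (5,000, then 5,000, then 2,000), and rows newer than the cutoff are left alone.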

@patphongs
Member

Still troubleshooting the deletion process for all raw filings older than 4 months. Moving to 10.2 to be completed then.

@patphongs
Member

Right now we have 9 months of raw filing data. We would like to delete the older data and keep only the most recent 4 months.

@PaulClark2
Contributor Author

Closing in favor of #4004


6 participants