
Add org type, designation to Schedule A and B tables #3787

Closed
9 of 10 tasks
dorothyyeager opened this issue May 23, 2019 · 12 comments · Fixed by #3867

Comments

@dorothyyeager
Contributor

dorothyyeager commented May 23, 2019

User story: I was trying to search all contributions and disbursements made to corporate SSFs in 2019-2020 and set the filters accordingly. The results appeared to yield a list of contributions made to all types of committees. The first results in this screenshot, for example, show contributions to a candidate and to a labor organization SSF.

[screenshot: datatable results showing contributions to a candidate and to a labor organization SSF]

To do

  • Add org_tp (call it org_type) and cmte_dsgn (call it committee_designation) columns to the Schedule A and B tables.
    • Need to consider how to handle the information needed from Line 6 to determine Membership SSFs and nonconnected committees (see SQL below)
  • Joseph to update the Oracle procedures and packages to include org type and designation in the process that updates these fields in A and B (the "nightly refresh") used by Jean's Java program
  • Jean to update the Java programs and shell scripts to include the org_tp and cmte_dsgn columns in the Schedule A and Schedule B daily transfer process.
  • Update programs (Oracle procedures, packages, Java programs, shell scripts) to track changes to cmte_dsgn and org_tp that happen without Schedule A or B data changes.
  • Create scripts to backfill data for these new columns for the existing rows.
  • Populate the data in the Schedule A and B tables
  • Document the process so more than one person can update/maintain it
  • Unblock and merge the Python changes: Add org type to API for Schedules A and B #3841
  • Add a follow-up CMS issue to add the filters back (see fec-cms#2913, "Remove SSFs and Leadership PACs from receipts/individual contribution datatables", for the original issue that removed them)
@lbeaufort
Member

Thanks for entering this issue, @dorothyyeager! After speaking with @PaulClark2, we need organization type and designation in addition to committee type to filter by SSFs and Leadership PACs. We'd need to add these to Schedule A in order to filter by them, which is a big lift.

For now, I think we should remove those options from the receipts/individual contributions datatables, since they don't work.

@lbeaufort
Member

I'll go ahead and copy this to the CMS to remove the filters as a short-term workaround.

@lbeaufort lbeaufort changed the title Search for "all contributions to corporate SSFs" yields all contributions Add org type, designation to Schedule A datatables May 23, 2019
@PaulClark2
Contributor

PaulClark2 commented May 24, 2019

This includes joint fundraising committees. For the statistical releases we exclude joint fundraising committees.

Separate Segregated Funds (SSFs):
Corporate: committee types N, Q or U, committee designations B, J or U, organization type C
Labor: committee types N, Q or U, committee designations B, J or U, organization type L
Trade: committee types N, Q or U, committee designations B, J or U, organization type T
Membership: committee types N, Q or U, committee designations B, J or U, organization type M, connected organization name NOT NULL and not equal to NONE
Cooperative: committee types N, Q or U, committee designations B, J or U, organization type V
Corporations without stock: committee types N, Q or U, committee designations B, J or U, organization type W

Nonconnected Committees:
Independent expenditure-only political committees (Super PACs): committee type O, committee designations B, J or U
Committees with non-contribution accounts (Hybrid PACs): committee types V or W, committee designations B, J or U
Leadership PACs: committee types N or Q, committee designation D
Other nonconnected PACs: committee types N, Q or U, committee designations B, J or U, organization type IS NULL OR (organization type M AND connected organization name IS NULL) OR (organization type M AND connected organization name NONE)

This SQL is similar to what I use for the statistical releases. Note that this SQL includes joint fundraising committees.

SELECT 
(CASE WHEN cm.org_tp = 'C' THEN '1-Corporate'                        
    WHEN cm.org_tp = 'L' THEN '2-Labor'                        
    WHEN cm.org_tp = 'T' THEN '3-Trade'                        
    WHEN  ((cm.org_tp IS NULL) OR (cm.org_tp = 'M' AND cm.connected_org_nm IS NULL) OR (cm.org_tp = 'M' AND cm.connected_org_nm = 'NONE')) THEN '7-NonConnected'                    
    WHEN cm.org_tp = 'M'  AND cm.connected_org_nm IS NOT NULL AND cm.connected_org_nm <> 'NONE'  THEN '4-Membership'                        
    WHEN cm.org_tp = 'V' THEN '5-Cooperative'                        
    WHEN cm.org_tp = 'W' THEN '6-Corporation without stock'  END)  organization_type,
cm.cmte_id,
cm.cmte_nm,
cm.cmte_tp, 
cm.cmte_dsgn, 
cm.org_tp
FROM cmte_valid_fec_yr cm
WHERE cm.fec_election_yr = 2020
AND cm.cmte_tp in ('N', 'Q', 'U')
AND cm.cmte_dsgn not in ('D');

@lbeaufort
Member

The CMS issue is already here: fecgov/fec-cms#2913

@lbeaufort
Member

We should consider asking the database team whether they'd rather do this at the same time as the DB updates for #3798

@lbeaufort lbeaufort changed the title Add org type, designation to Schedule A datatables Add org type, designation to Schedule A and B datatables Jun 13, 2019
@PaulClark2
Contributor

org_tp and cmte_dsgn are in cmte_valid_fec_yr

@lbeaufort lbeaufort changed the title Add org type, designation to Schedule A and B datatables Add org type, designation to Schedule A and B tables Jun 26, 2019
@lbeaufort lbeaufort modified the milestones: Sprint 9.3, Sprint 9.4 Jul 1, 2019
@dorothyyeager dorothyyeager assigned fecjjeng and unassigned rjayasekera Jul 2, 2019
@fecjjeng
Contributor

fecjjeng commented Jul 9, 2019

The org_tp and cmte_dsgn columns have been added to the fec_fitem_sched_a and fec_fitem_sched_b tables. It is better to keep these concise column names:

  • These tables are semi-base tables, so it is better to keep the column names consistent with all the other base tables as much as we can, which makes them easier to understand and maintain.
  • Object names in the database have a length limit, and long object names are truncated. With a lengthy table or MV name plus lengthy column names, derived object names such as indexes (especially indexes with more than one column) will be truncated and cause problems.

@lbeaufort
Member

lbeaufort commented Jul 9, 2019

@fecjjeng sounds good! I think the WIP PR has those column names. https://github.com/fecgov/openFEC/pull/3837/files

@fecjjeng
Contributor

fecjjeng commented Jul 16, 2019

The following is a summary of the steps that need to be done for this ticket. A document with more detail, named ADD_UPDATE_ORG_TP_COLUMNS.txt, is in the shared drive.
Please note: this is a general list of the work needed to add columns to the sched_a/sched_b tables. The actual code changes will vary depending on the nature of the columns added.
-- ***************************************************
-- Prepare migration script. The migration script needs to include
-- ***************************************************

  1. Add column statement

  2. If the added column needs to be used in a filter, then add create index statements.
    For B-tree indexes, include the sorting fields (disb_dt/sub_id and disb_amt/sub_id) to make compound indexes; see the sketch below.

    Also update the following functions as appropriate (they can be used to create the indexes for the corresponding fec_sched_a_xxxx_yyyy/fec_sched_b_xxxx_yyyy partitions in the future):
    FUNCTION disclosure.finalize_itemized_schedule_a_tables
    FUNCTION disclosure.finalize_itemized_schedule_b_tables
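
For illustration, a minimal sketch of what these statements might look like for one Schedule B partition, assuming the fec_fitem_sched_b_YYYY_YYYY naming pattern; the schema, partition, and index names here are placeholders, not taken from the actual migration:

-- add the new columns to one partition (repeat per partition and for Schedule A)
ALTER TABLE disclosure.fec_fitem_sched_b_2019_2020 ADD COLUMN org_tp varchar(1);
ALTER TABLE disclosure.fec_fitem_sched_b_2019_2020 ADD COLUMN cmte_dsgn varchar(1);

-- compound B-tree indexes that include the sorting fields mentioned in step 2
CREATE INDEX idx_sched_b_2019_2020_org_tp_disb_dt_sub_id
    ON disclosure.fec_fitem_sched_b_2019_2020 (org_tp, disb_dt, sub_id);
CREATE INDEX idx_sched_b_2019_2020_org_tp_disb_amt_sub_id
    ON disclosure.fec_fitem_sched_b_2019_2020 (org_tp, disb_amt, sub_id);
CREATE INDEX idx_sched_b_2019_2020_cmte_dsgn_disb_dt_sub_id
    ON disclosure.fec_fitem_sched_b_2019_2020 (cmte_dsgn, disb_dt, sub_id);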

NOTE: These columns will be added to all 3 cloud environments ahead of time, at the same time, because

  1. the Java program that loads the data (the same program, just passed a different database as a parameter) runs against all environments
  2. the Java program that handles the new data needs to be in place and ready to transfer BEFORE the backfill happens.
  3. the update statements that backfill these columns for existing rows, and the index creation, are time- and resource-consuming, so they need to be performed at off-peak hours

The migration file will be submitted and merged AFTER the actual tasks (add columns, backfill data, create indexes) have been done. Therefore it needs to be written in a specific way so it will not error out IF the columns and indexes already exist.
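
One way to write it so it is safe to re-run is IF NOT EXISTS guards (a sketch, assuming PostgreSQL 9.6 or later, which supports this for both columns and indexes; same placeholder names as above):

-- adding the column is a no-op if it is already there
ALTER TABLE disclosure.fec_fitem_sched_b_2019_2020
    ADD COLUMN IF NOT EXISTS org_tp varchar(1);

-- likewise for the indexes
CREATE INDEX IF NOT EXISTS idx_sched_b_2019_2020_org_tp_disb_dt_sub_id
    ON disclosure.fec_fitem_sched_b_2019_2020 (org_tp, disb_dt, sub_id);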

-- ***************************************************
-- Update Oracle packages, procedures, triggers, tables, materialized views, and views in the intermediate database that calculate and control the Java transfer program.
-- ***************************************************
-- ***************************************************
-- Update Java programs and shell script
-- ***************************************************
-- ***************************************************
-- Add columns to the tables, start the daily transfer program, and make sure all Oracle programs, Java programs, and shell scripts execute successfully
-- ***************************************************
-- ***************************************************
-- Prepare scripts to backfill the data.
-- ***************************************************
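-- As a sketch of the kind of backfill statement involved (the join keys and batching by partition are assumptions, not the actual script):

-- backfill one Schedule B partition from the committee master table
UPDATE disclosure.fec_fitem_sched_b_2019_2020 sb
SET org_tp = cm.org_tp,
    cmte_dsgn = cm.cmte_dsgn
FROM disclosure.cmte_valid_fec_yr cm
WHERE sb.cmte_id = cm.cmte_id
  AND cm.fec_election_yr = 2020;
-- in practice this is broken into smaller batches and run off-peak (see below)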
-- ***************************************************
-- ***************************************************
-- on Postgresql Database
-- In this ticket, we also backfill the data for several tsvector columns whose content changed in issue #3798 (3 columns in sched_a and 2 columns in sched_b)
-- Temporary update triggers are added to each sched_a and sched_b partition table, so during the update activity of this ticket the triggers fire and refresh the tsvector columns
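-- For illustration only, such a temporary trigger might look roughly like the following; the real trigger function and tsvector expressions come from issue #3798, and the column names used here (recipient_name_text, recipient_nm) are assumptions:

CREATE OR REPLACE FUNCTION disclosure.tmp_refresh_sched_b_tsvectors()
RETURNS trigger AS $$
BEGIN
    -- recompute the tsvector column(s) changed in #3798 on every update
    NEW.recipient_name_text := to_tsvector('simple', coalesce(NEW.recipient_nm, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tmp_sched_b_tsvector_trg
    BEFORE UPDATE ON disclosure.fec_fitem_sched_b_2019_2020
    FOR EACH ROW EXECUTE PROCEDURE disclosure.tmp_refresh_sched_b_tsvectors();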


-- ***************************************************
-- ON AWS console
-- temporarily upgrade the class of our PostgreSQL database to a more powerful instance
-- NOTE: need to upgrade both master and replica. Otherwise a slow replica will not be able to catch up; that slows down the whole process and the lag becomes too big to recover from.
-- Parameters WAL_MAX_SIZE, WAL_MIN_SIZE, WAL_KEEP_SEGMENTS will need to be increased as well
-- Also anticipate increased storage: update operations in PostgreSQL grow the tables considerably, and the new indexes need space too.
-- ***************************************************
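
-- While the backfill runs, replication lag can be watched on the master with a standard catalog query (PostgreSQL 10+ names shown; on 9.x the function and column are pg_current_xlog_location/replay_location):

select application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes
from pg_stat_replication;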

-- ***************************************************
-- on Postgresql Database


-- get "BEFORE UPDATE" size
select pg_size_pretty(pg_database_size('fec'));
-- ***************************************************
-- on Linux server
-- Execute the shell script that does the update work (it may not be possible to finish the whole job in one run; break it into smaller tasks)
-- NOTE: schedule the execution to avoid the nightly refresh window for all databases
-- NOTE: schedule the execution to avoid peak query time for the PRD database


-- ***************************************************
-- on Postgresql Database


-- get "AFTER UPDATE" size
select pg_size_pretty(pg_database_size('fec'));

-- ****************************
-- validate the update


-- ****************************************************
There are a few rows whose cmte_id does not have corresponding data in cmte_valid_fec_yr. These are all in earlier cycles.
Data from the 2007-2008 cycle onward is complete with cmte_dsgn/org_tp values.

For those few rows, the tsvector columns are not updated. Use the following statement to update them.
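
The statement itself is not reproduced here; the following is only an illustrative sketch, assuming the Schedule B tsvector column recipient_name_text is built from recipient_nm (names not confirmed from the repo):

-- directly refresh the tsvector column for rows whose committee has no cmte_valid_fec_yr match
UPDATE disclosure.fec_fitem_sched_b_2019_2020 sb
SET recipient_name_text = to_tsvector('simple', coalesce(sb.recipient_nm, ''))
WHERE NOT EXISTS
      (SELECT 1 FROM disclosure.cmte_valid_fec_yr cm WHERE cm.cmte_id = sb.cmte_id);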

-- ****************************
-- Run create indexes script


-- ****************************
-- check if all indexes are created
select tablename, count(*)
from pg_indexes
where tablename like 'fec_fitem_sched_a%'
and (indexname like 'idx_sched_a_%org_tp%' or indexname like 'idx_sched_a%_cmte_dsgn%')
group by tablename
order by tablename;

select tablename, indexname
from pg_indexes
where tablename like 'fec_fitem_sched_a%'
and (indexname like 'idx_sched_a_%org_tp%' or indexname like 'idx_sched_a%_cmte_dsgn%')
order by tablename, indexname;
-- ****************************
-- clean up the temporary UPDATE TRIGGERS that were used to update the tsvector columns from another issue
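-- Using the same placeholder names as in the trigger sketch above, the cleanup might be:

DROP TRIGGER IF EXISTS tmp_sched_b_tsvector_trg ON disclosure.fec_fitem_sched_b_2019_2020;
DROP FUNCTION IF EXISTS disclosure.tmp_refresh_sched_b_tsvectors();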


-- ***************************************************
-- ON AWS console
-- downgrade the class back to the regular class we use.
-- NOTE: apply this to both master and replica.
-- Parameters WAL_MAX_SIZE, WAL_MIN_SIZE, WAL_KEEP_SEGMENTS need to be set back to their original values
-- Review the storage size increase and make it official via Terraform.
-- ***************************************************

@fecjjeng
Contributor

The DEV/STG/PRD databases have been backfilled and the indexes created. The following is a summary of the time consumed and space increased by this task.
NOTE: The time indicated is the actual running time, using the most powerful class of AWS database we can use; it does NOT include the lag time in between runs to avoid peak time and the nightly refresh window, so it is a minimum. The local database is usually 2 to 3 times faster than the cloud databases in their regular classes.

Local backup database:
Time: 57 hours

DEV:
Time: 22 hours 48 minutes
Space increased: 442 GB

STG:
Time: 17 hours 26 minutes
Space increased: 458 GB

PRD:
Time: 19 hours 31 minutes
Space increased: 315 GB

@fecjjeng
Contributor

Unblocked #3841 and moved it to the ready column.

@fecjjeng
Contributor

A CMS ticket, fecgov/fec-cms#3054, has been opened to add the filters back.
