Skip to content

Latest commit

 

History

History
1463 lines (1292 loc) · 34.8 KB

dataLocations.md

File metadata and controls

1463 lines (1292 loc) · 34.8 KB

Better Banking Data Process

As of January 13th, 2020

Table of Contents

Updates

Report Date Update Started Update Complete
12/31/2018 August, 2019 January, 2020
6/30/2019 Jan 13, 2020 tbd

Existing Data

Qualified Census Tract Geographies

  • Contains ~75,000 census tracts stored as geoJSON that includes a "qct" boolean within the properties object
  • These are used to determine if branches are located within qcts or not

If this needs to be updated, this is the direct download link found on this page. The file it downloads contains a list of all census tracts (by FIPS) that are considered qualified, as of 2017

Data to Refresh

FDIC Data


FDIC All Reports

FDIC Branch Data

FDIC Folder Structure

|-- fdic_data
|   |-- all_reports
|   |   |-- all_reports_raw
|   |   |   +-- [all *.csv from FDIC download]
|   |   +-- All_reports_***_readme.htm
|   |-- branch_data
|       |-- branch_data_definitions
|       |   +-- "Definitions-Table 1.csv"
|       |   +-- "Deleted Variables-Table 1.csv"
|       |   +-- "Read_me-Table 1.csv"
|       |   +-- "Sheet1-Table 1.csv"
|       |-- branch_data_raw
|           +-- ALL_2019.csv

NCUA Data


NCUA Call Reports

After download, separate out the files into the following folders/sections

NCUA Branch Data
  • ATM.txt
  • Credit Union Branch Information.txt
NCUA Institution Data
  • Acct-DescTradeNames.txt
  • TradeNames.txt (This is where you get CU names from rather than the search and replace crap)
  • foicu.txt (data for institutions)
  • FOICUDES.txt (data key for institutions)
NCUA Report Data
  • fs220.txt,
  • fs220A.txt,
  • fs220B.txt
  • fs220C.txt
  • fs220D.txt
  • fs220G.txt
  • fs220H.txt
  • fs220I.txt,
  • fs220J.txt
  • fs220K.txt
  • fs220L.txt
  • fs220M.txt
  • fs220N.txt

NCUA Folder Structure

|-- ncua_data
    |-- call_reports_raw
        |-- branch_data
            +-- ATM.txt
            +-- "Credit Union Branch Information.txt"
        |-- institution_data
            +-- Acct-DescTradeNames.txt
            +-- foicu.txt
            +-- FOICUDES.txt
            +-- TradeNames.txt
        |-- report_data
            +-- fs220.txt
            +-- fs220A.txt
            +-- fs220B.txt
            ...
            +-- fs220N.txt
        +-- Acct-DescGrants.txt
        +-- AcctDesc.txt
        +-- Grants.txt
        +-- Readme.txt
        +-- Report1.txt

Low Income Credit Unions (LICUs)

// TODO

Data Processing

All data will be going to a local mongoDb database.

A basic command in terminal to load a csv into mongo is the following:

mongoimport --db [databaseName] -c [collection] --type=csv --headerline

Import FDIC Institution Data

This data comes from fdic_all_reports. We need the following fields:

  • bkclass - common
  • fed_rssd - common
  • cert - common
  • specgrp - common
  • name - common
  • address - common
  • city - common
  • state - common
  • zip - common
  • webaddr - common
  • repdte - common
  • asset - Assets and Liabilities
  • lnlsnet - Net Loans and Leases
  • dep - Assets and Liabilities
  • depdom - Assets and Liabilities
  • lnreres - Net Loans and Leases
  • lnag - Net Loans and Leases
  • lnreag - Net Loans and Leases
  • lnci - Net Loans and Leases
  • lncon - Net Loans and Leases
  • lncrcd - Net Loans and Leases
  • Lnrenr4 - Small Business Loans
  • Lnci4 - Small Business Loans
  • Lnreag4 - Small Business Loans
  • Lnag4 - Small Business Loans
  • mutual - common
  • cb - common

We only need to import: All_Reports_20190630_Assets and Liabilities.csv All_Reports_20190630_Net Loans and Leases.csv All_Reports_20190630_Small Business Loans.csv

However, after importing the first file, we will be merging using the --upsertFields command

// first import command
mongoimport -d bb2020 -c fdic_all_reports --headerline --type=csv --file="All_Reports_20190630_Assets and Liabilities.csv"

The second and third import commands:

// second import command. Note --mode and --upsertFields
mongoimport -d bb2020 -c fdic_all_reports --headerline --type=csv --file="All_Reports_20190630_Net Loans and Leases.csv" --mode merge --upsertFields=fed_rssd

// third import command. Note --mode and --upsertFields
mongoimport -d bb2020 -c fdic_all_reports --headerline --type=csv --file="All_Reports_20190630_Small Business Loans.csv" --mode merge --upsertFields=fed_rssd

Create an rssd field on each document:

db.fdic_all_reports.aggregate([
  {
    $addFields: {
      rssd: '$fed_rssd'
    }
  },
  {
    $out: 'fdic_all_reports'
  }
]);

Now we index the fed_rssd field in the fdic_all_reports collection:

db.fdic_all_reports.createIndex({ fed_rssd: 1 });
db.fdic_all_reports.createIndex({ rssd: 1 });

Import FDIC Branch Data

While in the folder: ~/fdic_data/branch_data/branch_data_raw

mongoimport -d bb2020 -c fdic_branches --headerline --type=csv --file=ALL_2019.csv

First aggregation will filter out bank branches that either have no geolocation, or have a latitude/longitude of 0. It will create a location field and a 'rssd' field. The location and rssd will be indexed to facilitate future aggregations.

// First fdic_branches aggregation
db.fdic_branches_raw.aggregate([
  {
    $match: {
      $expr: {
        $and: [
          {
            $ne: [
              {
                $type: '$SIMS_LONGITUDE'
              },
              'string'
            ]
          },
          {
            $ne: [
              {
                $type: '$SIMS_LATITUDE'
              },
              'string'
            ]
          }
        ]
      }
    }
  },
  {
    $match: {
      $expr: {
        $and: [
          {
            $ne: [
              '$SIMS_LONGITUDE',
              0
            ]
          },
          {
            $ne: [
              '$SIMS_LATITUDE',
              0
            ]
          }
        ]
      }
    }
  },
  {
    $addFields: {
      location: {
        type: "Point",
        coordinates: [
          '$SIMS_LONGITUDE',
          '$SIMS_LATITUDE'
        ]
      },
      rssd: '$RSSDID',
      regulator: 'fdic'
    }
  },
  {
    $out: 'fdic_branches'
  }
])

Two index commands in the mongo client terminal:

// create 2dsphere index on location field
db.fdic_branches.createIndex({ location: '2dsphere' });
db.fdic_branches.createIndex({ rssd: 1 });

Calculate FDIC Branch Statistics

First we need to normalize the numbers on each fdic_branch document: (enter in mongo client)

db.fdic_branches.find().forEach((x, i) => {
  const domesticDeposits = parseInt(x.DEPDOM.toString().replace(/,/g, ''));
  const branchDeposits = parseInt(x.DEPSUMBR.toString().replace(/,/g, ''));
  x.domesticDeposits = domesticDeposits;
  x.branchDeposits = branchDeposits;
  db.fdic_branches.save(x);
});

Calculate centroid location

We'll use the average geolocation helper function to calculate the centroid of all branches:

// define this averageGeolocation function in the mongo termiinal
const averageGeolocation = require('./helper_functions/averageGeolocation.js');

//group all branch locations by rssd and calculate the centroid
db.fdic_branches
  .aggregate([
    {
      $group: {
        _id: {
          rssd: '$rssd'
        },
        branchIds: {
          $push: '$_id'
        },
        coords: {
          $push: {
            latitude: '$SIMS_LATITUDE',
            longitude: '$SIMS_LONGITUDE'
          }
        }
      }
    }
  ])
  .forEach(({ _id: rssd, branchIds, coords }) => {
    // for each set of branch ids, find the avg. location and write it back to all
    // of the branches as a new location
    const { latitude, longitude } = averageGeolocation(coords);
    branchIds.forEach(id => {
      db.fdic_branches.findOneAndUpdate(
        { _id: id },
        {
          $set: {
            centroidLocation: {
              type: 'Point',
              coordinates: [longitude, latitude]
            }
          }
        }
      );
    });
  });

Calculate Distance to Centroid

The following is the haversine formula to find the distance between two points. We use this on each fdic_branch document to calculate the distance in meters between the branch location and the centroid location. This will eventually be used to calculate the Branch Density score.

// define haversine formula in mongo terminal
const haversineDistance = require('./helper_functions/haversineDistance.js');

Now we iterate through all the fdic_branches, calculate the distance to the centroid and write it back to the branch as 'distanceToCentroid'.

db.fdic_branches.find().forEach((x, i) => {
  const branchCoords = {
    latitude: x.location.coordinates[1],
    longitude: x.location.coordinates[0]
  };
  const centroidCoords = {
    latitude: x.centroidLocation.coordinates[1],
    longitude: x.centroidLocation.coordinates[0]
  };

  const distToCentroid = haversineDistance(branchCoords, centroidCoords);
  x.distanceToCentroid = distToCentroid;
  db.fdic_branches.save(x);
});

Import NCUA Institution Data

This data comes from ncua_data/call_reports_raw/report_data.

Field locations

We need the following fields:

  • RSSD
    • Rssd for id
    • foicu.txt
  • CU_NUMBER
    • Credit Union charter number
    • foicu.txt
  • CU_NAME
    • Credit Union Name
    • foicu.txt
  • STREET
    • Street Addres
    • foicu.txt
  • CITY
    • City
    • foicu.txt
  • STATE
    • State
    • foicu.txt
  • ZIP_CODE
    • Zip
    • foicu.txt
  • Acct_891
    • website
    • fs220D.txt
  • CYCLE_DATE
    • report date
    • foicu.txt
  • CU_TYPE
    • credit union type
      1. Federal Credit Union
      2. Federally Insured State-Chartered Credit Union
      3. State Credit Union
    • foicu.txt
  • ACCT_010
    • total assets
    • fs220.txt
  • ACCT_025B
    • total loans
    • fs220.txt
  • ACCT_018
    • total deposits
    • fs220.txt
  • ACCT_703
    • housing lending
    • fs220.txt
  • ACCT_387
    • small business lending (non agricultural business loans)
    • fs220.txt
  • ACCT_042
    • small agricultural lending
    • fs220.txt
  • MemberMinorityStatus
    • member minority status
    • fs220D.txt
  • Acct_886H
    • supports mobile banking
    • fs220D.txt
  • Acct_887J
    • supports bill pay
    • fs220D.txt

This leaves us importing only three documents for institutions:

  1. foicu.txt
  2. fs220.txt
  3. fs220D.txt

Import foicu.txt

Inside the ncua_data/call_reports_raw/institution_data folder in terminal use:

mongoimport -d bb2020 -c ncua_all_reports --headerline --type=csv --file=foicu.txt

Import fs220 and fs220D

We will merge in these second two files using CU_NUMBER as the merge key. Once in the report_data folder use:

mongoimport -d bb2020 -c ncua_all_reports --headerline --type=csv --file="fs220.txt" --mode merge --upsertFields=CU_NUMBER

mongoimport -d bb2020 -c ncua_all_reports --headerline --type=csv --file="fs220D.txt" --mode merge --upsertFields=CU_NUMBER

We'll do one aggregation here to fill out the credit union type and rssd. This will help match credit unions with CDFI data:

db.ncua_all_reports.aggregate([
  {
    $addFields: {
      rssd: '$RSSD',
      creditUnionType: {
        $switch: {
          branches: [
            {
              case: {
                $eq: ['$CU_TYPE', 1]
              },
              then: 'Federal Credit Union'
            },
            {
              case: {
                $eq: ['$CU_TYPE', 2]
              },
              then: 'Federally Insured State-Chartered Credit Union'
            },
            {
              case: {
                $eq: ['$CU_TYPE', 3]
              },
              then: 'State Credit Union'
            }
          ]
        }
      }
    }
  },
  {
    $out: 'ncua_all_reports'
  }
]);

Create index on rssd and CU_NUMBER field:

db.ncua_all_reports.createIndex({ rssd: 1 });
db.ncua_all_reports.createIndex({ CU_NUMBER: 1 });

Import NCUA Branches

The file ncua_data/call_reports_raw/branch_data/Credit Union Branch Information.txt contains address information about each credit union branch. We need to convert this .txt file to a .csv file and then upload it to geocod.io for processing. We use the physical address for geocoding.

First import to mongo:

mongoimport -d bb2020 -c ncua_branches --headerline --type=csv --file="Credit Union Branch Information.txt"

Then bring rssd down from ncua_all_reports:

db.ncua_branches.aggregate([
  {
    $lookup: {
      from: 'ncua_all_reports',
      localField: 'CU_NUMBER',
      foreignField: 'CU_NUMBER',
      as: 'hq'
    }
  },
  {
    $unwind: '$hq'
  },
  {
    $addFields: {
      rssd: '$hq.RSSD'
    }
  },
  {
    $project: {
      hq: 0
    }
  },
  {
    $out: 'ncua_branches'
  }
]);

Confirm that all the branches received an rssd (i.e. have a parent institution)

db.ncua_branches.find({ rssd: { $exists: true } }).count();
// should equal
db.ncua_branches.count();

Then create an index on rssd and a joint unique index on rssd & SiteId:

db.ncua_branches.createIndex({ rssd: 1 });
db.ncua_branches.createIndex({ rssd: 1, SiteId: 1 }, { unique: true });

Now we can export this collection as a csv. In the ncua_data/geocoding folder run:

mongoexport -d bb2020 -c ncua_branches --fields rssd,CU_NUMBER,SiteId,CU_NAME,SiteName,SiteTypeName,MainOffice,PhysicalAddressLine1,PhysicalAddressLine2,PhysicalAddressCity,PhysicalAddressStateCode,PhysicalAddressPostalCode,PhysicalAddressCountyName,PhysicalAddressCountry --type=csv --out ncua_branches_for_geocoding.csv

Then submit to geocod.io for geocoding of addresses into latitude longitude. Then we'll essentially repeat the above steps.

Once you receive the file back, merge back into ncua_branches collection in mongo using SiteId as key:

mongoimport -d bb2020 -c ncua_branches --fieldFile="fieldfile.txt"  --columnsHaveTypes --type=csv  --file="geoCode_response.csv" --mode merge --upsertFields=rssd,SiteId

hint: If something is being weird, you may have a document in ncua_branches that is the headerline as the above command uses a field file. Just find and delete that one.

Then delete any that were not geolocated (4 in 6/30 update for example):

db.ncua_branches.deleteMany({ Longitude: { $eq: 0 } });

Now add the location field:

db.ncua_branches.aggregate([
  {
    $addFields: {
      location: {
        type: 'Point',
        coordinates: ['$Longitude', '$Latitude']
      }
    }
  },
  {
    $out: 'ncua_branches'
  }
]);

Calculate NCUA Branch Statistics (NCUA)

We'll use the average geolocation helper function to calculate the centroid of all branches:

// define this averageGeolocation function in the mongo termiinal
const averageGeolocation = require('./helper_functions/averageGeolocation.js');

//group all branch locations by rssd and calculate the centroid
db.ncua_branches
  .aggregate([
    {
      $group: {
        _id: {
          rssd: '$rssd'
        },
        branchIds: {
          $push: '$_id'
        },
        coords: {
          $push: {
            latitude: '$Latitude',
            longitude: '$Longitude'
          }
        }
      }
    }
  ])
  .forEach(({ _id: rssd, branchIds, coords }) => {
    const { latitude, longitude } = averageGeolocation(coords);
    branchIds.forEach(id => {
      db.ncua_branches.findOneAndUpdate(
        { _id: id },
        {
          $set: {
            centroidLocation: {
              type: 'Point',
              coordinates: [longitude, latitude]
            }
          }
        }
      );
    });
  });

Calculate Distance to Centroid (NCUA)

The following is the haversine formula to find the distance between two points. We use this on each fdic_branch document to calculate the distance in meters between the branch location and the centroid location. This will eventually be used to calculate the Branch Density score.

// define haversine formula in mongo terminal
const haversineDistance = require('./helper_functions/haversineDistance.js');

Now we iterate through all the fdic_branches, calculate the distance to the centroid and write it back to the branch as 'distanceToCentroid'.

db.ncua_branches.find().forEach((x, i) => {
  const branchCoords = {
    latitude: x.location.coordinates[1],
    longitude: x.location.coordinates[0]
  };
  const centroidCoords = {
    latitude: x.centroidLocation.coordinates[1],
    longitude: x.centroidLocation.coordinates[0]
  };

  const distToCentroid = haversineDistance(branchCoords, centroidCoords);
  x.distanceToCentroid = distToCentroid;
  db.ncua_branches.save(x);
});

Create 2d sphere index on NCUA branches

db.ncua_branches.createIndex({ location: '2dsphere' });

Seting QCT Status for branches

We are using the qct_geo collection found in our older database bb

const otherDb = db.getSiblingDB('bb');
db.ncua_branches.find().forEach((x, i) => {
  const { location } = x;
  const tract = otherDb.qct_geo.findOne(
    {
      geometry: {
        $geoIntersects: {
          $geometry: location
        }
      }
    },
    { geometry: 0 }
  );
  const isQct = tract ? tract.properties.qct : null;
  x.qct = isQct;
  db.ncua_branches.save(x);
  if (i % 100 === 0) {
    print(i, ' complete');
  }
});

db.fdic_branches.find().forEach((x, i) => {
  const { location } = x;
  const tract = otherDb.qct_geo.findOne(
    {
      geometry: {
        $geoIntersects: {
          $geometry: location
        }
      }
    },
    { geometry: 0 }
  );
  const isQct = tract ? tract.properties.qct : null;
  x.qct = isQct;
  db.fdic_branches.save(x);
});

SPM Query

NCUA SPM Base

Create NCUA institutions collection

db.ncua_all_reports.aggregate([
  {
    $project: {
      rssd: 1,
      cu_number: '$CU_NUMBER',
      regulator: 'ncua',
      bankInformation: {
        city: '$CITY',
        state: '$STATE',
        zip: '$ZIP_CODE',
        name: '$CU_NAME',
        streetAddress: '$STREET',
        creditUnionType: '$creditUnionType',
        year_opened: '$ISSUE_DATE',
        website: '$Acct_891'
      },
      meta: {
        reportDate: '$CYCLE_DATE',
        collection: 'ncua_all_reports',
        originId: '$_id'
      },
      coreStatistics: {
        totalAssets: {
          $divide: [
            '$ACCT_010',
            1000
          ]
        },
        totalAssetsThousands: {
          $divide: ['$ACCT_010', 1000]
        },
        totalAssetsDollars: '$ACCT_010',
        totalAssetsBillions: {
          $divide: ['$ACCT_010', 1000000000]
        },
        totalLoans: {
          $divide: [
            '$ACCT_025B',
            1000
          ]
        },
        totalDeposits: {
          $divide: [
            '$ACCT_018',
            1000
          ]
        },
        housingLending: {
          $divide: [
            '$ACCT_703',
            1000
          ]
        },
        smallBizLending: {
          $divide: [
            '$ACCT_387',
            1000
          ]
        },
        farmLending: {
          $divide: [
            '$ACCT_042',
            1000
          ]
        }
      },
      flags: {
        mcu: {
          $cond: {
            if: {
              $eq: ['$MemberMinorityStatus', 1]
            },
            then: true,
            else: false
          }
        },
        licu: {
          $cond: {
            if: {
              $eq: ['$LIMITED_INC', 1]
            },
            then: true,
            else: false
          }
        }
      }
    }
  },
  {
    $lookup: {
      from: 'ncua_branches',
      let: {
        inst_rssd: '$rssd'
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: ['$rssd', '$$inst_rssd']
            }
          }
        },
        {
          $group: {
            _id: null,
            countQct: {
              $sum: {
                $cond: {
                  if: {
                    $eq: ['$qct', true]
                  },
                  then: 1,
                  else: 0
                }
              }
            },
            countTotal: {
              $sum: 1
            },
            avgDistToCenter: {
              $avg: '$distanceToCentroid'
            }
          }
        },
        {
          $project: {
            _id: 0
          }
        }
      ],
      as: 'branchStatistics'
    }
  },
  {
    $unwind: '$branchStatistics'
  },
  {
    $out: 'ncua_institutions_06_2019'
  }
]);

FDIC SPM Base

Create FDIC institutions collection.

NOTE!!! NCUA publishes actual dollar values for fields like total assets, total loans, etc. FDIC publishes those same values in thousands of dollars. In this query, we are converting all values that NCUA has as actual into thousands.

db.fdic_all_reports.aggregate([
  {
    $project: {
      rssd: '$fed_rssd',
      cert: '$cert',
      regulator: 'fdic',
      bankInformation: {
        city: '$city',
        state: '$state',
        zip: '$zip',
        name: '$name',
        streetAddress: '$address',
        year_opened: '$estymd',
        website: '$webaddr'
      },
      meta: {
        reportDate: '$repdte',
        collection: 'fdic_all_reports',
        originId: '$_id'
      },
      coreStatistics: {
        totalAssets: '$asset',
        totalAssetsThousands: '$asset',
        totalAssetsDollars: {
          $multiply: ['$asset', 1000]
        },
        totalAssetsBillions: {
          $divide: ['$asset', 1000000]
        },
        totalLoans: '$lnlsnet',
        totalDeposits: '$depdom',
        housingLending: '$lnreres',
        smallBizLending: {
          $sum: ['$Lnrenr4', '$Lnci4']
        },
        farmLending: {
          $sum: ['Lnreag4', 'Lnag4']
        }
      },
      flags: {
        mutual: {
          $cond: {
            if: {
              $eq: ['$mutual', 1]
            },
            then: true,
            else: false
          }
        },
        communityBank: {
          $cond: {
            if: {
              $eq: ['$cb', 1]
            },
            then: true,
            else: false
          }
        }
      }
    }
  },
  {
    $lookup: {
      from: 'fdic_branches',
      let: {
        inst_rssd: '$rssd'
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: ['$rssd', '$$inst_rssd']
            }
          }
        },
        {
          $group: {
            _id: null,
            countQct: {
              $sum: {
                $cond: {
                  if: {
                    $eq: ['$qct', true]
                  },
                  then: 1,
                  else: 0
                }
              }
            },
            countTotal: {
              $sum: 1
            },
            avgDistToCenter: {
              $avg: '$distanceToCentroid'
            }
          }
        },
        {
          $project: {
            _id: 0
          }
        }
      ],
      as: 'branchStatistics'
    }
  },
  {
    $unwind: '$branchStatistics'
  },
  {
    $out: 'fdic_institutions_06_2019'
  }
]);

Adding FDIC Minority Depository Institution Flag

List of MDIs is found on the FDIC website

June 2019 data found in MDI_data

Load in the cert numbers as FDIC_MDI_06_2019 collection:

mongoimport -d bb2020 -c fdic_mdi_06_2019 --file=fdic_mdi_certs.csv --headerline --type=csv

Now run the following aggregation to look for FDIC institutions, then lookup if the cert matches an MDI. If so, then it is an MDI, otherwise not.

db.fdic_institutions_06_2019.aggregate([
  {
    $lookup: {
      from: 'fdic_mdi_06_2019',
      localField: 'cert',
      foreignField: 'cert',
      as: 'mdi_list'
    }
  },
  {
    $addFields: {
      'flags.mdi': {
        $cond: {
          if: {
            $gt: [{ $size: '$mdi_list' }, 0]
          },
          then: true,
          else: false
        }
      }
    }
  },
  {
    $project: {
      mdi_list: 0
    }
  },
  {
    $out: 'fdic_institutions_06_2019'
  }
]);

Matching CDFI Banks & Credit Unions

Create a collection using all_cdfis.csv from the CDFI_data folder

mongoimport -d bb2020 -c all_cdfis --file=all_cdfis.csv --type=csv --headerline

Ensure rssd is indexed in both collections:

db.all_institutions_spm_06_2019.ensureIndex({ rssd: 1 });
db.all_cdfis.ensureIndex({ rssd: 1 });

Then add this as a flag to fdic_institutions_06_2019 and ncua_institutions_06_2019 with the following aggregations:

db.fdic_institutions_06_2019.aggregate([
  {
    $lookup: {
      from: 'all_cdfis',
      localField: 'rssd',
      foreignField: 'rssd',
      as: 'cdfi_record'
    }
  },
  {
    $unwind: '$cdfi_record'
  },
  {
    $addFields: {
      'flags.cdfi': {
        $cond: {
          if: {
            $eq: ['$cdfi_record.CDFI', 'TRUE']
          },
          then: true,
          else: false
        }
      }
    }
  },
  {
    $project: {
      cdfi_record: 0
    }
  },
  {
    $out: 'fdic_institutions_06_2019'
  }
]);

db.ncua_institutions_06_2019.aggregate([
  {
    $lookup: {
      from: 'all_cdfis',
      localField: 'rssd',
      foreignField: 'rssd',
      as: 'cdfi_record'
    }
  },
  {
    $unwind: '$cdfi_record'
  },
  {
    $addFields: {
      'flags.cdfi': {
        $cond: {
          if: {
            $eq: ['$cdfi_record.CDFI', 'TRUE']
          },
          then: true,
          else: false
        }
      }
    }
  },
  {
    $project: {
      cdfi_record: 0
    }
  },
  {
    $out: 'ncua_institutions_06_2019'
  }
]);

Combine FDIC & NCUA SPM_BASE INTO ALL INSTITUTIONS

Run the following commands in the mongo terminal:

// save each fdic record into all_institutions
db.fdic_institutions_06_2019.find().forEach(x => {
  db.all_institutions_06_2019.save(x);
});

// save each ncua record into all_institutions
db.ncua_institutions_06_2019.find().forEach(x => {
  db.all_institutions_06_2019.save(x);
});

Add score data

Run this aggregation on all institutions to add social performance metrics and scores:

db.all_institutions_06_2019.aggregate([
  {
    $addFields: {
      'branchStatistics.pctQctBranches': {
        $divide: ['$branchStatistics.countQct', '$branchStatistics.countTotal']
      },
      'branchStatistics.avgDistToCenterMiles': {
        $multiply: ['$branchStatistics.avgDistToCenter', 0.000621371]
      },
      weights: {
        qualityLending: 0.6,
        qctPresence: 0.2,
        branchDensity: 0.1,
        totalAssets: 0.1
      }
    }
  },
  {
    $addFields: {
      scores: {
        qualityLending: {
          $divide: [
            {
              $sum: [
                '$coreStatistics.housingLending',
                '$coreStatistics.smallBizLending',
                '$coreStatistics.farmLending'
              ]
            },
            '$coreStatistics.totalAssetsThousands'
          ]
        },
        qctPresence: '$branchStatistics.pctQctBranches',
        branchDensity: {
          $max: [
            {
              $divide: [
                {
                  $subtract: [250, '$branchStatistics.avgDistToCenterMiles']
                },
                250
              ]
            },
            0
          ]
        },
        totalAssets: {
          $divide: [
            1,
            {
              $add: [
                1,
                {
                  $divide: [
                    {
                      $pow: ['$coreStatistics.totalAssetsBillions', 2]
                    },
                    25
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  },
  {
    $addFields: {
      weightedScores: {
        qualityLending: {
          $multiply: ['$scores.qualityLending', '$weights.qualityLending']
        },
        qctPresence: {
          $multiply: ['$scores.qctPresence', '$weights.qctPresence']
        },
        branchDensity: {
          $multiply: ['$scores.branchDensity', '$weights.branchDensity']
        },
        totalAssets: {
          $multiply: ['$scores.totalAssets', '$weights.totalAssets']
        }
      }
    }
  },
  {
    $addFields: {
      totalScores: {
        raw: {
          $sum: [
            '$scores.qualityLending',
            '$scores.qctPresence',
            '$scores.branchDensity',
            '$scores.totalAssets'
          ]
        },
        weighted: {
          $sum: [
            '$weightedScores.qualityLending',
            '$weightedScores.qctPresence',
            '$weightedScores.branchDensity',
            '$weightedScores.totalAssets'
          ]
        }
      }
    }
  },
  {
    $out: 'all_institutions_spm_06_2019'
  }
]);

And create an index on rssd:

db.all_institutions_spm_06_2019.createIndex({rssd: 1})

STATUS

There should now be a main institution collection called all_institutions_spm_06_2019. For FDIC institutions, flags include MDI, Community Bank, and CDFI. For NCUA institutions, flags include MemberMinorityStatus, LowIncome, and CDFI.

Combining Branches into All_Branches Collection

Convert FDIC & NCUA Branches into identical structure

We will be creating new collections ncua & fdic _spm_base_06_2019 and mapping data to the same fields and turning them into geoJson objects:

db.fdic_branches.aggregate([
  {
    $project: {
      rssd: 1,
      meta: {
        uid: {
          UNINUMBR: '$UNINUMBR'
        },
        collection: 'fdic_branches',
        originId: '$_id',
        reportDate: '$YEAR'
      },
      mainOffice: {
        $cond: {
          if: {
            $eq: ['$BKMO', 1]
          },
          then: true,
          else: false
        }
      },
      bankName: '$NAMEFULL',
      branchName: '$NAMEBR',
      address1: '$ADDRESBR',
      city: '$CITYBR',
      state: '$STALPBR',
      stateName: '$STNAMEBR',
      zip: '$ZIPBR',
      location: 1,
      regulator: 1,
      centroidLocation: 1,
      distanceToCentroid: 1,
      distanceToCentroidMiles: {
        $multiply: ['$distanceToCentroid', 0.000621371]
      },
      qct: 1
    }
  },
  {
    $out: 'fdic_branches_base_06_2019'
  }
]);

db.ncua_branches.aggregate([
  {
    $project: {
      rssd: 1,
      meta: {
        uid: {
          rssd: '$rssd',
          SiteId: '$SiteId'
        },
        collection: 'ncua_branches',
        originId: '$_id',
        reportDate: '$CYCLE_DATE'
      },
      mainOffice: {
        $cond: {
          if: {
            $eq: ['$MainOffice', 'Yes']
          },
          then: true,
          else: false
        }
      },
      bankName: '$CU_NAME',
      branchName: '$SiteName',
      address1: '$PhysicalAddressLine1',
      address2: '$PhysicalAddressLine2',
      city: '$PhysicalAddressCity',
      state: '$PhysicalAddressStateCode',
      zip: '$PhysicalAddressPostalCode',
      location: 1,
      regulator: 'ncua',
      centroidLocation: 1,
      distanceToCentroid: 1,
      distanceToCentroidMiles: {
        $multiply: ['$distanceToCentroid', 0.000621371]
      },
      qct: 1
    }
  },
  {
    $out: 'ncua_branches_base_06_2019'
  }
]);

Then combine them:

db.fdic_branches_base_06_2019.find().forEach(x => {
  db.all_branches_06_2019.save(x);
});

db.ncua_branches_base_06_2019.find().forEach(x => {
  db.all_branches_06_2019.save(x);
});

Add indexes on rssd and location.

db.all_branches_06_2019.createIndex({ rssd: 1 });
db.all_branches_06_2019.createIndex({ location: '2dsphere' });

Add Zip Code Data to New DB

const otherDb = db.getSiblingDB('bb');
otherDb.zips.find().forEach(x => {
  db.zips.save(x);
});

Create index on 'postalCode' and 'location' (2dsphere for location)