- Better Banking Data Process
- As of January 13th, 2020
- Table of Contents
- Updates
- Existing Data
- Data to Refresh
- Data Processing
Report Date | Update Started | Update Complete |
---|---|---|
12/31/2018 | August, 2019 | January, 2020 |
6/30/2019 | Jan 13, 2020 | tbd |
- Contains ~75,000 census tracts stored as geoJSON that includes a "qct" boolean within the properties object
- These are used to determine if branches are located within qcts or not
If this needs to be updated, this is the direct download link found on this page. The file it downloads contains a list of all census tracts (by FIPS) that are considered qualified, as of 2017
|-- fdic_data
| |-- all_reports
| | |-- all_reports_raw
| | | +-- [all *.csv from FDIC download]
| | +-- All_reports_***_readme.htm
| |-- branch_data
| |-- branch_data_definitions
| | +-- "Definitions-Table 1.csv"
| | +-- "Deleted Variables-Table 1.csv"
| | +-- "Read_me-Table 1.csv"
| | +-- "Sheet1-Table 1.csv"
| |-- branch_data_raw
| +-- ALL_2019.csv
After download, separate out the files into the following folders/sections
ATM.txt
Credit Union Branch Information.txt
Acct-DescTradeNames.txt
TradeNames.txt
(This is where you get CU names from rather than the search and replace crap)foicu.txt
(data for institutions)FOICUDES.txt
(data key for institutions)
fs220.txt
,fs220A.txt
,fs220B.txt
fs220C.txt
fs220D.txt
fs220G.txt
fs220H.txt
fs220I.txt
,fs220J.txt
fs220K.txt
fs220L.txt
fs220M.txt
fs220N.txt
|-- ncua_data
|-- call_reports_raw
|-- branch_data
+-- ATM.txt
+-- "Credit Union Branch Information.txt"
|-- institution_data
+-- Acct-DescTradeNames.txt
+-- foicu.txt
+-- FOICUDES.txt
+-- TradeNames.txt
|-- report_data
+-- fs220.txt
+-- fs220A.txt
+-- fs220B.txt
...
+-- fs220N.txt
+-- Acct-DescGrants.txt
+-- AcctDesc.txt
+-- Grants.txt
+-- Readme.txt
+-- Report1.txt
// TODO
All data will be going to a local mongoDb database.
A basic command in terminal to load a csv into mongo is the following:
mongoimport --db [databaseName] -c [collection] --type=csv --headerline
This data comes from fdic_all_reports. We need the following fields:
- bkclass - common
- fed_rssd - common
- cert - common
- specgrp - common
- name - common
- address - common
- city - common
- state - common
- zip - common
- webaddr - common
- repdte - common
- asset - Assets and Liabilities
- lnlsnet - Net Loans and Leases
- dep - Assets and Liabilities
- depdom - Assets and Liabilities
- lnreres - Net Loans and Leases
- lnag - Net Loans and Leases
- lnreag - Net Loans and Leases
- lnci - Net Loans and Leases
- lncon - Net Loans and Leases
- lncrcd - Net Loans and Leases
- Lnrenr4 - Small Business Loans
- Lnci4 - Small Business Loans
- Lnreag4 - Small Business Loans
- Lnag4 - Small Business Loans
- mutual - common
- cb - common
We only need to import:
All_Reports_20190630_Assets and Liabilities.csv
All_Reports_20190630_Net Loans and Leases.csv
All_Reports_20190630_Small Business Loans.csv
However, after importing the first file, we will be merging using the --upsertFields command
// first import command
mongoimport -d bb2020 -c fdic_all_reports --headerline --type=csv --file="All_Reports_20190630_Assets and Liabilities.csv"
The second and third import commands:
// second import command. Note --mode and --upsertFields
mongoimport -d bb2020 -c fdic_all_reports --headerline --type=csv --file="All_Reports_20190630_Net Loans and Leases.csv" --mode merge --upsertFields=fed_rssd
// third import command. Note --mode and --upsertFields
mongoimport -d bb2020 -c fdic_all_reports --headerline --type=csv --file="All_Reports_20190630_Small Business Loans.csv" --mode merge --upsertFields=fed_rssd
Create an rssd field on each document:
db.fdic_all_reports.aggregate([
{
$addFields: {
rssd: '$fed_rssd'
}
},
{
$out: 'fdic_all_reports'
}
]);
Now we index the fed_rssd
field in the fdic_all_reports
collection:
db.fdic_all_reports.createIndex({ fed_rssd: 1 });
db.fdic_all_reports.createIndex({ rssd: 1 });
While in the folder: ~/fdic_data/branch_data/branch_data_raw
mongoimport -d bb2020 -c fdic_branches --headerline --type=csv --file=ALL_2019.csv
First aggregation will filter out bank branches that either have no geolocation, or have a latitude/longitude of 0. It will create a location field and a 'rssd' field. The location and rssd will be indexed to facilitate future aggregations.
// First fdic_branches aggregation
db.fdic_branches_raw.aggregate([
{
$match: {
$expr: {
$and: [
{
$ne: [
{
$type: '$SIMS_LONGITUDE'
},
'string'
]
},
{
$ne: [
{
$type: '$SIMS_LATITUDE'
},
'string'
]
}
]
}
}
},
{
$match: {
$expr: {
$and: [
{
$ne: [
'$SIMS_LONGITUDE',
0
]
},
{
$ne: [
'$SIMS_LATITUDE',
0
]
}
]
}
}
},
{
$addFields: {
location: {
type: "Point",
coordinates: [
'$SIMS_LONGITUDE',
'$SIMS_LATITUDE'
]
},
rssd: '$RSSDID',
regulator: 'fdic'
}
},
{
$out: 'fdic_branches'
}
])
Two index commands in the mongo client terminal:
// create 2dsphere index on location field
db.fdic_branches.createIndex({ location: '2dsphere' });
db.fdic_branches.createIndex({ rssd: 1 });
First we need to normalize the numbers on each fdic_branch document: (enter in mongo client)
db.fdic_branches.find().forEach((x, i) => {
const domesticDeposits = parseInt(x.DEPDOM.toString().replace(/,/g, ''));
const branchDeposits = parseInt(x.DEPSUMBR.toString().replace(/,/g, ''));
x.domesticDeposits = domesticDeposits;
x.branchDeposits = branchDeposits;
db.fdic_branches.save(x);
});
We'll use the average geolocation helper function to calculate the centroid of all branches:
// define this averageGeolocation function in the mongo termiinal
const averageGeolocation = require('./helper_functions/averageGeolocation.js');
//group all branch locations by rssd and calculate the centroid
db.fdic_branches
.aggregate([
{
$group: {
_id: {
rssd: '$rssd'
},
branchIds: {
$push: '$_id'
},
coords: {
$push: {
latitude: '$SIMS_LATITUDE',
longitude: '$SIMS_LONGITUDE'
}
}
}
}
])
.forEach(({ _id: rssd, branchIds, coords }) => {
// for each set of branch ids, find the avg. location and write it back to all
// of the branches as a new location
const { latitude, longitude } = averageGeolocation(coords);
branchIds.forEach(id => {
db.fdic_branches.findOneAndUpdate(
{ _id: id },
{
$set: {
centroidLocation: {
type: 'Point',
coordinates: [longitude, latitude]
}
}
}
);
});
});
The following is the haversine formula to find the distance between two points. We use this on each fdic_branch
document to calculate the distance in meters between the branch location and the centroid location. This will eventually be used to calculate the Branch Density score.
// define haversine formula in mongo terminal
const haversineDistance = require('./helper_functions/haversineDistance.js');
Now we iterate through all the fdic_branches, calculate the distance to the centroid and write it back to the branch as 'distanceToCentroid'.
db.fdic_branches.find().forEach((x, i) => {
const branchCoords = {
latitude: x.location.coordinates[1],
longitude: x.location.coordinates[0]
};
const centroidCoords = {
latitude: x.centroidLocation.coordinates[1],
longitude: x.centroidLocation.coordinates[0]
};
const distToCentroid = haversineDistance(branchCoords, centroidCoords);
x.distanceToCentroid = distToCentroid;
db.fdic_branches.save(x);
});
This data comes from ncua_data/call_reports_raw/report_data
.
We need the following fields:
- RSSD
- Rssd for id
foicu.txt
- CU_NUMBER
- Credit Union charter number
foicu.txt
- CU_NAME
- Credit Union Name
foicu.txt
- STREET
- Street Addres
foicu.txt
- CITY
- City
foicu.txt
- STATE
- State
foicu.txt
- ZIP_CODE
- Zip
foicu.txt
- Acct_891
- website
fs220D.txt
- CYCLE_DATE
- report date
foicu.txt
- CU_TYPE
- credit union type
- Federal Credit Union
- Federally Insured State-Chartered Credit Union
- State Credit Union
foicu.txt
- credit union type
- ACCT_010
- total assets
- fs220.txt
- ACCT_025B
- total loans
- fs220.txt
- ACCT_018
- total deposits
- fs220.txt
- ACCT_703
- housing lending
- fs220.txt
- ACCT_387
- small business lending (non agricultural business loans)
- fs220.txt
- ACCT_042
- small agricultural lending
- fs220.txt
- MemberMinorityStatus
- member minority status
- fs220D.txt
- Acct_886H
- supports mobile banking
- fs220D.txt
- Acct_887J
- supports bill pay
- fs220D.txt
This leaves us importing only three documents for institutions:
foicu.txt
fs220.txt
fs220D.txt
Inside the ncua_data/call_reports_raw/institution_data
folder in terminal use:
mongoimport -d bb2020 -c ncua_all_reports --headerline --type=csv --file=foicu.txt
We will merge in these second two files using CU_NUMBER as the merge key. Once in the report_data folder use:
mongoimport -d bb2020 -c ncua_all_reports --headerline --type=csv --file="fs220.txt" --mode merge --upsertFields=CU_NUMBER
mongoimport -d bb2020 -c ncua_all_reports --headerline --type=csv --file="fs220D.txt" --mode merge --upsertFields=CU_NUMBER
We'll do one aggregation here to fill out the credit union type and rssd. This will help match credit unions with CDFI data:
db.ncua_all_reports.aggregate([
{
$addFields: {
rssd: '$RSSD',
creditUnionType: {
$switch: {
branches: [
{
case: {
$eq: ['$CU_TYPE', 1]
},
then: 'Federal Credit Union'
},
{
case: {
$eq: ['$CU_TYPE', 2]
},
then: 'Federally Insured State-Chartered Credit Union'
},
{
case: {
$eq: ['$CU_TYPE', 3]
},
then: 'State Credit Union'
}
]
}
}
}
},
{
$out: 'ncua_all_reports'
}
]);
Create index on rssd and CU_NUMBER field:
db.ncua_all_reports.createIndex({ rssd: 1 });
db.ncua_all_reports.createIndex({ CU_NUMBER: 1 });
The file ncua_data/call_reports_raw/branch_data/Credit Union Branch Information.txt
contains address information about each credit union branch. We need to convert this .txt file to a .csv file and then upload it to geocod.io for processing. We use the physical address for geocoding.
First import to mongo:
mongoimport -d bb2020 -c ncua_branches --headerline --type=csv --file="Credit Union Branch Information.txt"
Then bring rssd down from ncua_all_reports:
db.ncua_branches.aggregate([
{
$lookup: {
from: 'ncua_all_reports',
localField: 'CU_NUMBER',
foreignField: 'CU_NUMBER',
as: 'hq'
}
},
{
$unwind: '$hq'
},
{
$addFields: {
rssd: '$hq.RSSD'
}
},
{
$project: {
hq: 0
}
},
{
$out: 'ncua_branches'
}
]);
Confirm that all the branches received an rssd (i.e. have a parent institution)
db.ncua_branches.find({ rssd: { $exists: true } }).count();
// should equal
db.ncua_branches.count();
Then create an index on rssd and a joint unique index on rssd & SiteId:
db.ncua_branches.createIndex({ rssd: 1 });
db.ncua_branches.createIndex({ rssd: 1, SiteId: 1 }, { unique: true });
Now we can export this collection as a csv. In the ncua_data/geocoding
folder run:
mongoexport -d bb2020 -c ncua_branches --fields rssd,CU_NUMBER,SiteId,CU_NAME,SiteName,SiteTypeName,MainOffice,PhysicalAddressLine1,PhysicalAddressLine2,PhysicalAddressCity,PhysicalAddressStateCode,PhysicalAddressPostalCode,PhysicalAddressCountyName,PhysicalAddressCountry --type=csv --out ncua_branches_for_geocoding.csv
Then submit to geocod.io for geocoding of addresses into latitude longitude. Then we'll essentially repeat the above steps.
Once you receive the file back, merge back into ncua_branches
collection in mongo using SiteId as key:
mongoimport -d bb2020 -c ncua_branches --fieldFile="fieldfile.txt" --columnsHaveTypes --type=csv --file="geoCode_response.csv" --mode merge --upsertFields=rssd,SiteId
hint: If something is being weird, you may have a document in ncua_branches that is the headerline as the above command uses a field file. Just find and delete that one.
Then delete any that were not geolocated (4 in 6/30 update for example):
db.ncua_branches.deleteMany({ Longitude: { $eq: 0 } });
Now add the location field:
db.ncua_branches.aggregate([
{
$addFields: {
location: {
type: 'Point',
coordinates: ['$Longitude', '$Latitude']
}
}
},
{
$out: 'ncua_branches'
}
]);
We'll use the average geolocation helper function to calculate the centroid of all branches:
// define this averageGeolocation function in the mongo termiinal
const averageGeolocation = require('./helper_functions/averageGeolocation.js');
//group all branch locations by rssd and calculate the centroid
db.ncua_branches
.aggregate([
{
$group: {
_id: {
rssd: '$rssd'
},
branchIds: {
$push: '$_id'
},
coords: {
$push: {
latitude: '$Latitude',
longitude: '$Longitude'
}
}
}
}
])
.forEach(({ _id: rssd, branchIds, coords }) => {
const { latitude, longitude } = averageGeolocation(coords);
branchIds.forEach(id => {
db.ncua_branches.findOneAndUpdate(
{ _id: id },
{
$set: {
centroidLocation: {
type: 'Point',
coordinates: [longitude, latitude]
}
}
}
);
});
});
The following is the haversine formula to find the distance between two points. We use this on each fdic_branch
document to calculate the distance in meters between the branch location and the centroid location. This will eventually be used to calculate the Branch Density score.
// define haversine formula in mongo terminal
const haversineDistance = require('./helper_functions/haversineDistance.js');
Now we iterate through all the fdic_branches, calculate the distance to the centroid and write it back to the branch as 'distanceToCentroid'.
db.ncua_branches.find().forEach((x, i) => {
const branchCoords = {
latitude: x.location.coordinates[1],
longitude: x.location.coordinates[0]
};
const centroidCoords = {
latitude: x.centroidLocation.coordinates[1],
longitude: x.centroidLocation.coordinates[0]
};
const distToCentroid = haversineDistance(branchCoords, centroidCoords);
x.distanceToCentroid = distToCentroid;
db.ncua_branches.save(x);
});
db.ncua_branches.createIndex({ location: '2dsphere' });
We are using the qct_geo
collection found in our older database bb
const otherDb = db.getSiblingDB('bb');
db.ncua_branches.find().forEach((x, i) => {
const { location } = x;
const tract = otherDb.qct_geo.findOne(
{
geometry: {
$geoIntersects: {
$geometry: location
}
}
},
{ geometry: 0 }
);
const isQct = tract ? tract.properties.qct : null;
x.qct = isQct;
db.ncua_branches.save(x);
if (i % 100 === 0) {
print(i, ' complete');
}
});
db.fdic_branches.find().forEach((x, i) => {
const { location } = x;
const tract = otherDb.qct_geo.findOne(
{
geometry: {
$geoIntersects: {
$geometry: location
}
}
},
{ geometry: 0 }
);
const isQct = tract ? tract.properties.qct : null;
x.qct = isQct;
db.fdic_branches.save(x);
});
Create NCUA institutions collection
db.ncua_all_reports.aggregate([
{
$project: {
rssd: 1,
cu_number: '$CU_NUMBER',
regulator: 'ncua',
bankInformation: {
city: '$CITY',
state: '$STATE',
zip: '$ZIP_CODE',
name: '$CU_NAME',
streetAddress: '$STREET',
creditUnionType: '$creditUnionType',
year_opened: '$ISSUE_DATE',
website: '$Acct_891'
},
meta: {
reportDate: '$CYCLE_DATE',
collection: 'ncua_all_reports',
originId: '$_id'
},
coreStatistics: {
totalAssets: {
$divide: [
'$ACCT_010',
1000
]
},
totalAssetsThousands: {
$divide: ['$ACCT_010', 1000]
},
totalAssetsDollars: '$ACCT_010',
totalAssetsBillions: {
$divide: ['$ACCT_010', 1000000000]
},
totalLoans: {
$divide: [
'$ACCT_025B',
1000
]
},
totalDeposits: {
$divide: [
'$ACCT_018',
1000
]
},
housingLending: {
$divide: [
'$ACCT_703',
1000
]
},
smallBizLending: {
$divide: [
'$ACCT_387',
1000
]
},
farmLending: {
$divide: [
'$ACCT_042',
1000
]
}
},
flags: {
mcu: {
$cond: {
if: {
$eq: ['$MemberMinorityStatus', 1]
},
then: true,
else: false
}
},
licu: {
$cond: {
if: {
$eq: ['$LIMITED_INC', 1]
},
then: true,
else: false
}
}
}
}
},
{
$lookup: {
from: 'ncua_branches',
let: {
inst_rssd: '$rssd'
},
pipeline: [
{
$match: {
$expr: {
$eq: ['$rssd', '$$inst_rssd']
}
}
},
{
$group: {
_id: null,
countQct: {
$sum: {
$cond: {
if: {
$eq: ['$qct', true]
},
then: 1,
else: 0
}
}
},
countTotal: {
$sum: 1
},
avgDistToCenter: {
$avg: '$distanceToCentroid'
}
}
},
{
$project: {
_id: 0
}
}
],
as: 'branchStatistics'
}
},
{
$unwind: '$branchStatistics'
},
{
$out: 'ncua_institutions_06_2019'
}
]);
Create FDIC institutions collection.
NOTE!!! NCUA publishes actual dollar values for fields like total assets, total loans, etc. FDIC publishes those same values in thousands of dollars. In this query, we are converting all values that NCUA has as actual into thousands.
db.fdic_all_reports.aggregate([
{
$project: {
rssd: '$fed_rssd',
cert: '$cert',
regulator: 'fdic',
bankInformation: {
city: '$city',
state: '$state',
zip: '$zip',
name: '$name',
streetAddress: '$address',
year_opened: '$estymd',
website: '$webaddr'
},
meta: {
reportDate: '$repdte',
collection: 'fdic_all_reports',
originId: '$_id'
},
coreStatistics: {
totalAssets: '$asset',
totalAssetsThousands: '$asset',
totalAssetsDollars: {
$multiply: ['$asset', 1000]
},
totalAssetsBillions: {
$divide: ['$asset', 1000000]
},
totalLoans: '$lnlsnet',
totalDeposits: '$depdom',
housingLending: '$lnreres',
smallBizLending: {
$sum: ['$Lnrenr4', '$Lnci4']
},
farmLending: {
$sum: ['Lnreag4', 'Lnag4']
}
},
flags: {
mutual: {
$cond: {
if: {
$eq: ['$mutual', 1]
},
then: true,
else: false
}
},
communityBank: {
$cond: {
if: {
$eq: ['$cb', 1]
},
then: true,
else: false
}
}
}
}
},
{
$lookup: {
from: 'fdic_branches',
let: {
inst_rssd: '$rssd'
},
pipeline: [
{
$match: {
$expr: {
$eq: ['$rssd', '$$inst_rssd']
}
}
},
{
$group: {
_id: null,
countQct: {
$sum: {
$cond: {
if: {
$eq: ['$qct', true]
},
then: 1,
else: 0
}
}
},
countTotal: {
$sum: 1
},
avgDistToCenter: {
$avg: '$distanceToCentroid'
}
}
},
{
$project: {
_id: 0
}
}
],
as: 'branchStatistics'
}
},
{
$unwind: '$branchStatistics'
},
{
$out: 'fdic_institutions_06_2019'
}
]);
List of MDIs is found on the FDIC website
June 2019 data found in MDI_data
Load in the cert numbers as FDIC_MDI_06_2019 collection:
mongoimport -d bb2020 -c fdic_mdi_06_2019 --file=fdic_mdi_certs.csv --headerline --type=csv
Now run the following aggregation to look for FDIC institutions, then lookup if the cert matches an MDI. If so, then it is an MDI, otherwise not.
db.fdic_institutions_06_2019.aggregate([
{
$lookup: {
from: 'fdic_mdi_06_2019',
localField: 'cert',
foreignField: 'cert',
as: 'mdi_list'
}
},
{
$addFields: {
'flags.mdi': {
$cond: {
if: {
$gt: [{ $size: '$mdi_list' }, 0]
},
then: true,
else: false
}
}
}
},
{
$project: {
mdi_list: 0
}
},
{
$out: 'fdic_institutions_06_2019'
}
]);
Create a collection using all_cdfis.csv from the CDFI_data folder
mongoimport -d bb2020 -c all_cdfis --file=all_cdfis.csv --type=csv --headerline
Ensure rssd is indexed in both collections:
db.all_institutions_spm_06_2019.ensureIndex({ rssd: 1 });
db.all_cdfis.ensureIndex({ rssd: 1 });
Then add this as a flag to fdic_institutions_06_2019
and ncua_institutions_06_2019
with the following aggregations:
db.fdic_institutions_06_2019.aggregate([
{
$lookup: {
from: 'all_cdfis',
localField: 'rssd',
foreignField: 'rssd',
as: 'cdfi_record'
}
},
{
$unwind: '$cdfi_record'
},
{
$addFields: {
'flags.cdfi': {
$cond: {
if: {
$eq: ['$cdfi_record.CDFI', 'TRUE']
},
then: true,
else: false
}
}
}
},
{
$project: {
cdfi_record: 0
}
},
{
$out: 'fdic_institutions_06_2019'
}
]);
db.ncua_institutions_06_2019.aggregate([
{
$lookup: {
from: 'all_cdfis',
localField: 'rssd',
foreignField: 'rssd',
as: 'cdfi_record'
}
},
{
$unwind: '$cdfi_record'
},
{
$addFields: {
'flags.cdfi': {
$cond: {
if: {
$eq: ['$cdfi_record.CDFI', 'TRUE']
},
then: true,
else: false
}
}
}
},
{
$project: {
cdfi_record: 0
}
},
{
$out: 'ncua_institutions_06_2019'
}
]);
Run the following commands in the mongo terminal:
// save each fdic record into all_institutions
db.fdic_institutions_06_2019.find().forEach(x => {
db.all_institutions_06_2019.save(x);
});
// save each ncua record into all_institutions
db.ncua_institutions_06_2019.find().forEach(x => {
db.all_institutions_06_2019.save(x);
});
Run this aggregation on all institutions to add social performance metrics and scores:
db.all_institutions_06_2019.aggregate([
{
$addFields: {
'branchStatistics.pctQctBranches': {
$divide: ['$branchStatistics.countQct', '$branchStatistics.countTotal']
},
'branchStatistics.avgDistToCenterMiles': {
$multiply: ['$branchStatistics.avgDistToCenter', 0.000621371]
},
weights: {
qualityLending: 0.6,
qctPresence: 0.2,
branchDensity: 0.1,
totalAssets: 0.1
}
}
},
{
$addFields: {
scores: {
qualityLending: {
$divide: [
{
$sum: [
'$coreStatistics.housingLending',
'$coreStatistics.smallBizLending',
'$coreStatistics.farmLending'
]
},
'$coreStatistics.totalAssetsThousands'
]
},
qctPresence: '$branchStatistics.pctQctBranches',
branchDensity: {
$max: [
{
$divide: [
{
$subtract: [250, '$branchStatistics.avgDistToCenterMiles']
},
250
]
},
0
]
},
totalAssets: {
$divide: [
1,
{
$add: [
1,
{
$divide: [
{
$pow: ['$coreStatistics.totalAssetsBillions', 2]
},
25
]
}
]
}
]
}
}
}
},
{
$addFields: {
weightedScores: {
qualityLending: {
$multiply: ['$scores.qualityLending', '$weights.qualityLending']
},
qctPresence: {
$multiply: ['$scores.qctPresence', '$weights.qctPresence']
},
branchDensity: {
$multiply: ['$scores.branchDensity', '$weights.branchDensity']
},
totalAssets: {
$multiply: ['$scores.totalAssets', '$weights.totalAssets']
}
}
}
},
{
$addFields: {
totalScores: {
raw: {
$sum: [
'$scores.qualityLending',
'$scores.qctPresence',
'$scores.branchDensity',
'$scores.totalAssets'
]
},
weighted: {
$sum: [
'$weightedScores.qualityLending',
'$weightedScores.qctPresence',
'$weightedScores.branchDensity',
'$weightedScores.totalAssets'
]
}
}
}
},
{
$out: 'all_institutions_spm_06_2019'
}
]);
And create an index on rssd:
db.all_institutions_spm_06_2019.createIndex({rssd: 1})
There should now be a main institution collection called all_institutions_spm_06_2019
. For FDIC institutions, flags include MDI, Community Bank, and CDFI. For NCUA institutions, flags include MemberMinorityStatus, LowIncome, and CDFI.
We will be creating new collections ncua & fdic _spm_base_06_2019 and mapping data to the same fields and turning them into geoJson objects:
db.fdic_branches.aggregate([
{
$project: {
rssd: 1,
meta: {
uid: {
UNINUMBR: '$UNINUMBR'
},
collection: 'fdic_branches',
originId: '$_id',
reportDate: '$YEAR'
},
mainOffice: {
$cond: {
if: {
$eq: ['$BKMO', 1]
},
then: true,
else: false
}
},
bankName: '$NAMEFULL',
branchName: '$NAMEBR',
address1: '$ADDRESBR',
city: '$CITYBR',
state: '$STALPBR',
stateName: '$STNAMEBR',
zip: '$ZIPBR',
location: 1,
regulator: 1,
centroidLocation: 1,
distanceToCentroid: 1,
distanceToCentroidMiles: {
$multiply: ['$distanceToCentroid', 0.000621371]
},
qct: 1
}
},
{
$out: 'fdic_branches_base_06_2019'
}
]);
db.ncua_branches.aggregate([
{
$project: {
rssd: 1,
meta: {
uid: {
rssd: '$rssd',
SiteId: '$SiteId'
},
collection: 'ncua_branches',
originId: '$_id',
reportDate: '$CYCLE_DATE'
},
mainOffice: {
$cond: {
if: {
$eq: ['$MainOffice', 'Yes']
},
then: true,
else: false
}
},
bankName: '$CU_NAME',
branchName: '$SiteName',
address1: '$PhysicalAddressLine1',
address2: '$PhysicalAddressLine2',
city: '$PhysicalAddressCity',
state: '$PhysicalAddressStateCode',
zip: '$PhysicalAddressPostalCode',
location: 1,
regulator: 'ncua',
centroidLocation: 1,
distanceToCentroid: 1,
distanceToCentroidMiles: {
$multiply: ['$distanceToCentroid', 0.000621371]
},
qct: 1
}
},
{
$out: 'ncua_branches_base_06_2019'
}
]);
Then combine them:
db.fdic_branches_base_06_2019.find().forEach(x => {
db.all_branches_06_2019.save(x);
});
db.ncua_branches_base_06_2019.find().forEach(x => {
db.all_branches_06_2019.save(x);
});
Add indexes on rssd and location.
db.all_branches_06_2019.createIndex({ rssd: 1 });
db.all_branches_06_2019.createIndex({ location: '2dsphere' });
const otherDb = db.getSiblingDB('bb');
otherDb.zips.find().forEach(x => {
db.zips.save(x);
});
Create index on 'postalCode' and 'location' (2dsphere for location)