[feat] Connect NYSERDA API and Download Data from NYISO #14
…le geocoding api to retrieve latitude and longitude
Looks good, Deena. Please address the changes I suggested.
```python
google_maps_api_key = os.environ.get('NEXT_PUBLIC_GOOGLE_MAPS_API_KEY')


def check_status(status):
    if status is None:
```
In this case, let's return NULL for now.
Hi Monique! Just wanted to clarify: in Python, `None` is a special keyword used to refer to null objects. So do you want me to return the string `"NULL"`?
Return `None` then! But make sure that when you write it to the database, it is stored as NULL.
```python
        return 'Cancelled'
    elif status.lower() == 'operational' or status.lower() == 'completed':
        return 'Operational'
    else:
```
Let's return `'Proposed'` only if the status says "under development".
Got it! What should we return if the status is missing? I can leave it as a null/`None` status and then add error handling for that in the other scraper files.
Let's leave the status as `None` if it is missing.
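Putting the decisions from this thread together, a minimal sketch of `check_status` might look like the following. It assumes statuses arrive as free-text strings. The condition guarding the `'Cancelled'` branch isn't visible in the diff, so matching the literal `"cancelled"` is a hypothetical placeholder, and mapping blank or unrecognized values to `None` is also an assumption rather than something agreed above.

```python
from typing import Optional


def check_status(status: Optional[str]) -> Optional[str]:
    """Normalize a raw project status (sketch of the behavior agreed in review)."""
    # Missing status stays None so it is stored as NULL in the database.
    if status is None:
        return None
    normalized = status.strip().lower()
    # HYPOTHETICAL: the real condition for 'Cancelled' is not shown in the diff.
    if normalized == 'cancelled':
        return 'Cancelled'
    if normalized in ('operational', 'completed'):
        return 'Operational'
    # Per review: only "under development" maps to 'Proposed'.
    if normalized == 'under development':
        return 'Proposed'
    # ASSUMPTION: blank or unrecognized statuses are left as None too.
    return None
```

Normalizing once with `strip().lower()` also avoids calling `.lower()` twice in the `elif`, as the diff currently does.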
```python
        'longitude': long,
        'data_through_date': item.get('data_through_date', None),
    }
    small_list.append(project_dict)
```
I am checking on the missing/ambiguous data and will have you fix anything once we get confirmation from the NPO.
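The `latitude`/`longitude` values in the small-project dict come from `geocode_lat_long`, which the PR description says calls the Google Maps Geocoding API with a city and state. A rough sketch of such a helper is below. It uses the standard library's `urllib.request` instead of `requests` so it has no third-party dependencies, reads the key from the same `NEXT_PUBLIC_GOOGLE_MAPS_API_KEY` variable shown in the diff, and adds an `lru_cache` (an assumption, not part of the PR) so repeated lookups for the same city/state don't trigger extra API calls. `build_geocode_url` and `parse_lat_long` are hypothetical helper names.

```python
import json
import os
import urllib.parse
import urllib.request
from functools import lru_cache
from typing import Any, Dict, Optional, Tuple

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
LatLong = Tuple[Optional[float], Optional[float]]


def build_geocode_url(city: str, state: str, api_key: str) -> str:
    """Build a Geocoding API request URL for a "city, state" address."""
    query = urllib.parse.urlencode({"address": f"{city}, {state}", "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_lat_long(payload: Dict[str, Any]) -> LatLong:
    """Pull (lat, lng) from the first geocoding result, or (None, None)."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None, None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


@lru_cache(maxsize=None)  # ASSUMPTION: cache repeat lookups to save API calls
def geocode_lat_long(city: Optional[str], state: str,
                     api_key: Optional[str] = None) -> LatLong:
    """Approximate (latitude, longitude) for a city/state, or (None, None)."""
    api_key = api_key or os.environ.get("NEXT_PUBLIC_GOOGLE_MAPS_API_KEY")
    # Skip the network call entirely when there is nothing to look up.
    if not city or not api_key:
        return None, None
    url = build_geocode_url(city, state, api_key)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return parse_lat_long(payload)
```

Splitting URL building and response parsing out of the network call keeps those two pieces testable without spending API quota.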
```python
from io import BytesIO
import urllib


nyiso = requests.get('https://www.nyiso.com/documents/20142/1407078/NYISO-Interconnection-Queue.xlsx')
```
The next step here is to scrape the actual URL from the NYISO page.
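One way to do that is to parse the NYISO page's HTML for the first `.xlsx` link whose path contains `Interconnection-Queue` (a keyword assumed from the currently hardcoded filename) and resolve it against the page URL. The sketch below uses only the standard library's `html.parser` and `urllib.parse`; `find_queue_xlsx_url` is a hypothetical helper name, and the page URL it would be fed isn't given in this PR.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects every href value from <a> tags in the order they appear."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_queue_xlsx_url(page_html: str, base_url: str,
                        keyword: str = "Interconnection-Queue") -> Optional[str]:
    """Return the absolute URL of the first matching .xlsx link, else None."""
    collector = _LinkCollector()
    collector.feed(page_html)
    for href in collector.hrefs:
        path = href.split("?")[0].lower()  # ignore query strings like ?t=123
        if keyword.lower() in path and path.endswith(".xlsx"):
            # Relative links (e.g. /documents/...) are resolved against the page.
            return urljoin(base_url, href)
    return None
```

Usage would be something like `find_queue_xlsx_url(requests.get(page_url).text, page_url)`, where `page_url` is whichever NYISO page hosts the queue file. Returning `None` when nothing matches lets the caller fall back to the hardcoded URL or fail loudly.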
api/webscraper/nyserda_scraper.py
```python
response = requests.get('https://data.ny.gov/resource/dprp-55ye.json')
data = response.json()
filtered_list = []
for item in large_data:
```
`data` is created from `response.json()` but isn't used in this method. Do we want to go through the items in `large_data` or `data` from line 81?
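If the intent is to iterate over the parsed response, a minimal sketch of the loop might look like this. It assumes `data` is the list of records returned by `response.json()`, maps only the fields visible in this diff, and uses the `'zipcode'` key suggested in the review nit. The actual filtering criteria aren't shown, so `parse_large_projects` (a hypothetical name) simply builds one dict per record.

```python
from typing import Any, Dict, List


def parse_large_projects(data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build project dicts from NYSERDA large-scale records (sketch)."""
    filtered_list = []
    # Iterate over `data` (the parsed response.json()), not `large_data`.
    for item in data:
        project_dict = {
            'proposed_cod': item.get('year_of_delivery_start_date'),
            'county': item.get('county_province'),
            'region': item.get('redc'),
            # ASSUMPTION: key spelled without the trailing colon, per the nit.
            'zipcode': item.get('zip_code'),
        }
        filtered_list.append(project_dict)
    return filtered_list
```

A call site would then be `parse_large_projects(requests.get('https://data.ny.gov/resource/dprp-55ye.json').json())`. Passing the parsed list in, rather than fetching inside the function, makes the mapping easy to check against saved JSON without hitting the API.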
```python
        'proposed_cod': item.get('year_of_delivery_start_date', None),
        'county': item.get('county_province', None),
        'region': item.get('redc', None),
        'zipcode:': item.get('zip_code', None),
```
nit: should be 'zipcode':
@nehaahussain Ah yes! It's a little funky, but that's the way they titled the "zipcode" field for the NYSERDA large-scale data.
Good job, Deena! I added some comments :)
What's new in this PR
Description
Created `api/webscraper/utils` module:
- `scraper_utils.py` file with helper methods for parsing and filtering the data from our data sources
  - `check_status` checks the status of our projects
  - json files
  - `geocode_lat_long` uses the Google Maps API Geocoding function to get the approximate latitude and longitude based on a city and state. This function is used for the small solar project data from NYSERDA, which contains a `city_town` field but no latitude/longitude.

Created `api/webscraper/nyserda_scraper.py` file:
- `nyserda_large.json` and `nyserda_small.json` files

Created `api/webscraper/scraper.py` file:

How to review
Standard procedure
NOTE: The function that parses and filters the small solar projects in the NYSERDA data set makes calls to our Google Maps API! Don't run this file frequently unless needed; you can also check the data already inside `nyserda_large.json` and `nyserda_small.json`.

To check if the scrapers for the NYSERDA data are working, you can run this command in your terminal:
This will dump data into the `nyserda_large.json` and `nyserda_small.json` files inside the `api/webscraper` directory! If you want to see it in action, you can delete the data in those files and run the command above, and the JSON files should be repopulated.

If you run into any issues with dependencies, you may need to install certain Python packages such as:
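- requests
- python-dotenv
- pandas

(`json`, `urllib`, and `io` are part of the Python standard library, so they don't need to be installed separately.)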
Run this command in your terminal for each of the needed dependencies:
Next steps
Relevant links
Online sources
Related PRs
CC: @itsliterallymonique