[feat] Connect NYSERDA API and Download Data from NYISO #14

Merged

@deenasun deenasun commented Oct 14, 2024

What's new in this PR

Description

Created api/webscraper/utils module:

  • Contains a scraper_utils.py file with helper methods for parsing and filtering the data from our data sources
  • check_status normalizes the status of each project:
      • Projects with the status "Completed" or "Operational" are considered "Operational"
      • Projects with the status "Cancelled" are not included in our JSON files
      • All other projects are classified as "Proposed" for now
  • geocode_lat_long uses the Google Maps Geocoding API to get an approximate latitude and longitude from a city and state. It is used for the NYSERDA small solar project data, which has a city_town field but no latitude/longitude
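
As a rough illustration of the geocoding step, here is a minimal sketch that builds a Geocoding API request and reads the coordinates out of the JSON response. It is not the code in scraper_utils.py: the function signature, the helper names build_geocode_url and parse_lat_long, and the use of urllib instead of requests are all assumptions made to keep the example self-contained.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(city, state, api_key):
    """Build a Geocoding API request URL for a "city, state" address."""
    query = urlencode({"address": f"{city}, {state}", "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def parse_lat_long(payload):
    """Return (lat, lng) from a Geocoding API response, or (None, None)."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None, None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode_lat_long(city, state, api_key):
    """Look up approximate coordinates for a city (one API call per lookup)."""
    with urlopen(build_geocode_url(city, state, api_key), timeout=10) as resp:
        return parse_lat_long(json.load(resp))
```

Keeping the URL building and response parsing separate from the network call means both can be checked without spending any API quota.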

Created api/webscraper/nyserda_scraper.py file:

  • This script fetches data from the NYSERDA Large-scale Renewable Projects database and the NYSERDA Statewide Distributed Solar Projects database
  • Filters for specific fields in the data, excluding any projects that have a "Cancelled" status
  • Dumps the data into the nyserda_large.json and nyserda_small.json files
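
The fetch, filter, and dump flow described above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of nyserda_scraper.py: the three source field names in FIELD_MAP appear in this PR's diff, but the status field name project_status and the helper names filter_projects and dump_projects are hypothetical.

```python
import json

# Output key -> source field. These three source fields appear in this PR's
# diff; every other name in this sketch is an assumption.
FIELD_MAP = {
    "proposed_cod": "year_of_delivery_start_date",
    "county": "county_province",
    "region": "redc",
}


def filter_projects(items, status_field="project_status"):
    """Keep the mapped fields of every project whose status is not "Cancelled"."""
    filtered = []
    for item in items:
        status = item.get(status_field)
        if status is not None and status.strip().lower() == "cancelled":
            continue  # cancelled projects are left out of the JSON files
        project = {key: item.get(source) for key, source in FIELD_MAP.items()}
        project["project_status"] = status
        filtered.append(project)
    return filtered


def dump_projects(projects, path):
    """Write the filtered projects to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(projects, f, indent=2)
```

In the scraper itself, the input list would come from the parsed JSON response of the NYSERDA endpoint, and the output path would be nyserda_large.json or nyserda_small.json.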

Created api/webscraper/scraper.py file:

  • Makes a GET request to the NYISO URL to download an xlsx file
  • For now, we simply load the fetched bytes into a pandas DataFrame

How to review

Standard procedure

git fetch origin deenasun/10-feat-connect-nyserda-api-and-download-data-from-nyiso
git checkout deenasun/10-feat-connect-nyserda-api-and-download-data-from-nyiso

NOTE: Parsing and filtering the NYSERDA small solar projects data set makes calls to our Google Maps API! Don't run this file more often than needed. You can also check the data already inside nyserda_large.json and nyserda_small.json.
To check if the scrapers for the NYSERDA data are working, you can run this command in your terminal:

python api/webscraper/nyserda_scraper.py

This will dump data into the nyserda_large.json and nyserda_small.json files inside the api/webscraper directory! If you want to see it in action, you can delete the data in those files and rerun the command above; the JSON files should be repopulated.

If you run into any issues with dependencies, you may need to install these Python packages:

  • requests
  • python-dotenv
  • pandas

The json, urllib, and io modules are part of the Python standard library and don't need to be installed.

Run this command in your terminal:

pip install requests python-dotenv pandas

Next steps

  • Use a web scraper to find and download the most up-to-date NYISO xlsx spreadsheet

Relevant links

Online sources

Related PRs

CC: @itsliterallymonique

@deenasun deenasun linked an issue Oct 14, 2024 that may be closed by this pull request
Collaborator

@itsliterallymonique itsliterallymonique left a comment

Looks good Deena. Please address the changes I suggested.

google_maps_api_key = os.environ.get('NEXT_PUBLIC_GOOGLE_MAPS_API_KEY')

def check_status(status):
    if status is None:
Collaborator

In this case, let's return NULL for now.

Collaborator Author

Hi Monique! Just wanted to clarify: in Python, None is a special keyword used to refer to null objects. So do you want me to return the string "NULL"?

Collaborator

Return None then! But make sure that when you write it to the database, it is stored as NULL.
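
To illustrate the None-to-NULL point, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, since this thread does not say which database the project uses, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for the project's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, status TEXT)")

# Passing None as a bound parameter stores SQL NULL, not the string 'None'.
conn.execute(
    "INSERT INTO projects (name, status) VALUES (?, ?)",
    ("Example Solar", None),
)
conn.commit()

# IS NULL matches the row, and reading it back yields Python None again.
null_count = conn.execute(
    "SELECT COUNT(*) FROM projects WHERE status IS NULL"
).fetchone()[0]
stored_status = conn.execute("SELECT status FROM projects").fetchone()[0]
```

The key detail is to pass None through query parameters rather than formatting it into the SQL string, which would risk writing the text 'None' instead.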

        return 'Cancelled'
    elif status.lower() == 'operational' or status.lower() == 'completed':
        return 'Operational'
    else:
Collaborator

Let's return "Proposed" only if the status says "under development".

Collaborator Author

got it! what should we return if the status is missing? I can leave it as a null/None status and then add error handling for that in the other scraper files.

Collaborator

let's leave status as None if it is missing
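
Putting the requests from this thread together, a revised check_status might look roughly like the sketch below. It is one possible reading of the discussion, not the merged code. Two parts are assumptions: treating a blank string the same as a missing status, and returning None for statuses the thread does not mention.

```python
def check_status(status):
    """Normalize a raw project status string.

    Returns None when the status is missing, so the caller can store NULL.
    """
    # Assumption: a blank string counts as a missing status.
    if status is None or not status.strip():
        return None
    normalized = status.strip().lower()
    if normalized == "cancelled":
        return "Cancelled"
    if normalized in ("operational", "completed"):
        return "Operational"
    if normalized == "under development":
        return "Proposed"
    # Assumption: the thread does not say what to do with other statuses,
    # so this sketch treats them like a missing status.
    return None
```

Callers that build the JSON files would then skip any project for which this returns "Cancelled" and handle None explicitly.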

'longitude': long,
'data_through_date': item.get('data_through_date', None),
}
small_list.append(project_dict)
Collaborator

I am checking on the missing / ambiguous data and will have you fix anything once we get confirmation from the npo.

from io import BytesIO
import urllib

nyiso = requests.get('https://www.nyiso.com/documents/20142/1407078/NYISO-Interconnection-Queue.xlsx')
Collaborator

next steps for this is to scrape the actual URL from the NYISO page
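
One way this next step could work is to parse the page's HTML for links ending in .xlsx instead of hardcoding the file URL. The sketch below uses only the standard library. The class and function names are hypothetical, and it assumes the links are present in the static HTML; if the NYISO page builds them with JavaScript, this approach would not find them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XlsxLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at an .xlsx file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".xlsx"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_xlsx_links(html, base_url):
    """Return absolute URLs of all .xlsx links found in an HTML document."""
    parser = XlsxLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

The scraper would pass in the HTML of the relevant NYISO page and then choose among the returned links, for example by matching "Interconnection-Queue" in the file name.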

response = requests.get('https://data.ny.gov/resource/dprp-55ye.json')
data = response.json()
filtered_list = []
for item in large_data:
Collaborator

data is created from the response.json but isn't used in this method. Do we want to go through the items in large_data or data from line 81?

'proposed_cod': item.get('year_of_delivery_start_date', None),
'county': item.get('county_province', None),
'region': item.get('redc', None),
'zipcode:': item.get('zip_code', None),
Collaborator

nit: should be 'zipcode':

Collaborator Author

@nehaahussain ah yes! it's a little funky but that's the way they titled the "zipcode" field for the NYSERDA large-scale data

Collaborator

@nehaahussain nehaahussain left a comment

Good job Deena! I added some comments:)

@deenasun deenasun merged commit 8d52e3b into main Oct 18, 2024
2 checks passed
@deenasun deenasun deleted the 10-feat-connect-nyserda-api-and-download-data-from-nyiso branch November 8, 2024 04:41
3 participants