Custom Projects List Feature #14

Open
wants to merge 57 commits into
base: master
Changes from 54 commits
d26d6b1
Add fork:false to Github queries
mrthankyou Feb 11, 2021
2e58640
Initial work setting up custom LGTM project list curation
mrthankyou Feb 17, 2021
d4098d3
Clean up code and get basic cache parsing file setup
mrthankyou Feb 17, 2021
e69003a
Continued work on custom project lists feature
mrthankyou Feb 17, 2021
02f16a3
Fix misc issues
mrthankyou Feb 17, 2021
d3898cb
Add comment and ignore cache files
mrthankyou Feb 17, 2021
8e839a8
Refactor code
mrthankyou Feb 17, 2021
5c996fc
Reword text
mrthankyou Feb 17, 2021
7bd1bf0
Revert stars to accurate count
mrthankyou Feb 17, 2021
3f8d336
Remove comment
mrthankyou Feb 17, 2021
ca2bbc6
Update README.md
mrthankyou Feb 17, 2021
69543fb
Add custom project list feature to search term script
mrthankyou Feb 17, 2021
a687d35
Save only real projects to LGTM project lists
mrthankyou Feb 17, 2021
a4133cc
Remove unnecessary modules
mrthankyou Feb 17, 2021
b5ecc8a
Create cache folder if it already doesn't exist
mrthankyou Feb 18, 2021
c770a2f
Add draft for build in progress guard clause
mrthankyou Feb 18, 2021
b83bacf
Accept both proto and real projects
mrthankyou Feb 18, 2021
1b2982a
Add ProjectBuild and ProjectBuilds classes
mrthankyou Feb 19, 2021
0bdd4cc
Remove logs and add new request for proto projects
mrthankyou Feb 19, 2021
88e3793
Save more project data to cache files
mrthankyou Feb 19, 2021
f515563
Refactor how we move repos to LGTM lists
mrthankyou Feb 19, 2021
a853b98
Update README with LGTM build process info
mrthankyou Feb 19, 2021
01842f7
Add Python documentation for functions
mrthankyou Feb 21, 2021
2313cc3
Add comment
mrthankyou Feb 22, 2021
32d4fd9
Remove unnecessary comments
mrthankyou Feb 22, 2021
6c825f6
Add guard clauses and improved project filtering
mrthankyou Feb 22, 2021
3922973
Increase timer
mrthankyou Feb 22, 2021
a3cf8e2
Uncomment code
mrthankyou Feb 22, 2021
50fc91e
Remove unnecessary comment
mrthankyou Feb 22, 2021
2cd04b5
Add HTTP retries
mrthankyou Feb 22, 2021
c8e33ae
Remove unnecessary prints
mrthankyou Feb 22, 2021
58b4d1e
Fix various issues with moving repos to lists
mrthankyou Feb 23, 2021
08f1b7c
Add HTTP retries when retrieving a project
mrthankyou Feb 24, 2021
429c9ba
Add check for protoprojects
mrthankyou Feb 24, 2021
ba0e6f4
Handle exceptions from LGTM
mrthankyou Mar 3, 2021
85b368e
Delete test.py
mrthankyou Mar 3, 2021
cbe5fa5
Clarify API call to LGTM
mrthankyou Mar 3, 2021
1e40f13
Refactor how we build SimpleProjects
mrthankyou Mar 3, 2021
aa14305
Rename method
mrthankyou Mar 3, 2021
bedc587
Remove useless code
mrthankyou Mar 3, 2021
e362ef8
Rename ProjectBuild#name and refactor code
mrthankyou Mar 3, 2021
fda2a9f
Add SimpleProject#project_type method
mrthankyou Mar 3, 2021
b51fced
Continue refactoring how we determine LGTM project types
mrthankyou Mar 3, 2021
0b182b9
Rename ProjectBuild#id to #key
mrthankyou Mar 3, 2021
c6db487
Update comment on refactoring
mrthankyou Mar 3, 2021
9eb9c4a
Refactor SimpleProject to store the project type
mrthankyou Mar 3, 2021
0ba1576
Simplify logic in determining project state
mrthankyou Mar 3, 2021
476faeb
Add comments
mrthankyou Mar 3, 2021
2c0d44b
Refactor logic with guard clauses
mrthankyou Mar 3, 2021
574d0f6
Add unfollow_all_followed_projects.py script
mrthankyou Mar 3, 2021
803ebd7
Convert ProjectBuild to a subclass of SimpleProject
mrthankyou Mar 3, 2021
69b6614
Refactor simple project build to not raise error
mrthankyou Mar 3, 2021
89a25d7
Add checks confirming LGTM project is valid
mrthankyou Mar 3, 2021
f44b959
Fix misc errors
mrthankyou Mar 3, 2021
9ba28cf
Reword comment
mrthankyou Mar 4, 2021
2e1d595
Remove unnecessary code
mrthankyou Mar 4, 2021
5ea2b9a
Remove comments
mrthankyou Mar 4, 2021
2 changes: 2 additions & 0 deletions .gitignore
@@ -376,4 +376,6 @@ $RECYCLE.BIN/
# Windows shortcuts
*.lnk

cache/*

# End of https://www.toptal.com/developers/gitignore/api/java,pycharm+all,intellij+all,python,macos,windows,linux
44 changes: 42 additions & 2 deletions README.md
@@ -63,12 +63,52 @@ python3 move_org_projects_under_project_list_then_unfollow.py <LGTM_PROJECT_LIST
python3 follow_repos_by_search_term_via_code_instances.py <LANGUAGE> <SEARCH_TERM>

# Finds repositories given a search term. Under the hood, the script searches Github for repositories that match the provided search term.
python3 follow_repos_by_search_term.py <LANGUAGE> <SEARCH_TERM>
python3 follow_repos_by_search_term.py <LANGUAGE> <SEARCH_TERM> <CUSTOM_LIST_NAME>(optional)

# Finds top repositories that have a minimum 500 stars and use the provided programming language.
python3 follow_top_repos_by_star_count.py <LANGUAGE>
python3 follow_top_repos_by_star_count.py <LANGUAGE> <CUSTOM_LIST_NAME>(optional)

# Unfollows all projects you're currently following that are not in a custom list.
python3 unfollow_all_followed_projects.py
```

## The Custom Projects Lists Feature
While developing this collection of scripts, we realized that when a user follows thousands of repos in their LGTM account, the account may break: the query console becomes unusable and some API
calls fail.

To resolve this, we created an opt-in feature. The "Custom Projects Lists" feature does the following:

- Follows all repos (aka projects) in your LGTM account.
- Stores every project you follow in a txt file.
- At a later date (we suggest waiting 24 hours), the user may run a follow-up command that takes the followed repos, adds them to an LGTM custom list, and finally unfollows the projects in the user's LGTM account.

Although these steps are tedious, this is the best workaround we've found: placing projects in custom lists avoids bricking the LGTM account. We typically wait 24 hours because LGTM must first process any project that is new to it, and projects that are still being processed can't be added to custom lists.

Finally, we hope that custom lists make it easier for security researchers to pick which repos they want to test.
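
A minimal sketch of the cache step described above, assuming one `display_name,key,project_type` record per line (the format the follow scripts in this PR build). The helper names `write_cache` and `read_cache`, the `.txt` file naming, and the sample values are hypothetical; the PR's own `utils.cacher` module is not shown here and may work differently.

```python
from pathlib import Path
from typing import List, Tuple


def write_cache(records: List[str], list_name: str, cache_dir: Path) -> Path:
    """Write one 'display_name,key,project_type' record per line."""
    # Create the cache folder if it does not exist yet.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{list_name}.txt"
    cache_file.write_text("\n".join(records) + "\n", encoding="utf-8")
    return cache_file


def read_cache(cache_file: Path) -> List[Tuple[str, str, str]]:
    """Parse a cache file back into (display_name, key, project_type) tuples."""
    rows: List[Tuple[str, str, str]] = []
    for line in cache_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        display_name, key, project_type = line.split(",")
        rows.append((display_name, key, project_type))
    return rows
```

The follow-up command would then read the file back, add each project to the custom list, and unfollow it.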

### How To Run The Custom Projects Lists Feature
Some of the commands above accept a <CUSTOM_LIST_NAME> argument, which is always optional.
CUSTOM_LIST_NAME is the name of the LGTM project list that will be created. Any projects the command finds are then added to that LGTM custom list. The example below shows how this works:

1. Run a command, passing in the name of the custom list. The command below will follow JavaScript repos and generate a cache file of every repo you follow for the project list called "big_ole_js_projects".

`python3 follow_top_repos_by_star_count.py javascript big_ole_js_projects`

2. Wait 1 - 24 hours.

3. Run the command below. This will take the cache file you created earlier, create an LGTM custom project list, add the projects to that project list, and finally unfollow the repositories in your LGTM account.

`python3 move_repos_to_lgtm_lists.py`

Note: When naming a custom project list, use only alphanumeric characters, dashes, and underscores.
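
The naming rule in the note above can be checked before any requests are sent. This is a minimal sketch; `is_valid_list_name` is a hypothetical helper and is not part of the scripts in this PR.

```python
import re

# Letters, digits, dashes, and underscores only.
_VALID_LIST_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_list_name(name: str) -> bool:
    """Return True when the name uses only the allowed characters."""
    return _VALID_LIST_NAME.fullmatch(name) is not None
```

For example, `is_valid_list_name("big_ole_js_projects")` is accepted, while a name containing spaces or slashes is rejected.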

### Build Processes By LGTM
LGTM can't move projects that are being processed into custom lists. To resolve this, we've added a check that confirms whether or not all projects you plan on moving to a custom list are processed. If a project isn't processed, we will not move any projects into the custom list and you'll receive the following error:

> The <CACHED_FILE_NAME> can't be processed at this time because a project build is still in progress.

If you receive this error, wait a few hours and run the script again.
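
The all-or-nothing check described above can be sketched as a guard clause. `CachedProject`, `BuildInProgressError`, and `ensure_all_builds_finished` are hypothetical names used for illustration only; the PR's actual classes (such as `ProjectBuild`) are not shown here and may expose build state differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedProject:
    # Hypothetical stand-in for a project loaded from a cache file.
    display_name: str
    build_in_progress: bool


class BuildInProgressError(Exception):
    """Raised when at least one cached project is still being processed."""


def ensure_all_builds_finished(cache_file_name: str, projects: List[CachedProject]) -> None:
    """Refuse to move anything if any project build is still in progress."""
    if any(project.build_in_progress for project in projects):
        raise BuildInProgressError(
            f"The {cache_file_name} can't be processed at this time "
            "because a project build is still in progress."
        )
```

Calling the guard before touching the custom list means a partially processed cache file never results in a half-populated list.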

## Legal

The author of this script assumes no liability for your use of this project, including,
2 changes: 1 addition & 1 deletion auto_sort_projects.py
@@ -71,7 +71,7 @@
project_list_name = gh_org_to_project_list_name[org]
project_list_id = site.get_or_create_project_list(project_list_name)
for project in org_to_projects[org]:
if project.is_protoproject:
if project.is_protoproject():
print('Unable to add project to project list since it is a protoproject. %s' % project)
continue
site.load_into_project_list(project_list_id, [project.key])
33 changes: 26 additions & 7 deletions follow_repos_by_search_term.py
@@ -1,29 +1,35 @@
from typing import List
from lgtm import LGTMSite
from lgtm import LGTMSite, LGTMDataFilters

import utils.cacher
import utils.github_dates
import utils.github_api

import sys
import time

def save_project_to_lgtm(site: 'LGTMSite', repo_name: str):
def save_project_to_lgtm(site: 'LGTMSite', repo_name: str) -> dict:
print("About to save: " + repo_name)
# Another throttle. Considering we are sending a request to Github
# owned properties twice in a small time-frame, I would prefer for
# this to be here.
time.sleep(1)

repo_url: str = 'https://github.com/' + repo_name
site.follow_repository(repo_url)
project = site.follow_repository(repo_url)
print("Saved the project: " + repo_name)
return project

def find_and_save_projects_to_lgtm(language: str, search_term: str):
def find_and_save_projects_to_lgtm(language: str, search_term: str) -> List[str]:
github = utils.github_api.create()
site = LGTMSite.create_from_file()
saved_project_data: List[str] = []

for date_range in utils.github_dates.generate_dates():
repos = github.search_repositories(query=f'language:{language} created:{date_range} {search_term}')
repos = github.search_repositories(query=f'stars:>5 language:{language} fork:false created:{date_range} {search_term}')

# TODO: This occasionally returns requests.exceptions.ConnectionError which is annoying as hell.
# It would be nice if we built in exception handling.
for repo in repos:
# Github has rate limiting in place hence why we add a sleep here. More info can be found here:
# https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting
@@ -32,7 +38,15 @@ def find_and_save_projects_to_lgtm(language: str, search_term: str):
if repo.archived or repo.fork:
continue

save_project_to_lgtm(site, repo.full_name)
saved_project = save_project_to_lgtm(site, repo.full_name)

simple_project = LGTMDataFilters.build_simple_project(saved_project)

if simple_project.is_valid_project:
saved_data = f'{simple_project.display_name},{simple_project.key},{simple_project.project_type}'
saved_project_data.append(saved_data)

return saved_project_data

if len(sys.argv) < 3:
print("Please make sure you provided a language and search term")
@@ -42,4 +56,9 @@ def find_and_save_projects_to_lgtm(language: str, search_term: str):
search_term = sys.argv[2]

print(f'Following repos for the {language} language that contain the \'{search_term}\' search term.')
find_and_save_projects_to_lgtm(language, search_term)
saved_project_data = find_and_save_projects_to_lgtm(language, search_term)

# If the user provided a third arg then they want to create a custom list.
# sys.argv holds the script name plus three args in that case, so its length is at least 4.
if len(sys.argv) >= 4:
    custom_list_name = sys.argv[3]
    utils.cacher.write_project_data_to_file(saved_project_data, custom_list_name)
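
Because <CUSTOM_LIST_NAME> is optional, `sys.argv` must be long enough before the script indexes into it. A small helper makes that check explicit; `optional_arg` is a hypothetical name and not part of this PR.

```python
from typing import List, Optional


def optional_arg(argv: List[str], index: int) -> Optional[str]:
    """Return argv[index] if the user supplied it, otherwise None."""
    return argv[index] if len(argv) > index else None
```

With this helper, the script could call `optional_arg(sys.argv, 3)` and write the cache file only when the result is not `None`.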
30 changes: 23 additions & 7 deletions follow_top_repos_by_star_count.py
@@ -1,28 +1,32 @@
from typing import List
from lgtm import LGTMSite
from lgtm import LGTMSite, LGTMDataFilters

import utils.github_dates
import utils.github_api
import utils.cacher
import sys
import time

def save_project_to_lgtm(site: 'LGTMSite', repo_name: str):
def save_project_to_lgtm(site: 'LGTMSite', repo_name: str) -> dict:
print("Adding: " + repo_name)
# Another throttle. Considering we are sending a request to Github
# owned properties twice in a small time-frame, I would prefer for
# this to be here.
time.sleep(1)

repo_url: str = 'https://github.com/' + repo_name
site.follow_repository(repo_url)
project = site.follow_repository(repo_url)

print("Saved the project: " + repo_name)
return project

def find_and_save_projects_to_lgtm(language: str):
def find_and_save_projects_to_lgtm(language: str) -> List[str]:
github = utils.github_api.create()
site = LGTMSite.create_from_file()
saved_project_data: List[str] = []

for date_range in utils.github_dates.generate_dates():
repos = github.search_repositories(query=f'stars:>500 created:{date_range} sort:stars language:{language}')
repos = github.search_repositories(query=f'stars:>500 created:{date_range} fork:false sort:stars language:{language}')

for repo in repos:
# Github has rate limiting in place hence why we add a sleep here. More info can be found here:
@@ -32,7 +36,14 @@ def find_and_save_projects_to_lgtm(language: str):
if repo.archived or repo.fork:
continue

save_project_to_lgtm(site, repo.full_name)
saved_project = save_project_to_lgtm(site, repo.full_name)
simple_project = LGTMDataFilters.build_simple_project(saved_project)

if simple_project.is_valid_project:
saved_data = f'{simple_project.display_name},{simple_project.key},{simple_project.project_type}'
saved_project_data.append(saved_data)

return saved_project_data

if len(sys.argv) < 2:
print("Please provide a language you want to search")
@@ -41,4 +52,9 @@ def find_and_save_projects_to_lgtm(language: str):
language = sys.argv[1].capitalize()

print('Following the top repos for %s' % language)
find_and_save_projects_to_lgtm(language)
saved_project_data = find_and_save_projects_to_lgtm(language)

# If the user provided a second arg then they want to create a custom list.
# sys.argv holds the script name plus two args in that case, so its length is at least 3.
if len(sys.argv) >= 3:
    custom_list_name = sys.argv[2]
    utils.cacher.write_project_data_to_file(saved_project_data, custom_list_name)