-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avatars: Fetch avatars for websites with taken usernames #42
Conversation
username_api.py
Outdated
import sys | ||
import re | ||
import yaml | ||
|
||
app = Flask(__name__) | ||
cors = CORS(app) | ||
cors = CORS(app) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change
username_api.py
Outdated
app.run(host='0.0.0.0', port=8521) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change, #41 will take care of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, none of these are conflicting/breaking changes, I guess it's fine for them to be here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're gonna keep them here, you should do them in a new commit so there's not a bunch of random changes in a commit that adds a new feature 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol, it's just a newline at EOF, nonetheless, I've undone those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, sorry for being nitpicky, it's just generally distracting from the purpose of the PR, and makes the git blame
look weird.
@@ -1,5 +1,8 @@ | |||
--- | |||
facebook: | |||
avatar_usernames: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need different usernames for facebook?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted in the PR description, profiles on Facebook can be “hidden”, therefore having no avatar to display. These are public profile links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
websites.yml
Outdated
type: meta | ||
property: og:image | ||
url: null | ||
behance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use behance: false
, and order the keys alphabetically, so behance is at the top of the avatar
group.
websites.yml
Outdated
pinterest: | ||
type: regex | ||
property: image_xlarge_url | ||
url: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not have empty keys in yaml unless strictly necessary.
websites.yml
Outdated
@@ -46,3 +46,40 @@ username_patterns: | |||
invalid_patterns: | |||
- "\\.(com|net)" | |||
- "(\\.)\\1{1,}" # consecutive '.' not allowed | |||
avatar: | |||
pinterest: | |||
type: regex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type
isnt needed. it can be inferred. we need property:
, url:
, otherwise the entries can be short and clear gitlab: opengraph
, as that is the most common.
tests/test_get_avatar.py
Outdated
|
||
class TestGet_avatar(object): | ||
|
||
@pytest.mark.parametrize('user', data['github']['taken_usernames']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have better test parameterisation in this PR? (specifically for the newly added tests) There is lot of duplicated logic already in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will look into parameterisation. I would assume it's okay to use the same methods/parameterisation in the username_check tests. Is that fine?(Of course, I'll do those in a separate PR later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If more parameterisation is not easy, it can be deferred to GCI task #36
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to shift us from pytest to nosetests as part of #36 I think. It has a pretty powerful library parameterized, which although supports py.test, isn't as well supported(not as pretty and I see quite a few bug reports) (see FAQ for more details).
This is probably a large change(to builds etc.) , so I'd want it to be in it's own PR, but let me know whatever you decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nosetests is dead, and nose2 is stillborn.
It works well in pytest. The caveat about the generated method names is of little consequence as usually test method names are hidden, and name_func
can be used to create new names that are meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we okay dealing parameterized/#34 then?
Or can we just shift to unittest(since we probably won't need pytest with that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not our problem. pytest devs are not going to delete remove support for that until their main plugins are fixed. That is just an early warning system, and the alarm has been heard correctly.
username_api.py
Outdated
soup = BeautifulSoup(response.text, 'html.parser') | ||
if data['type'] == 'meta': | ||
# Look in metadata for `property` attributed image. | ||
result = [item.attrs['content'] for item in soup('meta') if item.has_attr('property') and item.attrs['property'].lower()==data['property']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split this line
username_api.py
Outdated
result = result[0] | ||
elif data['type'] == 'regex': | ||
# Searches for "`property`": "`link`" | ||
regex = re.compile('[\'\"]' + re.escape(data['property']) + '[\'\"]:(\s)?[\'\"](?P<link>[^\s]+)[\'\"]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split
username_api.py
Outdated
@@ -12,11 +14,56 @@ | |||
|
|||
patterns = yaml.load(open('websites.yml')) | |||
|
|||
def check_username(website, username): | |||
url = patterns['urls'].get(website, 'https://{w}.com/{u}').format( | |||
def get_url(website, username): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_profile_url
?
websites.yml
Outdated
pinterest: | ||
property: image_xlarge_url | ||
gitlab: | ||
property: og:image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isnt a property. This is a standard. It is the standard. If you dont like gitlab: true
or gitlab: opengraph
to reflect using OpenGraph, then please use
gitlab:
opengraph: true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayvdb : Not sure if I understand what you mean correctly but this is for future compatibility, if there is a website added in future which has twitter:image
( see this ) but not og:image
, all that needs to be done is just write twitter:image
here(instead of og:image
) for it to work.
This also allows for supporting arbitrary meta tags for the image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you mean this should be renamed to meta-name
or something?
Or that this is an unnecessary feature? Let me know if it's the second case, I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I want is gitlab: true
. OpenGraph is the default. No extra information is needed for this case. Any other alternative is an override. And that can be designed later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I'll shift this to
gitlab:
opengraph: true
so we can still have url overrides easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gitlab: true
just coexists fine with
facebook:
url: ...
This is YAML.
You are using YAML to store data in a JSON-like structure. That is not YAML.
websites.yml
Outdated
gitlab: opengraph | ||
github: opengraph | ||
tumblr: opengraph | ||
behance: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
behance: false
. That is more user-friendly . It has a clearer meaning.
username_api.py
Outdated
if not result: | ||
return None | ||
result = result.group('link') | ||
elif 'image' in response.headers.get('content-type'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
response.headers.get('content-type')
can return None.
...startswith('image/')
username_api.py
Outdated
if response.status_code == 404: | ||
return None | ||
|
||
soup = BeautifulSoup(response.text, 'html.parser') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be inside the relevant branch of the if
; it is an expensive object to create
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only a few things; you can submit when they are done
tests/test_get_avatar.py
Outdated
@parameterized.expand(load_test_cases('with'), | ||
testcase_func_name=custom_name_func) | ||
def test_with_avatar(self, website, user): | ||
if website == 'behance': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this belongs in load_test_cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we shouldn't even load those test cases? I'd rather keep a "skipped" notification no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can generate a skipped test in load_test_cases
tests/test_get_avatar.py
Outdated
pytest.skip("behance doesn't work") | ||
link = username_api.check_username(website, user)['avatar'] | ||
response = r.get(link) | ||
assert('image' in response.headers.get('content-type') or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startswith(
image/)
username_api.py
Outdated
w=website, | ||
u=username | ||
) | ||
u=username) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not a very good diff comparison, I have a fresh function in those lines, for some reason because of the repetition from the old one, git decides it's a modification of that. I'll change it nonetheless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a very good comparison. :P
Splitting a function like this is very common.
not changing the lines in the middle helps code review and understanding what is changing and what isnt.
username_api.py
Outdated
if not result: | ||
return None | ||
result = result.group('link') | ||
elif (response.headers.get('content-type') and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use response.headers.get('content-type', '').startswith('image/')
to make it more readable and only perform slightly worse in the very unlikely uncommon scenario that one of these sites sends a response without a content-type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, that would still perform better, because every .
is a perf hit using getattribute
so your and
here is doubling the number of times response.headers.get('content-type')
occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, optimally, by locality of reference, the perf hit would be negligible, but we're dealing with python so performance^TM 🙃
websites.yml
Outdated
@@ -46,3 +46,15 @@ username_patterns: | |||
invalid_patterns: | |||
- "\\.(com|net)" | |||
- "(\\.)\\1{1,}" # consecutive '.' not allowed | |||
avatar: | |||
pinterest: | |||
property: image_xlarge_url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is property
a good term for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about key
if you don't like it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think key
would be better since we're picking from json
data. property
seems vague here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
result = [item.attrs['content'] for item in soup('meta') | ||
if item.has_attr('property') and | ||
item.attrs['property'].lower() == 'og:image'] | ||
if not result or not result[0]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is check for both result
and result[0]
needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, some website returned an empty string for some reason. This is for that. (I can't remember which one, I think it was fb)
if not result: | ||
return None | ||
result = result.group('link') | ||
elif response.headers.get('content-type', '').startswith('image/'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one applies for twitter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. It redirects to homepage if the user doesn't exist or something else.
tests/test_get_avatar.py
Outdated
|
||
class TestGet_avatar(object): | ||
|
||
@parameterized.expand(load_test_cases('with'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do with_avatar
and without_avatar
here? Other than that this PR is good to go.
websites.yml
Outdated
@@ -46,3 +46,15 @@ username_patterns: | |||
invalid_patterns: | |||
- "\\.(com|net)" | |||
- "(\\.)\\1{1,}" # consecutive '.' not allowed | |||
avatar: | |||
pinterest: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alpha sort the websites?
(it isnt done elsewhere, but that is a cleanup for later.)
Makes function get_avatar() and adds avatar related info to websites.yml Closes #35
Adds tests for get_avatar. Checks if given links are images or not.
good job @nalinbhardwaj 😄 |
Adds
get_avatar
to fetch avatars from all websites. Currently works for everything except Behance. Only works for Facebook ifprofile == visible
(as expected). The function expects the user's page to exist, but handles most errors gracefully even if it doesn't(['pinterest', 'gitlab']
are offenders of this).Adds appropriate tests, maybe they should be mixed with test_username_check, but I think the repo could use some more breaking down into parts, perhaps this could be a start.
Closes #35