Avatars: Fetch avatars for websites with taken usernames #42

nalinbhardwaj · 2017-12-20T19:41:57Z

Adds get_avatar to fetch avatars from all websites. It currently works for every site except Behance, and for Facebook only when the profile is publicly visible (as expected). The function expects the user's page to exist, but handles most errors gracefully even when it doesn't (['pinterest', 'gitlab'] are the offenders here).

Adds appropriate tests. Maybe they should be merged into test_username_check, but I think the repo could use some more breaking down into parts, and this could be a start.

Closes #35

andrewda · 2017-12-20T20:45:41Z

username_api.py

 import sys
 import re
 import yaml

 app = Flask(__name__)
-cors = CORS(app) 
+cors = CORS(app)


Unrelated change

andrewda · 2017-12-20T20:46:48Z

username_api.py

Unrelated change, #41 will take care of this.

Well, none of these are conflicting/breaking changes, I guess it's fine for them to be here as well.

If you're gonna keep them here, you should do them in a new commit so there's not a bunch of random changes in a commit that adds a new feature 👍

Lol, it's just a newline at EOF. Nonetheless, I've undone those changes.

Yea, sorry for being nitpicky, it's just generally distracting from the purpose of the PR, and makes the git blame look weird.

jayvdb · 2017-12-21T02:21:04Z

tests/test_data.yml

@@ -1,5 +1,8 @@
 ---
 facebook:
+  avatar_usernames:


Why do we need different usernames for Facebook?

As noted in the PR description, profiles on Facebook can be “hidden”, therefore having no avatar to display. These are public profile links.

jayvdb · 2017-12-21T04:08:21Z

websites.yml

+    type: meta
+    property: og:image
+    url: null
+  behance:


Use behance: false, and order the keys alphabetically, so behance is at the top of the avatar group.

jayvdb · 2017-12-21T04:09:03Z

websites.yml

+  pinterest:
+    type: regex
+    property: image_xlarge_url
+    url: null


I would prefer to not have empty keys in yaml unless strictly necessary.

jayvdb · 2017-12-21T04:14:09Z

websites.yml

@@ -46,3 +46,40 @@ username_patterns:
    invalid_patterns:
      - "\\.(com|net)"
      - "(\\.)\\1{1,}"  # consecutive '.' not allowed
+avatar:
+  pinterest:
+    type: regex


type isn't needed; it can be inferred. We need property: and url: only where they apply. Otherwise the entries can be short and clear, e.g. gitlab: opengraph, as that is the most common case.

manu-chroma · 2017-12-21T05:51:09Z

tests/test_get_avatar.py

+
+class TestGet_avatar(object):
+
+	@pytest.mark.parametrize('user', data['github']['taken_usernames'])


Can we have better test parameterisation in this PR (specifically for the newly added tests)? There is already a lot of duplicated logic in the tests.

Okay, I will look into parameterisation. I would assume it's okay to use the same methods/parameterisation in the username_check tests. Is that fine? (Of course, I'll do those in a separate PR later.)

If more parameterisation is not easy, it can be deferred to GCI task #36

I think I'm going to shift us from pytest to nosetests as part of #36. It has a pretty powerful library, parameterized, which does support py.test, but not as well (it's not as pretty, and I see quite a few bug reports; see the FAQ for more details).

This is probably a large change (to builds etc.), so I'd want it to be in its own PR, but let me know whatever you decide.

nosetests is dead, and nose2 is stillborn.
It works well in pytest. The caveat about the generated method names is of little consequence as usually test method names are hidden, and name_func can be used to create new names that are meaningful.

Are we okay dealing with parameterized/#34 then?
Or can we just shift to unittest (since we probably won't need pytest with that)?

Not our problem. pytest devs are not going to remove support for that until their main plugins are fixed. That is just an early warning system, and the alarm has been heard correctly.
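A minimal sketch of the naming-function idea from this thread, assuming the (testcase_func, param_num, param) call signature that parameterized documents for its naming hook (passed as testcase_func_name in this PR's test diff). The body, the sanitising rule, and the stand-in objects are assumptions for illustration, not the code that was merged.

```python
import re
from types import SimpleNamespace


def custom_name_func(testcase_func, param_num, param):
    """Build a readable test name from the parameter values.

    parameterized calls this hook with the decorated function, the index
    of the parameter set, and an object whose .args holds the values.
    """
    suffix = "_".join(str(arg) for arg in param.args)
    # Keep only characters that are valid in a Python identifier.
    safe_suffix = re.sub(r"[^a-zA-Z0-9_]+", "_", suffix)
    return "{}_{}".format(testcase_func.__name__, safe_suffix)


# Stand-ins so the sketch runs without parameterized installed.
def test_with_avatar():
    pass


example = SimpleNamespace(args=("github", "some-user"))
print(custom_name_func(test_with_avatar, 0, example))
```

With names built this way, a failing case identifies the website and username directly instead of a numeric index.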

manu-chroma · 2017-12-21T05:54:09Z

username_api.py

+		soup = BeautifulSoup(response.text, 'html.parser')
+		if data['type'] == 'meta':
+			# Look in metadata for `property` attributed image.
+			result = [item.attrs['content'] for item in soup('meta') if item.has_attr('property') and item.attrs['property'].lower()==data['property']]


split this line

manu-chroma · 2017-12-21T05:54:17Z

username_api.py

+			result = result[0]
+		elif data['type'] == 'regex':
+			# Searches for "`property`": "`link`"
+			regex = re.compile('[\'\"]' + re.escape(data['property']) + '[\'\"]:(\s)?[\'\"](?P<link>[^\s]+)[\'\"]')
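The pattern in the diff above is assembled from non-raw string literals, so \s only works because Python passes unknown escapes through unchanged. A hedged sketch of the same "property": "link" search written with raw strings follows; find_link and the sample text are hypothetical. Unlike the diff, this version also excludes quote characters from the link, so the match cannot run past the closing quote when the surrounding JSON contains no whitespace.

```python
import re

QUOTE = "['\"]"  # matches either quote character


def find_link(prop, text):
    """Return the value found in "prop": "value" within text, or None."""
    pattern = re.compile(
        QUOTE + re.escape(prop) + QUOTE + r":\s?" + QUOTE
        + r"(?P<link>[^\s'\"]+)" + QUOTE
    )
    match = pattern.search(text)
    return match.group('link') if match else None


sample = '{"id":"1","image_xlarge_url":"https://example.com/a.jpg","x":"y"}'
print(find_link('image_xlarge_url', sample))
print(find_link('missing_key', sample))
```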


jayvdb · 2017-12-21T13:46:47Z

username_api.py

@@ -12,11 +14,56 @@

 patterns = yaml.load(open('websites.yml'))

-def check_username(website, username):
-	url = patterns['urls'].get(website, 'https://{w}.com/{u}').format(
+def get_url(website, username):


get_profile_url ?

jayvdb · 2017-12-21T13:50:08Z

websites.yml

+  pinterest:
+    property: image_xlarge_url
+  gitlab:
+    property: og:image


This isn't a property. This is a standard. It is the standard. If you don't like gitlab: true or gitlab: opengraph to reflect using OpenGraph, then please use

gitlab:
  opengraph: true

@jayvdb: I'm not sure I understand you correctly, but this is for future compatibility. If a website is added in future that has twitter:image (see this) but not og:image, all that needs to be done is to write twitter:image here (instead of og:image) for it to work.
This also allows for supporting arbitrary meta tags for the image.

Maybe you mean this should be renamed to meta-name or something?
Or that this is an unnecessary feature? Let me know if it's the second case, I'll remove it.

What I want is gitlab: true. OpenGraph is the default. No extra information is needed for this case. Any other alternative is an override. And that can be designed later.

Ok, so I'll shift this to

gitlab:
  opengraph: true

so we can still have url overrides easily.

gitlab: true just coexists fine with

facebook:
  url: ...

This is YAML.
You are using YAML to store data in a JSON-like structure. That is not YAML.
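To make the short-form proposal concrete, here is a sketch of how mixed entries could be normalised after loading. It works on a plain dict standing in for what yaml.load would return; normalize_avatar_entry, the rules in its docstring, and the example URL are all assumptions for illustration, not the schema that was merged.

```python
def normalize_avatar_entry(entry):
    """Expand one avatar entry from the config into a full dict.

    Hypothetical rules for this sketch:
      false or null       -> avatars unsupported, returns None
      true or 'opengraph' -> OpenGraph defaults
      mapping             -> overrides; OpenGraph unless a key is given
    """
    if entry is None or entry is False:
        return None
    if entry is True or entry == 'opengraph':
        return {'opengraph': True}
    if isinstance(entry, dict):
        normalized = dict(entry)
        # A 'key' override means the image comes from embedded JSON,
        # so OpenGraph is only the default when no key is present.
        normalized.setdefault('opengraph', 'key' not in entry)
        return normalized
    raise ValueError('unsupported avatar entry: {!r}'.format(entry))


# Assumed values, shaped like the avatar section discussed above.
avatar_config = {
    'behance': False,
    'facebook': {'url': 'https://example.com/{u}/picture'},
    'gitlab': True,
    'pinterest': {'key': 'image_xlarge_url'},
}

for site in sorted(avatar_config):
    print(site, normalize_avatar_entry(avatar_config[site]))
```

Scalar and mapping entries can then sit side by side in the same section, which is the point being made about gitlab: true coexisting with a url override.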

jayvdb · 2017-12-21T15:49:18Z

websites.yml

+  gitlab: opengraph
+  github: opengraph
+  tumblr: opengraph
+  behance: null


behance: false. That is more user-friendly. It has a clearer meaning.

jayvdb · 2017-12-21T15:51:09Z

username_api.py

+		if not result:
+			return None
+		result = result.group('link')
+	elif 'image' in response.headers.get('content-type'):


response.headers.get('content-type') can return None.

...startswith('image/')

jayvdb · 2017-12-21T15:52:22Z

username_api.py

+	if response.status_code == 404:
+		return None
+
+	soup = BeautifulSoup(response.text, 'html.parser')


This should be inside the relevant branch of the if; it is an expensive object to create.

jayvdb

only a few things; you can submit when they are done

jayvdb · 2017-12-21T16:53:48Z

tests/test_get_avatar.py

+	@parameterized.expand(load_test_cases('with'),
+						  testcase_func_name=custom_name_func)
+	def test_with_avatar(self, website, user):
+		if website == 'behance':


this belongs in load_test_cases

So we shouldn't even load those test cases? I'd rather keep a "skipped" notification, no?

you can generate a skipped test in load_test_cases
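One possible reading of this suggestion, sketched with the standard library only: the loader attaches a skip reason to each case, and the test body raises a skip instead of hard-coding the website name. The data, the three-tuple shape, and run_case are hypothetical, and the loader's signature differs from the load_test_cases('with') call in the diff.

```python
import unittest

# Hypothetical stand-ins for tests/test_data.yml and the avatar config.
TEST_DATA = {
    'behance': {'taken_usernames': ['user-a']},
    'github': {'taken_usernames': ['user-b', 'user-c']},
}
AVATAR_CONFIG = {'behance': False, 'github': True}


def load_test_cases():
    """Return (website, user, skip_reason) tuples for every taken username.

    skip_reason is None for supported sites, so the decision about what
    to skip lives with the data instead of inside each test body.
    """
    cases = []
    for website, entry in sorted(TEST_DATA.items()):
        reason = None
        if not AVATAR_CONFIG.get(website):
            reason = "{} avatars are not supported".format(website)
        for user in entry['taken_usernames']:
            cases.append((website, user, reason))
    return cases


def run_case(website, user, skip_reason):
    """Test body: report a skip rather than silently dropping the case."""
    if skip_reason:
        raise unittest.SkipTest(skip_reason)
    return website, user


for case in load_test_cases():
    try:
        print('ran', run_case(*case))
    except unittest.SkipTest as exc:
        print('skipped', case[0], '-', exc)
```

This keeps the "skipped" notification in the test report while removing the site-specific branch from the test method.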

jayvdb · 2017-12-21T16:54:24Z

tests/test_get_avatar.py

+			pytest.skip("behance doesn't work")
+		link = username_api.check_username(website, user)['avatar']
+		response = r.get(link)
+		assert('image' in response.headers.get('content-type') or


startswith('image/')

jayvdb · 2017-12-21T17:18:27Z

username_api.py

 		w=website,
-		u=username
-	)
+		u=username)


unnecessary change here

This is actually not a very good diff comparison. I have a fresh function in those lines, but because of the repetition from the old one, git decides it's a modification of that. I'll change it nonetheless.

It is a very good comparison. :P
Splitting a function like this is very common.
Not changing the lines in the middle helps code review and understanding what is changing and what isn't.

jayvdb · 2017-12-21T17:20:21Z

username_api.py

+		if not result:
+			return None
+		result = result.group('link')
+	elif (response.headers.get('content-type') and


I would use response.headers.get('content-type', '').startswith('image/') to make it more readable and only perform slightly worse in the very ~~unlikely~~ uncommon scenario that one of these sites sends a response without a content-type

Actually, that would still perform better, because every . is a perf hit via __getattribute__, so your and here doubles the number of times response.headers.get('content-type') is evaluated.

Well, optimally, by locality of reference, the perf hit would be negligible, but we're dealing with python so performance^TM 🙃
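The three variants discussed in this thread behave differently on edge cases. A small sketch, using plain dicts as stand-ins for response.headers (the helper name is hypothetical):

```python
def is_image_response(headers):
    """True when the Content-Type header names an image media type."""
    # The '' default keeps .startswith() safe when the header is absent.
    return headers.get('content-type', '').startswith('image/')


print(is_image_response({'content-type': 'image/png'}))
print(is_image_response({'content-type': 'text/html; charset=utf-8'}))
print(is_image_response({}))

# The original substring test fails in two ways:
try:
    'image' in {}.get('content-type')  # header missing -> None
except TypeError as exc:
    print('TypeError:', exc)
# It also accepts any header that merely mentions "image".
print('image' in 'text/html; note=image')
```

The default-argument form needs a single lookup and no and, which is the readability point made above.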

manu-chroma · 2017-12-21T17:45:16Z

websites.yml

@@ -46,3 +46,15 @@ username_patterns:
    invalid_patterns:
      - "\\.(com|net)"
      - "(\\.)\\1{1,}"  # consecutive '.' not allowed
+avatar:
+  pinterest:
+    property: image_xlarge_url


is property a good term for this?

how about key if you don't like it?

I think key would be better since we're picking from JSON data. property seems vague here.

manu-chroma · 2017-12-21T18:00:49Z

username_api.py

+		result = [item.attrs['content'] for item in soup('meta')
+				  if item.has_attr('property') and
+				  item.attrs['property'].lower() == 'og:image']
+		if not result or not result[0]:


Why is the check for both result and result[0] needed here?

Yes, some website returned an empty string for some reason. This is for that. (I can't remember which one, I think it was fb)
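The two conditions guard against different failures: no og:image tag at all, and a tag whose content is empty. A sketch of that distinction follows. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra dependencies; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphImageParser(HTMLParser):
    """Collect the content of every <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attributes = dict(attrs)
        if (attributes.get('property') or '').lower() == 'og:image':
            self.images.append(attributes.get('content') or '')


def extract_og_image(html):
    """Return the first og:image URL, or None if missing or empty."""
    parser = OpenGraphImageParser()
    parser.feed(html)
    result = parser.images
    # `not result` covers a page without the tag;
    # `not result[0]` covers a tag with an empty content attribute.
    if not result or not result[0]:
        return None
    return result[0]


print(extract_og_image('<meta property="og:image" content="https://example.com/a.png">'))
print(extract_og_image('<meta property="og:image" content="">'))
print(extract_og_image('<title>no meta tags</title>'))
```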

manu-chroma · 2017-12-21T18:02:50Z

username_api.py

+		if not result:
+			return None
+		result = result.group('link')
+	elif response.headers.get('content-type', '').startswith('image/'):


this one applies for twitter?

yes. It redirects to homepage if the user doesn't exist or something else.

manu-chroma · 2017-12-21T18:12:45Z

tests/test_get_avatar.py

+
+class TestGet_avatar(object):
+
+	@parameterized.expand(load_test_cases('with'),


can we do with_avatar and without_avatar here? Other than that this PR is good to go.

jayvdb · 2017-12-21T18:21:17Z

websites.yml

@@ -46,3 +46,15 @@ username_patterns:
    invalid_patterns:
      - "\\.(com|net)"
      - "(\\.)\\1{1,}"  # consecutive '.' not allowed
+avatar:
+  pinterest:


Alpha sort the websites?
(It isn't done elsewhere, but that is a cleanup for later.)

manu-chroma · 2017-12-21T18:55:29Z

good job @nalinbhardwaj 😄

andrewda suggested changes Dec 20, 2017

jayvdb reviewed Dec 21, 2017

jayvdb requested changes Dec 21, 2017

manu-chroma reviewed Dec 21, 2017

jayvdb reviewed Dec 21, 2017

jayvdb requested changes Dec 21, 2017

jayvdb reviewed Dec 21, 2017

jayvdb approved these changes Dec 21, 2017

manu-chroma reviewed Dec 21, 2017

jayvdb mentioned this pull request Dec 21, 2017

Add more test parameterisation #36

Closed

manu-chroma reviewed Dec 21, 2017

jayvdb reviewed Dec 21, 2017

Nalin Bhardwaj added 2 commits December 22, 2017 00:17

username_api: Add avatar fetching

ff0a537

Makes function get_avatar() and adds avatar related info to websites.yml Closes #35

tests/test_get_avatar: Add tests

3e353fb

Adds tests for get_avatar. Checks if given links are images or not.

manu-chroma approved these changes Dec 21, 2017


manu-chroma merged commit c738d8a into manu-chroma:master Dec 21, 2017

nalinbhardwaj deleted the avatars branch December 21, 2017 19:23

