get_friends / get_follows consistency #308
Comments
Side note: if there's going to be a breaking change here, I would love if we could get explicit |
Not to be difficult, but to avoid some breaking changes, it would be better not to change any column names. If some reference is required, rather add columns such as "follower_of" or "friend_of" which provide a reference to a particular user_id for network mapping. Converting to edge lists is then fairly simple and there is no breaking change in terms of referencing the user_id column for other functions. (Speaking very selfishly... I fear that this would break a few of my existing analysis functions... and I'd prefer not to try to figure out what I was doing two years ago to get them to work again). |
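As a rough illustration of the suggestion above, the sketch below adds a reference column to a followers table and reshapes it into an edge list. It uses base R only. The data frame, the IDs in it and the `follower_of` column are made up for illustration and are not what rtweet returns.

```r
# Hypothetical stand-in for a get_followers() result: one column of follower IDs.
followers <- data.frame(
  user_id = c("101", "102", "103"),
  stringsAsFactors = FALSE
)

# Add the proposed reference column without renaming the existing one,
# so code that refers to user_id keeps working.
followers$follower_of <- "hadleywickham"

# A follower points at the account it follows, so each edge runs
# from user_id to follower_of.
edges <- data.frame(
  from = followers$user_id,
  to = followers$follower_of,
  stringsAsFactors = FALSE
)
```

The same shape would apply to a `friend_of` column, with the edge direction reversed.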
On top of this it seems like
rtweet::get_friends("hadleywickham")
#> # A tibble: 274 x 2
#> user ids
#> <chr> <chr>
#> 1 hadleywickham 1344499345773752321
#> 2 hadleywickham 3959153969
#> 3 hadleywickham 43875304
#> 4 hadleywickham 2423861950
#> 5 hadleywickham 793171723772395521
#> 6 hadleywickham 1218534623959044096
#> 7 hadleywickham 2798914670
#> 8 hadleywickham 319822761
#> 9 hadleywickham 43470348
#> 10 hadleywickham 326511843
#> # … with 264 more rows
rtweet::get_friends("SmallBuStudio")
#> # A tibble: 1 x 1
#> user
#> <chr>
#> 1 SmallBuStudio
Created on 2021-07-15 by the reprex package (v2.0.0) |
I like the idea of making them more consistent and it is an easy change, but I don't like to add new columns with the same information. It might be worth adding to get_followers a first column with the user as on
For the moment when a user has no friends the output will be a 0 x 2 tibble, for easy
For reference, on 0.7.0 they returned this:
> rtweet::get_friends("SmallBuStudio")
## A tibble: 0 × 0
> rtweet::get_friends("hadleywickham")
## A tibble: 279 × 2
# user user_id
# <chr> <chr>
# 1 hadleywickham 1215516763024003074
# 2 hadleywickham 34677653
# 3 hadleywickham 911618422483640320
# 4 hadleywickham 877452117581213696
# 5 hadleywickham 235261861
# 6 hadleywickham 935427373620658176
# 7 hadleywickham 13202482
# 8 hadleywickham 1344499345773752321
# 9 hadleywickham 3959153969
# 10 hadleywickham 43875304
## … with 269 more rows
> rtweet::get_followers("hadleywickham")
## A tibble: 5,000 × 1
# user_id
# <chr>
# 1 1461812126457253895
# 2 1460159073039618048
# 3 1253866294358585348
# 4 552589054
# 5 1302715713052794880
# 6 1268043412239835138
# 7 452215809
# 8 831911196840300547
# 9 1461893465885741063
#10 3181675553
## … with 4,990 more rows |
Column name wise could also go with something like |
So happy to see this happen 🎉 ! |
I went with your suggestion because it made it clear which account is following which. Glad to close an old issue 😃 . |
Upon further thought I don't think the
rtweet::get_friends("hadleywickham")
#> # A tibble: 279 × 2
#> from_id to_id
#> <chr> <chr>
#> 1 hadleywickham 1215516763024003074
#> 2 hadleywickham 34677653
#> 3 hadleywickham 911618422483640320
#> 4 hadleywickham 877452117581213696
#> 5 hadleywickham 235261861
#> 6 hadleywickham 935427373620658176
#> 7 hadleywickham 13202482
#> 8 hadleywickham 1344499345773752321
#> 9 hadleywickham 3959153969
#> 10 hadleywickham 43875304
#> # … with 269 more rows
Created on 2021-11-23 by the reprex package (v2.0.1.9000) |
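For anyone with older code that expects the 0.7.0 column names, one possible workaround is to rename the columns after the call. This is only a sketch in base R: the data frame below is a hand-made stand-in for the from_id / to_id output shown above, and the name mapping is an assumption about what older code expects.

```r
# Hypothetical stand-in for the new-style get_friends() output shown above.
friends_new <- data.frame(
  from_id = c("hadleywickham", "hadleywickham"),
  to_id = c("1215516763024003074", "34677653"),
  stringsAsFactors = FALSE
)

# Map the new names back to the 0.7.0 names so older code keeps working.
old_names <- c(from_id = "user", to_id = "user_id")
friends_old <- friends_new
names(friends_old) <- unname(old_names[names(friends_new)])
```

Any column missing from the mapping would end up with an NA name, so the mapping needs to cover every column in the result.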
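Yes, I briefly considered changing to from_user and to_user, which is neither screen_name nor id.
I see that v2 returns id, name and screen name (https://developer.twitter.com/en/docs/twitter-api/users/follows/api-reference/get-users-id-followers). I don't think there is any way to make the result consistent between API versions, but if you have an idea you can send a PR and I will review it to get it merged. |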
For the tibble returned for this request, the ideal (for me) would be some kind of naming consistency for the IDs returned as followers/friends, and a column indicating the account friended/followed.
So, I'd prefer if the queried account column be called 'follower_of' or 'friend_of' and the returned friends/followers as 'user_id' (or whatever the account ID is named in other rtweet tibbles returned by searches or lookups)
That way we only add one new naming convention per query rather than two.
I hope my explanation is clear.
|
@Arf9999 I am not sure I understand what kind of consistency you want or why. Currently it is consistent between followers and friends: they have the same column names. Do you want different names in get_followers and get_friends to be able to tell where the data comes from?
I don't follow your "naming convention per query" comment. As I wrote, the API on v2 will return different fields (column names) which are not possible to obtain now (without more API calls), and the current response of API v1 on friends and followers is not consistent with the other endpoints, so we might as well choose which names are used on get_friends and get_followers. |
Current (0.7) rtweet returns two columns for get_friends, 'user' and 'user_id', and a single column for get_followers, 'user_id'.
So, although the structure between the two is inconsistent, the key column is similarly named. I'd prefer to keep that consistency. An additional column for context can be named accordingly, but the key column should be named consistently across tibbles returned by the package, I believe, as it allows for simpler joins between different query responses.
What that consistent name should be is negotiable based on API return from V1 or V2 queries. I'm happy with current (0.7) naming because I have existing functions that use it, and changing it will break those functions. But that's just my selfish view.
|
To summarize, you want to have a column that is equally named on both
@Arf9999 What are those simpler joins you want to keep and how hard are they currently as of 0.7.0.9008? |
I think Alex wants a simple way to build networks, while I want to use these tibbles to examine things like the Jaccard Index between accounts and also to easily join these tibbles to the responses from a (for example)
The joins are very easy in 0.7.0 because the key column is the same in both, i.e. 'user_id', so using a dplyr join simply requires 'by = user_id' consistently across almost all rtweet tibbles. |
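To make the use case above concrete, here is a minimal sketch of both operations on made-up data. It uses base R (`intersect`, `union`, `merge`) so that it runs without extra packages. The IDs, screen names and the `jaccard()` helper are hypothetical and not part of rtweet.

```r
# Hypothetical follower ID tables for two accounts (made-up IDs).
followers_a <- data.frame(user_id = c("1", "2", "3", "4"), stringsAsFactors = FALSE)
followers_b <- data.frame(user_id = c("3", "4", "5"), stringsAsFactors = FALSE)

# Jaccard index: size of the intersection divided by size of the union.
jaccard <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}
jaccard_ab <- jaccard(followers_a$user_id, followers_b$user_id)

# Hypothetical user lookup table sharing the same key column.
users <- data.frame(
  user_id = c("3", "4", "5"),
  screen_name = c("carol", "dave", "erin"),
  stringsAsFactors = FALSE
)

# Because the key is named user_id in both tables, the join needs no renaming.
shared <- merge(followers_a, users, by = "user_id")
```

With dplyr, the equivalent join would be `dplyr::inner_join(followers_a, users, by = "user_id")`. If the key column were named differently in each table, every such join would need an explicit name mapping.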
get_friends() returns a tibble with two columns, but get_followers() only returns a tibble with a single column. It would be nice if they both returned the same columns. Always returning the two columns user and user_id would make it easy to build graphs from edge lists.