This repository has been archived by the owner on Nov 10, 2024. It is now read-only.

get_friends / get_follows consistency #308

Closed
alexpghayes opened this issue Jan 30, 2019 · 14 comments

@alexpghayes
Contributor

get_friends() returns a tibble with two columns, but get_followers() only returns a tibble with a single column. It would be nice if they both returned the same columns. Always returning the two columns user and user_id would make it easy to build graphs from edge lists.
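As a rough illustration of why matching columns help, here is a minimal base R sketch. The data frames are made-up stand-ins for the two results (not real rtweet output), and the `from`/`to` column names are hypothetical.

```r
# Made-up stand-ins for the two results; all IDs are invented for illustration.
friends <- data.frame(
  user    = c("alice", "alice"),
  user_id = c("101", "102"),
  stringsAsFactors = FALSE
)
followers <- data.frame(
  user_id = c("201", "202", "203"),
  stringsAsFactors = FALSE
)

# The followers result has no column naming the queried account,
# so it has to be added by hand before it can serve as an edge list.
followers$user <- "alice"

# Orient every edge as "from follows to" (hypothetical column names).
edges <- rbind(
  data.frame(from = friends$user, to = friends$user_id,
             stringsAsFactors = FALSE),
  data.frame(from = followers$user_id, to = followers$user,
             stringsAsFactors = FALSE)
)
print(edges)
```

If both functions returned the same two columns, the manual step in the middle would not be needed.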

@alexpghayes
Contributor Author

Side note: if there's going to be a breaking change here, I would love it if we could get explicit from and to column names rather than user and user_id.

@Arf9999

Arf9999 commented Mar 15, 2021

Not to be difficult, but to avoid breaking changes, it would be better not to change any column names. If some reference is required, it would be better to add columns such as "follower_of" or "friend_of", which provide a reference to a particular user_id for network mapping. Converting to edge lists is then fairly simple, and there is no breaking change in terms of referencing the user_id column in other functions.

(Speaking very selfishly... I fear that this would break a few of my existing analysis functions... and I'd prefer not to try to figure out what I was doing two years ago to get them to work again).
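A minimal sketch of this additive approach, in base R. The data are invented, and `follower_of` is only the column name proposed above, not an existing rtweet column.

```r
# Hypothetical follower result in the existing single-column shape.
followers <- data.frame(user_id = c("201", "202"), stringsAsFactors = FALSE)

# Add a reference column instead of renaming: user_id stays untouched.
followers$follower_of <- "alice"

# Existing code that refers to user_id keeps working...
ids <- followers$user_id

# ...and an edge list is a matter of reordering and renaming a copy.
edge_list <- data.frame(
  from = followers$user_id,
  to   = followers$follower_of,
  stringsAsFactors = FALSE
)
print(edge_list)
```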

@alexpghayes
Contributor Author

On top of this it seems like get_friends() only returns a single column when the requested user doesn't follow anyone.

rtweet::get_friends("hadleywickham")
#> # A tibble: 274 x 2
#>    user          ids                
#>    <chr>         <chr>              
#>  1 hadleywickham 1344499345773752321
#>  2 hadleywickham 3959153969         
#>  3 hadleywickham 43875304           
#>  4 hadleywickham 2423861950         
#>  5 hadleywickham 793171723772395521 
#>  6 hadleywickham 1218534623959044096
#>  7 hadleywickham 2798914670         
#>  8 hadleywickham 319822761          
#>  9 hadleywickham 43470348           
#> 10 hadleywickham 326511843          
#> # … with 264 more rows

rtweet::get_friends("SmallBuStudio")
#> # A tibble: 1 x 1
#>   user         
#>   <chr>        
#> 1 SmallBuStudio

Created on 2021-07-15 by the reprex package (v2.0.0)

@llrs
Collaborator

llrs commented Nov 20, 2021

I like the idea of making them more consistent and it is an easy change, but I would rather not add new columns that duplicate existing information.

It might be worth adding a first column with the user to get_followers, as in get_friends; that way the second column will always be the to user and the first one the from user. The column names can be confusing: people who are not paying attention might think that user_id is the ID of the user in the first column, so a name change might be useful.

For the moment, when a user has no friends the output will be a 0 x 2 tibble, for easy rbind.

For reference, in 0.7.0 they returned this:
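To show why the empty two-column shape matters for row binding, here is a small base R sketch. It uses plain data frames with invented values rather than real rtweet output.

```r
# Invented stand-ins for results from three accounts.
with_friends <- data.frame(
  user    = c("alice", "alice"),
  user_id = c("101", "102"),
  stringsAsFactors = FALSE
)
# An account that follows nobody, returned as zero rows and two columns.
no_friends_0x2 <- data.frame(
  user    = character(0),
  user_id = character(0),
  stringsAsFactors = FALSE
)
# The same case returned as one row and one column, as in the reprex above.
no_friends_1x1 <- data.frame(user = "bob", stringsAsFactors = FALSE)

# Matching columns: the empty result binds cleanly and adds no rows.
combined <- rbind(with_friends, no_friends_0x2)

# Mismatched columns: base rbind() refuses to combine them.
mismatch <- tryCatch(
  rbind(with_friends, no_friends_1x1),
  error = function(e) "error"
)
```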

> rtweet::get_friends("SmallBuStudio")
## A tibble: 0 × 0
> rtweet::get_friends("hadleywickham")
## A tibble: 279 × 2
#    user          user_id            
#    <chr>         <chr>              
#  1 hadleywickham 1215516763024003074
#  2 hadleywickham 34677653           
#  3 hadleywickham 911618422483640320 
#  4 hadleywickham 877452117581213696 
#  5 hadleywickham 235261861          
#  6 hadleywickham 935427373620658176 
#  7 hadleywickham 13202482           
#  8 hadleywickham 1344499345773752321
#  9 hadleywickham 3959153969         
# 10 hadleywickham 43875304           
## … with 269 more rows
> rtweet::get_followers("hadleywickham")
## A tibble: 5,000 × 1
#   user_id            
#   <chr>              
# 1 1461812126457253895
# 2 1460159073039618048
# 3 1253866294358585348
# 4 552589054          
# 5 1302715713052794880
# 6 1268043412239835138
# 7 452215809          
# 8 831911196840300547 
# 9 1461893465885741063
#10 3181675553         
## … with 4,990 more rows

@alexpghayes
Contributor Author

In terms of column names, we could also go with something like from_id and to_id for more clarity.

llrs closed this as completed in 1c6025b on Nov 23, 2021
@alexpghayes
Contributor Author

So happy to see this happen 🎉 !

@llrs
Collaborator

llrs commented Nov 23, 2021

I went with your suggestion because it made it clear which account is following which. Glad to close an old issue 😃 .

@alexpghayes
Contributor Author

Upon further thought, I don't think the _id suffix is a good idea. Sometimes you get a screen name and sometimes you get a user ID, and the _id suffix makes it seem like you always get a user ID. For example, in the output below from_id holds a screen name rather than a user ID, which is confusing, and the _id suffix is used in a way that is not consistent with the v2 API.

rtweet::get_friends("hadleywickham")
#> # A tibble: 279 × 2
#>    from_id       to_id              
#>    <chr>         <chr>              
#>  1 hadleywickham 1215516763024003074
#>  2 hadleywickham 34677653           
#>  3 hadleywickham 911618422483640320 
#>  4 hadleywickham 877452117581213696 
#>  5 hadleywickham 235261861          
#>  6 hadleywickham 935427373620658176 
#>  7 hadleywickham 13202482           
#>  8 hadleywickham 1344499345773752321
#>  9 hadleywickham 3959153969         
#> 10 hadleywickham 43875304           
#> # … with 269 more rows

Created on 2021-11-23 by the reprex package (v2.0.1.9000)

@llrs
Collaborator

llrs commented Nov 24, 2021

Yes, I briefly considered changing to from_user and to_user, which imply neither screen_name nor id.

I see that v2 returns id, name and screen name. I don't think there is any way to make the result consistent between API versions, but if you have an idea you can send a PR and I will review it to get it merged.

@Arf9999

Arf9999 commented Nov 24, 2021 via email

@llrs
Collaborator

llrs commented Nov 24, 2021

@Arf9999 I am not sure I understand what kind of consistency you want or why. Currently the output is consistent between followers and friends: they have the same column names. Do you want different names in get_followers and get_friends so that you can tell where the data comes from?

I don't follow your comment about a naming convention per query. As I wrote, the v2 API will return different fields (column names) that cannot be obtained now without more API calls, and the current API v1 response for friends and followers is not consistent with the other endpoints, so we might as well choose which names are used in get_friends and get_followers.

@Arf9999

Arf9999 commented Nov 24, 2021 via email

@llrs
Collaborator

llrs commented Nov 24, 2021

To summarize, you want a column that has the same name in both get_friends and get_followers even if the "direction" of the relationship is not the same, while Alex wants both to have the same column names.

@Arf9999 What are those simpler joins you want to keep, and how hard are they currently, as of 0.7.0.9008?
I don't see a simple way to reconcile these two ideas. Perhaps you could send a PR with what you have in mind?

@Arf9999

Arf9999 commented Nov 25, 2021

> To summarize, you want to have a column that is equally named on both get_friends and get_followers even if the "direction" of the relationship is not the same, while Alex wants both to have the same column names.
>
> @Arf9999 What are those simpler joins you want to keep and how hard are they currently as on 0.7.0.9008? I don't see a simple way to reconcile these two ideas, perhaps you could send a PR with what you have in mind?

I think Alex wants a simple way to build networks, while I want to use these tibbles to examine things like the Jaccard Index between accounts and also to easily join these tibbles to the responses from a (for example) search_tweets or lookup_user query.

The joins are very easy in 0.7.0 because the key column is the same in both, i.e., 'user_id', so a dplyr join simply requires by = "user_id" consistently across almost all rtweet tibbles.
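The two use cases mentioned here can be sketched in base R as follows. All data are invented; `jaccard()` is a helper written for this illustration, and `merge()` stands in for a dplyr join. The placement of the follower IDs in `from_id` for the renamed case is an assumption.

```r
# Jaccard index of two ID sets: |A intersect B| / |A union B|.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  if (union_size == 0) return(NA_real_)  # undefined for two empty sets
  length(intersect(a, b)) / union_size
}

# Invented follower lists for two accounts, keyed by user_id.
followers_a <- data.frame(user_id = c("1", "2", "3"), stringsAsFactors = FALSE)
followers_b <- data.frame(user_id = c("2", "3", "4"), stringsAsFactors = FALSE)

# Two shared followers out of four distinct ones.
similarity <- jaccard(followers_a$user_id, followers_b$user_id)

# Invented user lookup table sharing the same key column.
users <- data.frame(
  user_id     = c("1", "2", "3", "4"),
  screen_name = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# Shared key name: a single `by` is enough.
joined <- merge(followers_a, users, by = "user_id")

# Renamed key (assumed layout): the join has to map the two names.
renamed <- data.frame(from_id = followers_a$user_id, to_id = "alice",
                      stringsAsFactors = FALSE)
joined_renamed <- merge(renamed, users, by.x = "from_id", by.y = "user_id")
```

The renamed case still joins, but each call has to spell out which column plays the role of user_id.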
