Tweets of multiple Twitter-Accounts #136
The code worked for me:
> tmls_flw <- get_timelines(c("cnn", "BBCWorld", "foxnews"), n = 3200, retryonratelimit = TRUE)
# A tibble: 9,649 x 42
status_id created_at user_id screen_name
* <chr> <dttm> <chr> <chr>
1 930797610092449792 2017-11-15 14:00:18 759251 CNN
2 930794780812070913 2017-11-15 13:49:03 759251 CNN
3 930792031496044544 2017-11-15 13:38:08 759251 CNN
4 930789258218164224 2017-11-15 13:27:07 759251 CNN
5 930786544159518720 2017-11-15 13:16:19 759251 CNN
6 930784144744951808 2017-11-15 13:06:47 759251 CNN
7 930783345948200961 2017-11-15 13:03:37 759251 CNN
8 930778887403048960 2017-11-15 12:45:54 759251 CNN
9 930775665586163714 2017-11-15 12:33:06 759251 CNN
10 930772881491005441 2017-11-15 12:22:02 759251 CNN
# ... with 9,639 more rows, and 38 more variables: text <chr>, source <chr>,
# reply_to_status_id <chr>, reply_to_user_id <chr>,
# reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
# favorite_count <int>, retweet_count <int>, hashtags <list>, symbols <list>,
# urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
# media_url <list>, media_t.co <list>, media_expanded_url <list>,
# media_type <list>, ext_media_url <list>, ext_media_t.co <list>,
# ext_media_expanded_url <list>, ext_media_type <lgl>,
# mentions_user_id <list>, mentions_screen_name <list>, lang <chr>,
# quoted_status_id <chr>, quoted_text <chr>, retweet_status_id <chr>,
# retweet_text <chr>, place_url <chr>, place_name <chr>,
# place_full_name <chr>, place_type <chr>, country <chr>, country_code <chr>,
# geo_coords <list>, coords_coords <list>, bbox_coords <list>
I actually haven't gotten around to adding [...]. Otherwise, it looks like you'd burn through about 17 requests per user, which, with a limit of 900 requests per 15-minute window, means you should be able to get the max number of statuses returned for 52 users every 15 minutes.
> rate_limit("get_timeline")
# A tibble: 1 x 6
query limit remaining reset reset_at
<chr> <int> <int> <time> <dttm>
1 statuses/user_timeline 900 849 12.05776 mins 2017-11-15 08:15:46
# ... with 1 more variables: app <chr>
If you're dealing with more than 52 accounts, then you'd probably want to set up a for loop. For example, let's say you have a vector of accounts, users:
tmls <- vector("list", length(users))
for (i in seq_along(tmls)) {
  tmls[[i]] <- get_timeline(users[i], n = 3200)
  ## assuming full rate limit at start, wait for fresh reset every 52 users
  if (i %% 52L == 0L) {
    rl <- rate_limit("get_timeline")
    Sys.sleep(as.numeric(rl$reset, "secs"))
  }
  ## print update message
  cat(i, " ")
}
## merge into single data frame (do_call_rbind will preserve users data)
tmls <- do_call_rbind(tmls)
Side note: this actually returned slightly more than 3200 [unique] tweets per user, which I don't think I've seen before.
# A tibble: 3 x 3
term n percent
<chr> <int> <dbl>
1 BBCWorld 3218 0.3335061
2 FoxNews 3216 0.3332988
3 CNN 3215 0.3331951
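The figure of 52 users per window follows from the numbers quoted above: a limit of 900 requests per 15 minutes at roughly 17 requests per user. Below is a minimal sketch of that arithmetic, so the batch size does not have to be hard-coded. `users_per_window()` and `needs_pause()` are hypothetical helper names, not rtweet functions; in practice, `remaining` would presumably be read from the `remaining` column of the `rate_limit("get_timeline")` output.

```r
## hypothetical helpers (not part of rtweet): batch-size arithmetic only
users_per_window <- function(limit = 900L, requests_per_user = 17L) {
  ## number of whole users that fit into one rate-limit window
  as.integer(limit %/% requests_per_user)
}

needs_pause <- function(remaining, requests_per_user = 17L) {
  ## TRUE when too few requests remain to fetch one more full timeline
  remaining < requests_per_user
}

users_per_window()   # integer division of the limit by the per-user cost
needs_pause(849L)    # plenty of requests left for another user
needs_pause(10L)     # too few left, so wait for the reset
```

Checking the remaining requests before each user, rather than counting users, should also cope with a window that was not full when the loop started.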
Thanks for your reply. I ran the loop code you suggested and it worked fine. I just got multiple warnings that some pages do not exist. Could this be the result of there being no tweets on these timelines? If so, is there a way to code it with an if-condition like "if statuses_count <= 1, then dismiss this account" or something like that? It would save me a lot of time and processing power. Thanks in advance, RG
Hi @renegro90.
@renegro90 @mrmvergeer Thanks for following up on this! Question: with the newest version (0.6.0) of rtweet, are these empty timelines creating errors or warnings? They should be creating warnings... so please let me know if you experience anything different!
@mkearney. Yes I got plenty of warnings by running the code with ~100 accounts. After completing the computation, R says:
@mrmvergeer. Your code works. But the original script by @mkearney worked as well (I got the same output with both versions) and didn't stop; with your addition, though, it's possible to see which user the script is currently working on (kind of like a loading bar). @mkearney Is it possible (maybe in an interim stage between the
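One way to see which accounts trigger the "page does not exist" warnings is to wrap each request and log the account name when a warning is raised. This is only a sketch under assumptions: `fetch_quietly()` is a hypothetical wrapper, and `fake_fetch()` is a stub standing in for a call such as `get_timeline()` so that the example runs without Twitter access.

```r
## hypothetical wrapper: `fetch` stands in for a call such as get_timeline()
fetch_quietly <- function(user, fetch) {
  tryCatch(
    fetch(user),
    warning = function(w) {
      ## report the account and skip it
      message("skipping ", user, ": ", conditionMessage(w))
      NULL
    }
  )
}

## stub that warns for one account, used only to demonstrate the wrapper
fake_fetch <- function(user) {
  if (user == "empty_account") warning("page does not exist")
  data.frame(screen_name = user, stringsAsFactors = FALSE)
}

users <- c("cnn", "empty_account", "BBCWorld")
tmls <- lapply(users, fetch_quietly, fetch = fake_fetch)
## keep only the users whose timelines came back
tmls <- do.call(rbind, Filter(Negate(is.null), tmls))
```

Note that turning a warning into a skip discards whatever the call would otherwise have returned, so this only makes sense if a warning reliably means there is nothing to collect.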
@renegro90 You should be able to filter users using the users data:
> ## users with public/english, public/french, private/english accounts respectively
> sns <- c("kearneymw", "Vachier_Lagrave", "mikewaynesworld")
> ## lookup users data
> (usr <- lookup_users(sns))
# A tibble: 3 x 20
user_id name screen_name location
<chr> <chr> <chr> <chr>
1 2973406683 "Mike Kearney\U0001f4ca" kearneymw Columbia, MO
2 157070052 MVL Vachier_Lagrave Paris, France
3 174454226 mw mikewaynesworld SMDHU
# ... with 16 more variables: description <chr>, url <chr>, protected <lgl>,
# followers_count <int>, friends_count <int>, listed_count <int>,
# statuses_count <int>, favourites_count <int>, account_created_at <dttm>,
# verified <lgl>, profile_url <chr>, profile_expanded_url <chr>,
# account_lang <chr>, profile_banner_url <chr>, profile_background_url <chr>,
# profile_image_url <chr>
> ## view protected variable values
> usr$protected
> ## view account_lang variable values
> usr$account_lang
[1] "en" "fr" "en"
So you could create a function to filter those like this:
## function to filter only English-language and public accounts.
filter_users <- function(x) {
  if (!is.data.frame(x) || !all(c("account_lang", "protected") %in% names(x))) {
    stop("Users data not found")
  }
  ## protected is TRUE for private accounts, so negate it to keep public ones
  x$user_id[x$account_lang == "en" & !x$protected]
}
Apply it to the users data:
> filter_users(usr)
[1] "2973406683"
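The same pattern could cover the earlier question about dismissing accounts with almost no tweets, since the users data also carries a `statuses_count` column. Here is a minimal sketch, assuming that column reflects what the timeline request can return; `filter_active_users()` and the `usr_demo` data are made up for illustration and are not part of rtweet.

```r
## hypothetical variant of the filter above: also require some tweets
filter_active_users <- function(x, min_statuses = 2L) {
  needed <- c("user_id", "protected", "statuses_count")
  if (!is.data.frame(x) || !all(needed %in% names(x))) {
    stop("Users data not found")
  }
  ## keep public accounts with at least `min_statuses` tweets
  x$user_id[!x$protected & x$statuses_count >= min_statuses]
}

## made-up users data with the same column names as lookup_users() output
usr_demo <- data.frame(
  user_id = c("1", "2", "3"),
  protected = c(FALSE, FALSE, TRUE),
  statuses_count = c(1500L, 0L, 300L),
  stringsAsFactors = FALSE
)
filter_active_users(usr_demo)
```

The returned IDs could then be passed to the timeline loop in place of the full vector of accounts.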
Hey, thanks for fixing the issues about the authorization method and the data output yesterday :)
Now I'm a bit puzzled about whether it's possible to get the maximum number of tweets (3,200 per account) from a large sample, e.g. 1,000 persons.
I already tried something like this:
tmls_flw <- get_timelines(c("cnn", "BBCWorld", "foxnews"), n = 3200, retryonratelimit = TRUE)
But it didn't work the way I expected. I'm now just getting a total of 3,200 tweets and not 3,200 from each of them.
Is there any workaround to get all the tweets of such a large number of accounts with the get_timelines function, which says: "Hey R, give me the maximum number (3,200 per account) of recent tweets of these accounts."?
Or do I have to code it like this, for every account I want to mine?
flw1 <- get_timeline("cnn", n = 3200)
flw2 <- get_timeline("bbc", n = 3200)
flw3 <- get_timeline("fox", n = 3200)
Thanks in advance