Days with no CRAN downloads #54

Open
lindbrook opened this issue Jan 10, 2020 · 30 comments
Comments

@lindbrook
Contributor

There are 43 days when cranlogs::cran_downloads() reports that there were zero package downloads. I've checked a couple of logs at http://cran-logs.rstudio.com/; they seem to disagree.

dates <- as.Date(c("2018-01-05", "2018-02-09", "2018-02-10", "2018-02-23",
  "2018-02-24", "2018-05-06", "2018-05-12", "2018-05-19", "2018-05-27",
  "2018-07-07", "2018-07-08", "2018-07-28", "2018-08-31", "2018-10-21",
  "2017-01-12", "2017-07-16", "2017-09-01", "2017-09-02", "2016-02-03",
  "2016-06-02", "2016-06-12", "2016-07-12", "2016-07-24", "2016-08-04",
  "2016-08-11", "2016-08-13", "2016-08-14" ,"2016-08-20", "2016-09-02",
  "2016-09-09", "2015-08-23", "2015-09-07", "2015-09-09", "2015-10-18",
  "2015-10-26", "2015-10-31", "2015-11-01", "2015-11-15", "2014-01-01",
  "2014-11-17", "2012-12-29", "2012-12-30", "2012-12-31"))

dates <- sort(dates)

zero_downloads <- lapply(dates, function(x) {
  cranlogs::cran_downloads(from = x, to = x)
})

zero_downloads <- do.call(rbind, zero_downloads)

I'm guessing these will be fixed when you update the DB script (#45).

FWIW

@IndrajeetPatil

Maybe related to this: the download counts for 16th and 17th of Jan. are 0 as well.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-01-12",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-01-12 23692 ggplot2
#> 2 2020-01-13 41793 ggplot2
#> 3 2020-01-14 42412 ggplot2
#> 4 2020-01-15 40575 ggplot2
#> 5 2020-01-16     0 ggplot2
#> 6 2020-01-17     0 ggplot2
#> 7 2020-01-18 19643 ggplot2
#> 8 2020-01-19     0 ggplot2

Created on 2020-01-19 by the reprex package (v0.3.0.9001)

@gaborcsardi
Contributor

I have fixed most of these, except for the ones in 2012, for which my parser fails, so I'll need to take a closer look at those.

@gaborcsardi
Contributor

These three days are really missing, because the 2012-12-29 file contains the data for 2012-12-26, etc., but then from 2013-01-01 the file names actually refer to the correct day. So these three days are lost forever. IDK if we should document this somewhere or do something else about it.

@lindbrook
Contributor Author

Use a warning() to flag those dates in cranlogs::cran_downloads()?
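
A minimal sketch of what such a check might look like, assuming the only unrecoverable days are the three named above. `lost_dates` and `warn_lost_dates()` are hypothetical names for illustration and are not part of 'cranlogs':

```r
# Hypothetical helper, not part of 'cranlogs': warn when a requested date
# range includes the three days identified in this thread as lost.
lost_dates <- as.Date(c("2012-12-29", "2012-12-30", "2012-12-31"))

warn_lost_dates <- function(from, to) {
  requested <- seq(as.Date(from), as.Date(to), by = "day")
  hit <- requested[requested %in% lost_dates]
  if (length(hit) > 0) {
    warning("No log data exist for: ",
            paste(format(hit), collapse = ", "), call. = FALSE)
  }
  # Return the affected dates invisibly so callers can inspect them.
  invisible(hit)
}
```

Calling `warn_lost_dates("2012-12-28", "2013-01-01")` before a query would raise one warning listing the three days; a range that avoids them stays silent.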

@gaborcsardi
Contributor

Yeah, possibly.

@gaborcsardi gaborcsardi reopened this Jan 29, 2020
@lindbrook
Contributor Author

Logs for 2012, which start on Oct 1, need some TLC. They are fixable but the last three days of 2012 do indeed seem to be lost.

  1. Logs between "2012-10-01" and "2012-10-10" are OK.

  2. Logs between "2012-10-16" and "2012-12-31" are offset by -3 days. If you look at the log for "2012-10-16" you get "2012-10-13"; if you look at the log for "2012-12-31" you get "2012-12-28".

  3. Logs from "2012-10-11" through "2012-10-15" have three duplicates: Oct 7, Oct 8 and Oct 11. This probably requires some juggling.

Nominal Actual
11 ----- 07
12 ----- 11
13 ----- 08
14 ----- 12
15 ----- 11
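
The three cases above can be sketched as a lookup from a log's nominal (file name) date to the date of the data it actually contains. This is only an illustration of the mapping described here; `actual_log_date()` is a hypothetical name, and it assumes file names from 2013-01-01 onward are correct:

```r
# Hypothetical helper: map a 2012 log's nominal date to the date of its data.
actual_log_date <- function(nominal) {
  nominal <- as.Date(nominal)
  actual <- nominal  # case 1 (and 2013 onward): file name is correct

  # Case 2: 2012-10-16 through 2012-12-31 hold data from three days earlier.
  shifted <- which(nominal >= as.Date("2012-10-16") &
                   nominal <= as.Date("2012-12-31"))
  actual[shifted] <- nominal[shifted] - 3

  # Case 3: 2012-10-11 through 2012-10-15 follow the Nominal/Actual table.
  nominal_key <- c("2012-10-11", "2012-10-12", "2012-10-13",
                   "2012-10-14", "2012-10-15")
  actual_val <- as.Date(c("2012-10-07", "2012-10-11", "2012-10-08",
                          "2012-10-12", "2012-10-11"))
  idx <- match(format(nominal), nominal_key)
  juggled <- which(!is.na(idx))
  actual[juggled] <- actual_val[idx[juggled]]

  actual
}
```

For example, `actual_log_date("2012-10-16")` gives 2012-10-13 and `actual_log_date("2012-12-31")` gives 2012-12-28, matching case 2.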

Details later if you want.

FWIW, with 'packageRank' 0.3.0.9026, you can check these with:

unique(packageRank::packageLog(date = "2012-10-11")$date)

@gaborcsardi
Contributor

Thanks! I don't think there is much to fix, I don't actually use the filenames when updating the db, only the data in the files.
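
To illustrate the idea (this is not a reproduction of the actual update script): if counts are keyed on the `date` column inside each log rather than on the file name, a misnamed file still contributes to the right day. The toy rows below are made up and stand in for a file named 2012-10-16 whose rows are dated 2012-10-13:

```r
# Made-up rows standing in for the contents of one log file.
# Real logs have more columns; only 'date' and 'package' matter here.
log_rows <- data.frame(
  date = c("2012-10-13", "2012-10-13", "2012-10-13"),
  package = c("ggplot2", "plyr", "ggplot2"),
  stringsAsFactors = FALSE
)

# Each row is one download, so count rows per (date, package) pair.
counts <- aggregate(count ~ date + package,
                    data = transform(log_rows, count = 1L),
                    FUN = sum)
```

Here `counts` attributes the downloads to 2012-10-13, whatever the file was called.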

@lindbrook
Contributor Author

That's interesting (is that part of the code on GitHub?). I actually make use of the filenames. So I can "fix" it on my end. But do you think this would be something worth informing RStudio about?

@gaborcsardi
Contributor

Yeah, it is here: https://github.com/r-hub/cranlogs.app/blob/master/db/update.sh

These logs are gone, I am pretty sure, so there is nothing anyone can do about these three days.

@lindbrook
Contributor Author

I meant updating the filenames so they point to the correct log file.

@gaborcsardi
Contributor

Ah, I see. I am not sure if it is worth changing it. People might have their own workarounds already, and then we'll break them.

@lindbrook
Contributor Author

Then, it's probably worth just noting the missing days in the README/webpage.

@IndrajeetPatil

I am just posting it here because I wonder if the low download count has anything to do with the date being 29th Feb!

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-02-25",
  to = Sys.Date()
)

#>         date count package
#> 1 2020-02-25 42860 ggplot2
#> 2 2020-02-26 44631 ggplot2
#> 3 2020-02-27 42154 ggplot2
#> 4 2020-02-28 34426 ggplot2
#> 5 2020-02-29  5554 ggplot2
#> 6 2020-03-01     0 ggplot2

@lindbrook
Contributor Author

My guess is that part of the reason is that scripts used to do automated downloads may not have accounted for the leap day. That said, the last available leap day, in 2016, wasn't particularly unusual:

plot(cranlogs::cran_downloads(from = "2016-02-01", to = "2016-02-29"), type = "o")
plot(packageRank::cranDownloads(from = "2016-02", to = "2016-02"))

@lindbrook
Contributor Author

Also, the source for R v3.6.3 was released on 2020-02-29.

@lindbrook
Contributor Author

FWIW, also affected downloads of R:
[Image: r_downloads]

@gaborcsardi
Contributor

Wow. This is probably an oversimplification, but maybe there weren't that many automated downloads back in 2016.

IDK if the release date has anything to do with it, but that's easy to check, the other release dates are these: https://rversions.r-pkg.org/r-versions

@lindbrook
Contributor Author

"2016-02-29" was a Monday.
[Images: pkgs2016, r2016]

@IndrajeetPatil

IndrajeetPatil commented Mar 28, 2020

2020-03-26 also had 0 downloads.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-03-25",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-03-25 63129 ggplot2
#> 2 2020-03-26     0 ggplot2
#> 3 2020-03-27 63344 ggplot2

@IndrajeetPatil

IndrajeetPatil commented Apr 4, 2020

Download counts are also 0 for 2nd and 3rd of April.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-03-31",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-03-31 66205 ggplot2
#> 2 2020-04-01 65428 ggplot2
#> 3 2020-04-02     0 ggplot2
#> 4 2020-04-03     0 ggplot2
#> 5 2020-04-04 50522 ggplot2

@IndrajeetPatil

2020-04-20 also had 0 downloads.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-04-18",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-04-18 52350 ggplot2
#> 2 2020-04-19 48923 ggplot2
#> 3 2020-04-20     0 ggplot2
#> 4 2020-04-21 63808 ggplot2
#> 5 2020-04-22     0 ggplot2

Created on 2020-04-22 by the reprex package (v0.3.0.9001)

@hongooi73

hongooi73 commented Jun 25, 2020

Haven't seen any downloads for the last week.

r$> cran_downloads(from="2020-06-01", package="dplyr")
         date count package
1  2020-06-01 50366   dplyr
2  2020-06-02 52765   dplyr
3  2020-06-03 52948   dplyr
4  2020-06-04 50348   dplyr
5  2020-06-05 47053   dplyr
6  2020-06-06 31556   dplyr
7  2020-06-07 32620   dplyr
8  2020-06-08 51816   dplyr
9  2020-06-09 51841   dplyr
10 2020-06-10 49710   dplyr
11 2020-06-11 48361   dplyr
12 2020-06-12 44394   dplyr
13 2020-06-13 29262   dplyr
14 2020-06-14 29947   dplyr
15 2020-06-15 48074   dplyr
16 2020-06-16 47806   dplyr
17 2020-06-17 45596   dplyr
18 2020-06-18 43152   dplyr
19 2020-06-19 37575   dplyr
20 2020-06-20     0   dplyr
21 2020-06-21     0   dplyr
22 2020-06-22     0   dplyr
23 2020-06-23     0   dplyr
24 2020-06-24     0   dplyr
25 2020-06-25     0   dplyr

@hongooi73

Seems to have updated; when I rerun the above command, I get more days filled in. Still missing the most recent 2 days though.

r$> cran_downloads(from="2020-06-01", package="dplyr")
         date count package
. . .
21 2020-06-21 23537   dplyr
22 2020-06-22 41854   dplyr
23 2020-06-23 44296   dplyr
24 2020-06-24     0   dplyr
25 2020-06-25     0   dplyr

@lindbrook
Contributor Author

The log for the current day (e.g. 2020-06-25) isn't available until the next day (e.g. 2020-06-26).

Regarding the 24th, I think they're moving servers/services so my understanding is that they've been manually running the script of late (time zones may come into play as well).

FWIW, if you really want the latest counts, you can fetch the logs directly (http://cran-logs.rstudio.com/) or use packages/functions that do so.
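
A rough sketch of fetching a log directly, assuming the daily package logs follow the `<year>/<date>.csv.gz` layout on that site and that each row is one download with a `package` column. `log_url()` and `count_from_log()` are hypothetical helper names, and the second one needs network access:

```r
# Build the URL of one day's package log (assumed layout: <year>/<date>.csv.gz).
log_url <- function(date) {
  date <- as.Date(date)
  sprintf("http://cran-logs.rstudio.com/%s/%s.csv.gz",
          format(date, "%Y"), format(date))
}

# Hypothetical helper: download one day's log and count rows for a package.
# Fails if that day's log has not been posted yet.
count_from_log <- function(date, package) {
  tmp <- tempfile(fileext = ".csv.gz")
  on.exit(unlink(tmp))
  utils::download.file(log_url(date), tmp, quiet = TRUE, mode = "wb")
  log <- utils::read.csv(gzfile(tmp), stringsAsFactors = FALSE)
  sum(log$package == package, na.rm = TRUE)
}
```

Something like `count_from_log(Sys.Date() - 1, "dplyr")` would then give yesterday's count, provided that log is already up.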

@hongooi73

Logs are getting hung up again:

. . .
21 2020-06-21 23537   dplyr
22 2020-06-22 41854   dplyr
23 2020-06-23 44296   dplyr
24 2020-06-24 42407   dplyr
25 2020-06-25 42091   dplyr
26 2020-06-26 37934   dplyr
27 2020-06-27     0   dplyr
28 2020-06-28     0   dplyr
29 2020-06-29     0   dplyr
30 2020-06-30     0   dplyr

It's weird that the service is so patchy. I'd have thought it's just a daily cron job or something, so that updates "just work".

@nbarrowman

Logs seem to be hung up again:

        date count package
1 2020-08-12 48458 ggplot2
2 2020-08-13 49645 ggplot2
3 2020-08-14 44313 ggplot2
4 2020-08-15 35502 ggplot2
5 2020-08-16 39237 ggplot2
6 2020-08-17     0 ggplot2
7 2020-08-18     0 ggplot2
8 2020-08-19     0 ggplot2
9 2020-08-20     0 ggplot2

@IndrajeetPatil

No download count for 2020-10-03:

    cranlogs::cran_downloads(
      packages = "ggplot2",
      from = "2020-09-26",
      to = Sys.Date()
    )
    #>          date count package
    #> 1  2020-09-26 43607 ggplot2
    #> 2  2020-09-27 45068 ggplot2
    #> 3  2020-09-28 60917 ggplot2
    #> 4  2020-09-29 63517 ggplot2
    #> 5  2020-09-30 64071 ggplot2
    #> 6  2020-10-01 60625 ggplot2
    #> 7  2020-10-02 56791 ggplot2
    #> 8  2020-10-03     0 ggplot2
    #> 9  2020-10-04 43545 ggplot2

Created on 2020-10-06 by the reprex package (v0.3.0.9001)

@lindbrook
Contributor Author

There are five days in 2020 that cranlogs::cran_downloads() still reports as having zero downloads:

days <- c("2020-03-26", "2020-04-02", "2020-04-03", "2020-04-20", "2020-10-03")
out <- lapply(days, function(x) cranlogs::cran_downloads(from = x, to = x))
do.call(rbind, out)

#         date count
# 1 2020-03-26     0
# 2 2020-04-02     0
# 3 2020-04-03     0
# 4 2020-04-20     0
# 5 2020-10-03     0

Would it be possible to fix these?

@IndrajeetPatil

The count is 0 also for 2021-11-20.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2021-11-18",
  to = "2021-11-22"
)
#>         date  count package
#> 1 2021-11-18 115004 ggplot2
#> 2 2021-11-19 106105 ggplot2
#> 3 2021-11-20      0 ggplot2
#> 4 2021-11-21  86233 ggplot2
#> 5 2021-11-22 110980 ggplot2

Created on 2021-11-27 by the reprex package (v2.0.1)

@lindbrook
Contributor Author

FWIW, the RStudio logs were posted "late" that day. When that happens, 'cranlogs' will return a zero count.
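
Since such a zero means "no data" rather than "no downloads", one workaround on the user's side is to recode suspicious zeros as `NA` before plotting or summing. A minimal sketch using the ggplot2 counts posted above; `mark_suspect_zeros()` is a hypothetical helper, and its heuristic (a zero is suspect if the package had downloads on any other day in the range) is an assumption that could mislabel genuine zeros for rarely downloaded packages:

```r
# Counts copied from the 2021-11-18 to 2021-11-22 output earlier in the thread.
downloads <- data.frame(
  date = as.Date(c("2021-11-18", "2021-11-19", "2021-11-20",
                   "2021-11-21", "2021-11-22")),
  count = c(115004, 106105, 0, 86233, 110980),
  package = "ggplot2",
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat a zero as missing when other days are non-zero.
mark_suspect_zeros <- function(x) {
  suspect <- x$count == 0 & any(x$count > 0)
  x$count[suspect] <- NA
  x
}

flagged <- mark_suspect_zeros(downloads)
```

In `flagged`, the 2021-11-20 count is `NA`, so `sum(flagged$count, na.rm = TRUE)` and plots skip that day instead of treating it as a real zero.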
