-
Notifications
You must be signed in to change notification settings - Fork 0
/
free_profile_usage.Rmd
153 lines (117 loc) · 5.98 KB
/
free_profile_usage.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
---
title: "How many profiles do Free users use?"
output: github_document
---
## Motivation
The purpose of this analysis is to get a better understanding of how users on the Individual plan use Buffer, in order to better inform decisions around the plan limits. Specifically, we would love to learn about how many profiles Free users actually schedule updates for.
More data questions and context can be found in [**this Paper**](https://paper.dropbox.com/doc/Free-Plan-data-KJNJLDW3KtfMdZK4Z6DRS), and potential changes to the Free plan can be found in [**this Paper**](https://paper.dropbox.com/doc/Free-plan-changes-DVheYkvNs3yiE9Nhb9XRF).
## Methodology
In order to answer this question reliably with data, we would like to only consider users that are currently _active_, and analyze their updates _during the period in which they were active in Buffer_. In this case, _active_ is defined as having scheduled at least one update with Buffer in the past 28 days.
To answer the question of how many profiles Free users actually use, we'll first gather all of the currently active Free users. Then, we will gather data on the number of profiles they have, and the number of updates that they have scheduled _for each profile_.
Once we have this data, we will gather some summary statistics on the number of active profiles for each active user, and visualize the distribution of the number of profiles in use. It is unlikely to be normally distributed (bell shaped), so visualizing this breakdown would be useful to understanding the potential impact that changing the limits might have.
## Data collection
Let's start by getting the Free users that have scheduled at least one update in the past 28 days. We'll use the following SQL query to grab them.
```{r include = FALSE}
library(buffer); library(dplyr); library(ggplot2)
```
```{r include = FALSE}
con <- redshift_connect()
```
```{sql connection=con, eval = FALSE}
with user_facts as (
select
up.user_id
, max(up.created_at) as last_update_created_at
from transformed_updates as up
inner join users
on users.user_id = up.user_id
where up.status != 'service'
and users.billing_plan = 'individual'
group by up.user_id
)
select
user_id
, last_update_created_at
from user_facts
where last_update_created_at > (current_date - 29)
```
```{r include = F}
# Set WD
setwd("/Users/julianwinternheimer/Documents/analysis")
# Save users
# save(users, file = "user_profiles.Rda")
# Load users data
load("user_profiles.Rda")
```
Great! Now we need to get the number of updates they have scheduled, and the number of profiles that they have scheduled updates for, in the past 28 days.
```{sql connection=con, eval = FALSE}
select
up.user_id
, count(distinct up.id) as updates_count
, count(distinct up.profile_id) as profile_count
from transformed_updates as up
left join users
on up.user_id = users.user_id
where users.billing_plan = 'individual'
and up.created_at > (current_date - 29)
group by up.user_id
```
```{r include = F}
# Set WD
setwd("/Users/julianwinternheimer/Documents/analysis")
# Save users
# save(updates, file = "user_profile_updates.Rda")
# Load users data
load("user_profile_updates.Rda")
```
Awesome! Now let's join the two datasets -- then we'll be ready to explore. :)
```{r}
# Join updates and users
users <- users %>%
left_join(updates, by = 'user_id')
# Replace NAs with 0s
users[is.na(users)] <- 0
```
## Exploratory analysis
So how does the profile count break down?
```{r}
# Get summary stats on the number of profiles
summary(users$profile_count)
```
Alright, here is the breakdown:
- Around 50% of active Free plan users scheduled updates for only 1 profile in the past 28 days.
- The average number of profiles that active Free plan users used in the past 28 days is 1.9.
- Around 75% of active Free plan users scheduled updates for only 2 or less profiles.
- Someone scheduled updates for 194 profiles in the past 28 days!
The maximum of 194 brings up a good point about our sample: these are only users that are _currently_ on individual plans. This means that they could have been on a paid plan in the past 28 days. I don't think it's a huge portion, or very significant for this particular analysis, but it's something to keep in mind!
Let's look at the distribution.
```{r include = FALSE}
by_profile <- users %>%
group_by(profile_count) %>%
summarise(users = n_distinct(user_id)) %>%
mutate(proportion = users / sum(users))
```
```{r echo = FALSE}
options(scipen=10000)
ggplot(by_profile, aes(x = as.factor(profile_count), y = users)) +
geom_bar(stat = 'identity') +
coord_cartesian(xlim = c(1,10)) +
labs(x = 'Number of Active Profiles', y = 'Users')
```
Wow, that's interesting! It looks like most currently active Free users have only scheduled updates to one profile in the past 28 days. Let's look at the CDF of this distribution.
```{r echo = FALSE}
ggplot(users, aes(x = profile_count)) +
stat_ecdf() +
coord_cartesian(xlim = c(0, 10)) +
scale_x_continuous(breaks = seq(0, 10, 2)) +
scale_y_continuous(breaks = seq(0, 1, 0.2)) +
labs(x = 'Number of Active Profiles', y = '' , title = 'Profile Count CDF')
```
Interesting! This graph tells us the following:
- Around 54% of users scheduled updates to 1 profile in the past 28 days.
- Around 76% of users scheduled updates to 2 or less profiles in the past 28 days.
- Around 90% of users scheduled updates to 3 or less profiles in the past 28 days.
## Conclusions
If we were to go with the "less limited" Free plan and only allow users to connect up to 3 profiles, the data suggests that somewhere around 10% of currently active users would be affected. This equates to around 25k users.
If we were to go with the "more limited" Free plan and only allow 2 profiles, the data suggests that somewhere around 25% of currently active users would be affected! This equates to around 63k users.
I think it's important to note that this is just a snapshot of _currently_ active users that are _currently_ on Individual plans, but I do think we can infer some insights from them!