RFM Segmentation #680

Merged
merged 24 commits into from
May 28, 2024
Changes from 23 commits
f022c60
init rfm_segments func
ColtAllen Apr 29, 2024
592a2a4
TODOs
ColtAllen Apr 29, 2024
b2de3eb
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 1, 2024
d653915
docstrings and for loop
ColtAllen May 1, 2024
199ad46
docstrings and for loop
ColtAllen May 1, 2024
3c4f684
Merge branch 'rfm_segment' of https://github.com/ColtAllen/pymc-marke…
ColtAllen May 1, 2024
aa5f871
WIP dev notebook debugging
ColtAllen May 2, 2024
e1fdc93
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 4, 2024
e5f112a
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 10, 2024
bea0bbd
checkpoint commit for remote pull
ColtAllen May 10, 2024
7cda576
Merge branch 'rfm_segment' of https://github.com/ColtAllen/pymc-marke…
ColtAllen May 10, 2024
00c58c5
code testing in dev notebook
ColtAllen May 10, 2024
bcc7274
unit tests added
ColtAllen May 11, 2024
210c245
dev notebook cleanup
ColtAllen May 11, 2024
798eb3b
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 26, 2024
eca910d
clean up type hints
ColtAllen May 27, 2024
2cfae06
comments and code cleanup
ColtAllen May 27, 2024
2d685e5
docstrings
ColtAllen May 27, 2024
763060b
move formatting to rfm_summary and quickstart edits
ColtAllen May 28, 2024
9710df5
fix rfm_train_test_split bug
ColtAllen May 28, 2024
e18d0c6
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 28, 2024
62caf94
added test for rfm_quartile_labels
ColtAllen May 28, 2024
93fb6f9
added rfm score warning
ColtAllen May 28, 2024
91e22b3
Merge branch 'main' into rfm_segment
ColtAllen May 28, 2024
8 changes: 5 additions & 3 deletions docs/source/notebooks/clv/clv_quickstart.ipynb
Expand Up @@ -67,10 +67,10 @@
"* `customer_id` represents a unique identifier for each customer.\n",
"* `frequency` represents the number of _repeat_ purchases that a customer has made, i.e. one less than the total number of purchases.\n",
"* `T` represents a customer's \"age\", i.e. the duration between a customer's first purchase and the end of the period of study. In this example notebook, the units of time are in weeks.\n",
"* `recency` represents the timepoint when a customer made their most recent purchase. This is also equal to the duration between a customer’s first non-repeat purchase (usually time 0) and last purchase. If a customer has made only 1 purchase, their recency is 0;\n",
"* `recency` represents the time period when a customer made their most recent purchase. This is equal to the duration between a customer’s first and last purchase. If a customer has made only 1 purchase, their recency is 0.\n",
"* `monetary_value` represents the average value of a given customer’s repeat purchases. Customers who have only made a single purchase have monetary values of zero.\n",
"\n",
"If working with raw transaction data, the `rfm_summary` function can be used to preprocess data for modeling:"
"The `rfm_summary` function can be used to preprocess raw transaction data for modeling:"
]
},
{
Expand Down Expand Up @@ -339,6 +339,8 @@
"id": "514ee548",
"metadata": {},
"source": [
"It is important to note that these definitions differ from those used in RFM segmentation, where the first purchase is included, `T` is not used, and `recency` is the number of time periods since a customer's most recent purchase.\n",
"\n",
"To visualize data in RFM format, we can plot the recency and T of the customers with the `plot_customer_exposure` function. We see a large chunk (>60%) of customers haven't made another purchase in a while."
]
},
Expand Down Expand Up @@ -2579,7 +2581,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.10.14"
},
"toc": {
"base_numbering": 1,
Expand Down
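To make the contrast in the quickstart text concrete, here is a minimal sketch of the two sets of definitions in plain Python. The helper names (`weeks_between`, `clv_model_format`, `rfm_segmentation_format`) and the sample dates are hypothetical and are not part of `pymc_marketing`. The sketch also assumes every purchase falls in a distinct week, so counting transactions is the same as counting purchase periods.

```python
from datetime import date


def weeks_between(start: date, end: date) -> float:
    """Elapsed time between two dates, expressed in weeks."""
    return (end - start).days / 7


def clv_model_format(purchases: list[date], period_end: date) -> dict:
    """Definitions described in the quickstart for CLV modeling."""
    first, last = min(purchases), max(purchases)
    return {
        "frequency": len(purchases) - 1,          # repeat purchases only
        "recency": weeks_between(first, last),    # first to last purchase
        "T": weeks_between(first, period_end),    # customer "age"
    }


def rfm_segmentation_format(purchases: list[date], period_end: date) -> dict:
    """Definitions described in the quickstart for RFM segmentation."""
    last = max(purchases)
    return {
        "frequency": len(purchases),                  # first purchase included
        "recency": weeks_between(last, period_end),   # time since last purchase
    }


# Hypothetical customer with three purchases, two weeks apart.
purchases = [date(2015, 1, 1), date(2015, 1, 15), date(2015, 1, 29)]
period_end = date(2015, 2, 5)

print(clv_model_format(purchases, period_end))
print(rfm_segmentation_format(purchases, period_end))
```

For this customer the modeling format gives a frequency of 2, a recency of 4.0 weeks, and a `T` of 5.0 weeks, while the segmentation format gives a frequency of 3 and a recency of 1.0 week. The same purchase history therefore yields different numbers depending on which convention is in use.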
186 changes: 151 additions & 35 deletions docs/source/notebooks/clv/dev/utilities_plotting.ipynb
Expand Up @@ -5,15 +5,7 @@
"execution_count": 1,
"id": "435ed203-5c3c-4efc-93d1-abac66ce7187",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n"
]
}
],
"outputs": [],
"source": [
"from pymc_marketing.clv import utils\n",
"\n",
Expand All @@ -30,7 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 69,
"execution_count": 2,
"id": "7de7f396-1d5b-4457-916b-c29ed90aa132",
"metadata": {},
"outputs": [],
Expand Down Expand Up @@ -66,7 +58,7 @@
},
{
"cell_type": "code",
"execution_count": 70,
"execution_count": 3,
"id": "932e8db6-78cf-49df-aa4a-83ee6584e5dd",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -196,7 +188,7 @@
"13 6 2015-02-02 True"
]
},
"execution_count": 70,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -223,7 +215,7 @@
},
{
"cell_type": "code",
"execution_count": 74,
"execution_count": 4,
"id": "4c0a7de5-8825-40af-84e5-6cd0ad26a0e3",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -259,57 +251,57 @@
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>5.0</td>\n",
" <td>5.0</td>\n",
" <td>2.0</td>\n",
" <td>1.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>5.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>1.0</td>\n",
" <td>5.0</td>\n",
" <td>5.0</td>\n",
" <td>4.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>3.0</td>\n",
" <td>3.0</td>\n",
" <td>8.0</td>\n",
" <td>7.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>3.0</td>\n",
" <td>0.0</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" customer_id frequency recency T monetary_value\n",
"0 1 1.0 5.0 5.0 2.0\n",
"1 2 0.0 0.0 5.0 0.0\n",
"2 3 1.0 1.0 5.0 5.0\n",
"3 4 1.0 3.0 3.0 8.0\n",
"4 5 0.0 0.0 3.0 0.0"
"0 1 2.0 5.0 5.0 1.5\n",
"1 2 1.0 0.0 5.0 2.0\n",
"2 3 2.0 1.0 5.0 4.5\n",
"3 4 2.0 3.0 3.0 7.0\n",
"4 5 1.0 0.0 3.0 12.0"
]
},
"execution_count": 74,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -323,7 +315,7 @@
" observation_period_end = \"2015-02-06\",\n",
" datetime_format = \"%Y-%m-%d\",\n",
" time_unit = \"W\",\n",
" include_first_transaction=False,\n",
" include_first_transaction=True,\n",
")\n",
"\n",
"rfm_df.head()"
Expand All @@ -339,7 +331,7 @@
},
{
"cell_type": "code",
"execution_count": 76,
"execution_count": 5,
"id": "761edfe9-1b69-4966-83bf-4f1242eda2d5",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -450,7 +442,7 @@
"4 0.0 5.0 "
]
},
"execution_count": 76,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -467,13 +459,137 @@
"train_test.head()"
]
},
{
"cell_type": "markdown",
"id": "73dc1b93-6a4f-4171-b838-30759b2c1e0e",
"metadata": {},
"source": [
"`rfm_segments` assigns customers to segments based on their recency, frequency, and monetary value. It uses a quartile-based RFM score, which is computationally efficient, although defining custom segments is a subjective exercise. The returned dataframe cannot be used for modeling because it does not zero out the initial transactions."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 40,
"id": "c7b3f800-8dfb-4e5a-b939-5f908281563c",
"metadata": {},
"outputs": [],
"source": []
"source": [
"segments = utils.rfm_segments(\n",
" test_data, \n",
" customer_id_col = \"id\", \n",
" datetime_col = \"date\", \n",
" monetary_value_col = \"monetary_value\",\n",
" observation_period_end = \"2015-02-06\",\n",
" datetime_format = \"%Y-%m-%d\",\n",
" time_unit = \"W\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "932ac4e5-361e-42fa-97d3-d8e508128944",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>customer_id</th>\n",
" <th>frequency</th>\n",
" <th>recency</th>\n",
" <th>monetary_value</th>\n",
" <th>segment</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>1.5</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1.0</td>\n",
" <td>5.0</td>\n",
" <td>2.0</td>\n",
" <td>Inactive Customer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>2.0</td>\n",
" <td>4.0</td>\n",
" <td>4.5</td>\n",
" <td>At Risk Customer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>7.0</td>\n",
" <td>Top Spender</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>1.0</td>\n",
" <td>3.0</td>\n",
" <td>12.0</td>\n",
" <td>At Risk Customer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>5.0</td>\n",
" <td>Top Spender</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" customer_id frequency recency monetary_value segment\n",
"0 1 2.0 0.0 1.5 Other\n",
"1 2 1.0 5.0 2.0 Inactive Customer\n",
"2 3 2.0 4.0 4.5 At Risk Customer\n",
"3 4 2.0 0.0 7.0 Top Spender\n",
"4 5 1.0 3.0 12.0 At Risk Customer\n",
"5 6 1.0 0.0 5.0 Top Spender"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"segments"
]
}
],
"metadata": {
Expand All @@ -492,7 +608,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.10.14"
}
},
"nbformat": 4,
Expand Down
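The quartile-based scoring mentioned in the notebook text can be illustrated with the standard library alone. This is a sketch of the general technique, not the `rfm_segments` implementation: `quartile_scores` and `rfm_scores` are hypothetical helpers, and the 1 to 4 scale and the tie handling are assumptions. The rules that map a score to a label such as "Top Spender" are not shown in this diff, so the sketch stops at the score.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_scores(values, higher_is_better=True):
    """Score each value from 1 to 4 according to the quartile it falls in.

    Needs at least two values, because statistics.quantiles does.
    """
    # Three cut points that split the data into four groups.
    cuts = quantiles(values, n=4, method="inclusive")
    scores = []
    for value in values:
        quartile = bisect_left(cuts, value) + 1  # 1 = lowest, 4 = highest
        # For recency, a smaller value is better, so the scale is reversed.
        scores.append(quartile if higher_is_better else 5 - quartile)
    return scores


def rfm_scores(recency, frequency, monetary):
    """Combine the three quartile scores into a three-digit string."""
    r = quartile_scores(recency, higher_is_better=False)
    f = quartile_scores(frequency)
    m = quartile_scores(monetary)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]


# Hypothetical data for eight customers (not the notebook's test data).
recency = [0, 1, 2, 3, 4, 5, 6, 7]
frequency = [8, 7, 6, 5, 4, 3, 2, 1]
monetary = [80, 70, 60, 50, 40, 30, 20, 10]

print(rfm_scores(recency, frequency, monetary))
```

With this data the first two customers score "444" and the last two score "111". Because each score is a simple lookup against three cut points, the approach stays cheap even for large customer tables, which is consistent with the efficiency claim in the notebook text.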
2 changes: 2 additions & 0 deletions pymc_marketing/clv/__init__.py
Expand Up @@ -25,6 +25,7 @@
)
from pymc_marketing.clv.utils import (
customer_lifetime_value,
rfm_segments,
rfm_summary,
rfm_train_test_split,
)
Expand All @@ -39,6 +40,7 @@
"plot_customer_exposure",
"plot_frequency_recency_matrix",
"plot_probability_alive_matrix",
"rfm_segments",
"rfm_summary",
"rfm_train_test_split",
)