RFM Segmentation #680

Merged
merged 24 commits into from
May 28, 2024
Changes from 14 commits
Commits
f022c60
init rfm_segments func
ColtAllen Apr 29, 2024
592a2a4
TODOs
ColtAllen Apr 29, 2024
b2de3eb
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 1, 2024
d653915
docstrings and for loop
ColtAllen May 1, 2024
199ad46
docstrings and for loop
ColtAllen May 1, 2024
3c4f684
Merge branch 'rfm_segment' of https://github.com/ColtAllen/pymc-marke…
ColtAllen May 1, 2024
aa5f871
WIP dev notebook debugging
ColtAllen May 2, 2024
e1fdc93
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 4, 2024
e5f112a
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 10, 2024
bea0bbd
checkpoint commit for remote pull
ColtAllen May 10, 2024
7cda576
Merge branch 'rfm_segment' of https://github.com/ColtAllen/pymc-marke…
ColtAllen May 10, 2024
00c58c5
code testing in dev notebook
ColtAllen May 10, 2024
bcc7274
unit tests added
ColtAllen May 11, 2024
210c245
dev notebook cleanup
ColtAllen May 11, 2024
798eb3b
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 26, 2024
eca910d
clean up type hints
ColtAllen May 27, 2024
2cfae06
comments and code cleanup
ColtAllen May 27, 2024
2d685e5
docstrings
ColtAllen May 27, 2024
763060b
move formatting to rfm_summary and quickstart edits
ColtAllen May 28, 2024
9710df5
fix rfm_train_test_split bug
ColtAllen May 28, 2024
e18d0c6
Merge branch 'pymc-labs:main' into rfm_segment
ColtAllen May 28, 2024
62caf94
added test for rfm_quartile_labels
ColtAllen May 28, 2024
93fb6f9
added rfm score warning
ColtAllen May 28, 2024
91e22b3
Merge branch 'main' into rfm_segment
ColtAllen May 28, 2024
186 changes: 151 additions & 35 deletions docs/source/notebooks/clv/dev/utilities_plotting.ipynb
@@ -5,15 +5,7 @@
"execution_count": 1,
"id": "435ed203-5c3c-4efc-93d1-abac66ce7187",
"metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n"
-]
-}
-],
+"outputs": [],
"source": [
"from pymc_marketing.clv import utils\n",
"\n",
@@ -30,7 +22,7 @@
},
{
"cell_type": "code",
-"execution_count": 69,
+"execution_count": 2,
"id": "7de7f396-1d5b-4457-916b-c29ed90aa132",
"metadata": {},
"outputs": [],
@@ -66,7 +58,7 @@
},
{
"cell_type": "code",
-"execution_count": 70,
+"execution_count": 3,
"id": "932e8db6-78cf-49df-aa4a-83ee6584e5dd",
"metadata": {},
"outputs": [
@@ -196,7 +188,7 @@
"13 6 2015-02-02 True"
]
},
-"execution_count": 70,
+"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -223,7 +215,7 @@
},
{
"cell_type": "code",
-"execution_count": 74,
+"execution_count": 4,
"id": "4c0a7de5-8825-40af-84e5-6cd0ad26a0e3",
"metadata": {},
"outputs": [
@@ -259,57 +251,57 @@
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
-" <td>1.0</td>\n",
+" <td>2.0</td>\n",
" <td>5.0</td>\n",
" <td>5.0</td>\n",
-" <td>2.0</td>\n",
+" <td>1.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
-" <td>0.0</td>\n",
+" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>5.0</td>\n",
-" <td>0.0</td>\n",
+" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
-" <td>1.0</td>\n",
+" <td>2.0</td>\n",
" <td>1.0</td>\n",
" <td>5.0</td>\n",
-" <td>5.0</td>\n",
+" <td>4.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
-" <td>1.0</td>\n",
+" <td>2.0</td>\n",
" <td>3.0</td>\n",
" <td>3.0</td>\n",
-" <td>8.0</td>\n",
+" <td>7.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
-" <td>0.0</td>\n",
+" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>3.0</td>\n",
-" <td>0.0</td>\n",
+" <td>12.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" customer_id frequency recency T monetary_value\n",
-"0 1 1.0 5.0 5.0 2.0\n",
-"1 2 0.0 0.0 5.0 0.0\n",
-"2 3 1.0 1.0 5.0 5.0\n",
-"3 4 1.0 3.0 3.0 8.0\n",
-"4 5 0.0 0.0 3.0 0.0"
+"0 1 2.0 5.0 5.0 1.5\n",
+"1 2 1.0 0.0 5.0 2.0\n",
+"2 3 2.0 1.0 5.0 4.5\n",
+"3 4 2.0 3.0 3.0 7.0\n",
+"4 5 1.0 0.0 3.0 12.0"
]
},
-"execution_count": 74,
+"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -323,7 +315,7 @@
" observation_period_end = \"2015-02-06\",\n",
" datetime_format = \"%Y-%m-%d\",\n",
" time_unit = \"W\",\n",
-" include_first_transaction=False,\n",
+" include_first_transaction=True,\n",
")\n",
"\n",
"rfm_df.head()"
@@ -339,7 +331,7 @@
},
{
"cell_type": "code",
-"execution_count": 76,
+"execution_count": 5,
"id": "761edfe9-1b69-4966-83bf-4f1242eda2d5",
"metadata": {},
"outputs": [
@@ -450,7 +442,7 @@
"4 0.0 5.0 "
]
},
-"execution_count": 76,
+"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -467,13 +459,137 @@
"train_test.head()"
]
},
+{
+"cell_type": "markdown",
+"id": "73dc1b93-6a4f-4171-b838-30759b2c1e0e",
+"metadata": {},
+"source": [
+"`rfm_segments` assigns customers to segments based on their recency, frequency, and monetary value. It uses a quartile-based RFM score approach that is computationally efficient, although defining custom segments is a subjective exercise. The returned dataframe cannot be used for modeling because it does not zero out the initial transactions."
+]
+},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 40,
"id": "c7b3f800-8dfb-4e5a-b939-5f908281563c",
"metadata": {},
"outputs": [],
-"source": []
+"source": [
+"segments = utils.rfm_segments(\n",
+" test_data, \n",
+" customer_id_col = \"id\", \n",
+" datetime_col = \"date\", \n",
+" monetary_value_col = \"monetary_value\",\n",
+" observation_period_end = \"2015-02-06\",\n",
+" datetime_format = \"%Y-%m-%d\",\n",
+" time_unit = \"W\",\n",
+")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 17,
+"id": "932ac4e5-361e-42fa-97d3-d8e508128944",
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/html": [
+"<div>\n",
+"<style scoped>\n",
+" .dataframe tbody tr th:only-of-type {\n",
+" vertical-align: middle;\n",
+" }\n",
+"\n",
+" .dataframe tbody tr th {\n",
+" vertical-align: top;\n",
+" }\n",
+"\n",
+" .dataframe thead th {\n",
+" text-align: right;\n",
+" }\n",
+"</style>\n",
+"<table border=\"1\" class=\"dataframe\">\n",
+" <thead>\n",
+" <tr style=\"text-align: right;\">\n",
+" <th></th>\n",
+" <th>customer_id</th>\n",
+" <th>frequency</th>\n",
+" <th>recency</th>\n",
+" <th>monetary_value</th>\n",
+" <th>segment</th>\n",
+" </tr>\n",
+" </thead>\n",
+" <tbody>\n",
+" <tr>\n",
+" <th>0</th>\n",
+" <td>1</td>\n",
+" <td>2.0</td>\n",
+" <td>0.0</td>\n",
+" <td>1.5</td>\n",
+" <td>Other</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>1</th>\n",
+" <td>2</td>\n",
+" <td>1.0</td>\n",
+" <td>5.0</td>\n",
+" <td>2.0</td>\n",
+" <td>Inactive Customer</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>2</th>\n",
+" <td>3</td>\n",
+" <td>2.0</td>\n",
+" <td>4.0</td>\n",
+" <td>4.5</td>\n",
+" <td>At Risk Customer</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>3</th>\n",
+" <td>4</td>\n",
+" <td>2.0</td>\n",
+" <td>0.0</td>\n",
+" <td>7.0</td>\n",
+" <td>Top Spender</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>4</th>\n",
+" <td>5</td>\n",
+" <td>1.0</td>\n",
+" <td>3.0</td>\n",
+" <td>12.0</td>\n",
+" <td>At Risk Customer</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>5</th>\n",
+" <td>6</td>\n",
+" <td>1.0</td>\n",
+" <td>0.0</td>\n",
+" <td>5.0</td>\n",
+" <td>Top Spender</td>\n",
+" </tr>\n",
+" </tbody>\n",
+"</table>\n",
+"</div>"
+],
+"text/plain": [
+" customer_id frequency recency monetary_value segment\n",
+"0 1 2.0 0.0 1.5 Other\n",
+"1 2 1.0 5.0 2.0 Inactive Customer\n",
+"2 3 2.0 4.0 4.5 At Risk Customer\n",
+"3 4 2.0 0.0 7.0 Top Spender\n",
+"4 5 1.0 3.0 12.0 At Risk Customer\n",
+"5 6 1.0 0.0 5.0 Top Spender"
+]
+},
+"execution_count": 17,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"segments"
+]
}
],
"metadata": {
@@ -492,7 +608,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.18"
+"version": "3.10.14"
}
},
"nbformat": 4,
2 changes: 2 additions & 0 deletions pymc_marketing/clv/__init__.py
@@ -25,6 +25,7 @@
)
from pymc_marketing.clv.utils import (
customer_lifetime_value,
+rfm_segments,
rfm_summary,
rfm_train_test_split,
)
@@ -39,6 +40,7 @@
"plot_customer_exposure",
"plot_frequency_recency_matrix",
"plot_probability_alive_matrix",
+"rfm_segments",
"rfm_summary",
"rfm_train_test_split",
)
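
For readers unfamiliar with the quartile-based RFM scoring mentioned in the notebook's markdown cell, here is a minimal, dependency-free sketch of the general idea. It is not the implementation added in this PR: `quartile_scores` and `rfm_scores` are hypothetical names, the tie handling is one arbitrary choice, and it assumes that recency counts time since the last purchase (so lower is better). The input values are copied from the `segments` output shown in the notebook diff.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def quartile_scores(values: Sequence[float], higher_is_better: bool = True) -> list[int]:
    """Score each value from 1 (worst quartile) to 4 (best quartile).

    Hypothetical helper for illustration only. Tied values receive the
    same score, based on how many values are strictly worse than them.
    """
    n = len(values)
    ordered = sorted(values)
    scores = []
    for v in values:
        if higher_is_better:
            worse = bisect_left(ordered, v)       # values strictly smaller
        else:
            worse = n - bisect_right(ordered, v)  # values strictly larger
        scores.append(worse * 4 // n + 1)
    return scores


# Values copied from the `segments` dataframe in the notebook output.
frequency = [2.0, 1.0, 2.0, 2.0, 1.0, 1.0]
recency = [0.0, 5.0, 4.0, 0.0, 3.0, 0.0]  # assumed: time since last purchase
monetary_value = [1.5, 2.0, 4.5, 7.0, 12.0, 5.0]

r = quartile_scores(recency, higher_is_better=False)
f = quartile_scores(frequency)
m = quartile_scores(monetary_value)

# Concatenate the three digits into one RFM score string per customer.
rfm_scores = [f"{ri}{fi}{mi}" for ri, fi, mi in zip(r, f, m)]
print(rfm_scores)
```

Mapping score strings to named segments such as "Top Spender" is left out, because the mapping used by `rfm_segments` is not shown in this diff.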