initial commit for aka.ms/econml doc migration #640

Merged
merged 11 commits into from
Jul 27, 2022
1 change: 0 additions & 1 deletion .gitignore
@@ -10,7 +10,6 @@ __pycache__/
*.log
*.out
*.synctex.gz
*.pdf
Collaborator

This change is fine.

If you need to make any further changes to this PR, you might consider also changing azure-pipelines.yml to exempt .gitignore changes so that you don't have to run the full set of checks, since all of your other changes are in the doc subdirectory.


# C extensions
*.so
3 changes: 2 additions & 1 deletion azure-pipelines.yml
@@ -35,6 +35,7 @@ jobs:
foreach ($file in $editedFiles) {
switch -Wildcard ($file) {
"README.md" { Continue }
".gitignore" { Continue }
"econml/_version.py" { Continue }
"prototypes/*" { Continue }
"images/*" { Continue }
@@ -70,7 +71,7 @@ jobs:
- script: 'pip install git+https://github.com/slundberg/shap.git@d1d2700acc0259f211934373826d5ff71ad514de'
displayName: 'Install specific version of shap'

- script: 'pip install sphinx sphinx_rtd_theme'
- script: 'pip install sphinx!=5.1.0 sphinx_rtd_theme'
Collaborator Author

^

displayName: 'Install sphinx'

- script: 'python setup.py build_sphinx -W'
Binary file added doc/Causal-Inference-User-Guide-v4-022520.pdf
Binary file not shown.
4 changes: 2 additions & 2 deletions doc/conf.py
@@ -21,7 +21,7 @@
# -- Project information -----------------------------------------------------

project = 'econml'
copyright = '2019, Microsoft Research'
copyright = '2022, Microsoft Research'
author = 'Microsoft Research'
version = econml.__version__
release = econml.__version__
@@ -119,7 +119,7 @@
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']
html_extra_path = ['map.svg']
html_extra_path = ['map.svg', 'Causal-Inference-User-Guide-v4-022520.pdf', "spec/img"]

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
10 changes: 10 additions & 0 deletions doc/spec/causal_intro.rst
@@ -0,0 +1,10 @@
Introduction to Causal Inference
=================================

If you are new to causal inference, it may be helpful to walk through a quick overview of the concepts and techniques that we refer to throughout the documentation. Below we provide a high-level introduction to causal inference tailored for EconML:

.. raw:: html

    <iframe src="../Causal-Inference-User-Guide-v4-022520.pdf" width="700" height="388"> </iframe>

The folks at DoWhy also have a broader introduction `here <https://causalinference.gitlab.io/kdd-tutorial/>`__.
77 changes: 77 additions & 0 deletions doc/spec/faq.rst
@@ -0,0 +1,77 @@
Frequently Asked Questions (FAQ)
====================================================================

When should I use EconML?
--------------------------

EconML is designed to answer causal questions: what will happen in response to some change in behavior,
prices, or conditions? These questions require different methods than forecasting questions:
what will happen next if everything continues as it has been?


What are the advantages of EconML?
-----------------------------------

EconML offers the broadest range of cutting-edge AI models designed specifically to answer causal questions.
The EconML models also build on familiar Python packages, allowing users to easily select the best model for their question.
Finally, EconML includes custom interpreters to create presentation-ready output.


How do I know if the results make sense?
----------------------------------------

Try comparing the consistency of your estimates across multiple models, including some that make
stronger structural assumptions like linear relationships and some that do not. Pay attention to the
standard errors as well as the point estimates; imprecise estimates should be interpreted with corresponding caution.
While researchers can introduce bias by narrowly fishing for estimates that match their priors, it is also important
to use your expertise to evaluate results. If you estimate that a 5% decrease in price generates
an implausible 5000% increase in sales, you should carefully review your code!
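
To make the point about standard errors concrete, here is a minimal sketch that reports a difference-in-means estimate together with its standard error and an approximate 95% interval. It does not use EconML; the simulated experiment, the function names, and the noise level are illustrative assumptions only.

```python
import math
import random


def diff_in_means(treated, control):
    """Return (estimate, standard_error) for mean(treated) - mean(control)."""
    def mean_and_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
        return m, v

    m1, v1 = mean_and_var(treated)
    m0, v0 = mean_and_var(control)
    return m1 - m0, math.sqrt(v1 / len(treated) + v0 / len(control))


def confidence_interval(estimate, se, z=1.96):
    """Normal-approximation interval; z=1.96 gives roughly 95% coverage."""
    return estimate - z * se, estimate + z * se


def simulate_experiment(n_per_arm, true_effect=1.0, seed=0):
    """Hypothetical randomized experiment with noisy outcomes (assumed setup)."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 3.0) for _ in range(n_per_arm)]
    treated = [true_effect + rng.gauss(0.0, 3.0) for _ in range(n_per_arm)]
    return treated, control


if __name__ == "__main__":
    for n in (20, 20000):
        est, se = diff_in_means(*simulate_experiment(n))
        low, high = confidence_interval(est, se)
        print(f"n per arm={n}: estimate={est:.2f}, se={se:.2f}, "
              f"95% CI=({low:.2f}, {high:.2f})")
```

With only a handful of observations the interval is wide enough that the point estimate alone says little, which is the situation the paragraph above warns about.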

I'm getting causal estimates that don't make sense. What next?
----------------------------------------------------------------
First, carefully check your code for errors and try several causal models.
If your estimates are consistent but implausible, you may have a confounding variable that hasn’t been measured in your data.
Think carefully about the source of the data you are using: was there something unusual going on
during the period when the data were collected (for example, a holiday or an economic downturn)?
Is there something unusual about your sample (for example, all men with pre-existing heart conditions)?


What if I don't have a good instrument, can't run an experiment, and don't observe all confounders?
------------------------------------------------------------------------------------------------------------
In this case, no statistical approach can perfectly isolate the causal effect of the treatment on the outcome.
DML, OrthoForest, or MetaLearners, fitted with all the confounders you can observe,
will deliver the best available approximation of the causal effect by minimizing the bias from those confounders.
Some bias from the unobserved confounders will remain, so keep it in mind when using these estimates.
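
The following simulation sketches why adjusting for the confounders you do observe helps but does not remove all bias. It is not an EconML estimator: it uses plain linear regression and the Frisch-Waugh partialling-out trick, and the data-generating process, coefficients, and function names are assumptions made up for illustration.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(x, y):
    """Residuals of y after a linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, true_effect=1.0, seed=0):
    """Assumed setup: one observed and one unobserved confounder."""
    rng = random.Random(seed)
    x_obs, t, y = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # confounder we measure
        u = rng.gauss(0.0, 1.0)  # confounder we do not measure
        ti = x + 0.5 * u + rng.gauss(0.0, 1.0)
        yi = true_effect * ti + 3.0 * x + 1.0 * u + rng.gauss(0.0, 1.0)
        x_obs.append(x)
        t.append(ti)
        y.append(yi)
    return x_obs, t, y


def naive_and_adjusted(x_obs, t, y):
    """Slope of y on t without controls, then after partialling out x_obs."""
    naive = slope(t, y)
    adjusted = slope(residualize(x_obs, t), residualize(x_obs, y))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = naive_and_adjusted(*simulate())
    print(f"true effect=1.00, naive={naive:.2f}, adjusted={adjusted:.2f}")
```

In this setup the adjusted estimate lands much closer to the true effect than the naive one, yet it stays above it because the unobserved confounder is still at work.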


How can I test whether I'm identifying the causal effect?
------------------------------------------------------------
You are identifying a valid causal effect if and only if the assumptions of the causal model
underlying the estimation routine are correct. Those assumptions are often hard to test (though the `DoWhy <https://py-why.github.io/dowhy/>`__ package may help).
Given those assumptions, the EconML package allows you to fit the best causal model you can.
Many models store a final-stage fit metric that can be used to validate how well the causal model predicts out of sample,
which is a good diagnostic of the quality of your model.


How do I give feedback?
------------------------------------

This project welcomes contributions and suggestions. Most contributions require you to agree to
a Contributor License Agreement (CLA) declaring that you have the right to, and actually do,
grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.


When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment).
Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.


This project has adopted the Microsoft Open Source Code of Conduct.
For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.







Binary file added doc/spec/img/Attribution.png
Binary file added doc/spec/img/Recommendation.png
Binary file added doc/spec/img/Segmentation.png
Binary file added doc/spec/img/imgFamiliar.png
Binary file added doc/spec/img/imgFlexible.png
Binary file added doc/spec/img/imgUnified.png
125 changes: 73 additions & 52 deletions doc/spec/motivation.rst
@@ -31,55 +31,76 @@ python API.
Motivating Examples
===================

Customer Targeting
------------------

An important problem in modern business analytics is building automated tools to prioritize customer
acquisition and personalize customer interactions to increase sales and revenue. Typically businesses
will offer personalized incentives to customers to increase spend or increase the level of
engagement via more human resources. Any such personalized intervention corresponds to a monetary
investment and the main question that business analytics are called to answer is: what is the return
on investment (ROI)?

Analyzing the ROI is inherently a treatment effect question: what was the effect of an investment
in a particular customer on that customer's spend? Understanding how this return on investment varies across
customers can enable more targeted investment policies and increased ROI via better targeting. Using historical
data from deployed investments, and estimating the heterogeneous treatment effect via any of
the proposed methods, business analysts can learn data-driven customer targeting and
prioritization policies in an automated manner.

Personalized Pricing
--------------------

Personalized discounts have become very widespread in the digital economy. To set the optimal
personalized discount policy, a business needs to understand the effect
of a drop in price on a customer's demand for a product as a function of customer
characteristics. The estimation of such personalized demand elasticities can also be
phrased in the language of heterogeneous treatment effects, where the treatment
is the price (or typically log of price), the outcome is the demand (or typically log of demand),
and the effect varies with observable features of the customer. Hence, estimation of heterogeneous
treatment effects can lead to optimal pricing policies.


Stratification in Clinical Trials
----------------------------------------

Which patients should be selected for a clinical trial? If we want to demonstrate
that a clinical treatment has an effect on at least some subset of a population, then
fully randomized clinical trials are inappropriate as they will solely estimate
average effects. Using heterogeneous treatment effect techniques, we can use
observational data to come up with estimates of these effects and identify
good candidate patients for a clinical trial that our model estimates have high
treatment effects.

Learning Click-Through-Rates
----------------------------

In the design of a page layout, and more importantly in ad placement, it is important
to understand the click-through-rate of page components (e.g. ads) in different positions
on a page. Even though the modern approach is to run multiple A/B tests, when such
page components involve revenue considerations (such as ad placement), observational
data can help guide which A/B tests to run. Heterogeneous treatment effect estimation
can provide estimates of the click-through-rate of page components from
observational data. In this setting, the treatment is simply whether the component is
placed on that page position and the response is whether the user clicked on it.
EconML is designed to measure the causal effect of some treatment variable(s) T on an outcome variable Y, controlling for a set of features X. Use cases include:

Recommendation A/B testing
-----------------------------

*Interpret experiments with imperfect compliance*

.. image:: img/Recommendation.png
    :alt: Recommendation A/B testing logo

**Question**: A travel website would like to know whether joining a membership program
causes users to spend more time engaging with the website.

**Problem**: They can’t look directly at existing data, comparing members and non-members,
because the customers who chose to become members are likely already more engaged than other users.
Nor can they run a direct A/B test because they can’t force users to sign up for membership.

**Solution**: The company had run an earlier experiment to test the value of a new,
faster sign-up process. EconML’s DRIV estimator uses this experimental nudge towards membership
as an instrument that generates random variation in the likelihood of membership.
The DRIV model adjusts for the fact that not every customer who was offered the easier sign-up
became a member and returns the effect of membership rather than the effect of receiving the quick sign-up.
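
The instrument logic can be illustrated with a small simulation. The sketch below is not EconML's DRIV estimator; it uses the much simpler Wald (ratio) estimator, which divides the effect of the nudge on the outcome by the effect of the nudge on membership. The data-generating process, coefficients, and function names are assumptions chosen for illustration.

```python
import random


def simulate(n=50000, true_effect=2.0, seed=0):
    """Assumed setup: a randomized sign-up nudge, self-selected membership."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        engagement = rng.gauss(0.0, 1.0)  # unobserved confounder
        nudge = rng.random() < 0.5        # randomized instrument
        # Engaged users and nudged users are more likely to join.
        p_join = 0.1 + 0.3 * nudge + 0.5 * (engagement > 0)
        member = rng.random() < p_join
        time_on_site = (5.0 + true_effect * member
                        + 1.5 * engagement + rng.gauss(0.0, 1.0))
        rows.append((nudge, member, time_on_site))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_estimate(rows):
    """Members minus non-members; confounded by engagement."""
    return (mean(y for _, t, y in rows if t)
            - mean(y for _, t, y in rows if not t))


def wald_estimate(rows):
    """(Effect of nudge on outcome) / (effect of nudge on membership)."""
    outcome_gap = (mean(y for z, _, y in rows if z)
                   - mean(y for z, _, y in rows if not z))
    uptake_gap = (mean(float(t) for z, t, _ in rows if z)
                  - mean(float(t) for z, t, _ in rows if not z))
    return outcome_gap / uptake_gap


if __name__ == "__main__":
    rows = simulate()
    print(f"true effect=2.00, naive={naive_estimate(rows):.2f}, "
          f"wald={wald_estimate(rows):.2f}")
```

The naive comparison overstates the effect because engaged users select into membership, while the ratio estimator recovers something close to the simulated effect. DRIV additionally models heterogeneity and uses machine-learned nuisance models, which this sketch leaves out.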

Link to Jupyter notebook:
`Recommendation A/B Testing <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Recommendation%20AB%20Testing%20at%20An%20Online%20Travel%20Company.ipynb>`__

More details:
`Trip Advisor Case Study <https://www.microsoft.com/en-us/research/uploads/prod/2020/04/MSR_ALICE_casestudy_2020.pdf>`__


Customer Segmentation
----------------------

*Estimate individualized responses to incentives*

.. image:: img/Segmentation.png
    :alt: Customer Segmentation logo

**Question**: A media subscription service would like to offer targeted discounts
through a personalized pricing plan.

**Problem**: They observe many features of their customers,
but are not sure which customers will respond most to a lower price.

**Solution**: EconML’s DML estimator uses price variations in existing data,
along with a rich set of user features, to estimate heterogeneous price sensitivities
that vary with multiple customer features.
The tree interpreter provides a presentation-ready summary of the key features
that explain the biggest differences in responsiveness to a discount.
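
The idea behind estimating price sensitivities that differ across customers can be sketched with a toy version of the partialling-out step used in double machine learning. This is a deliberately simplified illustration, not EconML's DML implementation: the nuisance models are plain linear regressions, there is no cross-fitting, heterogeneity is handled by splitting on a single binary feature, and all data, coefficients, and names are assumptions.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(x, y):
    """Residuals of y after a linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0):
    """Assumed setup: price depends on income; elasticity depends on usage."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.gauss(0.0, 1.0)
        heavy_user = rng.random() < 0.5
        log_price = 0.5 * income + rng.gauss(0.0, 1.0)
        elasticity = -0.5 if heavy_user else -2.0
        log_demand = (elasticity * log_price + 1.0 * income
                      + rng.gauss(0.0, 1.0))
        rows.append((heavy_user, income, log_price, log_demand))
    return rows


def elasticity_by_segment(rows):
    """Return {segment: (naive_slope, partialled_out_slope)}."""
    result = {}
    for segment in (True, False):
        income = [r[1] for r in rows if r[0] == segment]
        price = [r[2] for r in rows if r[0] == segment]
        demand = [r[3] for r in rows if r[0] == segment]
        naive = slope(price, demand)
        # Remove what income explains in both price and demand, then regress.
        adjusted = slope(residualize(income, price),
                         residualize(income, demand))
        result[segment] = (naive, adjusted)
    return result


if __name__ == "__main__":
    for segment, (naive, adjusted) in elasticity_by_segment(simulate()).items():
        label = "heavy users" if segment else "light users"
        print(f"{label}: naive={naive:.2f}, partialled-out={adjusted:.2f}")
```

In this toy setup the residual-on-residual slopes recover the simulated elasticities for each segment, while the naive slopes are pulled toward zero because higher-income customers both face higher prices and buy more.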

Link to Jupyter notebook:
`Customer Segmentation <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Customer%20Segmentation%20at%20An%20Online%20Media%20Company.ipynb>`__.

Multi-investment Attribution
-----------------------------
*Distinguish the effects of multiple outreach efforts*

.. image:: img/Attribution.png
    :alt: Multi-investment Attribution logo

**Question**: A startup would like to know the most effective approach for recruiting new customers:
price discounts, technical support to ease adoption, or a combination of the two.

**Problem**: The risk of losing customers makes experiments across outreach efforts too expensive.
So far, customers have been offered incentives strategically;
for example, larger businesses are more likely to get technical support.

**Solution**: EconML’s Doubly Robust Learner model jointly estimates the effects of multiple discrete treatments.
The model uses flexible functions of observed customer features to filter out confounding correlations
in existing data and deliver the causal effect of each effort on revenue.
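
A doubly robust estimate for several discrete treatments can be sketched as follows. This is not EconML's Doubly Robust Learner: the outcome and propensity models here are simple cell averages over one binary feature, and the simulated data, effect sizes, and function names are illustrative assumptions. The score it averages is the standard augmented inverse-propensity (AIPW) form, a model prediction plus an inverse-propensity-weighted correction.

```python
import random
from collections import defaultdict

# Assumed treatments: 0 = no outreach, 1 = discount, 2 = technical support.
TRUE_EFFECT = {0: 0.0, 1: 1.0, 2: 2.0}


def simulate(n=30000, seed=0):
    """Assumed setup: large businesses get support more often and spend more."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        large = rng.random() < 0.4
        weights = [0.2, 0.3, 0.5] if large else [0.5, 0.3, 0.2]
        t = rng.choices([0, 1, 2], weights=weights)[0]
        revenue = 10.0 + 5.0 * large + TRUE_EFFECT[t] + rng.gauss(0.0, 1.0)
        rows.append((large, t, revenue))
    return rows


def naive_effects(rows):
    """Raw difference in mean revenue versus the no-outreach group."""
    total, count = defaultdict(float), defaultdict(int)
    for _, t, y in rows:
        total[t] += y
        count[t] += 1
    base = total[0] / count[0]
    return {t: total[t] / count[t] - base for t in (1, 2)}


def doubly_robust_effects(rows):
    """AIPW estimate of each treatment's effect versus no outreach."""
    n_x, n_xt, sum_xt = defaultdict(int), defaultdict(int), defaultdict(float)
    for x, t, y in rows:
        n_x[x] += 1
        n_xt[(x, t)] += 1
        sum_xt[(x, t)] += y
    potential = {}
    for t in (0, 1, 2):
        total = 0.0
        for x, observed_t, y in rows:
            mu = sum_xt[(x, t)] / n_xt[(x, t)]    # outcome model
            propensity = n_xt[(x, t)] / n_x[x]    # treatment model
            score = mu
            if observed_t == t:
                score += (y - mu) / propensity    # weighted correction
            total += score
        potential[t] = total / len(rows)
    return {t: potential[t] - potential[0] for t in (1, 2)}


if __name__ == "__main__":
    rows = simulate()
    print("naive:", naive_effects(rows))
    print("doubly robust:", doubly_robust_effects(rows))
```

Because larger businesses both receive more support and generate more revenue, the naive comparison inflates both effects, while the doubly robust scores stay close to the simulated values.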

Link to Jupyter notebook:
`Multi-investment Attribution <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Multi-investment%20Attribution%20at%20A%20Software%20Company.ipynb>`__.
32 changes: 32 additions & 0 deletions doc/spec/overview.rst
@@ -0,0 +1,32 @@
Overview
=========

EconML is a Python package that applies the power of machine learning techniques to estimate individualized causal responses from observational or experimental data. The suite of estimation methods provided in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.

EconML is open source software developed by the `ALICE <https://www.microsoft.com/en-us/research/project/alice/>`__ team at Microsoft Research.

.. raw:: html

    <p></p>
    <div class="ms-grid " style = "text-align: left; box-sizing: border-box; display: block; margin-left: auto; margin-right: auto; max-width: 1600px; position: relative; padding-left: 0; padding-right: 0; width: 100%;">
    <div class="ms-row" style = "text-align: left; box-sizing: border-box; -webkit-box-align: stretch; align-items: stretch; display: flex; flex-wrap: wrap; margin-left: 3px; margin-right: 3px;">
    <div class="m-col-8-24 x-hidden-focus" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
    <p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656358 aligncenter x-hidden-focus" src="../imgFlexible.png" alt="Flexible icon" width="92" height="92"></p><p style="text-align: center"><b>Flexible</b></p><p class="x-hidden-focus">Allows for flexible model forms that do not impose strong assumptions, including models of heterogeneous responses to treatment.</p><p> </p></div>
    <div class="m-col-8-24" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
    <p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656355 aligncenter" src="../imgUnified.png" alt="Unified icon" width="92" height="92"></p><p style="text-align: center"><b>Unified</b></p><p>Broad set of methods representing latest advances in the econometrics and machine learning literature within a unified API.</p><p> </p></div>
    <div class="m-col-8-24" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
    <p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656352 aligncenter" src="../imgFamiliar.png" alt="Familiar icon" width="92" height="92"></p><p style="text-align: center"><b>Familiar Interface</b></p><p class="x-hidden-focus">Built on standard Python packages for machine learning and data analysis.</p><p> </p></div>
    <p></p> </div>
    </div>

**Why causality?**

Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals.

**Why not just a vanilla machine learning solution?**

Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior.

**Why causal machine learning/EconML?**

Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data.
14 changes: 3 additions & 11 deletions doc/spec/spec.rst
@@ -1,19 +1,10 @@
EconML User Guide
=================

Causal machine learning applies the power of machine learning techniques to answer causal questions.

* Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals.

* Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior.

* Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data.

The EconML Python SDK, developed by the ALICE team at MSR New England, incorporates individual machine learning steps into interpretable causal models. By reducing the need for expert judgment, these innovations improve the reliability of what-if predictions and empower data scientists without extensive economic training to conduct causal analysis using existing data.


.. toctree::
overview
motivation
causal_intro
api
flowchart
comparison
@@ -23,6 +14,7 @@ The EconML Python SDK, developed by the ALICE team at MSR New England, incorpora
inference
interpretability
references
faq

.. todo::
benchmark