CausalPipe is a Python wrapper built on Causal-Learn and Lavaan that offers a predefined and well-formalized process for causal analysis tailored for everyday users. It provides intuitive tools for data preparation, constructing and orienting causal graphs, and visualizing results, supporting both ordinal and continuous variables.
- Data Preprocessing: Handle missing values using multiple imputation (MICE), encode categorical variables, standardize features, and perform feature selection based on correlation.
- Skeleton Identification: Identify the global skeleton of the causal graph using methods like Fast Adjacency Search (FAS) or Bootstrap-based Causal Structure Learning (BCSL).
- Edge Orientation: Orient edges in the skeleton using algorithms such as Fast Causal Inference (FCI) or Hill Climbing.
- Causal Effect Estimation: Estimate causal effects using various methods, including Partial Pearson Correlation, Partial Spearman Correlation, Conditional Mutual Information (MI), Kernel Conditional Independence (KCI), Structural Equation Modeling (SEM), and Hill Climbing-based SEM.
- Visualization: Generate and save visualizations for correlation graphs, skeletons, oriented graphs, and SEM results.
- Modular Configuration: Easily configure different aspects of the pipeline through dataclasses, allowing for flexible and customizable causal discovery workflows.
- Integration with R: Utilize R's `lavaan` package for advanced Structural Equation Modeling directly within Python using `rpy2`.
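To make "Partial Pearson Correlation" concrete, here is a minimal, dependency-free sketch for the simplest case of a single conditioning variable. It is not CausalPipe's implementation: the function names `pearson` and `partial_pearson` are hypothetical, and the pipeline presumably conditions on larger sets of variables. The sketch uses the standard first-order formula r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)).

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_pearson(x, y, z):
    """Correlation of x and y after removing the linear effect of one variable z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2)),
    which is undefined when z is perfectly correlated with x or y.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


if __name__ == "__main__":
    # z drives both x and y; the small perturbations are fixed, illustrative values.
    z = [1, 2, 3, 4, 5, 6, 7, 8]
    x = [zi + e for zi, e in zip(z, [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.3])]
    y = [zi + e for zi, e in zip(z, [-0.1, 0.2, 0.3, -0.2, -0.3, 0.1, -0.2, 0.2])]
    print(f"marginal r(x, y)    = {pearson(x, y):.3f}")
    print(f"partial r(x, y | z) = {partial_pearson(x, y, z):.3f}")
```

Here x and y are strongly correlated only because both follow z; once z is partialled out, the remaining association is much weaker. That gap is exactly what conditional-independence-based skeleton and effect estimation exploit.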
You can install `causal-pipe` via PyPI using `pip`:

```bash
pip install causal-pipe
```
CausalPipe relies on several Python and R packages. Ensure that you have the following dependencies installed:
- Python 3.6 or higher (note that the pinned `pandas==2.2.3` and `networkx==3.2.1` require Python 3.9 or newer)
- R: Required for Structural Equation Modeling (`lavaan`) and multiple imputation (`mice`).
- Python Packages:
  - `numpy>=1.18.0`
  - `scipy>=1.4.0`
  - `scikit-learn>=0.22.0`
  - `causal-learn==0.1.3.8`
  - `bcsl-python==0.8.0`
  - `rpy2==3.5.16`
  - `npeet-plus==0.2.0`
  - `networkx==3.2.1`
  - `pandas==2.2.3`
  - `factor_analyzer==0.5.1`
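Before running the pipeline, it can help to confirm which of the pinned packages are actually installed and whether an R interpreter is on your `PATH`. The script below is a hypothetical helper, not part of CausalPipe; it uses only the standard library (`importlib.metadata` and `shutil.which`) and checks the exact (`==`) pins listed above, leaving out the `>=` minimums.

```python
import shutil
from importlib import metadata

# Exact pins copied from the dependency list (hypothetical helper, not part of CausalPipe).
PINNED = {
    "causal-learn": "0.1.3.8",
    "bcsl-python": "0.8.0",
    "rpy2": "3.5.16",
    "npeet-plus": "0.2.0",
    "networkx": "3.2.1",
    "pandas": "2.2.3",
    "factor_analyzer": "0.5.1",
}


def check_environment(pinned=PINNED):
    """Return ({package: (installed_version_or_None, pinned_version)}, r_found)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    # R is needed for lavaan and mice; look for either R or Rscript on PATH.
    r_found = shutil.which("R") is not None or shutil.which("Rscript") is not None
    return report, r_found


if __name__ == "__main__":
    report, r_found = check_environment()
    for name, (installed, wanted) in report.items():
        status = "OK" if installed == wanted else "MISMATCH"
        print(f"{name}: installed={installed} pinned={wanted} [{status}]")
    print("R found on PATH:", r_found)
```

Finding R on `PATH` does not guarantee that `rpy2` can locate it or that `lavaan` and `mice` are installed, so treat this as a first sanity check only.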
Begin by defining the configuration for your causal discovery pipeline using the `CausalPipeConfig` dataclass. This includes specifying variable types, preprocessing parameters, skeleton identification methods, edge orientation methods, and causal effect estimation methods.
```python
from causal_pipe.pipe_config import (
    DataPreprocessingParams,
    CausalPipeConfig,
    VariableTypes,
    FASSkeletonMethod,
    FCIOrientationMethod,
    CausalEffectMethod,
)

# Define preprocessing parameters
preprocessor_params = DataPreprocessingParams(
    cat_to_codes=False,
    standardize=True,
    # keep_only_correlated_with=None,
    # filter_method="mi",
    # filter_threshold=0.1,
    handling_missing="impute",
    imputation_method="mice",
    use_r_mice=True,
    full_obs_cols=None,
)

# Define variable types
variable_types = VariableTypes(
    continuous=["age", "income"],
    ordinal=["education_level"],
    nominal=["gender", "diagnosis_1", "diagnosis_2"],
)

# Initialize the configuration
config = CausalPipeConfig(
    variable_types=variable_types,
    preprocessing_params=preprocessor_params,
    skeleton_method=FASSkeletonMethod(),
    orientation_method=FCIOrientationMethod(),
    causal_effect_methods=[CausalEffectMethod(name="pearson")],
    study_name="causal_analysis",
    output_path="./output",
    show_plots=True,
    verbose=True,
)
```
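The commented-out `keep_only_correlated_with`, `filter_method`, and `filter_threshold` options point to a correlation-based feature screen. Assuming they keep only the columns whose absolute association with a named target column reaches `filter_threshold`, the idea looks roughly like the sketch below, which uses Pearson correlation (one of the filter methods these examples mention). The function `keep_correlated` is hypothetical and is not CausalPipe's implementation.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def keep_correlated(columns, target, threshold):
    """Hypothetical correlation screen (not CausalPipe's implementation).

    columns: dict mapping column name -> list of numeric values
    target: column every other column is screened against
    threshold: minimum absolute Pearson correlation needed to keep a column
    Returns the target followed by the kept column names, in input order.
    """
    kept = [target]
    for name, values in columns.items():
        if name != target and abs(pearson(values, columns[target])) >= threshold:
            kept.append(name)
    return kept


if __name__ == "__main__":
    data = {
        "outcome": [1, 2, 3, 4, 5],
        "a": [2, 4, 6, 8, 10],   # perfectly correlated with outcome
        "b": [1, -1, 1, -1, 1],  # uncorrelated with outcome
        "c": [3, 3, 3, 3, 3],    # constant, so no usable correlation
    }
    print(keep_correlated(data, "outcome", threshold=0.2))  # ['outcome', 'a']
```

A screen like this is cheap but purely marginal: a variable that matters only in combination with others can be dropped, which is one reason to leave the filter disabled unless the variable set is large.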
Create an instance of the `CausalPipe` class by passing the configuration object.
```python
from causal_pipe import CausalPipe

# Initialize the toolkit
causal_pipe = CausalPipe(config)
```
Use the `run_pipeline` method to execute the full causal discovery process, including data preprocessing, skeleton identification, edge orientation, and causal effect estimation.
```python
import pandas as pd

# Load your data
data = pd.read_csv("your_data.csv")

# Run the causal discovery pipeline
causal_pipe.run_pipeline(data)
```
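Since the column names in `VariableTypes` must line up with the columns of the loaded data, a quick pre-flight check can catch typos before a long run. How CausalPipe treats columns that are present in the data but not typed is not described here, so the helper below, `check_variable_types`, is a hypothetical sketch that simply reports missing, untyped, and doubly typed names. With pandas you would pass `data.columns` as the first argument.

```python
def check_variable_types(df_columns, continuous=(), ordinal=(), nominal=()):
    """Hypothetical pre-flight check comparing declared variable types with data columns.

    Returns a dict with:
      missing:    names declared in a type list but absent from the data
      untyped:    data columns not declared in any type list
      duplicated: names declared in more than one place
    """
    declared = list(continuous) + list(ordinal) + list(nominal)
    columns = set(df_columns)
    missing = sorted(set(declared) - columns)
    untyped = sorted(columns - set(declared))
    duplicated = sorted({name for name in declared if declared.count(name) > 1})
    return {"missing": missing, "untyped": untyped, "duplicated": duplicated}


if __name__ == "__main__":
    # Column names taken from the configuration example; "zip" is a made-up extra column.
    columns = ["age", "income", "education_level", "gender", "diagnosis_1", "diagnosis_2", "zip"]
    print(
        check_variable_types(
            columns,
            continuous=["age", "income"],
            ordinal=["education_level"],
            nominal=["gender", "diagnosis_1", "diagnosis_2"],
        )
    )
```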
Below is an example demonstrating how to configure and run the full causal discovery pipeline using `CausalPipe`.
```python
import numpy as np
import pandas as pd

# Assumes `causal_pipe` was created from the configuration defined above,
# whose VariableTypes match the columns of this DataFrame.

# Create a dummy DataFrame
np.random.seed(42)
df = pd.DataFrame(
    {
        "age": np.random.randint(20, 70, size=100),
        "income": np.random.normal(50000, 15000, size=100),
        "education_level": np.random.randint(1, 5, size=100),
        "gender": np.random.choice(["Male", "Female"], size=100),
        "diagnosis_1": np.random.randint(0, 2, size=100),
        "diagnosis_2": np.random.randint(0, 2, size=100),
    }
)

# Run the causal discovery pipeline
causal_pipe.run_pipeline(df)

# Access causal effects
print("Causal Effects:", causal_pipe.causal_effects)
```
Customize the skeleton identification and orientation methods to suit your specific analysis needs.
```python
import pandas as pd

from causal_pipe import CausalPipe
from causal_pipe.pipe_config import (
    BCSLSkeletonMethod,
    CausalEffectMethod,
    CausalPipeConfig,
    DataPreprocessingParams,
    HillClimbingOrientationMethod,
)

# Define preprocessing parameters
preprocessor_params = DataPreprocessingParams(
    cat_to_codes=True,
    standardize=False,
    keep_only_correlated_with=None,
    filter_method="pearson",
    filter_threshold=0.2,
    handling_missing="drop",
    imputation_method="mice",
    use_r_mice=True,
    full_obs_cols=["age"],
)

# Initialize the configuration with BCSL skeleton method and Hill Climbing orientation
# (reuses `variable_types` from the first configuration example)
config = CausalPipeConfig(
    variable_types=variable_types,
    preprocessing_params=preprocessor_params,
    skeleton_method=BCSLSkeletonMethod(
        num_bootstrap_samples=200,
        multiple_comparison_correction="fdr",
        bootstrap_all_edges=True,
        use_aee_alpha=0.05,
        max_k=3,
    ),
    orientation_method=HillClimbingOrientationMethod(
        max_k=3,
        multiple_comparison_correction="fdr",
    ),
    causal_effect_methods=[
        CausalEffectMethod(name="sem"),
        CausalEffectMethod(name="pearson"),
    ],
    study_name="custom_causal_analysis",
    output_path="./output/custom_analysis",
    show_plots=True,
    verbose=True,
)

# Initialize the toolkit
causal_pipe = CausalPipe(config)

# Load your data
data = pd.read_csv("your_custom_data.csv")

# Run the causal discovery pipeline
causal_pipe.run_pipeline(data)

# Access causal effects
print("Causal Effects:", causal_pipe.causal_effects)
```
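The setting `multiple_comparison_correction="fdr"` presumably requests a false discovery rate correction over the many edge tests. The most common FDR procedure is Benjamini-Hochberg; whether CausalPipe uses exactly this variant is an assumption. The sketch below implements the standard step-up rule: sort the p-values, find the largest rank k with p_(k) ≤ k·α/m, and reject every hypothesis up to that rank.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans aligned with p_values: True where the
    hypothesis (e.g. "no edge between X and Y") is rejected at FDR level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis whose sorted rank is at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))  # [True, False, False, False]
    print(benjamini_hochberg([0.03, 0.04]))             # [True, True]
```

The second call shows the step-up behavior: 0.03 fails its own threshold (0.025), but because the larger p-value 0.04 passes its threshold (0.05), both are rejected.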
Comprehensive documentation is available to help you get started with CausalPipe and explore its full range of functionalities. Visit the CausalPipe Documentation for detailed guides, API references, and tutorials.
Contributions are welcome! If you'd like to contribute to CausalPipe, please follow these steps:
- Fork the Repository: Click the "Fork" button at the top-right corner of the repository page.
- Clone Your Fork: `git clone https://github.com/your-username/causal-pipe.git`
- Create a Branch: `git checkout -b feature/your-feature-name`
- Commit Your Changes: `git commit -m "Add your detailed description here"`
- Push to Your Fork: `git push origin feature/your-feature-name`
- Open a Pull Request: Navigate to the original repository and click "Compare & pull request."
Please ensure that your code adheres to the project's coding standards and includes appropriate tests.
This project is licensed under the MIT License.
For any questions or suggestions, feel free to reach out:
- Author: Albert Buchard
- Email: [email protected]
- GitHub: https://github.com/albertbuchard/causal-pipe
- Visualization Outputs: Ensure that the `output_path` specified in the configuration exists or can be created by CausalPipe. The toolkit saves visualizations such as correlation graphs, skeletons, oriented graphs, and SEM results there.
- R Package Dependencies: Because CausalPipe integrates with R's `lavaan` and `mice` packages, make sure that R is installed on your system and that these packages are accessible. The toolkit attempts to install missing R packages automatically, but you may need to adjust R's library paths or permissions.
- Error Handling: The toolkit catches and reports issues during data preprocessing, model fitting, and causal effect estimation. Watch the console output for warnings or error messages that need your attention.
- Extensibility: CausalPipe is designed to be modular. You can add new methods for skeleton identification, edge orientation, or causal effect estimation by creating new dataclasses and integrating them into the pipeline.
- Performance Considerations: Some methods, especially those involving multiple imputation or complex SEM models, can be computationally intensive. Ensure that your system has sufficient resources, and consider tuning parameters like `num_bootstrap_samples` or `max_iter` based on your dataset's size and complexity.
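To see why `num_bootstrap_samples` drives runtime, consider a simplified bootstrap of a single "edge": each resample redraws the rows with replacement and repeats the dependence test, so cost grows linearly with the number of resamples (and with the number of candidate edges). This is a hypothetical illustration using a plain correlation threshold, not the BCSL algorithm itself; the function name `bootstrap_edge_frequency` and the threshold rule are assumptions made for the sketch.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_edge_frequency(x, y, threshold, num_bootstrap_samples, seed=0):
    """Fraction of bootstrap resamples in which |corr(x, y)| exceeds threshold.

    Hypothetical illustration: every resample repeats the test once, so the
    work scales linearly with num_bootstrap_samples.
    """
    if num_bootstrap_samples <= 0:
        raise ValueError("num_bootstrap_samples must be positive")
    rng = random.Random(seed)  # seeded for reproducible resamples
    n = len(x)
    hits = 0
    for _ in range(num_bootstrap_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        if abs(pearson([x[i] for i in idx], [y[i] for i in idx])) > threshold:
            hits += 1
    return hits / num_bootstrap_samples


if __name__ == "__main__":
    x = list(range(20))
    y = [2 * v + 1 for v in x]  # deterministic linear relation
    print(bootstrap_edge_frequency(x, y, threshold=0.9, num_bootstrap_samples=50))
```

Doubling `num_bootstrap_samples` doubles the number of tests while making the estimated edge frequency more stable, which is the trade-off to weigh against dataset size.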
By following this guide and leveraging the provided examples, you can effectively utilize CausalPipe to perform sophisticated causal discovery and analysis on your datasets.