diff_heu_heu.py is a Python command-line application that builds a diff model: it applies the Heuristics Miner algorithm to two event logs, compares the resulting "old" and "new" process models, and generates visualizations highlighting their differences.
This tool is particularly useful for analyzing and visualizing the differences in simulated models.
Illustrative example of the New Diff Model:
Consider two versions of a model, referred to as the 1st model and the 2nd model, each obtained by simulating potentially distinct variants of a formal specification. Unlike the original diff model, which relied on automatically generated graphical representations of the procedural part of the model, the new diff model is derived directly from the simulated event logs of the two model variants under comparison.
Let:
- L1 be the event log obtained from simulating the 1st model.
- L2 be the event log obtained from simulating the 2nd model.
Each log Li contains sequences of events, with each event including at least a case ID, a timestamp, and an activity name. If the underlying formalism distinguishes between states and activities, a preprocessing step merges states and transitions into a unified set of activities for process discovery. Otherwise, if only activities are available, no such merging is required.
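The merging step can be pictured with a small sketch. The row layout (case ID, timestamp, kind, name) and the sample values are assumptions made for illustration; the script's actual CSV columns and preprocessing may differ.

```python
from collections import defaultdict

# Hypothetical raw rows: (case_id, timestamp, kind, name).
# This column layout is an assumption, not the tool's real input schema.
raw_rows = [
    ("c1", 0.0, "state", "Idle"),
    ("c1", 1.5, "transition", "start_move"),
    ("c1", 2.0, "state", "Moving"),
    ("c2", 0.0, "state", "Idle"),
    ("c2", 0.7, "transition", "shutdown"),
]


def to_traces(rows):
    """Group rows by case and order them by timestamp, treating states and
    transitions alike as activities."""
    by_case = defaultdict(list)
    for case_id, timestamp, _kind, name in rows:
        by_case[case_id].append((timestamp, name))
    return {
        case: [name for _, name in sorted(events, key=lambda e: e[0])]
        for case, events in by_case.items()
    }


traces = to_traces(raw_rows)
```

Each trace is then a single sequence of activity names, which is the shape a process discovery algorithm expects.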
We apply the Heuristics Miner (HM) algorithm to each log separately, obtaining two Heuristics Nets (HNs):
- H1 = (N1, E1, freq1) from L1
- H2 = (N2, E2, freq2) from L2
Here:
- Ni is a set of nodes representing discovered activities, as well as special start and end nodes.
- Ei is a set of edges capturing directly-follows relationships among nodes in Ni.
- freqi assigns frequencies to each edge in Ei.
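To make the triple (Ni, Ei, freqi) concrete, the sketch below builds nodes, edges, and edge frequencies from raw directly-follows counts. It is a deliberate simplification: the Heuristics Miner additionally filters edges using dependency measures and thresholds, which are omitted here. The names `START`, `END`, and `directly_follows_net` are hypothetical.

```python
from collections import Counter

START, END = "start", "end"  # hypothetical names for the special nodes


def directly_follows_net(traces):
    """Return (nodes, edges, freq) from raw directly-follows counts.

    Simplification: no dependency-measure filtering is applied.
    """
    freq = Counter()
    for trace in traces:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            freq[(a, b)] += 1
    edges = set(freq)
    nodes = {n for edge in edges for n in edge}
    return nodes, edges, freq


# Toy log with two traces (activity names only).
log1 = [["a", "b", "c"], ["a", "c"]]
N1, E1, freq1 = directly_follows_net(log1)
```

Here `freq1[("start", "a")]` is 2 because both traces begin with `a`, while `freq1[("a", "c")]` is 1.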
The new diff model D is defined as:
D = (N_D, E_D, lN, lE)
Where:
- N_D = N1 ∪ N2 (the union of the nodes from both HNs).
- E_D = E1 ∪ E2 (the union of all edges).
- lN : N_D → { common, 1st-only, 2nd-only } labels each node based on whether it appears in both models (common), only in the 1st model (1st-only), or only in the 2nd model (2nd-only).
- lE : E_D → { common, 1st-only, 2nd-only } labels each edge similarly.
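The definition of D translates almost directly into set operations. The following is a minimal sketch under the assumption that nodes are strings and edges are (source, target) tuples; the function names are illustrative and not taken from the script.

```python
def label(items_1, items_2):
    """Label every element of the union as common, 1st-only, or 2nd-only."""
    labels = {}
    for item in items_1 | items_2:
        if item in items_1 and item in items_2:
            labels[item] = "common"
        elif item in items_1:
            labels[item] = "1st-only"
        else:
            labels[item] = "2nd-only"
    return labels


def diff_model(N1, E1, N2, E2):
    """Return D = (N_D, E_D, lN, lE) for two nets given as node and edge sets."""
    N_D = N1 | N2
    E_D = E1 | E2
    return N_D, E_D, label(N1, N2), label(E1, E2)


# Toy nets: the 1st model goes through b, the 2nd through c.
N1 = {"start", "a", "b", "end"}
E1 = {("start", "a"), ("a", "b"), ("b", "end")}
N2 = {"start", "a", "c", "end"}
E2 = {("start", "a"), ("a", "c"), ("c", "end")}

N_D, E_D, lN, lE = diff_model(N1, E1, N2, E2)
```

In this toy case, `lN["b"]` is `1st-only`, `lN["c"]` is `2nd-only`, and the shared edge `("start", "a")` is `common`.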
In the new diff model:
- Black nodes and edges (common) appear in both H1 and H2.
- Red nodes and edges (1st-only) represent behaviors and transitions present only in the 1st model.
- Blue nodes and edges (2nd-only) indicate behaviors and transitions introduced in or unique to the 2nd model.
This new diff model allows direct comparison of two mined models without relying on a known procedural representation.
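The colour coding can be illustrated by emitting Graphviz DOT text from the labels. This sketch only produces the DOT source as a string and assumes simple alphanumeric node names; how the script itself builds and renders its graphs is not shown here, and `to_dot` is a hypothetical helper.

```python
# Colour per label, following the black/red/blue convention described above.
COLORS = {"common": "black", "1st-only": "red", "2nd-only": "blue"}


def to_dot(node_labels, edge_labels):
    """Render labelled nodes and edges as a DOT digraph string."""
    lines = ["digraph diff {"]
    for node in sorted(node_labels):
        lines.append(f'  "{node}" [color={COLORS[node_labels[node]]}];')
    for src, dst in sorted(edge_labels):
        color = COLORS[edge_labels[(src, dst)]]
        lines.append(f'  "{src}" -> "{dst}" [color={color}];')
    lines.append("}")
    return "\n".join(lines)


# Toy labels for illustration.
node_labels = {"a": "common", "b": "1st-only", "c": "2nd-only"}
edge_labels = {("a", "b"): "1st-only", ("a", "c"): "2nd-only"}
dot_source = to_dot(node_labels, edge_labels)
```

The resulting text can be saved to a file and rendered with the `dot` command once Graphviz is installed.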
- Pre-processing Logs: Cleans and formats event logs from CSV files.
- Heuristics Miner: Applies the Heuristics Miner algorithm to discover process models.
- Difference Analysis: Compares old and new process models to identify differences.
- Visualization: Generates PDF graphs highlighting the differences between models.
Roberto Casaluce, Max Tschaikowski, Andrea Vandin: White-Box Validation of Collective Adaptive Systems by Statistical Model Checking and Process Mining. ISoLA (1) 2024: 204-222
- Python: Ensure you have Python 3.6 or higher installed. You can download Python from python.org.
- Graphviz: This tool requires Graphviz to generate visualizations.
First, clone this repository or download the diff_heu_heu.py script to your local machine.
git clone https://github.com/rcasaluce/diff_heu_heu.git
cd diff_heu_heu
It's recommended to use a virtual environment to manage dependencies. Below are instructions for creating and activating a virtual environment on different operating systems.
On Windows:
- Open Command Prompt: press Win + R, type cmd, and press Enter.
- Navigate to the project directory:
  cd path\to\diff_heu_heu
- Create a virtual environment:
  python -m venv venv
- Activate the virtual environment:
  venv\Scripts\activate
On macOS/Linux:
- Open Terminal.
- Navigate to the project directory:
  cd path/to/diff_heu_heu
- Create a virtual environment:
  python3 -m venv venv
- Activate the virtual environment:
  source venv/bin/activate
With the virtual environment activated, install the required Python libraries using pip.
pip install -r requirements.txt
Alternatively, if a requirements.txt file is not provided, install the dependencies manually:
pip install pm4py pandas numpy graphviz pydotplus pygraphviz
Note: If you encounter issues installing pygraphviz, ensure that Graphviz is properly installed on your system and that the Graphviz binaries are accessible via your system's PATH.
Graphviz is required for generating the visualization PDFs.
On Windows:
- Download Graphviz: Download the Graphviz installer from the Graphviz Download Page.
- Install Graphviz: Run the installer and follow the on-screen instructions.
- Add Graphviz to PATH:
  - Open the Start Menu, search for "Environment Variables," and select "Edit the system environment variables."
  - Click on "Environment Variables."
  - Under "System variables," find and select the Path variable, then click "Edit."
  - Click "New" and add the path to the Graphviz bin directory (e.g., C:\Program Files\Graphviz\bin).
  - Click "OK" to save changes.
- Verify Installation: Open Command Prompt and run:
  dot -V
  You should see the Graphviz version information.
On macOS:
- Using Homebrew: If you have Homebrew installed, you can install Graphviz with:
  brew install graphviz
- Verify Installation: Open Terminal and run:
  dot -V
  You should see the Graphviz version information.
On Linux:
- Using APT (Debian/Ubuntu):
  sudo apt-get update
  sudo apt-get install graphviz
- Using YUM (CentOS/RHEL):
  sudo yum install graphviz
- Verify Installation: Open Terminal and run:
  dot -V
  You should see the Graphviz version information.
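The same verification can be done from Python, which is handy when checking the environment the script will actually run in. This is a minimal sketch; `graphviz_version` is a hypothetical helper and is not part of diff_heu_heu.py.

```python
import shutil
import subprocess


def graphviz_version():
    """Return the version banner printed by `dot -V`, or None if the `dot`
    executable cannot be found on PATH."""
    dot = shutil.which("dot")
    if dot is None:
        return None
    result = subprocess.run(
        [dot, "-V"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The banner typically goes to stderr; fall back to stdout just in case.
    return (result.stderr or result.stdout).strip()


version = graphviz_version()
print(version if version else "Graphviz 'dot' was not found on PATH")
```

A `None` result usually means the Graphviz bin directory has not been added to PATH.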
The script accepts the following command-line arguments:
- --file_path_old: (Required) Path to the first_model.csv file.
- --file_path_new: (Required) Path to the second_model.csv file.
- --output_full: (Optional) Filename for the complete differences PDF. Default: complete_differences.
- --output_filtered_full: (Optional) Filename for the filtered complete differences PDF. Default: filtered_differences.
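For readers who want to see how such an interface is typically declared, here is a sketch using argparse. The flag names and defaults come from the list above; everything else (the help strings, the `build_parser` function) is an assumption and not the script's actual source.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Compare two process models mined from event logs."
    )
    parser.add_argument("--file_path_old", required=True,
                        help="Path to the first model's CSV log.")
    parser.add_argument("--file_path_new", required=True,
                        help="Path to the second model's CSV log.")
    parser.add_argument("--output_full", default="complete_differences",
                        help="Filename for the complete differences PDF.")
    parser.add_argument("--output_filtered_full", default="filtered_differences",
                        help="Filename for the filtered differences PDF.")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--file_path_old", "first_model.csv", "--file_path_new", "second_model.csv"]
)
```

With only the two required flags given, `args.output_full` and `args.output_filtered_full` fall back to their documented defaults.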
Ensure that your virtual environment is activated and that all dependencies are installed.
python diff_heu_heu.py \
--file_path_old "path/to/first_model.csv" \
--file_path_new "path/to/second_model.csv"
python diff_heu_heu.py \
--file_path_old "path/to/first_model.csv" \
--file_path_new "path/to/second_model.csv" \
--output_full "complete_differences" \
--output_filtered_full "filtered_differences"
This will generate:
complete_differences.pdf
filtered_differences.pdf
White-Box Validation of Collective Adaptive Systems by Statistical Model Checking and Process Mining. ISoLA (1) 2024: 204-222
Assuming your CSV files are located in ./logs/, run:
python diff_heu_heu.py \
--file_path_old "./logs/robot_main_first.csv" \
--file_path_new "./logs/robot_main_second.csv" \
--output_full "complete_differences" \
--output_filtered_full "filtered_differences"
After execution, the script will generate the following PDF files in the current directory (or in the specified output path):
- complete_differences.pdf: Visualizes the complete differences between the old and new process models.
- filtered_differences.pdf: Visualizes the filtered differences.
@inproceedings{DBLP:conf/isola/CasaluceTV24,
author = {Roberto Casaluce and
Max Tschaikowski and
Andrea Vandin},
editor = {Tiziana Margaria and
Bernhard Steffen},
title = {White-Box Validation of Collective Adaptive Systems by Statistical
Model Checking and Process Mining},
booktitle = {Leveraging Applications of Formal Methods, Verification and Validation.
REoCAS Colloquium in Honor of Rocco De Nicola - 12th International
Symposium, ISoLA 2024, Crete, Greece, October 27-31, 2024, Proceedings,
Part {I}},
series = {Lecture Notes in Computer Science},
volume = {15219},
pages = {204--222},
publisher = {Springer},
year = {2024},
url = {https://doi.org/10.1007/978-3-031-73709-1\_13},
doi = {10.1007/978-3-031-73709-1\_13},
timestamp = {Tue, 22 Oct 2024 21:07:33 +0200},
biburl = {https://dblp.org/rec/conf/isola/CasaluceTV24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
This project is licensed under Apache License 2.0. See the LICENSE file for details.