Commit
improve plot description
Ming-Yan committed Jan 21, 2025
1 parent 21c8694 commit f1ac017
Showing 19 changed files with 130 additions and 79 deletions.
Binary file added docs/_static/figs/example_jetpt.png
Binary file added docs/_static/figs/example_rebin2_jetpt.png
Binary file added docs/_static/figs/example_rebin_jetpt.png
Binary file added docs/_static/figs/example_sample_jetpt.png
Binary file added docs/_static/figs/example_samplesplit_jetpt.png
2 changes: 1 addition & 1 deletion docs/auto.md
@@ -1,4 +1,4 @@
## Automation
# Automation


At the moment the automation is limited by the available computing resources and runs via GitLab CI [autobtv](https://gitlab.cern.ch/cms-analysis/btv/software-and-algorithms/autobtv).
22 changes: 11 additions & 11 deletions docs/developer.md
@@ -1,4 +1,4 @@
## For developers: Add new workflow
# For developers: Add new workflow


The BTV tutorial for the coffea part is under [`notebooks`](https://github.com/cms-btv-pog/BTVNanoCommissioning/tree/master/notebooks) and the template for constructing a new workflow is [`src/BTVNanoCommissioning/workflows/example.py`](https://github.com/cms-btv-pog/BTVNanoCommissioning/blob/master/src/BTVNanoCommissioning/workflows/example.py)
@@ -8,7 +8,7 @@ The BTV tutorial for coffea part is `notebooks/BTV_commissiong_tutorial-coffea.i

Use `example.py` as a template to develop a new workflow.

### 0. Add new workflow info to `workflows/__init__.py`
## 0. Add new workflow info to `workflows/__init__.py`


```python
@@ -25,7 +25,7 @@ workflows["ctag_ttsemilep_sf"] = partial(
```
Notice that if you are working on WP SFs, please put **WP** in the name.
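
For orientation, a registration entry might look like the sketch below; the module, class, and keyword names here are placeholders rather than existing framework code:

```python
# Sketch only: register a hypothetical new workflow in workflows/__init__.py.
# "my_new_workflow", "MyNewProcessor" and the keyword argument are placeholders.
from functools import partial

from BTVNanoCommissioning.workflows.my_new_workflow import (
    NanoProcessor as MyNewProcessor,
)

workflows["my_new_sf"] = MyNewProcessor
# If the same processor serves several channels, bind the option with partial,
# and remember to keep "WP" in the name for working-point SFs:
workflows["my_new_WP_sf"] = partial(MyNewProcessor, selectionModifier="WP")
```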

### 1. Add histogram collections to `utils/histogrammer.py`
## 1. Add histogram collections to `utils/histogrammer.py`

The histograms in this framework use [`hist`](https://hist.readthedocs.io/en/latest/). They can easily be converted to ROOT histograms with `uproot` or to numpy histograms. A quick start guide for hist can be found [here](https://hist.readthedocs.io/en/latest/user-guide/quickstart.html)

@@ -46,7 +46,7 @@ _hist_dict["mujet_pt"] = Hist.Hist(
The kinematic and workflow-specific variables are defined first; the common collections of input variables are then taken from the common definition.
In case you want to add common variables used by all workflows, go to [`helper/definition.py`](#add-new-common-variables)
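
As a rough sketch of this pattern (the binning and names below are illustrative, not the framework's defaults), a kinematic histogram can be declared with `hist` and later written out as a ROOT histogram through `uproot`:

```python
# Sketch: declare a weighted 1D histogram with hist and write it to ROOT via uproot.
# The axis name, range and output file name are illustrative placeholders.
import hist
import uproot

mujet_pt = hist.Hist(
    hist.axis.Regular(50, 0, 300, name="pt", label=r"$p_T$ [GeV]"),
    storage=hist.storage.Weight(),  # keeps sum(w) and sum(w^2) per bin
)

# Fill with dummy values and per-event weights
mujet_pt.fill(pt=[45.0, 80.0, 120.0], weight=[1.0, 1.0, 0.8])

# hist objects can be written directly; uproot converts them to TH1D
with uproot.recreate("example_hists.root") as fout:
    fout["mujet_pt"] = mujet_pt
```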

### 2. Selections: Implemented selections on events (`workflow/`)
## 2. Selections: Implemented selections on events (`workflow/`)

Create `boolean` arrays along the event axis. Also check whether some common selections already exist in `utils/selection.py`

Expand Down Expand Up @@ -83,7 +83,7 @@ if self.selMod=="WcM":
event_level = req_trig & req_lumi & req_jet & req_muon & req_ele & req_leadlep_pt& req_Wc
```
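
For illustration, an event-level requirement is typically built by reducing an object-level mask along the object axis; the cut values below are arbitrary placeholders:

```python
# Sketch: build an event-level boolean array with awkward (placeholder cuts),
# assuming `events` is the NanoAOD events array used by the processor.
import awkward as ak

jet_mask = (events.Jet.pt > 25.0) & (abs(events.Jet.eta) < 2.4)
req_jet = ak.num(events.Jet[jet_mask], axis=1) >= 1  # at least one selected jet
req_jet = ak.fill_none(req_jet, False)               # guard against missing values
```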

### 3. Selected objects: Pruned objects with reduced event_level
## 3. Selected objects: Pruned objects with reduced event_level
Store the selected objects as event-based arrays. The name of a selected object must contain **Sel**; for the muon-enriched jet and the soft muon the names are **MuonJet** and **SoftMu**, and their kinematics are stored. Cross-object variables need their own dedicated entries.

```python
@@ -136,7 +136,7 @@ if self.isArray:
</details>
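
A minimal sketch of the idea described above (the variable names and the `delta_r` helper from coffea's vector behaviors are illustrative assumptions):

```python
# Sketch: prune objects with the event-level mask and attach them to the output
# (names such as pruned_ev, event_jet and soft_muon are placeholders).
pruned_ev = events[event_level]                 # keep only selected events
pruned_ev["SelJet"] = event_jet[event_level]    # name contains "Sel"
pruned_ev["SoftMu"] = soft_muon[event_level]    # soft muon inside the muon-enriched jet
# cross-object variables need their own entry
pruned_ev["dr_SelJet_SoftMu"] = pruned_ev.SelJet.delta_r(pruned_ev.SoftMu)
```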


### 4. Setup CI pipeline `.github/workflow`
## 4. Setup CI pipeline `.github/workflow`

The actions check whether the changes would break the framework. They are collected in `.github/workflow`.
You can simply include a workflow by adding an entry with its name
@@ -187,7 +187,7 @@ You can find the secret configuration in the directory: `Settings>>Secrets>>Ac
</details>


### 5. Refine used MC as input `sample.py`
## 5. Refine used MC as input `sample.py`
The `sample.py` collects the samples (dataset names) used in the workflow. These collections are used to create the dataset json file.
- `data` : data sample (MuonEG, Muon0....)
- `MC`: main MC used for the workflow
@@ -223,8 +223,8 @@ Here's the example for BTA_ttbar
},
```
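
Schematically, a new entry follows the same pattern; the workflow key and dataset names below are placeholders, and the exact fields should mirror the existing entries in `sample.py`:

```python
# Sketch: hypothetical entry in sample.py (key and dataset names are placeholders)
samples = {
    "my_new_sf": {
        "data": ["Muon0", "Muon1"],
        "MC": ["TTto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8"],
    },
}
```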

### Optional changes
#### Add workflow to `scripts/suball.py`
## Optional changes
### Add workflow to `scripts/suball.py`
The `suball.py` summarizes the steps to obtain the result.
In case your task requires running several workflows, you can wrap them as a `dict` of workflows
```python
@@ -244,7 +244,7 @@ scheme = {
],
}
```
#### Add new common variables in `helper/definition.py`
### Add new common variables in `helper/definition.py`

In `definition.py` we collect the axis definition, name, and label of the tagger scores/input variables
```python
@@ -263,7 +263,7 @@ definitions_dict = {
...
}
```
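
For orientation only, one entry could look roughly like the sketch below; the key names are illustrative guesses, so follow the schema of the existing entries in `helper/definition.py`:

```python
# Sketch: hypothetical entry collecting axis definition, display name and label
# (key names below are illustrative, not the framework's actual schema)
definitions_dict = {
    "DeepFlavB": {
        "displayname": "DeepJet b-tag discriminant",
        "manual_ranges": [0.0, 1.0],
        "ylabel_text": "Jets",
    },
}
```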
#### Additional corrections and uncertainty variations not in the framework
### Additional corrections and uncertainty variations not in the framework
The corrections are collected in `utils/correction.py`. There are two types of variation: weight variations (e.g. SFs, UE/PS weights) and object energy scale/resolution variations (JES/JER). Here's an example of adding new corrections

1. Add new info `utils/AK4_parameter.py`
3 changes: 2 additions & 1 deletion docs/index.md
@@ -40,9 +40,10 @@ Currently the available workflows are summarized
installation.md
user.md
developer.md
structure.md
scaleout.md
wf.md
scripts.md
scaleout.md
auto.md
api.rst
```
2 changes: 0 additions & 2 deletions docs/run.md

This file was deleted.

19 changes: 14 additions & 5 deletions docs/scaleout.md
@@ -27,14 +27,14 @@ WJets_inc (Nano_v11)|1183MB |630MB |1180MB|



#### Condor@FNAL (CMSLPC)
#### dask: Condor@FNAL (CMSLPC)
Follow the setup instructions at https://github.com/CoffeaTeam/lpcjobqueue. After starting
the singularity container, run with
```bash
python runner.py --wf ttcom --executor dask/lpc
```

#### Condor@CERN (lxplus)
#### dask: Condor@CERN (lxplus)
Only one port is available per node, so it's possible one has to try different nodes until hitting
one with port `8786` open. Other than that, no additional configuration should be necessary.

@@ -62,7 +62,7 @@ python runner.py --wf ttcom --executor dask/casa
Authentication is handled automatically via a login auth token instead of a proxy. File paths need the xrootd redirector replaced with "xcache"; `runner.py` does this automatically.


#### Condor@DESY
#### parsl/dask with Condor
```bash
python runner.py --wf ttcom --executor dask/condor(parsl/condor)
```
@@ -95,12 +95,21 @@ After executing the command, a new folder will be created, preparing the submiss

::: {admonition} Frequent issues for standalone condor job submission



1. CMS Connect provides a condor interface through which one can submit jobs to all resources available in the CMS Global Pool. See the [WorkBookCMSConnect Twiki](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCMSConnect#Requesting_different_Operative_S) for instructions if you are using it for the first time.
2. The submitted jobs require a properly set up X509 proxy to use the XRootD service for accessing and storing data. In the generated `.jdl` file you may see a line configured for this purpose: `use_x509userproxy = true`. If you have not submitted jobs of this kind on lxplus condor before, we recommend adding the line
```bash
export X509_USER_PROXY=$HOME/x509up_u`id -u`
```
to your `.bashrc` and run it so that the proxy file is stored in your AFS folder instead of your `/tmp/USERNAME` folder. For submission on CMS Connect, no specific action is required.
:::


### FAQ for submission

- All jobs held: might indicate an environment setup issue → check the condor err/out files; for parsl jobs the info is in `runinfo/JOBID/submit_scripts/`
- Exits without complaint: might be huge memory consumption:
  - Reduce `--chunk`; the JERC variations in particular are memory intensive
  - Check the memory usage by calling `memory_usage_psutil` (a minimal sketch is given after this list)
- Partially failed/held:
  - The files/site could be temporarily unavailable. If the retries do not work, consider collecting the list of failed files and resubmitting.
  - Errors for certain files → check the failed files and run them locally with `--executor iterative`
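
A minimal sketch of such a memory helper, assuming it is based on `psutil` (the framework's actual implementation may differ):

```python
# Sketch: report the resident memory of the current process in MB with psutil
import os

import psutil


def memory_usage_psutil():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024**2  # MB
```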
86 changes: 58 additions & 28 deletions docs/scripts.md
@@ -1,9 +1,9 @@
## Scripts for preparing input & processing output
# Scripts for preparing input & processing output

Here is a list of scripts that can be used for BTV tasks


### `fetch.py` : create input json
## `fetch.py` : create input json


Use `fetch.py` in the `scripts/` folder to obtain your sample json files. You can create `$input_list`, which can be a list of datasets taken from CMS DAS or dataset names (in which case the campaign needs to be specified explicitly), and create the json containing `dataset_name:[filelist]`. One can specify a local path in that input list for samples not published in CMS DAS.
@@ -13,25 +13,22 @@ The `--whitelist_sites, --blacklist_sites` are considered for fetch dataset if m






### `dump_prescale.py`: Get Prescale weights
## `dump_prescale.py`: Get Prescale weights

:::{caution}
Only works if `/cvmfs` is mounted on the system
:::

Generate prescale weights using `brilcalc`

```python
```bash
python scripts/dump_prescale.py --HLT $HLT --lumi $LUMIMASK
# HLT : put prescaled triggers
# lumi: golden lumi json
```


### Get processed information
## Get processed information

Get the run & luminosity information for the processed events from the coffea output files. When you use `--skipbadfiles`, the submission will ignore files not accessible (or timing out) via `xrootd`. This script helps you dump the processed luminosity into a json file, which can then be processed with the `brilcalc` tool, and provides a list of failed lumi sections by comparing the original json input to the one extracted from the `.coffea` files.

@@ -41,7 +38,7 @@ Get the run & luminosity information for the processed events from the coffea ou
python scripts/dump_processed.py -c $COFFEA_FILES -n $OUTPUT_NAME (-j $ORIGINAL_JSON -t [all,lumi,failed])
```

### `make_template.py`: Store histograms from coffea file
## `make_template.py`: Store histograms from coffea file

Use `scripts/make_template.py` to dump 1D/2D histograms from `.coffea` to `TH1D/TH2D` with hist. MC histograms can be reweighted according to the luminosity value given via `--lumi`. You can also merge several files.

@@ -65,25 +62,62 @@ python scripts/make_template.py -i "testfile/*.coffea" --lumi 7650 -o test.root



### Plotting code
#### data/MC comparisons
:exclamation_mark: If using a wildcard for the input, do not forget the quotation marks! (see 2nd example below)

You can specify `-v all` to plot all the variables in the `coffea` file, or use wildcard options (e.g. `-v "*DeepJet*"` for the input variables containing `DeepJet`)
## Plotting code
### data/MC comparisons

:new: non-uniform rebinning is possible, specify the bins with list of edges `--autorebin 50,80,81,82,83,100.5`
Obtain the data/MC comparisons from the input coffea files by normalizing MC to the corresponding luminosity.
You can specify `-v all` to plot all the variables in the `coffea` file, or use wildcard options (e.g. `-v "*DeepJet*"` for the input variables containing `DeepJet`). Individual variables can also be specified, separated by `,`.

```bash
python scripts/plotdataMC.py -i $COFFEA --lumi $LUMI_IN_invPB -p $WORKFLOW -v $VARIABLE --autorebin $REBIN_OPTION --split $SPLIT_OPTION
python scripts/plotdataMC.py -i a.coffea,b.coffea --lumi 41500 -p ttdilep_sf -v z_mass,z_pt
python scripts/plotdataMC.py -i "test*.coffea" --lumi 41500 -p ttdilep_sf -v z_mass,z_pt # with wildcard option need ""
```

There are a few options provided for the splitting scheme, based on jet flavor or sample.

<div style="display: flex; justify-content: space-around; align-items: center;">
<figure style="text-align: center;">
<img src="_static/figs/example_rebin_jetpt.png" alt="Picture 1" width="300" height="auto" style="display: block; margin: 0 auto" />
<figcaption>Default: split by jet flavor</figcaption>
</figure>

<figure style="text-align: center;">
<img src="_static/figs/example_sample_jetpt.png" alt="Picture 2" width="300" height="auto" style="display: block; margin: 0 auto" />
<figcaption>--split sample: split by MC samples</figcaption>
</figure>

<figure style="text-align: center;">
<img src="_static/figs/example_samplesplit_jetpt.png" alt="Picture 3" width="300" height="auto" style="display: block; margin: 0 auto" />
<figcaption>--split sample: split by MC samples</figcaption>
</figure>

</div>

It also supports rebinning. An integer input merges neighboring bins, e.g. `--rebin 2`. Non-uniform rebinning is also supported: specify the bins with a list of edges, e.g. `--autorebin 30,36,42,48,54,60,66,72,78,84,90,96,102,114,126,144,162,180,210,240,300`

<div style="display: flex; justify-content: space-around; align-items: center;">
<figure style="text-align: center;">
<img src="_static/figs/example_rebin_jetpt.png" alt="Picture 1" width="300" height="auto" style="display: block; margin: 0 auto" />
<figcaption>Default</figcaption>
</figure>
<figure style="text-align: center;">
<img src="_static/figs/example_rebin2_jetpt.png" alt="Picture 1" width="300" height="auto" style="display: block; margin: 0 auto" />
<figcaption>merge neighboring bins</figcaption>
</figure>
<figure style="text-align: center;">
<img src="_static/figs/example_rebin_jetpt.png" alt="Picture 2" width="300" height="auto" style="display: block; margin: 0 auto" />
<figcaption>non-uniform rebin</figcaption>
</figure>
</div>



```
options:
-h, --help show this help message and exit
--lumi LUMI luminosity in /pb
--com COM sqrt(s) in TeV
-p {ttdilep_sf,ttsemilep_sf,ctag_Wc_sf,ctag_DY_sf,ctag_ttsemilep_sf,ctag_ttdilep_sf}, --phase {dilep_sf,ttsemilep_sf,ctag_Wc_sf,ctag_DY_sf,ctag_ttsemilep_sf,ctag_ttdilep_sf}
@@ -112,10 +146,10 @@ options:



#### data/data, MC/MC comparisons
### data/data, MC/MC comparisons

You can specify `-v all` to plot all the variables in the `coffea` file, or use wildcard options (e.g. `-v "*DeepJet*"` for the input variables containing `DeepJet`)
:exclamation_mark: If using a wildcard for the input, do not forget the quotation marks! (see 2nd example below)


```bash
# with merge map, compare ttbar with data
@@ -158,19 +192,15 @@ options:



#### ROCs & efficiency plots
### ROCs & efficiency plots

Extract the ROCs for the different taggers and the efficiencies from the validation workflow

```python
```bash
python scripts/validation_plot.py -i $INPUT_COFFEA -v $VERSION
```






```json
{
"WJets": ["WJetsToLNu_TuneCP5_13p6TeV-madgraphMLM-pythia8"],
@@ -181,15 +211,15 @@ python scripts/validation_plot.py -i $INPUT_COFFEA -v $VERSION
}
```

#### `correlation_plots.py` : get linear correlation from arrays
### `correlation_plots.py` : get linear correlation from arrays

You can perform a study of linear correlations of b-tagging input variables. Additionally, soft muon variables may be added to the study by requesting the `--SMu` argument. If you want to limit the outputs to DeepFlavB, PNetB and RobustParTAK4B only, you can use the `--limit_outputs` option. If you want to use only the set of variables used for tagger training, rather than all input variables, use the option `--limit_inputs`. To limit the number of files read, make use of the option `--max_files`. In case your study requires splitting samples by flavour, use `--flavour_split`. `--split_region_b` performs a sample splitting based on DeepFlavB >/< 0.5.

:::{caution}
For Data/MC comparison purposes pay attention: change the ranking factors (xs/sumw) in L420!
:::

```python
```bash
python correlation_plots.py $input_folder [--max_files $nmax_files --SMu --limit_inputs --limit_outputs --specify_MC --flavour_split --split_region_b]
```

@@ -198,6 +228,6 @@ python correlation_plots.py $input_folder [--max_files $nmax_files --SMu --limit

To further investigate the correlations, one can create the 2D plots of the variables used in this study. Inputs and optional arguments are the same as for the correlation plots study.

```python
```bash
python 2Dhistogramms.py $input_folder [--max_files $nmax_files --SMu --limit_inputs --limit_outputs --specify_MC --flavour_split --split_region_b]
```