Polars 1xx #149

Open
wants to merge 65 commits into
base: main
65 commits
ef010bc
Format all python files with "ruff format".
ghuls Jun 10, 2024
9bb9fda
Cleanup code of get_barcodes_passing_qc_for_sample.
ghuls Jun 11, 2024
0f73b45
Update Polars syntax to 1.0.0+ version.
ghuls Jul 9, 2024
2915437
Skip empty lines when reading barcode file in "read_barcodes_file_to_…
ghuls Jul 10, 2024
7d9a9cc
Add workaround for very rare cases in QC where KDE for duplication ra…
ghuls Jul 10, 2024
35e13ea
Add mypy stub generation and mypy numpy typing plugin.
ghuls Jul 10, 2024
14fce9a
Only keep one copy of each barcode when reading barcode file in "read…
ghuls Jul 15, 2024
39ace8a
Change df.columns with df.collect_schema().names() so it will work on…
ghuls Jul 15, 2024
9892252
Update "write_csv" usage to Polars syntax 1.0.0+ by using "include_he…
ghuls Jul 15, 2024
ca9b230
Do not use "by=" keyword in group_by as Polars 1.0.0+ treats that as …
ghuls Jul 15, 2024
861f9b7
Reformat TSS profile chained code by adding no opt "clone" calls.
ghuls Jul 16, 2024
1e2e0bf
Restrict version numbers for some dependencies.
ghuls Jul 16, 2024
bfa6e7b
Expose "engine" option in "pycistopic qc".
ghuls Jul 16, 2024
deef721
Support adding sample ID to cell barcodes when reading fragments or c…
ghuls Jul 16, 2024
49200f5
Remove unused argument from `get_insert_size_distribution` docstring.
ghuls Jul 16, 2024
15899de
Use greater than or equal for threshold filters in "get_barcodes_pass…
ghuls Jul 16, 2024
419a536
Change `pycistopic qc` to `pycistopic qc run`.
ghuls Jul 16, 2024
f9d78a1
Add `pycistopic qc filter` to be able to filter cell barcodes based o…
ghuls Jul 16, 2024
4661c7d
Fix "get_tss_profile" so it works both with "polars" and "polars-u64-…
ghuls Jul 17, 2024
7b285a4
Fix some columns to pl.UInt32 so dataframes have same schema both wit…
ghuls Jul 17, 2024
a40e47c
Add "create_fragment_matrix_from_fragments" to create directly a spar…
ghuls Jul 18, 2024
b074950
Add "create_regions_topics_frequency_matrix" function to replace "loa…
ghuls Aug 2, 2024
fab1300
Rename "create_regions_topics_frequency_matrix" to "create_regions_to…
ghuls Aug 5, 2024
ad9c756
Allow creating of mallet serialized corpus file directly from sparse …
ghuls Aug 5, 2024
2d54473
Expose creation of Mallet corpus file from pycistopic CLI interface.
ghuls Aug 5, 2024
b680420
Add some basic sanity checking to "convert_binary_matrix_to_mallet_co…
ghuls Aug 22, 2024
646db07
Pass correct alpha and eta to loglikelihood function.
ghuls Aug 26, 2024
a59ae9e
Remove Mallet text corpus after conversion to serialised corpus file.
ghuls Aug 26, 2024
b6d0973
Add LDAMalletFilenames class.
ghuls Aug 26, 2024
5d8cd37
Use --word-topic-counts-file instead of --output-state file when runn…
ghuls Aug 26, 2024
12c7c7b
Add static methods to read cell-topic probabilities from Mallet "--ou…
ghuls Aug 26, 2024
cbfe3bb
Add static methods to read region-topic counts and probabilities from…
ghuls Aug 26, 2024
8dbe474
Add static method to read JSON parameter file written by `LDAMallet.r…
ghuls Aug 26, 2024
3bbc709
Use new functions for reading Mallet "--word-topic-counts-file" output.
ghuls Aug 27, 2024
52de7e1
Remove deprecated code from LDAMallet class.
ghuls Aug 27, 2024
223981c
Add/update some logging statements in LDAMallet class.
ghuls Aug 27, 2024
14b6330
Add "--verbose" parameter to `pycistopic topic_modeling` subcommands.
ghuls Aug 27, 2024
bb8aa47
Update `pycistopic topic_modeling mallet` CLI code to use the new LDA…
ghuls Aug 27, 2024
3daefd2
Move "create_mallet_corpus" argument parsing code above "mallet" argu…
ghuls Aug 27, 2024
64ea33f
Create `pycistopic topic_modeling mallet` subparser and move Mallet r…
ghuls Aug 27, 2024
b6e7006
Rework `run_cgs_model_mallet` to `calculate_model_evaluation_stats` b…
ghuls Aug 28, 2024
801f334
Add `pycistopic topic_modeling mallet stats` subcommand.
ghuls Aug 28, 2024
8f2faef
Rename `binary_matrix` to `binary_accessibility_matrix`.
ghuls Aug 28, 2024
f47cb0c
Retrieve also Ensembl gene ID in `get_tss_annotation_from_ensembl`.
ghuls Sep 18, 2024
f15501c
Use dictionary for schema as required for Polars 1.xx.
ghuls Oct 24, 2024
3e7030d
Add docstring to loglikelihood function to explain why we use that ve…
ghuls Oct 24, 2024
7714f86
Use `infer_schema=False` instead of `infer_schema_length=0` (polars >=…
ghuls Oct 30, 2024
627bac6
Add `get_nonzero_row_indices` function.
ghuls Nov 6, 2024
ee945cf
Fix "*" imports in diff_features.py.
ghuls Nov 6, 2024
5c1a16b
Add `calculate_per_region_mean_and_dispersion_on_normalized_imputed_a…
ghuls Nov 6, 2024
75ae39e
Use more descriptive variable names in `calculate_per_region_mean_and…
ghuls Nov 7, 2024
b375ad0
Remove unneeded `normalize_scores` function, as it is superseded by `…
ghuls Nov 7, 2024
2fa4464
Update `find_highly_variable_features` to use output of `calculate_pe…
ghuls Nov 7, 2024
60c0a76
Add/update numba optimized version related to wilcoxon test.
ghuls Nov 8, 2024
acf1545
Change "wilcox" to "wilcoxon".
ghuls Nov 8, 2024
6870e73
Add longer explanation for tss "--no-cache" option.
ghuls Nov 12, 2024
06d3816
Catch likely BioMart request caching problems and give "--no-cache" h…
ghuls Nov 12, 2024
1d51050
import gr_overlap
SeppeDeWinter Dec 6, 2024
cb5f541
typo
SeppeDeWinter Dec 6, 2024
fadafc0
Update topic binarization code
SeppeDeWinter Dec 10, 2024
acedbc1
Add extra directories to ignore to gitignore.
ghuls Dec 13, 2024
ac87d35
Use "read_fragments_to_pyranges" instead of deprecated "read_fragment…
SeppeDeWinter Aug 29, 2024
ad8aadd
Fix `variables=None` and `color_dictionary=None` cases in "cell_topic…
SeppeDeWinter Sep 4, 2024
1b7d2ff
Raise error when target is not "cell" or "region" in "find_clusters".
SeppeDeWinter Sep 10, 2024
53fe3f7
Fix "create_fragment_matrix_from_fragments" in case last (few) region…
ghuls Jan 15, 2025
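
Two of the commits above change how barcode files are read: empty lines are skipped, and only one copy of each barcode is kept. The sketch below illustrates that behaviour under stated assumptions. It assumes a plain-text file with one barcode per line and keeps the first occurrence in file order. `read_unique_barcodes` is a hypothetical helper written for illustration, not the function changed in this PR, and the barcode values are made up.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_unique_barcodes(barcodes_filename: str | Path) -> list[str]:
    """Hypothetical helper: read barcodes, skipping empty lines and duplicates."""
    seen: set[str] = set()
    barcodes: list[str] = []
    with open(barcodes_filename) as fh:
        for line in fh:
            barcode = line.strip()
            # Skip empty lines and barcodes that were already read.
            if not barcode or barcode in seen:
                continue
            seen.add(barcode)
            barcodes.append(barcode)
    return barcodes


# Small demonstration with a temporary file (made-up barcodes).
with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = Path(tmp_dir) / "barcodes.txt"
    example_file.write_text("AAAC-1\n\nAAAG-1\nAAAC-1\n\n")
    barcodes = read_unique_barcodes(example_file)
```

Keeping first occurrences in file order is a choice made for this sketch; the PR itself may deduplicate differently.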
5 changes: 5 additions & 0 deletions .gitignore
@@ -47,3 +47,8 @@ share/python-wheels/
*.egg
target/
*.out

# Local directories
stubs/
test_files/
test_scripts/
10 changes: 5 additions & 5 deletions docs/source/notebooks/human_cerebellum.ipynb
@@ -803,7 +803,7 @@
"id": "405b984c-4131-4d17-a253-5d056caa922e",
"metadata": {},
"source": [
"Next, let's calculate the QC metrics using the `pycistopic qc` command."
"Next, let's calculate the QC metrics using the `pycistopic qc run` command."
]
},
{
@@ -813,7 +813,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pycistopic qc \\\n",
"!pycistopic qc run \\\n",
" --fragments data/fragments.tsv.gz \\\n",
" --regions outs/consensus_peak_calling/consensus_regions.bed \\\n",
" --tss outs/qc/tss.bed \\\n",
@@ -841,11 +841,11 @@
"\n",
"pycistopic_qc_commands_filename = \"pycistopic_qc_commands.txt\"\n",
"\n",
"# Create text file with all pycistopic qc command lines.\n",
"# Create text file with all pycistopic qc run command lines.\n",
"with open(pycistopic_qc_commands_filename, \"w\") as fh:\n",
" for sample, fragment_filename in fragments_dict.items():\n",
" print(\n",
" \"pycistopic qc\",\n",
" \"pycistopic qc run\",\n",
" f\"--fragments {fragment_filename}\",\n",
" f\"--regions {regions_bed_filename}\",\n",
" f\"--tss {tss_bed_filename}\",\n",
@@ -935,7 +935,7 @@
"\n",
"**Note:**\n",
"\n",
"The `pycistopic qc` command will determine automatic thresholds for the minimum number of unique number of fragments and the minumum TSS enrichment.\n",
"The `pycistopic qc run` command will determine automatic thresholds for the minimum number of unique number of fragments and the minumum TSS enrichment.\n",
"In case you want to change these thresholds or want to threhold based on FRIP, you can provide manually defined thresholds using the parameters:\n",
"- unique_fragments_threshold\n",
"- tss_enrichment_threshold\n",
11 changes: 8 additions & 3 deletions pyproject.toml
@@ -31,13 +31,14 @@ classifiers = [
"Topic :: Scientific/Engineering :: Bio-Informatics",
]
dependencies = [
"numpy >= 1.20.3",
"numpy >= 1.20.3, < 2",
"pandas == 1.5",
"polars >= 0.18.3",
"polars >= 1",
"pyarrow >= 8.0.0",
"pyranges < 0.0.128",
"numba",
"ray",
"scatac_fragment_tools",
"scatac_fragment_tools >= 0.1.2",
"scikit-learn",
"lda",
"matplotlib < 3.7",
@@ -144,3 +145,7 @@ max-doc-length = 88

[tool.ruff.lint.flake8-tidy-imports]
ban-relative-imports = "all"

[tool.mypy]
mypy_path = "$MYPY_CONFIG_FILE_DIR/stubs"
plugins = ["numpy.typing.mypy_plugin"]
7 changes: 5 additions & 2 deletions src/pycisTopic/cistopic_class.py
@@ -14,10 +14,10 @@
get_position_index,
non_zero_rows,
prepare_tag_cells,
read_fragments_from_file,
region_names_to_coordinates,
subset_list,
)
from pycisTopic.fragments import read_fragments_to_pyranges
from scipy import sparse

if TYPE_CHECKING:
@@ -813,7 +813,10 @@ def create_cistopic_object_from_fragments(
if path_to_fragments is not None:
log.info("Using fragments of provided pandas data frame")
else:
fragments = read_fragments_from_file(path_to_fragments, use_polars=use_polars)
fragments = read_fragments_to_pyranges(
fragments_bed_filename=path_to_fragments,
engine = "polars"
)

if "Score" not in fragments.df:
fragments_df = fragments.df