Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mic plots #162

Merged
merged 88 commits into from
Mar 4, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
88 commits
Select commit Hold shift + click to select a range
996d60a
Start of MicPlotter class
martinghunt Feb 17, 2017
82ef5c3
Option --only_clusters instead of --noly_cluster
martinghunt Feb 17, 2017
4495b0f
new method _load_summary_file
martinghunt Feb 17, 2017
89b2a1f
New method to_boxplot_tsv
martinghunt Feb 17, 2017
50994ac
Return mutation info
martinghunt Feb 17, 2017
c940627
New method to_dots_tsv
martinghunt Feb 20, 2017
1ff9338
Separate with dots instead of plus for R happiness
martinghunt Feb 20, 2017
f0a50e9
Add method _make_plot, fix some file format issues
martinghunt Feb 21, 2017
b0e50ec
Add task micplot
martinghunt Feb 21, 2017
2ff0f0c
Allow columns starting with digits to stop R doing that annoying thin…
martinghunt Feb 21, 2017
8ce89bf
Add option --plot_types
martinghunt Feb 21, 2017
d3c499a
Remove debug print
martinghunt Feb 21, 2017
6644a23
Fix log_y option
martinghunt Feb 21, 2017
3c4aa3e
Add options jitter_width jitter_height
martinghunt Feb 21, 2017
8132972
Tidy up help for micplot
martinghunt Feb 21, 2017
723910d
Optional column 2 in filenames fofn
martinghunt Feb 22, 2017
3497108
Add option --no_combinations
martinghunt Feb 22, 2017
b8de5c7
nodingbats to keep Illustrator happy with the pdf
martinghunt Feb 22, 2017
f24e8a4
Add option --mic_values
martinghunt Feb 22, 2017
5b354fd
New option --hlines
martinghunt Feb 22, 2017
362ead9
fix usage typo
martinghunt Feb 22, 2017
38520b9
add options --point_size --dot_size
martinghunt Feb 22, 2017
1694631
Fix quotes in usage
martinghunt Feb 22, 2017
902f092
R library latex2exp not needed
martinghunt Feb 22, 2017
663667f
Remove dependency on R library cowplot
martinghunt Feb 22, 2017
b9a5da9
Remove whitespace in R data files
martinghunt Feb 23, 2017
7cc5edd
Add option --panel_heights
martinghunt Feb 23, 2017
893ac18
Add options --palette --number_of_colours
martinghunt Feb 23, 2017
673d796
Bug fix using chosen palette with 2 colours
martinghunt Feb 23, 2017
c6de9c1
Proprtionally sized point plotting
martinghunt Feb 23, 2017
f21dcfe
Add option --dot_outline
martinghunt Feb 23, 2017
fdac1d6
Tidy up indentation
martinghunt Feb 23, 2017
f7a5de7
Add legend and related options
martinghunt Feb 23, 2017
9767db1
Do not try to log zero (breaks older R versions in ggplot)
martinghunt Feb 24, 2017
50e24d2
Fix y axis label when logging
martinghunt Feb 24, 2017
b212674
Fixes for older R versions
martinghunt Feb 24, 2017
ddf74be
Fix for older R version
martinghunt Feb 24, 2017
bff8514
Add option --use_hets
martinghunt Feb 24, 2017
b389eff
No hlines by default
martinghunt Feb 24, 2017
4909635
New option --dot_y_text_size
martinghunt Feb 24, 2017
3ca945f
Check depth not too low
martinghunt Feb 24, 2017
e441a78
Change variant string format in summary output
martinghunt Feb 24, 2017
9d5dfb8
Group micplot options
martinghunt Feb 24, 2017
27902b1
Sort mutations by position; speed up avoid for loop
martinghunt Feb 27, 2017
cbc0f39
Report filename when wrong number of columns
martinghunt Feb 27, 2017
9c86b78
Add option --interrupted
martinghunt Feb 27, 2017
c02f1ec
Add --interrupted option
martinghunt Feb 27, 2017
4261c35
Fix sorting variant names
martinghunt Feb 27, 2017
3c60d72
Remove unused option
martinghunt Feb 27, 2017
800fc7c
Bug fix reporting hets not really present
martinghunt Feb 27, 2017
5614d62
bug fix losing without_mutation
martinghunt Feb 27, 2017
6d76f9d
Add options --violin_y_jitter --violin_scale_width
martinghunt Feb 28, 2017
7332426
metavar=float
martinghunt Feb 28, 2017
f78a2f5
Use matplotlib instead of R
martinghunt Feb 28, 2017
a95def6
New method _nucmer_hits_to_ref_and_qry_coords
Mar 1, 2017
9cbaf33
Bug fix when one contig hits same ref position in two different place…
Mar 1, 2017
db5ef80
add hidden --xkcd option
martinghunt Mar 2, 2017
b5a367b
plot_width/height fix (were swapped)
martinghunt Mar 2, 2017
cbd4fd4
Fix plot_width/height usage
martinghunt Mar 2, 2017
9bcb29d
Add --colour_skip option
martinghunt Mar 2, 2017
f445903
Rename palette -> colourmap
martinghunt Mar 2, 2017
e324a81
Tweak usage, add metavar
martinghunt Mar 2, 2017
c003682
Remove jitter_height option
martinghunt Mar 2, 2017
07ff877
New method _filter_top_data
martinghunt Mar 2, 2017
93a8862
Add option --min_samples
martinghunt Mar 2, 2017
9b28a32
Implement --point_size option
martinghunt Mar 2, 2017
f065a87
Fix typo in usage
martinghunt Mar 2, 2017
ec30404
Implement x jitter
martinghunt Mar 2, 2017
fe7d072
Report presence/absence sequences
martinghunt Mar 2, 2017
0927a65
Ignore MULTIPLE
martinghunt Mar 2, 2017
e7f018c
fix --plot_types (remove option of boxplot)
martinghunt Mar 2, 2017
add501b
Fix short usage
martinghunt Mar 2, 2017
3e1a80d
Remove unused options
martinghunt Mar 2, 2017
bdd7746
Implement --dot_y_text_size
martinghunt Mar 2, 2017
0aac512
Implement --dot_outline
martinghunt Mar 2, 2017
929625e
Add a little to top y axis min/max
martinghunt Mar 2, 2017
844ee4b
Add options --panel_widths --count_legend_x
martinghunt Mar 2, 2017
5c42985
Add option out_format
martinghunt Mar 2, 2017
d4f5ad2
Bug fix checking if variant only
martinghunt Mar 2, 2017
9d0b3e2
Add Mann Whiteny test
martinghunt Mar 3, 2017
24e714f
Tidy up counts legend
martinghunt Mar 3, 2017
283b253
Add --point_scale option
martinghunt Mar 3, 2017
080c59c
Add option --p_cutoff and do p-value correction
martinghunt Mar 3, 2017
8c8b9b3
report effect size
Mar 4, 2017
2403016
version bump 2.8.0
Mar 4, 2017
e6feb25
Require matplotlib
Mar 4, 2017
0d2904f
fix print to work with python 3.4.2
Mar 4, 2017
c2f3935
Version bump 2.8.1
Mar 4, 2017
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,8 @@ Once the dependencies are installed, install ARIBA using pip:

ARIBA also depends on several Python packages, all of which are available
via pip, so the above command will get those automatically if they
are not installed. The packages are dendropy >= 4.2.0,
are not installed. The packages are dendropy >= 4.2.0, matplotlib (no
minimum version required, but only tested on 2.0.0),
pyfastaq >= 3.12.0, pysam >= 0.9.1, and pymummer >= 0.10.1.

Alternatively, you can download the latest release from this github repository,
Expand Down
1 change: 1 addition & 0 deletions ariba/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
'mapping',
'megares_data_finder',
'megares_zip_parser',
'mic_plotter',
'mlst_profile',
'mlst_reporter',
'pubmlst_getter',
Expand Down
33 changes: 33 additions & 0 deletions ariba/assembly_compare.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,39 @@ def nucmer_hits_to_ref_coords(cls, nucmer_hits, contig=None):
return coords


@classmethod
def nucmer_hits_to_ref_and_qry_coords(cls, nucmer_hits, contig=None):
'''Same as nucmer_hits_to_ref_coords, except removes containing hits first,
and returns ref and qry coords lists'''
if contig is None:
ctg_coords = {key: [] for key in nucmer_hits.keys()}
else:
ctg_coords = {contig: []}

ref_coords = {}

for key in ctg_coords:
hits = copy.copy(nucmer_hits[key])
hits.sort(key=lambda x: len(x.ref_coords()))

if len(hits) > 1:
i = 0
while i < len(hits) - 1:
c1 = hits[i].ref_coords()
c2 = hits[i+1].ref_coords()
if c2.contains(c1):
hits.pop(i)
else:
i += 1

ref_coords[key] = [hit.ref_coords() for hit in hits]
ctg_coords[key] = [hit.qry_coords() for hit in hits]
pyfastaq.intervals.merge_overlapping_in_list(ref_coords[key])
pyfastaq.intervals.merge_overlapping_in_list(ctg_coords[key])

return ctg_coords, ref_coords


@staticmethod
def ref_cov_per_contig(nucmer_hits):
'''Input is hits made by self._parse_nucmer_coords_file.
Expand Down
23 changes: 19 additions & 4 deletions ariba/assembly_variants.py
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ def _get_remaining_known_ref_variants(known_ref_variants, used_ref_variants, nuc
return variants


def get_variants(self, ref_sequence_name, nucmer_coords):
def get_variants(self, ref_sequence_name, allowed_ctg_coords, allowed_ref_coords, nucmer_matches=None):
'''Nucmr coords = dict. Key=contig name. Value = list of intervals of ref coords that match the contig.
Made by assembly_compare.AssemblyCompare.nucmer_hits_to_ref_coords
Returns dictionary. Key=contig name. Value = list of variants. Each variant
Expand All @@ -287,12 +287,27 @@ def get_variants(self, ref_sequence_name, nucmer_coords):

known_non_wild_variants_in_ref = self.refdata.all_non_wild_type_variants(ref_sequence_name)

for contig in nucmer_coords:
for contig in allowed_ctg_coords:
if contig not in allowed_ref_coords:
continue

used_known_variants = set()
variants[contig] = []

if contig in mummer_variants:
for mummer_variant_list in mummer_variants[contig]:
ref_start = min([x.ref_start for x in mummer_variant_list])
ref_end = max([x.ref_end for x in mummer_variant_list])
ctg_start = min([x.qry_start for x in mummer_variant_list])
ctg_end = min([x.qry_end for x in mummer_variant_list])
ref_interval = intervals.Interval(ref_start, ref_end)
ctg_interval = intervals.Interval(ctg_start, ctg_end)
ref_ok = True in {x.intersects(ref_interval) for x in allowed_ref_coords[contig]}
qry_ok = True in {x.intersects(ctg_interval) for x in allowed_ctg_coords[contig]}

if not (ref_ok and qry_ok):
continue

if seq_type == 'p':
new_variant, used_variants = self._get_one_variant_for_one_contig_coding(ref_sequence, refdata_var_dict, mummer_variant_list)
else:
Expand All @@ -306,10 +321,10 @@ def get_variants(self, ref_sequence_name, nucmer_coords):
# for this contig, need to know all the ref sequence and coords it maps to.
# Then report just the unused known variants, as the contig also has these variants
if seq_type == 'p':
new_variants = self._get_remaining_known_ref_variants(known_non_wild_variants_in_ref['p'], used_known_variants, nucmer_coords[contig])
new_variants = self._get_remaining_known_ref_variants(known_non_wild_variants_in_ref['p'], used_known_variants, allowed_ref_coords[contig])

else:
new_variants = self._get_remaining_known_ref_variants(known_non_wild_variants_in_ref['n'], used_known_variants, nucmer_coords[contig])
new_variants = self._get_remaining_known_ref_variants(known_non_wild_variants_in_ref['n'], used_known_variants, allowed_ref_coords[contig])

if is_variant_only:
new_variants = [x for x in new_variants if len(x[5]) > 0]
Expand Down
4 changes: 2 additions & 2 deletions ariba/cluster.py
Original file line number Diff line number Diff line change
Expand Up @@ -382,9 +382,9 @@ def _run(self):
self.assembly_compare.run()
self.status_flag = self.assembly_compare.update_flag(self.status_flag)

nucmer_hits_to_ref = assembly_compare.AssemblyCompare.nucmer_hits_to_ref_coords(self.assembly_compare.nucmer_hits)
allowed_ctg_pos, allowed_ref_pos = assembly_compare.AssemblyCompare.nucmer_hits_to_ref_and_qry_coords(self.assembly_compare.nucmer_hits)
assembly_variants_obj = assembly_variants.AssemblyVariants(self.refdata, self.assembly_compare.nucmer_snps_file)
self.assembly_variants = assembly_variants_obj.get_variants(self.ref_sequence.id, nucmer_hits_to_ref)
self.assembly_variants = assembly_variants_obj.get_variants(self.ref_sequence.id, allowed_ctg_pos, allowed_ref_pos)

for var_list in self.assembly_variants.values():
for var in var_list:
Expand Down
3 changes: 2 additions & 1 deletion ariba/ext/vcfcall_ariba.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,8 @@ void vectorSumAndMax(const std::vector<uint32_t>& v, uint32_t& maxValue, uint32_

bool depthOk(const uint32_t& totalDepth, const uint32_t& testNum, const uint32_t& minSecondDepth, const float& maxAlleleFreq)
{
return (testNum >= minSecondDepth && 1.0 * testNum / totalDepth <= maxAlleleFreq);
double depthFreq = 1.0 * testNum / totalDepth;
return (testNum >= minSecondDepth && depthFreq >= (1 - maxAlleleFreq) && depthFreq <= maxAlleleFreq);
}


Expand Down
Loading