
When I invoke the poretools times command, the result is not what is expected #189

Open
PBGLMichaelHall opened this issue Jul 15, 2022 · 7 comments


@PBGLMichaelHall

issue

@gringer

gringer commented Jul 18, 2022

Poretools was created and developed at a time when fast5 files held only one read per file. Based on the file names, I'd guess you're looking at recent multi-fast5 files (probably from a Flongle), which hold multiple reads per file. ONT does provide utilities in their GitHub repository to convert from one format to the other, but I expect you'll get a better outcome for what you want by looking directly at the read summary output from basecalling.
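As a rough illustration of the one-read-versus-many-reads point, here is a minimal sketch that counts how many reads each FAST5 file contributes according to a sequencing summary. It assumes the summary is tab-separated with a header row that includes a `filename` column (that column appears in the summary column list posted later in this thread); `reads_per_file` and the file names in the example are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def reads_per_file(summary: TextIO) -> Counter:
    """Hypothetical helper: count reads per FAST5 file in a sequencing summary.

    ASSUMPTION: the summary is tab-separated with a header row that
    includes a 'filename' column.
    """
    reader = csv.DictReader(summary, delimiter="\t")
    return Counter(row["filename"] for row in reader)


if __name__ == "__main__":
    # Made-up summary: two reads share one multi-read file, one read is alone.
    example = io.StringIO(
        "filename\tread_id\tstart_time\n"
        "batch_0.fast5\tread-a\t1.5\n"
        "batch_0.fast5\tread-b\t2.0\n"
        "batch_1.fast5\tread-c\t3.25\n"
    )
    for name, n in sorted(reads_per_file(example).items()):
        print(name, n)
```

A count above 1 for a file indicates a multi-read file, the layout that poretools predates.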

@PBGLMichaelHall
Author

OK...
git clone https://github.com/nanoporetech/ont_fast5_api
pip install ./ont_fast5_api

python multi_to_single_fast5.py -i path/to/multi-fast5/directory -s some/output/directory

poretools times some/output/directory

WARNING:poretools:No start time for fast5_.fast5!
WARNING:poretools:No start time for fast5_.fast5!
WARNING:poretools:No start time for fast5_.fast5!
WARNING:poretools:No start time for fast5_.fast5!
...
It can find keyinfo now, but not the start times, after converting from multi to single!

@PBGLMichaelHall
Author

I need specific columns that poretools times generates but that are not in the sequencing summary text file produced by a MinION run. These column names are read by a Python script. The following columns are the ones that are needed but not currently generated. Is there a way to produce these values from the sequencing summary without using poretools times?

exp_starttime
unix_timestamp
unix_timestamp_end
iso_timestamp
read_length
day
hour
minute

@PBGLMichaelHall
Author

A list of the columns in the sequencing summary text file generated by a MinION run:

filename
read_id
run_id
batch_id
channel
mux
start_time
duration
num_events
passes_filtering
template_start
num_events_template
template_duration
sequence_length_template
mean_qscore_template
strand_score_template
median_template
mad_template
scaling_median_template
scaling_mad_template
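Going by the two lists above, most of the requested values could in principle be derived from the summary, provided the absolute start of the run is known. The sketch below is one way to do that, not a reimplementation of `poretools times`. It assumes that `start_time` and `duration` are seconds relative to the start of the run, treats `sequence_length_template` as the read length, and takes the run's absolute start (`exp_start_unix`, in Unix seconds) as an input that has to be obtained separately. `times_from_summary` is a hypothetical name, and it reports day/hour/minute in UTC, which may differ from what poretools printed.

```python
import csv
import io
import sys
from datetime import datetime, timezone
from typing import Dict, Iterator, TextIO


def times_from_summary(summary: TextIO, exp_start_unix: int) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: build poretools-times-like columns from a summary.

    ASSUMPTIONS: tab-separated summary; 'start_time' and 'duration' are
    seconds relative to the run start; exp_start_unix is the run's
    absolute start in Unix seconds, supplied by the caller.
    """
    reader = csv.DictReader(summary, delimiter="\t")
    for row in reader:
        start = exp_start_unix + float(row["start_time"])
        end = start + float(row["duration"])
        # UTC is used here; the original tool may have used local time.
        dt = datetime.fromtimestamp(start, tz=timezone.utc)
        yield {
            "channel": row["channel"],
            "filename": row["filename"],
            "read_length": row["sequence_length_template"],
            "exp_starttime": str(exp_start_unix),
            "unix_timestamp": str(int(start)),
            "duration": row["duration"],
            "unix_timestamp_end": str(int(end)),
            "iso_timestamp": dt.strftime("%Y-%m-%dT%H:%M:%S%z"),
            "day": dt.strftime("%d"),
            "hour": dt.strftime("%H"),
            "minute": dt.strftime("%M"),
        }


if __name__ == "__main__":
    # Made-up one-read summary; 1657843200 is 2022-07-15T00:00:00Z.
    example = io.StringIO(
        "filename\tchannel\tstart_time\tduration\tsequence_length_template\n"
        "batch_0.fast5\t12\t3725.5\t10.0\t850\n"
    )
    rows = list(times_from_summary(example, exp_start_unix=1657843200))
    writer = csv.DictWriter(sys.stdout, fieldnames=list(rows[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```

Writing the rows back out as a tab-separated table keeps the column names the downstream script expects, so only the input file would need to change.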

@gringer

gringer commented Jul 27, 2022

I'll repeat that it's really not a great idea to use this old software for processing new data. It seems odd to need UNIX timestamp values (and derived values) for every single read.

ONT changed their time representation between versions, and may have altered other aspects of FAST5 files. I think they changed from absolute time to relative time, so adding Unix timestamp values would require fetching the experiment start time from the sequencing logs.
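If the run's absolute start can be recovered (for example from the sequencing logs, as suggested above), it may arrive in different layouts depending on the software version. A minimal sketch, assuming the value is either plain Unix seconds or an ISO 8601 timestamp; `exp_start_to_unix` is a hypothetical helper, and any other layout would need extra handling.

```python
from datetime import datetime, timezone


def exp_start_to_unix(value: str) -> int:
    """Hypothetical helper: normalise an experiment start time to Unix seconds.

    ASSUMPTION: the value is either plain Unix seconds ("1657843200") or an
    ISO 8601 timestamp ("2022-07-15T00:00:00Z"); other layouts are not handled.
    """
    value = value.strip()
    if value.isdigit():
        return int(value)
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Treat timestamps without an offset as UTC rather than local time.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


if __name__ == "__main__":
    print(exp_start_to_unix("2022-07-15T00:00:00Z"))
```

The returned value is what would be added to each read's relative `start_time` to get an absolute Unix timestamp.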

Or you could add a constant timestamp value of 1st January 2000 to everything, to make it really obvious that the timestamps are incorrect.
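For the placeholder approach, the arithmetic is a fixed offset. A minimal sketch with hypothetical names: 946684800 is the Unix time of 2000-01-01T00:00:00Z, so every timestamp produced this way reads as a date in early 2000 and is easy to spot as artificial.

```python
from datetime import datetime, timezone

# Unix time of 2000-01-01T00:00:00Z (946684800): a deliberately implausible run start.
SENTINEL_START = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp())


def placeholder_unix_timestamp(relative_start_s: float) -> float:
    """Hypothetical helper: shift a run-relative start time onto the sentinel epoch."""
    return SENTINEL_START + relative_start_s


if __name__ == "__main__":
    ts = placeholder_unix_timestamp(3600.0)  # a read starting one hour into the run
    print(ts, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```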

@arq5x
Owner

arq5x commented Jul 27, 2022

Completely agree that this is no longer the toolset to use here. I need to update the README and make it obvious that poretools is deprecated owing to all of the ONT changes.

@PBGLMichaelHall
Author

Which version of poretools has the correct time representation (UNIX)?
