Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing tmp file during bamCoverage #546

Closed
blaiseli opened this issue Jun 30, 2017 · 26 comments
Closed

missing tmp file during bamCoverage #546

blaiseli opened this issue Jun 30, 2017 · 26 comments

Comments

@blaiseli
Copy link

I'm trying to obtain bigwig files for some RNA-seq data, and it takes a lot of time and ends up crashing on a FileNotFoundError for a file in /tmp

I noticed that the process used quite a lot of RAM: it seemed to be using almost all what was available (around 10 GB), but I experienced no swap symptoms. I did not observe lot of IO going on with iotop.

Is it possible that my workstation has not enough RAM for this and ends up loosing files in /tmp. Is it something that can happen?

I updated to deeptools-2.5.1, and the error is still there.

$ bamCoverage -b mapping_gonad_met2set25/hisat2/mapped_C_elegans/met2set25_2_on_C_elegans_sorted.bam --filterRNAstrand reverse -of=bigwig -bs 10 -p=1 --scaleFactor ${scale} -o mapping_gonad_met2set25/hisat2/mapped_C_elegans/met2set25_2/met2set25_2_on_C_elegans_norm_fwd.bw 
bamFilesList: ['mapping_gonad_met2set25/hisat2/mapped_C_elegans/met2set25_2_on_C_elegans_sorted.bam']
binLength: 10
numberOfSamples: None
blackListFileName: None
defaultFragmentLength: read length
numberOfProcessors: 1
verbose: False
region: None
bedFile: None
minMappingQuality: None
ignoreDuplicates: False
chrsToSkip: []
stepSize: 10
center_read: False
samFlag_include: None
samFlag_exclude: None
minFragmentLength: 0
maxFragmentLength: 0
zerosToNans: False
smoothLength: None
save_data: False
out_file_for_raw_data: None
maxPairedFragmentLength: 1000
filter_strand: reverse
Offset: [1, -1]
Traceback (most recent call last):
  File "/home/bli/.local/bin/bamCoverage", line 11, in <module>
    main(args)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/bamCoverage.py", line 238, in main
    format=args.outFileFormat, smoothLength=args.smoothLength)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 164, in run
    chrom_names_and_size, bedgraph_file, out_file_name, True)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 296, in bedGraphToBigWig
    f = open("{}.sorted".format(_file.name))
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpf444sdye.sorted'

(${scale} is a number close to 1, if that matters)

@dpryan79
Copy link
Collaborator

How much space is free in /tmp?

@blaiseli
Copy link
Author

blaiseli commented Jun 30, 2017

I'm not sure this is the correct way of checking:

$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       458G  343G   92G  79% /

This is before and at the beginning of a run of the bamCoverage command shown earlier, when bamCoverage still does not use a lot of RAM.

After some time, the available RAM is almost entirely used, no swapping observed, bamCoverage is running at 100% of one CPU. df -h /tmp still shows 92G available.

I spot the following lines in the output of du -hs /tmp/*(which took about 1 or 2 minutes to run)

1.7M	/tmp/_deeptools_84vs5do_.bg
1.9M	/tmp/_deeptools_8g4a75ls.bg
1004K	/tmp/_deeptools__kr8qupd.bg
1.9M	/tmp/_deeptools_tzai631l.bg
1008K	/tmp/_deeptools_y2k375qo.bg

There are also 2 other recent files in /tmp:

$ ls -lhrt /tmp/ | tail -7
-rw------- 1 bli  Mhe  1007K Jun 30 16:02 _deeptools_y2k375qo.bg
-rw------- 1 bli  Mhe   1.7M Jun 30 16:04 _deeptools_84vs5do_.bg
-rw------- 1 bli  Mhe   1.9M Jun 30 16:07 _deeptools_8g4a75ls.bg
-rw------- 1 bli  Mhe   1.9M Jun 30 16:09 _deeptools_tzai631l.bg
-rw------- 1 bli  Mhe  1001K Jun 30 16:10 _deeptools__kr8qupd.bg
-rw------- 1 bli  Mhe   513K Jun 30 16:21 ycmd_53944_stderr_szm0drdd.log
-rw------- 1 bli  Mhe   1.8K Jun 30 16:22 ycmd_32835_stderr_nhebjixr.log

The bam file is 1.5G.

@blaiseli
Copy link
Author

blaiseli commented Jun 30, 2017

To be noted: I see a /tmp/tmpf444sdye file dating from the begining of this afternoon. This file name pattern corresponds to the missing *.sorted file reported in my first message:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpf444sdye.sorted'

I now see more /tmp/_deeptools_*.bg files, but still no /tmp/tmp?????

bamCoverage still running at 100% CPU and using a little less RAM now.

@blaiseli
Copy link
Author

Update: About an hour after starting, bamCoverage crashed:

Traceback (most recent call last):
  File "/home/bli/.local/bin/bamCoverage", line 11, in <module>
    main(args)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/bamCoverage.py", line 238, in main
    format=args.outFileFormat, smoothLength=args.smoothLength)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 164, in run
    chrom_names_and_size, bedgraph_file, out_file_name, True)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 296, in bedGraphToBigWig
    f = open("{}.sorted".format(_file.name))
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpwqppp77v.sorted'

The only recent files left in /tmp are the following:

-rw------- 1 bli  Mhe    82 Jun 30 17:01 tmpwqppp77v
-rw------- 1 bli  Mhe  513K Jun 30 17:01 ycmd_53944_stderr_szm0drdd.log
-rw------- 1 bli  Mhe  2.3K Jun 30 17:02 ycmd_32835_stderr_nhebjixr.log

The /tmp/ycmd* files are actually completely unrelated: from their content I guess that they are generated by my text editor.

I couldn't monitor precisely what happened during the last minute, but the last time I checked the recent files in /tmp, I could see the following:

-rw------- 1 bli  Mhe  1007K Jun 30 16:02 _deeptools_y2k375qo.bg
-rw------- 1 bli  Mhe   1.7M Jun 30 16:04 _deeptools_84vs5do_.bg
-rw------- 1 bli  Mhe   1.9M Jun 30 16:07 _deeptools_8g4a75ls.bg
-rw------- 1 bli  Mhe   1.9M Jun 30 16:09 _deeptools_tzai631l.bg
-rw------- 1 bli  Mhe  1001K Jun 30 16:10 _deeptools__kr8qupd.bg
-rw------- 1 bli  Mhe   989K Jun 30 16:27 _deeptools_fnqqxtbv.bg
-rw------- 1 bli  Mhe   670K Jun 30 16:27 _deeptools_iw298w3l.bg
-rw------- 1 bli  Mhe   898K Jun 30 16:28 _deeptools_4uvy5ayn.bg
-rw------- 1 bli  Mhe   1.5M Jun 30 16:30 _deeptools_czrjci6r.bg
-rw------- 1 bli  Mhe   1.5M Jun 30 16:32 _deeptools_3di3imub.bg
-rw------- 1 bli  Mhe   1.4M Jun 30 16:33 _deeptools_x1woahhq.bg
-rw------- 1 bli  Mhe   844K Jun 30 16:33 _deeptools_tutsgh05.bg
-rw------- 1 bli  Mhe   943K Jun 30 16:34 _deeptools_4r2xgyv9.bg
-rw------- 1 bli  Mhe   1.8M Jun 30 16:36 _deeptools_u9dbl_il.bg
-rw------- 1 bli  Mhe   1.9M Jun 30 16:39 _deeptools_naeopo4g.bg
-rw------- 1 bli  Mhe   1.9M Jun 30 16:41 _deeptools_0j5hswrk.bg
-rw------- 1 bli  Mhe   1.4M Jun 30 16:41 _deeptools_bq70ao25.bg
-rw------- 1 bli  Mhe   380K Jun 30 16:42 _deeptools_0p8j_t2o.bg
-rw------- 1 bli  Mhe   880K Jun 30 16:42 _deeptools_u5e07_45.bg
-rw------- 1 bli  Mhe   898K Jun 30 16:43 _deeptools_xuwreca3.bg
-rw------- 1 bli  Mhe   1.1M Jun 30 16:45 _deeptools_hqp9_jbw.bg
-rw------- 1 bli  Mhe   1.4M Jun 30 16:46 _deeptools_6c237bz0.bg
-rw------- 1 bli  Mhe   1.5M Jun 30 16:48 _deeptools_qtz3rhmy.bg
-rw------- 1 bli  Mhe   655K Jun 30 16:49 _deeptools_xzcpjdsp.bg
-rw------- 1 bli  Mhe   287K Jun 30 16:49 _deeptools_ouplxyub.bg
-rw------- 1 bli  Mhe   513K Jun 30 16:51 ycmd_53944_stderr_szm0drdd.log
-rw------- 1 bli  Mhe   2.2K Jun 30 16:52 ycmd_32835_stderr_nhebjixr.log
-rw------- 1 bli  Mhe    31K Jun 30 16:53 _deeptools_6dbt4pph.bg
-rw------- 1 bli  Mhe   454K Jun 30 16:54 _deeptools_cf3fdlmc.bg
-rw------- 1 bli  Mhe   502K Jun 30 16:54 _deeptools_9r0zw06_.bg
-rw------- 1 bli  Mhe   810K Jun 30 16:55 _deeptools_kqctrui_.bg
-rw------- 1 bli  Mhe   944K Jun 30 16:56 _deeptools_n_ub8g40.bg
-rw------- 1 bli  Mhe   1.1M Jun 30 16:58 _deeptools_9pxg9986.bg
-rw------- 1 bli  Mhe   852K Jun 30 16:59 _deeptools_k94qzeev.bg
-rw------- 1 bli  Mhe   450K Jun 30 16:59 _deeptools_fztil8ie.bg
-rw------- 1 bli  Mhe   561K Jun 30 16:59 _deeptools_zfkm1sqh.bg
-rw------- 1 bli  Mhe   552K Jun 30 17:00 _deeptools_y09_v87f.bg
-rw------- 1 bli  Mhe   662K Jun 30 17:00 _deeptools_6w5vcn4f.bg
-rw------- 1 bli  Mhe   536K Jun 30 17:00 _deeptools_r_o780de.bg
-rw------- 1 bli  Mhe   588K Jun 30 17:00 _deeptools__30md89q.bg
-rw------- 1 bli  Mhe   609K Jun 30 17:00 _deeptools_7g75wayw.bg

The /tmp/tmpwqppp77v file has therefore been created only during the last minute over about one hour of run.

@blaiseli
Copy link
Author

I looked at the code where the error occurs:

sort_cmd = cfg.config.get('external_tools', 'sort')
    _file = NamedTemporaryFile(delete=False)
    for chrom, size in chromSizes:
        _file.write(toBytes("{}\t{}\n".format(chrom, size)))
    _file.close()
    system("LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name))

I'm able to "manually" create the .sorted file in python, but I don't know what cfg.config.get('external_tools', 'sort') actually returns:

system("LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format("sort", "/tmp/tmpwqppp77v", "/tmp/tmpwqppp77v"))

And I now have /tmp/tmpwqppp77v.sorted

@blaiseli
Copy link
Author

I tried to edit the code as follows:

    ret_code = system("LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name))
    if ret_code != 0:
        raise OSError()

The following error shows that the sort command fails:

Traceback (most recent call last):
  File "/home/bli/.local/bin/bamCoverage", line 11, in <module>
    main(args)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/bamCoverage.py", line 238, in main
    format=args.outFileFormat, smoothLength=args.smoothLength)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 164, in run
    chrom_names_and_size, bedgraph_file, out_file_name, True)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 296, in bedGraphToBigWig
    raise OSError()
OSError

I should have taken the opportunity to insert more debugging information...

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

With more debugging info:

    sort_cmd = cfg.config.get('external_tools', 'sort')
    _file = NamedTemporaryFile(delete=False)
    for chrom, size in chromSizes:
        _file.write(toBytes("{}\t{}\n".format(chrom, size)))
    _file.close()
    print(sort_cmd)
    print(_file.name)
    print("LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name))
    print(', '.join(listdir("/tmp")))
    ret_code = system("LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name))
    if ret_code != 0:
        raise OSError()
sort
/tmp/tmpubs_xfmx
LC_ALL=C sort -k1,1 -k2,2n /tmp/tmpubs_xfmx > /tmp/tmpubs_xfmx.sorted
.read_FCS.py.swp, jedihttp_58492_stderr_D_bomT.log, ycmd_32835_stderr_nhebjixr.log, .XIM-unix, .X1-lock, ycmd_32835_stdout_5yzzor1l.log, tmpf444sdye, .Test-unix, tmpeherjtxt, config-err-CNkHGE, systemd-private-4106110627ad405f89ae62a1d7527b8e-rtkit-daemon.service-tZbpvy, ycmd_53944_stderr_szm0drdd.log, .xfsm-ICE-CWCQ2Y, tmpk8b79_xe, systemd-private-4106110627ad405f89ae62a1d7527b8e-colord.service-h2iEnX, orbit-bli, .X11-unix, mozilla_bli0, jedihttp_40612_stderr_Dh_RYP.log, read_FCS.py, tmpubs_xfmx, ycm_od7ubot_.log, jedihttp_45110_stdout_0rabAW.log, .font-unix, tmpz73tm66i, ycm_132e0kih.log, tmpwqppp77v, thunderbird_bli, jedihttp_58492_stdout_ReROSs.log, ycmd_53944_stdout_shccctv6.log, jedihttp_45110_stderr_06xJAv.log, .X0-lock, tmp7clyo100, jedihttp_40612_stdout_ZLZn1O.log, tmp2535m6j6, .ICE-unix, ycmd_54594_stderr_oklmw42x.log, ycmd_54594_stdout_ln2ofpsx.log, tmpwqppp77v.sorted, ycm_xdwiz7xs.log, firefox_bli
Traceback (most recent call last):
  File "/home/bli/.local/bin/bamCoverage", line 11, in <module>
    main(args)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/bamCoverage.py", line 238, in main
    format=args.outFileFormat, smoothLength=args.smoothLength)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 164, in run
    chrom_names_and_size, bedgraph_file, out_file_name, True)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 300, in bedGraphToBigWig
    raise OSError()
OSError

The file /tmp/tmpubs_xfmx is present in the os.listdir generated list, and if I manually run LC_ALL=C sort -k1,1 -k2,2n /tmp/tmpubs_xfmx > /tmp/tmpubs_xfmx.sorted, there's no error and a corresponding .sorted file is generated.

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

Side question: What's the point of this piece of code?

What I see is that the NamedTemporaryFile is used to store chromosome names and lengths, then this is unix-sorted, and the sorted file is read into a list of (chrom, int(chromLen)) pairs. Then both files are removed.

Isn't it possible to achieve the same sort within python?

@dpryan79
Copy link
Collaborator

dpryan79 commented Jul 3, 2017

Possibly. This is pretty old code and predates me, but my understanding is that the goal was to produce consistent sorting across machines with different locale settings. That's possible with forcing LC_ALL=C on the command line, but I don't know if it can be done internally in python. My personal preference would be to just use the same order that the input BAM file(s) have and forget ordering. I'd also write only BigWig files as temp files to cut down on space requirements and then use some custom merging code.

@dpryan79
Copy link
Collaborator

dpryan79 commented Jul 3, 2017

Out of curiousity, what is the ret_code? That could be telling as to what's going wrong.

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

According to https://stackoverflow.com/questions/8776807/is-pythons-sort-function-the-same-as-linuxs-sort-with-lc-all-c, the correct sorting behaviour can be enforced the following way:

import locale
locale.setlocale(locale.LC_ALL, "C")

And then use cmp=locale.strcoll in the sort command.

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

I modified the code as follows in order to report the exit code and try using python sort:

    print("LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name))
    ret_code = system("LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name))
    if ret_code != 0:
        print("command returned with %d exit code\nTrying python sort" % ret_code)
        cl = sorted([(chrom, int(chromLen)) for chrom, chromLen in chromSizes])
    else:
        cl = []
        f = open("{}.sorted".format(_file.name))
        for line in f:
            chrom, chromLen = line.split()
            cl.append((chrom, int(chromLen)))
        f.close()
        remove("{}.sorted".format(_file.name))
    remove(_file.name)

Here is the output:

LC_ALL=C sort -k1,1 -k2,2n /tmp/tmpixq7m0g6 > /tmp/tmpixq7m0g6.sorted
command returned with -1 exit code
Trying python sort
[bwClose] There was an error while finishing writing a bigWig file! The output is likely truncated.

@dpryan79
Copy link
Collaborator

dpryan79 commented Jul 3, 2017

What does cl look like after it's been sorted?

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

I tried to use the locale.strcoll trick, but one apparently cannot use cmp in sorted in python 3. There is a functools.cmp_to_key utility to convert this to a sorting key. It's currently running.

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

I had to do it differently because TypeError: strcoll() argument 1 must be str, not tuple:

    import locale
    locale.setlocale(locale.LC_ALL, "C")
    from functools import cmp_to_key
    def str_int_cmp(pair1, pair2):
        str1, int1 = pair1
        str2, int2 = pair2
        if locale.strcoll(str1, str2) < 0:
            return -1
        elif locale.strcoll(str1, str2) > 0:
            return 1
        else:
            return int1 - int2


    # Make a list of tuples for the bigWig header, this MUST be sorted identically to the bedGraph file
    sort_cmd = cfg.config.get('external_tools', 'sort')
    _file = NamedTemporaryFile(delete=False)
    for chrom, size in chromSizes:
        _file.write(toBytes("{}\t{}\n".format(chrom, size)))
    _file.close()
    unix_sort = "LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name)
    ret_code = system(unix_sort)
    if ret_code != 0:
        import sys
        sys.stderr.write("command \"%s\" returned with %d exit code\nTrying python sort\n" % (unix_sort, ret_code))
        cl = sorted([(chrom, int(chromLen)) for chrom, chromLen in chromSizes], key=cmp_to_key(str_int_cmp))
        sys.stderr.write("\n".join(["%s\t%d" % (chr, chr_len) for chr, chr_len in cl]))
    else:
        cl = []
        f = open("{}.sorted".format(_file.name))
        for line in f:
            chrom, chromLen = line.split()
            cl.append((chrom, int(chromLen)))
        f.close()
        remove("{}.sorted".format(_file.name))
    remove(_file.name)

I think the comparison function can be simplified as follows:

def str_int_cmp(pair1, pair2):
    str1, int1 = pair1
    str2, int2 = pair2
    return locale.strcoll(str1, str2) or (int1 - int2)

But this looks a bit "hackish".

Probably better:

def str_int_key(pair):
    return (locale.strxfrm(pair[0]), pair[1])

And use this directly as key in sorted.

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

When I run the code using sorted as described above (with the first version of str_int_cmp), I get the following:

command "LC_ALL=C sort -k1,1 -k2,2n /tmp/tmpilsh0syc > /tmp/tmpilsh0syc.sorted" returned with -1 exit code
Trying python sort
I	15072434
II	15279421
III	13783801
IV	17493829
MtDNA	13794
V	20924180
X	17718942[bwClose] There was an error while finishing writing a bigWig file! The output is likely truncated.

@fidelram
Copy link
Collaborator

fidelram commented Jul 3, 2017 via email

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

I just noticed this in my dmesg output:

[346800.176174] INFO: task bamCoverage:28788 blocked for more than 120 seconds.
[346800.176181]       Not tainted 4.4.0-81-generic #104-Ubuntu
[346800.176183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[346800.176186] bamCoverage     D ffff8801056ffc68     0 28788   2823 0x00000000
[346800.176192]  ffff8801056ffc68 ffff8801056ffc60 ffff88041ce90e00 ffff880347721c00
[346800.176197]  ffff880105700000 ffff88041b4d5984 ffff880347721c00 00000000ffffffff
[346800.176201]  ffff88041b4d5988 ffff8801056ffc80 ffffffff8183c955 ffff88041b4d5980
[346800.176204] Call Trace:
[346800.176216]  [<ffffffff8183c955>] schedule+0x35/0x80
[346800.176220]  [<ffffffff8183cbfe>] schedule_preempt_disabled+0xe/0x10
[346800.176224]  [<ffffffff8183e839>] __mutex_lock_slowpath+0xb9/0x130
[346800.176228]  [<ffffffff8183e8cf>] mutex_lock+0x1f/0x30
[346800.176236]  [<ffffffff8121df5c>] path_openat+0x4bc/0x1330
[346800.176239]  [<ffffffff8121ffc1>] do_filp_open+0x91/0x100
[346800.176245]  [<ffffffff81226954>] ? dput+0x34/0x220
[346800.176249]  [<ffffffff8122d906>] ? __alloc_fd+0x46/0x190
[346800.176255]  [<ffffffff8120e468>] do_sys_open+0x138/0x2a0
[346800.176260]  [<ffffffff8120e5ee>] SyS_open+0x1e/0x20
[346800.176264]  [<ffffffff81840a72>] entry_SYSCALL_64_fastpath+0x16/0x71
[359980.976545] bamCoverage[3827]: segfault at 0 ip 00007f2797af7ed9 sp 00007ffe622b78e8 error 6 in libc-2.23.so[7f27979aa000+1c0000]

Could be related to the [bwClose] There was an error while finishing writing a bigWig file! The output is likely truncated. message.

@blaiseli
Copy link
Author

blaiseli commented Jul 3, 2017

I further edited writeBedGraph.py to catch possible system errors:

    if sort:
        # temporary file to store sorted bedgraph file
        _file = NamedTemporaryFile(delete=False)
        tempfilename1 = _file.name
        unix_sort = "LC_ALL=C {} -k1,1 -k2,2n {} > {}".format(sort_cmd, bedGraphPath, tempfilename1)
        ret_code = system(unix_sort)
        if ret_code != 0:
            raise OSError("Command \"%s\" failed with %d exit code." % (unix_sort, ret_code))
        bedGraphPath = tempfilename1

And it appears that this sort is actually also failing:

Traceback (most recent call last):
  File "/home/bli/.local/bin/bamCoverage", line 11, in <module>
    main(args)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/bamCoverage.py", line 238, in main
    format=args.outFileFormat, smoothLength=args.smoothLength)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 164, in run
    chrom_names_and_size, bedgraph_file, out_file_name, True)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 345, in bedGraphToBigWig
    raise OSError("Command \"%s\" failed with %d exit code." % (unix_sort, ret_code))
OSError: Command "LC_ALL=C sort -k1,1 -k2,2n mapping_gonad_met2set25/hisat2/mapped_C_elegans/met2set25_2/met2set25_2_on_C_elegans_norm_fwd.bw.bg > /tmp/tmpgmymc1c6" failed with -1 exit code.

I don't know what a -1 error code is supposed to mean in this context. I suspect it is not the direct error code returned by sort.

I modified the code to have better diagnostics:

    unix_sort = "LC_ALL=C {} -k1,1 -k2,2n {} > {}.sorted".format(sort_cmd, _file.name, _file.name)
    p = Popen(
        [unix_sort],
        stdout=PIPE,
        stderr=PIPE,
        shell=True)
    result, err = p.communicate()
    if p.returncode != 0:
        import sys
        #raise IOError("The following error occurred:\n%s\nThe command was:\n%s\nThe return code was %d" % (err, unix_sort, p.returncode))
        sys.stderr.write("The following error occurred:\n%s\nThe command was:\n%s\nThe return code was %d\n" % (err, unix_sort, p.returncode))
    #ret_code = system(unix_sort)
    #if ret_code != 0:
        #sys.stderr.write("Command \"%s\" returned with %d exit code.\nTrying python sort\n" % (unix_sort, ret_code))
        sys.stderr.write("Trying python sort.\n")
        cl = sorted([(chrom, int(chromLen)) for chrom, chromLen in chromSizes], key=str_int_key)
        sys.stderr.write("\n".join(["%s\t%d" % (chr, chr_len) for chr, chr_len in cl]) + "\n")
    else:
        cl = []
        f = open("{}.sorted".format(_file.name))
        for line in f:
            chrom, chromLen = line.split()
            cl.append((chrom, int(chromLen)))
        f.close()
        remove("{}.sorted".format(_file.name))
    remove(_file.name)

And I obtain the following result:

Traceback (most recent call last):
  File "/home/bli/.local/bin/bamCoverage", line 11, in <module>
    main(args)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/bamCoverage.py", line 238, in main
    format=args.outFileFormat, smoothLength=args.smoothLength)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 164, in run
    chrom_names_and_size, bedgraph_file, out_file_name, True)
  File "/home/bli/.local/lib/python3.6/site-packages/deeptools/writeBedGraph.py", line 319, in bedGraphToBigWig
    shell=True)
  File "/home/bli/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/home/bli/lib/python3.6/subprocess.py", line 1260, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

This is strange considering that I experience no swapping symptoms, so the RAM is not saturated.

@steffenheyne
Copy link
Collaborator

steffenheyne commented Jul 4, 2017

can you try
df -h
to see what size tmpfs really has and where this is mounted!? Because tmpfs can be mounted somewhere else, eg under /run/ although /tmp/ is used as well and has enough space. I'm not a linux system expert, but maybe some piping is using tmpfs (ie. swap, or?) ...

ulimit -a can be helpful as well...

@steffenheyne
Copy link
Collaborator

steffenheyne commented Jul 4, 2017

you can also try to set a fixed tmpdir in the deeptools code for sort, eg. sort -T /enough/space/ -k1,1 -k2,2n to rule out any strange default dir usage

@blaiseli
Copy link
Author

blaiseli commented Jul 4, 2017

Here are the output of df -h:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.8G     0  7.8G   0% /dev
tmpfs           1.6G   38M  1.6G   3% /run
/dev/sda1       458G  344G   92G  79% /
tmpfs           7.8G  1.5M  7.8G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/sdb1       1.8T   97G  1.7T   6% /extra
/dev/sdc1       1.8T  972G  769G  56% /extra2
tmpfs           1.6G   44K  1.6G   1% /run/user/1000

and ulimit -a

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63715
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63715
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I'm suspecting the issue may be related to me having set /proc/sys/vm/overcommit_memory to 2 about a year ago, after having experienced heavy swapping issues and read things such as https://www.etalabs.net/overcommit.html.

I set back /proc/sys/vm/overcommit_memory to 0, and I'll see it this has visible effects.

@blaiseli
Copy link
Author

blaiseli commented Jul 4, 2017

Some interesting reading that reinforces my suspicion about overcommit_memory: https://stackoverflow.com/a/13329386/1878788

Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM

I'm currently running the thing under valgrind --tool=massif (hoping that it can be useful understanding what is happening), after having set overcommit_memory back to 0.
It has been more than 6 and a half hours now and the program has not crashed yet.

It is quite surprizing that it takes so much time. Maybe this is valgrind massively slowing things [1]. There are no signs of swapping.

[1](It does: "Massif runs programs about 20x slower than normal.")

@blaiseli
Copy link
Author

blaiseli commented Jul 5, 2017

No crashes this time.

The memory usage graph obtained using ms_print is as follows:

    GB
7.286^           ::  :     :  :   :                                           
     |          #: ::::::@@::::::::::                                         
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                                          
     |          #: : ::: @ :: ::: ::                             :::::        
     |          #: : ::: @ :: ::: ::                             :: :         
     |          #: : ::: @ :: ::: ::            ::::             :: :         
     |   :::::::#: : ::: @ :: ::: ::     ::: :::::: ::      :::  :: :         
     | @:: : : :#: : ::: @ :: ::: ::   ::::  :::::: ::   :::::   :: :  :::::: 
   0 +----------------------------------------------------------------------->Ti
     0                                                                   12.20

So it actually never exceeds the available memory (around 10 GB), and the problem was likely this overcommit restriction I had set. I'll rerun the program without valgrind and back with os.system, just to be sure.

That said, I believe it would be useful to use subprocess, check for failures and report the command's error message. If you think it is a good idea, I can do a pull request.

@dpryan79
Copy link
Collaborator

dpryan79 commented Jul 5, 2017

Thanks for the update. Don't worry about making a PR, I plan to just rewrite that bit of code completely for 2.6.0 (don't close this issue, I'm using it as the reminder to do that!).

@dpryan79
Copy link
Collaborator

I've implemented the bigWig creation changes in the develop branch and they're slated for inclusion in the 2.6.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants