Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

help: Restore from hibernation frequently fails (tuxonice) #30

Open
MPagel opened this issue Jun 6, 2017 · 0 comments
Open

help: Restore from hibernation frequently fails (tuxonice) #30

MPagel opened this issue Jun 6, 2017 · 0 comments

Comments

@MPagel
Copy link

MPagel commented Jun 6, 2017

The full post is posted to https://askubuntu.com/questions/922763/restore-from-hibernation-frequently-fails-tuxonice with a few other system details.

My system is running the unity DM on xenial (16.04LTS) currently with kernel Linux [namehere] 4.4.0-78-generic-tuxonice #99~ppa1-Ubuntu SMP Thu May 18 20:29:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
...
Approximately 50% of the time that I hibernate, I can not restore this disk image on the first try. The process makes it to 100% of the main userspace + kernel copy/restore, but then freezes on one of the next several steps:

(Non-backlit blank screen)
Doing atomic copy/restore (at 100%)
Doing atomic copy/restore (at 0%)

(possibly Doing atomic copy/restore at 100% again)

Infrequently it makes it to the login screen, but then freezes.

Sometimes when the system locks, the caps lock light flashes steadily. Other times there's no outward indication that it's locked. After shutting down by holding in the power button waiting a few secs and turning it back on, it usually (maybe 75% of the time) is able to restore from hibernate (of course, if I hit the 'c' key to confirm that I want to try to use the image again).

If it locks up on the second try, I'll try up to 8 times to restore..but if it hasn't restored on the 4th try, I don't think I've ever had it restore later, so I'll eventually invalidate the image, my assumption being that some part of the hibernation image was saved improperly. If it locks up on the login screen it sometimes, but not always, has already removed the hibernation image, making it impossible to try to restore again with that image.

If I use the default (non-tuxonice) kernel, I SEEM to have a slightly higher rate of success on the first try, but in the event of failure I'm not provided with the option to try to use the image again. So, I haven't rigorously tested the rates of success with both kernels.

I seem to have issues restoring from image more frequently when I've changed power or display state (plugged in to unplugged or vice versa; external display to unplugged or vv) between when I created the image and restore the image. But the issues don't happen every time that occurs, nor do they exclusively happen when power or display state are changed.

I sometimes use suspend instead of hibernate. About 10% of the time, this locks on the login screen, displaying the time I put it to sleep rather than the current time. When shutting down and rebooting, there is no option to use an image to restore (as expected).

Failure rate for either hibernate or suspend does not seem to be dependent on the method of hibernation/suspend: selection from the power menu on the top menubar, closing laptop w/o both external display+power plugged in, critically low battery, or command-line invocation.

Again, the full version of this plea for help posted to https://askubuntu.com/questions/922763/restore-from-hibernation-frequently-fails-tuxonice

NigelCunningham pushed a commit that referenced this issue Jul 24, 2017
[ Upstream commit 15a77c6 ]

With >=32 CPUs the userfaultfd selftest triggered a graceful but
unexpected SIGBUS because VM_FAULT_RETRY was returned by
handle_userfault() despite the UFFDIO_COPY wasn't completed.

This seems caused by rwsem waking the thread blocked in
handle_userfault() and we can't run up_read() before the wait_event
sequence is complete.

Keeping the wait_even sequence identical to the first one, would require
running userfaultfd_must_wait() again to know if the loop should be
repeated, and it would also require retaking the rwsem and revalidating
the whole vma status.

It seems simpler to wait the targeted wakeup so that if false wakeups
materialize we still wait for our specific wakeup event, unless of
course there are signals or the uffd was released.

Debug code collecting the stack trace of the wakeup showed this:

  $ ./userfaultfd 100 99999
  nr_pages: 25600, nr_pages_per_cpu: 800
  bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2
  bounces: 99997, mode: rnd ver poll, Bus error (core dumped)

    save_stack_trace+0x2b/0x50
    try_to_wake_up+0x2a6/0x580
    wake_up_q+0x32/0x70
    rwsem_wake+0xe0/0x120
    call_rwsem_wake+0x1b/0x30
    up_write+0x3b/0x40
    vm_mmap_pgoff+0x9c/0xc0
    SyS_mmap_pgoff+0x1a9/0x240
    SyS_mmap+0x22/0x30
    entry_SYSCALL_64_fastpath+0x1f/0xbd
    0xffffffffffffffff
    FAULT_FLAG_ALLOW_RETRY missing 70
  CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G        W       4.8.0+ #30
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  Call Trace:
    dump_stack+0xb8/0x112
    handle_userfault+0x572/0x650
    handle_mm_fault+0x12cb/0x1520
    __do_page_fault+0x175/0x500
    trace_do_page_fault+0x61/0x270
    do_async_page_fault+0x19/0x90
    async_page_fault+0x25/0x30

This always happens when the main userfault selftest thread is running
clone() while glibc runs either mprotect or mmap (both taking mmap_sem
down_write()) to allocate the thread stack of the background threads,
while locking/userfault threads already run at full throttle and are
susceptible to false wakeups that may cause handle_userfault() to return
before than expected (which results in graceful SIGBUS at the next
attempt).

This was reproduced only with >=32 CPUs because the loop to start the
thread where clone() is too quick with fewer CPUs, while with 32 CPUs
there's already significant activity on ~32 locking and userfault
threads when the last background threads are started with clone().

This >=32 CPUs SMP race condition is likely reproducible only with the
selftest because of the much heavier userfault load it generates if
compared to real apps.

We'll have to allow "one more" VM_FAULT_RETRY for the WP support and a
patch floating around that provides it also hidden this problem but in
reality only is successfully at hiding the problem.

False wakeups could still happen again the second time
handle_userfault() is invoked, even if it's a so rare race condition
that getting false wakeups twice in a row is impossible to reproduce.
This full fix is needed for correctness, the only alternative would be
to allow VM_FAULT_RETRY to be returned infinitely.  With this fix the WP
support can stick to a strict "one more" VM_FAULT_RETRY logic (no need
of returning it infinite times to avoid the SIGBUS).

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrea Arcangeli <[email protected]>
Reported-by: Shubham Kumar Sharma <[email protected]>
Tested-by: Mike Kravetz <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Michael Rapoport <[email protected]>
Cc: "Dr. David Alan Gilbert" <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
NigelCunningham pushed a commit that referenced this issue Aug 13, 2017
commit d1aa245 upstream.

generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

 irq event stamp: 83391573
 loop: Write error at byte offset 438729728, length 1024.
 hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
 hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
 loop: Write error at byte offset 438860288, length 1536.
 softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
 loop: Write error at byte offset 438990848, length 2048.
 ================================
 WARNING: inconsistent lock state
 4.12.0-rc2+ #30 Tainted: G           O
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
 {HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
 irq event stamp: 1957420
 hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
 hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
 softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&sbi->cp_lock)->rlock);
   <Interrupt>
     lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

 2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]

 stack backtrace:
 CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ #30
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
 EIP: _raw_spin_unlock_irq+0x2d/0x50
 EFLAGS: 00000296 CPU: 2
 EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
 ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
 EIP: 0xb7786c61
 EFLAGS: 00000293 CPU: 2
 EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
 ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Fixes: aaec2b1 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mschlaeffer pushed a commit to mschlaeffer/ubuntu-kernel-with-tuxonice that referenced this issue Jul 5, 2018
BugLink: https://bugs.launchpad.net/bugs/1775477

[ Upstream commit a8d7aa1 ]

syzbot reported a crash in tasklet_action_common() caused by dccp.

dccp needs to make sure socket wont disappear before tasklet handler
has completed.

This patch takes a reference on the socket when arming the tasklet,
and moves the sock_put() from dccp_write_xmit_timer() to dccp_write_xmitlet()

kernel BUG at kernel/softirq.c:514!
invalid opcode: 0000 [NigelCunningham#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.17.0-rc3+ NigelCunningham#30
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515
RSP: 0018:ffff8801d9b3faf8 EFLAGS: 00010246
dccp_close: ABORT with 65423 bytes unread
RAX: 1ffff1003b367f6b RBX: ffff8801daf1f3f0 RCX: 0000000000000000
RDX: ffff8801cf895498 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffff8801d9b3fc40 R08: ffffed0039f12a95 R09: ffffed0039f12a94
dccp_close: ABORT with 65423 bytes unread
R10: ffffed0039f12a94 R11: ffff8801cf8954a3 R12: 0000000000000000
R13: ffff8801d9b3fc18 R14: dffffc0000000000 R15: ffff8801cf895490
FS:  0000000000000000(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2bc28000 CR3: 00000001a08a9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 tasklet_action+0x1d/0x20 kernel/softirq.c:533
 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
dccp_close: ABORT with 65423 bytes unread
 run_ksoftirqd+0x86/0x100 kernel/softirq.c:646
 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Code: 48 8b 85 e8 fe ff ff 48 8b 95 f0 fe ff ff e9 94 fb ff ff 48 89 95 f0 fe ff ff e8 81 53 6e 00 48 8b 95 f0 fe ff ff e9 62 fb ff ff <0f> 0b 48 89 cf 48 89 8d e8 fe ff ff e8 64 53 6e 00 48 8b 8d e8
RIP: tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515 RSP: ffff8801d9b3faf8

Fixes: dc841e3 ("dccp: Extend CCID packet dequeueing interface")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Gerrit Renker <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
Acked-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Khalid Elmously <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant