Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revert hack. #1

Conversation

xfguo
Copy link

@xfguo xfguo commented Oct 25, 2018

OF_PCI -> OF + PCI

@atishp04
Copy link
Owner

@xfguo That branch was just for my testing and I had some stupid hacks (like the one you mentioned)

The fedora working branch is 4.17_fedora_boot_success_sep11

Here is the actual fixes for microsemi pcie driver. Let me know if you see any issues with this branch.
1fd6c8f

@xfguo
Copy link
Author

xfguo commented Nov 2, 2018

Yes, I am using 4.19 now and I will use your 4.19_fedora_success_sep11 branch.

Thanks

@xfguo xfguo closed this Nov 2, 2018
atishp04 pushed a commit that referenced this pull request Nov 29, 2018
After commit 3c83dd5 ("wlcore: Add support for optional
wakeirq") landed upstream, I started seeing the following oops
on my HiKey board:

[    1.870279] Unable to handle kernel read from unreadable memory at virtual address 0000000000000010
[    1.870283] Mem abort info:
[    1.870287]   ESR = 0x96000005
[    1.870292]   Exception class = DABT (current EL), IL = 32 bits
[    1.870296]   SET = 0, FnV = 0
[    1.870299]   EA = 0, S1PTW = 0
[    1.870302] Data abort info:
[    1.870306]   ISV = 0, ISS = 0x00000005
[    1.870309]   CM = 0, WnR = 0
[    1.870312] [0000000000000010] user address but active_mm is swapper
[    1.870318] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    1.870327] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 4.19.0-05129-gb3d1e8e riscvarchive#48
[    1.870331] Hardware name: HiKey Development Board (DT)
[    1.870350] Workqueue: events_freezable mmc_rescan
[    1.870358] pstate: 60400005 (nZCv daif +PAN -UAO)
[    1.870366] pc : wl1271_probe+0x210/0x350
[    1.870371] lr : wl1271_probe+0x210/0x350
[    1.870374] sp : ffffff80080739b0
[    1.870377] x29: ffffff80080739b0 x28: 0000000000000000
[    1.870384] x27: 0000000000000000 x26: 0000000000000000
[    1.870391] x25: 0000000000000036 x24: ffffffc074ecb598
[    1.870398] x23: ffffffc07ffdce78 x22: ffffffc0744ed808
[    1.870404] x21: ffffffc074ecbb98 x20: ffffff8008ff9000
[    1.870411] x19: ffffffc0744ed800 x18: ffffff8008ff9a48
[    1.870418] x17: 0000000000000000 x16: 0000000000000000
[    1.870425] x15: ffffffc074ecb503 x14: ffffffffffffffff
[    1.870431] x13: ffffffc074ecb502 x12: 0000000000000030
[    1.870438] x11: 0101010101010101 x10: 0000000000000040
[    1.870444] x9 : ffffffc075400248 x8 : ffffffc075400270
[    1.870451] x7 : 0000000000000000 x6 : 0000000000000000
[    1.870457] x5 : 0000000000000000 x4 : 0000000000000000
[    1.870463] x3 : 0000000000000000 x2 : 0000000000000000
[    1.870469] x1 : 0000000000000028 x0 : 0000000000000000
[    1.870477] Process kworker/0:0 (pid: 5, stack limit = 0x(____ptrval____))
[    1.870480] Call trace:
[    1.870485]  wl1271_probe+0x210/0x350
[    1.870491]  sdio_bus_probe+0x100/0x128
[    1.870500]  really_probe+0x1a8/0x2b8
[    1.870506]  driver_probe_device+0x58/0x100
[    1.870511]  __device_attach_driver+0x94/0xd8
[    1.870517]  bus_for_each_drv+0x70/0xc8
[    1.870522]  __device_attach+0xe0/0x140
[    1.870527]  device_initial_probe+0x10/0x18
[    1.870532]  bus_probe_device+0x94/0xa0
[    1.870537]  device_add+0x374/0x5b8
[    1.870542]  sdio_add_func+0x60/0x88
[    1.870546]  mmc_attach_sdio+0x1b0/0x358
[    1.870551]  mmc_rescan+0x2cc/0x390
[    1.870558]  process_one_work+0x12c/0x320
[    1.870563]  worker_thread+0x48/0x458
[    1.870569]  kthread+0xf8/0x128
[    1.870575]  ret_from_fork+0x10/0x18
[    1.870583] Code: 92400c21 b2760021 a90687a2 97e95bf9 (f9400803)
[    1.870587] ---[ end trace 1e15f81d3c139ca9 ]---

It seems since we don't have a wakeirq value in the dts, the wakeirq
value in wl1271_probe() is zero, which then causes trouble in
irqd_get_trigger_type(irq_get_irq_data(wakeirq)).

This patch tries to address this by checking if wakeirq is zero,
and not trying to add it to the resources if that is the case.

Fixes: 3c83dd5 ("wlcore: Add support for optional wakeirq")
Cc: Tony Lindgren <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Eyal Reizer <[email protected]>
Cc: Anders Roxell <[email protected]>
Cc: [email protected]
Acked-by: Tony Lindgren <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Tested-by: Anders Roxell <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
atishp04 pushed a commit that referenced this pull request Nov 29, 2018
Following on from this patch: https://lkml.org/lkml/2017/11/3/516,
Corsair K70 LUX RGB keyboards also require the DELAY_INIT quirk to
start correctly at boot.

Dmesg output:
usb 1-6: string descriptor 0 read error: -110
usb 1-6: New USB device found, idVendor=1b1c, idProduct=1b33
usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-6: can't set config #1, error -110

Signed-off-by: Emmanuel Pescosta <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
atishp04 pushed a commit that referenced this pull request Nov 29, 2018
Raydium USB touchscreen fails to set config if LPM is enabled:
[    2.030658] usb 1-8: New USB device found, idVendor=2386, idProduct=3119
[    2.030659] usb 1-8: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.030660] usb 1-8: Product: Raydium Touch System
[    2.030661] usb 1-8: Manufacturer: Raydium Corporation
[    7.132209] usb 1-8: can't set config #1, error -110

Same behavior can be observed on 2386:3114.

Raydium claims the touchscreen supports LPM under Windows, so I used
Microsoft USB Test Tools (MUTT) [1] to check its LPM status. MUTT shows
that the LPM doesn't work under Windows, either. So let's just disable LPM
for Raydium touchscreens.

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/usbcon/usb-test-tools

Signed-off-by: Kai-Heng Feng <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
atishp04 pushed a commit that referenced this pull request Nov 29, 2018
We get the following warning:

[   47.926140] 32-bit node address hash set to 2010a0a
[   47.927202]
[   47.927433] ================================
[   47.928050] WARNING: inconsistent lock state
[   47.928661] 4.19.0+ riscvarchive#37 Tainted: G            E
[   47.929346] --------------------------------
[   47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
[   47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0
[   47.930116] {SOFTIRQ-ON-W} state was registered at:
[   47.930116]   _raw_spin_lock+0x29/0x60
[   47.930116]   rht_deferred_worker+0x556/0x810
[   47.930116]   process_one_work+0x1f5/0x540
[   47.930116]   worker_thread+0x64/0x3e0
[   47.930116]   kthread+0x112/0x150
[   47.930116]   ret_from_fork+0x3a/0x50
[   47.930116] irq event stamp: 14044
[   47.930116] hardirqs last  enabled at (14044): [<ffffffff9a07fbba>] __local_bh_enable_ip+0x7a/0xf0
[   47.938117] hardirqs last disabled at (14043): [<ffffffff9a07fb81>] __local_bh_enable_ip+0x41/0xf0
[   47.938117] softirqs last  enabled at (14028): [<ffffffff9a0803ee>] irq_enter+0x5e/0x60
[   47.938117] softirqs last disabled at (14029): [<ffffffff9a0804a5>] irq_exit+0xb5/0xc0
[   47.938117]
[   47.938117] other info that might help us debug this:
[   47.938117]  Possible unsafe locking scenario:
[   47.938117]
[   47.938117]        CPU0
[   47.938117]        ----
[   47.938117]   lock(&(&ht->lock)->rlock);
[   47.938117]   <Interrupt>
[   47.938117]     lock(&(&ht->lock)->rlock);
[   47.938117]
[   47.938117]  *** DEADLOCK ***
[   47.938117]
[   47.938117] 2 locks held by swapper/3/0:
[   47.938117]  #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280
[   47.938117]  #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc]
[   47.938117]
[   47.938117] stack backtrace:
[   47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E     4.19.0+ riscvarchive#37
[   47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   47.938117] Call Trace:
[   47.938117]  <IRQ>
[   47.938117]  dump_stack+0x5e/0x8b
[   47.938117]  print_usage_bug+0x1ed/0x1ff
[   47.938117]  mark_lock+0x5b5/0x630
[   47.938117]  __lock_acquire+0x4c0/0x18f0
[   47.938117]  ? lock_acquire+0xa6/0x180
[   47.938117]  lock_acquire+0xa6/0x180
[   47.938117]  ? rhashtable_walk_enter+0x36/0xb0
[   47.938117]  _raw_spin_lock+0x29/0x60
[   47.938117]  ? rhashtable_walk_enter+0x36/0xb0
[   47.938117]  rhashtable_walk_enter+0x36/0xb0
[   47.938117]  tipc_sk_reinit+0xb0/0x410 [tipc]
[   47.938117]  ? mark_held_locks+0x6f/0x90
[   47.938117]  ? __local_bh_enable_ip+0x7a/0xf0
[   47.938117]  ? lockdep_hardirqs_on+0x20/0x1a0
[   47.938117]  tipc_net_finalize+0xbf/0x180 [tipc]
[   47.938117]  tipc_disc_timeout+0x509/0x540 [tipc]
[   47.938117]  ? call_timer_fn+0x5/0x280
[   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
[   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
[   47.938117]  call_timer_fn+0xa1/0x280
[   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
[   47.938117]  run_timer_softirq+0x1f2/0x4d0
[   47.938117]  __do_softirq+0xfc/0x413
[   47.938117]  irq_exit+0xb5/0xc0
[   47.938117]  smp_apic_timer_interrupt+0xac/0x210
[   47.938117]  apic_timer_interrupt+0xf/0x20
[   47.938117]  </IRQ>
[   47.938117] RIP: 0010:default_idle+0x1c/0x140
[   47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 <65> 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b
[   47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[   47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001
[   47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200
[   47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
[   47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200
[   47.938117]  ? default_idle+0x1a/0x140
[   47.938117]  do_idle+0x1bc/0x280
[   47.938117]  cpu_startup_entry+0x19/0x20
[   47.938117]  start_secondary+0x187/0x1c0
[   47.938117]  secondary_startup_64+0xa4/0xb0

The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is
calling the function rhashtable_walk_enter() within a timer interrupt.
We fix this by executing tipc_net_finalize() in work queue context.

Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Nov 29, 2018
The current Cadence QSPI driver caused a kernel panic sporadically
when writing to QSPI. The problem was caused by writing more bytes
than needed because the QSPI operated on 4 bytes at a time.
<snip>
[   11.202044] Unable to handle kernel paging request at virtual address bffd3000
[   11.209254] pgd = e463054d
[   11.211948] [bffd3000] *pgd=2fffb811, *pte=00000000, *ppte=00000000
[   11.218202] Internal error: Oops: 7 [#1] SMP ARM
[   11.222797] Modules linked in:
[   11.225844] CPU: 1 PID: 1317 Comm: systemd-hwdb Not tainted 4.17.7-d0c45cd44a8f
[   11.235796] Hardware name: Altera SOCFPGA Arria10
[   11.240487] PC is at __raw_writesl+0x70/0xd4
[   11.244741] LR is at cqspi_write+0x1a0/0x2cc
</snip>
On a page boundary limit the number of bytes copied from the tx buffer
to remain within the page.

This patch uses a temporary buffer to hold the 4 bytes to write and then
copies only the bytes required from the tx buffer.

Reported-by: Adrian Amborzewicz <[email protected]>
Signed-off-by: Thor Thayer <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
nf_conncount_tuple is an element of nft_connlimit and that is deleted by
conn_free(). Elements can be deleted by both GC routine and data path
functions (nf_conncount_lookup, nf_conncount_add) and they call
conn_free() to free elements. But conn_free() only protects lists, not
each element. So that list_del corruption could occurred.

The conn_free() doesn't check whether element is already deleted. In
order to protect elements, dead flag is added. If an element is deleted,
dead flag is set. The only conn_free() can delete elements so that both
list lock and dead flag are enough to protect it.

test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip id ct count over 2 } counter

splat looks like:
[ 1779.495778] list_del corruption, ffff8800b6e12008->prev is LIST_POISON2 (dead000000000200)
[ 1779.505453] ------------[ cut here ]------------
[ 1779.506260] kernel BUG at lib/list_debug.c:50!
[ 1779.515831] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ riscvarchive#22
[ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set]
[ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150
[ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b
[ 1779.516772] RSP: 0018:ffff880119127420 EFLAGS: 00010286
[ 1779.516772] RAX: 000000000000004e RBX: dead000000000200 RCX: 0000000000000000
[ 1779.516772] RDX: 000000000000004e RSI: 0000000000000008 RDI: ffffed0023224e7a
[ 1779.516772] RBP: ffff88011934bc10 R08: ffffed002367cea9 R09: ffffed002367cea9
[ 1779.516772] R10: 0000000000000001 R11: ffffed002367cea8 R12: ffff8800b6e12008
[ 1779.516772] R13: ffff8800b6e12010 R14: ffff88011934bc20 R15: ffff8800b6e12008
[ 1779.516772] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
[ 1779.516772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1779.516772] CR2: 00007fc876534010 CR3: 000000010da16000 CR4: 00000000001006f0
[ 1779.516772] Call Trace:
[ 1779.516772]  conn_free+0x9f/0x2b0 [nf_conncount]
[ 1779.516772]  ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack]
[ 1779.516772]  ? nf_conncount_add+0x520/0x520 [nf_conncount]
[ 1779.516772]  ? do_raw_spin_trylock+0x1a0/0x1a0
[ 1779.516772]  ? do_raw_spin_trylock+0x10/0x1a0
[ 1779.516772]  find_or_evict+0xe5/0x150 [nf_conncount]
[ 1779.516772]  nf_conncount_gc_list+0x162/0x360 [nf_conncount]
[ 1779.516772]  ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount]
[ 1779.516772]  ? _raw_spin_unlock_irqrestore+0x45/0x50
[ 1779.516772]  ? trace_hardirqs_off+0x6b/0x220
[ 1779.516772]  ? trace_hardirqs_on_caller+0x220/0x220
[ 1779.516772]  nft_rhash_gc+0x16b/0x540 [nf_tables_set]
[ ... ]

Fixes: 5c789e1 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
Adam reported a record command crash for simple session like:

  $ perf record -e cpu-clock ls

with following backtrace:

  Program received signal SIGSEGV, Segmentation fault.
  3543            ev = event_update_event__new(size + 1, PERF_EVENT_UPDATE__UNIT, evsel->id[0]);
  (gdb) bt
  #0  perf_event__synthesize_event_update_unit
  #1  0x000000000051e469 in perf_event__synthesize_extra_attr
  riscvarchive#2  0x00000000004445cb in record__synthesize
  riscvarchive#3  0x0000000000444bc5 in __cmd_record
  ...

We synthesize an update event that needs to touch the evsel id array,
which is not defined at that time. Fix this by forcing the id allocation
for events with their unit defined.

Reflecting possible read_format ID bit in the attr tests.

Reported-by: Yongxin Liu <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Adam Lee <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201477
Fixes: bfd8f72 ("perf record: Synthesize unit/scale/... in event update")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
Naresh reported an issue with the non-atomic memory allocation of
cgroup local storage buffers:

[   73.047526] BUG: sleeping function called from invalid context at
/srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
[   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto
[   73.068342] INFO: lockdep is turned off.
[   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
4.20.0-rc2-next-20181113 #1
[   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[   73.088018] Call Trace:
[   73.090463]  dump_stack+0x70/0xa5
[   73.093783]  ___might_sleep+0x152/0x240
[   73.097619]  __might_sleep+0x4a/0x80
[   73.101191]  __kmalloc_node+0x1cf/0x2f0
[   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
[   73.109909]  cgroup_storage_update_elem+0x46/0x90

cgroup_storage_update_elem() (as well as other update map update
callbacks) is called with disabled preemption, so GFP_ATOMIC
allocation should be used: e.g. alloc_htab_elem() in hashtab.c.

Reported-by: Naresh Kamboju <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Signed-off-by: Roman Gushchin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
When drm_new_set_master() fails, set is_master to 0, to prevent a
possible NULL pointer deref.

Here is a problematic flow: we check is_master in drm_is_current_master(),
then proceed to call drm_lease_owner() passing master. If we do not restore
is_master status when drm_new_set_master() fails, we may have a situation
in which is_master will be 1 and master itself, NULL, leading to the deref
of a NULL pointer in drm_lease_owner().

This fixes the following OOPS, observed on an ArchLinux running a 4.19.2
kernel:

[   97.804282] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[   97.807224] PGD 0 P4D 0
[   97.807224] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   97.807224] CPU: 0 PID: 1348 Comm: xfwm4 Tainted: P           OE     4.19.2-arch1-1-ARCH #1
[   97.807224] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AB350 Pro4, BIOS P5.10 10/16/2018
[   97.807224] RIP: 0010:drm_lease_owner+0xd/0x20 [drm]
[   97.807224] Code: 83 c4 18 5b 5d c3 b8 ea ff ff ff eb e2 b8 ed ff ff ff eb db e8 b4 ca 68 fb 0f 1f 40 00 0f 1f 44 00 00 48 89 f8 eb 03 48 89 d0 <48> 8b 90 80 00 00 00 48 85 d2 75 f1 c3 66 0f 1f 44 00 00 0f 1f 44
[   97.807224] RSP: 0018:ffffb8cf08e07bb0 EFLAGS: 00010202
[   97.807224] RAX: 0000000000000000 RBX: ffff9cf0f2586c00 RCX: ffff9cf0f2586c88
[   97.807224] RDX: ffff9cf0ddbd8000 RSI: 0000000000000000 RDI: 0000000000000000
[   97.807224] RBP: ffff9cf1040e9800 R08: 0000000000000000 R09: 0000000000000000
[   97.807224] R10: ffffdeb30fd5d680 R11: ffffdeb30f5d6808 R12: ffff9cf1040e9888
[   97.807224] R13: 0000000000000000 R14: dead000000000200 R15: ffff9cf0f2586cc8
[   97.807224] FS:  00007f4145513180(0000) GS:ffff9cf10ea00000(0000) knlGS:0000000000000000
[   97.807224] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   97.807224] CR2: 0000000000000080 CR3: 00000003d7548000 CR4: 00000000003406f0
[   97.807224] Call Trace:
[   97.807224]  drm_is_current_master+0x1a/0x30 [drm]
[   97.807224]  drm_master_release+0x3e/0x130 [drm]
[   97.807224]  drm_file_free.part.0+0x2be/0x2d0 [drm]
[   97.807224]  drm_open+0x1ba/0x1e0 [drm]
[   97.807224]  drm_stub_open+0xaf/0xe0 [drm]
[   97.807224]  chrdev_open+0xa3/0x1b0
[   97.807224]  ? cdev_put.part.0+0x20/0x20
[   97.807224]  do_dentry_open+0x132/0x340
[   97.807224]  path_openat+0x2d1/0x14e0
[   97.807224]  ? mem_cgroup_commit_charge+0x7a/0x520
[   97.807224]  do_filp_open+0x93/0x100
[   97.807224]  ? __check_object_size+0x102/0x189
[   97.807224]  ? _raw_spin_unlock+0x16/0x30
[   97.807224]  do_sys_open+0x186/0x210
[   97.807224]  do_syscall_64+0x5b/0x170
[   97.807224]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   97.807224] RIP: 0033:0x7f4147b07976
[   97.807224] Code: 89 54 24 08 e8 7b f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f2 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 c7 89 44 24 08 e8 a6 f4 ff ff 8b 44
[   97.807224] RSP: 002b:00007ffcced96ca0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[   97.807224] RAX: ffffffffffffffda RBX: 00005619d5037f80 RCX: 00007f4147b07976
[   97.807224] RDX: 0000000000000002 RSI: 00005619d46b969c RDI: 00000000ffffff9c
[   98.040039] RBP: 0000000000000024 R08: 0000000000000000 R09: 0000000000000000
[   98.040039] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000024
[   98.040039] R13: 0000000000000012 R14: 00005619d5035950 R15: 0000000000000012
[   98.040039] Modules linked in: nct6775 hwmon_vid algif_skcipher af_alg nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common arc4 videodev media snd_usb_audio snd_hda_codec_hdmi snd_usbmidi_lib snd_rawmidi snd_seq_device mousedev input_leds iwlmvm mac80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec edac_mce_amd kvm_amd snd_hda_core kvm iwlwifi snd_hwdep r8169 wmi_bmof cfg80211 snd_pcm irqbypass snd_timer snd libphy soundcore pinctrl_amd rfkill pcspkr sp5100_tco evdev gpio_amdpt k10temp mac_hid i2c_piix4 wmi pcc_cpufreq acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE) vboxpci(OE) vboxdrv(OE) msr sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto uas usb_storage dm_crypt hid_generic usbhid hid
[   98.040039]  dm_mod raid1 md_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc ahci libahci aesni_intel aes_x86_64 libata crypto_simd cryptd glue_helper ccp xhci_pci rng_core scsi_mod xhci_hcd nvidia_drm(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE) ipmi_devintf ipmi_msghandler
[   98.040039] CR2: 0000000000000080
[   98.040039] ---[ end trace 3b65093b6fe62b2f ]---
[   98.040039] RIP: 0010:drm_lease_owner+0xd/0x20 [drm]
[   98.040039] Code: 83 c4 18 5b 5d c3 b8 ea ff ff ff eb e2 b8 ed ff ff ff eb db e8 b4 ca 68 fb 0f 1f 40 00 0f 1f 44 00 00 48 89 f8 eb 03 48 89 d0 <48> 8b 90 80 00 00 00 48 85 d2 75 f1 c3 66 0f 1f 44 00 00 0f 1f 44
[   98.040039] RSP: 0018:ffffb8cf08e07bb0 EFLAGS: 00010202
[   98.040039] RAX: 0000000000000000 RBX: ffff9cf0f2586c00 RCX: ffff9cf0f2586c88
[   98.040039] RDX: ffff9cf0ddbd8000 RSI: 0000000000000000 RDI: 0000000000000000
[   98.040039] RBP: ffff9cf1040e9800 R08: 0000000000000000 R09: 0000000000000000
[   98.040039] R10: ffffdeb30fd5d680 R11: ffffdeb30f5d6808 R12: ffff9cf1040e9888
[   98.040039] R13: 0000000000000000 R14: dead000000000200 R15: ffff9cf0f2586cc8
[   98.040039] FS:  00007f4145513180(0000) GS:ffff9cf10ea00000(0000) knlGS:0000000000000000
[   98.040039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   98.040039] CR2: 0000000000000080 CR3: 00000003d7548000 CR4: 00000000003406f0

Signed-off-by: Sergio Correia <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Sean Paul <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
coprocessor_flush_all may be called from a context of a thread that is
different from the thread being flushed. In that case contents of the
cpenable special register may not match ti->cpenable of the target
thread, resulting in unhandled coprocessor exception in the kernel
context.
Set cpenable special register to the ti->cpenable of the target register
for the duration of the flush and restore it afterwards.
This fixes the following crash caused by coprocessor register inspection
in native gdb:

  (gdb) p/x $w0
  Illegal instruction in kernel: sig: 9 [#1] PREEMPT
  Call Trace:
    ___might_sleep+0x184/0x1a4
    __might_sleep+0x41/0xac
    exit_signals+0x14/0x218
    do_exit+0xc9/0x8b8
    die+0x99/0xa0
    do_illegal_instruction+0x18/0x6c
    common_exception+0x77/0x77
    coprocessor_flush+0x16/0x3c
    arch_ptrace+0x46c/0x674
    sys_ptrace+0x2ce/0x3b4
    system_call+0x54/0x80
    common_exception+0x77/0x77
  note: gdb[100] exited with preempt_count 1
  Killed

Cc: [email protected]
Signed-off-by: Max Filippov <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
Reported by syzkaller:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
 PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 2567 Comm: poc Tainted: G           OE     4.19.0-rc5 riscvarchive#16
 RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
 Call Trace:
  kvm_emulate_hypercall+0x3cc/0x700 [kvm]
  handle_vmcall+0xe/0x10 [kvm_intel]
  vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
  vcpu_enter_guest+0x9fb/0x1910 [kvm]
  kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
  kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
  do_vfs_ioctl+0xa5/0x690
  ksys_ioctl+0x6d/0x80
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x83/0x6e0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that the apic map has not yet been initialized, the testcase
triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map
is dereferenced. This patch fixes it by checking whether or not apic map is
NULL and bailing out immediately if that is the case.

Fixes: 4180bf1 (KVM: X86: Implement "send IPI" hypercall)
Reported-by: Wei Wu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Wei Wu <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
Reported by syzkaller:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
 PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 riscvarchive#16
 RIP: 0010:__lock_acquire+0x1a6/0x1990
 Call Trace:
  lock_acquire+0xdb/0x210
  _raw_spin_lock+0x38/0x70
  kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
  vcpu_enter_guest+0x167e/0x1910 [kvm]
  kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
  kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
  do_vfs_ioctl+0xa5/0x690
  ksys_ioctl+0x6d/0x80
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x83/0x6e0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
However, irqchip is not initialized by this simple testcase, ioapic/apic
objects should not be accessed.
This can be triggered by the following program:

    #define _GNU_SOURCE

    #include <endian.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};

    int main(void)
    {
    	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
    	long res = 0;
    	memcpy((void*)0x20000040, "/dev/kvm", 9);
    	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
    	if (res != -1)
    		r[0] = res;
    	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
    	if (res != -1)
    		r[1] = res;
    	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
    	if (res != -1)
    		r[2] = res;
    	memcpy(
    			(void*)0x20000080,
    			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
    			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
    			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
    			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
    			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
    			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
    			106);
    	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
    	syscall(__NR_ioctl, r[2], 0xae80, 0);
    	return 0;
    }

This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
kernel.

Reported-by: Wei Wu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Wei Wu <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls
blk_cleanup_queue() on the admin queue, which frees the hctx for that
queue.  Moments later, on the same path nvme_kill_queues() calls
blk_mq_unquiesce_queue() on admin queue and tries to access hctx of it,
which leads to following OOPS:

Oops: 0000 [#1] SMP PTI
RIP: 0010:sbitmap_any_bit_set+0xb/0x40
Call Trace:
 blk_mq_run_hw_queue+0xd5/0x150
 blk_mq_run_hw_queues+0x3a/0x50
 nvme_kill_queues+0x26/0x50
 nvme_remove_namespaces+0xb2/0xc0
 nvme_remove+0x60/0x140
 pci_device_remove+0x3b/0xb0

Fixes: cb4bfda ("nvme-pci: fix hot removal during error handling")
Signed-off-by: Igor Konopko <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
The bug is not easily reproducable, as it may occur very infrequently
(we had machines with 20minutes heavy downloading before it occurred)
However, on a virual machine (VMWare on Windows 10 host) it occurred
pretty frequently (1-2 seconds after a speedtest was started)

dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback
before it is set.

This causes the following problems:
- double free of the skb or potential memory leak
- in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually
  general protection fault

Example dmesg output:
[  134.841986] ------------[ cut here ]------------
[  134.841987] recvmsg bug: copied 9C24A555 seq 9C24B557 rcvnxt 9C25A6B3 fl 0
[  134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0
[  134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[  134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G        W  OE    4.15.0-34-generic riscvarchive#37~16.04.1-Ubuntu
[  134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[  134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0
[  134.842048] RSP: 0018:ffffa6630422bcc8 EFLAGS: 00010286
[  134.842049] RAX: 0000000000000000 RBX: ffff997616f4f200 RCX: 0000000000000006
[  134.842049] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9976257d6490
[  134.842050] RBP: ffffa6630422bd98 R08: 0000000000000001 R09: 000000000004bba4
[  134.842050] R10: 0000000001e00c6f R11: 000000000004bba4 R12: ffff99760dee3000
[  134.842051] R13: 0000000000000000 R14: ffff99760dee3514 R15: 0000000000000000
[  134.842051] FS:  00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
[  134.842052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  134.842053] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
[  134.842055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  134.842055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  134.842057] Call Trace:
[  134.842060]  ? aa_sk_perm+0x53/0x1a0
[  134.842064]  inet_recvmsg+0x51/0xc0
[  134.842066]  sock_recvmsg+0x43/0x50
[  134.842070]  SYSC_recvfrom+0xe4/0x160
[  134.842072]  ? __schedule+0x3de/0x8b0
[  134.842075]  ? ktime_get_ts64+0x4c/0xf0
[  134.842079]  SyS_recvfrom+0xe/0x10
[  134.842082]  do_syscall_64+0x73/0x130
[  134.842086]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  134.842086] RIP: 0033:0x7fe331f5a81d
[  134.842088] RSP: 002b:00007ffe8da98398 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
[  134.842090] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007fe331f5a81d
[  134.842094] RDX: 00000000000003fb RSI: 0000000001e00874 RDI: 0000000000000003
[  134.842095] RBP: 00007fe32f642c70 R08: 0000000000000000 R09: 0000000000000000
[  134.842097] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe332347698
[  134.842099] R13: 0000000001b7e0a0 R14: 0000000001e00874 R15: 0000000000000000
[  134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00
[  134.842126] ---[ end trace b7138fc08c83147f ]---
[  134.842144] general protection fault: 0000 [#1] SMP PTI
[  134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[  134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G        W  OE    4.15.0-34-generic riscvarchive#37~16.04.1-Ubuntu
[  134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[  134.842164] RIP: 0010:tcp_close+0x2c6/0x440
[  134.842165] RSP: 0018:ffffa6630422bde8 EFLAGS: 00010202
[  134.842167] RAX: 0000000000000000 RBX: ffff99760dee3000 RCX: 0000000180400034
[  134.842168] RDX: 5c4afd407207a6c4 RSI: ffffe868495bd300 RDI: ffff997616f4f200
[  134.842169] RBP: ffffa6630422be08 R08: 0000000016f4d401 R09: 0000000180400034
[  134.842169] R10: ffffa6630422bd98 R11: 0000000000000000 R12: 000000000000600c
[  134.842170] R13: 0000000000000000 R14: ffff99760dee30c8 R15: ffff9975bd44fe00
[  134.842171] FS:  00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
[  134.842173] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  134.842174] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
[  134.842177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  134.842178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  134.842179] Call Trace:
[  134.842181]  inet_release+0x42/0x70
[  134.842183]  __sock_release+0x42/0xb0
[  134.842184]  sock_close+0x15/0x20
[  134.842187]  __fput+0xea/0x220
[  134.842189]  ____fput+0xe/0x10
[  134.842191]  task_work_run+0x8a/0xb0
[  134.842193]  exit_to_usermode_loop+0xc4/0xd0
[  134.842195]  do_syscall_64+0xf4/0x130
[  134.842197]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  134.842197] RIP: 0033:0x7fe331f5a560
[  134.842198] RSP: 002b:00007ffe8da982e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  134.842200] RAX: 0000000000000000 RBX: 00007fe32f642c70 RCX: 00007fe331f5a560
[  134.842201] RDX: 00000000008f5320 RSI: 0000000001cd4b50 RDI: 0000000000000003
[  134.842202] RBP: 00007fe32f6500f8 R08: 000000000000003c R09: 00000000009343c0
[  134.842203] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe32f6500d0
[  134.842204] R13: 00000000008f5320 R14: 00000000008f5320 R15: 0000000001cd4770
[  134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80
[  134.842226] RIP: tcp_close+0x2c6/0x440 RSP: ffffa6630422bde8
[  134.842227] ---[ end trace b7138fc08c831480 ]---

The proposed patch eliminates a potential racing condition.
Before, usb_submit_urb was called and _after_ that, the skb was attached
(dev->tx_skb). So, on a callback it was possible, however unlikely that the
skb was freed before it was set. That way (because dev->tx_skb was not set
to NULL after it was freed), it could happen that a skb from a earlier
transmission was freed a second time (and the skb we should have freed did
not get freed at all)

Now we free the skb directly in ipheth_tx(). It is not passed to the
callback anymore, eliminating the posibility of a double free of the same
skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any()
respectively dev_consume_skb_any() to free the skb.

Signed-off-by: Oliver Zweigle <[email protected]>
Signed-off-by: Bernd Eckstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
Fix a possible NULL pointer dereference in nic_remove routine
removing the nicpf module if nic_probe fails.
The issue can be triggered with the following reproducer:

$rmmod nicvf
$rmmod nicpf

[  521.412008] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000014
[  521.422777] Mem abort info:
[  521.425561]   ESR = 0x96000004
[  521.428624]   Exception class = DABT (current EL), IL = 32 bits
[  521.434535]   SET = 0, FnV = 0
[  521.437579]   EA = 0, S1PTW = 0
[  521.440730] Data abort info:
[  521.443603]   ISV = 0, ISS = 0x00000004
[  521.447431]   CM = 0, WnR = 0
[  521.450417] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000072a3da42
[  521.457022] [0000000000000014] pgd=0000000000000000
[  521.461916] Internal error: Oops: 96000004 [#1] SMP
[  521.511801] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
[  521.518664] pstate: 80400005 (Nzcv daif +PAN -UAO)
[  521.523451] pc : nic_remove+0x24/0x88 [nicpf]
[  521.527808] lr : pci_device_remove+0x48/0xd8
[  521.532066] sp : ffff000013433cc0
[  521.535370] x29: ffff000013433cc0 x28: ffff810f6ac50000
[  521.540672] x27: 0000000000000000 x26: 0000000000000000
[  521.545974] x25: 0000000056000000 x24: 0000000000000015
[  521.551274] x23: ffff8007ff89a110 x22: ffff000001667070
[  521.556576] x21: ffff8007ffb170b0 x20: ffff8007ffb17000
[  521.561877] x19: 0000000000000000 x18: 0000000000000025
[  521.567178] x17: 0000000000000000 x16: 000000000000010ffc33ff98 x8 : 0000000000000000
[  521.593683] x7 : 0000000000000000 x6 : 0000000000000001
[  521.598983] x5 : 0000000000000002 x4 : 0000000000000003
[  521.604284] x3 : ffff8007ffb17184 x2 : ffff8007ffb17184
[  521.609585] x1 : ffff000001662118 x0 : ffff000008557be0
[  521.614887] Process rmmod (pid: 1897, stack limit = 0x00000000859535c3)
[  521.621490] Call trace:
[  521.623928]  nic_remove+0x24/0x88 [nicpf]
[  521.627927]  pci_device_remove+0x48/0xd8
[  521.631847]  device_release_driver_internal+0x1b0/0x248
[  521.637062]  driver_detach+0x50/0xc0
[  521.640628]  bus_remove_driver+0x60/0x100
[  521.644627]  driver_unregister+0x34/0x60
[  521.648538]  pci_unregister_driver+0x24/0xd8
[  521.652798]  nic_cleanup_module+0x14/0x111c [nicpf]
[  521.657672]  __arm64_sys_delete_module+0x150/0x218
[  521.662460]  el0_svc_handler+0x94/0x110
[  521.666287]  el0_svc+0x8/0xc
[  521.669160] Code: aa1e03e0 9102c295 d503201f f9404eb3 (b9401660)

Fixes: 4863dea ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
We see the following lockdep warning:

[ 2284.078521] ======================================================
[ 2284.078604] WARNING: possible circular locking dependency detected
[ 2284.078604] 4.19.0+ riscvarchive#42 Tainted: G            E
[ 2284.078604] ------------------------------------------------------
[ 2284.078604] rmmod/254 is trying to acquire lock:
[ 2284.078604] 00000000acd94e28 ((&n->timer)riscvarchive#2){+.-.}, at: del_timer_sync+0x5/0xa0
[ 2284.078604]
[ 2284.078604] but task is already holding lock:
[ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
[ 2284.078604]
[ 2284.078604] which lock already depends on the new lock.
[ 2284.078604]
[ 2284.078604]
[ 2284.078604] the existing dependency chain (in reverse order) is:
[ 2284.078604]
[ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
[ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
[ 2284.078604]        call_timer_fn+0xa1/0x280
[ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
[ 2284.078604]        __do_softirq+0xfc/0x413
[ 2284.078604]        irq_exit+0xb5/0xc0
[ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
[ 2284.078604]        apic_timer_interrupt+0xf/0x20
[ 2284.078604]        default_idle+0x1c/0x140
[ 2284.078604]        do_idle+0x1bc/0x280
[ 2284.078604]        cpu_startup_entry+0x19/0x20
[ 2284.078604]        start_secondary+0x187/0x1c0
[ 2284.078604]        secondary_startup_64+0xa4/0xb0
[ 2284.078604]
[ 2284.078604] -> #0 ((&n->timer)riscvarchive#2){+.-.}:
[ 2284.078604]        del_timer_sync+0x34/0xa0
[ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
[ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
[ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
[ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
[ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
[ 2284.078604]        unregister_pernet_operations+0x87/0xd0
[ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
[ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
[ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
[ 2284.078604]        do_syscall_64+0x66/0x460
[ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2284.078604]
[ 2284.078604] other info that might help us debug this:
[ 2284.078604]
[ 2284.078604]  Possible unsafe locking scenario:
[ 2284.078604]
[ 2284.078604]        CPU0                    CPU1
[ 2284.078604]        ----                    ----
[ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604]                                lock((&n->timer)riscvarchive#2);
[ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604]   lock((&n->timer)riscvarchive#2);
[ 2284.078604]
[ 2284.078604]  *** DEADLOCK ***
[ 2284.078604]
[ 2284.078604] 3 locks held by rmmod/254:
[ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
[ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
[ 2284.078604]  riscvarchive#2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
[...}

The reason is that the node timer handler sometimes needs to delete a
node which has been disconnected for too long. To do this, it grabs
the lock 'node_list_lock', which may at the same time be held by the
generic node cleanup function, tipc_node_stop(), during module removal.
Since the latter is calling del_timer_sync() inside the same lock, we
have a potential deadlock.

We fix this letting the timer cleanup function use spin_trylock()
instead of just spin_lock(), and when it fails to grab the lock it
just returns so that the timer handler can terminate its execution.
This is safe to do, since tipc_node_stop() anyway is about to
delete both the timer and the node instance.

Fixes: 6a939f3 ("tipc: Auto removal of peer down node instance")
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
list_for_each_entry_safe() is not safe for deleting entries from the
list if the spin lock, which protects it, is released and reacquired during
the list iteration. Fix this issue by replacing this construction with
a simple check if list is empty and removing the first entry in each
iteration. This is almost equivalent to a revert of the commit mentioned in
the Fixes: tag.

This patch fixes following issue:
--->8---
Unable to handle kernel NULL pointer dereference at virtual address 00000104
pgd = (ptrval)
[00000104] *pgd=00000000
Internal error: Oops: 817 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 84 Comm: kworker/1:1 Not tainted 4.20.0-rc2-next-20181114-00009-g8266b35ec404 #1061
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: events eth_work
PC is at rx_fill+0x60/0xac
LR is at _raw_spin_lock_irqsave+0x50/0x5c
pc : [<c065fee0>]    lr : [<c0a056b8>]    psr: 80000093
sp : ee7fbee8  ip : 00000100  fp : 00000000
r10: 006000c0  r9 : c10b0ab0  r8 : ee7eb5c0
r7 : ee7eb614  r6 : ee7eb5ec  r5 : 000000dc  r4 : ee12ac00
r3 : ee12ac24  r2 : 00000200  r1 : 60000013  r0 : ee7eb5ec
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6d5dc04a  DAC: 00000051
Process kworker/1:1 (pid: 84, stack limit = 0x(ptrval))
Stack: (0xee7fbee8 to 0xee7fc000)
...
[<c065fee0>] (rx_fill) from [<c0143b7c>] (process_one_work+0x200/0x738)
[<c0143b7c>] (process_one_work) from [<c0144118>] (worker_thread+0x2c/0x4c8)
[<c0144118>] (worker_thread) from [<c014a8a4>] (kthread+0x128/0x164)
[<c014a8a4>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xee7fbfb0 to 0xee7fbff8)
...
---[ end trace 64480bc835eba7d6 ]---

Fixes: fea14e6 ("usb: gadget: u_ether: use better list accessors")
Signed-off-by: Marek Szyprowski <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
Yonghong Song says:

====================
This patch set added name checking for PTR, ARRAY, VOLATILE, TYPEDEF,
CONST, RESTRICT, STRUCT, UNION, ENUM and FWD types. Such a strict
name checking makes BTF more sound in the kernel and future
BTF-to-header-file converesion ([1]) less fragile.

Patch #1 implemented btf_name_valid_identifier() for name checking
which will be used in Patch riscvarchive#2.
Patch riscvarchive#2 checked name validity for the above mentioned types.
Patch riscvarchive#3 fixed two existing test_btf unit tests exposed by the strict
name checking.
Patch riscvarchive#4 added additional test cases.

This patch set is against bpf tree.

Patch #1 has been implemented in bpf-next commit
Commit 2667a26 ("bpf: btf: Add BTF_KIND_FUNC
and BTF_KIND_FUNC_PROTO"), so there is no need to apply this
patch to bpf-next. In case this patch is applied to bpf-next,
there will be a minor conflict like
  diff --cc kernel/bpf/btf.c
  index a09b2f94ab25,93c233ab2db6..000000000000
  --- a/kernel/bpf/btf.c
  +++ b/kernel/bpf/btf.c
  @@@ -474,7 -451,7 +474,11 @@@ static bool btf_name_valid_identifier(c
          return !*src;
    }

  ++<<<<<<< HEAD
   +const char *btf_name_by_offset(const struct btf *btf, u32 offset)
  ++=======
  + static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
  ++>>>>>>> fa9566b0847d... bpf: btf: implement btf_name_valid_identifier()
    {
          if (!offset)
                  return "(anon)";
Just resolve the conflict by taking the "const char ..." line.

Patches riscvarchive#2, riscvarchive#3 and riscvarchive#4 can be applied to bpf-next without conflict.

[1]: http://vger.kernel.org/lpc-bpf2018.html#session-2
====================

Signed-off-by: Alexei Starovoitov <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
We can concurrently try to open the same sub-channel from 2 paths:

path #1: vmbus_onoffer() -> vmbus_process_offer() -> handle_sc_creation().
path riscvarchive#2: storvsc_probe() -> storvsc_connect_to_vsp() ->
	 -> storvsc_channel_init() -> handle_multichannel_storage() ->
	 -> vmbus_are_subchannels_present() -> handle_sc_creation().

They conflict with each other, but it was not an issue before the recent
commit ae6935e ("vmbus: split ring buffer allocation from open"),
because at the beginning of vmbus_open() we checked newchannel->state so
only one path could succeed, and the other would return with -EINVAL.

After ae6935e, the failing path frees the channel's ringbuffer by
vmbus_free_ring(), and this causes a panic later.

Commit ae6935e itself is good, and it just reveals the longstanding
race. We can resolve the issue by removing path riscvarchive#2, i.e. removing the
second vmbus_are_subchannels_present() in handle_multichannel_storage().

BTW, the comment "Check to see if sub-channels have already been created"
in handle_multichannel_storage() is incorrect: when we unload the driver,
we first close the sub-channel(s) and then close the primary channel, next
the host sends rescind-offer message(s) so primary->sc_list will become
empty. This means the first vmbus_are_subchannels_present() in
handle_multichannel_storage() is never useful.

Fixes: ae6935e ("vmbus: split ring buffer allocation from open")
Cc: [email protected]
Cc: Long Li <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: K. Y. Srinivasan <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
__qdisc_drop_all() accesses skb->prev to get to the tail of the
segment-list.

With commit 68d2f84 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb->next to NULL and set
the list-poison on skb->prev.

With that change, __qdisc_drop_all() will panic when it tries to
dereference skb->prev.

Since commit 992cba7 ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb->prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:

[   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp riscvarchive#108
[   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
[   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[   34.514593] Call Trace:
[   34.514893]  <IRQ>
[   34.515157]  napi_gro_receive+0x93/0x150
[   34.515632]  receive_buf+0x893/0x3700
[   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
[   34.516629]  ? virtnet_probe+0x1b40/0x1b40
[   34.517153]  ? __stable_node_chain+0x4d0/0x850
[   34.517684]  ? kfree+0x9a/0x180
[   34.518067]  ? __kasan_slab_free+0x171/0x190
[   34.518582]  ? detach_buf+0x1df/0x650
[   34.519061]  ? lapic_next_event+0x5a/0x90
[   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
[   34.520093]  virtnet_poll+0x2df/0xd60
[   34.520533]  ? receive_buf+0x3700/0x3700
[   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
[   34.521631]  ? htb_dequeue+0x1817/0x25f0
[   34.522107]  ? sch_direct_xmit+0x142/0xf30
[   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
[   34.523155]  net_rx_action+0x2f6/0xc50
[   34.523601]  ? napi_complete_done+0x2f0/0x2f0
[   34.524126]  ? kasan_check_read+0x11/0x20
[   34.524608]  ? _raw_spin_lock+0x7d/0xd0
[   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
[   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
[   34.526130]  ? apic_ack_irq+0x9e/0xe0
[   34.526567]  __do_softirq+0x188/0x4b5
[   34.527015]  irq_exit+0x151/0x180
[   34.527417]  do_IRQ+0xdb/0x150
[   34.527783]  common_interrupt+0xf/0xf
[   34.528223]  </IRQ>

This patch makes sure that skb->prev is set to NULL when entering
netem_enqueue.

Cc: Prashant Bhole <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: Eric Dumazet <[email protected]>
Fixes: 68d2f84 ("net: gro: properly remove skb from list")
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 riscvarchive#2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 riscvarchive#3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 riscvarchive#4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 riscvarchive#5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 riscvarchive#6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 riscvarchive#7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 riscvarchive#8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 riscvarchive#9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
riscvarchive#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
riscvarchive#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
riscvarchive#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
riscvarchive#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David Howells <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
There are actually two kinds of discard merge:

- one is the normal discard merge, just like normal read/write request,
and call it single-range discard

- another is the multi-range discard, queue_max_discard_segments(rq->q) > 1

For the former case, queue_max_discard_segments(rq->q) is 1, and we
should handle this kind of discard merge like the normal read/write
request.

This patch fixes the following kernel panic issue[1], which is caused by
not removing the single-range discard request from elevator queue.

Guangwu has one raid discard test case, in which this issue is a bit
easier to trigger, and I verified that this patch can fix the kernel
panic issue in Guangwu's test case.

[1] kernel panic log from Jens's report

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
 PGD 0 P4D 0.
 Oops: 0000 [#1] SMP PTI
 CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \
4.20.0-rc3-00649-ge64d9a554a91-dirty riscvarchive#14  Hardware name: Wiwynn \
Leopard-Orv2/Leopard-DDR BW, BIOS LBM08   03/03/2017       Workqueue: kblockd \
blk_mq_run_work_fn                                            RIP: \
0010:blk_mq_get_driver_tag+0x81/0x120                                       Code: 24 \
10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \
0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \
f6 87 b0 00 00 00 02  RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246                     \
  RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
 RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
 R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
 R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
 FS:  0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  blk_mq_dispatch_rq_list+0xec/0x480
  ? elv_rb_del+0x11/0x30
  blk_mq_do_dispatch_sched+0x6e/0xf0
  blk_mq_sched_dispatch_requests+0xfa/0x170
  __blk_mq_run_hw_queue+0x5f/0xe0
  process_one_work+0x154/0x350
  worker_thread+0x46/0x3c0
  kthread+0xf5/0x130
  ? process_one_work+0x350/0x350
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x1f/0x30
 Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \
kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \
cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \
button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \
nvme_core fuse sg loop efivarfs autofs4  CR2: 0000000000000148                        \

 ---[ end trace 340a1fb996df1b9b ]---
 RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
 Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \
00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 \
20 72 37 f6 87 b0 00 00 00 02

Fixes: 445251d ("blk-mq: fix discard merge with scheduler attached")
Reported-by: Jens Axboe <[email protected]>
Cc: Guangwu Zhang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jianchao Wang <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
When we use direct_IO with an NFS backing store, we can trigger a
WARNING in __set_page_dirty(), as below, since we're dirtying the page
unnecessarily in nfs_direct_read_completion().

To fix, replicate the logic in commit 53cbf3b ("fs: direct-io:
don't dirtying pages for ITER_BVEC/ITER_KVEC direct read").

Other filesystems that implement direct_IO handle this; most use
blockdev_direct_IO(). ceph and cifs have similar logic.

mount 127.0.0.1:/export /nfs
dd if=/dev/zero of=/nfs/image bs=1M count=200
losetup --direct-io=on -f /nfs/image
mkfs.btrfs /dev/loop0
mount -t btrfs /dev/loop0 /mnt/

kernel: WARNING: CPU: 0 PID: 8067 at fs/buffer.c:580 __set_page_dirty+0xaf/0xd0
kernel: Modules linked in: loop(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) fuse(E) tun(E) ip6t_rpfilter(E) ipt_REJECT(E) nf_
kernel:  snd_seq(E) snd_seq_device(E) snd_pcm(E) video(E) snd_timer(E) snd(E) soundcore(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) cdrom(E) ata_generic(E) pata_acpi(E) crc32c_intel(E) ahci(E) li
kernel: CPU: 0 PID: 8067 Comm: kworker/0:2 Tainted: G            E     4.20.0-rc1.master.20181111.ol7.x86_64 #1
kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
kernel: Workqueue: nfsiod rpc_async_release [sunrpc]
kernel: RIP: 0010:__set_page_dirty+0xaf/0xd0
kernel: Code: c3 48 8b 02 f6 c4 04 74 d4 48 89 df e8 ba 05 f7 ff 48 89 c6 eb cb 48 8b 43 08 a8 01 75 1f 48 89 d8 48 8b 00 a8 04 74 02 eb 87 <0f> 0b eb 83 48 83 e8 01 eb 9f 48 83 ea 01 0f 1f 00 eb 8b 48 83 e8
kernel: RSP: 0000:ffffc1c8825b7d78 EFLAGS: 00013046
kernel: RAX: 000fffffc0020089 RBX: fffff2b603308b80 RCX: 0000000000000001
kernel: RDX: 0000000000000001 RSI: ffff9d11478115c8 RDI: ffff9d11478115d0
kernel: RBP: ffffc1c8825b7da0 R08: 0000646f6973666e R09: 8080808080808080
kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9d11478115d0
kernel: R13: ffff9d11478115c8 R14: 0000000000003246 R15: 0000000000000001
kernel: FS:  0000000000000000(0000) GS:ffff9d115ba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f408686f640 CR3: 0000000104d8e004 CR4: 00000000000606f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  __set_page_dirty_buffers+0xb6/0x110
kernel:  set_page_dirty+0x52/0xb0
kernel:  nfs_direct_read_completion+0xc4/0x120 [nfs]
kernel:  nfs_pgio_release+0x10/0x20 [nfs]
kernel:  rpc_free_task+0x30/0x70 [sunrpc]
kernel:  rpc_async_release+0x12/0x20 [sunrpc]
kernel:  process_one_work+0x174/0x390
kernel:  worker_thread+0x4f/0x3e0
kernel:  kthread+0x102/0x140
kernel:  ? drain_workqueue+0x130/0x130
kernel:  ? kthread_stop+0x110/0x110
kernel:  ret_from_fork+0x35/0x40
kernel: ---[ end trace 01341980905412c9 ]---

Signed-off-by: Dave Kleikamp <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>

[forward-ported to v4.20]
Signed-off-by: Calum Mackay <[email protected]>
Reviewed-by: Dave Kleikamp <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
Frank Schreiner reported, that since kernel 4.18 he faces sysfs-warnings
when loading modules on a 32-bit kernel. Here is one such example:

 sysfs: cannot create duplicate filename '/module/nfs/sections/.text'
 CPU: 0 PID: 98 Comm: modprobe Not tainted 4.18.0-2-parisc #1 Debian 4.18.10-2
 Backtrace:
  [<1017ce2c>] show_stack+0x3c/0x50
  [<107a7210>] dump_stack+0x28/0x38
  [<103f900c>] sysfs_warn_dup+0x88/0xac
  [<103f8b1c>] sysfs_add_file_mode_ns+0x164/0x1d0
  [<103f9e70>] internal_create_group+0x11c/0x304
  [<103fa0a0>] sysfs_create_group+0x48/0x60
  [<1022abe8>] load_module.constprop.35+0x1f9c/0x23b8
  [<1022b278>] sys_finit_module+0xd0/0x11c
  [<101831dc>] syscall_exit+0x0/0x14

This warning gets triggered by the fact, that due to commit 24b6c22
("parisc: Build kernel without -ffunction-sections") we now get multiple .text
sections in the kernel modules for which sysfs_create_group() can't create
multiple virtual files.

This patch works around the problem by re-enabling the -ffunction-sections
compiler option for modules, while keeping it disabled for the non-module
kernel code.

Reported-by: Frank Scheiner <[email protected]>
Fixes: 24b6c22 ("parisc: Build kernel without -ffunction-sections")
Cc: <[email protected]> # v4.18+
Signed-off-by: Helge Deller <[email protected]>
atishp04 pushed a commit that referenced this pull request Dec 12, 2018
When changing mtu many times with traffic, a bug is triggered:

[ 1035.684037] kernel BUG at lib/dynamic_queue_limits.c:26!
[ 1035.684042] invalid opcode: 0000 [#1] SMP
[ 1035.684049] Modules linked in: loop binfmt_misc 8139cp(OE) macsec
tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag tcp_lp
fuse uinput xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun
bridge stp llc ebtable_filter ebtables ip6table_filter devlink
ip6_tables iptable_filter sunrpc snd_hda_codec_generic snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep ppdev snd_seq iosf_mbi crc32_pclmul
parport_pc snd_seq_device ghash_clmulni_intel parport snd_pcm
aesni_intel joydev lrw snd_timer virtio_balloon sg gf128mul glue_helper
ablk_helper cryptd snd soundcore i2c_piix4 pcspkr ip_tables xfs
libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic ata_generic
[ 1035.684102]  pata_acpi virtio_console qxl drm_kms_helper syscopyarea
sysfillrect sysimgblt floppy fb_sys_fops crct10dif_pclmul
crct10dif_common ttm crc32c_intel serio_raw ata_piix drm libata 8139too
virtio_pci drm_panel_orientation_quirks virtio_ring virtio mii dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: 8139cp]
[ 1035.684132] CPU: 9 PID: 25140 Comm: if-mtu-change Kdump: loaded
Tainted: G           OE  ------------ T 3.10.0-957.el7.x86_64 #1
[ 1035.684134] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1035.684136] task: ffff8f59b1f5a080 ti: ffff8f5a2e32c000 task.ti:
ffff8f5a2e32c000
[ 1035.684149] RIP: 0010:[<ffffffffba3a40d0>]  [<ffffffffba3a40d0>]
dql_completed+0x180/0x190
[ 1035.684162] RSP: 0000:ffff8f5a75483e50  EFLAGS: 00010093
[ 1035.684162] RAX: 00000000000000c2 RBX: ffff8f5a6f91c000 RCX:
0000000000000000
[ 1035.684162] RDX: 0000000000000000 RSI: 0000000000000184 RDI:
ffff8f599fea3ec0
[ 1035.684162] RBP: ffff8f5a75483ea8 R08: 00000000000000c2 R09:
0000000000000000
[ 1035.684162] R10: 00000000000616ef R11: ffff8f5a75483b56 R12:
ffff8f599fea3e00
[ 1035.684162] R13: 0000000000000001 R14: 0000000000000000 R15:
0000000000000184
[ 1035.684162] FS:  00007fa8434de740(0000) GS:ffff8f5a75480000(0000)
knlGS:0000000000000000
[ 1035.684162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1035.684162] CR2: 00000000004305d0 CR3: 000000024eb66000 CR4:
00000000001406e0
[ 1035.684162] Call Trace:
[ 1035.684162]  <IRQ>
[ 1035.684162]  [<ffffffffc08cbaf8>] ? cp_interrupt+0x478/0x580 [8139cp]
[ 1035.684162]  [<ffffffffba14a294>]
__handle_irq_event_percpu+0x44/0x1c0
[ 1035.684162]  [<ffffffffba14a442>] handle_irq_event_percpu+0x32/0x80
[ 1035.684162]  [<ffffffffba14a4cc>] handle_irq_event+0x3c/0x60
[ 1035.684162]  [<ffffffffba14db29>] handle_fasteoi_irq+0x59/0x110
[ 1035.684162]  [<ffffffffba02e554>] handle_irq+0xe4/0x1a0
[ 1035.684162]  [<ffffffffba7795dd>] do_IRQ+0x4d/0xf0
[ 1035.684162]  [<ffffffffba76b362>] common_interrupt+0x162/0x162
[ 1035.684162]  <EOI>
[ 1035.684162]  [<ffffffffba0c2ae4>] ? __wake_up_bit+0x24/0x70
[ 1035.684162]  [<ffffffffba1e46f5>] ? do_set_pte+0xd5/0x120
[ 1035.684162]  [<ffffffffba1b64fb>] unlock_page+0x2b/0x30
[ 1035.684162]  [<ffffffffba1e4879>] do_read_fault.isra.61+0x139/0x1b0
[ 1035.684162]  [<ffffffffba1e9134>] handle_pte_fault+0x2f4/0xd10
[ 1035.684162]  [<ffffffffba1ebc6d>] handle_mm_fault+0x39d/0x9b0
[ 1035.684162]  [<ffffffffba76f5e3>] __do_page_fault+0x203/0x500
[ 1035.684162]  [<ffffffffba76f9c6>] trace_do_page_fault+0x56/0x150
[ 1035.684162]  [<ffffffffba76ef42>] do_async_page_fault+0x22/0xf0
[ 1035.684162]  [<ffffffffba76b788>] async_page_fault+0x28/0x30
[ 1035.684162] Code: 54 c7 47 54 ff ff ff ff 44 0f 49 ce 48 8b 35 48 2f
9c 00 48 89 77 58 e9 fe fe ff ff 0f 1f 80 00 00 00 00 41 89 d1 e9 ef fe
ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 8d 42 ff 48
[ 1035.684162] RIP  [<ffffffffba3a40d0>] dql_completed+0x180/0x190
[ 1035.684162]  RSP <ffff8f5a75483e50>

It's not the same as in 7fe0ee0 patch described.
As 8139cp uses shared irq mode, other device irq will trigger
cp_interrupt to execute.

cp_change_mtu
 -> cp_close
 -> cp_open

In cp_close routine  just before free_irq(), some interrupt may occur.
In my environment, cp_interrupt exectutes and IntrStatus is 0x4,
exactly TxOk. That will cause cp_tx to wake device queue.

As device queue is started, cp_start_xmit and cp_open will run at same
time which will cause kernel BUG.

For example:
[#] for tx descriptor

At start:

[#][#][#]
num_queued=3

After cp_init_hw->cp_start_hw->netdev_reset_queue:

[#][#][#]
num_queued=0

When 8139cp starts to work then cp_tx will check
num_queued mismatchs the complete_bytes.

The patch will check IntrMask before check IntrStatus in cp_interrupt.
When 8139cp interrupt is disabled, just return.

Signed-off-by: Su Yanjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Feb 6, 2019
This is the same sort of error we saw in commit 17e2e7d ("mm,
page_alloc: fix has_unmovable_pages for HugePages").

Gigantic hugepages cross several memblocks, so it can be that the page
we get in scan_movable_pages() is a page-tail belonging to a
1G-hugepage.  If that happens, page_hstate()->size_to_hstate() will
return NULL, and we will blow up in hugepage_migration_supported().

The splat is as follows:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 1350 Comm: bash Tainted: G            E     5.0.0-rc1-mm1-1-default+ riscvarchive#27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:__offline_pages+0x6ae/0x900
  Call Trace:
   memory_subsys_offline+0x42/0x60
   device_offline+0x80/0xa0
   state_store+0xab/0xc0
   kernfs_fop_write+0x102/0x180
   __vfs_write+0x26/0x190
   vfs_write+0xad/0x1b0
   ksys_write+0x42/0x90
   do_syscall_64+0x5b/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) bochs_drm(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) drm(E) virtio_net(E) syscopyarea(E) sysfillrect(E) net_failover(E) sysimgblt(E) pcspkr(E) failover(E) i2c_piix4(E) fb_sys_fops(E) parport_pc(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) scsi_mod(E) autofs4(E)

[[email protected]: fix brace layout, per David.  Reduce indentation]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: Anthony Yznaga <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
atishp04 pushed a commit that referenced this pull request Feb 6, 2019
On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
with CONFIG_DEBUG_VM_PGFLAGS=y due to page_to_nid() found a pfn that is
not directly mapped (MEMBLOCK_NOMAP).  Hence, the page->flags is
uninitialized.

This is due to the commit 9f1eb38 ("mm, kmemleak: little
optimization while scanning") starts to use pfn_to_online_page() instead
of pfn_valid().  However, in the CONFIG_MEMORY_HOTPLUG=y case,
pfn_to_online_page() does not call memblock_is_map_memory() while
pfn_valid() does.

Historically, the commit 68709f4 ("arm64: only consider memblocks
with NOMAP cleared for linear mapping") causes pages marked as nomap
being no long reassigned to the new zone in memmap_init_zone() by
calling __init_single_page().

Since the commit 2d070ea ("mm: consider zone which is not fully
populated to have holes") introduced pfn_to_online_page() and was
designed to return a valid pfn only, but it is clearly broken on arm64.

Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
handle nomap thanks to the commit f52bb98 ("arm64: mm: always
enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on
architectures where have no HOLES_IN_ZONE.

[1]
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
  Mem abort info:
    ESR = 0x96000005
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000005
    CM = 0, WnR = 0
  Internal error: Oops: 96000005 [#1] SMP
  CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ riscvarchive#8
  pstate: 60400009 (nZCv daif +PAN -UAO)
  pc : page_mapping+0x24/0x144
  lr : __dump_page+0x34/0x3dc
  sp : ffff00003a5cfd10
  x29: ffff00003a5cfd10 x28: 000000000000802f
  x27: 0000000000000000 x26: 0000000000277d00
  x25: ffff000010791f56 x24: ffff7fe000000000
  x23: ffff000010772f8b x22: ffff00001125f670
  x21: ffff000011311000 x20: ffff000010772f8b
  x19: fffffffffffffffe x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000
  x15: 0000000000000000 x14: ffff802698b19600
  x13: ffff802698b1a200 x12: ffff802698b16f00
  x11: ffff802698b1a400 x10: 0000000000001400
  x9 : 0000000000000001 x8 : ffff00001121a000
  x7 : 0000000000000000 x6 : ffff0000102c53b8
  x5 : 0000000000000000 x4 : 0000000000000003
  x3 : 0000000000000100 x2 : 0000000000000000
  x1 : ffff000010772f8b x0 : ffffffffffffffff
  Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
  Call trace:
   page_mapping+0x24/0x144
   __dump_page+0x34/0x3dc
   dump_page+0x28/0x4c
   kmemleak_scan+0x4ac/0x680
   kmemleak_scan_thread+0xb4/0xdc
   kthread+0x12c/0x13c
   ret_from_fork+0x10/0x18
  Code: d503201f f9400660 36000040 d1000413 (f9400661)
  ---[ end trace 4d4bd7f573490c8e ]---
  Kernel panic - not syncing: Fatal exception
  SMP: stopping secondary CPUs
  Kernel Offset: disabled
  CPU features: 0x002,20000c38
  Memory Limit: none
  ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 9f1eb38 ("mm, kmemleak: little optimization while scanning")
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
atishp04 pushed a commit that referenced this pull request Feb 6, 2019
When option CONFIG_KASAN is enabled toghether with ftrace, function
ftrace_graph_caller() gets in to a recursion, via functions
kasan_check_read() and kasan_check_write().

 Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 179             mcount_get_pc             x0    //     function's pc
 (gdb) bt
 #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 #1  0xffffff90101406c8 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
 riscvarchive#2  0xffffff90106fd084 in kasan_check_write (p=0xffffffc06c170878, size=4) at ../mm/kasan/common.c:105
 riscvarchive#3  0xffffff90104a2464 in atomic_add_return (v=<optimized out>, i=<optimized out>) at ./include/generated/atomic-instrumented.h:71
 riscvarchive#4  atomic_inc_return (v=<optimized out>) at ./include/generated/atomic-fallback.h:284
 riscvarchive#5  trace_graph_entry (trace=0xffffffc03f5ff380) at ../kernel/trace/trace_functions_graph.c:441
 riscvarchive#6  0xffffff9010481774 in trace_graph_entry_watchdog (trace=<optimized out>) at ../kernel/trace/trace_selftest.c:741
 riscvarchive#7  0xffffff90104a185c in function_graph_enter (ret=<optimized out>, func=<optimized out>, frame_pointer=18446743799894897728, retp=<optimized out>) at ../kernel/trace/trace_functions_graph.c:196
 riscvarchive#8  0xffffff9010140628 in prepare_ftrace_return (self_addr=18446743592948977792, parent=0xffffffc03f5ff418, frame_pointer=18446743799894897728) at ../arch/arm64/kernel/ftrace.c:231
 riscvarchive#9  0xffffff90101406f4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 (gdb)

Rework so that the kasan implementation isn't traced.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anders Roxell <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Tested-by: Dmitry Vyukov <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
hdmi-codec oopses the kernel when it is unbound from a successfully
bound audio subsystem, and is then rebound:

Unable to handle kernel NULL pointer dereference at virtual address 0000001c
pgd = ee3f0000
[0000001c] *pgd=3cc59831
Internal error: Oops: 817 [#1] PREEMPT ARM
Modules linked in: ext2 snd_soc_spdif_tx vmeta dove_thermal snd_soc_kirkwood ofpart marvell_cesa m25p80 orion_wdt mtd spi_nor des_generic gpio_ir_recv snd_soc_kirkwood_spdif bmm_dmabuf auth_rpcgss nfsd autofs4 etnaviv thermal_sys hwmon gpu_sched tda9950
CPU: 0 PID: 1005 Comm: bash Not tainted 4.20.0+ #1762
Hardware name: Marvell Dove (Cubox)
PC is at hdmi_dai_probe+0x68/0x80
LR is at find_held_lock+0x20/0x94
pc : [<c04c7de0>]    lr : [<c0063bf4>]    psr: 600f0013
sp : ee15bd28  ip : eebd8b1c  fp : c093b488
r10: ee048000  r9 : eebdab18  r8 : ee048600
r7 : 00000001  r6 : 00000000  r5 : 00000000  r4 : ee82c100
r3 : 00000006  r2 : 00000001  r1 : c067e38c  r0 : ee82c100
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none[  297.318599] Control: 10c5387d  Table: 2e3f0019  DAC: 00000051
Process bash (pid: 1005, stack limit = 0xee15a248)
...
[<c04c7de0>] (hdmi_dai_probe) from [<c04b7060>] (soc_probe_dai.part.9+0x34/0x70)
[<c04b7060>] (soc_probe_dai.part.9) from [<c04b81a8>] (snd_soc_instantiate_card+0x734/0xc9c)
[<c04b81a8>] (snd_soc_instantiate_card) from [<c04b8b6c>] (snd_soc_add_component+0x29c/0x378)
[<c04b8b6c>] (snd_soc_add_component) from [<c04b8c8c>] (snd_soc_register_component+0x44/0x54)
[<c04b8c8c>] (snd_soc_register_component) from [<c04c64b4>] (devm_snd_soc_register_component+0x48/0x84)
[<c04c64b4>] (devm_snd_soc_register_component) from [<c04c7be8>] (hdmi_codec_probe+0x150/0x260)
[<c04c7be8>] (hdmi_codec_probe) from [<c0373124>] (platform_drv_probe+0x48/0x98)

This happens because hdmi_dai_probe() attempts to access the HDMI
codec private data, but this has not been assigned by hdmi_dai_probe()
before it calls devm_snd_soc_register_component().  Move the call to
dev_set_drvdata() before devm_snd_soc_register_component() to avoid
this oops.

Signed-off-by: Russell King <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Cc: [email protected]
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
This patch prevents division by zero htotal.

In a follow-up mail Tina writes:

> > How did you manage to get here with htotal == 0? This needs backtraces (or if
> > this is just about static checkers, a mention of that).
> > -Daniel
>
> In GVT-g, we are trying to enable a virtual display w/o setting timings for a pipe
> (a.k.a htotal=0), then we met the following kernel panic:
>
> [   32.832048] divide error: 0000 [#1] SMP PTI
> [   32.833614] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc4-sriov+ riscvarchive#33
> [   32.834438] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.10.1-0-g8891697-dirty-20180511_165818-tinazhang-linux-1 04/01/2014
> [   32.835901] RIP: 0010:drm_mode_hsync+0x1e/0x40
> [   32.836004] Code: 31 c0 c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 8b 87 d8 00 00 00 85 c0 75 22 8b 4f 68 85 c9 78 1b 69 47 58 e8 03 00 00 99 <f7> f9 b9 d3 4d 62 10 05 f4 01 00 00 f7 e1 89 d0 c1 e8 06 f3 c3 66
> [   32.836004] RSP: 0000:ffffc900000ebb90 EFLAGS: 00010206
> [   32.836004] RAX: 0000000000000000 RBX: ffff88001c67c8a0 RCX: 0000000000000000
> [   32.836004] RDX: 0000000000000000 RSI: ffff88001c67c000 RDI: ffff88001c67c8a0
> [   32.836004] RBP: ffff88001c7d03a0 R08: ffff88001c67c8a0 R09: ffff88001c7d0330
> [   32.836004] R10: ffffffff822c3a98 R11: 0000000000000001 R12: ffff88001c67c000
> [   32.836004] R13: ffff88001c7d0370 R14: ffffffff8207eb78 R15: ffff88001c67c800
> [   32.836004] FS:  0000000000000000(0000) GS:ffff88001da00000(0000) knlGS:0000000000000000
> [   32.836004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   32.836004] CR2: 0000000000000000 CR3: 000000000220a000 CR4: 00000000000006f0
> [   32.836004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   32.836004] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   32.836004] Call Trace:
> [   32.836004]  intel_mode_from_pipe_config+0x72/0x90
> [   32.836004]  intel_modeset_setup_hw_state+0x569/0xf90
> [   32.836004]  intel_modeset_init+0x905/0x1db0
> [   32.836004]  i915_driver_load+0xb8c/0x1120
> [   32.836004]  i915_pci_probe+0x4d/0xb0
> [   32.836004]  local_pci_probe+0x44/0xa0
> [   32.836004]  ? pci_assign_irq+0x27/0x130
> [   32.836004]  pci_device_probe+0x102/0x1c0
> [   32.836004]  driver_probe_device+0x2b8/0x480
> [   32.836004]  __driver_attach+0x109/0x110
> [   32.836004]  ? driver_probe_device+0x480/0x480
> [   32.836004]  bus_for_each_dev+0x67/0xc0
> [   32.836004]  ? klist_add_tail+0x3b/0x70
> [   32.836004]  bus_add_driver+0x1e8/0x260
> [   32.836004]  driver_register+0x5b/0xe0
> [   32.836004]  ? mipi_dsi_bus_init+0x11/0x11
> [   32.836004]  do_one_initcall+0x4d/0x1eb
> [   32.836004]  kernel_init_freeable+0x197/0x237
> [   32.836004]  ? rest_init+0xd0/0xd0
> [   32.836004]  kernel_init+0xa/0x110
> [   32.836004]  ret_from_fork+0x35/0x40
> [   32.836004] Modules linked in:
> [   32.859183] ---[ end trace 525608b0ed0e8665 ]---
> [   32.859722] RIP: 0010:drm_mode_hsync+0x1e/0x40
> [   32.860287] Code: 31 c0 c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 8b 87 d8 00 00 00 85 c0 75 22 8b 4f 68 85 c9 78 1b 69 47 58 e8 03 00 00 99 <f7> f9 b9 d3 4d 62 10 05 f4 01 00 00 f7 e1 89 d0 c1 e8 06 f3 c3 66
> [   32.862680] RSP: 0000:ffffc900000ebb90 EFLAGS: 00010206
> [   32.863309] RAX: 0000000000000000 RBX: ffff88001c67c8a0 RCX: 0000000000000000
> [   32.864182] RDX: 0000000000000000 RSI: ffff88001c67c000 RDI: ffff88001c67c8a0
> [   32.865206] RBP: ffff88001c7d03a0 R08: ffff88001c67c8a0 R09: ffff88001c7d0330
> [   32.866359] R10: ffffffff822c3a98 R11: 0000000000000001 R12: ffff88001c67c000
> [   32.867213] R13: ffff88001c7d0370 R14: ffffffff8207eb78 R15: ffff88001c67c800
> [   32.868075] FS:  0000000000000000(0000) GS:ffff88001da00000(0000) knlGS:0000000000000000
> [   32.868983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   32.869659] CR2: 0000000000000000 CR3: 000000000220a000 CR4: 00000000000006f0
> [   32.870599] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   32.871598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   32.872549] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> Since drm_mode_hsync() has the logic to check mode->htotal, I just extend it to cover the case htotal==0.

Signed-off-by: Tina Zhang <[email protected]>
Cc: Adam Jackson <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
[danvet: Add additional explanations + cc: stable.]
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Yonghong Song says:

====================
The current btf implementation disallows the typedef of
a func_proto type. This actually is allowed per C standard.
This patch fixed btf verification to permit such types.
Patch #1 fixed the kernel side and Patch riscvarchive#2 fixed
the tools test_btf test.
====================

Signed-off-by: Alexei Starovoitov <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
[   13.007000] WARNING: possible circular locking dependency detected
[   13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[   13.008124] ------------------------------------------------------
[   13.008624] test_progs/246 is trying to acquire lock:
[   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[   13.009770]
[   13.009770] but task is already holding lock:
[   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.010877]
[   13.010877] which lock already depends on the new lock.
[   13.010877]
[   13.011532]
[   13.011532] the existing dependency chain (in reverse order) is:
[   13.012129]
[   13.012129] -> riscvarchive#4 (bpf_event_mutex){+.+.}:
[   13.012582]        perf_event_query_prog_array+0x9b/0x130
[   13.013016]        _perf_ioctl+0x3aa/0x830
[   13.013354]        perf_ioctl+0x2e/0x50
[   13.013668]        do_vfs_ioctl+0x8f/0x6a0
[   13.014003]        ksys_ioctl+0x70/0x80
[   13.014320]        __x64_sys_ioctl+0x16/0x20
[   13.014668]        do_syscall_64+0x4a/0x180
[   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.015469]
[   13.015469] -> riscvarchive#3 (&cpuctx_mutex){+.+.}:
[   13.015910]        perf_event_init_cpu+0x5a/0x90
[   13.016291]        perf_event_init+0x1b2/0x1de
[   13.016654]        start_kernel+0x2b8/0x42a
[   13.016995]        secondary_startup_64+0xa4/0xb0
[   13.017382]
[   13.017382] -> riscvarchive#2 (pmus_lock){+.+.}:
[   13.017794]        perf_event_init_cpu+0x21/0x90
[   13.018172]        cpuhp_invoke_callback+0xb3/0x960
[   13.018573]        _cpu_up+0xa7/0x140
[   13.018871]        do_cpu_up+0xa4/0xc0
[   13.019178]        smp_init+0xcd/0xd2
[   13.019483]        kernel_init_freeable+0x123/0x24f
[   13.019878]        kernel_init+0xa/0x110
[   13.020201]        ret_from_fork+0x24/0x30
[   13.020541]
[   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[   13.021051]        static_key_slow_inc+0xe/0x20
[   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
[   13.021891]        perf_trace_event_init+0x11f/0x250
[   13.022297]        perf_trace_init+0x6b/0xa0
[   13.022644]        perf_tp_event_init+0x25/0x40
[   13.023011]        perf_try_init_event+0x6b/0x90
[   13.023386]        perf_event_alloc+0x9a8/0xc40
[   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
[   13.024173]        do_syscall_64+0x4a/0x180
[   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.024968]
[   13.024968] -> #0 (tracepoints_mutex){+.+.}:
[   13.025434]        __mutex_lock+0x86/0x970
[   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
[   13.026215]        bpf_probe_register+0x40/0x60
[   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.027042]        __do_sys_bpf+0x94f/0x1a90
[   13.027389]        do_syscall_64+0x4a/0x180
[   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.028171]
[   13.028171] other info that might help us debug this:
[   13.028171]
[   13.028807] Chain exists of:
[   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[   13.028807]
[   13.029666]  Possible unsafe locking scenario:
[   13.029666]
[   13.030140]        CPU0                    CPU1
[   13.030510]        ----                    ----
[   13.030875]   lock(bpf_event_mutex);
[   13.031166]                                lock(&cpuctx_mutex);
[   13.031645]                                lock(bpf_event_mutex);
[   13.032135]   lock(tracepoints_mutex);
[   13.032441]
[   13.032441]  *** DEADLOCK ***
[   13.032441]
[   13.032911] 1 lock held by test_progs/246:
[   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.033909]
[   13.033909] stack backtrace:
[   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   13.035657] Call Trace:
[   13.035859]  dump_stack+0x5f/0x8b
[   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
[   13.036526]  __lock_acquire+0x1158/0x1350
[   13.036852]  ? lock_acquire+0x98/0x190
[   13.037154]  lock_acquire+0x98/0x190
[   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.037876]  __mutex_lock+0x86/0x970
[   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.039028]  ? __mutex_lock+0x86/0x970
[   13.039337]  ? __mutex_lock+0x24a/0x970
[   13.039649]  ? bpf_probe_register+0x1d/0x60
[   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
[   13.041325]  bpf_probe_register+0x40/0x60
[   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.042068]  ? __might_fault+0x3e/0x90
[   13.042374]  __do_sys_bpf+0x94f/0x1a90
[   13.042678]  do_syscall_64+0x4a/0x180
[   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.043382] RIP: 0033:0x7f23b10a07f9
[   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000

Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.

Fixes: c4f6699 ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Similarly to commit 276bdb8 ("dccp: check ccid before dereferencing")
it is wise to test for a NULL ccid.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ riscvarchive#37
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0
kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5'
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654
 dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688
 sk_backlog_rcv include/net/sock.h:936 [inline]
 __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473
 dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880
 ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524
 __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
 process_backlog+0x206/0x750 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x76d/0x1930 net/core/dev.c:6412
 __do_softirq+0x30b/0xb11 kernel/softirq.c:292
 run_ksoftirqd kernel/softirq.c:654 [inline]
 run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
 smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
 kthread+0x357/0x430 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace 58a0ba03bea2c376 ]---
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Gerrit Renker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Since .scsi_done() must only be called after scsi_queue_rq() has
finished, make sure that the SRP initiator driver does not call
.scsi_done() while scsi_queue_rq() is in progress. Although
invoking sg_reset -d while I/O is in progress works fine with kernel
v4.20 and before, that is not the case with kernel v5.0-rc1. This
patch avoids that the following crash is triggered with kernel
v5.0-rc1:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
CPU: 0 PID: 360 Comm: kworker/0:1H Tainted: G    B             5.0.0-rc1-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:blk_mq_dispatch_rq_list+0x116/0xb10
Call Trace:
 blk_mq_sched_dispatch_requests+0x2f7/0x300
 __blk_mq_run_hw_queue+0xd6/0x180
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x4f1/0xa20
 worker_thread+0x67/0x5b0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: <[email protected]>
Fixes: 94a9174 ("IB/srp: reduce lock coverage of command completion")
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Creating a macvtap on a DSA-backed interface results in the following
splat when lockdep is enabled:

[   19.638080] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
[   23.041198] device lan0 entered promiscuous mode
[   23.043445] device eth0 entered promiscuous mode
[   23.049255]
[   23.049557] ============================================
[   23.055021] WARNING: possible recursive locking detected
[   23.060490] 5.0.0-rc3-00013-g56c857a1b8d3 riscvarchive#118 Not tainted
[   23.066132] --------------------------------------------
[   23.071598] ip/2861 is trying to acquire lock:
[   23.076171] 00000000f61990cb (_xmit_ETHER){+...}, at: dev_set_rx_mode+0x1c/0x38
[   23.083693]
[   23.083693] but task is already holding lock:
[   23.089696] 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
[   23.096774]
[   23.096774] other info that might help us debug this:
[   23.103494]  Possible unsafe locking scenario:
[   23.103494]
[   23.109584]        CPU0
[   23.112093]        ----
[   23.114601]   lock(_xmit_ETHER);
[   23.117917]   lock(_xmit_ETHER);
[   23.121233]
[   23.121233]  *** DEADLOCK ***
[   23.121233]
[   23.127325]  May be due to missing lock nesting notation
[   23.127325]
[   23.134315] 2 locks held by ip/2861:
[   23.137987]  #0: 000000003b766c72 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x338/0x4e0
[   23.146231]  #1: 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
[   23.153757]
[   23.153757] stack backtrace:
[   23.158243] CPU: 0 PID: 2861 Comm: ip Not tainted 5.0.0-rc3-00013-g56c857a1b8d3 riscvarchive#118
[   23.166212] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[   23.172843] Call trace:
[   23.175358]  dump_backtrace+0x0/0x188
[   23.179116]  show_stack+0x14/0x20
[   23.182524]  dump_stack+0xb4/0xec
[   23.185928]  __lock_acquire+0x123c/0x1860
[   23.190048]  lock_acquire+0xc8/0x248
[   23.193724]  _raw_spin_lock_bh+0x40/0x58
[   23.197755]  dev_set_rx_mode+0x1c/0x38
[   23.201607]  dev_set_promiscuity+0x3c/0x50
[   23.205820]  dsa_slave_change_rx_flags+0x5c/0x70
[   23.210567]  __dev_set_promiscuity+0x148/0x1e0
[   23.215136]  __dev_set_rx_mode+0x74/0x98
[   23.219167]  dev_uc_add+0x54/0x70
[   23.222575]  macvlan_open+0x170/0x1d0
[   23.226336]  __dev_open+0xe0/0x160
[   23.229830]  __dev_change_flags+0x16c/0x1b8
[   23.234132]  dev_change_flags+0x20/0x60
[   23.238074]  do_setlink+0x2d0/0xc50
[   23.241658]  __rtnl_newlink+0x5f8/0x6e8
[   23.245601]  rtnl_newlink+0x50/0x78
[   23.249184]  rtnetlink_rcv_msg+0x360/0x4e0
[   23.253397]  netlink_rcv_skb+0xe8/0x130
[   23.257338]  rtnetlink_rcv+0x14/0x20
[   23.261012]  netlink_unicast+0x190/0x210
[   23.265043]  netlink_sendmsg+0x288/0x350
[   23.269075]  sock_sendmsg+0x18/0x30
[   23.272659]  ___sys_sendmsg+0x29c/0x2c8
[   23.276602]  __sys_sendmsg+0x60/0xb8
[   23.280276]  __arm64_sys_sendmsg+0x1c/0x28
[   23.284488]  el0_svc_common+0xd8/0x138
[   23.288340]  el0_svc_handler+0x24/0x80
[   23.292192]  el0_svc+0x8/0xc

This looks fairly harmless (no actual deadlock occurs), and is
fixed in a similar way to c6894de ("bridge: fix lockdep
addr_list_lock false positive splat") by putting the addr_list_lock
in its own lockdep class.

Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Otherwise we introduce a race condition where userspace can request input
before we're ready leading to null pointer dereference such as

input: bma150 as /devices/platform/i2c-gpio-2/i2c-5/5-0038/input/input3
Unable to handle kernel NULL pointer dereference at virtual address 00000018
pgd = (ptrval)
[00000018] *pgd=55dac831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in: bma150 input_polldev [last unloaded: bma150]
CPU: 0 PID: 2870 Comm: accelerometer Not tainted 5.0.0-rc3-dirty riscvarchive#46
Hardware name: Samsung S5PC110/S5PV210-based board
PC is at input_event+0x8/0x60
LR is at bma150_report_xyz+0x9c/0xe0 [bma150]
pc : [<80450f70>]    lr : [<7f0a614c>]    psr: 800d0013
sp : a4c1fd78  ip : 00000081  fp : 00020000
r10: 00000000  r9 : a5e2944c  r8 : a7455000
r7 : 00000016  r6 : 00000101  r5 : a7617940  r4 : 80909048
r3 : fffffff2  r2 : 00000000  r1 : 00000003  r0 : 00000000
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 54e34019  DAC: 00000051
Process accelerometer (pid: 2870, stack limit = 0x(ptrval))
Stackck: (0xa4c1fd78 to 0xa4c20000)
fd60:                                                       fffffff3 fc813f6c
fd80: 40410581 d7530ce3 a5e2817c a7617f00 a5e2940 a5e2817c 00000000 7f008324
fda0: a5e28000 8044f59c a5fdd9d0 a5e2945c a46a4a00 a5e29668 a7455000 80454f10
fdc0: 80909048 a5e29668 a5fdd9d0 a46a4a00 806316d0 00000000 a46a4a00 801df5f0
fde0: 00000000 d7530ce3 a4c1fec0 a46a4a00 00000000 a5fdd9d0 a46a4a08 801df53c
fe00: 00000000 801d74bc a4c1fec0 00000000 a4c1ff70 00000000 a7038da8 00000000
fe20: a46a4a00 801e91fc a411bbe0 801f2e88 00000004 00000000 80909048 00000041
fe40: 00000000 00020000 00000000 dead4ead a6a88da0 00000000 ffffe000 806fcae8
fe60: a4c1fec8 00000000 80909048 00000002 a5fdd9d0 a7660110 a411bab0 00000001
fe80: dead4ead ffffffff ffffffff a4c1fe8c a4c1fe8c d7530ce3 20000013 80909048
fea0: 80909048 a4c1ff70 00000001 fffff000 a4c1e000 00000005 00026038 801eabd8
fec0: a7660110 a411bab0 b9394901 00000006 a696201b 76fb3000 00000000 a7039720
fee0: a5fdd9d0 00000101 00000002 00000096 00000000 00000000 00000000 a4c1ff00
ff00: a6b310f4 805cb174 a6b310f4 00000010 00000fe0 00000010 a4c1e000 d7530ce3
ff20: 00000003 a5f41400 a5f41424 00000000 a6962000 00000000 00000003 00000002
ff40: ffffff9c 000a0000 80909048 d7530ce3 a6962000 00000003 80909048 ffffff9c
ff60: a6962000 801d890c 00000000 00000000 00020000 a7590000 00000004 00000100
ff80: 00000001 d7530ce3 000288b8 00026320 000288b8 00000005 80101204 a4c1e000
ffa0: 00000005 80101000 000288b8 00026320 000288b8 000a0000 00000000 00000000
ffc0: 000288b8 00026320 000288b8 00000005 7eef3bac 000264e8 00028ad8 00026038
ffe0: 00000005 7eef3300 76f76e91 76f78546 800d0030 000288b8 00000000 00000000
[<80450f70>] (input_event) from [<a5e2817c>] (0xa5e2817c)
Code: e1a08148 eaffffa8 e351001f 812fff1e (e590c018)
---[ end trace 1c691ee85f2ff243 ]---

Signed-off-by: Jonathan Bakker <[email protected]>
Signed-off-by: Paweł Chmiel <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
This patch moves clk_get_rate() call from trigger() to hw_params()
callback to avoid calling sleeping clk API from atomic context
and prevent deadlock as indicated below.

Before this change clk_get_rate() was being called with same
spinlock held as the one passed to the clk API when registering
clocks exposed by the I2S driver.

[   82.109780] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
[   82.117009] in_atomic(): 1, irqs_disabled(): 128, pid: 1554, name: speaker-test
[   82.124235] 3 locks held by speaker-test/1554:
[   82.128653]  #0: cc8c5328 (snd_pcm_link_rwlock){...-}, at: snd_pcm_stream_lock_irq+0x20/0x38
[   82.137058]  #1: ec9eda17 (&(&substream->self_group.lock)->rlock){..-.}, at: snd_pcm_ioctl+0x900/0x1268
[   82.146417]  riscvarchive#2: 6ac279bf (&(&pri_dai->spinlock)->rlock){..-.}, at: i2s_trigger+0x64/0x6d4
[   82.154650] irq event stamp: 8144
[   82.157949] hardirqs last  enabled at (8143): [<c0a0f574>] _raw_read_unlock_irq+0x24/0x5c
[   82.166089] hardirqs last disabled at (8144): [<c0a0f6a8>] _raw_read_lock_irq+0x18/0x58
[   82.174063] softirqs last  enabled at (8004): [<c01024e4>] __do_softirq+0x3a4/0x66c
[   82.181688] softirqs last disabled at (7997): [<c012d730>] irq_exit+0x140/0x168
[   82.188964] Preemption disabled at:
[   82.188967] [<00000000>]   (null)
[   82.195728] CPU: 6 PID: 1554 Comm: speaker-test Not tainted 5.0.0-rc5-00192-ga6e6caca8f03 #191
[   82.204302] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   82.210376] [<c0111a54>] (unwind_backtrace) from [<c010d8f4>] (show_stack+0x10/0x14)
[   82.218084] [<c010d8f4>] (show_stack) from [<c09ef004>] (dump_stack+0x90/0xc8)
[   82.225278] [<c09ef004>] (dump_stack) from [<c0152980>] (___might_sleep+0x22c/0x2c8)
[   82.232990] [<c0152980>] (___might_sleep) from [<c0a0a2e4>] (__mutex_lock+0x28/0xa3c)
[   82.240788] [<c0a0a2e4>] (__mutex_lock) from [<c0a0ad80>] (mutex_lock_nested+0x1c/0x24)
[   82.248763] [<c0a0ad80>] (mutex_lock_nested) from [<c04923dc>] (clk_prepare_lock+0x78/0xec)
[   82.257079] [<c04923dc>] (clk_prepare_lock) from [<c049538c>] (clk_core_get_rate+0xc/0x5c)
[   82.265309] [<c049538c>] (clk_core_get_rate) from [<c0766b18>] (i2s_trigger+0x490/0x6d4)
[   82.273369] [<c0766b18>] (i2s_trigger) from [<c074fec4>] (soc_pcm_trigger+0x100/0x140)
[   82.281254] [<c074fec4>] (soc_pcm_trigger) from [<c07378a0>] (snd_pcm_do_start+0x2c/0x30)
[   82.289400] [<c07378a0>] (snd_pcm_do_start) from [<c07376cc>] (snd_pcm_action_single+0x38/0x78)
[   82.298065] [<c07376cc>] (snd_pcm_action_single) from [<c073a450>] (snd_pcm_ioctl+0x910/0x1268)
[   82.306734] [<c073a450>] (snd_pcm_ioctl) from [<c0292344>] (do_vfs_ioctl+0x90/0x9ec)
[   82.314443] [<c0292344>] (do_vfs_ioctl) from [<c0292cd4>] (ksys_ioctl+0x34/0x60)
[   82.321808] [<c0292cd4>] (ksys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
[   82.329431] Exception stack(0xeb875fa8 to 0xeb875ff0)
[   82.334459] 5fa0:                   00033c18 b6e31000 00000004 00004142 00033d80 00033d80
[   82.342605] 5fc0: 00033c18 b6e31000 00008000 00000036 00008000 00000000 beea38a8 00008000
[   82.350748] 5fe0: b6e3142c beea384c b6da9a30 b6c9212c
[   82.355789]
[   82.357245] ======================================================
[   82.363397] WARNING: possible circular locking dependency detected
[   82.369551] 5.0.0-rc5-00192-ga6e6caca8f03 #191 Tainted: G        W
[   82.376395] ------------------------------------------------------
[   82.382548] speaker-test/1554 is trying to acquire lock:
[   82.387834] 6d2007f4 (prepare_lock){+.+.}, at: clk_prepare_lock+0x78/0xec
[   82.394593]
[   82.394593] but task is already holding lock:
[   82.400398] 6ac279bf (&(&pri_dai->spinlock)->rlock){..-.}, at: i2s_trigger+0x64/0x6d4
[   82.408197]
[   82.408197] which lock already depends on the new lock.
[   82.416343]
[   82.416343] the existing dependency chain (in reverse order) is:
[   82.423795]
[   82.423795] -> #1 (&(&pri_dai->spinlock)->rlock){..-.}:
[   82.430472]        clk_mux_set_parent+0x34/0xb8
[   82.434975]        clk_core_set_parent_nolock+0x1c4/0x52c
[   82.440347]        clk_set_parent+0x38/0x6c
[   82.444509]        of_clk_set_defaults+0xc8/0x308
[   82.449186]        of_clk_add_provider+0x84/0xd0
[   82.453779]        samsung_i2s_probe+0x408/0x5f8
[   82.458376]        platform_drv_probe+0x48/0x98
[   82.462879]        really_probe+0x224/0x3f4
[   82.467037]        driver_probe_device+0x70/0x1c4
[   82.471716]        bus_for_each_drv+0x44/0x8c
[   82.476049]        __device_attach+0xa0/0x138
[   82.480382]        bus_probe_device+0x88/0x90
[   82.484715]        deferred_probe_work_func+0x6c/0xbc
[   82.489741]        process_one_work+0x200/0x740
[   82.494246]        worker_thread+0x2c/0x4c8
[   82.498408]        kthread+0x128/0x164
[   82.502131]        ret_from_fork+0x14/0x20
[   82.506204]          (null)
[   82.508976]
[   82.508976] -> #0 (prepare_lock){+.+.}:
[   82.514264]        __mutex_lock+0x60/0xa3c
[   82.518336]        mutex_lock_nested+0x1c/0x24
[   82.522756]        clk_prepare_lock+0x78/0xec
[   82.527088]        clk_core_get_rate+0xc/0x5c
[   82.531421]        i2s_trigger+0x490/0x6d4
[   82.535494]        soc_pcm_trigger+0x100/0x140
[   82.539913]        snd_pcm_do_start+0x2c/0x30
[   82.544246]        snd_pcm_action_single+0x38/0x78
[   82.549012]        snd_pcm_ioctl+0x910/0x1268
[   82.553345]        do_vfs_ioctl+0x90/0x9ec
[   82.557417]        ksys_ioctl+0x34/0x60
[   82.561229]        ret_fast_syscall+0x0/0x28
[   82.565477]        0xbeea384c
[   82.568421]
[   82.568421] other info that might help us debug this:
[   82.568421]
[   82.576394]  Possible unsafe locking scenario:
[   82.576394]
[   82.582285]        CPU0                    CPU1
[   82.586792]        ----                    ----
[   82.591297]   lock(&(&pri_dai->spinlock)->rlock);
[   82.595977]                                lock(prepare_lock);
[   82.601782]                                lock(&(&pri_dai->spinlock)->rlock);
[   82.608975]   lock(prepare_lock);
[   82.612268]
[   82.612268]  *** DEADLOCK ***

Fixes: 647d04f ("ASoC: samsung: i2s: Ensure the RCLK rate is properly determined")
Reported-by: Krzysztof Kozłowski <[email protected]>
Signed-off-by: Sylwester Nawrocki <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Vince (and later on Ravi) reported crashes in the BTS code during
fuzzing with the following backtrace:

  general protection fault: 0000 [#1] SMP PTI
  ...
  RIP: 0010:perf_prepare_sample+0x8f/0x510
  ...
  Call Trace:
   <IRQ>
   ? intel_pmu_drain_bts_buffer+0x194/0x230
   intel_pmu_drain_bts_buffer+0x160/0x230
   ? tick_nohz_irq_exit+0x31/0x40
   ? smp_call_function_single_interrupt+0x48/0xe0
   ? call_function_single_interrupt+0xf/0x20
   ? call_function_single_interrupt+0xa/0x20
   ? x86_schedule_events+0x1a0/0x2f0
   ? x86_pmu_commit_txn+0xb4/0x100
   ? find_busiest_group+0x47/0x5d0
   ? perf_event_set_state.part.42+0x12/0x50
   ? perf_mux_hrtimer_restart+0x40/0xb0
   intel_pmu_disable_event+0xae/0x100
   ? intel_pmu_disable_event+0xae/0x100
   x86_pmu_stop+0x7a/0xb0
   x86_pmu_del+0x57/0x120
   event_sched_out.isra.101+0x83/0x180
   group_sched_out.part.103+0x57/0xe0
   ctx_sched_out+0x188/0x240
   ctx_resched+0xa8/0xd0
   __perf_event_enable+0x193/0x1e0
   event_function+0x8e/0xc0
   remote_function+0x41/0x50
   flush_smp_call_function_queue+0x68/0x100
   generic_smp_call_function_single_interrupt+0x13/0x30
   smp_call_function_single_interrupt+0x3e/0xe0
   call_function_single_interrupt+0xf/0x20
   </IRQ>

The reason is that while event init code does several checks
for BTS events and prevents several unwanted config bits for
BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD allows
to create BTS event without those checks being done.

Following sequence will cause the crash:

If we create an 'almost' BTS event with precise_ip and callchains,
and it into a BTS event it will crash the perf_prepare_sample()
function because precise_ip events are expected to come
in with callchain data initialized, but that's not the
case for intel_pmu_drain_bts_buffer() caller.

Adding a check_period callback to be called before the period
is changed via PERF_EVENT_IOC_PERIOD. It will deny the change
if the event would become BTS. Plus adding also the limit_period
check as well.

Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
Signed-off-by: Ingo Molnar <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
If a non zero value happens to be in xt[NFPROTO_BRIDGE].cur at init
time, the following panic can be caused by running

% ebtables -t broute -F BROUTING

from a 32-bit user level on a 64-bit kernel. This patch replaces
kmalloc_array with kcalloc when allocating xt.

[  474.680846] BUG: unable to handle kernel paging request at 0000000009600920
[  474.687869] PGD 2037006067 P4D 2037006067 PUD 2038938067 PMD 0
[  474.693838] Oops: 0000 [#1] SMP
[  474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-11302235.AroraKernelnext.fc18.x86_64 #1
[  474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013
[  474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables]
[  474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d
[  474.739023] RSP: 0018:ffffc9000943fc58 EFLAGS: 00010207
[  474.744296] RAX: 0000000000000000 RBX: ffffc90006465000 RCX: 0000000002580249
[  474.751485] RDX: 00000000012c0124 RSI: fffffffff7be17e9 RDI: 00000000012c0124
[  474.758670] RBP: ffffc9000943fc58 R08: 0000000000000000 R09: ffffffff8117cf8f
[  474.765855] R10: ffffc90006477000 R11: 0000000000000000 R12: 0000000000000001
[  474.773048] R13: 0000000000000000 R14: ffffc9000943fcb8 R15: ffffc9000943fcb8
[  474.780234] FS:  0000000000000000(0000) GS:ffff88a03f840000(0063) knlGS:00000000f7ac7700
[  474.788612] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  474.794632] CR2: 0000000009600920 CR3: 0000002037422006 CR4: 00000000000606e0
[  474.802052] Call Trace:
[  474.804789]  compat_do_replace+0x1fb/0x2a3 [ebtables]
[  474.810105]  compat_do_ebt_set_ctl+0x69/0xe6 [ebtables]
[  474.815605]  ? try_module_get+0x37/0x42
[  474.819716]  compat_nf_setsockopt+0x4f/0x6d
[  474.824172]  compat_ip_setsockopt+0x7e/0x8c
[  474.828641]  compat_raw_setsockopt+0x16/0x3a
[  474.833220]  compat_sock_common_setsockopt+0x1d/0x24
[  474.838458]  __compat_sys_setsockopt+0x17e/0x1b1
[  474.843343]  ? __check_object_size+0x76/0x19a
[  474.847960]  __ia32_compat_sys_socketcall+0x1cb/0x25b
[  474.853276]  do_fast_syscall_32+0xaf/0xf6
[  474.857548]  entry_SYSENTER_compat+0x6b/0x7a

Signed-off-by: Francesco Ruggeri <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Attempting to avoid cloning the skb when broadcasting by inflating
the refcount with sock_hold/sock_put while under RCU lock is dangerous
and violates RCU principles. It leads to subtle race conditions when
attempting to free the SKB, as we may reference sockets that have
already been freed by the stack.

Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
[006b6b6b6b6b6c4b] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
task: fffffff78f65b380 task.stack: ffffff8049a88000
pc : sock_rfree+0x38/0x6c
lr : skb_release_head_state+0x6c/0xcc
Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
Call trace:
	sock_rfree+0x38/0x6c
	skb_release_head_state+0x6c/0xcc
	skb_release_all+0x1c/0x38
	__kfree_skb+0x1c/0x30
	kfree_skb+0xd0/0xf4
	pfkey_broadcast+0x14c/0x18c
	pfkey_sendmsg+0x1d8/0x408
	sock_sendmsg+0x44/0x60
	___sys_sendmsg+0x1d0/0x2a8
	__sys_sendmsg+0x64/0xb4
	SyS_sendmsg+0x34/0x4c
	el0_svc_naked+0x34/0x38
Kernel panic - not syncing: Fatal exception

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Sean Tranchetti <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
… fault

The userspace can ask kprobe to intercept strings at any memory address,
including invalid kernel address. In this case, fetch_store_strlen()
would crash since it uses general usercopy function, and user access
functions are no longer allowed to access kernel memory.

For example, we can crash the kernel by doing something as below:

$ sudo kprobe 'p:do_sys_open +0(+0(%si)):string'

[  103.620391] BUG: GPF in non-whitelisted uaccess (non-canonical address?)
[  103.622104] general protection fault: 0000 [#1] SMP PTI
[  103.623424] CPU: 10 PID: 1046 Comm: cat Not tainted 5.0.0-rc3-00130-gd73aba1-dirty riscvarchive#96
[  103.625321] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-2-g628b2e6-dirty-20190104_103505-linux 04/01/2014
[  103.628284] RIP: 0010:process_fetch_insn+0x1ab/0x4b0
[  103.629518] Code: 10 83 80 28 2e 00 00 01 31 d2 31 ff 48 8b 74 24 28 eb 0c 81 fa ff 0f 00 00 7f 1c 85 c0 75 18 66 66 90 0f ae e8 48 63
 ca 89 f8 <8a> 0c 31 66 66 90 83 c2 01 84 c9 75 dc 89 54 24 34 89 44 24 28 48
[  103.634032] RSP: 0018:ffff88845eb37ce0 EFLAGS: 00010246
[  103.635312] RAX: 0000000000000000 RBX: ffff888456c4e5a8 RCX: 0000000000000000
[  103.637057] RDX: 0000000000000000 RSI: 2e646c2f6374652f RDI: 0000000000000000
[  103.638795] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  103.640556] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  103.642297] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  103.644040] FS:  0000000000000000(0000) GS:ffff88846f000000(0000) knlGS:0000000000000000
[  103.646019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  103.647436] CR2: 00007ffc79758038 CR3: 0000000463360006 CR4: 0000000000020ee0
[  103.649147] Call Trace:
[  103.649781]  ? sched_clock_cpu+0xc/0xa0
[  103.650747]  ? do_sys_open+0x5/0x220
[  103.651635]  kprobe_trace_func+0x303/0x380
[  103.652645]  ? do_sys_open+0x5/0x220
[  103.653528]  kprobe_dispatcher+0x45/0x50
[  103.654682]  ? do_sys_open+0x1/0x220
[  103.655875]  kprobe_ftrace_handler+0x90/0xf0
[  103.657282]  ftrace_ops_assist_func+0x54/0xf0
[  103.658564]  ? __call_rcu+0x1dc/0x280
[  103.659482]  0xffffffffc00000bf
[  103.660384]  ? __ia32_sys_open+0x20/0x20
[  103.661682]  ? do_sys_open+0x1/0x220
[  103.662863]  do_sys_open+0x5/0x220
[  103.663988]  do_syscall_64+0x60/0x210
[  103.665201]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  103.666862] RIP: 0033:0x7fc22fadccdd
[  103.668034] Code: 48 89 54 24 e0 41 83 e2 40 75 32 89 f0 25 00 00 41 00 3d 00 00 41 00 74 24 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff
 ff 0f 05 <48> 3d 00 f0 ff ff 77 33 f3 c3 66 0f 1f 84 00 00 00 00 00 48 8d 44
[  103.674029] RSP: 002b:00007ffc7972c3a8 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
[  103.676512] RAX: ffffffffffffffda RBX: 0000562f86147a21 RCX: 00007fc22fadccdd
[  103.678853] RDX: 0000000000080000 RSI: 00007fc22fae1428 RDI: 00000000ffffff9c
[  103.681151] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[  103.683489] R10: 0000000000000000 R11: 0000000000000287 R12: 00007fc22fce90a8
[  103.685774] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[  103.688056] Modules linked in:
[  103.689131] ---[ end trace 43792035c28984a1 ]---

This can be fixed by using probe_mem_read() instead, as it can handle faulting
kernel memory addresses, which kprobes can legitimately do.

Link: http://lkml.kernel.org/r/[email protected]

Cc: [email protected]
Fixes: 9da3f2b ("x86/fault: BUG() when uaccess helpers fault on kernel addresses")
Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
When a target sends Check Condition, whilst initiator is busy xmiting
re-queued data, could lead to race between iscsi_complete_task() and
iscsi_xmit_task() and eventually crashing with the following kernel
backtrace.

[3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[3326150.987549] ALERT: IP: [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.987571] WARN: PGD 569c8067 PUD 569c9067 PMD 0
[3326150.987582] WARN: Oops: 0002 [#1] SMP
[3326150.987593] WARN: Modules linked in: tun nfsv3 nfs fscache dm_round_robin
[3326150.987762] WARN: CPU: 2 PID: 8399 Comm: kworker/u32:1 Tainted: G O 4.4.0+2 #1
[3326150.987774] WARN: Hardware name: Dell Inc. PowerEdge R720/0W7JN5, BIOS 2.5.4 01/22/2016
[3326150.987790] WARN: Workqueue: iscsi_q_13 iscsi_xmitworker [libiscsi]
[3326150.987799] WARN: task: ffff8801d50f3800 ti: ffff8801f5458000 task.ti: ffff8801f5458000
[3326150.987810] WARN: RIP: e030:[<ffffffffa05ce70d>] [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.987825] WARN: RSP: e02b:ffff8801f545bdb0 EFLAGS: 00010246
[3326150.987831] WARN: RAX: 00000000ffffffc3 RBX: ffff880282d2ab20 RCX: ffff88026b6ac480
[3326150.987842] WARN: RDX: 0000000000000000 RSI: 00000000fffffe01 RDI: ffff880282d2ab20
[3326150.987852] WARN: RBP: ffff8801f545bdc8 R08: 0000000000000000 R09: 0000000000000008
[3326150.987862] WARN: R10: 0000000000000000 R11: 000000000000fe88 R12: 0000000000000000
[3326150.987872] WARN: R13: ffff880282d2abe8 R14: ffff880282d2abd8 R15: ffff880282d2ac08
[3326150.987890] WARN: FS: 00007f5a866b4840(0000) GS:ffff88028a640000(0000) knlGS:0000000000000000
[3326150.987900] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[3326150.987907] WARN: CR2: 0000000000000078 CR3: 0000000070244000 CR4: 0000000000042660
[3326150.987918] WARN: Stack:
[3326150.987924] WARN: ffff880282d2ad58 ffff880282d2ab20 ffff880282d2abe8 ffff8801f545be18
[3326150.987938] WARN: ffffffffa05cea90 ffff880282d2abf8 ffff88026b59cc80 ffff88026b59cc00
[3326150.987951] WARN: ffff88022acf32c0 ffff880289491800 ffff880255a80800 0000000000000400
[3326150.987964] WARN: Call Trace:
[3326150.987975] WARN: [<ffffffffa05cea90>] iscsi_xmitworker+0x2f0/0x360 [libiscsi]
[3326150.987988] WARN: [<ffffffff8108862c>] process_one_work+0x1fc/0x3b0
[3326150.987997] WARN: [<ffffffff81088f95>] worker_thread+0x2a5/0x470
[3326150.988006] WARN: [<ffffffff8159cad8>] ? __schedule+0x648/0x870
[3326150.988015] WARN: [<ffffffff81088cf0>] ? rescuer_thread+0x300/0x300
[3326150.988023] WARN: [<ffffffff8108ddf5>] kthread+0xd5/0xe0
[3326150.988031] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110
[3326150.988040] WARN: [<ffffffff815a0bcf>] ret_from_fork+0x3f/0x70
[3326150.988048] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110
[3326150.988127] ALERT: RIP [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.988138] WARN: RSP <ffff8801f545bdb0>
[3326150.988144] WARN: CR2: 0000000000000078
[3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]---

Commit 6f8830f ("scsi: libiscsi: add lock around task lists to fix
list corruption regression") introduced "taskqueuelock" to fix list
corruption during the race, but this wasn't enough.

Re-setting of conn->task to NULL, could race with iscsi_xmit_task().
iscsi_complete_task()
{
    ....
    if (conn->task == task)
        conn->task = NULL;
}

conn->task in iscsi_xmit_task() could be NULL and so will be task.
__iscsi_get_task(task) will crash (NullPtr de-ref), trying to access
refcount.

iscsi_xmit_task()
{
    struct iscsi_task *task = conn->task;

    __iscsi_get_task(task);
}

This commit will take extra conn->session->back_lock in iscsi_xmit_task()
to ensure iscsi_xmit_task() waits for iscsi_complete_task(), if
iscsi_complete_task() wins the race.  If iscsi_xmit_task() wins the race,
iscsi_xmit_task() increments task->refcount
(__iscsi_get_task) ensuring iscsi_complete_task() will not iscsi_free_task().

Signed-off-by: Anoob Soman <[email protected]>
Signed-off-by: Bob Liu <[email protected]>
Acked-by: Lee Duncan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
…sent()

In v4.20 we changed our pgd/pud_present() to check for _PAGE_PRESENT
rather than just checking that the value is non-zero, e.g.:

  static inline int pgd_present(pgd_t pgd)
  {
 -       return !pgd_none(pgd);
 +       return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
  }

Unfortunately this is broken on big endian, as the result of the
bitwise & is truncated to int, which is always zero because
_PAGE_PRESENT is 0x8000000000000000ul. This means pgd_present() and
pud_present() are always false at compile time, and the compiler
elides the subsequent code.

Remarkably with that bug present we are still able to boot and run
with few noticeable effects. However under some work loads we are able
to trigger a warning in the ext4 code:

  WARNING: CPU: 11 PID: 29593 at fs/ext4/inode.c:3927 .ext4_set_page_dirty+0x70/0xb0
  CPU: 11 PID: 29593 Comm: debugedit Not tainted 4.20.0-rc1 #1
  ...
  NIP .ext4_set_page_dirty+0x70/0xb0
  LR  .set_page_dirty+0xa0/0x150
  Call Trace:
   .set_page_dirty+0xa0/0x150
   .unmap_page_range+0xbf0/0xe10
   .unmap_vmas+0x84/0x130
   .unmap_region+0xe8/0x190
   .__do_munmap+0x2f0/0x510
   .__vm_munmap+0x80/0x110
   .__se_sys_munmap+0x14/0x30
   system_call+0x5c/0x70

The fix is simple, we need to convert the result of the bitwise & to
an int before returning it.

Thanks to Erhard, Jan Kara and Aneesh for help with debugging.

Fixes: da7ad36 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Cc: [email protected] # v4.20+
Reported-by: Erhard F. <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
KASAN has found use-after-free in sockfs_setattr.
The existed commit 6d8c50d ("socket: close race condition between sock_close()
and sockfs_setattr()") is to fix this simillar issue, but it seems to ignore
that crypto module forgets to set the sk to NULL after af_alg_release.

KASAN report details as below:
BUG: KASAN: use-after-free in sockfs_setattr+0x120/0x150
Write of size 4 at addr ffff88837b956128 by task syz-executor0/4186

CPU: 2 PID: 4186 Comm: syz-executor0 Not tainted xxx + #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0xca/0x13e
 print_address_description+0x79/0x330
 ? vprintk_func+0x5e/0xf0
 kasan_report+0x18a/0x2e0
 ? sockfs_setattr+0x120/0x150
 sockfs_setattr+0x120/0x150
 ? sock_register+0x2d0/0x2d0
 notify_change+0x90c/0xd40
 ? chown_common+0x2ef/0x510
 chown_common+0x2ef/0x510
 ? chmod_common+0x3b0/0x3b0
 ? __lock_is_held+0xbc/0x160
 ? __sb_start_write+0x13d/0x2b0
 ? __mnt_want_write+0x19a/0x250
 do_fchownat+0x15c/0x190
 ? __ia32_sys_chmod+0x80/0x80
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 __x64_sys_fchownat+0xbf/0x160
 ? lockdep_hardirqs_on+0x39a/0x5e0
 do_syscall_64+0xc8/0x580
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462589
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89
ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3
48 c7 c1 bc ff ff
ff f7 d8 64 89 01 48
RSP: 002b:00007fb4b2c83c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000104
RAX: ffffffffffffffda RBX: 000000000072bfa0 RCX: 0000000000462589
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000007
RBP: 0000000000000005 R08: 0000000000001000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb4b2c846bc
R13: 00000000004bc733 R14: 00000000006f5138 R15: 00000000ffffffff

Allocated by task 4185:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc+0x14a/0x350
 sk_prot_alloc+0xf6/0x290
 sk_alloc+0x3d/0xc00
 af_alg_accept+0x9e/0x670
 hash_accept+0x4a3/0x650
 __sys_accept4+0x306/0x5c0
 __x64_sys_accept4+0x98/0x100
 do_syscall_64+0xc8/0x580
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4184:
 __kasan_slab_free+0x12e/0x180
 kfree+0xeb/0x2f0
 __sk_destruct+0x4e6/0x6a0
 sk_destruct+0x48/0x70
 __sk_free+0xa9/0x270
 sk_free+0x2a/0x30
 af_alg_release+0x5c/0x70
 __sock_release+0xd3/0x280
 sock_close+0x1a/0x20
 __fput+0x27f/0x7f0
 task_work_run+0x136/0x1b0
 exit_to_usermode_loop+0x1a7/0x1d0
 do_syscall_64+0x461/0x580
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Syzkaller reproducer:
r0 = perf_event_open(&(0x7f0000000000)={0x0, 0x70, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_config_ext}, 0x0, 0x0,
0xffffffffffffffff, 0x0)
r1 = socket$alg(0x26, 0x5, 0x0)
getrusage(0x0, 0x0)
bind(r1, &(0x7f00000001c0)=@alg={0x26, 'hash\x00', 0x0, 0x0,
'sha256-ssse3\x00'}, 0x80)
r2 = accept(r1, 0x0, 0x0)
r3 = accept4$unix(r2, 0x0, 0x0, 0x0)
r4 = dup3(r3, r0, 0x0)
fchownat(r4, &(0x7f00000000c0)='\x00', 0x0, 0x0, 0x1000)

Fixes: 6d8c50d ("socket: close race condition between sock_close() and sockfs_setattr()")
Signed-off-by: Mao Wenan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
The compound IOMMU group rework moved iommu_register_group() together
in pnv_pci_ioda_setup_iommu_api() (which is a part of
ppc_md.pcibios_fixup). As the result, pnv_ioda_setup_bus_iommu_group()
does not create groups any more, it only adds devices to groups.

This works fine for boot time devices. However IOMMU groups for
SRIOV's VFs were added by pnv_ioda_setup_bus_iommu_group() so this got
broken: pnv_tce_iommu_bus_notifier() expects a group to be registered
for VF and it is not.

This adds missing group registration and adds a NULL pointer check
into the bus notifier so we won't crash if there is no group, although
it is not expected to happen now because of the change above.

Example oops seen prior to this patch:

  $ echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
  Unable to handle kernel paging request for data at address 0x00000030
  Faulting instruction address: 0xc0000000004a6018
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 46 PID: 7006 Comm: bash Not tainted 4.15-ish
  NIP:  c0000000004a6018 LR: c0000000004a6014 CTR: 0000000000000000
  REGS: c000008fc876b400 TRAP: 0300   Not tainted  (4.15-ish)
  MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
  CFAR: c000000000d0be20 DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1
  ...
  NIP sysfs_do_create_link_sd.isra.0+0x68/0x150
  LR  sysfs_do_create_link_sd.isra.0+0x64/0x150
  Call Trace:
    pci_dev_type+0x0/0x30 (unreliable)
    iommu_group_add_device+0x8c/0x600
    iommu_add_device+0xe8/0x180
    pnv_tce_iommu_bus_notifier+0xb0/0xf0
    notifier_call_chain+0x9c/0x110
    blocking_notifier_call_chain+0x64/0xa0
    device_add+0x524/0x7d0
    pci_device_add+0x248/0x450
    pci_iov_add_virtfn+0x294/0x3e0
    pci_enable_sriov+0x43c/0x580
    mlx5_core_sriov_configure+0x15c/0x2f0 [mlx5_core]
    sriov_numvfs_store+0x180/0x240
    dev_attr_store+0x3c/0x60
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1ac/0x240
    __vfs_write+0x3c/0x70
    vfs_write+0xd8/0x220
    SyS_write+0x6c/0x110
    system_call+0x58/0x6c

Fixes: 0bd9716 ("powerpc/powernv/npu: Add compound IOMMU groups")
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reported-by: Santwana Samantray <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Evaluating page_mapping() on a poisoned page ends up dereferencing junk
and making PF_POISONED_CHECK() considerably crashier than intended:

    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
    Mem abort info:
      ESR = 0x96000005
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000005
      CM = 0, WnR = 0
    user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c2f6ac38
    [0000000000000006] pgd=0000000000000000, pud=0000000000000000
    Internal error: Oops: 96000005 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 2 PID: 491 Comm: bash Not tainted 5.0.0-rc1+ #1
    Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Dec 17 2018
    pstate: 00000005 (nzcv daif -PAN -UAO)
    pc : page_mapping+0x18/0x118
    lr : __dump_page+0x1c/0x398
    Process bash (pid: 491, stack limit = 0x000000004ebd4ecd)
    Call trace:
     page_mapping+0x18/0x118
     __dump_page+0x1c/0x398
     dump_page+0xc/0x18
     remove_store+0xbc/0x120
     dev_attr_store+0x18/0x28
     sysfs_kf_write+0x40/0x50
     kernfs_fop_write+0x130/0x1d8
     __vfs_write+0x30/0x180
     vfs_write+0xb4/0x1a0
     ksys_write+0x60/0xd0
     __arm64_sys_write+0x18/0x20
     el0_svc_common+0x94/0xf8
     el0_svc_handler+0x68/0x70
     el0_svc+0x8/0xc
    Code: f9400401 d1000422 f240003f 9a801040 (f9400402)
    ---[ end trace cdb5eb5bf435cecb ]---

Fix that by not inspecting the mapping until we've determined that it's
likely to be valid.  Now the above condition still ends up stopping the
kernel, but in the correct manner:

    page:ffffffbf20000000 is uninitialized and poisoned
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    ------------[ cut here ]------------
    kernel BUG at ./include/linux/mm.h:1006!
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 1 PID: 483 Comm: bash Not tainted 5.0.0-rc1+ riscvarchive#3
    Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Dec 17 2018
    pstate: 40000005 (nZcv daif -PAN -UAO)
    pc : remove_store+0xbc/0x120
    lr : remove_store+0xbc/0x120
    ...

Link: http://lkml.kernel.org/r/03b53ee9d7e76cda4b9b5e1e31eea080db033396.1550071778.git.robin.murphy@arm.com
Fixes: 1c6fb1d ("mm: print more information about mapping in __dump_page")
Signed-off-by: Robin Murphy <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
We've been seeing hard-to-trigger psi crashes when running inside VM
instances:

    divide error: 0000 [#1] SMP PTI
    Modules linked in: [...]
    CPU: 0 PID: 212 Comm: kworker/0:2 Not tainted 4.16.18-119_fbk9_3817_gfe944c98d695 riscvarchive#119
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
    Workqueue: events psi_clock
    RIP: 0010:psi_update_stats+0x270/0x490
    RSP: 0018:ffffc90001117e10 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800a35a13f8
    RDX: 0000000000000000 RSI: ffff8800a35a1340 RDI: 0000000000000000
    RBP: 0000000000000658 R08: ffff8800a35a1470 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000f8502
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fbe370fa000 CR3: 00000000b1e3a000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     psi_clock+0x12/0x50
     process_one_work+0x1e0/0x390
     worker_thread+0x2b/0x3c0
     ? rescuer_thread+0x330/0x330
     kthread+0x113/0x130
     ? kthread_create_worker_on_cpu+0x40/0x40
     ? SyS_exit_group+0x10/0x10
     ret_from_fork+0x35/0x40
    Code: 48 0f 47 c7 48 01 c2 45 85 e4 48 89 16 0f 85 e6 00 00 00 4c 8b 49 10 4c 8b 51 08 49 69 d9 f2 07 00 00 48 6b c0 64 4c 8b 29 31 d2 <48> f7 f7 49 69 d5 8d 06 00 00 48 89 c5 4c 69 f0 00 98 0b 00 48

The Code-line points to `period` being 0 inside update_stats(), and we
divide by that when calculating that period's pressure percentage.

The elapsed period should never be 0.  The reason this can happen is due
to an off-by-one in the idle time / missing period calculation combined
with a coarse sched_clock() in the virtual machine.

The target time for aggregation is advanced into the future on a fixed
grid to prevent clock drift.  So when an aggregation runs after some idle
period, we can not just set it to "now + psi_period", but have to
calculate the downtime and advance the target time relative to itself.

However, if the aggregator was disabled exactly one psi_period (ns), we
drop one idle period in the calculation due to a > when we should do >=.
In that case, next_update will be advanced from 'now - psi_period' to
'now' when it should be moved to 'now + psi_period'.  The run finishes
with last_update == next_update == sched_clock().

With hardware clocks, this exact nanosecond match isn't likely in the
first place; but if it does happen, the clock will still have moved on and
the period non-zero by the time the worker runs.  A pointlessly short
period, but besides the extra work, no harm no foul.  However, a slow
sched_clock() like we have on VMs might not have advanced either by the
time the worker runs again.  And when we calculate the elapsed period, the
result, our pressure divisor, will be 0.  Ouch.

Fix this by correctly handling the situation when the elapsed time between
aggregation runs is precisely two periods, and advance the expiration
timestamp correctly to period into the future.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Łukasz Siudut <[email protected]
Reviewed-by: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
In process_slab(), "p = get_freepointer()" could return a tagged
pointer, but "addr = page_address()" always return a native pointer.  As
the result, slab_index() is messed up here,

    return (p - addr) / s->size;

All other callers of slab_index() have the same situation where "addr"
is from page_address(), so just need to untag "p".

    # cat /sys/kernel/slab/hugetlbfs_inode_cache/alloc_calls

    Unable to handle kernel paging request at virtual address 2bff808aa4856d48
    Mem abort info:
      ESR = 0x96000007
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000007
      CM = 0, WnR = 0
    swapper pgtable: 64k pages, 48-bit VAs, pgdp = 0000000002498338
    [2bff808aa4856d48] pgd=00000097fcfd0003, pud=00000097fcfd0003, pmd=00000097fca30003, pte=00e8008b24850712
    Internal error: Oops: 96000007 [#1] SMP
    CPU: 3 PID: 79210 Comm: read_all Tainted: G             L    5.0.0-rc7+ riscvarchive#84
    Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.0.6 07/10/2018
    pstate: 00400089 (nzcv daIf +PAN -UAO)
    pc : get_map+0x78/0xec
    lr : get_map+0xa0/0xec
    sp : aeff808989e3f8e0
    x29: aeff808989e3f940 x28: ffff800826200000
    x27: ffff100012d47000 x26: 9700000000002500
    x25: 0000000000000001 x24: 52ff8008200131f8
    x23: 52ff8008200130a0 x22: 52ff800820013098
    x21: ffff800826200000 x20: ffff100013172ba0
    x19: 2bff808a8971bc00 x18: ffff1000148f5538
    x17: 000000000000001b x16: 00000000000000ff
    x15: ffff1000148f5000 x14: 00000000000000d2
    x13: 0000000000000001 x12: 0000000000000000
    x11: 0000000020000002 x10: 2bff808aa4856d48
    x9 : 0000020000000000 x8 : 68ff80082620ebb0
    x7 : 0000000000000000 x6 : ffff1000105da1dc
    x5 : 0000000000000000 x4 : 0000000000000000
    x3 : 0000000000000010 x2 : 2bff808a8971bc00
    x1 : ffff7fe002098800 x0 : ffff80082620ceb0
    Process read_all (pid: 79210, stack limit = 0x00000000f65b9361)
    Call trace:
     get_map+0x78/0xec
     process_slab+0x7c/0x47c
     list_locations+0xb0/0x3c8
     alloc_calls_show+0x34/0x40
     slab_attr_show+0x34/0x48
     sysfs_kf_seq_show+0x2e4/0x570
     kernfs_seq_show+0x12c/0x1a0
     seq_read+0x48c/0xf84
     kernfs_fop_read+0xd4/0x448
     __vfs_read+0x94/0x5d4
     vfs_read+0xcc/0x194
     ksys_read+0x6c/0xe8
     __arm64_sys_read+0x68/0xb0
     el0_svc_handler+0x230/0x3bc
     el0_svc+0x8/0xc
    Code: d3467d2a 9ac92329 8b0a0e6a f9800151 (c85f7d4b)
    ---[ end trace a383a9a44ff13176 ]---
    Kernel panic - not syncing: Fatal exception
    SMP: stopping secondary CPUs
    SMP: failed to stop secondary CPUs 1-7,32,40,127
    Kernel Offset: disabled
    CPU features: 0x002,20000c18
    Memory Limit: none
    ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Qian Cai <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Rong Chen has reported the following boot crash:

    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-gefad4e4 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    RIP: 0010:page_mapping+0x12/0x80
    Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48
    RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202
    RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 000000000000000a
    RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff88801e5c0000
    RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 0000000001b64d13
    R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff88801e640000
    R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 0000000000000001
    FS:  00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000000006a0
    Call Trace:
     __dump_page+0x14/0x2c0
     is_mem_section_removable+0x24c/0x2c0
     removable_show+0x87/0xa0
     dev_attr_show+0x25/0x60
     sysfs_kf_seq_show+0xba/0x110
     seq_read+0x196/0x3f0
     __vfs_read+0x34/0x180
     vfs_read+0xa0/0x150
     ksys_read+0x44/0xb0
     do_syscall_64+0x5e/0x4a0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

and bisected it down to commit efad4e4 ("mm, memory_hotplug:
is_mem_section_removable do not pass the end of a zone").

The reason for the crash is that the mapping is garbage for poisoned
(uninitialized) page.  This shouldn't happen as all pages in the zone's
boundary should be initialized.

Later debugging revealed that the actual problem is an off-by-one when
evaluating the end_page.  'start_pfn + nr_pages' resp 'zone_end_pfn'
refers to a pfn after the range and as such it might belong to a
differen memory section.

This along with CONFIG_SPARSEMEM then makes the loop condition
completely bogus because a pointer arithmetic doesn't work for pages
from two different sections in that memory model.

Fix the issue by reworking is_pageblock_removable to be pfn based and
only use struct page where necessary.  This makes the code slightly
easier to follow and we will remove the problematic pointer arithmetic
completely.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: efad4e4 ("mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone")
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: <[email protected]>
Tested-by: <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
The default value of ARCH_SLAB_MINALIGN in "include/linux/slab.h" is
"__alignof__(unsigned long long)" which for ARC unexpectedly turns out
to be 4. This is not a compiler bug, but as defined by ARC ABI [1]

Thus slab allocator would allocate a struct which is 32-bit aligned,
which is generally OK even if struct has long long members.
There was however potetial problem when it had any atomic64_t which
use LLOCKD/SCONDD instructions which are required by ISA to take
64-bit addresses. This is the problem we ran into

[    4.015732] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[    4.167881] Misaligned Access
[    4.172356] Path: /bin/busybox.nosuid
[    4.176004] CPU: 2 PID: 171 Comm: rm Not tainted 4.19.14-yocto-standard #1
[    4.182851]
[    4.182851] [ECR   ]: 0x000d0000 => Check Programmer's Manual
[    4.190061] [EFA   ]: 0xbeaec3fc
[    4.190061] [BLINK ]: ext4_delete_entry+0x210/0x234
[    4.190061] [ERET  ]: ext4_delete_entry+0x13e/0x234
[    4.202985] [STAT32]: 0x80080002 : IE K
[    4.207236] BTA: 0x9009329c   SP: 0xbe5b1ec4  FP: 0x00000000
[    4.212790] LPS: 0x9074b118  LPE: 0x9074b120 LPC: 0x00000000
[    4.218348] r00: 0x00000040  r01: 0x00000021 r02: 0x00000001
...
...
[    4.270510] Stack Trace:
[    4.274510]   ext4_delete_entry+0x13e/0x234
[    4.278695]   ext4_rmdir+0xe0/0x238
[    4.282187]   vfs_rmdir+0x50/0xf0
[    4.285492]   do_rmdir+0x9e/0x154
[    4.288802]   EV_Trap+0x110/0x114

The fix is to make sure slab allocations are 64-bit aligned.

Do note that atomic64_t is __attribute__((aligned(8)) which means gcc
does generate 64-bit aligned references, relative to beginning of
container struct. However the issue is if the container itself is not
64-bit aligned, atomic64_t ends up unaligned which is what this patch
ensures.

[1] https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/wiki/files/ARCv2_ABI.pdf

Signed-off-by: Alexey Brodkin <[email protected]>
Cc: <[email protected]> # 4.8+
Signed-off-by: Vineet Gupta <[email protected]>
[vgupta: reworked changelog, added dependency on LL64+LLSC]
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
The SHA256 code we adopted from the OpenSSL project uses a rather
peculiar way to take the address of the round constant table: it
takes the address of the sha256_block_data_order() routine, and
substracts a constant known quantity to arrive at the base of the
table, which is emitted by the same assembler code right before
the routine's entry point.

However, recent versions of binutils have helpfully changed the
behavior of references emitted via an ADR instruction when running
in Thumb2 mode: it now takes the Thumb execution mode bit into
account, which is bit 0 af the address. This means the produced
table address also has bit 0 set, and so we end up with an address
value pointing 1 byte past the start of the table, which results
in crashes such as

  Unable to handle kernel paging request at virtual address bf825000
  pgd = 42f44b11
  [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000
  Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2
  Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ...
  CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ riscvarchive#144
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm]
  LR is at __this_module+0x17fd/0xffffe800 [sha256_arm]
  pc : [<bf820bca>]    lr : [<bf824ffd>]    psr: 800b0033
  sp : ebc8bbe8  ip : faaabe1c  fp : 2fdd3433
  r10: 4c5f1692  r9 : e43037df  r8 : b04b0a5a
  r7 : c369d722  r6 : 39c3693e  r5 : 7a013189  r4 : 1580d26b
  r3 : 8762a9b0  r2 : eea9c2cd  r1 : 3e9ab536  r0 : 1dea4ae7
  Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
  Control: 70c5383d  Table: 6b8467c0  DAC: dbadc0de
  Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23)
  Stack: (0xebc8bbe8 to 0xebc8c000)
  ...
  unwind: Unknown symbol address bf820bca
  unwind: Index not found bf820bca
  Code: 441a ea80 40f9 440a (f85e) 3b04
  ---[ end trace e560cce92700ef8a ]---

Given that this affects older kernels as well, in case they are built
with a recent toolchain, apply a minimal backportable fix, which is
to emit another non-code label at the start of the routine, and
reference that instead. (This is similar to the current upstream state
of this file in OpenSSL)

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
The SHA512 code we adopted from the OpenSSL project uses a rather
peculiar way to take the address of the round constant table: it
takes the address of the sha256_block_data_order() routine, and
substracts a constant known quantity to arrive at the base of the
table, which is emitted by the same assembler code right before
the routine's entry point.

However, recent versions of binutils have helpfully changed the
behavior of references emitted via an ADR instruction when running
in Thumb2 mode: it now takes the Thumb execution mode bit into
account, which is bit 0 af the address. This means the produced
table address also has bit 0 set, and so we end up with an address
value pointing 1 byte past the start of the table, which results
in crashes such as

  Unable to handle kernel paging request at virtual address bf825000
  pgd = 42f44b11
  [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000
  Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2
  Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ...
  CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ riscvarchive#144
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm]
  LR is at __this_module+0x17fd/0xffffe800 [sha256_arm]
  pc : [<bf820bca>]    lr : [<bf824ffd>]    psr: 800b0033
  sp : ebc8bbe8  ip : faaabe1c  fp : 2fdd3433
  r10: 4c5f1692  r9 : e43037df  r8 : b04b0a5a
  r7 : c369d722  r6 : 39c3693e  r5 : 7a013189  r4 : 1580d26b
  r3 : 8762a9b0  r2 : eea9c2cd  r1 : 3e9ab536  r0 : 1dea4ae7
  Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
  Control: 70c5383d  Table: 6b8467c0  DAC: dbadc0de
  Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23)
  Stack: (0xebc8bbe8 to 0xebc8c000)
  ...
  unwind: Unknown symbol address bf820bca
  unwind: Index not found bf820bca
  Code: 441a ea80 40f9 440a (f85e) 3b04
  ---[ end trace e560cce92700ef8a ]---

Given that this affects older kernels as well, in case they are built
with a recent toolchain, apply a minimal backportable fix, which is
to emit another non-code label at the start of the routine, and
reference that instead. (This is similar to the current upstream state
of this file in OpenSSL)

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
…version

Fix a possible NULL pointer dereference in ip6erspan_set_version checking
nlattr data pointer

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 7549 Comm: syz-executor432 Not tainted 5.0.0-rc6-next-20190218
riscvarchive#37
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ip6erspan_newlink+0x66/0x7b0 net/ipv6/ip6_gre.c:2210
  __rtnl_newlink+0x107b/0x16c0 net/core/rtnetlink.c:3176
  rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3234
  rtnetlink_rcv_msg+0x465/0xb00 net/core/rtnetlink.c:5192
  netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485
  rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5210
  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
  netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336
  netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg+0xdd/0x130 net/socket.c:631
  ___sys_sendmsg+0x806/0x930 net/socket.c:2136
  __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
  __do_sys_sendmsg net/socket.c:2183 [inline]
  __se_sys_sendmsg net/socket.c:2181 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440159
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffa69156e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000001 R09: 00000000004002c8
R10: 0000000000000011 R11: 0000000000000246 R12: 00000000004019e0
R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace 09f8a7d13b4faaa1 ]---
RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 4974d5f ("net: ip6_gre: initialize erspan_ver just for erspan tunnels")
Reported-and-tested-by: [email protected]
Signed-off-by: Lorenzo Bianconi <[email protected]>
Reviewed-by: Greg Rose <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
Commit 5738459 ("iommu/vt-d: Store bus information in RMRR PCI
device path") changed the type of the path data, however, the change in
path type was not reflected in size calculations.  Update to use the
correct type and prevent a buffer overflow.

This bug manifests in systems with deep PCI hierarchies, and can lead to
an overflow of the static allocated buffer (dmar_pci_notify_info_buf),
or can lead to overflow of slab-allocated data.

   BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
   Write of size 1 at addr ffffffff90445d80 by task swapper/0/1
   CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.14.87-rt49-02406-gd0a0e96 #1
   Call Trace:
    ? dump_stack+0x46/0x59
    ? print_address_description+0x1df/0x290
    ? dmar_alloc_pci_notify_info+0x1d5/0x2e0
    ? kasan_report+0x256/0x340
    ? dmar_alloc_pci_notify_info+0x1d5/0x2e0
    ? e820__memblock_setup+0xb0/0xb0
    ? dmar_dev_scope_init+0x424/0x48f
    ? __down_write_common+0x1ec/0x230
    ? dmar_dev_scope_init+0x48f/0x48f
    ? dmar_free_unused_resources+0x109/0x109
    ? cpumask_next+0x16/0x20
    ? __kmem_cache_create+0x392/0x430
    ? kmem_cache_create+0x135/0x2f0
    ? e820__memblock_setup+0xb0/0xb0
    ? intel_iommu_init+0x170/0x1848
    ? _raw_spin_unlock_irqrestore+0x32/0x60
    ? migrate_enable+0x27a/0x5b0
    ? sched_setattr+0x20/0x20
    ? migrate_disable+0x1fc/0x380
    ? task_rq_lock+0x170/0x170
    ? try_to_run_init_process+0x40/0x40
    ? locks_remove_file+0x85/0x2f0
    ? dev_prepare_static_identity_mapping+0x78/0x78
    ? rt_spin_unlock+0x39/0x50
    ? lockref_put_or_lock+0x2a/0x40
    ? dput+0x128/0x2f0
    ? __rcu_read_unlock+0x66/0x80
    ? __fput+0x250/0x300
    ? __rcu_read_lock+0x1b/0x30
    ? mntput_no_expire+0x38/0x290
    ? e820__memblock_setup+0xb0/0xb0
    ? pci_iommu_init+0x25/0x63
    ? pci_iommu_init+0x25/0x63
    ? do_one_initcall+0x7e/0x1c0
    ? initcall_blacklisted+0x120/0x120
    ? kernel_init_freeable+0x27b/0x307
    ? rest_init+0xd0/0xd0
    ? kernel_init+0xf/0x120
    ? rest_init+0xd0/0xd0
    ? ret_from_fork+0x1f/0x40
   The buggy address belongs to the variable:
    dmar_pci_notify_info_buf+0x40/0x60

Fixes: 5738459 ("iommu/vt-d: Store bus information in RMRR PCI device path")
Signed-off-by: Julia Cartwright <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
atishp04 pushed a commit that referenced this pull request Mar 8, 2019
It can be reproduced by following steps:
1. virtio_net NIC is configured with gso/tso on
2. configure nginx as http server with an index file bigger than 1M bytes
3. use tc netem to produce duplicate packets and delay:
   tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
4. continually curl the nginx http server to get index file on client
5. BUG_ON is seen quickly

[10258690.371129] kernel BUG at net/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 riscvarchive#2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094]  <IRQ>
[10258690.372094]  skb_to_sgvec+0x11/0x40
[10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094]  dev_hard_start_xmit+0x9b/0x200
[10258690.372094]  sch_direct_xmit+0xff/0x260
[10258690.372094]  __qdisc_run+0x15e/0x4e0
[10258690.372094]  net_tx_action+0x137/0x210
[10258690.372094]  __do_softirq+0xd6/0x2a9
[10258690.372094]  irq_exit+0xde/0xf0
[10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
[10258690.372094]  apic_timer_interrupt+0xf/0x20
[10258690.372094]  </IRQ>

In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
linear data size and nonlinear data size, thus BUG_ON triggered.
Because the skb is cloned and a part of nonlinear data is split off.

Duplicate packet is cloned in netem_enqueue() and may be delayed
some time in qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, the skb will be retransmit later in write queue.
the skb will be fragmented by tso_fragment(), the limit size
that depends on cwnd and mss decrease, the skb's nonlinear
data will be split off. The length of the skb cloned by netem
will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
the BUG_ON trigger.

To fix it, netem returns NET_XMIT_SUCCESS to upper stack
when it clones a duplicate packet.

Fixes: 35d889d ("sch_netem: fix skb leak in netem_enqueue()")
Signed-off-by: Sheng Lan <[email protected]>
Reported-by: Qin Ji <[email protected]>
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants