merge (#1) #446

addstone · 2017-08-10T03:39:59Z

i2c: designware: Print clock freq on invalid clock freq error

When we refuse to probe due to an invalid clock frequency, log
the frequency which is causing this error.

Signed-off-by: Hans de Goede [email protected]
Reviewed-by: Andy Shevchenko [email protected]
Signed-off-by: Wolfram Sang [email protected]

i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz

At least the Acer Iconia Tab8 / aka W1-810 uses 1MiHz instead of
1MHz for one of its busses, fix this up to 1MHz instead of failing
the probe of that bus.

This fixes the accelerometer on the Acer Iconia Tab8 not working.

Cc: [email protected]
Signed-off-by: Hans de Goede [email protected]
Reviewed-by: Andy Shevchenko [email protected]
Signed-off-by: Wolfram Sang [email protected]

parisc: pdc_stable: Fix locking when creating sysfs links

There's no need to take the write lock when creating sysfs links.

This patch fixes the following BUG:
BUG: sleeping function called from invalid context at mm/slab.h:416
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2-00110-g0b5477d9dabd #111
Backtrace:
[<0000000040217ac8>] show_stack+0x20/0x38
[<00000000406fbbb0>] dump_stack+0xb0/0x128
[<0000000040274090>] ___might_sleep+0x180/0x1b8
[<0000000040274144>] __might_sleep+0x7c/0xe8
[<0000000040373874>] kmem_cache_alloc+0x14c/0x1e0
[<0000000040419514>] __kernfs_new_node+0x84/0x1b8
[<000000004041b09c>] kernfs_new_node+0x3c/0x78
[<000000004041e040>] kernfs_create_link+0x40/0xd8
[<000000004041f320>] sysfs_do_create_link_sd.isra.0+0xb0/0x130
[<000000004041f3d4>] sysfs_create_link+0x34/0x58
[<000000004011b4a4>] pdc_stable_init+0x2c4/0x458
[<0000000040200250>] do_one_initcall+0x70/0x1d8
[<0000000040101644>] kernel_init_freeable+0x27c/0x390
[<000000004020be44>] kernel_init+0x24/0x1c0

Signed-off-by: James Bottomley [email protected]
Reported-by: Meelis Roos [email protected]
Signed-off-by: Helge Deller [email protected]

libata: fix a couple of doc build warnings

The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:

./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'

Update the comments and make the warnings go away.

Signed-off-by: Jonathan Corbet [email protected]
Signed-off-by: Tejun Heo [email protected]

i2c: allow i2c-versatile for ARM MPS platforms

Allow i2c-versatile to be enabled for ARM MPS platforms.

Signed-off-by: Russell King [email protected]
Signed-off-by: Wolfram Sang [email protected]

i2c: rephrase explanation of I2C_CLASS_DEPRECATED

Hopefully making clear that it is not needed for new drivers.

Signed-off-by: Wolfram Sang [email protected]

parisc: Define CONFIG_CPU_BIG_ENDIAN

While working on enabling queued rwlock on SPARC, found this following
code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN
to clear a byte.

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}

Problem is many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN and clears the wrong byte.

Define CPU_BIG_ENDIAN for parisc architecture to fix it.

Signed-off-by: Babu Moger [email protected]
Signed-off-by: Helge Deller [email protected]

clk: samsung: exynos5420: The EPLL rate table corrections

This patch fixes values of the EPLL K coefficient and changes
the EPLL output frequency values to match exactly what is
possible to achieve with given M, P, S, K coefficients.
This allows to avoid rounding errors and unexpected frequency
being set with clk_set_rate(), due to recalc_rate returning
different values than the PLL rate specified in the
exynos5420_epll_24mhz_tbl table. E.g. this prevents a case
where two consecutive clk_set_rate() calls with same argument
result in different PLL output frequency.

The PLL output frequencies have been calculated with formula:

f = fxtal * (M * 2^16 + K) / (P * 2^S) / 2^16

where fxtal = 24000000.

Fixes: 9842452 ("clk: samsung: exynos542x: Add EPLL rate table")
Signed-off-by: Sylwester Nawrocki [email protected]
Signed-off-by: Stephen Boyd [email protected]

sunhme: fix up GREG_STAT and GREG_IMASK register offsets

Update the values to match those from the STP2002QFP documentation.

Signed-off-by: Mark Cave-Ayland [email protected]
Signed-off-by: David S. Miller [email protected]

net: phy: Correctly process PHY_HALTED in phy_stop_machine()

Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez [email protected]
Fixes: a390d1f ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli [email protected]
Signed-off-by: Marc Gonzalez [email protected]
Signed-off-by: David S. Miller [email protected]

ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()

Michał reported a NULL pointer deref during fib_sync_down_dev() when
unregistering a netdevice. The problem is that we don't check for
'in_dev' being NULL, which can happen in very specific cases.

Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or
the inetaddr notification chains. However, if an interface isn't
configured with any IP address, then it's possible for host routes to be
flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in
inetdev_destroy().

To reproduce:
$ ip link add type dummy
$ ip route add local 1.1.1.0/24 dev dummy0
$ ip link del dev dummy0

Fix this by checking for the presence of 'in_dev' before referencing it.

Fixes: 982acb9 ("ipv4: fib: Notify about nexthop status changes")
Signed-off-by: Ido Schimmel [email protected]
Reported-by: Michał Mirosław [email protected]
Tested-by: Michał Mirosław [email protected]
Signed-off-by: David S. Miller [email protected]

MAINTAINERS: Add more files to the PHY LIBRARY section

Include missing files that are provided by, used, or directly maintained
within the PHY LIBRARY, this include uapi header, header files used by
Device Tree code etc.

Signed-off-by: Florian Fainelli [email protected]
Signed-off-by: David S. Miller [email protected]

mv643xx_eth: fix of_irq_to_resource() error check

of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Marvell MV643xx Ethernet
driver still only regards 0 as invalid IRQ -- fix it up.

Fixes: 7a4228b ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov [email protected]
Signed-off-by: David S. Miller [email protected]

virtio_net: fix truesize for mergeable buffers

Seth Forshee noticed a performance degradation with some workloads.
This turns out to be due to packet drops. Euan Kemp noticed that this
is because we drop all packets where length exceeds the truesize, but
for some packets we add in extra memory without updating the truesize.
This in turn was kept around unchanged from ab7db91 ("virtio-net:
auto-tune mergeable rx buffer size for improved performance"). That
commit had an internal reason not to account for the extra space: not
enough bits to do it. No longer true so let's account for the allocated
length exactly.

Many thanks to Seth Forshee for the report and bisecting and Euan Kemp
for debugging the issue.

Fixes: 680557c ("virtio_net: rework mergeable buffer handling")
Reported-by: Euan Kemp [email protected]
Tested-by: Euan Kemp [email protected]
Reported-by: Seth Forshee [email protected]
Tested-by: Seth Forshee [email protected]
Signed-off-by: Michael S. Tsirkin [email protected]
Signed-off-by: David S. Miller [email protected]

Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"

This reverts commit 28b4591 ("net: bcmgenet: Remove init parameter
from bcmgenet_mii_config") because in the process of moving from
dev_info() to dev_info_once() we essentially lost the helpful printed
messages once the second instance of the driver is loaded.
dev_info_once() does not actually print the message once per device
instance, but once period.

Fixes: 28b4591 ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config")
Signed-off-by: Florian Fainelli [email protected]
Reviewed-by: Doug Berger [email protected]
Signed-off-by: David S. Miller [email protected]

ppp: Fix a scheduling-while-atomic bug in del_chan

The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would
trigger this bug when __sk_free is invoked in atomic context, because of
the call path pptp_sock_destruct->del_chan->synchronize_rcu.

Now move the synchronize_rcu to pptp_release from del_chan. This is the
only one case which would free the sock and need the synchronize_rcu.

The following is the panic I met with kernel 3.3.8, but this issue should
exist in current kernel too according to the codes.

BUG: scheduling while atomic
__schedule_bug+0x5e/0x64
__schedule+0x55/0x580
? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic]
? dev_hard_start_xmit+0x423/0x530
? sch_direct_xmit+0x73/0x170
__cond_resched+0x16/0x30
_cond_resched+0x22/0x30
wait_for_common+0x18/0x110
? call_rcu_bh+0x10/0x10
wait_for_completion+0x12/0x20
wait_rcu_gp+0x34/0x40
? wait_rcu_gp+0x40/0x40
synchronize_sched+0x1e/0x20
0xf8417298
0xf8417484
? sock_queue_rcv_skb+0x109/0x130
__sk_free+0x16/0x110
? udp_queue_rcv_skb+0x1f2/0x290
sk_free+0x16/0x20
__udp4_lib_rcv+0x3b8/0x650

Signed-off-by: Gao Feng [email protected]
Signed-off-by: David S. Miller [email protected]

udp6: fix jumbogram reception

Since commit 67a5178 ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
% (1<<16)

This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.

Fixes: 67a5178 ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni [email protected]
Signed-off-by: David S. Miller [email protected]

samples/bpf: fix bpf tunnel cleanup

test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails. In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.

Signed-off-by: William Tu [email protected]
Signed-off-by: David S. Miller [email protected]

pinctrl: stm32: select IRQ_DOMAIN_HIERARCHY instead of depends on

Drivers that need IRQ_DOMAIN_HIERARCHY should "select" it, but
drivers/pinctrl/stm32/Kconfig is the only exception that uses
"depends on" syntax. This prevents GPIO drivers from select'ing
IRQ_DOMAIN_HIERARCHY.

For example, if I add "select IRQ_DOMAIN_HIERARCHY" to GPIO_XGENE_SB,
I get the following recursive dependency error.

drivers/gpio/Kconfig:13:error: recursive dependency detected!
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpio/Kconfig:13: symbol GPIOLIB is selected by PINCTRL_STM32
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/pinctrl/stm32/Kconfig:3: symbol PINCTRL_STM32 is selected by PINCTRL_STM32F429
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/pinctrl/stm32/Kconfig:11: symbol PINCTRL_STM32F429 depends on IRQ_DOMAIN_HIERARCHY
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
kernel/irq/Kconfig:67: symbol IRQ_DOMAIN_HIERARCHY is selected by GPIO_XGENE_SB
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpio/Kconfig:502: symbol GPIO_XGENE_SB depends on GPIOLIB

Signed-off-by: Masahiro Yamada [email protected]
Tested-by: Alexandre TORGUE [email protected]
Signed-off-by: Linus Walleij [email protected]

gpio: xgene-sb: select IRQ_DOMAIN_HIERARCHY

This driver calls irq_domain_hierarchy() and irq_chip_*_parent().
They are available only when CONFIG_IRQ_DOMAIN_HIERARCHY is enabled.

Signed-off-by: Masahiro Yamada [email protected]
Signed-off-by: Linus Walleij [email protected]

gpio: gpio-mxc: Fix: higher 16 GPIOs usable as wake source

In the function gpio_set_wake_irq(), port->irq_high is only checked for
zero. As platform_get_irq() returns a value less then zero if no interrupt
was found, any gpio >= 16 was handled like an irq_high interrupt was
available. On iMX27 for example no high interrupt is available. This lead
to the problem that only some gpios (the lower 16) were useable as wake
sources.

Signed-off-by: Philipp Rosenberger [email protected]
Signed-off-by: Linus Walleij [email protected]

x86/hpet: Cure interface abuse in the resume path

The HPET resume path abuses irq_domain_[de]activate_irq() to restore the
MSI message in the HPET chip for the boot CPU on resume and it relies on an
implementation detail of the interrupt core code, which magically makes the
HPET unmask call invoked via a irq_disable/enable pair. This worked as long
as the irq code did unconditionally invoke the unmask() callback. With the
recent changes which keep track of the masked state to avoid expensive
hardware access, this does not longer work. As a consequence the HPET timer
interrupts are not unmasked which breaks resume as the boot CPU waits
forever that a timer interrupt arrives.

Make the restore of the MSI message explicit and invoke the unmask()
function directly. While at it get rid of the pointless affinity setting as
nothing can change the affinity of the interrupt and the vector across
suspend/resume. The restore of the MSI message reestablishes the previous
affinity setting which is the correct one.

Fixes: bf22ff4 ("genirq: Avoid unnecessary low level irq function calls")
Reported-and-tested-by: Tomi Sarvela [email protected]
Reported-by: Martin Peres [email protected]
Signed-off-by: Thomas Gleixner [email protected]
Acked-by: "Rafael J. Wysocki" [email protected]
Cc: [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Marc Zyngier [email protected]
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707312158590.2287@nanos

arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0

In an ideal world, CNTFRQ_EL0 always contains the timer frequency
for the kernel to use. Sadly, we get quite a few broken systems
where the firmware authors cannot be bothered to program that
register on all CPUs, and rely on DT to provide that frequency.

So when trapping CNTFRQ_EL0, make sure to return the actual rate
(as known by the kernel), and not CNTFRQ_EL0.

Acked-by: Mark Rutland [email protected]
Signed-off-by: Marc Zyngier [email protected]
Signed-off-by: Will Deacon [email protected]

ASoC: rt5665: fix wrong register for bclk ratio control

The register of setting back ratio should be RT5665_ADDA_CLK_2
instead of RT5665_ADDA_CLK_1.

Signed-off-by: Bard Liao [email protected]
Signed-off-by: Mark Brown [email protected]

gpio: exar: Use correct property prefix and document bindings

The device-specific property should be prefixed with the vendor name,
not "linux,", as Linus Walleij pointed out. Change this and document the
bindings of this platform device.

We didn't ship the old binding in a release yet. So we can still change
it without breaking an official API.

Fixes: 380b1e2 ("gpio-exar/8250-exar: Make set of exported GPIOs configurable")
Signed-off-by: Jan Kiszka [email protected]
Acked-by: Rob Herring [email protected]
Signed-off-by: Linus Walleij [email protected]

gpiolib: skip unwanted events, don't convert them to opposite edge

The previous fix for filtering out of unwatched events was not entirely
correct. Instead of skipping the events we don't want, they are now
interpreted as events with opposing edge.

In order to fix it: always read the GPIO line value on interrupt and
only emit the event if it corresponds with the event type we requested.

Cc: [email protected]
Fixes: ad537b8 ("gpiolib: fix filtering out unwanted events")
Signed-off-by: Bartosz Golaszewski [email protected]
Signed-off-by: Linus Walleij [email protected]

clk: meson: mpll: fix mpll0 fractional part ignored

mpll0 clock is special compared to the other mplls. It needs another
bit (ssen) to be set to activate the fractional part the mpll divider

Fixes: 007e6e5 ("clk: meson: mpll: add rw operation")
Signed-off-by: Jerome Brunet [email protected]
Signed-off-by: Neil Armstrong [email protected]

timers: Fix overflow in get_next_timer_interrupt

For e.g. HZ=100, timer being 430 jiffies in the future, and 32 bit
unsigned int, there is an overflow on unsigned int right-hand side
of the expression which results with wrong values being returned.

Type cast the multiplier to 64bit to avoid that issue.

Fixes: 46c8f0b ("timers: Fix get_next_timer_interrupt() computation")
Signed-off-by: Matija Glavinic Pecotic [email protected]
Signed-off-by: Thomas Gleixner [email protected]
Reviewed-by: Alexander Sverdlin [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]

device property: Fix usecount for of_graph_get_port_parent()

Fix inconsistent use of of_graph_get_port_parent() where
asoc_simple_card_parse_graph_dai() does of_node_get() before
calling it while other callers do not. We can fix this by
not trashing the node passed to of_graph_get_port_parent().

Let's also make sure the callers have correct refcounts and remove
related incorrect of_node_put() calls for of_for_each_phandle
as that's done by of_phandle_iterator_next() except when
we break out of the loop early.

Let's fix both issues with a single patch to avoid kobject
refcounts getting messed up more if two patches are merged
separately.

Otherwise strange issues can happen caused by memory corruption
caused by too many kobject_del() calls such as:

BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:747
...
(___might_sleep)
(__mutex_lock)
(mutex_lock_nested)
(kernfs_remove)
(kobject_del)
(kobject_put)
(of_get_next_parent)
(of_graph_get_port_parent)
(asoc_simple_card_parse_graph_dai [snd_soc_simple_card_utils])
(asoc_graph_card_probe [snd_soc_audio_graph_card])

Fixes: 0ef472a ("of_graph: add of_graph_get_port_parent()")
Fixes: 2692c1c ("ASoC: add audio-graph-card support")
Fixes: 1689333 ("ASoC: simple-card-utils: add asoc_simple_card_parse_graph_dai()")
Signed-off-by: Tony Lindgren [email protected]
Reviewed-by: Rob Herring [email protected]
Tested-by: Antonio Borneo [email protected]
Tested-by: Kuninori Morimoto [email protected]
Signed-off-by: Mark Brown [email protected]

libceph: make encode_request_*() work with r_mempool requests

Messages allocated out of ceph_msgpool have a fixed front length
(pool->front_len). Asserting that the entire front has been filled
while encoding is thus wrong.

Fixes: 8cb441c ("libceph: MOSDOp v8 encoding (actual spgid + full hash)")
Reported-by: "Yan, Zheng" [email protected]
Signed-off-by: Ilya Dryomov [email protected]
Reviewed-by: "Yan, Zheng" [email protected]

libceph: don't call ->reencode_message() more than once per message

Reencoding an already reencoded message is a bad idea. This could
happen on Policy::stateful_server connections (!CEPH_MSG_CONNECT_LOSSY),
such as MDS sessions.

This didn't pop up in testing because currently only OSD requests are
reencoded and OSD sessions are always lossy.

Fixes: 98ad5eb ("libceph: ceph_connection_operations::reencode_message() method")
Signed-off-by: Ilya Dryomov [email protected]
Reviewed-by: "Yan, Zheng" [email protected]

crush: assume weight_set != null imples weight_set_size > 0

Reflects ceph.git commit 5e8fa3e.

Signed-off-by: Ilya Dryomov [email protected]
Reviewed-by: Sage Weil [email protected]

libceph: fallback for when there isn't a pool-specific choose_arg

There is now a fallback to a choose_arg index of -1 if there isn't
a pool-specific choose_arg set. If you create a per-pool weight-set,
that works for that pool. Otherwise we try the compat/default one. If
that doesn't exist either, then we use the normal CRUSH weights.

Signed-off-by: Ilya Dryomov [email protected]
Reviewed-by: Sage Weil [email protected]

libceph: upmap semantic changes

apply both pg_upmap and pg_upmap_items
allow bidirectional swap of pg-upmap-items

Signed-off-by: Ilya Dryomov [email protected]
Reviewed-by: Sage Weil [email protected]

libceph: make RECOVERY_DELETES feature create a new interval

This is needed so that the OSDs can regenerate the missing set at the
start of a new interval where support for recovery deletes changed.

Signed-off-by: Ilya Dryomov [email protected]
Reviewed-by: Sage Weil [email protected]

xtensa: don't limit csum_partial export by CONFIG_NET

csum_partial and csum_partial_copy_generic are defined unconditionally
and are available even when CONFIG_NET is disabled. They are used not
only by the network drivers, but also by scsi and media.
Don't limit these functions export by CONFIG_NET.

Cc: [email protected]
Signed-off-by: Max Filippov [email protected]

xtensa: mm/cache: add missing EXPORT_SYMBOLs

Functions clear_user_highpage, copy_user_highpage, flush_dcache_page,
local_flush_cache_range and local_flush_cache_page may be used from
modules. Export them.

Cc: [email protected]
Signed-off-by: Max Filippov [email protected]

KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12

Do this in the caller of nested_vmx_vmexit instead.

nested_vmx_check_exception was doing a vmwrite to the vmcs02's
VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move
the field to vmcs12->vm_exit_intr_error_code. However that isn't
possible on pre-Haswell machines. Moving the vmcs12 write to the
callers fixes it.

Reported-by: Jim Mattson [email protected]
Signed-off-by: Paolo Bonzini [email protected]
[Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1,
thanks to [email protected]]
Signed-off-by: Radim Krčmář [email protected]

KVM: nVMX: fixes to nested virt interrupt injection

There are three issues in nested_vmx_check_exception:

it is not taking PFEC_MATCH/PFEC_MASK into account, as reported
by Wanpeng Li;
it should rebuild the interruption info and exit qualification fields
from scratch, as reported by Jim Mattson, because the values from the
L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes
a page fault, the EPT misconfig's exit qualification is incorrect).
CR2 and DR6 should not be written for exception intercept vmexits
(CR2 only for AMD).

This patch fixes the first two and adds a comment about the last,
outlining the fix.

Cc: Jim Mattson [email protected]
Cc: Wanpeng Li [email protected]
Signed-off-by: Paolo Bonzini [email protected]

KVM: async_pf: make rcu irq exit if not triggered from idle task

WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0
CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1
RIP: 0010:rcu_note_context_switch+0x207/0x6b0
Call Trace:
__schedule+0xda/0xba0
? kvm_async_pf_task_wait+0x1b2/0x270
schedule+0x40/0x90
kvm_async_pf_task_wait+0x1cc/0x270
? prepare_to_swait+0x22/0x70
do_async_page_fault+0x77/0xb0
? do_async_page_fault+0x77/0xb0
async_page_fault+0x28/0x30
RIP: 0010:__d_lookup_rcu+0x90/0x1e0

I encounter this when trying to stress the async page fault in L1 guest w/
L2 guests running.

Commit 9b132fb (Add rcu user eqs exception hooks for async page
fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu
idle eqs when needed, to protect the code that needs use rcu. However,
we need to call the pair even if the function calls schedule(), as seen
from the above backtrace.

This patch fixes it by informing the RCU subsystem exit/enter the irq
towards/away from idle for both n.halted and !n.halted.

Cc: Paolo Bonzini [email protected]
Cc: Radim Krčmář [email protected]
Cc: Paul E. McKenney [email protected]
Cc: [email protected]
Signed-off-by: Wanpeng Li [email protected]
Reviewed-by: Paolo Bonzini [email protected]
Signed-off-by: Radim Krčmář [email protected]

NFSv4: Fix EXCHANGE_ID corrupt verifier issue

The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit 8d89bd7. If we interrrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.

Fixes: 8d89bd7 ("NFS setup async exchange_id")
Cc: [email protected] # v4.9+
Signed-off-by: Trond Myklebust [email protected]
Signed-off-by: Anna Schumaker [email protected]

ptp: introduce ptp auxiliary worker

Many PTP drivers required to perform some asynchronous or periodic work,
like periodically handling PHC counter overflow or handle delayed timestamp
for RX/TX network packets. In most of the cases, such work is implemented
using workqueues. Unfortunately, Kernel workqueues might introduce
significant delay in work scheduling under high system load and on -RT,
which could cause misbehavior of PTP drivers due to internal counter
overflow, for example, and there is no way to tune its execution policy and
priority manuallly.

Hence, The kthread_worker can be used insted of workqueues, as it create
separte named kthread for each worker and its its execution policy and
priority can be configured using chrt tool.

This prblem was reported for two drivers TI CPSW CPTS and dp83640, so
instead of modifying each of these driver it was proposed to add PTP
auxiliary worker to the PHC subsystem.

The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker
and kthread_delayed_work and introduces two new PHC subsystem APIs:

long (*do_aux_work)(struct ptp_clock_info *ptp) callback in
ptp_clock_info structure, which driver should assign if it require to
perform asynchronous or periodic work. Driver should return the delay of
the PTP next auxiliary work scheduling time (>=0) or negative value in case
further scheduling is not required.
int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which
allows schedule PTP auxiliary work.

The name of kthread_worker thread corresponds PTP PHC device name "ptp%d".

Signed-off-by: Grygorii Strashko [email protected]
Signed-off-by: David S. Miller [email protected]

net: ethernet: ti: cpts: convert to use ptp auxiliary worker

There could be significant delay in CPTS work schedule under high system
load and on -RT which could cause CPTS misbehavior due to internal counter
overflow. Usage of own kthread_worker allows to avoid such kind of issues
and makes it possible to tune priority of CPTS kthread_worker thread on -RT
(thread name "cpts").

Hence, the CPTS driver is converted to use PTP auxiliary worker as PHC
subsystem implements such functionality in a generic way now.

Signed-off-by: Grygorii Strashko [email protected]
Signed-off-by: David S. Miller [email protected]

net: ethernet: ti: cpts: fix tx timestamping timeout

With the low speed Ethernet connection CPDMA notification about packet
processing can be received before CPTS TX timestamp event, which is set
when packet actually left CPSW while cpdma notification is sent when packet
pushed in CPSW fifo. As result, when connection is slow and CPU is fast
enough TX timestamping is not working properly.

Fix it, by introducing TX SKB queue to store PTP SKBs for which Ethernet
Transmit Event hasn't been received yet and then re-check this queue
with new Ethernet Transmit Events by scheduling CPTS overflow
work more often (every 1 jiffies) until TX SKB queue is not empty.

Side effect of this change is:

User space tools require to take into account possible delay in TX
timestamp processing (for example ptp4l works with tx_timestamp_timeout=400
under net traffic and tx_timestamp_timeout=25 in idle).

Signed-off-by: Grygorii Strashko [email protected]
Signed-off-by: David S. Miller [email protected]

net: ethernet: ti: cpts: fix fifo read in cpts_find_ts

Now the call chain
cpts_find_ts()
|- cpts_fifo_read(cpts, CPTS_EV_PUSH)

will stop reading CPTS FIFO if PUSH event is found. But this is not
expected and CPTS FIFI should be completely drained here. This is most
probably copy-paste error and it has no negative impact as CPTS_EV_PUSH
should not be present in FIFO without TS_PUSH request and
cpts_systim_read() and cpts_find_ts() synchronized by spin_lock.

Correct above by calling cpts_fifo_read() with -1 parameter, so it will
read all CPTS event from FIFO.

Signed-off-by: Grygorii Strashko [email protected]
Signed-off-by: David S. Miller [email protected]

Cipso: cipso_v4_optptr enter infinite loop

in for(),if((optlen > 0) && (optptr[1] == 0)), enter infinite loop.

Test: receive a packet which the ip length > 20 and the first byte of ip option is 0, produce this issue

Signed-off-by: yujuan.qi [email protected]
Acked-by: Paul Moore [email protected]
Signed-off-by: David S. Miller [email protected]

platform/x86: dell-wmi: Fix driver interface version query

When I converted dell-wmi to the new bus infrastructure, I left the
call to dell_wmi_check_descriptor_buffer() in dell_wmi_init(). This
could cause two problems:

An error message when loading the driver on a system without
dell-wmi. We'd try to read the event descriptor even if the WMI
GUID wasn't there.
A possible race if dell-wmi was loaded manually before wmi was
fully initialized.

Fix it by moving the call to the probe function where it belongs.

Fixes: bff589b ("platform/x86: dell-wmi: Convert to the WMI bus infrastructure")
Signed-off-by: Andy Lutomirski [email protected]
Reviewed-by: Pali Rohár [email protected]
Signed-off-by: Darren Hart (VMware) [email protected]

vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP

In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.

However, since b7fe10e("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.

This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.

Fixes: b7fe10e ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den [email protected]
Signed-off-by: David S. Miller [email protected]

gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP

In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.

However, since b7fe10e ("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.

This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.

Fixes: b7fe10e ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den [email protected]
Signed-off-by: David S. Miller [email protected]

b44: Initialize 64-bit stats seqcount

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: eeda858 ("b44: add 64 bit stats")
Signed-off-by: Florian Fainelli [email protected]
Acked-by: Michael Chan [email protected]
Signed-off-by: David S. Miller [email protected]

i40e: Initialize 64-bit statistics TX ring seqcount

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 980e9b1 ("i40e: Add support for 64 bit netstats")
Signed-off-by: Florian Fainelli [email protected]
Signed-off-by: David S. Miller [email protected]

ixgbe: Initialize 64-bit stats seqcounts

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 4197aa7 ("ixgbevf: provide 64 bit statistics")
Signed-off-by: Florian Fainelli [email protected]
Signed-off-by: David S. Miller [email protected]

nfp: Initialize RX and TX ring 64-bit stats seqcounts

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 4c35236 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Florian Fainelli [email protected]
Reviewed-by: Simon Horman [email protected]
Signed-off-by: David S. Miller [email protected]

gtp: Initialize 64-bit per-cpu stats correctly

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that by using netdev_alloc_pcpu_stats() instead of an open coded
allocation.

Fixes: 459aa66 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Florian Fainelli [email protected]
Signed-off-by: David S. Miller [email protected]

netvsc: Initialize 64-bit stats seqcount

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that. In commit 6c80f3f ("netvsc: report per-channel stats in
ethtool statistics") netdev_alloc_pcpu_stats() was removed in favor of
open-coding the 64-bits statistics, except that u64_stats_init() was
missed.

Fixes: 6c80f3f ("netvsc: report per-channel stats in ethtool statistics")
Signed-off-by: Florian Fainelli [email protected]
Signed-off-by: Stephen Hemminger [email protected]
Signed-off-by: David S. Miller [email protected]

ipvlan: Fix 64-bit statistics seqcount initialization

On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that by using the proper helper function: netdev_alloc_pcpu_stats().

Fixes: 2ad7bf3 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Florian Fainelli [email protected]
Signed-off-by: David S. Miller [email protected]

nand: fix wrong default oob layout for small pages using soft ecc

When using soft ecc, if no ooblayout is given, the core automatically
uses one of the nand_ooblayout_{sp,lp}*() functions to determine the
layout inside the out of band data.

Until kernel version 4.6, struct nand_ecclayout was used for that
purpose. During the migration from 4.6 to 4.7, an error shown up in the
small page layout, in the case oob section is only 8 bytes long.

The layout was using three bytes (0, 1, 2) for ecc, two bytes (3, 4)
as free bytes, one byte (5) for bad block marker and finally
two bytes (6, 7) as free bytes, as shown there:

[linux-4.6] drivers/mtd/nand/nand_base.c:52
static struct nand_ecclayout nand_oob_8 = {
.eccbytes = 3,
.eccpos = {0, 1, 2},
.oobfree = {
{.offset = 3,
.length = 2},
{.offset = 6,
.length = 2} }
};

This fixes the current implementation which is incoherent. It
references bit 3 at the same time as an ecc byte and a free byte.

Furthermore, it is clear with the previous implementation that there
is only one ecc section with 8 bytes oob sections. We shall return
-ERANGE in the nand_ooblayout_ecc_sp() function when asked for the
second section.

Signed-off-by: Miquel Raynal [email protected]
Fixes: 41b207a ("mtd: nand: implement the default mtd_ooblayout_ops")
Cc: [email protected]
Signed-off-by: Boris Brezillon [email protected]

mtd: nand: sunxi: fix potential divide-by-zero error

clk_round_rate() can return <= 0. Currently the value returned by
clk_round_rate() is used directly for a division. This patch introduces a
guard to ensure a divide-by-zero or a divide by a negative number for that
matter can't happen by bugging out returning -EINVAL if clk_round_rate()
returns <= 0.

Fixes: 2d43457 ("mtd: nand: sunxi: fix EDO mode selection")
Signed-off-by: Bryan O'Donoghue [email protected]
Signed-off-by: Boris Brezillon [email protected]

mtd: nand: Fix a docs build warning

Commit 0b4773f (mtd: nand: Drop unused cached programming support)
removed the "cached" parameter from nand_write_page(), but did not update
the kerneldoc comments, creating this docs build warning:

./drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page'

Remove the offending line so we can have a little peace and quiet.

Signed-off-by: Jonathan Corbet [email protected]
Signed-off-by: Boris Brezillon [email protected]

mtd: nand: Fix timing setup for NANDs that do not support SET FEATURES

Some ONFI NANDs do not support the SET/GET FEATURES commands, which,
according to the spec, is perfectly valid.

On these NANDs we can't set a specific timing mode using the "timing
mode" feature, and we should assume the NAND does not require any setup
to enter a specific timing mode.

Signed-off-by: Boris Brezillon [email protected]
Fixes: d8e725d ("mtd: nand: automate NAND timings selection")
Reported-by: Alexander Dahl [email protected]
Cc: [email protected]
Tested-by: Alexander Dahl [email protected]
Signed-off-by: Boris Brezillon [email protected]

mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflow

All timings in nand_sdr_timings are expressed in picoseconds but some
of them may not fit in an u32.

Signed-off-by: Boris Brezillon [email protected]
Fixes: 204e7ec ("mtd: nand: Add a few more timings to nand_sdr_timings")
Reported-by: Alexander Dahl [email protected]
Cc: [email protected]
Reviewed-by: Alexander Dahl [email protected]
Tested-by: Alexander Dahl [email protected]
Signed-off-by: Boris Brezillon [email protected]

mtd: nand: atmel: Fix EDO mode check

EDO mode should be used when tRC is less than 30ns, but timings are
expressed in picoseconds in the nand_sdr_timings struct.

Signed-off-by: Boris Brezillon [email protected]
Fixes: f9ce2ed ("mtd: nand: atmel: Add ->setup_data_interface() hooks")
Reported-by: Alexander Dahl [email protected]
Tested-by: Alexander Dahl [email protected]
Signed-off-by: Boris Brezillon [email protected]

gpio: tegra: fix unbalanced chained_irq_enter/exit

When more than one GPIO IRQs are triggered simultaneously,
tegra_gpio_irq_handler() called chained_irq_exit() multiple
times for one chained_irq_enter().

Fixes: 3c92db9
Signed-off-by: Michał Mirosław [email protected]
[Also changed the variable to a bool]
Signed-off-by: Linus Walleij [email protected]

powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check

of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Freescale MPC832x RDB board
code still only regards 0 as a failure indication -- fix it up.

Fixes: 7a4228b ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov [email protected]
Acked-by: Scott Wood [email protected]
Signed-off-by: Michael Ellerman [email protected]

ALSA: hda - Fix speaker output from VAIO VPCL14M1R

Sony VAIO VPCL14M1R needs the quirk to make the speaker working properly.

Tested-by: Dmitriy [email protected]
Cc: [email protected]
Signed-off-by: Sergei A. Trusov [email protected]
Signed-off-by: Takashi Iwai [email protected]

NFSv4: Fix double frees in nfs4_test_session_trunk()

rpc_clnt_add_xprt() expects the callback function to be synchronous, and
expects to release the transport and switch references itself.

Fixes: 04fa2c6 ("NFS pnfs data server multipath session trunking")
Signed-off-by: Trond Myklebust [email protected]
Signed-off-by: Anna Schumaker [email protected]

ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge

The number of pins in South Bridge is 30 and not 29. There is a fix for
the driver for the pinctrl, but a fix is also need at device tree level
for the GPIO.

Fixes: afda007 ("ARM64: dts: marvell: Add pinctrl nodes for Armada
3700")
Cc: [email protected]
Signed-off-by: Gregory CLEMENT [email protected]

blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed

When blk_mq_get_request() failed, preempt counter isn't
released, and blk_mq_make_request() doesn't release the counter
too.

This patch fixes the issue, and makes sure that preempt counter
is only held if rq is allocated successfully. The same policy is
applied on .q_usage_counter too.

Signed-off-by: Ming Lei [email protected]
Signed-off-by: Jens Axboe [email protected]

lan78xx: USB fast connect/disconnect crash fix

USB fast connect/disconnect crash fix

When USB plugged/unplugged at fast rate,
lan78xx_mdio_init() in lan78xx_bind() failing case is not handled.
Whenever lan78xx_mdio_init() failed, dev->mdiobus will be freed, however
since lan78xx_bind() not consider as error and try to proceed for
further initialization in lan78xx_probe() which leads system hung/crash.
Also when register_netdev() failed, netdev is freed without calling lan78xx_unbind().
Hence halting the failed cases right manner fixes the system crash/hung issue.

Signed-off-by: Nisar Sayed [email protected]
Signed-off-by: David S. Miller [email protected]

lan78xx: Fix to handle hard_header_len update

Fix to handle hard_header_len update

When ifconfig up/down sequence is initiated hard_header_len
get updated incrementally for each ifconfig up /down sequence,
this leads invalid hard_header_len, moving to lan78xx_bind
to have one time update of hard_header_len addresses the issue.

Signed-off-by: Nisar Sayed [email protected]
Signed-off-by: David S. Miller [email protected]

net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support

Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of Wol support, so that the indication
remains "g" all the time if the NIC supports WoL.

Tested:
As accepted, when NIC supports WoL- ethtool reports:
Supports Wake-on: g
Wake-on: d
when NIC doesn't support WoL- ethtool reports:
Supports Wake-on: d
Wake-on: d

Fixes: 14c07b1 ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy [email protected]
Signed-off-by: Tariq Toukan [email protected]
Signed-off-by: David S. Miller [email protected]

net/mlx4_core: Fix sl_to_vl_change bit offset in flags2 dump

The index value in function dump_dev_cap_flags2() for outputting
"sl to vl mapping table change event support" needs to be
consistent with the value of the enumerated constant
MLX4_DEV_CAP_FLAG2_SL_TO_VL_CHANGE_EVENT defined in file
include/linux/mlx4_device.h

The change here restores that consistency.

Fixes: fd10ed8 ("IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets")
Reported-by: Mukesh Kacker [email protected]
Signed-off-by: Jack Morgenstein [email protected]
Signed-off-by: Tariq Toukan [email protected]
Signed-off-by: David S. Miller [email protected]

net/mlx4_core: Fix namespace misalignment in QinQ VST support commit

The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP

However the value of MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP needs to stay
consistent with the value used in another namespace in
function dump_dev_cap_flags2(), which is manually kept in sync.
The change here restores that consistency.

Fixes: 7c3d21c ("net/mlx4_core: Preparation for VF vlan protocol 802.1ad")
Reported-by: Mukesh Kacker [email protected]
Signed-off-by: Jack Morgenstein [email protected]
Signed-off-by: Tariq Toukan [email protected]
Signed-off-by: David S. Miller [email protected]

net/mlx4_core: Fixes missing capability bit in flags2 capability dump

The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

QUERY_DEV_CAP_DIAG_RPRT_PER_PORT

However, it failed to introduce a corresponding entry in function
dump_dev_cap_flags2() for outputting a line in the message log
when this capability bit is set.

The change here fixes that omission.

Fixes: c7c122e ("net/mlx4: Add diagnostic counters capability bit")
Reported-by: Mukesh Kacker [email protected]
Signed-off-by: Jack Morgenstein [email protected]
Signed-off-by: Tariq Toukan [email protected]
Signed-off-by: David S. Miller [email protected]

usb: qmi_wwan: add D-Link DWM-222 device ID

Signed-off-by: Hector Martin [email protected]
Signed-off-by: David S. Miller [email protected]

ibmvnic: Initialize SCRQ's during login renegotiation

SCRQ resources are freed during renegotiation, but they are not
re-allocated afterwards due to some changes in the initialization
process. Fix that by re-allocating the memory after renegotation.

SCRQ's can also be freed if a server capabilities request fails.
If this were encountered during a device reset for example,
SCRQ's may not be re-allocated. This operation is not necessary
anymore so remove it.

Signed-off-by: Thomas Falcon [email protected]
Signed-off-by: David S. Miller [email protected]

tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states

If the sender switches the congestion control during ECN-triggered
cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
the ssthresh value calculated by the previous congestion control. If
the previous congestion control is BBR that always keep ssthresh
to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
step is to avoid assigning invalid ssthresh value when recovery ends.

Signed-off-by: Yuchung Cheng [email protected]
Signed-off-by: Neal Cardwell [email protected]
Acked-by: Eric Dumazet [email protected]
Signed-off-by: David S. Miller [email protected]

drm/amdgpu: Fix undue fallthroughs in golden registers initialization

As I was staring at the si_init_golden_registers code, I noticed that
the Pitcairn initialization silently falls through the Cape Verde
initialization, and the Oland initialization falls through the Hainan
initialization. However there is no comment stating that this is
intentional, and the radeon driver doesn't have any such fallthrough,
so I suspect this is not supposed to happen.

Signed-off-by: Jean Delvare [email protected]
Fixes: 62a3755 ("drm/amdgpu: add si implementation v10")
Cc: Ken Wang [email protected]
Cc: Alex Deucher [email protected]
Cc: "Marek Olšák" [email protected]
Cc: "Christian König" [email protected]
Cc: Flora Cui [email protected]
Reviewed-by: Marek Olšák [email protected]
Signed-off-by: Alex Deucher [email protected]
Cc: [email protected]

drm/amdgpu: Use list_del_init in amdgpu_mn_unregister

Otherwise bo->shadow_list (which is aliased by bo->mn_list) will not
appear empty in amdgpu_ttm_bo_destroy and cause an oops when freeing
former userptr BOs.

Signed-off-by: Felix Kuehling [email protected]
Reviewed-by: Christian König [email protected]
Signed-off-by: Alex Deucher [email protected]

KVM: X86: Fix loss of pending INIT due to race

When SMP VM start, AP may lost INIT because of receiving INIT between
kvm_vcpu_ioctl_x86_get/set_vcpu_events.

   vcpu 0                             vcpu 1
                               kvm_vcpu_ioctl_x86_get_vcpu_events
                                 events->smi.latched_init = 0

send INIT to vcpu1
set vcpu1's pending_events
kvm_vcpu_ioctl_x86_set_vcpu_events
if (events->smi.latched_init == 0)
clear INIT in pending_events

This patch fixes it by just update SMM related flags if we are in SMM.

Thanks Peng Hao for the report and original commit message.

Reported-by: Peng Hao [email protected]
Cc: Paolo Bonzini [email protected]
Cc: Radim Krčmář [email protected]
Signed-off-by: Wanpeng Li [email protected]
Reviewed-by: Paolo Bonzini [email protected]
Signed-off-by: Radim Krčmář [email protected]

KVM: X86: init irq->level in kvm_pv_kick_cpu_op

'lapic_irq' is a local variable and its 'level' field isn't
initialized, so 'level' is random, it doesn't matter but
makes UBSAN unhappy:

UBSAN: Undefined behaviour in .../lapic.c:...
load of value 10 is not a valid value for type '_Bool'
...
Call Trace:
[] dump_stack+0x1e/0x20
[] ubsan_epilogue+0x12/0x55
[] __ubsan_handle_load_invalid_value+0x118/0x162
[] kvm_apic_set_irq+0xc3/0xf0 [kvm]
[] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm]
[] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm]
[] kvm_emulate_hypercall+0x62e/0x760 [kvm]
[] handle_vmcall+0x1a/0x30 [kvm_intel]
[] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel]
...

Signed-off-by: Longpeng(Mike) [email protected]
Signed-off-by: Radim Krčmář [email protected]

KVM: avoid using rcu_dereference_protected

During teardown, accesses to memslots and buses are using
rcu_dereference_protected with an always-true condition because
these accesses are done outside the usual mutexes. This
is because the last reference is gone and there cannot be any
concurrent modifications, but rcu_dereference_protected is
ugly and unobvious.

Instead, check the refcount in kvm_get_bus and __kvm_memslots.

Signed-off-by: Paolo Bonzini [email protected]
Signed-off-by: Radim Krčmář [email protected]

KVM: nVMX: do not pin the VMCS12

Since the current implementation of VMCS12 does a memcpy in and out
of guest memory, we do not need current_vmcs12 and current_vmcs12_page
anymore. current_vmptr is enough to read and write the VMCS12.

And David Matlack noted:

This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
VMCS12 page by using kvm_write_guest. nested_release_page() only marks
the struct page dirty.

Signed-off-by: Paolo Bonzini [email protected]
Reviewed-by: David Hildenbrand [email protected]
[Added David Matlack's note and nested_release_page_clean() fix.]
Signed-off-by: Radim Krčmář [email protected]

kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown

According to the Intel SDM, software cannot rely on the current VMCS to be
coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12
flushes.

24.11.1 Software Use of Virtual-Machine Control Structures
...
If a logical processor leaves VMX operation, any VMCSs active on
that logical processor may be corrupted (see below). To prevent
such corruption of a VMCS that may be used either after a return
to VMX operation or on another logical processor, software should
execute VMCLEAR for that VMCS before executing the VMXOFF instruction
or removing power from the processor (e.g., as part of a transition
to the S3 and S4 power states).
...

This fixes a "suspicious rcu_dereference_check() usage!" warning during
kvm_vm_release() because nested_release_vmcs12() calls
kvm_vcpu_write_guest_page() without holding kvm->srcu.

Signed-off-by: David Matlack [email protected]
Reviewed-by: Paolo Bonzini [email protected]
Signed-off-by: Radim Krčmář [email protected]

KVM: nVMX: mark vmcs12 pages dirty on L2 exit

The host physical addresses of L1's Virtual APIC Page and Posted
Interrupt descriptor are loaded into the VMCS02. The CPU may write
to these pages via their host physical address while L2 is running,
bypassing address-translation-based dirty tracking (e.g. EPT write
protection). Mark them dirty on every exit from L2 to prevent them
from getting out of sync with dirty tracking.

Also mark the virtual APIC page and the posted interrupt descriptor
dirty when KVM is virtualizing posted interrupt processing.

Signed-off-by: David Matlack [email protected]
Reviewed-by: Paolo Bonzini [email protected]
Signed-off-by: Radim Krčmář [email protected]

mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors

Commit 9a291a7 ("mm/hugetlb: report -EHWPOISON not -EFAULT when
FOLL_HWPOISON is specified") causes __get_user_pages to ignore certain
errors from follow_hugetlb_page. After such error, __get_user_pages
subsequently calls faultin_page on the same VMA and start address that
follow_hugetlb_page failed on instead of returning the error immediately
as it should.

In follow_hugetlb_page, when hugetlb_fault returns a value covered under
VM_FAULT_ERROR, follow_hugetlb_page returns it without setting nr_pages
to 0 as __get_user_pages expects in this case, which causes the
following to happen in __get_user_pages: the "while (nr_pages)" check
succeeds, we skip the "if (!vma..." check because we got a VMA the last
time around, we find no page with follow_page_mask, and we call
faultin_page, which calls hugetlb_fault for the second time.

This issue also slightly changes how __get_user_pages works. Before, it
only returned error if it had made no progress (i = 0). But now,
follow_hugetlb_page can clobber "i" with an error code since its new
return path doesn't check for progress. So if "i" is nonzero before a
failing call to follow_hugetlb_page, that indication of progress is lost
and __get_user_pages can return error even if some pages were
successfully pinned.

To fix this, change follow_hugetlb_page so that it updates nr_pages,
allowing __get_user_pages to fail immediately and restoring the "error
only if no progress" behavior to __get_user_pages.

Tested that __get_user_pages returns when expected on error from
hugetlb_fault in follow_hugetlb_page.

Fixes: 9a291a7 ("mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Jordan [email protected]
Acked-by: Punit Agrawal [email protected]
Cc: Andrea Arcangeli [email protected]
Cc: "Aneesh Kumar K.V" [email protected]
Cc: Gerald Schaefer [email protected]
Cc: James Morse [email protected]
Cc: "Kirill A. Shutemov" [email protected]
Cc: Michal Hocko [email protected]
Cc: Mike Kravetz [email protected]
Cc: Naoya Horiguchi [email protected]
Cc: zhong jiang [email protected]
Cc: [email protected] [4.12.x]
Signed-off-by: Andrew Morton [email protected]
Signed-off-by: Linus Torvalds [email protected]

pid: kill pidhash_size in pidhash_init()

After commit 3d375d7 ("mm: update callers to use HASH_ZERO flag"),
drop unused pidhash_size in pidhash_init().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang [email protected]
Reviewed-by: Pavel Tatashin [email protected]
Signed-off-by: Andrew Morton [email protected]
Signed-off-by: Linus Torvalds [email protected]

mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries

Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

    CPU0                            CPU1
    ----                            ----
                                    user accesses memory using RW PTE
                                    [PTE now cached in TLB]
    try_to_unmap_one()
    ==> ptep_get_and_clear()
    ==> set_tlb_ubc_flush_pending()
                                    mprotect(addr, PROT_READ)
                                    ==> change_pte_range()
                                    ==> [ PTE non-present - no flush ]

                                    user writes using cached RW PTE
    ...

    try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access. For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB. However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption. In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch. Other users such as gup as a race with
reclaim looks just at PTEs. huge page variants should be ok as they
don't race with reclaim. mincore only looks at PTEs. userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected. His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[[email protected]: tweak comments]
[[email protected]: fix spello]
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Nadav Amit [email protected]
Signed-off-by: Mel Gorman [email protected]
Cc: Andy Lutomirski [email protected]
Cc: [email protected] [v4.4+]
Signed-off-by: Andrew Morton [email protected]
Signed-off-by: Linus Torvalds [email protected]

userfaultfd: non-cooperative: notify about unmap of destination during mremap

When mremap is called with MREMAP_FIXED it unmaps memory at the
destination address without notifying userfaultfd monitor.

If the destination were registered with userfaultfd, the monitor has no
way to distinguish between the old and new ranges and to properly relate
the page faults that would occur in the destination region.

Fixes: 897ab3e ("userfaultfd: non-cooperative: add event for memory unmaps")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport [email protected]
Acked-by: Pavel Emelyanov [email protected]
Cc: Andrea Arcangeli [email protected]
Cc: [email protected]
Signed-off-by: Andrew Morton [email protected]
Signed-off-by: Linus Torvalds [email protected]

kasan: avoid -Wmaybe-uninitialized warning

gcc-7 produces this warning:

mm/kasan/report.c: In function 'kasan_report':
mm/kasan/report.c:351:3: error: 'info.first_bad_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
print_shadow_for_address(info->first_bad_addr);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/kasan/report.c:360:27: note: 'info.first_bad_addr' was declared here

The code seems fine as we only print info.first_bad_addr when there is a
shadow, and we always initialize it in that case, but this is relatively
hard for gcc to figure out after the latest rework.

Adding an intialization to the most likely value together with the other
struct members shuts up that warning.

Fixes: b235b98 ("kasan: unify report headers")
Link: https://patchwork.kernel.org/patch/9641417/
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann [email protected]
Suggested-by: Alexander Potapenko [email protected]
Suggested-by: Andrey Ryabinin [email protected]
Acked-by: Andrey Ryabinin [email protected]
Cc: Dmitry Vyukov [email protected]
Signed-off-by: Andrew Morton [email protected]
Signed-off-by: Linus Torvalds [email protected]

kthread: fix documentation build warning

The kerneldoc comment for kthread_create() had an incorrect argument
name, leading to a warning in the docs build.

Correct it, and make one more small step toward a warning-free build.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet [email protected]
Cc: Randy Dunlap [email protected]
Signed-off-by: Andrew Morton [email protected]
Signed-off-by: Linus Torvalds [email protected]

zram: do not free pool->size_class

Mike …

* i2c: designware: Print clock freq on invalid clock freq error When we refuse to probe due to an invalid clock frequency, log the frequency which is causing this error. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> * i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz At least the Acer Iconia Tab8 / aka W1-810 uses 1MiHz instead of 1MHz for one of its busses, fix this up to 1MHz instead of failing the probe of that bus. This fixes the accelerometer on the Acer Iconia Tab8 not working. Cc: [email protected] Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> * parisc: pdc_stable: Fix locking when creating sysfs links There's no need to take the write lock when creating sysfs links. This patch fixes the following BUG: BUG: sleeping function called from invalid context at mm/slab.h:416 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2-00110-g0b5477d9dabd #111 Backtrace: [<0000000040217ac8>] show_stack+0x20/0x38 [<00000000406fbbb0>] dump_stack+0xb0/0x128 [<0000000040274090>] ___might_sleep+0x180/0x1b8 [<0000000040274144>] __might_sleep+0x7c/0xe8 [<0000000040373874>] kmem_cache_alloc+0x14c/0x1e0 [<0000000040419514>] __kernfs_new_node+0x84/0x1b8 [<000000004041b09c>] kernfs_new_node+0x3c/0x78 [<000000004041e040>] kernfs_create_link+0x40/0xd8 [<000000004041f320>] sysfs_do_create_link_sd.isra.0+0xb0/0x130 [<000000004041f3d4>] sysfs_create_link+0x34/0x58 [<000000004011b4a4>] pdc_stable_init+0x2c4/0x458 [<0000000040200250>] do_one_initcall+0x70/0x1d8 [<0000000040101644>] kernel_init_freeable+0x27c/0x390 [<000000004020be44>] kernel_init+0x24/0x1c0 Signed-off-by: James Bottomley <[email protected]> Reported-by: Meelis Roos <[email protected]> Signed-off-by: Helge Deller <[email protected]> * libata: fix a couple of doc build warnings The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c had fallen behind the current implementation, resulting in these doc build warnings: ./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link' ./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done' ./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc' ./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense' Update the comments and make the warnings go away. Signed-off-by: Jonathan Corbet <[email protected]> Signed-off-by: Tejun Heo <[email protected]> * i2c: allow i2c-versatile for ARM MPS platforms Allow i2c-versatile to be enabled for ARM MPS platforms. Signed-off-by: Russell King <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> * i2c: rephrase explanation of I2C_CLASS_DEPRECATED Hopefully making clear that it is not needed for new drivers. Signed-off-by: Wolfram Sang <[email protected]> * parisc: Define CONFIG_CPU_BIG_ENDIAN While working on enabling queued rwlock on SPARC, found this following code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN to clear a byte. static inline u8 *__qrwlock_write_byte(struct qrwlock *lock) { return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN); } Problem is many of the fixed big endian architectures don't define CPU_BIG_ENDIAN and clears the wrong byte. Define CPU_BIG_ENDIAN for parisc architecture to fix it. Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Helge Deller <[email protected]> * clk: samsung: exynos5420: The EPLL rate table corrections This patch fixes values of the EPLL K coefficient and changes the EPLL output frequency values to match exactly what is possible to achieve with given M, P, S, K coefficients. This allows to avoid rounding errors and unexpected frequency being set with clk_set_rate(), due to recalc_rate returning different values than the PLL rate specified in the exynos5420_epll_24mhz_tbl table. E.g. this prevents a case where two consecutive clk_set_rate() calls with same argument result in different PLL output frequency. The PLL output frequencies have been calculated with formula: f = fxtal * (M * 2^16 + K) / (P * 2^S) / 2^16 where fxtal = 24000000. Fixes: 9842452acd ("clk: samsung: exynos542x: Add EPLL rate table") Signed-off-by: Sylwester Nawrocki <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> * sunhme: fix up GREG_STAT and GREG_IMASK register offsets Update the values to match those from the STP2002QFP documentation. Signed-off-by: Mark Cave-Ayland <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: phy: Correctly process PHY_HALTED in phy_stop_machine() Marc reported that he was not getting the PHY library adjust_link() callback function to run when calling phy_stop() + phy_disconnect() which does not indeed happen because we set the state machine to PHY_HALTED but we don't get to run it to process this state past that point. Fix this with a synchronous call to phy_state_machine() in order to have the state machine actually act on PHY_HALTED, set the PHY device's link down, turn the network device's carrier off and finally call the adjust_link() function. Reported-by: Marc Gonzalez <[email protected]> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Marc Gonzalez <[email protected]> Signed-off-by: David S. Miller <[email protected]> * ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev() Michał reported a NULL pointer deref during fib_sync_down_dev() when unregistering a netdevice. The problem is that we don't check for 'in_dev' being NULL, which can happen in very specific cases. Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or the inetaddr notification chains. However, if an interface isn't configured with any IP address, then it's possible for host routes to be flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in inetdev_destroy(). To reproduce: $ ip link add type dummy $ ip route add local 1.1.1.0/24 dev dummy0 $ ip link del dev dummy0 Fix this by checking for the presence of 'in_dev' before referencing it. Fixes: 982acb97560c ("ipv4: fib: Notify about nexthop status changes") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Michał Mirosław <[email protected]> Tested-by: Michał Mirosław <[email protected]> Signed-off-by: David S. Miller <[email protected]> * MAINTAINERS: Add more files to the PHY LIBRARY section Include missing files that are provided by, used, or directly maintained within the PHY LIBRARY, this include uapi header, header files used by Device Tree code etc. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> * mv643xx_eth: fix of_irq_to_resource() error check of_irq_to_resource() has recently been fixed to return negative error #'s along with 0 in case of failure, however the Marvell MV643xx Ethernet driver still only regards 0 as invalid IRQ -- fix it up. Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()") Signed-off-by: Sergei Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * virtio_net: fix truesize for mergeable buffers Seth Forshee noticed a performance degradation with some workloads. This turns out to be due to packet drops. Euan Kemp noticed that this is because we drop all packets where length exceeds the truesize, but for some packets we add in extra memory without updating the truesize. This in turn was kept around unchanged from ab7db91705e95 ("virtio-net: auto-tune mergeable rx buffer size for improved performance"). That commit had an internal reason not to account for the extra space: not enough bits to do it. No longer true so let's account for the allocated length exactly. Many thanks to Seth Forshee for the report and bisecting and Euan Kemp for debugging the issue. Fixes: 680557cf79f8 ("virtio_net: rework mergeable buffer handling") Reported-by: Euan Kemp <[email protected]> Tested-by: Euan Kemp <[email protected]> Reported-by: Seth Forshee <[email protected]> Tested-by: Seth Forshee <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]> * Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config" This reverts commit 28b45910ccda ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config") because in the process of moving from dev_info() to dev_info_once() we essentially lost the helpful printed messages once the second instance of the driver is loaded. dev_info_once() does not actually print the message once per device instance, but once period. Fixes: 28b45910ccda ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config") Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Doug Berger <[email protected]> Signed-off-by: David S. Miller <[email protected]> * ppp: Fix a scheduling-while-atomic bug in del_chan The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would trigger this bug when __sk_free is invoked in atomic context, because of the call path pptp_sock_destruct->del_chan->synchronize_rcu. Now move the synchronize_rcu to pptp_release from del_chan. This is the only one case which would free the sock and need the synchronize_rcu. The following is the panic I met with kernel 3.3.8, but this issue should exist in current kernel too according to the codes. BUG: scheduling while atomic __schedule_bug+0x5e/0x64 __schedule+0x55/0x580 ? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic] ? dev_hard_start_xmit+0x423/0x530 ? sch_direct_xmit+0x73/0x170 __cond_resched+0x16/0x30 _cond_resched+0x22/0x30 wait_for_common+0x18/0x110 ? call_rcu_bh+0x10/0x10 wait_for_completion+0x12/0x20 wait_rcu_gp+0x34/0x40 ? wait_rcu_gp+0x40/0x40 synchronize_sched+0x1e/0x20 0xf8417298 0xf8417484 ? sock_queue_rcv_skb+0x109/0x130 __sk_free+0x16/0x110 ? udp_queue_rcv_skb+0x1f2/0x290 sk_free+0x16/0x20 __udp4_lib_rcv+0x3b8/0x650 Signed-off-by: Gao Feng <[email protected]> Signed-off-by: David S. Miller <[email protected]> * udp6: fix jumbogram reception Since commit 67a51780aebb ("ipv6: udp: leverage scratch area helpers") udp6_recvmsg() read the skb len from the scratch area, to avoid a cache miss. But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their length exceeds the 16 bits available in the scratch area. As a side effect the length returned by recvmsg() is: <ingress datagram len> % (1<<16) This commit addresses the issue allocating one more bit in the IP6CB flags field and setting it for incoming jumbograms. Such field is still in the first cacheline, so at recvmsg() time we can check it and fallback to access skb->len if required, without a measurable overhead. Fixes: 67a51780aebb ("ipv6: udp: leverage scratch area helpers") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]> * samples/bpf: fix bpf tunnel cleanup test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the next geneve tunnelling test case fails. In addition, the geneve reserved bit in tcbpf2_kern.c should be zero, according to the RFC. Signed-off-by: William Tu <[email protected]> Signed-off-by: David S. Miller <[email protected]> * pinctrl: stm32: select IRQ_DOMAIN_HIERARCHY instead of depends on Drivers that need IRQ_DOMAIN_HIERARCHY should "select" it, but drivers/pinctrl/stm32/Kconfig is the only exception that uses "depends on" syntax. This prevents GPIO drivers from select'ing IRQ_DOMAIN_HIERARCHY. For example, if I add "select IRQ_DOMAIN_HIERARCHY" to GPIO_XGENE_SB, I get the following recursive dependency error. drivers/gpio/Kconfig:13:error: recursive dependency detected! For a resolution refer to Documentation/kbuild/kconfig-language.txt subsection "Kconfig recursive dependency limitations" drivers/gpio/Kconfig:13: symbol GPIOLIB is selected by PINCTRL_STM32 For a resolution refer to Documentation/kbuild/kconfig-language.txt subsection "Kconfig recursive dependency limitations" drivers/pinctrl/stm32/Kconfig:3: symbol PINCTRL_STM32 is selected by PINCTRL_STM32F429 For a resolution refer to Documentation/kbuild/kconfig-language.txt subsection "Kconfig recursive dependency limitations" drivers/pinctrl/stm32/Kconfig:11: symbol PINCTRL_STM32F429 depends on IRQ_DOMAIN_HIERARCHY For a resolution refer to Documentation/kbuild/kconfig-language.txt subsection "Kconfig recursive dependency limitations" kernel/irq/Kconfig:67: symbol IRQ_DOMAIN_HIERARCHY is selected by GPIO_XGENE_SB For a resolution refer to Documentation/kbuild/kconfig-language.txt subsection "Kconfig recursive dependency limitations" drivers/gpio/Kconfig:502: symbol GPIO_XGENE_SB depends on GPIOLIB Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: Alexandre TORGUE <[email protected]> Signed-off-by: Linus Walleij <[email protected]> * gpio: xgene-sb: select IRQ_DOMAIN_HIERARCHY This driver calls irq_domain_hierarchy() and irq_chip_*_parent(). They are available only when CONFIG_IRQ_DOMAIN_HIERARCHY is enabled. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Linus Walleij <[email protected]> * gpio: gpio-mxc: Fix: higher 16 GPIOs usable as wake source In the function gpio_set_wake_irq(), port->irq_high is only checked for zero. As platform_get_irq() returns a value less then zero if no interrupt was found, any gpio >= 16 was handled like an irq_high interrupt was available. On iMX27 for example no high interrupt is available. This lead to the problem that only some gpios (the lower 16) were useable as wake sources. Signed-off-by: Philipp Rosenberger <[email protected]> Signed-off-by: Linus Walleij <[email protected]> * x86/hpet: Cure interface abuse in the resume path The HPET resume path abuses irq_domain_[de]activate_irq() to restore the MSI message in the HPET chip for the boot CPU on resume and it relies on an implementation detail of the interrupt core code, which magically makes the HPET unmask call invoked via a irq_disable/enable pair. This worked as long as the irq code did unconditionally invoke the unmask() callback. With the recent changes which keep track of the masked state to avoid expensive hardware access, this does not longer work. As a consequence the HPET timer interrupts are not unmasked which breaks resume as the boot CPU waits forever that a timer interrupt arrives. Make the restore of the MSI message explicit and invoke the unmask() function directly. While at it get rid of the pointless affinity setting as nothing can change the affinity of the interrupt and the vector across suspend/resume. The restore of the MSI message reestablishes the previous affinity setting which is the correct one. Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls") Reported-and-tested-by: Tomi Sarvela <[email protected]> Reported-by: Martin Peres <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Marc Zyngier <[email protected]> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707312158590.2287@nanos * arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0 In an ideal world, CNTFRQ_EL0 always contains the timer frequency for the kernel to use. Sadly, we get quite a few broken systems where the firmware authors cannot be bothered to program that register on all CPUs, and rely on DT to provide that frequency. So when trapping CNTFRQ_EL0, make sure to return the actual rate (as known by the kernel), and not CNTFRQ_EL0. Acked-by: Mark Rutland <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Will Deacon <[email protected]> * ASoC: rt5665: fix wrong register for bclk ratio control The register of setting back ratio should be RT5665_ADDA_CLK_2 instead of RT5665_ADDA_CLK_1. Signed-off-by: Bard Liao <[email protected]> Signed-off-by: Mark Brown <[email protected]> * gpio: exar: Use correct property prefix and document bindings The device-specific property should be prefixed with the vendor name, not "linux,", as Linus Walleij pointed out. Change this and document the bindings of this platform device. We didn't ship the old binding in a release yet. So we can still change it without breaking an official API. Fixes: 380b1e2f3a2f ("gpio-exar/8250-exar: Make set of exported GPIOs configurable") Signed-off-by: Jan Kiszka <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Linus Walleij <[email protected]> * gpiolib: skip unwanted events, don't convert them to opposite edge The previous fix for filtering out of unwatched events was not entirely correct. Instead of skipping the events we don't want, they are now interpreted as events with opposing edge. In order to fix it: always read the GPIO line value on interrupt and only emit the event if it corresponds with the event type we requested. Cc: [email protected] Fixes: ad537b822577 ("gpiolib: fix filtering out unwanted events") Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Linus Walleij <[email protected]> * clk: meson: mpll: fix mpll0 fractional part ignored mpll0 clock is special compared to the other mplls. It needs another bit (ssen) to be set to activate the fractional part the mpll divider Fixes: 007e6e5c5f01 ("clk: meson: mpll: add rw operation") Signed-off-by: Jerome Brunet <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> * timers: Fix overflow in get_next_timer_interrupt For e.g. HZ=100, timer being 430 jiffies in the future, and 32 bit unsigned int, there is an overflow on unsigned int right-hand side of the expression which results with wrong values being returned. Type cast the multiplier to 64bit to avoid that issue. Fixes: 46c8f0b077a8 ("timers: Fix get_next_timer_interrupt() computation") Signed-off-by: Matija Glavinic Pecotic <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexander Sverdlin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] * device property: Fix usecount for of_graph_get_port_parent() Fix inconsistent use of of_graph_get_port_parent() where asoc_simple_card_parse_graph_dai() does of_node_get() before calling it while other callers do not. We can fix this by not trashing the node passed to of_graph_get_port_parent(). Let's also make sure the callers have correct refcounts and remove related incorrect of_node_put() calls for of_for_each_phandle as that's done by of_phandle_iterator_next() except when we break out of the loop early. Let's fix both issues with a single patch to avoid kobject refcounts getting messed up more if two patches are merged separately. Otherwise strange issues can happen caused by memory corruption caused by too many kobject_del() calls such as: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 ... (___might_sleep) (__mutex_lock) (mutex_lock_nested) (kernfs_remove) (kobject_del) (kobject_put) (of_get_next_parent) (of_graph_get_port_parent) (asoc_simple_card_parse_graph_dai [snd_soc_simple_card_utils]) (asoc_graph_card_probe [snd_soc_audio_graph_card]) Fixes: 0ef472a973eb ("of_graph: add of_graph_get_port_parent()") Fixes: 2692c1c63c29 ("ASoC: add audio-graph-card support") Fixes: 1689333f8311 ("ASoC: simple-card-utils: add asoc_simple_card_parse_graph_dai()") Signed-off-by: Tony Lindgren <[email protected]> Reviewed-by: Rob Herring <[email protected]> Tested-by: Antonio Borneo <[email protected]> Tested-by: Kuninori Morimoto <[email protected]> Signed-off-by: Mark Brown <[email protected]> * libceph: make encode_request_*() work with r_mempool requests Messages allocated out of ceph_msgpool have a fixed front length (pool->front_len). Asserting that the entire front has been filled while encoding is thus wrong. Fixes: 8cb441c0545d ("libceph: MOSDOp v8 encoding (actual spgid + full hash)") Reported-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: "Yan, Zheng" <[email protected]> * libceph: don't call ->reencode_message() more than once per message Reencoding an already reencoded message is a bad idea. This could happen on Policy::stateful_server connections (!CEPH_MSG_CONNECT_LOSSY), such as MDS sessions. This didn't pop up in testing because currently only OSD requests are reencoded and OSD sessions are always lossy. Fixes: 98ad5ebd1505 ("libceph: ceph_connection_operations::reencode_message() method") Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: "Yan, Zheng" <[email protected]> * crush: assume weight_set != null imples weight_set_size > 0 Reflects ceph.git commit 5e8fa3e06b68fae1582c9230a3a8d1abc6146286. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]> * libceph: fallback for when there isn't a pool-specific choose_arg There is now a fallback to a choose_arg index of -1 if there isn't a pool-specific choose_arg set. If you create a per-pool weight-set, that works for that pool. Otherwise we try the compat/default one. If that doesn't exist either, then we use the normal CRUSH weights. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]> * libceph: upmap semantic changes - apply both pg_upmap and pg_upmap_items - allow bidirectional swap of pg-upmap-items Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]> * libceph: make RECOVERY_DELETES feature create a new interval This is needed so that the OSDs can regenerate the missing set at the start of a new interval where support for recovery deletes changed. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]> * xtensa: don't limit csum_partial export by CONFIG_NET csum_partial and csum_partial_copy_generic are defined unconditionally and are available even when CONFIG_NET is disabled. They are used not only by the network drivers, but also by scsi and media. Don't limit these functions export by CONFIG_NET. Cc: [email protected] Signed-off-by: Max Filippov <[email protected]> * xtensa: mm/cache: add missing EXPORT_SYMBOLs Functions clear_user_highpage, copy_user_highpage, flush_dcache_page, local_flush_cache_range and local_flush_cache_page may be used from modules. Export them. Cc: [email protected] Signed-off-by: Max Filippov <[email protected]> * KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 Do this in the caller of nested_vmx_vmexit instead. nested_vmx_check_exception was doing a vmwrite to the vmcs02's VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move the field to vmcs12->vm_exit_intr_error_code. However that isn't possible on pre-Haswell machines. Moving the vmcs12 write to the callers fixes it. Reported-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> [Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1, thanks to [email protected]] Signed-off-by: Radim Krčmář <[email protected]> * KVM: nVMX: fixes to nested virt interrupt injection There are three issues in nested_vmx_check_exception: 1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported by Wanpeng Li; 2) it should rebuild the interruption info and exit qualification fields from scratch, as reported by Jim Mattson, because the values from the L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes a page fault, the EPT misconfig's exit qualification is incorrect). 3) CR2 and DR6 should not be written for exception intercept vmexits (CR2 only for AMD). This patch fixes the first two and adds a comment about the last, outlining the fix. Cc: Jim Mattson <[email protected]> Cc: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> * KVM: async_pf: make rcu irq exit if not triggered from idle task WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0 CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1 RIP: 0010:rcu_note_context_switch+0x207/0x6b0 Call Trace: __schedule+0xda/0xba0 ? kvm_async_pf_task_wait+0x1b2/0x270 schedule+0x40/0x90 kvm_async_pf_task_wait+0x1cc/0x270 ? prepare_to_swait+0x22/0x70 do_async_page_fault+0x77/0xb0 ? do_async_page_fault+0x77/0xb0 async_page_fault+0x28/0x30 RIP: 0010:__d_lookup_rcu+0x90/0x1e0 I encounter this when trying to stress the async page fault in L1 guest w/ L2 guests running. Commit 9b132fbe5419 (Add rcu user eqs exception hooks for async page fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu idle eqs when needed, to protect the code that needs use rcu. However, we need to call the pair even if the function calls schedule(), as seen from the above backtrace. This patch fixes it by informing the RCU subsystem exit/enter the irq towards/away from idle for both n.halted and !n.halted. Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: [email protected] Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> * NFSv4: Fix EXCHANGE_ID corrupt verifier issue The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was changed to be asynchronous by commit 8d89bd70bc939. If we interrrupt the call to rpc_wait_for_completion_task(), we can therefore end up transmitting random stack contents in lieu of the verifier. Fixes: 8d89bd70bc939 ("NFS setup async exchange_id") Cc: [email protected] # v4.9+ Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]> * ptp: introduce ptp auxiliary worker Many PTP drivers required to perform some asynchronous or periodic work, like periodically handling PHC counter overflow or handle delayed timestamp for RX/TX network packets. In most of the cases, such work is implemented using workqueues. Unfortunately, Kernel workqueues might introduce significant delay in work scheduling under high system load and on -RT, which could cause misbehavior of PTP drivers due to internal counter overflow, for example, and there is no way to tune its execution policy and priority manuallly. Hence, The kthread_worker can be used insted of workqueues, as it create separte named kthread for each worker and its its execution policy and priority can be configured using chrt tool. This prblem was reported for two drivers TI CPSW CPTS and dp83640, so instead of modifying each of these driver it was proposed to add PTP auxiliary worker to the PHC subsystem. The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker and kthread_delayed_work and introduces two new PHC subsystem APIs: - long (*do_aux_work)(struct ptp_clock_info *ptp) callback in ptp_clock_info structure, which driver should assign if it require to perform asynchronous or periodic work. Driver should return the delay of the PTP next auxiliary work scheduling time (>=0) or negative value in case further scheduling is not required. - int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which allows schedule PTP auxiliary work. The name of kthread_worker thread corresponds PTP PHC device name "ptp%d". Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: ethernet: ti: cpts: convert to use ptp auxiliary worker There could be significant delay in CPTS work schedule under high system load and on -RT which could cause CPTS misbehavior due to internal counter overflow. Usage of own kthread_worker allows to avoid such kind of issues and makes it possible to tune priority of CPTS kthread_worker thread on -RT (thread name "cpts"). Hence, the CPTS driver is converted to use PTP auxiliary worker as PHC subsystem implements such functionality in a generic way now. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: ethernet: ti: cpts: fix tx timestamping timeout With the low speed Ethernet connection CPDMA notification about packet processing can be received before CPTS TX timestamp event, which is set when packet actually left CPSW while cpdma notification is sent when packet pushed in CPSW fifo. As result, when connection is slow and CPU is fast enough TX timestamping is not working properly. Fix it, by introducing TX SKB queue to store PTP SKBs for which Ethernet Transmit Event hasn't been received yet and then re-check this queue with new Ethernet Transmit Events by scheduling CPTS overflow work more often (every 1 jiffies) until TX SKB queue is not empty. Side effect of this change is: - User space tools require to take into account possible delay in TX timestamp processing (for example ptp4l works with tx_timestamp_timeout=400 under net traffic and tx_timestamp_timeout=25 in idle). Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: ethernet: ti: cpts: fix fifo read in cpts_find_ts Now the call chain cpts_find_ts() |- cpts_fifo_read(cpts, CPTS_EV_PUSH) will stop reading CPTS FIFO if PUSH event is found. But this is not expected and CPTS FIFI should be completely drained here. This is most probably copy-paste error and it has no negative impact as CPTS_EV_PUSH should not be present in FIFO without TS_PUSH request and cpts_systim_read() and cpts_find_ts() synchronized by spin_lock. Correct above by calling cpts_fifo_read() with -1 parameter, so it will read all CPTS event from FIFO. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]> * Cipso: cipso_v4_optptr enter infinite loop in for(),if((optlen > 0) && (optptr[1] == 0)), enter infinite loop. Test: receive a packet which the ip length > 20 and the first byte of ip option is 0, produce this issue Signed-off-by: yujuan.qi <[email protected]> Acked-by: Paul Moore <[email protected]> Signed-off-by: David S. Miller <[email protected]> * platform/x86: dell-wmi: Fix driver interface version query When I converted dell-wmi to the new bus infrastructure, I left the call to dell_wmi_check_descriptor_buffer() in dell_wmi_init(). This could cause two problems: - An error message when loading the driver on a system without dell-wmi. We'd try to read the event descriptor even if the WMI GUID wasn't there. - A possible race if dell-wmi was loaded manually before wmi was fully initialized. Fix it by moving the call to the probe function where it belongs. Fixes: bff589be59c5 ("platform/x86: dell-wmi: Convert to the WMI bus infrastructure") Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Pali Rohár <[email protected]> Signed-off-by: Darren Hart (VMware) <[email protected]> * vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP In the case that GRO is turned on and the original received packet is CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last csum-unnecessary point, which for instance could occur if the packet comes from another Linux guest on the same Linux host, we have to do either remcsum_adjust or set up CHECKSUM_PARTIAL again with its csum_start properly reset considering RCO. However, since b7fe10e5ebac("gro: Fix remcsum offload to deal with frags in GRO") that barrier in such case could be skipped if GRO turned on, hence we pass over it and the inner L4 validation mistakenly reckons it as a bad csum. This patch makes remcsum_offload being reset at the same time of GRO remcsum cleanup, so as to make it work in such case as before. Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: Koichiro Den <[email protected]> Signed-off-by: David S. Miller <[email protected]> * gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP In the case that GRO is turned on and the original received packet is CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last csum-unnecessary point, which for instance could occur if the packet comes from another Linux guest on the same Linux host, we have to do either remcsum_adjust or set up CHECKSUM_PARTIAL again with its csum_start properly reset considering RCO. However, since b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") that barrier in such case could be skipped if GRO turned on, hence we pass over it and the inner L4 validation mistakenly reckons it as a bad csum. This patch makes remcsum_offload being reset at the same time of GRO remcsum cleanup, so as to make it work in such case as before. Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: Koichiro Den <[email protected]> Signed-off-by: David S. Miller <[email protected]> * b44: Initialize 64-bit stats seqcount On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that. Fixes: eeda8585522b ("b44: add 64 bit stats") Signed-off-by: Florian Fainelli <[email protected]> Acked-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]> * i40e: Initialize 64-bit statistics TX ring seqcount On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that. Fixes: 980e9b118642 ("i40e: Add support for 64 bit netstats") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> * ixgbe: Initialize 64-bit stats seqcounts On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that. Fixes: 4197aa7bb818 ("ixgbevf: provide 64 bit statistics") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> * nfp: Initialize RX and TX ring 64-bit stats seqcounts On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that. Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs") Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]> * gtp: Initialize 64-bit per-cpu stats correctly On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that by using netdev_alloc_pcpu_stats() instead of an open coded allocation. Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> * netvsc: Initialize 64-bit stats seqcount On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that. In commit 6c80f3fc2398 ("netvsc: report per-channel stats in ethtool statistics") netdev_alloc_pcpu_stats() was removed in favor of open-coding the 64-bits statistics, except that u64_stats_init() was missed. Fixes: 6c80f3fc2398 ("netvsc: report per-channel stats in ethtool statistics") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]> * ipvlan: Fix 64-bit statistics seqcount initialization On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a lockdep splat indicating this seqcount is not correctly initialized, fix that by using the proper helper function: netdev_alloc_pcpu_stats(). Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> * nand: fix wrong default oob layout for small pages using soft ecc When using soft ecc, if no ooblayout is given, the core automatically uses one of the nand_ooblayout_{sp,lp}*() functions to determine the layout inside the out of band data. Until kernel version 4.6, struct nand_ecclayout was used for that purpose. During the migration from 4.6 to 4.7, an error shown up in the small page layout, in the case oob section is only 8 bytes long. The layout was using three bytes (0, 1, 2) for ecc, two bytes (3, 4) as free bytes, one byte (5) for bad block marker and finally two bytes (6, 7) as free bytes, as shown there: [linux-4.6] drivers/mtd/nand/nand_base.c:52 static struct nand_ecclayout nand_oob_8 = { .eccbytes = 3, .eccpos = {0, 1, 2}, .oobfree = { {.offset = 3, .length = 2}, {.offset = 6, .length = 2} } }; This fixes the current implementation which is incoherent. It references bit 3 at the same time as an ecc byte and a free byte. Furthermore, it is clear with the previous implementation that there is only one ecc section with 8 bytes oob sections. We shall return -ERANGE in the nand_ooblayout_ecc_sp() function when asked for the second section. Signed-off-by: Miquel Raynal <[email protected]> Fixes: 41b207a70d3a ("mtd: nand: implement the default mtd_ooblayout_ops") Cc: <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> * mtd: nand: sunxi: fix potential divide-by-zero error clk_round_rate() can return <= 0. Currently the value returned by clk_round_rate() is used directly for a division. This patch introduces a guard to ensure a divide-by-zero or a divide by a negative number for that matter can't happen by bugging out returning -EINVAL if clk_round_rate() returns <= 0. Fixes: 2d43457f79e4 ("mtd: nand: sunxi: fix EDO mode selection") Signed-off-by: Bryan O'Donoghue <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> * mtd: nand: Fix a docs build warning Commit 0b4773fd1649 (mtd: nand: Drop unused cached programming support) removed the "cached" parameter from nand_write_page(), but did not update the kerneldoc comments, creating this docs build warning: ./drivers/mtd/nand/nand_base.c:2751: warning: Excess function parameter 'cached' description in 'nand_write_page' Remove the offending line so we can have a little peace and quiet. Signed-off-by: Jonathan Corbet <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> * mtd: nand: Fix timing setup for NANDs that do not support SET FEATURES Some ONFI NANDs do not support the SET/GET FEATURES commands, which, according to the spec, is perfectly valid. On these NANDs we can't set a specific timing mode using the "timing mode" feature, and we should assume the NAND does not require any setup to enter a specific timing mode. Signed-off-by: Boris Brezillon <[email protected]> Fixes: d8e725dd8311 ("mtd: nand: automate NAND timings selection") Reported-by: Alexander Dahl <[email protected]> Cc: <[email protected]> Tested-by: Alexander Dahl <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> * mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflow All timings in nand_sdr_timings are expressed in picoseconds but some of them may not fit in an u32. Signed-off-by: Boris Brezillon <[email protected]> Fixes: 204e7ecd47e2 ("mtd: nand: Add a few more timings to nand_sdr_timings") Reported-by: Alexander Dahl <[email protected]> Cc: <[email protected]> Reviewed-by: Alexander Dahl <[email protected]> Tested-by: Alexander Dahl <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> * mtd: nand: atmel: Fix EDO mode check EDO mode should be used when tRC is less than 30ns, but timings are expressed in picoseconds in the nand_sdr_timings struct. Signed-off-by: Boris Brezillon <[email protected]> Fixes: f9ce2eddf176 ("mtd: nand: atmel: Add ->setup_data_interface() hooks") Reported-by: Alexander Dahl <[email protected]> Tested-by: Alexander Dahl <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> * gpio: tegra: fix unbalanced chained_irq_enter/exit When more than one GPIO IRQs are triggered simultaneously, tegra_gpio_irq_handler() called chained_irq_exit() multiple times for one chained_irq_enter(). Fixes: 3c92db9ac0ca3eee8e46e2424b6c074e2e394ad9 Signed-off-by: Michał Mirosław <[email protected]> [Also changed the variable to a bool] Signed-off-by: Linus Walleij <[email protected]> * powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check of_irq_to_resource() has recently been fixed to return negative error #'s along with 0 in case of failure, however the Freescale MPC832x RDB board code still only regards 0 as a failure indication -- fix it up. Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()") Signed-off-by: Sergei Shtylyov <[email protected]> Acked-by: Scott Wood <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> * ALSA: hda - Fix speaker output from VAIO VPCL14M1R Sony VAIO VPCL14M1R needs the quirk to make the speaker working properly. Tested-by: Dmitriy <[email protected]> Cc: <[email protected]> Signed-off-by: Sergei A. Trusov <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> * NFSv4: Fix double frees in nfs4_test_session_trunk() rpc_clnt_add_xprt() expects the callback function to be synchronous, and expects to release the transport and switch references itself. Fixes: 04fa2c6bb51b1 ("NFS pnfs data server multipath session trunking") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]> * ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge The number of pins in South Bridge is 30 and not 29. There is a fix for the driver for the pinctrl, but a fix is also need at device tree level for the GPIO. Fixes: afda007feda5 ("ARM64: dts: marvell: Add pinctrl nodes for Armada 3700") Cc: <[email protected]> Signed-off-by: Gregory CLEMENT <[email protected]> * blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed When blk_mq_get_request() failed, preempt counter isn't released, and blk_mq_make_request() doesn't release the counter too. This patch fixes the issue, and makes sure that preempt counter is only held if rq is allocated successfully. The same policy is applied on .q_usage_counter too. Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]> * lan78xx: USB fast connect/disconnect crash fix USB fast connect/disconnect crash fix When USB plugged/unplugged at fast rate, lan78xx_mdio_init() in lan78xx_bind() failing case is not handled. Whenever lan78xx_mdio_init() failed, dev->mdiobus will be freed, however since lan78xx_bind() not consider as error and try to proceed for further initialization in lan78xx_probe() which leads system hung/crash. Also when register_netdev() failed, netdev is freed without calling lan78xx_unbind(). Hence halting the failed cases right manner fixes the system crash/hung issue. Signed-off-by: Nisar Sayed <[email protected]> Signed-off-by: David S. Miller <[email protected]> * lan78xx: Fix to handle hard_header_len update Fix to handle hard_header_len update When ifconfig up/down sequence is initiated hard_header_len get updated incrementally for each ifconfig up /down sequence, this leads invalid hard_header_len, moving to lan78xx_bind to have one time update of hard_header_len addresses the issue. Signed-off-by: Nisar Sayed <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support Currently when WoL is supported but disabled, ethtool reports: "Supports Wake-on: d". Fix the indication of Wol support, so that the indication remains "g" all the time if the NIC supports WoL. Tested: As accepted, when NIC supports WoL- ethtool reports: Supports Wake-on: g Wake-on: d when NIC doesn't support WoL- ethtool reports: Supports Wake-on: d Wake-on: d Fixes: 14c07b1358ed ("mlx4: Wake on LAN support") Signed-off-by: Inbar Karmy <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/mlx4_core: Fix sl_to_vl_change bit offset in flags2 dump The index value in function dump_dev_cap_flags2() for outputting "sl to vl mapping table change event support" needs to be consistent with the value of the enumerated constant MLX4_DEV_CAP_FLAG2_SL_TO_VL_CHANGE_EVENT defined in file include/linux/mlx4_device.h The change here restores that consistency. Fixes: fd10ed8e6f42 ("IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets") Reported-by: Mukesh Kacker <[email protected]> Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/mlx4_core: Fix namespace misalignment in QinQ VST support commit The cited commit introduced the following new enum value in file include/linux/mlx4/device.h: MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP However the value of MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP needs to stay consistent with the value used in another namespace in function dump_dev_cap_flags2(), which is manually kept in sync. The change here restores that consistency. Fixes: 7c3d21c8153c ("net/mlx4_core: Preparation for VF vlan protocol 802.1ad") Reported-by: Mukesh Kacker <[email protected]> Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/mlx4_core: Fixes missing capability bit in flags2 capability dump The cited commit introduced the following new enum value in file include/linux/mlx4/device.h: QUERY_DEV_CAP_DIAG_RPRT_PER_PORT However, it failed to introduce a corresponding entry in function dump_dev_cap_flags2() for outputting a line in the message log when this capability bit is set. The change here fixes that omission. Fixes: c7c122ed67e4 ("net/mlx4: Add diagnostic counters capability bit") Reported-by: Mukesh Kacker <[email protected]> Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]> * usb: qmi_wwan: add D-Link DWM-222 device ID Signed-off-by: Hector Martin <[email protected]> Signed-off-by: David S. Miller <[email protected]> * ibmvnic: Initialize SCRQ's during login renegotiation SCRQ resources are freed during renegotiation, but they are not re-allocated afterwards due to some changes in the initialization process. Fix that by re-allocating the memory after renegotation. SCRQ's can also be freed if a server capabilities request fails. If this were encountered during a device reset for example, SCRQ's may not be re-allocated. This operation is not necessary anymore so remove it. Signed-off-by: Thomas Falcon <[email protected]> Signed-off-by: David S. Miller <[email protected]> * tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states If the sender switches the congestion control during ECN-triggered cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to the ssthresh value calculated by the previous congestion control. If the previous congestion control is BBR that always keep ssthresh to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe step is to avoid assigning invalid ssthresh value when recovery ends. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> * drm/amdgpu: Fix undue fallthroughs in golden registers initialization As I was staring at the si_init_golden_registers code, I noticed that the Pitcairn initialization silently falls through the Cape Verde initialization, and the Oland initialization falls through the Hainan initialization. However there is no comment stating that this is intentional, and the radeon driver doesn't have any such fallthrough, so I suspect this is not supposed to happen. Signed-off-by: Jean Delvare <[email protected]> Fixes: 62a37553414a ("drm/amdgpu: add si implementation v10") Cc: Ken Wang <[email protected]> Cc: Alex Deucher <[email protected]> Cc: "Marek Olšák" <[email protected]> Cc: "Christian König" <[email protected]> Cc: Flora Cui <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] * drm/amdgpu: Use list_del_init in amdgpu_mn_unregister Otherwise bo->shadow_list (which is aliased by bo->mn_list) will not appear empty in amdgpu_ttm_bo_destroy and cause an oops when freeing former userptr BOs. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> * KVM: X86: Fix loss of pending INIT due to race When SMP VM start, AP may lost INIT because of receiving INIT between kvm_vcpu_ioctl_x86_get/set_vcpu_events. vcpu 0 vcpu 1 kvm_vcpu_ioctl_x86_get_vcpu_events events->smi.latched_init = 0 send INIT to vcpu1 set vcpu1's pending_events kvm_vcpu_ioctl_x86_set_vcpu_events if (events->smi.latched_init == 0) clear INIT in pending_events This patch fixes it by just update SMM related flags if we are in SMM. Thanks Peng Hao for the report and original commit message. Reported-by: Peng Hao <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> * KVM: X86: init irq->level in kvm_pv_kick_cpu_op 'lapic_irq' is a local variable and its 'level' field isn't initialized, so 'level' is random, it doesn't matter but makes UBSAN unhappy: UBSAN: Undefined behaviour in .../lapic.c:... load of value 10 is not a valid value for type '_Bool' ... Call Trace: [<ffffffff81f030b6>] dump_stack+0x1e/0x20 [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55 [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162 [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm] [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm] [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm] [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm] [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel] [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel] ... Signed-off-by: Longpeng(Mike) <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> * KVM: avoid using rcu_dereference_protected During teardown, accesses to memslots and buses are using rcu_dereference_protected with an always-true condition because these accesses are done outside the usual mutexes. This is because the last reference is gone and there cannot be any concurrent modifications, but rcu_dereference_protected is ugly and unobvious. Instead, check the refcount in kvm_get_bus and __kvm_memslots. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> * KVM: nVMX: do not pin the VMCS12 Since the current implementation of VMCS12 does a memcpy in and out of guest memory, we do not need current_vmcs12 and current_vmcs12_page anymore. current_vmptr is enough to read and write the VMCS12. And David Matlack noted: This patch also fixes dirty tracking (memslot->dirty_bitmap) of the VMCS12 page by using kvm_write_guest. nested_release_page() only marks the struct page dirty. Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> [Added David Matlack's note and nested_release_page_clean() fix.] Signed-off-by: Radim Krčmář <[email protected]> * kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown According to the Intel SDM, software cannot rely on the current VMCS to be coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12 flushes. 24.11.1 Software Use of Virtual-Machine Control Structures ... If a logical processor leaves VMX operation, any VMCSs active on that logical processor may be corrupted (see below). To prevent such corruption of a VMCS that may be used either after a return to VMX operation or on another logical processor, software should execute VMCLEAR for that VMCS before executing the VMXOFF instruction or removing power from the processor (e.g., as part of a transition to the S3 and S4 power states). ... This fixes a "suspicious rcu_dereference_check() usage!" warning during kvm_vm_release() because nested_release_vmcs12() calls kvm_vcpu_write_guest_page() without holding kvm->srcu. Signed-off-by: David Matlack <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> * KVM: nVMX: mark vmcs12 pages dirty on L2 exit The host physical addresses of L1's Virtual APIC Page and Posted Interrupt descriptor are loaded into the VMCS02. The CPU may write to these pages via their host physical address while L2 is running, bypassing address-translation-based dirty tracking (e.g. EPT write protection). Mark them dirty on every exit from L2 to prevent them from getting out of sync with dirty tracking. Also mark the virtual APIC page and the posted interrupt descriptor dirty when KVM is virtualizing posted interrupt processing. Signed-off-by: David Matlack <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> * mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors Commit 9a291a7c9428 ("mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified") causes __get_user_pages to ignore certain errors from follow_hugetlb_page. After such error, __get_user_pages subsequently calls faultin_page on the same VMA and start address that follow_hugetlb_page failed on instead of returning the error immediately as it should. In follow_hugetlb_page, when hugetlb_fault returns a value covered under VM_FAULT_ERROR, follow_hugetlb_page returns it without setting nr_pages to 0 as __get_user_pages expects in this case, which causes the following to happen in __get_user_pages: the "while (nr_pages)" check succeeds, we skip the "if (!vma..." check because we got a VMA the last time around, we find no page with follow_page_mask, and we call faultin_page, which calls hugetlb_fault for the second time. This issue also slightly changes how __get_user_pages works. Before, it only returned error if it had made no progress (i = 0). But now, follow_hugetlb_page can clobber "i" with an error code since its new return path doesn't check for progress. So if "i" is nonzero before a failing call to follow_hugetlb_page, that indication of progress is lost and __get_user_pages can return error even if some pages were successfully pinned. To fix this, change follow_hugetlb_page so that it updates nr_pages, allowing __get_user_pages to fail immediately and restoring the "error only if no progress" behavior to __get_user_pages. Tested that __get_user_pages returns when expected on error from hugetlb_fault in follow_hugetlb_page. Fixes: 9a291a7c9428 ("mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Daniel Jordan <[email protected]> Acked-by: Punit Agrawal <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: James Morse <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: zhong jiang <[email protected]> Cc: <[email protected]> [4.12.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> * pid: kill pidhash_size in pidhash_init() After commit 3d375d78593c ("mm: update callers to use HASH_ZERO flag"), drop unused pidhash_size in pidhash_init(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> * mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries Nadav Amit identified a theoritical race between page reclaim and mprotect due to TLB flushes being batched outside of the PTL being held. He described the race as follows: CPU0 CPU1 ---- ---- user accesses memory using RW PTE [PTE now cached in TLB] try_to_unmap_one() ==> ptep_get_and_clear() ==> set_tlb_ubc_flush_pending() mprotect(addr, PROT_READ) ==> change_pte_range() ==> [ PTE non-present - no flush ] user writes using cached RW PTE ... try_to_unmap_flush() The same type of race exists for reads when protecting for PROT_NONE and also exists for operations that can leave an old TLB entry behind such as munmap, mremap and madvise. For some operations like mprotect, it's not necessarily a data integrity issue but it is a correctness issue as there is a window where an mprotect that limits access still allows access. For munmap, it's potentially a data integrity issue although the race is massive as an munmap, mmap and return to userspace must all complete between the window when reclaim drops the PTL and flushes the TLB. However, it's theoritically possible so handle this issue by flushing the mm if reclaim is potentially currently batching TLB flushes. Other instances where a flush is required for a present pte should be ok as either the page lock is held preventing parallel reclaim or a page reference count is elevated preventing a parallel free leading to corruption. In the case of page_mkclean there isn't an obvious path that userspace could take advantage of without using the operations that are guarded by this patch. Other users such as gup as a race with reclaim looks just at PTEs. huge page variants should be ok as they don't race with reclaim. mincore only looks at PTEs. userfault also should be ok as if a parallel reclaim takes place, it will either fault the page back in or read some of the data before the flush occurs triggering a fault. Note that a variant of this patch was acked by Andy Lutomirski but this was for the x86 parts on top of his PCID work which didn't make the 4.13 merge window as expected. His ack is dropped from this version and there will be a follow-on patch on top of PCID that will include his ack. [[email protected]: tweak comments] [[email protected]: fix spello] Link: http://lkml.kernel.org/r/[email protected] Reported-by: Nadav Amit <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: <[email protected]> [v4.4+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> * userfaultfd: non-cooperative: notify about unmap of destination during mremap When mremap is called with MREMAP_FIXED it unmaps memory at the destination address without notifying userfaultfd monitor. If the destination were registered with userfaultfd, the monitor has no way to distinguish between the old and new ranges and to properly relate the page faults that would occur in the destination region. Fixes: 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> * kasan: avoid -Wmaybe-uninitialized warning gcc-7 produces this warning: mm/kasan/report.c: In function 'kasan_report': mm/kasan/report.c:351:3: error: 'info.first_bad_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized] print_shadow_for_address(info->first_bad_addr); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/kasan/report.c:360:27: note: 'info.first_bad_addr' was declared here The code seems fine as we only print info.first_bad_addr when there is a shadow, and we always initialize it in that case, but this is relatively hard for gcc to figure out after the latest rework. Adding an intialization to the most likely value together with the other struct members shuts up that warning. Fixes: b235b9808664 ("kasan: unify report headers") Link: https://patchwork.kernel.org/patch/9641417/ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Suggested-by: Alexander Potapenko <[email protected]> Suggested-by: Andrey Ryabinin <[email protected]> Acked-by: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> * kthread: fix documentation build warning The kerneldoc comment for kthread_create() had an incorrect argument name, leading to a warning in the docs build. Correct it, and make one more small step toward a warning-free build. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> * zram: do not free pool->size_class Mike …

KernelPRBot · 2017-08-10T03:40:06Z

Hi @andrewlucianlu!

Thanks for your contribution to the Linux kernel!

Linux kernel development happens on mailing lists, rather than on GitHub - this GitHub repository is a read-only mirror that isn't used for accepting contributions. So that your change can become part of Linux, please email it to us as a patch.

Sending patches isn't quite as simple as sending a pull request, but fortunately it is a well documented process.

Here's what to do:

Format your contribution according to kernel requirements
Decide who to send your contribution to
Set up your system to send your contribution as an email
Send your contribution and wait for feedback

How do I format my contribution?

The Linux kernel community is notoriously picky about how contributions are formatted and sent. Fortunately, they have documented their expectations.

Firstly, all contributions need to be formatted as patches. A patch is a plain text document showing the change you want to make to the code, and documenting why it is a good idea.

You can create patches with git format-patch.

Secondly, patches need 'commit messages', which is the human-friendly documentation explaining what the change is and why it's necessary.

Thirdly, changes have some technical requirements. There is a Linux kernel coding style, and there are licensing requirements you need to comply with.

Both of these are documented in the Submitting Patches documentation that is part of the kernel.

Note that you will almost certainly have to modify your existing git commits to satisfy these requirements. Don't worry: there are many guides on the internet for doing this.

Who do I send my contribution to?

The Linux kernel is composed of a number of subsystems. These subsystems are maintained by different people, and have different mailing lists where they discuss proposed changes.

If you don't already know what subsystem your change belongs to, the get_maintainer.pl script in the kernel source can help you.

get_maintainer.pl will take the patch or patches you created in the previous step, and tell you who is responsible for them, and what mailing lists are used. You can also take a look at the MAINTAINERS file by hand.

Make sure that your list of recipients includes a mailing list. If you can't find a more specific mailing list, then LKML - the Linux Kernel Mailing List - is the place to send your patches.

It's not usually necessary to subscribe to the mailing list before you send the patches, but if you're interested in kernel development, subscribing to a subsystem mailing list is a good idea. (At this point, you probably don't need to subscribe to LKML - it is a very high traffic list with about a thousand messages per day, which is often not useful for beginners.)

How do I send my contribution?

Use git send-email, which will ensure that your patches are formatted in the standard manner. In order to use git send-email, you'll need to configure git to use your SMTP email server.

For more information about using git send-email, look at the Git documentation or type git help send-email. There are a number of useful guides and tutorials about git send-email that can be found on the internet.

How do I get help if I'm stuck?

Firstly, don't get discouraged! There are an enormous number of resources on the internet, and many kernel developers who would like to see you succeed.

Many issues - especially about how to use certain tools - can be resolved by using your favourite internet search engine.

If you can't find an answer, there are a few places you can turn:

Kernel Newbies - this website contains a lot of useful resources for new kernel developers.
If you'd like a step-by-step, challenge-based introduction to kernel development, the Eudyptula Challenge would be an excellent start.
The kernel documentation - see also the Documentation directory in the kernel tree.

If you get really, really stuck, you could try the owners of this bot, @daxtens and @ajdlinux. Please be aware that we do have full-time jobs, so we are almost certainly the slowest way to get answers!

I sent my patch - now what?

You wait.

You can check that your email has been received by checking the mailing list archives for the mailing list you sent your patch to. Messages may not be received instantly, so be patient. Kernel developers are generally very busy people, so it may take a few weeks before your patch is looked at.

Then, you keep waiting. Three things may happen:

You might get a response to your email. Often these will be comments, which may require you to make changes to your patch, or explain why your way is the best way. You should respond to these comments, and you may need to submit another revision of your patch to address the issues raised.
Your patch might be merged into the subsystem tree. Code that becomes part of Linux isn't merged into the main repository straight away - it first goes into the subsystem tree, which is managed by the subsystem maintainer. It is then batched up with a number of other changes sent to Linus for inclusion. (This process is described in some detail in the kernel development process guide).
Your patch might be ignored completely. This happens sometimes - don't take it personally. Here's what to do:
- Wait a bit more - patches often take several weeks to get a response; more if they were sent at a busy time.
- Kernel developers often silently ignore patches that break the rules. Check for obvious violations of the the Submitting Patches guidelines, the style guidelines, and any other documentation you can find about your subsystem. Check that you're sending your patch to the right place.
- Try again later. When you resend it, don't add angry commentary, as that will get your patch ignored. It might also get you silently blacklisted.

Further information

Working with the kernel development community - the official documentation for new kernel contributors

Happy hacking!

This message was posted by a bot - if you have any questions or suggestions, please talk to my owners, @ajdlinux and @daxtens, or raise an issue at https://github.com/ajdlinux/KernelPRBot.

….org/drm/drm-intel into drm-fixes - Fix GitLab issue #446 causing GPU hangs: Do not restore invalid RS state - Fix GitLab issue #846: Restore coarse power gating that was disabled by initial RC66 context corruption security fixes. - Revert f6ec948 ("drm/i915: extend audio CDCLK>=2*BCLK constraint to more platforms") to avoid screen flicker - Fix to fill in unitialized uabi_instance in virtual engine uAPI - Add two missing W/As for ICL and EHL Signed-off-by: Dave Airlie <[email protected]> From: Joonas Lahtinen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

…/drm Pull drm fixes from Dave Airlie: "Pre-LCA pull request I'm not sure how things will look next week, myself and Daniel are at LCA and I'm speaking quite late, so if I get my talk finished I'll probably process fixes. This week has a bunch of i915 fixes, some amdgpu fixes, one sun4i, one core MST, and one core fb_helper fix. More details below: core: - mst Fix NO_STOP_BIT bit offset (Wayne) fb_helper: - fb_helper: Fix bits_per_pixel param set behavior to round up (Geert) sun4i: - Fix RGB_DIV clock min divider on old hardware (Chen-Yu) amdgpu: - Stability fix for raven - Reduce pixel encoding to if max clock is exceeded on HDMI to allow additional high res modes - enable DRIVER_SYNCOBJ_TIMELINE for amdgpu i915: - Fix GitLab issue #446 causing GPU hangs: Do not restore invalid RS state - Fix GitLab issue #846: Restore coarse power gating that was disabled by initial RC66 context corruption security fixes. - Revert f6ec948 ("drm/i915: extend audio CDCLK>=2*BCLK constraint to more platforms") to avoid screen flicker - Fix to fill in unitialized uabi_instance in virtual engine uAPI - Add two missing W/As for ICL and EHL" * tag 'drm-fixes-2020-01-10' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: add DRIVER_SYNCOBJ_TIMELINE to amdgpu drm/amd/display: Reduce HDMI pixel encoding if max clock is exceeded Revert "drm/amdgpu: Set no-retry as default." drm/fb-helper: Round up bits_per_pixel if possible drm/sun4i: tcon: Set RGB DCLK min. divider based on hardware model drm/i915/dp: Disable Port sync mode correctly on teardown drm/i915: Add Wa_1407352427:icl,ehl drm/i915: Add Wa_1408615072 and Wa_1407596294 to icl,ehl drm/i915/gt: Restore coarse power gating drm/i915/gt: Do not restore invalid RS state drm/i915: Limit audio CDCLK>=2*BCLK constraint back to GLK only drm/i915/gt: Mark up virtual engine uabi_instance drm/dp_mst: correct the shifting in DP_REMOTE_I2C_READ

This commit fixes the following checkpatch.pl errors: ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#446: FILE: ./include/rtl8723b_xmit.h:446: +u8 BWMapping_8723B(struct adapter * Adapter, struct pkt_attrib *pattrib); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#447: FILE: ./include/rtl8723b_xmit.h:447: +u8 SCMapping_8723B(struct adapter * Adapter, struct pkt_attrib *pattrib); Signed-off-by: Marco Cesati <[email protected]>

This commit fixes the following checkpatch.pl errors: ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#446: FILE: ./include/rtl8723b_xmit.h:446: +u8 BWMapping_8723B(struct adapter * Adapter, struct pkt_attrib *pattrib); ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar" torvalds#447: FILE: ./include/rtl8723b_xmit.h:447: +u8 SCMapping_8723B(struct adapter * Adapter, struct pkt_attrib *pattrib); Reviewed-by: Dan Carpenter <[email protected]> Signed-off-by: Marco Cesati <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

Support generating all docs, introduce `doc_cfg` and `doc_all`

When we try to add an IPv6 nexthop and IPv6 is not enabled (!CONFIG_IPV6) we'll hit a NULL pointer dereference[1] in the error path of nh_create_ipv6() due to calling ipv6_stub->fib6_nh_release. The bug has been present since the beginning of IPv6 nexthop gateway support. Commit 1aefd3d ("ipv6: Add fib6_nh_init and release to stubs") tells us that only fib6_nh_init has a dummy stub because fib6_nh_release should not be called if fib6_nh_init returns an error, but the commit below added a call to ipv6_stub->fib6_nh_release in its error path. To fix it return the dummy stub's -EAFNOSUPPORT error directly without calling ipv6_stub->fib6_nh_release in nh_create_ipv6()'s error path. [1] Output is a bit truncated, but it clearly shows the error. BUG: kernel NULL pointer dereference, address: 000000000000000000 #PF: supervisor instruction fetch in kernel modede #PF: error_code(0x0010) - not-present pagege PGD 0 P4D 0 Oops: 0010 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 638 Comm: ip Kdump: loaded Not tainted 5.16.0-rc1+ torvalds#446 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffff888109f5b8f0 EFLAGS: 00010286^Ac RAX: 0000000000000000 RBX: ffff888109f5ba28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881008a2860 RBP: ffff888109f5b9d8 R08: 0000000000000000 R09: 0000000000000000 R10: ffff888109f5b978 R11: ffff888109f5b948 R12: 00000000ffffff9f R13: ffff8881008a2a80 R14: ffff8881008a2860 R15: ffff8881008a2840 FS: 00007f98de70f100(0000) GS:ffff88822bf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000100efc000 CR4: 00000000000006e0 Call Trace: <TASK> nh_create_ipv6+0xed/0x10c rtm_new_nexthop+0x6d7/0x13f3 ? check_preemption_disabled+0x3d/0xf2 ? lock_is_held_type+0xbe/0xfd rtnetlink_rcv_msg+0x23f/0x26a ? check_preemption_disabled+0x3d/0xf2 ? rtnl_calcit.isra.0+0x147/0x147 netlink_rcv_skb+0x61/0xb2 netlink_unicast+0x100/0x187 netlink_sendmsg+0x37f/0x3a0 ? netlink_unicast+0x187/0x187 sock_sendmsg_nosec+0x67/0x9b ____sys_sendmsg+0x19d/0x1f9 ? copy_msghdr_from_user+0x4c/0x5e ? rcu_read_lock_any_held+0x2a/0x78 ___sys_sendmsg+0x6c/0x8c ? asm_sysvec_apic_timer_interrupt+0x12/0x20 ? lockdep_hardirqs_on+0xd9/0x102 ? sockfd_lookup_light+0x69/0x99 __sys_sendmsg+0x50/0x6e do_syscall_64+0xcb/0xf2 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f98dea28914 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53 RSP: 002b:00007fff859f5e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e2e RAX: ffffffffffffffda RBX: 00000000619cb810 RCX: 00007f98dea28914 RDX: 0000000000000000 RSI: 00007fff859f5ed0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008 R10: fffffffffffffce6 R11: 0000000000000246 R12: 0000000000000001 R13: 000055c0097ae520 R14: 000055c0097957fd R15: 00007fff859f63a0 </TASK> Modules linked in: bridge stp llc bonding virtio_net Cc: [email protected] Fixes: 53010f9 ("nexthop: Add support for IPv6 gateways") Signed-off-by: Nikolay Aleksandrov <[email protected]>

When we try to add an IPv6 nexthop and IPv6 is not enabled (!CONFIG_IPV6) we'll hit a NULL pointer dereference[1] in the error path of nh_create_ipv6() due to calling ipv6_stub->fib6_nh_release. The bug has been present since the beginning of IPv6 nexthop gateway support. Commit 1aefd3d ("ipv6: Add fib6_nh_init and release to stubs") tells us that only fib6_nh_init has a dummy stub because fib6_nh_release should not be called if fib6_nh_init returns an error, but the commit below added a call to ipv6_stub->fib6_nh_release in its error path. To fix it return the dummy stub's -EAFNOSUPPORT error directly without calling ipv6_stub->fib6_nh_release in nh_create_ipv6()'s error path. [1] Output is a bit truncated, but it clearly shows the error. BUG: kernel NULL pointer dereference, address: 000000000000000000 #PF: supervisor instruction fetch in kernel modede #PF: error_code(0x0010) - not-present pagege PGD 0 P4D 0 Oops: 0010 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 638 Comm: ip Kdump: loaded Not tainted 5.16.0-rc1+ torvalds#446 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffff888109f5b8f0 EFLAGS: 00010286^Ac RAX: 0000000000000000 RBX: ffff888109f5ba28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881008a2860 RBP: ffff888109f5b9d8 R08: 0000000000000000 R09: 0000000000000000 R10: ffff888109f5b978 R11: ffff888109f5b948 R12: 00000000ffffff9f R13: ffff8881008a2a80 R14: ffff8881008a2860 R15: ffff8881008a2840 FS: 00007f98de70f100(0000) GS:ffff88822bf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000100efc000 CR4: 00000000000006e0 Call Trace: <TASK> nh_create_ipv6+0xed/0x10c rtm_new_nexthop+0x6d7/0x13f3 ? check_preemption_disabled+0x3d/0xf2 ? lock_is_held_type+0xbe/0xfd rtnetlink_rcv_msg+0x23f/0x26a ? check_preemption_disabled+0x3d/0xf2 ? rtnl_calcit.isra.0+0x147/0x147 netlink_rcv_skb+0x61/0xb2 netlink_unicast+0x100/0x187 netlink_sendmsg+0x37f/0x3a0 ? netlink_unicast+0x187/0x187 sock_sendmsg_nosec+0x67/0x9b ____sys_sendmsg+0x19d/0x1f9 ? copy_msghdr_from_user+0x4c/0x5e ? rcu_read_lock_any_held+0x2a/0x78 ___sys_sendmsg+0x6c/0x8c ? asm_sysvec_apic_timer_interrupt+0x12/0x20 ? lockdep_hardirqs_on+0xd9/0x102 ? sockfd_lookup_light+0x69/0x99 __sys_sendmsg+0x50/0x6e do_syscall_64+0xcb/0xf2 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f98dea28914 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53 RSP: 002b:00007fff859f5e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e2e RAX: ffffffffffffffda RBX: 00000000619cb810 RCX: 00007f98dea28914 RDX: 0000000000000000 RSI: 00007fff859f5ed0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008 R10: fffffffffffffce6 R11: 0000000000000246 R12: 0000000000000001 R13: 000055c0097ae520 R14: 000055c0097957fd R15: 00007fff859f63a0 </TASK> Modules linked in: bridge stp llc bonding virtio_net Cc: [email protected] Fixes: 53010f9 ("nexthop: Add support for IPv6 gateways") Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

commit 1c74312 upstream. When we try to add an IPv6 nexthop and IPv6 is not enabled (!CONFIG_IPV6) we'll hit a NULL pointer dereference[1] in the error path of nh_create_ipv6() due to calling ipv6_stub->fib6_nh_release. The bug has been present since the beginning of IPv6 nexthop gateway support. Commit 1aefd3d ("ipv6: Add fib6_nh_init and release to stubs") tells us that only fib6_nh_init has a dummy stub because fib6_nh_release should not be called if fib6_nh_init returns an error, but the commit below added a call to ipv6_stub->fib6_nh_release in its error path. To fix it return the dummy stub's -EAFNOSUPPORT error directly without calling ipv6_stub->fib6_nh_release in nh_create_ipv6()'s error path. [1] Output is a bit truncated, but it clearly shows the error. BUG: kernel NULL pointer dereference, address: 000000000000000000 #PF: supervisor instruction fetch in kernel modede #PF: error_code(0x0010) - not-present pagege PGD 0 P4D 0 Oops: 0010 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 638 Comm: ip Kdump: loaded Not tainted 5.16.0-rc1+ torvalds#446 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffff888109f5b8f0 EFLAGS: 00010286^Ac RAX: 0000000000000000 RBX: ffff888109f5ba28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881008a2860 RBP: ffff888109f5b9d8 R08: 0000000000000000 R09: 0000000000000000 R10: ffff888109f5b978 R11: ffff888109f5b948 R12: 00000000ffffff9f R13: ffff8881008a2a80 R14: ffff8881008a2860 R15: ffff8881008a2840 FS: 00007f98de70f100(0000) GS:ffff88822bf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000100efc000 CR4: 00000000000006e0 Call Trace: <TASK> nh_create_ipv6+0xed/0x10c rtm_new_nexthop+0x6d7/0x13f3 ? check_preemption_disabled+0x3d/0xf2 ? lock_is_held_type+0xbe/0xfd rtnetlink_rcv_msg+0x23f/0x26a ? check_preemption_disabled+0x3d/0xf2 ? rtnl_calcit.isra.0+0x147/0x147 netlink_rcv_skb+0x61/0xb2 netlink_unicast+0x100/0x187 netlink_sendmsg+0x37f/0x3a0 ? netlink_unicast+0x187/0x187 sock_sendmsg_nosec+0x67/0x9b ____sys_sendmsg+0x19d/0x1f9 ? copy_msghdr_from_user+0x4c/0x5e ? rcu_read_lock_any_held+0x2a/0x78 ___sys_sendmsg+0x6c/0x8c ? asm_sysvec_apic_timer_interrupt+0x12/0x20 ? lockdep_hardirqs_on+0xd9/0x102 ? sockfd_lookup_light+0x69/0x99 __sys_sendmsg+0x50/0x6e do_syscall_64+0xcb/0xf2 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f98dea28914 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53 RSP: 002b:00007fff859f5e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e2e RAX: ffffffffffffffda RBX: 00000000619cb810 RCX: 00007f98dea28914 RDX: 0000000000000000 RSI: 00007fff859f5ed0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008 R10: fffffffffffffce6 R11: 0000000000000246 R12: 0000000000000001 R13: 000055c0097ae520 R14: 000055c0097957fd R15: 00007fff859f63a0 </TASK> Modules linked in: bridge stp llc bonding virtio_net Cc: [email protected] Fixes: 53010f9 ("nexthop: Add support for IPv6 gateways") Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

There's issue as follows when do IO fault injection: XFS (loop0): Filesystem has been shut down due to log error (0x2). XFS (loop0): Please unmount the filesystem and rectify the problem(s). XFS (loop0): Failed to recover intents XFS (loop0): log mount finish failed ================================================================== BUG: KASAN: slab-use-after-free in xfs_cui_release+0x20/0xe0 Read of size 4 at addr ffff88826d204098 by task kworker/u18:4/252 CPU: 7 PID: 252 Comm: kworker/u18:4 Not tainted 6.3.0-next torvalds#446 Workqueue: xfs-cil/loop0 xlog_cil_push_work Call Trace: <TASK> dump_stack_lvl+0xd9/0x150 print_report+0xc1/0x5e0 kasan_report+0x96/0xc0 kasan_check_range+0x13f/0x1a0 xfs_cui_release+0x20/0xe0 xfs_cud_item_release+0x37/0x80 xfs_trans_committed_bulk+0x337/0x8f0 xlog_cil_committed+0xc08/0x1010 xlog_cil_push_work+0x1cb4/0x25f0 process_one_work+0x9cf/0x1610 worker_thread+0x63a/0x10a0 kthread+0x343/0x440 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 8537: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x7f/0x90 slab_post_alloc_hook+0x65/0x560 kmem_cache_alloc+0x1cf/0x440 xfs_cui_init+0x1a8/0x1e0 xlog_recover_cui_commit_pass2+0x137/0x440 xlog_recover_items_pass2+0xd8/0x310 xlog_recover_commit_trans+0x958/0xb20 xlog_recovery_process_trans+0x13a/0x1e0 xlog_recover_process_ophdr+0x1e9/0x410 xlog_recover_process_data+0x19c/0x400 xlog_recover_process+0x257/0x2e0 xlog_do_recovery_pass+0x70c/0xf90 xlog_do_log_recovery+0x9e/0x100 xlog_do_recover+0xdf/0x5a0 xlog_recover+0x2a8/0x500 xfs_log_mount+0x36e/0x790 xfs_mountfs+0x133a/0x2220 xfs_fs_fill_super+0x1376/0x1f10 get_tree_bdev+0x44a/0x770 vfs_get_tree+0x8d/0x350 path_mount+0x1228/0x1cc0 do_mount+0xf7/0x110 __x64_sys_mount+0x193/0x230 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 8537: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x40 __kasan_slab_free+0x11f/0x1b0 kmem_cache_free+0xf2/0x670 xfs_cui_item_free+0xa4/0xc0 xfs_cui_release+0xa0/0xe0 xlog_recover_cancel_intents.isra.0+0xf4/0x200 xlog_recover_finish+0x87f/0xa20 xfs_log_mount_finish+0x325/0x570 xfs_mountfs+0x16ba/0x2220 xfs_fs_fill_super+0x1376/0x1f10 get_tree_bdev+0x44a/0x770 vfs_get_tree+0x8d/0x350 path_mount+0x1228/0x1cc0 do_mount+0xf7/0x110 __x64_sys_mount+0x193/0x230 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd As xlog_recover_process_intents() failed will trigger LOG shutdown will not push CIL. Then xlog_recover_finish() will remove log item from AIL list and release log item. However xlog_cil_push_work() perhaps concurrent processing this log item, then trigger UAF. To solve above issue, there's need to make sure all CIL is pushed before cancel intent item. Fixes: deb4cd8 ("xfs: transfer recovered intent item ownership in ->iop_recover") Signed-off-by: Ye Bin <[email protected]>

There's issue as follows when do IO fault injection: XFS (loop0): Filesystem has been shut down due to log error (0x2). XFS (loop0): Please unmount the filesystem and rectify the problem(s). XFS (loop0): Failed to recover intents XFS (loop0): log mount finish failed ================================================================== BUG: KASAN: slab-use-after-free in xfs_cui_release+0x20/0xe0 Read of size 4 at addr ffff88826d204098 by task kworker/u18:4/252 CPU: 7 PID: 252 Comm: kworker/u18:4 Not tainted 6.3.0-next torvalds#446 Workqueue: xfs-cil/loop0 xlog_cil_push_work Call Trace: <TASK> dump_stack_lvl+0xd9/0x150 print_report+0xc1/0x5e0 kasan_report+0x96/0xc0 kasan_check_range+0x13f/0x1a0 xfs_cui_release+0x20/0xe0 xfs_cud_item_release+0x37/0x80 xfs_trans_committed_bulk+0x337/0x8f0 xlog_cil_committed+0xc08/0x1010 xlog_cil_push_work+0x1cb4/0x25f0 process_one_work+0x9cf/0x1610 worker_thread+0x63a/0x10a0 kthread+0x343/0x440 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 8537: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x7f/0x90 slab_post_alloc_hook+0x65/0x560 kmem_cache_alloc+0x1cf/0x440 xfs_cui_init+0x1a8/0x1e0 xlog_recover_cui_commit_pass2+0x137/0x440 xlog_recover_items_pass2+0xd8/0x310 xlog_recover_commit_trans+0x958/0xb20 xlog_recovery_process_trans+0x13a/0x1e0 xlog_recover_process_ophdr+0x1e9/0x410 xlog_recover_process_data+0x19c/0x400 xlog_recover_process+0x257/0x2e0 xlog_do_recovery_pass+0x70c/0xf90 xlog_do_log_recovery+0x9e/0x100 xlog_do_recover+0xdf/0x5a0 xlog_recover+0x2a8/0x500 xfs_log_mount+0x36e/0x790 xfs_mountfs+0x133a/0x2220 xfs_fs_fill_super+0x1376/0x1f10 get_tree_bdev+0x44a/0x770 vfs_get_tree+0x8d/0x350 path_mount+0x1228/0x1cc0 do_mount+0xf7/0x110 __x64_sys_mount+0x193/0x230 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 8537: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x40 __kasan_slab_free+0x11f/0x1b0 kmem_cache_free+0xf2/0x670 xfs_cui_item_free+0xa4/0xc0 xfs_cui_release+0xa0/0xe0 xlog_recover_cancel_intents.isra.0+0xf4/0x200 xlog_recover_finish+0x87f/0xa20 xfs_log_mount_finish+0x325/0x570 xfs_mountfs+0x16ba/0x2220 xfs_fs_fill_super+0x1376/0x1f10 get_tree_bdev+0x44a/0x770 vfs_get_tree+0x8d/0x350 path_mount+0x1228/0x1cc0 do_mount+0xf7/0x110 __x64_sys_mount+0x193/0x230 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd As xlog_recover_process_intents() failed will trigger LOG shutdown will not push CIL. Then xlog_recover_finish() will remove log item from AIL list and release log item. However xlog_cil_push_work() perhaps concurrent processing this log item, then trigger UAF. To solve above issue, there's need to make sure all CIL is pushed before cancel intent item. Fixes: deb4cd8 ("xfs: transfer recovered intent item ownership in ->iop_recover") Signed-off-by: Ye Bin <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>

Avoid triggering a `refcount_t: addition on 0; use-after-free.` warning when registering a module clock with the same MSTOP configuration. The issue arises when a module clock is registered but not enabled, resulting in a `ref_cnt` of 0. Subsequent calls to `refcount_inc()` on such clocks cause the kernel to warn about use-after-free. [ 0.113529] ------------[ cut here ]------------ [ 0.113537] refcount_t: addition on 0; use-after-free. [ 0.113576] WARNING: CPU: 2 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0x120/0x144 [ 0.113602] Modules linked in: [ 0.113616] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc3+ torvalds#446 [ 0.113629] Hardware name: Renesas RZ/V2H EVK Board based on r9a09g057h44 (DT) [ 0.113641] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.113652] pc : refcount_warn_saturate+0x120/0x144 [ 0.113664] lr : refcount_warn_saturate+0x120/0x144 [ 0.113675] sp : ffff8000818aba90 [ 0.113682] x29: ffff8000818aba90 x28: ffff0000c0d96450 x27: ffff0000c0d96440 [ 0.113699] x26: 0000000000000014 x25: 0000000000051000 x24: ffff0000c0ad6480 [ 0.113714] x23: ffff0000c0d96200 x22: ffff800080fae558 x21: 00000000000001e0 [ 0.113730] x20: ffff0000c0b11c10 x19: ffff8000815ae6f0 x18: 0000000000000006 [ 0.113745] x17: ffff800081765368 x16: 0000000000000000 x15: 0765076507720766 [ 0.113760] x14: ffff8000816a3ea0 x13: 0765076507720766 x12: 072d077207650774 [ 0.113776] x11: ffff8000816a3ea0 x10: 00000000000000ce x9 : ffff8000816fbea0 [ 0.113791] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000816fbea0 [ 0.113806] x5 : 80000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000 [ 0.113821] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c0158000 [ 0.113837] Call trace: [ 0.113845] refcount_warn_saturate+0x120/0x144 (P) [ 0.113859] rzv2h_cpg_probe+0x7f8/0xa38 [ 0.113874] platform_probe+0x68/0xdc [ 0.113890] really_probe+0xbc/0x2c0 [ 0.113901] __driver_probe_device+0x78/0x120 [ 0.113912] driver_probe_device+0x3c/0x154 [ 0.113923] __driver_attach+0x90/0x1a0 [ 0.113933] bus_for_each_dev+0x7c/0xe0 [ 0.113944] driver_attach+0x24/0x30 [ 0.113954] bus_add_driver+0xe4/0x208 [ 0.113965] driver_register+0x68/0x124 [ 0.113975] __platform_driver_probe+0x54/0xd4 [ 0.113987] rzv2h_cpg_init+0x24/0x30 [ 0.113998] do_one_initcall+0x60/0x1d4 [ 0.114013] kernel_init_freeable+0x214/0x278 [ 0.114028] kernel_init+0x20/0x140 [ 0.114041] ret_from_fork+0x10/0x20 [ 0.114052] ---[ end trace 0000000000000000 ]--- Resolve this by checking the `ref_cnt` value before calling `refcount_inc()`. If `ref_cnt` is 0, reset it to 1 using `refcount_set()`. Fixes: 7bd4cb3 ("clk: renesas: rzv2h: Add MSTOP support") Signed-off-by: Lad Prabhakar <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]>

Avoid triggering a `refcount_t: addition on 0; use-after-free.` warning when registering a module clock with the same MSTOP configuration. The issue arises when a module clock is registered but not enabled, resulting in a `ref_cnt` of 0. Subsequent calls to `refcount_inc()` on such clocks cause the kernel to warn about use-after-free. [ 0.113529] ------------[ cut here ]------------ [ 0.113537] refcount_t: addition on 0; use-after-free. [ 0.113576] WARNING: CPU: 2 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0x120/0x144 [ 0.113602] Modules linked in: [ 0.113616] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc3+ torvalds#446 [ 0.113629] Hardware name: Renesas RZ/V2H EVK Board based on r9a09g057h44 (DT) [ 0.113641] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.113652] pc : refcount_warn_saturate+0x120/0x144 [ 0.113664] lr : refcount_warn_saturate+0x120/0x144 [ 0.113675] sp : ffff8000818aba90 [ 0.113682] x29: ffff8000818aba90 x28: ffff0000c0d96450 x27: ffff0000c0d96440 [ 0.113699] x26: 0000000000000014 x25: 0000000000051000 x24: ffff0000c0ad6480 [ 0.113714] x23: ffff0000c0d96200 x22: ffff800080fae558 x21: 00000000000001e0 [ 0.113730] x20: ffff0000c0b11c10 x19: ffff8000815ae6f0 x18: 0000000000000006 [ 0.113745] x17: ffff800081765368 x16: 0000000000000000 x15: 0765076507720766 [ 0.113760] x14: ffff8000816a3ea0 x13: 0765076507720766 x12: 072d077207650774 [ 0.113776] x11: ffff8000816a3ea0 x10: 00000000000000ce x9 : ffff8000816fbea0 [ 0.113791] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000816fbea0 [ 0.113806] x5 : 80000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000 [ 0.113821] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c0158000 [ 0.113837] Call trace: [ 0.113845] refcount_warn_saturate+0x120/0x144 (P) [ 0.113859] rzv2h_cpg_probe+0x7f8/0xa38 [ 0.113874] platform_probe+0x68/0xdc [ 0.113890] really_probe+0xbc/0x2c0 [ 0.113901] __driver_probe_device+0x78/0x120 [ 0.113912] driver_probe_device+0x3c/0x154 [ 0.113923] __driver_attach+0x90/0x1a0 [ 0.113933] bus_for_each_dev+0x7c/0xe0 [ 0.113944] driver_attach+0x24/0x30 [ 0.113954] bus_add_driver+0xe4/0x208 [ 0.113965] driver_register+0x68/0x124 [ 0.113975] __platform_driver_probe+0x54/0xd4 [ 0.113987] rzv2h_cpg_init+0x24/0x30 [ 0.113998] do_one_initcall+0x60/0x1d4 [ 0.114013] kernel_init_freeable+0x214/0x278 [ 0.114028] kernel_init+0x20/0x140 [ 0.114041] ret_from_fork+0x10/0x20 [ 0.114052] ---[ end trace 0000000000000000 ]--- Resolve this by checking the `ref_cnt` value before calling `refcount_inc()`. If `ref_cnt` is 0, reset it to 1 using `refcount_set()`. Fixes: 7bd4cb3 ("clk: renesas: rzv2h: Add MSTOP support") Signed-off-by: Lad Prabhakar <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Fixes: 7bd4cb3 ("clk: renesas: rzv2h: Add MSTOP support") Link: https://lore.kernel.org/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]>

sodar pushed a commit to sodar/linux that referenced this pull request Jul 19, 2021

Merge pull request torvalds#446 from ojeda/doc_cfg

bc33ea8

Support generating all docs, introduce `doc_cfg` and `doc_all`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

merge (#1) #446

merge (#1) #446

addstone commented Aug 10, 2017

KernelPRBot commented Aug 10, 2017

merge (#1) #446

Are you sure you want to change the base?

merge (#1) #446

Conversation

addstone commented Aug 10, 2017

KernelPRBot commented Aug 10, 2017

How do I format my contribution?

Who do I send my contribution to?

How do I send my contribution?

How do I get help if I'm stuck?

I sent my patch - now what?

Further information