Changes 27/12/21 #3

Merged
merged 132 commits into from
Dec 28, 2021

@Vylpes Vylpes commented Dec 28, 2021

No description provided.

rtg-canonical and others added 30 commits November 22, 2021 11:22
…xfer_start()

Coverity complains of an uninitialized variable:

5. uninit_use_in_call: Using uninitialized value config.dst_per when calling axi_chan_config_write. [show details]
6. uninit_use_in_call: Using uninitialized value config.hs_sel_src when calling axi_chan_config_write. [show details]
CID 121164 (#1-3 of 3): Uninitialized scalar variable (UNINIT)
7. uninit_use_in_call: Using uninitialized value config.src_per when calling axi_chan_config_write. [show details]
418        axi_chan_config_write(chan, &config);

Fix this by initializing the structure to 0 which should at least be benign in axi_chan_config_write(). Also fix
what looks like a cut-n-paste error when initializing config.hs_sel_dst.

Fixes: 8243516 ("dmaengine: dw-axi-dmac: support DMAX_NUM_CHANNELS > 8")
Cc: Eugeniy Paltsev <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Tim Gardner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
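
As a rough illustration of the zero-initialization fix described above, here is a minimal userspace sketch. The struct is a hypothetical stand-in: the field names dst_per, src_per, hs_sel_dst and hs_sel_src come from the Coverity report, while the types, the "prior" field and the function name are assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's channel config. Field names follow
 * the Coverity report; the types and the "prior" field are assumptions. */
struct chan_config_sketch {
	uint8_t dst_per;
	uint8_t src_per;
	uint8_t hs_sel_dst;
	uint8_t hs_sel_src;
	uint8_t prior;
};

/* Fill in only one field; every other member must still hold a defined value. */
struct chan_config_sketch make_config_sketch(uint8_t prior)
{
	struct chan_config_sketch config = {0}; /* zero every member up front */

	assert(config.dst_per == 0 && config.src_per == 0);
	config.prior = prior;
	return config;
}
```

With the initializer in place, a consumer that reads all fields never sees indeterminate values, even for members the caller did not set.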
…ent()

The commit in the Fixes: tag has changed the logic of the code and now it
is likely that the probe will return an early success (0), even if not
completely executed.

This should lead to a crash or similar issue later on when the code
accesses resources that were never allocated.

Change the '!err' into a 'err' when checking if
'dma_set_mask_and_coherent()' has failed or not.

While at it, simplify the code and remove the fallback to the 32-bit DMA
mask, which can never succeed.
As stated in [1], 'dma_set_mask_and_coherent(DMA_BIT_MASK(64))' can't fail
if 'dev->dma_mask' is non-NULL. And if it is NULL, it would fail for the
same reason when tried with DMA_BIT_MASK(32).

[1]: https://lkml.org/lkml/2021/6/7/398

Fixes: ecb8c88 ("dmaengine: dw-edma-pcie: switch from 'pci_' to 'dma_' API")
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/935fbb40ae930c5fe87482a41dcb73abf2257973.1636492127.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Vinod Koul <[email protected]>
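
The polarity bug described above can be sketched in plain C. Everything here is hypothetical: set_mask_sketch() merely stands in for dma_set_mask_and_coherent() (0 on success, negative errno on failure), and the resources_ready flag stands in for the rest of the probe.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for dma_set_mask_and_coherent():
 * returns 0 on success and a negative errno on failure. */
static int set_mask_sketch(int has_dma_mask)
{
	return has_dma_mask ? 0 : -EIO;
}

/* Buggy polarity: returns early with 0 exactly when the call succeeded. */
int probe_buggy_sketch(int has_dma_mask, int *resources_ready)
{
	int err = set_mask_sketch(has_dma_mask);

	assert(resources_ready);
	*resources_ready = 0;
	if (!err)
		return err;	/* early "success", nothing was allocated */
	*resources_ready = 1;
	return 0;
}

/* Fixed polarity: bail out only when the call actually failed. */
int probe_fixed_sketch(int has_dma_mask, int *resources_ready)
{
	int err = set_mask_sketch(has_dma_mask);

	assert(resources_ready);
	*resources_ready = 0;
	if (err)
		return err;
	*resources_ready = 1;	/* the rest of the probe runs here */
	return 0;
}
```

In the buggy variant the caller sees a return value of 0 while the resources were never set up, which is the early-success case the commit message warns about.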
Dan reports that smatch has found idxd_wq_quiesce() is being called inside
the idxd->dev_lock. idxd_wq_quiesce() calls wait_for_completion() and
therefore it can sleep. Move the call outside of the spinlock as it does
not need device lock.

Fixes: 5b0c68c ("dmaengine: idxd: support reporting of halt interrupt")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/163716858508.1721911.15051495873516709923.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <[email protected]>
Per HiFive Unleashed schematics, the card detect signal of the
micro SD card is connected to gpio pin #11, which should be
reflected in the DT via the <gpios> property, as described in
Documentation/devicetree/bindings/mmc/mmc-spi-slot.txt.

[1] https://sifive.cdn.prismic.io/sifive/c52a8e32-05ce-4aaf-95c8-7bf8453f8698_hifive-unleashed-a00-schematics-1.pdf

Signed-off-by: Bin Meng <[email protected]>
Fixes: d573b55 ("riscv: dts: add initial board data for the SiFive HiFive Unmatched")
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
Per HiFive Unmatched schematics, the card detect signal of the
micro SD card is connected to gpio pin #15, which should be
reflected in the DT via the <gpios> property, as described in
Documentation/devicetree/bindings/mmc/mmc-spi-slot.txt.

[1] https://sifive.cdn.prismic.io/sifive/6a06d6c0-6e66-49b5-8e9e-e68ce76f4192_hifive-unmatched-schematics-v3.pdf

Signed-off-by: Bin Meng <[email protected]>
Fixes: d573b55 ("riscv: dts: add initial board data for the SiFive HiFive Unmatched")
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
When CONFIG_FSL_PMC is set to n, no value is assigned to cpu_up_prepare
in the mpc85xx_pm_ops structure. As a result, an oops is triggered in
smp_85xx_start_cpu().

  smp: Bringing up secondary CPUs ...
  kernel tried to execute user page (0) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  Faulting instruction address: 0x00000000
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [00000000] 0x0
  LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568
  Call Trace:
  [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable)
  [c1051de8] [c0011460] __cpu_up+0xc0/0x228
  [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224
  [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c
  [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8
  [c1051eb8] [c07e67bc] smp_init+0x30/0x78
  [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8
  [c1051f18] [c00032d8] kernel_init+0x14/0x124
  [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c

Fixes: c45361a ("powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n")
Reported-by: Martin Kennedy <[email protected]>
Signed-off-by: Xiaoming Ni <[email protected]>
Tested-by: Martin Kennedy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
While converting bindings to dtschema, the buck regulators lost
"op_mode" property.  The "op_mode" is a valid property for all
regulators (both LDOs and bucks), so add it.

Reported-by: Rob Herring <[email protected]>
Fixes: fab58de ("regulator: dt-bindings: samsung,s5m8767: convert to dtschema")
Cc: <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
The counterpart of clk_prepare() is clk_unprepare(), not
clk_disable_unprepare().

Fix this by changing clk_disable_unprepare to clk_unprepare.

Fixes: 5762ab7 ("spi: Add support for Armada 3700 SPI Controller")
Signed-off-by: Dongliang Mu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
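
To make the pairing rule above concrete, here is a small userspace model with reference counters. The struct and the *_sketch helpers are hypothetical stand-ins for the clk API, and the use_fix switch exists only to contrast the two error paths.

```c
#include <assert.h>

/* Hypothetical model of a clock with separate prepare and enable counts. */
struct clk_sketch {
	int prepare_count;
	int enable_count;
};

static void clk_prepare_sketch(struct clk_sketch *c)   { c->prepare_count++; }
static void clk_unprepare_sketch(struct clk_sketch *c) { c->prepare_count--; }
static void clk_disable_sketch(struct clk_sketch *c)   { c->enable_count--; }

/* Models a probe that prepared the clock (but never enabled it) and then
 * hits an error. use_fix selects the corrected cleanup. */
int probe_error_path_sketch(struct clk_sketch *clk, int use_fix)
{
	assert(clk);
	clk_prepare_sketch(clk);

	/* ... a later probe step fails before the clock is enabled ... */
	if (use_fix) {
		clk_unprepare_sketch(clk);	/* undoes exactly what was done */
	} else {
		clk_disable_sketch(clk);	/* disable without a prior enable */
		clk_unprepare_sketch(clk);
	}
	return -1;
}
```

With the old cleanup the enable count drops below zero, which is the imbalance that switching to the plain unprepare call avoids.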
After commit 9f76779 ("MIPS: implement architecture-specific
'pci_remap_iospace()'"), there exists the following warning on the
Loongson64 platform:

    loongson-pci 1a000000.pci:       IO 0x0018020000..0x001803ffff -> 0x0000020000
    loongson-pci 1a000000.pci:      MEM 0x0040000000..0x007fffffff -> 0x0040000000
    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 1 at arch/mips/pci/pci-generic.c:55 pci_remap_iospace+0x84/0x90
    resource start address is not zero
    ...
    Call Trace:
    [<ffffffff8020dc78>] show_stack+0x40/0x120
    [<ffffffff80cf4a0c>] dump_stack_lvl+0x58/0x74
    [<ffffffff8023a0b0>] __warn+0xe0/0x110
    [<ffffffff80cee02c>] warn_slowpath_fmt+0xa4/0xd0
    [<ffffffff80cecf24>] pci_remap_iospace+0x84/0x90
    [<ffffffff807f9864>] devm_pci_remap_iospace+0x5c/0xb8
    [<ffffffff808121b0>] devm_of_pci_bridge_init+0x178/0x1f8
    [<ffffffff807f4000>] devm_pci_alloc_host_bridge+0x78/0x98
    [<ffffffff80819454>] loongson_pci_probe+0x34/0x160
    [<ffffffff809203cc>] platform_probe+0x6c/0xe0
    [<ffffffff8091d5d4>] really_probe+0xbc/0x340
    [<ffffffff8091d8f0>] __driver_probe_device+0x98/0x110
    [<ffffffff8091d9b8>] driver_probe_device+0x50/0x118
    [<ffffffff8091dea0>] __driver_attach+0x80/0x118
    [<ffffffff8091b280>] bus_for_each_dev+0x80/0xc8
    [<ffffffff8091c6d8>] bus_add_driver+0x130/0x210
    [<ffffffff8091ead4>] driver_register+0x8c/0x150
    [<ffffffff80200a8c>] do_one_initcall+0x54/0x288
    [<ffffffff811a5320>] kernel_init_freeable+0x27c/0x2e4
    [<ffffffff80cfc380>] kernel_init+0x2c/0x134
    [<ffffffff80205a2c>] ret_from_kernel_thread+0x14/0x1c
    ---[ end trace e4a0efe10aa5cce6 ]---
    loongson-pci 1a000000.pci: error -19: failed to map resource [io  0x20000-0x3ffff]

We can see that the resource start address is 0x0000020000, because
the ISA Bridge used the zero address which is defined in the dts file
arch/mips/boot/dts/loongson/ls7a-pch.dtsi:

    ISA Bridge: /bus@10000000/isa@18000000
    IO 0x0000000018000000..0x000000001801ffff  ->  0x0000000000000000

Based on the above analysis, the architecture-specific pci_remap_iospace()
is not suitable for Loongson64. Given the background of that commit, we
should only define pci_remap_iospace() for Ralink on MIPS.

Fixes: 9f76779 ("MIPS: implement architecture-specific 'pci_remap_iospace()'")
Suggested-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Tiezhu Yang <[email protected]>
Tested-by: Sergio Paracuellos <[email protected]>
Acked-by: Sergio Paracuellos <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
This reverts commit b3484d2.

That change attempted to improve the DRM drivers' fbdev emulation device
names to avoid having confusing names like "simpledrmdrmfb" in /proc/fb.

But unfortunately, there are user-space programs such as pm-utils that
match against the fbdev names and so broke after the mentioned commit.

Since the names in /proc/fb are used by tools that consider them uAPI,
let's restore the old names even though this leads to silly names like the
one mentioned above.

Fixes: b3484d2 ("drm/fb-helper: improve DRM fbdev emulation device names")
Reported-by: Johannes Stezenbach <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Smatch reports the warnings below [1] about dereferencing rm_res when it can
potentially be an ERR_PTR(). This is possible when the entire range is
allocated to Linux.
Fix this by making sure rm_res is not dereferenced when it is an
ERR_PTR().

[1]:
 drivers/dma/ti/k3-udma.c:4524 udma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR()
 drivers/dma/ti/k3-udma.c:4537 udma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR()
 drivers/dma/ti/k3-udma.c:4681 bcdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR()
 drivers/dma/ti/k3-udma.c:4696 bcdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR()
 drivers/dma/ti/k3-udma.c:4711 bcdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR()
 drivers/dma/ti/k3-udma.c:4848 pktdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR()
 drivers/dma/ti/k3-udma.c:4861 pktdma_setup_resources() error: 'rm_res' dereferencing possible ERR_PTR()

Reported-by: Nishanth Menon <[email protected]>
Signed-off-by: Vignesh Raghavendra <[email protected]>
Acked-by: Peter Ujfalusi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Ming reported that with the abort path of the descriptor submission, there
can be a window where a completed descriptor is missed by the irq
completion thread:

CPU A				CPU B
Submit (successful)

Submit (fail)
				irq_process_work_list() // empty

llist_abort_desc()
// remove all descs from pending list

				irq_process_pending_llist() // empty
				exit idxd_wq_thread() with no processing

Add opportunistic descriptor completion in the abort path in order to
remove the missed completion.

Fixes: 6b4b87f ("dmaengine: idxd: fix submission race window")
Reported-by: Ming Li <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/163898288714.443911.16084982766671976640.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <[email protected]>
modprobe can't handle spaces in aliases.

Fixes: 6b4cd72 ("dmaengine: st_fdma: Add STMicroelectronics FDMA engine driver support")
Signed-off-by: Alyssa Ross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Pixel clock has to be set in kHz.

Signed-off-by: Alejandro Concepcion-Rodriguez <[email protected]>
Fixes: 11e8f5f ("drm: Add simpledrm driver")
Signed-off-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
When listening for notifications through netlink of a new interface being
registered, sporadically, it is possible for the MAC to be read as zero.
The zero MAC address lasts a short period of time and then switches to a
valid random MAC address.

This causes problems for netd in Android, which assumes that the interface
is malfunctioning and will not use it.

In the good case we get this log:
InterfaceController::getCfg() ifName usb0
 hwAddr 92:a8:f0:73:79:5b ipv4Addr 0.0.0.0 flags 0x1002

In the error case we get these logs:
InterfaceController::getCfg() ifName usb0
 hwAddr 00:00:00:00:00:00 ipv4Addr 0.0.0.0 flags 0x1002

netd : interfaceGetCfg("usb0")
netd : interfaceSetCfg() -> ServiceSpecificException
 (99, "[Cannot assign requested address] : ioctl() failed")

The reason for the issue is the order in which the interface is set up:
it is first registered through register_netdev() and only afterwards is the
MAC address set.

Fix this by first setting the MAC address of the net_device and then
calling register_netdev().

Fixes: bcd4a1c ("usb: gadget: u_ether: construct with default values and add setters/getters")
Cc: [email protected]
Signed-off-by: Marian Postevca <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
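
The ordering problem described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct netdev_sketch and all *_sketch functions are hypothetical, and the "snapshot" field models what a netlink listener would read at registration time.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a network device. */
struct netdev_sketch {
	unsigned char mac[6];
	unsigned char mac_seen_by_listeners[6];	/* snapshot at registration */
	int registered;
};

/* Registration notifies listeners, who read the MAC as it is right now. */
static void register_netdev_sketch(struct netdev_sketch *dev)
{
	memcpy(dev->mac_seen_by_listeners, dev->mac, sizeof(dev->mac));
	dev->registered = 1;
}

static void set_mac_sketch(struct netdev_sketch *dev, const unsigned char *mac)
{
	memcpy(dev->mac, mac, sizeof(dev->mac));
}

/* Old order: register first, so listeners can observe an all-zero MAC. */
void bring_up_buggy_sketch(struct netdev_sketch *dev, const unsigned char *mac)
{
	register_netdev_sketch(dev);
	set_mac_sketch(dev, mac);
}

/* New order: the MAC is valid before anyone is notified. */
void bring_up_fixed_sketch(struct netdev_sketch *dev, const unsigned char *mac)
{
	set_mac_sketch(dev, mac);
	register_netdev_sketch(dev);
}

int listeners_saw_zero_mac_sketch(const struct netdev_sketch *dev)
{
	static const unsigned char zero[6] = {0};

	assert(dev);
	return memcmp(dev->mac_seen_by_listeners, zero, sizeof(zero)) == 0;
}
```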
A 'list_del corruption' warning is reported when list debugging is enabled
(CONFIG_DEBUG_LIST=y); fix it by using list_del_init().

Fixes: 4ce1866 ("usb: xhci-mtk: Do not use xhci's virt_dev in drop_endpoint")
Cc: stable <[email protected]>
Signed-off-by: Chunfeng Yun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
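
One common way such a warning arises is deleting the same entry twice; whether that is the exact path in xhci-mtk is not stated here, so treat the following as an assumption. The sketch is a userspace re-implementation of a circular doubly linked list with hypothetical *_sketch names, showing why re-initializing the entry on delete makes a second delete harmless.

```c
#include <assert.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node_sketch {
	struct list_node_sketch *next, *prev;
};

void list_init_sketch(struct list_node_sketch *n)
{
	n->next = n;
	n->prev = n;
}

void list_add_sketch(struct list_node_sketch *entry,
		     struct list_node_sketch *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink the entry and leave it pointing at itself, so that a repeated
 * delete only touches the entry and never its former neighbours. */
void list_del_init_sketch(struct list_node_sketch *entry)
{
	assert(entry->next && entry->prev);
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init_sketch(entry);
}

int list_empty_sketch(const struct list_node_sketch *head)
{
	return head->next == head;
}
```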
This patch restricts calls to the cdnsp_died() function during module
removal or software disconnect.
The function was called because, after the controller transitions to the
HALT state, the driver starts handling the deferred interrupt.
In this case such an interrupt can simply be ignored.

Fixes: 3d82904 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <[email protected]>
Reviewed-by: Peter Chen <[email protected]>
Signed-off-by: Pawel Laszczak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Patch fixes incorrect order of __entry->stream_id and __entry->state
parameters in TP_printk macro.

Fixes: 3d82904 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <[email protected]>
Reviewed-by: Peter Chen <[email protected]>
Signed-off-by: Pawel Laszczak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Patch fixes incorrect status for control request.
Without this fix all usb_request objects were returned to upper drivers
with the usb_request->status field set to -EINPROGRESS.

Fixes: 3d82904 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <[email protected]>
Reported-by: Ken (Jian) He <[email protected]>
Reviewed-by: Peter Chen <[email protected]>
Signed-off-by: Pawel Laszczak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
We have two io-wq creation paths:

- On queue enqueue
- When a worker goes to sleep

The latter invokes worker creation with the wqe->lock held, but that can
run into problems if we end up exiting and need to cancel the queued work.
syzbot caught this:

============================================
WARNING: possible recursive locking detected
5.16.0-rc4-syzkaller #0 Not tainted
--------------------------------------------
iou-wrk-6468/6471 is trying to acquire lock:
ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187

but task is already holding lock:
ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&wqe->lock);
  lock(&wqe->lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by iou-wrk-6468/6471:
 #0: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700

stack backtrace:
CPU: 1 PID: 6471 Comm: iou-wrk-6468 Not tainted 5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106
 print_deadlock_bug kernel/locking/lockdep.c:2956 [inline]
 check_deadlock kernel/locking/lockdep.c:2999 [inline]
 validate_chain+0x5984/0x8240 kernel/locking/lockdep.c:3788
 __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5027
 lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5637
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
 io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187
 io_wq_cancel_tw_create fs/io-wq.c:1220 [inline]
 io_queue_worker_create+0x3cf/0x4c0 fs/io-wq.c:372
 io_wq_worker_sleeping+0xbe/0x140 fs/io-wq.c:701
 sched_submit_work kernel/sched/core.c:6295 [inline]
 schedule+0x67/0x1f0 kernel/sched/core.c:6323
 schedule_timeout+0xac/0x300 kernel/time/timer.c:1857
 wait_woken+0xca/0x1b0 kernel/sched/wait.c:460
 unix_msg_wait_data net/unix/unix_bpf.c:32 [inline]
 unix_bpf_recvmsg+0x7f9/0xe20 net/unix/unix_bpf.c:77
 unix_stream_recvmsg+0x214/0x2c0 net/unix/af_unix.c:2832
 sock_recvmsg_nosec net/socket.c:944 [inline]
 sock_recvmsg net/socket.c:962 [inline]
 sock_read_iter+0x3a7/0x4d0 net/socket.c:1035
 call_read_iter include/linux/fs.h:2156 [inline]
 io_iter_do_read fs/io_uring.c:3501 [inline]
 io_read fs/io_uring.c:3558 [inline]
 io_issue_sqe+0x144c/0x9590 fs/io_uring.c:6671
 io_wq_submit_work+0x2d8/0x790 fs/io_uring.c:6836
 io_worker_handle_work+0x808/0xdd0 fs/io-wq.c:574
 io_wqe_worker+0x395/0x870 fs/io-wq.c:630
 ret_from_fork+0x1f/0x30

We can safely drop the lock before doing work creation, making the two
contexts the same in that regard.

Reported-by: [email protected]
Fixes: 71a8538 ("io-wq: check for wq exit after adding new worker task_work")
Signed-off-by: Jens Axboe <[email protected]>
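
The recursive-locking report above boils down to a callback that takes a lock its caller already holds. The following single-threaded sketch models that with a non-recursive lock that reports -EDEADLK instead of hanging. All names are hypothetical and only loosely mirror the functions in the trace.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical non-recursive lock that reports self-deadlock. */
struct lock_sketch {
	int held;
};

static int lock_acquire_sketch(struct lock_sketch *l)
{
	if (l->held)
		return -EDEADLK;	/* a real spinlock would spin forever */
	l->held = 1;
	return 0;
}

static void lock_release_sketch(struct lock_sketch *l)
{
	l->held = 0;
}

/* The cancel callback takes the lock itself. */
static int cancel_cb_sketch(struct lock_sketch *l)
{
	int ret = lock_acquire_sketch(l);

	if (ret)
		return ret;
	lock_release_sketch(l);
	return 0;
}

/* Worker creation that ends up cancelling, as when the queue is exiting. */
static int create_worker_sketch(struct lock_sketch *l)
{
	return cancel_cb_sketch(l);
}

/* Old behaviour: creation is invoked with the lock still held. */
int worker_sleeping_buggy_sketch(struct lock_sketch *l)
{
	int ret;

	assert(l);
	if (lock_acquire_sketch(l))
		return -EDEADLK;
	ret = create_worker_sketch(l);
	lock_release_sketch(l);
	return ret;
}

/* New behaviour: decide under the lock, drop it, then create. */
int worker_sleeping_fixed_sketch(struct lock_sketch *l)
{
	assert(l);
	if (lock_acquire_sketch(l))
		return -EDEADLK;
	/* ... check under the lock whether a new worker is needed ... */
	lock_release_sketch(l);
	return create_worker_sketch(l);
}
```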
The driver supports a "direct" mode of operation, where the SMP req frame
is directly copied into the command payload (and vice-versa for the SMP
resp).

To get at the SMP req frame data in the scatterlist the driver uses
phys_to_virt() on the DMA mapped memory dma_addr_t. This is broken, and
subsequently crashes as follows when an IOMMU is enabled:

 Unable to handle kernel paging request at virtual address
ffff0000fcebfb00
	...
 pc : pm80xx_chip_smp_req+0x2d0/0x3d0
 lr : pm80xx_chip_smp_req+0xac/0x3d0
 pm80xx_chip_smp_req+0x2d0/0x3d0
 pm8001_task_exec.constprop.0+0x368/0x520
 pm8001_queue_command+0x1c/0x30
 smp_execute_task_sg+0xdc/0x204
 sas_discover_expander.part.0+0xac/0x6cc
 sas_discover_root_expander+0x8c/0x150
 sas_discover_domain+0x3ac/0x6a0
 process_one_work+0x1d0/0x354
 worker_thread+0x13c/0x470
 kthread+0x17c/0x190
 ret_from_fork+0x10/0x20
 Code: 371806e1 910006d6 6b16033f 54000249 (38766b05)
 ---[ end trace b91d59aaee98ea2d ]---
note: kworker/u192:0[7] exited with preempt_count 1

Instead use kmap_atomic().

--
Difference to v1:
- use kmap_atomic() in both locations
Difference to  v2:
- add whitespace around arithmetic (Damien)

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
The return value of kzalloc() needs to be checked to avoid using the
invalid pointer '&ast_state->base' when the allocation fails.

Fixes: f0adbc3 ("drm/ast: Allocate initial CRTC state of the correct size")
Signed-off-by: Jiasheng Jiang <[email protected]>
Signed-off-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
When generalising GPIO support and adding support for CP2102N, the GPIO
registration for some CP2105 devices accidentally broke. Specifically,
when all the pins of a port are in "modem" mode, and thus unavailable
for GPIO use, the GPIO chip would now be registered without having
initialised the number of GPIO lines. This would in turn be rejected by
gpiolib and some error messages would be printed (but importantly probe
would still succeed).

Fix this by initialising the number of GPIO lines before registering the
GPIO chip.

Note that as for the other device types, and as when all CP2105 pins are
muxed for LED function, the GPIO chip is registered also when no pins
are available for GPIO use.

Reported-by: Maarten Brock <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: c8acfe0 ("USB: serial: cp210x: implement GPIO support for CP2102N")
Cc: [email protected]      # 4.19
Cc: Karoly Pados <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Tested-by: Maarten Brock <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Add the following Telit FN990 compositions:

0x1070: tty, adb, rmnet, tty, tty, tty, tty
0x1071: tty, adb, mbim, tty, tty, tty, tty
0x1072: rndis, tty, adb, tty, tty, tty, tty
0x1073: tty, adb, ecm, tty, tty, tty, tty

Signed-off-by: Daniele Palmas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Johan Hovold <[email protected]>
…tive.

smatch warning:
drivers/gpu/drm/i915/display/intel_dmc.c:601 parse_dmc_fw() warn:
unsigned 'fw->size - offset' is never less than zero

Firmware size is size_t and offset is u32. So the subtraction is
unsigned which can never be less than zero.

Fixes: 3d5928a ("drm/i915/xelpd: Pipe A DMC plugging")
Signed-off-by: Harshit Mogalapalli <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Signed-off-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 87bb2a4)
Signed-off-by: Rodrigo Vivi <[email protected]>
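
The smatch warning above follows from C's unsigned arithmetic: subtracting a larger u32 from a smaller size_t wraps around instead of going negative. The sketch below shows the wrap and one standard way to check bounds without subtracting. The function names are hypothetical, and the comparison shown is not necessarily the form the actual fix uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned subtraction: wraps to a huge value when offset > fw_size,
 * so a "< 0" test on the result can never be true. */
size_t remaining_unsigned_sketch(size_t fw_size, uint32_t offset)
{
	return fw_size - offset;
}

/* Compare before subtracting instead of testing the difference's sign. */
int offset_in_bounds_sketch(size_t fw_size, uint32_t offset)
{
	int ok = offset <= fw_size;

	assert(ok == 0 || ok == 1);
	return ok;
}
```

For example, with a size of 4 and an offset of 8 the difference is a value close to SIZE_MAX rather than -4, while the direct comparison correctly reports the offset as out of bounds.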
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_instruction().

R_PPC_REL24 is the only relocation type generated by the kpatch-build
userspace tool or klp-convert kernel tree that I observed applying a
relocation to a post-init module.

A more comprehensive solution is planned, but using patch_instruction()
for R_PPC_REL24 should serve as a sufficient fix.

This does have a performance impact: I observed ~15% overhead in
module_load() on POWER8 bare metal with checksum verification off.

Fixes: c35717c ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Cc: [email protected] # v5.14+
Reported-by: Joe Lawrence <[email protected]>
Signed-off-by: Russell Currey <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
[mpe: Check return codes from patch_instruction()]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Masking all unused MSI-X entries is done to ensure that a crash kernel
starts from a clean slate, which corresponds to the reset state of the
device as defined in the PCI-E specification 3.0 and later:

 Vector Control for MSI-X Table Entries
 --------------------------------------

 "00: Mask bit:  When this bit is set, the function is prohibited from
                 sending a message using this MSI-X Table entry.
                 ...
                 This bit’s state after reset is 1 (entry is masked)."

A Marvell NVME device fails to deliver MSI interrupts after trying to
enable MSI-X interrupts due to that masking. It seems to take the MSI-X
mask bits into account even when MSI-X is disabled.

While not specification compliant, this can be cured by moving the masking
into the success path, so that the MSI-X table entries stay in device reset
state when the MSI-X setup fails.

[ tglx: Move it into the success path, add comment and amend changelog ]

Fixes: aa8092c ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Stefan Roese <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Bjorn Helgaas <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Marek Vasut <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
PCI_MSIX_FLAGS_MASKALL is set in the MSI-X control register at MSI-X
interrupt setup time. It's cleared on success, but the error handling path
only clears the PCI_MSIX_FLAGS_ENABLE bit.

That's incorrect as the reset state of the PCI_MSIX_FLAGS_MASKALL bit is
zero. That can be observed via lspci:

        Capabilities: [b0] MSI-X: Enable- Count=67 Masked+

Clear the bit in the error path to restore the reset state.

Fixes: 4385539 ("PCI/MSI: Enable and mask MSI-X early")
Reported-by: Stefan Roese <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Stefan Roese <[email protected]>
Cc: [email protected]
Cc: Bjorn Helgaas <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Marek Vasut <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/87tufevoqx.ffs@tglx
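
The error-path fix above can be pictured as plain bit manipulation on the MSI-X control word. This is a sketch only: the two flag values match the kernel's pci_regs.h, but msix_setup_sketch() and its setup_ok parameter are hypothetical and model just the control register, not real config-space access.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_MSIX_FLAGS_MASKALL	0x4000
#define PCI_MSIX_FLAGS_ENABLE	0x8000

/* Returns the control word left behind after a setup attempt. */
uint16_t msix_setup_sketch(uint16_t ctrl, int setup_ok)
{
	/* Setup starts by enabling MSI-X with all vectors masked. */
	ctrl = (uint16_t)(ctrl | PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);

	if (!setup_ok) {
		/* Error path: clear both bits to restore the reset state. */
		ctrl = (uint16_t)(ctrl & ~(PCI_MSIX_FLAGS_ENABLE |
					   PCI_MSIX_FLAGS_MASKALL));
		assert((ctrl & PCI_MSIX_FLAGS_MASKALL) == 0);
		return ctrl;
	}

	/* Success path: keep MSI-X enabled, drop the global mask. */
	return (uint16_t)(ctrl & ~PCI_MSIX_FLAGS_MASKALL);
}
```

Clearing only the enable bit on failure would leave 0x4000 set, which corresponds to the "Enable- ... Masked+" state shown in the lspci output above.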
The donation calculation logic assumes that the donor has non-zero
after-donation hweight, so the lowest active hweight a donating cgroup can
have is 2 so that it can donate 1 while keeping the other 1 for itself.
Earlier, we only donated from cgroups with sizable surpluses so this
condition was always true. However, with the precise donation algorithm
implemented, f1de243 ("blk-iocost: revamp donation amount
determination") made the donation amount calculation exact, enabling even
low hweight cgroups to donate.

This means that on rare occasions, a cgroup with an active hweight of 1 can
enter donation calculation, triggering the following warning and then a
divide-by-zero oops.

 WARNING: CPU: 4 PID: 0 at block/blk-iocost.c:1928 transfer_surpluses.cold+0x0/0x53 [884/94867]
 ...
 RIP: 0010:transfer_surpluses.cold+0x0/0x53
 Code: 92 ff 48 c7 c7 28 d1 ab b5 65 48 8b 34 25 00 ae 01 00 48 81 c6 90 06 00 00 e8 8b 3f fe ff 48 c7 c0 ea ff ff ff e9 95 ff 92 ff <0f> 0b 48 c7 c7 30 da ab b5 e8 71 3f fe ff 4c 89 e8 4d 85 ed 74 04
 ...
 Call Trace:
  <IRQ>
  ioc_timer_fn+0x1043/0x1390
  call_timer_fn+0xa1/0x2c0
  __run_timers.part.0+0x1ec/0x2e0
  run_timer_softirq+0x35/0x70
 ...
 iocg: invalid donation weights in /a/b: active=1 donating=1 after=0

Fix it by excluding cgroups w/ active hweight < 2 from donating. Excluding
these extreme low hweight donations shouldn't affect work conservation in
any meaningful way.

Signed-off-by: Tejun Heo <[email protected]>
Fixes: f1de243 ("blk-iocost: revamp donation amount determination")
Cc: [email protected] # v5.10+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(),
but when the function returns in line 1184 (#4), the victim_name allocated
by line 1169 (#3) is not freed, which leads to a memory leak.
There is a similar snippet in this function that allocates a memory
chunk for victim_name in line 1104 (#1) and releases the memory
in line 1116 (#2).

We should kfree() victim_name when the return value of backref_in_log()
is less than zero and before the function returns in line 1184 (#4).

1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
1058 				  struct btrfs_root *root,
1059 				  struct btrfs_path *path,
1060 				  struct btrfs_root *log_root,
1061 				  struct btrfs_inode *dir,
1062 				  struct btrfs_inode *inode,
1063 				  u64 inode_objectid, u64 parent_objectid,
1064 				  u64 ref_index, char *name, int namelen,
1065 				  int *search_done)
1066 {

1104 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
	// #1: kmalloc (victim_name-1)
1105 	if (!victim_name)
1106 		return -ENOMEM;

1112	ret = backref_in_log(log_root, &search_key,
1113			parent_objectid, victim_name,
1114			victim_name_len);
1115	if (ret < 0) {
1116		kfree(victim_name); // #2: kfree (victim_name-1)
1117		return ret;
1118	} else if (!ret) {

1169 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
	// #3: kmalloc (victim_name-2)
1170 	if (!victim_name)
1171 		return -ENOMEM;

1180 	ret = backref_in_log(log_root, &search_key,
1181 			parent_objectid, victim_name,
1182 			victim_name_len);
1183 	if (ret < 0) {
1184 		return ret; // #4: missing kfree (victim_name-2)
1185 	} else if (!ret) {

1241 	return 0;
1242 }

Fixes: d3316c8 ("btrfs: Properly handle backref_in_log retval")
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Jianglei Nie <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
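
The leak pattern described above, and the fix of freeing on the early-return path, can be sketched in userspace with malloc() and an allocation counter. All names are hypothetical: backref_ret stands in for the return value of backref_in_log(), and the counter exists only so the leak is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs_sketch;	/* number of buffers not yet freed */

static void *tracked_alloc_sketch(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs_sketch++;
	return p;
}

static void tracked_free_sketch(void *p)
{
	if (p) {
		free(p);
		live_allocs_sketch--;
	}
}

/* Models the second victim_name block with the fix applied. */
int add_inode_ref_sketch(int backref_ret)
{
	char *victim_name = tracked_alloc_sketch(16);
	int ret;

	if (!victim_name)
		return -ENOMEM;

	ret = backref_ret;	/* stand-in for backref_in_log() */
	if (ret < 0) {
		tracked_free_sketch(victim_name);	/* the added free */
		return ret;
	}

	tracked_free_sketch(victim_name);
	return 0;
}

int live_alloc_count_sketch(void)
{
	assert(live_allocs_sketch >= 0);
	return live_allocs_sketch;
}
```

Without the free on the error branch, each failing call would leave the counter one higher, which is the leak the commit closes.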
torvalds and others added 28 commits December 19, 2021 11:40
…ernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:

 - only enable pci_remap_iospace() for Ralink devices

* tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Only define pci_remap_iospace() for Ralink
…/linux/kernel/git/tip/tip

Pull signal handling fix from Borislav Petkov:

 - Prevent lock contention on the new sigaltstack lock on the
   common-case path, when no changes have been made to the alternative
   signal stack.

* tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  signal: Skip the altstack update when not needed
…scm/linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

 - Fix the rtmutex condition checking when the optimistic spinning of a
   waiter needs to be terminated

* tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
…cm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never
   positive

* tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Really make sure wall_to_monotonic isn't positive
…linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it
   is restored to its reset state

 - Mask MSI-X vectors late on the init path in order to handle
   out-of-spec Marvell NVME devices which apparently look at the MSI-X
   mask even when MSI-X is disabled

* tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
  PCI/MSI: Mask MSI-X vectors only on success
Pull block revert from Jens Axboe:
 "It turns out that the fix for not hammering on the delayed work timer
  too much caused a performance regression for BFQ, so let's revert the
  change for now.

  I've got some ideas on how to fix it appropriately, but they should
  wait for 5.17"

* tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block:
  Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
Pull kvm fixes from Paolo Bonzini:
 "Two small fixes, one of which was being worked around in selftests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Retry page fault if MMU reload is pending and root has no sp
  KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs
  KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
Fixed counter 3 is used for the Topdown metrics, which haven't been
enabled for KVM guests. Userspace accesses to it fail because it's not
included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines,
which have this counter.

To reproduce it on ICX+ machines, ./state_test reports:
==== Test Assertion Failure ====
lib/x86_64/processor.c:1078: r == nmsrs
pid=4564 tid=4564 - Argument list too long
1  0x000000000040b1b9: vcpu_save_state at processor.c:1077
2  0x0000000000402478: main at state_test.c:209 (discriminator 6)
3  0x00007fbe21ed5f92: ?? ??:0
4  0x000000000040264d: _start at ??:?
 Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c)

With this patch, it works well.

Signed-off-by: Wei Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
An oversight in the previous commit: we don't even parse or start the
device, meaning that the device is not presented to user space.

Fixes: 9302095 ("HID: check for valid USB device for many HID drivers")
Cc: [email protected]
Link: https://bugs.archlinux.org/task/73048
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215341
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Benjamin Tissoires <[email protected]>
The return value of devm_kzalloc() needs to be checked to avoid
hdev->dev->driver_data being NULL if the allocation fails.
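
A minimal userspace sketch of the pattern this fix applies, assuming
calloc() as a stand-in for devm_kzalloc().  All names here
(device_sketch, drvdata_sketch, probe_sketch, failing_alloc) are
hypothetical and are not taken from the vivaldi driver:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's private data and its device. */
struct drvdata_sketch { int placeholder; };
struct device_sketch { void *driver_data; };

/* Allocator signature matching calloc(), so a failing one can be injected. */
typedef void *(*alloc_fn)(size_t nmemb, size_t size);

/* Probe-like helper: publish driver data only if the allocation succeeded. */
int probe_sketch(struct device_sketch *dev, alloc_fn alloc)
{
	struct drvdata_sketch *data = alloc(1, sizeof(*data));

	if (!data)
		return -ENOMEM;	/* bail out before driver_data is set */

	dev->driver_data = data;
	return 0;
}

/* Allocator that always fails, used to exercise the error path. */
void *failing_alloc(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}
```

Calling probe_sketch() with failing_alloc returns -ENOMEM and leaves
driver_data untouched; calling it with calloc sets driver_data.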

Fixes: 14c9c01 ("HID: add vivaldi HID driver")
Cc: [email protected]
Signed-off-by: Jiasheng Jiang <[email protected]>
Signed-off-by: Benjamin Tissoires <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
After dropping mmu_lock in the TDP MMU, restart the iterator during
tdp_iter_next() and do not advance the iterator.  Advancing the iterator
results in skipping the top-level SPTE and all its children, which is
fatal if any of the skipped SPTEs were not visited before yielding.

When zapping all SPTEs, i.e. when min_level == root_level, restarting the
iter and then invoking tdp_iter_next() is always fatal if the current gfn
has a valid SPTE, as advancing the iterator results in try_step_side()
skipping the current gfn, which wasn't visited before yielding.

Sprinkle WARNs on iter->yielded being true in various helpers that are
often used in conjunction with yielding, and tag the helper with
__must_check to reduce the probability of improper usage.
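
The restart-instead-of-advance idea can be sketched in userspace as
follows.  This is a simplified illustration over a flat index range, not
the TDP MMU code: iter_sketch and its helpers are hypothetical names, and
GCC/Clang's warn_unused_result attribute stands in for the kernel's
__must_check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat iterator; the real TDP iterator walks page tables. */
struct iter_sketch {
	size_t idx;	/* entry currently being visited */
	size_t end;	/* one past the last entry */
	bool valid;	/* false once the walk is finished */
	bool yielded;	/* set when the caller yielded on this entry */
};

void iter_start(struct iter_sketch *it, size_t start, size_t end)
{
	it->idx = start;
	it->end = end;
	it->valid = start < end;
	it->yielded = false;
}

/* Returns true if the caller "yielded"; the result must not be ignored. */
__attribute__((warn_unused_result))
bool iter_cond_resched(struct iter_sketch *it, bool need_resched)
{
	if (!need_resched)
		return false;
	it->yielded = true;
	return true;
}

void iter_next(struct iter_sketch *it)
{
	if (it->yielded) {
		/* Restart on the same entry; it was not processed yet. */
		it->yielded = false;
		return;
	}
	it->idx++;
	it->valid = it->idx < it->end;
}

/* Visit n entries, yielding once when reaching index yield_at. */
size_t visit_all(size_t n, size_t yield_at)
{
	struct iter_sketch it;
	size_t visited = 0;
	bool yielded_once = false;

	for (iter_start(&it, 0, n); it.valid; iter_next(&it)) {
		if (iter_cond_resched(&it, it.idx == yield_at && !yielded_once)) {
			yielded_once = true;
			continue;	/* iter_next() restarts, it does not advance */
		}
		visited++;
	}
	return visited;
}
```

If iter_next() advanced unconditionally, the entry at yield_at would be
skipped and visit_all() would return n - 1 rather than n.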

Failing to zap a top-level SPTE manifests in one of two ways.  If a valid
SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),
the shadow page will be leaked and KVM will WARN accordingly.

  WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
  RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
  Call Trace:
   <TASK>
   kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
   kvm_destroy_vm+0x162/0x2a0 [kvm]
   kvm_vcpu_release+0x34/0x60 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5c/0x90
   do_exit+0x364/0xa10
   ? futex_unqueue+0x38/0x60
   do_group_exit+0x33/0xa0
   get_signal+0x155/0x850
   arch_do_signal_or_restart+0xed/0x750
   exit_to_user_mode_prepare+0xc5/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x48/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by
kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of
marking a struct page as dirty/accessed after it has been put back on the
free list.  This directly triggers a WARN due to encountering a page with
page_count() == 0, but it can also lead to data corruption and additional
errors in the kernel.

  WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
  RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
  Call Trace:
   <TASK>
   kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
   __handle_changed_spte+0x92e/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   zap_gfn_range+0x549/0x620 [kvm]
   kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
   mmu_free_root_page+0x219/0x2c0 [kvm]
   kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
   kvm_mmu_unload+0x1c/0xa0 [kvm]
   kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
   kvm_put_kvm+0x3b1/0x8b0 [kvm]
   kvm_vcpu_release+0x4e/0x70 [kvm]
   __fput+0x1f7/0x8c0
   task_work_run+0xf8/0x1a0
   do_exit+0x97b/0x2230
   do_group_exit+0xda/0x2a0
   get_signal+0x3be/0x1e50
   arch_do_signal_or_restart+0x244/0x17f0
   exit_to_user_mode_prepare+0xcb/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x4d/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Note, the underlying bug existed even before commit 1af4a96 ("KVM:
x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to
tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still
incorrectly advance past a top-level entry when yielding on a lower-level
entry.  But with respect to leaking shadow pages, the bug was introduced
by yielding before processing the current gfn.

Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or
callers could jump to their "retry" label.  The downside of that approach
is that tdp_mmu_iter_cond_resched() _must_ be called before anything else
in the loop, and there's no easy way to enforce that requirement.

Ideally, KVM would handle the cond_resched() fully within the iterator
macro (the code is actually quite clean) and avoid this entire class of
bugs, but that is extremely difficult to do while also supporting yielding
after tdp_mmu_set_spte_atomic() fails.  Yielding after failing to set a
SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly
bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE
may block operations on the SPTE for a significant amount of time.

Fixes: faaf05b ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 1af4a96 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
Reported-by: Ignat Korchagin <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
The kvm_run struct's if_flag is a part of the userspace/kernel API. The
SEV-ES patches failed to set this flag because it's no longer needed by
QEMU (according to the comment in the source code). However, other
hypervisors may make use of this flag. Therefore, set the flag for
guests with encrypted registers (i.e., with guest_state_protected set).

Fixes: f1c6366 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Signed-off-by: Marc Orr <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Attempting to compile on a non-x86 architecture fails with

include/kvm_util.h: In function ‘vm_compute_max_gfn’:
include/kvm_util.h:79:21: error: dereferencing pointer to incomplete type ‘struct kvm_vm’
  return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
                     ^~

This is because the declaration of struct kvm_vm is in
lib/kvm_util_internal.h as an effort to make it private to
the test lib code. We can still provide arch specific functions,
though, by making the generic function symbols weak. Do that to
fix the compile error.
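
A minimal sketch of the weak-symbol approach, assuming a GCC- or
Clang-compatible toolchain.  struct vm_sketch and
vm_compute_max_gfn_sketch() are hypothetical stand-ins for the selftest
library's private struct kvm_vm and its helper:

```c
#include <stdint.h>

/* Hypothetical stand-in for the library-private struct kvm_vm. */
struct vm_sketch {
	unsigned int pa_bits;
	unsigned int page_shift;
};

/*
 * Generic definition, marked weak: the linker picks it only when no other
 * object file provides a strong (non-weak) definition of the same symbol.
 */
__attribute__((weak))
uint64_t vm_compute_max_gfn_sketch(const struct vm_sketch *vm)
{
	return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
}
```

An architecture-specific source file can then define a function with the
same name and no weak attribute, and that definition takes precedence at
link time without any #ifdef in the generic code.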

Fixes: c8cc43c ("selftests: KVM: avoid failures due to reserved HyperTransport region")
Cc: [email protected]
Signed-off-by: Andrew Jones <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Revert a relatively recent change that set vmx->fail if the vCPU is in L2
and emulation_required is true, as that behavior is completely bogus.
Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong:

  (a) it's impossible to have both a VM-Fail and VM-Exit
  (b) vmcs.EXIT_REASON is not modified on VM-Fail
  (c) emulation_required refers to guest state and guest state checks are
      always VM-Exits, not VM-Fails.

For KVM specifically, emulation_required is handled before nested exits
in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect,
i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored.
Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit()
firing when tearing down the VM as KVM never expects vmx->fail to be set
when L2 is active; KVM always reflects those errors into L1.

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548
                                nested_vmx_vmexit+0x16bd/0x17e0
                                arch/x86/kvm/vmx/nested.c:4547
  Modules linked in:
  CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547
  Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80
  Call Trace:
   vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline]
   nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330
   vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799
   kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989
   kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441
   kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline]
   kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline]
   kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220
   kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489
   __fput+0x3fc/0x870 fs/file_table.c:280
   task_work_run+0x146/0x1c0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0x705/0x24f0 kernel/exit.c:832
   do_group_exit+0x168/0x2d0 kernel/exit.c:929
   get_signal+0x1740/0x2120 kernel/signal.c:2852
   arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300
   do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c8607e4 ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry")
Reported-by: [email protected]
Reviewed-by: Maxim Levitsky <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Synthesize a triple fault if L2 guest state is invalid at the time of
VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
invalid guest state, since from L1's perspective, it's architecturally
impossible for L2 to have invalid state while L2 is running in hardware.
E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
or #GP.

Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
can trigger this scenario, as nested VM-Enter correctly rejects any
attempt to enter L2 with invalid state.

RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
Following AMD's SMRAM layout is important as AMD's layout saves/restores
the descriptor cache information, including CS.RPL and SS.RPL, and also
defines all the fields relevant to invalid guest state as read-only, i.e.
so long as the vCPU had valid state before the SMI, which is guaranteed
for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
layout saves/restores only the selector, which means that scenarios where
the selector and cached RPL don't match, e.g. conforming code segments,
would yield invalid guest state.  Intel CPUs fudge around this issue by
stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
Treatment of RSM", paraphrasing for brevity:

  IF internal storage indicates that the [CPU was post-VMXON]
  THEN
     enter VMX operation (root or non-root);
     restore VMX-critical state as defined in Section 34.14.1;
     set to their fixed values any bits in CR0 and CR4 whose values must
     be fixed in VMX operation [unless coming from an unrestricted guest];
     IF RFLAGS.VM = 0 AND (in VMX root operation OR the
        “unrestricted guest” VM-execution control is 0)
     THEN
       CS.RPL := SS.DPL;
       SS.RPL := SS.DPL;
     FI;
     restore current VMCS pointer;
  FI;

Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
will synthesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
only way for CR0 and/or CR4 to have illegal values is if they were
modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
section states "modifying these registers will result in unpredictable
behavior".

KVM's ioctl() behavior is less straightforward.  Because KVM allows
ioctls() to be executed in any order, rejecting an ioctl() if it would
result in invalid L2 guest state is not an option as KVM cannot know if
a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
reject KVM_RUN if L2 contained invalid guest state, but that carries the
risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
exited to userspace.  Setting a flag/request to detect such a scenario is
undesirable because (a) it's extremely unlikely to add value to KVM as a
whole, and (b) KVM would need to consider ioctl() interactions with such
a flag, e.g. if userspace migrated the vCPU while the flag were set.

Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Update the documentation for kvm-intel's emulate_invalid_guest_state to
rectify the description of KVM's default behavior, and to document that
the behavior and thus parameter only applies to L1.

Fixes: a27685c ("KVM: VMX: Emulate invalid guest state by default")
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
…tate

Add a selftest to attempt to enter L2 with invalid guest state by
exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set
invalid guest state (marking TR unusable is arbitrarily chosen for its
relative simplicity).

This is a regression test for a bug introduced by commit c8607e4
("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if
!from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid
guest state and ultimately triggered a WARN due to nested_vmx_vmexit()
seeing vmx->fail==true while attempting to synthesize a nested VM-Exit.

This is also a functional test to verify that KVM synthesizes TRIPLE_FAULT
for L2, which is somewhat arbitrary behavior, instead of emulating L2.
KVM should never emulate L2 due to invalid guest state, as it's
architecturally impossible for L1 to run an L2 guest with invalid state
as nested VM-Enter should always fail, i.e. L1 needs to do the emulation.
Stuffing state via KVM ioctl() is a non-architectural, out-of-band case,
hence the TRIPLE_FAULT being rather arbitrary.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
…en/tip

Merge xen fixes from Juergen Gross:
 "Fixes for two issues related to Xen and malicious guests:

   - Guest can force the netback driver to hog large amounts of memory

   - Denial of Service in other guests due to event storms"

* 'xsa' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/netback: don't queue unlimited number of packages
  xen/netback: fix rx queue stall detection
  xen/console: harden hvc_xen against event channel storms
  xen/netfront: harden netfront against event channel storms
  xen/blkfront: harden blkfront against event channel storms
…inux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "Binding fix for v5.16

  This fixes problems validating DT bindings using op_mode which wasn't
  described as it should have been when converting to DT schema"

* tag 'regulator-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: dt-bindings: samsung,s5m8767: add missing op_mode to bucks
…ernel/git/broonie/spi

Pull spi fix from Mark Brown:
 "One small fix for a long standing issue with error handling on probe
  in the Armada driver"

* tag 'spi-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: change clk_disable_unprepare to clk_unprepare
…t/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Last fixes before holidays. Nothing very exciting:

   - Work around a HW bug in HNS HIP08

   - Recent memory leak regression in qib

   - Incorrect use of kfree() for vmalloc memory in hns"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Replace kfree() with kvfree()
  IB/qib: Fix memory leak in qib_user_sdma_queue_pkts()
  RDMA/hns: Fix RNR retransmission issue for HIP08
…/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - NULL pointer dereference fix in Vivaldi driver (Jiasheng Jiang)

 - regression fix for device probing in Holtek driver (Benjamin
   Tissoires)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: potential dereference of null pointer
  HID: holtek: fix mouse probing
Drop a check that guards triggering a posted interrupt on the currently
running vCPU, and more importantly guards waking the target vCPU if
triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
If a vIRQ is delivered from asynchronous context, the target vCPU can be
the currently running vCPU and can also be blocking, in which case
skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to
be a wake event for the vCPU.

The "do nothing" logic when "vcpu == running_vcpu" mostly works only
because the majority of calls to ->deliver_posted_interrupt(), especially
when using posted interrupts, come from synchronous KVM context.  But if
a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ
and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to
use posted interrupts, wake events from the device will be delivered to
KVM from IRQ context, e.g.

  vfio_msihandler()
  |
  |-> eventfd_signal()
      |
      |-> ...
          |
          |->  irqfd_wakeup()
               |
               |->kvm_arch_set_irq_inatomic()
                  |
                  |-> kvm_irq_delivery_to_apic_fast()
                      |
                      |-> kvm_apic_set_irq()

This also aligns the non-nested and nested usage of triggering posted
interrupts, and will allow for additional cleanups.
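
The effect of dropping the check can be illustrated with a heavily
simplified userspace model.  The struct and function names below are
hypothetical, and the model ignores everything except whether the target
vCPU is in guest mode and whether it is the currently running vCPU:

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a vCPU. */
struct vcpu_sketch {
	bool in_guest_mode;	/* a posted interrupt can only be triggered here */
	int wakeups;		/* wake events delivered to this vCPU */
};

/* Returns true only if the vCPU is in guest mode and can be notified. */
bool trigger_posted_interrupt_sketch(const struct vcpu_sketch *vcpu)
{
	return vcpu->in_guest_mode;
}

/* Old behavior: do nothing at all when the target is the running vCPU. */
void deliver_old_sketch(struct vcpu_sketch *vcpu,
			const struct vcpu_sketch *running)
{
	if (vcpu != running && !trigger_posted_interrupt_sketch(vcpu))
		vcpu->wakeups++;
}

/* New behavior: wake the target whenever the trigger attempt fails. */
void deliver_new_sketch(struct vcpu_sketch *vcpu)
{
	if (!trigger_posted_interrupt_sketch(vcpu))
		vcpu->wakeups++;
}
```

With a blocking vCPU that is also the running vCPU (the IRQ-context case
described above), deliver_old_sketch() records no wakeup, while
deliver_new_sketch() records one.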

Fixes: 379a3c8 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: [email protected]
Reported-by: Longpeng (Mike) <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
…git/cel/linux

Pull nfsd fix from Chuck Lever:
 "Address a buffer overrun reported by Anatoly Trosinenko"

* tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix READDIR buffer overflow
Pull kvm fixes from Paolo Bonzini:

 - Fix for compilation of selftests on non-x86 architectures

 - Fix for kvm_run->if_flag on SEV-ES

 - Fix for page table use-after-free if yielding during exit_mm()

 - Improve behavior when userspace starts a nested guest with invalid
   state

 - Fix missed wakeup with assigned devices but no VT-d posted interrupts

 - Do not tell userspace to save/restore an unsupported PMU MSR

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
  KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state
  KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state
  KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required
  KVM: VMX: Always clear vmx->fail on emulation_required
  selftests: KVM: Fix non-x86 compiling
  KVM: x86: Always set kvm_run->if_flag
  KVM: x86/mmu: Don't advance iterator after restart due to yielding
  KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all
…git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix a recent regression causing the loop in dpm_prepare() to become
  infinite if one of the device ->prepare() callbacks returns an error"

* tag 'pm-5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: Fix error handling in dpm_prepare()
@Vylpes Vylpes merged commit 3a34597 into master Dec 28, 2021
@Vylpes Vylpes deleted the archlinux-master branch December 28, 2021 14:15