Wip/matt auld/dynamic oa configs v2 #6


matt-auld

Feel free to take a look when you get the time. Thanks.

As per your comments:

  • We now silently fail when given an empty OA register set
  • We now check that at least one register set is given, be it MUX, BOOL or FLEX
  • We now explicitly check the uuid format from user-space; this is still just treated as a string and not a 128-bit value
    • I did look into using the 128-bit form for the uuid, but I couldn't find any helpful kernel utilities other than this: https://lwn.net/Articles/356868/. Something like uuid_unparse would be really nice here.
  • We now check for a uuid collision early on
  • We now return the metric-set id from the ioctl, rather than storing it in the user struct - much cleaner
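
The checks listed above could be sketched roughly as follows. This is only a user-space style illustration in Python, not the kernel code from this branch: the `OAConfig` type, its field names, the error strings, and the assumption that the uuid must be in the canonical 8-4-4-4-12 hexadecimal form are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumption: the accepted textual form is 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


@dataclass
class OAConfig:
    """Hypothetical stand-in for the config handed in from user-space."""
    uuid: str
    mux_regs: list = field(default_factory=list)
    boolean_regs: list = field(default_factory=list)
    flex_regs: list = field(default_factory=list)


def validate_config(config, known_uuids):
    """Return None if the config passes the checks, else an error string."""
    # The uuid is treated as a plain string, so only its format is checked.
    if UUID_RE.fullmatch(config.uuid) is None:
        return "invalid uuid format"
    # Detect a collision with an already registered config early on.
    if config.uuid in known_uuids:
        return "uuid collision"
    # At least one of the MUX, BOOL or FLEX register sets must be non-empty.
    if not (config.mux_regs or config.boolean_regs or config.flex_regs):
        return "no register set given"
    return None
```

A caller would keep the set of already registered uuids and reject a new config whenever `validate_config()` returns a string.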

Also note the "remove redundant metric set check" commit. Am I correct in making this change? To me both checks seem to do the same thing. If so, feel free to squash it with your commits; otherwise just disregard it.

I think some kind of locking is still needed.

rib and others added 22 commits January 19, 2016 15:36
This adds a DRM_IOCTL_I915_PERF_OPEN ioctl comparable to perf_event_open
that opens a file descriptor for an event source.

Based on our initial experience aiming to use the core perf
infrastructure, this interface is inspired by perf, but focused on
exposing metrics about work running on Gen graphics instead of a CPU.

One notable difference is that it doesn't support mmapping a circular
buffer of samples into userspace. The currently planned use cases
require an internal buffering that forces at least one copy of data
which can be neatly hidden in a read() based interface.

No specific event types are supported yet so perf_event_open can currently
only get as far as returning EINVAL for an unknown event type.

Signed-off-by: Robert Bragg <[email protected]>
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register

Signed-off-by: Robert Bragg <[email protected]>
Adds a static OA unit, MUX + B Counter configuration for basic render
metrics on Haswell. This is autogenerated from an internal XML
description of metric sets.

Signed-off-by: Robert Bragg <[email protected]>
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

Cc: Chris Wilson <[email protected]>
Signed-off-by: Robert Bragg <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.

Signed-off-by: Robert Bragg <[email protected]>
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that
won't (on its own) exceed the kernel.perf_event_max_sample_rate
default of 100000 samples/s.

Signed-off-by: Robert Bragg <[email protected]>
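
The relationship between the timer exponent and the sample rate mentioned in the commit message above can be illustrated with a small calculation. This is a hedged sketch: it assumes that the sampling period is 2^(exponent + 1) timestamp ticks and, in the usage note, that the timestamp clock runs at 12.5 MHz. Neither assumption is stated in the commit message, and the function names are hypothetical.

```python
def oa_period_ns(exponent, timestamp_hz):
    """Sampling period in ns, assuming a period of 2**(exponent + 1) ticks."""
    return (2 ** (exponent + 1)) * 1_000_000_000 / timestamp_hz


def min_exponent_for_rate(timestamp_hz, max_samples_per_sec):
    """Smallest exponent whose sample rate does not exceed the given limit."""
    exponent = 0
    # Each increment of the exponent halves the sample rate.
    while timestamp_hz / (2 ** (exponent + 1)) > max_samples_per_sec:
        exponent += 1
    return exponent
```

Under those assumptions, `min_exponent_for_rate(12_500_000, 100_000)` gives 6, which corresponds to a period of 10240 ns, or roughly 97656 samples/s.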
This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

Signed-off-by: Robert Bragg <[email protected]>
The mask of slices enabled sometimes affects what Observability counters
are exposed for a particular metric set configuration and so userspace
needs the mask to be able to correctly normalize the raw data.

Signed-off-by: Robert Bragg <[email protected]>
Enables userspace to determine the number of slices enabled and also
know what specific slices are enabled. This information is required, for
example, to be able to analyse some OA counter reports where the counter
configuration depends on the HW slice configuration.

Signed-off-by: Robert Bragg <[email protected]>
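
As an illustration of how userspace might consume such a slice mask, the following sketch derives both the number of enabled slices and their indices. It assumes that bit i of the mask corresponds to slice i; the function name is hypothetical.

```python
def enabled_slices(slice_mask):
    """Return the indices of the slices whose bit is set in slice_mask.

    Assumes bit i of the mask stands for slice i.
    """
    return [i for i in range(slice_mask.bit_length()) if slice_mask & (1 << i)]
```

The number of enabled slices is then simply `len(enabled_slices(mask))`.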
Not assuming the subslice mask is uniform across all slices, this mask
covers all slices with N bits per slice, where N is the maximum number
of sub slices per slice for a particular Gen.

Notably this matches the definition of the $SubsliceMask variable that
is currently found in our OA metrics XML files.

This also updates the ss_max constant (max number of sub slices) from
4 to 3 because although the FUSE2 register provides a 4 bit mask, gen 9
configurations really only go up to a maximum of 3 sub slices, with bit 3
in the FUSE2 subslice disabled mask always set.

Cc: Zhenyu Wang <[email protected]>
Signed-off-by: Robert Bragg <[email protected]>
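
A mask laid out with N bits per slice, as described in the commit message above, could be unpacked along these lines. This is a sketch only: it assumes that slice 0 occupies the least significant N bits, and the function and parameter names are hypothetical.

```python
def subslices_per_slice(subslice_mask, n_slices, bits_per_slice):
    """Split a packed subslice mask into one list of subslice indices per slice.

    Assumes slice 0 occupies the least significant bits_per_slice bits.
    """
    field_mask = (1 << bits_per_slice) - 1
    result = []
    for s in range(n_slices):
        # Extract the N-bit field that belongs to slice s.
        bits = (subslice_mask >> (s * bits_per_slice)) & field_mask
        result.append([i for i in range(bits_per_slice) if bits & (1 << i)])
    return result
```

For example, with 3 bits per slice a mask of `0b011111` would mean that slice 0 has sub slices 0, 1 and 2 enabled, while slice 1 only has sub slices 0 and 1.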
Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices enabled. This information is required,
for example, to be able to analyse some OA counter reports where the
counter configuration depends on the HW sub slice configuration.

Signed-off-by: Robert Bragg <[email protected]>
The current context user handles are specific to a drm file instance.
There are some use cases which may require a global id for the contexts.
For example, a system level GPU profiler tool may rely on the global context
ids to associate the performance snapshots with individual contexts.

This global id may also be used further in order to provide a unique
context id to hw.

In this patch, the global ids are allocated from a separate cyclic idr and
can be further utilized for any usecase described above.

v2: According to Chris' suggestion, implemented a separate idr for holding
global ids for contexts, as opposed to overloading the file specific
ctx->user_handle for this purpose. This global id can also further be used
wherever hw has to be programmed with ctx unique id, though this patch just
introduces the hw global id as such.

Signed-off-by: Sourab Gupta <[email protected]>
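
The behaviour of a cyclic ID allocator, as opposed to one that always hands out the lowest free ID, can be modelled with a short toy class. This is not the kernel idr API, only a Python sketch of the idea; the class and method names are hypothetical.

```python
class CyclicIdAllocator:
    """Toy model of cyclic ID allocation.

    The search for a free ID starts just after the most recently
    allocated one and wraps around, so a freed ID is not reused
    immediately.
    """

    def __init__(self, max_ids):
        self.max_ids = max_ids
        self.in_use = set()
        self.next_hint = 0

    def alloc(self):
        for offset in range(self.max_ids):
            candidate = (self.next_hint + offset) % self.max_ids
            if candidate not in self.in_use:
                self.in_use.add(candidate)
                self.next_hint = (candidate + 1) % self.max_ids
                return candidate
        raise RuntimeError("no free IDs")

    def free(self, ident):
        self.in_use.discard(ident)
```

Delaying reuse in this way is useful when buffered data may still refer to an ID that was freed recently.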
This will allow the ID to be given to the HW as the unique context
identifier that's written, for example, to the context status buffer
on preemption and included in reports written by the OA unit.

Cc: Sourab Gupta <[email protected]>
Signed-off-by: Robert Bragg <[email protected]>
The newly added intel_context::global_id is suitable (a globally unique
20 bit ID) for giving to the hardware as a unique context identifier.

Compared to using the pinned address of a logical ring context these IDs
are constant for the lifetime of a context whereas a context could be
repinned at different addresses during its lifetime.

Having a stable ID is useful when we need to buffer information
associated with a context based on this ID so the association can't be
lost. For example the OA unit writes out counter reports to a circular
buffer tagged with this ID and we want to be able to accurately filter
reports for a specific context, ideally without the added complexity of
tracking context re-pinning while the OA buffer may contain reports with
older IDs.

Cc: Sourab Gupta <[email protected]>
Signed-off-by: Robert Bragg <[email protected]>
Since the exponent for periodic OA counter sampling is maintained in a
per-context register while we want to treat it as if it were global
state we need to be able to safely issue an mmio write to a per-context
register and affect any currently running context.

We have to take some extra care in this case and this adds a utility
api to encapsulate what's required.

Signed-off-by: Robert Bragg <[email protected]>
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview and Skylake. These are
autogenerated from an internal XML description of metric sets.

Signed-off-by: Robert Bragg <[email protected]>
Enables access to OA unit metrics for BDW, CHV and SKL which all share
the same OA unit design.

Signed-off-by: Robert Bragg <[email protected]>
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit configuration that can be reliably used as a key to
lookup corresponding counter meta data and normalization equations.

Signed-off-by: Robert Bragg <[email protected]>
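
Enumerating the metric sets from userspace could look roughly like the sketch below. The sysfs layout is taken from the commit message above; the function name and the choice of returning a dictionary are assumptions made for illustration.

```python
import os


def list_metric_sets(metrics_dir="/sys/class/drm/card0/metrics"):
    """Map each metric set <guid> directory name to the integer in its 'id' file."""
    sets = {}
    for guid in sorted(os.listdir(metrics_dir)):
        id_path = os.path.join(metrics_dir, guid, "id")
        # Skip anything that does not follow the <guid>/id layout.
        if os.path.isfile(id_path):
            with open(id_path) as f:
                sets[guid] = int(f.read().strip())
    return sets
```

The integer looked up for a given guid is the value that would then be passed when opening the metric set via DRM_IOCTL_I915_PERF_OPEN.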
This is already checked later on.

Signed-off-by: Matthew Auld <[email protected]>
The motivation behind this new interface is to expose the creation of
new OA configs at runtime, which can then be used as part of the
i915 perf open interface. This will enable the kernel to learn new
configs which may be experimental, or otherwise not part of the core set
currently available through the i915 perf interface.

Signed-off-by: Matthew Auld <[email protected]>
@matt-auld
Author

Ah right, cool. Yeah, I opted for the former, given that the unconditional check ensures the metric_set has a sensible value before continuing (as in, if no metric set is given). I must have missed _init_stream_code; is this in your branch, or am I just blind? I'll rebase this when you push those changes.

@rib rib force-pushed the wip/rib/oa-4.4-testing branch 2 times, most recently from 8ed4582 to 753f370 Compare January 27, 2016 02:11
@matt-auld matt-auld closed this Jan 28, 2016
rib pushed a commit that referenced this pull request Feb 22, 2016
Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
  (gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints.  Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports
perf_event_attr.context_switch, which is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-privileged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid.

However, there are many situations where parse_events() can be called
without a parse_events_error. See the result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r

Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Tong Zhang <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected] # v4.4+
Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 15, 2016
If we use the USB ID pin as a wakeup source, and there is a USB block
device on this USB OTG (ID) cable, the system will deadlock
after system resume.

The root cause is that the ci_otg workqueue may try to remove the hcd
before the driver resume has finished. The hcd then disconnects the
device attached to it and calls device_release_driver(), which takes the
device lock "dev->mutex". The lock is never released, because the caller
waits for the writeback workqueue to flush the block information, but
the writeback workqueue is freezable and is not thawed until driver
resume has finished.

When the driver (device: sd 0:0:0:0:) resume goes to dpm_complete, it
tries to get its device lock "dev->mutex", but it can never get it,
so a deadlock occurs. The call stacks below show the situation.

So, in order to fix this problem, we need to make the ci_otg workqueue
freezable. The work items in this workqueue will then run after the
driver's resume, and the workqueue will not be blocked forever as in the
case above, since the writeback workqueue has been thawed too.

Tested at: i.mx6qdl-sabresd and i.mx6sx-sdb.

[  555.178869] kworker/u2:13   D c07de74c     0   826      2 0x00000000
[  555.185310] Workqueue: ci_otg ci_otg_work
[  555.189353] Backtrace:
[  555.191849] [<c07de4fc>] (__schedule) from [<c07dec6c>] (schedule+0x48/0xa0)
[  555.198912]  r10:ee471ba0 r9:00000000 r8:00000000 r7:00000002 r6:ee470000 r5:ee471ba4
[  555.206867]  r4:ee470000
[  555.209453] [<c07dec24>] (schedule) from [<c07e2fc4>] (schedule_timeout+0x15c/0x1e0)
[  555.217212]  r4:7fffffff r3:edc2b000
[  555.220862] [<c07e2e68>] (schedule_timeout) from [<c07df6c8>] (wait_for_common+0x94/0x144)
[  555.229140]  r8:00000000 r7:00000002 r6:ee470000 r5:ee471ba4 r4:7fffffff
[  555.235980] [<c07df634>] (wait_for_common) from [<c07df790>] (wait_for_completion+0x18/0x1c)
[  555.244430]  r10:00000001 r9:c0b5563c r8:c0042e48 r7:ef086000 r6:eea4372c r5:ef131b00
[  555.252383]  r4:00000000
[  555.254970] [<c07df778>] (wait_for_completion) from [<c0043cb8>] (flush_work+0x19c/0x234)
[  555.263177] [<c0043b1c>] (flush_work) from [<c0043fac>] (flush_delayed_work+0x48/0x4c)
[  555.271106]  r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
[  555.277958] [<c0043f64>] (flush_delayed_work) from [<c00eae18>] (bdi_unregister+0x84/0xec)
[  555.286236]  r4:eea43520 r3:20000153
[  555.289885] [<c00ead94>] (bdi_unregister) from [<c02c2154>] (blk_cleanup_queue+0x180/0x29c)
[  555.298250]  r5:eea43808 r4:eea43400
[  555.301909] [<c02c1fd4>] (blk_cleanup_queue) from [<c0417914>] (__scsi_remove_device+0x48/0xb8)
[  555.310623]  r7:00000000 r6:20000153 r5:ededa950 r4:ededa800
[  555.316403] [<c04178cc>] (__scsi_remove_device) from [<c0415e90>] (scsi_forget_host+0x64/0x68)
[  555.325028]  r5:ededa800 r4:ed5b5000
[  555.328689] [<c0415e2c>] (scsi_forget_host) from [<c0409828>] (scsi_remove_host+0x78/0x104)
[  555.337054]  r5:ed5b5068 r4:ed5b5000
[  555.340709] [<c04097b0>] (scsi_remove_host) from [<c04cdfcc>] (usb_stor_disconnect+0x50/0xb4)
[  555.349247]  r6:ed5b56e4 r5:ed5b5818 r4:ed5b5690 r3:00000008
[  555.355025] [<c04cdf7c>] (usb_stor_disconnect) from [<c04b3bc8>] (usb_unbind_interface+0x78/0x25c)
[  555.363997]  r8:c13919b4 r7:edd3c000 r6:edd3c020 r5:ee551c68 r4:ee551c00 r3:c04cdf7c
[  555.371892] [<c04b3b50>] (usb_unbind_interface) from [<c03dc248>] (__device_release_driver+0x8c/0x118)
[  555.381213]  r10:00000001 r9:edd90c00 r8:c13919b4 r7:ee551c68 r6:c0b546e0 r5:c0b5563c
[  555.389167]  r4:edd3c020
[  555.391752] [<c03dc1bc>] (__device_release_driver) from [<c03dc2fc>] (device_release_driver+0x28/0x34)
[  555.401071]  r5:edd3c020 r4:edd3c054
[  555.404721] [<c03dc2d4>] (device_release_driver) from [<c03db304>] (bus_remove_device+0xe0/0x110)
[  555.413607]  r5:edd3c020 r4:ef17f04c
[  555.417253] [<c03db224>] (bus_remove_device) from [<c03d8128>] (device_del+0x114/0x21c)
[  555.425270]  r6:edd3c028 r5:edd3c020 r4:ee551c00 r3:00000000
[  555.431045] [<c03d8014>] (device_del) from [<c04b1560>] (usb_disable_device+0xa4/0x1e8)
[  555.439061]  r8:edd3c000 r7:eded8000 r6:00000000 r5:00000001 r4:ee551c00
[  555.445906] [<c04b14bc>] (usb_disable_device) from [<c04a8e54>] (usb_disconnect+0x74/0x224)
[  555.454271]  r9:edd90c00 r8:ee551000 r7:ee551c68 r6:ee551c9c r5:ee551c00 r4:00000001
[  555.462156] [<c04a8de0>] (usb_disconnect) from [<c04a8fb8>] (usb_disconnect+0x1d8/0x224)
[  555.470259]  r10:00000001 r9:edd90000 r8:ee471e2c r7:ee551468 r6:ee55149c r5:ee551400
[  555.478213]  r4:00000001
[  555.480797] [<c04a8de0>] (usb_disconnect) from [<c04ae5ec>] (usb_remove_hcd+0xa0/0x1ac)
[  555.488813]  r10:00000001 r9:ee471eb0 r8:00000000 r7:ef3d9500 r6:eded810c r5:eded80b0
[  555.496765]  r4:eded8000
[  555.499351] [<c04ae54c>] (usb_remove_hcd) from [<c04d4158>] (host_stop+0x28/0x64)
[  555.506847]  r6:eeb50010 r5:eded8000 r4:eeb51010
[  555.511563] [<c04d4130>] (host_stop) from [<c04d09b8>] (ci_otg_work+0xc4/0x124)
[  555.518885]  r6:00000001 r5:eeb50010 r4:eeb502a0 r3:c04d4130
[  555.524665] [<c04d08f4>] (ci_otg_work) from [<c00454f0>] (process_one_work+0x194/0x420)
[  555.532682]  r6:ef086000 r5:eeb502a0 r4:edc44480
[  555.537393] [<c004535c>] (process_one_work) from [<c00457b0>] (worker_thread+0x34/0x514)
[  555.545496]  r10:edc44480 r9:ef086000 r8:c0b1a100 r7:ef086034 r6:00000088 r5:edc44498
[  555.553450]  r4:ef086000
[  555.556032] [<c004577c>] (worker_thread) from [<c004bab4>] (kthread+0xdc/0xf8)
[  555.563268]  r10:00000000 r9:00000000 r8:00000000 r7:c004577c r6:edc44480 r5:eddc15c0
[  555.571221]  r4:00000000
[  555.573804] [<c004b9d8>] (kthread) from [<c000fef0>] (ret_from_fork+0x14/0x24)
[  555.581040]  r7:00000000 r6:00000000 r5:c004b9d8 r4:eddc15c0

[  553.429383] sh              D c07de74c     0   694    691 0x00000000
[  553.435801] Backtrace:
[  553.438295] [<c07de4fc>] (__schedule) from [<c07dec6c>] (schedule+0x48/0xa0)
[  553.445358]  r10:edd3c054 r9:edd3c078 r8:edddbd50 r7:edcbbc00 r6:c1377c34 r5:60000153
[  553.453313]  r4:eddda000
[  553.455896] [<c07dec24>] (schedule) from [<c07deff8>] (schedule_preempt_disabled+0x10/0x14)
[  553.464261]  r4:edd3c058 r3:0000000a
[  553.467910] [<c07defe8>] (schedule_preempt_disabled) from [<c07e0bbc>] (mutex_lock_nested+0x1a0/0x3e8)
[  553.477254] [<c07e0a1c>] (mutex_lock_nested) from [<c03e927c>] (dpm_complete+0xc0/0x1b0)
[  553.485358]  r10:00561408 r9:edd3c054 r8:c0b4863c r7:edddbd90 r6:c0b485d8 r5:edd3c020
[  553.493313]  r4:edd3c0d0
[  553.495896] [<c03e91bc>] (dpm_complete) from [<c03e9388>] (dpm_resume_end+0x1c/0x20)
[  553.503652]  r9:00000000 r8:c0b1a9d0 r7:c1334ec0 r6:c1334edc r5:00000003 r4:00000010
[  553.511544] [<c03e936c>] (dpm_resume_end) from [<c0079894>] (suspend_devices_and_enter+0x158/0x504)
[  553.520604]  r4:00000000 r3:c1334efc
[  553.524250] [<c007973c>] (suspend_devices_and_enter) from [<c0079e74>] (pm_suspend+0x234/0x2cc)
[  553.532961]  r10:00561408 r9:ed6b7300 r8:00000004 r7:c1334eec r6:00000000 r5:c1334ee8
[  553.540914]  r4:00000003
[  553.543493] [<c0079c40>] (pm_suspend) from [<c0078a6c>] (state_store+0x6c/0xc0)

[  555.703684] 7 locks held by kworker/u2:13/826:
[  555.708140]  #0:  ("%s""ci_otg"){++++.+}, at: [<c0045484>] process_one_work+0x128/0x420
[  555.716277]  #1:  ((&ci->work)){+.+.+.}, at: [<c0045484>] process_one_work+0x128/0x420
[  555.724317]  #2:  (usb_bus_list_lock){+.+.+.}, at: [<c04ae5e4>] usb_remove_hcd+0x98/0x1ac
[  555.732626]  #3:  (&dev->mutex){......}, at: [<c04a8e28>] usb_disconnect+0x48/0x224
[  555.740403]  #4:  (&dev->mutex){......}, at: [<c04a8e28>] usb_disconnect+0x48/0x224
[  555.748179]  #5:  (&dev->mutex){......}, at: [<c03dc2f4>] device_release_driver+0x20/0x34
[  555.756487]  #6:  (&shost->scan_mutex){+.+.+.}, at: [<c04097d0>] scsi_remove_host+0x20/0x104

Cc: <[email protected]> #v3.14+
Cc: Jun Li <[email protected]>
Signed-off-by: Peter Chen <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Apr 12, 2016
If the lower or upper directory of an overlayfs mount belong to a btrfs
file system and we fsync the file through the overlayfs' merged directory
we ended up accessing an inode that didn't belong to btrfs as if it were
a btrfs inode at btrfs_sync_file() resulting in a crash like the following:

[ 7782.588845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000544
[ 7782.590624] IP: [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
[ 7782.591931] PGD 4d954067 PUD 1e878067 PMD 0
[ 7782.592016] Oops: 0002 [#6] PREEMPT SMP DEBUG_PAGEALLOC
[ 7782.592016] Modules linked in: btrfs overlay ppdev crc32c_generic evdev xor raid6_pq psmouse pcspkr sg serio_raw acpi_cpufreq parport_pc parport tpm_tis i2c_piix4 tpm i2c_core processor button loop autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
[ 7782.592016] CPU: 10 PID: 16437 Comm: xfs_io Tainted: G      D         4.5.0-rc6-btrfs-next-26+ #1
[ 7782.592016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[ 7782.592016] task: ffff88001b8d40c0 ti: ffff880137488000 task.ti: ffff880137488000
[ 7782.592016] RIP: 0010:[<ffffffffa030b7ab>]  [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
[ 7782.592016] RSP: 0018:ffff88013748be40  EFLAGS: 00010286
[ 7782.592016] RAX: 0000000080000000 RBX: ffff880133b30c88 RCX: 0000000000000001
[ 7782.592016] RDX: 0000000000000001 RSI: ffffffff8148fec0 RDI: 00000000ffffffff
[ 7782.592016] RBP: ffff88013748bec0 R08: 0000000000000001 R09: 0000000000000000
[ 7782.624248] R10: ffff88013748be40 R11: 0000000000000246 R12: 0000000000000000
[ 7782.624248] R13: 0000000000000000 R14: 00000000009305a0 R15: ffff880015e3be40
[ 7782.624248] FS:  00007fa83b9cb700(0000) GS:ffff88023ed40000(0000) knlGS:0000000000000000
[ 7782.624248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7782.624248] CR2: 0000000000000544 CR3: 00000001fa652000 CR4: 00000000000006e0
[ 7782.624248] Stack:
[ 7782.624248]  ffffffff8108b5cc ffff88013748bec0 0000000000000246 ffff8800b005ded0
[ 7782.624248]  ffff880133b30d60 8000000000000000 7fffffffffffffff 0000000000000246
[ 7782.624248]  0000000000000246 ffffffff81074f9b ffffffff8104357c ffff880015e3be40
[ 7782.624248] Call Trace:
[ 7782.624248]  [<ffffffff8108b5cc>] ? arch_local_irq_save+0x9/0xc
[ 7782.624248]  [<ffffffff81074f9b>] ? ___might_sleep+0xce/0x217
[ 7782.624248]  [<ffffffff8104357c>] ? __do_page_fault+0x3c0/0x43a
[ 7782.624248]  [<ffffffff811a2351>] vfs_fsync_range+0x8c/0x9e
[ 7782.624248]  [<ffffffff811a237f>] vfs_fsync+0x1c/0x1e
[ 7782.624248]  [<ffffffff811a24d6>] do_fsync+0x31/0x4a
[ 7782.624248]  [<ffffffff811a2700>] SyS_fsync+0x10/0x14
[ 7782.624248]  [<ffffffff81493617>] entry_SYSCALL_64_fastpath+0x12/0x6b
[ 7782.624248] Code: 85 c0 0f 85 e2 02 00 00 48 8b 45 b0 31 f6 4c 29 e8 48 ff c0 48 89 45 a8 48 8d 83 d8 00 00 00 48 89 c7 48 89 45 a0 e8 fc 43 18 e1 <f0> 41 ff 84 24 44 05 00 00 48 8b 83 58 ff ff ff 48 c1 e8 07 83
[ 7782.624248] RIP  [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
[ 7782.624248]  RSP <ffff88013748be40>
[ 7782.624248] CR2: 0000000000000544
[ 7782.661994] ---[ end trace 721e14960eb939bc ]---

This started happening with commit 4bacc9c (overlayfs: Make f_path
always point to the overlay and f_inode to the underlay). Even though
after this change we could still access the btrfs inode through
struct file->f_mapping->host or struct file->f_inode, we would hit
similar issues later on in check_parent_dirs_for_sync(), because the
dentry we got (from struct file->f_path.dentry) was from overlayfs and
not from btrfs. That is, we had no way of getting the dentry that
belonged to btrfs (we always got the dentry that belonged to
overlayfs).

The new patch from Miklos Szeredi, titled "vfs: add file_dentry()" and
recently submitted to linux-fsdevel, adds a file_dentry() API that allows
us to get the btrfs dentry from the input file, and therefore to fsync
when the upper and lower directories belong to btrfs filesystems.

This issue has been reported several times by users on the mailing list
and in bugzilla. A test case for xfstests is being submitted as well.

Fixes: 4bacc9c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101951
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109791
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Cc: [email protected]
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 9, 2016
The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
with the following backtrace:

[  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
[  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
[  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
[  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
[  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
[  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
[  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
[  601.350965] Call Trace:
[  601.351203]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
[  601.351444]  [<ffffffff815b01d5>] schedule+0x35/0x80
[  601.351709]  [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230
[  601.351958]  [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220
[  601.352208]  [<ffffffff810bd737>] ? ktime_get+0x37/0xa0
[  601.352446]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
[  601.352688]  [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110
[  601.352951]  [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  601.353196]  [<ffffffff815b093b>] bit_wait_io+0x1b/0x70
[  601.353440]  [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90
[  601.353689]  [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0
[  601.353958]  [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40
[  601.354200]  [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140
[  601.354441]  [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30
[  601.354688]  [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70
[  601.354932]  [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50
[  601.355193]  [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0
[  601.355432]  [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100
[  601.355679]  [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0
[  601.355925]  [<ffffffff81198379>] vfs_write+0xa9/0x1a0
[  601.356164]  [<ffffffff811c59d8>] kernel_write+0x38/0x50

The underlying device is a null_blk, with default parameters:

  queue_mode    = MQ
  submit_queues = 1

Verification that nullb0 has something inflight:

root@pserver8:~# cat /sys/block/nullb0/inflight
       0        1
root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
...
/sys/block/nullb0/mq/0/cpu2/rq_list
CTX pending:
        ffff8838038e2400
...

During debugging it became clear that the stalled request is always
inserted in the rq_list from the following path:

   save_stack_trace_tsk + 34
   blk_mq_insert_requests + 231
   blk_mq_flush_plug_list + 281
   blk_flush_plug_list + 199
   wait_on_page_bit + 192
   __filemap_fdatawait_range + 228
   filemap_fdatawait_range + 20
   filemap_write_and_wait_range + 63
   blkdev_fsync + 27
   vfs_fsync_range + 73
   blkdev_write_iter + 202
   __vfs_write + 170
   vfs_write + 169
   kernel_write + 56

So blk_flush_plug_list() was called with from_schedule == true.

If from_schedule is true, blk_mq_insert_requests() ends up offloading
execution of __blk_mq_run_hw_queue() to the kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().

That means that we race with another CPU, which is about to execute
the __blk_mq_run_hw_queue() work.

Further debugging shows the following traces from different CPUs:

  CPU#0                                  CPU#1
  ----------------------------------     -------------------------------
  request A inserted
  STORE hctx->ctx_map[0] bit marked
  kblockd_schedule...() returns 1
  <schedule to kblockd workqueue>
                                         request B inserted
                                         STORE hctx->ctx_map[1] bit marked
                                         kblockd_schedule...() returns 0
  *** WORK PENDING bit is cleared ***
  flush_busy_ctxs() is executed, but
  bit 1, set by CPU#1, is not observed

As a result, request B stays pending forever.

This behaviour can be explained by speculative LOAD of hctx->ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.

The proper fix is an explicit full barrier <mfence>, which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.

Signed-off-by: Roman Pen <[email protected]>
Cc: Gioh Kim <[email protected]>
Cc: Michael Wang <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Tejun Heo <[email protected]>
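
The interleaving described in the commit message above can be replayed with a small deterministic model. This is a single-threaded Python sketch that only mimics the order of the loads and stores; it does not exercise real memory ordering, and all names in it are hypothetical.

```python
class HwQueueModel:
    """Toy model of the ctx_map bits and the WORK PENDING bit."""

    def __init__(self):
        self.ctx_map = set()   # bits marking software queues that hold requests
        self.pending = False   # models the WORK PENDING bit

    def insert(self, bit):
        """Mark a ctx_map bit; return 1 if work was newly queued, else 0."""
        self.ctx_map.add(bit)
        if self.pending:
            return 0
        self.pending = True
        return 1


def run_interleaving(load_before_clear):
    """Replay the two-CPU sequence; return (dispatched, left_behind, pending)."""
    q = HwQueueModel()
    q.insert(0)                       # CPU#0: request A inserted, work queued
    if load_before_clear:
        snapshot = set(q.ctx_map)     # work function: load hoisted above the clear
        q.insert(1)                   # CPU#1: request B, PENDING still set, returns 0
        q.pending = False             # work function: PENDING cleared too late
    else:
        q.insert(1)                   # CPU#1: request B, PENDING still set, returns 0
        q.pending = False             # work function: PENDING cleared first
        snapshot = set(q.ctx_map)     # then ctx_map is loaded, so bit 1 is seen
    q.ctx_map -= snapshot             # dispatch only the bits that were observed
    return sorted(snapshot), sorted(q.ctx_map), q.pending
```

In this model, `run_interleaving(True)` leaves bit 1 set while no work is pending, which corresponds to request B stalling. `run_interleaving(False)`, where the clear is ordered before the load, dispatches both requests.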
djdeath pushed a commit to djdeath/linux that referenced this pull request Oct 31, 2018
When the function name for an inline frame is invalid, we must not try
to demangle this symbol, otherwise we crash with:

  #0  0x0000555555895c01 in bfd_demangle ()
  #1  0x0000555555823262 in demangle_sym (dso=0x555555d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
  #2  dso__demangle_sym (dso=dso@entry=0x555555d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
  #3  0x00005555557fef4b in new_inline_sym (funcname=0x0, base_sym=0x555555d92b90, dso=0x555555d92b90) at util/srcline.c:89
  #4  inline_list__append_dso_a2l (dso=dso@entry=0x555555c7bb00, node=node@entry=0x555555e31810, sym=sym@entry=0x555555d92b90) at util/srcline.c:264
  #5  0x00005555557ff27f in addr2line (dso_name=dso_name@entry=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0,
      line=line@entry=0x0, dso=dso@entry=0x555555c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x555555e31810, sym=0x555555d92b90) at util/srcline.c:313
  #6  0x00005555557ffe7c in addr2inlines (sym=0x555555d92b90, dso=0x555555c7bb00, addr=2888, dso_name=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf")
      at util/srcline.c:358

So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.
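
A minimal sketch of the fallback, assuming a hypothetical helper `inline_frame_name()` (not a function from the patch) that is applied to the name before any demangling is attempted:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: choose a printable name for an inlined frame.
 * A NULL name (as in funcname=0x0 in the backtrace above) must never be
 * passed on to a demangler, so substitute the '??' placeholder instead. */
static const char *inline_frame_name(const char *funcname)
{
    return funcname ? funcname : "??";
}
```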

While this crash was originally reported by Hadrien for rust code, I can
now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:

  $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
  main
  /usr/include/c++/8.2.1/complex:610
  ??
  /usr/include/c++/8.2.1/complex:618
  ??
  /usr/include/c++/8.2.1/complex:675
  ??
  /usr/include/c++/8.2.1/complex:685
  main
  /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

I've reported this bug upstream and also attached a patch there which
should fix this issue:

https://sourceware.org/bugzilla/show_bug.cgi?id=23715

Reported-by: Hadrien Grasland <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Fixes: a64489c ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was
  introduced, i.e.  using a2l->funcname without checking if it is NULL,
  but this current patch fixes the current codebase, i.e. multiple csets
  were applied after a64489c before the problem was reported by Hadrien ]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Dec 6, 2018
It was observed that a process blocked indefinitely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normally prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page() was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared.

There is some uncertainty in this analysis, but it seems to fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

The customer who reported the hang also reports that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David Howells <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Dec 19, 2018
Function graph tracing recurses into itself when stackleak is enabled,
causing the ftrace graph selftest to run for up to 90 seconds and
trigger the softlockup watchdog.

Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
200             mcount_get_lr_addr        x0    //     pointer to function's saved lr
(gdb) bt
#0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
#1  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
#2  0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106
#3  0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507
#4  0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286
#5  ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
#6  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
#7  0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876
#8  0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650
#9  0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205

Rework so that stackleak_track_stack() is marked as notrace.
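
As a rough user-space illustration of what such an annotation does: the `notrace` macro below is defined locally on the assumption that it maps to the compiler's `no_instrument_function` attribute (the kernel's own definition depends on architecture and configuration), and `track_stack()` is a hypothetical stand-in, not the kernel function.

```c
#include <assert.h>

/* Assumption: local definition for illustration only; the kernel provides
 * its own notrace macro, whose expansion varies by configuration. */
#define notrace __attribute__((no_instrument_function))

static unsigned long lowest_sp = (unsigned long)-1;

/* Hypothetical stand-in for a stack-tracking hook. Because it can be called
 * from instrumented code, it is excluded from function instrumentation so
 * that a tracer cannot re-enter it recursively. */
static notrace void track_stack(unsigned long sp)
{
    if (sp < lowest_sp)
        lowest_sp = sp;   /* remember the lowest stack pointer seen so far */
}
```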

Co-developed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Anders Roxell <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Dec 19, 2018
The *_frag_reasm() functions are susceptible to miscalculating the byte
count of packet fragments in case the truesize of a head buffer changes.
The truesize member may be changed by the call to skb_unclone(), leaving
the fragment memory limit counter unbalanced even if all fragments are
processed. This miscalculation goes unnoticed as long as the network
namespace which holds the counter is not destroyed.
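
The accounting problem can be illustrated with a small stand-alone model. Everything here is hypothetical (`struct buf`, `frag_mem`, `unclone_like()`, `reasm_prepare()`); it only sketches the idea of measuring the truesize change across the call and applying the difference to the counter, and is not the code from the patch.

```c
#include <assert.h>

struct buf { int truesize; };   /* hypothetical stand-in for a head buffer */
static long frag_mem;           /* hypothetical stand-in for the per-netns counter */

/* Charge a queued fragment against the counter. */
static void charge(const struct buf *b)   { frag_mem += b->truesize; }

/* Uncharge a fragment when it is finally released. */
static void uncharge(const struct buf *b) { frag_mem -= b->truesize; }

/* Hypothetical operation that may change truesize, standing in for the
 * effect of skb_unclone() described above. */
static void unclone_like(struct buf *b, int growth) { b->truesize += growth; }

/* Record truesize before the call, then apply any difference to the
 * counter so that the later uncharge balances exactly. */
static void reasm_prepare(struct buf *head, int growth)
{
    int delta = -head->truesize;

    unclone_like(head, growth);
    delta += head->truesize;
    if (delta)
        frag_mem += delta;
}
```

If the adjustment in `reasm_prepare()` were omitted, the final uncharge would subtract more than was ever added, leaving the counter unbalanced.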

Should an attempt be made to destroy a network namespace that holds an
unbalanced fragment memory limit counter, the cleanup of the namespace
never finishes. The thread handling the cleanup gets stuck in
inet_frags_exit_net() waiting for the percpu counter to reach zero. The
thread is usually in running state with a stacktrace similar to:

 PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
  #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
  #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
  #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
  #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
  #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
 #10 [ffff880621563e38] process_one_work at ffffffff81096f14

It is not possible to create new network namespaces, and processes
that call unshare() end up being stuck in uninterruptible sleep state
waiting to acquire the net_mutex.

The bug was observed in the IPv6 netfilter code by Per Sundstrom.
I thank him for his analysis of the problem. The parts of this patch
that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.

Signed-off-by: Jiri Wiesner <[email protected]>
Reported-by: Per Sundstrom <[email protected]>
Acked-by: Peter Oskolkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Feb 14, 2019
Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset contains small fixes in mlxsw and one fix in the bridge
driver.

Patches #1-#4 perform small adjustments in PCI and FID code following
recent tests that were performed on the Spectrum-2 ASIC.

Patch #5 fixes the bridge driver to mark FDB entries that were added by
user as such. Otherwise, these entries will be ignored by underlying
switch drivers.

Patch #6 fixes a long standing issue in mlxsw where the driver
incorrectly programmed static FDB entries as both static and sticky.

Patches #7-#8 add test cases for above mentioned bugs.

Please consider patches #1, #2 and #4 for stable.
====================

Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Feb 14, 2019
When option CONFIG_KASAN is enabled together with ftrace, function
ftrace_graph_caller() gets into a recursion, via functions
kasan_check_read() and kasan_check_write().

 Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 179             mcount_get_pc             x0    //     function's pc
 (gdb) bt
 #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 #1  0xffffff90101406c8 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
 #2  0xffffff90106fd084 in kasan_check_write (p=0xffffffc06c170878, size=4) at ../mm/kasan/common.c:105
 #3  0xffffff90104a2464 in atomic_add_return (v=<optimized out>, i=<optimized out>) at ./include/generated/atomic-instrumented.h:71
 #4  atomic_inc_return (v=<optimized out>) at ./include/generated/atomic-fallback.h:284
 #5  trace_graph_entry (trace=0xffffffc03f5ff380) at ../kernel/trace/trace_functions_graph.c:441
 #6  0xffffff9010481774 in trace_graph_entry_watchdog (trace=<optimized out>) at ../kernel/trace/trace_selftest.c:741
 #7  0xffffff90104a185c in function_graph_enter (ret=<optimized out>, func=<optimized out>, frame_pointer=18446743799894897728, retp=<optimized out>) at ../kernel/trace/trace_functions_graph.c:196
 #8  0xffffff9010140628 in prepare_ftrace_return (self_addr=18446743592948977792, parent=0xffffffc03f5ff418, frame_pointer=18446743799894897728) at ../arch/arm64/kernel/ftrace.c:231
 #9  0xffffff90101406f4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 (gdb)

Rework so that the kasan implementation isn't traced.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anders Roxell <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Tested-by: Dmitry Vyukov <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
…_map

Detected via gcc's ASan:

  Direct leak of 2048 byte(s) in 64 object(s) allocated from:
    #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370)
    #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43
    #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85
    #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250
    #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382
    #5 0x556b0f0e3231 in print_events util/parse-events.c:2514
    #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
    #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
    #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
    #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
    #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520
    #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 8989605 ("perf tools: Do not put a variable sized type not at the end of a struct")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
Detected with gcc's ASan:

  Direct leak of 66 byte(s) in 5 object(s) allocated from:
      #0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x560c8761034d in collect_config util/config.c:597
      #2 0x560c8760d9cb in get_value util/config.c:169
      #3 0x560c8760dfd7 in perf_parse_file util/config.c:285
      #4 0x560c8760e0d2 in perf_config_from_file util/config.c:476
      #5 0x560c876108fd in perf_config_set__init util/config.c:661
      #6 0x560c87610c72 in perf_config_set__new util/config.c:709
      #7 0x560c87610d2f in perf_config__init util/config.c:718
      #8 0x560c87610e5d in perf_config util/config.c:730
      #9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442
      #10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Taeung Song <[email protected]>
Fixes: 20105ca ("perf config: Introduce perf_config_set class")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
Detected with gcc's ASan:

  Direct leak of 4356 byte(s) in 120 object(s) allocated from:
      #0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215
      #2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339
      #3 0x55719af66272 in print_events util/parse-events.c:2542
      #4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
      #5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520
      #9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 40218da ("perf list: Show SDT and pre-cached events")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
…r-free issue

The evlist should be destroyed before the perf session.

Detected with gcc's ASan:

  =================================================================
  ==27350==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b000002e38 at pc 0x5611da276999 bp 0x7ffce8f1d1a0 sp 0x7ffce8f1d190
  WRITE of size 8 at 0x62b000002e38 thread T0
      #0 0x5611da276998 in __list_del /home/work/linux/tools/include/linux/list.h:89
      #1 0x5611da276d4a in __list_del_entry /home/work/linux/tools/include/linux/list.h:102
      #2 0x5611da276e77 in list_del_init /home/work/linux/tools/include/linux/list.h:145
      #3 0x5611da2781cd in thread__put util/thread.c:130
      #4 0x5611da2cc0a8 in __thread__zput util/thread.h:68
      #5 0x5611da2d2dcb in hist_entry__delete util/hist.c:1148
      #6 0x5611da2cdf91 in hists__delete_entry util/hist.c:337
      #7 0x5611da2ce19e in hists__delete_entries util/hist.c:365
      #8 0x5611da2db2ab in hists__delete_all_entries util/hist.c:2639
      #9 0x5611da2db325 in hists_evsel__exit util/hist.c:2651
      #10 0x5611da1c5352 in perf_evsel__exit util/evsel.c:1304
      #11 0x5611da1c5390 in perf_evsel__delete util/evsel.c:1309
      #12 0x5611da1b35f0 in perf_evlist__purge util/evlist.c:124
      #13 0x5611da1b38e2 in perf_evlist__delete util/evlist.c:148
      #14 0x5611da069781 in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1645
      #15 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #16 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #17 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #18 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #19 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
      #20 0x5611d9ff35c9 in _start (/home/work/linux/tools/perf/perf+0x3e95c9)

  0x62b000002e38 is located 11320 bytes inside of 27448-byte region [0x62b000000200,0x62b000006d38)
  freed by thread T0 here:
      #0 0x7fdccb04ab70 in free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedb70)
      #1 0x5611da260df4 in perf_session__delete util/session.c:201
      #2 0x5611da063de5 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1300
      #3 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642
      #4 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #5 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #6 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #7 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #8 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  previously allocated by thread T0 here:
      #0 0x7fdccb04b138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5611da26010c in zalloc util/util.h:23
      #2 0x5611da260824 in perf_session__new util/session.c:118
      #3 0x5611da0633a6 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1192
      #4 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642
      #5 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #6 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #7 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #8 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #9 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  SUMMARY: AddressSanitizer: heap-use-after-free /home/work/linux/tools/include/linux/list.h:89 in __list_del
  Shadow bytes around the buggy address:
    0x0c567fff8570: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  =>0x0c567fff85c0: fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd
    0x0c567fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8610: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  ==27350==ABORTING

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
The array str[] should have six elements.
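
One way to catch this class of mismatch at build time is to tie the table size to the enum it is indexed by. The sketch below uses hypothetical names (`enum flush_how`, `flush_names`, `flush_name()`), not the identifiers from the patch, and relies on C11 `_Static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum with six values plus a terminating count. */
enum flush_how { FLUSH_A, FLUSH_B, FLUSH_C, FLUSH_D, FLUSH_E, FLUSH_F, FLUSH_MAX };

/* One printable name per enum value. */
static const char * const flush_names[] = { "A", "B", "C", "D", "E", "F" };

/* Fail the build if an enum value is added without a matching name. */
_Static_assert(sizeof(flush_names) / sizeof(flush_names[0]) == FLUSH_MAX,
               "flush_names must have one entry per enum flush_how value");

/* Bounds-checked lookup, so an unexpected value never reads past the array. */
static const char *flush_name(enum flush_how how)
{
    return ((size_t)how < (size_t)FLUSH_MAX) ? flush_names[how] : "?";
}
```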

  =================================================================
  ==4322==ERROR: AddressSanitizer: global-buffer-overflow on address 0x56463844e300 at pc 0x564637e7ad0d bp 0x7f30c8c89d10 sp 0x7f30c8c89d00
  READ of size 8 at 0x56463844e300 thread T9
      #0 0x564637e7ad0c in __ordered_events__flush util/ordered-events.c:316
      #1 0x564637e7b0e4 in ordered_events__flush util/ordered-events.c:338
      #2 0x564637c6a57d in process_thread /home/changbin/work/linux/tools/perf/builtin-top.c:1073
      #3 0x7f30d173a163 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8163)
      #4 0x7f30cfffbdee in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11adee)

  0x56463844e300 is located 32 bytes to the left of global variable 'flags' defined in 'util/trace-event-parse.c:229:26' (0x56463844e320) of size 192
  0x56463844e300 is located 0 bytes to the right of global variable 'str' defined in 'util/ordered-events.c:268:28' (0x56463844e2e0) of size 32
  SUMMARY: AddressSanitizer: global-buffer-overflow util/ordered-events.c:316 in __ordered_events__flush
  Shadow bytes around the buggy address:
    0x0ac947081c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c50: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 00 00 00 00
  =>0x0ac947081c60:[f9]f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c70: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
    0x0ac947081c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  Thread T9 created by T0 here:
      #0 0x7f30d179de5f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x4ae5f)
      #1 0x564637c6b954 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1253
      #2 0x564637c7173c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642
      #3 0x564637d85038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #4 0x564637d85577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #5 0x564637d8597b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #6 0x564637d860e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #7 0x7f30cff0509a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Jiri Olsa <[email protected]>
Fixes: 16c66bc ("perf top: Add processing thread")
Fixes: 68ca5d0 ("perf ordered_events: Add ordered_events__flush_time interface")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
Using gcc's ASan, Changbin reports:

  =================================================================
  ==7494==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 48 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e5330a5e in zalloc util/util.h:23
      #2 0x5625e5330a9b in perf_counts__new util/counts.c:10
      #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #7 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #10 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 72 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e532560d in zalloc util/util.h:23
      #2 0x5625e532566b in xyarray__new util/xyarray.c:10
      #3 0x5625e5330aba in perf_counts__new util/counts.c:15
      #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #8 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #11 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.

Reported-by: Changbin Du <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
…_event_on_all_cpus test

  =================================================================
  ==7497==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45
      #2 0x5625e5326703 in cpu_map__read util/cpumap.c:103
      #3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120
      #4 0x5625e5326915 in cpu_map__new util/cpumap.c:135
      #5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36
      #6 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #7 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #9 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
  =================================================================
  ==7506==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 13 byte(s) in 3 object(s) allocated from:
      #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x5625e53aaef0 in expr__find_other util/expr.y:221
      #2 0x5625e51bcd3f in test__expr tests/expr.c:52
      #3 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #4 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #6 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 29, 2019
  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x55bd50005599 in zalloc util/util.h:23
      #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Apr 2, 2019
gpio-aspeed implements support for PIN_CONFIG_INPUT_DEBOUNCE. As of
v5.1-rc1 we're seeing the following when booting a Romulus BMC kernel:

> [   21.373137] ------------[ cut here ]------------
> [   21.374545] WARNING: CPU: 0 PID: 1 at drivers/gpio/gpio-aspeed.c:834 unregister_allocated_timer+0x38/0x94
> [   21.376181] No timer allocated to offset 74
> [   21.378800] CPU: 0 PID: 1 Comm: swapper Not tainted 5.1.0-rc1-dirty #6
> [   21.378800] Hardware name: Generic DT based system
> [   21.379965] Backtrace:
> [   21.381024] [<80107d44>] (dump_backtrace) from [<80107f78>] (show_stack+0x20/0x24)
> [   21.382713]  r7:8038b720 r6:00000009 r5:00000000 r4:87897c64
> [   21.383815] [<80107f58>] (show_stack) from [<80656398>] (dump_stack+0x20/0x28)
> [   21.385042] [<80656378>] (dump_stack) from [<80115f1c>] (__warn.part.3+0xb4/0xdc)
> [   21.386253] [<80115e68>] (__warn.part.3) from [<80115fb0>] (warn_slowpath_fmt+0x6c/0x90)
> [   21.387471]  r6:00000342 r5:807f8758 r4:80a07008
> [   21.388278] [<80115f48>] (warn_slowpath_fmt) from [<8038b720>] (unregister_allocated_timer+0x38/0x94)
> [   21.389809]  r3:0000004a r2:807f8774
> [   21.390526]  r7:00000000 r6:0000000a r5:60000153 r4:0000004a
> [   21.391601] [<8038b6e8>] (unregister_allocated_timer) from [<8038baac>] (aspeed_gpio_set_config+0x330/0x48c)
> [   21.393248] [<8038b77c>] (aspeed_gpio_set_config) from [<803840b0>] (gpiod_set_debounce+0xe8/0x114)
> [   21.394745]  r10:82ee2248 r9:00000000 r8:87b63a00 r7:00001388 r6:87947320 r5:80729310
> [   21.396030]  r4:879f64a0
> [   21.396499] [<80383fc8>] (gpiod_set_debounce) from [<804b4350>] (gpio_keys_probe+0x69c/0x8e0)
> [   21.397715]  r7:845d94b8 r6:00000001 r5:00000000 r4:87b63a1c
> [   21.398618] [<804b3cb4>] (gpio_keys_probe) from [<8040eeec>] (platform_dev_probe+0x44/0x80)
> [   21.399834]  r10:00000003 r9:80a3a8b0 r8:00000000 r7:00000000 r6:80a7f9dc r5:80a3a8b0
> [   21.401163]  r4:8796bc10
> [   21.401634] [<8040eea8>] (platform_drv_probe) from [<8040d0d4>] (really_probe+0x208/0x3dc)
> [   21.402786]  r5:80a7f8d0 r4:8796bc10
> [   21.403547] [<8040cecc>] (really_probe) from [<8040d7a4>] (driver_probe_device+0x130/0x170)
> [   21.404744]  r10:0000007b r9:8093683c r8:00000000 r7:80a07008 r6:80a3a8b0 r5:8796bc10
> [   21.405854]  r4:80a3a8b0
> [   21.406324] [<8040d674>] (driver_probe_device) from [<8040da8c>] (device_driver_attach+0x68/0x70)
> [   21.407568]  r9:8093683c r8:00000000 r7:80a07008 r6:80a3a8b0 r5:00000000 r4:8796bc10
> [   21.408877] [<8040da24>] (device_driver_attach) from [<8040db14>] (__driver_attach+0x80/0x150)
> [   21.410327]  r7:80a07008 r6:8796bc10 r5:00000001 r4:80a3a8b0
> [   21.411294] [<8040da94>] (__driver_attach) from [<8040b20c>] (bus_for_each_dev+0x80/0xc4)
> [   21.412641]  r7:80a07008 r6:8040da94 r5:80a3a8b0 r4:87966f30
> [   21.413580] [<8040b18c>] (bus_for_each_dev) from [<8040dc0c>] (driver_attach+0x28/0x30)
> [   21.414943]  r7:00000000 r6:87b411e0 r5:80a33fc8 r4:80a3a8b0
> [   21.415927] [<8040dbe4>] (driver_attach) from [<8040bbf0>] (bus_add_driver+0x14c/0x200)
> [   21.417289] [<8040baa4>] (bus_add_driver) from [<8040e2b4>] (driver_register+0x84/0x118)
> [   21.418652]  r7:80a60ae0 r6:809226b8 r5:80a07008 r4:80a3a8b0
> [   21.419652] [<8040e230>] (driver_register) from [<8040fc28>] (__platform_driver_register+0x3c/0x50)
> [   21.421193]  r5:80a07008 r4:809525f8
> [   21.421990] [<8040fbec>] (__platform_driver_register) from [<809226d8>] (gpio_keys_init+0x20/0x28)
> [   21.423447] [<809226b8>] (gpio_keys_init) from [<8090128c>] (do_one_initcall+0x80/0x180)
> [   21.424886] [<8090120c>] (do_one_initcall) from [<80901538>] (kernel_init_freeable+0x1ac/0x26c)
> [   21.426354]  r8:80a60ae0 r7:80a60ae0 r6:8093685c r5:00000008 r4:809525f8
> [   21.427579] [<8090138c>] (kernel_init_freeable) from [<8066d9a0>] (kernel_init+0x18/0x11c)
> [   21.428819]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:8066d988
> [   21.429947]  r4:00000000
> [   21.430415] [<8066d988>] (kernel_init) from [<801010e8>] (ret_from_fork+0x14/0x2c)
> [   21.431666] Exception stack(0x87897fb0 to 0x87897ff8)
> [   21.432877] 7fa0:                                     00000000 00000000 00000000 00000000
> [   21.434446] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [   21.436052] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [   21.437308]  r5:8066d988 r4:00000000
> [   21.438102] ---[ end trace d7d7ac3a80567d0e ]---

We only hit unregister_allocated_timer() if the argument to
aspeed_gpio_set_config() is 0, but we can't be calling through
gpiod_set_debounce() from gpio_keys_probe() unless the gpio-keys button has a
non-zero debounce interval.

Commit 6581eaf ("gpio: use new gpio_set_config() helper in more places")
spreads the use of gpio_set_config() to the debounce and transitory
state configuration paths. The implementation of gpio_set_config() is:

> static int gpio_set_config(struct gpio_chip *gc, unsigned offset,
> 			   enum pin_config_param mode)
> {
> 	unsigned long config = { PIN_CONF_PACKED(mode, 0) };
>
> 	return gc->set_config ? gc->set_config(gc, offset, config) : -ENOTSUPP;
> }

Here it packs its own config value with a fixed argument of 0; this is
incorrect behaviour for implementing the debounce and transitory functions, and
the debounce and transitory gpio_set_config() call-sites now have an undetected
type mismatch as they both already pack their own config parameter (i.e. what
gets passed is not an `enum pin_config_param`). Indeed this can be seen in the
small diff for 6581eaf:

> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index de595fa31a1a..1f239aac43df 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -2725,7 +2725,7 @@ int gpiod_set_debounce(struct gpio_desc *desc, unsigned debounce)
>         }
>
>         config = pinconf_to_config_packed(PIN_CONFIG_INPUT_DEBOUNCE, debounce);
> -       return chip->set_config(chip, gpio_chip_hwgpio(desc), config);
> +       return gpio_set_config(chip, gpio_chip_hwgpio(desc), config);
>  }
>  EXPORT_SYMBOL_GPL(gpiod_set_debounce);
>
> @@ -2762,7 +2762,7 @@ int gpiod_set_transitory(struct gpio_desc *desc, bool transitory)
>         packed = pinconf_to_config_packed(PIN_CONFIG_PERSIST_STATE,
>                                           !transitory);
>         gpio = gpio_chip_hwgpio(desc);
> -       rc = chip->set_config(chip, gpio, packed);
> +       rc = gpio_set_config(chip, gpio, packed);
>         if (rc == -ENOTSUPP) {
>                 dev_dbg(&desc->gdev->dev, "Persistence not supported for GPIO %d\n",
>                                 gpio);

Revert commit 6581eaf ("gpio: use new gpio_set_config() helper in
more places") to restore correct behaviour for gpiod_set_debounce() and
gpiod_set_transitory().
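
The double packing described above can be reproduced in a small user-space sketch. Everything prefixed `toy_`/`TOY_` is hypothetical and only models the layout (parameter in the low 8 bits, argument above it); the numeric value chosen for the debounce parameter is an assumption, not taken from the kernel headers.

```c
#include <assert.h>

/* Toy stand-ins for the pinconf packing helpers: parameter in the low
 * 8 bits, argument in the bits above. Names and values are assumptions. */
#define TOY_PIN_CONF_PACKED(p, a) \
	(((unsigned long)(a) << 8) | ((unsigned long)(p) & 0xffUL))
#define TOY_UNPACK_PARAM(c) ((unsigned long)(c) & 0xffUL)
#define TOY_UNPACK_ARG(c)   ((unsigned long)(c) >> 8)

enum { TOY_PIN_CONFIG_INPUT_DEBOUNCE = 12 }; /* made-up value */

/* Models gpio_set_config(): treats its input as a bare parameter and
 * packs it with a fixed argument of 0. */
static unsigned long toy_gpio_set_config(unsigned long mode)
{
	return TOY_PIN_CONF_PACKED(mode, 0);
}

/* Models the call path after 6581eaf: the caller packs the debounce
 * value, then the helper packs the already-packed value again. */
static unsigned long toy_debounce_config_seen_by_driver(unsigned int debounce)
{
	unsigned long packed =
		TOY_PIN_CONF_PACKED(TOY_PIN_CONFIG_INPUT_DEBOUNCE, debounce);

	return toy_gpio_set_config(packed);
}
```

In this model the driver still sees the debounce parameter, but the argument is masked away and arrives as 0, which would match the path into unregister_allocated_timer() shown in the warning.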

Cc: Thomas Petazzoni <[email protected]>
Signed-off-by: Andrew Jeffery <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request May 17, 2019
Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset contains various small fixes for mlxsw.

Patch #1 fixes a warning generated by switchdev core when the driver
fails to insert an MDB entry in the commit phase.

Patches #2-#4 fix a warning in check_flush_dependency() that can be
triggered when a work item in a WQ_MEM_RECLAIM workqueue tries to flush
a non-WQ_MEM_RECLAIM workqueue.

It seems that the semantics of the WQ_MEM_RECLAIM flag are not very
clear [1] and that various patches have been sent to remove it from
various workqueues throughout the kernel [2][3][4] in order to silence
the warning.

These patches do the same for the workqueues created by mlxsw that
probably should not have been created with this flag in the first place.

Patch #5 fixes a regression where an IP address cannot be assigned to a
VRF upper due to erroneous MAC validation check. Patch #6 adds a test
case.

Patch #7 adjusts Spectrum-2 shared buffer configuration to be compatible
with Spectrum-1. The problem and fix are described in detail in the
commit message.

Please consider patches #1-#5 for 5.0.y. I verified they apply cleanly.

[1] https://patchwork.kernel.org/patch/10791315/
[2] Commit ce162bf ("mac80211_hwsim: don't use WQ_MEM_RECLAIM")
[3] Commit 39baf10 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
[4] Commit 75215e5 ("iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM")
====================

Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request May 17, 2019
By calling maps__insert() we expect to take 2 references on the map,
which we release within the maps__remove() call.

However, if a map with the same name already exists, we currently don't
bump the reference and can crash, like:

  Program received signal SIGABRT, Aborted.
  0x00007ffff75e60f5 in raise () from /lib64/libc.so.6

  (gdb) bt
  #0  0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
  #1  0x00007ffff75d0895 in abort () from /lib64/libc.so.6
  #2  0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
  #3  0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
  #4  0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
  #5  refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
  #6  map__put (map=0x1224df0) at util/map.c:299
  #7  0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
  #8  maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
  #9  0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65
  #10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728
  #11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741
  #12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
  #13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
  #14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322
  #15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
  ...

Add the map to the list and get the reference even if we find a
map with the same name.
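
The imbalance can be illustrated with a toy reference counter. This is a minimal sketch under the assumption that each of two lookup structures holds one reference; the `toy_` names are hypothetical and this is not the perf code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a map sits on two lookup structures (by address and by
 * name) and each one is expected to hold its own reference. */
struct toy_map {
	int refcnt;
};

/* 'fixed' selects whether a duplicate name still takes the second
 * reference (the behaviour the commit message asks for). */
static void toy_maps_insert(struct toy_map *map, bool name_exists, bool fixed)
{
	map->refcnt++;			/* held by the by-address structure */
	if (!name_exists || fixed)
		map->refcnt++;		/* held by the by-name structure */
}

/* Removal always drops two references, as the text above assumes. */
static void toy_maps_remove(struct toy_map *map)
{
	map->refcnt -= 2;
}
```

Starting from a caller-held count of 1, an insert with a duplicate name followed by a remove ends at 0 in the unfixed variant (the caller's reference has been consumed) and at 1 in the fixed variant.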

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Eric Saint-Etienne <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Fixes: 1e62856 ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jun 12, 2019
… allocation

After memory allocation failure vc_allocate() doesn't clean up data
which has been initialized in visual_init(). In case of fbcon this
leads to divide-by-0 in fbcon_init() on next open of the same tty.

memory allocation in vc_allocate() may fail here:
1097:     vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_KERNEL);

on next open() fbcon_init() skips vc_font.data initialization:
1088:     if (!p->fontdata) {

division by zero in fbcon_init() happens here:
1149:     new_cols /= vc->vc_font.width;

Additional check is needed in fbcon_deinit() to prevent
usage of uninitialized vc_screenbuf:

1251:        if (vc->vc_hi_font_mask && vc->vc_screenbuf)
1252:                set_vc_hi_font(vc, false);

Crash:

 #6 [ffffc90001eafa60] divide_error at ffffffff81a00be4
    [exception RIP: fbcon_init+463]
    RIP: ffffffff814b860f  RSP: ffffc90001eafb18  RFLAGS: 00010246
...
 #7 [ffffc90001eafb60] visual_init at ffffffff8154c36e
 #8 [ffffc90001eafb80] vc_allocate at ffffffff8154f53c
 #9 [ffffc90001eafbc8] con_install at ffffffff8154f624
...

Signed-off-by: Grzegorz Halat <[email protected]>
Reviewed-by: Oleksandr Natalenko <[email protected]>
Acked-by: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jun 12, 2019
When we have holes in a normal memory zone, we could end up having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled (via __reset_isolation_suitable(),
triggered by kswapd).

Later, if we fail to find a page via fast_isolate_freepages(), we may end
up using the migrate_pfn we started the search with as a valid page.  This
could lead to NULL pointer dereferences like the one below, due to an
invalid mem_section pointer.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
 [0000000000000008] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 ...
 CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
 pstate: 60000005 (nZCv daif -PAN -UAO)
 pc : set_pfnblock_flags_mask+0x58/0xe8
 lr : compaction_alloc+0x300/0x950
 [...]
 Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
 Call trace:
  set_pfnblock_flags_mask+0x58/0xe8
  compaction_alloc+0x300/0x950
  migrate_pages+0x1a4/0xbb0
  compact_zone+0x750/0xde8
  compact_zone_order+0xd8/0x118
  try_to_compact_pages+0xb4/0x290
  __alloc_pages_direct_compact+0x84/0x1e0
  __alloc_pages_nodemask+0x5e0/0xe18
  alloc_pages_vma+0x1cc/0x210
  do_huge_pmd_anonymous_page+0x108/0x7c8
  __handle_mm_fault+0xdd4/0x1190
  handle_mm_fault+0x114/0x1c0
  __get_user_pages+0x198/0x3c0
  get_user_pages_unlocked+0xb4/0x1d8
  __gfn_to_pfn_memslot+0x12c/0x3b8
  gfn_to_pfn_prot+0x4c/0x60
  kvm_handle_guest_abort+0x4b0/0xcd8
  handle_exit+0x140/0x1b8
  kvm_arch_vcpu_ioctl_run+0x260/0x768
  kvm_vcpu_ioctl+0x490/0x898
  do_vfs_ioctl+0xc4/0x898
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0x74/0x118
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
 ---[ end trace af6a35219325a9b6 ]---

The issue was reported on an arm64 server with 128GB with holes in the
zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.

This patch fixes the issue by ensuring that the page belongs to a valid
PFN when we fallback to using the lower limit of the scan range upon
failure in fast_isolate_freepages().
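
As a rough illustration of the check, the sketch below models the reported memory layout and refuses to hand back a fallback PFN that lies in a hole. The `toy_` helpers, the 4 KiB page size and the sentinel value are assumptions for this example; it is not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* With 4 KiB pages, 1 GiB covers 2^18 page frames. */
#define TOY_GB_TO_PFN(gb) ((unsigned long long)(gb) << 18)
#define TOY_INVALID_PFN   (~0ULL)

struct toy_range {
	unsigned long long start_pfn;
	unsigned long long end_pfn;	/* exclusive */
};

/* The layout quoted in the report: 32GB at 4GB and 96GB at 544GB. */
static const struct toy_range toy_zone[] = {
	{ TOY_GB_TO_PFN(4),   TOY_GB_TO_PFN(4 + 32) },
	{ TOY_GB_TO_PFN(544), TOY_GB_TO_PFN(544 + 96) },
};

/* True only if the PFN is backed by one of the populated ranges. */
static bool toy_pfn_valid(unsigned long long pfn)
{
	for (unsigned int i = 0; i < sizeof(toy_zone) / sizeof(toy_zone[0]); i++)
		if (pfn >= toy_zone[i].start_pfn && pfn < toy_zone[i].end_pfn)
			return true;
	return false;
}

/* Fallback when the fast search found nothing: only return the lower
 * limit of the scan range if it is actually backed by memory. */
static unsigned long long toy_fallback_pfn(unsigned long long low_pfn)
{
	return toy_pfn_valid(low_pfn) ? low_pfn : TOY_INVALID_PFN;
}
```

A PFN at 100GB falls between the two ranges, so the fallback yields the sentinel instead of a frame with no backing page.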

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5a81188 ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Suzuki K Poulose <[email protected]>
Reported-by: Marc Zyngier <[email protected]>
Reviewed-by: Mel Gorman <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
This patch addresses below two issues and prepares the code to address
3rd issue listed below.

1. mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without creating
its supporting underlying vendor device, mdev driver's probe() gets
triggered.  However there isn't a stable mdev available to work on.

   create_store()
     mdev_create_device()
       device_register()
          ...
         vfio_mdev_probe()
        [...]
        parent->ops->create()
          vfio_ap_mdev_create()
            mdev_set_drvdata(mdev, matrix_mdev);
            /* Valid pointer set above */

Because of this initialization order, an mdev driver that wants to use the
mdev doesn't have a valid mdev to work on.

2. Current creation sequence is,
   parent->ops->create()
   groups_register()

Remove sequence is,
   parent->ops->remove()
   groups_unregister()

However, remove sequence should be exact mirror of creation sequence.
Once this is achieved, all users of the mdev will be terminated first
before removing underlying vendor device.
(Follow standard linux driver model).
At that point vendor's remove() ops shouldn't fail because taking the
device off the bus should terminate any usage.

3. When remove operation fails, mdev sysfs removal attempts to add the
file back on already removed device. Following call trace [1] is observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved in following ways.

1. Split the device registration/deregistration sequence so that some
things can be done between initialization of the device and hooking it
up to the bus respectively after deregistering it from the bus but
before giving up our final reference.
In particular, this means invoking the ->create() and ->remove()
callbacks in those new windows. This gives the vendor driver an
initialized mdev device to work with during creation.
At the same time, a bus driver that wishes to bind to the mdev driver also
gets an initialized mdev device.

This follows standard Linux kernel bus and device model.

2. During remove flow, first remove the device from the bus. This
ensures that any bus specific devices are removed.
Once device is taken off the mdev bus, invoke remove() of mdev
from the vendor driver.

3. The driver core device model provides way to register and auto
unregister the device sysfs attribute groups at dev->groups.
Make use of dev->groups to let core create the groups and eliminate
code to avoid explicit groups creation and removal.
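
The ordering described in the points above can be sketched as a toy step log in which removal is the exact reverse of creation. The step labels and `toy_` functions are hypothetical; they only stand for "initialise the device", "call the vendor callback" and "attach to or detach from the bus".

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_STEPS 8

struct toy_log {
	const char *steps[TOY_MAX_STEPS];
	int n;
};

static void toy_record(struct toy_log *log, const char *step)
{
	if (log->n < TOY_MAX_STEPS)
		log->steps[log->n++] = step;
}

/* Creation: initialise, let the vendor driver set up, then go on the bus. */
static void toy_mdev_create(struct toy_log *log)
{
	toy_record(log, "init");	/* device object initialised */
	toy_record(log, "vendor");	/* vendor create() sees a valid device */
	toy_record(log, "bus");		/* bus drivers may probe from here on */
}

/* Removal: off the bus first, then the vendor driver, then the final put. */
static void toy_mdev_remove(struct toy_log *log)
{
	toy_record(log, "bus");
	toy_record(log, "vendor");
	toy_record(log, "init");
}

/* Returns 1 when log b lists the steps of log a in reverse order. */
static int toy_is_mirror(const struct toy_log *a, const struct toy_log *b)
{
	if (a->n != b->n)
		return 0;
	for (int i = 0; i < a->n; i++)
		if (strcmp(a->steps[i], b->steps[b->n - 1 - i]) != 0)
			return 0;
	return 1;
}
```

Keeping the two sequences mirrored is what lets every user of the device be terminated before the vendor remove callback runs.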

To confirm that the new sequence is solid, the stack dump below was taken
from a process that attempts to remove the device while the device is in
use by the vfio driver and a user application.
This stack dump validates that the vfio driver guards against such device
removal when the device is in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

This prepares the code to eliminate calling device_create_file() in
subsequent patch.

Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
If device removal is initiated by two threads as below, mdev core
attempts to create a sysfs remove file on a stale device.
During this flow, the call trace below [1] is observed.

     cpu-0                                    cpu-1
     -----                                    -----
  mdev_unregister_device()
    device_for_each_child
       mdev_device_remove_cb
          mdev_device_remove
                                       user_syscall
                                         remove_store()
                                           mdev_device_remove()
                                        [..]
   unregister device();
                                       /* not found in list or
                                        * active=false.
                                        */
                                          sysfs_create_file()
                                          ..Call trace

Now that mdev core follows correct device removal sequence of the linux
bus model, remove shouldn't fail in normal cases. If it fails, there is
no point of creating a stale file or checking for specific error status.

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
In following sequences, child devices created while removing mdev parent
device can be left out, or it may lead to race of removing half
initialized child mdev devices.

issue-1:
--------
       cpu-0                         cpu-1
       -----                         -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:
--------
Below crash is observed when user initiated remove is in progress
and mdev_unregister_driver() completes parent unregistration.

       cpu-0                         cpu-1
       -----                         -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parent->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This is a similar race to create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at ffffffffc0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops: 0000 [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved as below to overcome above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering parent device.
This continues to allow multiple create and remove to progress in
parallel for different mdev devices as most common case.
At the same time guard parent removal while parent is being accessed by
create() and remove() callbacks.
create()/remove() and unregister_device() are synchronized by the rwsem.

Refactor device removal code to mdev_device_remove_common() to avoid
acquiring unreg_sem of the parent.
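
The reader/writer rule behind this can be sketched with a single-threaded state model. It is not a real lock and not the kernel rwsem API; the `toy_` names are hypothetical, and the model only shows which combinations are allowed: many create()/remove() callers at once as readers, or one unregistering writer, never both.

```c
#include <assert.h>
#include <stdbool.h>

/* State model of a reader/writer semaphore (no atomicity, no blocking). */
struct toy_rwsem {
	int readers;
	bool writer;
};

/* create()/remove() side: succeeds unless unregistration holds the lock. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* Unregister side: needs exclusive ownership, so it cannot proceed while
 * any create()/remove() is still in flight. */
static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}
```

In the description above the unregister path waits for in-flight callers rather than failing; the trylock form is used here only so the state transitions can be checked without threads.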

Fixes: 7b96953 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset contains various fixes for mlxsw.

Patch #1 fixes a hash polarization problem when a nexthop device is a
LAG device. This is caused by the fact that the same seed is used for
the LAG and ECMP hash functions.
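
A toy model can show why a shared seed polarizes traffic. The hash below is an arbitrary integer mixer chosen for this sketch, not the hardware hash, and the two-way ECMP group and two-port LAG are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary integer mixer standing in for a seeded hash function. */
static uint32_t toy_hash(uint32_t flow, uint32_t seed)
{
	uint32_t h = flow ^ seed;

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h;
}

/* Counts how many of the 2 LAG ports are used by the flows that a 2-way
 * ECMP stage sent to next hop 0. */
static int toy_lag_ports_used(uint32_t ecmp_seed, uint32_t lag_seed,
			      uint32_t nflows)
{
	int used[2] = { 0, 0 };

	for (uint32_t flow = 0; flow < nflows; flow++) {
		if (toy_hash(flow, ecmp_seed) % 2 != 0)
			continue;	/* ECMP picked the other next hop */
		used[toy_hash(flow, lag_seed) % 2] = 1;
	}
	return used[0] + used[1];
}
```

When both stages use the same seed, every flow that reaches the LAG has already been filtered to an even hash value, so the LAG stage can only ever select port 0. With distinct seeds the second stage is no longer forced to repeat the first stage's decision.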

Patch #2 fixes an issue in which the driver fails to refresh a nexthop
neighbour after it becomes dead. This prevents the nexthop from ever
being written to the adjacency table and used to forward traffic. Patch

Patch #4 fixes a wrong extraction of TOS value in flower offload code.
Patch #5 is a test case.

Patch #6 works around a buffer issue in Spectrum-2 by reducing the
default sizes of the shared buffer pools.

Patch #7 prevents prio-tagged packets from entering the switch when PVID
is removed from the bridge port.

Please consider patches #2, #4 and #6 for 5.1.y
====================

Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
Currently the calculation of end_pfn can round up the pfn number to more
than the actual maximum number of pfns, causing an Oops.  Fix this by
ensuring end_pfn is never more than max_pfn.
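
The clamp can be sketched as follows. The macro names, the 64-bit chunk size and the helper are assumptions made for this illustration and do not reproduce the patched function.

```c
#include <assert.h>

#define TOY_BITS_PER_BYTE     8UL
#define TOY_BITMAP_CHUNK_BITS 64UL
/* Round x up to the next multiple of a. */
#define TOY_ALIGN_UP(x, a)    ((((x) + (a) - 1) / (a)) * (a))

/* End of the PFN window covered by 'count' bytes of bitmap starting at
 * 'pfn', never allowed to run past max_pfn. */
static unsigned long toy_end_pfn(unsigned long pfn, unsigned long count,
				 unsigned long max_pfn)
{
	unsigned long end_pfn = pfn + count * TOY_BITS_PER_BYTE;

	if (end_pfn > max_pfn)
		end_pfn = max_pfn;
	return end_pfn;
}
```

For example, with max_pfn = 1000, rounding up to a 64-bit chunk boundary gives 1024, which is past the last valid PFN; clamping keeps the loop bound at 1000.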

This can be easily triggered on systems where the end_pfn gets
rounded up to more than max_pfn using the idle-page stress-ng stress test:

sudo stress-ng --idle-page 0

  BUG: unable to handle kernel paging request at 00000000000020d8
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:page_idle_get_page+0xc8/0x1a0
  Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48
  RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202
  RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f
  RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700
  RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276
  R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080
  R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400
  FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0
  Call Trace:
    page_idle_bitmap_write+0x8c/0x140
    sysfs_kf_bin_write+0x5c/0x70
    kernfs_fop_write+0x12e/0x1b0
    __vfs_write+0x1b/0x40
    vfs_write+0xab/0x1b0
    ksys_write+0x55/0xc0
    __x64_sys_write+0x1a/0x20
    do_syscall_64+0x5a/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 33c3fc7 ("mm: introduce idle page tracking")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
The req->body should be updated before req->state is updated and the
order should be guaranteed by a barrier.

Otherwise, read_reply() might return req->body = NULL.

Below is sample callstack when the issue is reproduced on purpose by
reordering the updates of req->body and req->state and adding delay in
code between updates of req->state and req->body.

[   22.356105] general protection fault: 0000 [#1] SMP PTI
[   22.361185] CPU: 2 PID: 52 Comm: xenwatch Not tainted 5.5.0xen+ #6
[   22.366727] Hardware name: Xen HVM domU, BIOS ...
[   22.372245] RIP: 0010:_parse_integer_fixup_radix+0x6/0x60
... ...
[   22.392163] RSP: 0018:ffffb2d64023fdf0 EFLAGS: 00010246
[   22.395933] RAX: 0000000000000000 RBX: 75746e7562755f6d RCX: 0000000000000000
[   22.400871] RDX: 0000000000000000 RSI: ffffb2d64023fdfc RDI: 75746e7562755f6d
[   22.405874] RBP: 0000000000000000 R08: 00000000000001e8 R09: 0000000000cdcdcd
[   22.410945] R10: ffffb2d6402ffe00 R11: ffff9d95395eaeb0 R12: ffff9d9535935000
[   22.417613] R13: ffff9d9526d4a000 R14: ffff9d9526f4f340 R15: ffff9d9537654000
[   22.423726] FS:  0000000000000000(0000) GS:ffff9d953bc80000(0000) knlGS:0000000000000000
[   22.429898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.434342] CR2: 000000c4206a9000 CR3: 00000001ea3fc002 CR4: 00000000001606e0
[   22.439645] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   22.444941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   22.450342] Call Trace:
[   22.452509]  simple_strtoull+0x27/0x70
[   22.455572]  xenbus_transaction_start+0x31/0x50
[   22.459104]  netback_changed+0x76c/0xcc1 [xen_netfront]
[   22.463279]  ? find_watch+0x40/0x40
[   22.466156]  xenwatch_thread+0xb4/0x150
[   22.469309]  ? wait_woken+0x80/0x80
[   22.472198]  kthread+0x10e/0x130
[   22.474925]  ? kthread_park+0x80/0x80
[   22.477946]  ret_from_fork+0x35/0x40
[   22.480968] Modules linked in: xen_kbdfront xen_fbfront(+) xen_netfront xen_blkfront
[   22.486783] ---[ end trace a9222030a747c3f7 ]---
[   22.490424] RIP: 0010:_parse_integer_fixup_radix+0x6/0x60

The virt_rmb() is added in the 'true' path of test_reply(). The "while"
is changed to "do while" so that test_reply() is used as a read memory
barrier.
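
The ordering requirement described above can be sketched in userspace C. This is only an analogue, not the xenbus code: the kernel patch uses virt_rmb(), while this sketch swaps in C11 release/acquire atomics, and the struct layout, the state names and publish_reply() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request states, loosely modelled on the description above. */
enum req_state { REQ_WAITING = 0, REQ_GOT_REPLY = 1 };

/* Hypothetical stand-in for the request; not the real xenbus structure. */
struct req {
	void *body;        /* plain field, published through 'state' */
	atomic_int state;  /* written last by the producer */
};

/* Producer: store the body first, then release-store the state. */
static void publish_reply(struct req *r, void *body)
{
	assert(body != NULL);
	r->body = body;
	atomic_store_explicit(&r->state, REQ_GOT_REPLY, memory_order_release);
}

/* Consumer-side check: the acquire load pairs with the release store. */
static int test_reply(struct req *r)
{
	return atomic_load_explicit(&r->state, memory_order_acquire) ==
	       REQ_GOT_REPLY;
}

/* Return the body once the reply is visible, or NULL if it is not ready. */
static void *read_reply(struct req *r)
{
	if (!test_reply(r))
		return NULL;
	/* Ordered after the acquire load, so body cannot be stale here. */
	return r->body;
}
```

Because the state check doubles as the read barrier, a caller that sees the reply state is guaranteed to see the body that was stored before it.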

Signed-off-by: Dongli Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Julien Grall <[email protected]>
Signed-off-by: Boris Ostrovsky <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jun 17, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.
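
One general way to avoid this kind of dependency is to snapshot data under the lock into a preallocated buffer and do the potentially faulting copy only after the lock is dropped. The following userspace sketch illustrates that pattern under stated assumptions; it is not the perf code, and struct counters, read_counters() and the scratch buffer are hypothetical names. In the kernel the second copy would correspond to copy_to_user().

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NR_VALUES 4

/* Hypothetical counter group protected by a mutex. */
struct counters {
	pthread_mutex_t lock;
	unsigned long long values[NR_VALUES];
};

/*
 * Snapshot under the lock into a caller-provided scratch buffer, then copy
 * to the destination only after the lock has been released. The scratch
 * buffer is allocated once by the caller rather than per read.
 */
static size_t read_counters(struct counters *c,
			    unsigned long long *scratch,
			    unsigned long long *dst)
{
	pthread_mutex_lock(&c->lock);
	memcpy(scratch, c->values, sizeof(c->values));
	pthread_mutex_unlock(&c->lock);

	/* The copy that could fault happens with no lock held. */
	memcpy(dst, scratch, sizeof(c->values));
	return sizeof(c->values);
}
```

With this shape, the lock protecting the counters never nests outside whatever locks the destination copy may take.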

Signed-off-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
The current DP driver implementation adds the fail safe mode in
dp_hpd_plug_handle(), which is expected to be executed under event
thread context.

However, a circular locking dependency is possible (see the stack trace
below) once the eDP driver calls dp_hpd_plug_handle() from
dp_bridge_enable(), which is executed under drm_thread context.

After reviewing all possible methods, and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
remove all the related fail safe code so that no circular locking
can happen.
Reviewed-by: Stephen Boyd <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>

======================================================
 WARNING: possible circular locking dependency detected
 5.15.35-lockdep #6 Tainted: G        W
 ------------------------------------------------------
 frecon/429 is trying to acquire lock:
 ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

 but task is already holding lock:
 ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
        __mutex_lock_common+0x174/0x1a64
        mutex_lock_nested+0x98/0xac
        lock_crtcs+0xb4/0x124
        msm_atomic_commit_tail+0x330/0x748
        commit_tail+0x19c/0x278
        drm_atomic_helper_commit+0x1dc/0x1f0
        drm_atomic_commit+0xc0/0xd8
        drm_atomic_helper_set_config+0xb4/0x134
        drm_mode_setcrtc+0x688/0x1248
        drm_ioctl_kernel+0x1e4/0x338
        drm_ioctl+0x3a4/0x684
        __arm64_sys_ioctl+0x118/0x154
        invoke_syscall+0x78/0x224
        el0_svc_common+0x178/0x200
        do_el0_svc+0x94/0x13c
        el0_svc+0x5c/0xec
        el0t_64_sync_handler+0x78/0x108
        el0t_64_sync+0x1a4/0x1a8

 -> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
        __mutex_lock_common+0x174/0x1a64
        ww_mutex_lock+0xb8/0x278
        modeset_lock+0x304/0x4ac
        drm_modeset_lock+0x4c/0x7c
        drmm_mode_config_init+0x4a8/0xc50
        msm_drm_init+0x274/0xac0
        msm_drm_bind+0x20/0x2c
        try_to_bring_up_master+0x3dc/0x470
        __component_add+0x18c/0x3c0
        component_add+0x1c/0x28
        dp_display_probe+0x954/0xa98
        platform_probe+0x124/0x15c
        really_probe+0x1b0/0x5f8
        __driver_probe_device+0x174/0x20c
        driver_probe_device+0x70/0x134
        __device_attach_driver+0x130/0x1d0
        bus_for_each_drv+0xfc/0x14c
        __device_attach+0x1bc/0x2bc
        device_initial_probe+0x1c/0x28
        bus_probe_device+0x94/0x178
        deferred_probe_work_func+0x1a4/0x1f0
        process_one_work+0x5d4/0x9dc
        worker_thread+0x898/0xccc
        kthread+0x2d4/0x3d4
        ret_from_fork+0x10/0x20

 -> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
        ww_acquire_init+0x1c4/0x2c8
        drm_modeset_acquire_init+0x44/0xc8
        drm_helper_probe_single_connector_modes+0xb0/0x12dc
        drm_mode_getconnector+0x5dc/0xfe8
        drm_ioctl_kernel+0x1e4/0x338
        drm_ioctl+0x3a4/0x684
        __arm64_sys_ioctl+0x118/0x154
        invoke_syscall+0x78/0x224
        el0_svc_common+0x178/0x200
        do_el0_svc+0x94/0x13c
        el0_svc+0x5c/0xec
        el0t_64_sync_handler+0x78/0x108
        el0t_64_sync+0x1a4/0x1a8

 -> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
        __lock_acquire+0x2650/0x672c
        lock_acquire+0x1b4/0x4ac
        __mutex_lock_common+0x174/0x1a64
        mutex_lock_nested+0x98/0xac
        dp_panel_add_fail_safe_mode+0x4c/0xa0
        dp_hpd_plug_handle+0x1f0/0x280
        dp_bridge_enable+0x94/0x2b8
        drm_atomic_bridge_chain_enable+0x11c/0x168
        drm_atomic_helper_commit_modeset_enables+0x500/0x740
        msm_atomic_commit_tail+0x3e4/0x748
        commit_tail+0x19c/0x278
        drm_atomic_helper_commit+0x1dc/0x1f0
        drm_atomic_commit+0xc0/0xd8
        drm_atomic_helper_set_config+0xb4/0x134
        drm_mode_setcrtc+0x688/0x1248
        drm_ioctl_kernel+0x1e4/0x338
        drm_ioctl+0x3a4/0x684
        __arm64_sys_ioctl+0x118/0x154
        invoke_syscall+0x78/0x224
        el0_svc_common+0x178/0x200
        do_el0_svc+0x94/0x13c
        el0_svc+0x5c/0xec
        el0t_64_sync_handler+0x78/0x108
        el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- reword commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
--  [email protected]

Changes in v6:
--  fix Fixes commit ID

Fixes: 8b2c181 ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <[email protected]>
Signed-off-by: Kuogee Hsieh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
A recent commit that modified the fib route event handler to handle events
according to their priority introduced a use-after-free[0] in mp->mfi pointer
usage. The pointer is now not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since the mlx5 lag code doesn't hold a reference to
fib_info during the whole mp->mfi lifetime, it could be used after the
fib_info instance has already been freed by kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.
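
The refactoring described above can be sketched in plain C as follows. This is an illustration only, not the mlx5 code: struct fib_info_stub, struct lag_fib and both helpers are hypothetical, and the rule that a lower priority value wins is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's fib_info; only one field is used. */
struct fib_info_stub {
	unsigned int priority;
};

/* Cached FIB state: an opaque identity pointer plus a copy of the priority. */
struct lag_fib {
	const void *mfi;        /* compared only, never dereferenced */
	unsigned int priority;  /* copied while the fib_info is still valid */
};

/* Record the route while the caller still holds a valid fib_info. */
static void lag_fib_set(struct lag_fib *fib, const struct fib_info_stub *fi)
{
	fib->mfi = fi;
	fib->priority = fi ? fi->priority : 0;
}

/* Decide whether an event for 'fi' should be handled. */
static int lag_fib_should_handle(const struct lag_fib *fib,
				 const struct fib_info_stub *fi)
{
	/* Nothing cached yet, or an event for the cached route itself. */
	if (fib->mfi == NULL || fib->mfi == (const void *)fi)
		return 1;
	/* Assumption: a lower value means a higher priority. */
	return fi->priority < fib->priority;
}
```

Since the cached pointer is typed const void *, the compiler rejects any later attempt to read through it, and the comparison uses only the integer copied at caching time.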

[0]:

[  203.588029] ==================================================================
[  203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[  203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G    B             5.17.0-rc7+ #6
[  203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[  203.600053] Call Trace:
[  203.600608]  <TASK>
[  203.601110]  dump_stack_lvl+0x48/0x5e
[  203.601860]  print_address_description.constprop.0+0x1f/0x160
[  203.602950]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.604073]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.605177]  kasan_report.cold+0x83/0xdf
[  203.605969]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.607102]  mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.608199]  ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[  203.609382]  ? read_word_at_a_time+0xe/0x20
[  203.610463]  ? strscpy+0xa0/0x2a0
[  203.611463]  process_one_work+0x722/0x1270
[  203.612344]  worker_thread+0x540/0x11e0
[  203.613136]  ? rescuer_thread+0xd50/0xd50
[  203.613949]  kthread+0x26e/0x300
[  203.614627]  ? kthread_complete_and_exit+0x20/0x20
[  203.615542]  ret_from_fork+0x1f/0x30
[  203.616273]  </TASK>

[  203.617174] Allocated by task 3746:
[  203.617874]  kasan_save_stack+0x1e/0x40
[  203.618644]  __kasan_kmalloc+0x81/0xa0
[  203.619394]  fib_create_info+0xb41/0x3c50
[  203.620213]  fib_table_insert+0x190/0x1ff0
[  203.621020]  fib_magic.isra.0+0x246/0x2e0
[  203.621803]  fib_add_ifaddr+0x19f/0x670
[  203.622563]  fib_inetaddr_event+0x13f/0x270
[  203.623377]  blocking_notifier_call_chain+0xd4/0x130
[  203.624355]  __inet_insert_ifa+0x641/0xb20
[  203.625185]  inet_rtm_newaddr+0xc3d/0x16a0
[  203.626009]  rtnetlink_rcv_msg+0x309/0x880
[  203.626826]  netlink_rcv_skb+0x11d/0x340
[  203.627626]  netlink_unicast+0x4cc/0x790
[  203.628430]  netlink_sendmsg+0x762/0xc00
[  203.629230]  sock_sendmsg+0xb2/0xe0
[  203.629955]  ____sys_sendmsg+0x58a/0x770
[  203.630756]  ___sys_sendmsg+0xd8/0x160
[  203.631523]  __sys_sendmsg+0xb7/0x140
[  203.632294]  do_syscall_64+0x35/0x80
[  203.633045]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  203.634427] Freed by task 0:
[  203.635063]  kasan_save_stack+0x1e/0x40
[  203.635844]  kasan_set_track+0x21/0x30
[  203.636618]  kasan_set_free_info+0x20/0x30
[  203.637450]  __kasan_slab_free+0xfc/0x140
[  203.638271]  kfree+0x94/0x3b0
[  203.638903]  rcu_core+0x5e4/0x1990
[  203.639640]  __do_softirq+0x1ba/0x5d3

[  203.640828] Last potentially related work creation:
[  203.641785]  kasan_save_stack+0x1e/0x40
[  203.642571]  __kasan_record_aux_stack+0x9f/0xb0
[  203.643478]  call_rcu+0x88/0x9c0
[  203.644178]  fib_release_info+0x539/0x750
[  203.644997]  fib_table_delete+0x659/0xb80
[  203.645809]  fib_magic.isra.0+0x1a3/0x2e0
[  203.646617]  fib_del_ifaddr+0x93f/0x1300
[  203.647415]  fib_inetaddr_event+0x9f/0x270
[  203.648251]  blocking_notifier_call_chain+0xd4/0x130
[  203.649225]  __inet_del_ifa+0x474/0xc10
[  203.650016]  devinet_ioctl+0x781/0x17f0
[  203.650788]  inet_ioctl+0x1ad/0x290
[  203.651533]  sock_do_ioctl+0xce/0x1c0
[  203.652315]  sock_ioctl+0x27b/0x4f0
[  203.653058]  __x64_sys_ioctl+0x124/0x190
[  203.653850]  do_syscall_64+0x35/0x80
[  203.654608]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  203.666952] The buggy address belongs to the object at ffff888144df2000
                which belongs to the cache kmalloc-256 of size 256
[  203.669250] The buggy address is located 80 bytes inside of
                256-byte region [ffff888144df2000, ffff888144df2100)
[  203.671332] The buggy address belongs to the page:
[  203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[  203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[  203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[  203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[  203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[  203.679928] page dumped because: kasan: bad access detected

[  203.681455] Memory state around the buggy address:
[  203.682421]  ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  203.683863]  ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  203.686701]                                                  ^
[  203.687820]  ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  203.689226]  ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  203.690620] ==================================================================

Fixes: ad11c4f ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
Do not allow writing timestamps on RX rings while the PF is being
configured. While the PF is being configured, RX rings can be freed or
rebuilt. If timestamps are updated at the same time, the kernel will crash
by dereferencing a null RX ring pointer.
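
A minimal sketch of the guard described above, in userspace C. The structures, the cfg_busy flag and update_cached_phctime() are hypothetical and do not mirror the ice driver; in a real concurrent setting the flag would also need atomic access or a lock.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 2

/* Hypothetical RX ring holding a cached timestamp. */
struct rx_ring {
	unsigned long long cached_phctime;
};

/* Hypothetical PF state. */
struct pf {
	bool cfg_busy;                     /* set while the PF is reconfigured */
	struct rx_ring *rings[NUM_RINGS];  /* may be NULL while being rebuilt */
};

/* Return the number of rings updated; 0 when the update is skipped. */
static int update_cached_phctime(struct pf *pf, unsigned long long now)
{
	int updated = 0;

	/* Skip the whole update while rings may be freed or rebuilt. */
	if (pf->cfg_busy)
		return 0;

	for (int i = 0; i < NUM_RINGS; i++) {
		if (!pf->rings[i])
			continue;
		pf->rings[i]->cached_phctime = now;
		updated++;
	}
	return updated;
}
```

Skipping an update is harmless in this sketch because the periodic worker simply tries again on its next run.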

PID: 1449   TASK: ff187d28ed658040  CPU: 34  COMMAND: "ice-ptp-0000:51"
 #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be
 #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d
 #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd
 #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54
 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4
 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c
 #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4
 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e
    [exception RIP: ice_ptp_update_cached_phctime+91]
    RIP: ffffffffc076db8b  RSP: ff1966a94a713e98  RFLAGS: 00010246
    RAX: 16e3db9c6b7ccae4  RBX: ff187d269dd3c180  RCX: ff187d269cd4d018
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ff187d269cfcc644   R8: ff187d339b9641b0   R9: 0000000000000000
    R10: 0000000000000002  R11: 0000000000000000  R12: ff187d269cfcc648
    R13: ffffffff9f128784  R14: ffffffff9d101b70  R15: ff187d269cfcc640
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice]
 #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b
#10 [ff1966a94a713f10] kthread at ffffffff9d101b4d
#11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f

Fixes: 77a7811 ("ice: enable receive hardware timestamping")
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Dave Cain <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>