Added backlight support for some samsung laptops #11

Closed
wants to merge 1 commit into from

Conversation

@xonatius xonatius commented Sep 6, 2011

Models:

  • N120
  • R468/R418
  • X320/X420/X520
  • R510/P510
  • N350
  • R470/R420
  • R528/R728
  • SQ1S

@torvalds
Owner

torvalds commented Sep 6, 2011

I'm not doing github pulls. The pull requests are seriously
misdesigned, and github does horrible things to the commits.

Please don't press the "pull request" github button. Do proper kernel
pull request with diffstat, git source tree (which can be on github,
of course), branch, commit information etc etc etc.

                  Linus
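A kernel-style pull request of the kind described above is usually produced with `git request-pull`, which prints a summary containing the source URL, the branch, the commits and a diffstat. A minimal sketch of how the command line could be assembled, assuming a hypothetical helper name (`request_pull_command`) and placeholder arguments; it only builds the argument list and does not run git:

```python
def request_pull_command(start, url, end=None):
    """Build the argv for `git request-pull <start> <URL> [<end>]`.

    start: commit the maintainer already has (e.g. an upstream tag)
    url:   public repository to pull from
    end:   optional commit or branch to be pulled (git defaults to HEAD)
    """
    cmd = ["git", "request-pull", start, url]
    if end is not None:
        cmd.append(end)
    return cmd


# Placeholder values, chosen only for illustration.
cmd = request_pull_command("v3.1-rc5", "git://example.org/linux.git", "backlight")
print(" ".join(cmd))
```

Running the resulting command inside a repository (for example with `subprocess.run(cmd, check=True)`) prints text that can be mailed to the maintainer.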

@xonatius xonatius closed this Sep 7, 2011
@nils-werner

How are pull requests seriously misdesigned (apart from the fact that you might be used to a different kind of workflow)?

@valpackett
Contributor

I'm not doing linux kernel pulls. The kernel pulls are seriously
misdesigned, and linux does horrible things to the commits.

Please don't press the "pull request" kernel button. Do proper github
pull request with diffstat, git source tree (which can be on linux,
of course), branch, commit information etc etc etc.

GitHub

@tilsammans

I honestly would like to know why github pull requests are misdesigned. I'll grant that I didn't actually create git, but they seem to work just fine. Is there something I am missing?

@hotwoofy

hotwoofy commented Sep 7, 2011

@nils-werner

See https://github.com/torvalds/diveclog/pull/18#issuecomment-2023552

Wow, great discussion went on there. Schacon raised perfectly valid points and Torvalds was basically "f this, I don't care, you're crazy". Great response!

@torvalds
Owner

torvalds commented Sep 9, 2011

On Fri, Sep 9, 2011 at 12:49 AM, Nils Werner <[email protected]> wrote:

> Wow, great discussion went on there. Schacon raised perfectly valid points and Torvalds was basically "f this, I don't care, you're crazy". Great response!

Can you read?

"If the merge message doesn't tell me who the merge is from and what
branch it was, the merge message is totally useless."

If you can't understand that, then yes, you're crazy. Or just terminally stupid.

The quality of github "issues" and comments really is very low. This
being just another example of it.

                          Linus

@nils-werner

First, I agree with Scott: In many cases people delete their fork (or at least the branch). So where would the message point you to? The pull request of the pulling repository will much more likely be around for a long time.

Also, what if the branch you'll pull from has changed in the meantime? You'd end up with changes that are not documented in the pull request and thus not reviewed by the ones discussing the pull request. As soon as the PR is posted you must put them out of reach of the author to keep them from sneaking in changes.

> > Great response!
>
> Can you read?

Also, you did notice that you've proven my point right there, right?

@torvalds
Owner

torvalds commented Sep 9, 2011

On Fri, Sep 9, 2011 at 12:10 PM, Nils Werner <[email protected]> wrote:

> First, I agree with Scott: In many cases people delete their fork (or at least the branch). So where would the message point you to? The pull request of the pulling repository will much more likely be around for a long time.

That's an "implementation problem". It's not an argument for doing crap.

Simple solution: if people delete the branch or repository, consider
the pull request dead.

You can make the "pull request" namespace separate from the branch
namespace, but do it on the source side, instead of on the
destination side like you do now. So if somebody says "please pull by
branch xyzzy", you turn it into a pull request for

git pull git://github.com/ pull/xyzzy

and then if there is a previous pull request, add a number to it (so it
becomes "pull/xyzzy-2" or whatever).
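
The naming scheme proposed here can be sketched in a few lines. This is only an illustration of the idea in the comment, not anything GitHub or git implements; the function name `pull_ref_name` and the set of existing refs are hypothetical:

```python
def pull_ref_name(branch, existing_refs):
    """Pick a name in a separate "pull/" namespace on the source side.

    The first request for a branch gets "pull/<branch>"; later requests
    get a numeric suffix ("pull/<branch>-2", "pull/<branch>-3", ...).
    """
    base = f"pull/{branch}"
    if base not in existing_refs:
        return base
    n = 2
    while f"{base}-{n}" in existing_refs:
        n += 1
    return f"{base}-{n}"


refs = set()
for _ in range(3):
    name = pull_ref_name("xyzzy", refs)
    refs.add(name)
    print(name)
```

Because the name is derived from the source branch, the merge message can still say where the changes came from.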

Or something along those lines. The important part is that YOU MUST
NOT THROW AWAY THE SOURCE INFORMATION!

> Also, what if the branch you'll pull from has changed in the meantime?

We actually do this in the kernel on purpose sometimes - people fix up
their stuff.

That said, again, you could do the same thing: if somebody changes a
branch after creating a pull request off it, just invalidate the pull
request and refuse to honor it. Again, if you do a separate
"pull/xyzzy" namespace, you should be able to validate that trivially
(save off the commit ID at the time of the pull, and refuse to serve
"pull/xyzzy" if the commit ID doesn't match the branch "xyzzy" any
more).
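
A minimal sketch of that validation, assuming an in-memory registry and a plain dict mapping branch names to commit IDs (both hypothetical stand-ins for whatever a hosting service would really use). A deleted branch is treated the same way as a changed one, which also covers the "consider the pull request dead" case:

```python
class PullRegistry:
    """Toy model: pin a pull ref to the branch tip seen at request time."""

    def __init__(self):
        self._pinned = {}  # pull ref -> (branch name, commit ID at request time)

    def open_request(self, branch, branch_tips):
        ref = f"pull/{branch}"
        self._pinned[ref] = (branch, branch_tips[branch])
        return ref

    def serve(self, ref, branch_tips):
        branch, pinned_commit = self._pinned[ref]
        # Refuse if the branch was deleted or no longer points at the
        # commit that was saved when the request was opened.
        if branch_tips.get(branch) != pinned_commit:
            raise LookupError(f"{ref} is stale: branch {branch!r} changed or was deleted")
        return pinned_commit


tips = {"xyzzy": "a1b2c3"}           # made-up commit IDs
registry = PullRegistry()
ref = registry.open_request("xyzzy", tips)
print(registry.serve(ref, tips))     # branch unchanged, request is honored

tips["xyzzy"] = "d4e5f6"             # author pushes more commits afterwards
try:
    registry.serve(ref, tips)
except LookupError as err:
    print(err)
```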

> You'd end up with changes that are not documented in the pull request and thus not reviewed by the ones discussing the pull request.

Umm, considering that the pull requests used to have no documentation
what-so-ever before I even complained about it, that's a pretty damn
weak argument, isn't it?

> As soon as the PR is posted you must put them out of reach of the
> author to keep them from sneaking in changes.
>
> > Can you read?
>
> Also, you did notice that you've proven my point right there, right?

Umm. I'm not polite. Big news. I'd rather be acerbic than stupid.

                       Linus

@nils-werner

> That's an "implementation problem".

A decentralized system that doesn't accept disappearing nodes sounds more like a design problem.

> Simple solution: if people delete the branch or repository, consider the pull request dead.

Years after the branch has been merged? Is that a problem we wanted to solve?

> > Also, what if the branch you'll pull from has changed in the meantime?
>
> We actually do this in the kernel on purpose sometimes - people fix up their stuff.

I meant malicious changes. Hierarchies are shallow, elite circles basically nonexistent, so that's a real issue. And the biggest strength of GitHub.

> save off the commit ID at the time of the pull, and refuse to serve "pull/xyzzy" if the commit ID doesn't match the branch "xyzzy" any more

That's the first constructive comment in this discussion. And it sounds like a good idea, apart from the problem that you'd lose the link to the PR, which, to many, is more useful than being able to immediately recognise the source.

Also, it would probably require lots of modifications to the daemon, though. And very disciplined contributors (always make sure to use dead-end topic branches; not everybody does that). Separating the two simply improves the workflow a lot.

It'd be interesting what @schacon has to say about it.

> Umm, considering that the pull requests used to have no documentation what-so-ever before I even complained about it, that's a pretty damn weak argument, isn't it?

When was that? Months ago? I am talking about your comment 2 days ago.

@nils-werner

> Umm. I'm not polite. Big news. I'd rather be acerbic than stupid.

A personal, unrelated note: Being unable to lead an objective discussion. Judging people, then insulting them just to prove a point. Recognising one's flaws but being unwilling to change them, instead bragging about them. Missing the ability to reflect on one's actions during interactions with others.

That sounds pretty stupid to me. Anyways, I'm moving on.

@jeffWelling

@nils-werner

> A decentralized system that doesn't accept disappearing nodes sounds more like a design problem.

I thought we were talking about pull requests and branches? When did a branch become a node?
Perhaps I'm missing something but this sounds simple; if you have a change and you want someone else to pull it, it sounds reasonable to expect you to keep the change published at least until it is pulled.

> I meant malicious changes. Hierarchies are shallow, elite circles basically nonexistent, so that's a real issue. And the biggest strength of GitHub.

Except that, as indicated by Scott Chacon [0], the most common scenario is to perform the pull request locally on your machine, allowing you to pull the code and then review it without said code being changed before merging. I can understand your argument in relation to pull requests done using the button on the website though.

[0] https://github.com/torvalds/diveclog/pull/18

cuviper pushed a commit to cuviper/linux-uprobes that referenced this pull request Nov 3, 2011
* Ingo Molnar <[email protected]> wrote:

> The patch below addresses these concerns, serializes the output, tidies up the
> printout, resulting in this new output:

There's one bug remaining that my patch does not address: the vCPUs are not
printed in order:

# vCPU #0's dump:
# vCPU #2's dump:
# vCPU #24's dump:
# vCPU #5's dump:
# vCPU #39's dump:
# vCPU #38's dump:
# vCPU #51's dump:
# vCPU #11's dump:
# vCPU #10's dump:
# vCPU #12's dump:

This is undesirable as the order of printout is highly random, so successive
dumps are difficult to compare.

The patch below serializes the signalling itself. (this is on top of the
previous patch)

The patch also tweaks the vCPU printout line a bit so that it does not start
with '#', which is discarded if such messages are pasted into Git commit
messages.

Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Pekka Enberg <[email protected]>
torvalds pushed a commit that referenced this pull request Dec 15, 2011
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appended to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee
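
The values in the gdb session above are consistent with an unsigned underflow when `copied` is zero. A minimal sketch of the arithmetic, under two assumptions that are not shown in this commit message: that `end` is computed as `start + copied - 1` from the in-page offset of `pos`, and that `idx` is a 32-bit value obtained by shifting the offset right by `i_blkbits`. The 4 KiB page size is also an assumption, taken from `len=0x1000`.

```python
PAGE_SIZE = 0x1000           # assumed page size (matches len=0x1000 in the trace)
MASK64 = (1 << 64) - 1       # emulate a 64-bit unsigned long
MASK32 = (1 << 32) - 1       # emulate a 32-bit unsigned int


def write_end_offsets(pos, copied, blkbits):
    """Reproduce start/end/idx as seen in the gdb session (assumed formulas)."""
    start = pos & (PAGE_SIZE - 1)
    end = (start + copied - 1) & MASK64   # wraps around when copied == 0
    idx = (end >> blkbits) & MASK32       # number of buffers the loop would walk
    return start, end, idx


start, end, idx = write_end_offsets(pos=0x108000, copied=0x0, blkbits=0xC)
print(hex(start), hex(end), hex(idx))
```

With these inputs the sketch yields `start = 0x0`, `end = 0xffffffffffffffff` and `idx = 0xffffffff`, matching `$25`, `$26` and `$23`, so the `for (i = 0; i < idx; i++)` loop runs for an enormous number of iterations.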

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
jkstrick pushed a commit to jkstrick/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 #7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4
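
The guard described in this commit message can be illustrated with a toy model. This is not the kernel code: the class, the enum and the method names are hypothetical, and the model only mirrors the stated rule that a device in the unregistering or unregistered state must not have its real number of tx queues updated, because its queue kobjects are already gone.

```python
from enum import Enum, auto


class RegState(Enum):
    REGISTERED = auto()
    UNREGISTERING = auto()
    UNREGISTERED = auto()


class ToyNetdev:
    """Toy stand-in for a network device with per-queue sysfs objects."""

    def __init__(self, num_tx):
        self.reg_state = RegState.REGISTERED
        self.real_num_tx_queues = num_tx
        self.queue_kobjects = set(range(num_tx))

    def begin_unregister(self):
        # Models the queue kobjects being removed at unregistering time.
        self.reg_state = RegState.UNREGISTERING
        self.queue_kobjects.clear()

    def set_real_num_tx_queues(self, txq):
        if self.reg_state in (RegState.UNREGISTERING, RegState.UNREGISTERED):
            return  # kobjects are already removed; touching them would crash
        self.queue_kobjects = set(range(txq))
        self.real_num_tx_queues = txq


dev = ToyNetdev(num_tx=8)
dev.begin_unregister()
dev.set_real_num_tx_queues(4)   # ignored: the device is going away
print(dev.real_num_tx_queues, dev.queue_kobjects)
```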

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
zachariasmaladroit pushed a commit to galaxys-cm7miui-kernel/linux that referenced this pull request Feb 11, 2012
tworaz pushed a commit to tworaz/linux that referenced this pull request Feb 13, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Let's say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.
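
The scanning problem described above can be modelled outside the kernel. The following is a toy simulation, not the actual patch: the block size, the hole list and the function names are made up for illustration, and holes are assumed to start and end on `MAX_ORDER_NR_PAGES` boundaries as in the diagram. It re-checks validity whenever the scan enters a new `MAX_ORDER_NR_PAGES` block instead of only at the first PFN.

```python
MAX_ORDER_NR_PAGES = 8   # toy value; the real constant is much larger


def pfn_valid(pfn, holes):
    """True if pfn does not fall inside any [lo, hi) memory hole."""
    return not any(lo <= pfn < hi for lo, hi in holes)


def scan(start_pfn, end_pfn, holes):
    """Return the PFNs the scanner touches, skipping blocks that are holes."""
    touched = []
    pfn = start_pfn
    if not pfn_valid(pfn, holes):
        return touched
    while pfn < end_pfn:
        # Entering a new MAX_ORDER_NR_PAGES block: validate it before use.
        if pfn % MAX_ORDER_NR_PAGES == 0 and not pfn_valid(pfn, holes):
            pfn += MAX_ORDER_NR_PAGES
            continue
        touched.append(pfn)
        pfn += 1
    return touched


# The migrate scanner starts just below a hole covering PFNs 16..23.
print(scan(13, 27, holes=[(16, 24)]))
```

Checking only the first PFN (13 here) would let the scan walk straight through 16..23, which is the crash scenario in the commit message.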

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
xXorAa pushed a commit to xXorAa/linux that referenced this pull request Feb 17, 2012
koct9i pushed a commit to koct9i/linux that referenced this pull request Feb 20, 2012
fixed:
WARNING: please, no space before tabs
#11: FILE: adt7411.c:11:
+ * ^I  use power-down mode for suspend?, interrupt handling?$

not fixed as all other macros around it are the same structure and this one is only 2 chars longer:
WARNING: line over 80 characters
#229: FILE: adt7411.c:229:
+static ADT7411_BIT_ATTR(fast_sampling, ADT7411_REG_CFG3, ADT7411_CFG3_ADC_CLK_225);
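
The two warnings quoted above come from simple per-line style checks. A rough sketch of what such checks could look like, assuming a hypothetical `check_line` helper; this is not how `checkpatch.pl` is implemented, and the 8-column tab width is an assumption:

```python
import re


def check_line(line, max_len=80):
    """Return style warnings for one source line (toy checker)."""
    warnings = []
    if re.search(r" \t", line):
        warnings.append("please, no space before tabs")
    # Measure the visible width with tabs expanded to 8 columns.
    if len(line.expandtabs(8)) > max_len:
        warnings.append(f"line over {max_len} characters")
    return warnings


print(check_line(" * \t  use power-down mode for suspend?"))
print(check_line("x" * 81))
print(check_line("int ret;"))
```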

Signed-off-by: Frans Meulenbroeks <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Feb 23, 2012
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 1, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Let's say we have a case
like this:

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages starts, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages, which now reaches into a memory hole.  It
checks pfn_valid() on the first PFN but then scans into the hole where there
are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 19, 2012
…S block during isolation for migration

koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 22, 2012
…S block during isolation for migration

koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 2, 2012
…S block during isolation for migration

koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 9, 2012
…S block during isolation for migration

koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 11, 2012
…S block during isolation for migration

koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 12, 2012
…S block during isolation for migration

psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/907778

commit ea51d13 upstream.

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->write_end can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appended to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not the normal way, even for knumad), which makes
"exotic" changes to the ptes. It wouldn't normally trigger, but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults, leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. It is especially possible with lumpy
reclaim (albeit disabled if compaction is enabled), as that would
ignore the young bits in the ptes.
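
The arithmetic behind the gdb values above can be reproduced with a small
user-space sketch. The helper names (toy_write_end_offset, toy_should_enlarge)
and TOY_PAGE_SIZE are hypothetical and only mirror the values printed in the
trace; this is not the ext4 code itself. With pos page aligned and copied
equal to zero, the "end" offset wraps around, and guarding on copied avoids
entering the enlargement logic at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this sketch; it matches len=0x1000 in the trace. */
#define TOY_PAGE_SIZE 0x1000UL

/* Offset of the last byte written within the page. */
static unsigned long toy_write_end_offset(uint64_t pos, unsigned long copied)
{
	unsigned long start = (unsigned long)(pos & (TOY_PAGE_SIZE - 1));

	/* Wraps to the maximum value when start == 0 and copied == 0. */
	return start + copied - 1;
}

/* Only consider enlarging the on-disk size if some data was copied. */
static bool toy_should_enlarge(uint64_t pos, unsigned long copied,
			       uint64_t disksize)
{
	uint64_t new_size = pos + copied;

	return copied != 0 && new_size > disksize;
}
```

Plugging in pos=0x108000, copied=0 and i_disksize=0xd9000 from the trace,
toy_write_end_offset() returns an all-ones value (the "end" printed by gdb),
while toy_should_enlarge() returns false and so never reaches the loop that
hung.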

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 6, 2024
…4_xdr_dec_open_noattr

The `nfs4_xdr_dec_open()` function does not properly check the return
status of the `ACCESS` operation. This oversight can result in
out-of-bounds memory access when decoding NFSv4 compound requests.

For instance, in an NFSv4 compound request `{5, PUTFH, OPEN, GETFH,
ACCESS, GETATTR}`, if the `ACCESS` operation (step 4) returns an error,
the function proceeds to decode the subsequent `GETATTR` operation
(step 5) without validating the RPC buffer's state. This can cause an
RPC buffer overflow, leading to a system panic. This issue
can be reliably reproduced by running multiple `fsstress` tests in the
same directory exported by the Ganesha NFS server.

This patch introduces proper error handling for the `ACCESS` operation
in `nfs4_xdr_dec_open()` and `nfs4_xdr_dec_open_noattr()`. When an
error is detected, the decoding process is terminated gracefully to
prevent further buffer corruption and ensure system stability.
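
A minimal sketch of the decoding rule described above, assuming a toy
representation of a compound reply: the enum, struct and function names are
hypothetical and do not correspond to the kernel's XDR helpers. Each
operation's status is checked in order, and decoding stops at the first
failure instead of continuing into data that the server did not send.

```c
#include <stddef.h>

/* Hypothetical operation tags mirroring the compound in the text. */
enum toy_op { TOY_PUTFH, TOY_OPEN, TOY_GETFH, TOY_ACCESS, TOY_GETATTR };

/* One decoded operation header: which op it is and its status. */
struct toy_reply {
	enum toy_op op;
	int status;
};

/*
 * Walk the operations in order and stop at the first non-zero status.
 * Returns how many operations were consumed (including the failing one)
 * and stores the first error, or 0, in *status.
 */
static size_t toy_decode_compound(const struct toy_reply *ops, size_t nops,
				  int *status)
{
	size_t i;

	*status = 0;
	for (i = 0; i < nops; i++) {
		if (ops[i].status != 0) {
			*status = ops[i].status;
			return i + 1;	/* do not touch later operations */
		}
	}
	return nops;
}
```

For the five-operation compound in the text, a failing ACCESS means four
operations are consumed and the GETATTR results are never decoded.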

 #7 [ffffa42b17337bc0] page_fault at ffffffff906010fe
    [exception RIP: xdr_set_page_base+61]
    RIP: ffffffffc12166dd  RSP: ffffa42b17337c78  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffffa42b17337db8  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffffa42b17337db8
    RBP: 0000000000000000   R8: ffff904948b0a650   R9: 0000000000000000
    R10: 8080808080808080  R11: ffff904ac3c68be4  R12: 0000000000000009
    R13: ffffa42b17337db8  R14: ffff904aa6aee000  R15: ffffffffc11f7f50
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffa42b17337c78] xdr_set_next_buffer at ffffffffc1217b0b [sunrpc]
 #9 [ffffa42b17337c90] xdr_inline_decode at ffffffffc1218259 [sunrpc]
 #10 [ffffa42b17337cb8] __decode_op_hdr at ffffffffc128d2c2 [nfsv4]
 #11 [ffffa42b17337cf0] decode_getfattr_generic.constprop.124 at ffffffffc12980a2 [nfsv4]
 #12 [ffffa42b17337d58] nfs4_xdr_dec_open at ffffffffc1298374 [nfsv4]
 #13 [ffffa42b17337db0] call_decode at ffffffffc11f8144 [sunrpc]
 #14 [ffffa42b17337e28] __rpc_execute at ffffffffc1206ad5 [sunrpc]
 #15 [ffffa42b17337e80] rpc_async_schedule at ffffffffc1206e39 [sunrpc]
 #16 [ffffa42b17337e98] process_one_work at ffffffff8fcfe397
 #17 [ffffa42b17337ed8] worker_thread at ffffffff8fcfea60
 #18 [ffffa42b17337f10] kthread at ffffffff8fd04406
 #19 [ffffa42b17337f50] ret_from_fork at ffffffff9060023f

Signed-off-by: changxin.liu <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 6, 2024
[ Upstream commit 5bf1557 ]

test_progs uses glibc specific functions backtrace() and
backtrace_symbols_fd() to print backtrace in case of SIGSEGV.

A recent commit (see Fixes) updated test_progs.c to define stub versions
of the same functions with the attribute "weak" in order to allow linking
test_progs against musl libc. Unfortunately this broke the backtrace
handling for glibc builds.

As it turns out, glibc defines backtrace() and backtrace_symbols_fd()
as weak:

  $ llvm-readelf --symbols /lib64/libc.so.6 \
     | grep -P '( backtrace_symbols_fd| backtrace)$'
  4910: 0000000000126b40   161 FUNC    WEAK   DEFAULT    16 backtrace
  6843: 0000000000126f90   852 FUNC    WEAK   DEFAULT    16 backtrace_symbols_fd

So does test_progs:

 $ llvm-readelf --symbols test_progs \
    | grep -P '( backtrace_symbols_fd| backtrace)$'
  2891: 00000000006ad190    15 FUNC    WEAK   DEFAULT    13 backtrace
 11215: 00000000006ad1a0    41 FUNC    WEAK   DEFAULT    13 backtrace_symbols_fd

In such a situation the dynamic linker is not obliged to favour the glibc
implementation over the one defined in test_progs.

Compiling with the following simple modification to test_progs.c
demonstrates the issue:

  $ git diff
  ...
  \--- a/tools/testing/selftests/bpf/test_progs.c
  \+++ b/tools/testing/selftests/bpf/test_progs.c
  \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv)
          if (err)
                  return err;

  +       *(int *)0xdeadbeef  = 42;
          err = cd_flavor_subdir(argv[0]);
          if (err)
                  return err;

  $ ./test_progs
  [0]: Caught signal #11!
  Stack trace:
  <backtrace not supported>
  Segmentation fault (core dumped)

Resolve this by hiding the stub definitions behind a __GLIBC__ macro check
instead of using the "weak" attribute.
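
The approach can be sketched roughly as below. This is an illustration under
assumptions, not the text of the patch: toy_dump_stack() and TOY_BT_DEPTH are
hypothetical, and the stub bodies are only a guess at what a fallback might
print. On glibc the real functions from <execinfo.h> are used; on other libcs
local stubs are compiled in, so no weak symbol can shadow the glibc
implementation.

```c
#include <stdio.h>	/* any glibc header defines __GLIBC__ via <features.h> */
#include <unistd.h>

#ifdef __GLIBC__
#include <execinfo.h>	/* real backtrace() and backtrace_symbols_fd() */
#else
/* Stubs for libcs without backtrace support (for example musl). */
static int backtrace(void **buffer, int size)
{
	(void)buffer;
	(void)size;
	return 0;
}

static void backtrace_symbols_fd(void *const *buffer, int size, int fd)
{
	static const char msg[] = "<backtrace not supported>\n";
	ssize_t ret;

	(void)buffer;
	(void)size;
	ret = write(fd, msg, sizeof(msg) - 1);
	(void)ret;
}
#endif

#define TOY_BT_DEPTH 32	/* hypothetical frame limit for this sketch */

/* Print the current call stack to fd; returns the number of frames. */
static int toy_dump_stack(int fd)
{
	void *frames[TOY_BT_DEPTH];
	int n = backtrace(frames, TOY_BT_DEPTH);

	backtrace_symbols_fd(frames, n, fd);
	return n;
}
```

A signal handler could call toy_dump_stack(STDERR_FILENO); on glibc this
prints real frames, elsewhere it prints the placeholder line.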

Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc")
Signed-off-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Tested-by: Tony Ambardar <[email protected]>
Reviewed-by: Tony Ambardar <[email protected]>
Acked-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
kevin-zhm added a commit to spacemit-com/linux-k1x that referenced this pull request Dec 7, 2024
There is a global spinlock shared between the reset and clk drivers. If
debug information is printed while that lock is held in the reset path, a
deadlock may occur when the UART console driver tries to disable its clock.

Backtrace stopped: frame did not save the PC
(gdb) thread 4
[Switching to thread 4 (Thread 4)]
#0  cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22
22      ./arch/riscv/include/asm/vdso/processor.h: No such file or directory.
(gdb) bt
#0  cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22
#1  arch_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/asm-generic/spinlock.h:49
#2  do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock.h:186
#3  0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock_api_smp.h:111
#4  _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at kernel/locking/spinlock.c:162
#5  0xffffffff80563416 in clk_enable_lock () at ./include/linux/spinlock.h:325
#6  0xffffffff805648de in clk_core_disable_lock (core=0xffffffd900512500) at drivers/clk/clk.c:1062
#7  0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084
#8  clk_disable (clk=0xffffffd9048b5100) at drivers/clk/clk.c:1079
#9  0xffffffff8059e5d4 in serial_pxa_console_write (co=<optimized out>, s=0xffffffff81a68250 <text> "[   14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", count=<optimized out>)
    at drivers/tty/serial/pxa_k1x.c:1724
#10 0xffffffff8004a34c in call_console_driver (dropped_text=0xffffffff81a68650 <dropped_text> "", len=69,
    text=0xffffffff81a68250 <text> "[   14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", con=0xffffffff81964c10 <serial_pxa_console>) at kernel/printk/printk.c:1942
#11 console_emit_next_record (con=con@entry=0xffffffff81964c10 <serial_pxa_console>, ext_text=<optimized out>, dropped_text=0xffffffff81a68650 <dropped_text> "", handover=0xffffffc80578baa7,
    text=0xffffffff81a68250 <text> "[   14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n") at kernel/printk/printk.c:2731
#12 0xffffffff8004a49a in console_flush_all (handover=0xffffffc80578baa7, next_seq=<synthetic pointer>, do_cond_resched=false) at kernel/printk/printk.c:2793
#13 console_unlock () at kernel/printk/printk.c:2860
#14 0xffffffff8004b388 in vprintk_emit (facility=facility@entry=0, level=<optimized out>, level@entry=-1, dev_info=dev_info@entry=0x0, fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:2268
#15 0xffffffff8004b3ae in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2279
#16 0xffffffff8004b646 in vprintk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n", args=args@entry=0xffffffc80578bbd8) at kernel/printk/printk_safe.c:50
#17 0xffffffff80a880d6 in _printk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n") at kernel/printk/printk.c:2289
#18 0xffffffff80a90bb6 in spacemit_reset_set (rcdev=rcdev@entry=0xffffffff81f563a8 <k1x_reset_controller+8>, id=id@entry=59, assert=assert@entry=true) at drivers/reset/reset-spacemit-k1x.c:373
#19 0xffffffff805823b6 in spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:401
#20 spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:387
#21 spacemit_reset_assert (rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>, id=59) at drivers/reset/reset-spacemit-k1x.c:413
#22 0xffffffff8058158e in reset_control_assert (rstc=0xffffffd902b2f280) at drivers/reset/core.c:485
#23 0xffffffff807ccf96 in cpp_disable_clocks (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:960
#24 0xffffffff807cd0b2 in cpp_release_hardware (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1038
#25 0xffffffff807cd990 in cpp_close_node (sd=<optimized out>, fh=<optimized out>) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1135
#26 0xffffffff8079525e in subdev_close (file=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-subdev.c:105
#27 0xffffffff8078e49e in v4l2_release (inode=<optimized out>, filp=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-dev.c:459
#28 0xffffffff80154974 in __fput (file=0xffffffd906645d00) at fs/file_table.c:320
#29 0xffffffff80154aa2 in ____fput (work=<optimized out>) at fs/file_table.c:348
#30 0xffffffff8002677e in task_work_run () at kernel/task_work.c:179
#31 0xffffffff800053b4 in resume_user_mode_work (regs=0xffffffc80578bee0) at ./include/linux/resume_user_mode.h:49
#32 do_work_pending (regs=0xffffffc80578bee0, thread_info_flags=<optimized out>) at arch/riscv/kernel/signal.c:478
#33 0xffffffff800039c6 in handle_exception () at arch/riscv/kernel/entry.S:374
Backtrace stopped: frame did not save the PC
(gdb) thread 1
[Switching to thread 1 (Thread 1)]
#0  0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49
49      ./include/asm-generic/spinlock.h: No such file or directory.
(gdb) bt
#0  0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49
#1  do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock.h:186
#2  0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock_api_smp.h:111
#3  _raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at kernel/locking/spinlock.c:162
#4  0xffffffff8056c4cc in ccu_mix_disable (hw=0xffffffff81956858 <sdh2_clk+120>) at ./include/linux/spinlock.h:325
#5  0xffffffff80564832 in clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1051
#6  clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1031
#7  0xffffffff805648e6 in clk_core_disable_lock (core=0xffffffd900529900) at drivers/clk/clk.c:1063
#8  0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084
#9  clk_disable (clk=clk@entry=0xffffffd904fafa80) at drivers/clk/clk.c:1079
#10 0xffffffff808bb898 in clk_disable_unprepare (clk=0xffffffd904fafa80) at ./include/linux/clk.h:1085
#11 0xffffffff808bb916 in spacemit_sdhci_runtime_suspend (dev=<optimized out>) at drivers/mmc/host/sdhci-of-k1x.c:1469
#12 0xffffffff8066e8e2 in pm_generic_runtime_suspend (dev=<optimized out>) at drivers/base/power/generic_ops.c:25
#13 0xffffffff80670398 in __rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:395
#14 0xffffffff806704b8 in rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:529
#15 0xffffffff80670bdc in rpm_suspend (dev=0xffffffd9018a2810, rpmflags=<optimized out>) at drivers/base/power/runtime.c:672
#16 0xffffffff806716de in pm_runtime_work (work=0xffffffd9018a2948) at drivers/base/power/runtime.c:974
#17 0xffffffff800236f4 in process_one_work (worker=worker@entry=0xffffffd9013ee9c0, work=0xffffffd9018a2948) at kernel/workqueue.c:2289
#18 0xffffffff80023ba6 in worker_thread (__worker=0xffffffd9013ee9c0) at kernel/workqueue.c:2436
#19 0xffffffff80028bb2 in kthread (_create=0xffffffd9017de840) at kernel/kthread.c:376
#20 0xffffffff80003934 in handle_exception () at arch/riscv/kernel/entry.S:249
Backtrace stopped: frame did not save the PC
(gdb)
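
The two backtraces suggest an ABBA pattern: thread 4 logs from spacemit_reset_set() and ends up waiting for enable_lock inside the console's clk_disable(), while thread 1 is already inside clk_core_disable_lock() and waits for g_cru_lock. One common way to break such a cycle, sketched below in user-space C, is to copy the values to report while holding the lock and to print only after releasing it. This is an illustrative assumption, not the actual change made by this commit; cru_lock, reset_reg and reset_set() are hypothetical stand-ins, and the spinlock is modelled with a C11 atomic_flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the spinlock shared by reset and clk code. */
static atomic_flag cru_lock = ATOMIC_FLAG_INIT;
/* Hypothetical register image, one bit per reset id. */
static uint64_t reset_reg;

static void cru_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&cru_lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void cru_lock_release(void)
{
	atomic_flag_clear_explicit(&cru_lock, memory_order_release);
}

/* Update the reset bit under the lock, but log only after dropping it. */
uint64_t reset_set(unsigned int id, bool assert_reset)
{
	uint64_t snapshot;

	assert(id < 64);

	cru_lock_acquire();
	if (assert_reset)
		reset_reg |= UINT64_C(1) << id;
	else
		reset_reg &= ~(UINT64_C(1) << id);
	snapshot = reset_reg;	/* copy what we want to report */
	cru_lock_release();

	/* Logging may take other locks (console, clk); cru_lock is not held here. */
	printf("[RESET] assert = %d, id = %u\n", assert_reset, id);
	return snapshot;
}
```

With the print moved outside the critical section, the logging path can no longer acquire a second lock while the first one is held.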

Change-Id: Ia95b41ffd6c1893c9c5e9c1c9fc0c155ea902d2c
kevin-zhm added a commit to spacemit-com/linux-k1x that referenced this pull request Dec 7, 2024
There is an invalid instruction crash when running node.js:

[  443.219580] node[3123]: unhandled signal 4 code 0x1 at 0x00000038be663620
[  443.226499] CPU: 5 PID: 3123 Comm: node Not tainted 6.6.36+ #11
[  443.232501] Hardware name: spacemit k1-x deb1 board (DT)
[  443.237875] epc : 00000038be663620 ra : 00000038be652e00 sp : 0000003ff310a000
[  443.245195]  gp : 000000000447d6d0 tp : 0000003f82e2b780 t0 : 0000003e5c000000
[  443.252501]  t1 : 00000000000d31b8 t2 : 0000000000000063 s0 : 0000003ff310a050
[  443.259815]  s1 : 0000003ff3109fd0 a0 : 0000003c1e11ba29 a1 : 0000000000000004
[  443.267121]  a2 : 00000000000d31b8 a3 : 0000000000000003 a4 : 000000000019759e
[  443.274435]  a5 : 0000000000000075 a6 : 000000000000006c a7 : 0000000000000065
[  443.281749]  s2 : 00000000010df958 s3 : 0000000000000001 s4 : 0000003e5c0d31b8
[  443.289054]  s5 : 00000000045442e0 s6 : 0000000004544260 s7 : 0000000ba8d91399
[  443.296368]  s8 : 0000000000000000 s9 : 00000038be650168 s10: 0000000ba8d9fa81
[  443.303674]  s11: 0000000000000000 t3 : 00000038be650198 t4 : 0000002200000000
[  443.310980]  t5 : 0000000000000008 t6 : 00000038be663620
[  443.316352] status: 8000000200006020 badaddr: 0000000000800e13 cause: 0000000000000002
The opcode 0x00800e13 is a valid instruction ('li t3, 8'), so the fault
points to stale i-cache contents. When user space requests an i-cache
flush, the i-cache of every core related to the process should be flushed.
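
The claim that the reported word is a valid instruction can be checked by decoding its fields. The sketch below splits a 32-bit word using the RISC-V I-type layout; the struct and function names are made up for this example. For 0x00800e13 it yields opcode 0x13 (OP-IMM), funct3 0 (ADDI), rd = x28 (t3), rs1 = x0 and an immediate field of 8, which is the expansion of an `li` into t3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a RISC-V I-type instruction (base ISA encoding). */
struct itype {
	uint32_t opcode;	/* bits  6:0  */
	uint32_t rd;		/* bits 11:7  */
	uint32_t funct3;	/* bits 14:12 */
	uint32_t rs1;		/* bits 19:15 */
	int32_t imm;		/* bits 31:20, sign-extended */
};

struct itype decode_itype(uint32_t insn)
{
	struct itype f;

	f.opcode = insn & 0x7f;
	f.rd = (insn >> 7) & 0x1f;
	f.funct3 = (insn >> 12) & 0x7;
	f.rs1 = (insn >> 15) & 0x1f;
	/* Sign-extend the 12-bit immediate without shifting a signed value. */
	f.imm = (int32_t)(insn >> 20);
	if (f.imm & 0x800)
		f.imm -= 0x1000;
	return f;
}

/* ADDI is opcode 0x13 with funct3 == 0; "li" is ADDI with rs1 == x0. */
int is_li(const struct itype *f)
{
	assert(f != NULL);
	return f->opcode == 0x13 && f->funct3 == 0 && f->rs1 == 0;
}
```

Since the word in memory decodes cleanly, an illegal-instruction trap at that address is consistent with the core having executed something other than what memory holds.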

Change-Id: I0a06c77a2a3c1aa7aaf1e930eaa774d405e6fddb
kuba-moo added a commit to linux-netdev/testing that referenced this pull request Dec 10, 2024
Petr Machata says:

====================
vxlan: Support user-defined reserved bits

Currently the VXLAN header validation works by vxlan_rcv() going feature
by feature, each feature clearing the bits that it consumes. If anything
is left unparsed at the end, the packet is rejected.

Unfortunately there are machines out there that send VXLAN packets with
reserved bits set, even if they are configured to not use the
corresponding features. One such report is here[1], and we have heard
similar complaints from our customers as well.

This patchset adds an attribute that makes it configurable which bits
the user wishes to tolerate and which they consider reserved. This was
recommended in [1] as well.

A knob like that inevitably allows users to set as reserved bits that
are in fact required for the features enabled by the netdevice, such as
GPE. This is detected, and such configurations are rejected.

In patches #1..#7, the reserved bits validation code is gradually moved
away from the unparsed approach described above, to one where a given
set of valid bits is precomputed and then the packet is validated
against that.

In patch #8, this precomputed set is made configurable through a new
attribute IFLA_VXLAN_RESERVED_BITS.
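
The precomputed-mask approach can be sketched as follows. This is a simplified user-space illustration, not the kernel's vxlan_rcv() code: the vxlan_cfg structure, the helper names and the two flag constants are assumptions made for the example, and the header is reduced to its first 32-bit word in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits in the first header word (host byte order). */
#define HF_VNI	UINT32_C(0x08000000)	/* VNI-present flag */
#define HF_GPE	UINT32_C(0x04000000)	/* example optional-feature bit */

struct vxlan_cfg {
	uint32_t used_bits;	/* bits consumed by the enabled features */
	uint32_t reserved_bits;	/* bits that must be zero on receive */
};

/*
 * Accept a user-supplied reserved mask only if it does not mark as reserved
 * any bit that an enabled feature needs. Returns false on conflict.
 */
bool vxlan_cfg_init(struct vxlan_cfg *cfg, uint32_t used, uint32_t reserved)
{
	assert(cfg != NULL);
	if (used & reserved)
		return false;
	cfg->used_bits = used;
	cfg->reserved_bits = reserved;
	return true;
}

/* A single mask test replaces the feature-by-feature "clear what you parsed" pass. */
bool vxlan_hdr_ok(const struct vxlan_cfg *cfg, uint32_t hdr_word)
{
	assert(cfg != NULL);
	return (hdr_word & cfg->reserved_bits) == 0;
}
```

A strict configuration would pass every unused flag bit as reserved, while a tolerant one would leave out the bits that misbehaving peers are known to set.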

Patches #9 and #10 massage the testsuite a bit, so that patch #11 can
introduce a selftest for the reserved bits feature.

The corresponding iproute2 support is available in [2].

[1] https://lore.kernel.org/netdev/[email protected]/
[2] https://github.com/pmachata/iproute2/commits/vxlan_reserved_bits/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
roxell added a commit to roxell/linux that referenced this pull request Dec 11, 2024
[  123.491737][    T1] Unexpected kernel BRK exception at EL1
[  123.497593][    T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP
[  123.500785][    T1] Modules linked in:
[  123.502567][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty #11
[  123.507468][    T1] Hardware name: linux,dummy-virt (DT)
[  123.509826][    T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
[  123.512609][    T1] pc : of_unittest_untrack_overlay+0x64/0x134
[  123.515245][    T1] lr : of_unittest_untrack_overlay+0x64/0x134
[  123.517848][    T1] sp : ffff00006a65fb30
[  123.519668][    T1] x29: ffff00006a65fb30 x28: 0000000000000000
[  123.522295][    T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00
[  123.524937][    T1] x25: 0000000000000000 x24: 0000000000000000
[  123.527592][    T1] x23: ffffa00014c72540 x22: ffffa00016b86000
[  123.530191][    T1] x21: 0000000000000000 x20: 00000000ffffffff
[  123.532845][    T1] x19: 00000000ffffffff x18: 0000000000002690
[  123.535547][    T1] x17: 0000000000002718 x16: 00000000000014b8
[  123.538299][    T1] x15: 0000000000000001 x14: 0080000000000000
[  123.541055][    T1] x13: 0000000000000002 x12: ffff94000298d209
[  123.543801][    T1] x11: 1ffff4000298d208 x10: ffff94000298d208
[  123.546580][    T1] x9 : dfffa00000000000 x8 : ffffa00014c69047
[  123.549247][    T1] x7 : 0000000000000001 x6 : ffffa00014c69040
[  123.552026][    T1] x5 : ffff00006a654040 x4 : 0000000000000000
[  123.554799][    T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff
[  123.557541][    T1] x1 : ffff00006a654040 x0 : 0000000000000000
[  123.560390][    T1] Call trace:
[  123.561935][    T1]  of_unittest_untrack_overlay+0x64/0x134
[  123.564469][    T1]  of_unittest+0x2220/0x2438
[  123.566585][    T1]  do_one_initcall+0x470/0xa10
[  123.568751][    T1]  kernel_init_freeable+0x510/0x5f0
[  123.571123][    T1]  kernel_init+0x18/0x1e8
[  123.573078][    T1]  ret_from_fork+0x10/0x18
[  123.575119][    T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00)
[  123.578138][    T1] ---[ end trace c4e049fb5e3b0ba0 ]---
[  123.580449][    T1] Kernel panic - not syncing: Fatal exception
[  123.583116][    T1] Kernel Offset: disabled
[  123.585066][    T1] CPU features: 0x240002,20002004
[  123.587259][    T1] Memory Limit: none
[  123.588986][    T1] ---[ end Kernel panic - not syncing: Fatal exception ]---

Signed-off-by: Anders Roxell <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 12, 2024
Once we are inside the 'ext4_xattr_delete_inode' function and trying
to delete the inode, the 'xattr_sem' should be unlocked.

We need trylock here to avoid a false-positive warning from lockdep
about a reclaim circular dependency.

This fixes the following KASAN reported issue:

==================================================================
BUG: KASAN: slab-use-after-free in ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
Read of size 4 at addr ffff888012c120c4 by task repro/2065

CPU: 1 UID: 0 PID: 2065 Comm: repro Not tainted 6.13.0-rc2+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x1fd/0x300
 ? tcp_gro_dev_warn+0x260/0x260
 ? _printk+0xc0/0x100
 ? read_lock_is_recursive+0x10/0x10
 ? irq_work_queue+0x72/0xf0
 ? __virt_addr_valid+0x17b/0x4b0
 print_address_description+0x78/0x390
 print_report+0x107/0x1f0
 ? __virt_addr_valid+0x17b/0x4b0
 ? __virt_addr_valid+0x3ff/0x4b0
 ? __phys_addr+0xb5/0x160
 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
 kasan_report+0xcc/0x100
 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
 ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
 ? ext4_xattr_delete_inode+0xd30/0xd30
 ? __ext4_journal_ensure_credits+0x5f0/0x5f0
 ? __ext4_journal_ensure_credits+0x2b/0x5f0
 ? inode_update_timestamps+0x410/0x410
 ext4_xattr_delete_inode+0xb64/0xd30
 ? ext4_truncate+0xb70/0xdc0
 ? ext4_expand_extra_isize_ea+0x1d20/0x1d20
 ? __ext4_mark_inode_dirty+0x670/0x670
 ? ext4_journal_check_start+0x16f/0x240
 ? ext4_inode_is_fast_symlink+0x2f2/0x3a0
 ext4_evict_inode+0xc8c/0xff0
 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0
 ? do_raw_spin_unlock+0x53/0x8a0
 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0
 evict+0x4ac/0x950
 ? proc_nr_inodes+0x310/0x310
 ? trace_ext4_drop_inode+0xa2/0x220
 ? _raw_spin_unlock+0x1a/0x30
 ? iput+0x4cb/0x7e0
 do_unlinkat+0x495/0x7c0
 ? try_break_deleg+0x120/0x120
 ? 0xffffffff81000000
 ? __check_object_size+0x15a/0x210
 ? strncpy_from_user+0x13e/0x250
 ? getname_flags+0x1dc/0x530
 __x64_sys_unlinkat+0xc8/0xf0
 do_syscall_64+0x65/0x110
 entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x434ffd
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
RSP: 002b:00007ffc50fa7b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
RAX: ffffffffffffffda RBX: 00007ffc50fa7e18 RCX: 0000000000434ffd
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 00007ffc50fa7be0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffc50fa7e08 R14: 00000000004bbf30 R15: 0000000000000001
 </TASK>

The buggy address belongs to the object at ffff888012c12000
 which belongs to the cache filp of size 360
The buggy address is located 196 bytes inside of
 freed 360-byte region [ffff888012c12000, ffff888012c12168)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12c12
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x40(head|node=0|zone=0)
page_type: f5(slab)
raw: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004
raw: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000
head: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004
head: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000
head: 0000000000000001 ffffea00004b0481 ffffffffffffffff 0000000000000000
head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888012c11f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888012c12000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888012c12080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff888012c12100: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
 ffff888012c12180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=b244bda78289b00204ed
Signed-off-by: Bhupesh <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 12, 2024
[ Upstream commit 5bf1557 ]

test_progs uses the glibc-specific functions backtrace() and
backtrace_symbols_fd() to print a backtrace in case of SIGSEGV.

A recent commit (see Fixes) updated test_progs.c to define stub versions
of the same functions with the "weak" attribute in order to allow linking
test_progs against musl libc. Unfortunately this broke the backtrace
handling for glibc builds.

As it turns out, glibc defines backtrace() and backtrace_symbols_fd()
as weak:

  $ llvm-readelf --symbols /lib64/libc.so.6 \
     | grep -P '( backtrace_symbols_fd| backtrace)$'
  4910: 0000000000126b40   161 FUNC    WEAK   DEFAULT    16 backtrace
  6843: 0000000000126f90   852 FUNC    WEAK   DEFAULT    16 backtrace_symbols_fd

So does test_progs:

 $ llvm-readelf --symbols test_progs \
    | grep -P '( backtrace_symbols_fd| backtrace)$'
  2891: 00000000006ad190    15 FUNC    WEAK   DEFAULT    13 backtrace
 11215: 00000000006ad1a0    41 FUNC    WEAK   DEFAULT    13 backtrace_symbols_fd

In such a situation the dynamic linker is not obliged to favour the glibc
implementation over the one defined in test_progs.

Compiling with the following simple modification to test_progs.c
demonstrates the issue:

  $ git diff
  ...
  --- a/tools/testing/selftests/bpf/test_progs.c
  +++ b/tools/testing/selftests/bpf/test_progs.c
  @@ -1817,6 +1817,7 @@ int main(int argc, char **argv)
          if (err)
                  return err;

  +       *(int *)0xdeadbeef  = 42;
          err = cd_flavor_subdir(argv[0]);
          if (err)
                  return err;

  $ ./test_progs
  [0]: Caught signal #11!
  Stack trace:
  <backtrace not supported>
  Segmentation fault (core dumped)

Resolve this by hiding the stub definitions behind a __GLIBC__ macro check
instead of using the "weak" attribute.

Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc")
Signed-off-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Tested-by: Tony Ambardar <[email protected]>
Reviewed-by: Tony Ambardar <[email protected]>
Acked-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
roxell added a commit to roxell/linux that referenced this pull request Dec 13, 2024
[  123.491737][    T1] Unexpected kernel BRK exception at EL1
[  123.497593][    T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP
[  123.500785][    T1] Modules linked in:
[  123.502567][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty #11
[  123.507468][    T1] Hardware name: linux,dummy-virt (DT)
[  123.509826][    T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
[  123.512609][    T1] pc : of_unittest_untrack_overlay+0x64/0x134
[  123.515245][    T1] lr : of_unittest_untrack_overlay+0x64/0x134
[  123.517848][    T1] sp : ffff00006a65fb30
[  123.519668][    T1] x29: ffff00006a65fb30 x28: 0000000000000000
[  123.522295][    T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00
[  123.524937][    T1] x25: 0000000000000000 x24: 0000000000000000
[  123.527592][    T1] x23: ffffa00014c72540 x22: ffffa00016b86000
[  123.530191][    T1] x21: 0000000000000000 x20: 00000000ffffffff
[  123.532845][    T1] x19: 00000000ffffffff x18: 0000000000002690
[  123.535547][    T1] x17: 0000000000002718 x16: 00000000000014b8
[  123.538299][    T1] x15: 0000000000000001 x14: 0080000000000000
[  123.541055][    T1] x13: 0000000000000002 x12: ffff94000298d209
[  123.543801][    T1] x11: 1ffff4000298d208 x10: ffff94000298d208
[  123.546580][    T1] x9 : dfffa00000000000 x8 : ffffa00014c69047
[  123.549247][    T1] x7 : 0000000000000001 x6 : ffffa00014c69040
[  123.552026][    T1] x5 : ffff00006a654040 x4 : 0000000000000000
[  123.554799][    T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff
[  123.557541][    T1] x1 : ffff00006a654040 x0 : 0000000000000000
[  123.560390][    T1] Call trace:
[  123.561935][    T1]  of_unittest_untrack_overlay+0x64/0x134
[  123.564469][    T1]  of_unittest+0x2220/0x2438
[  123.566585][    T1]  do_one_initcall+0x470/0xa10
[  123.568751][    T1]  kernel_init_freeable+0x510/0x5f0
[  123.571123][    T1]  kernel_init+0x18/0x1e8
[  123.573078][    T1]  ret_from_fork+0x10/0x18
[  123.575119][    T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00)
[  123.578138][    T1] ---[ end trace c4e049fb5e3b0ba0 ]---
[  123.580449][    T1] Kernel panic - not syncing: Fatal exception
[  123.583116][    T1] Kernel Offset: disabled
[  123.585066][    T1] CPU features: 0x240002,20002004
[  123.587259][    T1] Memory Limit: none
[  123.588986][    T1] ---[ end Kernel panic - not syncing: Fatal exception ]---

Signed-off-by: Anders Roxell <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Dec 13, 2024
[ Upstream commit 5bf1557 ]

test_progs uses glibc specific functions backtrace() and
backtrace_symbols_fd() to print backtrace in case of SIGSEGV.

A recent commit (see Fixes) updated test_progs.c to define stub versions
of the same functions with attribute "weak" in order to allow linking
test_progs against musl libc. Unfortunately, this broke the backtrace
handling for glibc builds.

As it turns out, glibc defines backtrace() and backtrace_symbols_fd()
as weak:

  $ llvm-readelf --symbols /lib64/libc.so.6 \
     | grep -P '( backtrace_symbols_fd| backtrace)$'
  4910: 0000000000126b40   161 FUNC    WEAK   DEFAULT    16 backtrace
  6843: 0000000000126f90   852 FUNC    WEAK   DEFAULT    16 backtrace_symbols_fd

So does test_progs:

 $ llvm-readelf --symbols test_progs \
    | grep -P '( backtrace_symbols_fd| backtrace)$'
  2891: 00000000006ad190    15 FUNC    WEAK   DEFAULT    13 backtrace
 11215: 00000000006ad1a0    41 FUNC    WEAK   DEFAULT    13 backtrace_symbols_fd

In such a situation the dynamic linker is not obliged to favour the
glibc implementation over the one defined in test_progs.

Compiling with the following simple modification to test_progs.c
demonstrates the issue:

  $ git diff
  ...
  \--- a/tools/testing/selftests/bpf/test_progs.c
  \+++ b/tools/testing/selftests/bpf/test_progs.c
  \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv)
          if (err)
                  return err;

  +       *(int *)0xdeadbeef  = 42;
          err = cd_flavor_subdir(argv[0]);
          if (err)
                  return err;

  $ ./test_progs
  [0]: Caught signal #11!
  Stack trace:
  <backtrace not supported>
  Segmentation fault (core dumped)

Resolve this by hiding the stub definitions behind a __GLIBC__ macro
check instead of using the "weak" attribute.
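
A minimal sketch of that approach follows. It is not the actual test_progs.c change: the helper name dump_stack_to() and the buffer size BT_BUF_SZ are assumptions made for illustration. The stubs are compiled only when __GLIBC__ is not defined, so a glibc build always resolves to the real backtrace() and backtrace_symbols_fd().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>	/* any libc header; on glibc it also defines __GLIBC__ */
#include <unistd.h>

#ifdef __GLIBC__
#include <execinfo.h>	/* real backtrace() and backtrace_symbols_fd() */
#else
/* Stubs for libcs without backtrace support (for example musl). */
int backtrace(void **buffer, int size)
{
	(void)buffer;
	(void)size;
	return 0;
}

void backtrace_symbols_fd(void *const *buffer, int size, int fd)
{
	(void)buffer;
	(void)size;
	dprintf(fd, "<backtrace not supported>\n");
}
#endif

#define BT_BUF_SZ 32	/* hypothetical frame limit for this sketch */

/* Hypothetical helper: print the current stack to fd, return frame count. */
int dump_stack_to(int fd)
{
	void *bt[BT_BUF_SZ];
	int n = backtrace(bt, BT_BUF_SZ);

	backtrace_symbols_fd(bt, n, fd);
	return n;
}
```

Because no symbol is declared weak here, the choice between the real functions and the stubs is made at compile time rather than left to the dynamic linker.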

Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc")
Signed-off-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Tested-by: Tony Ambardar <[email protected]>
Reviewed-by: Tony Ambardar <[email protected]>
Acked-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Dec 15, 2024
roxell added a commit to roxell/linux that referenced this pull request Dec 17, 2024
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 21, 2024
…atch-fixes

WARNING: Possible repeated word: 'to'
#11: 
set as null leaving it to to be accessed.  Additionally, the read-only

WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")'
#21: 
Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas")

WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report
#23: 
Reported-by: [email protected]
Tested-by: [email protected]

ERROR: space required before the open brace '{'
#47: FILE: fs/ocfs2/quota_global.c:896:
+	if (!sb_has_quota_active(sb, type)){

total: 1 errors, 3 warnings, 15 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Changwei Ge <[email protected]>
Cc: Dennis Lam <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
roxell added a commit to roxell/linux that referenced this pull request Dec 22, 2024
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 23, 2024
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 24, 2024
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 27, 2024
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 28, 2024
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 28, 2024
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 31, 2024
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 7, 2025
During the update procedure, when overwriting an element in a
pre-allocated htab, the freeing of old_element is protected by the
bucket lock. The bucket lock is necessary because the old_element has
already been stashed in htab->extra_elems after alloc_htab_elem()
returns. If the old_element were freed after the bucket lock is
released, the stashed element could be reused by a concurrent update
procedure, and the freeing of old_element would run concurrently with
its reuse. However, the invocation of check_and_free_fields() may
acquire a spin-lock, which violates the lockdep rule because its caller
already holds a raw-spin-lock (the bucket lock). The following warning
is reported when such a race happens:

  BUG: scheduling while atomic: test_progs/676/0x00000003
  3 locks held by test_progs/676:
   #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
   #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
   #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
  Modules linked in: bpf_testmod(O)
  Preemption disabled at:
  [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
  CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
  Tainted: [W]=WARN, [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x57/0x70
   dump_stack+0x10/0x20
   __schedule_bug+0x120/0x170
   __schedule+0x300c/0x4800
   schedule_rtlock+0x37/0x60
   rtlock_slowlock_locked+0x6d9/0x54c0
   rt_spin_lock+0x168/0x230
   hrtimer_cancel_wait_running+0xe9/0x1b0
   hrtimer_cancel+0x24/0x30
   bpf_timer_delete_work+0x1d/0x40
   bpf_timer_cancel_and_free+0x5e/0x80
   bpf_obj_free_fields+0x262/0x4a0
   check_and_free_fields+0x1d0/0x280
   htab_map_update_elem+0x7fc/0x1500
   bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
   bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
   bpf_prog_test_run_syscall+0x322/0x830
   __sys_bpf+0x135d/0x3ca0
   __x64_sys_bpf+0x75/0xb0
   x64_sys_call+0x1b5/0xa10
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   ...
   </TASK>

To fix the problem, the patch breaks the reuse and refill of per-cpu
extra_elems into two independent parts: reusing the per-cpu extra_elems
with the bucket lock held, and refilling the old_element as per-cpu
extra_elems after the bucket lock is released. After the split, it is
safe to free the pre-allocated element after the bucket lock is released.
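
The general pattern (swap the element while the lock is held, run the potentially sleeping cleanup only after the lock is released) can be sketched in userspace as follows. This is an analogy under stated assumptions, not the kernel patch: struct bucket, bucket_overwrite(), free_fields() and the deferred_frees counter are hypothetical names, a pthread mutex stands in for the raw bucket lock, and the per-cpu extra_elems refill is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct elem {
	int value;
};

struct bucket {
	pthread_mutex_t lock;	/* stand-in for the raw bucket lock */
	struct elem *cur;	/* element currently stored in the bucket */
};

/* Counts cleanups that ran outside the lock (for illustration only). */
int deferred_frees;

/* Stand-in for a cleanup that may sleep, so it must not run under the lock. */
static void free_fields(struct elem *e)
{
	deferred_frees++;
	free(e);
}

/* Hypothetical overwrite: publish the new element under the lock, free the old one after. */
int bucket_overwrite(struct bucket *b, int value)
{
	struct elem *new_elem = malloc(sizeof(*new_elem));
	struct elem *old;

	if (!new_elem)
		return -1;
	new_elem->value = value;

	pthread_mutex_lock(&b->lock);
	old = b->cur;		/* detach the old element while locked */
	b->cur = new_elem;
	pthread_mutex_unlock(&b->lock);

	if (old)
		free_fields(old);	/* cleanup happens with no lock held */
	return 0;
}
```

Once the old element has been detached under the lock, no other updater can reach it, which is what makes the unlocked cleanup safe in this simplified model.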

Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
roxell added a commit to roxell/linux that referenced this pull request Jan 8, 2025
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 9, 2025
During the update procedure, when overwriting an element in a
pre-allocated htab, the freeing of old_element is protected by the
bucket lock. The bucket lock is necessary because the old_element has
already been stashed in htab->extra_elems after alloc_htab_elem()
returns. If the old_element were freed after the bucket lock is
released, the stashed element could be reused by a concurrent update
procedure, and the freeing of old_element would run concurrently with
its reuse. However, the invocation of check_and_free_fields() may
acquire a spin-lock, which violates the lockdep rule because its caller
already holds a raw-spin-lock (the bucket lock). The following warning
is reported when such a race happens:

  BUG: scheduling while atomic: test_progs/676/0x00000003
  3 locks held by test_progs/676:
  #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
  #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
  #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
  Modules linked in: bpf_testmod(O)
  Preemption disabled at:
  [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
  CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
  Tainted: [W]=WARN, [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
  Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x70
  dump_stack+0x10/0x20
  __schedule_bug+0x120/0x170
  __schedule+0x300c/0x4800
  schedule_rtlock+0x37/0x60
  rtlock_slowlock_locked+0x6d9/0x54c0
  rt_spin_lock+0x168/0x230
  hrtimer_cancel_wait_running+0xe9/0x1b0
  hrtimer_cancel+0x24/0x30
  bpf_timer_delete_work+0x1d/0x40
  bpf_timer_cancel_and_free+0x5e/0x80
  bpf_obj_free_fields+0x262/0x4a0
  check_and_free_fields+0x1d0/0x280
  htab_map_update_elem+0x7fc/0x1500
  bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
  bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
  bpf_prog_test_run_syscall+0x322/0x830
  __sys_bpf+0x135d/0x3ca0
  __x64_sys_bpf+0x75/0xb0
  x64_sys_call+0x1b5/0xa10
  do_syscall_64+0x3b/0xc0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
  ...
  </TASK>

It seems feasible to break the reuse and refill of per-cpu extra_elems
into two independent parts: reusing the per-cpu extra_elems with the
bucket lock held, and refilling the old_element as per-cpu extra_elems
after the bucket lock is released. However, that would make concurrent
overwrite procedures on the same CPU return an unexpected -E2BIG error
when the map is full.

Therefore, the patch fixes the lock problem by breaking the cancelling
of bpf_timer into two steps:
1) use hrtimer_try_to_cancel() and check its return value
2) if the timer is running, use hrtimer_cancel() through a kworker to
   cancel it again
Considering that the current implementation of hrtimer_cancel() will try
to spin on the current CPU or acquire an already-held
softirq_expiry_lock when the timer is running, the steps above are
reasonable. However, this also has a downside: when the timer is
running, its cancellation is delayed when releasing the last map uref.
The delay is also fixable (e.g., by breaking the cancelling of the bpf
timer into two parts: one in locked scope, another in unlocked scope),
so it can be revised later if necessary.
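
The two-step cancel described above can be modelled in plain C as shown below. This is a userspace model under stated assumptions, not the kernel code: struct model_timer and the model_*() functions are hypothetical, and setting cancel_deferred stands in for queueing work to a kworker. The return convention of model_try_to_cancel() follows hrtimer_try_to_cancel(): 0 if the timer was not active, 1 if it was active and has been cancelled, -1 if its callback is currently running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a timer's state. */
struct model_timer {
	bool queued;		/* armed and waiting to fire */
	bool callback_running;	/* callback executing right now */
	bool cancel_deferred;	/* cancel handed off to a worker */
};

/* Never blocks: refuses to cancel while the callback is running. */
int model_try_to_cancel(struct model_timer *t)
{
	if (t->callback_running)
		return -1;
	if (!t->queued)
		return 0;
	t->queued = false;
	return 1;
}

/*
 * Step 1: safe to call with a lock held because it never waits.
 * Returns true if the timer is known to be stopped on return.
 */
bool model_cancel_nonblocking(struct model_timer *t)
{
	if (model_try_to_cancel(t) >= 0)
		return true;
	t->cancel_deferred = true;	/* stand-in for queueing to a kworker */
	return false;
}

/*
 * Step 2: runs later in a context that may sleep. Real code would call
 * the blocking hrtimer_cancel() here; the model just assumes the
 * callback has finished by the time the worker runs.
 */
void model_deferred_cancel(struct model_timer *t)
{
	t->callback_running = false;
	t->queued = false;
	t->cancel_deferred = false;
}
```

The model makes the trade-off visible: the locked path never waits, at the cost of the cancel completing later when the callback happened to be running.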

It is a bit hard to decide on the right Fixes tag. One reason is that
the problem depends on PREEMPT_RT, which was enabled in v6.12.
Considering that softirq_expiry_lock has existed since v5.4 and
bpf_timer was introduced in v5.15, the bpf_timer commit is used in the
Fixes tag and an extra Depends-on tag is added to state the dependency
on PREEMPT_RT.

Fixes: b00628b ("bpf: Introduce bpf timers.")
Depends-on: v6.12 with PREEMPT_RT enabled
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Closes: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Hou Tao <[email protected]>
roxell added a commit to roxell/linux that referenced this pull request Jan 9, 2025
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 14, 2025
Under PREEMPT_RT, it is not safe to call kmalloc() with GFP_ATOMIC when
preemption or IRQs are disabled. The following warning is reported when
running test_progs under PREEMPT_RT:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 675, name: test_progs
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  2 locks held by test_progs/675:
   #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
   #1: ffff8881f4ec40c8 ((&c->lock)){....}-{2:2}, at: ___slab_alloc+0xbc/0x1280
  Preemption disabled at:
  [<ffffffff8175ae2b>] __bpf_async_init+0xbb/0xb10
  CPU: 1 UID: 0 PID: 675 Comm: test_progs Tainted: G           O       6.12.0+ #11
  Tainted: [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x57/0x70
   dump_stack+0x10/0x20
   __might_resched+0x337/0x4d0
   rt_spin_lock+0xd4/0x230
   ___slab_alloc+0xbc/0x1280
   __slab_alloc.isra.0+0x5d/0xa0
   __kmalloc_node_noprof+0xf7/0x4f0
   bpf_map_kmalloc_node+0xf5/0x6b0
   __bpf_async_init+0x20e/0xb10
   bpf_timer_init+0x30/0x40
   bpf_prog_c7e2dc9ff3d5ba62_start_cb+0x55/0x85
   bpf_prog_4eb421be69ae82fa_start_timer+0x5d/0x7e
   bpf_prog_test_run_syscall+0x322/0x830
   __sys_bpf+0x135d/0x3ca0
   __x64_sys_bpf+0x75/0xb0
   x64_sys_call+0x1b5/0xa10
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fix the problem by using bpf_global_ma to allocate bpf_async_cb when
PREEMPT_RT is enabled. kmalloc is still used in the non-PREEMPT_RT case
because bpf_global_ma doesn't support accounting the allocated memory to
a specific memcg. Also do the memory allocation before invoking
__bpf_spin_lock_irqsave() to reduce the possibility of -ENOMEM for
bpf_global_ma.
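
The "allocate before taking the lock" ordering can be illustrated with a small userspace sketch. This is only an analogy, not the kernel code: `struct async_cb`, `struct async_slot`, `async_slot_init()`, the C11 `atomic_flag` spin lock and the -EBUSY return for an already initialised slot are all hypothetical stand-ins, and `calloc()` takes the place of the kernel allocator.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for bpf_async_cb. */
struct async_cb {
	int armed;
};

/* Hypothetical container: a spin lock protecting a lazily created object. */
struct async_slot {
	atomic_flag lock;	/* stand-in for the lock taken by __bpf_spin_lock_irqsave() */
	struct async_cb *cb;
};

/*
 * Returns 0 on success, -ENOMEM if the allocation fails and -EBUSY if the
 * slot was already initialised (return values are assumptions of this sketch).
 */
int async_slot_init(struct async_slot *slot)
{
	/* Allocate first, outside the critical section. */
	struct async_cb *cb = calloc(1, sizeof(*cb));
	int ret = 0;

	if (!cb)
		return -ENOMEM;

	/* Critical section: only pointer checks and assignments, no allocation. */
	while (atomic_flag_test_and_set(&slot->lock))
		;
	if (slot->cb) {
		ret = -EBUSY;
	} else {
		slot->cb = cb;
		cb = NULL;	/* ownership moved into the slot */
	}
	atomic_flag_clear(&slot->lock);

	/* Drop the unused allocation after unlocking; free(NULL) is a no-op. */
	free(cb);
	return ret;
}
```

The trade-off in this shape is one wasted allocation when the slot turns out to be initialised already, in exchange for never calling the allocator while the lock is held.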

Signed-off-by: Hou Tao <[email protected]>
roxell added a commit to roxell/linux that referenced this pull request Jan 20, 2025
guidosarducci pushed a commit to guidosarducci/linux that referenced this pull request Jan 20, 2025
During the update procedure, when overwriting an element in a pre-allocated
htab, the freeing of old_element is protected by the bucket lock. The
bucket lock is necessary because the old_element has already been
stashed in htab->extra_elems after alloc_htab_elem() returns. If
old_element were freed after the bucket lock is unlocked, the stashed
element could be reused by a concurrent update procedure, and the
freeing of old_element would run concurrently with its reuse. However,
the invocation of check_and_free_fields() may
acquire a spin-lock, which violates the lockdep rule because its caller
already holds a raw-spin-lock (the bucket lock). The following warning
is reported when such a race happens:

  BUG: scheduling while atomic: test_progs/676/0x00000003
  3 locks held by test_progs/676:
  #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
  #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
  #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
  Modules linked in: bpf_testmod(O)
  Preemption disabled at:
  [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
  CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
  Tainted: [W]=WARN, [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
  Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x70
  dump_stack+0x10/0x20
  __schedule_bug+0x120/0x170
  __schedule+0x300c/0x4800
  schedule_rtlock+0x37/0x60
  rtlock_slowlock_locked+0x6d9/0x54c0
  rt_spin_lock+0x168/0x230
  hrtimer_cancel_wait_running+0xe9/0x1b0
  hrtimer_cancel+0x24/0x30
  bpf_timer_delete_work+0x1d/0x40
  bpf_timer_cancel_and_free+0x5e/0x80
  bpf_obj_free_fields+0x262/0x4a0
  check_and_free_fields+0x1d0/0x280
  htab_map_update_elem+0x7fc/0x1500
  bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
  bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
  bpf_prog_test_run_syscall+0x322/0x830
  __sys_bpf+0x135d/0x3ca0
  __x64_sys_bpf+0x75/0xb0
  x64_sys_call+0x1b5/0xa10
  do_syscall_64+0x3b/0xc0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
  ...
  </TASK>

It seems feasible to break the reuse and refill of per-cpu extra_elems
into two independent parts: reuse the per-cpu extra_elems with the bucket
lock held, and refill the old_element as per-cpu extra_elems after
the bucket lock is unlocked. However, that would make concurrent
overwrite procedures on the same CPU return an unexpected -E2BIG error when
the map is full.

Therefore, the patch fixes the lock problem by breaking the cancelling
of bpf_timer into two steps for PREEMPT_RT:
1) use hrtimer_try_to_cancel() and check its return value
2) if the timer is running, use hrtimer_cancel() through a kworker to
   cancel it again
Considering that the current implementation of hrtimer_cancel() will try
to acquire an already-held softirq_expiry_lock when the current timer is
running, the steps above are reasonable. However, they also have a
downside: when the timer is running, the cancelling of the timer is
delayed when releasing the last map uref. The delay is fixable
(e.g., break the cancelling of the bpf timer into two parts: one part in
the locked scope, another in the unlocked scope) and can be revisited
later if necessary.
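
The two steps can be sketched in plain C as a toy state machine. Everything named `sketch_*` below is hypothetical and only models the control flow; the return convention of `sketch_try_to_cancel()` follows hrtimer_try_to_cancel() (0 if the timer was not active, 1 if it was active and is now cancelled, -1 if its callback is currently running), and setting `cancel_deferred` stands in for queueing work that a kworker would later finish with a blocking hrtimer_cancel().

```c
#include <stdbool.h>

/* Toy timer states; this is a model, not the hrtimer implementation. */
enum sketch_state { SKETCH_IDLE, SKETCH_QUEUED, SKETCH_RUNNING };

struct sketch_timer {
	enum sketch_state state;
	bool cancel_deferred;	/* stands in for "work queued to a kworker" */
};

/* Non-blocking cancel: 0 = not active, 1 = cancelled, -1 = callback running. */
static int sketch_try_to_cancel(struct sketch_timer *t)
{
	if (t->state == SKETCH_RUNNING)
		return -1;
	if (t->state == SKETCH_QUEUED) {
		t->state = SKETCH_IDLE;
		return 1;
	}
	return 0;
}

/*
 * Intended to be safe to call with a raw spin lock held: it never waits for
 * a running callback. Returns true if the cancel had to be deferred.
 */
bool sketch_cancel_nonblocking(struct sketch_timer *t)
{
	/* Step 1: try the non-blocking cancel and check its return value. */
	if (sketch_try_to_cancel(t) >= 0)
		return false;

	/* Step 2: the callback is running, so hand the blocking cancel off. */
	t->cancel_deferred = true;
	return true;
}
```

Only the running-callback case takes the deferred path, which is where the delay described above comes from.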

It is a bit hard to decide on the right Fixes tag. One reason is that the
problem depends on PREEMPT_RT, which is enabled in v6.12. Considering that
softirq_expiry_lock has existed since v5.4 and bpf_timer was introduced
in v5.15, the bpf_timer commit is used in the Fixes tag and an extra
Depends-on tag is added to state the dependency on PREEMPT_RT.

Fixes: b00628b ("bpf: Introduce bpf timers.")
Depends-on: v6.12+ with PREEMPT_RT enabled
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Closes: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Hou Tao <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
hogliux pushed a commit to hogliux/yogalinux7x that referenced this pull request Jan 21, 2025
BugLink: https://bugs.launchpad.net/bugs/2091625

[ Upstream commit 5bf1557 ]

test_progs uses the glibc-specific functions backtrace() and
backtrace_symbols_fd() to print a backtrace in case of SIGSEGV.

A recent commit (see Fixes) updated test_progs.c to define stub versions
of the same functions with the "weak" attribute in order to allow linking
test_progs against musl libc. Unfortunately this broke the backtrace
handling for glibc builds.

As it turns out, glibc defines backtrace() and backtrace_symbols_fd()
as weak:

  $ llvm-readelf --symbols /lib64/libc.so.6 \
     | grep -P '( backtrace_symbols_fd| backtrace)$'
  4910: 0000000000126b40   161 FUNC    WEAK   DEFAULT    16 backtrace
  6843: 0000000000126f90   852 FUNC    WEAK   DEFAULT    16 backtrace_symbols_fd

So does test_progs:

 $ llvm-readelf --symbols test_progs \
    | grep -P '( backtrace_symbols_fd| backtrace)$'
  2891: 00000000006ad190    15 FUNC    WEAK   DEFAULT    13 backtrace
 11215: 00000000006ad1a0    41 FUNC    WEAK   DEFAULT    13 backtrace_symbols_fd

In such a situation the dynamic linker is not obliged to favour the glibc
implementation over the one defined in test_progs.

Compiling with the following simple modification to test_progs.c
demonstrates the issue:

  $ git diff
  ...
  \--- a/tools/testing/selftests/bpf/test_progs.c
  \+++ b/tools/testing/selftests/bpf/test_progs.c
  \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv)
          if (err)
                  return err;

  +       *(int *)0xdeadbeef  = 42;
          err = cd_flavor_subdir(argv[0]);
          if (err)
                  return err;

  $ ./test_progs
  [0]: Caught signal #11!
  Stack trace:
  <backtrace not supported>
  Segmentation fault (core dumped)

Resolve this by hiding the stub definitions behind a __GLIBC__ macro check
instead of using the "weak" attribute.

Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc")
Signed-off-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Tested-by: Tony Ambardar <[email protected]>
Reviewed-by: Tony Ambardar <[email protected]>
Acked-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Paolo Pisati <[email protected]>