-
Notifications
You must be signed in to change notification settings - Fork 54.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added backlight support for some samsung laptops #11
Conversation
Models: * N120 * R468/R418 * X320/X420/X520 * R510/P510 * N350 * R470/R420 * R528/R728 * SQ1S
I'm not doing github pulls. The pull requests are seriously Please don't press the "pull request" github button. Do proper kernel
|
How are pullrequest seriously misdesigned (apart from that you might be used to a different kind of workflow)? |
I'm not doing linux kernel pulls. The kernel pulls are seriously Please don't press the "pull request" kernel button. Do proper github GitHub |
I honestly would like to know why github pull requests are misdesigned. I'll grant that I didn't actually create git but they seem to work just fine, is there something I am missing? |
Wow, great discussion went on there. Shacon raised perfectly valid points and Torvalds was basically "f this, I don't care, you're crazy". Great response! |
On Fri, Sep 9, 2011 at 12:49 AM, Nils Werner
Can you read? "If the merge message doesn't tell me who the merge is from and what If you can't understand that, then yes, you're crazy. Or just terminally stupid. The quality of github "issues" and comments really is very low. This
|
First, I agree with Scott: In many cases people delete their fork (or at least the branch). So where would the message point you to? The pull request of the pulling repository will much more likely be around for a long time. Also, what if the branch you'll pull from has changed in the meantime? You'd end up with changes that are not documented in the pull request and thus not reviewed by the ones discussing the pull request. As soon as the PR is posted you must put them out of reach of the author to keep them from sneaking in changes.
Also, you did notice that you've proven my point right there, right? |
On Fri, Sep 9, 2011 at 12:10 PM, Nils Werner
That's a "implementation problem". It's not an argument for doing crap. Simple solution: if people delete the branch or repository, consider You can make the "pull request" namespace separate from the branch git pull git://github.com/ and then if there i a previous pull request, add a number to it (so it Or something along those lines. The important part is that YOU MUST
We actually do this in the kernel on purpose sometimes - people fix up That said, again, you could do the same thing: if somebody changes a
Umm, considering that the pull requests used to have no documentation As soon as the PR is posted you must put them out of reach of the
Umm. I'm not polite. Big news. I'd rather be acerbic than stupid.
|
A decentralized system that doesn't accept disappearing nodes sounds more like a design problem.
Years after the branch has been merged? Is that a problem we wanted to solve?
I meant malicuous changes. Hierarchies are shallow, elite circles basically nonexistant so that's a real issue. And the biggest strength of GitHub.
Thats the first constructive comment to this discussion. And sounds like a good idea, apart from the problem that you'd lose the link to the PR wich, to many, is more useful than being able to immediately recognise the source. Also it would probably require lots of modifications to the deamon though.And very disciplined contributors (always make sure to use dead-end topic-branches, not everybody does that). Separating the two simply improves the workflow a lot. It'd be interesting what @schacon has to say about it.
When was that? Months ago? I am talking about your comment 2 days ago. |
A personal, unrelated note: Being unable to lead an objective discussion. Judging people, then insulting them just to prove a point. Recognising ones flaws but being unwilling to change them, instead bragging about them. Missing the ability to reflect on ones actions during interactions with others. That sounds pretty stupid to me. Anyways, I'm moving on. |
I thought we were talking about pull requests and branches? When did a branch become a node?
Except that, as indicated by Scott Chacon [0], the most common scenario is to perform the pull request locally on your machine, allowing you to pull the code and then review it without said code being changed before merging. I can understand your argument in relation to pull requests done using the button on the website though. |
* Ingo Molnar <[email protected]> wrote: > The patch below addresses these concerns, serializes the output, tidies up the > printout, resulting in this new output: There's one bug remaining that my patch does not address: the vCPUs are not printed in order: # vCPU #0's dump: # vCPU #2's dump: # vCPU torvalds#24's dump: # vCPU #5's dump: # vCPU torvalds#39's dump: # vCPU torvalds#38's dump: # vCPU torvalds#51's dump: # vCPU torvalds#11's dump: # vCPU torvalds#10's dump: # vCPU torvalds#12's dump: This is undesirable as the order of printout is highly random, so successive dumps are difficult to compare. The patch below serializes the signalling itself. (this is on top of the previous patch) The patch also tweaks the vCPU printout line a bit so that it does not start with '#', which is discarded if such messages are pasted into Git commit messages. Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 #10 <signal handler called> #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () #12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: [email protected]
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when upper layer driver, e.g., FCoE protocol stack is monitoring the netdev event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below: ... PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff88021f007760] machine_kexec at ffffffff810226d9 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d #2 [ffff88021f0078a0] oops_end at ffffffff813bca78 #3 [ffff88021f0078d0] no_context at ffffffff81029e72 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045 [exception RIP: sysfs_find_dirent+17] RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246 RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004 RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008 R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000 R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27 torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9 torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38 torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe] torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe] torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe] torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q] torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe] torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe] torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513 torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6 torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4 Signed-off-by: Yi Zou <[email protected]> Tested-by: Ross Brattain <[email protected]> Tested-by: Stephen Ko <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when upper layer driver, e.g., FCoE protocol stack is monitoring the netdev event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below: ... PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff88021f007760] machine_kexec at ffffffff810226d9 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d #2 [ffff88021f0078a0] oops_end at ffffffff813bca78 #3 [ffff88021f0078d0] no_context at ffffffff81029e72 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045 [exception RIP: sysfs_find_dirent+17] RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246 RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004 RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008 R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000 R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27 torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9 torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38 torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe] torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe] torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe] torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q] torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe] torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe] torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513 torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6 torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4 Signed-off-by: Yi Zou <[email protected]> Tested-by: Ross Brattain <[email protected]> Tested-by: Stephen Ko <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8de torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8de torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
fixed: WARNING: please, no space before tabs torvalds#11: FILE: adt7411.c:11: + * ^I use power-down mode for suspend?, interrupt handling?$ not fixed as all other macros around it are the same structure and this one is only 2 chars longer: WARNING: line over 80 characters torvalds#229: FILE: adt7411.c:229: +static ADT7411_BIT_ATTR(fast_sampling, ADT7411_REG_CFG3, ADT7411_CFG3_ADC_CLK_225); Signed-off-by: Frans Meulenbroeks <[email protected]> Signed-off-by: Guenter Roeck <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
BugLink: http://bugs.launchpad.net/bugs/907778 commit ea51d13 upstream. If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 torvalds#6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 torvalds#7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 torvalds#8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 torvalds#9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 torvalds#10 <signal handler called> torvalds#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () torvalds#12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Tim Gardner <[email protected]> Signed-off-by: Brad Figg <[email protected]>
…4_xdr_dec_open_noattr The `nfs4_xdr_dec_open()` function does not properly check the return status of the `ACCESS` operation. This oversight can result in out-of-bounds memory access when decoding NFSv4 compound requests. For instance, in an NFSv4 compound request `{5, PUTFH, OPEN, GETFH, ACCESS, GETATTR}`, if the `ACCESS` operation (step 4) returns an error, the function proceeds to decode the subsequent `GETATTR` operation (step 5) without validating the RPC buffer's state. This can cause an RPC buffer overflow, which leading to a system panic. This issue can be reliably reproduced by running multiple `fsstress` tests in the same directory exported by the Ganesha NFS server. This patch introduces proper error handling for the `ACCESS` operation in `nfs4_xdr_dec_open()` and `nfs4_xdr_dec_open_noattr()`. When an error is detected, the decoding process is terminated gracefully to prevent further buffer corruption and ensure system stability. torvalds#7 [ffffa42b17337bc0] page_fault at ffffffff906010fe [exception RIP: xdr_set_page_base+61] RIP: ffffffffc12166dd RSP: ffffa42b17337c78 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa42b17337db8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa42b17337db8 RBP: 0000000000000000 R8: ffff904948b0a650 R9: 0000000000000000 R10: 8080808080808080 R11: ffff904ac3c68be4 R12: 0000000000000009 R13: ffffa42b17337db8 R14: ffff904aa6aee000 R15: ffffffffc11f7f50 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#8 [ffffa42b17337c78] xdr_set_next_buffer at ffffffffc1217b0b [sunrpc] torvalds#9 [ffffa42b17337c90] xdr_inline_decode at ffffffffc1218259 [sunrpc] torvalds#10 [ffffa42b17337cb8] __decode_op_hdr at ffffffffc128d2c2 [nfsv4] torvalds#11 [ffffa42b17337cf0] decode_getfattr_generic.constprop.124 at ffffffffc12980a2 [nfsv4] torvalds#12 [ffffa42b17337d58] nfs4_xdr_dec_open at ffffffffc1298374 [nfsv4] torvalds#13 [ffffa42b17337db0] call_decode at ffffffffc11f8144 [sunrpc] torvalds#14 [ffffa42b17337e28] __rpc_execute at ffffffffc1206ad5 [sunrpc] torvalds#15 [ffffa42b17337e80] rpc_async_schedule at ffffffffc1206e39 [sunrpc] torvalds#16 [ffffa42b17337e98] process_one_work at ffffffff8fcfe397 torvalds#17 [ffffa42b17337ed8] worker_thread at ffffffff8fcfea60 torvalds#18 [ffffa42b17337f10] kthread at ffffffff8fd04406 torvalds#19 [ffffa42b17337f50] ret_from_fork at ffffffff9060023f Signed-off-by: changxin.liu <[email protected]>
[ Upstream commit 5bf1557 ] test_progs uses glibc specific functions backtrace() and backtrace_symbols_fd() to print backtrace in case of SIGSEGV. Recent commit (see fixes) updated test_progs.c to define stub versions of the same functions with attriubte "weak" in order to allow linking test_progs against musl libc. Unfortunately this broke the backtrace handling for glibc builds. As it turns out, glibc defines backtrace() and backtrace_symbols_fd() as weak: $ llvm-readelf --symbols /lib64/libc.so.6 \ | grep -P '( backtrace_symbols_fd| backtrace)$' 4910: 0000000000126b40 161 FUNC WEAK DEFAULT 16 backtrace 6843: 0000000000126f90 852 FUNC WEAK DEFAULT 16 backtrace_symbols_fd So does test_progs: $ llvm-readelf --symbols test_progs \ | grep -P '( backtrace_symbols_fd| backtrace)$' 2891: 00000000006ad190 15 FUNC WEAK DEFAULT 13 backtrace 11215: 00000000006ad1a0 41 FUNC WEAK DEFAULT 13 backtrace_symbols_fd In such situation dynamic linker is not obliged to favour glibc implementation over the one defined in test_progs. Compiling with the following simple modification to test_progs.c demonstrates the issue: $ git diff ... \--- a/tools/testing/selftests/bpf/test_progs.c \+++ b/tools/testing/selftests/bpf/test_progs.c \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv) if (err) return err; + *(int *)0xdeadbeef = 42; err = cd_flavor_subdir(argv[0]); if (err) return err; $ ./test_progs [0]: Caught signal torvalds#11! Stack trace: <backtrace not supported> Segmentation fault (core dumped) Resolve this by hiding stub definitions behind __GLIBC__ macro check instead of using "weak" attribute. Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc") Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Tony Ambardar <[email protected]> Reviewed-by: Tony Ambardar <[email protected]> Acked-by: Daniel Xu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
there is a global spinlock between reset and clk, if locked in reset, then print some debug information, maybe dead-lock when uart driver try to disable clk. Backtrace stopped: frame did not save the PC (gdb) thread 4 [Switching to thread 4 (Thread 4)] #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 22 ./arch/riscv/include/asm/vdso/processor.h: No such file or directory. (gdb) bt #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 #1 arch_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/asm-generic/spinlock.h:49 #2 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock.h:186 #3 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock_api_smp.h:111 #4 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at kernel/locking/spinlock.c:162 #5 0xffffffff80563416 in clk_enable_lock () at ./include/linux/spinlock.h:325 #6 0xffffffff805648de in clk_core_disable_lock (core=0xffffffd900512500) at drivers/clk/clk.c:1062 #7 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 #8 clk_disable (clk=0xffffffd9048b5100) at drivers/clk/clk.c:1079 torvalds#9 0xffffffff8059e5d4 in serial_pxa_console_write (co=<optimized out>, s=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", count=<optimized out>) at drivers/tty/serial/pxa_k1x.c:1724 torvalds#10 0xffffffff8004a34c in call_console_driver (dropped_text=0xffffffff81a68650 <dropped_text> "", len=69, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", con=0xffffffff81964c10 <serial_pxa_console>) at kernel/printk/printk.c:1942 torvalds#11 console_emit_next_record (con=con@entry=0xffffffff81964c10 <serial_pxa_console>, ext_text=<optimized out>, dropped_text=0xffffffff81a68650 <dropped_text> "", handover=0xffffffc80578baa7, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n") at kernel/printk/printk.c:2731 torvalds#12 0xffffffff8004a49a in console_flush_all (handover=0xffffffc80578baa7, next_seq=<synthetic pointer>, do_cond_resched=false) at kernel/printk/printk.c:2793 torvalds#13 console_unlock () at kernel/printk/printk.c:2860 torvalds#14 0xffffffff8004b388 in vprintk_emit (facility=facility@entry=0, level=<optimized out>, level@entry=-1, dev_info=dev_info@entry=0x0, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2268 torvalds#15 0xffffffff8004b3ae in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2279 torvalds#16 0xffffffff8004b646 in vprintk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n", args=args@entry=0xffffffc80578bbd8) at kernel/printk/printk_safe.c:50 torvalds#17 0xffffffff80a880d6 in _printk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n") at kernel/printk/printk.c:2289 torvalds#18 0xffffffff80a90bb6 in spacemit_reset_set (rcdev=rcdev@entry=0xffffffff81f563a8 <k1x_reset_controller+8>, id=id@entry=59, assert=assert@entry=true) at drivers/reset/reset-spacemit-k1x.c:373 torvalds#19 0xffffffff805823b6 in spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:401 torvalds#20 spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:387 torvalds#21 spacemit_reset_assert (rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>, id=59) at drivers/reset/reset-spacemit-k1x.c:413 torvalds#22 0xffffffff8058158e in reset_control_assert (rstc=0xffffffd902b2f280) at drivers/reset/core.c:485 torvalds#23 0xffffffff807ccf96 in cpp_disable_clocks (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:960 torvalds#24 0xffffffff807cd0b2 in cpp_release_hardware (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1038 torvalds#25 0xffffffff807cd990 in cpp_close_node (sd=<optimized out>, fh=<optimized out>) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1135 torvalds#26 0xffffffff8079525e in subdev_close (file=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-subdev.c:105 torvalds#27 0xffffffff8078e49e in v4l2_release (inode=<optimized out>, filp=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-dev.c:459 torvalds#28 0xffffffff80154974 in __fput (file=0xffffffd906645d00) at fs/file_table.c:320 torvalds#29 0xffffffff80154aa2 in ____fput (work=<optimized out>) at fs/file_table.c:348 torvalds#30 0xffffffff8002677e in task_work_run () at kernel/task_work.c:179 torvalds#31 0xffffffff800053b4 in resume_user_mode_work (regs=0xffffffc80578bee0) at ./include/linux/resume_user_mode.h:49 torvalds#32 do_work_pending (regs=0xffffffc80578bee0, thread_info_flags=<optimized out>) at arch/riscv/kernel/signal.c:478 torvalds#33 0xffffffff800039c6 in handle_exception () at arch/riscv/kernel/entry.S:374 Backtrace stopped: frame did not save the PC (gdb) thread 1 [Switching to thread 1 (Thread 1)] #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 49 ./include/asm-generic/spinlock.h: No such file or directory. (gdb) bt #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 #1 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock.h:186 #2 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock_api_smp.h:111 #3 _raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at kernel/locking/spinlock.c:162 #4 0xffffffff8056c4cc in ccu_mix_disable (hw=0xffffffff81956858 <sdh2_clk+120>) at ./include/linux/spinlock.h:325 #5 0xffffffff80564832 in clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1051 #6 clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1031 #7 0xffffffff805648e6 in clk_core_disable_lock (core=0xffffffd900529900) at drivers/clk/clk.c:1063 #8 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 torvalds#9 clk_disable (clk=clk@entry=0xffffffd904fafa80) at drivers/clk/clk.c:1079 torvalds#10 0xffffffff808bb898 in clk_disable_unprepare (clk=0xffffffd904fafa80) at ./include/linux/clk.h:1085 torvalds#11 0xffffffff808bb916 in spacemit_sdhci_runtime_suspend (dev=<optimized out>) at drivers/mmc/host/sdhci-of-k1x.c:1469 torvalds#12 0xffffffff8066e8e2 in pm_generic_runtime_suspend (dev=<optimized out>) at drivers/base/power/generic_ops.c:25 torvalds#13 0xffffffff80670398 in __rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:395 torvalds#14 0xffffffff806704b8 in rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:529 torvalds#15 0xffffffff80670bdc in rpm_suspend (dev=0xffffffd9018a2810, rpmflags=<optimized out>) at drivers/base/power/runtime.c:672 torvalds#16 0xffffffff806716de in pm_runtime_work (work=0xffffffd9018a2948) at drivers/base/power/runtime.c:974 torvalds#17 0xffffffff800236f4 in process_one_work (worker=worker@entry=0xffffffd9013ee9c0, work=0xffffffd9018a2948) at kernel/workqueue.c:2289 torvalds#18 0xffffffff80023ba6 in worker_thread (__worker=0xffffffd9013ee9c0) at kernel/workqueue.c:2436 torvalds#19 0xffffffff80028bb2 in kthread (_create=0xffffffd9017de840) at kernel/kthread.c:376 torvalds#20 0xffffffff80003934 in handle_exception () at arch/riscv/kernel/entry.S:249 Backtrace stopped: frame did not save the PC (gdb) Change-Id: Ia95b41ffd6c1893c9c5e9c1c9fc0c155ea902d2c
there is an invalid instrucation crash when run node.js: [ 443.219580] node[3123]: unhandled signal 4 code 0x1 at 0x00000038be663620 [ 443.226499] CPU: 5 PID: 3123 Comm: node Not tainted 6.6.36+ torvalds#11 [ 443.232501] Hardware name: spacemit k1-x deb1 board (DT) [ 443.237875] epc : 00000038be663620 ra : 00000038be652e00 sp : 0000003ff310a000 [ 443.245195] gp : 000000000447d6d0 tp : 0000003f82e2b780 t0 : 0000003e5c000000 [ 443.252501] t1 : 00000000000d31b8 t2 : 0000000000000063 s0 : 0000003ff310a050 [ 443.259815] s1 : 0000003ff3109fd0 a0 : 0000003c1e11ba29 a1 : 0000000000000004 [ 443.267121] a2 : 00000000000d31b8 a3 : 0000000000000003 a4 : 000000000019759e [ 443.274435] a5 : 0000000000000075 a6 : 000000000000006c a7 : 0000000000000065 [ 443.281749] s2 : 00000000010df958 s3 : 0000000000000001 s4 : 0000003e5c0d31b8 [ 443.289054] s5 : 00000000045442e0 s6 : 0000000004544260 s7 : 0000000ba8d91399 [ 443.296368] s8 : 0000000000000000 s9 : 00000038be650168 s10: 0000000ba8d9fa81 [ 443.303674] s11: 0000000000000000 t3 : 00000038be650198 t4 : 0000002200000000 [ 443.310980] t5 : 0000000000000008 t6 : 00000038be663620 [ 443.316352] status: 8000000200006020 badaddr: 0000000000800e13 cause: 0000000000000002 the op-code 0x00800e13 should be a valid instruction 'li t3, 0' the cause of the issue is that the i-cache data is wrong, when flush i-cahce request from user-space, icache of all cores related to the process should be flushed Change-Id: I0a06c77a2a3c1aa7aaf1e930eaa774d405e6fddb
there is a global spinlock between reset and clk, if locked in reset, then print some debug information, maybe dead-lock when uart driver try to disable clk. Backtrace stopped: frame did not save the PC (gdb) thread 4 [Switching to thread 4 (Thread 4)] #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 22 ./arch/riscv/include/asm/vdso/processor.h: No such file or directory. (gdb) bt #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 #1 arch_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/asm-generic/spinlock.h:49 #2 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock.h:186 #3 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock_api_smp.h:111 #4 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at kernel/locking/spinlock.c:162 #5 0xffffffff80563416 in clk_enable_lock () at ./include/linux/spinlock.h:325 #6 0xffffffff805648de in clk_core_disable_lock (core=0xffffffd900512500) at drivers/clk/clk.c:1062 #7 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 #8 clk_disable (clk=0xffffffd9048b5100) at drivers/clk/clk.c:1079 torvalds#9 0xffffffff8059e5d4 in serial_pxa_console_write (co=<optimized out>, s=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", count=<optimized out>) at drivers/tty/serial/pxa_k1x.c:1724 torvalds#10 0xffffffff8004a34c in call_console_driver (dropped_text=0xffffffff81a68650 <dropped_text> "", len=69, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", con=0xffffffff81964c10 <serial_pxa_console>) at kernel/printk/printk.c:1942 torvalds#11 console_emit_next_record (con=con@entry=0xffffffff81964c10 <serial_pxa_console>, ext_text=<optimized out>, dropped_text=0xffffffff81a68650 <dropped_text> "", handover=0xffffffc80578baa7, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n") at kernel/printk/printk.c:2731 torvalds#12 0xffffffff8004a49a in console_flush_all (handover=0xffffffc80578baa7, next_seq=<synthetic pointer>, do_cond_resched=false) at kernel/printk/printk.c:2793 torvalds#13 console_unlock () at kernel/printk/printk.c:2860 torvalds#14 0xffffffff8004b388 in vprintk_emit (facility=facility@entry=0, level=<optimized out>, level@entry=-1, dev_info=dev_info@entry=0x0, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2268 torvalds#15 0xffffffff8004b3ae in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2279 torvalds#16 0xffffffff8004b646 in vprintk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n", args=args@entry=0xffffffc80578bbd8) at kernel/printk/printk_safe.c:50 torvalds#17 0xffffffff80a880d6 in _printk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n") at kernel/printk/printk.c:2289 torvalds#18 0xffffffff80a90bb6 in spacemit_reset_set (rcdev=rcdev@entry=0xffffffff81f563a8 <k1x_reset_controller+8>, id=id@entry=59, assert=assert@entry=true) at drivers/reset/reset-spacemit-k1x.c:373 torvalds#19 0xffffffff805823b6 in spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:401 torvalds#20 spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:387 torvalds#21 spacemit_reset_assert (rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>, id=59) at drivers/reset/reset-spacemit-k1x.c:413 torvalds#22 0xffffffff8058158e in reset_control_assert (rstc=0xffffffd902b2f280) at drivers/reset/core.c:485 torvalds#23 0xffffffff807ccf96 in cpp_disable_clocks (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:960 torvalds#24 0xffffffff807cd0b2 in cpp_release_hardware (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1038 torvalds#25 0xffffffff807cd990 in cpp_close_node (sd=<optimized out>, fh=<optimized out>) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1135 torvalds#26 0xffffffff8079525e in subdev_close (file=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-subdev.c:105 torvalds#27 0xffffffff8078e49e in v4l2_release (inode=<optimized out>, filp=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-dev.c:459 torvalds#28 0xffffffff80154974 in __fput (file=0xffffffd906645d00) at fs/file_table.c:320 torvalds#29 0xffffffff80154aa2 in ____fput (work=<optimized out>) at fs/file_table.c:348 torvalds#30 0xffffffff8002677e in task_work_run () at kernel/task_work.c:179 torvalds#31 0xffffffff800053b4 in resume_user_mode_work (regs=0xffffffc80578bee0) at ./include/linux/resume_user_mode.h:49 torvalds#32 do_work_pending (regs=0xffffffc80578bee0, thread_info_flags=<optimized out>) at arch/riscv/kernel/signal.c:478 torvalds#33 0xffffffff800039c6 in handle_exception () at arch/riscv/kernel/entry.S:374 Backtrace stopped: frame did not save the PC (gdb) thread 1 [Switching to thread 1 (Thread 1)] #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 49 ./include/asm-generic/spinlock.h: No such file or directory. (gdb) bt #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 #1 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock.h:186 #2 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock_api_smp.h:111 #3 _raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at kernel/locking/spinlock.c:162 #4 0xffffffff8056c4cc in ccu_mix_disable (hw=0xffffffff81956858 <sdh2_clk+120>) at ./include/linux/spinlock.h:325 #5 0xffffffff80564832 in clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1051 #6 clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1031 #7 0xffffffff805648e6 in clk_core_disable_lock (core=0xffffffd900529900) at drivers/clk/clk.c:1063 #8 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 torvalds#9 clk_disable (clk=clk@entry=0xffffffd904fafa80) at drivers/clk/clk.c:1079 torvalds#10 0xffffffff808bb898 in clk_disable_unprepare (clk=0xffffffd904fafa80) at ./include/linux/clk.h:1085 torvalds#11 0xffffffff808bb916 in spacemit_sdhci_runtime_suspend (dev=<optimized out>) at drivers/mmc/host/sdhci-of-k1x.c:1469 torvalds#12 0xffffffff8066e8e2 in pm_generic_runtime_suspend (dev=<optimized out>) at drivers/base/power/generic_ops.c:25 torvalds#13 0xffffffff80670398 in __rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:395 torvalds#14 0xffffffff806704b8 in rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:529 torvalds#15 0xffffffff80670bdc in rpm_suspend (dev=0xffffffd9018a2810, rpmflags=<optimized out>) at drivers/base/power/runtime.c:672 torvalds#16 0xffffffff806716de in pm_runtime_work (work=0xffffffd9018a2948) at drivers/base/power/runtime.c:974 torvalds#17 0xffffffff800236f4 in process_one_work (worker=worker@entry=0xffffffd9013ee9c0, work=0xffffffd9018a2948) at kernel/workqueue.c:2289 torvalds#18 0xffffffff80023ba6 in worker_thread (__worker=0xffffffd9013ee9c0) at kernel/workqueue.c:2436 torvalds#19 0xffffffff80028bb2 in kthread (_create=0xffffffd9017de840) at kernel/kthread.c:376 torvalds#20 0xffffffff80003934 in handle_exception () at arch/riscv/kernel/entry.S:249 Backtrace stopped: frame did not save the PC (gdb) Change-Id: Ia95b41ffd6c1893c9c5e9c1c9fc0c155ea902d2c
there is an invalid instrucation crash when run node.js: [ 443.219580] node[3123]: unhandled signal 4 code 0x1 at 0x00000038be663620 [ 443.226499] CPU: 5 PID: 3123 Comm: node Not tainted 6.6.36+ torvalds#11 [ 443.232501] Hardware name: spacemit k1-x deb1 board (DT) [ 443.237875] epc : 00000038be663620 ra : 00000038be652e00 sp : 0000003ff310a000 [ 443.245195] gp : 000000000447d6d0 tp : 0000003f82e2b780 t0 : 0000003e5c000000 [ 443.252501] t1 : 00000000000d31b8 t2 : 0000000000000063 s0 : 0000003ff310a050 [ 443.259815] s1 : 0000003ff3109fd0 a0 : 0000003c1e11ba29 a1 : 0000000000000004 [ 443.267121] a2 : 00000000000d31b8 a3 : 0000000000000003 a4 : 000000000019759e [ 443.274435] a5 : 0000000000000075 a6 : 000000000000006c a7 : 0000000000000065 [ 443.281749] s2 : 00000000010df958 s3 : 0000000000000001 s4 : 0000003e5c0d31b8 [ 443.289054] s5 : 00000000045442e0 s6 : 0000000004544260 s7 : 0000000ba8d91399 [ 443.296368] s8 : 0000000000000000 s9 : 00000038be650168 s10: 0000000ba8d9fa81 [ 443.303674] s11: 0000000000000000 t3 : 00000038be650198 t4 : 0000002200000000 [ 443.310980] t5 : 0000000000000008 t6 : 00000038be663620 [ 443.316352] status: 8000000200006020 badaddr: 0000000000800e13 cause: 0000000000000002 the op-code 0x00800e13 should be a valid instruction 'li t3, 0' the cause of the issue is that the i-cache data is wrong, when flush i-cahce request from user-space, icache of all cores related to the process should be flushed Change-Id: I0a06c77a2a3c1aa7aaf1e930eaa774d405e6fddb
Petr Machata says: ==================== vxlan: Support user-defined reserved bits Currently the VXLAN header validation works by vxlan_rcv() going feature by feature, each feature clearing the bits that it consumes. If anything is left unparsed at the end, the packet is rejected. Unfortunately there are machines out there that send VXLAN packets with reserved bits set, even if they are configured to not use the corresponding features. One such report is here[1], and we have heard similar complaints from our customers as well. This patchset adds an attribute that makes it configurable which bits the user wishes to tolerate and which they consider reserved. This was recommended in [1] as well. A knob like that inevitably allows users to set as reserved bits that are in fact required for the features enabled by the netdevice, such as GPE. This is detected, and such configurations are rejected. In patches #1..torvalds#7, the reserved bits validation code is gradually moved away from the unparsed approach described above, to one where a given set of valid bits is precomputed and then the packet is validated against that. In patch torvalds#8, this precomputed set is made configurable through a new attribute IFLA_VXLAN_RESERVED_BITS. Patches torvalds#9 and torvalds#10 massage the testsuite a bit, so that patch torvalds#11 can introduce a selftest for the resreved bits feature. The corresponding iproute2 support is available in [2]. [1] https://lore.kernel.org/netdev/[email protected]/ [2] https://github.com/pmachata/iproute2/commits/vxlan_reserved_bits/ ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
[ 123.491737][ T1] Unexpected kernel BRK exception at EL1 [ 123.497593][ T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP [ 123.500785][ T1] Modules linked in: [ 123.502567][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty torvalds#11 [ 123.507468][ T1] Hardware name: linux,dummy-virt (DT) [ 123.509826][ T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 123.512609][ T1] pc : of_unittest_untrack_overlay+0x64/0x134 [ 123.515245][ T1] lr : of_unittest_untrack_overlay+0x64/0x134 [ 123.517848][ T1] sp : ffff00006a65fb30 [ 123.519668][ T1] x29: ffff00006a65fb30 x28: 0000000000000000 [ 123.522295][ T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00 [ 123.524937][ T1] x25: 0000000000000000 x24: 0000000000000000 [ 123.527592][ T1] x23: ffffa00014c72540 x22: ffffa00016b86000 [ 123.530191][ T1] x21: 0000000000000000 x20: 00000000ffffffff [ 123.532845][ T1] x19: 00000000ffffffff x18: 0000000000002690 [ 123.535547][ T1] x17: 0000000000002718 x16: 00000000000014b8 [ 123.538299][ T1] x15: 0000000000000001 x14: 0080000000000000 [ 123.541055][ T1] x13: 0000000000000002 x12: ffff94000298d209 [ 123.543801][ T1] x11: 1ffff4000298d208 x10: ffff94000298d208 [ 123.546580][ T1] x9 : dfffa00000000000 x8 : ffffa00014c69047 [ 123.549247][ T1] x7 : 0000000000000001 x6 : ffffa00014c69040 [ 123.552026][ T1] x5 : ffff00006a654040 x4 : 0000000000000000 [ 123.554799][ T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff [ 123.557541][ T1] x1 : ffff00006a654040 x0 : 0000000000000000 [ 123.560390][ T1] Call trace: [ 123.561935][ T1] of_unittest_untrack_overlay+0x64/0x134 [ 123.564469][ T1] of_unittest+0x2220/0x2438 [ 123.566585][ T1] do_one_initcall+0x470/0xa10 [ 123.568751][ T1] kernel_init_freeable+0x510/0x5f0 [ 123.571123][ T1] kernel_init+0x18/0x1e8 [ 123.573078][ T1] ret_from_fork+0x10/0x18 [ 123.575119][ T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00) [ 123.578138][ T1] ---[ end trace c4e049fb5e3b0ba0 ]--- [ 123.580449][ T1] Kernel panic - not syncing: Fatal exception [ 123.583116][ T1] Kernel Offset: disabled [ 123.585066][ T1] CPU features: 0x240002,20002004 [ 123.587259][ T1] Memory Limit: none [ 123.588986][ T1] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Anders Roxell <[email protected]>
Once we are inside the 'ext4_xattr_delete_inode' function and trying to delete the inode, the 'xattr_sem' should be unlocked. We need trylock here to avoid false-positive warning from lockdep about reclaim circular dependency. This fixes the following KASAN reported issue: ================================================================== BUG: KASAN: slab-use-after-free in ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 Read of size 4 at addr ffff888012c120c4 by task repro/2065 CPU: 1 UID: 0 PID: 2065 Comm: repro Not tainted 6.13.0-rc2+ torvalds#11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x1fd/0x300 ? tcp_gro_dev_warn+0x260/0x260 ? _printk+0xc0/0x100 ? read_lock_is_recursive+0x10/0x10 ? irq_work_queue+0x72/0xf0 ? __virt_addr_valid+0x17b/0x4b0 print_address_description+0x78/0x390 print_report+0x107/0x1f0 ? __virt_addr_valid+0x17b/0x4b0 ? __virt_addr_valid+0x3ff/0x4b0 ? __phys_addr+0xb5/0x160 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 kasan_report+0xcc/0x100 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 ? ext4_xattr_delete_inode+0xd30/0xd30 ? __ext4_journal_ensure_credits+0x5f0/0x5f0 ? __ext4_journal_ensure_credits+0x2b/0x5f0 ? inode_update_timestamps+0x410/0x410 ext4_xattr_delete_inode+0xb64/0xd30 ? ext4_truncate+0xb70/0xdc0 ? ext4_expand_extra_isize_ea+0x1d20/0x1d20 ? __ext4_mark_inode_dirty+0x670/0x670 ? ext4_journal_check_start+0x16f/0x240 ? ext4_inode_is_fast_symlink+0x2f2/0x3a0 ext4_evict_inode+0xc8c/0xff0 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0 ? do_raw_spin_unlock+0x53/0x8a0 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0 evict+0x4ac/0x950 ? proc_nr_inodes+0x310/0x310 ? trace_ext4_drop_inode+0xa2/0x220 ? _raw_spin_unlock+0x1a/0x30 ? iput+0x4cb/0x7e0 do_unlinkat+0x495/0x7c0 ? try_break_deleg+0x120/0x120 ? 0xffffffff81000000 ? __check_object_size+0x15a/0x210 ? strncpy_from_user+0x13e/0x250 ? getname_flags+0x1dc/0x530 __x64_sys_unlinkat+0xc8/0xf0 do_syscall_64+0x65/0x110 entry_SYSCALL_64_after_hwframe+0x67/0x6f RIP: 0033:0x434ffd Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8 RSP: 002b:00007ffc50fa7b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000107 RAX: ffffffffffffffda RBX: 00007ffc50fa7e18 RCX: 0000000000434ffd RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005 RBP: 00007ffc50fa7be0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007ffc50fa7e08 R14: 00000000004bbf30 R15: 0000000000000001 </TASK> The buggy address belongs to the object at ffff888012c12000 which belongs to the cache filp of size 360 The buggy address is located 196 bytes inside of freed 360-byte region [ffff888012c12000, ffff888012c12168) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12c12 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x40(head|node=0|zone=0) page_type: f5(slab) raw: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004 raw: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000 head: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004 head: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000 head: 0000000000000001 ffffea00004b0481 ffffffffffffffff 0000000000000000 head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888012c11f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888012c12000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888012c12080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888012c12100: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc ffff888012c12180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=b244bda78289b00204ed Signed-off-by: Bhupesh <[email protected]>
[ Upstream commit 5bf1557 ] test_progs uses glibc specific functions backtrace() and backtrace_symbols_fd() to print backtrace in case of SIGSEGV. Recent commit (see fixes) updated test_progs.c to define stub versions of the same functions with attriubte "weak" in order to allow linking test_progs against musl libc. Unfortunately this broke the backtrace handling for glibc builds. As it turns out, glibc defines backtrace() and backtrace_symbols_fd() as weak: $ llvm-readelf --symbols /lib64/libc.so.6 \ | grep -P '( backtrace_symbols_fd| backtrace)$' 4910: 0000000000126b40 161 FUNC WEAK DEFAULT 16 backtrace 6843: 0000000000126f90 852 FUNC WEAK DEFAULT 16 backtrace_symbols_fd So does test_progs: $ llvm-readelf --symbols test_progs \ | grep -P '( backtrace_symbols_fd| backtrace)$' 2891: 00000000006ad190 15 FUNC WEAK DEFAULT 13 backtrace 11215: 00000000006ad1a0 41 FUNC WEAK DEFAULT 13 backtrace_symbols_fd In such situation dynamic linker is not obliged to favour glibc implementation over the one defined in test_progs. Compiling with the following simple modification to test_progs.c demonstrates the issue: $ git diff ... \--- a/tools/testing/selftests/bpf/test_progs.c \+++ b/tools/testing/selftests/bpf/test_progs.c \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv) if (err) return err; + *(int *)0xdeadbeef = 42; err = cd_flavor_subdir(argv[0]); if (err) return err; $ ./test_progs [0]: Caught signal torvalds#11! Stack trace: <backtrace not supported> Segmentation fault (core dumped) Resolve this by hiding stub definitions behind __GLIBC__ macro check instead of using "weak" attribute. Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc") Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Tony Ambardar <[email protected]> Reviewed-by: Tony Ambardar <[email protected]> Acked-by: Daniel Xu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
[ 123.491737][ T1] Unexpected kernel BRK exception at EL1 [ 123.497593][ T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP [ 123.500785][ T1] Modules linked in: [ 123.502567][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty torvalds#11 [ 123.507468][ T1] Hardware name: linux,dummy-virt (DT) [ 123.509826][ T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 123.512609][ T1] pc : of_unittest_untrack_overlay+0x64/0x134 [ 123.515245][ T1] lr : of_unittest_untrack_overlay+0x64/0x134 [ 123.517848][ T1] sp : ffff00006a65fb30 [ 123.519668][ T1] x29: ffff00006a65fb30 x28: 0000000000000000 [ 123.522295][ T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00 [ 123.524937][ T1] x25: 0000000000000000 x24: 0000000000000000 [ 123.527592][ T1] x23: ffffa00014c72540 x22: ffffa00016b86000 [ 123.530191][ T1] x21: 0000000000000000 x20: 00000000ffffffff [ 123.532845][ T1] x19: 00000000ffffffff x18: 0000000000002690 [ 123.535547][ T1] x17: 0000000000002718 x16: 00000000000014b8 [ 123.538299][ T1] x15: 0000000000000001 x14: 0080000000000000 [ 123.541055][ T1] x13: 0000000000000002 x12: ffff94000298d209 [ 123.543801][ T1] x11: 1ffff4000298d208 x10: ffff94000298d208 [ 123.546580][ T1] x9 : dfffa00000000000 x8 : ffffa00014c69047 [ 123.549247][ T1] x7 : 0000000000000001 x6 : ffffa00014c69040 [ 123.552026][ T1] x5 : ffff00006a654040 x4 : 0000000000000000 [ 123.554799][ T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff [ 123.557541][ T1] x1 : ffff00006a654040 x0 : 0000000000000000 [ 123.560390][ T1] Call trace: [ 123.561935][ T1] of_unittest_untrack_overlay+0x64/0x134 [ 123.564469][ T1] of_unittest+0x2220/0x2438 [ 123.566585][ T1] do_one_initcall+0x470/0xa10 [ 123.568751][ T1] kernel_init_freeable+0x510/0x5f0 [ 123.571123][ T1] kernel_init+0x18/0x1e8 [ 123.573078][ T1] ret_from_fork+0x10/0x18 [ 123.575119][ T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00) [ 123.578138][ T1] ---[ end trace c4e049fb5e3b0ba0 ]--- [ 123.580449][ T1] Kernel panic - not syncing: Fatal exception [ 123.583116][ T1] Kernel Offset: disabled [ 123.585066][ T1] CPU features: 0x240002,20002004 [ 123.587259][ T1] Memory Limit: none [ 123.588986][ T1] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Anders Roxell <[email protected]>
[ Upstream commit 5bf1557 ] test_progs uses glibc specific functions backtrace() and backtrace_symbols_fd() to print backtrace in case of SIGSEGV. Recent commit (see fixes) updated test_progs.c to define stub versions of the same functions with attriubte "weak" in order to allow linking test_progs against musl libc. Unfortunately this broke the backtrace handling for glibc builds. As it turns out, glibc defines backtrace() and backtrace_symbols_fd() as weak: $ llvm-readelf --symbols /lib64/libc.so.6 \ | grep -P '( backtrace_symbols_fd| backtrace)$' 4910: 0000000000126b40 161 FUNC WEAK DEFAULT 16 backtrace 6843: 0000000000126f90 852 FUNC WEAK DEFAULT 16 backtrace_symbols_fd So does test_progs: $ llvm-readelf --symbols test_progs \ | grep -P '( backtrace_symbols_fd| backtrace)$' 2891: 00000000006ad190 15 FUNC WEAK DEFAULT 13 backtrace 11215: 00000000006ad1a0 41 FUNC WEAK DEFAULT 13 backtrace_symbols_fd In such situation dynamic linker is not obliged to favour glibc implementation over the one defined in test_progs. Compiling with the following simple modification to test_progs.c demonstrates the issue: $ git diff ... \--- a/tools/testing/selftests/bpf/test_progs.c \+++ b/tools/testing/selftests/bpf/test_progs.c \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv) if (err) return err; + *(int *)0xdeadbeef = 42; err = cd_flavor_subdir(argv[0]); if (err) return err; $ ./test_progs [0]: Caught signal torvalds#11! Stack trace: <backtrace not supported> Segmentation fault (core dumped) Resolve this by hiding stub definitions behind __GLIBC__ macro check instead of using "weak" attribute. Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc") Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Tony Ambardar <[email protected]> Reviewed-by: Tony Ambardar <[email protected]> Acked-by: Daniel Xu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 5bf1557 ] test_progs uses glibc specific functions backtrace() and backtrace_symbols_fd() to print backtrace in case of SIGSEGV. Recent commit (see fixes) updated test_progs.c to define stub versions of the same functions with attriubte "weak" in order to allow linking test_progs against musl libc. Unfortunately this broke the backtrace handling for glibc builds. As it turns out, glibc defines backtrace() and backtrace_symbols_fd() as weak: $ llvm-readelf --symbols /lib64/libc.so.6 \ | grep -P '( backtrace_symbols_fd| backtrace)$' 4910: 0000000000126b40 161 FUNC WEAK DEFAULT 16 backtrace 6843: 0000000000126f90 852 FUNC WEAK DEFAULT 16 backtrace_symbols_fd So does test_progs: $ llvm-readelf --symbols test_progs \ | grep -P '( backtrace_symbols_fd| backtrace)$' 2891: 00000000006ad190 15 FUNC WEAK DEFAULT 13 backtrace 11215: 00000000006ad1a0 41 FUNC WEAK DEFAULT 13 backtrace_symbols_fd In such situation dynamic linker is not obliged to favour glibc implementation over the one defined in test_progs. Compiling with the following simple modification to test_progs.c demonstrates the issue: $ git diff ... \--- a/tools/testing/selftests/bpf/test_progs.c \+++ b/tools/testing/selftests/bpf/test_progs.c \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv) if (err) return err; + *(int *)0xdeadbeef = 42; err = cd_flavor_subdir(argv[0]); if (err) return err; $ ./test_progs [0]: Caught signal torvalds#11! Stack trace: <backtrace not supported> Segmentation fault (core dumped) Resolve this by hiding stub definitions behind __GLIBC__ macro check instead of using "weak" attribute. Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc") Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Tony Ambardar <[email protected]> Reviewed-by: Tony Ambardar <[email protected]> Acked-by: Daniel Xu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
[ 123.491737][ T1] Unexpected kernel BRK exception at EL1 [ 123.497593][ T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP [ 123.500785][ T1] Modules linked in: [ 123.502567][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty torvalds#11 [ 123.507468][ T1] Hardware name: linux,dummy-virt (DT) [ 123.509826][ T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 123.512609][ T1] pc : of_unittest_untrack_overlay+0x64/0x134 [ 123.515245][ T1] lr : of_unittest_untrack_overlay+0x64/0x134 [ 123.517848][ T1] sp : ffff00006a65fb30 [ 123.519668][ T1] x29: ffff00006a65fb30 x28: 0000000000000000 [ 123.522295][ T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00 [ 123.524937][ T1] x25: 0000000000000000 x24: 0000000000000000 [ 123.527592][ T1] x23: ffffa00014c72540 x22: ffffa00016b86000 [ 123.530191][ T1] x21: 0000000000000000 x20: 00000000ffffffff [ 123.532845][ T1] x19: 00000000ffffffff x18: 0000000000002690 [ 123.535547][ T1] x17: 0000000000002718 x16: 00000000000014b8 [ 123.538299][ T1] x15: 0000000000000001 x14: 0080000000000000 [ 123.541055][ T1] x13: 0000000000000002 x12: ffff94000298d209 [ 123.543801][ T1] x11: 1ffff4000298d208 x10: ffff94000298d208 [ 123.546580][ T1] x9 : dfffa00000000000 x8 : ffffa00014c69047 [ 123.549247][ T1] x7 : 0000000000000001 x6 : ffffa00014c69040 [ 123.552026][ T1] x5 : ffff00006a654040 x4 : 0000000000000000 [ 123.554799][ T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff [ 123.557541][ T1] x1 : ffff00006a654040 x0 : 0000000000000000 [ 123.560390][ T1] Call trace: [ 123.561935][ T1] of_unittest_untrack_overlay+0x64/0x134 [ 123.564469][ T1] of_unittest+0x2220/0x2438 [ 123.566585][ T1] do_one_initcall+0x470/0xa10 [ 123.568751][ T1] kernel_init_freeable+0x510/0x5f0 [ 123.571123][ T1] kernel_init+0x18/0x1e8 [ 123.573078][ T1] ret_from_fork+0x10/0x18 [ 123.575119][ T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00) [ 123.578138][ T1] ---[ end trace c4e049fb5e3b0ba0 ]--- [ 123.580449][ T1] Kernel panic - not syncing: Fatal exception [ 123.583116][ T1] Kernel Offset: disabled [ 123.585066][ T1] CPU features: 0x240002,20002004 [ 123.587259][ T1] Memory Limit: none [ 123.588986][ T1] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Anders Roxell <[email protected]>
…atch-fixes WARNING: Possible repeated word: 'to' torvalds#11: set as null leaving it to to be accessed. Additionally, the read-only WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")' torvalds#21: Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas") WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report torvalds#23: Reported-by: [email protected] Tested-by: [email protected] ERROR: space required before the open brace '{' torvalds#47: FILE: fs/ocfs2/quota_global.c:896: + if (!sb_has_quota_active(sb, type)){ total: 1 errors, 3 warnings, 15 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Changwei Ge <[email protected]> Cc: Dennis Lam <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
[ 123.491737][ T1] Unexpected kernel BRK exception at EL1 [ 123.497593][ T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP [ 123.500785][ T1] Modules linked in: [ 123.502567][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty torvalds#11 [ 123.507468][ T1] Hardware name: linux,dummy-virt (DT) [ 123.509826][ T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 123.512609][ T1] pc : of_unittest_untrack_overlay+0x64/0x134 [ 123.515245][ T1] lr : of_unittest_untrack_overlay+0x64/0x134 [ 123.517848][ T1] sp : ffff00006a65fb30 [ 123.519668][ T1] x29: ffff00006a65fb30 x28: 0000000000000000 [ 123.522295][ T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00 [ 123.524937][ T1] x25: 0000000000000000 x24: 0000000000000000 [ 123.527592][ T1] x23: ffffa00014c72540 x22: ffffa00016b86000 [ 123.530191][ T1] x21: 0000000000000000 x20: 00000000ffffffff [ 123.532845][ T1] x19: 00000000ffffffff x18: 0000000000002690 [ 123.535547][ T1] x17: 0000000000002718 x16: 00000000000014b8 [ 123.538299][ T1] x15: 0000000000000001 x14: 0080000000000000 [ 123.541055][ T1] x13: 0000000000000002 x12: ffff94000298d209 [ 123.543801][ T1] x11: 1ffff4000298d208 x10: ffff94000298d208 [ 123.546580][ T1] x9 : dfffa00000000000 x8 : ffffa00014c69047 [ 123.549247][ T1] x7 : 0000000000000001 x6 : ffffa00014c69040 [ 123.552026][ T1] x5 : ffff00006a654040 x4 : 0000000000000000 [ 123.554799][ T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff [ 123.557541][ T1] x1 : ffff00006a654040 x0 : 0000000000000000 [ 123.560390][ T1] Call trace: [ 123.561935][ T1] of_unittest_untrack_overlay+0x64/0x134 [ 123.564469][ T1] of_unittest+0x2220/0x2438 [ 123.566585][ T1] do_one_initcall+0x470/0xa10 [ 123.568751][ T1] kernel_init_freeable+0x510/0x5f0 [ 123.571123][ T1] kernel_init+0x18/0x1e8 [ 123.573078][ T1] ret_from_fork+0x10/0x18 [ 123.575119][ T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00) [ 123.578138][ T1] ---[ end trace c4e049fb5e3b0ba0 ]--- [ 123.580449][ T1] Kernel panic - not syncing: Fatal exception [ 123.583116][ T1] Kernel Offset: disabled [ 123.585066][ T1] CPU features: 0x240002,20002004 [ 123.587259][ T1] Memory Limit: none [ 123.588986][ T1] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Anders Roxell <[email protected]>
…atch-fixes WARNING: Possible repeated word: 'to' torvalds#11: set as null leaving it to to be accessed. Additionally, the read-only WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")' torvalds#21: Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas") WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report torvalds#23: Reported-by: [email protected] Tested-by: [email protected] ERROR: space required before the open brace '{' torvalds#47: FILE: fs/ocfs2/quota_global.c:896: + if (!sb_has_quota_active(sb, type)){ total: 1 errors, 3 warnings, 15 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Changwei Ge <[email protected]> Cc: Dennis Lam <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
…atch-fixes WARNING: Possible repeated word: 'to' torvalds#11: set as null leaving it to to be accessed. Additionally, the read-only WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")' torvalds#21: Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas") WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report torvalds#23: Reported-by: [email protected] Tested-by: [email protected] ERROR: space required before the open brace '{' torvalds#47: FILE: fs/ocfs2/quota_global.c:896: + if (!sb_has_quota_active(sb, type)){ total: 1 errors, 3 warnings, 15 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Changwei Ge <[email protected]> Cc: Dennis Lam <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
…atch-fixes WARNING: Possible repeated word: 'to' torvalds#11: set as null leaving it to to be accessed. Additionally, the read-only WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")' torvalds#21: Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas") WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report torvalds#23: Reported-by: [email protected] Tested-by: [email protected] ERROR: space required before the open brace '{' torvalds#47: FILE: fs/ocfs2/quota_global.c:896: + if (!sb_has_quota_active(sb, type)){ total: 1 errors, 3 warnings, 15 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Changwei Ge <[email protected]> Cc: Dennis Lam <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
…atch-fixes WARNING: Possible repeated word: 'to' torvalds#11: set as null leaving it to to be accessed. Additionally, the read-only WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")' torvalds#21: Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas") WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report torvalds#23: Reported-by: [email protected] Tested-by: [email protected] ERROR: space required before the open brace '{' torvalds#47: FILE: fs/ocfs2/quota_global.c:896: + if (!sb_has_quota_active(sb, type)){ total: 1 errors, 3 warnings, 15 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Changwei Ge <[email protected]> Cc: Dennis Lam <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
…atch-fixes WARNING: Possible repeated word: 'to' torvalds#11: set as null leaving it to to be accessed. Additionally, the read-only WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")' torvalds#21: Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas") WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report torvalds#23: Reported-by: [email protected] Tested-by: [email protected] ERROR: space required before the open brace '{' torvalds#47: FILE: fs/ocfs2/quota_global.c:896: + if (!sb_has_quota_active(sb, type)){ total: 1 errors, 3 warnings, 15 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Changwei Ge <[email protected]> Cc: Dennis Lam <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
…atch-fixes WARNING: Possible repeated word: 'to' torvalds#11: set as null leaving it to to be accessed. Additionally, the read-only WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: fatal: not a ("nux-next'")' torvalds#21: Fixes: 8f9e8f5 ("ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas") WARNING: Reported-by: should be immediately followed by Closes: with a URL to the report torvalds#23: Reported-by: [email protected] Tested-by: [email protected] ERROR: space required before the open brace '{' torvalds#47: FILE: fs/ocfs2/quota_global.c:896: + if (!sb_has_quota_active(sb, type)){ total: 1 errors, 3 warnings, 15 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ocfs2-fix-slab-use-after-free-due-to-dangling-pointer-dqi_priv.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Changwei Ge <[email protected]> Cc: Dennis Lam <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jun Piao <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
During the update procedure, when overwrite element in a pre-allocated htab, the freeing of old_element is protected by the bucket lock. The reason why the bucket lock is necessary is that the old_element has already been stashed in htab->extra_elems after alloc_htab_elem() returns. If freeing the old_element after the bucket lock is unlocked, the stashed element may be reused by concurrent update procedure and the freeing of old_element will run concurrently with the reuse of the old_element. However, the invocation of check_and_free_fields() may acquire a spin-lock which violates the lockdep rule because its caller has already held a raw-spin-lock (bucket lock). The following warning will be reported when such race happens: BUG: scheduling while atomic: test_progs/676/0x00000003 3 locks held by test_progs/676: #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830 #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500 #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0 Modules linked in: bpf_testmod(O) Preemption disabled at: [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500 CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ torvalds#11 Tainted: [W]=WARN, [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)... Call Trace: <TASK> dump_stack_lvl+0x57/0x70 dump_stack+0x10/0x20 __schedule_bug+0x120/0x170 __schedule+0x300c/0x4800 schedule_rtlock+0x37/0x60 rtlock_slowlock_locked+0x6d9/0x54c0 rt_spin_lock+0x168/0x230 hrtimer_cancel_wait_running+0xe9/0x1b0 hrtimer_cancel+0x24/0x30 bpf_timer_delete_work+0x1d/0x40 bpf_timer_cancel_and_free+0x5e/0x80 bpf_obj_free_fields+0x262/0x4a0 check_and_free_fields+0x1d0/0x280 htab_map_update_elem+0x7fc/0x1500 bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43 bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e bpf_prog_test_run_syscall+0x322/0x830 __sys_bpf+0x135d/0x3ca0 __x64_sys_bpf+0x75/0xb0 x64_sys_call+0x1b5/0xa10 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ... </TASK> To fix the problem, the patch breaks the reuse and refill of per-cpu extra_elems into two independent part: reuse the per-cpu extra_elems with bucket lock being held and refill the old_element as per-cpu extra_elems after the bucket lock is unlocked. After the break, it is safe to free pre-allocated element after bucket lock is unlocked. Reported-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Hou Tao <[email protected]>
[ 123.491737][ T1] Unexpected kernel BRK exception at EL1 [ 123.497593][ T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP [ 123.500785][ T1] Modules linked in: [ 123.502567][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty torvalds#11 [ 123.507468][ T1] Hardware name: linux,dummy-virt (DT) [ 123.509826][ T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 123.512609][ T1] pc : of_unittest_untrack_overlay+0x64/0x134 [ 123.515245][ T1] lr : of_unittest_untrack_overlay+0x64/0x134 [ 123.517848][ T1] sp : ffff00006a65fb30 [ 123.519668][ T1] x29: ffff00006a65fb30 x28: 0000000000000000 [ 123.522295][ T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00 [ 123.524937][ T1] x25: 0000000000000000 x24: 0000000000000000 [ 123.527592][ T1] x23: ffffa00014c72540 x22: ffffa00016b86000 [ 123.530191][ T1] x21: 0000000000000000 x20: 00000000ffffffff [ 123.532845][ T1] x19: 00000000ffffffff x18: 0000000000002690 [ 123.535547][ T1] x17: 0000000000002718 x16: 00000000000014b8 [ 123.538299][ T1] x15: 0000000000000001 x14: 0080000000000000 [ 123.541055][ T1] x13: 0000000000000002 x12: ffff94000298d209 [ 123.543801][ T1] x11: 1ffff4000298d208 x10: ffff94000298d208 [ 123.546580][ T1] x9 : dfffa00000000000 x8 : ffffa00014c69047 [ 123.549247][ T1] x7 : 0000000000000001 x6 : ffffa00014c69040 [ 123.552026][ T1] x5 : ffff00006a654040 x4 : 0000000000000000 [ 123.554799][ T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff [ 123.557541][ T1] x1 : ffff00006a654040 x0 : 0000000000000000 [ 123.560390][ T1] Call trace: [ 123.561935][ T1] of_unittest_untrack_overlay+0x64/0x134 [ 123.564469][ T1] of_unittest+0x2220/0x2438 [ 123.566585][ T1] do_one_initcall+0x470/0xa10 [ 123.568751][ T1] kernel_init_freeable+0x510/0x5f0 [ 123.571123][ T1] kernel_init+0x18/0x1e8 [ 123.573078][ T1] ret_from_fork+0x10/0x18 [ 123.575119][ T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00) [ 123.578138][ T1] ---[ end trace c4e049fb5e3b0ba0 ]--- [ 123.580449][ T1] Kernel panic - not syncing: Fatal exception [ 123.583116][ T1] Kernel Offset: disabled [ 123.585066][ T1] CPU features: 0x240002,20002004 [ 123.587259][ T1] Memory Limit: none [ 123.588986][ T1] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Anders Roxell <[email protected]>
During the update procedure, when overwrite element in a pre-allocated htab, the freeing of old_element is protected by the bucket lock. The reason why the bucket lock is necessary is that the old_element has already been stashed in htab->extra_elems after alloc_htab_elem() returns. If freeing the old_element after the bucket lock is unlocked, the stashed element may be reused by concurrent update procedure and the freeing of old_element will run concurrently with the reuse of the old_element. However, the invocation of check_and_free_fields() may acquire a spin-lock which violates the lockdep rule because its caller has already held a raw-spin-lock (bucket lock). The following warning will be reported when such race happens: BUG: scheduling while atomic: test_progs/676/0x00000003 3 locks held by test_progs/676: #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830 #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500 #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0 Modules linked in: bpf_testmod(O) Preemption disabled at: [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500 CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ torvalds#11 Tainted: [W]=WARN, [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)... Call Trace: <TASK> dump_stack_lvl+0x57/0x70 dump_stack+0x10/0x20 __schedule_bug+0x120/0x170 __schedule+0x300c/0x4800 schedule_rtlock+0x37/0x60 rtlock_slowlock_locked+0x6d9/0x54c0 rt_spin_lock+0x168/0x230 hrtimer_cancel_wait_running+0xe9/0x1b0 hrtimer_cancel+0x24/0x30 bpf_timer_delete_work+0x1d/0x40 bpf_timer_cancel_and_free+0x5e/0x80 bpf_obj_free_fields+0x262/0x4a0 check_and_free_fields+0x1d0/0x280 htab_map_update_elem+0x7fc/0x1500 bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43 bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e bpf_prog_test_run_syscall+0x322/0x830 __sys_bpf+0x135d/0x3ca0 __x64_sys_bpf+0x75/0xb0 x64_sys_call+0x1b5/0xa10 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ... </TASK> It seems feasible to break the reuse and refill of per-cpu extra_elems into two independent parts: reuse the per-cpu extra_elems with bucket lock being held and refill the old_element as per-cpu extra_elems after the bucket lock is unlocked. However, it will make the concurrent overwrite procedures on the same CPU return unexpected -E2BIG error when the map is full. Therefore, the patch fixes the lock problem by breaking the cancelling of bpf_timer into two steps: 1) use hrtimer_try_to_cancel() and check its return value 2) if the timer is running, use hrtimer_cancel() through a kworker to cancel it again Considering that the current implementation of hrtimer_cancel() will try to spin on current CPU or acquire a being held softirq_expiry_lock when the current timer is running, these steps above are reasonable. However, it also has downside. When the timer is running, the cancelling of the timer is delayed when releasing the last map uref. The delay is also fixable (e.g., break the cancelling of bpf timer into two parts: one part in locked scope, another one in unlocked scope), so it can be revised later if necessary. It is a bit hard to decide the right fix tag. One reason is that the problem depends on PREEMPT_RT which is enabled in v6.12. Considering the softirq_expiry_lock lock exists since v5.4 and bpf_timer is introduced in v5.15, the bpf_timer commit is used in the fixes tag and an extra depends-on tag is added to state the dependency on PREEMPT_RT. Fixes: b00628b ("bpf: Introduce bpf timers.") Depends-on: v6.12 with PREEMPT_RT enabled Reported-by: Sebastian Andrzej Siewior <[email protected]> Closes: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Hou Tao <[email protected]>
[ 123.491737][ T1] Unexpected kernel BRK exception at EL1 [ 123.497593][ T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP [ 123.500785][ T1] Modules linked in: [ 123.502567][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty torvalds#11 [ 123.507468][ T1] Hardware name: linux,dummy-virt (DT) [ 123.509826][ T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 123.512609][ T1] pc : of_unittest_untrack_overlay+0x64/0x134 [ 123.515245][ T1] lr : of_unittest_untrack_overlay+0x64/0x134 [ 123.517848][ T1] sp : ffff00006a65fb30 [ 123.519668][ T1] x29: ffff00006a65fb30 x28: 0000000000000000 [ 123.522295][ T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00 [ 123.524937][ T1] x25: 0000000000000000 x24: 0000000000000000 [ 123.527592][ T1] x23: ffffa00014c72540 x22: ffffa00016b86000 [ 123.530191][ T1] x21: 0000000000000000 x20: 00000000ffffffff [ 123.532845][ T1] x19: 00000000ffffffff x18: 0000000000002690 [ 123.535547][ T1] x17: 0000000000002718 x16: 00000000000014b8 [ 123.538299][ T1] x15: 0000000000000001 x14: 0080000000000000 [ 123.541055][ T1] x13: 0000000000000002 x12: ffff94000298d209 [ 123.543801][ T1] x11: 1ffff4000298d208 x10: ffff94000298d208 [ 123.546580][ T1] x9 : dfffa00000000000 x8 : ffffa00014c69047 [ 123.549247][ T1] x7 : 0000000000000001 x6 : ffffa00014c69040 [ 123.552026][ T1] x5 : ffff00006a654040 x4 : 0000000000000000 [ 123.554799][ T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff [ 123.557541][ T1] x1 : ffff00006a654040 x0 : 0000000000000000 [ 123.560390][ T1] Call trace: [ 123.561935][ T1] of_unittest_untrack_overlay+0x64/0x134 [ 123.564469][ T1] of_unittest+0x2220/0x2438 [ 123.566585][ T1] do_one_initcall+0x470/0xa10 [ 123.568751][ T1] kernel_init_freeable+0x510/0x5f0 [ 123.571123][ T1] kernel_init+0x18/0x1e8 [ 123.573078][ T1] ret_from_fork+0x10/0x18 [ 123.575119][ T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00) [ 123.578138][ T1] ---[ end trace c4e049fb5e3b0ba0 ]--- [ 123.580449][ T1] Kernel panic - not syncing: Fatal exception [ 123.583116][ T1] Kernel Offset: disabled [ 123.585066][ T1] CPU features: 0x240002,20002004 [ 123.587259][ T1] Memory Limit: none [ 123.588986][ T1] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Anders Roxell <[email protected]>
Under PREEMPT_RT, it is not safe to use GPF_ATOMIC kmalloc when preemption or irq is disabled. The following warning is reported when running test_progs under PREEMPT_RT: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 675, name: test_progs preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 2 locks held by test_progs/675: #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830 #1: ffff8881f4ec40c8 ((&c->lock)){....}-{2:2}, at: ___slab_alloc+0xbc/0x1280 Preemption disabled at: [<ffffffff8175ae2b>] __bpf_async_init+0xbb/0xb10 CPU: 1 UID: 0 PID: 675 Comm: test_progs Tainted: G O 6.12.0+ torvalds#11 Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ... Call Trace: <TASK> dump_stack_lvl+0x57/0x70 dump_stack+0x10/0x20 __might_resched+0x337/0x4d0 rt_spin_lock+0xd4/0x230 ___slab_alloc+0xbc/0x1280 __slab_alloc.isra.0+0x5d/0xa0 __kmalloc_node_noprof+0xf7/0x4f0 bpf_map_kmalloc_node+0xf5/0x6b0 __bpf_async_init+0x20e/0xb10 bpf_timer_init+0x30/0x40 bpf_prog_c7e2dc9ff3d5ba62_start_cb+0x55/0x85 bpf_prog_4eb421be69ae82fa_start_timer+0x5d/0x7e bpf_prog_test_run_syscall+0x322/0x830 __sys_bpf+0x135d/0x3ca0 __x64_sys_bpf+0x75/0xb0 x64_sys_call+0x1b5/0xa10 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fix the problem by using bpf_global_ma to allocate bpf_async_cb when PREEMPT_RT is enabled. The reason for still using kmalloc for no-PREEMPT_RT case is that bpf_global_ma doesn't support accouting the allocated memory to specific memcg. Also doing the memory allocation before invoking __bpf_spin_lock_irqsave() to reduce the possibility of -ENOMEM for bpf_global_ma. Signed-off-by: Hou Tao <[email protected]>
[ 123.491737][ T1] Unexpected kernel BRK exception at EL1 [ 123.497593][ T1] Internal error: ptrace BRK handler: f20003e8 [#1] PREEMPT SMP [ 123.500785][ T1] Modules linked in: [ 123.502567][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.8.0-rc3-next-20200630-00003-g15e24419c239-dirty torvalds#11 [ 123.507468][ T1] Hardware name: linux,dummy-virt (DT) [ 123.509826][ T1] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 123.512609][ T1] pc : of_unittest_untrack_overlay+0x64/0x134 [ 123.515245][ T1] lr : of_unittest_untrack_overlay+0x64/0x134 [ 123.517848][ T1] sp : ffff00006a65fb30 [ 123.519668][ T1] x29: ffff00006a65fb30 x28: 0000000000000000 [ 123.522295][ T1] x27: ffff00006a65fc30 x26: ffffa00016b86f00 [ 123.524937][ T1] x25: 0000000000000000 x24: 0000000000000000 [ 123.527592][ T1] x23: ffffa00014c72540 x22: ffffa00016b86000 [ 123.530191][ T1] x21: 0000000000000000 x20: 00000000ffffffff [ 123.532845][ T1] x19: 00000000ffffffff x18: 0000000000002690 [ 123.535547][ T1] x17: 0000000000002718 x16: 00000000000014b8 [ 123.538299][ T1] x15: 0000000000000001 x14: 0080000000000000 [ 123.541055][ T1] x13: 0000000000000002 x12: ffff94000298d209 [ 123.543801][ T1] x11: 1ffff4000298d208 x10: ffff94000298d208 [ 123.546580][ T1] x9 : dfffa00000000000 x8 : ffffa00014c69047 [ 123.549247][ T1] x7 : 0000000000000001 x6 : ffffa00014c69040 [ 123.552026][ T1] x5 : ffff00006a654040 x4 : 0000000000000000 [ 123.554799][ T1] x3 : ffffa00011d59d04 x2 : 00000000ffffffff [ 123.557541][ T1] x1 : ffff00006a654040 x0 : 0000000000000000 [ 123.560390][ T1] Call trace: [ 123.561935][ T1] of_unittest_untrack_overlay+0x64/0x134 [ 123.564469][ T1] of_unittest+0x2220/0x2438 [ 123.566585][ T1] do_one_initcall+0x470/0xa10 [ 123.568751][ T1] kernel_init_freeable+0x510/0x5f0 [ 123.571123][ T1] kernel_init+0x18/0x1e8 [ 123.573078][ T1] ret_from_fork+0x10/0x18 [ 123.575119][ T1] Code: 97978a9c d4210000 14000024 97978a99 (d4207d00) [ 123.578138][ T1] ---[ end trace c4e049fb5e3b0ba0 ]--- [ 123.580449][ T1] Kernel panic - not syncing: Fatal exception [ 123.583116][ T1] Kernel Offset: disabled [ 123.585066][ T1] CPU features: 0x240002,20002004 [ 123.587259][ T1] Memory Limit: none [ 123.588986][ T1] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Anders Roxell <[email protected]>
During the update procedure, when overwrite element in a pre-allocated htab, the freeing of old_element is protected by the bucket lock. The reason why the bucket lock is necessary is that the old_element has already been stashed in htab->extra_elems after alloc_htab_elem() returns. If freeing the old_element after the bucket lock is unlocked, the stashed element may be reused by concurrent update procedure and the freeing of old_element will run concurrently with the reuse of the old_element. However, the invocation of check_and_free_fields() may acquire a spin-lock which violates the lockdep rule because its caller has already held a raw-spin-lock (bucket lock). The following warning will be reported when such race happens: BUG: scheduling while atomic: test_progs/676/0x00000003 3 locks held by test_progs/676: #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830 #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500 #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0 Modules linked in: bpf_testmod(O) Preemption disabled at: [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500 CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ torvalds#11 Tainted: [W]=WARN, [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)... Call Trace: <TASK> dump_stack_lvl+0x57/0x70 dump_stack+0x10/0x20 __schedule_bug+0x120/0x170 __schedule+0x300c/0x4800 schedule_rtlock+0x37/0x60 rtlock_slowlock_locked+0x6d9/0x54c0 rt_spin_lock+0x168/0x230 hrtimer_cancel_wait_running+0xe9/0x1b0 hrtimer_cancel+0x24/0x30 bpf_timer_delete_work+0x1d/0x40 bpf_timer_cancel_and_free+0x5e/0x80 bpf_obj_free_fields+0x262/0x4a0 check_and_free_fields+0x1d0/0x280 htab_map_update_elem+0x7fc/0x1500 bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43 bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e bpf_prog_test_run_syscall+0x322/0x830 __sys_bpf+0x135d/0x3ca0 __x64_sys_bpf+0x75/0xb0 x64_sys_call+0x1b5/0xa10 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ... </TASK> It seems feasible to break the reuse and refill of per-cpu extra_elems into two independent parts: reuse the per-cpu extra_elems with bucket lock being held and refill the old_element as per-cpu extra_elems after the bucket lock is unlocked. However, it will make the concurrent overwrite procedures on the same CPU return unexpected -E2BIG error when the map is full. Therefore, the patch fixes the lock problem by breaking the cancelling of bpf_timer into two steps for PREEMPT_RT: 1) use hrtimer_try_to_cancel() and check its return value 2) if the timer is running, use hrtimer_cancel() through a kworker to cancel it again Considering that the current implementation of hrtimer_cancel() will try to acquire a being held softirq_expiry_lock when the current timer is running, these steps above are reasonable. However, it also has downside. When the timer is running, the cancelling of the timer is delayed when releasing the last map uref. The delay is also fixable (e.g., break the cancelling of bpf timer into two parts: one part in locked scope, another one in unlocked scope), it can be revised later if necessary. It is a bit hard to decide the right fix tag. One reason is that the problem depends on PREEMPT_RT which is enabled in v6.12. Considering the softirq_expiry_lock lock exists since v5.4 and bpf_timer is introduced in v5.15, the bpf_timer commit is used in the fixes tag and an extra depends-on tag is added to state the dependency on PREEMPT_RT. Fixes: b00628b ("bpf: Introduce bpf timers.") Depends-on: v6.12+ with PREEMPT_RT enabled Reported-by: Sebastian Andrzej Siewior <[email protected]> Closes: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Hou Tao <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2091625 [ Upstream commit 5bf1557 ] test_progs uses glibc specific functions backtrace() and backtrace_symbols_fd() to print backtrace in case of SIGSEGV. Recent commit (see fixes) updated test_progs.c to define stub versions of the same functions with attriubte "weak" in order to allow linking test_progs against musl libc. Unfortunately this broke the backtrace handling for glibc builds. As it turns out, glibc defines backtrace() and backtrace_symbols_fd() as weak: $ llvm-readelf --symbols /lib64/libc.so.6 \ | grep -P '( backtrace_symbols_fd| backtrace)$' 4910: 0000000000126b40 161 FUNC WEAK DEFAULT 16 backtrace 6843: 0000000000126f90 852 FUNC WEAK DEFAULT 16 backtrace_symbols_fd So does test_progs: $ llvm-readelf --symbols test_progs \ | grep -P '( backtrace_symbols_fd| backtrace)$' 2891: 00000000006ad190 15 FUNC WEAK DEFAULT 13 backtrace 11215: 00000000006ad1a0 41 FUNC WEAK DEFAULT 13 backtrace_symbols_fd In such situation dynamic linker is not obliged to favour glibc implementation over the one defined in test_progs. Compiling with the following simple modification to test_progs.c demonstrates the issue: $ git diff ... \--- a/tools/testing/selftests/bpf/test_progs.c \+++ b/tools/testing/selftests/bpf/test_progs.c \@@ -1817,6 +1817,7 @@ int main(int argc, char **argv) if (err) return err; + *(int *)0xdeadbeef = 42; err = cd_flavor_subdir(argv[0]); if (err) return err; $ ./test_progs [0]: Caught signal torvalds#11! Stack trace: <backtrace not supported> Segmentation fault (core dumped) Resolve this by hiding stub definitions behind __GLIBC__ macro check instead of using "weak" attribute. Fixes: c9a83e7 ("selftests/bpf: Fix compile if backtrace support missing in libc") Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Tony Ambardar <[email protected]> Reviewed-by: Tony Ambardar <[email protected]> Acked-by: Daniel Xu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Paolo Pisati <[email protected]>
Models: