-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Ref
to the sync
module.
#141
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual quick review on the docs only.
As usual, it is a pleasure to review things when they are well-documented on the first go :-)
8e008b9
to
b9d8c65
Compare
@wedsonaf would it be a good idea to add sample usage of |
rust/kernel/sync/arc.rs
Outdated
let ptr = NonNull::from(boxed.deref()); | ||
Box::into_raw(boxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be NonNull::new(boxed.into_raw())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NonNull::new
may fail (if its argument is null
), in which case it returns None
. NonNull::from
OTOH never fails because its argument cannot be null
-- by doing it the way I did I have one less failure path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think this maybe makes more sense as mem::forget(boxed)
then? We're not actually using the into_raw()
ptr, which momentarily threw me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, it is possible that some future version of Box::into_raw
will return something else other than boxed.deref()
, so I think it's better to use your original suggestion, even if it implies an extra failure path (as it should never happen, it's possible the optimiser remove it from the final binary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, look at this: rust-lang/rust#47336
So the way to go is NonNull::from(Box::leak(boxed))
-- no failure path and still compatible with Box::from_raw
!
/// [`Ref::try_new`]. | ||
pub unsafe trait RefCounted: Sized { | ||
/// Returns a pointer to the object field holds the reference count. | ||
fn get_count(&self) -> &RefCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since everything just accesses .count
, would it make more sense to have this ->&AtomicUsize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an opaque RefCount
is better for a couple of reasons:
- We can easily add more fields in the future if we choose to. For example, we could have a debug mode where we link all the references so that we know at any given time what all the references are. We could also add some instrumentation/perf numbers like a high water mark for the number of references, etc.
- In an earlier version of the code,
RefCount
didn't have any constructors: the only way to get one [outside the module] was throughRef::try_new
, which took a closure as argument that had aRefCount
as argument and returnedT
(which would then be wrapped inRef
). This was my attempt at enforcing that all instances ofT
be wrapped inRef
. It didn't really guarantee it because one could just 'exfiltrate' theRefCount
instance and use it to build an unsafeT
(it could never return because it wouldn't be able to construct theT
that it needs to return, but one could exfiltrateT
to another thread and sleep forever). This is a convoluted way to build an unsafe program, but it is a way nonetheless, so I had to makeRefCounted
an unsafe trait. Anyway, I still hope we'll find a way to implement this safely and makeRefCounted
safe; not allowingRefCount
to be arbitrarily constructed may be part of the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wedsonaf would it be a good idea to add sample usage of
Ref
to one of the example rust character devices?
Yes, it would. I will add one in my next PR as it allows file instances to share some state through their device registration. I'll have implementations of read
and write
there that use this.
(Everything I'm pushing is used in a driver I'm writing. I'm actually teasing apart the driver and generic pieces and trying to commit them here before I push the initial version of the driver. Anyway, the reason I mention this is that people will be able to see real usage then as well.)
rust/kernel/sync/arc.rs
Outdated
let ptr = NonNull::from(boxed.deref()); | ||
Box::into_raw(boxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NonNull::new
may fail (if its argument is null
), in which case it returns None
. NonNull::from
OTOH never fails because its argument cannot be null
-- by doing it the way I did I have one less failure path.
/// [`Ref::try_new`]. | ||
pub unsafe trait RefCounted: Sized { | ||
/// Returns a pointer to the object field holds the reference count. | ||
fn get_count(&self) -> &RefCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an opaque RefCount
is better for a couple of reasons:
- We can easily add more fields in the future if we choose to. For example, we could have a debug mode where we link all the references so that we know at any given time what all the references are. We could also add some instrumentation/perf numbers like a high water mark for the number of references, etc.
- In an earlier version of the code,
RefCount
didn't have any constructors: the only way to get one [outside the module] was throughRef::try_new
, which took a closure as argument that had aRefCount
as argument and returnedT
(which would then be wrapped inRef
). This was my attempt at enforcing that all instances ofT
be wrapped inRef
. It didn't really guarantee it because one could just 'exfiltrate' theRefCount
instance and use it to build an unsafeT
(it could never return because it wouldn't be able to construct theT
that it needs to return, but one could exfiltrateT
to another thread and sleep forever). This is a convoluted way to build an unsafe program, but it is a way nonetheless, so I had to makeRefCounted
an unsafe trait. Anyway, I still hope we'll find a way to implement this safely and makeRefCounted
safe; not allowingRefCount
to be arbitrarily constructed may be part of the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add comments for every Ordering
usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment, otherwise this LGTM.
rust/kernel/sync/arc.rs
Outdated
let ptr = NonNull::from(boxed.deref()); | ||
Box::into_raw(boxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think this maybe makes more sense as mem::forget(boxed)
then? We're not actually using the into_raw()
ptr, which momentarily threw me.
Inject fault while probing kunit-example-test.ko, if kzalloc fails in kunit_parse_glob_filter(), strcpy() or strncpy() to NULL will cause below null-ptr-deref bug. So check NULL for kzalloc() and return int instead of void for kunit_parse_glob_filter(). Unable to handle kernel paging request at virtual address dfff800000000000 KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [dfff800000000000] address between user and kernel address ranges Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: kunit_example_test cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: kunit_example_test] CPU: 4 PID: 6047 Comm: modprobe Tainted: G W N 6.5.0-next-20230829+ #141 Hardware name: linux,dummy-virt (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : strncpy+0x58/0xc0 lr : kunit_filter_suites+0x15c/0xa84 sp : ffff800082a17420 x29: ffff800082a17420 x28: 0000000000000000 x27: 0000000000000004 x26: 0000000000000000 x25: ffffa847e40a5320 x24: 0000000000000001 x23: 0000000000000000 x22: 0000000000000001 x21: dfff800000000000 x20: 000000000000002a x19: 0000000000000000 x18: 00000000750b3b54 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 34393178302f3039 x12: ffff7508fcea4ec1 x11: 1ffff508fcea4ec0 x10: ffff7508fcea4ec0 x9 : dfff800000000000 x8 : ffff6051b1a7f86a x7 : ffff800082a17270 x6 : 0000000000000002 x5 : 0000000000000098 x4 : ffff028d9817b250 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffffa847e40a5320 x0 : 0000000000000000 Call trace: strncpy+0x58/0xc0 kunit_filter_suites+0x15c/0xa84 kunit_module_notify+0x1b0/0x3ac blocking_notifier_call_chain+0xc4/0x128 do_init_module+0x250/0x594 load_module+0x37b0/0x44b4 init_module_from_file+0xd4/0x128 idempotent_init_module+0x2c8/0x524 __arm64_sys_finit_module+0xac/0x100 invoke_syscall+0x6c/0x258 el0_svc_common.constprop.0+0x160/0x22c do_el0_svc+0x44/0x5c el0_svc+0x38/0x78 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x194 Code: 5400028a d343fe63 12000a62 39400034 (38f56863) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs Kernel Offset: 0x284761400000 from 0xffff800080000000 PHYS_OFFSET: 0xfffffd7380000000 CPU features: 0x88000203,3c020000,1000421b Memory Limit: none Rebooting in 1 seconds.. Fixes: a127b15 ("kunit: tool: allow filtering test cases via glob") Signed-off-by: Jinjie Ruan <[email protected]> Reviewed-by: Rae Moar <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
The last reference for `cache_head` can be reduced to zero in `c_show` and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently, `svc_export_put` and `expkey_put` will be invoked, leading to two issues: 1. The `svc_export_put` will directly free ex_uuid. However, `e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can trigger a use-after-free issue, shown below. ================================================================== BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd] Read of size 1 at addr ff11000010fdc120 by task cat/870 CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x2c/0x3a0 print_report+0xb9/0x280 kasan_report+0xae/0xe0 svc_export_show+0x362/0x430 [nfsd] c_show+0x161/0x390 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 830: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc_node_track_caller_noprof+0x1bc/0x400 kmemdup_noprof+0x22/0x50 svc_export_parse+0x8a9/0xb80 [nfsd] cache_do_downcall+0x71/0xa0 [sunrpc] cache_write_procfs+0x8e/0xd0 [sunrpc] proc_reg_write+0xe1/0x140 vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 868: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kfree+0xf3/0x3e0 svc_export_put+0x87/0xb0 [nfsd] cache_purge+0x17f/0x1f0 [sunrpc] nfsd_destroy_serv+0x226/0x2d0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e 2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`. However, `svc_export_put`/`expkey_put` will call path_put, which subsequently triggers a sleeping operation due to the following `dput`. ============================= WARNING: suspicious RCU usage 5.10.0-dirty #141 Not tainted ----------------------------- ... Call Trace: dump_stack+0x9a/0xd0 ___might_sleep+0x231/0x240 dput+0x39/0x600 path_put+0x1b/0x30 svc_export_put+0x17/0x80 e_show+0x1c9/0x200 seq_read_iter+0x63f/0x7c0 seq_read+0x226/0x2d0 vfs_read+0x113/0x2c0 ksys_read+0xc9/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Fix these issues by using `rcu_work` to help release `svc_expkey`/`svc_export`. This approach allows for an asynchronous context to invoke `path_put` and also facilitates the freeing of `uuid/exp/key` after an RCU grace period. Fixes: 9ceddd9 ("knfsd: Allow lockless lookups of the exports") Signed-off-by: Yang Erkun <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
[ Upstream commit 1037d18 ] With commit 53c98e3 ("openrisc: mm: remove unneeded early ioremap code") it was commented that early ioremap was not used in OpenRISC. I acked this but was wrong, earlycon was using it. Earlycon setup now fails with the below trace: Kernel command line: earlycon ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at mm/ioremap.c:23 generic_ioremap_prot+0x118/0x130 Modules linked in: CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.11.0-rc5-00001-gce02fd891c38-dirty Rust-for-Linux#141 Call trace: [<(ptrval)>] dump_stack_lvl+0x7c/0x9c [<(ptrval)>] dump_stack+0x1c/0x2c [<(ptrval)>] __warn+0xb4/0x108 [<(ptrval)>] ? generic_ioremap_prot+0x118/0x130 [<(ptrval)>] warn_slowpath_fmt+0x60/0x98 [<(ptrval)>] generic_ioremap_prot+0x118/0x130 [<(ptrval)>] ioremap_prot+0x20/0x30 [<(ptrval)>] of_setup_earlycon+0xd4/0x2e0 [<(ptrval)>] early_init_dt_scan_chosen_stdout+0x18c/0x1c8 [<(ptrval)>] param_setup_earlycon+0x3c/0x60 [<(ptrval)>] do_early_param+0xb0/0x118 [<(ptrval)>] parse_args+0x184/0x4b8 [<(ptrval)>] ? start_kernel+0x0/0x78c [<(ptrval)>] parse_early_options+0x40/0x50 [<(ptrval)>] ? do_early_param+0x0/0x118 [<(ptrval)>] parse_early_param+0x48/0x68 [<(ptrval)>] ? start_kernel+0x318/0x78c [<(ptrval)>] ? start_kernel+0x0/0x78c ---[ end trace 0000000000000000 ]--- To fix this we could either implement early_ioremap again or implement fixmap. In this patch we choose the later option of implementing basic fixmap support. While fixing this we also remove the old FIX_IOREMAP slots that were used by early ioremap code. That code was also removed by commit 53c98e3 ("openrisc: mm: remove unneeded early ioremap code") but these definitions were not cleaned up. Fixes: 53c98e3 ("openrisc: mm: remove unneeded early ioremap code") Signed-off-by: Stafford Horne <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit f8c989a ] The last reference for `cache_head` can be reduced to zero in `c_show` and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently, `svc_export_put` and `expkey_put` will be invoked, leading to two issues: 1. The `svc_export_put` will directly free ex_uuid. However, `e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can trigger a use-after-free issue, shown below. ================================================================== BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd] Read of size 1 at addr ff11000010fdc120 by task cat/870 CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x2c/0x3a0 print_report+0xb9/0x280 kasan_report+0xae/0xe0 svc_export_show+0x362/0x430 [nfsd] c_show+0x161/0x390 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 830: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc_node_track_caller_noprof+0x1bc/0x400 kmemdup_noprof+0x22/0x50 svc_export_parse+0x8a9/0xb80 [nfsd] cache_do_downcall+0x71/0xa0 [sunrpc] cache_write_procfs+0x8e/0xd0 [sunrpc] proc_reg_write+0xe1/0x140 vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 868: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kfree+0xf3/0x3e0 svc_export_put+0x87/0xb0 [nfsd] cache_purge+0x17f/0x1f0 [sunrpc] nfsd_destroy_serv+0x226/0x2d0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e 2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`. However, `svc_export_put`/`expkey_put` will call path_put, which subsequently triggers a sleeping operation due to the following `dput`. ============================= WARNING: suspicious RCU usage 5.10.0-dirty Rust-for-Linux#141 Not tainted ----------------------------- ... Call Trace: dump_stack+0x9a/0xd0 ___might_sleep+0x231/0x240 dput+0x39/0x600 path_put+0x1b/0x30 svc_export_put+0x17/0x80 e_show+0x1c9/0x200 seq_read_iter+0x63f/0x7c0 seq_read+0x226/0x2d0 vfs_read+0x113/0x2c0 ksys_read+0xc9/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Fix these issues by using `rcu_work` to help release `svc_expkey`/`svc_export`. This approach allows for an asynchronous context to invoke `path_put` and also facilitates the freeing of `uuid/exp/key` after an RCU grace period. Fixes: 9ceddd9 ("knfsd: Allow lockless lookups of the exports") Signed-off-by: Yang Erkun <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
No description provided.