Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ref to the sync module. #141

Merged
merged 2 commits into from
Mar 30, 2021
Merged

Add Ref to the sync module. #141

merged 2 commits into from
Mar 30, 2021

Conversation

wedsonaf
Copy link

No description provided.

Copy link
Member

@ojeda ojeda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual quick review on the docs only.

As usual, it is a pleasure to review things when they are well-documented on the first go :-)

rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
@wedsonaf wedsonaf force-pushed the sync-arc branch 2 times, most recently from 8e008b9 to b9d8c65 Compare March 23, 2021 17:48
@jabedude
Copy link

@wedsonaf would it be a good idea to add sample usage of Ref to one of the example rust character devices?

rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
Comment on lines 42 to 55
let ptr = NonNull::from(boxed.deref());
Box::into_raw(boxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be NonNull::new(boxed.into_raw())?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NonNull::new may fail (if its argument is null), in which case it returns None. NonNull::from OTOH never fails because its argument cannot be null -- by doing it the way I did I have one less failure path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think this maybe makes more sense as mem::forget(boxed) then? We're not actually using the into_raw() ptr, which momentarily threw me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, it is possible that some future version of Box::into_raw will return something else other than boxed.deref(), so I think it's better to use your original suggestion, even if it implies an extra failure path (as it should never happen, it's possible the optimiser remove it from the final binary).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, look at this: rust-lang/rust#47336

So the way to go is NonNull::from(Box::leak(boxed)) -- no failure path and still compatible with Box::from_raw!

/// [`Ref::try_new`].
pub unsafe trait RefCounted: Sized {
/// Returns a pointer to the object field holds the reference count.
fn get_count(&self) -> &RefCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since everything just accesses .count, would it make more sense to have this ->&AtomicUsize?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an opaque RefCount is better for a couple of reasons:

  1. We can easily add more fields in the future if we choose to. For example, we could have a debug mode where we link all the references so that we know at any given time what all the references are. We could also add some instrumentation/perf numbers like a high water mark for the number of references, etc.
  2. In an earlier version of the code, RefCount didn't have any constructors: the only way to get one [outside the module] was through Ref::try_new, which took a closure as argument that had a RefCount as argument and returned T (which would then be wrapped in Ref). This was my attempt at enforcing that all instances of T be wrapped in Ref. It didn't really guarantee it because one could just 'exfiltrate' the RefCount instance and use it to build an unsafe T (it could never return because it wouldn't be able to construct the T that it needs to return, but one could exfiltrate T to another thread and sleep forever). This is a convoluted way to build an unsafe program, but it is a way nonetheless, so I had to make RefCounted an unsafe trait. Anyway, I still hope we'll find a way to implement this safely and make RefCounted safe; not allowing RefCount to be arbitrarily constructed may be part of the solution.

Copy link
Author

@wedsonaf wedsonaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wedsonaf would it be a good idea to add sample usage of Ref to one of the example rust character devices?

Yes, it would. I will add one in my next PR as it allows file instances to share some state through their device registration. I'll have implementations of read and write there that use this.

(Everything I'm pushing is used in a driver I'm writing. I'm actually teasing apart the driver and generic pieces and trying to commit them here before I push the initial version of the driver. Anyway, the reason I mention this is that people will be able to see real usage then as well.)

Comment on lines 42 to 55
let ptr = NonNull::from(boxed.deref());
Box::into_raw(boxed);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NonNull::new may fail (if its argument is null), in which case it returns None. NonNull::from OTOH never fails because its argument cannot be null -- by doing it the way I did I have one less failure path.

/// [`Ref::try_new`].
pub unsafe trait RefCounted: Sized {
/// Returns a pointer to the object field holds the reference count.
fn get_count(&self) -> &RefCount;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an opaque RefCount is better for a couple of reasons:

  1. We can easily add more fields in the future if we choose to. For example, we could have a debug mode where we link all the references so that we know at any given time what all the references are. We could also add some instrumentation/perf numbers like a high water mark for the number of references, etc.
  2. In an earlier version of the code, RefCount didn't have any constructors: the only way to get one [outside the module] was through Ref::try_new, which took a closure as argument that had a RefCount as argument and returned T (which would then be wrapped in Ref). This was my attempt at enforcing that all instances of T be wrapped in Ref. It didn't really guarantee it because one could just 'exfiltrate' the RefCount instance and use it to build an unsafe T (it could never return because it wouldn't be able to construct the T that it needs to return, but one could exfiltrate T to another thread and sleep forever). This is a convoluted way to build an unsafe program, but it is a way nonetheless, so I had to make RefCounted an unsafe trait. Anyway, I still hope we'll find a way to implement this safely and make RefCounted safe; not allowing RefCount to be arbitrarily constructed may be part of the solution.

rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
rust/kernel/sync/arc.rs Outdated Show resolved Hide resolved
rust/kernel/sync/arc.rs Show resolved Hide resolved
Copy link
Member

@fbq fbq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add comments for every Ordering usage.

rust/kernel/sync/arc.rs Show resolved Hide resolved
Copy link
Member

@alex alex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, otherwise this LGTM.

Comment on lines 42 to 55
let ptr = NonNull::from(boxed.deref());
Box::into_raw(boxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think this maybe makes more sense as mem::forget(boxed) then? We're not actually using the into_raw() ptr, which momentarily threw me.

@alex alex merged commit adddec4 into Rust-for-Linux:rust Mar 30, 2021
@wedsonaf wedsonaf deleted the sync-arc branch March 30, 2021 23:07
fbq pushed a commit that referenced this pull request Sep 25, 2023
Inject fault while probing kunit-example-test.ko, if kzalloc fails
in kunit_parse_glob_filter(), strcpy() or strncpy() to NULL will
cause below null-ptr-deref bug. So check NULL for kzalloc() and
return int instead of void for kunit_parse_glob_filter().

 Unable to handle kernel paging request at virtual address dfff800000000000
 KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
 Mem abort info:
   ESR = 0x0000000096000005
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x05: level 1 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 [dfff800000000000] address between user and kernel address ranges
 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
 Modules linked in: kunit_example_test cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: kunit_example_test]
 CPU: 4 PID: 6047 Comm: modprobe Tainted: G        W        N 6.5.0-next-20230829+ #141
 Hardware name: linux,dummy-virt (DT)
 pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : strncpy+0x58/0xc0
 lr : kunit_filter_suites+0x15c/0xa84
 sp : ffff800082a17420
 x29: ffff800082a17420 x28: 0000000000000000 x27: 0000000000000004
 x26: 0000000000000000 x25: ffffa847e40a5320 x24: 0000000000000001
 x23: 0000000000000000 x22: 0000000000000001 x21: dfff800000000000
 x20: 000000000000002a x19: 0000000000000000 x18: 00000000750b3b54
 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
 x14: 0000000000000000 x13: 34393178302f3039 x12: ffff7508fcea4ec1
 x11: 1ffff508fcea4ec0 x10: ffff7508fcea4ec0 x9 : dfff800000000000
 x8 : ffff6051b1a7f86a x7 : ffff800082a17270 x6 : 0000000000000002
 x5 : 0000000000000098 x4 : ffff028d9817b250 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : ffffa847e40a5320 x0 : 0000000000000000
 Call trace:
  strncpy+0x58/0xc0
  kunit_filter_suites+0x15c/0xa84
  kunit_module_notify+0x1b0/0x3ac
  blocking_notifier_call_chain+0xc4/0x128
  do_init_module+0x250/0x594
  load_module+0x37b0/0x44b4
  init_module_from_file+0xd4/0x128
  idempotent_init_module+0x2c8/0x524
  __arm64_sys_finit_module+0xac/0x100
  invoke_syscall+0x6c/0x258
  el0_svc_common.constprop.0+0x160/0x22c
  do_el0_svc+0x44/0x5c
  el0_svc+0x38/0x78
  el0t_64_sync_handler+0x13c/0x158
  el0t_64_sync+0x190/0x194
 Code: 5400028a d343fe63 12000a62 39400034 (38f56863)
 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Kernel Offset: 0x284761400000 from 0xffff800080000000
 PHYS_OFFSET: 0xfffffd7380000000
 CPU features: 0x88000203,3c020000,1000421b
 Memory Limit: none
 Rebooting in 1 seconds..

Fixes: a127b15 ("kunit: tool: allow filtering test cases via glob")
Signed-off-by: Jinjie Ruan <[email protected]>
Reviewed-by: Rae Moar <[email protected]>
Reviewed-by: David Gow <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
ojeda pushed a commit that referenced this pull request Nov 27, 2024
The last reference for `cache_head` can be reduced to zero in `c_show`
and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently,
`svc_export_put` and `expkey_put` will be invoked, leading to two
issues:

1. The `svc_export_put` will directly free ex_uuid. However,
   `e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can
   trigger a use-after-free issue, shown below.

   ==================================================================
   BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd]
   Read of size 1 at addr ff11000010fdc120 by task cat/870

   CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
   1.16.1-2.fc37 04/01/2014
   Call Trace:
    <TASK>
    dump_stack_lvl+0x53/0x70
    print_address_description.constprop.0+0x2c/0x3a0
    print_report+0xb9/0x280
    kasan_report+0xae/0xe0
    svc_export_show+0x362/0x430 [nfsd]
    c_show+0x161/0x390 [sunrpc]
    seq_read_iter+0x589/0x770
    seq_read+0x1e5/0x270
    proc_reg_read+0xe1/0x140
    vfs_read+0x125/0x530
    ksys_read+0xc1/0x160
    do_syscall_64+0x5f/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

   Allocated by task 830:
    kasan_save_stack+0x20/0x40
    kasan_save_track+0x14/0x30
    __kasan_kmalloc+0x8f/0xa0
    __kmalloc_node_track_caller_noprof+0x1bc/0x400
    kmemdup_noprof+0x22/0x50
    svc_export_parse+0x8a9/0xb80 [nfsd]
    cache_do_downcall+0x71/0xa0 [sunrpc]
    cache_write_procfs+0x8e/0xd0 [sunrpc]
    proc_reg_write+0xe1/0x140
    vfs_write+0x1a5/0x6d0
    ksys_write+0xc1/0x160
    do_syscall_64+0x5f/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

   Freed by task 868:
    kasan_save_stack+0x20/0x40
    kasan_save_track+0x14/0x30
    kasan_save_free_info+0x3b/0x60
    __kasan_slab_free+0x37/0x50
    kfree+0xf3/0x3e0
    svc_export_put+0x87/0xb0 [nfsd]
    cache_purge+0x17f/0x1f0 [sunrpc]
    nfsd_destroy_serv+0x226/0x2d0 [nfsd]
    nfsd_svc+0x125/0x1e0 [nfsd]
    write_threads+0x16a/0x2a0 [nfsd]
    nfsctl_transaction_write+0x74/0xa0 [nfsd]
    vfs_write+0x1a5/0x6d0
    ksys_write+0xc1/0x160
    do_syscall_64+0x5f/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`.
   However, `svc_export_put`/`expkey_put` will call path_put, which
   subsequently triggers a sleeping operation due to the following
   `dput`.

   =============================
   WARNING: suspicious RCU usage
   5.10.0-dirty #141 Not tainted
   -----------------------------
   ...
   Call Trace:
   dump_stack+0x9a/0xd0
   ___might_sleep+0x231/0x240
   dput+0x39/0x600
   path_put+0x1b/0x30
   svc_export_put+0x17/0x80
   e_show+0x1c9/0x200
   seq_read_iter+0x63f/0x7c0
   seq_read+0x226/0x2d0
   vfs_read+0x113/0x2c0
   ksys_read+0xc9/0x170
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x67/0xd1

Fix these issues by using `rcu_work` to help release
`svc_expkey`/`svc_export`. This approach allows for an asynchronous
context to invoke `path_put` and also facilitates the freeing of
`uuid/exp/key` after an RCU grace period.

Fixes: 9ceddd9 ("knfsd: Allow lockless lookups of the exports")
Signed-off-by: Yang Erkun <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
metaspace pushed a commit to metaspace/linux that referenced this pull request Dec 17, 2024
[ Upstream commit 1037d18 ]

With commit 53c98e3 ("openrisc: mm: remove unneeded early ioremap
code") it was commented that early ioremap was not used in OpenRISC.  I
acked this but was wrong, earlycon was using it.  Earlycon setup now
fails with the below trace:

    Kernel command line: earlycon
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 0 at mm/ioremap.c:23
    generic_ioremap_prot+0x118/0x130
    Modules linked in:
    CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted
    6.11.0-rc5-00001-gce02fd891c38-dirty Rust-for-Linux#141
    Call trace:
    [<(ptrval)>] dump_stack_lvl+0x7c/0x9c
    [<(ptrval)>] dump_stack+0x1c/0x2c
    [<(ptrval)>] __warn+0xb4/0x108
    [<(ptrval)>] ? generic_ioremap_prot+0x118/0x130
    [<(ptrval)>] warn_slowpath_fmt+0x60/0x98
    [<(ptrval)>] generic_ioremap_prot+0x118/0x130
    [<(ptrval)>] ioremap_prot+0x20/0x30
    [<(ptrval)>] of_setup_earlycon+0xd4/0x2e0
    [<(ptrval)>] early_init_dt_scan_chosen_stdout+0x18c/0x1c8
    [<(ptrval)>] param_setup_earlycon+0x3c/0x60
    [<(ptrval)>] do_early_param+0xb0/0x118
    [<(ptrval)>] parse_args+0x184/0x4b8
    [<(ptrval)>] ? start_kernel+0x0/0x78c
    [<(ptrval)>] parse_early_options+0x40/0x50
    [<(ptrval)>] ? do_early_param+0x0/0x118
    [<(ptrval)>] parse_early_param+0x48/0x68
    [<(ptrval)>] ? start_kernel+0x318/0x78c
    [<(ptrval)>] ? start_kernel+0x0/0x78c
    ---[ end trace 0000000000000000 ]---

To fix this we could either implement early_ioremap again or implement
fixmap.  In this patch we choose the later option of implementing basic
fixmap support.

While fixing this we also remove the old FIX_IOREMAP slots that were
used by early ioremap code.  That code was also removed by commit
53c98e3 ("openrisc: mm: remove unneeded early ioremap code") but
these definitions were not cleaned up.

Fixes: 53c98e3 ("openrisc: mm: remove unneeded early ioremap code")
Signed-off-by: Stafford Horne <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
metaspace pushed a commit to metaspace/linux that referenced this pull request Dec 17, 2024
[ Upstream commit f8c989a ]

The last reference for `cache_head` can be reduced to zero in `c_show`
and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently,
`svc_export_put` and `expkey_put` will be invoked, leading to two
issues:

1. The `svc_export_put` will directly free ex_uuid. However,
   `e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can
   trigger a use-after-free issue, shown below.

   ==================================================================
   BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd]
   Read of size 1 at addr ff11000010fdc120 by task cat/870

   CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
   1.16.1-2.fc37 04/01/2014
   Call Trace:
    <TASK>
    dump_stack_lvl+0x53/0x70
    print_address_description.constprop.0+0x2c/0x3a0
    print_report+0xb9/0x280
    kasan_report+0xae/0xe0
    svc_export_show+0x362/0x430 [nfsd]
    c_show+0x161/0x390 [sunrpc]
    seq_read_iter+0x589/0x770
    seq_read+0x1e5/0x270
    proc_reg_read+0xe1/0x140
    vfs_read+0x125/0x530
    ksys_read+0xc1/0x160
    do_syscall_64+0x5f/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

   Allocated by task 830:
    kasan_save_stack+0x20/0x40
    kasan_save_track+0x14/0x30
    __kasan_kmalloc+0x8f/0xa0
    __kmalloc_node_track_caller_noprof+0x1bc/0x400
    kmemdup_noprof+0x22/0x50
    svc_export_parse+0x8a9/0xb80 [nfsd]
    cache_do_downcall+0x71/0xa0 [sunrpc]
    cache_write_procfs+0x8e/0xd0 [sunrpc]
    proc_reg_write+0xe1/0x140
    vfs_write+0x1a5/0x6d0
    ksys_write+0xc1/0x160
    do_syscall_64+0x5f/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

   Freed by task 868:
    kasan_save_stack+0x20/0x40
    kasan_save_track+0x14/0x30
    kasan_save_free_info+0x3b/0x60
    __kasan_slab_free+0x37/0x50
    kfree+0xf3/0x3e0
    svc_export_put+0x87/0xb0 [nfsd]
    cache_purge+0x17f/0x1f0 [sunrpc]
    nfsd_destroy_serv+0x226/0x2d0 [nfsd]
    nfsd_svc+0x125/0x1e0 [nfsd]
    write_threads+0x16a/0x2a0 [nfsd]
    nfsctl_transaction_write+0x74/0xa0 [nfsd]
    vfs_write+0x1a5/0x6d0
    ksys_write+0xc1/0x160
    do_syscall_64+0x5f/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`.
   However, `svc_export_put`/`expkey_put` will call path_put, which
   subsequently triggers a sleeping operation due to the following
   `dput`.

   =============================
   WARNING: suspicious RCU usage
   5.10.0-dirty Rust-for-Linux#141 Not tainted
   -----------------------------
   ...
   Call Trace:
   dump_stack+0x9a/0xd0
   ___might_sleep+0x231/0x240
   dput+0x39/0x600
   path_put+0x1b/0x30
   svc_export_put+0x17/0x80
   e_show+0x1c9/0x200
   seq_read_iter+0x63f/0x7c0
   seq_read+0x226/0x2d0
   vfs_read+0x113/0x2c0
   ksys_read+0xc9/0x170
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x67/0xd1

Fix these issues by using `rcu_work` to help release
`svc_expkey`/`svc_export`. This approach allows for an asynchronous
context to invoke `path_put` and also facilitates the freeing of
`uuid/exp/key` after an RCU grace period.

Fixes: 9ceddd9 ("knfsd: Allow lockless lookups of the exports")
Signed-off-by: Yang Erkun <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants