
Mutex #41

Closed
wants to merge 1 commit into from

Conversation

@kloenk (Member) commented Nov 29, 2020

Create a Mutex and pr_* macros

@kloenk (Member Author) commented Nov 29, 2020

Draft for now, so I can first use it somewhere.

@kloenk (Member Author) commented Nov 29, 2020

Used it in the example as an example; ready from my side.

@kloenk marked this pull request as ready for review on November 29, 2020 23:15
@alex (Member) commented Nov 29, 2020

It's a 3p library, but it may make sense to consider using https://docs.rs/lock_api/0.4.2/lock_api/
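For illustration, a rough sketch of what the lock_api route could look like. This is not from the PR: `RawSpin` below is a userspace stand-in (an `AtomicBool` spin loop) for the kernel primitive, a kernel version would call the generated bindings instead, and the example assumes `lock_api = "0.4"` as a Cargo dependency. Note that `RawMutex` requires a `const INIT`, which is awkward for kernel mutexes that need runtime initialisation.

```rust
// Sketch only: `RawSpin` is a userspace stand-in (AtomicBool spin loop);
// a kernel version would call the generated mutex bindings instead.
// Assumes `lock_api = "0.4"` in Cargo.toml.
use std::sync::atomic::{AtomicBool, Ordering};

pub struct RawSpin(AtomicBool);

unsafe impl lock_api::RawMutex for RawSpin {
    // `const INIT` is what makes this awkward for kernel mutexes that
    // need runtime initialisation via mutex_init().
    const INIT: Self = RawSpin(AtomicBool::new(false));
    type GuardMarker = lock_api::GuardSend;

    fn lock(&self) {
        while self.0.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
    }

    fn try_lock(&self) -> bool {
        !self.0.swap(true, Ordering::Acquire)
    }

    unsafe fn unlock(&self) {
        self.0.store(false, Ordering::Release);
    }
}

// lock_api then derives Mutex, MutexGuard, etc. on top of the raw lock.
pub type Mutex<T> = lock_api::Mutex<RawSpin, T>;

fn main() {
    let m = Mutex::new(1u32);
    *m.lock() += 1;
    assert_eq!(*m.lock(), 2);
}
```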

@kloenk (Member Author) commented Nov 29, 2020

> It's a 3p library, but it may make sense to consider using https://docs.rs/lock_api/0.4.2/lock_api/

I would like to keep the dependencies of the kernel to a bare minimum, so I would rather implement it myself instead of using this crate.

@kloenk force-pushed the lock branch 2 times, most recently from d0d1dba to e4b7ba2, on November 30, 2020 00:07
@kloenk changed the title from "Mutex and pr_*" to "Mutex" on Nov 30, 2020
@kloenk (Member Author) commented Nov 30, 2020

Renamed to only Mutex, as I think the example is enough to test it, and this keeps the change smaller.

@kloenk (Member Author) commented Nov 30, 2020

Something broke now. I should sleep; will fix tomorrow.

}

impl<T: ?Sized> Mutex<T> {
/// acquire a lock on the mutex
Member:

Nitpicks: uppercase on the first letter, end with a period `.`.

/// acquire a lock on the mutex
/// # unsafe
/// This is unsafe, as it returns a second lock if one is already help by the current process
// with CONFIG_DEBUG_LOCK_ALLOW mutex_lock is a macro, which calls mutex_lock_nested(&mutex, 0)
Member:

Nitpicks: missing `/`, typo help -> held, use Markdown syntax, i.e. `CONFIG_DEBUG_LOCK_ALLOW` and `mutex_lock_nested(&mutex, 0)`, and end with a period `.`. Also "task" instead of "process".

impl<T: ?Sized> Mutex<T> {
/// acquire a lock on the mutex
/// # unsafe
/// This is unsafe, as it returns a second lock if one is already help by the current process
Member:

Are there any kernel APIs we can use to make this safe? It'd be a shame if all mutex usage required unsafe.

@ojeda (Member) commented Nov 30, 2020

I guess we could test whether the mutex is already locked and panic in that case (similar to what Rust's std guarantees: a deadlock or a panic, but never returning), but that would pretty much be redoing the kernel mutex debugging infrastructure, so I am not sure how it would be received. Even if we go that route, we should have both APIs.

Member:

^ "panic in that case" -> "deadlock in that case" -- considering we are the kernel, it is way better to deadlock a task (with perhaps a warning or even an oops prior to the deadlock itself) than kill the entire system.


Hmm... I wonder, could we do this statically for a subset of the use cases? Say, two types, Mutex and MutexLocked, which move themselves into each other on every operation (and forget themselves to avoid calling unnecessary drops, which would require a test). Then we could have code like:

let m = m.lock();
...
let m = m.unlock();

Things like trylock could return an enum of either locked/unlocked, there could be a m.take(|...| { ... }), etc.

Of course, this doesn't work if one needs to keep the mutex stored in different states, but in some use cases this could be enough.

What do you think?
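For illustration, a minimal userspace sketch of the typestate idea above: Mutex<T> and MutexLocked<T> consume each other on lock()/unlock(), so locking an already-moved handle or unlocking twice becomes a compile error. The locking itself is stubbed out; in the kernel the methods would call the real lock/unlock primitives, and the real version would also have to forget the old state to avoid drop glue on the raw kernel object, as mentioned above. All names are illustrative.

```rust
// Sketch only: no real locking happens here; in the kernel, lock() and
// unlock() would call the real mutex primitives before returning.
pub struct Mutex<T> {
    data: T,
}

pub struct MutexLocked<T> {
    data: T,
}

impl<T> Mutex<T> {
    pub fn new(data: T) -> Self {
        Mutex { data }
    }

    /// Consumes the unlocked handle and returns the locked one.
    pub fn lock(self) -> MutexLocked<T> {
        MutexLocked { data: self.data }
    }
}

impl<T> MutexLocked<T> {
    /// The data is only reachable while the lock is held.
    pub fn get_mut(&mut self) -> &mut T {
        &mut self.data
    }

    /// Consumes the locked handle and returns the unlocked one.
    pub fn unlock(self) -> Mutex<T> {
        Mutex { data: self.data }
    }
}

fn main() {
    let m = Mutex::new(0u32);
    let mut locked = m.lock(); // `m` is moved: it cannot be locked again
    *locked.get_mut() += 1;
    let _m = locked.unlock();  // `locked` is moved: it cannot be unlocked twice
}
```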

Member Author (kloenk):

Not a fan. With a MutexGuard you only need a reference to the mutex. Your idea needs an owned type, which cannot live in a struct or similar.
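For comparison, a self-contained sketch of the guard-based shape described here: lock() takes &self and returns a MutexGuard that releases the lock in Drop, so the mutex can live inside a struct and be locked through a shared reference. The AtomicBool spin is only a userspace stand-in for the kernel mutex calls, not the PR's implementation.

```rust
// Sketch only: the AtomicBool spin stands in for the kernel lock/unlock
// calls; the point is the guard shape, not the locking algorithm.
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

pub struct Mutex<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: access to `data` is serialised by the lock.
unsafe impl<T: Send> Sync for Mutex<T> {}

pub struct MutexGuard<'a, T> {
    mutex: &'a Mutex<T>,
}

impl<T> Mutex<T> {
    pub fn new(data: T) -> Self {
        Mutex { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }

    /// Only needs `&self`, so the mutex can live inside another struct.
    pub fn lock(&self) -> MutexGuard<'_, T> {
        while self.locked.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
        MutexGuard { mutex: self }
    }
}

impl<T> Deref for MutexGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: the guard's existence proves the lock is held.
        unsafe { &*self.mutex.data.get() }
    }
}

impl<T> DerefMut for MutexGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.mutex.data.get() }
    }
}

impl<T> Drop for MutexGuard<'_, T> {
    fn drop(&mut self) {
        // Unlock exactly once, when the guard goes out of scope.
        self.mutex.locked.store(false, Ordering::Release);
    }
}

fn main() {
    let m = Mutex::new(0u32);
    {
        let mut g = m.lock();
        *g += 1;
    } // guard dropped here -> unlocked
    assert_eq!(*m.lock(), 1);
}
```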

Reviewer:

I must be missing something here. Given that mutexes are not recursive, how is it that lock() can return a second MutexGuard when the mutex is already locked (by whomever)?

Member Author (kloenk):

Currently it's unsafe because, if you call it twice from the same function, it calls mutex_unlock twice, which results in a kernel panic.

Reviewer:

I tested this and it just deadlocked the task, perhaps we have different debugging configurations. In any case, whether it deadlocks or panics, why does it make this unsafe?

Mutex from the std library is not unsafe and behaves similarly:

> The exact behavior on locking a mutex in the thread which already holds the lock is left unspecified. However, this function will not return on the second call (it might panic or deadlock, for example).

Member:

> why does it make this unsafe?

It depends on whether the panic is a controlled one or one triggered by a double free, GPF or similar.

> Mutex from the std library is not unsafe and behaves similarly:

Yeah, this is what I mentioned in my first comment above.

Member Author (kloenk):

We can remove the unsafe again and write some documentation. I can only do it after my exams, so either feel free to take my work or wait until I have more time again.

Member:

Don't worry about it. Good luck on your exams! :)

use crate::bindings;

pub struct Mutex<T: ?Sized> {
lock: bindings::mutex,
Reviewer:

struct mutex contains a struct list_head, which is a self-referential data structure. So we need to pin Mutex before we can safely initialise and use it.
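For illustration, a userspace sketch of the pinning concern: `ListHead` below mimics the self-referential part of `struct list_head`, which ends up holding pointers back into the containing structure once initialised, so the wrapper must never move afterwards. Taking `Pin<&mut Self>` in `init()` encodes exactly that. The names and layout are illustrative, not the real bindings.

```rust
// Sketch only: `ListHead` mimics the self-referential part of the kernel's
// `struct list_head`; names and layout are illustrative, not real bindings.
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

struct ListHead {
    next: *mut ListHead,
    prev: *mut ListHead,
}

struct Mutex<T> {
    wait_list: ListHead, // becomes self-referential after init()
    data: T,
    _pin: PhantomPinned, // opt out of Unpin so pinning is meaningful
}

impl<T> Mutex<T> {
    fn new(data: T) -> Self {
        Mutex {
            wait_list: ListHead { next: ptr::null_mut(), prev: ptr::null_mut() },
            data,
            _pin: PhantomPinned,
        }
    }

    /// Like INIT_LIST_HEAD(): the list points at itself afterwards, so the
    /// structure must never move again -- hence `Pin<&mut Self>`.
    fn init(self: Pin<&mut Self>) {
        // SAFETY: we only write through the pinned reference and never move
        // the pointee out of it.
        unsafe {
            let this = self.get_unchecked_mut();
            let head = &mut this.wait_list as *mut ListHead;
            this.wait_list.next = head;
            this.wait_list.prev = head;
        }
    }
}

fn main() {
    // Box::pin gives the value a stable address for its whole lifetime.
    let mut m = Box::pin(Mutex::new(0u32));
    m.as_mut().init();
    // The wait list now points back into the mutex itself; moving the
    // mutex after this point would leave these pointers dangling.
    assert!(ptr::eq(m.wait_list.next, &m.wait_list));
    println!("data = {}", m.data);
}
```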

#[cfg(CONFIG_DEBUG_LOCK_ALLOC)]
pub unsafe fn lock<'a>(&'a self) -> MutexGuard<'a, T> {
unsafe {
bindings::mutex_lock_nested(
Reviewer:

You need to call mutex_init before you can safely use it. Where is it being called?

Member Author (kloenk):

For some mutexes (I think it was in the rtlink code) I didn't see it called anywhere, so I thought it works without calling mutex_init if you set the list_head.

Reviewer:

> For some mutexes (I think it was in the rtlink code) I didn't see it called anywhere.

In such cases they were probably defined with DEFINE_MUTEX, which allows them to skip mutex_init.

> So I thought it works without calling mutex_init if you set the list_head.

Where are you setting the head of the list?

In any case, check out Generic Mutex Subsystem, in particular:
> A mutex must only be initialized via the API (see below).

@wedsonaf commented Dec 9, 2020

Hi Finn,

I just saw on the rust-for-linux list archive that you are doing this to implement pr_*; I'm not responding there because I just joined the list, so I don't have the thread to reply to.

Anyway, I don't think a mutex is what you need for this, because it will put the task to sleep when it is locked, which means that the pr_* functions you implement won't be usable in contexts where sleeping is not an option, for example, interrupt contexts. You'll likely need a spinlock -- I was working on an implementation before; I'll upload it to a branch here so that you can take a look.

But if you only need to set a value once, other sync primitives may be cheaper/better. Could you give more details on what you're trying to do?

Thanks!

@kloenk (Member Author) commented Dec 19, 2020

Yes, I only want to set the value once. So I guess other sync primitives are better.
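For illustration, a minimal userspace sketch of a "set once" primitive of the kind hinted at above: one atomic state guards a single initialising write, and readers only see the value after it has been published. This is not an existing kernel API; all names are illustrative.

```rust
// Sketch only: a "write once, read many" cell built from one atomic state
// and an UnsafeCell; this is not an existing kernel API, just the idea.
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU8, Ordering};

const EMPTY: u8 = 0;
const WRITING: u8 = 1;
const READY: u8 = 2;

pub struct SetOnce<T> {
    state: AtomicU8,
    value: UnsafeCell<Option<T>>,
}

// SAFETY: the value is written at most once and only read after READY.
unsafe impl<T: Send + Sync> Sync for SetOnce<T> {}

impl<T> SetOnce<T> {
    pub const fn new() -> Self {
        SetOnce { state: AtomicU8::new(EMPTY), value: UnsafeCell::new(None) }
    }

    /// Stores `value` if nothing has been stored yet; returns false otherwise.
    pub fn set(&self, value: T) -> bool {
        if self
            .state
            .compare_exchange(EMPTY, WRITING, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            // SAFETY: we won the race; readers wait for READY before looking.
            unsafe { *self.value.get() = Some(value) };
            self.state.store(READY, Ordering::Release);
            true
        } else {
            false
        }
    }

    /// Returns the value once it has been published.
    pub fn get(&self) -> Option<&T> {
        if self.state.load(Ordering::Acquire) == READY {
            // SAFETY: READY is only set after the write completed, and the
            // value is never mutated again.
            unsafe { (*self.value.get()).as_ref() }
        } else {
            None
        }
    }
}

fn main() {
    static GREETING: SetOnce<&'static str> = SetOnce::new();
    assert!(GREETING.set("hello"));
    assert!(!GREETING.set("world")); // second set is rejected
    assert_eq!(GREETING.get(), Some(&"hello"));
}
```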

@ojeda (Member) commented Mar 12, 2021

This will be superseded by #102 more or less, but the contribution has been very welcome!

@kloenk Perhaps you want to review the other one and/or compare it with this one; e.g. the other one does not have Display, uses a different interface, etc. :-)

@kloenk (Member Author) commented Mar 12, 2021

Yes, closing this sounds like a good plan. I would like Display for easy (println) debugging :-). Hopefully I have time to implement it this weekend.

@kloenk closed this on Mar 12, 2021
@ojeda deleted the lock branch on March 18, 2021 11:58
JoseTeuttli pushed a commit to JoseTeuttli/linux that referenced this pull request Jun 14, 2021
Commit f5ce815 ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N *
PAGE_SIZE") introduced xas_next() calls to iterate xarray elements.  These
calls triggered the WARNING "suspicious RCU usage" at tcmu device set up
[1]. In the call stack of xas_next(), xas_load() was called.  According to
its comment, this function requires "the xa_lock or the RCU lock".

To avoid the warning:

 - Guard the small loop calling xas_next() in tcmu_get_empty_block with RCU
   lock.

 - In the large loop in tcmu_copy_data using RCU lock would possibly
   disable preemption for a long time (copy multi MBs). Therefore replace
   XA_STATE, xas_set and xas_next with a single xa_load.

[1]

[ 1899.867091] =============================
[ 1899.871199] WARNING: suspicious RCU usage
[ 1899.875310] 5.13.0-rc1+ Rust-for-Linux#41 Not tainted
[ 1899.879222] -----------------------------
[ 1899.883299] include/linux/xarray.h:1182 suspicious rcu_dereference_check() usage!
[ 1899.890940] other info that might help us debug this:
[ 1899.899082] rcu_scheduler_active = 2, debug_locks = 1
[ 1899.905719] 3 locks held by kworker/0:1/1368:
[ 1899.910161]  #0: ffffa1f8c8b98738 ((wq_completion)target_submission){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580
[ 1899.920732]  #1: ffffbd7040cd7e78 ((work_completion)(&q->sq.work)){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580
[ 1899.931146]  #2: ffffa1f8d1c99768 (&udev->cmdr_lock){+.+.}-{3:3}, at: tcmu_queue_cmd+0xea/0x160 [target_core_user]
[ 1899.941678] stack backtrace:
[ 1899.946093] CPU: 0 PID: 1368 Comm: kworker/0:1 Not tainted 5.13.0-rc1+ Rust-for-Linux#41
[ 1899.953070] Hardware name: System manufacturer System Product Name/PRIME Z270-A, BIOS 1302 03/15/2018
[ 1899.962459] Workqueue: target_submission target_queued_submit_work [target_core_mod]
[ 1899.970337] Call Trace:
[ 1899.972839]  dump_stack+0x6d/0x89
[ 1899.976222]  xas_descend+0x10e/0x120
[ 1899.979875]  xas_load+0x39/0x50
[ 1899.983077]  tcmu_get_empty_blocks+0x115/0x1c0 [target_core_user]
[ 1899.989318]  queue_cmd_ring+0x1da/0x630 [target_core_user]
[ 1899.994897]  ? rcu_read_lock_sched_held+0x3f/0x70
[ 1899.999695]  ? trace_kmalloc+0xa6/0xd0
[ 1900.003501]  ? __kmalloc+0x205/0x380
[ 1900.007167]  tcmu_queue_cmd+0x12f/0x160 [target_core_user]
[ 1900.012746]  __target_execute_cmd+0x23/0xa0 [target_core_mod]
[ 1900.018589]  transport_generic_new_cmd+0x1f3/0x370 [target_core_mod]
[ 1900.025046]  transport_handle_cdb_direct+0x34/0x50 [target_core_mod]
[ 1900.031517]  target_queued_submit_work+0x43/0xe0 [target_core_mod]
[ 1900.037837]  process_one_work+0x268/0x580
[ 1900.041952]  ? process_one_work+0x580/0x580
[ 1900.046195]  worker_thread+0x55/0x3b0
[ 1900.049921]  ? process_one_work+0x580/0x580
[ 1900.054192]  kthread+0x143/0x160
[ 1900.057499]  ? kthread_create_worker_on_cpu+0x40/0x40
[ 1900.062661]  ret_from_fork+0x1f/0x30

Link: https://lore.kernel.org/r/[email protected]
Fixes: f5ce815 ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N * PAGE_SIZE")
Reported-by: Shin'ichiro Kawasaki <[email protected]>
Tested-by: Shin'ichiro Kawasaki <[email protected]>
Signed-off-by: Bodo Stroesser <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>