Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pthreads synchronization primitives: detect mutex/rwlock/... being moved to a different location #3749

Closed
3 tasks done
RalfJung opened this issue Jul 13, 2024 · 14 comments · Fixed by #3784 or #3884
Closed
3 tasks done
Labels
A-concurrency Area: affects our concurrency (multi-thread) support A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement E-good-second-issue A good issue to pick up if you've already seen some parts of Miri, mentoring is available

Comments

@RalfJung
Copy link
Member

RalfJung commented Jul 13, 2024

pthread's synchronization primitives must not be moved to a different location once they were initialized. However, Miri does currently not catch that kind of UB. We probably want a fundamentally different implementation of these primitives to detect such bugs -- instead of storing data in the memory that is occupied by pthread_mutex_t et al, we use the address of the pthread_mutex_t as the key into some table for managing the lock.

This would also avoid having to figure out which "offsets" inside pthread_mutex_t to use on which OS to do what... we'd entirely stop caring about the layout of pthread_mutex_t. So this is a better way to achieve FreeBSD support for the pthread primitives as well.

  • mutex
  • rwlock
  • condvar
@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-shims Area: This affects the external function shims A-concurrency Area: affects our concurrency (multi-thread) support E-good-second-issue A good issue to pick up if you've already seen some parts of Miri, mentoring is available labels Jul 13, 2024
@RalfJung
Copy link
Member Author

RalfJung commented Jul 14, 2024

The macOS os_unfair_lock could also use this infrastructure to either abort execution when a lock is moved while locked, or implement behavior that matches the real OS.

@Mandragorian
Copy link
Contributor

Would this be possible to do by storing the original address in memory with pthread_mutex_t and check if it has changed? Not sure how changing the miri layout of a type interferes with what the program actually sees. (I assume that since pointers have provenance information stored with them this might not be an issue though?)

This doesn't have the benefit of simplifying the code, but might be a less invasive change.

@RalfJung
Copy link
Member Author

Some scheme like that could work, but it doesn't seem worth the effort to me. (Also not sure what your comment about provenance is meant to say.)

Probably the least invasive version of this is to store just the MutexId (etc) in the pthread_mutex_t, and then in the global mutex map store the address we are expecting this mutex to be at (and any other information, like the mutex kind). After all, we still need to store something inside the pthread_mutex_t to support the static initializers.

@Mandragorian
Copy link
Contributor

regarding provenance comment: I wasn't sure if how miri represents type in the machine's memory might conflict with the expected layout for a type. Like if we added that extra field that stores the original address the type layout would have changed. And I remembered that provenance already does something similar, if I understand correctly, where it doesn't just store the address that a pointer holds, but also provenance information. I was basically trying to prove to myself that adding metadata to a type is fine.

If there already exists such a map, then yes, storing the metadata to it is probably the simplest way, yes.

I will need to read the code more to understand what needs to happen but from your commends and some initial reading I think I could give this a shot if there is interest.

Thanks for the answers!

@RalfJung
Copy link
Member Author

There are several misconceptions here about how Miri works: there are no types in memory, only raw bytes. And we can never add more fields or anything like that, pthread_mutex_t has a certain size and that's the number of bytes we can use to store data in-place. We can do basically anything we want with these bytes, ignoring what they are used for "normally" on that platform, because users are supposed to treat pthread_mutex_t entirely opaquely. And yes provenance exists as "metadata" stored next to bytes, but that is a deeply invasive thing to add to the language with observable consequences, so we cannot put any other metadata there, it can only be used for provenance.

@Mandragorian
Copy link
Contributor

I have some code for this but before I send anything stupid for review I have 2 questions:

  • Regarding the mutex kind: From what I understand, static initializers set the kind of the mutex. So lock, trylock and unlock should always try to retrieve it from the bytes pointed by mutex_op. Doesn't this mean that we can't store the kind in some global map, and we need to keep storing it in the pthread_mutex_t?

  • Do we want a new TerminationInfo variant? I see that there is a dedicated variant for data race detection, so we could have an IllegalMove variant?

@RalfJung
Copy link
Member Author

Regarding the mutex kind: From what I understand, static initializers set the kind of the mutex. So lock, trylock and unlock should always try to retrieve it from the bytes pointed by mutex_op. Doesn't this mean that we can't store the kind in some global map, and we need to keep storing it in the pthread_mutex_t?

What I imagined was lazy initialization: there's a flag in the pthread_mutex_t value that indicates "has been initialized", and we ensure that it indicates "no" for all the static initializers. All lock operations begin by checking that flag, and then if it is not yet initialized, registering a mutex at that address, with the mutex kind as given by the static initializer, and setting the flag. Explicit initialization with pthread_mutex_init immediately sets this flag.

Do we want a new TerminationInfo variant? I see that there is a dedicated variant for data race detection, so we could have an IllegalMove variant?

That should not be needed, throw_ub_format should be sufficient. Data races do a bunch of special things for nicer errors, which is why they need their own variant.

@RalfJung
Copy link
Member Author

For the initial PR, it's probably best to only convert one of the primitives, like mutex. :)

Mandragorian added a commit to Mandragorian/miri that referenced this issue Aug 2, 2024
Mandragorian added a commit to Mandragorian/miri that referenced this issue Aug 30, 2024
Mandragorian added a commit to Mandragorian/miri that referenced this issue Sep 5, 2024
@bors bors closed this as completed in 7b422fe Sep 5, 2024
@Mandragorian
Copy link
Contributor

This can't be closed yet. The above mentioned commit only implements the move detection for mutexes.

@RalfJung RalfJung reopened this Sep 5, 2024
@RalfJung
Copy link
Member Author

RalfJung commented Sep 5, 2024

Ah, good point.

I moved the macos version of this into a separate issue: #3859.

bors added a commit that referenced this issue Sep 9, 2024
detect when pthread_rwlock_t is moved

For some implementations of pthreads, the address of pthread_rwlock_t (or its fields) is used to identify the lock. That means that if the contents of a pthread_rwlock_t are moved in memory, effectively a new lock object is created, which is completely independted from the original. Thus we want to detect when when such objects are moved and show an error.

see also #3749 for more context
@RalfJung
Copy link
Member Author

I think condvar is the only one that's missing now, right?

@Mandragorian
Copy link
Contributor

Yes. I started to prepare a PR a few days ago, but then life got a bit busy. I need to add a test and polish it a bit.

Mandragorian added a commit to Mandragorian/miri that referenced this issue Sep 13, 2024
Mandragorian added a commit to Mandragorian/miri that referenced this issue Sep 13, 2024
Mandragorian added a commit to Mandragorian/miri that referenced this issue Sep 13, 2024
bors added a commit that referenced this issue Sep 14, 2024
detect when pthread_cond_t is moved

Closes #3749
@bors bors closed this as completed in b1aaf1a Sep 14, 2024
@Mandragorian
Copy link
Contributor

tl;dr: It might be allowed by the posix standard to move synchornization objects after initialization, when a flag in the kind is set. Maybe you were already aware of this but I wasn't, so I am leaving a comment here just in case:

I was reading the POSIX spec and in section 2.9.9 Synchronization Object Copies and Alternative Mappings it says the following:

If the process-shared attribute is set to PTHREAD_PROCESS_SHARED, only the synchronization object itself can be used for performing synchronization; however, it need not be referenced at the address used to initalize it (that is, another mapping of the same object can be used). The effect of referring to a copy of the object when locking, unlocking, or destroying it is undefined.

So it seems to me that it is possible, under certain conditions, to move the mutex after you initialize it.

Setting the "process-shared attribute" is done through pthread_mutexattr_setpshared. I don't see any reference to it in miri, so I assume it is not supported. Also from what I understand from reading glibc code the glibc static initializers only create PRIVATE synchronization objects. I don't know what other libc implementations do.

@RalfJung
Copy link
Member Author

The docs you quote still say

The effect of referring to a copy of the object when locking, unlocking, or destroying it is undefined.

so I don't think this allows moving them anywhere else?

But anyway we don't support this flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: affects our concurrency (multi-thread) support A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement E-good-second-issue A good issue to pick up if you've already seen some parts of Miri, mentoring is available
Projects
None yet
2 participants