-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISC-V] coreclr-pal and coreclr-debug directories #82380
Conversation
clamp03
commented
Feb 20, 2023
- Successfully cross-build for RISC-V.
- Run A simple application "helloworld"
- Fail a test in clr.paltest
If you don't mind, I want to fix a paltest error after 4 risc-v PRs are merged. It is hard to investigate an error now while I am trying to manage PRs. |
.macro PROLOG_STACK_ALLOC Size | ||
addi sp, sp, -\Size | ||
//.cfi_adjust_cfa_offset \Size | ||
.cfi_def_cfa sp,\Size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not adjust the CFA. The adjustment is made in the code using this macro, e.g. in the CallSignalHandlerWrapper
implementation that was added in this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will put CFG adjust codes in the code using this macro. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this changed from my previous review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated other codes, however I missed these
I removed .cfi_def_cfa in the macros. PROLOG_STACK_ALLOC and EPILOG_STACK_FREE
Thank you.
.macro EPILOG_STACK_FREE Size | ||
addi sp, sp, \Size | ||
//.cfi_adjust_cfa_offset -\Size | ||
.cfi_def_cfa sp,-\Size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason for not using the .cfi_adjust_cfa_offset
? I am just curious, it is not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented based on LOONGARCH. I updated assembly codes and copied most other implementations (including cfi).
#if HAVE_FULLY_FEATURED_PTHREAD_MUTEXES && \ | ||
HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES && \ | ||
!(defined(__FreeBSD__) || defined(TARGET_OSX)) | ||
!(defined(__FreeBSD__) || defined(TARGET_OSX) || defined(TARGET_RISCV64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should define the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE
to 1 (=> we don't have them) for RISCV64 in the tryrun.cmake. Now we set it to 0 for may architectures including RISCV64 here:
runtime/eng/native/tryrun.cmake
Lines 163 to 165 in 53bb518
if(TARGET_ARCH_NAME MATCHES "^(x86|s390x|armv6|loongarch64|riscv64|ppc64le)$") | |
set_cache_value(HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE 0) | |
endif() |
This is used for cross compilation. For compiling on real RISCV6 device, it would get detected at build time using a test execution in the configure.cmake here:
runtime/src/coreclr/pal/src/configure.cmake
Lines 1121 to 1308 in 0f1866e
if(NOT CLR_CMAKE_HOST_ARCH_ARM AND NOT CLR_CMAKE_HOST_ARCH_ARM64) | |
set(CMAKE_REQUIRED_LIBRARIES pthread) | |
check_cxx_source_runs(" | |
// This test case verifies the pthread process-shared robust mutex's cross-process abandon detection. The parent process starts | |
// a child process that locks the mutex, the process process then waits to acquire the lock, and the child process abandons the | |
// mutex by exiting the process while holding the lock. The parent process should then be released from its wait, be assigned | |
// ownership of the lock, and be notified that the mutex was abandoned. | |
#include <sys/mman.h> | |
#include <sys/time.h> | |
#include <errno.h> | |
#include <pthread.h> | |
#include <stdio.h> | |
#include <unistd.h> | |
#include <new> | |
using namespace std; | |
struct Shm | |
{ | |
pthread_mutex_t syncMutex; | |
pthread_cond_t syncCondition; | |
pthread_mutex_t robustMutex; | |
int conditionValue; | |
Shm() : conditionValue(0) | |
{ | |
} | |
} *shm; | |
int GetFailTimeoutTime(struct timespec *timeoutTimeRef) | |
{ | |
int getTimeResult = clock_gettime(CLOCK_REALTIME, timeoutTimeRef); | |
if (getTimeResult != 0) | |
{ | |
struct timeval tv; | |
getTimeResult = gettimeofday(&tv, NULL); | |
if (getTimeResult != 0) | |
return 1; | |
timeoutTimeRef->tv_sec = tv.tv_sec; | |
timeoutTimeRef->tv_nsec = tv.tv_usec * 1000; | |
} | |
timeoutTimeRef->tv_sec += 30; | |
return 0; | |
} | |
int WaitForConditionValue(int desiredConditionValue) | |
{ | |
struct timespec timeoutTime; | |
if (GetFailTimeoutTime(&timeoutTime) != 0) | |
return 1; | |
if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0) | |
return 1; | |
if (shm->conditionValue != desiredConditionValue) | |
{ | |
if (GetFailTimeoutTime(&timeoutTime) != 0) | |
return 1; | |
if (pthread_cond_timedwait(&shm->syncCondition, &shm->syncMutex, &timeoutTime) != 0) | |
return 1; | |
if (shm->conditionValue != desiredConditionValue) | |
return 1; | |
} | |
if (pthread_mutex_unlock(&shm->syncMutex) != 0) | |
return 1; | |
return 0; | |
} | |
int SetConditionValue(int newConditionValue) | |
{ | |
struct timespec timeoutTime; | |
if (GetFailTimeoutTime(&timeoutTime) != 0) | |
return 1; | |
if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0) | |
return 1; | |
shm->conditionValue = newConditionValue; | |
if (pthread_cond_signal(&shm->syncCondition) != 0) | |
return 1; | |
if (pthread_mutex_unlock(&shm->syncMutex) != 0) | |
return 1; | |
return 0; | |
} | |
void DoTest_Child(); | |
int DoTest() | |
{ | |
// Map some shared memory | |
void *shmBuffer = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0); | |
if (shmBuffer == MAP_FAILED) | |
return 1; | |
shm = new(shmBuffer) Shm; | |
// Create sync mutex | |
pthread_mutexattr_t syncMutexAttributes; | |
if (pthread_mutexattr_init(&syncMutexAttributes) != 0) | |
return 1; | |
if (pthread_mutexattr_setpshared(&syncMutexAttributes, PTHREAD_PROCESS_SHARED) != 0) | |
return 1; | |
if (pthread_mutex_init(&shm->syncMutex, &syncMutexAttributes) != 0) | |
return 1; | |
if (pthread_mutexattr_destroy(&syncMutexAttributes) != 0) | |
return 1; | |
// Create sync condition | |
pthread_condattr_t syncConditionAttributes; | |
if (pthread_condattr_init(&syncConditionAttributes) != 0) | |
return 1; | |
if (pthread_condattr_setpshared(&syncConditionAttributes, PTHREAD_PROCESS_SHARED) != 0) | |
return 1; | |
if (pthread_cond_init(&shm->syncCondition, &syncConditionAttributes) != 0) | |
return 1; | |
if (pthread_condattr_destroy(&syncConditionAttributes) != 0) | |
return 1; | |
// Create the robust mutex that will be tested | |
pthread_mutexattr_t robustMutexAttributes; | |
if (pthread_mutexattr_init(&robustMutexAttributes) != 0) | |
return 1; | |
if (pthread_mutexattr_setpshared(&robustMutexAttributes, PTHREAD_PROCESS_SHARED) != 0) | |
return 1; | |
if (pthread_mutexattr_setrobust(&robustMutexAttributes, PTHREAD_MUTEX_ROBUST) != 0) | |
return 1; | |
if (pthread_mutex_init(&shm->robustMutex, &robustMutexAttributes) != 0) | |
return 1; | |
if (pthread_mutexattr_destroy(&robustMutexAttributes) != 0) | |
return 1; | |
// Start child test process | |
int error = fork(); | |
if (error == -1) | |
return 1; | |
if (error == 0) | |
{ | |
DoTest_Child(); | |
return -1; | |
} | |
// Wait for child to take a lock | |
WaitForConditionValue(1); | |
// Wait to try to take a lock. Meanwhile, child abandons the robust mutex. | |
struct timespec timeoutTime; | |
if (GetFailTimeoutTime(&timeoutTime) != 0) | |
return 1; | |
error = pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime); | |
if (error != EOWNERDEAD) // expect to be notified that the robust mutex was abandoned | |
return 1; | |
if (pthread_mutex_consistent(&shm->robustMutex) != 0) | |
return 1; | |
if (pthread_mutex_unlock(&shm->robustMutex) != 0) | |
return 1; | |
if (pthread_mutex_destroy(&shm->robustMutex) != 0) | |
return 1; | |
return 0; | |
} | |
void DoTest_Child() | |
{ | |
// Lock the robust mutex | |
struct timespec timeoutTime; | |
if (GetFailTimeoutTime(&timeoutTime) != 0) | |
return; | |
if (pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime) != 0) | |
return; | |
// Notify parent that robust mutex is locked | |
if (SetConditionValue(1) != 0) | |
return; | |
// Wait a short period to let the parent block on waiting for a lock | |
sleep(1); | |
// Abandon the mutex by exiting the process while holding the lock. Parent's wait should be released by EOWNERDEAD. | |
} | |
int main() | |
{ | |
int result = DoTest(); | |
return result >= 0 ? result : 0; | |
}" HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES) | |
set(CMAKE_REQUIRED_LIBRARIES) | |
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The codes set NAMED_MUTEX_USE_PTHREAD_MUTEX
to 0 to fix paltest fail (HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE
threading/NamedMutex/test1/paltest_namedmutex_test1
).
Do you mean we should set it to 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused - setting the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE
in the tryrun.cmake to 0 means that the riscv Linux has functional robust mutexes. But in your comment, you say it doesn't. That's why I have mentioned setting it to 1 for riscv4 which should do the same thing as the extra condition you have added to the #if
. In the cross compilation case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I will set HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES_EXITCODE
to 1
like others.
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change in the tryrun.cmake, the change in this ifdef should not be needed (because the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES
is now 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Removed the codes.
Updated by review in dotnet#82380
- Successfully cross-build for RISC-V. - Run A simple application "helloworld" - Fail a test in clr.paltest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Looked just through half of this, have few comments.
src/coreclr/pal/inc/pal.h
Outdated
@@ -3713,7 +3691,7 @@ YieldProcessor() | |||
#elif defined(HOST_LOONGARCH64) | |||
__asm__ volatile( "dbar 0; \n"); | |||
#elif defined(HOST_RISCV64) | |||
__asm__ __volatile__( "wfi"); | |||
__asm__ __volatile__( "fence iorw, iorw"); // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be __asm__ __volatile__( "pause");
. But this is part of Zihintpause extension, and as I remember it's not part of rv64gc, which is chosen for implementation in runtime. Maybe someone can correct me on this, but looks like we can't use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use fence iorw, iorw
because the instruction has similar functionality to dbar
in LOONGARCH. (If I don't know, I referenced many things from LOONGARCH.) If someone know better instruction, please let me know. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be asm volatile( "pause");. But this is part of Zihintpause extension, and as I remember it's not part of rv64gc
What is the desired result of YieldProcessor, exactly?
wfi
is certainly incorrect in any user program or even the OS. It is a Machine mode instruction that halts the entire CPU core, potentially forever, if there is no timer set up or pending I/O with completion interrupt. It is illegal instruction in S or U modes.
fence iorw,iorw
will take some time to execute (perhaps), but it doesn't save anything
pause
is the correct thing if you want a short delay in a busy-wait loop. It will sleep the current HART for a short time, saving energy or allowing other HARTs on the same core (hyperthreading) to execute faster. It is undefined how long it sleeps for, but is likely to be on the order of 10 to 100 clock cycles -- maybe similar to the time needed for a cache miss, but certainly less than the time needed to do an OS context switch to another process.
You can always use pause
(.word 0x0100000f
) on a machine that does not implement Zihintpause
because it is encoded as fence w,
(which in fact is just a NOP
, as there is no successor set). If Zihintpause
is not implemented then there is no better way to get a short pause in a polling loop anyway.
I tried...
.globl mypause
mypause:
pause
addi a0,a0,-1
bnez a0,mypause
ret
... on my VisionFive 2.
With argument 100,000,000 it takes 2.16 seconds. With the pause
commented out, 0.23 seconds. So the pause
takes 20ns or 30 clock cycles.
With fence iorw,iorw
instead of pause
it takes 0.124s. Yes -- even faster than without the fence. That's dual-issue pipelines for you ))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pause is the correct thing if you want a short delay in a busy-wait loop
Yes, that's the intention, similar to yield
instruction on arm. Thank you for detailed explanation.
My concern about pause
availability was that even though it is a hint, on ubuntu 20.04 with gcc 9.4 I was not able to compile it for riscv: Error: unrecongnized opcode 'pause'
. Similar error with fence w,
, fence w,0
, etc.: iilegal operands
. So, it also depends on minimal supported gcc/clang version. I can't check on ubuntu 22.04 right now, but suppose it should work fine there since there's 2 year difference. And if this change doesn't break runtime build, then everything's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running four programs doing pause
in a loop on the VisionFive 2 shows 100% CPU use on top and raises the temperature from 46 C to 52 C after some minutes. Commenting out the pause
results in the temperature quickly stabilising at 57.5 C.
So there is a significant energy saving by using pause
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern about
pause
availability was that even though it is a hint, on ubuntu 20.04 with gcc 9.4 I was not able to compile it for riscv:Error: unrecongnized opcode 'pause'
the ZiHintPause ISA extension was ratified in February 2021, so it's not reasonable to expect Ubuntu 20.04 to have it, or even gcc 9.4 -- a purely bug fix release in June 2021, when the actual current gcc release was 11.1.
RISC-V is developing extremely rapidly at the moment, with I think 15 important ISA extensions ratified in November 2021 and more every few months since. I think the pace will slow after the end of this year and wait for cores and SoCs to catch up.
It's important to use the latest tools. You need to be using gcc 12.2 (you can build it from here https://github.com/riscv-collab/riscv-gnu-toolchain) or llvm at least 14, but better 16.
If using old toolchains, .word 0x0100000f
is a safe way to insert pause
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to
yield
instruction on arm
ARM's documentation says "The YIELD instruction has no effect in a single-threaded system" (by which they mean single-thread core, not single core system).
I can't find anything suggesting that there might be energy savings from YIELD on a single-threaded system.
I just tried ...
.globl _mypause
.align 2
_mypause:
1:
subs x0,x0,#1
yield
bne 1b
ret
... on my M1 Mac Mini and the execution time for 10,000,000,000 loops was 3.2 seconds, both with and without the yield
.
I also tried Graviton 3 on AWS. Result: 10,000,000,000 loops in 3.85 seconds, both with and without the yield
.
So in fact ARM yield
and RISC-V pause
are not quite the same thing. Or M1 and Graviton 3 both don't implement yield
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brucehoult Wow!! Thank you so much for your explanation and experiments. It set it to .word 0x0100000f
. Thank you.
Testing the crossbuild to riscv64 on Alpine Linux edge, which uses clang-16, reveals some text relocations when linking Starting from binutils' ld 2.35 and lld-15, if there are text relocations a warning is issued by default. |
Thank you! |
@janvorli @mangod9 Could you review it? And if it is good, could you please merge it? I fixed all the things requested by the reviewers and waited 50 days. Other two PRs (#82379 , #82382) are already merged. Actually, I cannot fix all bugs and unimplemented code with just this PR. After the PR is merged, I will continue to fix and my colleagues and community members can help to support RISC-V in coreclr better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are just two minor nits, otherwise it looks good.
.macro PROLOG_STACK_ALLOC Size | ||
addi sp, sp, -\Size | ||
//.cfi_adjust_cfa_offset \Size | ||
.cfi_def_cfa sp,\Size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this changed from my previous review.
#if HAVE_FULLY_FEATURED_PTHREAD_MUTEXES && \ | ||
HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES && \ | ||
!(defined(__FreeBSD__) || defined(TARGET_OSX)) | ||
!(defined(__FreeBSD__) || defined(TARGET_OSX) || defined(TARGET_RISCV64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change in the tryrun.cmake, the change in this ifdef should not be needed (because the HAVE_FUNCTIONAL_PTHREAD_ROBUST_MUTEXES
is now 0)
Thank you so much. If I fix incorrectly again, please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
The WASM leg failure is a known issue, merging the change. |
* [RISC-V] coreclr-vm and other directories in coreclr - Successfully cross-build for RISC-V. - Run A simple application "helloworld" - Fail a test in clr.paltest * Remove commented codes * Fix mistakes * Update by reviews * [VM] Fix test * [VM] Update Updated by review in #82380 * [VM] Update assert and todo comments * [VM] Updated * [VM] Fix a bug * [VM] Add getRISCV64PassStructInRegisterFlags * Revert "[VM] Add getRISCV64PassStructInRegisterFlags" This reverts commit cd1ea45. * [VM] Restore getLoongArch64PassStructInRegisterFlags In coreclr-jit patch, it makes getRISCV64PassStructInRegisterFlags for RISCV64. So restore getLoongArch64PassStructInRegisterFlags * [VM] FIX TEST ERRORS * [VM] Update Stubs * Fix exceptionhandling, stack smashing detected * [VM] Fix vm/riscv64 --------- Co-authored-by: Timur Mustafin <[email protected]>