
Linux on VexRiscv #60

Closed
ghost opened this issue Feb 28, 2019 · 345 comments

@ghost

ghost commented Feb 28, 2019

My intention with creating this issue is collecting/sharing information and gauging interest in running Linux on VexRiscv. From what I know, VexRiscv is still missing functionality, and it won't work out of the box.

A big problem is the MMU. Ideally, "someone" will hopefully write patches to add no-MMU support to Linux/RISC-V, but currently an MMU is required. It appears VexRiscv has a partial MMU implementation using a software-filled TLB. There needs to be machine-mode code to walk the page tables and fill the TLB entries, and I didn't find a reference implementation of that.
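(For illustration only, here is a minimal sketch of what such machine-mode refill code could look like for Sv32. tlb_write() stands in for whatever core-specific mechanism inserts a translation, and phys_read32() for an untranslated physical read; neither is an existing VexRiscv API.)

    // Hedged sketch: software TLB refill from an M-mode trap handler.
    // Walks the two-level Sv32 page table rooted at satp and installs
    // the leaf PTE. tlb_write()/phys_read32() are hypothetical helpers.
    #include <cstdint>

    extern void tlb_write(uint32_t vaddr, uint32_t pte);   // hypothetical
    extern uint32_t phys_read32(uint32_t paddr);           // hypothetical

    static inline uint32_t read_satp() {
        uint32_t v;
        asm volatile("csrr %0, satp" : "=r"(v));
        return v;
    }

    // Returns true if a valid leaf PTE was found and installed.
    bool sv32_refill(uint32_t fault_vaddr) {
        uint32_t table = (read_satp() & 0x003fffff) << 12; // root table PA
        for (int level = 1; level >= 0; --level) {
            uint32_t vpn = (fault_vaddr >> (12 + 10 * level)) & 0x3ff;
            uint32_t pte = phys_read32(table + vpn * 4);
            if (!(pte & 1))
                return false;              // V=0: genuine page fault
            if (pte & 0xe) {               // R/W/X set: leaf (4 KiB or 4 MiB)
                tlb_write(fault_vaddr, pte);
                return true;
            }
            table = (pte >> 10) << 12;     // pointer to next-level table
        }
        return false;
    }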

Another issue is atomics. Linux currently requires them. There seems to be partial support present in VexRiscv (a subset or so). Another possibility is patching the kernel not to use atomics when built without SMP support. There's also the question of how much atomics support userspace typically requires.
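(For context, the kernel's atomics compile down to LR/SC retry loops like the minimal example below, so a single-hart "dummy" reservation is enough to make them work; nothing here is VexRiscv-specific.)

    // Minimal example of the LR/SC pattern Linux relies on (RV32A, GCC
    // inline asm). On a single-hart core a trivial local reservation
    // makes the store-conditional succeed on the first try.
    #include <cstdint>

    uint32_t atomic_fetch_add(volatile uint32_t *p, uint32_t inc) {
        uint32_t old, sum, fail;
        asm volatile(
            "1: lr.w    %0, (%3)\n"      // load-reserved current value
            "   add     %1, %0, %4\n"    // compute old + inc
            "   sc.w    %2, %1, (%3)\n"  // store-conditional; 0 on success
            "   bnez    %2, 1b\n"        // retry if the reservation was lost
            : "=&r"(old), "=&r"(sum), "=&r"(fail)
            : "r"(p), "r"(inc)
            : "memory");
        return old;
    }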

Without doubt there are more issues that I don't know about.

Antmicro apparently made a Linux port: https://github.com/antmicro/litex-rv32-linux-system https://github.com/antmicro/litex-linux-riscv
I didn't know about this before and haven't managed to build the whole thing yet.
Unfortunately, their Linux kernel repository does not include the git history. Here's a diff against the apparent base: https://0x0.st/z-li.diff

Please post any other information you know.

@Dolu1990
Member

About atomics, there is some support in VexRiscv to provide LR/SC in a local way; it only works for single-CPU systems.

@ghost
Author

ghost commented Feb 28, 2019

Yeah, "dummy" implementations that work on single CPU systems should be perfectly fine.

@enjoy-digital

As discussed at the Free Silicon Conference together with @Dolu1990, we are also working on it here:
enjoy-digital/litex#134.

We can continue the discussion here for the CPU aspect. @daveshah1: I saw you made some progress;
just for info, @Dolu1990 is OK to help getting things working. So if you see strange things or need help with things related to Spinal/VexRiscv, you can discuss your findings here.

@daveshah1

My current status is that I have made quite a few hacks to the kernel, VexRiscv and LiteX, but I'm still only just getting into userspace and not anywhere useful yet.

VexRiscv: https://github.com/daveshah1/VexRiscv/tree/Supervisor
Build config: https://github.com/daveshah1/VexRiscv-verilog/tree/linux
LiteX: https://github.com/daveshah1/litex/tree/vexriscv-linux
kernel: https://github.com/daveshah1/litex-linux-riscv

@Dolu1990 I would be interested if you could look at 818f1f6 - loads were always reading 0xffffffff from virtual memory addresses when bit 10 of the offset (0x400) was set. This seems to fix it, but I'm not sure if a better fix is possible

As it stands, the current issue is a kernel panic "Oops - environment call from S-mode" shortly after init starts. It seems that after a few syscalls it either isn't returning properly to userspace, or a spurious ECALL is accidentally triggered while in S-mode (it might be the ECALL getting "stuck" somewhere and lurking, so that what should be an IRQ triggers the ECALL instead).

@Dolu1990
Member

Hi @daveshah1 @enjoy-digital :D

So, for sure we will hit bugs in VexRiscv, as only the machine mode was properly tested.
Things not tested enough in VexRiscv which could have bugs:

  • Supervisor / User mode
  • MMU

I think the best would be to set up a minimal test environment to run Linux on. It would save us a lot of time and sanity, especially for a Linux port project :D
So, to distinguish hardware bugs from software bugs, my proposal is that I set up a minimalistic environment where only the VexRiscv CPU is simulated and compared against an instruction-synchronised software model of the CPU (I already have one which does that, but CSRs are missing from it).
This would point out exactly where the hardware diverges from what it should do, and bring serenity to the development ^.^

Does that sound good to you?
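(To make the idea concrete, here is a rough sketch of such a cross-checked loop in the style of the existing C++ regression bench; Dut and GoldenModel are placeholder types, not the actual classes from main.cpp.)

    // Sketch of the lockstep cross-check: tick the RTL under Verilator,
    // and every time the DUT commits an instruction, step the ISA golden
    // model once and compare architectural effects.
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    struct GoldenModel {
        void step();                      // execute exactly one instruction
        uint32_t lastRfWriteAddr = 0;     // architectural effects to compare
        uint32_t lastRfWriteData = 0;
    };

    struct Dut {
        void tick();                      // advance the RTL one clock cycle
        bool commitValid();               // did an instruction retire?
        uint32_t rfWriteAddr();
        uint32_t rfWriteData();
    };

    void crossCheck(Dut &dut, GoldenModel &ref, uint64_t maxCycles) {
        for (uint64_t cycle = 0; cycle < maxCycles; ++cycle) {
            dut.tick();
            if (!dut.commitValid())
                continue;
            ref.step();                   // keep the model in lockstep
            if (dut.rfWriteAddr() != ref.lastRfWriteAddr ||
                dut.rfWriteData() != ref.lastRfWriteData) {
                std::printf("divergence at cycle %llu\n",
                            (unsigned long long)cycle);
                std::exit(1);             // stop at the first divergence
            }
        }
    }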

@daveshah1

daveshah1 commented Mar 15, 2019

That sounds very sensible! The minimal peripheral requirement is low: just a timer (right now I have the LiteX timer connected to the timerInterruptS pin, and hacked the kernel to talk to it directly rather than setting up a timer via the proper SBI route) and a UART of some kind.

My only concern with this is speed: right now it is taking about 30s on hardware at 75MHz to get to the point of failure. So I definitely want to use Verilator and not iverilog...

@enjoy-digital

I can easily set up a verilator simulation. But 30s on hardware at 75MHz will still be a bit slow: we can expect around 1MHz execution speed, and 30 s × 75 MHz ≈ 2.25 billion cycles, so that's still around 40 min at 1 MHz...

@daveshah1

I did just manage to make a bit of progress on hardware (perhaps this talk of simulators is scaring it into behaviour 😄)

It does reach userspace successfully, so we can almost say Linux is working. If I set /bin/sh as init, then I can even use shell builtins - being able to run echo hello world counts as Linux, right? (But calls to other programs don't seem to work.) init itself is segfaulting deep within libc, so there's still something fishy, but it could just be a dodgy rootfs.

@kgugala

kgugala commented Mar 15, 2019

@daveshah1 this is great. The libc segfault also happened in our Renode (https://github.com/renode/renode) emulation. Can you share the rootfs you're using?

@daveshah1

initramdisk.gz

This is the initramdisk from antmicro/litex-linux-readme with a small change to inittab to remove some references to files that don't exist

In terms of other outstanding issues, I also had to patch VexRiscv so that interrupts are routed to S-mode rather than M-mode. This broke the LiteX BIOS, which expects M-mode interrupts, so I had to patch that to not expect interrupts at all, but that means there is now no useful UART output from the BIOS. I think a proper solution would be to select the interrupt privilege dynamically somehow.
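(For reference, the privileged spec already provides that dynamic selection: the mideleg/medeleg CSRs let M-mode boot code decide which interrupts and exceptions are delegated to S-mode. A minimal sketch, assuming the core implements delegation:)

    // Sketch: M-mode boot code delegating supervisor interrupts and the
    // common exceptions to S-mode via mideleg/medeleg, so the routing is
    // decided by software instead of being hardwired in the core.
    #include <cstdint>

    static inline void write_mideleg(uint32_t v) {
        asm volatile("csrw mideleg, %0" :: "r"(v));
    }
    static inline void write_medeleg(uint32_t v) {
        asm volatile("csrw medeleg, %0" :: "r"(v));
    }

    void delegate_to_smode() {
        // Bits 1/5/9: supervisor software, timer and external interrupts.
        write_mideleg((1u << 1) | (1u << 5) | (1u << 9));
        // Bits 12/13/15: instruction/load/store page faults;
        // bit 8: ecall from U-mode. All handled by the S-mode kernel.
        write_medeleg((1u << 12) | (1u << 13) | (1u << 15) | (1u << 8));
    }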

@kgugala

kgugala commented Mar 15, 2019

We had to fix/work around the IRQ delegation. I think this code should be in our repo, but I'll check that again.

@daveshah1

The segfault I see is:

[   53.060000] getty[45]: unhandled signal 11 code 0x1 at 0x00000004 in libc-2.26.so[5016f000+148000]
[   53.070000] CPU: 0 PID: 45 Comm: getty Not tainted 4.19.0-rc4-gb367bd23-dirty #105
[   53.080000] sepc: 501e2730 ra : 501e2e1c sp : 9f9b2c60
[   53.080000]  gp : 00120800 tp : 500223a0 t0 : 5001e960
[   53.090000]  t1 : 00000000 t2 : ffffffff s0 : 00000000
[   53.090000]  s1 : 00000000 a0 : 00000000 a1 : 502ba624
[   53.100000]  a2 : 00000000 a3 : 00000000 a4 : 000003ef
[   53.100000]  a5 : 00000160 a6 : 00000000 a7 : 0000270f
[   53.110000]  s2 : 502ba5f4 s3 : 00000000 s4 : 00000150
[   53.110000]  s5 : 00000014 s6 : 502ba628 s7 : 502bb714
[   53.120000]  s8 : 00000020 s9 : 00000000 s10: 000003ef
[   53.120000]  s11: 00000000 t3 : 00000008 t4 : 00000000
[   53.130000]  t5 : 00000000 t6 : 502ba090
[   53.130000] sstatus: 00000020 sbadaddr: 00000004 scause: 0000000d

The faulting PC (0x73730 into libc-2.26.so) seems to be in _IO_str_seekoff; the disassembly around it is:

   73700:	00080c93          	mv	s9,a6
   73704:	00048a13          	mv	s4,s1
   73708:	000e0c13          	mv	s8,t3
   7370c:	000d8993          	mv	s3,s11
   73710:	010a0793          	addi	a5,s4,16
   73714:	00000d93          	li	s11,0
   73718:	00000e93          	li	t4,0
   7371c:	00800e13          	li	t3,8
   73720:	3ef00d13          	li	s10,1007
   73724:	02f12223          	sw	a5,36(sp)
   73728:	04092483          	lw	s1,64(s2)
   7372c:	71648463          	beq	s1,s6,73e34 <_IO_str_seekoff@@GLIBC_2.26+0x41bc>
   73730:	0044a783          	lw	a5,4(s1)

@kgugala

kgugala commented Mar 15, 2019

I checked the code, and it looks like everything has been pushed to GitHub.

As for the segfault: note that we had to re-implement the mapping code in Linux, and there are some hacks in the Vex MMU itself. This could be the reason for the segfault, as userspace starts using virtual memory very extensively.

For example, the whole kernel memory space is mapped directly and we bypass the MMU translation, see:
https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/plugin/MemoryTranslatorPlugin.scala#L116

The kernel range is defined in the MMU plugin instance: https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/TestsWorkspace.scala#L98

I'm pretty sure there are many bugs hidden there :)

@Dolu1990
Member

Dolu1990 commented Mar 15, 2019

OK, I will think about the best way to set up that test environment with the synchronised software golden model (to get maximum speed).
About the golden model, I will complete it (the MMU part). I can do the CSRs too, but probably the best would be for somebody other than me to cross-check my interpretation of the privileged spec, because if both the hardware and the software golden model implement the same wrong interpretation, that's not so helpful ^^.

@Dolu1990
Member

Dolu1990 commented Mar 15, 2019

@enjoy-digital
Maybe we can keep the actual regression test environment of VexRiscv, and just complete it with the required stuff.
It's a bit dirty, but it should be fine.
https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp

The golden model is currently there
https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp#L193

@enjoy-digital

@Dolu1990: in fact I already have a verilator simulation that is working fine; I just need to improve it a little to load the vmlinux.bin/vmlinux.dtb and initramdisk to RAM more easily. But yes, we'll use whatever is more convenient for you. I'll look at your regression env and your golden model.

@Dolu1990
Member

@enjoy-digital Can you show me the verilator testbench sources :D ?

@Dolu1990
Member

@kgugala Which CPU configuration are you using? Can you show me? (The test workspace you pointed to isn't using the caches or the MMU.)

@daveshah1

The config I am using is at https://github.com/daveshah1/VexRiscv-verilog/blob/linux/src/main/scala/vexriscv/GenCoreDefault.scala (which has a few small tweaks compared to @kgugala's, to skip over FENCEs for example).

@Dolu1990
Member

@enjoy-digital The checks between the golden model and the RTL are:

  • Register file writes
  • Peripheral accesses
  • Some liveness checks

It should be enough to find divergences fast.

@daveshah1 Jumping over FENCE instructions is probably fine for the moment. But jumping over FENCE.I instructions isn't: there is no cache coherency between the instruction cache and the data cache.

We need to use the cache flush :) Is that used in some way?

@Dolu1990
Member

(Memory coherency issues are something which is automatically caught by the golden model / RTL cross-checks)

@daveshah1

As it stands, it looks like all the memory has been set up as IO, which I suspect means the L1 caches won't be used at all - I think LiteX provides a single L2 cache.

Indeed, to get useful performance, proper use of caches and cache flushes will be needed.

@kgugala

kgugala commented Mar 16, 2019

yes, we disabled the caches as they were causing a lot of trouble. It didn't make sense to fight both the MMU and the caches at the same time

@Dolu1990
Member

@daveshah1 Ok ^^ One thing to know is that the instruction cache does not support IO instruction fetch; instead, it caches those accesses. (Supporting IO instruction fetch costs area, and isn't really a useful thing, as far as I know?)
So you still need to flush the instruction cache on FENCE.I. It could be done easily; a rough sketch follows.
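(A sketch of that emulation, assuming FENCE.I traps to M-mode as an illegal instruction; icache_flush() stands in for whatever flush mechanism the core actually exposes and is hypothetical here:)

    // Sketch: emulating FENCE.I from the M-mode illegal-instruction trap.
    // icache_flush() is a hypothetical hook, not a VexRiscv API.
    #include <cstdint>

    extern void icache_flush();                    // hypothetical hook

    static inline uint32_t read_mepc() {
        uint32_t v;
        asm volatile("csrr %0, mepc" : "=r"(v));
        return v;
    }
    static inline void write_mepc(uint32_t v) {
        asm volatile("csrw mepc, %0" :: "r"(v));
    }

    // Called from the M-mode trap vector when mcause signals an illegal
    // instruction; instr is the faulting instruction word.
    bool maybe_emulate_fence_i(uint32_t instr) {
        if (instr != 0x0000100f)                   // canonical FENCE.I
            return false;
        icache_flush();
        write_mepc(read_mepc() + 4);               // skip the instruction
        return true;
    }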

@kgugala The cacheless plugins aren't aware of the MMU.
I perfectly understand your point about avoiding the trouble of both at once. So my proposal is:

  • I port MMU support to the cacheless instruction and data plugins
  • We test things on that cacheless configuration
  • Later, when things are stable enough, we can introduce the caches via proper machine-mode FENCE.I emulation

So the roadmap would be:

  • Port MMU support into the cacheless plugins
  • Implement the cross-checked test environment
  • Test and fix stuff until it is stable enough
  • Introduce the caches in the loop with proper machine-mode emulation

@kgugala

kgugala commented Mar 16, 2019

TBH the real long-term solution will be to reimplement the MMU so it is fully compliant with the spec. Then we can get rid of the custom mapping code in Linux and restore the original mainline memory mapping code used for RV64.

I'm aware this will require a quite significant amount of work in Vex itself.

@Dolu1990
Member

I don't think it would require that much work. An MMU is a relatively easy piece of hardware.
I have to think about the cost, in terms of FPGA area, of a fully compliant MMU.

But what is the issue with a software-refilled MMU? If it uses machine mode to do the refill, it becomes transparent to the Linux kernel, right? So no Linux kernel modification would be required, just a piece of machine-mode code in addition to the raw Linux port :) ?

@daveshah1

Yes, I think an M-mode trap handler is the proper solution. We can probably use it to deal with any missing atomic instructions too.
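(As a sketch of the atomics part, the handler could decode and emulate a missing AMO in software, e.g. AMOADD.W as below; get_reg()/set_reg() are hypothetical accessors into the trapped hart's saved register frame, and address translation is ignored for simplicity:)

    // Sketch: emulating a missing AMOADD.W in the M-mode trap handler.
    // get_reg()/set_reg() are hypothetical helpers.
    #include <cstdint>

    extern uint32_t get_reg(int idx);              // hypothetical
    extern void set_reg(int idx, uint32_t v);      // hypothetical

    // Returns true if the instruction was recognised and emulated; the
    // caller then advances mepc by 4.
    bool emulate_amoadd_w(uint32_t instr) {
        uint32_t opcode = instr & 0x7f;
        uint32_t funct3 = (instr >> 12) & 0x7;
        uint32_t funct5 = instr >> 27;
        if (opcode != 0x2f || funct3 != 2 || funct5 != 0x00)
            return false;                          // only AMOADD.W here
        int rd  = (instr >> 7)  & 0x1f;
        int rs1 = (instr >> 15) & 0x1f;
        int rs2 = (instr >> 20) & 0x1f;
        volatile uint32_t *addr =
            (volatile uint32_t *)(uintptr_t)get_reg(rs1);
        uint32_t old = *addr;                      // single hart: no race
        *addr = old + get_reg(rs2);
        set_reg(rd, old);                          // rd receives the old value
        return true;
    }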

@Dolu1990
Member

(troll on)
We should not forget the ultimate goal: RISC-V Linux on an iCE40 1K, I'm sure #28 would agree ^.^
(troll off)

@kgugala

kgugala commented Mar 16, 2019

It just may be difficult to push the custom mapping code to the Linux mainline

@daveshah1

The trap handler need not sit in Linux at all; it can be part of the bootloader.

@ghost
Author

ghost commented Apr 30, 2019

Will report back when we actually start using it. (Unfortunately not open source.)
I guess this issue can be closed now?

@Dolu1990
Member

Sure, let us know how it goes, and please share the improvements/fixes if there are any :)

@enjoy-digital

It seems to be working fine on hardware too :)
https://asciinema.org/a/WfNA99RCdVi8kTPfzNTeoMTtY
https://github.com/enjoy-digital/linux-on-litex-vexriscv
Thanks @Dolu1990!

@kgugala

kgugala commented May 2, 2019

indeed, it works on hardware
great work!

@mithro
Contributor

mithro commented Sep 15, 2019

@Dolu1990 You might want to unpin this issue now?

@Dolu1990
Member

Right ^^

@Dolu1990 unpinned this issue Sep 15, 2019