
Linux on VexRiscv #60

Closed
ghost opened this issue Feb 28, 2019 · 345 comments

@ghost

ghost commented Feb 28, 2019

My intention with creating this issue is collecting/sharing information and gauging interest in running Linux on VexRiscv. From what I know, VexRiscv is still missing functionality, and it won't work out of the box.

A big problem is the MMU. Ideally, "someone" will hopefully write patches to add no-MMU support to Linux/RISC-V, but currently an MMU is required. It appears VexRiscv has a partial MMU implementation using a software-filled TLB. There needs to be machine-mode code to walk the page tables and fill the TLB entries, and I didn't find a reference implementation of that.
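(For illustration only, here is a minimal sketch of what such machine-mode refill code could look like for Sv32. tlb_write() stands in for whatever core-specific mechanism inserts a translation, and phys_read32() for an untranslated physical read; neither is an existing VexRiscv API.)

    // Hedged sketch: software TLB refill from an M-mode trap handler.
    // Walks the two-level Sv32 page table rooted at satp and installs
    // the leaf PTE. tlb_write()/phys_read32() are hypothetical helpers.
    #include <cstdint>

    extern void tlb_write(uint32_t vaddr, uint32_t pte);   // hypothetical
    extern uint32_t phys_read32(uint32_t paddr);           // hypothetical

    static inline uint32_t read_satp() {
        uint32_t v;
        asm volatile("csrr %0, satp" : "=r"(v));
        return v;
    }

    // Returns true if a valid leaf PTE was found and installed.
    bool sv32_refill(uint32_t fault_vaddr) {
        uint32_t table = (read_satp() & 0x003fffff) << 12; // root table PA
        for (int level = 1; level >= 0; --level) {
            uint32_t vpn = (fault_vaddr >> (12 + 10 * level)) & 0x3ff;
            uint32_t pte = phys_read32(table + vpn * 4);
            if (!(pte & 1))
                return false;              // V=0: genuine page fault
            if (pte & 0xe) {               // R/W/X set: leaf (4 KiB or 4 MiB)
                tlb_write(fault_vaddr, pte);
                return true;
            }
            table = (pte >> 10) << 12;     // pointer to next-level table
        }
        return false;
    }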

Another issue is atomics. Linux currently requires them. There seems to be partial support present in VexRiscv (a subset or so). Another possibility is patching the kernel not to use atomics when built without SMP support. There's also the question of how much atomics support userspace typically requires.
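(For context, the kernel's atomics compile down to LR/SC retry loops like the minimal example below, so a single-hart "dummy" reservation is enough to make them work; nothing here is VexRiscv-specific.)

    // Minimal example of the LR/SC pattern Linux relies on (RV32A, GCC
    // inline asm). On a single-hart core a trivial local reservation
    // makes the store-conditional succeed on the first try.
    #include <cstdint>

    uint32_t atomic_fetch_add(volatile uint32_t *p, uint32_t inc) {
        uint32_t old, sum, fail;
        asm volatile(
            "1: lr.w    %0, (%3)\n"      // load-reserved current value
            "   add     %1, %0, %4\n"    // compute old + inc
            "   sc.w    %2, %1, (%3)\n"  // store-conditional; 0 on success
            "   bnez    %2, 1b\n"        // retry if the reservation was lost
            : "=&r"(old), "=&r"(sum), "=&r"(fail)
            : "r"(p), "r"(inc)
            : "memory");
        return old;
    }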

Without doubt there are more issues that I don't know about.

Antmicro apparently made a Linux port: https://github.com/antmicro/litex-rv32-linux-system https://github.com/antmicro/litex-linux-riscv
I didn't know about this before and haven't managed to build the whole thing yet.
Unfortunately, their Linux kernel repository does not include the git history. Here's a diff against the apparent base: https://0x0.st/z-li.diff

Please post any other information you know.

@Dolu1990
Member

About atomics, there is some support in VexRiscv to provide LR/SC in a local way; it only works for single-CPU systems.

@ghost
Author

ghost commented Feb 28, 2019

Yeah, "dummy" implementations that work on single CPU systems should be perfectly fine.

@enjoy-digital

As discussed at the Free Silicon Conference together with @Dolu1990, we are also working on it here:
enjoy-digital/litex#134.

We can continue the discussion here for the CPU aspect. @daveshah1: I saw you made some progress;
just for info, @Dolu1990 is OK to help getting things working. So if you see strange things or need help with things related to Spinal/VexRiscv, you can discuss your findings here.

@daveshah1

My current status is that I have made quite a few hacks to the kernel, VexRiscv and LiteX, but I'm still only just getting into userspace and not anywhere useful yet.

VexRiscv: https://github.com/daveshah1/VexRiscv/tree/Supervisor
Build config: https://github.com/daveshah1/VexRiscv-verilog/tree/linux
LiteX: https://github.com/daveshah1/litex/tree/vexriscv-linux
kernel: https://github.com/daveshah1/litex-linux-riscv

@Dolu1990 I would be interested if you could look at 818f1f6 - loads were always reading 0xffffffff from virtual memory addresses when bit 10 of the offset (0x400) was set. This seems to fix it, but I'm not sure if a better fix is possible

As it stands, the current issue is a kernel panic "Oops - environment call from S-mode" shortly after init starts. It seems that after a few syscalls it either isn't returning properly to userspace, or a spurious ECALL is accidentally triggered while in S-mode (it might be the ECALL getting "stuck" somewhere and lurking, so that what should be an IRQ triggers the ECALL instead).

@Dolu1990
Member

Hi @daveshah1 @enjoy-digital :D

So, for sure we will hit bugs in VexRiscv, as only the machine mode was properly tested.
Things not tested enough in VexRiscv which could have bugs:

  • Supervisor / User mode
  • MMU

I think the best would be to set up a minimal test environment to run Linux on. It would save us a lot of time and sanity, especially for a Linux port project :D
So, to distinguish hardware bugs from software bugs, my proposal is that I set up a minimalistic environment where only the VexRiscv CPU is simulated and compared against an instruction-synchronised software model of the CPU (I already have one which does that, but CSRs are missing from it).
This would point out exactly where the hardware diverges from what it should do, and bring serenity to the development ^.^

Does that sound good to you?
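(To make the idea concrete, here is a rough sketch of such a cross-checked loop in the style of the existing C++ regression bench; Dut and GoldenModel are placeholder types, not the actual classes from main.cpp.)

    // Sketch of the lockstep cross-check: tick the RTL under Verilator,
    // and every time the DUT commits an instruction, step the ISA golden
    // model once and compare architectural effects.
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    struct GoldenModel {
        void step();                      // execute exactly one instruction
        uint32_t lastRfWriteAddr = 0;     // architectural effects to compare
        uint32_t lastRfWriteData = 0;
    };

    struct Dut {
        void tick();                      // advance the RTL one clock cycle
        bool commitValid();               // did an instruction retire?
        uint32_t rfWriteAddr();
        uint32_t rfWriteData();
    };

    void crossCheck(Dut &dut, GoldenModel &ref, uint64_t maxCycles) {
        for (uint64_t cycle = 0; cycle < maxCycles; ++cycle) {
            dut.tick();
            if (!dut.commitValid())
                continue;
            ref.step();                   // keep the model in lockstep
            if (dut.rfWriteAddr() != ref.lastRfWriteAddr ||
                dut.rfWriteData() != ref.lastRfWriteData) {
                std::printf("divergence at cycle %llu\n",
                            (unsigned long long)cycle);
                std::exit(1);             // stop at the first divergence
            }
        }
    }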

@daveshah1

daveshah1 commented Mar 15, 2019

That sounds very sensible! The minimal peripheral requirement is low: just a timer (right now I have the LiteX timer connected to the timerInterruptS pin, and hacked the kernel to talk to it directly rather than setting up a timer via the proper SBI route) and a UART of some kind.

My only concern with this is speed: right now it is taking about 30s on hardware at 75MHz to get to the point of failure. So I definitely want to use Verilator and not iverilog...

@enjoy-digital

I can easily set up a verilator simulation. But 30s on hardware at 75MHz will still be a bit slow: we can expect around 1MHz execution speed, and 30 s × 75 MHz ≈ 2.25 billion cycles, so that's still around 40 min at 1 MHz...

@daveshah1

I did just manage to make a bit of progress on hardware (perhaps this talk of simulators is scaring it into behaviour 😄)

It does reach userspace successfully, so we can almost say Linux is working. If I set /bin/sh as init, then I can even use shell builtins - being able to run echo hello world counts as Linux, right? (But calls to other programs don't seem to work.) init itself is segfaulting deep within libc, so there's still something fishy, but it could just be a dodgy rootfs.

@kgugala

kgugala commented Mar 15, 2019

@daveshah1 this is great. The libc segfault also happened in our Renode (https://github.com/renode/renode) emulation. Can you share the rootfs you're using?

@daveshah1

initramdisk.gz

This is the initramdisk from antmicro/litex-linux-readme with a small change to inittab to remove some references to files that don't exist

In terms of other outstanding issues, I also had to patch VexRiscv so that interrupts are routed to S-mode rather than M-mode. This broke the LiteX BIOS, which expects M-mode interrupts, so I had to patch that to not expect interrupts at all, but that means there is now no useful UART output from the BIOS. I think a proper solution would be to select the interrupt privilege dynamically somehow.
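(For reference, the privileged spec already provides that dynamic selection: the mideleg/medeleg CSRs let M-mode boot code decide which interrupts and exceptions are delegated to S-mode. A minimal sketch, assuming the core implements delegation:)

    // Sketch: M-mode boot code delegating supervisor interrupts and the
    // common exceptions to S-mode via mideleg/medeleg, so the routing is
    // decided by software instead of being hardwired in the core.
    #include <cstdint>

    static inline void write_mideleg(uint32_t v) {
        asm volatile("csrw mideleg, %0" :: "r"(v));
    }
    static inline void write_medeleg(uint32_t v) {
        asm volatile("csrw medeleg, %0" :: "r"(v));
    }

    void delegate_to_smode() {
        // Bits 1/5/9: supervisor software, timer and external interrupts.
        write_mideleg((1u << 1) | (1u << 5) | (1u << 9));
        // Bits 12/13/15: instruction/load/store page faults;
        // bit 8: ecall from U-mode. All handled by the S-mode kernel.
        write_medeleg((1u << 12) | (1u << 13) | (1u << 15) | (1u << 8));
    }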

@kgugala

kgugala commented Mar 15, 2019

We had to fix/work around the IRQ delegation. I think this code should be in our repo, but I'll check that again.

@daveshah1

The segfault I see is:

[   53.060000] getty[45]: unhandled signal 11 code 0x1 at 0x00000004 in libc-2.26.so[5016f000+148000]
[   53.070000] CPU: 0 PID: 45 Comm: getty Not tainted 4.19.0-rc4-gb367bd23-dirty #105
[   53.080000] sepc: 501e2730 ra : 501e2e1c sp : 9f9b2c60
[   53.080000]  gp : 00120800 tp : 500223a0 t0 : 5001e960
[   53.090000]  t1 : 00000000 t2 : ffffffff s0 : 00000000
[   53.090000]  s1 : 00000000 a0 : 00000000 a1 : 502ba624
[   53.100000]  a2 : 00000000 a3 : 00000000 a4 : 000003ef
[   53.100000]  a5 : 00000160 a6 : 00000000 a7 : 0000270f
[   53.110000]  s2 : 502ba5f4 s3 : 00000000 s4 : 00000150
[   53.110000]  s5 : 00000014 s6 : 502ba628 s7 : 502bb714
[   53.120000]  s8 : 00000020 s9 : 00000000 s10: 000003ef
[   53.120000]  s11: 00000000 t3 : 00000008 t4 : 00000000
[   53.130000]  t5 : 00000000 t6 : 502ba090
[   53.130000] sstatus: 00000020 sbadaddr: 00000004 scause: 0000000d

The faulting PC (0x73730 into libc-2.26.so) seems to be in _IO_str_seekoff; the disassembly around it is:

   73700:	00080c93          	mv	s9,a6
   73704:	00048a13          	mv	s4,s1
   73708:	000e0c13          	mv	s8,t3
   7370c:	000d8993          	mv	s3,s11
   73710:	010a0793          	addi	a5,s4,16
   73714:	00000d93          	li	s11,0
   73718:	00000e93          	li	t4,0
   7371c:	00800e13          	li	t3,8
   73720:	3ef00d13          	li	s10,1007
   73724:	02f12223          	sw	a5,36(sp)
   73728:	04092483          	lw	s1,64(s2)
   7372c:	71648463          	beq	s1,s6,73e34 <_IO_str_seekoff@@GLIBC_2.26+0x41bc>
   73730:	0044a783          	lw	a5,4(s1)

@kgugala

kgugala commented Mar 15, 2019

I checked the code, and it looks like everything has been pushed to GitHub.

As for the segfault: note that we had to re-implement the mapping code in Linux, and there are some hacks in the Vex MMU itself. This could be the reason for the segfault, as userspace starts using virtual memory very extensively.

For example, the whole kernel memory space is mapped directly and we bypass the MMU translation, see:
https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/plugin/MemoryTranslatorPlugin.scala#L116

The kernel range is defined in the MMU plugin instance: https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/TestsWorkspace.scala#L98

I'm pretty sure there are many bugs hidden there :)

@Dolu1990
Member

Dolu1990 commented Mar 15, 2019

OK, I will think about the best way to set up that test environment with the synchronised software golden model (to get maximum speed).
About the golden model, I will complete it (the MMU part). I can do the CSRs too, but probably the best would be for somebody other than me to cross-check my interpretation of the privileged spec, because if both the hardware and the software golden model implement the same wrong interpretation, that's not so helpful ^^.

@Dolu1990
Member

Dolu1990 commented Mar 15, 2019

@enjoy-digital
Maybe we can keep the actual regression test environment of VexRiscv, and just complete it with the required stuff.
It's a bit dirty, but it should be fine.
https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp

The golden model is currently there
https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp#L193

@enjoy-digital

@Dolu1990: in fact I already have a verilator simulation that is working fine; I just need to improve it a little to load the vmlinux.bin/vmlinux.dtb and initramdisk to RAM more easily. But yes, we'll use whatever is more convenient for you. I'll look at your regression env and your golden model.

@Dolu1990
Member

@enjoy-digital Can you show me the verilator testbench sources :D ?

@Dolu1990
Member

@kgugala Which CPU configuration are you using? Can you show me? (The test workspace you pointed to isn't using the caches or the MMU.)

@daveshah1

The config I am using is at https://github.com/daveshah1/VexRiscv-verilog/blob/linux/src/main/scala/vexriscv/GenCoreDefault.scala (which has a few small tweaks compared to @kgugala's, to skip over FENCEs for example).

@Dolu1990
Member

@enjoy-digital The checks between the golden model and the RTL are:

  • Register file writes
  • Peripheral accesses
  • Some liveness checks

It should be enough to find divergences fast.

@daveshah1 Jumping over FENCE instructions is probably fine for the moment. But jumping over FENCE.I instructions isn't: there is no cache coherency between the instruction cache and the data cache.

We need to use the cache flush :) Is that used in some way?

@Dolu1990
Member

(Memory coherency issues are something which is automatically caught by the golden model / RTL cross-checks)

@daveshah1

As it stands, it looks like all the memory has been set up as IO, which I suspect means the L1 caches won't be used at all - I think LiteX provides a single L2 cache.

Indeed, to get useful performance, proper use of caches and cache flushes will be needed.

@kgugala

kgugala commented Mar 16, 2019

yes, we disabled the caches as they were causing a lot of trouble. It didn't make sense to fight both the MMU and the caches at the same time

@Dolu1990
Member

@daveshah1 Ok ^^ One thing to know is that the instruction cache does not support IO instruction fetch; instead, it caches those accesses. (Supporting IO instruction fetch costs area, and isn't really a useful thing, as far as I know?)
So you still need to flush the instruction cache on FENCE.I. It could be done easily; a rough sketch follows.
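(A sketch of that emulation, assuming FENCE.I traps to M-mode as an illegal instruction; icache_flush() stands in for whatever flush mechanism the core actually exposes and is hypothetical here:)

    // Sketch: emulating FENCE.I from the M-mode illegal-instruction trap.
    // icache_flush() is a hypothetical hook, not a VexRiscv API.
    #include <cstdint>

    extern void icache_flush();                    // hypothetical hook

    static inline uint32_t read_mepc() {
        uint32_t v;
        asm volatile("csrr %0, mepc" : "=r"(v));
        return v;
    }
    static inline void write_mepc(uint32_t v) {
        asm volatile("csrw mepc, %0" :: "r"(v));
    }

    // Called from the M-mode trap vector when mcause signals an illegal
    // instruction; instr is the faulting instruction word.
    bool maybe_emulate_fence_i(uint32_t instr) {
        if (instr != 0x0000100f)                   // canonical FENCE.I
            return false;
        icache_flush();
        write_mepc(read_mepc() + 4);               // skip the instruction
        return true;
    }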

@kgugala The cacheless plugins aren't aware of the MMU.
I perfectly understand your point about avoiding the trouble of both at once. So my proposal is:

  • I port MMU support to the cacheless instruction and data plugins
  • We test things on that cacheless configuration
  • Later, when things are stable enough, we can introduce the caches via proper machine-mode FENCE.I emulation

So the roadmap would be:

  • Port MMU support into the cacheless plugins
  • Implement the cross-checked test environment
  • Test and fix stuff until it is stable enough
  • Introduce the caches in the loop with proper machine-mode emulation

@kgugala

kgugala commented Mar 16, 2019

TBH the real long-term solution will be to reimplement the MMU so it is fully compliant with the spec. Then we can get rid of the custom mapping code in Linux and restore the original mainline memory mapping code used for RV64.

I'm aware this will require a quite significant amount of work in Vex itself.

@Dolu1990
Member

I don't think it would require that much work. An MMU is a relatively easy piece of hardware.
I have to think about the cost, in terms of FPGA area, of a fully compliant MMU.

But what is the issue with a software-refilled MMU? If it uses machine mode to do the refill, it becomes transparent to the Linux kernel, right? So no Linux kernel modification would be required, just a piece of machine-mode code in addition to the raw Linux port :) ?

@daveshah1

Yes, I think an M-mode trap handler is the proper solution. We can probably use it to deal with any missing atomic instructions too.
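(As a sketch of the atomics part, the handler could decode and emulate a missing AMO in software, e.g. AMOADD.W as below; get_reg()/set_reg() are hypothetical accessors into the trapped hart's saved register frame, and address translation is ignored for simplicity:)

    // Sketch: emulating a missing AMOADD.W in the M-mode trap handler.
    // get_reg()/set_reg() are hypothetical helpers.
    #include <cstdint>

    extern uint32_t get_reg(int idx);              // hypothetical
    extern void set_reg(int idx, uint32_t v);      // hypothetical

    // Returns true if the instruction was recognised and emulated; the
    // caller then advances mepc by 4.
    bool emulate_amoadd_w(uint32_t instr) {
        uint32_t opcode = instr & 0x7f;
        uint32_t funct3 = (instr >> 12) & 0x7;
        uint32_t funct5 = instr >> 27;
        if (opcode != 0x2f || funct3 != 2 || funct5 != 0x00)
            return false;                          // only AMOADD.W here
        int rd  = (instr >> 7)  & 0x1f;
        int rs1 = (instr >> 15) & 0x1f;
        int rs2 = (instr >> 20) & 0x1f;
        volatile uint32_t *addr =
            (volatile uint32_t *)(uintptr_t)get_reg(rs1);
        uint32_t old = *addr;                      // single hart: no race
        *addr = old + get_reg(rs2);
        set_reg(rd, old);                          // rd receives the old value
        return true;
    }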

@Dolu1990
Member

(troll on)
We should not forget the ultimate goal: RISC-V Linux on an iCE40 1K, I'm sure #28 would agree ^.^
(troll off)

@kgugala

kgugala commented Mar 16, 2019

It just may be difficult to push the custom mapping code to the Linux mainline

@daveshah1

The trap handler need not sit in Linux at all; it can be part of the bootloader.

@ghost
Author

ghost commented Apr 30, 2019

Will report back when we actually start using it. (Unfortunately not open source.)
I guess this issue can be closed now?

@Dolu1990
Member

Sure, let us know how it goes, and please share the improvements/fixes if there are any :)

@enjoy-digital

It seems to be working fine on hardware too :)
https://asciinema.org/a/WfNA99RCdVi8kTPfzNTeoMTtY
https://github.com/enjoy-digital/linux-on-litex-vexriscv
Thanks @Dolu1990!

@kgugala

kgugala commented May 2, 2019

indeed, it works on hardware
great work!

@mithro
Contributor

mithro commented Sep 15, 2019

@Dolu1990 You might want to unpin this issue now?

@Dolu1990
Member

Right ^^

@Dolu1990 unpinned this issue Sep 15, 2019