Asking about system bus protocol and simulation instructions. #59
The busy signals work this way: the processor sets the address/data lines and pulses RSTRB or WMASK for exactly one clock cycle. In the cycle following the RSTRB or WMASK pulse, the processor checks WBUSY/RBUSY and, if high, waits until they go low again. Memory or peripherals can therefore raise RBUSY or WBUSY one clock cycle after they receive the request. If no busy signaling is necessary, peripherals can ignore these lines, which can even be tied low permanently. The processor holds the address/data lines stable as long as the busy lines are set.
Oh well, yes, we certainly did experiment with different internal adder/mux combinations and optimised Quark for size! Follow the complete history here: #1
Quark is the result of the most useful compromise that fits into the Icestick with its HX1K FPGA.
If you want something faster than Quark in terms of maximum frequency, choose Tachyon: https://github.com/BrunoLevy/learn-fpga/blob/master/FemtoRV/RTL/PROCESSOR/femtorv32_tachyon.v
If you want something faster than Quark in terms of cycles per instruction, which gains more performance at the cost of size, try these: http://mecrisp.sourceforge.net/2022_02_11-femtorv32_quark_barrel_shifter.v
The processor reads data from peripherals/memories one cycle after the RSTRB pulse, or later, once RBUSY goes low again.
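To make the read handshake concrete, here is a minimal cycle-level sketch in Python, not RTL. The `SlowPeripheral` class, its `wait_cycles` parameter and the `cpu_read` helper are hypothetical names for illustration. The model only follows the read-side rules described above: a one-cycle RSTRB pulse, RBUSY checked from the following cycle on, and the address held stable while busy. The write side with WMASK/WBUSY would presumably work the same way.

```python
from dataclasses import dataclass


@dataclass
class SlowPeripheral:
    """Hypothetical peripheral that needs `wait_cycles` extra cycles per read.

    wait_cycles=0 behaves like a peripheral with RBUSY tied low.
    """
    memory: dict
    wait_cycles: int
    rbusy: bool = False
    rdata: int = 0
    _countdown: int = 0
    _addr: int = 0

    def clock(self, addr: int, rstrb: bool) -> None:
        """One rising clock edge; rbusy/rdata are what the CPU sees next cycle."""
        if rstrb:                      # request accepted: latch the address
            self._addr = addr
            self._countdown = self.wait_cycles
        elif self._countdown > 0:      # still working on the request
            self._countdown -= 1
        self.rbusy = self._countdown > 0
        if not self.rbusy:             # data is valid whenever RBUSY is low
            self.rdata = self.memory.get(self._addr, 0)


def cpu_read(periph: SlowPeripheral, addr: int) -> tuple[int, int]:
    """Return (data, cycles): cycles counts clock edges from the RSTRB pulse
    until the cycle in which data is sampled (1 means no wait states)."""
    # Pulse cycle: address driven, RSTRB high for exactly one clock.
    periph.clock(addr, rstrb=True)
    cycles = 1
    # Following cycles: RSTRB low, address held stable while RBUSY is high.
    while periph.rbusy:
        periph.clock(addr, rstrb=False)
        cycles += 1
    # Data is sampled in the first cycle where RBUSY is low.
    return periph.rdata, cycles


mem = {0x100: 0xDEADBEEF}
print(cpu_read(SlowPeripheral(mem, wait_cycles=0), 0x100))  # (3735928559, 1)
print(cpu_read(SlowPeripheral(mem, wait_cycles=2), 0x100))  # (3735928559, 3)
```

With `wait_cycles=0` the peripheral never raises RBUSY, which corresponds to tying the line low: the data is ready one cycle after the pulse.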
@BrunoLevy Shall I prepare a pull request to add the Quark variant with a barrel shifter, and the Quark variant with a barrel shifter and two-cycle operation, perhaps to RTL/PROCESSOR/EXPERIMENTAL? Individua is still WIP, but these two are ready :-)
Hi @Mecrisp, I've added it to RTL/PROCESSOR (all of them are experimental, in fact :-)). It's super cool, love it!
@Mecrisp, regarding how to include your new versions, it's as you prefer: I can include or update them when you tell me, but pull requests would certainly make it simpler.
I integrated …
I suspect the …
Still, I wonder if you are running …
I also tried to update …
Thanks for your efforts! The failures in almost all alignment edge cases are no surprise to me, as misaligned accesses are handled internally in a very small but quirky way. But I am really surprised by the failures in auipc and jal. Could you elaborate on this and provide more data, insight or logs on what exactly is going wrong, please? Maybe this is due to our limited ADDR_WIDTH; did you set it to 32 bits for testing? By default the address bus is only 24 bits wide, which is non-conformant.
I looked at the code now, and the errors seem to be some kind of sign extension issue.
Set parameter ADDR_WIDTH to 32.
Yes, there is a good chance it will fix the problem! (We created an optionally smaller internal address bus and address adder to save LUTs on smaller FPGAs such as the iCE40 HX1K, but then it does not strictly follow the specification.)
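A rough illustration of how a 24-bit address path could show up as a "sign extension" failure in the auipc tests. This Python sketch assumes that the narrow datapath computes PC + immediate on ADDR_WIDTH bits and zero-fills the upper bits of the 32-bit result. I have not verified this against femtorv32_quark.v, and both function names are hypothetical.

```python
MASK32 = (1 << 32) - 1


def auipc_reference(pc: int, imm20: int) -> int:
    """RV32I AUIPC: rd = pc + (imm20 << 12), wrapped to 32 bits."""
    return (pc + (imm20 << 12)) & MASK32


def auipc_narrow(pc: int, imm20: int, addr_width: int) -> int:
    """Hypothetical narrow datapath: the sum is computed on addr_width bits
    and the upper 32 - addr_width bits are zero-filled, not sign-extended."""
    mask = (1 << addr_width) - 1
    return ((pc & mask) + (imm20 << 12)) & mask


pc, imm20 = 0x100, 0xFFFFF                # imm20 = 0xFFFFF is an offset of -4096
print(hex(auipc_reference(pc, imm20)))    # 0xfffff100
print(hex(auipc_narrow(pc, imm20, 24)))   # 0xfff100 (top byte lost)
print(hex(auipc_narrow(pc, imm20, 32)))   # 0xfffff100
```

Under this assumption, a negative offset that produces a result above the 24-bit range loses its upper ones, while ADDR_WIDTH = 32 matches the reference.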
I assume the sign extension of the immediate is not properly propagated to the result. To add the test environment to FemtoRV, I would need a step-by-step walk-through of the current Verilator testbench scripts. I would probably have to modify the Verilator testbench code that handles memory loads to use …
The configuration used for the testbench is defined in this file: …
See also this file: …
To configure the firmware for this core: …
Then: …
where …
Then, to start the simulation: …
If everything goes well, you will see some raytracing computed by the RV32F core.
What I am asking for is some detail on how to use the system bus backpressure signals, so I can integrate Quark into my FPGA synthesis flow and Verilator simulation (ISA tests). I will get through it anyway, but I like asking in advance so I have a communication channel if I get stuck.
I am developing my own RISC-V CPU in SystemVerilog, and I would like to compare it against femtorv32_quark.v.
I used an unusual coding style for the instruction decoder, which is causing logic consumption and timing issues. I am unable to predict which decoder change will bring improvements; what I think would be an optimization often results in worse area and timing. I have not tried one-hot decoding yet, since I forgot about it after many years of working on designs where optimization was not a priority.
Comparing with Quark, which is coded in a more conventional style, would help me understand and debug those issues.
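For reference, one-hot decoding means producing one independent flag per instruction class instead of a binary-encoded class number that has to be decoded again downstream. The sketch below models this in Python rather than SystemVerilog. The opcode[6:2] values follow the RV32I base opcode map; the class labels and `decode_one_hot` are hypothetical names of my own, not taken from any RTL.

```python
# opcode[6:2] for each RV32I major opcode class (opcode[1:0] is always 11).
OPCODE_CLASSES = {
    "LOAD": 0b00000,
    "MISC_MEM": 0b00011,
    "OP_IMM": 0b00100,
    "AUIPC": 0b00101,
    "STORE": 0b01000,
    "OP": 0b01100,
    "LUI": 0b01101,
    "BRANCH": 0b11000,
    "JALR": 0b11001,
    "JAL": 0b11011,
    "SYSTEM": 0b11100,
}


def decode_one_hot(instr: int) -> dict[str, bool]:
    """One independent flag per class. In hardware each flag would be a
    separate 5-bit equality compare, and datapath muxes would use the flags
    directly as select lines."""
    op = (instr >> 2) & 0b11111
    return {name: op == code for name, code in OPCODE_CLASSES.items()}


flags = decode_one_hot(0x00100093)                  # addi x1, x0, 1
print([name for name, on in flags.items() if on])   # ['OP_IMM']
```

Because the class codes are distinct, at most one flag is set for any instruction, which is what lets each flag drive a mux input without a priority chain.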
For now I am using FPGA vendor synthesis tools because of the SystemVerilog requirements, but I will try Yosys with its SystemVerilog enhancements after I fix the major issues. My final target is to use open-source tools for both FPGA and ASIC.
If you are still reading, I have a few questions and comments regarding the Quark RTL, which, I must say after reviewing it, has a low WTF-per-line-of-code ratio :).
I noticed that separate adders are used for addition and subtraction, and the two results are then multiplexed. There is also a separate adder for the load/store address and, as far as I remember, a separate PC incrementer and an adder for branch/jump targets.
Did you try different adder configurations to optimize timing and FPGA resource utilization?
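One possible reason to keep subtraction on its own adder is that a 33-bit difference yields all branch comparisons from a single carry chain. This is a Python sketch of that technique, assuming unsigned 32-bit operands; I have not checked that Quark wires its subtractor exactly this way, and `alu_minus`/`branch_flags` are illustrative names.

```python
MASK32 = (1 << 32) - 1
MASK33 = (1 << 33) - 1


def alu_minus(a: int, b: int) -> int:
    """33-bit a - b, formed as {1'b0, a} + {1'b1, ~b} + 1 (a, b unsigned 32-bit)."""
    return (a + ((1 << 32) | (~b & MASK32)) + 1) & MASK33


def branch_flags(a: int, b: int) -> dict[str, bool]:
    """EQ, LT (signed) and LTU (unsigned) derived from one subtraction."""
    diff = alu_minus(a, b)
    ltu = bool(diff >> 32)                 # bit 32 set <=> a < b unsigned
    sign_a, sign_b = a >> 31, b >> 31
    # Different signs: the negative operand is smaller; same signs: LTU applies.
    lt = bool(sign_a) if sign_a != sign_b else ltu
    eq = (diff & MASK32) == 0
    return {"EQ": eq, "LT": lt, "LTU": ltu}


print(branch_flags(1, 2))           # {'EQ': False, 'LT': True, 'LTU': True}
print(branch_flags(0xFFFFFFFF, 1))  # {'EQ': False, 'LT': True, 'LTU': False}
```

The extra 33rd bit is what turns the borrow into the unsigned comparison, so SLT/SLTU and all six branch conditions can share this one subtractor.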
While I have problems optimizing my unusual instruction decoder, I had some success optimizing adders and datapath multiplexers. My CPU is designed to execute all instructions in a single clock cycle without exceptions. At a minimum I need 2 adders, one for PC increment and branch, and another for everything else, ADD/SUB, branch compare, load/store address and jumps. I was able to see some good area vs. timing compromises by adding extra adders for each immediate value (branch, load, store).
I will run some of those optimization experiments on Quark too and will give you some feedback.