1. Overview
The NEORV32[1] is an open-source RISC-V compatible processor system that is intended as ready-to-go auxiliary processor within a larger SoC designs or as stand-alone custom / customizable microcontroller.
The system is highly configurable and provides optional common peripherals like embedded memories, timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb compatible on-chip debugger accessible via JTAG.
Special focus is paid on execution safety to provide defined and predictable behavior at any time. Therefore, the CPU ensures that all memory access are acknowledged and no invalid/malformed instructions are executed. Whenever an unexpected situation occurs, the application code is informed via hardware exceptions.
The software framework of the processor comes with application makefiles, software libraries for all CPU and processor features, a bootloader, a runtime environment and several example programs - including a port of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as default toolchain (prebuilt toolchains are also provided).
Check out the processor’s online User Guide that provides hands-on tutorial to get you started. |
The project’s change log is available in CHANGELOG.md in the root directory of the NEORV32 repository. Please also check out the Legal section. |
Structure
Links in this document are highlighted. |
1.1. Rationale
Why did you make this?
I am fascinated by processor and CPU architecture design: it is the magic frontier where software meets hardware. This project has started as something like a journey into this magic realm to understand how things actually work down on this very low level.
But there is more! When I started to dive into the emerging RISC-V ecosystem I felt overwhelmed by the complexity. As a beginner it is hard to get an overview - especially when you want to setup a minimal platform to tinker with: Which core to use? How to get the right toolchain? What features do I need? How does the booting work? How do I create an actual executable? How to get that into the hardware? How to customize things? Where to start???
So this project aims to provides a simple to understand and easy to use yet powerful and flexible platform that targets FPGA and RISC-V beginners as well as advanced users. Join me and us on this journey! 🙃
Why a soft-core processor?
As a matter of fact soft-core processors cannot compete with discrete or FPGA hard-macro processors in terms of performance, energy and size. But they do fill a niche in FPGA design space. For example, soft-core processors allow to implement the control flow part of certain applications (like communication protocol handling) using software like plain C. This provides high flexibility as software can be easily changed, re-compiled and re-uploaded again.
Furthermore, the concept of flexibility applies to all aspects of a soft-core processor. The user can add exactly the features that are required by the application: additional memories, custom interfaces, specialized IP and even user-defined instructions.
Why RISC-V?
RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.
https://riscv.org/about/
I love the idea of open-source. Knowledge can help best if it is freely available. While open-source has already become quite popular in software, hardware projects still need to catch up. Admittedly, there has been quite a development, but mainly in terms of platforms and applications (so schematics, PCBs, etc.). Although processors and CPUs are the heart of almost every digital system, having a true open-source silicon is still a rarity. RISC-V aims to change that. Even it is just one approach, it helps paving the road for future development.
Furthermore, I welcome the community aspect of RISC-V. The ISA and everything beyond is developed with direct contact to the community: this includes businesses and professionals but also hobbyist, amateurs and people that are just curious. Everyone can join discussions and contribute to RISC-V in their very own way.
Finally, I really like the RISC-V ISA itself. It aims to be a clean, orthogonal and "intuitive" ISA that resembles with the basic concepts of RISC: simple yet effective.
Yet another RISC-V core? What makes it special?
The NEORV32 is not based on another RISC-V core. It was build entirely from ground up (just following the official ISA specs) having a different design goal in mind. The project does not intend to replace certain RISC-V cores or just beat existing ones like VexRISC in terms of performance or SERV in terms of size.
The project aims to provide another option in the RISC-V / soft-core design space with a different performance vs. size trade-off and a different focus: embrace concepts like documentation, platform-independence / portability, RISC-V compatibility, customization and ease of use. See the Project Key Features below.
Furthermore, the NEORV32 pays special focus on execution safety using Full Virtualization. The CPU aims to provide fall-backs for everything that could go wrong. This includes malformed instruction words, privilege escalations and even memory accesses that are checked for address space holes and deterministic response times from memory-mapped devices. Precise exceptions allow a defined and fully-synchronized state of the CPU at every time.
1.2. Project Key Features
-
open-source and documented; including user guides to get started
-
completely described in behavioral, platform-independent VHDL (yet platform-optimized modules are provided)
-
fully synchronous design, no latches, no gated clocks
-
small hardware footprint and high operating frequency for easy integration
-
NEORV32 CPU: 32-bit
rv32i
RISC-V CPU-
RISC-V compatibility: passes the official architecture tests
-
base architecture + privileged architecture (optional) + ISA extensions (optional)
-
rich set of customization options (ISA extensions, design goal: performance / area (/ energy), …)
-
aims to support Full Virtualization capabilities (CPU and SoC) to increase execution safety
-
official RISC-V open source architecture ID
-
-
NEORV32 Processor (SoC): highly-configurable full-scale microcontroller-like processor system
-
based on the NEORV32 CPU
-
optional serial interfaces (UARTs, TWI, SPI)
-
optional timers and counters (WDT, MTIME)
-
optional general purpose IO and PWM and native NeoPixel (c) compatible smart LED interface
-
optional embedded memories / caches for data, instructions and bootloader
-
optional external memory interface (Wishbone / AXI4-Lite) and stream link interface (AXI4-Stream) for custom connectivity
-
on-chip debugger compatible with OpenOCD and gdb
-
-
Software framework
-
GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles
-
internal bootloader with serial user interface
-
core libraries for high-level usage of the provided functions and peripherals
-
runtime environment and several example programs
-
doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
-
FreeRTOS port + demos available
-
For more in-depth details regarding the feature provided by he hardware see the according sections: NEORV32 Central Processing Unit (CPU) and NEORV32 Processor (SoC). |
1.3. Project Folder Structure
neorv32 - Project home folder │ ├docs - Project documentation │├datasheet - AsciiDoc sources for the NEORV32 data sheet │├figures - Figures and logos │├icons - Misc. symbols │├references - Data sheets and RISC-V specs. │└userguide - AsciiDoc sources for the NEORV32 user guide │ ├rtl - VHDL sources │├core - Core sources of the CPU & SoC ││└mem - SoC-internal memories (default architectures) │├processor_templates - Pre-configured SoC wrappers │├system_integration - System wrappers for advanced connectivity │└test_setups - Minimal test setup "SoCs" used in the User Guide │ ├setups - Example setups for various FPGAs, boards and toolchains │└... │ ├sim - Simulation files (see User Guide) │ └sw - Software framework ├bootloader - Sources of the processor-internal bootloader ├common - Linker script, crt0.S start-up code and central makefile ├example - Various example programs │└... ├isa-test │├riscv-arch-test - RISC-V spec. compatibility test framework (submodule) │└port-neorv32 - Port files for the official RISC-V architecture tests ├ocd_firmware - Source code for on-chip debugger's "park loop" ├openocd - OpenOCD on-chip debugger configuration files ├image_gen - Helper program to generate NEORV32 executables └lib - Processor core library ├include - Header files (*.h) └source - Source files (*.c)
1.4. VHDL File Hierarchy
All necessary VHDL hardware description files are located in the project’s rtl/core
folder. The top entity
of the entire processor including all the required configuration generics is neorv32_top.vhd
.
All core VHDL files from the list below have to be assigned to a new design library named neorv32 . Additional
files, like alternative top entities, can be assigned to any library.
|
neorv32_top.vhd - NEORV32 Processor top entity │ ├neorv32_fifo.vhd - General purpose FIFO component ├neorv32_package.vhd - Processor/CPU main VHDL package file │ ├neorv32_cpu.vhd - NEORV32 CPU top entity │├neorv32_cpu_alu.vhd - Arithmetic/logic unit ││├neorv32_cpu_cp_bitmanip.vhd - Bit-manipulation co-processor (B ext.) ││├neorv32_cpu_cp_fpu.vhd - Floating-point co-processor (Zfinx ext.) ││├neorv32_cpu_cp_muldiv.vhd - Mul/Div co-processor (M extension) ││└neorv32_cpu_cp_shifter.vhd - Bit-shift co-processor │├neorv32_cpu_bus.vhd - Bus interface + physical memory protection │├neorv32_cpu_control.vhd - CPU control, exception/IRQ system and CSRs ││└neorv32_cpu_decompressor.vhd - Compressed instructions decoder │└neorv32_cpu_regfile.vhd - Data register file │ ├neorv32_boot_rom.vhd - Bootloader ROM │└neorv32_bootloader_image.vhd - Bootloader boot ROM memory image ├neorv32_busswitch.vhd - Processor bus switch for CPU buses (I&D) ├neorv32_bus_keeper.vhd - Processor-internal bus monitor ├neorv32_cfs.vhd - Custom functions subsystem ├neorv32_debug_dm.vhd - on-chip debugger: debug module ├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module ├neorv32_dmem.entity.vhd - Processor-internal data memory (entity-only!) ├neorv32_gpio.vhd - General purpose input/output port unit ├neorv32_gptmr.vhd - General purpose 32-bit timer ├neorv32_icache.vhd - Processor-internal instruction cache ├neorv32_imem.entity.vhd - Processor-internal instruction memory (entity-only!) │└neor32_application_image.vhd - IMEM application initialization image ├neorv32_mtime.vhd - Machine system timer ├neorv32_neoled.vhd - NeoPixel (TM) compatible smart LED interface ├neorv32_pwm.vhd - Pulse-width modulation controller ├neorv32_slink.vhd - Stream link controller ├neorv32_spi.vhd - Serial peripheral interface controller ├neorv32_sysinfo.vhd - System configuration information memory ├neorv32_trng.vhd - True random number generator ├neorv32_twi.vhd - Two wire serial interface controller ├neorv32_uart.vhd - Universal async. receiver/transmitter ├neorv32_wdt.vhd - Watchdog timer ├neorv32_wishbone.vhd - External (Wishbone) bus interface ├neorv32_xirq.vhd - External interrupt controller │ ├mem/neorv32_dmem.default.vhd - _Default_ data memory (architecture-only) └mem/neorv32_imem.default.vhd - _Default_ instruction memory (architecture-only)
The processor-internal instruction and data memories (IMEM and DMEM) are split into two design files each:
a plain entity definition (neorv32_*mem.entity.vhd ) and the actual architecture definition
(mem/neorv32_*mem.default.vhd ). The *.default.vhd architecture definitions from rtl/core/mem provide a generic and
platform independent memory design that (should) infers embedded memory blocks. You can replace/modify the architecture
source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping
and/or timing.
|
1.5. FPGA Implementation Results
This chapter shows exemplary implementation results of the NEORV32 CPU and NEORV32 Processor.
1.5.1. CPU
Hardware version: |
|
Top entity: |
|
CPU | LEs | FFs | MEM bits | DSPs | fmax |
---|---|---|---|---|---|
|
806 |
359 |
1024 |
0 |
125 MHz |
|
1729 |
813 |
1024 |
0 |
124 MHz |
|
2269 |
1055 |
1024 |
0 |
124 MHz |
|
2501 |
1070 |
1024 |
0 |
124 MHz |
|
2511 |
1074 |
1024 |
0 |
124 MHz |
|
2521 |
1079 |
1024 |
0 |
124 MHz |
|
2522 |
1079 |
1024 |
0 |
122 MHz |
|
3807 |
1731 |
1024 |
7 |
116 MHz |
|
3974 |
1815 |
1024 |
7 |
116 MHz |
No HPM counters and no PMP regions were implemented for generating these results. |
The CPU provides further options to reduce the area footprint (for example by constraining the CPU-internal counter sizes) or to increase performance (for example by using a barrel-shifter; at cost of extra hardware). See section Processor Top Entity - Generics for more information. Also, take a look at the User Guide section Application-Specific Processor Configuration. |
1.5.2. Processor Modules
Hardware version: |
|
Top entity: |
|
Module | Description | LEs | FFs | MEM bits | DSPs |
---|---|---|---|---|---|
Boot ROM |
Bootloader ROM (4kB) |
2 |
1 |
32768 |
0 |
BUSKEEPER |
Processor-internal bus monitor |
9 |
6 |
0 |
0 |
BUSSWITCH |
Bus mux for CPU instr. and data interface |
63 |
8 |
0 |
0 |
CFS |
Custom functions subsystem[2] |
- |
- |
- |
- |
DMEM |
Processor-internal data memory (8kB) |
19 |
2 |
65536 |
0 |
DM |
On-chip debugger - debug module |
493 |
240 |
0 |
0 |
DTM |
On-chip debugger - debug transfer module (JTAG) |
254 |
218 |
0 |
0 |
GPIO |
General purpose input/output ports |
134 |
161 |
0 |
0 |
iCACHE |
Instruction cache (1x4 blocks, 256 bytes per block) |
2 21 |
156 |
8192 |
0 |
IMEM |
Processor-internal instruction memory (16kB) |
13 |
2 |
131072 |
0 |
MTIME |
Machine system timer |
319 |
167 |
0 |
0 |
NEOLED |
Smart LED Interface (NeoPixel/WS28128) [FIFO_depth=1] |
226 |
182 |
0 |
0 |
SLINK |
Stream link interface (2xRX, 2xTX, FIFO_depth=1) |
208 |
181 |
0 |
0 |
PWM |
Pulse_width modulation controller (4 channels) |
71 |
69 |
0 |
0 |
SPI |
Serial peripheral interface |
148 |
127 |
0 |
0 |
SYSINFO |
System configuration information memory |
14 |
11 |
0 |
0 |
TRNG |
True random number generator |
89 |
76 |
0 |
0 |
TWI |
Two-wire interface |
77 |
43 |
0 |
0 |
UART0/1 |
Universal asynchronous receiver/transmitter 0/1 |
183 |
132 |
0 |
0 |
WDT |
Watchdog timer |
53 |
43 |
0 |
0 |
WISHBONE |
External memory interface |
114 |
110 |
0 |
0 |
XIRQ |
External interrupt controller (32 channels) |
241 |
201 |
0 |
0 |
GPTMR |
General Purpose Timer |
153 |
107 |
0 |
0 |
1.5.3. Exemplary Setups
Check out the setups
folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/setups),
which provides several demo setups for various FPGA boards and toolchains.
1.6. CPU Performance
The performance of the NEORV32 was tested and evaluated using the Core Mark CPU benchmark.
This benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
system. The according sources can be found in the sw/example/coremark
folder.
Dhrystone
A simple port of the Dhrystone benchmark is also available in sw/example/dhrystone .
|
The resulting CoreMark score is defined as CoreMark iterations per second.
The execution time is determined via the RISC-V [m]cycle[h]
CSRs. The relative CoreMark score is
defined as CoreMark score divided by the CPU’s clock frequency in MHz.
HW version: |
|
Hardware: |
32kB int. IMEM, 16kB int. DMEM, no caches, 100MHz clock |
CoreMark: |
2000 iterations, MEM_METHOD is MEM_STACK |
Compiler: |
RISCV32-GCC 10.2.0 |
Compiler flags: |
default, see makefile |
CPU | CoreMark Score | CoreMarks/MHz | Average CPI |
---|---|---|---|
small ( |
33.89 |
0.3389 |
4.04 |
medium ( |
62.50 |
0.6250 |
5.34 |
performance ( |
95.23 |
0.9523 |
3.54 |
The "performance" CPU configuration uses the FAST_MUL_EN and FAST_SHIFT_EN options. |
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of several consecutive micro operations. |
The average CPI (cycles per instruction) depends on the instruction mix of a specific applications and also on
the available CPU extensions. The average CPI is computed by dividing the total number of required clock cycles
(only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions
([m]instret[h] CSRs).
|
More information regarding the execution time of each implemented instruction can be found in chapter Instruction Timing. |
2. NEORV32 Processor (SoC)
The NEORV32 Processor is based on the NEORV32 CPU. Together with common peripheral interfaces and embedded memories it provides a RISC-V-based full-scale microcontroller-like SoC platform.

Key Features
-
optional processor-internal data and instruction memories (DMEM/IMEM) + cache (iCACHE)
-
optional internal bootloader (BOOTROM) with UART console & SPI flash boot option
-
optional machine system timer (MTIME), RISC-V-compatible
-
optional two independent universal asynchronous receivers and transmitters (UART0, UART1) with optional hardware flow control (RTS/CTS) and optional RX/TX FIFOs
-
optional 8/16/24/32-bit serial peripheral interface controller (SPI) with 8 dedicated CS lines
-
optional two wire serial interface controller (TWI), compatible to the I²C standard
-
optional general purpose parallel IO port (GPIO), 64xOut, 64xIn
-
optional 32-bit external bus interface, Wishbone b4 / AXI4-Lite compatible (WISHBONE)
-
optional 32-bit stream link interface with up to 8 independent links, AXI4-Stream compatible (SLINK)
-
optional watchdog timer (WDT)
-
optional PWM controller with up to 60 channels & 8-bit duty cycle resolution (PWM)
-
optional ring-oscillator-based true random number generator (TRNG)
-
optional custom functions subsystem for custom co-processor extensions (CFS)
-
optional NeoPixel™/WS2812-compatible smart LED interface (NEOLED)
-
optional external interrupt controller with up to 32 channels (XIRQ)
-
optional general purpose 32-bit timer (GPTMR)
-
optional on-chip debugger with JTAG TAP (OCD)
-
bus keeper to monitor processor-internal bus transactions (BUSKEEPER)
-
system configuration information memory to check HW configuration via software (SYSINFO)
2.1. Processor Top Entity - Signals
The following table shows signals of the processor top entity (rtl/core/neorv32_top.vhd
).
The type of all signals is std_ulogic
or std_ulogic_vector
, respectively.
All input signals provide default values in case they are not explicitly assigned during instantiation.
For control signals the value L
(weak pull-down) is used. For serial and parallel data signals
the value U
(unknown) is used. Pulled-down signals will not cause "accidental" system crashes
since all control signals have defined level.
Signal | Width | Dir. | Function |
---|---|---|---|
Global Control |
|||
|
1 |
in |
global clock line, all registers triggering on rising edge |
|
1 |
in |
global reset, asynchronous, low-active |
JTAG Access Port for On-Chip Debugger (OCD) |
|||
|
1 |
in |
TAP reset, low-active (optional[3]) |
|
1 |
in |
serial clock |
|
1 |
in |
serial data input |
|
1 |
out |
serial data output[4] |
|
1 |
in |
mode select |
External Bus Interface (WISHBONE) |
|||
|
3 |
out |
tag (access type identifier) |
|
32 |
out |
destination address |
|
32 |
in |
write data |
|
32 |
out |
read data |
|
1 |
out |
write enable ('0' = read transfer) |
|
4 |
out |
byte enable |
|
1 |
out |
strobe |
|
1 |
out |
valid cycle |
|
1 |
out |
exclusive access request |
|
1 |
in |
transfer acknowledge |
|
1 |
in |
transfer error |
Advanced Memory Control Signals |
|||
|
1 |
out |
indicates an executed fence instruction |
|
1 |
out |
indicates an executed fencei instruction |
Stream Link Interface (SLINK) |
|||
|
8x32 |
out |
TX link n data |
|
8 |
out |
TX link n data valid |
|
8 |
in |
TX link n allowed to send |
|
8x32 |
in |
RX link n data |
|
8 |
in |
RX link n data valid |
|
8 |
out |
RX link n ready to receive |
General Purpose Inputs & Outputs (GPIO) |
|||
|
64 |
out |
general purpose parallel output |
|
64 |
in |
general purpose parallel input |
Primary Universal Asynchronous Receiver/Transmitter (UART0) |
|||
|
1 |
out |
UART0 serial transmitter |
|
1 |
in |
UART0 serial receiver |
|
1 |
out |
UART0 RX ready to receive new char |
|
1 |
in |
UART0 TX allowed to start sending |
Primary Universal Asynchronous Receiver/Transmitter (UART1) |
|||
|
1 |
out |
UART1 serial transmitter |
|
1 |
in |
UART1 serial receiver |
|
1 |
out |
UART1 RX ready to receive new char |
|
1 |
in |
UART1 TX allowed to start sending |
Serial Peripheral Interface Controller (SPI) |
|||
|
1 |
out |
SPI controller clock line |
|
1 |
out |
SPI serial data output |
|
1 |
in |
SPI serial data input |
|
8 |
out |
SPI dedicated chip select (low-active) |
Two-Wire Interface Controller (TWI) |
|||
|
1 |
inout |
TWI serial data line |
|
1 |
inout |
TWI serial clock line |
Pulse-Width Modulation Channels (PWM) |
|||
|
0..60 |
out |
pulse-width modulated channels |
Custom Functions Subsystem (CFS) |
|||
|
32 |
in |
custom CFS input signal conduit |
|
32 |
out |
custom CFS output signal conduit |
Smart LED Interface - NeoPixel™ compatible (NEOLED) |
|||
|
1 |
out |
asynchronous serial data output |
System time (MTIME) |
|||
|
64 |
in |
machine timer time (to |
|
64 |
out |
machine timer time from internal MTIME unit if processor-internal MTIME unit IS implemented |
External Interrupts (XIRQ) |
|||
|
32 |
in |
external interrupt requests (up to 32 channels) |
RISC-V Machine-Level CPU Interrupts |
|||
|
1 |
in |
machine timer interrupt13 (RISC-V), high-active |
|
1 |
in |
machine software interrupt (RISC-V), high-active |
|
1 |
in |
machine external interrupt (RISC-V), high-active |
2.2. Processor Top Entity - Generics
This is a list of all configuration generics of the NEORV32 processor top entity rtl/neorv32_top.vhd. The generic name is shown in orange, followed by the type in printed in black and concluded by the default value printed in light gray.
The NEORV32 generics allow to configure the system according to your needs. The generics are
used to control implementation of certain CPU extensions and peripheral modules and even allow to
optimize the system for certain design goals like minimal area or maximum performance. More information can be found in the user guides' section Application-Specific Processor Configuration. |
Privileged software can determine the actual CPU and processor configuration via the misa and the
SYSINFO registers.
|
If optional modules (like CPU extensions or peripheral devices) are not enabled the according circuitry will not be synthesized at all. Hence, the disabled modules do not increase area and power requirements and do not impact the timing. |
Not all configuration combinations are valid. The processor RTL code provides sanity checks to inform the user during synthesis/simulation if an invalid combination has been detected. |
Generic Description
The description of each generic provides the following summary:
Generic name |
type |
default value |
Description |
2.2.1. General
See section System Configuration Information Memory (SYSINFO) for more information.
CLOCK_FREQUENCY
CLOCK_FREQUENCY |
natural |
none |
The clock frequency of the processor’s |
INT_BOOTLOADER_EN
INT_BOOTLOADER_EN |
boolean |
false |
Implement the processor-internal boot ROM, pre-initialized with the default bootloader image when true. This will also change the processor’s boot address from the beginning of the instruction memory address space (default = 0x00000000) to the base address of the boot ROM. See section Boot Configuration for more information. |
HW_THREAD_ID
HW_THREAD_ID |
natural |
0 |
The hart ID of the CPU. Software can retrieve this value from the |
ON_CHIP_DEBUGGER_EN
ON_CHIP_DEBUGGER_EN |
boolean |
false |
Implement the on-chip debugger (OCD) and the CPU debug mode. See chapter On-Chip Debugger (OCD) for more information. |
2.2.2. RISC-V CPU Extensions
See section Instruction Sets and Extensions for more information. The configuration of the RISC-V main ISA extensions
(like M ) can be determined via the misa CSR. The configuration of ISA sub-extensions (like Zicsr ) and extension options
can be determined via memory-mapped registers of the System Configuration Information Memory (SYSINFO) module.
|
CPU_EXTENSION_RISCV_A
CPU_EXTENSION_RISCV_A |
boolean |
false |
Implement atomic memory access operations when true.
See section |
CPU_EXTENSION_RISCV_B
CPU_EXTENSION_RISCV_B |
boolean |
false |
Implement the |
CPU_EXTENSION_RISCV_C
CPU_EXTENSION_RISCV_C |
boolean |
false |
Implement compressed instructions (16-bit) when true. Compressed instructions can reduce program code
size by approx. 30%. See section |
CPU_EXTENSION_RISCV_E
CPU_EXTENSION_RISCV_E |
boolean |
false |
Implement the embedded CPU extension (only implement the first 16 data registers) when true. This reduces embedded memory
requirements for the register file. See section |
CPU_EXTENSION_RISCV_M
CPU_EXTENSION_RISCV_M |
boolean |
false |
Implement hardware accelerators for integer multiplication and division instructions when true.
If this extensions is not enabled, multiplication and division operations (not instructions) will be computed entirely in software.
If only a hardware multiplier is required use the CPU_EXTENSION_RISCV_Zmmul extension. Multiplication can also be mapped
to DSP slices via the FAST_MUL_EN generic.
See section |
CPU_EXTENSION_RISCV_U
CPU_EXTENSION_RISCV_U |
boolean |
false |
Implement less-privileged user mode when true.
See section |
CPU_EXTENSION_RISCV_Zfinx
CPU_EXTENSION_RISCV_Zfinx |
boolean |
false |
Implement the 32-bit single-precision floating-point extension (using integer registers) when true.
See section |
CPU_EXTENSION_RISCV_Zicsr
CPU_EXTENSION_RISCV_Zicsr |
boolean |
true |
Implement the control and status register (CSR) access instructions when true. Note: When this option is
disabled, the complete privileged architecture / trap system will be excluded from synthesis. Hence, no interrupts, no exceptions and
no machine information will be available.
See section |
CPU_EXTENSION_RISCV_Zicntr
CPU_EXTENSION_RISCV_Zicntr |
boolean |
true |
Implement the basic CPU counter CSRs ( |
CPU_EXTENSION_RISCV_Zihpm
CPU_EXTENSION_RISCV_Zihpm |
boolean |
false |
Implement hardware performance monitor CSRs when true.
Enabling this extension will set the SYSINFO_CPU_ZIHPM flag in the |
CPU_EXTENSION_RISCV_Zifencei
CPU_EXTENSION_RISCV_Zifencei |
boolean |
false |
Implement the instruction fetch synchronization instruction |
CPU_EXTENSION_RISCV_Zmmul
CPU_EXTENSION_RISCV_Zmmul |
boolean |
false |
Implement integer multiplication-only instructions when true. This is a sub-extension of the |
2.2.3. Extension Options
See section Instruction Sets and Extensions for more information.
FAST_MUL_EN
FAST_MUL_EN |
boolean |
false |
When this generic is enabled, the multiplier of the |
FAST_SHIFT_EN
FAST_SHIFT_EN |
boolean |
false |
If this generic is set true the shifter unit of the CPU’s ALU is implemented as fast barrel shifter (requiring
more hardware resources but completing within two clock cycles). If it is set false, the CPU uses a serial shifter
that only performs a single bit shift per cycle (requiring less hardware resources, but requires up to 32 clock
cycles to complete - depending on shift amount). Note that this option also implements barrel shifters for all
shift-related operations of the |
CPU_CNT_WIDTH
CPU_CNT_WIDTH |
natural |
64 |
This generic configures the total size of the CPU’s |
CPU_IPB_ENTRIES
CPU_IPB_ENTRIES |
natural |
2 |
This generic configures the number of entries in the CPU’s instruction prefetch buffer (a FIFO). The value has to be a power of two and has to be greater than zero. Long linear sequences of code can benefit from an increased IPB size. |
2.2.4. Physical Memory Protection (PMP)
See section PMP
Physical Memory Protection for more information.
PMP_NUM_REGIONS
PMP_NUM_REGIONS |
natural |
0 |
Total number of implemented protections regions (0..64). If this generics is zero no physical memory
protection logic will be implemented at all. Setting PMP_NUM_REGIONS_ > 0 will set the SYSINFO_CPU_PMP flag
in the |
PMP_MIN_GRANULARITY
PMP_MIN_GRANULARITY |
natural |
64*1024 |
Minimal region granularity in bytes. Has to be a power of two. Has to be at least 8 bytes. |
2.2.5. Hardware Performance Monitors (HPM)
These generics allow to customize the Zihpm
ISA extension. Note that the following generics are ignored if the
CPU_EXTENSION_RISCV_Zihpm generic is false. See section Zihpm
Hardware Performance Monitors for more information.
HPM_NUM_CNTS
HPM_NUM_CNTS |
natural |
0 |
Total number of implemented hardware performance monitor counters (0..29). If this generics is zero, no hardware performance monitor logic will be implemented at all. |
HPM_CNT_WIDTH
HPM_CNT_WIDTH |
natural |
40 |
This generic defines the total LSB-aligned size of each HPM counter ( |
2.2.6. Internal Instruction Memory
See sections Address Space and Instruction Memory (IMEM) for more information.
MEM_INT_IMEM_EN
MEM_INT_IMEM_EN |
boolean |
false |
Implement processor internal instruction memory (IMEM) when true. |
MEM_INT_IMEM_SIZE
MEM_INT_IMEM_SIZE |
natural |
16*1024 |
Size in bytes of the processor internal instruction memory (IMEM). Has no effect when MEM_INT_IMEM_EN is false. |
2.2.7. Internal Data Memory
See sections Address Space and Data Memory (DMEM) for more information.
MEM_INT_DMEM_EN
MEM_INT_DMEM_EN |
boolean |
false |
Implement processor internal data memory (DMEM) when true. |
MEM_INT_DMEM_SIZE
MEM_INT_DMEM_SIZE |
natural |
8*1024 |
Size in bytes of the processor-internal data memory (DMEM). Has no effect when MEM_INT_DMEM_EN is false. |
2.2.8. Internal Cache Memory
See section Processor-Internal Instruction Cache (iCACHE) for more information.
ICACHE_EN
ICACHE_EN |
boolean |
false |
Implement processor internal instruction cache when true. Note: if the setup only uses processor-internal data and instruction memories there is not point of implementing the i-cache. |
ICACHE_NUM_BLOCK
ICACHE_NUM_BLOCKS |
natural |
4 |
Number of blocks (cache "pages" or "lines") in the instruction cache. Has to be a power of two. Has no effect when [_icache_dmem_en] is false. |
ICACHE_BLOCK_SIZE
ICACHE_BLOCK_SIZE |
natural |
64 |
Size in bytes of each block in the instruction cache. Has to be a power of two. Has no effect when [_icache_dmem_en] is false. |
ICACHE_ASSOCIATIVITY
ICACHE_ASSOCIATIVITY |
natural |
1 |
Associativity (= number of sets) of the instruction cache. Has to be a power of two. Allowed configurations:
|
2.2.9. External Memory Interface
See sections Address Space and Processor-External Memory Interface (WISHBONE) (AXI4-Lite) for more information.
MEM_EXT_EN
MEM_EXT_EN |
boolean |
false |
Implement external bus interface (WISHBONE) when true. |
MEM_EXT_TIMEOUT
MEM_EXT_TIMEOUT |
natural |
255 |
Clock cycles after which a pending external bus access will auto-terminate and raise a bus fault exception. If set to zero, there will be no auto-timeout and no bus fault exception (might permanently stall system!). |
MEM_EXT_PIPE_MODE
MEM_EXT_PIPE_MODE |
boolean |
false |
Use standard ("classic") Wishbone protocol for external bus when false. Use pipelined Wishbone protocol when true. |
MEM_EXT_BIG_ENDIAN
MEM_EXT_BIG_ENDIAN |
boolean |
false |
Use BIG endian interface for external bus when true. Use little endian interface when false. |
MEM_EXT_ASYNC_RX
MEM_EXT_ASYNC_RX |
boolen |
false |
By default, MEM_EXT_ASYNC_RX = false implements a registered read-back path (RX) for incoming data in the bus interface in order to shorten the critical path. By setting MEM_EXT_ASYNC_RX = true an asynchronous ("direct") read-back path is implemented reducing access latency by one cycle but eventually increasing the critical path. |
2.2.10. Stream Link Interface
See section Stream Link Interface (SLINK) for more information.
SLINK_NUM_TX
SLINK_NUM_TX |
natural |
0 |
Number of TX (send) links to implement. Valid values are 0..8. |
SLINK_NUM_RX
SLINK_NUM_RX |
natural |
0 |
Number of RX (receive) links to implement. Valid values are 0..8. |
SLINK_TX_FIFO
SLINK_TX_FIFO |
natural |
1 |
Internal FIFO depth for all implemented TX links. Valid values are 1..32k and have to be a power of two. |
SLINK_RX_FIFO
SLINK_RX_FIFO |
natural |
1 |
Internal FIFO depth for all implemented RX links. Valid values are 1..32k and have to be a power of two. |
2.2.11. External Interrupt Controller
See section External Interrupt Controller (XIRQ) for more information.
XIRQ_NUM_CH
XIRQ_NUM_CH |
natural |
0 |
Number of external interrupt channels o implement. Valid values are 0..32. |
XIRQ_TRIGGER_TYPE
XIRQ_TRIGGER_TYPE |
std_ulogic_vector(31 downto 0) |
0xFFFFFFFF |
Interrupt trigger type configuration (one bit for each IRQ channel): |
XIRQ_TRIGGER_POLARITY
XIRQ_TRIGGER_POLARITY |
std_ulogic_vector(31 downto 0) |
0xFFFFFFFF |
Interrupt trigger polarity configuration (one bit for each IRQ channel): |
2.2.12. Processor Peripheral/IO Modules
See section Processor-Internal Modules for more information.
IO_GPIO_EN
IO_GPIO_EN |
boolean |
false |
Implement general purpose input/output port unit (GPIO) when true. See section General Purpose Input and Output Port (GPIO) for more information. |
IO_MTIME_EN
IO_MTIME_EN |
boolean |
false |
Implement machine system timer (MTIME) when true. See section Machine System Timer (MTIME) for more information. |
IO_UART0_EN
IO_UART0_EN |
boolean |
false |
Implement primary universal asynchronous receiver/transmitter (UART0) when true. See section Primary Universal Asynchronous Receiver and Transmitter (UART0) for more information. |
IO_UART0_RX_FIFO
IO_UART0_RX_FIFO |
natural |
1 |
UART0 receiver FIFO depth, has to be a power of two, minimum value is 1 (implementing simple double-buffering). See section Primary Universal Asynchronous Receiver and Transmitter (UART0) for more information. |
IO_UART0_TX_FIFO
IO_UART0_TX_FIFO |
natural |
1 |
UART0 transmitter FIFO depth, has to be a power of two, minimum value is 1 (implementing simple double-buffering). See section Primary Universal Asynchronous Receiver and Transmitter (UART0) for more information. |
IO_UART1_EN
IO_UART1_EN |
boolean |
false |
Implement secondary universal asynchronous receiver/transmitter (UART1) when true. See section Secondary Universal Asynchronous Receiver and Transmitter (UART1) for more information. |
IO_UART1_RX_FIFO
IO_UART1_RX_FIFO |
natural |
1 |
UART1 receiver FIFO depth, has to be a power of two, minimum value is 1 (implementing simple double-buffering). See section Primary Universal Asynchronous Receiver and Transmitter (UART0) for more information. |
IO_UART1_TX_FIFO
IO_UART1_TX_FIFO |
natural |
1 |
UART1 transmitter FIFO depth, has to be a power of two, minimum value is 1 (implementing simple double-buffering). See section Primary Universal Asynchronous Receiver and Transmitter (UART0) for more information. |
IO_SPI_EN
IO_SPI_EN |
boolean |
false |
Implement serial peripheral interface controller (SPI) when true. See section Serial Peripheral Interface Controller (SPI) for more information. |
IO_TWI_EN
IO_TWI_EN |
boolean |
false |
Implement two-wire interface controller (TWI) when true. See section Two-Wire Serial Interface Controller (TWI) for more information. |
IO_PWM_NUM_CH
IO_PWM_NUM_CH |
natural |
0 |
Number of pulse-width modulation (PWM) channels (0..60) to implement. The PWM controller is not implemented if zero. See section Pulse-Width Modulation Controller (PWM) for more information. |
IO_WDT_EN
IO_WDT_EN |
boolean |
false |
Implement watchdog timer (WDT) when true. See section Watchdog Timer (WDT) for more information. |
IO_TRNG_EN
IO_TRNG_EN |
boolean |
false |
Implement true-random number generator (TRNG) when true. See section True Random-Number Generator (TRNG) for more information. |
IO_CFS_EN
IO_CFS_EN |
boolean |
false |
Implement custom functions subsystem (CFS) when true. See section Custom Functions Subsystem (CFS) for more information. |
IO_CFS_CONFIG
IO_CFS_CONFIG |
std_ulogic_vector(31 downto 0) |
0x"00000000" |
This is a "conduit" generic that can be used to pass user-defined CFS implementation flags to the custom functions subsystem entity. See section Custom Functions Subsystem (CFS) for more information. |
IO_CFS_IN_SIZE
IO_CFS_IN_SIZE |
positive |
32 |
Defines the size of the CFS input signal conduit ( |
IO_CFS_OUT_SIZE
IO_CFS_OUT_SIZE |
positive |
32 |
Defines the size of the CFS output signal conduit ( |
IO_NEOLED_EN
IO_NEOLED_EN |
boolean |
false |
Implement smart LED interface (WS2812 / NeoPixel™-compatible) (NEOLED) when true. See section Smart LED Interface (NEOLED) for more information. |
IO_NEOLED_TX_FIFO
IO_NEOLED_TX_FIFO |
natural |
1 |
TX FIFO depth of the the NEOLED module. Minimal value is 1, maximal value is 32k, has to be a power of two. See section Smart LED Interface (NEOLED) for more information. |
IO_GPTMR_EN
IO_GPTMR_EN |
boolean |
false |
Implement general purpose 32-bit timer (GPTMR) when true. See section General Purpose Timer (GPTMR) for more information. |
2.3. Processor Interrupts
The NEORV32 Processor provides several interrupt request signals (IRQs) for custom platform use.
2.3.1. RISC-V Standard Interrupts
The processor setup features the standard machine-level RISC-V interrupt lines for "machine timer interrupt", "machine software interrupt" and "machine external interrupt". Their usage is defined by the RISC-V privileged architecture specifications. However, bare-metal system can also repurpose these interrupts. See CPU section Traps, Exceptions and Interrupts for more information.
Top signal | Width | Description |
---|---|---|
|
1 |
Machine timer interrupt from processor-external MTIME unit. This IRQ is only available if the processor-internal MTIME unit is not used (IO_MTIME_EN = false). |
|
1 |
Machine software interrupt. This interrupt is used for inter-processor interrupts in multi-core systems. However, it can also be used for any custom purpose. |
|
1 |
Machine external interrupt. This interrupt is used for any processor-external interrupt source (like a platform interrupt controller). |
Trigger type
The fast interrupt request channel trigger on high-level and have to stay asserted until explicitly acknowledged
by the software (for example by writing to a specific memory-mapped register). Hence, pending interrupts remain pending
as long as the interrupt-causing device’s state fulfills it’s interrupt condition(s).
|
2.3.2. Platform External Interrupts
Top signal | Width | Description |
---|---|---|
|
up to 32 |
External platform interrupts (user-defined). |
The processor provides an optional interrupt controller for up to 32 user-defined external interrupts (see section External Interrupt Controller (XIRQ)). These external IRQs are mapped to a single CPU fast interrupt request so a software handler is required to differentiate / prioritize these interrupts.
Trigger type
The trigger for these interrupt can be defined via generics. See section
External Interrupt Controller (XIRQ) for more information. Depending on the trigger type, users can
implement custom acknowledge mechanisms. All external interrupts are mapped to a single processor-internal
fast interrupt request (see below).
|
2.3.3. NEORV32-Specific Fast Interrupt Requests
As part of the custom/NEORV32-specific CPU extensions, the CPU features 16 fast interrupt request signals
(FIRQ0
- FIRQ15
). These are reserved for processor-internal modules only (for example for the communication
interfaces to signal "available incoming data" or "ready to send new data").
The mapping of the 16 FIRQ channels is shown in the following table (the channel number also corresponds to the according FIRQ priority; 0 = highest, 15 = lowest):
Channel | Source | Description |
---|---|---|
0 |
watchdog timeout interrupt |
|
1 |
custom functions subsystem (CFS) interrupt (user-defined) |
|
2 |
UART0 data received interrupt (RX complete) |
|
3 |
UART0 sending done interrupt (TX complete) |
|
4 |
UART1 data received interrupt (RX complete) |
|
5 |
UART1 sending done interrupt (TX complete) |
|
6 |
SPI transmission done interrupt |
|
7 |
TWI transmission done interrupt |
|
8 |
External interrupt controller interrupt |
|
9 |
NEOLED TX buffer interrupt |
|
10 |
RX data buffer interrupt |
|
11 |
TX data buffer interrupt |
|
12 |
General purpose timer interrupt |
|
13:15 |
- |
reserved, will never fire |
Trigger type
The fast interrupt request channel trigger on high-level and have to stay asserted until explicitly acknowledged
by the software (for example by writing to a specific memory-mapped register). Hence, pending interrupts remain pending
as long as the interrupt-causing device’s state fulfills it’s interrupt condition(s).
|
2.4. Address Space
The NEORV32 Processor provides a 32-bit / 4GB (physical) address space By default, this address space is divided into five main regions:
-
Instruction address space - memory address space for instructions (=code) and constants. A configurable section of this address space is used by the internal/external instruction memory (MEM_INT_IMEM_SIZE for the internal IMEM).
-
Data address space - memory address space for application runtime data (heap, stack, etc.). A configurable section of this address space is used by the internal/external data memory (MEM_INT_DMEM_SIZE for the internal DMEM).
-
Bootloader address space. A fixed section of this address space is used by the internal bootloader memory (BOOTLDROM).
-
On-Chip Debugger address space. This fixed section is entirely used by the processor’s On-Chip Debugger (OCD).
-
IO/peripheral address space. Also a fixed section used for the processor-internal memory-mapped IO/peripheral devices (e.g., UART).

2.4.1. CPU Data and Instruction Access
The CPU can access all of the 4GB address space from the instruction fetch interface (I) and also from the
data access interface (D). These two CPU interfaces are multiplexed by a simple bus switch
(rtl/core/neorv32_busswitch.vhd
) into a single processor-internal bus. All processor-internal
memories, peripherals and also the external memory interface are connected to this bus. Hence, both CPU
interfaces (instruction fetch & data access) have access to the same (identical) address space making the
setup a modified von-Neumann architecture.

The internal processor bus might appear as bottleneck. In order to reduce traffic jam on this bus (when instruction fetch and data interface access the bus at the same time) the instruction fetch of the CPU is equipped with a prefetch buffer. Instruction fetches can be further buffered using the i-cache. Furthermore, data accesses (loads and stores) have higher priority than instruction fetch accesses. |
Please note that all processor-internal components including the peripheral/IO devices can also be accessed from programs running in less-privileged user mode. For example, if the system relies on a periodic interrupt from the MTIME timer unit, user-level programs could alter the MTIME configuration corrupting this interrupt. This kind of security issues can be compensated using the PMP system (see [_machine_physical_memory_protection]). |
2.4.2. Address Space Layout
The general address space layout consists of two main configuration constants: ispace_base_c
defining
the base address of the instruction memory address space and dspace_base_c
defining the base address of
the data memory address space. Both constants are defined in the NEORV32 VHDL package file
rtl/core/neorv32_package.vhd
:
-- Architecture Configuration ----------------------------------------------------
-- ----------------------------------------------------------------------------------
constant ispace_base_c : std_ulogic_vector(31 downto 0) := x"00000000";
constant dspace_base_c : std_ulogic_vector(31 downto 0) := x"80000000";
The default configuration assumes the instruction memory address space starting at address 0x00000000 and the data memory address space starting at 0x80000000. Both values can be modified for a specific setup and the address space may overlap or can be completely identical. Make sure that both base addresses are aligned to a 4-byte boundary.
The base address of the internal bootloader (at 0xFFFF0000) and the internal IO region (at 0xFFFFFE00) for peripheral devices are also defined in the package and are fixed. These address regions cannot not be used for other applications - even if the bootloader or all IO devices are not implemented - without modifying the core’s hardware sources. |
2.4.3. Physical Memory Attributes
The processor setup defines fixed attributes for the four processor-internal address space regions. Accessing a memory region in a way that violates any of these attributes will raise an according access exception..
-
r
- read access (from CPU data access interface, "loads") -
w
- write access (from CPU data access interface, "stores") -
x
- execute access (from CPU instruction fetch interface) -
a
- atomic access (from CPU data access interface) -
8
- byte (8-bit)-accessible (when writing) -
16
- half-word (16-bit)-accessible (when writing) -
32
- word (32-bit)-accessible (when writing)
Read accesses (loads and instruction fetches) can always access data in
word, half-word (for instruction fetch only if C extension is enabled)
and byte (not for instruction fetch) quantities (requiring an accordingly aligned address).
|
The following table shows the default hardware-defined physical memory attributes of each main address space region. Additional user-defined attributes (for example certain read/write/execute rights for specific address space regions) can be provided using the RISC-V [_machine_physical_memory_protection]. |
# | Region | Base address | Size | Attributes |
---|---|---|---|---|
5 |
IO/peripheral devices |
0xfffffe00 |
512 bytes |
|
4 |
On-chip debugger |
0xfffff800 |
512 bytes |
|
3 |
Bootloader ROM |
0xffff0000 |
up to 32kB |
|
2 |
DMEM |
0x80000000 |
up to "2GB" |
|
1 |
IMEM |
0x00000000 |
up to 2GB |
|
2.4.4. Memory Configuration
The NEORV32 Processor was designed to provide maximum flexibility for the memory configuration. The processor can populate the instruction address space and/or the data address space with internal memories for instructions (IMEM) and data (DMEM). Processor external memories can be used as an alternative or even in combination with the internal ones. The figure below show some exemplary memory configurations.

Internal Memories
The processor-internal memories (Instruction Memory (IMEM) and Data Memory (DMEM)) are enabled (=implemented) via the MEM_INT_IMEM_EN and MEM_INT_DMEM_EN generics. Their sizes are configures via the according MEM_INT_IMEM_SIZE and MEM_INT_DMEM_SIZE generics.
If the processor-internal IMEM is implemented, it is located right at the base address of the instruction
address space (default ispace_base_c
= 0x00000000). Vice versa, the processor-internal data memory is
located right at the beginning of the data address space (default dspace_base_c
= 0x80000000) when
implemented.
The default processor setup uses only internal memories. |
If the IMEM (internal or external) is less than the (default) maximum size (2GB), there is a "dead address space" between it and the DMEM. This provides an additional safety feature since data corrupting scenarios like stack overflow cannot directly corrupt the content of the IMEM: any access to the "dead address space" in between will raise an exception that can be caught by the runtime environment. |
External Memories
If external memories (or further IP modules) shall be connected via the processor’s external bus interface, the interface has to be enabled via MEM_EXT_EN generic (=true). More information regarding this interface can be found in section Processor-External Memory Interface (WISHBONE) (AXI4-Lite).
Any CPU access (data or instructions), which does not fulfill at least one of the following conditions, is forwarded via the processor’s bus interface to external components:
-
access to the processor-internal IMEM and processor-internal IMEM is implemented
-
access to the processor-internal DMEM and processor-internal DMEM is implemented
-
access to the bootloader ROM and beyond → addresses >= BOOTROM_BASE (default 0xFFFF0000) will never be forwarded to the external memory interface
If no (or not all) processor-internal memories are implemented, the according base addresses are mapped to external memories.
For example, if the processor-internal IMEM is not implemented (MEM_INT_IMEM_EN = false), the processor will forward
any access to the instruction address space (starting at ispace_base_c
) via the external bus interface to the external
memory system.
If the external interface is deactivated, any access exceeding the internal memory address space (instruction, data, bootloader) or the internal peripheral address space will trigger a bus access fault exception. |
2.4.5. Boot Configuration
Due to the flexible memory configuration concept, the NEORV32 Processor provides several different boot concepts. The figure below shows the exemplary concepts for the two most common boot scenarios.

The configuration of internal or external data memory (DMEM; MEM_INT_DMEM_EN = true / false) is not further relevant for the boot configuration itself. Hence, it is not further illustrated here. |
There are two general boot scenarios: Indirect Boot (1a and 1b) and Direct Boot (2a and 2b) configured via the INT_BOOTLOADER_EN generic If this generic is set true the indirect boot scenario is used. This is also the default boot configuration of the processor. If INT_BOOTLOADER_EN is set false the direct boot scenario is used.
Please note that the provided boot scenarios are just exemplary setups that (should) fit most common requirements. Much more sophisticated boot scenarios are possible by combining internal and external memories. For example, the default internal bootloader could be used as first-level bootloader that loads (from extern SPI flash) a second-level bootloader that is placed and execute in internal IMEM. This second-level bootloader could then fetch the actual application and store it to external data memory and transfers CPU control to that. |
Indirect Boot
The indirect boot scenarios 1a and 1b use the processor-internal Bootloader. This general setup is enabled by setting the INT_BOOTLOADER_EN generic to true, which will implement the processor-internal Bootloader ROM (BOOTROM). This read-only memory is pre-initialized during synthesis with the default bootloader firmware.
The bootloader provides several options to upload an executable (via UART or from external SPI flash) and store it to the instruction address space so the CPU can execute it. Boot scenario 1a uses the processor-internal IMEM (MEM_INT_IMEM_EN = true). This scenario implements the internal Instruction Memory (IMEM) as non-initialized RAM so the bootloader can write the actual executable to it.
Boot scenario 1b uses a processor-external IMEM (MEM_INT_IMEM_EN = false) that is connected via the processor’s bus interface. In this scenario the internal Instruction Memory (IMEM) is not implemented at all and the bootloader will write the executable to the processor-external memory.
Direct Boot
The direct boot scenarios 2a and 2b do not use the processor-internal bootloader. Hence, the INT_BOOTLOADER_EN generic is set false. In this configuration the Bootloader ROM (BOOTROM) is not implemented at all and the CPU will directly begin executing code from the instruction address space after reset. A "pre-initialization mechanism is required in order to provide an executable in memory.
Boot scenario 2a uses the processor-internal IMEM (MEM_INT_IMEM_EN = true) that is implemented as read-only memory in this scenario. It is pre-initialized (by the bitstream) with the actual application executable.
In contrast, boot scenario 2b uses a processor-external IMEM (MEM_INT_IMEM_EN = false). In this scenario the system designer is responsible for providing a initialized external memory that contains the actual application to be executed.
2.5. Processor-Internal Modules
Basically, the processor is a SoC consisting of the NEORV32 CPU, peripheral/IO devices, embedded memories, an external memory interface and a bus infrastructure to interconnect all units. Additionally, the system implements an internal reset generator and a global clock generator/divider.
Internal Reset Generator
Most processor-internal modules - except for the CPU and the watchdog timer - do not have a dedicated
reset signal. However, all devices can be reset by software by clearing the corresponding unit’s control
register. The automatically included application start-up code (crt0.S
) will perform a software-reset of all
modules to ensure a clean system reset state.
The hardware reset signal of the processor can either be
triggered via the external reset pin (rstn_i
, low-active) or by the internal watchdog timer (if implemented).
Before the external reset signal is applied to the system, it is extended to have a minimal duration of eight
clock cycles.
Internal Clock Divider
An internal clock divider generates 8 clock signals derived from the processor’s main clock input clk_i
.
These derived clock signals are not actual clock signals. Instead, they are derived from a simple counter and
are used as "clock enable" signal by the different processor modules. Thus, the whole design operates using
only the main clock signal (single clock domain). Some of the processor peripherals like the Watchdog or the
UARTs can select one of the derived clock enabled signals for their internal operation. If none of the
connected modules require a clock signal from the divider, it is automatically deactivated to reduce dynamic
power.
The peripheral devices, which feature a time-based configuration, provide a three-bit prescaler select in their
according control register to select one out of the eight available clocks. The mapping of the prescaler select
bits to the actually obtained clock are shown in the table below. Here, f represents the processor main clock
from the top entity’s clk_i
signal.
Prescaler bits: |
|
|
|
|
|
|
|
|
Resulting clock: |
f/2 |
f/4 |
f/8 |
f/64 |
f/128 |
f/1024 |
f/2048 |
f/4096 |
Peripheral / IO Devices
The processor-internal peripheral/IO devices are located at the end of the 32-bit address space at base address 0xFFFFFE00. A region of 512 bytes is reserved for this devices. Hence, all peripheral/IO devices are accessed using a memory-mapped scheme. A special linker script as well as the NEORV32 core software library abstract the specific memory layout for the user.
The base address of each component/module has to be aligned to the total size of the module’s occupied address space! The occupied address space has to be a power of two (minimum 4 bytes)! Address spaces must not overlap! |
When accessing an IO device that hast not been implemented (via the according IO_x_EN generic), a load/store access fault exception is triggered. |
The peripheral/IO devices can only be written in full-word mode (i.e. 32-bit). Byte or half-word (8/16-bit) writes will trigger a store access fault exception. Read accesses are not size constrained. Processor-internal memories as well as modules connected to the external memory interface can still be written with a byte-wide granularity. |
You should use the provided core software library to interact with the peripheral devices. This prevents incompatibilities with future versions, since the hardware driver functions handle all the register and register bit accesses. |
Most of the IO devices do not have a hardware reset. Instead, the devices are reset via software by
writing zero to the unit’s control register. A general software-based reset of all devices is done by the
application start-up code crt0.S .
|
Interrupts of Processor-Internal Modules
Most peripheral/IO devices provide some kind of interrupt (for example to signal available incoming data). These interrupts are entirely mapped to the CPU’s Custom Fast Interrupt Request Lines. Note that all these interrupt lines are high-active and are permanently triggered until the IRQ-causing condition is resolved.
Nomenclature for the Peripheral / IO Devices Listing
Each peripheral device chapter features a register map showing accessible control and data registers of the
according device including the implemented control and status bits. C-language code can directly interact with these
registers via pre-defined struct
. Each IO/peripheral module provides a unique struct
. All accessible
interface registers of this module are defined as members of this struct
. The pre-defined struct
are defined int the
main processor core library include file sw/lib/include/neorv32.h
.
The naming scheme of these low-level hardware access structs is NEORV32_<module_name>.<register_name>
.
struct
// Read from SYSINFO "CLK" register
uint32_t temp = NEORV32_SYSINFO.CLK;
The registers and/or register bits, which can be accessed directly using plain C-code, are marked with a "[C]". Not all registers or register bits can be arbitrarily read/written. The following read/write access types are available:
-
r/w
registers / bits can be read and written -
r/-
registers / bits are read-only; any write access to them has no effect -
-/w
these registers / bits are write-only; they auto-clear in the next cycle and are always read as zero
Bits / registers that are not listed in the register map tables are not (yet) implemented. These registers / bits are always read as zero. A write access to them has no effect, but user programs should only write zero to them to keep compatible with future extension. |
When writing to read-only registers, the access is nevertheless acknowledged, but no actual data is written. When reading data from a write-only register the result is undefined. |
2.5.1. Instruction Memory (IMEM)
Hardware source file(s): |
neorv32_imem.entity.vhd |
entity-only definition |
mem/neorv32_imem.default.vhd |
default platform-agnostic memory architecture |
|
Software driver file(s): |
none |
implicitly used |
Top entity port: |
none |
|
Configuration generics: |
MEM_INT_IMEM_EN |
implement processor-internal IMEM when true |
MEM_INT_IMEM_SIZE |
IMEM size in bytes |
|
INT_BOOTLOADER_EN |
use internal bootloader when true (implements IMEM as uninitialized RAM) |
|
CPU interrupts: |
none |
The actual IMEM is split into two design files: a plain entity definition (neorv32_imem.entity.vhd ) and the actual
architecture definition (mem/neorv32_imem.default.vhd ). This default architecture provides a generic and
platform independent memory design that (should) infers embedded memory block. You can replace/modify the architecture
source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping
and/or timing.
|
Implementation of the processor-internal instruction memory is enabled via the processor’s
MEM_INT_IMEM_EN generic. The size in bytes is defined via the MEM_INT_IMEM_SIZE generic. If the
IMEM is implemented, the memory is mapped into the instruction memory space and located right at the
beginning of the instruction memory space (default ispace_base_c
= 0x00000000).
By default, the IMEM is implemented as RAM, so the content can be modified during run time. This is required when using a bootloader that can update the content of the IMEM at any time. If you do not need the bootloader anymore - since your application development has completed and you want the program to permanently reside in the internal instruction memory - the IMEM is automatically implemented as pre-intialized ROM when the processor-internal bootloader is disabled (INT_BOOTLOADER_EN = false).
When the IMEM is implemented as ROM, it will be initialized during synthesis with the actual application
program image. The compiler toolchain will generate a VHDL initialization
file rtl/core/neorv32_application_image.vhd
, which is automatically inserted into the IMEM. If
the IMEM is implemented as RAM (default), the memory will not be initialized at all.
2.5.2. Data Memory (DMEM)
Hardware source file(s): |
neorv32_dmem.entity.vhd |
entity-only definition |
mem/neorv32_dmem.default.vhd |
default platform-agnostic memory architecture |
|
Software driver file(s): |
none |
implicitly used |
Top entity port: |
none |
|
Configuration generics: |
MEM_INT_DMEM_EN |
implement processor-internal DMEM when true |
MEM_INT_DMEM_SIZE |
DMEM size in bytes |
|
CPU interrupts: |
none |
The actual DMEM is split into two design files: a plain entity definition (neorv32_dmem.entity.vhd ) and the actual
architecture definition (mem/neorv32_dmem.default.vhd ). This default architecture provides a generic and
platform independent memory design that (should) infers embedded memory block. You can replace/modify the architecture
source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping
and/or timing.
|
Implementation of the processor-internal data memory is enabled via the processor’s MEM_INT_DMEM_EN
generic. The size in bytes is defined via the MEM_INT_DMEM_SIZE generic. If the DMEM is implemented,
the memory is mapped into the data memory space and located right at the beginning of the data memory
space (default dspace_base_c
= 0x80000000). The DMEM is always implemented as RAM.
2.5.3. Bootloader ROM (BOOTROM)
Hardware source file(s): |
neorv32_boot_rom.vhd |
|
Software driver file(s): |
none |
implicitly used |
Top entity port: |
none |
|
Configuration generics: |
INT_BOOTLOADER_EN |
implement processor-internal bootloader when true |
CPU interrupts: |
none |
The default neorv32_boot_rom.vhd HDL source file provides a generic memory design that infers embedded
memory for larger memory configurations. You might need to replace/modify the source file in order to use
platform-specific features (like advanced memory resources) or to improve technology mapping and/or timing.
|
This HDL modules provides a read-only memory that contain the executable code image of the bootloader. If the INT_BOOTLOADER_EN generic is true this module will be implemented and the CPU boot address is modified to directly execute the code from the bootloader ROM after reset.
The bootloader ROM is located at address 0xFFFF0000
and can occupy a address space of up to 32kB. The base
address as well as the maximum address space size are fixed and cannot (should not!) be modified as this
might address collision with other processor modules.
The bootloader memory is read-only and is automatically initialized with the bootloader executable image
rtl/core/neorv32_bootloader_image.vhd
during synthesis. The actual physical size of the ROM is also
determined via synthesis and expanded to the next power of two. For example, if the bootloader code requires
10kB of storage, a ROM with 16kB will be generated. The maximum size must not exceed 32kB.
Bootloader - Software
See section Bootloader for more information regarding the actual bootloader software/executable itself.
|
Boot Configuration
See section Boot Configuration for more information regarding the processor’s different boot scenarios.
|
2.5.4. Processor-Internal Instruction Cache (iCACHE)
Hardware source file(s): |
neorv32_icache.vhd |
|
Software driver file(s): |
none |
implicitly used |
Top entity port: |
none |
|
Configuration generics: |
ICACHE_EN |
implement processor-internal instruction cache when true |
ICACHE_NUM_BLOCKS |
number of cache blocks (pages/lines) |
|
ICACHE_BLOCK_SIZE |
size of a cache block in bytes |
|
ICACHE_ASSOCIATIVITY |
associativity / number of sets |
|
CPU interrupts: |
none |
The default neorv32_icache.vhd HDL source file provides a generic memory design that infers embedded
memory. You might need to replace/modify the source file in order to use platform-specific features
(like advanced memory resources) or to improve technology mapping and/or timing.
|
The processor features an optional cache for instructions to compensate memories with high latency. The cache is directly connected to the CPU’s instruction fetch interface and provides a full-transparent buffering of instruction fetch accesses to the entire 4GB address space.
The instruction cache is intended to accelerate instruction fetch via the external memory interface. Since all processor-internal memories provide an access latency of one cycle (by default), caching internal memories does not bring any performance gain. However, it might reduce traffic on the processor-internal bus. |
The cache is implemented if the ICACHE_EN generic is true. The size of the cache memory is defined via ICACHE_BLOCK_SIZE (the size of a single cache block/page/line in bytes; has to be a power of two and >= 4 bytes), ICACHE_NUM_BLOCKS (the total amount of cache blocks; has to be a power of two and >= 1) and the actual cache associativity ICACHE_ASSOCIATIVITY (number of sets; 1 = direct-mapped, 2 = 2-way set-associative, has to be a power of two and >= 1).
If the cache associativity (ICACHE_ASSOCIATIVITY) is > 1 the LRU replacement policy (least recently used) is used.
Keep the features of the targeted FPGA’s memory resources (block RAM) in mind when configuring the cache size/layout to maximize and optimize resource utilization. |
By executing the ifence.i
instruction (Zifencei
CPU extension) the cache is cleared and a reload from
main memory is forced. Among other things, this allows to implement self-modifying code.
Bus Access Fault Handling
The cache always loads a complete cache block (ICACHE_BLOCK_SIZE bytes) aligned to the size of a cache block if a miss is detected. If any of the accessed addresses within a single block do not successfully acknowledge (i.e. issuing an error signal or timing out) the whole cache block is invalidate and any access to an address within this cache block will also raise an instruction fetch bus error fault exception.
2.5.5. Processor-External Memory Interface (WISHBONE) (AXI4-Lite)
Hardware source file(s): |
neorv32_wishbone.vhd |
|
Software driver file(s): |
none |
implicitly used |
Top entity port: |
|
request tag output (3-bit) |
|
address output (32-bit) |
|
|
data input (32-bit) |
|
|
data output (32-bit) |
|
|
write enable (1-bit) |
|
|
byte enable (4-bit) |
|
|
strobe (1-bit) |
|
|
valid cycle (1-bit) |
|
|
exclusive access request (1-bit) |
|
|
acknowledge (1-bit) |
|
|
bus error (1-bit) |
|
|
an executed |
|
|
an executed |
|
Configuration generics: |
MEM_EXT_EN |
enable external memory interface when true |
MEM_EXT_TIMEOUT |
number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled) |
|
MEM_EXT_PIPE_MODE |
when false (default): classic/standard Wishbone protocol; when true: pipelined Wishbone protocol |
|
MEM_EXT_BIG_ENDIAN |
byte-order (Endianness) of external memory interface; true=BIG, false=little (default) |
|
MEM_EXT_ASYNC_RX |
use registered RX path when false (default); use async/direct RX path when true |
|
CPU interrupts: |
none |
The external memory interface uses the Wishbone interface protocol. The external interface port is available when the MEM_EXT_EN generic is true. This interface can be used to attach external memories, custom hardware accelerators additional IO devices or all other kinds of IP blocks. All memory accesses from the CPU, that do not target the internal bootloader ROM, the internal IO region or the internal data/instruction memories (if implemented at all) are forwarded to the Wishbone gateway and thus to the external memory interface.
When using the default processor setup, all access addresses between 0x00000000 and 0xffff0000 (= beginning of processor-internal BOOT ROM) are delegated to the external memory / bus interface if they are not targeting the (actually enabled/implemented) processor-internal instruction memory (IMEM) or the (actually enabled/implemented) processor-internal data memory (DMEM). See section Address Space for more information. |
Wishbone Bus Protocol
The external memory interface either uses standard ("classic") Wishbone transactions (default) or pipelined Wishbone transactions. The transaction protocol is configured via the MEM_EXT_PIPE_MODE generic:
When MEM_EXT_PIPE_MODE is false, all bus control signals including STB are active (and stable) until the transfer is acknowledged/terminated. If MEM_EXT_PIPE_MODE is true, all bus control except STB are active (and stable) until the transfer is acknowledged/terminated. In this case, STB is active only during the very first bus clock cycle.
![]() |
![]() |
Classic Wishbone read access |
Pipelined Wishbone write access |
A detailed description of the implemented Wishbone bus protocol and the according interface signals can be found in the data sheet "Wishbone B4 - WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores". A copy of this document can be found in the docs folder of this project.
Interface Latency
By default, the Wishbone gateway introduces two additional latency cycles: processor-outgoing ("TX") and processor-incoming ("RX") signals are fully registered. Thus, any access from the CPU to a processor-external devices via Wishbone requires 2 additional clock cycles (at least; depending on device’s latency).
If the attached Wishbone network / peripheral already provides output registers or if the Wishbone network is not relevant for timing closure, the default buffering of incoming ("RX") data within the gateway can be disabled by implementing an "asynchronous" RX path. The configuration is done via the MEM_EXT_ASYNC_RX generic.
Bus Access Timeout
The Wishbone bus interface provides an option to configure a bus access timeout counter. The MEM_EXT_TIMEOUT top generic is used to specify the maximum time (in clock cycles) a bus access can be pending before it is automatically terminated. If MEM_EXT_TIMEOUT is set to zero, the timeout disabled an a bus access can take an arbitrary number of cycles to complete.
When MEM_EXT_TIMEOUT is greater than zero, the WIshbone adapter starts an internal countdown whenever the CPU
accesses a memory address via the external memory interface. If the accessed memory / device does not acknowledge (via wb_ack_i
)
or terminate (via wb_err_i
) the transfer within MEM_EXT_TIMEOUT clock cycles, the bus access is automatically canceled
(setting wb_cyc_o
low again) and a load/store/instruction fetch bus access fault exception is raised.
This feature can be used as safety guard if the external memory system does not check for "address space holes". That means that addresses, which do not belong to a certain memory or device, do not permanently stall the processor due to an unacknowledged/unterminated bus access. If the external memory system can guarantee to access any bus access (even it targets an unimplemented address) the timeout feature should be disabled (MEM_EXT_TIMEOUT = 0). |
Wishbone Tag
The 3-bit wishbone wb_tag_o
signal provides additional information regarding the access type. This signal
is compatible to the AXI4 AxPROT signal.
-
wb_tag_o(0)
1: privileged access (CPU is in machine mode); 0: unprivileged access -
wb_tag_o(1)
always zero (indicating "secure access") -
wb_tag_o(2)
1: instruction fetch access, 0: data access
Exclusive / Atomic Bus Access
If the atomic memory access CPU extension (via CPU_EXTENSION_RISCV_A) is enabled, the CPU can request an atomic/exclusive bus access via the external memory interface.
The load-reservate instruction (lr.w
) will set the wb_lock_o
signal telling the bus interconnect to establish a
reservation for the current accessed address (start of an exclusive access). This signal will stay asserted until
another memory access instruction is executed (for example a sc.w
).
The memory system has to make sure that no other entity can access the reservated address until wb_lock_o
is released again. If this attempt fails, the memory system has to assert wb_err_i
in order to indicate that the
reservation was broken.
See section Bus Interface for the CPU bus interface protocol. |
Endianness
The NEORV32 CPU and the Processor setup are little-endian architectures. To allow direct connection to a big-endian memory system the external bus interface provides an Endianness configuration. The Endianness (of the external memory interface) can be configured via the MEM_EXT_BIG_ENDIAN generic. By default, the external memory interface uses little-endian byte-order (like the rest of the processor / CPU).
Application software can check the Endianness configuration of the external bus interface via the SYSINFO module (see section System Configuration Information Memory (SYSINFO) for more information).
AXI4-Lite Connectivity
The AXI4-Lite wrapper (rtl/system_integration/neorv32_SystemTop_axi4lite.vhd
) provides a Wishbone-to-
AXI4-Lite bridge, compatible with Xilinx Vivado (IP packager and block design editor). All entity signals of
this wrapper are of type std_logic or std_logic_vector, respectively.
The AXI Interface has been verified using Xilinx Vivado IP Packager and Block Designer. The AXI interface port signals are automatically detected when packaging the core.

Using the auto-termination timeout feature (MEM_EXT_TIMEOUT greater than zero) is not AXI4 compliant as the AXI protocol does not support canceling of
bus transactions. Therefore, the NEORV32 top wrapper with AXI4-Lite interface (rtl/system_integration/neorv32_SystemTop_axi4lite ) configures MEM_EXT_TIMEOUT = 0 by default.
|
2.5.6. Internal Bus Monitor (BUSKEEPER)
Hardware source file(s): |
neorv32_buskeeper.vhd |
|
Software driver file(s): |
none |
explicitly used |
Top entity port: |
none |
|
Configuration generics: |
none |
|
Package constants: |
|
Access time window (#cycles) |
CPU interrupts: |
none |
Theory of Operation
The Bus Keeper is a fundamental component of the processor’s internal bus system that ensures correct bus operations to maintain execution safety. The Bus Keeper monitors every single bus transactions that is intimated by the CPU. If an accessed device responds with an error condition or do not respond within a specific access time window, the according bus access fault exception is raised. The following exceptions can be raised by the Bus Keeper (see section NEORV32 Trap Listing for all CPU exceptions):
-
TRAP_CODE_I_ACCESS
: error during instruction fetch bus access -
TRAP_CODE_S_ACCESS
: error during data store bus access -
TRAP_CODE_L_ACCESS
: error during data load bus access
The access time window, in which an accessed device has to respond, is defined by the max_proc_int_response_time_c
constant from the processor’s VHDL package file (rtl/neorv32_package.vhd
). The default value is 15 clock cycles.
In case of a bus access fault exception application software can evaluate the Bus Keeper’s control register
NEORV32_BUSKEEPER.CTRL
to retrieve further details of the bus exception. The BUSKEEPER_ERR_FLAG bit indicates
that an actual bus access fault has occurred. The bit is sticky once set is automatically cleared when reading the
NEORV32_BUSKEEPER.CTRL
register. The BUSKEEPER_ERR_TYPE indicated the tape or bus fault:
-
BUSKEEPER_ERR_TYPE =
0
- "Device Error": The bus access exception was cause by the memory-mapped device that has been accessed (the device asserted it’serr_o
). -
BUSKEEPER_ERR_TYPE =
1
- "Timeout Error": The bus access exception was caused by the Bus Keeper because the accessed memory-mapped device did not respond within the access time window.
Bus access fault exceptions are also raised if a physical memory protection rule is violated. In this case the BUSKEEPER_ERR_FLAG bit remains zero. |
Furthermore, application software can determine the source of the bus access fault via the BUSKEEPER_ERR_SRC bit:
-
BUSKEEPER_ERR_SRC =
0
: The error was cause during access via the Processor-External Memory Interface (WISHBONE) (AXI4-Lite)). -
BUSKEEPER_ERR_SRC =
1
: The error was cause during access to an processor-internal module.
The Bus Keeper does not track timeout errors of processor-external accesses via the external memory bus interface. However, the external memory bus interface also provides an optional and independent bus timeout feature (see section Processor-External Memory Interface (WISHBONE) (AXI4-Lite)). |
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/- |
Bus error type, valid if BUSKEEPER_ERR_FLAG is set: |
|
r/- |
Error source: |
||
|
r/- |
Sticky error flag, clears after read |
2.5.7. Stream Link Interface (SLINK)
Hardware source file(s): |
neorv32_slink.vhd |
|
Software driver file(s): |
neorv32_slink.c |
|
neorv32_slink.h |
||
Top entity port: |
|
TX link data (8x32-bit) |
|
TX link data valid (8-bit) |
|
|
TX link allowed to send (8-bit) |
|
|
RX link data (8x32-bit) |
|
|
RX link data valid (8-bit) |
|
|
RX link ready to receive (8-bit) |
|
Configuration generics: |
SLINK_NUM_TX |
Number of TX links to implement (0..8) |
SLINK_NUM_RX |
Number of RX links to implement (0..8) |
|
SLINK_TX_FIFO |
FIFO depth (1..32k) of TX links, has to be a power of two |
|
SLINK_RX_FIFO |
FIFO depth (1..32k) of RX links, has to be a power of two |
|
CPU interrupts: |
fast IRQ channel 10 |
SLINK RX IRQ (see Processor Interrupts) |
fast IRQ channel 11 |
SLINK TX IRQ (see Processor Interrupts) |
The SLINK component provides up to 8 independent RX (receiving) and TX (sending) links for transmitting stream data. The interface provides higher bandwidth (and less latency) than the external memory bus interface, which makes it ideally suited to couple custom stream processing units (like CORDIC, FFTs or cryptographic accelerators).
Each individual link provides an internal FIFO for data buffering. The FIFO depth is globally defined for all TX links via the SLINK_TX_FIFO generic and for all RX links via the SLINK_RX_FIFO generic. The FIFO depth has to be at least 1, which will implement a simple input/output register. The maximum value is limited to 32768 entries. Note that the FIFO depth has to be a power of two (for optimal logic mapping).
The actual number of implemented RX/TX links is configured by the SLINK_NUM_RX and SLINK_NUM_TX generics. The SLINK module will be synthesized only if at least one of these generics is greater than zero. All unimplemented links are internally terminated and their according output signals are pulled to low level.
The SLINK interface does not provide any additional tag signals (for example to define a "stream destination address" or to indicate the last data word of a "package"). Use a custom controller connected via the external memory bus interface or use some of the processor’s GPIO ports to implement custom data tag signals. |
Theory of Operation
The SLINK provides eight data registers (DATA[i]
) to access the links (read accesses will access the RX links, write
accesses will access the TX links), one control register (CTRL
) and one status register (STATUS
).
The SLINK is globally activated by setting the control register’s enable bit SLINK_CTRL_EN.
The actual data links are accessed by reading or writing the according link data registers DATA[0]
to DATA[7]
. For example, writing the DATA[0]
will put the according data into the FIFO of TX link 0.
Accordingly, reading from DATA[0]
will return one data word from the FIFO of RX link 0.
The configuration (done via the SLINK generics) can be checked by software by evaluating bit fields in the control register. The SLINK_CTRL_TX_FIFO_Sx and SLINK_CTRL_RX_FIFO_Sx indicate the TX & RX FIFO sizes. The SLINK_CTRL_TX_NUMx and SLINK_CTRL_RX_NUMx bits represent the absolute number of implemented TX and RX links.
The status register shows the FIFO status flags of each RX and TX link. The SLINK_CTRL_RXx_AVAIL flags indicate that there is at least one data word in the according RX link’s FIFO. The SLINK_CTRL_TXx_FREE flags indicate there is at least one free entry in the according TX link’s FIFO. The SLINK_STATUS_RXx_HALF and SLINK_STATUS_RXx_HALF flags show if a certain FIFO’s fill level has exceeded half of its capacity.
Blocking Link Access
When directly accessing the link data registers (without checking the according FIFO status flags) the access
is as blocking. That means the CPU access will stall until the accessed link responds. For
example, when reading RX link 0 (via DATA[0]
register) the CPU will stall, if there is not data
available in the according FIFO yet. The CPU access will complete as soon as RX link 0 receives new data.
Vice versa, writing data to TX link 0 (via DATA[0]
register) will stall the CPU access until there is
at least one free entry in the link’s FIFO.
The NEORV32 processor ensures that any CPU access to memory-mapped devices (including the SLINK module) will time out after a certain number of cycles (see section Bus Interface). Hence, blocking access to a stream link that does not complete within a certain amount of cycles will raise a store bus access exception when writing a full TX link or a load bus access exception when reading from an empty RX link. Hence, this concept should only be used when evaluating the half-full FIFO condition (for example via the SLINK interrupts) before actual accessing links. |
Non-Blocking Link Access
For a non-blocking link access concept, the FIFO status flags in STATUS
need to be checked before
reading/writing the actual link data register. For example, a non-blocking write access to a TX link 0 has
to check SLINK_STATUS_TX0_FREE first. If the bit is set, the FIFO of TX link 0 can take another data word
and the actual data can be written to DATA[0]
. If the bit is cleared, the link’s FIFO is full
and the status flag can be polled until it there is free space in the available.
This concept will not raise any exception as there is no "direct" access to the link data registers. However, non-blocking accesses require additional instructions to check the according status flags prior to the actual link access, which will reduce performance for high-bandwidth data streams.
Stream Link Interface & Protocol
The SLINK interface consists of three signals dat
, val
and rdy
for each RX and TX link.
Each signal is an "array" with eight entires (one for each link). Note that an entry in slink_*x_dat
is 32-bit
wide while entries in slink_*x_val
and slink_*x_rdy
are are just 1-bit wide.
The stream link protocol is based on a simple FIFO-like interface between a source (sender) and a sink (receiver).
Each link provides two signals for implementing a simple FIFO-style handshake. The slink_*x_val
signal is set by
the source if the according slink_*x_dat
(also set by the source) contains valid data. The stream source has to
ensure that both signals remain stable until the according slink_*x_rdy
signal is set by the stream sink to
indicate it can accept another data word.
In summary, a data word is transferred if both slink_*x_val(i)
and slink_*x_rdy(i)
are high.

The SLINK handshake protocol is compatible with the AXI4-Stream base protocol. |
Interrupts
The stream interface provides two independent interrupts that are globally driven by the RX and TX link’s
FIFO fill level status. Each RX and TX link provides an individual interrupt enable flag and an individual
interrupt type flag that allows to configure interrupts only for certain (or all) links and for application-
specific interrupt conditions. The interrupt configuration is done using the NEORV32_SLINK.IRQ
register.
Any interrupt can only become pending if the SLINK module is enabled at all.
The current FIFO fill-level of a specific RX link can only raise an interrupt request if it’s interrupt enable flag SLINK_IRQ_RX_EN is set. Vice versa, the current FIFO fill-level of a specific TX link can only raise an interrupt request if it’s interrupt enable flag SLINK_IRQ_TX_EN is set.
The RX link’s SLINK_IRQ_RX_MODE flags define the FIFO fill-level condition for raising an RX interrupt request:
* If a link’s interrupt mode flag is 1
an IRQ is generated when the link’s FIFO is not empty ("RX data available").
* If a link’s interrupt mode flag is 0
an IRQ is generated when the link’s FIFO is at least half-full ("time to get data from RX FIFO to prevent overflow").
The TX link’s SLINK_IRQ_TX_MODE flags define the FIFO fill-level condition for raising an TX interrupt request:
* If a link’s interrupt mode flag is 1
an IRQ is generated when the link’s FIFO is not full ("space left in FIFO for new TX data").
* If a link’s interrupt mode flag is 0
an IRQ is generated when the link’s FIFO is less than half-full ("SW can send SLINK_TX_FIFO/2 data words without checking any flags").
If SLINK_RX_FIFO is 1 the SLINK_IRQ_RX_MODE bits are hardwired to one. If SLINK_TX_FIFO is 1 the SLINK_IRQ_TX_MODE bits are hardwired to one. |
There is no RX FIFO overflow mechanism available yet. |
If any configured interrupt condition is fulfilled, the according global SLINK RX / SLINK TX CPU interrupt becomes pending. If the interrupt enable flags of several links are set, the interrupt service handler has to evaluate the SLINK status register is order to detect which link(s) caused the interrupt.
If the programmed interrupt condition is fulfilled, the corresponding IRQ will become pending until the causing interrupt conditions is resolved (for example by reading data from the according RX FIFO). |
Address | Name [C] | Bit(s) | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
SLINK global enable |
|
r/- |
reserved, read as zero |
||
|
r/- |
TX links FIFO depth, log2 of_SLINK_TX_FIFO_ generic |
||
|
r/- |
RX links FIFO depth, log2 of_SLINK_RX_FIFO_ generic |
||
|
r/- |
Number of implemented TX links |
||
|
r/- |
Number of implemented RX links |
||
|
- |
|
r/- |
reserved |
|
|
|
r/w |
RX interrupt enable for link 7..0 |
|
r/w |
RX IRQ mode for link 7..0: |
||
|
r/w |
TX interrupt enable for link 7..0 |
||
|
r/w |
TX IRQ mode for link 7..0: |
||
|
- |
|
r/- |
reserved |
|
|
|
r/- |
TX link 7..0 FIFO fill level is >= half-full |
|
r/- |
RX link 7..0 FIFO fill level is >= half-full |
||
|
r/- |
At least one free TX FIFO entry available for link 7..0 |
||
|
r/- |
At least one data word in RX FIFO available for link 7..0 |
||
|
- |
|
r/- |
reserved |
|
|
|
r/w |
Link 0 RX/TX data |
|
|
|
r/w |
Link 1 RX/TX data |
|
|
|
r/w |
Link 2 RX/TX data |
|
|
|
r/w |
Link 3 RX/TX data |
|
|
|
r/w |
Link 4 RX/TX data |
|
|
|
r/w |
Link 5 RX/TX data |
|
|
|
r/w |
Link 6 RX/TX data |
|
|
|
r/w |
Link 7 RX/TX data |
2.5.8. General Purpose Input and Output Port (GPIO)
Hardware source file(s): |
neorv32_gpio.vhd |
|
Software driver file(s): |
neorv32_gpio.c |
|
neorv32_gpio.h |
||
Top entity port: |
|
64-bit parallel output port |
|
64-bit parallel input port |
|
Configuration generics: |
IO_GPIO_EN |
implement GPIO port when true |
CPU interrupts: |
none |
Theory of Operation
The general purpose parallel IO port unit provides a simple 64-bit parallel input port and a 64-bit parallel output port. These ports can be used chip-externally (for example to drive status LEDs, connect buttons, etc.) or system-internally to provide control signals for other IP modules. The component is disabled for implementation when the IO_GPIO_EN generic is set false. In this case GPIO output port is tied to all-zero.
Access atomicity
The GPIO modules uses two memory-mapped registers (each 32-bit) each for accessing the input and
output signals. Since the CPU can only process 32-bit "at once" updating the entire output cannot
be performed within a single clock cycle.
|
Address | Name [C] | Bit(s) | R/W | Function |
---|---|---|---|---|
|
|
31:0 |
r/- |
parallel input port pins 31:0 (write accesses are ignored) |
|
|
31:0 |
r/- |
parallel input port pins 63:32 (write accesses are ignored) |
|
|
31:0 |
r/w |
parallel output port pins 31:0 |
|
|
31:0 |
r/w |
parallel output port pins 63:32 |
2.5.9. Watchdog Timer (WDT)
Hardware source file(s): |
neorv32_wdt.vhd |
|
Software driver file(s): |
neorv32_wdt.c |
|
neorv32_wdt.h |
||
Top entity port: |
none |
|
Configuration generics: |
IO_WDT_EN |
implement GPIO port when true |
CPU interrupts: |
fast IRQ channel 0 |
watchdog timer overflow (see Processor Interrupts) |
Theory of Operation
The watchdog (WDT) provides a last resort for safety-critical applications. The WDT has an internal 20-bit wide counter that needs to be reset every now and then by the user program. If the counter overflows, either a system reset or an interrupt is generated (depending on the configured operation mode).
Configuration of the watchdog is done by a single control register CTRL
. The watchdog is enabled by
setting the WDT_CTRL_EN bit. The clock used to increment the internal counter is selected via the 3-bit
WDT_CTRL_CLK_SELx prescaler:
WDT_CTRL_CLK_SELx |
Main clock prescaler | Timeout period in clock cycles |
---|---|---|
|
2 |
2 097 152 |
|
4 |
4 194 304 |
|
8 |
8 388 608 |
|
64 |
67 108 864 |
|
128 |
134 217 728 |
|
1024 |
1 073 741 824 |
|
2048 |
2 147 483 648 |
|
4096 |
4 294 967 296 |
Whenever the internal timer overflows the watchdog executes one of two possible actions: Either a hard processor reset is triggered or an interrupt is requested at CPU’s fast interrupt channel #0. The WDT_CTRL_MODE bit defines the action to be taken on an overflow: When cleared, the Watchdog will assert an IRQ, when set the WDT will cause a system reset. The configured action can also be triggered manually at any time by setting the WDT_CTRL_FORCE bit. The watchdog is reset by setting the WDT_CTRL_RESET bit.
A watchdog interrupt can only occur if the watchdog is enabled and interrupt mode is enabled. A pending interrupt is cleared by either disabling the watchdog or by resetting the watchdog.
The cause of the last action of the watchdog can be determined via the WDT_CTRL_RCAUSE flag. If this flag is zero, the processor has been reset via the external reset signal. If this flag is set the last system reset was initiated by the watchdog.
The Watchdog control register can be locked in order to protect the current configuration. The lock is activated by setting bit WDT_CTRL_LOCK. In the locked state any write access to the configuration flags is ignored (see table below, "accessible if locked"). Read accesses to the control register are not effected. The lock can only be removed by a system reset (via external reset signal or via a watchdog reset action).
Address | Name [C] | Bit(s), Name [C] | R/W | Writable if locked | Function |
---|---|---|---|---|---|
|
|
|
r/w |
no |
watchdog enable |
|
r/w |
no |
3-bit clock prescaler select |
||
|
r/w |
no |
|||
|
r/w |
no |
|||
|
r/w |
no |
overflow action: |
||
|
r/- |
- |
cause of last system reset: |
||
|
-/w |
yes |
watchdog reset when set, auto-clears |
||
|
-/w |
yes |
force configured watchdog action when set, auto-clears |
||
|
r/w |
no |
lock access to configuration when set, clears only on system reset (via external reset signal OR watchdog reset action = reset) |
2.5.10. Machine System Timer (MTIME)
Hardware source file(s): |
neorv32_mtime.vhd |
|
Software driver file(s): |
neorv32_mtime.c |
|
neorv32_mtime.h |
||
Top entity port: |
|
System time input from external MTIME |
|
System time output (64-bit) for SoC |
|
Configuration generics: |
IO_MTIME_EN |
implement MTIME when true |
CPU interrupts: |
|
machine timer interrupt (see Processor Interrupts) |
Theory of Operation
The MTIME machine system timer implements the memory-mapped MTIME timer from the official RISC-V
specifications. This unit features a 64-bit system timer incremented with the primary processor clock.
The current system time can also be obtained using the time[h]
CSRs and is made available for processor-external
use via the top’s mtime_o
signal.
If the processor-internal MTIME unit is NOT implemented, the top’s mtime_i input signal is used to update the time[h] CSRs
and the MTI machine timer CPU interrupt (MTI ) is directly connected to the top’s mtime_irq_i input.
|
The 64-bit system time can be accessed via the TIME_LO
and TIME_HI
memory-mapped registers (read/write) and also via
the CPU’s time[h]
CSRs (read-only). A 64-bit time compare register - accessible via memory-mapped TIMECMP_LO
and TIMECMP_HI
registers - is used to configure an interrupt to the CPU. The interrupt is triggered
whenever TIME
(high & low part) >= TIMECMP
(high & low part) and is directly forwarded to the CPU’s MTI
interrupt.
The interrupt remain active (=pending) until TIME
< TIMECMP
(either by modifying TIME
or TIMECMP
).
Address | Name [C] | Bits | R/W | Function |
---|---|---|---|---|
|
|
31:0 |
r/w |
machine system time, low word |
|
|
31:0 |
r/w |
machine system time, high word |
|
|
31:0 |
r/w |
time compare, low word |
|
|
31:0 |
r/w |
time compare, high word |
2.5.11. Primary Universal Asynchronous Receiver and Transmitter (UART0)
Hardware source file(s): |
neorv32_uart.vhd |
|
Software driver file(s): |
neorv32_uart.c |
|
neorv32_uart.h |
||
Top entity port: |
|
serial transmitter output UART0 |
|
serial receiver input UART0 |
|
|
flow control: RX ready to receive |
|
|
flow control: TX allowed to send |
|
Configuration generics: |
IO_UART0_EN |
implement UART0 when true |
UART0_RX_FIFO |
RX FIFO depth (power of 2, min 1) |
|
UART0_TX_FIFO |
TX FIFO depth (power of 2, min 1) |
|
CPU interrupts: |
fast IRQ channel 2 |
RX interrupt |
fast IRQ channel 3 |
TX interrupt (see Processor Interrupts) |
The UART is a standard serial interface mainly used to establish a communication channel between a host computer computer/user and an application running on the embedded processor.
The NEORV32 UARTs feature independent transmitter and receiver with a fixed frame configuration of 8 data bits, an optional parity bit (even or odd) and a fixed stop bit. The actual transmission rate - the Baudrate - is programmable via software. Optional FIFOs with custom sizes can be configured for the transmitter and receiver independently.
The UART features two memory-mapped registers CTRL
and DATA
, which are used for configuration, status
check and data transfer.
Please note that ALL default example programs and software libraries of the NEORV32 software framework (including the bootloader and the runtime environment) use the primary UART (UART0) as default user console interface. |
Theory of Operation
UART0 is enabled by setting the UART_CTRL_EN bit in the UART0 control register CTRL
. The Baud rate
is configured via a 12-bit UART_CTRL_BAUDxx baud prescaler (baud_prsc
) and a 3-bit UART_CTRL_PRSCx
clock prescaler (clock_prescaler
) that scales the processor’s primary clock (fmain).
UART_CTRL_PRSCx |
0b000 |
0b001 |
0b010 |
0b011 |
0b100 |
0b101 |
0b110 |
0b111 |
---|---|---|---|---|---|---|---|---|
Resulting |
2 |
4 |
8 |
64 |
128 |
1024 |
2048 |
4096 |
Baud rate = (fmain[Hz] / clock_prescaler
) / (baud_prsc
+ 1)
A new transmission is started by writing the data byte to be send to the lowest byte of the DATA
register. The
transfer is completed when the UART_CTRL_TX_BUSY control register flag returns to zero. A new received byte
is available when the UART_DATA_AVAIL flag of the DATA
register is set. A "frame error" in a received byte
(invalid stop bit) is indicated via the UART_DATA_FERR flag in the DATA
register. The flag is cleared by
reading the DATA
register.
A transmission (RX or TX) can be terminated at any time by disabling the UART module by clearing the UART_CTRL_EN control register bit. |
RX and TX FIFOs
UART0 provides optional FIFO buffers for the transmitter and the receiver. The UART0_RX_FIFO generic defines
the depth of the RX FIFO (for receiving data) while the UART0_TX_FIFO defines the depth of the TX FIFO
(for sending data). Both generics have to be a power of two with a minimal allowed value of 1. This minimal
value will implement simple "double-buffering" instead of full-featured FIFOs.
Both FIFOs are cleared whenever UART0 is disabled (clearing UART_CTRL_EN in CTRL
).
The state of both FIFO (empty, at lest half-full, full) is available via the UART_CTRL?X_EMPTY_,
UART_CTRL?X_HALF_ and UART_CTRL*X_FULL_ flags in the CTRL
register.
If the RX FIFO is already full and new data is received by the receiver unit, the UART_DATA_OVERR flag
in the DATA
register is set indicating an "overrun". This flag is cleared by reading the DATA
register.
In contrast to other FIFO-equipped peripherals, software cannot determine the UART’s FIFO size configuration by reading specific control register bits (simply because there are no bits left in the control register). |
Hardware Flow Control - RTS/CTS
UART0 supports optional hardware flow control using the standard CTS (clear to send) and/or RTS (ready to send / ready to receive "RTR") signals. Both hardware control flow mechanisms can be enabled individually.
-
If RTS hardware flow control is enabled by setting the UART_CTRL_RTS_EN control register flag, the UART will pull the
uart0_rts_o
signal low if the UART’s receiver is ready to receive new data. As long as this signal is low the connected device can send new data.uart0_rts_o
is always LOW if the UART is disabled. The RTS line is de-asserted (going high) as soon as the start bit of a new incoming char has been detected. -
If CTS hardware flow control is enabled by setting the UART_CTRL_CTS_EN control register flag, the UART’s transmitter will not start sending a new data until the
uart0_cts_i
signal goes low. During this time, the UART busy flag UART_CTRL_TX_BUSY remains set. Ifuart0_cts_i
is asserted, no new data transmission will be started by the UART. The state of theuart0_cts_i
signal has no effect on a transmission being already in progress. Application software can check the current state of theuart0_cts_o
input signal via the UART_CTRL_CTS control register flag.
Parity Modes
An optional parity bit can be added to the data stream if the UART_CTRL_PMODE1 flag is set.
When UART_CTRL_PMODE0 is zero, the UART operates in "even parity" mode. If this flag is set, the UART operates in "odd parity" mode.
Parity errors in received data are indicated via the UART_DATA_PERR flag in the DATA
register. This flag is updated with each new
received character and is cleared by reading the DATA
register.
Interrupts
UART0 features two independent interrupt for signaling certain RX and TX conditions. The behavior of these interrupts differ
based on the configured FIFO size. If the according FIFO size is greater than 1, the UART_CTRL_RX_IRQ and UART_CTRL_TX_IRQ
CTRL
flags allow a more fine-grained IRQ configuration.
-
If UART0_RX_FIFO is exactly 1, the RX interrupt becomes pending as soon as there is data available in the RX FIFO (→ UART_CTRL_RX_EMPTY clears). This flag is hardwired to
0
if UART0_RX_FIFO = 1. -
If UART0_TX_FIFO is exactly 1, the TX interrupt becomes pending as soon as there is a free entry left in the TX FIFO (→ UART_CTRL_TX_FULL clears). This flag is hardwired to
0
if UART0_RX_FIFO = 1. -
If UART0_RX_FIFO is greater than 1: If UART_CTRL_RX_IRQ is
0
the RX interrupt becomes pending as soon as there is data available in the RX FIFO (→ UART_CTRL_RX_EMPTY clears). If UART_CTRL_RX_IRQ is1
the RX interrupt becomes pending as soon as the RX FIFO is at least half-full (→ UART_CTRL_RX_HALF sets). -
If UART0_TX_FIFO is greater than 1: If UART_CTRL_TX_IRQ is
0
the TX interrupt becomes pending as soon as there is a free entry left in the TX FIFO (→ UART_CTRL_TX_FULL clears). If UART_CTRL_TX_IRQ is1
the TX interrupt becomes pending as soon as the RX FIFO is less than half-full (→ UART_CTRL_TX_HALF clears).
An interrupt can only become pending if the according interrupt condition is fulfilled and the UART is enabled at all. A pending interrupt is removed by resolving the interrupt-triggering conditions (for example by reading data from the more-than-half-full RX FIFO).
Simulation Mode
The default UART0 operation will transmit any data written to the DATA
register via the serial TX line at
the defined baud rate via the physical link. To accelerate UART0 output during simulation
(and also to dump large amounts of data) the UART0 features a simulation mode.
Simulation mode is enabled by setting the UART_CTRL_SIM_MODE bit in the UART0’s control register
CTRL
. Any other UART0 configuration bits are irrelevant for this mode but UART0 has to be enabled via the
UART_CTRL_EN bit. There will be no physical UART0 transmissions via uart0_txd_o
at all when
simulation mode is enabled. Furthermore, no interrupts (RX & TX) will be triggered.
When the simulation mode is enabled any data written to DATA[7:0]
is
directly output as ASCII char to the simulator console. Additionally, all chars are also stored to a text file
neorv32.uart0.sim_mode.text.out
in the simulation home folder.
Furthermore, the whole 32-bit word written to DATA[31:0]
is stored as plain 8-char hexadecimal value to a
second text file neorv32.uart0.sim_mode.data.out
also located in the simulation home folder.
More information regarding the simulation-mode of the UART0 can be found in the User Guide section Simulating the Processor. |
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
12-bit BAUD value configuration value |
|
r/w |
enable simulation mode |
||
|
r/- |
RX FIFO is empty |
||
|
r/- |
RX FIFO is at least half-full |
||
|
r/- |
RX FIFO is full |
||
|
r/- |
TX FIFO is empty |
||
|
r/- |
TX FIFO is at least half-full |
||
|
r/- |
TX FIFO is full |
||
|
r/- |
reserved, read as zero |
||
|
r/w |
enable RTS hardware flow control |
||
|
r/w |
enable CTS hardware flow control |
||
|
r/w |
parity bit enable and configuration ( |
||
|
r/w |
|||
|
r/w |
3-bit baudrate clock prescaler select |
||
|
r/w |
|||
|
r/w |
|||
|
r/- |
current state of UART’s CTS input signal |
||
|
r/w |
UART enable |
||
|
r/w |
RX IRQ mode: |
||
|
r/w |
TX IRQ mode: |
||
|
r/- |
transmitter busy flag |
||
|
|
|
r/w |
receive/transmit data (8-bit) |
|
-/w |
simulation data output |
||
|
r/- |
RX parity error |
||
|
r/- |
RX data frame error (stop bit nt set) |
||
|
r/- |
RX data overrun |
||
|
r/- |
RX data available when set |
2.5.12. Secondary Universal Asynchronous Receiver and Transmitter (UART1)
Hardware source file(s): |
neorv32_uart.vhd |
|
Software driver file(s): |
neorv32_uart.c |
|
neorv32_uart.h |
||
Top entity port: |
|
serial transmitter output UART1 |
|
serial receiver input UART1 |
|
|
flow control: RX ready to receive |
|
|
flow control: TX allowed to send |
|
Configuration generics: |
IO_UART1_EN |
implement UART1 when true |
UART1_RX_FIFO |
RX FIFO depth (power of 2, min 1) |
|
UART1_TX_FIFO |
TX FIFO depth (power of 2, min 1) |
|
CPU interrupts: |
fast IRQ channel 4 |
RX interrupt |
fast IRQ channel 5 |
TX interrupt (see Processor Interrupts) |
Theory of Operation
The secondary UART (UART1) is functional identical to the primary UART (Primary Universal Asynchronous Receiver and Transmitter (UART0)).
Obviously, UART1 has different addresses for the control register (CTRL
) and the data register (DATA
) - see the register map below.
The register’s bits/flags use the same bit positions and naming as for the primary UART. The RX and TX interrupts of UART1 are
mapped to different CPU fast interrupt (FIRQ) channels.
Simulation Mode
The secondary UART (UART1) provides the same simulation options as the primary UART. However,
output data is written to UART1-specific files: neorv32.uart1.sim_mode.text.out
is used to store
plain ASCII text and neorv32.uart1.sim_mode.data.out
is used to store full 32-bit hexadecimal
data words.
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
12-bit BAUD value configuration value |
|
r/w |
enable simulation mode |
||
|
r/- |
RX FIFO is empty |
||
|
r/- |
RX FIFO is at least half-full |
||
|
r/- |
RX FIFO is full |
||
|
r/- |
TX FIFO is empty |
||
|
r/- |
TX FIFO is at least half-full |
||
|
r/- |
TX FIFO is full |
||
|
r/- |
reserved, read as zero |
||
|
r/w |
enable RTS hardware flow control |
||
|
r/w |
enable CTS hardware flow control |
||
|
r/w |
parity bit enable and configuration ( |
||
|
r/w |
|||
|
r/w |
3-bit baudrate clock prescaler select |
||
|
r/w |
|||
|
r/w |
|||
|
r/- |
current state of UART’s CTS input signal |
||
|
r/w |
UART enable |
||
|
r/w |
RX IRQ mode: |
||
|
r/w |
TX IRQ mode: |
||
|
r/- |
transmitter busy flag |
||
|
|
|
r/w |
receive/transmit data (8-bit) |
|
-/w |
simulation data output |
||
|
r/- |
RX parity error |
||
|
r/- |
RX data frame error (stop bit nt set) |
||
|
r/- |
RX data overrun |
||
|
r/- |
RX data available when set |
2.5.13. Serial Peripheral Interface Controller (SPI)
Hardware source file(s): |
neorv32_spi.vhd |
|
Software driver file(s): |
neorv32_spi.c |
|
neorv32_spi.h |
||
Top entity port: |
|
1-bit serial clock output |
|
1-bit serial data output |
|
|
1-bit serial data input |
|
|
8-bit dedicated chip select (low-active) |
|
Configuration generics: |
IO_SPI_EN |
implement SPI controller when true |
CPU interrupts: |
fast IRQ channel 6 |
transmission done interrupt (see Processor Interrupts) |
Theory of Operation
SPI is a synchronous serial transmission interface for fast on-board communications.
The NEORV32 SPI transceiver supports 8-, 16-, 24- and 32-bit wide transmissions.
The unit provides 8 dedicated chip select signals via the top entity’s spi_csn_o
signal, which are
directly controlled by the SPI module (no additional GPIO required).
The NEORV32 SPI module only supports host mode. Transmission are initiated only by the processor’s SPI module (and not by an external SPI module). |
The SPI unit is enabled by setting the SPI_CTRL_EN bit in the CTRL
control register. No transfer can be initiated
and no interrupt request will be triggered if this bit is cleared. Furthermore, a transfer being in process
can be terminated at any time by clearing this bit.
Changes to the CTRL control register should be made only when the SPI module is idle as they directly effect
transmissions being in-progress.
|
A transmission can be terminated at any time by disabling the SPI module by clearing the SPI_CTRL_EN control register bit. |
The data quantity to be transferred within a single transmission is defined via the SPI_CTRL_SIZEx bits.
The SPI module supports 8-bit (00
), 16-bit (01
), 24-bit (10
) and 32-bit (11
) transfers.
A transmission is started when writing data to the DATA
register. The data must be LSB-aligned. So if
the SPI transceiver is configured for less than 32-bit transfers data quantity, the transmit data must be placed
into the lowest 8/16/24 bit of DATA
. Vice versa, the received data is also always LSB-aligned. Application
software should only actually process the amount of bits that were configured using SPI_CTRL_SIZEx when
reading DATA
.
The NEORV32 SPI module only support MSB-first mode. Data can be reversed before writing DATA (for TX) / after
reading DATA (for RX) to implement LSB-first transmissions. Note that in both cases data in ` DATA` still
needs to be LSB-aligned.
|
The actual transmission length is left to the user: after asserting chip-select an arbitrary amount of transmission with arbitrary data quantity (SPI_CTRL_SIZEx) can be made before de-asserting chip-select again. |
The SPI controller features 8 dedicated chip-select lines. These lines are controlled via the control register’s
SPI_CTRL_CSx bits. When a specific SPI_CTRL_CSx bit is set, the according chip-select line spi_csn_o(x)
goes low (low-active chip-select lines).
The dedicated SPI chip-select signals can be seen as general purpose outputs. These are intended to control the accessed device’s chip-select signal but can also be use for controlling other shift register signals (like data strobe or output-enables). |
SPI Clock Configuration
The SPI module supports all standard SPI clock modes (0, 1, 2, 3), which is via the two control register bits SPI_CTRL_CPHA and SPI_CTRL_CPOL. The SPI_CTRL_CPHA bit defines the clock phase and the SPI_CTRL_CPOL bit defines the clock polarity.

Mode 0 | Mode 1 | Mode 2 | Mode 4 | |
---|---|---|---|---|
SPI_CTRL_CPOL |
|
|
|
|
SPI_CTRL_CPHA |
|
|
|
|
The SPI clock frequency (spi_sck_o
) is programmed by the 3-bit SPI_CTRL_PRSCx clock prescaler.
The following prescalers are available:
SPI_CTRL_PRSCx |
0b000 |
0b001 |
0b010 |
0b011 |
0b100 |
0b101 |
0b110 |
0b111 |
---|---|---|---|---|---|---|---|---|
Resulting |
2 |
4 |
8 |
64 |
128 |
1024 |
2048 |
4096 |
Based on the SPI_CTRL_PRSCx configuration, the actual SPI clock frequency fSPI is derived from the processor’s main clock fmain and is determined by:
fSPI = fmain[Hz] / (2 * clock_prescaler
)
Hence, the maximum SPI clock is fmain / 4.
SPI Interrupt
The SPI module provides a single interrupt to signal "ready for new transmission" to the CPU. Whenever the SPI module is currently idle (and enabled), the interrupt request is active. A pending interrupt request is cleared by triggering a new SPI transmission or by disabling the SPI module.
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
Direct chip-select 0..7; setting |
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
SPI enable |
||
|
r/w |
clock phase ( |
||
|
r/w |
3-bit clock prescaler select |
||
|
r/w |
|||
|
r/w |
|||
|
r/w |
transfer size ( |
||
|
r/w |
|||
|
r/w |
clock polarity |
||
|
r/- |
_reserved, read as zero |
||
|
r/- |
transmission in progress when set |
||
|
|
|
r/w |
receive/transmit data, LSB-aligned |
2.5.14. Two-Wire Serial Interface Controller (TWI)
Hardware source file(s): |
neorv32_twi.vhd |
|
Software driver file(s): |
neorv32_twi.c |
|
neorv32_twi.h |
||
Top entity port: |
|
1-bit bi-directional serial data |
|
1-bit bi-directional serial clock |
|
Configuration generics: |
IO_TWI_EN |
implement TWI controller when true |
CPU interrupts: |
fast IRQ channel 7 |
transmission done interrupt (see Processor Interrupts) |
Theory of Operation
The two wire interface - also called "I²C" - is a quite famous interface for connecting several on-board
components. Since this interface only needs two signals (the serial data line twi_sda_io
and the serial
clock line twi_scl_io
) - despite of the number of connected devices - it allows easy interconnections of
several peripheral nodes.
The NEORV32 TWI implements a TWI controller. It features "clock stretching" (if enabled via the control register), so a slow peripheral can halt the transmission by pulling the SCL line low. Currently, no multi-controller support is available. Also, the NEORV32 TWI unit cannot operate in peripheral mode.
The TWI is enabled via the TWI_CTRL_EN bit in the CTRL
control register. The user program can start / stop a
transmission by issuing a START or STOP condition. These conditions are generated by setting the
according bits (TWI_CTRL_START or TWI_CTRL_STOP) in the control register.
Data is send by writing a byte to the DATA
register. Received data can also be read from this
register. The TWI controller is busy (transmitting data or performing a START or STOP condition) as long as the
TWI_CTRL_BUSY bit in the control register is set.
An accessed peripheral has to acknowledge each transferred byte. When the TWI_CTRL_ACK bit is set after a completed transmission, the accessed peripheral has send an acknowledge. If it is cleared after a transmission, the peripheral has send a not-acknowledge (NACK). The NEORV32 TWI controller can also send an ACK by itself ("controller acknowledge MACK") after a transmission by pulling SDA low during the ACK time slot. Set the TWI_CTRL_MACK bit to activate this feature. If this bit is cleared, the ACK/NACK of the peripheral is sampled in this time slot instead (normal mode).
In summary, the following independent TWI operations can be triggered by the application program:
-
send START condition (also as REPEATED START condition)
-
send STOP condition
-
send (at least) one byte while also sampling one byte from the bus
A transmission can be terminated at any time by disabling the TWI module by clearing the TWI_CTRL_EN control register bit. |
The serial clock (SCL) and the serial data (SDA) lines can only be actively driven low by the controller. Hence, external pull-up resistors are required for these lines. |
The TWI clock frequency is defined via the 3-bit TWI_CTRL_PRSCx clock prescaler. The following prescalers are available:
TWI_CTRL_PRSCx |
0b000 |
0b001 |
0b010 |
0b011 |
0b100 |
0b101 |
0b110 |
0b111 |
---|---|---|---|---|---|---|---|---|
Resulting |
2 |
4 |
8 |
64 |
128 |
1024 |
2048 |
4096 |
Based on the TWI_CTRL_PRSCx configuration, the actual TWI clock frequency fSCL is derived from the processor main clock fmain and is determined by:
fSCL = fmain[Hz] / (4 * clock_prescaler
)
Interrupt
The TWI module provides a single interrupt to signal idle state (= read for new transmission) to the CPU. Whenever TWI SPI module is currently idle (and enabled), the interrupt request is active. A pending interrupt request is cleared by triggering a new TWI transmission or by disabling the device.
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
TWI enable |
|
r/w |
generate START condition |
||
|
r/w |
generate STOP condition |
||
|
r/w |
3-bit clock prescaler select |
||
|
r/w |
|||
|
r/w |
|||
|
r/w |
generate controller ACK for each transmission ("MACK") |
||
|
r/w |
allow clock-stretching by peripherals when set |
||
|
r/- |
ACK received when set |
||
|
r/- |
transfer/START/STOP in progress when set |
||
|
|
|
r/w |
receive/transmit data |
2.5.15. Pulse-Width Modulation Controller (PWM)
Hardware source file(s): |
neorv32_pwm.vhd |
|
Software driver file(s): |
neorv32_pwm.c |
|
neorv32_pwm.h |
||
Top entity port: |
|
up to 60 PWM output channels (1-bit per channel) |
Configuration generics: |
IO_PWM_NUM_CH |
number of PWM channels to implement (0..60) |
CPU interrupts: |
none |
The PWM controller implements a pulse-width modulation controller with up to 60 independent channels and 8- bit resolution per channel. The actual number of implemented channels is defined by the IO_PWM_NUM_CH generic. Setting this generic to zero will completely remove the PWM controller from the design.
The PWM controller is based on an 8-bit base counter with a programmable threshold comparators for each channel that defines the actual duty cycle. The controller can be used to drive fancy RGB-LEDs with 24- bit true color, to dim LCD back-lights or even for "analog" control. An external integrator (RC low-pass filter) can be used to smooth the generated "analog" signals.
Theory of Operation
The PWM controller is activated by setting the PWM_CTRL_EN bit in the module’s control register CTRL
. When this
bit is cleared, the unit is reset and all PWM output channels are set to zero.
The 8-bit duty cycle for each channel, which represents the channel’s "intensity", is defined via an 8-bit value. The module
provides up to 15 duty cycle registers DUTY[0]
to DUTY[14]
(depending on the number of implemented channels).
Each register contains the duty cycle configuration for 4 consecutive channels. For example, the duty cycle of channel 0
is defined via bits 7:0 in DUTY[0]
. The duty cycle of channel 2 is defined via bits 15:0 in DUTY[0]
.
Channel 4’s duty cycle is defined via bits 7:0 in DUTY[1]
and so on.
Regardless of the configuration of IO_PWM_NUM_CH all module registers can be accessed without raising an exception. Software can discover the number of available channels by writing 0xff to all duty cycle configuration bytes and reading those values back. The duty-cycle of channels that were not implemented always reads as zero. |
Based on the configured duty cycle the according intensity of the channel can be computed by the following formula:
Intensityx = DUTY[y](i*8+7 downto i*8)
/ (28)
The base frequency of the generated PWM signals is defined by the PWM core clock. This clock is derived from the main processor clock and divided by a prescaler via the 3-bit PWM_CTRL_PRSCx in the unit’s control register. The following pre-scalers are available:
PWM_CTRL_PRSCx |
0b000 |
0b001 |
0b010 |
0b011 |
0b100 |
0b101 |
0b110 |
0b111 |
---|---|---|---|---|---|---|---|---|
Resulting |
2 |
4 |
8 |
64 |
128 |
1024 |
2048 |
4096 |
The resulting PWM base frequency is defined by:
fPWM = fmain[Hz] / (28 * clock_prescaler
)
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
PWM enable |
|
r/w |
3-bit clock prescaler select |
||
|
r/w |
|||
|
r/w |
|||
|
|
|
r/w |
8-bit duty cycle for channel 0 |
|
r/w |
8-bit duty cycle for channel 1 |
||
|
r/w |
8-bit duty cycle for channel 2 |
||
|
r/w |
8-bit duty cycle for channel 3 |
||
… |
… |
… |
r/w |
… |
|
|
|
r/w |
8-bit duty cycle for channel 56 |
|
r/w |
8-bit duty cycle for channel 57 |
||
|
r/w |
8-bit duty cycle for channel 58 |
||
|
r/w |
8-bit duty cycle for channel 59 |
2.5.16. True Random-Number Generator (TRNG)
Hardware source file(s): |
neorv32_trng.vhd |
|
Software driver file(s): |
neorv32_trng.c |
|
neorv32_trng.h |
||
Top entity port: |
none |
|
Configuration generics: |
IO_TRNG_EN |
implement TRNG when true |
CPU interrupts: |
none |
Theory of Operation
The NEORV32 true random number generator provides physical true random numbers for your application. Instead of using a pseudo RNG like a LFSR, the TRNG of the processor uses a simple, straight-forward ring oscillator as physical entropy source. Hence, voltage and thermal fluctuations are used to provide true physical random data.
The TRNG features a platform independent architecture without FPGA-specific primitives, macros or attributes. |
Architecture
The NEORV32 TRNG is based on simple ring oscillators, which are implemented as an inverter chain with an odd number of inverters. A latch is used to decouple each individual inverter. Basically, this architecture is some king of asynchronous LFSR.
The output of several ring oscillators are synchronized using two registers and are XORed together. The resulting output is de-biased using a von-Neumann randomness extractor. This de-biased output is further processed by a simple 8-bit Fibonacci LFSR to improve whitening. After at least 8 clock cycles the state of the LFSR is sampled and provided as final data output.
To prevent the synthesis tool from doing logic optimization and thus, removing all but one inverter, the TRNG uses simple latches to decouple an inverter and its actual output. The latches are reset when the TRNG is disabled and are enabled one by one by a "real" shift register when the TRNG is activated. This construct can be synthesized for any FPGA platform. Thus, the NEORV32 TRNG provides a platform independent architecture.
TRNG Configuration
The TRNG uses several ring-oscillators, where the next oscillator provides a slightly longer chain (more inverters) than the one before. This increment is constant for all implemented oscillators. This setup can be customized by modifying the "Advanced Configuration" constants in the TRNG’s VHDL file:
-
The
num_roscs_c
constant defines the total number of ring oscillators in the system. num_inv_start_c defines the number of inverters used by the first ring oscillators (has to be an odd number). Each additional ring oscillator providesnum_inv_inc_c
more inverters that the one before (has to be an even number). -
The LFSR-based post-processing can be deactivated using the
lfsr_en_c
constant. The polynomial tap mask of the LFSR can be customized usinglfsr_taps_c
.
Using the TRNG
The TRNG features a single register for status and data access. When the TRNG_CTRL_EN control register (CTRL
)
bit is set, the TRNG is enabled and starts operation. As soon as the TRNG_CTRL_VALID bit is set, the currently
sampled 8-bit random data byte can be obtained from the lowest 8 bits of the CTRL
register
(TRNG_CTRL_DATA_MSB : TRNG_CTRL_DATA_LSB). The TRNG_CTRL_VALID bit is automatically cleared
when reading the control register.
The TRNG needs at least 8 clock cycles to generate a new random byte. During this sampling time the current output random data is kept stable in the output register until a valid sampling of the new byte has completed. |
Randomness "Quality"
I have not verified the quality of the generated random numbers (for example using NIST test suites). The
quality is highly effected by the actual configuration of the TRNG and the resulting FPGA mapping/routing.
However, generating larger histograms of the generated random number shows an equal distribution (binary
average of the random numbers = 127). A simple evaluation test/demo program can be found in
sw/example/demo_trng
.
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/- |
8-bit random data output |
|
r/w |
TRNG enable |
||
|
r/- |
random data output is valid when set |
2.5.17. Custom Functions Subsystem (CFS)
Hardware source file(s): |
neorv32_gfs.vhd |
|
Software driver file(s): |
neorv32_gfs.c |
|
neorv32_gfs.h |
||
Top entity port: |
|
custom input conduit |
|
custom output conduit |
|
Configuration generics: |
IO_CFS_EN |
implement CFS when true |
IO_CFS_CONFIG |
custom generic conduit |
|
IO_CFS_IN_SIZE |
size of |
|
IO_CFS_OUT_SIZE |
size of |
|
CPU interrupts: |
fast IRQ channel 1 |
CFS interrupt (see Processor Interrupts) |
Theory of Operation
The custom functions subsystem is meant for implementing application-specific user-defined co-processors
IP [5] blocks. The CFS provides up to 32x 32-bit memory-mapped
registers (REG
, see register map table below) that can be accessed by the CPU via normal load/store operations.
The actual functionality of these register has to be defined by the hardware designer. Furthermore, the CFS
provides two IO conduits to implement custom module- or chip-external interfaces.
In contrast to connecting custom hardware accelerators via external memory interfaces (like SPI or the processor’s external bus interface), the CFS provide a convenient, low-latency and tightly-coupled extension and customization option.
Just like any other externally-connected IP, logic implemented within the custom functions subsystem can operate independently of the CPU providing true parallel processing capabilities. Potential use cases might include dedicated hardware accelerators for en-/decryption (AES), signal processing (FFT) or AI applications (CNNs) as well as custom IO systems like fast memory interfaces (DDR) and mass storage (SDIO), networking (CAN) or real-time data transport (I2S).
Take a look at the template CFS VHDL source file (rtl/core/neorv32_cfs.vhd
). The file is highly
commented to illustrate all aspects that are relevant for implementing custom CFS-based co-processor designs.
CFS Software Access
The CFS memory-mapped registers can be accessed by software using the provided C-language aliases (see
register map table below). Note that all interface registers provide 32-bit access data of type uint32_t
.
// C-code CFS usage example
NEORV32_CFS.REG[0] = (uint32_t)some_data_array(i); // write to CFS register 0
uint32_t temp = NEORV32_CFS.REG[20]; // read from CFS register 20
CFS Interrupt
The CFS provides a single high-level-triggered interrupt request signal mapped to the CPU’s fast interrupt channel 1. Once set, the interrupt has to stay asserted until explicitly acknowledged by the software (for example by writing to a specific CFS register). See section Processor Interrupts for more information.
CFS Configuration Generic
By default, the CFS provides a single 32-bit std_(u)logic_vector
configuration generic IO_CFS_CONFIG
that is available in the processor’s top entity. This generic can be used to pass custom configuration options
from the top entity directly down to the CFS. The actual definition of the generics and it’S usage inside the
CFS is left to the hardware designer.
CFS Custom IOs
By default, the CFS also provides two unidirectional input and output conduits cfs_in_i
and cfs_out_o
.
These signals are directly propagated to the processor’s top entity. These conduits can be used to implement
application-specific interfaces like memory or network connections. The actual use case of these signals
has to be defined by the hardware designer.
The size of the input signal conduit cfs_in_i
is defined via the top’s IO_CFS_IN_SIZE configuration
generic (default = 32-bit). The size of the output signal conduit cfs_out_o
is defined via the top’s
IO_CFS_OUT_SIZE configuration generic (default = 32-bit). If the custom function subsystem is not implemented
(IO_CFS_EN = false) the cfs_out_o
signal is tied to all-zero.
Address | Name [C] | Bit(s) | R/W | Function |
---|---|---|---|---|
|
|
|
(r)/(w) |
custom CFS interface register 0 |
|
|
|
(r)/(w) |
custom CFS interface register 1 |
… |
… |
|
(r)/(w) |
… |
|
|
|
(r)/(w) |
custom CFS interface register 30 |
|
|
|
(r)/(w) |
custom CFS interface register 31 |
2.5.18. Smart LED Interface (NEOLED)
Hardware source file(s): |
neorv32_neoled.vhd |
|
Software driver file(s): |
neorv32_neoled.c |
|
neorv32_neoled.h |
||
Top entity port: |
|
1-bit serial data output |
Configuration generics: |
IO_NEOLED_EN |
implement NEOLED when true |
IO_NEOLED_TX_FIFO |
TX FIFO depth (1..32k, has to be a power of two) |
|
CPU interrupts: |
fast IRQ channel 9 |
NEOLED interrupt (see Processor Interrupts) |
Theory of Operation
The NEOLED module provides a dedicated interface for "smart RGB LEDs" like the WS2812 or WS2811. These LEDs provide a single interface wire that uses an asynchronous serial protocol for transmitting color data. Basically, data is transferred via LED-internal shift registers, which allows to cascade an unlimited number of smart LEDs. The protocol provides a RESET command to strobe the transmitted data into the LED PWM driver registers after data has shifted throughout all LEDs in a chain.
The NEOLED interface is compatible to the "Adafruit Industries NeoPixel" products, which feature WS2812 (or older WS2811) smart LEDs (see link:https://learn.adafruit.com/adafruit-neopixel-uberguide). |
The interface provides a single 1-bit output neoled_o
to drive an arbitrary number of cascaded LEDs. Since the
NEOLED module provides 24-bit and 32-bit operating modes, a mixed setup with RGB LEDs (24-bit color)
and RGBW LEDs (32-bit color including a dedicated white LED chip) is possible.
Theory of Operation - NEOLED Module
The NEOLED modules provides two accessible interface registers: the control register CTRL
and the
TX data register DATA
. The NEOLED module is globally enabled via the control register’s
NEOLED_CTRL_EN bit. Clearing this bit will terminate any current operation, clear the TX buffer, reset the module
and set the neoled_o
output to zero. The precise timing (implementing the WS2812 protocol) and transmission
mode are fully programmable via the CTRL
register to provide maximum flexibility.
RGB / RGBW Configuration
NeoPixel are available in two "color" version: LEDs with three chips providing RGB color and LEDs with four chips providing RGB color plus a dedicated white LED chip (= RGBW). Since the intensity of every LED chip is defined via an 8-bit value the RGB LEDs require a frame of 24-bit per module and the RGBW LEDs require a frame of 32-bit per module.
The data transfer quantity of the NEOLED module can be configured via the NEOLED_MODE_EN control
register bit. If this bit is cleared, the NEOLED interface operates in 24-bit mode and will transmit bits 23:0
of
the data written to DATA
to the LEDs. If NEOLED_MODE_EN is set, the NEOLED interface operates in 32-bit
mode and will transmit bits 31:0
of the data written to DATA
to the LEDs.
The mode bit can be configured before writing each new data word in order to support an arbitrary setup of RGB and RGBW LEDs.
Theory of Operation - Protocol
The interface of the WS2812 LEDs uses an 800kHz carrier signal. Data is transmitted in a serial manner starting with LSB-first. The intensity for each R, G & B (& W) LED chip (= color code) is defined via an 8-bit value. The actual data bits are transferred by modifying the duty cycle of the signal (the timings for the WS2812 are shown below). A RESET command is "send" by pulling the data line LOW for at least 50μs.

Ttotal (Tcarrier) |
1.25μs +/- 300ns |
period for a single bit |
T0H |
0.4μs +/- 150ns |
high-time for sending a |
T0L |
0.8μs +/- 150ns |
low-time for sending a |
T1H |
0.85μs +/- 150ns |
high-time for sending a |
T1L |
0.45μs +/- 150 ns |
low-time for sending a |
RESET |
Above 50μs |
low-time for sending a RESET command |
Timing Configuration
The basic carrier frequency (800kHz for the WS2812 LEDs) is configured via a 3-bit main clock prescaler (NEOLED_CTRL_PRSCx, see table below) that scales the main processor clock fmain and a 5-bit cycle multiplier NEOLED_CTRL_T_TOT_x.
NEOLED_CTRL_PRSCx |
0b000 |
0b001 |
0b010 |
0b011 |
0b100 |
0b101 |
0b110 |
0b111 |
---|---|---|---|---|---|---|---|---|
Resulting |
2 |
4 |
8 |
64 |
128 |
1024 |
2048 |
4096 |
The duty-cycles (or more precisely: the high- and low-times for sending either a '1' bit or a '0' bit) are defined via the 5-bit NEOLED_CTRL_T_ONE_H_x and NEOLED_CTRL_T_ZERO_H_x values, respectively. These programmable timing constants allow to adapt the interface for a wide variety of smart LED protocol (for example WS2812 vs. WS2811).
Timing Configuration - Example (WS2812)
Generate the base clock fTX for the NEOLED TX engine:
-
processor clock fmain = 100 MHz
-
NEOLED_CTRL_PRSCx =
0b001
= fmain / 4
fTX = fmain[Hz] / clock_prescaler
= 100MHz / 4 = 25MHz
TTX = 1 / fTX = 40ns
Generate carrier period (Tcarrier) and high-times (duty cycle) for sending 0
(T0H) and 1
(T1H) bits:
-
NEOLED_CTRL_T_TOT =
0b11110
(= decimal 30) -
NEOLED_CTRL_T_ZERO_H =
0b01010
(= decimal 10) -
NEOLED_CTRL_T_ONE_H =
0b10100
(= decimal 20)
Tcarrier = TTX * NEOLED_CTRL_T_TOT = 40ns * 30 = 1.4µs
T0H = TTX * NEOLED_CTRL_T_ZERO_H = 40ns * 10 = 0.4µs
T1H = TTX * NEOLED_CTRL_T_ONE_H = 40ns * 20 = 0.8µs
The NEOLED SW driver library (neorv32_neoled.h ) provides a simplified configuration
function that configures all timing parameters for driving WS2812 LEDs based on the processor
clock frequency.
|
TX Data FIFO
The interface features a TX data buffer (a FIFO) to allow more CPU-independent operation. The buffer depth is configured via the IO_NEOLED_TX_FIFO top generic (default = 1 entry). The FIFO size configuration can be read via the NEOLED_CTRL_BUFS_x control register bits, which result log2(IO_NEOLED_TX_FIFO).
When writing data to the DATA
register the data is automatically written to the TX buffer. Whenever
data is available in the buffer the serial transmission engine will take it and transmit it to the LEDs.
The data transfer size (NEOLED_MODE_EN) can be modified at every time since this control register bit is also buffered
in the FIFO. This allows to arbitrarily mixing RGB and RGBW LEDs in the chain.
Software can check the FIFO fill level via the control register’s NEOLED_CTRL_TX_EMPTY, NEOLED_CTRL_TX_HALF and NEOLED_CTRL_TX_FULL flags. The NEOLED_CTRL_TX_BUSY flags provides additional information if the the TX unit is still busy sending data.
Please note that the timing configurations (NEOLED_CTRL_PRSCx, NEOLED_CTRL_T_TOT_x, NEOLED_CTRL_T_ONE_H_x and NEOLED_CTRL_T_ZERO_H_x) are NOT stored to the buffer. Changing these value while the buffer is not empty or the TX engine is still busy will cause data corruption. |
-
Strobe Command ("RESET") **
According to the WS2812 specs the data written to the LED’s shift registers is strobed to the actual PWM driver registers when the data line is low for 50μs ("RESET" command, see table above). This can be implemented using busy-wait for at least 50μs. Obviously, this concept wastes a lot of processing power.
To circumvent this, the NEOLED module provides an option to automatically issue an idle time for creating the RESET
command. If the NEOLED_CTRL_STROBE control register bit is set, all data written to the data FIFO (via DATA
,
the actually written data is irrelevant) will trigger an idle phase (neoled_o
= zero) of 127 periods (= Tcarrier).
This idle time will cause the LEDs to strobe the color data into the PWM driver registers.
Since the NEOLED_CTRL_STROBE flag is also buffered in the TX buffer, the RESET command is treated just as another data word being written to the TX buffer making busy wait concepts obsolete and allowing maximum refresh rates.
Interrupt
The NEOLED modules features a single interrupt that becomes pending based on the current TX buffer fill level.
The interrupt can only become pending if the NEOLED module is enabled. The specific interrupt condition
is configured via the NEOLED_CTRL_IRQ_CONF in the control register NEORV32_NEOLED.CTRL
.
If NEOLED_CTRL_IRQ_CONF is cleared, an interrupt is generated whenever the TX FIFO is less than half-full.
In this case software can write up to IO_NEOLED_TX_FIFO/2 new data words to DATA
without checking the FIFO
status flags. The interrupt request is cleared whenever the FIFO fill level is above half-full level or if
the NEOLED module is disabled.
If NEOLED_CTRL_IRQ_CONF is set, an interrupt is generated whenever the TX FIFO is empty. The interrupt request is cleared again when the FIFO contains at least one data word.
The NEOLED_CTRL_IRQ_CONF is hardwired to one if IO_NEOLED_TX_FIFO = 1 (→ IRQ if FIFO is empty). |
If the FIFO is configured to contain only a single entry (IO_NEOLED_TX_FIFO = 1) the interrupt will become pending if the FIFO (which is just a single register providing simple double-buffering) is empty.
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
NEOLED enable |
|
r/w |
data transfer size; |
||
|
r/w |
|
||
|
r/w |
3-bit clock prescaler, bit 0 |
||
|
r/w |
3-bit clock prescaler, bit 1 |
||
|
r/w |
3-bit clock prescaler, bit 2 |
||
|
r/- |
4-bit log2(IO_NEOLED_TX_FIFO) |
||
|
r/- |
|||
|
r/- |
|||
|
r/- |
|||
|
r/w |
5-bit pulse clock ticks per total single-bit period (Ttotal) |
||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
5-bit pulse clock ticks per high-time for sending a zero-bit (T0H) |
||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
5-bit pulse clock ticks per high-time for sending a one-bit (T1H) |
||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
|||
|
r/w |
TX FIFO interrupt configuration: |
||
|
r/- |
TX FIFO is empty |
||
|
r/- |
TX FIFO is at least half full |
||
|
r/- |
TX FIFO is full |
||
|
r/- |
TX serial engine is busy when set |
||
|
|
|
-/w |
TX data (32-/24-bit) |
2.5.19. External Interrupt Controller (XIRQ)
Hardware source file(s): |
neorv32_xirq.vhd |
|
Software driver file(s): |
neorv32_xirq.c |
|
neorv32_xirq.h |
||
Top entity port: |
|
IRQ input (up to 32-bit) |
Configuration generics: |
XIRQ_NUM_CH |
Number of IRQs to implement (0..32) |
XIRQ_TRIGGER_TYPE |
IRQ trigger type configuration |
|
XIRQ_TRIGGER_POLARITY |
IRQ trigger polarity configuration |
|
CPU interrupts: |
fast IRQ channel 8 |
XIRQ (see Processor Interrupts) |
The eXternal interrupt controller provides a simple mechanism to implement up to 32 processor-external interrupt request signals. The external IRQ requests are prioritized, queued and signaled to the CPU via a single CPU fast interrupt request.
Theory of Operation
The XIRQ provides up to 32 interrupt channels (configured via the XIRQ_NUM_CH generic). Each bit in the xirq_i
input signal vector represents one interrupt channel. An interrupt channel is enabled by setting the according bit in the
interrupt enable register IER
.
If the configured trigger (see below) of an enabled channel fires, the request is stored into an internal buffer.
This buffer is available via the interrupt pending register IPR
. A 1
in this register indicates that the
corresponding interrupt channel has fired but has not yet been serviced (so it is pending). An interrupt channel can
become pending if the according IER
bit is set. Pending IRQs can be cleared by writing 0
to the according IPR
bit. As soon as there is a least one pending interrupt in the buffer, an interrupt request is send to the CPU.
A disabled interrupt channel can still be pending if it has been triggered before clearing the according IER bit.
|
The CPU can determine active external interrupt request either by checking the bits in the IPR
register, which show all
pending interrupt channels, or by reading the interrupt source register SCR
.
This register provides a 5-bit wide ID (0..31) that shows the interrupt request with highest priority.
Interrupt channel xirq_i(0)
has highest priority and xirq_i(XIRQ_NUM_CH-1)
has lowest priority.
This priority assignment is fixed and cannot be altered by software.
The CPU can use the ID from SCR
to service IRQ according to their priority. To acknowledge the according
interrupt the CPU can write 1 << SCR
to IPR
.
In order to clear a pending FIRQ interrupt from the external interrupt controller, the CPU has to write any
value to the interrupt source register SRC
.
An interrupt handler should clear the interrupt pending bit that caused the interrupt first before
acknowledging the interrupt by writing the SCR register.
|
IRQ Trigger Configuration
The controller does not provide a configuration option to define the IRQ triggers during runtime. Instead, two generics are provided to configure the trigger of each interrupt channel before synthesis: the XIRQ_TRIGGER_TYPE and XIRQ_TRIGGER_POLARITY generic. Both generics are 32 bit wide representing one bit per interrupt channel. If less than 32 interrupt channels are implemented the remaining configuration bits are ignored.
XIRQ_TRIGGER_TYPE is used to define the general trigger type. This can be either level-triggered (0
) or
edge-triggered (1
). XIRQ_TRIGGER_POLARITY is used to configure the polarity of the trigger: a 0
defines
low-level or falling-edge and a 1
defines high-level or rising-edge.
XIRQ_TRIGGER_TYPE => x"00000001";
XIRQ_TRIGGER_POLARITY => x"ffffffff";
Address | Name [C] | Bit(s) | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
Interrupt enable register (one bit per channel, LSB-aligned) |
|
|
|
r/w |
Interrupt pending register (one bit per channel, LSB-aligned); writing 0 to a bit clears according pending interrupt |
|
|
|
r/w |
Channel id (0..31) of firing IRQ (prioritized!); writing any value will acknowledge the current interrupt |
|
- |
|
r/- |
reserved, read as zero |
2.5.20. General Purpose Timer (GPTMR)
Hardware source file(s): |
neorv32_gptmr.vhd |
|
Software driver file(s): |
neorv32_gptmr.c |
|
neorv32_gptmr.h |
||
Top entity port: |
none |
|
Configuration generics: |
IO_GPTMR_EN |
implement timer when true |
CPU interrupts: |
fast IRQ channel 12 |
transmission done interrupt (see Processor Interrupts) |
Theory of Operation
The general purpose timer module provides a simple yet universal 32-bit timer. The timer is implemented if
IO_GPTMR_EN top generic is set true. It provides a 32-bit counter register (COUNT
) and a 32-bit threshold
register (THRES
). An interrupt is generated whenever the value of the counter registers matches the one from
threshold register.
The timer is enabled by setting the GPTMR_CTRL_EN bit in the device’s control register CTRL
. The COUNT
register will start incrementing at a programmable rate, which scales the main processor clock. The
pre-scaler value is configured via the three GPTMR_CTRL_PRSCx control register bits:
GPTMR_CTRL_PRSCx |
0b000 |
0b001 |
0b010 |
0b011 |
0b100 |
0b101 |
0b110 |
0b111 |
---|---|---|---|---|---|---|---|---|
Resulting |
2 |
4 |
8 |
64 |
128 |
1024 |
2048 |
4096 |
The timer provides two operation modes that are configured by the GPTMR_CTRL_MODE control register bit:
if GPTMR_CTRL_MODE is cleared (0
) the timer operates in single-shot mode. As soon as COUNT
matches
THRES
an interrupt request is generated and the timer stops operation (i.e. it stops incrementing). If
GPTMR_CTRL_MODE is set (1
) the timer operates in continuous mode. When COUNT
matches THRES
an interrupt
request is generated and COUNT
is automatically reset to all-zero before continuing to increment.
Disabling the timer will not clear the COUNT register. However, it can be manually reset at any time by
writing zero to it.
|
Timer Interrupt
The timer interrupt gets pending when the timer is enabled and COUNT
matches THRES
. The interrupt
request is indicated via the GPTMR_CTRL_ALARM control register bit. This bit as well as the actual
interrupt keeps pending until the bit is explicitly cleared by application software or if the
timer is disabled.
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
Timer enable flag |
|
r/w |
3-bit clock prescaler select |
||
|
r/w |
|||
|
r/w |
|||
|
r/w |
Counter mode: |
||
|
r/c |
Pending interrupt/alarm, cleared by setting bit to zero |
||
|
|
|
r/w |
Threshold value register |
|
|
|
r/w |
Counter register |
2.5.21. System Configuration Information Memory (SYSINFO)
Hardware source file(s): |
neorv32_sysinfo.vhd |
|
Software driver file(s): |
neorv32.h |
|
Top entity port: |
none |
|
Configuration generics: |
* |
most of the top’s configuration generics |
CPU interrupts: |
none |
Theory of Operation
The SYSINFO allows the application software to determine the setting of most of the processor’s top entity generics that are related to processor/SoC configuration. All registers of this unit are read-only.
This device is always implemented - regardless of the actual hardware configuration. The bootloader as well as the NEORV32 software runtime environment require information from this device (like memory layout and default clock speed) for correct operation.
Address | Name [C] | Function |
---|---|---|
|
|
clock speed in Hz (via top’s CLOCK_FREQUENCY generic) |
|
|
specific CPU configuration (see SYSINFO - CPU Configuration) |
|
|
specific SoC configuration (see SYSINFO - SoC Configuration) |
|
|
cache configuration information (see SYSINFO - Cache Configuration) |
|
|
instruction address space base (via package’s |
|
|
internal IMEM size in bytes (via top’s MEM_INT_IMEM_SIZE generic) |
|
|
data address space base (via package’s |
|
|
internal DMEM size in bytes (via top’s MEM_INT_DMEM_SIZE generic) |
SYSINFO - CPU Configuration
Bit | Name [C] | Function |
---|---|---|
|
SYSINFO_CPU_ZICSR |
|
|
SYSINFO_CPU_ZIFENCEI |
|
|
SYSINFO_CPU_ZMMUL |
|
|
SYSINFO_CPU_ZFINX |
|
|
SYSINFO_CPU_ZXSCNT |
Custom extension - Small CPU counters: |
|
SYSINFO_CPU_ZXNOCNT |
Custom extension - NO CPU counters: |
|
SYSINFO_CPU_PMP |
|
|
SYSINFO_CPU_HPM |
|
|
SYSINFO_CPU_DEBUGMODE |
RISC-V CPU |
`30 |
SYSINFO_CPU_FASTMUL |
fast multiplication available when set (via top’s FAST_MUL_EN generic) |
|
SYSINFO_CPU_FASTSHIFT |
fast shifts available when set (via top’s FAST_SHIFT_EN generic) |
SYSINFO - SoC Configuration
Bit | Name [C] | Function |
---|---|---|
|
SYSINFO_SOC_BOOTLOADER |
set if the processor-internal bootloader is implemented (via top’s INT_BOOTLOADER_EN generic) |
|
SYSINFO_SOC_MEM_EXT |
set if the external Wishbone bus interface is implemented (via top’s MEM_EXT_EN generic) |
|
SYSINFO_SOC_MEM_INT_IMEM |
set if the processor-internal DMEM implemented (via top’s MEM_INT_DMEM_EN generic) |
|
SYSINFO_SOC_MEM_INT_DMEM |
set if the processor-internal IMEM is implemented (via top’s MEM_INT_IMEM_EN generic) |
|
SYSINFO_SOC_MEM_EXT_ENDIAN |
set if external bus interface uses BIG-endian byte-order (via top’s MEM_EXT_BIG_ENDIAN generic) |
|
SYSINFO_SOC_ICACHE |
set if processor-internal instruction cache is implemented (via top’s ICACHE_EN generic) |
|
SYSINFO_SOC_OCD |
set if on-chip debugger implemented (via top’s ON_CHIP_DEBUGGER_EN generic) |
|
SYSINFO_SOC_HW_RESET |
set if a dedicated hardware reset of all core registers is implemented (via package’s |
|
SYSINFO_SOC_IO_GPIO |
set if the GPIO is implemented (via top’s IO_GPIO_EN generic) |
|
SYSINFO_SOC_IO_MTIME |
set if the MTIME is implemented (via top’s IO_MTIME_EN generic) |
|
SYSINFO_SOC_IO_UART0 |
set if the primary UART0 is implemented (via top’s IO_UART0_EN generic) |
|
SYSINFO_SOC_IO_SPI |
set if the SPI is implemented (via top’s IO_SPI_EN generic) |
|
SYSINFO_SOC_IO_TWI |
set if the TWI is implemented (via top’s IO_TWI_EN generic) |
|
SYSINFO_SOC_IO_PWM |
set if the PWM is implemented (via top’s [_io_pwm_en] generic) |
|
SYSINFO_SOC_IO_WDT |
set if the WDT is implemented (via top’s IO_WDT_EN generic) |
|
SYSINFO_SOC_IO_CFS |
set if the custom functions subsystem is implemented (via top’s IO_CFS_EN generic) |
|
SYSINFO_SOC_IO_TRNG |
set if the TRNG is implemented (via top’s IO_TRNG_EN generic) |
|
SYSINFO_SOC_IO_SLINK |
set if the SLINK is implemented (via top’s SLINK_NUM_TX and/or SLINK_NUM_RX generics) |
|
SYSINFO_SOC_IO_UART1 |
set if the secondary UART1 is implemented (via top’s IO_UART1_EN generic) |
|
SYSINFO_SOC_IO_NEOLED |
set if the NEOLED is implemented (via top’s IO_NEOLED_EN generic) |
SYSINFO - Cache Configuration
Bit fields in this register are set to all-zero if the according cache is not implemented. |
Bit | Name [C] | Function |
---|---|---|
|
SYSINFO_CACHE_IC_BLOCK_SIZE_3 : SYSINFO_CACHE_IC_BLOCK_SIZE_0 |
log2(i-cache block size in bytes), via top’s ICACHE_BLOCK_SIZE generic |
|
SYSINFO_CACHE_IC_NUM_BLOCKS_3 : SYSINFO_CACHE_IC_NUM_BLOCKS_0 |
log2(i-cache number of cache blocks), via top’s [_icache_num_blocks] generic |
|
SYSINFO_CACHE_IC_ASSOCIATIVITY_3 : SYSINFO_CACHE_IC_ASSOCIATIVITY_0 |
log2(i-cache associativity), via top’s ICACHE_ASSOCIATIVITY generic |
|
SYSINFO_CACHE_IC_REPLACEMENT_3 : SYSINFO_CACHE_IC_REPLACEMENT_0 |
i-cache replacement policy ( |
|
- |
zero, reserved for d-cache |
3. NEORV32 Central Processing Unit (CPU)

Key Features
-
32-bit multi-cycle in-order
rv32
RISC-V CPU -
Optional RISC-V extensions:
-
A
- atomic memory access operations -
B
- bit-manipulation instructions -
C
- 16-bit compressed instructions -
I
- integer base ISA (always enabled) -
E
- embedded CPU version (reduced register file size) -
M
- integer multiplication and division hardware -
U
- less-privileged user mode -
Zfinx
- single-precision floating-point unit -
Zicsr
- control and status register access (privileged architecture) -
Zicntr
- CPU base counters -
Zihpm
- hardware performance monitors -
Zifencei
- instruction stream synchronization -
Zmmul
- integer multiplication hardware -
PMP
- physical memory protection -
Debug
- debug mode
-
-
Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
-
Official RISC-V open-source architecture ID
-
Standard RISC-V interrupts (external, timer, software) plus 16 fast interrupts
-
Supports all of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
-
This is a special aspect on execution safety by Full Virtualization
-
-
Optional physical memory configuration (PMP), compatible to the RISC-V specifications
-
Optional hardware performance monitors (HPM) for application benchmarking
-
Separated interfaces for instruction fetch and data access (merged into a single processor bus))
-
little-endian byte order
-
Configurable hardware reset
-
No hardware support of unaligned data/instruction accesses - they will trigger an exception.
It is recommended to use the NEORV32 Processor as default top instance even if you only want to use the actual CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This setup also allows to further use the default bootloader and software framework. From this base you can start building your own SoC. Of course you can also use the CPU in it’s true stand-alone mode. |
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications. |
3.1. Architecture
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture specifications. The following figure shows the simplified architecture of the CPU.

The CPU implements a multi-cycle architecture. Hence, each instruction is executed as a series of consecutive micro-operations. In order to increase performance, the CPU’s front-end (instruction fetch) and back-end (instruction execution) are de-couples via a FIFO (the "instruction prefetch buffer"). Therefore, the front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.
The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction, two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction data is stored to a FIFO queue - the instruction prefetch buffer.
The back-end is responsible for the actual execution of the instruction. It includes an "issue engine", which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit instruction or decompressed 16-bit instructions) for execution.
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI (cycles per instructions) is 2, but it can be significantly higher: for instance when executing loads/stores (accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or when the CPU front-end has to reload the prefetch buffer due to a taken branch.
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes every single instruction (including fetch) in a series of consecutive micro-operations. The combination of these two classical design paradigms allows an increased instruction execution in contrast to a pure multi-cycle approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the multi-cycle concept).
As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses have higher priority). Hence, ALL memory locations including peripheral devices are mapped to a single unified 32-bit address space.
3.2. Full Virtualization
Just like the RISC-V ISA the NEORV32 aims to provide maximum virtualization capabilities on CPU and SoC level to allow a high standard of execution safety. The CPU supports all traps specified by the official RISC-V specifications. [6] Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing an malformed instruction word or accessing a not-allocated memory address). For any kind of trap the core is always in a defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that might have to reverted). This allows predictable execution behavior at any time improving overall execution safety.
Execution Safety - NEORV32 Virtualization Features
-
Due to the acknowledged memory accesses the CPU is always sync with the memory system (i.e. there is no speculative execution / no out-of-order states).
-
The CPU supports all RISC-V compatible bus exceptions including access exceptions, which are triggered if an accessed address does not respond or encounters an internal error during access.
-
Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time window. Otherwise a bus access exception is raised.
-
The RISC-V specs. state that executing an malformed instruction results in unpredictable behavior. As an additional execution safety feature the NEORV32 CPU ensures that all unimplemented/malformed/illegal instructions do raise an illegal instruction exceptions and do not commit any state-changing operation (like writing registers or triggering memory operations).
-
To be continued…
3.3. RISC-V Compatibility
The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository’s sw/isa-test
folder.
See section User Guide: RISC-V Architecture Test Framework for information how to run the tests on the NEORV32. |
rv32_m/C
TestsCheck cadd-01 ... OK Check caddi-01 ... OK Check caddi16sp-01 ... OK Check caddi4spn-01 ... OK Check cand-01 ... OK Check candi-01 ... OK Check cbeqz-01 ... OK Check cbnez-01 ... OK Check cebreak-01 ... OK Check cj-01 ... OK Check cjal-01 ... OK Check cjalr-01 ... OK Check cjr-01 ... OK Check cli-01 ... OK Check clui-01 ... OK Check clw-01 ... OK Check clwsp-01 ... OK Check cmv-01 ... OK Check cnop-01 ... OK Check cor-01 ... OK Check cslli-01 ... OK Check csrai-01 ... OK Check csrli-01 ... OK Check csub-01 ... OK Check csw-01 ... OK Check cswsp-01 ... OK Check cxor-01 ... OK -------------------------------- OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
rv32_m/I
TestsCheck add-01 ... OK Check addi-01 ... OK Check and-01 ... OK Check andi-01 ... OK Check auipc-01 ... OK Check beq-01 ... OK Check bge-01 ... OK Check bgeu-01 ... OK Check blt-01 ... OK Check bltu-01 ... OK Check bne-01 ... OK Check fence-01 ... OK Check jal-01 ... OK Check jalr-01 ... OK Check lb-align-01 ... OK Check lbu-align-01 ... OK Check lh-align-01 ... OK Check lhu-align-01 ... OK Check lui-01 ... OK Check lw-align-01 ... OK Check or-01 ... OK Check ori-01 ... OK Check sb-align-01 ... OK Check sh-align-01 ... OK Check sll-01 ... OK Check slli-01 ... OK Check slt-01 ... OK Check slti-01 ... OK Check sltiu-01 ... OK Check sltu-01 ... OK Check sra-01 ... OK Check srai-01 ... OK Check srl-01 ... OK Check srli-01 ... OK Check sub-01 ... OK Check sw-align-01 ... OK Check xor-01 ... OK Check xori-01 ... OK -------------------------------- OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
rv32_m/M
TestsCheck div-01 ... OK Check divu-01 ... OK Check mul-01 ... OK Check mulh-01 ... OK Check mulhsu-01 ... OK Check mulhu-01 ... OK Check rem-01 ... OK Check remu-01 ... OK -------------------------------- OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
rv32_m/privilege
TestsCheck ebreak ... OK Check ecall ... OK Check misalign-beq-01 ... OK Check misalign-bge-01 ... OK Check misalign-bgeu-01 ... OK Check misalign-blt-01 ... OK Check misalign-bltu-01 ... OK Check misalign-bne-01 ... OK Check misalign-jal-01 ... OK Check misalign-lh-01 ... OK Check misalign-lhu-01 ... OK Check misalign-lw-01 ... OK Check misalign-sh-01 ... OK Check misalign-sw-01 ... OK Check misalign1-jalr-01 ... OK Check misalign2-jalr-01 ... OK -------------------------------- OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
rv32_m/Zifencei
TestsCheck Fencei ... OK -------------------------------- OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
3.3.1. RISC-V Incompatibility Issues and Limitations
This list shows the currently identified issues regarding full RISC-V-compatibility. More specific information can be found in section Instruction Sets and Extensions.
Hardwired R/W CSRs
The misa , mip and mtval CSRs in the NEORV32 are read-only.
Any write access to it (in machine mode) to them are ignored and will not cause any exceptions or side-effects.
Pending interrupt can only be cleared by acknowledging the interrupt-causing device. However, pending interrupts
can still be ignored by clearing the according mie register bits.
|
Physical memory protection
The physical memory protection (see section [_machine_physical_memory_protection])
only supports the modes OFF and NAPOT yet and a minimal granularity of 8 bytes per region.
|
Atomic memory operations
The A CPU extension only implements the lr.w and sc.w instructions yet.
However, these instructions are sufficient to emulate all further atomic memory operations.
|
Bit-manipulation operations
The NEORV32 B extension only implements the basic bit-manipulation instructions (Zbb ) subset
and the address generation instructions (Zba ) subset yet.
|
Instruction Misalignment
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
architecture specifications: for 32-bit only instructions (no C extension) the misaligned instruction exception
is raised if bit 1 of the access address is set (i.e. not on 32-bit boundary). If the C extension is implemented
there will be no misaligned instruction exceptions at all.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
|
3.4. CPU Top Entity - Signals
The following table shows all interface signals of the CPU top entity rtl/core/neorv32_cpu.vhd
. The
type of all signals is std_ulogic or std_ulogic_vector, respectively. The "Dir." column shows the signal
direction seen from the CPU.
Signal | Width | Dir. | Function |
---|---|---|---|
Global Signals |
|||
|
1 |
in |
global clock line, all registers triggering on rising edge |
|
1 |
in |
global reset, low-active |
|
1 |
out |
CPU is in sleep mode when set |
Instruction Bus Interface (Bus Interface) |
|||
|
32 |
out |
destination address |
|
32 |
in |
read data |
|
32 |
out |
write data (always zero) |
|
4 |
out |
byte enable |
|
1 |
out |
write transaction (always zero) |
|
1 |
out |
read transaction |
|
1 |
out |
exclusive access request (always zero) |
|
1 |
in |
bus transfer acknowledge from accessed peripheral |
|
1 |
in |
bus transfer terminate from accessed peripheral |
|
1 |
out |
indicates an executed fence.i instruction |
|
2 |
out |
current CPU privilege level |
Data Bus Interface (Bus Interface) |
|||
|
32 |
out |
destination address |
|
32 |
in |
read data |
|
32 |
out |
write data |
|
4 |
out |
byte enable |
|
1 |
out |
write transaction |
|
1 |
out |
read transaction |
|
1 |
out |
exclusive access request |
|
1 |
in |
bus transfer acknowledge from accessed peripheral |
|
1 |
in |
bus transfer terminate from accessed peripheral |
|
1 |
out |
indicates an executed fence instruction |
|
2 |
out |
current CPU privilege level |
System Time (see |
|||
|
64 |
in |
system time input (from MTIME) |
Interrupts, RISC-V-compatible (Traps, Exceptions and Interrupts) |
|||
|
1 |
in |
RISC-V machine software interrupt |
|
1 |
in |
RISC-V machine external interrupt |
|
1 |
in |
RISC-V machine timer interrupt |
Fast Interrupts, NEORV32-specific (Traps, Exceptions and Interrupts) |
|||
|
16 |
in |
fast interrupt request signals |
Enter Debug Mode Request (On-Chip Debugger (OCD)) |
|||
|
1 |
in |
request CPU to halt and enter debug mode |
3.5. CPU Top Entity - Generics
Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section Processor Top Entity - Generics). and are not listed here. However, the CPU provides some specific generics that are used to configure the CPU for the NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user defined configuration. The specific generics are listed below.
CPU_BOOT_ADDR |
std_ulogic_vector(31 downto 0) |
0x00000000 |
This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction memory (IMEM) if the bootloader is disabled (INT_BOOTLOADER_EN = false). See section Address Space for more information. |
CPU_DEBUG_ADDR |
std_ulogic_vector(31 downto 0) |
0x00000000 |
This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address of the debugger memory. See section On-Chip Debugger (OCD) for more information. |
CPU_EXTENSION_RISCV_DEBUG |
boolean |
false |
Implement RISC-V-compatible "debug" CPU operation mode. See section CPU Debug Mode for more information. |
3.6. Instruction Sets and Extensions
The basic NEORV32 is a RISC-V rv32i
architecture that provides several optional RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see the the RISC-V Instruction Set Manual - Volume I: Unprivileged ISA and The RISC-V Instruction Set Manual
Volume II: Privileged Architecture, which are available in the projects docs/references
folder.
The CPU can discover available ISA extensions via the misa CSR and the
CPU SYSINFO register
or by executing an instruction and checking for an illegal instruction exception.
|
Executing an instruction from an extension that is not supported yet or that is currently not enabled (via the according top entity generic) will raise an illegal instruction exception. |
3.6.1. A
- Atomic Memory Access
Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RICS-C specs. defines a specific atomic extension that provides instructions for atomic memory accesses. The A
ISA extension is enabled if the CPU_EXTENSION_RISCV_A
configuration generic is true.
In this case the following additional instructions are available:
-
lr.w
: load-reservate -
sc.w
: store-conditional
Even though only lr.w and sc.w instructions are implemented yet, all further atomic operations
(load-modify-write instruction) can be emulated using these two instruction. Furthermore, the
instruction’s ordering flags (aq and lr ) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.
|
The load-reservate instruction behaves as a "normal" load-word instruction (lw
) but will also set a CPU-internal
data memory access lock. Executing a store-conditional behaves as "normal" store-word instruction (sw
) that will
only conduct an actual memory write operations if the lock is still intact. Additionally, the store-conditional instruction
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
After the execution of the sc
instruction, the lock is automatically removed.
The lock is broken if at least one of the following conditions occur:
. executing any data memory access instruction other than lr.w
. raising any t (for example an interrupt or a memory access exception)
The atomic instructions have special requirements for memory system / bus interconnect. More information can be found in sections Bus Interface and Processor-External Memory Interface (WISHBONE) (AXI4-Lite), respectively. |
3.6.2. B
- Bit-Manipulation Operations
The B
ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
CPU_EXTENSION_RISCV_B
configuration generic is true.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
The NEORV32 B extension only implements the basic bit-manipulation instructions (Zbb ) subset
and the address generation instructions (Zba ) subset yet.
|
The Zbb
sub-extension adds the following instruction:
-
andn
,orn
,xnor
-
clz
,ctz
,cpop
-
max
,maxu
,min
,minu
-
sext.b
,sext.h
,zext.h
-
rol
,ror
,rori
-
orc.b
,rev8
The Zba
sub-extension adds the following instruction:
-
sh1add
,sh2add
,sh3add
By default, the bit-manipulation unit uses an iterative approach to compute shift-related operations
like clz and rol . To increase performance (at the cost of additional hardware resources) the
FAST_SHIFT_EN generic can be enabled to implement full-parallel logic (like barrel shifters) for all
shift-related B instructions.
|
The B extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided B extension features from C-language
code (see sw/example/bitmanip_test ).
|
3.6.3. C
- Compressed Instructions
The compressed ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
The C
extension is available when the CPU_EXTENSION_RISCV_C
configuration generic is true.
In this case the following instructions are available:
-
c.addi4spn
,c.lw
,c.sw
,c.nop
,c.addi
,c.jal
,c.li
,c.addi16sp
,c.lui
,c.srli
,c.srai
c.andi
,c.sub
,c.xor
,c.or
,c.and
,c.j
,c.beqz
,c.bnez
,c.slli
,c.lwsp
,c.jr
,c.mv
,c.ebreak
,c.jalr
,c.add
,c.swsp
When the compressed instructions extension is enabled, branches to an unaligned and uncompressed instruction require
an additional instruction fetch to load the according second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC -falign-functions=4 ,
-falign-labels=4 , -falign-loops=4 and -falign-jumps=4 compile flags (via the makefile).
|
3.6.4. E
- Embedded CPU
The embedded CPU extensions reduces the size of the general purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extensions is enabled when the CPU_EXTENSION_RISCV_E
configuration generic is true. Accesses to registers beyond x15
will raise and illegal instruction exception.
This extension does not add any additional instructions or features.
Due to the reduced register file size an alternate toolchain ABI (ilp32e ) is required.
|
3.6.5. I
- Base Integer ISA
The CPU always supports the complete rv32i
base integer instruction set. This base set is always enabled
regardless of the setting of the remaining exceptions. The base instruction set includes the following
instructions:
-
immediate:
lui
,auipc
-
jumps:
jal
,jalr
-
branches:
beq
,bne
,blt
,bge
,bltu
,bgeu
-
memory:
lb
,lh
,lw
,lbu
,lhu
,sb
,sh
,sw
-
alu:
addi
,slti
,sltiu
,xori
,ori
,andi
,slli
,srli
,srai
,add
,sub
,sll
,slt
,sltu
,xor
,srl
,sra
,or
,and
-
environment:
ecall
,ebreak
,fence
In order to keep the hardware footprint low, the CPU’s shift unit uses a bit-serial serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallels by a fast (but large) barrel shifter when the FAST_SHIFT_EN generic is true. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
|
Internally, the fence instruction does not perform any operation inside the CPU. It only sets the
top’s d_bus_fence_o signal high for one cycle to inform the memory system a fence instruction has been
executed. Any flags within the fence instruction word are ignore by the hardware.
|
3.6.6. M
- Integer Multiplication and Division
Hardware-accelerated integer multiplication and division operations are available when the
CPU_EXTENSION_RISCV_M
configuration generic is true. In this case the following instructions are
available:
-
multiplication:
mul
,mulh
,mulhsu
,mulhu
-
division:
div
,divu
,rem
,remu
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the FAST_MUL_EN
generic is true allowing faster execution. Multiplications and divisions
always require a fixed amount of cycles to complete - regardless of the input operands.
|
3.6.7. Zmmul
- Integer Multiplication
This is a sub-extension of the M
ISA extension. It implements the multiplication-only operations
of the M
extensions and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the "full" M
extension.
-
multiplication:
mul
,mulh
,mulhsu
,mulhu
If Zmmul
is enabled, executing any division instruction from the M
ISA extension (div
, divu
, rem
, remu
)
will raise an illegal instruction exception.
Note that M
and Zmmul
extensions cannot be enabled at the same time.
If your RISC-V GCC toolchain does not (yet) support the _Zmmul ISA extensions, it can be "emulated"
using a rv32im machine architecture and setting the -mno-div compiler flag
(example $ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe ).
|
3.6.8. U
- Less-Privileged User Mode
In addition to the basic (and highest-privileged) machine-mode, the user-mode ISA extensions adds a second less-privileged
operation mode. It is implemented if the CPU_EXTENSION_RISCV_U
configuration generic is true.
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
peripheral/IO devices) can be constrained via the physical memory protection (PMP).
Any kind of privilege rights violation will raise an exception to allow full virtualization.
3.6.9. X
- NEORV32-Specific (Custom) Extensions
The NEORV32-specific extensions are always enabled and are indicated by the set X
bit in the misa
CSR.
The most important points of the NEORV32-specific extensions are:
* The CPU provides 16 fast interrupt interrupts (FIRQ)
, which are controlled via custom bits in the mie
and mip
CSR. This extension is mapped to reserved CSR bits, that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for mcause
are implemented.
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see Full Virtualization).
3.6.10. Zfinx
Single-Precision Floating-Point Operations
The Zfinx
floating-point extension is an alternative of the standard F
floating-point ISA extension.
The Zfinx
extensions also uses the integer register file x
to store and operate on floating-point data
instead of a dedicated floating-point register file (hence, F-in-x
). Thus, the Zfinx
extension requires
less hardware resources and features faster context changes. This also implies that there are NO dedicated f
register file-related load/store or move instructions.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
The NEORV32 floating-point unit used by the Zfinx extension is compatible to the IEEE-754 specifications.
|
The Zfinx
extensions only supports single-precision (.s
instruction suffix), so it is a direct alternative
to the F
extension. The Zfinx
extension is implemented when the CPU_EXTENSION_RISCV_Zfinx
configuration
generic is true. In this case the following instructions and CSRs are available:
-
conversion:
fcvt.s.w
,fcvt.s.wu
,fcvt.w.s
,fcvt.wu.s
-
comparison:
fmin.s
,fmax.s
,feq.s
,flt.s
,fle.s
-
computational:
fadd.s
,fsub.s
,fmul.s
-
sign-injection:
fsgnj.s
,fsgnjn.s
,fsgnjx.s
-
number classification:
fclass.s
-
additional CSRs:
fcsr
,frm
,fflags
Fused multiply-add instructions f[n]m[add/sub].s are not supported!
Division fdiv.s and square root fsqrt.s instructions are not supported yet!
|
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are flushed to zero setting them to +/- 0 before entering the
FPU’s processing core. If a computational instruction (like fmul.s ) generates a subnormal result, the
result is also flushed to zero during normalization.
|
The Zfinx extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the Zfinx extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided Zfinx floating-point extension from C-language
code (see sw/example/floating_point_test ).
|
3.6.11. Zicsr
Control and Status Register Access / Privileged Architecture
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
is implemented when the CPU_EXTENSION_RISCV_Zicsr
configuration generic is true.
In this case the following instructions are available:
-
CSR access:
csrrw
,csrrs
,csrrc
,csrrwi
,csrrsi
,csrrci
-
environment:
mret
,wfi
If the Zicsr extension is disabled the CPU does not provide any privileged architecture features at all!
In order to provide the full set of functions and to allow a secure execution
environment the Zicsr extension should always be enabled.
|
The "wait for interrupt instruction" wfi works like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
be enabled via the mie CSR and the global interrupt enable flag in mstatus has to be set.
|
The wfi instruction may also be executed in user-mode without causing an exception as mstatus bit
TW (timeout wait) is hardwired to zero.
|
3.6.12. Zicntr
CPU Base Counters
The Zicntr
ISA extension adds the basic cycle [m]cycle[h]
), instruction-retired ([m]instret[h]
) and time (time[h]
)
counters. This extensions is stated is mandatory by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section (Machine) Counter and Timer CSRs shows a list of all Zicntr
-related CSRs.
These are available if the Zicntr
ISA extensions is enabled via the CPU_EXTENSION_RISCV_Zicntr generic.
Disabling the Zicntr extension does not remove the time[h] -driving MTIME unit.
|
If Zicntr
is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
3.6.13. Zihpm
Hardware Performance Monitors
In additions to the base cycle, instructions-retired and time counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top’s
HPM_CNT_WIDTH
generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.
The HPM counters are available if the Zihpm
ISA extensions is enabled via the CPU_EXTENSION_RISCV_Zihpm generic.
Depending on the configuration the following additional CSR are available:
-
counters:
mhpmcounter*[h]
(3..31, depending onHPM_NUM_CNTS
) -
event configuration:
mhpmevent*
(3..31, depending onHPM_NUM_CNTS
)
The HPM counter CSR can only be accessed in machine-mode. Hence, the according mcounteren CSR bits
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
exception.
|
Auto-increment of the HPMs can be individually deactivated via the mcountinhibit CSR.
|
For a list of all HPM-related CSRs and all provided event configurations see section Hardware Performance Monitors (HPM). |
3.6.14. Zifencei
Instruction Stream Synchronization
The Zifencei
CPU extension is implemented if the CPU_EXTENSION_RISCV_Zifencei
configuration
generic is true. It allows manual synchronization of the instruction stream via the following instruction:
-
fence.i
The fence.i
instruction resets the CPU’s front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top’s i_bus_fencei_o
signal is set
high for one cycle to inform the memory system (like the i-cache to perform a flush/reload.
Any additional flags within the fence.i
instruction word are ignore by the hardware.
3.6.15. PMP
Physical Memory Protection
The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used to constrain memory read/write/execute rights for each available privilege level.
The NEORV32 PMP only supports NAPOT mode yet and a minimal region size (granularity) of 8 bytes. Larger
minimal sizes can be configured via the top PMP_MIN_GRANULARITY
generic to reduce hardware requirements.
The physical memory protection system is implemented when the PMP_NUM_REGIONS
configuration generic is >0.
In this case the following additional CSRs are available:
-
pmpcfg*
(0..15, depending on configuration): PMP configuration registers -
pmpaddr*
(0..63, depending on configuration): PMP address registers
See section [_machine_physical_memory_protection] for more information regarding the PMP CSRs. |
The actual number of regions and the minimal region granularity are defined via the top entity
PMP_MIN_GRANULARITY
and PMP_NUM_REGIONS
generics. PMP_MIN_GRANULARITY
defines the minimal available
granularity of each region in bytes. PMP_NUM_REGIONS
defines the total number of implemented regions and thus, the
number of available pmpcfg*
and pmpaddr*
CSRs.
When implementing more PMP regions that a certain critical limit an additional register stage is automatically inserted into the CPU’s memory interfaces to reduce critical path length. Unfortunately, this will also increase the latency of instruction fetches and data access by +1 cycle.
The critical limit can be adapted for custom use by a constant from the main VHDL package file
(rtl/core/neorv32_package.vhd
). The default value is 8:
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
Operation
Any CPU memory access address (from the instruction fetch or data access interface) is tested if it is accessing any
of the specified PMP regions(configured via pmpaddr*
and enabled via pmpcfg*
). If an
address matches one of these regions, the configured access rights (attributes in pmpcfg*
) are enforced:
-
a write access (store) will fail if no write attribute is set
-
a read access (load) will fail if no read attribute is set
-
an instruction fetch access will fail if no execute attribute is set
If an access to a protected region does not have the according access rights it will raise the according instruction/load/store access fault exception.
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to set the locked bit in the according
pmpcfg*
configuration CSR.
After updating the address configuration registers pmpaddr* the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.
|
For more information regarding RISC-V physical memory protection see the official The RISC-V Instruction Set Manual - Volume II: Privileged Architecture specifications. |
3.7. Instruction Timing
The instruction timing listed in the table below shows the required clock cycles for executing a certain instruction. These instruction cycles assume a bus access without additional wait states and a filled pipeline.
Average CPI (cycles per instructions) values for "real applications" like for executing the CoreMark benchmark for different CPU configurations are presented in CPU Performance.
Class | ISA | Instruction(s) | Execution cycles |
---|---|---|---|
ALU |
|
|
2 |
ALU |
|
|
2 |
ALU |
|
|
|
ALU |
|
|
|
Branches |
|
|
Taken: 5 + ML[12]; Not taken: 3 |
Branches |
|
|
Taken: 5 + ML[13]; Not taken: 3 |
Jumps / Calls |
|
|
4 + ML |
Jumps / Calls |
|
|
4 + ML |
Memory access |
|
|
4 + ML |
Memory access |
|
|
4 + ML |
Memory access |
|
|
4 + ML |
Multiplication |
|
|
2+31+3; FAST_MUL[14]: 5 |
Division |
|
|
22+32+4 |
CSR access |
|
|
4 |
System |
|
|
4 |
System |
|
|
3 |
System |
|
|
4 |
System |
|
|
5 |
System |
|
|
3 + ML |
Floating-point - artihmetic |
|
|
110 |
Floating-point - artihmetic |
|
|
112 |
Floating-point - artihmetic |
|
|
22 |
Floating-point - compare |
|
|
13 |
Floating-point - misc |
|
|
12 |
Floating-point - conversion |
|
|
47 |
Floating-point - conversion |
|
|
48 |
Bit-manipulation - arithmetic/logic |
|
|
3 |
Bit-manipulation - arithmetic/logic |
|
|
3 |
Bit-manipulation - shifts |
|
|
3 + 0..32 |
Bit-manipulation - shifts |
|
|
3 + 32 |
Bit-manipulation - shifts |
|
|
3 + SA |
Bit-manipulation - single-bit |
|
|
3 |
Bit-manipulation - shifted-add |
|
|
3 |
The presented values of the floating-point execution cycles are average values - obtained from 4096 instruction executions using pseudo-random input values. The execution time for emulating the instructions (using pure-software libraries) is ~17..140 times higher. |
3.8. Control and Status Registers (CSRs)
The following table shows a summary of all available CSRs. The address field defines the CSR address for the CSR access instructions. The [ASM] name can be used for (inline) assembly code and is directly understood by the assembler/compiler. The [C] names are defined by the NEORV32 core library and can be used as immediate in plain C code. The R/W column shows whether the CSR can be read and/or written. The NEORV32-specific CSRs are mapped to the official "custom CSRs" CSR address space.
The CSRs, the CSR-related instructions as well as the complete exception/interrupt processing
system are only available when the CPU_EXTENSION_RISCV_Zicsr generic is true.
|
When trying to write to a read-only CSR (like the time CSR) or when trying to access a nonexistent
CSR or when trying to access a machine-mode CSR from less-privileged user-mode an
illegal instruction exception is raised.
|
CSR reset value: Please note that most of the CSRs do NOT provide a dedicated reset. Hence,
these CSRs are not initialized by a hardware reset and keep an UNDEFINED value until they are
explicitly initialized by the software (normally, this is already done by the NEORV32-specific
crt0.S start-up code). For more information see section CPU Hardware Reset.
|
CSR Listing
The description of each single CSR provides the following summary:
Address |
Description |
ASM alias |
Reset value: CSR content after hardware reset (also see CPU Hardware Reset) |
||
Detailed description |
Not Implemented CSRs / CSR Bits
All CSR bits that are unused / not implemented / not shown are hardwired to zero. All CSRs that are not
implemented at all (and are not "disabled" using certain configuration generics) will trigger an exception on
access. The CSR that are implemented within the NEORV32 might cause an exception if they are disabled.
See the according CSR description for more information.
|
Debug Mode CSRs
The debug mode CSRs are not listed here since they are only accessible in debug mode and not during normal CPU operation.
See section CPU Debug Mode CSRs.
|
CSR Listing Notes
CSRs with the following notes …
-
X
: custom - have or are a custom CPU-specific extension (that is allowed by the RISC-V specs) -
R
: read-only - are read-only (in contrast to the originally specified r/w capability) -
C
: constrained - have a constrained compatibility, not all specified bits are implemented
Address | Name [ASM] | Name [C] | R/W | Function | Note |
---|---|---|---|---|---|
0x001 |
CSR_FFLAGS |
r/w |
Floating-point accrued exceptions |
||
0x002 |
CSR_FRM |
r/w |
Floating-point dynamic rounding mode |
||
0x003 |
CSR_FCSR |
r/w |
Floating-point control and status ( |
||
0x30a |
CSR_MENVCFG |
r/- |
Machine environment configuration register - low word |
|
|
0x31a |
CSR_MENVCFGH |
r/- |
Machine environment configuration register - low word |
|
|
0x300 |
CSR_MSTATUS |
r/w |
Machine status register - low word |
|
|
0x301 |
CSR_MISA |
r/- |
Machine CPU ISA and extensions |
|
|
0x304 |
CSR_MIE |
r/w |
Machine interrupt enable register |
|
|
0x305 |
CSR_MTVEC |
r/w |
Machine trap-handler base address (for ALL traps) |
||
0x306 |
CSR_MCOUNTEREN |
r/w |
Machine counter-enable register |
|
|
0x310 |
CSR_MSTATUSH |
r/- |
Machine status register - high word |
|
|
0x340 |
CSR_MSCRATCH |
r/w |
Machine scratch register |
||
0x341 |
CSR_MEPC |
r/w |
Machine exception program counter |
||
0x342 |
CSR_MCAUSE |
r/w |
Machine trap cause |
|
|
0x343 |
CSR_MTVAL |
r/- |
Machine bad address or instruction |
|
|
0x344 |
CSR_MIP |
r/- |
Machine interrupt pending register |
|
|
0x3a0 .. 0x3af |
CSR_PMPCFG0 .. CSR_PMPCFG15 |
r/w |
Physical memory protection config. for region 0..63 |
|
|
0x3b0 .. 0x3ef |
CSR_PMPADDR0 .. CSR_PMPADDR63 |
r/w |
Physical memory protection addr. register region 0..63 |
||
0xb00 |
CSR_MCYCLE |
r/w |
Machine cycle counter low word |
||
0xb02 |
CSR_MINSTRET |
r/w |
Machine instruction-retired counter low word |
||
0xb80 |
CSR_MCYCLE |
r/w |
Machine cycle counter high word |
||
0xb82 |
CSR_MINSTRET |
r/w |
Machine instruction-retired counter high word |
||
0xc00 |
CSR_CYCLE |
r/- |
Cycle counter low word |
||
0xc01 |
CSR_TIME |
r/- |
System time (from MTIME) low word |
||
0xc02 |
CSR_INSTRET |
r/- |
Instruction-retired counter low word |
||
0xc80 |
CSR_CYCLEH |
r/- |
Cycle counter high word |
||
0xc81 |
CSR_TIMEH |
r/- |
System time (from MTIME) high word |
||
0xc82 |
CSR_INSTRETH |
r/- |
Instruction-retired counter high word |
||
0x323 .. 0x33f |
CSR_MHPMEVENT3 .. CSR_MHPMEVENT31 |
r/w |
Machine performance-monitoring event selector 3..31 |
|
|
0xb03 .. 0xb1f |
CSR_MHPMCOUNTER3 .. CSR_MHPMCOUNTER31 |
r/w |
Machine performance-monitoring counter 3..31 low word |
||
0xb83 .. 0xb9f |
CSR_MHPMCOUNTER3H .. CSR_MHPMCOUNTER31H |
r/w |
Machine performance-monitoring counter 3..31 high word |
||
0x320 |
CSR_MCOUNTINHIBIT |
r/w |
Machine counter-enable register |
||
0xf11 |
CSR_MVENDORID |
r/- |
Vendor ID |
||
0xf12 |
CSR_MARCHID |
r/- |
Architecture ID |
||
0xf13 |
CSR_MIMPID |
r/- |
Machine implementation ID / version |
||
0xf14 |
CSR_MHARTID |
r/- |
Machine thread ID |
||
0xf15 |
CSR_MCONFIGPTR |
r/- |
Machine configuration pointer register |
3.8.1. Floating-Point CSRs
These CSRs are available if the Zfinx
extensions is enabled (CPU_EXTENSION_RISCV_Zfinx
is true).
Otherwise any access to the floating-point CSRs will raise an illegal instruction exception.
fflags
0x001 |
Floating-point accrued exceptions |
|
Reset value: UNDEFINED |
||
The |
frm
0x002 |
Floating-point dynamic rounding mode |
|
Reset value: UNDEFINED |
||
The |
fcsr
0x003 |
Floating-point control and status register |
|
Reset value: UNDEFINED |
||
The |
3.8.2. Machine Configuration CSRs
menvcfg
0x30a |
Machine environment configuration register |
|
Reset value: 0x00000000 |
||
The features of this CSR are not implemented yet. The register is read-only. NOTE: This register
only exists if the |
menvcfgh
0x31a |
Machine environment configuration register - high word |
|
Reset value: 0x00000000 |
||
The features of this CSR are not implemented yet. The register is read-only. NOTE: This register
only exists if the |
3.8.3. Machine Trap Setup CSRs
mstatus
0x300 |
Machine status register |
|
Reset value: 0x00000000 |
||
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
12:11 |
CSR_MSTATUS_MPP_H : CSR_MSTATUS_MPP_L |
r/w |
Previous machine privilege level, 11 = machine (M) level, 00 = user (U) level |
7 |
CSR_MSTATUS_MPIE |
r/w |
Previous machine global interrupt enable flag state |
3 |
CSR_MSTATUS_MIE |
r/w |
Machine global interrupt enable flag |
When entering an exception/interrupt, the MIE
flag is copied to MPIE
and cleared afterwards. When leaving
the exception/interrupt (via the mret
instruction), MPIE
is copied back to MIE
.
misa
0x301 |
ISA and extensions |
|
Reset value: configuration dependant |
||
The |
The misa CSR is not fully RISC-V-compatible as it is read-only. Hence, implemented CPU
extensions cannot be switch on/off during runtime. For compatibility reasons any write access to this
CSR is simply ignored and will NOT cause an illegal instruction exception.
|
Bit | Name [C] | R/W | Function |
---|---|---|---|
31:30 |
CSR_MISA_MXL_HI_EXT : CSR_MISA_MXL_LO_EXT |
r/- |
32-bit architecture indicator (always 01) |
23 |
CSR_MISA_X_EXT |
r/- |
|
20 |
CSR_MISA_U_EXT |
r/- |
|
12 |
CSR_MISA_M_EXT |
r/- |
|
8 |
CSR_MISA_I_EXT |
r/- |
|
4 |
CSR_MISA_E_EXT |
r/- |
|
2 |
CSR_MISA_C_EXT |
r/- |
|
0 |
CSR_MISA_A_EXT |
r/- |
|
Information regarding the implemented RISC-V Z* sub-extensions (like Zicsr or Zfinx ) can be found
in the CPU SYSINFO register.
|
mie
0x304 |
Machine interrupt-enable register |
|
Reset value: UNDEFINED |
||
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
31:16 |
CSR_MIE_FIRQ15E : CSR_MIE_FIRQ0E |
r/w |
Fast interrupt channel 15..0 enable |
11 |
CSR_MIE_MEIE |
r/w |
Machine external interrupt enable |
7 |
CSR_MIE_MTIE |
r/w |
Machine timer interrupt enable (from MTIME) |
3 |
CSR_MIE_MSIE |
r/w |
Machine software interrupt enable |
mtvec
0x305 |
Machine trap-handler base address |
|
Reset value: UNDEFINED |
||
The |
Bit | R/W | Function |
---|---|---|
31:2 |
r/w |
4-byte aligned base address of trap base handler |
1:0 |
r/- |
Always zero |
mcounteren
0x306 |
Machine counter enable |
|
Reset value: UNDEFINED |
||
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
31:3 |
|
r/- |
Always zero: user-level code is not allowed to read HPM counters |
2 |
CSR_MCOUNTEREN_IR |
r/w |
User-level code is allowed to read |
1 |
CSR_MCOUNTEREN_TM |
r/w |
User-level code is allowed to read |
0 |
CSR_MCOUNTEREN_CY |
r/w |
User-level code is allowed to read |
mstatush
0x310 |
Machine status register - high word |
|
Reset value: 0x00000000 |
||
The |
The NEORV32 mstatush CSR is not a physical register. All write access are ignored and all read accesses will always
return zero. However, any access will not raise an illegal instruction exception. The CSR address is implemented
in order to comply with the RISC-V privilege architecture specs.
|
3.8.4. Machine Trap Handling CSRs
mscratch
0x340 |
Scratch register for machine trap handlers |
|
Reset value: UNDEFINED |
||
The |
mepc
0x341 |
Machine exception program counter |
|
Reset value: UNDEFINED |
||
The |
mcause
0x342 |
Machine trap cause |
|
Reset value: UNDEFINED |
||
The |
Bit | R/W | Function |
---|---|---|
31 |
r/w |
|
30:5 |
r/- |
Reserved, read as zero |
4:0 |
r/w |
Trap ID, see NEORV32 Trap Listing |
mtval
0x343 |
Machine bad address or instruction |
|
Reset value: UNDEFINED |
||
The |
Trap cause | mtval content |
---|---|
misaligned instruction fetch address or instruction fetch access fault |
address of faulting instruction fetch |
breakpoint |
program counter (= address) of faulting instruction itself |
misaligned load address, load access fault, misaligned store address or store access fault |
program counter (= address) of faulting instruction itself |
illegal instruction |
actual instruction word of faulting instruction |
anything else including interrupts |
0x00000000 (always zero) |
The NEORV32 mtval
CSR is read-only. However, a write access will NOT raise an illegal instruction exception.
mip
0x344 |
Machine interrupt Pending |
|
Reset value: 0x00000000 |
||
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
31:16 |
CSR_MIP_FIRQ15P : CSR_MIP_FIRQ0P |
r/- |
fast interrupt channel 15..0 pending |
11 |
CSR_MIP_MEIP |
r/- |
machine external interrupt pending |
7 |
CSR_MIP_MTIP |
r/- |
machine timer interrupt pending |
3 |
CSR_MIP_MSIP |
r/- |
machine software interrupt pending |
The NEORV32 mip
CSR is read-only. However, a write access will NOT raise an illegal instruction exception.
3.8.5. Machine Physical Memory Protection CSRs
The available physical memory protection logic is configured via the PMP_NUM_REGIONS and
PMP_MIN_GRANULARITY top entity generics. PMP_NUM_REGIONS defines the number of implemented
protection regions and thus, the availability of the according pmpcfg*
and pmpaddr*
CSRs.
If trying to access an PMP-related CSR beyond PMP_NUM_REGIONS no illegal instruction exception is triggered. The according CSRs are read-only (writes are ignored) and always return zero. |
The RISC-V-compatible NEORV32 physical memory protection only implements the NAPOT (naturally aligned power-of-two region) mode with a minimal region granularity of 8 bytes. |
pmpcfg
0x3a0 - 0x3af |
Physical memory protection configuration registers |
|
Reset value: 0x00000000 |
||
The |
Bit | RISC-V name | R/W | Function |
---|---|---|---|
7 |
L |
r/w |
lock bit, can be set - but not be cleared again (only via CPU reset) |
6:5 |
- |
r/- |
reserved, read as zero |
4:3 |
A |
r/w |
mode configuration; only OFF ( |
2 |
X |
r/w |
execute permission |
1 |
W |
r/w |
write permission |
0 |
R |
r/w |
read permission |
pmpaddr
0x3b0 - 0x3ef |
Physical memory protection configuration registers |
|
Reset value: UNDEFINED |
||
The |
When configuring PMP make sure to set pmpaddr* before activating the according region via
pmpcfg* . When changing the PMP configuration, deactivate the according region via pmpcfg*
before modifying pmpaddr* .
|
3.8.6. (Machine) Counter and Timer CSRs
The (machine) counters and timers are implemented when the Zicntr
ISA extensions is enabled (default)
via the CPU_EXTENSION_RISCV_Zicntr generic.
The CPU_CNT_WIDTH generic defines the total size of the CPU’s cycle[h] and instret[h]
/ mcycle[h] and minstret[h]
counter CSRs (low and high words combined); the time CSRs are not affected by this generic. Note that any
configuration with CPU_CNT_WIDTH less than 64 is not RISC-V compliant.
|
Effective CPU counter width (
If CPU_CNT_WIDTH is less than 64 (the default value) and greater than or equal 32, the according
MSBs of [m]cycle & [m]instret )[m]cycleh and [m]instreth are read-only and always read as zero. This configuration
will also set the SYSINFO_CPU_ZXSCNT flag ("small counters") in the CPU
SYSINFO register.If CPU_CNT_WIDTH is less than 32 and greater than 0, the [m]cycleh and [m]instreth CSRs are hardwired to zero
and any write access to them is ignored. Furthermore, the according MSBs of [m]cycle and [m]instret are read-only
and always read as zero. This configuration will also set the SYSINFO_CPU_ZXSCNT flag ("small counters") in
the CPU SYSINFO register.If CPU_CNT_WIDTH is 0, the cycle[h] and instret[h] / mcycle[h] and minstret[h] CSRs are hardwired to zero
and any write access to them is ignored.
|
cycle[h]
0xc00 |
Cycle counter - low word |
|
0xc80 |
Cycle counter - high word |
|
Reset value: UNDEFINED |
||
The |
time[h]
0xc01 |
System time - low word |
|
0xc81 |
System time - high word |
|
Reset value: UNDEFINED |
||
The |
instret[h]
0xc02 |
Instructions-retired counter - low word |
|
0xc82 |
Instructions-retired counter - high word |
|
Reset value: UNDEFINED |
||
The |
mcycle[h]
0xb00 |
Machine cycle counter - low word |
|
0xb80 |
Machine cycle counter - high word |
|
Reset value: UNDEFINED |
||
The |
minstret[h]
0xb02 |
Machine instructions-retired counter - low word |
|
0xb82 |
Machine instructions-retired counter - high word |
|
Reset value: UNDEFINED |
||
The |
3.8.7. Hardware Performance Monitors (HPM) CSRs
The hardware performance monitor CSRs are implemented when the Zihpm
ISA extension is enabled via the
CPU_EXTENSION_RISCV_Zihpm generic.
The actually implemented hardware performance logic is configured via the HPM_NUM_CNTS top entity generic,
which defines the number of implemented performance monitors. Note that always all 28 HPM counter and configuration registers
(mhpmcounter*[h]
and mhpmevent*
) are implemented, but only the actually configured ones are real registers and
not hardwired to zero.
If trying to access an HPM-related CSR beyond HPM_NUM_CNTS no illegal instruction exception is triggered. The according CSRs are read-only (writes are ignored) and always return zero. |
The HPM system only allows machine-mode access. Hence, hpmcounter*[h] CSR are not implemented
and any access (even) from machine mode will raise an exception. Furthermore, the according bits of mcounteren
used to configure user-mode access to hpmcounter*[h] are hard-wired to zero.
|
The total counter width of the HPMs can be configured before synthesis via the HPM_CNT_WIDTH generic (0..64-bit).
The total LSB-aligned HPM counter size (low word CSR + high word CSR) is defined via the HPM_NUM_CNTS generic (0..64-bit). If HPM_NUM_CNTS is less than 64, all unused MSB-aligned bits are hardwired to zero. |
mhpmevent
0x232 -0x33f |
Machine hardware performance monitor event selector |
|
Reset value: UNDEFINED |
||
The |
The available hardware performance logic is configured via the HPM_NUM_CNTS top entity generic.
HPM_NUM_CNTS defines the number of implemented performance monitors and thus, the availability of the
according mhpmcounter*[h]
and mhpmevent*
CSRs.
Bit | Name [C] | R/W | Event |
---|---|---|---|
0 |
HPMCNT_EVENT_CY |
r/w |
active clock cycle (not in sleep) |
1 |
- |
r/- |
not implemented, always read as zero |
2 |
HPMCNT_EVENT_IR |
r/w |
retired instruction |
3 |
HPMCNT_EVENT_CIR |
r/w |
retired compressed instruction |
4 |
HPMCNT_EVENT_WAIT_IF |
r/w |
instruction fetch memory wait cycle (if more than 1 cycle memory latency) |
5 |
HPMCNT_EVENT_WAIT_II |
r/w |
instruction issue pipeline wait cycle (if more than 1 cycle latency), caused by pipelines flushes (like taken branches) |
6 |
HPMCNT_EVENT_WAIT_MC |
r/w |
multi-cycle ALU operation wait cycle |
7 |
HPMCNT_EVENT_LOAD |
r/w |
load operation |
8 |
HPMCNT_EVENT_STORE |
r/w |
store operation |
9 |
HPMCNT_EVENT_WAIT_LS |
r/w |
load/store memory wait cycle (if more than 1 cycle memory latency) |
10 |
HPMCNT_EVENT_JUMP |
r/w |
unconditional jump |
11 |
HPMCNT_EVENT_BRANCH |
r/w |
conditional branch (taken or not taken) |
12 |
HPMCNT_EVENT_TBRANCH |
r/w |
taken conditional branch |
13 |
HPMCNT_EVENT_TRAP |
r/w |
entered trap |
14 |
HPMCNT_EVENT_ILLEGAL |
r/w |
illegal instruction exception |
mhpmcounter[h]
0xb03 - 0xb1f |
Machine hardware performance monitor - counter low |
|
0xb83 - 0xb9f |
Machine hardware performance monitor - counter high |
|
Reset value: UNDEFINED |
||
The |
3.8.8. Machine Counter Setup CSRs
mcountinhibit
0x320 |
Machine counter-inhibit register |
|
Reset value: UNDEFINED |
||
The |
Bit | Name [C] | R/W | Event |
---|---|---|---|
0 |
CSR_MCOUNTINHIBIT_IR |
r/w |
the |
2 |
CSR_MCOUNTINHIBIT_IR |
r/w |
the |
3:31 |
CSR_MCOUNTINHIBIT_HPM3 : _CSR_MCOUNTINHIBIT_HPM31 |
r/w |
the |
3.8.9. Machine Information CSRs
All machine information registers can only be accessed in machine mode and are read-only. |
mvendorid
0xf11 |
Machine vendor ID |
|
Reset value: 0x00000000 |
||
The |
marchid
0xf12 |
Machine architecture ID |
|
Reset value: 0x00000013 |
||
The |
mimpid
0xf13 |
Machine implementation ID |
|
Reset value: HW version number |
||
The |
mhartid
0xf14 |
Machine hardware thread ID |
|
Reset value: HW_THREAD_ID generic |
||
The |
mconfigptr
0xf15 |
Machine configuration pointer register |
|
Reset value: |
||
This register holds a physical address (if not zero) that points to the base address of an architecture configuration structure. Software can traverse this data structure to discover information about the harts, the platform, and their configuration. NOTE: Not assigned yet. |
3.8.10. Traps, Exceptions and Interrupts
In this document the following nomenclature regarding traps is used:
-
interrupts = asynchronous exceptions
-
exceptions = synchronous exceptions
-
traps = exceptions + interrupts (synchronous or asynchronous exceptions)
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in mtvec
CSR. The cause of the according interrupt or exception can be determined via the content of mcause
CSR. The address that reflects the current program counter when a trap was taken is stored to mepc
CSR.
Additional information regarding the cause of the trap can be retrieved from mtval
CSR.
The traps are prioritized. If several exceptions occur at once only the one with highest priority is triggered while all remaining exceptions are ignored. If several interrupts trigger at once, the one with highest priority is serviced first while the remaining ones stay pending. After completing the interrupt handler the interrupt with the second highest priority will get serviced and so on until no further interrupt are pending.
Interrupt Signal Requirements
All interrupts request signals (including FIRQs) are high-active. A request has to stay at high-level (=asserted)
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
|
Instruction Atomicity
All instructions execute as atomic operations - interrupts can only trigger between two instructions.
So if there is a permanent interrupt request, exactly one instruction from the interrupt program will be executed before
a new interrupt handler can start.
|
3.8.11. Memory Access Exceptions**
If a load operation causes any exception, the instruction’s destination register is not written at all. Load exceptions caused by a misalignment or a physical memory protection fault do not trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical memory protection fault do not trigger a bus write-operation at all.
3.8.12. Custom Fast Interrupt Request Lines
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the firq_i
CPU top
entity signals. These interrupts have custom configuration and status flags in the mie
and mip
CSRs and also
provide custom trap codes in mcause
. These FIRQs are reserved for NEORV32 processor-internal usage only.
NEORV32 Trap Listing
Prio. | mcause |
[RISC-V] | ID [C] | Cause | mepc |
mtval |
---|---|---|---|---|---|---|
1 |
|
0.0 |
TRAP_CODE_I_MISALIGNED |
instruction address misaligned |
B-ADR |
PC |
2 |
|
0.1 |
TRAP_CODE_I_ACCESS |
instruction access fault |
B-ADR |
PC |
3 |
|
0.2 |
TRAP_CODE_I_ILLEGAL |
illegal instruction |
PC |
Inst |
4 |
|
0.11 |
TRAP_CODE_MENV_CALL |
environment call from M-mode ( |
PC |
PC |
5 |
|
0.8 |
TRAP_CODE_UENV_CALL |
environment call from U-mode ( |
PC |
PC |
6 |
|
0.3 |
TRAP_CODE_BREAKPOINT |
breakpoint (EBREAK) |
PC |
PC |
7 |
|
0.6 |
TRAP_CODE_S_MISALIGNED |
store address misaligned |
B-ADR |
B-ADR |
8 |
|
0.4 |
TRAP_CODE_L_MISALIGNED |
load address misaligned |
B-ADR |
B-ADR |
9 |
|
0.7 |
TRAP_CODE_S_ACCESS |
store access fault |
B-ADR |
B-ADR |
10 |
|
0.5 |
TRAP_CODE_L_ACCESS |
load access fault |
B-ADR |
B-ADR |
11 |
|
1.16 |
TRAP_CODE_FIRQ_0 |
fast interrupt request channel 0 |
I-PC |
0 |
12 |
|
1.17 |
TRAP_CODE_FIRQ_1 |
fast interrupt request channel 1 |
I-PC |
0 |
13 |
|
1.18 |
TRAP_CODE_FIRQ_2 |
fast interrupt request channel 2 |
I-PC |
0 |
14 |
|
1.19 |
TRAP_CODE_FIRQ_3 |
fast interrupt request channel 3 |
I-PC |
0 |
15 |
|
1.20 |
TRAP_CODE_FIRQ_4 |
fast interrupt request channel 4 |
I-PC |
0 |
16 |
|
1.21 |
TRAP_CODE_FIRQ_5 |
fast interrupt request channel 5 |
I-PC |
0 |
17 |
|
1.22 |
TRAP_CODE_FIRQ_6 |
fast interrupt request channel 6 |
I-PC |
0 |
18 |
|
1.23 |
TRAP_CODE_FIRQ_7 |
fast interrupt request channel 7 |
I-PC |
0 |
19 |
|
1.24 |
TRAP_CODE_FIRQ_8 |
fast interrupt request channel 8 |
I-PC |
0 |
20 |
|
1.25 |
TRAP_CODE_FIRQ_9 |
fast interrupt request channel 9 |
I-PC |
0 |
21 |
|
1.26 |
TRAP_CODE_FIRQ_10 |
fast interrupt request channel 10 |
I-PC |
0 |
22 |
|
1.27 |
TRAP_CODE_FIRQ_11 |
fast interrupt request channel 11 |
I-PC |
0 |
23 |
|
1.28 |
TRAP_CODE_FIRQ_12 |
fast interrupt request channel 12 |
I-PC |
0 |
24 |
|
1.29 |
TRAP_CODE_FIRQ_13 |
fast interrupt request channel 13 |
I-PC |
0 |
25 |
|
1.30 |
TRAP_CODE_FIRQ_14 |
fast interrupt request channel 14 |
I-PC |
0 |
26 |
|
1.31 |
TRAP_CODE_FIRQ_15 |
fast interrupt request channel 15 |
I-PC |
0 |
27 |
|
1.11 |
TRAP_CODE_MEI |
machine external interrupt |
I-PC |
0 |
28 |
|
1.3 |
TRAP_CODE_MSI |
machine software interrupt |
I-PC |
0 |
29 |
|
1.7 |
TRAP_CODE_MTI |
machine timer interrupt |
I-PC |
0 |
Notes
The "Prio." column shows the priority of each trap. The highest priority is 1. The “mcause” column shows the
cause ID of the according trap that is written to mcause
CSR. The "[RISC-V]" columns show the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (sw/lib/include/neorv32.h
) and can
be used in plain C code. The “mepc” and “mtval” columns show the value written to
mepc
and mtval
CSRs when a trap is triggered:
-
I-PC - address of interrupted instruction (instruction has not been execute/completed yet)
-
B-ADR- bad memory access address that cause the trap
-
PC - address of instruction that caused the trap
-
0 - zero
-
Inst - the faulting instruction itself
3.8.13. Bus Interface
The CPU provides two independent bus interfaces: One for fetching instructions (i_bus_*
) and one for
accessing data (d_bus_*
) via load and store operations. Both interfaces use the same interface protocol.
Address Space
The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard Architecture. Each of this interfaces can access an address space of up to 232 bytes (4GB). The memory system is based on 32-bit words with a minimal granularity of 1 byte. Please note, that the NEORV32 CPU does not support unaligned memory accesses in hardware - however, a software-based handling can be implemented as any unaligned memory access will trigger an according exception.
Interface Signals
The following table shows the signals of the data and instruction interfaces seen from the CPU
(*_o
signals are driven by the CPU / outputs, *_i
signals are read by the CPU / inputs).
Signal | Size | Function |
---|---|---|
|
32 |
access address |
|
32 |
data input for read operations |
|
32 |
data output for write operations |
|
4 |
byte enable signal for write operations |
|
1 |
bus write access |
|
1 |
bus read access |
|
1 |
exclusive access request |
|
1 |
accessed peripheral indicates a successful completion of the bus transaction |
|
1 |
accessed peripheral indicates an error during the bus transaction |
|
1 |
this signal is set for one cycle when the CPU executes a data/instruction fence operation |
|
2 |
current CPU privilege level |
Currently, there a no pipelined or overlapping operations implemented within the same bus interface. So only a single transfer request can be "on the fly". |
Protocol
A bus request is triggered either by the bus_re_o
signal (for reading data) or by the bus_we_o
signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral either sets the bus_ack_i
signal (→ successful completion) or the
bus_err_i
signal is set (→ failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the bus_err_i
signal during a transfer will trigger the according instruction bus
access fault or load/store bus access fault exception.
The transfer can be completed directly in the same cycle as it was initiated (via the bus_re_o or bus_we_o
signal) if the peripheral sets bus_ack_i or bus_err_i high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal module provide exactly one cycle delay between initiation and completion of transfers.
|
Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain response time window. This time window is defined
by the global max_proc_int_response_time_c constant (default = 15 cycles) from the processor’s VHDL package file (rtl/neorv32_package.vhd ).
It defines the maximum number of cycles after which an unacknowledged processor-internal bus transfer will timeout and raise a bus fault exception.
The BUSKEEPER hardware module (see section Internal Bus Monitor (BUSKEEPER)) keeps track of all internal bus transactions. If any bus operations times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception.
Note that the bus keeper does not track external accesses via the external memory bus interface. However, the external memory bus interface also provides
an optional bus timeout (see section Processor-External Memory Interface (WISHBONE) (AXI4-Lite)).
|
Exemplary Bus Accesses
![]() |
![]() |
Read access |
Write access |
Write Access
For a write access, the accessed address (bus_addr_o
), the data to be written (bus_wdata_o
) and the byte
enable signals (bus_ben_o
) are set when bus_we_o goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the bus_ack_i
signal several
cycles after issuing.
Read Access
For a read access, the accessed address (bus_addr_o
) is set when bus_re_o
goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral hast to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the bus_ack_i
signal).
Access Boundaries
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16- bit) and word (= 32-bit) boundaries.
Exclusive (Atomic) Access
The CPU can access memory in an exclusive manner by generating a load-reservate and store-conditional combination. Normally, these combinations should target the same memory address.
The CPU starts an exclusive access to memory via the load-reservate instruction (lr.w
). This instruction
will set the CPU-internal exclusive access lock, which directly drives the d_bus_lock_o
. It is the task of
the memory system to manage this exclusive access reservation by storing the according access address and
the source of the access itself (for example via the CPU ID in a multi-core system).
When the CPU executes a store-conditional instruction (sc.w
) the CPU-internal exclusive access lock is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
zero and will allow the according store operation to the memory system. If the lock is broken, the
instruction will write-back non-zero and will not generate an actual memory store operation.
The CPU-internal exclusive access lock is broken if at least one of the situations appear.
-
when executing any other memory-access operation than
lr.w
-
when any trap (sync. or async.) is triggered (for example to force a context switch)
-
when the memory system signals a bus error (via the
bus_err_i
signal)
For more information regarding the SoC-level behavior and requirements of atomic operations see section Processor-External Memory Interface (WISHBONE) (AXI4-Lite). |
Memory Barriers
Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle
(d_bus_fence_o
for a fence instruction; i_bus_fence_o
for a fencei instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
3.8.14. CPU Hardware Reset
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical registers of the NEORV32 CPU as well as most register of the whole NEORV32 Processor do not use a dedicated hardware reset. "Uncritical registers" in this context means that the initial value of these registers after power-up is not relevant for a defined CPU boot process.
Rational
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage of the engine features an N-bit data register and a 1-bit status register. The status register is set when the data in the according data register is valid. At the end of the pipeline the status register might trigger a write-back of the processing result to some kind of memory. The initial status of the data registers after power-up is irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in the pipeline’s data register. Therefore, the pipeline data register do no require a dedicated reset as they do not control the actual operation (in contrast to the status register). This makes the pipeline data registers from this example "uncritical registers".
NEORV32 CPU Reset
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline register will get initialized by the CPU’s internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU’s core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the crt0.S
start-up code).
During the very early boot process (where crt0.S
is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (mie
)
does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire
because the global interrupt enabled flag in the status register (mstatsus(mie)
) provides a dedicated
hardware reset setting it to low (globally disabling interrupts).
Reset Configuration
Most CPU-internal register do feature an asynchronous reset in the VHDL code, but the "don’t care" value
(VHDL '-'
) is used for initialization of the uncritical register, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (rtl/core/neorv32_package.vhd
):
-- "critical" number of PMP regions --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value
for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW),
default; TRUE=defined LOW reset value)
4. Software Framework
To make actual use of the NEORV32 processor, the project comes with a complete software eco-system. This ecosystem is based on the RISC-V port of the GCC GNU Compiler Collection and consists of the following elementary parts:
Application/bootloader start-up code |
|
Application/bootloader linker script |
|
Core hardware driver libraries |
|
Central makefile |
|
Auxiliary tool for generating NEORV32 executables |
|
Default bootloader |
|
Last but not least, the NEORV32 ecosystem provides some example programs for testing the hardware, for
illustrating the usage of peripherals and for general getting in touch with the project (sw/example
).
4.1. Compiler Toolchain
The toolchain for this project is based on the free RISC-V GCC-port. You can find the compiler sources and build instructions on the official RISC-V GNU toolchain GitHub page: https://github.com/riscv/riscv-gnutoolchain.
The NEORV32 implements a 32-bit base integer architecture (rv32i
) and a 32-bit integer and soft-float ABI
(ilp32), so make sure you build an according toolchain.
Alternatively, you can download my prebuilt rv32i/e
toolchains for 64-bit x86 Linux from: https://github.com/stnolting/riscv-gcc-prebuilt
The default toolchain prefix used by the project’s makefiles is (can be changed in the makefiles): riscv32-unknown-elf
More information regarding the toolchain (building from scratch or downloading the prebuilt ones) can be found in the user guides' section Software Toolchain Setup. |
4.2. Core Libraries
The NEORV32 project provides a set of C libraries that allows an easy usage of the processor/CPU features. Just include the main NEORV32 library file in your application’s source file(s):
#include <neorv32.h>
Together with the makefile, this will automatically include all the processor’s header files located in
sw/lib/include
into your application. The actual source files of the core libraries are located in
sw/lib/source
and are automatically included into the source list of your software project. The following
files are currently part of the NEORV32 core library:
C source file | C header file | Description |
---|---|---|
- |
|
main NEORV32 definitions and library file |
|
|
HW driver (stub)[15] functions for the custom functions subsystem |
|
|
HW driver functions for the NEORV32 CPU |
|
|
HW driver functions for the GPIO |
|
|
HW driver functions for the GPTRM |
- |
|
macros for custom intrinsics/instructions |
- |
|
legacy back-compatibility layer |
|
|
HW driver functions for the MTIME |
|
|
HW driver functions for the NEOLED |
|
|
HW driver functions for the PWM |
|
|
NEORV32 runtime environment and helpers |
|
|
HW driver functions for the SLINK |
|
|
HW driver functions for the SPI |
|
|
HW driver functions for the TRNG |
|
|
HW driver functions for the TWI |
|
|
HW driver functions for the UART0 and UART1 |
|
|
HW driver functions for the WDT |
|
|
HW driver functions for the XIRQ |
Documentation
All core library software sources are highly documented using doxygen. See section [Building the Software Framework Documentation].
The documentation is automatically built and deployed to GitHub pages by the CI workflow (:https://stnolting.github.io/neorv32/sw/files.html).
|
4.3. Application Makefile
Application compilation is based on a single, centralized GNU makefiles sw/common/common.mk
. Each project in the
sw/example
folder features a makefile that just includes this central makefile. When creating a new project, copy an existing project folder or
at least the makefile to your new project folder. I suggest to create new projects also in sw/example
to keep
the file dependencies. Of course, these dependencies can be manually configured via makefiles variables
when your project is located somewhere else.
Before you can use the makefiles, you need to install the RISC-V GCC toolchain. Also, you have to add the
installation folder of the compiler to your system’s PATH variable. More information can be found in
User Guide: Software Toolchain Setup.
|
The makefile is invoked by simply executing make in your console:
neorv32/sw/example/blink_led$ make
4.3.1. Targets
Just executing make
(or executing make help
) will show the help menu listing all available targets.
$ make
<<< NEORV32 Application Makefile >>>
Make sure to add the bin folder of RISC-V GCC to your PATH variable.
Targets:
help - show this text
check - check toolchain
info - show makefile/toolchain configuration
exe - compile and generate <neorv32_exe.bin> executable for upload via bootloader
hex - compile and generate <neorv32_exe.hex> executable raw file
image - compile and generate VHDL IMEM boot image (for application) in local folder
install - compile, generate and install VHDL IMEM boot image (for application)
sim - in-console simulation using default/simple testbench and GHDL
all - exe + hex + install
elf_info - show ELF layout info
clean - clean up project
clean_all - clean up project, core libraries and image generator
bl_image - compile and generate VHDL BOOTROM boot image (for bootloader only!) in local folder
bootloader - compile, generate and install VHDL BOOTROM boot image (for bootloader only!)
4.3.2. Configuration
The compilation flow is configured via variables right at the beginning of the central
makefile (sw/common/common.mk
):
The makefile configuration variables can be (re-)defined directly when invoking the makefile. For
example via $ make MARCH=rv32ic clean_all exe . You can also make project-specific definitions
of all variables inside the project’s actual makefile (e.g., sw/example/blink_led/makefile ).
|
# *****************************************************************************
# USER CONFIGURATION
# *****************************************************************************
# User's application sources (*.c, *.cpp, *.s, *.S); add additional files here
APP_SRC ?= $(wildcard ./*.c) $(wildcard ./*.s) $(wildcard ./*.cpp) $(wildcard ./*.S)
# User's application include folders (don't forget the '-I' before each entry)
APP_INC ?= -I .
# User's application include folders - for assembly files only (don't forget the '-I' before each
entry)
ASM_INC ?= -I .
# Optimization
EFFORT ?= -Os
# Compiler toolchain
RISCV_PREFIX ?= riscv32-unknown-elf-
# CPU architecture and ABI
MARCH ?= rv32i
MABI ?= ilp32
# User flags for additional configuration (will be added to compiler flags)
USER_FLAGS ?=
# Relative or absolute path to the NEORV32 home folder
NEORV32_HOME ?= ../../..
# *****************************************************************************
APP_SRC |
The source files of the application ( |
APP_INC |
Include file folders; separated by white spaces; must be defined with |
ASM_INC |
Include file folders that are used only for the assembly source files ( |
EFFORT |
Optimization level, optimize for size ( |
RISCV_PREFIX |
The toolchain prefix to be used; follows the naming convention "architecture-vendor-output-" |
MARCH |
The targeted RISC-V architecture/ISA. Only |
MABI |
The default 32-bit integer ABI. |
USER_FLAGS |
Additional flags that will be forwarded to the compiler tools |
NEORV32_HOME |
Relative or absolute path to the NEORV32 project home folder. Adapt this if the makefile/project is not in the project’s |
COM_PORT |
Default serial port for executable upload to bootloader. |
4.3.3. Default Compiler Flags
The following default compiler flags are used for compiling an application. These flags are defined via the
CC_OPTS
variable. Custom flags can be appended via the USER_FLAGS
variable to the CC_OPTS
variable.
|
Enable all compiler warnings. |
|
Put functions and data segment in independent sections. This allows a code optimization as dead code and unused data can be easily removed. |
|
Do not use the default start code. The makefiles use the NEORV32-specific start-up code instead ( |
|
Make the linker perform dead code elimination. |
|
Include/link with |
|
Search for the standard C library when linking. |
|
Make sure we have no unresolved references to internal GCC library subroutines. |
|
Use built-in software functions for floating-point divisions and square roots (since the according instructions are not supported yet). |
|
Force a 32-bit alignment of functions and labels (branch/jump/call targets). This increases performance as it simplifies instruction fetch when using the C extension. As a drawback this will also slightly increase the program code. |
|
|
|
|
|
4.4. Executable Image Format
In order to generate a file, which can be executed by the processor, all source files have to be compiler, linked and packed into a final executable.
4.4.1. Linker Script
When all the application sources have been compiled, they need to be linked in order to generate a unified
program file. For this purpose the makefile uses the NEORV32-specific linker script sw/common/neorv32.ld
for
linking all object files that were generated during compilation.
The linker script defines three memory sections: rom
, ram
and iodev
. Each section provides specific
access attributes: read access (r
), write access (w
) and executable (x
).
Memory section | Attributes | Description |
---|---|---|
|
|
Data memory address space (processor-internal/external DMEM) |
|
|
Instruction memory address space (processor-internal/external IMEM) or internal bootloader ROM |
|
|
Processor-internal memory-mapped IO/peripheral devices address space |
These sections are defined right at the beginning of the linker script:
neorv32.ld
MEMORY
{
ram (rwx) : ORIGIN = 0x80000000, LENGTH = DEFINED(make_bootloader) ? 512 : 8*1024
rom (rx) : ORIGIN = DEFINED(make_bootloader) ? 0xFFFF0000 : 0x00000000, LENGTH = DEFINED(make_bootloader) ? 32K : 2048M
iodev (rw) : ORIGIN = 0xFFFFFE00, LENGTH = 512
}
Each memory section provides a base address ORIGIN
and a size LENGTH
. The base address and size of the iodev
section is
fixed and must not be altered. The base addresses and sizes of the ram
and rom
regions correspond to the total available instruction
and data memory address space (see section Address Space Layout).
ORIGIN of the ram section has to be always identical to the processor’s dspace_base_c hardware configuration. Additionally,
ORIGIN of the rom section has to be always identical to the processor’s ispace_base_c hardware configuration.
|
The sizes of ram
section has to be equal to the size of the physical available data instruction memory. For example, if the processor
setup only uses processor-internal DMEM (MEM_INT_DMEM_EN = true and no external data memory attached) the LENGTH
parameter of
this memory section has to be equal to the size configured by the MEM_INT_DMEM_SIZE generic.
The sizes of rom
section is a little bit more complicated. The default linker script configuration assumes a maximum of 2GB logical
memory space, which is also the default configuration of the processor’s hardware instruction memory address space. This size does not have
to reflect the actual physical size of the instruction memory (internal IMEM and/or processor-external memory). It just provides a maximum
limit. When uploading new executable via the bootloader, the bootloader itself checks if sufficient physical instruction memory is available.
If a new executable is embedded right into the internal-IMEM the synthesis tool will check, if the configured instruction memory size
is sufficient (e.g., via the MEM_INT_IMEM_SIZE generic).
The rom region uses a conditional assignment (via the make_bootloader symbol) for ORIGIN and LENGTH that is used to place
"normal executable" (i.e. for the IMEM) or "the bootloader image" to their according memories.The ram region also uses a conditional assignment (via the make_bootloader symbol) for LENGTH . When compiling the bootloader
(make_bootloader symbol is set) the generated bootloader will only use the first 512 bytes of the data address space. This is
a fall-back to ensure the bootloader can operate independently of the actual physical data memory size.
|
The linker maps all the regions from the compiled object files into four final sections: .text
, .rodata
, .data
and .bss
.
These four regions contain everything required for the application to run:
Region | Description |
---|---|
|
Executable instructions generated from the start-up code and all application sources. |
|
Constants (like strings) from the application; also the initial data for initialized variables. |
|
This section is required for the address generation of fixed (= global) variables only. |
|
This section is required for the address generation of dynamic memory constructs only. |
The .text
and .rodata
sections are mapped to processor’s instruction memory space and the .data
and
.bss
sections are mapped to the processor’s data memory space. Finally, the .text
, .rodata
and .data
sections are extracted and concatenated into a single file main.bin
.
4.4.2. Executable Image Generator
The main.bin
file is packed by the NEORV32 image generator (sw/image_gen
) to generate the final executable file.
The sources of the image generator are automatically compiled when invoking the makefile. |
The image generator can generate three types of executables, selected by a flag when calling the generator:
|
Generates an executable binary file |
|
Generates a plain ASCII hex-char file |
|
Generates an executable VHDL memory initialization image for the processor-internal IMEM. This option generates the |
|
Generates an executable VHDL memory initialization image for the processor-internal BOOT ROM. This option generates the |
All these options are managed by the makefile. The normal application compilation flow will generate the neorv32_exe.bin
executable to be upload via UART to the NEORV32 bootloader.
The image generator add a small header to the neorv32_exe.bin
executable, which consists of three 32-bit words located right at the
beginning of the file. The first word of the executable is the signature word and is always 0x4788cafe
. Based on this word the bootloader
can identify a valid image file. The next word represents the size in bytes of the actual program
image in bytes. A simple "complement" checksum of the actual program image is given by the third word. This
provides a simple protection against data transmission or storage errors.
4.4.3. Start-Up Code (crt0)
The CPU and also the processor require a minimal start-up and initialization code to bring the CPU (and the SoC)
into a stable and initialized state and to initialize the C runtime environment before the actual application can be executed.
This start-up code is located in sw/common/crt0.S
and is automatically linked every application program
and placed right before the actual application code so it gets executed right after reset.
The crt0.S
start-up performs the following operations:
-
Initialize all integer registers
x1 - x31
(or jsutx1 - x15
when using theE
CPU extension) to a defined value. -
Initialize the global pointer
gp
and the stack pointersp
according to the.data
segment layout provided by the linker script. -
Initialize all CPU core CSRs and also install a default "dummy" trap handler for all traps. This handler catches all traps during the early boot phase.
-
Clear IO area: Write zero to all memory-mapped registers within the IO region (
iodev
section). If certain devices have not been implemented, a bus access fault exception will occur. This exception is captured by the dummy trap handler. -
Clear the
.bss
section defined by the linker script. -
Copy read-only data from the
.text
section to the.data
section to set initialized variables. -
Call the application’s
main
function (with no arguments:argc
=argv
= 0). -
If the
main
function returnscrt0
can call an "after-main handler" (see below) -
If there is no after-main handler or after returning from the after-main handler the processor goes to an endless sleep mode (using a simple loop or via the
wfi
instruction if available).
After-Main Handler
If the application’s main()
function actually returns, an after main handler can be executed. This handler can be a normal function
since the C runtime is still available when executed. If this handler uses any kind of peripheral/IO modules make sure these are
already initialized within the application or you have to initialize them inside the handler.
int __neorv32_crt0_after_main(int32_t return_code);
The function has exactly one argument (return_code
) that provides the return value of the application’s main function.
For instance, this variable contains -1 if the main function returned with return -1;
. The return value of the
__neorv32_crt0_after_main
function is irrelevant as there is no further "software instance" executed afterwards that can check this.
However, the on-chip debugger could still evaluate the return value of the after-main handler.
A simple printf
can be used to inform the user when the application main function return
(this example assumes that UART0 has been already properly configured in the actual application):
int __neorv32_crt0_after_main(int32_t return_code) {
neorv32_uart0_printf("Main returned with code: %i\n", return_code);
return 0;
}
4.5. Bootloader
This section illustrated the default bootloader from the repository. The bootloader can be customized to target application-specific scenarios. See User Guide section Customizing the Internal Bootloader for more information. |
The default NEORV32 bootloader (source code sw/bootloader/bootloader.c
) provides a build-in firmware that
allows to upload new application executables via UART at every time and to optionally store/boot them to/from
an external SPI flash. It features a simple "automatic boot" feature that will try to fetch an executable
from SPI flash if there is no UART user interaction. This allows to build processor setup with
non-volatile application storage, which can be updated at any time.
The bootloader is only implemented if the INT_BOOTLOADER_EN generic is true. This will select the Indirect Boot boot configuration.
Hardware requirements of the default NEORV32 bootloader
REQUIRED: The bootloader requires the CSR access CPU extension (CPU_EXTENSION_RISCV_Zicsr generic is true)
and at least 512 bytes of data memory (processor-internal DMEM or external DMEM).RECOMMENDED: For user interaction via UART (like uploading executables) the primary UART (UART0) has to be implemented (IO_UART0_EN generic is true). Without UART the bootloader does not make much sense. However, auto-boot via SPI is still supported but the bootloader should be customized (see User Guide) for this purpose. OPTIONAL: The default bootloader uses bit 0 of the GPIO output port as "heart beat" and status LED if the GPIO controller is implemented (IO_GPIO_EN generic is true). OPTIONAL: The MTIME machine timer (IO_MTIME_EN generic is true) and the SPI controller (IO_SPI_EN generic is true) are required in order to use the bootloader’s auto-boot feature (automatic boot from external SPI flash if there is no user interaction via UART). |
To interact with the bootloader, connect the primary UART (UART0) signals (uart0_txd_o
and
uart0_rxd_o
) of the processor’s top entity via a serial port (-adapter) to your computer (hardware flow control is
not used so the according interface signals can be ignored.), configure your
terminal program using the following settings and perform a reset of the processor.
Terminal console settings (19200-8-N-1
):
-
19200 Baud
-
8 data bits
-
no parity bit
-
1 stop bit
-
newline on
\r\n
(carriage return, newline) -
no transfer protocol / control flow protocol - just the raw byte stuff
The bootloader uses the LSB of the top entity’s gpio_o
output port as high-active status LED (all other
output pin are set to low level by the bootloader). After reset, this LED will start blinking at ~2Hz and the
following intro screen should show up in your terminal:
<< NEORV32 Bootloader >>
BLDV: Mar 23 2021
HWV: 0x01050208
CLK: 0x05F5E100
MISA: 0x40901105
CPU: 0x00000023
SOC: 0x0EFF0037
IMEM: 0x00004000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000
Autoboot in 8s. Press key to abort.
This start-up screen also gives some brief information about the bootloader and several system configuration parameters:
|
Bootloader version (built date). |
|
Processor hardware version (from the |
|
Processor clock speed in Hz (via the SYSINFO module, from the CLOCK_FREQUENCY generic). |
|
CPU extensions (from the |
|
CPU sub-extensions (via the |
|
Processor configuration (via the |
|
IMEM memory base address and size in byte (from the MEM_INT_IMEM_SIZE generic). |
|
DMEM memory base address and size in byte (from the MEM_INT_DMEM_SIZE generic). |
Now you have 8 seconds to press any key. Otherwise, the bootloader starts the auto boot sequence. When you press any key within the 8 seconds, the actual bootloader user console starts:
<< NEORV32 Bootloader >>
BLDV: Mar 23 2021
HWV: 0x01050208
CLK: 0x05F5E100
USER: 0x10000DE0
MISA: 0x40901105
CPU: 0x00000023
SOC: 0x0EFF0037
IMEM: 0x00004000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000
Autoboot in 8s. Press key to abort.
Aborted.
Available commands:
h: Help
r: Restart
u: Upload
s: Store to flash
l: Load from flash
e: Execute
CMD:>
The auto-boot countdown is stopped and now you can enter a command from the list to perform the corresponding operation:
-
h
: Show the help text (again) -
r
: Restart the bootloader and the auto-boot sequence -
u
: Upload new program executable (neorv32_exe.bin
) via UART into the instruction memory -
s
: Store executable to SPI flash atspi_csn_o(0)
-
l
: Load executable from SPI flash atspi_csn_o(0)
-
e
: Start the application, which is currently stored in the instruction memory (IMEM)
A new executable can be uploaded via UART by executing the u
command. After that, the executable can be directly
executed via the e
command. To store the recently uploaded executable to an attached SPI flash press s
. To
directly load an executable from the SPI flash press l
. The bootloader and the auto-boot sequence can be
manually restarted via the r
command.
The CPU is in machine level privilege mode after reset. When the bootloader boots an application, this application is also started in machine level privilege mode. |
For detailed information on using an SPI flash for application storage see User Guide section Programming an External SPI Flash via the Bootloader. |
4.5.1. Auto Boot Sequence
When you reset the NEORV32 processor, the bootloader waits 8 seconds for a UART console input before it
starts the automatic boot sequence. This sequence tries to fetch a valid boot image from the external SPI
flash, connected to SPI chip select spi_csn_o(0)
. If a valid boot image is found that can be successfully
transferred into the instruction memory, it is automatically started. If no SPI flash is detected or if there
is no valid boot image found, and error code will be shown.
4.5.2. Bootloader Error Codes
If something goes wrong during bootloader operation, an error code is shown. In this case the processor stalls, a bell command and one of the following error codes are send to the terminal, the bootloader status LED is permanently activated and the system must be manually reset.
|
If you try to transfer an invalid executable (via UART or from the external SPI flash), this error message shows up. There might be a transfer protocol configuration error in the terminal program. Also, if no SPI flash was found during an auto-boot attempt, this message will be displayed. |
|
Your program is way too big for the internal processor’s instructions memory. Increase the memory size or reduce your application code. |
|
This indicates a checksum error. Something went wrong during the transfer of the program image (upload via UART or loading from the external SPI flash). If the error was caused by a UART upload, just try it again. When the error was generated during a flash access, the stored image might be corrupted. |
|
This error occurs if the attached SPI flash cannot be accessed. Make sure you have the right type of flash and that it is properly connected to the NEORV32 SPI port using chip select #0. |
|
The bootloader encountered an exception during operation. This might be caused when it tries to access peripherals that were not implemented during synthesis. Example: executing |
4.6. NEORV32 Runtime Environment
The NEORV32 provides a minimal runtime environment (RTE) that takes care of a stable and safe execution environment by handling all traps (including interrupts).
Using the RTE is optional. The RTE provides a simple and comfortable way of delegating traps while making sure that all traps (even though they are not explicitly used by the application) are handled correctly. Performance-optimized applications or embedded operating systems should not use the RTE for delegating traps. |
When execution enters the application’s main
function, the actual runtime environment is responsible for catching all implemented exceptions
and interrupts. To activate the NEORV32 RTE execute the following function:
void neorv32_rte_setup(void);
This setup initializes the mtvec
CSR, which provides the base entry point for all trap
handlers. The address stored to this register reflects the first-level exception handler provided by the
NEORV32 RTE. Whenever an exception or interrupt is triggered, this first-level handler is called.
The first-level handler performs a complete context save, analyzes the source of the exception/interrupt and calls the according second-level exception handler, which actually takes care of the exception/interrupt handling. For this, the RTE manages a private look-up table to store the addresses of the according trap handlers.
After the initial setup of the RTE, each entry in the trap handler’s look-up table is initialized with a debug handler, that outputs detailed hardware information via the primary UART (UART0) when triggered. This is intended as a fall-back for debugging or for accidentally-triggered exceptions/interrupts. For instance, an illegal instruction exception caught by the RTE debug handler might look like this in the UART0 output:
<RTE> Illegal instruction @0x000002d6, MTVAL=0x00001537 </RTE>
To install the actual application’s trap handlers the NEORV32 RTE provides functions for installing and un-installing trap handler for each implemented exception/interrupt source.
int neorv32_rte_exception_install(uint8_t id, void (*handler)(void));
ID name [C] | Description / trap causing entry |
---|---|
|
instruction address misaligned |
|
instruction (bus) access fault |
|
illegal instruction |
|
breakpoint ( |
|
load address misaligned |
|
load (bus) access fault |
|
store address misaligned |
|
store (bus) access fault |
|
environment call from machine mode ( |
|
environment call from user mode ( |
|
machine timer interrupt |
|
machine external interrupt |
|
machine software interrupt |
|
fast interrupt channel 0..15 |
When installing a custom handler function for any of these exception/interrupts, make sure the function uses no attributes (especially no interrupt attribute!), has no arguments and no return value like in the following example:
void handler_xyz(void) {
// handle exception/interrupt...
}
Do NOT use the interrupt attribute for the application exception handler functions! This
will place an mret instruction to the end of it making it impossible to return to the first-level
exception handler of the RTE, which will cause stack corruption.
|
Example: Installation of the MTIME interrupt handler:
neorv32_rte_exception_install(EXC_MTI, handler_xyz);
To remove a previously installed exception handler call the according un-install function from the NEORV32 runtime environment. This will replace the previously installed handler by the initial debug handler, so even un-installed exceptions and interrupts are further captured.
int neorv32_rte_exception_uninstall(uint8_t id);
Example: Removing the MTIME interrupt handler:
neorv32_rte_exception_uninstall(EXC_MTI);
More information regarding the NEORV32 runtime environment can be found in the doxygen software documentation (also available online at GitHub pages). |
5. On-Chip Debugger (OCD)
The NEORV32 Processor features an on-chip debugger (OCD) implementing execution-based debugging that is compatible
to the Minimal RISC-V Debug Specification Version 0.13.2.
Please refer to this spec for in-deep information.
A copy of the specification is available in docs/references/riscv-debug-release.pdf
.
The NEORV32 OCD provides the following key features:
-
JTAG test access port
-
run-control of the CPU: halting, single-stepping and resuming
-
executing arbitrary programs during debugging
-
accessing core registers (direct access to GPRs, indirect access to CSRs via program buffer)
-
indirect access to the whole processor address space (via program buffer))
-
compatible to the RISC-V port of OpenOCD; pre-built binaries can be obtained for example from SiFive
OCD Security Note
Access via the OCD is always authenticated (dmstatus.authenticated == 1 ). Hence, the
whole system can always be accessed via the on-chip debugger. Currently, there is no option
to disable the OCD via software. The OCD can only be disabled by disabling implementation
(setting ON_CHIP_DEBUGGER_EN generic to false).
|
The OCD requires additional resources for implementation and might also increase the critical path resulting in less performance. If the OCD is not really required for the final implementation, it can be disabled and thus, discarded from implementation. In this case all circuitry of the debugger is completely removed (no impact on area, energy or timing at all). |
A simple example on how to use NEORV32 on-chip debugger in combination with OpenOCD and gdb
is shown in section Debugging using the On-Chip Debugger
of the User Guide.
|
The NEORV32 on-chip debugger complex is based on three hardware modules:

-
Debug Transport Module (DTM) (
rtl/core/neorv32_debug_dtm.vhd
): External JTAG access tap to allow an external adapter to interface with the debug module(DM) using the debug module interface (dmi). -
Debug Module (DM) (
rtl/core/neorv32_debug_tm.vhd
): Debugger control unit that is configured by the DTM via the the dmi. Form the CPU’s "point of view" this module behaves as a memory-mapped "peripheral" that can be accessed via the processor-internal bus. The memory-mapped registers provide an internal data buffer for data transfer from/to the DM, a code ROM containing the "park loop" code, a program buffer to allow the debugger to execute small programs defined by the DM and a status register that is used to communicate halt, resume and execute requests/acknowledges from/to the DM. -
CPU CPU Debug Mode extension (part of`rtl/core/neorv32_cpu_control.vhd`): This extension provides the "debug execution mode" which executes the "park loop" code from the DM. The mode also provides additional CSRs.
Theory of Operation
When debugging the system using the OCD, the debugger issues a halt request to the CPU (via the CPU’s
db_halt_req_i
signal) to make the CPU enter debug mode. In this state, the application-defined architectural
state of the system/CPU is "frozen" so the debugger can monitor and even modify it.
While in debug mode, the CPU executes the "park loop" code from the code ROM of the DM.
This park loop implements an endless loop, in which the CPU polls the memory-mapped status register that is
controlled by the debug module (DM). The flags of these register are used to communicate requests from
the DM and to acknowledge them by the CPU: trigger execution of the program buffer or resume the halted
application.
5.1. Debug Transport Module (DTM)
The debug transport module (VHDL module: rtl/core/neorv32_debug_dtm.vhd
) provides a JTAG test access port (TAP).
The DTM is the first entity in the debug system, which connects and external debugger via JTAG to the next debugging
entity: the debug module (DM).
External JTAG access is provided by the following top-level ports.
Name | Width | Direction | Description |
---|---|---|---|
|
1 |
in |
TAP reset (low-active); this signal is optional, make sure to pull it high if it is not used |
|
1 |
in |
serial clock |
|
1 |
in |
serial data input |
|
1 |
out |
serial data output |
|
1 |
in |
mode select |
JTAG Clock
The actual JTAG clock signal is not used as primary clock. Instead it is used to synchronize
JTGA accesses, while all internal operations trigger on the system clock. Hence, no additional clock domain is required
for integration of this module.
However, this constraints the maximal JTAG clock (jtag_tck_i ) frequency to be less than or equal to
1/4 of the system clock (clk_i ) frequency.
|
If the on-chip debugger is disabled (ON_CHIP_DEBUGGER_EN = false) the JTAG serial input jtag_tdi_i is directly
connected to the JTAG serial output jtag_tdo_o to maintain the JTAG chain.
|
The NEORV32 JTAG TAP does not provide a boundary check function (yet?). Hence, physical device pins cannot be accessed. |
The DTM uses the "debug module interface (dmi)" to access the actual debug module (DM).
These accesses are controlled by TAP-internal registers.
Each registers is selected by the JTAG instruction register (IR
) and accessed through the JTAG data register (DR
).
The DTM’s instruction and data registers can be accessed using OpenOCDs irscan and drscan commands.
The RISC-V port of OpenOCD also provides low-level command (riscv dmi_read & riscv dmi_write ) to access the dmi
debug module interface.
|
JTAG access is conducted via the instruction register IR
, which is 5 bit wide, and several data registers DR
with different sizes.
The data registers are accessed by writing the according address to the instruction register.
The following table shows the available data registers:
Address (via IR ) |
Name | Size [bits] | Description |
---|---|---|---|
|
|
32 |
identifier, default: |
|
|
32 |
debug transport module control and status register |
|
|
41 |
debug module interface (dmi); 7-bit address, 32-bit read/write data, 2-bit operation ( |
others |
|
1 |
default JTAG bypass register |
See the RISC-V debug specification for more information regarding the data
registers and operations.
A local copy can be found in docs/references
.
5.2. Debug Module (DM)
According to the RISC-V debug specification, the DM (VHDL module: rtl/core/neorv32_debug_dm.vhd
)
acts as a translation interface between abstract operations issued by the debugger and the platform-specific
debugger implementation. It supports the following features (excerpt from the debug spec):
-
Gives the debugger necessary information about the implementation.
-
Allows the hart to be halted and resumed and provides status of the current state.
-
Provides abstract read and write access to the halted hart’s GPRs.
-
Provides access to a reset signal that allows debugging from the very first instruction after reset.
-
Provides a mechanism to allow debugging the hart immediately out of reset. (still experimental)
-
Provides a Program Buffer to force the hart to execute arbitrary instructions.
-
Allows memory access from a hart’s point of view.
The NEORV32 DM follows the "Minimal RISC-V External Debug Specification" to provide full debugging capabilities while keeping resource (area) requirements at a minimum level. It implements the execution based debugging scheme for a single hart and provides the following hardware features:
-
program buffer with 2 entries and implicit
ebreak
instruction afterwards -
no direct bus access (indirect bus access via the CPU)
-
abstract commands: "access register" plus auto-execution
-
no dedicated halt-on-reset capabilities yet (but can be emulated)
The DM provides two "sides of access": access from the DTM via the debug module interface (dmi) and access from the CPU via the processor-internal bus. From the DTM’s point of view, the DM implements a set of DM Registers that are used to control and monitor the actual debugging. From the CPU’s point of view, the DM implements several memory-mapped registers (within the normal address space) that are used for communicating debugging control and status (DM CPU Access).
5.2.1. DM Registers
The DM is controlled via a set of registers that are accessed via the DTM’s dmi. The "Minimal RISC-V Debug Specification" requires only a subset of the registers specified in the spec. The following registers are implemented. Write accesses to any other registers are ignored and read accesses will always return zero. Register names that are encapsulated in "( )" are not actually implemented; however, they are listed to explicitly show their functionality.
Address | Name | Description |
---|---|---|
|
|
Abstract data 0, used for data transfer between debugger and processor |
|
|
Debug module control |
|
|
Debug module status |
|
|
Hart information |
|
|
Abstract control and status |
|
|
Abstract command |
|
|
Abstract command auto-execution |
|
( |
Base address of next DM; read as zero to indicate there is only one DM |
|
|
Program buffer 0 |
|
|
Program buffer 1 |
|
( |
System bus access control and status; read as zero to indicate there is no direct system bus access |
|
|
Halt summary 0 |
data
0x04 |
Abstract data 0 |
|
Reset value: UNDEFINED |
||
Basic read/write registers to be used with abstract command (for example to read/write data from/to CPU GPRs). |
dmcontrol
0x10 |
Debug module control register |
|
Reset value: 0x00000000 |
||
Control of the overall debug module and the hart. The following table shows all implemented bits. All remaining bits/bit-fields are configures as "zero" and are read-only. Writing '1' to these bits/fields will be ignored. |
Bit | Name [RISC-V] | R/W | Description |
---|---|---|---|
31 |
|
-/w |
set/clear hart halt request |
30 |
|
-/w |
request hart to resume |
28 |
|
-/w |
write |
1 |
|
r/w |
put whole processor into reset when |
0 |
|
r/w |
DM enable; writing |
dmstatus
0x11 |
Debug module status register |
|
Reset value: 0x00000000 |
||
Current status of the overall debug module and the hart. The entire register is read-only. |
Bit | Name [RISC-V] | Description |
---|---|---|
31:23 |
reserved |
reserved; always zero |
22 |
|
always |
21:20 |
reserved |
reserved; always zero |
19 |
|
|
18 |
|
|
17 |
|
|
16 |
|
|
15 |
|
always zero to indicate the hart is always existent |
14 |
|
|
13 |
|
|
12 |
|
|
11 |
|
|
10 |
|
|
9 |
|
|
8 |
|
|
7 |
|
always |
6 |
|
always |
5 |
|
always |
4 |
|
always |
3:0 |
|
|
hartinfo
0x12 |
Hart information |
|
Reset value: see below |
||
This register gives information about the hart. The entire register is read-only. |
Bit | Name [RISC-V] | Description |
---|---|---|
31:24 |
reserved |
reserved; always zero |
23:20 |
|
|
19:17 |
reserved |
reserved; always zero |
16 |
|
|
15:12 |
|
|
11:0 |
|
= |
abstracts
0x16 |
Abstract control and status |
|
Reset value: see below |
||
Command execution info and status. |
Bit | Name [RISC-V] | R/W | Description |
---|---|---|---|
31:29 |
reserved |
r/- |
reserved; always zero |
28:24 |
|
r/- |
|
23:11 |
reserved |
r/- |
reserved; always zero |
12 |
|
r/- |
|
11 |
reserved |
r/- |
reserved; always zero |
10:8 |
|
r/w |
error during command execution (see below); has to be cleared by writing |
7:4 |
reserved |
r/- |
reserved; always zero |
3:0 |
|
r/- |
|
Error codes in cmderr
(highest priority first):
-
000
- no error -
100
- command cannot be executed since hart is not in expected state -
011
- exception during command execution -
010
- unsupported command -
001
- invalid DM register read/write while command is/was executing
command
0x17 |
Abstract command |
|
Reset value: 0x00000000 |
||
Writing this register will trigger the execution of an abstract command. New command can only be executed if
|
The NEORV32 DM only supports Access Register abstract commands. These commands can only access the
hart’s GPRs (abstract command register index 0x1000 - 0x101f ).
|
Bit | Name [RISC-V] | R/W |
---|---|---|
Description / required value |
31:24 |
|
-/w |
|
23 |
reserved |
-/w |
reserved, has to be |
22:20 |
|
-/w |
|
21 |
|
-/w |
|
18 |
|
-/w |
if set the program buffer is executed after the command |
17 |
|
-/w |
if set the operation in |
16 |
|
-/w |
|
15:0 |
|
-/w |
GPR-access only; has to be |
abstractauto
0x18 |
Abstract command auto-execution |
|
Reset value: 0x00000000s |
||
Register to configure when a read/write access to a DM repeats execution of the last abstract command. |
Bit | Name [RISC-V] | R/W | Description |
---|---|---|---|
17 |
|
r/w |
when set reading/writing from/to |
16 |
|
r/w |
when set reading/writing from/to |
0 |
|
r/w |
when set reading/writing from/to |
progbuf
0x20 |
Program buffer 0 |
|
0x21 |
Program buffer 1 |
|
Reset value: |
||
General purpose program buffer for the DM. |
haltsum0
0x40 |
Halt summary 0 |
|
Reset value: UNDEFINED |
||
Bit 0 of this register is set if the hart is halted (all remaining bits are always zero). The entire register is read-only. |
5.2.2. DM CPU Access
From the CPU’s point of view, the DM behaves as a memory-mapped peripheral that includes
-
a small ROM that contains the code for the "park loop", which is executed when the CPU is in debug mode.
-
a program buffer populated by the debugger host to execute small programs
-
a data buffer to transfer data between the processor and the debugger host
-
a status register to communicate debugging requests
Park Loop Code Sources
The assembly sources of the park loop code are available in sw/ocd-firmware/park_loop.S . Please note, that these
sources are not intended to be changed by the used. Hence, the makefile does not provide an automatic option
to compile and "install" the debugger ROM code into the HDL sources and require a manual copy
(see sw/ocd-firmware/README.md ).
|
The DM uses a total address space of 128 words of the CPU’s address space (= 512 bytes) divided into four sections of 32 words (= 128 bytes) each. Please note, that the program buffer, the data buffer and the status register only uses a few effective words in this address space. However, these effective addresses are mirrored to fill up the whole 128 bytes of the section. Hence, any CPU access within this address space will succeed.
Base address | Name [VHDL package] | Actual size | Description |
---|---|---|---|
|
|
128 bytes |
Code ROM for the "park loop" code |
|
|
16 bytes |
Program buffer, provided by DM |
|
|
4 bytes |
Data buffer ( |
|
|
4 bytes |
Control and status register |
From the CPU’s point of view, the DM is mapped to an "unused" address range within the processor’s
Address Space right between the bootloader ROM (BOOTROM) and the actual processor-internal IO
space at addresses 0xfffff800 - 0xfffff9ff
|
When the CPU enters or re-enters (for example via ebreak
in the DM’s program buffer) debug mode, it jumps to
the beginning of the DM’s "park loop" code ROM at dm_code_base_c
. This is the normal entry point for the
park loop code. If an exception is encountered during debug mode, the CPU jumps to dm_code_base_c + 4
,
which is the exception entry point.
Status Register
The status register provides a direct communication channel between the CPU executing the park loop and the host-controlled controller of the DM. Note that all bits that can be written by the CPU (acknowledge flags) cause a single-shot (1-cycle) signal to the DM controller and auto-clear (always read as zero). The bits that are driven by the DM controller and are read-only to the CPU and keep their state until the CPU acknowledges the according request.
Bit | Name | CPU access | Description |
---|---|---|---|
0 |
|
-/w |
Set by the CPU to indicate that the CPU is halted and keeps iterating in the park loop |
1 |
|
r/- |
Set by the DM to tell the CPU to resume normal operation (leave parking loop and leave debug mode via |
2 |
|
-/w |
Set by the CPU to acknowledge that the CPU is now going to leave parking loop & debug mode |
3 |
|
r/- |
Set by the DM to tell the CPU to leave debug mode and execute the instructions from the program buffer; CPU will re-enter parking loop afterwards |
4 |
|
-/w |
Set by the CPU to acknowledge that the CPU is now going to execute the program buffer |
5 |
|
-/w |
Set by the CPU to inform the DM that an exception occurred during execution of the park loop or during execution of the program buffer |
5.3. CPU Debug Mode
The NEORV32 CPU Debug Mode DB
(part of rtl/core/neorv32_cpu_control.vhd
) is compatible to the "Minimal RISC-V Debug Specification 0.13.2".
It is enabled/implemented by setting the CPU generic CPU_EXTENSION_RISCV_DEBUG to "true" (done by setting processor
generic ON_CHIP_DEBUGGER_EN).
It provides a new operation mode called "debug mode".
When enabled, three additional CSRs are available (section CPU Debug Mode CSRs) and also the "return from debug mode"
instruction dret
is available when the CPU is "in" debug mode.
The CPU debug mode requires the Zicsr and Zifencei CPU extension to be implemented (top generics CPU_EXTENSION_RISCV_Zicsr
and CPU_EXTENSION_RISCV_Zifencei = true).
|
Hardware Watchpoints and Breakpoints
The NEORV32 CPU debug mode does not provide a hardware "trigger module" (which is optional in the RISC-V debug spec). However, gdb
provides a native emulation for code (breakpoints using break instruction) and data (polling data watchpoints in automated
single-stepping) triggers.
|
The CPU debug-mode is entered when one of the following events appear:
-
executing
ebreak
instruction (whendcsr.ebreakm
is set and in machine mode OR whendcsr.ebreaku
is set and in user mode) -
debug halt request from external DM (via CPU signal
db_halt_req_i
, high-active, triggering on rising-edge) -
finished executing of a single instruction while in single-step debugging mode (enabled via
dcsr.step
)
From a hardware point of view, these "entry conditions" are special synchronous (ebreak
instruction) or asynchronous
(single-stepping "interrupt"; halt request "interrupt") traps, that are handled invisibly by the control logic.
WFI instruction
The wait-for-interrupt instruction wfi puts the CPU into sleep mode. The CPU will resume normale operation
when at least one interrupt source becomes pending (= at least one bit in mip CSR is set).
However, the CPU will also resume from sleep mode if there is a halt request from the debug module (DM).
|
Whenever the CPU enters debug-mode it performs the following operations:
-
move
pc
todpcs
-
copy the hart’s current privilege level to
dcsr.prv
-
set
dcrs.cause
according to the cause why debug mode is entered -
no update of
mtval
,mcause
,mtval
andmstatus
CSRs -
load the address configured via the CPU CPU_DEBUG_ADDR generic to the
pc
to jump to "debugger park loop" code in the debug module (DM)
When the CPU is in debug-mode the following things are important:
-
while in debug mode, the CPU executes the parking loop and the program buffer provided by the DM if requested
-
effective CPU privilege level is
machine
mode, any PMP configuration is bypassed -
the
wfi
instruction acts as anop
(also during single-stepping) -
if an exception occurs:
-
if the exception was caused by any debug-mode entry action the CPU jumps to the normal entry point (= CPU_DEBUG_ADDR) of the park loop again (for example when executing
ebreak
in debug-mode) -
for all other exception sources the CPU jumps to the exception entry point ( = CPU_DEBUG_ADDR + 4) to signal an exception to the DM and restarts the park loop again afterwards
-
-
interrupts are disabled; however, they will remain pending and will get executed after the CPU has left debug mode
-
if the DM makes a resume request, the park loop exits and the CPU leaves debug mode (executing
dret
)
Debug mode is left either by executing the dret
instruction [16] (in debug mode) or by performing
a hardware reset of the CPU. Executing dret
outside of debug mode will raise an illegal instruction exception.
Whenever the CPU leaves debug mode the following things happen:
-
set the hart’s current privilege level according to
dcsr.prv
-
restore
pc
fromdpcs
-
resume normal operation at
pc
5.3.1. CPU Debug Mode CSRs
Two additional CSRs are required by the Minimal RISC-V Debug Specification: The debug mode control and status register
dcsr
and the program counter dpc
. Providing a general purpose scratch register for debug mode (dscratch0
) allows
faster execution of program provided by the debugger, since one general purpose register can be backup-ed and
directly used.
The debug-mode control and status registers (CSRs) are only accessible when the CPU is in debug mode.
If these CSRs are accessed outside of debug mode (for example when in machine mode) an illegal instruction exception
is raised.
|
dcsr
0x7b0 |
Debug control and status register |
|
Reset value: 0x00000000 |
||
The |
Bit | Name [RISC-V] | R/W | Event |
---|---|---|---|
31:28 |
|
r/- |
always |
27:16 |
- |
r/- |
reserved, read as zero |
15 |
|
r/w |
|
14 |
|
r/- |
|
13 |
|
r/- |
|
12 |
|
r/w |
|
11 |
|
r/- |
|
10 |
|
r/- |
|
9 |
|
r/- |
|
8:6 |
|
r/- |
cause identifier - why debug mode was entered |
5 |
- |
r/- |
reserved, read as zero |
4 |
|
r/- |
|
3 |
|
r/- |
|
2 |
|
r/w |
enable single-stepping when set |
1:0 |
|
r/w |
CPU privilege level before/after debug mode |
dpc
0x7b1 |
Debug program counter |
|
Reset value: UNDEFINED |
||
The |
dscratch0
0x7b2 |
Debug scratch register 0 |
|
Reset value: UNDEFINED |
||
The |
6. Legal
License
BSD 3-Clause License
Copyright (c) 2021, Stephan Nolting. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
-
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
-
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
-
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
The NEORV32 RISC-V Processor
Copyright (c) 2021, by Dipl.-Ing. Stephan Nolting. All rights reserved.
HQ: https://github.com/stnolting/neorv32
Contact: stnolting@gmail.com
made in Hanover, Germany
Proprietary Notice
-
"GitHub" is a Subsidiary of Microsoft Corporation.
-
"Vivado" and "Artix" are trademarks of Xilinx Inc.
-
"AXI", "AXI4-Lite" and "AXI4-Stream" are trademarks of Arm Holdings plc.
-
"ModelSim" is a trademark of Mentor Graphics – A Siemens Business.
-
"Quartus Prime" and "Cyclone" are trademarks of Intel Corporation.
-
"iCE40", "UltraPlus" and "Radiant" are trademarks of Lattice Semiconductor Corporation.
-
"Windows" is a trademark of Microsoft Corporation.
-
"Tera Term" copyright by T. Teranishi.
-
Timing diagrams made with WaveDrom Editor.
-
"NeoPixel" is a trademark of Adafruit Industries.
-
Documentation made with
asciidoctor
.
PDF icons from https://www.flaticon.com and made by Freepik, Good Ware, Pixel perfect, Vectors Market
Disclaimer
This project is released under the BSD 3-Clause license. No copyright infringement intended. Other implied or used projects might have different licensing – see their documentation to get more information.
Limitation of Liability for External Links
This document contains links to the websites of third parties ("external links"). As the content of these websites is not under our control, we cannot assume any liability for such external content. In all cases, the provider of information of the linked websites is liable for the content and accuracy of the information provided. At the point in time when the links were placed, no infringements of the law were recognizable to us. As soon as an infringement of the law becomes known to us, we will immediately remove the link in question.
Citing
If you are using the NEORV32 or parts of the project in some kind of publication, please cite it as follows:
Contributors ❤️
Please add as many contributors as possible to the authors field 😉.This project would not be where it is without them. Full names can be found in the repository’s .mailmap .
|
@misc{nolting20,
author = {Nolting, S.},
title = {The NEORV32 RISC-V Processor},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/stnolting/neorv32}}
}
DOI
This project also provides a digital object identifier provided by zenodo:
|
Acknowledgments
A big shoutout to all contributors, who helped improving this project! ❤️
RISC-V - instruction sets want to be free!
Impressum (Imprint)
See docs/impressum.md
.