ZPU: Softcore implementation on a Spartan-3 FPGA



This project describes the implementation of an open-source softcore CPU on the Xilinx Spartan-3 FPGA board and the connection of peripheral devices via their bus interfaces.

Article in German: http://www.mikrocontroller.net/articles/ZPU:_Softcore_Implementierung_auf_Spartan-3_FPGA

Forum discussion in German: http://www.mikrocontroller.net/topic/212445



  • VHDL
  • Development environment: Xilinx ISE WebPack 10.1


  • ZPU of Øyvind Harboe (opencores.org)
  • stack based
  • 32 Bit
  • 8 Bit opcode
  • small, easy to use, clearly represented
  • programmable in C / C++ using GCC-Toolchain (ZPUGCC)
  • integrated program memory (BlockRAM)


  • Wishbone-Bus
  • 32 Bit

Peripheral devices:

  • Interrupt-Controller
  • UART (8N1)
  • SRAM
  • VGA
  • PS/2
  • Onboard-I/Os


For the project I used the Xilinx Spartan-3 starter board. This is a beginner board (see figure 1) with an XC3S200 FPGA (1). This chip is not the best choice for a softcore processor with peripheral devices because it has only 12 BlockRAMs (equivalent to 24 KB).

The most important components for the project are: a PS/2 interface (2), a VGA interface (3), an RS-232 interface (4), 8 switches (5), 8 LEDs (6), 4 buttons (7), 4 seven-segment displays (8) and the SRAM on the back side of the board. The FPGA on the board can be programmed with a JTAG adapter. Suitable software, the Xilinx development environment, can be downloaded for free from the Xilinx website. Version 10.1 was used for this project.

A free softcore CPU was chosen for the project: the ZPU by Øyvind Harboe. It is a small, stack-based 32 bit CPU with an 8 bit opcode. Its simple and clear structure makes it well suited to the modifications intended here. The open-source CPU is provided as a VHDL project. A very useful feature is that software written in C or C++ can be compiled to ZPU machine code with a special GCC toolchain (ZPUGCC), so the ZPU can run programs written in C. The small version of the ZPU was used for this project.

The following article is a tutorial for all those who want to implement the ZPU on an FPGA.

The tutorial contains the following steps:

  • Download of all source codes and documentations
  • Creating a project in the development environment Xilinx ISE
  • Implementation of the softcore CPU on the FPGA
  • Implementation of suitable peripheral devices and their connection to the CPU by using their bus interfaces
  • Creating an executable program to show the functionality of the whole system

In addition to the project there is detailed documentation (available only in German) that can be downloaded together with the project.


Structure and functional principle of the ZPU

The ZPU works like a state machine that mainly runs through the phases fetch, decode and execute:

  • Fetch: In this state the ZPU loads a complete 32 bit value from the program memory using the program counter. The 32 bit value is addressed by the upper 30 bits of the program counter. Note that the program counter is 32 bits wide, but the actual address width of the program memory depends on its size (see below), so the full width of the program counter may not be usable.
  • Decode: The lowest two bits of the program counter select the 8 bit opcode from the 32 bit value read from the program memory. These 8 bits are decoded. In effect the program counter always addresses a single byte.
  • Execute: In the execution cycle the ZPU jumps into different states depending on the decoded opcode to execute the instruction. Furthermore the program counter is incremented by 1.
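
The address split used in the fetch and decode phases can be illustrated with a small host-side C model. This is only a sketch: the names split_pc and select_opcode are hypothetical and do not appear in the VHDL sources, and the byte order (byte 0 is the most significant byte of the word) is an assumption based on the ZPU being a big-endian design.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting a ZPU program counter value. */
typedef struct {
    uint32_t word_index; /* index of the 32 bit word in program memory */
    unsigned byte_lane;  /* which of the 4 bytes holds the opcode (0..3) */
} pc_split_t;

/* mem_addr_bits is a hypothetical parameter: the address width of the
 * program memory in bits (14 for a 16 KB memory). PC bits above it are
 * ignored, as described for the fetch phase. */
pc_split_t split_pc(uint32_t pc, unsigned mem_addr_bits)
{
    assert(mem_addr_bits >= 2 && mem_addr_bits <= 32);
    uint32_t mask = (mem_addr_bits == 32) ? 0xFFFFFFFFu
                                          : ((1u << mem_addr_bits) - 1u);
    pc_split_t s;
    s.word_index = (pc & mask) >> 2; /* upper bits address the 32 bit word */
    s.byte_lane  = pc & 3u;          /* lowest two bits select the opcode  */
    return s;
}

/* Assumption: big-endian order, lane 0 is the most significant byte. */
uint8_t select_opcode(uint32_t word, unsigned byte_lane)
{
    assert(byte_lane < 4);
    return (uint8_t)(word >> (8u * (3u - byte_lane)));
}
```

For a program counter value of 13 and a 16 KB memory, split_pc returns word 3 and byte lane 1, so the opcode is the second byte of the fourth memory word.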

The Fetch state is followed by a Fetch-Next state. That means at least 4 clock cycles are necessary to execute a single instruction. With the board clock of 50 MHz, the ZPU therefore executes at most 12.5 million instructions per second.

The ZPU has a BlockRAM unit that is used as 32 bit program memory. It is configured as a dual-port RAM: both ports can read from or write to the memory at the same time, but the two ports must not write to the same address simultaneously. The RAM also contains the stack. Currently the memory has a size of 16 KB.

In the VHDL source code the ZPU is structured as an entity that integrates the ZPU core and exposes a Wishbone interface to the outside. The entity of the ZPU core implements the dual-port RAM.


The Wishbone bus is an open-source hardware computer bus that lets the different units of an integrated circuit communicate with each other. It is a logical bus, which means that it does not define electrical characteristics. Instead the specification defines signals, clock cycles and high and low levels. This makes it easy to use in VHDL. All signals are synchronous to the clock.

For this project a 32 bit version of the Wishbone bus is used (32 bit address bus and 32 bit data bus). The topology is a shared bus: all devices are connected to the same address and data bus and there is only one master (the ZPU). When more than one slave is used, the current slave is selected by the address on the address bus. All peripheral devices are slaves in this project. An example of a connection between the master and a slave is shown in figure 2. The SysCon unit contains the clock generation and a connection to the reset button. The signals are explained in table 1. Examples of a Wishbone output interface and a Wishbone input interface are shown in figures 3 and 4. In this project the interfaces listed at the beginning of this article have been implemented.


Many free peripheral devices with a Wishbone interface are available for download at www.opencores.org. In most cases the interface has to be adjusted. It makes sense to use a device that is already known to work, i.e. one that has been tested in a project of its own. This excludes failures caused by the device itself when connecting it to the Wishbone bus.


Download project and documentation: File:ZPU Softcore Implementierung auf Spartan-3 FPGA.zip

Start-up of the board and the development environment

1. Connect the board:

To use the board, the supplied power supply and the JTAG programming cable have to be connected to the board (observe the polarity of the cables!).
Anschluss jtag.png

2. Install the development environment:

Download Xilinx ISE WebPack 10.1 from http://www.xilinx.com/support/download/index.htm
Download xilinx ise.png
A free registration at Xilinx is necessary. After the download, unpack the file and install it. Then register the product to use it.

3. Create first example project for the board in VHDL:

  • Start the Project Navigator in the Xilinx ISE Design Suite 10.1 from the start menu
  • Create a new project (menu File > New Project). In the following dialog box choose a project name and directory and select HDL as Top-level source. Do not use file names or paths with special characters (such as spaces)! Click on Next. Now the FPGA has to be chosen. The following figure shows the settings for the Spartan-3 board.
  • Click on Next or Finish until the dialog box is closed. The project is now shown in the window on the left (Sources). So far only the FPGA has been added to the project. Right-click on the project and click on New Source to add a new VHDL module. In the dialog box that opens, select VHDL Module and choose a file name. Also tick Add to project. Then click on Next.
  • In the following window the input and output signals of the entity contained in the VHDL module can be set. The configuration for this example is shown in the following figure.
Ports einstellen.png
  • Then click on Next or Finish until the dialog box is closed. The window Sources now contains the VHDL module just created. Open it with a double click. The empty entity should look like this (apart from the individual name, here test):

<vhdl>library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL;

entity test is

   Port ( clk, reset : in  STD_LOGIC;
          led : out  STD_LOGIC_VECTOR (7 downto 0));

end test;

architecture Behavioral of test is


end Behavioral; </vhdl>

  • As an example we implement a moving light that controls the 8 LEDs on the board. For this the architecture has to be changed like shown below:

<vhdl>
architecture Behavioral of test is
   signal r_reg, r_next         : unsigned (31 downto 0);
   signal led_reg, led_next     : unsigned (7 downto 0);
   signal timer_reg, timer_next : unsigned (7 downto 0);
begin

   -- registers with asynchronous reset
   process (clk, reset)
   begin
      if (reset = '1') then
         r_reg     <= (others => '0');
         led_reg   <= "00000001";
         timer_reg <= (others => '0');
      elsif rising_edge(clk) then
         r_reg     <= r_next;
         led_reg   <= led_next;
         timer_reg <= timer_next;
      end if;
   end process;

   -- next-state logic
   process (r_reg, led_reg, timer_reg)
   begin
      r_next     <= r_reg;
      led_next   <= led_reg;
      timer_next <= timer_reg;

      if (r_reg >= 25000000) then
         if (timer_reg < 7) then
            -- rotate left
            led_next   <= led_reg(6 downto 0) & led_reg(7);
            timer_next <= timer_reg + 1;
         else
            -- rotate right
            led_next <= led_reg(0) & led_reg(7 downto 1);
            if (timer_reg = 13) then
               timer_next <= (others => '0');
            else
               timer_next <= timer_reg + 1;
            end if;
         end if;

         r_next <= (others => '0');
      else
         r_next     <= r_reg + 1;
         timer_next <= timer_reg;
      end if;

   end process;

   led <= std_logic_vector(led_reg);

end Behavioral;
</vhdl>

  • After saving the VHDL module, select it in the window Sources and then double-click Check Syntax in the window Processes below (it is located under Synthesis - XST). The check should succeed (a green check mark is shown). If not, check the content of the VHDL module again.
  • The ports used in the entity have to be connected to the pins of the FPGA. This requires a constraints file, which can be added via New Source. This time select Implementation Constraints File, enter a file name and tick Add to project. Again click on Next or Finish until the dialog box is closed. Now there is a plus sign in front of the VHDL module in the window Sources. Clicking on it opens the next level, in this case the constraints file. Select it and, in the window Processes, click on the plus sign in front of User Constraints. The next level opens with Edit Constraints (Text). Double-click it to edit the constraints file. Enter the following code:


<vhdl>
# ========================================================
# Pin assignment for Xilinx
# Spartan-3 Starter board
# ========================================================
# ========================================================
# clock and reset signals
# ========================================================

NET "clk" LOC = "T9";
NET "reset" LOC = "L14";

# ========================================================
# 8 LEDs on the board
# ========================================================

NET "led<0>" LOC = "K12";
NET "led<1>" LOC = "P14";
NET "led<2>" LOC = "L12";
NET "led<3>" LOC = "N14";
NET "led<4>" LOC = "P13";
NET "led<5>" LOC = "N12";
NET "led<6>" LOC = "P12";
NET "led<7>" LOC = "P11";

# ========================================================
# Timing constraint of S3 50-MHz onboard oscillator
# name of the clock signal is clk
# ========================================================

NET "clk" TNM_NET = "clk";
TIMESPEC "TS_clk" = PERIOD "clk" 20 ns HIGH 50 %;
</vhdl>

  • Select the VHDL module in the window Sources again and double-click Generate Programming File in the window Processes. The complete project is now built by the Xilinx tools (the steps Synthesize - XST and Implement Design are run automatically beforehand). Afterwards a green check mark should be shown next to all three steps.
  • Now the design has to be loaded onto the FPGA. Double-click Configure Target Device; a connection to the board is established using the JTAG adapter. Dismiss the warnings of the program iMPACT by clicking on OK. In the following dialog box select Configure devices using Boundary-Scan (JTAG) and click on Finish. The FPGA board is now detected automatically. In the main window the Boundary Scan view opens with 2 elements. The left one is the FPGA, the right one the configuration flash (not needed). A dialog box should open automatically asking for the configuration files (.bit files) of the two elements. The dialog box can also be opened by double-clicking the elements. The FPGA (xc3s200) gets the .bit file from the main directory of the project. The other element has to be set to Bypass. In the next dialog box make sure that Verify is NOT ticked and then click on OK. Now right-click on the FPGA and click on Program to load the design onto the FPGA. The LEDs on the board should now light up one after another, from left to right and back.
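
The behaviour of the moving light architecture from the example above can be checked on a PC with a small C model before programming the board. This is only a sketch: the names light_t, light_reset and light_step are hypothetical, and one call of light_step stands for one expiry of the r_reg counter (about every 0.5 s at 50 MHz).

```c
#include <assert.h>
#include <stdint.h>

/* Models led_reg and timer_reg of the VHDL example. */
typedef struct {
    uint8_t led;
    uint8_t timer;
} light_t;

/* State after reset: rightmost LED on, timer cleared. */
light_t light_reset(void)
{
    light_t s = { 0x01, 0 };
    return s;
}

/* One step of the next-state logic, taken each time r_reg expires. */
light_t light_step(light_t s)
{
    assert(s.timer <= 13);
    if (s.timer < 7) {
        /* rotate left: led_reg(6 downto 0) & led_reg(7) */
        s.led = (uint8_t)((s.led << 1) | (s.led >> 7));
        s.timer++;
    } else {
        /* rotate right: led_reg(0) & led_reg(7 downto 1) */
        s.led = (uint8_t)((s.led >> 1) | (s.led << 7));
        s.timer = (s.timer == 13) ? 0 : (uint8_t)(s.timer + 1);
    }
    return s;
}
```

After 7 steps the lit LED has moved from bit 0 to bit 7, and after 14 steps the model is back in its reset state, which matches the back-and-forth movement seen on the board.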

Overview of the VHDL project

The overview of the project structure is shown in the figure below:

Vhdl projekt uebersicht.png

As shown above, the top of the tree, the top module, is the file softcore_top.vhd. It connects all the necessary external signals of the board (assigned in io_pins.ucf) to the modules. The top module contains the Wishbone bus and the connections of the slaves. The address ranges are also defined in this file.

softcore_top.vhd instantiates all necessary Wishbone components, starting with the ZPU, which is configured as the Wishbone master, followed by an interrupt controller, a UART, a PS/2 (keyboard) interface, an SRAM controller, a VGA interface and a controller for the onboard LEDs, switches, buttons and seven-segment displays.

The file wb_core.vhd contains a Wishbone interface for the ZPU and switches the address, data and control signals to wishbone compliant signals. The ZPU is coded in zpu_core.vhd and contains its state machine, the memory controller and the opcode interpretation and the access to external peripherals via the Wishbone interface. The ZPU file includes the BRAM (file zpu_bram.vhd and softcore.bmm) that is used as program memory and stack.

All files of the project ending with _pkg.vhd contain the declaration of constants and definition of components. The file zpu_pkg.vhd contains the opcode definition of the ZPU.

The files ending with _config.vhd contain the global constants. In softcore_config.vhd e.g. the CPU clock frequency and the baudrate of the UART can be set. In zpu_config.vhd the address width can be set as well as the data width, the stack size and the program memory size.

The file interrupt.vhd contains the interrupt controller and its Wishbone interface. In addition to the Wishbone signals the controller has inputs for the interrupts of the other peripherals and an output to raise the interrupt of the ZPU. When the ZPU is interrupted via this output (if the interrupt is enabled and accepted by the ZPU), the controller passes the interrupt number to the ZPU. The reaction to the interrupt is controlled in software. This means the software must know which interrupt number is mapped to which peripheral device.

The file uart.vhd contains the Wishbone interface of the UART. The files below it are the modules for sending (tx_unit.vhd) and receiving (rx_unit.vhd). Both share a timer module (brgen.vhd), which is a clock divider. The two modules use it to check the UART cyclically for received data and to send synchronously. The timer values are calculated from the baud rate (defined in softcore_config.vhd). Received data cause an interrupt that is reported to the interrupt controller. The modules for sending and receiving are simple 8N1 modules downloaded from www.opencores.org. The file ps2.vhd contains a very simple PS/2 interface with a Wishbone interface. The clock signal of the keyboard is captured and then the 10 bit telegram is read. An interrupt signals that data are available.

The file intIO.vhd contains the Wishbone interface to control the switches, buttons, LEDs and 7-segment-panels on the board. Switches and buttons are read by accessing their address, and the LEDs are set by the lower 8 bits of the data word. The 7-segment-panels cannot all be driven at the same time, so a state machine switches from one to the next. The 32 bit data word sets the panels: 1 byte per panel, with the highest byte setting the leftmost panel.

The file wb_sram.vhd contains the Wishbone interface of the SRAM controller. The controller itself is located in the file MemoryController.vhd. The SRAM uses only 18 bits for its address, so the highest address bit is used to switch between writing and reading.

The file wb_vga.vhd contains the Wishbone interface for the VGA controller. Two projects have been combined to implement this controller: first, simple control of the color channels of the VGA port (see chapter "Adding new peripherals"), and second, the display of text on the screen (80x30 positions). Bit 3 is used to switch between these two modes; in color mode bits 0-2 hold the RGB value. In text mode a static text is currently generated and displayed (file font_rom_pixel_generation.vhd). The file font_rom.vhd contains a ROM that is filled with the characters. Both modes use a resolution of 640x480. The file vga_sync.vhd contains the control of the VGA port; it generates the signals that are sent to the screen.

Peripheral devices

Integrated peripherals

During the project the following peripherals have been connected to the Wishbone bus. The addresses of the interfaces are hard-coded and can be changed by editing the file softcore_top.vhd.

1. RS-232:

A UART with 8 data bits, 1 stop bit, no parity and a baud rate of 115200 baud (defined in the file softcore_config.vhd). An interrupt signals a new incoming character on the UART. The received character can be read from the RX address of the UART.
RX address: 0x80101004
TX address: 0x80101008

2. Internal I/Os:

These comprise the 8 switches, 4 buttons, 8 LEDs and 4 7-segment-panels on the board. Reading the address of the switches or buttons returns the state of the switches (8 bit) or the buttons (4 bit). Writing an 8 bit value to the LED address switches the LEDs. Writing a 32 bit value to the address of the 7-segment-panels controls the 4 panels. The highest byte controls the leftmost panel and so on. See the following picture for details on which bit in a byte selects which segment of the panel.
Address of the buttons: 0x80080000
Address of the switches: 0x80080001
Address of the LEDs: 0x80080002
Address of the 7-segment-panels: 0x80080003
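
The byte layout of the 32 bit word for the 7-segment-panels can be sketched in C as follows. The function names sevenseg_pack and sevenseg_unpack are hypothetical helpers and not part of the project; the meaning of the individual bits within each byte is not modelled here, since it depends on the segment mapping shown in the picture.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32 bit word for the four panels.
 * The highest byte controls the leftmost panel. */
uint32_t sevenseg_pack(uint8_t left, uint8_t mid_left,
                       uint8_t mid_right, uint8_t right)
{
    return ((uint32_t)left << 24) | ((uint32_t)mid_left << 16) |
           ((uint32_t)mid_right << 8) | (uint32_t)right;
}

/* Extract the byte of one panel; position 0 is the leftmost panel. */
uint8_t sevenseg_unpack(uint32_t word, unsigned position)
{
    assert(position < 4);
    return (uint8_t)(word >> (8u * (3u - position)));
}
```

On the target, the packed word would be written to the address of the 7-segment-panels through a volatile int pointer, in the same way as the other I/O addresses are accessed.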

3. PS/2 interface:

A simple PS/2 keyboard interface that raises an interrupt when a key is pressed. The scan code of the key is transferred.
Address of the PS/2 interface: 0x8010000C

4. SRAM memory:

One of the two onboard SRAM chips has a Wishbone interface. It has 16 data bits and an 18 bit address. To write to the SRAM, first the data are sent to the Data-In address and then the 18 bit SRAM address is sent to the SRAM address register. The highest bit (31) of the address word selects read mode (bit 31 = 1) or write mode (bit 31 = 0).
Address for SRAM address: 0x800C0000
Address to write data: 0x800C0001
Address to read data: 0x800C0002
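
The composition of the SRAM address word and the write sequence described above can be sketched in C. The names sram_addr_word and sram_write16 are hypothetical; the registers are passed in as pointers so that the sketch can be tried on a PC, whereas on the target they would point to the addresses listed above.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_READ_FLAG 0x80000000u /* bit 31 = 1: read, bit 31 = 0: write */
#define SRAM_ADDR_MASK 0x0003FFFFu /* 18 bit SRAM address */

/* Combine the 18 bit SRAM address with the read/write flag in bit 31. */
uint32_t sram_addr_word(uint32_t sram_addr, int is_read)
{
    assert(sram_addr <= SRAM_ADDR_MASK);
    return (is_read ? SRAM_READ_FLAG : 0u) | sram_addr;
}

/* Write sequence from the text: first the data, then the address.
 * addr_reg and data_in_reg are assumed to be the SRAM address register
 * and the Data-In register of the Wishbone slave. */
void sram_write16(volatile uint32_t *addr_reg, volatile uint32_t *data_in_reg,
                  uint32_t sram_addr, uint16_t value)
{
    *data_in_reg = value;
    *addr_reg = sram_addr_word(sram_addr, 0);
}
```

A read would use sram_addr_word(sram_addr, 1) for the address register and then fetch the value from the read-data address.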

5. VGA interface:

The VGA interface has been implemented with two modes that use basic functions of the screen. In the first mode the color of the screen is set by a 3 bit RGB value (8 colors). The 4th bit switches between this color mode (bit = 0) and a state machine (bit = 1) that displays characters from a ROM. The resolution in both modes is 640x480.
Address of the VGA interface: 0x800E0000
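
The value written to the VGA address can be composed as in the following C sketch. The names VGA_MODE_BIT and vga_word are hypothetical; which of the three lower bits drives which color channel is not stated here, so the RGB value is passed through unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define VGA_MODE_BIT 0x8u /* bit 3: 0 = color mode, 1 = text mode */

/* Build the control word: bits 0-2 hold the RGB value,
 * bit 3 selects the mode. */
uint32_t vga_word(int text_mode, unsigned rgb)
{
    assert(rgb <= 7u);
    return (text_mode ? VGA_MODE_BIT : 0u) | rgb;
}
```

For example, vga_word(0, 7) sets all three color channels in color mode, and vga_word(1, 0) switches to the text mode.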

In addition, an interrupt controller is implemented as a Wishbone slave. All interrupts of the other slaves are handled by this controller, and it signals to the ZPU which address has caused the interrupt. Every interrupt must have its own IRQ line, so the constant irqLines in the file softcore_top.vhd has to be updated for every new slave. By default all interrupts are deactivated. They can be made available via the IR control address (0x8000000C) and activated via the IR enable address (0x80000008). An interrupt causes a call of the function _zpu_interrupt(). The address that caused the interrupt can be read from the interrupt address 0x80000000.

Adding new peripherals

On www.opencores.org you can find many free peripherals with Wishbone interfaces for download. The interface usually has to be changed a little. This chapter uses an example to outline how to create a new peripheral with a Wishbone interface. With this knowledge it should be possible to adapt existing peripherals for use in the project, regardless of whether they already have a Wishbone interface or not. The example implements a VGA controller.

It is useful to start from a device that is already working (already tested in another project or one of your own), so that any failures occurring while integrating it into the project can only be caused by a wrong connection.


The Wishbone signals are collected in record types that are defined in the file softcore_pkg.vhd. The structure wb_master_in is shown below and contains the input signals to the master:
<vhdl>
type wb_master_in is record
   data : std_logic_vector(31 downto 0);
   ack  : std_logic;
end record;
</vhdl>
The structure wb_master_out contains the output signals coming from the master:
<vhdl>
type wb_master_out is record
   addr : std_logic_vector(31 downto 0);
   data : std_logic_vector(31 downto 0);
   sel  : std_logic_vector(3 downto 0);
   we   : std_logic;
   stb  : std_logic;
   cyc  : std_logic;
end record;
</vhdl>
For a Wishbone interface it is sufficient to define the clock, reset and a wb_master_out signal as inputs (slave_in) and a wb_master_in signal as output (slave_out). If necessary, an interrupt signal can be defined as a further output, along with other input and output signals that are defined in the .ucf file.

The VGA interface receives the signals necessary for Wishbone communication and the interrupt signal. The interrupt signal is not used yet, so it is tied to 0. In addition there are the signals that will be mapped to the VGA port: a signal hsync for horizontal synchronisation, a signal vsync for vertical synchronisation and a 3 channel signal rgb for the 3 color channels (8 colors possible).

First a new VHDL module with the name vga_sync.vhd is needed. The source code can be downloaded from here: File:Vga sync.vhd. Next, another VHDL module called wb_vga.vhd is needed. Its entity is called wb_vga; this will be the Wishbone slave. It will contain all inputs and outputs. To use the structures wb_master_in and wb_master_out, the library work and the clause use work.softcore_pkg.all are needed in addition to the standard libraries.

The entity wb_vga will now look like this:
<vhdl>
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

library work;
use work.softcore_pkg.ALL;

entity wb_vga is
   port (
      clk, reset   : in  std_logic;
      interrupt    : out std_logic;
      slave_in     : in  wb_master_out;
      slave_out    : out wb_master_in;
      hsync, vsync : out std_logic;
      rgb          : out std_logic_vector(2 downto 0)
   );
end wb_vga;

architecture Behavioral of wb_vga is
begin
end Behavioral;
</vhdl>
Now this empty Wishbone slave is connected to the bus. This means that the three VGA port signals just described have to be routed through to the top level and the slave has to be connected to the bus. The .ucf file is extended with the following:
<vhdl>
# ========================================================
# VGA outputs
# ========================================================

NET "ioVGA_rgb<2>" LOC = "R12" | DRIVE=8 | SLEW=FAST;
NET "ioVGA_rgb<1>" LOC = "T12" | DRIVE=8 | SLEW=FAST;
NET "ioVGA_rgb<0>" LOC = "R11" | DRIVE=8 | SLEW=FAST;
NET "ioVGA_vsync" LOC = "T10" | DRIVE=8 | SLEW=FAST;
NET "ioVGA_hsync" LOC = "R9" | DRIVE=8 | SLEW=FAST;
</vhdl>
Now the signals of the VGA port are connected to the FPGA. The top level is the entity softcore_top in softcore_top.vhd. The signals of the VGA port are added to this entity as ports:
<vhdl>
ioVGA_hsync, ioVGA_vsync: out std_logic;
ioVGA_rgb: out std_logic_vector(2 downto 0);
</vhdl>
The file softcore_pkg.vhd also needs to be changed. The components are declared in this file, so the new wb_vga has to be added there before it can be used:
<vhdl>
component wb_vga is
   port (
      clk, reset   : in  std_logic;
      interrupt    : out std_logic;
      slave_in     : in  wb_master_out;
      slave_out    : out wb_master_in;
      hsync, vsync : out std_logic;
      rgb          : out std_logic_vector(2 downto 0)
   );
end component;
</vhdl>
Now wb_vga can be instantiated in the entity softcore_top, which requires a port map (as for all other Wishbone slaves). Before that, the signals need to be declared. In the architecture of softcore_top (in softcore_top.vhd) the following is added before the keyword begin:
<vhdl>
signal vga_out : wb_master_in;
signal vga_in  : wb_master_out;
signal irq_vga : std_logic;
</vhdl>
These signals are used to connect the Wishbone bus to wb_vga and the interrupt signal of wb_vga to the interrupt controller. The interrupt controller now needs one more interrupt line, so the constant irqLines must be increased by 1.
Next the new interrupt line is connected to the interrupt of wb_vga (example for a new interrupt with the number 4):
<vhdl>
irq(4) <= irq_vga;
</vhdl>
The port map can normally be copied from another slave and modified to match wb_vga:
<vhdl>
vga_inst: wb_vga
   port map (
      clk       => clk,
      reset     => reset,
      interrupt => irq_vga,
      slave_in  => vga_in,
      slave_out => vga_out,
      hsync     => ioVGA_hsync,
      vsync     => ioVGA_vsync,
      rgb       => ioVGA_rgb
   );
</vhdl>
The signals hsync, vsync and rgb are now routed up to the top level and connected to the VGA port. Only the connection to the Wishbone bus via the signals vga_in and vga_out is still missing. They are already declared and connected to wb_vga but not yet connected to the Wishbone bus.

A free address has to be chosen to allow the ZPU to communicate with the VGA controller, e.g. 0x800E0000. The data from the VGA controller to the master are sent via the signal vga_out to the signal master_in. When the Wishbone address just chosen for the VGA controller is selected, vga_out needs to be mapped to master_in. The multiplexer at the end of softcore_top.vhd can be extended as follows:
<vhdl>
master_in <= irq_out   when (master_out.addr(31 downto 11) = "1000" & '0' & std_logic_vector(to_unsigned(0, (26 - 11 + 1)))) else
             uart1_out when (master_out.addr(31 downto 11) = "1000" & '0' & "0000001000000010") else
             ps2_out   when (master_out.addr(31 downto 11) = "1000" & '0' & "0000001000000000") else
             intIO_out when (master_out.addr(31 downto 11) = "1000" & '0' & "0000000100000000") else
             sram_out  when (master_out.addr(31 downto 11) = "1000" & '0' & "0000000110000000") else
             vga_out   when (master_out.addr(31 downto 11) = "1000" & '0' & "0000000111000000") else
             dummy_out;
</vhdl>
The address "1000" & '0' & "0000000111000000" is 0x800E0000. The signal vga_in has to be connected in the same way:
<vhdl>
vga_in.addr <= master_out.addr;
vga_in.data <= master_out.data;
vga_in.sel  <= master_out.sel;
vga_in.we   <= master_out.we;
vga_in.stb  <= master_out.stb;
vga_in.cyc  <= master_out.cyc when (master_out.addr(31 downto 11) = "1000" & '0' & "0000000111000000") else '0';
</vhdl>
Now wb_vga is completely connected and the actual functionality can be added. For example, the 3 RGB values can be controlled by the ZPU. For this the entity vga_sync is instantiated; it generates the synchronisation signals for the VGA screen. How a VGA screen works is not explained in this description. Unused signals are tied to 0, like the interrupt signal and slave_out.data (the data bus to the master). ACK is connected to CYC. Another process is needed to interpret the data received from the master and to set the RGB colours.
Now the architecture of wb_vga looks as shown below:
<vhdl>
architecture Behavioral of wb_vga is
   signal rgb_reg  : std_logic_vector(2 downto 0);
   signal video_on : std_logic;
begin

   -- instantiate VGA sync circuit
   vga_syn_unit: entity work.vga_sync
      port map (clk => clk, reset => reset, hsync => hsync, vsync => vsync,
                video_on => video_on, p_tick => open,
                pixel_x => open, pixel_y => open);

   -- unused outputs are tied to 0
   interrupt <= '0';

   -- ACK is connected to CYC
   slave_out.ack <= slave_in.cyc;

   slave_out.data <= (others => '0');

   -- rgb buffer: store the lower 3 data bits on a write access
   process (clk, reset)
   begin
      if (reset = '1') then
         rgb_reg <= (others => '0');
      elsif rising_edge(clk) then
         if (slave_in.cyc = '1' and slave_in.we = '1') then
            rgb_reg <= slave_in.data(2 downto 0);
         end if;
      end if;
   end process;

   rgb <= rgb_reg when video_on = '1' else "000";

end Behavioral;
</vhdl>
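
The address comparison of the multiplexer in softcore_top.vhd can be reproduced in a small C model, which is handy for checking whether a newly chosen address collides with an existing slave. This is a sketch under assumptions: the enum and function names are hypothetical, and the base addresses are derived from the bit patterns of the multiplexer (only bits 31 down to 11 are compared, so every slave occupies a 2 KB window).

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    SLAVE_IRQ, SLAVE_UART, SLAVE_PS2, SLAVE_INTIO,
    SLAVE_SRAM, SLAVE_VGA, SLAVE_NONE
} slave_t;

/* Base addresses derived from the bit patterns in the multiplexer. */
static const uint32_t slave_bases[SLAVE_NONE] = {
    0x80000000u, /* interrupt controller */
    0x80101000u, /* UART                 */
    0x80100000u, /* PS/2                 */
    0x80080000u, /* internal I/Os        */
    0x800C0000u, /* SRAM                 */
    0x800E0000u  /* VGA                  */
};

/* Base address of a slave, as it would be used for a C pointer. */
uint32_t slave_base(slave_t s)
{
    assert(s != SLAVE_NONE);
    return slave_bases[s];
}

/* Compare addr(31 downto 11) with each slave, like the VHDL multiplexer.
 * Returns SLAVE_NONE where the hardware would select dummy_out. */
slave_t decode_slave(uint32_t addr)
{
    for (int s = 0; s < SLAVE_NONE; s++) {
        if ((addr >> 11) == (slave_bases[s] >> 11)) {
            return (slave_t)s;
        }
    }
    return SLAVE_NONE;
}
```

For example, decode_slave(0x80101004) yields SLAVE_UART, while an address in the next 2 KB window after the VGA base, such as 0x800E0800, is not claimed by any slave.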


A program written in C for the ZPU can now define the addresses of the VGA controller and of the buttons as volatile int pointers. The color of the screen can then be changed using the state of the buttons. More details on how to program the ZPU follow in the chapter below.
<c>
/* IO addresses */
volatile int* intIOBTN = (volatile int*)0x80080000;
volatile int* vga = (volatile int*)0x800E0000;

int main() {
    while(1) {
        *vga = *intIOBTN;
    }

    return 0;
}
</c>

Programming the ZPU

It is possible to program the ZPU in C or C++. The source code can be written with any editor, e.g. Notepad++. The ZPUGCC toolchain is used for compiling. This toolchain is made for Linux, but the project was done on a Windows computer. A solution is to run ZPUGCC on Windows under Cygwin. How to install and use Cygwin and ZPUGCC is explained later in the chapter Compiling.

Access the hardware

An absolute hardware address can be accessed through a pointer to volatile int. As an example, the addresses of the switches and LEDs on the board are defined below. The value is the address, and it is decoded by the hardware multiplexer in the file softcore_top.vhd.
<c>
volatile int* intIOSW = (volatile int*)0x80080001;
volatile int* intIOLED = (volatile int*)0x80080002;
</c>
By dereferencing the pointer intIOSW the state of the switches can be read; in the same way the LEDs can be set via intIOLED.
<c>
while(1) {
    // Read the value of the switches and set the LEDs.
    // The while loop mirrors the state of the switches
    // on the LEDs at all times.
    *intIOLED = *intIOSW;
}
</c>


To use interrupts they first need to be enabled; only then does the ZPU check them.
<c>
volatile int* interrupt_enable = (volatile int*)0x80000008;
volatile int* interrupt_ctrl = (volatile int*)0x8000000c;

void init(void) {
    // Activate interrupts
    *interrupt_ctrl = 0x07;   // Choose interrupt 1, 2 and 3 (Bit 0, 1 and 2)
    *interrupt_enable = 0x07; // Activate interrupt 1, 2 and 3 (Bit 0, 1 and 2)
}

int main() {
    init();
}
</c>
The function _zpu_interrupt() is called on an interrupt. It is used to find out which interrupt has occurred. An example is shown below:
<c>
volatile int* interrupt_stat = (volatile int*)0x80000004;

void _zpu_interrupt(void) {
    int interrupt = *interrupt_stat;
    if ((interrupt & 0x04) != 0) {
        // Interrupt 3 (bit 2) has been triggered
    }

    if ((interrupt & 0x02) != 0) {
        // Interrupt 2 (bit 1) has been triggered
    }
}
</c>
With a data width of 32 bit, 32 interrupts can be used because every bit represents one interrupt. The if statements in the example above isolate the individual bits and check them.
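
Instead of one if statement per interrupt, the status word can also be scanned bit by bit. The following C sketch shows this idea; the function name pending_interrupts is hypothetical and not part of the project, and it works on a plain value so that it can be tried out on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Collect the bit numbers of all pending interrupts in a status word.
 * bits must provide room for 32 entries; the number of entries written
 * is returned. Bit 0 corresponds to interrupt 1. */
unsigned pending_interrupts(uint32_t status, unsigned bits[32])
{
    assert(bits != 0);
    unsigned count = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if (status & (1u << bit)) {
            bits[count++] = bit;
        }
    }
    return count;
}
```

Inside _zpu_interrupt() the value read from the interrupt status address would be passed as status, and the handler for each returned bit number would then be called.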


The source code is compiled with ZPUGCC. It creates an .elf file that can be loaded directly into the BlockRAM.

1. Download and install Cygwin:

The setup file can be downloaded at http://www.cygwin.com/. Open setup.exe and click on next. Select the installation directory and click on next. Select the folder where the installation will store the temporary packages and click on next. Choose Direct Connection and click on next. Select a download mirror and click on next. The following window shows a selection of the packages to be installed. Click on Devel and an individual listing is shown. Select all GCC entries for C and C++ and also GDB, GIT, LIBGCC and MAKE. Click on next to download and install. At the end click on finish.

2. Use the ZPUGCC:

The compiler ZPUGCC can be downloaded at http://opensource.zylin.com/zpudownload.html under the heading ZPU binary toolchains - latest stable for Cygwin. This archive has to be unpacked into the user folder, which is located in the home folder of Cygwin (inside the Cygwin program folder, e.g. C:\Programme\cygwin). The archive contains a folder install that has to be renamed to zpugcc. Because it is located in the user folder, the compiler can now be started directly with zpugcc/bin/zpu-elf-gcc.
To compile under Cygwin, the following command has to be entered on the command line:
<c>
[pathname where the ZPUGCC is located, e.g. zpugcc]/bin/zpu-elf-gcc -Os -phi [absolute pathname]/[filename].c -o [absolute destination pathname]/[destination filename].elf -Wl,--relax -Wl,--gc-sections -g
</c>

It is enough to pass the main source code file; all headers (including your own header files) etc. are added and compiled automatically by ZPUGCC.
It is possible to use a batch file (a text file with the ending .sh) to run the command above. The command is put into this batch file, and the file is then called from Cygwin (similar to the .bat and .cmd files in Windows):
<c>
sh [absolute pathname]/[filename].sh
</c>

Load the program memory – Data2MEM and .bmm-files

After creating the .elf file (see chapter Compiling), its content needs to be copied to the program memory (BlockRAM). The BlockRAM is described in the file zpu_bram.vhd. Depending on the size of the softcore system and the FPGA used, synthesizing the system can take a long time. To avoid synthesizing the whole system for every little change in the program memory of the core, Xilinx offers the command line based tool Data2MEM, which is included in the installation of Xilinx ISE by default. It also saves a lot of time because you don't have to copy every single byte into the program memory by hand.

The data used to configure the FPGA and also the programs of the softcore are located in a .bit file. Data2MEM can replace an old program by a new one in this .bit file. To do so, Data2MEM needs to know which parts of the .bit file are to be replaced. This information is located in the BlockRAM Memory Map file (.bmm file). It is a text file that can be edited in any text editor, and for the ZPU the syntax is as shown below:
<vhdl>
ADDRESS_MAP softcore_top PPC405 0
    ADDRESS_SPACE memory1 RAMB16 [0x00000000:0x00003FFF]
        BUS_BLOCK
            core/zpu_core/memory/RAMB16_S4_S4_inst7 [31:28];
            core/zpu_core/memory/RAMB16_S4_S4_inst6 [27:24];
            core/zpu_core/memory/RAMB16_S4_S4_inst5 [23:20];
            core/zpu_core/memory/RAMB16_S4_S4_inst4 [19:16];
            core/zpu_core/memory/RAMB16_S4_S4_inst3 [15:12];
            core/zpu_core/memory/RAMB16_S4_S4_inst2 [11:8];
            core/zpu_core/memory/RAMB16_S4_S4_inst1 [7:4];
            core/zpu_core/memory/RAMB16_S4_S4_inst0 [3:0];
        END_BUS_BLOCK;
    END_ADDRESS_SPACE;
END_ADDRESS_MAP;
</vhdl>
The core can use 16 KB (4096 x 32 bit data width) of program memory, which is partitioned into 8 BlockRAMs. The address space is specified in bytes in the section ADDRESS_SPACE. The partitioning of the program data into 8 BlockRAMs is listed in the section BUS_BLOCK. A single BlockRAM does not hold 512 lines of 32 bit values; instead it holds 4096 lines, each containing 4 bit of a 32 bit value.
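The distribution of one 32 bit word over the 8 BlockRAMs can be illustrated in C. This is only a sketch of the bit ranges listed in the BUS_BLOCK (inst0 holds bits [3:0], inst7 holds bits [31:28]); the function names bram_slice and bram_join are made up for this illustration.

```c
#include <stdint.h>

/* Sketch: 4 bit slice of a 32 bit word that ends up in BlockRAM
 * instance inst (0..7). inst0 holds bits [3:0], inst7 bits [31:28]. */
unsigned bram_slice(uint32_t word, int inst)
{
    return (unsigned)((word >> (4 * inst)) & 0xFu);
}

/* Sketch: rebuild the 32 bit word from the 8 slices,
 * slices[0] being the content of inst0. */
uint32_t bram_join(const unsigned slices[8])
{
    uint32_t word = 0;
    int inst;

    for (inst = 0; inst < 8; inst++) {
        word |= (uint32_t)(slices[inst] & 0xFu) << (4 * inst);
    }
    return word;
}
```

For the word 0x12345678, for example, inst7 would hold the value 0x1 and inst0 the value 0x8 at the same line address.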

There are 8 instances of the type RAMB16_S4_S4 defined. The 16 stands for the memory size of 16 Kbit per BlockRAM and the S4 for the data width of 4 bit. The double use of the S4 in the name means it is a dual-port RAM. The memory size can be calculated from this information: 4K x 4 bit width = 16 Kbit (2 KB) per BlockRAM. With an appropriate selection, the 8 instances with a width of 4 bit each can be used as one memory with a width of 32 bit. The result is a memory with a depth of 4K and a width of 32 bit, i.e. 16 KB in total.
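The size calculation can be checked with a few lines of C. The constants are taken from the text (4096 lines, 4 bit width, 8 instances); the names are chosen only for this sketch.

```c
/* Sketch of the size arithmetic; names are illustrative only. */
enum {
    BRAM_DEPTH = 4096, /* lines per BlockRAM */
    BRAM_WIDTH = 4,    /* bit per line (S4) */
    BRAM_COUNT = 8     /* instances in the BUS_BLOCK */
};

/* Bits stored in one BlockRAM: depth times width. */
unsigned long bram_bits(void)
{
    return (unsigned long)BRAM_DEPTH * BRAM_WIDTH;
}

/* Bytes of the combined 32 bit wide memory. */
unsigned long total_bytes(void)
{
    return bram_bits() * BRAM_COUNT / 8;
}

/* Last byte address of the memory, counting from 0. */
unsigned long last_address(void)
{
    return total_bytes() - 1;
}
```

The last byte address computed this way is 0x3FFF, which matches the upper bound of the ADDRESS_SPACE range in the .bmm file.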

The exact names of the instances and paths, like core/zpu_core/memory/RAMB16_S4_S4_inst7, can be identified with the help of the Floorplanner in the ISE. The path describes the nested components: the entity core instantiates the entity zpu_core, the entity zpu_core instantiates the entity memory and so on, and finally the 8 instances are described in the entity memory.

This information is still not enough for Data2MEM to process the .bmm file, because the exact placement of the BlockRAMs is still missing. It is recommended to leave this task to the ISE (more precisely, to Bitgen). The .bmm file is added to the ISE project, and during the Generate Programming File process it is completed automatically. Afterwards a new .bmm file exists in the project folder; its name is extended by "_bd" and it contains the location of all instances, as shown below:
<vhdl>
ADDRESS_MAP softcore_top PPC405 0
    ADDRESS_SPACE memory1 RAMB16 [0x00000000:0x00003FFF]
        BUS_BLOCK
            core/zpu_core/memory/RAMB16_S4_S4_inst7 [31:28] PLACED = X0Y1;
            core/zpu_core/memory/RAMB16_S4_S4_inst6 [27:24] PLACED = X0Y5;
            core/zpu_core/memory/RAMB16_S4_S4_inst5 [23:20] PLACED = X1Y3;
            core/zpu_core/memory/RAMB16_S4_S4_inst4 [19:16] PLACED = X0Y3;
            core/zpu_core/memory/RAMB16_S4_S4_inst3 [15:12] PLACED = X0Y0;
            core/zpu_core/memory/RAMB16_S4_S4_inst2 [11:8] PLACED = X0Y2;
            core/zpu_core/memory/RAMB16_S4_S4_inst1 [7:4] PLACED = X0Y4;
            core/zpu_core/memory/RAMB16_S4_S4_inst0 [3:0] PLACED = X1Y2;
        END_BUS_BLOCK;
    END_ADDRESS_SPACE;
END_ADDRESS_MAP;
</vhdl>
Besides the newly generated .bmm file, Data2MEM also needs the .bit and the .elf file. The .bit file is normally located in the top level of the project folder and has the same name as the project. The following command transforms the .bit file:
<c>
C:\Programme\Xilinx\[program version, e.g. 10.1]\ISE\bin\nt\data2mem -bm [absolute path name]\[file name].bmm -bd [absolute path name]\[file name].elf -bt [absolute path name]\[file name].bit -o b [absolute path name]\[destination file name].bit
</c>
For my project the file names:


The command line based Data2MEM can be executed from the Windows shell. You can also create a batch file with the command: copy the command into a text file and rename it with the ending .bat. When using a batch file, you can also use relative path names (relative to the path of the batch file) instead of absolute path names.

The result of this transformation is the new .bit file softcore_top_fw.bit with the changed program memory. It can be uploaded to the FPGA instead of softcore_top.bit. For uploading, the included software iMPACT can be used together with a JTAG adapter. Comments in the .bmm file have the same syntax as in C: // or /* and */. However, only comments before the actual code are interpreted as comments; all other comments will cause a failure!

The corresponding memory also has to be described in VHDL. All instances are defined in the file zpu_bram.vhd. Each has a name and a type, e.g. RAMB16_S4_S4. This is followed by a generic map. Its constants INIT_00 to INIT_3F have to be filled completely with 0 (256 bit / 64 characters in hex each), whatever the type is. This is followed by the port mapping. As an example, a dual-port RAM of the type RAMB16_S4_S4 is used. The commented-out lines are ports that are used only in the configurations S9, S18 and S36 (see table below with the classification possibilities).
<vhdl>
port map (DOA   => memARead(3 downto 0),
          DOB   => memBRead(3 downto 0),
          --DOPA => open,
          --DOPB => open,
          ADDRA => memAAddr(addrBitBRAM downto minAddrBit),
          ADDRB => memBAddr(addrBitBRAM downto minAddrBit),
          CLKA  => clk,
          CLKB  => clk,
          DIA   => memAWrite(3 downto 0),
          DIB   => memBWrite(3 downto 0),
          --DIPA => "00",
          --DIPB => "00",
          ENA   => '1',
          ENB   => '1',
          SSRA  => '0',
          SSRB  => '0',
          WEA   => memAWriteEnable,
          WEB   => memBWriteEnable);
</vhdl>

Bmm tabelle bram.jpg
Bmm tabelle bram dualport.jpg

References, links, sources...

  • For further sources, see the documentation