EmbDev.net

Forum: FPGA, VHDL & Verilog

looking for a MIPS1 multi cycle, not pipelined


Author: Legacy My (Company: my) (legacy)
hi
i am looking for a MIPS1 ISA implementation (in VHDL, if possible) that 
is multi-cycle and not pipelined

is there anything around?

Author: Strubi (Guest)
Hi,

I am not aware of any multi-cycle MIPS variant; you might want to check 
out the 'ZPU small' instead. It has pretty decent GCC support (though 
it may need a few patches for string size bugs when compiling with 
recent toolchains).
Why would you want to pull apart the MIPS pipeline? If it is to save 
resources, the ZPU is IMHO the better option.

Greetings,

- Strubi

Author: greg (Guest)
MIPS1 is pretty simple, maybe you can implement it yourself? There are 
very few instructions and no particular surprises.

Author: Legacy My (Company: my) (legacy)
> check 'ZPU small'

it is a stack machine, i hate that kind of stuff


> Why would you want to pull apart the MIPS pipeline?

multi-cycle is 5 times slower than the pipelined version (1:5); that is 
acceptable if you care about "less complexity" rather than 
"performance"

about complexity, removing the pipeline means
- removing the hazard detection unit
- no hazards, no stalls
- less code to be written, verified, synthesized and tested
- no branch prediction needed
- no strict branch delay slot handling

that means fewer issues with gcc-mips1's object code

that also means fewer difficulties if you want to toy with the core, for 
example if you want to insert a debug unit that dumps out the registers 
on the S4 cycle

S0: fetch
S1: decode
S2: execute
S3: mem
S4: writeback    <----- registers dump, from r1 to r31 + PC, EPC


or if you want to build a toy "fault tolerant" version of the CPU, or 
similar hobby stuff

Author: Legacy My (Company: my) (legacy)
greg wrote:
> MIPS1 is pretty simple, maybe you can implement it yourself? There are
> very few instructions and no particular surprises.


everybody says that, but nobody has posted code: is it really so easy? 
why can't i see anything that easy on OpenCores? Everybody has an 
"alpha/beta/not-stable" version of a hyper-complex pipelined MIPS, a 
never-finished/never-validated toy …

writing and validating a softcore requires effort; the more complex you 
make your softcore, the more effort it takes to validate it!

Keep it simple, especially for a time-limited hobby project!

btw, i am asking for a multi-cycle MIPS1, i do not want anything else, 
and yes, if needed i will write it myself, i just do not want to 
re-invent the wheel, so if such a core exists i am pretty happy to save 
my development time

Author: greg (Guest)
> Everybody has an "alpha/beta/not-stable" version of a hyper-complex
> pipelined MIPS, a never-finished/never-validated toy …

Yeah, a pipelined core is probably at least an order of magnitude harder 
to do than a simple multi cycle one. A multi cycle core is just a big 
but simple state machine.
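
To make the "big but simple state machine" idea concrete, here is a minimal Python sketch (not VHDL, and not anyone's actual core). It models five states, one per clock, for just four MIPS I instructions (ADDU, ADDIU, LW, SW). The class name, the word-indexed memories and the omission of branches, exceptions and the delay slot are all simplifying assumptions made for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    FETCH = 0
    DECODE = 1
    EXECUTE = 2
    MEM = 3
    WRITEBACK = 4


# MIPS I opcode/funct values for the four instructions modelled here.
OP_SPECIAL, OP_ADDIU, OP_LW, OP_SW = 0x00, 0x09, 0x23, 0x2B
FUNCT_ADDU = 0x21


def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def i_type(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


class MultiCycleCpu:
    """Hypothetical toy model: one state per clock, five clocks per instruction."""

    def __init__(self, program, data_words=16):
        self.imem = list(program)
        self.dmem = [0] * data_words
        self.regs = [0] * 32
        self.pc = 0
        self.state = State.FETCH
        self.cycles = 0
        self.retired = 0

    def step(self):
        """Advance the state machine by exactly one clock cycle."""
        if self.state == State.FETCH:
            self.ir = self.imem[self.pc // 4]
            self.pc += 4
        elif self.state == State.DECODE:
            self.op = self.ir >> 26
            self.rs = (self.ir >> 21) & 31
            self.rt = (self.ir >> 16) & 31
            self.rd = (self.ir >> 11) & 31
            imm = self.ir & 0xFFFF
            self.imm = imm - 0x10000 if imm & 0x8000 else imm  # sign-extend
            self.a = self.regs[self.rs]
            self.b = self.regs[self.rt]
        elif self.state == State.EXECUTE:
            # Simplification: every SPECIAL opcode is treated as ADDU;
            # ADDIU/LW/SW all compute register + immediate.
            operand = self.b if self.op == OP_SPECIAL else self.imm
            self.alu = (self.a + operand) & 0xFFFFFFFF
        elif self.state == State.MEM:
            if self.op == OP_LW:
                self.mdr = self.dmem[self.alu // 4]
            elif self.op == OP_SW:
                self.dmem[self.alu // 4] = self.b
        else:  # State.WRITEBACK
            if self.op == OP_SPECIAL:
                dest, value = self.rd, self.alu
            elif self.op == OP_ADDIU:
                dest, value = self.rt, self.alu
            elif self.op == OP_LW:
                dest, value = self.rt, self.mdr
            else:
                dest, value = 0, 0  # SW writes no register
            if dest != 0:  # r0 is hard-wired to zero
                self.regs[dest] = value
            self.retired += 1
        self.state = State((self.state + 1) % len(State))
        self.cycles += 1


program = [
    i_type(OP_ADDIU, 0, 1, 5),    # addiu $1, $0, 5
    i_type(OP_ADDIU, 0, 2, 7),    # addiu $2, $0, 7
    r_type(1, 2, 3, FUNCT_ADDU),  # addu  $3, $1, $2
    i_type(OP_SW, 0, 3, 8),       # sw    $3, 8($0)
    i_type(OP_LW, 0, 4, 8),       # lw    $4, 8($0)
]
cpu = MultiCycleCpu(program)
for _ in range(5 * len(program)):
    cpu.step()
print(cpu.regs[3], cpu.regs[4], cpu.cycles, cpu.retired)
```

Because only one state is active per clock, no two instructions are ever in flight at once, so there is nothing to forward, stall or flush; the cost is the fixed five clocks per instruction.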

Author: René D. (Company: www.dossmatik.de) (dose)
Legacy My wrote:
> hi
> i am looking for a mips1 ISA implementation (in VHDL, if possible) that
> is multi cycle and not pipelined
>
> is there anything around ?

A pipelined Core ISA1 is available.

http://www.dossmatik.de/mais-cpu.html

Author: Legacy My (Company: my) (legacy)
> A pipelined Core ISA1 is available.
>
> http://www.dossmatik.de/mais-cpu.html

exactly, this is the problem: MAIS is pipelined, which is good, 
excellent even, but "too complex for me"; i am looking for a 
non-pipelined multi-cycle core.

to be honest i have already implemented something single-cycle, but it 
is very ugly and - as i have written - i do not want to reinvent the 
wheel. i am looking for a multi-cycle core because it is perfect for my 
purposes; if i do not find one … no problem, i will try to evolve my 
ugly current toy project

in short, i do not want a full MIPS_1 core, i want a toy core, as easy 
as possible, just to do cool experiments

my purpose is doing experiments; i do not want to implement a core, but 
i need one, and i need it to be as simple as it can be

Author: Legacy My (Company: my) (legacy)
@Strubi

CERN has released a custom version of the LM32, which is a RISC core 
plus devices, in short a tiny & smart SoC that is light years ahead of 
the ZPU.

unfortunately it is not MIPS compliant, it's … a different core with a 
completely different ISA. It's RISC, of course, but it is not MIPS.

also, you can look at OpenCores/Atlas, which is a 16-bit RISC with 
incredible documentation, well-written code, and a brilliant 
implementation of fast context switching in CPU space.

i was impressed by it: excellent code, great documentation, fun 
assembly toolchain (gcc/Atlas is … under development as far as i 
understood, while gcc/LM32 already has a working port, tested and 
validated) but … it is not what i am looking for.

Author: Strubi (Guest)
Hi,

just a few comments:

> - no branch prediction needed

The MIPS1k doesn't do that anyway.

> - no strict branch delay slot handling

You'd have to emulate the Branch delay slot in the multi cycle variant, 
too.

The LM32 arch is nice, but does not beat the ZPU in terms of code size 
and compactness. I agree that stack machines are kinda dirty when it 
comes to debugging (GDB just isn't made for it), but on an FPGA, stack 
machines with switchable window tricks open up a few nice possibilities 
at similar speed marks.

If it's a toy CPU, I'd recommend going ahead and writing your own. I've 
had very good experiences with MyHDL for CPU design and verification.
I have played with MIPS16 using abstract microcode instructions (multi 
cycled emulation, basically), but it is not synthesizable (yet), and I 
guess that's not what you're looking for.


Cheers

- Strubi

Author: Legacy My (Company: my) (legacy)
>> - no branch prediction needed
>
> The MIPS1k doesn't do that anyway.

yes, the MIPS R1000 and R2000 don't, but the R3000 does (they both 
implement the MIPS_1 ISA and differ in implementation; the R3K also 
adds caches, which are … another source of speedup and complexity), 
especially if you add the pipeline!

with a pipeline you have to speculate on every conditional branch: only 
two outcomes are possible and only one will be taken, so you have to 
speculate, and if you speculated wrong, well .. you have to stall the 
pipeline, trash all the work it has done, and then you have … to reload 
cpu.reg.PC with the correct address, fetch the correct instruction, and 
go on

PowerPC has a deep-rooted issue here: the instruction the IBM guys call 
"EIEIO" is required to force a pipeline stall and flush in order to be 
sure about I/O operations. what do i think about it? Let me say, i hate 
this about PowerPC!

about the pipeline: the fewer stages it has, the smaller the penalty 
you pay for a wrong branch prediction, because there is less work to be 
trashed and reloaded. branch prediction may speed up conditional 
branches (fewer penalties means more instructions completed 
successfully), but it makes your design much more complex, especially 
if you do dynamic branch prediction, which involves statistical 
analysis and such stuff.

the easiest solution, pure foolishness (which has actually been used in 
cheap commercial RISC-like microcontrollers, something like … Microchip 
PIC), is to assume that a conditional branch will never be taken (or 
always be taken) and … be ready to stall the pipeline and redo the work 
in case this prediction is wrong: you get a 50% probability of doing 
the right thing, or the wrong thing, while saving up to 60% of the 
complexity :D
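
The cost of always predicting "not taken" can be put into a one-line model. The sketch below is a generic textbook-style estimate, not a measurement: the branch fraction, the taken fraction and the flush penalty are illustrative assumptions, and the function name is made up for this example.

```python
def effective_cpi(branch_fraction, taken_fraction, flush_penalty_cycles):
    """Average cycles per instruction when every branch is predicted not-taken.

    Only taken branches are mispredicted, and each one costs a pipeline flush.
    """
    return 1.0 + branch_fraction * taken_fraction * flush_penalty_cycles


# Illustrative inputs only, not measurements of any real core.
cpi = effective_cpi(branch_fraction=0.2, taken_fraction=0.5,
                    flush_penalty_cycles=2)
print(f"effective CPI: {cpi:.2f}")
```

With a short pipeline the flush penalty is small, which is why this crude scheme can be good enough for a simple core.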

btw, i do not want all of this complexity, i can accept a 1:5 ratio of 
completed instructions per second to MHz

50 MHz -> pipelined MIPS can do 50,000,000 instructions/sec
50 MHz -> non-pipelined multi-cycle MIPS can do 10,000,000 
instructions/sec

in short, my 50 MHz clocked fpga is emulating a CPU that looks like a 
10 MHz RISC (that seems less than the performance you can get from a 
CISC, but … who cares about that? 10,000,000 instructions/sec are still 
great for my hobby)
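
The arithmetic behind those two figures, as a tiny sketch. It assumes an ideal pipeline that retires one instruction per clock with no stalls and a multi-cycle core that always takes exactly 5 clocks per instruction; the function name is just for illustration.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Peak instruction rate for a fixed number of clocks per instruction."""
    return clock_hz // cycles_per_instruction


CLOCK_HZ = 50_000_000

pipelined_peak = instructions_per_second(CLOCK_HZ, 1)  # ideal, no stalls
multicycle = instructions_per_second(CLOCK_HZ, 5)      # 5 states per instruction

print(pipelined_peak, multicycle, pipelined_peak // multicycle)
```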


>> - no strict branch delay slot handling
>
> You'd have to emulate the Branch delay slot in the multi cycle variant,
> too.

yes, of course, but it is an "emulation", it is not as "critical" as in 
the pipelined version, which means less complexity

also, if you compile with gcc "-O0" the branch delay slot is always 
filled with a NOP: the compiler will always put a NOP after every 
conditional/unconditional branch, that's cool and simplifies the design.
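
One common way to emulate the delay slot in a non-pipelined core is to keep two program counters, `pc` and `next_pc`, so that a branch only changes the instruction after the next one. The sketch below shows just that bookkeeping; the symbolic `("nop", None)` / `("j", target)` instruction tuples and the function name are hypothetical stand-ins, not real MIPS encodings.

```python
def run_with_delay_slot(program, max_steps=100):
    """Return the list of addresses executed, honouring the branch delay slot."""
    pc, next_pc = 0, 4
    trace = []
    for _ in range(max_steps):
        if pc // 4 >= len(program):
            break  # ran off the end of the toy program
        kind, target = program[pc // 4]
        trace.append(pc)
        after_next = next_pc + 4       # default: keep going sequentially
        if kind == "j":
            after_next = target        # takes effect after the delay slot
        pc, next_pc = next_pc, after_next
    return trace


program = [
    ("nop", None),  # 0x00
    ("j", 16),      # 0x04: jump to 0x10
    ("nop", None),  # 0x08: delay slot, still executed
    ("nop", None),  # 0x0C: skipped
    ("nop", None),  # 0x10: jump target
]
print(run_with_delay_slot(program))
```

The instruction at 0x08 runs even though the jump at 0x04 was taken, and 0x0C is skipped; when the compiler has put a NOP in the slot, that extra instruction is harmless.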

the "-O2" cflag is the most critical gcc flag and you MUST take care 
with the implementation: the compiler will try to fill as many branch 
delay slots as it can with useful instructions instead of NOPs, so … 
you have to be very very careful with the hardware design, especially 
with hazards in the pipelined version

>
> The LM32 arch is nice, but does not beat the ZPU in terms of code size
> and compactness.

mmm, the truth is … no RISC core can have the code density of a CISC; 
for example the MC6809 has the best code density of all, but it is 
CISC, it has a complex ISA, and … it eats a lot of resources on an FPGA.

i got system09 and i was playing with it; from the assembly point of 
view (i mean as a firmware guy) i love CISC machines, especially the 
Motorola 6809 and 68000 families, because they make programming much 
more pleasant than RISC; unfortunately CISC … is very frustrating when 
you want to synthesize it in HDL/RTL

btw, the ZPU is a trade-off between compactness and complexity; as a 
stack machine it consumes fewer fpga resources, and it may consume less 
iram (instruction ram), but its code eats a lot of dram (data ram), 
especially for its soft registers

> but on an FPGA, stack machines with switchable window tricks
> open up a few nice possibilities  at similar speed marks.

yeah, like the "forth1 CPU" used in my GameDuino-v1: this shield ships 
with a spartan3-250 fpga, and a simplified "forth1 CPU" has been put in 
it in order to provide a sort of GPU … that sounds "cool" but i can 
assure you i hate it so much that i have switched to the GameDuino-V2, 
which has an ASIC GPU with a closed custom RISC machine inside :D

ZPU is obviously better than forth1, but … let me say: i hate every 
stack machine!

>
> If it's a toy CPU, I'd recommend going ahead and writing your own. I've
> had very good experiences with MyHDL for CPU design and verification.
> I have played with MIPS16 using abstract microcode instructions (multi
> cycled emulation, basically), but it is not synthesizable (yet), and I
> guess that's not what you're looking for.

good advice. today i experimented with ghdl+gtkwave and made the first 
patches to turn my single-cycle toy softcore into a multi-cycle one: it 
seems to be working; obviously i can't believe it (i can't be that 
lucky and skilled), but i have already gained good experience, a lot 
more skill, and fun =)

Author: Legacy My (Company: my) (legacy)
about the pipeline and I/O, the PowerPC EIEIO:

from IBM PowerPC ISA vol1:

"Enforce In-Order Execution of I/O, or EIEIO, is a machine code 
instruction used on the PowerPC computer processor which prevents one 
memory or I/O operation from starting until the previous memory or I/O 
operation completed. This instruction is needed as I/O controllers on 
the system bus require that accesses follow a particular order, while 
the CPU reorders accesses to optimize memory bandwidth usage."

here is a Freescale doc
-> http://cache.freescale.com/files/32bit/doc/app_not...

Author: Michael Engel (Guest)
In case you're looking for a really small (mostly) MIPS-I ISA 
implementation in synthesizable Verilog (built for Altera FPGAs), you 
might be interested in U Toronto's "SuperSmall" softcore:

http://www.eecg.toronto.edu/~jayar/software/SuperS...

IIRC, the SuperSmall core is not pipelined. But beware - this is a 
bit-serial implementation of MIPS-I, so it will also be super slow. 
On the plus side, it can be scaled down to use only 115 Stratix III 
ALMs (Adaptive Logic Modules).

-- Michael

Author: Legacy My (Company: my) (legacy)
nah, it is not suitable for my purposes
- it is written in Verilog; i do not want this language: first, i do 
not like it at all, and second, i can simulate only VHDL (i am using 
ghdl+gtkwave)
- it is serialized, everything is 1 bit, repeated n times; that is not 
acceptable for my current debug-cpu module (1)

(1) i have designed a debug module which uses a sequencer in order to 
dump the cpu's registers in hardware; this module schedules the CPU 
cycles at a reduced frequency in order to keep pace with the 
debug-uart-tx, which runs at 115200 bps, that means 115.2 kHz.

I did something like this

50 MHz physical clock -----> sequencer ---> frequency divider ---> soft 
core's clock

also, in the soft core i have 5 cpu cycles

cpu.cycle0 - fetch
cpu.cycle1 - decode
cpu.cycle2 - execute
cpu.cycle3 - I/O
cpu.cycle4 - write back <---- the sequencer dumps registers here


the dumper has 36 blocks to transmit to the host; each block is 
composed of 4 bytes of ASCII, 4 bytes of data, plus 2 more bytes for 
CR & LF

{ identifier (ASCII, 3 bytes), separator (ASCII, 1 byte), uint32_t 
(4 bytes, hex), CR, LF }

e.g. {'R','3','1',':',xx,xx,xx,xx, 0d, 0a}


that means the debug-uart-tx has 8+2 bytes to transmit for every cpu 
register

32 cpu registers + PC, EPC, and other stuff, for a total of 36 blocks
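
As a rough check of what this format costs on the wire, here is a small Python sketch that builds one 10-byte block and estimates the time for a full 36-block dump. Several details are assumptions not stated in the post: the 4 data bytes are sent raw and big-endian, short identifiers such as "PC" are padded with spaces, and the UART uses 8N1 framing (10 bits on the wire per byte).

```python
def dump_record(name, value):
    """Build one block: 3-char identifier, ':', 4 raw data bytes, CR, LF."""
    ident = name.ljust(3).encode("ascii")  # assumed: pad short names with spaces
    if len(ident) != 3:
        raise ValueError("identifier must be at most 3 ASCII characters")
    # assumed: big-endian byte order for the 32-bit value
    return ident + b":" + value.to_bytes(4, "big") + b"\r\n"


RECORDS_PER_DUMP = 36   # 32 registers + PC, EPC and other stuff
BAUD = 115200
BITS_PER_BYTE = 10      # assumed 8N1: start bit + 8 data bits + stop bit

record = dump_record("R31", 0xDEADBEEF)
dump_bytes = RECORDS_PER_DUMP * len(record)
dump_seconds = dump_bytes * BITS_PER_BYTE / BAUD

print(record, dump_bytes, dump_seconds)
```

Under these assumptions a full dump is 360 bytes and takes 31.25 ms, so a core that dumps every register after every instruction can retire about 32 instructions per second, which is why the sequencer has to slow the soft core's clock so drastically.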

i have already implemented this "engine" and the whole thing is 
currently working on my Digilent s3e500 board.

as a guinea pig i am using a modified toy version of my previous 
single-cycle MIPS1 soft toy core, which has become
- multi-cycle
- with everything strictly synchronized to the 5 cpu cycles

Having a serialized CPU means … adapting this scheme again, and that 
makes no sense for me.

Author: Michael Engel (Guest)
Legacy,

not liking Verilog is one thing - however, most open cores nowadays are 
written in Verilog, so you are going to restrict your options 
significantly. Free and open source tools such as iVerilog and 
Verilator (which both work with gtkwave) are available for Verilog, too, 
so that should not really be a reason not to use this HDL.

However, I don't want to start an HDL war here. I just wanted to give 
another link to a MIPS implementation that has not been discussed in 
this thread so far - the eMIPS from Microsoft (MS Research, it even 
comes with a complete NetBSD (!) port): 
http://research.microsoft.com/en-us/projects/emips/

Again, this is written in Verilog and pipelined, so it may not fit your 
needs. What is especially interesting about the eMIPS is that you can 
extend the processor by dynamically loading/unloading 
application-specific circuits. Such extensions may add specialized 
instructions to the processor, security monitors, debuggers, or new 
on-chip peripherals. They can be loaded dynamically and plugged into 
the stages of the eMIPS pipelined data path to extend the core 
instruction set of the microprocessor.

Just some food for thought - this won't help with your actual problem 
at hand, but maybe it gives some inspiration for your future work.

-- Michael

Author: Legacy My (Company: my) (legacy)
Posted on:

Rate this post
0 useful
not useful
a MIPS project done by Microsoft Research: simply unbelievable, and 
very interesting! Giano seems very powerful. I will read their papers 
in my free time.

just a question, why Verilog instead of VHDL? Why should i prefer it? 
Any good reason? I mean, why do you prefer it?

Author: Michael Engel (Guest)
Verilog seems to find more use in the industry nowadays and it seems 
more universities are currently teaching it. VHDL is mostly used in 
European companies (and some universities) - which seems a bit strange, 
since its development was initiated by the US Department of Defense 
IIRC. From what I've seen, there are simply more open source hardware 
projects written in Verilog. In theory, the language shouldn't matter: 
both are on a roughly similar description level and interfacing using 
signals/ports is easy. Unfortunately, mixing VHDL and Verilog using 
free simulation tools doesn't seem to be that easy. With Xilinx' 
toolchain and ModelSim, it's rather easy, but that won't help you, 
again.

Btw., I do not have a HDL preference myself, I tend to be pragmatic and 
use what's available. In fact, I used to teach my students SystemC - 
which also seems to be a bit of a dead end right now (and there are no 
really great synthesis tools IMHO), but the trend seems to go towards 
SystemVerilog.

Anyways, the Microsoft eMIPS project has intrigued me since I read about 
it some years ago. It was easy to get it to work on a Xilinx Virtex 5 
XUP board and it seems well written and documented. However, I didn't 
find the time to do a lot with it so far. The guys who worked on that 
project are definitely nice and helpful. It's a bit of a shame that the 
eMIPS didn't get much response.
