Hi, I am looking for a MIPS1 ISA implementation (in VHDL, if possible) that is multi-cycle and not pipelined. Is there anything around?
Hi,

I am not aware of any multi-cycle MIPS variant. You might want to check out the 'ZPU small' instead. It has pretty decent GCC support (although it may need a few string size bug patches when compiling with recent toolchains).

Why would you want to pull apart the MIPS pipeline? If it is to save resources, the ZPU is IMHO the better option.

Greetings,
- Strubi
MIPS1 is pretty simple, maybe you can implement it yourself? There are very few instructions and no particular surprises.
> check 'ZPU small'

It is a stack machine, and I hate that kind of thing.

> Why would you want to pull apart the MIPS pipeline?

Multi-cycle is 1:5 slower than the pipelined version. That is acceptable if you care about low complexity rather than performance. Removing the pipeline means:
- no hazard detection unit
- no hazards, stalls, or glitches
- less code to be written, verified, synthesized and tested
- no branch prediction needed
- no strict delay slot with branches, which means fewer issues with gcc-mips1's object code

It also means fewer difficulties if you want to toy with the core, for example if you want to insert a debug unit that dumps the registers on cycle S4:

S0: fetch
S1: decode
S2: execute
S3: mem
S4: writeback <----- registers dump, from r1 to r31 + PC, EPC

Or if you want to build a toy fault-tolerant version of the CPU, or some such hobby stuff.
greg wrote:
> MIPS1 is pretty simple, maybe you can implement it yourself? There are
> very few instructions and no particular surprises.

Everybody says that, but nobody has posted code. Is it really so easy? Then why can't I see anything that easy on OpenCores? Everybody has an "alpha/beta/not-stable" version of a hyper-complex pipelined MIPS, a never-finished, never-validated toy …

Writing and validating a softcore requires effort, and the more complex you make your softcore, the more effort it takes to validate it! Keep it simple, especially for a time-limited hobby project!

Btw, I am asking for a multi-cycle MIPS1 and I do not want anything else. And yes, if need be I will write it myself; I just do not want to reinvent the wheel, so if such a core exists I am happy to save my development time.
> Everybody has an "alpha/beta/not-stable" version of a hyper-complex
> pipelined MIPS, a never-finished, never-validated toy …

Yeah, a pipelined core is probably at least an order of magnitude harder
to do than a simple multi-cycle one. A multi-cycle core is just a big
but simple state machine.
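The "big but simple state machine" idea can be tried out in software before any HDL is written. What follows is only a rough Python model, not VHDL and not anybody's actual core: it assumes the five states named earlier in the thread (fetch, decode, execute, mem, writeback) and decodes just two MIPS I instructions, ADDU and ADDIU. The class name `MultiCycleCore` and its fields are made up for illustration.

```python
from enum import Enum


class State(Enum):
    FETCH = 0
    DECODE = 1
    EXECUTE = 2
    MEM = 3
    WRITEBACK = 4


class MultiCycleCore:
    """Toy model: one state per clock, one instruction every five clocks."""

    def __init__(self, imem):
        self.imem = imem          # list of 32-bit instruction words
        self.regs = [0] * 32      # r0 stays zero
        self.pc = 0
        self.state = State.FETCH
        self.clocks = 0
        self.retired = 0

    def tick(self):
        """Advance the state machine by exactly one clock."""
        self.clocks += 1
        if self.state is State.FETCH:
            self.ir = self.imem[self.pc >> 2]
            self.state = State.DECODE
        elif self.state is State.DECODE:
            ir = self.ir
            self.op = ir >> 26
            self.rs = (ir >> 21) & 31
            self.rt = (ir >> 16) & 31
            self.rd = (ir >> 11) & 31
            self.funct = ir & 63
            imm = ir & 0xFFFF
            self.imm = imm - 0x10000 if imm & 0x8000 else imm  # sign-extend
            self.state = State.EXECUTE
        elif self.state is State.EXECUTE:
            a = self.regs[self.rs]
            if self.op == 0 and self.funct == 0x21:   # ADDU rd, rs, rt
                self.result, self.dest = a + self.regs[self.rt], self.rd
            elif self.op == 0x09:                     # ADDIU rt, rs, imm
                self.result, self.dest = a + self.imm, self.rt
            else:
                raise NotImplementedError(hex(self.ir))
            self.result &= 0xFFFFFFFF
            self.state = State.MEM
        elif self.state is State.MEM:
            self.state = State.WRITEBACK   # no loads/stores in this sketch
        else:                              # WRITEBACK
            if self.dest != 0:
                self.regs[self.dest] = self.result
            self.pc += 4
            self.retired += 1
            self.state = State.FETCH


# addiu r1, r0, 5 ; addiu r2, r0, 7 ; addu r3, r1, r2
program = [0x24010005, 0x24020007, 0x00221821]
core = MultiCycleCore(program)
while core.retired < len(program):
    core.tick()
print(core.regs[3], core.clocks)
```

Because only one state is active per clock, there is nothing to forward and nothing to stall: each instruction sees the register file exactly as the previous one left it.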
Legacy My wrote:
> Hi, I am looking for a MIPS1 ISA implementation (in VHDL, if possible)
> that is multi-cycle and not pipelined.
>
> Is there anything around?

A pipelined Core ISA1 is available.
http://www.dossmatik.de/mais-cpu.html
> A pipelined Core ISA1 is available.
>
> http://www.dossmatik.de/mais-cpu.html

Exactly, and this is the problem: MAIS is pipelined. That means good, excellent, but "too complex for me". I am looking for a non-pipelined, multi-cycle core.

To be honest, I have already implemented something single-cycle, but it is very ugly and, as I have written, I do not want to reinvent the wheel. I am looking for a multi-cycle core because it is perfect for my purposes. If I do not find one … no problem, I will try to evolve my current ugly toy project.

In short, I do not want a full MIPS_1 core. I want a toy core, as simple as possible, just to do cool experiments. My purpose is doing experiments: I do not want to implement a core, but I need one, and I need it to be as simple as it can be.
@Strubi

CERN has released a custom version of the LM32, which is a RISC core plus devices: in short, a tiny and smart SoC that is light years ahead of the ZPU. Unfortunately it is not MIPS compliant; it is … a different core with a completely different ISA. It's RISC, of course, but it is not MIPS.

You can also look at OpenCores/Atlas, a 16-bit RISC with incredible documentation, well-written code, and a brilliant implementation of fast context switching in CPU space. I was impressed by it: excellent code, great documentation, fun assembly toolchain (gcc/Atlas is … under development as far as I understood, while gcc/LM32 already has a working port, tested and validated), but … it is not what I am looking for.
Hi,

just a few comments:

> - no branch prediction needed

The MIPS1k doesn't do that anyway.

> - no strict delay slot with branches

You'd have to emulate the branch delay slot in the multi-cycle variant, too.

The LM32 arch is nice, but does not beat the ZPU in terms of code size and compactness. I agree that stack machines are kinda dirty when it comes to debugging (GDB just isn't made for it), but on an FPGA, stack machines with switchable window tricks open up a few nice possibilities at similar speed marks.

If it's a toy CPU, I'd recommend going ahead and writing your own. I've had very good experiences with MyHDL for CPU design and verification.
I have played with MIPS16 using abstract microcode instructions (multi-cycle emulation, basically), but it is not synthesizable (yet), and I guess that's not what you're looking for.

Cheers

- Strubi
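For what "emulating the branch delay slot" could look like without a pipeline, here is a small Python sketch of one common approach: keep two program counters, `pc` and `next_pc`, and let a taken branch change only `next_pc`, so the instruction right after the branch still runs. The helper names (`advance`, `branch_target`, `fetch_order`) are invented for this example, and it starts at address 0 and ignores the case of a branch sitting in a delay slot.

```python
def branch_target(pc, offset_words):
    """MIPS I PC-relative target: delay-slot address plus offset * 4."""
    return (pc + 4 + (offset_words << 2)) & 0xFFFFFFFF


def advance(pc, next_pc, taken_target=None):
    """Return the (pc, next_pc) pair after the instruction at pc completes.

    taken_target is the destination of a taken branch, or None.
    The instruction at next_pc (the delay slot) always executes next.
    """
    new_next = taken_target if taken_target is not None else next_pc + 4
    return next_pc, new_next


def fetch_order(taken_branches, steps):
    """List fetched addresses, given {branch_address: offset_words}."""
    pc, next_pc = 0, 4
    order = []
    for _ in range(steps):
        order.append(pc)
        target = None
        if pc in taken_branches:
            target = branch_target(pc, taken_branches[pc])
        pc, next_pc = advance(pc, next_pc, target)
    return order


# A taken branch at 0x08 with offset +3 words jumps to 0x18,
# but the delay slot at 0x0C is still fetched first.
print([hex(a) for a in fetch_order({0x08: 3}, 5)])
```

In a multi-cycle core this amounts to one extra register and a mux in the writeback state, with no flush logic at all.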
>> - no branch prediction needed
>
> The MIPS1k doesn't do that anyway.

Yes, the MIPS R1000 and R2000 don't, but the R3000 does (they are all MIPS_1 as far as the ISA goes and differ in implementation; the R3K also adds caches, which are … another source of speedup and of complexity), especially if you add the pipeline!

With a pipeline you have to speculate on every conditional branch: only two choices are possible and only one will be taken, so you have to speculate. If you speculated on the wrong choice, well … you have to stall the pipeline, trash all the work it has done, and then … reload cpu.reg.PC with the correct position, fetch the correct instruction, and go on.

PowerPC has a very deep issue with this. IBM guys call it "the EIEIO instruction", which is required to force a pipeline stall and flush in order to be sure about the I/O operation. What do I think about it? Let me say, I hate this about PowerPC!

About the pipeline: the fewer stages it has, the smaller the penalty you pay for a wrong branch prediction, because there is less work to be trashed and reloaded. Prediction may speed up conditional branches (fewer penalties means more instructions completed successfully), but it makes your design much more complex, especially if you do dynamic branch prediction, which involves statistical analysis and that kind of stuff.
The easiest solution, of pure foolishness (which has actually been used in cheap commercial RISC-like microcontrollers, something like … Microchip PIC), is to assume that a conditional branch will never take the true branch (or never the false one) and … to be ready to stall the pipeline and reload the work in case this prediction is wrong: you get a 50% probability of doing the right thing or the wrong thing, while saving up to 60% of the complexity :D

Btw, I do not want all of this complexity. I can accept a 1:5 ratio of instructions completed per second to MHz:

50 MHz -> pipelined MIPS can do 50,000,000 instructions/sec
50 MHz -> multi-cycle, non-pipelined MIPS can do 10,000,000 instructions/sec

In short, my 50 MHz clocked FPGA is emulating a CPU that looks like a 10 MHz RISC (that seems less than the performance you can get from a CISC, but … who cares? 10,000,000 instructions/sec is still great for my hobby).

>> - no strict delay slot with branches
>
> You'd have to emulate the branch delay slot in the multi-cycle variant,
> too.

Yes, of course, but it is an "emulation". It is not as critical as in the pipelined version, which means less complexity.

Also, if you compile with gcc "-O0", every branch delay slot is filled with a NOP: the compiler always puts a NOP after every conditional or unconditional branch. That's cool and simplifies the design.

"-O2" is the most critical gcc flag and you MUST take care with the implementation: the compiler will try to leave as few NOP delay slots as it can, so … you have to be very, very careful with the hardware design, especially about glitches and hazards in the pipelined version.

> The LM32 arch is nice, but does not beat the ZPU in terms of code size
> and compactness.
Mmm, the truth is … no RISC core can match the code density of a CISC. For example, the MC6809 has the best code density of all, but it is CISC, it has a complex ISA, and … it eats a lot of resources on an FPGA. I got system09 and played with it. From the assembly point of view (I mean as a firmware guy) I love CISC machines, especially the Motorola 6809 and 68000 families, because they make programming much more of a pleasure than RISC does. Unfortunately, CISC machines … are very frustrating if you want to synthesize them in HDL/RTL.

Btw, the ZPU is a trade-off between compactness and complexity. It is also a stack machine that consumes the fewest resources on the FPGA, and it may consume the least iram (instruction RAM), but its code eats a lot of dram (data RAM), especially for its soft registers.

> but on an FPGA, stack machines with switchable window tricks
> open up a few nice possibilities at similar speed marks.

Yeah, like the "forth1 CPU" used in my GameDuino-v1. This shield ships with a Spartan3-250 FPGA, and a simplified "forth1 CPU" has been put in it to provide a sort of GPU … that sounds "cool", but I can assure you I hate it so much that I have switched to the GameDuino-V2, which has an ASIC GPU with a closed custom RISC machine inside :D

The ZPU is obviously better than forth1, but … let me say: I hate every stack machine!

> If it's a toy CPU, I'd recommend to go ahead and write your own. I've
> made very good experiences with MyHDL concerning CPU design and
> verification.
> I have played with MIPS16 using abstract microcode instructions (multi
> cycled emulation, basically), but it is not synthesizeable (yet), and I
> guess that's not what you're looking for.
Good advice. Today I experimented with ghdl+gtkwave and made the first patches to turn my single-cycle toy softcore into a multi-cycle one. It seems to be working. Obviously I can't believe it (I can't be that lucky and skilled), but I have already gained good experience, a lot more skill, and fun =)
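The 1:5 ratio in this post is simply clock rate divided by cycles per instruction (CPI). A quick Python sketch of that arithmetic follows, with an optional term for branch misprediction cost in a pipelined core. The function name and the branch numbers in the last call (20% branches, 50% mispredicted, 2-cycle flush) are illustrative assumptions, not measurements of any real core.

```python
def instructions_per_second(clock_hz, base_cpi,
                            branch_fraction=0.0,
                            mispredict_rate=0.0,
                            flush_penalty=0):
    """Throughput = clock / effective CPI.

    Effective CPI adds the average flush cost per instruction:
    branch_fraction * mispredict_rate * flush_penalty.
    """
    cpi = base_cpi + branch_fraction * mispredict_rate * flush_penalty
    return clock_hz / cpi


# Ideal pipelined core: one instruction per clock.
print(instructions_per_second(50_000_000, 1))
# Multi-cycle core: five clocks per instruction.
print(instructions_per_second(50_000_000, 5))
# Pipelined core with a hypothetical misprediction cost.
print(instructions_per_second(50_000_000, 1, 0.2, 0.5, 2))
```

The last case shows why the ideal 50,000,000 figure is an upper bound for a pipelined core, while the multi-cycle figure is fixed as long as every instruction takes the same five states.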
About the pipeline and I/O, the PowerPC EIEIO, from IBM PowerPC ISA vol1:

"Enforce In-Order Execution of I/O, or EIEIO, is a machine code instruction used on the PowerPC computer processor which prevents one memory or I/O operation from starting until the previous memory or I/O operation completed. This instruction is needed as I/O controllers on the system bus require that accesses follow a particular order, while the CPU reorders accesses to optimize memory bandwidth usage."

Here is a Freescale doc -> http://cache.freescale.com/files/32bit/doc/app_note/AN2540.pdf
In case you're looking for a really small (mostly) MIPS-I ISA implementation in synthesizable Verilog (built for Altera FPGAs), you might be interested in U Toronto's "Supersmall" softcore:

http://www.eecg.toronto.edu/~jayar/software/SuperSmallProcessor/

IIRC, the SuperSmall core is not pipelined. However, beware: this is a bit-serial implementation of MIPS-I, so it will also be super slow. On the other hand, it can be scaled down to use only 115 Stratix III ALMs (Adaptive Logic Modules).

-- Michael
Nah, it is not compliant with my purposes:
- it is written in Verilog, and I do not want this language: first, I do not like it at all, and second, I can only simulate VHDL (I am using ghdl+gtkwave)
- it is serialized, everything is 1 bit, repeated n times, and that is not acceptable for my current debug-cpu module (1)

(1) I have designed a debug module which uses a sequencer in order to dump the CPU's registers in hardware. This module schedules the CPU cycles at a reduced frequency in order to handle the debug-uart-tx, which runs at 115200 bps, that means 115.2 kHz. I did something like this:

50 MHz physical clock -----> sequencer ---> frequency divider ---> soft core's clock

Also, in the soft core I have 5 CPU cycles:

cpu.cycle0 - fetch
cpu.cycle1 - decode
cpu.cycle2 - execute
cpu.cycle3 - I/O
cpu.cycle4 - write back <---- the sequencer dumps registers here

The dumper has 36 blocks to transmit to the host. Each block is composed of 4 bytes of ASCII, 4 bytes of data, plus another 2 bytes for CR & LF:

{ identifier (ASCII, 3 bytes), separator (ASCII, 1 byte), uint32_t (4 bytes, hex), CR, LF }

e.g. {'R','3','1',':',xx,xx,xx,xx, 0d, 0a}

That means the debug-uart-tx has 8+2 bytes to transmit for every CPU register: 32 CPU registers + PC, EPC, and other stuff, for a total of 36 blocks.

I have already implemented this "engine" and the whole thing is currently working on my Digilent s3e500 board. As a guinea pig I am using a modified toy version of my previous single-cycle MIPS1 soft toy core, which has become
- multi-cycle
- with everything strongly synchronized to the 5 CPU cycles

Having a serialized CPU means … adapting this scheme again, and that makes no sense for me.
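As a sanity check on the dump format described in this post, here is a small Python sketch that builds one 10-byte block and estimates how long a full 36-block dump occupies the UART. Several things are assumptions that the post does not state: the data bytes are sent most significant byte first, short identifiers such as "PC" are padded with a space, the terminator follows the CR, LF order of the field list, and the UART uses 8N1 framing (10 bits on the wire per byte). The function names are made up.

```python
def dump_record(name, value):
    """Build one block: 3-char id, ':', 4 raw data bytes, CR, LF."""
    ident = name.ljust(3).encode("ascii")   # pad e.g. "PC" to "PC " (assumed)
    if len(ident) != 3:
        raise ValueError("identifier must be at most 3 characters")
    data = (value & 0xFFFFFFFF).to_bytes(4, "big")  # byte order assumed
    return ident + b":" + data + b"\r\n"


def dump_time_seconds(blocks=36, bytes_per_block=10,
                      baud=115200, bits_per_byte=10):
    """Time to send a full dump, assuming 8N1 framing (10 bits per byte)."""
    return blocks * bytes_per_block * bits_per_byte / baud


record = dump_record("R31", 0xDEADBEEF)
print(record, len(record))
print(dump_time_seconds() * 1000, "ms per dump")
print(int(dump_time_seconds() * 50_000_000), "clocks at 50 MHz per dump")
```

Under those assumptions one dump takes about 31 ms, which is why the sequencer has to slow the soft core's clock down so drastically if every writeback cycle is followed by a register dump.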
Legacy,

not liking Verilog is one thing. However, most open cores nowadays are written in Verilog, so you are going to restrict your options significantly. Free and open source tools such as iVerilog and Verilator (which both work with gtkwave) are available for Verilog, too, so that should not really be a reason not to use this HDL. But I don't want to start an HDL war here.

I just wanted to give another link to a MIPS implementation that has not been discussed in this thread so far: the eMIPS from Microsoft (MS Research, it even comes with a complete NetBSD (!) port):

http://research.microsoft.com/en-us/projects/emips/

Again, this is written in Verilog and pipelined, so it may not fit your needs. What is especially interesting about the eMIPS is that you can extend the processor by dynamically loading/unloading application-specific circuits. Such extensions may add specialized instructions to the processor, security monitors, debuggers, or new on-chip peripherals. They can be loaded dynamically and plugged into the stages of the eMIPS pipelined data path to extend the core instruction set of the microprocessor.

Just some food for thought. This won't help with your actual problem at hand, but maybe it gives some inspiration for your future work.

-- Michael
A MIPS project done by Microsoft Research: simply unbelievable, and very interesting! Giano seems very powerful. I will read their papers in my free time.

Just a question: why Verilog instead of VHDL? Why should I prefer it? Any good reason? I mean, why do you prefer it?
Verilog seems to find more use in industry nowadays, and it seems more universities are currently teaching it. VHDL is mostly used in European companies (and some universities), which seems a bit strange, since its development was initiated by the US Department of Defense IIRC. From what I've seen, there are simply more open source hardware projects written in Verilog.

In theory, the language shouldn't matter: both are on a roughly similar description level, and interfacing using signals/ports is easy. Unfortunately, mixing VHDL and Verilog using free simulation tools doesn't seem to be that easy. With Xilinx' toolchain and ModelSim it's rather easy, but that won't help you, again.

Btw., I do not have an HDL preference myself. I tend to be pragmatic and use what's available. In fact, I used to teach my students SystemC, which also seems to be a bit of a dead end right now (and there are no really great synthesis tools IMHO), but the trend seems to go towards SystemVerilog.

Anyways, the Microsoft eMIPS project has intrigued me since I read about it some years ago. It was easy to get it to work on a Xilinx Virtex 5 XUP board, and it seems well written and documented. However, I haven't found the time to do a lot with it so far. The guys who worked on that project are definitely nice and helpful. It's a bit of a shame that the eMIPS didn't get much response.