I need single- and double-precision matrix-matrix multiply blocks on an FPGA for a comparison. Can anyone help me find open-source code for this? Preferably something that uses only on-chip memory.

I also need something to do with FPGA and am not able to describe exactly what I want, so does anybody have universal code to solve my problem? It's urgent.

I need it to compare FPGA computational performance. I am using a Xilinx Virtex-6 XC6VLX130T. I can generate floating-point adders and multipliers with the Xilinx CORE Generator. What I am interested in is an architecture that maximizes speed by using as many of the device's resources as possible (a maximally parallel architecture). Any tutorial or existing code for any such architecture would be helpful.
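
To make "maximally parallel" concrete, here is a minimal software sketch of one possible organization, not a definitive design: a pool of multiply-accumulate (MAC) units, each built from one floating-point multiplier and one adder, where each unit owns one element of the result matrix and consumes one pair of operands per cycle. The function name `matmul_mac_array` and the parameter `num_pes` are hypothetical, chosen only for illustration. The model assumes every unit can be fed from on-chip memory each cycle and ignores the pipeline latency of the floating-point cores, so the cycle count is an idealized lower bound.

```python
def matmul_mac_array(a, b, num_pes):
    """Idealized model of C = A x B on a pool of num_pes MAC units.

    Each MAC unit is assigned one output element C[i][j] and performs
    one multiply-accumulate per cycle. Returns (C, cycle_count).
    Assumption: operand fetch never stalls and the adder has no latency.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    outputs = [(i, j) for i in range(rows) for j in range(cols)]
    cycles = 0
    # Work through the output elements in tiles of at most num_pes.
    for start in range(0, len(outputs), num_pes):
        tile = outputs[start:start + num_pes]
        for k in range(inner):
            # In hardware, every (i, j) in the tile updates in the same cycle.
            for i, j in tile:
                c[i][j] += a[i][k] * b[k][j]
            cycles += 1
    return c, cycles


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    for pes in (1, 2, 4):
        result, cycles = matmul_mac_array(a, b, pes)
        print(pes, cycles, result)
```

In this model, the number of MAC units that fit on the device sets the throughput: with one unit per output element, the product takes as many cycles as the inner dimension, and with fewer units the cycle count grows in proportion to the number of tiles. On real hardware the adder's pipeline depth and the on-chip memory bandwidth would also need to be accounted for.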