scott_wilson46's comments

scott_wilson46 | 6 years ago | on: Unleashing Open-Source Silicon

The ASIC flow is far more involved than the FPGA flow: you need to think about all sorts of other things, like what pads you want on your chip, how you are going to generate clocks, floor planning, power distribution, test, and a whole raft of other issues. Going from RTL to GDS2 for even a simple chip would take three months' work as an absolute minimum (and that's just to get it ready to send to the fab). You then have a whole lot of work when it comes back (you need to have boards designed for testing the chip).

scott_wilson46 | 6 years ago | on: Developing open-source FPGA tools

In my day job as an FPGA developer I am using totally open source tools for simulation and verification (the only tool we pay for is Vivado, but you have to pay for that in order to build for the FPGA). The simulator is Icarus Verilog and it does the job pretty well; I use cocotb for writing the testbenches.

scott_wilson46 | 6 years ago | on: FPGA Design for Software Engineers

I disagree about for loops - you actually end up using these quite a lot in VHDL/Verilog (with an understanding of what logic you are going to end up with) if you want to do the same operation on multiple things:

  input  [NUM_OF_MULTIPLIERS*32-1:0] a_in,
  input  [NUM_OF_MULTIPLIERS*32-1:0] b_in,

  output reg [NUM_OF_MULTIPLIERS*64-1:0] mult_out

  reg [31:0] tmp_a, tmp_b;
  reg [63:0] tmp_mult;
  integer i;

  always @(*) begin
    mult_out = {(NUM_OF_MULTIPLIERS*64){1'b0}};
    for (i = 0; i < NUM_OF_MULTIPLIERS; i = i + 1) begin
      tmp_a = a_in >> (i*32);
      tmp_b = b_in >> (i*32);
      tmp_mult = tmp_a * tmp_b;
      mult_out = mult_out | (tmp_mult << (i*64));
    end
  end
This would give you NUM_OF_MULTIPLIERS multipliers. If you wrote each multiply out it would be more code, and it also wouldn't allow you to parameterize the design.
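For what it's worth, here's the same lane-slicing behaviour modelled in plain Python - the kind of golden model I'd compare against from a cocotb test. This is just a sketch (the function name is mine, and NUM_OF_MULTIPLIERS is assumed to be 4); the 32-bit mask mimics the truncation you get when assigning `a_in>>(i*32)` to a 32-bit reg:

```python
NUM_OF_MULTIPLIERS = 4          # assumed parameter value
MASK32 = (1 << 32) - 1

def mult_lanes(a_in, b_in, n=NUM_OF_MULTIPLIERS):
    """Model of the packed multiplier bus: each 32-bit lane of a_in/b_in
    is multiplied and the 64-bit product is packed into mult_out."""
    mult_out = 0
    for i in range(n):
        tmp_a = (a_in >> (i * 32)) & MASK32      # slice lane i (truncate to 32 bits)
        tmp_b = (b_in >> (i * 32)) & MASK32
        mult_out |= (tmp_a * tmp_b) << (i * 64)  # pack the 64-bit product
    return mult_out
```

You can then drive the RTL with random packed vectors from cocotb and assert that `mult_out` matches this model.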

scott_wilson46 | 7 years ago | on: Open Source IDE for FPGAs as QtCreator Learns Verilog

I use it exclusively. I know of other trading firms that use Verilator too. To be honest, no matter how big the company or how deep the pockets, there's still going to be a finite number of Questa licenses available. Icarus Verilog lets you farm out simulations anywhere and run as many in parallel as you like, which would not be possible with Questa (you would eventually run out of licenses). Also, I think Icarus Verilog actually works pretty well: it covers enough of the SystemVerilog syntax to be useful for RTL, and with cocotb I don't need the SystemVerilog testbench features.

scott_wilson46 | 7 years ago | on: Open Source IDE for FPGAs as QtCreator Learns Verilog

I use Icarus Verilog at work for a fairly complex trading system on an FPGA, so I disagree that FOSS simulators are almost totally useless. It supports most SystemVerilog features and works well with cocotb. In fact, it being open source also means I don't have to worry about license usage (which has always been a problem using ModelSim). I managed quite well abstracting the Xilinx IP (most of it, like RAMs, can be inferred in the code), and things like PCIe, transceivers and DDR4 have well defined interfaces so they are easy to model in straight Verilog.

scott_wilson46 | 7 years ago | on: Ask HN: How to get started developing with FPGA?

I disagree with this: you don't necessarily need a whole team of people and massive amounts of cash to do FPGA development, and you don't necessarily need expensive tools. For my current company I created a complete FPGA-based trading system from scratch on my own with free tools (apart from Vivado, which I just used to turn my RTL into an actual design I could put onto the FPGA board). The board I used cost around £2k and the Vivado tools were £4k (although if I were doing it again, it appears you can just pay for your usage of Vivado in the cloud; Nimbix has machines with the Vivado suite on them). The cost to the company for this is pretty much my salary plus the board costs.

scott_wilson46 | 8 years ago | on: Ask HN: Where do I get started on ASICs, FPGA, RTL, Verilog et. al?

I've had a different experience to this. I've worked on ASICs for over ten years and have had experience with nearly all aspects of the design flow (from RTL all the way to GDS2 at one point or another). I've taped out probably 20+ chips (although I've been concentrating on FPGAs for the last three years). Every chip that I've taped out has had extensive FPGA prototyping done on the design, in a variety of different areas too (Bluetooth, GPUs, CPUs, video pipelines, etc). You can just get a hell of a lot more cycles through an FPGA prototyping system than through an RTL sim, and when you are spending a lot of money on the ASIC masks you want a chance to soak test the design first.

scott_wilson46 | 9 years ago | on: Free Range VHDL – VHDL programming book available for free

I think you can treat 60-70% of the FPGA design flow as open source. For example, I am developing a system using PCIe, 10GBASE-T and some logic to send and receive network packets, and to design the HDL and test it I am predominantly using two open source tools (Icarus Verilog and cocotb). I just use the FPGA P&R tools for building the design once I am satisfied it works. You can also run these tools on the command line quite easily and automate most of the process (they all use Tcl for scripting up the flow). Sure, there are a few FPGA-specific interfaces you have to deal with (transceivers, DDR4, PCIe hard IP) but you can pretty much treat these as black boxes and write your tests to target the interfaces in and out of the logic. Also, for things like transceivers, the interface is really not that different between Xilinx and Altera (I treat them as a black box that generates 32 bits every 322MHz cycle for 10GBASE-T).

The flow to my mind is not that dissimilar to a traditional software development flow: I have simulation tests and test cases, I use continuous integration to run tests every time something is committed, every time I build the FPGA with the P&R tools I kick off hardware tests automatically, etc.
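To illustrate what "treat them as a black box" means in practice, here's a rough plain-Python sketch of the sort of transceiver model you'd drive from a cocotb test (the function name is mine, and little-endian byte ordering on the parallel bus is an assumption): it just presents an incoming byte stream 32 bits at a time, one word per parallel-clock cycle.

```python
def transceiver_words(payload: bytes):
    """Black-box model of the transceiver RX parallel interface: yield the
    byte stream as 32-bit words, one per ~322MHz cycle (little-endian
    byte order assumed; real ordering depends on the transceiver config)."""
    # pad to a multiple of 4 bytes so the last word is complete
    data = payload + b"\x00" * (-len(payload) % 4)
    for i in range(0, len(data), 4):
        yield int.from_bytes(data[i:i + 4], "little")
```

The tests then only ever talk to this interface, so the same testbench works whether the silicon underneath is Xilinx or Altera.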

scott_wilson46 | 9 years ago | on: Liberouter Combo Cards – FPGA boards focused on network data processing

In actual fact, for Xilinx-based FPGAs this is quite straightforward. An example for 10GBASE-R:

You can get Xilinx's component for their PMA/PCS for 10GBASE-R Ethernet for free from Vivado and stick one of the MACs from OpenCores on the end of it (probably this: http://opencores.org/project,xge_ll_mac - I used it for prototyping and it seems to work, before creating my own PCS/PMA block and MAC to cut down the latency).

Once you have that, you need to deal with the Ethernet frames streaming through the FPGA, probably 64 bits at a time at 156MHz for 10G, so you need to pull out the fields you are interested in (like MAC addresses, IP addresses, etc). You can buffer the incoming packet into a FIFO whilst waiting for the fields you want to filter on. Once you have all your fields you can decide whether to pass the packet through to the TX side or not (I usually read the packet out of the FIFO either way and just hold the valid low for packets I don't want to send).
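A toy software model of the field extraction might make this concrete. This is just a sketch of the idea, not the RTL (the function name is mine, and I'm assuming the 64-bit bus carries the frame big-endian, first byte in the most significant position - the real ordering depends on your MAC): reassemble the first couple of words and slice out the Ethernet header fields.

```python
def parse_headers(words):
    """Pull dst MAC, src MAC and EtherType out of a frame that streams
    through 64 bits (8 bytes) at a time (no VLAN tag handling)."""
    # first two bus words = first 16 bytes of the frame, which covers
    # the 14-byte Ethernet header
    hdr = b"".join(w.to_bytes(8, "big") for w in words[:2])
    dst_mac   = hdr[0:6]
    src_mac   = hdr[6:12]
    ethertype = int.from_bytes(hdr[12:14], "big")
    return dst_mac, src_mac, ethertype
```

In hardware the equivalent is just registering the right byte lanes on the right cycles while the rest of the packet sits in the FIFO.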

Hope this makes sense!

scott_wilson46 | 9 years ago | on: Developer Preview – EC2 Instances with Programmable Hardware

I've heard (although admittedly never seen in practice) that some places take a long time for this sort of thing (running over a cluster of computers overnight). If you could do the same job on a single F1 instance in, say, an hour, then I think that would be compelling! Bearing in mind that simple experiments I did showed an improvement of around 100x over a GPU for this sort of task.

scott_wilson46 | 9 years ago | on: Developer Preview – EC2 Instances with Programmable Hardware

Nowadays, I don't believe you need a paid-for simulator like Questa, VCS, etc. In my day job I am developing Verilog for FPGAs using Icarus Verilog (an open source simulator), which works fine for fairly large real-world designs and supports quite a lot of SystemVerilog too (I am also using cocotb for testing my code).

scott_wilson46 | 10 years ago | on: Free FPGA: Reimplement the primitives models

It should be possible to write the majority of the code for an FPGA in a generic fashion and get the tools to infer things like RAMs from the way the Verilog or VHDL is written. Ideally, I think you should only have FPGA-specific blocks at the very top level of a design, and the majority of the design should be agnostic to the FPGA architecture. For example, if you write your code like this:

  reg [31:0] mem[0:1023];
  reg [31:0] rd_data;

  always @(posedge clk) begin
    rd_data <= mem[rd_addr];
    if (wr_en)
      mem[wr_addr] <= wr_data;
  end
Then tools like Vivado, Quartus and Synplify will infer a 1k x 32-bit RAM.
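If it helps, here's the behaviour of that inferred RAM sketched as a plain-Python model (the class is hypothetical, just the sort of golden model I'd check against from cocotb): the read is registered, so `rd_data` reflects the address presented on the previous clock edge, and a simultaneous write to the same address returns the old data, matching the nonblocking assignments above.

```python
class SyncRam:
    """Software model of the inferred 1k x 32 synchronous-read RAM."""
    def __init__(self, depth=1024):
        self.mem = [0] * depth
        self.rd_data = 0

    def clock(self, rd_addr, wr_en=False, wr_addr=0, wr_data=0):
        """One posedge of clk: read before write, as in the RTL."""
        self.rd_data = self.mem[rd_addr]
        if wr_en:
            self.mem[wr_addr] = wr_data
```

Note the read-before-write ordering: whether the FPGA block RAM gives you old or new data on a same-address collision is a configurable attribute of the hardware, which is exactly why pinning it down in a model like this is worth doing.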

scott_wilson46 | 10 years ago | on: An open source Xilinx Spartan 6 miniPCIe development board

Actually the IC design tools are a lot more expensive than 50k to 100k. You need a variety of tools for a modern process node (Calibre being one of them, usually used for LVS/DRC - checking the GDS2 against the schematic and checking the process design rules). You also need a synthesis tool to convert RTL to a gate-level representation (usually another 100k or so), a place and route tool (usually many hundreds of k), simulators (probably around 50k again), tools for inserting test logic (again around 50k), and timing analysis tools (probably around 50k or more). Usually you have a bunch of timing analysis tools, as you need to check timing at a variety of process corners and temperatures at the same time. There are also other tools that get used at various points in the flow that all seem to cost a lot of money too (like tools for analyzing static and dynamic IR drop, and logical equivalence tools for formally checking that the gate-level description matches the RTL). So you can see you can end up spending a million dollars fairly easily.

scott_wilson46 | 10 years ago | on: FPGAs for numerical mathematics using CLaSH

I think this is a common misconception in the debate about GPUs vs FPGAs. If you take a top-of-the-range GPU you get 2.7 teraflops of performance (according to the GTX Titan review I just looked at: http://www.techspot.com/review/977-nvidia-geforce-gtx-titan-...). Comparing this to a top-of-the-range Stratix 10 FPGA, you get 3.2 teraflops (https://www.altera.com/content/dam/altera-www/global/en_US/p...), so there is really not much in it.

scott_wilson46 | 11 years ago | on: The Cx programming language

I think if you can crack the modelling and simulation of asynchronous clock domains, then you will have something the other HLS solutions don't have at the moment, and that would be an incredibly useful feature. Design with async clocks is difficult and I have seen loads of bugs on these interfaces (including bugs found in the field in chips that had been released many years previously).