Of State Machines and Car Factories

Published by Bryan Cowger

How Kazan Networks architects and builds the world’s fastest NVMe-oF solutions.

We’re a bit different here at Kazan Networks in the way we go about creating semiconductor products. While the rest of the tech world is cranking out embedded processor-based solutions, we take a different approach: create powerful and intelligent protocol solutions using hardware state machine-based architectures.

How common are embedded processors as the “smarts” inside today’s chips? Look no further than ARM Holdings, recently purchased by SoftBank for $32B. That’s success by any measure. The acronym SoC (system on a chip) implies one or more embedded processors, and in many people’s lexicon it seems to have become the new term for ASIC (application-specific integrated circuit). Why? Because so many people associate an SoC-type architecture with doing a custom chip.

But while that’s a common way to implement a semiconductor solution, it’s not the only way, and for some applications it’s definitely not the best way. HW-based state machine architectures, while harder to implement, result in a significantly more efficient product in the end. More on that later, but first: what is a HW-based state machine anyway? In essence, it’s a very task-specific piece of logic that executes its specific task very quickly and efficiently.

[Figure: simple state machine diagram]

Here’s an analogy from the car manufacturing industry: let’s say you design a car factory around a single worker (ridiculous, I know, but stick with me on this one). This worker is pretty talented, can do any of the car assembly tasks from start to finish, and can certainly build you a car in the end. In this situation, the single worker has to walk the car through the entire factory, beginning with the raw frame, adding every nut and bolt along the way, and ending with the sticker in the window. But as you’d expect, it takes a LONG time to build a car this way. To make up for this inefficient factory design, you make your worker as powerful and fast as you can, but nonetheless this single worker just can’t crank cars out fast enough. So you hire 8 or 16 of these “super workers” and hope that they can keep up…

Now we all realize that this isn’t how you design an assembly line – the better way is to build numerous, task-specific stations along the build floor and then have them all working in *parallel*, each station executing on its specific task while the other dozens of stations are simultaneously doing the same.

Here’s how that all relates to SoC vs. state machine designs. In an SoC approach, the powerful embedded processor is your “super worker”: very general purpose in nature and able to do all the tasks in the factory, but there’s just one of it and it can’t keep up. So you embed 8 or 16 of these processors in your design and hope that will be fast enough. But recall that you had to make this worker very powerful, and it’s just the same with an embedded processor: it’s large and power hungry. The HW-based state machine is the equivalent of a station that installs the brakes, or the seats, or the exhaust. Each station is not general purpose, by design, but together they’re the superior way to do a lot of tasks in parallel.

Here’s the bottom line: in an efficient HW-based design, you literally have thousands of decisions being made inside your ASIC every clock cycle, and thousands of actions being taken during that same clock cycle. An embedded processor, by contrast, might make ONE decision in a few clock cycles and then take ONE action, based on that decision, in a few more. Because you get so little done per clock cycle with an embedded processor, you need to crank up the clock speed, which is why many such SoC designs run at speeds well north of a gigahertz. An efficient HW-based design, on the other hand, can lope along at speeds in the 100s of megahertz and still outperform an SoC by a mile, and do so at a fraction of the power of that high-speed processor (or worse yet, 16 or more of them!)

This is why we here at Kazan design chips based on HW-based engines. And it’s how we’ve done ASICs all our careers, going back to the Tachyon chip at HP that this team invented, and including the industry’s first HW-based iSCSI controller (which eventually formed the ISP / QLE4000 line of products for QLogic). The guy who architected both of those solutions is Mike Thompson, and he’s the CTO here at Kazan Networks. He and the team are at it again, having architected and implemented the world’s fastest NVMe-oF bridge.

And we’re not talking about trivial little state machines… these are complex protocols like Ethernet, TCP, RoCE, iWARP, and NVMe, all done in hardware and moving commands and data through the chip in an extremely efficient manner. Back in August 2015, when we launched this company at IDF, we made the statement that our ASIC would deliver less than 500 nanoseconds of latency, and now you know how we accomplish that.

Today we’re celebrating the two-year anniversary of the founding of the company, and we’re proud of what we have accomplished during those two years. Here’s to a great holiday break for everyone out there and to an even better 2017!


