NVM Express™, or NVMe™, is today’s standard interface for high-performance solid-state storage devices. Developed in 2010 and released as a standard a year later, it is a modern I/O architecture free of the limitations of older, legacy device interconnects such as Fibre Channel, SAS, and SATA. Those interfaces work well for the application they were designed for: connecting legacy rotating-media disk drives. NVMe, on the other hand, was designed from the ground up to take advantage of the inherent performance and parallelism of solid-state devices, supporting up to 64K I/O queues, each up to 64K commands deep. That makes it a fairly ideal interface between multi-core processors and high-performance direct-attach storage. Applications for NVMe range from mobile computing to the most demanding workloads in enterprise data centers.


NVMe’s physical connection is based on the PCI Express bus, which provides a direct connection between a server’s processors and the NVMe devices attached to it. This configuration is ideal when populating a server with one or several SSDs, but it has two primary limitations:

  • Scalability is limited, because PCI Express is inherently not designed to extend far outside the box.
  • Directly attached devices offer little support for sharing flash resources among multiple servers.

NVMe over Fabrics enables disaggregation of compute and storage without sacrificing performance.

To solve these challenges, the NVM Express organization and its member companies developed a revolutionary new storage networking standard called NVMe over Fabrics™, commonly referred to as NVMe-oF™; version 1.0 was released in June 2016. By mapping the NVMe command set onto an existing fabric, such as RDMA-capable Ethernet or InfiniBand, or Fibre Channel, NVMe-oF increases the number of devices one can attach to a system from a handful to thousands or more.
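To make “mapping the NVMe command set onto a fabric” concrete: in NVMe-oF, the same 64-byte submission queue entry (SQE) a host would place in a local PCIe submission queue travels inside a “command capsule,” optionally followed by in-capsule data, and completions come back as response capsules carrying the 16-byte completion entry. The Python sketch below packs a simplified NVMe Read SQE and wraps it in a capsule. It is illustrative only: the helper names are hypothetical, the flags and data-pointer (SGL) fields are left zeroed rather than filled in per the specification, and transport-specific framing (RDMA, Fibre Channel, and so on) is omitted.

```python
import struct

NVME_OPC_READ = 0x02  # NVM command set opcode for Read

# Simplified SQE layout (little-endian, 64 bytes total):
# CDW0, NSID, CDW2-3, MPTR, DPTR (16 bytes), CDW10-11 (SLBA), CDW12-15
SQE_FORMAT = "<IIQQ16sQIIII"


def build_read_sqe(cid: int, nsid: int, slba: int, nlb: int) -> bytes:
    """Hypothetical helper: pack a simplified NVMe Read submission queue entry.

    Flags, metadata pointer and the data pointer (SGL descriptor) are left
    zeroed for illustration; a real host must fill them in per the spec.
    """
    if not 0 <= cid <= 0xFFFF:
        raise ValueError("command identifier must fit in 16 bits")
    if not 1 <= nlb <= 0x10000:
        raise ValueError("nlb must be between 1 and 65536")
    cdw0 = NVME_OPC_READ | (cid << 16)  # opcode in bits 7:0, CID in bits 31:16
    return struct.pack(
        SQE_FORMAT,
        cdw0,       # CDW0: opcode, flags (zero here), command identifier
        nsid,       # CDW1: namespace ID
        0,          # CDW2-3: unused here
        0,          # MPTR: metadata pointer (unused here)
        bytes(16),  # DPTR: data pointer / SGL descriptor (zeroed here)
        slba,       # CDW10-11: starting LBA
        nlb - 1,    # CDW12: number of logical blocks, 0-based
        0, 0, 0,    # CDW13-15: unused here
    )


def build_command_capsule(sqe: bytes, in_capsule_data: bytes = b"") -> bytes:
    """Hypothetical helper: an NVMe-oF command capsule is the 64-byte SQE,
    optionally followed by in-capsule data."""
    if len(sqe) != 64:
        raise ValueError("an NVMe submission queue entry is exactly 64 bytes")
    return sqe + in_capsule_data


capsule = build_command_capsule(build_read_sqe(cid=7, nsid=1, slba=2048, nlb=8))
print(len(capsule), capsule[0])  # 64 2
```

The point of the sketch is that the command itself is unchanged: the fabric only has to deliver the capsule, which is why the NVMe command set carries over to remote devices.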

Furthermore, pooling flash capacity immediately becomes practical, allowing datacenters to be designed more efficiently. Unlike a direct-attach approach, pools of flash storage can be scaled independently of compute resources, and vice versa. Servers can access flash not only in their own chassis or rack, but potentially anywhere in the datacenter, or even in remote datacenters. “Stranded” or “dark” flash, capacity that sits unused because only the server it is installed in can reach it, becomes a wasteful thing of the past.
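A back-of-the-envelope sketch of why direct attach strands flash. The server count, drive sizes, usage figures and 20% headroom below are hypothetical, chosen only to illustrate the arithmetic: with direct attach every unused terabyte is stuck in its own server, while a shared pool only has to cover total demand plus some headroom.

```python
# Hypothetical figures for illustration only.
LOCAL_FLASH_TB = 8.0            # flash installed in each server
USED_TB = [7.5, 2.0, 1.0, 6.0]  # what each server actually uses
HEADROOM = 0.20                 # spare capacity kept in a shared pool


def stranded_tb(local_tb: float, used: list[float]) -> float:
    """Installed-but-unused capacity that no other server can reach."""
    return sum(local_tb - u for u in used)


def utilization(local_tb: float, used: list[float]) -> float:
    """Fraction of installed direct-attach flash that is actually in use."""
    return sum(used) / (local_tb * len(used))


def pooled_capacity_tb(used: list[float], headroom: float) -> float:
    """Capacity a shared pool would need to cover total demand plus headroom."""
    return sum(used) * (1 + headroom)


print(stranded_tb(LOCAL_FLASH_TB, USED_TB))             # 15.5
print(round(utilization(LOCAL_FLASH_TB, USED_TB), 3))   # 0.516
print(round(pooled_capacity_tb(USED_TB, HEADROOM), 1))  # 19.8
```

In this made-up example, 32 TB of direct-attached flash is only about half used, while a pool of roughly 20 TB would serve the same demand.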

The key to deploying NVMe-oF is that this additional functionality should not significantly degrade performance when accessing remote rather than local storage resources. This is where selecting the right NVMe-oF solution becomes important.


The architecture and implementation of an NVMe over Fabrics controller will have a direct impact on overall system performance.

What makes Kazan Networks’ solution unique? The answer is simple: hardware vs. firmware. Look inside other solutions and you’ll find one or more embedded microprocessors running firmware. That is an acceptable way to accomplish many I/O processing tasks, but when you need the lowest possible latency, look for a solution that implements as many of those tasks in hardware as possible. More tasks in firmware = SLOWER. More tasks in hardware = FASTER.
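One way to see why firmware becomes the bottleneck is to compute the time budget per I/O. At the 2.8M IOPS figure quoted later in this post, a single processing pipeline has only about 357 ns to handle each I/O. Spreading the work across embedded cores stretches that budget linearly, but every firmware instruction on the path still has to fit inside it. The core counts below are hypothetical, and the sketch assumes I/Os are divided perfectly evenly across cores.

```python
def per_io_budget_ns(target_iops: float, cores: int = 1) -> float:
    """Average time each core can spend on one I/O while sustaining target_iops,
    assuming the load is divided perfectly evenly across `cores`."""
    if target_iops <= 0 or cores < 1:
        raise ValueError("target_iops must be positive and cores at least 1")
    return cores * 1e9 / target_iops


for cores in (1, 2, 4):  # hypothetical embedded-core counts
    print(cores, round(per_io_budget_ns(2.8e6, cores)))
# 1 357
# 2 714
# 4 1429
```

For scale, at a hypothetical 1 GHz core clock, 357 ns is only 357 clock cycles per I/O.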

So just how far can you take this architectural approach? At Kazan Networks, our answer is: all the way! For the fastest, lowest-latency solution available, we implemented the entire I/O path in hardware, including significant blocks of IP such as RDMA engines. Easy to do? No, but solving difficult challenges is what we do.


So how does our leading-edge architecture measure up? The proof is in the performance numbers Kazan can readily demonstrate:

IOPS: Running 4 kB I/Os, Kazan’s Fuji ASIC can push over 2.8M I/Os per second through an NVMe-oF infrastructure.
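IOPS and bandwidth are tied together by the I/O size: bandwidth = IOPS × bytes per I/O. A quick check, assuming “4kB” means 4,096 bytes and reporting GB/s in decimal units (10^9 bytes), shows that 2.8M such I/Os per second already amount to about 11.5 GB/s of payload.

```python
def iops_to_gb_per_s(iops: float, io_size_bytes: int) -> float:
    """Payload bandwidth in GB/s (1 GB = 10**9 bytes) for a fixed I/O size."""
    return iops * io_size_bytes / 1e9


print(round(iops_to_gb_per_s(2.8e6, 4096), 2))  # 11.47
```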

Bandwidth: Running 128 kB I/Os, that same Fuji ASIC can move 11.8 GB/s through a single 100 Gb Ethernet pipe.
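To put 11.8 GB/s in context: 100 Gb Ethernet signals at 100 gigabits per second, or 12.5 GB/s. The sketch below computes the fraction of that raw rate carried as payload and the equivalent I/O rate at 128 kB, assuming 128 kB means 131,072 bytes and GB means 10^9 bytes. It ignores Ethernet, IP and transport header overhead, so the real remaining headroom on the link is even smaller than the result suggests.

```python
LINK_GBIT_PER_S = 100  # 100 Gb Ethernet raw signaling rate


def link_utilization(payload_gb_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of the raw link rate carried as payload (protocol headers ignored)."""
    return payload_gb_per_s * 8 / link_gbit_per_s


def equivalent_iops(payload_gb_per_s: float, io_size_bytes: int) -> float:
    """I/Os per second needed to move payload_gb_per_s at a fixed I/O size."""
    return payload_gb_per_s * 1e9 / io_size_bytes


print(round(link_utilization(11.8, LINK_GBIT_PER_S), 3))  # 0.944
print(round(equivalent_iops(11.8, 128 * 1024)))           # 90027
```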

Latency: Fuji’s internal latency through the ASIC is as low as 430 nanoseconds (billionths of a second). In practice, this means the average latency to a drive, even one of the fastest drives based on 3D XPoint™ technology, is essentially equal to that of the same drive in direct-attach mode. Yes, that’s right: NO latency penalty for remote-attached storage!
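Why can 430 ns vanish into the noise? Because it is small relative to the drive’s own access latency. The sketch below expresses the ASIC’s added latency as a fraction of drive latency. The drive latencies used are hypothetical round numbers for illustration, not measured figures, and the sketch counts only the ASIC’s contribution, not the network or the host adapter.

```python
ASIC_LATENCY_NS = 430  # Fuji internal latency quoted above


def added_latency_fraction(added_ns: float, drive_latency_ns: float) -> float:
    """Extra latency as a fraction of the drive's direct-attach latency."""
    return added_ns / drive_latency_ns


for drive_ns in (10_000, 50_000, 100_000):  # hypothetical drive latencies in ns
    print(drive_ns, f"{added_latency_fraction(ASIC_LATENCY_NS, drive_ns):.1%}")
# 10000 4.3%
# 50000 0.9%
# 100000 0.4%
```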

Net-net: With the right solution, it’s now possible to take advantage of Storage Disaggregation and Composable Infrastructure while losing nothing in terms of performance. If you can reap the benefits of increased storage utilization and business agility at no cost in performance, the only question is… why aren’t you deploying this today?