
iSCSI vs NVMe-oF: Performance Comparison

  • April 4, 2024
  • 22 min read
StarWind DevOps Team Lead. Volodymyr possesses broad expertise in virtualization, storage, and networking, with exceptional experience in architecture planning, storage protocols, hardware sourcing, and research.

Introduction

When it comes to storing data for businesses, iSCSI has long been a go-to solution thanks to its versatility and near-universal compatibility with systems big and small. However, iSCSI is an extension of the legacy SCSI storage protocol, which is why it carries limitations in latency, performance, and CPU overhead that become even more apparent with modern storage such as NVMe. This is where NVMe over Fabrics (NVMe-oF) comes in, addressing iSCSI’s weak points and enabling much faster and more efficient storage solutions. NVMe-oF extends the capabilities of NVMe (Non-Volatile Memory Express) across networks, promising better performance and scalability for today’s data-heavy workloads.

While iSCSI is adaptable, it struggles with speed and efficiency in demanding environments. NVMe-oF offers a compelling alternative, using NVMe end to end to provide faster data access with lower latency. By running over RDMA (Remote Direct Memory Access) or standard TCP/IP networks, NVMe-oF removes the bottlenecks that slow down traditional iSCSI setups, making it a smart choice for businesses that need high-performance storage.

Purpose

So, why are we here? This evaluation aims to thoroughly understand how well the Storage Performance Development Kit (SPDK) iSCSI target performs on Linux in high-performance storage scenarios. By carefully testing and analyzing its strengths and weaknesses, we hope to offer valuable insights for technical decision-makers striving to get the most out of their storage systems.

Additionally, we want to compare the SPDK iSCSI target with SPDK NVMe-oF to see how they stack up against each other in terms of speed, efficiency, and suitability for different tasks. Our goal is to help businesses make informed decisions about their storage setups, so they can pick the right technology to meet their needs and goals.

Methodology

In our benchmarking, we employ fio, a flexible I/O tester, to run various test patterns, including random and sequential read/write operations at different block sizes. The tests were executed both locally and remotely over iSCSI to measure SPDK iSCSI target performance.

The following patterns have been used during the benchmark:

  • Random read/write operations at 4k and 64k block sizes.
  • Sequential read/write at 1M block size.

In this article, we provide the specific test parameters (Numjobs and IOdepth) that we identified as delivering the best performance-to-latency ratio in our set of tests.
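
As an illustration, the fio runs behind these patterns can be scripted along the following lines. This is a minimal Python sketch rather than the exact harness used here: the device path, runtime, I/O engine, and the Numjobs/IOdepth placeholders are assumptions.

    import subprocess

    # Placeholder device path: in this benchmark it would be the local Optane NVMe
    # namespace or the remote iSCSI/NVMe-oF block device, depending on the run.
    DEVICE = "/dev/nvme0n1"

    # The patterns used in this benchmark: (job name, fio rw mode, block size).
    PATTERNS = [
        ("randread-4k",   "randread",  "4k"),
        ("randwrite-4k",  "randwrite", "4k"),
        ("randread-64k",  "randread",  "64k"),
        ("randwrite-64k", "randwrite", "64k"),
        ("seqread-1m",    "read",      "1m"),
        ("seqwrite-1m",   "write",     "1m"),
    ]

    def run_fio(name, rw, bs, numjobs, iodepth, runtime=60):
        """Run one fio job and return its JSON report as a string."""
        cmd = [
            "fio",
            f"--name={name}",
            f"--filename={DEVICE}",
            f"--rw={rw}",
            f"--bs={bs}",
            f"--numjobs={numjobs}",
            f"--iodepth={iodepth}",
            "--ioengine=libaio",          # assumption: the article does not state the engine
            "--direct=1",                 # bypass the page cache
            "--time_based",
            f"--runtime={runtime}",
            "--group_reporting",
            "--output-format=json",
        ]
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    if __name__ == "__main__":
        for name, rw, bs in PATTERNS:
            # Numjobs/IOdepth below are placeholders; the tuned values per pattern
            # are listed in the result tables later in the article.
            print(run_fio(name, rw, bs, numjobs=1, iodepth=1))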

In real-world production, we have to work with devices that live in kernel space and cannot be claimed by SPDK in user space (software RAID, hardware RAID, fault tolerance in any form), so we use the AIO module to test the scenarios closest to production.

We understand that the AIO module adds its own overhead, so with it we get lower performance than if SPDK had claimed the NVMe device directly in user space.
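
For reference, SPDK picks up such kernel-space devices through its AIO bdev module, configured via the rpc.py helper shipped with SPDK. The minimal Python sketch below shows the idea; the rpc.py path, device path, bdev name, and block size are assumptions, and the exact argument order should be checked against the SPDK v23.05 documentation.

    import subprocess

    RPC = "scripts/rpc.py"   # assumption: path to rpc.py inside the SPDK source tree

    def rpc(*args):
        """Call an SPDK JSON-RPC method through the bundled rpc.py helper."""
        subprocess.run([RPC, *args], check=True)

    # Wrap a kernel block device (here the Optane NVMe namespace; it could equally be
    # an md or hardware RAID volume) into an SPDK AIO bdev. Positional arguments are
    # filename, bdev name, and optional block size; verify against your SPDK version.
    rpc("bdev_aio_create", "/dev/nvme0n1", "aio0", "512")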

Testbed configuration

The testbed for this benchmarking involved two Supermicro nodes with high-performance hardware, equipped with Intel® Optane™ SSDs and ConnectX®-5 EN 100GbE NICs.

Architecture overview

[Architecture diagram: Target node and Client node connected over a 100GbE link]

Target Node

Hardware

Target node Supermicro (SYS-220U-TNR)
CPU Intel(R) Xeon(R) Platinum 8352Y @2.2GHz
Sockets 2
Cores/Threads 64/128
RAM 256 GB
Storage 1x Intel® Optane™ SSD DC P5800X Series (400GB)
NICs 1x ConnectX®-5 EN 100GbE (MCX516A-CDAT)

Software

OS Ubuntu 20.04.6 (5.15.0-91-generic)
SPDK v23.05
fio 3.16

Client Node

Hardware

Client node Supermicro (2029UZ-TR4+)
CPU 2x Intel® Xeon® Platinum 8268 Processor @ 2.90GHz
Sockets 2
Cores/Threads 48/96
RAM 96 GB
NIC 1x ConnectX®-5 EN 100GbE (MCX516A-CDAT)

 

Software

OS Ubuntu 20.04.6 (5.4.0-155-generic)
fio 3.16
nvme-cli 1.9

Benchmark results

To start, below are the benchmarking results that compare the baseline performance of a local NVMe device with that of a remote Linux SPDK iSCSI target across a range of block sizes.

Local NVMe vs SPDK iSCSI Target

Test configurations:

  • Linux (local): 1x Intel® Optane™ SSD DC P5800X Series (400GB)
  • Linux iSCSI initiator (remote): 1x Intel® Optane™ SSD DC P5800X Series (400GB) via AIO, 6x LUNs (6 logical volumes), multipathd enabled, 4x sessions per LUN

Each result below lists Numjobs/IOdepth, IOPS, MiB/s, latency (ms), and CPU usage for the local and remote runs, followed by the remote result as a percentage of the local one (Comparison %: IOPS / MiB/s / latency / CPU).

  • Random read 4K: local (6/4) 1,552,000 IOPS, 6,061 MiB/s, 0.015 ms, 4% CPU; remote (32/32) 1,272,000 IOPS, 4,969 MiB/s, 0.806 ms, 47% CPU; comparison 82% / 82% / 5370% / 1175%
  • Random write 4K: local (6/4) 1,160,000 IOPS, 4,532 MiB/s, 0.020 ms, 3% CPU; remote (32/32) 1,130,000 IOPS, 4,414 MiB/s, 0.906 ms, 42% CPU; comparison 97% / 97% / 4528% / 1400%
  • Random read 64K: local (2/2) 109,000 IOPS, 6,799 MiB/s, 0.036 ms, 1% CPU; remote (6/8) 111,000 IOPS, 6,946 MiB/s, 0.431 ms, 4% CPU; comparison 102% / 102% / 1197% / 400%
  • Random write 64K: local (2/2) 75,500 IOPS, 4,717 MiB/s, 0.052 ms, 1% CPU; remote (6/8) 74,700 IOPS, 4,670 MiB/s, 0.641 ms, 4% CPU; comparison 99% / 99% / 1233% / 400%
  • Sequential read 1M: local (1/2) 7,034 IOPS, 7,034 MiB/s, 0.284 ms, 1% CPU; remote (3/2) 6,877 IOPS, 6,877 MiB/s, 0.871 ms, 1% CPU; comparison 98% / 98% / 307% / 100%
  • Sequential write 1M: local (1/2) 4,790 IOPS, 4,790 MiB/s, 0.417 ms, 1% CPU; remote (3/2) 4,553 IOPS, 4,553 MiB/s, 1.316 ms, 1% CPU; comparison 95% / 95% / 316% / 100%

Now, let’s get straight to the point and analyze the initial benchmarking results. Local NVMe storage delivers strong baseline performance for random 4K reads and writes, hitting up to 1.55 million IOPS. The SPDK iSCSI target achieves 82% of this local performance, at roughly 1.27 million IOPS. However, for larger block size reads/writes, the performance of local storage and the SPDK iSCSI target is nearly identical.

Digging deeper into latency, we observe that the baseline latency of local storage for 4K reads and writes is 0.015 ms and 0.020 ms, respectively. In contrast, the SPDK iSCSI target’s 4K read and write latency is roughly 50 times higher, at 0.806 ms and 0.906 ms, due to the network hop and the extra protocol processing introduced by iSCSI.

CPU utilization for the SPDK iSCSI target is considerably higher than for local NVMe because the SPDK target software must process each iSCSI command, and more IOPS translates into a greater CPU load. During local storage tests, CPU usage ranged from 1% for 1M block sizes to 4% for 4K random reads, whereas the SPDK iSCSI target demonstrated significantly higher CPU usage, from 1% for 1M blocks to 47% for 4K random reads.
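
To make the Comparison % column in the table above concrete, here is how the 4K random read ratios can be reproduced from the raw values (a small illustrative calculation in Python, not part of the benchmark tooling):

    # Local NVMe vs. remote SPDK iSCSI target, 4K random read (values from the table above).
    local_iops, remote_iops = 1_552_000, 1_272_000   # IOPS
    local_lat,  remote_lat  = 0.015, 0.806           # latency, ms
    local_cpu,  remote_cpu  = 4, 47                  # CPU usage, %

    # The Comparison % column expresses the remote result as a percentage of the local one.
    print(f"IOPS:    {remote_iops / local_iops:.0%} of local")  # ~82%
    print(f"Latency: {remote_lat / local_lat:.0%} of local")    # ~5373% (rounded to 5370% in the table)
    print(f"CPU:     {remote_cpu / local_cpu:.0%} of local")    # 1175%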

In conclusion, the SPDK iSCSI implementation offers good overall performance, but at the cost of substantially higher CPU load and I/O latency on smaller block sizes (4K RR/RW).

SPDK iSCSI vs SPDK NVMe-oF

Let’s now compare the performance of SPDK NVMe-oF and SPDK iSCSI targets.

Below are tables and charts displaying test results for various block sizes, covering parameters like IOPS, latency, CPU usage, and bandwidth. Beneath each figure is an analysis evaluating the efficiency of NVMe-oF over RDMA, NVMe-oF over TCP, and iSCSI.

SPDK iSCSI Target results

Linux iSCSI initiator (remote)
1x Intel® Optane™ SSD DC P5800X Series (400GB) via AIO
6x LUNs (6 logical volumes), multipathd enabled
4x sessions per LUN
Pattern Numjobs IOdepth IOPS MiB/s Latency (ms) CPU usage
Random read 4k 1 1 22600 88 0.043 1%
Random write 4k 1 1 8716 34 0.113 1%
Random read 4k 32 32 1272000 4969 0.806 47%
Random write 4k 32 32 1130000 4414 0.906 42%
Random read 64K 6 8 111000 6946 0.431 4%
Random write 64K 6 8 74700 4670 0.641 4%
Sequential read 1M   3 2 6877 6877 0.871 1%
Sequential write 1M 3 2 4553 4553 1.316 1%

SPDK NVMe-oF target over RDMA results

Linux NVMe-oF Initiator (remote, RDMA)
1x Intel® Optane™ SSD DC P5800X Series (400GB) via AIO
Pattern Numjobs IOdepth IOPS MiB/s Latency (ms) CPU usage
Random read 4k 1 1 51100 200 0.018 1%
Random write 4k 1 1 28000 113 0.033 1%
Random read 4k 8 16 1558000 6085 0.082 7%
Random write 4k 8 8 1143000 4467 0.055 5%
Random read 64K 3 2 108000 6755 0.055 1%
Random write 64K 3 2 74800 4677 0.079 1%
Sequential read 1M   1 2 7033 7033 0.283 1%
Sequential write 1M 1 2 4786 4786 0.417 1%

SPDK NVMe-oF target over TCP results

 Linux NVMe-oF Initiator (remote, TCP)
1x Intel® Optane™ SSD DC P5800X Series (400GB) via AIO
Pattern Numjobs IOdepth IOPS MiB/s Latency (ms) CPU usage
Random read 4k 1 1 28600 112 0.034 1%
Random write 4k 1 1 15500 61 0.063 1%
Random read 4k 18 32 1503000 5871 0.382 19%
Random write 4k 18 16 1102000 4305 0.26 12%
Random read 64K 6 4 109000 6812 0.219 4%
Random write 64K 6 4 75700 4730 0.316 3%
Sequential read 1M   3 2 6756 6756 0.886 1%
Sequential write 1M 3 2 4540 4540 1.32 1%
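
For context, the remote NVMe-oF initiator configurations used for the two tables above differ only in the transport type passed to nvme-cli when connecting to the SPDK NVMe-oF target. The Python sketch below illustrates that connect step; the target address, subsystem NQN, and port are placeholders rather than the actual values from this testbed.

    import subprocess

    TARGET_ADDR = "192.168.100.1"                 # placeholder: target IP on the 100GbE link
    TARGET_NQN  = "nqn.2016-06.io.spdk:cnode1"    # placeholder: NVMe-oF subsystem NQN
    SVC_ID      = "4420"                          # conventional NVMe-oF service port

    def nvme_connect(transport):
        """Connect to the NVMe-oF target over the given transport ('rdma' or 'tcp')."""
        subprocess.run([
            "nvme", "connect",
            "-t", transport,     # transport type
            "-a", TARGET_ADDR,   # target address (traddr)
            "-s", SVC_ID,        # transport service id (trsvcid)
            "-n", TARGET_NQN,    # subsystem NQN
        ], check=True)

    # Only the transport argument changes between the RDMA and TCP test runs;
    # the exposed namespace is then benchmarked with the same fio patterns.
    nvme_connect("rdma")   # or nvme_connect("tcp")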

Visualizing results in charts

With all benchmarks completed and data collected, let’s now compare the achieved results using graphical charts for a clearer understanding.


Figure 1: 4K RR (IOPS), 1 thread

Figure 1 shows the number of Input/Output Operations Per Second (IOPS) achieved during 4K random read operations with Numjobs = 1 and IOdepth = 1. It reveals that NVMe-oF over RDMA significantly outperforms the other two protocols in the single-thread scenario: NVMe-oF over TCP delivers 44% fewer IOPS, and iSCSI 55.77% fewer.

 


Figure 2: 4K RR (Latency), 1 thread

Figure 2 compares the latency when reading 4K data blocks with a single thread. The results indicate that NVMe-oF over RDMA has the lowest latency, followed by NVMe-oF over TCP, and then iSCSI.

 


Figure 3: 4K RW (IOPS), 1 thread

Figure 3 shows the IOPS achieved with a 4K random write pattern. In the single-thread scenario, NVMe-oF over RDMA again comes out on top: NVMe-oF over TCP delivers 44.64% fewer IOPS, and iSCSI 68.87% fewer.


Figure 4: 4K RW (Latency), 1 thread

Figure 4 showcases the latency results for a 4K block size. The test findings indicate that NVMe-oF over RDMA delivers the lowest latency in single-thread scenarios, outperforming both NVMe-oF over TCP and iSCSI. Clearly, lower latency is better.

 


Figure 5: 4K RR (IOPS), Multiple threads

Figure 5 illustrates the IOPS achieved during 4K random read operations. NVMe-oF over RDMA outperforms NVMe-oF over TCP by 3.5% and iSCSI by 18.35% in terms of IOPS.


Figure 6: 4K RR (Latency), Multiple threads

Figure 6 displays the results of a 4K random read latency test using multiple threads. Lower latency indicates better performance. NVMe-oF over RDMA exhibits roughly 78% lower latency compared to NVMe-oF over TCP and about 90% lower latency compared to iSCSI.


Figure 7: 4K RR (CPU usage), Multiple threads

Figure 7 shows the CPU usage during a 4k random read test. NVMe-oF over RDMA exhibits approximately 85% lower CPU usage compared to iSCSI and approximately 63% lower CPU usage compared to NVMe-oF over TCP.

 


Figure 8: 4K RW (IOPS), Multiple threads

Figure 8 shows the IOPS achieved during 4K random write operations with multiple threads. NVMe-oF over RDMA leads, with roughly 1% higher IOPS than iSCSI and roughly 4% higher IOPS than NVMe-oF over TCP.

 


Figure 9: 4K RW (Latency), Multiple threads

Figure 9 compares the latency for writing 4K data blocks. NVMe-oF over RDMA demonstrates roughly 79% lower latency compared to NVMe-oF over TCP and about 94% lower latency compared to iSCSI, highlighting its superior performance.


Figure 10: 4K RW (CPU usage), Multiple threads

Figure 10 displays CPU usage during a 4K random write test. NVMe-oF over RDMA demonstrates approximately 88% lower CPU usage than iSCSI and approximately 58% lower than NVMe-oF over TCP, highlighting its efficiency.

 


Figure 11: 64K RR (Throughput), Multiple threads

Figure 11 illustrates the throughput for random reads in 64KB blocks. iSCSI achieves approximately 2% higher average read speed than NVMe-oF over TCP, and about 3% higher than NVMe-oF over RDMA.

 


Figure 12: 64K RR (Latency), Multiple threads

Figure 12 displays latency for a 64K random read. Naturally, NVMe-oF over RDMA exhibits roughly 75% lower latency compared to NVMe-oF over TCP and about 87% lower latency compared to iSCSI.

 


Figure 13: 64K RR (CPU usage), Multiple threads

Figure 13 shows CPU usage during a 64k random read. NVMe-oF over RDMA exhibits approximately 75% lower CPU usage compared to iSCSI and NVMe-oF over TCP.

 


Figure 14: 64K RW (Throughput), Multiple threads

Figure 14 shows the average throughput during a 64k random write test. NVMe-oF over TCP demonstrates approximately 1.2% higher average write speed than both NVMe-oF over RDMA and iSCSI.

 


Figure 15: 64K RW (Latency), Multiple threads

Figure 15 shows the latency of 64k random writes. NVMe-oF over RDMA demonstrates approximately 87.67% lower average latency compared to iSCSI and approximately 75% lower average latency compared to NVMe-oF over TCP. The lower, the better.


Figure 16: 64K RW (CPU usage), Multiple threads

Figure 16 displays CPU usage during 64k random write. NVMe-oF over RDMA exhibits approximately 75% lower CPU usage compared to iSCSI, and approximately 67% lower CPU usage compared to NVMe-oF over TCP.

 


Figure 17: 1MB SR (Throughput), Multiple threads

Figure 17 depicts the results of a 1M sequential read test measuring throughput in MiB/s. NVMe-oF over RDMA exhibits approximately 4% higher performance than NVMe-oF over TCP and approximately 2% higher performance than iSCSI.


Figure 18: 1MB SR (Latency), Multiple threads

Figure 18 shows the average latency during the 1M sequential read test. NVMe-oF over RDMA demonstrates approximately 66% lower average latency compared to iSCSI and NVMe-oF over TCP.


Figure 19: 1MB SR (CPU usage), Multiple threads

Figure 19 shows that the CPU usage for all three protocols during a 1M read test is relatively low, hovering around 1% for most of the time. This is expected, as fewer IOPS mean fewer CPU cycles spent on I/O processing.

 


Figure 20: 1MB SW (Throughput), Multiple threads

Figure 20 depicts the results of a 1M sequential write test. NVMe-oF over RDMA achieves approximately 5% higher throughput than both NVMe-oF over TCP and iSCSI.


Figure 21: 1MB SW (Latency), Multiple threads

Figure 21 displays the results of a 1M sequential write latency test. NVMe-oF over RDMA demonstrates approximately 68% lower latency than both NVMe-oF over TCP and iSCSI.


Figure 22: 1MB SW (CPU usage), Multiple threads

Figure 22 shows that the CPU usage for all three protocols during a 1M write test is very low, hovering around 1% for most of the time.

Conclusion

Based on these test results, we can see that NVMe-oF over RDMA is the clear winner for high-performance storage applications demanding the absolute best speed and efficiency. However, for the best results, it requires RDMA-capable hardware and might be more complex to set up compared to iSCSI. NVMe-oF over TCP offers a good balance between performance and complexity, while iSCSI remains a viable option for more traditional workloads where the lowest possible latency and absolute maximum IOPS are not critical.

So, what’s the verdict? Choosing between SPDK iSCSI and NVMe-oF hinges on your specific priorities. If raw speed, efficiency, and future-proofing are paramount, NVMe-oF takes the lead. For simpler setups and broader compatibility, SPDK iSCSI may be a suitable choice, especially if you know how to tune it for maximum performance. Ultimately, the ideal solution lies in carefully analyzing your performance needs, infrastructure limitations, and budget constraints to make an informed decision that best suits your high-performance storage requirements.

This material has been prepared in collaboration with Iryna Chaplya, Technical Writer with almost 5 years of experience at StarWind.

 
