Low Latency Trading Insights

The Great Lock-Free Queue Shootout (Part Five)

Queue Design Philosophy: The Art and Science of Optimization

Henrique Bucher
Sep 12, 2025

Our results make one thing clear: successful SPSC queue design depends as much on understanding compilers and CPU architectures as it does on the algorithm itself. Each of the top performers embodies a different philosophy for extracting maximum performance from modern hardware. Where each one wins or loses across our test matrix of architectures and compilers shows which of those philosophies hold up in high-performance code and which do not.

FastQueue2: The Skylake Specialist

FastQueue2's performance profile suggests a design that evolved alongside the hardware it targets. Its exceptional throughput on Skylake (129 million items/sec with GCC), combined with its consistency across compilers, points to a design closely tuned to the behavior of modern Intel memory subsystems. The queue appears to be optimized for the cache-line size, prefetching patterns, and memory-ordering characteristics of Intel's newer architectures.
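One way a queue can be tuned to a particular memory subsystem is through the layout of its control fields. The sketch below is a hypothetical illustration, not FastQueue2's source. It places the producer's and consumer's indices on separate cache lines, so a write by one core does not invalidate the line the other core is polling (false sharing). It assumes a 64-byte cache line, which is what Skylake uses. Some implementations pad to 128 bytes instead, because Intel's L2 spatial prefetcher tends to fetch lines in 128-byte-aligned pairs. The names `kCacheLine`, `QueueIndices`, `write_index` and `read_index` are invented for this example.

```cpp
#include <atomic>
#include <cstddef>

// Assumed cache-line size: 64 bytes on Skylake. Some libraries pad to 128
// so the two hot lines never form a pair for Intel's spatial prefetcher.
constexpr std::size_t kCacheLine = 64;

// Hypothetical control block for an SPSC ring buffer.
struct QueueIndices {
    // Line 1: written only by the producer, read by the consumer.
    alignas(kCacheLine) std::atomic<std::size_t> write_index{0};
    std::size_t cached_read_index{0};   // producer-private snapshot

    // Line 2: written only by the consumer, read by the producer.
    alignas(kCacheLine) std::atomic<std::size_t> read_index{0};
    std::size_t cached_write_index{0};  // consumer-private snapshot
};

// Compile-time guards: each hot index lives on its own cache line.
static_assert(alignof(QueueIndices) == kCacheLine,
              "struct must start on a cache-line boundary");
static_assert(offsetof(QueueIndices, read_index) -
                      offsetof(QueueIndices, write_index) >= kCacheLine,
              "write_index and read_index would share a cache line");
```

The `static_assert`s turn the layout assumption into a compile-time check. A later refactor that accidentally moves both indices onto one line then fails to build, instead of silently losing throughput.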

What makes FastQueue2 particularly interesting is its compiler independence. Other implementations swing wildly in throughput between GCC and Clang, but FastQueue2 performs roughly the same under both. This suggests a robust design that does not depend on any one compiler's optimizations or aggressive code transformations. Instead, it appears to structure its memory accesses and atomic operations in ways that both compilers can optimize equally well.
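In C++, structuring atomic operations so that any compiler handles them well usually means choosing the weakest memory order that is still correct. The following minimal SPSC ring buffer is an illustrative sketch built on that assumption, not FastQueue2's implementation. The class `SpscRing` and its members are hypothetical.

Each side does three things:

- reads its own index with `memory_order_relaxed`;
- reads the other side's index with `memory_order_acquire`;
- publishes progress with `memory_order_release`.

On x86-64, GCC and Clang both compile acquire loads and release stores to plain `mov` instructions with no fences, so the fast path does not rely on either compiler's cleverness. Each side also keeps a private cached copy of the other side's index and refreshes it only when the queue looks full or empty. This keeps cross-core cache traffic low.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Illustrative SPSC ring buffer (hypothetical, not FastQueue2's code).
// Indices are free-running counters; capacity is a power of two, so a slot
// is found with a bit-mask and unsigned wrap-around stays correct.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t min_capacity)
        : mask_(round_up_pow2(min_capacity) - 1), slots_(mask_ + 1) {
        assert((mask_ & (mask_ + 1)) == 0 && "capacity must be a power of two");
    }

    // Producer thread only.
    bool try_push(const T& value) {
        const std::size_t w = write_.load(std::memory_order_relaxed);  // own index
        if (w - cached_read_ > mask_) {  // looks full: refresh consumer's index
            cached_read_ = read_.load(std::memory_order_acquire);
            if (w - cached_read_ > mask_) return false;  // really full
        }
        slots_[w & mask_] = value;
        write_.store(w + 1, std::memory_order_release);  // publish the element
        return true;
    }

    // Consumer thread only.
    std::optional<T> try_pop() {
        const std::size_t r = read_.load(std::memory_order_relaxed);  // own index
        if (r == cached_write_) {  // looks empty: refresh producer's index
            cached_write_ = write_.load(std::memory_order_acquire);
            if (r == cached_write_) return std::nullopt;  // really empty
        }
        T value = std::move(slots_[r & mask_]);
        read_.store(r + 1, std::memory_order_release);  // hand the slot back
        return value;
    }

private:
    static std::size_t round_up_pow2(std::size_t n) {
        std::size_t p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    static constexpr std::size_t kLine = 64;  // assumed cache-line size

    const std::size_t mask_;
    std::vector<T> slots_;
    // Producer-owned line: its index plus a stale copy of the consumer's.
    alignas(kLine) std::atomic<std::size_t> write_{0};
    std::size_t cached_read_ = 0;
    // Consumer-owned line: its index plus a stale copy of the producer's.
    alignas(kLine) std::atomic<std::size_t> read_{0};
    std::size_t cached_write_ = 0;
};
```

In real use, one thread calls only `try_push` and a different thread calls only `try_pop`. Calling either from more than one thread breaks the single-producer, single-consumer assumption. Elements are copied into pre-constructed slots, so this sketch requires `T` to be default-constructible.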

© 2025 Henrique Bucher