Introduction
The allure of Field-Programmable Gate Arrays (FPGAs) is undeniable for industries like high-frequency trading (HFT), where ultra-low latency and parallel processing promise significant performance gains.
Many organizations attempt to migrate their optimized C++ code, often tailored for high-performance CPUs, to FPGA hardware using Register-Transfer Level (RTL) designs or high-level synthesis (HLS) tools, expecting seamless acceleration.
However, this transition is fraught with challenges that can lead to unexpected failures, including slower performance, missed latency targets, and ballooning development costs. Drawing from real-world experiences, such as Microsoft’s early struggles with FPGAs in Bing’s search infrastructure, this article explores the critical reasons why converting C++ code to run on FPGAs often falls short. From hardware constraints to software-to-hardware translation inefficiencies, we uncover the pitfalls that can derail your migration and offer insights into navigating this complex process.
Trading systems, especially in high-frequency trading (HFT), prioritize ultra-low latency for tasks like market data processing, order execution, and risk assessment.
While FPGAs are often used to accelerate specific parts of these systems due to their parallelism and determinism, converting optimized C++ code to RTL (Verilog and VHDL) and running it on an FPGA can sometimes result in slower performance compared to the original C++ running on a high-end CPU.
Below, we will outline key reasons for this, drawing from hardware differences, software-to-hardware translation challenges, and application-specific factors, expanded with real-life examples from industry implementations and benchmarks.
Disclaimer
Field-Programmable Gate Arrays (FPGAs) are widely and successfully used in high-frequency trading (HFT) and other latency-sensitive applications, delivering unparalleled performance for tasks like market data processing, order execution, and tick-to-trade pipelines, often achieving sub-microsecond latencies.
This article does not suggest that FPGAs are unsuitable for trading systems; rather, it focuses on the common pitfalls that lead to failures when migrating optimized C++ code to FPGA hardware using Register-Transfer Level (RTL) designs or high-level synthesis (HLS) tools. By highlighting these challenges—such as hardware constraints, inefficient code translation, and integration complexities—we aim to provide insights into why such projects may falter and how to approach them more effectively.
Companies Facing Challenges in C++ to FPGA Migration
While direct admissions of "failure" in C++ to FPGA migrations (via high-level synthesis or HLS tools) are rare in public disclosures—companies tend to highlight successes—several organizations have documented challenges in white papers, case studies, and technical articles.
These often stem from issues like suboptimal RTL generation, performance gaps, and integration difficulties, as discussed later. Below are specific companies that have encountered such hurdles, particularly in latency-sensitive domains like trading systems, networking, or finance-adjacent applications.
The focus is on organizations with published materials (e.g., white papers or research papers) that describe these experiences, drawing from available sources.
Microsoft (Project Catapult for Bing and Azure)
Microsoft's early efforts to accelerate workloads like search ranking and networking using FPGAs involved migrating C++-based algorithms to HLS-generated RTL. Challenges included the economic non-viability of initial designs (too narrow for Bing-only use), programming complexities leading to inefficient RTL, and workload mismatches where sequential C++ code did not parallelize well, resulting in redesigns and delays. While ultimately successful, these issues are detailed in lessons-learned narratives.
Microsoft's research papers and presentations on Project Catapult highlight the pivot from a Bing-specific FPGA implementation to a more general one due to cost and scalability challenges. For instance, a 2014 redesign was needed after realizing the initial HLS approaches underperformed for non-parallel tasks.
BittWare (a Molex Company)
BittWare, a provider of FPGA-based solutions for high-performance computing including financial networking, experimented with migrating C++ code to HLS for FPGA acceleration in trading-like environments (e.g., low-latency network functions). They faced challenges such as higher latency from poor loop unrolling in HLS-generated RTL, inefficient resource utilization, and the need for extensive manual optimizations to match traditional RTL performance.
In their white paper "Comparing FPGA RTL to HLS C/C++ using a Networking Example," BittWare details a case study implementing Receive Side Scaling (RSS) for market data feeds. The HLS version initially showed higher latency and resource overhead compared to hand-coded RTL, requiring refinements to approach viability for HFT scenarios.
Achronix Semiconductor Corporation
Achronix, an FPGA vendor, has openly discussed real-world challenges in adopting HLS for migrating C++ code to FPGA designs, particularly in applications requiring high clock frequencies and low latency, such as those in finance or data processing. Issues include non-synthesizable C++ constructs, lack of hardware awareness in code, and difficulties in extracting parallelism, leading to suboptimal RTL and potential project delays.
Article/White Paper Equivalent: Their SemiEngineering article "Challenges In Using HLS For FPGA Design" outlines these pitfalls based on industry experiences, noting that without addressing them, HLS migrations can fail to deliver expected performance gains in trading systems.
Optiver (High-Frequency Trading Firm)
Optiver, a leading HFT firm, has utilized FPGAs for ultra-low-latency trading but encountered challenges when attempting to convert optimized C++ trading logic to RTL. Problems include branching overheads and memory access inefficiencies in HLS flows, which can make converted code slower than CPU-based C++ for complex strategies. In a 2020 FPL conference talk "FPGAs and Low Latency Trading" by Williston Hayes of Optiver, they discuss practical hurdles in FPGA adoption for HFT, including migration challenges from software to hardware, though not a formal white paper.
Additional Insights from Broader Industry and Academic Sources
Papers like "FPGA HLS Today: Successes, Challenges, and Opportunities" (from researchers affiliated with UCLA and Falcon Computing, now part of Xilinx/AMD) cite challenges in HLS for domains including finance, where C++ to RTL conversions fail due to clock frequency gaps and control flow issues. Similarly, an MDPI paper on accelerating trading back-ends with FPGAs via HLS describes university-led efforts (potentially inspired by industry) facing suboptimal performance without heavy redesign.
Firms like Jump Trading or Citadel are anecdotally known to hire FPGA experts for HFT but rarely publish on challenges; discussions in forums (e.g., Hacker News) reference unnamed HFT companies struggling with C++ overheads versus FPGAs.
Lower Clock Frequency
FPGAs typically operate at clock speeds of 100-500 MHz, while modern CPUs run at 2-5 GHz or higher. For sequential operations or tasks with data dependencies that cannot be fully parallelized (common in trading logic like conditional order routing), the CPU's higher clock speed allows it to complete computations faster, even if the FPGA uses fewer cycles per operation.
A real-life example comes from cryptocurrency HFT systems, where an FPGA-based strategy adjustment might react in 50 microseconds, but for complex sequential computations like multi-asset correlation checks, the lower clock (e.g., 200-300 MHz on Xilinx FPGAs) leads to higher overall latency compared to a 4 GHz CPU handling the same logic in optimized C++.
In another case, Microsoft's use of FPGAs in Bing search (analogous to trading data processing) highlighted that FPGAs clock at low hundreds of MHz, making them slower for non-parallel tasks versus CPUs at 3 GHz, a gap that persists in converted trading code without heavy redesign.
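The clock-speed gap can be made concrete with back-of-the-envelope arithmetic. The cycle counts below are hypothetical and purely illustrative: for a chain of dependent operations that neither platform can parallelize, the same cycle count simply costs more wall-clock time on the slower clock.

```cpp
// Latency in nanoseconds for a task taking `cycles` clock cycles at a
// clock of `mhz` megahertz: latency_ns = cycles * 1000 / mhz.
constexpr double latency_ns(double cycles, double mhz) {
    return cycles * 1000.0 / mhz;
}

// Hypothetical chain of 400 dependent operations (e.g., a multi-asset
// correlation check that cannot be parallelized).
// CPU at 4 GHz, ~1 op/cycle once pipelined: 400 cycles.
// FPGA at 250 MHz: even at the same 1 op/cycle, each cycle is 16x
// longer, so the total latency is 16x worse.
constexpr double cpu_ns  = latency_ns(400, 4000.0); // 100 ns
constexpr double fpga_ns = latency_ns(400, 250.0);  // 1600 ns
```

This is exactly why FPGA wins in HFT come from doing many operations per cycle in parallel, not from raw clock speed.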
Microsoft's adoption of Field-Programmable Gate Arrays (FPGAs) for accelerating its Bing search engine, primarily through Project Catapult, is often cited as a pioneering effort in using programmable hardware at cloud scale. While the initiative was largely successful in improving performance and efficiency for specific workloads, there were notable challenges and "bad experiences" that highlight the complexities of integrating FPGAs into a large-scale system like Bing.
Challenges in Parallelizing Sequential Code
Optimized C++ code is often written for sequential execution on CPUs, leveraging features like branch prediction and caching. Trading systems involve branching (e.g., based on market conditions) and dependencies (e.g., waiting for prior calculations in risk checks), which are hard to parallelize efficiently in hardware. FPGAs excel in massively parallel, fixed tasks but may require more clock cycles or added latency for such irregular control flow, making them slower for non-parallelizable portions.
For instance, in HFT order book builders, sequential C++ code for processing market data updates (e.g., handling dependent quote insertions) struggles when converted to RTL, as the FPGA cannot fully exploit parallelism without manual restructuring, leading to latencies 10-30x higher than CPU in benchmarks from Columbia University projects.
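The kind of loop-carried dependency at issue can be sketched in a few lines of C++. This is a hypothetical, heavily simplified price-level book; `std::map` stands in for whatever state structure a real design maintains and is itself not synthesizable, which is exactly the sort of restructuring an HLS flow forces.

```cpp
#include <cstdint>
#include <map>

// Each update depends on the state left by the previous one, so the
// loop carries a dependency from iteration to iteration. On a CPU this
// runs fine sequentially; in HLS the same loop cannot be pipelined at
// one update per cycle, because iteration i+1 must wait for the book
// mutation of iteration i.
struct Update { int64_t price; int64_t qty_delta; };

void apply_updates(std::map<int64_t, int64_t>& book,
                   const Update* updates, int n) {
    for (int i = 0; i < n; ++i) {           // loop-carried dependency
        int64_t& level = book[updates[i].price];
        level += updates[i].qty_delta;      // depends on prior iterations
        if (level <= 0) book.erase(updates[i].price);
    }
}
```

A hardware version would replace the map with a fixed-size array or CAM and schedule non-conflicting updates in parallel, which is manual restructuring rather than direct conversion.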
Columbia
Columbia University's work on FPGA-based order book building for high-frequency trading (HFT) primarily stems from a student project in the EECS E4840 Embedded Systems Design course, taught by Prof. Stephen Edwards in Spring 2024. The project, titled "High Frequency Trade Book Builder using FPGA," was developed by students Arpit Garg and Harshit Sharma. It focuses on creating a low-latency order book system using FPGA hardware to process market data, maintain order states, and display results, simulating real-world HFT scenarios.
Imperial College
In a 2017 paper from Imperial College London on FPGA-accelerated order book updates, latencies were measured at 132-288 nanoseconds on an FPGA (a Stratix-V at 200 MHz), 90-157x faster than equivalent CPU implementations (20.7-26.5 microseconds). This aligns with the Columbia project's rationale and adds color on potential outcomes: FPGAs excel when designs are hardware-optimized, though direct C++ conversions (as in HLS) might not yield these gains without restructuring.
The project emphasizes that CPUs suffer from higher abstraction layers, OS interruptions, and sequential processing limits, leading to higher latencies in HFT. FPGAs allow parallel operations (e.g., updating multiple order levels simultaneously) and deterministic timing, crucial for HFT where predictability prevents jitter. The proposal includes no side-by-side benchmarks, but it implies CPU software would be slower for data-intensive tasks like parsing and updating during market bursts; for example, CPU-based HFT systems might handle 45,000-50,000 messages/second, while FPGAs can sustain millions.
However, order books are inherently sequential (e.g., updates depend on prior states), clashing with FPGAs' parallel nature. The students noted that implementing full trading logic was "infeasible" within the project timeline due to this mismatch, requiring custom modules (e.g., for cancel/execute) and signals to manage flow.
REPARA
Another example is from the REPARA project, where parallelizing sequential HFT code for FPGAs required reshaping the algorithm, but initial conversions resulted in suboptimal throughput due to unresolved dependencies, slower than native C++ on multi-core CPUs.
The REPARA project, formally known as "Reengineering and Enabling Performance And poweR of Applications," was a European Union-funded initiative under the FP7-ICT-2013-10 program. Running from September 1, 2013, to August 31, 2016, with a total budget of approximately €3.6 million (EU contribution: €2.7 million), it aimed to address the growing challenges in computing performance, energy efficiency, and development productivity in the face of stagnating Moore's Law and rising energy demands. Coordinated by Universidad Carlos III de Madrid (UC3M), the project brought together a consortium of academic and industrial partners to develop tools and methodologies for parallelizing applications on heterogeneous computing platforms.
Overall, REPARA represented a forward-thinking effort to make parallel heterogeneous computing more accessible, particularly for FPGAs, bridging the gap between software developers and hardware acceleration. While successful in many aspects, it underscored the need for code reshaping in complex applications like HFT to fully leverage FPGA benefits.
Microsoft
In early FPGA deployments for Bing, Microsoft targeted computationally expensive operations like decision-tree algorithms in the IndexServe engine. While these showed dramatic improvements (40x faster in the 2012 pilot), other components of Bing, such as query preprocessing or dynamic result caching, were less amenable to FPGA acceleration due to their sequential nature or irregular memory access patterns. This led to uneven performance gains, where only specific parts of the search pipeline benefited, leaving other parts reliant on slower CPU processing or suboptimal FPGA implementations.
The 2014 architecture redesign placed FPGAs as a "bump in the wire" between servers' network interface cards (NICs) and Ethernet switches, enabling low-latency networking and computation. However, early iterations struggled with inefficient designs that consumed excessive FPGA resources (e.g., LUTs or BRAM), leading to lower clock frequencies or increased latency. For instance, Bing's IndexServe engine required careful optimization to map decision-tree algorithms onto FPGAs, and initial implementations were slower than expected due to unoptimized RTL code, requiring iterative refinements by a small team of hardware designers.
Suboptimal RTL Generation from Conversion Tools
High-Level Synthesis (HLS) tools convert high-level languages like C, C++, or SystemC into hardware description languages such as VHDL or Verilog for FPGA and ASIC design. They offer a valuable means of reducing development time and complexity, which is particularly appealing for applications like high-frequency trading (HFT), where low latency and hardware acceleration are critical.
Tools
Among the most popular HLS tools currently available, based on industry adoption, vendor support, and community feedback as of August 15, 2025, Xilinx Vitis HLS stands out as a leading option. Developed by Xilinx, now part of AMD following the 2022 acquisition, Vitis HLS is integrated into the Vitis Unified Software Platform and supports C, C++, and OpenCL, targeting Xilinx FPGAs. It offers advanced optimization directives known as pragmas, support for AI/ML workloads, and integration with Vitis AI for neural-network acceleration.
It also includes a simulation environment to estimate performance and resource usage. The tool is widely used in telecommunications, automotive, and finance, including HFT, where firms optimize trading algorithms for low-latency execution; it is accessible via AMD's developer portal with Vitis licenses.
Another prominent tool is the Intel oneAPI HLS Compiler, formerly the Intel HLS Compiler, part of Intel's oneAPI toolset. It supports C++ and OpenCL and targets Intel FPGAs such as the Stratix, Arria, and Agilex series, a lineage that traces back to Intel's 2015 acquisition of Altera.
The tool emphasizes integration with the DPC++ (Data Parallel C++) programming model, enabling heterogeneous computing across CPUs, GPUs, and FPGAs, and offers detailed performance analysis and debugging tools. It is popular among developers in data centers, aerospace, and financial computing, including HFT systems where Intel FPGAs are deployed for order book processing. The compiler is available through Intel's oneAPI subscription, with a free base version for non-commercial use, broadening its accessibility.
Cadence Stratus HLS, a commercial tool from Cadence, is designed for system-level design and supports C++, SystemC, and synthesizable subsets of these languages for FPGA and ASIC flows. It provides advanced verification through equivalence checking, power optimization, and integration with Cadence's broader EDA suite, such as Innovus for place-and-route.
The tool excels in control-heavy designs and is favored in the semiconductor and automotive industries, with growing use in financial applications requiring precise timing and power efficiency, though it is less common in HFT than the Xilinx and Intel tools. It is licensed through Cadence with enterprise-focused pricing, catering to larger organizations.
Similarly, Catapult HLS, originally from Mentor Graphics and now part of Siemens EDA following Siemens' acquisition of Mentor, supports C++ and SystemC and targets both FPGAs and ASICs. It is known for high-level design exploration, including architectural trade-off analysis and automated testbench generation, and offers tight integration with Siemens' Veloce emulation platform. The tool shines in control-dominated applications and is widely adopted in aerospace, defense, and industrial automation, with some use in financial systems for custom hardware acceleration; it is available through Siemens EDA licensing, often bundled with other tools.
Synopsys Synphony HLS rounds out the top commercial options. A mature tool supporting C, C++, and SystemC for ASIC and FPGA design flows, it offers advanced power, performance, and area (PPA) optimization with a primary focus on ASICs but adaptability to FPGAs. It includes extensive IP integration and design-space exploration, dominating the ASIC market while seeing increasing FPGA use in consumer electronics and some financial computing niches, though it is less prevalent in HFT than the Xilinx and Intel tools due to its ASIC focus. It is licensed through Synopsys with enterprise-grade support, appealing to companies with complex design needs.
Beyond these commercial leaders, additional options include LegUp HLS, an open-source tool from the University of Toronto that supports C and C++ for FPGA synthesis and is popular in academic research and small-scale projects, though it lacks the commercial support of proprietary tools. MaxCompiler from Maxeler Technologies is a niche HLS tool optimized for dataflow applications like financial modeling and scientific computing, highly regarded in specific HFT and big-data contexts though less common overall.
Open-source efforts like Bambu from Politecnico di Milano and ROCCC from UC Riverside are also gaining attention in research but are not yet mainstream for industrial HFT use.
In the context of HFT, where latency is paramount, Xilinx Vitis HLS and Intel oneAPI HLS Compiler dominate due to their robust FPGA ecosystems and support for real-time optimization, with firms like Optiver and Jump Trading leveraging these tools to accelerate order book updates and market data processing, often achieving sub-microsecond latencies when designs are hand-optimized.
However, as noted in prior discussions, such as those involving Columbia University and the REPARA projects, direct C++ to HLS conversions can fail if the code isn’t reshaped for parallel hardware, a challenge these tools aim to mitigate with directives and profiling.
When selecting a tool, considerations include ease of use (Vitis and oneAPI offer user-friendly interfaces and extensive documentation that appeal to software engineers transitioning to hardware), performance (Cadence Stratus and Synopsys Synphony excel at PPA optimization for complex designs), and cost (open-source options like LegUp are free but require expertise, while commercial tools involve significant licensing fees).
These tools represent the state of the art in HLS as of August 15, 2025, and ongoing advancements are likely to enhance their capabilities for emerging workloads.
Challenges
Converting C++ to RTL often uses high-level synthesis (HLS) tools. These tools may not infer optimal hardware designs from CPU-optimized C++ code, leading to inefficient resource use, longer critical paths (reducing achievable clock frequency), or unnecessary pipelining stages that increase latency. Without extensive manual pragmas or rewriting the code for HLS, the resulting RTL can underperform hand-tuned C++ on CPU.
A practical example is in financial networking functions like Receive Side Scaling (RSS) for market data feeds, where HLS-converted C++ to RTL on FPGAs showed higher latency (e.g., due to poor loop unrolling) compared to traditional RTL or CPU implementations, as demonstrated in BittWare's comparisons for trading-like environments.
In high-energy physics (HEP) algorithm acceleration, a workload with finance-like streaming characteristics, legacy C++ conversions via HLS produced RTL with non-hardware-aware constructs, causing up to 2x latency overhead versus optimized CPU code and requiring iterative refinements to match performance.
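To illustrate the kind of restructuring HLS flows typically demand, the sketch below splits a loop-carried accumulation into independent partial sums so the tool can unroll and pipeline it. The `#pragma HLS` lines follow Vitis-style directive syntax; the partitioning factor and directive placement are illustrative assumptions, not a tuned design. A host compiler ignores unknown pragmas, so the function also runs natively.

```cpp
#include <cstdint>

// CPU-optimized C++ would accumulate into a single variable; that
// loop-carried sum limits HLS pipelining. Restructured: PAR independent
// partial sums, combined at the end.
constexpr int PAR = 4;

int64_t dot_restructured(const int32_t* a, const int32_t* b, int n) {
    int64_t partial[PAR] = {0};
#pragma HLS ARRAY_PARTITION variable=partial complete
    for (int i = 0; i < n; i += PAR) {
#pragma HLS PIPELINE II=1
        for (int j = 0; j < PAR; ++j) {
#pragma HLS UNROLL
            // Guard handles n not divisible by PAR.
            if (i + j < n) partial[j] += (int64_t)a[i + j] * b[i + j];
        }
    }
    int64_t sum = 0;
    for (int j = 0; j < PAR; ++j) sum += partial[j];
    return sum;
}
```

Without this kind of rewrite, HLS tools typically report an initiation interval greater than 1 for the accumulation loop, which is one concrete mechanism behind the latency overheads described above.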
Memory Access and Caching Differences
Trading systems rely on fast access to large datasets (e.g., order books or historical quotes). CPUs have sophisticated multi-level caches and prefetching that handle irregular memory patterns efficiently. FPGAs have limited on-chip memory (BRAM) and may need external DRAM, introducing higher latency for random accesses. If the C++ code exploits CPU caching well, the FPGA version could be slower due to memory bottlenecks.
In HFT hash joins for order matching, FPGA implementations faced memory delays from external DRAM, requiring multithreading to mask latency, but still underperformed CPUs with L3 caches in end-to-end tests, achieving only 1.6 billion tuples/second versus faster CPU handling of irregular accesses. Another example is in crypto trading platforms, where FPGA external memory integration for large datasets like volatility models led to bottlenecks; optimization techniques (e.g., burst accesses) were needed to approach CPU speeds, but initial conversions were 2-5x slower due to external-memory bandwidth limits.
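The access-pattern difference can be illustrated with two loops. The first chases data-dependent addresses, so an FPGA issuing requests to external DRAM pays a full round trip per element with no chance of bursting; the second streams consecutive addresses that a memory controller can coalesce. This is a sketch of the general principle, not any particular vendor's memory interface.

```cpp
#include <cstdint>

// Pointer-chasing pattern: each read's address depends on the previous
// read's data, serializing DRAM round trips. A CPU's caches and
// prefetchers soften this; an FPGA without them pays full latency.
int64_t sum_chain(const int32_t* next, const int32_t* val,
                  int start, int n) {
    int64_t sum = 0;
    int idx = start;
    for (int i = 0; i < n; ++i) {
        sum += val[idx];
        idx = next[idx];   // address depends on loaded data
    }
    return sum;
}

// Burst-friendly streaming pattern: consecutive addresses let the
// memory controller issue long bursts; the standard HLS remedy is to
// stream data like this into on-chip BRAM buffers before computing.
int64_t sum_stream(const int32_t* val, int n) {
    int64_t sum = 0;
    for (int i = 0; i < n; ++i) sum += val[i];
    return sum;
}
```

Both functions compute the same sum over the same data; only the address pattern differs, and on an FPGA that difference dominates the latency.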
Branching and Control Flow Overhead
In CPUs, branch prediction minimizes penalties for conditional logic, which is prevalent in trading (e.g., if-then decisions on price thresholds). In RTL on FPGAs, branching translates to multiplexing or state machines, which can add combinatorial logic delays and reduce throughput, especially if not optimized for the hardware.
A key example is low-latency HFT C++ that uses semi-static conditions to avoid branch penalties; when converted to RTL, these conditions manifest as mux overhead, adding cycles (e.g., 1-2 extra in benchmarks) versus the CPU's predicted branches, where only a misprediction costs on the order of 14-20 cycles. In deterministic HFT code flows, runtime branching in converted RTL increased execution time by 10-20% compared to CPU, as reported in Core C++ conference presentations on trading patterns.
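A minimal sketch of the branch-versus-mux distinction, using hypothetical order-routing logic (the cycle costs in the comments are the typical figures discussed above, not measurements):

```cpp
#include <cstdint>

// On a CPU, a well-predicted branch is nearly free; only a
// misprediction costs ~14-20 cycles. In RTL the same if/else
// synthesizes to both datapaths evaluated in parallel feeding a
// multiplexer, adding combinational delay on every single evaluation.
int64_t route_branchy(int64_t price, int64_t threshold,
                      int64_t venue_a, int64_t venue_b) {
    if (price > threshold)       // CPU: predicted branch
        return venue_a;
    return venue_b;
}

// Branch-free form, closer to what the hardware actually does: compute
// the condition once and select. Logically equivalent, but the 2:1 mux
// delay is now explicit in the expression.
int64_t route_mux(int64_t price, int64_t threshold,
                  int64_t venue_a, int64_t venue_b) {
    return (price > threshold) ? venue_a : venue_b;  // 2:1 mux
}
```

In hardware, deep chains of such muxes lengthen the critical path and lower the achievable clock frequency, which is how control-flow-heavy trading logic loses throughput after conversion.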
Resource Constraints and Scalability Limits
FPGAs have finite logic elements, DSP blocks, and memory, which may force compromises in complex trading algorithms (e.g., simplifying models or serializing operations). This can lead to higher latency or lower frequency to fit within resources, whereas C++ on CPU can scale with more cores or vectorization without such hard limits.
For complex HFT indicators like MACD, FPGA implementations required partitioning to fit resources, achieving 30x speedup only after optimization, but initial conversions exceeded LUT/BRAM limits, forcing serialization and 2-5x slower performance than CPU scalar code. In deep learning-based trading models on resource-constrained FPGAs, memory bandwidth limits caused bottlenecks, leading to simplified algorithms that ran slower than full CPU versions, as noted in MDPI studies on industrial applications.
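One common fit-to-resources compromise is replacing floating point with fixed point, since a double-precision multiplier consumes many DSP blocks. The sketch below shows a hypothetical Q16.16 exponential moving average update of the sort a MACD indicator is built from; the format and coefficients are illustrative assumptions, not a production design.

```cpp
#include <cstdint>

// Q16.16 fixed point: 16 integer bits, 16 fractional bits, held in
// int64_t so intermediate products do not overflow.
using q16_16 = int64_t;
constexpr q16_16 ONE = 1 << 16;

constexpr q16_16 to_fix(double x) { return (q16_16)(x * ONE); }
constexpr double to_dbl(q16_16 x) { return (double)x / ONE; }

// ema' = alpha * sample + (1 - alpha) * ema, entirely in integers:
// one integer multiply per term instead of double-precision DSP chains.
q16_16 ema_update(q16_16 ema, q16_16 sample, q16_16 alpha) {
    return (alpha * sample + (ONE - alpha) * ema) >> 16;
}
```

The trade-off is precision for area: each such substitution frees DSP blocks and LUTs, at the cost of quantization error the trading model must tolerate.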
The 2012 Microsoft pilot on Bing with 1,632 FPGA-enabled servers demonstrated significant latency improvements for Bing’s search ranking, but scaling this to all Bing datacenters required a new architecture by 2014. The initial pilot used a custom secondary network for FPGA communication, which was not scalable for hyperscale datacenters. The redesigned architecture, which placed FPGAs between NICs and switches, faced challenges in ensuring low-latency communication across hundreds of thousands of FPGAs, with early tests showing variability in performance due to network bottlenecks or misconfigured FPGA firmware.
System Integration and I/O Overhead
In a full trading setup, data must flow between network interfaces, processors, and storage. If the FPGA implementation isn't fully standalone (e.g., requiring host CPU interaction), PCIe or other interfaces can add microseconds of latency—critical in HFT where nanoseconds matter. Optimized C++ can run directly on the CPU near the network stack, avoiding such overheads.
The STAC-T0 benchmark for tick-to-trade latency showed AMD FPGAs achieving world records, but hybrid CPU-FPGA systems added 1-5 microseconds via PCIe, making them slower for integrated trading than pure CPU setups in non-accelerated paths.
In Enyx's low-latency trading deployments, FPGA I/O standardization reduced but didn't eliminate overhead in converted systems, with network-to-FPGA delays exceeding CPU direct access in some arbitrage strategies.
In Bing’s production environment, FPGAs were used to accelerate deep neural networks (DNNs) for search ranking by 2017. However, initial deployments faced overhead from FPGA-to-CPU communication, particularly when tasks required coordination with the host CPU over PCIe, adding microseconds of latency. This was critical for Bing, where sub-millisecond query responses are expected. For example, a 2016 report noted that while FPGAs reduced latency for specific ranking tasks, the overall system latency could increase if the FPGA wasn’t handling the entire pipeline, as data transfers to CPUs introduced bottlenecks.
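The integration argument reduces to simple latency-budget arithmetic. The numbers below are hypothetical but consistent with the 1-5 microsecond PCIe figures cited above: a single host round trip can erase the FPGA's compute advantage.

```cpp
// Back-of-the-envelope tick-to-trade budget, all in nanoseconds.
// All three figures are illustrative assumptions, not measurements.
constexpr long fpga_compute_ns = 800;   // feed parse + logic on FPGA
constexpr long pcie_hop_ns     = 1500;  // one host round trip via PCIe
constexpr long cpu_path_ns     = 2000;  // tuned C++ close to the NIC

// A standalone FPGA path beats the CPU, but a hybrid path that crosses
// PCIe even once already loses to the pure-CPU path.
constexpr long standalone_fpga = fpga_compute_ns;               // 800
constexpr long hybrid          = fpga_compute_ns + pcie_hop_ns; // 2300
```

This is why successful HFT deployments keep the whole tick-to-trade loop on the FPGA between the wire and the exchange, rather than shuttling decisions to the host.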
Note that FPGAs often shine in HFT for specific low-latency tasks like feed parsing or tick-to-trade loops when designed from scratch in RTL, potentially achieving sub-microsecond latencies. However, direct conversion of existing C++ without hardware-specific redesigns can negate these benefits, highlighting the importance of architecture-aware development.
Conclusion
The journey of migrating optimized C++ code to FPGA hardware, particularly for latency-sensitive applications like high-frequency trading (HFT), holds immense potential but is fraught with pitfalls that can lead to failure if not navigated with care.
As evidenced by real-world experiences, such as Microsoft's early struggles with FPGAs in Bing’s Project Catapult, where economic missteps and programming complexities necessitated costly redesigns, and Columbia University's educational experiments with order book building, which highlighted the challenges of sequential-to-parallel transitions, the path is rarely straightforward.
The REPARA project further underscores this, demonstrating that while tools can automate parallelization for heterogeneous platforms like FPGAs, unresolved dependencies in legacy code often result in suboptimal throughput compared to multi-core CPUs.
The clock speed disparity—modern Intel CPUs operating at 2-5 GHz versus Xilinx FPGAs at 100-500 MHz—combined with memory bottlenecks, branching overheads, and integration challenges, as seen in BittWare’s networking case studies, can render converted RTL slower than expected, especially without hardware-aware redesigns.
Yet, this article does not dismiss the proven success of FPGAs in HFT, where tailored designs from companies like Optiver and Jump Trading achieve sub-microsecond latencies, nor does it suggest they are unsuitable—rather, it illuminates the typical reasons why direct C++ migrations falter.
Popular HLS tools like Xilinx Vitis HLS and Intel oneAPI HLS Compiler, alongside Cadence Stratus, Mentor Catapult, and Synopsys Synphony, offer powerful frameworks to bridge software and hardware, but their effectiveness hinges on addressing issues like parallelization mismatches and resource constraints.
The vivid contrast between parallel and sequential processing flows reinforces the need for architectural rethinking.
Ultimately, success in this migration requires a strategic approach—leveraging expert optimization, reshaping code for FPGA strengths, and learning from past challenges—to transform potential pitfalls into opportunities for groundbreaking performance gains in the fast-evolving world of computational finance.
Citations
"FPGA HLS Today: Successes, Challenges, and Opportunities" (UCLA/Falcon Computing). A research paper from the 2019 International Conference on Field-Programmable Technology, discussing HLS challenges, including finance-related applications.
Columbia University EECS E4840 Project - "High Frequency Trade Book Builder using FPGA". The project proposal and design documents are hosted on Prof. Stephen Edwards’ course page. Specific project details may require accessing the linked PDF or GitHub repository.
GitHub Repository (Kodoh/Orderbook) inspired by Columbia University design. Note: This open-source repository extends the Columbia FPGA order book concept, providing code and documentation.
Optiver FPL 2020 Presentation - "FPGAs and Low Latency Trading" by Williston Hayes. The presentation is part of the 2020 International Conference on Field-Programmable Logic and Applications (FPL). Check the program for archived slides or contact Optiver for access.
Imperial College London 2017 Paper on FPGA-Accelerated Order Book Updates. Published in the 2017 IEEE International Symposium on Field-Programmable Custom Computing Machines, detailing latency measurements on Stratix-V FPGAs.
NovaSparks FPGA Solutions Latency Data. NovaSparks provides commercial FPGA solutions for HFT, with latency specs (e.g., 880-1500ns) on their technology page.
REPARA Project Outcome - Real-Time GPU/FPGA Parallel System. The official REPARA website archives project outcomes, including demonstrations of embedded systems.
REPARA Project Use Case Validation. Details use cases in railway, healthcare, and robotics, with insights into FPGA applications.
UC3M YouTube Video on REPARA Achievements.
REPARA Project Official Overview (FP7-ICT-2013-10) The project’s main website provides an overview, objectives, and consortium details.
REPARA Project Budget and Coordination Details. CORDIS (Community Research and Development Information Service) hosts EU project funding details.
REPARA Environmental Impact Context. General EU climate data context; specific 2007 CO2 stats may require archived reports.
REPARA Toolchain and FastFlow Extensions. Deliverables page includes tool documentation; check for D4.1 or similar reports.
REPARA Open-Source FPGA API. Details open-source contributions, including FPGA APIs.
REPARA Verification and Optimization Tools. Overview of static/dynamic analysis tools developed.
REPARA Programming Model and Annotations. Describes the unified programming model and C++11 attributes.
Achronix SemiEngineering Article - "Challenges In Using HLS For FPGA Design" Discusses HLS pitfalls based on industry experiences.
BittWare White Paper - "Comparing FPGA RTL to HLS C/C++ using a Networking Example". Search the BittWare resource page for the specific white paper on HLS vs. RTL.
General HLS vs. RTL Comparison Context. Xilinx documentation provides comparative insights; adjust version for 2025.