An Introduction to Xilinx FPGA Memory Primitives
Practical Insights into BlockRAM, Distributed RAM, and UltraRAM usage in your designs.
When designing FPGA-based systems, choosing the right type of memory is crucial for achieving optimal performance and resource utilization. Xilinx FPGAs offer three primary types of on-chip memory: BlockRAM, Distributed RAM, and UltraRAM. Each has its unique characteristics, advantages, and use cases. In this article, we’ll explore the differences between these memory types, helping you make informed decisions for your FPGA designs.
In most cases the Xilinx tools (Vivado Synthesis) will infer the most suitable type of memory from the user code. Synthesis attributes are available to steer the tool toward a specific memory primitive.
BlockRAM (BRAM)
BlockRAM is a dedicated memory resource available in Xilinx FPGAs. It consists of large, configurable memory blocks that can be used for various applications.
Key Features:
- Size and Configuration: BlockRAMs are typically 36 Kb in size and can be configured as single-port or dual-port memory. They support various depth-by-width configurations, such as 32K x 1, 8K x 4, 4K x 9, and 1K x 36. These memory blocks can be aggregated to support arbitrary widths and depths. BlockRAMs can be configured as Simple Dual Port (SDP) memories or True Dual Port (TDP) memories.
- Performance: BlockRAMs offer high performance with low latency, making them ideal for applications requiring fast data access.
- Use Cases: Commonly used for implementing FIFOs, large buffers, and memory-intensive algorithms.
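As an illustration of the Simple Dual Port configuration, here is a minimal inference sketch. The module name, port names, and the 1K x 36 sizing are assumptions chosen for the example, and whether the tool actually maps the array to a BlockRAM depends on the device and tool version.

```verilog
// Hypothetical Simple Dual Port RAM: one write port, one read port, one clock.
// The registered (synchronous) read is what allows a BlockRAM mapping.
module sdp_bram_sketch #(
    parameter DATA_WIDTH = 36,   // assumed width
    parameter ADDR_WIDTH = 10    // 1K x 36 = 36 Kb
) (
    input  wire                  clk,
    input  wire                  wr_en,
    input  wire [ADDR_WIDTH-1:0] wr_addr,
    input  wire [DATA_WIDTH-1:0] wr_data,
    input  wire                  rd_en,
    input  wire [ADDR_WIDTH-1:0] rd_addr,
    output reg  [DATA_WIDTH-1:0] rd_data
);
    (* ram_style = "block" *) reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        if (rd_en)
            rd_data <= mem[rd_addr];   // data is available one cycle after rd_en
    end
endmodule
```

The read and write ports use separate addresses, so a FIFO or buffer built on this sketch can write and read in the same cycle.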
Distributed RAM
Distributed RAM utilizes the configurable logic blocks (CLBs) in the FPGA to implement small memory structures. It leverages the lookup tables (LUTs) within the CLBs to create memory elements.
Key Features:
- Size and Configuration: Distributed RAM is typically smaller than BlockRAM and is implemented using LUTs. Most modern Xilinx FPGAs use 6-input LUTs (LUT6), which can be configured to store 64 bits of data.
- Flexibility: Highly flexible and can be used to create small memory structures spread throughout the FPGA. Writes are synchronous, while reads are asynchronous (combinational) by default; registering the output gives a read latency of one clock cycle. This makes distributed RAM suitable for applications requiring quick access to small amounts of data.
- Use Cases: Ideal for small buffers, coefficient storage, and state machines.
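A small LUT-based memory can be sketched as follows. The module and port names are hypothetical, and the example assumes a combinational read; adding an output register would give a one-cycle read latency. The depth of 64 is chosen to match the 64 bits a single LUT6 can store.

```verilog
// Hypothetical single-port distributed RAM: synchronous write, combinational read.
module dist_ram_sketch #(
    parameter DATA_WIDTH = 8,    // assumed width
    parameter ADDR_WIDTH = 6     // 64 entries, the depth one LUT6 can hold per data bit
) (
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output wire [DATA_WIDTH-1:0] dout
);
    (* ram_style = "distributed" *) reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    // Writes are clocked.
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
    end

    // The read is not clocked: dout follows addr combinationally.
    assign dout = mem[addr];
endmodule
```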
UltraRAM (URAM)
UltraRAM is a large, high-capacity memory resource available in Xilinx UltraScale+ FPGAs. It is designed to provide significant on-chip memory capacity, reducing the need for external memory.
Key Features:
- Size and Configuration: UltraRAM blocks are significantly larger than BlockRAM, typically 288 Kb each. They can be cascaded to create very large memory arrays.
- Performance: UltraRAM offers high performance with configurable pipeline stages to optimize timing. It is designed for applications requiring large, high-speed memory.
- Use Cases: Ideal for applications needing large data storage, such as video processing, deep learning, and large data buffers.
Practical Design Considerations
1. Floorplanning
BlockRAMs and UltraRAMs are placed on the Xilinx FPGA fabric as columns of memory, which are fixed in place.
If an RTL module requires fast access to a BlockRAM or UltraRAM-based memory block, it should be placed close to the memory column, or sufficient pipelining must be provided so that the access paths can meet timing.
Distributed RAMs consume fabric LUTs, which could otherwise be used for implementing other design logic. However, LUTs are ubiquitous and typically do not encounter the floorplanning issues mentioned above.
2. Initialization
BlockRAMs and Distributed RAMs can be initialized with specific values while loading the bitstream onto the FPGA.
In contrast, UltraRAMs do not support initialization during the bitstream loading process. This design choice keeps UltraRAM smaller per bit and avoids significantly increasing the bitstream size. If you need to initialize UltraRAM with specific data, you must do so after the FPGA has been configured, by writing the desired data to the UltraRAM from your design logic.
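One way to initialize an UltraRAM from design logic is a counter that sweeps every address once after reset. The sketch below is only an illustration under stated assumptions: the module name, the ports, the init_done flag, and the choice of zero as the initial value are all hypothetical, and the mapping to UltraRAM is a request to the tool rather than a guarantee.

```verilog
// Hypothetical single-port 72-bit memory with a post-reset initialization sweep.
module uram_init_sketch #(
    parameter ADDR_WIDTH = 12            // 4K x 72 = 288 Kb
) (
    input  wire                  clk,
    input  wire                  rst,        // synchronous, active high
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [71:0]           din,
    output reg  [71:0]           dout,
    output wire                  init_done   // high once every address has been written
);
    (* ram_style = "ultra" *) reg [71:0] mem [0:(1<<ADDR_WIDTH)-1];

    // One extra counter bit: the MSB going high marks the end of the sweep.
    reg [ADDR_WIDTH:0] init_cnt;
    assign init_done = init_cnt[ADDR_WIDTH];

    always @(posedge clk) begin
        if (rst)
            init_cnt <= 0;
        else if (!init_done)
            init_cnt <= init_cnt + 1'b1;
    end

    // During the sweep the counter drives the port; afterwards the user does.
    wire [ADDR_WIDTH-1:0] port_addr = init_done ? addr : init_cnt[ADDR_WIDTH-1:0];
    wire [71:0]           port_din  = init_done ? din  : 72'd0;
    wire                  port_we   = we | ~init_done;

    // One read or one write per cycle on this port.
    always @(posedge clk) begin
        if (port_we)
            mem[port_addr] <= port_din;
        else
            dout <= mem[port_addr];
    end
endmodule
```

User logic should wait for init_done before issuing reads or writes, since accesses during the sweep are overridden by the counter.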
3. Clocking
UltraRAM has a single clock input and is fully synchronous. Unlike BlockRAM, it does not support independent clock interfaces directly. BlockRAMs can be configured as True Dual Port memories with a separate clock on each port, providing more flexibility in clocking.
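For contrast with UltraRAM's single clock, the sketch below describes a BlockRAM-style True Dual Port memory in which each port has its own clock. The module name, port names, and 1K x 18 sizing are assumptions made for the example.

```verilog
// Hypothetical True Dual Port RAM: two read/write ports, each with its own clock.
module tdp_bram_sketch #(
    parameter DATA_WIDTH = 18,   // assumed width
    parameter ADDR_WIDTH = 10    // assumed depth of 1K
) (
    input  wire                  clka,
    input  wire                  ena,
    input  wire                  wea,
    input  wire [ADDR_WIDTH-1:0] addra,
    input  wire [DATA_WIDTH-1:0] dina,
    output reg  [DATA_WIDTH-1:0] douta,
    input  wire                  clkb,
    input  wire                  enb,
    input  wire                  web,
    input  wire [ADDR_WIDTH-1:0] addrb,
    input  wire [DATA_WIDTH-1:0] dinb,
    output reg  [DATA_WIDTH-1:0] doutb
);
    (* ram_style = "block" *) reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    // Port A, clocked by clka. On a write, douta returns the old contents.
    always @(posedge clka) begin
        if (ena) begin
            if (wea)
                mem[addra] <= dina;
            douta <= mem[addra];
        end
    end

    // Port B, clocked by clkb, with the same behavior.
    always @(posedge clkb) begin
        if (enb) begin
            if (web)
                mem[addrb] <= dinb;
            doutb <= mem[addrb];
        end
    end
endmodule
```

With unrelated clocks, the design must avoid both ports accessing the same address at the same time when at least one of them is writing, since the result is not well defined.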
4. Pipelining Considerations in BlockRAMs
BlockRAMs have optional output registers optimized for Clock-to-Q times. These optional output registers improve design performance by eliminating routing delay to the configurable logic block (CLB) flip-flops for pipelined operation. An independent clock and clock enable input is provided for these output registers.
💡Recommendation 💡
Use two pipeline registers on the BRAM read side. This allows the tool to use the register inside the BRAM tile, while the second register is implemented in the fabric to provide some elasticity for longer paths. Additional pipeline registers might be needed if the destination logic is placed further away from the BlockRAM tile.
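The recommendation can be sketched as below. This is one reading of it, with hypothetical module and signal names: the memory's own synchronous read is followed by two pipeline registers, giving a read latency of three cycles. Which register ends up inside the BlockRAM tile is decided by the tool, not by the RTL.

```verilog
// Hypothetical BlockRAM read path with two pipeline registers after the array read.
module bram_rd_pipe_sketch #(
    parameter DATA_WIDTH = 36,   // assumed width
    parameter ADDR_WIDTH = 10    // assumed depth of 1K
) (
    input  wire                  clk,
    input  wire                  wr_en,
    input  wire [ADDR_WIDTH-1:0] wr_addr,
    input  wire [DATA_WIDTH-1:0] wr_data,
    input  wire [ADDR_WIDTH-1:0] rd_addr,
    output reg  [DATA_WIDTH-1:0] rd_data    // valid three cycles after rd_addr
);
    (* ram_style = "block" *) reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    reg [DATA_WIDTH-1:0] ram_q;    // synchronous read of the memory array
    reg [DATA_WIDTH-1:0] pipe_q;   // first pipeline register, a candidate for the BRAM output register

    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        ram_q   <= mem[rd_addr];
        pipe_q  <= ram_q;
        rd_data <= pipe_q;         // second pipeline register, expected in fabric
    end
endmodule
```

The pipeline registers here have no reset, which is an assumption made to keep them eligible for absorption into the memory tile; downstream logic has to account for the three-cycle latency.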

5. Synthesis Attributes
The ram_style attribute is a synthesis directive that tells the synthesis tool which type of memory to infer. It can be used to override the default tool behavior or to document which memory type the designer intends to use.
It is important to understand that ram_style is a soft attribute. It should be viewed more as guidance to the tool than as strict enforcement. This lets the designer express intent while leaving the synthesis tool enough freedom to optimize the design.
// BlockRAM
(* ram_style = "block" *) reg [7:0] bram [0:255];
// Distributed RAM
(* ram_style = "distributed" *) reg [7:0] dram [0:255];
// UltraRAM
(* ram_style = "ultra" *) reg [71:0] uram [0:4095];
Synthesis Attributes for FPGA Memories
If strict enforcement of a certain type of memory is required, then it needs to be instantiated, not inferred.
6. Other Considerations
- Each UltraRAM port can perform either a read or a write in a given clock cycle, not both.
- A True Dual Port memory cannot be generated using UltraRAMs. The behavior of an UltraRAM-based memory can be viewed as a superset of a BRAM-based Simple Dual Port memory.
- UltraRAM data width is fixed at 72 bits, whereas BlockRAMs offer more flexibility with choices of data widths (1, 2, 4, 9, 18, 36, 72).