
How to Accurately Evaluate FPGA Power Consumption

Created on: 2026-05-06 19:29

 

1 Overview

Accurate power evaluation starts with understanding what drives power consumption. FPGA power consists of two parts: static power and dynamic power. Static power comes from transistor leakage current and depends strongly on the process node, junction temperature and supply voltage; junction temperature in turn depends on ambient temperature and heat dissipation conditions. Dynamic power is caused by switching activity, including logic toggling, clock network switching and I/O driving. It is directly affected by signal toggle rate, operating frequency, load capacitance and supply voltage. Together the two make up total power consumption, and the share of static power rises significantly on advanced process nodes.
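As a rough illustration of how these dynamic-power factors combine, the switching power of a single net is commonly approximated as 0.5 × toggle rate × frequency × capacitance × voltage squared. The sketch below uses that textbook approximation; it is not how XPE or Vivado compute power internally, and the function name and sample values are assumptions chosen only for illustration.

```python
def dynamic_power_w(toggle_rate, freq_hz, load_farads, vdd):
    """Approximate switching power of one net, in watts.

    toggle_rate is transitions per clock cycle (0.125 means 12.5%).
    Each transition is assumed to dissipate 0.5 * C * V^2.
    """
    return 0.5 * toggle_rate * freq_hz * load_farads * vdd ** 2


# Illustrative values only: 10 fF node, 300 MHz clock, 0.85 V core supply.
p_default = dynamic_power_w(0.125, 300e6, 10e-15, 0.85)
p_busy = dynamic_power_w(0.50, 300e6, 10e-15, 0.85)

print(f"12.5% toggle rate: {p_default * 1e6:.3f} uW")
print(f"50% toggle rate:   {p_busy * 1e6:.3f} uW")
```

Under this model, power scales linearly with toggle rate and frequency and quadratically with supply voltage, which is why the toggle rate and Vccint settings discussed later matter so much.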
There are two methods for power evaluation. Method one is to enter data directly into the XPE spreadsheet based on experience and estimated resource usage. Method two is to first complete a full project compilation in Vivado; a power report is then generated, and an XPE file can also be exported and imported into the XPE spreadsheet for adjustment. Method one rarely comes close to the actual situation, while the power evaluated by method two is much more accurate.
XPE (Xilinx Power Estimator) is an Excel spreadsheet used to estimate FPGA power consumption. It is usually applied in the early design and pre-implementation stages to help select appropriate power supply components and thermal management solutions. XPE estimates the power distribution from design resource usage, signal toggle rates, I/O load, thermal environment and other factors, combined with device models.
To estimate power as accurately as possible, enter complete data that reflects actual conditions. Overly conservative modeling of some design aspects, or estimation with insufficient design information, may lead to unreasonable results. In short: do not set parameters arbitrarily; they must match actual conditions.
AMD has launched a newer power evaluation tool, Power Design Manager (PDM), which can replace the XPE spreadsheet. Versal devices can only use PDM for power evaluation, while other devices can still use XPE.
The following describes the detailed process of the two power evaluation methods. For the XPE tool, this article takes UltraScalePlus_XPE_2023_1_2.xlsm as an example for explanation.

2 XPE Spreadsheet Filling

The following preparations are required before filling:
• Target device, package type, speed grade
• Expected resources used (such as flip-flops, lookup tables, I/O ports, Block RAM, DCM or MMCM and PLL)
• Clock architecture and frequency 
• Estimated approximate signal toggle rate 
• Transceiver interface configuration and data rate 
• Operating thermal environment 
In general, provide as much information about the design as is available, and leave the other settings at their default values. This strategy helps determine the power supply and heat dissipation requirements of the device.

 

2.1 Fill in Device Information

This part is simple: fill in the specific device information according to the actual application. Within the same FPGA model, devices marked with -L support a lower core voltage (Vccint). For example, an UltraScale+ -2L device can run at 0.72 V, so its power consumption is lower than that of the -2 version.

 

2.2 Fill in Thermal Environment

Junction Temperature

Junction temperature. By default this is a calculated value: XPE computes the junction temperature during FPGA operation from the thermal environment parameters. If "User Override" is checked, you can manually set the maximum junction temperature at which the FPGA is expected to operate, and XPE automatically adjusts the ambient temperature to meet it. This is useful for working backward from a known or preset worst-case junction temperature to the ambient conditions that keep the temperature below the threshold.

Ambient Temp

Ambient temperature. For air cooling, input the air temperature around the FPGA; for water cooling, input the water temperature.

Effective θJA(℃/W)

Junction-to-ambient thermal resistance. By default this is a calculated value. It expresses how many degrees Celsius the junction temperature rises above ambient per watt of power consumption; the lower the value, the better the heat dissipation. Junction Temperature = Ambient Temperature + Chip Power × θJA. Thermal resistance is influenced by the PCB layer count (such as 4-layer or 8-layer), copper thickness (such as 2 oz), the number of thermal vias, whether an exposed pad is used, and whether a heatsink is installed. If the thermal resistance has been calculated with another tool, check "User Override" and enter that value.
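The junction temperature formula, and the reverse deduction described under Junction Temperature, can be sketched as follows. The function names and the sample numbers (10 W, 2.5 °C/W) are assumptions for illustration. The sketch also treats power as a fixed input, whereas in practice static power itself changes with junction temperature.

```python
def junction_temp_c(ambient_c, power_w, theta_ja):
    """Junction Temperature = Ambient Temperature + Chip Power x thetaJA."""
    return ambient_c + power_w * theta_ja


def max_ambient_c(tj_max_c, power_w, theta_ja):
    """Reverse deduction: highest ambient temperature that keeps Tj at tj_max_c."""
    return tj_max_c - power_w * theta_ja


# Illustrative values only: 10 W device, effective thetaJA of 2.5 degC/W.
tj = junction_temp_c(ambient_c=40.0, power_w=10.0, theta_ja=2.5)
ta_max = max_ambient_c(tj_max_c=100.0, power_w=10.0, theta_ja=2.5)

print(f"Tj at 40 degC ambient: {tj:.1f} degC")
print(f"Max ambient for Tj of 100 degC: {ta_max:.1f} degC")
```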

Airflow

Defines the air flow speed around the chip, unit LFM (Linear Feet per Minute). 0: natural convection / no fan; 250: low wind speed / weak air cooling; 500: medium wind speed / strong air cooling.

Heat Sink

Defines the type of heatsink used in the design. Custom is preferred, with θSA taken from the heatsink manufacturer. The Small / Medium / High heatsink options in XPE are simplified thermal models that Xilinx built from JEDEC standard thermal test conditions and typical commercial aluminum heatsinks; exact dimensions and thermal resistance values are not provided. As a rule of thumb from experience: Small represents a low-profile aluminum extrusion heatsink; Medium represents a standard aluminum extrusion or copper-core heatsink; High represents a high-performance finned or heat-pipe heatsink.

Board Selection

Custom is preferred: enter the θJB obtained from thermal simulation of the actual PCB for the highest accuracy. The larger the board, the larger the heat dissipation area and the lower the thermal resistance. If θJB is not available, select the option closest to the actual PCB size (in inches).

Board Layers

Fill in according to actual PCB layer count. The more layers, especially with large inner ground and power planes, the faster heat can spread laterally through the copper layer, effectively reducing thermal resistance.

2.3 Fill in POWER SUPPLY

Supply voltage significantly affects device power consumption. Vccint is the core voltage. It is filled in automatically once the device is selected, but the supply voltage has an allowed range, and the automatically filled value is the center of that range. Real power supplies have some error; conventional DC-DC power modules, for example, have an error range of ±3%. For a conservative estimate, set Vccint to 0.85 + 0.85 × 3% = 0.8755 V (the Vccint range specified in DS925 is 0.825 V to 0.876 V). In practical designs, high-precision power modules are recommended to keep the FPGA stable and to keep actual power consumption close to the theoretical evaluation.
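The conservative Vccint value above can be reproduced with a few lines. This is a minimal sketch; the function name is an assumption, and the range limits are the DS925 figures quoted in the text.

```python
def worst_case_voltage(nominal_v, tolerance):
    """Upper bound of the rail voltage for a supply with +/- tolerance."""
    return nominal_v * (1.0 + tolerance)


VCCINT_MIN, VCCINT_MAX = 0.825, 0.876  # range quoted from DS925 in the text

vccint = worst_case_voltage(nominal_v=0.85, tolerance=0.03)
in_range = VCCINT_MIN <= vccint <= VCCINT_MAX

print(f"Conservative Vccint: {vccint:.4f} V, within datasheet range: {in_range}")
```

The same helper can be used to check whether a looser supply tolerance would push the rail above the datasheet maximum.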

 

2.4 Fill in Clock

Fanout

The total number of loads (flip-flops, RAM, DSP and so on) driven by the clock signal, that is, the total number of logical endpoints the clock tree has to drive.
Fanout is relatively easy to obtain: sum the synchronous elements in the resource usage of each module in the project, taking care to count each clock domain separately.

Fanout/Site

The average number of loads per clock region or per tile (Site) in the physical layout. It reflects the physical distribution density of clock loads on the chip. If the placement is concentrated, the value is high; if the loads are scattered across the chip, the value is low.
A Site is the smallest independently configurable unit on the FPGA with a fixed physical location, containing specific resources such as LUTs, registers, DSP, BRAM, MMCM/PLL or IOB. A SLICE, a DSP, a BRAM, an MMCM/PLL and an IOB are each one Site.
Fanout/Site ranges from 1 to 16: one SLICE contains up to 16 flip-flops, while other Site types basically hold a single synchronous element (one DSP Site contains only one DSP, for example). Fanout/Site is difficult to estimate in advance because it depends heavily on Vivado placement and routing results, so the most reliable approach is to export it after project compilation. If there is no project, use the XPE default value for a preliminary estimate.

Clock Buffer Enable

Indicates the percentage of time the clock is enabled. If there is no gating/enable control for the clock, set to 100%.

Slice Clock Enable

Refers to the "clock enable ratio", that is, the proportion of registers controlled by clock enable signals within a clock domain. This clock enable is not clock gating but the CE pin of the logic cells. Formula: 100% - %_FFs_with_CE × %disable. For example, if 50% of FFs have a CE signal and CE is low 50% of the time, the value is 100% - 50% × 50% = 75%.
The most accurate way to set it is to export the XPE file after Vivado synthesis or implementation. Use the default value if no project is available.
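The Slice Clock Enable formula can be written as a small helper for trying different assumptions. The function and parameter names are illustrative, not XPE field names.

```python
def slice_clock_enable_pct(ffs_with_ce_pct, ce_low_pct):
    """100% - (% of FFs that have a CE signal) x (% of time CE is low)."""
    return 100.0 - (ffs_with_ce_pct * ce_low_pct) / 100.0


# Example from the text: 50% of FFs use CE, and CE is low 50% of the time.
value = slice_clock_enable_pct(ffs_with_ce_pct=50.0, ce_low_pct=50.0)
print(f"Slice Clock Enable: {value:.0f}%")
```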

2.5 Fill in Logic

Set the quantities of LUTs as Logic, LUTs as Shift Registers, LUTs as Distributed RAMs and Registers according to module usage. Generally only one item is set per row and others remain 0. A small number of rows with both Registers and LUTs may appear in XPE files exported from Vivado.

Toggle Rate

The probability, as a percentage, that a signal toggles (0→1 or 1→0) in a clock cycle. It has a great impact on dynamic power: the higher the toggle rate, the higher the power consumption. The default is 12.5% (one toggle every 8 cycles on average).
This is the most critical XPE parameter that has to be estimated from design behavior, and it is closely tied to the logic design. For example, the least significant bit (LSB) of a 32-bit counter toggles every clock cycle, for a toggle rate close to 100%, while the most significant bit (MSB) toggles very slowly. Toggle rate is difficult to estimate by hand in large designs. The most accurate approach is to import the .saif file produced by post-implementation simulation, run after Vivado placement and routing. When no better estimate is available, use the XPE default of 12.5%, an average that Xilinx derived from a large number of customer designs and a reasonable choice in the absence of accurate data.
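The counter example can be checked with a short simulation that counts per-bit transitions. This is a minimal sketch: an 8-bit counter is used instead of 32 bits to keep the run short, and the function name is an assumption.

```python
def counter_toggle_rates(width, cycles):
    """Simulate a free-running binary counter; return toggles per cycle for each bit."""
    toggles = [0] * width
    mask = (1 << width) - 1
    value = 0
    for _ in range(cycles):
        next_value = (value + 1) & mask
        changed = value ^ next_value  # bits that flipped on this clock edge
        for bit in range(width):
            if (changed >> bit) & 1:
                toggles[bit] += 1
        value = next_value
    return [count / cycles for count in toggles]


rates = counter_toggle_rates(width=8, cycles=4096)
average = sum(rates) / len(rates)

for bit, rate in enumerate(rates):
    print(f"bit {bit}: {rate * 100:.3f}%")
print(f"average over all bits: {average * 100:.2f}%")
```

Each higher bit toggles half as often as the bit below it, so in this model bit 3 happens to sit exactly at the 12.5% default while the upper bits contribute almost nothing.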

Routing Complexity

XPE introduces this parameter for UltraScale and UltraScale+ devices to replace the old fanout parameter, so that the impact of physical routing on power is reflected more accurately. Medium complexity is 8 and high complexity is 10; extremely high complexity is 12 and is only used for severe routing congestion (especially high-utilization, high-performance designs). The default is 10, representing an average level of "medium congestion" that covers most designs.
The accurate Routing Complexity value can be obtained after Vivado placement and routing; use the default value in the early design stage.

2.6 Fill in IO

This page is easy to set; configure I/O Settings according to actual usage. Note that the lower the drive strength (current value), the lower the power consumption.

Toggle Rate

The toggle rate of the I/O signal. For clock signals the value is 200%, because a clock toggles twice per clock cycle. For data signals, obtain the value through simulation with realistic data. For example, for an LVDS interface capturing CMOS image sensor data, simulate the sensor output timing with pseudo-random image data, then export the .saif file after simulation. If the value cannot be determined, use the default of 12.5%.

Data Rate

Selects how the signal relates to its clock:
• SDR: data is valid on a single clock edge.
• DDR: data is valid on both rising and falling edges. Enter the toggle rate as usual; the tool automatically multiplies the power consumption by 2.
• Clock: the signal is a clock. When this option is selected, the toggle rate is fixed at 200% and cannot be edited.
• Async: an asynchronous signal not associated with any clock, such as an external interrupt, a key input or a signal generated by other asynchronous logic. Suppose the signal toggles 5,000,000 times per second. One complete cycle requires two toggles (0→1→0), so the equivalent frequency is 5,000,000 / 2 = 2,500,000 Hz = 2.5 MHz. In this case, enter 2.5 in the Clock(MHz) column and leave the toggle rate empty.
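The Async conversion from toggles per second to an equivalent clock frequency is simple enough to script when many asynchronous pins have to be entered. The function name is an assumption for illustration.

```python
def async_equivalent_mhz(toggles_per_second):
    """Two toggles (0->1->0) make one full cycle, so halve the toggle count."""
    return toggles_per_second / 2 / 1e6


# Example from the text: 5,000,000 toggles per second.
clock_mhz = async_equivalent_mhz(5_000_000)
print(f"Enter {clock_mhz} in the Clock(MHz) column")
```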

Output Enable

Only valid for output and bidirectional signals: this percentage indicates the proportion of time the pin is in the output-enable state within one working cycle.

Term Disable

Only valid for DCI and IOB33 OCT standards. It is the percentage of time the DCI or OCT termination resistor is disabled.

Output Load

Output Load mainly depends on the parasitic capacitance of the PCB trace and the input capacitance of the receiving chip. Fill in the load capacitance according to the actual hardware design.
To obtain accurate values:
• Check the datasheet of the receiving chip connected to the FPGA (such as DDR4, a sensor or an ADC) for the Input Capacitance (C_in) specification, which is the main part of Output Load.
• Estimate the PCB trace capacitance: roughly 1-2 pF per inch, depending on PCB material and line width. Add the receiving chip input capacitance and the trace capacitance to get the total load capacitance.
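The load estimate above amounts to a single sum. The sketch below brackets the result using the 1-2 pF per inch rule of thumb; the function name and the sample receiver capacitance and trace length are assumptions, not values from any datasheet.

```python
def output_load_pf(receiver_cin_pf, trace_length_in, trace_pf_per_in):
    """Total load = receiver input capacitance + PCB trace capacitance."""
    return receiver_cin_pf + trace_length_in * trace_pf_per_in


# Illustrative values only: 3 pF receiver input, 2 inch trace.
low = output_load_pf(3.0, 2.0, trace_pf_per_in=1.0)
high = output_load_pf(3.0, 2.0, trace_pf_per_in=2.0)

print(f"Estimated Output Load: {low:.1f} pF to {high:.1f} pF")
```

Entering the upper bound gives the more conservative I/O power estimate.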

Pre-emphasis

Pre-emphasis has the same purpose as transceiver pre-emphasis: it compensates for high-frequency signal loss on transmission lines so that the receiver samples correctly. DDR4 and later DDR memory interfaces run at up to 2400 Mb/s or even 3200 Mb/s. At such rates, high-frequency components are severely attenuated after passing through PCB traces, vias and connectors on the way to the DRAM chips. Pre-emphasis boosts high-frequency signal energy at the transmitter to offset the transmission loss and ensure an open eye diagram at the DRAM end that meets JEDEC setup/hold timing requirements. This feature is only used in high-rate scenarios.

DDR Interface

Use "Add Memory Interface" in XPE for configuration. It automatically generates power models for all I/O pins of a JEDEC-compliant DDR4 interface, including data (DQ), address (ADDR), clock (CK) and strobe (DQS), avoiding the errors and omissions of manual entry.

2.7 Fill in BRAM

For a rough BRAM power estimate when the specific BRAM type and operating mode are not yet clear, the best approach is to confirm the required capacity first, select the corresponding number of true dual-port RAMB 18K blocks, set the data width, use the default 12.5% toggle rate, and set Cascade Group Size to 1. Write Enable and Write Mode have little impact on power. The most accurate approach, of course, is to compile the project in Vivado and export the data; the tool then fills in all parameters accurately.

Cascade Group Size

Cascade can be understood as a dedicated hard-wired "data highway" between BRAM blocks, which saves power. In pre-UltraScale devices, when two BRAMs are combined to form a deeper memory, the output of the first BRAM has to reach the input of the second through general FPGA routing resources, consuming logic resources and adding routing delay. UltraScale BRAM integrates dedicated cascade input/output ports (CASDOUT and CASDIN) that pass the data output of one BRAM directly to the next over internal hard routing, without going through LUTs or general routing.
Cascade mode is used to form deeper memory. Cascade Group Size is the number of cascaded BRAMs in one group. For example, if a module has 20 BRAMs and a Cascade Group Size of 4, the tool calculates 20 / 4 = 5 groups of 4 cascaded BRAMs each. Leave the field blank or set it to 0 if no BRAM cascade is used.
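The grouping arithmetic can be sketched as below. The function name is an assumption, and returning the leftover BRAM count is an illustrative addition; how XPE itself treats a count that does not divide evenly is not stated in this article.

```python
def cascade_groups(bram_count, group_size):
    """Return (number of full cascade groups, BRAMs left over)."""
    if group_size <= 0:
        return 0, bram_count  # blank or 0 means no cascading
    return divmod(bram_count, group_size)


# Example from the text: 20 BRAMs with a Cascade Group Size of 4.
groups, leftover = cascade_groups(20, 4)
print(f"{groups} groups of 4, {leftover} BRAMs left over")
```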

Enable Rate

BRAM has dedicated enable pins. Port A is ENA, Port B is ENB. When no read/write operation is performed, pull it low as much as possible to significantly reduce BRAM power consumption. Enable Rate indicates the percentage of time BRAM is enabled. ENA is the enable control of the entire BRAM cell, different from read enable and write enable.

Toggle Rate

Toggle rate. When filling in manually without a proven reference, use the default of 12.5%. For the highest accuracy, export the value from Vivado after project compilation.

Bit Width

The bit width of the instantiated BRAM; set it according to actual usage. Readers unfamiliar with bit width can refer to documents such as UltraScale Architecture Memory Resources User Guide (UG573).

 

2.8 Fill in DSP

Refer to documents such as 7 Series DSP48E1 Slice User Guide (UG479) or UltraScale Architecture DSP Slice User Guide (UG579) for definitions of MULT, MREG, Pre-add and other parameters.

Toggle Rate

Toggle rate has a great impact on power consumption. The most accurate value comes from simulation. It can also be estimated: random data is usually 10% to 30%, and the average for logic-intensive designs is about 12.5%.
The default toggle rate is based on a 27x18 bit width. Scale it in proportion to the actual bit width for an accurate power estimate. For example, if the expected toggle rate for 18x18 is 25%, scale it by 0.86 and enter 21.5%. Similarly, scale a 12x12 configuration by 0.8. AMD does not provide scaling factors for other bit widths, but an approximate value can be estimated from the actual bit width using the official 18x18 and 12x12 examples.
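The bit-width scaling can be kept in a small lookup so the adjusted value is entered consistently. Only the factors quoted above are included; the table and function names are assumptions, and other widths are deliberately left out rather than guessed.

```python
# Scaling factors quoted in the text, keyed by multiplier operand widths.
DSP_TOGGLE_SCALE = {
    (27, 18): 1.0,   # reference width for the default toggle rate
    (18, 18): 0.86,
    (12, 12): 0.8,
}


def scaled_dsp_toggle_pct(expected_pct, a_width, b_width):
    """Scale the expected toggle rate for the given multiplier width."""
    return expected_pct * DSP_TOGGLE_SCALE[(a_width, b_width)]


# Example from the text: 25% expected toggle rate on an 18x18 multiplier.
value = scaled_dsp_toggle_pct(25.0, 18, 18)
print(f"Enter {value:.1f}% in the DSP Toggle Rate column")
```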

2.9 Fill in CLKMGR

Clock unit parameters are easy to fill in. Run Vivado, open the Clocking Wizard, set the clock information, check each parameter on the "mmcm settings" page, and copy the values directly into the spreadsheet.

2.10 Fill in GT

GT unit parameters are easy to fill in from the design, provided you have a basic understanding of GT; otherwise the values entered may not match actual conditions. Refer to 7 Series FPGAs GTX/GTH Transceivers User Guide (UG476), 7 Series FPGAs GTP Transceivers User Guide (UG482), UltraScale Architecture GTH Transceivers User Guide (UG576) and UltraScale Architecture GTY Transceivers User Guide (UG578).


3 Export Power Data after Vivado Compilation

If a complete project already exists, power evaluation becomes simple. After Vivado finishes compiling the project, power data can be viewed in the Power column on the Project Summary page. However, this figure carries some deviation. In particular, the estimated dynamic power may differ greatly from the actual value, because Vivado falls back on estimated toggle rates that may not match real operating conditions. How can accurate toggle rates be obtained? The answer is a SAIF file. After opening the Implemented Design, click Report Power, import the SAIF file on the Switching page, then regenerate the power report.
What is a SAIF file? SAIF stands for Switching Activity Interchange Format. It is essentially a log file that records the switching activity of each signal net and logic unit during simulation. For example, a clock enable signal may be active only 10% of the time on average, and a 32-bit data bus may change only 8 bits on average. These details are faithfully recorded in the SAIF file. As a result, the tool not only knows total power consumption, but can also attribute power to specific modules and nets.
How is a SAIF file generated and used?
First compile the module or full project, completing at least placement and routing. The SAIF file is generated from simulation of the post-implementation netlist, so this is a prerequisite.
Build a simulation testbench for the module or full project under evaluation and instantiate the top-level post-implementation netlist. The aim is to inject input data close to actual scenarios so that the generated switching activity is as close as possible to real conditions, or to inject worst-case data to simulate maximum power consumption.
Open the project Settings, go to the Simulation page, set the runtime (neither too short nor too long, just enough to capture typical switching activity, e.g. half a frame of data for 4K video input), set the SAIF file save path, and check saif_all_signals.

 

Run timing simulation

Vivado spends some time preparing before the simulation starts; please wait patiently.
The SAIF file is generated automatically once the simulation reaches the set runtime.

Now import the SAIF file. Click Report Power, set the parameters on the Environment page according to actual conditions (see section 2.2, Fill in Thermal Environment, for how to set them), then on the Switching page import the SAIF file from the path set earlier in the GUI.


After the above settings, you can also export an XPE file on the Output page for further adjustment in the XPE spreadsheet. Vivado reports no errors if the SAIF file is valid, and the power report can still be generated normally without importing a SAIF file.
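For repeated runs, the same GUI steps can be driven from Vivado's Tcl console. The sketch below is a small Python helper that only writes out the Tcl text. It relies on the Vivado Tcl commands open_run, read_saif and report_power; the run name impl_1, all file paths and the testbench hierarchy tb_top/dut are assumptions to adapt, and the options should be checked against the Tcl command reference for your Vivado version.

```python
def build_power_tcl(saif_path, report_path, xpe_path, strip_path=None):
    """Return Vivado Tcl text that reads a SAIF file and reports power.

    strip_path (optional) is the testbench hierarchy above the design top,
    removed so that SAIF net names line up with the implemented design.
    """
    read_cmd = "read_saif"
    if strip_path:
        read_cmd += f" -strip_path {strip_path}"
    lines = [
        "open_run impl_1",  # assumed implementation run name
        f"{read_cmd} {saif_path}",
        f"report_power -file {report_path} -xpe {xpe_path}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical paths and hierarchy, for illustration only.
script = build_power_tcl("sim/top.saif", "power.rpt", "power.xpe",
                         strip_path="tb_top/dut")
print(script)
```

The generated text can be saved to a file and sourced in Vivado; the exported .xpe file is then loaded into the XPE spreadsheet as described in this section.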
 

The following compares power consumption with and without SAIF file import. Static power is almost the same, but dynamic power differs significantly, especially logic power, which can differ by several times.
Power with SAIF imported:

 

Power without SAIF imported:

If you need to adjust parameters such as environment settings, open the XPE spreadsheet, enable macros, click Import File, and load the XPE file exported from Vivado as shown below.