The enterprise context: why small issues become large
Scale dynamics you cannot ignore
Power BI's strengths (fast in-memory analytics, flexible modeling, self-service authoring) become liabilities without disciplined architecture. As data volumes and user concurrency grow, the interplay between storage modes (Import, DirectQuery, Dual), capacity limits, and gateway throughput turns minor configuration gaps into systemic incidents. What worked in a Pro workspace with a single dataset breaks when hundreds of semantic models share a Power BI Premium capacity and dozens of refreshes collide with interactive query peaks.
Canonical symptoms
- Refresh jobs stuck in "Evaluating" or "Queued" with sporadic failures.
- CPU saturation on Premium capacities, throttling, and visual timeouts at ~30–60 seconds.
- DirectQuery models showing > 1s per query on simple visuals; report filters "pinwheel" during business hours.
- Gateway clusters hitting connection limits, SSL renegotiation spikes, or memory pressure from many parallel mashup containers.
- Model size bloat after incremental refresh rollouts, even with partitions configured.
Background: the Power BI architecture that matters in troubleshooting
Semantic model lifecycle
A Power BI semantic model ("dataset") consists of tables, relationships, partitions, and measures. Import tables are compressed in memory via VertiPaq; DirectQuery tables push filters to source systems. During refresh, Power Query (M) mashup engines materialize partitions, then the tabular engine encodes columns into dictionaries and segments. Each stage has distinct failure modes.
Capacities, workspaces, and throttling
In Premium, capacities enforce CPU and memory budgets across workspaces. When loads exceed budgets, scheduling delays and interactive throttling occur. Refresh, query, and background tasks compete; noisy neighbors are real.
Gateways and mashup containers
The on-premises data gateway hosts mashup containers that execute M queries for refresh and DirectQuery. Container count, credential isolation, query folding, and source concurrency determine how much pressure reaches data sources.
Storage modes and composite models
Import offers speed but consumes memory. DirectQuery preserves source freshness but adds latency and a source dependency. Dual mode lets shared dimension tables serve both Import aggregation tables and DirectQuery detail. Composite models chain these modes together, and misconfigurations amplify latency.
Diagnostics: instrumentation before intervention
Collect the right evidence
- Capacity metrics: CPU seconds, memory evictions, query durations, refresh wait times, and background queue length.
- Gateway logs: concurrent connections, mashup container counts, query durations by data source, SSL/TLS errors.
- Model-level telemetry: partition refresh times, rows processed, compression ratios, and segment counts (see the DMV sketch after this list).
- Source-side evidence: execution plans, tempdb/warehouse spill indicators, and throttling codes from SaaS APIs.
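Model-level telemetry is directly queryable. A minimal sketch, assuming XMLA read access to the workspace: the TMSCHEMA_PARTITIONS dynamic management view (a standard Analysis Services schema rowset) reports per-partition state and last refresh time.

-- DMV over the XMLA endpoint: partition state and last refresh time
SELECT [TableID], [Name], [State], [RefreshedTime]
FROM $SYSTEM.TMSCHEMA_PARTITIONS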
Establish a clean baseline
Choose a representative but controllable window (off hours + synthetic load). Run a single dataset refresh in isolation. Then add concurrent refreshes and concurrent report sessions in steps. The slope of failure (first queueing, then throttling, then timeouts) tells you whether the bottleneck is capacity CPU, gateway concurrency, or source pushdown.
Trace refresh stages explicitly
Enable detailed Power Query diagnostics during a one-off run to determine where time is spent: source fetch, folding evaluation, data type transformations, or VertiPaq encoding. This prevents premature optimization in the wrong layer.
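When you need a marker inside the query itself, M's Diagnostics.Trace records when a step is actually evaluated. A minimal sketch, reusing the hypothetical FactSales source from the fixes below; step names are illustrative:

// Emit a trace entry when (and only when) Filtered is evaluated
let
    Source = Sql.Database("corp-sql", "SalesDW"),
    Fact = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    Filtered = Table.SelectRows(Fact, each [OrderDate] >= #date(2024, 1, 1)),
    Traced = Diagnostics.Trace(
        TraceLevel.Information,
        "Filtered step evaluated",
        () => Filtered,
        true  // delayed = true: the value is a function invoked at trace time
    )
in
    Traced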
Root causes and how to recognize them
1) Query folding breaks mid-pipeline
M transformations that look harmless ("Add Column", "Replace Errors") can turn a folded query into local evaluation, forcing full scans through the gateway and into the capacity. Symptom: refresh steps are fast until one step triggers a huge data download; "View Native Query" goes grey.
2) High-cardinality, low-compression columns
VertiPaq shines when columns compress well. Free text, GUIDs, and excessively granular timestamps explode dictionary sizes and segment counts. Symptom: memory footprint and refresh times increase nonlinearly with rows; repeated DAX scans on those columns are slow.
3) Suboptimal relationships and bi-directional filters
Bi-directional filtering across large tables increases filter propagation cost. Symptom: simple slicers trigger multi-table scans; CPU spikes during cross-filtering; visuals time out under concurrent load.
4) DirectQuery round trips and "chatty" visuals
Each visual can generate multiple SQL statements. With many tiles and tooltips, DirectQuery sends a burst per render. Symptom: report opens cause a storm of small queries; source CPU is low but latency is high due to network and queueing.
5) Incremental refresh partitions misaligned
Partitions configured with incorrect ranges, or change detection that misfires, materialize far more data than needed. Symptom: "Detect data changes" refreshes entire years; constant merging and re-encoding cause long refresh windows and model bloat.
6) Gateway cluster saturation
Default mashup container limits and HTTPS negotiation overhead cap throughput. Symptom: gateway CPU is moderate but refresh durations grow; logs show many containers created and destroyed; TLS renegotiation spikes.
7) Capacity noisy neighbors
Multiple teams scheduling refreshes on the hour pile onto the same capacity. Symptom: top-of-the-hour latency spikes, scheduled-refresh queueing, and random visual timeouts for unrelated reports.
Step-by-step fixes with precise techniques
Fix A: restore query folding and push work to the source
Refactor M so queries stay folded as deep into the pipeline as possible. Replace row-by-row transformations with set operations; push data type changes upstream; avoid custom functions inside Table.AddColumn on large tables.
// M example: foldable filter then projection
let
    Source = Sql.Database("corp-sql", "SalesDW"),
    Fact = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    Filtered = Table.SelectRows(Fact, each [OrderDate] >= #date(2024, 1, 1)),
    Columns = Table.SelectColumns(Filtered, {"OrderDate", "ProductKey", "StoreKey", "NetAmount"}),
    Typed = Table.TransformColumnTypes(Columns, {{"NetAmount", type number}})
in
    Typed
Validate folding via "View Native Query". If it greys out after a step, move that step to the source (view/SQL), or perform it in DAX after import.
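When the cleanest home for a transformation is SQL itself, a hand-written native query can remain foldable for later M steps. A hedged sketch, assuming SQL Server and a hypothetical view named vw_FactSalesCleansed; the EnableFolding option of Value.NativeQuery requires a reasonably current Power Query runtime:

// Native query that later steps can still fold into
let
    Source = Sql.Database("corp-sql", "SalesDW"),
    Native = Value.NativeQuery(
        Source,
        "SELECT OrderDate, ProductKey, StoreKey, NetAmount FROM dbo.vw_FactSalesCleansed",
        null,
        [EnableFolding = true]
    ),
    // Folds into the native query as a WHERE clause when folding succeeds
    Filtered = Table.SelectRows(Native, each [OrderDate] >= #date(2024, 1, 1))
in
    Filtered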
Fix B: reduce cardinality and dictionary pressure
Split high-cardinality columns, normalize them, or hash them to integers for relationships. For timestamps, store a date surrogate key and a time bucket; keep raw timestamps only in DirectQuery detail tables as needed.
// M example: timestamp bucketing for better compression
// (an integer DateKey and a coarse hour bucket compress far better than raw timestamps)
let
    AddDate = Table.AddColumn(Fact, "DateKey",
        each Date.Year([OrderDateTime]) * 10000
            + Date.Month([OrderDateTime]) * 100
            + Date.Day([OrderDateTime]),
        Int64.Type),
    AddHour = Table.AddColumn(AddDate, "Hour", each Time.Hour([OrderDateTime]), Int64.Type),
    Remove = Table.RemoveColumns(AddHour, {"OrderDateTime"})
in
    Remove
Fix C: replace bi-directional filters with explicit measures
Prefer single-direction filters over a well-designed star schema. Where cross-filtering is required, use CROSSFILTER() inside specific measures, or build virtual relationships with TREATAS(), rather than enabling bi-directional filtering model-wide.
-- DAX: virtual relationship via TREATAS
Sales Amount Visible :=
VAR SelProds = VALUES('Product'[ProductKey])
RETURN
    CALCULATE([Sales Amount], TREATAS(SelProds, 'Fact'[ProductKey]))
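The CROSSFILTER() variant mentioned above scopes bi-directional filtering to a single measure. A minimal sketch, assuming an existing single-direction relationship between 'Fact'[ProductKey] and 'Product'[ProductKey]:

-- Bi-directional filtering for this one measure only, not model-wide
Sales Amount BiDir :=
CALCULATE(
    [Sales Amount],
    CROSSFILTER('Fact'[ProductKey], 'Product'[ProductKey], BOTH)
)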
Fix D: turn DirectQuery sprawl into purpose-built aggregates
Introduce Import-based aggregation tables on key grains (daily, product, region) and map them as aggregations over the DirectQuery detail. Configure Dual storage for shared dimensions. This offloads most visual queries to in-memory scans while preserving drill-through to detail.
-- Aggregation mapping concept (configured in model properties, not in DAX)
-- Aggregation table: FactSales_Agg (Import)
-- Detail table:      FactSales (DirectQuery)
-- GroupBy columns:   DateKey, ProductKey, StoreKey
-- Summarization:     Sum of NetAmount
Fix E: recalibrate incremental refresh
Partition by date with realistic windows. Set "Store rows" to business retention needs and "Refresh rows" to the smallest period with change risk. Use "Detect data changes" only when a reliable LastModified column exists and is indexed in the source.
// M parameters for incremental refresh (RangeStart/RangeEnd are reserved parameter names)
RangeStart = #datetime(2025, 1, 1, 0, 0, 0)
RangeEnd   = #datetime(2025, 9, 1, 0, 0, 0)

// Inclusive lower bound, exclusive upper bound: avoids double-counting rows on partition edges
Filtered = Table.SelectRows(Fact, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
Fix F: scale and harden the gateway cluster
Deploy multiple gateway nodes behind the cluster; pin the gateway to dedicated compute with fast disks; increase mashup container limits prudently; and enable persistent connections to sources. Align Windows power settings and NIC offload options to avoid throttling under load.
# Example: gateway configuration keys (Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config)
# Key names vary by gateway version; verify against current gateway documentation before applying.
MashupDefaultPoolContainerMaxCount=12   # container cap; assumed available on this gateway version
MashupDisableContainerAutoCleanup=false
MashupMemoryLimitInMB=4096
MashupGCOnMemoryPressure=true
Fix G: capacity scheduling and isolation
Stagger refresh schedules to avoid top-of-the-hour surges. Reserve a dedicated capacity for critical semantic models or use workload settings to cap background refresh CPU. Separate heavy dataflows from interactive report workspaces to reduce contention.
Fix H: DAX measure optimization where it counts
Avoid row context where filter context suffices: prefer CALCULATE filter arguments over FILTER scans of entire tables, and keep iterators off high-cardinality columns. Precompute expensive calculated columns in Power Query where possible, taking care not to break folding in the process.
-- Slow pattern: row-by-row filter over the full date table on every evaluation
Sales LY Slow :=
CALCULATE(
    [Sales Amount],
    FILTER(ALL('Date'), YEAR('Date'[Date]) = YEAR(MAX('Date'[Date])) - 1)
)

-- Faster: set-based shift on a marked date table
-- (at year grain both return the prior year; the difference is the evaluation pattern)
Sales LY :=
CALCULATE([Sales Amount], DATEADD('Date'[Date], -1, YEAR))
Fix I: reduce visual chatter
Limit visuals per page; disable unnecessary cross-highlighting; pre-aggregate with calculation groups for common time intelligence rather than per-visual "quick measures". Test with Performance Analyzer to count queries per interaction.
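Calculation groups make each time-intelligence variant a single calculation item applied to any base measure, instead of one quick measure per visual. A minimal sketch of a calculation item's expression, assuming a "Time Intelligence" calculation group authored through an XMLA tool such as Tabular Editor:

-- "LY" calculation item: SELECTEDMEASURE() stands in for the visual's base measure
CALCULATE(SELECTEDMEASURE(), DATEADD('Date'[Date], -1, YEAR))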
Fix J: semantic model hygiene via XMLA
Connect through the XMLA endpoint and verify segment sizes, encoding strategies, and dictionary cardinalities. Repartition large tables to keep per-partition row counts within optimal compression ranges (e.g., 5–20 million rows depending on column widths).
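A sketch of the inspection itself, using standard Analysis Services DMVs over the XMLA endpoint (tables and columns come back as internal IDs; map them to model names when analyzing):

-- Rows and bytes per column segment
SELECT TABLE_ID, COLUMN_ID, SEGMENT_NUMBER, RECORDS_COUNT, USED_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS

-- Per-column dictionary sizes (dictionary pressure)
SELECT TABLE_ID, COLUMN_ID, DICTIONARY_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
WHERE COLUMN_TYPE = 'BASIC_DATA'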
Deep dives: practical examples
Diagnosing a refresh that stays in "Evaluating" for hours
Pattern: A 500M-row fact table with multiple Add Column steps and a custom error handler lost folding. The gateway pulled raw data and evaluated it row by row. Fix: move error handling into the source view, keep only type conversions and filters in M, and split the fact into yearly partitions with incremental refresh. Result: refresh time dropped from 9 hours to 70 minutes; capacity CPU stabilized.
DirectQuery report with 6+ second slicer latency
Pattern: A composite model used DirectQuery for a large detail table and bi-directional relationships to a "tags" lookup. Each interaction sent 8+ SQL statements with string joins on high-cardinality columns. Fix: add Import aggregations for the hot measures, convert "tags" into a many-to-many design via a bridge table, and use TREATAS in measures. Result: slicer latency dropped below 700 ms; source CPU remained under 30%.
Gateway cluster that "works at night, fails by day"
Pattern: Daytime concurrency caused the gateway to exceed default mashup container caps and recycle frequently, adding TLS negotiation overhead. Fix: scale out to three nodes, increase container caps moderately, enable HTTP/2 to sources that support it, and pin refresh schedules away from peak interactive windows. Result: refresh success rate reached > 99.5% with consistent durations.
Design principles for the long term
Model to the star
Adopt a disciplined star schema with conformed dimensions. Avoid snowflaking unless necessary. Keep relationships single-direction and many-to-one; use TREATAS selectively for advanced scenarios.
Choose storage intentionally
Default to Import for core analytic grains with Dual dimensions. Use DirectQuery only when data freshness requirements mandate it or when data sovereignty prevents import. If using DirectQuery, design with aggregations from the start.
Partition for refresh, not for query
Choose partition keys that align with change patterns (e.g., DateKey). Keep partition counts manageable; too many tiny partitions increase metadata overhead and slow refresh orchestration.
Separate concerns by workspace and capacity
Place heavy dataflows and ELT into a separate workspace/capacity from interactive reporting. Apply workload settings to bound background processes. This isolates user experience from refresh noise.
Automate governance
Use deployment pipelines and source control (model JSON via Tabular Editor or similar) to track changes. Enforce linting rules: no bi-directional relationships without approval, no DirectQuery without aggregations, and no M steps that break folding on large sources.
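One concrete shape such a lint can take is a Tabular Editor C# script run in CI. A hedged sketch, assuming Tabular Editor's scripting model (Model, Relationships, and the Error output helper); adapt the check to your own rule engine:

// Flag bi-directional relationships during a pipeline gate
foreach (var rel in Model.Relationships)
{
    if (rel.CrossFilteringBehavior == CrossFilteringBehavior.BothDirections)
    {
        // Surface as an error so the deployment gate fails for review
        Error("Bi-directional relationship found: " + rel.Name);
    }
}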
Performance optimization checklist
- Confirm query folding for every large table; document any non-foldable steps and justify them.
- Reduce high-cardinality columns via surrogate keys, bucketing, or moving them to DirectQuery detail.
- Replace bi-directional relationships with explicit measures or bridge tables.
- Introduce Import aggregations mapped to DirectQuery detail; test hit rates with Performance Analyzer.
- Recalibrate incremental refresh: realistic windows, correct change detection column, indexed at source.
- Scale gateway clusters; monitor mashup container counts and enable persistent connections.
- Stagger refresh schedules; avoid top-of-the-hour collisions; align with capacity maintenance windows.
- Refactor DAX: leverage calculation groups and avoid row context where filter context suffices.
- Audit capacity metrics weekly: CPU seconds, evictions, query durations, refresh queue length.
- Use XMLA to inspect segment sizes and reprocess partitions that drift from optimal compression.
Pitfalls and anti patterns
- Allowing self service "quick measures" to proliferate without review; they often hide expensive FILTERs.
- Modeling many-to-many relationships with bi-directional filters across large tables.
- Incremental refresh with "Detect data changes" pointing to a non-deterministic column.
- Relying on DirectQuery to "solve" memory pressure without aggregations.
- Scheduling all refreshes on the hour because it is "easier to remember".
- Embedding M custom functions in the hot path of billion-row tables.
Reference implementation snippets
1) Robust "as of" measure without bi-directional filters
-- DAX: inventory "as of" the selected date, without bi-directional filters
Inventory As Of :=
VAR d = MAX('Date'[Date])
RETURN
    CALCULATE([Inventory Qty], FILTER(ALL('Date'), 'Date'[Date] <= d))
2) Safe time intelligence with a marked date table
-- DAX: YoY with calculation group support
Sales YoY := DIVIDE([Sales Amount] - [Sales LY], [Sales LY])
3) M pattern for late type conversion after folding
// Convert types after filter/projection to keep folding
let
    Source = Sql.Database("corp-sql", "OpsDW"),
    T = Source{[Schema = "dbo", Item = "FactOps"]}[Data],
    Step1 = Table.SelectRows(T, each [EventDate] >= #date(2024, 1, 1)),
    Step2 = Table.SelectColumns(Step1, {"EventDate", "SiteKey", "MetricA"}),
    Typed = Table.TransformColumnTypes(Step2, {{"MetricA", Int64.Type}})
in
    Typed
4) XMLA-driven repartitioning plan (concept)
// Pseudocode: keep partitions at roughly 10M rows each
for each table in model:
    if table.Rows > 200M:
        create monthly partitions
    else:
        create quarterly partitions
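Executed over the XMLA endpoint, one step of that plan is a TMSL command. A hedged sketch that creates a single monthly partition; the database, table, and partition names are hypothetical, and the M expression is abbreviated to the date filter:

{
  "createOrReplace": {
    "object": {
      "database": "SalesModel",
      "table": "FactSales",
      "partition": "FactSales 2025-08"
    },
    "partition": {
      "name": "FactSales 2025-08",
      "source": {
        "type": "m",
        "expression": "let F = Table.SelectRows(Fact, each [OrderDate] >= #date(2025,8,1) and [OrderDate] < #date(2025,9,1)) in F"
      }
    }
  }
}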
Operational guardrails
Alerting and SLOs
Define SLOs for refresh completion times, report TTFB (time to first byte), and capacity CPU headroom. Alert on trends, not only spikes: a rising 7-day average refresh time usually precedes failures.
Change management
Any model change that alters table cardinality, relationships, or storage mode must go through a performance gate. Capture a before/after snapshot of memory size and Performance Analyzer traces; do not promote if either regresses beyond agreed budgets.
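For the memory-size half of that snapshot, one hedged option is the standard Analysis Services memory DMV over the XMLA endpoint (objects are reported by internal ID; aggregate before comparing runs):

-- Before/after memory snapshot per object
SELECT OBJECT_ID, OBJECT_MEMORY_NONSHRINKABLE
FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE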
Cost control
Size capacities for peak plus margin; revisit when new dataflows or models join. Promote aggregations to reduce DirectQuery cost at the sources and capacity compute during business hours.
Conclusion
Enterprise Power BI reliability hinges on respecting how the platform moves and shapes data: folding in M, encoding in VertiPaq, filter propagation in DAX, and concurrency across gateways and capacities. The failures you see (stalled refreshes, slow visuals, capacity throttling) are predictable byproducts of broken folding, high cardinality, bi-directional designs, DirectQuery chattiness, and uncoordinated scheduling. By restoring folding, reducing cardinality, eliminating global bi-directional filters, introducing Import aggregations, recalibrating incremental refresh, scaling gateways, and isolating workloads on Premium capacities, you build a system that stays fast and stable as data and teams grow. Make these fixes systematic, codified in pipelines, linting, and reviews, and Power BI becomes an engine for insight rather than an operational fire drill.
FAQs
1. How do I prove a refresh is slow due to broken folding rather than VertiPaq encoding?
Capture Power Query diagnostics and compare "Data Retrieval" vs "Evaluation" times. If retrieval dominates and "View Native Query" is disabled after a specific step, the issue is folding; refactor that step or move it upstream.
2. When should I choose DirectQuery over Import?
Only when strict freshness or data sovereignty requires it. If interactivity matters, pair DirectQuery with Import aggregations and Dual dimensions so most queries hit in-memory data while detail remains at the source.
3. Why did incremental refresh increase my model size?
Likely the partition ranges are too granular, or "Detect data changes" is forcing unnecessary reprocessing. Re-verify the change-detection column and adjust "Refresh rows" to the smallest moving window that actually changes.
4. What's the fastest way to reduce slicer latency on a slow DirectQuery page?
Cut visuals per page, disable unneeded cross-highlighting, and add Import aggregations for the measures used by slicers. Validate aggregation hit rates and ensure dimensions used in slicers are Dual or Import.
5. How can I prevent top of the hour capacity spikes?
Stagger refresh schedules using randomized offsets, move heavy dataflows to a separate capacity, and cap background CPU with workload settings. Consider a "refresh ring" policy that assigns time windows by team to avoid stampedes.