Understanding Metric Cardinality in New Relic
What Is Metric Cardinality?
Cardinality refers to the number of unique combinations of metric dimensions (e.g., host, region, pod_id). High-cardinality metrics, such as those carrying per-request labels, dynamic IDs, or session-level dimensions, can generate millions of unique time series and overwhelm New Relic's processing pipeline. The growth is multiplicative: a metric tagged with 50 hosts, 4 regions, and 500 pod IDs can produce up to 50 × 4 × 500 = 100,000 distinct time series on its own.
Architectural Impacts
In large Kubernetes or serverless architectures, auto-generated labels (e.g., pod names, build hashes) can create rapid metric bloat. These over-specified metrics slow down NRQL queries, break dashboards, and inflate ingest costs. Worse, they may cause dropped data when cardinality limits are reached—without obvious warnings.
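When data is dropped because a limit was hit, New Relic typically records the violation in the NrIntegrationError event type, so the trail is recoverable even if no alert fired. A hedged example query (the exact category and message values vary by account and feature):
SELECT count(*) FROM NrIntegrationError WHERE newRelicFeature = 'Metrics' FACET category, message SINCE 1 day ago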
Diagnostic Strategy
Identify High-Cardinality Metrics
Use the New Relic Data Explorer to inspect which metrics carry the most time series. A simple starting point is to count distinct values of a suspect dimension per metric name (here host; substitute whichever attribute you expect to be volatile):
SELECT uniqueCount(host) FROM Metric FACET metricName SINCE 30 minutes ago LIMIT 20
Also use the Telemetry Data Platform to audit custom metric ingest volume and granularity.
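To see which custom metrics contribute most to ingest volume, a bytecountestimate() query works well; the 'custom.%' filter below is only an assumption about your naming convention:
SELECT bytecountestimate()/1e6 AS 'Estimated MB' FROM Metric WHERE metricName LIKE 'custom.%' FACET metricName SINCE 1 day ago LIMIT 20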
Analyze NRQL Query Performance
If dashboards take longer than a few seconds to load, inspect slow NRQL queries with the Query Analysis tool:
SELECT count(*) FROM NrQuery WHERE duration > 2000 FACET query SINCE 1 hour ago
Check Log Correlation Tags
Log forwarding agents may unintentionally inject high-cardinality tags (e.g., UUIDs, timestamps) into every trace or span. Review your Fluent Bit or Logstash configuration for dynamic label interpolation.
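As a concrete example, Fluent Bit's record_modifier filter can strip a volatile key before records leave the node; this is a minimal sketch that assumes the offending field is named request_id:
[FILTER]
    Name          record_modifier
    Match         *
    Remove_key    request_id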
Common Pitfalls and Anti-Patterns
Overuse of Dynamic Labels
Attaching request IDs, user IDs, or function hashes to metrics may seem useful but severely degrades performance. These should be handled via logs or traces, not as metric dimensions.
Improper Custom Instrumentation
Custom events and metrics recorded through agent APIs such as recordCustomEvent or recordMetric can unintentionally explode cardinality when they are called inside high-frequency loops or error handlers with per-request attributes:
newrelic.recordCustomEvent("Request", { userId: generateUUID(), status: "500" });
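A lower-cardinality alternative is to keep only bounded attributes on the event and push the unique identifier into a log line, where high cardinality is acceptable (logger and generateUUID are placeholders for whatever your service already uses):
// Bounded attributes only: endpoint groups and status codes, not per-user IDs
newrelic.recordCustomEvent("Request", { endpointGroup: "/api/v1/user", status: "500" });
// The unique ID is still captured, but as a log attribute rather than a metric dimension
logger.info("request failed", { requestId: generateUUID() });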
Abusing NRQL for Real-Time Analytics
NRQL is not optimized for real-time stream processing. Overloading dashboards with thousands of facets (e.g., all hostnames or endpoints) may lead to query timeouts.
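Where a broad facet is genuinely needed, bounding the result set keeps the query responsive, for example:
SELECT count(*) FROM Transaction FACET name SINCE 30 minutes ago LIMIT 20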
Step-by-Step Remediation Plan
Step 1: Audit Metric Dimensions
Review custom metrics and eliminate unnecessary or dynamic attributes. Aggregate where possible:
// Bad
{"endpoint":"/api/v1/user/12345", "status":"200"}

// Good
{"endpoint_group":"/api/v1/user", "status":"200"}
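One way to enforce the "good" shape is to normalize paths before recording. The sketch below assumes a Node.js service instrumented with the New Relic agent; normalizeEndpoint is a hypothetical helper, and req/res stand in for your framework's request and response objects:
const newrelic = require('newrelic');

// Collapse numeric path segments so "/api/v1/user/12345" becomes "/api/v1/user/:id"
function normalizeEndpoint(path) {
  return path.replace(/\/\d+(?=\/|$)/g, '/:id');
}

newrelic.recordCustomEvent('Request', {
  endpoint_group: normalizeEndpoint(req.path),
  status: String(res.statusCode)
});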
Step 2: Filter at the Source
Configure telemetry SDKs and agents (e.g., OpenTelemetry, the New Relic APM agents) to drop or rewrite volatile labels at collection time, using attribute allow/deny lists, regex-based rules, or sampling policies.
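With the New Relic Node.js agent, for example, attribute exclusion rules in newrelic.js keep known-volatile keys out of events entirely; the specific key names below are illustrative:
// newrelic.js (Node.js agent configuration), excerpt
exports.config = {
  attributes: {
    // Wildcards are supported; these patterns drop per-request and per-user keys
    exclude: ['request.parameters.*', 'userId', 'sessionId']
  }
};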
Step 3: Group Metrics with Controlled Facets
Use FACET clauses strategically and avoid querying across volatile dimensions. Instead of:
SELECT average(duration) FROM Transaction FACET userId SINCE 1 hour ago
Do:
SELECT average(duration) FROM Transaction FACET statusCode SINCE 1 hour ago
Step 4: Implement Alerts on Ingest Volume
Create proactive alerts for when your metric ingest rate or cardinality exceeds thresholds:
SELECT rate(count(*), 1 minute) FROM Metric WHERE metricName LIKE 'custom%' FACET metricName
Step 5: Use Dimensional Metric APIs
Switch to dimensional APIs that allow better control over tag granularity and context propagation across telemetry pipelines.
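As a sketch, the snippet below sends one dimensional gauge to New Relic's public Metric API with a small, controlled attribute set; the metric name and attributes are illustrative, and NEW_RELIC_LICENSE_KEY is assumed to hold a valid ingest key (Node 18+ for the built-in fetch):
const payload = [{
  metrics: [{
    name: 'custom.checkout.duration',
    type: 'gauge',
    value: 182,
    timestamp: Date.now(),
    attributes: { endpoint_group: '/api/v1/checkout', region: 'us-east-1' }
  }]
}];

fetch('https://metric-api.newrelic.com/metric/v1', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Api-Key': process.env.NEW_RELIC_LICENSE_KEY
  },
  body: JSON.stringify(payload)
}).catch(console.error);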
Best Practices
- Limit custom metric dimensions to under 10 per metric
- Batch and debounce custom events instead of sending per-request
- Standardize labeling conventions for services and endpoints
- Use logs and traces for high-cardinality data, not metrics
- Regularly audit ingest rates and query latency with NRQL in New Relic (see the example query after this list)
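For the ingest-audit item above, a consumption query such as the following shows where the gigabytes are going (NrConsumption is New Relic's built-in usage event type):
FROM NrConsumption SELECT sum(GigabytesIngested) FACET usageMetric SINCE 30 days ago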
Conclusion
Metric cardinality issues in New Relic often go unnoticed until performance or cost alarms go off. By applying disciplined instrumentation practices, auditing label usage, and filtering unnecessary dimensions, teams can regain control over observability data. Cardinality awareness is essential not only for performance, but also for long-term maintainability of dashboards, alerts, and data pipelines. Make monitoring reliable by designing it like software: intentional, lean, and observable itself.
FAQs
1. How many metric dimensions are too many in New Relic?
As a rule of thumb, keep dimensions under 10 per metric. Anything above that increases the risk of time-series explosion.
2. What are symptoms of cardinality problems in New Relic?
Slow dashboards, dropped metrics, NRQL timeouts, and high ingest costs often point to excessive cardinality.
3. Can I retroactively clean up high-cardinality data?
Not directly. You must adjust your telemetry pipeline and redeploy services to stop future ingestion. Past data may persist until TTL expiry.
4. Does OpenTelemetry solve this issue automatically?
No. OpenTelemetry provides more control, but developers must still configure label filtering and sampling policies manually.
5. Are traces immune to cardinality issues?
No. Spans with high-cardinality attributes can also overwhelm trace stores. Use attribute filtering in collectors and exporters.