Detecting Kubernetes Nodes Running Only DaemonSet Pods – A Deep Dive
A real-world story from Dedico Servers about PromQL struggles, Helm templating, alert design, and operational savings.
Executive Summary
At Dedico Servers, we specialize in building efficient, cost-optimized Kubernetes clusters.
In this article, we engineer a Prometheus-based alert to detect nodes running only DaemonSet pods — an operational and financial risk.
By tackling this hidden inefficiency, we help our clients save thousands of dollars annually while improving the resilience of their clusters.
Table of Contents
- Background
- Why It Matters: Operational Impact and Cost Waste
- The Goal
- Step-by-Step Journey
- Final Helm Template
- Conclusion
- About Dedico Servers
Background
In Kubernetes, a DaemonSet ensures that a copy of a pod runs on every eligible node (all nodes, or those matching its node selector and tolerations).
However, nodes sometimes end up running only DaemonSet pods, with no business-critical workloads at all. This can indicate:
- Scheduling issues
- Misconfiguration
- Resource shortages
- Node taints blocking scheduling
Detecting DaemonSet-only nodes improves both operational excellence and cost optimization.
Why It Matters: Operational Impact and Cost Waste
Operational Risks
- Idle nodes can signal pod scheduling bottlenecks.
- They can mask scaling issues or cluster misconfigurations.
- Fragmented node pools weaken failover resilience.
Cost Waste 💸
Each Kubernetes node typically costs $20–$200+ per month depending on its instance type.
Nodes running only DaemonSet-managed system pods (e.g., `kube-proxy`, `ebs-csi-node`) without any real workloads:
- Waste money on unused compute.
- Increase cloud bills without adding business value.
- Drain budgets silently.
Example
10 idle nodes × $100/month × 12 months = $12,000 wasted per year.
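The same arithmetic scales to any fleet size and per-node price. A minimal Python sketch; the helper name `annual_idle_cost` is ours and purely illustrative:

```python
def annual_idle_cost(idle_nodes: int, monthly_cost_per_node: float) -> float:
    """Yearly spend on nodes that run only DaemonSet pods."""
    months_per_year = 12
    return idle_nodes * monthly_cost_per_node * months_per_year


# The example above: 10 idle nodes at $100/month each.
print(annual_idle_cost(10, 100))  # 12000

# The same 10 nodes at the low and high ends of the $20-$200 range.
print(annual_idle_cost(10, 20), annual_idle_cost(10, 200))  # 2400 24000
```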
The Goal
Create a PrometheusRule that:
- Detects nodes only running DaemonSet pods.
- Ignores special nodes based on labels.
- Is fully dynamic and templated with Helm.
- Sends Slack-friendly, human-readable alerts.
Step-by-Step Journey
1. Finding the Right Metrics
We explored:
```promql
kube_pod_owner{owner_kind="DaemonSet", pod="ebs-csi-node-cqwrx"}
```
Problem
`kube_pod_owner` shows the pod's owner but not the node it runs on.
2. Mapping Pods to Nodes
Solution
Use `kube_pod_info` to find the node where a pod is running, and join `kube_pod_owner` with it:
```promql
kube_pod_owner{owner_kind!="DaemonSet"}
* on (pod, namespace)
group_left(node)
kube_pod_info
```
✅ Now we could associate pod ownership and node scheduling.
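To make the join semantics concrete, here is a simplified Python model of `* on (pod, namespace) group_left(node)`. It is not how Prometheus implements vector matching; the sample series and the `group_left_mul` helper are hypothetical, and label sets are trimmed to the labels the join uses:

```python
# Hypothetical sample series, trimmed to the labels that matter for the join.
kube_pod_owner = [
    ({"namespace": "shop", "pod": "api-7d9f", "owner_kind": "ReplicaSet"}, 1.0),
    ({"namespace": "kube-system", "pod": "ebs-csi-node-cqwrx", "owner_kind": "DaemonSet"}, 1.0),
]
kube_pod_info = [
    ({"namespace": "shop", "pod": "api-7d9f", "node": "node-a"}, 1.0),
    ({"namespace": "kube-system", "pod": "ebs-csi-node-cqwrx", "node": "node-b"}, 1.0),
]


def group_left_mul(left, right, on, include):
    """Simplified model of `left * on(<on>) group_left(<include>) right`."""
    # The "one" side (right) must be unique per match key.
    index = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate right-hand series for match group {key}")
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # unmatched left-hand series are dropped
        right_labels, right_value = index[key]
        out = dict(labels)
        for name in include:
            # Copy the requested labels from the right-hand side.
            if right_labels.get(name):
                out[name] = right_labels[name]
            else:
                out.pop(name, None)
        result.append((out, value * right_value))
    return result


# kube_pod_owner{owner_kind!="DaemonSet"}
non_daemonset = [s for s in kube_pod_owner if s[0]["owner_kind"] != "DaemonSet"]
joined = group_left_mul(non_daemonset, kube_pod_info, on=("pod", "namespace"), include=("node",))
for labels, value in joined:
    print(labels["pod"], "->", labels["node"], value)  # api-7d9f -> node-a 1.0
```

Left-hand series with no partner on the right simply disappear, so only pods that `kube_pod_info` knows about ever reach the per-node count.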
3. Solving Many-to-Many Matching Errors
Problem
Pod names are only unique within a namespace, so joining on `pod` alone matched same-named pods from different namespaces and caused matching errors.
Solution
Always join on both `pod` and `namespace`:
```promql
* on (pod, namespace)
```
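A quick way to see the collision is to count how often each match key appears on the side that must be unique. The pods and the `duplicate_match_groups` helper below are hypothetical:

```python
from collections import Counter

# Two hypothetical pods that share a name in different namespaces.
kube_pod_info = [
    {"namespace": "team-a", "pod": "worker-0", "node": "node-a"},
    {"namespace": "team-b", "pod": "worker-0", "node": "node-b"},
]


def duplicate_match_groups(series, on):
    """Match keys that occur more than once on the side that must be unique."""
    counts = Counter(tuple(labels[name] for name in on) for labels in series)
    return [key for key, n in counts.items() if n > 1]


print(duplicate_match_groups(kube_pod_info, on=("pod",)))              # [('worker-0',)]
print(duplicate_match_groups(kube_pod_info, on=("pod", "namespace")))  # []
```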
4. Handling Absence Detection
Problem
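Initial attempts using `absent()` failed. `absent()` only tells you whether an entire vector is empty; it cannot report which individual nodes lack non-DaemonSet pods, because a node with no matching pods contributes no series to group by.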
Solution
Start from the set of all nodes and subtract the nodes that do run non-DaemonSet pods.
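A simplified Python model of `absent()` shows why it cannot answer a per-node question. The sample data and helpers are hypothetical, and the model ignores the labels that the real `absent()` copies from equality matchers in its selector:

```python
def absent(vector):
    """Simplified PromQL absent(): a single series with value 1 when the input
    vector is empty, otherwise nothing. It never says WHICH group is missing."""
    return [({}, 1.0)] if not vector else []


# Hypothetical non-DaemonSet pods, already joined to their node.
non_daemonset_pods = [({"pod": "api-7d9f", "node": "node-a"}, 1.0)]
nodes = ["node-a", "node-b"]

# Cluster-wide there ARE non-DaemonSet pods, so absent() returns nothing,
# even though node-b has none of them.
print(absent(non_daemonset_pods))  # []

# The question we actually need answered is per node.
busy = {labels["node"] for labels, _ in non_daemonset_pods}
print([n for n in nodes if n not in busy])  # ['node-b']
```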
5. Using Set Logic Correctly
Solution
Use `unless` for set difference:
```promql
(
  count by (node) (kube_node_info)
)
unless
(
  count by (node) (
    kube_pod_owner{owner_kind!="DaemonSet"}
    * on (pod, namespace) group_left(node)
    kube_pod_info
  ) > bool 0
)
```
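✅ This correctly finds DaemonSet-only nodes: every node reported by `kube_node_info`, minus every node running at least one non-DaemonSet pod. The `> bool 0` is harmless but redundant, since `count` never yields 0 and `bool` keeps every series.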
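The same set logic as a hypothetical Python model: `count by (node)` becomes a dictionary keyed by node, and `unless` keeps left-hand keys that have no right-hand match. This illustrates the semantics under simplified assumptions; it is not Prometheus internals, and the sample cluster is made up:

```python
from collections import Counter

# Hypothetical cluster state.
kube_node_info = [{"node": "node-a"}, {"node": "node-b"}, {"node": "node-c"}]
non_daemonset_pods = [  # output of the kube_pod_owner / kube_pod_info join
    {"pod": "api-7d9f", "node": "node-a"},
    {"pod": "web-5c2b", "node": "node-a"},
    {"pod": "db-0", "node": "node-c"},
]


def count_by(series, label):
    """Simplified `count by (<label>) (series)`: one entry per label value."""
    return dict(Counter(s[label] for s in series))


def gt_bool_zero(vector):
    """`> bool 0`: every entry is kept; its value becomes 1.0 or 0.0."""
    return {k: 1.0 if v > 0 else 0.0 for k, v in vector.items()}


def unless(left, right):
    """`left unless right`: keep left entries whose key has no right-hand match.
    Right-hand values are ignored; only presence matters."""
    return {k: v for k, v in left.items() if k not in right}


all_nodes = count_by(kube_node_info, "node")
busy_nodes = gt_bool_zero(count_by(non_daemonset_pods, "node"))
print(sorted(unless(all_nodes, busy_nodes)))  # ['node-b']
```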
6. Enriching Node Labels
We wanted to display these node labels in the alert:
- `node/role`
- `node/env`
- `instance-type`
Solution
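kube-state-metrics exposes node labels on `kube_node_labels` as `label_<name>`, with characters such as `/` replaced by `_` (so `node/role` becomes `label_node_role`). Join that metric on `node` and copy the values in with `group_left`; the `label_kubernetes_io_hostname!=""` matcher keeps only series that carry a hostname label:
```promql
* on (node)
group_left(label_node_env, label_node_role)
kube_node_labels{label_kubernetes_io_hostname!=""}
```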
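A simplified Python model of this enrichment join, using hypothetical `kube_node_labels` series and a hypothetical `enrich` helper. Because `kube_node_labels` always has the value 1, the multiplication keeps the left-hand value and only adds labels:

```python
# Hypothetical kube_node_labels series; label_* names follow kube-state-metrics' convention.
kube_node_labels = [
    {"node": "node-b", "label_kubernetes_io_hostname": "node-b",
     "label_node_env": "prod", "label_node_role": "workers"},
    {"node": "node-c", "label_kubernetes_io_hostname": "node-c",
     "label_node_env": "prod", "label_node_role": "db"},
]
daemonset_only = {"node-b": 1.0}  # hypothetical output of the `unless` step, keyed by node


def enrich(vector, node_labels, include):
    """Simplified `vector * on (node) group_left(<include>)
    kube_node_labels{label_kubernetes_io_hostname!=""}`."""
    # Selector filter: keep only series with a non-empty hostname label.
    by_node = {
        s["node"]: s
        for s in node_labels
        if s.get("label_kubernetes_io_hostname", "") != ""
    }
    result = []
    for node, value in vector.items():
        if node not in by_node:
            continue  # no matching labels series: the join drops this node
        labels = {"node": node}
        for name in include:
            if by_node[node].get(name):
                labels[name] = by_node[node][name]
        # kube_node_labels has the value 1, so the product is the left-hand value.
        result.append((labels, value))
    return result


enriched = enrich(daemonset_only, kube_node_labels, ("label_node_env", "label_node_role"))
for labels, value in enriched:
    print(labels, value)
```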
7. Ignoring Special Nodes Dynamically
Certain nodes (e.g., system nodes) are expected to have only DaemonSets.
Solution
Template an `ignoreLabels` map dynamically in the Helm values:
```yaml
ignoreLabels:
  label_node_role: core-services
  label_node_env: infra
```
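The chart turns each entry into its own `unless` clause (Go templates iterate map keys in sorted order), so the rendered PromQL is:
```promql
unless on(node) kube_node_labels{ label_node_env="infra" }
unless on(node) kube_node_labels{ label_node_role="core-services" }
```
Because the clauses are chained, a node is ignored if it matches any one of these labels.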
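A Python sketch that mimics the chart's `range` loop and the filter semantics it produces. The helper names are hypothetical; `sorted()` mirrors the sorted key order in which Go templates iterate over maps:

```python
def render_ignore_clauses(ignore_labels):
    """Mimic the Helm range loop: one `unless` clause per key, in sorted key order."""
    return "\n".join(
        f'unless on(node) kube_node_labels{{ {key}="{value}" }}'
        for key, value in sorted(ignore_labels.items())
    )


def is_ignored(node_labels, ignore_labels):
    """Chained `unless` clauses drop a node if ANY one ignore label matches."""
    return any(node_labels.get(key) == value for key, value in ignore_labels.items())


ignore_labels = {"label_node_role": "core-services", "label_node_env": "infra"}
print(render_ignore_clauses(ignore_labels))

# A core-services node in the prod environment is still ignored (OR semantics).
print(is_ignored({"label_node_role": "core-services", "label_node_env": "prod"}, ignore_labels))  # True
```

If you instead want to ignore only nodes that match all listed labels, the matchers have to share one selector, e.g. `unless on(node) kube_node_labels{label_node_env="infra", label_node_role="core-services"}`.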
8. Formatting Alerts for Slack
To make alerts Slack-friendly, we wrapped important parts in backticks:
```yaml
summary: "DaemonSet-only node detected: `{{ $labels.node }}`"
description: |
  *Problem*: Node `{{ $labels.node }}` is running only DaemonSet pods.
  *Node Labels*:
  - `node/env`: `{{ $labels.label_node_env }}`
  - `node/role`: `{{ $labels.label_node_role }}`
  *Impact*: No regular workloads are running on this node.
```
✅ Easy-to-read, professional Slack alerts.
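To preview the description once `$labels` is expanded, here is a hypothetical Python stand-in that uses plain string formatting instead of Prometheus's Go templating. How the annotation is laid out in Slack still depends on your Alertmanager Slack receiver template, which this sketch does not model:

```python
def render_description(labels):
    """Fill the description annotation the way Prometheus expands
    {{ $labels.<name> }} (simplified to plain string formatting)."""
    return (
        f"*Problem*: Node `{labels['node']}` is running only DaemonSet pods.\n"
        "*Node Labels*:\n"
        f"- `node/env`: `{labels['label_node_env']}`\n"
        f"- `node/role`: `{labels['label_node_role']}`\n"
        "*Impact*: No regular workloads are running on this node.\n"
    )


# Hypothetical label set of one firing alert.
alert_labels = {"node": "node-b", "label_node_env": "prod", "label_node_role": "workers"}
print(render_description(alert_labels))
```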
Final Helm Template
```yaml
{{- $ignoreLabels := .Values.ignoreLabels }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: daemonset-only-nodes
spec:
  groups:
    - name: daemonset-only-nodes.rules
      rules:
        - alert: DaemonSetOnlyNodeDetected
          expr: |
            (
              (count by (node) (kube_node_info))
              unless
              (count by (node) (
                kube_pod_owner{owner_kind!="DaemonSet"}
                * on (pod, namespace) group_left(node)
                kube_pod_info
              ) > bool 0)
            )
            * on (node)
            group_left(label_node_env, label_node_role)
            kube_node_labels{label_kubernetes_io_hostname!=""}
            {{- if $ignoreLabels }}
            {{- range $key, $value := $ignoreLabels }}
            unless on(node) kube_node_labels{ {{ printf "%s=\"%s\"" $key $value }} }
            {{- end }}
            {{- end }}
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "DaemonSet-only node detected: `{{`{{ $labels.node }}`}}`"
            description: |
              *Problem*: Node `{{`{{ $labels.node }}`}}` is running only DaemonSet pods.
              *Node Labels*:
              - `node/env`: `{{`{{ $labels.label_node_env }}`}}`
              - `node/role`: `{{`{{ $labels.label_node_role }}`}}`
              *Impact*: No regular workloads are running on this node.
```
Conclusion
Building complex Kubernetes monitoring with Prometheus is not always straightforward.
However, with the right techniques and templates, you can detect hidden inefficiencies, prevent operational risks, and save thousands annually.
Key takeaways:
- Understand PromQL joins and label matching.
- Prefer `unless` over `-` for set exclusion.
- Template dynamic ignore rules via Helm.
- Format alerts for maximum Slack readability.
- Idle nodes cost real money — observability saves budgets.
About Dedico Servers
At Dedico Servers, we build resilient, efficient, and cost-optimized cloud and Kubernetes infrastructure.
If you’d like help optimizing your Kubernetes clusters, monitoring stack, or reducing your cloud costs, contact us today.