Detecting Kubernetes Nodes Running Only DaemonSet Pods – A Deep Dive

A real-world story from Dedico Servers about PromQL struggles, Helm templating, alert design, and operational savings.

Executive Summary

At Dedico Servers, we specialize in building efficient, cost-optimized Kubernetes clusters.
In this article, we engineer a Prometheus-based alert to detect nodes running only DaemonSet pods — an operational and financial risk.

By tackling this hidden inefficiency, we help our clients save thousands of dollars annually while improving the resilience of their clusters.

Background

In Kubernetes, a DaemonSet ensures that a copy of a pod runs on every node, or on every node that matches its node selector or affinity rules.
However, nodes sometimes end up running only DaemonSet pods, with no business-critical workloads at all. This can indicate:

Scheduling issues
Misconfiguration
Resource shortages
Node taints blocking scheduling

Detecting DaemonSet-only nodes improves both operational excellence and cost optimization.

Why It Matters: Operational Impact and Cost Waste

Operational Risks

Idle nodes can point to pod scheduling bottlenecks: capacity exists, but workloads cannot be placed on it.
They can mask scaling issues or cluster misconfigurations.
Fragmented node pools weaken failover resilience.

Cost Waste 💸

Each Kubernetes node typically costs $20–$200+ per month depending on its instance type.

Nodes running only node-level system pods from DaemonSets (e.g., kube-proxy, CNI agents, CSI node drivers, log and metrics collectors) without any real workloads:

Waste money on unused compute.
Increase cloud bills without adding business value.
Drain budgets silently.

Example

10 idle nodes × $100/month × 12 months = $12,000 wasted per year.
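The arithmetic above generalizes to a one-line estimate. This is a minimal Python sketch; the function name annual_waste is hypothetical, and the inputs are the illustrative figures from the example rather than measured prices.

```python
# Back-of-the-envelope yearly cost of DaemonSet-only nodes.
# Inputs are illustrative (10 nodes at $100/month), not real billing data.


def annual_waste(idle_nodes: int, monthly_cost_per_node: float) -> float:
    """Yearly spend on nodes that run no business workloads."""
    months_per_year = 12
    return idle_nodes * monthly_cost_per_node * months_per_year


print(f"${annual_waste(10, 100):,.0f} per year")
```

Plugging in the $20 to $200+ per-node range instead shows how quickly the figure scales with instance size.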

The Goal

Create a PrometheusRule that:

Detects nodes running only DaemonSet pods.
Ignores special nodes based on labels.
Is fully dynamic and templated with Helm.
Sends Slack-friendly, human-readable alerts.

Step-by-Step Journey

1. Finding the Right Metrics

We explored:

kube_pod_owner{owner_kind="DaemonSet", pod="ebs-csi-node-cqwrx"}

Problem

kube_pod_owner shows the pod’s owner but not the node.

2. Mapping Pods to Nodes

Solution

Use kube_pod_info to find the node where a pod is running.

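Join kube_pod_owner with kube_pod_info, keeping only pods whose owner is not a DaemonSet and copying the node label onto each result with group_left: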

kube_pod_owner{owner_kind!="DaemonSet"}
* on (pod, namespace)
group_left(node)
kube_pod_info

✅ Now we could associate pod ownership and node scheduling.
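To make the join semantics concrete, here is a small Python toy model of "* on (pod, namespace) group_left(node)", treating an instant vector as a list of (labels, value) pairs. It is a sketch under simplifying assumptions (no metric names, a single evaluation instant), not how Prometheus implements vector matching; the helper join_group_left and the sample pod and node names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Dict[str, str]
Vector = List[Tuple[Labels, float]]


def join_group_left(left: Vector, right: Vector,
                    on: Sequence[str], include: Sequence[str]) -> Vector:
    """Toy model of: left * on(<on>) group_left(<include>) right."""
    # Index the right-hand ("one") side by its match key; a missing label counts as "".
    index: Dict[Tuple[str, ...], Tuple[Labels, float]] = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"many-to-many matching: duplicate right-hand key {key}")
        index[key] = (labels, value)

    result: Vector = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # no partner on the right: the left series is dropped
        right_labels, right_value = index[key]
        out = dict(labels)
        for name in include:  # copy the group_left(...) labels from the right side
            out[name] = right_labels.get(name, "")
        result.append((out, value * right_value))
    return result


# Hypothetical sample series (kube-state-metrics emits these with value 1).
kube_pod_owner: Vector = [
    ({"namespace": "kube-system", "pod": "ebs-csi-node-cqwrx", "owner_kind": "DaemonSet"}, 1),
    ({"namespace": "shop", "pod": "checkout-7d9f8", "owner_kind": "ReplicaSet"}, 1),
]
kube_pod_info: Vector = [
    ({"namespace": "kube-system", "pod": "ebs-csi-node-cqwrx", "node": "node-a"}, 1),
    ({"namespace": "shop", "pod": "checkout-7d9f8", "node": "node-b"}, 1),
]

# kube_pod_owner{owner_kind!="DaemonSet"}
non_daemonset = [s for s in kube_pod_owner if s[0].get("owner_kind") != "DaemonSet"]
joined = join_group_left(non_daemonset, kube_pod_info,
                         on=("pod", "namespace"), include=("node",))
print(joined)
```

Only the ReplicaSet pod survives, now carrying node="node-b", which is exactly the information kube_pod_owner alone could not provide.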

3. Solving Many-to-Many Matching Errors

Problem

Joining on pod alone failed with many-to-many matching errors, because the same pod name can exist in more than one namespace.

Solution

Always join on both pod and namespace:

* on (pod, namespace)
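The failure can be reproduced with a toy uniqueness check. In a group_left join, the right-hand side may contain at most one series per match key. The sketch below uses a hypothetical duplicate_keys helper and made-up StatefulSet pods to show that pod alone is not a unique key, while (pod, namespace) is.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = List[Tuple[Dict[str, str], float]]


def duplicate_keys(series: Vector, on: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return the on(...) match keys that occur more than once in `series`.

    For `... * on(<on>) group_left(...) right`, each key must be unique on the
    right-hand side; any key returned here would make the evaluation fail.
    """
    counts = Counter(tuple(labels.get(name, "") for name in on) for labels, _ in series)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical: the same StatefulSet pod name in two namespaces.
kube_pod_info: Vector = [
    ({"namespace": "staging", "pod": "web-0", "node": "node-a"}, 1),
    ({"namespace": "prod", "pod": "web-0", "node": "node-b"}, 1),
]

print(duplicate_keys(kube_pod_info, on=("pod",)))              # [('web-0',)]
print(duplicate_keys(kube_pod_info, on=("pod", "namespace")))  # []
```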

4. Handling Absence Detection

Problem

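Initial attempts using absent() failed. absent() returns a single series only when its entire input vector is empty, so it cannot report which individual nodes lack non-DaemonSet pods: a node without such pods simply contributes no series to test.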

Solution

Instead of looking for missing data, start from the full set of nodes and remove every node that has at least one non-DaemonSet pod.
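A toy model shows why absent() cannot do per-node detection. The absent function below only approximates PromQL's absent() (it ignores the labels the real function can copy from equality matchers in a plain selector), and the node names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[Tuple[Dict[str, str], float]]


def absent(vector: Vector) -> Vector:
    """Toy PromQL absent(): one series with value 1 if the input is empty,
    otherwise nothing. Label copying from selectors is not modelled."""
    return [({}, 1.0)] if not vector else []


# Hypothetical per-node counts of non-DaemonSet pods, i.e. count by (node) (...).
# node-a has no such pods, so it has no series at all.
workload_counts: Vector = [({"node": "node-b"}, 3.0)]
all_nodes = ["node-a", "node-b"]

# absent() only sees that the vector is non-empty, so node-a goes unnoticed.
print(absent(workload_counts))  # []

# Starting from the full node list and removing busy nodes does find it.
busy = {labels["node"] for labels, _ in workload_counts}
print([n for n in all_nodes if n not in busy])  # ['node-a']
```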

5. Using Set Logic Correctly

Solution

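Use unless for set difference. unless keeps each series on the left that has no label-matching series on the right and ignores the right-hand values, so the right side needs no comparison:

(
  count by (node) (kube_node_info)
)
unless
(
  count by (node) (
    kube_pod_owner{owner_kind!="DaemonSet"}
    * on (pod, namespace) group_left(node)
    kube_pod_info
  )
)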

✅ This correctly finds DaemonSet-only nodes.
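The same set difference as a Python toy model: unless keeps the left-hand series whose matching labels do not appear on the right, and never looks at the right-hand values. Plain unless matches on all labels; since both sides carry only node after count by (node), matching on node is equivalent here. unless_on and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[Tuple[Dict[str, str], float]]


def unless_on(left: Vector, right: Vector, on: Sequence[str]) -> Vector:
    """Toy model of `left unless on(<on>) right`.

    Keeps each left series whose on(...) label values never appear on the
    right. Right-hand values are ignored: only the existence of a match counts.
    """
    right_keys = {tuple(labels.get(n, "") for n in on) for labels, _ in right}
    return [(labels, value) for labels, value in left
            if tuple(labels.get(n, "") for n in on) not in right_keys]


# count by (node) (kube_node_info): one series per node (hypothetical names).
nodes: Vector = [({"node": "node-a"}, 1), ({"node": "node-b"}, 1), ({"node": "node-c"}, 1)]
# count by (node) (non-DaemonSet pods joined to nodes): only node-b has any.
nodes_with_workloads: Vector = [({"node": "node-b"}, 4)]

print(unless_on(nodes, nodes_with_workloads, on=("node",)))
# [({'node': 'node-a'}, 1), ({'node': 'node-c'}, 1)]
```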

6. Enriching Node Labels

We wanted to display:

node/role
node/env
instance-type

Solution

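Join on node with kube_node_labels and copy the wanted labels across with group_left. kube-state-metrics exposes each Kubernetes node label as a label_* label (node/env becomes label_node_env); the label_kubernetes_io_hostname!="" matcher keeps only kube_node_labels series that actually carry an exported hostname label. Note that kube-state-metrics v2 exports only the node labels listed in its --metric-labels-allowlist flag (for example nodes=[node/env,node/role,kubernetes.io/hostname]).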

* on (node)
group_left(label_node_env, label_node_role)
kube_node_labels{label_kubernetes_io_hostname!=""}
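Where do names like label_node_env come from? kube-state-metrics prefixes each Kubernetes label key with label_ and replaces characters that are not valid in a Prometheus label name with underscores. The sketch below approximates that rule (it ignores edge cases such as name collisions); ksm_label_name is a hypothetical helper.

```python
import re

# Characters that are not allowed in a Prometheus label name.
_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def ksm_label_name(k8s_label_key: str) -> str:
    """Approximate the kube_node_labels label name for a Kubernetes label key:
    prefix with 'label_' and replace each invalid character with '_'."""
    return "label_" + _INVALID_LABEL_CHARS.sub("_", k8s_label_key)


for key in ["node/env", "node/role",
            "node.kubernetes.io/instance-type", "kubernetes.io/hostname"]:
    print(f"{key:34} -> {ksm_label_name(key)}")
```

By the same rule, showing the instance type in the alert would mean adding label_node_kubernetes_io_instance_type to group_left(...), assuming kube-state-metrics is configured to export that label.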

7. Ignoring Special Nodes Dynamically

Certain nodes (e.g., system nodes) are expected to have only DaemonSets.

Solution

List the labels to ignore in the chart's values under ignoreLabels:

ignoreLabels:
  label_node_role: core-services
  label_node_env: infra

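The final template renders one unless clause per entry (Helm walks map keys in sorted order), so a node is ignored if it matches any one of the listed labels:

unless on(node) kube_node_labels{ label_node_env="infra" }
unless on(node) kube_node_labels{ label_node_role="core-services" }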
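A Python mimic of the Helm loop in the final template helps preview what a given ignoreLabels map renders to, and makes the "any entry matches" semantics explicit. Both helpers are hypothetical; the rendering relies on Go templates visiting map keys in sorted order, which sorted() reproduces for plain ASCII string keys.

```python
from typing import Dict


def render_ignore_clauses(ignore_labels: Dict[str, str]) -> str:
    """Mimic the Helm range loop: one unless clause per entry, in sorted
    key order (Go templates visit map keys in sorted order)."""
    return "\n".join(
        f'unless on(node) kube_node_labels{{ {key}="{ignore_labels[key]}" }}'
        for key in sorted(ignore_labels)
    )


def is_ignored(node_labels: Dict[str, str], ignore_labels: Dict[str, str]) -> bool:
    """Chained unless clauses drop a node if ANY single entry matches."""
    return any(node_labels.get(k, "") == v for k, v in ignore_labels.items())


ignore = {"label_node_role": "core-services", "label_node_env": "infra"}
print(render_ignore_clauses(ignore))
# unless on(node) kube_node_labels{ label_node_env="infra" }
# unless on(node) kube_node_labels{ label_node_role="core-services" }

# Hypothetical node that matches only one of the two entries: still ignored.
print(is_ignored({"label_node_env": "infra", "label_node_role": "worker"}, ignore))  # True
```

If a node should be skipped only when all labels match, the entries would have to be rendered into a single selector instead of separate unless clauses.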

8. Formatting Alerts for Slack

To make alerts Slack-friendly, we wrapped label values in backticks (inline code in Slack) and section names in asterisks (bold in Slack):

summary: "DaemonSet-only node detected: `{{ $labels.node }}`"
description: |
  *Problem*: Node `{{ $labels.node }}` is running only DaemonSet pods.

  *Node Labels*:
  - `node/env`: `{{ $labels.label_node_env }}`
  - `node/role`: `{{ $labels.label_node_role }}`

  *Impact*: No regular workloads are running on this node.

✅ Easy-to-read, professional Slack alerts.
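Before shipping, it can help to preview an annotation with sample labels. The sketch below handles only the {{ $labels.<name> }} construct used above, not Prometheus' full Go templating, and renders missing labels as an empty string, which may differ from what Prometheus prints. The preview helper and the node name are hypothetical.

```python
import re
from typing import Dict

# Matches {{ $labels.<name> }} with optional spaces inside the braces.
_LABEL_REF = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")


def preview(template: str, labels: Dict[str, str]) -> str:
    """Substitute {{ $labels.<name> }} references with sample label values.
    Missing labels become "" in this sketch."""
    return _LABEL_REF.sub(lambda m: labels.get(m.group(1), ""), template)


description = (
    "*Problem*: Node `{{ $labels.node }}` is running only DaemonSet pods.\n"
    "*Node Labels*:\n"
    "- `node/env`: `{{ $labels.label_node_env }}`\n"
    "- `node/role`: `{{ $labels.label_node_role }}`"
)
sample = {"node": "node-a", "label_node_env": "prod", "label_node_role": "worker"}
print(preview(description, sample))
```

An empty pair of backticks in the preview is a quick hint that kube-state-metrics is not exporting the label the message expects.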

Final Helm Template

{{- $ignoreLabels := .Values.ignoreLabels }}

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: daemonset-only-nodes
spec:
  groups:
    - name: daemonset-only-nodes.rules
      rules:
        - alert: DaemonSetOnlyNodeDetected
          expr: |
            (
              (count by (node) (kube_node_info))
              unless
              (count by (node) (
                kube_pod_owner{owner_kind!="DaemonSet"}
                * on (pod, namespace) group_left(node)
                kube_pod_info
              ))
            )
            * on (node)
            group_left(label_node_env, label_node_role)
            kube_node_labels{label_kubernetes_io_hostname!=""}
            {{- if $ignoreLabels }}
            {{- range $key, $value := $ignoreLabels }}
            unless on(node) kube_node_labels{ {{ printf "%s=\"%s\"" $key $value }} }
            {{- end }}
            {{- end }}
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "DaemonSet-only node detected: `{{`{{ $labels.node }}`}}`"
            description: |
              *Problem*: Node `{{`{{ $labels.node }}`}}` is running only DaemonSet pods.

              *Node Labels*:
              - `node/env`: `{{`{{ $labels.label_node_env }}`}}`
              - `node/role`: `{{`{{ $labels.label_node_role }}`}}`

              *Impact*: No regular workloads are running on this node.
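To sanity-check the whole expression without a cluster, the final rule can be evaluated by hand over a few in-memory series. This Python sketch is a simplified model (every kube-state-metrics series has value 1, one kube_node_labels series per node); the function daemonset_only_nodes and all sample nodes, pods and labels are hypothetical.

```python
from typing import Dict, List

Series = Dict[str, str]  # label set of one series; kube-state-metrics values are 1


def daemonset_only_nodes(node_info: List[Series], pod_owner: List[Series],
                         pod_info: List[Series], node_labels: List[Series],
                         ignore_labels: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Hand evaluation of the final alert expression over in-memory series."""
    # count by (node) (kube_node_info)
    nodes = {s["node"] for s in node_info}

    # kube_pod_owner{owner_kind!="DaemonSet"} * on (pod, namespace) group_left(node) kube_pod_info
    pod_to_node = {(s["pod"], s["namespace"]): s["node"] for s in pod_info}
    busy = {pod_to_node[(s["pod"], s["namespace"])]
            for s in pod_owner
            if s.get("owner_kind", "") != "DaemonSet"
            and (s["pod"], s["namespace"]) in pod_to_node}

    # ... unless (count by (node) (...)): nodes without any non-DaemonSet pod
    candidates = nodes - busy

    # * on (node) group_left(...) kube_node_labels{label_kubernetes_io_hostname!=""}
    labelled = {s["node"]: s for s in node_labels
                if s.get("label_kubernetes_io_hostname", "") != ""}

    # unless on(node) kube_node_labels{key="value"}, once per ignoreLabels entry
    ignored = {s["node"] for s in node_labels
               for key, value in ignore_labels.items() if s.get(key, "") == value}

    return {
        node: {"label_node_env": labelled[node].get("label_node_env", ""),
               "label_node_role": labelled[node].get("label_node_role", "")}
        for node in sorted(candidates)
        if node in labelled and node not in ignored
    }


# Hypothetical cluster: node-a idle, node-b busy, node-c an infra node.
node_info = [{"node": "node-a"}, {"node": "node-b"}, {"node": "node-c"}]
pod_info = [
    {"namespace": "kube-system", "pod": "ebs-csi-node-a1", "node": "node-a"},
    {"namespace": "kube-system", "pod": "ebs-csi-node-b1", "node": "node-b"},
    {"namespace": "kube-system", "pod": "ebs-csi-node-c1", "node": "node-c"},
    {"namespace": "shop", "pod": "checkout-1", "node": "node-b"},
]
pod_owner = [
    {"namespace": "kube-system", "pod": "ebs-csi-node-a1", "owner_kind": "DaemonSet"},
    {"namespace": "kube-system", "pod": "ebs-csi-node-b1", "owner_kind": "DaemonSet"},
    {"namespace": "kube-system", "pod": "ebs-csi-node-c1", "owner_kind": "DaemonSet"},
    {"namespace": "shop", "pod": "checkout-1", "owner_kind": "ReplicaSet"},
]
node_labels = [
    {"node": "node-a", "label_kubernetes_io_hostname": "node-a",
     "label_node_env": "prod", "label_node_role": "worker"},
    {"node": "node-b", "label_kubernetes_io_hostname": "node-b",
     "label_node_env": "prod", "label_node_role": "worker"},
    {"node": "node-c", "label_kubernetes_io_hostname": "node-c",
     "label_node_env": "infra", "label_node_role": "core-services"},
]
ignore = {"label_node_role": "core-services", "label_node_env": "infra"}

print(daemonset_only_nodes(node_info, pod_owner, pod_info, node_labels, ignore))
# {'node-a': {'label_node_env': 'prod', 'label_node_role': 'worker'}}
```

Only node-a would fire: node-b runs a ReplicaSet pod, and node-c is excluded by ignoreLabels even though it runs only DaemonSet pods.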

Conclusion

Building complex Kubernetes monitoring with Prometheus is not always straightforward.
However, with the right techniques and templates, you can detect hidden inefficiencies, prevent operational risks, and save thousands annually.

Key takeaways:

Understand PromQL joins and label matching.
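Prefer unless over - for set exclusion: unless drops every left-hand series that has a match on the right, while - only returns series present on both sides, with their values subtracted.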
Template dynamic ignore rules via Helm.
Format alerts for maximum Slack readability.
Idle nodes cost real money — observability saves budgets.
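The difference between unless and - shows up clearly on a toy example: subtraction with default one-to-one matching only returns series present on both sides, so it yields the busy nodes instead of the idle ones. Label sets are reduced to node-name keys here, and both helpers are hypothetical.

```python
from typing import Dict

# Keys stand in for label sets ({node="..."}); values are sample counts.
Vector = Dict[str, float]


def minus(left: Vector, right: Vector) -> Vector:
    """Toy `left - right` with default one-to-one matching: only series
    present on BOTH sides survive, with their values subtracted."""
    return {k: left[k] - right[k] for k in left if k in right}


def unless(left: Vector, right: Vector) -> Vector:
    """Toy `left unless right`: left series that have no match on the right."""
    return {k: v for k, v in left.items() if k not in right}


all_nodes: Vector = {"node-a": 1, "node-b": 1, "node-c": 1}
busy_nodes: Vector = {"node-b": 4}

print(minus(all_nodes, busy_nodes))   # {'node-b': -3}
print(unless(all_nodes, busy_nodes))  # {'node-a': 1, 'node-c': 1}
```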

About Dedico Servers

At Dedico Servers, we build resilient, efficient, and cost-optimized cloud and Kubernetes infrastructure.

If you’d like help optimizing your Kubernetes clusters, monitoring stack, or reducing your cloud costs, contact us today.

