The Uncomfortable Truth
An attacker execs into a production pod, reads the mounted service account token, tests permissions, lists secrets, creates a ClusterRoleBinding, deploys a cryptominer, and disappears inside normal cluster noise. The attack takes minutes. The incident is discovered weeks later when a bill spikes or a customer reports impact.
Audit logs were not missing. Signal was missing. The API server recorded the behavior, but the policy captured the wrong level, the backend stored too much noise, and nobody had detections mapped to attacker actions.
Kubernetes audit logging is the best native evidence source in the platform. Every API request crosses the API server: pods/exec, secrets, rolebindings, clusterrolebindings, serviceaccounts/token, workload creation, and network policy deletion. If you tune it correctly, you see the attack before the attacker gets durable control.
Why Default Audit Logging Fails
Most clusters fail in two predictable ways.
RequestResponse turns your SIEM into a credential vault for attackers.
A good audit policy is not a compliance checkbox. It is a threat model expressed as YAML: drop what cannot help, preserve what attackers touch, and keep sensitive values out of the log stream.
Audit Levels That Matter
| Level | Captured data | Use it for | Risk |
|---|---|---|---|
| None | No event | Health checks, leases, API discovery, noisy system chatter | Blind if overused |
| Metadata | User, verb, resource, namespace, status, source IP | Secrets, broad read operations, auth probes | Limited forensic depth |
| Request | Metadata plus request body | Selected write operations | Can expose sensitive inputs |
| RequestResponse | Metadata, request body, response body | RBAC mutations, workload creation, pod exec, service account token creation | High volume; dangerous on secrets |
Never log secrets at Request or RequestResponse. You need to know who touched a secret, not copy the secret value into your log backend.
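The rule above can be checked mechanically before a policy ships. The following is a minimal sketch, assuming the policy has already been parsed into plain Python dicts shaped like `audit.k8s.io/v1` rules. The helper names (`covers_secrets`, `unsafe_secret_rules`) are hypothetical, and the matcher is deliberately simplified: it ignores `resourceNames` and treats any rule scoped by users, groups, verbs, or namespaces as a partial match.

```python
# Sketch of a policy lint: flag rules that could write secret payloads to the log.
# Helper names are illustrative and not part of any Kubernetes tooling.

SAFE_LEVELS = {"None", "Metadata"}
NARROWING_KEYS = ("users", "userGroups", "verbs", "namespaces")


def covers_secrets(rule):
    """True if the rule's selectors can match a request for core-group secrets."""
    selectors = rule.get("resources")
    if selectors is None:
        # Rules scoped only to non-resource URLs never match resource requests.
        return "nonResourceURLs" not in rule
    for sel in selectors:
        names = sel.get("resources", [])
        in_core_group = sel.get("group", "") == ""
        if in_core_group and (not names or "secrets" in names or "*" in names):
            return True
    return False


def unsafe_secret_rules(rules):
    """Return indexes of rules that may log secrets above Metadata.

    The first matching rule decides the level, so the walk stops at the first
    rule that matches every secrets request (no narrowing selectors).
    """
    unsafe = []
    for index, rule in enumerate(rules):
        if not covers_secrets(rule):
            continue
        if rule["level"] not in SAFE_LEVELS:
            unsafe.append(index)
        if not any(key in rule for key in NARROWING_KEYS):
            break
    return unsafe


secrets_rule = {"level": "Metadata", "resources": [{"group": "", "resources": ["secrets"]}]}
good_policy = [{"level": "None", "users": ["system:kube-probe"]}, secrets_rule, {"level": "RequestResponse"}]
bad_policy = [{"level": "RequestResponse"}, secrets_rule]  # catch-all placed first

print(unsafe_secret_rules(good_policy))  # []
print(unsafe_secret_rules(bad_policy))   # [0]
```

A check like this fits in CI next to the policy file, so a reordered rule cannot silently raise the level for secrets.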
Security-Focused Audit Policy
This policy is opinionated. It drops system noise first, then captures high-value attacker behavior at the right level. It also removes RequestReceived, which doubles volume without helping detection.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"
rules:
  # Drop high-volume system noise.
  - level: None
    users: ["system:kube-probe"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get", "list", "watch"]
  - level: None
    resources:
      - group: ""
        resources: ["events"]
      - group: "coordination.k8s.io"
        resources: ["leases"]
  - level: None
    nonResourceURLs: ["/healthz*", "/livez*", "/readyz*", "/version"]
  # Secrets: metadata only. Never log secret values.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # RBAC mutations: full body for forensics.
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Interactive access.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach", "pods/portforward"]
  # Service account and token creation.
  # Caution: the TokenRequest response body contains the issued token.
  # Drop serviceaccounts/token to Request if tokens must stay out of the log backend.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["serviceaccounts", "serviceaccounts/token"]
  # Workload mutations.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
        resources: ["pods", "configmaps", "namespaces"]
      - group: "apps"
        resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
      - group: "batch"
        resources: ["jobs", "cronjobs"]
  # Network policy changes can enable lateral movement.
  - level: RequestResponse
    resources:
      - group: "networking.k8s.io"
        resources: ["networkpolicies"]
  # Auth probing.
  - level: Metadata
    resources:
      - group: "authentication.k8s.io"
        resources: ["tokenreviews"]
      - group: "authorization.k8s.io"
        resources: ["subjectaccessreviews"]
  # Catch-all.
  - level: Metadata
Detection Rules Mapped to the Kill Chain
Detection rules must map to what attackers actually do. Do not alert on everything. Alert on behavior that changes attacker capability.
1. Anonymous or unauthenticated API access
Any successful request from system:anonymous or system:unauthenticated is a page. Denied bursts are reconnaissance.
user.username = "system:anonymous"
OR user.groups contains "system:unauthenticated"
Alert:
- any 2xx response
- more than 10 denied requests in 5 minutes from one source IP
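As a rough illustration, the same logic can be written against parsed audit events. This is a sketch, not a production detector: the function names are hypothetical, "denied" is assumed to mean HTTP 401 or 403, and `denied_burst_sources` expects a batch that the caller has already limited to one 5-minute window.

```python
from collections import Counter

ANON_USER = "system:anonymous"
ANON_GROUP = "system:unauthenticated"


def is_anonymous(event):
    """True if the request was made without an authenticated identity."""
    user = event.get("user") or {}
    return user.get("username") == ANON_USER or ANON_GROUP in (user.get("groups") or [])


def classify(event):
    """Return 'page' for a successful anonymous call, 'denied' for a rejected one."""
    if not is_anonymous(event):
        return None
    code = (event.get("responseStatus") or {}).get("code")
    if code is not None and 200 <= code < 300:
        return "page"
    if code in (401, 403):  # assumption: these two codes count as denied
        return "denied"
    return None


def denied_burst_sources(events, threshold=10):
    """Source IPs with more than `threshold` denied anonymous requests in the batch."""
    counts = Counter()
    for event in events:
        if classify(event) == "denied":
            # Only the first source IP is counted; later entries are proxies.
            for ip in (event.get("sourceIPs") or [])[:1]:
                counts[ip] += 1
    return sorted(ip for ip, seen in counts.items() if seen > threshold)
```

Counting only the first entry of `sourceIPs` is a simplification. Behind a load balancer the meaningful address may sit elsewhere in the list.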
2. 403 spikes by identity
Attackers probe RBAC. Legitimate operators usually know what they are allowed to do. A service account generating denied requests across namespaces is a strong signal.
annotations.authorization.k8s.io/decision = "forbid"
GROUP BY user.username
WHERE count > 10 in 5 minutes
High priority:
objectRef.subresource = "exec" AND decision = "forbid"
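If the SIEM cannot express a rolling window, the count can be kept in a small stream processor. The sketch below assumes events arrive in timestamp order and carry `requestReceivedTimestamp` in the RFC 3339 form the API server emits. The class name `ForbidSpikeDetector` and its defaults are illustrative choices that mirror the thresholds above.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def parse_ts(value):
    """Parse an audit timestamp such as 2024-05-01T12:00:00.000000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class ForbidSpikeDetector:
    """Sliding-window count of forbidden requests per identity."""

    def __init__(self, window=timedelta(minutes=5), threshold=10):
        self.window = window
        self.threshold = threshold
        self._seen = defaultdict(deque)  # username -> timestamps of recent forbids

    def observe(self, event):
        """Feed one event; return the username once it exceeds the threshold."""
        annotations = event.get("annotations") or {}
        if annotations.get("authorization.k8s.io/decision") != "forbid":
            return None
        user = (event.get("user") or {}).get("username", "unknown")
        now = parse_ts(event["requestReceivedTimestamp"])
        recent = self._seen[user]
        recent.append(now)
        # Drop forbids that have aged out of the window.
        while now - recent[0] > self.window:
            recent.popleft()
        return user if len(recent) > self.threshold else None
```

A deque per identity keeps memory proportional to recent denials rather than to total traffic, which matters when the catch-all rule logs everything at Metadata.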
3. Cross-namespace secret access
Normal workloads rarely enumerate secrets across namespaces. Attackers do. Secrets must stay at Metadata level so the detection sees access without leaking values.
objectRef.resource = "secrets"
AND verb IN ("get", "list")
GROUP BY user.username
WHERE distinct(objectRef.namespace) > 3 in 10 minutes
4. Pod exec and attach
pods/exec is command execution inside a container. In production, every exec should be explained by an incident ticket, an SRE action, or a break-glass workflow.
objectRef.subresource IN ("exec", "attach")
AND verb = "create"
Critical:
objectRef.namespace = "kube-system"
5. Cluster-admin binding creation
This is not suspicious. This is escalation. A new ClusterRoleBinding pointing to cluster-admin gives the subject full control of the cluster.
objectRef.resource = "clusterrolebindings"
AND verb IN ("create", "patch", "update")
AND requestObject.roleRef.name = "cluster-admin"
Also alert:
RBAC mutation by any service account outside platform namespaces
6. Privileged pod or hostPath creation
A privileged container, hostPID, hostNetwork, or sensitive hostPath mount is container escape by configuration.
requestObject.spec.hostPID = true
OR requestObject.spec.hostNetwork = true
OR requestObject.spec.containers[*].securityContext.privileged = true
OR requestObject.spec.volumes[*].hostPath.path IN ("/", "/etc", "/proc", "/dev", "/sys", "/var/run/docker.sock")
7. Persistence workload in system namespaces
Attackers hide in places operators mentally skip. A CronJob or DaemonSet in kube-system or a platform namespace deserves immediate review.
verb = "create"
AND objectRef.resource IN ("cronjobs", "daemonsets", "deployments")
AND objectRef.namespace IN ("kube-system", "monitoring", "logging", "ingress-nginx")
8. NetworkPolicy deletion
Deleting or weakening NetworkPolicies is a lateral movement enabler. If an identity removes egress restrictions, assume the next move is east-west scanning.
objectRef.resource = "networkpolicies"
AND verb IN ("delete", "patch", "update")
Audit Event Anatomy
Raw audit events are noisy until you normalize the fields that carry security meaning. These are the fields I would keep in every detection pipeline before enrichment.
| Field | Why it matters | Detection use |
|---|---|---|
| user.username | Identity behind the request. | Separate human operators, controllers, and service accounts. |
| sourceIPs | Network origin of the API call. | Detect stolen service account tokens used outside expected pod ranges. |
| verb | Action requested. | Prioritize create, patch, update, delete, and suspicious list. |
| objectRef.resource | Target resource. | Catch access to secrets, pods/exec, clusterrolebindings, and networkpolicies. |
| objectRef.namespace | Blast-radius boundary. | Detect cross-namespace sweeps and access into kube-system. |
| responseStatus.code | Allowed or denied result. | Differentiate successful compromise from permission probing. |
| requestObject | Payload of high-risk writes. | Find privileged pods, hostPath mounts, cluster-admin bindings, unknown images. |
| annotations.authorization.k8s.io/decision | RBAC decision context. | Baseline denied requests and flag enumeration spikes. |
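One way to pull these fields into a flat record is sketched below. The output key names and the `cross_namespace` flag are assumptions made for illustration. The flag relies only on the standard `system:serviceaccount:<namespace>:<name>` username format.

```python
def service_account_namespace(username):
    """Return the namespace of a service account username, else None."""
    parts = (username or "").split(":")
    if len(parts) == 4 and parts[:2] == ["system", "serviceaccount"]:
        return parts[2]
    return None


def normalize(event):
    """Flatten a raw audit event into the fields detections rely on."""
    ref = event.get("objectRef") or {}
    resource = ref.get("resource", "")
    if ref.get("subresource"):
        resource = f"{resource}/{ref['subresource']}"  # e.g. pods/exec
    username = (event.get("user") or {}).get("username")
    namespace = ref.get("namespace")
    sa_namespace = service_account_namespace(username)
    return {
        "audit_id": event.get("auditID"),
        "username": username,
        "source_ip": (event.get("sourceIPs") or [None])[0],
        "verb": event.get("verb"),
        "resource": resource,
        "namespace": namespace,
        "status_code": (event.get("responseStatus") or {}).get("code"),
        "decision": (event.get("annotations") or {}).get("authorization.k8s.io/decision"),
        "request_object": event.get("requestObject"),
        # True when a service account acts outside its own namespace.
        "cross_namespace": bool(sa_namespace and namespace and sa_namespace != namespace),
    }


sample = {
    "auditID": "hypothetical-id",
    "verb": "get",
    "user": {"username": "system:serviceaccount:prod:web"},
    "sourceIPs": ["10.42.8.19"],
    "objectRef": {"namespace": "payments", "resource": "secrets", "name": "db"},
    "responseStatus": {"code": 200},
    "annotations": {"authorization.k8s.io/decision": "allow"},
}
record = normalize(sample)
print(record["resource"], record["cross_namespace"])  # secrets True
```

Joining resource and subresource into one string keeps rules such as "pods/exec" readable, at the cost of splitting it again if a backend indexes the two separately.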
{
  "verb": "create",
  "user": { "username": "system:serviceaccount:prod:web" },
  "sourceIPs": ["10.42.8.19"],
  "objectRef": {
    "namespace": "payments",
    "resource": "pods",
    "subresource": "exec"
  },
  "responseStatus": { "code": 101 },
  "annotations": {
    "authorization.k8s.io/decision": "allow"
  }
}
SIEM Query Pack
Rules only matter once they run in a SIEM. Below are starter queries you can adapt to CloudWatch Logs Insights, Microsoft Sentinel / Log Analytics, and Splunk. Tune namespaces and identities to your environment.
CloudWatch Logs Insights — EKS exec and secret access
fields @timestamp, user.username, verb, objectRef.namespace, objectRef.resource, objectRef.subresource, sourceIPs.0, responseStatus.code
| filter objectRef.subresource in ["exec", "attach"] or objectRef.resource = "secrets"
| filter responseStatus.code = 101 or (responseStatus.code >= 200 and responseStatus.code <= 299)
| sort @timestamp desc
| limit 100
Microsoft Sentinel / AKS — cluster-admin binding
AzureDiagnostics
| where Category in ("kube-audit", "kube-audit-admin")
| where requestObject_s has "cluster-admin"
| where objectRef_resource_s == "clusterrolebindings"
| where verb_s in ("create", "patch", "update")
| project TimeGenerated, user_username_s, sourceIPs_s, verb_s, objectRef_name_s, requestObject_s
Splunk — forbidden request spike
index=kubernetes_audit annotations.authorization.k8s.io/decision=forbid
| bin _time span=5m
| stats count dc(objectRef.resource) as resources values(objectRef.namespace) as namespaces by _time user.username sourceIPs{}
| where count > 10 OR resources > 4
Response Playbook
Audit detections should trigger action. The faster you move from event to containment, the less value the attacker extracts from the cluster.
Scope the incident: search user.username, sourceIPs, pod name, namespace, and auditID across the previous 24 hours.
Build the Detection Pipeline
A detection rule in a document is not a control. Ship audit logs outside the cluster, normalize them, alert on high-confidence behavior, and keep enough retention for incident reconstruction.
API Server audit log
├─ file backend or webhook backend
├─ Fluent Bit / managed control-plane logging
├─ external storage: S3, CloudWatch, Log Analytics, Elasticsearch, Splunk
├─ detection processor: SIEM rules, Sigma mappings, custom correlation
└─ alerting: Slack, Teams, PagerDuty, incident queue
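The detection processor stage can be as small as a loop over JSON lines. The sketch below assumes one audit event per line, as the file backend writes them, and implements three of the rules from this article. The rule registry, the alert shape, and every severity other than the kube-system exec case are illustrative assumptions.

```python
import json

SENSITIVE_PATHS = {"/", "/etc", "/proc", "/dev", "/sys", "/var/run/docker.sock"}


def rule_exec(event):
    """Pod exec or attach; critical when it targets kube-system."""
    ref = event.get("objectRef") or {}
    if ref.get("subresource") in ("exec", "attach"):
        return "critical" if ref.get("namespace") == "kube-system" else "high"
    return None


def rule_cluster_admin_binding(event):
    """ClusterRoleBinding write whose roleRef points at cluster-admin."""
    ref = event.get("objectRef") or {}
    role_ref = (event.get("requestObject") or {}).get("roleRef") or {}
    if (ref.get("resource") == "clusterrolebindings"
            and event.get("verb") in ("create", "patch", "update")
            and role_ref.get("name") == "cluster-admin"):
        return "critical"
    return None


def rule_privileged_pod(event):
    """Pod creation with host access or a privileged container."""
    ref = event.get("objectRef") or {}
    if ref.get("resource") != "pods" or ref.get("subresource") or event.get("verb") != "create":
        return None
    spec = (event.get("requestObject") or {}).get("spec") or {}
    privileged = any(
        (container.get("securityContext") or {}).get("privileged") is True
        for container in spec.get("containers") or []
    )
    host_paths = {(vol.get("hostPath") or {}).get("path") for vol in spec.get("volumes") or []}
    if spec.get("hostPID") or spec.get("hostNetwork") or privileged or host_paths & SENSITIVE_PATHS:
        return "critical"
    return None


RULES = {
    "pod-exec": rule_exec,
    "cluster-admin-binding": rule_cluster_admin_binding,
    "privileged-pod": rule_privileged_pod,
}


def process(lines):
    """Yield one alert dict per rule hit; skip blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        for name, rule in RULES.items():
            severity = rule(event)
            if severity:
                yield {
                    "rule": name,
                    "severity": severity,
                    "auditID": event.get("auditID"),
                    "user": (event.get("user") or {}).get("username"),
                }
```

Usage is a single call such as `for alert in process(open(path)): ...`, with the alert handed to whichever notifier the last stage uses. Rules that need request bodies only fire if the audit policy logs those resources at Request or RequestResponse.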
The audit trail must survive cluster compromise. If the attacker can delete your logs from inside the cluster, your logging architecture is part of the blast radius.
Managed Kubernetes Reality
| Platform | What to enable | Operational note |
|---|---|---|
| EKS | Control plane audit logging to CloudWatch | Disabled unless explicitly enabled. Query with CloudWatch Logs Insights and forward to SIEM. |
| GKE | Admin Activity plus Data Access logs | Admin Activity is default. Data Access gives richer read visibility and must be deliberately enabled. |
| AKS | Diagnostic settings for kube-audit or kube-audit-admin | Send to Log Analytics. Use analytics rules for exec, secret access, RBAC mutation, and privileged pod creation. |
What Audit Logs Cannot See
Audit logs capture API server activity. They do not capture commands after a successful exec, raw pod-to-pod traffic, direct kubelet API abuse, direct etcd access, or process-level behavior inside the container.
After a successful pods/exec, the API server does not see every shell command. Use runtime telemetry.
Audit Log Maturity Model
| Level | State | Detection capability |
|---|---|---|
| 0 | No audit logging | Post-incident guessing. |
| 1 | Default logging, noisy policy | Some forensics, weak detection. |
| 2 | Tuned policy and external storage | Fast investigation and reliable event history. |
| 3 | Automated detections | Real-time alerts for common attack paths. |
| 4 | Layered detection and response | Audit logs, runtime signals, network telemetry, and automated containment. |
Get to Level 2 in one week. Get to Level 3 in one month. Level 4 is where mature Kubernetes security operations live.
Final Word
Every serious Kubernetes attack leaves API-server fingerprints. The attacker who steals a service account token uses it against the API server. The attacker who creates a privileged pod submits it through the API server. The attacker who binds themselves to cluster-admin modifies RBAC through the API server.
My position is simple: if Kubernetes audit logs are not tuned, shipped externally, and connected to detection logic, the cluster is not monitored. It is only producing evidence for someone to read after the damage is done.
— Riad DAHMANI, k8sec.io