Kubernetes Detection Engineering

Kubernetes Audit Logs
Detect Attacks Before It Is Too Late

The API server already sees the attacker. What usually fails is everything around it: the audit policy, noise suppression, retention, context, and detection logic. This article turns Kubernetes audit logs into a practical detection pipeline for exec abuse, secret access, RBAC escalation, privileged pods, and persistence.

Riad DAHMANI | k8sec Security Research | May 2026 | 18 min read

The Uncomfortable Truth

An attacker execs into a production pod, reads the mounted service account token, tests permissions, lists secrets, creates a ClusterRoleBinding, deploys a cryptominer, and disappears inside normal cluster noise. The attack takes minutes. The incident is discovered weeks later when a bill spikes or a customer reports impact.

Audit logs were not missing. Signal was missing. The API server recorded the behavior, but the policy captured the wrong level, the backend stored too much noise, and nobody had detections mapped to attacker actions.

Kubernetes audit logging is the best native evidence source in the platform. Every API request crosses the API server: pods/exec, secrets, rolebindings, clusterrolebindings, serviceaccounts/token, workload creation, and network policy deletion. If you tune it correctly, you see the attack before the attacker gets durable control.

Why Default Audit Logging Fails

Most clusters fail in four predictable ways.

No audit logging: The API server is not configured with an audit policy. After compromise, your forensic capability is guesswork.
Metadata for everything: The cluster produces massive event volume, but the logs lack request bodies for the operations that matter.
Secrets logged incorrectly: Logging secrets at RequestResponse turns your SIEM into a credential vault for attackers.
No noise suppression: Kubelets, probes, leases, and events drown out human and attacker behavior.

A good audit policy is not a compliance checkbox. It is a threat model expressed as YAML: drop what cannot help, preserve what attackers touch, and keep sensitive values out of the log stream.

Audit Levels That Matter

| Level | Captured data | Use it for | Risk |
| --- | --- | --- | --- |
| None | No event | Health checks, leases, API discovery, noisy system chatter | Blind if overused |
| Metadata | User, verb, resource, namespace, status, source IP | Secrets, service account token requests, broad read operations, auth probes | Limited forensic depth |
| Request | Metadata plus request body | Selected write operations | Can expose sensitive inputs |
| RequestResponse | Metadata, request body, response body | RBAC mutations, workload creation, pod exec | High volume; dangerous on secrets and token responses |

Never log secrets at Request or RequestResponse. You need to know who touched a secret, not copy the secret value into your log backend.

Security-Focused Audit Policy

This policy is opinionated. Rules are evaluated in order and the first match sets the level, so it drops system noise first, then captures high-value attacker behavior at the right level. It also omits the RequestReceived stage, which roughly doubles volume without helping detection.

/etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy

omitStages:
  - "RequestReceived"

rules:
  # Drop high-volume system noise.
  - level: None
    users: ["system:kube-probe"]

  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get", "list", "watch"]

  - level: None
    resources:
      - group: ""
        resources: ["events"]
      - group: "coordination.k8s.io"
        resources: ["leases"]

  - level: None
    nonResourceURLs: ["/healthz*", "/livez*", "/readyz*", "/version"]

  # Secrets: metadata only. Never log secret values.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]

  # RBAC mutations: full body for forensics.
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]

  # Interactive access.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach", "pods/portforward"]

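  # Service account changes: full body for forensics.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["serviceaccounts"]

  # Token requests: metadata only. The TokenRequest response carries the bearer token.
  - level: Metadata
    resources:
      - group: ""
        resources: ["serviceaccounts/token"]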

  # Workload mutations.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
        resources: ["pods", "configmaps", "namespaces"]
      - group: "apps"
        resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
      - group: "batch"
        resources: ["jobs", "cronjobs"]

  # Network policy changes can enable lateral movement.
  - level: RequestResponse
    resources:
      - group: "networking.k8s.io"
        resources: ["networkpolicies"]

  # Auth probing.
  - level: Metadata
    resources:
      - group: "authentication.k8s.io"
        resources: ["tokenreviews"]
      - group: "authorization.k8s.io"
        resources: ["subjectaccessreviews"]

  # Catch-all.
  - level: Metadata
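
The ordering of the rules matters because the first matching rule decides the level. The sketch below is a simplified model of that behavior, not the API server's implementation: `RULES` is a hypothetical, abbreviated copy of the policy above, and requests are reduced to a user, groups, a verb, and a resource string.

```python
# Simplified, hypothetical model of a few rules from the policy above.
RULES = [
    {"level": "None", "users": ["system:kube-probe"]},
    {"level": "None", "userGroups": ["system:nodes"], "verbs": ["get", "list", "watch"]},
    {"level": "Metadata", "resources": ["secrets"]},
    {"level": "RequestResponse", "resources": ["pods/exec", "pods/attach", "pods/portforward"]},
    {"level": "Metadata"},  # catch-all
]


def rule_matches(rule, request):
    """A rule matches when every selector it sets matches the request."""
    if "users" in rule and request["user"] not in rule["users"]:
        return False
    if "userGroups" in rule and not set(rule["userGroups"]) & set(request["groups"]):
        return False
    if "verbs" in rule and request["verb"] not in rule["verbs"]:
        return False
    if "resources" in rule and request["resource"] not in rule["resources"]:
        return False
    return True


def audit_level(request, rules=RULES):
    """Return the level of the first matching rule (first match wins)."""
    for rule in rules:
        if rule_matches(rule, request):
            return rule["level"]
    return "None"  # no rule matched: the event is not logged


kubelet_read = {"user": "system:node:node-1", "groups": ["system:nodes"],
                "verb": "get", "resource": "secrets"}
workload_read = {"user": "system:serviceaccount:prod:web", "groups": ["system:serviceaccounts"],
                 "verb": "list", "resource": "secrets"}
print(audit_level(kubelet_read), audit_level(workload_read))
```

One consequence worth knowing: a kubelet reading a secret is dropped by the earlier system:nodes rule and never reaches the secrets rule. If stolen node credentials are in your threat model, move the secrets rule above the node rule.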

Detection Rules Mapped to the Kill Chain

Detection rules must map to what attackers actually do. Do not alert on everything. Alert on behavior that changes attacker capability.

1. Probe: anonymous requests, 403 spikes, SubjectAccessReview abuse
2. Execute: pods/exec, pods/attach, port-forward
3. Harvest: secret access and service account token creation
4. Escalate: RoleBinding or ClusterRoleBinding mutation
5. Persist: CronJob, DaemonSet, unknown image registry

1. Anonymous or unauthenticated API access

Any successful request from system:anonymous or system:unauthenticated is a page. Denied bursts are reconnaissance.

Detection logic
user.username = "system:anonymous"
OR user.groups contains "system:unauthenticated"

Alert:
- any 2xx response
- more than 10 denied requests in 5 minutes from one source IP
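
The logic above can be sketched in Python as a small generator over parsed audit events. This is an illustration, not a production detector: the function names and the alert labels `page` and `recon` are made up here, and the thresholds are the ones from the rule above. It treats 401 and 403 as denied and uses the event's `stageTimestamp` for windowing.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_ts(value):
    """Parse an RFC 3339 timestamp such as 2026-05-01T10:00:00.000000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_anonymous(event):
    user = event.get("user", {})
    return (user.get("username") == "system:anonymous"
            or "system:unauthenticated" in user.get("groups", []))


def anonymous_alerts(events, max_denied=10, window=timedelta(minutes=5)):
    """Yield (severity, source_ip) pairs for anonymous API activity."""
    denied = defaultdict(list)  # source IP -> timestamps of recent denied requests
    for event in sorted(events, key=lambda e: parse_ts(e["stageTimestamp"])):
        if not is_anonymous(event):
            continue
        now = parse_ts(event["stageTimestamp"])
        ip = (event.get("sourceIPs") or ["unknown"])[0]
        code = event.get("responseStatus", {}).get("code", 0)
        if 200 <= code < 300:
            yield ("page", ip)  # any successful anonymous request
        elif code in (401, 403):
            denied[ip] = [t for t in denied[ip] if now - t <= window] + [now]
            if len(denied[ip]) > max_denied:
                yield ("recon", ip)
                denied[ip] = []  # reset so one burst raises one alert
```

Resetting the per-IP list after an alert is a design choice that keeps one burst from paging repeatedly; a real pipeline would more likely hand deduplication to the alerting layer.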

2. 403 spikes by identity

Attackers probe RBAC. Legitimate operators usually know what they are allowed to do. A service account generating denied requests across namespaces is a strong signal.

Detection logic
annotations.authorization.k8s.io/decision = "forbid"
GROUP BY user.username
WHERE count > 10 in 5 minutes

High priority:
objectRef.subresource = "exec"
AND decision = "forbid"
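
A minimal Python sketch of the same grouping, assuming events are already parsed into dictionaries. It counts forbidden decisions per identity in fixed five-minute buckets, which is simpler than a sliding window but can split one burst across a bucket boundary. The function names are illustrative.

```python
from collections import Counter
from datetime import datetime

DECISION = "authorization.k8s.io/decision"


def bucket(ts, minutes=5):
    """Floor an RFC 3339 timestamp to the start of its N-minute bucket."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return t.replace(minute=t.minute - t.minute % minutes, second=0, microsecond=0)


def forbidden_spikes(events, threshold=10):
    """Return {(bucket_start, username): count} for counts above the threshold."""
    counts = Counter()
    for e in events:
        if e.get("annotations", {}).get(DECISION) == "forbid":
            counts[(bucket(e["stageTimestamp"]), e["user"]["username"])] += 1
    return {key: n for key, n in counts.items() if n > threshold}


def is_denied_exec(event):
    """High-priority case: a denied attempt to exec into a pod."""
    return (event.get("objectRef", {}).get("subresource") == "exec"
            and event.get("annotations", {}).get(DECISION) == "forbid")
```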

3. Cross-namespace secret access

Normal workloads rarely enumerate secrets across namespaces. Attackers do. Secrets must stay at Metadata level so the detection sees access without leaking values.

Detection logic
objectRef.resource = "secrets"
AND verb IN ("get", "list")
GROUP BY user.username
WHERE count(distinct objectRef.namespace) > 3 in 10 minutes
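
The sweep logic can be sketched with a per-user sliding window. This is an illustration under two assumptions that are not part of the rule above: the function name is made up, and a cluster-wide list, which carries no namespace in `objectRef`, is flagged on its own because a single request covers every namespace.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def parse_ts(value):
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def secret_sweeps(events, max_namespaces=3, window=timedelta(minutes=10)):
    """Return users who read secrets in more than max_namespaces namespaces
    within any sliding window."""
    recent = defaultdict(deque)  # username -> deque of (time, namespace)
    flagged = set()
    for e in sorted(events, key=lambda e: parse_ts(e["stageTimestamp"])):
        ref = e.get("objectRef", {})
        if ref.get("resource") != "secrets" or e.get("verb") not in ("get", "list"):
            continue
        user, now = e["user"]["username"], parse_ts(e["stageTimestamp"])
        if e["verb"] == "list" and not ref.get("namespace"):
            flagged.add(user)  # cluster-wide list: one request, every namespace
            continue
        seen = recent[user]
        seen.append((now, ref.get("namespace", "")))
        while now - seen[0][0] > window:
            seen.popleft()  # drop reads that fell out of the window
        if len({ns for _, ns in seen}) > max_namespaces:
            flagged.add(user)
    return flagged
```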

4. Pod exec and attach

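pods/exec is command execution inside a container. In production, every exec should be explained by an incident ticket, an SRE action, or a break-glass workflow. Match both create and get: clients that stream over SPDY send a POST (verb create), while clients that upgrade to WebSocket send a GET (verb get).

Detection logic
objectRef.subresource IN ("exec", "attach")
AND verb IN ("create", "get")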

Critical:
objectRef.namespace = "kube-system"
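
A short sketch of how this rule might look in Python. It assumes the verb can be either create or get, since WebSocket clients issue a GET, and it recovers the initial command from the `command` query parameters of the event's `requestURI`. The function names and severity labels are illustrative.

```python
from urllib.parse import parse_qs, urlsplit


def classify_exec(event, critical_namespaces=("kube-system",)):
    """Return None, "alert", or "critical" for one audit event."""
    ref = event.get("objectRef", {})
    if ref.get("resource") != "pods" or ref.get("subresource") not in ("exec", "attach"):
        return None
    # Assumption: accept "get" as well as "create" so WebSocket sessions are not missed.
    if event.get("verb") not in ("create", "get"):
        return None
    return "critical" if ref.get("namespace") in critical_namespaces else "alert"


def exec_command(event):
    """Recover the initial command from the request URI query string."""
    query = parse_qs(urlsplit(event.get("requestURI", "")).query)
    return query.get("command", [])


event = {
    "verb": "get",
    "requestURI": "/api/v1/namespaces/kube-system/pods/etcd-0/exec"
                  "?command=sh&command=-c&command=id&container=etcd&stdout=true",
    "objectRef": {"resource": "pods", "namespace": "kube-system",
                  "name": "etcd-0", "subresource": "exec"},
}
print(classify_exec(event), exec_command(event))
```

The URI only shows the command that started the session. Anything typed into an interactive shell afterwards is invisible to the API server.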

5. Cluster-admin binding creation

This is not suspicious. This is escalation. A new ClusterRoleBinding pointing to cluster-admin gives the subject full control of the cluster.

Detection logic
objectRef.resource = "clusterrolebindings"
AND verb IN ("create", "patch", "update")
AND requestObject.roleRef.name = "cluster-admin"

Also alert:
RBAC mutation by any service account outside platform namespaces
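
Both conditions can be checked on a single event. In the sketch below, `PLATFORM_NAMESPACES` is a hypothetical allowlist you would tune per cluster, and the alert names are illustrative. It relies on `requestObject` being present, which requires the RBAC rule to log at Request or RequestResponse.

```python
PLATFORM_NAMESPACES = {"kube-system"}  # hypothetical allowlist; tune per cluster
RBAC_RESOURCES = {"roles", "rolebindings", "clusterroles", "clusterrolebindings"}
WRITE_VERBS = {"create", "patch", "update"}


def rbac_alerts(event):
    """Return the list of alert names raised by one audit event."""
    ref = event.get("objectRef", {})
    if ref.get("resource") not in RBAC_RESOURCES or event.get("verb") not in WRITE_VERBS:
        return []
    alerts = []
    role_ref = (event.get("requestObject") or {}).get("roleRef", {})
    if ref["resource"] == "clusterrolebindings" and role_ref.get("name") == "cluster-admin":
        alerts.append("cluster-admin-binding")
    # Service account usernames look like system:serviceaccount:<namespace>:<name>.
    parts = event.get("user", {}).get("username", "").split(":")
    if len(parts) == 4 and parts[:2] == ["system", "serviceaccount"]:
        if parts[2] not in PLATFORM_NAMESPACES:
            alerts.append("rbac-mutation-by-workload-identity")
    return alerts
```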

6. Privileged pod or hostPath creation

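A privileged container, hostPID, hostNetwork, or sensitive hostPath mount is container escape by configuration. The paths below are for bare Pods; for Deployments, DaemonSets, and Jobs the same fields sit under spec.template.spec.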

Sensitive fields
requestObject.spec.hostPID = true
OR requestObject.spec.hostNetwork = true
OR requestObject.spec.containers[*].securityContext.privileged = true
OR requestObject.spec.volumes[*].hostPath.path IN ("/", "/etc", "/proc", "/dev", "/sys", "/var/run/docker.sock")
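
The field checks can be sketched as one function over `requestObject`. This version also looks inside pod templates (Deployments, DaemonSets, Jobs, CronJobs) and init containers, which goes slightly beyond the field list above; the function names and finding labels are illustrative.

```python
SENSITIVE_PATHS = {"/", "/etc", "/proc", "/dev", "/sys", "/var/run/docker.sock"}


def pod_spec(obj):
    """Return the pod spec from a Pod or from a workload's pod template."""
    spec = obj.get("spec", {})
    if "jobTemplate" in spec:  # CronJob: spec.jobTemplate.spec.template.spec
        spec = spec["jobTemplate"].get("spec", {})
    return spec.get("template", {}).get("spec", spec)


def risky_settings(request_object):
    """List the escape-prone settings found in a workload create/update body."""
    spec = pod_spec(request_object)
    findings = [flag for flag in ("hostPID", "hostNetwork") if spec.get(flag) is True]
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        if (c.get("securityContext") or {}).get("privileged") is True:
            findings.append(f"privileged:{c.get('name', '?')}")
    for v in spec.get("volumes", []):
        if "hostPath" not in v:
            continue
        # Normalize a trailing slash so "/etc/" matches "/etc".
        path = v["hostPath"].get("path", "").rstrip("/") or "/"
        if path in SENSITIVE_PATHS:
            findings.append(f"hostPath:{path}")
    return findings
```

The path check is an exact match, as in the rule above. A mount of /etc/kubernetes would not be flagged, so a production version may want prefix matching.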

7. Persistence workload in system namespaces

Attackers hide in places operators mentally skip. A CronJob or DaemonSet in kube-system or a platform namespace deserves immediate review.

Detection logic
verb = "create"
AND objectRef.resource IN ("cronjobs", "daemonsets", "deployments")
AND objectRef.namespace IN ("kube-system", "monitoring", "logging", "ingress-nginx")
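
A sketch that combines this rule with the unknown-registry signal from the kill chain. `TRUSTED_REGISTRIES` is a hypothetical allowlist, and `registry_of` follows the usual image reference convention: the first path component is a registry only if it contains a dot or colon or is `localhost`; otherwise the image resolves to Docker Hub.

```python
SYSTEM_NAMESPACES = {"kube-system", "monitoring", "logging", "ingress-nginx"}
PERSISTENT_KINDS = {"cronjobs", "daemonsets", "deployments"}
TRUSTED_REGISTRIES = {"registry.k8s.io"}  # hypothetical allowlist


def registry_of(image):
    """Return the registry host of an image reference."""
    first, _, rest = image.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def images_in(request_object):
    """Collect container images from a workload's pod template."""
    spec = request_object.get("spec", {})
    if "jobTemplate" in spec:  # CronJob nests the template one level deeper
        spec = spec["jobTemplate"].get("spec", {})
    spec = spec.get("template", {}).get("spec", spec)
    return [c.get("image", "") for c in spec.get("containers", [])]


def persistence_findings(event):
    """Flag new long-lived workloads in system namespaces or from unknown registries."""
    ref = event.get("objectRef", {})
    if event.get("verb") != "create" or ref.get("resource") not in PERSISTENT_KINDS:
        return []
    findings = []
    if ref.get("namespace") in SYSTEM_NAMESPACES:
        findings.append("system-namespace-workload")
    for image in images_in(event.get("requestObject") or {}):
        if registry_of(image) not in TRUSTED_REGISTRIES:
            findings.append(f"untrusted-registry:{registry_of(image)}")
    return findings
```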

8. NetworkPolicy deletion

Deleting or weakening NetworkPolicies is a lateral movement enabler. If an identity removes egress restrictions, assume the next move is east-west scanning.

Detection logic
objectRef.resource = "networkpolicies"
AND verb IN ("delete", "patch", "update")

Audit Event Anatomy

Raw audit events are noisy until you normalize the fields that carry security meaning. These are the fields I would keep in every detection pipeline before enrichment.

| Field | Why it matters | Detection use |
| --- | --- | --- |
| user.username | Identity behind the request. | Separate human operators, controllers, and service accounts. |
| sourceIPs | Network origin of the API call. | Detect stolen service account tokens used outside expected pod ranges. |
| verb | Action requested. | Prioritize create, patch, update, delete, and suspicious list. |
| objectRef.resource | Target resource. | Catch access to secrets, pods/exec, clusterrolebindings, and networkpolicies. |
| objectRef.namespace | Blast-radius boundary. | Detect cross-namespace sweeps and access into kube-system. |
| responseStatus.code | Allowed or denied result. | Differentiate successful compromise from permission probing. |
| requestObject | Payload of high-risk writes. | Find privileged pods, hostPath mounts, cluster-admin bindings, unknown images. |
| annotations.authorization.k8s.io/decision | RBAC decision context. | Baseline denied requests and flag enumeration spikes. |

high-signal audit event shape
{
  "verb": "create",
  "user": { "username": "system:serviceaccount:prod:web" },
  "sourceIPs": ["10.42.8.19"],
  "objectRef": {
    "namespace": "payments",
    "resource": "pods",
    "subresource": "exec"
  },
  "responseStatus": { "code": 101 },
  "annotations": {
    "authorization.k8s.io/decision": "allow"
  }
}
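
Normalizing events into a flat record makes every later rule simpler. The sketch below keeps the fields from the table and adds one example check on top: a service account token used from outside the pod network. `POD_CIDRS` is a hypothetical range, and whether the API server sees pod IPs or node IPs depends on your CNI and NAT setup, so treat the check as a starting point.

```python
import ipaddress
import json

POD_CIDRS = [ipaddress.ip_network("10.42.0.0/16")]  # hypothetical pod range


def normalize(event):
    """Flatten one audit event into the fields used by the detections."""
    ref = event.get("objectRef", {})
    resource = ref.get("resource", "")
    if ref.get("subresource"):
        resource = f"{resource}/{ref['subresource']}"
    return {
        "audit_id": event.get("auditID"),
        "username": event.get("user", {}).get("username"),
        "source_ips": event.get("sourceIPs", []),
        "verb": event.get("verb"),
        "resource": resource,
        "namespace": ref.get("namespace"),
        "status_code": event.get("responseStatus", {}).get("code"),
        "decision": event.get("annotations", {}).get("authorization.k8s.io/decision"),
        "request_object": event.get("requestObject"),
    }


def parse_line(line):
    """Parse one JSON-lines audit record and normalize it."""
    return normalize(json.loads(line))


def outside_pod_range(record):
    """True when a service account identity calls from outside the pod CIDRs."""
    is_sa = (record["username"] or "").startswith("system:serviceaccount:")
    ips = [ipaddress.ip_address(ip) for ip in record["source_ips"]]
    return is_sa and any(not any(ip in net for net in POD_CIDRS) for ip in ips)
```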

SIEM Query Pack

Detection logic only counts once it runs in your SIEM. Below are starter queries you can adapt to CloudWatch Logs Insights, Microsoft Sentinel / Log Analytics, and Splunk. Tune namespaces and identities to your environment.

CloudWatch Logs Insights — EKS exec and secret access

cloudwatch logs insights
# Successful exec and attach return 101 Switching Protocols, so include 1xx as well as 2xx.
fields @timestamp, user.username, verb, objectRef.namespace, objectRef.resource, objectRef.subresource, sourceIPs.0, responseStatus.code
| filter objectRef.subresource in ["exec", "attach"] or objectRef.resource = "secrets"
| filter responseStatus.code >= 100 and responseStatus.code < 300
| sort @timestamp desc
| limit 100

Microsoft Sentinel / AKS — cluster-admin binding

kql
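// Assumes Azure diagnostics mode, where the raw audit event arrives as a JSON string in log_s.
AzureDiagnostics
| where Category in ("kube-audit", "kube-audit-admin")
| where log_s has "cluster-admin"
| extend audit = parse_json(log_s)
| where tostring(audit.objectRef.resource) == "clusterrolebindings"
| where tostring(audit.verb) in ("create", "patch", "update")
| project TimeGenerated, username = tostring(audit.user.username), sourceIPs = tostring(audit.sourceIPs), verb = tostring(audit.verb), name = tostring(audit.objectRef.name), requestObject = tostring(audit.requestObject)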

Splunk — forbidden request spike

splunk
index=kubernetes_audit "annotations.authorization.k8s.io/decision"=forbid
| bin _time span=5m
| stats count dc(objectRef.resource) as resources values(objectRef.namespace) as namespaces by _time user.username sourceIPs{}
| where count > 10 OR resources > 4

Response Playbook

Audit detections should trigger action. The faster you move from event to containment, the less value the attacker extracts from the cluster.

Revoke identity: Invalidate the compromised service account credentials. Delete legacy token Secrets; for bound tokens, delete the pod or service account they are bound to. Remove suspicious RoleBindings and ClusterRoleBindings immediately.
Isolate workload: Apply a deny-all egress NetworkPolicy to the namespace or quarantine the pod with labels your CNI enforces.
Reconstruct timeline: Pivot on user.username, sourceIPs, pod name, namespace, and auditID across the previous 24 hours.
Prevent replay: Disable automounted service account tokens where not needed and tighten RBAC to namespace-scoped, verb-scoped roles.
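
The timeline step can be sketched as a filter over stored events. This is a minimal illustration that pivots on username and source IP only; the function name is made up, and a real investigation would add pod name, namespace, and auditID pivots from your log store.

```python
from datetime import datetime, timedelta


def parse_ts(value):
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timeline(events, alert_event, lookback=timedelta(hours=24)):
    """Return events by the same user or source IP in the lookback window
    before the alert, oldest first."""
    end = parse_ts(alert_event["stageTimestamp"])
    user = alert_event["user"]["username"]
    ips = set(alert_event.get("sourceIPs", []))
    related = [
        e for e in events
        if end - lookback <= parse_ts(e["stageTimestamp"]) <= end
        and (e["user"]["username"] == user or ips & set(e.get("sourceIPs", [])))
    ]
    return sorted(related, key=lambda e: parse_ts(e["stageTimestamp"]))
```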

Build the Detection Pipeline

A detection rule in a document is not a control. Ship audit logs outside the cluster, normalize them, alert on high-confidence behavior, and keep enough retention for incident reconstruction.

Reference pipeline
API Server audit log
  ├─ file backend or webhook backend
  ├─ Fluent Bit / managed control-plane logging
  ├─ external storage: S3, CloudWatch, Log Analytics, Elasticsearch, Splunk
  ├─ detection processor: SIEM rules, Sigma mappings, custom correlation
  └─ alerting: Slack, Teams, PagerDuty, incident queue
The audit trail must survive cluster compromise. If the attacker can delete your logs from inside the cluster, your logging architecture is part of the blast radius.
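
To make the detection processor stage concrete, here is a minimal sketch that reads audit events in the JSON-lines format written by the file backend and runs a list of detector functions over each one. The two detectors and their alert names are illustrative placeholders for the rules described earlier.

```python
import json


def detect_exec(event):
    ref = event.get("objectRef", {})
    if ref.get("subresource") in ("exec", "attach"):
        return "pod-exec"
    return None


def detect_netpol_change(event):
    ref = event.get("objectRef", {})
    if (ref.get("resource") == "networkpolicies"
            and event.get("verb") in ("delete", "patch", "update")):
        return "networkpolicy-change"
    return None


DETECTORS = [detect_exec, detect_netpol_change]


def run_pipeline(lines, detectors=DETECTORS):
    """Parse JSON-lines audit events and yield (alert_name, auditID) pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would count and report malformed lines
        for detector in detectors:
            name = detector(event)
            if name:
                yield (name, event.get("auditID"))
```

Because `run_pipeline` accepts any iterable of lines, the same function works on an open file, a log shipper's batch, or a test fixture.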

Managed Kubernetes Reality

| Platform | What to enable | Operational note |
| --- | --- | --- |
| EKS | Control plane audit logging to CloudWatch | Disabled unless explicitly enabled. Query with CloudWatch Logs Insights and forward to SIEM. |
| GKE | Admin Activity plus Data Access logs | Admin Activity is default. Data Access gives richer read visibility and must be deliberately enabled. |
| AKS | Diagnostic settings for kube-audit or kube-audit-admin | Send to Log Analytics. Use analytics rules for exec, secret access, RBAC mutation, and privileged pod creation. |

What Audit Logs Cannot See

Audit logs capture API server activity. They do not capture commands after a successful exec, raw pod-to-pod traffic, direct kubelet API abuse, direct etcd access, or process-level behavior inside the container.

Container commands: After pods/exec, the API server does not see every shell command. Use runtime telemetry.
East-west traffic: API audit does not show a compromised pod talking to a database. Use CNI/network telemetry.
Direct kubelet access: Traffic to kubelet can bypass the API server. Harden and monitor kubelet endpoints.
Direct etcd access: etcd access bypasses Kubernetes audit logs entirely. Isolate etcd and enforce authentication.

Audit Log Maturity Model

| Level | State | Detection capability |
| --- | --- | --- |
| 0 | No audit logging | Post-incident guessing. |
| 1 | Default logging, noisy policy | Some forensics, weak detection. |
| 2 | Tuned policy and external storage | Fast investigation and reliable event history. |
| 3 | Automated detections | Real-time alerts for common attack paths. |
| 4 | Layered detection and response | Audit logs, runtime signals, network telemetry, and automated containment. |

Get to Level 2 in one week. Get to Level 3 in one month. Level 4 is where mature Kubernetes security operations live.

Final Word

Every serious Kubernetes attack leaves API-server fingerprints. The attacker who steals a service account token uses it against the API server. The attacker who creates a privileged pod submits it through the API server. The attacker who binds themselves to cluster-admin modifies RBAC through the API server.

My position is simple: if Kubernetes audit logs are not tuned, shipped externally, and connected to detection logic, the cluster is not monitored. It is only producing evidence for someone to read after the damage is done.

— Riad DAHMANI, k8sec.io
