ktl logs: Rollout-Aware Debugging Beyond kubectl logs
Most incident timelines fall apart not because logs are missing, but because the log stream lacks context.
ktl logs focuses on context: rollout roles, events, node signals, and shareable sessions.
Teams often start debugging with kubectl logs and then switch between five terminals: one for
pod restarts, one for events, one for rollout status, one for node issues, and one to share output in chat.
The problem is not raw access to logs. The problem is fragmentation.
ktl logs solves that by treating debugging as a stream orchestration problem, not a single command.
You still get fast log tailing, but with rollout-awareness and better transport options for team workflows.
Quick comparison
| Capability | kubectl logs / classic flow | ktl logs |
|---|---|---|
| Rollout-aware pod selection | Manual pod picking during deploys/canaries. | deploy/<name> lens with --deploy-mode active\|stable\|canary. |
| Multi-source incident context | Usually separate commands and terminals. | Single stream can include pod logs, events, and node/system logs. |
| Structured filtering | Pipe to jq/grep manually. | Built-in JSON key filters via --filter key=value. |
| Remote sharing | Screen share or paste snippets. | --ws-listen, --mirror-bus, and --remote-agent support. |
| Session replay/audit | Ad hoc transcript files. | --capture persists logs + selection changes into SQLite. |
1) Deployment lens: follow the right pods automatically
The most practical differentiator is ktl logs deploy/my-app -n prod. Instead of hard-coding pod
names, ktl resolves deployment selectors, watches ReplicaSets and pods, and keeps your stream pointed at the
correct workload during rollout changes.
During canary rollouts, you can intentionally scope to stable or canary traffic paths.
# Follow active rollout pods
ktl logs deploy/checkout -n prod-payments --deploy-mode active
# Focus only on canary pods
ktl logs deploy/checkout -n prod-payments --deploy-mode canary
# Compare stable and canary in one stream
ktl logs deploy/checkout -n prod-payments --deploy-mode stable+canary
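The lens modes above lend themselves to small runbook helpers. Here is a minimal sketch of one (a hypothetical wrapper, not part of ktl) that builds the argument list for a given rollout lens so on-call docs can switch lenses without editing the command by hand:

```shell
# Hypothetical runbook helper (not part of ktl): build the ktl argument
# list for a given rollout lens. Usage: ktl $(lens_args <deploy> <ns> <mode>)
lens_args() {
  # $1 = deployment name, $2 = namespace, $3 = deploy mode
  printf 'logs deploy/%s -n %s --deploy-mode %s\n' "$1" "$2" "$3"
}
lens_args checkout prod-payments stable+canary
# prints: logs deploy/checkout -n prod-payments --deploy-mode stable+canary
```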
2) Incident context in one stream: pods + events + nodes
Production failures are rarely a pure app-log problem. You also need scheduler, kubelet, and cluster events.
ktl logs can merge those signals in one timeline.
# Pod logs + related Kubernetes events
ktl logs 'checkout-.*' -n prod-payments --events
# Add node/system logs from nodes hosting matched pods
ktl logs 'checkout-.*' -n prod-payments --node-logs
# Stream only node-level logs when isolating infrastructure issues
ktl logs 'checkout-.*' -n prod-payments --node-log-only --node-log kubelet.log
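Conceptually, merging pod logs, events, and node logs is a timestamp interleave. A toy illustration with sample lines standing in for the real sources (ktl performs this merge internally; ISO-8601 timestamps sort lexicographically, so a plain sort produces one timeline):

```shell
# Sample stand-ins for two log sources during the same incident window.
printf '%s\n' \
  '2024-05-01T10:00:02Z pod checkout-7f error: upstream timeout' \
  '2024-05-01T10:00:05Z pod checkout-7f restarting' > pods.log
printf '%s\n' \
  '2024-05-01T10:00:03Z node kubelet: OOM killing container checkout' > node.log
# Interleave by timestamp: the node-level OOM lands between the two pod lines.
sort pods.log node.log
```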
3) Structured filtering and safer narrowing
Grep-first workflows are noisy on JSON logs. The built-in --filter lets responders narrow by keys
directly and combine those filters with pod conditions or selectors.
# Show only error-level JSON lines from Ready=false pods
ktl logs 'checkout-.*' -n prod-payments \
--condition ready=false \
--filter level=error,status=500
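For contrast, this is roughly the manual grep-first step that --filter replaces, run here against sample JSON lines rather than a live stream; a substring match like this is brittle against key order and whitespace, which is why key-based filtering helps:

```shell
# Sample JSON log lines standing in for a live pod stream.
printf '%s\n' \
  '{"level":"error","status":500,"msg":"upstream timeout"}' \
  '{"level":"info","status":200,"msg":"ok"}' |
  grep '"level":"error"'
# prints only the error line
```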
4) Collaboration modes: local, remote, mirrored
If one engineer has cluster access and others need visibility, transport matters. ktl logs supports
multiple collaboration paths without changing the incident command model.
- --ws-listen :9090 exposes a raw WebSocket stream for dashboards/viewers.
- --mirror-bus publishes to a shared gRPC mirror bus for multi-subscriber replay/tail.
- --remote-agent forwards execution to a remote ktl-agent endpoint.
# Local cluster access, shared stream
ktl logs 'checkout-.*' -n prod-payments --ws-listen :9090
# Remote execution model
ktl --remote-agent agent.internal:9443 --remote-tls \
--remote-token "$KTL_REMOTE_TOKEN" logs 'checkout-.*' -n prod-payments
5) Capture for replay and audit
Incident notes are usually incomplete. --capture writes session metadata, log events, and tailer
selection changes into SQLite so teams can review exactly what happened and when.
ktl logs deploy/checkout -n prod-payments \
--events \
--capture ./captures/checkout-incident.sqlite \
--capture-tag incident=INC-4821 \
--capture-tag env=prod
This is useful for postmortems because you keep the debugging timeline, not just a few copied snippets.
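Because the capture is plain SQLite, it can be queried after the fact with standard tooling. The sketch below assumes a log_events(ts, pod, line) table; the real ktl schema may differ, so treat the table and column names as placeholders. It builds a tiny stand-in database rather than touching a real capture:

```shell
# Stand-in capture database; the real schema produced by --capture may differ.
db=$(mktemp)
sqlite3 "$db" "CREATE TABLE log_events(ts TEXT, pod TEXT, line TEXT);
INSERT INTO log_events VALUES('2024-05-01T10:00:02Z','checkout-7f','error: upstream timeout');
INSERT INTO log_events VALUES('2024-05-01T10:00:04Z','checkout-7f','request ok');"
# Postmortem-style query: pull only the error lines, in order.
sqlite3 "$db" "SELECT ts, line FROM log_events WHERE line LIKE '%error%' ORDER BY ts;"
```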
Superpower examples (copy/paste)
These are practical commands teams use during real incident windows.
# Superpower 1: One command for rollout + events + node signals + capture
ktl logs deploy/checkout -n prod-payments \
--deploy-mode active \
--events \
--node-logs \
--capture ./captures/checkout-live.sqlite \
--capture-tag incident=INC-4902
# Superpower 2: Dependency-aware logs from stack config
ktl logs checkout -n prod-payments \
--deps \
--config ./stack.yaml
# Superpower 3: Team-viewable stream for remote responders
ktl logs 'checkout-.*' -n prod-payments \
--events \
--ws-listen :9090
# Superpower 4: Automation-friendly JSON export (no follow)
ktl logs 'checkout-.*' -n prod-payments \
--output extjson \
--tail 300 \
--no-follow > checkout-tail.jsonl
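Once exported as JSONL, the tail is easy to post-process with standard tools. For example, counting 500-status lines (sample lines stand in for checkout-tail.jsonl):

```shell
# Sample lines standing in for the exported checkout-tail.jsonl.
printf '%s\n' \
  '{"level":"error","status":500}' \
  '{"level":"error","status":500}' \
  '{"level":"info","status":200}' |
  grep -c '"status":500'
# prints: 2
```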
6) Security scanning superpower: verify before rollout
Fast debugging is not enough if risky manifests keep reappearing. Security scanning should run before and after incidents to catch privilege, policy, and drift problems early.
A practical pattern is: ktl logs for runtime signals, then verify for policy checks.
This gives responders both symptom context and security posture context in the same workflow.
# Verify stack health gates from stack config
ktl stack verify --config stack.yaml
# Verify rendered chart policies before apply
verify --chart ./chart --release checkout -n prod-payments
# Verify live namespace posture
verify --namespace prod-payments --context prod-us
# Discover and inspect builtin rules
verify rules list
verify rules show k8s/container_is_privileged
# Compare to a known-good baseline report
verify verify.yaml --compare-to ./baseline.json
This is where scanning matters most: it reduces repeat incidents by preventing insecure rollout patterns from shipping again under time pressure.
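One way to wire this in is a CI gate that blocks rollout when verification fails. This sketch assumes the verify command exits non-zero on findings; `false` stands in for the real command here so the shape is runnable without a cluster:

```shell
# Stand-in for: ktl stack verify --config stack.yaml
# (assumes a non-zero exit code on policy findings)
run_verify() { false; }

if run_verify; then
  echo "rollout allowed"
else
  echo "rollout blocked"
fi
# prints: rollout blocked
```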
7) Tunnel superpower: debug local code against real cluster dependencies
After logs identify the failing path, teams often need one more step: run a local fix while keeping real
backend dependencies from the cluster. That is where ktl tunnel is uniquely useful.
Instead of hand-wiring multiple port-forward sessions, ktl tunnel can open dependency
tunnels from stack config, inject env vars from a live workload, and start your local debug command once
everything is ready.
# Local debug loop with remote dependencies + runtime env
ktl tunnel frontend \
--deps \
--config ./stack.yaml \
--env-from deployment/frontend \
--exec "npm run dev"
For resilience debugging, tunnel can also inject network chaos so you can reproduce issues that only appear under degraded links.
# Simulate bad network while testing retry logic
ktl tunnel backend \
--latency 400ms \
--error-rate 0.15
In practice, this combination is powerful: ktl logs finds the failure, ktl tunnel
gives a realistic local reproduction path, and verify ensures the rollout fix is policy-safe.
Practical playbook
If you are migrating from pure kubectl logs usage, this sequence works well:
- Start with the deploy/<name> lens in one service that has frequent rollouts.
- Add --events as a default during incidents.
- Use ktl tunnel --deps --env-from --exec to reproduce and validate local fixes faster.
- Turn on --capture for production-only troubleshooting sessions.
- Introduce --ws-listen or --remote-agent for distributed response teams.
Final take
ktl logs is not trying to replace every terminal trick. It improves the parts that actually slow
incident response: choosing the right pods during rollouts, joining signals into one view, and sharing or
replaying sessions with minimal friction.
Faster debugging is not just more tailing speed. It is higher-quality context per minute.
See the docs and examples at https://kubekattle.github.io/ktl/.