Troubleshooting Guide

Comprehensive troubleshooting guide for resolving common KafkaGuard issues and errors.

Overview

This guide helps you diagnose and resolve common KafkaGuard issues. Each section includes:

  • Error message - What you see
  • Cause - Why it happened
  • Fix - How to resolve it
  • Prevention - How to avoid it in the future

Quick Troubleshooting Checklist

Before diving into specific errors, verify these basics:

  • KafkaGuard is installed and in PATH (kafkaguard version)
  • Kafka brokers are reachable from your network
  • Correct bootstrap server addresses (hostname and port)
  • Credentials are correct (if using SASL)
  • Certificates are valid (if using SSL/TLS)
  • Policy file exists and is valid
  • Output directory has write permissions
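
The checklist above can be run as a quick preflight script before a scan. This is a sketch: the host, port, policy path, and output directory are placeholders from the examples in this guide, so adjust the variables for your environment.

```shell
#!/usr/bin/env bash
# Preflight checks before a KafkaGuard scan (sketch; adjust variables).
BOOTSTRAP_HOST="${BOOTSTRAP_HOST:-kafka.example.com}"
BOOTSTRAP_PORT="${BOOTSTRAP_PORT:-9092}"
POLICY_FILE="${POLICY_FILE:-policies/custom-policy.yaml}"
OUT_DIR="${OUT_DIR:-reports}"

echo "== KafkaGuard preflight =="

# 1. Binary installed and in PATH
if command -v kafkaguard >/dev/null 2>&1; then
  echo "kafkaguard: found ($(command -v kafkaguard))"
else
  echo "kafkaguard: NOT FOUND in PATH"
fi

# 2. Broker reachable (TCP connect via bash's /dev/tcp, 3-second cap)
if timeout 3 bash -c "echo > /dev/tcp/${BOOTSTRAP_HOST}/${BOOTSTRAP_PORT}" 2>/dev/null; then
  echo "broker ${BOOTSTRAP_HOST}:${BOOTSTRAP_PORT}: reachable"
else
  echo "broker ${BOOTSTRAP_HOST}:${BOOTSTRAP_PORT}: NOT reachable"
fi

# 3. Policy file exists
[ -f "$POLICY_FILE" ] && echo "policy: $POLICY_FILE exists" || echo "policy: $POLICY_FILE missing"

# 4. Output directory writable
[ -w "$OUT_DIR" ] && echo "output dir: $OUT_DIR writable" || echo "output dir: $OUT_DIR not writable"
```

If every line reports OK, the errors below are more likely cluster-side than environmental.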

Connection Errors

Error: connection refused

Full Error Message

Error: kafka: client has run out of available brokers to talk to
Error: dial tcp 192.168.1.100:9092: connect: connection refused

Cause

  1. Kafka broker is not running
  2. Wrong broker address (hostname or port)
  3. Network firewall blocking Kafka ports
  4. Broker not listening on specified interface

Fix

Step 1: Verify broker is running

# Check Kafka process
ps aux | grep kafka

# Check Kafka service status
systemctl status kafka  # Linux systemd

Step 2: Test network connectivity

# Test TCP connection
telnet kafka.example.com 9092

# Or use netcat
nc -zv kafka.example.com 9092

# Or use KafkaGuard with debug logging
kafkaguard scan --bootstrap kafka:9092 --log-level debug

Step 3: Check broker address and port

# Verify bootstrap servers
# Common ports:
# - 9092: PLAINTEXT
# - 9093: SSL
# - 9094: SASL_PLAINTEXT
# - 9095: SASL_SSL

# Try different port
kafkaguard scan --bootstrap kafka:9093  # SSL port

Step 4: Check firewall rules

# Linux: Check iptables
sudo iptables -L -n | grep 9092

# macOS: Check pf firewall
sudo pfctl -s rules

# Test from broker server directly
ssh kafka-server
telnet localhost 9092

Prevention

  • Maintain inventory of broker addresses and ports
  • Monitor broker health (use monitoring tools)
  • Document network topology and firewall rules
  • Use DNS names instead of IP addresses (easier to update)

Error: i/o timeout

Full Error Message

Error: kafka: client has run out of available brokers to talk to
Error: dial tcp 192.168.1.100:9092: i/o timeout

Cause

  1. Network latency or packet loss
  2. Firewall dropping packets (no explicit deny)
  3. Broker overloaded and not responding
  4. Timeout too low for network conditions

Fix

Step 1: Test network latency

# Ping broker
ping kafka.example.com

# Traceroute to broker
traceroute kafka.example.com

# Test TCP latency
time telnet kafka.example.com 9092

Step 2: Increase timeout

# Increase scan timeout to 600 seconds (10 minutes)
kafkaguard scan \
  --bootstrap kafka:9092 \
  --timeout 600

# For very large clusters or high latency networks
kafkaguard scan \
  --bootstrap kafka:9092 \
  --timeout 900  # 15 minutes

Step 3: Check broker load

# On broker server, check CPU and memory
top
htop

# Check Kafka metrics
kafka-broker-api-versions.sh --bootstrap-server localhost:9092

Step 4: Verify firewall is not dropping packets

# Run tcpdump on broker to see if packets arrive
sudo tcpdump -i any port 9092

# Check for firewall rules with silent drops
sudo iptables -L -v -n | grep DROP

Prevention

  • Set appropriate timeout based on network conditions
  • Monitor network latency between KafkaGuard and Kafka brokers
  • Configure firewall rules to explicitly deny (not drop silently)
  • Scale Kafka brokers if consistently overloaded

Error: no such host

Full Error Message

Error: kafka: client has run out of available brokers to talk to
Error: dial tcp: lookup kafka.example.com: no such host

Cause

  1. DNS resolution failure
  2. Hostname typo in --bootstrap flag
  3. DNS server unreachable
  4. Host entry missing from /etc/hosts

Fix

Step 1: Verify hostname spelling

# Check for typos
echo "kafka.example.com"  # Verify exact spelling

Step 2: Test DNS resolution

# Test DNS lookup
nslookup kafka.example.com
dig kafka.example.com

# Check which DNS server is being used
cat /etc/resolv.conf

Step 3: Use IP address instead

# If DNS is failing, use IP address temporarily
kafkaguard scan --bootstrap 192.168.1.100:9092

Step 4: Add to /etc/hosts (if DNS unavailable)

# Add entry to /etc/hosts
sudo bash -c 'echo "192.168.1.100 kafka.example.com" >> /etc/hosts'

# Verify
grep kafka /etc/hosts

Prevention

  • Use DNS for production (more flexible than IP addresses)
  • Maintain accurate DNS records
  • Monitor DNS health
  • Document broker hostnames and IP addresses

Error: connection reset by peer

Full Error Message

Error: kafka: client has run out of available brokers to talk to
Error: read tcp 192.168.1.10:12345->192.168.1.100:9092: read: connection reset by peer

Cause

  1. Broker closed connection (authentication failure)
  2. Security protocol mismatch (using PLAINTEXT for SASL_SSL broker)
  3. TLS handshake failure
  4. Broker restarted during scan

Fix

Step 1: Verify security protocol

# Try auto-detection (no --security-protocol flag)
kafkaguard scan --bootstrap kafka:9092

# Or explicitly specify protocol
kafkaguard scan \
  --bootstrap kafka:9095 \
  --security-protocol SASL_SSL
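
If you are unsure which listener speaks which protocol, a quick probe of the conventional ports can narrow it down. A minimal sketch using bash's built-in /dev/tcp (so it works without nc); the port-to-protocol mapping is the common convention used in this guide, not guaranteed for your cluster.

```shell
#!/usr/bin/env bash
# Probe conventional Kafka listener ports; verify the actual mapping
# against the broker's server.properties listeners= setting.
HOST="${1:-kafka.example.com}"

for entry in "9092 PLAINTEXT" "9093 SSL" "9094 SASL_PLAINTEXT" "9095 SASL_SSL"; do
  read -r port proto <<< "$entry"
  if timeout 2 bash -c "echo > /dev/tcp/${HOST}/${port}" 2>/dev/null; then
    echo "${HOST}:${port} open (conventionally ${proto})"
  else
    echo "${HOST}:${port} closed or filtered"
  fi
done
```

An open port only proves a listener exists there; confirm the protocol with the broker configuration in Step 2.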

Step 2: Check broker listener configuration

# On broker server, check listeners config
grep listeners /opt/kafka/config/server.properties

# Example output:
# listeners=PLAINTEXT://0.0.0.0:9092,SASL_SSL://0.0.0.0:9095

Step 3: Enable debug logging

kafkaguard scan \
  --bootstrap kafka:9092 \
  --log-level debug \
  2>&1 | tee debug.log

# Review debug.log for TLS or SASL errors

Step 4: Verify broker is stable

# Check broker uptime
ssh kafka-server
uptime

# Check Kafka logs for restarts
tail -100 /var/log/kafka/server.log

Prevention

  • Document broker security protocols for each cluster
  • Monitor broker restarts
  • Use explicit --security-protocol flag in production
  • Test scans during maintenance windows

Authentication Errors

Error: SASL authentication failed

Full Error Message

Error: kafka: client has run out of available brokers to talk to
Error: SASL authentication failed: kafka: invalid username or password

Cause

  1. Invalid username or password
  2. SASL mechanism mismatch (client uses SHA-256, broker requires SHA-512)
  3. User not created on Kafka broker
  4. User credentials expired or revoked

Fix

Step 1: Verify credentials

# Check environment variables
echo "Username: $KAFKAGUARD_SASL_USERNAME"
echo "Password length: ${#KAFKAGUARD_SASL_PASSWORD}"

# Verify credentials with kafka-console-consumer
kafka-console-consumer.sh \
  --bootstrap-server kafka:9095 \
  --consumer.config client.properties \
  --topic __consumer_offsets \
  --max-messages 1

Step 2: Check SASL mechanism

# List broker's enabled mechanisms
ssh kafka-server
grep sasl.enabled.mechanisms /opt/kafka/config/server.properties

# Example output:
# sasl.enabled.mechanisms=SCRAM-SHA-512

# Match client to broker
kafkaguard scan \
  --bootstrap kafka:9095 \
  --security-protocol SASL_SSL \
  --sasl-mechanism SCRAM-SHA-512  # Match broker config

Step 3: Verify user exists on broker

# List SCRAM users
kafka-configs.sh --bootstrap-server kafka:9095 \
  --command-config admin.properties \
  --describe --entity-type users

# Create user if missing
kafka-configs.sh --bootstrap-server kafka:9095 \
  --command-config admin.properties \
  --alter --add-config 'SCRAM-SHA-512=[password=secret]' \
  --entity-type users --entity-name kafkaguard

Step 4: Test with minimal ACLs

# Grant minimal read permissions
kafka-acls.sh --bootstrap-server kafka:9095 \
  --command-config admin.properties \
  --add --allow-principal User:kafkaguard \
  --operation Describe --cluster

Prevention

  • Use strong password management (Vault, AWS Secrets Manager)
  • Document SASL mechanism for each cluster
  • Monitor authentication failures
  • Rotate credentials on schedule (30-90 days)
  • Maintain user inventory

Error: TLS certificate verify failed

Full Error Message

Error: x509: certificate signed by unknown authority
Error: tls: failed to verify certificate: x509: certificate signed by unknown authority

Cause

  1. CA certificate not provided
  2. Wrong CA certificate file
  3. Self-signed certificate without proper CA chain
  4. Expired CA certificate

Fix

Step 1: Provide CA certificate

# Specify CA certificate
kafkaguard scan \
  --bootstrap kafka:9093 \
  --security-protocol SSL \
  --tls-ca-cert /path/to/ca-cert.pem

Step 2: Verify CA certificate is correct

# View CA certificate details
openssl x509 -in /path/to/ca-cert.pem -text -noout

# Check issuer matches broker certificate
openssl s_client -connect kafka:9093 -showcerts </dev/null 2>/dev/null | openssl x509 -text -noout | grep Issuer

Step 3: Check CA certificate expiry

# Check expiry date
openssl x509 -in /path/to/ca-cert.pem -noout -dates

# Example output:
# notBefore=Jan  1 00:00:00 2023 GMT
# notAfter=Dec 31 23:59:59 2025 GMT

Step 4: Test TLS connection manually

# Test TLS handshake
openssl s_client -connect kafka:9093 -CAfile /path/to/ca-cert.pem

# Should show "Verify return code: 0 (ok)"

Step 5: For self-signed certificates

# Ensure you have the self-signed CA certificate (not the server certificate)
# The CA cert is the one used to sign the server certificate

# If you only have server cert, you may need to extract or obtain the CA cert from broker admin

Prevention

  • Maintain CA certificate inventory
  • Monitor certificate expiry (KG-005 control does this)
  • Use centralized certificate management
  • Document certificate chain for each cluster
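
Expiry monitoring can be scripted with openssl's -checkend flag, which exits non-zero if the certificate expires within the given number of seconds. A sketch; the certificate path is a placeholder.

```shell
#!/usr/bin/env bash
# Warn if a certificate expires within N days (path is a placeholder).
CERT="${1:-/path/to/ca-cert.pem}"
DAYS="${2:-30}"

if [ ! -f "$CERT" ]; then
  echo "certificate not found: $CERT"
# -checkend takes seconds; exit 0 means still valid past the window
elif openssl x509 -in "$CERT" -noout -checkend $((DAYS * 24 * 3600)); then
  echo "OK: $CERT valid for at least $DAYS more days"
else
  echo "WARNING: $CERT expires within $DAYS days"
fi
```

Run it from cron against each CA and broker certificate to catch expiry before clients start failing.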

Error: TLS handshake failure

Full Error Message

Error: tls: protocol version not supported
Error: tls: handshake failure

Cause

  1. Broker using TLS 1.0 or 1.1 (deprecated)
  2. Cipher suite mismatch
  3. TLS protocol version incompatibility

Fix

Step 1: Check broker TLS version

# Test TLS 1.2 support
openssl s_client -connect kafka:9093 -tls1_2

# Test TLS 1.3 support
openssl s_client -connect kafka:9093 -tls1_3

# If these fail, broker may be using deprecated TLS 1.0/1.1

Step 2: Upgrade broker TLS configuration

# On broker server, update server.properties
# Add or modify:
ssl.protocol=TLSv1.2
# Or for TLS 1.3:
ssl.protocol=TLSv1.3

# Restart broker
systemctl restart kafka

Step 3: Check cipher suites

# List broker's supported cipher suites
openssl s_client -connect kafka:9093 -tls1_2 -cipher 'ALL' </dev/null 2>&1 | grep Cipher

# KafkaGuard requires modern cipher suites (TLS 1.2+)

Prevention

  • Use TLS 1.2 or 1.3 for all brokers
  • Disable deprecated protocols (TLS 1.0, 1.1)
  • Monitor broker TLS configuration (KG-006 control)
  • Test TLS configuration before deploying

Error: Certificate hostname mismatch

Full Error Message

Error: x509: certificate is valid for kafka-broker1, not kafka.example.com

Cause

Certificate CN (Common Name) or SAN (Subject Alternative Name) doesn't match the hostname used in --bootstrap flag.

Fix

Step 1: Use hostname from certificate

# View certificate details
openssl x509 -in /path/to/broker-cert.pem -text -noout | grep -A1 "Subject:"

# Example output shows CN=kafka-broker1
# Use this hostname:
kafkaguard scan --bootstrap kafka-broker1:9093

Step 2: Check Subject Alternative Names

# View SANs
openssl x509 -in /path/to/broker-cert.pem -text -noout | grep -A5 "Subject Alternative Name"

# Example output:
# DNS:kafka.example.com, DNS:kafka-broker1, IP:192.168.1.100

# Use any of these names/IPs
kafkaguard scan --bootstrap kafka.example.com:9093

Step 3: Regenerate certificate with correct hostnames

# Generate new certificate with correct CN and SANs
# Include all hostnames and IPs that clients will use

# Example: Create certificate with multiple SANs
openssl req -new -x509 -days 365 \
  -key broker-key.pem \
  -out broker-cert.pem \
  -subj "/CN=kafka.example.com" \
  -addext "subjectAltName=DNS:kafka.example.com,DNS:kafka-broker1,IP:192.168.1.100"

Prevention

  • Include all hostnames and IPs in certificate SANs
  • Use wildcard certificates for flexibility (*.example.com)
  • Document hostname conventions
  • Test certificates before deployment

Policy Validation Errors

Error: Policy file not found

Full Error Message

Error: open policies/custom-policy.yaml: no such file or directory

Cause

  1. File path is incorrect
  2. File doesn't exist
  3. Relative path used from wrong directory
  4. Typo in filename

Fix

Step 1: Verify file exists

# Check if file exists
ls -l policies/custom-policy.yaml

# List all policies
ls -l policies/*.yaml

Step 2: Use absolute path

# Use full path to policy file
kafkaguard scan \
  --bootstrap kafka:9092 \
  --policy /full/path/to/policies/custom-policy.yaml

Step 3: Check current directory

# Verify you're in correct directory
pwd

# List files in current directory
ls -l

# If policy is in a different location, cd there first
cd /path/to/kafkaguard
kafkaguard scan --policy policies/custom-policy.yaml

Prevention

  • Use absolute paths in scripts and automation
  • Maintain policy files in standard location (policies/)
  • Version control policy files
  • Document policy file locations

Error: Invalid control ID format

Full Error Message

Error: invalid control ID format 'KG-1' at control index 3
💡 Suggestion: Control IDs must match pattern KG-XXX where XXX is 3 digits (e.g., KG-001, KG-042)

Cause

Control IDs not using 3-digit format (KG-001, KG-042, etc.)

Fix

Step 1: Identify invalid control IDs

# Validate policy to see all errors
kafkaguard validate-policy --policy policies/custom-policy.yaml --log-level debug

Step 2: Update control IDs to 3-digit format

# Before (INCORRECT):
controls:
  - id: KG-1  # ❌ Only 1 digit
    title: "Control 1"
  - id: KG-42  # ❌ Only 2 digits
    title: "Control 42"

# After (CORRECT):
controls:
  - id: KG-001  # ✅ 3 digits
    title: "Control 1"
  - id: KG-042  # ✅ 3 digits
    title: "Control 42"

Step 3: Validate fixed policy

kafkaguard validate-policy --policy policies/custom-policy.yaml

# Expected: ✅ Policy validation successful

Prevention

  • Use 3-digit control IDs from the start (KG-001 to KG-999)
  • Validate policies before deployment (validate-policy command)
  • Use policy templates as starting point
  • Code review for custom policies

Error: CEL expression syntax error

Full Error Message

Error: CEL syntax error in control KG-001: undeclared reference to 'borker' (did you mean 'broker'?)

Cause

  1. Typo in CEL expression
  2. Invalid CEL syntax
  3. Reference to undefined variable
  4. Incorrect CEL function usage

Fix

Step 1: Identify problematic expression

# Validate policy to see exact error location
kafkaguard validate-policy --policy policies/custom-policy.yaml --log-level debug

Step 2: Fix CEL syntax

# Before (INCORRECT):
expr: |
  borker.config["sasl.enabled"] == true  # ❌ Typo: 'borker'

# After (CORRECT):
expr: |
  broker.config["sasl.enabled"] == true  # ✅ Correct: 'broker'

Common CEL Expression Issues:

# ❌ INCORRECT: Missing quotes around keys
expr: broker.config[sasl.enabled]

# ✅ CORRECT: Keys must be quoted
expr: broker.config["sasl.enabled"]

# ❌ INCORRECT: Wrong comparison operator
expr: broker.version = "2.8.0"

# ✅ CORRECT: Use == for comparison
expr: broker.version == "2.8.0"

# ❌ INCORRECT: Undefined variable
expr: cluster.total_brokers > 3

# ✅ CORRECT: Use available variables (broker, topic, cluster)
expr: cluster.broker_count > 3

Step 3: Test CEL expression

# Validate policy after fixing
kafkaguard validate-policy --policy policies/custom-policy.yaml

Step 4: Reference CEL documentation

  • CEL Specification
  • KafkaGuard available variables:
    • broker - Broker configuration and metadata
    • topic - Topic configuration and metadata
    • cluster - Cluster-wide information

Prevention

  • Validate policies before deployment
  • Reference working policy examples
  • Test CEL expressions incrementally
  • Use IDE with YAML and CEL syntax highlighting

Error: Duplicate control ID

Full Error Message

Error: duplicate control ID 'KG-001' found at indices 2 and 5

Cause

Same control ID used multiple times in policy file.

Fix

Step 1: Find duplicate IDs

# Search for duplicate IDs in policy file
grep -n "id: KG-001" policies/custom-policy.yaml

# Example output:
# 15:  id: KG-001
# 42:  id: KG-001  # Duplicate!

Step 2: Assign unique IDs

# Change one of the duplicate IDs to a unique value
controls:
  - id: KG-001
    title: "First control"
  - id: KG-002  # Changed from KG-001 to KG-002
    title: "Second control"

Step 3: Validate policy

kafkaguard validate-policy --policy policies/custom-policy.yaml

Prevention

  • Maintain control ID inventory (KG-001 to KG-999)
  • Use sequential IDs for custom controls (KG-101, KG-102, etc.)
  • Validate policies before committing to version control

Performance Issues

Issue: Scan takes longer than expected

Symptoms

Scan running for 5+ minutes on small cluster
Expected: <60 seconds
Actual: 300+ seconds

Cause

  1. Large cluster (many brokers, topics, partitions)
  2. Network latency between KafkaGuard and Kafka
  3. Timeout too low (causing retries)
  4. Broker overloaded and slow to respond

Fix

Step 1: Increase timeout

# Default timeout is 300 seconds (5 minutes)
# Increase for large clusters
kafkaguard scan \
  --bootstrap kafka:9092 \
  --timeout 600  # 10 minutes

# For very large clusters (1000+ topics)
kafkaguard scan \
  --bootstrap kafka:9092 \
  --timeout 900  # 15 minutes

Step 2: Check network latency

# Measure round-trip time to broker
ping -c 10 kafka.example.com

# Average RTT should be <50ms for good performance

Step 3: Monitor broker load

# Check broker CPU and memory usage
ssh kafka-server
top
htop

# Check Kafka metrics (if JMX enabled)
kafka-run-class.sh kafka.tools.JmxTool \
  --object-name kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec \
  --reporting-interval 1000

Step 4: Enable parallel collection (default)

# Parallel collection is enabled by default
# Verify it's not disabled in config
kafkaguard scan \
  --bootstrap kafka:9092 \
  --parallel true \
  --max-collectors 10  # Increase if many brokers

Prevention

  • Set appropriate timeout based on cluster size
  • Monitor cluster size growth (topics, partitions)
  • Run scans during off-peak hours
  • Monitor network latency

Expected Scan Duration:

Cluster Size                   | Expected Duration
3 brokers, <100 topics         | 8-15 seconds
5 brokers, 100-500 topics      | 15-30 seconds
10 brokers, 500-1000 topics    | 30-60 seconds
20+ brokers, 1000+ topics      | 60-180 seconds

Issue: High memory usage during scan

Symptoms

Memory usage spikes to >500MB during scan
Expected: <200MB
Actual: 500-1000MB

Cause

  1. Very large cluster (1000+ topics, 5000+ ACLs)
  2. Memory leak (unlikely, but possible)
  3. Large report data (many findings)

Fix

Step 1: Monitor memory usage

# Monitor KafkaGuard memory during scan
top -p $(pgrep kafkaguard)

# Or use ps
ps aux | grep kafkaguard

Step 2: Reduce parallel collectors (if very large cluster)

# Reduce max collectors to limit concurrent operations
kafkaguard scan \
  --bootstrap kafka:9092 \
  --max-collectors 3  # Reduce from default 6

Step 3: Run with lower memory limit (for testing)

# Use ulimit to restrict memory (Linux)
ulimit -v 262144  # Limit to 256MB virtual memory
kafkaguard scan --bootstrap kafka:9092

Step 4: Report high memory usage

If memory usage is consistently >500MB, please report to the KafkaGuard team:

  • Open an issue
  • Include: cluster size, topic count, ACL count, memory usage

Prevention

  • Monitor memory usage in production
  • Set resource limits in containerized environments
  • Scale infrastructure if necessary

Issue: Timeout errors during collection

Symptoms

Error: context deadline exceeded
Error: timeout waiting for broker response

Cause

Timeout too low for cluster size or network conditions.

Fix

# Increase timeout significantly
kafkaguard scan \
  --bootstrap kafka:9092 \
  --timeout 900  # 15 minutes for large clusters

Prevention

  • Set appropriate timeout from the start
  • Monitor scan duration over time (cluster growth)
  • Alert if scan duration increases significantly

Report Generation Errors

Error: Permission denied (writing reports)

Full Error Message

Error: open ./reports/scan-20251115140530-abc123.json: permission denied

Cause

  1. No write permission to output directory
  2. Output directory doesn't exist
  3. Directory owned by different user

Fix

Step 1: Check directory permissions

# Check permissions
ls -ld ./reports

# Expected: drwxrwxr-x (at least user write permission)
# If it shows dr-xr-xr-x (no write), you need to fix permissions

Step 2: Create directory with correct permissions

# Create reports directory
mkdir -p reports

# Ensure you have write permission
chmod 755 reports

# Or make writable by all (if appropriate)
chmod 777 reports

Step 3: Use different output directory

# Write to /tmp (always writable)
kafkaguard scan \
  --bootstrap kafka:9092 \
  --out /tmp/kafkaguard-reports

# Or use home directory
kafkaguard scan \
  --bootstrap kafka:9092 \
  --out ~/kafkaguard-reports

Step 4: Fix ownership (if directory owned by different user)

# Change ownership to current user
sudo chown -R $(whoami):$(whoami) reports/

# Verify
ls -ld reports/

Prevention

  • Create report directories before first scan
  • Set appropriate permissions (755 or 775)
  • Use standard report locations (/var/reports/kafkaguard)
  • Document report directory locations

Error: No space left on device

Full Error Message

Error: write ./reports/scan-20251115140530-abc123.pdf: no space left on device

Cause

Insufficient disk space for report generation (PDF reports can be large).

Fix

Step 1: Check disk space

# Check available space
df -h

# Check reports directory usage
du -sh reports/

# Find large files
du -ah reports/ | sort -h | tail -20

Step 2: Free up disk space

# Delete old reports (older than 90 days)
find reports/ -name "scan-*.json" -mtime +90 -delete

# Compress old PDF reports
find reports/ -name "scan-*.pdf" -mtime +30 -exec gzip {} \;

# Or move reports to different disk
mv reports/ /mnt/large-disk/kafkaguard-reports/

Step 3: Use different output directory with more space

# Use disk with more space
kafkaguard scan \
  --bootstrap kafka:9092 \
  --out /mnt/large-disk/kafkaguard-reports

Step 4: Reduce report formats (temporary)

# Generate only JSON (smallest)
kafkaguard scan \
  --bootstrap kafka:9092 \
  --format json

# Skip PDF if not needed
kafkaguard scan \
  --bootstrap kafka:9092 \
  --format json,html

Prevention

  • Monitor disk space (set up alerts)
  • Implement report retention policy (delete old reports)
  • Compress archived reports
  • Use dedicated storage for reports
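
The retention steps above can be combined into one cron-able cleanup script. A sketch; the directory name and retention windows are the examples used in this guide, so tune them to your own policy.

```shell
#!/usr/bin/env bash
# Report retention: delete old JSON reports, compress old PDFs.
REPORT_DIR="${REPORT_DIR:-reports}"
DELETE_AFTER_DAYS="${DELETE_AFTER_DAYS:-90}"
COMPRESS_AFTER_DAYS="${COMPRESS_AFTER_DAYS:-30}"

if [ ! -d "$REPORT_DIR" ]; then
  echo "no report directory: $REPORT_DIR"
else
  # Delete JSON reports older than the retention window
  find "$REPORT_DIR" -name "scan-*.json" -mtime +"$DELETE_AFTER_DAYS" -delete
  # Compress PDFs older than the compression window
  find "$REPORT_DIR" -name "scan-*.pdf" -mtime +"$COMPRESS_AFTER_DAYS" -exec gzip {} \;
  echo "retention pass complete for $REPORT_DIR"
fi
```

Schedule it daily (e.g. via cron) so the reports directory never fills the disk in the first place.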

Typical Report Sizes:

  • JSON: 50-500 KB
  • HTML: 100 KB - 1 MB
  • PDF: 500 KB - 5 MB
  • CSV: 20-200 KB

Error: Report generation failed

Full Error Message

Error: failed to generate PDF report: template error: ...

Cause

  1. Invalid report data
  2. Template rendering error
  3. Missing fonts (for PDF generation)
  4. Corrupted scan results

Fix

Step 1: Check scan results (JSON)

# Verify JSON report is valid
LATEST_JSON=$(ls -t reports/scan-*.json | head -1)
cat "$LATEST_JSON" | jq '.'

# If jq fails, JSON is corrupted

Step 2: Generate only JSON to isolate issue

# Generate JSON only (always works)
kafkaguard scan \
  --bootstrap kafka:9092 \
  --format json

# Then try other formats individually
kafkaguard scan \
  --bootstrap kafka:9092 \
  --format html

kafkaguard scan \
  --bootstrap kafka:9092 \
  --format pdf

Step 3: Report issue with debug logs

# Run with debug logging
kafkaguard scan \
  --bootstrap kafka:9092 \
  --format pdf \
  --log-level debug \
  2>&1 | tee debug-report-gen.log

# Share debug-report-gen.log in GitHub issue

Step 4: Workaround (use JSON and convert manually)

# Generate JSON report
kafkaguard scan \
  --bootstrap kafka:9092 \
  --format json

# Convert to HTML/PDF manually (if needed)
# Use external tools or custom scripts

Prevention

  • Always generate JSON (most reliable)
  • Monitor report generation errors
  • Test report formats after KafkaGuard updates

Debugging Techniques

Enable Debug Logging

# Run scan with debug logging
kafkaguard scan \
  --bootstrap kafka:9092 \
  --log-level debug \
  2>&1 | tee scan-debug.log

# Review debug log
less scan-debug.log

Test Kafka Connectivity

# Test TCP connection
telnet kafka.example.com 9092

# Test TLS connection
openssl s_client -connect kafka:9093 -CAfile /path/to/ca-cert.pem

# Test SASL authentication
kafka-console-consumer.sh \
  --bootstrap-server kafka:9095 \
  --consumer.config client.properties \
  --topic __consumer_offsets \
  --max-messages 1

Validate Policy Files

# Validate policy syntax
kafkaguard validate-policy --policy policies/custom-policy.yaml

# Check YAML syntax
yamllint policies/custom-policy.yaml

# Or use Python
python3 -c "import yaml; yaml.safe_load(open('policies/custom-policy.yaml'))"

Test with Minimal Policy

# minimal-test-policy.yaml
version: "1.0"
name: "Minimal Test Policy"
description: "Minimal policy for testing connectivity"
tier: "test"

controls:
  - id: KG-001
    title: "Test control"
    description: "Always passes"
    severity: LOW
    category: operational
    expr: "true"  # Always passes
    remediation: "N/A"
    compliance:
      pci_dss: []
      soc2: []
      iso27001: []
# Test with minimal policy
kafkaguard scan \
  --bootstrap kafka:9092 \
  --policy minimal-test-policy.yaml

# If this works, issue is likely in policy file
# If this fails, issue is likely in connectivity or configuration

Use --no-color for CI/CD Log Parsing

# Disable colored output
kafkaguard scan \
  --bootstrap kafka:9092 \
  --no-color

# Easier to parse in CI/CD logs

Capture Network Traffic (Advanced)

# Capture packets for analysis (requires root)
sudo tcpdump -i any -w kafkaguard-traffic.pcap port 9092

# In another terminal, run kafkaguard scan
kafkaguard scan --bootstrap kafka:9092

# Stop tcpdump (Ctrl+C)

# Analyze with Wireshark
wireshark kafkaguard-traffic.pcap

Support Resources

GitHub Issues

When opening an issue, include:

  • KafkaGuard version (kafkaguard version)
  • Operating system and version
  • Full command you ran
  • Complete error message
  • Debug logs (if applicable)
  • Kafka version and configuration
  • Steps to reproduce
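
Collecting those details can be scripted so nothing is forgotten. A sketch, assuming only the kafkaguard version subcommand shown in this guide; everything else uses standard tools.

```shell
#!/usr/bin/env bash
# Gather environment details for a KafkaGuard issue report (sketch).
OUT="${1:-kafkaguard-diagnostics.txt}"

{
  echo "== KafkaGuard version =="
  kafkaguard version 2>&1 || echo "kafkaguard not found in PATH"
  echo
  echo "== Operating system =="
  uname -a
  echo
  echo "== Date =="
  date -u
} > "$OUT"

echo "Diagnostics written to $OUT"
echo "Remember to add: the full command you ran, the complete error message,"
echo "debug logs, Kafka version, and steps to reproduce."
```

Attach the resulting file to the issue alongside the command, error output, and debug logs.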

GitHub Discussions

  • Ask Questions: Start a discussion
  • Share Ideas: Community discussions
  • Get Help: Non-bug help requests


Common Error Patterns Summary

Error Pattern                  | Likely Cause                 | Quick Fix
connection refused             | Broker down or wrong port    | Verify broker is running, check port
i/o timeout                    | Network latency or firewall  | Increase --timeout, check firewall
no such host                   | DNS failure or typo          | Use IP address or fix DNS
connection reset               | Protocol mismatch            | Verify --security-protocol
SASL authentication failed     | Wrong credentials            | Verify username/password, check mechanism
certificate verify failed      | Missing/wrong CA cert        | Provide correct CA cert with --tls-ca-cert
tls: handshake failure         | TLS version mismatch         | Upgrade broker to TLS 1.2+
policy file not found          | Wrong path                   | Use absolute path
invalid control ID             | Wrong ID format              | Use 3-digit format (KG-001)
CEL syntax error               | Typo in expression           | Fix CEL expression, validate policy
permission denied (reports)    | No write permission          | Create directory with write permissions
no space left                  | Disk full                    | Free up space or use different directory

Next Steps

If you couldn't resolve your issue:

  1. Check GitHub Issues - Someone may have encountered the same problem
  2. Enable Debug Logging - Run with --log-level debug and review logs
  3. Test Incrementally - Use minimal policy, test connectivity separately
  4. Open an Issue - Provide all requested information for faster resolution

Document Information

  • Last Updated: 2025-11-15
  • Applies to Version: KafkaGuard 1.0.0+
  • Feedback: Open an issue for improvements