Troubleshooting Guide
Comprehensive troubleshooting guide for resolving common KafkaGuard issues and errors.
Table of Contents
- Overview
- Connection Errors
- Authentication Errors
- Policy Validation Errors
- Performance Issues
- Report Generation Errors
- Debugging Techniques
- Support Resources
Overview
This guide helps you diagnose and resolve common KafkaGuard issues. Each section includes:
- Error message - What you see
- Cause - Why it happened
- Fix - How to resolve it
- Prevention - How to avoid it in the future
Quick Troubleshooting Checklist
Before diving into specific errors, verify these basics:
- KafkaGuard is installed and in PATH (kafkaguard version)
- Kafka brokers are reachable from your network
- Correct bootstrap server addresses (hostname and port)
- Credentials are correct (if using SASL)
- Certificates are valid (if using SSL/TLS)
- Policy file exists and is valid
- Output directory has write permissions
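The checklist above can be scripted so it runs before every scan. The sketch below is a minimal example, not part of KafkaGuard itself; the broker hostname and file paths are placeholders — substitute your own.

```shell
# Pre-flight checks before a scan. Hostname and paths are placeholders --
# substitute your own broker address, policy file, and output directory.
check() {
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $desc"
  else
    echo "FAIL: $desc"
  fi
}

check "kafkaguard in PATH"     command -v kafkaguard
check "broker port reachable"  nc -z -w 2 kafka.example.com 9092
check "policy file readable"   test -r policies/custom-policy.yaml
check "reports dir writable"   test -w reports
```

Any FAIL line points you at the matching section of this guide before you start debugging scan output.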
Connection Errors
Error: connection refused
Full Error Message
Error: kafka: client has run out of available brokers to talk to
Error: dial tcp 192.168.1.100:9092: connect: connection refused
Cause
- Kafka broker is not running
- Wrong broker address (hostname or port)
- Network firewall blocking Kafka ports
- Broker not listening on specified interface
Fix
Step 1: Verify broker is running
# Check Kafka process
ps aux | grep kafka
# Check Kafka service status
systemctl status kafka # Linux systemd
Step 2: Test network connectivity
# Test TCP connection
telnet kafka.example.com 9092
# Or use netcat
nc -zv kafka.example.com 9092
# Or use KafkaGuard with debug logging
kafkaguard scan --bootstrap kafka:9092 --log-level debug
Step 3: Check broker address and port
# Verify bootstrap servers
# Common ports:
# - 9092: PLAINTEXT
# - 9093: SSL
# - 9094: SASL_PLAINTEXT
# - 9095: SASL_SSL
# Try different port
kafkaguard scan --bootstrap kafka:9093 # SSL port
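Rather than trying ports one at a time, you can probe all four conventional ports in one pass. The port-to-protocol mapping below mirrors the list above; it is only a convention, so confirm against your broker's listeners configuration.

```shell
# Map a Kafka port to its conventional security protocol. This mapping is
# a convention only -- confirm against the broker's listeners config.
port_protocol() {
  case "$1" in
    9092) echo PLAINTEXT ;;
    9093) echo SSL ;;
    9094) echo SASL_PLAINTEXT ;;
    9095) echo SASL_SSL ;;
    *)    echo UNKNOWN ;;
  esac
}

# Probe each conventional port (hostname is a placeholder):
for port in 9092 9093 9094 9095; do
  if nc -z -w 2 kafka.example.com "$port" 2>/dev/null; then
    echo "open:   $port ($(port_protocol "$port"))"
  else
    echo "closed: $port ($(port_protocol "$port"))"
  fi
done
```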
Step 4: Check firewall rules
# Linux: Check iptables
sudo iptables -L -n | grep 9092
# macOS: Check pf firewall
sudo pfctl -s rules
# Test from broker server directly
ssh kafka-server
telnet localhost 9092
Prevention
- Maintain inventory of broker addresses and ports
- Monitor broker health (use monitoring tools)
- Document network topology and firewall rules
- Use DNS names instead of IP addresses (easier to update)
Error: i/o timeout
Full Error Message
Error: kafka: client has run out of available brokers to talk to
Error: dial tcp 192.168.1.100:9092: i/o timeout
Cause
- Network latency or packet loss
- Firewall dropping packets (no explicit deny)
- Broker overloaded and not responding
- Timeout too low for network conditions
Fix
Step 1: Test network latency
# Ping broker
ping kafka.example.com
# Traceroute to broker
traceroute kafka.example.com
# Test TCP latency
time telnet kafka.example.com 9092
Step 2: Increase timeout
# Increase scan timeout to 600 seconds (10 minutes)
kafkaguard scan \
--bootstrap kafka:9092 \
--timeout 600
# For very large clusters or high latency networks
kafkaguard scan \
--bootstrap kafka:9092 \
--timeout 900 # 15 minutes
Step 3: Check broker load
# On broker server, check CPU and memory
top
htop
# Check Kafka metrics
kafka-broker-api-versions.sh --bootstrap-server localhost:9092
Step 4: Verify firewall is not dropping packets
# Run tcpdump on broker to see if packets arrive
sudo tcpdump -i any port 9092
# Check for firewall rules with silent drops
sudo iptables -L -v -n | grep DROP
Prevention
- Set appropriate timeout based on network conditions
- Monitor network latency between KafkaGuard and Kafka brokers
- Configure firewall rules to explicitly deny (not drop silently)
- Scale Kafka brokers if consistently overloaded
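The first prevention point — choosing a timeout that matches conditions — can be rough-sketched as a shell heuristic. The base value matches the default timeout mentioned later in this guide; the scaling factors are illustrative only, not an official KafkaGuard formula.

```shell
# Suggest a --timeout value from measured RTT and topic count.
# The base is the default 300s scan timeout; the scaling factors
# are illustrative heuristics, not an official formula.
pick_timeout() {
  rtt_ms="$1"; topics="$2"
  base=300
  extra=$(( (rtt_ms / 10) + (topics / 10) ))
  echo $(( base + extra ))
}

pick_timeout 50 1000    # 50 ms average RTT, 1000 topics -> prints 405
```

Feed the result straight into the flag, e.g. `kafkaguard scan --bootstrap kafka:9092 --timeout "$(pick_timeout 50 1000)"`.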
Error: no such host
Full Error Message
Error: kafka: client has run out of available brokers to talk to
Error: dial tcp: lookup kafka.example.com: no such host
Cause
- DNS resolution failure
- Hostname typo in --bootstrap flag
- DNS server unreachable
- Host entry missing from /etc/hosts
Fix
Step 1: Verify hostname spelling
# Check for typos
echo "kafka.example.com" # Verify exact spelling
Step 2: Test DNS resolution
# Test DNS lookup
nslookup kafka.example.com
dig kafka.example.com
# Check which DNS server is being used
cat /etc/resolv.conf
Step 3: Use IP address instead
# If DNS is failing, use IP address temporarily
kafkaguard scan --bootstrap 192.168.1.100:9092
Step 4: Add to /etc/hosts (if DNS unavailable)
# Add entry to /etc/hosts
sudo bash -c 'echo "192.168.1.100 kafka.example.com" >> /etc/hosts'
# Verify
cat /etc/hosts | grep kafka
Prevention
- Use DNS for production (more flexible than IP addresses)
- Maintain accurate DNS records
- Monitor DNS health
- Document broker hostnames and IP addresses
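Monitoring DNS health can be as simple as a resolution check in a cron job. The sketch below uses getent, which consults both DNS and /etc/hosts on Linux (on macOS, use dscacheutil instead); the broker hostname is a placeholder.

```shell
# Report whether a hostname resolves (via DNS or /etc/hosts).
# getent is Linux-specific; on macOS use dscacheutil -q host -a name.
resolve_check() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "resolves: $1"
  else
    echo "UNRESOLVED: $1"
  fi
}

resolve_check localhost
resolve_check kafka.example.com    # placeholder broker hostname
```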
Error: connection reset by peer
Full Error Message
Error: kafka: client has run out of available brokers to talk to
Error: read tcp 192.168.1.10:12345->192.168.1.100:9092: read: connection reset by peer
Cause
- Broker closed connection (authentication failure)
- Security protocol mismatch (using PLAINTEXT for SASL_SSL broker)
- TLS handshake failure
- Broker restarted during scan
Fix
Step 1: Verify security protocol
# Try auto-detection (no --security-protocol flag)
kafkaguard scan --bootstrap kafka:9092
# Or explicitly specify protocol
kafkaguard scan \
--bootstrap kafka:9095 \
--security-protocol SASL_SSL
Step 2: Check broker listener configuration
# On broker server, check listeners config
grep listeners /opt/kafka/config/server.properties
# Example output:
# listeners=PLAINTEXT://0.0.0.0:9092,SASL_SSL://0.0.0.0:9095
Step 3: Enable debug logging
kafkaguard scan \
--bootstrap kafka:9092 \
--log-level debug \
2>&1 | tee debug.log
# Review debug.log for TLS or SASL errors
Step 4: Verify broker is stable
# Check broker uptime
ssh kafka-server
uptime
# Check Kafka logs for restarts
tail -100 /var/log/kafka/server.log
Prevention
- Document broker security protocols for each cluster
- Monitor broker restarts
- Use explicit --security-protocol flag in production
- Test scans during maintenance windows
Authentication Errors
Error: SASL authentication failed
Full Error Message
Error: kafka: client has run out of available brokers to talk to
Error: SASL authentication failed: kafka: invalid username or password
Cause
- Invalid username or password
- SASL mechanism mismatch (client uses SHA-256, broker requires SHA-512)
- User not created on Kafka broker
- User credentials expired or revoked
Fix
Step 1: Verify credentials
# Check environment variables
echo "Username: $KAFKAGUARD_SASL_USERNAME"
echo "Password length: ${#KAFKAGUARD_SASL_PASSWORD}"
# Verify credentials with kafka-console-consumer
kafka-console-consumer.sh \
--bootstrap-server kafka:9095 \
--consumer.config client.properties \
--topic __consumer_offsets \
--max-messages 1
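The environment-variable check above can be wrapped so the secret value itself is never echoed, even by accident. `check_env` below is a hypothetical helper, not a KafkaGuard command.

```shell
# Verify an environment variable is set without printing its value.
check_env() {
  if [ -n "$(printenv "$1")" ]; then
    echo "set: $1"
  else
    echo "MISSING: $1"
  fi
}

check_env KAFKAGUARD_SASL_USERNAME
check_env KAFKAGUARD_SASL_PASSWORD
```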
Step 2: Check SASL mechanism
# List broker's enabled mechanisms
ssh kafka-server
grep sasl.enabled.mechanisms /opt/kafka/config/server.properties
# Example output:
# sasl.enabled.mechanisms=SCRAM-SHA-512
# Match client to broker
kafkaguard scan \
--bootstrap kafka:9095 \
--security-protocol SASL_SSL \
--sasl-mechanism SCRAM-SHA-512 # Match broker config
Step 3: Verify user exists on broker
# List SCRAM users
kafka-configs.sh --bootstrap-server kafka:9095 \
--command-config admin.properties \
--describe --entity-type users
# Create user if missing
kafka-configs.sh --bootstrap-server kafka:9095 \
--command-config admin.properties \
--alter --add-config 'SCRAM-SHA-512=[password=secret]' \
--entity-type users --entity-name kafkaguard
Step 4: Test with minimal ACLs
# Grant minimal read permissions
kafka-acls.sh --bootstrap-server kafka:9095 \
--command-config admin.properties \
--add --allow-principal User:kafkaguard \
--operation Describe --cluster kafka-cluster
Prevention
- Use strong password management (Vault, AWS Secrets Manager)
- Document SASL mechanism for each cluster
- Monitor authentication failures
- Rotate credentials on schedule (30-90 days)
- Maintain user inventory
Error: TLS certificate verify failed
Full Error Message
Error: x509: certificate signed by unknown authority
Error: tls: failed to verify certificate: x509: certificate signed by unknown authority
Cause
- CA certificate not provided
- Wrong CA certificate file
- Self-signed certificate without proper CA chain
- Expired CA certificate
Fix
Step 1: Provide CA certificate
# Specify CA certificate
kafkaguard scan \
--bootstrap kafka:9093 \
--security-protocol SSL \
--tls-ca-cert /path/to/ca-cert.pem
Step 2: Verify CA certificate is correct
# View CA certificate details
openssl x509 -in /path/to/ca-cert.pem -text -noout
# Check issuer matches broker certificate
openssl s_client -connect kafka:9093 -showcerts | openssl x509 -text -noout | grep Issuer
Step 3: Check CA certificate expiry
# Check expiry date
openssl x509 -in /path/to/ca-cert.pem -noout -dates
# Example output:
# notBefore=Jan 1 00:00:00 2023 GMT
# notAfter=Dec 31 23:59:59 2025 GMT
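openssl's -checkend option turns the dates above into a pass/fail check you can script. The demo certificate below is throwaway, generated only so the example is self-contained; point the helper at your real CA cert instead.

```shell
# Fail (non-zero exit) if a certificate expires within N days (default 30).
warn_if_expiring() {
  cert="$1"; days="${2:-30}"
  if openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null 2>&1; then
    echo "OK: $cert valid for more than $days days"
  else
    echo "WARNING: $cert expires within $days days"
    return 1
  fi
}

# Self-contained demo: generate a throwaway cert valid for 365 days.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem \
  -subj "/CN=demo" 2>/dev/null

warn_if_expiring /tmp/demo-cert.pem 30     # OK
warn_if_expiring /tmp/demo-cert.pem 400    # WARNING: within 400 days
```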
Step 4: Test TLS connection manually
# Test TLS handshake
openssl s_client -connect kafka:9093 -CAfile /path/to/ca-cert.pem
# Should show "Verify return code: 0 (ok)"
Step 5: For self-signed certificates
# Ensure you have the self-signed CA certificate (not the server certificate)
# The CA cert is the one used to sign the server certificate
# If you only have server cert, you may need to extract or obtain the CA cert from broker admin
Prevention
- Maintain CA certificate inventory
- Monitor certificate expiry (KG-005 control does this)
- Use centralized certificate management
- Document certificate chain for each cluster
Error: TLS handshake failure
Full Error Message
Error: tls: protocol version not supported
Error: tls: handshake failure
Cause
- Broker using TLS 1.0 or 1.1 (deprecated)
- Cipher suite mismatch
- TLS protocol version incompatibility
Fix
Step 1: Check broker TLS version
# Test TLS 1.2 support
openssl s_client -connect kafka:9093 -tls1_2
# Test TLS 1.3 support
openssl s_client -connect kafka:9093 -tls1_3
# If these fail, broker may be using deprecated TLS 1.0/1.1
Step 2: Upgrade broker TLS configuration
# On broker server, update server.properties
# Add or modify:
ssl.protocol=TLSv1.2
# Or for TLS 1.3:
ssl.protocol=TLSv1.3
# Restart broker
systemctl restart kafka
Step 3: Check cipher suites
# List broker's supported cipher suites
openssl s_client -connect kafka:9093 -tls1_2 -cipher 'ALL' 2>&1 | grep Cipher
# KafkaGuard requires modern cipher suites (TLS 1.2+)
Prevention
- Use TLS 1.2 or 1.3 for all brokers
- Disable deprecated protocols (TLS 1.0, 1.1)
- Monitor broker TLS configuration (KG-006 control)
- Test TLS configuration before deploying
Error: Certificate hostname mismatch
Full Error Message
Error: x509: certificate is valid for kafka-broker1, not kafka.example.com
Cause
Certificate CN (Common Name) or SAN (Subject Alternative Name) doesn't match the hostname used in --bootstrap flag.
Fix
Step 1: Use hostname from certificate
# View certificate details
openssl x509 -in /path/to/broker-cert.pem -text -noout | grep -A1 "Subject:"
# Example output shows CN=kafka-broker1
# Use this hostname:
kafkaguard scan --bootstrap kafka-broker1:9093
Step 2: Check Subject Alternative Names
# View SANs
openssl x509 -in /path/to/broker-cert.pem -text -noout | grep -A5 "Subject Alternative Name"
# Example output:
# DNS:kafka.example.com, DNS:kafka-broker1, IP:192.168.1.100
# Use any of these names/IPs
kafkaguard scan --bootstrap kafka.example.com:9093
Step 3: Regenerate certificate with correct hostnames
# Generate new certificate with correct CN and SANs
# Include all hostnames and IPs that clients will use
# Example: Create certificate with multiple SANs
openssl req -new -x509 -days 365 \
-key broker-key.pem \
-out broker-cert.pem \
-subj "/CN=kafka.example.com" \
-addext "subjectAltName=DNS:kafka.example.com,DNS:kafka-broker1,IP:192.168.1.100"
Prevention
- Include all hostnames and IPs in certificate SANs
- Use wildcard certificates for flexibility (*.example.com)
- Document hostname conventions
- Test certificates before deployment
Policy Validation Errors
Error: Policy file not found
Full Error Message
Error: open policies/custom-policy.yaml: no such file or directory
Cause
- File path is incorrect
- File doesn't exist
- Relative path used from wrong directory
- Typo in filename
Fix
Step 1: Verify file exists
# Check if file exists
ls -l policies/custom-policy.yaml
# List all policies
ls -l policies/*.yaml
Step 2: Use absolute path
# Use full path to policy file
kafkaguard scan \
--bootstrap kafka:9092 \
--policy /full/path/to/policies/custom-policy.yaml
Step 3: Check current directory
# Verify you're in correct directory
pwd
# List files in current directory
ls -l
# If policy is in a different location, cd there first
cd /path/to/kafkaguard
kafkaguard scan --policy policies/custom-policy.yaml
Prevention
- Use absolute paths in scripts and automation
- Maintain policy files in standard location (policies/)
- Version control policy files
- Document policy file locations
Error: Invalid control ID format
Full Error Message
Error: invalid control ID format 'KG-1' at control index 3
💡 Suggestion: Control IDs must match pattern KG-XXX where XXX is 3 digits (e.g., KG-001, KG-042)
Cause
Control IDs not using 3-digit format (KG-001, KG-042, etc.)
Fix
Step 1: Identify invalid control IDs
# Validate policy to see all errors
kafkaguard validate-policy --policy policies/custom-policy.yaml --log-level debug
Step 2: Update control IDs to 3-digit format
# Before (INCORRECT):
controls:
- id: KG-1 # ❌ Only 1 digit
title: "Control 1"
- id: KG-42 # ❌ Only 2 digits
title: "Control 42"
# After (CORRECT):
controls:
- id: KG-001 # ✅ 3 digits
title: "Control 1"
- id: KG-042 # ✅ 3 digits
title: "Control 42"
Step 3: Validate fixed policy
kafkaguard validate-policy --policy policies/custom-policy.yaml
# Expected: ✅ Policy validation successful
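A quick grep can catch malformed IDs even before running the validator. `bad_ids` below is a hypothetical convenience helper, not a KafkaGuard command; the demo policy fragment is throwaway.

```shell
# List control IDs that do not match the required KG-XXX (exactly 3 digits) form.
bad_ids() {
  grep -oE 'id: KG-[0-9]+' "$1" | awk '{print $2}' | grep -Ev '^KG-[0-9]{3}$' || true
}

# Demo against a throwaway policy fragment:
cat > /tmp/id-demo.yaml <<'EOF'
controls:
  - id: KG-1
  - id: KG-042
  - id: KG-1234
EOF
bad_ids /tmp/id-demo.yaml     # prints KG-1 and KG-1234
```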
Prevention
- Use 3-digit control IDs from the start (KG-001 to KG-999)
- Validate policies before deployment (validate-policy command)
- Use policy templates as starting point
- Code review for custom policies
Error: CEL expression syntax error
Full Error Message
Error: CEL syntax error in control KG-001: undeclared reference to 'borker' (did you mean 'broker'?)
Cause
- Typo in CEL expression
- Invalid CEL syntax
- Reference to undefined variable
- Incorrect CEL function usage
Fix
Step 1: Identify problematic expression
# Validate policy to see exact error location
kafkaguard validate-policy --policy policies/custom-policy.yaml --log-level debug
Step 2: Fix CEL syntax
# Before (INCORRECT):
expr: |
borker.config["sasl.enabled"] == true # ❌ Typo: 'borker'
# After (CORRECT):
expr: |
broker.config["sasl.enabled"] == true # ✅ Correct: 'broker'
Common CEL Expression Issues:
# ❌ INCORRECT: Missing quotes around keys
expr: broker.config[sasl.enabled]
# ✅ CORRECT: Keys must be quoted
expr: broker.config["sasl.enabled"]
# ❌ INCORRECT: Wrong comparison operator
expr: broker.version = "2.8.0"
# ✅ CORRECT: Use == for comparison
expr: broker.version == "2.8.0"
# ❌ INCORRECT: Undefined variable
expr: cluster.total_brokers > 3
# ✅ CORRECT: Use available variables (broker, topic, cluster)
expr: cluster.broker_count > 3
Step 3: Test CEL expression
# Validate policy after fixing
kafkaguard validate-policy --policy policies/custom-policy.yaml
Step 4: Reference CEL documentation
- CEL Specification
- KafkaGuard available variables:
- broker - Broker configuration and metadata
- topic - Topic configuration and metadata
- cluster - Cluster-wide information
Prevention
- Validate policies before deployment
- Reference working policy examples
- Test CEL expressions incrementally
- Use IDE with YAML and CEL syntax highlighting
Error: Duplicate control ID
Full Error Message
Error: duplicate control ID 'KG-001' found at indices 2 and 5
Cause
Same control ID used multiple times in policy file.
Fix
Step 1: Find duplicate IDs
# Search for duplicate IDs in policy file
grep -n "id: KG-001" policies/custom-policy.yaml
# Example output:
# 15: id: KG-001
# 42: id: KG-001 # Duplicate!
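Rather than grepping for one ID at a time, sort | uniq -d over all extracted IDs surfaces every duplicate at once. `find_dup_ids` below is a hypothetical helper, and the demo policy fragment is throwaway.

```shell
# List control IDs that appear more than once in a policy file.
find_dup_ids() {
  grep -oE 'id: KG-[0-9]+' "$1" | awk '{print $2}' | sort | uniq -d
}

# Demo against a throwaway policy fragment:
cat > /tmp/dup-demo.yaml <<'EOF'
controls:
  - id: KG-001
  - id: KG-002
  - id: KG-001
EOF
find_dup_ids /tmp/dup-demo.yaml     # prints KG-001
```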
Step 2: Assign unique IDs
# Change one of the duplicate IDs to a unique value
controls:
- id: KG-001
title: "First control"
- id: KG-002 # Changed from KG-001 to KG-002
title: "Second control"
Step 3: Validate policy
kafkaguard validate-policy --policy policies/custom-policy.yaml
Prevention
- Maintain control ID inventory (KG-001 to KG-999)
- Use sequential IDs for custom controls (KG-101, KG-102, etc.)
- Validate policies before committing to version control
Performance Issues
Issue: Scan takes longer than expected
Symptoms
Scan running for 5+ minutes on small cluster
Expected: <60 seconds
Actual: 300+ seconds
Cause
- Large cluster (many brokers, topics, partitions)
- Network latency between KafkaGuard and Kafka
- Timeout too low (causing retries)
- Broker overloaded and slow to respond
Fix
Step 1: Increase timeout
# Default timeout is 300 seconds (5 minutes)
# Increase for large clusters
kafkaguard scan \
--bootstrap kafka:9092 \
--timeout 600 # 10 minutes
# For very large clusters (1000+ topics)
kafkaguard scan \
--bootstrap kafka:9092 \
--timeout 900 # 15 minutes
Step 2: Check network latency
# Measure round-trip time to broker
ping -c 10 kafka.example.com
# Average RTT should be <50ms for good performance
Step 3: Monitor broker load
# Check broker CPU and memory usage
ssh kafka-server
top
htop
# Check Kafka metrics (if JMX enabled)
kafka-run-class.sh kafka.tools.JmxTool \
--object-name kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec \
--reporting-interval 1000
Step 4: Enable parallel collection (default)
# Parallel collection is enabled by default
# Verify it's not disabled in config
kafkaguard scan \
--bootstrap kafka:9092 \
--parallel true \
--max-collectors 10 # Increase if many brokers
Prevention
- Set appropriate timeout based on cluster size
- Monitor cluster size growth (topics, partitions)
- Run scans during off-peak hours
- Monitor network latency
Expected Scan Duration:
| Cluster Size | Expected Duration |
|---|---|
| 3 brokers, <100 topics | 8-15 seconds |
| 5 brokers, 100-500 topics | 15-30 seconds |
| 10 brokers, 500-1000 topics | 30-60 seconds |
| 20+ brokers, 1000+ topics | 60-180 seconds |
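The expected durations above can be enforced with a small wrapper that warns when a run overshoots its threshold. `timed_run` is a generic sketch, not a KafkaGuard feature; the runnable demo wraps `sleep` so the example works anywhere.

```shell
# Run a command, report elapsed seconds, warn if it exceeds a threshold.
timed_run() {
  threshold="$1"; shift
  start=$(date +%s)
  "$@"
  status=$?
  elapsed=$(( $(date +%s) - start ))
  echo "elapsed: ${elapsed}s (threshold: ${threshold}s)"
  if [ "$elapsed" -gt "$threshold" ]; then
    echo "WARNING: scan exceeded expected duration -- investigate or raise --timeout"
  fi
  return $status
}

# Real usage (hypothetical cluster):
#   timed_run 60 kafkaguard scan --bootstrap kafka:9092
# Runnable demo:
timed_run 60 sleep 1
```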
Issue: High memory usage during scan
Symptoms
Memory usage spikes to >500MB during scan
Expected: <200MB
Actual: 500-1000MB
Cause
- Very large cluster (1000+ topics, 5000+ ACLs)
- Memory leak (unlikely, but possible)
- Large report data (many findings)
Fix
Step 1: Monitor memory usage
# Monitor KafkaGuard memory during scan
top -p $(pgrep kafkaguard)
# Or use ps
ps aux | grep kafkaguard
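To capture peak usage over a whole scan rather than eyeballing top, a small sampler that polls ps until the process exits is enough. This is a sketch, not a KafkaGuard feature; the runnable demo monitors a `sleep` process so it works anywhere.

```shell
# Print the peak resident memory (KB) of a process, sampled until it exits.
peak_rss_kb() {
  pid="$1"; peak=0
  while kill -0 "$pid" 2>/dev/null; do
    rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
    if [ -n "$rss" ] && [ "$rss" -gt "$peak" ]; then
      peak="$rss"
    fi
    sleep 0.5
  done
  echo "$peak"
}

# Real usage: kafkaguard scan --bootstrap kafka:9092 & peak_rss_kb $!
# Runnable demo:
sleep 2 &
peak_rss_kb $!
```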
Step 2: Reduce parallel collectors (if very large cluster)
# Reduce max collectors to limit concurrent operations
kafkaguard scan \
--bootstrap kafka:9092 \
--max-collectors 3 # Reduce from default 6
Step 3: Run with lower memory limit (for testing)
# Use ulimit to restrict memory (Linux)
ulimit -v 262144 # Limit to 256MB virtual memory
kafkaguard scan --bootstrap kafka:9092
Step 4: Report high memory usage
If memory usage is consistently >500MB, please report to the KafkaGuard team:
- Open an issue
- Include: cluster size, topic count, ACL count, memory usage
Prevention
- Monitor memory usage in production
- Set resource limits in containerized environments
- Scale infrastructure if necessary
Issue: Timeout errors during collection
Symptoms
Error: context deadline exceeded
Error: timeout waiting for broker response
Cause
Timeout too low for cluster size or network conditions.
Fix
# Increase timeout significantly
kafkaguard scan \
--bootstrap kafka:9092 \
--timeout 900 # 15 minutes for large clusters
Prevention
- Set appropriate timeout from the start
- Monitor scan duration over time (cluster growth)
- Alert if scan duration increases significantly
Report Generation Errors
Error: Permission denied (writing reports)
Full Error Message
Error: open ./reports/scan-20251115140530-abc123.json: permission denied
Cause
- No write permission to output directory
- Output directory doesn't exist
- Directory owned by different user
Fix
Step 1: Check directory permissions
# Check permissions
ls -ld ./reports
# Expected: drwxrwxr-x (at least user write permission)
# If shows: drw-r--r-- (no write), you need to fix permissions
Step 2: Create directory with correct permissions
# Create reports directory
mkdir -p reports
# Ensure you have write permission
chmod 755 reports
# Or make writable by all (if appropriate)
chmod 777 reports
Step 3: Use different output directory
# Write to /tmp (always writable)
kafkaguard scan \
--bootstrap kafka:9092 \
--out /tmp/kafkaguard-reports
# Or use home directory
kafkaguard scan \
--bootstrap kafka:9092 \
--out ~/kafkaguard-reports
Step 4: Fix ownership (if directory owned by different user)
# Change ownership to current user
sudo chown -R $(whoami):$(whoami) reports/
# Verify
ls -ld reports/
Prevention
- Create report directories before first scan
- Set appropriate permissions (755 or 775)
- Use standard report locations (/var/reports/kafkaguard)
- Document report directory locations
Error: No space left on device
Full Error Message
Error: write ./reports/scan-20251115140530-abc123.pdf: no space left on device
Cause
Insufficient disk space for report generation (PDF reports can be large).
Fix
Step 1: Check disk space
# Check available space
df -h
# Check reports directory usage
du -sh reports/
# Find large files
du -h reports/ | sort -h | tail -20
Step 2: Free up disk space
# Delete old reports (older than 90 days)
find reports/ -name "scan-*.json" -mtime +90 -delete
# Compress old PDF reports
find reports/ -name "scan-*.pdf" -mtime +30 -exec gzip {} \;
# Or move reports to different disk
mv reports/ /mnt/large-disk/kafkaguard-reports/
Step 3: Use different output directory with more space
# Use disk with more space
kafkaguard scan \
--bootstrap kafka:9092 \
--out /mnt/large-disk/kafkaguard-reports
Step 4: Reduce report formats (temporary)
# Generate only JSON (smallest)
kafkaguard scan \
--bootstrap kafka:9092 \
--format json
# Skip PDF if not needed
kafkaguard scan \
--bootstrap kafka:9092 \
--format json,html
Prevention
- Monitor disk space (set up alerts)
- Implement report retention policy (delete old reports)
- Compress archived reports
- Use dedicated storage for reports
Typical Report Sizes:
- JSON: 50-500 KB
- HTML: 100 KB - 1 MB
- PDF: 500 KB - 5 MB
- CSV: 20-200 KB
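Given these sizes, a free-space guard before scanning avoids hitting the error mid-run. `require_free_mb` below is a hypothetical helper sketch; adjust the directory and threshold to your environment.

```shell
# Fail if the filesystem holding DIR has less than NEED megabytes free.
require_free_mb() {
  dir="$1"; need="$2"
  avail=$(df -Pm "$dir" | awk 'NR==2 {print $4}')
  if [ "$avail" -lt "$need" ]; then
    echo "ERROR: only ${avail}MB free under $dir (need ${need}MB)"
    return 1
  fi
  echo "OK: ${avail}MB free under $dir"
}

require_free_mb /tmp 10
# Then run the scan, e.g.:
#   kafkaguard scan --bootstrap kafka:9092 --out /tmp/kafkaguard-reports
```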
Error: Report generation failed
Full Error Message
Error: failed to generate PDF report: template error: ...
Cause
- Invalid report data
- Template rendering error
- Missing fonts (for PDF generation)
- Corrupted scan results
Fix
Step 1: Check scan results (JSON)
# Verify JSON report is valid
LATEST_JSON=$(ls -t reports/scan-*.json | head -1)
cat "$LATEST_JSON" | jq '.'
# If jq fails, JSON is corrupted
Step 2: Generate only JSON to isolate issue
# Generate JSON only (always works)
kafkaguard scan \
--bootstrap kafka:9092 \
--format json
# Then try other formats individually
kafkaguard scan \
--bootstrap kafka:9092 \
--format html
kafkaguard scan \
--bootstrap kafka:9092 \
--format pdf
Step 3: Report issue with debug logs
# Run with debug logging
kafkaguard scan \
--bootstrap kafka:9092 \
--format pdf \
--log-level debug \
2>&1 | tee debug-report-gen.log
# Share debug-report-gen.log in GitHub issue
Step 4: Workaround (use JSON and convert manually)
# Generate JSON report
kafkaguard scan \
--bootstrap kafka:9092 \
--format json
# Convert to HTML/PDF manually (if needed)
# Use external tools or custom scripts
Prevention
- Always generate JSON (most reliable)
- Monitor report generation errors
- Test report formats after KafkaGuard updates
Debugging Techniques
Enable Debug Logging
# Run scan with debug logging
kafkaguard scan \
--bootstrap kafka:9092 \
--log-level debug \
2>&1 | tee scan-debug.log
# Review debug log
less scan-debug.log
Test Kafka Connectivity
# Test TCP connection
telnet kafka.example.com 9092
# Test TLS connection
openssl s_client -connect kafka:9093 -CAfile /path/to/ca-cert.pem
# Test SASL authentication
kafka-console-consumer.sh \
--bootstrap-server kafka:9095 \
--consumer.config client.properties \
--topic __consumer_offsets \
--max-messages 1
Validate Policy Files
# Validate policy syntax
kafkaguard validate-policy --policy policies/custom-policy.yaml
# Check YAML syntax
yamllint policies/custom-policy.yaml
# Or use Python
python3 -c "import yaml; yaml.safe_load(open('policies/custom-policy.yaml'))"
Test with Minimal Policy
# minimal-test-policy.yaml
version: "1.0"
name: "Minimal Test Policy"
description: "Minimal policy for testing connectivity"
tier: "test"
controls:
- id: KG-001
title: "Test control"
description: "Always passes"
severity: LOW
category: operational
expr: "true" # Always passes
remediation: "N/A"
compliance:
pci_dss: []
soc2: []
iso27001: []
# Test with minimal policy
kafkaguard scan \
--bootstrap kafka:9092 \
--policy minimal-test-policy.yaml
# If this works, issue is likely in policy file
# If this fails, issue is likely in connectivity or configuration
Use --no-color for CI/CD Log Parsing
# Disable colored output
kafkaguard scan \
--bootstrap kafka:9092 \
--no-color
# Easier to parse in CI/CD logs
Capture Network Traffic (Advanced)
# Capture packets for analysis (requires root)
sudo tcpdump -i any -w kafkaguard-traffic.pcap port 9092
# In another terminal, run kafkaguard scan
kafkaguard scan --bootstrap kafka:9092
# Stop tcpdump (Ctrl+C)
# Analyze with Wireshark
wireshark kafkaguard-traffic.pcap
Support Resources
GitHub Issues
- Bug Reports: Open an issue
- Feature Requests: Request a feature
- Search Existing Issues: Browse issues
When opening an issue, include:
- KafkaGuard version (kafkaguard version)
- Operating system and version
- Full command you ran
- Complete error message
- Debug logs (if applicable)
- Kafka version and configuration
- Steps to reproduce
GitHub Discussions
- Ask Questions: Start a discussion
- Share Ideas: Community discussions
- Get Help: Non-bug help requests
Documentation
- User Guide: User Guide Index
- CLI Reference: CLI Reference
- Configuration Guide: Configuration Guide
- Security Guide: Security & Authentication
- README: Introduction
Community Support
- Contribute: Contributing Guide
- Development: Development Guide
Common Error Patterns Summary
| Error Pattern | Likely Cause | Quick Fix |
|---|---|---|
| connection refused | Broker down or wrong port | Verify broker is running, check port |
| i/o timeout | Network latency or firewall | Increase --timeout, check firewall |
| no such host | DNS failure or typo | Use IP address or fix DNS |
| connection reset | Protocol mismatch | Verify --security-protocol |
| SASL authentication failed | Wrong credentials | Verify username/password, check mechanism |
| certificate verify failed | Missing/wrong CA cert | Provide correct CA cert with --tls-ca-cert |
| tls: handshake failure | TLS version mismatch | Upgrade broker to TLS 1.2+ |
| policy file not found | Wrong path | Use absolute path |
| invalid control ID | Wrong ID format | Use 3-digit format (KG-001) |
| CEL syntax error | Typo in expression | Fix CEL expression, validate policy |
| permission denied (reports) | No write permission | Create directory with write permissions |
| no space left | Disk full | Free up space or use different directory |
Next Steps
If you couldn't resolve your issue:
- Check GitHub Issues - Someone may have encountered the same problem
- Enable Debug Logging - Run with --log-level debug and review logs
- Test Incrementally - Use minimal policy, test connectivity separately
- Open an Issue - Provide all requested information for faster resolution
Document Information
- Last Updated: 2025-11-15
- Applies to Version: KafkaGuard 1.0.0+
- Feedback: Open an issue for improvements