Common issues
This guide covers common Operion deployment and runtime issues and how to resolve them.
Service Startup Issues
Service Won't Start
Symptoms:
- Service exits immediately after startup
- "Address already in use" errors
- Permission denied errors
Diagnosis:
# Check whether the API port is already in use (default PORT: 9091)
lsof -i :9091
netstat -tulpn | grep :9091
# Check service logs (run in the foreground, or follow the systemd journal)
./bin/operion-api
# or
journalctl -u operion-api -f
# Check permissions
ls -la bin/
ls -la data/
Solutions:
Port already in use:
# Kill the process using the port
sudo kill $(lsof -t -i:9091)
# Or change the port
export PORT=9092
./bin/operion-api
Permission errors:
# Fix file permissions
sudo chown -R $USER:$USER /path/to/operion
chmod +x bin/*
# Fix directory permissions - data dir should match DATABASE_URL path
chmod 755 data/
chmod 755 plugins/ # Should match PLUGINS_PATH
Missing environment variables:
# Set required variables
export DATABASE_URL="postgresql://user:password@localhost:5432/operion"
export EVENT_BUS_TYPE="kafka"
export PLUGINS_PATH="./plugins"
# Or create a .env file (this example uses file-based storage)
cat > .env << EOF
DATABASE_URL=file://./data/workflows.json
EVENT_BUS_TYPE=kafka
PLUGINS_PATH=./plugins
LOG_LEVEL=info
EOF
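Before starting a service, it can help to confirm that these variables are actually set in the current shell. The following is a minimal sketch, not part of Operion: `check_required_env` is a hypothetical helper name, and the list of variables passed to it is assumed from the examples above.

```shell
# check_required_env NAME... : report which of the named variables are
# unset or empty. Returns 0 when all are set, 1 otherwise.
# (Hypothetical helper for illustration; not shipped with Operion.)
check_required_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all required variables are set"
}
```

A typical call would be `check_required_env DATABASE_URL EVENT_BUS_TYPE PLUGINS_PATH && ./bin/operion-api`.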
Build Issues
Go build failures:
# Update Go modules
go mod tidy
go mod download
# Clear module cache
go clean -modcache
# Rebuild
make clean
make build
Node.js build failures:
# Clear npm cache
npm cache clean --force
cd ui/operion-editor
rm -rf node_modules package-lock.json
npm install
# Update dependencies
npm update
Database Connection Issues
PostgreSQL Connection Failures
Symptoms:
- "connection refused" errors
- "authentication failed" errors
- Timeout errors
Diagnosis:
# Test database connectivity
pg_isready -h localhost -p 5432 -U operion
# Test connection with psql
psql -h localhost -p 5432 -U operion -d operion
# Check database logs
sudo journalctl -u postgresql -f
Solutions:
Connection refused:
# Start PostgreSQL service
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Check PostgreSQL status
sudo systemctl status postgresql
Authentication failed:
-- Reset user password
ALTER USER operion WITH PASSWORD 'newpassword';
-- Check user permissions
\du operion
GRANT ALL PRIVILEGES ON DATABASE operion TO operion;
Connection string issues:
# Verify connection string format
DATABASE_URL=postgresql://operion:password@localhost:5432/operion?sslmode=require
# Test with different SSL modes
DATABASE_URL=postgresql://operion:password@localhost:5432/operion?sslmode=disable
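To make sure the connectivity checks target the same server as the connection string, the host and port can be extracted from `DATABASE_URL` rather than typed by hand. This is a rough sketch: `db_url_host` and `db_url_port` are hypothetical helper names, and the parsing assumes a simple `scheme://user:password@host:port/db` layout with no `@` or `/` inside the password.

```shell
# Print the host part of a postgresql:// style URL.
# (Hypothetical helper; assumes no '@' or '/' inside the credentials.)
db_url_host() {
  printf '%s\n' "$1" | sed -E 's#^[a-z]+://([^@/]*@)?([^:/?]+).*#\2#'
}

# Print the port, falling back to PostgreSQL's default 5432 when the
# URL does not specify one.
db_url_port() {
  port=$(printf '%s\n' "$1" |
    sed -nE 's#^[a-z]+://([^@/]*@)?[^:/?]+:([0-9]+)([/?].*)?$#\2#p')
  echo "${port:-5432}"
}
```

With these defined, the earlier check becomes `pg_isready -h "$(db_url_host "$DATABASE_URL")" -p "$(db_url_port "$DATABASE_URL")"`.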
File Storage Issues
Permission errors:
# Create required directories
mkdir -p data/workflows
mkdir -p plugins
# Fix permissions
chmod 755 data/
chmod 644 data/workflows/*
Disk space issues:
# Check disk usage
df -h
du -sh data/
# Preview workflows older than 30 days, then delete them (only if safe)
find data/workflows -name "*.json" -mtime +30 -print
find data/workflows -name "*.json" -mtime +30 -delete
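The disk check can also be scripted so that it fails loudly before the data directory fills up. The sketch below assumes the portable `df -P` output format, where the fifth column of the first data row is the capacity percentage. `usage_pct` and `disk_ok` are hypothetical helper names.

```shell
# usage_pct: read `df -P` output on stdin and print the Capacity column
# of the first filesystem row as a bare number (for example 42).
usage_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_ok THRESHOLD: succeed if the usage read from stdin is below
# THRESHOLD percent, fail otherwise (or if no usage could be parsed).
disk_ok() {
  pct=$(usage_pct)
  [ -n "$pct" ] && [ "$pct" -lt "$1" ]
}
```

For example, `df -P data/ | disk_ok 90 || echo "data/ is at or above 90% usage"`.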
Event Bus Issues
Kafka Connection Problems
Symptoms:
- Workers not receiving events
- "broker not available" errors
- High event processing latency
Diagnosis:
# Check Kafka broker status
kafka-broker-api-versions.sh --bootstrap-server localhost:9092
# List topics
kafka-topics.sh --list --bootstrap-server localhost:9092
# Check consumer groups
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
# Monitor topic
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic workflow-events
Solutions:
Broker connection issues:
# Check Kafka service
sudo systemctl status kafka
sudo systemctl start kafka
# Verify broker configuration
grep -E '^(advertised\.)?listeners' /opt/kafka/config/server.properties
Topic doesn't exist:
# Create topic
kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--topic workflow-events \
--partitions 3 \
--replication-factor 1
Consumer group issues:
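# Reset consumer group offsets (stop all workers first: the group must be inactive;
# --to-latest skips any events that have not been processed yet)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group operion-workers --reset-offsets --to-latest --topic workflow-events --execute
# Delete the consumer group (also requires that no members are active)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group operion-workers --delete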
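Before resetting or deleting the group, it is usually worth checking how far behind the workers actually are. The sketch below totals the `LAG` column from `kafka-consumer-groups.sh --describe` output. `sum_consumer_lag` is a hypothetical helper, and it assumes the output has a header row containing a `LAG` column, which may differ between Kafka versions.

```shell
# sum_consumer_lag: read `kafka-consumer-groups.sh --describe` output on
# stdin and print the total lag across all partitions.
# The LAG column is located by its header name; rows whose lag is "-"
# (no committed offset) are ignored.
sum_consumer_lag() {
  awk '
    lagcol == 0 {
      for (i = 1; i <= NF; i++) if ($i == "LAG") lagcol = i
      next
    }
    $lagcol ~ /^[0-9]+$/ { total += $lagcol }
    END { print total + 0 }
  '
}
```

A possible invocation: `kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group operion-workers | sum_consumer_lag`. A total that keeps growing suggests workers are connected but too slow, whereas a missing group suggests they never connected.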
Workflow Execution Issues
Workflows Not Executing
Symptoms:
- Workflows created but never run
- Workers not processing events
- Stuck in pending state
Diagnosis:
# Check worker logs
./bin/operion-worker
journalctl -u operion-worker -f
# Check dispatcher logs
./bin/operion-dispatcher
journalctl -u operion-dispatcher -f
# Test workflow manually
curl -X POST http://localhost:9091/workflows \
-H "Content-Type: application/json" \
-d @examples/data/workflows/simple-http-request.json
Solutions:
Workers not running:
# Start worker service
./bin/operion-worker &
# Check worker registration
curl http://localhost:9091/health
Event bus not connected:
# Check event bus configuration
export EVENT_BUS_TYPE=kafka
export KAFKA_BROKERS=localhost:9092
Trigger issues:
# Check trigger service (uses WEBHOOK_PORT, default 8085)
./bin/operion-dispatcher &
# Verify trigger configuration
curl http://localhost:9091/registry/triggers
Workflow Timeout Issues
Symptoms:
- Workflows timing out prematurely
- Long-running workflows killed
Solutions:
# Note: Timeout configuration may vary by implementation
# Check your specific Operion version for timeout environment variables
# Configure per workflow (recommended approach)
{
"timeout": "300s",
"steps": [...]
}
Action Execution Failures
HTTP Request Action Issues:
# Check network connectivity
curl -v http://target-api.com/endpoint
# Check DNS resolution
nslookup target-api.com
# Test with different timeouts
{
"action": "http_request",
"config": {
"url": "https://api.example.com",
"timeout": "30s",
"retries": {"attempts": 3, "delay": 1000}
}
}
Transform Action Issues:
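# Test a Go template expression (go run needs a source file, so write one first)
cat > /tmp/tmplcheck.go << 'EOF'
package main

import (
	"os"
	"text/template"
)

func main() {
	// Parse and render the same expression used in the transform action
	tmpl := template.Must(template.New("test").Parse("{{.name}}\n"))
	data := map[string]any{"name": "test"}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
EOF
go run /tmp/tmplcheck.go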
# Validate input data
{
"action": "transform",
"config": {
"expression": "{{.name}}",
"input": "steps.previous_step.result"
}
}
Template Context Issues:
Symptoms:
- Template fields not found errors
- Variables not accessible in templates
- Environment variables not available
Common template context structure:
{
"steps": {}, // Results from previous workflow steps
"vars": {}, // Workflow variables (note: 'vars', not 'variables')
"trigger": {}, // Data from the trigger that started the workflow
"metadata": {}, // Workflow execution metadata
"env": {}, // Environment variables
"execution": { // Execution context
"id": "",
"workflow_id": ""
}
}
Solutions:
# Common template access patterns:
# ✅ Correct: Access workflow variables
"{{.vars.api_key}}"
"{{.vars.config.timeout}}"
# ✅ Correct: Access step results
"{{.steps.fetch_user.body.name}}"
"{{.steps.previous_step.result}}"
# ✅ Correct: Access trigger data
"{{.trigger.kafka.message}}"
"{{.trigger.schedule.timestamp}}"
# ✅ Correct: Access environment variables
"{{.env.DATABASE_URL}}"
"{{.env.API_KEY}}"
# ❌ Wrong: these forms will not work
"{{.variables.api_key}}" # Wrong: use .vars instead of .variables
"{{vars.api_key}}" # Wrong: missing dot prefix, use .vars
# Debug template context - add temporary debug step:
{
"action": "log",
"config": {
"message": "Context: steps={{.steps}}, vars={{.vars}}, trigger={{.trigger}}"
}
}
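The two mistakes called out above (using `.variables` instead of `.vars`, and omitting the leading dot) are easy to scan for in a workflow file. The following is a rough sketch only: `lint_templates` is a hypothetical helper, and the list of top-level context names is taken from the context structure shown earlier.

```shell
# lint_templates: read a workflow definition on stdin and print, with
# line numbers, template expressions that use .variables or that omit
# the leading dot before a known context name.
# Returns 1 when something suspicious is found, 0 otherwise.
lint_templates() {
  if grep -nE '[{][{]-? *(\.variables([^A-Za-z0-9_]|$)|(vars|steps|trigger|metadata|env|execution)\.)'; then
    return 1
  fi
  return 0
}
```

For example, `lint_templates < examples/data/workflows/simple-http-request.json` prints nothing and succeeds when no suspicious expressions are found.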
Plugin Issues
Internal Plugin Loading Failures
Symptoms:
- Built-in actions not found (http_request, transform, log)
- Plugin registry errors on startup
Diagnosis:
# Check if internal plugins are built
ls -la plugins/
# Check API registry
curl http://localhost:9091/registry/actions
Solutions:
Missing internal plugins:
# Rebuild entire project including internal plugins
make clean
make build
# Verify plugins directory exists
mkdir -p plugins
Custom Plugin Issues
Note: This section applies to user-developed custom plugins only.
Symptoms:
- Custom plugin not loading
- Symbol lookup errors for user plugins
- Custom plugin compatibility issues
Diagnosis:
# Check custom plugin files
ls -la plugins/custom_*.so
# Test custom plugin loading (example)
ldd plugins/custom_action.so
# Check custom plugin symbols
nm -D plugins/custom_action.so | grep Factory
Solutions:
Custom plugin development:
# For custom plugins, follow your plugin build process
# Example for Go-based custom plugins:
go build -buildmode=plugin -o plugins/custom_action.so ./custom_plugins/
# Ensure custom plugin implements required interface
# Check custom plugin source code for required exports
# Rebuild custom plugins with same Go version as Operion
go version # Should match Operion build version
Performance Issues
High Memory Usage
Diagnosis:
# Monitor memory usage
top -p "$(pgrep -d, -f operion)" # -d, joins multiple PIDs with commas, as top expects
ps aux | grep operion
# Monitor memory usage over time
watch -n 5 'ps aux | grep operion | awk "{print \$4, \$6, \$11}"'
# Check system memory usage
free -h
vmstat 1 5
Solutions:
# Note: Performance tuning variables may vary by version
# Consult your Operion version documentation for specific variables
# Use multiple workers instead of single worker scaling
./bin/operion-worker --worker-id worker-01 &
./bin/operion-worker --worker-id worker-02 &
# Enable garbage collection tuning (Go runtime)
export GOGC=50
High CPU Usage
Diagnosis:
# Monitor CPU usage
htop
iostat 1
# Monitor CPU usage over time
top -p "$(pgrep -d, -f operion)" -d 5
# Check load average and process stats
uptime
ps aux | grep operion | awk '{print $3, $4, $11}'
Solutions:
# Note: Polling intervals may vary by version
# Check your specific Operion documentation for tuning parameters
# Optimize database queries
# Add indexes to frequently queried columns in DATABASE_URL database
# Scale horizontally - run multiple worker instances
./bin/operion-worker --worker-id worker-01 &
./bin/operion-worker --worker-id worker-02 &
./bin/operion-worker --worker-id worker-03 &
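When the number of workers changes often, generating the IDs in a loop avoids copy-paste mistakes. This is a small sketch assuming the `--worker-id` flag shown above. `worker_ids` is a hypothetical helper, and the `worker-NN` naming simply mirrors the examples.

```shell
# worker_ids COUNT: print COUNT worker IDs, one per line, in the
# worker-01, worker-02, ... style used in this guide.
worker_ids() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'worker-%02d\n' "$i"
    i=$((i + 1))
  done
}

# Example (left commented out so that sourcing this file starts nothing):
# for id in $(worker_ids 3); do
#   ./bin/operion-worker --worker-id "$id" &
# done
```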
Slow Workflow Execution
Diagnosis:
# Enable debug logging
export LOG_LEVEL=debug
./bin/operion-worker
# Check step execution times
# Review workflow logs for bottlenecks
# Test individual actions via API server (PORT default: 9091)
curl -X POST http://localhost:9091/workflows/test-action \
-H "Content-Type: application/json" \
-d '{"action": "http_request", "config": {...}}'
Solutions:
# Optimize HTTP requests
{
"action": "http_request",
"config": {
"timeout": "10s",
"retries": {"attempts": 2, "delay": 500}
}
}
# Optimize workflow structure - keep steps lightweight
{
"steps": [
{"id": "step1", "action": "http_request", "config": {"timeout": "5s"}},
{"id": "step2", "action": "transform", "config": {"expression": "{{.data}}"}},
{"id": "step3", "action": "log", "config": {"message": "Workflow complete"}}
]
}
# Note: Performance tuning configuration may vary by version
# Check your Operion documentation for specific optimization options
Network and Connectivity Issues
API Not Accessible
Symptoms:
- Connection refused errors
- Timeout errors
- 404 not found errors
Diagnosis:
# Check API service status (default PORT: 9091)
curl -v http://localhost:9091/health
# Check dispatcher status (default WEBHOOK_PORT: 8085)
curl -v http://localhost:8085/health
# Check port binding
netstat -tulpn | grep :9091 # API server
netstat -tulpn | grep :8085 # Dispatcher webhooks
# Check firewall
sudo ufw status
sudo iptables -L
Solutions:
# Check service binding (use proper environment variables)
export PORT=9091 # API server port
export WEBHOOK_PORT=8085 # Dispatcher port
# Configure firewall for both services
sudo ufw allow 9091 # API server
sudo ufw allow 8085 # Dispatcher webhooks
sudo iptables -A INPUT -p tcp --dport 9091 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8085 -j ACCEPT
# Test with different network interfaces
curl http://127.0.0.1:9091/health
curl http://127.0.0.1:8085/health
curl http://$(hostname -I | awk '{print $1}'):9091/health
Monitoring and Debugging
Enable Debug Mode
# Enable debug logging
export LOG_LEVEL=debug
# Start services with debug logging
./bin/operion-api
./bin/operion-worker
./bin/operion-dispatcher
Log Analysis
# Follow logs in real-time (services log to stdout/stderr)
./bin/operion-api 2>&1 | tee api.log &
./bin/operion-worker 2>&1 | tee worker.log &
./bin/operion-dispatcher 2>&1 | tee dispatcher.log &
# Or use journalctl if running as systemd services
journalctl -u operion-api -f
journalctl -u operion-worker -f
journalctl -u operion-dispatcher -f
# Search for errors in current logs
grep -i error api.log worker.log dispatcher.log
# Analyze log patterns
awk '/ERROR/ {print $1, $2, $NF}' api.log | sort | uniq -c
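For a quick first impression of a log, counting lines by severity is often enough. The sketch below is format-agnostic and deliberately crude: it assumes that error and warning lines contain the words `error` or `warn` somewhere (in any case), which may not match your log format exactly. `count_levels` is a hypothetical helper name.

```shell
# count_levels: read log lines on stdin and print how many mention
# ERROR and how many mention WARN (case-insensitive substring match).
count_levels() {
  awk '
    { line = toupper($0) }
    index(line, "ERROR") { errors++ }
    index(line, "WARN")  { warns++ }
    END { printf "ERROR %d\nWARN %d\n", errors, warns }
  '
}
```

It works on saved files (`count_levels < api.log`) and on the journal (`journalctl -u operion-api --lines=500 | count_levels`).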
Health Monitoring
# Check all service health
curl http://localhost:9091/health # API server
curl http://localhost:8085/health # Dispatcher
# Monitor resource usage
watch -n 1 'ps aux | grep operion'
watch -n 1 'free -m'
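Right after a restart, a single health request may fail simply because the service is still starting. A generic retry wrapper makes the health check more useful in scripts. This is a minimal sketch, and `retry` is a hypothetical helper, not an Operion command.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND... : run COMMAND until it
# succeeds or ATTEMPTS runs have failed.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while true; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 10 2 curl -fsS http://localhost:9091/health` polls the API server for up to roughly 20 seconds. The `-f` flag makes curl exit non-zero on HTTP error responses.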
Getting Help
⚠️ Security Warning
IMPORTANT: Before sharing diagnostic information, always review and sanitize sensitive data:
- Remove passwords from DATABASE_URL and configuration files
- Redact API keys and authentication tokens from environment variables
- Mask IP addresses and hostnames if they contain sensitive information
- Remove personal data from logs and workflow definitions
- Check for secrets in environment variables and configuration files
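Passwords embedded in connection URLs are the easiest item on this list to redact automatically. The filter below is a sketch under stated assumptions: `redact_secrets` is a hypothetical helper, and it only masks the password part of `scheme://user:password@host` URLs. API keys, tokens, and anything else in the list above still need a manual review.

```shell
# redact_secrets: read text on stdin and replace the password in any
# scheme://user:password@host URL with ***. Other content is unchanged.
redact_secrets() {
  sed -E 's#(://[^:/@]+):[^@/]*@#\1:***@#g'
}
```

For example, `env | grep DATABASE_URL | redact_secrets` prints the variable with the password masked.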
Collecting Diagnostic Information
When reporting issues, include the following sanitized information:
# System information
uname -a
go version
docker version # if using Docker
# Service status
systemctl status 'operion-*'
ps aux | grep operion
# Configuration (SANITIZE BEFORE SHARING!)
env | grep -E "(DATABASE_URL|EVENT_BUS_TYPE|PLUGINS_PATH|LOG_LEVEL|PORT|WEBHOOK_PORT|WORKER_ID|DISPATCHER_ID)"
# IMPORTANT: Remove passwords from DATABASE_URL before sharing
# Logs (last 100 lines) - REVIEW FOR SENSITIVE DATA
tail -n 100 /var/log/operion/api.log # only if you redirect logs to this file
journalctl -u operion-api --lines=100
# Resource usage
free -m
df -h
Log Collection Script
⚠️ WARNING: This script collects sensitive information. Review and sanitize before sharing.
#!/bin/bash
# collect-logs.sh
echo "⚠️ WARNING: This script collects potentially sensitive information!"
echo "Review and sanitize all files before sharing with support."
echo "Collecting Operion diagnostic information..."
mkdir -p operion-diagnostics
cd operion-diagnostics || exit 1
# System info
uname -a > system-info.txt
go version >> system-info.txt 2>/dev/null
docker version >> system-info.txt 2>/dev/null
# Service status
systemctl status 'operion-*' > service-status.txt 2>/dev/null
ps aux | grep operion > processes.txt
# Configuration (CONTAINS SENSITIVE DATA - REVIEW BEFORE SHARING!)
env | grep -E "(DATABASE_URL|EVENT_BUS_TYPE|PLUGINS_PATH|LOG_LEVEL|PORT|WEBHOOK_PORT|WORKER_ID|DISPATCHER_ID)" > environment.txt
echo "# ⚠️ REVIEW THIS FILE FOR PASSWORDS AND SECRETS BEFORE SHARING!" >> environment.txt
# Logs
journalctl -u operion-api --lines=500 > api.log 2>/dev/null
journalctl -u operion-worker --lines=500 > worker.log 2>/dev/null
journalctl -u operion-dispatcher --lines=500 > dispatcher.log 2>/dev/null
# Resource usage
free -m > memory.txt
df -h > disk.txt
lsof -i > network.txt 2>/dev/null
cd ..
tar -czf operion-diagnostics-$(date +%Y%m%d-%H%M%S).tar.gz operion-diagnostics/
echo ""
echo "✅ Diagnostic information collected in operion-diagnostics-*.tar.gz"
echo ""
echo "⚠️ IMPORTANT SECURITY REMINDERS:"
echo " 1. Review environment.txt for passwords and secrets"
echo " 2. Check all log files for sensitive information"
echo " 3. Remove or redact any personal/confidential data"
echo " 4. Only share sanitized information with support"
For additional help:
- GitHub Issues - Report bugs and request features
- Documentation - Full documentation
- Community Forum - Community support