Apache Kafka cluster setup for a web application


Complexity: complex · Estimated timeline: 3–5 business days

Setting up an Apache Kafka cluster for a web application

Kafka is not just a message queue. It's a distributed log with ordering guarantees, replication, and the ability to replay events from any point. For a web application this means: asynchronous event processing, service decoupling, audit logs, event sourcing, real-time analytics.

A single broker suits only development. A production cluster requires at least 3 brokers with replication.

Choosing a mode: KRaft vs ZooKeeper

Since Kafka 3.3, KRaft mode (no ZooKeeper) has been production-ready and is the recommended approach; ZooKeeper mode is deprecated and removed entirely in Kafka 4.0. For new setups — KRaft only.

3-node cluster in KRaft mode:
- kafka-1: controller + broker
- kafka-2: controller + broker
- kafka-3: controller + broker
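Once the cluster is up (installation and startup steps follow below), the state of this controller quorum can be checked from any node; the paths assume the /opt/kafka layout used in this setup:

```shell
# Show KRaft quorum status: current leader, voters, high watermark
/opt/kafka/bin/kafka-metadata-quorum.sh \
    --bootstrap-server kafka-1:9092 describe --status

# Per-node replication details of the metadata log (lag, last fetch)
/opt/kafka/bin/kafka-metadata-quorum.sh \
    --bootstrap-server kafka-1:9092 describe --replication
```

A healthy 3-node quorum shows one leader and the other two node IDs as voters with near-zero lag.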

Installation on Ubuntu 22.04

# Java 17 or 21 is required (Kafka 3.7 supports both)
apt-get update && apt-get install -y openjdk-21-jdk-headless

# Download Kafka (older releases move to archive.apache.org/dist/kafka/)
KAFKA_VERSION=3.7.0
SCALA_VERSION=2.13
wget https://downloads.apache.org/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz
tar -xzf kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz -C /opt/
ln -s /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka

useradd -r -s /bin/false kafka
mkdir -p /var/log/kafka /data/kafka
# chown the real directory, not the symlink, so -R recurses
chown -R kafka:kafka /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} \
    /var/log/kafka /data/kafka

KRaft configuration (on each node)

/opt/kafka/config/kraft/server.properties — different for each node:

# Node 1 (change node.id and advertised.listeners for nodes 2 and 3)
node.id=1
process.roles=broker,controller
controller.quorum.voters=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093

listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://kafka-1.internal:9092
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT

# Storage
log.dirs=/data/kafka
num.recovery.threads.per.data.dir=4

# Performance
num.io.threads=16
num.network.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600

# Replication
default.replication.factor=3
min.insync.replicas=2
num.partitions=6
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

# Retention
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000

# Compression (broker-side; the default "producer" would keep the producer's codec)
compression.type=lz4
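A back-of-envelope disk-sizing check for the 168-hour retention above (the traffic numbers are illustrative assumptions, not from this setup): capacity per broker ≈ write rate × message size × retention × replication factor ÷ broker count.

```shell
# Illustrative inputs — replace with your own measurements
MSGS_PER_SEC=2000        # average produce rate, messages/s
MSG_BYTES=1024           # average message size
RETENTION_SEC=$((168 * 3600))   # log.retention.hours=168
REPLICATION=3            # default.replication.factor
BROKERS=3

# Every message is stored REPLICATION times, spread across BROKERS nodes
TOTAL_BYTES=$((MSGS_PER_SEC * MSG_BYTES * RETENTION_SEC * REPLICATION))
PER_BROKER_GB=$((TOTAL_BYTES / BROKERS / 1024 / 1024 / 1024))
echo "per-broker disk: ~${PER_BROKER_GB} GiB"
# → per-broker disk: ~1153 GiB
```

Leave generous headroom on top of this figure: segment files are deleted only after they fully expire, and compaction/rebalancing needs spare space.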

Storage initialization (one time):

# Generate cluster UUID (once; the SAME value is used on all nodes)
CLUSTER_UUID=$(/opt/kafka/bin/kafka-storage.sh random-uuid)

# Format storage on each node
/opt/kafka/bin/kafka-storage.sh format \
    -t "$CLUSTER_UUID" \
    -c /opt/kafka/config/kraft/server.properties

# Verify: /data/kafka/meta.properties now contains cluster.id and node.id

Systemd unit

[Unit]
Description=Apache Kafka (KRaft)
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment="KAFKA_HEAP_OPTS=-Xmx4g -Xms4g"
Environment="KAFKA_JVM_PERFORMANCE_OPTS=-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true"
# Expose JMX locally for the Prometheus JMX Exporter (monitoring section)
Environment="JMX_PORT=9999"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Setting up TLS between brokers and clients

Without TLS, all traffic travels in plaintext. At a minimum, enable TLS for external clients. The password changeit below is a placeholder — use real secrets in production.

# 1. Create a CA (once, shared by all brokers)
openssl req -new -x509 -keyout ca.key -out ca.crt -days 365 \
    -subj "/CN=Kafka-CA" -nodes

# 2. Broker keystore with its key pair (repeat per broker)
keytool -keystore kafka-1.keystore.jks -alias kafka-1 \
    -keyalg RSA -validity 365 \
    -genkey -storepass changeit \
    -dname "CN=kafka-1.internal, OU=Kafka, O=Company, L=City, ST=State, C=US"

# 3. Create a CSR and sign it with the CA
keytool -keystore kafka-1.keystore.jks -alias kafka-1 \
    -certreq -file kafka-1.csr -storepass changeit
openssl x509 -req -CA ca.crt -CAkey ca.key \
    -in kafka-1.csr -out kafka-1-signed.crt \
    -days 365 -CAcreateserial

# 4. Import the CA cert, then the signed cert, back into the keystore
keytool -keystore kafka-1.keystore.jks -alias CARoot \
    -import -file ca.crt -storepass changeit -noprompt
keytool -keystore kafka-1.keystore.jks -alias kafka-1 \
    -import -file kafka-1-signed.crt -storepass changeit -noprompt

# 5. Truststore with the CA cert (same file for all brokers and clients)
keytool -keystore kafka.truststore.jks -alias CARoot \
    -import -file ca.crt -storepass changeit -noprompt

Add to server.properties:

listeners=PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9094,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://kafka-1.internal:9092,SSL://kafka-1.internal:9094
# The SSL listener must also be added to the listener map
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,CONTROLLER:PLAINTEXT
ssl.keystore.location=/etc/kafka/ssl/kafka-1.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/etc/kafka/ssl/kafka.truststore.jks
ssl.truststore.password=changeit
ssl.client.auth=required
ssl.enabled.protocols=TLSv1.3,TLSv1.2
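A client connecting to the SSL listener on 9094 needs the shared truststore and — because ssl.client.auth=required — its own keystore. A minimal client.properties sketch, reusing the demo password from above (client.keystore.jks is an assumed client-side keystore built the same way as the broker ones):

```shell
cat > client.properties <<'EOF'
security.protocol=SSL
ssl.truststore.location=/etc/kafka/ssl/kafka.truststore.jks
ssl.truststore.password=changeit
ssl.keystore.location=/etc/kafka/ssl/client.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
EOF

# Smoke test over TLS: list topics through the 9094 listener
/opt/kafka/bin/kafka-topics.sh \
    --bootstrap-server kafka-1.internal:9094 \
    --command-config client.properties --list
```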

Monitoring — JMX + Prometheus

# kafka-jmx-exporter.yml — configuration for the Prometheus JMX Exporter
# (standalone mode: the exporter connects to the broker's JMX port,
# so the broker must expose JMX, e.g. JMX_PORT=9999 in its environment)
startDelaySeconds: 0
hostPort: 127.0.0.1:9999
lowercaseOutputName: true
rules:
  - pattern: kafka.server<type=BrokerTopicMetrics, name=MessagesInPerSec><>OneMinuteRate
    name: kafka_server_broker_topic_messages_in_per_sec
  - pattern: kafka.server<type=ReplicaManager, name=UnderReplicatedPartitions><>Value
    name: kafka_server_under_replicated_partitions
  - pattern: kafka.controller<type=KafkaController, name=ActiveControllerCount><>Value
    name: kafka_controller_active_count
  - pattern: kafka.network<type=RequestMetrics, name=TotalTimeMs, request=Produce><>99thPercentile
    name: kafka_network_produce_total_time_ms_p99

Key metrics for alerting:

  • kafka_server_under_replicated_partitions > 0 — partitions are missing in-sync replicas (a broker is down or lagging)
  • kafka_controller_active_count != 1 — controller election problem (must be exactly 1 across the cluster)
  • consumer lag above threshold — consumers are falling behind producers
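Consumer lag is not part of the broker JMX metrics above; the quickest way to check it is the consumer-groups CLI (the group name orders-service is illustrative):

```shell
# LAG column = messages behind the log end offset, per partition
/opt/kafka/bin/kafka-consumer-groups.sh \
    --bootstrap-server kafka-1:9092 \
    --describe --group orders-service

# Or inspect every group at once
/opt/kafka/bin/kafka-consumer-groups.sh \
    --bootstrap-server kafka-1:9092 \
    --describe --all-groups
```

For continuous lag alerting in Prometheus, a dedicated lag exporter is typically added on top of this.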

Initial testing

# Create a test topic
/opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka-1:9092 \
    --create --topic test-topic \
    --partitions 6 --replication-factor 3

# Check replication (Leader/Replicas/Isr per partition)
/opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka-1:9092 \
    --describe --topic test-topic

# Producer performance test (1M records of 1 KiB, unthrottled)
/opt/kafka/bin/kafka-producer-perf-test.sh \
    --topic test-topic \
    --num-records 1000000 \
    --record-size 1024 \
    --throughput -1 \
    --producer-props bootstrap.servers=kafka-1:9092,kafka-2:9092,kafka-3:9092 \
        acks=all compression.type=lz4

# Consumer performance test
/opt/kafka/bin/kafka-consumer-perf-test.sh \
    --bootstrap-server kafka-1:9092 \
    --topic test-topic \
    --messages 1000000 \
    --group perf-test-group

Project timeline

Day 1 — infrastructure preparation: 3 VMs/servers with separate disks for Kafka data (not system partition), DNS setup, open ports 9092/9093/9094 between nodes.

Day 2 — Java installation, Kafka setup, cluster UUID generation, storage formatting, systemd configuration, cluster startup.

Day 3 — TLS setup, creation of production topics with correct partition/replication factors, performance testing.

Day 4 — monitoring integration (JMX Exporter + Prometheus + Grafana), alert configuration for under-replicated partitions and consumer lag.

Day 5 — failure scenario testing: shut down a broker, verify the cluster keeps serving traffic, bring it back, and confirm full recovery.
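The Day 5 failure drill can be sketched as follows (run it against a non-production topic; host names match the cluster above):

```shell
# 1. Stop one broker
ssh kafka-3 systemctl stop kafka

# 2. Its partitions fail over; this listing should now be non-empty
/opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka-1:9092 \
    --describe --under-replicated-partitions

# 3. Producing with acks=all must still succeed: 2 of 3 replicas
#    are alive, satisfying min.insync.replicas=2
echo "failover-test" | /opt/kafka/bin/kafka-console-producer.sh \
    --bootstrap-server kafka-1:9092,kafka-2:9092 \
    --topic test-topic --producer-property acks=all

# 4. Bring the broker back and watch under-replicated partitions drain to zero
ssh kafka-3 systemctl start kafka
```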

Additionally: Kafka Schema Registry and Kafka Connect setup adds 2–3 days each.