
Kafka 101: Deploying Kafka to Google Compute Engine

This article provides a startup script for deploying Kafka to a Google Compute Engine instance. This isn't meant to be a production-ready system: it uses the Zookeeper instance embedded with Kafka and keeps most of the default settings. Instead, treat this as a quick and easy way to do Kafka development against a live server.
This article uses Compute Engine startup scripts to install and run Kafka on instance startup. Startup scripts allow you to run arbitrary Bash commands whenever an instance is created or restarted. Since this script runs on every restart, we lead with a check that makes sure the startup script has not already run and, if it has, we simply exit.
#!/usr/bin/env bash
STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION
if [[ -f $STARTUP_MARK ]]; then
  exit 0
fi
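The run-once marker pattern above can be exercised on its own; here is a small standalone demo (the `run_setup` function is illustrative) that points the marker at a temporary path instead of /var so it runs without root:

```shell
#!/usr/bin/env bash
# Demo of the run-once marker pattern: the first call does the work and
# creates the marker file; later calls see the marker and skip the work.
STARTUP_MARK=$(mktemp -u)   # a path that does not exist yet

run_setup() {
  if [[ -f $STARTUP_MARK ]]; then
    echo "already done"
    return 0
  fi
  echo "running setup"
  touch "$STARTUP_MARK"
}

FIRST=$(run_setup)    # first boot: performs the setup
SECOND=$(run_setup)   # subsequent boots: skips it
rm -f "$STARTUP_MARK"
```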
Then we configure our Kafka and Scala version numbers used in the rest of the script.
SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"
Next, we install the prerequisites needed to run Kafka: supervisor and Java.
sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre
Now we are ready to download and run Kafka. We use our version variables defined earlier and extract Kafka to $KAFKA_HOME.
wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz

tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
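The mirror download can fail silently (an HTML error page, a truncated file), which would make the `tar` step blow up later. Here is a hedged sketch of a sanity check you could add between the download and the extraction; the demo below builds a tiny throwaway archive so it can run anywhere:

```shell
#!/usr/bin/env bash
# Verify that a downloaded .tgz is a readable gzip tarball before extracting.
# Demonstrated on a small throwaway archive rather than the real download.
TMPDIR_DEMO=$(mktemp -d)
echo "demo" > "$TMPDIR_DEMO/file"
ARCHIVE="$TMPDIR_DEMO/demo.tgz"
tar czf "$ARCHIVE" -C "$TMPDIR_DEMO" file

# tar tzf lists the archive contents; it fails for missing/corrupt files
if tar tzf "$ARCHIVE" > /dev/null 2>&1; then
  STATUS=ok
else
  STATUS=corrupt
fi
rm -rf "$TMPDIR_DEMO"
```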
We use supervisor to run both Zookeeper and Kafka. Supervisor takes care of keeping the processes alive and restarting them on any failure, including system restart. Supervisor requires a configuration file for each service it monitors, so we create one for Zookeeper and one for Kafka.
cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF
cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF
sudo supervisorctl reread
sudo supervisorctl update
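At this point supervisor should have picked up both programs; a quick way to confirm (this assumes the instance from this article, and the output is illustrative):

```shell
# Show the state of everything supervisor manages; both the zookeeper and
# kafka entries should report RUNNING once they have started.
sudo supervisorctl status
```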
Finally, we create a test topic we can use for development.
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
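With the broker up, a quick smoke test is to push a message through the test topic and read it back with the console tools that ship with Kafka 0.8.x. This assumes the broker is listening on the default port 9092; run the consumer in a second terminal:

```shell
# Produce a single message to the test topic
echo "hello, kafka" | $KAFKA_HOME/bin/kafka-console-producer.sh \
  --broker-list localhost:9092 --topic test

# In another terminal: consume everything on the topic from the beginning
$KAFKA_HOME/bin/kafka-console-consumer.sh \
  --zookeeper localhost:2181 --topic test --from-beginning
```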
Our Kafka instance is now ready to use! One quirk of Compute Engine is that each instance has both an internal and an external IP. Any VM running on Compute Engine can reach the internal IP, as long as it is on the same network and the appropriate firewall rules have been created. Because of the way Kafka advertises the hostname used to connect to a broker, the configuration above assumes that you will be accessing Kafka via the internal IP. Therefore, any Compute Engine instance on the same network will be able to communicate with this Kafka instance. If you need to connect to the instance externally (e.g. from your laptop), you will need to modify your /etc/hosts file to translate the instance's internal hostname to the external IP address of your Compute Engine instance. Something like the following should do the trick; adjust the values for your particular Compute Engine instance.
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
104.154.46.158 kafka-0-8-2-1.c.myproject.internal kafka-0-8-2-1
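You can look up the external IP for the /etc/hosts entry with gcloud; a sketch, assuming a reasonably recent gcloud and the instance name used in this article:

```shell
# Print the external (NAT) IP of the instance's first network interface
gcloud compute instances describe kafka-0-8-2-1 \
  --format='value(networkInterfaces[0].accessConfigs[0].natIP)'
```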

Putting it all together

Below is the startup script in its entirety. By passing this script to the instance at creation time, Compute Engine will install and run Kafka with Zookeeper.
gcloud compute instances create kafka-0-8-2-1 \
  --image debian-7-backports \
  --metadata-from-file startup-script=kafka_startup_script.sh
#!/usr/bin/env bash
STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION
# Exit if this script has already run
if [[ -f $STARTUP_MARK ]]; then
  exit 0
fi
set -o nounset
set -o pipefail
set -o errexit
SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"
# Install prerequisites
sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre
# Download Kafka
wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
# Configure Supervisor
cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF
cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF
# Run
sudo supervisorctl reread
sudo supervisorctl update
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
touch $STARTUP_MARK
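If external clients also need to reach the broker port directly, a firewall rule must allow it. A hedged sketch (the rule name and the open-to-the-world source range are assumptions; tighten the range for real use):

```shell
# Allow inbound TCP 9092 (Kafka's default port) on the default network
gcloud compute firewall-rules create allow-kafka \
  --allow tcp:9092 \
  --source-ranges 0.0.0.0/0
```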
