Kafka 101: Deploying Kafka to Google Compute Engine

This article provides a startup script for deploying Kafka to a Google Compute Engine instance. This isn’t meant to be a production-ready system: it uses the Zookeeper instance bundled with Kafka and keeps most of the default settings. Instead, treat this as a quick and easy way to do Kafka development against a live server.
This article uses Compute Engine startup scripts to install and run Kafka on instance startup. Startup scripts allow you to run arbitrary Bash commands whenever an instance is created or restarted. Since the script runs on every restart, we lead with a check for a marker file that tells us whether the startup script has already run and, if it has, we simply exit.
#!/usr/bin/env bash
STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION
if [[ -f $STARTUP_MARK ]]; then
  exit 0
fi
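The check only helps if the marker file is written once provisioning has finished, which the full script at the end of this article does with touch as its last step. A minimal sketch of the whole pattern follows; the run_once helper and its arguments are hypothetical names used for illustration and are not part of the original script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command only when the marker file is absent,
# and create the marker only after the command has succeeded.
run_once() {
  local marker=$1
  shift
  if [ -f "$marker" ]; then
    return 0              # already provisioned on an earlier boot
  fi
  "$@" || return 1        # on failure, leave no marker so the next boot retries
  touch "$marker"
}
```

Bumping STARTUP_VERSION changes the marker path, so an updated script runs again on the next restart even on an instance that was provisioned before.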
Then we configure our Kafka and Scala version numbers used in the rest of the script.
SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"
Next, we install the prerequisites needed to run Kafka, namely supervisor and Java.
sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre
Now we are ready to download and extract Kafka. We use the version variables defined earlier and unpack the archive into /opt, which creates the $KAFKA_HOME directory.
wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz

tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
We use supervisor to run both Zookeeper and Kafka. Supervisor takes care of keeping the processes alive and restarting on any failures, including system restart. Supervisor requires a configuration file for the services it monitors so we create one for Zookeeper and one for Kafka.
cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF
cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF
sudo supervisorctl reread
sudo supervisorctl update
Finally, we create a test topic we can use for development.
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
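Supervisor starts both processes in the background, so the topic creation above can run before Zookeeper and the broker are accepting connections. One way to guard against that is to poll the ports first. The sketch below is an assumption-laden illustration, not part of the original script: wait_for_port is a hypothetical helper, it relies on a bash build that supports /dev/tcp redirection, and it assumes the default ports (2181 for Zookeeper, 9092 for Kafka).

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a TCP port until it accepts a connection or the
# timeout (in seconds, default 60) expires. Returns 0 on success, 1 on timeout.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60}
  local waited=0
  # The subshell opens the connection and closes it again when it exits.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# Example usage before creating the topic (assumes default ports):
#   wait_for_port localhost 2181 60
#   wait_for_port localhost 9092 60
```

An open port does not guarantee that the broker has finished registering with Zookeeper, so treat this as a reduction of the race rather than a complete fix.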
Our Kafka instance is now ready to use! One trick with Compute Engine is that each instance has both an internal and an external IP. Any VM running on Compute Engine can reach the internal IP, as long as it is on the same network and the appropriate firewall rules have been created. Because of the way Kafka advertises the hostname that clients use to connect to a broker, the configuration above assumes that you will be accessing Kafka through the internal IP. Therefore, any Compute Engine instance within the same network will be able to communicate with this Kafka instance.
If you need to connect to the instance externally (e.g. from your laptop), you will need to modify the /etc/hosts file on the connecting machine so that the instance's internal hostname resolves to the external IP address of your Compute Engine instance. Something like the following should do the trick; substitute the values for your own instance.
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
104.154.46.158 kafka-0-8-2-1.c.myproject.internal kafka-0-8-2-1
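The last line of that file can be generated rather than typed by hand. The following sketch assumes the internal hostname follows the instance.c.project.internal pattern shown in the example; hosts_entry is a hypothetical helper, and you should confirm the real name on your instance (for example with hostname -f) before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an /etc/hosts line that maps an instance's
# internal hostnames to its external IP address.
hosts_entry() {
  local external_ip=$1 instance=$2 project=$3
  printf '%s %s.c.%s.internal %s\n' \
    "$external_ip" "$instance" "$project" "$instance"
}

# Example usage on your laptop (requires the gcloud CLI and sudo):
#   ip=$(gcloud compute instances describe kafka-0-8-2-1 \
#          --format='get(networkInterfaces[0].accessConfigs[0].natIP)')
#   hosts_entry "$ip" kafka-0-8-2-1 myproject | sudo tee -a /etc/hosts
```

Ephemeral external IPs can change when an instance is stopped and started, so the entry may need to be regenerated.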

Putting it all together

Below are the command that creates the instance and the startup script in its entirety. When the script is passed to the instance as startup metadata, Compute Engine runs it on boot, installing and starting Kafka with Zookeeper.
gcloud compute instances create kafka-0-8-2-1 \
  --image debian-7-backports \
  --metadata-from-file startup-script=kafka_startup_script.sh
#!/usr/bin/env bash
STARTUP_VERSION=1
STARTUP_MARK=/var/startup.script.$STARTUP_VERSION
# Exit if this script has already run
if [[ -f $STARTUP_MARK ]]; then
  exit 0
fi
set -o nounset
set -o pipefail
set -o errexit
SCALA_VERSION=2.11
KAFKA_VERSION=0.8.2.1
KAFKA_HOME=/opt/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION"
# Install prerequisites
sudo apt-get update
sudo apt-get install -y wget supervisor openjdk-7-jre
# Download Kafka
wget -q http://apache.mirrors.spacedump.net/kafka/"$KAFKA_VERSION"/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -O /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
tar xfz /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz -C /opt
rm /tmp/kafka_"$SCALA_VERSION"-"$KAFKA_VERSION".tgz
# Configure Supervisor
cat <<EOF > /etc/supervisor/conf.d/zookeeper.conf
[program:zookeeper]
command=$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
autostart=true
autorestart=true
EOF
cat <<EOF > /etc/supervisor/conf.d/kafka.conf
[program:kafka]
command=$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
autostart=true
autorestart=true
EOF
# Run
sudo supervisorctl reread
sudo supervisorctl update
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
touch $STARTUP_MARK
