Sending StatsD Metrics and Visualizing in Grafana

Outline

Introduction

Example data pipeline from insertion to visualization

Step 1: Creating Security Groups and EC2 Instances (~5 min)

Step 2: Installing Graphite Carbon, Graphite Web, and StatsD (~15 min)

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get -y install graphite-web graphite-carbon
Graphite Carbon Whisper database prompt
sudo apt-get install -y postgresql libpq-dev python-psycopg2
nano ./setup.sql
CREATE USER graphite WITH PASSWORD 'password';
CREATE DATABASE graphite WITH OWNER graphite;
sudo -u postgres psql -f setup.sql
sudo nano /etc/graphite/local_settings.py
sudo graphite-manage migrate
sudo graphite-manage migrate --run-syncdb
Running Django migrations
sudo graphite-manage createsuperuser
Creating a superuser
sudo chmod -R 777 /var/log/graphite
sudo nano /etc/default/graphite-carbon
Find CARBON_CACHE_ENABLED=false and change it to:
CARBON_CACHE_ENABLED=true
Carbon cache configuration
sudo nano /etc/carbon/carbon.conf
Find ENABLE_LOGROTATION = False and change it to:
ENABLE_LOGROTATION = True
Carbon configuration
sudo nano /etc/carbon/storage-schemas.conf
[statsd]
pattern = ^stats.*
retentions = 10s:1d,1m:7d,10m:1y
Storage schemas
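As a sanity check on what that retention line buys you, each archive simply stores its time span divided by its resolution. A quick back-of-the-envelope sketch in plain Python (illustrative only, not part of the pipeline):

# Datapoints per archive for retentions = 10s:1d,1m:7d,10m:1y
day, week, year = 86400, 7 * 86400, 365 * 86400
print(day // 10)    # 8640 points at 10-second resolution for 1 day
print(week // 60)   # 10080 points at 1-minute resolution for 7 days
print(year // 600)  # 52560 points at 10-minute resolution for 1 year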
sudo apt-get install apache2 libapache2-mod-wsgi -y
sudo a2dissite 000-default
sudo cp /usr/share/graphite-web/apache2-graphite.conf /etc/apache2/sites-available
sudo a2ensite apache2-graphite
sudo service apache2 reload
sudo apt-get install -y git devscripts debhelper dh-systemd
curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -
sudo apt-get install -y nodejs
mkdir ~/build
cd ~/build
git clone https://github.com/etsy/statsd.git
cd statsd
dpkg-buildpackage
cd ..
sudo dpkg -i statsd_0.8.6-1_all.deb
cd ~
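Depending on the package defaults, StatsD may also need to be told where Carbon is listening. With the Debian package built above, the configuration typically lives at /etc/statsd/localConfig.js (the path and values below are assumptions; check your install). Since Carbon runs on the same instance, localhost works:

{
  graphitePort: 2003,
  graphiteHost: "127.0.0.1",
  port: 8125
}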

Step 3: Installing Grafana (~5 min)

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
Adding Grafana repo
sudo apt-get update -y
sudo apt-get install grafana -y
sudo service grafana-server start
sudo service carbon-cache stop

sudo service carbon-cache start
sudo service statsd restart
sudo service apache2 reload
Graphite web
Grafana web interface
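At this point you can sanity-check that metrics reach Graphite by querying its render API (served by Apache on port 80). A minimal sketch, assuming you run it on the Graphite instance itself and StatsD has already forwarded at least one metric:

# Query Graphite's render API for any StatsD-derived series
# from the last 10 minutes and pretty-print the JSON response
import json
import urllib.request

url = "http://127.0.0.1:80/render?target=stats.*&from=-10min&format=json"
with urllib.request.urlopen(url) as resp:
    print(json.dumps(json.loads(resp.read()), indent=2))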

Step 4: Configuring StatsD (~5 min)

sudo pip3 install statsd
# Imports and running findspark
import findspark
findspark.init('/etc/spark')
import pyspark
from pyspark import RDD
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json
# Spark context details
sc = SparkContext(appName="PythonSparkStreamingKafka")
ssc = StreamingContext(sc, 2)

# Creating Kafka direct stream
dks = KafkaUtils.createDirectStream(ssc, ["testDB.dbo.fruit"], {"metadata.broker.list": "|replace with your Kafka private address|:9092"})

# Transforming CDC JSON data to sum fruit numbers
# based on fruit name
def printy(a, b):
    listy = b.collect()
    for l in listy:
        print(l)

counts = dks.map(lambda x: json.loads(x[1])) \
    .flatMap(lambda dict: dict.items()) \
    .filter(lambda items: items[0] == "payload") \
    .map(lambda tupler: (tupler[1]["after"]["fruit_name"], tupler[1]["after"]["num_sold"])) \
    .reduceByKey(lambda a, b: a + b) \
    .foreachRDD(printy)

# Starting Spark context
ssc.start()
ssc.awaitTermination()
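Before wiring in StatsD, it may help to trace what that chain of lambdas does to a single change event. The sketch below (with a simplified payload; real Debezium events also carry schema and source metadata) mimics each stage on one record:

# Simplified Debezium CDC event, as produced by json.loads(x[1])
event = {
    "schema": {},
    "payload": {"after": {"fruit_name": "Apple", "num_sold": 5}},
}

# flatMap(dict.items) + filter keeps only the ("payload", ...) tuple...
payload = [item for item in event.items() if item[0] == "payload"][0]

# ...and map() reshapes it into a (fruit_name, num_sold) pair, which
# reduceByKey() then sums per fruit across the micro-batch
pair = (payload[1]["after"]["fruit_name"], payload[1]["after"]["num_sold"])
print(pair)  # ('Apple', 5)

With that picture in mind, the next change points the job at StatsD: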
import statsd
c = statsd.StatsClient('{private IP for Grafana instance}', 8125)
# Old printy function, which only prints each summed result
def printy(a, b):
    listy = b.collect()
    for l in listy:
        print(l)

# New printy function, which sends each sum to StatsD as a counter
def printy(a, b):
    listy = b.collect()
    for l in listy:
        c.incr("{0}.sold".format(l[0]), l[1])
For example, incrementing the Apple counter by 10 puts a packet like this on the wire:
Apple.sold:10|c
which follows the general StatsD wire format:
metric_name:metric_value|metric_type
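You can verify StatsD is listening without involving Spark at all by sending one raw metric over UDP (the metric name below is made up; anything will do):

# Send a single hand-built counter packet to StatsD over UDP
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"test_fruit.sold:10|c", ("127.0.0.1", 8125))
sock.close()

A flush interval later, the series should appear under the stats tree in Graphite.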

Step 5: Starting All Pipeline Services (~10 min)

net start MSSQLSERVER
net start SQLSERVERAGENT
Starting services
/etc/kafka/bin/zookeeper-server-start.sh /etc/kafka/config/zookeeper.properties &> zookeeper_log &
/etc/kafka/bin/kafka-server-start.sh /etc/kafka/config/server.properties &> broker_log &
/etc/kafka/bin/connect-distributed.sh /etc/kafka/config/connect-distributed.properties &> connect_log &
curl -H "Accept:application/json" localhost:8083/connectors/;curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{ "name": "test-connector", "config": { "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector", "database.hostname": "{Private IP Address}", "database.port": "1433", "database.user": "testuser", "database.password": "password!", "database.dbname": "testDB", "database.server.name": "testDB", "table.whitelist": "dbo.fruit", "database.history.kafka.bootstrap.servers": "localhost:9092", "database.history.kafka.topic": "dbhistory.fulfillment" } }';
Connector not added
Connector added
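Before launching the Spark job, you can optionally confirm that the connector and its task are healthy through Kafka Connect's REST API; a minimal sketch:

# Poll Kafka Connect for the Debezium connector's status
import json
import urllib.request

url = "http://localhost:8083/connectors/test-connector/status"
with urllib.request.urlopen(url) as resp:
    status = json.loads(resp.read())
print(status["connector"]["state"])           # expect RUNNING
print([t["state"] for t in status["tasks"]])  # each task should be RUNNING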
/etc/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.3,org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.3 readkafka.py

Step 6: Configuring Grafana and Creating a Dashboard (~10 min)

Grafana web interface
Changing the admin password
Grafana welcome page
Adding a new data source
http://127.0.0.1:80
Saving the new data source
Creating a new dashboard
Adding a new panel
New live graph
Changing data source for query
Display settings
Time range
Refresh rate
USE testDB;
INSERT INTO fruit(fruit_name, num_sold) VALUES ('Apple', 5);
INSERT INTO fruit(fruit_name, num_sold) VALUES ('Pear', 10);
INSERT INTO fruit(fruit_name, num_sold) VALUES ('Peach', 20);
INSERT INTO fruit(fruit_name, num_sold) VALUES ('Watermelon', 25);
sqlcmd -U testuser -P {your password} -i {location of testdb.sql}
Running an insert
Selecting metrics
Adding a new series
Second series
Third and fourth series
Grafana live graph

Completed Python File

# Imports and running findspark
import findspark
findspark.init('/etc/spark')
import pyspark
from pyspark import RDD
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json
import statsd

# StatsD client pointed at the Grafana/StatsD instance
c = statsd.StatsClient('{private IP for Grafana instance}', 8125)

# Spark context details
sc = SparkContext(appName="PythonSparkStreamingKafka")
ssc = StreamingContext(sc, 2)

# Creating Kafka direct stream
dks = KafkaUtils.createDirectStream(ssc, ["testDB.dbo.fruit"], {"metadata.broker.list": "|replace with your Kafka private address|:9092"})

# Transforming CDC JSON data to sum fruit numbers
# based on fruit name, then sending each sum to StatsD
def printy(a, b):
    listy = b.collect()
    for l in listy:
        c.incr("{0}.sold".format(l[0]), l[1])

counts = dks.map(lambda x: json.loads(x[1])) \
    .flatMap(lambda dict: dict.items()) \
    .filter(lambda items: items[0] == "payload") \
    .map(lambda tupler: (tupler[1]["after"]["fruit_name"], tupler[1]["after"]["num_sold"])) \
    .reduceByKey(lambda a, b: a + b) \
    .foreachRDD(printy)

# Starting Spark context
ssc.start()
ssc.awaitTermination()

Conclusion
