Sandeep Kattepogu

86 Followers


Jan 22, 2021

Example Spark 3.0.1 Data Transformations in Python

Step-by-step transformations on arbitrary text input.
Outline: Introduction (including software versions used); Creating AWS Instance; Installation (Spark, Kafka); Spark job 1: Output raw data to console; Spark job 2: Run custom functions on input and output as new column; Spark job 3: Parse JSON and output specific fields; Spark job 4…

Apache Spark

12 min read



Dec 29, 2020

Basic Arch Linux Installation on a VM (With or Without Full Disk Encryption)

Does “do-it-yourself” have to be so hard?
Introduction: The process of installing Arch Linux is quite different from that of other operating systems. In fact, installing a standard Linux ISO is more similar to installing Windows 10 than it is to installing Arch Linux. …

Arch

9 min read



Dec 1, 2020

Using VirtualBox as a Cloud Computing Server

For when you really just don’t want to pay for vSphere.
Outline: Introduction; Requirements; Part 0: Installing VirtualBox and Extension Pack on the Command Line; Part 1: Creating and Deleting a VirtualBox VM using “VBoxManage”; Part 2: Enabling RDP access for a VirtualBox VM; Part 3: Additional Considerations.
Introduction: When I was…

VirtualBox

6 min read



Nov 27, 2020

Streaming Data from MySQL into Apache Kafka

CDC-like data pipeline using MySQL binary logs.
Outline: Introduction; Creating Security Groups and EC2 Instances (~15 min); Installing MySQL and Configuring to Allow Binary Log Reading (~15 min); Installing/Configuring Kafka and Debezium Connector (~15 min).
Introduction: In my previous set of tutorials, I explained how to use the Debezium connector to stream…

MySQL

6 min read



Jun 17, 2020

Sending StatsD Metrics and Visualizing in Grafana

Creating a CDC data pipeline: Part 3.
Outline: Introduction; Creating Security Groups and EC2 Instances (~5 min); Installing Graphite Carbon, Graphite Web, and StatsD (~15 min); Installing Grafana (~5 min); Configuring StatsD (~5 min); Starting All Pipeline Services (~10 min); Configuring Grafana and Creating a Dashboard (~10 min); Completed Python File …

Microsoft SQL Server

11 min read



Jun 12, 2020

Streaming Data from Apache Kafka Topic using Apache Spark 2.4.7 and Python

Creating a CDC data pipeline: Part 2.
Outline: Introduction; Creating Security Groups and EC2 Instances (~5 min); Installing/Configuring Spark (~5 min); Starting All Pipeline Services (~10 min); Extracting CDC Row Insertion Data Using Pyspark (~15 min); Running Own Functions on Output; Changing the Spark Job to Filter out Deletes and Updates …

Microsoft SQL Server

10 min read



Jun 9, 2020

Streaming Data from Microsoft SQL Server into Apache Kafka

Creating a CDC data pipeline: Part 1.
Outline: Introduction; Creating Security Groups and EC2 Instances (~15 min); Configuring SQL Server for CDC (~15 min); Installing/Configuring Kafka and Debezium Connector (~15 min); Reading CDC Topic (~5 min); Addendum 1: Important Commands Used; Addendum 2: Next Article in the Tutorial.
Introduction: In this three-part…

Microsoft SQL Server

8 min read



May 15, 2020

Greenplum 6.7.1 on AWS

How to install a three-node Greenplum Database cluster with segment mirroring.
Overview: I. Introduction; II. System Requirements; III. Part 1: Create AWS Instances (~15 minutes); IV. Part 2: Setup on Each Node (~15 minutes); V. Part 3: Initialize Greenplum Database (~15 minutes).
Introduction: Greenplum Database uses an MPP (massively parallel processing) database architecture that is able to take advantage of…

Greenplum

5 min read



A man with a passion for information technology.
