StreamSets Data Collector Training

Labs

Review lesson slides and run labs at your own pace

Live Instructor-Led

Take the 2-day course live with a Certified Instructor

StreamSets Data Collector Course Overview

High-performance deployments deserve high-caliber training. The fastest, most reliable way to build proven skills in StreamSets is expert-led, hands-on classroom training in a structured learning environment.

This two-day hands-on training course provides a comprehensive introduction to StreamSets Data Collector. Participants will learn how to create complex pipelines that ingest data from a variety of sources, manipulate that data, and then export it to destinations including Apache Kafka, relational database management systems, and Apache Hadoop. Throughout the course, hands-on exercises reinforce the concepts being discussed.

Requirements

Students should ideally have a general knowledge of operating systems, networking, programming concepts, and databases.

Audience

The course is intended for anyone who will design, build, and run data flow pipelines, including data engineers, data developers, data analysts, data scientists, ETL developers, and data architects. No prior knowledge of StreamSets Data Collector is required.

Objectives

Introduction
Lab environment
Course Resources

Overview of the StreamSets DataOps Platform
DataOps Platform Overview
StreamSets DataOps Architecture and Use Cases
Custom Examples

An Introduction to StreamSets Data Collector
Getting Started with Data Collector
Data Collector (SDC) Overview
Building Pipelines
Previewing Data
Running the Pipeline

Pipeline Development
Connectors
Processors & Evaluators
Executors
Expression Language

Pipeline Events, Rules, and Alerts
Generating and Handling Events
Metric Rules
Data Rules

Reading, Writing, and Transforming Data
Flat Files
Relational Databases: MySQL, Oracle, and Change Data Capture
Messaging Broker Systems: Kafka
Event-Based: APIs
Distributed Storage: HDFS
Lookups: Relational Databases

Administration and Monitoring
Monitoring your SDC instances

Troubleshooting and Tuning
Troubleshooting errors and broken pipelines
Tuning configurations to speed up pipelines

Handling Data Drift
Data Drift Rules with MySQL