StreamSets Transformer Training
StreamSets Transformer Course Overview
The fastest, most reliable way to build proven skills in StreamSets is expert-led, hands-on classroom training in a structured learning environment.
This two-day StreamSets Transformer course is designed for audiences who are experienced with StreamSets Data Collector and Control Hub. It is fast-paced and assumes that students have a good knowledge of StreamSets Data Collector and pipeline development.
The hands-on training provides comprehensive coverage of StreamSets Transformer while also giving an overview of the entire StreamSets ecosystem. Today’s heterogeneous IT landscape requires businesses to interact seamlessly with a variety of environments, such as traditional databases, Hadoop, Databricks, Snowflake, AWS, and Azure. This course provides the in-depth learning experience that prepares students to meet these challenges.
Participants will learn how to configure and use Transformer to access the various environments, transfer and transform data, use the Pipeline Repository, configure and run jobs, and monitor the performance of pipelines across all instances of StreamSets products running in the organization. Throughout the course, hands-on exercises reinforce the concepts being discussed.
Requirements
Experience with StreamSets Data Collector is required. Students should preferably have general knowledge of operating systems, networking, programming concepts, and databases.
Audience
The course is designed for those who will be building, managing, monitoring, and administering data flow pipelines. No prior knowledge of StreamSets Transformer is required.
Objectives
Introduction
Lab environment
Course Resources
Overview of the StreamSets Data Operations Platform
DataOps Platform Overview
StreamSets DataOps Architecture and Use Cases
Transformer UI Overview
Pipelines
Controls & Views
Package Management
Origins, Operators, Destinations
Spark Overview
Spark Overview
RDDs
DataFrames
Datasets
Transformer Deep Dive
Transformer Execution
Pipeline Processing on Spark
Transformer Batch Mode
Transformer Streaming Mode
Data Origin & Data Sources
Spark Partitioning & Caching
Ludicrous Mode
Batch Processing
Spark Batch Processing
Transformer Batch Processors
SparkSQL
Streaming & Windowing
Spark Streaming
Common Streaming Pipelines
Window Processor
Logs & Monitoring
Log Management & Log Files
Monitoring Pipelines
Spark UI & Execution
Framework Connectors
Hadoop Distributed Architecture
Hadoop, Hive, Kafka, Spark, Databricks, Snowflake, AWS, and Azure Operators
Hive Tables
Using PySpark ML Functions
PySpark Operator
PySpark Inputs
Machine Learning with PySpark ML Example
Spark Tuning in Transformer
Spark Tuning Properties
Partition, Shuffle, Repartition
Network Considerations
Java Serialization & Garbage Collection
SCH & Transformer Security
Web UI Security
Authentication
Access Control
Limiting Deployment of Stage Libraries
Source/Destination Security
Credential Security