StreamSets Transformer Training
HANDS-ON TRAINING
BUILD PROVEN SKILLS
2-DAY COURSE
StreamSets Transformer Course Overview
Expert instructor-led, hands-on classroom training in a structured learning environment is a fast, reliable way to build proven skills in StreamSets.
This two-day StreamSets Transformer course is designed for audiences who are experienced with StreamSets Data Collector and Control Hub. It is fast-paced and assumes that students have a good knowledge of StreamSets Data Collector and pipeline development.
This hands-on training course provides comprehensive coverage of StreamSets Transformer while also giving an overview of the entire StreamSets ecosystem. Today’s heterogeneous IT landscape requires businesses to interact seamlessly with a variety of environments, such as traditional databases, Hadoop, Databricks, Snowflake, AWS, and Azure. This course provides the in-depth learning experience that prepares students to meet these challenges.
Participants will learn how to configure and use Transformer to access the various environments, transfer and transform data, use the Pipeline Repository, configure and run jobs, and monitor the performance of pipelines across all instances of StreamSets products running in the organization. Throughout the course, hands-on exercises reinforce the concepts being discussed.
Experience with StreamSets Data Collector is required. Students should preferably also have a general knowledge of operating systems, networking, programming concepts, and databases.
The course is designed for those who will be building, managing, monitoring, and administering data flow pipelines. No prior knowledge of StreamSets Transformer is required.
Overview of the StreamSets Data Operations Platform
DataOps Platform Overview
StreamSets DataOps Architecture and Use Cases
Transformer UI Overview
Controls & Views
Origins, Operators, Destinations
Transformer Deep Dive
Pipeline Processing on Spark
Transformer Batch Mode
Transformer Streaming Mode
Data Origin & Data Sources
Spark Partitioning & Caching
Spark Batch Processing
Transformer Batch Processors
Streaming & Windowing
Common Streaming Pipelines
Logs & Monitoring
Log Management & Log Files
Spark UI & Execution
Hadoop Distributed Architecture
Hadoop, Hive, Kafka, Spark, Databricks, Snowflake, AWS, and Azure Operators
Using PySpark ML Functions
Machine Learning with PySpark ML Example
Spark Tuning in Transformer
Spark Tuning Properties
Partition, Shuffle, Repartition
Java Serialization & Garbage Collection
SCH & Transformer Security
Web UI Security
Limiting Deployment of Stage Libraries