Enterprise Scala and Spark



Enterprise Data Science is a wide-ranging field built on many core technologies and paradigms that combine to provide a robust solution. Key technologies and practices include ETL, Data Engineering, Machine Learning, Network/Grid/Cloud engineering, Business Rules, and others. Enterprise Scala and Spark is a five-day, hands-on course that explores some of these areas at a high level and gives experienced developers a big-picture understanding of how these technologies and skill sets fit together in a professional-grade enterprise Data Science environment, focusing on best practices and emerging trends.

This course introduces Scala and functional programming, as well as Spark and Enterprise Integration techniques, in a "breadth"-based approach for maximum exposure to the wider world of Enterprise Data Science. Hands-on labs are integrated throughout the training, but additional "depth"-based learning should be expected after this class for true mastery of Enterprise Data Science.

If your team requires different topics or tools, additional skills, or a custom approach, this course can easily be adjusted to accommodate them. We offer additional related Scala, Spark, data science, programming, and development courses, which may be blended with this course into a track that best suits your development objectives.

This course provides a grounding in the practical use of the range of technologies on the leading edge of data science development.

Working in a hands-on learning environment led by our expert practitioner, attending students will learn:

  • Essential Scala programming, leveraging your existing OO development experience
  • How to write essential Spark programs and perform exploratory data analysis in Scala and the Spark shell
  • How to work with Spark Core
  • How to work with NoSQL databases
  • How to write programs for Spark Streaming in Scala

Attending students are required to have a background in programming with basic OO development skills. Students should have the following incoming skills or knowledge:

  • Experience in developing object-oriented enterprise applications to at least a basic level
  • Familiarity with Eclipse
  • Comfort with the Linux/Unix command line, including editing text files

Incoming students should have skills equivalent to the topics covered in:

  • Java Programming Fundamentals

Session: Functional Programming in Scala (2-3 hours)

  • Functional Programming
  • Scala Overview
  • Scala vs. Python vs. Java vs. R
  • REPL in Scala
  • Installing Scala
  • Hello, Scala

Session: Scala (1.5 days)

  • Classes and Objects
  • Traits
  • Mixins
  • Higher-Order Functions
  • Types and Inference
  • Lists
  • Annotations
  • Collections
  • Pattern Matching
  • Using Java in Scala
  • Futures, Promises, and Parallel Collections (Concurrency)
  • Functional Programming Overview

Session: Spark Core (1 day)

  • Hadoop and Spark Overview
  • File I/O with HDFS
  • DataFrames and Resilient Distributed Datasets
  • Spark SQL
  • In-memory lookups
  • Essential AI with MLlib
  • Using Web Notebooks (Optional)

Session: Working with NoSQL (5 hours)

  • Not Only SQL
  • Relational Data
  • Sqoop
  • Columnar Databases
  • Cassandra
  • Document Databases
  • Key/Value Databases
  • Graph Databases
  • Neo4j
  • GraphX
  • Hive in Spark

Session: Spark Streaming (3.5 hours)

  • Spark Streaming Model
  • Streaming with Kafka

Session: MLlib (3.5 hours)

  • Machine Learning Essentials
  • Spark ML/MLlib
  • MLlib and Streaming
  • MLlib, Streaming, and Kafka

Session: Enterprise Integration (3.5 hours)

  • Enterprise Service and Message Buses
  • Lambda Architecture


Whether you need assistance scheduling a class for yourself or for your group, GCA's Education Account Managers will craft a customized training solution to meet the needs of your organization.