Get in touch

Contact us for all inquiries regarding services and general information






Business Analytics
Apache Spark – advanced usage
Duration: 2 days
The course is aimed at participants who want to deepen their knowledge of the Spark ecosystem, including Spark Streaming.

Apache Spark is a framework for fast processing of large amounts of data. It is very popular and widely used for advanced analytics, data science, and modern big data architectures, as well as for complex batch (ETL) processing and real-time processing. Spark contains several key components: Spark SQL for working with structured data, Spark Streaming for processing large amounts of data in real time, Spark MLlib for machine learning, Spark GraphX for graph processing, and SparkR for statistical data processing in the R language. Spark can run standalone, on a YARN (Hadoop) cluster, or in a Mesos environment, so it fits into a wide range of deployments. Spark is a polyglot framework: rather than imposing a single programming language, it lets a team work in Python, Java, Scala, or R, whichever best fits the organization's development environment and type of business.

The examples in this course are written primarily in Python, but other programming languages, such as Scala, will also be used. Exercises are done in standalone and cluster environments, depending on the assignment the participants are working on.

This course is intended for system architects, development engineers, and business analysts.

Advanced usage of Apache Spark

The course is aimed at participants who want to deepen their knowledge of the Spark ecosystem, including Spark Streaming. Participants will get all the information they need to set up a streaming process for real-time data processing. They will learn about the MLlib library for machine learning, build a machine learning model with it, and be shown how the model is trained. Through a few examples, we will also show how to use the GraphX library for graph processing efficiently in practice.

Prerequisites

Basic knowledge of the Python programming language, knowledge of object-oriented programming, and advanced knowledge of SQL.

For any inquiries, do not hesitate to contact us by e-mail at learn@croz.net.


For all inquiries regarding training, please contact us at learn@croz.net or apply online.
