Apache Spark – Java

Apache Spark is a fast, high-performance, unified analytics engine. It provides high-level APIs in Scala, Java, and Python. Spark is best known for batch and stream processing, built on a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

In this post, we will see how to execute a simple program with Spark in Java.
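
To give a feel for the Java API before we set anything up, here is a minimal sketch of a Spark program running in local mode, assuming the Apache Spark core dependency is on the classpath. The class name, the in-memory word list, and the local[*] master are illustrative only and are not part of the sample project described below.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkLocalSketch {

        public static void main(String[] args) {
            // Run Spark locally on all available cores; a real job would
            // normally receive its master from spark-submit instead.
            SparkConf conf = new SparkConf()
                    .setAppName("spark-local-sketch")
                    .setMaster("local[*]");

            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Distribute a small in-memory list and count the distinct words.
                JavaRDD<String> words = sc.parallelize(
                        Arrays.asList("hello", "spark", "hello", "java"));
                long distinct = words.distinct().count();
                System.out.println("Distinct words: " + distinct);
            }
        }
    }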

  • Prerequisites

Java must be installed first.
You will also need Maven set up on your machine.
You will need an IDE such as Eclipse to build and run the project.

Find the source code on GitHub: Apache Spark – Java.

  • Spark Java: Hello World Program

Once the above prerequisites are in place, either clone the GitHub project and execute the Spark Java sample program, or create the sample program by following the instructions below.

  1. Once the JDK and Maven setup is done, check out the Git project for SparkJava into an empty directory. This will be your Eclipse workspace for executing or modifying the Apache Spark Java source code.

    Apache Spark Java Maven Project

  2. The SparkJava project will be created in Eclipse, with a /src/main/java/ folder for the Java source code.

    Now, build the project by running the Maven command mvn clean install from the project folder.

    Then run the SparkJava program by executing the main class, SparkJavaApplication, which contains the main method. This will print the result on the console as shown below.

    Then open the URL http://localhost:4567/hello in a browser to see the result. A sketch of what such a main class might look like follows this list.

    SparkJava Console Result

    And with that, we are done with our first program with Apache Spark in Java.
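
For reference, the result is served at /hello on port 4567, which is the default port of the Spark Java micro web framework (the com.sparkjava:spark-core dependency), so a minimal main class producing this output would presumably look something like the sketch below. The class name and the /hello path match the steps above; the greeting text is an assumption, so check the GitHub project for the actual source.

    import static spark.Spark.get;

    public class SparkJavaApplication {

        public static void main(String[] args) {
            // Register a handler for GET /hello; with no explicit port() call,
            // the embedded Jetty server listens on the default port 4567.
            get("/hello", (request, response) -> "Hello World");
        }
    }

If a different port were needed, spark.Spark.port(...) could be called before the first route is registered to change where the server listens.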
