Apache Spark is a fast, unified analytics engine. It provides high-level APIs in Scala, Java, and Python. Spark is best known for batch and stream processing, built on a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
In this post, we will see how to build and run a simple program with Spark Java.
- Prerequisites
Java (JDK) must be installed first.
You will also need Maven set up on your machine.
You will need an IDE such as Eclipse to build and run the project.
Find the Source Code on GitHub: Apache Spark – Java.
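If you create the project from scratch instead of cloning it, the pom.xml needs the Spark dependency. A minimal sketch, assuming the program uses the Spark micro web framework (group com.sparkjava, consistent with the default port 4567 used later in this post); check Maven Central for the current version:

```xml
<!-- Assumed dependency for the Spark micro web framework; verify the latest version on Maven Central -->
<dependency>
    <groupId>com.sparkjava</groupId>
    <artifactId>spark-core</artifactId>
    <version>2.9.4</version>
</dependency>
```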
- Spark Java: Hello World Program
Once the above prerequisites are in place, either clone the GitHub project and run the Spark Java sample program, or create the sample program yourself by following the instructions below.
Once the JDK and Maven setup is done, check out the Git project for SparkJava into an empty directory. This will be your Eclipse workspace for running or modifying the Spark Java source code.
Import the SparkJava project into Eclipse; it has a /src/main/java/ folder for the Java source code.
Now, from the project folder, build the project by running the Maven command mvn clean install.
Then run the program by executing the main class, SparkJavaApplication, which contains the main method. This prints the startup output on the console. Next, open the URL http://localhost:4567/hello in a browser to see the result.
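The SparkJavaApplication class referenced above might look roughly like this. This is a sketch, not the repository's exact source: it assumes the Spark micro web framework (com.sparkjava:spark-core) is on the classpath, which matches the /hello route and the default port 4567 used in this post:

```java
// Sketch of a minimal SparkJavaApplication, assuming the com.sparkjava:spark-core dependency.
import static spark.Spark.get;

public class SparkJavaApplication {
    public static void main(String[] args) {
        // Spark starts an embedded web server on port 4567 by default
        // and serves this route at http://localhost:4567/hello
        get("/hello", (request, response) -> "Hello World");
    }
}
```

With the server running, you can also check the endpoint from a terminal with curl http://localhost:4567/hello.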
That completes our first program with Spark in Java.