Apache Spark is a fast, unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R. Spark is best known for batch and stream processing, built on a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
In this post, we will see how to write and run a simple program with Spark Java.
Find the Source Code on GitHub: Apache Spark – Java.
- Spark Java: Hello World Program
Once the prerequisites above are done, either clone the GitHub project and execute the Spark Java sample program, or create the sample program by following the instructions below.
Once the JDK and Maven setup is done, check out the Git project for SparkJava into an empty directory. This directory will be your Eclipse workspace for executing or modifying the Spark Java source code.
The SparkJava project created in Eclipse will have a /src/main/java/ folder for the Java source code.
Now, run the Maven build from the project folder: mvn clean install.
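For reference, a Maven project for SparkJava typically declares the framework's spark-core artifact as a dependency. The coordinates below are the framework's usual ones, but the version shown is only illustrative; check the project's own pom.xml for the exact entry:

```xml
<!-- SparkJava micro framework; the version number here is an assumption -->
<dependency>
    <groupId>com.sparkjava</groupId>
    <artifactId>spark-core</artifactId>
    <version>2.9.4</version>
</dependency>
```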
Then run the SparkJava program by executing the main class, SparkJavaApplication, which contains the main method. This prints the result to the console as shown below.
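In the SparkJava micro framework, which listens on port 4567 by default, the whole hello-world route is essentially a one-liner: get("/hello", (req, res) -> "Hello World");. Since running that snippet requires the spark-core dependency on the classpath, here is a hedged, dependency-free sketch of the same /hello endpoint using only the JDK's built-in HttpServer; the class name HelloServer is hypothetical and stands in for SparkJavaApplication:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class HelloServer {
    // Start an HTTP server answering GET /hello with "Hello World",
    // mimicking what the SparkJava route does on its default port 4567.
    static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "Hello World".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start(); // non-blocking; the server runs on background threads
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(4567);
        System.out.println("Listening on http://localhost:4567/hello");
    }
}
```

Run the class and open http://localhost:4567/hello in a browser; it should show Hello World, matching the behaviour of the SparkJava program described above.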
Then open the URL http://localhost:4567/hello in a browser to see the result.
And with that, our first program with Spark in Java is done.