What is the proper way of running a Spark application on YARN using Oozie (with Hue)?

I have written an application in Scala that uses Spark. The application consists of two modules: the App module, which contains classes with different logic, and the Env module, which contains environment and system initialization code as well as utility functions. The entry point is located in Env; after initialization, it instantiates a class from App (chosen from the command-line arguments, using Class.forName) and runs its logic. The modules are packaged into two separate JARs (env.jar and app.jar).
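To make the setup concrete, here is a minimal sketch of an entry point that loads a task class by name, as described above. The names Task, DemoTask and Launcher are hypothetical stand-ins, not the actual classes from env.jar or app.jar, and the real initialization code is omitted.

```scala
// Hypothetical contract shared by the Env and App modules.
trait Task {
  def run(args: Array[String]): String
}

// Hypothetical stand-in for a class such as app.AggBlock1Task.
class DemoTask extends Task {
  def run(args: Array[String]): String =
    s"DemoTask ran with ${args.length} extra argument(s)"
}

object Launcher {
  // Loads the class named by the first argument and runs it with the rest.
  // Class.forName throws ClassNotFoundException if the JAR containing the
  // class is not on the classpath of the JVM running this code.
  def launch(args: Array[String]): String = {
    require(args.nonEmpty, "expected a fully qualified task class name")
    val task = Class.forName(args.head)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[Task]
    task.run(args.tail)
  }
}
```

Calling Launcher.launch(Array(classOf[DemoTask].getName, "x")) runs DemoTask with one extra argument. Because the class is resolved only at run time, a missing app.jar shows up as a ClassNotFoundException rather than as a compile or link error.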

When I run the application locally, it runs correctly. The next step is to deploy the application to my servers. I use Cloudera's CDH 5.4.

I used Hue to create a new Oozie workflow with a Spark task with the following parameters:

  • Spark Master: yarn
  • Mode: cluster
  • App name: myApp
  • Jars/py files: lib/env.jar,lib/app.jar
  • Main class: env.Main (in Env module)
  • Arguments: app.AggBlock1Task

I then placed the 2 JARs inside the lib folder in the workflow's folder (/user/hue/oozie/workspaces/hue-oozie-1439807802.48).

When I run the workflow, it throws a FileNotFoundException and the application does not execute:

java.io.FileNotFoundException: File file:/cloudera/yarn/nm/usercache/danny/appcache/application_1439823995861_0029/container_1439823995861_0029_01_000001/lib/app.jar,lib/env.jar does not exist

However, when I leave the Spark master and mode parameters empty, everything works. But when I check spark.master programmatically, it is set to local[*] and not yarn. Also, in the logs I found this under the Oozie Spark action configuration:

--master
null
--name
myApp
--class
env.Main
--verbose
lib/env.jar,lib/app.jar
app.AggBlock1Task

I assume I'm doing something wrong, since leaving the Spark master and mode parameters empty means the application runs with spark.master set to local[*]. As far as I understand, creating a SparkConf object within the application should pick up the spark.master value I specify in Oozie (in this case yarn), but the workflow fails when I set it.
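The expectation about SparkConf can be sketched as follows. This is an illustration under assumptions, not the code from env.jar: ConfSketch and buildConf are hypothetical names, and the sketch assumes the application never calls setMaster itself, since a master set directly on SparkConf takes precedence over one passed by the submitter.

```scala
import org.apache.spark.SparkConf

object ConfSketch {
  // Builds the configuration without calling setMaster, so a master supplied
  // from outside (for example by spark-submit via the spark.master property)
  // is kept. setIfMissing only supplies a fallback for local runs.
  def buildConf(appName: String, base: SparkConf = new SparkConf()): SparkConf =
    base
      .setAppName(appName)
      .setIfMissing("spark.master", "local[*]")
}
```

With this shape, ConfSketch.buildConf("myApp").get("spark.master") returns whatever master the launcher supplied, and falls back to local[*] only when none was set.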

Is there something I'm doing wrong or missing? Any help will be much appreciated!

Answers


I managed to solve the problem by putting the two JARs in the user directory /user/danny/app/ and setting the Jars/py files parameter to ${nameNode}/user/danny/app/env.jar. Running it this way threw a ClassNotFoundException, even though both JARs were in the same HDFS folder. To work around that, I opened the action's settings and added the following to the options list: --jars ${nameNode}/user/danny/app/app.jar. This way the App module is referenced as well, and the application runs successfully.

