Low CPU utilization in Spark

I'm running a Spark job in local mode on an 8-core machine with a local SSD and 64GB of RAM. HDFS runs in pseudo-distributed mode on the same machine. When I run the job below, CPU utilization never gets past the equivalent of one maxed-out core. RAM usage stays under 10GB, the loopback interface peaks around 333MB/s, and disk I/O is typically under 30MB/s in either direction. How can I rewrite this to make better use of my hardware resources?

import org.apache.spark.{SparkConf, SparkContext}
import play.api.libs.json.{Json, JsObject}

object FilterProperty {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
            .setAppName("Filter Claims Data for Property")
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .set("spark.cores.max", "16")
        val sc = new SparkContext(conf)
        val filtered = sc.textFile("hdfs://localhost:9000/user/kevin/intermediate/claims.json", 48)
            .filter(s => s != "")
            .map(s => Json.parse(s).as[JsObject])
        filtered.saveAsTextFile("hdfs://localhost:9000/user/kevin/intermediate/property_claims.json")
    }
}


You should change this line of code:

    val conf = new SparkConf()
        .setAppName("Filter Claims Data for Property")

to set the master explicitly:

    val conf = new SparkConf()
        .setAppName("Filter Claims Data for Property")
        .setMaster("local[*]")

which means using as many threads as there are cores on your machine. Or you can set a number instead of *, which means use that number of threads.
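As a side note, spark.cores.max only limits cores on standalone or Mesos clusters; in local mode the parallelism comes entirely from the master URL. If the job is launched with spark-submit rather than configured in code, the master can also be set on the command line. A minimal sketch, assuming a jar built from the class above (the jar name is hypothetical):

```shell
# Run locally with one worker thread per CPU core.
# --master overrides any setMaster() left out of the code.
spark-submit \
    --master "local[*]" \
    --class FilterProperty \
    filter-property.jar
```

Setting the master outside the code is generally preferable, since the same jar can then be submitted unchanged to a real cluster later.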
