
Writing Spark Programs: The Problem with Extending App

As we know, a Scala program's entry point can be written in either of the following two ways:

// Style 1: extend the App trait
object IteblogTest extends App {
  //ToDo
}

// Style 2: define an explicit main() method
object IteblogTest {
  def main(args: Array[String]): Unit = {
    //ToDo
  }
}

Both styles run fine as plain Scala programs, but in Spark the first one sometimes does not behave correctly: you may get an exception, or data may silently go missing. For example, the following code throws an exception at runtime:

import org.apache.spark.{SparkConf, SparkContext}

object IteblogTest extends App {
  val conf = new SparkConf().setAppName("Iteblog")
  val sc = new SparkContext(conf)
  val sample = sc.parallelize(1 to 100)
  // Because of `extends App`, bro becomes a field of the singleton object
  // rather than a local variable.
  val bro = sc.broadcast(0)
  val broSample = sample.map(x => x.toString + bro.value)
  broSample.collect().foreach(println)
}

Running this program produces the following exception:

15/12/10 17:14:49 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, www.iteblog.com): java.lang.NullPointerException
  at com.iteblog.demo.IteblogTest$$anonfun$1.apply(IteblogTest.scala:13)
  at com.iteblog.demo.IteblogTest$$anonfun$1.apply(IteblogTest.scala:13)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
  at scala.collection.AbstractIterator.to(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
  at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
  at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:909)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:909)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)

The reason is this: if you extend the App trait, the vals in the object body become fields of the singleton object, whereas inside a main() method they would be local variables. On top of that, App extends DelayedInit, so those fields are only initialized once main() is invoked. The map closure above therefore references the IteblogTest singleton in order to read bro; when the closure is deserialized on an executor, the object's delayed-init body never runs there, bro is still null, and reading bro.value raises exactly the NullPointerException shown in the stack trace. The Scala documentation for App points this out:

It should be noted that this trait is implemented using the [[DelayedInit]] functionality, which means that fields of the object will not have been initialized before the main method has been executed.
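
To see the DelayedInit effect in isolation, here is a minimal plain-Scala sketch (assuming Scala 2.x, where App still relies on DelayedInit; FieldDemo and Probe are made-up names for illustration):

object FieldDemo extends App {
  // Because of DelayedInit, this assignment is deferred until main() runs.
  val greeting = "hello"
}

object Probe {
  def main(args: Array[String]): Unit = {
    // FieldDemo is instantiated here, but its delayed-init body has not
    // run yet, so the field still holds its default value.
    println(FieldDemo.greeting)         // prints "null"
    FieldDemo.main(Array.empty[String]) // now the deferred body executes
    println(FieldDemo.greeting)         // prints "hello"
  }
}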

To make the program run reliably, it is best not to extend App at all and to define a main() method instead. Even the official Spark documentation recommends against App:

Note that applications should define a `main()` method instead of extending `scala.App`. Subclasses of `scala.App` may not work correctly.
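
Applying this advice to the failing example, the same job with an explicit main() runs correctly: conf, sc, and bro are now local variables of main(), so the serialized closure captures the broadcast reference directly instead of reading an uninitialized field of the singleton (the sc.stop() at the end is the only addition):

import org.apache.spark.{SparkConf, SparkContext}

object IteblogTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Iteblog")
    val sc = new SparkContext(conf)
    val sample = sc.parallelize(1 to 100)
    val bro = sc.broadcast(0)
    val broSample = sample.map(x => x.toString + bro.value)
    broSample.collect().foreach(println)
    sc.stop()
  }
}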

That said, if you really do want to extend the App trait, you can write the code in the following form:

object IteblogTest extends App {{
  // The extra pair of braces turns the body into a block expression: vals
  // defined here are block-local instead of object fields, so closures
  // capture their values directly and serialization works as expected.
  //ToDo
}}
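
With this pattern the earlier broadcast example also works, since every val now lives inside the inner block (a sketch applying the trick to the same code as before):

import org.apache.spark.{SparkConf, SparkContext}

object IteblogTest extends App {{
  val conf = new SparkConf().setAppName("Iteblog")
  val sc = new SparkContext(conf)
  val sample = sc.parallelize(1 to 100)
  val bro = sc.broadcast(0) // block-local, not a field of the object
  val broSample = sample.map(x => x.toString + bro.value)
  broSample.collect().foreach(println)
}}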