By default, Apache Zeppelin starts Spark in local mode, with the master set to local[*]. To run on YARN instead, change the value of MASTER in the conf/zeppelin-env.sh file as follows:
export MASTER=yarn-client
export HADOOP_HOME=/home/q/hadoop/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
Then start Zeppelin. Sometimes, however, the following exception shows up in the logs:
ERROR [2016-01-20 18:28:35,016] ({pool-2-thread-2} Logging.scala[logError]:96) - Error initializing SparkContext.
org.apache.spark.SparkException: Unable to load YARN support
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:389)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:384)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:384)
    at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:401)
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2049)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:97)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:173)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:347)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:450)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:343)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:469)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.spark.SparkSqlInterpreter.getSparkInterpreter(SparkSqlInterpreter.java:100)
    at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:115)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:302)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
    at scala.tools.nsc.interpreter.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:83)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:385)
    ... 29 more
This exception occurs because Apache Zeppelin was compiled without the YARN dependency. The fix is to rebuild Zeppelin with the -Pyarn profile:
[iteblog@www.iteblog.com iteblog]$ mvn package -Pspark-1.5 \
    -DskipTests -Dhadoop.version=2.2.0 -Phadoop-2.2 -Pyarn
Then start Apache Zeppelin again:
[iteblog@www.iteblog.com iteblog]$ bin/zeppelin-daemon.sh start
The Spark jobs launched from Zeppelin will now run on YARN. All logs can be found under the logs directory; in my case the file recording the interpreter log is zeppelin-interpreter-spark-iteblog-www.iteblog.com.log.
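To confirm that the job really went to YARN, you can list the running applications and tail the interpreter log (a quick check; the application name and the log file name will differ in your environment):

yarn application -list -appStates RUNNING
tail -f logs/zeppelin-interpreter-spark-iteblog-www.iteblog.com.log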
The Hadoop above is the Apache distribution. If you are running HDP (Hortonworks Data Platform) Hadoop, then in addition to setting MASTER and HADOOP_CONF_DIR you must also tell Zeppelin the HDP Hadoop version, like this:

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.1.0-2574"
If you are not sure what your HDP version is, you can get it by running:
hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/' # It returned 2.3.1.0-2574
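If you prefer not to hard-code the version, you can inline the command (a sketch, assuming hdp-select is on the PATH of the machine running Zeppelin):

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=$(hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/')"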
Also note that Apache Zeppelin & Spark on YARN only supports yarn-client mode. If you set MASTER to yarn-cluster, you will get the following exception:
INFO [2016-01-21 10:36:59,445] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Running Spark version 1.5.2
ERROR [2016-01-21 10:36:59,447] ({pool-1-thread-5} Logging.scala[logError]:96) - Error initializing SparkContext.
org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't running on a cluster. Deployment to YARN is not supported directly by SparkContext. Please use spark-submit.
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:404)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:343)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:469)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:335)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1090)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1075)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
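This is expected: Zeppelin creates the SparkContext inside its own interpreter process, so the driver always runs next to Zeppelin, while yarn-cluster mode requires spark-submit to ship the driver into the cluster (as the error message itself says). So keep the master as:

export MASTER=yarn-client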
Hi, after setting everything up as described, jobs are still submitted in local mode; I don't see any application in the YARN web UI?
Hi, after configuring everything I get the error below and don't know how to fix it. Hoping for a reply, thanks!
That's a permissions problem: you are writing to the /user directory as the root user, but root presumably isn't in the hdfs group, so change the user that starts Zeppelin.
Does Zeppelin have to be deployed on a namenode or datanode of the Hadoop cluster? Or can it be deployed on a separate machine and still submit jobs to YARN?
No, you can deploy it on any machine. As long as the HADOOP_HOME and related environment variables are configured properly, jobs will be submitted to YARN; see the sketch below.
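For example, a minimal conf/zeppelin-env.sh for a machine outside the cluster could look like this (the paths are hypothetical; point them at your own Hadoop and Spark client installs, and make sure HADOOP_CONF_DIR holds the cluster's *-site.xml files):

export MASTER=yarn-client
export SPARK_HOME=/path/to/spark
export HADOOP_HOME=/path/to/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop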
If it is deployed on a separate machine, how do I submit jobs to the YARN cluster?
That is exactly what this article describes.
I've already set up Hadoop on the Zeppelin machine, with yarn-site.xml and core-site.xml both pointing at the cluster, but when I submit a Spark job from a notebook the log keeps saying: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032, and the submission never gets through.
Also, running "hadoop fs -ls /" on this machine does list the cluster's HDFS contents.
Are the Hadoop environment variables configured in conf/zeppelin-env.sh? That error means Zeppelin is not reading the Hadoop configuration; you can verify as shown below.
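A quick way to check (a sketch; adjust the path if your layout differs): 0.0.0.0:8032 is the built-in default ResourceManager address used when no address is found, so confirm that the yarn-site.xml Zeppelin reads actually names the real ResourceManager:

grep -A1 'yarn.resourcemanager' $HADOOP_CONF_DIR/yarn-site.xml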
Yes, the Hadoop environment variables are set in zeppelin-env.sh, and on top of that I also set global Hadoop environment variables in the profile file. Both point at the same location.
That's strange. I just checked again, and my zeppelin-env.sh contains only the following four settings:
export MASTER=yarn-client
export SPARK_HOME=/user/iteblog/spark-1.5.2-bin-2.2.0/
export HADOOP_HOME=/user/iteblog/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
Then I started Zeppelin and it runs on YARN without problems.
It did turn out that the export SPARK_HOME=... setting was needed; once I added it, jobs were submitted to the YARN cluster normally. Thanks for the patient replies 🙂
Hi, this is my zeppelin-env.sh configuration:
export JAVA_HOME=/usr/local/jdk1.8
export MASTER=yarn-client
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.0/etc/hadoop/
export SPARK_HOME=/usr/local/spark-1.5.2
But after starting up, running the following in a notebook fails:
sc.version
java.lang.ClassNotFoundException: org.apache.spark.repl.SparkCommandLine
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:393)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:344)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Did the Zeppelin build actually succeed? Could there have been errors during compilation?
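For context, org.apache.spark.repl.SparkCommandLine comes from Spark's REPL module, so this ClassNotFoundException usually means the Zeppelin build does not match the Spark version in SPARK_HOME, or the Spark interpreter module failed to build. One option is a clean rebuild against matching versions (a sketch; the profile and version flags here assume Spark 1.5 and Hadoop 2.6.0, as in the configuration above):

mvn clean package -DskipTests -Pspark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn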
Could I add you on QQ to chat? 1578326883. Thanks in advance.
Thanks for the reply. There were indeed errors during compilation, though in the end it still reported success; but it doesn't work in testing: only %sh runs, the other interpreters fail. Could you add my QQ (1578326883) so we can learn from each other?