Apache Zeppelin & Spark JSON Parsing Exception

The versions of Apache Zeppelin and Apache Spark I downloaded are 0.6.0-incubating-SNAPSHOT and 1.5.2, respectively. While using SQLContext in Zeppelin to read a JSON file and create a DataFrame, the following exception was thrown:

val profilesJsonRdd = sqlc.jsonFile("hdfs://www.iteblog.com/tmp/json")
val profileDF = profilesJsonRdd.toDF()
profileDF.printSchema()
profileDF.show()
profileDF.registerTempTable("profiles")

com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"hadoopRDD"}; line: 1, column: 1]
  at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
  at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
  at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
  at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
  at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
  at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
  at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
  at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
  at scala.Option.map(Option.scala:145)
  at org.apache.spark.rdd.RDD.<init>(RDD.scala:1603)
  at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:101)
  at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:122)
  at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:996)
  at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:992)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
  at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:992)
  at org.apache.spark.sql.execution.datasources.json.JSONRelation.org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd(JSONRelation.scala:92)
  at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
  at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:106)
  at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
  at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
  at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
  at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
  at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:219)
  at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:1065)
  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
  at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
  at $iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
  at $iwC$$iwC$$iwC.<init>(<console>:36)
  at $iwC$$iwC.<init>(<console>:38)
  at $iwC.<init>(<console>:40)
  at <init>(<console>:42)
  at .<init>(<console>:46)
  at .<clinit>(<console>)
  at .<init>(<console>:7)
  at .<clinit>(<console>)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
  at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
  at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
  at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:713)
  at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:678)
  at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:671)
  at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:302)
  at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
  at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)

The exception was thrown by the very first statement. After some analysis, it turned out that the Jackson artifacts Apache Zeppelin 0.6.0-incubating-SNAPSHOT depends on are at version 2.5.x (see the README.md file shipped in the distribution):

(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-core:2.5.3 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-annotations:2.5.0 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-databind:2.5.3 - https://github.com/FasterXML/jackson-core)

while Apache Spark 1.5.2 depends on Jackson 2.4.4 (see Spark's pom.xml file):

<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>

The jsonFile call exercises these Jackson classes while parsing; in fact, as the stack trace shows, the failure occurs when Spark deserializes its own RDDOperationScope metadata (RDDOperationScope.fromJson), so the mismatched Jackson versions collide. When Apache Zeppelin starts, it loads the libraries under ${ZEPPELIN_HOME}/lib, and among them are the Jackson jar files:

-rw-r--r-- 1 iteblog iteblog   39815 Jan 20 16:35 jackson-annotations-2.5.0.jar
-rw-r--r-- 1 iteblog iteblog  229998 Jan 20 16:35 jackson-core-2.5.3.jar
-rw-r--r-- 1 iteblog iteblog 1143162 Jan 20 16:35 jackson-databind-2.5.3.jar
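
Because these 2.5.x jars sit on the interpreter's classpath ahead of Spark's 2.4.4 ones, they win the class-loading race. A quick way to confirm which Jackson build is actually loaded is to ask the class itself. The following is a diagnostic sketch using standard JDK reflection that can be run in a Zeppelin paragraph; the 2.5.3 results in the comments are what I would expect the broken setup to print, not captured output:

import com.fasterxml.jackson.databind.ObjectMapper

// Version recorded in the manifest of the jar that won, e.g. "2.5.3"
println(classOf[ObjectMapper].getPackage.getImplementationVersion)

// Filesystem location the class was actually loaded from,
// e.g. file:/.../zeppelin/lib/jackson-databind-2.5.3.jar
println(classOf[ObjectMapper].getProtectionDomain.getCodeSource.getLocation)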

This is exactly where the problem lies, so we can replace all three jar files with their 2.4.4 versions:

-rw-r--r-- 1 iteblog iteblog   38597 Nov 25  2014 jackson-annotations-2.4.4.jar
-rw-r--r-- 1 iteblog iteblog  225302 Nov 25  2014 jackson-core-2.4.4.jar
-rw-r--r-- 1 iteblog iteblog 1076926 Nov 25  2014 jackson-databind-2.4.4.jar
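
For reference, here is a sketch of the swap as shell commands. It assumes the 2.4.4 jars are fetched from Maven Central and that ZEPPELIN_HOME points at the Zeppelin installation; moving the old jars aside is safer than deleting them outright:

cd "$ZEPPELIN_HOME/lib"
# Move the 2.5.x jars out of the way instead of deleting them
mkdir -p jackson-2.5-backup
mv jackson-annotations-2.5.0.jar jackson-core-2.5.3.jar \
   jackson-databind-2.5.3.jar jackson-2.5-backup/
# Fetch the 2.4.4 versions that match Spark 1.5.2
for m in annotations core databind; do
  wget "https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-$m/2.4.4/jackson-$m-2.4.4.jar"
done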

Then restart Zeppelin and rerun the statements above; the exception no longer appears. I had originally intended to change the Jackson version directly in Zeppelin's pom.xml files and recompile, but I searched through every pom.xml and could not find the three jars above.
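
As a closing aside, SQLContext.jsonFile has been deprecated since Spark 1.4 in favor of the DataFrameReader API (the stack trace above shows jsonFile delegating to DataFrameReader.json anyway), and both already return a DataFrame, so the toDF() call in my snippet is redundant. A sketch of the equivalent code; note that switching APIs does not by itself avoid the exception, since the Jackson conflict fires underneath either call:

// Equivalent using the non-deprecated DataFrameReader API (Spark 1.4+);
// read.json returns a DataFrame directly, so no toDF() is needed.
val profileDF = sqlc.read.json("hdfs://www.iteblog.com/tmp/json")
profileDF.printSchema()
profileDF.show()
profileDF.registerTempTable("profiles")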

2 comments

  1. Could not initialize class org.apache.spark.rdd.RDDOperationScope

     This problem turns out to have the same cause...

     Xing Zhang, 2016-10-10 17:03

     • Thanks for the addition.

       w397090770, 2016-10-11 09:18