The versions of Apache Zeppelin and Apache Spark I downloaded are 0.6.0-incubating-SNAPSHOT and 1.5.2 respectively. While using SQLContext in Zeppelin to read a JSON file and build a DataFrame, I hit the following exception:
val profilesJsonRdd = sqlc.jsonFile("hdfs://www.iteblog.com/tmp/json")
val profileDF = profilesJsonRdd.toDF()
profileDF.printSchema()
profileDF.show()
profileDF.registerTempTable("profiles")

com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope) at [Source: {"id":"0","name":"hadoopRDD"}; line: 1, column: 1]
    at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
    at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
    at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
    at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
    at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
    at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
    at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
    at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.RDD.<init>(RDD.scala:1603)
    at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:101)
    at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:122)
    at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:996)
    at org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:992)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
    at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:992)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation.org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd(JSONRelation.scala:92)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6$$anonfun$apply$1.apply(JSONRelation.scala:106)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:106)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
    at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
    at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:219)
    at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:1065)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
    at $iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC.<init>(<console>:38)
    at $iwC.<init>(<console>:40)
    at <init>(<console>:42)
    at .<init>(<console>:46)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:713)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:678)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:671)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:302)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
The exception appeared as soon as the first statement ran. After some digging, it turns out that the Apache Zeppelin 0.6.0-incubating-SNAPSHOT build depends on Jackson 2.5.x (see the README.md file shipped with it), as listed below:
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-core:2.5.3 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-annotations:2.5.0 - https://github.com/FasterXML/jackson-core)
(Apache 2.0) Jackson (com.fasterxml.jackson.core:jackson-databind:2.5.3 - https://github.com/FasterXML/jackson-core)
Apache Spark 1.5.2, on the other hand, depends on Jackson 2.4.4 (see Spark's pom.xml file):
<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
The jsonFile function uses these Jackson classes while parsing, and that is where the two versions collide (a minimal sketch of the failing deserialization is given after the jar listing below). When Apache Zeppelin starts, it loads the libraries under ${ZEPPELIN_HOME}/lib, and the Jackson jars are among them:
-rw-r--r-- 1 iteblog iteblog   39815 Jan 20 16:35 jackson-annotations-2.5.0.jar
-rw-r--r-- 1 iteblog iteblog  229998 Jan 20 16:35 jackson-core-2.5.3.jar
-rw-r--r-- 1 iteblog iteblog 1143162 Jan 20 16:35 jackson-databind-2.5.3.jar
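To make the conflict concrete: the top frames of the stack trace show the failure inside RDDOperationScope$.fromJson, where Jackson (with its Scala module) deserializes the scope JSON {"id":"0","name":"hadoopRDD"}. The snippet below is only an illustrative sketch of that pattern, not Spark's actual code: the Scope class and JacksonVersionCheck object are made-up names, and it assumes jackson-databind and jackson-module-scala are on the classpath. With mismatched Jackson jars, the readValue call is the kind of place where a "Could not find creator property" error surfaces; with consistent versions it simply prints the parsed object.

import com.fasterxml.jackson.annotation.JsonIgnoreProperties
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Illustrative stand-in for org.apache.spark.rdd.RDDOperationScope:
// a small Scala class that Jackson has to rebuild from JSON.
@JsonIgnoreProperties(ignoreUnknown = true)
case class Scope(id: String, name: String)

object JacksonVersionCheck {
  def main(args: Array[String]): Unit = {
    // Same pattern as RDDOperationScope.fromJson: an ObjectMapper with the
    // Scala module registered, reading the scope JSON from the error message.
    val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
    val scope = mapper.readValue("""{"id":"0","name":"hadoopRDD"}""", classOf[Scope])
    println(scope) // prints Scope(0,hadoopRDD) when the Jackson jars agree
  }
}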
Those Jackson jars under ${ZEPPELIN_HOME}/lib are exactly where the problem lies, so we can replace all three of them with the 2.4.4 versions:
-rw-r--r-- 1 iteblog iteblog   38597 Nov 25  2014 jackson-annotations-2.4.4.jar
-rw-r--r-- 1 iteblog iteblog  225302 Nov 25  2014 jackson-core-2.4.4.jar
-rw-r--r-- 1 iteblog iteblog 1076926 Nov 25  2014 jackson-databind-2.4.4.jar
After restarting Zeppelin and re-running the statements above, the problem no longer appears. I had originally planned to change the Jackson version directly in the pom.xml files and rebuild, but after searching every pom.xml in the project I could not find those three jars declared anywhere.
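To double-check which Jackson build the interpreter actually picked up after swapping the jars, a quick probe like the following can be run in a Zeppelin paragraph. This is just a sketch using standard JDK reflection, not something from the original troubleshooting session; it prints the location of the jar that provided the loaded jackson-databind classes.

// Print where the currently loaded jackson-databind classes come from;
// after the swap this should point at jackson-databind-2.4.4.jar.
val jacksonJar = classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource.getLocation
println(jacksonJar)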
Reader comment: the error "Could not initialize class org.apache.spark.rdd.RDDOperationScope" is also caused by the same thing...
Author reply: Thanks for the addition.