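Spark 1.6.1 was officially released on March 11, 2016. It is a maintenance release that consists mainly of stability fixes and contains no major changes. All users of 1.6.0 are encouraged to upgrade to this version.
The main bugs fixed in Spark 1.6.1 include:
1. OOM when writing data to a table with a large number of partitions: SPARK-12546
2. Many bug fixes for the experimental Dataset API: SPARK-12478, SPARK-12696, SPARK-13101, SPARK-12932
The complete list of fixes is available at: http://s.apache.org/spark-1.6.1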
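For a project that pulls Spark in through a build tool, moving from 1.6.0 to 1.6.1 is usually just a version bump. The fragment below is a minimal sketch, assuming an sbt build on Scala 2.10 (the default for the 1.6.x pre-built binaries) and a cluster that supplies Spark at runtime. The choice of modules is illustrative and not taken from the release notes.

```scala
// build.sbt (sketch): assumes an sbt project built against Scala 2.10.
scalaVersion := "2.10.6"

// Illustrative module selection; keep only the Spark modules the project uses.
// "provided" assumes the cluster ships the Spark jars at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided"
)
```

Because 1.6.1 is a maintenance release, application code written against 1.6.0 should not need source changes, though it is still worth rerunning the test suite after the bump.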
Release Notes - Spark - Version 1.6.1
Sub-task
- [SPARK-11031] - SparkR str() method on DataFrame objects
- [SPARK-12393] - Add read.text and write.text for SparkR
Bug
- [SPARK-7615] - MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero
- [SPARK-9844] - File appender race condition during SparkWorker shutdown
- [SPARK-10524] - Decision tree binary classification with ordered categorical features: incorrect centroid
- [SPARK-10847] - Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
- [SPARK-11394] - PostgreDialect cannot handle BYTE types
- [SPARK-11624] - Spark SQL CLI will set sessionstate twice
- [SPARK-11972] - [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session
- [SPARK-12006] - GaussianMixture.train crashes if an initial model is not None
- [SPARK-12010] - Spark JDBC requires support for column-name-free INSERT syntax
- [SPARK-12016] - word2vec load model can't use findSynonyms to get words
- [SPARK-12026] - ChiSqTest gets slower and slower over time when number of features is large
- [SPARK-12268] - pyspark shell uses execfile which breaks python3 compatibility
- [SPARK-12300] - Fix schema inference on local collections
- [SPARK-12316] - Stack overflow with endless call of `Delegation token thread` when application end.
- [SPARK-12327] - lint-r checks fail with commented code
- [SPARK-12346] - GLM summary crashes with NoSuchElementException if attributes are missing names
- [SPARK-12363] - PowerIterationClustering test case failed if we deprecated KMeans.setRuns
- [SPARK-12399] - Display correct error message when accessing REST API with an unknown app Id
- [SPARK-12424] - The implementation of ParamMap#filter is wrong.
- [SPARK-12453] - Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
- [SPARK-12470] - Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
- [SPARK-12477] - [SQL] Tungsten projection fails for null values in array fields
- [SPARK-12478] - Dataset fields of product types can't be null
- [SPARK-12486] - Executors are not always terminated successfully by the worker.
- [SPARK-12489] - Fix minor issues found by Findbugs
- [SPARK-12499] - make_distribution should not override MAVEN_OPTS
- [SPARK-12502] - Script /dev/run-tests fails when IBM Java is used
- [SPARK-12511] - streaming driver with checkpointing unable to finalize leading to OOM
- [SPARK-12517] - No default RDD name for ones created by sc.textFile
- [SPARK-12526] - `ifelse`, `when`, `otherwise` unable to take Column as value
- [SPARK-12546] - Writing to partitioned parquet table can fail with OOM
- [SPARK-12558] - AnalysisException when multiple functions applied in GROUP BY clause
- [SPARK-12562] - DataFrame.write.format("text") requires the column name to be called value
- [SPARK-12579] - User-specified JDBC driver should always take precedence
- [SPARK-12582] - IndexShuffleBlockResolverSuite fails in windows
- [SPARK-12589] - result row size is wrong in UnsafeRowParquetRecordReader
- [SPARK-12591] - NullPointerException using checkpointed mapWithState with KryoSerializer
- [SPARK-12598] - Bug in setMinPartitions function of StreamFileInputFormat
- [SPARK-12611] - test_infer_schema_to_local depended on old handling of missing value in row
- [SPARK-12617] - socket descriptor leak killing streaming app
- [SPARK-12624] - When schema is specified, we should give better error message if actual row length doesn't match
- [SPARK-12629] - SparkR: DataFrame's saveAsTable method has issues with the signature and HiveContext
- [SPARK-12638] - Parameter explanation not very accurate for rdd function "aggregate"
- [SPARK-12647] - 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator
- [SPARK-12654] - sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
- [SPARK-12662] - Add a local sort operator to DataFrame used by randomSplit
- [SPARK-12673] - Prepending base URI of job description is missing
- [SPARK-12678] - MapPartitionsRDD should clear reference to prev RDD
- [SPARK-12682] - Hive will fail if the schema of a parquet table has a very wide schema
- [SPARK-12685] - word2vec trainWordsCount gets overflow
- [SPARK-12690] - NullPointerException in UnsafeInMemorySorter.free()
- [SPARK-12696] - Dataset serialization error
- [SPARK-12708] - Sorting task error in Stages Page when yarn mode
- [SPARK-12711] - ML StopWordsRemover does not protect itself from column name duplication
- [SPARK-12734] - Fix Netty exclusions and use Maven Enforcer to prevent bug from being reintroduced
- [SPARK-12739] - Details of batch in Streaming tab uses two Duration columns
- [SPARK-12746] - ArrayType(_, true) should also accept ArrayType(_, false)
- [SPARK-12747] - Postgres JDBC ArrayType(DoubleType) 'Unable to find server array type'
- [SPARK-12755] - Spark may attempt to rebuild application UI before finishing writing the event logs in possible race condition
- [SPARK-12760] - inaccurate description for difference between local vs cluster mode in closure handling
- [SPARK-12780] - Inconsistency returning value of ML python models' properties
- [SPARK-12783] - Dataset map serialization error
- [SPARK-12784] - Spark UI IndexOutOfBoundsException with dynamic allocation
- [SPARK-12805] - Outdated details in doc related to Mesos run modes
- [SPARK-12807] - Spark External Shuffle not working in Hadoop clusters with Jackson 2.2.3
- [SPARK-12841] - UnresolvedException with cast
- [SPARK-12859] - Names of input streams with receivers don't fit in Streaming page
- [SPARK-12874] - ML StringIndexer does not protect itself from column name duplication
- [SPARK-12921] - Use SparkHadoopUtil reflection to access TaskAttemptContext in SpecificParquetRecordReaderBase
- [SPARK-12961] - Work around memory leak in Snappy library
- [SPARK-12989] - Bad interaction between StarExpansion and ExtractWindowExpressions
- [SPARK-13047] - Pyspark Params.hasParam should not throw an error
- [SPARK-13056] - Map column would throw NPE if value is null
- [SPARK-13082] - sqlCtx.real.json() doesn't work with PythonRDD
- [SPARK-13087] - Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations
- [SPARK-13088] - DAG viz does not work with latest version of chrome
- [SPARK-13101] - Dataset complex types mapping to DataFrame (element nullability) mismatch
- [SPARK-13121] - java mapWithState mishandles scala Option
- [SPARK-13122] - Race condition in MemoryStore.unrollSafely() causes memory leak
- [SPARK-13142] - Problem accessing Web UI /logPage/ on Microsoft Windows
- [SPARK-13153] - PySpark ML persistence failed when handle no default value parameter
- [SPARK-13195] - PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]
- [SPARK-13265] - Refactoring of basic ML import/export for other file system besides HDFS
- [SPARK-13298] - DAG visualization does not render correctly for jobs
- [SPARK-13300] - Spark examples page gives errors : Liquid error: pygments
- [SPARK-13312] - ML Model Selection via Train Validation Split example uses incorrect data
- [SPARK-13355] - Replace GraphImpl.fromExistingRDDs by Graph
- [SPARK-13371] - TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
- [SPARK-13390] - Java Spark createDataFrame with List parameter bug
- [SPARK-13410] - unionAll AnalysisException with DataFrames containing UDT columns.
- [SPARK-13441] - NullPointerException when either HADOOP_CONF_DIR or YARN_CONF_DIR is not readable
- [SPARK-13454] - Cannot drop table whose name starts with underscore
- [SPARK-13473] - Predicate can't be pushed through project with nondeterministic field
- [SPARK-13475] - HiveCompatibilitySuite should still run in PR builder even if a PR only changes sql/core
- [SPARK-13482] - `spark.storage.memoryMapThreshold` has two kind of the value.
- [SPARK-13697] - TransformFunctionSerializer.loads doesn't restore the function's module name if it's '__main__'
Documentation
- [SPARK-12351] - Add documentation of submitting Mesos jobs with cluster mode
- [SPARK-12507] - Expose closeFileAfterWrite and allowBatching configurations for Streaming
- [SPARK-12722] - Typo in Spark Pipeline example
- [SPARK-12758] - Add note to Spark SQL Migration section about SPARK-11724
- [SPARK-12814] - Add deploy instructions for Python in flume integration doc
- [SPARK-12894] - Add deploy instructions for Python in Kinesis integration doc
- [SPARK-13214] - Fix dynamic allocation docs
- [SPARK-13274] - Fix Aggregator Links on GroupedDataset Scala API
- [SPARK-13350] - Configuration documentation incorrectly states that PYSPARK_PYTHON's default is "python"
- [SPARK-13439] - Document that spark.mesos.uris is comma-separated
Improvement
- [SPARK-5273] - Improve documentation examples for LinearRegression
- [SPARK-11780] - Provide type aliases in org.apache.spark.sql.types for backwards compatibility
- [SPARK-12120] - Improve exception message when failing to initialize HiveContext in PySpark
- [SPARK-12411] - Reconsider executor heartbeats rpc timeout
- [SPARK-12450] - Un-persist broadcasted variables in KMeans
- [SPARK-12701] - Logging FileAppender should use join to ensure thread is finished
- [SPARK-12834] - Use type conversion instead of Ser/De of Pickle to transform JavaArray and JavaList
- [SPARK-12932] - Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant
- [SPARK-13094] - No encoder implicits for Seq[Primitive]
- [SPARK-13279] - Scheduler does O(N^2) operation when adding a new task set (making it prohibitively slow for scheduling 200K tasks)
New Feature
- [SPARK-10359] - Enumerate Spark's dependencies in a file and diff against it for new pull requests
Task
- [SPARK-13474] - Update packaging scripts to stage artifacts to home.apache.org
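Copyright of this original article belongs to 过往记忆大数据 (过往记忆). Reproduction without permission is prohibited.
Link to this article: Spark 1.6.1 Officially Released (https://www.iteblog.com/archives/1599.html)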