Category: Spark

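Advantages of Apache Spark over Hadoop

The following remarks come from Reynold Xin, an Apache Spark committer. In many respects, Spark is the best implementation of the MapReduce model. For example, from the standpoint of programming abstraction: 1. It generalizes the two Map/Reduce stages to support an arbitrary DAG of tasks. Most computation maps (no pun intended) into many maps and reduces with dependencies among them. In Spark's RDD…

w397090770 · 10 years ago (2015-03-09) · 8117 views · 0 comments · 9 likes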

Spark Function Guide: coalesce

Merges the partitions of an RDD into a new set of partitions. Function prototype: [code lang="scala"]def coalesce(numPartitions: Int, shuffle: Boolean = false)(implicit ord: Ordering[T] = null): RDD[T][/code] Returns a new RDD whose number of partitions equals numPartitions. If shuffle is set to true, a shuffle is performed. Example: [code lang="scala"]/** * User: 过往记忆 * Date: 15-03-09 * Time: AM 0…

w397090770 · 10 years ago (2015-03-09) · 14289 views · 1 comment · 5 likes
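The coalesce excerpt above is cut off before its example, so here is a minimal sketch of the behaviour it describes. It assumes Spark is on the classpath and runs in local mode. The object name `CoalesceSketch`, the app name, and the partition counts are illustrative choices, not taken from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; not from the original article.
object CoalesceSketch {
  // Returns the partition counts: (original, narrowed, widened with shuffle, widened without shuffle).
  def run(): (Int, Int, Int, Int) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("coalesce-sketch")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 100, 8)             // start with 8 partitions
      val narrowed = data.coalesce(2)                     // fewer partitions, no shuffle
      val widened = data.coalesce(16, shuffle = true)     // more partitions requires a shuffle
      val notWidened = data.coalesce(16)                  // without shuffle the count stays at 8
      (data.partitions.length,
        narrowed.partitions.length,
        widened.partitions.length,
        notWidened.partitions.length)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Note the last case: without `shuffle = true`, asking for more partitions than the RDD already has leaves the partition count unchanged, so the result only has exactly `numPartitions` partitions when shrinking or when shuffling.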

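Spark Function Guide: Series Index

Over the coming days this blog will explain every function on the Spark 1.2.1 RDD, covering what each function does, an example, and points to watch out for. One article will be published per day, so stay tuned. The functions to be covered are listed below in alphabetical order; those that are clickable have already been published. aggregate, aggregateByKey, cache, cartesian, checkpoint, coalesce, cogroup, groupWith, collect, toArray, collectAsMap, combineByKey, compute, context, spar…

w397090770 · 10 years ago (2015-03-08) · 7283 views · 0 comments · 6 likes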

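Spark Function Guide: checkpoint

Marks the current RDD for checkpointing. The function writes the RDD as binary files into the checkpoint directory, which is set with SparkContext.setCheckpointDir(). During checkpointing, all references from the RDD to its parent RDDs are removed. Calling checkpoint() on an RDD does not execute immediately; an action must be run to trigger it. Function prototype: [code lang="scala"]def checkpoint()[/code] Example…

w397090770 · 10 years ago (2015-03-08) · 60578 views · 0 comments · 7 likes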
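The point in the checkpoint excerpt above that checkpointing is lazy can be sketched as follows. This is only an illustration under assumptions: Spark in local mode, a temporary local directory as the checkpoint directory, and a hypothetical object name `CheckpointSketch`. On a real cluster the directory would normally be on a shared file system such as HDFS.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; not from the original article.
object CheckpointSketch {
  // Returns (isCheckpointed before the action, isCheckpointed after the action).
  def run(): (Boolean, Boolean) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("checkpoint-sketch")
    val sc = new SparkContext(conf)
    try {
      // Assumption: a throwaway local directory is good enough for a local-mode demo.
      val dir = Files.createTempDirectory("spark-checkpoint").toString
      sc.setCheckpointDir(dir)

      val rdd = sc.parallelize(1 to 10, 2).map(_ * 2)
      rdd.checkpoint()                    // only marks the RDD; nothing is written yet
      val before = rdd.isCheckpointed

      rdd.count()                         // the action triggers the actual checkpoint
      val after = rdd.isCheckpointed
      (before, after)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because the checkpoint is written by a separate pass after the first action, it is common to call `cache()` on the RDD before `checkpoint()` so the data is not computed twice.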

Spark Function Guide: cartesian

As the name suggests, this function computes the Cartesian product of two given RDDs. The official documentation says: Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in `this` and b is in `other`. Function prototype: [code lang="scala"]def cartesian[U: ClassTag](other: RDD[U]): RDD[(T, U)][/code] The function returns an RDD of pairs; the result…

w397090770 · 10 years ago (2015-03-07) · 11321 views · 0 comments · 5 likes
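As a small illustration of the cartesian prototype quoted above, the sketch below pairs every element of one RDD with every element of another. It assumes Spark in local mode. The object name `CartesianSketch` and the sample data are made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; not from the original article.
object CartesianSketch {
  // Returns all (Int, String) pairs, sorted so the result is deterministic.
  def run(): List[(Int, String)] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cartesian-sketch")
    val sc = new SparkContext(conf)
    try {
      val left = sc.parallelize(Seq(1, 2, 3))
      val right = sc.parallelize(Seq("a", "b"))
      // 3 elements x 2 elements = 6 pairs
      left.cartesian(right).collect().sorted.toList
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

The result size is the product of the two input sizes, so this operation gets expensive quickly on large RDDs.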

Spark Function Guide: cache

Caches the RDD using the MEMORY_ONLY storage level; internally it simply calls persist(). The official documentation defines it as: Persist this RDD with the default storage level (`MEMORY_ONLY`). Function prototype: [code lang="scala"]def cache(): this.type[/code] Example: [code lang="scala"]/** * User: 过往记忆 * Date: 15-03-04 * Time: 06:30 PM * blog: * Article URL: /archives/1274 * 过往记忆 blog, …

w397090770 · 10 years ago (2015-03-04) · 14195 views · 0 comments · 8 likes
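The cache excerpt above breaks off inside its example header, so the following is a stand-in sketch rather than the author's code. It assumes Spark in local mode and uses a hypothetical object name `CacheSketch`. It checks the two claims in the excerpt: that `cache()` sets the `MEMORY_ONLY` storage level and that it returns the same RDD (`this.type`).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; not from the original article.
object CacheSketch {
  // Returns (level is NONE before cache, level is MEMORY_ONLY after cache, cache() returned the same RDD).
  def run(): (Boolean, Boolean, Boolean) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000).map(_ * 2)
      val noneBefore = rdd.getStorageLevel == StorageLevel.NONE

      val cached = rdd.cache()            // equivalent to persist(StorageLevel.MEMORY_ONLY)
      val memoryOnlyAfter = rdd.getStorageLevel == StorageLevel.MEMORY_ONLY

      cached.count()                      // first action materializes the cached partitions
      cached.count()                      // later actions can reuse them
      (noneBefore, memoryOnlyAfter, cached eq rdd)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```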

Spark Function Guide: aggregateByKey

This function is similar to aggregate, but it operates on pair RDDs. It was officially introduced in Spark 1.1.0. The official documentation defines it as: Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's, as in scala.Traversabl…

w397090770 · 10 years ago (2015-03-02) · 39650 views · 2 comments · 35 likes
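To make the "different result type U than V" part of the aggregateByKey definition above concrete, here is a hedged sketch that turns `Int` values into a `(sum, count)` tuple per key. It assumes Spark in local mode. The object name `AggregateByKeySketch` and the sample data are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings the pair-RDD implicits into scope on Spark 1.2.x;
// later Spark versions find them without this import.
import org.apache.spark.SparkContext._

// Hypothetical demo object; not from the original article.
object AggregateByKeySketch {
  // Returns key -> (sum of values, number of values).
  def run(): Map[String, (Int, Int)] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("aggregate-by-key-sketch")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(
        Seq(("a", 1), ("a", 5), ("b", 3), ("a", 2), ("b", 7)), 2)

      val sumAndCount = pairs.aggregateByKey((0, 0))(
        // seqOp: merge one value V (Int) into the accumulator U ((Int, Int)) within a partition
        (acc, v) => (acc._1 + v, acc._2 + 1),
        // combOp: merge two accumulators U coming from different partitions
        (x, y) => (x._1 + y._1, x._2 + y._2)
      )
      sumAndCount.collect().toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Both operations here are associative and the zero value `(0, 0)` is neutral, so the result does not depend on how the data happens to be partitioned.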

Spark Function Guide: aggregate

Let us first look at the official documentation's definition of the aggregate function: Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions…

w397090770 · 10 years ago (2015-02-12) · 37467 views · 5 comments · 23 likes
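The two-step aggregation described in the aggregate definition above (first within each partition, then across partitions) can be sketched like this. It assumes Spark in local mode. The object name `AggregateSketch` and the input range are chosen for the example and do not come from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; not from the original article.
object AggregateSketch {
  // Returns (sum, count) of the numbers 1 to 10.
  def run(): (Int, Int) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("aggregate-sketch")
    val sc = new SparkContext(conf)
    try {
      val numbers = sc.parallelize(1 to 10, 3)
      numbers.aggregate((0, 0))(
        // seqOp: fold one element T (Int) into the per-partition accumulator U ((Int, Int))
        (acc, x) => (acc._1 + x, acc._2 + 1),
        // combOp: merge the accumulators of two partitions
        (a, b) => (a._1 + b._1, a._2 + b._2)
      )
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (sum, count) = run()
    println(s"sum=$sum count=$count average=${sum.toDouble / count}")
  }
}
```

Keep in mind that the zero value is applied once per partition and once more when the partition results are merged, so it should be a true neutral element such as `(0, 0)` here.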

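Learning Spark: Full Edition Download

The link here is to the complete edition of Learning Spark, which differs from the earlier preview edition; this is not a clickbait title. The e-book is provided as mobi, pdf, and epub files. If you own an Amazon Kindle e-reader, you can read the mobi and pdf files directly. If you are on a computer, you can download the corresponding PC reader instead. If you need a reader application, feel free to contact me. If you want timely updates on Spark, Hadoop, or HBase…

w397090770 · 10 years ago (2015-02-11) · 51038 views · 305 comments · 70 likes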

Spark 1.2.1 Stable Version Released

Spark 1.2.1 was officially released on February 9, 2015 (US time). The announcement email reads as follows:

Hi All,
I've just posted the 1.2.1 maintenance release of Apache Spark. We recommend all 1.2.0 users upgrade to this release, as this release includes stability fixes across all components of Spark.
- Download this release: http://spark.apache.org/downloads.html
- View the release notes: http://spark.apache.org/releases/spark-release-1-2-1.html
- …

w397090770 · 10 years ago (2015-02-10) · 3507 views · 0 comments · 2 likes
