Follow the WeChat official account on big data architecture and case studies: 过往记忆大数据

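This is the table of contents for Learning Spark. Click a title to open the corresponding reading page. A PDF version is being compiled, and the download link will be published once it is complete.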


To keep up with new articles on Spark, Hadoop, or HBase, follow the WeChat official account: iteblog_hadoop

Table of Contents

  1. Preface
    1. Audience
    2. How This Book is Organized
    3. Supporting Books
    4. Code Examples
    5. Early Release Status and Feedback
  2. 1. Introduction to Data Analysis with Spark
    1. What is Apache Spark?
    2. A Unified Stack
      1. Spark Core
      2. Spark SQL
      3. Spark Streaming
      4. MLlib
      5. GraphX
      6. Cluster Managers
    3. Who Uses Spark, and For What?
      1. Data Science Tasks
      2. Data Processing Applications
    4. A Brief History of Spark
    5. Spark Versions and Releases
    6. Spark and Hadoop
  3. 2. Downloading and Getting Started
    1. Downloading Spark
    2. Introduction to Spark’s Python and Scala Shells
    3. Introduction to Core Spark Concepts
    4. Standalone Applications
      1. Initializing a SparkContext
    5. Conclusion
  4. 3. Programming with RDDs
    1. RDD Basics
    2. Creating RDDs
    3. RDD Operations
      1. Transformations
      2. Actions
      3. Lazy Evaluation
    4. Passing Functions to Spark
      1. Python
      2. Scala
      3. Java
    5. Common Transformations and Actions
      1. Basic RDDs
        1. Transformations
        2. Element-wise transformations
        3. Pseudo Set Operations
        4. Actions
      2. Converting Between RDD Types
        1. Scala
        2. Java
        3. Python
    6. Persistence (Caching)
    7. Conclusion
  5. 4. Working with Key-Value Pairs
    1. Motivation
    2. Creating Pair RDDs
    3. Transformations on Pair RDDs
      1. Aggregations
        1. Tuning the Level of Parallelism
      2. Grouping Data
      3. Joins
      4. Sorting Data
    4. Actions Available on Pair RDDs
    5. Data Partitioning
      1. Determining an RDD’s Partitioner
      2. Operations that Benefit from Partitioning
      3. Operations that Affect Partitioning
      4. Example: PageRank
      5. Custom Partitioners
    6. Conclusion
  6. 5. Loading and Saving Your Data
    1. Motivation
    2. Choosing a Format
    3. Formats
      1. Text Files
      2. JSON
      3. CSV (Comma Separated Values) / TSV (Tab Separated Values)
      4. Sequence Files
      5. Object Files
      6. Hadoop Input and Output Formats
        1. Protocol Buffers
      7. Hive and Parquet
    4. File Systems
      1. Local/“Regular” FS
        1. Amazon S3
      2. HDFS
    5. Compression
    6. Databases
      1. Elasticsearch
      2. Mongo
      3. Cassandra
      4. HBase
      5. Java Database Connectivity (JDBC)
    7. Conclusion
  7. About the Authors
  8. Copyright
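Unless otherwise stated, all articles on this blog are original.
Copyright of original articles belongs to 过往记忆大数据 (过往记忆); reproduction without permission is prohibited.
Link to this article: Learning Spark (Table of Contents) (https://www.iteblog.com/learning-spark-table-of-contents/)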