Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph computation, and Spark Streaming.
Downloading
You can get Spark from the downloads page of the Spark project. This document is based on Spark 1.1.0. The downloads page contains Spark packages pre-built for several popular HDFS versions. If you would rather build Spark from source yourself, see building Spark with Maven.
Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). Running Spark on a single machine is easy: all you need is a Java installation, with the PATH and JAVA_HOME environment variables set accordingly.
Spark runs on Java 6+ and Python 2.6+. Since Spark 1.1.0 uses Scala 2.10, you will need a compatible Scala version (2.10.x).
Running the Examples and the Shell
Spark ships with a number of sample programs, located in the examples/src/main directory, with Scala, Java, and Python versions. To run one of the Java or Scala examples, invoke bin/run-example from the top-level Spark directory:
./bin/run-example SparkPi 10
You can also run Spark interactively through the Scala shell, which is a great way to learn the framework:
./bin/spark-shell --master local[2]
The --master option specifies the master URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start with local when testing Spark programs. For a full list of options, run the shell with the --help option:
./bin/spark-shell --help
Usage: ./bin/spark-shell [options]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.
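Once the shell starts, a SparkContext is already available as the variable sc. As a minimal sketch of a first session (assuming the shell was started from the top-level Spark directory, so that README.md resolves; the file name is only an example), you could count the lines of the README:

scala> val textFile = sc.textFile("README.md")   // build an RDD from a local text file
scala> textFile.count()                          // total number of lines in the file
scala> textFile.filter(line => line.contains("Spark")).count()   // lines mentioning Spark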
Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:
./bin/pyspark --master local[2]
Python versions of the examples are also provided; they can be run with the following command:
./bin/spark-submit examples/src/main/python/pi.py 10
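Beyond the bundled examples, spark-submit is also how you launch your own applications. Below is a minimal sketch of a standalone Scala application against the Spark 1.1.0 API; the object name SimpleApp, the jar name, and the README.md input are illustrative, not part of the original examples:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    // Leave the master unset here so spark-submit's --master option can supply it
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    // Count the lines of the input file that mention Spark
    val numSpark = sc.textFile("README.md").filter(_.contains("Spark")).count()
    println("Lines with Spark: " + numSpark)

    sc.stop()
  }
}

After packaging it into a jar (for example with sbt package), it could be launched locally with something like:

./bin/spark-submit --class SimpleApp --master local[2] simple-app.jar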
Launching on a Cluster
The Spark cluster mode overview explains the key concepts of running on a cluster. Spark can run both by itself and on top of several existing cluster managers. It currently provides the following options for deployment (a brief configuration sketch follows the list):
1. Amazon EC2: the EC2 scripts let you launch a cluster in about 5 minutes
2. Standalone Deploy Mode: the simplest way to deploy a Spark cluster
3. Apache Mesos
4. Hadoop YARN
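Whichever manager you use, the application code stays the same; only the master URL passed to --master (or set in SparkConf) changes. As a hedged sketch, with a hypothetical standalone master running on master-host, a driver could also hard-code the master instead of relying on spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

// spark://host:port addresses a standalone master; a Mesos master would use mesos://host:port
val conf = new SparkConf()
  .setAppName("Cluster Example")
  .setMaster("spark://master-host:7077")  // hypothetical host; 7077 is the standalone default port
val sc = new SparkContext(conf)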