Tuesday, June 14, 2016

IPython with Spark

Install Spark on Windows 10
Go to http://spark.apache.org/docs/latest/ and find out, in the Downloading section, which Java version you need to install on your PC.

Go to java.com, then download and install that version of Java.
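
To confirm which Java version ended up on the PATH, something like the following could be used. It is only a sketch: the helper names parse_java_version and installed_java_version are made up for this post, and it assumes Python 3.5 or later and the usual `version "..."` banner that `java -version` prints.

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def installed_java_version():
    """Return the Java version found on PATH, or None if java is missing."""
    try:
        # `java -version` writes its banner to stderr, so capture both streams.
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # Raised when no java executable can be found on PATH.
        return None
    return parse_java_version(result.stderr + result.stdout)
```

Calling installed_java_version() from a Python prompt should return the version string if Java is installed, which can then be compared against what the Spark documentation asks for.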


Go to http://spark.apache.org/downloads.html

Select "Pre-built for Hadoop 2.6" as the package type.

Click spark-1.6.1-bin-hadoop2.6.tgz; this takes you to another page.

There, click http://ftp.wayne.edu/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz

Unzip it into a folder and, optionally, rename that folder to spark.
Go to the DOS prompt and navigate to the bin directory inside that folder.
Type spark-shell and press ENTER. You should get a Scala prompt. If so, the installation is OK.
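
Windows has no built-in tool for .tgz files, so the unpacking step can also be done from Python. This is a rough sketch using the standard tarfile module; the function name extract_spark and the paths in the usage note are placeholders, not anything Spark requires.

```python
import tarfile


def extract_spark(archive_path, dest_dir):
    """Unpack a .tgz archive into dest_dir and return its top-level names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Collect the first path component of every member, without duplicates.
        top_level = sorted(
            {member.name.split("/")[0] for member in archive.getmembers()}
        )
        archive.extractall(dest_dir)
    return top_level
```

For example, extract_spark("spark-1.6.1-bin-hadoop2.6.tgz", "C:/") would be expected to unpack the download into a single folder whose name is returned in the list (the target C:/ is just an assumption here). Only use this on archives from a source you trust.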

Go to https://www.continuum.io/downloads, download the appropriate version of Anaconda, and install it.

Go to the DOS prompt and run:
conda update conda
conda update python

From https://ysinjab.com/2015/03/28/hello-spark/

How to use IPython & IPython notebook with Apache Spark:

  • IPython:
    • Spark version <= 1.1.0 : set environment variable IPYTHON = 1
    • Spark version > 1.1.0 : set environment variable PYSPARK_DRIVER_PYTHON = ipython
  • IPython notebook:
    • Spark version <= 1.1.0 : set environment variables IPYTHON = 1 and IPYTHON_OPTS = notebook
    • Spark version > 1.1.0 : set environment variables PYSPARK_DRIVER_PYTHON = ipython and PYSPARK_DRIVER_PYTHON_OPTS = notebook
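
The rules in the list above can be written down as a small lookup. This is only an illustration: the function name ipython_env is made up, and the 1.1.0 cutoff is taken exactly as listed above rather than checked against every Spark release.

```python
def ipython_env(spark_version, notebook=False):
    """Return the environment variables from the list above for a Spark version."""
    # Compare numerically so that, say, "1.10.0" sorts after "1.2.0".
    version = tuple(int(part) for part in spark_version.split(".")[:3])
    if version <= (1, 1, 0):
        env = {"IPYTHON": "1"}
        if notebook:
            env["IPYTHON_OPTS"] = "notebook"
    else:
        env = {"PYSPARK_DRIVER_PYTHON": "ipython"}
        if notebook:
            env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env
```

For the spark-1.6.1 download used here, ipython_env("1.6.1", notebook=True) gives the two PYSPARK_DRIVER_PYTHON variables. At the DOS prompt the same values can be set with the set command (for example, set PYSPARK_DRIVER_PYTHON=ipython) before running pyspark.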
Go to the spark/bin folder from the command prompt and run pyspark. The IPython notebook should fire up. In a notebook cell, issue the command print sc (print(sc) on Python 3). You should get a valid SparkContext object.
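
Printing sc only shows that the object exists. To check that it can actually run a job, a tiny test like the one below could be pasted into a notebook cell. The function name smoke_test is made up for this post; it assumes sc is the SparkContext that pyspark creates for you.

```python
def smoke_test(sc):
    """Run a tiny job on the given SparkContext and return the sum of 0..9."""
    # parallelize() turns a local collection into an RDD;
    # sum() is an action, so it forces the job to run.
    return sc.parallelize(range(10)).sum()
```

Calling smoke_test(sc) in the notebook should return 45 if Spark is working.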