CWYAlpha

Just another WordPress.com site

Thought this was cool: Deploying a GraphLab/Spark/Mesos cluster on EC2



I got the following instructions from my collaborator Jay (Haijie Gu), who spent some time learning Spark cluster deployment and adapted those useful scripts for use in GraphLab.

This tutorial will help you spawn a GraphLab distributed cluster, run an alternating least squares task, collect the results, and shut down the cluster.

This tutorial is a very new beta release. Please contact me if you are brave enough to try it out.

Step 0: Requirements

1) You should have an Amazon EC2 account eligible to run in the us-east-1a zone.
2) Find your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY using the Amazon AWS console.
3) Download your private/public key pair (called graphlab.pem here).
4) Download GraphLab 2.1 using the instructions here.

Step 1: Environment setup

Edit your .bashrc or .bash_profile (remember to source it after editing):
export AWS_ACCESS_KEY_ID=[ Your access key ]
export AWS_SECRET_ACCESS_KEY=[ Your access key secret ]
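A quick pre-flight check can save a confusing failure later (gl-ec2 aborts if the variables are unset, as shown in the troubleshooting section). This is a hedged sketch; check_aws_env is a helper name invented for this example, but the two variable names are exactly those from Step 1:

```shell
# Verify that both AWS credential variables from Step 1 are exported
# before calling gl-ec2. Prints a line per missing variable and
# returns non-zero if any is unset.
check_aws_env() {
  missing=0
  for v in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    eval "val=\$$v"
    if [ -z "$val" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}
```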

Step 2: Start the cluster

$ cd ~/graphlabapi/scripts/ec2
$ ./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey -z us-east-1a -s 1 launch launchtest
(In the above command we created a 2-node cluster in the us-east-1a zone: -s is the number of slaves, launch is the action, and launchtest is the name of the cluster.)
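Every gl-ec2 call in this tutorial repeats the same identity file and key name, so a small wrapper can keep them in one place. This is just a convenience sketch; GL_EC2 and gl are names invented for this example, while the flags and paths are the ones used throughout the tutorial:

```shell
# Wrap gl-ec2 with the identity file and key name used in this tutorial.
# GL_EC2 may be overridden; it defaults to the script location from Step 2.
GL_EC2=${GL_EC2:-$HOME/graphlabapi/scripts/ec2/gl-ec2}
gl() {
  "$GL_EC2" -i ~/.ssh/graphlab.pem -k graphlabkey "$@"
}
# Usage, equivalent to the launch command above:
#   gl -z us-east-1a -s 1 launch launchtest
```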

Step 2.1: Update the image to contain the latest code (optional)

$./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey update launchtest

This operation pulls the latest GraphLab from Mercurial, compiles the code, and sends
the resulting binaries to the cluster slaves.
Note: this operation may be slow, since the project is fully compiled. It should be done
only once when starting the image.

Step 2.2: Start Hadoop (mandatory when using HDFS)

This operation is needed when you want to work with HDFS:

./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey start-hadoop launchtest



Step 3: Run alternating least squares demo

This step runs ALS (alternating least squares) on the cluster using a small Netflix subset.
It first downloads the data from the web (http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.train and http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.validate), copies it into HDFS, and runs 5 alternating least squares iterations:

./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey als_demo launchtest

After the run completes, you can log into the master node and view the output files in the folder ~/graphlabapi/release/toolkits/collaborative_filtering/

The algorithm and exact input format are explained here.

Step 4: Shut down the cluster

$ ./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey destroy launchtest

Advanced functionality:

  Step 5: Login into the master node

$ ./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey login launchtest
echo "MaxStartups 100" | sudo tee -a /etc/ssh/sshd_config

  Step 6: Manual building of GraphLab code

On the master:
cd ~/graphlabapi/release/toolkits
hg pull; hg update;
make

# Sync the binary folder to the slaves
cd ~/graphlabapi/release/toolkits;  ~/graphlabapi/scripts/mpirsync


# Sync the local dependency folder to the slaves
cd ~/graphlabapi/deps/local; ~/graphlabapi/scripts/mpirsync

Manual run of the ALS demo

 Log into the master node, then:
 cd graphlabapi/release/toolkits/collaborative_filtering/
 mkdir smallnetflix
 cd smallnetflix/
 wget http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.train
 wget http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.validate
 cd ..
 hadoop fs -copyFromLocal smallnetflix/ /
 mpiexec -n 2 ./als --matrix hdfs://`hostname`/smallnetflix --max_iter=3 --ncpus=1
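The --matrix argument above is built from the master's hostname. As a hedged sketch of that one detail (hdfs_uri is a helper name invented for this example):

```shell
# Build the HDFS URI for an input directory the same way the mpiexec
# line above does, using this machine's hostname as the namenode host.
hdfs_uri() {
  echo "hdfs://$(hostname)/$1"
}
# e.g.: mpiexec -n 2 ./als --matrix "$(hdfs_uri smallnetflix)" --max_iter=3 --ncpus=1
```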

Troubleshooting 

Known Errors:
Starting the dfs:
namenode running as process 1302. Stop it first.
localhost: datanode running as process 1435. Stop it first.
ip-10-4-51-142: secondarynamenode running as process 1568. Stop it first.
Starting map reduce:
jobtracker running as process 1647. Stop it first.
localhost: tasktracker running as process 1774. Stop it first.

Solution:
Kill Hadoop and restart it using the commands:

./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey stop-hadoop launchtest

./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey start-hadoop launchtest

Error:
12/10/20 13:37:18 INFO ipc.Client: Retrying connect to server: domU-12-31-39-16-86-CC/10.96.133.54:8020. Already tried 0 time(s).

Solution: run jps to verify whether one of the Hadoop daemons failed.

./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey jps launchtest

> jps
1669 TaskTracker
2087 Jps
1464 SecondaryNameNode
1329 DataNode
1542 JobTracker
In the above example, the NameNode is missing (not running). Stop Hadoop using the stop-hadoop command, then restart it with start-hadoop.
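Scanning jps output by eye is error-prone, so here is a hedged sketch that checks it automatically. check_hadoop_daemons is a helper name invented for this example; the five daemon names are the ones from the jps listing above:

```shell
# Given captured jps output, print which of the five expected Hadoop
# daemons are not listed. Uses word matching so that "NameNode" does
# not falsely match inside "SecondaryNameNode".
check_hadoop_daemons() {
  jps_out="$1"
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    echo "$jps_out" | grep -qw "$d" || echo "missing: $d"
  done
}
```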

Error:

mpiexec was unable to launch the specified application as it could not access
or execute an executable:


Executable: /home/ubuntu/graphlabapi/release/toolkits/graph_analytics/pagerank
Node: domU-12-31-39-0E-C8-D2


while attempting to start process rank 0.
Solution:
The executable is missing. Run update:

$ ./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey update launchtest

Error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x000000000056c0be, pid=1638, tid=140316305243104
#
# JRE version: 6.0_26-b03
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [als+0x16c0be]  graphlab::distributed_ingress_base<vertex_data, edge_data>::finalize()+0xe0e
#
# An error report file with more information is saved as:
# /home/ubuntu/graphlabapi/release/toolkits/collaborative_filtering/hs_err_pid1638.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#


Solution:
1) Update the code:

$./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey update launchtest

2) If the problem persists, submit a bug report to the GraphLab users list.

Error:
bickson@thrust:~/graphlab2.1/graphlabapi/scripts/ec2$ ./gl-ec2 -i ~/.ssh/graphlab.pem -k graphlabkey login launchtest
ERROR: The environment variable AWS_ACCESS_KEY_ID must be set
Solution:
You need to set the environment variables, as explained in Step 1.


from Large Scale Machine Learning and Other Animals: http://bickson.blogspot.com/2012/10/deploying-graphlabsparkmesos-cluster-on.html

Written by cwyalpha

October 21, 2012 at 3:55 pm

Posted in Uncategorized
