GraphChi visual toolkit – or understanding your data
A few weeks ago I wrote about the Orange D4D data on cellular user behavior in Africa.
The phone call data is given as a text file in the following format:
[calling user] [receiving user]\n
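To make the format concrete, here is a minimal Python sketch of reading such a file into a list of edges and counting calls per user. It is not part of the toolkit; the function names `read_edges` and `call_counts` are made up for illustration, and it assumes user ids are integers separated by whitespace.

```python
from collections import Counter

def read_edges(lines):
    """Parse '[calling user] [receiving user]' lines into (caller, receiver) pairs."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Assumption: user ids are integers.
        edges.append((int(parts[0]), int(parts[1])))
    return edges

def call_counts(edges):
    """Count how many calls each user made and how many each user received."""
    made = Counter(caller for caller, _ in edges)
    received = Counter(receiver for _, receiver in edges)
    return made, received

# Tiny made-up sample, not real D4D data.
sample = ["1 2", "1 3", "2 3", "", "4 1"]
edges = read_edges(sample)
made, received = call_counts(edges)
print(len(edges), made[1], received[3])  # prints: 4 2 2
```

With a real file you would pass an open file handle instead of the `sample` list, since iterating over a file yields its lines.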
Since there are hundreds of thousands of phone calls, it is very hard to understand what the network structure actually looks like. I decided to write a quick visual tool that helps users examine their graphs and better understand their structure.
Here is how you can try it out:
1. Check out GraphChi from Mercurial using the instructions here.
2. # cd graphchi; bash install.sh; make parsers; make ga
3. # cd toolkits/visual
4. Run the visual toolkit to create a subgraph representation. You will need to give the graph input file name and the number of edges to extract. It is recommended to display fewer than 1000 edges, or else the plot may be slow.
# bash make_data.csv.sh -f [input graph name] -n [number of lines]
For example, you can use the sample graph provided:
# bash make_data.csv.sh -f `pwd`/sample_graph -n 1000
5. # firefox index.html
Here are some examples of the images I got when playing with the Orange data:
As you can see, different kinds of users emerge very clearly. The red nodes are the “seed” users from which the graph was traversed. Each edge is a phone call connection. We can see different kinds of users:
1) unsocial – rarely makes phone calls
2) small network – makes a few calls to neighbors
3) nagging – often calls call centers (highly connected neighbors)
4) social – connected to a lot of friends who are themselves interconnected
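The four user types above were spotted by eye in the plots, but they can also be roughed out from the graph itself. The following is only a sketch under my own assumptions: the function names and all thresholds (`hub_degree`, `social_degree`, `social_clustering`) are invented for illustration and are not something the toolkit computes.

```python
def build_neighbors(edges):
    """Undirected neighbor sets from (caller, receiver) pairs, ignoring self-calls."""
    neighbors = {}
    for a, b in edges:
        if a != b:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    return neighbors

def clustering(node, neighbors):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    nbrs = list(neighbors.get(node, ()))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in neighbors.get(nbrs[i], ())
    )
    return links / (k * (k - 1) / 2)

def label_user(node, neighbors, hub_degree=50, social_degree=5, social_clustering=0.3):
    """Heuristic label for one user; all thresholds are arbitrary illustrative values."""
    nbrs = neighbors.get(node, set())
    degree = len(nbrs)
    if degree <= 1:
        return "unsocial"
    # A neighbor with very many connections is treated as a call-center-like hub.
    if any(len(neighbors.get(n, ())) >= hub_degree for n in nbrs):
        return "nagging"
    if degree >= social_degree and clustering(node, neighbors) >= social_clustering:
        return "social"
    return "small network"
```

Sensible thresholds depend heavily on the data set and on how many edges were sampled, so treat the labels as a starting point for exploration rather than a classification.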
Next I tried the same visualization on some Twitter data I have. Each link is a tweet or retweet directed to a certain user.
Next I looked at some phone call data from a large European country. The graph captures a time span of only several minutes. It is interesting to see that from the gray node in the middle there is a 6-hop chain of someone who called someone who called someone, and so on, all in a very short time.
And here is a sample webpage which shows the output of the visualization.
A few additional notes on the command line options:
1) It is possible to traverse a graph starting from a set of seed nodes. Use the -s XXX command line option, for example -s 12 or -s 192,31990,2312.
2) When selecting seed nodes, specify the number of hops to traverse using the -h XX option. For example, -h 3 will traverse 3 hops around the set of seed nodes.
3) If your input file is not in sparse Matrix Market format but in [from] [to] format, you need to specify an upper limit on the number of graph nodes using the -o XX option.
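To illustrate what traversing a number of hops around a set of seed nodes means, here is a small breadth-first search sketch in Python. It is not the toolkit's code: the function name `k_hop_subgraph` is hypothetical, and it assumes edges are followed in both directions, which may differ from what the toolkit actually does.

```python
from collections import deque

def k_hop_subgraph(edges, seeds, hops):
    """Return (depth of each reached node, edges among reached nodes).

    Assumption: edges are treated as undirected for the traversal.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    depth = {s: 0 for s in seeds}  # seed nodes sit at hop 0
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the requested number of hops
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    # Keep only edges whose two endpoints were both reached.
    kept = [(a, b) for a, b in edges if a in depth and b in depth]
    return depth, kept

# A chain 1-2-3-4-5, starting from seed 1 and going 2 hops out.
chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
depth, kept = k_hop_subgraph(chain, seeds=[1], hops=2)
print(depth)  # prints: {1: 0, 2: 1, 3: 2}
print(kept)   # prints: [(1, 2), (2, 3)]
```

In this sketch the seeds play the role of the values passed with -s and `hops` plays the role of the value passed with -h.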
What does your data look like? I would love any feedback from people who are trying to visualize their own graphs… let me know if you have any questions about the setup.
from Large Scale Machine Learning and Other Animals: http://bickson.blogspot.com/2012/11/graphchi-visual-toolkit-or.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FsYXZE+%28Large+Scale+Machine+Learning+and+Other+Animals%29