27/05/2015

Struggle for social graph and datavizzz


The Holy Grail of the day
Build an interactive data visualization of my own networks where I could jump from one network to another and navigate through time. On paper it sounds easy: use your own network data (Facebook (FB), LinkedIn (LI), Twitter, Instagram, EyeEm...) to practise on social graphs. In other words, use the tools from your beloved statistics toolbox (Matlab, Python, R...).

The why
Why, why and why use your own data? The first and, to me, obvious reason: you know the data - or at least part of it - so it should be a bit easier to navigate through it. As for the first why, why bother at all? Once again it's simple: the answer is curiosity. The more people use a buzzword in every conversation, the less they understand what it means, and I don't like not understanding.

Social graphs are interesting because they illustrate part of our multiple identities - provided, of course, that you look at your own network rather than at the interactions between people forming a group, which is also interesting (data journalism loves to dissect political social networks to find out who the leaders are). We don't know the same people, or play the same character, from one network to the next, because each network describes different interactions (e.g. FB vs LI).

The reverse engineer path
The path I followed probably wasn't the most efficient, but I'm getting better every day. Plotting a social graph isn't the most difficult task: with Gephi you can generate beautiful graphs relatively fast. In parallel I dug into statistics and social network analysis to refresh parts of my brain on the topic.
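To give an idea of how little Gephi needs to get started, here is a minimal sketch that writes an edge table with `Source` and `Target` columns, the layout Gephi's spreadsheet import understands. The names and pairs are invented for illustration, and the helper `edges_to_csv` is hypothetical, not something taken from a library.

```python
import csv
import io

# Hypothetical "who knows whom" pairs, invented for illustration.
pairs = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol")]

def edges_to_csv(edge_pairs, handle):
    """Write an edge table with Source and Target columns to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for source, target in edge_pairs:
        writer.writerow([source, target])

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
edges_to_csv(pairs, buffer)
edge_table = buffer.getvalue()
```

To get a file you can import, pass a handle opened with `open("edges.csv", "w", newline="")` instead of the buffer (the file name is just an example).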

The prototype
As InMaps isn't available any more, I ended up with another automatic solution called socilab.com, which requires you to log in with your LinkedIn account. It's nicely made: you get a graph and several scores that describe your network and the role you play in it. Sadly it is limited to 500 contacts, so if your contact list is much bigger the analysis is incomplete. But the website lets you download this version of your contact list, and what you actually download is the data from your LI account formatted as an adjacency matrix. I had to clean the data a bit using Python and Pandas, which make any manipulation of CSV files a real pleasure.
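The kind of cleaning I mean can be sketched roughly as follows. The file layout here is an assumption made for illustration (first column holding the contact names, header row repeating them), and the real export may well look different. The sample data and the helper `clean_adjacency` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical export: first column holds contact names, header row repeats them.
# The real file layout may differ.
raw_csv = io.StringIO(
    "name, Alice ,Bob,Carol\n"
    "Alice,0,1,\n"
    "Bob,1,0,1\n"
    "Carol,0,1,0\n"
)

def clean_adjacency(source):
    """Load a contact-by-contact CSV and return a tidy 0/1 integer DataFrame."""
    df = pd.read_csv(source, index_col=0)
    # Strip stray whitespace from the names on both axes.
    df.index = df.index.astype(str).str.strip()
    df.columns = df.columns.astype(str).str.strip()
    # Treat missing cells as "no connection" and force integer values.
    df = df.fillna(0).astype(int)
    # Keep only contacts that appear both as a row and as a column.
    common = [name for name in df.index if name in df.columns]
    return df.loc[common, common]

adjacency = clean_adjacency(raw_csv)
```

Treating an empty cell as 0 is a choice, not a given: if blanks in your file mean "unknown" rather than "not connected", you may prefer to keep them as missing values.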

The adjacency matrix
This matrix - if I understood correctly - should be square, with columns and rows carrying the same names: your contact list. The cell value, 0 or 1, tells you whether two contacts know each other; the matrix isn't symmetric. It's a particular kind of data: if you look at a FB group of people who like peanut butter toast for dinner, they may not know each other, but they are all connected by their irrational attraction to fatty cream and their low regard for safety.
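To make the structure concrete, here is a small sketch in plain Python that builds such a matrix from a list of "knows" pairs and then checks whether it is symmetric, rather than taking that for granted. The contacts and pairs are invented, and the helper names (`build_adjacency`, `is_symmetric`, `one_way_links`) are hypothetical.

```python
# Hypothetical contacts and "knows" pairs, invented for illustration.
contacts = ["Alice", "Bob", "Carol", "Dave"]
knows = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Bob"), ("Dave", "Alice")]

def build_adjacency(names, pairs):
    """Return a square 0/1 matrix where matrix[i][j] == 1 means names[i] knows names[j]."""
    position = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target in pairs:
        matrix[position[source]][position[target]] = 1
    return matrix

def is_symmetric(matrix):
    """True when every link is recorded in both directions."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))

def one_way_links(names, matrix):
    """List the (source, target) pairs that have no matching reverse link."""
    return [
        (names[i], names[j])
        for i in range(len(names))
        for j in range(len(names))
        if matrix[i][j] == 1 and matrix[j][i] == 0
    ]

matrix = build_adjacency(contacts, knows)
symmetric = is_symmetric(matrix)
missing_reverse = one_way_links(contacts, matrix)
```

With this toy data the Bob/Carol link is recorded both ways while Alice to Bob and Dave to Alice are one-way, so the check fails and `one_way_links` shows which cells cause it. Running the same check on a real export is a quick way to see which case your own data falls into.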

Where the trouble starts
It starts right when you want to access your data... Building this matrix by hand is doable but a really silly task. And neither LI nor FB makes the task easy. You will need to play with their APIs (I haven't checked Twitter, Instagram and the others yet) to access your account and download/build your matrix.


