MrBonsoir à l'internet: big data


30/11/2015

A small recap for our HR friends

I can do Python, I can do data analysis (stats, modeling, machine learning, analytics and so on), I can do data visualization, I can learn fast, I can create, I can communicate, I can translate (or report information to different departments), I can write, I can connect people, I can invest myself fully, I can be a team player. Basically I can do a lot and I want to do a lot. I also run pretty fast.

What I can't do is come up with miracle solutions to every problem before knowing the problems... Unless it's pure luck, in which case I should really start playing the lottery.

And I'm looking for a job.

24/11/2015

Some data about a data conference

I had the chance to attend the Data Natives 2015 event last week in Berlin, a first-time event with FinTech, IoT and of course Big Data as its topics. I heard some interesting talks but also less interesting ones. You can still hear people dreaming of forecasting anything with the help of more data, but often with the feeling that it's only in order to sell more stuff. I haven't seen the life improvement that is supposed to come with Big Data (e.g. mass surveillance obviously doesn't work).

But here and there you sometimes hear someone talking about a project that holds your attention. For me the most interesting aspect of data is its interdisciplinary or multidisciplinary side. To achieve something relevant or meaningful with all the available information, you first need to be able to define the problem you are trying to solve (and yes, I have a degree in stating the obvious).

Four presentations are still in my head, among them one using NLP to pre-sort a large number of CVs (from HitFox, during the first day of the program) and another using computer vision to automatically give feedback on webpage design (from EyeQuant, the last talk of the second day). Actually, for the latter, the talk was much wider than this single problem.

The IoT and FinTech parts weren't really impressive. Actually the only striking aspect is that the same tools are used whatever your field of work (data science, analytics, finance): you accumulate data, you try to find information and patterns in it, in order to derive models and make predictions. And unsurprisingly the most interesting FinTech talks came from the people involved in the Bitcoin economy and technology (Blockchain and ascribe GmbH). Maybe the banks have some cool stuff to talk about, but they weren't really present.

02/11/2015

Data conference coming!

I'm interested in knowing and observing how my field - applied research, technology, imaging, innovation... whatever you call it - is evolving. Working full time on one project is of course a good way to see what's going on, but it also carries the risk of getting stuck in daily routines. In that sense it's always wise to have a look at what your neighbors and competitors are doing, and how they try to solve the same problem you are working on.

From my own experience I know that we - let's call us/me applied/data scientists - are very quickly categorized into sub-fields, as experts, and that it is sometimes difficult to escape the prism through which people perceive what you can do. Having said that, being able to attend events, meet a new crowd, hopefully interesting people, exchange information, re-present yourself and feel how an industry is growing is vital.

A few days ago I spotted an event, Data Natives 2015, scheduled in Berlin on the coming 19-20 November. The keyword combination used to introduce the program is almost too perfect: IoT (Internet of Things), FinTech (financial technology, not tech from Finland, which sounds pretty cool too) and Big Data of course. Needless to say, I'm pretty excited to attend this conference!

09/04/2015

Deep learning (or deep learning in French)

What was your question again?
How do you explain deep learning to your friends, family members, neighbors, random strangers, dog? A very good question indeed. Rather than going deep into neural networks and other festivities, let's start by describing the problem(s) we want to solve. Or at least let's give an example of what we are trying to do here.

Over the years I had to come up with strategies to explain what I do for a living. Giving keywords such as "color science", "computer vision", "image processing" or "digital photography" is usually not enough, and neither is saying "I work with images". I have always found it more interesting to answer the questions "why do you want to do that?" or "which problem do you want to solve?". So to explain what I can do, I try to give an idea of the tasks I have to solve.

What is the problem you are trying to solve again?
Asking these questions is, in a way, already a machine learning/deep learning-ish approach to solving a problem. In theory, if someone asks you to solve a problem, they know the kind of result they want to obtain for a given input or starting point. What they don't know is what happens between these two stages. Applied mathematics and optimization are a reasonable standard solution: you develop a model that recreates more or less accurately what happens between these two stages, and then, for a new input, your model predicts what the output will be.
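
To make the "known inputs, known outputs, unknown in-between" idea concrete, here is a minimal sketch in Python. It assumes the simplest possible model, a straight line fitted by least squares; the function names and the toy numbers are invented for illustration and do not come from any real project.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b to known input/output pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Apply the fitted model to a new input."""
    a, b = model
    return a * x + b


# Toy data, invented for illustration: outputs roughly follow y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [1.1, 2.9, 5.2, 6.8, 9.0]

model = fit_line(inputs, outputs)
print(predict(model, 5.0))  # prediction for an input the model has never seen
```

A neural network plays the same role as `fit_line` here, only with far more parameters and a far less obvious shape for what happens between input and output.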

I'm sure "big data" is an expression you have heard in the past months or years. It has of course a different meaning depending on who is giving the definition. But, coming back to images and the incredible amount of them we produce daily, there is a need to develop solutions and tools to interact with these images. You have in your hands an extremely large image database, and using keywords as a search query isn't enough anymore. So here is the problem: how do you navigate and browse a large image database in a more natural way? There is a bit of database work here, but that is not the main point of this article; check my past post on graphs and databases if you are interested.

From face recognition to recognition of everything
Working with images is fascinating: you see an image and you automatically extract some of its information. Of course there is a long learning curve behind this. When you see a tree, a car or any known object in a picture, you don't even realize it: you just know. Over the years you have spent on earth, you have learned to recognize, categorize and organize the continuous stream of visual information that reaches your eyes and is later processed in your brain.

If you think of face recognition, the mathematical tools are now pretty standard. We can find faces in images with high probability; classification comes after the recognition. And if you train your model, you will be able to recognize semi-automatically the faces of different people in a database, as the tools/filters can be tuned for a given target. It can be scary of course if the threshold that decides on a true recognition/classification isn't verified by a real human and that decision leads to a rocket launch. Actually any automatic action resulting from an algorithm's decision and having an impact on a human being is pretty bad (hello mass surveillance and hello Terminator). You want help from robots, not to help robots, or it's too late anyway.
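
The threshold question can be sketched in a few lines of Python. This is only an illustration under strong assumptions: each face is supposed to be already summarized as a short list of numbers (a descriptor) by some feature extractor that is not shown, the `identify` function and the `max_distance` value are hypothetical, and the descriptors are invented.

```python
import math


def identify(descriptor, known_faces, max_distance=0.6):
    """Return (name, distance) for the closest known face, or (None, distance)
    when nothing is close enough. The result is a suggestion for a human to
    confirm, never a trigger for an automatic action."""
    best_name, best_dist = None, math.inf
    for name, reference in known_faces.items():
        dist = math.dist(descriptor, reference)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist


# Invented descriptors, for illustration only.
known_faces = {
    "alice": [0.1, 0.9, 0.3],
    "bob": [0.8, 0.2, 0.5],
}

print(identify([0.15, 0.85, 0.3], known_faces))  # close to "alice"
print(identify([5.0, 5.0, 5.0], known_faces))    # too far from everyone
```

Moving `max_distance` up or down trades missed faces against false matches, which is exactly why a human should look at the result before anything is done with it.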

One idea behind deep learning is to learn what is in images - in a similar way to how we humans do - to extract features and to perform tasks on other images based on a trained neural network. I'm taking shortcuts but that's the idea. Understanding and later mimicking how information circulates in the brain has been the dream of many researchers. Neural networks go in that direction. A few years ago the algorithms were limited by computing power, but the picture is different now.

What is also interesting is that new strategies had to be developed to overcome the overload of data. In a way the systems were "over-learning", and people talk about over-fitting the data. And it makes sense. If I'm not mistaken our brain is not indefinitely expandable, meaning we are sorting information continuously. One big part of these tools is dropout, which can be explained as "now that our system can learn, we have to teach it to forget part of what it knows in real time".
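
In practice dropout is usually implemented by randomly switching off units of the network during training. Here is a minimal sketch in plain Python of the common "inverted dropout" variant, applied to a list of activations; the function name and default values are my own choices, and real libraries do this on whole tensors.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and rescale the survivors by 1 / (1 - rate) so the
    expected total stays the same. At prediction time, do nothing."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


layer_output = [0.2, 1.5, 0.7, 0.9]
print(dropout(layer_output, rate=0.5))        # some values forced to zero
print(dropout(layer_output, training=False))  # unchanged when predicting
```

Because a different random subset is "forgotten" at every training step, the network cannot rely too heavily on any single unit, which is one way to limit over-fitting.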

Cross disciplines
An opportunity I see - for me - is the need in some industries for experts who are not expert in only one field, especially for this kind of large-scale problem involving images, computer vision, real time and fancy applied research projects. Knowing only about machine learning or statistics is not enough; knowing about both computing and machine learning tools is better.

[We will talk later about existing and possible applications.]

28/05/2014

Graphs en stock

Several meetings on Meetup and now a one-day workshop on graph databases organized by the people behind Neo4j (check here neo4j). Answers bring new questions: what can I do with this approach? Using Cypher you can query your database in an almost "natural language" way and find information in it. A nice feature presented here http://gist.neo4j.org/ allows you to run live queries on the web and render graphs; several examples are presented.

Now for me the question is: what do I do with that? Looking at my thesis references, I could explore where I found relevant information (even though I more or less know the main sources) or in which direction I could go. It is very close to studying a social network: you are looking at the connections between authors and research topics.

Regarding this blog, each post is a node, and the nodes share common features (keywords and dates) that can be used to navigate through the "database of stories". Here I would have to build my database (and be able to update it with each new entry as the blog evolves) and manage to create a nice visualization (a graph) that could be displayed for each post (to show how it is related to the others), or give the possibility to look for all posts sharing similar keywords (and create new paths through the database).
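
Before touching a real graph database, the idea can be sketched with plain Python dictionaries. This is only an illustration: the post titles are taken from this blog, but the keywords attached to them are invented for the example, and `build_post_graph` is a hypothetical helper, not part of Neo4j or any other library.

```python
from collections import defaultdict
from itertools import combinations


def build_post_graph(posts):
    """Link every pair of posts that share at least one keyword.
    Returns {title: {other_title: set_of_shared_keywords}}."""
    graph = defaultdict(dict)
    for (title_a, kw_a), (title_b, kw_b) in combinations(posts.items(), 2):
        shared = set(kw_a) & set(kw_b)
        if shared:
            graph[title_a][title_b] = shared
            graph[title_b][title_a] = shared
    return dict(graph)


# Keywords below are invented for illustration.
posts = {
    "Graphs en stock": ["big data", "graph", "visualization"],
    "Deep learning": ["big data", "images"],
    "Running notes": ["sport"],
}

graph = build_post_graph(posts)
for neighbor, shared in graph["Graphs en stock"].items():
    print(neighbor, sorted(shared))
```

Each post is a node, each shared keyword an edge, and a post with no keyword in common with the others (like the made-up "Running notes") simply stays outside the graph. Dates could be handled the same way, and the resulting dictionary is the kind of structure a visualization layer could render.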

Regarding color science and metamerism, it could be interesting to illustrate this matter for a database of spectra (knowing that different surfaces with specific spectral properties can be perceived as identical under a common illuminant).
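
Metamerism itself fits in a few lines of Python. The sketch below is a deliberately crude toy: only three wavelength bands and two sensor channels instead of full spectra and three cone types, and every number is invented so that the arithmetic stays exact. The response of a channel is taken as the sum over bands of reflectance x illuminant x sensitivity.

```python
def sensor_response(reflectance, illuminant, sensitivities):
    """One number per sensor channel: the sum over wavelength bands of
    reflectance * illuminant * channel sensitivity."""
    return tuple(
        sum(r * e * s for r, e, s in zip(reflectance, illuminant, channel))
        for channel in sensitivities
    )


# Toy values, invented for illustration (reflectances in percent).
sensitivities = [(1, 1, 0), (0, 1, 1)]  # two channels, three bands
surface_a = (20, 50, 40)
surface_b = (30, 40, 50)                # a different spectrum
flat_light = (1, 1, 1)
other_light = (1, 2, 1)

same_under_flat = (sensor_response(surface_a, flat_light, sensitivities)
                   == sensor_response(surface_b, flat_light, sensitivities))
same_under_other = (sensor_response(surface_a, other_light, sensitivities)
                    == sensor_response(surface_b, other_light, sensitivities))
print(same_under_flat)   # True: the two surfaces match under this light
print(same_under_other)  # False: the match breaks under another light
```

In a database of spectra, pairs like `surface_a` and `surface_b` would be the interesting connections to draw: two nodes linked under one illuminant and unlinked under another.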

To me it is a lot about visualization, and d3js is knocking on the door.

24/09/2013

Back to IBC Multi-platform and big data in TV

I remember two things about the "Second Screen" and "Big Data" sessions at IBC this year, and I think they are connected. The multiplicity of devices around us has somehow changed the way we watch television (I don't have a television). For some it is an improvement: it opens the door to multi-platform (TV + online presence on various networks) more than cross-media, and it is surely a challenge to monitor the viewer's attention. In other words: what is the audience doing while watching television, and where do I put my advertising?

England seems to be a giant laboratory where everyone can be observed, and their behavior analyzed and quantified (statistically, they are big Twitter users). The amount of generated data is enormous - we talk about big data - and the risk of being overloaded is real. This is actually the case, and I heard from the panel discussion speakers during the session that data scientists are needed (good for us).

An interesting talk from Twitter UK illustrated how the live audience reaction can be used to add information to a TV show. The example was the last US presidential debate: six channels (not sure) were broadcasting exactly the same video stream, and after some days one was getting most of the viewers' attention. Why? The explanation was that this TV channel (it was Fox, I believe) was able to analyze the tweets live during the debate (using the Twitter API everyone can access all tweets) and to provide a global audience reaction to it. So nothing like "this candidate sucks" or "I like him" appeared on screen, but a simple feedback: yes, the candidate is answering the question, or no, he is not (not exactly that, but not too far). You do need data scientists, people doing social graph analysis, to retrieve such information. And there are companies offering this service to TV channels, doing interactive programming (I think that's what they call it) and able to process the multiple streams of information coming from the audience, be it a tweet, SMS, email, FB message... And if you know where your audience is, then you are able to monetize this information.

Usability: it's nice to have many possibilities to react and send feedback, but there are so many options that it is difficult to capture the attention of most of the viewers, or at least of the group you have targeted, without losing half of them on the way. Let me explain: if you are on your sofa, you don't want to follow a specific procedure, fill in a form, touch your tablet screen with 3 fingers or flip the tablet in the air just to be able to "interact" with the program and access something. In that sense someone from Shazam gave a great talk. He ran a simple experiment to illustrate his point by asking us: who has the app on their smartphone, and what do you do to use it? You press the button to start the app and raise your phone toward the speaker. Their idea was to use this known behavior to communicate with the TV audience: you are watching a program, the Shazam logo appears on screen, you raise your arm with the app on, some kind of audio QR code is activated and you get access to new content on your tablet or smartphone. Brilliant (it was for a Red Bull TV program, if I'm correct).
