28/05/2014

Graphs en stock

After several meetings on Meetup, I have now attended a one-day workshop on graph databases organized by the people behind Neo4j (check here: neo4j). The answers bring new questions: what can I do with this approach? Using Cypher you can query your database in an almost "natural language" way and find information in it. A nice feature presented at http://gist.neo4j.org/ allows you to run live queries on the web and render graphs; several examples are presented there.

Now for me the question is: what do I do with that? Looking at my thesis references, I could explore where I found relevant information (even though I more or less know the main sources) or in which direction I could go. It is very close to studying a social network: you are looking at the connections between authors and research topics.

Regarding this blog, each post is a node, and the nodes share common features (keywords and dates) that can be used to navigate through the "database of stories". Here I would have to build my database (and be able to update it as the blog evolves with each new entry) and manage to create a nice visualization (a graph) that could be displayed for each post (to show how it is related to the others), or that gives the possibility to look for all posts sharing similar keywords (and create new paths into the database).
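To make the idea a bit more concrete, here is a minimal sketch in plain Python, without Neo4j or Cypher. The post titles come from this blog, but the keywords attached to them and the helper names (build_graph, related) are my own assumptions for illustration. Two posts are linked when they share at least one keyword.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: titles from this blog, keywords invented for illustration.
posts = {
    "Graphs en stock": {"graph database", "visualization", "neo4j"},
    "Culture and Informatic": {"conference", "museum", "berlin"},
    "Archiving conference 2014": {"conference", "archiving", "berlin", "graph database"},
}


def build_graph(posts):
    """Link every pair of posts that share at least one keyword.

    Returns a mapping: title -> {neighbour title -> set of shared keywords}.
    """
    edges = defaultdict(dict)
    for a, b in combinations(sorted(posts), 2):
        shared = posts[a] & posts[b]
        if shared:
            edges[a][b] = shared
            edges[b][a] = shared
    return edges


def related(edges, title):
    """Neighbours of a post, most shared keywords first, then by title."""
    neighbours = edges.get(title, {})
    return sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))


graph = build_graph(posts)
for neighbour, shared in related(graph, "Archiving conference 2014"):
    print(neighbour, sorted(shared))
```

Adding a new entry only means adding one item to posts and rebuilding the links. In a real graph database the keywords and dates would probably be nodes of their own, so that a query could walk from post to keyword to post.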

Regarding color science and metamerism, it could be interesting to illustrate this for a database of spectra (knowing that different surfaces with specific spectral properties can be perceived as identical under a common illuminant).
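A toy numerical sketch of what metamerism means, in plain Python. All the numbers are made up for illustration: four wavelength samples, three idealized sensors standing in for the eye, and two flat-ish illuminants. None of this is real colorimetric data. The two surfaces have different reflectance spectra but give the same three sensor responses under the first illuminant, and different responses under the second.

```python
# Toy data (assumptions, not measurements): 4 wavelength samples, 3 sensors.
SENSORS = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

ILLUMINANT_1 = [1.0, 1.0, 1.0, 1.0]  # equal energy at every sample
ILLUMINANT_2 = [2.0, 1.0, 1.0, 1.0]  # more energy at the first sample

SURFACE_A = [0.50, 0.50, 0.50, 0.50]
SURFACE_B = [0.75, 0.25, 0.75, 0.25]  # differs from A at every sample


def sensor_response(reflectance, illuminant, sensors=SENSORS):
    """One response per sensor: sum of sensitivity * illuminant * reflectance."""
    return [
        sum(s * e * r for s, e, r in zip(sensor, illuminant, reflectance))
        for sensor in sensors
    ]


def are_metamers(refl_a, refl_b, illuminant, tol=1e-9):
    """True when both surfaces give the same responses under this illuminant."""
    resp_a = sensor_response(refl_a, illuminant)
    resp_b = sensor_response(refl_b, illuminant)
    return all(abs(x - y) <= tol for x, y in zip(resp_a, resp_b))


print(are_metamers(SURFACE_A, SURFACE_B, ILLUMINANT_1))  # match under illuminant 1
print(are_metamers(SURFACE_A, SURFACE_B, ILLUMINANT_2))  # mismatch under illuminant 2
```

In a graph of spectra, one could imagine an edge between two surfaces whenever are_metamers is true for a given illuminant, so that each illuminant produces its own set of connections.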

To me it is a lot about visualization, and d3js is knocking on the door.

23/05/2014

Another conference in Berlin about Culture and Informatic

A rather chaotic conference in a nice location, a room in the Bode Museum in Berlin. It was my second edition and I still struggle to understand how this one works. I found each session too short for the number of presentations per session (not to mention the people reading their presentations), and the keynotes too numerous in comparison.

But despite my negative first impression I found some interesting facts and issues. One of them is the great diversity among museums depending on their location, be it in a province of China or in Namibia. The matter of getting visitors, of making them come and visit, is extremely different. Depending on the geographical location and the social environment, having something to present to the public is not enough.

Always cool to talk with people following a completely different path from yours. I started my journey in Paris, went to Montreal, back to Paris, then Oslo and now Berlin, and here I had very interesting conversations with Indian and African people, one traveling East to West in the Southern Hemisphere and the other North to South, crossing between hemispheres several times. Completely different languages and geographical references, but we found common ground on work topics and ways of living: three different continents, and none of us has worked or lived in our country of origin for at least 10 years.

16/05/2014

Archiving conference 2014 in Berlin

This week I had the pleasure of attending the Archiving conference (AC) 2014 in Berlin. The event was hosted at the Kino Arsenal at Potsdamer Platz, in one of the cinema rooms with extremely comfortable red seats. This conference belongs to a group of conferences run by the Society for Imaging Science and Technology. I usually attend the Color Imaging Conference (CIC) or the Electronic Imaging conference (EIC), both in the US, so when I saw that the AC was taking place this year in Berlin I jumped at the chance and joined the event.

Compared to CIC, the topics of AC are much more applied and not only dedicated to imaging problems. The variety of represented fields makes this event very interesting. Starting with archiving itself, we can understand it as all the problems related to collecting documents: papers, books, audio files, video files, electronic documents for e-government, art collections, websites, the internet. Once the databases are built you have to think about how to access the documents, knowing that the amount of information grows without stopping. Having documents digitized does not mean they no longer need to be processed (e.g. how to save scanned pages of an old book with its hand-written annotations). In relation to the scanning task, the one-day industrial exhibition gave us the possibility to see different book scanners with special features for handling fragile documents.

A never-ending challenge in this domain is the continuous evolution of technology. If the tasks remain identical - to archive documents whatever they are - the people in charge of these tasks - working in libraries, museums, archive departments - have to save the documents and keep alive the technology used to store them. Archives being public, they have to be accessible to the public. This also raises the question of funding: the people responsible have to continuously argue with politicians to keep their funding at a reasonable level.

To give a very simple picture, in the past all documents were saved on solid media (i.e. stone, paper, film...), something physical with a long lifetime. Then time sped up: we accumulate more and more information, we need more space to archive it and we need to reduce the physical size of the archiving media. Data centers full of servers are available, but the lifetime they offer is not as long as that of physical media. You end up in a situation where these centers are continuously copying and migrating information from server to server to stay ahead of the lifetime of the storage media. Our global archives are therefore always moving: a global power outage could mean the end of our archives.

Archiving film documents
An important parameter is the constant evolution of technology. This is something the movie industry also faces continuously. In a way, the challenges of restoring and archiving very old film material are very similar to those faced during a movie production when different sources (i.e. content from different cameras, different software) have to be combined: there are different workflows that need to be merged in order to produce the final document/film. Some discussions propose using the DCP film distribution format to store both new and old films. One advantage is that it is understood by many people, but it brings the matter of DRM into the document. One problem is that the projection system and the encryption are linked (if I'm correct): what happens when the projection technology changes? All these aspects were tackled in the opening keynote given by a fellow from Fraunhofer IIS.

Crowd sourcing and database browsing
Crowd sourcing was to me a very interesting topic. It appeared in oral presentations and in posters. I grouped it with database browsing because I think these two points are related. The amount of available information - as I mentioned above - is increasing, and if you remember big data, we have both feet in it. To process the documents, the archiving departments (e.g. in the conference program, from the BNF in Paris, in the Netherlands or in Germany) have started campaigns where they ask the public to carry out tasks to validate digitization (I'm simplifying the problem here), where the task can't be performed fully automatically without errors. It follows the web 2.0 model where the user is also the one creating the content, except that here the user is helping/working on archive media which concern everybody; it's our memory after all.

Graph databases were mentioned and I would like to hear more about them, especially how database query languages are developed to ease access to the information in those databases. But this also implies thinking about how to build these databases. Pretty interesting in any case.

First impressions
To be short, they were mostly good. Different people, different backgrounds, but similar challenges such as archiving, digital preservation and curation. Multidisciplinary, as I like it.