
Algorithms Seminar



The seminar takes place on Tuesdays at 11:45 a.m. (unless exceptionally rescheduled) on the Côte de Nacre campus, building Sciences 3, room S3 351, 3rd floor.

Abstract of the seminar of Tuesday, December 15, 2015

Algorithmic challenges in temporal Web analytics

by Marc Spaniol (GREYC, Caen)

Web preservation organizations such as the Internet Archive not only capture the history of born-digital content but also reflect the zeitgeist of different time periods over more than a decade. This longitudinal data is a potential gold mine for researchers such as sociologists, political scientists, media and market analysts, or experts on intellectual property.

Longitudinal data analytics on the Web of the Past poses research challenges but has not yet received due attention. The sheer size and content of Web archives make them relevant to analysts across a range of domains. The Internet Archive holds more than 350 billion versions of Web pages, captured over almost two decades. In my talk I will introduce several aspects that are relevant to temporal Web analytics. These include, but are not limited to, achieving archive coherence, named entity disambiguation, emerging concept identification, and knowledge linking.

Using concrete examples, I will pinpoint the underlying research challenges from an algorithmic point of view.

GREYC
Campus Côte de Nacre, boulevard du Maréchal Juin
BP 5186
14032 Caen Cedex
Fax: +33 (0)2 31 56 73 30
http://www.greyc.fr