Web Mining

This course is divided into three parts. The first part introduces students to the opportunities offered by extracting information from large-scale textual data. It then covers the Vector Space Model (VSM) and term-weighting schemes such as Term Frequency–Inverse Document Frequency (TF-IDF) in the context of word association mining. It goes on to explore probabilistic topic models, including Probabilistic Latent Semantic Analysis (PLSA) and its estimation with the Expectation-Maximization (EM) algorithm. It also covers text clustering and categorization algorithms, as well as opinion mining and sentiment analysis. The second part discusses the analysis of large graphs: it views the web as a directed graph and explores link analysis and PageRank. The third and final part covers recommender systems, with a focus on content-based and collaborative filtering algorithms.

Contact hours: 35 hours

Student workload: 70 hours

Assessment method(s): Final exam

This course is offered in the following degree programs
 Master in Data Sciences