Deep learning in information retrieval and NLP

My research aims to propose new information retrieval (IR) and natural language processing (NLP) models. The common goal of these models is to facilitate the processing of textual data, possibly combined with other information sources such as user logs (2011-2018), social data (2014-2015), semantic resources (2015-2018), or visual data (2015-2018).
Initially, my research focused on formalizing models based on IR techniques (term weighting, matching models, …). Today, my work relies mainly on deep learning models for IR and NLP.

  • Data-to-text
    The objective is to generate textual descriptions of structured inputs (tables/graphs/…). This task is particularly interesting in finance, sports journalism, and health, since it makes it possible to synthesize and reason over large sets of structured data that may be hard for humans to read.
  • Reinforcement learning-driven IR (e.g., search-oriented chatbot, query reformulation, …). Supported by ANR JCJC SESAMS.
    The objective is to support users’ search through interactive and proactive systems that anticipate their needs and guide them in solving their tasks. Information need refinement and understanding, belief tracking, dialog systems, and language generation are examples of tasks that can be addressed in this topic.
  • Knowledge base-enhanced IR
    Leveraging knowledge bases to enhance IR is particularly relevant for technical domains (e.g., medicine), where the semantic matching provided by neural models is not sufficient to capture all the peculiarities of the application domain.
  • Multimodal word and sentence embeddings
    Words are grounded in cognitive aspects (what we think, what they refer to, how they can be used in the physical world). Our objective was therefore to enhance the semantics of words with their visual semantics.


Invited talks

  • Naver Labs (July 2021): “Data-to-text generation: let your data speak fluently”
  • Summer school ETAL (June 2021): Lecturer – “Information retrieval models” and practical activities
  • “THL et multimodalité” Days – THL/AFIA (Oct 2020): “From multimodal representation learning to multimodal information access”
  • LIS seminar – Marseille (December 2020): “From multimodal representation learning to multimodal information access”
  • PhisIA seminar – Univ Paris Diderot (Nov 2019): “Le symbolique au service du connexionnisme et vice-versa : apprentissage de représentation augmenté, extraction d’information et bases de connaissances”
  • ERIC lab seminar (Oct 2019): “Apprentissage de représentations textuelles augmentées bases de connaissances: application à la Recherche d’information”
  • GDR IA – 2019: “Ancrage visuel et conceptuel du texte pour l’apprentissage de représentation”
  • Panelist for the Pré-GDR TAL (March 2019)
  • Laboratoire ERIC “De la Recherche d’information collaborative à la recherche d’information socio-collaborative : fondements, modèles et perspectives”



Projects

  • 2022-2026: ANR PRCE ACDC. Data-to-text generation
    Consortium: MLIA@Sorbonne, LAMSADE@ParisDauphine/PSL, MHNH@Sorbonne, Recital
  • 2019-2024: ANR JCJC SESAMS. Search-oriented conversational systems (Coordinator)
    Consortium: Vincent Guigue (MLIA-LIP6), Ludovic Denoyer (FAIR Paris), Jian-Yun Nie (Univ. Montréal Canada), Philippe Preux (Univ. Lille)
  • 2014-2019: CHIST-ERA MUSTER. Ground language in perception (visual inputs) and extract representations of meaning tied to the physical world.
    Consortium: KU Leuven, Belgium; ETH Zurich, Switzerland; LIP6 UPMC, France; University of the Basque Country, Spain
  • 2014-2015: PEPS CNRS. EXPloration sur l’usage des médias sociaux pour un Accès Collaboratif à l’information.
    Consortium: IRIT-SIG; CNRS Marc Bloch, Berlin; Maths, SMMA, Université Paris Sorbonne