Open Science Research Data Awards

News from the Committee
10/12/2024

In 2024, the Ministry of Higher Education and Research’s ‘Open Science Research Data’ Awards were presented for the third time. This initiative is part of the Second French Plan for Open Science and rewards researchers, projects and research teams that work on the management and dissemination of data. Some of the winners are rewarded for basing their research on the re-use of available data.

This year, the awards were divided into three categories:

  • The ‘Creation of a missing dataset’ award rewards four exemplary projects for making a new dataset available in response to a specific scientific requirement.
  • The ‘Creating the right conditions for re-use’ award rewards two teams considered to have done exemplary work in managing research data to make it re-usable.
  • The Jury’s ‘Coup de Coeur’ award rewards an exemplary project for making data available and enhancing them.

The prizes were awarded on 26 November 2024 at the ‘Assises Nationales des Données de la Recherche’ conference held in Marseille.

The ‘Creation of a missing dataset’ category

Base Étendue, Améliorée et Unifiée des Annonces des Marchés Publics

The ‘Base Étendue, Améliorée et Unifiée des Annonces des Marchés Publics’ project presents a new large-scale dataset on how public contracts were awarded in France from 2015 to 2023. Textual data from the Bulletin Officiel des Annonces des Marchés Publics were consolidated, structured and then cross-referenced with INSEE data on companies and public purchasers. Because these data are complex and often fragmentary, AI algorithms were used to determine the most likely matches with the economic agents (identified by their SIRET numbers) described in other databases.
This project received funding from the French National Research Agency (ANR) and is led by two PhD students from Avignon University: Adrien Deschamps, who works in economics, and Lucas Potin, a computer scientist.

Carminabase

Carminabase is an online database of medieval incantation formulae taken from collections of medical recipes and treatises, sermons and the margins of various manuscripts. Carminabase gathers and consolidates these dispersed data, then enriches them with metadata to situate these incantatory practices in a historical-anthropological perspective. The data are accessible in a thematic data repository and can also be consulted through a dedicated interface. Carminabase currently includes 236 incantations and is continually being enriched with new data and metadata, such as textual-criticism elements and several thematic indexes.
This project is led by a research group at the École des Hautes Études en Sciences Sociales (EHESS).

Cartographie nationale des milieux humides 

The aim of the ‘Cartographie nationale des milieux humides’ project is to create a dataset that locates wetlands throughout mainland France and characterises their natural, semi-natural and man-made habitats. These maps are produced using artificial intelligence and are based on freely available data, remote sensing data and in situ observations. The dataset is intended not only for researchers but also for public policy-makers and the general public, with a particular emphasis on its value to society.
This dataset is proposed by a multidisciplinary group from several institutions: University of Rennes 2, the French National Museum of Natural History (MNHN), Institut Agro Rennes-Angers, the Tour du Valat Foundation, the National Research Institute for Agriculture, Food and Environment (INRAE) and the National Centre for Scientific Research (CNRS).

Mapping Ancient Polytheisms

The ‘Mapping Ancient Polytheisms’ project has created the first semantic database of linguistic resources on interactions between humans and gods. It is based on documents in Greek and Semitic languages (Hebrew, Aramaic, Phoenician, Punic, etc.) from the Mediterranean basin, dating from 1000 BC to 400 AD (some 1,400 years). The database can be accessed and downloaded from an academic repository and explored through a dedicated platform. A great deal of work went into compiling and indexing data from this ‘niche’ research field to make them accessible to the general public. The database is regularly expanded through the analysis of new sources, with the aim of achieving exhaustive coverage for religious studies of polytheism.
This project received ERC ‘Advanced Grant’ funding and involves a team from the University of Toulouse. 

The ‘Creating the right conditions for re-use’ category

MAKAHO

The MAKAHO project disseminates the results of trend calculations based on open primary data from hydrometric stations whose flows are only slightly influenced by human activity. These data products are made available on the Recherche Data Gouv platform and are presented through an interactive display interface that adds value to them. The datasets and associated tools are useful both for scientific outreach on climate change and for supporting public policy-makers. The rich documentation accompanying the data, together with their open access, increases their potential for re-use.
This project is led by a group of researchers at INRAE.

MDverse

The MDverse project has two objectives: to catalogue molecular dynamics simulation data available in various open data repositories, and to improve the associated metadata so that the scientific community can find and re-use these data more easily. MDverse spotlights open data with relatively low visibility by enriching their descriptions and metadata, and provides a meta search engine that facilitates their discovery, along with a re-use potential indicator defined by experts in the field.
This project is led by a team from Université Paris Cité (UPC) and the CNRS, working with international collaborators.

The Jury’s ‘Coup de Coeur’

LaCAS

LaCAS (Open Archive in Language and Cultural Area Studies) is intended as a reference tool for the area studies research community. Developed by the National Institute of Oriental Languages and Civilizations (INALCO), the platform harvests, aggregates, structures and enriches heterogeneous area studies resources from open archives and data repositories. LaCAS classifies and structures these data for editorial use on LaCAS Publications by aligning and linking heterogeneous data (text, video, sound, images) harvested from multiple repositories (HAL, Zenodo, Nakala, Gallica, Calame, Isidore, Persée, OpenAlex, Semantic Scholar) and by drawing on a substantial, regularly enriched thesaurus.
LaCAS received SESAME funding from the Île-de-France region and IdEx funding from Université Paris Cité.

Jury

The jury for the 2024 Open Science Research Data Awards was chaired by Aude Chambodut (open science and data scientific officer at CNRS Earth & Space, and observatory physicist at the University of Strasbourg’s School and Observatory of Earth Sciences). The members were:

  • Ms Esther Dzalé (GenEval Association)
  • Mr Etienne Roesch (University of Reading)
  • Mr Hadi Quesneville (INRAE)
  • Mr Jean-Denis Vigne (MNHN)
  • Ms Julie Vallée (CNRS)
  • Mr Olivier Marlet (Université de Tours)
  • Mr Pascal Hot (Université Savoie Mont Blanc)