(Gestion, Analyse et Modèles pour les Masses de données en Agronomie)
Experimental and theoretical approaches have been recognized for centuries as the founding paradigms of scientific research. More recently, simulation has become a standard tool for exploring previously inaccessible fields. As experimental and simulation technologies produce ever larger amounts of data, the use of information and communication technology has given rise to a fourth research paradigm (J. Gray and A. Szalay, eScience: Transformed Scientific Method), which is collaborative, integrated and data-driven.
Research in agronomy and environment is symptomatic of this trend. In these domains, new technical means provide richer data sets. In addition, while setting up additional experiments in the field or in the laboratory remains a constantly renewed research challenge (e.g. phenotyping platforms, biological studies of soil, bioprocesses), modeling approaches have also grown in importance in recent decades. A given scientific question often requires building in vivo and in silico data sets, in which the data are of different natures (phenotypic, environmental, genotypic), of various types (curves, readings, expertise, time and space, etc.), of mixed quality (some data are very noisy or unreliable) and collaborative (that is to say, not from a single experiment but from a network of experiments). Enhancing and exploiting these large amounts of data requires a new generation of tools able to manage and analyze them.
The scientific objective of GAMMA (Management, Analysis and Modeling for Masses of data in Agronomy) is to develop methods and tools suited to the processing of such data, with particular attention to the temporal dimension. The synergies created by the presence of computer scientists and statisticians allow the team to have an integrated approach from data and knowledge management to prediction and decision support. The GAMMA team is a partner of the LABEX NUMEV whose objectives are, for us, methodological, as well as a partner of the LABEX AGRO dedicated to the study of agricultural systems.
Complex agronomic and environmental systems
The application areas covered by the team are plants, bioprocesses and food processing. Precision agriculture and natural resource management are also within our scope.
From a methodological point of view, these application fields have common characteristics: (i) environmental interactions and observations over a period of time, and (ii) multi-scale integration or multi-step processes in an agri-food chain (see figures). The observations cover different levels of granularity, from high-resolution online measurements to concepts and behaviors described by experts.
Our activities can be grouped according to two axes:
1- Managing data quality and knowledge, focused on scientific information management,
2- Understanding, prediction and decision support, geared towards exploiting the previous information.
Axis 1. Managing data quality and knowledge
This axis concerns the acquisition, organization, sharing and reuse of integrated, high-quality data and knowledge. These data can be complex and multi-scale. They are often non-reproducible (phenotyping in the field, vintage, etc.) or generated by models. The goal is to develop methods that help manage data in an integrated manner. Another challenge is to accommodate new types of data, generated by experiments that evolve with technological innovations and scientific priorities.
The GAMMA team proposes an original approach, based on semantic graphs, to collect and organize multi-scale data coming from heterogeneous sources. The team also develops methods to integrate expertise by learning rules from available data, for instance on agroecosystems. These topics raise many methodological questions, with a particular focus on data models, descriptive models (RDF) and ontologies.
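As a rough illustration of what organizing multi-scale data as a semantic graph can look like, the sketch below stores subject-predicate-object triples in the spirit of RDF and follows links from a plot down to individual measurements. It is not the team's information system: all identifiers (plot:P1, ex:grownIn, ex:leafArea_cm2, etc.) are hypothetical, and a plain Python list stands in for a real triple store.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); every name is illustrative.
TRIPLES = [
    ("plot:P1", "rdf:type", "ex:FieldPlot"),
    ("plant:M42", "ex:grownIn", "plot:P1"),
    ("plant:M42", "ex:hasGenotype", "geno:B73"),
    ("obs:1", "ex:observes", "plant:M42"),
    ("obs:1", "ex:leafArea_cm2", 12.5),
    ("obs:2", "ex:observes", "plant:M42"),
    ("obs:2", "ex:leafArea_cm2", 18.1),
]


def build_index(triples):
    """Index triples by (subject, predicate) and by (predicate, object)."""
    by_sp = defaultdict(list)
    by_po = defaultdict(list)
    for s, p, o in triples:
        by_sp[(s, p)].append(o)
        by_po[(p, o)].append(s)
    return by_sp, by_po


def leaf_areas_for_plot(triples, plot):
    """Walk the graph: plot -> plants -> observations -> measured values."""
    by_sp, by_po = build_index(triples)
    values = []
    for plant in by_po[("ex:grownIn", plot)]:
        for obs in by_po[("ex:observes", plant)]:
            values.extend(by_sp[(obs, "ex:leafArea_cm2")])
    return values


print(leaf_areas_for_plot(TRIPLES, "plot:P1"))
```

The same traversal works across scales (plot, plant, observation) because each level is just another node in the graph; an ontology would additionally constrain which predicates may link which node types.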
Once data are organized, they usually require validation and pre-processing before being analyzed. Measurements may be corrupted by an incident (e.g. a sensor malfunction), and it is important to detect such faults and, where possible, to reconstruct corrected data. The large amount of data to be processed requires the development of automatic and generic procedures. Such approaches, which combine expertise and statistical techniques, are currently being studied by the team, as is the design of statistical procedures under specific expert constraints. For instance, many kinetics are known to be increasing (and sometimes convex). Implementing such constraints in a statistical estimation of the kinetics is not straightforward. Novel research results obtained by the team include Bayesian estimation methods to reconstruct a curve under shape constraints.
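To make the idea of a shape constraint concrete, here is a minimal sketch of a monotone (non-decreasing) least-squares fit using the classical pool-adjacent-violators algorithm. This is a simple frequentist stand-in chosen for illustration, not the Bayesian method developed by the team, and the sample values are made up.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators (PAVA)."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Pool the last two blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # one block mean per pooled point
    return fitted


# Made-up noisy growth readings that should be increasing over time.
noisy = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
print(isotonic_fit(noisy))  # [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
```

Each decreasing pair is replaced by its average, so the reconstructed curve respects the expert constraint while staying as close as possible to the data in the least-squares sense. Adding convexity or a probabilistic noise model requires more elaborate machinery than this sketch.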
After data pre-processing, the practitioner has at their disposal (experimental and simulated) data sets, as well as data-related knowledge. It may be useful to relate them more closely, for instance by using the various levels (scales) of ontological granularity to guide learning algorithms such as decision trees or segmentation methods. Learning can in turn contribute to enriching ontologies and knowledge.
Axis 2. Learning, prediction and decision support.
Numerous agronomic studies are dedicated to the dynamic evolution of plants or individuals, among which growth kinetics are of major importance. Since its creation, the GAMMA team has been strongly involved in the analysis of such data, called longitudinal or functional depending on the number of observations and their frequency. The variety of proposed approaches (parametric, non-parametric, frequentist or Bayesian) makes it possible to deal with the specificities of many targeted applications. An important emerging issue is to explore and model curve data in relation to a relatively high number of co-factors. Examples of such co-factors for plant phenotyping include the genetic and environmental features available in integrated case studies. For instance, a significant challenge can be to determine the genetic components responsible for maize growth under water stress. New data analysis needs require the design and tuning of innovative approaches combining curve analysis (functional statistics), high-dimensional statistics (feature selection), multi-scale integration (hierarchical models) or clustering. Another family of approaches combines statistical and deterministic methods for analyzing temporal data. Their objectives can be to estimate the properties of a noisy dynamical system or to understand the behavior of a complex dynamical model (e.g. the association of forestry and cereal crops) through a statistical analysis of simulations.
The application domains cover a wide range, from studies at plot level in precision agriculture, to agrifood transformations, and up to the sector scale (e.g. the Pilotype Project for the vineyard and wine industry). On the basis of methodological advances arising from the various projects, specialized software is produced to assist professionals and researchers in the agrifood industry or precision agriculture. Reasoning tools that make it possible to combine expert knowledge and data have been developed using several knowledge formalization approaches (ontologies, fuzzy logic). They can handle information granularity and build aggregated variables at system level (e.g. a production sector).
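As an illustration of how fuzzy logic can combine expert rules with measured data to produce an aggregated variable, the sketch below implements a tiny zero-order Sugeno-style inference. The input variables, thresholds, rule base and the "stress index" itself are all hypothetical and chosen only for the example; they are not taken from the team's software.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical expert rules: (moisture label, temperature label, stress score).
RULES = [
    ("dry", "hot", 1.0),
    ("dry", "mild", 0.6),
    ("wet", "hot", 0.4),
    ("wet", "mild", 0.0),
]


def stress_index(moisture, temperature):
    """Aggregate two measurements into one score using fuzzy rules."""
    wet = ramp_up(moisture, 10.0, 30.0)     # assumed soil moisture thresholds (%)
    hot = ramp_up(temperature, 20.0, 35.0)  # assumed temperature thresholds (deg C)
    m = {"dry": 1.0 - wet, "wet": wet}
    t = {"mild": 1.0 - hot, "hot": hot}
    num = den = 0.0
    for m_label, t_label, score in RULES:
        weight = min(m[m_label], t[t_label])  # AND of the two conditions
        num += weight * score
        den += weight
    # den is never zero: the memberships of each input sum to 1,
    # so at least one rule fires with weight >= 0.5.
    return num / den


print(stress_index(5.0, 40.0))   # clearly dry and hot
print(stress_index(35.0, 15.0))  # clearly wet and mild
print(stress_index(20.0, 27.5))  # in between: every rule fires partially
```

Because each rule fires to a degree rather than all or nothing, the output varies smoothly between the expert-defined scores, which is what makes this kind of formalism convenient for building interpretable aggregated indicators.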
The team proposes approaches that also result in the production of generic software tools. These tools are developed with the strong involvement of partner research laboratories at all stages. This process makes it possible to tailor the software to user issues and requirements, as in the SILEX (Information systems for Experimentation) or Fispro (Fuzzy Inference system design and optimization) projects. It enables us to promote and better focus the team's research, and to enhance our visibility in agronomic domains such as the wine industry or the cereal transformation sector.
Two structuring and emblematic projects
- Phenome (Infrastructure of excellence): Phenotyping has become a bottleneck in all programs that aim to develop genotypes able to maintain or enhance the agronomic performance of a crop subjected to climate change and to input reduction constraints. The objective of the Phenome project is to provide France with a versatile and innovative infrastructure, in order to achieve a high phenotyping capacity in a range of environmental conditions, and to develop methods able to analyze the resulting information in a genetic context. The Phenome project brings together 12 academic laboratories and 2 professional technical institutes. Its strategic originality is to develop the methodological projects at consortium scale. Among these projects, the GAMMA team is co-responsible for two tasks: (i) the design of an information system to be used by the whole research community of the Phenome project (axis 1); (ii) the development of informatics and mathematical approaches to annotate and analyze phenotypes (axes 1 and 2).
- LACCAVE: Long term impacts and adaptations to Climate Change in Viticulture and Enology (ACCAV INRA Metaprogram on climate change). The LACCAVE project aims to study strategies by which the viticulture and wine industry can adapt and innovate in response to climate change, taking into account the economic importance of these two sectors in France. The LACCAVE project brings together 22 research teams and 7 scientific departments from INRA. GAMMA is in charge of designing an information system (axis 1) that compiles and integrates the various kinds of information relevant to the wine sector. The analysis of the data needed to formulate scenarios and to validate models will rely, among others, on methods developed within the team (axis 2).