Scientific Relations

National

Through several national research programs in which we were involved, we have built extensive relationships with several French research teams.
Through the MediaGrid project, we cooperated closely with the database group of LIG (Grenoble) and with IBISC (formerly LaMI, Evry), in particular on the integration of genomic data and the evaluation of flexible queries.

Within the APMD project, led by Mokrane Bouzeghoub, we had strong collaborations with the database groups of several CNRS and INRIA laboratories (LINA Nantes, IRIT Toulouse, LIG Grenoble, LIRIS Lyon, IRISA Lannion), in particular on data personalization.

Within the Quadris project, we cooperate closely with three other laboratories (LSIS Marseille, CEDRIC Paris, IRISA Rennes) and two industrial partners (EDF/DER and Institut Curie).

These collaborations have led to joint publications and research reports. They have also consolidated a scattered community, which was then able to define and obtain new joint research projects.

International

At the international level, we have a strong cooperation with the Federal University of Pernambuco, Recife, Brazil. Prof. Ana Carolina Salgado made several short visits to our group and spent a sabbatical year with us in 2007-2008. A PhD student from the same university (Carlos Pires) also spent a year in our group (2007-2008), and another PhD student (Damires Fernandes) spent a month here in 2008.

On our side, Zoubida Kedad made a two-week visit to Recife. Other cooperations with Brazil include the Federal University of Ceará, Fortaleza (Prof. Bernadette Farias Loscio), who has made several one-month visits to our group. All these cooperations concern data integration, P2P systems and data quality; joint publications, reports and projects have resulted from them.

We have also had, for several years, a strong cooperation with the Universidad de la República in Montevideo, Uruguay (Prof. Raul Ruggia). This cooperation has materialized through several jointly supervised PhD students (Adriana Marotta, Lorena Etchevery, Veronika Peralta). Veronika Peralta spent more than three years in our group, while the other PhD students made multiple visits.

A joint research project was initiated in 2007 between our group and the aforementioned research groups in the context of the STIC-AMSUD program. This project concerns data quality and data integration.

Finally, a cooperation was started four years ago with the University of El Cauca, Colombia. Juan Carlos Corrales carried out his PhD research in our group and is now an Associate Professor at his home university. In addition, two other students spent their Master's internships in our group; both worked on Web services and graph matching.

Other fruitful collaborations have also been established with the University of New South Wales, Sydney (Australia) and the University of Trento (Italy). Daniela Grigori spent two months at each site, in 2006 and 2008 respectively.

Industrial

We have had a close cooperation with Alcatel-Lucent since 2006, and a bilateral cooperation agreement has been in place since the beginning of 2007. The PhD thesis of Sofiane Abbar is funded under this contract. Dimitre Kostadinov was recently recruited by the company to work on application personalization at Bell Labs.

Resource Discovery and Integration

Overview

Resource discovery and integration has been one of the major problems addressed over the last decade. The challenge is to provide advanced solutions which allow business applications to interoperate with multiple information sources and software services.

Scalability and heterogeneity are the main concerns in this area. Indeed, on the one hand, the number of resources (i.e. information sources and business services) provided by various business operators is increasing dramatically. On the other hand, a wide spectrum of software architectures, representation models and programming languages is proposed by IT actors to support the agile design and deployment of new applications. Building a flexible information system in this context consists in designing a high-level abstract solution composed of a virtual schema and a business process specification.

Given a deployment architecture, implementing such an IS consists in mapping the virtual schema onto selected data sources (i.e. defining queries which compute instances of the virtual schema from those of the source schemas) and binding business activities to selected software services (e.g. Web services). Dynamic discovery of mappings and services is the key to introducing flexibility in the design and evolution of these systems.

The scalability of discovery algorithms and the handling of heterogeneous resource descriptions are the main challenges to address in research projects. Mapping discovery is considered extremely difficult, as the designer of the system must have a thorough understanding of the semantics of the numerous data sources that compose the integration system, as well as of the target schemas to which they should be linked. Another major issue is the maintenance of the mappings when the integration system evolves frequently.

Our research has focused on mapping generation and evolution algorithms, exploiting rich metadata and putting emphasis on scalability.

Service discovery (SD) consists in selecting the most appropriate software services (or Web services) to compose a business application. Current approaches to service retrieval are mostly limited to matching their inputs/outputs, to keyword search in registries such as UDDI or ebXML, or to correspondence tables. However, the recall and precision of these approaches are not satisfactory for many applications.

Within the framework of the semantic Web, description logics have been proposed for a richer and more precise formal specification of services. Derived ontologies, such as OWL-S, are used as a basis for semantic matching between a declarative description of the required service and the descriptions of the offered services. However, the few existing approaches only consider exact matches, while many other services can partially fulfill the user's requirements.

Our research has focused on SD based on behavioral specifications, allowing approximate and partial matching.

Mapping discovery

Our research on mapping generation and evolution started in the late 1990s in the context of relational-based mediation systems. Within the MediaGrid project (ACI GRID, 2002-2004), these algorithms were extended to XML data sources and to the generation of XQuery mappings. Work done during the 2004-2008 period essentially consisted in improving the mapping generation algorithms for relational data sources and in specifying the mapping generation algorithm for XML data sources.

The problem of mapping discovery is formalized as a path-search problem in a graph whose nodes are source relations (possibly hundreds or thousands of them) and whose edges are possible joins between them. The desired paths are those that constitute queries computing the target relations.

Two main problems have to be solved in mapping discovery: path-search optimization and heterogeneity resolution. We have introduced heuristics on path lengths to limit the exponential cost of exhaustive search. To detect syntactic and semantic mismatches and resolve heterogeneity problems, we have proposed extending data source descriptions with a rich data typing mechanism, which later facilitates matching procedures and the search for compensation rules. The mapping generation algorithm was extended accordingly, and a new prototype was implemented and evaluated, showing significant improvements over the first-generation algorithm.
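
The bounded path search can be sketched as follows. This is a minimal illustration, not the actual algorithm: the relation names, the join list and the length bound are invented, and the real algorithm additionally handles heterogeneity through the typing mechanism described above.

```python
from collections import defaultdict

def find_join_paths(joins, source, target, max_len):
    """Enumerate join paths from `source` to `target` containing at most
    `max_len` relations, in a graph whose nodes are source relations and
    whose edges are candidate joins between them."""
    graph = defaultdict(set)
    for a, b in joins:
        graph[a].add(b)
        graph[b].add(a)

    paths = []

    def extend(path):
        last = path[-1]
        if last == target:
            paths.append(list(path))     # this path computes the target
            return
        if len(path) >= max_len:         # length heuristic: prune the search
            return
        for nxt in sorted(graph[last]):  # sorted for deterministic output
            if nxt not in path:          # never revisit a relation
                extend(path + [nxt])

    extend([source])
    return paths

# With a bound of 3 relations, only the shortest join path survives:
joins = [("R1", "R2"), ("R2", "R3"), ("R1", "R3"), ("R3", "R4")]
short = find_join_paths(joins, "R1", "R4", 3)   # [["R1", "R3", "R4"]]
```

Each returned path corresponds to a candidate query joining the listed relations; relaxing the bound (e.g. to 4) also surfaces the longer path through R2, which is exactly the exponential growth the heuristic limits.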

Mapping generation for XML data sources is not fundamentally different from that for relational data sources. The main differences reside in the complexity of the data structures and in the variability of object structures (i.e. the same object may be structured in several ways). To handle the complexity of mapping discovery, the target schema is decomposed into subtrees for which mappings are first created, following the same methodology as for the relational model (assuming the existence of join operations between two structured objects). Mappings for the whole schema are then obtained by composing the partial mappings.
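
The decompose-then-compose step can be illustrated with a minimal sketch. The schema, its tag names and the `map_*` placeholders are all invented for illustration; actual partial mappings would be XQuery expressions discovered per subtree.

```python
import xml.etree.ElementTree as ET

# Hypothetical target schema (element names invented for illustration).
TARGET_SCHEMA = """\
<gene>
  <identification><name/><accession/></identification>
  <location><chromosome/><start/><end/></location>
</gene>"""

def decompose(root):
    """Decompose the target schema into its top-level subtrees; each
    subtree is then mapped independently, as in the relational case."""
    return list(root)

def compose(partial_mappings):
    """Compose partial mappings into a mapping for the whole schema
    (shown here as a symbolic join of placeholder fragments)."""
    return " JOIN ".join(partial_mappings)

root = ET.fromstring(TARGET_SCHEMA)
# One placeholder partial mapping per subtree of the target schema.
partials = [f"map_{sub.tag}" for sub in decompose(root)]
whole = compose(partials)   # "map_identification JOIN map_location"
```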

The initial funding of this research came from two national projects, Reanimatic (ACI Télémédecine) and MediaGrid (ACI GRID), which respectively concern the integration of epidemiological data (in particular data on nosocomial diseases) and the integration of genomic data.

Service discovery

Our objective in service discovery is to propose an approach to service retrieval based on behavioral specifications, allowing approximate matching. The originality of this work is the ability of the proposed algorithms to retrieve services with similar behavior on the basis of a behavior-based similarity measure. Consequently, even if no service exactly satisfies the user's requirements, the most similar ones, called partial matches, are retrieved and proposed for reuse through extension or modification.

To do so, we reduce the behavioral matching problem to a graph matching problem. We have introduced a semantic distance measure and a set of edit operations which allow the user to dynamically restructure his query graph when the target graphs do not match his requirements. We have studied two types of behavioral models: a simple automata-based model and a more complex model allowing parallel tasks. The matching algorithms have been improved by introducing quality factors, which allows some undesired solutions to be pruned. Another extension, initiated in a collaborative work with B. Benatallah, F. Casati and F. Toumani, concerns a taxonomy of the main mismatches that can arise between two services and a set of appropriate adaptors to alleviate them.

A prototype has been developed; it takes two conversation protocols as input and evaluates the semantic distance between them. It also provides the script of edit operations that can be applied to the query graph to bring it as close as possible to the target one. Finally, it includes an evaluation platform which allows users to generate a catalog of service descriptions, a set of query graphs and the corresponding matches with their distance measures. This prototype is available as a Web service and was demonstrated at the last EDBT conference.
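
As a rough illustration of turning behavioral comparison into a distance computation, the following sketch compares two conversation protocols given as sets of transitions. It is a greedy, set-based approximation that ignores graph structure, not the actual matching algorithm, and the `sim` function merely stands in for the semantic distance measure.

```python
def protocol_distance(query, target, sim, threshold=0.8):
    """Approximate distance between two conversation protocols, each
    given as a set of transitions (state, message, next_state).

    Transitions whose messages are semantically close (per `sim`) are
    matched greedily; every unmatched transition on either side counts
    as one edit operation (delete or insert)."""
    unmatched = set(target)
    cost = 0.0
    for (_, msg, _) in query:
        best, best_score = None, 0.0
        for cand in unmatched:
            score = sim(msg, cand[1])
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            unmatched.discard(best)
            cost += 1.0 - best_score     # substitution cost
        else:
            cost += 1.0                  # query transition must be deleted
    return cost + len(unmatched)         # target transitions to insert

# With exact-match similarity, the distance counts differing transitions:
exact = lambda a, b: 1.0 if a == b else 0.0
q = {(0, "login", 1), (1, "search", 2)}
t = {(0, "login", 1), (1, "search", 2), (2, "pay", 3)}
d = protocol_distance(q, t, exact)   # 1.0: one missing "pay" transition
```

A partial match is then simply a target whose distance is small but nonzero, and the unmatched transitions suggest which edit operations would align the two graphs.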

This work has been partially funded by a grant from the Alban Program (Europe-Latin America cooperation).

Quality Evaluation

Overview

Information quality (IQ) is becoming a strategic issue in modern ISs, in particular in data integration systems (mediation systems, peer-to-peer systems, data warehouses), which provide access to large amounts of data from alternative sources. In such systems, making decisions without information about IQ is highly risky. Several surveys and empirical studies have shown the importance of quality in IS design, in particular for multi-source ISs.

IQ problems have been reported as critical in several scientific and social areas such as environment, genetics, economics, statistics, data mining and Web ISs. Major governmental and commercial organizations have invested heavily in IQ projects, in particular in quality standardization, formalization and evaluation (e.g. the Data Quality Act of the US Administration in 2001, the Evaluation Framework for Data Quality of the IMF in 2003, the Data Quality Objectives Process of the US Environmental Protection Agency in 2006, the French IGN Quality Approach in 2008). Such strategic investments open the door to treating data production processes like any other production processes, which, as such, should be certified by recognized assessment procedures.

IQ is a multidimensional notion, described through multiple factors such as freshness, accuracy, completeness and consistency. IQ can be assessed directly, by comparing the data to reference patterns, or indirectly, through the data production process.

There are two commonly agreed ways to ensure IQ: auditing and curating. Both are usually carried out by human experts, with major drawbacks in scalability and cost: manually browsing and correcting thousands of data items is beyond the reach of many IT departments. Ad hoc and fragmented approaches have been proposed for specific application domains (e.g. DSS, CRM, e-business) or for specific data transformations (e.g. format homogenization, address standardization, duplicate elimination, record linkage).

Our approach goes a step further and proposes a reasoning model and a quality toolbox which support the low-cost development of evolving quality decision support systems. The two are complementary: the former provides a generic framework to evaluate a large class of quality factors, while the latter provides a set of generic services for integrating several quality evaluation tools into a uniform decision support system devoted to quality assessment.

IQ evaluation framework

The generic framework consists of a process-oriented quality model defined as an adorned business process model. The business process is represented as a workflow diagram, and the adornment as a set of quality annotations on the nodes and edges of this diagram. These annotations result from the analysis of the main dimensions that influence quality evaluation (e.g. nature of the data, type of system architecture, synchronization policy). The whole quality model is then represented as a DAG, over which two quality evaluation processes are defined.

The forward evaluation process propagates actual quality values from the data sources up to the end users. The backward evaluation process propagates expected quality values from the end users down to the data sources or services. Both processes traverse the DAG and spread quality values over nodes and edges according to quality composition rules and quality decomposition rules, respectively. Both are equipped with a decision procedure which highlights the critical paths in the quality DAG. The model also provides a set of restructuring operations to support business process restructuring in order to meet given quality requirements.
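
To illustrate the forward process for a single factor, the following sketch propagates freshness (staleness, e.g. in hours) from sources to consumers over a small quality DAG. The function name, the example DAG and the composition rule (a node's staleness is the worst staleness of its inputs plus its own processing delay) are illustrative assumptions; the framework lets each quality factor define its own composition rules.

```python
from graphlib import TopologicalSorter

def forward_freshness(deps, source_freshness, delay):
    """Forward evaluation sketch for one quality factor (freshness).

    `deps` maps each non-source node of the quality DAG to the nodes it
    consumes from; `source_freshness` gives the actual values measured
    at the sources; `delay` gives each node's processing delay."""
    freshness = dict(source_freshness)
    for node in TopologicalSorter(deps).static_order():
        if node in freshness:
            continue                     # a source: actual value is given
        # Composition rule: worst input staleness plus local delay.
        freshness[node] = (max(freshness[p] for p in deps[node])
                           + delay.get(node, 0))
    return freshness

# Two sources feed an extraction step, which feeds a report.
deps = {"extract": ["src1", "src2"], "report": ["extract"]}
out = forward_freshness(deps, {"src1": 2, "src2": 5},
                        {"extract": 1, "report": 1})
# out["report"] == 7: the slow source dominates the whole path.
```

The backward process would run the same traversal in reverse, decomposing the end users' expected values into requirements on each source; comparing both value sets is what exposes the critical paths.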

The model has been implemented, and its genericity has been tested with two major quality factors, freshness and accuracy, applied to the evaluation of three business processes. We have also adapted the model to evaluate the quality of Web services rather than integrated data.

Quality toolbox

Each application domain has its own vision of data quality, as well as a suite of (generally ad hoc) solutions to quality problems. However, there is growing interest in reusing quality knowledge and measurement methods, and in combining several quality evaluation tools that address different facets of data and process quality. To this end, we have defined a metadata platform devoted to quality tool integration and to the collection and consolidation of quality values.

The Qbox-Foundation platform is the cornerstone of a more complete toolset defined in the Quadris project. The platform is based on a quality metamodel which refines the Goal-Question-Metric and DWQ quality models defined earlier. Qbox-Foundation provides an extensible collection of reusable measurement methods, supports their instantiation and automates their execution. Its evaluation is planned for the third phase of the Quadris project, using real quality requirements provided by EDF and the Institut Curie.

The research on this topic is partially supported by the French ANR agency under its ARA program on data masses (Quadris project, ARA Masses de Données, 2005-2009), and partially by the QESID project within the STIC-AMSUD program (France-Latin America cooperation).

AMIS Team Presentation (old)

Advanced Modelling of Adaptive Information Systems

Strong requirements are placed on modern information systems (IS): they must provide high flexibility in their design, deployment and usage, and guarantee a high level of quality for the services they supply and the data they deliver to end users. Resource discovery and integration, application personalization, and quality evaluation are the main research techniques that address these issues; they correspond to the research topics of the AMIS team.

Members

Leader: Mokrane Bouzeghoub (Pr, UVSQ)
Permanent members: Daniela Grigori (MC, UVSQ), Zoubida Kedad (MC, UVSQ), Stéphane Lopes (MC, UVSQ)
Ongoing PhD students: Sofiane Abbar, Lorena Etchevery, Ahmed Gater, Fernando Lemos

Research Projects

  • AOC: complex object matching; ANR/Content; 2008-2011; IRIT, IRISA, LIRIS, LIESP.
  • QESID: data evolution and quality; STIC-AMSUD (France-Latin America cooperation); 2007-2009; U. Marseille, U. de la República (UY), Federal U. of Pernambuco (BR), Federal U. of Ceará (BR).
  • UUP: user profile management; industrial contract; 2007-2008; ALCATEL-LUCENT.
  • QUADRIS: data quality in information systems; ACI Masses de Données (ANR); 2005-2009; IRISA, LSIS, CEDRIC, EDF/DER, Institut Curie.
  • APMD: personalized access to data masses; ACI Masses de Données (ANR); 2004-2007; CLIPS/LIG, LIRIS, IRIT, LINA, IRISA.
  • AS-MDA: model-driven engineering; Action Spécifique CNRS; 2004-2005; LINA, LIFL, IRISA, IMAG/LSR, CEA, U. Nice.
  • AS-Personnalisation: personalization; Action Spécifique CNRS; 2003-2004; IRIT, LINA, IRISA, LIRIS, IMAG/CLIPS.
  • MEDIAGRID; ACI GRID (Ministère de la Recherche); 2002-2004; LSR, LaMI.

Publications

A list of publications of the AMIS team can be obtained from the HAL publication service.

AMIS Team Presentation

Advanced Modelling of Adaptive Information Systems

Strong requirements are placed on modern information systems (IS): they must provide high flexibility in their design, deployment and usage, and guarantee a high level of quality for the services they supply and the data they deliver to end users. Resource discovery and integration, application personalization, and quality evaluation are the main research techniques that address these issues; they correspond to the research topics of the AMIS team.

Members

Leader: Béatrice Finance (MC, HdR, UVSQ)
Permanent members: Mokrane Bouzeghoub (Pr, UVSQ; Deputy Scientific Director of INS2I at CNRS), Daniela Grigori (Pr, Dauphine), Zoubida Kedad (MC, HdR, UVSQ), Stéphane Lopes (MC, UVSQ)
Ongoing PhD students: Ahmed Gater, Fernando Lemos, Reda Bouadjenek, Kim Tam Huynh, Hanane Ouksili

Research Projects

  • AOC: complex object matching; ANR/Content; 2008-2011; IRIT, IRISA, LIRIS, LIESP.
  • QESID: data evolution and quality; STIC-AMSUD (France-Latin America cooperation); 2007-2009; U. Marseille, U. de la República (UY), Federal U. of Pernambuco (BR), Federal U. of Ceará (BR).
  • UUP: user profile management; industrial contract; 2007-2008; ALCATEL-LUCENT.
  • QUADRIS: data quality in information systems; ACI Masses de Données (ANR); 2005-2009; IRISA, LSIS, CEDRIC, EDF/DER, Institut Curie.
  • APMD: personalized access to data masses; ACI Masses de Données (ANR); 2004-2007; CLIPS/LIG, LIRIS, IRIT, LINA, IRISA.
  • AS-MDA: model-driven engineering; Action Spécifique CNRS; 2004-2005; LINA, LIFL, IRISA, IMAG/LSR, CEA, U. Nice.
  • AS-Personnalisation: personalization; Action Spécifique CNRS; 2003-2004; IRIT, LINA, IRISA, LIRIS, IMAG/CLIPS.
  • MEDIAGRID; ACI GRID (Ministère de la Recherche); 2002-2004; LSR, LaMI.

Publications

A list of publications of the AMIS team can be obtained from the HAL publication service.
