DIM Team Presentation

Data Integration and Mining

New forms of information production and usage strongly influence research directions in the database area. Multi-source data are becoming more common, data volumes keep growing, and exploiting them increasingly requires reasoning and mining to produce intelligible knowledge. An important part of the data corresponds to temporal or spatiotemporal events. This leads, on the one hand, to considering the semantics of time and space in data models and, on the other hand, to integrating specific techniques for data stream processing.

In this context, the DIM team is interested in the problems of multi-source data access and of mining data coming from the web or from temporal (RSS, sensors) or spatiotemporal (moving objects) flows. Its research interests range from logical models, languages and algorithms in databases and data mining to physical data access optimizations.

Research Topics

  • Models for data and knowledge integration: multi-valued logic, conceptual graphs, algebraic models. Challenges: semantics of spatiotemporal location, imperfect and multi-granular knowledge.

  • Data models and query languages: object-relational, RDF, extended SQL, XML, XQuery, SPARQL, CQL, etc. Challenges: expressiveness, continuous spatiotemporal queries.

  • Optimization: specialized access methods, algorithms, query evaluation and optimization, architecture design. Challenges: identification of new problems, adaptation of operators (aggregates and multi-criteria), cache management, analytical and experimental evaluations, simulation, benchmarking, robustness with respect to flows.

  • Temporal and spatiotemporal data mining: approaches and algorithms for discovering temporal and spatiotemporal patterns, evaluation and optimization of data processing. Challenges: preprocessing, similarity, clustering, profiling, scalability, interestingness measures, adaptation to flows, privacy.

Outlook

  • Multi-source integration: querying "heterogeneous" flows (modeling, languages, continuous/temporal queries, optimization).

  • Multi-source querying that adapts to local geographic context.

  • Data mining: segmentation/summarization of temporal traces (GPS logs or other traces).

  • Combining data integration and mining: context-awareness and mobility prediction.

Former Projects

  • IST Satine: Semantic-based Interoperability Infrastructure for Integrating Web Service Platforms to Peer-to-Peer Networks
  • CEC HEARTS: Health Effects and Risks of Transport Systems
  • ACI SemWeb: Semantic Mediation of Web Sources based on XQuery and more
  • ANR WebContent: Framework for content management and integration of semantic web techniques
  • ANR PlugDB: Design and experimentation of technologies allowing ubiquitous and secure management of personal data
  • ANR WebStand: Data integration platform for social sciences

Prototypes

  • XLive
    XLive is a lightweight XQuery mediator. It handles a subset of XQuery: the basic for-where-return expression, with nested queries allowed at any level in the where clause, so that nested XML can be composed from the data sources. Multiple data sources can be queried through wrappers and adapters. The current version supports the relational DBMSs Oracle 9i and MySQL and the native XML DBMSs Xyleme, XHive and eXist. To be integrated, a source must conform to a Java XQuery API or to a simple Web service API.

  • Hospital-Records Classifier
    Classification software for Hospital Records (HR): it automatically assigns one or more ICD (International Classification of Diseases) codes to a given HR.
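
The mediator-and-wrapper organization described for XLive can be illustrated with a minimal sketch. It is not XLive's code or API: the names ListWrapper, Mediator, scan, register and for_where_return are hypothetical, the sources are in-memory lists rather than DBMSs, and Python stands in for XQuery. It only shows how a for-where-return request with a nested query can compose nested results from several sources.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


class ListWrapper:
    """Hypothetical wrapper exposing an in-memory list as a data source."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows

    def scan(self) -> Iterator[Record]:
        # A real wrapper would translate the request for its own source
        # (a relational DBMS or a native XML DBMS, for example).
        return iter(self._rows)


class Mediator:
    """Hypothetical mediator routing for-where-return requests to wrappers."""

    def __init__(self) -> None:
        self._sources: Dict[str, ListWrapper] = {}

    def register(self, name: str, wrapper: ListWrapper) -> None:
        self._sources[name] = wrapper

    def for_where_return(
        self,
        source: str,
        where: Callable[[Record], bool],
        ret: Callable[[Record], object],
    ) -> List[object]:
        # for: iterate over the source; where: filter; return: build the result.
        return [ret(row) for row in self._sources[source].scan() if where(row)]


mediator = Mediator()
mediator.register("authors", ListWrapper([
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"},
]))
mediator.register("books", ListWrapper([
    {"author": 1, "title": "Notes"},
    {"author": 1, "title": "Letters"},
]))


def books_of(author_id: object) -> List[object]:
    # Nested query against a second source.
    return mediator.for_where_return(
        "books",
        where=lambda b: b["author"] == author_id,
        ret=lambda b: b["title"],
    )


# Keep only authors that have at least one book (nested query in the where
# clause) and return each author with the nested list of titles.
result = mediator.for_where_return(
    "authors",
    where=lambda a: len(books_of(a["id"])) > 0,
    ret=lambda a: {"author": a["name"], "books": books_of(a["id"])},
)
print(result)
```

In this sketch a new kind of source only needs to provide a scan method, which loosely mirrors the idea that a source is integrated once it conforms to an agreed API.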


 

Partnerships

  • Ongoing partnership with Altova.
