Research Areas
    Activities
   Conferences Hosted by SoC
  
Professional Services
    Research Funding
    Research Publications
    Internship
    Useful Links

 

  Home > Research
   
  Database Management System
  The Database Management Systems (DBMS) Research Lab in the School of Computing has a strong group of faculty members and students working on many diverse data management issues, ranging from core database research such as storage structure, query processing and optimisation, and indexing, to advanced database applications such as XML, data mining, data warehousing and peer-to-peer computing.

Our research interests in database management systems are:
  • Database security
  • Distributed systems for standing subscriptions
  • Information integration
  • Privacy and anonymity in databases
  • Semi-structured data repository and query processing
  • Skyline query processing
  • XML query evaluation and estimation
  • Mobile databases
  • Peer-to-peer database management
  • Peer-to-peer technologies for distributed applications
  • Query processing in peer-to-peer (P2P) systems
  • Schema integration and schematic discrepancy
  • Sensor network
  • Spatio-temporal data mining

Database Security

We started working in Database Security in 2001. Security is critical to any computing system. We aim to focus on designing techniques to protect data/ databases. We have focused on several aspects over the years, ranging from storage design (steganographic file systems) to securing data streams. In the past two years, we have studied data authentication in outsourced databases. Here, we have focused on the completeness issue which has not been previously addressed in the literature. Essentially, we have designed a scheme that enables users to verify that results obtained from third party publishers are complete without violating user access control. We are also the first to look into verifying multi-dimensional data.

Distributed Systems for Standing Subscriptions

This project started in January 2006, and is funded by the University Research Committee (URC). The goal of this project is to study efficient techniques to disseminate data in a distributed publish/subscribe environment.

To date, we have achieved the following:

a) We have designed a publish/subscribe system to facilitate dissemination of streaming data where the subscription has a coherency constraint. Our proposed method is also adaptive to runtime changes in the environment, and can adapt the dissemination tree dynamically at runtime to optimise performance.

b) We have also developed a novel architecture for scaleable distributed stream processing. Our system provides services to evaluate a large number of continuous queries for a large number of clients. The query layer is built over a distributed publish/ subscribe architecture. We have also proposed a solution to distribute queries in the query layer.

c) We have developed an efficient approach for filtering and disseminating fragmented XML documents in a publish/subscribe environment. The approach consists of novel query scheduling and optimisation algorithms that exploit the semantics of fragmented data to minimise unnecessary and redundant processing overhead.

d) We have developed a new optimisation technique that enables downstream routers to leverage the processing done by upstream routers to reduce response time. This is achieved by dynamically piggybacking useful processing hints along with the XML data.

Information Integration

The objective of this research is the design of strategies, techniques and tools for the integration and management of multi-modal and multimedia information from distributed, heterogeneous and autonomous sources. The approach is primarily grounded in database research. As such, it emphasises various ideas such as declarative query languages, conceptual data models, adaptive processing and optimisation of programs and queries, and the consideration of large numbers of users and large collections of data. Currently, the research focuses on the following themes: XML data management and processing, progressive processing of disparate information (stream processing), business process definitions management and integration, and algorithms for web information (clustering).

The research findings concern the semantic integration of disparate information sources, in particular, the integration and management of XML data. The integration and management of multi-modal information also concern textual and geographical information indexing, retrieval and extraction, as well as business processes management.

Our research achievements include:

  • The first geographical search engine Global Atlas
  • The X007 benchmark
  • The iMinMax indexing technique for multidimensional data

Privacy and Anonymity in Databases

Organisations, such as government agencies, insurance companies and hospitals, need to release detailed data (e.g., medical records, financial data, etc.) for research and other public benefit purposes. However, sensitive personal information (e.g., disease of a specific person) may be revealed in this process, despite the hiding of identifying attributes, such as name or social security number. This is due to the existence of quasi-identifiers (QI), which are sets of attributes (e.g., ZIP, DateOfBirth, Sex) that can be used in conjunction with information obtained from a variety of sources (e.g., public voting registration data) to reveal the identity of individuals whose data is residing on the records. Indeed, a well-known study has found that 87% of the US population can be uniquely identified by the combination of ZIP, DateOfBirth, Sex. As more data is made available online, greater concern over privacy may be expected. Recent research has proposed k-Anonymity and l-Diversity to solve this problem. Both approaches involve generalisation (e.g., showing the area code instead of the exact phone number) or suppression (completely hiding some data), which inadvertently causes information loss.

Our research focuses on:

a) Developing a framework for efficient data transformations (i.e., k-Anonymity and l-Diversity), which minimises information loss (for the given k or l ). Most existing research employs many-to-one mappings to compute anonymised versions of original multi-dimensional data. We focus on more flexible many-to-many multi-dimensional mappings. Although it has been acknowledged that this method may achieve lower information loss, it has not received much attention because it is computationally expensive. To reduce computation cost, we use dimensionality reduction techniques and study efficient approximation algorithms in one-dimensional space.

b) Studying the dual problem: Given the maximum acceptable information loss, we seek to find a transformation that maximises privacy (i.e., k or l). Although this problem is significant in practice, it has not been studied before.

c) Extending our framework (i.e., anonymisation via dimensionality reduction) to related applications. Examples include location-based services (e.g., online maps), where the location of the querying user (rather than the data) must be anonymised, or stream data (e.g., data from sensors, stock market, etc.).

Semi-structured Data Repository and Query Processing

There has been increased interest in semi-structured data recently with the introduction of XML and related languages and technologies. Semi-structured data is claimed to be self-describing, making it important to define a schema for the data. The data models that have been proposed specifically for semi-structured data, (e.g. DOM, Dataguides, etc.), capture limited semantics in the data, and do not model the semantics expressed in the schema.

Although XML documents could have rather complex internal structures, they can generally be modelled as ordered trees. In most XML query languages, the structures of XML documents are expressed by twig patterns while the values of XML elements are used as part of selection predicates. Efficiently matching all twig patterns in an XML database is a major concern of XML query processing. Among them, the holistic twig join approach has been taken as an efficient way to match twig patterns through reducing intermediate results.

The project has so far achieved the following:

a) We have defined a data model called ORASS that captures richer semantics. Using this semantically rich data model, we have investigated and experimented with more efficient storage mechanisms, identified valid views of the base data, defined efficient view maintenance algorithms, and proposed methods for efficiently querying a semistructured data repository.

b) We have extended the existing Dewey node labelling technique and used it to improve twig pattern query processing.

c) We have developed new node labelling techniques to label dynamic XML data without re-labelling any node labels when the XML data is updated. Prior node labelling techniques used in twig pattern processing are designed for static XML data.

Skyline Query Processing

This project started in January 2005, and is funded by the Faculty Research Committee (FRC). The goal of this project is to investigate efficient techniques to process an important class of qualitative preference queries called skyline queries. A skyline query returns a set of records such that each returned record is not dominated by other records with respect to a userspecified set of attributes.

The project has so far achieved the following:

a) We have developed a novel framework and several scalable algorithms to efficiently process a generalised class of skyline queries that support data attributes with partially ordered domains.

b) We have introduced two novel notions of interesting skyline points (frequent skylines and k-dominant skylines), and developed efficient algorithms to compute them. These concepts are particularly important for high-dimensional data points in enabling a meaningful ranking of large sets of skyline query results.

XML Query Evaluation and Estimation

With the fast-growing use of XML data on the Web, optimising XML queries has become one of the most active and exciting research areas. Developments in query processing and selectivity estimations of XML data are among the major issues since they determine data access methods and the best possible execution plans for complex XML queries.

Our research in this area aims to develop efficient approaches for query evaluation and selectivity estimations of XML queries. We examine how path information in XML data can be utilised to speed up structural join, which is the core operation in XML query processing. The proposed solution comprises a path-based node labelling scheme and a path join algorithm. The path-based approach is also efficient for an important class of XML queries involving structural join with not-predicates.

We have also designed several methods for XML selectivity estimations. These include a compact statistical method, a histogram-based method for skewed XML data, and a path-based method for estimating the selectivity of XML queries with or without order-based axes.


The faculty members involved in database management systems are:

  • BRESSAN Stephane
  • CHAN Chee Yong
  • HSU Wynne
  • KALNIS Panagiotis
  • LEE Mong Li
  • LING Tok Wang
  • OOI Beng Chin
  • TAN Kian Lee
  • TUNG Kum Hoe, Anthony


© Copyright 2001-08 National University of Singapore. All Rights Reserved