Abstract
A data warehouse is a repository of integrated information that collects and maintains a large amount of data from multiple distributed, autonomous, and possibly heterogeneous data sources. The data are often stored as materialized views to provide fast access to the integrated data. Keeping the warehouse data completely consistent with the remote source data is a challenging problem in a distributed environment, and transactions containing multiple updates at one or more sources complicate it further. Because a data warehouse usually contains a very large amount of data and processing it is time consuming, introducing parallelism into data warehousing becomes inevitable. The popularity and cost-effective parallelism of the PC cluster make it a promising platform for this purpose. This article considers the complete consistency maintenance of select-project-join (SPJ) materialized views. Several parallel maintenance algorithms for the materialized views are presented for a PC cluster consisting of K personal computers. The key to the proposed algorithms is how to balance the workload among the PCs and how to balance the communication cost among the PCs as well as between the PC cluster and the remote sources.
Original language | English |
---|---|
Pages (from-to) | 147-154 |
Number of pages | 8 |
Journal | International Journal of Parallel and Distributed Systems and Networks |
Volume | 5 |
Issue number | 4 |
Publication status | Published - 2002 |