Generating query-answering plans for data integration systems requires translating a user query, formulated in terms of a mediated schema, into a query that uses relations that are actually stored in data sources. Previous solutions to the translation problem produced sets of conjunctive plans, and w
Query containment for data integration systems
Scribed by Todd Millstein; Alon Halevy; Marc Friedman
- Publisher: Elsevier Science
- Year: 2003
- Language: English
- File size: 213 KB
- Volume: 66
- Category: Article
- ISSN: 0022-0000
Synopsis
The problem of query containment is fundamental to many aspects of database systems, including query optimization, determining independence of queries from updates, and rewriting queries using views. In the data-integration framework, however, the standard notion of query containment does not suffice. We define relative containment, which formalizes the notion of query containment relative to the sources available to the data-integration system. First, we provide optimal bounds for relative containment for several important classes of datalog queries, including the common case of conjunctive queries. Next, we provide bounds for the case when sources enforce access restrictions in the form of binding pattern constraints. Surprisingly, we show that relative containment for conjunctive queries is still decidable in this case, even though it is known that finding all answers to such queries may require a recursive datalog program over the sources. Finally, we provide tight bounds for variants of relative containment when the queries and source descriptions may contain comparison predicates.
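For readers unfamiliar with the standard notion of query containment that the synopsis takes as its starting point, the following is a minimal sketch of the classical containment-mapping test for conjunctive queries. It is not the relative-containment procedure of the article, which additionally takes the available sources into account. The query encoding, the function names (`is_var`, `extend`, `contained_in`), and the convention that variables start with an uppercase letter are all assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (predicate name, argument terms)
Query = Tuple[Tuple[str, ...], List[Atom]]  # (head terms, body atoms)


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter,
    # anything else is a constant.
    return term[:1].isupper()


def extend(mapping: Dict[str, str],
           src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend `mapping` so that the terms of `src` map onto `dst`, or return None."""
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            # A variable must map consistently to a single target term.
            if new.setdefault(s, d) != d:
                return None
        elif s != d:
            # A constant can only map to itself.
            return None
    return new


def contained_in(q1: Query, q2: Query) -> bool:
    """True if q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)  # the head of q2 must map onto the head of q1
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        # Backtracking search: map each body atom of q2 to some body atom of q1.
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 != pred:
                continue
            nxt = extend(mapping, args, args1)
            if nxt is not None and search(i + 1, nxt):
                return True
        return False

    return search(0, start)


# q1(X) :- e(X, Y), e(Y, X).    q2(X) :- e(X, Y).
q1: Query = (("X",), [("e", ("X", "Y")), ("e", ("Y", "X"))])
q2: Query = (("X",), [("e", ("X", "Y"))])
print(contained_in(q1, q2), contained_in(q2, q1))
```

In this example `q1` imposes an extra condition on top of `q2`, so every answer to `q1` is also an answer to `q2`, but not the other way around. The search is exponential in the worst case, which is consistent with the NP-completeness of conjunctive query containment.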
SIMILAR VOLUMES
In this paper we present the design of two essential components for the spatial querying system we have been developing. The overall system architecture utilizes multiple levels of agents to process external sources of spatial data. Upon a user query, agents are spawned to mine various web sources,
A data integration system provides the user with a unified view, called global schema, of the data residing at different sources. Users issue their queries against the global schema, and the system computes answers to queries by suitably accessing the sources, through the mapping, i.e., the specific
In distributed environments, replication of data provides improved availability, isolation between workloads with different characteristics, and improved performance through local access to data. The "real data" is server resident and by "local data" we refer to cached client data. We examine which