Archive for the ‘Query Engine’ Category

FQL: A Functorial Query Language

Thursday, March 6th, 2014

FQL: A Functorial Query Language

From the webpage:

The FQL IDE is a visual schema mapping tool for developing FQL programs. It can run FQL programs, generate SQL from FQL, generate FQL from SQL, and generate FQL from schema correspondences. Using JDBC, it can run transparently using an external SQL engine and on external database instances. It can output RDF/OWL/XML and comes with many built-in examples. David Spivak and Ryan Wisnesky are the primary contributors. Requires Java 7.

As if FQL and the IDE weren’t enough, papers, slides, and source code await you.

I first saw this in a tweet by Computer Science.

Apache Drill

Sunday, May 19th, 2013

Michael Hausenblas gives a great lecture on Apache Drill at NoSQL Matters 2013.

Slides.

Google’s Dremel Paper

The talk projects a “beta” for Apache Drill by the second quarter and GA by the end of the year.

Apache Drill User.

From the rationale:

There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (eg, JSON, Avro, Protocol Buffers). This need was identified by Google and addressed internally with a system called Dremel.

How do you handle ad hoc exploration of data sets as part of planning a topic map?

Being able to “test” merging against data prior to implementation sounds like a good idea.

Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Multiple Relations

Tuesday, January 29th, 2013

Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Multiple Relations by Arnab Bhattacharya and B. Palvali Teja.
(Submitted on 28 Jun 2012)

Abstract:

The multi-criteria decision making, which is possible with the advent of skyline queries, has been applied in many areas. Though most of the existing research is concerned with only a single relation, several real world applications require finding the skyline set of records over multiple relations. Consequently, the join operation over skylines where the preferences are local to each relation, has been proposed. In many of those cases, however, the join often involves performing aggregate operations among some of the attributes from the different relations. In this paper, we introduce such queries as “aggregate skyline join queries”. Since the naive algorithm is impractical, we propose three algorithms to efficiently process such queries. The algorithms utilize certain properties of skyline sets, and processes the skylines as much as possible locally before computing the join. Experiments with real and synthetic datasets exhibit the practicality and scalability of the algorithms with respect to the cardinality and dimensionality of the relations.

The authors illustrate a “skyline” query with a search for a hotel that has a good price and is close to the beach. A “skyline” set of hotels excludes any hotel that another hotel matches or beats on both points and strictly beats on at least one. They then observe:

In real applications, however, there often exists a scenario when a single relation is not sufficient for the application, and the skyline needs to be computed over multiple relations [16]. For example, consider a flight database. A person traveling from city A to city B may use stopovers, but may still be interested in flights that are cheaper, have a less overall journey time, better ratings and more amenities. In this case, a single relation specifying all direct flights from A to B may not suffice or may not even exist. The join of multiple relations consisting of flights starting from A and those ending at B needs to be processed before computing the preferences.

The above problem becomes even more complex if the person is interested in the travel plan that optimizes both on the total cost as well as the total journey time for the two flights (other than the ratings and amenities of each airline). In essence, the skyline now needs to be computed on attributes that have been aggregated from multiple relations in addition to attributes whose preferences are local within each relation. The common aggregate operations are sum, average, minimum, maximum, etc.
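To make the hotel and flight examples concrete, here is a minimal Python sketch. It is the naive approach (join first, then compute the skyline) that the abstract calls impractical, not one of the paper’s three algorithms. All hotel and flight data are invented for illustration, lower values are assumed to be better on every attribute, and only the aggregated attributes (total cost, total hours) are considered.

```python
from itertools import product

def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly
    better on at least one (lower values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(rows, key):
    """Keep the rows whose attribute vector is not dominated by any other row."""
    return [r for r in rows if not any(dominates(key(o), key(r)) for o in rows)]

# Hypothetical hotels: (name, price, distance to the beach in km).
hotels = [("H1", 100, 2.0), ("H2", 150, 0.5), ("H3", 160, 1.0), ("H4", 90, 3.0)]
hotel_skyline = skyline(hotels, key=lambda h: (h[1], h[2]))

# Hypothetical flight relations: legs leaving A and legs arriving at B,
# each as (flight, stopover city, cost, hours).
first_legs = [("F1", "C", 200, 3.0), ("F2", "C", 150, 5.0), ("F3", "D", 300, 2.0)]
second_legs = [("G1", "C", 100, 2.0), ("G2", "D", 120, 1.5), ("G3", "C", 250, 2.5)]

# Naive aggregate skyline join: join on the stopover city, aggregate
# cost and hours with sum, then take the skyline of the joined plans.
plans = [
    (f[0], g[0], f[2] + g[2], f[3] + g[3])
    for f, g in product(first_legs, second_legs)
    if f[1] == g[1]
]
plan_skyline = skyline(plans, key=lambda p: (p[2], p[3]))

print(hotel_skyline)
print(plan_skyline)
```

The join materializes every matching pair of legs before any pruning happens, which is why the abstract prefers processing the skylines locally as much as possible before computing the join.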

No doubt the travel industry thinks it has conquered semantic diversity in travel arrangements. If it has, it did so after I stopped traveling several years ago.

Even simple tasks such as coordinating air and train schedules were unnecessarily difficult.

I suspect that is still the case, so I mention “skyline” queries as a topic to be aware of and, if necessary, to include in a topic map application that brings sanity to travel arrangements.

True, you can get a travel service that handles all the details, but only for a price and only if you are that trusting.

Google open sources Supersonic query engine

Wednesday, October 17th, 2012

Google open sources Supersonic query engine

From the post:

Google has released Supersonic, a query engine designed to work efficiently with column-oriented databases. The announcement suggests that Supersonic would be “extremely useful for creating a column oriented database back-end”, and that it aims to offer “second-to-none execution times”. As part of achieving that design goal, the C++ library uses many low-level, cache-aware optimisations, SIMD instructions and vectorised execution so that it can make the best use of modern pipelined CPUs, while still working as a single process.

Supersonic can perform “Operations” on columnar data such as Compute, Filter, Sort, HashJoin, and more; on views these operations can be chained together to produce a final result. Data for these operations is currently held in memory; there is no current built-in data storage format, but the developers say that there is “a strong intention of developing one”. Other work in progress includes the provision of wide test coverage for the library. A tarball archive of the code is available to download, while the source can be git cloned from the Google Code project pages.
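For readers unfamiliar with chaining operations over columnar data, a rough Python sketch of the idea follows. It is not Supersonic’s C++ API: the function names `filter_rows`, `hash_join` and `sort_by`, the dict-of-lists table layout, and the sample data are all assumptions made for illustration, and none of the cache-aware or SIMD optimisations are modelled.

```python
# A "table" here is a dict of equally long lists, one list per column.
orders = {"customer_id": [1, 2, 1, 3], "amount": [40.0, 15.0, 25.0, 60.0]}
customers = {"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]}

def filter_rows(table, column, predicate):
    """Keep the rows whose value in `column` satisfies `predicate`."""
    keep = [i for i, v in enumerate(table[column]) if predicate(v)]
    return {c: [vals[i] for i in keep] for c, vals in table.items()}

def hash_join(left, right, key):
    """Inner join on `key`: build a hash index on `right`, probe with `left`.
    Assumes the two tables share no column names other than `key`."""
    index = {}
    for j, k in enumerate(right[key]):
        index.setdefault(k, []).append(j)
    right_cols = [c for c in right if c != key]
    out = {c: [] for c in list(left) + right_cols}
    for i, k in enumerate(left[key]):
        for j in index.get(k, []):
            for c in left:
                out[c].append(left[c][i])
            for c in right_cols:
                out[c].append(right[c][j])
    return out

def sort_by(table, column):
    """Reorder every column by ascending values of `column`."""
    order = sorted(range(len(table[column])), key=table[column].__getitem__)
    return {c: [vals[i] for i in order] for c, vals in table.items()}

# Chain the operations: filter, then join, then sort.
result = sort_by(
    hash_join(filter_rows(orders, "amount", lambda a: a >= 20), customers, "customer_id"),
    "amount",
)
print(result)
```

Each step takes a table and returns a table, which is what lets the operations be composed into a single pipeline.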

Do you ever wonder what “secret” software must be like to have packages like this in open source?