Archive for the ‘Benchmarks’ Category

XDGBench: 3rd party benchmark results against graph databases [some graph databases]

Tuesday, April 30th, 2013

XDGBench: 3rd party benchmark results against graph databases by Luca Garulli.

From the post:

Toyotaro Suzumura and Miyuru Dayarathna from the Department of Computer Science of the Tokyo Institute of Technology and IBM Research published an interesting research about a benchmark between Graph Databases in the Clouds called:

XGDBench: A Benchmarking Platform for Graph Stores in Exascale Clouds”

This research conducts a performance evaluation of four famous graph data stores AllegroGraph, Fuseki, Neo4j, an OrientDB using XGDBench on Tsubame 2.0 HPC cloud environment. XGDBench is an extension of famous Yahoo! Cloud Serving Benchmark (YCSB).

OrientDB is the faster Graph Database among the 4 products tested. In particular OrientDB is about 10x faster (!) than Neo4j in all the tests.

Look at the Presentation (25 slides) and Research PDF.

Researchers are free to pick any software packages for comparison but the selection here struck me as odd before reading a comment on the original post asking for ObjectivityDB be added to the comparison.

For that matter, where are GraphChi, Infinite Graph, Dex, Titan, FlockDB? Just to call a few of the other potential candidates out.

Will be interesting when a non-winner on such a benchmark cites it for the proposition that easy of use, reliability, lower TOC outweighs brute speed in a benchmark test.

Yahoo! Cloud Serving Benchmark

Friday, April 19th, 2013

Yahoo! Cloud Serving Benchmark

From the webpage:

With the many new serving databases available including Sherpa, BigTable, Azure and many more, it can be difficult to decide which system is right for your application, partially because the features differ between systems, and partially because there is not an easy way to compare the performance of one system versus another.

The goal of the Yahoo! Cloud Serving Benchmark (YCSB) project is to develop a framework and common set of workloads for evaluating the performance of different “key-value” and “cloud” serving stores.

The project comprises two areas:

  • The YCSB Client, an extensible workload generator
  • The Core workloads, a set of workload scenarios to be executed by the generator

Although the core workloads provide a well-rounded picture of a system’s performance, the Client is extensible so that you can define new and different workloads to examine system aspects, or application scenarios, not adequately covered by the core workload. Similarly, the Client is extensible to support benchmarking different databases. Although we include sample code for benchmarking HBase and Cassandra, it is straightforward to write a new interface layer to benchmark your favorite database.

A common use of the tool is to benchmark multiple systems and compare them. For example, you can install multiple systems on the same hardware configuration, and run the same workloads against each system. Then you can plot the performance of each system (for example, as latency versus throughput curves) to see when one system does better than another.

The Yahoo! Cloud Serving Benchmark (YCSB) doesn’t get discussed in the video and only briefly in the paper: How to Compare NoSQL Databases.

YCSB source code and Benchmarking Cloud Serving Systems with YCSB may be helpful.

Performance of databases depend upon your point of view, benchmarks and their application and no doubt other causes as well.

Would make an interesting topic map project to make create a comparison of the metrics from different benchmarks and to attempt to create a crosswalk between them.

That would require a very deep and explicit definition of commonalities and differences between the benchmarks and their application to various database architectures.

How to Compare NoSQL Databases

Friday, April 19th, 2013

How to Compare NoSQL Databases by Ben Engber. (video)

From the description:

Ben Engber, CEO and founder of Thumbtack Technology, will discuss how to perform tuned benchmarking across a number of NoSQL solutions (Couchbase, Aerospike, MongoDB, Cassandra, HBase, others) and to do so in a way that does not artificially distort the data in favor of a particular database or storage paradigm. This includes hardware and software configurations, as well as ways of measuring to ensure repeatable results.

We also discuss how to extend benchmarking tests to simulate different kinds of failure scenarios to help evaluate the maintainablility and recoverability of different systems. This requires carefully constructed tests and significant knowledge of the underlying databases — the talk will help evaluators overcome the common pitfalls and time sinks involved in trying to measure this.

Lastly we discuss the YCSB benchmarking tool, its significant limitations, and the significant extensions and supplementary tools Thumbtack has created to provide distributed load generation and failure simulation.

Ben makes a very good case for understanding the details of your use case versus the characteristics of particular NoSQL solutions.

Where you will find “better” performance depends on non-obvious details.

Watch the use of terms like “consistency” in this presentation.

The paper Ben refers to: Ultra-High Performance NoSQL Benchmarking: Analyzing Durability and Performance Tradeoffs.

Forty-three pages of analysis and charts.

Slow but interesting reading.

If you are into the details of performance and NoSQL databases.

LinkBench [Graph Benchmark]

Tuesday, April 2nd, 2013

LinkBench

From the webpage:

LinkBench Overview

LinkBench is a database benchmark developed to evaluate database performance for workloads similar to those of Facebook’s production MySQL deployment. LinkBench is highly configurable and extensible. It can be reconfigured to simulate a variety of workloads and plugins can be written for benchmarking additional database systems.

LinkBench is released under the Apache License, Version 2.0.

Background

One way of modeling social network data is as a social graph, where entities or nodes such as people, posts, comments and pages are connected by links which model different relationships between the nodes. Different types of links can represent friendship between two users, a user liking another object, ownership of a post, or any relationship you like. These nodes and links carry metadata such as their type, timestamps and version numbers, along with arbitrary payload data.

Facebook represents much of its data in this way, with the data stored in MySQL databases. The goal of LinkBench is to emulate the social graph database workload and provide a realistic benchmark for database performance on social workloads. LinkBench’s data model is based on the social graph, and LinkBench has the ability to generate a large synthetic social graph with key properties similar to the real graph. The workload of database operations is based on Facebook’s production workload, and is also generated in such a way that key properties of the workload match the production workload.

A benchmark for testing your graph database performance!

Additional details at: LinkBench: A database benchmark for the social graph by Tim Armstrong.

I first saw this in a tweet by Stefano Bertolo.

Algebraix Data Achieves Unrivaled Semantic Benchmark Performance

Saturday, March 16th, 2013

Algebraix Data Achieves Unrivaled Semantic Benchmark Performance by Angela Guess.

From the post:

Algebraix Data Corporation today announced its SPARQL Server(TM) RDF database successfully executed all 17 of its queries on the SP2 benchmark up to one billion triples on one computer node. The SP2 benchmark is the most computationally complex for testing SPARQL performance and no other vendor has reported results for all queries on data sizes above five million triples.

Furthermore, SPARQL Server demonstrated linear performance in total SP2Bench query time on data sets from one million to one billion triples. These latest dramatic results are made possible by algebraic optimization techniques that maximize computing resource utilization.

“Our outstanding SPARQL Server performance is a direct result of the algebraic techniques enabled by our patented Algebraix technology,” said Charlie Silver, CEO of Algebraix Data. “We are investing heavily in the development of SPARQL Server to continue making substantial additional functional, performance and scalability improvements.”

Pretty much a copy of the press release from Algebraix.

You may find:

Doing the Math: The Algebraix DataBase Whitepaper: What it is, how it works, why we need it (PDF) by Robin Bloor, PhD

ALGEBRAIX Technology Mathematics Whitepaper (PDF), by Algebraix Data

and,

Granted Patents

more useful.

BTW, The SP²Bench SPARQL Performance Benchmark, will be useful as well.

Algebraix listed its patents but I supplied the links. Why the links were missing at Algebraix I cannot say.

If the “…no other vendor has reported results for all queries on data sizes above five million triples…” is correct, isn’t scaling an issue for SQARQL?

S3G2: A Scalable Structure-Correlated Social Graph Generator

Sunday, February 24th, 2013

S3G2: A Scalable Structure-Correlated Social Graph Generator by Minh-Duc Pham, Peter Boncz, Orri Erling. (The same text you will find at: Selected Topics in Performance Evaluation and Benchmarking Lecture Notes in Computer Science Volume 7755, 2013, pp 156-172. DOI: 10.1007/978-3-642-36727-4_11)

Abstract:

Benchmarking graph-oriented database workloads and graph-oriented database systems is increasingly becoming relevant in analytical Big Data tasks, such as social network analysis. In graph data, structure is not mainly found inside the nodes, but especially in the way nodes happen to be connected, i.e. structural correlations. Because such structural correlations determine join fan-outs experienced by graph analysis algorithms and graph query executors, they are an essential, yet typically neglected, ingredient of synthetic graph generators. To address this, we present S3G2: a Scalable Structure-correlated Social Graph Generator. This graph generator creates a synthetic social graph, containing non-uniform value distributions and structural correlations, which is intended as test data for scalable graph analysis algorithms and graph database systems. We generalize the problem by decomposing correlated graph generation in multiple passes that each focus on one so-called correlation dimension; each of which can be mapped to a MapReduce task. We show that S3G2 can generate social graphs that (i) share well-known graph connectivity characteristics typically found in real social graphs (ii) contain certain plausible structural correlations that influence the performance of graph analysis algorithms and queries, and (iii) can be quickly generated at huge sizes on common cluster hardware.

You may also want to see the slides.

What a nice way to start the week!

Enjoy!

I first saw this at Datanami.

TPC Benchmark H

Wednesday, February 13th, 2013

TPC Benchmark H

From the webpage:

Summary

The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.

Just in case you want to incorporate the TPC-H benchmark into your NoSQL solution.

I don’t recall any literature on benchmarks for semantic integration solutions.

At least in the sense of either the speed of semantic integration based on some test of semantic equivalence or the range of semantic equivalents handled by a particular engine.

You?

Semantic Technology ROI: Article of Faith? or Benchmarks for 1.28% of the web?

Friday, December 14th, 2012

Orri Erling, in LDBC: A Socio-technical Perspective, writes in part:

I had a conversation with Michael at a DERI meeting a couple of years ago about measuring the total cost of technology adoption, thus including socio-technical aspects such as acceptance by users, learning curves of various stakeholders, whether in fact one could demonstrate an overall gain in productivity arising from semantic technologies. [in my words, paraphrased]

“Can one measure the effectiveness of different approaches to data integration?” asked I.

“Of course one can,” answered Michael, “this only involves carrying out the same task with two different technologies, two different teams and then doing a double blind test with users. However, this never happens. Nobody does this because doing the task even once in a large organization is enormously costly and nobody will even seriously consider doubling the expense.”

LDBC does in fact intend to address technical aspects of data integration, i.e., schema conversion, entity resolution, and the like. Addressing the sociotechnical aspects of this (whether one should integrate in the first place, whether the integration result adds value, whether it violates privacy or security concerns, whether users will understand the result, what the learning curves are, etc.) is simply too diverse and so totally domain dependent that a general purpose metric cannot be developed, at least not in the time and budget constraints of the project. Further, adding a large human element in the experimental setting (e.g., how skilled the developers are, how well the stakeholders can explain their needs, how often these needs change, etc.) will lead to experiments that are so expensive to carry out and whose results will have so many unquantifiable factors that these will constitute an insuperable barrier to adoption.

The need for parallel systems to judge the benefits of a new technology is a straw man. And one that is easy to dispel.

For example, if your company provides technical support, you are tracking metrics on how quickly your staff can answer questions. And probably customer satisfaction with your technical support.

Both are common metrics in use today.

Assume the suggestion that linked data to improve technical support for your products. You begin with a pilot project to measure the benefit from the suggested change.

If the length of support calls goes down or customer customer satisfaction goes up, or both, change to linked data. If not, don’t.

Naming a technology as “semantic” doesn’t change how you measure the benefits of a change in process.

LDBC will find purely machine based performance measures easier to produce than answering more difficult socio-technical issues.

But of what value are great benchmarks for a technology that no one wants to use?

See my comments under: Web Data Commons (2012) – [RDFa at 1.28% of 40.5 million websites]. Benchmarks for 1.28% of the web?

LDBC: Linked Data Benchmark Council [I count 15 existing Graph/RDF benchmarks. You?]

Saturday, November 10th, 2012

LDBC: Linked Data Benchmark Council

From the webpage:

In the last years we have seen an explosion of massive amounts of graph shaped data coming from a variery of applications that are related to social networks like facebook, twitter, blogs and other on-line media and telecommunication networks. Furthermore, the W3C linking open data initiative has boosted the publication and interlinkage of a large number of datasets on the semantic web resulting to the Linked Data Cloud. These datasets with billions of RDF triples such as Wikipedia, U.S. Census bureau, CIA World Factbook, DBPedia, and government sites have been created and published online. Moreover, numerous datasets and vocabularies from e-science are published nowadays as RDF graphs most notably in life and earth sciences, astronomy in order to facilitate community annota- tion and interlinkage of both scientific and scholarly data of interest.

Technology and bandwidth now provide the opportunities for compiling, publishing and sharing massive Linked Data datasets. A significant number of commercial semantic repositories (RDF databases with reasoner and query-engine) which are the cornerstone of the Semantic Web exist.

Neverthless at the present time,

  • there is no comprehensive suite of benchmarks that encourage the advancement of technology by providing both academia and industry with clear targets for performance and functionality and
  • no independent authority for developing benchmarks and verifying the results of those engines. The same holds for the emerging field of noSQL graph databases, which share with RDF a graph data model and pattern- and pathoriented query languages.

The Linked Data Benchmark Council (LDBC) project aims to provide a solution to this problem by making insightful the critical properties of graph and RDF data management technology, and stimulating progress through compettion. This is timely and urgent since non-relational data management is emerging as a critical need for the new data economy based on large, distributed, heterogeneous, and complexly structured data sets. This new data management paradigm also provides an opportunity for research results to impact young innovative companies working on RDF and graph data management to start playing a significant role in this new data economy.

This announcement puzzled me because I know I have seen (and written about) graph benchmarks.

A quick search with a popular search engine turned up three of the better known graph benchmarks (in the first ten “hits”):

  1. BHOSLIB: Benchmarks with Hidden Optimum Solutions for Graph Problems (Maximum Clique, Maximum Independent Set, Minimum Vertex Cover and Vertex Coloring) —— Hiding Exact Solutions in Random Graphs by Ke XU, Beijing University of Aeronautics and Astronautics.
  2. HPC Graph Analysis From the homepage:

    We maintain a parallel graph theory benchmark that solves multiple graph analysis kernels on small-world networks. An early version of the benchmark was part of the DARPA High Productivity Computing Systems (HPCS) Compact Application (SSCA) suite. The benchmark performance across current HPC systems can be compared using a single score called TrEPS (Traversed Edges Per Second).

  3. Graph 500. From their specifications page:

    There are five problem classes defined by their input size:

    toy 17GB or around 1010 bytes, which we also call level 10,

    mini 140GB (1011 bytes, level 11),

    small 1TB (1012 bytes, level 12),

    medium 17TB (1013 bytes, level 13),

    large 140TB (1014 bytes, level 14), and

    huge 1.1PB (1015 bytes, level 15).

On RDF graphs in particular, the W3C wiki page: RDF Store Benchmarking, has a host of resources, including twelve (12) benchmarks for RDF Stores:

For a total of fifteen (15) graph/RDF benchmarks, discoverable in just a few minutes.

Given the current financial difficulties at the EU, duplicating research already performed/underway by others is a poor investment.


PS: Pass my name along to anyone you know in the EU research approval committees. I would be happy to do new/not-new evaluations of proposals on a contract basis.

PPS: As always, if you know of other graph/RDF benchmarks, I would be happy to add them. I deliberately did not attempt an exhaustive survey of graph or RDF benchmarks. If you or your company are interested in such a survey, ping me.

Ålenkå

Monday, January 30th, 2012

Ålenkå

If you don’t mind alpha code, ålenkå was pointed out in the bitmap posting I cited earlier today.

From its homepage:

Alenka is a modern analytical database engine written to take advantage of vector based processing and high bandwidth of modern GPUs.

Features include:

Vector-based processing
CUDA programming model allows a single operation to be applied to an entire set of data at once.

Self optimizing compression
Ultra fast compression and decompression performed directly inside GPU

Column-based storage
Minimize disk I/O by only accessing the relevant data

Fast database loads
Data load times measured in minutes, not in hours.

Open source and free

Apologies for the name spelling differences, Ålenkå versus Alenka. I suspect it has something to do with character support in whatever produced the readme file, but can’t say for sure.

The benchmarks (there is that term again) are impressive.

Would semantic benchmarks be different from the ones used in IR currently? Different from precision and recall? What about range (same subject but identified differently) or accuracy (different identifications but same subject, how many false positives)?

The VLTS Benchmark Suite

Wednesday, August 31st, 2011

The VLTS Benchmark Suite

From the webpage:

The VLTS acronym stands for “Very Large Transition Systems“.

The VLTS benchmark suite is a collection of Labelled Transition Systems (hereafter called benchmarks).

Each Labelled Transition System is a directed, connected graph, whose vertices are called states and whose edges are called transitions. There is one distinguished vertex called the initial state. Each transition is labelled by a character string called action or label. There is one distinguished label noted “i” that is used for so-called invisible transitions (also known as hidden transitions or tau-transitions).

The VLTS benchmarks have been obtained from various case studies about the modelling of communication protocols and concurrent systems. Many of these case studies correspond to real life, industrial systems.

If you aren’t already working with large graphs in your work on topic maps, you will be.