Archive for the ‘VoltDB’ Category

The Shrinking Big Data MarketPlace

Tuesday, May 13th, 2014

VoltDB Survey Finds That Big Data Goes to Waste at Most Organizations

From the post:

VoltDB today announced the findings of an industry survey which reveals that most organizations cannot utilize the vast majority of the Big Data they collect. The study exposes a major Big Data divide: the ability to successfully capture and store huge amounts of data is not translating to improved bottom-line business benefits.

Untapped Data Has Little or No Value

The majority of respondents reveal that their organizations can’t utilize most of their Big Data, despite the fact that doing so would drive real bottom line business benefits.

  • 72 percent of respondents cannot access and/or utilize the majority of the data coming into their organizations.
  • Respondents acknowledge that if they were able to better leverage Big Data their organizations could: deliver a more personalized customer experience (49%); increase revenue growth (48%); and create competitive advantages (47%).

(emphasis added)

News like that makes me wonder how long the market for “big data tools” that can’t produce ROI is going to continue.

I suspect VoltDB has its eyes on addressing the software aspects of the non-utilization problem (more power to them) but that still leaves the usual office politics of who has access to what data and the underlying issues of effectively sharing data across inconsistent semantics.

Topic maps can’t help you address the office politics problem, unless you want to create a map of who is in the way of effective data sharing. Having created such a map, how you resolve personnel issues is your problem.

Topic maps can help with the inconsistent semantics that are going to persist even in the best of organizations. Departments have inconsistent semantics in many cases because their semantics, or “silo” if you like, works best for their workflow.

Why not allow the semantics/silo to stay in place and map it into other semantics/silos as need be? That way every department keeps its familiar semantics and you get the benefit of better workflow.

To put it another way, silos aren’t the problem; the opacity of silos is. Make silos transparent and you have better data interchange and, as a consequence, greater access to the data you are collecting.
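As a rough illustration of leaving each silo's vocabulary in place and mapping between them, here is a minimal Python sketch. The department names, field names, and shared identifiers are all hypothetical, and a real topic map carries far more than a lookup table; the point is only that neither department has to rename anything.

```python
# Hypothetical example: two departments name the same subjects differently.
# Each silo keeps its own field names and maps them to shared identifiers.
SALES_TO_SHARED = {"cust_no": "customer-id", "amt": "order-total"}
BILLING_TO_SHARED = {"client_id": "customer-id", "invoice_total": "order-total"}


def invert(mapping):
    """Turn a local->shared mapping into shared->local."""
    return {shared: local for local, shared in mapping.items()}


def translate(record, source_map, target_map):
    """Re-key a record from one silo's field names to another's.

    Fields with no shared identifier, or with no counterpart in the
    target silo, are left out rather than guessed at.
    """
    shared_to_target = invert(target_map)
    translated = {}
    for field, value in record.items():
        shared = source_map.get(field)
        if shared in shared_to_target:
            translated[shared_to_target[shared]] = value
    return translated


sales_record = {"cust_no": 1017, "amt": 249.50, "rep": "jlee"}
billing_view = translate(sales_record, SALES_TO_SHARED, BILLING_TO_SHARED)
print(billing_view)  # {'client_id': 1017, 'invoice_total': 249.5}
```

The "rep" field has no mapping, so it stays inside the sales silo. That is the transparency argument in miniature: the mapping shows exactly what crosses the boundary and what does not.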

Improve your information infrastructure on top of improved mapping/access to data and you will start to improve your bottom line. Someday you will get to “big data.” But as the survey says: Using big data tools != improved bottom line.

VoltDB 3.0

Friday, January 25th, 2013

VoltDB 3.0 (press release)

From the press release:

BILLERICA, Mass., January 22, 2013 – VoltDB, the world’s fastest high-velocity database, today announced the immediate availability of the newest version of its flagship offering, VoltDB 3.0.

VoltDB is an in-memory relational database designed specifically to solve the big data velocity problem. Despite the deafening hype around big data, most enterprises have not been able to build applications that can ingest, analyze and act on massive volumes of data fast enough to deliver business value. VoltDB solves this problem by narrowing the “ingestion-to-decision” gap from minutes, or even hours, to milliseconds.

“With every passing second, time saps the value of data. This is why so many big data applications have not delivered business value – it simply takes too long to analyze and identify actionable information in the morass of data,” said Ryan Hubbard, CTO of Yellowhammer. “VoltDB has solved this problem for Yellowhammer. For the first time, we can ingest, analyze and decision on data in real time. This capability opens a new world of possibilities for applications that truly deliver competitive advantage.”

Purpose built for high velocity big data applications, VoltDB enables real-time visibility into the data that drives business value. With these industry first capabilities, VoltDB is making it possible for developers to create an entirely new generation of big data applications, with application functionality that could not be realized with traditional database offerings.

The Planning Guide for VoltDB is refreshing, albeit a bit brief.

VoltDB is fast, etc., but the Planning Guide makes it clear usefulness is not a given. VoltDB provides a robust foundation but you have to take advantage of it.

VoltDB community. Downloads, documentation, community, etc.

Volt University

Saturday, January 12th, 2013

Volt University

From the homepage:

Volt University is designed to inspire and enable the art of disruption. It gives enterprise and independent developers worldwide the insight, tools, and best practices they need to build applications never before imagined, applications that ingest, analyze, and act on incredibly large volumes of data with real-time speed. This is the power of VoltDB – the fully durable in-memory database that combines high-velocity data ingestion with real-time data analytics and decisioning to turn imagination into reality.

Led by VoltDB’s own engineering organization, Volt University provides customers, partners, and members of the entire VoltDB Community with a vast portfolio of instructional content, classes, tools, and other resources. The curriculum and supporting material range from beginner to advanced, giving developers at all levels the practical knowledge and support they need to build whatever application they can envision.

Formal classes and certification aren’t free but:

Volt University Online – VoltDB delivers a wealth of information and educational content to the VoltDB Community through its Volt University Online offering. From live monthly webcasts and on-demand “how to” videos to white papers, tutorials, demonstrations, and code samples, VoltDB users have a significant library of material to draw from for inspiration and instruction as they design and build high velocity applications on VoltDB. Content is available free of charge for all members of the VoltDB Community – simply click here to access (no form) and start building!

Re-post and say nice things about VoltDB. This sort of behavior should be encouraged.

I first saw this at: VoltDB Launches Volt University.

Masstree – Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached

Tuesday, May 1st, 2012

Masstree – Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached

From the post:

The EuroSys 2012 system conference has an excellent live blog summary of their talks for: Day 1, Day 2, Day 3 (thanks Henry at the Paper Trail blog). Summaries for each of the accepted papers are here.

One of the more interesting papers from a NoSQL perspective was Cache Craftiness for Fast Multicore Key-Value Storage, a wonderfully detailed description of the low level techniques used to implement Masstree:

A storage system specialized for key-value data in which all data fits in memory, but must persist across server restarts. It supports arbitrary, variable-length keys. It allows range queries over those keys: clients can traverse subsets of the database, or the whole database, in sorted order by key. On a 16-core machine Masstree achieves six to ten million operations per second on parts A–C of the Yahoo! Cloud Serving Benchmark benchmark, more than 30x as fast as VoltDB [5] or MongoDB [2].

An inspiration for anyone pursuing pure performance in the key-value space.

As the authors note when comparing Masstree to other systems:

Many of these systems support features that Masstree does not, some of which may bottleneck their performance. We disable other systems’ expensive features when possible.

The lesson here is to not buy expensive features unless you need them.

VoltDB Version 2.5

Saturday, April 14th, 2012

VoltDB Version 2.5

VoltDB 2.5 has arrived with:

Database Replication. As I’d previously described here, Database Replication is the headline feature of 2.5 (until recently, we referred to the feature as WAN replication). It allows VoltDB databases to be automatically replicated within and across data centers. Available in the VoltDB Enterprise Edition, Database Replication ensures that every database transaction applied to a VoltDB database is asynchronously applied to a defined replica database. Following a catastrophic crash, you can immediately promote the database replica to be the master and redirect all traffic to that cluster. Once the original master has been recovered, you can quickly and easily reverse the process.

In addition to serving disaster recovery needs, you can also use Database Replication to maintain a hot standby database (i.e., to eliminate service windows when you’re doing systems maintenance) and for workload optimization where, for example, write traffic is directed to the master VoltDB database, and read traffic is directed to the replica.

Performance improvements. Version 2.5 includes performance improvements to the VoltDB SQL planner, which benefit all VoltDB products. In addition, we eliminated some unnecessary cluster messaging for single-node deployments, which reduce average transaction latencies to around 1ms for our VoltOne product.

Functional enhancements. In 2.5 we expanded VoltDB’s SQL support and extended support for distributed joins. We also added new administrative options for managing database snapshots and controlling the behavior of command logging activities.

Updated Node.js support. As Andy Wilson describes here, VoltDB 2.5 includes an updated client library for the Node.js programming framework. This driver, which was originally created by community member Jacob Wright, includes performance optimizations, bug fixes and modifications that align the driver with Node.js coding standards.

It may already exist (pointer please!) but with new versions of databases, when not entirely new databases, appearing on a regular basis, a common test suite of data would be a good thing to have. Nothing heavy, say 50 GB uncompressed of CSV files with varying structures.

Thoughts?

Optimizing Distributed Read Operations in VoltDB

Wednesday, August 3rd, 2011

Optimizing Distributed Read Operations in VoltDB

From the post:

Many VoltDB applications, such as gaming leader boards and real-time analytics, use multi-partition procedures to compute consistent global aggregates (and other interesting statistics). It’s challenging to efficiently process distributed reads operations, especially for performance sensitive applications. Based on feedback from our users, we in VoltDB engineering have been enhancing the VoltDB SQL planner over the last few releases to improve this capability.

Executing global aggregates efficiently requires calculating sub-results at each partition replica and combining the sub-results at a coordinating partition to produce the final result. For example, to calculate a total sum, the VoltDB planner should produce a sub-total at each partition and then sum the sub-totals at the coordinator node. All of this work must be transparent to the application, of course.

Hmmm, “global aggregates,” doesn’t that sound familiar? I realize that here it means summing up the number of “kills,” “votes,” etc., simple number stuff, but in principle, what you return and how you sum it is, I would think, application specific. Yes?
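To make the pattern concrete, here is a minimal Python sketch of "sub-result per partition, combine at a coordinator." It is not VoltDB's planner or API; the data, the function names, and the idea of passing the partial and combine steps in as parameters are assumptions for illustration. The second call shows the application-specific case, where the sub-results are not plain numbers.

```python
from functools import reduce

# Hypothetical data: each inner list stands in for the rows held by one partition.
partitions = [
    [("alice", 3), ("bob", 5)],
    [("alice", 2), ("carol", 7)],
    [("bob", 1)],
]


def global_aggregate(partitions, partial, combine):
    """Run `partial` on each partition, then fold the sub-results with `combine`."""
    sub_results = [partial(rows) for rows in partitions]
    return reduce(combine, sub_results)


# Simple case: total score is the sum of per-partition sub-totals.
total = global_aggregate(
    partitions,
    partial=lambda rows: sum(score for _, score in rows),
    combine=lambda a, b: a + b,
)


# Application-specific case: per-player totals, merged key by key.
def per_player(rows):
    totals = {}
    for player, score in rows:
        totals[player] = totals.get(player, 0) + score
    return totals


def merge_totals(a, b):
    merged = dict(a)
    for player, score in b.items():
        merged[player] = merged.get(player, 0) + score
    return merged


by_player = global_aggregate(partitions, per_player, merge_totals)

print(total)      # 18
print(by_player)  # {'alice': 5, 'bob': 6, 'carol': 7}
```

The coordinator step only needs a way to merge two sub-results, so anything with a sensible merge (counts, sets, keyed totals) fits the same shape.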

VoltDB Announces Hadoop Integration

Wednesday, June 22nd, 2011

VoltDB Announces Hadoop Integration

From the announcement:

VoltDB, a leading provider of high-velocity data management systems, today announced the release of VoltDB Integration for Hadoop. The new product functionality, available in VoltDB Enterprise Edition, allows organizations to selectively stream high velocity data from a VoltDB cluster into Hadoop’s native HDFS file system by leveraging Cloudera’s Distribution Including Apache Hadoop (CDH), which has SQL-to-Hadoop integration technology, Apache Sqoop, built in.

“The term ‘big data’ is being applied to a diverse set of data storage and processing problems related to the growing volume, variety and velocity of data and the desire of organizations to store and process data sets in their totality,” said Matt Aslett, senior analyst, enterprise software, The 451 Group. “Choosing the right tool for the job is crucial: high velocity data requires an engine that offers fast throughput and real-time visibility; high volume data requires a platform that can expose insights in massive data sets. Integration between VoltDB and CDH will help organizations to combine two special purpose engines to solve increasingly complex data management problems.”

See also: Cloudera – Apache Hadoop Connector for Netezza.

I can’t imagine a better environment for promotion of topic maps than “big data.” The more data that is processed, the more semantic integration issues will come to the fore, at least for clients paying the bills for sensible answers. It is sorta like putting teenagers in Indy race cars. It won’t take all that long before some of them decide they need driving lessons.

VoltDB

Saturday, June 18th, 2011

VoltDB

From the website:

VoltDB is a blazingly fast relational database system. It is specifically designed for modern software applications that are pushed beyond their limits by high velocity data sources. This new generation of systems – real-time feeds, machine-generated data, micro-transactions, high performance content serving – requires database throughput that can reach millions of operations per second. What’s more, the applications that use this data must be able to scale on demand, provide flawless fault tolerance and give real-time visibility into the data that drives business value.

Note that the “community” version is only for development, testing, tuning. If you want to go to deployment, commercial licensing kicks in.

It’s encouraging to see all the innovation and development in SQL, NoSQL (mis-named but has stuck), graph databases and the like. Only practical experience will decide which ones survive but in any event, data will be more accessible than ever before. Data analysis and not data access skills will come to the fore.