Archive for the ‘Database’ Category
Friday, May 3rd, 2013
Designing Databases for Historical Research by Matt Phillpott.
From the post:
The Institute of Historical Research now offer a wide selection of digital research training packages designed for historians and made available online on History SPOT. Most of these have received mention on this blog from time to time and hopefully some of you will have had a good look at them. These courses are freely available and we only ask that you register for History SPOT to access them (which is a free and easy process). Full details of our online and face-to-face courses can also be found on the IHR website. Here is a brief look at one of them.
Designing Databases for Historical Research was one of two modules that we launched alongside History SPOT late in 2011. Unlike most courses on databases that are generic in scope, this module focuses very much on the historian and his/her needs. The module is written in a handbook format by Dr Mark Merry. Mark runs our face to face databases course and is very much the man to go to for advice on building databases to house historical data.
The module looks at the theory behind using databases rather than showing you how to build them. It is very much a starting point, a place to go to before embarking on the lengthy time that databases require of their creators. Is your historical data appropriate for database use or should a different piece of software be used? What things should you consider before starting the database? Getting it right from the very beginning does save you a lot of time and frustration later on.
If you need more convincing then here is a snippet from the module, where Mark discusses the importance of thinking about the data and database before you even open up the software.
Great background material if you are working in history or academic circles.
Posted in Database, History | No Comments »
Wednesday, May 1st, 2013
My first encounter with this series by Oren Eini was: Reviewing LevelDB: Part XVIII–Summary.
At first I thought it had to be a late April Fool’s day joke.
On further investigation, much to my delight, it was not!
Searching his blog returned a hodge-podge listing in no particular order, with some omissions.
As a service to you (and myself), I have collated the posts in order:
Reviewing LevelDB, Part I: What is this all about?
Reviewing LevelDB: Part II, Put some data on the disk, dude
Reviewing LevelDB: Part III, WriteBatch isn’t what you think it is
Reviewing LevelDB: Part IV: On std::string, buffers and memory management in C++
Reviewing LevelDB: Part V, into the MemTables we go
Reviewing LevelDB: Part VI, the Log is base for Atomicity
Reviewing LevelDB: Part VII–The version is where the levels are
Reviewing LevelDB: Part VIII–What are the levels all about?
Reviewing RavenDB [LevelDB]: Part IX- Compaction is the new black
Reviewing LevelDB: Part X–table building is all fun and games until…
Reviewing LevelDB: Part XI–Reading from Sort String Tables via the TableCache
Reviewing RavenDB [LevelDB]: Part XII–Reading an SST
Reviewing LevelDB: Part XIII–Smile, and here is your snapshot
Reviewing LevelDB: Part XIV– there is the mem table and then there is the immutable memtable
Reviewing LevelDB: Part XV–MemTables gets compacted too
Reviewing LevelDB: Part XVI–Recovery ain’t so tough?
Reviewing LevelDB: Part XVII– Filters? What filters? Oh, those filters …
Reviewing LevelDB: Part XVIII–Summary
Parts IX and XII have typos in the titles, RavenDB instead of LevelDB.
Now there is a model for reviewing database software!
Posted in Database, leveldb | No Comments »
Saturday, April 27th, 2013
Strange Loop 2013
Dates:
- Call for presentation opens: Apr 15th, 2013
- Call for presentation ends: May 9, 2013
- Speakers notified by: May 17, 2013
- Registration opens: May 20, 2013
- Conference dates: Sept 18-20th, 2013
From the webpage:
Below is some guidance on the kinds of topics we are seeking and have historically accepted.
- Frequently accepted or desired topics: functional programming, logic programming, dynamic/scripting languages, new or emerging languages, data structures, concurrency, database internals, NoSQL databases, key/value stores, big data, distributed computing, queues, asynchronous or dataflow concurrency, STM, web frameworks, web architecture, performance, virtual machines, mobile frameworks, native apps, security, biologically inspired computing, hardware/software interaction, historical topics.
- Sometimes accepted (depends on topic): Java, C#, testing frameworks, monads
- Rarely accepted (nothing wrong with these, but other confs cover them well): Agile, JavaFX, J2EE, Spring, PHP, ASP, Perl, design, layout, entrepreneurship and startups, game programming
It isn’t clear why Strange Loop claims to have “archives:”
2009 – 2010 – 2011 – 2012
As far as I can tell, these are listings with bios of prior presentations, but no substantive content.
Am I missing something?
Posted in Conferences, Data Structures, Database, NoSQL, Programming | No Comments »
Friday, April 19th, 2013
Apache Hadoop and Data Agility by Ofer Mendelevitch.
From the post:
In a recent blog post I mentioned the 4 reasons for using Hadoop for data science. In this blog post I would like to dive deeper into the last of these reasons: data agility.
In most existing data architectures, based on relational database systems, the data schema is of central importance, and needs to be designed and maintained carefully over the lifetime of the project. Furthermore, whatever data fits into the schema will be stored, and everything else typically gets ignored and lost. Changing the schema is a significant undertaking, one that most IT organizations don’t take lightly. In fact, it is not uncommon for a schema change in an operational RDBMS system to take 6-12 months if not more.
Hadoop is different. A schema is not needed when you write data; instead the schema is applied when using the data for some application, thus the concept of “schema on read”.
If a schema is supplied “on read,” how is data validation accomplished?
I don’t mean in terms of datatypes such as string, integer, double, etc. Those are trivial forms of data validation.
How do we validate the semantics of data when a schema is supplied on read?
Mistakes do happen in RDBMS systems but with a schema, which defines data semantics, applications can attempt to police those semantics.
I don’t doubt that schema “on read” supplies a lot of useful flexibility, but how do we limit the damage that flexibility can cause?
For example, many years ago, area codes (for telephones) in the USA were tied to geographic exchanges. Data from the era still exists in the bowels of some data stores. That is no longer true in many cases.
Let’s assume I have older data that has area codes tied to geographic areas and newer data that has area codes that are not. Without a schema to define the area code data in both cases, how would I know to treat the area code data differently?
I concede that schema “on read” can be quite flexible.
On the other hand, let’s not discount the value of schema “on write” as well.
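The area code example above can be sketched in a few lines. This is only an illustration under stated assumptions: the `recorded` year field, the cutoff year, and the lookup table are all hypothetical, invented here to show a reader that picks its interpretation at read time.

```python
import json

# Hypothetical cutoff: records before this year are assumed to use
# area codes that still identify a geographic region.
GEOGRAPHIC_CUTOFF_YEAR = 1995

# Hypothetical lookup table for the older, geography-bound codes.
AREA_CODE_REGIONS = {"404": "Atlanta, GA", "212": "New York, NY"}


def read_phone_record(raw_line):
    """Apply a schema at read time to one raw JSON line."""
    record = json.loads(raw_line)
    area_code = record["phone"][:3]
    result = {"phone": record["phone"], "area_code": area_code}
    if record.get("recorded") is None:
        # No provenance: the reader cannot tell which semantics apply.
        result["region"] = None
        result["semantics"] = "unknown"
    elif record["recorded"] < GEOGRAPHIC_CUTOFF_YEAR:
        result["region"] = AREA_CODE_REGIONS.get(area_code)
        result["semantics"] = "geographic"
    else:
        result["region"] = None
        result["semantics"] = "non-geographic"
    return result


raw_lines = [
    '{"phone": "4045550100", "recorded": 1988}',
    '{"phone": "4045550199", "recorded": 2012}',
    '{"phone": "2125550123"}',
]
for line in raw_lines:
    print(read_phone_record(line))
```

The third record is the troublesome one: nothing was written alongside the data to say which meaning applies, so the reader can only mark the semantics as unknown.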
Posted in BigData, Database, Hadoop, SQL, Schema | No Comments »
Friday, April 19th, 2013
How to Compare NoSQL Databases by Ben Engber. (video)
From the description:
Ben Engber, CEO and founder of Thumbtack Technology, will discuss how to perform tuned benchmarking across a number of NoSQL solutions (Couchbase, Aerospike, MongoDB, Cassandra, HBase, others) and to do so in a way that does not artificially distort the data in favor of a particular database or storage paradigm. This includes hardware and software configurations, as well as ways of measuring to ensure repeatable results.
We also discuss how to extend benchmarking tests to simulate different kinds of failure scenarios to help evaluate the maintainability and recoverability of different systems. This requires carefully constructed tests and significant knowledge of the underlying databases; the talk will help evaluators overcome the common pitfalls and time sinks involved in trying to measure this.
Lastly we discuss the YCSB benchmarking tool, its significant limitations, and the significant extensions and supplementary tools Thumbtack has created to provide distributed load generation and failure simulation.
Ben makes a very good case for understanding the details of your use case versus the characteristics of particular NoSQL solutions.
Where you will find “better” performance depends on non-obvious details.
Watch the use of terms like “consistency” in this presentation.
The paper Ben refers to: Ultra-High Performance NoSQL Benchmarking: Analyzing Durability and Performance Tradeoffs.
Forty-three pages of analysis and charts.
Slow but interesting reading, if you are into the details of performance and NoSQL databases.
Posted in Aerospike, Benchmarks, Cassandra, Couchbase, Database, MongoDB, NoSQL | 1 Comment »
Saturday, April 6th, 2013
ODBMS.ORG – Object Database Management Systems
From the “about” page:
Launched in 2005, ODBMS.ORG was created to serve faculty and students at educational and research institutions as well as software developers in the open source community or at commercial companies.
It is designed to meet the fast-growing need for resources focusing on Big Data, Analytical data platforms, Scalable Cloud platforms, Object databases, Object-relational bindings, NoSQL databases, Service platforms, and new approaches to concurrency control.
This portal features an easy introduction to ODBMSs as well as free software, lecture notes, tutorials, papers and other resources for free download. It is complemented by listings of relevant books and vendors to provide a comprehensive and up-to-date overview of available resources.
The Expert Section contains exclusive contributions from 130+ internationally recognized experts such as Suad Alagic, Scott Ambler, Michael Blaha, Jose Blakeley, Rick Cattell, William Cook, Ted Neward, and Carl Rosenberger.
The ODBMS Industry Watch Blog is part of this portal and contains up to date Information, Trends, and Interviews with industry leaders on Big Data, New Data Stores (NoSQL, NewSQL Databases), New Developments and New Applications for Objects and Databases, New Analytical Data Platforms, Innovation.
The portal’s editor, Roberto V. Zicari, is Professor of Database and Information Systems at Frankfurt University and representative of the Object Management Group (OMG) in Europe. His interest in object databases dates back to his work at the IBM Research Center in Almaden, CA, in the mid ’80s, when he helped craft the definition of an extension of the relational data model to accommodate complex data structures. In 1989, he joined the design team of the Gip Altair project in Paris, later to become O2, one of the world’s first object database products.
All materials and downloads are free and anonymous.
Non-profit ODBMS.ORG is made possible by contributions from ODBMS.ORG’s Panel of Experts, and the support of the sponsors displayed in the right margin of these pages.
http://www.odbms.org/About/books.aspx
The free download page is what first attracted my attention.
By any measure, a remarkable collection of material.
Ironic isn’t it?
CS needs to develop better access strategies for its own output.
Posted in Database, ODBMS | No Comments »
Wednesday, March 27th, 2013
Database Landscape Map – February 2013 by 451 Research.

A truly awesome map of available databases.
Originated from Neither fish nor fowl: the rise of multi-model databases by Matthew Aslett.
Matthew writes:
One of the most complicated aspects of putting together our database landscape map was dealing with the growing number of (particularly NoSQL) databases that refuse to be pigeon-holed in any of the primary databases categories.
I have begun to refer to these as “multi-model databases” in recognition of the fact that they are able to take on the characteristics of multiple databases. In truth though there are probably two different groups of products that could be considered “multi-model”:
I think I understand the grouping from the key to the map but the ordering within groups, if meaningful, escapes me.
I am sure you will recognize most of the names but equally sure there will be some you can’t quite describe.
Enjoy!
Posted in Database, Graph Databases, Key-Value Stores, NoSQL, SQL, Software | 1 Comment »
Monday, March 18th, 2013
39th International Conference on Very Large Data Bases
Dates:
Submissions still open:
Industrial & Application Papers, Demonstration Proposals, Tutorial Proposals, PhD Workshop Papers, due by March 31st, 2013, author notification: May 31st, 2013
Conference: August 26 – 30, 2013.
From the webpage:
VLDB is a premier annual international forum for data management and database researchers, vendors, practitioners, application developers, and users. The conference will feature research talks, tutorials, demonstrations, and workshops. It will cover current issues in data management, database and information systems research. Data management and databases remain among the main technological cornerstones of emerging applications of the twenty-first century.
VLDB 2013 will take place at the picturesque town of Riva del Garda, Italy. It is located close to the city of Trento, on the north shore of Lake Garda, which is the largest lake in Italy, formed by glaciers at the end of the last ice age. In the 17th century, Lake Garda became a popular destination for young central European nobility. The list of its famous guests includes Goethe, Freud, Nietzsche, the Mann brothers, Kafka, Lawrence, and more recently James Bond. Lake Garda attracts many tourists every year, and offers numerous opportunities for sightseeing in the towns along its shores (e.g., Riva del Garda, Malcesine, Torri del Benaco, Sirmione), outdoors activities (e.g., hiking, wind-surfing, swimming), as well as fun (e.g., Gardaland amusement theme park).
Smile when you point “big data” colleagues to 1st Very Large Data Bases VLDB 1975: Framingham, Massachusetts.
Some people catch on sooner than others.
Posted in BigData, Conferences, Database | No Comments »
Saturday, March 9th, 2013
The god Architecture
From the overview:
god is a scalable, performant, persistent, in-memory data structure server. It allows massively distributed applications to update and fetch common data in a structured and sorted format.
Its main inspirations are Redis and Chord/DHash. Like Redis it focuses on performance, ease of use and a small, simple yet powerful feature set, while from the Chord/DHash projects it inherits scalability, redundancy, and transparent failover behaviour.
This is a general architectural overview aimed at somewhat technically inclined readers interested in how and why god does what it does.
To try it out right now, install Go, git, Mercurial and gcc, go get github.com/zond/god/god_server, run god_server, browse to http://localhost:9192/.
For API documentation, go to http://go.pkgdoc.org/github.com/zond/god.
For the source, go to https://github.com/zond/god
I know, “in memory” means it’s not “web scale” but to be honest, I have a lot of data needs that aren’t “web scale.”
There, I’ve said it. Some (most?) important data is not “web scale.”
And when it is, I only have to check my spam filter for options to deal with “web scale” data.
The set operations in particular look quite interesting.
Enjoy!
I first saw this in Nat Torkington’s Four short links: 1 March 2013.
Posted in DHash, Database, Redis, god Architecture | No Comments »
Thursday, February 14th, 2013
Survey of graph database models by Renzo Angles and Claudio Gutierrez. (ACM Computing Surveys (CSUR), Volume 40, Issue 1, February 2008, Article No. 1)
Abstract:
Graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models took off in the eighties and early nineties alongside object-oriented models. Their influence gradually died out with the emergence of other database models, in particular geographical, spatial, semistructured, and XML. Recently, the need to manage information with graph-like nature has reestablished the relevance of this area. The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.
If you need an antidote for graph database hype, look no further than this thirty-nine (39) page survey article.
You will come away with a deeper appreciation for graph databases and their history.
If you are looking for a self-improvement reading program, you could do far worse than starting with this article and reading the cited references one by one.
Posted in Database, Graphs, Networks | No Comments »
Friday, February 8th, 2013
History SPOT
I discovered this site via a post entitled: Text Mining for Historians: Natural Language Processing.
From the webpage:
Welcome to History SPOT. This is a subsite of the IHR [Institute of Historical Research] website dedicated to our online research training provision. On this page you will find the latest updates regarding our seminar podcasts, online training courses and History SPOT blog posts.
Currently offered online training courses (free registration required):
- Designing Databases for Historians
- Podcasting for Historians
- Sources for British History on the Internet
- Data Preservation
- Digital Tools
- InScribe Palaeography
Not to mention over 300 podcasts!
Two thoughts:
First, a good way to learn about the expectations that historians have of their digital tools. That should help you prepare an answer to: “What do topic maps have to offer over X technology?”
Second, I rather like the site and its module orientation. A possible template for topic map training online?
Posted in Data Preservation, Database, History, Natural Language Processing | No Comments »
Wednesday, February 6th, 2013
Oracle’s MySQL 5.6 released
From the post:
Just over two years after the release of MySQL 5.5, the developers at Oracle have released a GA (General Availability) version of Oracle MySQL 5.6, labelled MySQL 5.6.10. In MySQL 5.5, the developers replaced the old MyISAM backend and used the transactional InnoDB as the default for database tables. With 5.6, the retrofitting of full-text search capabilities has enabled InnoDB to now take on the position of default storage engine for all purposes.
Accelerating the performance of sub-queries was also a focus of development; they are now run using a process of semi-joins and materialise much faster; this means it should not be necessary to replace subqueries with joins. Many operations that change the data structures, such as ALTER TABLE, are now performed online, which avoids long downtimes. EXPLAIN also gives information about the execution plans of UPDATE, DELETE and INSERT commands. Other optimisations of queries include changes which can eliminate table scans where the query has a small LIMIT value.
MySQL’s row-oriented replication now supports “row image control” which only logs the columns needed to identify and make changes on each row rather than all the columns in the changing row. This could be particularly expensive if the row contained BLOBs, so this change not only saves disk space and other resources but it can also increase performance. “Index Condition Pushdown” is a new optimisation which, when resolving a query, attempts to use indexed fields in the query first, before applying the rest of the WHERE condition.
MySQL 5.6 also introduces a “NoSQL interface” which uses the memcached API to offer applications direct access to the InnoDB storage engine while maintaining compatibility with the relational database engine. That underlying InnoDB engine has also been enhanced with persistent optimisation statistics, multithreaded purging and more system tables and monitoring data available.
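The “Index Condition Pushdown” idea in the quote can be pictured with a toy model. What follows is not MySQL code; the table, the index layout, and the `query` function are all made up for illustration. It only counts how many full rows must be fetched when the part of the WHERE condition that touches indexed columns is checked against the index entry first.

```python
# Toy table: primary key -> full row (made-up data).
ROWS = {
    1: {"zipcode": "30301", "lastname": "Smith", "address": "1 Main St"},
    2: {"zipcode": "30301", "lastname": "Jones", "address": "2 Oak Ave"},
    3: {"zipcode": "30301", "lastname": "Smythe", "address": "3 Elm Rd"},
    4: {"zipcode": "10001", "lastname": "Smith", "address": "4 Pine Ln"},
}

# Toy secondary index on (zipcode, lastname): sorted entries that point
# back at the primary key.
INDEX = sorted((row["zipcode"], row["lastname"], pk) for pk, row in ROWS.items())


def query(zipcode, lastname_part, address_part, pushdown):
    """Return (matching primary keys, number of full-row fetches)."""
    fetches = 0
    matches = []
    for idx_zip, idx_last, pk in INDEX:
        if idx_zip != zipcode:
            continue  # outside the index range on zipcode
        if pushdown and lastname_part not in idx_last:
            continue  # rejected from the index entry alone; no row fetch
        row = ROWS[pk]  # the expensive step in a real engine
        fetches += 1
        if lastname_part in row["lastname"] and address_part in row["address"]:
            matches.append(pk)
    return matches, fetches


print(query("30301", "Sm", "Main", pushdown=False))
print(query("30301", "Sm", "Main", pushdown=True))
```

Both calls return the same matching row; the only difference is the fetch count, because the pushed-down check on `lastname` discards one index entry before its row is read.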
Download MySQL 5.6.
I mentioned Oracle earlier today (When Oracle bought MySQL [Humor]) so it’s only fair that I point out their most recent release of MySQL.
Posted in Database, MySQL, NoSQL, Oracle | No Comments »
Wednesday, January 30th, 2013
A co-Relational Model of Data for Large Shared Data Banks by Erik Meijer and Gavin Bierman.
I missed this when it appeared in March of 2011.
From the conclusion:
The nascent noSQL market is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing noSQL solutions requires specialized and low-level knowledge that does not easily carry over from one vendor’s product to another.
A necessary condition for the network effect to take off in the noSQL database market is the availability of a common abstract mathematical data model and an associated query language for noSQL that removes product differentiation at the logical level and instead shifts competition to the physical and operational level. The availability of such a common mathematical underpinning of all major noSQL databases can provide enough critical mass to convince businesses, developers, educational institutions, etc. to invest in noSQL.
In this article we developed a mathematical data model for the most common form of noSQL—namely, key-value stores as the mathematical dual of SQL’s foreign-/primary-key stores. Because of this deep and beautiful connection, we propose changing the name of noSQL to coSQL. Moreover, we show that monads and monad comprehensions (i.e., LINQ) provide a common query mechanism for both SQL and coSQL and that many of the strengths and weaknesses of SQL and coSQL naturally follow from the mathematics.
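A loose way to picture the duality claimed above: in the relational shape, child rows point at their parent through a foreign key, while in the key-value shape the parent key points at a value that contains its children. The sketch below is only that intuition in code, not the paper's formal construction; the data and the function names `to_key_value` and `to_relational` are invented for illustration.

```python
# Relational shape: child rows carry a foreign key that points at the
# parent's primary key (made-up data).
products = [{"id": 1, "title": "Example Book"}]
ratings = [
    {"id": 10, "product_id": 1, "stars": 4},
    {"id": 11, "product_id": 1, "stars": 5},
]


def to_key_value(parents, children, fk):
    """Reverse the arrows: each parent key maps to a value holding its children."""
    store = {p["id"]: {**p, "children": []} for p in parents}
    for child in children:
        payload = {k: v for k, v in child.items() if k != fk}
        store[child[fk]]["children"].append(payload)
    return store


def to_relational(store, fk):
    """Reverse them back: children again point at their parent."""
    parents, children = [], []
    for key, value in store.items():
        parents.append({k: v for k, v in value.items() if k != "children"})
        for child in value["children"]:
            children.append({**child, fk: key})
    return parents, children


store = to_key_value(products, ratings, "product_id")
print(store)
print(to_relational(store, "product_id"))
```

The round trip returns the original rows, which is the sense in which the two shapes carry the same information with the direction of reference reversed.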
The ACM Digital Library reports only 3 citations, which is unfortunate for such an interesting proposal.
I have heard about key/value pairs somewhere else. I will have to think about that and get back to you. (Hint for the uninitiated, try the Topic Maps Reference Model (TMRM). A new draft of the TMRM is due to appear in a week or so.)
Posted in Category Theory, Database, NoSQL, SQL, TMRM | No Comments »
Tuesday, January 29th, 2013
A Formalism for Graph Databases and its Model of Computation by Tony Tan and Juan Reutter.
Abstract:
Graph databases are directed graphs in which the edges are labeled with symbols from a finite alphabet. In this paper we introduce a logic for such graphs in which the domain is the set of edges. We compare its expressiveness with the standard logic in which the domain is the set of vertices. Furthermore, we introduce a robust model of computation for such logic, the so called graph pebble automata.
The abstract doesn’t really do justice to the importance of this paper for graph analysis. From the paper:
For querying graph structured data, one normally wishes to specify certain types of paths between nodes. Most common examples of these queries are conjunctive regular path queries [1, 14, 6, 3]. Those querying formalisms have been thoroughly studied, and their algorithmic properties are more or less understood. On the other hand, there has been much less work devoted on other formalisms other than graph reachability patterns, say, for example, the integrity constraints such as labels with unique names, typing constraints on nodes, functional dependencies, domain and range of properties. See, for instance, the survey [2] for more examples of integrity constraints.
The survey referenced in that quote is: Renzo Angles and Claudio Gutierrez. 2008. Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008), 39 pages. DOI=10.1145/1322432.1322433 http://doi.acm.org/10.1145/1322432.1322433.
The abstract for the survey reads:
Graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models took off in the eighties and early nineties alongside object-oriented models. Their influence gradually died out with the emergence of other database models, in particular geographical, spatial, semistructured, and XML. Recently, the need to manage information with graph-like nature has reestablished the relevance of this area. The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.
Recommended if you want to build upon what is already known and well-established about graph databases.
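For readers new to the regular path queries mentioned in the quoted paper, here is a minimal sketch of one common evaluation strategy: walk the graph and a finite automaton for the path expression in lockstep. It is not the graph pebble automata the paper introduces, and the graph data, the automaton, and the function name are assumptions made up for this example.

```python
from collections import deque

# Edge-labeled directed graph as (source, label, target); made-up data.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("alice", "worksAt", "initech"),
]

# DFA for the path expression  knows* worksAt :
# state 0 loops on "knows"; "worksAt" moves to accepting state 1.
DFA = {(0, "knows"): 0, (0, "worksAt"): 1}
START, ACCEPTING = 0, {1}


def regular_path_query(edges, dfa, start_state, accepting, source):
    """Return nodes reachable from `source` along a path whose labels the DFA accepts."""
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, target in adjacency.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is not None and (target, next_state) not in seen:
                seen.add((target, next_state))
                queue.append((target, next_state))
    return answers


print(sorted(regular_path_query(EDGES, DFA, START, ACCEPTING, "alice")))
```

Tracking (node, state) pairs rather than nodes alone keeps the search finite even when the graph has cycles.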
Posted in Database, Graphs | No Comments »
Saturday, January 12th, 2013
Schemaless Data Structures by Martin Fowler.
From the first slide:
In recent years, there’s been an increasing amount of talk about the advantages of schemaless data. Being schemaless is one of the main reasons for interest in NoSQL databases. But there are many subtleties involved in schemalessness, both with respect to databases and in-memory data structures. These subtleties are present both in the meaning of schemaless and in the advantages and disadvantages of using a schemaless approach.
Martin points out that “schemaless” does not mean the lack of a schema but rather the lack of an explicit schema.
Sounds a great deal like the implicit subjects that topic maps have the ability to make explicit.
Is there a continuum of explicitness for any given subject/schema?
Starting from entirely implied, followed by an explicit representation, then further explication as in a data dictionary, and at some distance from the start, a subject defined as a set of properties, which are themselves defined as sets of properties, in relationships with other sets of properties.
How far you go down that road depends on your requirements.
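A small sketch of the first step along that road, from entirely implied to an explicit representation: reading the implicit schema out of the data itself. The records and the `implicit_schema` function are made up for illustration; real tools would need to handle nesting and much more.

```python
from collections import defaultdict

# Made-up "schemaless" records, as they might sit in a document store.
records = [
    {"name": "Ada", "born": 1815},
    {"name": "Grace", "born": "1906", "rank": "Rear Admiral"},
    {"name": "Edgar"},
]


def implicit_schema(docs):
    """Summarize which fields occur, with which types, and whether every record has them."""
    fields = defaultdict(lambda: {"types": set(), "count": 0})
    for doc in docs:
        for key, value in doc.items():
            fields[key]["types"].add(type(value).__name__)
            fields[key]["count"] += 1
    return {
        key: {
            "types": sorted(info["types"]),
            "required": info["count"] == len(docs),
        }
        for key, info in fields.items()
    }


for field, summary in implicit_schema(records).items():
    print(field, summary)
```

The `born` field shows up with two types, which is exactly the kind of disagreement an implicit schema hides until someone writes it down.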
Posted in Data Structures, Database, Schema | No Comments »
Wednesday, December 12th, 2012
ArangoDB
From the webpage:
A universal open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript/Ruby extensions.
Design considerations:
In a nutshell:
- Schema-free schemas with shapes: Inherent structures at hand are automatically recognized and subsequently optimized.
- Querying: ArangoDB is able to accomplish complex operations on the provided data (query-by-example and query-language).
- Application Server: ArangoDB is able to act as application server on Javascript-devised routines.
- Mostly memory/durability: ArangoDB is memory-based including frequent file system synchronizing.
- AppendOnly/MVCC: Updates generate new versions of a document; automatic garbage collection.
- ArangoDB is multi-threaded.
- No indices on file: Only raw data is written on hard disk.
- ArangoDB supports single nodes and small, homogenous clusters with zero administration.
I have mentioned this before but ran across it again at: An experiment with Vagrant and Neo4J by Patrick Mulder.
Posted in ArangoDB, Database, NoSQL | No Comments »
Thursday, December 6th, 2012
Introduction to Databases (info/registration link) – Starts January 15, 2013.
From the webpage:
About the Course
“Introduction to Databases” had a very successful public offering in fall 2011, as one of Stanford’s inaugural three massive open online courses. Since then, the course materials have been improved and expanded, and we’re excited to be launching a second public offering of the course in winter 2013. The course includes video lectures and demos with in-video quizzes to check understanding, in-depth standalone quizzes, a wide variety of automatically-checked interactive programming exercises, midterm and final exams, a discussion forum, optional additional exercises with solutions, and pointers to readings and resources. Taught by Professor Jennifer Widom, the curriculum draws from Stanford’s popular Introduction to Databases course.
Why Learn About Databases?
Databases are incredibly prevalent — they underlie technology used by most people every day if not every hour. Databases reside behind a huge fraction of websites; they’re a crucial component of telecommunications systems, banking systems, video games, and just about any other software system or electronic device that maintains some amount of persistent information. In addition to persistence, database systems provide a number of other properties that make them exceptionally useful and convenient: reliability, efficiency, scalability, concurrency control, data abstractions, and high-level query languages. Databases are so ubiquitous and important that computer science graduates frequently cite their database class as the one most useful to them in their industry or graduate-school careers.
Course Syllabus
This course covers database design and the use of database management systems for applications. It includes extensive coverage of the relational model, relational algebra, and SQL. It also covers XML data including DTDs and XML Schema for validation, and the query and transformation languages XPath, XQuery, and XSLT. The course includes database design in UML, and relational design principles based on dependencies and normal forms. Many additional key database topics from the design and application-building perspective are also covered: indexes, views, transactions, authorization, integrity constraints, triggers, on-line analytical processing (OLAP), JSON, and emerging NoSQL systems. Working through the entire course provides comprehensive coverage of the field, but most of the topics are also well-suited for “a la carte” learning.
Biography
Jennifer Widom is the Fletcher Jones Professor and Chair of the Computer Science Department at Stanford University. She received her Bachelors degree from the Indiana University School of Music in 1982 and her Computer Science Ph.D. from Cornell University in 1987. She was a Research Staff Member at the IBM Almaden Research Center before joining the Stanford faculty in 1993. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering and the American Academy of Arts & Sciences; she received the ACM SIGMOD Edgar F. Codd Innovations Award in 2007 and was a Guggenheim Fellow in 2000; she has served on a variety of program committees, advisory boards, and editorial boards.
Another reason to take the course:
The structure and capabilities of databases shape the way we create solutions.
Consider normalization: an investment of time and effort that may be needed for some problems but not for others.
Absent alternative approaches, you see every data problem as requiring normalization.
(You may anyway after taking this course. Education cannot impart imagination.)
Posted in CS Lectures, Database | No Comments »
Tuesday, November 20th, 2012
Towards a Scalable Dynamic Spatial Database System by Joaquín Keller, Raluca Diaconu, Mathieu Valero.
Abstract:
With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a novel distributed architecture to solve the scalability issues.
At least in this version, you will find two copies of the same paper, the second copy sans the footnotes. So read the first twenty (20) pages and ignore the second eighteen (18) pages.
I thought the limitation of location to two dimensions understandable, for the use cases given, but am less convinced that treating a third dimension as an extra attribute is always going to be suitable.
Still, the results here are impressive as compared to current solutions so an additional dimension can be a future improvement.
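To make the concern about the third dimension concrete, here is a minimal sketch of a two-dimensional grid index that stores altitude as a plain attribute and filters on it only after the 2-D lookup. It is not the architecture proposed in the paper; the class, the cell size, and the object names are assumptions invented for this example.

```python
from collections import defaultdict

CELL_SIZE = 10.0  # hypothetical cell width, in the same units as x and y


class GridIndex:
    """Index moving objects by 2-D cell; altitude is stored but not indexed."""

    def __init__(self):
        self.cells = defaultdict(dict)  # (cx, cy) -> {object_id: (x, y, altitude)}
        self.where = {}                 # object_id -> (cx, cy)

    @staticmethod
    def _cell(x, y):
        return (int(x // CELL_SIZE), int(y // CELL_SIZE))

    def update(self, object_id, x, y, altitude):
        """Record a new position, removing the object from its previous cell."""
        old = self.where.get(object_id)
        if old is not None:
            del self.cells[old][object_id]
        cell = self._cell(x, y)
        self.cells[cell][object_id] = (x, y, altitude)
        self.where[object_id] = cell

    def query(self, x_min, y_min, x_max, y_max, alt_min=None, alt_max=None):
        """Ids inside the 2-D box; the altitude range is applied as a post-filter."""
        cx_min, cy_min = self._cell(x_min, y_min)
        cx_max, cy_max = self._cell(x_max, y_max)
        hits = []
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                for object_id, (x, y, alt) in self.cells.get((cx, cy), {}).items():
                    if not (x_min <= x <= x_max and y_min <= y <= y_max):
                        continue
                    if alt_min is not None and alt < alt_min:
                        continue
                    if alt_max is not None and alt > alt_max:
                        continue
                    hits.append(object_id)
        return sorted(hits)


index = GridIndex()
index.update("drone", 5, 5, 120)
index.update("car", 6, 4, 0)
index.update("bus", 55, 55, 0)
print(index.query(0, 0, 10, 10))
print(index.query(0, 0, 10, 10, alt_min=100))
```

Because altitude is never indexed, a query for a narrow altitude band still has to look at every object in the matching 2-D cells, which is where treating the third dimension as an extra attribute could start to hurt.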
The use case that I see missing is an ad hoc network of users feeding geo-based information back to a collection point.
While the watchers are certainly watching us, technology may be on the cusp of answering the question: “Who watches the watchers?” (The answer may be us.)
I first saw this in a tweet by Stefano Bertolo.
Posted in Database, Geographic Data, Geographic Information Retrieval, Spatial Index | No Comments »
Thursday, November 1st, 2012
SQL-99 Complete, Really by Peter Gulutzan & Trudy Pelzer.
From the preface:
If you’ve ever used a relational database product, chances are that you’re already familiar with SQL — the internationally-accepted, standard programming language for databases which is supported by the vast majority of relational database management system (DBMS) products available today. You may also have noticed that, despite the large number of “reference” works that claim to describe standard SQL, not a single one provides a complete, accurate and example-filled description of the entire SQL Standard. This book was written to fill that void.
True, this is the SQL-99 standard.
I collect old IT standards and books about old IT standards. The standards we draft today address issues that have been seen before, just not dressed in current fashion.
By attempting to understand what worked and what perhaps didn’t in older standards, we can make new mistakes instead of repeating old ones.
Posted in Database, SQL | No Comments »
Tuesday, October 30th, 2012
Summary and Links for CAP Articles on IEEE Computer Issue by Alex Popescu.
From the post:
Daniel Abadi has posted a quick summary of the articles signed by Eric Brewer, Seth Gilbert and Nancy Lynch, Daniel Abadi, Raghu Ramakrishnan, Ken Birman, Daniel Freedman, Qi Huang, and Patrick Dowell for the IEEE Computer issue dedicated to the CAP theorem. Plus links to most of them:
Be sure to read Daniel’s comments as carefully as you read the IEEE articles.
Posted in CAP, Database | No Comments »
Sunday, October 14th, 2012
Big data cube by John D. Cook.
From the post:
Erik Meijer’s paper Your Mouse is a Database has an interesting illustration of “The Big Data Cube” using three axes to classify databases.
Enjoy John’s short take, then spend some time with Erik’s paper.
Some serious time with Erik’s paper.
You won’t be disappointed.
Posted in BigData, Database, NoSQL | No Comments »
Friday, October 5th, 2012
JugglingDB
From the webpage:
JugglingDB is cross-db ORM, providing common interface to access most popular database formats. Currently supported are: mysql, mongodb, redis, neo4j and js-memory-storage (yep, self-written engine for test-usage only). You can add your favorite database adapter, checkout one of the existing adapters to learn how, it’s super-easy, I guarantee.
For those of you communing with your favourite databases this weekend.
Posted in Database, ORM | No Comments »
Thursday, October 4th, 2012
PostgreSQL Database Modeler
From the readme file at github:
PostgreSQL Database Modeler, or simply, pgModeler is an open source tool for modeling databases that merges the classical concepts of entity-relationship diagrams with specific features that only PostgreSQL implements. The pgModeler translates the models created by the user to SQL code and apply them onto database clusters from version 8.0 to 9.1.
Other modeling tools you have or are likely to encounter writing topic maps?
When the output of diverse modeling tools or diverse output from the same modeling tool needs semantic reconciliation, I would turn to topic maps.
I first saw this at DZone.
Posted in Database, Modeling, PostgreSQL | No Comments »
Saturday, September 29th, 2012
Amazon RDS Now Supports SQL Server 2012
From the post:
The Amazon Relational Database Service (RDS) now supports SQL Server 2012. You can now launch the Express, Web, and Standard Editions of this powerful database from the comfort of the AWS Management Console. SQL Server 2008 R2 is still available, as are multiple versions and editions of MySQL and Oracle Database.
If you are from the Microsoft world and haven't heard of RDS, here's the executive summary: You can run the latest and greatest offering from Microsoft in a fully managed environment. RDS will install and patch the database, make backups, and detect and recover from failures. It will also provide you with a point-and-click environment to make it easy for you to scale your compute resources up and down as needed.
What's New?
SQL Server 2012 supports a number of new features including contained databases, columnstore indexes, sequences, and user-defined roles:
- A contained database is isolated from other SQL Server databases including system databases such as "master." This isolation removes dependencies and simplifies the task of moving databases from one instance of SQL Server to another.
- Columnstore indexes are used for data warehouse style queries. Used properly, they can greatly reduce memory consumption and I/O requests for large queries.
- Sequences are counters that can be used in more than one table.
- The new user-defined role management system allows users to create custom server roles.
Read the SQL Server What's New documentation to learn more about these and other features.
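To make the point about sequences concrete, here is a minimal Python sketch of one counter shared by two tables. It is an illustration only, not SQL Server syntax: the Sequence class, the document_numbers counter and the two lists standing in for tables are all hypothetical names.

```python
import itertools


class Sequence:
    """Hypothetical stand-in for a database sequence: a counter that is
    independent of any single table."""

    def __init__(self, start=1, increment=1):
        self._counter = itertools.count(start, increment)

    def next_value(self):
        # Each call hands out the next number, whoever asks for it.
        return next(self._counter)


# One sequence feeds two "tables" (plain lists here), so document
# numbers never collide across them.
document_numbers = Sequence(start=1000)
invoices = []
credit_notes = []

invoices.append({"doc_no": document_numbers.next_value(), "total": 250})
credit_notes.append({"doc_no": document_numbers.next_value(), "total": -40})
invoices.append({"doc_no": document_numbers.next_value(), "total": 75})
```

A per-table auto-increment column would have numbered each table separately from its own starting point; the shared counter is what keeps the numbers unique across both.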
I almost missed this!
It is about the only way I am going to get to play with SQL Server. I don’t have a local Windows sysadmin to maintain the server, etc.
Posted in Database, SQL Server | No Comments »
Sunday, September 23rd, 2012
What if all transactions required strict global consistency? by Matthew Aslett.
Matthew quotes Basho CTO Justin Sheehy on eventual consistency and traditional accounting:
“Traditional accounting is done in an eventually-consistent way and if you send me a payment from your bank to mine then that transaction will be resolved in an eventually consistent way. That is, your bank account and mine will not have a jointly-atomic change in value, but instead yours will have a debit and mine will have a credit, each of which will be applied to our respective accounts.”
And Matthew comments:
The suggestion that bank transactions are not immediately consistent appears counter-intuitive. Comparing what happens in a transaction with a jointly atomic change in value, like buying a house, with what happens in normal transactions, like buying your groceries, we can see that for normal transactions this statement is true.
We don’t need to wait for the funds to be transferred from our accounts to a retailer before we can walk out the store. If we did we’d all waste a lot of time waiting around.
This highlights a couple of things that are true for both database transactions and financial transactions:
- that eventual consistency doesn’t mean a lack of consistency
- that different transactions have different consistency requirements
- that if all transactions required strict global consistency we’d spend a lot of time waiting for those transactions to complete.
All of which is very true but misses an important point about financial transactions.
Financial transactions (involving banks, etc.) are eventually consistent according to the same rules.
That’s no accident. It didn’t just happen that banks adopted ad hoc rules that resulted in a uniform eventual consistency.
It didn’t happen overnight, but the current set of rules for “uniform eventual consistency” of banking transactions is spelled out by the Uniform Commercial Code. (Other laws and regulations contribute, but the UCC is a major part of it.)
Dare we say a uniform semantic for financial transactions was hammered out without the use of formal ontologies or web addresses? And that it supports billions of transactions on a daily basis? To become eventually consistent?
Think about the transparency (to you) of your next credit card transaction. Standards and eventual consistency make that possible.
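The debit-then-credit pattern Justin Sheehy describes can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how any bank or database implements settlement: the Bank class, its inbox queue and the settle step are hypothetical.

```python
from collections import deque


class Bank:
    """Hypothetical bank: balances plus an inbox of credits not yet applied."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)
        self.inbox = deque()

    def send(self, account, amount, other_bank, other_account):
        # The debit is applied locally right away; the matching credit
        # travels to the other bank as a message.
        if self.balances[account] < amount:
            raise ValueError("insufficient funds")
        self.balances[account] -= amount
        other_bank.inbox.append((other_account, amount))

    def settle(self):
        # Apply every pending credit; afterwards the books agree again.
        while self.inbox:
            account, amount = self.inbox.popleft()
            self.balances[account] += amount


def total(*banks):
    """Sum of all balances currently visible on the books."""
    return sum(sum(bank.balances.values()) for bank in banks)


yours = Bank("yours", {"alice": 100})
mine = Bank("mine", {"bob": 20})

before = total(yours, mine)           # 120
yours.send("alice", 30, mine, "bob")
in_flight = total(yours, mine)        # 90: debit applied, credit pending
mine.settle()
after = total(yours, mine)            # 120 again
```

Between send and settle the two sets of books disagree, yet nothing is lost. Both sides follow the same rule for applying the pending credit, which is the role the uniform rules play in the banking case.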
Posted in Consistency, Database, Finance Services, Law, Law - Sources | No Comments »
Saturday, September 22nd, 2012
The Stages of Database Development (video) by Jeremiah Peschka.
The description:
Strong development practices don’t spring up overnight; they take time, effort, and teamwork. Database development practices are doubly hard because they involve many moving pieces – unit testing, integration testing, and deploying changes that could have potential side effects beyond changing logic. In this session, Microsoft SQL Server MVP Jeremiah Peschka will discuss ways users can move toward a healthy cycle of database development using version control, automated testing, and rapid deployment.
Nothing you haven’t heard before in one form or another.
Question: How does your database environment compare to the one Jeremiah describes?
(Never mind that you have “reasons” (read excuses) for the current state of your database environment.)
Doesn’t just happen with databases or even servers.
What about your topic map development environment?
Or other development environment.
Looking forward to a sequel (sorry) to this video.
Posted in Database, Design | No Comments »
Sunday, September 16th, 2012
Spanner : Google’s globally distributed database
From the post:
This paper, whose co-authors include Jeff Dean and Sanjay Ghemawat of MapReduce fame, describes Spanner. Spanner is Google’s scalable, multi-version, globally distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. Finally the paper comes out! Really exciting stuff!
Abstract from the paper:
Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
Spanner: Google’s Globally Distributed Database (PDF File)
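The abstract's "time API that exposes clock uncertainty" is easier to picture with a small sketch. What follows is a toy in Python, loosely modeled on the idea of a clock that returns an interval instead of a single instant. The names (TimeInterval, UncertainClock, commit) and the fixed epsilon are my assumptions and not Spanner's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical clock: the true time is assumed to lie within
    +/- epsilon of the local reading."""

    def __init__(self, read_local, epsilon):
        self._read_local = read_local
        self.epsilon = epsilon

    def now(self):
        t = self._read_local()
        return TimeInterval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp):
        # True only once the timestamp has definitely passed.
        return self.now().earliest > timestamp


def commit(clock, sleep):
    """Pick a commit timestamp, then wait out the clock uncertainty."""
    ts = clock.now().latest        # no earlier than the true time
    while not clock.after(ts):     # wait until ts is surely in the past
        sleep()
    return ts


# Deterministic fake clock so the sketch can run without real waiting.
ticks = {"t": 100.0}


def read_local():
    return ticks["t"]


def fake_sleep():
    ticks["t"] += 1.0


clock = UncertainClock(read_local, epsilon=3.0)
commit_ts = commit(clock, fake_sleep)
```

The point of the wait is ordering: any transaction that starts after this one returns will pick a later timestamp, even on a machine whose clock is off by up to epsilon.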
Facing user requirements, Google did not say: Suck it up and use tools already provided.
Google engineered new tools to meet their requirements.
Is there a lesson there for other software projects?
Posted in Database, Distributed Systems | No Comments »
Wednesday, September 12th, 2012
PostgreSQL 9.2 released
From the announcement:
The PostgreSQL Global Development Group announces PostgreSQL 9.2, the latest release of the leader in open source databases. Since the beta release was announced in May, developers and vendors have praised it as a leap forward in performance, scalability and flexibility. Users are expected to switch to this version in record numbers.
“PostgreSQL 9.2 will ship with native JSON support, covering indexes, replication and performance improvements, and many more features. We are eagerly awaiting this release and will make it available in Early Access as soon as it’s released by the PostgreSQL community,” said Ines Sombra, Lead Data Engineer, Engine Yard.
Links
Downloads, including packages and installers
Release Notes
Documentation
What’s New in 9.2
Press Kit
New features like range types:
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).
Ranges can be made of continuous (numeric, timestamp…) or discrete (integer, date…) data types. They can be open (the bound isn’t part of the range) or closed (the bound is part of the range). A bound can also be infinite.
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them.
have captured my attention.
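A rough Python sketch shows why a range value beats two columns. It mimics open and closed bounds, infinite bounds, and two operators (containment and overlap) for continuous values. The NumRange class and its method names are hypothetical; this is not PostgreSQL's implementation, and it skips the extra handling that discrete types such as integers and dates would need.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumRange:
    """Toy range over continuous numbers; None means an infinite bound."""

    lower: Optional[float]
    upper: Optional[float]
    lower_inc: bool = True    # closed: the lower bound is part of the range
    upper_inc: bool = False   # open: the upper bound is not part of the range

    def contains(self, value: float) -> bool:
        if self.lower is not None:
            if value < self.lower or (value == self.lower and not self.lower_inc):
                return False
        if self.upper is not None:
            if value > self.upper or (value == self.upper and not self.upper_inc):
                return False
        return True

    def _strictly_left_of(self, other: "NumRange") -> bool:
        # True when this range ends before the other one begins.
        if self.upper is None or other.lower is None:
            return False
        if self.upper < other.lower:
            return True
        if self.upper == other.lower:
            # Touching bounds share a point only if both are closed.
            return not (self.upper_inc and other.lower_inc)
        return False

    def overlaps(self, other: "NumRange") -> bool:
        return not (self._strictly_left_of(other) or other._strictly_left_of(self))


morning = NumRange(1.0, 5.0)                        # [1, 5)
afternoon = NumRange(5.0, 10.0)                     # [5, 10)
morning_closed = NumRange(1.0, 5.0, upper_inc=True) # [1, 5]
open_ended = NumRange(5.0, None)                    # [5, infinity)
```

With two plain columns, every query has to repeat the bound comparisons and get the open/closed cases right each time; with a range value that logic lives in one place.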
Now to look at other new features: Index-only scans, Replication improvements and JSON datatype.
Posted in Database, PostgreSQL | No Comments »
Tuesday, September 4th, 2012
The events page for XLDB has:
XLDB 2011 (Slides/Videos), as well as reports back to the 1st XLDB workshop.
Check back to find later proceedings.
Posted in Database, XLDB | No Comments »
Wednesday, August 22nd, 2012
VLDB 2012 Advance Program
I took this text from the conference homepage:
VLDB is a premier annual international forum for data management and database researchers, vendors, practitioners, application developers, and users. The conference will feature research talks, tutorials, demonstrations, and workshops. It will cover current issues in data management, database and information systems research. Data management and databases remain among the main technological cornerstones of emerging applications of the twenty-first century.
I can’t think of a better summary of the papers, tutorials, etc., that you will find here.
I could easily lose the better part of a week just skimming abstracts.
Suggestions/comments?
Posted in CS Lectures, Database | No Comments »