## Archive for the ‘PostgreSQL’ Category

### How To Legally Dick With The NSA – PostgreSQL 10 Beta 1

Thursday, May 18th, 2017

The release of PostgreSQL 10 Beta 1 gives everyone an opportunity to legally dick with the NSA.

In Stop Blaming NSA For The Ransomware Attack, Patrick Tucker repeats claims by the NSA that about 80% of vulnerabilities are revealed and 20% are concealed by the NSA.

Which means if there are 10 security vulnerabilities in PostgreSQL 10 Beta 1, the NSA will keep two for themselves.

Let’s disappoint them on that score. With widespread community testing, fuzzing, etc., the NSA score on PostgreSQL 10 Beta 1 could be zero.

That won’t help vendors with 70 million lines of closed source databases (look for Mary Ann Davidson). Such databases may have true accidental vulnerabilities or ones introduced by NSA ringers.

If NSA ringers working for closed source companies sounds like tin-hat conspiracy theory, recall the NSA is barred from spying on American citizens at all. In fact, they have vehemently denied it. At least until they admitted they were lying and in fact spying on all American citizens.

Also bear in mind that the NSA was a participant in many of the covert/overt attempts by the United States to influence elections in other countries. (Dov H. Levin, as of May 18, 2017, the datasets are forthcoming. See also: Database Tracks History Of U.S. Meddling In Foreign Elections, an NPR interview that counts 80 US-backed efforts to interfere in elections.)

On the technical front, the NSA is known to have intentionally damaged a U.S. cryptography standard. NSA Efforts to Evade Encryption Technology Damaged U.S. Cryptography Standard. That report isn’t from a blog that is a continuation of a photocopied version of a mimeographed conspiracy report found in low-end coffee shops.

No, the damage to U.S. cryptography report appears in Scientific American.

I can’t honestly name one illegal, immoral, unethical, act that the NSA is not capable of.

You?

Beyond “sticking to the NSA,” database researchers and users have these PostgreSQL 10 Beta 1 features to enjoy:

The PostgreSQL Global Development Group announces today that the first beta release of PostgreSQL 10 is available for download. This release contains previews of all of the features which will be available in the final release of version 10, although some details will change before then. Users are encouraged to begin testing their applications against this latest release.

#### Major Features of 10

The new version contains multiple features that will allow users to both scale out and scale up their PostgreSQL infrastructure:

• Logical Replication: built-in option for replicating specific tables or using replication to upgrade
• Native Table Partitioning: range and list partitioning as native database objects
• Additional Query Parallelism: including index scans, bitmap scans, and merge joins
• Quorum Commit for Synchronous Replication: ensure against loss of multiple nodes

We have also made three improvements to PostgreSQL connections, which we are calling on driver authors to support, and users to test:

• SCRAM Authentication, for more secure password-based access
• Multi-host “failover”, connecting to the first available in a list of hosts
• target_session_attrs parameter, so a client can request a read/write host

Many other new features and improvements have been added to PostgreSQL 10, some of which may be as important, or more important, to specific users than the above. Certainly all of them require testing. Among them are:

• Crash-safe and replicable Hash Indexes
• Multi-column Correlation Statistics
• New “monitoring” roles for permission grants
• Latch Wait times in pg_stat_activity
• XMLTABLE query expression
• Restrictive Policies for Row Level Security
• Full Text Search support for JSON and JSONB
• ICU collation support
• Push Down Aggregates to foreign servers
• Transition Tables in trigger execution

Further, developers have contributed performance improvements in the SUM() function, character encoding conversion, expression evaluation, grouping sets, and joins against unique columns. Analytics queries against large numbers of rows should be up to 40% faster. Please test if these are faster for you and report back.

See the Release Notes for a complete list of new and changed features.

Make the lives of PostgreSQL users everywhere better and the lives of government intelligence services around the world worse!

I call that a win-win situation.

### PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data

Thursday, January 7th, 2016

PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data

Let’s reverse the order of the announcement, to be in reader-friendly order:

Press kit

Release Notes

What’s New in 9.5

Edit: I moved my comments above the fold as it were:

Just so you know, PostgreSQL 9.5 documentation, 9.14.2.2 XMLEXISTS says:

Also note that the SQL standard specifies the xmlexists construct to take an XQuery expression as first argument, but PostgreSQL currently only supports XPath, which is a subset of XQuery.

Apologies, you will have to scroll for the subsection, there was no anchor at 9.14.2.2.

If you are looking to make a major contribution to PostgreSQL, note that XQuery is on the todo list.

Now for all the stuff that you will skip reading anyway. 😉

(I would save the prose for use in reports to management about using or transitioning to PostgreSQL 9.5.)

7 JANUARY 2016: The PostgreSQL Global Development Group announces the release of PostgreSQL 9.5. This release adds UPSERT capability, Row Level Security, and multiple Big Data features, which will broaden the user base for the world’s most advanced database. With these new capabilities, PostgreSQL will be the best choice for even more applications for startups, large corporations, and government agencies.

Annie Prévot, CIO of the CNAF, the French Child Benefits Office, said, “The CNAF is providing services for 11 million persons and distributing 73 billion Euros every year, through 26 types of social benefit schemes. This service is essential to the population and it relies on an information system that must be absolutely efficient and reliable. The CNAF’s information system is satisfyingly based on the PostgreSQL database management system.”

## UPSERT

A most-requested feature by application developers for several years, “UPSERT” is shorthand for “INSERT, ON CONFLICT UPDATE”, allowing new and updated rows to be treated the same. UPSERT simplifies web and mobile application development by enabling the database to handle conflicts between concurrent data changes. This feature also removes the last significant barrier to migrating legacy MySQL applications to PostgreSQL.

Developed over the last two years by Heroku programmer Peter Geoghegan, PostgreSQL’s implementation of UPSERT is significantly more flexible and powerful than those offered by other relational databases. The new ON CONFLICT clause permits ignoring the new data, or updating different columns or relations in ways which will support complex ETL (Extract, Transform, Load) toolchains for bulk data loading. And, like all of PostgreSQL, it is designed to be absolutely concurrency-safe and to integrate with all other PostgreSQL features, including Logical Replication.

## Row Level Security

PostgreSQL continues to expand database security capabilities with its new Row Level Security (RLS) feature. RLS implements true per-row and per-column data access control which integrates with external label-based security stacks such as SE Linux. PostgreSQL is already known as “the most secure by default.” RLS cements its position as the best choice for applications with strong data security requirements, such as compliance with PCI, the European Data Protection Directive, and healthcare data protection standards.

RLS is the culmination of five years of security features added to PostgreSQL, including extensive work by KaiGai Kohei of NEC, Stephen Frost of Crunchy Data, and Dean Rasheed. Through it, database administrators can set security “policies” which filter which rows particular users are allowed to update or view. Data security implemented this way is resistant to SQL injection exploits and other application-level security holes.

## Big Data Features

PostgreSQL 9.5 includes multiple new features for bigger databases, and for integrating with other Big Data systems. These features ensure that PostgreSQL continues to have a strong role in the rapidly growing open source Big Data marketplace. Among them are:

BRIN Indexing: This new type of index supports creating tiny, but effective indexes for very large, “naturally ordered” tables. For example, tables containing logging data with billions of rows could be indexed and searched in 5% of the time required by standard BTree indexes.

Faster Sorts: PostgreSQL now sorts text and NUMERIC data faster, using an algorithm called “abbreviated keys”. This makes some queries which need to sort large amounts of data 2X to 12X faster, and can speed up index creation by 20X.

CUBE, ROLLUP and GROUPING SETS: These new standard SQL clauses let users produce reports with multiple levels of summarization in one query instead of requiring several. CUBE will also enable tightly integrating PostgreSQL with more Online Analytic Processing (OLAP) reporting tools such as Tableau.

Foreign Data Wrappers (FDWs): These already allow using PostgreSQL as a query engine for other Big Data systems such as Hadoop and Cassandra. Version 9.5 adds IMPORT FOREIGN SCHEMA and JOIN pushdown making query connections to external databases both easier to set up and more efficient.

TABLESAMPLE: This SQL clause allows grabbing a quick statistical sample of huge tables, without the need for expensive sorting.

“The new BRIN index in PostgreSQL 9.5 is a powerful new feature which enables PostgreSQL to manage and index volumes of data that were impractical or impossible in the past. It allows scalability of data and performance beyond what was considered previously attainable with traditional relational databases and makes PostgreSQL a perfect solution for Big Data analytics,” said Boyan Botev, Lead Database Administrator, Premier, Inc.

### pgcli [Inspiration for command line tool for XPath/XQuery?]

Tuesday, January 20th, 2015

pgcli

From the webpage:

Pgcli is a command line interface for Postgres with auto-completion and syntax highlighting.

Postgres folks who don’t know about pgcli will be glad to see this post.

But, having spent several days with XPath/XQuery/FO 3.1 syntax, I can only imagine the joy in XML circles for a similar utility for use with command line XML tools.

Properly done, the increase in productivity would be substantial.

The same applies for your favorite NoSQL query language. (Datomic?)

Will SQL users be the only ones with such a command line tool?

I first saw this in a tweet by elishowk.

### The Big Book of PostgreSQL

Sunday, November 30th, 2014

The Big Book of PostgreSQL by Thom Brown.

From the post:

Documentation is crucial to the success of any software program, particularly open source software (OSS), where new features and functionality are added by many contributors. Like any OSS, PostgreSQL needs to produce accurate, consistent and reliable documentation to guide contributors’ work and reflect the functionality of every new contribution. Documentation also an important source of information for developers, administrators and other end users as they will take actions or base their work on the functionality described in the documentation. Typically, the author of a new feature provides the relevant documentation changes to the project, and that person can be anyone in any role in IT. So it can really come from anywhere.

Postgres documentation is extensive (you can check out the latest 9.4 documentation here). In fact, the U.S. community PDF document is 2,700 pages long. It would be a mighty volume and pretty unwieldy if published as a physical book. The Postgres community is keenly aware that the quality of documentation can make or break an open source project, and thus regularly updates and improves our documentation, a process I’ve appreciated being able to take part in.

A recent podcast, Solr Usability with Steve Rowe & Tim Potter goes to some lengths to describe efforts to improve Solr documentation.

If you know anyone in the Solr community, consider this a shout out that PostgreSQL documentation isn’t a bad example to follow.

Saturday, November 22nd, 2014

Jarrod C. Taylor writes in part 1:

Introduction

Clojure is a great language that is continuing to improve itself and expand its user base year over year. The Clojure ecosystem has many great libraries focused on being highly composable. This composability allows developers to easily build impressive applications from seemingly simple parts. Once you have a solid understanding of how Clojure libraries fit together, integration between them can become very intuitive. However, if you have not reached this level of understanding, knowing how all of the parts fit together can be daunting. Fear not, this series will walk you through start to finish, building a tested compojure web app backed by a Postgres Database.

Where We Are Going

The project we will build and test over the course of this blog series is an address book application. We will build the app using ring and Compojure and persist the data in a Postgres Database. The app will be a traditional client server app with no JavaScript. Here is a teaser of the final product.

Not that I need another address book but as an exercise in onboarding, this rocks!

(see above)

Recap and Restructure

So far we have modified the default Compojure template to include a basic POST route and used Midje and Ring-Mock to write a test to confirm that it works. Before we get started with templates and creating our address book we should provide some additional structure to our application in an effort to keep things organized as the project grows.

Introduction

In this installment of the address book series we are finally ready to start building the actual application. We have laid all of the ground work required to finally get to work.

Persisting Data in Postgres

At this point we have an address book that will allow us to add new contacts. However, we are not persisting our new additions. It’s time to change that. You will need to have Postgres installed. If you are using a Mac, postgresapp is a very simple way of installing. If you are on another OS you will need to follow the install instructions from the Postgres website.

Once you have Postgres installed and running we are going to create a test user and two databases.

The Finish Line

Our address book application has finally taken shape and we are in a position to put the finishing touches on it. All that remains is to allow the user the ability to edit and delete existing contacts.

One clever thing Jarrod has done is post all five (5) parts to this series on one day. You can go as fast or as slow as you choose to go.

Another clever thing is that testing is part of the development process.

How many programmers actually incorporate testing day to day? Given the prevalence of security bugs (to say nothing at all of other bugs), I would say less than one hundred percent (100%).

You?

How much less than 100% I won’t hazard a guess.

### PgOSQuery [OSQuery for Postgres]

Thursday, November 6th, 2014

PgOSQuery

From the webpage:

So I saw Facebook’s OSQuery, and thought “That looks awesome, but complicated to build on top of SQLite. Postgres’ Foreign Data Wrappers seem like a much better foundation. How long would it take to write the same app on top of Postgres?”. Turns out it takes about 15 minutes, for someone who’s never written an FDW before 🙂

This approach does have the downside that it runs as the postgres user rather than as root, so it can’t see the full details of other people’s processes, but I’m sure that could be worked around if you really want to.

Currently this is just a proof-of-concept to see how useful Postgres’ foreign data wrappers are, and how easy they are to create with the Multicorn python library. Seems the answers are “very useful” and “very easy”. If people want to make this more useful by adding more virtual tables, pull requests are welcome~

The system information captured by OSQuery and PgOSQuery is always present. But in order to talk about it (in terms of recorded information), you must capture that information and, just as importantly, have a method to associate your comments with that information.

Any database could capture the system information captured by OSQuery and PgSQLQuery. But having captured it, how do you talk about the column headers for the data? Data dictionaries are an option if your database supports them, but then how do you talk about the entry in your data dictionary?

Not that you are required to talk about entries in your data dictionary but it should be a design choice to not talk about data dictionary entries, not a default cone of silence.

### Postgres-XL

Tuesday, May 13th, 2014

Database vendor open sources Postgres-XL for scale-out workloads by Derrick Harris.

From the post:

Database software company TransLattice has rebuilt the popular PostgreSQL database as a clustered system designed for handling large datasets. The open-source product, called Postgres-XL, is designed to be just like regular Postgres, only more scalable and also functional as a massively parallel analytic database.

That’s interesting news but I puzzled over a comment that Derrick makes:

Postgres-XL is among a handful of recent attempts to turn Postgres into a more fully functional database.

True, there are projects to add features to Postgres that it previously did not have but I would not characterize them as making Postgres “…a more fully functional database.”

PostgreSQL is already a fully functional database, a rather advanced one. It may lack some feature someone imagines as useful, but the jury is still out on whether such added “features” are features or cruft.

### Global Biodiversity Information Facility

Wednesday, October 9th, 2013

Global Biodiversity Information Facility

Some stats:

417,165,184 occurrences

1,426,888 species

11,976 data sets

578 data publishers

What lies at the technical heart of this beast?

Would you believe a PostgreSQL database and an embedded Apache SOLR index?

Start with the Summary of the GBIF infrastructure. The details on PostgreSQL and Solr are under the Registry tab.

BTW, the system recognizes multiple identification systems and more are to be added.

Need to read more of the documents on that part of the system.

### PostgreSQL 9.3 released!

Wednesday, September 11th, 2013

PostgreSQL 9.3 released!

From the post:

The PostgreSQL Global Development Group announces the release of PostgreSQL 9.3, the latest version of the world’s leading open source relational database system. This release expands PostgreSQL’s reliability, availability, and ability to integrate with other databases. Users are already finding that they can build applications using version 9.3 which would not have been possible before.

“PostgreSQL 9.3 provides features that as an app developer I can use immediately: better JSON functionality, regular expression indexing, and easily federating databases with the Postgres foreign data wrapper. I have no idea how I completed projects without 9.3,” said Jonathan S. Katz, CTO of VenueBook.

From the what’s new page, an item of particular interest:

Writeable Foreign Tables:

“Foreign Data Wrappers” (FDW) were introduced in PostgreSQL 9.1, providing a way of accessing external data sources from within PostgreSQL using SQL. The original implementation was read-only, but 9.3 will enable write access as well, provided the individual FDW drivers have been updated to support this. At the time of writing, only the Redis and PostgreSQL drivers have write support (need to verify this).

I haven’t gotten through the documentation on FDW but for data integration it sounds quite helpful.

Assuming you document the semantics of the data you are writing back and forth. 😉

A use case for a topic map that spans both the local and “foreign” data source or separate topic maps for the local and “foreign” data source that can then be merged together.

### Postgres and Full Text Indexes

Sunday, September 8th, 2013

After reading Jeff Larson’s account of his text mining adventures in ProPublica’s Jeff Larson on the NSA Crypto Story, I encountered a triplet of post from Gary Sieling on Postgres and full text indexes.

In order of appearance:

Fixing Issues Where Postgres Optimizer Ignores Full Text Indexes

GIN vs GiST For Faceted Search with Postgres Full Text Indexes

Querying Multiple Postgres Full-Text Indexes

If Postgres and full text indexing are project requirements, these are must read posts.

Gary does note in the middle post that Solr with default options (no tuning) out performs Postgres.

Solr would have been the better option for Jeff Larson when compared to Postgres.

But the difference in that case is a contrast between structured data and “dumpster data.”

It appears that the hurly-burly race to enable “connecting the dots” post-9/11:

Structural barriers to performing joint intelligence work. National intelligence is still organized around the collection disciplines of the home agencies, not the joint mission. The importance of integrated, all-source analysis cannot be overstated. Without it, it is not possible to “connect the dots.” No one component holds all the relevant information.

Yep, #1 with a bullet problem.

Response? From the Manning and Snowden leaks, one can only guess that “dumpster data” is the preferred solution.

By “dumpster data” I mean that data from different sources, agencies, etc., are simply dumped into a large data store.

No wonder the NSA runs 600,000 of queries a day or about 20 million queries a month. That is a lot of data dumpster diving.

Secrecy may be hiding that data from the public, but poor planning is hiding it from the NSA.

### Choosing a PostgreSQL text search method

Friday, August 30th, 2013

Choosing a PostgreSQL text search method by Craig Ringer.

From the post:

PostgreSQL offers several tools for searching and pattern matching text. The challenge is choosing which to use for a job. There’s:

There’s also SIMILAR TO, but we don’t speak of that in polite company, and PostgreSQL turns it into a regular expression anyway.

If you are thinking about running a PostgreSQL backend and need text searching, this will be a useful post for you.

I really appreciated Craig’s closing paragraph:

At no point did I try to determine whether LIKE or full-text search is faster for a given query. That’s because it usually doesn’t matter; they have different semantics. Which goes faster, a car or a boat? In most cases it doesn’t matter because speed isn’t your main selection criteria, it’s “goes on water” or “goes on land”.

Something to keep in mind with the “web scale” chorus comes along.

Most of the data of interest to me (not all) isn’t of web scale.

### Kindred Britain

Monday, August 26th, 2013

Kindred Britian by Nicholas Jenkins, Elijah Meeks and Scott Murray.

From the website:

Kindred Britain is a network of nearly 30,000 individuals — many of them iconic figures in British culture — connected through family relationships of blood, marriage, or affiliation. It is a vision of the nation’s history as a giant family affair.

A quite remarkable resource.

Family relationships connecting people, a person’s relationship to geographic locations and a host of other associated details for 30,000 people await you!

From the help page:

ESSAYS

Originating Kindred Britain by Nicholas Jenkins

Developing Kindred Britain by Elijah Meeks and Karl Grossner

Designing Kindred Britain by Scott Murray

Kindred Britain: Statistics by Elijah Meeks

GENERAL INFORMATION

User’s Guide by Hannah Abalos and Nicholas Jenkins

FAQs

Glossary by Hannah Abalos and Emma Townley-Smith

Acknowledgements

An acronym that may puzzle you: ODNB – Oxford Dictionary of National Biography.

In Developing Kindred Britain you will learn Kindred Britain has no provision for reader annotation or contribution of content.

Given a choice between the rich presentation and capabilities of Kindred Britain, which required several technical innovations and less capabilities but reader annotation, I would always choose the former over the latter.

You should forward the link to Kindred Britain to anyone working on robust exploration and display of data, academic or otherwise.

### K-nearest neighbour search for PostgreSQL [Data Types For Distance Operator]

Wednesday, May 29th, 2013

K-nearest neighbour search for PostgreSQL by Oleg Bartunov and Teodor Sigaev.

Excellent presentation from PGCon-2010 on the KNN index type in Postgres.

And an exception to the rule about wordy slides.

Or at least wordy slides are better for non-attendees.

KNN uses the <-> distance operator.

And the slides say:

distance operator, should be provided for data type

Looking at the Postgre documentation (9.3, but same for 9.1 and 9.2), I read:

In addition to the typical B-tree search operators, btree_gist also provides index support for <> (“not equals”). This may be useful in combination with an exclusion constraint, as described below.

Also, for data types for which there is a natural distance metric, btree_gist defines a distance operator <->, and provides GiST index support for nearest-neighbor searches using this operator. Distance operators are provided for int2, int4, int8, float4, float8, timestamp with time zone, timestamp without time zone, time without time zone, date, interval, oid, and money.

What collective subjects would you like to gather up using the defined data types for the distance operator?

How would you represent properties to take advantage of the defined data types?

Are there other data types that you would define for the distance operator?

### Open Source Release: postgresql-hll

Saturday, May 25th, 2013

Open Source Release: postgresql-hll

From the post:

We’re happy to announce the first open-source release of AK’s PostgreSQL extension for building and manipulating HyperLogLog data structures in SQL, postgresql-hll. We are releasing this code under the Apache License, Version 2.0 which we feel is an excellent balance between permissive usage and liability limitation.

### What is it and what can I do with it?

The extension introduces a new data type, hll, which represents a probabilistic distinct value counter that is a hybrid between a HyperLogLog data structure (for large cardinalities) and a simple set (for small cardinalities). These structures support the basic HLL methods: insert, union, and cardinality, and we’ve also provided aggregate and debugging functions that make using and understanding these things a breeze. We’ve also included a way to do schema versioning of the binary representations of hlls, which should allow a clear path to upgrading the algorithm, as new engineering insights come up.

A quick overview of what’s included in the release:

• C-based extension that provides the hll data structure and algorithms
• Austin Appleby’s MurmurHash3 implementation and SQL-land wrappers for integer numerics, bytes, and text
• Full storage specification in STORAGE.markdown
• Full function reference in REFERENCE.markdown
• .spec file for rpmbuild
• Full test suite

A quick note on why we included MurmurHash3 in the extension: we’ve done a good bit of research on the importance of a good hash function when using sketching algorithms like HyperLogLog and we came to the conclusion that it wouldn’t be very user-friendly to force the user to figure out how to get a good hash function into SQL-land. Sure, there are plenty of cryptographic hash functions available, but those are (computationally) overkill for what is needed. We did the research and found MurmurHash3 to be an excellent non-cryptographic hash function in both theory and practice. We’ve been using it in production for a while now with excellent results. As mentioned in the README, it’s of crucial importance to reliably hash the inputs to hlls.

Would you say topic maps aggregate data?

I thought so.

How would you adapt HLL to synonymous identifications?

I ask because of this line in the post:

Essentially, we’re doing interactive set-intersection of operands with millions or billions of entries in milliseconds. This is intersection computed using the inclusion-exclusion principle as applied to hlls:

Performing “…interactive set-intersection of operands with millions or billions of entries in milliseconds…” sounds like an attractive feature for topic map software.

Yes?

### Postgres Demystified

Friday, May 24th, 2013

From the description:

Postgres has long been known as a stable database product that reliably stores your data. However, in recent years it has picked up many features, allowing it to become a much sexier database.

This video covers a whirlwind of Postgres features, which highlight why you should consider it for your next project. These include: Datatypes Using other languages within Postgres Extensions including NoSQL inside your SQL database Accessing your non-Postgres data (Redis, Oracle, MySQL) from within Postgres Window Functions.

Chris Kerstiens does a fast paced overview of Postgres.

### PostgreSQL 9.3 Beta 1 Released

Wednesday, May 15th, 2013

PostgreSQL 9.3 Beta 1 Released

From the post:

The first beta release of PostgreSQL 9.3, the latest version of the world’s best open source database, is now available. This beta contains previews of all of the features which will be available in version 9.3, and is ready for testing by the worldwide PostgreSQL community. Please download, test, and report what you find.

Major Features

The major features available for testing in this beta include:

• Writeable Foreign Tables, enabling pushing data to other databases
• pgsql_fdw driver for federation of PostgreSQL databases
• Automatically updatable VIEWs
• MATERIALIZED VIEW declaration
• LATERAL JOINs
• Additional JSON constructor and extractor functions
• Indexed regular expression search
• Disk page checksums to detect filesystem failures

In 9.3, PostgreSQL has greatly reduced its requirement for SysV shared memory, changing to mmap(). This allows easier installation and configuration of PostgreSQL, but means that we need our users to rigorously test and ensure that no memory management issues have been introduced by the change. We also request that users spend extra time testing the improvements to Foreign Key locks.

If that isn’t enough features for you to test, see the full announcement! 😉

### AXLE: Advanced Analytics for Extremely Large European Databases

Monday, December 3rd, 2012

AXLE: Advanced Analytics for Extremely Large European Databases

From the webpage:

The objectives of the AXLE project are to greatly improve the speed and quality of decision making on real-world data sets. AXLE aims to make these improvements generally available through high quality open source implementations via the PostgreSQL and Orange products.

The project started in early November 2012. Will be checking back to see what is proposed for PostgreSQL and/or Orange.

### Up to Date on Open Source Analytics

Tuesday, October 23rd, 2012

Up to Date on Open Source Analytics by Steve Miller.

Steve updates his Wintel laptop with the latest releases of open source analytics tools.

Steve’s list:

I first saw this mentioned at KDNuggets.

### PostgreSQL Database Modeler

Thursday, October 4th, 2012

PostgreSQL Database Modeler

From the readme file at github:

PostgreSQL Database Modeler, or simply, pgModeler is an open source tool for modeling databases that merges the classical concepts of entity-relationship diagrams with specific features that only PostgreSQL implements. The pgModeler translates the models created by the user to SQL code and apply them onto database clusters from version 8.0 to 9.1.

Other modeling tools you have or are likely to encounter writing topic maps?

When the output of diverse modeling tools or diverse output from the same modeling tool needs semantic reconciliation, I would turn to topic maps.

I first saw this at DZone.

### JMyETL

Friday, September 14th, 2012

JMyETL, an easy to use ETL tool that supports 10 different RDBMS by Esen Sagynov.

From the post:

JMyETL is a very useful and simple Java based application for Windows OS which allows users to import and export data from/to various database systems. For example:

• CUBRID –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, Access, SQLite
• MySQL –> Sybase ASE/ASA, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
• Sybase ASE –> Sybase ASA, MySQL, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
• Sybase ASA –> Sybase ASE, MySQL, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
• Oracle –> Sybase ASA, Sybase ASE, MySQL, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
• Access –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
• PostgreSQL –> Sybase ASE, Sybase ASA, MySQL, Oracle, Access, SQL Server, DB2, SQLite, CUBRID
• SQL Server –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, Access, DB2, SQLite, CUBRID
• DB2 –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, Access, SQLite, CUBRID
• SQLite –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, Access, CUBRID

Just in case you need a database to database ETL utility.

I first saw this at DZone.

### PostgreSQL 9.2 released

Wednesday, September 12th, 2012

PostgreSQL 9.2 released

From the announcement:

The PostgreSQL Global Development Group announces PostgreSQL 9.2, the latest release of the leader in open source databases. Since the beta release was announced in May, developers and vendors have praised it as a leap forward in performance, scalability and flexibility. Users are expected to switch to this version in record numbers.

“PostgreSQL 9.2 will ship with native JSON support, covering indexes, replication and performance improvements, and many more features. We are eagerly awaiting this release and will make it available in Early Access as soon as it’s released by the PostgreSQL community,” said Ines Sombra, Lead Data Engineer, Engine Yard.

New features like range types:

Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).

Ranges can be made of continuous (numeric, timestamp…) or discrete (integer, date…) data types. They can be open (the bound isn’t part of the range) or closed (the bound is part of the range). A bound can also be infinite.

Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them.

have captured my attention.

Now to look at other new features: Index-only scans, Replication improvements and JSON datatype.

### PostgreSQL’s place in the New World Order

Friday, July 27th, 2012

PostgreSQL’s place in the New World Order by Matthew Soldo.

Description:

Mainstream software development is undergoing a radical shift. Driven by the agile development needs of web, social, and mobile apps, developers are increasingly deploying to platforms-as-a-service (PaaS). A key enabling technology of PaaS is cloud-services: software, often open-source, that is consumed as a service and operated by a third-party vendor. This shift has profound implications for the open-source world. It enables new business models, increases emphasis on user-experience, and creates new opportunities.

PostgreSQL is an excellent case study in this shift. The PostgreSQL project has long offered one of the most reliable open source databases, but has received less attention than competing technologies. But in the PaaS and cloud-services world, reliability and open-ness become increasingly important. As such, we are seeing the beginning of a shift in adoption towards PostgreSQL.

The datastore landscape is particularly interesting because of the recent attention given to the so-called NoSQL technologies. Data is suddenly sexy again. This attention is largely governed by the same forces driving developers to PaaS, namely the need for agility and scalability in building modern apps. Far from being a threat to PostgreSQL, these technologies present an amazing opportunity for showing the way towards making PostgreSQL more powerful and more widely adopted.

The presentation sounds great, but alas, the slidedeck is just a slidedeck. 🙁

I do recommend it for the next to last slide graphic. Very cool!

(And it may be time to take a another look at PostgreSQL as well.)

### Trees in the Database: Advanced Data Structures

Monday, March 5th, 2012

Trees in the Database: Advanced Data Structures

Lorenzo Alberton writes:

Despite the NoSQL movement trying to flag traditional databases as a dying breed, the RDBMS keeps evolving and adding new powerful weapons to its arsenal. In this talk we’ll explore Common Table Expressions (SQL-99) and how SQL handles recursion, breaking the bi-dimensional barriers and paving the way to more complex data structures like trees and graphs, and how we can replicate features from social networks and recommendation systems. We’ll also have a look at window functions (SQL:2003) and the advanced reporting features they make finally possible. The first part of this talk will cover several different techniques to model a tree data structure into a relational database: parent-child (adjacency list) model, materialized path, nested sets, nested intervals, hybrid models, Common Table Expressions. Then we’ll move one step forward and see how we can model a more complex data structure, i.e. a graph, with concrete examples from today’s websites. Starting from real-world examples of social networks’ and recommendation systems’ features, and with the help of some graph theory, this talk will explain how to represent and traverse a graph in the database. Finally, we will take a look at Window Functions and how they can be useful for data analytics and simple inline aggregations, among other things. All the examples have been tested on PostgreSQL >= 8.4.

Very impressive presentation!

Definitely makes me want to dust off my SQL installations and manuals for a closer look!

### Migrating from Oracle to PostgreSQL

Monday, February 20th, 2012

Migrating from Oracle to PostgreSQL by Kevin Kempter

From the post:

This video presents Ora2Pg, a free tool that you can use to migrate an Oracle database to a PostgreSQL compatible schema. Ora2Pg connects your Oracle database, scan it automatically and extracts its structure or data, it then generates SQL scripts that you can load into your PostgreSQL database.

Ora2Pg can be used from reverse engineering Oracle database for database migration or to replicate Oracle data into a PostgreSQL database. The video shows where to download it and talks about the prerequisites. It explains how to install Ora2Pg and configure it. At the end, it presents some examples of ora2pg being used.

Like the man says, useful for migration or replication.

What I wonder about is the day in the not too distant future when “migration” isn’t a meaningful term. Either because the data is too large or dynamic for “migration” to be meaningful. Not to mention the inevitable dangers of corruption during “migration.”

And if you think about it, isn’t the database engine, Oracle or PostgreSQL simply a way to access data already stored? If I want to use a different engine to access the same data, what is the difficulty?

I would much rather design a topic map that queries “Oracle” data in place, either using an Oracle interface or even directly than to “migrate” the data with all the hazards and dangers that brings.

Will be interesting if the “cloud” results in data storage separate from application interfaces. Much like we all use TCP/IP for network traffic, although the packets are put to different purposes by different applications.

### PGStrom (PostgreSQL + GPU)

Tuesday, January 31st, 2012

PGStrom

From the webpage:

PG-Strom is a module of FDW (foreign data wrapper) of PostgreSQL database. It was designed to utilize GPU devices to accelarate sequential scan on massive amount of records with complex qualifiers. Its basic concept is CPU and GPU should focus on the workload with their advantage, and perform concurrently. CPU has much more flexibility, thus, it has advantage on complex stuff such as Disk-I/O, on the other hand, GPU has much more parallelism of numerical calculation, thus, it has advantage on massive but simple stuff such as check of qualifiers for each rows.

The below figure is a basic concept of PG-Strom. Now, on sequential scan workload, vanilla PostgreSQL does iteration of fetch a tuple and checks of qualifiers for each tuples. If we could consign GPU the workload of green portion, it enables to reduce workloads of CPU, thus, it shall be able to load more tuples in advance. Eventually, it should allow to provide shorter response-time on complex queries towards large amount of data.

Requires setting up the table for the GPU ahead of time but performance increase is reported to be 10x – 20x.

It occurs to me that GPUs should be well suited for graph processing. Yes? Will have to look into that and report back.

### Beyond Relational

Monday, December 26th, 2011

Beyond Relational

I originally arrived at this site because of a blog hosted there with lessons on Oracle 10g. Exploring a bit I decided to post about it.

Seems to have fairly broad coverage, from Oracle and PostgreSQL to TSQL and XQuery.

Likely to be a good site for learning cross-overs between systems that you can map for later use.

Suggestions of similar sites?

### SQL Database-as-a-Service

Thursday, December 15th, 2011

SQL Database-as-a-Service

Documentation

Just starting the documentation but two quick thoughts:

First, most conventionally, this could be the back-end to a topic map server. Despite having started off many years ago in server administration (or perhaps because of it), server configuration/management isn’t an “additional” duty for any mission critical development/support staff. Too easy to hire server management services, who are capable of providing maintenance/support that no small firm could afford locally.

Second, a bit more unconventionally, this could be an illustration for a Topic-Map-As-Service. Think about it. If instead of the mish-mash that is Wikipedia, you had a topic maps of facts that were supplemented (read mapped) to records from various public reporting agencies, that could be interesting to “press” a local topic map against to acquire more recent data.

True, there are the public record services but they only give you person by person records and not relationships between them. Not to mention that if you are inventive, you could create some very interesting topic maps (intersections of records).

Imagine the stir that a topic map of license plates of cars with local plates at motels would cause. Rather than offering free access, since most people would only be interested in one license plate in particular, suggest that you show one plate at some random time each hour and where it was seen. (not the date) Sell advertising for the page where you offer the “free” sneak peak. Suspect you better have some load expansion capacity.

### Seven Databases in Seven Weeks now in Beta

Thursday, December 1st, 2011

Seven Databases in Seven Weeks now in Beta

From the webpage:

Redis, Neo4J, Couch, Mongo, HBase, Riak, and Postgres: with each database, you’ll tackle a real-world data problem that highlights the concepts and features that make it shine. You’ll explore the five data models employed by these databases: relational, key/value, columnar, document, and graph. See which kinds of problems are best suited to each, and when to use them.

You’ll learn how MongoDB and CouchDB, both JavaScript powered, document oriented datastores, are strikingly different. Learn about the Dynamo heritage at the heart of Riak and Cassandra. Understand MapReduce and how to use it to solve Big Data problems.

Build clusters of servers using scalable services like Amazon’s Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that’s more than the sum of its parts, or find one that meets all your needs at once.

Seven Databases in Seven Weeks will give you a broad understanding of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.

Now in beta, in non-DRM PDF, epub, and mobi from pragprog.com/book/rwdata.

If you know the Seven Languages in Seven Weeks by Bruce Tate, no further recommendation is necessary for the approach.

I haven’t read the book, yet, but will be getting the electronic beta tonight. More to follow.

### Postgres Plus Connector for Hadoop

Friday, November 11th, 2011

From the webpage:

The Postgres Plus Connector for Hadoop provides developers easy access to massive amounts of SQL data for integration with or analysis in Hadoop processing clusters. Now large amounts of data managed by PostgreSQL or Postgres Plus Advanced Server can be accessed by Hadoop for analysis and manipulation using Map-Reduce constructs.

EnterpriseDB recognized early on that Hadoop, a framework allowing distributed processing of large data sets across computer clusters using a simple programming model, was a valuable and complimentary data processing model to traditional SQL systems. Map-Reduce processing serves important needs for basic processing of extremely large amounts of data and SQL based systems will continue to fulfill their mission critical needs for complex processing of data well into the future. What was missing was an easy way for developers to access and move data between the two environments.

EnterpriseDB has created the Postgres Plus Connector for Hadoop by extending the Pig platform (an engine for executing data flows in parallel on Hadoop) and using an EnterpriseDB JDBC driver to allow users the ability to load the results of a SQL query into Hadoop where developers can operate on that data using familiar Map-Reduce programming. In addition, data from Hadoop can also be moved back into PostgreSQL or Postgres Plus Advanced Server tables.

A private beta is in progress, see the webpage for details and to register.

Plus, there is a webinar, Tuesday, November 29, 2011 11:00 am Eastern Standard Time (New York, GMT-05:00), Extending SQL Analysis with the Postgres Plus Connector for Hadoop. Registration at the webpage as well.

A step towards seamless data environments. Much like word processing now without the “.” commands. Same commands for the most part but unseen. Data is going in that direction. You will specify desired results and environments will take care of access, processor(s), operations and the like. Tables will appear as tables because you have chosen to view them as tables, etc.

### Indexed Nearest Neighbour Search in PostGIS

Thursday, September 29th, 2011

Indexed Nearest Neighbour Search in PostGIS

From the post:

An always popular question on the PostGIS users mailing list has been “how do I find the N nearest things to this point?”.

To date, the answer has generally been quite convoluted, since PostGIS supports bounding box index searches, and in order to get the N nearest things you need a box large enough to capture at least N things. Which means you need to know how big to make your search box, which is not possible in general.

PostgreSQL has the ability to return ordered information where an index exists, but the ability has been restricted to B-Tree indexes until recently. Thanks to one of our clients, we were able to directly fund PostgreSQL developers Oleg Bartunov and Teodor Sigaev in adding the ability to return sorted results from a GiST index. And since PostGIS indexes use GiST, that means that now we can also return sorted results from our indexes.

This feature (the PostGIS side of it) was funded by Vizzuality, and hopefully it comes in useful in their CartoDB work.

You will need PostgreSQL 9.1 and the PostGIS source code from the repository, but this is what a nearest neighbour search looks like:

PostgreSQL? Isn’t that SQL? 🙂

Indexed nearest neighbour search is a question of results, not ideology.

Better targeting through technology.