Archive for the ‘Oracle’ Category

Oracle Identity Manager Sets One Blank Space Password – Functional “Lazy” Hacking?

Wednesday, November 1st, 2017

Oracle Identity Manager – Default User Accounts

From the webpage:


This account is set to a ‘run as’ user for Message Driven Beans (MDBs) executing JMS messages. This account is created during installation and is used internally by Oracle Identity Manager.

The password of this account is set to a single space character in Oracle Identity Manager database to prevent user login through Oracle Identity Manager Design console or Oracle Identity Manager System Administration Console.

Do not change the user name or password of this account.

That’s right! Hit the space bar once and you’ve got it!

What’s more, it’s a default account!

Is this “functional hacking?” Being lazy and waiting for Oracle to hack itself?

Figures Don’t Lie, But Liars Can Figure

Thursday, March 12th, 2015

A pair of posts that you may find amusing on the question of “free” and “cheaper.”

HBase is Free but Oracle NoSQL Database is cheaper

When does “free” challenge that old adage, “You get what you pay for”?

Two brief quotes from the first post set the stage:

How can Oracle NoSQL Database be cheaper than “free”? There’s got to be a catch. And of course there is, but it’s not where you are expecting. The problem in that statement isn’t with “cheaper” it’s with “free”.

An HBase solution isn’t really free because you do need hardware to run your software. And when you need to scale out, you have to look at how well the software scales. Oracle NoSQL Database scales much better than HBase which translated in this case to needing much less hardware. So, yes, it was cheaper than free. Just be careful when somebody says software is free.

The second post tries to remove the vendor (Oracle) from the equation:

Read-em and weep …. NOT according to Oracle, HBase does not take advantage of SSD’s anywhere near the extent with which Oracle NoSQL does … couldn’t even use the same scale on the vertical bar.

SanDisk on HBase with SSD

SanDisk on Oracle NoSQL with SSD

And so the question remains, when does “free” challenge the old adage “you get what you pay for”, because in this case, the adage continues to hold up.

And as the second post notes, Oracle has committed code back to the HBase product so it isn’t unfamiliar to them.

First things first: the difficulty that leads to these spats is using “cheap,” “free,” “scalable,” “NoSQL,” etc. as the basis for IT marketing or decision making. That may work with poorer IT decision makers, but however happy it makes the marketing department, it is just noise. Noise that is a disservice to IT consumers.

Take “cheaper” and “free” as used in these posts. Is hardware really the only cost associated with HBase or Oracle installations? If it is, I have been severely misled.

On the HBase expense side I would expect to find HBase DBAs, maintenance of those personnel, hardware (+maintenance), programmers, along with use case requirements that must be met.

On the Oracle expense side I would expect to find Oracle DBAs, maintenance of those personnel, Oracle software licensing, hardware (+maintenance), programmers, along with use case requirements that must be met.

Before you jump on my listing “Oracle software licensing,” consider how that will impact the availability of appropriate personnel, the amount of training needed to introduce new IT staff to HBase, etc.

Not to come down too hard on Oracle’s side: Oracle DBAs and their maintenance aren’t cheap, nor are some of the “features” of Oracle software.
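
The expense lists above can be turned into a rough comparison. What follows is a minimal sketch in Python, not a costing method taken from either post. The class name, field names and every figure are hypothetical placeholders, there only to show that hardware is one line item among several.

```python
from dataclasses import dataclass


@dataclass
class YearlyCosts:
    """Cost categories from the lists above. All figures are placeholders."""
    dbas: float         # DBAs and the maintenance of those personnel
    licensing: float    # software licensing (zero for "free" software)
    hardware: float     # hardware plus its maintenance
    programmers: float  # developers building against the store

    def total(self) -> float:
        return self.dbas + self.licensing + self.hardware + self.programmers


# Placeholder inputs only: substitute figures from your own shop.
option_a = YearlyCosts(dbas=300.0, licensing=0.0, hardware=500.0, programmers=400.0)
option_b = YearlyCosts(dbas=350.0, licensing=200.0, hardware=250.0, programmers=400.0)

for name, costs in (("option A", option_a), ("option B", option_b)):
    hardware_share = costs.hardware / costs.total()
    print(f"{name}: total={costs.total():.0f}, hardware share={hardware_share:.0%}")
```

With these invented inputs, halving the hardware bill does not change the total, which is why a hardware-only comparison says little on its own.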

Truth be told, there is a role for project requirements, experience of current IT personnel, influence IT has over the decision makers, and personal friendships of decision makers in any IT decision making.

To be very blunt, IT decision making is just as political as any other enterprise decision.

Numbers are a justification for a course chosen for other reasons. As a user I am always more concerned with my use cases being met than numbers. Aren’t you?

Introducing Espresso – LinkedIn’s hot new distributed document store

Sunday, January 25th, 2015

Introducing Espresso – LinkedIn’s hot new distributed document store by Aditya Auradkar.

From the post:

Espresso is LinkedIn’s online, distributed, fault-tolerant NoSQL database that currently powers approximately 30 LinkedIn applications including Member Profile, InMail (LinkedIn’s member-to-member messaging system), portions of the Homepage and mobile applications, etc. Espresso has a large production footprint at LinkedIn with over a dozen clusters in use. It hosts some of the most heavily accessed and valuable datasets at LinkedIn serving millions of records per second at peak. It is the source of truth for hundreds of terabytes (not counting replicas) of data.


To meet the needs of online applications, LinkedIn traditionally used Relational Database Management Systems (RDBMSs) such as Oracle and key-value stores such as Voldemort – both serving different use cases. Much of LinkedIn requires a primary, strongly consistent, read/write data store that generates a timeline-consistent change capture stream to fulfill nearline and offline processing requirements. It has become apparent that many, if not most, of the primary data requirements of LinkedIn do not require the full functionality of monolithic RDBMSs, nor can they justify the associated costs.


A must read if you are concerned with BigData and/or distributed systems.

A refreshing focus on requirements, as opposed to engineering by slogan, “all the world’s a graph.”

Looking forward to more details on Espresso as they emerge.

I first saw this in a tweet by Martin Kleppmann.

Larry Ellison as Pinocchio?

Tuesday, June 10th, 2014

Hazelcast Enterprise: A Direct Challenge to Larry Ellison’s Vision for In-Memory Computing

You might not have sipped so much Oracle kool-aid today if Larry Ellison were in fact Pinocchio.

I don’t know about you but seeing Ellison balancing his nose on the end of a wheelbarrow would make me think carefully about any statements he made.


How Neo4j beat Oracle Database

Sunday, February 10th, 2013

Neo Technology execs: How Neo4j beat Oracle Database by Paul Krill.

From the post:

Neo Technology, which was formed in 2007, offers Neo4J, a Java-based open source NoSQL graph database. With a graph database, which can search social network data, connections between data are explored. Neo4j can solve problems that require repeated network probing (the database is filled with nodes, which are then linked), and the company stresses Neo4j’s high performance. InfoWorld Editor at Large Paul Krill recently talked with Neo CEO Emil Eifrem and Philip Rathle, Neo senior director of products, about the importance of graph database technology as well as Neo4j’s potential in the mobile space. Eifrem also stressed his confidence in Java, despite recent security issues affecting the platform.

InfoWorld: Graph database technology is not the same as NoSQL, is it?

Eifrem: NoSQL is actually four different types of databases: There’s key value stores, like Amazon DynamoDB, for example. There’s column-family stores like Cassandra. There’s document databases like MongoDB. And then there’s graph databases like Neo4j. There are actually four pillars of NoSQL, and graph databases is one of them. Cisco is building a master data management system based on Neo4j, and this is actually our first Fortune 500 customer. They found us about two years ago when they tried to build this big, complex hierarchy inside of Oracle RAC. In Oracle RAC, they had response time in minutes, and then when they replaced it [with] Neo4j, they had response times in milliseconds. (emphasis added)

It is a great story and one I would repeat if I were marketing Neo4j (which I like a lot).

However, there are a couple of bits missing from the story that would make it more informative.

Such as what “…big, complex hierarchy…” was Cisco trying to build? Details please.

There are things that relational databases don’t do well.

Not realizing that up front is a design failure, not one of software or of relational databases.

Another question I would ask: What percentage of Cisco databases are relational vs. graph?

Fewer claims/stories and more data would go a long way towards informed IT decision making.

Oracle’s MySQL 5.6 released

Wednesday, February 6th, 2013

Oracle’s MySQL 5.6 released

From the post:

Just over two years after the release of MySQL 5.5, the developers at Oracle have released a GA (General Availability) version of Oracle MySQL 5.6, labelled MySQL 5.6.10. In MySQL 5.5, the developers replaced the old MyISAM backend and used the transactional InnoDB as the default for database tables. With 5.6, the retrofitting of full-text search capabilities has enabled InnoDB to now take on the position of default storage engine for all purposes.

Accelerating the performance of sub-queries was also a focus of development; they are now run using a process of semi-joins and materialise much faster; this means it should not be necessary to replace subqueries with joins. Many operations that change the data structures, such as ALTER TABLE, are now performed online, which avoids long downtimes. EXPLAIN also gives information about the execution plans of UPDATE, DELETE and INSERT commands. Other optimisations of queries include changes which can eliminate table scans where the query has a small LIMIT value.

MySQL’s row-oriented replication now supports “row image control” which only logs the columns needed to identify and make changes on each row rather than all the columns in the changing row. This could be particularly expensive if the row contained BLOBs, so this change not only saves disk space and other resources but it can also increase performance. “Index Condition Pushdown” is a new optimisation which, when resolving a query, attempts to use indexed fields in the query first, before applying the rest of the WHERE condition.

MySQL 5.6 also introduces a “NoSQL interface” which uses the memcached API to offer applications direct access to the InnoDB storage engine while maintaining compatibility with the relational database engine. That underlying InnoDB engine has also been enhanced with persistent optimisation statistics, multithreaded purging and more system tables and monitoring data available.

Download MySQL 5.6.

I mentioned Oracle earlier today (When Oracle bought MySQL [Humor]) so it’s only fair that I point out their most recent release of MySQL.
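
The “row image control” idea in the quoted post can be illustrated in a few lines. This is a minimal sketch in Python of the principle only (log the key columns needed to find the row, plus the columns that changed). It is not MySQL’s binary log format, and the function name and sample rows are hypothetical.

```python
def minimal_row_image(before, after, key_columns):
    """Return (before_image, after_image) holding only what a replica needs.

    before_image: the key columns, enough to identify the row.
    after_image: only the columns whose values actually changed.
    """
    before_image = {col: before[col] for col in key_columns}
    after_image = {col: val for col, val in after.items() if before.get(col) != val}
    return before_image, after_image


# Hypothetical row with a large BLOB-like column that does not change.
old_row = {"id": 7, "name": "report", "body": b"x" * 1_000_000}
new_row = {"id": 7, "name": "report-final", "body": old_row["body"]}

before_image, after_image = minimal_row_image(old_row, new_row, key_columns=["id"])
print(before_image)
print(after_image)
```

The unchanged megabyte in `body` never reaches the log entry, which is the saving the post describes for rows containing BLOBs.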

Online Education- MongoDB and Oracle R Enterprise

Wednesday, October 24th, 2012

Online Education- MongoDB and Oracle R Enterprise by Ajay Ohri.

Ajay brings news of two MongoDB online courses, one for developers and one for DBAs, and an Oracle offering on R.

The MongoDB classes started Monday (22nd of October) so you had better hurry to register.

Applying Parallel Prediction to Big Data

Saturday, October 6th, 2012

Applying Parallel Prediction to Big Data by Dan McClary (Principal Product Manager for Big Data and Hadoop at Oracle).

From the post:

One of the constants in discussions around Big Data is the desire for richer analytics and models. However, for those who don’t have a deep background in statistics or machine learning, it can be difficult to know not only just what techniques to apply, but on what data to apply them. Moreover, how can we leverage the power of Apache Hadoop to effectively operationalize the model-building process? In this post we’re going to take a look at a simple approach for applying well-known machine learning approaches to our big datasets. We’ll use Pig and Hadoop to quickly parallelize a standalone machine-learning program written in Jython.

Playing Weatherman

I’d like to predict the weather. Heck, we all would – there’s personal and business value in knowing the likelihood of sun, rain, or snow. Do I need an umbrella? Can I sell more umbrellas? Better yet, groups like the National Climatic Data Center offer public access to weather data stretching back to the 1930s. I’ve got a question I want to answer and some big data with which to do it. On first reaction, because I want to do machine learning on data stored in HDFS, I might be tempted to reach for a massively scalable machine learning library like Mahout.

For the problem at hand, that may be overkill and we can get it solved in an easier way, without understanding Mahout. Something becomes apparent on thinking about the problem: I don’t want my climate model for San Francisco to include the weather data from Providence, RI. Weather is a local problem and we want to model it locally. Therefore what we need is many models across different subsets of data. For the purpose of example, I’d like to model the weather on a state-by-state basis. But if I have to build 50 models sequentially, tomorrow’s weather will have happened before I’ve got a national forecast. Fortunately, this is an area where Pig shines.
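
The “many models across different subsets of data” idea can be sketched without Pig or Hadoop. The Python below is an assumption-laden stand-in, not Dan’s Pig/Jython code: the readings are invented, the “model” is just a mean, and a thread pool plays the part of the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical (state, daily high temperature) readings, invented for the sketch.
readings = [
    ("CA", 18.0), ("CA", 20.0), ("CA", 22.0),
    ("RI", 5.0), ("RI", 7.0),
]


def fit_model(temps):
    """Stand-in 'model': predict tomorrow as the mean of past readings."""
    return mean(temps)


def models_by_state(rows):
    # Group readings by state so each state gets its own local model.
    groups = defaultdict(list)
    for state, temp in rows:
        groups[state].append(temp)
    # The groups are independent, so the per-state fits can run in parallel.
    with ThreadPoolExecutor() as pool:
        fitted = pool.map(fit_model, groups.values())
        return dict(zip(groups.keys(), fitted))


models = models_by_state(readings)
print(models)
```

The point carries over to the real thing: the grouping step decides what data each model sees, and the parallelism is only there so fifty models do not have to wait on each other.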

Two quick observations:

First, Dan makes my point about your needing the “right” data, which may or may not be the same thing as “big data.” Decide what you want to do before you reach for big iron and data.

Second, I never hear references to the “weatherman” without remembering: “you don’t need to be a weatherman to know which way the wind blows.” (link to the manifesto) If you prefer a softer version, Subterranean Homesick Blues by Bob Dylan.

Amazon RDS for Oracle Database – Now Starting at $30/Month

Saturday, September 29th, 2012

Amazon RDS for Oracle Database – Now Starting at $30/Month by Jeff Barr.

From the post:

You can now create Amazon RDS database instances running Oracle Database on Micro instances.

This new option will allow you to build, test, and run your low-traffic database-backed applications at a cost starting at $30 per month ($0.04 per hour) using the License Included option. If you have a more intensive application, the micro instance enables you to get hands on experience with Amazon RDS before you scale up to a larger instance size. You can purchase Reserved Instances in order to further lower your effective hourly rate.

These instances are available now in all AWS Regions. You can learn more about using Amazon RDS for managing Oracle database instances by attending this webinar.
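
As a quick check on the quoted pricing, the hourly rate can be multiplied out to a month. A minimal sketch in Python, assuming the instance runs around the clock. Only the $0.04 rate comes from the post; the function name is hypothetical.

```python
HOURLY_RATE = 0.04   # dollars per hour, from the quoted post
HOURS_PER_DAY = 24


def monthly_cost(days_in_month: int) -> float:
    """Cost of running one instance continuously for a month, in dollars."""
    return round(HOURLY_RATE * HOURS_PER_DAY * days_in_month, 2)


print(monthly_cost(30))
print(monthly_cost(31))
```

Both results land a little under $30, in line with the “starting at $30 per month” headline.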

Oracle databases aren’t for the faint of heart but they are everywhere in enterprise settings.

If you are or aspire to be working with enterprise information systems, the more you know about Oracle databases the more valuable you become.

To your employer and your clients.

Oracle ADF Core Functionality Now Available for Free…

Monday, September 24th, 2012

Oracle ADF Core Functionality Now Available for Free – Presenting Oracle ADF Essentials by Shay Shmeltzer.

From the post:

We are happy to announce the new Oracle ADF Essentials – a free to develop and deploy version of the core technologies at the base of Oracle ADF – Oracle’s strategic development framework that was used, among other things, to build the new generation of the enterprise Oracle Fusion Applications.

This release is aligned with the new Oracle JDeveloper version that we released today.

Oracle ADF Essentials enables developers to use the following free:

  • Oracle ADF Faces Rich Client components – over 150 JSF 2.0 components that include extensive charting and data visualization components, supports skinning, internationalization, accessibility and touch gestures and providing advanced Ajax, windowing, drag and drop and other UI capabilities in a declarative way.
  • Oracle ADF Controller – an extension on top of the JSF controller providing complete process flow definition and enabling advanced reusability of flows inside page’s regions.
  • Oracle ADF Binding – a declarative way to bind various business services to JSF user interfaces eliminating tedious managed-beans coding.
  • Oracle ADF Business Components – a declarative layer for building Java based business services on top of relational databases.

The lesson here is to give away tools for people to write the interfaces to products you are interested in selling. Particularly if interfaces aren’t in your product line.

Like applying topic maps to relational database content. Just as an example.

I first saw this at DZone.


Friday, September 14th, 2012

JMyETL, an easy to use ETL tool that supports 10 different RDBMS by Esen Sagynov.

From the post:

JMyETL is a very useful and simple Java based application for Windows OS which allows users to import and export data from/to various database systems. For example:

  • CUBRID –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, Access, SQLite
  • MySQL –> Sybase ASE/ASA, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Sybase ASE –> Sybase ASA, MySQL, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Sybase ASA –> Sybase ASE, MySQL, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Oracle –> Sybase ASA, Sybase ASE, MySQL, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Access –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • PostgreSQL –> Sybase ASE, Sybase ASA, MySQL, Oracle, Access, SQL Server, DB2, SQLite, CUBRID
  • SQL Server –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, Access, DB2, SQLite, CUBRID
  • DB2 –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, Access, SQLite, CUBRID
  • SQLite –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, Access, CUBRID

Just in case you need a database to database ETL utility.

I first saw this at DZone.

DBMS_COMPARISON Package [Oracle]

Monday, September 3rd, 2012


Mahmoud A. El-Sayed introduces the Oracle 11g package, DBMS_COMPARISON, which compares database objects, thus:

The DBMS_COMPARISON package can compare the following types of database objects:
    a- Tables
    b- Single-table views
    c- Materialized views
    d- Synonyms for tables, single-table views, and materialized views

The DBMS_COMPARISON package cannot compare data in columns of the following data types:
    a- LONG
    b- LONG RAW
    c- ROWID
    d- UROWID
    e- CLOB
    f- NCLOB
    g- BLOB
    h- BFILE
    i- User-defined types (including object types, REFs, varrays, and nested tables)
    j- Oracle-supplied types (including any types, XML types, spatial types, and media types)
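
One practical use of the exclusion list above is to check a table’s columns before attempting a comparison. The Python below is a minimal sketch of that check, not part of the DBMS_COMPARISON package. The function name and the sample table are hypothetical, and user-defined and Oracle-supplied types are left out because they cannot be recognized by name alone.

```python
# Data types the quoted post lists as not comparable by DBMS_COMPARISON.
UNSUPPORTED_TYPES = {
    "LONG", "LONG RAW", "ROWID", "UROWID", "CLOB", "NCLOB", "BLOB", "BFILE",
}


def split_columns(columns):
    """Split {column_name: data_type} into (comparable, excluded) name lists."""
    comparable, excluded = [], []
    for name, data_type in columns.items():
        if data_type.upper() in UNSUPPORTED_TYPES:
            excluded.append(name)
        else:
            comparable.append(name)
    return comparable, excluded


# Hypothetical table definition, invented for the example.
orders = {"ORDER_ID": "NUMBER", "CUSTOMER": "VARCHAR2", "NOTES": "CLOB", "SCAN": "BLOB"}
comparable, excluded = split_columns(orders)
print(comparable)
print(excluded)
```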

You may also be interested in the Oracle documentation on DBMS_COMPARISON – Oracle.

Merging presumes some comparison step so I commend this to you if you are in an Oracle environment.

Thoughts on the data type exclusions from comparison? Documentation says they are excluded but I didn’t see any hint at a reason for the exclusion.

I first saw this at DZone.

Where to get a BZR tree of the latest MySQL releases

Friday, August 17th, 2012

Where to get a BZR tree of the latest MySQL releases by Stewart Smith.

Sometimes, being difficult can develop into its own reward. Not always appreciated when it arrives but a reward nonetheless.

Using Oracle Full-Text Search in Entity Framework

Wednesday, July 25th, 2012

Using Oracle Full-Text Search in Entity Framework

From the post:

Oracle database supports an advanced functionality of full-text search (FTS) called Oracle Text, which is described comprehensively in the documentation:

We decided to meet the needs of our users willing to take advantage of the full-text search in Entity Framework and implemented the basic Oracle Text functionality in our Devart dotConnect for Oracle ADO.NET Entity Framework provider.

Just in case you run across a client using Oracle to store text data. 😉

I first saw this at Beyond Search (As Stephen implies, it is not a resource for casual data miners.)

Big Game Hunting in the Database Jungle

Thursday, May 17th, 2012

If all these new DBMS technologies are so scalable, why are Oracle and DB2 still on top of TPC-C? A roadmap to end their dominance.

Alexander Thomson and Daniel Abadi write:

In the last decade, database technology has arguably progressed furthest along the scalability dimension. There have been hundreds of research papers, dozens of open-source projects, and numerous startups attempting to improve the scalability of database technology. Many of these new technologies have been extremely influential—some papers have earned thousands of citations, and some new systems have been deployed by thousands of enterprises.

So let’s ask a simple question: If all these new technologies are so scalable, why on earth are Oracle and DB2 still on top of the TPC-C standings? Go to the TPC-C Website with the top 10 results in raw transactions per second. As of today (May 16th, 2012), Oracle 11g is used for 3 of the results (including the top result), 10g is used for 2 of the results, and the rest of the top 10 is filled with various versions of DB2. How is technology designed decades ago still dominating TPC-C? What happened to all these new technologies with all these scalability claims?

The surprising truth is that these new DBMS technologies are not listed in the TPC-C top ten results not because they do not care enough to enter, but rather because they would not win if they did.

Preview of a paper that Alex is presenting at SIGMOD next week. Introducing “Calvin,” a new approach to database processing.

So where does Calvin fall in the OldSQL/NewSQL/NoSQL trichotomy?

Actually, nowhere. Calvin is not a database system itself, but rather a transaction scheduling and replication coordination service. We designed the system to integrate with any data storage layer, relational or otherwise. Calvin allows user transaction code to access the data layer freely, using any data access language or interface supported by the underlying storage engine (so long as Calvin can observe which records user transactions access).

What I find exciting about this report (and the paper) is the re-thinking of current assumptions concerning data processing. May be successful or may not be. But the exciting part is the attempt to transcend decades of acceptance of the maxims of our forefathers.

BTW, Calvin is reported to support 500,000 transactions a second.

Big game hunting anyone?*

* I don’t mean that as an expression of preference for or against Oracle.

I suspect Calvin will be a wake-up call to R&D at Oracle to redouble their own efforts at groundbreaking innovation.

Breakthroughs in matching up multi-dimensional indexes would be attractive to users who need to match up disparate data sources.

Speed is great but a useful purpose attracts customers.

Advanced Analytics with R and SAP HANA

Saturday, March 17th, 2012

Advanced Analytics with R and SAP HANA. Slides by Jitender Aswani and Jens Doerpmund.

Ajay Ohri reports that SAP is following Oracle in using R. And we have all heard about Hadoop and R.

Question: What language for analytics are you going to start learning for Oracle, SAP and Hadoop? (To say nothing of mining/analysis for topic maps.)

Migrating from Oracle to PostgreSQL

Monday, February 20th, 2012

Migrating from Oracle to PostgreSQL by Kevin Kempter

From the post:

This video presents Ora2Pg, a free tool that you can use to migrate an Oracle database to a PostgreSQL compatible schema. Ora2Pg connects to your Oracle database, scans it automatically and extracts its structure or data; it then generates SQL scripts that you can load into your PostgreSQL database.

Ora2Pg can be used for reverse engineering an Oracle database, for database migration, or to replicate Oracle data into a PostgreSQL database. The video shows where to download it and talks about the prerequisites. It explains how to install Ora2Pg and configure it. At the end, it presents some examples of ora2pg being used.
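
The “extract the structure, then generate SQL scripts” step described above can be pictured with a toy generator. This Python sketch is not Ora2Pg’s code or configuration. The type mapping is a deliberately small, hypothetical table, and the function name and sample columns are invented for illustration.

```python
# Hypothetical type mapping, chosen for illustration; a real migration
# tool handles many more cases (precision, scale, defaults, constraints).
TYPE_MAP = {
    "VARCHAR2": "varchar",
    "NUMBER": "numeric",
    "DATE": "timestamp",
    "CLOB": "text",
    "BLOB": "bytea",
}


def create_table_sql(table, columns):
    """Build a PostgreSQL CREATE TABLE statement from (name, oracle_type) pairs."""
    lines = []
    for name, oracle_type in columns:
        pg_type = TYPE_MAP.get(oracle_type.upper())
        if pg_type is None:
            # Refuse to guess: an unmapped type needs a human decision.
            raise ValueError(f"no mapping for Oracle type {oracle_type!r}")
        lines.append(f"    {name.lower()} {pg_type}")
    return f"CREATE TABLE {table.lower()} (\n" + ",\n".join(lines) + "\n);"


# Hypothetical source table.
print(create_table_sql("EMPLOYEES", [("ID", "NUMBER"), ("NAME", "VARCHAR2"), ("HIRED", "DATE")]))
```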

Like the man says, useful for migration or replication.

What I wonder about is the day in the not too distant future when “migration” isn’t a meaningful term, because the data is too large or too dynamic to be moved. Not to mention the inevitable dangers of corruption during “migration.”

And if you think about it, isn’t the database engine, Oracle or PostgreSQL, simply a way to access data already stored? If I want to use a different engine to access the same data, what is the difficulty?

I would much rather design a topic map that queries “Oracle” data in place, either using an Oracle interface or even directly, than “migrate” the data with all the hazards and dangers that brings.

Will be interesting if the “cloud” results in data storage separate from application interfaces. Much like we all use TCP/IP for network traffic, although the packets are put to different purposes by different applications.

Oracle Announces General Availability of MySQL Cluster 7.2

Friday, February 17th, 2012

Oracle Announces General Availability of MySQL Cluster 7.2

Another demonstration that high quality open source projects are not inconsistent with commercial products.

From the post:

Delivers up to 70x More Performance for Complex Queries; Adds New NoSQL Memcached Interface

News Facts

  • Continuing to drive MySQL innovation, Oracle today announced the general availability of MySQL Cluster 7.2.
  • For highly demanding Web-based and communications products and services, MySQL Cluster is designed to cost-effectively deliver 99.999% availability, high write scalability and very low latency.
  • With SQL and NoSQL access through a new Memcached API, MySQL Cluster represents a “best of both worlds” solution allowing key value operations and complex SQL queries within the same database.
  • With MySQL Cluster 7.2, users can also gain up to a 70x increase in performance on complex queries, and enhanced multi-data center scalability.
  • MySQL Cluster 7.2 is also certified with Oracle VM. The combination of its elastic, on-demand scalability and self-healing features, together with Oracle VM support, makes MySQL Cluster an ideal choice for deployments in the cloud.
  • Also generally available today is the latest release of the MySQL Cluster Manager, version 1.1.4, further improving the ease of use and administration of MySQL Cluster.
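
The “best of both worlds” claim, key value operations and SQL queries against the same data, is easy to picture with a stand-in. The Python sketch below uses the standard library’s sqlite3 module, so it is not MySQL Cluster and not its Memcached API. The table, keys and helper names are all hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in here; this is not MySQL Cluster or its Memcached API.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (k TEXT PRIMARY KEY, owner TEXT, hits INTEGER)")


def kv_set(key, owner, hits):
    """Key-value style write: one row addressed by its primary key."""
    db.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)", (key, owner, hits))


def kv_get(key):
    """Key-value style read: fetch one row by primary key, or None."""
    return db.execute("SELECT owner, hits FROM sessions WHERE k = ?", (key,)).fetchone()


kv_set("s1", "alice", 3)
kv_set("s2", "bob", 5)
kv_set("s3", "alice", 4)

# The same rows remain available to ordinary SQL.
totals = db.execute(
    "SELECT owner, SUM(hits) FROM sessions GROUP BY owner ORDER BY owner"
).fetchall()
print(kv_get("s2"))
print(totals)
```

The design point is that both access paths hit one store, so there is no second copy of the data to keep in sync.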

Oracle: “Open Source isn’t all that weird” (Cloudera)

Tuesday, January 10th, 2012

OK, maybe that’s not an exact word-for-word quotation. 😉

Oracle selects CDH and Cloudera Manager as the Apache Hadoop Platform for the Oracle Big Data Appliance

Ed Albanese (Ed leads business development for Cloudera. He is responsible for identifying new markets, revenue opportunities and strategic alliances for the company.) writes:

Summary: Oracle has selected Cloudera’s Distribution Including Apache Hadoop (CDH) and Cloudera Manager software as core technologies on the Oracle Big Data Appliance, a high performance “engineered system.” Oracle and Cloudera announced a multiyear agreement to provide CDH, Cloudera Manager, and support services in conjunction with Oracle Support for use on the Oracle Big Data Appliance.

Announced at Oracle Open World in October 2011, the Big Data Appliance was received with significant market interest. Oracle reported then that it would be released in the first half of 2012. Just 10 days into that period, Oracle has announced that the Big Data Appliance is available immediately.

The product itself is noteworthy. Oracle has combined Oracle hardware and software innovations with Cloudera technology to deliver what it calls an “engineered system.” Oracle has created several such systems over the past few years, including the Exadata, Exalogic, and Exalytics products. The Big Data Appliance combines Apache Hadoop with a purpose-built hardware platform and software that includes platform components such as Linux and Java, as well as data management technologies such as the Oracle NoSql database and Oracle integration software.

Read the post to get Ed’s take on what this will mean for both Cloudera and Oracle customers (positive).

I’m glad for Cloudera but also take this as validation of the overall Hadoop ecosystem. Not that it is appropriate for every application but where it is, it deserves serious consideration.

Beyond Relational

Monday, December 26th, 2011

Beyond Relational

I originally arrived at this site because of a blog hosted there with lessons on Oracle 10g. Exploring a bit I decided to post about it.

Seems to have fairly broad coverage, from Oracle and PostgreSQL to TSQL and XQuery.

Likely to be a good site for learning cross-overs between systems that you can map for later use.

Suggestions of similar sites?

Topic Maps & Oracle: A Smoking Gun

Saturday, December 24th, 2011

Using Similarity-based Operations for Resolving Data-Level Conflicts (2003)


Dealing with discrepancies in data is still a big challenge in data integration systems. The problem occurs both during eliminating duplicates from semantic overlapping sources as well as during combining complementary data from different sources. Though using SQL operations like grouping and join seems to be a viable way, they fail if the attribute values of the potential duplicates or related tuples are not equal but only similar by certain criteria. As a solution to this problem, we present in this paper similarity-based variants of grouping and join operators. The extended grouping operator produces groups of similar tuples, the extended join combines tuples satisfying a given similarity condition. We describe the semantics of these operators, discuss efficient implementations for the edit distance similarity and present evaluation results. Finally, we give examples how the operators can be used in given application scenarios.

No, the title of the post is not a mistake.

The authors of this paper, in 2003, conclude:

In this paper we presented database operators for finding related data and identifying duplicates based on user-specific similarity criteria. The main application area of our work is the integration of heterogeneous data where the likelihood of occurrence of data objects representing related or the same real-world objects though containing discrepant values is rather high. Intended as an extended grouping operation and by combining it with aggregation functions for merging/reconciling groups of conflicting values our grouping operator fits well into the relational algebra framework and the SQL query processing model. In a similar way, an extended join operator takes similarity predicates used for both operators into consideration. These operators can be utilized in ad-hoc queries as part of more complex data integration and cleaning tasks.
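
To make the two operators concrete, here is a minimal sketch in Python of grouping and joining by edit distance. It is an illustration under stated assumptions, not the authors’ implementation: the greedy grouping strategy, the function names, the threshold and the sample names are all hypothetical, and the paper’s own operator semantics may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity_join(left, right, max_distance):
    """Pair values from two sources whose edit distance is within the threshold."""
    return [(l, r) for l in left for r in right if edit_distance(l, r) <= max_distance]


def similarity_group(values, max_distance):
    """Greedy grouping: each value joins the first group whose
    representative (its first member) is within the threshold."""
    groups = []
    for value in values:
        for group in groups:
            if edit_distance(group[0], value) <= max_distance:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


names = ["Jon Smith", "John Smith", "Jane Doe", "Jane Do"]
print(similarity_group(names, max_distance=1))
print(similarity_join(["John Smith"], ["Jon Smith", "Jane Doe"], max_distance=1))
```

An equality-based GROUP BY or join would treat all four names as distinct, which is exactly the failure the abstract describes.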

In addition to a theoretical background, the authors illustrate an implementation of their techniques, using Oracle 8i. (Oracle 11g is the current version.)

Don’t despair! 😉

Leaves a lot to be done, including:

  • Interchange between relational database stores
  • Semantic integration in non-relational database stores
  • Interchange in mixed relational/non-relational environments
  • Identifying bases for semantic integration in particular data sets (the tough nut)
  • Others? (your comments can extend this list)

The good news for topic maps is that Oracle has some name recognition in IT contexts. 😉

There is a world of difference between a CIO saying to the CEO:

“That was a great presentation about how we can use our data more effectively with topic maps and some software, what did he say the name was?”

and


“That was a great presentation about using our Oracle database more effectively!”


Big iron for your practice of topic maps. A present for your holiday tradition.

Aside to Matt O’Donnell. Yes, I am going to be covering actual examples of using these operators for topic map purposes.

Right now I am sifting through a 400 document collection on “multi-dimensional indexing” where I discovered this article. Remind me to look at other databases/indexers with similar operators.

Toad Virtual Expo – 11.11.11 – 24-hour Toad Event

Tuesday, November 8th, 2011

Toad Virtual Expo – 11.11.11 – 24-hour Toad Event

From the website:

24 hours of Toad is here! Join us on 11.11.11, and take an around the world journey with Toad and database experts who will share database development and administration best practices. This is your chance to see new products and new features in action, virtually collaborate with other users – and Quest’s own experts, and get a first-hand look at what’s coming in the world of Toad.

If you are not going to see the Immortals on 11.11.11, or are looking for something to do after the movie, drop in on the Toad Virtual Expo! 😉 (It doesn’t look like a “chick” movie anyway.)


Register today for Quest Software’s 24-hour Toad Virtual Expo and learn why the best just got better.

  1. Tokyo Friday, November 11, 2011 6:00 a.m. JST – Saturday, November 12, 2011 6:00 a.m. JST
  2. Sydney Friday, November 11, 2011 8:00 a.m. EDT – Saturday, November 12, 2011 8:00 a.m. EDT
  3. Tel Aviv Thursday, November 10, 2011 11:00 p.m. IST – Friday, November 11, 2011 11:00 p.m. IST
  4. Central Europe Thursday, November 10, 2011 10:00 p.m. CET – Friday, November 11, 2011 10:00 p.m. CET
  5. London Thursday, November 10, 2011 9:00 p.m. GMT – Friday, November 11, 2011 9:00 p.m. GMT
  6. New York Thursday, November 10, 2011 4:00 p.m. EST – Friday, November 11, 2011 4:00 p.m. EST
  7. Los Angeles Thursday, November 10, 2011 1:00 p.m. PST – Friday, November 11, 2011 1:00 p.m. PST

The site wasn’t long on specifics but this could be fun!

Toad for Cloud Databases (Quest Software)

Tuesday, November 8th, 2011

Toad for Cloud Databases (Quest Software)

From the news release:

The data management industry is experiencing more disruption than at any other time in more than 20 years. Technologies around cloud, Hadoop and NoSQL are changing the way people manage and analyze data, but the general lack of skill sets required to manage these new technologies continues to be a significant barrier to mainstream adoption. IT departments are left without a clear understanding of whether development and DBA teams, whose expertise lies with traditional technology platforms, can effectively support these new systems. Toad® for Cloud Databases addresses the skill-set shortage head-on, empowering database professionals to directly apply their existing skills to emerging Big Data systems through an easy-to-use and familiar SQL-based interface for managing non-relational data. 

News Facts:

  • Toad for Cloud Databases is now available as a fully functional, commercial-grade product, for free. Toad for Cloud Databases enables users to generate queries, migrate, browse, and edit data, as well as create reports and tables in a familiar SQL view. By simplifying these tasks, Toad for Cloud Databases opens the door to a wider audience of developers, allowing more IT teams to experience the productivity gains and cost benefits of NoSQL and Big Data.
  • Quest first released Toad for Cloud Databases into beta in June 2010, making the company one of the first to provide a SQL-based database management tool to support emerging, non-relational platforms. Over the past 18 months, Quest has continued to drive innovation for the product, growing its list of supported platforms and integrating a UI for its bi-directional data connector between Oracle and Hadoop.
  • Quest’s connector between Oracle and Hadoop, available within Toad for Cloud Databases, delivers a fast and scalable method for data transfer between Oracle and Hadoop in both directions. The bidirectional characteristic of the utility enables organizations to take advantage of Hadoop’s lower cost of storage and analytical capabilities. Quest also contributed the connector to the Apache Hadoop project as an extension to the existing SQOOP framework, and it is also available as part of Cloudera’s Distribution Including Apache Hadoop.
  • Toad for Cloud Databases today supports:
    • Apache Hive
    • Apache HBase
    • Apache Cassandra
    • MongoDB
    • Amazon SimpleDB
    • Microsoft Azure Table Services
    • Microsoft SQL Azure, and
    • All Open Database Connectivity (ODBC)-enabled relational databases (Oracle, SQL Server, MySQL, DB2, etc.)


Anything that eases the transition to cloud computing is going to be welcome. Toad being free will increase the ranks of DBAs who will at least experiment on their own.

Oracle Releases NoSQL Database

Wednesday, October 26th, 2011

Oracle Releases NoSQL Database by Leila Meyer.

From the post:

Oracle has released Oracle NoSQL Database 11g, the company’s new entry into the NoSQL database market. Oracle NoSQL Database is a distributed, highly scalable, key-value database that uses the Oracle Berkeley Database Java Edition as its underlying storage system. Developed as a key component of the Oracle Big Data Appliance that was unveiled Oct. 3, Oracle NoSQL Database is available now as a standalone product.

(see the post for the list of features and other details)

Oracle NoSQL Database will be available in a Community Edition through an open source license and an Enterprise Edition through an Oracle Technology Network (OTN) license. The Community Edition is still awaiting final licensing approval, but the Enterprise Edition is available now for download from the Oracle Technology Network.

Don’t know that I will have the time but it would be amusing to compare the actual release with pre-release speculation about its features and capabilities.

More to follow as information becomes available.

Overview of the Oracle NoSQL Database

Wednesday, October 26th, 2011

Overview of the Oracle NoSQL Database, a nice review by Daniel Abadi.

Where Daniel is inferring information, he makes that clear. Since he is one of the leading researchers in the area, I suspect we will find, eventually, that he wasn’t far off the mark.

Interesting reading.