Archive for the ‘SQL Server’ Category

Query the Northwind Database as a Graph Using Gremlin

Wednesday, October 21st, 2015

Query the Northwind Database as a Graph Using Gremlin by Mark Kromer.

From the post:

One of the most popular and interesting topics in the world of NoSQL databases is graph. At DataStax, we have invested in graph computing through the acquisition of Aurelius, the company behind TitanDB, and are especially committed to ensuring the success of the Gremlin graph traversal language. Gremlin is part of the open source Apache TinkerPop graph framework project and is a graph traversal language used by many different graph databases.

I wanted to introduce you to a superb web site that our own Daniel Kuppitz maintains called “SQL2Gremlin”, which I think is a great way to start learning how to query graph databases for those of us who come from the traditional relational database world. It is full of excellent sample SQL queries from the popular public domain RDBMS dataset Northwind and demonstrates how to produce the same results by using Gremlin. For me, learning by example has been a great way to get introduced to graph querying, and I think that you’ll find it very useful as well.

I’m only going to walk through a couple of examples here as an intro to what you will find at the full site. But if you are new to graph databases and Gremlin, then I highly encourage you to visit the sql2gremlin site for the rest of the complete samples. There is also a nice interactive visualization, filtering, and search tool here that helps visualize the Northwind data set as it has been converted into a graph model.

I’ve worked with (and worked for) Microsoft SQL Server for a very long time. Since Daniel’s examples use T-SQL, we’ll stick with SQL Server for this blog post as an intro to Gremlin and we’ll use the Northwind samples for SQL Server 2014. You can download the entire Northwind sample database here. Load that database into your SQL Server if you wish to follow along.
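To give a flavor of the side-by-side style you will find at SQL2Gremlin, here is a miniature version. The SQL half runs against an in-memory SQLite stand-in for Northwind (the table and rows are cut down for illustration, not the real schema); the Gremlin equivalent sits in a comment, in the spirit of the site's examples, since running it would require a TinkerPop-enabled graph.

```python
# Miniature SQL-vs-Gremlin comparison in the style of SQL2Gremlin.
# SQLite stands in for the Northwind database; the "category" label in
# the Gremlin comment follows SQL2Gremlin's vertex naming convention.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Categories (CategoryID INTEGER, CategoryName TEXT)")
conn.executemany("INSERT INTO Categories VALUES (?, ?)",
                 [(1, "Beverages"), (2, "Condiments")])

# SQL:     SELECT CategoryName FROM Categories ORDER BY CategoryID
# Gremlin: g.V().hasLabel("category").values("name")
names = [row[0] for row in
         conn.execute("SELECT CategoryName FROM Categories ORDER BY CategoryID")]
print(names)  # ['Beverages', 'Condiments']
```

The full site pairs dozens of Northwind queries this way, from simple projections up to grouping and recommendation-style traversals.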

When I first saw the title to this post,

Query the Northwind Database as a Graph Using Gremlin (emphasis added)

I thought this was something else. A database about the Northwind album.

Little did I suspect that the Northwind Database is a sample database for SQL Server 2005 and SQL Server 2008. Yikes!

Still, I thought some of you might have access to such legacy software and so I am pointing you to this post. 😉


Extended support for SQL Server 2005 ends April 12, 2016 (that’s next April).

Mainstream support for SQL Server 2008 ended July 8, 2014. Ouch! You are more than a year into a dangerous place. Upgrade, migrate, or get another job. Hard times are coming and blame will be assigned.

MS SQL Server -> Hadoop

Thursday, January 16th, 2014

Community Tutorial 04: Import from Microsoft SQL Server into the Hortonworks Sandbox using Sqoop

From the webpage:

For a simple proof of concept I wanted to get data from MS SQL Server into the Hortonworks Sandbox in an automated fashion using Sqoop. Apache Sqoop provides a way of efficiently transferring bulk data between Apache Hadoop and relational databases. This tutorial will show you how to use Sqoop to import data into the Hortonworks Sandbox from a Microsoft SQL Server data source.
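For reference, the shape of the Sqoop invocation the tutorial automates looks roughly like the sketch below. The host, database, table, and paths are placeholders I made up; the flags themselves (`--connect`, `--driver`, `--table`, `--target-dir`, and so on) are standard Sqoop import options. Building the argument list in Python keeps it easy to drop into a scheduling script later.

```python
# Sketch of a Sqoop import from SQL Server into HDFS. Host, database,
# credentials, and paths are placeholders, not values from the tutorial.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://sqlserver-host:1433;databaseName=AdventureWorks",
    "--driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "--username", "sqoop_user",
    "--password-file", "/user/sqoop/.password",  # keeps the password off the command line
    "--table", "Customers",
    "--target-dir", "/user/sqoop/customers",
    "--num-mappers", "4",                        # parallel import tasks
]
print(" ".join(sqoop_import))
```

From there it is a cron entry or an Oozie action away from the "automated fashion" the tutorial is after.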

You’ll have to test this one without me.

I have thought about setting up a MS SQL Server but never got around to it. 😉


Wednesday, March 6th, 2013

PolyBase

From the webpage:

PolyBase is a fundamental breakthrough in data processing used in SQL Server 2012 Parallel Data Warehouse to enable truly integrated query across Hadoop and relational data.

Complementing Microsoft’s overall Big Data strategy, PolyBase is a breakthrough new technology on the data processing engine in SQL Server 2012 Parallel Data Warehouse designed as the simplest way to combine non-relational data and traditional relational data in your analysis. While customers would normally burden IT to pre-populate the warehouse with Hadoop data or undergo extensive training on MapReduce in order to query non-relational data, PolyBase does this all seamlessly, giving you the benefits of “Big Data” without the complexities.

I must admit I had my hopes up for the videos labeled: “Watch informative videos to understand PolyBase.”

But the first one was only 2:52 in length and the second was about the Jim Gray Systems Lab (2:13).

So, fair to say it was short on details. 😉

The closest thing I found to a clue was in the PolyBase datasheet (under PolyBase Use Cases, if you are reading along), which says:

PolyBase introduces the concept of external tables to represent data residing in HDFS. An external table defines a schema (that is, columns and their types) for data residing in HDFS. The table’s metadata lives in the context of a SQL Server database and the actual table data resides in HDFS.

I assume that means the same data in HDFS could have multiple external tables defined over it, depending upon the query?
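My reading is yes. PolyBase itself only ships with SQL Server 2012 Parallel Data Warehouse, but the concept the datasheet describes, several schema definitions over a single copy of the data, can be sketched with ordinary views in any SQL engine. Everything below (table, columns, view names) is invented for illustration:

```python
# Two "external tables" over one copy of the data, faked with views in
# SQLite. The single hdfs_log table plays the role of a file in HDFS;
# each view exposes only the columns a particular query needs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hdfs_log (ts TEXT, user TEXT, bytes INTEGER)")
conn.execute("INSERT INTO hdfs_log VALUES ('2013-03-06', 'ann', 1024)")

conn.execute("CREATE VIEW traffic AS SELECT ts, bytes FROM hdfs_log")
conn.execute("CREATE VIEW audit   AS SELECT ts, user  FROM hdfs_log")

print(conn.execute("SELECT bytes FROM traffic").fetchone())  # (1024,)
print(conn.execute("SELECT user  FROM audit").fetchone())    # ('ann',)
```

The difference with PolyBase, per the datasheet, is that only the metadata lives in SQL Server; the rows never leave HDFS.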

I’m curious whether the external tables and/or data types are going to have MapReduce capabilities built in, to take advantage of parallel processing of the data.

BTW, for topic map types, subject identities for the keys and data types would be the same as with more traditional “internal” tables. In case you want to merge data.

Just out of curiosity, any thoughts on possible IP on external schemas being applied to data?

I first saw this at Alex Popescu’s Microsoft PolyBase: Unifying Relational and Non-Relational Data.

A Rickety Stairway to SQL Server Data Mining, Part 0.1: Data In, Data Out

Sunday, December 2nd, 2012

A Rickety Stairway to SQL Server Data Mining, Part 0.1: Data In, Data Out

A rather refreshing if anonymous take on statistics and data mining.

Since I can access SQL Servers in the cloud (without the necessity of maintaining a local Windows Server box), I thought I should look at data mining for SQL Server.

This was one of the first posts I encountered.

In the first of a series of amateur tutorials on SQL Server Data Mining (SSDM), I promised to pull off an impossible stunt: explaining the broad field of statistics in a few paragraphs without the use of equations. What other SQL Server blog ends with a cliffhanger like that? Anyone who aims at incorporating data mining into their IT infrastructure or skill set in any substantial way is going to have to learn to interpret equations, but it is possible to condense a few key statistical concepts in a way that will help those who aren’t statisticians – like me – to make productive use of SSDM without them. These crude Cliff’s Notes can at least familiarize DBAs, programmers and other readers of these tutorials with the minimal bare bones concepts they will need to know in order to interpret the data output by SSDM’s nine algorithms, as well as to illuminate the inner workings of the algorithms themselves. Without that minimal foundation, it will be more difficult to extract useful meaning from your data mining efforts.

The first principle to keep in mind is so absurdly obvious that it is often half-consciously forgotten – perhaps because it is right before our noses – but it is indispensable to understanding both the field of statistics and the stats output by SSDM. To wit, the numbers signify something. Some intelligence assigned meaning to them. One of the biggest hurdles when interpreting statistical data, reading equations or learning a foreign language is the subtle, almost subconscious error of forgetting that these symbols reflect ideas in the head of another conscious human being, which probably correspond to ideas that you also have in your head, but simply lack the symbols to express. An Englishman learning to read or write Spanish, Portuguese, Russian or Polish may often forget that the native speakers of these languages are trying to express the exact same concepts that an English speaker would; they have the exact same ideas in their heads as we do, but communicate them quite differently. Quite often, the seemingly incoherent quirks and rules of a particular foreign language may actually be part of a complex structure designed to convey identical, ordinary ideas in a dissimilar, extraordinary way. It is the same way with mathematical equations: the scientists and mathematicians who use them are trying to convey ideas in the most succinct way they know. It is often easier for laymen to understand the ideas and supporting evidence that those equations are supposed to express, when they’re not particularly well-versed in the detailed language that equations represent. I’m a layman, like some of my readers probably are. My only claim to expertise in this area is that when I was in fourth grade, I learned enough about equations to solve the ones my father, a college physics teacher, taught every week – but then I forgot it all, so I found myself back at Square One when I took up data mining a few years back.

On a side note, it would be wise for anyone who works with equations regularly to consciously remind themselves that they are merely symbols representing ideas, rather than the other way around; a common pitfall among physicists and other scientists who work with equations regularly seems to be the Pythagorean heresy, i.e. the quasi-religious belief that reality actually consists of mathematical equations. It doesn’t. If we add two apples to two apples, we end up with four apples; the equation 2 + 2 = 4 expresses the nature and reality of several apples, rather than the apples merely being a stand-in for the equation. Reality is not a phantom that obscures some deep, dark equation underlying all we know; math is simply a shortcut to expressing certain truths about the external world. This danger is magnified when we pile abstraction on top of abstraction, which may lead to the construction of ivory towers that eventually fall, often spectacularly. This is a common hazard in the field of finance, where our economists often forget that money is just an abstraction based on agreements among large numbers of people to assign certain meanings to it that correspond to tangible, physical goods; all of the periodic financial crashes that have plagued Western civilization since Tulipmania have been accompanied by a distinct forgetfulness of this fact, which automatically produces the scourge of speculation. I’ve often wondered if this subtle mistake has also contributed to the rash of severe mental illness among mathematicians and physicists, with John Nash (of the film A Beautiful Mind), Nikola Tesla and Georg Cantor being among the most recognized names in a long list of victims. It may also be linked to the uncanny ineptitude of our most brilliant physicists and mathematicians when it comes to philosophy, such as Rene Descartes, Albert Einstein, Stephen Hawking and Alan Turing. In his most famous work, Orthodoxy, 20th Century British journalist G.K. Chesterton noticed the same pattern, which he summed up thus: “Poets do not go mad; but chess-players do. Mathematicians go mad, and cashiers; but creative artists very seldom. I am not, as will be seen, in any sense attacking logic: I only say that this danger does lie in logic, not in imagination.”[1] At a deeper level, some of the risk to mental health from excessive math may pertain to seeking patterns that aren’t really there, which may be closely linked to the madness underlying ancient “arts” of divination like haruspicy and alectromancy.

Amazon RDS Now Supports SQL Server 2012

Saturday, September 29th, 2012

Amazon RDS Now Supports SQL Server 2012

From the post:

The Amazon Relational Database Service (RDS) now supports SQL Server 2012. You can now launch the Express, Web, and Standard Editions of this powerful database from the comfort of the AWS Management Console. SQL Server 2008 R2 is still available, as are multiple versions and editions of MySQL and Oracle Database.

If you are from the Microsoft world and haven't heard of RDS, here's the executive summary: You can run the latest and greatest offering from Microsoft in a fully managed environment. RDS will install and patch the database, make backups, and detect and recover from failures. It will also provide you with a point-and-click environment to make it easy for you to scale your compute resources up and down as needed.

What's New?
SQL Server 2012 supports a number of new features including contained databases, columnstore indexes, sequences, and user-defined roles:

  • A contained database is isolated from other SQL Server databases including system databases such as "master." This isolation removes dependencies and simplifies the task of moving databases from one instance of SQL Server to another.
  • Columnstore indexes are used for data warehouse style queries. Used properly, they can greatly reduce memory consumption and I/O requests for large queries.
  • Sequences are counters that can be used in more than one table.
  • The new user-defined role management system allows users to create custom server roles.

Read the SQL Server What's New documentation to learn more about these and other features.
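The sequences bullet is easier to see with an example. In SQL Server 2012 the syntax would be CREATE SEQUENCE plus NEXT VALUE FOR; SQLite has no sequences, so the sketch below fakes the counter in Python purely to show the point of the feature: one counter handing out keys to more than one table. All names here are invented.

```python
# Stand-in for SQL Server 2012 sequences. The itertools counter plays
# the role of CREATE SEQUENCE OrderSeq START WITH 1, and each next(seq)
# plays the role of NEXT VALUE FOR OrderSeq.
import itertools
import sqlite3

seq = itertools.count(1)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_orders   (id INTEGER, item TEXT)")
conn.execute("CREATE TABLE phone_orders (id INTEGER, item TEXT)")

# One counter feeds both tables, so ids never collide across them.
conn.execute("INSERT INTO web_orders   VALUES (?, ?)", (next(seq), "book"))
conn.execute("INSERT INTO phone_orders VALUES (?, ?)", (next(seq), "lamp"))
conn.execute("INSERT INTO web_orders   VALUES (?, ?)", (next(seq), "pen"))

ids = [r[0] for r in conn.execute(
    "SELECT id FROM web_orders UNION SELECT id FROM phone_orders ORDER BY id")]
print(ids)  # [1, 2, 3]
```

That cross-table guarantee is exactly what an IDENTITY column, being bound to a single table, cannot give you.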

I almost missed this!

It is about the only way I am going to get to play with SQL Server. I don’t have a local Windows sysadmin to maintain the server, etc.


Friday, September 14th, 2012

JMyETL, an easy to use ETL tool that supports 10 different RDBMS by Esen Sagynov.

From the post:

JMyETL is a very useful and simple Java-based application for Windows which allows users to import and export data from/to various database systems. For example:

  • CUBRID –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, Access, SQLite
  • MySQL –> Sybase ASE/ASA, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Sybase ASE –> Sybase ASA, MySQL, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Sybase ASA –> Sybase ASE, MySQL, Oracle, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Oracle –> Sybase ASA, Sybase ASE, MySQL, Access, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • Access –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, SQLite, CUBRID
  • PostgreSQL –> Sybase ASE, Sybase ASA, MySQL, Oracle, Access, SQL Server, DB2, SQLite, CUBRID
  • SQL Server –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, Access, DB2, SQLite, CUBRID
  • DB2 –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, Access, SQLite, CUBRID
  • SQLite –> Sybase ASE, Sybase ASA, MySQL, Oracle, PostgreSQL, SQL Server, DB2, Access, CUBRID

Just in case you need a database to database ETL utility.
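JMyETL itself is a Java application for Windows, but the core of any database-to-database ETL utility is small enough to sketch: read rows from a source connection, write them to a target. Both ends are SQLite here purely so the sketch runs anywhere; the table and data are invented, and in practice you would swap either connection for another driver.

```python
# Minimal database-to-database copy: extract from src, load into dst.
import sqlite3

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

src.execute("CREATE TABLE people (name TEXT, age INTEGER)")
src.executemany("INSERT INTO people VALUES (?, ?)",
                [("ada", 36), ("alan", 41)])

# Extract all rows from the source, then bulk-load them into the target.
dst.execute("CREATE TABLE people (name TEXT, age INTEGER)")
rows = src.execute("SELECT name, age FROM people").fetchall()
dst.executemany("INSERT INTO people VALUES (?, ?)", rows)

print(dst.execute("SELECT COUNT(*) FROM people").fetchone()[0])  # 2
```

Tools like JMyETL earn their keep in the parts this sketch skips: type mapping between ten dialects, batching, and error handling.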

I first saw this at DZone.

Data Mining with Microsoft SQL Server 2008 [Book Review]

Sunday, June 17th, 2012

Data Mining with Microsoft SQL Server 2008

Sandro Saitta writes:

If you are using Microsoft data mining tools, this book is a must have. Written by MacLennan, Tang and Crivat, it describes how to perform data mining using SQL Server 2008. The book is huge – more than 630 pages – but that is normal since the authors give detailed explanations for each data mining function. The book covers topics such as general data mining concepts, DMX, Excel add-ins, OLAP cubes, data mining architecture and many more. The seven data mining algorithms included in the tool are described in separate chapters.

The book is well written, so it can be read from A to Z or by selecting specific chapters. Each theoretical concept is explained through examples. Using screenshots, each step of a given method is presented in detail. It is thus more a user manual than a book explaining data mining concepts. Don’t expect to read any detailed algorithms or equations. A pleasant surprise of the book is the case studies. They are present in most chapters and show real examples and how to solve them. It really shows the experience of the authors in the field.

I haven’t seen the book, yet, but that can be corrected. 😉

Beyond Relational

Monday, December 26th, 2011

Beyond Relational

I originally arrived at this site because of a blog hosted there with lessons on Oracle 10g. Exploring a bit I decided to post about it.

Seems to have fairly broad coverage, from Oracle and PostgreSQL to TSQL and XQuery.

Likely to be a good site for learning cross-overs between systems that you can map for later use.

Suggestions of similar sites?

Toad for Cloud Databases (Quest Software)

Tuesday, November 8th, 2011

Toad for Cloud Databases (Quest Software)

From the news release:

The data management industry is experiencing more disruption than at any other time in more than 20 years. Technologies around cloud, Hadoop and NoSQL are changing the way people manage and analyze data, but the general lack of skill sets required to manage these new technologies continues to be a significant barrier to mainstream adoption. IT departments are left without a clear understanding of whether development and DBA teams, whose expertise lies with traditional technology platforms, can effectively support these new systems. Toad® for Cloud Databases addresses the skill-set shortage head-on, empowering database professionals to directly apply their existing skills to emerging Big Data systems through an easy-to-use and familiar SQL-based interface for managing non-relational data. 

News Facts:

  • Toad for Cloud Databases is now available as a fully functional, commercial-grade product, for free. Toad for Cloud Databases enables users to generate queries, migrate, browse, and edit data, as well as create reports and tables in a familiar SQL view. By simplifying these tasks, Toad for Cloud Databases opens the door to a wider audience of developers, allowing more IT teams to experience the productivity gains and cost benefits of NoSQL and Big Data.
  • Quest first released Toad for Cloud Databases into beta in June 2010, making the company one of the first to provide a SQL-based database management tool to support emerging, non-relational platforms. Over the past 18 months, Quest has continued to drive innovation for the product, growing its list of supported platforms and integrating a UI for its bi-directional data connector between Oracle and Hadoop.
  • Quest’s connector between Oracle and Hadoop, available within Toad for Cloud Databases, delivers a fast and scalable method for data transfer between Oracle and Hadoop in both directions. The bidirectional characteristic of the utility enables organizations to take advantage of Hadoop’s lower cost of storage and analytical capabilities. Quest also contributed the connector to the Apache Hadoop project as an extension to the existing Sqoop framework, and it is also available as part of Cloudera’s Distribution Including Apache Hadoop.
  • Toad for Cloud Databases today supports:
    • Apache Hive
    • Apache HBase
    • Apache Cassandra
    • MongoDB
    • Amazon SimpleDB
    • Microsoft Azure Table Services
    • Microsoft SQL Azure, and
    • All Open Database Connectivity (ODBC)-enabled relational databases (Oracle, SQL Server, MySQL, DB2, etc.)


Anything that eases the transition to cloud computing is going to be welcome. Toad being free will increase the ranks of DBAs who will at least experiment on their own.

Microsoft unites SQL Server with Hadoop

Friday, October 14th, 2011

Microsoft unites SQL Server with Hadoop by Ted Samson.

From the post:

Microsoft today revealed more details surrounding Windows and SQL Server 12 support for big data analytics via cozier integration with Apache Hadoop, the increasingly popular open source cloud platform for handling the vast quantities of unstructured data spawned daily.

With this move, Microsoft may be able to pull off a feat that has eluded other companies: bring big data to the mainstream. As it stands, only large-scale companies with fat IT budgets have been able to reap that analytical bounty, as the tools on the market tend to be both complex and pricey.

Microsoft’s strategy is to groom Linux-friendlier Hadoop to fit snugly into Windows environments, thus giving organizations on-tap, seamless, and simultaneous access to both structured and unstructured data via familiar desktop apps, such as Excel, as well as BI tools such as Microsoft PowerPivot.

That’s the thing, isn’t it? There are only so many DoD-size contracts to go around. True enough, MS will get their share of those as well (enterprises don’t call the corner IT shop). But the larger market is all the non-supersized enterprises with only internal IT shops and limited budgets.

By making MS apps the information superhighway to information stored/processed elsewhere/elsehow (read non-MS), MS opens up an entire world for its user base. It needs to be seamless, but I assume MS will be devoting sufficient resources to that cause.

The more seamless MS makes its apps with non-MS innovations, such as Hadoop, the more attractive its apps become to its user base.

The ultimate irony. Non-MS innovators driving demand for MS products.