Archive for the ‘PolyBase’ Category

PolyBase

Wednesday, March 6th, 2013

PolyBase

From the webpage:

PolyBase is a fundamental breakthrough in data processing used in SQL Server 2012 Parallel Data Warehouse to enable truly integrated query across Hadoop and relational data.

Complementing Microsoft’s overall Big Data strategy, PolyBase is a breakthrough new technology on the data processing engine in SQL Server 2012 Parallel Data Warehouse designed as the simplest way to combine non-relational data and traditional relational data in your analysis. While customers would normally burden IT to pre-populate the warehouse with Hadoop data or undergo an extensive training on MapReduce in order to query non-relational data, PolyBase does this all seamlessly giving you the benefits of “Big Data” without the complexities.

I must admit I had my hopes up for the videos labeled: “Watch informative videos to understand PolyBase.”

But the first one was only 2:52 in length and the second was about the Jim Gray Systems Lab (2:13).

So, fair to say it was short on details. 😉

The closest thing I found to a clue was in the PolyBase datasheet that reads (under PolyBase Use Cases, if you are reading along) where it says:

PolyBase introduces the concept of external tables to represent data residing in HDFS. An external table defines a schema (that is, columns and their types) for data residing in HDFS. The table’s metadata lives in the context of a SQL Server database and the actual table data resides in HDFS.

I assume that means that the data in HDFS could have multiple external tables for the same data? Depending upon the query?

Curious if the external tables and/or data types are going to have mapreduce capabilities built-in? To take advantage of parallel processing of the data?

BTW, for topic map types, subject identities for the keys and data types would be the same as with more traditional “internal” tables. In case you want to merge data.

Just out of curiosity, any thoughts on possible IP on external schemas being applied to data?

I first saw this at Alex Popescu’s Microsoft PolyBase: Unifying Relational and Non-Relational Data.