Archive for the ‘Big Query’ Category

BigQuery [first 1 TB of data processed each month is free]

Sunday, February 22nd, 2015

Apologies if this is old news to you but I saw a tweet by GoogleCloudPlatform advertising the “first 1 TB of data processed each month is free” and felt compelled to pass it on.

Like so much news on the Internet, if it is “new” to us, we assume it must be “new” to everyone else. (That is how the warnings of malware that will alter your DNA spread.)

It is a very tempting offer.

Tempting enough that I am going to spend some serious time looking at BigQuery.
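To see what the offer amounts to, here is a back-of-the-envelope cost check. It assumes the advertised free tier (first 1 TB of data processed per month) and a hypothetical $5-per-TB on-demand rate beyond that; check current pricing before relying on the numbers.

```python
FREE_TB_PER_MONTH = 1.0
RATE_PER_TB = 5.0  # assumed on-demand rate, not from the announcement

def monthly_query_cost(tb_processed: float) -> float:
    """Estimate the monthly charge for query processing in dollars."""
    billable_tb = max(0.0, tb_processed - FREE_TB_PER_MONTH)
    return billable_tb * RATE_PER_TB

print(monthly_query_cost(0.8))  # entirely within the free tier
print(monthly_query_cost(3.5))  # 2.5 TB billable
```

Under those assumptions, a team processing under a terabyte a month pays nothing, which is exactly what makes the offer worth a serious look.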

What’s your query for BigQuery?

Got big JSON? BigQuery expands data import for large scale web apps

Tuesday, October 2nd, 2012

Got big JSON? BigQuery expands data import for large scale web apps by Ryan Boyd, Developer Advocate.

From the post:

JSON is the data format of the web. JSON is used to power most modern websites, is a native format for many NoSQL databases hosting top web applications, and provides the primary data format in many REST APIs. Google BigQuery, our cloud service for ad-hoc analytics on big data, has now added support for JSON and the nested/repeated structure inherent in the data format.

JSON opens the door to a more object-oriented view of your data compared to CSV, the original data format supported by BigQuery. It removes the need for duplication of data required when you flatten records into CSV. Here are some examples of data you might find a JSON format useful for:

  • Log files, with multiple headers and other name-value pairs.
  • User session activities, with information about each activity occurring nested beneath the session record.
  • Sensor data, with variable attributes collected in each measurement.

Nested/repeated data support is one of our most requested features. And while BigQuery’s underlying infrastructure supports it, we’d only enabled it in a limited fashion through M-Lab’s test data. Today, however, developers can use JSON to get any nested/repeated data into and out of BigQuery.
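The duplication point is easy to see with a small sketch. Below is a hypothetical user-session record (field names are my own, for illustration) with a nested, repeated `activities` field, of the kind BigQuery's JSON import can now accept directly, alongside the flattened CSV-style rows that would otherwise have to repeat the session-level columns on every activity row.

```python
import json

# Hypothetical session record with a nested, repeated field.
session = {
    "session_id": "s-1001",
    "user": "alice",
    "activities": [  # repeated field: one entry per activity
        {"type": "click", "ts": 1349190000},
        {"type": "search", "ts": 1349190042},
    ],
}

# One newline-delimited JSON line per record, as a load job would consume.
ndjson_line = json.dumps(session)

# The CSV equivalent flattens the record, duplicating session_id and
# user on every activity row.
flat_rows = [
    (session["session_id"], session["user"], a["type"], a["ts"])
    for a in session["activities"]
]
print(flat_rows)
```

With two activities the duplication is trivial; with thousands of activities per session, the repeated session-level columns are exactly the waste the nested format removes.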

It had to happen. “Big JSON,” that is.

My question is: when is “Bigger Data” going to catch on?

If you got far enough ahead, say six to nine months, you could trademark something like “Biggest Data” and start collecting fees when it comes into common usage.

Will Google Big Query Transform Big Data Analysis?

Sunday, March 25th, 2012

Will Google Big Query Transform Big Data Analysis? by Doug Henschen.

From the post:

Google shared details Wednesday about Google Big Query, a cloud-based service that promises to bring the search giant’s immense compute power and expertise with algorithms to bear on large data sets. The service is still in limited beta preview, but it promises to speed analysis of Google ad data while opening up ways to mash up and analyze huge data sets from external sources.

Google Big Query was described by Ju-Kay Kwek, product manager for Google Cloud Platform Team, as offering an array of SQL and graphical-user-interface-driven SQL analyses of tens of terabytes of data per customer, yet it doesn’t require indexing or pre-caching. What’s more, customers will get fine-grained analysis of all their data without summaries or aggregations.

“Fine-grained data is the key to the service because we don’t know what questions customers are going to ask,” said Kwek in an onstage interview at this week’s GigaOm Structure Data conference in New York.

Some of Google’s beta customers are uploading data to the service with batches and data streams and treating it as a cloud-based data warehouse, but Kwek said ad data would be the first priority, supporting a Google customer’s need to understand massive global campaigns running in multiple languages.

“When an advertiser wants to understand the ROI or effectiveness of a keyword campaign running across the globe, that’s a big-data problem,” Kwek said. “They’re currently extracting data using the AdWords API, building sharded databases on-premises, doing all the indexing, and sometimes losing track of the questions they wanted to ask by the time they have the data available.”

Thus, time to insight will be the biggest benefit of the service, Kwek said, with analyses taking a day or less, rather than days or weeks, when customers face extracting and structuring data on less robust and capable on-premises platforms.

I am troubled by the presumptions that Google is making with Big Query.

Google’s Big Query presumes:

  1. Customer’s big data has value to be extracted.
  2. Value is not being extracted now due to lack of computing resources.
  3. The missing computing resources can be supplied by Big Query.
  4. The customer has the analysis resources to extract the value using Big Query. (Not the same thing as writing SQL or dashboards.)
  5. The customer can act upon the value extracted from its big data.

If any of those presumptions fail, then so does the value of using Google’s Big Query.

Resources for BigQuery developers, including version 2 of the Developers Guide.