Archive for the ‘Google Compute Engine’ Category

Jump-start your data pipelining into Google BigQuery

Monday, October 7th, 2013

Like they said at Woodstock, “if you don’t think ETL is all that weird.” Wait, wasn’t that “if you don’t think capitalism is all that weird”?

Maybe, maybe not. But in any event, Wally Yau has written guidance on getting the Google Compute Engine up and ready to do some ETL in Jump-start your data pipelining into Google BigQuery.

Or, if you already have “cooked” data, there is another sample application, Automated File Loader for BigQuery, which shows how to load data that will produce your desired results.

Both of these are from: Getting Started with Google BigQuery.

You do know that Google is located in the United States?

Google Compute Engine: Computing without limits

Friday, June 29th, 2012

Google Compute Engine: Computing without limits by Craig McLuckie.

From the post:

Over the years, Google has built some of the most high performing, scalable and efficient data centers in the world by constantly refining our hardware and software. Since 2008, we’ve been working to open up our infrastructure to outside developers and businesses so they can take advantage of our cloud as they build applications and websites and store and analyze data. So far this includes products like Google App Engine, Google Cloud Storage, and Google BigQuery.

Today, in response to many requests from developers and businesses, we’re going a step further. We’re introducing Google Compute Engine, an Infrastructure-as-a-Service product that lets you run Linux Virtual Machines (VMs) on the same infrastructure that powers Google. This goes beyond just giving you greater flexibility and control; access to computing resources at this scale can fundamentally change the way you think about tackling a problem.

Google Compute Engine offers:

  • Scale. At Google we tackle huge computing tasks all the time, like indexing the web, or handling billions of search queries a day. Using Google’s data centers, Google Compute Engine reduces the time to scale up for tasks that require large amounts of computing power. You can launch enormous compute clusters – tens of thousands of cores or more.
  • Performance. Many of you have learned to live with erratic performance in the cloud. We have built our systems to offer strong and consistent performance even at massive scale. For example, we have sophisticated network connections that ensure consistency. Even in a shared cloud you don’t see interruptions; you can tune your app and rely on it not degrading.
  • Value. Computing in the cloud is getting even more appealing from a cost perspective. The economy of scale and efficiency of our data centers allows Google Compute Engine to give you 50% more compute for your money than with other leading cloud providers. You can see pricing details here.

The capabilities of Google Compute Engine include:

  • Compute. Launch Linux VMs on-demand. 1, 2, 4 and 8 virtual core VMs are available with 3.75GB RAM per virtual core.
  • Storage. Store data on local disk, on our new persistent block device, or on our Internet-scale object store, Google Cloud Storage.
  • Network. Connect your VMs together using our high-performance network technology to form powerful compute clusters and manage connectivity to the Internet with configurable firewalls.
  • Tooling. Configure and control your VMs via a scriptable command line tool or web UI. Or you can create your own dynamic management system using our API.
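
As a quick check on the Compute bullet above, the quoted figure of 3.75GB RAM per virtual core implies the memory sizes below for the four VM shapes. This is only a sketch of the arithmetic; the constant and function names are made up for illustration and are not part of any Google API.

```python
# Illustrative arithmetic only; these names are hypothetical, not a Google API.
RAM_GB_PER_CORE = 3.75       # per-core figure quoted in the post
CORE_COUNTS = (1, 2, 4, 8)   # VM sizes quoted in the post


def ram_for_cores(cores: int) -> float:
    """Return total RAM in GB for a VM with the given number of virtual cores."""
    return cores * RAM_GB_PER_CORE


if __name__ == "__main__":
    # Print one line per VM size listed in the announcement.
    for cores in CORE_COUNTS:
        print(f"{cores} core(s): {ram_for_cores(cores):g} GB RAM")
```

So the largest shape mentioned, 8 virtual cores, would come with 30GB of RAM if the per-core ratio holds uniformly.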

Google Compute Engine Preview – Signup

Wondering how this will impact evaluations of CS papers? And what data sets will be used on a routine basis?

To say nothing of exploration of data/text mining.

Now if we can just get access to the majority of research literature, well, but that’s an issue for another forum.