Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

June 21, 2012

Lessons from Amazon RDS on Bringing Existing Apps to the Cloud

Filed under: Amazon Web Services AWS,Cloud Computing — Patrick Durusau @ 5:56 am

Lessons from Amazon RDS on Bringing Existing Apps to the Cloud by Nati Shalom.

From the post:

It’s a common belief that the cloud is good for green-field apps. There are many reasons for this, in particular the fact that the cloud forces a different kind of thinking about how to run apps. Native cloud apps were designed to scale elastically, they were designed with complete automation in mind, and so forth. Most existing apps (a.k.a. brown-field apps) were written in a pre-cloud world and therefore don’t support these attributes. Adding support for these attributes could require a significant investment. In some cases, this investment could be so big that it would make more sense to go through a complete rewrite.

In this post I want to challenge this common belief. Over the past few years I have found that many stateful applications running on the cloud don’t support all those attributes, elasticity in particular. One of the better-known examples of this is MySQL and its Amazon cloud offering, RDS, which I’ll use throughout this post to illustrate my point.

Amazon RDS as an example of migrating a brown-field application

MySQL was written in a pre-cloud world and therefore fits into the definition of a brown-field app. As with many brown-field apps, it wasn’t designed to be elastic or to scale out, and yet it is one of the more common and popular services on the cloud. To me, this means that there are probably other attributes that matter even more when we consider our choice of application in the cloud. Amazon RDS is the cloud-enabled version of MySQL. It can serve as a good example to find what those other attributes could be.

You have to admit that the color imagery is telling. Pre-cloud applications are “brown-field” apps and cloud apps are “green.”

I think the survey numbers about migrating to the cloud are fairly soft and not always consistent. There will be “green” and “brown” field apps created or migrated to the cloud.

But brown-field apps will remain, just as relational databases did not displace all the non-relational databases, which persist to this day.

Technology is as often “in addition to” as it is “in place of.”

June 17, 2012

MapR Now Available as an Option on Amazon Elastic MapReduce

Filed under: Amazon Web Services AWS,Hadoop,MapR,MapReduce — Patrick Durusau @ 3:59 pm

MapR Now Available as an Option on Amazon Elastic MapReduce

From the post:

MapR Technologies, Inc., the provider of the open, enterprise-grade distribution for Apache Hadoop, today announced the immediate availability of its MapR Distribution for Hadoop as an option within the Amazon Elastic MapReduce service. Customers can now provision dynamically scalable MapR clusters while taking advantage of the flexibility, agility and massive scalability of Amazon Web Services (AWS). In addition, AWS has made its own Hadoop enhancements available to MapR customers, allowing them to seamlessly use MapR with other AWS offerings such as Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB and Amazon CloudWatch.

“We’re excited to welcome MapR’s feature-rich distribution as an option for customers running Hadoop in the cloud,” said Peter Sirota, general manager of Amazon Elastic MapReduce, AWS. “MapR’s innovative high availability data protection and performance features combined with Amazon EMR’s managed Hadoop environment and seamless integration with other AWS services provides customers a powerful tool for generating insights from their data.”

Customers can provision MapR clusters on-demand and automatically terminate them after finishing data processing, reducing costs as they only pay for the resources they consume. Customers can augment their existing on-premise deployments with AWS-based clusters to improve disaster recovery and access additional compute resources as required.

“For many customers there is no longer a compelling business case for deploying an on-premise Hadoop cluster given the secure, flexible and highly cost effective platform for running MapR that AWS provides,” said John Schroeder, CEO and co-founder, MapR Technologies. “The combination of AWS infrastructure and MapR’s technology, support and management tools enables organizations to potentially lower their costs while increasing the flexibility of their data intensive applications.”
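
The “provision on demand, terminate when done” pattern in the second paragraph is the interesting part. Here is a minimal sketch of it with the current Python SDK (boto3); the cluster name, bucket paths, instance types, and roles are placeholders of mine, and the MapR-specific option is omitted since its exact parameters depend on the EMR release:

  import boto3

  # Sketch of a transient cluster: run one step, then terminate automatically.
  # All names, paths, and sizes below are placeholders.
  emr = boto3.client("emr", region_name="us-east-1")

  response = emr.run_job_flow(
      Name="transient-analytics-cluster",
      ReleaseLabel="emr-6.15.0",
      Instances={
          "MasterInstanceType": "m5.xlarge",
          "SlaveInstanceType": "m5.xlarge",
          "InstanceCount": 3,
          "KeepJobFlowAliveWhenNoSteps": False,  # terminate after the last step
      },
      Steps=[{
          "Name": "word-count",
          "ActionOnFailure": "TERMINATE_CLUSTER",
          "HadoopJarStep": {
              "Jar": "command-runner.jar",
              "Args": ["hadoop-streaming",
                       "-files", "s3://my-bucket/scripts/mapper.py,s3://my-bucket/scripts/reducer.py",
                       "-mapper", "mapper.py",
                       "-reducer", "reducer.py",
                       "-input", "s3://my-bucket/input/",
                       "-output", "s3://my-bucket/output/"],
          },
      }],
      JobFlowRole="EMR_EC2_DefaultRole",
      ServiceRole="EMR_DefaultRole",
  )
  print(response["JobFlowId"])

Because the cluster keeps no state of its own (input and output live in S3), throwing it away after the last step costs you nothing but the time to start the next one.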

Are you doing topic maps in the cloud yet?

A rep from one of the “big iron” companies was telling me how much more reliable owning your own hardware and running their software is than the cloud.

True, but that has the same answer as the question: Who needs the capacity to process petabytes of data in real time?

If the truth were told, there are a few companies and organizations that could benefit from that capability.

But the rest of us don’t have that much data or the talent to process it if we did.

Over the summer I am going to try the cloud out, both generally and for topic maps.

Suggestions/comments?

June 12, 2012

One Trillion Stored (and counting) [new uncertainty principle?]

Filed under: Amazon Web Services AWS — Patrick Durusau @ 2:34 pm

Amazon S3 – The First Trillion Objects

Jeff Barr writes:

Late last week the number of objects stored in Amazon S3 reached one trillion (1,000,000,000,000 or 10¹²). That’s 142 objects for every person on Planet Earth or 3.3 objects for every star in our Galaxy. If you could count one object per second it would take you 31,710 years to count them all.

We knew this day was coming! Lately, we’ve seen the object count grow by up to 3.5 billion objects in a single day (that’s over 40,000 new objects per second).
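
The figures hold up against some rough constants (the population and star-count values are my 2012-era assumptions, not Amazon’s):

  # Rough sanity check of the figures in the announcement.
  objects = 1_000_000_000_000            # one trillion S3 objects
  people = 7.0e9                         # ~7 billion people (2012 estimate)
  stars = 3.0e11                         # ~300 billion stars in the Milky Way

  print(objects / people)                # ~142 objects per person
  print(objects / stars)                 # ~3.3 objects per star
  print(objects / (365 * 24 * 3600))     # ~31,710 years at one object per second
  print(3.5e9 / 86_400)                  # ~40,500 new objects per second on a peak day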

Old news, because no doubt the total was greater than one trillion a week later. Or perhaps after any time period greater than 1/40,000th of a second?

Is there a new uncertainty principle at work, where any overall count for S3 is only an estimate as of some time X?

May 19, 2012

New Mechanical Turk Categorization App

Filed under: Amazon Web Services AWS,Classification,Mechanical Turk — Patrick Durusau @ 10:52 am

New Mechanical Turk Categorization App

Categorization is one of the more popular use cases for the Amazon Mechanical Turk. A categorization HIT (Human Intelligence Task) asks the Worker to select from a list of options. Our customers use HITs of this type to assign product categories, match URLs to business listings, and to discriminate between line art and photographs.

Using our new Categorization App, you can start categorizing your own items or data in minutes, eliminating the learning curve that has traditionally accompanied this type of activity. The app includes everything that you need to be successful, including:

  1. Predefined HITs (no HTML editing required).
  2. Pre-qualified Master Workers (see Jinesh’s previous blog post on Mechanical Turk Masters).
  3. Price recommendations based on complexity and comparable HITs.
  4. Analysis tools.

The Categorization App guides you through the four simple steps that are needed to create your categorization project.
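
For the curious, here is roughly what a single categorization HIT looks like when created through the API directly, which is exactly the work the app hides from you. A boto3 sketch against the MTurk sandbox; the title, reward, and question file are illustrative placeholders, not values from the post:

  import boto3

  # Create one categorization HIT in the MTurk sandbox (a sketch, not the app's code).
  mturk = boto3.client(
      "mturk",
      endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
  )

  with open("categorization_question.xml") as f:   # QuestionForm XML, per the MTurk docs
      question_xml = f.read()

  hit = mturk.create_hit(
      Title="Categorize this image: line art or photograph?",
      Description="Pick the category that best fits the image.",
      Keywords="categorization, images",
      Reward="0.05",                    # dollars, passed as a string
      MaxAssignments=3,                 # ask three Workers, then take a majority
      AssignmentDurationInSeconds=300,
      LifetimeInSeconds=24 * 3600,
      Question=question_xml,
  )
  print(hit["HIT"]["HITId"])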

I thought the contrast between gamers (the GPU post) and MTurkers would be a nice way to close the day. 😉

There are, though, efforts to create games where useful activity happens, whether intended or not. (Would that take some of the joy out of a game?)

If you use this particular app, please blog or post a note about your experience.

Thanks!

May 1, 2012

AWS NYC Summit 2012

Filed under: Amazon Web Services AWS,Cloud Computing — Patrick Durusau @ 4:46 pm

AWS NYC Summit 2012

The line that led me to this read:

We posted 25 presentations from the New York 2012 AWS Summit.

Actually, no.

Posted 25 slide decks, not presentations.

Useful, yes; presentations, no.

Not to complain too much, given the rapid expansion of services and technical guidance, but let’s not confuse slides with presentations.

The AWS Report (Episode 2) has one major improvement: the clouds in the background don’t move! (As they did in the first episode. Though this time there was a shadow moving across the front of the desk.)

We need to ask Amazon to get Jeff a new laptop without all the stickers on the top. If Paula Abdul or Vanna White were doing the interview, the laptop stickers would not be distracting. Or at least not enough to complain about. Jeff isn’t Paula Abdul or Vanna White. Sorry, Jeff.

I think the AWS Report has real potential. Several short segments with more “facts” and fewer “general” statements would be great.

I enjoyed the Elastic Beanstalk episode, but hearing that customers are busy and happy, and that requirements were gathered for other language support (besides Java), is like hearing public service announcements on PBS.

Nothing to disagree with but no real content either.

Suggestion: Perhaps a short, say 90- to 120-second, description of a typical issue (off a mailing list?) that ends with “What is your solution?”, then feature one or more solutions on the next show. That would get the audience involved and get other people hawking the show.

Not quite the cover of the Rolling Stone but perhaps someday… 😉

April 14, 2012

CloudSpokes Coding Challenge Winners – Build a DynamoDB Demo

Filed under: Amazon DynamoDB,Amazon Web Services AWS,Contest,Dynamo — Patrick Durusau @ 6:27 pm

CloudSpokes Coding Challenge Winners – Build a DynamoDB Demo

From the post:

Last November CloudSpokes was invited to participate in the DynamoDB private beta. We spent some time kicking the tires, participating in the forums and developing use cases for their Internet-scale NoSQL database service. We were really excited about the possibilities of DynamoDB and decided to crowdsource some challenge ideas from our 38,000 strong developer community. Needless to say, the release generated quite a bit of buzz.

When Amazon released DynamoDB in January, we launched our CloudSpokes challenge Build an #Awesome Demo with Amazon DynamoDB along with a blog post and a sample “Kiva Loan Browser Demo” application to get people started. The challenge requirements were wide open and all about creating the coolest application using Amazon DynamoDB. We wanted to see what the crowd could come up with.

The feedback we received from numerous developers was extremely positive. The API was very straightforward and easy to work with. The SDKs and docs, as usual, were top-notch. Developers were able to get up to speed fast as DynamoDB’s simple storage and query methods were easy to grasp. These methods allowed developers to store and access data items with a flexible number of attributes using the simple “Put” or “Get” verbs that they are familiar with. No surprise here, but we had a number of comments regarding the speed of both read and write operations.

When our challenge ended a week later we were pleasantly surprised with the applications and chose to highlight the following top five:
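
The “Put” and “Get” verbs mentioned in that feedback really are about as simple as storage APIs get. A minimal boto3 sketch, with a made-up table name and key:

  import boto3

  # Store and fetch one item (table name and key schema are made up for illustration).
  table = boto3.resource("dynamodb").Table("kiva-loans")

  table.put_item(Item={"id": "loan-42", "country": "Kenya", "amount": 350})

  item = table.get_item(Key={"id": "loan-42"})["Item"]
  print(item["country"], item["amount"])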

I don’t think topic maps have 38,000 developers, but challenges do seem to pull people out of the woodwork.

Any thoughts on what would make interesting/attractive challenges? Other than five figure prizes? 😉

April 12, 2012

The CloudFormation Circle of Life: Part 1

Filed under: Amazon Web Services AWS,Cloud Computing — Patrick Durusau @ 7:04 pm

The CloudFormation Circle of Life: Part 1

From the post:

AWS CloudFormation makes it easier for you to create, update, and manage your AWS resources in a predictable way. Today, we are announcing a new feature for AWS CloudFormation that allows you to add or remove resources from your running stack, enabling your stack to evolve as its requirements change over time. With AWS CloudFormation, you can now manage the complete lifecycle of the AWS resources powering your application.
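
In practice, “adding a resource to a running stack” is just an update with a modified template. A minimal boto3 sketch; the stack name and template are placeholders, and a real update would start from the stack’s existing template rather than this toy one:

  import json
  import boto3

  cfn = boto3.client("cloudformation")

  template = {
      "AWSTemplateFormatVersion": "2010-09-09",
      "Resources": {
          "AppBucket": {"Type": "AWS::S3::Bucket"},   # resource already in the stack
          "LogBucket": {"Type": "AWS::S3::Bucket"},   # newly added resource
      },
  }

  cfn.update_stack(
      StackName="my-running-stack",
      TemplateBody=json.dumps(template),
  )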

I think there is a name for this sort of thing. Innovation, that’s right! That’s the name for it!

As topic map services move into the clouds, being able to take advantage of resource stacks is likely to be important. Particularly if you have mapping-empowered resources that can be placed in a stack of resources.

The “cloud” in general looks like an opportunity to move away from ETL (Extract-Transform-Load) into more of an ET (Extract-Transform) model, particularly if you take a functional view of data. That will save on storage costs, especially if the data sets are quite large.

Definitely a service that anyone working with topic maps in the cloud needs to know more about.

April 11, 2012

AWS Documentation Now Available on the Kindle

Filed under: Amazon Web Services AWS — Patrick Durusau @ 4:35 pm

AWS Documentation Now Available on the Kindle

From the post:

AWS documentation is now available on the Kindle – if this is all you need to know, start here and you’ll have access to the new documents in seconds.

I “purchased” (the actual cost is $0.00) the EC2 Getting Started Guide and had it delivered to my trusty Kindle DX, where it looked great:

[graphic omitted]

You can highlight, annotate, and search the content as desired.

We’ve uploaded 43 documents so far; others will follow shortly.

Two observations:

For the “cloud” Kindle (what I use on Linux to read Kindle titles), I should be able to select multiple AWS documentation titles for a single batch download. Yes?

Ahem, at least the “Analyzing Big Data with AWS” guide did not have an index.

Indexing all the AWS titles together (not entirely auto-magically) would make AWS documentation a cut above its competitors. (At least a goal to start with. Later versions can mix in titles from publishers, blogs, etc.)

April 6, 2012

Amazon DynamoDB Libraries, Mappers, and Mock Implementations Galore!

Filed under: Amazon DynamoDB,Amazon Web Services AWS — Patrick Durusau @ 6:50 pm

Amazon DynamoDB Libraries, Mappers, and Mock Implementations Galore!

From the post:

Today’s guest blogger is Dave Lang, Product Manager of the DynamoDB team, who has a great list of tools and SDKs that will allow you to use DynamoDB from just about any language or environment.

While you are learning AWS, you may as well take a look at DynamoDB.

Comments on any of these resources? I just looked at them briefly but they seemed quite, err, uneven.

I understand wanting to thank everyone who made an effort, but on the other hand, I think AWS customers would be well served by a “top coders”-style list of products: X% of the top 100 AWS projects use Y. That sort of thing.

April 2, 2012

The 1000 Genomes Project

The 1000 Genomes Project

If Amazon is hosting a single dataset > 200 TB, is your data “big data?” 😉

This merits quoting in full:

We're very pleased to welcome the 1000 Genomes Project data to Amazon S3. 

The original human genome project was a huge undertaking. It aimed to identify every letter of our genetic code, 3 billion DNA bases in total, to help guide our understanding of human biology. The project ran for over a decade, cost billions of dollars and became the cornerstone of modern genomics. The techniques and tools developed for the human genome were also put into practice in sequencing other species, from the mouse to the gorilla, from the hedgehog to the platypus. By comparing the genetic code between species, researchers can identify biologically interesting genetic regions for all species, including us.

A few years ago there was a quantum leap in the technology for sequencing DNA, which drastically reduced the time and cost of identifying genetic code. This offered the promise of being able to compare full genomes from individuals, rather than entire species, leading to a much more detailed genetic map of where we, as individuals, have genetic similarities and differences. This will ultimately give us better insight into human health and disease.

The 1000 Genomes Project, initiated in 2008, is an international public-private consortium that aims to build the most detailed map of human genetic variation available, ultimately with data from the genomes of over 2,661 people from 26 populations around the world. The project began with three pilot studies that assessed strategies for producing a catalog of genetic variants that are present at one percent or greater in the populations studied. We were happy to host the initial pilot data on Amazon S3 in 2010, and today we're making the latest dataset available to all, including results from sequencing the DNA of approximately 1,700 people.

The data is vast (the current set weighs in at over 200 TB), so hosting the data on S3, which is closely located to the computational resources of EC2, means that anyone with an AWS account can start using it in their research, from anywhere with internet access, at any scale, whilst only paying for the compute power they need, as and when they use it. This enables researchers from laboratories of all sizes to start exploring and working with the data straight away. The Cloud BioLinux AMIs are ready to roll with the necessary tools and packages, and are a great place to get going.

Making the data available via a bucket in S3 also means that customers can crunch the information using Hadoop via Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.

You can find more information, the location of the data and how to get started using it on our 1000 Genomes web page, or from the project pages.
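
Browsing the dataset takes only a few lines, since it sits in a public S3 bucket (named 1000genomes, if I recall correctly; the project’s AWS page is the authoritative pointer). A boto3 sketch using unsigned requests:

  import boto3
  from botocore import UNSIGNED
  from botocore.config import Config

  # Anonymous listing of a public bucket; no AWS credentials required.
  s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

  resp = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=10)
  for obj in resp.get("Contents", []):
      print(obj["Key"], obj["Size"])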

If that sounds like a lot of data, just imagine all of the recorded mathematical texts and the relationships between the concepts represented in those texts.

It is only in our view of it that data looks smooth or simple. Or complex.

The Total Cost of (Non) Ownership of a NoSQL Database Service

Filed under: Amazon DynamoDB,Amazon Web Services AWS,Cloud Computing — Patrick Durusau @ 5:47 pm

The Total Cost of (Non) Ownership of a NoSQL Database Service

From the post:

We have received tremendous positive feedback from customers and partners since we launched Amazon DynamoDB two months ago. Amazon DynamoDB enables customers to offload the administrative burden of operating and scaling a highly available distributed database cluster while only paying for the actual system resources they consume. We also received a ton of great feedback about how simple it is to get started and how easy it is to scale the database. Since Amazon DynamoDB introduced the new concept of a provisioned throughput pricing model, we also received several questions around how to think about its Total Cost of Ownership (TCO).

We are very excited to publish our new TCO whitepaper: The Total Cost of (Non) Ownership of a NoSQL Database service. Download PDF.
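
To see the shape of the provisioned throughput model, here is a toy monthly-cost calculator. The hourly and storage rates are deliberately made-up placeholders, not AWS prices; the whitepaper has the real numbers:

  # Toy calculator for the provisioned throughput pricing model.
  HOURS_PER_MONTH = 730

  def dynamodb_monthly_cost(read_units, write_units, stored_gb,
                            read_rate=0.00013,   # $/read-unit-hour (placeholder)
                            write_rate=0.00065,  # $/write-unit-hour (placeholder)
                            storage_rate=0.25):  # $/GB-month (placeholder)
      throughput = (read_units * read_rate + write_units * write_rate) * HOURS_PER_MONTH
      storage = stored_gb * storage_rate
      return throughput + storage

  print(f"${dynamodb_monthly_cost(read_units=1000, write_units=500, stored_gb=100):,.2f}")

The point is that the bill is a function of the capacity you ask for and the bytes you store, with no line items for servers, patching, or the people who do the patching, which is where the “(Non) Ownership” argument lives.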

I bet you can guess how the numbers work out without reading the PDF file. 😉

Makes me wonder, though, whether there would be a market for a different hosted NoSQL database or topic map application. Particularly a topic map application.

Not along the lines of Maiana, but more of a topic-based data set, which could respond to incoming data by merging it with already stored data. Say, for example, a firefighter scans the bar code on a railroad car lying alongside the tracks with fire getting closer. The only thing they want is a list of the necessary equipment and whether to leave now, or not.

Most preparedness agencies would be well pleased to simply pay for the usage they get of such a topic map.

March 24, 2012

Two New AWS Getting Started Guides

Filed under: Amazon Web Services AWS,Cloud Computing — Patrick Durusau @ 7:36 pm

Two New AWS Getting Started Guides

From the post:

We’ve put together a pair of new Getting Started Guides for Linux and Microsoft Windows. Both guides will show you how to use EC2, Elastic Load Balancing, Auto Scaling, and CloudWatch to host a web application.

The Linux version of the guide (HTML, PDF) is built around the popular Drupal content management system. The Windows version (HTML, PDF) is built around the equally popular DotNetNuke CMS.

These guides are comprehensive. You will learn how to:

  • Sign up for the services
  • Install the command line tools
  • Find an AMI
  • Launch an Instance
  • Deploy your application
  • Connect to the Instance using the MindTerm SSH Client or PuTTY
  • Configure the Instance
  • Create a custom AMI
  • Create an Elastic Load Balancer
  • Update a Security Group
  • Configure and use Auto Scaling
  • Create a CloudWatch Alarm
  • Clean up

Other sections cover pricing, costs, and potential cost savings.
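
For a taste of the “Find an AMI / Launch an Instance” steps, here is the same operation through boto3 rather than the command line tools the guides use. The AMI ID, key pair, and security group are placeholders:

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  resp = ec2.run_instances(
      ImageId="ami-0123456789abcdef0",              # placeholder AMI ID
      InstanceType="t2.micro",
      MinCount=1,
      MaxCount=1,
      KeyName="my-keypair",                         # placeholder key pair
      SecurityGroupIds=["sg-0123456789abcdef0"],    # placeholder security group
  )
  print(resp["Instances"][0]["InstanceId"])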

Not quite a transparent computing fabric, yet. 😉

February 13, 2012

Be Careful When Comparing AWS Costs… (Truth Squad)

Filed under: Amazon Web Services AWS,Marketing — Patrick Durusau @ 8:18 pm

Be Careful When Comparing AWS Costs… (Truth Squad)

Jeff Barr writes:

Earlier today, GigaOM published a cost comparison of self-hosting vs. hosting on AWS. I wanted to bring to your attention a few quick issues that we saw with this analysis:

….

[and concludes]

We did our own calculations taking into account only the first four issues listed above and came up with a monthly cost for AWS of $56,043 (vs. the $70,854 quoted in the article). Obviously each workload differs based on the nature of which resources are utilized most.

These analyses are always tricky to do; you need to make apples-to-apples cost comparisons and weigh the benefits associated with each approach. We’re always happy to work with those wanting to get into the details of these analyses; we continue to focus on lowering infrastructure costs and we’re far from being done.
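
For scale, the gap between the two monthly figures:

  gigaom_estimate = 70_854
  aws_estimate = 56_043

  difference = gigaom_estimate - aws_estimate
  print(difference)                              # $14,811 per month
  print(f"{difference / gigaom_estimate:.1%}")   # about 20.9% below the article's number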

Although I applaud Jeff’s efforts to ensure we have accurate cost information for AWS, that isn’t why I am following up on his post.

Jeff is following a “truth squad” approach. A “truth squad” knows the correct information and uses it in great detail to correct errors made by others.

To anyone not on the “truth squad,” the explanation offered is jargon-riddled to the point of being completely opaque. All I really know is that Jeff disagrees with GigaOM. OK, but that’s not real helpful.

More than a few of my topic map posts, past, present and no doubt future, follow a similar approach. With about as much success.

I have a suggestion for myself and Jeff, one that I won’t follow all the time but will try.

If you can’t explain AWS pricing (or topic maps) on the back of a regulation-size business card, either you don’t have a clear idea or you are explaining it poorly.

Remember that part of Einstein’s theory of relativity can be expressed as E = mc².

Within it lies a vast amount of detail, but it can be expressed very simply.

Something for AWS pricing experts and topic map writers to consider.

February 8, 2012

Amazon S3 Price Reduction

Filed under: Amazon Web Services AWS — Patrick Durusau @ 5:11 pm

Amazon S3 Price Reduction

In case you want to gather up your email archives before turning them into a topic map, Amazon S3 prices have dropped:

As you can tell from my recent post on Amazon S3 Growth for 2011, our customers are uploading new objects to Amazon S3 at an incredible rate. We continue to innovate on your behalf to drive down storage costs and pass along the resultant savings to you at every possible opportunity. We are now happy (some would even say excited) to announce another in a series of price reductions.

With this price change, all Amazon S3 standard storage customers will see a significant reduction in their storage costs. For instance, if you store 50 TB of data on average you will see a 12% reduction in your storage costs, and if you store 500 TB of data on average you will see a 13.5% reduction in your storage costs.
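
The different percentages at 50 TB and 500 TB come from S3’s tiered pricing: storage fills successive tiers at successively lower rates. A toy calculator below; the tier sizes echo the tier names of the era (including the “Next 4000 TB” tier I grumble about next), but the per-GB rates are placeholders, not the actual prices:

  # Toy tiered-storage bill; rates are placeholders, not real S3 prices.
  TIERS = [            # (tier size in TB, placeholder $/GB-month)
      (1,    0.125),   # first 1 TB
      (49,   0.110),   # next 49 TB
      (450,  0.095),   # next 450 TB
      (4000, 0.090),   # next 4000 TB
  ]

  def monthly_storage_cost(tb):
      cost, remaining = 0.0, tb
      for size, rate in TIERS:
          in_tier = min(remaining, size)
          cost += in_tier * 1024 * rate    # TB -> GB
          remaining -= in_tier
          if remaining <= 0:
              break
      return cost

  for tb in (50, 500):
      print(tb, f"${monthly_storage_cost(tb):,.2f}")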

I must confess disappointment that there was no change in the “Next 4000 TB” rate but I suppose I can keep some of my email archives locally. 😉

Other cloud storage options/rates?

February 1, 2012

Amazon S3 Growth for 2011 – Now 762 Billion Objects

Filed under: Amazon Web Services AWS,Semantics — Patrick Durusau @ 4:37 pm

Amazon S3 Growth for 2011 – Now 762 Billion Objects

Just a quick illustration of how one data locale is outstripping efforts to embed semantics in web-based content.

January 28, 2012

About the Performance of Map Reduce Jobs

Filed under: Amazon Web Services AWS,Hadoop,MapReduce — Patrick Durusau @ 10:53 pm

About the Performance of Map Reduce Jobs by Michael Kopp.

From the post:

One of the big topics in the BigData community is Map/Reduce. There are a lot of good blogs that explain what Map/Reduce does and how it works logically, so I won’t repeat it (look here, here and here for a few). Very few of them, however, explain the technical flow of things, which I at least need to understand the performance implications. You can always throw more hardware at a map reduce job to improve the overall time. I don’t like that as a general solution, and many Map/Reduce programs can be optimized quite easily, if you know what to look for. And optimizing a large map/reduce job can be instantly translated into ROI!

The Word Count Example

I went over some blogs and tutorials about performance of Map/Reduce. Here is one that I liked. While there are a lot of good tips out there, none, except the one mentioned, talks about the Map/Reduce program itself. Most dive right into the various Hadoop options to improve distribution and utilization. While this is important, I think we should start with the actual problem we are trying to solve, meaning the Map/Reduce job itself.

To make things simple I am using Amazon’s Elastic MapReduce. In my setup I started a new Job Flow with multiple steps for every execution. The Job Flow consisted of one master node and two task nodes, all using the Small Standard instance type.

While AWS Elastic Map/Reduce has its drawbacks in terms of startup and file latency (Amazon S3 has a high volatility), it is a very easy and consistent way to execute Map/Reduce jobs without needing to set up your own Hadoop cluster. And you only pay for what you need! I started out with the word count example that you see in every map reduce documentation, tutorial, or blog.

Yet another reason (other than avoiding outright failure) for testing your Map/Reduce jobs locally before running them in a pay-for-use environment. The better you understand the job and its requirements, the more likely you are to create an effective and cost-efficient solution.
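
In that spirit, the word count example can be exercised locally with nothing but the standard library before any job flow is paid for. A sketch structured as map / shuffle / reduce (my own sketch, not the code from Kopp’s post):

  # word_count_local.py -- the canonical word count, runnable locally:
  #   python word_count_local.py < sample.txt
  import sys
  from itertools import groupby
  from operator import itemgetter

  def map_phase(lines):
      for line in lines:
          for word in line.split():
              yield word.lower(), 1

  def reduce_phase(pairs):
      # pairs must arrive sorted by key, as Hadoop's shuffle guarantees
      for word, group in groupby(pairs, key=itemgetter(0)):
          yield word, sum(count for _, count in group)

  if __name__ == "__main__":
      pairs = sorted(map_phase(sys.stdin))        # local stand-in for the shuffle
      for word, total in reduce_phase(pairs):
          print(f"{word}\t{total}")

Profiling something like this locally against a slice of your real input is a cheap way to understand the mapper’s behavior before tuning instance types.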

January 26, 2012

AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB

Filed under: Amazon DynamoDB,Amazon Web Services AWS,MapReduce — Patrick Durusau @ 6:44 pm

AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB by Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.

From the post:

Apache Hadoop and NoSQL databases are complementary technologies that together provide a powerful toolbox for managing, analyzing, and monetizing Big Data. That’s why we were so excited to provide out-of-the-box Amazon Elastic MapReduce (Amazon EMR) integration with Amazon DynamoDB, providing customers an integrated solution that eliminates the often prohibitive costs of administration, maintenance, and upfront hardware. Customers can now move vast amounts of data into and out of DynamoDB, as well as perform sophisticated analytics on that data, using EMR’s highly parallelized environment to distribute the work across the number of servers of their choice. Further, as EMR uses a SQL-based engine for Hadoop called Hive, you need only know basic SQL while we handle distributed application complexities such as estimating ideal data splits based on hash keys, pushing appropriate filters down to DynamoDB, and distributing tasks across all the instances in your EMR cluster.

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.
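
The Hive side of that workflow looks roughly like the script below: one external table backed by the DynamoDB storage handler, one backed by S3, and an INSERT between them. Table names, the column mapping, and the S3 path are placeholders of mine; the storage handler class and table properties follow the EMR documentation as I recall it, so verify against the current docs:

  # Roughly the export-to-S3 pattern the article walks through, kept as a
  # HiveQL string to be submitted as an EMR Hive step (or run in the Hive shell).
  EXPORT_SCRIPT = """
  CREATE EXTERNAL TABLE ddb_orders (order_id string, customer string, total double)
  STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
  TBLPROPERTIES (
    "dynamodb.table.name"     = "Orders",
    "dynamodb.column.mapping" = "order_id:OrderId,customer:Customer,total:Total"
  );

  CREATE EXTERNAL TABLE s3_orders (order_id string, customer string, total double)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION 's3://my-bucket/exports/orders/';

  INSERT OVERWRITE TABLE s3_orders SELECT * FROM ddb_orders;
  """

  with open("export_orders.q", "w") as f:
      f.write(EXPORT_SCRIPT)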

Time to get that AWS account!

January 18, 2012

Amazon DynamoDB

Filed under: Amazon DynamoDB,Amazon Web Services AWS — Patrick Durusau @ 7:58 pm

Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications by Werner Vogels.

From the post:

Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet scale applications. DynamoDB is the result of 15 years of learning in the areas of large scale non-relational databases and cloud services. Several years ago we published a paper on the details of Amazon’s Dynamo technology, which was one of the first non-relational databases developed at Amazon. The original Dynamo design was based on a core set of strong distributed systems principles resulting in an ultra-scalable and highly reliable database system. Amazon DynamoDB, which is a new service, continues to build on these principles, and also builds on our years of experience with running non-relational databases and cloud services, such as Amazon SimpleDB and Amazon S3, at scale. It is very gratifying to see all of our learning and experience become available to our customers in the form of an easy-to-use managed service.

Amazon DynamoDB is a fully managed NoSQL database service that provides fast performance at any scale. Today’s web-based applications often encounter database scaling challenges when faced with growth in users, traffic, and data. With Amazon DynamoDB, developers scaling cloud-based applications can start small with just the capacity they need and then increase the request capacity of a given table as their app grows in popularity. Their tables can also grow without limits as their users store increasing amounts of data. Behind the scenes, Amazon DynamoDB automatically spreads the data and traffic for a table over a sufficient number of servers to meet the request capacity specified by the customer. Amazon DynamoDB offers low, predictable latencies at any scale. Customers can typically achieve average service-side latencies in the single-digit milliseconds. Amazon DynamoDB stores data on Solid State Drives (SSDs) and replicates it synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability.
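
What “start small … then increase the request capacity of a given table” looks like through the API, as a boto3 sketch (table name and key schema are made up; the table has to finish creating before the update is accepted):

  import boto3

  ddb = boto3.client("dynamodb")

  ddb.create_table(
      TableName="app-events",
      AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
      KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
      ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
  )

  # Later, when the app grows in popularity, dial the same table up:
  ddb.update_table(
      TableName="app-events",
      ProvisionedThroughput={"ReadCapacityUnits": 500, "WriteCapacityUnits": 200},
  )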

Impressive numbers and I am sure this is impressive software.

Two questions: Werner starts off talking about “internet scale” and then in the second paragraph says there is “…fast performance at any scale.”

Does anybody know what “internet scale” means? If they said U.S. Census scale, where I know software has been developed for record linkage on billion row tables, then I might have some idea of what is meant. If you know or can point to someone who does, please comment.

Second question: So if I need the Amazon DynamoDB because it handles “internet scale,” why would I need it for something less? My wife needs a car to go back and forth to work, but that doesn’t mean she needs a Hummer. Yes? I would rather choose a tool that is fit for the intended purpose. If you know a sensible break point for choosing the Amazon DynamoDB, please comment.

Disclosure: I buy books and other stuff at Amazon. But I don’t think my purchases past, present or future have influenced my opinions in this post. 😉

First seen at: myNoSQL as: Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications.

November 9, 2011

Apache Mahout: Scalable machine learning for everyone

Filed under: Amazon Web Services AWS,Mahout — Patrick Durusau @ 7:41 pm

Apache Mahout: Scalable machine learning for everyone by Grant Ingersoll.

Summary:

Apache Mahout committer Grant Ingersoll brings you up to speed on the current version of the Mahout machine-learning library and walks through an example of how to deploy and scale some of Mahout’s more popular algorithms.

A short summary of a twenty-three (23) page paper that concludes with two (2) pages of pointers to additional resources!

You will learn a lot about Mahout and Amazon Web Services (EC2).
