Archive for the ‘MapR’ Category

MapR and Ubuntu

Wednesday, April 3rd, 2013

MapR has posted all of its Hadoop ecosystem source code to Github: MapR Technologies.

MapR has also partnered with Canonical to release the entire Hadoop stack for 12.04 LTS and 12.10 releases of Ubuntu on www.ubuntu.com starting April 25, 2013.

For details see: MapR Teams with Canonical to Deliver Hadoop on Ubuntu.

I first saw this at: MapR Turns to Ubuntu in Bid to Increase Footprint by Isaac Lopez.

LucidWorks™ Teams with MapR™… [Not 26% but 5-6% + not from Big Data]

Wednesday, February 20th, 2013

LucidWorks™ Teams with MapR™ Technologies to Offer Best-in-Class Big Data Analytics Solution

Performance Day just keeps on going!

From the press release:

REDWOOD CITY, Calif. – February 20, 2013 – Big Data provides a very real opportunity for organizations to drive business decisions by utilizing new information that has yet to be tapped. However, it is increasingly apparent that organizations are struggling to make effective use of this new multi-structured content for data-driven decision-making. According to a report from the Economist Intelligence Unit, the challenge is not so much the volume, but instead it is the pressing need to analyze and act on Big Data in real-time.

Existing business intelligence (BI) tools have simply not been designed to provide spontaneous search on multi-structured data in motion. Responding directly to this need, LucidWorks, the company transforming the way people access information, and MapR Technologies, the Hadoop technology leader, today announced the integration between LucidWorks Search™ and MapR. Available now, the combined solution allows organizations to easily search their MapR Distributed File System (DFS) in a natural way to discover actionable insights from information maintained in Hadoop.

“Organizations that wait to address big data until this evolution is well under way will lose out competitively in their vertical markets, compared to organizations that have aggressively pursued big data flexibility. Aggressive organizations will demonstrate faster, more accurate analysis and decisions relating to their tactical operations and strategic planning.”

  • Source: Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016, Gartner Group

Integration Solution Highlights

  • Combines the best of Big Data with Search with an integrated and fully distributed solution
  • Supports a pre-defined MapR target data source within LucidWorks Search
  • Enables users to create and configure the MapR data source directly from the LucidWorks Search administration console
  • Leverages enterprise security features offered by both MapR and LucidWorks Search

The Economist Intelligence Unit study found that global companies experienced a 26 percent improvement in performance over the last three years when big data analytics were applied to the decision-making process. And now, those data-savvy executives are forecasting a 41 percent improvement over the next three years. The integration between LucidWorks Search and MapR makes it easier to put Big Data analytics in motion.

I’m really excited about this match up but you know I can’t simply let claims like “…global companies experienced a 26 percent improvement in performance….” slide by. ;-)

If you go read the report, The Deciding Factor: Big Data & Decision Making, you will find at page six (6):

On average, survey participants say that big data has improved their organisations’ performance in the past three years by 26%, and they are optimistic that it will improve performance by an average of 41% in the next three years. While “performance” in this instance is not rigorously specified, it is a useful gauge of mood.

The measured difference in performance, from:

firms that emphasise decision-making based on data and analytics performed 5-6% better—as measured by output and performance—than those that rely on intuition and experience for decision-making.

So, not 26% but 5-6% measured, and the 5-6% is for decision-making based on data and analytics, not big data.

You don’t find code written at either LucidWorks or MapR that is “close enough.” Both have well deserved reputations for clean code and hard work.

Why should communications fall short of that mark?

Reflective Intelligence and Unnatural Acts

Thursday, December 13th, 2012

I wasn’t in the best of shape today but did manage to attend the webinar: Crowd Sourcing Reflected Intelligence Using Search and Big Data.

Not a lot of detail but there were two topics that caught my attention.

The first was “reflective intelligence,” that is, a system that reflects the intelligence of its users back to other users.

Intelligence derived from tracking “clicks,” search terms, etc.

Question: How does your topic map solution “reflect” the intelligence of its users?

That is, how do responses “improve” (by some measure) as a result of user interaction?

It could be measuring user behavior: which links do users select for particular query terms? (That is an example from the webinar.) Or it could be users adding information, perhaps even suggesting/voting on merges.
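To make the click-tracking idea concrete, here is a minimal sketch in Python. It is not taken from the webinar or from any LucidWorks or MapR product; the ClickLog class, its method names, and the sample links are all hypothetical. It assumes that "reflecting" intelligence means nothing more than counting which links users select for a query and moving the most-selected links to the top for the next user.

```python
from collections import Counter, defaultdict


class ClickLog:
    """Hypothetical in-memory log of which links users select per query."""

    def __init__(self):
        # normalized query -> Counter mapping link -> number of clicks
        self._clicks = defaultdict(Counter)

    @staticmethod
    def _normalize(query):
        # Assumption: queries differing only in case/whitespace are the same
        return query.strip().lower()

    def record(self, query, link):
        """Record that a user selected `link` after searching for `query`."""
        self._clicks[self._normalize(query)][link] += 1

    def rerank(self, query, results):
        """Return `results` with the most-clicked links for `query` first."""
        counts = self._clicks.get(self._normalize(query), Counter())
        # sorted() is stable, so links with equal counts keep their order
        return sorted(results, key=lambda link: -counts[link])


log = ClickLog()
for link in ["b.html", "c.html", "b.html"]:
    log.record("topic maps", link)

print(log.rerank("Topic Maps", ["a.html", "b.html", "c.html"]))
```

A real system would need to weigh click counts against the engine's own relevance score and guard against feedback loops, which this sketch ignores.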

The second riff that got my attention was a description of the software under discussion as:

“I don’t have to do unnatural acts.”

Is that like the Papa John’s “better ingredients?” Taken to imply that other pizzas use sub-par ingredients?

Or in this case, other software solutions require “unnatural acts?”

Interesting selling point.

What unusual properties would you claim for topic maps or topic map software?

Crowd Sourcing Reflected Intelligence Using Search and Big Data [Webinar]

Monday, December 3rd, 2012

Crowd Sourcing Reflected Intelligence Using Search and Big Data

Date: December 13, 2012

Time: 10:00 am PT / 1:00 pm ET

From the webpage:

Anyone interested in drawing insights from their Big Data repository/project/application should attend this informative webinar brought to you by MapR and LucidWorks. LucidWorks Search is a development platform that accelerates and simplifies building highly secure, scalable, and cost-effective search applications.

This webinar will show:

  • how search users’ search behavior can be mined
  • how big data analytics can be applied to that raw data
  • how to redeploy that data back to the users to improve their experience

Experts from MapR and Lucidworks will show the strengths of combining the easiest, most dependable and fastest distribution for Hadoop with the real-time, ad hoc data accessibility of LucidWorks Search to provide analytic capabilities along with scalable machine learning algorithms for deeper insight into both content and user behavior.

Speakers: Grant Ingersoll, Chief Scientist for LucidWorks and Ted Dunning, Chief Application Architect for MapR.

I have seen Grant on video and it was great. If Ted is anywhere close to as good as Grant, this is going to be a webinar to remember!

MapR Now Available as an Option on Amazon Elastic MapReduce

Sunday, June 17th, 2012

MapR Now Available as an Option on Amazon Elastic MapReduce

From the post:

MapR Technologies, Inc., the provider of the open, enterprise-grade distribution for Apache Hadoop, today announced the immediate availability of its MapR Distribution for Hadoop as an option within the Amazon Elastic MapReduce service. Customers can now provision dynamically scalable MapR clusters while taking advantage of the flexibility, agility and massive scalability of Amazon Web Services (AWS). In addition, AWS has made its own Hadoop enhancements available to MapR customers, allowing them to seamlessly use MapR with other AWS offerings such as Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB and Amazon CloudWatch.

“We’re excited to welcome MapR’s feature-rich distribution as an option for customers running Hadoop in the cloud,” said Peter Sirota, general manager of Amazon Elastic MapReduce, AWS. “MapR’s innovative high availability data protection and performance features combined with Amazon EMR’s managed Hadoop environment and seamless integration with other AWS services provides customers a powerful tool for generating insights from their data.”

Customers can provision MapR clusters on-demand and automatically terminate them after finishing data processing, reducing costs as they only pay for the resources they consume. Customers can augment their existing on-premise deployments with AWS-based clusters to improve disaster recovery and access additional compute resources as required.

“For many customers there is no longer a compelling business case for deploying an on-premise Hadoop cluster given the secure, flexible and highly cost effective platform for running MapR that AWS provides,” said John Schroeder, CEO and co-founder, MapR Technologies. “The combination of AWS infrastructure and MapR’s technology, support and management tools enables organizations to potentially lower their costs while increasing the flexibility of their data intensive applications.”

Are you doing topic maps in the cloud yet?

A rep from one of the “big iron” companies was telling me how much more reliable owning your own hardware with their software is than the cloud.

True, but that has the same answer as the question: Who needs the capacity to process petabytes of data in real time?

If the truth were told, there are a few companies and organizations that could benefit from that capability.

But the rest of us don’t have that much data or the talent to process it if we did.

Over the summer I am going to try the cloud out, both generally and for topic maps.

Suggestions/comments?

The Search Is Over: Integrating Solr and Hadoop to Simplify Big Data Analytics

Sunday, May 27th, 2012

The Search Is Over: Integrating Solr and Hadoop to Simplify Big Data Analytics

From MapR Technologies.

Show of hands. How many of you can name the solution found in these slides?

;-)

Slides are great for entertainment.

Solutions require more, a great deal more.

For the “more” on MapR, see: Download Hadoop Software Datasheets, Product Documentation, White Papers

Mr. MapR: A Xoogler

Sunday, January 8th, 2012

Mr. MapR: A Xoogler

Cynthia Murrell of BeyondSearch writes:

Wired Enterprise gives us a glimpse into MapR, a new distribution for Apache Hadoop, in “Ex-Google Man Sells Search Genius to Rest of World.” The ex-Googler in this case is M.C. Srivas, who was so impressed with Google’s MapReduce platform that he decided to spread its concepts to the outside world.

Sounds great! So I head over to the MapR site and choose Unique Features of MapR Hadoop Distribution, where I find:

  • Finish small jobs quickly with MapR ExpressLane
  • Mount your Hadoop cluster with Direct Access NFS™
  • Enable realtime data flows
  • Use the MapR Heatmap™, alerts, and alarms to monitor your cluster
  • Manage your data easily with Volumes
  • Scale up and create an unlimited number of files
  • Get jobs done faster with half the hardware
  • Eliminate downtime and performance bottlenecks with Distributed NameNode HA
  • Eliminate lost jobs with HA Jobtracker
  • Enable Point-in-time Recovery with MapR Snapshots
  • Synchronize data across clusters with Mirroring
  • Let multiple jobs safely share your Hadoop cluster
  • Control data placement for improved performance, security or manageability

Maybe I am missing it. Do you see any Search Genius in that list?

MapR may have improved the usability/reliability of Hadoop, which is no small thing, but that is disappointing when you are looking for better search results.

Let’s represent the original Hadoop with this Wikipedia image:

[Image: Additor]

and the MapR version of Hadoop with this Wikipedia image:

It is true that the MapR version has more unique features but none of them appear to relate to search.

I am sure that Hadoop cluster managers and others will be interested in MapR (as will some of the rest of us), as managers.

As searchers, we may have to turn somewhere else. Do you disagree?

PS: Cloudera has made more contributions to the Hadoop and Apache communities than I can list in a very long post. Keep that in mind when you see ill-mannered and juvenile sniping at their approach to Hadoop.