Archive for the ‘LucidWorks’ Category

LucidWorks™ Teams with MapR™… [Not 26% but 5-6% + not from Big Data]

Wednesday, February 20th, 2013

LucidWorks™ Teams with MapR™ Technologies to Offer Best-in-Class Big Data Analytics Solution

Performance Day just keeps on going!

From the press release:

REDWOOD CITY, Calif. – February 20, 2013 – Big Data provides a very real opportunity for organizations to drive business decisions by utilizing new information that has yet to be tapped. However, it is increasingly apparent that organizations are struggling to make effective use of this new multi-structured content for data-driven decision-making. According to a report from the Economist Intelligence Unit, the challenge is not so much the volume, but instead it is the pressing need to analyze and act on Big Data in real-time.

Existing business intelligence (BI) tools have simply not been designed to provide spontaneous search on multi-structured data in motion. Responding directly to this need, LucidWorks, the company transforming the way people access information, and MapR Technologies, the Hadoop technology leader, today announced the integration between LucidWorks Search™ and MapR. Available now, the combined solution allows organizations to easily search their MapR Distributed File System (DFS) in a natural way to discover actionable insights from information maintained in Hadoop.

“Organizations that wait to address big data until this evolution is well under way will lose out competitively in their vertical markets, compared to organizations that have aggressively pursued big data flexibility. Aggressive organizations will demonstrate faster, more accurate analysis and decisions relating to their tactical operations and strategic planning.”

  • Source: Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016, Gartner Group

Integration Solution Highlights

  • Combines the best of Big Data with Search with an integrated and fully distributed solution
  • Supports a pre-defined MapR target data source within LucidWorks Search
  • Enables users to create and configure the MapR data source directly from the LucidWorks Search administration console
  • Leverages enterprise security features offered by both MapR and LucidWorks Search

The Economist Intelligence Unit study found that global companies experienced a 26 percent improvement in performance over the last three years when big data analytics were applied to the decision-making process. And now, those data-savvy executives are forecasting a 41 percent improvement over the next three years. The integration between LucidWorks Search and MapR makes it easier to put Big Data analytics in motion.

I’m really excited about this match up but you know I can’t simply let claims like “…global companies experienced a 26 percent improvement in performance….” slide by. ;-)

If you go read the report, The Deciding Factor: Big Data & Decision Making, you will find at page six (6):

On average, survey participants say that big data has improved their organisations’ performance in the past three years by 26%, and they are optimistic that it will improve performance by an average of 41% in the next three years. While “performance” in this instance is not rigorously specified, it is a useful gauge of mood.

The measured difference in performance, from:

firms that emphasise decision-making based on data and analytics performed 5-6% better—as measured by output and performance—than those that rely on intuition and experience for decision-making.

So, not 26% but 5-6% measured and the 5-6% is for decision-making on data and analytics, not big data.

You don’t find code written at either LucidWorks or MapR that is “close enough.” Both have well deserved reputations for clean code and hard work.

Why should communications fall short of that mark?

Searching for Dark Data

Tuesday, February 19th, 2013

Searching for Dark Data by Paul Doscher.

From the post:

We live in a highly connected world where every digital interaction spawns chain reactions of unfathomable data creation. The rapid explosion of text messaging, emails, video, digital recordings, smartphones, RFID tags and those ever-growing piles of paper – in what was supposed to be the paperless office – has created a veritable ocean of information.

Welcome to the world of Dark Data

Welcome to the world of Dark Data, the humongous mass of constantly accumulating information generated in the Information Age. Whereas Big Data refers to the vast collection of the bits and bytes that are being generated each nanosecond of each day, Dark Data is the enormous subset of unstructured, untagged information residing within it.

Research firm IDC estimates that the total amount of digital data, aka Big Data, will reach 2.7 zettabytes by the end of this year, a 48 percent increase from 2011. (One zettabyte is equal to one billion terabytes.) Approximately 90 percent of this data will be unstructured – or Dark.

Dark Data has thrown traditional business intelligence and reporting technologies for a loop. The software that countless executives have relied on to access information in the past simply cannot locate or make sense of the unstructured data that comprises the bulk of content today and tomorrow. These tools are struggling to tap the full potential of this new breed of data.

The good news is that there’s an emerging class of technologies that is ready to pick up where traditional tools left off and carry out the crucial task of extracting business value from this data.

Effective exploration of Dark Data will require something different from search tools that depend upon:

  • Pre-specified semantics (RDF) because Dark Data has no pre-specified semantics.
  • Structure because Dark Data has no structure.

Effective exploration of Dark Data will require:

Machine assisted-Interactive searching with gifted and grounded semantic comparators (people) creating pathways, tunnels and signposts into the wilderness of Dark Data.

I first saw this at: Delving into Dark Data.

Reflective Intelligence and Unnatural Acts

Thursday, December 13th, 2012

I wasn’t in the best of shape today but did manage to attend the webinar: Crowd Sourcing Reflected Intelligence Using Search and Big Data.

Not a lot of detail but there were two topics that caught my attention.

The first was “reflective intelligence,” that is, a system that reflects the intelligence of the users back to other users.

Intelligence derived from tracking “clicks,” search terms, etc.

Question: How does your topic map solution “reflect” the intelligence of its users?

That is, how do responses “improve” (by some measure) as a result of user interaction?

It could be measuring user behavior, such as which links users select for particular query terms. (That is an example from the webinar.) Or it could be users adding information, perhaps even suggesting/voting on merges.
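To make the click-tracking example concrete, here is a minimal sketch in Python. It is not anything shown in the webinar: the `ClickFeedback` class, its method names and the document ids are all hypothetical, and a real system would weigh clicks against the engine's own relevance score rather than sorting on clicks alone.

```python
from collections import defaultdict


class ClickFeedback:
    """Toy store of which results users clicked for each query (hypothetical)."""

    def __init__(self):
        # query -> document id -> number of recorded clicks
        self._clicks = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, doc_id):
        """Remember that some user picked doc_id after searching for query."""
        self._clicks[query.lower()][doc_id] += 1

    def rerank(self, query, results):
        """Return results with the most-clicked documents first.

        sorted() is stable, so documents with equal click counts keep
        the order the underlying search engine gave them.
        """
        counts = self._clicks.get(query.lower(), {})
        return sorted(results, key=lambda doc_id: -counts.get(doc_id, 0))


feedback = ClickFeedback()
feedback.record_click("topic maps", "doc-3")
feedback.record_click("topic maps", "doc-3")
feedback.record_click("topic maps", "doc-2")

reranked = feedback.rerank("topic maps", ["doc-1", "doc-2", "doc-3"])
print(reranked)  # ['doc-3', 'doc-2', 'doc-1']
```

The choices of earlier users are "reflected" to the next user as a changed ordering. Nothing about why they clicked is captured.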

The second riff that got my attention was a description of the software under discussion as:

“I don’t have to do unnatural acts.”

Is that like the Papa John’s “better ingredients?” Taken to imply that other pizzas use sub-par ingredients?

Or in this case, other software solutions require “unnatural acts?”

Interesting selling point.

What unusual properties would you claim for topic maps or topic map software?

Crowd Sourcing Reflected Intelligence Using Search and Big Data [Webinar]

Monday, December 3rd, 2012

Crowd Sourcing Reflected Intelligence Using Search and Big Data

Date: December 13, 2012

Time: 10:00 am PT / 1:00 pm ET

From the webpage:

Anyone interested in drawing insights from their Big Data repository/project/application should attend this informative webinar brought to you by MapR and LucidWorks. LucidWorks Search is a development platform that accelerates and simplifies building highly secure, scalable, and cost-effective search applications.

This webinar will show:

  • how search users’ search behavior can be mined
  • how big data analytics can be applied to that raw data
  • how to redeploy that data back to the users to improve their experience

Experts from MapR and Lucidworks will show the strengths of combining the easiest, most dependable and fastest distribution for Hadoop with the real-time, ad hoc data accessibility of LucidWorks Search to provide analytic capabilities along with scalable machine learning algorithms for deeper insight into both content and user behavior.

Speakers: Grant Ingersoll, Chief Scientist for LucidWorks and Ted Dunning, Chief Application Architect for MapR.

I have seen Grant on video and it was great. If Ted is anywhere close to as good as Grant, this is going to be a webinar to remember!

LucidWorks Announces Lucene Revolution 2013

Sunday, November 18th, 2012

LucidWorks Announces Lucene Revolution 2013 by Paul Doscher, CEO of LucidWorks.

From the webpage:

LucidWorks, the trusted name in Search, Discovery and Analytics, today announced that Lucene Revolution 2013 will take place at The Westin San Diego on April 29 – May 2, 2013. Many of the brightest minds in open source search will convene at this 4th annual Lucene Revolution to discuss topics and trends driving the next generation of search. The conference will be preceded by two days of Apache Lucene, Solr and Big Data training.

BTW, the call for papers opened up on November 12, 2012, but you still have time left: http://lucenerevolution.org/2013/call-for-papers

Jan. 13, 2013: CFP closes
Feb 1, 2013: Speakers notified

Searching Big Data’s Open Source Roots

Monday, October 22nd, 2012

Searching Big Data’s Open Source Roots by Nicole Hemsoth.

Nicole talks to Grant Ingersoll, Chief Scientist at LucidWorks, about the open source roots of big data.

No technical insights but a nice piece to pass along to the c-suite. Investment in open source projects can pay rich dividends. So long as you don’t need them next quarter. ;-)

And a snapshot of where we are now, which is on the brink of new tools and capabilities in search technologies.

Proximity Operators [LucidWorks]

Thursday, August 16th, 2012

Proximity Operators

From the webpage:

A proximity query searches for terms that are either near each other or occur in a specified order in a document rather than simply whether they occur in a document or not.

You will use some of these operators more than others but having a bookmark to the documentation will prove to be useful.
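If the idea of a proximity query is new to you, the following Python sketch shows the test being made. It is only an illustration, assuming whitespace tokenization and distance measured as the difference in word positions; it is not the LucidWorks query parser, and the function name and parameters are my own.

```python
def within_proximity(text, first, second, max_distance, ordered=False):
    """Return True if `first` and `second` occur within `max_distance`
    word positions of each other in `text`.

    With ordered=True, `first` must also appear before `second`.
    Tokenization is a plain whitespace split, so punctuation is not handled.
    """
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    for i in first_positions:
        for j in second_positions:
            if i == j:
                continue  # same token, only possible when first == second
            gap = j - i
            if ordered and gap < 0:
                continue  # `second` came before `first`
            if abs(gap) <= max_distance:
                return True
    return False


doc = "dark data hides inside big data stores"
print(within_proximity(doc, "big", "data", 1))                 # True
print(within_proximity(doc, "dark", "stores", 3))              # False
print(within_proximity(doc, "data", "dark", 1))                # True
print(within_proximity(doc, "data", "dark", 1, ordered=True))  # False
```

A plain "both terms occur" query would match in all four cases, which is the difference the documentation is describing.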

Lucid Imagination becomes LucidWorks [Man Bites Dog Story]

Monday, August 13th, 2012

Lucid Imagination becomes LucidWorks

Soft news except for the note about the soon to appear SearchHub.org (September, 2012).

And the company listening to users refer to it as LucidWorks and deciding to change the name of the company from Lucid Imagination to LucidWorks.

Sort of a man bites dog story, don’t you think?

Hurray for LucidWorks!

Makes me curious about the SearchHub.org site. Likely to listen to users there as well.

Lucene Eurocon / ApacheCon Europe

Wednesday, August 8th, 2012

Lucene Eurocon / ApacheCon Europe November 5-8 | Sinsheim, Germany

From a post I got today from Lucid Imagination:

Lucid Imagination and the Apache Foundation have agreed to co-locate Lucid’s Apache Lucene EuroCon with ApacheCon Europe being held this November 5-8 in Sinsheim, Germany. Lucene EuroCon at ApacheCon Europe will cover the breadth and depth of search innovation and application. The dedicated track will bring together Apache Lucene/Solr committers and technologists from around the world to offer compelling presentations that share future directions for the project and technical implementation experiences. Topic examples include channeling the flood of structured and unstructured data into faster, more cost-effective Lucene/Solr search applications that span a host of sectors and industries.

Some of the most talented Lucene/Solr developers gather each year at Apache Lucene EuroCon to share best practices and create next-generation search applications. Coupling Apache Lucene EuroCon with this year’s ApacheCon Europe offers a great benefit to the community at large. The combined attendees benefit from expert trainings and in-depth sessions, real-world case studies, excellent networking and the opportunity to connect with the industry’s leading minds.

Call For Papers Deadline is August 13

The Call for Papers for ApacheCon has been extended to August 13, 2012, and can be found on the ApacheCon website. As always, proceeds from Apache Lucene EuroCon benefit The Apache Software Foundation. We encourage all Lucene/Solr committers and developers who have a technical story to tell to submit an abstract. Apache Lucene/Solr has a rich community of developers. Supporting ApacheCon Europe by submitting your abstract and sharing your story is important for maintaining this important and thriving community.

Just so you don’t think this is a search only event, papers are welcome on:

  • Apache Daily – Tools frameworks and components used on a daily basis
  • ApacheEE – Java enterprise projects
  • Big Data – Cassandra, Hadoop, HBase, Hive, Kafka, Mahout, Pig, Whirr, ZooKeeper and friends
  • Camel in Action – All things Apache Camel, from their problems to their solutions
  • Cloud – Cloud-related applications of a broad range of Apache projects
  • Linked Data – (need a concise caption for this track)
  • Lucene, SOLR and Friends – Learn about important web search technologies from the experts
  • Modular Java Applications – Using Felix, ACE, Karaf, Aries and Sling to deploy modular Java applications to public and private cloud environments
  • NoSQL Database – Use cases and recent developments in Cassandra, HBase, CouchDB and Accumulo
  • OFBiz – The Apache Enterprise Automation project
  • Open Office – Open Office and the Apache Content Ecosystem
  • Web Infrastructure – HTTPD, TomCat and Traffic Server, the heart of many Internet projects

Submissions are welcome from any developer or user of Apache projects. First-time speakers are just as welcome as experienced ones, and we will do our best to make sure that speakers get all the help they need to give a great presentation.

Reducing Software Highway Friction

Thursday, June 7th, 2012

Lucid Imagination Search Product Offered in Windows Azure Marketplace

From the post:

Ease of use and flexibility are two key business drivers that are fueling the rapid adoption of cloud computing. The ability to disconnect an application from its supporting architecture provides a new level of business agility that has never before been possible. To ease the move towards this new realm of computing, integrated platforms have begun to emerge that make cloud computing easier to adopt and leverage.

Lucid Imagination, a trusted name in Search, Discovery and Analytics, today announced that its LucidWorks Cloud product has been selected by Microsoft Corp. to be offered as a Search-as-a-Service product in Microsoft’s Windows Azure Marketplace. LucidWorks Cloud is a full cloud service version of its LucidWorks Enterprise platform. LucidWorks Cloud delivers full open source Apache Lucene/Solr community innovation with support and maintenance from the world’s leading experts in open source search. An extensible platform architected for developers, LucidWorks Cloud is the only Solr distribution that provides security, abstraction and pre-built connectors for essential enterprise data sources – along with dramatic ease of use advantages in a well-tested, integrated and documented package.

Example use cases for LucidWorks Cloud include Search-as-a-Service for websites, embedding search into SaaS product offerings, and Prototyping and developing cloud-based search-enabled applications in general.

…..

Highlights of LucidWorks Cloud Search-as-a-Service

  • Sign-up for a plan and start building your search application in minutes
  • Well-organized UI makes Apache Lucene/Solr innovation easier to consume and more adaptable to constant change
  • Create multiple search collections and manage them independently
  • Configure index and query settings, fields, stop words, synonyms for each collection
  • Built-in support for Hadoop, Microsoft SharePoint and traditional online content types
  • An open connector framework is available to customize access to other data sources
  • REST API automates and integrates search as a service with an application
  • Well-instrumented dashboard for infrastructure administration, monitoring and reporting
  • Monitored 24×7 by Lucid Development Operations ensuring minimum downtime

Source: PR Newswire (http://s.tt/1dzre)

I find this deeply encouraging.

It is a step towards a diverse but reduced friction software highway.

The user community is not well served by uniform models for data, software or UIs.

The user community can be well served by a reduced friction software highway as they move data from application to application.

Microsoft has taken a large step towards a reduced friction software highway today. And it is appreciated!

Different ways to make auto suggestions with Solr

Monday, June 4th, 2012

Different ways to make auto suggestions with Solr

From the post:

Nowadays almost every website has a full text search box as well as the auto suggestion feature in order to help users to find what they are looking for, by typing the least possible number of characters possible. The example below shows what this feature looks like in Google. It progressively suggests how to complete the current word and/or phrase, and corrects typo errors. That’s a meaningful example which contains multi-term suggestions depending on the most popular queries, combined with spelling correction.

Starts with seven (7) questions you should ask yourself about auto-suggestions and then covers four methods for implementing them in Solr.

You can have the typical word completion seen in most search engines or you can be more imaginative, using custom dictionaries.
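As a rough picture of word completion from a custom dictionary, here is a small Python sketch. It is not one of the four Solr methods from the post and does not use any Solr component; the `PrefixSuggester` class and the popularity counts are made up for illustration.

```python
import bisect


class PrefixSuggester:
    """Suggest dictionary phrases by prefix, most popular first (hypothetical)."""

    def __init__(self, query_counts):
        # query_counts: dict mapping phrase -> popularity (e.g. past query count)
        self._counts = {p.lower(): c for p, c in query_counts.items()}
        # A sorted list keeps all phrases sharing a prefix next to each other.
        self._phrases = sorted(self._counts)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # First position at which a phrase starting with `prefix` could appear.
        start = bisect.bisect_left(self._phrases, prefix)
        matches = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix):
                break  # past the block of phrases sharing the prefix
            matches.append(phrase)
        # Most popular first; ties broken alphabetically.
        matches.sort(key=lambda p: (-self._counts[p], p))
        return matches[:limit]


suggester = PrefixSuggester({
    "solr": 50,
    "solr spellcheck": 30,
    "solr suggester": 20,
    "search": 40,
    "lucene": 10,
})
print(suggester.suggest("sol", limit=2))  # ['solr', 'solr spellcheck']
print(suggester.suggest("x"))             # []
```

Swap the dictionary for product names, author names or past queries and you have the "more imaginative" variants, at least in miniature.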

Lucene conference touches many areas of growth in search

Monday, May 14th, 2012

Lucene conference touches many areas of growth in search by Andy Oram.

From the post:

With a modern search engine and smart planning, web sites can provide visitors with a better search experience than Google. For instance, Google may well turn up interesting results if you search for a certain kind of shirt, but a well-designed clothing site can also pull up related trousers, skirts, and accessories. It’s not Google’s job to understand the intricate interrelationships of data on a particular web property, but the site’s own team can constantly tune searches to reflect what the site has to offer and what its visitors uniquely need.

Hence the importance of search engines like Solr, based on the Lucene library. Both are open source Apache projects, maintained by Lucid Imagination, a company founded to commercialize the underlying technology. I attended parts of Lucid Imagination’s conference this week, Lucene Revolution, and found Lucene evolving in the ways much of the computer industry is headed.

Andy’s summary of the conference will make you wonder two things:

  1. Why weren’t you at the Lucene Revolution conference this year?
  2. Where are the videos from Lucene Revolution 2012?

I won’t ever be able to answer #1 but will post an answer to #2 as soon as it is available.

Dark Data

Sunday, May 13th, 2012

Lucid Imagination Combines Search, Analytics and Big Data to Tackle the Problem of Dark Data

This post was too well written to break up as quotes/excerpts. I am re-posting it in full.

Organizations today have little to no idea how much lost opportunity is hidden in the vast amounts of data they’ve collected and stored.  They have entered the age of total data overload driven by the sheer amount of unstructured information, also called “dark” data, which is contained in their stored audio files, text messages, e-mail repositories, log files, transaction applications, and various other content stores.  And this dark data is continuing to grow, far outpacing the ability of the organization to track, manage and make sense of it.

Lucid Imagination, a developer of search, discovery and analytics software based on Apache Lucene and Apache Solr technology, today unveiled LucidWorks Big Data. LucidWorks Big Data is the industry’s first fully integrated development stack that combines the power of multiple open source projects including Hadoop, Mahout, R and Lucene/Solr to provide search, machine learning, recommendation engines and analytics for structured and unstructured content in one complete solution available in the cloud.

Tweet This: Lucid Imagination combines #search, analytics and #BigData in complete stack. Beta now open http://ow.ly/aMHef

With LucidWorks Big Data, Lucid Imagination equips technologists and business users with the ability to initially pilot Big Data projects utilizing technologies such as Apache Lucene/Solr, Mahout and Hadoop, in a cloud sandbox. Once satisfied, the project can remain in the cloud, be moved on premise or executed within a hybrid configuration.  This means they can avoid the staggering overhead costs and long lead times associated with infrastructure and application development lifecycles prior to placing their Big Data solution into production.

The product is now available in beta. To sign up for inclusion in the beta program, visit http://www.lucidimagination.com/products/lucidworks-search-platform/lucidworks-big-data.

Dark Data Problem Is Real

How big is the problem of dark data? The total amount of digital data in the world will reach 2.7 zettabytes in 2012, a 48 percent increase from 2011.* 90 percent of this data will be unstructured or “dark” data. Worldwide, 7.5 quintillion bytes of data, enough to fill over 100,000 Libraries of Congress get generated every day. Conversely, that deep volume of data can serve to help predict the weather, uncover consumer buying patterns or even ease traffic problems – if discovered and analyzed proactively.

“We see a strong opportunity for search to play a key role in the future of data management and analytics,” said Matthew Aslett, research manager, data management and analytics, 451 Research. “Lucid’s Big Data offering, and its combination of large-scale data storage in Hadoop with Lucene/Solr-based indexing and machine-learning capabilities, provides a platform for developing new applications to tackle emerging data management challenges.”

LucidWorks Big Data

Data analytics has traditionally been the domain of business intelligence technologies. Most of these tools, however, have been designed to handle structured data such as SQL, and cannot easily tap into the broad range of data types that can be used in a Big Data application. With the announcement of LucidWorks Big Data, organizations will be able to utilize a single platform for their Big Data search, discovery and analytics needs. LucidWorks Big Data is the only complete platform that:

  • Combines the real time, ad hoc data accessibility of LucidWorks (Lucene/Solr) with compute and storage capabilities of Hadoop
  • Delivers commonly used analytic capabilities along with Mahout’s proven, scalable machine learning algorithms for deeper insight into both content and users
  • Tackles data, both big and small with ease, seamlessly scaling while minimizing the impact of provisioning Hadoop, LucidWorks and other components
  • Supplies a single, coherent, secure and well documented REST API for both application integration and administration
  • Offers fault tolerance with data safety baked in
  • Provides choice and flexibility, via on premise, cloud hosted or hybrid deployment solutions
  • Is tested, integrated and fully supported by the world’s leading experts in open source search.
  • Includes powerful tools for configuration, deployment, content acquisition, security, and search experience that is packaged in a convenient, well-organized application

Lucid Imagination’s Open Search Platform uncovers real-time insights from any enterprise data, whether structured in databases, unstructured in formats such as emails or social channels, or semi-structured from sources such as websites.  The company’s rich portfolio of enterprise-grade solutions is based on the same proven open source Apache Lucene/Solr technology that powers many of the world’s largest e-commerce sites. Lucid Imagination’s on-premise and cloud platforms are quicker to deploy, cost less than competing products and are more easily tailored to specific needs than business intelligence solutions because they leverage innovation from the open source community.  

“We’re allowing a broad set of enterprises to test and implement data discovery and analysis projects that have historically been the province of large multinationals with large data centers. Cloud computing and LucidWorks Big Data finally level the field,” said Paul Doscher, CEO of Lucid Imagination. “Large companies, meanwhile, can use our Big Data stack to reduce the time and cost associated with evaluating and ultimately implementing big data search, discovery and analysis. It’s their data – now they can actually benefit from it.”

LucidWorks 2.1

Thursday, April 26th, 2012

LucidWorks 2.1

There are times, not very often, when picking only a few features to report would be unfair to a product.

This is one of those times.

I have reproduced the description of LucidWorks 2.1 as it appears on the Lucid Imagination site:

LucidWorks 2.1 new features list:

Enhancement Areas Key Benefits

Includes the latest Lucene/Solr 4.0

  • Near Real Time
  • Fault Tolerance and High Availability
  • Data Durability
  • Centralized Configuration
  • Elasticity

Business Rules

  • Integrate your business processes and rules with the user search experience
  • Examples: Landing Pages, provide targeted search results per user, etc.
  • Framework to integrate with your BRMS (Business Rules Management System)
  • OOB integration with leading open source BRMS – Drools

Upgrade and Migrations

  • Lucid can help upgrade customers from Solr 3.x to 4.0 or older Solr versions to LucidWorks 2.1
  • Upgrades for existing LucidWorks customers on previous versions of LucidWorks to LucidWorks 2.1

Enhanced Connector Framework

  • Easily build integrations to index data from any application or data sources
  • Framework supports REST API driven integration, generates dynamic configuration UI, and allows admins to schedule the new connectors
  • Connectors available to crawl large amounts of HDFS data, integrate twitter updates into index, and CMIS connector to support CMS systems like Alfresco, etc.

Efficient Crawl of Large web content

  • OOB integration for Nutch  (open source)
  • Helps crawl Webscale data into your index

REST API and UI Enhancements

  • Supports memory and cache settings, schema less configuration using Dynamic fields from UI
  • Subject Matter Experts can create Best Bets for improved search experience

Key features and benefits of LucidWorks search platform

  • Streamlined search configuration, optimization and operations: Well-organized UI makes Solr innovation easier to consume, better adapting to constant change.
  • Enterprise-grade, business critical manageability: Includes tools for infrastructure administration, monitoring and reporting so your search application can thrive within a well-defined, well-managed operational environment; includes upgradability across successive releases. We can help migrate Solr installations to LucidWorks 2.1.
  • Broad-based content acquisition: Access big data and enterprise content faster and more securely with built-in support for Hadoop and Amazon S3, along with Sharepoint and traditional online content types – plus a new open connector framework to customize access to other data sources
  • Versatile access and data security: Flexible, resilient built-in security simplifies getting search connected right to the right data and content
  • Advanced search experience enhancements: Powerful, innovative search capabilities deliver faster, better, more useful results for a richer user experience; easily integrates into your application and infrastructure; REST API automates and integrates search as a service with your application.
  • Open source power and innovation: Complete, supported release of Lucene/Solr 4.0, including latest innovations in Near Real Time search, distributed indexing and more versatile field faceting over and above Apache Lucene/Solr 3.x; all the flexibility of open source, packaged for business-critical development, maintenance and deployment
  • Cost-effective commercial grade expertise & Global 24×7 Support: A range of annual support subscriptions including bundled services, consulting, training and certification from the world’s leading experts in Lucene/Solr open source.

Optimizing Findability in Lucene and Solr

Sunday, January 1st, 2012

Optimizing Findability in Lucene and Solr

From the post:

To paraphrase an age-old question about trees falling in the woods: “If content lives in your application and you can’t find it, does it still exist?” In this article, we explore how to make your content findable by presenting tips and techniques for discovering what is important in your content and how to leverage it in the Lucene Stack.

Table of Contents

Introduction
Planning for Findability
Knowing your Content
Knowing your Users
Garbage In, Garbage Out
Analyzing your Analysis
Stemming In Greater Detail
Query Techniques for Better Search
Navigation Hints
Final Thoughts
Resources

by Grant Ingersoll

You know when a blog post starts off with a table of contents it is long. Fortunately in this case, it is also very good. By one of the principal architects of Lucene, Grant Ingersoll.

A good start on developing findability skills but as the post points out, a lot of it will depend on your knowledge of what “findability” means to your users. Only you can answer that question.

LucidWorks Enterprise 2.0.1 Release

Friday, December 30th, 2011

LucidWorks Enterprise 2.0.1 Release

From the post:

LucidWorks Enterprise 2.0.1 is an interim bug-fix release. We’ve resolved a couple of critical bugs and LDAP integration issues. The list of issues resolved with this update is available here.

Relevancy Driven Development with Solr

Thursday, December 1st, 2011

Relevancy Driven Development with Solr by Robin Bramley.

From the post:

The relevancy of search engine results is very subjective so therefore testing the relevancy of queries is also subjective. One technique that exists in the information retrieval field is the use of judgement lists; an alternative approach discussed here is to follow the Behaviour Driven Development methodology employing user story acceptance criteria – I’ve been calling this Relevancy Driven Development or RDD for short.

I’d like to thank Eric Pugh for a great discussion on search engine testing and for giving me a guest slot in his ‘Better Search Engine Testing‘ talk* at Lucene EuroCon Barcelona 2011 to mention RDD. The first iteration of Solr-RDD combines my passion for automated testing with my passion for Groovy by leveraging EasyB (a Groovy BDD testing framework).

The Solr-RDD GitHub site comes closer to the expectations of the project:

The aim of RDD is to allow the business users to gain confidence in the relevancy of the search query results.

The trick is that the business users can use a constrained data set, define a query and the results they expect in the order that they expect.

Well…, maybe. Two things of concern:

First, a user would have to “know” the data extremely well to formulate queries in that sort of detail, and

Second, it does not appear to leave any room for unexpected information that might also be useful to the user.

Perhaps this is a technique that works well with very well known data sets with few if any unexpected results.
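To show what such an acceptance check amounts to, here is a minimal sketch in Python rather than the Groovy/EasyB that Solr-RDD uses. Everything in it is hypothetical: the tiny data set, the term-counting `toy_search` stand-in for a search engine, and the `check_relevancy` helper.

```python
def check_relevancy(search, query, expected_top):
    """Compare the top of search(query) with the ordered ids in expected_top.

    Returns (passed, actual_top) so a failing story can report what came back.
    """
    actual_top = search(query)[:len(expected_top)]
    return actual_top == list(expected_top), actual_top


# Hypothetical constrained data set that the business user knows well.
DOCS = {
    "doc-1": "solr relevancy testing with judgement lists",
    "doc-2": "groovy testing frameworks",
    "doc-3": "solr solr relevancy tuning",
}


def toy_search(query):
    """Stand-in for a search engine: rank documents by query term count."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in DOCS.items():
        words = text.split()
        score = sum(words.count(term) for term in terms)
        if score:
            scored.append((-score, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


passed, actual = check_relevancy(toy_search, "solr relevancy", ["doc-3", "doc-1"])
print(passed, actual)  # True ['doc-3', 'doc-1']
```

Note that the check only looks at the positions the user listed. Anything the engine returns below them passes unnoticed, which is my second concern in small.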

Search + Big Data: It’s (still) All About the User (Users or Documents?)

Tuesday, November 8th, 2011

Search + Big Data: It’s (still) All About the User by Grant Ingersoll.

Slides

Abstract:

Apache Hadoop has rapidly become the primary framework of choice for enterprises that need to store, process and manage large data sets. It helps companies to derive more value from existing data as well as collect new data, including unstructured data from server logs, social media channels, call center systems and other data sets that present new opportunities for analysis. This keynote will provide insight into how Apache Hadoop is being leveraged today and how it is evolving to become a key component of tomorrow’s enterprise data architecture. This presentation will also provide a view into the important intersection between Apache Hadoop and search.

Awesome as always!

Please watch the presentation and review the slides before going further. What follows won’t make much sense without Grant’s presentation as a context. I’ll wait……

Back so soon? ;-)

On slide 4 (I said to review the slides), Grant presents four overlapping areas, starting with Documents: Models, Feature Selection; Content Relationships: Page Rank, etc., Organization; Queries: Phrases, NLP; User Interaction: Clicks, Ratings/Reviews, Learning to Rank, Social Graph; and the intersection of those four areas is where Grant says search is rapidly evolving.

On slide 5 (sorry, last slide reference), Grant says that the way to mine that intersection is a loop composed of: Search -> Discovery -> Analytics -> (back to Search), all of which involve processing data that has been collected from use of the search interface.

Grant’s presentation made clear something that I have been overlooking:

Search/Indexing, as commonly understood, does not capture any discoveries or insights of users.

Even the search trails that Grant mentions are just lemming tracks complete with droppings. You can follow them if you like; you may find interesting data, or you may not.

My point being that there is no way to capture the user’s insight that LBJ, for instance, is a common acronym for Lyndon Baines Johnson, so that the next user who searches for LBJ will find the information contributed by a prior user. Such information could distinguish the application of Lyndon Baines Johnson to a graduate school (Lyndon B. Johnson School of Public Affairs), a hospital (Lyndon B. Johnson General Hospital), a PBS show (American Experience . The Presidents . Lyndon B. Johnson), and a biography (American President: Lyndon Baines Johnson), and that is in just the first ten (10) “hits.” Oh, and as the name of an American President.

Grant made that clear for me with his loop of Search -> Discovery -> Analytics -> (back to Search) because Search only ever focuses on the documents, never the user’s insight into the documents.

And with every search, every user (with the exception of search trails), starts over at the beginning.

What if a colleague found a bug in program code, but you had to start at the beginning of the program and work your way there? Is that a good use of your time? To reset with every user? That is what happens with search: nearly a complete reset. (Not complete because of page rank, etc., but only just.)

If we are going to make it “All About the User,” shouldn’t we be indexing their insights* into data? (Big or otherwise.)

*“Clicks” are not insights. Could be an unsteady hand, DTs, etc.
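
To make "indexing their insights" a little more concrete, here is a rough sketch in Python of what capturing the LBJ insight might look like. It assumes nothing about Solr or any real search engine: `InsightIndex`, `record`, `lookup` and the user names are hypothetical, invented only to show one user's insight being available to the next user who searches.

```python
class InsightIndex:
    """Hypothetical store of user-contributed insights, keyed by search term."""

    def __init__(self):
        # term -> {meaning -> [contributors]}; dicts keep insertion order.
        self._insights = {}

    def record(self, term, meaning, contributor):
        """Save what a user worked out about a term, and who contributed it."""
        meanings = self._insights.setdefault(term.strip().lower(), {})
        meanings.setdefault(meaning, []).append(contributor)

    def lookup(self, term):
        """Return the meanings earlier users attached to this term."""
        return list(self._insights.get(term.strip().lower(), {}))


index = InsightIndex()

# One user works out what LBJ can refer to and records it.
index.record("LBJ", "Lyndon Baines Johnson (American President)", "user_a")
index.record("LBJ", "Lyndon B. Johnson School of Public Affairs", "user_a")

# A second user adds another sense, typing the term in lower case.
index.record("lbj", "Lyndon B. Johnson General Hospital", "user_b")

# The next user who searches for LBJ does not start over at the beginning.
for meaning in index.lookup("LBJ"):
    print(meaning)
```

A real system would have to decide whose insights to trust and how to merge them with ordinary search results; the sketch sidesteps both questions.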

Solr and LucidWorks Enterprise: When to use each

Wednesday, September 28th, 2011

Solr and LucidWorks Enterprise: When to use each

From the post:

If LucidWorks Enterprise is built on Solr, how do you know which one to use when for your own circumstances? This article describes the difference between using straight Solr, using the LucidWorks Enterprise user interface, and using LucidWorks Enterprise’s ReST API for accomplishing various common tasks so you can see which fits your situation at a given moment.

In today’s world, building the perfect product is a lot like trying to repair a set of train tracks while the train is barreling down on you. The world just keeps moving, with great ideas and new possibilities tempting you every day. And to make things worse, innovation doesn’t just show its face for you; it regularly visits your competitors as well.

That’s why you use open source software in the first place. You have smart people; does it make sense to have them building search functionality when Apache Solr already provides it? Of course not. You’d rather rely on the solid functionality that’s already been built by the community of Solr developers, and let your people spend their time building innovation into your own products. It’s simply a more efficient use of resources.

But what if you need search-related functionality that’s not available in straight Solr? In some cases, you may be able to fill those holes and lighten your load with LucidWorks Enterprise. Built on Solr, LucidWorks Enterprise starts by simplifying the day-to-day use tasks involved in using Solr, and then moves on to adding additional features that can help free up your development team for work on your own applications. But how do you know which path would be right for you?

Since I posted the LucidWorks 2.0 announcement yesterday, I thought this might be helpful in terms of its evaluation. I did not see a date on it but it looks current enough.