Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 4, 2012

Solr vs. ElasticSearch: Part 2 – Data Handling

Filed under: ElasticSearch,Search Engines,Solr — Patrick Durusau @ 1:47 pm

Solr vs. ElasticSearch: Part 2 – Data Handling by Rafał Kuć.

In the previous part of Solr vs. ElasticSearch series we talked about general architecture of these two great search engines based on Apache Lucene. Today, we will look at their ability to handle your data and perform indexing and language analysis.

  1. Solr vs. ElasticSearch: Part 1 – Overview
  2. Solr vs. ElasticSearch: Part 2 – Data Handling
  3. Solr vs. ElasticSearch: Part 3 – Querying
  4. Solr vs. ElasticSearch: Part 4 – Faceting
  5. Solr vs. ElasticSearch: Part 5 – API Usage Possibilities

Rafal takes a dive into indexing and data handling under Solr and ElasticSearch.

PS: Can you suggest a search engine that does not befoul URLs with tracking information? Or at least consistently presents a “clean” version alongside a tracking version?

August 24, 2012

Solr vs. ElasticSearch: Part 1 – Overview

Filed under: ElasticSearch,Solr,SolrCloud — Patrick Durusau @ 8:18 am

Solr vs. ElasticSearch: Part 1 – Overview by Rafał Kuć.

From the post:

A good Solr vs. ElasticSearch coverage is long overdue. We make good use of our own Search Analytics and pay attention to what people search for. Not surprisingly, lots of people are wondering when to choose Solr and when ElasticSearch.

As the Apache Lucene 4.0 release approaches and with it Solr 4.0 release as well, we thought it would be beneficial to take a deeper look and compare the two leading open source search engines built on top of Lucene – Apache Solr and ElasticSearch. Because the topic is very wide and can go deep, we are publishing our research as a series of blog posts starting with this post, which provides the general overview of the functionality provided by both search engines.

Rafal gets this series of posts off to a good start!

PS: Solr vs. ElasticSearch: Part 2 – Data Handling

August 6, 2012

From Solr to elasticsearch [Clarity as a Value?]

Filed under: ElasticSearch,JSON,Solr — Patrick Durusau @ 4:39 pm

From Solr to elasticsearch by Rob Young.

From the post:

Search is right at the center of GOV.UK. It’s the main focus of the homepage and it appears in the corner of every single page. Many of our recent and upcoming apps such as licence finder also rely heavily on search. So, making sure we have the right tool for the job is vital. Recently we decided to begin switching away from Solr to elasticsearch for our search server. Rob Young, a developer at GDS explains in some detail the basis for our decisions – the usual disclaimers about this being quite technical apply.

I am sure there are points to be made for both Solr and ElasticSearch. No doubt much religious debate will follow this decision.

What interested me was the claim that:

Just about the most important feature of any search engine is the ability to query it. Both Solr and elasticsearch expose their query APIs over HTTP but they do so in quite different ways. Solr queries are made up of two and three letter URL parameters, while elasticsearch queries are clear, self documenting JSON objects passed in the HTTP body.

It is possible, as the example in the post shows, to have “…clear, self documenting JSON objects….” in ElasticSearch but isn’t clarity in that case optional?

Or at least in the eyes of its user?

Not to downplay the importance of being “…clear and self-documenting…” but to make it clear that is a design choice. A good one in my opinion but a design choice nonetheless.

That clarity occurs in this case in JSON is an accident of expression.
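
To make the contrast concrete, here is a minimal Python sketch of the same hypothetical search expressed three ways. The field name `title`, the search term and the result count are illustrative assumptions, not taken from the GOV.UK post; `q`, `rows` and `wt` are standard Solr request parameters, and `query_string` and `size` are standard Elasticsearch request-body keys. The last form shows that Elasticsearch also accepts a terse URL-parameter query, which is why the clarity is a choice rather than a property of the engine.

```python
import json
from urllib.parse import urlencode

# Hypothetical search: documents whose title mentions "licence", first 10 hits.

# Solr style: short URL parameters (q = query, rows = page size, wt = response format).
solr_params = urlencode({"q": "title:licence", "rows": 10, "wt": "json"})

# Elasticsearch style: a JSON object sent in the HTTP request body.
es_body = json.dumps(
    {"query": {"query_string": {"query": "title:licence"}}, "size": 10},
    indent=2,
)

# Elasticsearch URI search: the same request written as terse URL parameters.
es_params = urlencode({"q": "title:licence", "size": 10})

print("Solr:      /select?" + solr_params)
print("ES (body): " + es_body)
print("ES (URI):  /_search?" + es_params)
```

Nothing here talks to a server; the sketch only builds the request text so the three styles can be compared side by side.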

August 5, 2012

Elastisch, a Clojure client for ElasticSearch

Filed under: Clojure,ElasticSearch — Patrick Durusau @ 6:15 pm

Elastisch, a Clojure client for ElasticSearch

From about this guide:

This guide covers ElasticSearch indexing capabilities in depth, explains how Elastisch presents them in the API and how some of the key features are commonly used.

This guide covers:

  • What is indexing in the context of full text search
  • What kind of features ElasticSearch has w.r.t. indexing, how Elastisch exposes them in the API
  • Mapping types and how they define how the data is indexed by ElasticSearch
  • How to define mapping types with Elastisch
  • Lucene built-in analyzers, their characteristics, what different kind of analyzers are good for.
  • Other topics related to indexing and working with indexes

An extensive introduction to ElasticSearch.

If you are not familiar with ElasticSearch already, now might be a good time.

July 24, 2012

ActionGenerator, Part Two

Filed under: ActionGenerator,ElasticSearch,Solr — Patrick Durusau @ 4:13 pm

Rafał Kuć returns in: ActionGenerator, Part Two to cover action generators for Elastic Search and Solr.

Just in case you are interested. 😉

Both include indexing and query action generators, just in case you want to stress your deployment before a big opening day. (Zero day crashes don’t encourage your user/customer base.)

Future plans include action generators for SenseiDB.

July 17, 2012

elasticsearch. The Company

Filed under: ElasticSearch,Lucene,Search Engines — Patrick Durusau @ 3:45 pm

elasticsearch. The Company

ElasticSearch needs no introduction to readers of this blog or really anyone active in the search “space.”

It was encouraging to hear that, after years of building an increasingly useful product, ElasticSearch has matured into a company.

With all the warm fuzzies that support contracts and such bring.

Sounds like they will demonstrate that the open source and commercial worlds aren’t, you know, incompatible.

It helps that they have a good product in which they have confidence and not a product that their PR/Sales department is pushing as a “good” product. The fear of someone “finding out” would make you real defensive in the latter case.

Looking forward to good fortune for ElasticSearch, its founders and anyone who wants to follow a similar model.

July 11, 2012

Search Data at Scale in Five Minutes with Pig, Wonderdog and ElasticSearch

Filed under: ElasticSearch,Pig,Wonderdog — Patrick Durusau @ 2:26 pm

Search Data at Scale in Five Minutes with Pig, Wonderdog and ElasticSearch

Russell Jurney continues his posts on searching at scale:

Working code examples for this post (for both Pig 0.10 and ElasticSearch 0.18.6) are available here.

ElasticSearch makes search simple. ElasticSearch is built over Lucene and provides a simple but rich JSON over HTTP query interface to search clusters of one or one hundred machines. You can get started with ElasticSearch in five minutes, and it can scale to support heavy loads in the enterprise. ElasticSearch has a Whirr Recipe, and there is even a Platform-as-a-Service provider, Bonsai.io.

Apache Pig makes Hadoop simple. In a previous post, we prepared the Berkeley Enron Emails in Avro format. The entire dataset is available in Avro format here: https://s3.amazonaws.com/rjurney.public/enron.avro. Let’s check them out:

Scale is important for some queries but what other factors are important for searches?

Thinking that Google is searching at scale. Is that a counter-example to scale being the only measure of search success? Or the best measure?

Or is scale of searching just a starting point?

Where do you go after scale? Scale is easy to evaluate/measure, so whatever your next step, how is it evaluated or measured?

Or is that the reason for emphasis on scale/size? It’s an easy mark (in several senses)?

March 28, 2012

Kibana

Filed under: ElasticSearch,Kibana,logstash — Patrick Durusau @ 4:22 pm

Kibana

From the webpage:

You have logs. Billions of lines of data. You shipped, dated it, parsed it and stored it. Now what do you do with it? Now you make sense of it. Kibana helps you do that. Kibana is an alternative browser based interface for Logstash and ElasticSearch that allows you to efficiently search, graph, analyze and otherwise make sense of a mountain of logs.

Any thoughts of what data you would map to such an interface? Or map to the aggregations that it offers?

March 7, 2012

elasticsearch: Search made easy for (web) developers

Filed under: ElasticSearch — Patrick Durusau @ 5:41 pm

elasticsearch: Search made easy for (web) developers by Alexander Reelsen.

Nothing particularly new or startling but does cover elasticsearch from a perspective that may be of interest to web developers.

I don’t think search is “easy” for web developers or any other community. Sometimes, however, tools don’t get in the way as much as others.

February 18, 2012

Bird’s Eye View of the ElasticSearch Query DSL

Filed under: DSL,ElasticSearch,Query Language — Patrick Durusau @ 5:26 pm

Bird’s Eye View of the ElasticSearch Query DSL by Peter Karich.

From the post:

I’ve copied the whole post into a gist so that you can simply clone, copy and paste the important stuff and even could contribute easily.

Several times per month there are questions regarding the query structure on the ElasticSearch user group.

Although there are good docs explaining this in depth, I think a bird’s eye view of the Query DSL is necessary to understand what is written there. There is even some good external documentation available. And there were attempts to define a schema but nevertheless I’ll add my 2 cents here. I assume you set up your ElasticSearch instance correctly and on the local machine filled with exactly those 3 articles.

Do you have a feel for what a “bird’s eye view” would say about the differences in NoSQL query languages?

SQL has been relatively uniform, enabling users to learn the basics and then fill in the particulars as necessary. How far are we from a query DSL that obscures most of the differences from the average user?
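
As a rough illustration of what such a bird’s eye view has to cover, the sketch below builds a small ElasticSearch Query DSL request as nested Python dictionaries. The helper name `bool_query` and the fields `title` and `status` are hypothetical; `bool`, `must`, `must_not`, `match` and `term` are Query DSL keywords. The point is the shape: every clause is a JSON object keyed by its query type, nested inside compound queries.

```python
import json


def bool_query(must=None, must_not=None):
    """Wrap lists of clauses in a compound `bool` query (hypothetical helper)."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"query": {"bool": clauses}}


# Field names and values are made up for illustration.
request_body = bool_query(
    must=[{"match": {"title": "search engine"}}],
    must_not=[{"term": {"status": "draft"}}],
)

# Print the JSON that would be sent as the search request body.
print(json.dumps(request_body, indent=2))
```

A SQL user would read this as roughly "WHERE title matches 'search engine' AND NOT status = 'draft'", which hints at how much of the difference between query languages is notation.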

February 9, 2012

ElasticSearch vs. Apache Solr

Filed under: ElasticSearch,Solr — Patrick Durusau @ 4:28 pm

ElasticSearch vs. Apache Solr

Good for the information it does present but both pros and cons are primarily focused on ElasticSearch. Which is fine because the author did a very good job trying to present both the pros and cons on ElasticSearch.

It would be a better (read different) presentation if it walked through the pros and cons with ElasticSearch and Apache Solr side by side. (But an author is entitled to write the post they want, not the one desired by others.)

I don’t know if it would be worth the effort, but a common interface that searched JIRA (or other issue tracking systems) for the common search engines and presented related issues grouped together could be quite useful.

For comparison purposes but also for cross-pollination of solutions.

RSS River Plugin (ElasticSearch)

Filed under: ElasticSearch,RSS River Plugin — Patrick Durusau @ 4:22 pm

RSS River Plugin (ElasticSearch)

Here is the non-technical stuff:

RSS River Plugin offers a simple way to index RSS feeds into Elasticsearch.

It reads your feeds with a regular period and index content.

As all rivers, it’s quite simple to create an RSS River :

  • Install the plugin and start Elasticsearch
  • Create your index (with mapping if needed)
  • Define the river
  • Search for RSS content

Rivers enable you to automatically index content as it arrives.

ElasticSearch offers default rivers for CouchDB, RabbitMQ, Twitter and Wikipedia.

February 7, 2012

8 Best Open Source Search Engines built on top of Lucene

Filed under: Bobo Search,Compass,Constello,ElasticSearch,IndexTank,Katta,Lucene,Solr,Summa — Patrick Durusau @ 4:36 pm

8 Best Open Source Search Engines built on top of Lucene

By my count I get five (5) based on Lucene. See what you think.

Lucene base:

  • Apache Solr
  • Compass
  • Constellio
  • Elastic Search
  • Katta

No Lucene base:

  • Bobo Search
  • Index Tank
  • Summa

Post has short summaries about the search engines and links to their sites.

Do you think the terminology around search engines is as confused as around NoSQL databases?

Any cross-terminology comparisons you would recommend to CIOs or even users?

January 25, 2012

Berlin Buzzwords 2012

Filed under: BigData,Conferences,ElasticSearch,Hadoop,HBase,Lucene,MongoDB,Solr — Patrick Durusau @ 3:24 pm

Berlin Buzzwords 2012

Important Dates (all dates in GMT +2)

Submission deadline: March 11th 2012, 23:59 MEZ
Notification of accepted speakers: April 6th, 2012, MEZ
Publication of final schedule: April 13th, 2012
Conference: June 4/5. 2012

The call:

Call for Submission Berlin Buzzwords 2012 – Search, Store, Scale — June 4 / 5. 2012

The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

  • IR / Search – Lucene, Solr, katta, ElasticSearch or comparable solutions
  • NoSQL – like CouchDB, MongoDB, Jackrabbit, HBase and others
  • Large Data Processing – Hadoop itself, MapReduce, Cascading or Pig and relatives

Related topics not explicitly listed above are more than welcome. We are looking for presentations on the implementation of the systems themselves, technical talks, real world applications and case studies.

…(moved dates to top)…

High quality, technical submissions are called for, ranging from principles to practice. We are looking for real world use cases, background on the architecture of specific projects and a deep dive into architectures built on top of e.g. Hadoop clusters.

Here is your chance to experience summer in Berlin (Berlin Buzzwords 2012) and in Montreal (Balisage).

Seriously, both conferences are very strong and worth your attention.

December 1, 2011

elasticsearch version 0.18.5

Filed under: ElasticSearch,Search Engines — Patrick Durusau @ 7:40 pm

elasticsearch version 0.18.5

From the blog entry:

You can download it here. It includes an upgraded Lucene version (3.5), featuring bug fixes and memory improvements, as well as more bug fixes in elasticsearch itself. Changes can be found here.

October 12, 2011

Querying ElasticSearch from VIM

Filed under: ElasticSearch,JSON — Patrick Durusau @ 4:40 pm

Querying ElasticSearch from VIM

From the post:

I’m using ElasticSearch quite a bit and finally decided to make it easy to debug. I now write JSON queries with a .es extension. And have this in my .vim/filetype.vim file:

Debugging ElasticSearch results with Perl.

I just know Robert (Barta) has a one liner for this and thought this might tempt him into commenting. 😉

October 10, 2011

Introducing Truffler – Advanced search made easy

Filed under: ElasticSearch,Truffler — Patrick Durusau @ 6:20 pm

Introducing Truffler – Advanced search made easy

From the post:

Last week during a presentation at a user group I showed the project that me and my two partners Henrik Lindström and Marcus Granström have been working on for quite a while now – Truffler. Truffler is a search engine that we offer both as Software as a Service and as dedicated servers for rent. It’s a commercial product but we offer free trial indexes as well as personal indexes to developers that want to use it for their blogs or hobby projects as long as they link to us.

Built on ElasticSearch, offering a .Net API. Windows developers take note.

Curious if you try it out, what do you make of the claim that “advanced search [is] made easy.”? Lots of people make it, not just Truffler. How would you evaluate that claim? Is ease of programming/configuration enough? What of the results? How do you judge those?

For my class, consider a project proposal for how you would compare two search engines, Truffler and another. Not actually doing the proposal but writing up the process by which you would test one against the other. I think you will find simply designing how you would compare the two a reasonably sized project. Could be interesting to pitch your project design to the search engines in question to see if they would fund the comparison as a research project.

September 30, 2011

jQuery UI Autocompletion with Elasticsearch Backend

Filed under: ElasticSearch,JQuery — Patrick Durusau @ 7:07 pm

jQuery UI Autocompletion with Elasticsearch Backend by Gerhard Hipfinger.

From the post:

I recently discovered Elasticsearch is an incredible easy search engine solution for JSON documents. As we heavily use CouchDB in our product development Elasticsearch and CouchDB are a perfect match. Even more since Elasticsearch comes with a great out of the box connection for CouchDB! So the next step is to use Elasticsearch as backend for a jQuery UI autocompletion field.

Your users may like some form of autocompletion in their topic map interface. If nothing else, it is amusing how far they land from what the user is looking for.

BTW, read the comments on this post.

ElasticSearch: Beyond Full Text Search

Filed under: ElasticSearch,Search Algorithms,Search Engines,Searching — Patrick Durusau @ 7:07 pm

ElasticSearch: Beyond Full Text Search by Karel Minařík.

If you aren’t into hard core searching already, this is a nice introduction to the area. Would like to see the presentation that went with the slides but even the slides alone should be useful.

September 20, 2011

ElasticSearch 0.17.7 Released!

Filed under: ElasticSearch,NoSQL — Patrick Durusau @ 7:52 pm

ElasticSearch 0.17.7 Released!

From the post:

This release include the usual list of bug fixes, and also include an upgrade to Lucene 3.4.0 (fixes critical bugs, so make sure you upgrade), as well as improvements to the couchdb river (memory usage wise).

Release Notes

September 10, 2011

SearchWorkings

Filed under: ElasticSearch,Lucene,Mahout,Solr — Patrick Durusau @ 6:02 pm

SearchWorkings

From the About Us page:

SearchWorkings.org was created by a bunch of really passionate search technology professionals who realised that the world (read: other search professionals) doesn’t have a single point of contact or comprehensive resource where they can learn and talk about all the exciting new developments in the wonderful world of open source search solutions. These professionals all work at JTeam, a leading supplier of high-quality custom-built applications and end-to-end solutions provider, and moreover a market leader when it comes to search solutions.

A wide variety of materials, from whitepapers and articles, forums (Lucene, Solr, ElasticSearch, Mahout), training videos, news, and blogs.

You do have to register/join (free) to get access to the good stuff.

August 29, 2011

Building Search App for Public Mailing Lists

Filed under: ElasticSearch,Search Engines,Search Interface,Searching — Patrick Durusau @ 6:25 pm

Building Search App for Public Mailing Lists in 15 Minutes with ElasticSearch by Lukáš Vlček.

You will need the slides to follow the presentation: Building Search App for Public Mailing Lists.

Very cool if fast presentation on building an email search application with ElasticSearch.

BTW, the link to BigDesk (A tiny monitoring tool for ElasticSearch clusters) is incorrect. Try: https://github.com/lukas-vlcek/bigdesk.

August 28, 2011

Road To A Distributed Search Engine

Filed under: ElasticSearch,Lucene,Search Engines — Patrick Durusau @ 7:54 pm

Road To A Distributed Search Engine by Shay Banon.

If you are looking for a crash course on the construction details of Elasticsearch, you are in the right place.

My only quibble and this is common to all really good presentations (this is one of those) is that there isn’t a transcript to go along with it. There is so much information that I will have to watch it more than once to take it all in.

If you watch the presentation, do pay attention so you are not like the person who suggested that Solr and Elasticsearch were similar. 😉

August 26, 2011

Realtime Search: Solr vs Elasticsearch

Filed under: ElasticSearch,Solr — Patrick Durusau @ 6:24 pm

Realtime Search: Solr vs Elasticsearch by Ryan Sonnek.

Comparison of Solr and Elasticsearch for realtime searching.

Where “realtime” means you are updating the index while performing searches.

I’m not convinced that “realtime” requirements are any more common than those of “BigData.” They do exist and when they do, use the appropriate solution. On the other hand, don’t plan or build for “realtime” or “BigData” unless those are your requirements.

August 12, 2011

Apache CouchDB & Elasticsearch

Filed under: CouchDB,ElasticSearch — Patrick Durusau @ 7:22 pm

Apache CouchDB & Elasticsearch by Benoît Chesneau.

With one hundred and twenty-five (125) slides you can get off into the weeds and talk about the details. Very much worth your time to take a look.

August 8, 2011

Creating an Elasticsearch Plugin

Filed under: ElasticSearch,Lucene — Patrick Durusau @ 6:28 pm

Creating an Elasticsearch Plugin

From the post:

Elasticsearch is a great search engine built on top of Apache Lucene. We came across the need to add new functionality and did not want to fork Elasticsearch for this. Luckily Elasticsearch comes with a plugin framework. We already leverage this framework to use the Apache Thrift transport. There was no documentation on how to create a plugin so after digging around in the code a little we were able to create our own plugin.

Here is a tutorial on creating a plugin and installing it into Elasticsearch.

Just in case you are using and need to extend Elasticsearch.

August 1, 2011

99 Problems, But the Search Ain’t One

Filed under: ElasticSearch,Search Engines,Searching — Patrick Durusau @ 3:55 pm

99 Problems, But the Search Ain’t One

A fairly comprehensive overview of elasticsearch, including replication/sharding and API summaries.

Depending on the type of search and “aggregation” (read merging) you require, this may fit the bill.

July 21, 2011

Wonderdog

Filed under: ElasticSearch,Hadoop,Pig — Patrick Durusau @ 6:30 pm

Wonderdog

From the webpage:

Wonderdog is a Hadoop interface to Elastic Search. While it is specifically intended for use with Apache Pig, it does include all the necessary Hadoop input and output formats for Elastic Search. That is, it’s possible to skip Pig entirely and write custom Hadoop jobs if you prefer.

I may just be paying more attention but the search scene seems to be really active.

That’s good for topic maps because the more data that is searched, the greater the likelihood of heterogeneous data. Text messages between teens are probably heterogeneous but who cares?

Medical researchers using different terminology results in heterogeneous data, not just today, but data from yesteryear. Now that could be important.

July 14, 2011

Elasticsearch, Kettle and the CTools

Filed under: ElasticSearch,Search Engines,Searching — Patrick Durusau @ 4:11 pm

Elasticsearch, Kettle and the CTools

From the post:

I’m not much into the sql vs nosql discussion. I have enough years of BI to know that the important thing is to choose the right tool for the job. And that requires a lot of tools!

Here’s one more for our set: ElasticSearch. ElasticSearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene.

Adds Elasticsearch to Kettle for BI.


Updated 14 May 2012 (forgot the URL for the link, now fixed)

June 15, 2011

elasticsearch: The Road to a Distributed, (Near) Real Time, Search Engine

Filed under: ElasticSearch,Lucene — Patrick Durusau @ 3:08 pm

elasticsearch: The Road to a Distributed, (Near) Real Time, Search Engine by Shay Banon

Covers Lucene basics and then shards and replicas using elasticsearch.
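
A toy sketch of the sharding idea, assuming nothing about the presentation’s own examples: each document id is hashed to pick one of a fixed number of primary shards, and each primary has replica copies. The CRC32 hash, the shard and replica counts, and the function name `shard_for` are assumptions made for illustration; elasticsearch uses its own hash function internally.

```python
import zlib

NUM_PRIMARY_SHARDS = 5   # assumed value; fixed when an index is created
NUM_REPLICAS = 1         # assumed value; copies kept of each primary shard


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a document id to a primary shard number using a stable hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Route a few hypothetical document ids to shards.
placement = {}
for doc_id in ("doc-1", "doc-2", "doc-3", "doc-4"):
    placement.setdefault(shard_for(doc_id), []).append(doc_id)

for shard, docs in sorted(placement.items()):
    print(f"shard {shard} (+{NUM_REPLICAS} replica): {docs}")
```

Because the shard is derived from the id modulo the shard count, changing the number of primary shards would move documents around, which is why that number is set at index creation while the replica count can be adjusted later.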
