Archive for the ‘Solr’ Category
Friday, May 17th, 2013
Solr 4, the NoSQL Search Server by Yonik Seeley
Date: Thursday, May 30, 2013
Time: 10:00am Pacific Time
From the description:
The long awaited Solr 4 release brings a large amount of new functionality that blurs the line between search engines and NoSQL databases. Now you can have your cake and search it too with Atomic updates, Versioning and Optimistic Concurrency, Durability, and Real-time Get!
Learn about new Solr NoSQL features and implementation details of how the distributed indexing of Solr Cloud was designed from the ground up to accommodate them.
Featured Presenter:
Yonik Seeley – Creator of Apache Solr and the Chief Open Source Architect and Co-Founder at LucidWorks. Mr. Seeley is an Apache Lucene/Solr PMC member and committer and an expert in distributed search systems architecture and performance. His work experience includes CNET Networks, BEA and Telcordia. He earned his M.S. in Computer Science from Stanford University.
This could be a real treat!
Notes on the webinar to follow.
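The description mentions optimistic concurrency. In Solr 4 an update can carry a _version_ value. Solr checks that value against the stored version before applying the update and answers with an HTTP 409 conflict when the check fails. As I understand the rules:

- a value greater than 1 must match the stored version exactly;
- 1 means the document must already exist;
- a negative value means it must not exist;
- 0 means no check.

The sketch below models those rules locally in Python. It is not Solr's code. The in-memory dict and the names VersionConflict, check_version and update are hypothetical.

```python
class VersionConflict(Exception):
    """Stands in for Solr's HTTP 409 Conflict response (hypothetical name)."""


def check_version(index, doc_id, requested):
    """Apply Solr 4's _version_ rules to a local dict of id -> version.

    requested > 1  : stored version must match exactly
    requested == 1 : document must already exist
    requested < 0  : document must not exist
    requested == 0 : no check
    """
    current = index.get(doc_id)
    if requested > 1 and current != requested:
        raise VersionConflict(f"{doc_id}: expected {requested}, found {current}")
    if requested == 1 and current is None:
        raise VersionConflict(f"{doc_id}: document does not exist")
    if requested < 0 and current is not None:
        raise VersionConflict(f"{doc_id}: document already exists")


def update(index, doc_id, requested=0):
    """Check the constraint, then store a new, larger version.

    A simple counter stands in for the clock-based versions Solr assigns.
    """
    check_version(index, doc_id, requested)
    new_version = max(index.values(), default=1) + 1
    index[doc_id] = new_version
    return new_version
```

Against a real server, the version travels with the update itself. An example is a JSON document such as {"id":"book1","_version_":1234,"price":{"set":10}} posted to the /update handler. The Real-time Get handler (/get) returns the current _version_ to send.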
Posted in NoSQL, Searching, Solr | No Comments »
Tuesday, May 14th, 2013
Eating dog food with Lucene by Michael McCandless.
From the post:
Eating your own dog food is important in all walks of life: if you are a chef you should taste your own food; if you are a doctor you should treat yourself when you are sick; if you build houses for a living you should live in a house you built; if you are a parent then try living by the rules that you set for your kids (most parents would fail miserably at this!); and if you build software you should constantly use your own software.
So, for the past few weeks I’ve been doing exactly that: building a simple Lucene search application, searching all Lucene and Solr Jira issues, and using it instead of Jira’s search whenever I need to go find an issue.
It’s currently running at jirasearch.mikemccandless.com and it’s still quite rough (feedback welcome!).
Now there’s a way to learn the details!
Makes me think about the poor search capabilities at an SDO I frequent.
Could be a way to spend some quality time with Lucene and Solr.
Will have to give it some thought.
Posted in Lucene, Solr | No Comments »
Monday, May 6th, 2013
See Lucene Changes.txt.
See Solr Changes.txt.
More good news for a Monday!
Posted in Lucene, Solr | No Comments »
Monday, April 29th, 2013
Indexing Millions Of Documents Using Tika And Atomic Update by Patricia Gorla.
From the post:
On a recent engagement, we were posed with the problem of sorting through 6.5 million foreign patent documents and indexing them into Solr. This totaled about 1 TB of XML text data alone. The full corpus included an additional 5 TB of images to incorporate into the index; this blog post will only cover the text metadata.
Streaming large volumes of data into Solr is nothing new, but this dataset posed a unique challenge: Each patent document’s translation resided in a separate file, and the location of each translation file was unknown at runtime. This meant that for every document processed we wouldn’t know where its match would be. Furthermore, the translations would arrive in batches, to be added as they come. And lastly, the project needed to be open to different languages and different file formats in the future.
Our options for dealing with inconsistent data came down to: cleaning all data and organizing it before processing, or building an ingester robust enough to handle different situations.
We opted for the latter and built an ingester that would process each file individually and index the documents with an atomic update (new in Solr 4). To detect and extract the text metadata we chose Apache Tika. Tika is a document-detection and content-extraction tool useful for parsing information from many different formats.
On the surface Tika offers a simple interface to retrieve data from many sources. Our use case, however, required a deeper extraction of specific data. Using the built-in SAX parser allowed us to push Tika beyond its normal limits, and analyze XML content according to the type of information it contained.
No magic bullet but an interesting use case (patents in multiple languages).
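The post pulls specific elements out of the XML with Tika's SAX parser and then indexes them with an atomic update. The same pattern can be sketched in Python with the standard library's xml.sax. This is an analogy, not the authors' Java code. The element names title and abstract, the language-suffixed field names and the document id are all hypothetical. Solr 4 atomic updates rebuild the document from its stored values, so the other fields need to be stored.

```python
import json
import xml.sax


class FieldCollector(xml.sax.ContentHandler):
    """Collect text from selected elements, keyed by element name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}
        self._stack = []

    def startElement(self, name, attrs):
        self._stack.append(name)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Keep text only when its innermost enclosing element is wanted.
        if self._stack and self._stack[-1] in self.wanted:
            self.fields.setdefault(self._stack[-1], []).append(content)


def translation_update(doc_id, xml_bytes, lang):
    """Build a Solr 4 atomic update adding one translation to an existing doc."""
    handler = FieldCollector(["title", "abstract"])  # hypothetical element names
    xml.sax.parseString(xml_bytes, handler)
    update = {"id": doc_id}
    for element, chunks in handler.fields.items():
        # "set" replaces only this field; Solr keeps the other stored fields.
        update[f"{element}_{lang}"] = {"set": "".join(chunks).strip()}
    return json.dumps([update])
```

Because every translation file becomes its own small update, it does not matter in which order the files or batches arrive.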
Posted in Indexing, Solr, Tika | No Comments »
Saturday, April 27th, 2013
Developing a Solr Plugin by Andrew Janowczyk.
From the post:
For our flagship product, Searchbox.com, we strive to bring the most cutting-edge technologies to our users. As we’ve mentioned in earlier blog posts, we rely heavily on Solr and Lucene to provide the framework for these functionalities. The nice thing about the Solr framework is that it allows for easy development of plugins which can greatly extend the capabilities of the software. We’ll be creating a set of slideshares which describe how to implement 3 types of plugins so that you can get ahead of the learning curve and start extending your own custom Solr installation now.
There are mainly 4 types of custom plugins which can be created. We’ll discuss their differences here:
Sometimes Andrew says three (3) types of plugins and sometimes he says four (4).
I tried to settle the question by looking at the Solr Wiki on plugins.
Depends on how you want to count separate plugins.
But, Andrew’s advice about learning to write plugins is sound. It will put your results above those of others.
Posted in Searching, Solr, Uncategorized | No Comments »
Tuesday, April 16th, 2013
How To Debug Solr With Eclipse by Doug Turnbull.
From the post:
Recently I was puzzled by some behavior Solr was showing me. I scratched my head and called over a colleague. We couldn’t quite figure out what was going on. Well Solr is open source so… next stop – Debuggersville!
Running Solr in the Eclipse debugger isn’t hard, but there are many scattered user group posts and blog articles that you’ll need to manually tie together into a coherent picture. So let me do you the favor of tying all of that info together for you here.
This looks very useful.
Curious if there are any statistical function debuggers, ones that step you through the operations and show the state of values as they change?
Thinking that could be quite useful as a sanity test when the numbers just don’t jibe.
Posted in Eclipse, Lucene, Solr | No Comments »
Wednesday, April 10th, 2013
Bug fix releases for Apache Lucene and Solr.
Apache Lucene 4.2.1: Changes; Downloads.
Apache Solr 4.2.1: Changes; Downloads.
Posted in Lucene, Solr | No Comments »
Monday, April 8th, 2013
Beginners Guide To Enhancing Solr/Lucene Search With Mahout’s Machine Learning by Doug Turnbull.
From the post:
Yesterday, John and I gave a talk to the DC Hadoop Users Group about using Mahout with Solr to perform Latent Semantic Indexing: calculating and exploiting the semantic relationships between keywords. While we were there, I realized, a lot of people could benefit from a bigger picture, less in-depth, point of view outside of our specific story. In general where do Mahout and Solr fit together? What does that relationship look like, and how does one exploit Mahout to make search even more awesome? So I thought I’d blog about how you too can start to put these pieces together to simultaneously exploit Solr’s search and Mahout’s machine learning capabilities.
The root of how this all works is with a slightly obscure feature of Lucene based search: Term Vectors. Lucene based search applications give you the ability to generate term vectors from documents in the search index. It’s a feature often turned on for specific search features, but other than that can appear to be a weird opaque feature to beginners. What is a term vector, you might ask? And why would you want to get one?
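At its simplest, a term vector is the list of terms in one field of one document together with their frequencies. Lucene can also record positions and offsets. In Solr 4, term vectors are switched on per field in schema.xml with termVectors="true".

The Python sketch below builds such vectors and stacks them into a term-document matrix, which is what an LSI step (for example an SVD in Mahout) would factor. The lowercase-and-split tokenizer is a crude stand-in for a real Lucene analyzer, and the function names are hypothetical.

```python
import re
from collections import Counter


def term_vector(text):
    """Term -> frequency for one document (positions and offsets omitted)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def term_document_matrix(docs):
    """Rows are terms (sorted), columns are documents in input order."""
    vectors = [term_vector(d) for d in docs]
    vocab = sorted(set().union(*vectors))
    return vocab, [[v[t] for v in vectors] for t in vocab]
```

Each column is one document's term vector. LSI works on this matrix as a whole, which is why the feature matters beyond highlighting or more-like-this.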
You know my misgivings about metric approaches to non-metric data (such as semantics) but there is no denying that Latent Semantic Indexing can be useful.
Think of Latent Semantic Indexing as a useful tool.
A saw is a tool too but not every cut made with a saw is a correct one.
Yes?
Posted in Lucene, Mahout, Solr | No Comments »
Saturday, April 6th, 2013
Indexing PDF for OSINT and Pentesting by Alejandro Nolla.
From the post:
Most of us, when conducting OSINT tasks or gathering information for preparing a pentest, draw on Google hacking techniques like site:company.acme filetype:pdf “for internal use only” or something similar to search for potential sensitive information uploaded by mistake. At other times, a customer will ask us to find out if through negligence they have leaked this kind of sensitive information and we proceed to make some google hacking fu.
But, what happens if we don’t want to make these queries against Google and, furthermore, follow links from search that could potentially leak referrers? Sure we could download documents and review them manually in local but it’s boring and time consuming. Here is where Apache Solr comes into play for processing documents and creating an index of them to give us almost real time searching capabilities.
A nice outline of using Solr for internal security testing of PDF files.
At the same time, a nice outline of using Solr for external security testing of PDF files.
You can sweep sites for new PDF files on a periodic basis and retain only those meeting particular criteria.
Low grade ore but even low grade ore can have a small diamond every now and again.
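A periodic sweep like that comes down to three steps:

1. extract the text of each PDF;
2. skip anything already reviewed;
3. keep what matches your phrases.

Here is a minimal Python sketch. It assumes the text has already been extracted, for example by Solr's Tika-based extracting handler, into a dict mapping URL to text. The function name and phrase list are hypothetical. The sketch hashes the content rather than the URL, so a file re-posted under a new name is not reviewed twice.

```python
import hashlib


def new_matches(extracted, seen, phrases):
    """Return URLs of not-yet-seen documents whose text contains any phrase.

    extracted: dict of url -> extracted text
    seen:      set of content hashes from earlier sweeps (updated in place)
    """
    lowered = [p.lower() for p in phrases]
    hits = []
    for url, text in sorted(extracted.items()):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # reviewed in an earlier sweep
        seen.add(digest)
        body = text.lower()
        if any(p in body for p in lowered):
            hits.append(url)
    return hits
```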
Posted in Cybersecurity, Indexing, PDF, Security, Solr | No Comments »
Friday, March 29th, 2013
How NoSQL Paid Off for Telenor by Sebastian Verheughe and Katrina Sponheim.
A presentation I encountered while searching for something else.
Makes a business case for Lucene/Solr and Neo4j solutions to improve customer access to data.
As opposed to the “world will be a better place” case.
What information process/need have you encountered where you can make a business case for topic maps?
Posted in Lucene, Marketing, Neo4j, Solr | 5 Comments »
Thursday, March 28th, 2013
Build a search engine in 20 minutes or less by Ben Ogorek.
I was suspicious but pleasantly surprised by the demonstration of the vector space model you will find here.
True, it doesn’t offer all the features of the latest Lucene/Solr releases but it will give you a firm grounding on vector space models.
Enjoy!
PS: One thing to keep in mind, semantics do not map to vector space. We can model word occurrences in vector space but occurrences are not semantics.
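The core of the vector space model fits in a few lines:

- weight terms by TF-IDF;
- represent the query the same way;
- rank documents by cosine similarity to the query.

The Python sketch below uses textbook tf × log(N/df) weighting, which is not Lucene's exact scoring formula, and a crude tokenizer. All names are hypothetical. A term that occurs in every document gets weight zero, since it cannot tell documents apart.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Weight each term by tf * log(N / df); one sparse dict per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log(len(docs) / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts], idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, docs):
    """Indexes of matching documents, best cosine similarity first."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for s, i in sorted(scores, key=lambda x: (-x[0], x[1])) if s > 0]
```

Take the three documents "solr search engine", "mahout machine learning" and "lucene search library". The query "search library" ranks document 2 above document 0 and drops document 1, which shares no terms with it. The PS still applies: these are word occurrences, not semantics.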
Posted in Indexing, Lucene, Search Engines, Solr | No Comments »
Friday, March 22nd, 2013
Lucene/Solr 4 – A Revolution in Enterprise Search Technology (Webinar). Presenter: Erik Hatcher, Lucene/Solr Committer and PMC member.
Date: Wednesday, March 27, 2013
Time: 10:00am Pacific Time
From the signup page:
Lucene/Solr 4 is a ground breaking shift from previous releases. Solr 4.0 dramatically improves scalability, performance, reliability, and flexibility. Lucene 4 has been extensively upgraded. It now supports near real-time (NRT) capabilities that allow indexed documents to be rapidly visible and searchable. Additional Lucene improvements include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage.
The improvements in Lucene have automatically made Solr 4 substantially better. But Solr has also been considerably improved and magnifies these advances with a suite of new “SolrCloud” features that radically improve scalability and reliability.
In this Webinar, you will learn:
- What are the Key Feature Enhancements of Lucene/Solr 4, including the new distributed capabilities of SolrCloud
- How to Use the Improved Administrative User Interface
- How Sharding has been improved
- What are the improvements to GeoSpatial Searches, Highlighting, Advanced Query Parsers, Distributed search support, Dynamic core management, Performance statistics, and searches for rare values, such as Primary Key
Great way to get up to speed on the latest release of Lucene/Solr!
Posted in Indexing, Lucene, Solr | No Comments »
Tuesday, March 19th, 2013
MongoDB 2.4 Release
From the webpage:
Developer Productivity
…
- Capped Arrays simplify development by making it easy to incorporate fixed, sorted lists for features like leaderboards and logging.
- Geospatial Enhancements enable new use cases with support for polygon intersections and analytics based on geospatial data.
- Text Search provides a simplified, integrated approach to incorporating search functionality into apps (Note: this feature is currently in beta release).
Operations
…
- Hash-Based Sharding simplifies deployment of large MongoDB systems.
- Working Set Analyzer makes capacity planning easier for ops teams.
- Improved Replication increases resiliency and reduces administration.
- Mongo Client creates an intuitive, consistent feature set across all drivers.
Performance
…
- Faster Counts and Aggregation Framework Refinements make it easier to leverage real-time, in-place analytics.
- V8 JavaScript Engine offers better concurrency and faster performance for some operations, including MapReduce jobs.
Monitoring
…
- On-Prem Monitoring provides comprehensive monitoring, visualization and alerting on more than 100 operational metrics of a MongoDB system in real time, based on the same application that powers 10gen’s popular MongoDB Monitoring Service (MMS). On-Prem Monitoring is only available with MongoDB Enterprise.
Security
…
- Kerberos Authentication enables enterprise and government customers to integrate MongoDB into existing enterprise security systems. Kerberos support is only available in MongoDB Enterprise.
- Role-Based Privileges allow organizations to assign more granular security policies for server, database and cluster administration.
You can read more about the improvements to MongoDB 2.4 in the Release Notes. Also, MongoDB 2.4 is available for download on MongoDB.org.
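As I read the 2.4 release notes, the capped arrays item refers to new $each, $sort and $slice modifiers on $push. Together they append, sort, and then keep only the last N elements; in 2.4, $slice takes zero or a negative number.

The Python sketch below builds such an update document, for example to pass to a driver's update call, and simulates its effect locally so the leaderboard behaviour is visible. The field names scores and score are hypothetical.

```python
def capped_push_update(field, new_entries, sort_key, keep):
    """Update document for a 2.4-style capped array: append new_entries,
    sort ascending on sort_key, then keep only the last `keep` elements."""
    return {
        "$push": {
            field: {
                "$each": list(new_entries),
                "$sort": {sort_key: 1},
                "$slice": -keep,
            }
        }
    }


def apply_capped_push(array, new_entries, sort_key, keep):
    """Local simulation of what the server does with that update."""
    combined = sorted(list(array) + list(new_entries), key=lambda d: d[sort_key])
    start = max(0, len(combined) - keep)  # a negative $slice keeps the tail
    return combined[start:]
```

Sorting ascending and keeping the tail leaves the highest scores, with the highest last.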
Lots to look at in MongoDB 2.4!
But I am curious about the beta text search feature.
MongoDB Text Search: Experimental Feature in MongoDB 2.4 says:
Text search (SERVER-380) is one of the most requested features for MongoDB. 10gen is working on an experimental text-search feature, to be released in v2.4, and we’re already seeing some talk in the community about the native implementation within the server. We view this as an important step towards fulfilling a community need.
MongoDB text search is still in its infancy and we encourage you to try it out on your datasets. Many applications use both MongoDB and Solr/Lucene, but realize that there is still a feature gap. For some applications, the basic text search that we are introducing may be sufficient. As you get to know text search, you can determine when MongoDB has crossed the threshold for what you need. (emphasis added)
So, why isn’t MongoDB incorporating Solr/Lucene instead of a home grown text search feature?
Seems like users could leverage their Solr/Lucene skills with their MongoDB installations.
Yes?
Posted in Lucene, MongoDB, NoSQL, Searching, Solr | 1 Comment »
Saturday, March 16th, 2013
Lux
From the readme:
Lux is an open source XML search engine formed by fusing two excellent technologies: the Apache Lucene/Solr search index and the Saxon XQuery/XSLT processor.
At its core, Lux provides XML-aware indexing, an XQuery 1.0 optimizer that rewrites queries to use the indexes, and a function library for interacting with Lucene via XQuery. These capabilities are tightly integrated with Solr, and leverage its application framework in order to deliver a REST service and application server.
The REST service is accessible to applications written in almost any language, but it will be especially convenient for developers already using Solr, for whom Lux operates as a Solr plugin that provides query services using the same REST APIs as other Solr search plugins, but using a different query language (XQuery). XML documents may be inserted (and updated) using standard Solr REST calls: XML-aware indexing is triggered by the presence of an XML-aware field in a document. This means that existing application frameworks written in many different languages are positioned to use Lux as a drop-in capability for indexing and querying semi-structured content.
The application server is a great way to get started with Lux: it provides the ability to write a complete application in XQuery and XSLT with data storage backed by Lucene.
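One simple form of "XML-aware indexing" is to index each element's text together with its path. A query can then target /book/title rather than matching text anywhere in the document.

The Python sketch below does that with ElementTree. It illustrates the idea only and is not Lux's actual index layout. It ignores attributes and mixed-content tail text, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_values(xml_text):
    """Yield (path, text) for every element with non-blank direct text."""
    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        if elem.text and elem.text.strip():
            yield path, elem.text.strip()
        for child in elem:
            yield from walk(child, path)
    yield from walk(ET.fromstring(xml_text), "")


def build_path_index(docs):
    """Map (path, value) -> sorted doc ids: a toy stand-in for index terms."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for pv in path_values(xml_text):
            index[pv].add(doc_id)
    return {k: sorted(v) for k, v in index.items()}
```

With a structure like this, an optimizer can answer a path-constrained query with an index lookup instead of parsing every document, which is the kind of rewrite the readme describes.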
If you are looking for experience with XQuery and Lucene/Solr, look no further!
May be a good excuse for me to look at defining equivalence statements using XQuery.
I first saw this in a tweet by Michael Kay.
Posted in Lucene, Saxon, Solr, XQuery, XSLT | No Comments »
Saturday, March 16th, 2013
Apache Solr 4 Cookbook (Win a free copy)
Deadline 28.03.2013.
From the post:
Readers would be pleased to know that we have teamed up with Packt Publishing to organize a Giveaway of the Apache Solr 4 Cookbook. Two lucky winners will win a copy of the book (in eBook format). Keep reading to find out how you can be one of the Lucky Winners.
Let’s start with a little reminder about the book:
- Learn how to make Apache Solr search faster, more complete, and comprehensively scalable
- Solve performance, setup, configuration, analysis, and query problems in no time
- Get to grips with, and master, the new exciting features of Apache Solr 4
Read more about this book and download free Sample Chapter.
How to Enter?
All you need to do is head on over to the book page (Apache Solr 4 Cookbook) and look through the product description of the book and drop a line via the comments below this post to let us know what interests you the most about this book. It’s that simple.
Product Description: http://www.packtpub.com/apache-solr-4-cookbook/book
Deadline
The contest will close on 28.03.2013. Winners will be contacted by email, so be sure to use your real email address when you comment!
Who Will Win?
The winners will be chosen by the Solr.pl team randomly from readers entering the competition that replied with an on-topic comment.
If you want to increase your chances of winning, write a small review of the book using the sample chapter on Amazon.com and also forward the same post to bhavins@packtpub.com.
Wouldn’t you know, I see this contest two (2) days after purchasing an electronic copy of this book!
I may enter the contest anyway so I can forward someone the “extra” copy of it.
Posted in Contest, Solr | No Comments »
Friday, March 15th, 2013
Using Solr’s New Atomic Updates by Scott Stults.
From the post:
A while ago we created a sample index of US patent grants roughly 700k documents big. Adjacently we pulled down the corresponding multi-page TIFFs of those grants and made PNG thumbnails of each page. So far, so good.
You see, we wanted to give our UI the ability to flip through those thumbnails and we wanted it to be fast. So our original design had a client-side function that pulled down the first thumbnail and then tried to pull down subsequent thumbnails until it ran out of pages or cache. That was great for a while, but it didn’t scale because a good portion of our requests were for non-existent resources.
Things would be much better if the UI got the page count along with the other details of the search hits. So why not update each record in Solr with that?
A new feature in Solr and one that I suspect will be handy, for example, for updating an index of associations.
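The update Scott describes needs only the id and the one changed field. Below is a minimal Python sketch of a Solr 4 JSON atomic update, together with the POST request that would carry it. "set" replaces a field value; "add" and "inc" also exist. The field name page_count, the ids and the base URL (the default example server) are assumptions, and nothing is sent.

```python
import json
import urllib.request


def page_count_updates(page_counts):
    """One atomic-update document per id; only page_count is touched."""
    docs = [{"id": doc_id, "page_count": {"set": n}}
            for doc_id, n in sorted(page_counts.items())]
    return json.dumps(docs)


def build_update_request(base_url, page_counts):
    """Prepare (but do not send) the POST to Solr's /update handler."""
    return urllib.request.Request(
        f"{base_url}/update?commit=true",
        data=page_count_updates(page_counts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would submit it. As with any Solr 4 atomic update, the document's other fields must be stored so Solr can rebuild the rest of the record.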
Posted in Indexing, Solr | No Comments »
Tuesday, March 12th, 2013
Apache Lucene 4.2
Download
Changes
Apache Solr 4.2
Download
Changes
See the Lucene homepage for a summary of the 4.2 changes in Lucene and Solr.
Warning: Reference good only until the next Lucene/Solr release.
Posted in Lucene, Solr | No Comments »
Sunday, March 10th, 2013
Solr: Custom Ranking with Function Queries by Sujit Pal.
From the post:
Solr has had support for Function Queries since version 3.1, but before sometime last week, I did not have a use for it. Which is probably why when I would read about Function Queries, they would seem like a nice idea, but not interesting enough to pursue further.
Most people get introduced to Function Queries through the bf parameter in the DisMax Query Parser or through the geodist function in Spatial Search. So far, I haven’t had the opportunity to personally use either feature in a real application. My introduction to Function Queries was through a problem posed to me by one of my coworkers.
The problem was as follows. We want to be able to customize our search results based on what a (logged-in) user tells us about himself or herself via their profile. This could be gender, age, ethnicity and a variety of other things. On the content side, we can annotate the document with various features corresponding to these profile features. For example, we can assign a score to a document that indicates its appeal/information value to males versus females that would correspond to the profile’s gender.
So the idea is that if we know that the profile is male, we should boost the documents that have a high male appeal score and deboost the ones that have a high female appeal score, and vice versa if the profile is female. This idea can be easily extended for multi-category features such as ethnicity as well. In this post, I will describe a possible implementation that uses Function Queries to rerank search results using male/female appeal document scores.
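Here is a rough Python sketch of Sujit's idea. It assumes two hypothetical single-valued numeric fields on each document, male_appeal and female_appeal.

- search_params builds an edismax request whose bf (an additive boost function) adds the difference in appeal for the user's profile.
- rerank models the same arithmetic locally.

The local model treats the boost as simply added to the score, which ignores Lucene's query normalization. Sujit's post may implement it differently.

```python
from urllib.parse import urlencode

# Hypothetical single-valued numeric fields holding each document's appeal.
APPEAL_FIELDS = {"male": "male_appeal", "female": "female_appeal"}


def own_and_other(gender):
    other = "female" if gender == "male" else "male"
    return APPEAL_FIELDS[gender], APPEAL_FIELDS[other]


def search_params(q, gender, weight=1.0):
    """edismax request adding weight * (own appeal - other appeal) via bf."""
    own, other = own_and_other(gender)
    return urlencode({"q": q, "defType": "edismax", "bf": f"sub({own},{other})^{weight}"})


def rerank(hits, gender, weight=1.0):
    """Local model of the boost, treated as purely additive to the score."""
    own, other = own_and_other(gender)
    return sorted(hits, key=lambda h: h["score"] + weight * (h[own] - h[other]), reverse=True)
```

Subtracting the other appeal score boosts and deboosts in one function. That covers both halves of the "vice versa" in the quote.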
Does your topic map deliver information based on user characteristics?
Have you re-invented the ranking or are you using an off-the-shelf solution?
Posted in Lucene, Ranking, Solr | No Comments »
Saturday, March 9th, 2013
How Solr powers search on America’s largest flash sale site by Ade Trenaman.
The post caught my attention with “flash sale,” which I had to look up.
Even after discovering it means “deal of the day,” the slides were interesting.
Especially the commentary on synonym lists!
What someone else considers to be a synonym may not be one for your audience.
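One way to act on that is to read a synonym list the way Solr's synonym filter would, then review what each term actually expands to before adopting the list.

The Python sketch below parses the Solr synonyms.txt format:

- comma-separated equivalents, expanded as with expand="true";
- "=>" for one-way mappings;
- lines starting with # are comments.

Escaping is ignored, and the function name is hypothetical.

```python
def parse_synonyms(text):
    """Parse Solr-format synonym lines into term -> set of resulting terms.

    "a, b, c"   : equivalent terms; each maps to all of them
    "x, y => z" : explicit mapping; x and y are replaced by z
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = {t.strip() for t in right.split(",") if t.strip()}
        else:
            sources = [t.strip() for t in line.split(",") if t.strip()]
            targets = set(sources)
        for s in sources:
            mapping.setdefault(s, set()).update(targets)
    return mapping
```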
Posted in Lucene, Solr | No Comments »
Saturday, February 23rd, 2013
Indexing StackOverflow In Solr by John Berryman.
From the post:
One thing I really like about Solr is that it’s super easy to get started. You just download solr, fire it up, and then after following the 10 minute tutorial you’ll have a basic understanding of indexing, updating, searching, faceting, filtering, and generally using Solr. But, you’ll soon get bored of playing with the 50 or so demo documents. So, quit insulting Solr with this puny, measly, wimpy dataset; Index something of significance and watch what Solr can do.
One of the most approachable large datasets is the StackExchange data set which most notably includes all of StackOverflow, but also contains many of the other StackExchange sites (Cooking, English Grammar, Bicycles, Games, etc.) So if StackOverflow is not your cup of tea, there’s bound to be a data set in there that jives more with your interests.
Once you’ve pulled down the data set, then you’re just moments away from having your own SolrExchange index. Simply unzip the dataset that you’re interested in (7-zip format zip files), pull down this git repo which walks you through indexing the data, and finally, just follow the instructions in the README.md.
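Here is a feel for what the indexing step involves. Each post in the dump's Posts.xml is a single row element whose attributes carry the data, with the tags packed into one string such as <solr><lucene>.

The streaming Python sketch below turns those rows into Solr-ready dicts. The Solr field names are hypothetical and not necessarily the ones used in John's repository.

```python
import re
import xml.etree.ElementTree as ET


def posts_to_solr_docs(source):
    """Stream <row> elements from a Posts.xml dump into dicts.

    `source` is a path or binary file object; each row is cleared after
    use so memory stays roughly flat on large dumps.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        doc = {
            "id": elem.get("Id"),
            "post_type": elem.get("PostTypeId"),
            "title": elem.get("Title", ""),
            "body": elem.get("Body", ""),
            "tags": re.findall(r"<([^>]+)>", elem.get("Tags", "")),
        }
        elem.clear()
        yield doc
```

Streaming matters here: the StackOverflow Posts.xml is far too large to load as a single tree.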
Interesting data set for Solr.
More importantly, a measure of how easy it needs to be to get started with software.
Software like topic maps.
Suggestions?
Posted in Indexing, Solr | No Comments »
Sunday, February 17th, 2013
Developing Your Own Solr Filter part 2
From the post:
In the previous entry “Developing Your Own Solr Filter” we’ve shown how to implement a simple filter and how to use it in Apache Solr. Recently, one of our readers asked if we can extend the topic and show how to write more than a single token into the token stream. We decided to go for it and extend the previous blog entry about filter implementation.
What better way to start the week!
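In Lucene the filter itself is Java. A common pattern works like this:

1. incrementToken() emits the original token.
2. The filter saves the token's state with captureState().
3. On the next call it restores that state and writes the extra term, often with a position increment of 0 so both terms share a position.

The Python sketch below models only the resulting token stream, not Lucene's API. The Token class and the function names are hypothetical, and the solr.pl post may choose different position increments.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    position_increment: int = 1


def with_extra_tokens(tokens, extra_for):
    """Emit each token, then any extra terms for it at the same position.

    extra_for(text) returns the additional terms (possibly none); they get
    position_increment=0 so they stack on the original token's position.
    """
    for tok in tokens:
        yield tok
        for extra in extra_for(tok.text):
            yield Token(extra, position_increment=0)


def positions(tokens):
    """Absolute positions, accumulating increments from -1 (first token at 0)."""
    pos = -1
    out = []
    for tok in tokens:
        pos += tok.position_increment
        out.append((tok.text, pos))
    return out
```

Keeping the extra term at the same position means phrase queries spanning the original token still match.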
Posted in Lucene, Solr | No Comments »
Saturday, February 16th, 2013
Poll: Which Solr version are you using?
From the post:
With Solr 4.1 recently released, let’s see which version(s) of Solr people are using. Please tweet it to help us get more votes and better stats.
Voted and Tweeted: Total Elapsed Time: 6 Seconds.
Can you do better?
Seriously, do the poll and retweet (whether you use Solr or not).
It’s for a good cause.
Posted in Solr | No Comments »
Friday, February 15th, 2013
Solr Unleashed: A Hands-On Workshop for Building Killer Search Apps (LucidWorks)
From the post:
Having consulted with clients on Lucene and Solr for the better part of a decade, we’ve seen the same mistakes made over and over again: applications built on shaky foundations, stretched to the breaking point. In this two day class, learn from the experts about how to do it right and make sure your apps are rock solid, scalable, and produce relevant results. Also check the course outline.
The course looks great, but if you don’t have the fees, I have reproduced the course outline below.
Using online documentation, mailing lists and other online resources, track the outline and fill it in for yourself.
If you want a real challenge, work through the outline and then build a Solr application around the outline.
To keep your newly acquired skills polished to a fine sheen.
1. The Fundamentals
- About Solr
- Installing and running Solr
- Adding content to Solr
- Reading a Solr XML response
- Changing parameters in the URL
- Using the browse interface
2. Searching
- Sorting results
- Query parsers
- More queries
- Hardwiring request parameters
- Adding fields to default search
- Faceting on fields
- Range faceting
- Date range faceting
- Hierarchical faceting
- Result grouping
3. Indexing
- Adding your own content to Solr
- Deleting data from Solr
- Building a bookstore search
- Adding book data
- Exploring the book data
- Dedupe updateprocessor
4. Updating your schema
- Adding fields to the schema
- Analyzing text
5. Relevance
- Field weighting
- Phrase queries
- Function queries
6. Extended features
- More-like-this
- Fuzzier search
- Sounds-like
- Geospatial
- Spell checking
- Suggestions
- Highlighting
7. Multilanguage
- Working with English
- Working with other languages
- Non-whitespace languages
- Identifying languages
- Language specific sorting
8. SolrCloud
- Introduction
- How SolrCloud works
- Commit strategies
- ZooKeeper
- Managing Solr config files
Not the same as the class but will help you ask better questions of LucidWorks experts when you need them.
Posted in Search Engines, Searching, Solr | No Comments »
Monday, February 11th, 2013
New Book: ElasticSearch Server!
In the blog post dedicated to Solr 4.0 Cookbook we give a small hint that cookbook was not the only project that occupies our free time. Today we can officially say that a few months of hard work are slowly coming to an end – we can announce a new book about one of the greatest pieces of open-source software – ElasticSearch Server book!
ElasticSearch Server book describes the most important and commonly used features of ElasticSearch (at least from our perspective). Examples of topics discussed:
- ElasticSearch installation and configuration
- Static and dynamic index structure creation
- Querying ElasticSearch with Query DSL explained
- Using filters
- Faceting
- Routing
- Indexing data that is not flat
BTW, some wag posted a comment saying a Solr blog should not talk about ElasticSearch.
I bet they don’t see the sunshine very often from that position either.
Posted in ElasticSearch, Lucene, Solr | No Comments »
Friday, January 25th, 2013
Make your Filters Match: Faceting in Solr by Florian Hopf.
From the post:
Facets are a great search feature that let users easily navigate to the documents they are looking for. Solr makes it really easy to use them though when naively querying for facet values you might see some unexpected behaviour. Read on to learn the basics of what is happening when you are passing in filter queries for faceting. Also, I’ll show how you can leverage local params to choose a different query parser when selecting facet values.
Introduction
Facets are a way to display categories next to a user’s search results, often with a count of how many results are in this category. The user can then select one of those facet values to retrieve only those results that are assigned to this category. This way he doesn’t have to know what category he is looking for when entering the search term as all the available categories are delivered with the search results. This approach is really popular on sites like Amazon and eBay and is a great way to guide the user.
Solr brought faceting to the Lucene world and arguably the feature was an important driving factor for its success (Lucene 3.4 introduced faceting as well). Facets can be built from terms in the index, custom queries and ranges though in this post we will only look at field facets.
Excellent introduction to facets in Solr.
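Florian covers two pieces: counting facet values, and filtering on a selected value. Both can be sketched in Python.

- facet_counts mimics a field facet over in-memory documents.
- facet_params builds the request and applies the selected value through the term query parser ({!term f=...}). A value like "Music & Movies" is then matched literally, with no query-syntax escaping; the URL encoding is still handled by urlencode.

The sketch assumes the facet field is an untokenized string field. The field names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode


def facet_counts(docs, field):
    """Count documents per value of a multi-valued field, like a field facet."""
    return Counter(v for d in docs for v in d.get(field, []))


def facet_params(q, field, selected=None):
    """Request params for a field facet, optionally filtered on one value."""
    params = [("q", q), ("facet", "true"), ("facet.field", field)]
    if selected is not None:
        # The term parser takes the value literally: no escaping of spaces or '&'.
        params.append(("fq", f"{{!term f={field}}}{selected}"))
    return urlencode(params)
```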
The amount of enterprise-quality indexing and search software that is freely available makes me wonder why the average citizen worries about privacy.
There are far more average citizens than denizens of c-suites, government offices, and the like.
Shouldn’t they be the ones worrying about what the rest of us are compiling together?
Instead of secret, Stasi-like archives, a public archive, with the observations of ordinary citizens.
Posted in Facets, Lucene, Solr | No Comments »
Thursday, January 24th, 2013
Posted in Indexing, Lucene, Searching, Solr | No Comments »
Thursday, January 24th, 2013
Solr vs. ElasticSearch: Part 6 – User & Dev Communities by Rafał Kuć.
From the post:
One of the questions after my talk during the recent ApacheCon EU was what I thought about the communities of the two search engines I was comparing. Not surprisingly, this is also a question we often address in our consulting engagements. As a part of our Apache Solr vs ElasticSearch post series we decided to step away from the technical aspects of SolrCloud vs. ElasticSearch and look at the communities gathered around these two projects. If you haven’t read the previous posts about Apache Solr vs. ElasticSearch here are pointers to all of them:
Rafał compares user activity (discussion lists), resources available, search trends, code statistics.
My take away is that both projects have very vibrant and responsive user and development communities.
You?
Posted in ElasticSearch, Searching, Solr | No Comments »
Saturday, January 12th, 2013
JUnit Rule for ElasticSearch by Florian Hopf.
From the post:
While I am using Solr a lot in my current engagement I recently started a pet project with ElasticSearch to learn more about it. Some of its functionality is rather different from Solr so there is quite some experimentation involved. I like to start small and implement tests if I like to find out how things work (see this post on how to write tests for Solr).
ElasticSearch internally uses TestNG and the test classes are not available in the distributed jar files. Fortunately it is really easy to start an ElasticSearch instance from within a test so it’s no problem to do something similar in JUnit. Felix Müller posted some useful code snippets on how to do this, obviously targeted at a Maven build. The ElasticSearch instance is started in a setUp method and stopped in a tearDown method:
Information on testing with Solr and ElasticSearch is too useful to pass up.
Besides, it reminded me of the need to have testable merging instances, both for TMDM merging as well as more complex merging scenarios.
Posted in ElasticSearch, Solr | No Comments »
Friday, January 11th, 2013
Solr vs ElasticSearch: Part 5 – Management API Capabilities by Rafał Kuć.
From the post:
In previous posts, all listed below, we’ve discussed general architecture, full text search capabilities and facet aggregations possibilities. However, till now we have not discussed any of the administration and management options and things you can do on a live cluster without any restart. So let’s get into it and see what Apache Solr and ElasticSearch have to offer.
Rafał continues this excellent series on Solr and ElasticSearch and promises there is more to come!
This series sets a high standard for posts comparing search capabilities!
Posted in ElasticSearch, Search Engines, Searching, Solr | No Comments »
Monday, December 24th, 2012
Microsoft Open Technologies releases Windows Azure support for Solr 4.0 by Brian Benz.
From the post:
Microsoft Open Technologies is pleased to share the latest update to the Windows Azure self-deployment option for Apache Solr 4.0.
Solr 4.0 is the first release to use the shared 4.x branch for Lucene & Solr and includes support for SolrCloud functionality. SolrCloud allows you to scale a single index via replication over multiple Solr instances running multiple SolrCores for massive scaling and redundancy.
To learn more about Solr 4.0, have a look at this 40 minute video covering Solr 4 Highlights, by Mark Miller of LucidWorks from Apache Lucene Eurocon 2011.
To download and install Solr on Windows Azure visit our GitHub page to learn more and download the SDK.
Another alternative for implementing the best of Lucene/Solr on Windows Azure is provided by our partner LucidWorks. LucidWorks Search on Windows Azure delivers a high-performance search solution that enables quick and easy provisioning of Lucene/Solr search functionality without any need to install, manage or operate Lucene/Solr servers, and it supports pre-built connectors for various types of enterprise data, structured data, unstructured data and web sites.
Beyond the positive impact for Solr and Azure in general, this means your Solr skills will be useful in new places.
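On a self-managed SolrCloud cluster, such as the self-deployment option above, collections are created through the Collections API. Below is a small Python helper that builds the CREATE call. The base URL and collection name are hypothetical, and the available parameters vary somewhat across 4.x releases.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """URL for a SolrCloud Collections API CREATE call (not sent)."""
    query = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"{base_url}/admin/collections?{query}"
```

The cluster needs numShards × replicationFactor cores in total, so 2 shards with 2 replicas each means 4 cores spread across your instances.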
Posted in Azure Marketplace, Microsoft, Solr | No Comments »