Importing data from another Solr
Luca Cavanna writes:
The Data Import Handler is a popular way to import data into a Solr instance. It provides out-of-the-box integration with databases, XML sources, e-mails and documents. A Solr instance often has multiple sources, and importing the data is usually expensive in terms of time and resources. Meanwhile, if you change the schema you will probably need to reindex all your data; the same happens when you want to upgrade to a Solr version that is not backward compatible. We can call this the “re-index bottleneck”: once you’ve done the first data import involving all your external sources, you will never want to do it the same way again, especially on large indexes and complex systems.
Retrieving stored fields from a running Solr
An easier solution is to query your existing Solr instance, retrieve all of its stored fields and reindex them on a new instance. Anyone can write their own script to do this, but wouldn’t it be useful to have this functionality out of the box in Solr? That is why the SOLR-1499 issue was created about two years ago. The idea was to have a new EntityProcessor which retrieves data from another Solr instance using Solrj. Recently, effort has been put into getting this feature committed to Solr’s dataimport contrib module: bugs have been fixed and test coverage has been increased. Hopefully this feature will be released with Solr 3.5.
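To make the "write your own script" approach concrete, here is a minimal sketch in Python. It is not the SOLR-1499 implementation. The host names, core names and page size are hypothetical, and it assumes both instances expose the JSON `/select` and `/update` handlers (older releases may use a different update path). The `_version_` field is dropped because it is assumed to be an internal field that should not be copied across.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical locations of the old and new Solr instances.
SOURCE = "http://localhost:8983/solr/old_core"
TARGET = "http://localhost:8984/solr/new_core"


def select_url(base, start, rows):
    """Build a match-all query URL that returns every stored field."""
    params = {"q": "*:*", "fl": "*", "wt": "json", "start": start, "rows": rows}
    return base + "/select?" + urllib.parse.urlencode(params)


def fetch_page(base, start, rows):
    """Fetch one page of documents from the source instance."""
    with urllib.request.urlopen(select_url(base, start, rows)) as resp:
        return json.load(resp)["response"]["docs"]


def post_docs(base, docs):
    """Send a batch of documents to the target instance and commit."""
    request = urllib.request.Request(
        base + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request).close()


def reindex(fetch, post, rows=500):
    """Page through the source until it is exhausted; return the doc count.

    `fetch(start, rows)` returns a list of documents and `post(docs)`
    indexes a batch. They are passed in so the loop is easy to test.
    """
    start = total = 0
    while True:
        docs = fetch(start, rows)
        if not docs:
            return total
        # Drop the internal version field (assumption: it must not be copied).
        post([{k: v for k, v in d.items() if k != "_version_"} for d in docs])
        total += len(docs)
        start += rows
```

It could be run as `reindex(lambda s, r: fetch_page(SOURCE, s, r), lambda docs: post_docs(TARGET, docs))`. Only stored fields come back from a query, so any field that is indexed but not stored cannot be recovered this way.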
A look ahead to the next release of Solr!