Multi level composite-id routing in SolrCloud by Anshum Gupta.
From the post:
SolrCloud over the last year has evolved into a rather intelligent system with a lot of interesting and useful features going in. One of them has been the work for intelligent routing of documents and queries.
SolrCloud started off with basic hash-based routing in 4.0. It then got interesting with the composite id router being introduced with 4.1, which enabled smarter routing of documents and queries to achieve things like multi-tenancy and co-location. With 4.7, the 2-level composite id routing will be expanded to work for 3 levels (SOLR-5320).
A good post about how document routing generally works can be found here. Now, let’s look at how the composite-id routing extends to 3-levels and how we can really use it to query specific documents in our corpus.
An important thing to note here is that the 3-level router only extends the 2-level one. It’s the same router and the same Java class, i.e., you don’t really need to ‘set it up’.
Where would you want to use the multi-level composite-id router?
The multi-level implementation further extends the support for multi-tenancy and co-location of documents provided by the already existing composite-id router. Consider a scenario where a single setup is used to host data for multiple applications (or departments) and each of them has a set of users. Each user further has documents associated with them. Using a 3-level composite-id router, a user can route the documents to the right shards at index time without having to really worry about the actual routing. This would also enable users to target queries for specific users or applications using the shard.keys parameter at query time.
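To make the application/user/document scenario concrete, here is a minimal sketch in Python of how the ids and the query-time routing key might be assembled. It assumes the "!" separator used by Solr's composite-id router and the shard.keys parameter named in the post; the helper names (composite_id, shard_keys, query_params) and the sample values (myapp, dummyuser, doc1) are hypothetical and are not part of any Solr client library. Nothing is sent to a server; the code only builds strings.

```python
from urllib.parse import urlencode

SEPARATOR = "!"  # assumed: the separator used by Solr's composite-id router


def composite_id(app, user, doc_id):
    """Build a 3-level id of the form app!user!doc_id (hypothetical helper)."""
    for part in (app, user):
        if SEPARATOR in part:
            raise ValueError("routing prefixes must not contain %r" % SEPARATOR)
    return SEPARATOR.join((app, user, doc_id))


def shard_keys(app, user=None):
    """Build a routing key: 'app!' for a whole application, 'app!user!' for one user."""
    key = app + SEPARATOR
    if user is not None:
        key += user + SEPARATOR
    return key


def query_params(q, app, user=None):
    """URL-encode a query string that carries the shard.keys routing hint."""
    return urlencode({"q": q, "shard.keys": shard_keys(app, user)})


if __name__ == "__main__":
    # Index time: the id itself carries the routing information.
    print(composite_id("myapp", "dummyuser", "doc1"))
    # Query time: restrict the search to one user's documents, or to a whole application.
    print(query_params("*:*", "myapp", "dummyuser"))
    print(query_params("*:*", "myapp"))
```

The point of the sketch is that routing lives entirely in the id prefix: the client never names a shard, it only repeats the same prefix at query time.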
Does that sound related to topic maps?
What if you remembered that “document” for Lucene means:
Documents are the unit of indexing and search. A Document is a set of fields. Each field has a name and a textual value. A field may be stored with the document, in which case it is returned with search hits on the document. Thus each document should typically contain one or more stored fields which uniquely identify it.
Probably not an efficient way to handle multiple identifiers, but that depends on your use case.