Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 26, 2015

Fun with facets in ggplot2 2.0

Filed under: Facets,Ggplot2,R — Patrick Durusau @ 1:22 pm

Fun with facets in ggplot2 2.0 by Bob Rudis.

From the post:

ggplot2 2.0 provides new facet labeling options that make it possible to create beautiful small multiple plots or panel charts without resorting to icky grob manipulation.

Very appropriate for this year in Georgia (US) at any rate. Facets are used to display temperature by year and temperature versus Kwh by year.

The high today, 26th of December, 2015, is projected to be 77°F.

Sigh, that’s just not December weather.

February 15, 2014

Easy Hierarchical Faceting and display…

Filed under: Facets,JQuery,Solr — Patrick Durusau @ 1:44 pm

Easy Hierarchical Faceting and display with Solr and jQuery (and a tiny bit of Python) by Grant Ingersoll.

From the post:

Visiting two major clients in two days last week, each presented me with the same question: how do we better leverage hierarchical information like taxonomies, file paths, etc. in LucidWorks Search (LWS) (and Apache Solr) their applications, such that they could display something like the following image in their UI:

[image: facets]

Since this is pretty straight forward (much of it is captured already on the Solr Wiki) and I have both the client-side and server side code for this already in a few demos we routinely give here at Lucid, I thought I would write it up as a blog instead of sending each of them a one-off answer. I am going to be showing this work in the context of the LWS Financial Demo, for those who wish to follow along at the code level. We’ll use it to show a little bit of hierarchical faceting that correlates the industry sector of an S&P 500 company with the state and city of the HQ of that company. In your particular use case, you may wish to use it for organizing content in filesystems, websites, taxonomies or pretty much anything that exhibits, as the name implies, hierarchical relationships.

Who knew? Hierarchies are just like graphs! They’re everywhere! 😉

Grant closes by suggesting Solr analysis capabilities for faceting would be a nice addition to Solr. Are you game?
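
For readers who want the flavor of hierarchical faceting without opening the demo, here is a minimal Python sketch. It assumes the depth-prefixed token encoding (each level of a path is stored as "depth/path"), which is one common approach and not necessarily what Grant's demo does. The sector/state/city values and the function names are made up for illustration.

```python
from collections import Counter


def hierarchy_tokens(path):
    """Return one depth-prefixed token per level of a hierarchical value.

    ["Energy", "TX", "Houston"] becomes
    ["0/Energy", "1/Energy/TX", "2/Energy/TX/Houston"].
    """
    return [f"{depth}/" + "/".join(path[: depth + 1]) for depth in range(len(path))]


def child_prefix(selected):
    """Prefix matching only the direct children of the selected path."""
    return f"{len(selected)}/" + "".join(part + "/" for part in selected)


def child_counts(docs, selected):
    """Count documents under each direct child of the selected path."""
    prefix = child_prefix(selected)
    counts = Counter()
    for path in docs:
        for token in hierarchy_tokens(path):
            if token.startswith(prefix):
                counts[token] += 1
    return dict(counts)


# Hypothetical sector / state / city data, one path per company.
companies = [
    ["Energy", "TX", "Houston"],
    ["Energy", "TX", "Dallas"],
    ["Energy", "OK", "Tulsa"],
    ["Tech", "CA", "Cupertino"],
]
top_level = child_counts(companies, [])
under_energy = child_counts(companies, ["Energy"])
```

In a search engine the same idea would usually be expressed as a prefix restriction on the facet field, so the UI only asks for the next level down when a node is expanded.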

January 21, 2014

Geospatial (distance) faceting…

Filed under: Facets,Geographic Data,Georeferencing,Lucene — Patrick Durusau @ 7:32 pm

Geospatial (distance) faceting using Lucene’s dynamic range facets by Mike McCandless.

From the post:

There have been several recent, quiet improvements to Lucene that, taken together, have made it surprisingly simple to add geospatial distance faceting to any Lucene search application, for example:

  < 1 km (147)
  < 2 km (579)
  < 5 km (2775)

Such distance facets, which allow the user to quickly filter their search results to those that are close to their location, has become especially important lately since most searches are now from mobile smartphones.

In the past, this has been challenging to implement because it’s so dynamic and so costly: the facet counts depend on each user’s location, and so cannot be cached and shared across users, and the underlying math for spatial distance is complex.

But several recent Lucene improvements now make this surprisingly simple!

As always, Mike is right on the edge so wait for Lucene 4.7 to try his code out or download the current source.

Distance might not be the only consideration. What if you wanted the shortest distance that did not intercept a known patrol? Or a known patrol within some window of variation?

Distance is still going to be a factor, but the search required may be more complex than just distance.
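
To make the "< 1 km, < 2 km, < 5 km" idea concrete, here is a small Python sketch. It is not Mike's Lucene code: it assumes plain latitude/longitude pairs, uses the haversine formula for distance, and the function names and coordinates are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_facets(user, hits, bounds_km=(1, 2, 5)):
    """Count hits closer than each bound; a hit can fall in several buckets."""
    counts = {bound: 0 for bound in bounds_km}
    for lat, lon in hits:
        d = haversine_km(user[0], user[1], lat, lon)
        for bound in bounds_km:
            if d < bound:
                counts[bound] += 1
    return counts


# Hypothetical user position and hit locations along the equator.
user_location = (0.0, 0.0)
hit_locations = [(0.0, 0.005), (0.0, 0.015), (0.0, 0.04), (0.0, 0.1)]
facets = distance_facets(user_location, hit_locations)
```

The counts depend on the user's position, which is why they cannot be cached and shared across users, as the post points out.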

December 13, 2013

Fast range faceting…

Filed under: Facets,Lucene — Patrick Durusau @ 3:47 pm

Fast range faceting using segment trees and the Java ASM library by Mike McCandless.

From the post:

In Lucene’s facet module we recently added support for dynamic range faceting, to show how many hits match each of a dynamic set of ranges. For example, the Updated drill-down in the Lucene/Solr issue search application uses range facets. Another example is distance facets (< 1 km, < 2 km, etc.), where the distance is dynamically computed based on the user's current location. Price faceting might also use range facets, if the ranges cannot be established during indexing. To implement range faceting, for each hit, we first calculate the value (the distance, the age, the price) to be aggregated, and then lookup which ranges match that value and increment its counts. Today we use a simple linear search through all ranges, which has O(N) cost, where N is the number of ranges. But this is inefficient! ...

Mike lays out a more efficient approach that hasn’t been folded into Lucene yet.
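
As a point of reference, the simple linear search described in the quote can be sketched in a few lines of Python. The second function is a shortcut that only works when the ranges do not overlap; it is a binary search over sorted edges, not the segment-tree approach from Mike's post. All names and sample prices are assumptions for illustration.

```python
import bisect


def count_ranges_linear(values, ranges):
    """Baseline: test every range for every value, O(N) per hit.

    ranges is a list of (label, low, high), low inclusive, high exclusive.
    Ranges may overlap, so one value can increment several counts.
    """
    counts = {label: 0 for label, _, _ in ranges}
    for value in values:
        for label, low, high in ranges:
            if low <= value < high:
                counts[label] += 1
    return counts


def count_disjoint_ranges(values, edges):
    """Non-overlapping case only: bucket i covers [edges[i], edges[i + 1]).

    A binary search finds the bucket in O(log N) per hit.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bisect.bisect_right(edges, value) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


# Hypothetical prices for five hits.
prices = [5, 15, 25, 50, 120]
overlapping = count_ranges_linear(
    prices,
    [("under 20", 0, 20), ("10 to 100", 10, 100), ("100 and up", 100, float("inf"))],
)
disjoint = count_disjoint_ranges(prices, [0, 20, 100, 1000])
```

Overlapping ranges are the hard case, and that is where a smarter structure than either of these earns its keep.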

I like distance from the user as an example of a dynamic facet.

Distance issues are common with mobile devices, but most of those are merchants trying to sell you something.

Not a public database use case, but what if you had an alternative map of a metropolitan area? Where the distance issue was to caches, safe houses, contacts, etc.?

You are double thumbing your mobile device just like everyone else but yours is displaying different data.

You could get false information that is auto-corrected by a local app. 😉

You may have heard the old saying:

The old saying goes that God made men, but Sam Colt made them equal.

We may need to add IT to that list.

October 25, 2013

From Text to Truth:…

Filed under: Facets,Solr — Patrick Durusau @ 6:50 pm

From Text to Truth: Real-World Facets for Multilingual Search by Benson Margulies.

Description:

Solr’s ability to facet search results gives end-users a valuable way to drill down to what they want. But for unstructured documents, deriving facets such as the persons mentioned requires advanced analytics. Even if names can be extracted from documents, the user doesn’t want a “George Bush” facet that intermingles documents mentioning either the 41st and 43rd U.S. Presidents, nor does she want separate facets for “George W. Bush” or even “乔治·沃克·布什” (a Chinese translation) that are limited to just one string. We’ll explore the benefits and challenges of empowering Solr users with real-world facets.

One of the better conference presentations I have seen in quite some time.

This is likely to change how you think about facets. Or at least how you construct them.

If you think of facets as the decoration you see at ecommerce sites, think again.

Enjoy!

May 22, 2013

Dynamic faceting with Lucene

Filed under: Faceted Search,Facets,Indexing,Lucene,Search Engines — Patrick Durusau @ 2:08 pm

Dynamic faceting with Lucene by Michael McCandless.

From the post:

Lucene’s facet module has seen some great improvements recently: sizable (nearly 4X) speedups and new features like DrillSideways. The Jira issues search example showcases a number of facet features. Here I’ll describe two recently committed facet features: sorted-set doc-values faceting, already available in 4.3, and dynamic range faceting, coming in the next (4.4) release.

To understand these features, and why they are important, we first need a little background. Lucene’s facet module does most of its work at indexing time: for each indexed document, it examines every facet label, each of which may be hierarchical, and maps each unique label in the hierarchy to an integer id, and then encodes all ids into a binary doc values field. A separate taxonomy index stores this mapping, and ensures that, even across segments, the same label gets the same id.

At search time, faceting cost is minimal: for each matched document, we visit all integer ids and aggregate counts in an array, summarizing the results in the end, for example as top N facet labels by count.

This is in contrast to purely dynamic faceting implementations like ElasticSearch‘s and Solr‘s, which do all work at search time. Such approaches are more flexible: you need not do anything special during indexing, and for every query you can pick and choose exactly which facets to compute.

However, the price for that flexibility is slower searching, as each search must do more work for every matched document. Furthermore, the impact on near-real-time reopen latency can be horribly costly if top-level data-structures, such as Solr’s UnInvertedField, must be rebuilt on every reopen. The taxonomy index used by the facet module means no extra work needs to be done on each near-real-time reopen.

The dynamic range faceting sounds particularly useful.
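
The index-time versus search-time split in the quote is easier to see in miniature. The following Python sketch is a toy model, not Lucene's API: the Taxonomy class, the function names, and the sample labels are all assumptions. Labels are mapped to integer ids when a document is indexed, and counting at search time is just incrementing slots in an array.

```python
class Taxonomy:
    """Toy stand-in for a taxonomy index: label path <-> integer id."""

    def __init__(self):
        self.ids = {}
        self.labels = []

    def ordinal(self, label):
        """Return the id for a label, assigning the next free id if new."""
        if label not in self.ids:
            self.ids[label] = len(self.labels)
            self.labels.append(label)
        return self.ids[label]


def index_document(taxonomy, facet_paths):
    """Index time: map every level of every hierarchical label to an id."""
    ordinals = set()
    for path in facet_paths:
        for depth in range(1, len(path) + 1):
            ordinals.add(taxonomy.ordinal("/".join(path[:depth])))
    return sorted(ordinals)


def count_facets(taxonomy, matched_docs):
    """Search time: aggregate counts in an array indexed by id."""
    counts = [0] * len(taxonomy.labels)
    for ordinals in matched_docs:
        for ordinal in ordinals:
            counts[ordinal] += 1
    return {taxonomy.labels[i]: c for i, c in enumerate(counts) if c}


taxonomy = Taxonomy()
doc1 = index_document(taxonomy, [("Author", "Lisa"), ("Date", "2012", "03")])
doc2 = index_document(taxonomy, [("Author", "Bob"), ("Date", "2012", "07")])
facet_counts = count_facets(taxonomy, [doc1, doc2])
```

The trade-off is visible even here: the work of resolving labels happens once per document at indexing, so the per-hit cost at search time is a handful of array increments.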

February 25, 2013

Drill Sideways faceting with Lucene

Filed under: Facets,Indexing,Lucene,Searching — Patrick Durusau @ 5:13 am

Drill Sideways faceting with Lucene by Mike McCandless.

From the post:

Lucene’s facet module, as I described previously, provides a powerful implementation of faceted search for Lucene. There’s been a lot of progress recently, including awesome performance gains as measured by the nightly performance tests we run for Lucene:

[3.8X speedup!]

….

For example, try searching for an LED television at Amazon, and look at the Brand field, seen in the image to the right: this is a multi-select UI, allowing you to select more than one value. When you select a value (check the box or click on the value), your search is filtered as expected, but this time the field does not disappear: it stays where it was, allowing you to then drill sideways on additional values. Much better!

LinkedIn’s faceted search, seen on the left, takes this even further: not only are all fields drill sideways and multi-select, but there is also a text box at the bottom for you to choose a value not shown in the top facet values.

To recap, a single-select field only allows picking one value at a time for filtering, while a multi-select field allows picking more than one. Separately, drilling down means adding a new filter to your search, reducing the number of matching docs. Drilling up means removing an existing filter from your search, expanding the matching documents. Drilling sideways means changing an existing filter in some way, for example picking a different value to filter on (in the single-select case), or adding another or’d value to filter on (in the multi-select case). (images omitted)

More details: DrillSideways class being developed under LUCENE-4748.

Just following the progress on Lucene is enough to make you dizzy!
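
If the drill down / drill sideways distinction is still fuzzy, this brute-force Python sketch may help. It is not the DrillSideways class: it simply recomputes the counts for each field with that field's own filter left out, which is the behavior the quote describes. The function names and the television data are invented.

```python
from collections import Counter


def matches(doc, filters, skip=None):
    """True if doc passes every filter, optionally ignoring one dimension."""
    return all(
        doc.get(dim) in allowed
        for dim, allowed in filters.items()
        if dim != skip
    )


def drill_sideways(docs, filters, dims):
    """Return the filtered hits plus per-dimension sideways counts.

    Counts for a dimension ignore that dimension's own filter, so the
    values the user did not pick stay visible with meaningful counts.
    """
    hits = [doc for doc in docs if matches(doc, filters)]
    counts = {}
    for dim in dims:
        pool = (doc for doc in docs if matches(doc, filters, skip=dim))
        counts[dim] = dict(Counter(doc[dim] for doc in pool if dim in doc))
    return hits, counts


# Hypothetical television catalog with a multi-select brand filter.
televisions = [
    {"brand": "Sony", "size": "40"},
    {"brand": "LG", "size": "40"},
    {"brand": "LG", "size": "50"},
    {"brand": "Samsung", "size": "50"},
]
hits, sideways = drill_sideways(televisions, {"brand": {"LG"}}, ["brand", "size"])
```

With brand filtered to LG, the size counts reflect only LG sets, while the brand counts still show every brand, so the user can add another brand without backing out first.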

January 25, 2013

Make your Filters Match: Faceting in Solr [Surveillance By and For The Public?]

Filed under: Facets,Lucene,Solr — Patrick Durusau @ 8:17 pm

Make your Filters Match: Faceting in Solr by Florian Hopf.

From the post:

Facets are a great search feature that let users easily navigate to the documents they are looking for. Solr makes it really easy to use them though when naively querying for facet values you might see some unexpected behaviour. Read on to learn the basics of what is happening when you are passing in filter queries for faceting. Also, I’ll show how you can leverage local params to choose a different query parser when selecting facet values.

Introduction

Facets are a way to display categories next to a users search results, often with a count of how many results are in this category. The user can then select one of those facet values to retrieve only those results that are assigned to this category. This way he doesn’t have to know what category he is looking for when entering the search term as all the available categories are delivered with the search results. This approach is really popular on sites like Amazon and eBay and is a great way to guide the user.

Solr brought faceting to the Lucene world and arguably the feature was an important driving factor for its success (Lucene 3.4 introduced faceting as well). Facets can be build from terms in the index, custom queries and ranges though in this post we will only look at field facets.

Excellent introduction to facets in Solr.
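
For a rough idea of what such a request looks like, here is a Python sketch that builds the query string for a field facet and, when the user picks a value, a filter query that uses local params to select the term query parser. The field name and values are hypothetical, and this is my reading of the technique, not code from Florian's post.

```python
from urllib.parse import urlencode


def facet_query_params(query, facet_field, selected_value=None):
    """Build Solr request parameters for a field facet.

    When a facet value has been selected, add a filter query that uses
    local params ({!term f=...}) so the raw value is matched as a single
    term instead of being parsed by the default query parser.
    """
    params = [("q", query), ("facet", "true"), ("facet.field", facet_field)]
    if selected_value is not None:
        params.append(("fq", f"{{!term f={facet_field}}}{selected_value}"))
    return urlencode(params)


# Hypothetical field and value; a facet value containing a space is the
# kind of input that trips up a naive fq=category:Java Books.
query_string = facet_query_params("solr", "category", "Java Books")
```

The resulting string would be appended to the select URL of whatever Solr core you are querying.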

The amount of enterprise-quality indexing and search software that is freely available makes me wonder why the average citizen worries about privacy.

There are far more average citizens than denizens of c-suites, government offices, and the like.

Shouldn’t they be the ones worrying about what the rest of us are compiling together?

Instead of secret, Stasi-like archives, a public archive, with the observations of ordinary citizens.

October 30, 2012

Solr vs ElasticSearch: Part 4 – Faceting

Filed under: ElasticSearch,Faceted Search,Facets,Solr,SolrCloud — Patrick Durusau @ 2:16 pm

Solr vs ElasticSearch: Part 4 – Faceting by Rafał Kuć.

From the post:

Solr 4 (aka SolrCloud) has just been released, so it’s the perfect time to continue our ElasticSearch vs. Solr series. In the last three parts of the ElasticSearch vs. Solr series we gave a general overview of the two search engines, about data handling, and about their full text search capabilities. In this part we look at how these two engines handle faceting.

Rafał continues his excellent comparison of Solr and ElasticSearch.

Understanding your software options is almost as important as understanding your data.

October 28, 2012

Faceted classification – Drill Up/Down, Out?

Filed under: Faceted Search,Facets — Patrick Durusau @ 10:19 am

Faceted classification

I use search facets in a number of contexts every day.

But today this summary from Wikipedia struck me differently than most days:

A faceted classification system allows the assignment of an object to multiple characteristics (attributes), enabling the classification to be ordered in multiple ways, rather than in a single, predetermined, taxonomic order. A facet comprises “clearly defined, mutually exclusive, and collectively exhaustive aspects, properties or characteristics of a class or specific subject”.[1] For example, a collection of books might be classified using an author facet, a subject facet, a date facet, etc. (From Faceted classification at Wikipedia.)

My general experience is that facets are used to narrow search results. That is, the result set is progressively narrowed to fewer and fewer items.

At the same time, a choice of facets can be discarded, returning to a broader result set.

So facets can move the searcher up and down in search result size, but within the bounds of the initial result set.

Has anyone experimented with adding facets from a broader pool? Say all the items in a database and not just those items in an initial search query?

Enabling the user to “drill out” from what we think of as the initial result set?

Which would raise questions about managing facets for a changing underlying set. For a user to broaden or narrow the result set in the more traditional way.

High Availability Search with SolrCloud

Filed under: Faceted Search,Facets,Solr,SolrCloud — Patrick Durusau @ 9:41 am

High Availability Search with SolrCloud by Brent Lemons.

Brent explains that using embedded ZooKeeper is useful for testing/learning SolrCloud, but high availability requires more.

As in separate installations of SolrCloud and ZooKeeper, both as high availability applications.

He walks through the steps to create and test such an installation.

If you have or expect to have a high availability search requirement, Brent’s post will be helpful.

April 14, 2012

Faceting & result grouping

Filed under: Faceted Search,Facets,Lucene,Solr — Patrick Durusau @ 6:27 pm

Faceting & result grouping by Martijn van Groningen

From the post:

Result grouping and faceting are in essence two different search features. Faceting counts the number of hits for specific field values matching the current query. Result grouping groups documents together with a common property and places these documents under a group. These groups are used as the hits in the search result. Usually result grouping and faceting are used together and a lot of times the results get misunderstood.

The main reason is that when using grouping people expect that a hit is represented by a group. Faceting isn’t aware of groups and thus the computed counts represent documents and not groups. This different behaviour can be very confusing. A lot of questions on the Solr user mailing list are about this exact confusion.

In the case that result grouping is used with faceting users expect grouped facet counts. What does this mean? This means that when counting the number of matches for a specific field value the grouped faceting should check whether the group a document belongs to isn’t already counted before. This is best illustrated with some example documents.

Examples follow that make the distinction between groups and facets in Lucene and Solr clear. Not to mention specific suggestions on configuration of your service.
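
The difference between document counts and grouped counts fits in a few lines of Python. This is only an illustration of the counting rule described in the quote, with made-up field names and documents, not how Lucene or Solr implement it.

```python
from collections import Counter, defaultdict


def facet_counts(docs, field):
    """Plain faceting: every matching document is counted."""
    return dict(Counter(doc[field] for doc in docs))


def grouped_facet_counts(docs, field, group_field):
    """Grouped faceting: each group is counted once per facet value."""
    groups_seen = defaultdict(set)
    for doc in docs:
        groups_seen[doc[field]].add(doc[group_field])
    return {value: len(groups) for value, groups in groups_seen.items()}


# Hypothetical product variants, grouped by product.
variants = [
    {"product": "p1", "color": "red"},
    {"product": "p1", "color": "red"},
    {"product": "p2", "color": "red"},
    {"product": "p2", "color": "blue"},
]
by_document = facet_counts(variants, "color")
by_group = grouped_facet_counts(variants, "color", "product")
```

A user looking at two grouped hits and a "red (3)" facet is exactly the confusion the post is about; the grouped count says "red (2)".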

April 3, 2012

Custom security filtering in Solr

Filed under: Faceted Search,Facets,Filters,Solr — Patrick Durusau @ 4:19 pm

Custom security filtering in Solr by Erik Hatcher

Yonik recently wrote about “Advanced Filter Caching in Solr” where he talked about expensive and custom filters; it was left as an exercise to the reader on the implementation details. In this post, I’m going to provide a concrete example of custom post filtering for the case of filtering documents based on access control lists.

Recap of Solr’s filtering and caching

First let’s review Solr’s filtering and caching capabilities. Queries to Solr involve a full-text, relevancy scored, query (the infamous q parameter). As users navigate they will browse into facets. The search application generates filter query (fq) parameters for faceted navigation (eg. fq=color:red, as in the article referenced above). The filter queries are not involved in document scoring, serving only to reduce the search space. Solr sports a filter cache, caching the document sets of each unique filter query. These document sets are generated in advance, cached, and reduce the documents considered by the main query. Caching can be turned off on a per-filter basis; when filters are not cached, they are used in parallel to the main query to “leap frog” to documents for consideration, and a cost can be associated with each filter in order to prioritize the leap-frogging (smallest set first would minimize documents being considered for matching).

Post filtering

Even without caching, filter sets default to generate in advance. In some cases it can be extremely expensive and prohibitive to generate a filter set. One example of this is with access control filtering that needs to take the users query context into account in order to know which documents are allowed to be returned or not. Ideally only matching documents, documents that match the query and straightforward filters, should be evaluated for security access control. It’s wasteful to evaluate any other documents that wouldn’t otherwise match anyway. So let’s run through an example… a contrived example for the sake of showing how Solr’s post filtering works.

Good examples, but also heed the author’s warning to use the techniques in this article only when necessary. Sometimes simple solutions are the best. Like using the network authentication layer to prevent unauthorized users from seeing the Solr application at all. No muss, no fuss.
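
The ordering idea behind post filtering can be shown without any Solr code. In this Python sketch the expensive access control check runs only on documents that already match the query and the cheap filters. Everything here (function names, the ACL representation, the sample documents) is an assumption for illustration and not Erik's implementation.

```python
def search_with_post_filter(docs, matches_query, cheap_filters, acl_allows, user):
    """Apply the costly ACL check last, only to otherwise-matching docs.

    Returns the permitted results and how many ACL checks were needed.
    """
    results = []
    acl_checks = 0
    for doc in docs:
        if not matches_query(doc):
            continue
        if not all(passes(doc) for passes in cheap_filters):
            continue
        acl_checks += 1  # only reached by documents that would match anyway
        if acl_allows(user, doc):
            results.append(doc)
    return results, acl_checks


# Hypothetical documents carrying a set of permitted users.
documents = [
    {"id": 1, "text": "solr facets", "color": "red", "acl": {"alice"}},
    {"id": 2, "text": "solr filters", "color": "red", "acl": {"bob"}},
    {"id": 3, "text": "solr caching", "color": "blue", "acl": {"alice"}},
    {"id": 4, "text": "lucene", "color": "red", "acl": {"alice"}},
]
allowed, checks = search_with_post_filter(
    documents,
    matches_query=lambda doc: "solr" in doc["text"],
    cheap_filters=[lambda doc: doc["color"] == "red"],
    acl_allows=lambda user, doc: user in doc["acl"],
    user="alice",
)
```

Four documents, but only two ACL checks: the point of a post filter is that the expensive test never sees documents that would not have matched anyway.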

September 27, 2011

Faceted Search using Solr – what it is and what benefits does it provide..?

Filed under: Education,Facets,Solr — Patrick Durusau @ 6:47 pm

Faceted Search using Solr – what it is and what benefits does it provide..? by James Spencer (eduserv blog).

From the post:

What is Faceted Search?

Faceted search is a more advanced searching technology that enables the end user to structure their search and ultimately drill down using categories to find the end result they are looking for via the site search. Rather than relying on simple keyword searching, faceted searching allows a user to perform a keyword search but then filter content by pre-defined categories and filtering criteria.

Faceted searching also enables you to gain advanced functionality like suggested search terms, auto completion on search terms and have associated links to content. This provides users with quicker, more flexible, dynamic and accurate search results.

The post goes on to list the benefits of faceted searching in a very accessible way, explains Solr, uses of Solr by the Department of Education (US), and gives additional examples of faceted searching.

Very high marks for presenting the material at a web developer/advanced user level. Hard to judge that consistently but this post comes as close as any I have seen recently.

September 23, 2011

Pivot Faceting (Decision Trees) in Solr 1.4.

Filed under: Facets,Solr — Patrick Durusau @ 6:22 pm

Pivot Faceting (Decision Trees) in Solr 1.4.

From the post:

Solr faceting breaks down searches for terms, phrases, and fields in the Solr into aggregated counts by matched fields or queries. Facets are a great way to “preview” further searches, as well as a powerful aggregation tool in their own right.

Before Solr 4.0, facets were only available at one level, meaning something like “counts for field ‘foo’” for a given query. Solr 4.0 introduced pivot facets (also called decision trees) which enable facet queries to return “counts for field ‘foo’ for each different field ‘bar’” – a multi-level facet across separate Solr fields.

Decision trees come up a lot, and at work, we need results along multiple axes – typically in our case “field/query by year” for a time series. However, we use Solr 1.4.1 and are unlikely to migrate to Solr 4.0 in the meantime. Our existing approach was to simply query for the top “n” fields for a first query, then perform a second-level facet query by year for each field result. So, for the top 20 results, we would perform 1 + 20 queries – clearly not optimal, when we’re trying to get this done in the context of a blocking HTTP request in our underlying web application.

Hoping to get something better than our 1 + n separate queries approach, I began researching the somewhat more obscure facet features present in Solr 1.4.1. And after some investigation, experimentation and a good amount of hackery, I was able to come up with a “faux” pivot facet scheme that mostly approximates true pivot faceting using Solr 1.4.1.

We’ll start by examining some real pivot facets in Solr 4.0, then look at the components and full technique for simulated pivot facets in Solr 1.4.1.

Not only a good introduction to a new feature in Solr 4.0 but how to sorta duplicate it in Solr 1.4.1!
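
What a pivot facet returns, "counts for field 'foo' for each different field 'bar'", is just a nested count. This Python sketch computes one in memory over made-up documents. It says nothing about how Solr 4.0 or the post's Solr 1.4.1 workaround does it; the field names are assumptions.

```python
from collections import Counter, defaultdict


def pivot_counts(docs, outer_field, inner_field):
    """Nested counts: for each outer value, count each inner value."""
    pivot = defaultdict(Counter)
    for doc in docs:
        pivot[doc[outer_field]][doc[inner_field]] += 1
    return {outer: dict(inner) for outer, inner in pivot.items()}


# Hypothetical "field by year" data for a time series.
records = [
    {"topic": "solr", "year": 2010},
    {"topic": "solr", "year": 2011},
    {"topic": "solr", "year": 2011},
    {"topic": "lucene", "year": 2011},
]
topic_by_year = pivot_counts(records, "topic", "year")
```

One pass over the matching documents replaces the 1 + n separate queries the post complains about, which is why a built-in pivot is worth having.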

August 22, 2011

Public Dataset Catalogs Faceted Browser

Filed under: Dataset,Facets,Linked Data,RDF — Patrick Durusau @ 7:42 pm

Public Dataset Catalogs Faceted Browser

A faceted browser for the catalogs, not their content.

Filter on coverage, location, country (not sure how location and country usefully differ), catalog status (seems to mix status and data type), and managed by.

Do be aware that as the little green balloons disappear with your selection, more of the coloring of the map itself appears.

I mention that because at first it seemed the map was being colored based on the facets I chose. For example, Europe is suddenly dark green when I choose the United States in the filter. Confusing at first, and it makes me wonder: why use a map with underlying coloration anyway? A white map with borders would be a better display background for the green balloons indicating catalog locations.

BTW, if you visit a catalog and then use the back button, all your filters are reset. Not a problem now with a small set of filters and only 100 catalogs but should this resource continue to grow, that could become a usability issue.

July 22, 2011

Designing Faceted Searches

Filed under: Facets,Search Interface,Searching — Patrick Durusau @ 6:09 pm

Tony Russell-Rose has been doing a series of posts on faceted searches.

Since topic maps capture information that can be presented as facets, I thought it would be helpful to gather up the links to Tony’s posts for your review.

Interaction Models for Faceted Search

Where am I? Techniques for wayfinding and navigation in faceted search

Designing Faceted Search: Getting the basics right (part 1)

Designing Faceted Search: Getting the basics right (part 2)

Designing Faceted Search: Getting the basics right (part 3)

And a couple of related goodies:

A Taxonomy of Search Strategies and their Design Implications

From Search to Discovery: Information Search Strategies and Design Solutions

Word of warning: You can easily lose hours if not days chasing down design insights that remain just out of reach. Have fun!

July 21, 2011

Oracle, Sun Burned, and Solr Exposure

Filed under: Data Mining,Database,Facets,Lucene,SQL,Subject Identity — Patrick Durusau @ 6:27 pm

Oracle, Sun Burned, and Solr Exposure

From the post:

Frankly we wondered when Oracle would move off the dime in faceted search. “Faceted search”, in my lingo, is showing users categories. You can fancy up the explanation, but a person looking for a subject may hit a dead end. The “facet” angle displays links to possibly related content. If you want to educate me, use the comments section for this blog, please.

We are always looking for a solution to our clients’ Oracle “findability” woes. It’s not just relevance. Think performance. Query and snack is the operative mode for at least one of our technical baby geese. Well, Oracle is a bit of a red herring. The company is not looking for a solution to SES11g functionality. Lucid Imagination, a company offering enterprise grade enterprise search solutions, is.

If “findability” is an issue at Oracle, I would be willing to bet that subject identity is as well. Rumor has it that they have paying customers.

June 30, 2011

Faceting Module for Lucene!

Filed under: Facets,Lucene — Patrick Durusau @ 4:03 pm

Faceting Module for Lucene!

Reading the log for this issue is an education on how open source projects proceed at their best.

Oh, worth reading about the faceting aspects that you want to include in a topic map or other application as well.

May 19, 2011

Designing faceted search: Getting the basics right (part 1)

Filed under: Facets,Interface Research/Design,Search Interface,Searching — Patrick Durusau @ 3:27 pm

Designing faceted search: Getting the basics right (part 1)

Tony Russell-Rose says:

Over the last couple of weeks we’ve looked at some of the more advanced design issues in faceted search, including the strengths and weaknesses of various interaction models and techniques for wayfinding and navigation. In this post, we’ll complement that material with a look at some of the other fundamental design considerations such as layout (i.e. where to place the faceted navigation menus) and default state (e.g. open, closed, or a hybrid). In so doing, I’d like to acknowledge the work of James Kalbach, and in particular his tutorial on faceted search design, which provides an excellent framework for many of the key principles outlined below.

To write or improve a faceted search interface, start with this series of posts.

April 15, 2011

Interaction Models for Faceted Search

Filed under: Facets,Interface Research/Design,Search Interface — Patrick Durusau @ 6:29 am

Interaction Models for Faceted Search

Tony Russell-Rose on models for faceted search:

Faceted search offers tremendous potential for transforming search experiences. It provides a flexible framework by which users can satisfy a wide variety of information needs, ranging from simple lookup and fact retrieval to complex exploratory search and discovery scenarios. In recognition of this, UX designers are now starting to embrace its potential and have published many excellent articles on a variety of design issues, covering topics such as facet structure, layout & display, selection paradigm, and many more.

The purpose of this article is to explore one aspect that has received somewhat less attention than most: the interactive behaviour of the facets themselves, i.e. how they should respond and update when selected. Surprisingly, the design choices at this level of detail can make a remarkable difference to the overall user experience: the wrong choices can make an application feel disjointed and obstructive, and (in some cases) increase the likelihood of returning zero results. In this post, we’ll examine the key design options and provide some recommendations.

Highly recommended.

That your topic map has the right answer somewhere isn’t going to help a user who can’t find it.

January 17, 2011

Endeca User Interface Design Pattern Library

Filed under: Facets,Interface Research/Design,Navigation,Visualization — Patrick Durusau @ 6:36 am

Endeca User Interface Design Pattern Library

From the website:

The Endeca User Interface Design Pattern Library (UIDPL) describes principled ways to solve common user interface design problems related to search, faceted navigation, and discovery. The library includes both specific UI design patterns as well as pattern topics such as:

  • Search
  • Faceted Navigation
  • Promotional Spotlighting
  • Results Manipulation
  • Faceted Analytics
  • Spatial Visualization

The patterns are offered as proposed sets of design guidelines based on our research and design experience as well as lessons learned from the information search and discovery community. They are NOT the only solutions, strict recipes etched in stone, or a substitute for sound human-centered design practices.

When the week starts off with discovery of a resource like this one, I know it is going to be a good week!

December 10, 2010

Semantically Equivalent Facets

Filed under: Authoring Topic Maps,Facets,Topic Map Software,Topic Map Systems,Topic Maps — Patrick Durusau @ 3:32 pm

I failed to mention semantically equivalent facets in either Identifying Subjects With Facets or Facets and “Undoable” Merges.

Sorry! I assumed it was too obvious to mention.

That is, if you are using facet-based navigation with a topic map, it will return/navigate the facet you ask for, and also return/navigate any semantically equivalent facet.

One of the advantages of using a topic map to underlie a facet system is that users get the benefit of something familiar, a set of facet axes they recognize, while at the same time getting the benefit of navigating semantically equivalent facets without knowing about it.

I suppose I should say that declared semantically equivalent facets are included in navigation.

Declared semantic equivalence doesn’t just happen, nor is it free.

Keeping that in mind will help you ask questions when sales or project proposals gloss over the hard questions of what return you will derive from an investment in semantic technologies? And when?
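
A minimal Python sketch of declared equivalence, assuming the declarations are kept as sets of facet names that someone has stated mean the same thing. The names, the data, and the representation are all hypothetical; a topic map would carry much more than this.

```python
def expand_facet(facet, equivalents):
    """Return the facet plus every facet declared equivalent to it."""
    for group in equivalents:
        if facet in group:
            return set(group)
    return {facet}


def navigate(items, facet, equivalents):
    """Return items carrying the requested facet or a declared equivalent."""
    wanted = expand_facet(facet, equivalents)
    return [item for item in items if wanted & item["facets"]]


# Someone had to declare this equivalence; it did not happen by itself.
declared_equivalents = [{"author", "creator", "writer"}]
catalog = [
    {"id": 1, "facets": {"author"}},
    {"id": 2, "facets": {"creator"}},
    {"id": 3, "facets": {"publisher"}},
]
author_items = navigate(catalog, "author", declared_equivalents)
publisher_items = navigate(catalog, "publisher", declared_equivalents)
```

The user asks for "author" and gets the "creator" item too, without knowing a second facet name exists. The cost is in building and maintaining that declared list.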

Facets and “Undoable” Merges

After writing Identifying Subjects with Facets, I started thinking about the merge of the subjects matching a set of facets. So the user could observe all the associations where the members of that subject participated.

If merger is a matter of presentation to the user, then the user should be able to remove one of the members that makes up a subject from the merge. Which results in the removal of associations where that member of the subject participated.

No more or less difficult than the inclusion/exclusion based on the facets, except this time it involves removal on the basis of roles in associations. That is the playing of a role, being a role, etc. are treated as facets of a subject.

Well, except that an individual member of a collective subject is being manipulated.

This capability would enable a user to manipulate what members of a subject are represented in a merge. Not to mention being able to unravel a merge one member of a subject at a time.

An effective visual representation of such a capability could be quite stunning.
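
Short of a stunning visualization, the mechanics can be sketched in Python. The assumption here is that a merge is only a view computed from the current member set, so removing a member and recomputing is all that "undo" requires. The people, facets, and associations are invented for illustration.

```python
def select_members(people, **facet_values):
    """Members of the collective subject matching every given facet value."""
    return {
        person_id
        for person_id, facets in people.items()
        if all(facets.get(name) == value for name, value in facet_values.items())
    }


def merged_view(members, associations):
    """Associations of the merged subject: the union over its members.

    associations is a list of (member_id, role, other) tuples.
    """
    return sorted({(role, other) for member, role, other in associations if member in members})


# Hypothetical data set.
people = {
    "p1": {"city": "Atlanta", "state": "GA"},
    "p2": {"city": "Atlanta", "state": "GA"},
    "p3": {"city": "Macon", "state": "GA"},
}
associations = [
    ("p1", "parent-of", "child-a"),
    ("p2", "parent-of", "child-a"),
    ("p2", "contact", "555-0100"),
    ("p3", "parent-of", "child-b"),
]
members = select_members(people, city="Atlanta")
merged = merged_view(members, associations)
after_removal = merged_view(members - {"p2"}, associations)
```

Removing p2 from the merge drops the contact association that only p2 contributed, while the shared parent association survives because p1 still plays that role.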

Identifying Subjects With Facets

If facets are aspects of subjects, then for every group of facets, I am identifying the subject that has those facets.

If I have the facets, height, weight, sex, age, street address, city, state, country, email address, then at the outset, my subject is the subject that has all those characteristics, with whatever value.

We could call that subject: people.

Not the way I usually think about it but follow the thought out a bit further.

For each facet where I specify a value, the subject identified by the resulting value set is both different from the starting subject and, more importantly, has a smaller set of members in the data set.

Members that make up the collective that is the subject we have identified.

Assume we have narrowed the set of people down to a group subject that has ten members.

Then, we select merge from our application and it merges these ten members.

Sounds damned odd, to merge what we know are different subjects?

What if, by merging those different members, we find that these different individuals have a parent association with the same children?

Or have a contact relationship with a phone number associated with an individual or group of interest?
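The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the people, facet values, association tuples and the functions narrow and merge are all hypothetical. Selecting facet values shrinks the collective subject, and merging the remaining members surfaces the associations they have in common.

```python
# Hypothetical members of the starting subject "people".
people = [
    {"name": "A", "facets": {"city": "Atlanta", "sex": "F"},
     "associations": {("parent-of", "child-1"), ("contact", "555-0100")}},
    {"name": "B", "facets": {"city": "Atlanta", "sex": "M"},
     "associations": {("parent-of", "child-1")}},
    {"name": "C", "facets": {"city": "Boston", "sex": "F"},
     "associations": {("contact", "555-0100")}},
]


def narrow(members, **selected):
    """Keep only the members whose facets match every selected value."""
    return [m for m in members
            if all(m["facets"].get(facet) == value
                   for facet, value in selected.items())]


def merge(members):
    """Merge members for display: association -> names of members in it."""
    merged = {}
    for m in members:
        for association in m["associations"]:
            merged.setdefault(association, set()).add(m["name"])
    return merged


group = narrow(people, city="Atlanta")   # a smaller collective subject
merged = merge(group)
# Associations played by more than one member of the merged group.
shared = {a: names for a, names in merged.items() if len(names) > 1}
```

Here the merge shows that two different individuals have a parent association with the same child, which is the kind of discovery that looking at each member separately would not turn up.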

Robust topic map applications will offer users the ability to navigate and explore subject identities.

Subject identities that may not always be the ones you expect.

We don’t live in a canned world. Does your semantic software?

December 7, 2010

Bobo: Fast Faceted Search With Lucene

Filed under: Facets,Information Retrieval,Lucene,Navigation,Subject Identity — Patrick Durusau @ 8:52 pm

Bobo: Fast Faceted Search With Lucene

From the website:

Bobo is a Faceted Search implementation written purely in Java, an extension of Apache Lucene.

While Lucene is good with unstructured data, Bobo fills in the missing piece to handle semi-structured and structured data.

Bobo Browse is an information retrieval technology that provides navigational browsing into a semi-structured dataset. Beyond the result set from queries and selections, Bobo Browse also provides the facets from this point of browsing.

Features:

  • No need for cache warm-up for the system to perform
  • multi value sort – sort documents on fields that have multiple values per doc, e.g. tokenized fields
  • fast field value retrieval – over 30x faster than IndexReader.document(int docid)
  • facet count distribution analysis
  • stable and small memory footprint
  • support for runtime faceting
  • result merge library for distributed facet search

I had to go look up the definition of facet. Merriam-Webster (I remember when it was just Webster) says:

any of the definable aspects that make up a subject (as of contemplation) or an object (as of consideration)

So a faceted search could search/browse, in theory at any rate, based on any property of a subject, even those I don’t recognize.

Different languages being the easiest example.

I could have aspects of a hotel room described in both German and Korean, both describing the same facets of the room.
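A rough sketch in Python of what that might look like, assuming someone has declared the equivalences by hand. The facet labels, the canonical facet ids and the room records below are invented for illustration, and I assume the facet values are already comparable. None of this comes from Bobo or Lucene.

```python
from collections import Counter

# Declared equivalences: each label is mapped, by a person, to a
# hypothetical canonical facet id. Nothing here is inferred automatically.
EQUIVALENT = {
    "Zimmergröße": "room-size",
    "객실 크기": "room-size",
    "Bettentyp": "bed-type",
    "침대 유형": "bed-type",
}


def canonical(record):
    """Rewrite facet labels to canonical ids; undeclared labels pass through."""
    return {EQUIVALENT.get(label, label): value
            for label, value in record.items()}


# Hypothetical hotel room descriptions in German and Korean.
rooms = [
    {"Zimmergröße": 25, "Bettentyp": "double"},
    {"객실 크기": 25, "침대 유형": "twin"},
    {"Zimmergröße": 30},
]

# Facet counts over all rooms, regardless of the language of the label.
counts = Counter(facet for room in rooms for facet in canonical(room))
```

A user browsing by room size sees all three rooms, without needing to know that two label vocabularies sit underneath.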

Questions:

  1. How would you choose the facets for a subject to be included in faceted browsing? (3-5 pages, no citations)
  2. How would you design and test the presentation of facets to users? (3-5 pages, no citations)
  3. Compare the current TMQL proposal (post-Barta) with the query language for facet searching. If a topic map were treated (post-merging) as faceted subjects, which one would you prefer and why? (3-5 pages, no citations)

December 6, 2010

to_be_classified: A Facet Analysis of a Folksonomy

Filed under: Classification,Facets,Folksonomy,Ranganathan — Patrick Durusau @ 5:37 am

to_be_classified: A Facet Analysis of a Folksonomy

Author: Elise Conradi

Keywords: Facet analysis, Faceted classification, VDP::Samfunnsvitenskap: 200::Biblioteks- og informasjonsvitenskap: 320::Kunnskapsgjenfinning og organisering: 323

Abstract:

This research examines Ranganathan’s postulational approach to facet analysis with the intention of manually inducing a faceted classification ontology from a folksonomy. Folksonomies are viewed as a source to a wealth of data representing users’ perspectives. An in-depth study of faceted classification theory is used to form a methodology based on the postulational approach. The dataset used to test the methodology consists of over 107,000 instances of 1,275 unique tags representing 76 popular non-fiction history books collected from the LibraryThing folksonomy. Preliminary results of the facet analysis indicate the manual inducement of two faceted classification ontologies in the dataset; one representing the universe of books and one representing the universe of subjects within the universe of books. The ontology representing the universe of books is considered to be complete, whereas the ontology representing the universe of subjects is incomplete. These differences are discussed in light of theoretical differences between special and universal faceted classifications. The induced ontologies are then discussed in terms of their substantiation or violation of Ranganathan’s Canons of Classification.

Highly recommended. Expect back references to this entry in the coming months.

Questions:

  1. Is Ranganathan’s “idea plane” for work in classification different from Husserl’s “bracketing?” If so, how? (3-5 pages, citations)
  2. How would you distinguish the “idea plane” from the “verbal plane?” (3-5 pages, no citations)
  3. How would you compare the “idea planes” as seen by two different classifiers? (3-5 pages, no citations)
