Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 16, 2011

Tsearch2 – full text extension for PostgreSQL

Filed under: PostgreSQL,TSearch — Patrick Durusau @ 1:12 pm

Tsearch2 – full text extension for PostgreSQL

Following up on the TSearch Primer post from yesterday.

This is the current documentation for Tsearch2.

February 15, 2011

TSearch Primer

Filed under: PostgreSQL,SQL,TSearch — Patrick Durusau @ 2:05 pm

TSearch Primer

From the website:

TSearch is a Full-Text Search engine that is packaged with PostgreSQL. The key developers of TSearch are Oleg Bartunov and Teodor Sigaev who have also done extensive work with GiST and GIN indexes used by PostGIS, PgSphere and other projects. For more about how TSearch and OpenFTS got started check out A Brief History of FTS in PostgreSQL. Check out the TSearch Official Site if you are interested in related TSearch tips or interested in donating to this very worthy project.

Tsearch is different from regular string searching in PostgreSQL in a couple of key ways.

  1. It is well suited to searching large blobs of text, since each word is indexed using a Generalized Inverted Index (GIN) or Generalized Search Tree (GiST) and searched using text search vectors. GIN is generally used for indexing. Search vectors are at word and phrase boundaries.
  2. TSearch has a concept of linguistic significance, using various language dictionaries, ISpell, thesaurus, stop words, etc.; it can therefore ignore common words and equate terms and phrases of like meaning.
  3. TSearch is for the most part case-insensitive.
  4. While various dictionaries and configs are available out of the box with TSearch, one can create new ones and customize existing ones further to cater to specific niches within industries – e.g. medicine, pharmaceuticals, physics, chemistry, biology, legal matters.
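
To make the ideas in that list concrete, here is a toy sketch in Python of an inverted index with stop-word removal and case folding. It is only an illustration of the general technique, not TSearch's implementation: the `STOP_WORDS` set, the sample documents, and the `normalize`, `build_index`, and `search` helpers are all hypothetical names invented for this example, and real TSearch dictionaries (stemming, ISpell, thesaurus) do far more than this.

```python
from collections import defaultdict

# Hypothetical, tiny stop-word list; real TSearch dictionaries are far richer.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "in", "to"}


def normalize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_index(docs):
    """Map each normalized word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in normalize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every non-stop word in the query."""
    terms = normalize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample documents, purely for illustration.
docs = {
    1: "The Topic Map is stored in PostgreSQL.",
    2: "Full-text search of the topic index.",
    3: "GIN and GiST are index types.",
}
index = build_index(docs)
print(search(index, "TOPIC"))      # case is folded before lookup
print(search(index, "the index"))  # "the" is ignored as a stop word
```

The lookup goes from word to documents rather than scanning every document for a substring, which is the property that makes an inverted index suit large bodies of text.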

A short introduction to TSearch, which is part of PostgreSQL.

Should be of interest to topic mappers using PostgreSQL.
