Archive for the ‘Rough Sets’ Category

Parametric matroid of rough set

Friday, September 28th, 2012

Parametric matroid of rough set by Yanfang Liu and William Zhu.


Rough set is mainly concerned with the approximations of objects through an equivalence relation on a universe. Matroid is a combinatorial generalization of linear independence in vector spaces. In this paper, we define a parametric set family, with any subset of a universe as its parameter, to connect rough sets and matroids. On the one hand, for a universe and an equivalence relation on the universe, a parametric set family is defined through the lower approximation operator. This parametric set family is proved to satisfy the independent set axiom of matroids, therefore it can generate a matroid, called a parametric matroid of the rough set. Three equivalent representations of the parametric set family are obtained. Moreover, the parametric matroid of the rough set is proved to be the direct sum of a partition-circuit matroid and a free matroid. On the other hand, since partition-circuit matroids were well studied through the lower approximation number, we use it to investigate the parametric matroid of the rough set. Several characteristics of the parametric matroid of the rough set, such as independent sets, bases, circuits, the rank function and the closure operator, are expressed by the lower approximation number.
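The independent set axioms the abstract appeals to can be checked mechanically for small finite families. A minimal sketch of the generic matroid axioms only (the paper’s specific parametric family is not reproduced here, so the example family below is just the uniform matroid U(2, 3)):

```python
from itertools import combinations

def is_independence_family(universe, family):
    """Check the matroid independent-set axioms for a finite family:
    (I1) the empty set is independent;
    (I2) every subset of an independent set is independent (hereditary);
    (I3) exchange: for independent A, B with |A| < |B|, some x in B - A
         keeps A + {x} independent.
    """
    fam = {frozenset(s) for s in family}
    if frozenset() not in fam:                          # (I1)
        return False
    for s in fam:                                       # (I2)
        for r in range(len(s)):
            for sub in combinations(s, r):
                if frozenset(sub) not in fam:
                    return False
    for a in fam:                                       # (I3)
        for b in fam:
            if len(a) < len(b):
                if not any(a | {x} in fam for x in b - a):
                    return False
    return True

# Uniform matroid U(2, 3): independent sets are all subsets of size <= 2.
U = {1, 2, 3}
I = [set(c) for r in range(3) for c in combinations(U, r)]
print(is_independence_family(U, I))  # True
```

A family produced from a lower approximation operator, as in the paper, could be fed to the same checker to confirm it generates a matroid.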

If you are guessing this isn’t the “simpler” side of topic maps, you got it in one!

There are consumers of information/services (here, of the “simpler” services of topic maps), authors of information/services (here, of semantic diversity by whatever tools), and finally semantic intermediaries, map makers who cross the boundaries of semantic universes of discourse (here be dragons).

Not every aspect of topic maps is for everyone and we should not pretend otherwise.

Rough Set Rudiments

Monday, April 23rd, 2012

Rough Set Rudiments by Zdzislaw Pawlak and Andrzej Skowron.

From the basic philosophy section:

The rough set philosophy is founded on the assumption that with every object of the universe of discourse we associate some information (data, knowledge). For example, if objects are patients suffering from a certain disease, symptoms of the disease form information about patients. Objects characterized by the same information are indiscernible (similar) in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis for rough set theory.

Any set of all indiscernible (similar) objects is called an elementary set, and forms a basic granule (atom) of knowledge about the universe. Any union of some elementary sets is referred to as crisp (precise) set – otherwise the set is rough (imprecise, vague).

Consequently each rough set has boundary-line cases, i.e., objects which cannot be with certainty classified neither as members of the set nor of its complement. Obviously crisp sets have no boundary-line elements at all. That means that boundary-line cases cannot be properly classified by employing the available knowledge.

Thus, the assumption that objects can be “seen” only through the information available about them leads to the view that knowledge has granular structure. Due to the granularity of knowledge some objects of interest cannot be discerned and appear as the same (or similar). As a consequence vague concepts, in contrast to precise concepts, cannot be characterized in terms of information about their elements. Therefore, in the proposed approach, we assume that any vague concept is replaced by a pair of precise concepts – called the lower and the upper approximation of the vague concept. The lower approximation consists of all the objects which surely belong to the concept and the upper approximation contains all objects which possibly belong to the concept. Obviously, the difference between the upper and lower approximation constitutes the boundary region of the vague concept. Approximations are two basic operations in rough set theory.
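The lower and upper approximations described above are easy to compute once the indiscernibility relation is given as a partition of the universe into elementary sets. A minimal sketch (the patient/symptom framing follows the quoted example; the sample partition and concept are illustrative assumptions):

```python
def approximations(partition, concept):
    """Lower/upper approximation of a concept, where the equivalence
    relation is given as a partition into elementary sets (granules)."""
    concept = set(concept)
    lower, upper = set(), set()
    for block in partition:
        block = set(block)
        if block <= concept:    # every object in the granule surely belongs
            lower |= block
        if block & concept:     # some object in the granule possibly belongs
            upper |= block
    return lower, upper

# Patients grouped by identical symptoms (elementary sets):
partition = [{1, 2}, {3, 4}, {5}]
flu = {1, 2, 3}                 # the vague concept to approximate
lo, up = approximations(partition, flu)
print(lo)       # lower approximation: {1, 2}
print(up - lo)  # boundary region: {3, 4}
```

Patients 3 and 4 are indiscernible, yet only patient 3 is in the concept, so both land in the boundary region, exactly the “boundary-line cases” of the quotation.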

I suspect the normal case is “rough” sets, with “crisp” sets being an artifice of our views of the world.

This summary is a bit dated so I will use it as a basis for an update with citations to later materials.

Learning Fuzzy β-Certain and β-Possible rules…

Wednesday, April 18th, 2012

Learning Fuzzy β-Certain and β-Possible rules from incomplete quantitative data by rough sets by Ali Soltan Mohammadi, L. Asadzadeh, and D. D. Rezaee.


The rough-set theory proposed by Pawlak has been widely used in dealing with data classification problems. The original rough-set model is, however, quite sensitive to noisy data. Tzung thus proposed a model combining the variable-precision rough-set model and fuzzy set theory, which produces a set of fuzzy certain and fuzzy possible rules from quantitative data with a predefined tolerance degree of uncertainty and misclassification. This paper deals with the problem of producing such fuzzy certain and fuzzy possible rules from incomplete quantitative data with a predefined tolerance degree of uncertainty and misclassification. A new method, combining a rough-set model for incomplete quantitative data with fuzzy set theory, is thus proposed to solve this problem. It first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions, then calculates the fuzzy β-lower and fuzzy β-upper approximations. The certain and possible rules are then generated based on these fuzzy approximations. These rules can then be used to classify unknown objects.
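The first step the abstract describes, transforming a quantitative value into a fuzzy set of linguistic terms via membership functions, can be sketched with triangular membership functions (a common choice; the attribute, the terms, and the breakpoints below are hypothetical, not taken from the paper):

```python
def triangular(a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Hypothetical linguistic terms for a quantitative attribute:
terms = {
    "Low":    triangular(60, 80, 100),
    "Medium": triangular(80, 110, 140),
    "High":   triangular(110, 140, 170),
}
value = 105
fuzzy_set = {t: round(mu(value), 2) for t, mu in terms.items()}
print(fuzzy_set)  # {'Low': 0.0, 'Medium': 0.83, 'High': 0.0}
```

The resulting fuzzy set of linguistic terms is what the β-lower and β-upper approximations would then operate on.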

In part interesting because of its full use of sample data to illustrate the process being advocated.

Unless data somehow starts arriving in smooth, crisp sets by some mischance, rough sets will remain a mainstay of data mining for the foreseeable future.

Ontologies as Semantically Discrete Data

Tuesday, January 3rd, 2012

The contest associated with the Topical Classification of Biomedical Research Papers conference involves the use of the domain ontology MeSH: classifying materials using that ontology and clustering the results. (You should read the contest description for the full details. I am only pulling out the facts needed for this post, which aren’t many.)

It occurred to me that an ontology consists of a set of values that are semantically discrete. That is, any value in an ontology is distinct from all other values in the ontology; there is no “almost X” or “nearly Y” in an ontology.

I mention this because we apply ontologies to semantically continuous domains, such as journal articles that were written without regard to any particular ontology.

Which would also explain why given a common ontology, such as MeSH, we may disagree as to which terms to apply to a particular document. We “see” different aspects in the semantically continuous document that influence our view of what term from the semantically discrete ontology to use. And in many cases we may be in agreement.

But the fact remains that we have applied a semantically discrete instrument to a semantically continuous data set.

I suppose one question is whether rough sets can capture and preserve some semantic continuity for use in information retrieval.

Soft fuzzy rough sets for robust feature evaluation and selection

Monday, September 27th, 2010

Soft fuzzy rough sets for robust feature evaluation and selection by Qinghua Hu, Shuang An, and Daren Yu.

Keywords: Fuzzy rough sets – Feature evaluation – Noise – Soft fuzzy rough sets – Classification learning – Feature reduction

Introduces techniques that reduce the influence of noise on fuzzy rough sets. Important in a world full of noise.

Question for the ambitious: Survey ten articles on feature reduction that don’t cite each other. Pick 2 features that were eliminated in each article. Do you agree/disagree with the evaluation of those features? This is not a question of the numerical calculation but of your view of the useful/not useful nature of the feature.

Subjects, Identifiers, IRIs, Crisp Sets

Sunday, September 19th, 2010

I was reading Fuzzy Sets, Uncertainty, and Information by George J. Klir and Tina A. Folger, when it occurred to me that the use of IRIs as identifiers for subjects yields, by definition, a “crisp set.”

Klir and Folger observe:

The crisp set is defined in such a way as to dichotomize the individuals in some given universe of discourse into two groups: members (those that certainly belong in the set) and nonmembers (those that certainly do not). A sharp, unambiguous distinction exists between the members of the class or category represented by the crisp set. (p. 3)

A subject can be assigned an IRI as an identifier, based on some set of properties.

That assignment and use as an identifier makes identification a crisp set operation.

That eliminates fuzzy, rough, soft, and other non-crisp set operations, as well as other means of identification.
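The crisp/non-crisp contrast can be made concrete: identification by IRI is a characteristic function into {0, 1}, while a graded match over subject properties is not. A toy sketch (the property-overlap “fuzzy” score is purely illustrative, not a topic maps mechanism or a standard fuzzy set operation):

```python
def crisp_member(subject_iri, target_iri):
    """Crisp identification: membership is exactly 0 or 1 --
    a subject either bears the identifying IRI or it does not."""
    return 1.0 if subject_iri == target_iri else 0.0

def graded_member(subject_props, target_props):
    """Contrast: a graded degree of match over property sets
    (illustrative only), taking values anywhere in [0, 1]."""
    return len(subject_props & target_props) / len(target_props)

print(crisp_member("http://example.org/s1", "http://example.org/s1"))  # 1.0
print(graded_member({"name", "dob", "city"}, {"name", "dob", "email"}))
```

The second function admits “almost this subject,” which is exactly what IRI-based identification rules out.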

What formal characteristics of crisp sets are useful for topic maps?

Are those characteristics useful for topic map design, authoring or both?

Extra credit: Any set software you would suggest to test your answers?

Rough Fuzzies, and Beyond?

Friday, July 2nd, 2010

I was reading Rough Sets: Theoretical Aspects of Reasoning about Data by Zdzislaw Pawlak when I ran across this comparison of rough versus fuzzy sets:

Rough sets has often been compared to fuzzy sets, sometimes with a view to introduce them as competing models of imperfect knowledge. Such a comparison is unfounded. Indiscernibility and vagueness are distinct facets of imperfect knowledge. Indiscernibility refers to the granularity of knowledge, that affects the definition of universes of discourse. Vagueness is due to the fact that categories of natural language are often gradual notions, and refer to sets with smooth boundaries. Borrowing an example from image processing, rough set theory is about the size of pixels, fuzzy set theory is about the existence of more than two levels of grey. (pp. ix-x)

It occurred to me that the precision of our identifications, or perhaps better, the fixed precision of our identifications, is a real barrier to semantic integration. The precision I need for semantic integration is going to vary from subject to subject, depending upon what I already know, what I need to know, and for what purpose. Very coarse identification may be acceptable for some purposes but not others.

I don’t know what it would look like to have varying degrees of precision to subject identification or even how that would be represented. But, I suspect solving those problems will be involved in any successful approach to semantic integration.