Archive for the ‘Dimensions’ Category

‘The Algorithm That Runs the World’ [Optimization, Identity and Polytopes]

Tuesday, August 28th, 2012

“The Algorithm That Runs the World” by Erwin Gianchandani.

From the post:

New Scientist published a great story last week describing the history and evolution of the simplex algorithm — complete with a table capturing “2000 years of algorithms”:

The simplex algorithm directs wares to their destinations the world over [image courtesy PlainPicture/Gozooma via New Scientist].Its services are called upon thousands of times a second to ensure the world’s business runs smoothly — but are its mathematics as dependable as we thought?

YOU MIGHT not have heard of the algorithm that runs the world. Few people have, though it can determine much that goes on in our day-to-day lives: the food we have to eat, our schedule at work, when the train will come to take us there. Somewhere, in some server basement right now, it is probably working on some aspect of your life tomorrow, next week, in a year’s time.

Perhaps ignorance of the algorithm’s workings is bliss. The door to Plato’s Academy in ancient Athens is said to have borne the legend “let no one ignorant of geometry enter”. That was easy enough to say back then, when geometry was firmly grounded in the three dimensions of space our brains were built to cope with. But the algorithm operates in altogether higher planes. Four, five, thousands or even many millions of dimensions: these are the unimaginable spaces the algorithm’s series of mathematical instructions was devised to probe.

Perhaps, though, we should try a little harder to get our heads round it. Because powerful though it undoubtedly is, the algorithm is running into a spot of bother. Its mathematical underpinnings, though not yet structurally unsound, are beginning to crumble at the edges. With so much resting on it, the algorithm may not be quite as dependable as it once seemed [more following the link].

A fund manager might similarly want to arrange a portfolio optimally to balance risk and expected return over a range of stocks; a railway timetabler to decide how best to roster staff and trains; or a factory or hospital manager to work out how to juggle finite machine resources or ward space. Each such problem can be depicted as a geometrical shape whose number of dimensions is the number of variables in the problem, and whose boundaries are delineated by whatever constraints there are (see diagram). In each case, we need to box our way through this polytope towards its optimal point.

This is the job of the algorithm.

Its full name is the simplex algorithm, and it emerged in the late 1940s from the work of the US mathematician George Dantzig, who had spent the second world war investigating ways to increase the logistical efficiency of the U.S. air force. Dantzig was a pioneer in the field of what he called linear programming, which uses the mathematics of multidimensional polytopes to solve optimisation problems. One of the first insights he arrived at was that the optimum value of the “target function” — the thing we want to maximise or minimise, be that profit, travelling time or whatever — is guaranteed to lie at one of the corners of the polytope. This instantly makes things much more tractable: there are infinitely many points within any polytope, but only ever a finite number of corners.

If we have just a few dimensions and constraints to play with, this fact is all we need. We can feel our way along the edges of the polytope, testing the value of the target function at every corner until we find its sweet spot. But things rapidly escalate. Even just a 10-dimensional problem with 50 constraints — perhaps trying to assign a schedule of work to 10 people with different expertise and time constraints — may already land us with several billion corners to try out.

Apologies but I saw this article too late to post within the “free” days allowed by New Scientist.

But, I think from Erwin’s post and long quote from the original article, you can see how the simplex algorithm may be very useful where identity is defined in multidimensional space.

The literature in this area is vast and it may not offer an appropriate test for all questions of subject identity.

For example, the possessor of a credit card is presumed to be the owner of the card. Other assumptions are possible, but fraud costs are recouped from fees paid by customers. Creating a lack of interest in more stringent identity tests.

On the other hand, if your situation requires multidimensional identity measures, this may be a useful approach.


PS: Be aware that naming confusion, the sort that can be managed (not solved) by topic maps abounds even in mathematics:

The elements of a polytope are its vertices, edges, faces, cells and so on. The terminology for these is not entirely consistent across different authors. To give just a few examples: Some authors use face to refer to an (n−1)-dimensional element while others use face to denote a 2-face specifically, and others use j-face or k-face to indicate an element of j or k dimensions. Some sources use edge to refer to a ridge, while H. S. M. Coxeter uses cell to denote an (n−1)-dimensional element. (Polytope)

Clustering high dimensional data

Thursday, June 28th, 2012

Clustering high dimensional data by Ira Assent. (Assent, I. (2012), Clustering high dimensional data. WIREs Data Mining Knowl Discov, 2: 340–350. doi: 10.1002/widm.1062)

Abstract:

High-dimensional data, i.e., data described by a large number of attributes, pose specific challenges to clustering. The so-called ‘curse of dimensionality’, coined originally to describe the general increase in complexity of various computational problems as dimensionality increases, is known to render traditional clustering algorithms ineffective. The curse of dimensionality, among other effects, means that with increasing number of dimensions, a loss of meaningful differentiation between similar and dissimilar objects is observed. As high-dimensional objects appear almost alike, new approaches for clustering are required. Consequently, recent research has focused on developing techniques and clustering algorithms specifically for high-dimensional data. Still, open research issues remain. Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Each cluster groups objects that are similar to one another, whereas dissimilar objects are assigned to different clusters, possibly separating out noise. In this manner, clusters describe the data structure in an unsupervised manner, i.e., without the need for class labels. A number of clustering paradigms exist that provide different cluster models and different algorithmic approaches for cluster detection. Common to all approaches is the fact that they require some underlying assessment of similarity between data objects. In this article, we provide an overview of the effects of high-dimensional spaces, and their implications for different clustering paradigms. We review models and algorithms that address clustering in high dimensions, with pointers to the literature, and sketch open research issues. We conclude with a summary of the state of the art.

The author has a clever example (figure 4) of why adding dimensions can decrease the discernment of distinct groups in data. A problem that worsens as the number of dimensions increases.

Or does it? Or is it the case that by weighting all dimensions equally we get the result we deserve?

My counter-example would be introducing you to twin sisters. As the number of dimensions increased, so would the similarity that would befoul any clustering algorithm.

But the important dimension, their names, is sufficient to cluster attributes around the appropriate data points.

Is the “curse of dimensionality” rather a “failure to choose dimensions wisely?”

Visualizing 4+ Dimensions

Wednesday, December 28th, 2011

Visualizing 4+ Dimensions

From the post:

When people realize that I study pure math, they often ask about how to visualize four or more dimensions. I guess it’s a natural question to ask, since mathematicians often have to deal with very high (and sometimes infinite) dimensional objects. Yet people in pure math never really have this problem.

Pure mathematicians might like you to think that they’re just that much smarter. But frankly, I’ve never had to visualize anything high-dimensional in my pure math classes. Working things out algebraically is much nicer, and using a lower-dimensional object as an example or source of intuition usually works out — at least at the undergrad level.

But that’s not a really satisfying answer, for two reasons. One is that it is possible to visualize high-dimensional objects, and people have developed many ways of doing so. Dimension Math has on its website a neat series of videos for visualizing high-dimensional geometric objects using stereographic projection. The other reason is that while pure mathematicians do not have a need for visualizing high-dimensions, statisticians do. Methods of visualizing high dimensional data can give useful insights when analyzing data.

This is an important area for study, but not only because identifications can consist of values in multiple dimensions.

It is important because the recognition of an identifier can also consist of values spread across multiple dimensions.

More on that second statement before year’s end (so you don’t have to wait very long, just until holiday company leaves).

I first saw this in Christophe Lalanne’s A bag of tweets / Dec 2011.

The Shape of Things – SHAPES 1.0

Wednesday, December 14th, 2011

The Shape of Things – SHAPES 1.0

Proceedings of the First Interdisciplinary Workshop on SHAPES, Karlsruhe, Germany, September 27, 2011. Edited by: Janna Hastings, Oliver Kutz, Mehul Bhatt, Stefano Borgo

If you have ever thought of “shape” as being a simple issue, consider the abstract from “Shape is a Non-Quantifiable Physical Dimension” by Ingvar Johansson:

In the natural-scientific community it is often taken for granted that, sooner or later, all basic physical property dimensions can be quantified and turned into a kind-of-quantity; meaning that all their possible determinate properties can be put in a one-to-one correspondence with the real numbers. By using some transfinite mathematics, the paper shows this tacit assumption to be wrong. Shape is a very basic property dimension; but, since it can be proved that there are more possible kinds of determinate shapes than real numbers, shape cannot be quantified. There will never be a shape scale the way we have length and temperature scales. This is the most important conclusion, but more is implied by the proof. Since every n-dimensional manifold has the same cardinality as the real number line, all shapes cannot even be represented in a three-dimensional manifold the way perceivable colors are represented in so-called color solids.

If shape, which exists in metric space has these issues, that casts a great deal of doubt on mapping semantics, which exists in non-metric space, in a “…one-to-one correspondence with real numbers.”

Don’t you think?

We can make simplifying assumptions about semantics and make such mappings, but we need to be aware that is what is happening.

tutorial draft on curse of dimensionality

Saturday, December 3rd, 2011

tutorial draft on curse of dimensionality

From the post:

Curse of dimensionality is a widely heard of, largely misunderstood concept in machine learning. There is one single explanation of it circulating, but there is more to it. I will explain what is the curse, and how it complicates everything.

I don’t follow hockey but the example would be easy enough to adapt by subject domain.

The author illustrates one problem with dimensionality and promises to discuss others.

I say “the author” because this is one of those blogs where identification of the author isn’t clear. In academic discussions that is more than a little annoying.

Good illustration of the problem and points for that.

QUDT – Quantities, Units, Dimensions and Data Types in OWL and XML

Monday, September 12th, 2011

QUDT – Quantities, Units, Dimensions and Data Types in OWL and XML

From background:

The QUDT Ontologies, and derived XML Vocabularies, are being developed by TopQuadrant and NASA. Originally, they were developed for the NASA Exploration Initiatives Ontology Models (NExIOM) project, a Constellation Program initiative at the AMES Research Center (ARC). The goals of the QUDT ontology are twofold:

  • to provide a unified model of, measurable quantities, units for measuring different kinds of quantities, the numerical values of quantities in different units of measure and the data structures and data types used to store and manipulate these objects in software;
  • to populate the model with the instance data (quantities, units, quantity values, etc.) required to meet the life-cycle needs of the Constellation Program engineering community.

If you are looking for measurements, this would be one place to start.