Archive for the ‘Topology’ Category

Physics, Topology, Logic and Computation: A Rosetta Stone

Monday, February 22nd, 2016

Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay.

Abstract:

In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a ‘cobordism’: a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and ‘quantum topology’. But this was just the beginning: similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of ‘closed symmetric monoidal category’. We assume no prior knowledge of category theory, proof theory or computer science.

While this is an “expository” paper, at some 66 pages (sans the references), you best set aside some of your best thinking/reading time to benefit from it.

Enjoy!

Extracting insights from the shape of complex data using topology

Thursday, November 6th, 2014

Extracting insights from the shape of complex data using topology by P. Y. Lum, et al. (Scientific Reports 3, Article number: 1236 doi:10.1038/srep01236)

Abstract:

This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.

In order to identify subjects you must first discover them.

Does the available financial contribution data on members of the United States House of Representatives correspond with the clustering analysis here? (Asking because I don’t know but would be interested in finding out.)

I first saw this in a tweet by Stian Danenbarger.

Elementary Applied Topology

Wednesday, September 17th, 2014

Elementary Applied Topology by Robert Ghrist.

From the introduction:

What topology can do

Topology was built to distinguish qualitative features of spaces and mappings. It is good for, inter alia:

  1. Characterization: Topological properties encapsulate qualitative signatures. For example, the genus of a surface, or the number of connected components of an object, give global characteristics important to classification.
  2. Continuation: Topological features are robust. The number of components or holes is not something that should change with a small error in measurement. This is vital to applications in scientific disciplines, where data is never noisy.
  3. Integration: Topology is the premiere tool for converting local data into global properties. As such, it is rife with principles and tools (Mayer-Vietoris, Excision, spectral sequences, sheaves) for integrating from local to global.
  4. Obstruction: Topology often provides tools for answering feasibility of certain problems, even when the answers to the problems themselves are hard to compute. These characteristics, classes, degrees, indices, or obstructions take the form of algebraic-topological entities.

What topology cannot do

Topology is fickle. There is no resource to tweaking epsilons should desiderata fail to be found. If the reader is a scientist or applied mathematician hoping that topological tools are a quick fix, take this text with caution. The reward of reading this book with care may be limited to the realization of new questions as opposed to new answers. It is not uncommon that a new mathematical tool contributes to applications not by answering a pressing question-of-the-day but by revealing a different (and perhaps more significant) underlying principle.

The text will require more than casual interest but what a tool to add to your toolbox!

I first saw this in a tweet from Topology Fact.

Community detection in networks:…

Thursday, June 5th, 2014

Community detection in networks: structural clusters versus ground truth by Darko Hric, Richard K. Darst, and, Santo Fortunato.

Abstract:

Algorithms to find communities in networks rely just on structural information and search for cohesive subsets of nodes. On the other hand, most scholars implicitly or explicitly assume that structural communities represent groups of nodes with similar (non-topological) properties or functions. This hypothesis could not be verified, so far, because of the lack of network datasets with ground truth information on the classification of the nodes. We show that traditional community detection methods fail to find the ground truth clusters in many large networks. Our results show that there is a marked separation between structural and annotated clusters, in line with recent findings. That means that either our current modeling of community structure has to be substantially modified, or that annotated clusters may not be recoverable from topology alone.

Deeply interesting work if you are trying to detect “subjects” by clustering nodes in a network.

I would heed the warning that typology may not accurately represent hidden information.

Beyond this particular case, I would test any assumption that some known factor represents an unknown factor(s) for any data set. Better than the results surprise you than your client.

I first saw this in a tweet by Brian Keegan.

PS: As you already know, “ground truth” depends upon your point of view. Don’t risk your work on the basis of someone else’s “ground truth.”

Topological maps or topographic maps?

Friday, December 27th, 2013

Topological maps or topographic maps? by Dave Richeson.

From the post:

While surfing the web the other day I read an article in which the author refers to a “topological map.” I think it is safe to say that he meant to write “topographic map.” This is an error I’ve seen many times before.

A topographic map is a map of a region that shows changes in elevation, usually with contour lines indicating different fixed elevations. This is a map that you would take on a hike.

A topological map is a continuous function between two topological spaces—not the same thing as a topographic map at all!

I thought for sure that there was no cartographic meaning for topological map. It turns out, however, that there is.

A topological map is a map that is only concerned with relative locations of features on the map, not on exact locations. A famous example is the graph that we use to solve the Bridges of Königsberg problem.

A useful reminder.

Although I would use even topological maps of concepts, establishing relative locations, with caution. Concepts have no universal metric and therefore placement on a topological map is largely arbitrary.

Computational Topology and Data Analysis

Wednesday, November 13th, 2013

Computational Topology and Data Analysis by Tamal K Dey.

Course syllabus:

Computational topology has played a synergistic role in bringing together research work from computational geometry, algebraic topology, data analysis, and many other related scientific areas. In recent years, the field has undergone particular growth in the area of data analysis. The application of topological techniques to traditional data analysis, which before has mostly developed on a statistical setting, has opened up new opportunities. This course is intended to cover this aspect of computational topology along with the developments of generic techniques for various topology-centered problems.

A course outline on computational topology with a short reading list, papers and notes on various topics.

I found this while looking up references on Tackling some really tough problems….

Tackling some really tough problems…

Wednesday, November 13th, 2013

Tackling some really tough problems with machine learning by Derrick Harris.

From the post:

Machine learning startup Ayasdi is partnering with two prominent institutions — Lawrence Livermore National Laboratory and the Texas Medical Center — to help advance some of their complicated data challenges. At LLNL, the company will collaborate on research in energy, climate change, medical technology, and national security, while its work with the Texas Medical Center will focus on translational medicine, electronic medical records and finding new uses for existing drugs.

Ayasdi formally launched in January after years researching its core technology, called topological data analysis. Essentially, the company’s software, called Iris, uses hundreds of machine learning algorithms to analyze up to tens of billions of data points and identify the relationships among them. The topological part comes from the way the results of this analysis are visually mapped into a network that places similar or tightly connected points near one another so users can easily spot collections of variables that appear to affect each other.

Tough problems:

At LLNL, the company will collaborate on research in energy, climate change, medical technology, and national security, while its work with the Texas Medical Center will focus on translational medicine, electronic medical records and finding new uses for existing drugs.

I would say so but that wasn’t the “tough” problem I was expecting.

The “tough” problem I had in mind was taking data with no particular topology and mapping it to a topology.

I ask because “similar or tightly connected points” depend upon a notion of “similarity” that is not inherent in most data points.

For example, how “similar” are you from a leaker by working in the same office? How does that “similarity” compare to the “similarity” of other relationships?


Original text (which I have corrected above):

I ask because “similar or tightly connected points” depend upon a notion of “distance” that is not inherent in most data points.

For example, how “near” or “far” are you from a leaker by working in the same office? How does that “near” or “far” compare to the nearness or farness of other relationships?

I corrected the original post to remove the implication of a metric distance.

Tangle Machines

Wednesday, September 11th, 2013

Daniel Moskovich has started a thread on what he calls: “Tangle Machines.”

The series started with Tangle Machines- Positioning claim.

From that post:

Avishy Carmi and I are in the process of finalizing a preprint on what we call “tangle machines”, which are knot-like objects which store and process information. Topologically, these roughly correspond to embedded rack-coloured networks of 2-spheres connected by line segments. Tangle machines aren’t classical knots, or 2-knots, or knotted handlebodies, or virtual knots, or even w-knot. They’re a new object of study which I would like to market.

Below is my marketing strategy.

My positioning claim is:

  • Tangle machines blaze a trail to information topology.

My three supporting points are:

  • Tangle machines pre-exist in a the sense of Plato. If you look at a knot from the perspective of information theory, you are inevitably led to their definition.
  • Tangle machines are interesting mathematical objects with rich algebraic structure which present a plethora of new and interesting questions with information theoretic content.
  • Tangle machines provide a language in which one might model “real-world” classical and quantum interacting processes in a new and useful way.

Next post, I’ll introduce tangle machines. Right now, I’d like to preface the discussion with a content-free pseudo-philosophical rant, which argues that different approaches to knot theory give rise to different `most natural’ objects of study.

You know where Daniel lost me in his sales pitch. 😉

But, it’s Daniel’s story so read on “as though” knots “pre-exist,” even though he later concedes the pre-existing claim cannot be proven or even tested.

Tangle Machines- Part 1 by Daniel Moskovich begins:

In today’s post, I will define tangle machines. In subsequent posts, I’ll realize them topologically and describe how we study them and more about what they mean.

To connect to what we already know, as a rough first approximation, a tangle machine is an algebraic structure obtained from taking a knot diagram coloured by a rack, then building a graph whose vertices correspond to the arcs of the diagram and whose edges correspond to crossings (the overcrossing arc is a single unit- so it “acts on” one undercrossing arc to change its colour and to convert it into another undercrossing arc). Such considerations give rise to a combinatorial diagrammatic-algebraic setup, and tangle machines are what comes from taking this setup seriously. One dream is that this setup is well-suited to modeling mutually interacting processes which satisfy a natural `conservation law’- and to move in a very applied direction of actually identifying tangle machine inside data.

To whet your appetite, below is a pretty figure illustrating a 926 knot hiding inside a synthetic collection of phase transitions between anyons (an artificial and unrealistic collection; the hope is to find such things inside real-world data):

Hard to say if in the long run this “new” data structure will be useful or not.

Do stay tuned for future developments.

Physics, Topology, Logic and Computation:…

Sunday, July 7th, 2013

Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay.

Abstract:

In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology: namely, a linear operator behaves very much like a “cobordism”. Similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of “closed symmetric monoidal category”. We assume no prior knowledge of category theory, proof theory or computer science.

The authors set out to create a Rosetta stone for the areas of physics, topology, logic and computation on the subject of categories.

Seventy (70)+ pages of heavy reading but worth the effort (at least so far)!

The Homotopy Type Theory Book is out!

Tuesday, June 25th, 2013

The Homotopy Type Theory Book is out! by Robert Harper.

From the post:

By now many of you have heard of the development of Homotopy Type Theory (HoTT), an extension of intuitionistic type theory that provides a natural foundation for doing synthetic homotopy theory. Last year the Institute for Advanced Study at Princeton sponsored a program on the Univalent Foundations of Mathematics, which was concerned with developing these ideas. One important outcome of the year-long program is a full-scale book presenting the main ideas of Homotopy Type Theory itself and showing how to apply them to various branches of mathematics, including homotopy theory, category theory, set theory, and constructive analysis. The book is the product of a joint effort by dozens of participants in the program, and is intended to document the state of the art as it is known today, and to encourage its further development by the participation of others interested in the topic (i.e., you!). Among the many directions in which one may take these ideas, the most important (to me) is to develop a constructive (computational) interpretation of HoTT. Some partial results in this direction have already been obtained, including fascinating work by Thierry Coquand on developing a constructive version of Kan complexes in ITT, by Mike Shulman on proving homotopy canonicity for the natural numbers in a two-dimensional version of HoTT, and by Dan Licata and me on a weak definitional canonicity theorem for a similar two-dimensional theory. Much work remains to be done to arrive at a fully satisfactory constructive interpretation, which is essential for application of these ideas to computer science. Meanwhile, though, great progress has been made on using HoTT to formulate and formalize significant pieces of mathematics in a new, and strikingly beautiful, style, that are well-documented in the book.

The book is freely available on the web in various formats, including a PDF version with active references, an ebook version suitable for your reading device, and may be purchased in hard- or soft-cover from Lulu. The book itself is open source, and is available at the Hott Book Git Hub. The book is under the Creative Commons CC BY-SA license, and will be freely available in perpetuity.

Readers may also be interested in the posts on Homotopy Type Theory, the n-Category Cafe, and Mathematics and Computation which describe more about the book and the process of its creation.

I can’t promise you that Homotopy Type Theory is going to be immediately useful in your topic map practice.

However, any theory that aims at replacing set theory (and it definitions of equality) is potentially useful for topic maps.

There are doctrines of subject equivalence far beyond simple string matches and no doubt clients who are willing to pay for them.

The Filtering vs. Clustering Dilemma

Tuesday, June 11th, 2013

Hierarchical information clustering by means of topologically embedded graphs by Won-Min Song, T. Di Matteo, and Tomaso Aste.

Abstract:

We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy which describes how clusters gather together. We discuss performance, robustness and reliability of this method by first investigating several artificial data-sets, finding that it can outperform significantly other established approaches. Then we show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real data-sets. In particular, we find that the application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key-roles in diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies.

I like the framing of the central issue a bit further in the paper:

Filtering information out of complex datasets is becoming a central issue and a crucial bottleneck in any scientifi c endeavor. Indeed, the continuous increase in the capability of automatic data acquisition and storage is providing an unprecedented potential for science. However, the ready accessibility of these technologies is posing new challenges concerning the necessity to reduce data-dimensionality by fi ltering out the most relevant and meaningful information with the aid of automated systems. In complex datasets information is often hidden by a large degree of redundancy and grouping the data into clusters of elements with similar features is essential in order to reduce complexity [1]. However, many clustering methods require some a priori information and must be performed under expert supervision. The requirement of any prior information is a potential problem because often the fi ltering is one of the preliminary processing on the data and therefore it is performed at a stage where very little information about the system is available. Another difficulty may arise from the fact that, in some cases, the reduction of the system into a set of separated local communities may hide properties associated with the global organization. For instance, in complex systems, relevant features are typically both local and global and di fferent levels of organization emerge at diff erent scales in a way that is intrinsically not reducible. We are therefore facing the problem of catching simultaneously two complementary aspects: on one side there is the need to reduce the complexity and the dimensionality of the data by identifying clusters which are associated with local features; but, on the other side, there is a need of keeping the information about the emerging global organization that is responsible for cross-scale activity. It is therefore essential to detect clusters together with the diff erent hierarchical gatherings above and below the cluster levels. (emphasis added)

Simplification of data is always lossy. The proposed technique does not avoid all loss but hopes to mitigate its consequences.

Briefly the technique relies upon building a network of the “most significant” links and analyzing the network structure. The synthetic and real data sets show that the technique works quite well. At least for data sets where we can judge the outcome.

What of larger data sets? Where the algorithmic approaches are the only feasible means of analysis? How do we judge accuracy in those cases?

A revised version of this paper appears as: Hierarchical Information Clustering by Means of Topologically Embedded Graphs by Won-Min Song, T. Di Matteo, and Tomaso Aste.

The original development of the technique used here can be found in: A tool for filtering information in complex systems by M. Tumminello, T. Aste, T. Di Matteo, and R. N. Mantegna.

“Almost there….” (Computing Homology)

Friday, April 12th, 2013

We all remember the pilot in Star Wars that kept saying, “Almost there….” Jeremy Kun has us “almost there…” in his latest installment: Computing Homology.

To give you some encouragement, Jeremy concludes the post saying:

The reader may be curious as to why we didn’t come up with a more full-bodied representation of a simplicial complex and write an algorithm which accepts a simplicial complex and computes all of its homology groups. We’ll leave this direct approach as a (potentially long) exercise to the reader, because coming up in this series we are going to do one better. Instead of computing the homology groups of just one simplicial complex using by repeating one algorithm many times, we’re going to compute all the homology groups of a whole family of simplicial complexes in a single bound. This family of simplicial complexes will be constructed from a data set, and so, in grandiose words, we will compute the topological features of data.

If it sounds exciting, that’s because it is! We’ll be exploring a cutting-edge research field known as persistent homology, and we’ll see some of the applications of this theory to data analysis. (bold emphasis added)

Data analysts are needed at all levels.

Do you want to be a spreadsheet data analyst or something a bit harder to find?

Internet Topology… [Finite by Nature vs. Design]

Monday, April 1st, 2013

Internet Topology – Massive and Amazing Graphs by Vincent Granville.

From the post:

I selected a few from this Google search. Which one is best? Re-usable in other contexts? What about videos showing growth over time, or more sophisticated graphs where link thickness represents “Internet highway” bandwidth or speed. And what about a video representing a simulated reflected DNS attack, rendering 10% of the Internet virtually dead, and showing how the attack spreads across the network?

Internet topography ATT

Source: http://javiergs.com/?p=983 (a must read)

Be prepared to pump up the image size to get any recognizable text.

Truly impressive but I mention it to illustrate one of the practical problems in authoring topic maps.

The AT&amp:T graph is “massive and amazing” but it is finite. By its very nature it is finite.

Topic maps are finite as well, but their finiteness is by design. An entirely different problem.

In a topic map, every topic has the potential to have one or more associations with other topics, but it also has potential associations with subjects not yet represented by topics in the topic map.

That is like an encyclopedia author, you have to draw an arbitrary line around your topic map and say:

No associations with subjects not already in the map!

and,

No more new subjects in the map!

Which is quite different from a network typology, which no matter how vast, ends with with nodes at the end of each connection.

As a matter of design and authorship, you have to choose the limits on your topic map.

Where the limits of your topic map should be set will depend upon the use cases, requirements and resources that govern the authoring of your topic map.

The Shape of Data

Friday, March 22nd, 2013

The Shape of Data by Jesse Johnson.

From the “about” page:

Whether your goal is to write data intensive software, use existing software to analyze large, high dimensional data sets, or to better understand and interact with the experts who do these things, you will need a strong understanding of the structure of data and how one can try to understand it. On this blog, I plan to explore and explain the basic ideas that underlie modern data analysis from a very intuitive and minimally technical perspective: by thinking of data sets as geometric objects.

When I began learning about machine learning and data mining, I found that the intuition I had formed while studying geometry was extremely valuable in understanding the basic concepts and algorithms. My main obstacle has been to figure out what types of problems others are interested in solving, and what types of solutions would make the most difference. I hope that by sharing what I know (and what I continue to learn) from my own perspective, others will help me to figure out what are the major questions that drive this field.

A new blog that addresses the topology of data, in an accessible manner.

A Concise Course in Algebraic Topology

Tuesday, March 12th, 2013

A Concise Course in Algebraic Topology by J.P. May. (PDF)

From the introduction:

The first year graduate program in mathematics at the University of Chicago consists of three three-quarter courses, in analysis, algebra, and topology. The first two quarters of the topology sequence focus on manifold theory and differential geometry, including differential forms and, usually, a glimpse of de Rham cohomology. The third quarter focuses on algebraic topology. I have been teaching the third quarter off and on since around 1970. Before that, the topologists, including me, thought that it would be impossible to squeeze a serious introduction to algebraic topology into a one quarter course, but we were overruled by the analysts and algebraists, who felt that it was unacceptable for graduate students to obtain their PhDs without having some contact with algebraic topology.

This raises a conundrum. A large number of students at Chicago go into topology, algebraic and geometric. The introductory course should lay the foundations for their later work, but it should also be viable as an introduction to the subject suitable for those going into other branches of mathematics. These notes reflect my efforts to organize the foundations of algebraic topology in a way that caters to both pedagogical goals. There are evident defects from both points of view. A treatment more closely attuned to the needs of algebraic geometers and analysts would include Čech cohomology on the one hand and de Rham cohomology and ˇ perhaps Morse homology on the other. A treatment more closely attuned to the needs of algebraic topologists would include spectral sequences and an array of calculations with them. In the end, the overriding pedagogical goal has been the introduction of basic ideas and methods of thought.

Tough sledding but having insights, like those found in the GraphLab project, require a deeper than usual understanding of the issues at hand.

I first saw this in a tweet by Topology Fact.

…topological data analysis

Sunday, January 20th, 2013

New big data firm to pioneer topological data analysis by John Burn-Murdoch.

From the post:

A US big data firm is set to establish algebraic topology as the gold standard of data science with the launch of the world’s leading topological data analysis (TDA) platform.

Ayasdi, whose co-founders include renowned mathematics professor Gunnar Carlsson, launched today in Palo Alto, California, having secured $10.25m from investors including Khosla Ventures in the first round of funding.

The funds will be used to build on its Insight Discovery platform, the culmination of 12 years of research and development into mathematics, computer science and data visualisation at Stanford.

Ayasdi’s work prior to launching as a company has already yielded breakthroughs in the pharmaceuticals industry. In one case it revealed new insights in eight hours – compared to the previous norm of over 100 hours – cutting the turnaround from analysis to clinical trials in the process.

The project? CompTop, which I covered here.

Does topological data analysis sound more interesting now than before?

Constructing Topological Spaces — A Primer

Friday, November 16th, 2012

Constructing Topological Spaces — A Primer by Jeremy Kun.

From the post:

Last time we investigated the (very unintuitive) concept of a topological space as a set of “points” endowed with a description of which subsets are open. Now in order to actually arrive at a discussion of interesting and useful topological spaces, we need to be able to take simple topological spaces and build them up into more complex ones. This will take the form of subspaces and quotients, and through these we will make rigorous the notion of “gluing” and “building” spaces.

More heavy sledding but pay special attention to the discussion of sets and equivalences.

Jeremy concludes with pointers to books for additional reading.

Any personal favorites you would like to add to the list?

Topological Spaces — A Primer

Friday, November 16th, 2012

Topological Spaces — A Primer by Jeremy Kun.

From the post:

In our last primer we looked at a number of interesting examples of metric spaces, that is, spaces in which we can compute distance in a reasonable way. Our goal for this post is to relax this assumption. That is, we want to study the geometric structure of space without the ability to define distance. That is not to say that some notion of distance necessarily exists under the surface somewhere, but rather that we include a whole new class of spaces for which no notion of distance makes sense. Indeed, even when there is a reasonable notion of a metric, we’ll still want to blur the lines as to what kinds of things we consider “the same.”

The reader might wonder how we can say anything about space if we can’t compute distances between things. Indeed, how could it even really be “space” as we know it? The short answer is: the reader shouldn’t think of a topological space as a space in the classical sense. While we will draw pictures and say some very geometric things about topological spaces, the words we use are only inspired by their classical analogues. In fact the general topological space will be a much wilder beast, with properties ranging from absolute complacency to rampant hooliganism. Even so, topological spaces can spring out of every mathematical cranny. They bring at least a loose structure to all sorts of problems, and so studying them is of vast importance.

Just before we continue, we should give a short list of how topological spaces are applied to the real world. In particular, this author is preparing a series of posts dedicated to the topological study of data. That is, we want to study the loose structure of data potentially embedded in a very high-dimensional metric space. But in studying it from a topological perspective, we aim to eliminate the dependence on specific metrics and parameters (which can be awfully constricting, and even impertinent to the overall structure of the data). In addition, topology has been used to study graphics, image analysis and 3D modelling, networks, semantics, protein folding, solving systems of polynomial equations, and loads of topics in physics.

Topology offers an alternative to the fiction of metric distances between the semantics of words. It is a useful fiction, but a fiction none the less.

Deep sledding but well worth the time.

Graphlet-based edge clustering reveals pathogen-interacting proteins

Monday, September 10th, 2012

Graphlet-based edge clustering reveals pathogen-interacting proteins by R. W. Solava, R. P. Michaels and T. Milenković. (Bioinformatics (2012) 28 (18): i480-i486. doi: 10.1093/bioinformatics/bts376)

Abstract:

Motivation: Prediction of protein function from protein interaction networks has received attention in the post-genomic era. A popular strategy has been to cluster the network into functionally coherent groups of proteins and assign the entire cluster with a function based on functions of its annotated members. Traditionally, network research has focused on clustering of nodes. However, clustering of edges may be preferred: nodes belong to multiple functional groups, but clustering of nodes typically cannot capture the group overlap, while clustering of edges can. Clustering of adjacent edges that share many neighbors was proposed recently, outperforming different node clustering methods. However, since some biological processes can have characteristic ‘signatures’ throughout the network, not just locally, it may be of interest to consider edges that are not necessarily adjacent.

Results: We design a sensitive measure of the ‘topological similarity’ of edges that can deal with edges that are not necessarily adjacent. We cluster edges that are similar according to our measure in different baker’s yeast protein interaction networks, outperforming existing node and edge clustering approaches. We apply our approach to the human network to predict new pathogen-interacting proteins. This is important, since these proteins represent drug target candidates.

Availability: Software executables are freely available upon request.

Contact: tmilenko@nd.edu

Of interest for bioinformatics but more broadly for its insights into topological similarity and edge clustering by topological similarity.

Being mindful that an “edge” is for all intents and purposes a “node” that records a connection between two (non-hyperedge and non-looping edge) other nodes. Nodes could, but don’t generally record their connection to other nodes, that connection being represented by an edge.

Algebraic Topology and Machine Learning

Saturday, August 25th, 2012

Algebraic Topology and Machine Learning – In conjunction with Neural Information Processing Systems (NIPS 2012)

September 16, 2012 – Submissions Due
October 7, 2012 – Acceptance Notices
December 7 or 8 (TBD), 2011, Lake Tahoe, Nevada, USA.

From the call for papers:

Topological methods and machine learning have long enjoyed fruitful interactions as evidenced by popular algorithms like ISOMAP, LLE and Laplacian Eigenmaps which have been borne out of studying point cloud data through the lens of topology/geometry. More recently several researchers have been attempting to understand the algebraic topological properties of data. Algebraic topology is a branch of mathematics which uses tools from abstract algebra to study and classify topological spaces. The machine learning community thus far has focused almost exclusively on clustering as the main tool for unsupervised data analysis. Clustering however only scratches the surface, and algebraic topological methods aim at extracting much richer topological information from data.

The goals of this workshop are:

  1. To draw the attention of machine learning researchers to a rich and emerging source of interesting and challenging problems.
  2. To identify problems of interest to both topologists and machine learning researchers and areas of potential collaboration.
  3. To discuss practical methods for implementing topological data analysis methods.
  4. To discuss applications of topological data analysis to scientific problems.

We also invite submissions in a variety of areas, at the intersection of algebraic topology and learning, that have witnessed recent activity. Areas of focus for submissions include but are not limited to:

  1. Statistical approaches to robust topological inference.
  2. Novel applications of topological data analysis to problems in machine learning.
  3. Scalable methods for topological data analysis.

NIPS2012 site. You will appreciate the “dramatization.” 😉

Put on your calendar and/or watch for papers!

Inferring General Relations between Network Characteristics from Specific Network Ensembles

Sunday, June 10th, 2012

Inferring General Relations between Network Characteristics from Specific Network Ensembles by Stefano Cardanobile, Volker Pernice, Moritz Deger, and Stefan Rotter.

Abstract:

Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget’s Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.

The key insight is that sampling can provide the basis for reliable estimates of global properties of networks too large to fully model.

I first saw this at: Understanding Complex Relationships: How Global Properties of Networks Become Apparent Locally.

If you don’t already get one of the ScienceDaily newsletters, you should consider it.

Applied topology and Dante: an interview with Robert Ghrist [Sept., 2010]

Monday, May 28th, 2012

Applied topology and Dante: an interview with Robert Ghrist by John D. Cook. (September 13, 2010)

From the post:

Robert Ghrist A few weeks ago I discovered Robert Ghrist via his web site. Robert is a professor of mathematics and electrical engineering. He describes his research as applied topology, something I’d never heard of. (Topology has countless applications to other areas of mathematics, but I’d not heard of much work directly applying topology to practical physical problems.) In addition to his work in applied topology, I was intrigued by Robert’s interest in old books.

The following is a lightly-edited transcript of a phone conversation Robert and I had September 9, 2010.

If the interview sounds interesting, you may want to read/skim:

[2008] R. Ghrist, “Three examples of applied and computational homology,” Nieuw Archief voor Wiskunde 5/9(2).

or,

[2010] R. Ghrist, “Applied Algebraic Topology & Sensor Networks,” a manu-script text. (caveat! file>50megs!)

Applied Topology & Sensor Networks are the notes for an AMS short course. Ghrist recommends continuing with Algebraic Toplogy by Allen Hatcher. (Let me know if you need my shipping address.)

Q: Are sensors always mechanical sensors? We speak of them as though that were the case.

What if I can’t afford unmanned drones (to say nothing of their pilots) and have $N$ people with cellphones?

How does a more “discriminating” “sensor” impact the range of capabilities/solutions?

Barcodes: The Persistent Topology of Data

Monday, February 27th, 2012

Barcodes: The Persistent Topology of Data by Robert Ghrist.

Abstract:

This article surveys recent work of Carlsson and collaborators on applications of computational algebraic topology to problems of feature detection and shape recognition in high-dimensional data. The primary mathematical tool considered is a homology theory for point-cloud data sets — persistent homology — and a novel representation of this algebraic characterization — barcodes. We sketch an application of these techniques to the classification of natural images.

From the article:

1. The shape of data

When a topologist is asked, “How do you visualize a four-dimensional object?” the appropriate response is a Socratic rejoinder: “How do you visualize a three-dimensional object?” We do not see in three spatial dimensions directly, but rather via sequences of planar projections integrated in a manner that is sensed if not comprehended. We spend a significant portion of our first year of life learning how to infer three-dimensional spatial data from paired planar projections. Years of practice have tuned a remarkable ability to extract global structure from representations in a strictly lower dimension.

The inference of global structure occurs on much finer scales as well, with regards to converting discrete data into continuous images. Dot-matrix printers, scrolling LED tickers, televisions, and computer displays all communicate images via arrays of discrete points which are integrated into coherent, global objects. This also is a skill we have practiced from childhood. No adult does a dot-to-dot puzzle with anything approaching anticipation.

1.1. Topological data analysis.

Problems of data analysis share many features with these two fundamental integration tasks: (1) how does one infer high dimensional structure from low dimensional representations; and (2) how does one assemble discrete points into global structure.

Now are you interested?

Reminds me of another paper on homology and higher dimensions that I need to finish writing up. Probably not today but later this week.

Found thanks to: Christophe Lalanne’s A bag of tweets / Feb 2012.

Inquiry: Algebraic Geometry and Topology

Monday, October 31st, 2011

Inquiry: Algebraic Geometry and Topology

Speaking of money and such matters, a call for assistance from Quantivity:

Algebraic geometry and topology traditionally focused on fairly pure math considerations. With the rise of high-dimensional machine learning, these fields are increasing being pulled into interesting computational applications such as manifold learning. Algebraic statistics and information geometry offer potential to help bridge these fields with modern statistics, especially time-series and random matrices.

Early evidence suggests potential for significant intellectual cross-fertilization with finance, both mathematical and computational. Geometrically, richer modeling and analysis of latent geometric structure than available from classic linear algebraic decomposition (e.g. PCA, one of the main workhorses of modern $mathbb{P}$ finance); for example, cumulant component analysis. Topologically, more effective qualitative analysis of data sampled from manifolds or singular algebraic varieties; for example, persistent homology (see CompTop).

As evidence by Twitter followers, numerous Quantivity readers are experts in these fields. Thus, perhaps the best way to explore is to seek insight from readers.

Readers: please use comments to suggest applied literature from these fields; ideally, although not required, that of potential relevance to finance modeling. All types of literature are requested, from intro texts to survey articles to preprint working papers on specific applications.

These suggestions will be synthesized into one or more subsequent posts, along with appropriate additions to People of Quant Research.

If you or a member of your family knows of any relevant resources, please go to: Inquiry: Algebraic Geometry and Topology and volunteer those resources as comments. You might even make the People of Quant Research list!

I wonder if there would be any interest in tracking bundled instruments using topic maps? That could be an interesting question.

COMTOP: Applied and Computational Algebraic Topology

Tuesday, September 13th, 2011

COMTOP: Applied and Computational Algebraic Topology

From the website:

The overall goal of this project is to develop flexible topological methods which will allow the analysis of data which is difficult to analyze using classical linear methods. Data obtained by sampling from highly curved manifolds or singular algebraic varieties in Euclidean space are typical examples where our methods will be useful. We intend to develop and refine two pieces of software which have been written by members of our research group, ISOMAP (Tenenbaum) and PLEX (de Silva-Carlsson). ISOMAP is a tool for dimension reduction and parameterization of high dimensional data sets, and PLEX is a homology computing tool which we will use in locating and analyzing singular points in data sets, as well as estimating dimension in situations where standard methods do not work well. We plan to extend the range of applicability of both tools, in the case of ISOMAP by studying embeddings into spaces with non-Euclidean metrics, and in the case of PLEX by building in the Mayer-Vietoris spectral sequence as a tool. Both ISOMAP and PLEX will be adapted for parallel computing. We will also begin the theoretical study of statistical questions relating to topology. For instance, we will initiate the study of higher dimensional homology of subsets sampled from Euclidean space under various sampling hypotheses. The key object of study will be the family of Cech complexes constructed using the distance function in Euclidean space together with a randomly chosen finite set of points in Euclidean space.

The goal of this project is to develop tools for understanding data sets which are not easy to understand using standard methods. This kind of data might include singular points, or might be strongly curved. The data is also high dimensional, in the sense that each data point has many coordinates. For instance, we might have a data set whose points each of which is an image, which has one coordinate for each pixel. Many standard tools rely on linear approximations, which do not work well in strongly curved or singular problems. The kind of tools we have in mind are in part topological, in the sense that they measure more qualitative properties of the spaces involved, such as connectedness, or the number of holes in a space, and so on. This group of methods has the capability of recognizing the number of parameters required to describe a space, without actually parameterizing it. These methods also have the capability of recognizing singular points (like points where two non-parallel planes or non-parallel lines intersect), without actually having to construct coordinates on the space. We will also be further developing and refining methods we have already constructed which can actually find good parameterizations for many high dimensional data sets. Both projects will involve the adaptation for the computer of many methods which have heretofore been used in by-hand calculations for solving theoretical problems. We will also initiate the theoretical development of topological tools in a setting which includes errors and sampling.

Are you still using “classical linear methods” to analyze your data sets?

Could be the right choice but it may be your only choice if you never consider the alternatives.