Archive for the ‘Computation’ Category

Physics, Topology, Logic and Computation: A Rosetta Stone

Monday, February 22nd, 2016

Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay.

Abstract:

In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a ‘cobordism’: a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and ‘quantum topology’. But this was just the beginning: similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of ‘closed symmetric monoidal category’. We assume no prior knowledge of category theory, proof theory or computer science.

While this is an “expository” paper, at some 66 pages (sans the references), you best set aside some of your best thinking/reading time to benefit from it.

Enjoy!

Computational Culture

Thursday, November 13th, 2014

Computational Culture: a journal of software studies

Computational Culture is an online open-access peer-reviewed journal of inter-disciplinary enquiry into the nature of the culture of computational objects, practices, processes and structures.

The journal’s primary aim is to examine the ways in which software undergirds and formulates contemporary life. Computational processes and systems not only enable contemporary forms of work and play and the management of emotional life but also drive the unfolding of new events that constitute political, social and ontological domains. In order to understand digital objects such as corporate software, search engines, medical databases or to enquire into the use of mobile phones, social networks, dating, games, financial systems or political crises, a detailed analysis of software cannot be avoided.

A developing form of literacy is required that matches an understanding of computational processes with those traditionally bound within the arts, humanities, and social sciences but also in more informal or practical modes of knowledge such as hacking and art.

The journal welcomes contributions that address such topics and many others that may derive and mix methodologies from cultural studies, science and technology studies, philosophy of computing, metamathematics, computer science, critical theory, media art, human computer interaction, media theory, design, philosophy.

Computational Culture publishes peer-reviewed articles, special projects, interviews, and reviews of books, projects, events and software. The journal is also involved in developing a series of events and projects to generate special issues.

A few of the current articles:

Not everyone’s cup of tea but for those who appreciate it, this promises to be a real treasure.

The Lambda Calculus for Absolute Dummies (like myself)

Monday, June 23rd, 2014

The Lambda Calculus for Absolute Dummies (like myself) by Joscha Bach.

From the post:

If there is one highly underrated concept in philosophy today, it is computation. Why is it so important? Because computationalism is the new mechanism. For millennia, philosophers have struggled when they wanted to express or doubt that the universe can be explained in a mechanical way, because it is so difficult to explain what a machine is, and what it is not. The term computation does just this: it defines exactly what machines can do, and what not. If the universe/the mind/the brain/bunnies/God is explicable in a mechanical way, then it is a computer, and vice versa.

Unfortunately, most people outside of programming and computer science don’t know exactly what computation means. Many may have heard of Turing Machines, but these things tend to do more harm than good, because they leave strong intuitions of moving wheels and tapes, instead of what it really does: embodying the nature of computation.
….

If you have ever struggled with the Lamda Calculus entry at Wikipedia, you will appreciate this well written introduction to the same subject.

I would re-title the post: Lamda Calculus by a Gifted Author.

Quantum Computing Playground

Friday, May 30th, 2014

From the post:

Google has just launched a new web-based integrated development environment (IDE) that allows users to write, run and debug software that makes use of quantum algorithms. The tool could allow computer scientists to stay ahead of the game and get acquainted with the many quirks of quantum algorithms even before the first practical quantum computer is built.

Homepage: Quantum Computing Playground.

From the homepage:

We strongly recommened to run Quantum Playground in Google Chrome.

I accessed the homepage using, gasp, another browser. 😉 Just happenstance.

The page doesn’t say anything about use of another browser leaving your computer out of phase but probably best not to take chances.

Enjoy!

Human Computation

Saturday, May 24th, 2014

Human Computation

From the homepage:

Human Computation is an international and interdisciplinary forum for the electronic publication and print archiving of high-quality scholarly articles in all areas of human computation, which concerns the design or analysis of information processing systems in which humans participate as computational elements.

Submission Topics

(Editorial keywords are in boldface – please see author guidelines for details)

Applications – novel or transformative applications
Interfaces – HCI or related human factors methods or issues
Modalities – general interaction paradigms (e.g., gaming) and related methods
Techniques – repeatable methods, analogous to design patterns for OOP
Algorithms – wisdom of crowds, aggregation, reputation, crowdsourced analysis, and ML/HC
Architecture – development platforms, architectures, languages, APIs, IDEs, and compilers
Infrastructure – relevant networks, protocols, state space, and services
Participation – factors that inﬂuence human participation
Analysis – techniques for identifying typical characteristics and patterns in human computation systems
Epistemology – the role, source, representation, and construction of information
Policy – ethical, regulatory, and economic considerations
Security – security issues, including surreptitious behavior to inﬂuence system outcomes
Society – cultural, evolutionary, existential, psychological, and social impact
Organization – taxonomies of concepts, terminology, problem spaces, algorithms, and methods
Surveys – state of the art assessments of various facets
Meta-topics – insightful commentary on the future, philosophy, charter, and purpose of HC.

Looks like a journal for topic map articles to me.

You?

I first saw this in a tweet by Matt Lease.

Find Papers We Love

Friday, May 2nd, 2014

Find Papers We Love by Zachary Tong.

A search interface to the Github repository maintained by @Papers_we_love.

It’s not “big data” but this search interface is going to make my life better.

You?

From their homepage:

What was the last paper within the realm of computing you read and loved? What did it inspire you to build or tinker with? Come share the ideas in an awesome academic/research paper with fellow engineers, programmers, and paper-readers. Lead a session and show off code that you wrote that implements these ideas or just give us the lowdown about the paper (because of HARD MATH!). Otherwise, just come, listen, and discuss.

We’re curating a repository for papers and places-to-find papers. You can contribute by adding PR’s for papers, code, and/or links to other repositories.

We’re posting videos of all our presentations, from all our chapters.

This is a productive use of the Internet.

Introduction to Statistical Computing

Saturday, January 18th, 2014

Introduction to Statistical Computing by Cosma Shalizi.

Description:

Computational data analysis is an essential part of modern statistics. Competent statisticians must not just be able to run existing programs, but to understand the principles on which they work. They must also be able to read, modify and write code, so that they can assemble the computational tools needed to solve their data-analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to programming, targeted at statistics majors with minimal programming knowledge, which will give them the skills to grasp how statistical software works, tweak it to suit their needs, recombine existing pieces of code, and when needed create their own programs.

Students will learn the core of ideas of programming — functions, objects, data structures, flow control, input and output, debugging, logical design and abstraction — through writing code to assist in numerical and graphical statistical analyses. Students will in particular learn how to write maintainable code, and to test code for correctness. They will then learn how to set up stochastic simulations, how to parallelize data analyses, how to employ numerical optimization algorithms and diagnose their limitations, and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code.

The class will be taught in the R language.

Slides and R code for three years (as of the time of this post.

Third Age of Computing?

Monday, August 26th, 2013

The ‘third era’ of app development will be fast, simple, and compact by Rik Myslewski.

From the post:

The tutorial was conducted by members of the HSA – heterogeneous system architecture – Foundation, a consortium of SoC vendors and IP designers, software companies, academics, and others including such heavyweights as ARM, AMD, and Samsung. The mission of the Foundation, founded last June, is “to make it dramatically easier to program heterogeneous parallel devices.”

As the HSA Foundation explains on its website, “We are looking to bring about applications that blend scalar processing on the CPU, parallel processing on the GPU, and optimized processing of DSP via high bandwidth shared memory access with greater application performance at low power consumption.”

Last Thursday, HSA Foundation president and AMD corporate fellow Phil Rogers provided reporters with a pre-briefing on the Hot Chips tutorial, and said the holy grail of transparent “write once, use everywhere” programming for shared-memory heterogeneous systems appears to be on the horizon.

According to Rogers, heterogeneous computing is nothing less than the third era of computing, the first two being the single-core era and the muti-core era. In each era of computing, he said, the first programming models were hard to use but were able to harness the full performance of the chips.

(…)

Exactly how HSA will get there is not yet fully defined, but a number of high-level features are accepted. Unified memory addressing across all processor types, for example, is a key feature of HSA. “It’s fundamental that we can allocate memory on one processor,” Rogers said, “pass a pointer to another processor, and execute on that data – we move the compute rather than the data.”
(…)

Rik does a deep dive with references to the HSA Programmers Reference Manual to Project Sumatra that bring data-parallel algorithms to Java 9 (2015).

The only discordant note is that Nivdia and Intel are both missing from the HSA Foundation. Invited but not present.

Customers of Nvidia and/or Intel (I’m both) should contact Nvidia (Contact us) and Intel (contact us) and urge them to join the HSA Foundation. And pass this request along.

Sharing of memory is one of the advantages of HSA (heterogeneous systems architecture) and it is the where the semantics of shared data will come to the fore.

I haven’t read the available HSA documents in detail, but the HSA Programmer’s Reference Manual appears to presume that shared data has only one semantic. (It never says that but that is my current impression.)

We have seen that the semantics of data is not “transparent.” The same demonstration illustrates that data doesn’t always have the same semantic.

Simply because I am pointed to a particular memory location, there is no reason to presume I should approach that data with the same semantics.

For example, what if I have a Social Security Number (SSN). In processing that number for the Social Security Administration, it may serve to recall claim history, eligibility, etc. If I am accessing the same data to compare it to SSN records maintained by the Federal Bureau of Investigation (FBI), it may not longer be a unique identifier in the same sense as at the SSA.

Same “data,” but different semantics.

Who you gonna call? Topic Maps!

PS: Perhaps not as part of the running code but to document the semantics you are using to process data. Same data, same memory location, multiple semantics.

Physics, Topology, Logic and Computation:…

Sunday, July 7th, 2013

Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay.

Abstract:

In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology: namely, a linear operator behaves very much like a “cobordism”. Similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of “closed symmetric monoidal category”. We assume no prior knowledge of category theory, proof theory or computer science.

The authors set out to create a Rosetta stone for the areas of physics, topology, logic and computation on the subject of categories.

Seventy (70)+ pages of heavy reading but worth the effort (at least so far)!

The Quipper Language [Quantum Computing]

Saturday, June 22nd, 2013

The Quipper Language

From the webpage:

Quipper is an embedded, scalable functional programming language for quantum computing. It provides, among other things:

• A high-level circuit description language. This includes gate-by-gate descriptions of circuit fragments, as well as powerful operators for assembling and manipulating circuits.
• A monadic semantics, allowing for a mixture of procedural and declarative programming styles.
• Built-in facilities for automatic synthesis of reversible quantum circuits, including from classical code.
• Support for hierarchical circuits.
• Extensible quantum data types.
• Programmable circuit transformers.
• Support for three execution phases: compile time, circuit generation time, and circuit execution time. A dynamic lifting operation to allow circuit generation to be parametric on values generated at circuit execution time.
• Extensive libraries of quantum functions, including: libraries for quantum integer and fixed-point arithmetic; the Quantum Fourier transform; an efficient Qram implementation; libraries for simulation of pseudo-classical circuits, Stabilizer circuits, and arbitrary circuits; libraries for exact and approximate decomposition of circuits into specific gate sets.

The website has a Quipper tutorial, online documentation and a detailed look at the language itself.

No link for a quantum computer but that isn’t very far off.

Learn Quipper now and perhaps you can lead the first Apache project to develop open source software for a quantum computer.

I first saw this in a tweet by Jose A. Alonso.

In-Memory Computing

Wednesday, April 17th, 2013

Why In-Memory Computing Is Cheaper And Changes Everything by Timo Elliott.

From the post:

What is the difference? Database engines today do I/O. So if they want to get a record, they read. If they want to write a record, they write, update, delete, etc. The application, which in this case is a DBMS, thinks that it’s always writing to disk. If that record that they’re reading and writing happens to be in flash, it will certainly be faster, but it’s still reading and writing. Even if I’ve cached it in DRAM, it’s the same thing: I’m still reading and writing.

What we’re talking about here is the actual database is physically in in-memory. I’m doing a fetch to get data and not a read. So the logic of the database changes. That’s what in-memory is about as opposed to the traditional types of computing.

Why is it time for in-memory computing?

Why now? The most important thing is this: DRAM costs are dropping about 32% every 12 months. Things are getting bigger, and costs are getting lower. If you looked at the price of a Dell server with a terabyte of memory three years ago, it was almost $100,000 on their internet site. Today, a server with more cores — sixteen instead of twelve — and a terabyte of DRAM, costs less than$40,000.

In-memory results in lower total cost of ownership

So the costs of this stuff is not outrageous. For those of you who don’t understand storage, I always get into this argument: the total cost of acquisition of an in-memory system is likely higher than a storage system. There’s no question. But the total cost of TCO is lower – because you don’t need storage people to manage memory. There are no LUNs [logical unit numbers]: all the things your storage technicians do goes away.

People cost more than hardware and software – a lot more. So the TCO is lower. And also, by the way, power: one study IBM did showed that memory is 99% less power than spinning disks. So unless you happen to be an electric company, that’s going to mean a lot to you. Cooling is lower, everything is lower.

Timo makes a good case for in-memory computing but I have a slightly different question.

If both data and program are stored in memory, where is the distinction between program and data?

Or in topic map terms, can’t we then speak about subject identities in the program and even in data at particular points in the program?

That could be a very powerful tool for controlling program behavior and re-purposing data at different stages of processing.

…no Hitchhiker’s Guide…

Saturday, February 9th, 2013

From the post:

Do you really want to get better at mathematics?

Remember when you first learned how to program? I do. I spent two years experimenting with Java programs on my own in high school. Those two years collectively contain the worst and most embarrassing code I have ever written. My programs absolutely reeked of programming no-nos. Hundred-line functions and even thousand-line classes, magic numbers, unreachable blocks of code, ridiculous code comments, a complete disregard for sensible object orientation, negligence of nearly all logic, and type-coercion that would make your skin crawl. I committed every naive mistake in the book, and for all my obvious shortcomings I considered myself a hot-shot programmer! At leaa st I was learning a lot, and I was a hot-shot programmer in a crowd of high-school students interested in game programming.

Even after my first exposure and my commitment to get a programming degree in college, it was another year before I knew what a stack frame or a register was, two more before I was anywhere near competent with a terminal, three more before I fully appreciated functional programming, and to this day I still have an irrational fear of networking and systems programming (the first time I manually edited the call stack I couldn’t stop shivering with apprehension and disgust at what I was doing).

A must read post if you want to be on the cutting edge of programming.

The tyranny of algorithms

Monday, September 24th, 2012

The tyranny of algorithms by Kaiser Fung.

This WSJ book review on “The Tyranny of Algorithms” (link) is well worth reading for those interested in how computer algorithms are used in business and government. I agree with most of what this author has to say.

You need to read Kaiser’s comment on the review before proceeding….

Back?

I am not altogether sure that algorithms are the problem the book and/or review make they out to be.

No doubt the concerns and problems described are real, but they don’t exist because of algorithms.

Rather they exist because we are basically lazy and accept the result of algorithms, just like we accept the judgements of others, advertising, etc.

Were we to question algorithms, judgements of others, advertising, we might be living in a very different world.

But we don’t, so we’re not.

So the question is, how to live with algorithms knowing we are too lazy to question them. Yes?

Are these shadows/echoes of Thinking, Fast and Slow?

On the value of being inexact

Wednesday, June 13th, 2012

Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling by Avinash Lingamneni, Kirthi Krishna Muntimadugu, Richard M. Karp, Krishna V. Palem, and Christian Piguet.

The following non-technical blurb caught my eye:

Researchers have unveiled an “inexact” computer chip that challenges the industry’s dogmatic 50-year pursuit of accuracy. The design improves power and resource efficiency by allowing for occasional errors. Prototypes unveiled this week at the ACM International Conference on Computing Frontiers in Cagliari, Italy, are at least 15 times more efficient than today’s technology.

The research, which earned best-paper honors at the conference, was conducted by experts from Rice University in Houston, Singapore’s Nanyang Technological University (NTU), Switzerland’s Center for Electronics and Microtechnology (CSEM) and the University of California, Berkeley.

“It is exciting to see this technology in a working chip that we can measure and validate for the first time,” said project leader Krishna Palem, who also serves as director of the Rice-NTU Institute for Sustainable and Applied Infodynamics (ISAID). “Our work since 2003 showed that significant gains were possible, and I am delighted that these working chips have met and even exceeded our expectations.” [From: Computing experts unveil superefficient ‘inexact’ chip which I saw in a list of links by Greg Linden.

Think about it. We are inexact and so are our semantics.

But we attempt to model our inexact semantics with increasingly exact computing platforms.

Does that sound like a modeling mis-match to you?

BTW, if you are interested in the details, see: Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling

Abstract:

Why the link to the ACM Digital library or to the “unoffiical version” were not reported in any of the press stories I cannot say.

A Computable Universe, Understanding Computation and Exploring Nature As Computation

Friday, June 1st, 2012

Abstract:

I am most honoured to have the privilege to present the Foreword to this fascinating and wonderfully varied collection of contributions, concerning the nature of computation and of its deep connection with the operation of those basic laws, known or yet unknown, governing the universe in which we live. Fundamentally deep questions are indeed being grappled with here, and the fact that we find so many different viewpoints is something to be expected, since, in truth, we know little about the foundational nature and origins of these basic laws, despite the immense precision that we so often find revealed in them. Accordingly, it is not surprising that within the viewpoints expressed here is some unabashed speculation, occasionally bordering on just partially justified guesswork, while elsewhere we find a good deal of precise reasoning, some in the form of rigorous mathematical theorems. Both of these are as should be, for without some inspired guesswork we cannot have new ideas as to where look in order to make genuinely new progress, and without precise mathematical reasoning, no less than in precise observation, we cannot know when we are right — or, more usually, when we are wrong.

An unlikely volume to search for data mining or semantic modeling algorithms or patterns.

But one that should be read for the mental exercise/discipline of its reading.

Plus a greatly diminished impact.

How many books have I read from publisher X?*

If republished as an accessible paperback, may I suggest: “Exploring the Nature of Computation”?

The committee title makes the collage nature of the volume a bit too obvious.

Complexity and Computation

Thursday, January 12th, 2012

Complexity and Computation by Allen B. Downey.

Another free (you can order hard copy) book from Allen B. Downey. See my post: Think Stats: Probability and Statistics for Programmers or jump to Green Tea Press to see these and other titles for free download.

Description:

This book is about complexity science, data structures and algorithms, intermediate programming in Python, and the philosophy of science:

• Data structures and algorithms: A data structure is a collection that contains data elements organized in a way that supports particular operations. For example, a dictionary organizes key-value pairs in a way that provides fast mapping from keys to values, but mapping from values to keys is generally slower.

An algorithm is a mechanical process for performing a computation. Designing efficient programs often involves the co-evolution of data structures and the algorithms that use them. For example, the first few chapters are about graphs, a data structure that is a good implementation of a graph—nested dictionaries—and several graph algorithms that use this data structure.

• Python programming: This book picks up where Think Python leaves off. I assume that you have read that book or have equivalent knowledge of Python. As always, I will try to emphasize fundmental ideas that apply to programming in many languages, but along the way you will learn some useful features that are specific to Python.
• Computational modeling: A model is a simplified description of a system that is useful for simulation or analysis. Computational models are designed to take advantage of cheap, fast computation.
• Philosophy of science: The models and results in this book raise a number of questions relevant to the philosophy of science, including the nature of scientific laws, theory choice, realism and instrumentalism, holism and reductionism, and Bayesian epistemology.

This book focuses on discrete models, which include graphs, cellular automata, and agent-based models. They are often characterized by structure, rules and transitions rather than by equations. They tend to be more abstract than continuous models; in some cases there is no direct correspondence between the model and a physical system.

Complexity science is an interdiscipinary field—at the intersection of mathematics, computer science and physics—that focuses on these kinds of models. That’s what this book is about.

Turtles all the way down

Wednesday, August 31st, 2011

Turtles all the way down

From the website:

Decisive breakthrough from IBM researchers in Haifa introduces efficient nested virtualization for x86 hypervisors

What is nested virtualization and who needs it? Classical virtualization takes a physical computer and turns it into multiple logical, or virtual, computers. Each virtual machine can then interact independently, run its own operating environment, and basically behave like a separate physical resource. Hypervisor software is the secret sauce that makes virtualization possible by sitting in between the hardware and the operating system. It manages how the operating system and applications access the hardware.

IBM researchers found an efficient way to take one x86 hypervisor and run other hypervisors on top of it. For virtualization, this means that a virtual machine can be ‘turned into’ many machines, each with the potential to have its own unique environment, configuration, operating system, or security measures—which can in turn each be divided into more logical computers, and so on. With this breakthrough, x86 processors can now run multiple ‘hypervisors’ stacked, in parallel, and of different types.

This nested virtualization using one hypervisor on top of another is reminiscent of a tale popularized by Stephen Hawking. A little old lady argued with a lecturing scientist and insisted that the world is really a flat plate supported on the back of a giant tortoise. When the scientist asked what the tortoise is standing on, the woman answered sharply “But it’s turtles all the way down!” Inspired by this vision, the researchers named their solution the Turtles Project: Design and Implementation of Nested Virtualization

This awesome advance has been incorporated into the latest Linux release.

This is what I like about IBM, fundamental advances in computer science that can be turned into services for users.

One obvious use of this advance would be to segregate merging models in separate virtual machines. I am sure there are others.

Parallel prototyping leads to better design results, more divergence, and increased self-efficacy

Wednesday, January 5th, 2011

Parallel prototyping leads to better design results, more divergence, and increased self-efficacy Authors: Steven P. Dow, Alana Glassco, Jonathan Kass, Melissa Schwarz, Daniel L. Schwartz, and Scott R. Klemmer

Abstract:

Iteration can help people improve ideas. It can also give rise to fixation, continuously refining one option without considering others. Does creating and receiving feedback on multiple prototypes in parallel, as opposed to serially, affect learning, self-efficacy, and design exploration? An experiment manipulated whether independent novice designers created graphic Web advertisements in parallel or in series. Serial participants received descriptive critique directly after each prototype. Parallel participants created multiple prototypes before receiving feedback. As measured by click-through data and expert ratings, ads created in the Parallel condition significantly outperformed those from the Serial condition. Moreover, independent raters found Parallel prototypes to be more diverse. Parallel participants also reported a larger increase in task-specific self-confidence. This article outlines a theoretical foundation for why parallel prototyping produces better design results and discusses the implications for design education.

A useful heuristic for the development of subject identifications in an organization.

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Wednesday, January 5th, 2011

Is Parallel Programming Hard, And, If So, What Can You Do About It? Editor Paul E. McKenney

Kirk Lowery forwarded this link to my attention.

Just skimming the first couple of chapters, I have to say it has some of the most amusing graphics I have seen in any CS book.

Performance, productivity and generality are all goals of parallel programming, as cited by this book.

I have to wonder though if subject recognition tasks, analogous to computer vision, that are inherently parallel.

Doing them in parallel does not make them easier but not doing them in parallel certainly makes them harder.

For example, consider the last time you failed to recognize someone who wasn’t in the location or context where you normally see them.

Do you recognize the context in addition or in parallel to your recognition of the person’s face?

Questions:

1. What benefits/drawbacks do you see in parallel processing of TMDM instances? (3-5 pages, citations)
2. How would you design subject identifications for processing in a parallel environment? (3-5 pages, citations)
3. How would you evaluate the need for parallel processing of subject identifications? (3-5 pages, citations)

EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs

Friday, October 15th, 2010

EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs. Authors: B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos Keywords: EigenSpokes – Communities – Graphs

Abstract:

We report a surprising, persistent pattern in large sparse social graphs, which we term EigenSpokes. We focus on large Mobile Call graphs, spanning about 186K nodes and millions of calls, and find that the singular vectors of these graphs exhibit a striking EigenSpokes pattern wherein, when plotted against each other, they have clear, separate lines that often neatly align along specific axes (hence the term “spokes”). Furthermore, analysis of several other real-world datasets e.g., Patent Citations, Internet, etc. reveals similar phenomena indicating this to be a more fundamental attribute of large sparse graphs that is related to their community structure.

This is the first contribution of this paper. Additional ones include (a) study of the conditions that lead to such EigenSpokes, and (b) a fast algorithm for spotting and extracting tightly-knit communities, called SpokEn, that exploits our findings about the EigenSpokes pattern.

The notion of “chipping” off communities for further study from a large graph is quite intriguing.

In part because those communities (need I say subjects?) are found as the result of a process of exploration rather than declaration.

To be sure, those subjects can be “declared” in a topic map but the finding, identifying, deciding on subject identity properties for subjects is a lot more fun.

An Approach for Fast Hierarchical Agglomerative Clustering Using Graphics Processors with CUDA

Sunday, October 10th, 2010

An Approach for Fast Hierarchical Agglomerative Clustering Using Graphics Processors with CUDA Authors: S.A. Arul Shalom, Manoranjan Dash, Minh Tue Keywords: CUDA. Hierarchical clustering, High performance Computing, Computations using Graphics hardware, complete linkage

Abstract:

Graphics Processing Units in today’s desktops can well be thought of as a high performance parallel processor. Each single processor within the GPU is able to execute different tasks independently but concurrently. Such computational capabilities of the GPU are being exploited in the domain of Data mining. Two types of Hierarchical clustering algorithms are realized on GPU using CUDA. Speed gains from 15 times up to about 90 times have been realized. The challenges involved in invoking Graphical hardware for such Data mining algorithms and effects of CUDA blocks are discussed. It is interesting to note that block size of 8 is optimal for GPU with 128 internal processors.

GPUs offer a great deal of processing power and programming them may provoke deeper insights into subject identification and mapping.

Topic mappers may be able to claim NVIDIA based software/hardware and/or Sony Playstation 3 and 4 units (Cell Broadband Engine) as a business expense (check with your tax advisor).

A GPU based paper for TMRA 2011 anyone?

Computation, Information, Cognition: The Nexus and the Liminal – Book

Saturday, July 3rd, 2010

Computation, Information, Cognition: The Nexus and the Liminal by Gordana Dodig-Crnkovic and Susan Stuart, is a deeply delightful collection of essays from European Computing and Philosophy Conference (E-CAP), 2005.

I originally ordered it because of Graeme Hirst’s essay, “Views of Text Meaning in Computational Linguistics: Past, Present, and Future.” More on that in a future post but suffice it to say that he sees computational linguistics returning to a realization that “meaning” isn’t nearly as flat as some people would like to believe.

I could not help perusing some of the other essays and ran across Werner Ceusters and Barry Smith, in “Ontology as the Core Discipline of Biomedical Informatics – Legacies of the Past and Recommendations for the Future Directions of Research,” bashing the work of ISO/IEC TC 37, and its founder, Eugen Wüster, as International Standard Bad Philosophy. Not that I care for “realist ontologies” all that much but it is a very amusing essay.

Not to mention Patrick Allo’s “Formalizing Semantic Information: Lessons from Logical Pluralism.” If I say “informational pluralism” does anyone need more of a hint as to why I would like this essay?

I feel bad that I can’t mention in a reasonable sized posts all the other essays in this volume, or do more to give the flavor of those I mention above. This isn’t a scripting source book but the ideas you will find in it are going to shape the future of computation and our little corner of it for some time to come.