Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 1, 2012

Retrofitting Programming Languages for a Parallel World

Filed under: Parallel Programming,Parallelism — Patrick Durusau @ 9:02 pm

Retrofitting Programming Languages for a Parallel World by James Reinders.

From the post:

The most widely used computer programming languages today were not designed as parallel programming languages. But retrofitting existing programming languages for parallel programming is underway. We can compare and contrast retrofits by looking at four key features, five key qualities, and the various implementation approaches.

In this article, I focus on the features and qualities, leaving the furious debates over best approaches (language vs. library vs. directives, and abstract and portable vs. low-level with lots of controls) for another day.

Four key features:

  • Memory model
  • Synchronization
  • Tasks, not threads
  • Data-parallel support

Five qualities to desire:

  • Composability
  • Sequential reasoning
  • Communication minimization
  • Performance portability
  • Safety
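
Reinders’s “tasks, not threads” point is easy to see in miniature. Below is a minimal Python sketch (my example, not from the article): the program expresses logical tasks and a data-parallel map over chunks, and leaves the mapping onto threads or processes entirely to the runtime.

```python
# A minimal illustration of "tasks, not threads": submit logical tasks
# and let the executor's pool decide how they map onto workers.
from concurrent.futures import ProcessPoolExecutor

def work(chunk):
    # Stand-in for a real per-chunk computation.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with ProcessPoolExecutor() as pool:      # worker count chosen by the runtime
        total = sum(pool.map(work, chunks))  # data-parallel map over chunks
    print(total)
```

Note that the code never mentions a thread. That is the point.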

Parallel processing as the default isn’t that far in the future.

Do you see any of these issues as irrelevant to the processing of topic maps?

And unlike programming languages, topic maps by definition can operate in semantically heterogeneous environments.

How’s that for icing on the cake of parallel processing?

The time to address the issues of parallel processing of topic maps is now.

Suggestions?

February 18, 2012

Signal/Collect

Filed under: Graphs,Parallel Programming,Signal/Collect — Patrick Durusau @ 5:24 pm

Signal/Collect: a framework for parallel graph processing

I became aware of Signal/Collect because of René Pickhardt’s graph reading club assignment for 22 February 2012.

A paper to use as a starting point for Signal/Collect: Signal/Collect: Graph Algorithms for the (Semantic) Web.

From the code.google.com website (first link above):

Signal/Collect is a programming model and framework for large-scale graph processing. The model is expressive enough to concisely formulate many iterated and data-flow algorithms on graphs, while allowing the framework to transparently parallelize the processing. The current release of the framework is not distributed yet, but this is planned for March 2012.

In Signal/Collect an algorithm is written from the perspective of vertices and edges. Once a graph has been specified the edges will signal and the vertices will collect. When an edge signals it computes a message based on the state of its source vertex. This message is then sent along the edge to the target vertex of the edge. When a vertex collects it uses the received messages to update its state. These operations happen in parallel all over the graph until all messages have been collected and all vertex states have converged.

Many algorithms have very simple and elegant implementations in Signal/Collect. You find more information about the programming model and features in the project wiki. Please take the time to explore some of the example algorithms below.

Signal/Collect development and source code is now on github.

The name of the project is written variously: Signal/Collect, signal collect, signal-collect. Except when quoting other sources, I will use Signal/Collect.
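
To make the model concrete, here is a toy, sequential rendition of signal/collect in Python (my sketch, not the framework’s Scala API), using single-source shortest paths, a natural fit for this style:

```python
# Toy signal/collect loop: edges signal, vertices collect, repeat until
# no vertex state changes. Example: single-source shortest paths from "a".
INF = float("inf")

vertices = {"a": 0, "b": INF, "c": INF, "d": INF}  # vertex id -> state
edges = [("a", "b", 1), ("a", "c", 4), ("b", "c", 1), ("c", "d", 2)]

changed = True
while changed:
    # Signal phase: each edge computes a message from its source state.
    inbox = {v: [] for v in vertices}
    for src, dst, weight in edges:
        inbox[dst].append(vertices[src] + weight)
    # Collect phase: each vertex folds its messages into its state.
    changed = False
    for v, messages in inbox.items():
        new_state = min([vertices[v]] + messages)
        if new_state != vertices[v]:
            vertices[v] = new_state
            changed = True

print(vertices)  # {'a': 0, 'b': 1, 'c': 2, 'd': 4}
```

The framework’s contribution is running the signal and collect phases in parallel while the algorithm stays this simple to write.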

February 5, 2012

Parallelizing Machine Learning – Functionally: A Framework and Abstractions for Parallel Graph Processing

Filed under: Parallel Programming,Scala — Patrick Durusau @ 7:57 pm

Parallelizing Machine Learning – Functionally: A Framework and Abstractions for Parallel Graph Processing by Heather Miller and Philipp Haller.

Abstract:

Implementing machine learning algorithms for large data, such as the Web graph and social networks, is challenging. Even though much research has focused on making sequential algorithms more scalable, their running times continue to be prohibitively long. Meanwhile, parallelization remains a formidable challenge for this class of problems, despite frameworks like MapReduce which hide much of the associated complexity. We present a framework for implementing parallel and distributed machine learning algorithms on large graphs, flexibly, through the use of functional programming abstractions. Our aim is a system that allows researchers and practitioners to quickly and easily implement (and experiment with) their algorithms in a parallel or distributed setting. We introduce functional combinators for the flexible composition of parallel, aggregation, and sequential steps. To the best of our knowledge, our system is the first to avoid inversion of control in a (bulk) synchronous parallel model.

An area of research that appears to have a great deal of promise. Very much worth your attention.
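
To give a flavor of what functional combinators for parallel, aggregation, and sequential steps might look like, here is a rough Python analogue (the paper’s framework is Scala, and these combinator names are my invention):

```python
# Rough analogue of parallel/aggregate combinators composed into a pipeline.
from concurrent.futures import ProcessPoolExecutor
from functools import reduce

def parallel(step):
    """Apply a step to every graph partition in parallel."""
    def run(partitions):
        with ProcessPoolExecutor() as pool:
            return list(pool.map(step, partitions))
    return run

def aggregate(fold, init):
    """Fold the per-partition results into one value."""
    return lambda results: reduce(fold, results, init)

def compose(*stages):
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

def local_degree_sum(partition):  # per-partition sequential work
    return sum(len(neighbors) for neighbors in partition.values())

if __name__ == "__main__":
    partitions = [{"a": ["b", "c"]}, {"b": ["c"]}, {"c": []}]
    iteration = compose(parallel(local_degree_sum),
                        aggregate(lambda acc, x: acc + x, 0))
    print(iteration(partitions))  # 3
```

The point is composability: each stage is an ordinary function value, so pipelines can be built and swapped without inversion of control.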

December 30, 2011

Explorations in Parallel Distributed Processing:..

Filed under: Distributed Systems,Parallel Programming — Patrick Durusau @ 6:00 pm

Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises by James L. McClelland.

From Chapter 1, Introduction:

Several years ago, Dave Rumelhart and I first developed a handbook to introduce others to the parallel distributed processing (PDP) framework for modeling human cognition. When it was first introduced, this framework represented a new way of thinking about perception, memory, learning, and thought, as well as a new way of characterizing the computational mechanisms for intelligent information processing in general. Since it was first introduced, the framework has continued to evolve, and it is still under active development and use in modeling many aspects of cognition and behavior.

Our own understanding of parallel distributed processing came about largely through hands-on experimentation with these models. And, in teaching PDP to others, we discovered that their understanding was enhanced through the same kind of hands-on simulation experience. The original edition of the handbook was intended to help a wider audience gain this kind of experience. It made many of the simulation models discussed in the two PDP volumes (Rumelhart et al., 1986; McClelland et al., 1986) available in a form that is intended to be easy to use. The handbook also provided what we hoped were accessible expositions of some of the main mathematical ideas that underlie the simulation models. And it provided a number of prepared exercises to help the reader begin exploring the simulation programs.

The current version of the handbook attempts to bring the older handbook up to date. Most of the original material has been kept, and a good deal of new material has been added. All of the simulation programs have been implemented or re-implemented within the MATLAB programming environment. In keeping with other MATLAB projects, we call the suite of programs we have implemented the PDPTool software.

The latest revision (Sept. 2011) is online for your perusal and is a good way to develop an understanding of parallel processing.

Apologies for not seeing this before Christmas. Please consider it an early present for your birthday in 2012!

November 2, 2011

GPUStats

Filed under: CUDA,Parallel Programming,Statistics — Patrick Durusau @ 6:25 pm

GPUStats

If you need an NVIDIA CUDA interface for statistical calculations, GPUStats may be of assistance.

From the webpage:

gpustats is a PyCUDA-based library implementing functionality similar to that present in scipy.stats. It implements a simple framework for specifying new CUDA kernels and extending existing ones. Here is a (partial) list of target functionality:

  • Probability density functions (pdfs). These are intended to speed up likelihood calculations in particular in Bayesian inference applications, such as in PyMC
  • Random variable generation using CURAND
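
To see where the GPU speedup would matter, here is the CPU-side baseline gpustats targets, sketched with scipy.stats (the gpustats call in the final comment is an assumption on my part; check the project docs for actual signatures):

```python
# The CPU-side computation gpustats aims to accelerate: evaluating a
# normal pdf over a large sample, e.g. inside an MCMC likelihood loop.
import numpy as np
from scipy.stats import norm

data = np.random.randn(1_000_000)
loglik = norm.logpdf(data, loc=0.0, scale=1.0).sum()  # scipy.stats on the CPU
print(loglik)

# The gpustats equivalent is roughly gpustats.normpdf(data, mean, var)
# evaluated on the GPU -- signature assumed, not verified.
```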

November 1, 2011

Parallel approaches in next-generation sequencing analysis pipelines

Filed under: Bioinformatics,Parallel Programming,Parallelism — Patrick Durusau @ 3:34 pm

Parallel approaches in next-generation sequencing analysis pipelines

From the post:

My last post described a distributed exome analysis pipeline implemented on the CloudBioLinux and CloudMan frameworks. This was a practical introduction to running the pipeline on Amazon resources. Here I’ll describe how the pipeline runs in parallel, specifically diagramming the workflow to identify points of parallelization during lane and sample processing.

Incredible innovation in throughput makes parallel processing critical for next-generation sequencing analysis. When a single HiSeq run can produce 192 samples (2 flowcells x 8 lanes per flowcell x 12 barcodes per lane), the analysis steps quickly become limited by the number of processing cores available.

The heterogeneity of architectures utilized by researchers is a major challenge in building re-usable systems. A pipeline needs to support powerful multi-core servers, clusters and virtual cloud-based machines. The approach we took is to scale at the level of individual samples, lanes and pipelines, exploiting the embarrassingly parallel nature of the computation. An AMQP messaging queue allows for communication between processes, independent of the system architecture. This flexible approach allows the pipeline to serve as a general framework that can be easily adjusted or expanded to incorporate new algorithms and analysis methods.

The message-passing-based parallelism sounds a lot like Storm, doesn’t it? Will message passing be what frees us from the constraints of architecture? I wonder what sort of performance “hit” we will take when not working really close to the metal. But then, the “metal” may become the basis for such message-passing systems. Not quite yet, but perhaps not so far away either.
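
The AMQP pattern is worth seeing in miniature. Here is a hedged sketch using the pika client (the queue name and message format are mine, not the pipeline’s, and a broker such as RabbitMQ is assumed on localhost): a coordinator publishes one message per lane, and workers on any architecture consume them.

```python
# Minimal AMQP fan-out of per-lane work with the pika client.
# Assumes an AMQP broker (e.g. RabbitMQ) running on localhost.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="lanes")

# Coordinator: one message per lane; workers anywhere can pick them up.
for flowcell in range(2):
    for lane in range(8):
        channel.basic_publish(exchange="",
                              routing_key="lanes",
                              body=f"flowcell-{flowcell}/lane-{lane}")

connection.close()
```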

A Convenient Framework for Efficient Parallel Multipass Algorithms

Filed under: MapReduce,Parallel Programming — Patrick Durusau @ 3:32 pm

A Convenient Framework for Efficient Parallel Multipass Algorithms by Markus Weimer, Sriram Rao, and Martin Zinkevich.

Abstract:

The amount of data available is ever-increasing. At the same time, the available time to learn from the available data is decreasing in many applications, especially on the web. These two trends together with limited improvements in per-CPU speed and hard disk bandwidth lead to the need for parallel machine learning algorithms. Numerous such algorithms have been proposed in the past (including [1, 3, 4]). Many of them make use of frameworks like MapReduce [2], as it facilitates easy parallelization and provides fault tolerance and data-local computation at the framework level. However, MapReduce also introduces some inherent inefficiencies when compared to message passing systems like MPI.

In this paper, we present a computational framework based on Workers and Aggregators for data-parallel computations that retains the simplicity of MapReduce, while offering a significant speedup for a large class of algorithms. We report experiments based on several implementations of Stochastic Gradient Descent (SGD): the well-known sequential variant as well as a parallel version inspired by our recent work in [5], which we implemented both in MapReduce and the proposed framework.

The direct passing of messages reminds me of Storm.
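
The worker/aggregator split is simple enough to sketch. Here is my rendition of the idea (not the paper’s code) for parallel SGD: each worker runs sequential SGD on its own shard of the data, and an aggregator averages the resulting weight vectors.

```python
# Workers run sequential SGD on shards; the aggregator averages models.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def sgd_worker(shard, lr=0.01, epochs=5):
    X, y = shard
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= lr * (xi @ w - yi) * xi  # squared-loss gradient step
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=4000)
    shards = [(X[i::4], y[i::4]) for i in range(4)]  # one shard per worker
    with ProcessPoolExecutor(max_workers=4) as pool:
        weights = list(pool.map(sgd_worker, shards))
    print(np.mean(weights, axis=0))  # aggregator: average the models
```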

Comments?

October 29, 2011

We Really Don’t Know How To Compute!

Filed under: Algorithms,CS Lectures,Parallel Programming — Patrick Durusau @ 7:20 pm

We Really Don’t Know How To Compute! by Gerald Jay Sussman.

This is a must-watch video! Sussman tries to make the case that we need to think differently about computing: for example, being able to backtrack the provenance of data through a series of operations, or to maintain inconsistent world views alongside locally consistent ones in a single system, or to say where world views diverge without claiming either to be correct or incorrect.

He argues in general for systems robust enough for massively parallel programming, where inconsistencies are going to abound when applied to, say, all known medical literature. It isn’t going to be helpful if our systems fall over on their sides when they encounter inconsistency.

A lot of what Sussman says is, I think, applicable to parallel processing of topic maps. I will certainly be looking up some of his class videos from MIT.
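
To illustrate just the provenance point, here is a toy Python sketch (mine, not Sussman’s): values carry the history of the operations and inputs that produced them, so any result can be backtracked.

```python
# Values that remember where they came from.
class Traced:
    def __init__(self, value, provenance):
        self.value = value
        self.provenance = provenance  # nested tuple describing the origin

    def __add__(self, other):
        return Traced(self.value + other.value,
                      ("add", self.provenance, other.provenance))

a = Traced(2, ("input", "sensor-1"))
b = Traced(3, ("input", "sensor-2"))
c = a + b
print(c.value)       # 5
print(c.provenance)  # ('add', ('input', 'sensor-1'), ('input', 'sensor-2'))
```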

From the webpage:

Summary

Gerald Jay Sussman compares our computational skills with the genome, concluding that we are way behind in creating complex systems such as living organisms, and proposing a few areas of improvement.

Bio

Gerald Jay Sussman is the Panasonic Professor of EE at MIT. Sussman is a coauthor (with Hal Abelson and Julie Sussman) of the MIT computer science textbook “Structure and Interpretation of Computer Programs”. Sussman has made a number of important contributions to Artificial Intelligence, and with his former student, Guy L. Steele Jr., invented the Scheme programming language in 1975.

About the conference

Strange Loop is a multi-disciplinary conference that aims to bring together the developers and thinkers building tomorrow’s technology in fields such as emerging languages, alternative databases, concurrency, distributed systems, mobile development, and the web.

October 9, 2011

Parallel frameworks for graph processing

Filed under: Graphs,Parallel Programming — Patrick Durusau @ 6:41 pm

Parallel frameworks for graph processing from Lambda the Ultimate.

Summaries and then comments on GraphLab and John Gilbert’s Parallel Combinatorial BLAS: A Toolbox for High-Performance Graph Computation (papers, slides).

Contribute your comments or pointers to other resources?

October 3, 2011

Parallel Haskell Tutorial: The Par Monad [Topic Map Profiling?]

Filed under: Haskell,Parallel Programming,Parallelism — Patrick Durusau @ 7:07 pm

Parallel Haskell Tutorial: The Par Monad

Parallel programming will become largely transparent at some point but not today. 😉

The tutorial walks through parallel processing of Sudoku and k-means, as well as measuring performance and debugging. Code is available.

I think the debugging aspects of this tutorial stand out the most for me. Understanding a performance issue as opposed to throwing resources at it seems like the better approach to me.

I know that a lot of time has been spent by the vendors of topic maps software profiling their code, but I wonder if anyone has profiled a topic map?

That is, we make choices in terms of topic map construction, some of which may result in more or less processing demands, to reach the same ends.

As topic maps grow in size, the “how” a topic map is written may be as important as the “why” certain subjects were represented and merged.
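
What might profiling a topic map look like in practice? A hypothetical starting point: time the merge step over synthetic topics with Python’s cProfile and see which construction choices dominate the runtime (the topic representation here is invented for illustration).

```python
# Profile a toy merge over synthetic topics to see where time goes.
import cProfile
from collections import defaultdict

def merge_by_identifier(topics):
    merged = defaultdict(list)
    for identifier, names in topics:
        merged[identifier].extend(names)  # merge topics sharing an identifier
    return merged

topics = [(f"http://example.org/subject/{i % 10_000}", [f"name-{i}"])
          for i in range(200_000)]
cProfile.run("merge_by_identifier(topics)")
```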

Have you profiled the construction of your topic maps? Comments appreciated.

September 23, 2011

ParLearning 2012 (silos or maps?)

ParLearning 2012: Workshop on Parallel and Distributed Computing for Machine Learning and Inference Problems

Dates:

When: May 25, 2012
Where: Shanghai, China
Submission Deadline: Dec 19, 2011
Notification Due: Feb 1, 2012
Final Version Due: Feb 21, 2012

From the notice:

HIGHLIGHTS

  • Foster collaboration between HPC community and AI community
  • Applying HPC techniques for learning problems
  • Identifying HPC challenges from learning and inference
  • Explore a critical emerging area with strong industry interest without overlapping with existing IPDPS workshops
  • Great opportunity for researchers worldwide for collaborating with Chinese Academia and Industry

CALL FOR PAPERS

Authors are invited to submit manuscripts of original unpublished research that demonstrate a strong interplay between parallel/distributed computing techniques and learning/inference applications, such as algorithm design and libraries/framework development on multicore/manycore architectures, GPUs, clusters, supercomputers, cloud computing platforms that target applications including but not limited to:

  • Learning and inference using large scale Bayesian Networks
  • Large scale inference algorithms using parallel topic models, clustering and SVM etc.
  • Parallel natural language processing (NLP).
  • Semantic inference for disambiguation of content on web or social media
  • Discovering and searching for patterns in audio or video content
  • On-line analytics for streaming text and multimedia content
  • Comparison of various HPC infrastructures for learning
  • Large scale learning applications in search engine and social networks
  • Distributed machine learning tools (e.g., Mahout and IBM parallel tool)
  • Real-time solutions for learning algorithms on parallel platforms

If you are wondering what role topic maps have to play in this arena, ask yourself the following question:

Will the systems and techniques demonstrated at this conference use the same means to identify the same subjects?*

If your answer is no, what would you suggest is the solution for mapping different identifications of the same subjects together?

My answer to that question is to use topic maps.

*Whatever you ascribe as its origin, semantic diversity is part and parcel of the human condition. We can either develop silos or maps across silos. Which do you prefer?

August 31, 2011

HipG: Parallel Processing of Large-Scale Graphs

Filed under: Graphs,HipG,Parallel Programming — Patrick Durusau @ 7:43 pm

HipG: Parallel Processing of Large-Scale Graphs

Abstract:

Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined pieces of sequential work on graph nodes. To make the user code high-level, the framework provides a unified interface to executing methods on local and non-local graph nodes and an abstraction of exclusive execution. The graph computations are managed by logical objects called synchronizers, which we used, for example, to implement distributed divide-and-conquer decomposition into strongly connected components. The code written in HipG is independent of a particular graph representation, to the point that the graph can be created on-the-fly, i.e. by the algorithm that computes on this graph, which we used to implement a distributed model checker. HipG programs are in general short and elegant; they achieve good portability, memory utilization, and performance.

Graphs are stored in the SVC-II distributed graph format described in Compressed and Distributed File Formats for Labeled Transition Systems by Stefan Blom, Izak van Langevelde, and Bert Lisser. (Electronic Notes in Theoretical Computer Science, Volume 89, Issue 1, September 2003, Pages 68-83, PDMC 2003, Parallel and Distributed Model Checking (Satellite Workshop of CAV ’03)) [The abstract is so vague as to be useless. I tried to find an “open” copy of the paper but failed. Can you point to one?]

Implementation: www.cs.vu.nl/~ekr/HipG

From the implementation webpage:

HipG is a library for high-level parallel processing of large-scale graphs. HipG is implemented in Java and is designed for distributed-memory machines. Besides basic distributed graph algorithms it handles divide-and-conquer graph algorithms and algorithms that execute on graphs created on-the-fly. It is designed for clusters of machines, but can also be tested on desktops – all you need is a recent Java runtime environment. HipG is work in progress! (as of Apr’11)

August 1, 2011

STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation

Filed under: Graphs,Parallel Programming,STINGER — Patrick Durusau @ 3:52 pm

STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation by David A. Bader, Georgia Institute of Technology; Jonathan Berry, Sandia National Laboratories; Adam Amos-Binks, Carleton University, Canada; Daniel Chavarría-Miranda, Pacific Northwest National Laboratory; Charles Hastings, Hayden Software Consulting, Inc.; Kamesh Madduri, Lawrence Berkeley National Laboratory; and, Steven C. Poulos, U.S. Department of Defense. Dated May 9, 2009.

Abstract:

In this document, we propose a dynamic graph data structure that can serve as a common data structure for multiple real-world applications. The extensible representation for dynamic complex networks is space-efficient, allows parallelism over vertices and edges independently, and can be used for efficient checkpoint/restart of the data.

Describes a deeply interesting data structure for graphs that can be used on different frameworks.

See the Stinger wiki page (with source code as attachments).

And, see D. Ediger, K. Jiang, J. Riedy, and D.A. Bader, “Massive Streaming Data Analytics: A Case Study with Clustering Coefficients,” 4th Workshop on Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 23, 2010.

Abstract:

We present a new approach for parallel massive graph analysis of streaming, temporal data with a dynamic and extensible representation. Handling the constant stream of new data from health care, security, business, and social network applications requires new algorithms and data structures. We examine data structure and algorithm trade-offs that extract the parallelism necessary for high-performance updating analysis of massive graphs. Static analysis kernels often rely on storing input data in a specific structure. Maintaining these structures for each possible kernel with high data rates incurs a significant performance cost. A case study computing clustering coefficients on a general-purpose data structure demonstrates incremental updates can be more efficient than global recomputation. Within this kernel, we compare three methods for dynamically updating local clustering coefficients: a brute-force local recalculation, a sorting algorithm, and our new approximation method using a Bloom filter. On 32 processors of a Cray XMT with a synthetic scale-free graph of 2^24 ≈ 16 million vertices and 2^29 ≈ 537 million edges, the brute-force method processes a mean of over 50,000 updates per second and our Bloom filter approaches 200,000 updates per second.

The authors refer to their approach as “massive streaming data analytics.” I think you will agree.

OK, admittedly they used a Cray XMT. But such processing power will be available to the average site sooner than you think. Soon enough that reading along these lines will put you ahead of the next curve.
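
The incremental idea at the heart of the second abstract is easy to sketch (my simplification, without the Bloom filter): when an edge (u, v) arrives, only u, v, and their common neighbors need their triangle counts updated, so no global recomputation is required.

```python
# Incrementally maintain triangle counts, and hence local clustering
# coefficients, as edges stream in.
from collections import defaultdict

adj = defaultdict(set)
triangles = defaultdict(int)

def insert_edge(u, v):
    common = adj[u] & adj[v]  # each common neighbor closes a triangle
    for w in common:
        triangles[w] += 1
    triangles[u] += len(common)
    triangles[v] += len(common)
    adj[u].add(v)
    adj[v].add(u)

def clustering(v):
    d = len(adj[v])
    return 2 * triangles[v] / (d * (d - 1)) if d > 1 else 0.0

for edge in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]:
    insert_edge(*edge)
print(clustering("a"), clustering("c"))  # 1.0 0.333...
```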

July 28, 2011

MATLAB GPU / CUDA experiences

Filed under: CUDA,GPU,Mathematics,Parallel Programming — Patrick Durusau @ 6:57 pm

MATLAB GPU / CUDA experiences and tutorials on my laptop – Introduction

From the post:

These days it seems that you can’t talk about scientific computing for more than 5 minutes without someone bringing up the topic of Graphics Processing Units (GPUs). Originally designed to make computer games look pretty, GPUs are massively parallel processors that promise to revolutionise the way we compute.

A brief glance at the specification of a typical laptop suggests why GPUs are the new hotness in numerical computing. Take my new one for instance, a Dell XPS L702X, which comes with a Quad-Core Intel i7 Sandybridge processor running at up to 2.9GHz and an NVidia GT 555M with a whopping 144 CUDA cores. If you went back in time a few years and told a younger version of me that I’d soon own a 148 core laptop then young Mike would be stunned. He’d also be wondering ‘What’s the catch?’

Parallel computing has been around for years but in the form of GPUs it has reached the hands of hackers and innovators. Will your next topic map application take advantage of parallel processing?

July 16, 2011

Python for brain mining:…

Filed under: Machine Learning,Parallel Programming,Parallelism,Python,Visualization — Patrick Durusau @ 5:42 pm

Python for brain mining: (neuro)science with state of the art machine learning and data visualization by Gaël Varoquaux.

Brief slide deck on three tools:

Mayavi: For 3-D visualizations.

scikit-learn, which we reported on at: scikits.learn machine learning in Python.

Joblib: running Python functions as pipeline jobs.

All three look useful, although I suspect Joblib may be the one of most immediate interest.
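
For the curious, Joblib’s core idiom fits in a few lines (the parameters here are illustrative): fan a function out over inputs with Parallel and delayed.

```python
# Joblib's Parallel/delayed idiom for fanning work out over processes.
from joblib import Parallel, delayed

def score(x):
    return x * x  # stand-in for an expensive step

results = Parallel(n_jobs=4)(delayed(score)(i) for i in range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```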

Depends on your interests. Comments?

April 17, 2011

Thrust Graph Library

Filed under: CUDA,Graphic Processors,Graphs,Parallel Programming,Visualization — Patrick Durusau @ 5:28 pm

Thrust Graph Library

From the website:

Thrust Graph Library provides graph containers, algorithms, and other concepts, like the Boost Graph Library. The library is based on Thrust, a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL).

January 15, 2011

How to Think about Parallel Programming: Not!

Filed under: Language Design,Parallel Programming,Subject Identity — Patrick Durusau @ 5:04 pm

How to Think about Parallel Programming: Not! by Guy Steele is a deeply interesting presentation on how not to approach parallel programming. The central theme is that languages should provide parallelism transparently, without programmers having to think in parallel.

Parallel processing of topic maps is another way to scale topic maps for particular situations.

How to parallel process questions of subject identity is an open and possibly domain specific issue.

Watch the presentation even if you are only seeking an entertaining account of his first program.

