Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

May 3, 2013

Introduction to Parallel Programming

Filed under: GPU,NVIDIA,Parallel Programming — Patrick Durusau @ 1:44 pm

Introduction to Parallel Programming by John Owens, David Luebke, Cheng-Han Lee and Mike Roberts. (UDACITY)

Class Summary:

Learn the fundamentals of parallel computing with the GPU and the CUDA programming environment! In this class, you’ll learn about parallel programming by coding a series of image processing algorithms, such as you might find in Photoshop or Instagram. You’ll be able to program and run your assignments on high-end GPUs, even if you don’t own one yourself.

What Should I Know?

We expect students to have a solid experience with the C programming language and basic knowledge of data structures and algorithms.

What Will I Learn?

You’ll master the fundamentals of massively parallel computing by using CUDA C/C++ to program modern GPUs. You’ll learn the GPU programming model and architecture, key algorithms and parallel programming patterns, and optimization techniques. Your assignments will illustrate these concepts through image processing applications, but this is a parallel computing course and what you learn will translate to any application domain. Most of all we hope you’ll learn how to think in parallel.
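The heart of those assignments is a per-pixel map (RGB to greyscale, blur, and so on). As a rough CPU-only sketch of that data-parallel pattern (my own illustration, not the course’s starter code; the luma weights and the row-per-task split are my choices):

```python
# CPU-only sketch of a data-parallel image map: every row (ultimately every
# pixel) can be processed independently, which is exactly what a GPU exploits.
import numpy as np
from multiprocessing import Pool

def to_grey(row):
    # row has shape (width, 3); standard luma weights, chosen for illustration
    return 0.299 * row[:, 0] + 0.587 * row[:, 1] + 0.114 * row[:, 2]

if __name__ == "__main__":
    img = np.random.randint(0, 256, size=(480, 640, 3)).astype(np.float32)
    with Pool() as pool:
        grey = np.stack(pool.map(to_grey, img))  # one task per row
    print(grey.shape)  # (480, 640)
```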

In Fast Database Emerges from MIT Class… [Think TweetMap] you read about a new SQL database based on GPUs.

What new approach is going to emerge from your knowing more about GPUs and parallel programming?

April 25, 2013

PODC and SPAA 2013 Accepted Papers

Filed under: Conferences,Distributed Computing,Parallel Programming,Parallelism — Patrick Durusau @ 2:03 pm

ACM Symposium on Principles of Distributed Computing [PODC] accepted papers. (Montréal, Québec, Canada, July 22-24, 2013) Main PODC page.

Symposium on Parallelism in Algorithms and Architectures [SPAA] accepted papers. (Montréal, Québec, Canada, July 23 – 25, 2013) Main SPAA page.

Just scanning the titles reveals a number of very interesting papers.

Suggest you schedule a couple of weeks of vacation in Canada following SPAA before attending the Balisage Conference, August 6-9, 2013.

The weather is quite temperate and the outdoor dining superb.

I first saw this at: PODC AND SPAA 2013 ACCEPTED PAPERS.

April 18, 2013

Parallella: The $99 Linux supercomputer

Filed under: Linux OS,Parallel Programming,Parallela,Parallelism — Patrick Durusau @ 1:23 pm

Parallella: The $99 Linux supercomputer by Steven J. Vaughan-Nichols.

From the post:

What Adapteva has done is create a credit-card sized parallel-processing board. This comes with a dual-core ARM A9 processor and a 64-core Epiphany Multicore Accelerator chip, along with 1GB of RAM, a microSD card, two USB 2.0 ports, 10/100/1000 Ethernet, and an HDMI connection. If all goes well, by itself, this board should deliver about 90 GFLOPS of performance, or — in terms PC users understand — about the same horse-power as a 45GHz CPU.

This board will use Ubuntu Linux 12.04 for its operating system. To put all this to work, the platform reference design and drivers are now available.

From Adapteva.

I wonder which will come first:

A really kick-ass 12 dimensional version of Asteroids?

or

New approaches to graph processing?

What do you think?

March 30, 2013

Parallel and Concurrent Programming in Haskell

Filed under: Concurrent Programming,Haskell,Parallel Programming — Patrick Durusau @ 6:48 pm

Parallel and Concurrent Programming in Haskell by Simon Marlow.

From the introduction:

While most programming languages nowadays provide some form of concurrent or parallel programming facilities, very few provide as wide a range as Haskell. Haskell prides itself on having the right tool for the job, for as many jobs as possible. If a job is discovered for which there isn’t already a good tool, Haskell’s typical response is to invent a new tool. Haskell’s abstraction facilities provide a fertile ground on which to experiment with different programming idioms, and that is exactly what has happened in the space of concurrent and parallel programming.

Is this a good or a bad thing? You certainly can get away with just one way of writing concurrent programs: threads and locks are in principle all you need. But as the programming community has begun to realise over the last few years, threads and locks are not the right tool for most jobs. Programming with them requires a high degree of expertise even for simple tasks, and leads to programs that have hard-to-diagnose faults.

So in Haskell we embrace the idea that different problems require different tools, and we provide the programmer with a rich selection to choose from. The inevitable downside is that there is a lot to learn, and that is what this book is all about.

In this book I will discuss how to write parallel and concurrent programs in Haskell, ranging from the simple uses of parallelism to speed up computation-heavy programs, to the use of lightweight threads for writing high-speed concurrent network servers. Along the way we’ll see how to use Haskell to write programs that run on the powerful processor in a modern graphics card (GPU), and to write programs that can run on multiple machines in a network (distributed programming).

In O’Reilly’s Open Feedback Publishing System.

If you really want to learn something, write a book about it, edit a book about it or teach a class about it.

Here’s your chance for #2.

Read carefully!

I first saw this in Christophe Lalanne’s A bag of tweets / March 2013.

March 20, 2013

Pyrallel – Parallel Data Analytics in Python

Filed under: Data Analysis,Parallel Programming,Programming,Python — Patrick Durusau @ 6:12 am

Pyrallel – Parallel Data Analytics in Python by Olivier Grisel.

From the webpage:

Overview: experimental project to investigate distributed computation patterns for machine learning and other semi-interactive data analytics tasks.

Scope:

  • focus on small to medium dataset that fits in memory on a small (10+ nodes) to medium cluster (100+ nodes).
  • focus on small to medium data (with data locality when possible).
  • focus on CPU bound tasks (e.g. training Random Forests) while trying to limit disk / network access to a minimum.
  • do not focus on HA / Fault Tolerance (yet).
  • do not try to invent new set of high level programming abstractions (yet): use a low level programming model (IPython.parallel) to finely control the cluster elements and messages transfered and help identify what are the practical underlying constraints in distributed machine learning setting.

Disclaimer: the public API of this library will probably not be stable soon as the current goal of this project is to experiment.
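To make the “low level programming model (IPython.parallel)” concrete, here is a minimal sketch of the pattern, assuming a local cluster has been started with something like ipcluster start -n 4 (the function being mapped is just a stand-in, not Pyrallel code):

```python
# Minimal IPython.parallel usage: connect to the running engines and map a
# CPU-bound function across them.
from IPython.parallel import Client

rc = Client()        # connects to the cluster started by ipcluster
dview = rc[:]        # a direct view over all engines

def cpu_bound(n):
    # stand-in for real work, such as training one tree of a random forest
    return sum(i * i for i in range(n))

results = dview.map_sync(cpu_bound, [10 ** 6] * len(rc))
print(len(results), "results, one per engine")
```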

This project brought to mind two things:

  1. Experimentation can lead to new approaches, such as “Think like a vertex.” (GraphLab: A Distributed Abstraction…), and
  2. A conference anecdote about a Python prototype written with the expectation that the customer would need to upgrade to a fuller version for higher performance. The prototype performed so well that the customer never needed the upgrade. I thought that was a tribute to Python and the programmer. Opinions differed.

January 29, 2013

Kwong – … Word Sense Disambiguation

Filed under: Disambiguation,Parallel Programming,Word Meaning — Patrick Durusau @ 6:50 pm

New Perspectives on Computational and Cognitive Strategies for Word Sense Disambiguation by Oi Yee Kwong.

From the description:

Cognitive and Computational Strategies for Word Sense Disambiguation examines cognitive strategies by humans and computational strategies by machines, for WSD in parallel.

Focusing on a psychologically valid property of words and senses, author Oi Yee Kwong discusses their concreteness or abstractness and draws on psycholinguistic data to examine the extent to which existing lexical resources resemble the mental lexicon as far as the concreteness distinction is concerned. The text also investigates the contribution of different knowledge sources to WSD in relation to this very intrinsic nature of words and senses.

I wasn’t aware that the “mental lexicon” of words had been fully described.

Shows what you can learn from reading marketing summaries of research.

January 8, 2013

Can Extragalactic Data Be Standardized? Part 2

Filed under: Astroinformatics,BigData,Parallel Programming — Patrick Durusau @ 11:44 am

Can Extragalactic Data Be Standardized? Part 2 by Ian Armas Foster.

From the post:

Last week, we profiled an effort headed by the Taiwanese Extragalactic Astronomical Data Center (TWEA-DC) to standardize astrophysical computer science.


Specifically, the object laid out by the TWEA-DC team was to create a language specifically designed for far-reaching astronomy—a Domain Specified Language. This would create a standard environment from which software could be developed.

For the researchers at the TWEA-DC, one of the bigger issues lies in the software currently being developed for big data management. Sebastien Foucaud and Nicolas Kamennoff co-authored the paper alongside Yasuhiro Hashimoto and Meng-Feng Tsai, who are based in Taiwan, laying out the TWEA-DC. They argue that since parallel processing is a relatively recent phenomenon, many programmers have not been versed in how to properly optimize their software. Specifically, they go into how the developers are brought up in a world where computing power steadily increases.

Indeed, preparing a new generation of computer scientists and astronomers is a main focus of the data center that opened in 2010. “One of the major goals of the TWEA-DC,” the researchers say, “is to prepare the next generation of astronomers, who will have to keep up pace with the changing face of modern Astronomy.”

Standard environments for software are useful, so long as they are recognized as also being ephemeral.

The standard environment for software development in the 1960s wasn’t the same as in the 1980s, nor was the 1980s environment the same as today’s.

Along with temporary “standard environments,” we should also be constructing entrances into, and thinking about exits from, those environments.

January 5, 2013

Raspberry Pi: Up and Running

Filed under: Parallel Programming,Supercomputing — Patrick Durusau @ 7:00 am

Raspberry Pi: Up and Running by Matt Richardson.

From the post:

For those of you who haven’t yet played around with Raspberry Pi, this one’s for you. In this how-to video, I walk you through how to get a Raspberry Pi up and running. It’s the first in a series of Raspberry Pi videos that I’m making to accompany Getting Started with Raspberry Pi, a book I wrote with Shawn Wallace. The book covers Raspberry Pi and Linux basics and then works up to using Scratch, Python, GPIO (to control LED’s and switches), and web development on the board.

For a sense of the range of applications using the Raspberry Pi, consider Water Droplet Photography:

We knew when we were designing it that the Pi would make a great bit of digital/real-world meccano. We hoped we’d see a lot of projects we hadn’t considered ourselves being made with it. We’re never so surprised by what people do with it as we are by some of the photography projects we see.

Using a €15 solenoid valve, some Python and a Raspberry Pi to trigger the valve and the camera shutter at the same time, Dave has built a rig for taking water droplet photographs.
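A rig like that boils down to a couple of carefully timed GPIO writes. The sketch below is hypothetical (the pin numbers, delays and the use of RPi.GPIO are my assumptions, not details from Dave’s build), but it shows the shape of such a trigger script:

```python
# Hypothetical trigger script: open a solenoid valve to release a droplet,
# then fire the camera shutter a moment later via the Pi's GPIO pins.
import time
import RPi.GPIO as GPIO

VALVE_PIN, SHUTTER_PIN = 17, 27      # BCM pin numbers; my assumptions

GPIO.setmode(GPIO.BCM)
GPIO.setup(VALVE_PIN, GPIO.OUT)
GPIO.setup(SHUTTER_PIN, GPIO.OUT)

try:
    GPIO.output(VALVE_PIN, GPIO.HIGH)    # open the valve, release a droplet
    time.sleep(0.05)                     # droplet fall time (tune by experiment)
    GPIO.output(VALVE_PIN, GPIO.LOW)
    GPIO.output(SHUTTER_PIN, GPIO.HIGH)  # trigger the camera
    time.sleep(0.01)
    GPIO.output(SHUTTER_PIN, GPIO.LOW)
finally:
    GPIO.cleanup()
```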


The build-your-own-computer kits started us on the path to today.

This is a build-your-own parallel/supercomputer kit.

Where do you want to go tomorrow?

January 3, 2013

Educational manual for Raspberry Pi released [computer science set]

Filed under: Parallel Programming,Supercomputing — Patrick Durusau @ 3:19 pm

Educational manual for Raspberry Pi released

From the post:

Created by a team of teachers from Computing at School, the newly published Raspberry Pi Education Manual⁠ sets out to provide support for teachers and educators who want to use the Raspberry Pi in a teaching environment. As education has been part of the original Raspberry Pi Foundation’s mission, the foundation has supported the development of the manual.

The manual has chapters on the basics of Scratch, experiments with Python, connecting programs with Twitter and other web services, connecting up the GPIO pins to control devices, and using the Linux command line. Two chapters, one on Greenfoot and one on GeoGebra, are not currently included in the manual as both applications require a Java virtual machine which is currently being optimised for the Pi platform.

The Scratch section, for example, explains how to work with the graphical programming environment and use sprites, first to animate a cat, then make a man walk, and then animate a bee pollinating flowers. It then changes gear to show how to use Scratch for solving maths problems using variables, creating an “artificial intelligence”, driving a robot, making a car follow a line, and animating a level crossing, and wraps up with a section on creating games.

Reminded me of Kevin Trainor’s efforts: Ontopia Runs on Raspberry Pi [This Rocks!].

The description in the manual of the Raspberry Pi as a “computer science set” seems particularly appropriate.

What are you going to discover?

December 29, 2012

Parallel Computing – Prof. Alan Edelman

Filed under: HPC,Parallel Programming,Supercomputing — Patrick Durusau @ 7:35 pm

Parallel Computing – Prof. Alan Edelman MIT Course Number 18.337J / 6.338J.

From the webpage:

This is an advanced interdisciplinary introduction to applied parallel computing on modern supercomputers. It has a hands-on emphasis on understanding the realities and myths of what is possible on the world’s fastest machines. We will make prominent use of the Julia Language software project.

A “modern supercomputer” may be in your near term future. Would not hurt to start preparing now.

Similar courses that you would recommend?

December 17, 2012

The Cooperative Computing Lab

Filed under: Cloud Computing,Clustering (servers),HPC,Parallel Programming,Programming — Patrick Durusau @ 2:39 pm

The Cooperative Computing Lab

I encountered this site while tracking down resources for the DASPOS post.

From the homepage:

The Cooperative Computing Lab at the University of Notre Dame seeks to give ordinary users the power to harness large systems of hundreds or thousands of machines, often called clusters, clouds, or grids. We create real software that helps people to attack extraordinary problems in fields such as physics, chemistry, bioinformatics, biometrics, and data mining. We welcome others at the University to make use of our computing systems for research and education.

As the computing requirements of your data mining or topic maps increase, so will your need for clusters, clouds, or grids.

The CCL offers several software packages for free download that you may find useful.

December 5, 2012

Fast Parallel Sorting Algorithms on GPUs

Filed under: Algorithms,GPU,Parallel Programming,Sorting — Patrick Durusau @ 6:00 am

Fast Parallel Sorting Algorithms on GPUs by Bilal Jan, Bartolomeo Montrucchio, Carlo Ragusa, Fiaz Gul Khan, Omar Khan.

Abstract:

This paper presents a comparative analysis of the three widely used parallel sorting algorithms: OddEven sort, Rank sort and Bitonic sort in terms of sorting rate, sorting time and speed-up on CPU and different GPU architectures. Alongside we have implemented novel parallel algorithm: min-max butterfly network, for finding minimum and maximum in large data sets. All algorithms have been implemented exploiting data parallelism model, for achieving high performance, as available on multi-core GPUs using the OpenCL specification. Our results depicts minimum speed-up 19x of bitonic sort against odd-even sorting technique for small queue sizes on CPU and maximum of 2300x speed-up for very large queue sizes on Nvidia Quadro 6000 GPU architecture. Our implementation of full-butterfly network sorting results in relatively better performance than all of the three sorting techniques: bitonic, odd-even and rank sort. For min-max butterfly network, our findings report high speed-up of Nvidia quadro 6000 GPU for high data set size reaching 224 with much lower sorting time.
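If you want a feel for why these algorithms suit GPUs, here is a plain Python sketch of odd-even (transposition) sort. Within each phase every compare-and-swap touches a disjoint pair of elements, so on a GPU each one can be its own work-item; this serial version only illustrates the structure, it is not the authors’ OpenCL code:

```python
def odd_even_sort(a):
    """Odd-even transposition sort: n phases of independent compare-and-swaps."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2  # even phase: pairs (0,1),(2,3)...; odd phase: (1,2),(3,4)...
        # every iteration of this inner loop is independent of the others,
        # so on a GPU each compare-and-swap can be one work-item
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_sort([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]
```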

Is there a GPU in your topic map processing future?

I first saw this in a tweet by Stefano Bertolo.

December 4, 2012

Ontopia Runs on Raspberry Pi [This Rocks!]

Filed under: Ontopia,Parallel Programming,Supercomputing — Patrick Durusau @ 3:18 pm

Ontopia Runs on Raspberry Pi by Kevin Trainor.

From the post:

I am pleased to report that I have had the Ontopia Topic Maps software running on my Raspberry Pi for the past week. Ontopia is a suite of open source tools for building, maintaining and deploying Topic Maps-based applications. The Raspberry Pi is an ultra-affordable ARM GNU/Linux box based upon the work of the Raspberry Pi Foundation. My experience in running the out-of-the-box Ontopia apps (Ontopoly topic map editor, Omnigator topic map browser, and Vizigator topic map vizualizer) has been terrific. Using the Raspberry Pi to run the Apache Tomcat server that hosts the Ontopia software, response time is as good or better than I have experienced when hosting the Ontopia software on a cloud-based Linux server at my ISP. Topic maps open quickly in all three applications and navigation from topic to topic within each application is downright snappy.

As you will see in my discussion of testing below, I have experienced good results with up to two simultaneous users. So, my future test plans include testing with more simultaneous users and testing with the Ontopia RDBMS Backend installed. Based upon the performance that I have experienced so far, I have high hopes. Stay tuned for further reports.

What a great way to introduce topic maps to experimenters!

Thanks Kevin!

Awaiting future results! (And for a Raspberry Pi to arrive!)

See also: A Raspberry Pi Supercomputer

October 17, 2012

R at 12,000 Cores

Filed under: BigData,MPI,Parallel Programming,R — Patrick Durusau @ 9:19 am

R at 12,000 Cores

From the post:

I am very happy to introduce a new set of packages that has just hit the CRAN. We are calling it the Programming with Big Data in R Project, or pbdR for short (or as I like to jokingly refer to it, ‘pretty bad for dyslexics’). You can find out more about the pbdR project at http://r-pbd.org/

The packages are a natural programming framework that are, from the user’s point of view, a very simple extension of R’s natural syntax, but running in parallel over MPI and handling big data sets with ease. Much of the parallelism we offer is implicit, meaning that you can use code you are already using while achieving massive performance gains.

The packages are free as in beer, and free as in speech. You could call them “free and open source”, or libre software. The source code is free for everyone to look at, extend, re-use, whatever, forever.

At present, the project consists of 4 packages: pbdMPI, pbdSLAP, pbdBASE, and pbdDMAT. The pbdMPI package offers simplified hooks into MPI, making explicit parallel programming over MPI much simpler, and sometimes much faster, than with Rmpi. Next up the chain is pbdSLAP, which is a set of libraries pre-bundled for the R user, to greatly simplify complicated installations. The last two packages, pbdBASE and pbdDMAT, offer high-level R syntax for computing with distributed matrix objects at low-level programming speed. The only system requirements are that you have R and an MPI installation.

We have attempted to extensively document the project in a collection of package vignettes; but really, if you are already using R, then much of the work is already familiar to you. Want to take the svd of a matrix? Just use svd(x) or La.svd(x), only “x” is now a distributed matrix object.

One MPI source: OpenMPI. Interested to hear of experiences with other MPI installations.

If you can’t run MPI or don’t want to, be sure to also check out the RHadoop project.

I first saw this at R-Bloggers.

September 20, 2012

25th ACM Symposium on Parallelism in Algorithms and Architectures

Filed under: Algorithms,Conferences,Parallel Programming — Patrick Durusau @ 7:59 pm

25th ACM Symposium on Parallelism in Algorithms and Architectures

Submission Deadlines:
Abstracts: February 11 (11:59 pm EST)
Full papers: February 13 (11:59 pm EST)
These are firm deadlines. No extensions will be granted.
Notification: April 15
Camera-ready copy due: May 14

From the call for papers:

This year, SPAA is co-located with PODC. SPAA defines the term “parallel” broadly, encompassing any computational system that can perform multiple operations or tasks simultaneously. Topics include, but are not limited to:

  • Parallel and Distributed Algorithms
  • Parallel and Distributed Data Structures
  • Green Computing and Power-Efficient Architectures
  • Management of Massive Data Sets
  • Parallel Complexity Theory
  • Parallel and Distributed Architectures
  • Multi-Core Architectures
  • Instruction Level Parallelism and VLSI
  • Compilers and Tools for Concurrent Programming
  • Supercomputer Architecture and Computing
  • Transactional Memory Hardware and Software
  • The Internet and the World Wide Web
  • Game Theory and Collaborative Learning
  • Routing and Information Dissemination
  • Resource Management and Awareness
  • Peer-to-Peer Systems
  • Mobile Ad-Hoc and Sensor Networks
  • Robustness, Self-Stabilization and Security
  • Synergy of Parallelism in Algorithms, Programming and Architecture

Montreal, Canada, July 23-25, 2013.

Think about it. Balisage won’t be that far away; you could put some vacation time together with the conferences at either end.

September 15, 2012

Introductory FP Course Materials

Filed under: CS Lectures,Functional Programming,Parallel Programming,Programming — Patrick Durusau @ 7:20 pm

Introductory FP Course Materials by Robert Harper.

First semester introductory programming course.

Second semester data structures and algorithms course.

Deeply awesome body of material.

Enjoy!

September 12, 2012

A Raspberry Pi Supercomputer

Filed under: Computer Science,Parallel Programming,Supercomputing — Patrick Durusau @ 9:55 am

A Raspberry Pi Supercomputer

If you need a supercomputer for processing your topic maps, an affordable one is at hand.

Some assembly required. With Legos no less.

From the ScienceDigest post:

Computational Engineers at the University of Southampton have built a supercomputer from 64 Raspberry Pi computers and Lego.

The team, led by Professor Simon Cox, consisted of Richard Boardman, Andy Everett, Steven Johnston, Gereon Kaiping, Neil O’Brien, Mark Scott and Oz Parchment, along with Professor Cox’s son James Cox (aged 6) who provided specialist support on Lego and system testing.

Professor Cox comments: “As soon as we were able to source sufficient Raspberry Pi computers we wanted to see if it was possible to link them together into a supercomputer. We installed and built all of the necessary software on the Pi starting from a standard Debian Wheezy system image and we have published a guide so you can build your own supercomputer.”

The racking was built using Lego with a design developed by Simon and James, who has also been testing the Raspberry Pi by programming it using free computer programming software Python and Scratch over the summer. The machine, named “Iridis-Pi” after the University’s Iridis supercomputer, runs off a single 13 Amp mains socket and uses MPI (Message Passing Interface) to communicate between nodes using Ethernet. The whole system cost under £2,500 (excluding switches) and has a total of 64 processors and 1Tb of memory (16Gb SD cards for each Raspberry Pi). Professor Cox uses the free plug-in ‘Python Tools for Visual Studio’ to develop code for the Raspberry Pi.
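Iridis-Pi uses MPI over Ethernet to pass messages between nodes. As a minimal illustration of that style (using mpi4py, which is my choice here and not necessarily what the Southampton guide uses), a parallel sum looks like this:

```python
# run with: mpiexec -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this node's id
size = comm.Get_size()      # total number of nodes

# each node computes a partial sum; the root gathers and combines them
partial = sum(range(rank * 1000, (rank + 1) * 1000))
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("nodes:", size, "total:", total)
```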

You may also want to visit the Raspberry Pi Foundation, which has the slogan: “An ARM GNU/Linux box for $25. Take a byte!”

In an age of ready access to cloud computing resources, to say nothing of weapon-quality toys (PlayStation 3s) for design simulations, there is still a place for inexpensive experimentation.

What hardware configurations will you test out on your Raspberry Pi Supercomputer?

Are there specialized configurations that work better for some subject identity tests than others?

How do hardware constraints influence our approaches to computational problems?

Are we missing solutions because they don’t fit current architectures and therefore aren’t considered? (Not rejected, just don’t come up at all.)

August 27, 2012

POSIX Threads Programming

Filed under: Parallel Programming,POSIX Threads,Programming — Patrick Durusau @ 2:58 pm

POSIX Threads Programming by Blaise Barney, Lawrence Livermore National Laboratory

From the webpage:

In shared memory multiprocessor architectures, such as SMPs, threads can be used to implement parallelism. Historically, hardware vendors have implemented their own proprietary versions of threads, making portability a concern for software developers. For UNIX systems, a standardized C language threads programming interface has been specified by the IEEE POSIX 1003.1c standard. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads.

The tutorial begins with an introduction to concepts, motivations, and design considerations for using Pthreads. Each of the three major classes of routines in the Pthreads API are then covered: Thread Management, Mutex Variables, and Condition Variables. Example codes are used throughout to demonstrate how to use most of the Pthreads routines needed by a new Pthreads programmer. The tutorial concludes with a discussion of LLNL specifics and how to mix MPI with pthreads. A lab exercise, with numerous example codes (C Language) is also included.

Level/Prerequisites: This tutorial is one of the eight tutorials in the 4+ day “Using LLNL’s Supercomputers” workshop. It is ideal for those who are new to parallel programming with threads. A basic understanding of parallel programming in C is required. For those who are unfamiliar with Parallel Programming in general, the material covered in EC3500: Introduction To Parallel Computing would be helpful.
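The thread management, mutex and condition variable routines the tutorial covers have direct counterparts in most languages. As a rough illustration, here is the same producer/consumer handshake in Python rather than the tutorial’s C (CPython’s threading module wraps native threads, but the GIL limits CPU parallelism, so treat this as a sketch of the concepts only):

```python
import threading

queue, lock = [], threading.Lock()          # lock plays the role of a pthread mutex
item_ready = threading.Condition(lock)      # condition variable tied to that mutex

def producer():
    with lock:
        queue.append("work")
        item_ready.notify()                 # wake one waiting consumer

def consumer():
    with lock:
        while not queue:                    # guard against spurious wakeups
            item_ready.wait()               # releases the lock while waiting
        print("got", queue.pop())

t1 = threading.Thread(target=consumer)
t2 = threading.Thread(target=producer)
t1.start(); t2.start()
t1.join(); t2.join()
```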

The capacity for parallelism in computing has become commonplace. How well that parallelism is being used is a much more difficult question.

Fortunately, exploration of parallelism isn’t limited to cloistered and carefully guarded CS installations. It is quite likely that the computer on your desk has some capacity for parallel processing.

Not enough to simulate the origin of the universe or an atomic bomb explosion but enough to learn the basics of parallelism. You may discover insights that have been overlooked by others.

Won’t know unless you try.

I first saw this at Christophe Lalanne’s A bag of tweets / August 2012.

PS: If you learn POSIX threads, you might want to consider mapping the terminology to vendor specific thread terminology.

July 24, 2012

Oracle closes Fortress language down for good

Filed under: Fortress Language,Parallel Programming — Patrick Durusau @ 3:14 pm

Oracle closes Fortress language down for good by Chris Mayer.

From the post:

Oracle is to cease all production on the long-running Fortress language project, seeking to cast aside any language that isn’t cutting the mustard financially.

Guy Steele, creator of Fortress and also involved in Java’s development under Sun jurisdiction, wrote on his blog: “After working nearly a decade on the design, development, and implementation of the Fortress programming language, the Oracle Labs Programming Language Research Group is now winding down the Fortress project.”

He added: “Ten years is a remarkably long run for an industrial research project (one to three years is much more typical), but we feel that our extended effort has been worthwhile.”

Guy’s post has commentary on points of pride from the Fortress project:

  • Generators and reducers
  • Implicit parallelism supported by work-stealing
  • Nested atomic blocks supported by transactional memory
  • Parametrically polymorphic types that are not erased
  • Symmetric multimethod dispatch and parametrically polymorphic methods
  • Multiple inheritance, inheritance symmetry, and type exclusion
  • Mathematical syntax
  • Components and APIs
  • Dimensions and units
  • Explicit descriptions of data distribution and processor assignment
  • Conditional inheritance and conditional method definition

Respectable output for a project? Yes?

To avoid saying something in anger, I did research Oracle’s Support for Open Source and Open Standards:

  • Berkeley DB
  • Eclipse
  • GlassFish
  • Hudson
  • InnoDB
  • Java
  • Java Platform, Micro Edition (Java ME)
  • Linux
  • MySQL
  • NetBeans
  • OpenJDK
  • PHP
  • VirtualBox
  • Xen
  • Free and Open Source Software

Hard for me to say which one of those projects I would trade for Fortress or even ODF/OpenOffice.

But that was Oracle’s call, not mine.

On the other hand, former Oracle support doesn’t bar anyone else from stepping up. So maybe it is your call now?

Parallel processors are here, now, in abundance. Can’t say the same for programming paradigms to take full advantage of them.

Topic maps may help you avoid re-inventing Fortress concepts and mechanisms, if you learn from the past rather than repeating it.

June 6, 2012

Concurrent Programming for Scalable Web Architectures

Filed under: Concurrent Programming,Parallel Programming,Scalability,Web Applications — Patrick Durusau @ 7:49 pm

Concurrent Programming for Scalable Web Architectures by Benjamin Erb.

Abstract:

Web architectures are an important asset for various large-scale web applications, such as social networks or e-commerce sites. Being able to handle huge numbers of users concurrently is essential, thus scalability is one of the most important features of these architectures. Multi-core processors, highly distributed backend architectures and new web technologies force us to reconsider approaches for concurrent programming in order to implement web applications and fulfil scalability demands. While focusing on different stages of scalable web architectures, we provide a survey of competing concurrency approaches and point to their adequate usages.

High Scalability has a good list of topics and the table of contents.

Or you can jump to the thesis homepage.

Just in case you are thinking about taking your application to “web scale.” 😉

June 2, 2012

High-Performance Domain-Specific Languages using Delite

Filed under: Delite,DSL,Machine Learning,Parallel Programming,Scala — Patrick Durusau @ 12:50 pm

High-Performance Domain-Specific Languages using Delite

Description:

This tutorial is an introduction to developing domain specific languages (DSLs) for productivity and performance using Delite. Delite is a Scala infrastructure that simplifies the process of implementing DSLs for parallel computation. The goal of this tutorial is to equip attendees with the knowledge and tools to develop DSLs that can dramatically improve the experience of using high performance computation in important scientific and engineering domains. In the first half of the day we will focus on example DSLs that provide both high-productivity and performance. In the second half of the day we will focus on understanding the infrastructure for implementing DSLs in Scala and developing techniques for defining good DSLs.

The graph manipulation language Green-Marl is one of the subjects of this tutorial.

This resource should be located and “boosted” by a search engine tuned to my preferences.

Skipping breaks, etc., you will find:

  • Introduction To High Performance DSLs (Kunle Olukotun)
  • OptiML: A DSL for Machine Learning (Arvind Sujeeth)
  • Liszt: A DSL for solving mesh-based PDEs (Zach Devito)
  • Green-Marl: A DSL for efficient Graph Analysis (Sungpack Hong)
  • Scala Tutorial (Hassan Chafi)
  • Delite DSL Infrastructure Overview (Kevin Brown)
  • High Performance DSL Implementation Using Delite (Arvind Sujeeth)
  • Future Directions in DSL Research (Hassan Chafi)

Compare your desktop computer to the MANIAC 1 (calculations for the first hydrogen bomb).

What have you invented/discovered lately?

May 7, 2012

Parallel clustering with CFinder

Filed under: CFinder,Clustering,Networks,Parallel Programming — Patrick Durusau @ 7:18 pm

Parallel clustering with CFinder by Peter Pollner, Gergely Palla, and Tamas Vicsek.

Abstract:

The amount of available data about complex systems is increasing every year, measurements of larger and larger systems are collected and recorded. A natural representation of such data is given by networks, whose size is following the size of the original system. The current trend of multiple cores in computing infrastructures call for a parallel reimplementation of earlier methods. Here we present the grid version of CFinder, which can locate overlapping communities in directed, weighted or undirected networks based on the clique percolation method (CPM). We show that the computation of the communities can be distributed among several CPU-s or computers. Although switching to the parallel version not necessarily leads to gain in computing time, it definitely makes the community structure of extremely large networks accessible.

If you aren’t familiar with CFinder, you should be.

May 6, 2012

Online resources for handling big data and parallel computing in R

Filed under: BigData,Parallel Programming,R — Patrick Durusau @ 7:44 pm

Online resources for handling big data and parallel computing in R by Yanchang Zhao.

Resources to spice up your reading list for this week:

Compared with many other programming languages, such as C/C++ and Java, R is less efficient and consumes much more memory. Fortunately, there are some packages that enables parallel computing in R and also packages for processing big data in R without loading all data into RAM. I have collected some links to online documents and slides on handling big data and parallel computing in R, which are listed below. Many online resources on other topics related to data mining with R can be found at http://www.rdatamining.com/resources/onlinedocs.

April 28, 2012

Akaros – an open source operating system for manycore architectures

Filed under: Identity,Multi-Core,Parallel Programming — Patrick Durusau @ 6:05 pm

Akaros – an open source operating system for manycore architectures

From the post:

If you are interested in future forward OS designs then you might find Akaros worth a look. It’s an operating system designed for many-core architectures and large-scale SMP systems, with the goals of:

  • Providing better support for parallel and high-performance applications
  • Scaling the operating system to a large number of cores

A more in-depth explanation of the motivation behind Akaros can be found in Improving Per-Node Efficiency in the Datacenter with NewOS Abstractions by Barret Rhoden, Kevin Klues, David Zhu, and Eric Brewer.

From the paper abstract:

Traditional operating system abstractions are ill-suited for high performance and parallel applications, especially on large-scale SMP and many-core architectures. We propose four key ideas that help to overcome these limitations. These ideas are built on a philosophy of exposing as much information to applications as possible and giving them the tools necessary to take advantage of that information to run more efficiently. In short, high-performance applications need to be able to peer through layers of virtualization in the software stack to optimize their behavior. We explore abstractions based on these ideas and discuss how we build them in the context of a new operating system called Akaros.

Rather than “layers of virtualization” I would say: “layers of identifiable subjects.” That’s hardly surprising, but it has implications for this paper and its successors on the same issue.

Issues of inefficiency aren’t due to a lack of programming talent, as the authors ably demonstrate, but rather to the limitations placed upon that talent by the subjects our operating systems identify and permit to be addressed.

The paper is an exercise in identifying different subjects than those identified in contemporary operating systems. That abstraction may assist future researchers in positing still other subjects for identification and in tracing the consequences that flow from those choices.

April 11, 2012

Whamcloud, EMC Collaborate on PLFS and Lustre Integration

Filed under: Lustre,Parallel Programming,PLFS — Patrick Durusau @ 6:17 pm

Whamcloud, EMC Collaborate on PLFS and Lustre Integration

From the post:

Whamcloud, a venture-backed company formed from a worldwide network of high-performance computing (HPC) storage industry veterans, today announced it is extending its working relationship with EMC Corporation (NYSE:EMC). The relationship between the two companies began over a year ago and promotes the open source availability of Lustre. Whamcloud and EMC, a fellow member of the OpenSFS consortium, are extending their collaboration for an additional year.

Whamcloud and EMC will continue working together to provide deeper integration between the Parallel Log-structured File System (PLFS) and Lustre. As part of their joint efforts, Whamcloud and EMC will continue augmenting Lustre’s IO functionality, including the enhancement of small file IO and metadata performance. The two companies will look for multiple ways to contribute to the future feature development of Lustre.

PLFS is a parallel IO abstraction layer that rearranges unstructured, concurrent writes by many clients into sequential writes to unique files (N-1 into N-N) to improve the efficiency of the underlying parallel filesystem. PLFS can reduce checkpoint time by up to several orders of magnitude. Lustre is an open source massively parallel file system, generally used for large scale cluster computing. It is found in over 60% of the TOP100 supercomputing sites.
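The N-1 into N-N remapping is easier to see in miniature: each writer gets its own append-only log plus an index mapping logical offsets to positions in that log. The toy sketch below is my own illustration of the idea, not PLFS code:

```python
# Toy illustration of the PLFS idea: each writer appends to its own per-writer
# log and records where every logical extent landed, turning an N-to-1
# shared-file workload into N-to-N sequential streams.
import os

class ToyWriterLog:
    def __init__(self, container, rank):
        os.makedirs(container, exist_ok=True)
        self.data = open(os.path.join(container, "data.%d" % rank), "wb")
        self.index = []   # (logical_offset, length, physical_offset)

    def write_at(self, logical_offset, payload):
        physical_offset = self.data.tell()   # always append: sequential I/O on disk
        self.data.write(payload)
        self.index.append((logical_offset, len(payload), physical_offset))

    def close(self):
        self.data.close()

# writer 3 writes "out of order" logical extents, yet its data file only ever
# sees sequential appends; a reader would merge all ranks' indexes
log = ToyWriterLog("container", rank=3)
log.write_at(4096, b"B" * 512)
log.write_at(0, b"A" * 512)
log.close()
```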

Less important for the business news aspects but more important as a heads up on Lustre and PLFS.

Parallel semantic monogamy is one thing. Parallel semantic heterogeneity is another. Will your name/company be associated with solutions for the latter?

April 3, 2012

Ohio State University Researcher Compares Parallel Systems

Filed under: Cray,GPU,HPC,Parallel Programming,Parallelism — Patrick Durusau @ 4:18 pm

Ohio State University Researcher Compares Parallel Systems

From the post:

Surveying the wide range of parallel system architectures offered in the supercomputer market, an Ohio State University researcher recently sought to establish some side-by-side performance comparisons.

The journal, Concurrency and Computation: Practice and Experience, in February published, “Parallel solution of the subset-sum problem: an empirical study.” The paper is based upon a master’s thesis written last year by former computer science and engineering graduate student Saniyah Bokhari.

“We explore the parallelization of the subset-sum problem on three contemporary but very different architectures, a 128-processor Cray massively multithreaded machine, a 16-processor IBM shared memory machine, and a 240-core NVIDIA graphics processing unit,” said Bokhari. “These experiments highlighted the strengths and weaknesses of these architectures in the context of a well-defined combinatorial problem.”

Bokhari evaluated the conventional central processing unit architecture of the IBM 1350 Glenn Cluster at the Ohio Supercomputer Center (OSC) and the less-traditional general-purpose graphic processing unit (GPGPU) architecture, available on the same cluster. She also evaluated the multithreaded architecture of a Cray Extreme Multithreading (XMT) supercomputer at the Pacific Northwest National Laboratory’s (PNNL) Center for Adaptive Supercomputing Software.

What I found fascinating about this approach was the comparison of:

the strengths and weaknesses of these architectures in the context of a well-defined combinatorial problem.

True enough, there is a place for general methods and solutions, but one pays the price for using general methods and solutions.

My thinking is that for subject identity and “merging” in a “big data” context, we will need a deeper understanding of specific identity and merging requirements, so that the result of that study is one or more well-defined combinatorial problems.

That is to say that understanding one or more combinatorial problems precedes proposing a solution.

You can view/download the thesis by Saniyah Bokhari, Parallel Solution of the Subset-sum Problem: An Empirical Study

Or view the article (assuming you have access):

Parallel solution of the subset-sum problem: an empirical study

Abstract (of the article):

The subset-sum problem is a well-known NP-complete combinatorial problem that is solvable in pseudo-polynomial time, that is, time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. We show how this problem can be parallelized on three contemporary architectures, that is, a 128-processor Cray Extreme Multithreading (XMT) massively multithreaded machine, a 16-processor IBM x3755 shared memory machine, and a 240-core NVIDIA FX 5800 graphics processing unit (GPU). We show that it is straightforward to parallelize this algorithm on the Cray XMT primarily because of the word-level locking that is available on this architecture. For the other two machines, we present an alternating word algorithm that can implement an efficient solution. Our results show that the GPU performs well for problems whose tables fit within the device memory. Because GPUs typically have memories in the order of 10GB, such architectures are best for small problem sizes that have tables of size approximately 10^10. The IBM x3755 performs very well on medium-sized problems that fit within its 64-GB memory but has poor scalability as the number of processors increases and is unable to sustain performance as the problem size increases. This machine tends to saturate for problem sizes of 10^11 bits. The Cray XMT shows very good scaling for large problems and demonstrates sustained performance as the problem size increases. However, this machine has poor scaling for small problem sizes; it performs best for problem sizes of 10^12 bits or more. The results in this paper illustrate that the subset-sum problem can be parallelized well on all three architectures, albeit for different ranges of problem sizes. The performance of these three machines under varying problem sizes show the strengths and weaknesses of the three architectures. Copyright © 2012 John Wiley & Sons, Ltd.
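The “dynamic programming table” in the abstract is the standard pseudo-polynomial subset-sum table. A minimal serial sketch (my own illustration, not the thesis code) shows what the three machines are racing over:

```python
def subset_sum(items, target):
    """Pseudo-polynomial DP: reachable[s] is True if some subset sums to s."""
    reachable = [False] * (target + 1)
    reachable[0] = True
    for x in items:
        # in the parallel versions, the sweep over sums for each item is what
        # gets distributed across threads, processors or GPU cores
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5)
```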

March 30, 2012

Zoltan: Parallel Partitioning, Load Balancing and Data-Management Services

Filed under: Graph Coloring,Parallel Programming,Zoltan — Patrick Durusau @ 4:38 pm

Zoltan: Parallel Partitioning, Load Balancing and Data-Management Services

From project motivation:

Over the past decade, parallel computers have been used with great success in many scientific simulations. While differing in their numerical methods and details of implementation, most applications successfully parallelized to date are “static” applications. Their data structures and memory usage do not change during the course of the computation. Their inter-processor communication patterns are predictable and non-varying. And their processor workloads are predictable and roughly constant throughout the simulation. Traditional finite difference and finite element methods are examples of widely used static applications.

However, increasing use of “dynamic” simulation techniques is creating new challenges for developers of parallel software. For example, adaptive finite element methods refine localized regions of the mesh and/or adjust the order of the approximation on individual elements to obtain a desired accuracy in the numerical solution. As a result, memory must be allocated dynamically to allow creation of new elements or degrees of freedom. Communication patterns can vary as refinement creates new element neighbors. And localized refinement can cause severe processor load imbalance as elemental and processor work loads change throughout a simulation.

Particle simulations and crash simulations are other examples of dynamic applications. In particle simulations, scalable parallel performance depends upon a good assignment of particles to processors; grouping physically close particles within a single processor reduces inter-processor communication. Similarly, in crash simulations, assignment of physically close surfaces to a single processor enables efficient parallel contact search. In both cases, data structures and communication patterns change as particles and surfaces move. Re-partitioning of the particles or surfaces is needed to maintain geometric locality of objects within processors.

We developed the Zoltan library to simplify many of the difficulties arising in dynamic applications. Zoltan is a collection of data management services for unstructured, adaptive and dynamic applications. It includes a suite of parallel partitioning algorithms, data migration tools, parallel graph coloring tools, distributed data directories, unstructured communication services, and dynamic memory management tools. Zoltan’s data-structure neutral design allows it to be used by a variety of applications without imposing restrictions on application data structures. Its object-based interface provides a simple and inexpensive way for application developers to use the library and researchers to make new capabilities available under a common interface.

The NoSQL advocates only recently discovered “big data.” There are those who have thought long and deep about processing issues for “big data.” New approaches and techniques will go further if compared and contrasted to prior understandings. This is one place for such an effort.

March 24, 2012

The Heterogeneous Programming Jungle

The Heterogeneous Programming Jungle by Michael Wolfe.

Michael starts off with one definition of “heterogeneous:”

The heterogeneous systems of interest to HPC use an attached coprocessor or accelerator that is optimized for certain types of computation. These devices typically exhibit internal parallelism, and execute asynchronously and concurrently with the host processor. Programming a heterogeneous system is then even more complex than “traditional” parallel programming (if any parallel programming can be called traditional), because in addition to the complexity of parallel programming on the attached device, the program must manage the concurrent activities between the host and device, and manage data locality between the host and device.

And while he returns to that definition in the end, another form of heterogeneity is lurking not far behind:

Given the similarities among system designs, one might think it should be obvious how to come up with a programming strategy that would preserve portability and performance across all these devices. What we want is a method that allows the application writer to write a program once, and let the compiler or runtime optimize for each target. Is that too much to ask?

Let me reflect momentarily on the two gold standards in this arena. The first is high level programming languages in general. After 50 years of programming using Algol, Pascal, Fortran, C, C++, Java, and many, many other languages, we tend to forget how wonderful and important it is that we can write a single program, compile it, run it, and get the same results on any number of different processors and operating systems.

So there is the heterogeneity of the attached coprocessors and, just as importantly, of the host processors they are paired with.

His post concludes with:

Grab your Machete and Pith Helmet

If parallel programming is hard, heterogeneous programming is that hard, squared. Defining and building a productive, performance-portable heterogeneous programming system is hard. There are several current programming strategies that attempt to solve this problem, including OpenCL, Microsoft C++AMP, Google Renderscript, Intel’s proposed offload directives (see slide 24), and the recent OpenACC specification. We might also learn something from embedded system programming, which has had to deal with heterogeneous systems for many years. My next article will whack through the underbrush to expose each of these programming strategies in turn, presenting advantages and disadvantages relative to the goal.

These are languages that share common subjects (think of their target architectures) and so are ripe for a topic map that co-locates their approaches to a particular architecture. Being able to incorporate official and non-official documentation, tests, sample code, etc., might enable faster progress in this area.

The future of HPC processors is almost upon us. It will not do to be tardy.

March 19, 2012

A Parallel Architecture for In-Line Data De-duplication

Filed under: Deduplication,Parallel Programming — Patrick Durusau @ 6:54 pm

A Parallel Architecture for In-Line Data De-duplication by Seetendra Singh Sengar, Manoj Mishra. (2012 Second International Conference on Advanced Computing & Communication Technologies)

Abstract:

Recently, data de-duplication, the hot emerging technology, has received a broad attention from both academia and industry. Some researches focus on the approach by which more redundant data can be reduced and others investigate how to do data de-duplication at high speed. In this paper, we show the importance of data de-duplication in the current digital world and aim at reducing the time and space requirement for data de-duplication. Then, we present a parallel architecture with one node designated as a server and multiple storage nodes. All the nodes, including the server, can do block level in-line de-duplication in parallel. We have built a prototype of the system and present some performance results. The proposed system uses magnetic disks as a storage technology.

Apologies but all I have at the moment is the abstract.
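Still, the general technique is easy to sketch: split the stream into blocks, hash each block, and store a block only the first time its hash is seen. Presumably the per-block hashing and index lookups are what run in parallel across the server and storage nodes; the sketch below is a generic serial illustration, not the paper’s design:

```python
# Generic block-level de-duplication sketch: hash fixed-size blocks and store
# each unique block only once, keeping a "recipe" of digests per write.
import hashlib

BLOCK_SIZE = 4096
store = {}                      # digest -> block bytes

def write(data):
    recipe = []                 # sequence of digests that reconstructs the data
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)    # duplicate blocks are stored once
        recipe.append(digest)
    return recipe

def read(recipe):
    return b"".join(store[d] for d in recipe)

data = b"abc" * 10000
assert read(write(data)) == data
```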

March 9, 2012

Ask For Forgiveness Programming – Or How We’ll Program 1000 Cores

Filed under: Multi-Core,Parallel Programming — Patrick Durusau @ 8:45 pm

Ask For Forgiveness Programming – Or How We’ll Program 1000 Cores

Another approach to multi-core processing:

The argument for a massively multicore future is now familiar: while clock speeds have leveled off, device density is increasing, so the future is cheap chips with hundreds and thousands of cores. That’s the inexorable logic behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind is: how on earth are we to program these things? For problems that aren’t embarrassingly parallel, we really have no idea. IBM Research’s David Ungar has an idea. And it’s radical in the extreme…

After reading this article, ask yourself: how would you apply this approach to topic maps?

