Archive for the ‘Multi-Core’ Category

Virtual School summer courses…

Monday, May 6th, 2013

Virtual School summer courses on data-intensive and many-core computing

From the webpage:

Graduate students, post-docs and professionals from academia, government, and industry are invited to sign up now for two summer school courses offered by the Virtual School of Computational Science and Engineering.

These Virtual School courses will be delivered to sites nationwide using high-definition videoconferencing technologies, allowing students to participate at a number of convenient locations where they will be able to work with a cohort of fellow computational scientists, have access to local experts, and interact in real time with course instructors.

The Data Intensive Summer School focuses on the skills needed to manage, process, and gain insight from large amounts of data. It targets researchers from the physical, biological, economic, and social sciences who need to deal with large collections of data. The course will cover the nuts and bolts of data-intensive computing, common tools and software, predictive analytics algorithms, data management, and non-relational database models.


For more information about the Data-Intensive Summer School, including pre-requisites and course topics, visit

The Proven Algorithmic Techniques for Many-core Processors summer school will present students with the seven most common and crucial algorithm and data optimization techniques to support successful use of GPUs for scientific computing.

Studying many current GPU computing applications, the course instructors have learned that the limits of an application’s scalability are often related to some combination of memory bandwidth saturation, memory contention, imbalanced data distribution, or data structure/algorithm interactions. Successful GPU application developers often adjust their data structures and problem formulation specifically for massive threading and executed their threads leveraging shared on-chip memory resources for bigger impact. The techniques presented in the course can improve performance of applicable kernels by 2-10X in current processors while improving future scalability.


For more information about the Proven Algorithmic Techniques for Many-core Processors course, including pre-requisites and course topics, visit

Think of it as summer camp. For $100 (waived at some locations), it would be hard to do better.

atomic<> Weapons

Saturday, February 16th, 2013

atomic<> Weapons by Herb Sutter.

C++ and Beyond 2012: Herb Sutter – atomic<> Weapons, 1 of 2

C++ and Beyond 2012: Herb Sutter – atomic<> Weapons, 2 of 2


This session in one word: Deep.

It’s a session that includes topics I’ve publicly said for years is Stuff You Shouldn’t Need To Know and I Just Won’t Teach, but it’s becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers just are so addicted to full performance that they’ll reach for the big red levers with the flashing warning lights. Since we can’t keep people from pulling the big red levers, we’d better document the A to Z of what the levers actually do, so that people don’t SCRAM unless they really, really, really meant to.

With all the recent posts about simplicity and user interaction, some readers may be getting bored.

Never fear, something a bit more challenging for you.

Multicore memory models along with comments that cite even more research.

Plus I liked the line: “…reach for the big red levers with the flashing warning lights.”


Tilera’s TILE-Gx Processor Family and the Open Source Community [topic maps lab resource?]

Thursday, June 21st, 2012

Tilera’s TILE-Gx Processor Family and the Open Source Community Deliver the World’s Highest Performance per Watt to Networking, Multimedia, and the Cloud

It’s summer and on hot afternoons it’s easy to look at all the cool stuff at online trade zines. Like really high-end processors that we could stuff in our boxes, to run, well, really complicated stuff to be sure. šŸ˜‰

On one hand we should be mindful that our toys have far more processing power than mainframes of not too long ago. So we need to step up our skill at using the excess capacity on our desktops.

On the other hand, it would be nice to have access to cutting edge processors that will be common place in another cycle or two, today!

From the post:

TileraĀ® Corporation, the leader in 64-bit manycore general purpose processors, announced the general availability of its Multicore Development Environmentā„¢ (MDE) 4.0 release on the TILE-Gx processor family. The release integrates a complete Linux distribution including the kernel 2.6.38, glibc 2.12, GNU tool chain, more than 3000 CentOS 6.2 packages, and the industry’s most advanced manycore tools developed by Tilera in collaboration with the open source community. This release brings standards, familiarity, ease of use, quality and all the development benefits of the Linux environment and open source tools onto the TILE-Gx processor family; both the world’s highest performance and highest performance per watt manycore processor in the market. Tilera’s MDE 4.0 is available now.

“High quality software and standard programming are essential elements for the application development process. Developers don’t have time to waste on buggy and hard to program software tools, they need an environment that works, is easy and feels natural to them,” said Devesh Garg, co-founder, president and chief executive officer, Tilera. “From 60 million packets per second to 40 channels of H.264 encoding on a Linux SMP system, this release further empowers developers with the benefits of manycore processors.”

Using the TILE-Gx processor family and the MDE 4.0 software release, customers have demonstrated high performance, low latency, and the highest performance per watt on many applications. These include Firewall, Intrusion Prevention, Routers, Application Delivery Controllers, Intrusion Detection, Network Monitoring, Network Packet Brokering, Application Switching for Software Defined Networking, Deep Packet Inspection, Web Caching, Storage, High Frequency Trading, Image Processing, and Video Transcoding.

The MDE provides a comprehensive runtime software stack, including Linux kernel 2.6.38, glibc 2.12, binutil, Boost, stdlib and other libraries. It also provides full support for Perl, Python, PHP, Erlang, and TBB; high-performance kernel and user space PCIe drivers; high performance low latency Ethernet drivers; and a hypervisor for hardware abstraction and virtualization. For development tools the MDE includes standard C/C++ GNU compiler v4.4 and 4.6; an Eclipse Integrated Development Environment (IDE); debugging tools such as gdb 7 and mudflap; profiling tools including gprof, oprofile, and perf_events; native and cross build environments; and graphical manycore application debugging and profiling tools.

Should a topic maps lab offer this sort of resource to a geographically distributed set of researchers? (Just curious. I don’t have funding but should the occasion arise.)

Even with the cloud, thinking topic map researchers need access to high-end architectures for experiments with data structures and processing techniques.

Akaros – an open source operating system for manycore architectures

Saturday, April 28th, 2012

Akaros – an open source operating system for manycore architectures

From the post:

If you are interested in future foward OS designs then you might find Akaros worth a look. It’s an operating system designed for many-core architectures and large-scale SMP systems, with the goals of:

  • Providing better support for parallel and high-performance applications
  • Scaling the operating system to a large number of cores

A more indepth explanation of the motiviation behind Akaros can be found in Improving Per-Node Efļ¬ciency in the Datacenter with NewOS Abstractions by Barret Rhoden, Kevin Klues, David Zhu, and Eric Brewer.

From the paper abstract:

Traditional operating system abstractions are ill-suited for high performance and parallel applications, especially on large-scale SMP and many-core architectures. We propose four key ideas that help to overcome these limitations. These ideas are built on a philosophy of exposing as much information to applications as possible and giving them the tools necessary to take advantage of that information to run more efficiently. In short, high-performance applications need to be able to peer through layers of virtualization in the software stack to optimize their behavior. We explore abstractions based on these ideas and discuss how we build them in the context of a new operating system called Akaros.

Rather than “layers of virtualization” I would say: “layers of identifiable subjects.” That’s hardly surprising but it has implications for this paper and future successors on the same issue.

Issues of inefficiency aren’t due to a lack of programming talent, as the authors ably demonstrate, but rather the limitations placed upon that talent by the subjects our operating systems identify and permit to be addressed.

The paper is an exercise in identifying different subjects than those identified in contemporary operating systems. That abstraction may assist future researchers in positing different subjects for identification and consequences that flow from identifying different subjects.

Ask For Forgiveness Programming – Or How We’ll Program 1000 Cores

Friday, March 9th, 2012

Ask For Forgiveness Programming – Or How We’ll Program 1000 Cores

Another approach to multi-core processing:

The argument for a massively multicore future is now familiar: while clock speeds have leveled off, device density is increasing, so the future is cheap chips with hundreds and thousands of cores. That’s the inexorable logic behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind is: how on earth are we to program these things? For problems that aren’t embarrassingly parallel, we really have no idea. IBM Research’s David Ungar has an idea. And it’s radical in the extreme…

After reading this article, ask yourself, how would you apply this approach with topic maps?