Hardware for Big Data, Graphs and Large-scale Computation by Rob Farber.
From the post:
Recent announcements by Intel and NVIDIA indicate that massively parallel computing with GPUs and Intel Xeon Phi will no longer require passing data via the PCIe bus. The bad news is that these standalone devices are still in the design phase and are not yet available for purchase. Instead of residing on the PCIe bus as a second-class system component like a disk or network controller, the new Knights Landing processor announced by Intel at ISC’13 will be able to run as a standalone processor just like a Sandy Bridge or any other multi-core CPU. Meanwhile, NVIDIA’s release of native ARM compilation in CUDA 5.5 provides a necessary next step toward Project Denver, which is NVIDIA’s integration of a 64-bit ARM processor and a GPU. This combination, termed a CP-GP (or ceepee-geepee) in the media, can leverage the energy savings and performance of both architectures.
Of course, the NVIDIA strategy also opens the door to the GPU acceleration of mobile phone and other devices in the ARM dominated low-power, consumer and real-time markets. In the near 12- to 24-month timeframe, customers should start seeing big-memory standalone systems based on Intel and NVIDIA technology that only require power and a network connection. The need for a separate x86 computer to host one or more GPU or Intel Xeon Phi coprocessors will no longer be a requirement.
The introduction of standalone GPU and Intel Xeon Phi devices will affect the design decisions made when planning the next generation of leadership class supercomputers, enterprise data center procurements, and teraflop/s workstations. It also will affect the software view in programming these devices, because the performance limitations of the PCIe bus and the need to work with multiple memory spaces will no longer be compulsory.
…
Rob provides a great peek at the hardware that is coming and at current high-performance computing, in particular for processing graphs.
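The point in the excerpt about PCIe staging and multiple memory spaces is easier to see in code. Below is a minimal, hedged CUDA sketch of my own (the scale kernel is made up for illustration, and cudaMallocManaged arrived with unified memory in CUDA releases after 5.5; none of this is code from Rob's post): the first half copies data across the PCIe bus between separate host and device allocations, while the second half uses a single managed allocation and skips the explicit copies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial illustrative kernel: multiply each element by a constant.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Today: separate host and device memory spaces, data staged over PCIe.
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    float *d;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // explicit PCIe transfer in
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // explicit PCIe transfer back

    // With unified (managed) memory there is one address space and no
    // explicit staging copies in the source code.
    float *u;
    cudaMallocManaged(&u, bytes);
    for (int i = 0; i < n; ++i) u[i] = 1.0f;
    scale<<<(n + 255) / 256, 256>>>(u, n, 2.0f);
    cudaDeviceSynchronize();

    printf("%f %f\n", h[0], u[0]);
    cudaFree(d); cudaFree(u); free(h);
    return 0;
}
```

On the standalone Knights Landing and CP-GP parts described in the post, even the managed-memory bookkeeping should fade away, since there is only one memory to begin with.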
Resources mentioned in Rob’s post without links:
Rob’s Intel Xeon Phi tutorials at Dr. Dobb’s:
Programming Intel’s Xeon Phi: A Jumpstart Introduction
CUDA vs. Phi: Phi Programming for CUDA Developers
Getting to 1 Teraflop on the Intel Phi Coprocessor
Numerical and Computational Optimization on the Intel Phi
Rob’s GPU Technology Conference presentations:
Simplifying Portable Killer Apps with OpenACC and CUDA-5 Concisely and Efficiently.
Clicking GPUs into a Portable, Persistent and Scalable Massive Data Framework.
(The links are correct but land you one presentation below Rob’s, so scroll up one. Sorry. It was that or use an incorrect link to put you at the right location.)
Other resources you may find of interest:
Rob Farber – Dr. Dobb’s – current article listing.
Hot-Rodding Windows and Linux App Performance with CUDA-Based Plugins by Rob Farber (with source code for Windows and Linux).
Rob Farber’s wiki: http://gpucomputing.net/ (Warning: The site seems to be flaky. If it doesn’t load, try again.)
OpenCL (Khronos)
Rob Farber’s Code Project tutorials:
- Part 1: OpenCL™ – Portable Parallelism
- Part 2: OpenCL™ – Memory Spaces
- Part 3: Work-Groups and Synchronization
- Part 4: Coordinating Computations with OpenCL Queues
- Part 5: OpenCL Buffers and Memory Affinity
- Part 6: Primitive Restart and OpenGL Interoperability
- Part 7: OpenCL plugins
- Part 8: Heterogeneous workflows using OpenCL
- Part 9: OpenCL Extensions and Device Fission
(Part 9 was published in February of 2012. Some updating may be necessary.)