Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

December 5, 2012

Fast Parallel Sorting Algorithms on GPUs

Filed under: Algorithms,GPU,Parallel Programming,Sorting — Patrick Durusau @ 6:00 am

Fast Parallel Sorting Algorithms on GPUs by Bilal Jan, Bartolomeo Montrucchio, Carlo Ragusa, Fiaz Gul Khan, Omar Khan.

Abstract:

This paper presents a comparative analysis of three widely used parallel sorting algorithms: odd-even sort, rank sort and bitonic sort, in terms of sorting rate, sorting time and speed-up on CPU and different GPU architectures. Alongside, we have implemented a novel parallel algorithm, the min-max butterfly network, for finding the minimum and maximum in large data sets. All algorithms have been implemented exploiting the data parallelism model, for achieving high performance, as available on multi-core GPUs using the OpenCL specification. Our results show a minimum speed-up of 19x for bitonic sort against the odd-even sorting technique for small queue sizes on CPU and a maximum speed-up of 2300x for very large queue sizes on the Nvidia Quadro 6000 GPU architecture. Our implementation of full-butterfly network sorting results in relatively better performance than all three sorting techniques: bitonic, odd-even and rank sort. For the min-max butterfly network, our findings report high speed-up on the Nvidia Quadro 6000 GPU for data set sizes reaching 2^24, with much lower sorting time.
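
To make the appeal of bitonic sort on GPUs concrete, here is a minimal CUDA sketch of the classic compare-exchange formulation. The paper itself uses OpenCL, and this is not the authors’ code: the kernel name, block size and power-of-two input length are my assumptions.

    #include <cuda_runtime.h>

    // One bitonic compare-exchange step: each thread owns element i and
    // pairs with element i ^ j; the (i & k) test picks the sort direction.
    __global__ void bitonic_step(float *data, int n, int j, int k) {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= (unsigned int)n) return;
        unsigned int ixj = i ^ j;
        if (ixj > i) {
            bool ascending = ((i & k) == 0);
            if ((data[i] > data[ixj]) == ascending) {
                float t = data[i]; data[i] = data[ixj]; data[ixj] = t;
            }
        }
    }

    // Host driver: n must be a power of two and data already on the device.
    void bitonic_sort(float *d_data, int n) {
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        for (int k = 2; k <= n; k <<= 1)          // size of the bitonic sequences
            for (int j = k >> 1; j > 0; j >>= 1)  // compare-exchange distance
                bitonic_step<<<blocks, threads>>>(d_data, n, j, k);
        cudaDeviceSynchronize();
    }

Every step touches memory in a regular pattern and has no data-dependent control flow beyond the swap, which is a big part of why bitonic sort scales so well on data-parallel hardware.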

Is there a GPU in your topic map processing future?

I first saw this in a tweet by Stefano Bertolo.

November 14, 2012

Efficient similarity search on multimedia databases [No Petraeus Images, Yet]

Filed under: GPU,Multimedia,Searching,Similarity — Patrick Durusau @ 11:42 am

Efficient similarity search on multimedia databases by Mariela Lopresti, Natalia Miranda, Fabiana Piccoli, Nora Reyes.

Abstract:

Manipulating and retrieving multimedia data has received increasing attention with the advent of cloud storage facilities. The ability to query by similarity over large data collections is mandatory to improve storage and user interfaces. But these are expensive operations to solve on the CPU alone; thus, it is convenient to take High Performance Computing (HPC) techniques into account in their solutions. The Graphics Processing Unit (GPU), as an alternative HPC device, has been increasingly used to speed up certain computing processes. This work introduces a pure GPU architecture to build the Permutation Index and to solve approximate similarity queries on multimedia databases. The empirical results of each implementation have achieved different levels of speedup, which are related to the characteristics of the GPU and the particular database used.
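
To see why similarity search maps well onto a GPU, the brute-force core is simply one thread per database object, each computing a distance to the query. The permutation index in the paper is more sophisticated; the CUDA sketch below is my illustration of the underlying data-parallel idea, with assumed names and a flat row-major layout.

    // Each thread computes the squared Euclidean distance between the query
    // vector and one database vector of dimension dim (row-major layout).
    __global__ void all_distances(const float *db, const float *query,
                                  float *dist, int n, int dim) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        const float *v = db + (size_t)i * dim;
        float acc = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = v[d] - query[d];
            acc += diff * diff;
        }
        dist[i] = acc;  // the k smallest entries are then selected on CPU or GPU
    }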

No images have been published, yet, in the widening scandal around David Petraeus.

When they do, searching multimedia databases such as Flickr, Facebook, YouTube and others will be a hot issue.

Once images are found, there is the problem of finding the unique ones again and of not finding the duplicates again.

November 8, 2012

hgpu.org

Filed under: GPU,HPC — Patrick Durusau @ 3:11 pm

hgpu.org – high performance computing on graphics processing units

Wealth of GPU computing resources. Will take days to explore fully (if then).

Highest level view:

  • Applications – Where it’s used
  • Hardware – Specs and reviews
  • Programming – Algorithms and techniques
  • Resources – Source Code, tutorials, books, etc.
  • Tools – GPU Sources

Homepage is rather “busy” but packed with information (as opposed to gadgets). Lists the most recent entries, most viewed papers, most recent source code and events.

One special item to note:

Free GPU computing node at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes, with two AMD and one Nvidia graphics processing units, respectively. There are no restrictions on the number of runs.

Oh, did I mention that registration is free?

If you don’t get a multi-GPU unit under the Christmas tree, you can still hum along.

Efficient implementation of data flow graphs on multi-gpu clusters

Filed under: GPU,HPC — Patrick Durusau @ 2:51 pm

Efficient implementation of data flow graphs on multi-gpu clusters by Vincent Boulos, Sylvain Huet, Vincent Fristot, Luc Salvo and Dominique Houzet.

Abstract:

Nowadays, it is possible to build a multi-GPU supercomputer, well suited for implementation of digital signal processing algorithms, for a few thousand dollars. However, to achieve the highest performance with this kind of architecture, the programmer has to focus on inter-processor communications and task synchronization. In this paper, we propose a high-level programming model based on a data flow graph (DFG) allowing an efficient implementation of digital signal processing applications on a multi-GPU computer cluster. This DFG-based design flow abstracts the underlying architecture. We focus particularly on the efficient implementation of communications by automating computation-communication overlap, which can lead to significant speedups, as shown in the presented benchmark. The approach is validated on three experiments: a multi-host multi-GPU benchmark, a 3D granulometry application developed for research on materials and an application for computing visual saliency maps.
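
The computation-communication overlap the authors automate can be sketched by hand with CUDA streams: while the GPU processes one chunk, the transfer of the next chunk is already in flight. The simplified, single-GPU sketch below is my illustration, not code from the paper; the kernel, chunk size and double-buffering scheme are placeholders.

    #include <cuda_runtime.h>

    __global__ void process(float *chunk, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) chunk[i] *= 2.0f;  // stand-in for a real signal-processing kernel
    }

    // h_in and h_out should be pinned (cudaHostAlloc) so the async copies
    // can actually overlap with kernel execution.
    void pipeline(const float *h_in, float *h_out, int n_chunks, int chunk) {
        cudaStream_t s[2];
        float *d_buf[2];
        for (int b = 0; b < 2; ++b) {
            cudaStreamCreate(&s[b]);
            cudaMalloc(&d_buf[b], chunk * sizeof(float));
        }
        for (int c = 0; c < n_chunks; ++c) {
            int b = c % 2;  // alternate streams/buffers so copy and compute overlap
            cudaMemcpyAsync(d_buf[b], h_in + (size_t)c * chunk,
                            chunk * sizeof(float), cudaMemcpyHostToDevice, s[b]);
            process<<<(chunk + 255) / 256, 256, 0, s[b]>>>(d_buf[b], chunk);
            cudaMemcpyAsync(h_out + (size_t)c * chunk, d_buf[b],
                            chunk * sizeof(float), cudaMemcpyDeviceToHost, s[b]);
        }
        cudaDeviceSynchronize();
        for (int b = 0; b < 2; ++b) { cudaStreamDestroy(s[b]); cudaFree(d_buf[b]); }
    }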

Analyzing the statistics of sizes in images (granulometry) and focusing on a particular place of interest in an image (visual saliency) were interesting use cases.

May or may not be helpful in particular cases, depending on your tests for subject identity.

August 19, 2012

Java for graphics cards

Filed under: GPU,Java — Patrick Durusau @ 1:22 pm

Java for graphics cards

From the post:

Phil Pratt-Szeliga, a postgraduate at Syracuse University in New York, has released the source code of his Rootbeer GPU compiler on Github. The developer presented the software at the High Performance Computing and Communication conference in Liverpool in June. The slides from this presentation can be found in the documentation section of the Github directory.

Short summary of Phil Pratt-Szeliga’s GPU compiler.

Is it a waste to have GPU cycles lying around or is there some more fundamental issue at stake?

To what degree does chip architecture drive choices at higher levels of abstraction?

Suggestions of ways to explore that question?

August 6, 2012

Writing a modular GPGPU program in Java

Filed under: CUDA,GPU,Java — Patrick Durusau @ 4:05 pm

Writing a modular GPGPU program in Java by Masayuki Ioki, Shumpei Hozumi, and Shigeru Chiba.

Abstract:

This paper proposes a Java to CUDA runtime program translator for scientific-computing applications. Traditionally, these applications have been written in Fortran or C without using a rich modularization mechanism. Our translator enables those applications to be written in Java and run on GPGPUs while exploiting a rich modularization mechanism in Java. This translator dynamically generates optimized CUDA code from a Java program given at bytecode level when the program is running. By exploiting dynamic type information given at translation, the translator devirtualizes dynamic method dispatches and flattens objects into simple data representation in CUDA. To do this, a Java program must be written to satisfy certain constraints.

This paper also shows that the performance overheads due to Java and WootinJ are not significantly high.

Just in case you are starting to work on topic map processing routines for GPGPUs.

Something to occupy your time during the “dog days” of August.

May 18, 2012

Cloud-Hosted GPUs And Gaming-As-A-Service

Filed under: Games,GPU,NVIDIA — Patrick Durusau @ 4:24 pm

Cloud-Hosted GPUs And Gaming-As-A-Service by Humayun

From the post:

NVIDIA is all buckled up to redefine the dynamics of gaming. The company has spilled the beans over three novel cloud technologies aimed at accelerating the available remote computational power by endorsing the number-crunching potential of its very own (and redesigned) graphical processing units.

At the heart of each of the three technologies lies the latest Kepler GPU architecture, custom-tailored for utility in volumetric datacenters. Through virtualization software, a number of users gain access to the cutting-edge computational capability of the GPUs.

Jen-Hsun Huang, NVIDIA’s president and CEO, firmly believes that the Kepler cloud GPU technology is bound to take cloud computing to an entirely new level. He advocates that the GPU has become a significant constituent of contemporary computing devices. Digital artists are essentially dependent upon the GPU for conceptualizing their thoughts. Touch devices owe a great deal to the GPU for delivering a streamlined graphical experience.

With the introduction of the cloud GPU, NVIDIA is all set to change the game—literally. NVIDIA’s cloud-based GPU will bring an amazingly pleasant experience to gamers on a hunt to play in an untethered manner from a console or personal computer.

First in line is the NVIDIA VGX platform, an enterprise-level execution of the Kepler cloud technologies, primarily targeting virtualized desktop performance boosts. The company is hopeful that ventures will make use of this particular platform to ensure flawless remote computing and cater to the most computationally starved applications to be streamed directly to a notebook, tablet or any other mobile device variant. Jeff Brown, GM at NVIDIA’s Professional Solutions Group, is reported to have marked the VGX as the starting point for a “new era in desktop virtualization” that promises a cost-effective virtualization solution offering “an experience almost indistinguishable from a full desktop”.

Results with GPUs have been encouraging, and spreading their availability as cloud-based GPUs should lead to a wider variety of experiences.

The emphasis here is on making the lives of gamers more pleasant, but one expects serious uses, such as graph processing, not to be all that far behind.

April 3, 2012

Ohio State University Researcher Compares Parallel Systems

Filed under: Cray,GPU,HPC,Parallel Programming,Parallelism — Patrick Durusau @ 4:18 pm

Ohio State University Researcher Compares Parallel Systems

From the post:

Surveying the wide range of parallel system architectures offered in the supercomputer market, an Ohio State University researcher recently sought to establish some side-by-side performance comparisons.

The journal, Concurrency and Computation: Practice and Experience, in February published, “Parallel solution of the subset-sum problem: an empirical study.” The paper is based upon a master’s thesis written last year by former computer science and engineering graduate student Saniyah Bokhari.

“We explore the parallelization of the subset-sum problem on three contemporary but very different architectures, a 128-processor Cray massively multithreaded machine, a 16-processor IBM shared memory machine, and a 240-core NVIDIA graphics processing unit,” said Bokhari. “These experiments highlighted the strengths and weaknesses of these architectures in the context of a well-defined combinatorial problem.”

Bokhari evaluated the conventional central processing unit architecture of the IBM 1350 Glenn Cluster at the Ohio Supercomputer Center (OSC) and the less-traditional general-purpose graphic processing unit (GPGPU) architecture, available on the same cluster. She also evaluated the multithreaded architecture of a Cray Extreme Multithreading (XMT) supercomputer at the Pacific Northwest National Laboratory’s (PNNL) Center for Adaptive Supercomputing Software.

What I found fascinating about this approach was the comparison of:

the strengths and weaknesses of these architectures in the context of a well-defined combinatorial problem.

True enough, there is a place for general methods and solutions, but one pays a price for using them.

Thinking that for subject identity and “merging” in a “big data” context, we will need a deeper understanding of specific identity and merging requirements, so that the result of that study is one or more well-defined combinatorial problems.

That is to say that understanding one or more combinatorial problems precedes proposing a solution.

You can view/download the thesis by Saniyah Bokhari, Parallel Solution of the Subset-sum Problem: An Empirical Study.

Or view the article (assuming you have access):

Parallel solution of the subset-sum problem: an empirical study

Abstract (of the article):

The subset-sum problem is a well-known NP-complete combinatorial problem that is solvable in pseudo-polynomial time, that is, time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. We show how this problem can be parallelized on three contemporary architectures, that is, a 128-processor Cray Extreme Multithreading (XMT) massively multithreaded machine, a 16-processor IBM x3755 shared memory machine, and a 240-core NVIDIA FX 5800 graphics processing unit (GPU). We show that it is straightforward to parallelize this algorithm on the Cray XMT primarily because of the word-level locking that is available on this architecture. For the other two machines, we present an alternating word algorithm that can implement an efficient solution. Our results show that the GPU performs well for problems whose tables fit within the device memory. Because GPUs typically have memories in the order of 10GB, such architectures are best for small problem sizes that have tables of size approximately 10^10. The IBM x3755 performs very well on medium-sized problems that fit within its 64-GB memory but has poor scalability as the number of processors increases and is unable to sustain performance as the problem size increases. This machine tends to saturate for problem sizes of 10^11 bits. The Cray XMT shows very good scaling for large problems and demonstrates sustained performance as the problem size increases. However, this machine has poor scaling for small problem sizes; it performs best for problem sizes of 10^12 bits or more. The results in this paper illustrate that the subset-sum problem can be parallelized well on all three architectures, albeit for different ranges of problem sizes. The performance of these three machines under varying problem sizes shows the strengths and weaknesses of the three architectures. Copyright © 2012 John Wiley & Sons, Ltd.
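
To see why the problem parallelizes so naturally, here is a minimal CUDA sketch of one step of the pseudo-polynomial dynamic program: after adding an item of size s, a sum j is reachable if it was reachable before or if j - s was. This is the textbook recurrence, not the code from the thesis (which relies on word-level locking and bit packing); the byte-per-entry table is my simplification.

    // One dynamic-programming step of subset-sum: dp_out[j] is non-zero if
    // sum j is reachable using the items seen so far plus one item of size s.
    __global__ void subset_sum_step(const unsigned char *dp_in,
                                    unsigned char *dp_out,
                                    long target, long s) {
        long j = blockIdx.x * (long)blockDim.x + threadIdx.x;
        if (j > target) return;
        unsigned char reachable = dp_in[j];
        if (!reachable && j >= s) reachable = dp_in[j - s];
        dp_out[j] = reachable;
    }

    // Host side (sketch): set dp_in[0] = 1 and all other entries to 0, then
    // loop over the items, launching one kernel per item and swapping
    // dp_in / dp_out between launches.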

January 31, 2012

Accelerating SQL Database Operations on a GPU with CUDA (merging spreadsheet data?)

Filed under: CUDA,GPU,Spreadsheets,SQL,SQLite — Patrick Durusau @ 4:33 pm

Accelerating SQL Database Operations on a GPU with CUDA by Peter Bakkum and Kevin Skadron.

Abstract:

Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. This paper implements a subset of the SQLite command processor directly on the GPU. This dramatically reduces the effort required to achieve GPU acceleration by avoiding the need for database programmers to use new programming languages such as CUDA or modify their programs to use non-SQL libraries.

This paper focuses on accelerating SELECT queries and describes the considerations in an efficient GPU implementation of the SQLite command processor. Results on an NVIDIA Tesla C1060 achieve speedups of 20-70X depending on the size of the result set.
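
At its core, a GPU-side SELECT is a parallel filter: each thread evaluates the WHERE predicate on one row and, on a match, appends the row id to the result set. The CUDA sketch below is my illustration of that general idea, not the authors’ port of the SQLite virtual machine; the column, names and predicate are made up.

    // SELECT rowid FROM t WHERE price > threshold;
    // `count` must be zeroed before launch; matching row ids are compacted
    // into `result` via an atomic counter (result order is not guaranteed).
    __global__ void select_where(const float *price, int n_rows, float threshold,
                                 int *result, int *count) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n_rows) return;
        if (price[row] > threshold) {
            int slot = atomicAdd(count, 1);
            result[slot] = row;
        }
    }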

Important lessons to be learned from this paper:

  • Don’t invent new languages for the average user to learn.
  • Avoid the need to modify existing programs.
  • Write against common software.

Remember that 75% of the BI market is still using spreadsheets. For all sorts of data but numeric data in particular.

I don’t have any experience with importing files into Excel but I assume there is a macro language that can be used to create import processes.

Curious if there has been any work on creating import macros for Excel that incorporate merging as part of those imports?

That would:

  • Not be a new language for users to learn.
  • Avoid modification of existing programs (or data).
  • Be written against common software.

I am not sure about the requirements for merging numeric data but that should make the exploration process all the more enjoyable.

PGStrom (PostgreSQL + GPU)

Filed under: CUDA,GPU,PGStrom,PostgreSQL — Patrick Durusau @ 4:32 pm

PGStrom

From the webpage:

PG-Strom is an FDW (foreign data wrapper) module for the PostgreSQL database. It was designed to utilize GPU devices to accelerate sequential scans over massive numbers of records with complex qualifiers. Its basic concept is that the CPU and GPU should each focus on the workloads where they have an advantage, and work concurrently. The CPU has much more flexibility, so it has the advantage on complex work such as disk I/O; the GPU, on the other hand, has much more parallelism for numerical calculation, so it has the advantage on massive but simple work such as checking the qualifiers for each row.

The figure below (on the original page) illustrates the basic concept of PG-Strom. On a sequential scan workload, vanilla PostgreSQL iterates over fetching a tuple and checking the qualifiers for that tuple. If we could consign the green portion of that workload to the GPU, it would reduce the CPU’s workload and allow it to load more tuples in advance. Eventually, this should provide shorter response times on complex queries over large amounts of data.
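
A hedged sketch of that division of labor: the CPU keeps fetching batches of tuples from disk, while a GPU kernel like the one below checks a numeric qualifier for every row in the batch and hands back a flag array for the executor to consult. This is only an illustration of the concept, not PG-Strom’s code; the columns and predicate are invented.

    // Qualifier check for a batch of rows, e.g. WHERE sqrt(x*x + y*y) < radius.
    // The CPU fetches the batch; the GPU fills qual[] with 1 for rows that pass.
    __global__ void check_quals(const float *x, const float *y,
                                unsigned char *qual, int n_rows, float radius) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n_rows) return;
        qual[row] = (sqrtf(x[row] * x[row] + y[row] * y[row]) < radius) ? 1 : 0;
    }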

Requires setting up the table for the GPU ahead of time, but the performance increase is reported to be 10x – 20x.

It occurs to me that GPUs should be well suited for graph processing. Yes? Will have to look into that and report back.

January 30, 2012

Ålenkå

Filed under: Benchmarks,Database,GPU — Patrick Durusau @ 8:02 pm

Ålenkå

If you don’t mind alpha code, ålenkå was pointed out in the bitmap posting I cited earlier today.

From its homepage:

Alenka is a modern analytical database engine written to take advantage of vector based processing and high bandwidth of modern GPUs.

Features include:

  • Vector-based processing – the CUDA programming model allows a single operation to be applied to an entire set of data at once (see the sketch after this list).
  • Self-optimizing compression – ultra-fast compression and decompression performed directly inside the GPU.
  • Column-based storage – minimize disk I/O by only accessing the relevant data.
  • Fast database loads – data load times measured in minutes, not hours.
  • Open source and free.
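
The “single operation applied to an entire set of data at once” in the feature list is the heart of the columnar/GPU match: a column is just a long array, and one thread per value applies the operation. A trivial CUDA sketch of mine, not Alenka code:

    // Column-at-a-time execution: one thread per value in the column,
    // e.g. the expression  price = price * factor  over the whole column.
    __global__ void column_times_scalar(float *column, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) column[i] *= factor;
    }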

Apologies for the name spelling differences, Ålenkå versus Alenka. I suspect it has something to do with character support in whatever produced the readme file, but can’t say for sure.

The benchmarks (there is that term again) are impressive.

Would semantic benchmarks be different from the ones used in IR currently? Different from precision and recall? What about range (same subject but identified differently) or accuracy (different identifications but same subject, how many false positives)?

December 2, 2011

rCUDA

Filed under: CUDA,GPU — Patrick Durusau @ 4:53 pm

rCUDA

From the post:

We are glad to announce the new version 3.1 of rCUDA. It has been developed in a joint collaboration with the Parallel Architectures Group from the Technical University of Valencia.

The rCUDA framework enables the concurrent usage of CUDA-compatible devices remotely.

rCUDA employs the socket API for the communication between clients and servers. Thus, it can be useful in three different environments:

  • Clusters. To reduce the number of GPUs installed in High Performance Clusters. This leads to increased GPU usage and therefore energy savings as well as other related savings like acquisition costs, maintenance, space, cooling, etc.
  • Academia. In commodity networks, to offer access to a few high performance GPUs concurrently to many students.
  • Virtual Machines. To enable the access to the CUDA facilities on the physical machine.

The current version of rCUDA (v3.1) implements most of the functions in the CUDA Runtime API version 4.0, excluding only those related with graphics interoperability. rCUDA 3.1 targets the Linux OS (for 32- and 64-bit architectures) on both client and server sides.

This was mentioned in the Letting GPUs run free post but I thought it merited a separate entry. This is very likely to be important.

Letting GPUs run free

Filed under: CUDA,GPU — Patrick Durusau @ 4:51 pm

Letting GPUs run free by Dan Olds.

From the post:

One of the most interesting things I saw at SC11 was a joint Mellanox and University of Valencia demonstration of rCUDA over Infiniband. With rCUDA, applications can access a GPU (or multiple GPUs) on any other node in the cluster. It makes GPUs a sharable resource and is a big step towards making them as virtualisable (I don’t think that’s a word, but going to go with it anyway) as any other compute resource.

There aren’t a lot of details out there yet, there’s this press release from Mellanox and Valencia and this explanation of the rCUDA project.

This is a big deal. To me, the future of computing will be much more heterogeneous and hybrid than homogeneous and, well, some other word that means ‘common’ and begins with ‘H’. We’re moving into a mindset where systems are designed to handle particular workloads, rather than workloads that are modified to run sort of well on whatever systems are cheapest per pound or flop.

December 1, 2011

GPUs: Path into the future

Filed under: GPU — Patrick Durusau @ 7:37 pm

GPUs: Path into the future

From the introduction:

With the announcement of a new Blue Waters petascale system that includes a considerable amount of GPU capability, it is clear GPUs are the future of supercomputing. Access magazine’s Barbara Jewett recently sat down with Wen-mei Hwu, a professor of electrical and computer engineering at the University of Illinois, a co-principal investigator on the Blue Waters project, and an expert in computer architecture, especially GPUs.

Find out why you should start thinking about GPU systems, now.

For more information, see the Blue Waters project.

November 29, 2011

A Common GPU n-Dimensional Array for Python and C

Filed under: GPU,Python — Patrick Durusau @ 8:40 pm

A Common GPU n-Dimensional Array for Python and C by Frédéric Bastien, Arnaud Bergeron, Pascal Vincent and Yoshua Bengio

From the webpage:

Currently there are multiple incompatible array/matrix/n-dimensional base object implementations for GPUs. This hinders the sharing of GPU code and causes duplicate development work. This paper proposes and presents a first version of a common GPU n-dimensional array (tensor) named GpuNdArray that works with both CUDA and OpenCL. It will be usable from python, C and possibly other languages.

Apologies, all I can give you today is a pointer to the accepted papers for Big Learning, Day 1, first paper, which promises a PDF soon.

I didn’t check the PDF link yesterday when I saw it. My bad.

Anyway, there are a lot of other interesting papers at this site and I will update this entry when this paper appears. The conference is December 16-17, 2011 so it may not be too long of a wait.

September 17, 2011

Nvidia Research

Filed under: GPU — Patrick Durusau @ 8:12 pm

Nvidia Research

Nvidia has a number of programs for working with academic institutions and researchers. I got an email today extolling several new research centers mostly with projects in the hard sciences.

Please spread the call for research projects with GPUs in the difficult sciences: the social sciences and humanities in general.

For example, consider the analysis in How to kill a patent with Python. You discover two very different words for the same thing. You use topics to record that they represent the same subject and the graphic display changes in real time to add or subtract patents of interest. And to add or subtract relationships to other patents, patent holders, parties of interest, non-patent literature, etc. Dynamic analysis where your insights change and evolve as you explore the patents. With the ability to roll back to any point in your journey.

That is the power of very high-end processing and GPUs, such as those from Nvidia, are one way to get there.

BTW, there is an awesome collection of materials from academics already available at this location.

You could be the first person in your department/institution to publish on topic maps using Nvidia GPUs!

September 7, 2011

An Open Source Platform for Virtual Supercomputing

Filed under: Cloud Computing,Erlang,GPU,Supercomputing — Patrick Durusau @ 6:55 pm

An Open Source Platform for Virtual Supercomputing, Michael Feldman reports:

Erlang Solutions and Massive Solutions will soon launch a new cloud platform for high performance computing. Last month they announced their intent to bring a virtual supercomputer (VSC) product to market, the idea being to enable customers to share their HPC resources either externally or internally, in a cloud-like manner, all under the banner of open source software.

The platform will be based on Clustrx and Xpandrx, two HPC software operating systems that were the result of several years of work done by Erlang Solutions, based in the UK, and Massive Solutions, based in Gibraltar. Massive Solutions has been the driving force behind the development of these two OS’s, using Erlang language technology developed by its partner.

In a nutshell, Clustrx is an HPC operating system, or more accurately, middleware, which sits atop Linux, providing the management and monitoring functions for supercomputer clusters. It is run on its own small server farm of one or more nodes, which are connected to the compute servers that make up the HPC cluster. The separation between management and compute enables it to support all the major Linux distros as well as Windows HPC Server. There is a distinct Clustrx-based version of Linux for the compute side as well, called Compute Based Linux.

A couple of things to note from within the article:

The only limitation to this model is its dependency on the underlying capabilities of Linux. For example, although Xpandrx is GPU-aware, since GPU virtualization is not yet supported in any Linux distros, the VSC platform can’t support virtualization of those resources. More exotic HPC hardware technology would, likewise, be out of the virtual loop.

The common denominator for VSC is Erlang, not just the company, but the language http://www.erlang.org/, which is designed for programming massively scalable systems. The Erlang runtime has built-in support for things like concurrency, distribution and fault tolerance. As such, it is particularly suitable for HPC system software and large-scale interprocess communication, which is why both Clustrx and Xpandrx are implemented in the language.

As computing power and access to computing power increases, have you seen an increase in robust (in your view) topic map applications?

July 28, 2011

MATLAB GPU / CUDA experiences

Filed under: CUDA,GPU,Mathematics,Parallel Programming — Patrick Durusau @ 6:57 pm

MATLAB GPU / CUDA experiences and tutorials on my laptop – Introduction

From the post:

These days it seems that you can’t talk about scientific computing for more than 5 minutes without someone bringing up the topic of Graphics Processing Units (GPUs). Originally designed to make computer games look pretty, GPUs are massively parallel processors that promise to revolutionise the way we compute.

A brief glance at the specification of a typical laptop suggests why GPUs are the new hotness in numerical computing. Take my new one for instance, a Dell XPS L702X, which comes with a Quad-Core Intel i7 Sandybridge processor running at up to 2.9Ghz and an NVidia GT 555M with a whopping 144 CUDA cores. If you went back in time a few years and told a younger version of me that I’d soon own a 148 core laptop then young Mike would be stunned. He’d also be wondering ‘What’s the catch?’

Parallel computing has been around for years but in the form of GPUs it has reached the hands of hackers and innovators. Will your next topic map application take advantage of parallel processing?

June 20, 2011

Massively Parallel Database Startup to Reap Multicore Dividends

Filed under: GPU,NoSQL — Patrick Durusau @ 3:36 pm

Massively Parallel Database Startup to Reap Multicore Dividends

From the post:

The age of multicore couldn’t have come at a better time. Plagued with mounting datasets and a need for suitable architectures to contend with them, organizations are feeling the resource burden in every database corner they look.

German startup ParStream claims it has found an answer to those problems. The company has come up with a solution to harness the power of GPU computing at the dawn of the manycore-plus-big-data day – and it just might be onto something.

The unique element in ParStream’s offering is that it is able to exploit the coming era of multicore architectures, meaning that it will be able to deliver results faster with lower resource usage. This is in addition to its claim that it can eliminate the need for data decompression entirely, which, if it proves to be the case when their system is available later this summer, could change the way we think about system utilization when performing analytics on large data sets.

ParStream is appearing as an exhibitor at the ISC’11 Supercomputing Conference, June 20 – 22, 2011. I can’t make the conference but am interested in your reactions to the promised demos.

June 6, 2011

GTC 2012

Filed under: Conferences,GPU,Graphic Processors — Patrick Durusau @ 1:59 pm

GTC (GPU Technology Conference) 2012

Important Dates

GTC 2012 in San Jose, May 14-17, 2012

Session proposals have closed but poster proposals are open until June 27, 2011. Both will re-open September 27, 2011.

From the website:

GTC advances awareness of high performance computing, and connects the scientists, engineers, researchers, and developers who use GPUs to tackle enormous computational challenges.

GTC 2012 will feature the latest breakthroughs and the most amazing content in GPU-enabled computing. Spanning 4 full days of world-class education delivered by some of the greatest minds in GPU computing, GTC will showcase the dramatic impact that parallel computing is having on scientific research and commercial applications.

BTW, hundreds of hours of video are available from GTC 2010 at this website.

If you are concerned with scaling topic maps and other semantic technologies or just high performance computing in general, the 2010 recordings look like a good place to start while awaiting the 2012 conference.
