Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 3, 2013

A multi-Teraflop Constituency Parser using GPUs

Filed under: GPU,Grammar,Language,Parsers,Parsing — Patrick Durusau @ 4:45 pm

A multi-Teraflop Constituency Parser using GPUs by John Canny, David Hall and Dan Klein.

Abstract:

Constituency parsing with rich grammars remains a computational challenge. Graphics Processing Units (GPUs) have previously been used to accelerate CKY chart evaluation, but gains over CPU parsers were modest. In this paper, we describe a collection of new techniques that enable chart evaluation at close to the GPU’s practical maximum speed (a Teraflop), or around a half-trillion rule evaluations per second. Net parser performance on a 4-GPU system is over 1 thousand length-30 sentences/second (1 trillion rules/sec), and 400 general sentences/second for the Berkeley Parser Grammar. The techniques we introduce include grammar compilation, recursive symbol blocking, and cache-sharing.

Just in case you are interested in parsing “unstructured” data, most of which also goes by the name “text.”
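If CKY chart evaluation is new to you, here is a toy sketch of the inner loop the paper accelerates: plain single-threaded Python with a hypothetical three-rule grammar and made-up scores, nothing like the paper's compiled GPU kernels, but it shows the chart-filling pattern that gets evaluated half a trillion times per second.

```python
# Toy CKY chart evaluation: fill chart[(i, j)] with the best-scoring symbols
# for the span words[i:j], combining smaller spans via binary rules.
from collections import defaultdict

# Hypothetical grammar in Chomsky normal form: (left, right) -> [(parent, score)]
binary = {("NP", "VP"): [("S", 0.9)], ("Det", "N"): [("NP", 0.8)]}
lexicon = {"the": [("Det", 1.0)], "dog": [("N", 0.7)], "barks": [("VP", 0.6)]}

def cky(words):
    n = len(words)
    chart = defaultdict(dict)                    # (i, j) -> {symbol: score}
    for i, w in enumerate(words):                # length-1 spans from the lexicon
        for sym, s in lexicon.get(w, []):
            chart[(i, i + 1)][sym] = s
    for width in range(2, n + 1):                # grow spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):            # try every split point
                for lsym, lscore in chart[(i, k)].items():
                    for rsym, rscore in chart[(k, j)].items():
                        for parent, rulescore in binary.get((lsym, rsym), []):
                            score = lscore * rscore * rulescore
                            if score > chart[(i, j)].get(parent, 0.0):
                                chart[(i, j)][parent] = score
    return dict(chart[(0, n)])

print(cky(["the", "dog", "barks"]))              # -> {'S': 0.3024}
```

Every (span, split, rule) combination at a given span width is independent of the others, which is what makes the workload such a good fit for massive GPU parallelism.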

I first saw the link: BIDParse: GPU-accelerated natural language parser at hgpu.org. Then I started looking for the paper. 😉

Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud

Filed under: Cloud Computing,GPU,Graphs — Patrick Durusau @ 4:28 pm

Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud by Jianlong Zhong and Bingsheng He.

Abstract:

Recently, we have witnessed cloud providers starting to offer heterogeneous computing environments. There has been wide interest, in both cluster and cloud settings, in adopting graphics processors (GPUs) as accelerators for various applications. On the other hand, large-scale graph processing is important for many data-intensive applications in the cloud. In this paper, we propose to leverage GPUs to accelerate large-scale graph processing in the cloud. Specifically, we develop an in-memory graph processing engine G2 with three non-trivial GPU-specific optimizations. Firstly, we adopt fine-grained APIs to take advantage of the massive thread parallelism of the GPU. Secondly, G2 embraces a graph-partition-based approach for load balancing on heterogeneous CPU/GPU architectures. Thirdly, a runtime system is developed to perform transparent memory management on the GPU, and to perform scheduling for improved throughput of concurrent kernel executions from graph tasks. We have conducted experiments on a local cluster of three nodes and an Amazon EC2 virtual cluster of eight nodes. Our preliminary results demonstrate that 1) the GPU is a viable accelerator for cloud-based graph processing, and 2) the proposed optimizations further improve the performance of the GPU-based graph processing engine.

GPUs in the cloud anyone?

The future of graph computing isn’t clear but it certainly promises to be interesting!
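The partition-based load balancing mentioned in the abstract is easy to picture. Here is a minimal sketch, assuming a static split in which each device receives vertices in proportion to a measured speed; the device names and speeds are hypothetical, and this is not the paper's actual algorithm:

```python
# Sketch of partition-based load balancing across heterogeneous workers:
# give each device a share of vertices proportional to its relative speed.
def partition(vertices, devices):
    total = sum(speed for _, speed in devices)
    parts, start = {}, 0
    for name, speed in devices:
        share = round(len(vertices) * speed / total)
        parts[name] = vertices[start:start + share]
        start += share
    parts[devices[-1][0]] += vertices[start:]    # remainder goes to the last device
    return parts

vertices = list(range(1000))
# Hypothetical devices: one CPU and two GPUs, each GPU measured as 4x the CPU.
splits = partition(vertices, [("cpu", 1.0), ("gpu0", 4.0), ("gpu1", 4.0)])
print({name: len(vs) for name, vs in splits.items()})
# -> {'cpu': 111, 'gpu0': 444, 'gpu1': 445}
```

A real engine also has to minimize the edges cut between partitions, since those become communication, but the proportional split is the core intuition.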

I first saw this in a tweet by Stefano Bertolo

Mizan

Filed under: Graphs,MapReduce,Mizan,Pregel — Patrick Durusau @ 3:53 pm

Mizan

From the webpage:

What is Mizan?

Mizan is an advanced clone of Google’s graph processing system Pregel that uses online graph vertex migrations to dynamically optimize the execution of graph algorithms. You can use our Mizan system to develop any vertex-centric graph algorithm and run it in parallel over a local cluster or over cloud infrastructure. Mizan is compatible with Pregel’s API, written in C++, and uses MPICH2 for communication. You can download a copy of Mizan and start using it today on your local machine, or try Mizan on EC2. We also welcome programmers who are interested in going deeper into the Mizan code to optimize or tweak it.

Mizan was published at EuroSys ’13 as “Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing.” An earlier version of this work, “Mizan: Optimizing Graph Mining in Large Parallel Systems,” was recently renamed Libra to avoid confusion. We show below the abstract for Mizan’s EuroSys ’13 paper. We also include Mizan’s general architecture and the API available to users.

Abstract

Pregel was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing end-to-end computation. Especially where data is very large or the runtime behavior of the algorithm is unknown, an adaptive approach is needed. To this end, we introduce Mizan, a Pregel system that achieves efficient load balancing to better adapt to changes in computing needs. Unlike known implementations of Pregel, Mizan does not assume any a priori knowledge of the structure of the graph or behavior of the algorithm. Instead, it monitors the runtime characteristics of the system. Mizan then performs efficient fine-grained vertex migration to balance computation and communication. We have fully implemented Mizan; using extensive evaluation we show that—especially for highly dynamic workloads—Mizan provides up to 84% improvement over techniques leveraging static graph pre-partitioning.
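For readers who have not written against a Pregel-style API, here is a minimal single-machine sketch of the vertex-centric model in Python, using PageRank as the example. The toy graph is hypothetical, and everything runs in one process, whereas Mizan distributes the vertices across a cluster and migrates them at runtime:

```python
# Minimal vertex-centric (Pregel-style) PageRank: in each superstep a vertex
# receives messages, updates its value, and sends value/out_degree onward.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # hypothetical adjacency list

rank = {v: 1.0 / len(graph) for v in graph}
for superstep in range(30):
    inbox = {v: 0.0 for v in graph}
    for v, neighbors in graph.items():              # "compute" on each vertex
        for n in neighbors:                          # message = share of rank
            inbox[n] += rank[v] / len(neighbors)
    # standard damping: 15% teleport, 85% from incoming messages
    rank = {v: 0.15 / len(graph) + 0.85 * inbox[v] for v in graph}

print({v: round(r, 3) for v, r in rank.items()})
```

Each superstep is a synchronized round: vertices consume incoming messages, update their state, and emit messages for the next round. Those per-vertex units of work are exactly what a system like Mizan monitors and rebalances.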

Posts like this one make me want to build a local cluster at home. 😉

November 2, 2013

RICON West 2013! (video streams)

Filed under: Functional Programming,Riak — Patrick Durusau @ 5:59 pm

RICON West 2013! (video streams)

Presentation-by-presentation editing is underway, but you can duplicate the conference experience at your own terminal!

Days 1 and 2, both tracks, are ready for your viewing!

True, it’s not interactive, but here you can pause the speaker while you take a call or answer email. 😉

Enjoy!

Use multiple CPU Cores with your Linux commands…

Filed under: Linux OS,Parallelism — Patrick Durusau @ 5:53 pm

Use multiple CPU Cores with your Linux commands — awk, sed, bzip2, grep, wc, etc.

From the post:

Here’s a common problem: You ever want to add up a very large list (hundreds of megabytes) or grep through it, or other kind of operation that is embarrassingly parallel? Data scientists, I am talking to you. You probably have about four cores or more, but our tried and true tools like grep, bzip2, wc, awk, sed and so forth are singly-threaded and will just use one CPU core. To paraphrase Cartman, “How do I reach these cores”? Let’s use all of our CPU cores on our Linux box by using GNU Parallel and doing a little in-machine map-reduce magic by using all of our cores and using the little-known parameter --pipes (otherwise known as --spreadstdin). Your pleasure is proportional to the number of CPUs, I promise.

BZIP2

So, bzip2 is better compression than gzip, but it’s so slow! Put down the razor, we have the technology to solve this.

A very interesting post, particularly if you explore data with traditional Unix tools.

A comment on the post mentions that the article presumes an SSD.

Perhaps, but learning the technique will be useful for when SSDs are standard.
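If you would like to play with the same split-map-combine idea outside the shell, here is a rough Python analogue using the standard library's multiprocessing module. The input file name is hypothetical, and GNU Parallel streams the data rather than loading it all into memory, so treat this as a sketch of the pattern, not a replacement:

```python
# A rough single-machine map-reduce: fan chunks of a file out to worker
# processes, the way `parallel --pipe` fans chunks of a stream to commands.
import multiprocessing as mp

def count_words(lines):
    # "map" step: each worker counts words in its own chunk of lines
    return sum(len(line.split()) for line in lines)

def chunked(seq, n):
    # split a list into roughly n equal chunks
    k = max(1, len(seq) // n)
    return [seq[i:i + k] for i in range(0, len(seq), k)]

if __name__ == "__main__":
    with open("biglist.txt") as f:          # hypothetical large input file
        lines = f.readlines()
    with mp.Pool() as pool:                 # one worker per CPU core by default
        counts = pool.map(count_words, chunked(lines, mp.cpu_count()))
    print(sum(counts))                      # "reduce" step: combine partial counts
```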

I first saw this in Pete Warden’s Five short links for Thursday, October 31, 2013.

Statistics Done Wrong

Filed under: Skepticism,Statistics — Patrick Durusau @ 4:29 pm

Statistics Done Wrong by Alex Reinhart.

From the post:

If you’re a practicing scientist, you probably use statistics to analyze your data. From basic t tests and standard error calculations to Cox proportional hazards models and geospatial kriging systems, we rely on statistics to give answers to scientific problems.

This is unfortunate, because most of us don’t know how to do statistics.

Statistics Done Wrong is a guide to the most popular statistical errors and slip-ups committed by scientists every day, in the lab and in peer-reviewed journals. Many of the errors are prevalent in vast swathes of the published literature, casting doubt on the findings of thousands of papers. Statistics Done Wrong assumes no prior knowledge of statistics, so you can read it before your first statistics course or after thirty years of scientific practice.

Dive in: the whole guide is available online!

Something to add to your data skeptic bag.
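One of the classic slip-ups the guide covers is the multiple comparisons problem. A quick simulation (Python, assuming numpy and scipy are installed) shows why testing many pure-noise “effects” at p < 0.05 reliably produces a few “significant” findings:

```python
# Simulate the multiple-comparisons trap: run 100 t-tests on pure noise
# and count how many look "significant" at p < 0.05 by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
false_positives = 0
for _ in range(100):
    a = rng.normal(size=30)          # both samples drawn from the SAME
    b = rng.normal(size=30)          # distribution: there is no real effect
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1
print(false_positives, "of 100 null tests were 'significant'")  # ~5 expected
```

Roughly five of the hundred null comparisons come up significant by chance, which is exactly the false positive rate the 0.05 threshold implies.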

As a matter of fact, a summary of warning signs for these problems would fit on 8½ by 11 (or A4) paper.

I’m thinking that when you show up to examine a data set, you arrive with laminated Statistics Done Wrong cheat sheets, web address printed on the back.

Part of being a data skeptic is intuiting where to push so that the data “as presented” unravels.

I first saw this in Nat Torkington’s Four short links: 30 October 2013.

Principles of Reactive Programming [4th of November]

Filed under: Akka,Functional Programming,Scala — Patrick Durusau @ 4:08 pm

Principles of Reactive Programming [4th of November]

Just in case U.S. government intercepts prevented you from getting the news or erased entries from your calendar, a reminder that Principles of Reactive Programming starts next Monday and runs for seven (7) weeks.

Even though I am signed up for another course, I am tempted to add this one. Unlikely, though, as two courses at one time is a bit much.

But I will be watching the lectures later to prepare for the next offering.

Free Access to Standards…

Filed under: Standards — Patrick Durusau @ 4:01 pm

ANSI Launches Online Portal for Standards Incorporated by Reference

From the post:

The American National Standards Institute (ANSI) is proud to announce the official launch of the ANSI IBR Portal, an online tool for free, read-only access to voluntary consensus standards that have been incorporated by reference (IBR) into federal laws and regulations.

In recent years, issues related to IBR have commanded increased attention, particularly in connection with requirements that standards that have been incorporated into federal laws and regulations be “reasonably available” to the U.S. citizens and residents affected by these rules. This requirement has led some to call for the invalidation of copyrights for IBR standards. Others have posted copyrighted standards online without the permission of the organizations that developed them, triggering legal action from standards developing organizations (SDOs).

“In all of our discussions about the IBR issue, the question we are trying to answer is simple. Why aren’t standards free? In the context of IBR, it’s a valid point to raise,” said S. Joe Bhatia, ANSI president and CEO. “A standard that has been incorporated by reference does have the force of law, and it should be available. But the blanket statement that all IBR standards should be free misses a few important considerations.”

As coordinator of the U.S. standardization system, ANSI has taken a lead role in informing the public about the reality of free standards, the economics of standards setting, and how altering this infrastructure will undermine U.S. competitiveness. Specifically, the loss of revenue from the sale of standards could negatively impact the business model supporting many SDOs – potentially disrupting the larger U.S. and international standardization system, a major driver of innovation and economic growth worldwide. In response to concerns raised by ANSI members and partner organizations, government officials, and other stakeholders, ANSI began to develop its IBR Portal, with the goal of providing a single solution to this significant issue that also provides SDOs with the flexibility they require to safeguard their ability to develop standards.

This is “free” access to standards that have the force of law in the United States.

Whether it is meaningful access is something I will leave for you to consider in light of restrictions that prevent printing, copying, downloading or taking screenshots.

Particularly since some standards run many pages and are not easy documents to read.

I wonder if viewing these “free” standards disables your cellphone camera?

SDOs could be selling enhanced electronic versions, think XML versions that are interlinked or linked into information systems, while giving the PDFs away as advertising.

That would require using the standards that others (not the SDOs that house such efforts) have labored so hard to produce.

The response I get to that suggestion has traditionally been: “Our staff doesn’t have the skills for that suggestion.”

I know how to fix that. Don’t you?

November 1, 2013

NSA FILES: DECODED

Filed under: NSA — Patrick Durusau @ 6:31 pm

NSA FILES: DECODED What the revelations mean for you.

From the story:

When Edward Snowden met journalists in his cramped room in Hong Kong’s Mira hotel in June, his mission was ambitious. Amid the clutter of laundry, meal trays and his four laptops, he wanted to start a debate about mass surveillance.

He succeeded beyond anything the journalists or Snowden himself ever imagined. His disclosures about the NSA resonated with Americans from day one. But they also exploded round the world.

For some, like Congresswoman Zoe Lofgren, it is a vitally important issue, one of the biggest of our time: nothing less than the defence of democracy in the digital age.

And it just keeps getting better the further you read in the story.

If you have trouble remembering all the various outrages of the NSA as they dribbled out over the past several months, this is a great summary of the leaks and the debates surrounding them.

Do keep in mind that surveillance has not slowed one bit, nor is there any reason to think the NSA will obey any future restrictions.

