Archive for the ‘NVIDIA’ Category

New Nvidia Resources – Data Science Bowl [Topology and Aligning Heart Images?]

Thursday, January 28th, 2016

New Resources Available to Help Participants by Pauline Essalou.

From the post:

Hungry for more help? NVIDIA can feed your passion and fuel your progress.

The free course includes lecture recordings and hands-on exercises. You’ll learn how to design, train, and integrate neural network-powered artificial intelligence into your applications using widely-used open source frameworks and NVIDIA software.

Visit NVIDIA at:

For access to the hands-on labs for free, you’ll need to register, using the promo code KAGGLE, at:

With weeks to go until the March 7 stage one deadline and stage two data release deadline, there’s still plenty of time for participants to take advantage of these tools and continue to submit solutions. Visit the Data Science Bowl Resources page for a complete listing of free resources.

If you aren’t already competing, the challenge in brief:

Declining cardiac function is a key indicator of heart disease. Doctors determine cardiac function by measuring end-systolic and end-diastolic volumes (i.e., the size of one chamber of the heart at the beginning and middle of each heartbeat), which are then used to derive the ejection fraction (EF). EF is the percentage of blood ejected from the left ventricle with each heartbeat. Both the volumes and the ejection fraction are predictive of heart disease. While a number of technologies can measure volumes or EF, Magnetic Resonance Imaging (MRI) is considered the gold standard test to accurately assess the heart’s squeezing ability.

The challenge with using MRI to measure cardiac volumes and derive ejection fraction, however, is that the process is manual and slow. A skilled cardiologist must analyze MRI scans to determine EF. The process can take up to 20 minutes to complete—time the cardiologist could be spending with his or her patients. Making this measurement process more efficient will enhance doctors’ ability to diagnose heart conditions early, and carries broad implications for advancing the science of heart disease treatment.

The 2015 Data Science Bowl challenges you to create an algorithm to automatically measure end-systolic and end-diastolic volumes in cardiac MRIs. You will examine MRI images from more than 1,000 patients. This data set was compiled by the National Institutes of Health and Children’s National Medical Center and is an order of magnitude larger than any cardiac MRI data set released previously. With it comes the opportunity for the data science community to take action to transform how we diagnose heart disease.

This is not an easy task, but together we can push the limits of what’s possible. We can give people the opportunity to spend more time with the ones they love, for longer than ever before. (From:
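A note on the arithmetic in the challenge description: ejection fraction is conventionally derived from the two volumes as (EDV - ESV) / EDV, expressed as a percentage. Below is a minimal Python sketch of that formula. The function name and the example volumes are my own illustrations and are not taken from the competition data.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return ejection fraction (percent) from end-diastolic and end-systolic volumes in mL."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and the end-diastolic volume")
    # Fraction of the filled volume that is ejected during the beat.
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Illustrative volumes only (hypothetical, not from the data set).
print(f"EF = {ejection_fraction(120.0, 50.0):.1f}%")
```

The competition itself asks for the two volumes rather than EF, so a sketch like this only matters once the volumes have been estimated.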

Unlike the servant with the one talent, Nvidia isn’t burying its talent under a basket. It is spreading access to its information as far as possible, in contrast to editorial writers at the New England Journal of Medicine.

Care to guess who is going to have the greater impact on cardiology and medicine?

I forgot to mention that Nietzsche described the editorial page writers of the New England Journal of Medicine quite well when he said, “…they tell the proper time and make a modest noise when doing so….” (Of Scholars).

I first saw this in a tweet by Kirk D. Borne.

PS: Kirk pointed to Image Preprocessing: The Challenges and Approach by Peter VanMaasdam today.

Are you surprised that the data is dirty? 😉

I’m not a professional mathematician, but what if you created a common topology for hearts and then treated the different measurements for each one as dimensions?

I say that having recently read: Quantum algorithms for topological and geometric analysis of data by Seth Lloyd, Silvano Garnerone & Paolo Zanardi. Nature Communications 7, Article number: 10138 doi:10.1038/ncomms10138, Published 25 January 2016.

Whether you have a quantum computer or not, given the small size of the heart data set, some of those methods might be applicable.

Unless my memory fails me, the entire GPU Gems series is online at Nvidia and has several chapters on topological methods.

Good luck!

DIGITS: Deep Learning GPU Training System

Friday, March 20th, 2015

DIGITS: Deep Learning GPU Training System by Allison Gray.

From the post:

The hottest area in machine learning today is Deep Learning, which uses Deep Neural Networks (DNNs) to teach computers to detect recognizable concepts in data. Researchers and industry practitioners are using DNNs in image and video classification, computer vision, speech recognition, natural language processing, and audio recognition, among other applications.

The success of DNNs has been greatly accelerated by using GPUs, which have become the platform of choice for training these large, complex DNNs, reducing training time from months to only a few days. The major deep learning software frameworks have incorporated GPU acceleration, including Caffe, Torch7, Theano, and CUDA-Convnet2. Because of the increasing importance of DNNs in both industry and academia and the key role of GPUs, last year NVIDIA introduced cuDNN, a library of primitives for deep neural networks.

Today at the GPU Technology Conference, NVIDIA CEO and co-founder Jen-Hsun Huang introduced DIGITS, the first interactive Deep Learning GPU Training System. DIGITS is a new system for developing, training and visualizing deep neural networks. It puts the power of deep learning into an intuitive browser-based interface, so that data scientists and researchers can quickly design the best DNN for their data using real-time network behavior visualization. DIGITS is open-source software, available on GitHub, so developers can extend or customize it or contribute to the project.

Apologies for the delay in seeing Allison’s post but at least I saw it before the weekend!

In addition to a great write-up, Allison walks through how she has used DIGITS. In terms of “onboarding” to software, it doesn’t get any better than this.

What are you going to apply DIGITS to?

I first saw this in a tweet by Christian Rosnes.

How to run the Caffe deep learning vision library…

Wednesday, October 29th, 2014

How to run the Caffe deep learning vision library on Nvidia’s Jetson mobile GPU board by Pete Warden.

From the post:

Jetson board (photo by Gareth Halfacree)

My colleague Yangqing Jia, creator of Caffe, recently spent some free time getting the framework running on Nvidia’s Jetson board. If you haven’t heard of the Jetson, it’s a small development board that includes Nvidia’s TK1 mobile GPU chip. The TK1 is starting to appear in high-end tablets, and has 192 cores so it’s great for running computational tasks like deep learning. The Jetson’s a great way to get a taste of what we’ll be able to do on mobile devices in the future, and it runs Ubuntu so it’s also an easy environment to develop for.

Caffe comes with a pre-built ‘Alexnet’ model, a version of the Imagenet-winning architecture that recognizes 1,000 different kinds of objects. Using this as a benchmark, the Jetson can analyze an image in just 34ms! Based on this table I’m estimating it’s drawing somewhere around 10 or 11 watts, so it’s power-intensive for a mobile device but not too crazy.

Yangqing passed along his instructions, and I’ve checked them on my own Jetson, so here’s what you need to do to get Caffe up and running.

Hardware fun for the middle of your week!

192 cores for under $200, plus GPU experience.
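To put the quoted benchmark in other units, here is a rough back-of-the-envelope sketch in Python. It uses only the figures from Pete’s post (34 ms per image, an estimated 10 to 11 watts) and assumes, as my own simplification, that images are processed back to back at that power draw.

```python
# Figures quoted in the post; the power draw is Pete Warden's estimate.
latency_s = 0.034            # 34 ms to analyze one image
power_watts = (10.0, 11.0)   # estimated range of power draw

# Throughput if images are processed one after another (assumption).
images_per_second = 1.0 / latency_s

# Energy per image = power * time, for the low and high power estimates.
energy_per_image_j = tuple(w * latency_s for w in power_watts)

print(f"{images_per_second:.1f} images per second")
print(f"{energy_per_image_j[0]:.2f} to {energy_per_image_j[1]:.2f} joules per image")
```

That works out to roughly 29 images per second at about a third of a joule per image.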

Accelerate Machine Learning with cuDNN Deep Neural Network Library

Monday, September 8th, 2014

Accelerate Machine Learning with the cuDNN Deep Neural Network Library by Larry Brown.

From the post:

Introducing cuDNN

NVIDIA cuDNN is a GPU-accelerated library of primitives for DNNs. It provides tuned implementations of routines that arise frequently in DNN applications, such as:

  • convolution
  • pooling
  • softmax
  • neuron activations, including:
    • Sigmoid
    • Rectified linear (ReLU)
    • Hyperbolic tangent (TANH)

Of course these functions all support the usual forward and backward passes. cuDNN’s convolution routines aim for performance competitive with the fastest GEMM-based (matrix multiply) implementations of such routines while using significantly less memory.

cuDNN features customizable data layouts, supporting flexible dimension ordering, striding and subregions for the 4D tensors used as inputs and outputs to all of its routines. This flexibility allows easy integration into any neural net implementation and avoids the input/output transposition steps sometimes necessary with GEMM-based convolutions.

cuDNN is thread safe, and offers a context-based API that allows for easy multithreading and (optional) interoperability with CUDA streams. This allows the developer to explicitly control the library setup when using multiple host threads and multiple GPUs, and ensure that a particular GPU device is always used in a particular host thread (for example).

cuDNN allows DNN developers to easily harness state-of-the-art performance and focus on their application and the machine learning questions, without having to write custom code. cuDNN works on Windows or Linux OSes, and across the full range of NVIDIA GPUs, from low-power embedded GPUs like Tegra K1 to high-end server GPUs like Tesla K40. When a developer leverages cuDNN, they can rest assured of reliable high performance on current and future NVIDIA GPUs, and benefit from new GPU features and capabilities in the future.

I didn’t quote the background and promotional material on machine learning or deep neural networks (DNNs), assuming that if you are interested at all, you will read the original post to pick up that material. Attention has been paid to making cuDNN “easy” to use. “Easy” is a relative term but I think you will appreciate the effort.
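If the primitives listed in the quote are unfamiliar, the sketch below shows what the forward passes of softmax and the three activations compute, in plain Python on the CPU. This is not the cuDNN API and says nothing about how cuDNN implements them; the function names are mine, and the backward passes are omitted.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x: float) -> float:
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def tanh(x: float) -> float:
    """Hyperbolic tangent activation."""
    return math.tanh(x)


def softmax(xs: List[float]) -> List[float]:
    """Softmax over a list, shifted by the maximum for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print([relu(v) for v in (-1.0, 0.5)])
print(softmax([1.0, 2.0, 3.0]))
```

A GPU library applies the same element-wise definitions across whole tensors at once, which is where the speedup comes from.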

BTW, cuDNN is free for any purpose but does require you to have a registered CUDA developer account. If you are already a registered CUDA developer, or once you have become one, see:

Caffe, a deep learning framework, has support for cuDNN in its current development branch.

I first saw this in a tweet by Mark Harris.

Jetson TK1:… [$192.00]

Friday, April 4th, 2014

Jetson TK1: Mobile Embedded Supercomputer Takes CUDA Everywhere by Mark Harris.

From the post:

Jetson TK1 is a tiny but full-featured computer designed for development of embedded and mobile applications. Jetson TK1 is exciting because it incorporates Tegra K1, the first mobile processor to feature a CUDA-capable GPU. Jetson TK1 brings the capabilities of Tegra K1 to developers in a compact, low-power platform that makes development as simple as developing on a PC.

Tegra K1 is NVIDIA’s latest mobile processor. It features a Kepler GPU with 192 cores, an NVIDIA 4-plus-1 quad-core ARM Cortex-A15 CPU, integrated video encoding and decoding support, image/signal processing, and many other system-level features. The Kepler GPU in Tegra K1 is built on the same high-performance, energy-efficient Kepler GPU architecture that is found in our high-end GeForce, Quadro, and Tesla GPUs for graphics and computing. That makes it the only mobile processor today that supports CUDA 6 for computing and full desktop OpenGL 4.4 and DirectX 11 for graphics.

Tegra K1 is a parallel processor capable of over 300 GFLOP/s of 32-bit floating point computation. Not only is that a huge achievement in a processor with such a low power footprint (Tegra K1 power consumption is in the range of 5 Watts for real workloads), but K1’s support for CUDA and desktop graphics APIs means that much of your existing compute and graphics software will compile and run largely as-is on this platform.

Are you old enough to remember looking at the mini-computers on the back of most computer zines?

And then sighing at the price tag?

Times have changed!

Order Jetson TK1 Now, just $192

Jetson TK1 is available to pre-order today for $192. In the United States, it is available from the NVIDIA website, as well as and Micro Center. See the Jetson TK1 page for details on international orders.

Some people, General Clapper comes to mind, use supercomputers to mine dots that are already connected together (phone data).

Other people create algorithms to assist users in connecting dots between diverse and disparate data sources.

You know who my money is riding on.


CUDA 6, Available as Free Download, …

Wednesday, March 5th, 2014

CUDA 6, Available as Free Download, Makes Parallel Programming Easier, Faster by George Millington.

From the post:

We’re always striving to make parallel programming better, faster and easier for developers creating next-gen scientific, engineering, enterprise and other applications.

With the latest release of the CUDA parallel programming model, we’ve made improvements in all these areas.

Available now to all developers on the CUDA website, the CUDA 6 Release Candidate is packed with several new features that are sure to please developers.

A few highlights:

  • Unified Memory – This major new feature lets CUDA applications access CPU and GPU memory without the need to manually copy data from one to the other. This is a major time saver that simplifies the programming process, and makes it easier for programmers to add GPU acceleration in a wider range of applications.
  • Drop-in Libraries – Want to instantly accelerate your application by up to 8X? The new drop-in libraries can automatically accelerate your BLAS and FFTW calculations by simply replacing the existing CPU-only BLAS or FFTW library with the new, GPU-accelerated equivalent.
  • Multi-GPU Scaling – Re-designed BLAS and FFT GPU libraries automatically scale performance across up to eight GPUs in a single node. This provides over nine teraflops of double-precision performance per node, supporting larger workloads than ever before (up to 512GB).

And there’s more.

In addition to the new features, the CUDA 6 platform offers a full suite of programming tools, GPU-accelerated math libraries, documentation and programming guides.

To keep informed about the latest CUDA developments, and to access a range of parallel programming tools and resources, we encourage you to sign up for the free CUDA/GPU Computing Registered Developer Program at the NVIDIA Developer Zone website.

The only sad note is that processing power continues to out-distance the ability to document and manipulate the semantics of data.

Not unlike having a car that can cross the North American continent in an hour but not having a map of locations between the coasts.

You arrive quickly, but is it where you wanted to go?

Map-D: A GPU Database…

Thursday, February 6th, 2014

Map-D: A GPU Database for Real-time Big Data Analytics and Interactive Visualization by Todd Mostak (map-D) and Tom Graham (map-D). (MP4)

From the description:

map-D makes big data interactive for anyone! map-D is a super-fast GPU database that allows anyone to interact and visualize streaming big data in real time. Its unique architecture runs 70-1,000x faster than other in-memory databases or big data analytics platforms. To boot, it works with any size or kind of dataset; works with data that is streaming live on to the system; uses cheap, off-the-shelf hardware; is easily is focused on learning from big data. At the moment, the map-D team is working on projects with MIT CSAIL, the Harvard Center for Geographic Analysis and the Harvard-Smithsonian Center for Astrophysics. Join Todd Mostak and Tom Graham, key members of the map-D team, as they demonstrate the speed and agility of map-D and describe the live processing, search and mapping of over 1 billion tweets.

I have been haunting the GTC On-Demand page waiting for this to be posted.

I had to download the MP4. (Approximately 124 MB) Suspect they are creating a lot of traffic at the GTC On-Demand page.

As a bonus, see also:

Map-D: GPU-Powered Databases and Interactive Social Science Research in Real Time by Tom Graham (Map_D) and Todd Mostak (Map_D) (streaming) or PDF.

From the description:

Map-D (Massively Parallel Database) uses multiple NVIDIA GPUs to interactively query and visualize big data in real-time. Map-D is an SQL-enabled column store that generates 70-400X speedups over other in-memory databases. This talk discusses the basic architecture of the system, the advantages and challenges of running queries on the GPU, and the implications of interactive and real-time big data analysis in the social sciences and beyond.

Suggestions of more links/papers on Map-D greatly appreciated!


PS: Just so you aren’t too shocked, the Twitter demo involves scanning a billion-row database in 5 milliseconds.
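For a sense of scale, the scan rate implied by those two numbers can be worked out in a couple of lines of Python. This is only arithmetic on the figures quoted above; I have no details on what each row contains or how the scan is distributed across GPUs.

```python
rows = 1_000_000_000      # one billion rows, as stated for the demo
scan_seconds = 0.005      # 5 milliseconds

# Implied aggregate scan rate in rows per second.
rows_per_second = rows / scan_seconds

print(f"{rows_per_second:.1e} rows per second")
```

That is on the order of two hundred billion rows per second.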

Hardware for Big Data, Graphs and Large-scale Computation

Wednesday, January 15th, 2014

Hardware for Big Data, Graphs and Large-scale Computation by Rob Farber.

From the post:

Recent announcements by Intel and NVIDIA indicate that massively parallel computing with GPUs and Intel Xeon Phi will no longer require passing data via the PCIe bus. The bad news is that these standalone devices are still in the design phase and are not yet available for purchase. Instead of residing on the PCIe bus as a second-class system component like a disk or network controller, the new Knights Landing processor announced by Intel at ISC’13 will be able to run as a standalone processor just like a Sandy Bridge or any other multi-core CPU. Meanwhile, NVIDIA’s release of native ARM compilation in CUDA 5.5 provides a necessary next step toward Project Denver, which is NVIDIA’s integration of a 64-bit ARM processor and a GPU. This combination, termed a CP-GP (or ceepee-geepee) in the media, can leverage the energy savings and performance of both architectures.

Of course, the NVIDIA strategy also opens the door to the GPU acceleration of mobile phone and other devices in the ARM dominated low-power, consumer and real-time markets. In the near 12- to 24-month timeframe, customers should start seeing big-memory standalone systems based on Intel and NVIDIA technology that only require power and a network connection. The need for a separate x86 computer to host one or more GPU or Intel Xeon Phi coprocessors will no longer be a requirement.

The introduction of standalone GPU and Intel Xeon Phi devices will affect the design decisions made when planning the next generation of leadership class supercomputers, enterprise data center procurements, and teraflop/s workstations. It also will affect the software view in programming these devices, because the performance limitations of the PCIe bus and the need to work with multiple memory spaces will no longer be compulsory.

Rob provides a great peek at the hardware that is coming and at current high performance computing, in particular the processing of graphs.

Resources mentioned in Rob’s post without links:

Rob’s Intel Xeon Phi tutorial at Dr. Dobbs:

Programming Intel’s Xeon Phi: A Jumpstart Introduction

CUDA vs. Phi: Phi Programming for CUDA Developers

Getting to 1 Teraflop on the Intel Phi Coprocessor

Numerical and Computational Optimization on the Intel Phi

Rob’s GPU Technology Conference presentations:

Simplifying Portable Killer Apps with OpenACC and CUDA-5 Concisely and Efficiently.

Clicking GPUs into a Portable, Persistent and Scalable Massive Data Framework.

(The links are correct but put you one presentation below Rob’s. Scroll up one. Sorry. It was that or use an incorrect link to put you at the right location.)

mpgraph (part of XDATA)

Other resources you may find of interest:

Rob Farber – Dr. Dobbs – Current article listing.

Hot-Rodding Windows and Linux App Performance with CUDA-Based Plugins by Rob Farber (with source code for Windows and Linux).

Rob Farber’s wiki: (Warning: The site seems to be flaky. If it doesn’t load, try again.)

OpenCL (Khronos)

Rob Farber’s Code Project tutorials:

(Part 9 was published in February of 2012. Some updating may be necessary.)


Thursday, October 17th, 2013

cudaMap: a GPU accelerated program for gene expression connectivity mapping by Darragh G McArt, Peter Bankhead, Philip D Dunne, Manuel Salto-Tellez, Peter Hamilton, Shu-Dong Zhang.


BACKGROUND: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping.

RESULTS: cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance.

CONCLUSION: Emerging ‘omics’ technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from

Or to put that in lay terms, the goal is to establish the connections between human diseases, genes that underlie them and drugs that treat them.

Going from several days to ten (10) minutes is quite a gain in performance.
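To put rough numbers on that, here is a small Python sketch of the implied speedups. The single-signature figures come from the abstract (more than 2 hours versus just over thirty seconds). The abstract only says “several days” for the multi-signature case, so the three days used below is my own assumption for illustration.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the old running time to the new running time."""
    return before_seconds / after_seconds


# Single gene signature: "> 2h" with sscMap vs. "just over thirty seconds" with cudaMap.
single_signature = speedup(2 * 3600, 30)

# Multiple signatures: "several days" vs. "as little as 10 minutes".
# Three days is an assumed stand-in for "several days".
assumed_days = 3
multiple_signatures = speedup(assumed_days * 24 * 3600, 10 * 60)

print(f"single signature: roughly {single_signature:.0f}x")
print(f"multiple signatures (assuming {assumed_days} days): roughly {multiple_signatures:.0f}x")
```

Both inputs are approximate, so treat the results as orders of magnitude rather than benchmarks.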

This is processing of experimental data but is it a window into techniques for scaling topic maps?

I first saw this in a tweet by Stefano Bertolo.

Third Age of Computing?

Monday, August 26th, 2013

The ‘third era’ of app development will be fast, simple, and compact by Rik Myslewski.

From the post:

The tutorial was conducted by members of the HSA – heterogeneous system architecture – Foundation, a consortium of SoC vendors and IP designers, software companies, academics, and others including such heavyweights as ARM, AMD, and Samsung. The mission of the Foundation, founded last June, is “to make it dramatically easier to program heterogeneous parallel devices.”

As the HSA Foundation explains on its website, “We are looking to bring about applications that blend scalar processing on the CPU, parallel processing on the GPU, and optimized processing of DSP via high bandwidth shared memory access with greater application performance at low power consumption.”

Last Thursday, HSA Foundation president and AMD corporate fellow Phil Rogers provided reporters with a pre-briefing on the Hot Chips tutorial, and said the holy grail of transparent “write once, use everywhere” programming for shared-memory heterogeneous systems appears to be on the horizon.

According to Rogers, heterogeneous computing is nothing less than the third era of computing, the first two being the single-core era and the multi-core era. In each era of computing, he said, the first programming models were hard to use but were able to harness the full performance of the chips.


Exactly how HSA will get there is not yet fully defined, but a number of high-level features are accepted. Unified memory addressing across all processor types, for example, is a key feature of HSA. “It’s fundamental that we can allocate memory on one processor,” Rogers said, “pass a pointer to another processor, and execute on that data – we move the compute rather than the data.”

Rik does a deep dive, with references ranging from the HSA Programmer’s Reference Manual to Project Sumatra, which aims to bring data-parallel algorithms to Java 9 (2015).

The only discordant note is that Nvidia and Intel are both missing from the HSA Foundation. Invited but not present.

Customers of Nvidia and/or Intel (I’m both) should contact Nvidia (Contact us) and Intel (contact us) and urge them to join the HSA Foundation. And pass this request along.

Sharing of memory is one of the advantages of HSA (heterogeneous systems architecture), and it is where the semantics of shared data will come to the fore.

I haven’t read the available HSA documents in detail, but the HSA Programmer’s Reference Manual appears to presume that shared data has only one semantic. (It never says that but that is my current impression.)

We have seen that the semantics of data is not “transparent.” The same demonstration illustrates that data doesn’t always have the same semantic.

Simply because I am pointed to a particular memory location, there is no reason to presume I should approach that data with the same semantics.

For example, what if I have a Social Security Number (SSN)? In processing that number for the Social Security Administration, it may serve to recall claim history, eligibility, etc. If I am accessing the same data to compare it to SSN records maintained by the Federal Bureau of Investigation (FBI), it may no longer be a unique identifier in the same sense as at the SSA.

Same “data,” but different semantics.

Who you gonna call? Topic Maps!

PS: Perhaps not as part of the running code but to document the semantics you are using to process data. Same data, same memory location, multiple semantics.
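A rough sketch of what documenting those semantics alongside the data might look like, in Python. Everything here is hypothetical: the `Semantics` class, the `SEMANTICS` table, the `describe` helper and the placeholder number are my own illustrations, not part of HSA or of any topic map standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Semantics:
    """How one processing context interprets a shared value (hypothetical)."""
    context: str
    meaning: str
    unique_identifier: bool


# One value in one memory location (placeholder, not a real SSN).
ssn = "000-00-0000"

# The same value, documented under two different semantics.
SEMANTICS = {
    "SSA": Semantics("SSA", "key to claim history and eligibility", True),
    "FBI": Semantics("FBI", "attribute to compare against bureau records", False),
}


def describe(value: str, context: str) -> str:
    """Report how the given context reads the shared value."""
    s = SEMANTICS[context]
    role = "a unique identifier" if s.unique_identifier else "not a unique identifier"
    return f"{value} in {s.context}: {s.meaning} ({role})"


for ctx in SEMANTICS:
    print(describe(ssn, ctx))
```

The value never changes; only the documented interpretation does, which is the point of the SSN example.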

Introduction to Parallel Programming

Friday, May 3rd, 2013

Introduction to Parallel Programming by John Owens, David Luebke, Cheng-Han Lee and Mike Roberts. (UDACITY)

Class Summary:

Learn the fundamentals of parallel computing with the GPU and the CUDA programming environment! In this class, you’ll learn about parallel programming by coding a series of image processing algorithms, such as you might find in Photoshop or Instagram. You’ll be able to program and run your assignments on high-end GPUs, even if you don’t own one yourself.

What Should I Know?

We expect students to have a solid experience with the C programming language and basic knowledge of data structures and algorithms.

What Will I Learn?

You’ll master the fundamentals of massively parallel computing by using CUDA C/C++ to program modern GPUs. You’ll learn the GPU programming model and architecture, key algorithms and parallel programming patterns, and optimization techniques. Your assignments will illustrate these concepts through image processing applications, but this is a parallel computing course and what you learn will translate to any application domain. Most of all we hope you’ll learn how to think in parallel.

In Fast Database Emerges from MIT Class… [Think TweetMap] you read about a new SQL database based on GPUs.

What new approach is going to emerge from your knowing more about GPUs and parallel programming?

Cloud-Hosted GPUs And Gaming-As-A-Service

Friday, May 18th, 2012

Cloud-Hosted GPUs And Gaming-As-A-Service by Humayun.

From the post:

NVIDIA is all buckled up to redefine the dynamics of gaming. The company has spilled the beans over three novel cloud technologies aimed at accelerating the available remote computational power by endorsing the number-crunching potential of its very own (and redesigned) graphical processing units.

At the heart of each of the three technologies lies the latest Kepler GPU architecture, custom-tailored for utility in volumetric datacenters. Through virtualization software, a number of users achieve access through the cutting-edge computational capability of the GPUs.

Jen-Hsun Huang, NVIDIA’s president and CEO, firmly believes that the Kepler cloud GPU technology is bound to take cloud computing to an entirely new level. He advocates that the GPU has become a significant constituent of contemporary computing devices. Digital artists are essentially dependent upon the GPU for conceptualizing their thoughts. Touch devices owe a great deal to the GPU for delivering a streamlined graphical experience.

With the introduction of the cloud GPU, NVIDIA is all set to change the game—literally. NVIDIA’s cloud-based GPU will bring an amazingly pleasant experience to gamers on a hunt to play in an untethered manner from a console or personal computer.

First in line is the NVIDIA VGX platform, an enterprise-level execution of the Kepler cloud technologies, primarily targeting virtualized desktop performance boosts. The company is hopeful that ventures will make use of this particular platform to ensure flawless remote computing and cater to the most computationally starved applications to be streamed directly to a notebook, tablet or any other mobile device variant. Jeff Brown, GM at NVIDIA’s Professional Solutions Group, is reported to have marked the VGX as the starting point for a “new era in desktop virtualization” that promises a cost-effective virtualization solution offering “an experience almost indistinguishable from a full desktop”.

Results with GPUs have been encouraging, and spreading their availability through cloud-based GPUs should lead to a wider variety of experiences.

The emphasis here is on making the lives of gamers more pleasant, but one expects serious uses, such as graph processing, to not be all that far behind.