Archive for the ‘Scientific Computing’ Category

Scipy Lecture Notes

Sunday, November 12th, 2017

Scipy Lecture Notes edited by Gaël Varoquaux, Emmanuelle Gouillart, Olav Vahtras.

From the webpage:

Tutorials on the scientific Python ecosystem: a quick introduction to central tools and techniques. The different chapters each correspond to a 1 to 2 hours course with increasing level of expertise, from beginner to expert.

In PDF format, some six hundred and fifty-seven pages of top-quality material on Scipy.

In addition to the main editors, there are fourteen chapter editors and seventy-three contributors.

Good documentation needs maintenance, so if you have improvements or examples to offer, perhaps your name will appear here in the not too distant future.

Enjoy!

Repulsion On A Galactic Scale (Really Big Data/Visualization)

Tuesday, January 31st, 2017

Newly discovered intergalactic void repels Milky Way by Roy Gal.

From the post:

For decades, astronomers have known that our Milky Way galaxy—along with our companion galaxy, Andromeda—is moving through space at about 1.4 million miles per hour with respect to the expanding universe. Scientists generally assumed that dense regions of the universe, populated with an excess of galaxies, are pulling us in the same way that gravity made Newton’s apple fall toward earth.

In a groundbreaking study published in Nature Astronomy, a team of researchers, including Brent Tully from the University of Hawaiʻi Institute for Astronomy, reports the discovery of a previously unknown, nearly empty region in our extragalactic neighborhood. Largely devoid of galaxies, this void exerts a repelling force, pushing our Local Group of galaxies through space.

Astronomers initially attributed the Milky Way’s motion to the Great Attractor, a region of a half-dozen rich clusters of galaxies 150 million light-years away. Soon after, attention was drawn to a much larger structure called the Shapley Concentration, located 600 million light-years away, in the same direction as the Great Attractor. However, there has been ongoing debate about the relative importance of these two attractors and whether they suffice to explain our motion.

The work appears in the January 30 issue of Nature Astronomy and can be found online here.

Additional images, video, and links to previous related productions can be found at http://irfu.cea.fr/dipolerepeller.

If you are looking for processing/visualization of data on a galactic scale, this work by Yehuda Hoffman, Daniel Pomarède, R. Brent Tully & Hélène M. Courtois hits the spot!

It is also a reminder that when you look up from your social media device, there is a universe waiting to be explored.

How Computers Broke Science… [Soon To Break Businesses …]

Tuesday, November 10th, 2015

How Computers Broke Science — and What We can do to Fix It by Ben Marwick.

From the post:

Reproducibility is one of the cornerstones of science. Made popular by British scientist Robert Boyle in the 1660s, the idea is that a discovery should be reproducible before being accepted as scientific knowledge.

In essence, you should be able to produce the same results I did if you follow the method I describe when announcing my discovery in a scholarly publication. For example, if researchers can reproduce the effectiveness of a new drug at treating a disease, that’s a good sign it could work for all sufferers of the disease. If not, we’re left wondering what accident or mistake produced the original favorable result, and would doubt the drug’s usefulness.

For most of the history of science, researchers have reported their methods in a way that enabled independent reproduction of their results. But, since the introduction of the personal computer — and the point-and-click software programs that have evolved to make it more user-friendly — reproducibility of much research has become questionable, if not impossible. Too much of the research process is now shrouded by the opaque use of computers that many researchers have come to depend on. This makes it almost impossible for an outsider to recreate their results.

Recently, several groups have proposed similar solutions to this problem. Together they would break scientific data out of the black box of unrecorded computer manipulations so independent readers can again critically assess and reproduce results. Researchers, the public, and science itself would benefit.

Whether you are looking for specific proposals to make computed results capable of replication or quotes to support that idea, this is a good first stop.

FYI for business analysts: how are you going to replicate the results of computer runs to establish your “due diligence” before critical business decisions?

What looked like a science or academic issue has liability implications!

Changing a few variables in a spreadsheet, or in a more complex machine learning algorithm, can make you look criminally negligent, if not criminal.

The computer illiteracy/incompetence of prosecutors and litigants is only going to last so long. Prepare defensive audit trails to enable the replication of your actual* computer-based business analysis.

*I offer advice on techniques for such audit trails. The audit trails you choose to build are up to you.
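
As a bare-bones illustration of what one entry in such a trail might capture (a sketch in Python, not a recommendation of any particular technique): hashes of the exact inputs, the parameters applied, the software environment, and a hash of the result, appended to a log that is never rewritten. The helper names (`audit_record`, `append_to_log`) and the `audit_log.jsonl` file name are hypothetical.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def audit_record(inputs: dict, params: dict, result) -> dict:
    """Build one audit-trail entry (hypothetical helper).

    inputs: mapping of input name -> the raw bytes actually used in the run
    params: the settings/variables that were applied
    result: the computed output (must be JSON-serializable)
    """
    # Serialize with sorted keys so identical results always hash identically
    result_json = json.dumps(result, sort_keys=True)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "input_sha256": {name: sha256_of_bytes(data) for name, data in inputs.items()},
        "params": params,
        "result_sha256": sha256_of_bytes(result_json.encode("utf-8")),
    }


def append_to_log(record: dict, path: str = "audit_log.jsonl") -> None:
    """Append the record as one JSON line; earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Re-running the analysis later and comparing the new `result_sha256` against the logged one is a quick check that the same inputs and parameters still produce the same answer.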

Tomas Petricek on The Against Method

Tuesday, October 13th, 2015

Tomas Petricek on The Against Method by Tomas Petricek.

From the webpage:

How is computer science research done? What we take for granted and what we question? And how do theories in computer science tell us something about the real world? Those are some of the questions that may inspire computer scientist like me (and you!) to look into philosophy of science. I’ll present the work of one of the more extreme (and interesting!) philosophers of science, Paul Feyerabend. In “Against Method”, Feyerabend looks at the history of science and finds that there is no fixed scientific methodology and the only methodology that can encompass the rich history is ‘anything goes’. We see (not only computer) science as a perfect methodology for building correct knowledge, but is this really the case? To quote Feyerabend:

“Science is much more ‘sloppy’ and ‘irrational’ than its methodological image.”

I’ll be mostly talking about Paul Feyerabend’s “Against Method”, but as a computer scientist myself, I’ll insert a number of examples based on my experience with theoretical programming language research. I hope to convince you that looking at philosophy of science is very much worthwhile if we want to better understand what we do and how we do it as computer scientists!

The video runs about an hour and eighteen minutes, but it is worth every minute. As you can imagine, I was particularly taken with Tomas’ emphasis on the importance of language. Tomas goes so far as to suggest that disagreements about “type” in computer science stem from fundamentally different understandings of the word “type.”

I was reminded of Stanley Fish’s “Doing What Comes Naturally” (DWCN).

DWCN is a long and complex work, but in brief Fish argues that we are all members of various “interpretive communities,” and that each of those communities influences how we understand language as readers. That should come as reassurance to those who fear intellectual anarchy and chaos, because our interpretations always take place within the context of an interpretative community.

Two caveats on Fish. First, as far as I know, Fish has never made the strong move of pointing out that his concept of “interpretative communities” is just as applicable to the natural sciences as it is to the social sciences. What passes as “objective” today is part and parcel of an interpretative community that has declared it so. Other interpretative communities can and do reach other conclusions.

The second caveat is more sad than useful. Post-9/11, Fish and a number of other critics who were accused of teaching cultural relativity of values felt it necessary to distance themselves from that position. While they could not say that all cultures have the same values (factually false), they did say that Western values, as opposed to those of “cowardly, murdering,” etc. others, were superior.

If you think there is any credibility to that post-9/11 position, you haven’t read enough Chomsky. 9/11 wasn’t 1/100,000 of the violence the United States has visited on civilians in other countries after the Korean War.

Scientific Computing on the Erlang VM

Tuesday, January 6th, 2015

Scientific Computing on the Erlang VM by Duncan McGreggor.

From the post:

This tutorial brings in the New Year by introducing the Erlang/LFE scientific computing library lsci – a ports wrapper of NumPy and SciPy (among others) for the Erlang ecosystem. The topic of the tutorial is polynomial curve-fitting for a given data set. Additionally, this post further demonstrates py usage, the previously discussed Erlang/LFE library for running Python code from the Erlang VM.

Background

The content of this post was taken from a similar tutorial done by the same author for the Python Lisp Hy in an IPython notebook. It, in turn, was completely inspired by the Clojure Incanter tutorial on the same subject, by David Edgar Liebke.

This content is also available in the lsci examples directory.

Introduction

The lsci library (pronounced “Elsie”) provides access to the fast numerical processing libraries that have become so popular in the scientific computing community. lsci is written in LFE but can be used just as easily from Erlang.

Just in case Erlang was among your New Year’s Resolutions. 😉

Well, that’s not the only reason. You are going to encounter data processing that was performed in systems or languages that are strange to you. Assuming you have access to the data and a sufficient explanation of what was done, you need to be able to verify the analysis in a language comfortable to you.

There isn’t now, nor is there likely to be, a shortage of languages and applications for data processing. Apologies to the various evangelists who dream of world domination for their favorite. Unless and until that happy day arrives for someone, the rest of us need to survive in a multilingual, multi-application space.

That means having the necessary tools for data analysis and verification in your favorite tool suite counts for a lot. It is the difference between taking someone’s word for an analysis and verifying the analysis for yourself. There is a world of difference between those two positions.
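
By way of a small, hedged example: the tutorial’s subject, polynomial curve-fitting, is easy to re-check in NumPy, which lsci itself wraps. The data below is made up for illustration (it is not the tutorial’s data set); the point is that `numpy.polyfit` returns least-squares coefficients, highest power first, that you can compare against whatever another system reported.

```python
import numpy as np

# Hypothetical data: samples of y = 2x^2 - 3x + 1 plus a small deterministic "noise" term
x = np.linspace(-5, 5, 21)
noise = 0.1 * np.sin(7 * x)
y = 2 * x**2 - 3 * x + 1 + noise

# Least-squares fit of a degree-2 polynomial; coefficients come back highest power first
coeffs = np.polyfit(x, y, deg=2)

# Evaluate the fitted polynomial and measure how closely it tracks the data
fitted = np.polyval(coeffs, x)
rmse = np.sqrt(np.mean((y - fitted) ** 2))

print("coefficients:", np.round(coeffs, 2))
print("RMSE:", round(float(rmse), 3))
```

If another system’s fit of the same data disagrees noticeably with these coefficients, that is the cue to ask what was actually done.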

CERN frees LHC data

Friday, November 21st, 2014

CERN frees LHC data

From the post:

Today CERN launched its Open Data Portal, which makes data from real collision events produced by LHC experiments available to the public for the first time.

“Data from the LHC program are among the most precious assets of the LHC experiments, that today we start sharing openly with the world,” says CERN Director General Rolf Heuer. “We hope these open data will support and inspire the global research community, including students and citizen scientists.”

The LHC collaborations will continue to release collision data over the coming years.

The first high-level and analyzable collision data openly released come from the CMS experiment and were originally collected in 2010 during the first LHC run. Open source software to read and analyze the data is also available, together with the corresponding documentation. The CMS collaboration is committed to releasing its data three years after collection, after they have been thoroughly studied by the collaboration.

“This is all new and we are curious to see how the data will be re-used,” says CMS data preservation coordinator Kati Lassila-Perini. “We’ve prepared tools and examples of different levels of complexity from simplified analysis to ready-to-use online applications. We hope these examples will stimulate the creativity of external users.”

In parallel, the CERN Open Data Portal gives access to additional event data sets from the ALICE, ATLAS, CMS and LHCb collaborations that have been prepared for educational purposes. These resources are accompanied by visualization tools.

All data on OpenData.cern.ch are shared under a Creative Commons CC0 public domain dedication. Data and software are assigned unique DOI identifiers to make them citable in scientific articles. And software is released under open source licenses. The CERN Open Data Portal is built on the open-source Invenio Digital Library software, which powers other CERN Open Science tools and initiatives.

Awesome is the only term for this data release!

But when you dig just a little further, you discover that embargoes still exist on three (3) out of four (4) experiments, on both data and software.

Disappointing but hopefully a dying practice when it comes to publicly funded data.

I first saw this in a tweet by Ben Evans.

Bringing researchers and developers together:…(Mozilla Science Lab)

Sunday, June 8th, 2014

Bringing researchers and developers together: a call for proposals by Bill Mills.

From the post:

Interdisciplinary Programming is looking for research projects to participate in a pilot study on bringing together the scientific and developer communities to work together on common problems to help further science on the web. This pilot will be run with the Mozilla Science Lab as a means of testing out new ways for the open science and open source community to get their hands dirty and contribute. The pilot is open to coders both within the research enterprise as well as those outside, and for all skill levels.

In this study, we’ll work to break accepted projects down to digestible tasks (think bug reports or github issues) for others to contribute to or offer guidance on. Projects can be small to mid-scale – the key here is to show how we can involve the global research and development community in furthering science on the web, while testing what the right level of engagement is. Any research-oriented software development project is eligible, with special consideration given to projects that further open, collaborative, reproducible research, and reusable tools and technology for open science.

Candidate research projects should:

  • Have a clearly stated and specific goal to achieve or problem to solve in software.
  • Be directly relevant to your ongoing or shortly upcoming research.
  • Require code that is sharable and reusable, with preference given to open source projects.
  • Science team should be prepared to communicate regularly with the software team.

Interdisciplinary Programming was the brainchild of Angelina Fabbro (Mozilla) and myself (Bill Mills, TRIUMF) that came about when we realized the rich opportunities for cross-pollination between the fields of software development and basic research. When I was a doctoral student writing analysis software for the Large Hadron Collider’s ATLAS experiment, I got to participate in one of the most exciting experiments in physics today – which made it all the more heartbreaking to watch how much precious time vanished into struggling with unusable software, and how many opportunities for great ideas had to be abandoned while we wrestled with software problems that should have been helping us instead of holding us back. If we could only capture some of the coding expertise that was out there, surely our grievously limited budgets and staff could reach far further, and do so much more.

Later, I had the great good fortune to be charged with building the user interface for TRIUMF’s upcoming GRIFFIN experiment, launching this month; thanks to Angelina, this was a watershed moment in realizing what research could do if it teamed up with the web. Angelina taught me about the incredibly rich thought the web community had in the spheres of usability, interaction design, and user experience; even my amateur first steps in this world allowed GRIFFIN to produce a powerful, elegant, web-based UI that was strides ahead of what we had before. But what really struck me, was the incredible enthusiasm coders had for research. Angelina and I spoke about our plans for Interdisciplinary Programming on the JavaScript conference circuit in late 2013, and the response was overwhelming; coders were keen to contribute ideas, participate in the discussion and even get their hands dirty with contributions to the fields that excited them; and if I could push GRIFFIN ahead just by having a peek at what web developers were doing, what could we achieve if we welcomed professional coders to the realm of research in numbers? The moment is now to start studying what we can do together.

We’ll be posting projects in early July 2014, due to conclude no later than December 2014 (shorter projects also welcome); projects anticipated to fit this scope will be given priority. In addition, the research teams should be prepared to answer a few short questions on how they feel the project is going every month or so. Interested participants should send project details to the team at mills.wj@gmail.com by June 27, 2014.

I wonder, do you think documenting semantics of data is likely to come up? 😉

Will report more news as it develops!

100 numpy exercises

Thursday, January 30th, 2014

100 numpy exercises: a joint effort of the numpy community.

The categories are:

Neophyte
Novice
Apprentice
Journeyman
Craftsman
Artisan
Adept
Expert
Master
Archmaster
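
To give a flavor of the entry-level end of such a collection, here are three small exercises of the kind you might expect at the early levels, with NumPy solutions. They were written for this note as illustrations and are not copied from the collection itself.

```python
import numpy as np

# Exercise: create a vector of 10 zeros, then set the fifth element to 1
v = np.zeros(10)
v[4] = 1  # index 4 is the fifth element (NumPy indexing is zero-based)

# Exercise: reverse a vector (the first element becomes the last)
r = np.arange(10)[::-1]

# Exercise: build a 3x3 matrix holding the values 0 through 8, row by row
m = np.arange(9).reshape(3, 3)
```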

Further on Numpy.

Enjoy!

I first saw this in a tweet by Gregory Piatetsky.

Scientific Computing and Numerical Analysis FAQ

Saturday, April 6th, 2013

Scientific Computing and Numerical Analysis FAQ

From the webpage:


Note: portions of this document may be out of date. Search the web for more recent information!

This is a summary of Internet-related resources for a handful of fields related to Scientific Computing, primarily:

  • scientific and engineering numerical computing
  • numerical analysis
  • symbolic algebra
  • statistics
  • operations research

Some parts may be out of date, but it makes an impressive starting place.

I first saw this in a tweet by Scientific Python.

Hadoop in Perspective: Systems for Scientific Computing

Saturday, January 19th, 2013

Hadoop in Perspective: Systems for Scientific Computing by Evert Lammerts.

From the post:

When the term scientific computing comes up in a conversation it’s usually just the occasional science geek who shows signs of recognition. But although most people have little or no knowledge of the field’s existence, it has been around since the second half of the twentieth century and has played an increasingly important role in many technological and scientific developments. Internet search engines, DNA analysis, weather forecasting, seismic analysis, renewable energy, and aircraft modeling are just a small number of examples where scientific computing is nowadays indispensible.

Apache Hadoop is a newcomer in scientific computing, and is welcomed as a great new addition to already existing systems. In this post I mean to give an introduction to systems for scientific computing, and I make an attempt at giving Hadoop a place in this picture. I start by discussing arguably the most important concept in scientific computing: parallel computing; what is it, how does it work, and what tools are available? Then I give an overview of the systems that are available for scientific computing at SURFsara, the Dutch center for academic IT and home to some of the world’s most powerful computing systems. I end with a short discussion on the questions that arise when there’s many different systems to choose from.

A good overview of the range of options for scientific computing, where, just as with more ordinary problems, no one solution is the best for all cases.
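
Hadoop’s place in that picture rests on the MapReduce style of data parallelism: split the input, process each piece independently (map), then combine the partial results (reduce). A toy, single-machine word count in Python shows the shape of the idea. It uses a thread pool from the standard library purely for illustration and has none of Hadoop’s distribution, fault tolerance, or data locality; for CPU-bound work in CPython, `ProcessPoolExecutor` would usually be the better swap. The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk of lines, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial word counts."""
    return a + b


def word_count(lines, n_chunks=4, workers=4):
    # Split the input into roughly equal chunks (analogous to input splits)
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Run the map step on every chunk concurrently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Combine the partial counts into the final answer
    return reduce(reduce_counts, partials, Counter())
```

Because each chunk is counted without reference to any other, the map step can be spread across as many workers (or, in Hadoop’s case, machines) as are available; only the reduce step needs to see every partial result.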

2013 Workshop on Interoperability in Scientific Computing

Friday, September 28th, 2012

2013 Workshop on Interoperability in Scientific Computing

From the post:

The 13th annual International Conference on Computational Science (ICCS 2013) will be held in Barcelona, Spain from 5th – 7th June 2013. ICCS is an ERA 2010 ‘A’-ranked conference series. For more details on the main conference, please visit www.iccs-meeting.org The 2nd Workshop on Interoperability in Scientific Computing (WISC ’13) will be co-located with ICCS 2013.

Approaches to modelling take many forms. The mathematical, computational and encapsulated components of models can be diverse in terms of complexity and scale, as well as in published implementation (mathematics, source code, and executable files). Many of these systems are attempting to solve real-world problems in isolation. However the long-term scientific interest is in allowing greater access to models and their data, and to enable simulations to be combined in order to address ever more complex issues. Markup languages, metadata specifications, and ontologies for different scientific domains have emerged as pathways to greater interoperability. Domain specific modelling languages allow for a declarative development process to be achieved. Metadata specifications enable coupling while ontologies allow cross platform integration of data.

The goal of this workshop is to bring together researchers from across scientific disciplines whose computational models require interoperability. This may arise through interactions between different domains, systems being modelled, connecting model repositories, or coupling models themselves, for instance in multi-scale or hybrid simulations. The outcomes of this workshop will be to better understand the nature of multidisciplinary computational modelling and data handling. Moreover we hope to identify common abstractions and cross-cutting themes in future interoperability research applied to the broader domain of scientific computing.

How is your topic map information product going to make the lives of scientists simpler?

Tilera’s TILE-Gx Processor Family and the Open Source Community [topic maps lab resource?]

Thursday, June 21st, 2012

Tilera’s TILE-Gx Processor Family and the Open Source Community Deliver the World’s Highest Performance per Watt to Networking, Multimedia, and the Cloud

It’s summer, and on hot afternoons it’s easy to look at all the cool stuff in online trade zines, like really high-end processors that we could stuff in our boxes to run, well, really complicated stuff, to be sure. 😉

On one hand, we should be mindful that our toys have far more processing power than the mainframes of not too long ago. So we need to step up our skill at using the excess capacity on our desktops.

On the other hand, it would be nice to have access to cutting-edge processors that will be commonplace in another cycle or two, today!

From the post:

Tilera® Corporation, the leader in 64-bit manycore general purpose processors, announced the general availability of its Multicore Development Environment™ (MDE) 4.0 release on the TILE-Gx processor family. The release integrates a complete Linux distribution including the kernel 2.6.38, glibc 2.12, GNU tool chain, more than 3000 CentOS 6.2 packages, and the industry’s most advanced manycore tools developed by Tilera in collaboration with the open source community. This release brings standards, familiarity, ease of use, quality and all the development benefits of the Linux environment and open source tools onto the TILE-Gx processor family; both the world’s highest performance and highest performance per watt manycore processor in the market. Tilera’s MDE 4.0 is available now.

“High quality software and standard programming are essential elements for the application development process. Developers don’t have time to waste on buggy and hard to program software tools, they need an environment that works, is easy and feels natural to them,” said Devesh Garg, co-founder, president and chief executive officer, Tilera. “From 60 million packets per second to 40 channels of H.264 encoding on a Linux SMP system, this release further empowers developers with the benefits of manycore processors.”

Using the TILE-Gx processor family and the MDE 4.0 software release, customers have demonstrated high performance, low latency, and the highest performance per watt on many applications. These include Firewall, Intrusion Prevention, Routers, Application Delivery Controllers, Intrusion Detection, Network Monitoring, Network Packet Brokering, Application Switching for Software Defined Networking, Deep Packet Inspection, Web Caching, Storage, High Frequency Trading, Image Processing, and Video Transcoding.

The MDE provides a comprehensive runtime software stack, including Linux kernel 2.6.38, glibc 2.12, binutil, Boost, stdlib and other libraries. It also provides full support for Perl, Python, PHP, Erlang, and TBB; high-performance kernel and user space PCIe drivers; high performance low latency Ethernet drivers; and a hypervisor for hardware abstraction and virtualization. For development tools the MDE includes standard C/C++ GNU compiler v4.4 and 4.6; an Eclipse Integrated Development Environment (IDE); debugging tools such as gdb 7 and mudflap; profiling tools including gprof, oprofile, and perf_events; native and cross build environments; and graphical manycore application debugging and profiling tools.

Should a topic maps lab offer this sort of resource to a geographically distributed set of researchers? (Just curious. I don’t have funding but should the occasion arise.)

Even with the cloud, I think topic map researchers need access to high-end architectures for experiments with data structures and processing techniques.

IFIP Working Conference on Uncertainty Quantification in Scientific Computing

Saturday, October 29th, 2011

IFIP Working Conference on Uncertainty Quantification in Scientific Computing

From the webpage:

I just came across the following presentations at the IFIP Working Conference on Uncertainty Quantification in Scientific Computing held at the Millennium Harvest House in Boulder, on August 1-4, 2011. Here are the talks and some abstracts:

I really like the title of this blog: The Robust Mathematical Modeling Blog …When modeling Reality is not an option.

I think you will find the presentations good starting points for reviewing what we know or suspect about uncertainty.
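
For readers new to the field, the simplest uncertainty-quantification technique is Monte Carlo propagation: sample the uncertain inputs from assumed distributions, push each sample through the model, and summarize the spread of the outputs. The sketch below uses a made-up model (distance fallen from rest, d = g·t²/2) with assumed input uncertainties; neither the model nor the numbers come from the conference talks.

```python
import random
import statistics


def model(g, t):
    """Toy model: distance fallen from rest after t seconds, d = g * t**2 / 2."""
    return 0.5 * g * t**2


def monte_carlo(n_samples=20000, seed=42):
    rng = random.Random(seed)  # fixed seed so the run can be reproduced exactly
    outputs = []
    for _ in range(n_samples):
        # Assumed input uncertainties (illustrative only):
        g = rng.gauss(9.81, 0.05)  # gravitational acceleration, m/s^2
        t = rng.gauss(2.0, 0.1)    # measured fall time, s
        outputs.append(model(g, t))
    outputs.sort()
    mean = statistics.mean(outputs)
    stdev = statistics.stdev(outputs)
    # Approximate empirical 95% interval read off the sorted samples
    lo = outputs[int(0.025 * n_samples)]
    hi = outputs[int(0.975 * n_samples) - 1]
    return mean, stdev, (lo, hi)
```

Here the uncertainty in t dominates the spread of d, since d depends on t squared; that kind of "which input matters most" question is a recurring theme in uncertainty quantification.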

Does anyone know of references to modeling uncertainties in the humanities?

Seems to me that our notions of subject identity should be understood along a continuum of uncertainty.