Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 16, 2016

Is Failing to Attempt to Replicate, “Just Part of the Whole Science Deal”?

Filed under: Bioinformatics,Replication,Science — Patrick Durusau @ 8:08 pm

Genomeweb posted this summary of Stuart Firestein’s op-ed on failure to replicate:

Failure to replicate experiments is just part of the scientific process, writes Stuart Firestein, author and former chair of the biology department at Columbia University, in the Los Angeles Times. The recent worries over a reproducibility crisis in science are overblown, he adds.

“Science would be in a crisis if it weren’t failing most of the time,” Firestein writes. “Science is full of wrong turns, unconsidered outcomes, omissions and, of course, occasional facts.”

Failures to repeat experiments and the struggle to figure out what went wrong has also fed a number of discoveries, he says. For instance, in 1921, biologist Otto Loewi studied beating hearts from frogs in saline baths, one with the vagus nerve removed and one with it still intact. When the solution from the heart with the nerve still there was added to the other bath, that heart also slowed, suggesting that the nerve secreted a chemical that slowed the contractions.

However, Firestein notes Loewi and other researchers had trouble replicating the results for nearly six years. But that led the researchers to find that seasons can affect physiology and that temperature can affect enzyme function: Loewi’s first experiment was conducted at night and in the winter, while the follow-up ones were done during the day in heated buildings or on warmer days. This, he adds, also contributed to the understanding of how synapses fire, a finding for which Loewi shared the 1936 Nobel Prize.

“Replication is part of [the scientific] process, as open to failure as any other step,” Firestein adds. “The mistake is to think that any published paper or journal article is the end of the story and a statement of incontrovertible truth. It is a progress report.”

To appreciate my concerns, you will need to read Firestein’s comments in full: just part of the scientific process.

For example, Firestein says:


Absolutely not. Science is doing what it always has done — failing at a reasonable rate and being corrected. Replication should never be 100%. Science works beyond the edge of what is known, using new, complex and untested techniques. It should surprise no one that things occasionally come out wrong, even though everything looks correct at first.

I don’t know, would you say an 85% failure-to-replicate rate is significant? See Drug development: Raise standards for preclinical cancer research, C. Glenn Begley & Lee M. Ellis, Nature 483, 531–533 (29 March 2012), doi:10.1038/483531a. Or that over half of psychology studies fail? Over half of psychology studies fail reproducibility test. Just to name two studies on replication.

I think we can agree with Firestein that replication isn’t at 100%, but at what level are attempts to replicate being made?

From what Firestein says,

“Replication is part of [the scientific] process, as open to failure as any other step,” Firestein adds. “The mistake is to think that any published paper or journal article is the end of the story and a statement of incontrovertible truth. It is a progress report.”

Systematic attempts at replication (and its failure) should be part and parcel of science.

Except… it obviously isn’t.

If it were, there would have been no earth-shaking announcements that fundamental cancer research experiments could not be replicated.

Failures to replicate would have been spread out over the literature and gradually resolved with better data, better methods, or both.

Failure to replicate is a legitimate part of the scientific method.

Not attempting to replicate, “I won’t look too closely at your results if you don’t look too closely at mine,” isn’t.

There’s an ugly word for avoiding looking too closely at your own results or those of others.

February 13, 2016

You Can Confirm A Gravity Wave!

Filed under: Physics,Python,Science,Signal Processing,Subject Identity,Subject Recognition — Patrick Durusau @ 5:35 pm

Unless you have been unconscious since last Wednesday, you have heard about the confirmation of Einstein’s 1916 prediction of gravitational waves.

A very incomplete list of popular reports includes:

Einstein, A Hunch And Decades Of Work: How Scientists Found Gravitational Waves (NPR)

Einstein’s gravitational waves ‘seen’ from black holes (BBC)

Gravitational Waves Detected, Confirming Einstein’s Theory (NYT)

Gravitational waves: breakthrough discovery after a century of expectation (Guardian)

For the full monty, see the LIGO Scientific Collaboration itself.

Which brings us to the iPython notebook with the gravitational wave discovery data: Signal Processing with GW150914 Open Data

From the post:

Welcome! This ipython notebook (or associated python script GW150914_tutorial.py ) will go through some typical signal processing tasks on strain time-series data associated with the LIGO GW150914 data release from the LIGO Open Science Center (LOSC):

To begin, download the ipython notebook, readligo.py, and the data files listed below, into a directory / folder, then run it. Or you can run the python script GW150914_tutorial.py. You will need the python packages: numpy, scipy, matplotlib, h5py.

On Windows, or if you prefer, you can use a python development environment such as Anaconda (https://www.continuum.io/why-anaconda) or Enthought Canopy (https://www.enthought.com/products/canopy/).

Questions, comments, suggestions, corrections, etc: email losc@ligo.org

v20160208b

Unlike the toadies at the New England Journal of Medicine (Parasitic Re-use of Data? Institutionalizing Toadyism, Addressing The Concerns Of The Selfish), the scientists who have labored for decades on the gravitational wave question are giving their data away for free!

Not only giving the data away, but striving to help others learn to use it!

Beyond simply “doing the right thing,” and setting an example for other scientists, this is a great opportunity to learn more about signal processing.

Signal processing, when you stop to think about it, is an important method of “subject identification” in a large number of domains.

Detecting a gravity wave is beyond your personal means, but with the data freely available, further analysis is a matter of interest and perseverance.
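
Before downloading the LIGO files, it may help to see the kind of operation the tutorial performs. Below is a minimal sketch, not the LOSC tutorial itself: it bandpasses a synthetic noisy signal with SciPy, the same basic step used to pull GW150914 out of the raw strain. The sample rate, band edges and signal shape are illustrative assumptions, not values from the tutorial.

```python
# Minimal signal-processing sketch (illustrative only, not the LOSC tutorial).
# A synthetic "chirp" buried in noise is recovered with a Butterworth bandpass,
# the same basic operation the GW150914 notebook applies to real strain data.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 4096                      # sample rate in Hz (assumed for illustration)
t = np.arange(0, 4, 1 / fs)    # 4 seconds of data

# Fake signal: a rising-frequency sine ("chirp") plus broadband noise.
signal = 1e-21 * np.sin(2 * np.pi * (50 + 40 * t) * t)
noise = 5e-21 * np.random.randn(t.size)
strain = signal + noise

def bandpass(data, low, high, fs, order=4):
    """Zero-phase Butterworth bandpass between low and high (Hz)."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, data)

filtered = bandpass(strain, 35.0, 350.0, fs)
print("raw rms: %.3g  filtered rms: %.3g" % (strain.std(), filtered.std()))
```

The real notebook adds whitening, spectrograms and matched filtering, but the flavor is the same: read the strain, filter out the bands dominated by noise, and look at what remains.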

February 6, 2016

Are You A Scientific Twitter User or Polluter?

Filed under: Science,Twitter — Patrick Durusau @ 11:22 am

Realscientists posted this image to Twitter:

[Image: science]

Self-Scoring Test:

In the last week, how often have you retweeted without “read[ing] the actual paper” pointed to by a tweet?

How many times did you retweet in total?

Formula: retweets w/o reading / retweets in total = % of retweets w/o reading.

No scale with superlatives because I don’t have numbers to establish a baseline for the “average” Twitter user.

I do know that I see clickbait, outdated and factually wrong material retweeted by people who know better. That’s Twitter pollution.

Ask yourself: Am I a scientific Twitter user or a polluter?

Your call.

January 27, 2016

Another Victory For Peer Review – NOT! Cowardly Science

Filed under: Chemistry,Peer Review,Science — Patrick Durusau @ 9:35 pm

Pressure on controversial nanoparticle paper builds by Anthony King.

From the post:

The journal Science has posted an expression of concern over a controversial 2004 paper on the synthesis of palladium nanoparticles, highlighting serious problems with the work. This follows an investigation by the US funding body the National Science Foundation (NSF), which decided that the authors had falsified research data in the paper, which reported that crystalline palladium nanoparticle growth could be mediated by RNA.1 The NSF’s 2013 report on the issue, and a letter of reprimand from May last year, were recently brought into the open by a newspaper article.

The chief operating officer of the NSF identified ‘an absence of care, if not sloppiness, and most certainly a departure from accepted practices’. Recommended actions included sending letters of reprimand, requiring the subjects contact the journal to make a correction and barring the two chemists from serving as a peer reviewer, adviser or consultant for the NSF for three years.

Science notes that, though the ‘NSF did not find that the authors’ actions constituted misconduct, it nonetheless concluded that there “were significant departures from research practice”.’ The NSF report noted it would no longer fund the paper’s senior authors chemists Daniel Feldheim and Bruce Eaton at the University of Colorado, Boulder, who ‘recklessly falsified research data’, unless they ‘take specific actions to address issues’ in the 2004 paper. Science said it is working with the two authors ‘to understand their response to the NSF final ruling’.

Feldheim and Eaton have been under scrutiny since 2008, when an investigation by their former employer North Carolina State University, US, concluded the 2004 paper contained falsified data. According to Retraction Watch, Science said it would retract the paper as soon as possible.

I’m not a subscriber to Science, unfortunately, but if you are, can you write to Marcia McNutt, Editor-in-Chief, to ask why findings of “recklessly falsified research data” merit only an expression of concern?

What’s with that? Concern?

In many parts of the United States, you can be murdered with impunity for DWB, Driving While Black, but you can falsify research data and only merit an expression of “concern” from Science?

Not to mention that the NSF doesn’t think that falsifying research evidence is “misconduct.”

The NSF needs to document what it thinks “misconduct” means. I don’t think it means what they think it means.

Every profession has bad apples, but what is amazing in this case is the public kid-glove handling of known falsifiers of evidence.

What is required for a swift and effective response against scientific misconduct?

Vivisection of human babies?

Or would that only count if they failed to have a petty cash account and to reconcile it on a monthly basis?

November 25, 2015

Apple Watches Lower Your IQ – Still Want One For Christmas?

Filed under: Humor,Science — Patrick Durusau @ 5:43 pm

Philip Elmer-DeWitt reports Apple Watch Owners Glance at Their Wrists 60 to 80 Times a Day.

The vast majority of those uses are not to check the time.

The reports Philip summarizes say that interactions last only a few seconds but how long does it take to break your train of thought?

Which reminded me of Vanessa Loder‘s post: Why Multi-Tasking Is Worse Than Marijuana For Your IQ.

From Vanessa’s post:

What makes you more stupid – smoking marijuana, emailing while talking on the phone or losing a night’s sleep?

Researchers at the Institute of Psychiatry at the University of London studied 1,100 workers at a British company and found that multitasking with electronic media caused a greater decrease in IQ than smoking pot or losing a night’s sleep.

For those of you in Colorado, this means you should put down your phone and pick up your pipe! In all seriousness, in today’s tech heavy world, the temptation to multi-task is higher than it’s ever been. And this has become a major issue. We don’t focus and we do too many things at once. We also aren’t efficient or effective when we stay seated too long.

If a colleague gives you an Apple Watch for Christmas, be very wary.

Apple is likely to complain that my meta-comparison isn’t the same as a controlled study and I have to admit, it’s not.

If Apple wants to get one hundred people together for about a month, with enough weed, beer, snack food, PS4s, plus Apple Watches, my meta-comparison can be put to the test.

The Consumer Product Safety Commission should sponsor that type of testing.

Imagine, being a professional stoner. 😉

November 17, 2015

Building Software, Building Community: Lessons from the rOpenSci Project

Filed under: Open Source,R,Science — Patrick Durusau @ 6:36 pm

Building Software, Building Community: Lessons from the rOpenSci Project by Carl Boettiger, Scott Chamberlain, Edmund Hart, Karthik Ram.

Abstract:

rOpenSci is a developer collective originally formed in 2011 by graduate students and post-docs from ecology and evolutionary biology to collaborate on building software tools to facilitate a more open and synthetic approach in the face of the transformative rise of large and heterogeneous data. Born on the internet (the collective only began through chance discussions over social media), we have grown into a widely recognized effort that supports an ecosystem of some 45 software packages, engages scores of collaborators, has taught dozens of workshops around the world, and has secured over $480,000 in grant support. As young scientists working in an academic context largely without direct support for our efforts, we have first hand experience with most of the technical and social challenges WSSSPE seeks to address. In this paper we provide an experience report which describes our approach and success in building an effective and diverse community.

Given the state of world affairs, I can’t think of a better time for the publication of this article.

The key lesson that I urge you to draw from this paper is the proactive stance of the project in involving and reaching out to build a community around this project.

Too many projects (and academic organizations for that matter) take the approach that others know they exist and so they sit waiting for volunteers and members to queue up.

Very often they are surprised and bitter that the queue of volunteers and members is so sparse. If anyone dares to venture that more outreach might be helpful, the response is nearly always, sure, you go do that and let us know when it is successful.

How proactive are you in promoting your favorite project?

PS: The rOpenSci website.

October 27, 2015

Statistical Reporting Errors in Psychology (1985–2013) [1 in 8]

Filed under: Psychology,Science,Statistics — Patrick Durusau @ 7:50 pm

Do you remember your parents complaining about how far the latest psychology report departed from their reality?

Turns out there may be a scientific reason why those reports were as far off as your parents thought (or not).

The prevalence of statistical reporting errors in psychology (1985–2013) by Michèle B. Nuijten, Chris H. J. Hartgerink, Marcel A. L. M. van Assen, Sacha Epskamp, Jelte M. Wicherts, reports:

This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package “statcheck.” statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.

This is an open access article so dig in for all the details discovered by the authors.

The R package statcheck: Extract Statistics from Articles and Recompute P Values is quite amazing. The manual for statcheck should have you up and running in short order.
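
statcheck itself is an R package, but the core consistency check it automates is easy to illustrate. Here is a rough Python sketch, using my own assumed rounding tolerance rather than statcheck’s actual decision rules: recompute a two-tailed p-value from a reported t statistic and degrees of freedom, and flag a mismatch with the reported p.

```python
# Rough Python illustration of the kind of consistency check statcheck automates
# (statcheck itself is an R package; the tolerance here is an assumption, not its rule).
from scipy import stats

def check_t_report(t_value, df, reported_p, tol=0.01):
    """Recompute a two-tailed p from t and df; flag if it disagrees with the report."""
    recomputed = 2 * stats.t.sf(abs(t_value), df)
    inconsistent = abs(recomputed - reported_p) > tol
    return recomputed, inconsistent

# Example: a paper reports t(28) = 2.10, p = .04
recomputed_p, flag = check_t_report(2.10, 28, 0.04)
print("recomputed p = %.3f, inconsistent: %s" % (recomputed_p, flag))
```

Run over every test statistic extracted from a paper, that simple comparison is enough to surface the “grossly inconsistent” p-values the authors describe.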

I did puzzle over the proposed solutions:

Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.

All of those are good suggestions but we already have the much valued process of “peer review” and the value-add of both non-profit and commercial publishers. Surely those weighty contributions to the process of review and publication should be enough to quell this “…systematic bias in favor of significant results.”

Unless, of course, dependence on “peer review” and the value-add of publishers for article quality is entirely misplaced. Yes?

What area with “p-values reported as significant” will fall to statcheck next?

October 13, 2015

Tomas Petricek on The Against Method

Filed under: Language,Science,Scientific Computing — Patrick Durusau @ 1:57 pm

Tomas Petricek on The Against Method by Tomas Petricek.

From the webpage:

How is computer science research done? What we take for granted and what we question? And how do theories in computer science tell us something about the real world? Those are some of the questions that may inspire computer scientist like me (and you!) to look into philosophy of science. I’ll present the work of one of the more extreme (and interesting!) philosophers of science, Paul Feyerabend. In “Against Method”, Feyerabend looks at the history of science and finds that there is no fixed scientific methodology and the only methodology that can encompass the rich history is ‘anything goes’. We see (not only computer) science as a perfect methodology for building correct knowledge, but is this really the case? To quote Feyerabend:

“Science is much more ‘sloppy’ and ‘irrational’ than its methodological image.”

I’ll be mostly talking about Paul Feyerabend’s “Against Method”, but as a computer scientist myself, I’ll insert a number of examples based on my experience with theoretical programming language research. I hope to convince you that looking at philosophy of science is very much worthwhile if we want to better understand what we do and how we do it as computer scientists!

The video runs an hour and about eighteen minutes but is worth every minute of it. As you can imagine, I was particularly taken with Tomas’ emphasis on the importance of language. Tomas goes so far as to suggest that disagreements about “type” in computer science stem from fundamentally different understandings of the word “type.”

I was reminded of Stanley Fish’s “Doing What Comes Naturally” (DWCN).

DWCN is a long and complex work but in brief Fish argues that we are all members of various “interpretive communities,” and that each of those communities influences how we understand language as readers. That should come as assurance to those who fear intellectual anarchy and chaos, because our interpretations are always within the context of an interpretative community.

Two caveats on Fish. As far as I know, Fish has never made the strong move and pointed out that his concept of “interpretative communities” is just as applicable to the natural sciences as it is to the social sciences. What passes as “objective” today is part and parcel of an interpretative community that has declared it so. Other interpretative communities can and do reach other conclusions.

The second caveat is more sad than useful. Post-9/11, Fish and a number of other critics who were accused of teaching cultural relativity of values felt it necessary to distance themselves from that position. While they could not say that all cultures have the same values (factually false), they did say that Western values, as opposed to those of “cowardly, murdering,” etc. others, were superior.

If you think there is any credibility to that post-9/11 position, you haven’t read enough Chomsky. 9/11 wasn’t 1/100,000 of the violence the United States has visited on civilians in other countries since the Korean War.

October 8, 2015

“Big data are not about data,” Djorgovski says. “It’s all about discovery.” [Not re-discovery]

Filed under: Astroinformatics,BigData,Science — Patrick Durusau @ 9:14 am

I first saw this quote in a tweet by Kirk Borne. It is the concluding line from George Djorgovski looks for knowledge hidden in data by Rebecca Fairley Raney.

From the post:

When you sit down to talk with an astronomer, you might expect to learn about galaxies, gravity, quasars or spectroscopy. George Djorgovski could certainly talk about all those topics.

But Djorgovski, a professor of astronomy at the California Institute of Technology, would prefer to talk about data.

The AAAS Fellow has spent more than three decades watching scientists struggle to find needles in massive digital haystacks. Now, he is director of the Center for Data-Driven Discovery at Caltech, where staff scientists are developing advanced data analysis techniques and applying them to fields as disparate as plant biology, disaster response, genetics and neurobiology.

The descriptions of the projects at the center are filled with esoteric phrases like “hyper-dimensional data spaces” and “datascape geometry.”

Astronomy was “always advanced as a digital field,” Djorgovski says, and in recent decades, important discoveries in the field have been driven by novel uses of data.

Take the discovery of quasars.

In the early 20th century, astronomers using radio telescopes thought quasars were stars. But by merging data from different types of observations, they discovered that quasars were rare objects that are powered by gas that spirals into black holes in the center of galaxies.

Quasars were discovered not by a single observation, but by a fusion of data.

It is assumed by Djorgovski and his readers that future researchers won’t have to start from scratch when researching quasars. They can, but don’t have to, re-mine all the data that supported their original discovery or their association with black holes.

Can you say the same for discoveries you make in your data? Are those discoveries preserved for others or just tossed back into the sea of big data?

Contemporary searching is a form of catch-and-release. You start with your question and, whether it takes a few minutes or an hour, you find something resembling an answer to it.

The data is then tossed back to await the next searcher who has the same or similar question.

How are you capturing your search results to benefit the next searcher?

October 4, 2015

8,400 NASA Apollo Moon Mission Photos

Filed under: Science — Patrick Durusau @ 9:18 pm

Over 8,400 NASA Apollo moon mission photos just landed online, in high-resolution by Xeni Jardin.

From the post:

Space fans, rejoice: today, just about every image captured by Apollo astronauts on lunar missions is now on the Project Apollo Archive Flickr account. There are some 8,400 photographs in all at a resolution of 1800 dpi, and they’re sorted by the roll of film they were on.

The Project Apollo Archive is also on Facebook. They’ll be showcasing new renderings of some of the best imagery, and other rare images including Apollo 11 training photos.

The Apollo astronauts were sent to the moon with Hasselblad cameras, and the resulting prints have been painstakingly restored for contemporary high-resolution screens for this wonderful archival project.

Long live space.

Tear yourself away from news feeds humming with the latest non-events accompanied by screaming headlines.

The U.S. space program did not unify everyone and there were a multitude of problems (still present) on the ground.

But, it represents what can be achieved by a government that isn’t trying to avoid blame for random and unpreventable acts.

September 24, 2015

Data Analysis for the Life Sciences… [No Ur-Data Analysis Book?]

Filed under: Data Analysis,Life Sciences,Science — Patrick Durusau @ 7:42 pm

Data Analysis for the Life Sciences – a book completely written in R markdown by Rafael Irizarry.

From the post:

Data analysis is now part of practically every research project in the life sciences. In this book we use data and computer code to teach the necessary statistical concepts and programming skills to become a data analyst. Following in the footsteps of Stat Labs, instead of showing theory first and then applying it to toy examples, we start with actual applications and describe the theory as it becomes necessary to solve specific challenges. We use simulations and data analysis examples to teach statistical concepts. The book includes links to computer code that readers can use to program along as they read the book.

It includes the following chapters: Inference, Exploratory Data Analysis, Robust Statistics, Matrix Algebra, Linear Models, Inference for High-Dimensional Data, Statistical Modeling, Distance and Dimension Reduction, Practical Machine Learning, and Batch Effects.

Have you ever wondered about the growing proliferation of data analysis books?

The absence of one Ur-Data Analysis book that everyone could read and use?

I have a longer post coming on this idea, but if each discipline needs its own view of data analysis, is it really surprising that no one system of semantics satisfies all communities?

In other words, is the evidence of heterogeneous semantics so strong that we should abandon attempts at uniform semantics and focus on communicating across systems of semantics?

I’m sure there are other examples where every niche has its own vocabulary: tables in relational databases or column headers in spreadsheets, for example.

What is your favorite example of heterogeneous semantics?

Assuming heterogeneous semantics are here to stay (they have been around since the start of human to human communication, possibly earlier), what solution do you suggest?

I first saw this in a tweet by Christophe Lalanne.

September 22, 2015

Python for Scientists [Warning – Sporadic Content Ahead]

Filed under: Programming,Python,Science — Patrick Durusau @ 10:36 am

Python for Scientists: A Curated Collection of Chapters from the O’Reilly Data and Programming Libraries

From the post:

More and more, scientists are seeing tech seep into their work. From data collection to team management, various tools exist to make your lives easier. But, where to start? Python is growing in popularity in scientific circles, due to its simple syntax and seemingly endless libraries. This free ebook gets you started on the path to a more streamlined process. With a collection of chapters from our top scientific books, you’ll learn about the various options that await you as you strengthen your computational thinking.

This free ebook includes chapters from:

  • Python for Data Analysis
  • Effective Computation in Physics
  • Bioinformatics Data Skills
  • Python Data Science Handbook

Warning: You give your name and email to the O’Reilly marketing machine and get:

Python for Data Analysis

Python Language Essentials Appendix

Effective Computation in Physics

Chapter 1: Introduction to the Command Line
Chapter 7: Analysis and Visualization
Chapter 20: Publication

Bioinformatics Data Skills

Chapter 4: Working with Remote Machines
Chapter 5: Git for Scientists

Python Data Science Handbook

Chapter 3: Introduction to NumPy
Chapter 4: Introduction to Pandas

The content present is very good. The content missing is vast.

August 25, 2015

Looking for Big Data? Look Up!

Filed under: Astroinformatics,BigData,Science — Patrick Durusau @ 5:02 pm

Gaia’s first year of scientific observations

From the post:

After launch on 19 December 2013 and a six-month long in-orbit commissioning period, the satellite started routine scientific operations on 25 July 2014. Located at the Lagrange point L2, 1.5 million km from Earth, Gaia surveys stars and many other astronomical objects as it spins, observing circular swathes of the sky. By repeatedly measuring the positions of the stars with extraordinary accuracy, Gaia can tease out their distances and motions through the Milky Way galaxy.

For the first 28 days, Gaia operated in a special scanning mode that sampled great circles on the sky, but always including the ecliptic poles. This meant that the satellite observed the stars in those regions many times, providing an invaluable database for Gaia’s initial calibration.

At the end of that phase, on 21 August, Gaia commenced its main survey operation, employing a scanning law designed to achieve the best possible coverage of the whole sky.

Since the start of its routine phase, the satellite recorded 272 billion positional or astrometric measurements, 54.4 billion brightness or photometric data points, and 5.4 billion spectra.

The Gaia team have spent a busy year processing and analysing these data, en route towards the development of Gaia’s main scientific products, consisting of enormous public catalogues of the positions, distances, motions and other properties of more than a billion stars. Because of the immense volumes of data and their complex nature, this requires a huge effort from expert scientists and software developers distributed across Europe, combined in Gaia’s Data Processing and Analysis Consortium (DPAC).

In case you missed it:

Since the start of its routine phase, the satellite recorded 272 billion positional or astrometric measurements, 54.4 billion brightness or photometric data points, and 5.4 billion spectra.

It sounds like big data. Yes? 😉

Public release of the data is pending. Check back at the Gaia homepage for the latest news.

July 2, 2015

Collaborative Annotation for Scientific Data Discovery and Reuse [+ A Stumbling Block]

Filed under: Annotation,Data Models,Ontology,Science,Subject Headings,Taxonomy — Patrick Durusau @ 2:49 pm

Collaborative Annotation for Scientific Data Discovery and Reuse by Kirk Borne.

From the post:

The enormous growth in scientific data repositories requires more meaningful indexing, classification and descriptive metadata in order to facilitate data discovery, reuse and understanding. Meaningful classification labels and metadata can be derived autonomously through machine intelligence or manually through human computation. Human computation is the application of human intelligence to solving problems that are either too complex or impossible for computers. For enormous data collections, a combination of machine and human computation approaches is required. Specifically, the assignment of meaningful tags (annotations) to each unique data granule is best achieved through collaborative participation of data providers, curators and end users to augment and validate the results derived from machine learning (data mining) classification algorithms. We see very successful implementations of this joint machine-human collaborative approach in citizen science projects such as Galaxy Zoo and the Zooniverse (http://zooniverse.org/).

In the current era of scientific information explosion, the big data avalanche is creating enormous challenges for the long-term curation of scientific data. In particular, the classic librarian activities of classification and indexing become insurmountable. Automated machine-based approaches (such as data mining) can help, but these methods only work well when the classification and indexing algorithms have good training sets. What happens when the data includes anomalous patterns or features that are not represented in the training collection? In such cases, human-supported classification and labeling become essential – humans are very good at pattern discovery, detection and recognition. When the data volumes reach astronomical levels, it becomes particularly useful, productive and educational to crowdsource the labeling (annotation) effort. The new data objects (and their associated tags) then become new training examples, added to the data mining training sets, thereby improving the accuracy and completeness of the machine-based algorithms.
….

Kirk goes on to say:

…it is incumbent upon science disciplines and research communities to develop common data models, taxonomies and ontologies.

Sigh, but we know from experience that has never worked. True, we can develop more common data models, taxonomies and ontologies, but they will be in addition to the present common data models, taxonomies and ontologies. Not to mention that developing knowledge is going to lead to future common data models, taxonomies and ontologies.

If you don’t believe me, take a look at: Library of Congress Subject Headings Tentative Monthly List 07 (July 17, 2015). These subject headings have not yet been approved but they are in addition to existing subject headings.

The most recent approved list: Library of Congress Subject Headings Monthly List 05 (May 18, 2015). For approved lists going back to 1997, see: Library of Congress Subject Headings (LCSH) Approved Lists.

Unless you are working in some incredibly static and sterile field, the basic terms that are found in “common data models, taxonomies and ontologies” are going to change over time.

The only sure bet in the area of knowledge and its classification is that change is coming.

But, Kirk is right, common data models, taxonomies and ontologies are useful. So how do we make them more useful in the face of constant change?

Why not use topics to model the elements/terms of common data models, taxonomies and ontologies? That would enable users to search across such elements/terms by the properties of those topics, possibly discovering topics that represent the same subject under a different term or element.

Imagine working on an update of a common data model, taxonomy or ontology and not having to guess at the meaning of bare elements or terms, with a wealth of information, including previous elements/terms for the same subject, present at each topic.

All of the benefits that Kirk claims would accrue, plus empowering users who only know previous common data models, taxonomies and ontologies, to say nothing of easing the transition to future common data models, taxonomies and ontologies.
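
To make the suggestion concrete, here is a minimal sketch of the idea, not a topic map implementation, and with invented vocabularies and terms: each topic carries the names a subject goes by in different vocabularies plus a few properties, and a lookup finds the topic no matter which vocabulary’s term you start from.

```python
# Minimal sketch of modeling taxonomy/ontology terms as topics (illustration only;
# the vocabularies and terms below are invented, not from any real ontology).
topics = [
    {
        "id": "subject-001",
        "names": {                      # the same subject under different vocabularies
            "vocab-A": "heart attack",
            "vocab-B": "myocardial infarction",
            "vocab-legacy": "MI",
        },
        "properties": {"definition": "necrosis of heart muscle from loss of blood supply"},
    },
]

def find_topic(term):
    """Return every topic that uses `term` as a name in any vocabulary."""
    term = term.lower()
    return [t for t in topics
            if any(name.lower() == term for name in t["names"].values())]

# A user who only knows the legacy vocabulary still lands on the same topic.
for topic in find_topic("MI"):
    print(topic["id"], topic["names"], topic["properties"]["definition"])
```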

Knowledge isn’t static. Our methodologies for knowledge classification should be as dynamic as the knowledge we seek to classify.

July 1, 2015

Digital Data Repositories in Chemistry…

Filed under: Cheminformatics,Chemistry,Publishing,Science,Subject Identifiers — Patrick Durusau @ 4:36 pm

Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks by Matthew J. Harvey, Nicholas J. Mason, Henry S. Rzepa.

Abstract:

We discuss the concept of recasting the data-rich scientific journal article into two components, a narrative and separate data components, each of which is assigned a persistent digital object identifier. Doing so allows each of these components to exist in an environment optimized for purpose. We make use of a poorly-known feature of the handle system for assigning persistent identifiers that allows an individual data file from a larger file set to be retrieved according to its file name or its MIME type. The data objects allow facile visualization and retrieval for reuse of the data and facilitates other operations such as data mining. Examples from five recently published articles illustrate these concepts.

A very promising effort to integrate published content and electronic notebooks in chemistry. It is encouraging that, in addition to the technical and identity issues, the authors also point out the lack of incentives for the extra work required to achieve useful integration.

Everyone agrees that deeper integration of resources in the sciences will be a game-changer, but renewing the realization that there is no such thing as a free lunch is an important step towards that goal.

This article easily repays a close read with interesting subject identity issues and the potential that topic maps would offer to such an effort.

June 29, 2015

ChemistryWorld Podcasts: Compounds (Phosgene)

Filed under: Cheminformatics,Chemistry,Science — Patrick Durusau @ 2:36 pm

Chemistry in its elements: Compounds is a weekly podcast sponsored by ChemistryWorld, which features a chemical compound or group of compounds every week.

Matthew Gunter has a podcast entitled: Phosgene.

In case your recent history is a bit rusty, phosgene was one of the terror weapons of World War I. It accounted for 85% of the 100,000 deaths from chemical gas. Not as effective as, say, sarin, but no slouch.

Don’t run to the library, online guides or the FBI for recipes to make phosgene at home. Its use in industrial applications should give you a clue as to an alternative to home-made phosgene. Use of phosgene violates the laws of war, so being a thief as well should not trouble you.

No, I don’t have a list of locations that make or use phosgene, but then DHS probably doesn’t either. They are more concerned with terrorists using “nuclear weapons” or “gamma-ray bursts“. One is mechanically and technically difficult to do well and the other is impossible to control.

The idea of someone using a dual-wheel pickup and a plant pass to pickup and deliver phosgene gas is too simple to have occurred to them.

If you are pitching topic maps to a science/chemistry oriented audience, these podcasts make a nice starting point for expansion. To date there are two hundred and forty-two (242) of them.

Enjoy!

June 25, 2015

Eidyia (Scientific Python)

Filed under: Python,Science — Patrick Durusau @ 2:30 pm

Eidyia

From the webpage:

A scientific Python 3 environment configured with Vagrant. This environment is designed to be used by professionals and students, with ease of access a priority.

Libraries included:

Databases

Eidyia also includes MongoDB and PostgreSQL

Getting Started

With Vagrant and VirtualBox installed:

Watch out for the Vagrant link on the GitHub page; it is broken. The correct link appears above. (I am posting an issue about the link to GitHub.)

The more experience I have with virtual environments, the more I like them. Mostly from a configuration perspective. I don’t have to worry about library upgrades stepping on other programs, port confusion, etc.

Enjoy!

May 31, 2015

The peer review drugs don’t work [Faith Based Science]

Filed under: Peer Review,Publishing,Science,Social Sciences — Patrick Durusau @ 10:44 am

The peer review drugs don’t work by Richard Smith.

From the post:

It is paradoxical and ironic that peer review, a process at the heart of science, is based on faith not evidence.

There is evidence on peer review, but few scientists and scientific editors seem to know of it – and what it shows is that the process has little if any benefit and lots of flaws.

Peer review is supposed to be the quality assurance system for science, weeding out the scientifically unreliable and reassuring readers of journals that they can trust what they are reading. In reality, however, it is ineffective, largely a lottery, anti-innovatory, slow, expensive, wasteful of scientific time, inefficient, easily abused, prone to bias, unable to detect fraud and irrelevant.

As Drummond Rennie, the founder of the annual International Congress on Peer Review and Biomedical Publication, says, “If peer review was a drug it would never be allowed onto the market.”

Cochrane reviews, which gather systematically all available evidence, are the highest form of scientific evidence. A 2007 Cochrane review of peer review for journals concludes: “At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research.”

We can see before our eyes that peer review doesn’t work because most of what is published in scientific journals is plain wrong. The most cited paper in Plos Medicine, which was written by Stanford University’s John Ioannidis, shows that most published research findings are false. Studies by Ioannidis and others find that studies published in “top journals” are the most likely to be inaccurate. This is initially surprising, but it is to be expected as the “top journals” select studies that are new and sexy rather than reliable. A series published in The Lancet in 2014 has shown that 85 per cent of medical research is wasted because of poor methods, bias and poor quality control. A study in Nature showed that more than 85 per cent of preclinical studies could not be replicated, the acid test in science.

I used to be the editor of the BMJ, and we conducted our own research into peer review. In one study we inserted eight errors into a 600 word paper and sent it to 300 reviewers. None of them spotted more than five errors, and a fifth didn’t detect any. The median number spotted was two. These studies have been repeated many times with the same result. Other studies have shown that if reviewers are asked whether a study should be published there is little more agreement than would be expected by chance.

As you might expect, the humanities are lagging far behind the sciences in acknowledging that peer review is an exercise in social status rather than quality:


One of the changes I want to highlight is the way that “peer review” has evolved fairly quietly during the expansion of digital scholarship and pedagogy. Even though some scholars, such as Kathleen Fitzpatrick, are addressing the need for new models of peer review, recognition of the ways that this process has already been transformed in the digital realm remains limited. The 2010 Center for Studies in Higher Education (hereafter cited as Berkeley Report) comments astutely on the conventional role of peer review in the academy:

Among the reasons peer review persists to such a degree in the academy is that, when tied to the venue of a publication, it is an efficient indicator of the quality, relevance, and likely impact of a piece of scholarship. Peer review strongly influences reputation and opportunities. (Harley, et al 21)

These observations, like many of those presented in this document, contain considerable wisdom. Nevertheless, our understanding of peer review could use some reconsideration in light of the distinctive qualities and conditions associated with digital humanities.
…(Living in a Digital World: Rethinking Peer Review, Collaboration, and Open Access by Sheila Cavanagh.)

Can you think of another area where something akin to peer review is being touted?

What about internal guidelines of the CIA, NSA, FBI and secret courts reviewing actions by those agencies?

How do those differ from peer review, which is an acknowledged failure in science and should be acknowledged in the humanities?

They are quite similar in the sense that some secret group is empowered to make decisions that impact others and members of those groups, don’t want to relinquish those powers. Surprise, surprise.

Peer review should be scrapped across the board and replaced by tracked replication and use by others, both in the sciences and the humanities.

Government decisions should be open to review by all its citizens and not just a privileged few.

May 27, 2015

Ephemeral identifiers for life science data

Filed under: Identifiers,Life Sciences,Science — Patrick Durusau @ 1:55 pm

10 Simple rules for design, provision, and reuse of persistent identifiers for life science data by Julie A. McMurray, et al. (35 others).

From the introduction:

When we interact, we use names to identify things. Usually this works well, but there are many familiar pitfalls. For example, the “morning star” and “evening star” are both names for the planet Venus. “The luminiferous ether” is a name for an entity which no one still thinks exists. There are many women named “Margaret”, some of whom go by “Maggie” and some of whom have changed their surnames. We use everyday conversational mechanisms to work around these problems successfully. Naming problems have plagued the life sciences since Linnaeus pondered the Norway spruce; in the much larger conversation that underlies the life sciences, problems with identifiers (Box 1) impede the flow and integrity of information. This is especially challenging within “synthesis research” disciplines such as systems biology, translational medicine, and ecology. Implementation-driven initiatives such as ELIXIR, BD2K, and others (Text S1) have therefore been actively working to understand and address underlying problems with identifiers.

Good, global-scale, persistent identifier design is harder than it appears, and is essential for data to be Findable, Accessible, Interoperable, and Reusable (Data FAIRport principles [1]). Digital entities (e.g., files), physical entities (e.g., biosamples), and descriptive entities (e.g., ‘mitosis’) have different requirements for identifiers. Identifiers are further complicated by imprecise terminology and different forms (Box 1).

Of the identifier forms, Local Resource Identifiers (LRI) and their corresponding full Uniform Resource Identifiers (URIs) are still among the most commonly used and most problematic identifiers in the bio-data ecosystem. Other forms of identifiers such as Uniform Resource Name (URNs) are less impactful because of their current lack of uptake. Here, we build on emerging conventions and existing general recommendations [2,3] and summarise the identifier characteristics most important to optimising the flow and integrity of life-science data (Table 1). We propose actions to take in the identifier ‘green field’ and offer guidance for using real-world identifiers from diverse sources.

Truth be told, global, persistent identifier design is overreaching.

First, some identifiers are more widely used than others, but there are no globally accepted identifiers of any sort.

Second, “persistent” is undefined. Present identifiers (curies or URIs) have not persisted pre-Web identifiers. On what basis would you claim that future generations will persist our identifiers?

However, systems expect to be able to make references by single, opaque identifiers, and so the hunt goes on for a single identifier.

The more robust and in fact persistent approach is to have a bag of identifiers for any subject, where each identifier itself has a bag of properties associated with it.

That avoids the exclusion of old identifiers, and hence historical records, and avoids pre-exclusion of future identifiers, which come into use long after our identifier is no longer the most popular one.

Systems can continue to use a single identifier, locally as it were but software where semantic integration is important, should use sets of identifiers to facilitate integration across data sources.
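
As a rough sketch of what a “bag of identifiers” looks like in software, with invented identifiers and records rather than real life-science IDs: each subject carries a set of identifiers, and records from different sources are merged onto whichever subject already holds their identifier.

```python
# Minimal sketch of integrating records via sets of identifiers (identifiers and
# records below are invented for illustration, not real life-science IDs).
subjects = [
    {"identifiers": {"db-A:1234", "db-B:XYZ-9", "legacy:OLD-77"},
     "properties": {"label": "example gene"}},
]

def merge_record(record_id, record_props):
    """Attach a record to the subject that already carries its identifier,
    or start a new subject if none does."""
    for subject in subjects:
        if record_id in subject["identifiers"]:
            subject["properties"].update(record_props)
            return subject
    subject = {"identifiers": {record_id}, "properties": dict(record_props)}
    subjects.append(subject)
    return subject

# Two sources use different identifiers for the same subject; both land in one place.
merge_record("db-B:XYZ-9", {"source_b_score": 0.87})
merge_record("db-A:1234", {"source_a_note": "reviewed"})
print(subjects[0]["identifiers"], subjects[0]["properties"])
```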

May 24, 2015

LOFAR Transients Pipeline (“TraP”)

Filed under: Astroinformatics,Python,Science — Patrick Durusau @ 5:41 pm

LOFAR Transients Pipeline (“TraP”)

From the webpage:

The LOFAR Transients Pipeline (“TraP”) provides a means of searching a stream of N-dimensional (two spatial, frequency, polarization) image “cubes” for transient astronomical sources. The pipeline is developed specifically to address data produced by the LOFAR Transients Key Science Project, but may also be applicable to other instruments or use cases.

The TraP codebase provides the pipeline definition itself, as well as a number of supporting routines for source finding, measurement, characterization, and so on. Some of these routines are also available as stand-alone tools.

High-level overview

The TraP consists of a tightly-coupled combination of a “pipeline definition” – effectively a Python script that marshals the flow of data through the system – with a library of analysis routines written in Python and a database, which not only contains results but also performs a key role in data processing.

Broadly speaking, as images are ingested by the TraP, a Python-based source-finding routine scans them, identifying and measuring all point-like sources. Those sources are ingested by the database, which associates them with previous measurements (both from earlier images processed by the TraP and from other catalogues) to form a lightcurve. Measurements are then performed at the locations of sources which were expected to be seen in this image but which were not detected. A series of statistical analyses are performed on the lightcurves constructed in this way, enabling the quick and easy identification of potential transients. This process results in two key data products: an archival database containing the lightcurves of all point-sources included in the dataset being processed, and community alerts of all transients which have been identified.

Exploiting the results of the TraP involves understanding and analysing the resulting lightcurve database. The TraP itself provides no tools directly aimed at this. Instead, the Transients Key Science Project has developed the Banana web interface to the database, which is maintained separately from the TraP. The database may also be interrogated by end-user developed tools using SQL.

While it uses the term “association,” I think you will conclude it is much closer to merging in a topic map sense:

The association procedure knits together (“associates”) the measurements in extractedsource which are believed to originate from a single astronomical source. Each such source is given an entry in the runningcatalog table which ties together all of the measurements by means of the assocxtrsource table. Thus, an entry in runningcatalog can be thought of as a reference to the lightcurve of a particular source.
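
As a rough, much-simplified sketch of that association step, not TraP’s actual database procedure: each new measurement is attached to the nearest running-catalog source within a matching radius, otherwise it starts a new source, and the accumulated measurements form the lightcurve. The radius and the flat, planar distance are simplifying assumptions on my part.

```python
# Rough sketch of position-based source association into lightcurves (a simplification;
# TraP's real association uses a database and more careful statistics).
import math

MATCH_RADIUS_DEG = 0.01          # assumed matching radius, not TraP's actual value
running_catalog = []             # each entry: {"ra", "dec", "lightcurve": [...]}

def associate(measurement):
    """Attach a measurement to the nearest catalogued source within the radius,
    or create a new running-catalog entry for it."""
    best, best_dist = None, MATCH_RADIUS_DEG
    for source in running_catalog:
        dist = math.hypot(measurement["ra"] - source["ra"],
                          measurement["dec"] - source["dec"])
        if dist <= best_dist:
            best, best_dist = source, dist
    if best is None:
        best = {"ra": measurement["ra"], "dec": measurement["dec"], "lightcurve": []}
        running_catalog.append(best)
    best["lightcurve"].append((measurement["time"], measurement["flux"]))
    return best

associate({"ra": 123.456, "dec": -12.345, "time": 0.0, "flux": 1.2})
associate({"ra": 123.457, "dec": -12.346, "time": 1.0, "flux": 1.4})  # same source
print(len(running_catalog), running_catalog[0]["lightcurve"])
```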

Perhaps not of immediate use but good reading and a diversion from corruption, favoritism, oppression and other usual functions of government.

May 4, 2015

“The ultimate goal is evidence-based data analysis”

Filed under: Science,Statistics — Patrick Durusau @ 1:30 pm

Statistics: P values are just the tip of the iceberg by Jeffrey T. Leek & Roger D. Peng.

From the summary:

Ridding science of shoddy statistics will require scrutiny of every step, not merely the last one, say Jeffrey T. Leek and Roger D. Peng.

From the post:

[Image: p-value1]

Leek and Peng are right but I would shy away from ever claiming “…evidence-based data analysis.”

You can disclose the choices you make at every stage of the data pipeline but the result isn’t “…evidence-based data analysis.”

I say that because “…evidence-based data analysis” implies that whatever the result, human agency wasn’t a factor in it. On the contrary, an ineffable part of human judgement is a part of every data analysis.

The purpose of documenting the details of each step is to enable discussion and debate about the choices made in the process.

Just as I object to politicians wrapping themselves in national flags, I equally object to anyone wrapping themselves in “evidence/facts” as though they and only they possess them.

May 1, 2015

Replication in Psychology?

Filed under: Peer Review,Psychology,Researchers,Science — Patrick Durusau @ 8:28 pm

First results from psychology’s largest reproducibility test by Monya Baker.

From the post:

An ambitious effort to replicate 100 research findings in psychology ended last week — and the data look worrying. Results posted online on 24 April, which have not yet been peer-reviewed, suggest that key findings from only 39 of the published studies could be reproduced.

But the situation is more nuanced than the top-line numbers suggest (See graphic, ‘Reliability test’). Of the 61 non-replicated studies, scientists classed 24 as producing findings at least “moderately similar” to those of the original experiments, even though they did not meet pre-established criteria, such as statistical significance, that would count as a successful replication.

The project, known as the “Reproducibility Project: Psychology”, is the largest of a wave of collaborative attempts to replicate previously published work, following reports of fraud and faulty statistical analysis as well as heated arguments about whether classic psychology studies were robust. One such effort, the ‘Many Labs’ project, successfully reproduced the findings of 10 of 13 well-known studies3.

Replication is a “hot” issue and likely to get hotter if peer review shifts to be “open.”

Do you really want to be listed as a peer reviewer for a study that cannot be replicated?

Perhaps open peer review will lead to more accountability of peer reviewers.

Yes?

April 13, 2015

An experimental bird migration visualization

Filed under: CartoDB,Cartography,Mapping,Science — Patrick Durusau @ 9:36 am

Time Integrated Multi-Altitude Migration Patterns by Wouter Van den Broeck, Jan Klaas Van Den Meersche, Kyle Horton, and Sérgio Branco.

From the webpage:

The Problem

Every year hundreds of millions of birds migrate to and from their wintering and breeding grounds, often traveling hundreds, if not thousands of kilometers twice a year. Many of these individuals make concentrated movements under the cover of darkness, and often at high altitudes, making it exceedingly difficult to precisely monitor the passage of these animals.

However one tool, radar, has the ability to measure the mass flow of migrants both day and night at a temporal and spatial resolution that cannot be matched by any other monitoring tool. Weather surveillance radars such as those of the EUMETNET/OPERA and NEXRAD networks continually monitor and collect data in real-time, monitoring meteorological phenomena, but also biological scatters (birds, bats, and insects). For this reason radar offers a unique tool for collecting large-scale data on biological movements. However, visualizing these data in a comprehensive manner that facilitates insight acquisition, remains a challenge.

Our contribution

To help tackle this challenge, the European Network for the Radar Surveillance of Animal Movement (ENRAM) organized the Bird Migration Visualization Challenge & Hackathon in March 2015 with the support of the European Cooperation in Science and Technology (COST) programme. We participated and explored a particular approach.

Using radar measures of bioscatter (birds, bats, and insects), algorithms can estimate the density, speed, and direction of migration movement at different altitudes around a radar. By interpolating these data both spatially and temporally, and mapping these geographically in the form of flow lines, a visualization might be obtained that offers insights in the migration patterns when applied to a large-scale dataset. The result is an experimental interactive web-based visualization that dynamically loads data from the given case study served by the CartoDB system.

Impressive work with both static and interactive visualizations!

Enjoy!

April 9, 2015

Building upon the Current Capabilities of WWT

Filed under: Astroinformatics,Science — Patrick Durusau @ 6:09 pm

Building upon the Current Capabilities of WWT

From the post:

WWT to GitHub

WorldWide Telescope is a complex system that supports a wide variety of research, education and outreach activities.  By late 2015, the Windows and HTML5/JavaScript code needed to run WWT will be available in a public (Open Source) GitHub repository. As code moves through the Open Sourcing process during 2015, the OpenWWT web site (www.openwwt.org) will offer updated details appropriate for a technical audience, and contact links for additional information.

Leveraging and Extending WorldWide Telescope

The open WorldWide Telescope codebase will provide new ways of leveraging and extending WWT functionality in the future.  WWT is already friendly to data and reuse thanks to its extant software development kits, and its ability to import data through both the user interface and “WTML” (WWT’s XML based description language to add data into WWT).  The short listing below gives some examples of how data can be accessed, displayed, and explained using WWT as it presently is. Most of these capabilities are demonstrated quickly in the “What Can WorldWide Telescope Do for Me?” video at tinyurl.com/wwt-for-me. The www.worldwidetelescope.org/Developers/ site offers resources useful to developers, and details beyond those offered below.

Creating Tours

What you can do: You can create a variety of tours with WWT. The tour-authoring interface allows tour creators to guide tour viewers through the Universe by positioning a virtual camera in various slides, and WWT animates the between-slide transitions automatically. Tour creators can also add their own images, data, text, music, voice over and other media to enhance the message. Buttons, images and other elements can link to other Tours, ultimately allowing tour viewers to control their own paths. Tour functionality can be used to create Kiosks, menu-driven multimedia content, presentations, training and quizzing interactives and self-service data exploration. In addition to their educational value, tours can be particularly useful in collaborative research projects, where researchers can narrate and/or annotate various views of data.  Tour files are typically small enough to be exchanged easily by email or cloud services. Tours that follow a linear storyline can also be output to high quality video frames for professional quality video production at any resolution desired. Tours can also be hosted in a website to create interactive web content.

Skills Required: WWT tours are one of the most powerful aspects of WWT, and creating them doesn’t require any programing skills. You should know what story you want to tell and understand presentation and layout skills. If you can make a PowerPoint presentation then you should be able to make a WWT tour.  The WorldWide Telescope Ambassadors (outreach-focused) website provides a good sample of Tours, at wwtambassadors.org/tours, and a good tour to experience to see the largest number of tour features in use all at once is “John Huchra’s Universe,” at wwtambassadors.org/john-huchras-universe.  A sample tour-based kiosk is online at edukiosks.harvard.edu.  A video showing a sample research tour (meant for communication with collaborators) is at tinyurl.com/morenessies.

That is just a sample of the news from the WorldWide Telescope!
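Since the post mentions WTML, WWT's XML-based description language for adding data, here is a hedged sketch of generating a minimal collection file with Python's standard library. The element and attribute names (Folder, Place, RA, Dec, ZoomLevel) are written from memory of the public WTML format and should be checked against the WorldWide Telescope documentation before use.

# Hedged sketch of generating a minimal WTML-style collection file.
# Element/attribute names are assumptions based on the public WTML format
# and should be verified against the WorldWide Telescope documentation.
import xml.etree.ElementTree as ET

folder = ET.Element("Folder", Name="My Collection", Group="Explorer")
ET.SubElement(
    folder,
    "Place",
    Name="Andromeda Galaxy (M31)",
    RA="0.712",          # right ascension in hours (assumed convention)
    Dec="41.269",        # declination in degrees
    ZoomLevel="10",
)

ET.ElementTree(folder).write("my_collection.wtml", xml_declaration=True, encoding="utf-8")
# If the attribute names are right, opening my_collection.wtml in WWT
# should add a clickable place marker for M31.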

The popular press keeps bleating about "big data," some of which will be useful and some of which will not. But imagine a future in which data from every government-supported scientific experiment is streamed online at the time of acquisition. It won't be just "big data" but "data that makes a difference." As decades of data accumulate, synthetic analysis can be performed on all of the available data, not just the snippet you were able to collect.

Hopefully even private experiments will be required to contribute their data. Facts are facts and not subject to ownership. Private entities can produce products subject to patents, but knowledge itself should be patent-free.

April 7, 2015

Exploring the Unknown Frontier of the Brain

Filed under: Neural Information Processing,Neural Networks,Neuroinformatics,Science — Patrick Durusau @ 1:33 pm

Exploring the Unknown Frontier of the Brain by James L. Olds.

From the post:

To a large degree, your brain is what makes you… you. It controls your thinking, problem solving and voluntary behaviors. At the same time, your brain helps regulate critical aspects of your physiology, such as your heart rate and breathing.

And yet your brain — a nonstop multitasking marvel — runs on only about 20 watts of energy, the same wattage as an energy-saving light bulb.

Still, for the most part, the brain remains an unknown frontier. Neuroscientists don’t yet fully understand how information is processed by the brain of a worm that has several hundred neurons, let alone by the brain of a human that has 80 billion to 100 billion neurons. The chain of events in the brain that generates a thought, behavior or physiological response remains mysterious.

Building on these and other recent innovations, President Barack Obama launched the Brain Research through Advancing Innovative Neurotechnologies Initiative (BRAIN Initiative) in April 2013. Federally funded in 2015 at $200 million, the initiative is a public-private research effort to revolutionize researchers’ understanding of the brain.

James reviews currently funded efforts under the BRAIN Initiative, each of which is pursuing possible ways to explore, model and understand brain activity. Exploration in its purest sense. The researchers don’t know what they will find.

I suspect the leap from not understanding the 302 neurons in a worm to understanding the 80 to 100 billion neurons in each person isn't going to happen anytime soon. Just as well; think of all the papers, conferences and publications along the way!

April 5, 2015

Photoshopping Science? Where Was Peer Review?

Filed under: Bioinformatics,Peer Review,Science — Patrick Durusau @ 6:46 pm

Too Much to be Nothing? by Leonid Schneider.

From the post:

(March 24th, 2015) Already at an early age, Olivier Voinnet had achieved star status among plant biologists – until suspicions arose last year that more than 30 of his publications contained dubious images. Voinnet’s colleagues are shocked – and demand an explanation.

Several months ago, a small group of international plant scientists set themselves the task of combing through the relevant literature for evidence of potential data manipulation. They posted their discoveries on the post-publication peer review platform PubPeer. As one of these anonymous scientists (whose real name is known to Laborjournal/Lab Times) explained, all this detective work was accomplished simply by taking a good look at the published figures. Soon, the scientists stumbled on something unexpected: putative image manipulations in the papers of one of the most eminent scientists in the field, Sir David Baulcombe. Even more strikingly, all these suspicious publications (currently seven, including papers in Cell, PNAS and EMBO J) featured his former PhD student, Olivier Voinnet, as first or co-author.

Baulcombe’s research group at The Sainsbury Laboratory (TSL) in Norwich, England, has discovered nothing less than RNA interference (RNAi) in plants, the famous viral defence mechanism, which went on to revolutionise biomedical research as a whole and the technology of controlled gene silencing in particular. Olivier Voinnet himself also prominently contributed to this discovery, which certainly helped him, then only 33 years old, to land a research group leader position at the CNRS Institute for Plant Molecular Biology in Strasbourg, in his native country, France. During his time in Strasbourg, Voinnet won many prestigious prizes and awards, such as the ERC Starting Grant and the EMBO Young Investigator Award, plus the EMBO Gold Medal. Finally, at the end of 2010, the Swiss Federal Institute of Technology (ETH) in Zürich appointed the 38-year-old EMBO Member as Professor of RNA biology. Shortly afterwards, Voinnet was awarded the well-endowed Max Rössler Prize of the ETH.

Disturbing news from the plant sciences: evidence of photo manipulation in published articles.

The post examines the charges at length and indicates what is or is not known at this juncture. Investigations are underway, and reports from those investigations will appear in the future.

A step that could be taken now, since the articles in question (about 20) have been published, would be for the journals to disclose the peer reviewers who failed to catch the photo manipulation.

The premise of peer review is holding an author responsible for the content of their article, so it is only fair to hold peer reviewers responsible for the articles approved by their reviews.

Peer review isn't much of a gatekeeper if it is unable to discover false information, or even patterns of false information, prior to publication.

I haven’t been reading Lab Times on a regular basis but it looks like I need to correct that oversight.

March 20, 2015

Split opens up on Capitol Hill over science funding

Filed under: Government,Politics,Science — Patrick Durusau @ 5:49 pm

Split opens up on Capitol Hill over science funding by Rebecca Trager.

From the post:

A conflict several years in the making between Republican leaders in Congress and US science agencies has reached boiling point. Science advocates and researchers that depend on government grants are particularly worried now that Republicans control both chambers of Congress. They fear that science budgets will be cut and the independence of research agencies curtailed.

Their concerns have been sparked by two simultaneous developments: increasing public criticism by key Republicans of research funded by agencies like the National Science Foundation (NSF) and a congressional power shift that has placed many vocal so-called climate change sceptics and opponents of environmental regulations in positions of power. This shift has been marked by a number of top Republicans publicly questioning the value of some research based on a reading of the grant’s title and abstract.

But the problem appears to go beyond mocking apparently silly-sounding research. ‘It is not only that politicians are making fun of scientific projects that sound outlandish or impractical, they are literally rejecting science in order to gain political advantage,’ says Sean Carroll, a theoretical physicist at the California Institute of Technology, US. This could have to do with pleasing campaign contributors or constituencies, he suggests.

‘There is an attack on the actual substance of the science being done in an attempt to limit the type of science that federal agencies can do because the results of that investigation would be politically inconvenient,’ says Will White, a marine biology professor at the University of North Carolina-Wilmington, US, who has received science agency grants in the past.

An important story and one where you have a direct interest in coming to the aid of science. Science has been generating important and big data for decades. Now it is about to step up its game and the political elites want to put their oar in.

Not that science was ever the genteel, clean pursuit of truth we were taught about in elementary school; on that, see The raw politics of science by Judith Curry. Judith isn't talking about politics as in government but about the politics inside science itself.

That's important to remember because, despite their experience with the internal politics of science, scientists as a class don't seem to grasp that pointing out that politicians could be replaced by sump pumps isn't a winning strategy.

Not that I disagree; in fact, I had to invent the comparison to a "sump pump" to have something I was comfortable publishing on this blog. My actual opinion is a good deal more colorful and quite a bit less generous.

The other reason scientists are at a disadvantage is that, as a rule, politicians may have attended Ivy League colleges but have a barmaid's understanding of the world. The sole question of interest to a politician is what you are doing right now to further their interests, or, failing that, what it would take to get you to further their interests.

Scientists may discover things unknown to a politician's constituents, things that may save those constituents' lives or help reshape the future, but none of that counts as a value in the politician's universe. What matters are the uninformed opinions of the people benighted enough to elect them to public office. So long as those opinions add up to more than 50% of the voters in the district, what else is there to value?

None of what I have just said is new, original or surprising to anyone. You will hear politer and coarser versions of it as the debates over the funding of science heat up.

A politician like Sen. Rand Paul can attack the NIH only so long as he has people to donate to him, volunteer for him, vote for him, do deals with him, and so on. What if, by using big data, supporters of science could reach out to every cancer survivor who survived because of the NIH? Or to the survivors who lost a loved one because NIH-funded research found a cure too late due to budget cuts?

Do you think they would be as sympathetic to Rand Paul as before? When the blood on the knife in the back of the NIH is that of a family member? Somehow I doubt they would keep donating to Sen. Paul.

Won’t it be ironic if big data makes big government personal for everyone? Not just the top 1%.

Let’s use big data to make the 2016 election personal, very personal for everyone.

PS: I’m thinking about data sets that could be used together to create a “personal” interest in the 2016 elections. Suggestions more than welcome!
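As a purely hypothetical illustration of the kind of data set combination I have in mind, the sketch below joins an invented table of NIH-funded clinical trial sites to an invented constituent file by county, so that outreach could be aimed at people with a direct, personal stake in the NIH budget. Every file name and column name here is made up.

# Hypothetical sketch: joining NIH-related data to a constituent file by county.
# Both CSV files and all column names are invented for illustration only.
import pandas as pd

trial_sites = pd.read_csv("nih_funded_trial_sites.csv")   # columns: county_fips, condition, site_name
constituents = pd.read_csv("constituent_file.csv")        # columns: county_fips, name, email

# People living in counties with an NIH-funded cancer trial site.
cancer_counties = trial_sites.loc[
    trial_sites["condition"].str.contains("cancer", case=False), "county_fips"
].unique()

targets = constituents[constituents["county_fips"].isin(cancer_counties)]
print(f"{len(targets)} constituents live in counties with NIH-funded cancer trial sites")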

I first saw this in a tweet by Chemistry World.

March 17, 2015

World Register of Marine Introduced Species (WRIMS)

Filed under: Biology,Science — Patrick Durusau @ 1:18 pm

World Register of Marine Introduced Species (WRIMS)

From the post:

WRIMS – a database of introduced and invasive alien marine species – has officially been released to the public. It includes more than 1,400 marine species worldwide, compiled through the collaboration with international initiatives and study of almost 2,500 publications.

WRIMS lists the known alien marine species worldwide, with an indication of the region in which they are considered to be alien. In addition, the database lists whether a species is reported to have ecological or economic impacts and thus considered invasive in that area. Each piece of information is linked to a source publication or a specialist database, allowing users to retrace the information or get access to the full source for more details.

Users can search for species within specific groups, and generate species lists per geographic region, thereby taking into account their origin (alien or origin unknown or uncertain) and invasiveness (invasive, of concern, uncertain …). For each region the year of introduction or first report has been documented where available. In the past, species have sometimes erroneously been labelled as ‘alien in region X’. This information is also stored in WRIMS, clearly indicating that this was an error. Keeping track of these kinds of errors or misidentifications can greatly help researchers and policy makers in dealing with alien species.

WRIMS is a subset of the World Register of Marine Species (WoRMS): the taxonomy of the species is managed by the taxonomic editor community of WoRMS, whereas the alien-related information is managed by both the taxonomic editors and the thematic editors within WRIMS. Just like its umbrella-database WoRMS, WRIMS is dynamic: a team of editors is not only keeping track of new reports of alien species, they also scan existing literature and databases to complete the general distribution range of each alien species in WRIMS.

Are there aliens in your midst? 😉

Exactly the sort of resource that if I don’t capture it now, I will never be able to find it again.
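To capture more than a bookmark, records can also be pulled programmatically. The sketch below assumes the WoRMS REST service (WRIMS is a thematic subset of WoRMS) exposes an AphiaRecordsByName endpoint and that its response includes fields such as AphiaID, scientificname and status; both the URL and the field names are assumptions to check against the WoRMS REST documentation.

# Hedged sketch: looking up a marine species record via the WoRMS REST service
# (WRIMS is a thematic subset of WoRMS). The endpoint path and response fields
# are assumptions; verify them against the WoRMS REST documentation.
import requests

species = "Carcinus maenas"   # European green crab, a well-known marine invader
url = f"https://www.marinespecies.org/rest/AphiaRecordsByName/{species}"

resp = requests.get(url, params={"like": "false", "marine_only": "true"}, timeout=30)
resp.raise_for_status()

for record in resp.json():
    # Field names such as 'AphiaID', 'scientificname' and 'status' are assumed.
    print(record.get("AphiaID"), record.get("scientificname"), record.get("status"))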

Enjoy!

March 3, 2015

Light as Wave and Particle (Naming Issue?)

Filed under: Physics,Science — Patrick Durusau @ 2:50 pm

Scientists take the first ever photograph of light as both a wave and a particle by Kelly Dickerson.

[Image: the first photograph of light behaving as both a wave and a particle]

For the first time ever, scientists have snapped a photo of light behaving as both a wave and a particle at the same time.

The research was published on Monday in the journal Nature Communications.

Scientists know that light is a wave. That’s why light can bend around buildings and squeeze through tiny pinholes. Different wavelengths of light are why we can see different colors, and why everyone freaked out about that black and blue dress.

But all the characteristics and behaviors of a wave aren’t enough to explain everything that light does.

Naming issue?

Before this photo, light behaved either as a wave or as a particle. Now we have a photo of light exhibiting both at the same time? Neither of the old terms is sufficient by itself.

Who is going to break the news to Cyc? 😉

I first saw this in a tweet by Reg Saddler.

February 27, 2015

Banning p < .05 In Psychology [Null Hypothesis Significance Testing Procedure (NHSTP)]

Filed under: Psychology,Science — Patrick Durusau @ 4:58 pm

The recent banning of the Null Hypothesis Significance Testing Procedure (NHSTP) in psychology should be a warning to would-be data scientists that even "well established" statistical procedures may be deeply flawed.

Sorry, you may not have seen the news. In Basic and Applied Social Psychology (BASP), Banning Null Hypothesis Significance Testing Procedure (NHSTP) (2015), David Trafimow and Michael Marks write:

The Basic and Applied Social Psychology (BASP) 2014 Editorial emphasized that the null hypothesis significance testing procedure (NHSTP) is invalid, and thus authors would be not required to perform it (Trafimow, 2014). However, to allow authors a grace period, the Editorial stopped short of actually banning the NHSTP. The purpose of the present Editorial is to announce that the grace period is over. From now on, BASP is banning the NHSTP.

You may be more familiar with seeing p < .05 than with the Null Hypothesis Significance Testing Procedure (NHSTP).

In the 2014 editorial warning about NHSTP, David Trafimow cites his earlier work, Hypothesis Testing and Theory Evaluation at the Boundaries: Surprising Insights From Bayes's Theorem (2003), as justifying the non-use and later ban of NHSTP.

His argument is summarized in the introduction:

Despite a variety of different criticisms, the standard null-hypothesis significance-testing procedure (NHSTP) has dominated psychology over the latter half of the past century. Although NHSTP has its defenders when used “properly” (e.g., Abelson, 1997; Chow, 1998; Hagen, 1997; Mulaik, Raju, & Harshman, 1997), it has also been subjected to virulent attacks (Bakan, 1966; Cohen, 1994; Rozeboom, 1960; Schmidt, 1996). For example, Schmidt and Hunter (1997) argue that NHSTP is “logically indefensible and retards the research enterprise by making it difficult to develop cumulative knowledge” (p. 38). According to Rozeboom (1997), “Null-hypothesis significance testing is surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students” (p. 336). The most important reason for these criticisms is that although one can calculate the probability of obtaining a finding given that the null hypothesis is true, this is not equivalent to calculating the probability that the null hypothesis is true given that one has obtained a finding. Thus, researchers are in the position of rejecting the null hypothesis even though they do not know its probability of being true (Cohen, 1994). One way around this problem is to use Bayes’s theorem to calculate the probability of the null hypothesis given that one has obtained a finding, but using Bayes’s theorem carries with it some problems of its own, including a lack of information necessary to make full use of the theorem. Nevertheless, by treating the unknown values as variables, it is possible to conduct some analyses that produce some interesting conclusions regarding NHSTP. These analyses clarify the relations between NHSTP and Bayesian theory and quantify exactly why the standard practice of rejecting the null hypothesis is, at times, a highly questionable procedure. In addition, some surprising findings come out of the analyses that bear on issues pertaining not only to hypothesis testing but also to the amount of information gained from findings and theory evaluation. It is possible that the implications of the following analyses for information gain and theory evaluation are as important as the NHSTP debate.

The most important lines for someone who was trained with the null hypothesis as an undergraduate many years ago:

The most important reason for these criticisms is that although one can calculate the probability of obtaining a finding given that the null hypothesis is true, this is not equivalent to calculating the probability that the null hypothesis is true given that one has obtained a finding. Thus, researchers are in the position of rejecting the null hypothesis even though they do not know its probability of being true (Cohen, 1994).

If you don’t know the probability of the null hypothesis, any conclusion you draw is on very shaky grounds.
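A small worked example makes the point concrete. Applying Bayes's theorem to a hypothetical study with α = 0.05 and 80% power shows that the probability the null hypothesis is true after a "significant" result depends entirely on its prior probability, which NHSTP never supplies; the priors below are purely illustrative.

# Worked example of the Trafimow/Cohen point: P(finding | H0) is not P(H0 | finding).
# Assumes a hypothetical study with alpha = 0.05 and power = 0.80;
# the prior probabilities of the null are purely illustrative.
alpha = 0.05   # P(significant result | H0 true)
power = 0.80   # P(significant result | H0 false)

for prior_h0 in (0.2, 0.5, 0.8):
    # Bayes's theorem: P(H0 | significant) =
    #   P(sig | H0) P(H0) / [P(sig | H0) P(H0) + P(sig | not H0) P(not H0)]
    p_sig = alpha * prior_h0 + power * (1 - prior_h0)
    posterior_h0 = alpha * prior_h0 / p_sig
    print(f"P(H0)={prior_h0:.1f} -> P(H0 | p < .05) = {posterior_h0:.2f}")

# With P(H0) = 0.8 the null still has about a 20% chance of being true after a
# significant result -- far from the certainty that p < .05 seems to imply.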

Do you think any of the big data “shake-n-bake” mining/processing services are going to call that problem to your attention? True enough, such services may “empower” users but if “empowerment” means producing meaningless results, no thanks.

Trafimow cites Jacob Cohen’s The Earth is Round (p < .05) (1994) in his 2003 work. Cohen is angry and in full voice as only a senior scholar can afford to be.

Take the time to read both Trafimow and Cohen. Many errors are lurking outside your door, but reading them will help you recognize this one.
