Archive for the ‘Ethics’ Category


Friday, February 12th, 2016

Manhandled by Robert C. Martin.

From the post:

Warning: Possible sexual abuse triggers.

One of my regular bike-riding podcasts is Astronomy Cast, by Dr. Pamela Gay and Fraser Cain. Indeed, if you go to you’ll see that Astronomy Cast is one of the charities on my list of favorites. Make a contribution and I will send you a green Clean Code wristband, or coffee cup, or sweatshirt. If you listen to Astronomy Cast you’ll also find that I am a sponsor.

This podcast is always about science; and the science content is quite good. It’s techie. It’s geeky. It’s right up my alley. I’ve listened to almost every one of the 399 episodes. If you like science — especially science about space and astronomy, this is a great resource.

But yesterday was different. Yesterday was episode 399; and it was not about science at all. It was entitled: Women in Science; and it was about — sexual harassment.

Not the big kind that gets reported. Not the notorious kind that gets people fired. Not that kind — though there’s enough of that to go around. No, this was about commonplace, everyday, normal sexual harassment.

Honestly, I didn’t know there was such a thing. I’ve always thought that sexual harassment was anomalous behavior perpetrated by a few disgusting, arrogant men in positions of power. It never occurred to me that sexual harassment was an everyday, commonplace, run-of-the-mill, what-else-is-new occurrence. But I listened, aghast, as I heard Dr. Gay recount tales of it. Tales of the kind of sexual harassment that women in Science regularly encounter; and have simply come to expect as a normal fact of life.

You need to read Bob’s post in full, but note in particular his concluding advice:

  • You never lay your hands on someone with sexual intent without their explicit permission. It does not matter how drunk you are. It does not matter how drunk they are. You never, ever manhandle someone without their very explicit consent. And if they work for you, or if you have power over them, then you must never make the advance, and must never accept the consent.
  • What’s more: if you see harassment in progress, or even something you suspect is harassment, you intervene! You stop it! Even if it means you’ll lose a friend, or your job, you stop it!

Bob makes those points as a matter of “professionalism” for programmers, but being considerate of others is part and parcel of being a decent human being.

The Ethical Data Scientist

Thursday, February 4th, 2016

The Ethical Data Scientist by Cathy O’Neil.

From the post:

After the financial crisis, there was a short-lived moment of opportunity to accept responsibility for mistakes within the financial community. One of the more promising pushes in this direction was when quant and writer Emanuel Derman and his colleague Paul Wilmott wrote the Modeler’s Hippocratic Oath, which nicely sums up the list of responsibilities any modeler should be aware of upon taking on the job title.

The ethical data scientist would strive to improve the world, not repeat it. That would mean deploying tools to explicitly construct fair processes. As long as our world is not perfect, and as long as data is being collected on that world, we will not be building models that are improvements on our past unless we specifically set out to do so.

At the very least it would require us to build an auditing system for algorithms. This would be not unlike the modern sociological experiment in which job applications sent to various workplaces differ only by the race of the applicant—are black job seekers unfairly turned away? That same kind of experiment can be done directly to algorithms; see the work of Latanya Sweeney, who ran experiments to look into possible racist Google ad results. It can even be done transparently and repeatedly, and in this way the algorithm itself can be tested.

The ethics around algorithms is a topic that lives only partly in a technical realm, of course. A data scientist doesn’t have to be an expert on the social impact of algorithms; instead, she should see herself as a facilitator of ethical conversations and a translator of the resulting ethical decisions into formal code. In other words, she wouldn’t make all the ethical choices herself, but rather raise the questions with a larger and hopefully receptive group.
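The paired-application audit O’Neil describes can be sketched in a few lines: query a black-box scoring function with inputs that are identical except for the protected attribute, then compare outcomes. The `score_applicant` model below is a made-up stand-in (with a deliberately planted bias so the audit has something to find), not any real system:

```python
# Minimal sketch of a paired audit: identical applications, differing only
# in the protected attribute, scored by a black-box model.

import random

def score_applicant(app):
    # Stand-in for the opaque model being audited; it (unfairly)
    # penalizes one group so the audit has something to detect.
    base = 0.5 + 0.05 * app["years_experience"]
    if app["race"] == "black":
        base -= 0.2          # the hidden bias the audit should surface
    return base

def paired_audit(n_pairs=1000):
    gaps = []
    for _ in range(n_pairs):
        years = random.randint(0, 10)
        a = {"years_experience": years, "race": "white"}
        b = {"years_experience": years, "race": "black"}  # identical otherwise
        gaps.append(score_applicant(a) - score_applicant(b))
    return sum(gaps) / len(gaps)

print(f"mean score gap: {paired_audit():.3f}")   # 0.200 for this stand-in
```

Because the audit only needs input/output access, it can be run transparently and repeatedly against a deployed algorithm, which is exactly the point made above.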

First, the link for the Modeler’s Hippocratic Oath takes you to a splash page at Wiley for Derman’s book: My Life as a Quant: Reflections on Physics and Finance.

The Financial Modelers’ Manifesto (PDF) and The Financial Modelers’ Manifesto (HTML), are valid links as of today.

I commend the entire text of The Financial Modelers’ Manifesto to you for repeated reading but for present purposes, let’s look at the Modelers’ Hippocratic Oath:

~ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.

~ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.

~ I will never sacrifice reality for elegance without explaining why I have done so.

~ Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.

~ I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.

It may just be me but I don’t see a charge being laid on data scientists to be the ethical voices in organizations using data science.

Do you see that charge?

To put it more positively, aren’t other members of the organization, accountants, engineers, lawyers, managers, etc., all equally responsible for spurring “ethical conversations?” Why is this a peculiar responsibility for data scientists?

I take a legal ethics view of the employer–employee/consultant relationship. The client is the ultimate arbiter of the goal and means of a project, once advised of their options.

Their choice may or may not be mine but I haven’t ever been hired to play the role of Jiminy Cricket.


It’s heady stuff to be responsible for bringing ethical insights to the clueless, but sometimes the clueless have ethical insights of their own, or not.

Data scientists can and should raise ethical concerns but no more or less than any other member of a project.

As you can tell from reading this blog, I have very strong opinions on a wide variety of subjects. That said, unless a client hires me to promote those opinions, the goals of the client, by any legal means, are my only concern.

PS: Before you ask, no, I would not work for Donald Trump. But that’s not an ethical decision. That’s simply being a good citizen of the world.

When back doors backfire [Uncorrected Tweet From Economist Hits 1.1K Retweets]

Sunday, January 3rd, 2016

When back doors backfire

From the post:


Push back against back doors

Calls for the mandatory inclusion of back doors should therefore be resisted. Their potential use by criminals weakens overall internet security, on which billions of people rely for banking and payments. Their existence also undermines confidence in technology companies and makes it hard for Western governments to criticise authoritarian regimes for interfering with the internet. And their imposition would be futile in any case: high-powered encryption software, with no back doors, is available free online to anyone who wants it.

Rather than weakening everyone’s encryption by exploiting back doors, spies should use other means. The attacks in Paris in November succeeded not because terrorists used computer wizardry, but because information about their activities was not shared. When necessary, the NSA and other agencies can usually worm their way into suspects’ computers or phones. That is harder and slower than using a universal back door—but it is safer for everyone else.

By my count, based on two (2) tweets from The Economist, they are running at 50% correspondence between their tweets and actual content.

You may remember my checking their tweet about immigrants yesterday, that got 304 retweets (and was wrong) in Fail at The Economist Gets 304 Retweets!.

Today I saw the When back doors backfire tweet and I followed the link to the post to see if it corresponded to the tweet.

Has anyone else been checking on tweet/story correspondence at The Economist (zine)? The Twitter account is: @TheEconomist.

I ask because no correcting tweet has appeared in @TheEconomist tweet feed. I know because I just looked at all of its tweets in chronological order.

Here is the uncorrected tweet:


As of today, the uncorrected tweet on immigrants has 1.1K retweets and 707 likes.

From the Economist article on immigrants:

Refugee resettlement is the least likely route for potential terrorists, says Kathleen Newland at the Migration Policy Institute, a think-tank. Of the 745,000 refugees resettled since September 11th, only two Iraqis in Kentucky have been arrested on terrorist charges, for aiding al-Qaeda in Iraq.

Do retweets and likes matter more than factual accuracy, even as reported in the tweeted article?

Is this a journalism ethics question?

What’s the standard journalism position on retweet-bait tweets?

Neural Networks, Recognizing Friendlies, $Billions; Friendlies as Enemies, $Priceless

Thursday, December 24th, 2015

Elon Musk merits many kudos for the recent SpaceX success.

At the same time, Elon has been nominated for Luddite of the Year, along with Bill Gates and Stephen Hawking, for fanning fears of artificial intelligence.

One favorite target for such fears is autonomous weapons systems. Hannah Junkerman annotated a list of 18 posts, articles and books on such systems for Just Security.

While moralists are wringing their hands, military forces have not let grass grow under their feet with regard to autonomous weapon systems. As Michael Carl Haas reports in Autonomous Weapon Systems: The Military’s Smartest Toys?:

Military forces that rely on armed robots to select and destroy certain types of targets without human intervention are no longer the stuff of science fiction. In fact, swarming anti-ship missiles that acquire and attack targets based on pre-launch input, but without any direct human involvement—such as the Soviet Union’s P-700 Granit—have been in service for decades. Offensive weapons that have been described as acting autonomously—such as the UK’s Brimstone anti-tank missile and Norway’s Joint Strike Missile—are also being fielded by the armed forces of Western nations. And while governments deny that they are working on armed platforms that will apply force without direct human oversight, sophisticated strike systems that incorporate significant features of autonomy are, in fact, being developed in several countries.

In the United States, the X-47B unmanned combat air system (UCAS) has been a definite step in this direction, even though the Navy is dodging the issue of autonomous deep strike for the time being. The UK’s Taranis is now said to be “merely” semi-autonomous, while the nEUROn developed by France, Greece, Italy, Spain, Sweden and Switzerland is explicitly designed to demonstrate an autonomous air-to-ground capability, as appears to be the case with Russia’s MiG Skat. While little is known about China’s Sharp Sword, it is unlikely to be far behind its competitors in conceptual terms.

The reasoning of military planners in favor of autonomous weapons systems isn’t hard to find, especially when one article describes air-to-air combat between tactically autonomous, machine-piloted aircraft and human-piloted aircraft this way:

This article claims that a tactically autonomous, machine-piloted aircraft whose design capitalizes on John Boyd’s observe, orient, decide, act (OODA) loop and energy-maneuverability constructs will bring new and unmatched lethality to air-to-air combat. It submits that the machine’s combined advantages applied to the nature of the tasks would make the idea of human-inhabited platforms that challenge it resemble the mismatch depicted in The Charge of the Light Brigade.

Here’s the author’s mock-up of a sixth-generation approach:


(Select the image to see an undistorted view of both aircraft.)

Given the strides being made on the use of neural networks, I would be surprised if they are not at the core of present and future autonomous weapons systems.

You can join the debate about the ethics of autonomous weapons, but the more practical approach is to read How to trick a neural network into thinking a panda is a vulture by Julia Evans.
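The gist of the trick in Julia’s post can be sketched with a toy model: nudge the input against the model’s gradient until the label flips. The two-feature “classifier” and its weights below are made up for illustration; real attacks (such as FGSM) do the same thing over image pixels.

```python
# Toy gradient-sign attack on a two-feature logistic classifier.
# Weights W, bias B, and the input are invented for illustration.

import math

W = [2.0, -3.0]          # toy model: p("panda") = sigmoid(w . x + b)
B = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return sigmoid(W[0] * x[0] + W[1] * x[1] + B)

def perturb(x, eps):
    # Step each feature against the sign of its weight, which is the
    # direction that most quickly lowers the "panda" score.
    return [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, W)]

x = [1.0, 0.2]                    # confidently classified as "panda"
print(predict(x))                 # > 0.5
x_adv = perturb(x, eps=0.6)       # small, targeted perturbation
print(predict(x_adv))             # < 0.5: similar input, flipped label
```

The same mechanics, scaled up to millions of pixels and a deep network, are what turn a panda into a vulture, and what could, in principle, turn a “friendly” into an “enemy.”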

Autonomous weapon systems will be developed by a limited handful of major military powers, at least at first, which means counter-measures, such as turning those weapons against their masters, will command a premium price, far more than the offensive development side. Not to mention that the market for counter-measures will be far larger.

Deception, one means of turning weapons against their users, has a long history, one early example being the tale of Esau and Jacob (Genesis, chapter 27):

11 And Jacob said to Rebekah his mother, Behold, Esau my brother is a hairy man, and I am a smooth man:

12 My father peradventure will feel me, and I shall seem to him as a deceiver; and I shall bring a curse upon me, and not a blessing.

13 And his mother said unto him, Upon me be thy curse, my son: only obey my voice, and go fetch me them.

14 And he went, and fetched, and brought them to his mother: and his mother made savoury meat, such as his father loved.

15 And Rebekah took goodly raiment of her eldest son Esau, which were with her in the house, and put them upon Jacob her younger son:

16 And she put the skins of the kids of the goats upon his hands, and upon the smooth of his neck:

17 And she gave the savoury meat and the bread, which she had prepared, into the hand of her son Jacob.

Julia’s post doesn’t cover the hard case of mistaking Jacob for Esau up close, but in a battlefield environment the equivalent of mistaking a panda for a vulture may be good enough.

The primary distinction that any autonomous weapons system must make is the friendly/enemy distinction. The term “friendly fire” was coined to cover cases where human-directed weapons systems fail to make that distinction correctly.

The historical rate of “friendly fire” or fratricide is 2%, but Mark Thompson reports in The Curse of Friendly Fire that the actual fratricide rate in the 1991 Gulf War was 24%.

#Juniper, just to name one recent federal government software failure, is evidence that robustness isn’t an enforced requirement for government software.

Apply that lack of requirements to neural networks in autonomous weapons platforms and you have the potential for both developing and defeating autonomous weapons systems.

Julia’s post leaves you a long way from defeating an autonomous weapons platform but it is a good starting place.

PS: Defeating military grade neural networks will be good training for defeating more sophisticated ones used by commercial entities.

Data Science Ethics: Who’s Lying to Hillary Clinton?

Sunday, December 20th, 2015

The usual ethics example for data science involves discrimination against some protected class. Discrimination on race, religion, ethnicity, etc., most if not all of which is already illegal.

That’s not a question of ethics, that’s a question of staying out of jail.

A better ethics example is to ask: Who’s lying to Hillary Clinton about back doors for encryption?

I ask because in the debate on December 19, 2015, Hillary says:

Secretary Clinton, I want to talk about a new terrorist tool used in the Paris attacks, encryption. FBI Director James Comey says terrorists can hold secret communications which law enforcement cannot get to, even with a court order.

You’ve talked a lot about bringing tech leaders and government officials together, but Apple CEO Tim Cook said removing encryption tools from our products altogether would only hurt law-abiding citizens who rely on us to protect their data. So would you force him to give law enforcement a key to encrypted technology by making it law?

CLINTON: I would not want to go to that point. I would hope that, given the extraordinary capacities that the tech community has and the legitimate needs and questions from law enforcement, that there could be a Manhattan-like project, something that would bring the government and the tech communities together to see they’re not adversaries, they’ve got to be partners.

It doesn’t do anybody any good if terrorists can move toward encrypted communication that no law enforcement agency can break into before or after. There must be some way. I don’t know enough about the technology, Martha, to be able to say what it is, but I have a lot of confidence in our tech experts.

And maybe the back door is the wrong door, and I understand what Apple and others are saying about that. But I also understand, when a law enforcement official charged with the responsibility of preventing attacks — to go back to our early questions, how do we prevent attacks — well, if we can’t know what someone is planning, we are going to have to rely on the neighbor or, you know, the member of the mosque or the teacher, somebody to see something.

CLINTON: I just think there’s got to be a way, and I would hope that our tech companies would work with government to figure that out. Otherwise, law enforcement is blind — blind before, blind during, and, unfortunately, in many instances, blind after.

So we always have to balance liberty and security, privacy and safety, but I know that law enforcement needs the tools to keep us safe. And that’s what I hope, there can be some understanding and cooperation to achieve.

Who do you think has told Secretary Clinton there is a way to have secure encryption and at the same time enable law enforcement access to encrypted data?

That would be a data scientist or someone posing as a data scientist. Yes?

I assume you have read: Keys Under Doormats: Mandating Insecurity by Requiring Government Access to All Data and Communications by H. Abelson, R. Anderson, S. M. Bellovin, J. Benaloh, M. Blaze, W. Diffie, J. Gilmore, M. Green, S. Landau, P. G. Neumann, R. L. Rivest, J. I. Schiller, B. Schneier, M. Specter, D. J. Weitzner.


Twenty years ago, law enforcement organizations lobbied to require data and communication services to engineer their products to guarantee law enforcement access to all data. After lengthy debate and vigorous predictions of enforcement channels “going dark,” these attempts to regulate security technologies on the emerging Internet were abandoned. In the intervening years, innovation on the Internet flourished, and law enforcement agencies found new and more effective means of accessing vastly larger quantities of data. Today, there are again calls for regulation to mandate the provision of exceptional access mechanisms. In this article, a group of computer scientists and security experts, many of whom participated in a 1997 study of these same topics, has convened to explore the likely effects of imposing extraordinary access mandates.

We have found that the damage that could be caused by law enforcement exceptional access requirements would be even greater today than it would have been 20 years ago. In the wake of the growing economic and social cost of the fundamental insecurity of today’s Internet environment, any proposals that alter the security dynamics online should be approached with caution. Exceptional access would force Internet system developers to reverse “forward secrecy” design practices that seek to minimize the impact on user privacy when systems are breached. The complexity of today’s Internet environment, with millions of apps and globally connected services, means that new law enforcement requirements are likely to introduce unanticipated, hard to detect security flaws. Beyond these and other technical vulnerabilities, the prospect of globally deployed exceptional access systems raises difficult problems about how such an environment would be governed and how to ensure that such systems would respect human rights and the rule of law.

Whether you agree on policy grounds about back doors to encryption or not, is there any factual doubt that back doors to encryption leave users insecure?

That’s an important point because Hillary’s data science advisers should have clued her in that her position is factually false. With or without a “Manhattan Project.”

Here are the ethical questions with regard to Hillary’s position on back doors for encryption:

  1. Did Hillary’s data scientist(s) tell her that access by the government to encrypted data means no security for users?
  2. What ethical obligations do data scientists have to advise public office holders or candidates that their positions are at variance with known facts?
  3. What ethical obligations do data scientists have to caution their clients when they persist in spreading mis-information, in this case about encryption?
  4. What ethical obligations do data scientists have to expose their reports to a client outlining why the client’s public position is factually false?

Many people will differ on the policy question of access to encrypted data, but that such access weakens the protection for all users is beyond reasonable doubt.
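The “forward secrecy” point from the Keys Under Doormats abstract can be illustrated with a toy sketch: one escrowed master key is a single point of failure that unlocks every recorded session, while ephemeral per-session keys limit the damage of any one leak. The stream cipher below (a SHA-256 keystream XOR) is strictly for illustration, not real cryptography.

```python
# Toy illustration of why an escrowed "back door" key undoes forward secrecy.
# keystream_xor is a demonstration cipher only; do not use for real data.

import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Derive a keystream from the key and XOR it with the data.
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

# Design A: every session encrypted under one escrowed master key.
escrow_key = secrets.token_bytes(32)
sessions = [keystream_xor(escrow_key, m) for m in (b"msg one", b"msg two")]

# Attacker obtains the escrow key: all past traffic is readable.
assert [keystream_xor(escrow_key, c) for c in sessions] == [b"msg one", b"msg two"]

# Design B: an ephemeral key per session, discarded after use.
eph_keys = [secrets.token_bytes(32) for _ in range(2)]
sessions_fs = [keystream_xor(k, m)
               for k, m in zip(eph_keys, (b"msg one", b"msg two"))]

# Leaking one session key exposes only that session; the other stays opaque.
assert keystream_xor(eph_keys[0], sessions_fs[0]) == b"msg one"
assert keystream_xor(eph_keys[0], sessions_fs[1]) != b"msg two"
```

Mandated exceptional access forces designers back toward Design A, which is the technical core of the objection, whatever one thinks of the policy.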

If data scientists want to debate ethics, at least make it about an issue with consequences. Especially for the data scientists.

Questions with no risk aren’t ethics questions, they are parlor entertainment games.

PS: Is there an ethical data scientist in the Hillary Clinton campaign?

The Moral Failure of Computer Scientists [Warning: Scam Alert!]

Sunday, December 13th, 2015

The Moral Failure of Computer Scientists by Kaveh Waddell.

From the post:

Computer scientists and cryptographers occupy some of the ivory tower’s highest floors. Among academics, their work is prestigious and celebrated. To the average observer, much of it is too technical to comprehend. The field’s problems can sometimes seem remote from reality.

But computer science has quite a bit to do with reality. Its practitioners devise the surveillance systems that watch over nearly every space, public or otherwise—and they design the tools that allow for privacy in the digital realm. Computer science is political, by its very nature.

That’s at least according to Phillip Rogaway, a professor of computer science at the University of California, Davis, who has helped create some of the most important tools that secure the Internet today. Last week, Rogaway took his case directly to a roomful of cryptographers at a conference in Auckland, New Zealand. He accused them of a moral failure: By allowing the government to construct a massive surveillance apparatus, the field had abused the public trust. Rogaway said the scientists had a duty to pursue social good in their work.

He likened the danger posed by modern governments’ growing surveillance capabilities to the threat of nuclear warfare in the 1950s, and called upon scientists to step up and speak out today, as they did then.

I spoke to Rogaway about why cryptographers fail to see their work in moral terms, and the emerging link between encryption and terrorism in the national conversation. A transcript of our conversation appears below, lightly edited for concision and clarity.

I don’t disagree with Rogaway that all science and technology is political. I might use the term social instead but I agree, there are no neutral choices.

Having said that, I do disagree that Rogaway has the standing to pre-package a political stance colored as “morals” and denounce others as “immoral” if they disagree.

It is one of the oldest tricks in rhetoric but quite often effective, which is why people keep using it.

If Rogaway is correct that CS and technology are political, then his stance for a particular take on government, surveillance and cryptography is equally political.

Not that I disagree with his stance, but I don’t consider it to be a moral choice.

Anything you can do to impede, disrupt or interfere with any government surveillance is fine by me. I won’t complain. But that’s because government surveillance, the high-tech kind, is a waste of time and effort.

Rogaway uses scientists who spoke out in the 1950s about the threat of nuclear warfare as an example. Some example.

The Federation of American Scientists estimates that as of September 2015, there are approximately 15,800 nuclear weapons in the world.

Hmmm, doesn’t sound like their moral outrage was very effective does it?

There will be sessions, presentations, conferences, along with comped travel and lodging, publications for tenure, etc., but the sum of the discussion of morality in computer science will be largely the same.

The reason for the sameness of result is that discussions, papers, resolutions and the rest aren’t nearly as important as the ethical/moral choices you make in the day-to-day practice of computer science.

Choices in the practice of computer science make a difference, discussions of fictional choices don’t. It’s really that simple.*

*That’s not entirely fair. The industry of discussing moral choices without making any of them is quite lucrative and it depletes the bank accounts of those snared by it. So in that sense it does make a difference.

Racist algorithms: how Big Data makes bias seem objective

Sunday, December 6th, 2015

Racist algorithms: how Big Data makes bias seem objective by Cory Doctorow.

From the post:

The Ford Foundation’s Michael Brennan discusses the many studies showing how algorithms can magnify bias — like the prevalence of police background check ads shown against searches for black names.

What’s worse is the way that machine learning magnifies these problems. If an employer only hires young applicants, a machine learning algorithm will learn to screen out all older applicants without anyone having to tell it to do so.

Worst of all is that the use of algorithms to accomplish this discrimination provides a veneer of objective respectability to racism, sexism and other forms of discrimination.

Cory has a good example of “hidden” bias in data analysis and has suggestions for possible improvement.

Although I applaud the notion of “algorithmic transparency,” the issue of bias in algorithms may be more subtle than you think.

Lauren J. Young reports in Computer Scientists Find Bias in Algorithms that the bias problem can be especially acute with self-improving algorithms. Algorithms, like users, have experiences, and those experiences can lead to bias.

Lauren’s article is a good introduction to the concept of bias in algorithms, but for the full monty, see: Certifying and removing disparate impact by Michael Feldman, et al.


What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process.

When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses.

We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.

Bear in mind that disparate impact is only one form of bias, for a selected set of categories, and that bias can be introduced prior to formal data analysis.

Rather than say data or algorithms can be made unbiased, say rather that known biases can be reduced to acceptable levels, for some definition of acceptable.
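For one concrete definition of “acceptable”: the disparate impact standard in the Feldman et al. abstract is usually operationalized in U.S. employment law as the EEOC “four-fifths rule.” A minimal sketch, with made-up hiring data (this is the legal threshold only, not the paper’s certification test):

```python
# Minimal disparate-impact check: the EEOC "four-fifths rule."
# Outcome lists are hypothetical: 1 = selected, 0 = rejected.

def selection_rate(outcomes):
    """Fraction of applicants selected."""
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(protected, unprotected):
    """Ratio of selection rates; values below 0.8 suggest disparate impact."""
    return selection_rate(protected) / selection_rate(unprotected)

protected_group   = [1, 0, 0, 0, 1, 0, 0, 0]   # 25% selected
unprotected_group = [1, 1, 0, 1, 1, 0, 1, 0]   # 62.5% selected

ratio = disparate_impact_ratio(protected_group, unprotected_group)
print(f"ratio = {ratio:.2f}")
print("possible disparate impact" if ratio < 0.8 else "passes four-fifths rule")
```

Note that this check only sees outcomes; it says nothing about bias introduced earlier, in data collection or feature selection, which is exactly the caveat above.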

Big Data Ethics?

Saturday, December 5th, 2015

Ethics are a popular topic in big data and related areas, as I was reminded by Sam Ransbotham’s The Ethics of Wielding an Analytical Hammer.

Here’s a big data ethics problem.

In order to select individuals based on some set of characteristics, habits, etc., we first must define the selection criteria.

Unfortunately, we don’t have a viable profile for terrorists, which explains in part why they can travel under their actual names, with their own identification, and not be stopped by the authorities.

So, here’s the ethical question: Is it ethical for contractors and data scientists to offer data mining services to detect terrorists when there is no viable profile for a terrorist?

For all the hand wringing about ethics, basic honesty seems to be in short supply when talking about big data and the search for terrorists.


Encyclopedia of Ethical Failure — Updated October 2014

Sunday, November 16th, 2014

Encyclopedia of Ethical Failure — Updated October 2014 by the Department of Defense, Office of General Counsel, Standards of Conduct Office. (Word Document)

From the introduction:

The Standards of Conduct Office of the Department of Defense General Counsel’s Office has assembled the following selection of cases of ethical failure for use as a training tool. Our goal is to provide DoD personnel with real examples of Federal employees who have intentionally or unwittingly violated the standards of conduct. Some cases are humorous, some sad, and all are real. Some will anger you as a Federal employee and some will anger you as an American taxpayer.

Please pay particular attention to the multiple jail and probation sentences, fines, employment terminations and other sanctions that were taken as a result of these ethical failures. Violations of many ethical standards involve criminal statutes. Protect yourself and your employees by learning what you need to know and accessing your Agency ethics counselor if you become unsure of the proper course of conduct. Be sure to access them before you take action regarding the issue in question. Many of the cases displayed in this collection could have been avoided completely if the offender had taken this simple precaution.

The cases have been arranged according to offense for ease of access. Feel free to reproduce and use them as you like in your ethics training program. For example – you may be conducting a training session regarding political activities. Feel free to copy and paste a case or two into your slideshow or handout – or use them as examples or discussion problems. If you have a case you would like to make available for inclusion in a future update of this collection, please email it to OSD.SOCO@MAIL.MIL or you may fax it to (703) 695-4970.

One of the things I like about the United States military is they have no illusions about being better or worse than any other large organization and they prepare accordingly. Instead of pretending they are “…shocked, shocked to find gambling…,” they are prepared for rule breaking and try to keep it in check.

If you are interested in exploring or mapping this area, you will find the U.S. Office of Government Ethics useful. Unfortunately, the “Office of Inspector General” is distinct for each agency so collating information across executive departments will be challenging. To say nothing of obtaining similar information for other branches of the United States government.

The challenge is not a technical one for a topic map, but one of data mining and analysis.
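Since each agency publishes its Inspector General material separately, collating is at bottom a merge problem. A minimal sketch of the first step, using hypothetical record fields for illustration:

```python
from collections import defaultdict

# Hypothetical case records, as they might be extracted from each
# agency's separately published ethics or Inspector General reports.
cases = [
    {"agency": "DoD", "offense": "bribery", "sanction": "termination"},
    {"agency": "DoJ", "offense": "travel fraud", "sanction": "fine"},
    {"agency": "DoD", "offense": "travel fraud", "sanction": "probation"},
]

def collate(records):
    """Group ethical-failure cases by offense across agencies,
    the first step toward comparing like cases between departments."""
    by_offense = defaultdict(list)
    for record in records:
        by_offense[record["offense"]].append(record)
    return dict(by_offense)

grouped = collate(cases)
# grouped["travel fraud"] now holds cases from both DoD and DoJ.
```

The hard part, of course, is not the grouping but extracting comparable records from dozens of differently formatted reports in the first place.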

I first saw this at Full Text Reports as Encyclopedia of Ethical Failure — Updated October 2014.

Ethics and Big Data

Monday, May 26th, 2014

Ethical research standards in a world of big data by Caitlin M. Rivers and Bryan L. Lewis.


In 2009 Ginsberg et al. reported using Google search query volume to estimate influenza activity in advance of traditional methodologies. It was a groundbreaking example of digital disease detection, and it still remains illustrative of the power of gathering data from the internet for important research. In recent years, the methodologies have been extended to include new topics and data sources; Twitter in particular has been used for surveillance of influenza-like-illnesses, political sentiments, and even behavioral risk factors like sentiments about childhood vaccination programs. As the research landscape continuously changes, the protection of human subjects in online research needs to keep pace. Here we propose a number of guidelines for ensuring that the work done by digital researchers is supported by ethical-use principles. Our proposed guidelines include:

1. Study designs using Twitter-derived data should be transparent and readily available to the public.
2. The context in which a tweet is sent should be respected by researchers.
3. All data that could be used to identify tweet authors, including geolocations, should be secured.
4. No information collected from Twitter should be used to procure more data about tweet authors from other sources.
5. Study designs that require data collection from a few individuals rather than aggregate analysis require Institutional Review Board (IRB) approval.
6. Researchers should adhere to a user’s attempt to control his or her data by respecting privacy settings.

As researchers, we believe that a discourse within the research community is needed to ensure protection of research subjects. These guidelines are offered to help start this discourse and to lay the foundations for the ethical use of Twitter data.
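Guideline 3, securing anything that could identify tweet authors, is the most mechanical of the six. A minimal sketch of one approach, with hypothetical field names rather than any particular API’s schema:

```python
import hashlib

# Fields that could identify a tweet's author (hypothetical schema).
IDENTIFYING_FIELDS = ("user_id", "screen_name", "geo", "coordinates", "place")

def scrub(tweet: dict, salt: str) -> dict:
    """Return a copy of a tweet record with identifying fields removed
    and the author replaced by a salted one-way hash, so records from
    the same author can still be grouped without naming the author."""
    clean = {k: v for k, v in tweet.items() if k not in IDENTIFYING_FIELDS}
    author = str(tweet.get("user_id", ""))
    clean["author_hash"] = hashlib.sha256((salt + author).encode()).hexdigest()
    return clean

tweet = {"user_id": 42, "screen_name": "example", "geo": (40.7, -74.0),
         "text": "feeling feverish today"}
safe = scrub(tweet, salt="per-study-secret")
# 'safe' keeps the text for analysis but drops the geolocation and names.
```

The per-study salt matters: an unsalted hash of a public user ID can be reversed by hashing every known ID, which would violate the same guideline the scrubbing is meant to serve.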

I am curious: who is going to follow this suggested code of ethics?

Without long consideration, obviously not the NSA, FBI, CIA, DoD, or any employee of the United States government.

Ditto for the security services in any country plus their governments.

Industry players are well known for their near-perfect recidivism rate on corporate crime, so don’t expect big data ethics there.

Drug cartels? Anyone shipping cocaine in multi-kilogram lots is unlikely to be interested in Big Data ethics.

That rather narrows the pool of prospective users of a code of ethics for big data, doesn’t it?

I first saw this in a tweet by Ed Yong.

Mortar’s Open Source Community

Tuesday, November 19th, 2013

Building Mortar’s Open Source Community: Announcing Public Plans by K. Young.

From the post:

We’re big fans of GitHub. There are a lot of things to like about the company and the fantastic service they’ve built. However, one of the things we’ve come to admire most about GitHub is their pricing model.

If you’re giving back to the community by making your work public, you can use GitHub for free. It’s a great approach that drives tremendous benefits to the GitHub community.

Starting today, Mortar is following GitHub’s lead in supporting those who contribute to the data science community.

If you’re improving the data science community by allowing your Mortar projects to be seen and forked by the public, we will support you by providing free access to our complete platform (including unlimited development time, up to 25 public projects, and email support). In short, you’ll pay nothing beyond Amazon Web Services’ standard Elastic MapReduce fees if you decide to run a job.

A good illustration of the difference between talking about ethics (Ethics of Big Data?) and acting ethically.

Acting ethically benefits the community.

Government grants to discuss ethics, well, you know who benefits from that.

Ethics of Big Data?

Tuesday, November 19th, 2013

The ethics of big data: A council forms to help researchers avoid pratfalls by Jordan Novet.

From the post:

Big data isn’t just something for tech companies to talk about. Researchers and academics are forming a council to analyze the hot technology category from legal, ethical, and political angles.

The researchers decided to create the council in response to a request from the National Science Foundation (NSF) for “innovation projects” involving big data.

The Council for Big Data, Ethics, and Society will convene for the first time next year, with some level of participation from the NSF. Alongside Microsoft researchers Kate Crawford and Danah Boyd, two computer-science-savvy professors will co-direct the council: Geoffrey Bowker from the University of California, Irvine, and Helen Nissenbaum of New York University.

Through “public commentary, events, white papers, and direct engagement with data analytics projects,” the council will “address issues such as security, privacy, equality, and access in order to help guard against the repetition of known mistakes and inadequate preparation,” according to a fact sheet the White House released on Tuesday.

“We’re doing all of these major investments in next-generation internet (projects), in big data,” Fen Zhao, an NSF staff associate, told VentureBeat in a phone interview. “How do we in the research-and-development phase make sure they’re aware and cognizant of any issues that may come up?”

Odd that I should encounter this just after seeing the latest NSA surveillance news.

Everyone cites the Tuskegee syphilis study as an example of research with ethical lapses.

Tuskegee is only one of many ethical lapses in American history. Hounding Native Americans to near extermination would also make the list, though that was more application than research.

It doesn’t require training in ethics to know that Tuskegee and the treatment of Native Americans were wrong.

And whatever “ethics” come out of this study are likely to resemble the definition of a prisoner of war in Geneva Convention (III), Article 4(a)(2):

(2) Members of other militias and members of other volunteer corps, including those of organized resistance movements, belonging to a Party to the conflict and operating in or outside their own territory, even if this territory is occupied, provided that such militias or volunteer corps, including such organized resistance movements, fulfill the following conditions:

(a) that of being commanded by a person responsible for his subordinates;

(b) that of having a fixed distinctive sign recognizable at a distance;

(c) that of carrying arms openly;

(d) that of conducting their operations in accordance with the laws and customs of war.

That may seem neutral on its face, but it is major nation states, not the groups that have differences with them, that are likely to meet those requirements.

In fact, the Laws of War Deskbook argues in part that members of the Taliban had no distinctive uniforms and thus no POW status (at page 79, footnote 31).

The point being that discussion of ethics should take place over concrete cases, so we can judge who will win and who will lose.

Otherwise you will have general principles of ethics that favor the rule makers.