Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

July 6, 2016

Unicode® Standard, Version 9.0

Filed under: Unicode — Patrick Durusau @ 3:40 pm

Unicode® Standard, Version 9.0

From the webpage:

Version 9.0 of the Unicode Standard is now available. Version 9.0 adds exactly 7,500 characters, for a total of 128,172 characters. These additions include six new scripts and 72 new emoji characters.

The new scripts and characters in Version 9.0 add support for lesser-used languages worldwide, including:

  • Osage, a Native American language
  • Nepal Bhasa, a language of Nepal
  • Fulani and other African languages
  • The Bravanese dialect of Swahili, used in Somalia
  • The Warsh orthography for Arabic, used in North and West Africa
  • Tangut, a major historic script of China

Important symbol additions include:

  • 19 symbols for the new 4K TV standard
  • 72 emoji characters such as the following

Why they chose to omit the bacon emoji from the short list is a mystery to me:

[Image: bacon emoji]

Get your baking books out! I see missing bread emojis. 😉

Chilcot Report – Collected PDFs, Converted to Text

Filed under: Chilcot Report (Iraq),Government — Patrick Durusau @ 3:19 pm

I didn’t see a bulk download option for the chapters of the Chilcot Report on The Iraq Inquiry Report page, so I have collected those files and bundled them up for download as Iraq-Inquiry-Report-All-Volumes.tar.gz.

I wrote about Apache PDFBox recently, so I also converted all of those files to text and have bundled them up as Iraq-Inquiry-Report-Text-Conversion.tar.gz.

Some observations on the text files:

  • Numbered paragraphs have the format: digit(s)-period-space
  • Footnotes are formatted: digit(s)-space-text
  • Page numbers: digit(s)-space-no following text
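
As a starting point, here is a minimal Java sketch of a first pass that tags each line using those three patterns. The regexes are my guesses from the observations above, not something verified against every volume:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.regex.Pattern;

    public class ChilcotLineClassifier {

        // Numbered paragraphs: one or more digits, a period, a space, then text.
        private static final Pattern PARAGRAPH = Pattern.compile("^\\d+\\. .+");
        // Footnotes: one or more digits, a space, then text (no period after the digits).
        private static final Pattern FOOTNOTE = Pattern.compile("^\\d+ \\S.*");
        // Page numbers: one or more digits alone on the line.
        private static final Pattern PAGE_NUMBER = Pattern.compile("^\\d+ ?$");

        public static void main(String[] args) throws IOException {
            // args[0] is the path to one of the converted text files.
            for (String line : Files.readAllLines(Paths.get(args[0]))) {
                if (PAGE_NUMBER.matcher(line).matches()) {
                    System.out.println("PAGE      | " + line);
                } else if (PARAGRAPH.matcher(line).matches()) {
                    System.out.println("PARAGRAPH | " + line);
                } else if (FOOTNOTE.matcher(line).matches()) {
                    System.out.println("FOOTNOTE  | " + line);
                } else {
                    System.out.println("TEXT      | " + line);
                }
            }
        }
    }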

Suggestions on other processing steps?

The Iraq Inquiry (Chilcot Report) [4.5x longer than War and Peace]

Filed under: ElasticSearch,Lucene,Search Algorithms,Search Interface,Solr,Topic Maps — Patrick Durusau @ 2:41 pm

The Iraq Inquiry

To give a rough sense of the depth of the Chilcot Report, the executive summary runs 150 pages. The report appears in twelve (12) volumes, not including video testimony, witness transcripts, documentary evidence, contributions and the like.

Cory Doctorow reports on a Guardian project to crowdsource the collection of facts from the 2.6-million-word report. The Guardian observes the Chilcot report is “…almost four-and-a-half times as long as War and Peace.”

Manual reading of the Chilcot report is doable, but unlikely to yield all of the connections that exist between participants, witnesses, evidence, etc.

How would you go about making the Chilcot report and its supporting evidence more amenable to navigation and analysis?

The Report

The Evidence

Other Material

Unfortunately, sections within volumes were not numbered according to their volume. For example, volume 2 starts with section 3.3 and ends with 3.5, volume 4 only contains sections beginning with “4.,” and volume 5 starts with section 5 but also contains sections 6.1 and 6.2. Nothing can be done about it; just be aware that section numbers don’t correspond to volume numbers.

When AI’s Take The Fifth – Sign Of Intelligence?

Filed under: Artificial Intelligence — Patrick Durusau @ 8:52 am

Taking the fifth amendment in Turing’s imitation game by Kevin Warwick and Huma Shah.

Abstract:

In this paper, we look at a specific issue with practical Turing tests, namely the right of the machine to remain silent during interrogation. In particular, we consider the possibility of a machine passing the Turing test simply by not saying anything. We include a number of transcripts from practical Turing tests in which silence has actually occurred on the part of a hidden entity. Each of the transcripts considered here resulted in a judge being unable to make the ‘right identification’, i.e., they could not say for certain which hidden entity was the machine.

A delightful read about something never seen in media interviews: silence of the person being interviewed.

Of the interviews I watch, which is thankfully a small number, most people would seem more intelligent by being silent more often.

I take the authors’ results as a mark in favor of Fish’s interpretative communities because “interpretation” of silence falls squarely on the shoulders of the questioner.

If you don’t know the name Kevin Warwick, you should.


As of today, footnote 1 correctly points to the Fifth Amendment text at Cornell but mis-quotes it. In relevant part the Fifth Amendment reads, “…nor shall be compelled in any criminal case to be a witness against himself….”

July 5, 2016

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask)

Filed under: Books,Publishing — Patrick Durusau @ 4:57 pm

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask) by Lincoln Michel.

From the post:

Publishing is the business of creating books and selling them to readers. And yet, for some reason we aren’t supposed to talk about the latter.

Most literary writers consider book sales a half-crass / half-mythological subject that is taboo to discuss.
While authors avoid the topic, every now and then the media brings up book sales — normally to either proclaim, yet again, the death of the novel, or to make sweeping generalizations about the attention spans of different generations. But even then, the data we are given is almost completely useless for anyone interested in fiction and literature. Earlier this year, there was a round of excited editorials about how print is back, baby after industry reports showed print sales increasing for the second consecutive year. However, the growth was driven almost entirely by non-fiction sales… more specifically adult coloring books and YouTube celebrity memoirs. As great as adult coloring books may be, their sales figures tell us nothing about the sales of, say, literary fiction.

Lincoln’s account mirrors my experience (twice) with a small press decades ago.

While you (rightfully) think that every sane person on the planet will forego the rent in order to purchase your book, sadly your publisher is very unlikely to share that view.

One of the comments to this post reads:

…Writing is a calling but publishing is a business.

Quite so.

Don’t be discouraged by this account but do allow it to influence your expectations, at least about the economic rewards of publishing.

Just in case I get hit with the publishing bug again, good luck to us all!

Free Programming Books – Update

Filed under: Books,Programming — Patrick Durusau @ 3:30 pm

Free Programming Books by Victor Felder.

From the webpage:

This list initially was a clone of stackoverflow – List of Freely Available Programming Books by George Stocker. Now updated, with dead links gone and new content.

Moved to GitHub for collaborative updating.

Great listing of resources!

But each resource stands alone as its own silo. Each can (and many do) refer to other materials, even with hyperlinks, but if you want to explore any of them, you must explore them separately. That’s what being in a silo means. You have to start over at the beginning. Every time.

That is complicated by the existence of thousands of slideshows and videos on programming topics not listed here. Search for your favorite programming language at SlideShare and YouTube. There are other repositories of slideshows and videos; those are just examples.

Each one of those slideshows and/or videos is also a silo. Not to mention that with video you need a time marker if you aren’t going to watch every second of it to find relevant material.

What if you could traverse each of those silos, books, posts, slideshows, videos, documentation, source code, seamlessly?

Making that possible for C/C++ now, given the backlog of material, would have a large upfront cost before it could be useful.

Making that possible for languages with shorter histories, well, how useful would it need to be to justify its cost?

And how would you make it possible for others to easily contribute gems that they find?
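
To make that concrete, here is one minimal sketch (Java; every name and URL below is hypothetical) of the kind of record that would let a single subject point into several silos at once, including a time marker into a video:

    import java.util.List;
    import java.util.Map;

    public class CrossSiloIndex {

        // One pointer into a silo: where the material lives and, optionally, where
        // inside it the subject is discussed (a section, a slide, or a video timestamp).
        // Records require Java 16 or later.
        record Occurrence(String resourceUrl, String locator) { }

        public static void main(String[] args) {
            // Hypothetical entries, possibly contributed by different readers, for one subject.
            Map<String, List<Occurrence>> index = Map.of(
                "tail-call optimization", List.of(
                    new Occurrence("https://example.org/some-free-c-book", "section 7.3"),
                    new Occurrence("https://www.slideshare.net/example-deck", "slide 12"),
                    new Occurrence("https://www.youtube.com/watch?v=EXAMPLE", "t=14m32s")
                )
            );

            // Traversal: one lookup crosses the book, slideshow and video silos.
            index.get("tail-call optimization")
                 .forEach(o -> System.out.println(o.resourceUrl() + " @ " + o.locator()));
        }
    }

Contributions from others would simply be more occurrences for the same subject, and merging occurrences by subject is exactly the kind of work a topic map is meant to do.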

Something to think about as you wander about in each of these separate silos.

Enjoy!

Using A Shared Password Is A Crime (9th Circuit, U.S. v. Nosal) Full Text of Opinion

Filed under: Computer Fraud and Abuse (CFAA),Cybersecurity — Patrick Durusau @ 2:48 pm

U.S. appeals court rejects challenge to anti-hacking law by Jonathan Stempel.

From the post:

A divided federal appeals court on Tuesday gave the U.S. Department of Justice broad leeway to police password theft under a 1984 anti-hacking law, upholding the conviction of a former Korn/Ferry International executive for stealing confidential client data.

The 9th U.S. Circuit Court of Appeals in San Francisco said David Nosal violated the Computer Fraud and Abuse Act in 2005 when he and two friends, who had also left Korn/Ferry, used an employee’s password to access the recruiting firm’s computers and obtain information to help start a new firm.

Writing for a 2-1 majority, Circuit Judge Margaret McKeown said Nosal acted “without authorization” even though the employee, his former secretary, had voluntarily provided her password.

The full text of the decision (plus dissent) in U.S. v. Nosal, No. 14-10037.

This case has a long history, which I won’t try to summarize now.

Hillary Clinton Email Archive

Filed under: Government,Wikileaks — Patrick Durusau @ 1:33 pm

Hillary Clinton Email Archive by Wikileaks.

From the webpage:

On March 16, 2016 WikiLeaks launched a searchable archive for 30,322 emails & email attachments sent to and from Hillary Clinton’s private email server while she was Secretary of State. The 50,547 pages of documents span from 30 June 2010 to 12 August 2014. 7,570 of the documents were sent by Hillary Clinton. The emails were made available in the form of thousands of PDFs by the US State Department as a result of a Freedom of Information Act request. The final PDFs were made available on February 29, 2016.

“Truthers” may be interested in this searchable archive of Clinton’s emails while Secretary of State.

“Truthers” because the FBI’s recommendation of no charges effectively ends this particular approach to derail Clinton’s run for the presidency.

Many wish the result were different but when the last strike is called, arguing about it isn’t going to change the score of the game.

New evidence and new facts, on the other hand, are unknown factors and could make a difference whereas old emails will not.

Are you going to be looking for new evidence and facts or crying over calls in a game already lost?

Promiscuous Use of USB Sticks

Filed under: Cybersecurity — Patrick Durusau @ 1:10 pm

17% of US employees would use a USB stick found in the street by Marika Samarati.

From the post:


The social experiment – 17% caught in the net

The team of researchers hypothesized that, while there is increasing concern about cyber attacks and data breaches, people still have poor cybersecurity hygiene that puts their own devices at risk. To test this assumption, they dropped 200 USB sticks in public spaces. Each stick contained text files prompting the reader to click on a link or send an email to a specific address. After a few weeks, 17% of the sticks were picked up, plugged in, and resulted in the researchers being notified, either because the user clicked on the link or sent an email. The hypothesis turned out to be true: Despite people’s awareness of cyber threats, they still make decisions that could have disastrous outcomes.

I take the 17% to be a low estimate of users who used the USB sticks left in public places. It’s not possible to tell how many employees used the USB sticks without ever opening the text files or, having opened them, simply ignored the request for contact.

Incentive to take fashionable/attractive USB sticks on job interviews, tours, site visits, etc.

Apache PDFBox 2 – Vulnerability Warning

Filed under: PDF — Patrick Durusau @ 12:39 pm

Apache PDFBox 2 by Dustin Marx.

From the post:

Apache PDFBox 2 was released earlier this year and Apache PDFBox 2.0.1 and Apache PDFBox 2.0.2 have since been released. Apache PDFBox is open source (Apache License Version 2) and Java-based (and so is easy to use with a wide variety of programming languages including Java, Groovy, Scala, Clojure, Kotlin, and Ceylon). Apache PDFBox can be used by any of these or other JVM-based languages to read, write, and work with PDF documents.

Apache PDFBox 2 introduces numerous bug fixes in addition to completed tasks and some new features. Apache PDFBox 2 now requires Java SE 6 (J2SE 5 was minimum for Apache PDFBox 1.x). There is a migration guide, Migration to PDFBox 2.0.0, that details many differences between PDFBox 1.8 and PDFBox 2.0, including updated dependencies (Bouncy Castle 1.53 and Apache Commons Logging 1.2) and “breaking changes to the library” in PDFBox 2.

PDFBox can be used to create PDFs. The next code listing is adapted from the Apache PDFBox 1.8 example “Create a blank PDF” in the Document Creation “Cookbook” examples. The referenced example explicitly closes the instantiated PDDocument and probably does so for benefit of those using a version of Java before JDK 7. For users of Java 7, however, try-with-resources is a better option for ensuring that the PDDocument instance is closed and it is supported because PDDocument implements AutoCloseable.
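
For a sense of what that looks like, here is a minimal blank-document sketch of my own along the lines Marx describes, using the PDFBox 2 API with try-with-resources (not his exact listing, and the output file name is just an example):

    import java.io.IOException;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    public class BlankPdf {
        public static void main(String[] args) throws IOException {
            // PDDocument implements AutoCloseable in PDFBox 2, so try-with-resources
            // closes the document even if save() throws.
            try (PDDocument document = new PDDocument()) {
                document.addPage(new PDPage());  // a single empty page
                document.save("blank.pdf");
            }
        }
    }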

If you don’t know Apache PDFBox™, its homepage lists the following features:

  • Extract Text
  • Print
  • Split & Merge
  • Save as Image
  • Fill Forms
  • Create PDFs
  • Preflight
  • Signing

Warning: If you are using Apache PDFBox, update to the most recent version.

CVE-2016-2175 XML External Entity vulnerability (2016-05-27)

Due to a XML External Entity vulnerability we strongly recommend to update to the most recent version of Apache PDFBox.

Versions Affected: Apache PDFBox 1.8.0 to 1.8.11 and 2.0.0. Earlier, unsupported versions may be affected as well.

Mitigation: Upgrade to Apache PDFBox 1.8.12 or 2.0.1, respectively.

SEO Tools: The Complete List (153 Free and Paid Tools) [No IEO Tools?]

Filed under: Search Engines,WWW — Patrick Durusau @ 9:15 am

SEO Tools: The Complete List (153 Free and Paid Tools) by Brian Dean.

Updated as of May 20, 2016.

There is a PDF version, but that requires sacrificing your email address, an indeterminate wait for the confirmation email, etc.

The advantage of the PDF version isn’t clear, other than you can print it on marketing’s color printer. Something to cement that close bond between marketing and IT.

With the abundance of search engine optimization tools, have you noticed the lack of index engine optimization (IEO) tools?

When an indexing engine is “optimized,” settings of the indexing engine are altered to produce a “better” result. So far as I know, the data being indexed isn’t normally changed to alter the behavior of the indexing engine.

In contrast, data destined for a search engine is expected to change/optimize itself to alter the behavior of the search engine.

What if data were index engine optimized, say to distinguish terms with multiple meanings, at the time of indexing? Say articles in the New York Times were paired with vocabulary lists of the names, terms, etc. that appear within them.

Bi-directional links so that an index of the vocabulary lists would at the same time be an index of the articles themselves.
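
A crude sketch of what I have in mind (Java; the article identifiers and vocabulary entries are invented for illustration): pair each article with its vocabulary list, then invert the pairing, so the vocabulary index is simultaneously an index of the articles.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    public class VocabularyIndex {
        public static void main(String[] args) {
            // Forward direction: each article paired with its curated vocabulary list,
            // with disambiguated senses for terms that have multiple meanings.
            Map<String, List<String>> articleToTerms = Map.of(
                "nyt-2016-07-05-fed-rates", List.of("Yellen, Janet", "rates (interest)"),
                "nyt-2016-07-05-tennis", List.of("Murray, Andy", "rates (scoring)")
            );

            // Reverse direction: built from the same data, the vocabulary index
            // is at the same time an index of the articles themselves.
            Map<String, Set<String>> termToArticles = new HashMap<>();
            articleToTerms.forEach((article, terms) ->
                terms.forEach(term ->
                    termToArticles.computeIfAbsent(term, t -> new TreeSet<>()).add(article)));

            termToArticles.forEach((term, articles) ->
                System.out.println(term + " -> " + articles));
        }
    }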

Thoughts?

Securing A Travel iPhone

Filed under: Cybersecurity,Privacy,Security — Patrick Durusau @ 8:08 am

Securing A Travel iPhone by Filippo Valsorda.

From the post:

These are dry notes I took in the process of setting up a burner iPhone SE as a secure travel device. They are roughly in setup order.

I believe iOS to be the most secure platform one can use at this time, but there are a lot of switches and knobs. This list optimizes for security versus convenience.

Don’t use anything older than an iPhone 5S, it wouldn’t have the TPM.

Needless to say, use long unique passwords everywhere.

There are more than forty (40) tasks/sub-tasks to securing a travel iPhone, so you’d best start well ahead of time.

No security is perfect but if you follow this guide, you will be more secure than the vast majority of travelers.

July 4, 2016

Were You Paying Attention In June 2016?

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:16 pm

June’s fake news quiz: Football fans, kissing politicians and Arnie on safari by Alastair Reid, First Draft.

Alastair’s fake news quiz is a good way to find out.

Prior fake news quizzes are listed in case you want to test your long term memory.

Cybersecurity By Design?

Filed under: Cybersecurity,Programming,Rust — Patrick Durusau @ 9:08 pm

Shaun Nichols reports in Mozilla emits nightly builds of heir-to-Firefox browser engine Servo:

Mozilla has started publishing nightly in-development builds of its experimental Servo browser engine so anyone can track the project’s progress.

Executables for macOS and GNU/Linux are available right here to download and test drive even if you’re not a developer. If you are, the open-source engine’s code is here if you want to build it from scratch, fix bugs, or contribute to the effort.

Right now, the software is very much in a work-in-progress state, with a very simple user interface built out of HTML. It’s more of a technology demonstration than a viable web browser, although Mozilla has pitched Servo as a potential successor to Firefox’s Gecko engine.

Crucially, Servo is written using Rust – Mozilla’s more-secure C-like systems programming language. If Google has the language of Go, Moz has the language of No: Rust. It works hard to stop coders making common mistakes that lead to exploitable security bugs, and we literally mean stop: the compiler won’t build the application if it thinks dangerous code is present.

Rust focuses on safety and speed: its security measures do not impact it at run-time as the safety mechanisms are in the language by design. For example, variables in Rust have an owner and a lifetime; they can be borrowed by another owner. When a variable is being used by one owner, it cannot be used by another. This is supposed to help enforce memory safety and stop data races between threads.

It also forces the programmer to stop and think about their software’s design – Rust is not something for novices to pick up and quickly bash out code on.

Even though Servo is pre-release and rough, I was fairly excited until I read:


One little problem is that Servo relies on Mozilla’s SpiderMonkey JavaScript engine, which is written in C/C++. So while the HTML-rendering engine will run secured Rust code, fingers crossed nothing terrible happens within the JS engine.

Really?

But then I checked Mozilla JavaScript-C Engine – SpiderMonkey at BlackDuck | Security, which shows zero (0) vulnerabilities over the last 10 versions.

Other than SpiderMonkey vulnerabilities known to the NSA, any others you care to mention?

Support, participate, submit bug reports on the new rendering engine but don’t forget about the JavaScript engine.

Breaking Honeypots For Fun And Profit – Detecting Deception

Filed under: Privacy,Tor — Patrick Durusau @ 4:38 pm

by Dean Sysman & Gadi Evron & Itamar Sher

The description:

We will detect, bypass, and abuse honeypot technologies and solutions, turning them against the defender. We will also release a global map of honeypot deployments, honeypot detection vulnerabilities, and supporting code.

The concept of a honeypot is strong, but the way honeypots are implemented is inherently weak, enabling an attacker to easily detect and bypass them, as well as make use of them for his own purposes. Our methods are analyzing the network protocol completeness and operating system software implementation completeness, and vulnerable code.

As a case study, we will concentrate on platforms deployed in real organizational networks, mapping them globally, and demonstrating how it is possible to both bypass and use these honeypots to the attacker’s advantage.

The slides for the presentation.

This presentation addresses the question of detecting (identifying) a deception.

Detection of the following honeypots is discussed:

Artillery: https://github.com/BinaryDefense/artillery (Updated URL)

BearTrap: https://github.com/chrisbdaemon/BearTrap

honeyd: http://www.honeyd.org

Dionaea: http://dionaea.carnivore.it/ (timed out on July 4, 2016)

Glastopf: http://glastopf.org/

Kippo: https://github.com/desaster/kippo

KFSensor: http://www.keyfocus.net/kfsensor/

Nova: https://github.com/DataSoft/Nova

It was argued that identifying an attack may result only in that attack being prevented by anti-attack code, whereas identifying an attacker could have consequences for the attack as an operation.

Combining an IP address with other dimensions of identification, say with a topic map, could prove to be a means of sharpening the consequences for attackers.

Of course, I am assuming that at least within an agency, agents share data/insights towards a common objective. That may not be the case in your agency.

While looking for other resources on honeypots, I did find Collection of Awesome Honeypots, dating from December of 2015.

Thomas Jefferson (Too Early For Tor – TEFT)

Filed under: Government,Privacy,Tor — Patrick Durusau @ 2:27 pm

Official Presidential portrait of Thomas Jefferson (by Rembrandt Peale, 1800)

Thomas Jefferson lived centuries before the internet and the rise of Tor but he is easy to see as a Tor user.

He was the author of the Declaration of Independence, which, if you read the details, is a highly offensive document:


He has affected to render the Military independent of and superior to the Civil Power.

He has combined with others to subject us to a jurisdiction foreign to our constitution, and unacknowledged by our laws; giving his Assent to their Acts of pretended Legislation:

For quartering large bodies of armed troops among us:

For protecting them, by a mock Trial from punishment for any Murders which they should commit on the Inhabitants of these States:

For cutting off our Trade with all parts of the world:

For imposing Taxes on us without our Consent:

For depriving us in many cases, of the benefit of Trial by Jury:

For transporting us beyond Seas to be tried for pretended offences:

He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation, and tyranny, already begun with circumstances of Cruelty & Perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation.

Update the language of “For transporting us beyond Seas to be tried for pretended offences” to “Transporting people to Guantanamo Bay prison for unlawful detention” and you have a good example of what the FBI wants discussed in clear text.

Make no mistake, the FBI of today, working for George III, would have arrested Thomas Jefferson if it caught wind of the Declaration of Independence. At that time, Jefferson was not the towering figure of liberty that he is today. Then he was the opponent of a nation-state.

Jefferson was too early for Tor but he is the type of person that Tor protects.

Do you want to be on the side of George III or Jefferson in history?

Support Tor!

July 3, 2016

Outing Dark Web Spies (Donate to Tor)

Filed under: Cybersecurity,Tor — Patrick Durusau @ 3:45 pm

Two security experts have conducted a study that allowed them to spot over 100 snooping Tor Nodes spying on Dark Web Sites by Pierluigi Paganini.

From the post:

…Joseph Cox from Motherboard reported a study conducted by Guevara Noubir, a professor from the College of Computer and Information Science at Northeastern University, and Amirali Sanatinia, a PhD candidate also from Northeastern, who revealed the existence of a number of Tor hidden service directories that are spying on Tor websites. Such kind of attacks could allow law enforcement to discover IP addresses of black markets and child pornography sites.

A similar technique could be very useful also for security firms that offer dark web intelligence services.

Threat actors using this technique could reveal the IP address of Tor hidden services. Noubir will present the results of the research at the Def Con hacking conference in August.

“We create what we call ‘honey onions’ or ‘honions.’ These are onion addresses that we don’t share with anyone,” Noubir said.

The security researchers ran 4,500 honey onions over 72 days; they identified at least 110 HSDirs that had been configured to spy on hidden services.

The experts highlighted that some of the threat actors operating the bogus HSDirs were active observers involved in many activities, including penetration testing.

While Next Generation Onion Services (issue 224, Montreal 2016 update) is under development, outing dark web spies may be your next best defense.

Your best defense is supporting the Tor project. Your support will help it gain and keep the advantage over dark web spies.

By helping Tor, you will be helping all of us, yourself included.

PS: Def Con 24 is August 4-7, 2016, at Paris + Bally’s in Las Vegas. No pre-registration, $240 USD cash at the door.

July 2, 2016

Developing Expert p-Hacking Skills

Filed under: Peer Review,Psychology,Publishing,R,Statistics — Patrick Durusau @ 4:00 pm

Introducing the p-hacker app: Train your expert p-hacking skills by Ned Bicare.

Ned’s p-hacker app will be welcomed by everyone who publishes where p-values are accepted.

Publishers should mandate that authors and reviewers submit six p-hacker app results along with any draft that contains, or is a review of, p-values.

The p-hacker app results won’t improve a draft and/or review, but when compared to the draft, they will improve the publication in which it might have appeared.

From the post:

My dear fellow scientists!

“If you torture the data long enough, it will confess.”

This aphorism, attributed to Ronald Coase, sometimes has been used in a disrespectful manner, as if it was wrong to do creative data analysis.

In fact, the art of creative data analysis has experienced despicable attacks over the last years. A small but annoyingly persistent group of second-stringers tries to denigrate our scientific achievements. They drag psychological science through the mire.

These people propagate stupid method repetitions; and what was once one of the supreme disciplines of scientific investigation – a creative data analysis of a data set – has been crippled to conducting an empty-headed step-by-step pre-registered analysis plan. (Come on: If I lay out the full analysis plan in a pre-registration, even an undergrad student can do the final analysis, right? Is that really the high-level scientific work we were trained for so hard?).

They broadcast at an annoying frequency that p-hacking leads to more significant results, and that researchers who use p-hacking have higher chances of getting things published.

What are the consequences of these findings? The answer is clear. Everybody should be equipped with these powerful tools of research enhancement!

The art of creative data analysis

Some researchers describe a performance-oriented data analysis as “data-dependent analysis”. We go one step further, and call this technique data-optimal analysis (DOA), as our goal is to produce the optimal, most significant outcome from a data set.

I developed an online app that allows to practice creative data analysis and how to polish your p-values. It’s primarily aimed at young researchers who do not have our level of expertise yet, but I guess even old hands might learn one or two new tricks! It’s called “The p-hacker” (please note that ‘hacker’ is meant in a very positive way here. You should think of the cool hackers who fight for world peace). You can use the app in teaching, or to practice p-hacking yourself.

Please test the app, and give me feedback! You can also send it to colleagues: http://shinyapps.org/apps/p-hacker.

Enjoy!

Five Essential Research Tips for Journalists Using Google

Filed under: Journalism,Searching — Patrick Durusau @ 3:16 pm

Five Essential Research Tips for Journalists Using Google by Temi Adeoye.

This graphic:

[Image: Google search tips graphic]

does not appear in Temi’s post but rather in a tweet by the International Center for Journalists (ICFJ) about his post.

See Temi’s post for the details but this graphic is a great reminder.

This will make a nice addition to my local page of search links.

July 1, 2016

ACLU Challenges Computer Fraud and Abuse Act (CFAA) – Sandvig v. Lynch

Filed under: Cybersecurity,Government — Patrick Durusau @ 4:49 pm

I saw the case on the ACLU website but had to dig through three posts before finding the text: Sandvig v. Lynch, Case 1:16-cv-01368, US District Court for the District of Columbia.

When referencing litigation, don’t point to yet another post; link directly to the document in question.

You may be interested in the following posts at the ACLU site:

ACLU Challenges Computer Crimes Law That is Thwarting Research on Discrimination Online

Your Favorite Website Might Be Discriminating Against You

SANDVIG V. LYNCH — CHALLENGE TO CFAA PROHIBITION ON UNCOVERING RACIAL DISCRIMINATION ONLINE

SANDVIG V. LYNCH – COMPLAINT (bingo!)

The ACLU argument is that potential discrimination by websites cannot be researched (by posing as users with different characteristics) without violating “terms of use,” and hence the CFAA.

I truly despise the CFAA but given the nature of machine learning driven marketing, I think the ACLU argument has a problem.

You see it already but for the uninitiated:

The complaint uses the term “sock puppet” seventeen (17) times, for example:


91. Plaintiffs Sandvig and Karahalios will then instruct the bot to perform the exhibiting behaviors associated with a particular race, so that, for instance, one sock puppet would browse like a Black user, while another would browse like a white user. All the sock puppets will browse the Web for several weeks, periodically revisiting the initial real estate site to search for properties.

[One hopes discovery will include the plaintiffs’ definition of “browse like a Black user, while another would browse like a white user.”]

92. At each visit to the real estate site, Plaintiffs Sandvig and Karahalios will record the properties that were advertised to that sock puppet by scraping that data from the real estate site. They will scrape the organic listings and the Uniform Resource Locator (“URL”) of any advertisements. They will also record images of the advertisements shown to the sock puppets.

93. Finally, Plaintiffs Sandvig and Karahalios will compare the number and location of properties offered to different sock puppets, as well as the properties offered to the same sock puppet at different times. They seek to identify cases where the sock puppet behaved as though it were a person of a particular race and that behavior caused it to see a significantly different set of properties, whether in number or location.

As we all know, it took less than a day for a Microsoft chat-bot to become “a racist asshole.”

From all accounts it didn’t start off as “a racist asshole” so inputs must account for its ending state.

Not that the researchers will be inputting such content, but their content will be interacting in unforeseen and unpredictable ways with other content from other users, users who are unknown to the sites being tested.

It’s all well and good to test a website for discrimination, but holding it responsible for the input of unknown others seems like a bit of a stretch.

We won’t know, can’t know, whose input is responsible for whatever results the researchers obtain. The results could well be that no discrimination is found, by happenstance, when in fact the site, left to its own devices, does discriminate in some illegal way.

Another aspect of the same problem is that machine learning algorithms give discrete results; they do not provide an explanation for how they arrived at a result. That is, a site could be displaying high-paying job ads to one job seeker but not another because one “fake” user has been confused with a real user in an outside database and additional information has been used but not exposed in the result.

Not to mention that, given the dynamic nature of machine learning, the site you complain about today as practicing discrimination may not be discriminating tomorrow. Without an opportunity to capture the entire data set for each alleged claim of discrimination, websites can reply, “OK, you say that was your result on day X. Have you tried today?”

The testing for discrimination claim is a clever attack on the CFAA but given the complexities and realities of machine learning, I don’t see it as being a successful one.

National Security Letter (NSL) Resources

Filed under: FBI,Government,National Security — Patrick Durusau @ 3:38 pm

After posting about the use of National Security Letters (NSLs) to abuse the press yesterday, I discovered two very useful papers on NSLs by Charles Doyle. The first is an abridged version of the second.

National Security Letters in Foreign Intelligence Investigations: A Glimpse at the Legal Background (abridged version of: National Security Letters in Foreign Intelligence Investigations: Legal Background.)

National Security Letters in Foreign Intelligence Investigations: Legal Background

(NOT legal advice)

Doyle identifies two perils posed by National Security Letters:

Contempt of Court


If an NSL contains a nondisclosure notice, it must advise the recipient of its right to seek, or to have the agency seek, judicial review. At the recipient’s request, the issuing agency must petition the court for review, stating the specific facts that support its belief that disclosure might result in one or more of the statutorily identified adverse consequences. 140 If the court agrees that such a risk may exist, it must issue a nondisclosure order. 141 (page 21) Failure to honor a nondisclosure order is punishable as contempt of court, 142…

Contempt of court sanctions come into play if, and only if, the recipient has sought judicial review and becomes subject to a court order.

Non-Court Order Penalties

…and if committed knowingly and with the intent to obstruct an investigation or related judicial proceedings is punishable by imprisonment for not more than five years and/or a fine of not more than $250,000 (not more than $500,000 for an organization). 143

Unpacking the first reference in footnote 143, “18 U.S.C. 1510(e),”


(e) Whoever, having been notified of the applicable disclosure prohibitions or confidentiality requirements of section 2709(c)(1) of this title, section 626(d)(1) or 627(c)(1) of the Fair Credit Reporting Act (15 U.S.C. 1681u(d)(1) or 1681v(c)(1)), section 1114(a)(3)(A) or 1114(a)(5)(D)(i) of the Right to Financial Privacy Act [1] (12 U.S.C. 3414(a)(3)(A) or 3414(a)(5)(D)(i)), or section 802(b)(1) of the National Security Act of 1947 (50 U.S.C. 436(b)(1)),[2] knowingly and with the intent to obstruct an investigation or judicial proceeding violates such prohibitions or requirements applicable by law to such person shall be imprisoned for not more than five years, fined under this title, or both.

As I read 18 U.S.C. 1510(e), it requires:

  1. Notice of the applicable disclosure prohibitions or confidentiality requirements
  2. Disclosure
    1. knowingly (excludes accidental disclosure ?)
    2. with the intent to obstruct an investigation or judicial proceeding

The first step in any government prosecution for leaking an NSL requires proof of notice of the applicable disclosure prohibitions, in other words, that some identified individual was notified of those prohibitions.

The list of people who could have leaked an NSL of necessity includes all the people in the government with knowledge of the NSL, which I suspect won’t be disclosed to the trier of fact, plus the recipient and their counsel, etc.

Government documents, even FBI documents get leaked on a regular basis.

The lack of NSL leaks appears to be more a matter of timidity than serious jeopardy. The very worst response to terrorist-fiction-driven legislation is to take it seriously.

The more NSLs are treated as anything other than Col. “Bat” Guano responses to a world only he can see, the deeper we become mired in unconstitutional habits and practices.

Open Access Journals Threaten Science – What’s Your Romesburg Number?

Filed under: Open Access,Peer Review — Patrick Durusau @ 10:35 am

When I saw the pay-per-view screenshot of this article on Twitter, I almost dismissed it as Photoshop-based humor. But anything is possible, so I searched for the title, only to find:

How publishing in open access journals threatens science and what we can do about it by H. Charles Romesburg (Department of Environment and Society, Utah State University, Logan, UT, USA).

Abstract:

The last decade has seen an enormous increase in the number of peer-reviewed open access research journals in which authors whose articles are accepted for publication pay a fee to have them made freely available on the Internet. Could this popularity of open access publishing be a bad thing? Is it actually imperiling the future of science? In this commentary, I argue that it is. Drawing upon research literature, I explain why it is almost always best to publish in society journals (i.e., those sponsored by research societies such as Journal of Wildlife Management) and not nearly as good to publish in commercial academic journals, and worst—to the point it should normally be opposed—to publish in open access journals (e.g., PLOS ONE). I compare the operating plans of society journals and open access journals based on 2 features: the quality of peer review they provide and the quality of debate the articles they publish receive. On both features, the quality is generally high for society journals but unacceptably low for open access journals, to such an extent that open access publishing threatens to pollute science with false findings. Moreover, its popularity threatens to attract researchers’ allegiance to it and away from society journals, making it difficult for them to achieve their traditionally high standards of peer reviewing and of furthering debate. I prove that the commonly claimed benefits to science of open access publishing are nonexistent or much overestimated. I challenge the notion that journal impact factors should be a key consideration in selecting journals in which to publish. I suggest ways to strengthen the Journal and keep it strong. © 2016 The Wildlife Society.

On a pay-per-view site (of course):

[Image: pay-per-view screenshot of the article]

You know about the Erdős number, which measures your collaboration distance from Paul Erdős.

I propose the Romesburg Number, which measures your collaboration distance from H. Charles Romesburg. The higher your number, the further removed you are from Romesburg.

I don’t have all the data, but I am hopeful my Romesburg number is 12 or higher.
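
If you want to compute your own number, collaboration distance is just breadth-first search over a coauthorship graph. A toy sketch (Java, with entirely made-up coauthorship data):

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Queue;

    public class RomesburgNumber {

        // Breadth-first search gives the collaboration distance from `target`,
        // or -1 if no chain of coauthors connects the two.
        static int distance(Map<String, List<String>> coauthors, String from, String target) {
            Map<String, Integer> dist = new HashMap<>();
            Queue<String> queue = new ArrayDeque<>();
            dist.put(from, 0);
            queue.add(from);
            while (!queue.isEmpty()) {
                String author = queue.poll();
                if (author.equals(target)) return dist.get(author);
                for (String next : coauthors.getOrDefault(author, List.of())) {
                    if (!dist.containsKey(next)) {
                        dist.put(next, dist.get(author) + 1);
                        queue.add(next);
                    }
                }
            }
            return -1;
        }

        public static void main(String[] args) {
            // Placeholder coauthorship data; only Romesburg is a real name here.
            Map<String, List<String>> coauthors = Map.of(
                "You", List.of("Colleague A"),
                "Colleague A", List.of("You", "Colleague B"),
                "Colleague B", List.of("Colleague A", "H. Charles Romesburg"),
                "H. Charles Romesburg", List.of("Colleague B")
            );
            System.out.println(distance(coauthors, "You", "H. Charles Romesburg")); // prints 3
        }
    }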
