## Archive for June, 2016

Tuesday, June 21st, 2016

Abstract:

Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: U.S. academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers and their final published counterparts. This comparison had two working assumptions: 1) if the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and 2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.

The authors have performed a very detailed analysis of pre-prints, 90% – 95% of which are published as open pre-prints first, to conclude there is no appreciable difference between the pre-prints and the final published versions.

I take “…no appreciable difference…” to mean academic publishers and the peer review process, despite claims to the contrary, contribute little or no value to academic publications.

How’s that for a bargaining chip in negotiating subscription prices?

### Tapping Into The Terror Money Stream

Tuesday, June 21st, 2016

From the post:

If the federal government is good at anything, however, it’s throwing money at threats. Since 2003, taxpayers have contributed $1.3 billion to the feds’ BioWatch program, a network of pathogen detectors deployed in D.C. and 33 other cities (plus at so-called national security events like the Super Bowl), despite persistent questions about its need and reliability. In 2013, Republican Representative Tim Murphy of Pennsylvania, chairman of the House Energy and Commerce Committee’s Oversight and Investigations subcommittee, called it a “boondoggle.” Jeh Johnson, who took over the reins of the Department of Homeland Security (DHS) in late 2013, evidently agreed. One of his first acts was to cancel a planned third generation of the program, but the rest of it is still running.

“The BioWatch program was a mistake from the start,” a former top federal emergency medicine official tells Newsweek on condition of anonymity, saying he fears retaliation from the government for speaking out. The well-known problems with the detectors, he says, are both highly technical and practical. “Any sort of thing can blow into its filter papers, and then you are wrapping yourself around an axle,” trying to figure out if it’s real. Of the 149 suspected pathogen samples collected by BioWatch detectors nationwide, he reports, “none were a threat to public health.” A 2003 tularemia alarm in Texas was traced to a dead rabbit.

Michael Sheehan, a former top Pentagon, State Department and New York Police Department counterterrorism official, echoes such assessments. “The technology didn’t work, and I had no confidence that it ever would,” he tells Newsweek. The immense amounts of time and money devoted to it, he adds, could’ve been better spent “protecting dangerous pathogens stored in city hospitals from falling into the wrong hands.” When he sought to explore that angle at the NYPD, the Centers for Disease Control and Prevention “initially would not tell us where they were until I sent two detectives to Atlanta to find out,” he says. “And they did, and we helped the hospitals with their security—and they were happy for the assistance.”

Even if BioWatch performed as touted, Sheehan and others say, a virus would be virtually out of control and sending scores of people to emergency rooms by the time air samples were gathered, analyzed and the horrific results distributed to first responders. BioWatch, Sheehan suggests, is a billion-dollar hammer looking for a nail, since “weaponizing biological agents is incredibly hard to do,” and even ISIS, which theoretically has the scientific assets to pursue such weapons, has shown little sustained interest in them. Plus, extremists of all denominations have demonstrated over the decades that they like things that go boom (or tat-tat-tat, the sound of an assault rifle). So the $1.1 billion spent on BioWatch is way out of proportion to the risk, critics argue. What’s really driving programs like BioWatch, Sheehan says—beside fears of leaving any potential threat uncovered, no matter how small—is the opportunity it gives members of Congress to lard out pork to research universities and contractors back home.

Considering that two people with one rifle terrorized the D.C. area for 23 days (The Beltway Snipers, Part 1, The Beltway Snipers, Part 2), I would have to say yes, ISIS can take down D.C., even if they limit themselves to “…things that go boom (or tat-tat-tat, the sound of an assault rifle).” (You have to wonder about the quality of their “terrorist” training.)

But in order to get funding, you have to discover a scenario that isn’t fully occupied by contractors. Quite recently I read of an effort to detect the possible onset of terror attacks based on social media traffic. Except there is no evidence that random social media group traffic picks up before a terrorist attack. Yeah, well, there is that, but that won’t come up for years.

Here’s a new terror vector. Using Washington, D.C. as an example, how would you weaponize open data found at District of Columbia Open Data?

Data.gov reports there are forty states (US), forty-eight counties and cities (US), fifty-two international countries (what else would they be?), and one-hundred and sixty-four international regions with open data portals. That’s a considerable amount of open data, data that could be combined to further ends not intended to improve public health and well-being.

Don’t allow the techno-jingoism of posts like How big data can terrorize global terrorism to lull you into a false sense of security. Anyone who can think beyond being a not-so-smart bomb or tat-tat-tat can access and use open data with free tools. Are you aware of the danger that poses?

### Driving While Black (DWB) Stops Affirmed By Supreme Court [Hacker Tip]

Tuesday, June 21st, 2016

Justice Sotomayor captures the essence of Utah v. Strieff when she writes:

The Court today holds that the discovery of a warrant for an unpaid parking ticket will forgive a police officer’s violation of your Fourth Amendment rights. Do not be soothed by the opinion’s technical language: This case allows the police to stop you on the street, demand your identification, and check it for outstanding traffic warrants—even if you are doing nothing wrong. If the officer discovers a warrant for a fine you forgot to pay, courts will now excuse his illegal stop and will admit into evidence anything he happens to find by searching you after arresting you on the warrant. Because the Fourth Amendment should prohibit, not permit, such misconduct, I dissent.

The facts are easy enough to summarize. Edward Strieff was seen visiting a home that had been reported (but not confirmed) as a site of drug sales. Officer Fackrell, with no suspicion that Strieff had committed a crime, detained Strieff, requested his identification and was advised of a traffic warrant for his arrest. Fackrell arrested Strieff and, while searching him, discovered “a baggie of methamphetamine and drug paraphernalia.”

Strieff moved to suppress the “baggie of methamphetamine and drug paraphernalia” since Officer Fackrell lacked even a pretense for the original stop. The Utah Supreme Court correctly agreed, but the Supreme Court in this decision, written by “Justice” Thomas, disagreed.

The “exclusionary rule” has a long history, but for our purposes it suffices to say that it removes any incentive for police officers to stop people without reasonable suspicion and demand their ID, search them, etc. It does so by excluding any evidence of a crime they discover as a result of such a stop. Or at least it did prior to Utah v. Strieff.

Police officers were forced to make up some pretext for a reasonable suspicion in order to stop any given individual. No reasonable suspicion for stop = No evidence to be used in court. That was the theory, prior to Utah v. Strieff.

Sotomayor makes clear in her dissent that this was a suspicionless stop:

This case involves a suspicionless stop, one in which the officer initiated this chain of events without justification. As the Justice Department notes, supra, at 8, many innocent people are subjected to the humiliations of these unconstitutional searches. The white defendant in this case shows that anyone’s dignity can be violated in this manner. See M. Gottschalk, Caught 119–138 (2015). But it is no secret that people of color are disproportionate victims of this type of scrutiny. See M. Alexander, The New Jim Crow 95–136 (2010). For generations, black and brown parents have given their children “the talk”—instructing them never to run down the street; always keep your hands where they can be seen; do not even think of talking back to a stranger—all out of fear of how an officer with a gun will react to them. See, e.g., W. E. B. Du Bois, The Souls of Black Folk (1903); J. Baldwin, The Fire Next Time (1963); T. Coates, Between the World and Me (2015).

By legitimizing the conduct that produces this double consciousness, this case tells everyone, white and black, guilty and innocent, that an officer can verify your legal status at any time. It says that your body is subject to invasion while courts excuse the violation of your rights. It implies that you are not a citizen of a democracy but the subject of a carceral state, just waiting to be cataloged.

We must not pretend that the countless people who are routinely targeted by police are “isolated.” They are the canaries in the coal mine whose deaths, civil and literal, warn us that no one can breathe in this atmosphere. See L. Guinier & G. Torres, The Miner’s Canary 274–283 (2002). They are the ones who recognize that unlawful police stops corrode all our civil liberties and threaten all our lives. Until their voices matter too, our justice system will continue to be anything but.

(emphasis in original)

New rule: Police can stop you at any time, for no reason, demand identification and check your legal status, and if you are arrested as a result of that check, any evidence seized can be used against you in court.

Police officers were very good at imagining reasonable cause for stopping people, but now even that tissue of protection has been torn away. You are subject to arbitrary and capricious stops with no disincentive for the police. They can go fishing for evidence and see what turns up.

For all of that, I don’t see the police as our enemy. They are playing by rules as defined by others. If we want better play, such as respect for Fourth Amendment rights, then we need enforcement of those rights. It isn’t hard to identify the enemies of the people in this decision.

Hackers, you too can be stopped at any time. Hackers should never carry incriminating USB drives, SIM cards, etc. If possible, keep anything even remotely questionable out of any location physically associated with you. Remote storage of your code, booty, etc., protects it from clumsy physical seizure of local hardware and, if you are very brave, enables rapid recovery from such seizures.

### Cryptome – Happy 20th Anniversary!

Monday, June 20th, 2016

Cryptome marks 20 years, June 1996-2016, 100K dox thanx to 25K mostly anonymous doxers. Donate $100 for the Cryptome Archive of 101,900 files from June 1996 to 25 May 2016 on 1 USB (43.5GB). Cryptome public key.
(Search site with Google, or WikiLeaks for most not all.)

Bitcoin: 1P11b3Xkgagzex3fYusVcJ3ZTVsNwwnrBZ

Interesting post on fake Cryptome torrents: http://www.joshwieder.net/2015/07/cryptome-torrents-draw-concerns.html

$100 is a real bargain for the Cryptome Archive, plus you will be helping a worthy cause. Repost the news of Cryptome’s 20th anniversary far and wide! Thanks!

### Clojure Gazette – New Format – Looking for New Readers

Monday, June 20th, 2016

From the end of this essay:

Hi! The Clojure Gazette has recently changed from a list of curated links to an essay-style newsletter. I’ve gotten nothing but good comments about the change, but I’ve also noticed the first negative growth of readership since I started. I know these essays aren’t for everyone, but I’m sure there are people out there who would like the new format who don’t know about it. Would you do me a favor? Please share the Gazette with your friends!

The Biggest Waste in Our Industry is the title of the essay I link to above. From the post:

I would like to talk about two nasty habits I have been party to working in software. Those two habits are 1) protecting programmer time and 2) measuring programmer productivity. I’m talking from my experience as a programmer to all the managers out there, or any programmer interested in process.

You can think of Eric’s essay as an update to Peopleware: Productive Projects and Teams by Tom DeMarco and Timothy Lister. Peopleware was first published in 1987, with a second edition in 1999 (8 new chapters) and a third edition in 2013 (5 more pages than the 1999 edition?).

Twenty-nine (29) years after the publication of Peopleware, managers still don’t “get” how to manage programmers (or other creative workers). Disappointing, but not surprising.

It’s not uncommon to read position ads that describe going to lunch en masse, group activities, etc. You would think they were hiring lemmings rather than technical staff.

If your startup founder is that lonely, check the local mission. Hire people for social activities, lunch, etc. Cheaper than hiring salaried staff. Greater variety as well. Ditto for managers with the need to “manage” someone.
### Tufte-inspired LaTeX (handouts, papers, and books)

Monday, June 20th, 2016

From the webpage:

As discussed in the Book Design thread of Edward Tufte’s Ask E.T. Forum, this site is home to LaTeX classes for producing handouts and books according to the style of Edward R. Tufte and Richard Feynman.

Download the latest release, browse the source, join the mailing list, and/or submit patches. Contributors are welcome to help polish these classes!

Some examples of the Tufte-LaTeX classes in action:

• Some papers by Jason Catena using the handout class
• A handout for a math club lecture on volumes of n-dimensional spheres by Marty Weissman
• A draft copy of a book written by Marty Weissman using the new Tufte-book class
• An example handout (source) using XeLaTeX with the bidi class option for the ancient Hebrew by Kirk Lowery

Caution: A Tufte-inspired LaTeX class is no substitute for professional design advice and assistance. It will help you do “better,” for some definition of “better,” but professional design is in a class of its own.

If you are interested in TeX/LaTeX tips, follow TexTips, one of several excellent Twitter feeds by John D. Cook.

### Machine Learning Yearning [New Book – Free Draft – Signup By Friday June 24th (2016)]

Monday, June 20th, 2016

Andrew Ng is Associate Professor of Computer Science at Stanford; Chief Scientist of Baidu; and Chairman and Co-founder of Coursera.

In 2011 he led the development of Stanford University’s main MOOC (Massive Open Online Courses) platform and also taught an online Machine Learning class to over 100,000 students, leading to the founding of Coursera. Ng’s goal is to give everyone in the world access to a great education, for free. Today, Coursera partners with some of the top universities in the world to offer high quality online courses, and is the largest MOOC platform in the world.

Ng also works on machine learning with an emphasis on deep learning. He founded and led the “Google Brain” project which developed massive-scale deep learning algorithms. This resulted in the famous “Google cat” result, in which a massive neural network with 1 billion parameters learned from unlabeled YouTube videos to detect cats. More recently, he continues to work on deep learning and its applications to computer vision and speech, including such applications as autonomous driving.

Haven’t you signed up yet? OK.

What You Will Learn:

The goal of this book is to teach you how to make the numerous decisions needed with organizing a machine learning project. You will learn:

• How to establish your dev and test sets
• Basic error analysis
• How you can use Bias and Variance to decide what to do
• Learning curves
• Comparing learning algorithms to human-level performance
• Debugging inference algorithms
• When you should and should not use end-to-end deep learning
• Error analysis by parts

Free drafts of a new book on machine learning projects, not just machine learning, by one of the leading world experts on machine learning. Now are you signed up?

If you are interested in machine learning, following Andrew Ng on Twitter isn’t a bad place to start.

Be aware, however, that even machine learning experts can be mistaken. For example, Andrew tweeted, favorably, How to make a good teacher from the Economist.

Instilling these techniques is easier said than done. With teaching as with other complex skills, the route to mastery is not abstruse theory but intense, guided practice grounded in subject-matter knowledge and pedagogical methods. Trainees should spend more time in the classroom. The places where pupils do best, for example Finland, Singapore and Shanghai, put novice teachers through a demanding apprenticeship. In America high-performing charter schools teach trainees in the classroom and bring them on with coaching and feedback.
Teacher-training institutions need to be more rigorous—rather as a century ago medical schools raised the calibre of doctors by introducing systematic curriculums and providing clinical experience.

It is essential that teacher-training colleges start to collect and publish data on how their graduates perform in the classroom. Courses that produce teachers who go on to do little or nothing to improve their pupils’ learning should not receive subsidies or see their graduates become teachers. They would then have to improve to survive.

The author conflates “demanding apprenticeship” with “teacher-training colleges start to collect and publish data on how their graduates perform in the classroom,” as though whatever data we collect has some meaningful relationship with teaching and/or the training of teachers.

A “demanding apprenticeship” no doubt weeds out people who are not well suited to be teachers, but there is no evidence that it can make a teacher out of someone who isn’t suited for the task.

The collection of data is one of the ongoing fallacies about American education. That you can collect data is no indication that it is useful and/or has any relationship to what you are attempting to measure.

Follow Andrew for his work on machine learning, not so much for his opinions on education.

### Concealing the Purchase of Government Officials

Monday, June 20th, 2016

Fredreka Schouten reports in House approves Koch-backed bill to shield donors’ names that the US House of Representatives has passed a measure to conceal the purchase of government officials.

From the post:

The House approved a bill Tuesday that would bar the IRS from collecting the names of donors to tax-exempt groups, prompting warnings from campaign-finance watchdogs that it could lead to foreign interests illegally infiltrating American elections.

The measure, which has the support of House Speaker Paul Ryan, R-Wis., also pits the Obama administration against one of the most powerful figures in Republican politics, billionaire industrialist Charles Koch.

Koch’s donor network channels hundreds of millions of dollars each year into groups that largely use anonymous donations to shape policies on everything from health care to tax subsidies. Its leaders have urged the Republican-controlled Congress to clamp down on the IRS, citing free-speech concerns.

The names of donors to politically active non-profit groups aren’t public information now, but the organizations still have to disclose donor information to the IRS on annual tax returns. The bill, written by Rep. Peter Roskam, R-Ill., would prohibit the tax agency from collecting names, addresses or any “identifying information” about donors.

Truth be told, however, “the House” didn’t vote in favor of H.R.5053 – Preventing IRS Abuse and Protecting Free Speech Act. Rather, two-hundred and forty (240) identified representatives voted in favor of H.R.5053.

Two-hundred and forty representatives purchased by campaign contributions who now wish to keep their contributors secret.

Two-hundred and forty representatives who are, as likely as not, guilty of criminal, financial/sexual or other forms of misconduct that could result in their replacement.

Two-hundred and forty representatives who continue in office only so long as they are not exposed to law enforcement and the public.

Where are you going to invest your time and resources? Showing solidarity on issues where substantive change isn’t going to happen, or taking back your government from its current purchasers?

PS: In case you think “substantive change” is possible on gun control, consider the unlikely scenario that “assault weapons” are banned from sale. So what? The ones in circulation number in the millions. Net effect of your “victory” would be exactly zero.

### How do you skim through a digital book?
Sunday, June 19th, 2016

From the post:

We’ve had a couple of digitised books that proved really popular with online audiences. Perhaps partly reflecting the interests of the global population, they’ve been about prostitutes and demons.

I’ve been especially interested in how people have interacted with these popular digitised books. Imagine how you’d pick up a book to look at in a library or bookshop. Would you start from page one, laboriously working through page by page, or would you flip through it, checking for interesting bits? Should we expect any different behaviour when people use a digital book?

We collect data on aggregate (nothing personal or trackable to our users) about what’s being asked of our digitised items in the viewer. With such a large number of views of these two popular books, I’ve got a big enough dataset to get an interesting idea of how readers might be using our digitised books.

Focusing on ‘Compendium rarissimum totius Artis Magicae sistematisatae per celeberrimos Artis hujus Magistros. Anno 1057. Noli me tangere’ (the 18th century one about demons) I’ve mapped the number of page views (horizontal axis) against page number (vertical axis, with front cover at the top), and added coloured bands to represent what’s on those pages.

Chole captured and then analyzed the reading behavior of readers on two very popular electronic titles. She explains her second observation, “Observation 2: People like looking at pictures more than text,” by suggesting that the text being in Latin and German may explain the fondness for the pictures.

Perhaps, but I have heard the same observation made about Playboy magazine. 😉

From a documentation/training perspective, Chole’s technique, applied to digital training materials, could provide guidance on:

• Length of materials
• Use of illustrations
• Organization of materials
• What material is habitually unread?

If critical material isn’t being read, exhorting newcomers to read more carefully is not the answer. If security and/or on-boarding reading isn’t happening, as shown by reader behavior, that’s your fault, not the readers’.

Your call: successful staff and customers, or failing staff and customers you can blame for security faults and declining sales. Choose carefully.

### Electronic Literature Organization

Sunday, June 19th, 2016

From the “What is E-Lit” page:

Electronic literature, or e-lit, refers to works with important literary aspects that take advantage of the capabilities and contexts provided by the stand-alone or networked computer. Within the broad category of electronic literature are several forms and threads of practice, some of which are:

• Hypertext fiction and poetry, on and off the Web
• Kinetic poetry presented in Flash and using other platforms
• Computer art installations which ask viewers to read them or otherwise have literary aspects
• Conversational characters, also known as chatterbots
• Interactive fiction
• Literary apps
• Novels that take the form of emails, SMS messages, or blogs
• Poems and stories that are generated by computers, either interactively or based on parameters given at the beginning
• Collaborative writing projects that allow readers to contribute to the text of a work
• Literary performances online that develop new ways of writing

The ELO showcase, created in 2006 and with some entries from 2010, provides a selection of outstanding examples of electronic literature, as do the two volumes of our Electronic Literature Collection.

The field of electronic literature is an evolving one. Literature today not only migrates from print to electronic media; increasingly, “born digital” works are created explicitly for the networked computer. The ELO seeks to bring the literary workings of this network and the process-intensive aspects of literature into visibility.
The confrontation with technology at the level of creation is what distinguishes electronic literature from, for example, e-books, digitized versions of print works, and other products of print authors “going digital.”

Electronic literature often intersects with conceptual and sound arts, but reading and writing remain central to the literary arts. These activities, unbound by pages and the printed book, now move freely through galleries, performance spaces, and museums. Electronic literature does not reside in any single medium or institution.

I was looking for a recent presentation by Allison Parrish on bots when I encountered the Electronic Literature Organization (ELO). I was attracted by the bot discussion at a recent conference, but as you can see, the range of activities of the ELO is much broader.

Enjoy!

### “invisible entities having arcane but gravely important significances”

Sunday, June 19th, 2016

Allison Parrish tweeted:

https://t.co/sXt6AqEIoZ the “Other, Format” unicode category, full of invisible entities having arcane but gravely important significances

I just could not let a tweet with “invisible entities having arcane but gravely important significances” pass without comment!

As of today, there are one-hundred and fifty (150) such entities, all with multiple properties.

How many of these “invisible entities” are familiar to you?

### Formal Methods for Secure Software Construction

Sunday, June 19th, 2016

Formal Methods for Secure Software Construction by Ben Goodspeed.

Abstract:

The objective of this thesis is to evaluate the state of the art in formal methods usage in secure computing. From this evaluation, we analyze the common components and search for weaknesses within the common workflows of secure software construction. An improved workflow is proposed and appropriate system requirements are discussed. The systems are evaluated and further tools in the form of libraries of functions, data types and proofs are provided to simplify work in the selected system. Future directions include improved program and proof guidance via compiler error messages, and targeted proof steps.

Ben chose Idris for this project, saying:

The criteria for selecting a language for this work were expressive power, theorem proving ability (sufficient to perform universal quantification), extraction/compilation, and performance. Idris has sufficient expressive power to be used as a general purpose language (by design) and has library support for many common tasks (including web development). It supports machine verified proof and universal quantification over its datatypes and can be directly compiled to produce efficiently sized executables with reasonable performance (see section 10.1 for details). Because of these characteristics, we have chosen Idris as the basis for our further work. (at page 57)

The other contenders were Coq, Agda, Haskell, and Isabelle.

Ben provides examples of using Idris and his Proof Driven Development (PDD), but stops well short of solving the problem of secure software construction.

While waiting upon the arrival of viable methods for secure software construction, shouldn’t formal methods be useful in uncovering and documenting failures in current software?

My reasoning is that the greater specificity and exactness of formal methods will draw attention to gaps and failures concealed by custom and practice. This is akin to the human eye eliding over mistakes such as “When the the cat runs.” The average reader “auto-corrects” for the presence of the second “the” in that sentence, even knowing there are two occurrences of the word “the.”

Perhaps that is a better way to say it: formal methods avoid the human tendency to auto-correct or elide over unknown outcomes in code.
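The “When the the cat runs” example is easy to turn into a mechanical check that, unlike a human reader, never auto-corrects. The sketch below is not from Goodspeed’s thesis and is not a formal method; it is a minimal Python illustration, assuming a word is a run of `\w` characters and using a hypothetical helper named `find_doubled_words`, of a check that reports exactly what is on the page:

```python
import re

# A whole word, then whitespace, then the same word again.
# IGNORECASE lets the backreference \1 match "The the" as a repeat;
# the trailing \b stops "the theory" from counting as one.
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_doubled_words(text: str) -> list[tuple[int, str]]:
    """Return (character offset, first copy of the word) for each immediate repetition."""
    return [(m.start(), m.group(1)) for m in DOUBLED_WORD.finditer(text)]


if __name__ == "__main__":
    print(find_doubled_words("When the the cat runs."))
```

The check only reports; it does not judge. Legitimate constructions such as “had had” or “that that” will also be flagged, which is the point: a mechanical check surfaces every instance and leaves the decision to a person, instead of silently eliding it.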
### PSA – Misleading Post On Smartphone Security

Sunday, June 19th, 2016

If you happen across Your smartphone could be hacked without your knowledge by Jennifer Schlesinger and Andrea Day, posted on CNBC, don’t bother to read it. Dissuade others from reading it.

The three threats as listed by the authors:

• Unsecure Wi-Fi
• Operating system flaws
• Malicious apps

What’s missing? Hmm, can you say SS7 vulnerability?

The omission of the SS7 vulnerability is particularly disturbing because, in some ways, it has the easiest defense.

Think about it for a moment. What do I need as the premise for most (not all) successful SS7 hacks? Your smartphone number.

Yes, information you give away with every email, contact information listing, website registration, etc. Not only given away, but archived and available to search engines. If you don’t believe me, try running a web search on your smartphone number.

I understand that your smartphone number is as useful as it is widespread. I’m just pointing out how many times you have tied a noose around your own neck.

The best (partial) defense to SS7 attacks? Limit the distribution of your smartphone number.

When someone omits a root problem of smartphone security in a listing of smartphone security issues, how much trust can you put in the rest of their analysis?

### Palantir Hack Report – What’s Missing?

Sunday, June 19th, 2016

From the post:

Palantir Technologies has cultivated a reputation as perhaps the most formidable data analysis firm in Silicon Valley, doing secretive work for defense and intelligence agencies as well as Wall Street giants. But when Palantir hired professional hackers to test the security of its own information systems late last year, the hackers found gaping holes that left data about customers exposed.

Palantir, valued at $20 billion, prides itself on an ability to guard important secrets, both its own and those entrusted to it by clients.
But after being brought in to try to infiltrate these digital defenses, the cybersecurity firm Veris Group concluded that even a low-level breach would allow hackers to gain wide-ranging and privileged access to the Palantir network, likely leading to the “compromise of critical systems and sensitive data, including customer-specific information.”

This conclusion was presented in a confidential report, reviewed by BuzzFeed News, that detailed the results of a hacking exercise run by Veris over three weeks in September and October last year. The report, submitted on October 19, has been closely guarded inside Palantir and is described publicly here for the first time. “Palantir Use Only” is plastered across each page.

It is not known whether Palantir’s systems have ever been breached by real-world intruders. But the results of the hacking exercise — known as a “red team” test — show how a company widely thought to have superlative ability to safeguard data has struggled with its own data security.

The red team intruders, finding that Palantir lacked crucial internal defenses, ultimately “had complete control of PAL’s domain,” the Veris report says, using an acronym for Palantir. The report recommended that Palantir “immediately” take specific steps to improve its data security.

“The findings from the October 2015 report are old and have long since been resolved,” Lisa Gordon, a Palantir spokesperson, said in an emailed statement. “Our systems and our customers’ information were never at risk. As part of our best practices, we conduct regular reviews and tests of our systems, like every other technology company does.”

Alden gives a lengthy summary of the report, but since Palantir claims the reported risks “…have long since been resolved,” where is the Veris report?

Describing issues in glittering generalities isn’t going to improve anyone’s cybersecurity stance.

So I have to wonder, is How Hired Hackers Got “Complete Control” Of Palantir an extended commercial for Veris? Is it an attempt to sow doubt and uncertainty among Palantir customers?

End of the day, Alden’s summary can be captured in one sentence:

Veris attackers took and kept control of Palantir’s network from day one to the end of the exercise, evading defenders all the way.

How useful is that one sentence summary in improving your cybersecurity stance?

That’s what I thought as well.

PS: I’m interested in pointers to any “leaked” copies of the Veris report on Palantir.

### IRS E-File Bucket – Internet Archive

Saturday, June 18th, 2016

IRS E-File Bucket courtesy of Carl Malamud and Public.Resource.Org.

From the webpage:

This bucket contains a mirror of the IRS e-file release as of June 16, 2016. You may access the source files at https://aws.amazon.com/public-data-sets/irs-990/. The present bucket may or may not be updated in the future.

Note that tarballs of image scans from 2002-2015 are also available in this IRS 990 Forms collection.

Many thanks to the Internal Revenue Service for making this information available. Here is their announcement on June 16, 2016. Here is a statement from Public.Resource.Org congratulating the IRS on a job well done.

As I noted in IRS 990 Filing Data (2001 to date):

990* disclosures aren’t detailed enough to pinch but when combined with other data, say leaked data, the results can be remarkable.

It’s up to you to see that public disclosures pinch.

### Where Has Sci-Hub Gone?

Saturday, June 18th, 2016

While I was writing about the latest EC idiocy (link tax), I was reminded of Sci-Hub.

Just checking to see if it was still alive, I tried http://sci-hub.io/.

404 by standard DNS service.

If you are having the same problem, Mike Masnick reports in Sci-Hub, The Repository Of ‘Infringing’ Academic Papers Now Available Via Telegram, you can access Sci-Hub via:

I’m not on Telegram, yet, but that may be changing soon. 😉

BTW, while writing this update, I stumbled across: The New Napster: How Sci-Hub is Blowing Up the Academic Publishing Industry by Jason Shen.

From the post:

This is obviously piracy. And Elsevier, one of the largest academic journal publishers, is furious. In 2015, the company earned $1.1 billion in profits on $2.9 billion in revenue [2] and Sci-hub directly attacks their primary business model: subscription service it sells to academic organizations who pay to get access to its journal articles. Elsevier filed a lawsuit against Sci-Hub in 2015, claiming Sci-hub is causing irreparable injury to the organization and its publishing partners.

But while Elsevier sees Sci-Hub as a major threat, for many scientists and researchers, the site is a gift from the heavens, because they feel unfairly gouged by the pricing of academic publishing. Elsevier is able to boast a lucrative 37% profit margin because of the unusual (and many might call exploitative) business model of academic publishing:

• Scientists and academics submit their research findings to the most prestigious journal they can hope to land in, without getting any pay.
• The journal asks leading experts in that field to review papers for quality (this is called peer-review and these experts usually aren’t paid)
• Finally, the journal turns around and sells access to these articles back to scientists/academics via the organization-wide subscriptions at the academic institution where they work or study

There’s piracy afoot, of that I have no doubt.

Elsevier:

• Relies on research it does not sponsor
• Research is published in journals of value only because of the free contributions to them
• Elsevier makes a 37% profit off of that free content

There is piracy but Jason fails to point to Elsevier as the pirate.

Sci-Hub/Alexandra Elbakyan is re-distributing intellectual property that was stolen by Elsevier from the academic community, for its own gain.

It’s time to bring Elsevier’s reign of terror against the academic community to an end. Support Sci-Hub in any way possible.

### A Plausible Explanation For The EC Human Brain Project

Saturday, June 18th, 2016

I have puzzled for years over how to explain the EC’s Human Brain Project. See The EC Brain if you need background on this ongoing farce.

While reading Reject Europe’s Plans To Tax Links and Platforms by Jeremy Malcolm, I suddenly understood the motivation for the Human Brain Project!

From the post:

A European Commission proposal to give new copyright-like veto powers to publishers could prevent quotation and linking from news articles without permission and payment. The Copyright for Creativity coalition (of which EFF is a member) has put together an easy survey and answering guide to guide you through the process of submitting your views before the consultation for this “link tax” proposal winds up on 15 June.

Since the consultation was opened, the Commission has given us a peek into some of the industry pressures that have motivated what is, on the face of it, otherwise an inexplicable proposal. In the synopsis report that accompanied the release of its Communication on Online Platforms, it writes that “Right-holders from the images sector and press publishers mention the negative impact of search engines and news aggregators that take away some of the traffic on their websites.” However, this claim is counter-factual, as search engines and aggregators are demonstrably responsible for driving significant traffic to news publishers’ websites. This was proved when a study conducted in the wake of introduction of a Spanish link tax resulted in a 6% decline in traffic to news websites, which was even greater for the smaller sites.

There is a severe shortage of human brains at the European Commission! The Human Brain Project is a failing attempt to remedy that shortage of human brains.

Before you get angry: Europe is full of extremely fine brains. But that isn’t the same thing as saying they are found at the European Commission.

Consider, for example, the farcical request for comments on an outcome the EC had already decided, as cited above. Customary EC favoritism and heavy-handedness.

I would not waste electrons submitting comments to the EC on this issue.

Spend your time mining EU news sources and making fair use of their content. Every now and again, gather up your links and send them to the publications, with a copy to the EC. That way, publications can see the benefits of your linking versus the overhead of the EC.

As the Spanish link tax experience proves, link taxes may deceive property cultists into expecting a windfall, but in truth their revenue will decrease, and what revenue is collected will go to the EC.

There’s the mark of a true EC solution:

The intended “beneficiary” is worse off and the EC absorbs what revenue, if any, results.

### Online Surveillance: …ISIS and beyond [Social Media “chaff”]

Saturday, June 18th, 2016

If you ever doubted that “anti-terror group surveillance tools” should always be titled “group surveillance tools,” New online ecology of adversarial aggregates: ISIS and beyond. Science, 2016; 352 (6292): 1459 DOI: 10.1126/science.aaf0675 by N. F. Johnson, et al., puts those doubts to rest.

Unintentionally no doubt, but the “…ISIS and beyond” part of the title signals this technique is not limited to ISIS.

Consider the abstract:

Support for an extremist entity such as Islamic State (ISIS) somehow manages to survive globally online despite considerable external pressure and may ultimately inspire acts by individuals having no history of extremism, membership in a terrorist faction, or direct links to leadership. Examining longitudinal records of online activity, we uncovered an ecology evolving on a daily time scale that drives online support, and we provide a mathematical theory that describes it. The ecology features self-organized aggregates (ad hoc groups formed via linkage to a Facebook page or analog) that proliferate preceding the onset of recent real-world campaigns and adopt novel adaptive mechanisms to enhance their survival. One of the predictions is that development of large, potentially potent pro-ISIS aggregates can be thwarted by targeting smaller ones.

Here’s the abstract re-written for the anti-war movement of the 1960’s:

Support for extremists such as the anti-Vietnam War movement somehow manages to survive nationally online despite considerable external pressure and may ultimately inspire acts by individuals having no history of extremism, membership in an anti-war faction, or direct links to leadership. Examining longitudinal records of online activity, we uncovered an ecology evolving on a daily time scale that drives online support, and we provide a mathematical theory that describes it. The ecology features self-organized aggregates (ad hoc groups formed via linkage to a Facebook page or analog) that proliferate preceding the onset of recent real-world campaigns and adopt novel adaptive mechanisms to enhance their survival. One of the predictions is that development of large, potentially potent pro-anti-War aggregates can be thwarted by targeting smaller ones.

Here’s the abstract re-written for the civil rights movement of the 1960’s:

Support for extremists such as SNCC somehow manages to survive nationally online despite considerable external pressure and may ultimately inspire acts by individuals having no history of extremism, membership in a SNCC faction, or direct links to leadership. Examining longitudinal records of online activity, we uncovered an ecology evolving on a daily time scale that drives online support, and we provide a mathematical theory that describes it. The ecology features self-organized aggregates (ad hoc groups formed via linkage to a Facebook page or analog) that proliferate preceding the onset of recent real-world campaigns and adopt novel adaptive mechanisms to enhance their survival. One of the predictions is that development of large, potentially potent SNCC aggregates can be thwarted by targeting smaller ones.

Here’s the abstract re-written for the gay rights movement:

Support for extremists such as gay rights supporters somehow manages to survive nationally online despite considerable external pressure and may ultimately inspire acts by individuals having no history of extremism, membership in a gay rights faction, or direct links to leadership. Examining longitudinal records of online activity, we uncovered an ecology evolving on a daily time scale that drives online support, and we provide a mathematical theory that describes it. The ecology features self-organized aggregates (ad hoc groups formed via linkage to a Facebook page or analog) that proliferate preceding the onset of recent real-world campaigns and adopt novel adaptive mechanisms to enhance their survival. One of the predictions is that development of large, potentially potent gay rights aggregates can be thwarted by targeting smaller ones.

The government has admitted to using surveillance against all three (civil rights, anti-Vietnam War, and gay rights), which, in the words of Justice Holmes, “…was an outrage which the Government now regrets….”

I mention those cases so the current fervor against “terrorists” doesn’t blind us to the need for counters to every technique for disrupting “terrorists.”

“Terrorists” being a label applied to people with whom some group or government disagrees. Frequently almost entirely fictional, as in the case of the United States. The FBI recruits the mentally ill in order to provide some credence to its hunt for terrorists in the US.

One obvious counter to the aggregate analysis proposed by the authors would be a series of AI-driven aggregates that are auto-populated and supplied with content derived from human users.

Defeating suppression with a large number of “fake” aggregates. Think of it as social media “chaff.”

If you think about it, separating wheat from chaff is a subject identity issue. 😉

Production of social media “chaff” and influencing papers such as this one is an open research subject.

If you have a cause, I have some time.

### Modelling Stems and Principal Part Lists (Attic Greek)

Friday, June 17th, 2016

From the post:

This is part 0 of a series of blog posts about modelling stems and principal part lists, particularly for Attic Greek but hopefully more generally applicable. This is largely writing up work already done but I’m doing cleanup as I go along as well.

A core part of the handling of verbs in the Morphological Lexicon is the set of terminations and sandhi rules that can generate paradigms attested in grammars like Louise Pratt’s The Essentials of Greek Grammar. Another core part is the stem information for a broader range of verbs usually conveyed in works like Pratt’s in the form of lists of principal parts.

A rough outline of future posts is:

• the sources of principal part lists for this work
• lemmas in the Pratt principal parts
• lemma differences across lists
• what information is captured in each of the lists individually
• how to model a merge of the lists
• inferring stems from principal parts
• stems, terminations and sandhi
• relationships between stems
• ???

I’ll update this outline with links as posts are published.

(emphasis in original)

A welcome reminder of projects that transcend the ephemera that is social media.

Or should I say “modern” social media?

The texts we parse so carefully were originally spoken, recorded and copied, repeatedly, without the benefit of modern reference grammars and/or dictionaries.

Enjoy!

### Volumetric Data Analysis – yt

Friday, June 17th, 2016

One of those rotating homepages:

Volumetric Data Analysis – yt

yt is a python package for analyzing and visualizing volumetric, multi-resolution data from astrophysical simulations, radio telescopes, and a burgeoning interdisciplinary community.

Quantitative Analysis and Visualization

yt is more than a visualization package: it is a tool to seamlessly handle simulation output files to make analysis simple. yt can easily knit together volumetric data to investigate phase-space distributions, averages, line integrals, streamline queries, region selection, halo finding, contour identification, surface extraction and more.

Many formats, one language

yt aims to provide a simple uniform way of handling volumetric data, regardless of where it is generated. yt currently supports FLASH, Enzo, Boxlib, Athena, arbitrary volumes, Gadget, Tipsy, ART, RAMSES and MOAB. If your data isn’t already supported, why not add it?

From the non-rotating part of the homepage:

To get started using yt to explore data, we provide resources including documentation, workshop material, and even a fully-executable quick start guide demonstrating many of yt’s capabilities.

But if you just want to dive in and start using yt, we have a long list of recipes demonstrating how to do various tasks in yt. We even have sample datasets from all of our supported codes on which you can test these recipes. While yt should just work with your data, here are some instructions on loading in datasets from our supported codes and formats.

Professional astronomical data and tools like yt put exploration of the universe at your fingertips!

Enjoy!

### Hacking Any Facebook Account – SS7 Weakness

Friday, June 17th, 2016

How to Hack Someones Facebook Account Just by Knowing their Phone Numbers by Swati Khandelwal.

From the post:

Hacking Facebook account is one of the major queries on the Internet today. It’s hard to find — how to hack Facebook account, but researchers have just proven by taking control of a Facebook account with only the target’s phone number and some hacking skills.

Yes, your Facebook account can be hacked, no matter how strong your password is or how much extra security measures you have taken. No joke!

Hackers with skills to exploit the SS7 network can hack your Facebook account. All they need is your phone number.

The weaknesses in the part of global telecom network SS7 not only let hackers and spy agencies listen to personal phone calls and intercept SMSes on a potentially massive scale but also let them hijack social media accounts to which you have provided your phone number.

Swati’s post has the details and a video of the hack in action.

Of greater interest than hacking Facebook accounts, however, is the weakness in the SS7 network. Hacking Facebook accounts is good for intelligence gathering, annoying the defenseless, etc., but a fundamental weakness in the telecom network is something different.

Swati quotes a Facebook clone as saying:

“Because this technique [SS7 exploitation] requires significant technical and financial investment, it is a very low risk for most people,”

Here’s the video from Swati’s post (2:42 in length):

Having watched it, can you point out the “…significant technical and financial investment…” involved in that hack?

What investment would you make for a hack that opens up Gmail, Twitter, WhatsApp, Telegram, Facebook, any service that uses SMS, to attack?

Definitely a hack for your intelligence gathering toolkit.

### Visualizing your Titan graph database:…

Friday, June 17th, 2016

From the post:

Last summer, we wrote a blog with our five simple steps to visualizing your Titan graph database with KeyLines. Since then TinkerPop has emerged from the Apache Incubator program with TinkerPop3, and the Titan team have released v1.0 of their graph database:

• TinkerPop3 is the latest major reincarnation of the graph project, pulling together the multiple ventures into a single united ecosystem.
• Titan 1.0 is the first stable release of the Titan graph database, based on the TinkerPop3 stack.

We thought it was about time we updated our five-step process, so here’s:

Not exactly five (5) steps because you have to acquire a KeyLines trial key, etc.

A great endorsement of the much improved installation process for TinkerPop3 and Titan 1.0.

Enjoy!

### RSA Cybersecurity Poverty Index [Safety in Numbers?]

Thursday, June 16th, 2016

RSA Research: 75% of Organizations are at Significant Risk of Cyber Incidents

Highlights from the post:

• For the second straight year, 75% of survey respondents have a significant cybersecurity risk exposure
• Organizations that report more business-impacting security incidents are 65% more likely to have advanced cyber maturity capabilities
• Half of those surveyed assess their incident response capabilities as either “ad hoc” or “nonexistent”
• Less mature Organizations continue to mistakenly implement more perimeter technologies as a stop gap measure to prevent incidents from occurring
• Government and Energy ranked lowest among industries in cyber preparedness
• American entities continue to rank themselves behind both APJ and EMEA in overall cyber maturity

Relying on the cybersecurity poverty of others to make them likelier targets than you is like increasing the size of a herd of sheep to reduce the odds of a wolf carrying off any particular one.

That works, but is of little consolation to the sheep that is carried off.

Are you depending on other sheep being carried off?

### Are Non-AI Decisions “Open to Inspection?”

Thursday, June 16th, 2016

From the post:

As our civilization becomes more and more reliant upon computers and other intelligent devices, there arises specific moral issue that designers and programmers will inevitably be forced to address. Among these concerns is trust. Can we trust that the AI we create will do what it was designed to without any bias? There’s also the issue of incorruptibility. Can the AI be fooled into doing something unethical? Can it be programmed to commit illegal or immoral acts? Transparency comes to mind as well. Will the motives of the programmer or the AI be clear? Or will there be ambiguity in the interactions between humans and AI? The list of questions could go on and on.

Imagine if the government uses a machine-learning algorithm to recommend applications for student loan approvals. A rejected student and or parent could file a lawsuit alleging that the algorithm was designed with racial bias against some student applicants. The defense could be that this couldn’t be possible since it was intentionally designed so that it wouldn’t have knowledge of the race of the person applying for the student loan. This could be the reason for making a system like this in the first place — to assure that ethnicity will not be a factor as it could be with a human approving the applications. But suppose some racial profiling was proven in this case.

If directed evolution produced the AI algorithm, then it may be impossible to understand why, or even how. Maybe the AI algorithm uses the physical address data of candidates as one of the criteria in making decisions. Maybe they were born in or at some time lived in poverty‐stricken regions, and that in fact, a majority of those applicants who fit these criteria happened to be minorities. We wouldn’t be able to find out any of this if we didn’t have some way to audit the systems we are designing. It will become critical for us to design AI algorithms that are not just robust and scalable, but also easily open to inspection.
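The proxy effect described in the quoted passage can be illustrated with a toy sketch. Everything in it is hypothetical: the applicants, the "region" field standing in for address data, and the decision rule are made up for illustration. The rule never sees group membership, yet an outcome audit that keeps the group label aside still exposes the disparity.

```python
# Toy illustration of a proxy variable, using entirely made-up applicants.
# The decision rule never sees "group", but a hypothetical "region" field
# correlated with group reproduces a disparity anyway.
from collections import defaultdict

# (group, region, income_k): hypothetical records, not real data
APPLICANTS = [
    ("A", "north", 40), ("A", "north", 55), ("A", "north", 35), ("A", "south", 60),
    ("B", "south", 45), ("B", "south", 50), ("B", "south", 38), ("B", "north", 52),
]


def score(region: str, income_k: int) -> bool:
    """Hypothetical decision rule: uses only region and income, never group."""
    # A stricter threshold for one region acts as a proxy for group membership.
    threshold = 40 if region == "north" else 55
    return income_k >= threshold


def approval_rates(records):
    """Audit step: approval rate per group, computed outside the model."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, region, income_k in records:
        total[group] += 1
        approved[group] += score(region, income_k)  # bool counts as 0 or 1
    return {g: approved[g] / total[g] for g in total}


rates = approval_rates(APPLICANTS)
print(rates)  # {'A': 0.75, 'B': 0.25}
```

The audit only works because the group label was kept somewhere outside the model; removing the attribute altogether removes the means of detecting the bias along with its direct use.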

While I can appreciate the desire to make AI algorithms that are “…easily open to inspection…,” I feel compelled to point out that human decision making has resisted such openness for thousands of years.

There are the tales we tell each other about “rational” decision making but those aren’t how decisions are made, rather they are how we justify decisions made to ourselves and others. Not exactly the same thing.

Recall the parole-granting behavior of Israeli judges, which depended on how recently they had eaten. Certainly all of those judges would argue for their “rational” decisions, but meal time was a better predictor than any other factor. (Extraneous factors in judicial decisions)

My point being that if we struggle to even articulate the actual basis for non-AI decisions, where is our model for making AI decisions “open to inspection?” What would that look like?

You could say, for example, no discrimination based on race. OK, but that’s not going to work if you want to purposely set up scholarships for minority students.

When you object, “…that’s not what I meant! You know what I mean!…,” well, I might, but try convincing an AI that has no social context of what you “meant.”

The openness of AI decisions to inspection is an important issue but the human record in that regard isn’t encouraging.

### IRS 990 Filing Data (2001 to date)

Thursday, June 16th, 2016

IRS 990 Filing Data Now Available as an AWS Public Data Set

From the post:

We are excited to announce that over one million electronic IRS 990 filings are available via Amazon Simple Storage Service (Amazon S3). Filings from 2011 to the present are currently available and the IRS will add new 990 filing data each month.

(image omitted)

Form 990 is the form used by the United States Internal Revenue Service (IRS) to gather financial information about nonprofit organizations. By making electronic 990 filing data available, the IRS has made it possible for anyone to programmatically access and analyze information about individual nonprofits or the entire nonprofit sector in the United States. This also makes it possible to analyze it in the cloud without having to download the data or store it themselves, which lowers the cost of product development and accelerates analysis.

Each electronic 990 filing is available as a unique XML file in the “irs-form-990” S3 bucket in the AWS US East (N. Virginia) region. Information on how the data is organized and what it contains is available on the IRS 990 Filings on AWS Public Data Set landing page.
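Since each filing is a standalone XML document, a few header fields can be pulled out with Python's standard library. This is a minimal sketch under stated assumptions: the namespace and element names (ReturnHeader, Filer, EIN, BusinessNameLine1Txt, TaxYr) are assumed from later IRS e-file schemas and may differ in older filings, the sample document is invented and heavily trimmed, and fetching files from the bucket is left out.

```python
# Minimal sketch: extract header fields from one e-filed Form 990 XML document.
# ASSUMPTION: the namespace and element paths below follow later IRS e-file
# schemas; older schema versions use different names, so check real filings.
import xml.etree.ElementTree as ET

NS = {"efile": "http://www.irs.gov/efile"}

# An invented, heavily trimmed filing used only to exercise the parser.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<Return xmlns="http://www.irs.gov/efile" returnVersion="2015v2.1">
  <ReturnHeader>
    <TaxYr>2015</TaxYr>
    <Filer>
      <EIN>000000000</EIN>
      <BusinessName><BusinessNameLine1Txt>EXAMPLE CHARITY</BusinessNameLine1Txt></BusinessName>
    </Filer>
  </ReturnHeader>
</Return>"""


def header_fields(xml_text: str) -> dict:
    """Return EIN, organization name, and tax year; missing elements give None."""
    root = ET.fromstring(xml_text.encode("utf-8"))

    def text(path):
        return root.findtext(path, default=None, namespaces=NS)

    return {
        "ein": text("efile:ReturnHeader/efile:Filer/efile:EIN"),
        "name": text("efile:ReturnHeader/efile:Filer/efile:BusinessName/efile:BusinessNameLine1Txt"),
        "tax_year": text("efile:ReturnHeader/efile:TaxYr"),
    }


print(header_fields(SAMPLE))
```

Returning None for absent elements, rather than raising, makes it easier to run the same function over filings from different schema years and count what is missing.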

Some of the forms and instructions that will help you make sense of the data reported:

As always, use caution with law related data as words may have unusual nuances and/or unexpected meanings.

These forms and instructions are only a tiny part of a vast iceberg of laws, regulations, rulings, court decisions and the like.

990* disclosures aren’t detailed enough to pinch but when combined with other data, say leaked data, the results can be remarkable.

### Securing Captured Intelligence

Thursday, June 16th, 2016

In Intelligence Gathering… [Capturing Intelligence] I closed with the thought that securing of captured intelligence wasn’t discussed in Intelligence Gathering & Its Relationship to the Penetration Testing Process by Dimitar Kostadinov.

Security wasn’t Dimitar’s focus so the omission was understandable, but I can’t recall seeing any discussion of securing the results of intelligence gathering. Can you?

Are intelligence results by default subject to the same (lack of) security that most of us practice on our computers?

That’s ironic given that the goal of intelligence gathering is the penetration of other computers.

If your first response is that you have encrypted your hard drive, consider Indefinite prison for suspect who won’t decrypt hard drives, feds say by David Kravets.

I agree that the suspect in that case has the far better argument (and case law), but on the other hand, you will note he has been in prison for seven months while the government argues it “knows” he is guilty.

The government’s claim of knowledge is puzzling because if they have proof of his guilt, why not proceed to trial? Ah, yes, that is an inconvenient question for the prosecution.

As I said, the case law appears to be on the side of the suspect but the prosecution has still cost him months of his life and depending on the decision of the Third Circuit, that could stretch into years.

An encrypted hard drive and refusal to unlock it may save you, at least for a while, from prosecution for hacking, but how much time do you want to spend in jail just for having an encrypted drive?

I’m not saying an encrypted drive is a bad idea. It’s a nice first line of defense, but it isn’t a slam dunk when it comes to concealing information.

Within an encrypted drive, my concealment of captured hacking intelligence should meet the following requirements:

1. The captured hacking intelligence should be concealed in plain sight. That is, a casual observer should not be able to distinguish the captured hacking intelligence file from any other file of a similar nature.
2. Even if the captured hacking intelligence file is identified, it should not be possible for a prosecutor to prove specified content was in fact recorded in that file.
3. As a counter to whatever fanciful claims prosecutors make, it should be possible to produce an innocent text from the captured intelligence file in a repeatable way, one that does not enable prosecutors to do the same thing with specified content.
4. Finally, it must be possible to effectively use and supplement the captured hacking intelligence content.

Notice that brevity is not a requirement. Storage space is virtually unlimited so unless you are creating an encyclopedia for one hacking job, I don’t see that as an issue.

Other requirements?

Suggestions for solutions that meet the requirements I outlined above?

### Intelligence Gathering… [Capturing Intelligence]

Thursday, June 16th, 2016

From the post:

Penetration testing simulates real cyber-attacks, either directly or indirectly, to circumvent security systems and gain access to a company’s information assets. The whole process, however, is more than just playing automated tools and then proceed to write down a report, submit it and collect the check.

The Penetration Testing Execution Standard (PTES) is a norm adopted by leading members of the security community as a way to establish a set of fundamental principles of conducting a penetration test. Seven phases lay the foundations of this standard: Pre-engagement Interactions, Information Gathering, Threat Modeling, Exploitation, Post Exploitation, Vulnerability Analysis, Reporting.

Intelligence gathering is the first stage in which direct actions against the target are taken. One of the most important ability a pen tester should possess is to know how to learn as much as possible about a targeted organization without the test has even begun – for instance, how this organization operates and its day-to-day business dealings – but most of all, he should make any reasonable endeavor to learn more about its security posture and, self-explanatory, how this organization can be attacked effectively. So, every piece of information that a pen tester can gather will provide invaluable insights into essential characteristics of the security systems in place.

Great introduction to intelligence gathering with links to some of the more obvious tools and coverage of common techniques.

My only reservation is that Dimitar doesn’t mention how you capture the intelligence you have gathered.

Text document edited in Emacs?

Word document (shudder) under control of a SharePoint (shudder, shudder) server?

Graph/Topic Map?

Intelligence gathering results in the non-linear discovery of arbitrary relationships and facts. Don’t limit yourself to a linear capture methodology, however necessary linear reports are for others.

My vote is with graphs/topic maps.
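A minimal sketch of what graph-style capture might look like, using only the Python standard library: facts are stored as (subject, relation, object) triples, and a union-find over identifiers lets you declare that two identifiers denote the same subject, a much simplified nod to topic map subject identity. The class, its methods, and the sample facts (example.com names, a documentation IP address) are all hypothetical, and treating a hostname and its address as one subject is a simplification for this sketch.

```python
class IntelGraph:
    """Hypothetical store for gathered intelligence as a small graph."""

    def __init__(self):
        self.parent = {}  # union-find forest over identifiers
        self.facts = []   # raw (subject, relation, object) triples, kept as recorded

    def _find(self, ident):
        """Return the representative identifier for ident's subject."""
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            self.parent[ident] = self.parent[self.parent[ident]]  # path halving
            ident = self.parent[ident]
        return ident

    def add(self, subject, relation, obj):
        """Record one fact; nothing is ever overwritten."""
        self.facts.append((subject, relation, obj))

    def same_subject(self, a, b):
        """Declare that identifiers a and b denote the same subject."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def about(self, ident):
        """All facts touching the subject ident denotes, under any identifier."""
        root = self._find(ident)
        return sorted(
            fact for fact in self.facts
            if self._find(fact[0]) == root or self._find(fact[2]) == root
        )


g = IntelGraph()
g.add("example.com", "mx_record", "mail.example.com")
g.add("mail.example.com", "runs", "Postfix")
g.add("192.0.2.10", "open_port", "25")
g.add("example.org", "hosted_by", "Example ISP")
g.same_subject("mail.example.com", "192.0.2.10")  # hostname and address merged

for fact in g.about("192.0.2.10"):
    print(fact)
```

Keeping the raw triples untouched and resolving identity only at query time means a mistaken merge can be undone by rebuilding the union-find, without losing anything that was recorded.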

Since he didn’t mention recording your intelligence, Dimitar also doesn’t discuss how you secure your captured intelligence. But that’s a topic for another post.

### A Taste of the DNC

Wednesday, June 15th, 2016

From the post:

Worldwide known cyber security company CrowdStrike announced that the Democratic National Committee (DNC) servers had been hacked by “sophisticated” hacker groups.

I’m very pleased the company appreciated my skills so highly))) But in fact, it was easy, very easy.

Guccifer may have been the first one who penetrated Hillary Clinton’s and other Democrats’ mail servers. But he certainly wasn’t the last. No wonder any other hacker could easily get access to the DNC’s servers.

Shame on CrowdStrike: Do you think I’ve been in the DNC’s networks for almost a year and saved only 2 documents? Do you really believe it?

Here are just a few docs from many thousands I extracted when hacking into DNC’s network.

A taste of what was liberated from the DNC servers, including:

• Donald Trump Report.
• DNC donor lists (compare to FEC records).
• A secret document from Clinton’s days as Secretary of State.
• A scattering of other documents.

The main part of the papers were given to Wikileaks.

Sigh.

Hopefully that won’t mean sanitized documents but we will have to wait and see. Remember the Afghan War Diaries? Edited so as to not discomfort the U.S. government too much.

### If You Believe In OpenAccess, Do You Practice OpenAccess?

Wednesday, June 15th, 2016

CSC-OpenAccess LIBRARY

From the webpage:

CSC Open-Access Library aim to maintain and develop access to journal publication collections as a research resource for students, teaching staff, researchers and industrialists.

You can see a complete listing of the journals here.

Before you protest these are not Science or Nature, remember that Science and Nature did not always have the reputations they do today.

Let the quality of your work bolster the reputations of open access publications and attract others to them.

### I’ll See You The FBI’s 411.9 million images and raise 300 million more, per day

Wednesday, June 15th, 2016

From the post:

Today the federal Government Accountability Office (GAO) finally published its exhaustive report on the FBI’s face recognition capabilities. The takeaway: FBI has access to hundreds of millions more photos than we ever thought. And the Bureau has been hiding this fact from the public—in flagrant violation of federal law and agency policy—for years.

According to the GAO Report, FBI’s Facial Analysis, Comparison, and Evaluation (FACE) Services unit not only has access to FBI’s Next Generation Identification (NGI) face recognition database of nearly 30 million civil and criminal mug shot photos, it also has access to the State Department’s Visa and Passport databases, the Defense Department’s biometric database, and the drivers license databases of at least 16 states. Totaling 411.9 million images, this is an unprecedented number of photographs, most of which are of Americans and foreigners who have committed no crimes.

I understand and share the concern over the FBI’s database of 411.9 million images from identification sources, but let’s be realistic about the FBI’s share of all the image data.

Not an exhaustive list but:

Facebook alone equals the FBI photo count every 1.3 days. Moreover, Facebook data is tied to Facebook and, very likely, to other social media data, unlike my driver’s license.

Instagram takes a little over 5 days to exceed the FBI image count. But like the little engine that could, it keeps trying.

I’m not sure how to count YouTube’s 300 hours of video every minute.

No reliable counts are available for porn images, although Pornhub alone streamed 1,892 petabytes of data in 2015.

The Pornhub data stream includes a lot of duplication, but finding non-religious and reliable stats on porn is difficult. Try searching for statistics on porn images: you get speculation, guesses, etc.

Based on those figures, it’s fair to say the number of images available to the FBI is somewhere North of 100 billion and growing.
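The arithmetic behind these comparisons fits in a few lines of Python. This sketch assumes that the 300 million images per day in the title is Facebook's daily photo upload rate and that the rate stays constant for a full year. Both are simplifications, not figures from the GAO report. At that rate, Facebook matches the FBI's 411.9 million images in about 1.37 days and adds roughly 109.5 billion images a year, so Facebook alone clears 100 billion.

```python
# Figures from the post: images accessible to the FBI per the GAO report,
# and the 300 million images/day from the title, assumed here to be Facebook's rate.
FBI_IMAGES = 411_900_000
FACEBOOK_PER_DAY = 300_000_000

# Days of Facebook uploads needed to equal the FBI's total.
days_to_match = FBI_IMAGES / FACEBOOK_PER_DAY

# Assumption: the daily rate holds steady for 365 days.
one_year_total = FACEBOOK_PER_DAY * 365

print(f"Days for Facebook to match the FBI count: {days_to_match:.2f}")
print(f"Facebook images in one year at that rate: {one_year_total:,}")
```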

Oh, you think non-public photos are off-limits to the FBI?

Hmmm, so is lying to federal judges, or so they say.

The FBI may say they are following safeguards, etc., but once an agency develops a culture of lying “in the public’s interest,” why would you ever believe them?

If you believe the FBI now, shouldn’t you say: Shame on me?