Geek Jeopardy – Display Random Man Page

November 22nd, 2016

While writing up Julia Evans’ Things to learn about Linux, I thought it would be cool to display random man pages.

Which resulted in this one-liner in an executable file (man-random, invoke ./man-random):

man $(ls /usr/share/man/man* | shuf -n1 | cut -d. -f1)

As written, it displays a random page from the directories man1 – man8.

If you replace /man* with /man1/, you will only get results for man1 (the usual default).
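One caveat: with multiple directories as arguments, ls also prints directory headers and blank lines, which shuf can occasionally pick, making the one-liner fail. Here is a minimal Python equivalent that sidesteps ls parsing (it assumes the usual Ubuntu layout of /usr/share/man/man1 through man8):

import glob, os, random, subprocess

# Pick a random page file across sections 1-8 and strip its extension,
# mirroring the cut -d. -f1 step of the one-liner.
pages = glob.glob("/usr/share/man/man[1-8]/*")
name = os.path.basename(random.choice(pages)).split(".")[0]
subprocess.run(["man", name])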

All of which made me think of Geek Jeopardy!

Can you name these commands from their first-paragraph descriptions (names omitted)?

  • remove sections from each line of files
  • pattern scanning and processing language
  • stream editor for filtering and transforming text
  • generate random permutations
  • filter reverse line feeds from input
  • dump files in octal and other formats

Looks easy now, but after a few glasses of holiday cheer? With spectators? Ready to try another man page section?
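If you need fresh questions, a short sketch can generate them from whatis one-liners (the section choice and output parsing are assumptions about a typical Linux man-db setup):

import glob, os, random, subprocess

section = "1"  # try "8" for sysadmin-flavored questions
pages = glob.glob("/usr/share/man/man%s/*" % section)
answer = os.path.basename(random.choice(pages)).split(".")[0]
out = subprocess.run(["whatis", answer], capture_output=True, text=True).stdout
question = out.splitlines()[0].split(" - ", 1)[-1] if out else "(no description found)"
print("Q:", question)
# print("A:", answer)  # reveal after the glasses of holiday cheer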

Enjoy!

Solution:

  • cut: remove sections from each line of files
  • awk: pattern scanning and processing language
  • sed: stream editor for filtering and transforming text
  • shuf: generate random permutations
  • col: filter reverse line feeds from input
  • od: dump files in octal and other formats

PS: I changed the wildcard in the fourth suggested solution from “?” to “*” to arrive at my solution. (Ubuntu 14.04)

Things to learn about Linux

November 22nd, 2016

Things to learn about Linux

From the post:

I asked on Twitter today what Linux things they would like to know more about. I thought the replies were really cool so here’s a list (many of them could be discussed on any Unixy OS, some of them are Linux-specific)

I count forty-seven (47) entries on Julia’s list, which should keep you busy through any holiday!

Enjoy!

The five-step fact-check (Africa Check)

November 22nd, 2016

The five-step fact-check from Africa Check

From the post:

Print our useful flow-chart and stick it up in a place where you can quickly refer to it when a deadline is pressing.

[Image: Africa Check’s five-step fact-check flow chart]

Click here to download the PDF for printing.

A great fact checking guide for reporters but useful insight for readers as well.

What’s missing from a story you are reading right now?

Africa Check offers to fact-check claims about Africa tweeted with #AfricaCheckIt.

There’s a useful service to the news community!

A quick example: eNCA (a South African news site) claimed Zimbabwe’s President Robert Mugabe announced his retirement.

Africa Check responded with Mugabe’s original words plus translation.

I don’t read Mugabe as announcing his retirement but see for yourself.

Advancing exploitation: a scriptless 0day exploit against Linux desktops

November 22nd, 2016

Advancing exploitation: a scriptless 0day exploit against Linux desktops by Chris Evans.

From the post:

A powerful heap corruption vulnerability exists in the gstreamer decoder for the FLIC file format. Presented here is an 0day exploit for this vulnerability.

This decoder is generally present in the default install of modern Linux desktops, including Ubuntu 16.04 and Fedora 24. Gstreamer classifies its decoders as “good”, “bad” or “ugly”. Despite being quite buggy, and not being a format at all necessary on a modern desktop, the FLIC decoder is classified as “good”, almost guaranteeing its presence in default Linux installs.

Thanks to solid ASLR / DEP protections on (some) modern 64-bit Linux installs, and some other challenges, this vulnerability is a real beast to exploit.

Most modern exploits defeat protections such as ASLR and DEP by using some form of scripting to manipulate the environment and make dynamic decisions and calculations to move the exploit forward. In a browser, that script is JavaScript (or ActionScript etc.) When attacking a kernel from userspace, the “script” is the userspace program. When attacking a TCP stack remotely, the “script” is the program running on the attacker’s computer. In my previous full gstreamer exploit against the NSF decoder, the script was an embedded 6502 machine code program.

But in order to attack the FLIC decoder, there simply isn’t any scripting opportunity. The attacker gets, once, to submit a bunch of scriptless bytes into the decoder, and try and gain code execution without further interaction…

… and good luck with that! Welcome to the world of scriptless exploitation in an ASLR environment. Let’s give it our best shot.

Above my head, at the moment, but I post it as a test for hackers who want to test their understanding/development of exploits.

BTW, some wag, I didn’t bother to see which one, complained Chris’ post is “irresponsible disclosure.”

Sure, the CIA, FBI, NSA and their counterparts in other governments, plus their cybersecurity contractors, should have sole access to such exploits. Ditto for the projects concerned. (NOT!)

“Responsible disclosure” is just another name for unilateral disarmament, on behalf of all of us.

Open and public discussion is much better.

Besides, a hack of Ubuntu 16.04 won’t be relevant at most government installations for years.

Plenty of time for a patched release. ;-)

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

November 22nd, 2016

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

From the post:

Dr Johanna Green is a lecturer in Book History and Digital Humanities at the University of Glasgow. Her PhD (English Language, University of Glasgow 2012) focused on a palaeographical study of the textual division and subordination of the Exeter Book manuscript. Here, she tells us about the first of two sessions she led for the Society of Northumbrian Scribes, a group of calligraphers based in North East England, bringing palaeographic research and modern-day calligraphy together for the public.
(emphasis in original)

Not phrased in subject identity language, but concerns familiar to the topic map community are not far away:


My own research centres on the scribal hand of the manuscript, specifically the ways in which the poems are divided and subdivided from one another and the decorative designs used for these litterae notabiliores throughout. For much of my research, I have spent considerable time (perhaps more than I am willing to admit) wondering where one ought to draw the line with palaeography. When do the details become so tiny to no longer be of any significance? When are they just important enough to mean something significant for our understanding of how the manuscript was created and arranged? How far am I willing to argue that these tiny features have significant impact? Is, for example, this littera notabilior Đ on f. 115v (Judgement Day I, left) different enough in a significant way to this H on f.97v, (The Partridge, bottom right), and in turn are both of these litterae notabiliores performing a different function than the H on f.98r (Soul and Body II, far right)?[5]
(emphasis in original, footnote omitted)

When Dr. Green says:

…When do the details become so tiny to no longer be of any significance?…

I would say: When do the subjects (details) become so tiny we want to pass over them in silence? That is, they could be, but are not, represented in a topic map.

Green ends her speculation, to a degree, by enlisting scribes to re-create the manuscript of interest under her observation.

I’ll leave her conclusions for her post but consider a secondary finding:


The experience also made me realise something else: I had learned much by watching them write and talking to them during the process, but I had also learned much by trying to produce the hand myself. Rather than return to Glasgow and teach my undergraduates the finer details of the script purely through verbal or written description, perhaps providing space for my students to engage in the materials of manuscript production, to try out copying a script/exemplar for themselves would help increase their understanding of the process of writing and, in turn, deepen their knowledge of the constituent parts of a letter and their significance in palaeographic endeavour. This last is something I plan to include in future palaeography teaching.

Dr. Green’s concern over palaeographic detail illustrates two important points about topic maps:

  1. Potential subjects for a topic map are always unbounded.
  2. Different people “see” different subjects.

Which also accounts for my yawn when Microsoft drops the Microsoft Concept Graph, with more than 5.4 million concepts.

…[M]ore than 5.4 million concepts[?]

Hell, Copleston’s A History of Philosophy easily has more concepts.

But the Microsoft Concept Graph is more useful than a topic map of Copleston in your daily, shallow, social sea.

What subjects do you see and how would capturing them and their identities make a difference in your life (professional or otherwise)?

PubMed comments & their continuing conversations

November 21st, 2016

PubMed comments & their continuing conversations

From the post:

We have many options for communication. We can choose platforms that fit our style, approach, and time constraints. From pop culture to current events, information and opinions are shared and discussed across multiple channels. And scientific publications are no exception.

PubMed Commons was established to enable commenting in PubMed, the largest biomedical literature database. In the past year, commenters posted to more than 1,400 publications. Of those publications, 80% have a single comment today, and 12% have comments from multiple members. The conversation carries forward in other venues.

Sometimes comments pull in discussion from other locations or spark exchanges elsewhere. Here are a few examples where social media prompted PubMed Commons posts or continued the commentary on publications.

An encouraging review of examples of sane discussion through the use of comments.

Contrast that with the abandonment of comments by some media outlets, NPR for example: NPR Website To Get Rid Of Comments by Elizabeth Jensen.

My takeaway from Jensen’s account was that NPR likes its own free speech, but is not so interested in the free speech of others.

See also: Have Comment Sections on News Media Websites Failed?, for op-ed pieces at the New York Times from a variety of perspectives.

Perhaps comments on news sites are examples of casting pearls before swine? (Matthew 7:6)

Resources to Find the Data You Need, 2016 Edition

November 21st, 2016

Resources to Find the Data You Need, 2016 Edition by Nathan Yau.

From the post:

Before you get started on any data-related project, you need data. I know. It sounds crazy, but it’s the truth. It can be frustrating to sleuth for the data you need, so here are some tips on finding it (the openly available variety) and some topic-specific resources to begin your travels.

This is an update to the guide I wrote in 2009, which as it turns out, is now mostly outdated. So, 2016. Here we go.

If you know Nathan Yau’s work, FlowingData, then you know this is “the” starting list for data.

Enjoy!

OPM Farce Continues – 2016 Inspector General Report

November 21st, 2016

U.S. Office of Personnel Management – Office of the Inspector General – Office of Audits

The Office of Personnel Management hack was back in the old days when China was being blamed for every hack. There’s no credible evidence of that but the Chinese were blamed in any event.

The OPM hack illustrated the danger inherent in appointing campaign staff to run mission-critical federal agencies. For just a sampling of the impressive depth of Archuleta’s incompetence, read the Flash Audit on OPM Infrastructure Update Plan.

The executive summary of the current report offers little room for hope:

This audit report again communicates a material weakness related to OPM’s Security Assessment and Authorization (Authorization) program. In April 2015, the then Chief Information Officer issued a memorandum that granted an extension of the previous Authorizations for all systems whose Authorization had already expired, and for those scheduled to expire through September 2016. Although the moratorium on Authorizations has since been lifted, the effects of the April 2015 memorandum continue to have a significant negative impact on OPM. At the end of fiscal year (FY) 2016, the agency still had at least 18 major systems without a valid Authorization in place.

However, OPM did initiate an “Authorization Sprint” during FY 2016 in an effort to get all of the agency’s systems compliant with the Authorization requirements. We acknowledge that OPM is once again taking system Authorization seriously. We intend to perform a comprehensive audit of OPM’s Authorization process in early FY 2017.

This audit report also re-issues a significant deficiency related to OPM’s information security management structure. Although OPM has developed a security management structure that we believe can be effective, there has been an extremely high turnover rate of critical positions. The negative impact of these staffing issues is apparent in the results of our current FISMA audit work. There has been a significant regression in OPM’s compliance with FISMA requirements, as the agency failed to meet requirements that it had successfully met in prior years. We acknowledge that OPM has placed significant effort toward filling these positions, but simply having the staff does not guarantee that the team can effectively manage information security and keep OPM compliant with FISMA requirements. We will continue to closely monitor activity in this area throughout FY 2017.

It’s illegal, but hacking the OPM remains easier than hacking the NSA.

Hacking the NSA requires a job at Booz Allen and a USB drive.

Preserving Ad Revenue With Filtering (Hate As Renewal Resource)

November 21st, 2016

Facebook and Twitter haven’t implemented robust and shareable filters for their respective content streams for fear of disturbing their ad revenue streams.* The power to filter is feared as the power to exclude ads.

Other possible explanations include: drone employment, old/new friends hired to discuss censoring content; hubris, wanting to decide what is “best” for others to see and read; NIH (not invented here), which explains the silence concerning my proposals for shareable content filters; others?

* Lest I be accused of spreading “fake news,” my explanation for the lack of robust and shareable filters on content on Facebook and Twitter is based solely on my analysis of their behavior and not any inside leaks, etc.

I have a solution for the fear that filters will interfere with ad revenue.

All Facebook posts and Twitter tweets would be delivered with an additional Boolean field, ad, which defaults to true (an empty field counts as true, following Clojure), meaning the content can be filtered. When the field is false, that content cannot be filtered.

With filters registered and shared via Facebook and Twitter, testing those filters for proper operation (and refusing to apply them if they filter ad content) is a purely algorithmic process.

Users pay to post ad content, the step at which the false flag can be set, so there are no ad freeloaders exempt from filters.
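A minimal sketch of that scheme in Python (the ad field name is from this post; the post structure, patterns and function are invented for illustration):

import re

def visible(posts, mute_patterns):
    # ad=True (the default) marks ordinary, filterable content;
    # ad=False marks paid ad content, which filters must leave alone.
    for post in posts:
        if post.get("ad", True) and any(re.search(p, post["text"]) for p in mute_patterns):
            continue  # muted by a user filter
        yield post

timeline = [
    {"text": "Buy now! Limited offer.", "ad": False},  # paid ad, exempt
    {"text": "Some hateful rant"},                      # defaults to filterable
    {"text": "Holiday photos"},
]
print([p["text"] for p in visible(timeline, [r"hateful"])])
# ['Buy now! Limited offer.', 'Holiday photos']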

What’s my interest? I’m interested in the creation of commercial filters for aggregation, exclusion and creating a value-add product based on information streams. Moreover, ending futile and bigoted attempts at censorship seems like a worthwhile goal to me.

The revenue potential for filters is nearly unlimited.

The number of people who hate rivals the number who want to filter the content seen by others. An unrestrained Facebook/Twitter will attract more hate and “fake news,” which in turn will drive a great need for filters.

Not a virtuous cycle but certainly a profitable one. Think of hate and the desire to censor as renewable resources powering that cycle.

PS: I’m not an advocate for hate and censorship but they are both quite common. Marketing is based on consumers as you find them, not as you wish they were.

MuckRock Needs Volunteers (9 states in particular)

November 21st, 2016

MuckRock needs your help to keep filing in all 50 states by Beryl Lipton.

From the post:

Election time excitement got you feeling a little more patriotic than usual? Looking for a way to help but not sure you have the time? Well, MuckRock is looking for a few good people to do a big service requiring little effort: serve as our resident proxies.

A few states have put up barriers at their borders, limiting required disclosure and response to requests to only residents. One more thing added to the regular rigamarole of requesting public records, it’s a huge block to comparative studies and useful, outside accountability.

This is where you come in.

[Image: map of the states where MuckRock needs resident proxies]

We’re looking for volunteers in the nine states that can whip out their residency requirements whenever they get the chance:

  • Alabama
  • Arkansas
  • Georgia
  • Missouri
  • Montana
  • New Hampshire
  • New Jersey
  • Tennessee
  • Virginia

As a MuckRock proxy requester, you’ll serve as the in-state request representative, allowing requests to be submitted in your name and enabling others to continue to demand accountability. In exchange, you’ll get your own Professional MuckRock account – 20 requests a month and all that comes with them – and the gratitude of the transparency community.

Interested in helping the cause? Let us know at info@muckrock.com, or via the form below.

Despite my view that government disclosures are previously undisclosed government lies, I have volunteered for this project.

Depending on where you reside, you should too and/or contribute to support MuckRock.

How to get started with Data Science using R

November 20th, 2016

How to get started with Data Science using R by Karthik Bharadwaj.

From the post:

R is the lingua franca of data science and one of the popular language choices to learn data science. Once the choice is made, beginners often find themselves lost in finding out the learning path and end up with a signboard as below.

In this blog post I would like to lay out a clear structural approach to learning R for data science. This will help you to quickly get started in your data science journey with R.

You won’t find anything you don’t already know, but this is a great short post to pass on to others.

Point out that R skills will help them expose and/or conceal government corruption.

Refining The Dakota Access Pipeline Target List

November 20th, 2016

I mentioned in Exploding the Dakota Access Pipeline Target List that while listing the banks financing the Dakota Access Pipeline is great, banks and other legal entities are owned and operated by, and act through, people. People, unlike abstract legal entities, are subject to the persuasion of other people.

Unfortunately, almost all discussions of #DAPL focus on the on-site brutality towards Native Americans and/or the corporations involved in the project.

The protesters deserve our support but resisting local pawns (read police) may change the route of the pipeline, but it won’t stop the pipeline.

To stop the Dakota Access Pipeline, there are only two options:

  1. Influence investors to abandon the project
  2. Make the project prohibitively expensive

In terms of #1, you have to strike through the corporate veil to reach the people who own and direct the affairs of the corporation.

“Piercing the corporate veil” is legal terminology, but I mean it as knowing which named and located individuals are making decisions for a corporation and which named and located individuals are its owners.

A legal fiction, such as a corporation, cannot feel public pressure, distress, social ostracism, etc., all things that people do suffer.

Even so, persuasion can only be brought to bear on named and located individuals.

News reports giving only corporate names and not individual owners/agents create a boil of obfuscation.

A boil of obfuscation that needs lancing. Shall we?

To get us off on a common starting point, here are some resources I will be reviewing/using:

Corporate Research Project

The Corporate Research Project assists community, environmental and labor organizations in researching companies and industries. Our focus is on identifying information that can be used to advance corporate accountability campaigns. [Sponsors Dirt Diggers Digest]

Dirt Diggers Digest

chronicling corporate misbehavior (and how to research it) [blog]

LittleSis

LittleSis* is a free database of who-knows-who at the heights of business and government.

* opposite of Big Brother

OpenCorporates

The largest open database of companies in the world [115,419,017 companies]

Revealing the World of Private Companies by Sheila Coronel

Coronel’s blog post has numerous resources and links.

She also points out that the United States is a top secrecy destination:


A top secrecy jurisdiction is the United States, which doesn’t collect the names of shareholders of private companies and is unsurprisingly one of the most favored nations for hiding illicit wealth. (See, for example, this Reuters report on shell companies in Wyoming.) As Senator Carl Levin says, “It takes more information to obtain a driver’s license or open a U.S. bank account than it does to form a U.S. corporation.” Levin has introduced a bill that would end the formation of companies for unidentified persons, but that is unlikely to pass Congress.

If we picked one of the non-U.S. sponsors of the #DAPL, we might get lucky and hit a transparent or semi-transparent jurisdiction.

Let’s start with a semi-tough case, a U.S. corporation but a publicly traded one, Wells Fargo.

Where would you go next?

How to get superior text processing in Python with Pynini

November 19th, 2016

How to get superior text processing in Python with Pynini by Kyle Gorman and Richard Sproat.

From the post:

It’s hard to beat regular expressions for basic string processing. But for many problems, including some deceptively simple ones, we can get better performance with finite-state transducers (or FSTs). FSTs are simply state machines which, as the name suggests, have a finite number of states. But before we talk about all the things you can do with FSTs, from fast text annotation—with none of the catastrophic worst-case behavior of regular expressions—to simple natural language generation, or even speech recognition, let’s explore what a state machine is, what they have to do with regular expressions.

Reporters, researchers and others face a 2017 where the rate of information has increased, along with the noise from media spasms over the latest taunt from president-elect Trump.

Robust text mining/filtering will be among your daily necessities, if it isn’t already.

Tagging text is the first example. Think about auto-generating graphs from emails with “to:,” “from:,” “date:,” and key terms in the email. Tagging the key terms is essential to that process.

Once tagged, you can slice and dice the text as more information is uncovered.
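The post does this with Pynini; as a rough plain-Python illustration of the same single-pass, longest-match idea (the terms and tag name are invented for the example), consider:

def build_trie(terms):
    # One path per term; "$" marks end-of-term.
    trie = {}
    for term in terms:
        node = trie
        for ch in term:
            node = node.setdefault(ch, {})
        node["$"] = term
    return trie

def tag(text, trie):
    # Single left-to-right pass, longest match wins; no word boundaries,
    # no regex backtracking -- the property FSTs give you for free.
    out, i = [], 0
    while i < len(text):
        node, j, last = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$" in node:
                last = j
        if last is None:
            out.append(text[i])
            i += 1
        else:
            out.append("<term>" + text[i:last] + "</term>")
            i = last
    return "".join(out)

print(tag("check the date and from fields", build_trie(["date", "from", "to"])))
# check the <term>date</term> and <term>from</term> fields

Pynini compiles this kind of trie walk into actual transducer states and arcs, which is where the speed and the composability with other transducers come from.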

Interested?

Tracking Business Records Across Asia

November 19th, 2016

Tracking Business Records Across Asia by GIJN staff.

From the post:

The paper trail has changed — money now moves digitally and business registries are databases — and this lets journalists do more than ever before in tracking people and companies across borders.

Backgrounding an individual or a company? Following an organized crime ring? The key to uncovering corruption is to “follow the money” — to discover who owns what, who gets which contract, and how businesses are linked to each other.

Resources on tracking corporate records in China, the Philippines and India!

While you are sharpening your tracking skills, don’t forget to support GIJN.

Python Data Science Handbook

November 19th, 2016

Python Data Science Handbook (Github)

From the webpage:

Jupyter notebook content for my OReilly book, the Python Data Science Handbook.

[Image: Python Data Science Handbook cover]

See also the free companion project, A Whirlwind Tour of Python: a fast-paced introduction to the Python language aimed at researchers and scientists.

This repository will contain the full listing of IPython notebooks used to create the book, including all text and code. I am currently editing these, and will post them as I make my way through. See the content here:

Enjoy!

CIA Raises Technical Incompetence Flag

November 19th, 2016

The CIA’s response to Michael Morisy’s request for:

“a copy of emails sent to or from the CIA’s FOIA office regarding FOIA Portal’s Technical Issues.”

gives these requirements for requesting emails:

We require requesters seeking any form of “electronic communications” such as emails, to provide the specific “to” and “from” recipients, time frame and subject.

(The full response.)

Recalling that the FBI requested special software to separate emails of Huma Abedin and Anthony Weiner on the same laptop, is the CIA really that technically incompetent in terms of searching?

Is the CIA incapable of searching emails by subject alone?
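For perspective, searching an email archive by subject alone takes a few lines of Python (a minimal sketch; the mbox file name and search phrase are hypothetical):

import mailbox

def find_by_subject(path, phrase):
    # Yield sender, date and subject for every message whose subject matches.
    for msg in mailbox.mbox(path):
        subject = msg["subject"] or ""
        if phrase.lower() in subject.lower():
            yield msg["from"], msg["date"], subject

for sender, date, subject in find_by_subject("archive.mbox", "FOIA portal"):
    print(date, sender, subject)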

With a dissatisfied-with-intelligence-community president-elect Donald Trump about to take office, I would not be flying the Technical Incompetence Here flag.

The CIA may respond it is not incompetent but rather was acting in bad faith.

In debate we used to call that the “horns of a dilemma,” yes?

I’m voting for bad faith.

How about you?

If You Don’t Get A New Car For The Holidays

November 19th, 2016

Just because you aren’t expecting:

[Image: a car topped with a giant Christmas bow]

Doesn’t mean a new car isn’t in your future:

[Image: the Sparrows Gridlock practice lock set]

From the Sparrows Lock Pick website:

Sparrows Gridlock

There is a reason as to why a coat hanger is the tool of choice for most Automobile lockouts. Picking a standard 10 wafer Automotive lock is a Huge challenge. Most often it is achieved by being stubborn with a pinch of luck and a dash of skill.

The Gridlock set lets you develop that skill by working through three automotive locks of ever increasing difficulty. Building from a 3 to a 6 to a full 10 wafer Automotive lock will allow you to develop the skill for picking wafers. Wafer picking is an entirely different skill set when compared to pin tumbler picking.

A standard pin tumbler key is cut just along the top to lift the pins up into place letting you open the lock. A wafer lock key is cut on the top and bottom, this then moves the wafers Up and Down positioning them for the lock to open.

Learning to manipulate and rock those wafers into position is a skill ….. a skill that one day may get you a well deserved high five or a court appointed lawyer.

The Gridlock comes with 3 progressive wafer locks and an automotive tension wrench specific to applying tension to wafer locks. The locks are solid aluminum and perfect in scale to a classic car lock.

Think of lock picking as an expansion of your skills at digitally hacking access to automobiles.

With the Auto Rocker Picks, sans shipping, the package lists for $41.50. You may want some additional accessories from the LockPickShop.

Security discussions determine when your security will fail, not if.

Security discussions that don’t include physical security determine it will be sooner rather than later.

Eight steps reporters should take … [every day]

November 19th, 2016

Eight steps reporters should take before Trump assumes office by Dana Priest.

Reporters should tape these eight steps to their bathroom mirrors for review every day, not just during the Trump presidency:

Rebuild sources: Call every source you’ve ever had who is either still in government or still connected to those who are. Touch base, renew old connections, and remind folks that you’re all ears.

Join forces: Triangulate tips and sources across the newsroom, like we did after 9/11, when reporting became more difficult.

Make outside partnerships: Reporting organizations outside your own newspaper, especially those abroad and with international reach, can help uncover the moves being considered and implemented in foreign countries.

Discover the first family: Now part of the White House team, Donald Trump’s children and son-in-law are an important target for deep-dive reporting into their own financial holdings and their professional and personal records.

Renew the hunt: Find those tax filings!

Out disinformation: Find a way to take on the many false news sites that now hold a destructive sway over some Americans.

Create a war chest: Donate and persuade your news organization to donate large sums to legal defense organizations preparing to jump in with legal challenges the moment Trump moves against access, or worse. The two groups that come to mind are the Reporters’ Committee for Freedom of the Press and the American Civil Liberties Union. Encourage your senior editors to get ready for the inevitable, quickly.

Be grateful: Celebrate your freedom to do hard-hitting, illuminating work by doing much more of it.

Don’t wait for reporters to carry all the load.

Many of these steps, “Renew the hunt” comes to mind, can be performed by non-reporters and then leaked.

A lack of transparency of government signals a lack of effort on the part of the press and public.

FOIA is great but it’s also being spoon fed what the government chooses to release.

I’m thinking of transparency that is less self-serving than FOIA releases.

The Postal Museum (UK)

November 19th, 2016

The Postal Museum

Set to open in mid-2017, the Postal Museum covers five hundred years of “Royal Mail.”

Its online catalogue has more than 120,000 records describing its collection.

Which includes this gem:

[Image: a turnpike road map from the Postal Museum’s online catalogue]

Registering for the catalogue will enable you to access downloadable content, save searches, create wish-lists, etc. Registration is free and worth the effort.

The site is in beta and my confirmation email displayed as blank in Thunderbird but viewing source gave the confirmation URL.

A terminology issue: where the tabs for an item say “Ordering and Viewing,” they mean requesting an item to be retrieved for you to view on a specified day.

I was confused because I thought “ordering” meant obtaining a copy, print or digital, of the item in question.

The turnpike road map above is available in a somewhat larger size but not nearly large enough for actual use.

Very high resolution images of maps and similar materials would be a welcome addition to the resources already available.

Enjoy!

PS: I didn’t look but the Postal Museum has resources on stamps as well. ;-)

Successful Hate Speech/Fake News Filters – 20 Facts About Facebook

November 18th, 2016

After penning Monetizing Hate Speech and False News yesterday, I remembered non-self-starters will be asking:

Where are examples of successful monetized filters for hate speech and false news?

Of The Top 20 Valuable Facebook Statistics – Updated November 2016, I need only two to make the case for monetized filters.

1. Worldwide, there are over 1.79 billion monthly active Facebook users (Facebook MAUs) which is a 16 percent increase year over year. (Source: Facebook as of 11/02/16)

15. Every 60 seconds on Facebook: 510,000 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded. (Source: The Social Skinny)

(emphasis in the original)

By comparison (see Newsonomics: 10 numbers on The New York Times’ 1 million digital-subscriber milestone [2015]), the New York Times has 1 million digital subscribers.

If you think about it, the New York Times is a hate speech/fake news filter, although it has a much smaller audience than Facebook.

Moreover, the New York Times is spending money to generate content whereas on Facebook, content is there for the taking or filtering.

If the New York Times can make money as a filter for hate speech/fake news while carrying its overhead, imagine the potential for profit from simply filtering content generated and posted by others. Across a market of 1.79 billion viewers. Where “hate” and “fake” vary from audience to audience.

Content filters at Facebook and the ability to “follow” those filters on timelines are all that is missing. (And Facebook monetizing the use of those filters.)

Petition Mark Zuckerberg and Facebook for content filters today!

Operating Systems Design and Implementation (12th USENIX Symposium)

November 17th, 2016

Operating Systems Design and Implementation (12th USENIX Symposium) – Savannah, GA, USA, November 2-4, 2016.

Message from the OSDI ’16 Program Co-Chairs:

We are delighted to welcome to you to the 12th USENIX Symposium on Operating Systems Design and Implementation, held in Savannah, GA, USA! This year’s program includes a record high 47 papers that represent the strength of our community and cover a wide range of topics, including security, cloud computing, transaction support, storage, networking, formal verification of systems, graph processing, system support for machine learning, programming languages, troubleshooting, and operating systems design and implementation.

Weighing in at seven hundred and ninety-seven (797) pages, this tome will prove more than sufficient to avoid annual family arguments during the holiday season.

Not to mention this is an opportunity to hone your skills to a fine edge.

Monetizing Hate Speech and False News

November 17th, 2016

Eli Pariser has started If you were Facebook, how would you reduce the influence of fake news? on GoogleDocs.

Out of the now seventeen pages of suggestions, I haven’t noticed any that promise a revenue stream to Facebook.

I view ideas to filter “false news” and/or “hate speech” that don’t generate revenue for Facebook as non-starters. I suspect Facebook does as well.

Here is a broad sketch of how Facebook can monetize “false news” and “hate speech,” all while shaping Facebook timelines to diverse expectations.

Monetizing “false news” and “hate speech”

Facebook adds support for user-defined filters on timelines. Filters can block other Facebook accounts (and any material from them), content by origin or by word and, I would suggest, by regex.

User-defined filters apply only to that account and can be shared with twenty other Facebook users.

To share a filter with more than twenty other Facebook users, the filter’s owner pays Facebook an annual fee, scaled on the number of shares.

Unlike the many proposals on “false news” and “hate speech,” sharing a filter isn’t free beyond twenty other users.
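In code, the sharing rule is tiny (a sketch; the twenty-user allowance is from this post, the fee rate is invented):

FREE_SHARES = 20

def annual_fee(shares, rate_per_share=0.25):
    # Free up to the allowance; beyond that, the fee scales with share count.
    return max(0, shares - FREE_SHARES) * rate_per_share

print(annual_fee(15))   # 0 -- within the free allowance
print(annual_fee(500))  # 120.0 -- pays to broadcast a filter widely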

Selling Subscriptions to Facebook Filters

Organizations can sell subscriptions to their filters; Facebook, which controls the authorization of the filters, contracts for a percentage of the subscription fee.

Pro tip: I would not invoke Facebook filters from the Washington Post and New York Times at the same time. It is likely they exclude each other as news sources.

Advantages of Monetizing Hate Speech and False News

First and foremost for Facebook, it gets out of the satisfying every point of view game. Completely. Users are free to define as narrow or as broad a point of view as they desire.

If you see something you don’t like, disagree with, etc., don’t complain to Facebook, complain to your Facebook filter provider.

That alone will expose the hidden agenda behind most, perhaps not all, of the “false news” filtering advocates. They aren’t concerned with what they are seeing on Facebook but they are very concerned with deciding what you see on Facebook.

For wannabe filterers of what other people see, that privilege is not free beyond twenty other Facebook users. Unlike the many proposals, with as many definitions of “false news,” that appear in Eli’s document.

It is difficult to imagine a privilege people would be more ready to pay for than the right to attempt to filter what other people see. Churches, social organizations, local governments, corporations, you name them and they will be lining up to create filter lists.

The financial beneficiary of the “drive to filter for others” is of course Facebook, but one could argue the filter owners profit by spreading their worldview; the unfortunates who follow them, well, they get what they get.

Commercialization of Facebook filters, that is, selling subscriptions to Facebook filters, creates a new genre of economic activity and yet another revenue stream for Facebook. (That’s two up to this point, if you are keeping score.)

It isn’t hard to imagine the Economist, Forbes, professional clipping services, etc., creating a natural extension of their filtering activities onto Facebook.

Conclusion: Commercialization or Unfunded Work Assignments

Preventing/blocking “hate speech” and “false news” for free has been, is, and always will be a failure.

Changing Facebook infrastructure isn’t free; creating revenue streams off of preventing/blocking “hate speech” and “false news” creates incentives for Facebook to make the necessary changes and for people to build filters off of which they can profit.

Not to mention that filtering enables everyone, including the alt-right, alt-left and the sane people in between, to create the Facebook of their dreams, rather than being subject to the Facebook desired by others.

Finally, it gets Facebook and Mark Zuckerberg out of the fantasy island approach where they are assigned unpaid work by others. New York Times, Mark Zuckerberg Is in Denial. (It’s another “hit” piece by Zeynep Tufekci.)

If you know Mark Zuckerberg, please pass this along to him.

Pentagon Says: Facts Don’t Matter (Pre-Trump)

November 17th, 2016

Intel chairman: Pentagon plagiarized Wikipedia in report to Congress by Kristina Wong.

From the post:

The Pentagon submitted information plagiarized from Wikipedia to members of Congress, the chairman of the House Intelligence Committee said at a hearing Thursday.

Chairman Devin Nunes (R-Calif.) said on March 21, Deputy Defense Secretary Bob Work submitted a document to the chairmen of the House Intelligence, Armed Services, and Defense appropriations committees with information directly copied from Wikipedia, an online open-source encyclopedia.

The information was submitted in a document used to justify a determination that Croughton was the best location for a joint intelligence center with the United Kingdom, Nunes said. The determination was required by the 2016 National Defense Authorization Act.

If that weren’t bad enough, here’s the kicker:


Work said he still fulfilled the law by making a determination and that the plagiarized information had “no bearing” on that determination.

Do you read that to mean:

  1. Work made the determination
  2. The “made” determination was packed with facts to justify it

In that order?

Remarkably candid admission that Pentagon decisions are made and then those decisions are packed with facts to justify them.

Not particularly surprising to me.

You?

The new Tesseract package: High Quality OCR in R

November 17th, 2016

The new Tesseract package: High Quality OCR in R by Jeroen Ooms.

From the post:

Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form.

People looking to extract text and metadata from pdf files in R should try our pdftools package.

Reading too quickly at first I thought I had missed a new version of Tesseract (tesseract-ocr Github), an OCR program that I use on a semi-regular basis.

Reading a little slower, ;-), I discovered Ooms is describing a new package for R, which uses Tesseract for OCR.

This is great news but be aware that Tesseract (whether called by an R package or standalone) can generate a large amount of output in a fairly short period of time.

One of the stumbling blocks of OCR is the labor-intensive process of cleaning up the inevitable mistakes.

Depending on how critical accuracy is for searching, for example, you may choose to verify and clean only quotes for use in other publications.

Best to make those decisions up front and not be faced with a mountain of output that isn’t useful unless and until it has been corrected.
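For comparison outside R, the same engine is reachable from Python through the pytesseract wrapper (a sketch, not the R package’s API; it assumes the Tesseract engine and the Pillow library are installed, and the file name is hypothetical):

from PIL import Image
import pytesseract

# OCR a scanned page; budget time for cleaning the output afterwards.
text = pytesseract.image_to_string(Image.open("scan.png"))
print(text)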

Mute Account vs. Mute Word/Hashtag – Ineffectual Muting @Twitter

November 17th, 2016

[Image: Twitter support documentation on muting accounts vs. muting words and hashtags]

I mentioned yesterday the distinction between muting an account versus the new muting by word or #hashtag at Twitter.

Take a moment to check my sources at Twitter support to make sure I have the rules correctly stated. I’ll wait.

(I’m not a journalist but readers should be enabled to satisfy themselves claims I make are at least plausible.)

No feedback from Twitter on the don’t appear in your timeline vs. do appear in your timeline distinction.

Why would I want to only block notifications of what I think of as hate speech and still have those tweets in my timeline?

Then it occurred to me:

If you can block tweets from appearing in your timeline by word or hashtag, you can block advertising tweets from appearing in your timeline.

You cannot effectively mute hate speech @Twitter because you could also mute advertising.

What about it Twitter?

Must feminists, people of color, minorities of all types be subjected to hate speech in order to preserve your revenue streams?


Not that I object to Twitter having revenue streams from advertising but it needs to be more sophisticated than the Nigerian spammer model now in use. Charge a higher price for targeted advertising that users are unlikely to block.

For example, I would be highly unlikely to block ads for cs theory/semantic integration tomes. On the other hand, I would follow a mute list that blocked histories of famous cricket matches. (Apologies to any cricket players in the audience.)

In my post: Twitter Almost Enables Personal Muting + Roving Citizen-Censors I offer a solution that requires only minor changes based on data Twitter already collects plus regexes for muting. It puts what you see entirely in the hands of users.

That enables Twitter to get out of the censorship business altogether, something it doesn’t do well anyway, and puts users in charge of what they see. A win-win from my perspective.

Alt-right suspensions lay bare Twitter’s consistency [hypocrisy] problem

November 17th, 2016

Alt-right suspensions lay bare Twitter’s consistency problem by Nausicaa Renner.

From the post:

TWITTER SUSPENDED A NUMBER OF ACCOUNTS associated with the alt-right, USA Today reported this morning. This move was bound to be divisive: While Twitter has banned and suspended users in the past (prominently, Milo Yiannopoulos for incitement), USA Today points out the company has never suspended so many at once—at least seven in this case. Richard Spencer, one of the suspended users and prominent alt-righter, also had a verified account on Twitter. He claims, “I, and a number of other people who have just got banned, weren’t even trolling.”

If this is true, it would be a powerful political statement, indeed. As David Frum notes in The Atlantic, “These suspensions seem motivated entirely by viewpoint, not by behavior.” Frum goes on to argue that a kingpin strategy on Twitter’s part will only strengthen the alt-right’s audience. But we may never know Twitter’s reasoning for suspending the accounts. Twitter declined to comment on its moves, citing privacy and security reasons.

(emphasis in original)

Contrary to the claims of the Southern Poverty Law Center (SPLC) to Twitter, these users may not have been suspended for violating Twitter’s terms of service, but for their viewpoints.

Like the CIA, FBI and NSA, Twitter uses secrecy to avoid accountability and transparency for its suspension process.

The secrecy – avoidance of accountability/transparency pattern is one you should commit to memory. It is quite common.

Twitter needs to develop better muting options for users and abandon account suspension (save on court order) altogether.

XML Prague 2017, February 9-11, 2017 – Registration Opens!

November 16th, 2016

XML Prague 2017, February 9-11, 2017

I mentioned XML Prague 2017 last month and now, after the election of Donald Trump as president of the United States, registration for the conference opens!

Coincidence?

Maybe. ;-)

Even if you are returning to the U.S. after the conference, XML Prague will be a welcome respite from the tempest of news coverage of what isn’t known about the impending Trump administration.

At 120 Euros for three days, this is a great investment both professionally and emotionally.

Enjoy!

The Amnesic Incognito Live System (Tails) 2.7

November 16th, 2016

The Amnesic Incognito Live System (Tails) 2.7

The Amnesic Incognito Live System (Tails) is a Debian-based, live distribution with the goal of providing Internet anonymity for its users. The distribution accomplishes this by directing Internet traffic through the Tor network and by providing built-in tools for protecting files and scrubbing away meta data. The project’s latest release mostly focuses on fixing bugs and improving security: “Tails 2.7 is out. This release fixes many security issues and users should upgrade as soon as possible. New features: ship LetsEncrypt intermediate SSL certificate so that our tools are able to authenticate our website when its certificate is updated. Upgrades and changes: Tor 0.2.8.9, Tor Browser 6.0.6, Linux kernel 4.7, Icedove 45.4.0. Fixed problems: Synaptic installs packages with the correct architecture; set default spelling to en_US in Icedove. Known issues: users setting their Tor Browser security slider to High will have to click on a link to see the result of the search they have done with the search box.” Additional information on Tails 2.7 can be found in the project’s release notes. A list of issues fixed in the 2.7 release can be found in the list of former security issues. Download: tails-i386-2.7.iso (1,113MB, signature, pkglist). Also available from OSDisc.

An essential part of your overall cybersecurity stance.

All releases are date/time sensitive.

BEFORE installing this release, even later today, check for a later release: Tails.

Checking for the latest release only takes seconds and is a habit that will keep you from running a release whose security holes have since been patched.

PoisonTap – Wishlist 2016

November 16th, 2016

PoisonTap Steals Cookies, Drops Backdoors on Password-Protected Computers by Chris Brook.

From the post:

Even locked, password-protected computers are no rival for Samy Kamkar and his seemingly endless parade of gadgets.

His latest, PoisonTap, is a $5 Raspberry Pi Zero device running Node.js that’s retrofitted to emulate an Ethernet device over USB. Assuming a victim has left their web browser open, once plugged in to a machine, the device can quietly fetch HTTP cookies and sessions from millions of websites, even if the computer is locked.

If that alone doesn’t sound like Mr. Robot season three fodder, the device can also expose the machine’s internal router and install persistent backdoors, guaranteeing an attacker access long after they’ve removed the device from a USB slot.

“[The device] produces a cascading effect by exploiting the existing trust in various mechanisms of a machine and network, including USB, DHCP, DNS, and HTTP, to produce a snowball effect of information exfiltration, network access and installation of semi-permanent backdoors,” Kamkar said Wednesday in a writeup of PoisonTap.

Opportunity may only knock once.

Be prepared by carrying one or more PoisonTaps along with a bootable USB stick.

“…Fake News Is Not the Problem”

November 16th, 2016

According to Snopes, Fake News Is Not the Problem by Brooke Binkowski.

From the post:

Take it from the internet’s chief myth busters: The problem is the failing media.

This is the state of truth on the internet in 2016, now that it is as easy for a Macedonian teenager to create a website as it is for The New York Times, and now that the information most likely to find a large audience is that which is most alarming, not most correct. In the wake of the election, the spread of this kind of phony news on Facebook and other social media platforms has come under fire for stoking fears and influencing the election’s outcome. Both Facebook and Google have taken moves to bar fake news sites from their advertising platforms, aiming to cut off the sites’ sources of revenue.

But as managing editor of the fact-checking site Snopes, Brooke Binkowski believes Facebook’s perpetuation of phony news is not to blame for our epidemic of misinformation. “It’s not social media that’s the problem,” she says emphatically. “People are looking for somebody to pick on. The alt-rights have been empowered and that’s not going to go away anytime soon. But they also have always been around.”

The misinformation crisis, according to Binkowski, stems from something more pernicious. In the past, the sources of accurate information were recognizable enough that phony news was relatively easy for a discerning reader to identify and discredit. The problem, Binkowski believes, is that the public has lost faith in the media broadly — therefore no media outlet is considered credible any longer. The reasons are familiar: as the business of news has grown tougher, many outlets have been stripped of the resources they need for journalists to do their jobs correctly. “When you’re on your fifth story of the day and there’s no editor because the editor’s been fired and there’s no fact checker so you have to Google it yourself and you don’t have access to any academic journals or anything like that, you will screw stories up,” she says.

Sadly Binkowski’s debunking of the false/fake news meme doesn’t turn up on Snopes.com.

That might make it more convincing to mainstream media who have seized upon false/fake news to excuse their lack of credibility with readers.

Please share the Binkowski post with your friends, especially journalists.