November « 2016 « Another Word For It

November 21, 2016

Resources to Find the Data You Need, 2016 Edition

Filed under: Graphics,Visualization — Patrick Durusau @ 5:25 pm

Resources to Find the Data You Need, 2016 Edition by Nathan Yau.

From the post:

Before you get started on any data-related project, you need data. I know. It sounds crazy, but it’s the truth. It can be frustrating to sleuth for the data you need, so here are some tips on finding it (the openly available variety) and some topic-specific resources to begin your travels.

This is an update to the guide I wrote in 2009, which as it turns out, is now mostly outdated. So, 2016. Here we go.
…

If you know Nathan Yau’s work, FlowingData, then you know this is “the” starting list for data.

Enjoy!


OPM Farce Continues – 2016 Inspector General Report

Filed under: Cybersecurity,Government,Government Data,NSA,Security — Patrick Durusau @ 4:59 pm

U.S. Office of Personnel Management – Office of the Inspector General – Office of Audits

The Office of Personnel Management hack was back in the old days when China was being blamed for every hack. There’s no credible evidence of that but the Chinese were blamed in any event.

The OPM hack illustrated the danger inherent in appointing campaign staff to run mission critical federal agencies. For just a sampling of the impressive depth of Archuleta’s incompetence, read Flash Audit on OPM Infrastructure Update Plan.

The executive summary of the current report offers little room for hope:

This audit report again communicates a material weakness related to OPM’s Security Assessment and Authorization (Authorization) program. In April 2015, the then Chief Information Officer issued a memorandum that granted an extension of the previous Authorizations for all systems whose Authorization had already expired, and for those scheduled to expire through September 2016. Although the moratorium on Authorizations has since been lifted, the effects of the April 2015 memorandum continue to have a significant negative impact on OPM. At the end of fiscal year (FY) 2016, the agency still had at least 18 major systems without a valid Authorization in place.

However, OPM did initiate an “Authorization Sprint” during FY 2016 in an effort to get all of the agency’s systems compliant with the Authorization requirements. We acknowledge that OPM is once again taking system Authorization seriously. We intend to perform a comprehensive audit of OPM’s Authorization process in early FY 2017.

This audit report also re-issues a significant deficiency related to OPM’s information security management structure. Although OPM has developed a security management structure that we believe can be effective, there has been an extremely high turnover rate of critical positions. The negative impact of these staffing issues is apparent in the results of our current FISMA audit work. There has been a significant regression in OPM’s compliance with FISMA requirements, as the agency failed to meet requirements that it had successfully met in prior years. We acknowledge that OPM has placed significant effort toward filling these positions, but simply having the staff does not guarantee that the team can effectively manage information security and keep OPM compliant with FISMA requirements. We will continue to closely monitor activity in this area throughout FY 2017.

It’s illegal but hacking the OPM remains easier than hacking the NSA.

Hacking the NSA requires a job at Booz Allen and a USB drive.


Preserving Ad Revenue With Filtering (Hate As Renewable Resource)

Filed under: Advertising,Facebook,Marketing,Twitter — Patrick Durusau @ 4:02 pm

Facebook and Twitter haven’t implemented robust and shareable filters for their respective content streams for fear of disturbing their ad revenue streams.* The power to filter feared as the power to exclude ads.

Other possible explanations include: Drone employment, old/new friends hired to discuss censoring content; Hubris, wanting to decide what is “best” for others to see and read; NIH (not invented here), which explains silence concerning my proposals for shareable content filters; others?

* Lest I be accused of spreading “fake news,” my explanation for the lack of robust and shareable filters on content on Facebook and Twitter is based solely on my analysis of their behavior and not any inside leaks, etc.

I have a solution for fearing filters as interfering with ad revenue.

All Facebook posts and Twitter tweets will be delivered with an additional Boolean field, ad, which defaults to true (empty field, following Clojure), meaning the content can be filtered. When the field is false, that content cannot be filtered.

Once filters are registered and shared via Facebook and Twitter, testing those filters for proper operation (and not applying them if they filter ad content) is a purely algorithmic process.

Users pay to post ad content, the step at which the false flag can be entered, so ad freeloaders are no longer free from filters.
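A minimal sketch of the flag, assuming a post is just text plus the ad field. The names Post, is_filterable, apply_filter and filter_is_acceptable are mine for illustration, not anything Facebook or Twitter exposes. An empty field is modeled as None and treated as true (filterable); only an explicit false protects content.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Post:
    text: str
    # Hypothetical field named after the proposal. None models the "empty field".
    ad: Optional[bool] = None


def is_filterable(post: Post) -> bool:
    """Empty (None) or True means filterable; only an explicit False protects content."""
    return post.ad is not False


def apply_filter(posts: List[Post], blocks: Callable[[Post], bool]) -> List[Post]:
    """Drop posts the filter blocks, but never drop protected (ad=False) posts."""
    return [p for p in posts if not (is_filterable(p) and blocks(p))]


def filter_is_acceptable(blocks: Callable[[Post], bool], sample: List[Post]) -> bool:
    """Registration check: reject a filter that matches any protected post in the sample."""
    return not any(blocks(p) for p in sample if not is_filterable(p))


def blocks_widgets(post: Post) -> bool:
    return re.search(r"widgets", post.text, re.IGNORECASE) is not None


def blocks_hate(post: Post) -> bool:
    return re.search(r"\bhate\b", post.text, re.IGNORECASE) is not None


posts = [
    Post("Buy widgets now", ad=False),  # paid content, flagged so it cannot be filtered
    Post("I hate widgets"),             # empty field, so filterable
    Post("Nice weather today"),
]

timeline = [p.text for p in apply_filter(posts, blocks_widgets)]
```

The registration check is deliberately blunt: a filter that touches any protected post in the sample is refused outright, which is one reading of "not applying them if they filter ad content."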

What’s my interest? I’m interested in the creation of commercial filters for aggregation, exclusion and creating a value-add product based on information streams. Moreover, ending futile and bigoted attempts at censorship seems like a worthwhile goal to me.

The revenue potential for filters is nearly unlimited.

The number of people who hate rivals the number who want to filter the content seen by others. An unrestrained Facebook/Twitter will attract more hate and “fake news,” which in turn will drive a great need for filters.

Not a virtuous cycle but certainly a profitable one. Think of hate and the desire to censor as renewable resources powering that cycle.

PS: I’m not an advocate for hate and censorship but they are both quite common. Marketing is based on consumers as you find them, not as you wish they were.


MuckRock Needs Volunteers (9 states in particular)

Filed under: FOIA,Government,Politics,Transparency — Patrick Durusau @ 2:02 pm

MuckRock needs your help to keep filing in all 50 states by Beryl Lipton.

From the post:

Election time excitement got you feeling a little more patriotic than usual? Looking for a way to help but not sure you have the time? Well, MuckRock is looking for a few good people to do a big service requiring little effort: serve as our resident proxies.

…

A few states have put up barriers at their borders, limiting required disclosure and response to requests to only residents. One more thing added to the regular rigamarole of requesting public records, it’s a huge block to comparative studies and useful, outside accountability.

This is where you come in.

We’re looking for volunteers in the ten states that can whip out their residency requirements whenever they get the chance:

Alabama

Arkansas

Georgia

Missouri

Montana

New Hampshire

New Jersey

Tennessee

Virginia

As a MuckRock proxy requester, you’ll serve as the in-state request representative, allowing requests to be submitted in your name and enabling others to continue to demand accountability. In exchange, you’ll get your own Professional MuckRock account – 20 requests a month and all that comes with them – and the gratitude of the transparency community.

Interested in helping the cause? Let us know at info@muckrock.com, or via the form below.

…

Despite my view that government disclosures are previously undisclosed government lies, I have volunteered for this project.

Depending on where you reside, you should too and/or contribute to support MuckRock.


November 20, 2016

How to get started with Data Science using R

Filed under: Politics,Programming,R — Patrick Durusau @ 5:40 pm

How to get started with Data Science using R by Karthik Bharadwaj.

From the post:

R is the lingua franca of data science and is one of the popular language choices to learn data science. Once the choice is made, often beginners find themselves lost in finding out the learning path and end up with a signboard as below.

In this blog post I would like to lay out a clear structural approach to learning R for data science. This will help you to quickly get started in your data science journey with R.
…

You won’t find anything you don’t already know but this is a great short post to pass onto others.

Point out that R skills will help them expose and/or conceal government corruption.


Refining The Dakota Access Pipeline Target List

Filed under: #DAPL,Data Mining,Government,Politics — Patrick Durusau @ 3:56 pm

I mentioned in Exploding the Dakota Access Pipeline Target List that while a listing of the banks financing the Dakota Access Pipeline is great, banks and other legal entities are owned, operated and act through people. People who, unlike abstract legal entities, are subject to the persuasion of other people.

Unfortunately, almost all discussions of #DAPL focus on the on-site brutality towards Native Americans and/or the corporations involved in the project.

The protesters deserve our support but resisting local pawns (read police) may change the route of the pipeline, but it won’t stop the pipeline.

To stop the Dakota Access Pipeline, there are only two options:

1. Influence investors to abandon the project
2. Make the project prohibitively expensive

In terms of #1, you have to strike through the corporate veil to reach the people who own and direct the affairs of the corporation.

“Piercing the corporate veil” is legal terminology but I mean it as in knowing the named and located individuals who are making decisions for a corporation and the named and located individuals who are its owners.

A legal fiction, such as a corporation, cannot feel public pressure, distress, social ostracism, etc., all things that people are subject to suffering.

Even so, persuasion can only be brought to bear on named and located individuals.

News reports giving only corporate names and not individual owners/agents create a boil of obfuscation.

A boil of obfuscation that needs lancing. Shall we?

To get us off on a common starting point, here are some resources I will be reviewing/using:

Corporate Research Project

The Corporate Research Project assists community, environmental and labor organizations in researching companies and industries. Our focus is on identifying information that can be used to advance corporate accountability campaigns. [Sponsors Dirt Diggers Digest]

Dirt Diggers Digest

chronicling corporate misbehavior (and how to research it) [blog]

LittleSis

LittleSis* is a free database of who-knows-who at the heights of business and government.

* opposite of Big Brother

OpenCorporates

The largest open database of companies in the world [115,419,017 companies]

Revealing the World of Private Companies by Sheila Coronel

Coronel’s blog post has numerous resources and links.

She also points out that the United States is a top secrecy destination:

…
A top secrecy jurisdiction is the United States, which doesn’t collect the names of shareholders of private companies and is unsurprisingly one of the most favored nations for hiding illicit wealth. (See, for example, this Reuters report on shell companies in Wyoming.) As Senator Carl Levin says, “It takes more information to obtain a driver’s license or open a U.S. bank account than it does to form a U.S. corporation.” Levin has introduced a bill that would end the formation of companies for unidentified persons, but that is unlikely to pass Congress.
…

If we picked one of the non-U.S. sponsors of the #DAPL, we might get lucky and hit a transparent or semi-transparent jurisdiction.

Let’s start with a semi-tough case, a U.S. corporation but a publicly traded one, Wells Fargo.

Where would you go next?


November 19, 2016

How to get superior text processing in Python with Pynini

Filed under: FSTs,Journalism,News,Python,Reporting,Text Mining — Patrick Durusau @ 9:35 pm

How to get superior text processing in Python with Pynini by Kyle Gorman and Richard Sproat.

From the post:

It’s hard to beat regular expressions for basic string processing. But for many problems, including some deceptively simple ones, we can get better performance with finite-state transducers (or FSTs). FSTs are simply state machines which, as the name suggests, have a finite number of states. But before we talk about all the things you can do with FSTs, from fast text annotation—with none of the catastrophic worst-case behavior of regular expressions—to simple natural language generation, or even speech recognition, let’s explore what a state machine is, and what they have to do with regular expressions.
…

Reporters, researchers and others will face a 2017 where the rate of information has increased, along with noise from media spasms over the latest taunt from president-elect Trump.

Robust text mining/filtering will be among your daily necessities, if they aren’t already.

Tagging text is the first example. Think about auto-generating graphs from emails with “to:,” “from:,” “date:,” and key terms in the email. Tagging the key terms is essential to that process.

Once tagged, you can slice and dice the text as more information is uncovered.
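To make the email idea concrete, here is a rough sketch using only the Python standard library rather than Pynini, so it shows the tagging-to-graph step and not FSTs themselves. The key term list, the sample message and the function names are made up for illustration, and it assumes plain-text, single-part messages.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

KEY_TERMS = {"audit", "pipeline"}  # hypothetical key terms to tag


def tag_email(raw, key_terms=KEY_TERMS):
    """Pull out from/to/date and tag which key terms appear in the body."""
    msg = message_from_string(raw)
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    date = parsedate_to_datetime(msg["Date"]) if msg["Date"] else None
    # Assumption: single-part plain text; multipart bodies are skipped here.
    body = "" if msg.is_multipart() else msg.get_payload()
    words = set(re.findall(r"[a-z]+", body.lower()))
    return {
        "from": senders,
        "to": recipients,
        "date": date,
        "terms": sorted(key_terms & words),
    }


def build_edges(tagged_emails):
    """Count sender -> recipient pairs; these are the edges of the graph."""
    edges = Counter()
    for tags in tagged_emails:
        for sender in tags["from"]:
            for recipient in tags["to"]:
                edges[(sender, recipient)] += 1
    return edges


RAW = (
    "From: alice@example.org\n"
    "To: bob@example.org, carol@example.org\n"
    "Date: Mon, 21 Nov 2016 17:25:00 -0500\n"
    "Subject: Audit\n"
    "\n"
    "The audit covers the pipeline financing.\n"
)

tags = tag_email(RAW)
edges = build_edges([tags])
```

The edge counts can be handed to whatever graph tool you prefer, and the tagged terms become labels for slicing the graph later.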

Interested?


Tracking Business Records Across Asia

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:10 pm

Tracking Business Records Across Asia by GIJN staff.

From the post:

The paper trail has changed — money now moves digitally and business registries are databases — and this lets journalists do more than ever before in tracking people and companies across borders.

Backgrounding an individual or a company? Following an organized crime ring? The key to uncovering corruption is to “follow the money” — to discover who owns what, who gets which contract, and how businesses are linked to each other.
…

Resources on tracking corporate records in China, the Philippines and India!

While you are sharpening your tracking skills, don’t forget to support GIJN.


Python Data Science Handbook

Filed under: Data Science,Programming,Python — Patrick Durusau @ 5:27 pm

Python Data Science Handbook (Github)

From the webpage:

Jupyter notebook content for my O’Reilly book, the Python Data Science Handbook.

See also the free companion project, A Whirlwind Tour of Python: a fast-paced introduction to the Python language aimed at researchers and scientists.

This repository will contain the full listing of IPython notebooks used to create the book, including all text and code. I am currently editing these, and will post them as I make my way through. See the content here:
…

Enjoy!


CIA Raises Technical Incompetence Flag

Filed under: FOIA,Government — Patrick Durusau @ 4:29 pm

The CIA’s response to Michael Morisy’s request for:

“a copy of emails sent to or from the CIA’s FOIA office regarding FOIA Portal’s Technical Issues.”

gives these requirements for requesting emails:

We require requesters seeking any form of “electronic communications” such as emails, to provide the specific “to” and “from” recipients, time frame and subject.

(The full response.)

Recalling that the FBI requested special software to separate emails of Huma Abedin and Anthony Weiner on the same laptop, is the CIA really that technically incompetent in terms of searching?

Is the CIA incapable of searching emails by subject alone?
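For the record, searching by subject alone is not exotic. A minimal sketch with the Python standard library, over a made-up two-message mailbox (the addresses, subjects and function name are invented for illustration, and say nothing about how the CIA actually stores email):

```python
from email import message_from_string


def search_by_subject(raw_messages, phrase):
    """Return from/to/subject for every message whose subject contains the phrase."""
    needle = phrase.lower()
    hits = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        subject = msg.get("Subject", "")
        if needle in subject.lower():
            hits.append({
                "from": msg.get("From", ""),
                "to": msg.get("To", ""),
                "subject": subject,
            })
    return hits


# Invented sample data, not real correspondence.
MAILBOX = [
    "From: a@example.org\nTo: foia@example.org\n"
    "Subject: FOIA Portal technical issues\n\nThe portal is down.\n",
    "From: b@example.org\nTo: c@example.org\nSubject: Lunch\n\nNoon?\n",
]

hits = search_by_subject(MAILBOX, "technical issues")
```

No "to" or "from" is supplied by the requester; both fall out of the search results.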

With a dissatisfied-with-intelligence-community president-elect Donald Trump about to take office, I would not be flying the Technical Incompetence Here flag.

The CIA may respond it is not incompetent but rather was acting in bad faith.

In debate we used to call that the “horns of a dilemma,” yes?

I’m voting for bad faith.

How about you?


If You Don’t Get A New Car For The Holidays

Filed under: Security — Patrick Durusau @ 3:49 pm

Just because you aren’t expecting:

Doesn’t mean a new car isn’t in your future:

From the Sparrows Lock Pick website:

Sparrows Gridlock

There is a reason as to why a coat hanger is the tool of choice for most Automobile lockouts. Picking a standard 10 wafer Automotive lock is a Huge challenge. Most often it is achieved by being stubborn with a pinch of luck and a dash of skill.

The Gridlock set lets you develop that skill by working through three automotive locks of ever increasing difficulty. Building from a 3 to a 6 to a full 10 wafer Automotive lock will allow you to develop the skill for picking wafers. Wafer picking is an entirely different skill set when compared to pin tumbler picking.

A standard pin tumbler key is cut just along the top to lift the pins up into place letting you open the lock. A wafer lock key is cut on the top and bottom, this then moves the wafers Up and Down positioning them for the lock to open.

Learning to manipulate and rock those wafers into position is a skill ….. a skill that one day may get you a well deserved high five or a court appointed lawyer.

The Gridlock comes with 3 progressive wafer locks and an automotive tension wrench specific to applying tension to wafer locks. The locks are solid aluminum and perfect in scale to a classic car lock.

Think of lock picking as an expansion of your skills at digitally hacking access to automobiles.

With the Auto Rocker Picks, sans shipping, the package lists for $41.50. You may want some additional accessories from the LockPickShop.

Security discussions determine when your security will fail, not if.

Security discussions that don’t include physical security determine it will be sooner rather than later.


Eight steps reporters should take … [every day]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 2:38 pm

Eight steps reporters should take before Trump assumes office by Dana Priest.

Reporters should paste these eight steps to their bathroom mirrors for review every day, not just for the Trump presidency:

Rebuild sources: Call every source you’ve ever had who is either still in government or still connected to those who are. Touch base, renew old connections, and remind folks that you’re all ears.

Join forces: Triangulate tips and sources across the newsroom, like we did after 9/11, when reporting became more difficult.

Make outside partnerships: Reporting organizations outside your own newspaper, especially those abroad and with international reach, can help uncover the moves being considered and implemented in foreign countries.

Discover the first family: Now part of the White House team, Donald Trump’s children and son-in-law are an important target for deep-dive reporting into their own financial holdings and their professional and personal records.

Renew the hunt: Find those tax filings!

Out disinformation: Find a way to take on the many false news sites that now hold a destructive sway over some Americans.

Create a war chest: Donate and persuade your news organization to donate large sums to legal defense organizations preparing to jump in with legal challenges the moment Trump moves against access, or worse. The two groups that come to mind are the Reporters’ Committee for Freedom of the Press and the American Civil Liberties Union. Encourage your senior editors to get ready for the inevitable, quickly.

Be grateful: Celebrate your freedom to do hard-hitting, illuminating work by doing much more of it.

Don’t wait for reporters to carry all the load.

Many of these steps, “Renew the hunt” comes to mind, can be performed by non-reporters and then leaked.

A lack of transparency of government signals a lack of effort on the part of the press and public.

FOIA is great but it’s also being spoon fed what the government chooses to release.

I’m thinking of transparency that is less self-serving than FOIA releases.


The Postal Museum (UK)

Filed under: Government,History,Mapping,Maps — Patrick Durusau @ 2:00 pm

The Postal Museum

Set to open in mid-2017, the Postal Museum covers five hundred years of “Royal Mail.”

Its online catalogue has more than 120,000 records describing its collection.

Which includes this gem:

Registering for the catalogue will enable you to access downloadable content, save searches, create wish-lists, etc. Registration is free and worth the effort.

The site is in beta and my confirmation email displayed as blank in Thunderbird but viewing source gave the confirmation URL.

A terminology issue. Where the tabs for an item say “Ordering and Viewing,” they mean requesting an item to be retrieved for you to view on a specified day.

I was confused because I thought “ordering” meant obtaining a copy, print or digital, of the item in question.

The turnpike road map above is available in a somewhat larger size but not nearly large enough for actual use.

Very high resolution images of maps and similar materials would be a welcome addition to the resources already available.

Enjoy!

PS: I didn’t look but the Postal Museum has resources on stamps as well. 😉


November 18, 2016

Successful Hate Speech/Fake News Filters – 20 Facts About Facebook

Filed under: Facebook,Journalism,News,Reporting — Patrick Durusau @ 11:04 am

After penning Monetizing Hate Speech and False News yesterday, I remembered non-self-starters will be asking:

“Where are examples of successful monetized filters for hate speech and false news?”

Of The Top 20 Valuable Facebook Statistics – Updated November 2016, I need only two to make the case for monetized filters.

1. Worldwide, there are over 1.79 billion monthly active Facebook users (Facebook MAUs) which is a 16 percent increase year over year. (Source: Facebook as of 11/02/16)

…

15. Every 60 seconds on Facebook: 510 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded. (Source: The Social Skinny)
…
(emphasis in the original)

By comparison, according to Newsonomics: 10 numbers on The New York Times’ 1 million digital-subscriber milestone [2015], the New York Times has 1 million digital subscribers.

If you think about it, the New York Times is a hate speech/fake news filter, although it has a much smaller audience than Facebook.

Moreover, the New York Times is spending money to generate content whereas on Facebook, content is there for the taking or filtering.

If the New York Times can make money as a filter for hate speech/fake news carrying its overhead, imagine the potential for profit from simply filtering content generated and posted by others. Across a market of 1.79 billion viewers. Where “hate,” and “fake” varies from audience to audience.

Content filters at Facebook and the ability to “follow” those filters on timelines are all that is missing. (And Facebook monetizing the use of those filters.)

Petition Mark Zuckerberg and Facebook for content filters today!


November 17, 2016

Operating Systems Design and Implementation (12th USENIX Symposium)

Filed under: Computer Science,CS Lectures,Cybersecurity,Security — Patrick Durusau @ 9:59 pm

Operating Systems Design and Implementation (12th USENIX Symposium) – Savannah, GA, USA, November 2-4, 2016.

Message from the OSDI ’16 Program Co-Chairs:

We are delighted to welcome you to the 12th USENIX Symposium on Operating Systems Design and Implementation, held in Savannah, GA, USA! This year’s program includes a record high 47 papers that represent the strength of our community and cover a wide range of topics, including security, cloud computing, transaction support, storage, networking, formal verification of systems, graph processing, system support for machine learning, programming languages, troubleshooting, and operating systems design and implementation.
…

Weighing in at seven hundred and ninety-seven (797) pages, this tome will prove more than sufficient to avoid annual family arguments during the holiday season.

Not to mention this is an opportunity to hone your skills to a fine edge.


Monetizing Hate Speech and False News

Filed under: Facebook,Journalism,News,Reporting — Patrick Durusau @ 5:48 pm

Eli Pariser has started If you were Facebook, how would you reduce the influence of fake news? on Google Docs.

Out of the now seventeen pages of suggestions, I haven’t noticed any that promise a revenue stream to Facebook.

I view ideas to filter “false news” and/or “hate speech” that don’t generate revenue for Facebook as non-starters. I suspect Facebook does as well.

Here is a broad sketch of how Facebook can monetize “false news” and “hate speech,” all while shaping Facebook timelines to diverse expectations.

Monetizing “false news” and “hate speech”

Facebook creates user defined filters for their timelines. Filters can block other Facebook accounts (and any material from them), content by origin, word and I would suggest, regex.

User defined filters apply only to that account and can be shared with twenty other Facebook users.

To share a filter with more than twenty other Facebook users, Facebook charges an annual fee, scaled on the number of shares.

Unlike the many posts on “false news” and “hate speech,” being a filter isn’t free beyond twenty other users.
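A rough sketch of what such a filter could look like, assuming a post is reduced to an account, an origin and its text. The class name, field names and the fee_paid switch are hypothetical, the fee schedule itself is left out, and nothing here reflects an actual Facebook API.

```python
import re
from dataclasses import dataclass, field

FREE_SHARE_LIMIT = 20  # the twenty free shares proposed above


@dataclass
class TimelineFilter:
    owner: str
    blocked_accounts: set = field(default_factory=set)
    blocked_origins: set = field(default_factory=set)
    blocked_words: set = field(default_factory=set)
    blocked_patterns: list = field(default_factory=list)  # regex strings
    shared_with: set = field(default_factory=set)
    fee_paid: bool = False  # hypothetical stand-in for the annual fee

    def blocks(self, account: str, origin: str, text: str) -> bool:
        """True if the post should be kept off the timeline."""
        if account in self.blocked_accounts or origin in self.blocked_origins:
            return True
        words = set(re.findall(r"\w+", text.lower()))
        if words & {w.lower() for w in self.blocked_words}:
            return True
        return any(re.search(p, text, re.IGNORECASE) for p in self.blocked_patterns)

    def share(self, user: str) -> None:
        """Share the filter; beyond twenty users the fee must have been paid."""
        at_limit = len(self.shared_with) >= FREE_SHARE_LIMIT
        if user not in self.shared_with and at_limit and not self.fee_paid:
            raise PermissionError("sharing beyond 20 users requires the annual fee")
        self.shared_with.add(user)


cricket_filter = TimelineFilter(
    owner="me",
    blocked_words={"cricket"},
    blocked_patterns=[r"famous\s+matches"],
)
blocked = cricket_filter.blocks("some_account", "example.com", "Cricket scores today")
```

Accounts, origins, words and regexes are checked in that order purely for readability; the order does not change the result.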

Selling Subscriptions to Facebook Filters

Organizations can sell subscriptions to their filters. Facebook, which controls the authorization of the filters, contracts for a percentage of the subscription fee.

Pro tip: I would not invoke Facebook filters from the Washington Post and New York Times at the same time. It is likely they exclude each other as news sources.

Advantages of Monetizing Hate Speech and False News

First and foremost for Facebook, it gets out of the satisfying every point of view game. Completely. Users are free to define as narrow or as broad a point of view as they desire.

If you see something you don’t like, disagree with, etc., don’t complain to Facebook, complain to your Facebook filter provider.

That alone will expose the hidden agenda behind most, perhaps not all, of the “false news” filtering advocates. They aren’t concerned with what they are seeing on Facebook but they are very concerned with deciding what you see on Facebook.

For wannabe filters of what other people see, beyond twenty other Facebook users, that privilege is not free. Unlike the many proposals with as many definitions of “false news” as appear in Eli’s document.

It is difficult to imagine a privilege people would be more ready to pay for than the right to attempt to filter what other people see. Churches, social organizations, local governments, corporations, you name them and they will be lining up to create filter lists.

The financial beneficiary of the “drive to filter for others” is of course Facebook but one could argue the filter owners profit by spreading their worldview and the unfortunates that follow them, well, they get what they get.

Commercialization of Facebook filters, that is, selling subscriptions to Facebook filters, creates a new genre of economic activity and yet another revenue stream for Facebook. (That’s two up to this point if you are keeping score.)

It isn’t hard to imagine the Economist, Forbes, professional clipping services, etc., creating a natural extension of their filtering activities onto Facebook.

Conclusion: Commercialization or Unfunded Work Assignments

Preventing/blocking “hate speech” and “false news” for free has been, is and always will be a failure.

Changing Facebook infrastructure isn’t free, and creating revenue streams off of preventing/blocking “hate speech” and “false news” creates incentives for Facebook to make the necessary changes and for people to build filters off of which they can profit.

Not to mention that filtering enables everyone, including the alt-right, alt-left and the sane people in between, to create the Facebook of their dreams and not be subject to the Facebook desired by others.

Finally, it gets Facebook and Mark Zuckerberg out of the fantasy island approach where they are assigned unpaid work by others. New York Times, Mark Zuckerberg Is in Denial. (It’s another “hit” piece by Zeynep Tufekci.)

If you know Mark Zuckerberg, please pass this along to him.


Pentagon Says: Facts Don’t Matter (Pre-Trump)

Filed under: Government,Plagiarism — Patrick Durusau @ 4:02 pm

Intel chairman: Pentagon plagiarized Wikipedia in report to Congress by Kristina Wong.

From the post:

The Pentagon submitted information plagiarized from Wikipedia to members of Congress, the chairman of the House Intelligence Committee said at a hearing Thursday.

Chairman Devin Nunes (R-Calif.) said on March 21, Deputy Defense Secretary Bob Work submitted a document to the chairmen of the House Intelligence, Armed Services, and Defense appropriations committees with information directly copied from Wikipedia, an online open-source encyclopedia.

The information was submitted in a document used to justify a determination that Croughton was the best location for a joint intelligence center with the United Kingdom, Nunes said. The determination was required by the 2016 National Defense Authorization Act.
…

If that weren’t bad enough, here’s the kicker:

…
Work said he still fulfilled the law by making a determination and that the plagiarized information had “no bearing” on that determination.
…

Do you read that to mean:

1. Work made the determination
2. The “made” determination was packed with facts to justify it

In that order?

Remarkably candid admission that Pentagon decisions are made and then those decisions are packed with facts to justify them.

Not particularly surprising to me.

You?


The new Tesseract package: High Quality OCR in R

Filed under: OCR,R — Patrick Durusau @ 1:38 pm

The new Tesseract package: High Quality OCR in R by Jeroen Ooms.

From the post:

Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form.

People looking to extract text and metadata from pdf files in R should try our pdftools package.
…

Reading too quickly at first I thought I had missed a new version of Tesseract (tesseract-ocr Github), an OCR program that I use on a semi-regular basis.

Reading a little slower, ;-), I discovered Ooms is describing a new package for R, which uses Tesseract for OCR.

This is great news but be aware that Tesseract (whether called by an R package or standalone) can generate a large amount of output in a fairly short period of time.

One of the stumbling blocks of OCR is the labor intensive process of cleaning up the inevitable mistakes.

Depending on how critical accuracy is for searching, for example, you may choose to verify and clean only quotes for use in other publications.

Best to make those decisions up front and not be faced with a mountain of output that isn’t useful unless and until it has been corrected.
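One crude way to size the cleanup job up front is to flag lines of OCR output that look damaged. The sketch below is my own heuristic, not part of Tesseract or the R package: it is written in Python against plain text output, the token pattern and the 0.25 threshold are arbitrary assumptions, and the sample text is invented.

```python
import re

# A token counts as "clean" if it is a plain word (optionally hyphenated or with an
# apostrophe) or a plain number, with at most one trailing punctuation mark.
TOKEN_OK = re.compile(
    r"^[A-Za-z]+(?:['’-][A-Za-z]+)*[.,;:!?]?$"
    r"|^\d+(?:[.,]\d+)*[.,;:!?]?$"
)


def suspicious_ratio(line: str) -> float:
    """Fraction of whitespace-separated tokens that do not look like words or numbers."""
    tokens = line.split()
    if not tokens:
        return 0.0
    bad = sum(1 for t in tokens if not TOKEN_OK.match(t))
    return bad / len(tokens)


def lines_needing_review(text: str, threshold: float = 0.25):
    """Return (line number, line) pairs whose suspicious ratio exceeds the threshold."""
    return [
        (n, line)
        for n, line in enumerate(text.splitlines(), 1)
        if suspicious_ratio(line) > threshold
    ]


# Invented sample standing in for OCR output.
SAMPLE = "The audit report was issued in 2016.\nTh3 aud1t rep0rt w@s issued"
flagged = lines_needing_review(SAMPLE)
```

Parenthesized or quoted words will trip this pattern, so treat the count as a rough estimate of the mountain rather than a list of errors.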


Mute Account vs. Mute Word/Hashtag – Ineffectual Muting @Twitter

Filed under: Free Speech,Tweets,Twitter — Patrick Durusau @ 10:55 am

I mentioned yesterday the distinction between muting an account versus the new muting by word or #hashtag at Twitter.

Mute by account – Tweets don’t appear in your timeline.
Mute by word or hashtag – Tweets do appear in your timeline.

Take a moment to check my sources at Twitter support to make sure I have the rules correctly stated. I’ll wait.

(I’m not a journalist but readers should be enabled to satisfy themselves that the claims I make are at least plausible.)

No feedback from Twitter on the don’t appear in your timeline vs. do appear in your timeline distinction.

Why would I want to only block notifications of what I think of as hate speech and still have those tweets in my timeline?

Then it occurred to me:

If you can block tweets from appearing in your timeline by word or hashtag, you can block advertising tweets from appearing in your timeline.

You cannot effectively mute hate speech @Twitter because the same mechanism would also let you mute advertising.

What about it, Twitter?

Must feminists, people of color, minorities of all types be subjected to hate speech in order to preserve your revenue streams?

Not that I object to Twitter having revenue streams from advertising, but its advertising needs to be more sophisticated than the Nigerian spammer model now in use. Charge a higher price for targeted advertising that users are unlikely to block.

For example, I would be highly unlikely to block ads for CS theory/semantic integration tomes. On the other hand, I would follow a mute list that blocked histories of famous cricket matches. (Apologies to any cricket players in the audience.)

In my post: Twitter Almost Enables Personal Muting + Roving Citizen-Censors I offer a solution that requires only minor changes based on data Twitter already collects plus regexes for muting. It puts what you see entirely in the hands of users.

That enables Twitter to get out of the censorship business altogether, something it doesn’t do well anyway, and puts users in charge of what they see. A win-win from my perspective.

Alt-right suspensions lay bare Twitter’s consistency [hypocrisy] problem

Filed under: Censorship,Free Speech,Twitter — Patrick Durusau @ 10:10 am

Alt-right suspensions lay bare Twitter’s consistency problem by Nausicaa Renner.

From the post:

TWITTER SUSPENDED A NUMBER OF ACCOUNTS associated with the alt-right, USA Today reported this morning. This move was bound to be divisive: While Twitter has banned and suspended users in the past (prominently, Milo Yiannopoulos for incitement), USA Today points out the company has never suspended so many at once—at least seven in this case. Richard Spencer, one of the suspended users and prominent alt-righter, also had a verified account on Twitter. He claims, “I, and a number of other people who have just got banned, weren’t even trolling.”

If this is true, it would be a powerful political statement, indeed. As David Frum notes in The Atlantic, “These suspensions seem motivated entirely by viewpoint, not by behavior.” Frum goes on to argue that a kingpin strategy on Twitter’s part will only strengthen the alt-right’s audience. But we may never know Twitter’s reasoning for suspending the accounts. Twitter declined to comment on its moves, citing privacy and security reasons.
…
(emphasis in original)

Contrary to the claims of the Southern Poverty Law Center (SPLC) to Twitter, these users may not have been suspended for violating Twitter’s terms of service, but for their viewpoints.

Like the CIA, FBI and NSA, Twitter uses secrecy to avoid accountability and transparency for its suspension process.

The secrecy – avoidance of accountability/transparency pattern is one you should commit to memory. It is quite common.

Twitter needs to develop better muting options for users and abandon account suspension (save on court order) altogether.

November 16, 2016

XML Prague 2017, February 9-11, 2017 – Registration Opens!

Filed under: Conferences,XML,XQuery,XSLT — Patrick Durusau @ 3:22 pm

XML Prague 2017, February 9-11, 2017

I mentioned XML Prague 2017 last month and now, after the election of Donald Trump as president of the United States, registration for the conference opens!

Coincidence?

Maybe. 😉

Even if you are returning to the U.S. after the conference, XML Prague will be a welcome respite from the tempest of news coverage of what isn’t known about the impending Trump administration.

At 120 Euros for three days, this is a great investment both professionally and emotionally.

Enjoy!

The Amnesic Incognito Live System (Tails) 2.7

Filed under: Cybersecurity,Security — Patrick Durusau @ 3:10 pm

The Amnesic Incognito Live System (Tails) 2.7

The Amnesic Incognito Live System (Tails) is a Debian-based, live distribution with the goal of providing Internet anonymity for its users. The distribution accomplishes this by directing Internet traffic through the Tor network and by providing built-in tools for protecting files and scrubbing away meta data. The project’s latest release mostly focuses on fixing bugs and improving security: “Tails 2.7 is out. This release fixes many security issues and users should upgrade as soon as possible. New features: ship LetsEncrypt intermediate SSL certificate so that our tools are able to authenticate our website when its certificate is updated. Upgrades and changes: Tor 0.2.8.9, Tor Browser 6.0.6, Linux kernel 4.7, Icedove 45.4.0. Fixed problems: Synaptic installs packages with the correct architecture; set default spelling to en_US in Icedove. Known issues: users setting their Tor Browser security slider to High will have to click on a link to see the result of the search they done with the search box.” Additional information on Tails 2.7 can be found in the project’s release notes. A list of issues fixed in the 2.7 release can be found in the list of former security issues. Download: tails-i386-2.7.iso (1,113MB, signature, pkglist). Also available from OSDisc.

An essential part of your overall cybersecurity stance.

All releases are date/time sensitive.

BEFORE installing this release, even later today, check for a later release: Tails.

Checking for the latest release only takes seconds and is a habit that will help you avoid running software with security holes that have already been patched.

PoisonTap – Wishlist 2016

Filed under: Cybersecurity,Security — Patrick Durusau @ 2:50 pm

PoisonTap Steals Cookies, Drops Backdoors on Password-Protected Computers by Chris Brook.

From the post:

Even locked, password-protected computers are no rival for Samy Kamkar and his seemingly endless parade of gadgets.

His latest, PoisonTap, is a $5 Raspberry Pi Zero device running Node.js that’s retrofitted to emulate an Ethernet device over USB. Assuming a victim has left their web browser open, once plugged in to a machine, the device can quietly fetch HTTP cookies and sessions from millions of websites, even if the computer is locked.

If that alone doesn’t sound like Mr. Robot season three fodder, the device can also expose the machine’s internal router and install persistent backdoors, guaranteeing an attacker access long after they’ve removed the device from a USB slot.

“[The device] produces a cascading effect by exploiting the existing trust in various mechanisms of a machine and network, including USB, DHCP, DNS, and HTTP, to produce a snowball effect of information exfiltration, network access and installation of semi-permanent backdoors,” Kamkar said Wednesday in a writeup of PoisonTap.
…

Opportunity may only knock once.

Be prepared by carrying one or more PoisonTaps along with a bootable USB stick.

“…Fake News Is Not the Problem”

Filed under: Journalism,News,Reporting — Patrick Durusau @ 1:52 pm

According to Snopes, Fake News Is Not the Problem by Brooke Binkowski.

From the post:

Take it from the internet’s chief myth busters: The problem is the failing media.

…

This is the state of truth on the internet in 2016, now that it is as easy for a Macedonian teenager to create a website as it is for The New York Times, and now that the information most likely to find a large audience is that which is most alarming, not most correct. In the wake of the election, the spread of this kind of phony news on Facebook and other social media platforms has come under fire for stoking fears and influencing the election’s outcome. Both Facebook and Google have taken moves to bar fake news sites from their advertising platforms, aiming to cut off the sites’ sources of revenue.

But as managing editor of the fact-checking site Snopes, Brooke Binkowski believes Facebook’s perpetuation of phony news is not to blame for our epidemic of misinformation. “It’s not social media that’s the problem,” she says emphatically. “People are looking for somebody to pick on. The alt-rights have been empowered and that’s not going to go away anytime soon. But they also have always been around.”

The misinformation crisis, according to Binkowski, stems from something more pernicious. In the past, the sources of accurate information were recognizable enough that phony news was relatively easy for a discerning reader to identify and discredit. The problem, Binkowski believes, is that the public has lost faith in the media broadly — therefore no media outlet is considered credible any longer. The reasons are familiar: as the business of news has grown tougher, many outlets have been stripped of the resources they need for journalists to do their jobs correctly. “When you’re on your fifth story of the day and there’s no editor because the editor’s been fired and there’s no fact checker so you have to Google it yourself and you don’t have access to any academic journals or anything like that, you will screw stories up,” she says.
…

Sadly, Binkowski’s debunking of the false/fake news meme doesn’t turn up on Snopes.com.

Appearing there might make it more convincing to mainstream media, who have seized upon false/fake news to excuse their lack of credibility with readers.

Please share the Binkowski post with your friends, especially journalists.

Twitter Almost Enables Personal Muting + Roving Citizen-Censors

Filed under: Censorship,Free Speech,Tweets,Twitter — Patrick Durusau @ 12:40 pm

Investigating news reports of Twitter enabling muting of words and hashtags led me to Advanced muting options on Twitter. Also relevant is Muting accounts on Twitter.

Alex Hern’s post: Twitter users to get ability to mute words and conversations prompted this search because I found:

After nine years, Twitter users will finally be able to mute specific conversations on the site, as well as filter out all tweets with a particular word or phrase from their notifications.

The much requested features are being rolled out today, according to the company. Muting conversations serves two obvious purposes: users who have a tweet go viral will no longer have to deal with thousands of replies from strangers, while users stuck in an interminable conversation between people they don’t know will be able to silently drop out of the discussion.

A broader mute filter serves some clear general uses as well. Users will now be able to mute the names of popular TV shows, for instance, or the teams playing in a match they intend to watch later in the day, from showing up in their notifications, although the mute will not affect a user’s main timeline. “This is a feature we’ve heard many of you ask for, and we’re going to keep listening to make it better and more comprehensive over time,” says Twitter in a blogpost.
…

to be too vague to be useful.

Starting with Advanced muting options on Twitter, you don’t have to read far to find:

Note: Muting words and hashtags only applies to your notifications. You will still see these Tweets in your timeline and via search. The muted words and hashtags are applied to replies and mentions, including all interactions on those replies and mentions: likes, Retweets, additional replies, and Quote Tweets.

That’s the second paragraph, and it is displayed with a highlighted background.

So, “muting” of words and hashtags only stops notifications.

“Muted” offensive or inappropriate content is still visible “in your timeline and via search.”

Perhaps really muting based on words and hashtags will be a paid subscription feature?

The other curious aspect is that “muting” an account carries an entirely different meaning.

The first sentence in Muting accounts on Twitter reads:

Mute is a feature that allows you to remove an account’s Tweets from your timeline without unfollowing or blocking that account.

Quick Summary:

Mute account – Tweets don’t appear in your timeline.
Mute by word or hashtag – Tweets do appear in your timeline.

How lame is that?

Solution That Avoids Censorship

The solution to Twitter’s “hate speech” problem, a term that means different things to different people, isn’t hard to imagine:

Mute by account, word, hashtag or regex – Tweets don’t appear in your timeline.
Mute lists can be shared and/or followed by others.

Which means that if I trust N’s judgment on “hate speech,” I can follow their mute list. That saves me the effort of constructing my own mute list and perhaps even encourages the construction of public mute lists.
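
To show how little machinery that takes, here is a minimal sketch in Python. It is not Twitter's code or API. The `MuteList` class, its fields and the sample timeline are all hypothetical, and a tweet is reduced to an (author, text) pair for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MuteList:
    """A shareable set of mute rules (hypothetical structure, not a Twitter API)."""
    accounts: set = field(default_factory=set)    # muted handles, lowercase
    words: set = field(default_factory=set)       # muted words/hashtags, lowercase
    patterns: list = field(default_factory=list)  # regex source strings

    def merge(self, other):
        """Following someone else's mute list: combine their rules with mine."""
        extra = [p for p in other.patterns if p not in self.patterns]
        return MuteList(
            accounts=self.accounts | other.accounts,
            words=self.words | other.words,
            patterns=self.patterns + extra,
        )

    def mutes(self, author, text):
        """Return True if the tweet should not appear in the timeline."""
        if author.lower() in self.accounts:
            return True
        # Split into lowercase words, keeping a leading '#' on hashtags.
        tokens = set(re.findall(r"#?\w+", text.lower()))
        if tokens & self.words:
            return True
        return any(re.search(p, text, re.IGNORECASE) for p in self.patterns)


def filter_timeline(timeline, mute_list):
    """Keep only the (author, text) pairs that no rule mutes."""
    return [(a, t) for a, t in timeline if not mute_list.mutes(a, t)]


# My own list, plus a list published by someone whose judgment I trust.
mine = MuteList(words={"#cricket"})
trusted = MuteList(accounts={"spammer"}, patterns=[r"buy\s+now"])

timeline = [
    ("alice", "Topic maps talk at XML Prague"),
    ("bob", "Watch the #cricket final"),
    ("spammer", "hello"),
    ("carol", "BUY   NOW cheap followers"),
]
visible = filter_timeline(timeline, mine.merge(trusted))
print(visible)
```

The design choice that matters is that the filter runs on the reader's side. Nothing is deleted or suspended, and two users following different mute lists see different timelines.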

Twitter has the technical capability to produce such a solution in short order, so you have to wonder why they haven’t. I am under no delusion of being the first person to have imagined such a solution. Twitter? Comments?

The Alternative Solution – Roving Citizen-Censors

The alternative to a clean and non-censoring solution is covered in the USA Today report Twitter suspends alt-right accounts:

Twitter suspended a number of accounts associated with the alt-right movement, the same day the social media service said it would crack down on hate speech.

Among those suspended was Richard Spencer, who runs an alt-right think tank and had a verified account on Twitter.

The alt-right, a loosely organized group that espouses white nationalism, emerged as a counterpoint to mainstream conservatism and has flourished online. Spencer has said he wants blacks, Asians, Hispanics and Jews removed from the U.S.
…
[I personally find Richard Spencer’s views abhorrent and report them here only by way of example.]

From the report, Twitter didn’t go gunning for Richard Spencer’s account but the Southern Poverty Law Center (SPLC) did.

The SPLC didn’t follow more than 100 white supremacists to counter their outlandish claims or to offer a counter-narrative. They followed them to gather evidence of alleged violations of Twitter’s terms of service and to request removal of those accounts.

Government censorship of free speech is bad enough; enabling roving bands of self-righteous citizen-censors to do the same is even worse.

The counter-claim that Twitter isn’t the government, it’s not censorship, etc., is intellectually and morally dishonest. It is technically true in the U.S. constitutional law sense, but suppression of speech is the goal and that’s censorship, whatever fig leaf the SPLC wants to put on it. They should be honest enough to claim and defend the right to censor the speech of others.

I would not vote in their favor, that is to say, I would not agree that they have a right to censor the speech of others. They are free to block speech they don’t care to hear, which is what my solution to “hate speech” on Twitter enables.

Support muting, not censorship or roving bands of citizen-censors.

November 15, 2016

BBC World Service – In 40 Languages [Non-U.S. Centric Topic Mappers Take Note]

Filed under: BBC,Language,News — Patrick Durusau @ 9:56 pm

BBC World Service announces biggest expansion ‘since the 1940s’

From the post:

The BBC World Service will launch 11 new language services as part of its biggest expansion “since the 1940s”, the corporation has announced.

The expansion is a result of the funding boost announced by the UK government last year.

The new languages will be Afaan Oromo, Amharic, Gujarati, Igbo, Korean, Marathi, Pidgin, Punjabi, Telugu, Tigrinya, and Yoruba.

The first new services are expected to launch in 2017.

“This is a historic day for the BBC, as we announce the biggest expansion of the World Service since the 1940s,” said BBC director general Tony Hall.

“The BBC World Service is a jewel in the crown – for the BBC and for Britain.

“As we move towards our centenary, my vision is of a confident, outward-looking BBC which brings the best of our independent, impartial journalism and world-class entertainment to half a billion people around the world.
…

Excellent!

The BBC World Service is the starting place to broaden your horizons.

In English, “all shows” lists 1831 shows.

I prefer reading over listening but have resolved to start exploring the world of the BBC.

Surveillance Self-Defense [Guide to creating “false” persona?]

Filed under: Cybersecurity,Privacy,Security — Patrick Durusau @ 7:51 pm

Surveillance Self-Defense – Tips, Tools and How-Tos for Safer Online Communications

From the webpage:

Modern technology has given those in power new abilities to eavesdrop and collect data on innocent people. Surveillance Self-Defense is EFF’s guide to defending yourself and your friends from surveillance by using secure technology and developing careful practices.

Select an article from our index to learn about a tool or issue, or check out one of our playlists to take a guided tour through a new set of skills.
…

Definitely a starting point that merits sharing.

One important topic that is missing: How to create a “false” persona?

A “false” persona that cannot be connected back to a user is far more valuable than two-factor authentication, strong passwords, etc.

Pointers to such resources?

False, Misleading, Clickbait-y, and Satirical “News” Sources (Another Useful Listicle)

Filed under: Journalism,News,Reporting — Patrick Durusau @ 5:39 pm

False, Misleading, Clickbait-y, and Satirical “News” Sources by Melissa Zimdars.

From the document:

Below is a list of fake, false, regularly misleading, and/or otherwise questionable “news” organizations, as well as organizations that regularly use clickbait-y headlines and descriptions, that are commonly shared on facebook and other social media sites. Some of these websites rely on “outrage” by using distorted headlines and decontextualized or dubious information in order to generate likes, shares, and profits.

Other sources on this list are purposefully fake with the intent of satire/comedy, which can offer important critical commentary on politics and society, but they are regularly shared as actual/literal news. I’m including them here, for now, because 1.) they have the potential to perpetuate misinformation based on different audience (mis)interpretations and 2.) to make sure anyone who reads a story by The Onion, for example, understands its purpose. If you think this is unnecessary, please see Literally Unbelievable.
…

This list is in the process of being updated and, to her credit, Melissa explicitly says that no source should be given an automatic imprimatur.

Too many commentators who complain about “false news” and/or “bubbles”:

Want to separate true/false news for you
Want to sell you their bubble to replace your own

You will be less informed and less capable of evaluating news for yourself in either case.

As Melissa notes, read widely and with a critical eye.

Useful Listicle: The 5 most downloaded R packages

Filed under: Programming,R — Patrick Durusau @ 5:27 pm

The 5 most downloaded R packages

From the post:

Curious which R packages your colleagues and the rest of the R community are using? Thanks to Rdocumentation.org you can now see for yourself! Rdocumentation.org aggregates R documentation and download information from popular repositories like CRAN, BioConductor and GitHub. In this post, we’ll take a look at the top 5 R packages with the most direct downloads!
…

Sorry! No spoiler!

Do check out:

Rdocumentation.org aggregates help documentation for R packages from CRAN, BioConductor, and GitHub – the three most common sources of current R documentation. RDocumentation.org goes beyond simply aggregating this information, however, by bringing all of this documentation to your fingertips via the RDocumentation package. The RDocumentation package overwrites the basic help functions from the utils package and gives you access to RDocumentation.org from the comfort of your RStudio IDE. Look up the newest and most popular R packages, search through documentation and post community examples.

As they say:

Create an RDocumentation account today!

I’m always sympathetic to documentation but more so today because I have wasted hours over the past two or three days on issues that could have been trivially documented.

I will be posting “corrected” documentation later this week.

PS: If you have or suspect you have poorly written documentation, I have some time available for paid improvement of the same.

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity