Statistical Functions for XQuery 3.1 (see OpenFormula)

June 24th, 2017

simple-statsxq by Tim Thompson.

From the webpage:

Loosely inspired by the JavaScript simple-statistics project. The goal of this module is to provide a basic set of statistical functions for doing data analysis in XQuery 3.1.

Functions are intended to be implementation-agnostic.

Unit tests were written using the unit testing module of BaseX.

OpenFormula (part of the Open Document Format for Office Applications (OpenDocument) specification) defines eighty-seven (87) statistical functions.

There are fifty-five (55) financial functions defined by OpenFormula, just in case you are interested.

See Through Walls With WiFi!

June 22nd, 2017

Drones that can see through walls using only Wi-Fi

From the post:

A Wi-Fi transmitter and two drones. That’s all scientists need to create a 3D map of the interior of your house. Researchers at the University of California, Santa Barbara have successfully demonstrated how two drones working in tandem can ‘see through’ solid walls to create 3D model of the interiors of a building using only, and we kid you not, only Wi-Fi signals.

As astounding as it sounds, researchers Yasamin Mostofi and Chitra R. Karanam have devised this almost superhero-level X-ray vision technology. “This approach utilizes only Wi-Fi RSSI measurements, does not require any prior measurements in the area of interest and does not need objects to move to be imaged,” explains Mostofi, who teaches electrical and computer engineering at the University.

For the paper and other details, see: 3D Through-Wall Imaging With Unmanned Aerial Vehicles and WiFi.

Before some contractor creates the Stingray equivalent for law enforcement, researchers and electronics buffs need to create new and improved versions for the public.

Government and industry offices are more complex than this demo, but the technology will continue to improve.

I don’t have the technical ability to carry out the experiment, but I wonder: would measuring a strong signal from any source as it approaches a building, and again as it exits on the far side, serve the same purpose?

My reasoning: government and industry buildings may become shielded against some signals, but in an age of smartphones, not all of them.

Enjoy!

.Rddj (data journalism with R)

June 21st, 2017

.Rddj Hand-curated, high quality resources for doing data journalism with R by Timo Grossenbacher.

From the webpage:

The R Project is a great software environment for doing all sorts of data-driven journalism. It can be used for any of the stages of a typical data project: data collection, cleaning, analysis and even (interactive) visualization. And it’s all reproducible and transparent! Sure, it requires a fair amount of scripting, yet…

Do not fear! With this hand-curated (and opinionated) list of resources, you will be guided through the thick jungle of countless R packages, from learning the basics of R’s syntax, to scraping HTML tables, to a guide on how to make your work comprehensible and reproducible.

Now, enjoy your journey.

Some more efforts at persuasion: As I work in the media, I know how a lot of journalists are turned off by everything that doesn’t have a graphical interface with buttons to click on. However, you don’t need to spend days studying programming concepts in order to get started with R, as most operations can be carried out without applying scary things such as loops or conditionals – and, nowadays, high-level abstractions like dplyr make working with data a breeze. My advice if you’re new to data journalism or data processing in general: Better learn R than Excel, ’cause getting to know Excel (and the countless other tools that each do a single thing) doesn’t come for free, either.

This list is (partially) inspired by R for Journalists by Ed Borasky, which is another great resource for getting to know R.

… (emphasis in original)

The topics are familiar:

  • RStudio
  • Syntax and basic R programming
  • Collecting Data (from the Web)
  • Data cleaning and manipulation
  • Text mining / natural language processing
  • Exploratory data analysis and plotting
  • Interactive data visualization
  • Publication-quality graphics
  • Reproducibility
  • Examples of using R in (data) journalism
What makes this list of resources different from search results?

    Hand curation.

    How much of a difference?

    Compare the search results of “R” + any of these categories to the resources here.

Bookmark .Rddj for data journalism and R, then ping me with the hand-curated list of resources you are creating.

    Save yourself and the rest of us from search. Thanks!

    Storyzy A.I. Fights Fake Quotes (Ineffective Against Trump White House)

    June 21st, 2017

    In the battle against fake news, Storyzy A.I. fights fake quotes

    From the post:

    The Quote Verifier launched today by Storyzy takes the battle against fake news to a whole new automated level by conveniently flagging fake quotes on social networks and search engines with +50,000 new authentic quotes added daily.

    Storyzy aims to help social networks and search engines by spotting fake quotes. To fulfill this ambition Storyzy developed a tool (currently available in Beta version) that verifies whether a quote is authentic or not by checking if a person truly said that or not.
    … (emphasis in original)

    A tool for your short-list of verification tools to use on a daily basis.

    It’s ineffective against the Trump White House because accurate quotes can still be “false.”

    “Truthful quotes,” as per Trump White House policy, issue only from the President and must reflect what he meant to say. Subject to correction by the President.

A “truthful quote” consists of three parts:

    1. Said by the President
    2. Reflects what he meant to say
    3. Includes any subsequent correction by the President (one or more)

There is a simple solution to avoiding “false” quotes from President Trump:

    Never quote him or his tweets at all.

    Quote his lackeys, familiars and sycophants, but not him.

    A Dictionary of Victorian Slang (1909)

    June 20th, 2017

Passing English of the Victorian era, a dictionary of heterodox English, slang and phrase (1909) by J. Redding Ware.

    Quoted from the Preface:

HERE is a numerically weak collection of instances of ‘Passing English’. It may be hoped that there are errors on every page, and also that no entry is ‘quite too dull’. Thousands of words and phrases in existence in 1870 have drifted away, or changed their forms, or been absorbed, while as many have been added or are being added. ‘Passing English’ ripples from countless sources, forming a river of new language which has its tide and its ebb, while its current brings down new ideas and carries away those that have dribbled out of fashion. Not only is ‘Passing English’ general; it is local; often very seasonably local. Careless etymologists might hold that there are only four divisions of fugitive language in London: west, east, north and south. But the variations are countless. Holborn knows little of Petty Italia behind Hatton Garden, and both these ignore Clerkenwell, which is equally foreign to Islington proper; in the South, Lambeth generally ignores the New Cut, and both look upon Southwark as linguistically out of bounds; while in Central London, Clare Market (disappearing with the nineteenth century) had, if it no longer has, a distinct fashion in words from its great and partially surviving rival through the centuries, the world of Seven Dials, which is in St Giles’s, St James’s being practically in the next parish. In the East the confusion of languages is a world of ‘variants’; there must be half-a-dozen of Anglo-Yiddish alone, all, however, outgrown from the Hebrew stem. ‘Passing English’ belongs to all the classes, from the peerage class who have always adopted an imperfection in speech or frequency of phrase associated with the court, to the court of the lowest costermonger, who gives the fashion to his immediate entourage.

    A healthy reminder that language is no more fixed and unchanging than the people who use it.

    Enjoy!

    My Last Index (Is Search A Form of Discrimination?)

    June 20th, 2017

    My Last Index by Judith Pascoe.

    From the post:

    A casual reader of authors’ acknowledgment pages will encounter expressions of familial gratitude that paper over years of spousal neglect and missed cello recitals. A keen reader of those pages may happen upon animals that were essential to an author’s well-being—supportive dogs, diverting cats, or, in one instance, “four very special squirrels.” But even an assiduous reader of acknowledgments could go a lifetime without coming across a single shout-out to a competent indexer.

    That is mostly because the index gets constructed late in the book-making process. But it’s also because most readers pay no mind to indexes, especially at this moment in time when they are being supplanted by Amazon and Google. More and more, when I want to track down an errant tidbit of information about a book, I use Amazon’s “Search inside this book” function, which allows interested parties to access a book’s front cover, copyright, table of contents, first pages (and sometimes more), and index. But there’s no reason to even use the index when you can “Look Inside!” to find anything you need.

    I had plenty of time to ponder the unsung heroism of indexers when I was finishing my latest book. Twice before, I had assembled an indexer’s tools of trade: walking down the stationery aisles of a college book store, pausing to consider the nib and color of my Flair pens, halting before the index cards. But when I began work on this index, I was overcome with thoughts of doom that Nancy Mulvany, author of Indexing Books, attributes to two factors that plague self-indexing authors: general fatigue and too much self-involvement. “Intense involvement with one’s book,” Mulvany writes, “can make it very difficult to anticipate the index user’s needs accurately.”

    Perhaps my mood was dire because I’d lost the services of my favorite proofreader, a woman who knew a blackberry from a BlackBerry, and who could be counted on to fix my flawed French. Perhaps it was because I was forced to notice how often I’d failed to include page citations in my bibliography entries, and how inconsistently I’d applied the protocol for citing Web sites—a result of my failure to imagine a future index user so needy as to require the exact date of my visit to theirvingsociety.org.uk. Or perhaps it was because my daughter was six months away from leaving home for college and I was missing her in advance.

    Perhaps for all of those reasons, I could only see my latest index as a running commentary on the fragility of all human endeavor. And so I started reading indexes while reluctantly compiling my own.

    A highly instructive tale on the importance of indexing (and hiring a professional indexer) that includes this reference to Jonathan Swift:


    Jonathan Swift, in his 1704 A Tale of a Tub, describes two means of using books: “to serve them as men do lords—learn their titles exactly and then brag of their acquaintance,” or “the choicer, the profounder and politer method, to get a thorough insight into the index, by which the whole book is governed and turned, like fishes by the tail.”

    In full context, the Swift passage is even more amusing:


The whole course of things being thus entirely changed between us and the ancients, and the moderns wisely sensible of it, we of this age have discovered a shorter and more prudent method to become scholars and wits, without the fatigue of reading or of thinking. The most accomplished way of using books at present is twofold: either first to serve them as some men do lords, learn their titles exactly, and then brag of their acquaintance; or, secondly, which is indeed the choicer, the profounder, and politer method, to get a thorough insight into the index by which the whole book is governed and turned, like fishes by the tail. For to enter the palace of learning at the great gate requires an expense of time and forms, therefore men of much haste and little ceremony are content to get in by the back-door. For the arts are all in a flying march, and therefore more easily subdued by attacking them in the rear. Thus physicians discover the state of the whole body by consulting only what comes from behind. Thus men catch knowledge by throwing their wit on the posteriors of a book, as boys do sparrows with flinging salt upon their tails. Thus human life is best understood by the wise man’s rule of regarding the end. Thus are the sciences found, like Hercules’ oxen, by tracing them backwards. Thus are old sciences unravelled like old stockings, by beginning at the foot. (A Tale of a Tub by Jonathan Swift)

    Searching, as opposed to indexing (good indexing at any rate), is the equivalent of bragging of the acquaintance of a lord. Yes, you did find term A or term B in the text, but you don’t know what other terms appear in the text, nor do you know what other statements were made about term A or term B.

    Search is at best a partial solution and one that varies based on the skill of the searcher.

    Indexing, on the other hand, can reflect an accumulation of insights, made equally available to all readers.

    Hmmm, equally made available to all readers.

    Is search a form of discrimination?

    Is search a type of access with disproportionate (read disadvantageous) impact on some audiences and not others?

    Any research on the social class, racial, ethnic impact of search you would suggest?

    All leads and tips appreciated!

    Manning Leaks — No Real Harm (Database of Government Liars Anyone?)

    June 20th, 2017

    Secret Government Report: Chelsea Manning Leaks Caused No Real Harm by Jason Leopold.

    From the post:

    In the seven years since WikiLeaks published the largest leak of classified documents in history, the federal government has said they caused enormous damage to national security.

    But a secret, 107-page report, prepared by a Department of Defense task force and newly obtained by BuzzFeed News, tells a starkly different story: It says the disclosures were largely insignificant and did not cause any real harm to US interests.

    Regarding the hundreds of thousands of Iraq-related military documents and State Department cables provided by the Army private Chelsea Manning, the report assessed “with high confidence that disclosure of the Iraq data set will have no direct personal impact on current and former U.S. leadership in Iraq.”

The redacted version of the 107-page report runs just 35 pages. Thanks to BuzzFeed News for prying that much of a semblance of the truth out of the government.

It is further proof that US prosecutors and other federal government representatives lie to the courts, the press and the public, whenever it suits their purposes.

Anyone with transcripts from the original Manning hearings should identify statements by prosecutors at variance with this report, noting the prosecutor’s name and rank and recording the page/line reference in the transcript.

    That individual prosecutors and federal law enforcement witnesses lie is a commonly known fact. What I haven’t seen, is a central repository of all such liars and the lies they have told.

I mention a central repository because saying one or two prosecutors have lied or been called down by a judge grabs a headline, but showing a pattern of lying by the state over decades could move the issue to an entirely different level.
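
Such a repository need not be complicated to start. A minimal sketch of one possible record structure, matching the fields suggested above (name, rank, page/line reference); the schema is entirely hypothetical:

```python
# A sketch only: the table layout is hypothetical, meant to show how
# little structure a "liars and lies" repository actually needs.
import sqlite3

conn = sqlite3.connect("statements.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS statements (
        id               INTEGER PRIMARY KEY,
        official_name    TEXT NOT NULL,
        official_rank    TEXT,
        proceeding       TEXT,     -- e.g. a specific Manning hearing
        transcript_page  INTEGER,
        transcript_line  INTEGER,
        statement        TEXT NOT NULL,
        contradicted_by  TEXT      -- e.g. page reference in the DoD report
    )
""")
conn.commit()
conn.close()
```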

    Judges, even conservative ones (especially conservative ones?), don’t appreciate being lied to by anyone, including the state.

    The state has chosen lying as its default mode of operation.

    Let’s help them wear that banner.

    Interested?

    Concealed Vulnerability Survives Reboots – Consumers Left in Dark

    June 19th, 2017

    New Vulnerability Could Give Mirai the Ability to Survive Device Reboots by Catalin Cimpanu

    From the post:

    Until now, all malware targeting IoT devices survived only until the user rebooted his equipment, which cleared the device’s memory and erased the malware from the user’s equipment.

    Intense Internet scans for vulnerable targets meant that devices survived only minutes until they were reinfected again, which meant that users needed to secure devices with unique passwords or place behind firewalls to prevent exploitation.

    New vulnerability allows for permanent Mirai infections

    While researching the security of over 30 DVR brands, researchers from Pen Test Partners have discovered a new vulnerability that could allow the Mirai IoT worm and other IoT malware to survive between device reboots, permitting for the creation of a permanent IoT botnet.

    “We’ve […] found a route to remotely fix Mirai vulnerable devices,” said Pen Test Partners researcher Ken Munro. “Problem is that this method can also be used to make Mirai persistent beyond a power off reboot.”

    Understandably, Munro and his colleagues decided to refrain from publishing any details about this flaw, fearing that miscreants might weaponize it and create non-removable versions of Mirai, a malware known for launching some of the biggest DDoS attacks known today.

    Do security researchers realize concealing vulnerabilities prevents market forces from deciding the fate of insecure systems?

Should security researchers marketing vulnerabilities to manufacturers be more important than the operation of market forces on their products?

    More important than your right to choose products based on the best and latest information?

    Market forces are at work here, but they aren’t ones that will benefit consumers.

    E-Cigarette Can Hack Your Computer (Is Nothing Sacred?)

    June 19th, 2017

    Kavita Iyer has the details on how an e-cigarette can be used to hack your computer at: Know How E-Cigarette Can Be Used By Hackers To Target Your Computer.

    I’m guessing you aren’t so certain that expensive e-cigarette you “found” is harmless after all?

    Malware in e-cigarettes seems like a stretch given the number of successful phishing emails every year.

But a recent non-smoker may be the security lapse you need.

    Key DoD Officials – September 1947 to June 2017

    June 19th, 2017

    While looking for a particular Department of Defense official, I stumbled on: Department of Defense Key Officials September 1947–June 2017.

Yes, almost seventy (70) years’ worth of key office holders at the DoD. It’s eighty (80) pages long, produced by the Historical Office of the Secretary of Defense.

    One potential use, aside from giving historical military fiction a ring of authenticity, would be to use this as a starting set of entities to trace through the development of the military/industrial complex.

    Everyone, including me, refers to the military/industrial complex as though it is a separate entity, over there somewhere.

    But as everyone discovered with the Panama Papers, however tangled and corrupt even world-wide organizations can be, we have the technology to untangle those knots and to shine bright lights into obscure corners.

    Interested?

    DoD Audit Ready By End of September (Which September? Define “ready.”)

    June 19th, 2017

    For your Monday amusement: Pentagon Official: DoD will be audit ready by end of September by Eric White.

    From the post:

    In today’s Federal Newscast, the Defense Department’s Comptroller David Norquist said the department has been properly preparing for its deadline for audit readiness.

    The Pentagon’s top financial official said DoD will meet its deadline to be “audit ready” by the end of September. DoD has been working toward the deadline for the better part of seven years, and as the department pointed out in its most recent audit readiness update, most federal agencies haven’t earned clean opinions until they’ve been under full-scale audits for several years. But newly-confirmed comptroller David Norquist said now’s the time to start. He said the department has already contracted with several outside accounting firms to perform the audits, both for the Defense Department’s various components and an overarching audit of the entire department.

    I’m reminded of the alleged letter by the Duke of Wellington to Whitehall:

    Gentlemen,

    Whilst marching from Portugal to a position which commands the approach to Madrid and the French forces, my officers have been diligently complying with your requests which have been sent by H.M. ship from London to Lisbon and thence by dispatch to our headquarters.

    We have enumerated our saddles, bridles, tents and tent poles, and all manner of sundry items for which His Majesty’s Government holds me accountable. I have dispatched reports on the character, wit, and spleen of every officer. Each item and every farthing has been accounted for, with two regrettable exceptions for which I beg your indulgence.

Unfortunately the sum of one shilling and ninepence remains unaccounted for in one infantry battalion’s petty cash and there has been a hideous confusion as to the number of jars of raspberry jam issued to one cavalry regiment during a sandstorm in western Spain. This reprehensible carelessness may be related to the pressure of circumstance, since we are at war with France, a fact which may come as a bit of a surprise to you gentlemen in Whitehall.

    This brings me to my present purpose, which is to request elucidation of my instructions from His Majesty’s Government so that I may better understand why I am dragging an army over these barren plains. I construe that perforce it must be one of two alternative duties, as given below. I shall pursue either one with the best of my ability, but I cannot do both:

    1. To train an army of uniformed British clerks in Spain for the benefit of the accountants and copy-boys in London or perchance.

    2. To see to it that the forces of Napoleon are driven out of Spain.

    Your most obedient servant,

    Wellington

    The primary function of any military organization is suppression of the currently designated “enemy.”

Congress should direct the Department of Homeland Security (DHS) to audit the DoD.

    Instead of chasing fictional terrorists, DHS staff would be chasing known to exist dollars and alleged expenses.

    TensorFlow 1.2 Hits The Streets!

    June 17th, 2017

    TensorFlow 1.2

I’m not copying the features and improvements here; better that you download TensorFlow 1.2 and experience them for yourself!
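
If you want a quick sanity check of the install before exploring the models below, a minimal smoke test under the 1.x graph-and-session API looks something like this:

```python
# Minimal TensorFlow 1.2 smoke test (1.x graph-and-session API).
import tensorflow as tf

print(tf.__version__)  # should report 1.2.x

a = tf.constant(2.0)
b = tf.constant(3.0)
total = a + b  # builds a graph node; nothing runs yet

with tf.Session() as sess:
    print(sess.run(total))  # 5.0
```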

    The incomplete list of models at TensorFlow Models:

    • adversarial_crypto: protecting communications with adversarial neural cryptography.
    • adversarial_text: semi-supervised sequence learning with adversarial training.
    • attention_ocr: a model for real-world image text extraction.
    • autoencoder: various autoencoders.
    • cognitive_mapping_and_planning: implementation of a spatial memory based mapping and planning architecture for visual navigation.
    • compression: compressing and decompressing images using a pre-trained Residual GRU network.
    • differential_privacy: privacy-preserving student models from multiple teachers.
    • domain_adaptation: domain separation networks.
    • im2txt: image-to-text neural network for image captioning.
    • inception: deep convolutional networks for computer vision.
    • learning_to_remember_rare_events: a large-scale life-long memory module for use in deep learning.
    • lm_1b: language modeling on the one billion word benchmark.
    • namignizer: recognize and generate names.
    • neural_gpu: highly parallel neural computer.
    • neural_programmer: neural network augmented with logic and mathematic operations.
    • next_frame_prediction: probabilistic future frame synthesis via cross convolutional networks.
    • object_detection: localizing and identifying multiple objects in a single image.
    • real_nvp: density estimation using real-valued non-volume preserving (real NVP) transformations.
    • resnet: deep and wide residual networks.
    • skip_thoughts: recurrent neural network sentence-to-vector encoder.
    • slim: image classification models in TF-Slim.
    • street: identify the name of a street (in France) from an image using a Deep RNN.
    • swivel: the Swivel algorithm for generating word embeddings.
    • syntaxnet: neural models of natural language syntax.
    • textsum: sequence-to-sequence with attention model for text summarization.
    • transformer: spatial transformer network, which allows the spatial manipulation of data within the network.
    • tutorials: models described in the TensorFlow tutorials.
    • video_prediction: predicting future video frames with neural advection.

    And your TensorFlow model is ….?

    Enjoy!

    If You Don’t Think “Working For The Man” Is All That Weird

    June 17th, 2017

J.P. Morgan’s massive guide to machine learning and big data jobs in finance by Sarah Butcher.

    From the post:

    Financial services jobs go in and out of fashion. In 2001 equity research for internet companies was all the rage. In 2006, structuring collateralised debt obligations (CDOs) was the thing. In 2010, credit traders were popular. In 2014, compliance professionals were it. In 2017, it’s all about machine learning and big data. If you can get in here, your future in finance will be assured.

J.P. Morgan’s quantitative investing and derivatives strategy team, led by Marko Kolanovic and Rajesh T. Krishnamachari, has just issued the most comprehensive report ever on big data and machine learning in financial services.

    Titled, ‘Big Data and AI Strategies’ and subheaded, ‘Machine Learning and Alternative Data Approach to Investing’, the report says that machine learning will become crucial to the future functioning of markets. Analysts, portfolio managers, traders and chief investment officers all need to become familiar with machine learning techniques. If they don’t they’ll be left behind: traditional data sources like quarterly earnings and GDP figures will become increasingly irrelevant as managers using newer datasets and methods will be able to predict them in advance and to trade ahead of their release.

    At 280 pages, the report is too long to cover in detail, but we’ve pulled out the most salient points for you below.

    How important is Sarah’s post and the report by J.P. Morgan?

Let me put it this way: Sarah’s post is the first business-type post I have saved as a complete webpage so I can clean it up and print it without all the clutter. This year. Perhaps last year as well. It’s that important.

    Sarah’s post is a quick guide to the languages, talents and tools you will need to start “working for the man.”

If that catches your interest, then Sarah’s post is pure gold.

    Enjoy!

    PS: I’m still working on a link for the full 280 page report. The switchboard is down for the weekend so I will be following up with J.P. Morgan on Monday next.

    The Quartz Directory of Essential Data (Directory of Directories Is More Accurate)

    June 17th, 2017

    The Quartz Directory of Essential Data

    From the webpage:

    A curated list of useful datasets published by important sources. Please remember that “important” does not mean “correct.” You should vet these data as you would with any human source.

    Switch to the “Data” tab at the bottom of this spreadsheet and use Find (⌘ + F) to search for datasets on a particular topic.

    Note: Just because data is useful, doesn’t mean it’s easy to use. The point of this directory is to help you find data. If you need help accessing or interpreting one of these datasets, please reach out to your friendly Quartz data editor, Chris.

    Slack: @chris
    Email: c@qz.com

A directory of 77 data directories. The breadth of organizing topics (health, trade, government, for example) creates a need for repeated data mining by every new user.

    A low/no-friction method for creating more specific and re-usable directories has remained elusive.

    American Archive of Public Broadcasting

    June 17th, 2017

    American Archive of Public Broadcasting

    From the post:

    An archive worth knowing about: The Library of Congress and Boston’s WGBH have joined forces to create The American Archive of Public Broadcasting and “preserve for posterity the most significant public television and radio programs of the past 60 years.” Right now, they’re overseeing the digitization of approximately 40,000 hours of programs. And already you can start streaming “more than 7,000 historic public radio and television programs.”

    The collection includes local news and public affairs programs, and “programs dealing with education, environmental issues, music, art, literature, dance, poetry, religion, and even filmmaking.” You can browse the complete collection here. Or search the archive here. For more on the archive, read this About page.

Hopefully someone is spinning cable/television content 24 x 7 to archival storage. The ability to research and document, reliably, patterns in shows, advertisements, news reporting, etc., is more important than any speculative copyright interest.

    Are You A Serious Reader?

    June 17th, 2017

    What does it mean for a journalist today to be a Serious Reader? by Danny Funt.

    From the post:

    BEFORE THE BOOKS ARRIVED, Adam Gopnik, in an effort to be polite, almost contradicted the essential insight of his life. An essayist, critic, and reporter at The New Yorker for the last 31 years, he was asked whether there is an imperative for busy, ambitious journalists to read books seriously—especially with journalism, and not just White House reporting, feeling unusually high-stakes these days—when the doorbell rang in his apartment, a block east of Central Park. He came back with a shipment and said, “It would be,” pausing to think of and lean into the proper word, “brutally unkind and unrealistic to say, Oh, all of you should be reading Stendhal. You’ll be better BuzzFeeders for it.” For the part about the 19th-century French novelist, he switched from his naturally delicate voice to a buffoonish, apparently bookish, baritone.

    Then, as he tore open the packaging of two nonfiction paperbacks (one, obscure research for an assignment on Ernest Hemingway; the other, a new book on Adam Smith, a past essay subject) and sat facing a wall-length bookcase and sliding ladder in his heavenly, all-white living room, Gopnik took that back. His instinct was to avoid sermonizing about books, particularly to colleagues with grueling workloads, because time for books is a privilege of his job. And yet, to achieve such an amazingly prolific life, the truth is he simply read his way here.

    I spoke with a dozen accomplished journalists of various specialties who manage to do their work while reading a phenomenal number of books, about and beyond their latest project. With journalists so fiercely resented after last year’s election for their perceived elitist detachment, it might seem like a bizarre response to double down on something as hermetic as reading—unless you see books as the only way to fully see the world.

    Being well-read is a transcendent achievement similar to training to run 26.2 miles, then showing up for a marathon in New York City and finding 50,000 people there. It is at once superhuman and pedestrian.

    … (emphasis in original)

    A deeply inspirational and instructive essay on serious readers and the benefits that accrue to them. Very much worth two or more slow reads, plus looking up the authors, writers and reporters who are mentioned.

Earlier this year I began the 2017 Women of Color Reading Challenge. I have not discovered any technical insights into data science or topic maps, but I am gaining, incrementally for sure, a deeper appreciation for how race and gender shape a point of view.

    Or perhaps more accurately, I am encountering points of view different enough from my own that I recognize them as being different. That in and of itself, the encountering of different views, is one reason I aspire to become a “serious reader.”

    You?

    OpSec Reminder

    June 17th, 2017

    Catalin Cimpanu covers a hack of the DoD’s Enhanced Mobile Satellite Services (EMSS) satellite phone network in 2014 in British Hacker Used Home Internet Connection to Hack the DoD in 2014.

The details are amusing, but the most important part of Cimpanu’s post is a reminder about OpSec:


    In a statement released yesterday, the NCA said it had a solid case against Caffrey because they traced back the attack to his house, and found the stolen data on his computer. Furthermore, officers found an online messaging account linked to the hack on Caffrey’s computer.

    Caffrey’s OpSec stumbles:

    1. Connection traced to his computer (No use of Tor or VPN)
    2. Data found on his hard drive (No use of encryption and/or storage elsewhere)
    3. Online account used in hack operated from his computer (Again, no use of Tor or VPN)

    I’m sure the hack was a clever one but Caffrey’s OpSec was less so. Decidedly less so.

PS: The National Crime Agency (NCA) report on Caffrey.

    FOIA Success Prediction

    June 16th, 2017

    Will your FOIA request succeed? This new machine will tell you by Benjamin Mullin.

    From the post:

    Many journalists know the feeling: There could be a cache of documents that might confirm an important story. Your big scoop hinges on one question: Will the government official responsible for the records respond to your FOIA request?

    Now, thanks to a new project from a data storage and analysis company, some of the guesswork has been taken out of that question.

    Want to know the chances your public records request will get rejected? Plug it into FOIA Predictor, a probability analysis web application from Data.World, and it will provide an estimation of your success based on factors including word count, average sentence length and specificity.

    Accuracy?

    Best way to gauge that is experience with your FOIA requests.
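
If you’re curious what such a predictor works from, the features named in the post (word count, average sentence length) are easy to compute yourself. A minimal sketch; the scoring model, its weights, and the “specificity” measure are Data.World’s and not reproduced here:

```python
# Extracts the request-text features the post mentions; the actual
# scoring model is Data.World's and is not reproduced here.
import re

def request_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    return {
        "word_count": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

print(request_features(
    "I request all emails sent by the mayor in March 2017. "
    "Please provide the records in electronic format."
))
```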

    Try starting at MuckRock.com.

    Enjoy!

    Man Bites Dog Or Shoots Member of Congress – Novelty Rules the News

    June 15th, 2017

The need for “novelty” in a 24 x 7 news cycle, identified by Lewis and Marwick in “Megyn Kelly fiasco is one more instance of far right outmaneuvering media,” comes to the fore in coverage of the recent shooting reported in “Capitol Hill shaken by baseball shooting.”

Boiled down to the essentials, James Hodgkinson, 66, of Illinois, who is now dead, wounded “House Majority Whip Steve Scalise (R-La.) and four others in the Washington suburb of Alexandria, Va.,” on June 14, 2017. The medical status of the wounded varies from critical to released.

    That’s all the useful information, aside from identification of the victims, that can be wrung from that story.

    Not terribly useful information, considering Hodgkinson is dead and so not a candidate for a no-fly/sell list.

    But you will read column inch after column inch of non-informative comments by and between special interest groups, “experts,” and even experienced political reporters, on a one-off event.

A per capita murder rate of 5 per 100,000 works out to 50 murderers per million people. Approximately 136 million people voted in the 2016 election, so 50 x 136 means 6800 people who will commit murder this year voted in the 2016 election. (I’m assuming 1 murderer per murder, which isn’t true, but it does simplify the calculation.)
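
The same arithmetic as a quick check:

```python
# Back-of-envelope: murderers among 2016 voters.
murder_rate_per_100k = 5
murderers_per_million = murder_rate_per_100k * 10   # 50
voters_in_millions = 136
print(murderers_per_million * voters_in_millions)   # 6800
```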

    One of those 6800 people (I could have used shootings per capita for an even larger number) shot a member of Congress.

    Will this story, plus or minus hand wringing, accusations, counter-accusations, etc., change your routine tomorrow? Next week? Your plans for this year?

    All I see is novelty and no news.

    You?

PS: Identifying the “novelty” of this story did not require a large research/fact-checking budget. What it did require is the realization that everyone talking about the shooting of a member of Congress means only that “everyone is talking about….” Whether that is just a freakish event or genuine news requires deeper inquiry.

One nutter shoots a member of Congress: man bites dog, novelty, not news. An organization succeeds in killing a 3rd member of Congress: that looks like news. Pattern, behavior, facts, goals, etc.

    The Media and Far Right Trolls – Mutual Reinforcing Exploitation (MRE)

    June 15th, 2017

    The Columbia Journalism Review (CJR) normally has great headlines but the editors missed a serious opportunity with: Megyn Kelly fiasco is one more instance of far right outmaneuvering media by Becca Lewis and Alice Marwick.

    Lewis and Marwick capture the essential facts and then lose their key insights in order to portray “the media” (whoever that is) as a victim of far right trolls.

    Indeed, research suggests that even debunking falsehoods can reinforce and amplify them. In addition, if a media outlet declines to cover a story that has widely circulated in the far-right and mainstream conservative press, it is accused of lying and promoting a liberal agenda. Far-right subcultures are able to exploit this, using the media to spread ideas and target potential new recruits.

    A number of factors make the mainstream media susceptible to manipulation from the far-right. The cost-cutting measures instituted by traditional newspapers since the 1990s have resulted in less fact-checking and investigative reporting. At the same time, there is a constant need for novelty to fill a 24/7 news cycle driven by cable networks and social media. Many of those outlets have benefited from the new and increased partisanship in the country, meaning there is now more incentive to address memes and half-truths, even if it’s only to shoot them down.

    Did you catch them? The key insights/phrases?

    1. “…declines to cover a story that has widely circulated…it is accused of lying and promoting a liberal agenda…”
    2. “…less fact-checking and investigative reporting…”
    3. “…constant need for novelty to fill a 24/7 news cycle driven by cable networks and social media…”

    Declining to Cover a Story

“Far-right subcultures” don’t exploit “the media” with just any stories; they are “…widely circulated…” stories. That is, “the media” is being exploited over stories it carries out of fear of losing click-through advertising revenue. If a story is “widely circulated,” it attracts reader interest, page-views, click-throughs and hence, is news.

    Less Fact-Checking and Investigative Reporting

Lewis and Marwick report the decline in fact-checking and investigative reporting as fact but don’t connect it to “the media” carrying stories promoted by “far-right subcultures.” Even if fact-checking and investigative reporting were available in abundance for every story, given enough public interest (read “…widely circulated…”), is any editor going to decline a story of widespread interest? (BTW, who chose to reduce fact-checking and investigative reporting? It wasn’t “far-right subcultures” choosing for “the media.”)

    Constant Need for Novelty

The “…constant need for novelty…” and its relationship to producing income for “the media” is best captured by the following dialogue from Santa Claus (1985):


    How can I tell all the people
    about my something special?
    Advertise. Advertise?
    How do I do that?
    In my line,
    television works best.
    Oh, I know! Those little picture
    box thingies? Can we get on those?
    With enough money, a horse in a
    hoop skirt can get on one of those.

    In the context of Lewis and Marwick, far-right subculture news is the “horse in a hoop skirt” of the dialogue. It’s a “horse in a hoop skirt” that is generating page-views and click-through rates.

    The Missed Headline

I’m partial to my headline, but since the CJR aims at a more literary audience, I would suggest:

    The Media and Far Right Trolls – Imitating Alessandro and Napoleone

    Alessandro and Napoleone, currently residents of Hell, are described in Canto 32 of the Inferno (Dante, Ciardi translation) as follows:

    When I had stared about me, I looked down
    and at my feet I saw two clamped together
    so tightly that the hair of their heads had grown

    together, “Who are you,” I said, “who lie
    so tightly breast to breast?” They strained their necks
    and when they had raised their heads as if to reply,

    the tears their eyes had managed to contain
    up to that time gushed out, and the cold froze them
    between the lids, sealing them shut again

    tighter than any clamp grips wood to wood,
    and mad with pain, they fell to butting heads
    like billy-goats in a sudden savage mood.

    “The media” now reports its “butting heads” with “far-right subcultures,” generating more noise, in addition to reports of non-fact-checked but click-stream revenue producing right-wing fantasies.

    Tails 3.0 is out (Don’t be a Bank or the NHS, Upgrade Today)

    June 13th, 2017

    Tails 3.0 is out

    From the webpage:

    We are especially proud to present you Tails 3.0, the first version of Tails based on Debian 9 (Stretch). It brings a completely new startup and shutdown experience, a lot of polishing to the desktop, security improvements in depth, and major upgrades to a lot of the included software.

    Debian 9 (Stretch) will be released on June 17. It is the first time that we are releasing a new version of Tails almost at the same time as the version of Debian it is based upon. This was an important objective for us as it is beneficial to both our users and users of Debian in general and strengthens our relationship with upstream:

    • Our users can benefit from the cool changes in Debian earlier.
    • We can detect and fix issues in the new version of Debian while it is still in development so that our work also benefits Debian earlier.

    This release also fixes many security issues and users should upgrade as soon as possible.

    Upgrade today, not tomorrow, not next week. Today!

Don’t be like the banks and the NHS and run out-dated software.

    Promote software upgrades by

    • barring civil liability for
    • decriminalizing
    • prohibiting insurance coverage for damages due to

    hacking of out-dated software.

    Management will develop an interest in software upgrade policies.

    Power Outage Data – 15 Years Worth

    June 13th, 2017

    Data: Explore 15 Years Of Power Outages by Jordan Wirfs-Brock.

    From the post:

This database details 15 years of power outages across the United States, compiled and standardized from annual data available from the Department of Energy.

    For an explanation of what it means, how it came about, and how we got here, listen to this conversation between Inside Energy Reporter Dan Boyce and Data Journalist Jordan Wirfs-Brock:

    You can also view the data as a Google Spreadsheet (where you can download it as a CSV). This version of the database also includes information about the amount of time it took power to be restored, the demand loss in megawatts, the NERC region, (NERC refers to the North American Electricity Reliability Corporation, formed to ensure the reliability of the grid) and a list of standardized tags.

The data set isn’t useful for tactical information; the submissions are too general to replicate the events leading up to an outage.

    On the other hand, identifiable outage events, dates, locations, etc., do make recovery of tactical data from grid literature a manageable search problem.
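
For a first pass at the CSV, something like the sketch below works; the file name and column name are hypothetical, so inspect the actual header row first:

```python
# First look at the outage database; "outages.csv" and the column
# name below are hypothetical placeholders.
import pandas as pd

outages = pd.read_csv("outages.csv")
print(outages.columns.tolist())          # see what's actually there

# Example: count reported events per NERC region (hypothetical column name).
print(outages["nerc_region"].value_counts())
```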

    Enjoy!

    Electric Grid Threats – Squirrels 952 : CrashOverride 1 (maybe)

    June 13th, 2017

If you are monitoring cyberthreats to the electric grid, compare the teaser document, Crash Override: Analysis of the Threat to Electric Grid Operations, from Dragos, Inc. to the stats at CyberSquirrel1.com:

I say a “teaser” document because the modules of greatest interest include “This module was unavailable to Dragos at the time of publication” statements (4 out of 7) and:


    If you are a Dragos, Inc. customer, you will have already received the more concise and technically in-depth intelligence report. It will be accompanied by follow-on reports, and the Dragos team will keep you up-to-date as things evolve.

    If you have a copy of Dragos customer data on CrashOverride, be a dear and publish a diff against this public document.

    Inquiring minds want to know. 😉

If you are planning to mount/defeat operations against an electric grid, a close study of CyberSquirrel1.com cases will be instructive.

    Creating and deploying grid damaging malware remains a challenging task.

    Training an operative to mimic a squirrel, not so much.

    FreeDiscovery

    June 12th, 2017

    FreeDiscovery: Open Source e-Discovery and Information Retrieval Engine

    From the webpage:

    FreeDiscovery is built on top of existing machine learning libraries (scikit-learn) and provides a REST API for information retrieval applications. It aims to benefit existing e-Discovery and information retrieval platforms with a focus on text categorization, semantic search, document clustering, duplicates detection and e-mail threading.

    In addition, FreeDiscovery can be used as Python package and exposes several estimators with a scikit-learn compatible API.

    Python 3.5+ required.

The homepage has command line examples, with a pointer to http://freediscovery.io/doc/stable/examples/ for more examples.

    The additional examples use a subset of the TREC 2009 legal collection. Cool!
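
For a feel of one task FreeDiscovery targets, here is a plain scikit-learn sketch of near-duplicate detection. To be clear, this is not the FreeDiscovery API, just the underlying idea it builds on:

```python
# Not the FreeDiscovery API itself -- a plain scikit-learn sketch of one
# task it targets (near-duplicate detection) to show the underlying idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The merger closes in Q3, pending regulatory approval.",
    "Pending regulatory approval, the merger closes in Q3.",
    "Quarterly earnings beat analyst expectations.",
]

tfidf = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(tfidf)

# Flag pairs above a similarity threshold as near-duplicates.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sims[i, j] > 0.8:
            print(f"docs {i} and {j} look like near-duplicates ({sims[i, j]:.2f})")
```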

    I saw this in a tweet by Lynn Cherny today.

    Enjoy!

    The Hack2Win 2017 5K – IP Address 1 July 2017

    June 12th, 2017

No, not an annoying road race; that’s $5K in USD!

    Hack2Win 2017 – The Online Version

    From the post:

    Want to get paid for a vulnerability similar to this one?

    Contact us at: ssd@beyondsecurity.com

    We proud to announce the first online hacking competition!

    The rules are very simple – you need to hack the D-link router (AC1200 / DIR-850L) and you can win up to 5,000$ USD.

    To try and help you win – we bought a D-link DIR-850L device and plugged it to the internet (we will disclose the IP address on 1st of July 2017) for you to try to hack it, while the WAN access is the only point of entry for this device, we will be accepting LAN vulnerabilities as well.

    If you successfully hack it – submit your findings to us ssd[]beyondsecurity.com, you will get paid and we will report the information to the vendor.

    The competition will end on the 1st of September 2017 or if a total of 10,000$ USD was handed out to eligible research.
    … (emphasis in original)

A great opportunity to learn about the D-link router (AC1200 / DIR-850L), because a hack using known methods doesn’t count:


    Usage of any known method of hacking – known methods including anything that we can use Google/Bing/etc to locate – this includes: documented default password (that cannot be changed), known vulnerabilities/security holes (found via Google, exploit-db, etc)

    Makes me think having all the known vulnerabilities of the D-link router (AC1200 / DIR-850L) could be a competitive advantage.

    Topic maps anyone?

    PS: For your convenience, I have packaged up the D-Link files as of Monday, 12 June 2017 for the AC1200, hardware version A1, AC1200-A1.zip.

    If You Can’t See The Data, The Statistics Are False

    June 10th, 2017

The headline, If You Can’t See The Data, The Statistics Are False, is my one-line summary of 73.6% of all Statistics are Made Up – How to Interpret Analyst Reports by Mark Suster.

You should read Suster’s post in full, if for no other reason than his accounts of how statistics are created, that’s right, created, for reports:


But all of the data projections were so different so I decided to call some of the research companies and ask how they derived their data. I got the analyst who wrote one of the reports on the phone and asked how he got his projections. He must have been about 24. He said, literally, I sh*t you not, “well, my report was due and I didn’t have much time. My boss told me to look at the growth rate average over the past 3 years and increase it by 2% because mobile penetration is increasing.” There you go. As scientific as that.

    I called another agency. They were more scientific. They had interviewed telecom operators, handset manufacturers and corporate buyers. They had come up with a CAGR (compounded annual growth rate) that was 3% higher that the other report, which in a few years makes a huge difference. I grilled the analyst a bit. I said, “So you interviewed the people to get a plausible story line and then just did a simple estimation of the numbers going forward?”

    “Yes. Pretty much”

    Write down the name of your favorite business magazine.

    How many stories have you enjoyed over the past six months with “scientific” statistics like those?

    Suster has five common tips for being a more informed consumer of data. All of which require effort on your part.

    I have only one, which requires only reading on your part:

Can you see the data for the statistic? By that I mean: is the original data available to the reader, along with its collection method, who collected it, when it was collected, and so on?

    If not, the statistic is either false or inflated.

    The test I suggest is applicable at the point where you encounter the statistic. It puts the burden on the author who wants their statistic to be credited, to empower the user to evaluate their statistic.

    Imagine the data analyst story where the growth rate statistic had this footnote:

    1. Averaged growth rate over past three (3) years and added 2% at direction of management.

    It reports the same statistic but also warns the reader the result is a management fantasy. Might be right, might be wrong.

    Patronize publications with statistics + underlying data. Authors and publishers will get the idea soon enough.

    Real Talk on Reality (Knowledge Gap on Leaking)

    June 9th, 2017

Real Talk on Reality: Leaking is high risk by the grugq.

    From the post:

    On June 5th The Intercept released an article based on an anonymously leaked Top Secret NSA document. The article was about one aspect of the Russian cyber campaign against the 2016 US election — the targeting of election device manufacturers. The relevance of this aspect of the Russian operation is not exactly clear, but we’ll address that in a separate post because… just hours after The Intercept’s article went live the US Department of Justice released an affidavit (and search warrant) covering the arrest of Reality Winner — the alleged leaker. Let’s look at that!

You could teach a short course on leaking from this one post, but there is one “meta” issue that merits your attention.

The failures of Reality Winner and the Intercept signal that users need educating in the art of information leaking.

With widespread tracking of web browsers, training on information leaking needs to be pushed to users. It would stand out if one member of the military requested and was sent an email lesson on leaking. An email that went to everyone in a particular command, not so much.

    Public Service Announcements (PSAs) in web zines, as ads, etc. with only the barest of tips, is another mechanism to consider.

If you are very creative, perhaps “Mr. Bill” claymation episodes with one principle of leaking each? They need to be funny enough that viewing/sharing isn’t suspicious.

    Other suggestions?

    Raw FBI Uniform Crime Report (UCR) Files for 2015 (NICAR Database Library)

    June 9th, 2017

    IRE & NICAR to freely publish unprocessed data by Charles Minshew.

    From the post:

    Inspired by our members, IRE is pleased to announce the first release of raw, unprocessed data from the NICAR Database Library.

    The contents of the FBI’s Uniform Crime Report (UCR) master file for 2015 are now available for free download on our website. The package contains the original fixed-width files, data dictionaries for the tables as well as the FBI’s UCR user guide. We are planning subsequent releases of other raw data that is not readily available online.

    The yearly data from the FBI details arrest and offense numbers for police agencies across the United States. If you download this unprocessed data, expect to do some work to get it in a useable format. The data is fixed-width, across multiple tables, contains many records on a single row that need to be unpacked and in some cases decoded, before being cleaned and imported for use in programs like Excel or your favorite database manager. Not up to the task? We do all of this work in the version of the data that we will soon have for sale in the Database Library.

    I have peeked at the data and documentation files and “raw” is the correct term.

    Think of it as great exercise for when an already cleaned and formatted data set isn’t available.
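
As a starting point for that exercise, pandas reads fixed-width files directly. A minimal sketch, where the file name, byte ranges, and field names are hypothetical placeholders; take the real positions from the data dictionaries in the package:

```python
# Reading one fixed-width UCR table; colspecs and names are hypothetical
# placeholders -- the real field positions come from the data dictionaries.
import pandas as pd

colspecs = [(0, 2), (2, 9), (9, 13)]            # hypothetical byte ranges
names = ["state_code", "agency_code", "year"]   # hypothetical field names

ucr = pd.read_fwf("ucr_2015_master.dat", colspecs=colspecs, names=names)
print(ucr.head())
```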

    More to follow on processing this data set.

    (Legal) Office of Personnel Management Data!

    June 9th, 2017

    We’re Sharing A Vast Trove Of Federal Payroll Records by Jeremy Singer-Vine.

    From the post:

    Today, BuzzFeed News is sharing an enormous dataset — one that sheds light on four decades of the United States’ federal payroll.

    The dataset contains hundreds of millions of rows and stretches all the way back to 1973. It provides salary, title, and demographic details about millions of U.S. government employees, as well as their migrations into, out of, and through the federal bureaucracy. In many cases, the data also contains employees’ names.

    We obtained the information — nearly 30 gigabytes of it — from the U.S. Office of Personnel Management, via the Freedom of Information Act (FOIA). Now, we’re sharing it with the public. You can download it for free on the Internet Archive.

    This is the first time, it seems, that such extensive federal payroll data is freely available online, in bulk. (The Asbury Park Press and FedsDataCenter.com both publish searchable databases. They’re great for browsing, but don’t let you download the data.)

    We hope that policy wonks, sociologists, statisticians, fellow journalists — or anyone else, for that matter — find the data useful.

    We obtained the information through two Freedom of Information Act requests to OPM. The first chunk of data, provided in response to a request filed in September 2014, covers late 1973 through mid-2014. The second, provided in response to a request filed in December 2015, covers late 2014 through late 2016. We have submitted a third request, pending with the agency, to update the data further.

    Between our first and second requests, OPM announced it had suffered a massive computer hack. As a result, the agency told us, it would no longer release certain information, including the employee “pseudo identifier” that had previously disambiguated employees with common names.

    What a great data release! Kudos and thanks to BuzzFeed News!

    If you need the “pseudo identifiers” for the second or following releases and/or data for the employees withheld (generally the more interesting ones), consult data from the massive computer hack.

    Or obtain the excluded data directly from the Office of Personnel Management without permission.
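
At nearly 30 gigabytes, the payroll files won’t fit comfortably in memory on most machines; a chunked pass is one way in. A minimal sketch, with hypothetical file and column names:

```python
# Aggregate over the payroll data one chunk at a time; the file name
# and "agency" column are hypothetical placeholders.
import pandas as pd

agency_counts = None
for chunk in pd.read_csv("opm_payroll_1973_2014.csv", chunksize=1000000):
    counts = chunk["agency"].value_counts()
    agency_counts = counts if agency_counts is None else agency_counts.add(counts, fill_value=0)

print(agency_counts.sort_values(ascending=False).head(10))
```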

    Enjoy!

    Open Data = Loss of Bureaucratic Power

    June 9th, 2017

James Comey’s leaked memos about meetings with President Trump illustrate one reason for the lack of progress on open data reported in FOIA This! The Depressing State of Open Data by Toby McIntosh.

    From Former DOJ Official on Comey Leak: ‘Standard Operating Procedure’ Among Bureaucrats:


    On “Fox & Friends” today, J. Christian Adams said the leak of the memos by Comey was in line with “standard operating procedure” among Beltway bureaucrats.

    “[They] were using the media, using confidential information to advance attacks on the President of the United States. That’s what they do,” said Adams, adding he saw it go on at DOJ.

    Access to information is one locus of bureaucratic power, which makes the story in FOIA This! The Depressing State of Open Data a non-surprise:

    In our latest look at FOIA around the world, we examine the state of open data sets. According to the new report by the World Wide Web Foundation, the news is not good.

    “The number of global truly open datasets remains at a standstill,” according to the group’s researchers, who say that only seven percent of government data is fully open.

    The findings come in the fourth edition of the Open Data Barometer, an annual assessment which was enlarged this year to include 1,725 datasets from 15 different sectors across 115 countries. The report summarizes:

    Only seven governments include a statement on open data by default in their current policies. Furthermore, we found that only 7 percent of the data is fully open, only one of every two datasets is machine readable and only one in four datasets has an open license. While more data has become available in a machine-readable format and under an open license since the first edition of the Barometer, the number of global truly open datasets remains at a standstill.

    Based on the detailed country-by-country rankings, the report says some countries continue to be leaders on open data, a few have stepped up their game, but some have slipped backwards.

    With open data efforts at a standstill and/or sliding backwards, waiting for bureaucrats to voluntarily relinquish power is a non-starter.

    There are other options.

    Need I mention the Office of Personnel Management hack? The highly touted but apparently fundamentally vulnerable NSA?

    If you need a list of cyber-vulnerable U.S. government agencies, see: A-Z Index of U.S. Government Departments and Agencies.

    You can:

    • wait for bureaucrats to abase themselves,
    • post how government “…ought to be transparent and accountable…”
    • echo opinions of others on calling for open data,

    or, help yourself to government collected, generated, or produced data.

    Which one do you think is more effective?