Archive for January, 2018

The vector algebra war: a historical perspective [Semantic Confusion in Engineering and Physics]

Tuesday, January 23rd, 2018

The vector algebra war: a historical perspective by James M. Chappell, Azhar Iqbal, John G. Hartnett, Derek Abbott.

Abstract:

There are a wide variety of different vector formalisms currently utilized in engineering and physics. For example, Gibbs’ three-vectors, Minkowski four-vectors, complex spinors in quantum mechanics, quaternions used to describe rigid body rotations and vectors defined in Clifford geometric algebra. With such a range of vector formalisms in use, it thus appears that there is as yet no general agreement on a vector formalism suitable for science as a whole. This is surprising, in that, one of the primary goals of nineteenth century science was to suitably describe vectors in three-dimensional space. This situation has also had the unfortunate consequence of fragmenting knowledge across many disciplines, and requiring a significant amount of time and effort in learning the various formalisms. We thus historically review the development of our various vector systems and conclude that Clifford’s multivectors best fulfills the goal of describing vectorial quantities in three dimensions and providing a unified vector system for science.

An image from the paper captures the “descent of the various vector systems:”

The authors contend for use of Clifford’s multivectors over the other vector formalisms described.

Assuming Clifford’s multivectors displace all other systems in use, the authors fail to say how readers will access the present and past legacy of materials in other formalisms.

If the goal is to eliminate “fragmenting knowledge across many disciplines, and requiring a significant amount of time and effort in learning the various formalisms,” that goal fails in the absence of a mechanism to access existing materials using Clifford’s multivector formalism.

Topic maps anyone?
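
To make the mapping problem concrete, here is a minimal Python sketch (my illustration, not taken from the paper) that performs the same rotation in two of the formalisms named in the abstract: Gibbs’ three-vectors, using Rodrigues’ rotation formula, and quaternions. The function names are hypothetical and the axis is assumed to be a unit vector.

```python
import math

def cross(a, b):
    """Gibbs cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Gibbs dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def rotate_gibbs(v, axis, angle):
    """Rotate v about a unit axis using Rodrigues' formula."""
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(axis, v)
    kdv = dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1 - c)
                 for i in range(3))

def qmul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def rotate_quaternion(v, axis, angle):
    """Rotate v about a unit axis by computing q v q* with a unit quaternion."""
    half = angle / 2
    s = math.sin(half)
    q = (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = qmul(qmul(q, (0.0, *v)), q_conj)
    return (x, y, z)

if __name__ == "__main__":
    v, z_axis = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(rotate_gibbs(v, z_axis, math.pi / 2))
    print(rotate_quaternion(v, z_axis, math.pi / 2))
```

Both calls rotate the x unit vector a quarter turn about z and agree up to floating-point rounding. Two notations, one subject: that sameness is exactly what a mapping between formalisms has to record.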

Stop, Stop, Stop All the Patching, Give Intel Time to Breathe

Tuesday, January 23rd, 2018

Root Cause of Reboot Issue Identified; Updated Guidance for Customers and Partners by Navin Shenoy.

From the post:

As we start the week, I want to provide an update on the reboot issues we reported Jan. 11. We have now identified the root cause for Broadwell and Haswell platforms, and made good progress in developing a solution to address it. Over the weekend, we began rolling out an early version of the updated solution to industry partners for testing, and we will make a final release available once that testing has been completed.

Based on this, we are updating our guidance for customers and partners:

  • We recommend that OEMs, cloud service providers, system manufacturers, software vendors and end users stop deployment of current versions, as they may introduce higher than expected reboots and other unpredictable system behavior. For the full list of platforms, see the Intel.com Security Center site.
  • We ask that our industry partners focus efforts on testing early versions of the updated solution so we can accelerate its release. We expect to share more details on timing later this week.
  • We continue to urge all customers to vigilantly maintain security best practice and for consumers to keep systems up-to-date.

I apologize for any disruption this change in guidance may cause. The security of our products is critical for Intel, our customers and partners, and for me, personally. I assure you we are working around the clock to ensure we are addressing these issues.

I will keep you updated as we learn more and thank you for your patience.

Essence of Shenoy’s advice:

…OEMs, cloud service providers, system manufacturers, software vendors and end users stop deployment of current versions, as they may introduce higher than expected reboots and other unpredictable system behavior.

Or better:

Patching an Intel machine makes it worse.

That’s hardly news.

Unverifiable firmware/code + unverifiable patch = unverifiable firmware/code + patch. What part of that seems unclear?

WebGoat (Advantage over OPM)

Monday, January 22nd, 2018

Deliberately Insecure Web Application: OWASP WebGoat

From the webpage:

WebGoat is a deliberately insecure web application maintained by OWASP designed to teach web application security lessons. You can install and practice with WebGoat in either J2EE or WebGoat for .Net in ASP.NET. In each lesson, users must demonstrate their understanding of a security issue by exploiting a real vulnerability in the WebGoat applications.

WebGoat for J2EE is written in Java and therefore installs on any platform with a Java virtual machine. Once deployed, the user can go through the lessons and track their progress with the scorecard.

WebGoat’s scorecards are a feature not found when hacking Office of Personnel Management (OPM). Hacks of the OPM are reported by its inspector general and more generally in the computer security press.

EFF Investigates Dark Caracal (But Why?)

Monday, January 22nd, 2018

Someone is touting a mobile, PC spyware platform called Dark Caracal to governments by Iain Thomson.

From the post:

An investigation by the Electronic Frontier Foundation and security biz Lookout has uncovered Dark Caracal, a surveillance-toolkit-for-hire that has been used to suck huge amounts of data from Android mobiles and Windows desktop PCs around the world.

Dark Caracal [PDF] appears to be controlled from the Lebanon General Directorate of General Security in Beirut – an intelligence agency – and has slurped hundreds of gigabytes of information from devices. It shares its backend infrastructure with another state-sponsored surveillance campaign, Operation Manul, which the EFF claims was operated by the Kazakhstan government last year.

Crucially, it appears someone is renting out the Dark Caracal spyware platform to nation-state snoops.

The EFF could be spending its time and resources duplicating Dark Caracal for the average citizen.

Instead the EFF continues its quixotic pursuit of governmental wrong-doers. I say “quixotic” because those pilloried by the EFF, such as the NSA, never change their behavior. Unlawful conduct, including surveillance, continues.

But don’t take my word for it, the NSA admits that it deletes data it promised under court order to preserve: NSA deleted surveillance data it pledged to preserve. No consequences. Just like there were no consequences when Snowden revealed widespread and illegal surveillance by the NSA.

So you have to wonder, if investigating and suing governmental intelligence organizations produces no tangible results, why is the EFF pursuing them?

If the average citizen had the equivalent of Dark Caracal at their disposal, say as desktop software, the ability of governments like Lebanon, Kazakhstan, and others, to hide their crimes, would be greatly reduced.

Exposure is no guarantee of accountability and/or punishment, but the whack-a-mole strategy of the EFF hasn’t produced transparency or consequences.

Don Knuth Needs Your Help

Monday, January 22nd, 2018

Donald Knuth Turns 80, Seeks Problem-Solvers For TAOCP

From the post:

An anonymous reader writes:

When 24-year-old Donald Knuth began writing The Art of Computer Programming, he had no idea that he’d still be working on it 56 years later. This month he also celebrated his 80th birthday in Sweden with the world premier of Knuth’s Fantasia Apocalyptica, a multimedia work for pipe organ and video based on the bible’s Book of Revelations, which Knuth describes as “50 years in the making.”

But Knuth also points to the recent publication of “one of the most important sections of The Art of Computer Programming” in preliminary paperback form: Volume 4, Fascicle 6: Satisfiability. (“Given a Boolean function, can its variables be set to at least one pattern of 0s and 1 that will make the function true?”)

Here’s an excerpt from its back cover:

Revolutionary methods for solving such problems emerged at the beginning of the twenty-first century, and they’ve led to game-changing applications in industry. These so-called “SAT solvers” can now routinely find solutions to practical problems that involve millions of variables and were thought until very recently to be hopelessly difficult.

“in several noteworthy cases, nobody has yet pointed out any errors…” Knuth writes on his site, adding “I fear that the most probable hypothesis is that nobody has been sufficiently motivated to check these things out carefully as yet.” He’s uncomfortable printing a hardcover edition that hasn’t been fully vetted, and “I would like to enter here a plea for some readers to tell me explicitly, ‘Dear Don, I have read exercise N and its answer very carefully, and I believe that it is 100% correct,'” where N is one of the exercises listed on his web site.

Elsewhere he writes that two “pre-fascicles” — 5a and 5B — are also available for alpha-testing. “I’ve put them online primarily so that experts in the field can check the contents before I inflict them on a wider audience. But if you want to help debug them, please go right ahead.”

Do you have some other leisure project for 2018 that is more important?

😉
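
If you want a feel for the problem before checking exercises, the satisfiability question quoted above fits in a few lines of Python. This is a brute-force sketch of my own, not one of the methods from the fascicle: it simply tries every assignment, so it is hopeless beyond a few dozen variables. Clauses use the common signed-integer convention (k means variable k is true, -k means it is false); the function name is hypothetical.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment as a tuple of booleans, or None.

    clauses is a list of clauses in conjunctive normal form; each clause
    is a list of nonzero integers, where k stands for variable k and -k
    for its negation. Variables are numbered from 1.
    """
    for bits in product([False, True], repeat=num_vars):
        # A clause is satisfied when at least one of its literals holds.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

if __name__ == "__main__":
    # (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
    print(brute_force_sat([[1, -2], [2, 3], [-1, -3]], 3))
    # x1 and not x1 can never both hold
    print(brute_force_sat([[1], [-1]], 1))
```

The loop visits 2^n assignments for n variables, which is why the solvers Knuth describes, handling millions of variables, count as revolutionary.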

A “no one saw” It Coming Memory Hack (Schneider Electric)

Sunday, January 21st, 2018

Schneider Electric: TRITON/TRISIS Attack Used 0-Day Flaw in its Safety Controller System, and a RAT by Kelly Jackson Higgins.

Industrial control systems giant Schneider Electric discovered a zero-day privilege-escalation vulnerability in its Triconex Tricon safety-controller firmware which helped allow sophisticated hackers to wrest control of the emergency shutdown system in a targeted attack on one of its customers.

Researchers at Schneider also found a remote access Trojan (RAT) in the so-called TRITON/TRISIS malware that they say represents the first-ever RAT to infect safety-instrumented systems (SIS) equipment. Industrial sites such as oil and gas and water utilities typically run multiple SISes to independently monitor critical systems to ensure they are operating within acceptable safety thresholds, and when they are not, the SIS automatically shuts them down.

Schneider here today provided the first details of its investigation of the recently revealed TRITON/TRISIS attack that targeted a specific SIS used by one of its industrial customers. Two of the customer’s SIS controllers entered a failed safe mode that shut down the industrial process and ultimately led to the discovery of the malware.

Teams of researchers from Dragos and FireEye’s Mandiant last month each published their own analysis of the malware used in the attack, noting that the smoking gun – a payload that would execute a cyber-physical attack – had not been found.

Perhaps the most amusing part of the post is Schneider’s attribution of near super-human capabilities to the hackers:


Schneider’s controller is based on proprietary hardware that runs on a PowerPC processor. “We run our own proprietary operating system on top of that, and that OS is not known to the public. So the research required to pull this [attack] off was substantial,” including reverse-engineering it, Forney says. “This bears resemblance to a nation-state, someone who was highly financed.”

The attackers also had knowledge of Schneider’s proprietary protocol for Tricon, which also is undocumented publicly, and used it to create their own library for sending commands to interact with Tricon, he says.

Alternatives to a nation-state:

  • 15 year old working with junked Schneider hardware and the Schneider help desk
  • Disgruntled Schneider Electric employee or their children
  • Malware planted to force a quick and insecure patch being pushed out

I discount all the security chest beating by vendors. Their goal: continued use of their products.

Are your Schneider controllers air-gapped and audited?

Bludgeoning Bootloader Bugs:… (Rebecca “.bx” Shapiro – job hunting)

Sunday, January 21st, 2018

Bludgeoning Bootloader Bugs: No write left behind by Rebecca “.bx” Shapiro.

Slides from ShmooCon 2018.

If you are new to bootloading, consider Shapiro’s two blog posts on the topic:

A History of Linux Kernel Module Signing

A Tour of Bootloading

both from 2015, and her resources page.

Aside from the slides, her most current work is found at: https://github.com/bx/bootloader_instrumentation_suite.

ShmooCon 2018 just finished earlier today but check the ShmooCon archives for a video of Shapiro’s presentation.

I don’t normally post shout-outs for people seeking employment but Shapiro does impressive work and she is sharing it with the broader community. Unlike some governments and corporations we could all name. Pass her name and details along.

Are You Smarter Than A 15 Year Old?

Sunday, January 21st, 2018

15-Year-Old Schoolboy Posed as CIA Chief to Hack Highly Sensitive Information by Mohit Kumar.

From the post:

A notorious pro-Palestinian hacking group behind a series of embarrassing hacks against United States intelligence officials and leaked the personal details of 20,000 FBI agents, 9,000 Department of Homeland Security officers, and some number of DoJ staffers in 2015.

Believe or not, the leader of this hacking group was just 15-years-old when he used “social engineering” to impersonate CIA director and unauthorisedly access highly sensitive information from his Leicestershire home, revealed during a court hearing on Tuesday.

Kane Gamble, now 18-year-old, the British teenager hacker targeted then CIA director John Brennan, Director of National Intelligence James Clapper, Secretary of Homeland Security Jeh Johnson, FBI deputy director Mark Giuliano, as well as other senior FBI figures.

Between June 2015 and February 2016, Gamble posed as Brennan and tricked call centre and helpline staff into giving away broadband and cable passwords, using which the team also gained access to plans for intelligence operations in Afghanistan and Iran.

Gamble said he targeted the US government because he was “getting more and more annoyed about how corrupt and cold-blooded the US Government” was and “decided to do something about it.”

Your questions:

1. Are You Smarter Than A 15 Year Old?

2. Are You Annoyed by a Corrupt and Cold-blooded Government?

3. Have You Decided to do Something about It?

Yeses for #1 and #2 number in the hundreds of millions.

The lack of governments hemorrhaging data worldwide is silent proof that #3 is a very small number.

What’s your answer to #3? (Don’t post it in the comments.)

Collaborative Journalism Projects (Collaboration Opportunities for the Public?)

Sunday, January 21st, 2018

Database: Search, sort and learn about collaborative journalism projects from around the world

From the post:

Over the past several months, the Center for Cooperative Media has been collecting, organizing and standardizing information about dozens and dozens of collaborative journalism projects around the world. Our goal was to build a database that could serve as a hub of information about collaborative journalism, something that would be useful to journalists, scholars, media executives, funders and others seeking information on the how such projects work, who’s doing them and what they’re covering.

We worked with Melody Kramer to build the first iteration of the database, which you can find below. It is a work in progress, and you’ll see that it’s still incomplete as we continue to add to it. So far for this soft launch, we’ve input information on 94 news collaborations between more than 800 organizations and 151 people.

But this is just the beginning. We need your help.

Is your project listed? If not, tell us about it. Is the information about your project incorrect? Let us know; email Melody at melodykramer@gmail.com. Are there fields missing you’d like to see us add, or other ways to sort that you think would be useful? Email the Center at info@centerforcooperativemedia.org. We’re using Airtable right now, but are still considering what the best way will be to display the treasure trove of data we’re collecting.

Some notes on navigating the database: First, it’s easier to see the whole picture on desktop than on mobile, although both work well. To see the full record for any particular project, click on the little blue arrow that appears to the left of the project name when you hover over it. You can sort by column as well.

Collaborative journalism is a great way to avoid duplication of effort and to find strength in numbers. This resource is a big step towards encouraging journalist-to-journalist collaboration.

Opportunities for members of the public to collaborate with journalists?

Suggestions?

What Can Reverse Engineering Do For You?

Thursday, January 18th, 2018

From the description:

Reverse engineering is a core skill in the information security space, but it doesn’t necessarily get the wide spread exposure that other skills do even though it can help you with your security challenges. We will talk about getting you quickly up and running with a reverse engineering starter pack and explore some interesting x86 assembly code patterns you may encounter in the wild. These patterns are essentially common malware evasion techniques that include packing, analysis evasion, shellcode execution, and crypto usages. It is not always easy recognizing when a technique is used. This talk will begin by defining the each technique as a pattern and then the approaches for reading or bypassing the evasion.

Technical keynote at Shellcon 2017 by Amanda Rousseau (@malwareunicorn).

Even if you’re not interested in reverse engineering, watch the video to see a true master describing their craft.

The “patterns” she speaks of are what I would call “subject identity” in a topic maps context.

TLDR pages (man pages by example)

Thursday, January 18th, 2018

TLDR pages

From the webpage:

The TLDR pages are a community effort to simplify the beloved man pages with practical examples.

The TLDR Pages Book (pdf), has 274 pages!

If you have ever hunted through a man page for an example, you will appreciate TLDR pages!

I first saw this in a tweet by Christophe Lalanne.

Launch of DECLASSIFIED

Thursday, January 18th, 2018

Launch of DECLASSIFIED by Mark Curtis.

From the post:

I am about to publish on this site hundreds of UK declassified documents and articles on British foreign policy towards various countries. This will be the first time such a collection has been brought together online.

The declassified documents, mainly from the UK’s National Archives, reveal British policy-makers actual concerns and priorities from the 1940s until the present day, from the ‘horse’s mouth’, as it were: these files are often revelatory and provide an antidote to the often misleading and false mainstream media (and academic) coverage of Britain’s past and present foreign policies.

The documents include my collections of files, accumulated over many years and used as a basis for several books, on episodes such as the UK’s covert war in Yemen in the 1960s, the UK’s support for the Pinochet coup in Chile, the UK’s ‘constitutional coup’ in Guyana, the covert wars in Indonesia in the 1950s, the UK’s backing for wars against the Iraqi Kurds in the 1960s, the coup in Oman in 1970, support for the Idi Amin takeover in Uganda and many others policies since 1945.

But the collection also brings together many other declassified documents by listing dozens of media articles that have been written on the release of declassified files over the years. It also points to some US document releases from the US National Security Archive.

A new resource for those of you tracking the antics of the small and the silly through the 20th and into the 21st century.

I say the “small and the silly” because there’s no doubt that similar machinations have been part and parcel of government toadies’ lives for as long as there have been governments. Despite their exaggerated sense of their own importance and of the history-making importance of their efforts, almost none of their names survive in the ancient historical record.

With the progress of time, the same fate awaits the most recent and current crop of government familiars. While we wait for them to pass into obscurity, you can amuse yourself by outing them and tracking their activities.

This new archive may assist you in your efforts.

Be sure to keep topic maps in mind for mapping between disjoint vocabularies and collections of documents as well as accounts of events.

For Some Definition of “Read” and “Answer” – MS Clickbait

Thursday, January 18th, 2018

Microsoft creates AI that can read a document and answer questions about it as well as a person by Allison Linn.

From the post:

It’s a major milestone in the push to have search engines such as Bing and intelligent assistants such as Cortana interact with people and provide information in more natural ways, much like people communicate with each other.

A team at Microsoft Research Asia reached the human parity milestone using the Stanford Question Answering Dataset, known among researchers as SQuAD. It’s a machine reading comprehension dataset that is made up of questions about a set of Wikipedia articles.

According to the SQuAD leaderboard, on Jan. 3, Microsoft submitted a model that reached the score of 82.650 on the exact match portion. The human performance on the same set of questions and answers is 82.304. On Jan. 5, researchers with the Chinese e-commerce company Alibaba submitted a score of 82.440, also about the same as a human.

With machine reading comprehension, researchers say computers also would be able to quickly parse through information found in books and documents and provide people with the information they need most in an easily understandable way.

That would let drivers more easily find the answer they need in a dense car manual, saving time and effort in tense or difficult situations.

These tools also could let doctors, lawyers and other experts more quickly get through the drudgery of things like reading through large documents for specific medical findings or rarified legal precedent. The technology would augment their work and leave them with more time to apply the knowledge to focus on treating patients or formulating legal opinions.

Wait, wait! If you read the details about SQuAD, you realize how far Microsoft (or anyone else) is from “…reading through large documents for specific medical findings or rarified legal precedent….”

What is the SQuAD test?

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.

Not to take anything away from Microsoft Research Asia or the creators of SQuAD, but “…the answer to every question is a segment of text, or span, from the corresponding reading passage.” is a long way from synthesizing an answer from a long legal document.

The first hurdle is asking a question that can be scored against every “…segment of text, or span…” such that a relevant snippet of text can be found.

The second hurdle is the process of scoring snippets of text in order to retrieve the most useful one. That’s a mechanical process, not one that depends on the semantics of the underlying question or text.

There are other hurdles but those two suffice to show there is no “reading and answering questions” in the same sense we would apply to any human reader.
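
To show how mechanical the process can be, here is a deliberately naive Python sketch. It is my own illustration, not Microsoft’s model and not the official SQuAD evaluation script: one function picks the sentence sharing the most words with the question, and another performs a simplified “exact match” comparison. The function names and the normalization rules are assumptions.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, drop English articles."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """True when the two answers are identical after normalization."""
    return normalize(prediction) == normalize(gold)

def best_sentence(question, passage):
    """Return the passage sentence with the largest word overlap."""
    q_words = set(normalize(question).split())
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", passage)
                 if s.strip()]
    return max(sentences,
               key=lambda s: len(q_words & set(normalize(s).split())))

if __name__ == "__main__":
    passage = "Tides are caused by the Moon. Rain falls from clouds."
    print(best_sentence("What causes tides?", passage))
    print(exact_match("The Moon", "moon"))
```

Nothing in the sketch knows what a tide or a moon is. Counting shared words is enough to retrieve a plausible snippet, and string equality is enough to score it.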

Click-bait headlines don’t serve the cause of advocating more AI research. On the contrary, a close reading of alleged progress leads to disappointment.

Tips for Entering the Penetration Testing Field

Tuesday, January 16th, 2018

Tips for Entering the Penetration Testing Field by Ed Skoudis.

From the post:

It’s an exciting time to be a professional penetration tester. As malicious computer attackers amp up the number and magnitude of their breaches, the information security industry needs an enormous amount of help in proactively finding and resolving vulnerabilities. Penetration testers who are able to identify flaws, understand them, and demonstrate their business impact through careful exploitation are an important piece of the defensive puzzle.

In the courses I teach on penetration testing, I’m frequently asked about how someone can land their first job in the field after they’ve acquired the appropriate technical skills and gained a good understanding of methodologies. Also, over the past decade, I’ve counseled a lot of my friends and acquaintances as they’ve moved into various penetration testing jobs. Although there are many different paths to pen test nirvana, let’s zoom into three of the most promising. It’s worth noting that these three paths aren’t mutually exclusive either. I know many people who started on the first path, jumped to the second mid-way, and later found themselves on path #3. Or, you can jumble them up in arbitrary order.

Career advice and a great listing of resources for any aspiring penetration “tester.”

If you do penetration work for a government, you may be a national hero. If you do commercial penetration testing, not a national hero but not on the run either. If you do non-sanctioned penetration work, life is uncertain. Same skill, same activity. Go figure.

Updated Hacking Challenge Site Links (Signatures as Subject Identifiers)

Tuesday, January 16th, 2018

Updated Hacking Challenge Site Links

From the post:

These are 70+ sites which offer free challenges for hackers to practice their skills. Some are web-based challenges, some require VPN access to private labs and some are downloadable ISOs and VMs. I’ve tested the links at the time of this posting and they work.

Most of them are at https://www.wechall.net but if I missed a few they will be there.

WeChall is a portal to hacking challenges where you can link your account to all the sites and get ranked. I’ve been a member since 2/2/14.

Internally to the site they have challenges there as well so make sure you check them out!

To find CTFs go to https://www.ctftime.org

On Twitter in the search field type CTF

Google is also your friend.

I’d rephrase “Google is also your friend.” to “Sometimes Google allows you to find ….”

When visiting hacker or CTF (capture the flag) sites, use the same levels of security as any government or other known hostile site.

What is an exploit or vulnerability signature if not a subject identifier?

Data Science Bowl 2018 – Spot Nuclei. Speed Cures.

Tuesday, January 16th, 2018

Spot Nuclei. Speed Cures.

From the webpage:

The 2018 Data Science Bowl offers our most ambitious mission yet: Create an algorithm to automate nucleus detection and unlock faster cures.

Compete on Kaggle

Three months. $100,000.

Even if you “lose,” think of the experience you will gain. No losers.

Enjoy!

PS: Just thinking out loud but if:


This dataset contains a large number of segmented nuclei images. The images were acquired under a variety of conditions and vary in the cell type, magnification, and imaging modality (brightfield vs. fluorescence). The dataset is designed to challenge an algorithm’s ability to generalize across these variations.

isn’t the ability to generalize, with lower performance, a downside?

Why not use the best algorithm for a specified set of data conditions, “merging” that algorithm so to speak, so that scientists always have the best algorithm for their specific data set?

So outside the contest, perhaps the conditions of the images are the most important subjects, and they should be matched to the algorithms that perform best under them.
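
A rough Python sketch of that idea, with everything in it hypothetical: the two thresholding functions are toy stand-ins for real segmentation algorithms, the registry keys are the two imaging modalities named in the dataset description, and I assume nuclei appear bright under fluorescence and dark under brightfield.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale pixel values, 0-255

def segment_bright_nuclei(image: Image, threshold: int = 128) -> Image:
    """Toy stand-in: mark pixels brighter than the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def segment_dark_nuclei(image: Image, threshold: int = 128) -> Image:
    """Toy stand-in: mark pixels darker than the threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

# Hypothetical registry: imaging condition -> algorithm that does best there.
BEST_FOR_CONDITION: Dict[str, Callable[[Image], Image]] = {
    "fluorescence": segment_bright_nuclei,
    "brightfield": segment_dark_nuclei,
}

def segment(image: Image, modality: str) -> Image:
    """Dispatch to the algorithm registered for the image's modality."""
    try:
        algorithm = BEST_FOR_CONDITION[modality]
    except KeyError:
        raise ValueError(f"no algorithm registered for {modality!r}")
    return algorithm(image)

if __name__ == "__main__":
    print(segment([[200, 10]], "fluorescence"))
    print(segment([[200, 10]], "brightfield"))
```

The hard part is not the dispatch but the keys: identifying the conditions of an image reliably enough to look up the right algorithm.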

Anyone interested in collaborating on a topic map entry?

The Art & Science Factory

Monday, January 15th, 2018

The Art & Science Factory

From the about page:


The Art & Science Factory was started in 2008 by Dr. Brian Castellani to organize the various artistic, scientific and educational endeavours he and different collaborators have engaged in to address the growing complexity of global life.

Dr. Castellani is a complexity scientist/artist.

He is internationally recognized for his expertise in complexity science and its history and for his development of the SACS Toolkit, a case-based, mixed-methods, computationally-grounded framework for modeling complex systems. Dr. Castellani’s main area of study is applying complexity science and the SACS Toolkit to various topics in health and healthcare, including community health and medical education.

In terms of visual complexity, Castellani is recognized around the world for his creation of the complexity map, which can be found on Wikipedia and on this website. He is also recognized for his blog on “all things complexity science and art,” the Sociology and Complexity Science Blog.
… (emphasis in original)

Dr. Castellani apparently dislikes searchable text: I had to hand-transcribe the about page quote from the image that makes up that page.

Unexpectedly, the SACS toolkit, etc. were not hyperlinks so: SACS toolkit, complexity map, and Sociology and Complexity Science Blog, respectively.

2018 Map of the Complexity Sciences

Monday, January 15th, 2018

2018 Map of the Complexity Sciences by Brian Castellani.

At full screen this map barely displays on my 22″ monitor so I’m not going to mangle it into something smaller for this post.

The reading instructions read in part:


Also, in order to present some type of organizational structure, the history of the complexity sciences is developed along the field’s five major intellectual traditions: dynamical systems theory (purple), systems science (blue), complex systems theory (yellow), cybernetics (gray) and artificial intelligence (orange). Again, the fit is not exact (and sometimes even somewhat forced); but it is sufficient to help those new to the field gain a sense of its evolving history.

The subject and person nodes are all hyperlinks to additional resources!

Enjoy!

Fun, Frustration, Curiosity, Murderous Rage – mimic

Monday, January 15th, 2018

mimic

From the webpage:


There are many more characters in the Unicode character set that look, to some extent or another, like others – homoglyphs. Mimic substitutes common ASCII characters for obscure homoglyphs.

Fun games to play with mimic:

  • Pipe some source code through and see if you can find all of the problems
  • Pipe someone else’s source code through without telling them
  • Be fired, and then killed

I can attest to the murderous rage from experience. There was a browser-based SGML parser that would barf on the presence of an extra whitespace (space I think) in the SGML declaration. One file worked, another with the “same” declaration did not.

Only by printing and comparing the files (this was on Windoze machines) was the errant space discovered.
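
These days a few lines of Python will do the comparing for you. This is a minimal sketch of my own, not part of mimic: it reports every non-ASCII character with its position and Unicode name. It will not catch an extra plain ASCII space like the one in my SGML story, only characters outside ASCII, such as homoglyphs or a no-break space. The function name is hypothetical.

```python
import unicodedata

def find_non_ascii(text):
    """Return (line, column, character, Unicode name) for each non-ASCII character.

    Lines and columns are 1-based. Characters without a Unicode name
    are reported as "UNKNOWN".
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch,
                             unicodedata.name(ch, "UNKNOWN")))
    return hits

if __name__ == "__main__":
    # The last character on the first line is U+037E, which looks like ";".
    source = "x = 1\u037e\ny = 2;"
    for hit in find_non_ascii(source):
        print(hit)
```

Run on the sample, it flags line 1, column 6 as GREEK QUESTION MARK, a character most fonts render identically to a semicolon.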

Enjoy!

Tactical Advantage: I don’t have to know everything, just more than you.

Friday, January 12th, 2018

Mapping the Ghostly Traces of Abandoned Railroads – An interactive, crowdsourced atlas plots vanished transit routes by Jessica Leigh Hester.

From the post:

In the 1830s, a rail line linked Elkton, Maryland, with New Castle, Delaware, shortening the time it took to shuttle people and goods between the Delaware River and Chesapeake Bay. Today you’d never know it had been there. A photograph snapped years after the line had been abandoned captures a stone culvert halfway to collapse into the creek it spanned. Another image, captured even later, shows a relict trail that looks more like a footpath than a railroad right-of-way. The compacted dirt seems wide enough to accommodate no more than two pairs of shoes at a time.

The scar of the New Castle and Frenchtown Railroad barely whispers of the railcars that once barreled through. That’s what earned it a place on Andrew Grigg’s map.

For the past two years, Grigg, a transit enthusiast, has been building an interactive atlas of abandoned railroads. Using Google Maps, he lays the ghostly silhouettes of the lines over modern aerial imagery. His recreation of the 16-mile New Castle and Frenchtown Line crosses state lines and modern highways, marches through suburban housing developments, and passes near a cineplex, a Walmart, and a paintball field.
… (emphasis in original)

Great example of a project capturing travel paths that may be omitted from modern maps. Being omitted from a map doesn’t impact the potential use of an abandoned railway as an alternative to other routes.

Be sure to check ahead of time but digital navigation systems may have omitted discontinued railroads.

The same advantage obtains if you know which underpasses flood after a heavy rain, which streets are impassable, when trains are passing over certain crossings, all manner of information that isn’t captured by standard digital navigation systems.

What information can you add to a map that isn’t known to or thought to be important by others?

Computus manuscripts and where to find them

Friday, January 12th, 2018

Computus manuscripts and where to find them

An interactive map of computus manuscripts by place of preservation.

A poor screen shot:

From the about page:

Welcome to the bèta version of Computus.lat, an online platform for teaching and research in studies of the medieval science of computus. Computus.lat consists of a catalogue of computistical manuscripts and computistical objects, a bibliography, and a number of resources (such as a Mirador-viewer and data visualizations).

Follow @computuslat on Twitter for updates.

Kind regards,
Thom Snijders

Over 500 manuscripts online!

Oh, Computus:

Computus in its simplest definition is the art of ascertaining time by the course of the sun and the moon. This art could be and was a theoretical science, such as that explored by Johannes of Sacrobosco in his De sphera–a science based on arithmetical calculations and astronomical measurements derived from use of the astrolabe or, increasingly by the end of the 13th century, the solar quadrant. In the context of the present exhibit, however, computus is understood mainly as the practical application of these calculations. To reckon time in the broadest sense and to determine the date of Easter became one and the same effort. And for most people, understanding the problem of correct alignment of solar, lunar, yearly and weekly cycles to arrive at the date of Easter was simply reduced to a question of “when?” rather than “why?”. The result was a profusion of calculation formulae, charts and memory devices.

Accompanying these handy mechanisms for determining the date of Easter were many other bits of calendrical information that faith, prejudice and experience leveled to the same degree of acceptance and necessity: the lucky and the unlucky days for travel or for eating goose; the prognostications of rain or wind; the times for bloodletting; the signs of the zodiac; the phases of the moon; the number of hours of sunshine in a given day; the feasts of the saints; the Sundays in a perpetual calendar.

Take heed of the line: “The result was a profusion of calculation formulae, charts and memory devices.” (emphasis added)

And you think we have trouble with daylight saving time and time zones. 😉
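For a sense of what those formulae were chasing, here is a minimal sketch of one modern descendant, the widely published “anonymous Gregorian” computus (also known as the Meeus/Jones/Butcher algorithm). It is not taken from the exhibit or from any manuscript in the catalogue, the function name `gregorian_easter` is my own, and it applies only to the Gregorian calendar, so it says nothing about how a medieval computist would have worked.

```python
def gregorian_easter(year):
    """Return (month, day) of Easter Sunday in the Gregorian calendar."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within the century
    d, e = divmod(b, 4)                # leap-century corrections
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30          # days to the paschal full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7        # days on to the next Sunday
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return month, day + 1


for year in (2000, 2018, 2024):
    month, day = gregorian_easter(year)
    print(f"{year}: {month:02d}-{day:02d}")
```

Ten lines of integer arithmetic replace the charts and memory devices, which is rather the point: the question of “when?” survives, the “why?” is buried in the constants.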

Pass this along to manuscript scholars, liturgy buffs, historians, anyone interested in our diverse religious history.

A [Selective] Field Guide to “Fake News” and other Information Disorders

Friday, January 12th, 2018

New guide helps journalists, researchers investigate misinformation, memes and trolling by Liliana Bounegru and Jonathan Gray.

Recent scandals about the role of social media in key political events in the US, UK and other European countries over the past couple of years have underscored the need to understand the interactions between digital platforms, misleading information and propaganda, and their influence on collective life in democracies.

In response to this, the Public Data Lab and First Draft collaborated last year to develop a free, open-access guide to help students, journalists and researchers investigate misleading and viral content, memes and trolling practices online.

Released today, the five chapters of the guide describe a series of research protocols or “recipes” that can be used to trace trolling practices, the ways false viral news and memes circulate online, and the commercial underpinnings of problematic content. Each recipe provides an accessible overview of the key steps, methods, techniques and datasets used.

The guide will be most useful to digitally savvy and social media literate students, journalists and researchers. However, the recipes range from easy formulae that can be executed without much technical knowledge other than a working understanding of tools such as BuzzSumo and the CrowdTangle browser extension, to ones that draw on more advanced computational techniques. Where possible, we try to offer the recipes in both variants.

Download the guide at the Public Data Lab’s website.

The techniques in the guide are fascinating but the underlying definition of “fake news” is problematic:


The guide explores the notion that fake news is not just another type of content that circulates online, but that it is precisely the character of this online circulation and reception that makes something into fake news. In this sense fake news may be considered not just in terms of the form or content of the message, but also in terms of the mediating infrastructures, platforms and participatory cultures which facilitate its circulation. In this sense, the significance of fake news cannot be fully understood apart from its circulation online. It is the register of this circulation that also enables us to trace how material that starts its life as niche satire can be repackaged as hyper-partisan clickbait to generate advertising money and then continue life as an illustration of dangerous political misinformation.

As a consequence this field guide encourages a shift from focusing on the formal content of fabrications in isolation to understanding the contexts in which they circulate online. This shift points to the limits of a “deficit model” approach – which might imply that fabrications thrive only because of a deficit of factual information. In the guide we suggest new ways of mapping and responding to fake news beyond identifying and fact-checking suspect claims – including “thicker” accounts of circulation as a way to develop a richer understanding of how fake news moves and mobilises people, more nuanced accounts of “fakeness” and responses which are better attuned to the phenomenon.
… (page 8)

The means by which information circulates is always relevant to the study of communications. However, notice that the authors’ definition excludes traditional media from its quest to identify “fake news.” Really? Traditional media isn’t responsible for the circulation of any “fake news?”

Examples of traditional media fails are legion but here is a recent and spectacular one: The U.S. Media Suffered Its Most Humiliating Debacle in Ages and Now Refuses All Transparency Over What Happened by Glenn Greenwald.

Friday was one of the most embarrassing days for the U.S. media in quite a long time. The humiliation orgy was kicked off by CNN, with MSNBC and CBS close behind, and countless pundits, commentators, and operatives joining the party throughout the day. By the end of the day, it was clear that several of the nation’s largest and most influential news outlets had spread an explosive but completely false news story to millions of people, while refusing to provide any explanation of how it happened.

The spectacle began Friday morning at 11 a.m. EST, when the Most Trusted Name in News™ spent 12 straight minutes on air flamboyantly hyping an exclusive bombshell report that seemed to prove that WikiLeaks, last September, had secretly offered the Trump campaign, even Donald Trump himself, special access to the Democratic National Committee emails before they were published on the internet. As CNN sees the world, this would prove collusion between the Trump family and WikiLeaks and, more importantly, between Trump and Russia, since the U.S. intelligence community regards WikiLeaks as an “arm of Russian intelligence,” and therefore, so does the U.S. media.

This entire revelation was based on an email that CNN strongly implied it had exclusively obtained and had in its possession. The email was sent by someone named “Michael J. Erickson” — someone nobody had heard of previously and whom CNN could not identify — to Donald Trump Jr., offering a decryption key and access to DNC emails that WikiLeaks had “uploaded.” The email was a smoking gun, in CNN’s extremely excited mind, because it was dated September 4 — 10 days before WikiLeaks began promoting access to those emails online — and thus proved that the Trump family was being offered special, unique access to the DNC archive: likely by WikiLeaks and the Kremlin.

There was just one small problem with this story: It was fundamentally false, in the most embarrassing way possible. Hours after CNN broadcast its story — and then hyped it over and over and over — the Washington Post reported that CNN got the key fact of the story wrong.

This fundamentally false story does not qualify as “fake news” for this guide. Surprised?

The criteria for “fake news” also exclude questioning statements from members of the intelligence community, which includes James Clapper, a self-confessed and known liar, who continues to be the darling of mainstream media outlets.

Cozy relationships between news organizations and their reporters with government and intelligence sources are also not addressed as potential sources of “fake news.”

Limiting the scope of a “fake news” study in order to have a doable project is understandable. However, excluding factually false stories, use of known liars and corrupting relationships, all because they occur in mainstream media, looks like picking a target to tar with the label “fake news.”

The guides and techniques themselves may be quite useful, so long as you remember they were designed to show social media as the spreader of “fake news.”

One last thing, what the authors don’t offer and I haven’t seen reports of, is the effectiveness of the so-called “fake news” with voters. Taking “Pope Francis Endorses Trump,” as a lie, however widely spread that story became, did it have any impact on the 2016 election? Or did every reader do a double-take and move on? It’s possible to answer that type of question but it does require facts.

Getting Started with Python/CLTK for Historical Languages

Friday, January 12th, 2018

Getting Started with Python/CLTK for Historical Languages by Patrick J. Burns.

From the post:

This is a ongoing project to collect online resources for anybody looking to get started with working with Python for historical languages, esp. using the Classical Language Toolkit. If you have suggestions for this lists, email me at patrick[at]diyclassics[dot]org.

What classic or historical language resources would you recommend?

Complete Guide to Topic Modeling (Recommender System for Email Dumps?)

Friday, January 12th, 2018

Complete Guide to Topic Modeling with scikit-learn and gensim by George-Bogdan Ivanov.

From the post:

Why is Topic Modeling useful?

There are several scenarios when topic modeling can prove useful. Here are some of them:

  • Text classification – Topic modeling can improve classification by grouping similar words together in topics rather than using each word as a feature
  • Recommender Systems – Using a similarity measure we can build recommender systems. If our system would recommend articles for readers, it will recommend articles with a topic structure similar to the articles the user has already read.
  • Uncovering Themes in Texts – Useful for detecting trends in online publications for example

Would a recommender system be useful for reading email dumps? 😉

Within or across candidates for Congress?
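To make the recommender idea concrete, here is a minimal sketch using only the Python standard library. It is not Ivanov’s code and it does no topic modeling: it compares raw word counts with cosine similarity, where a real system would compare the topic distributions produced by scikit-learn or gensim. The function names and the three sample emails are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(read_texts, candidates, top_n=1):
    """Rank unread items by similarity to everything already read."""
    profile = Counter()
    for text in read_texts:
        profile.update(vectorize(text))
    ranked = sorted(
        candidates,
        key=lambda name: cosine(profile, vectorize(candidates[name])),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical email dump.
emails = {
    "msg-1": "Budget meeting moved to Thursday, bring the budget spreadsheet",
    "msg-2": "Lunch order for the team outing",
    "msg-3": "Revised budget figures attached for the Thursday meeting",
}
already_read = [emails["msg-1"]]
unread = {name: body for name, body in emails.items() if name != "msg-1"}

print(recommend(already_read, unread))
```

Swapping `vectorize` for a function that returns a document’s topic weights leaves the rest unchanged, which is why the similarity measure, not the model, is the part worth sketching first.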

Secrets to Searching for Video Footage (AI Assistance In Your Future?)

Friday, January 12th, 2018

Secrets to Searching for Video Footage by Aric Toler.

From the post:

Much of Bellingcat’s work requires intense research into particular events, which includes finding every possible photograph, video and witness account that will help inform our analysis. Perhaps most notably, we exhaustively researched the events surrounding the shoot down of Malaysian Airlines Flight 17 (MH17) over eastern Ukraine.

The photographs and videos taken near the crash in eastern Ukraine were not particularly difficult to find, as they were widely publicized. However, locating over a dozen photographs and videos of the Russian convoy transporting the Buk anti-aircraft missile launcher that shot down MH17 three weeks before the tragedy was much harder, and required both intense investigation on social networks and some creative thinking.

Most of these videos were shared on Russian-language social networks and YouTube, and did not involve another type of video that is much more important today than it was in 2014 — live streaming. Bellingcat has also made an effort to compile all user-generated videos of the events in Charlottesville on August 12, 2017, providing a database of livestreamed videos on platforms like Periscope, Ustream and Facebook Live, along with footage uploaded after the protest onto platforms like Twitter and YouTube.

Verifying videos is important, as detailed in this Bellingcat guide, but first you have to find them. This guide will provide advice and some tips on how to gather as much video as possible on a particular event, whether it is videos from witnesses of a natural disaster or a terrorist attack. For most examples in this guide, we will assume that the event is a large protest or demonstration, but the same advice is applicable to other events.

I was amused by this description of Snapchat and Instagram:


Snapchat and Instagram are two very common sources for videos, but also two of the most difficult platforms to trawl for clips. Neither has an intuitive search interface that easily allows researchers to sort through and collect videos.

I’m certain that’s true but a trained AI could sort out videos obtained by overly broad requests. As I’m fond of pointing out, not 100% accuracy but you can’t get that with humans either.

Augment your searching with a tireless AI. For best results, add or consult a librarian as well.

PS: I have other concerns at the moment but a subset of the Bellingcat Charlottesville database would make a nice training basis for an AI, which could then be loosed on Instagram and other sources to discover more videos. The usual stumbling block for AI projects is obtaining human-curated material, which Bellingcat has already supplied.

Leaking Resources for Federal Employees with Ties to ‘Shithole’ Countries

Friday, January 12th, 2018

Trump derides protections for immigrants from ‘shithole’ countries by Josh Dawsey.

From the post:

President Trump grew frustrated with lawmakers Thursday in the Oval Office when they discussed protecting immigrants from Haiti, El Salvador and African countries as part of a bipartisan immigration deal, according to several people briefed on the meeting.

“Why are we having all these people from shithole countries come here?” Trump said, according to these people, referring to countries mentioned by the lawmakers.

The EEOC annual report for 2014 states that, of the 2.7 million women and men employed by the federal government:

…63.50% were White, 18.75% were Black or African American, 8.50% were Hispanic or Latino, 6.16% were Asian, 1.49% were American Indian or Alaska Native, 1.16% were persons of Two or More Races and 0.45% were Native Hawaiian or Other Pacific Islander…(emphasis added)

In other words, 27.25% of the 2.7 million people working for the federal government, or approximately 736,000 federal employees, have ties to ‘shithole’ countries.

President Trump’s rude remarks are an accurate reflection of current U.S. immigration policy:

The United States treats other countries as ‘shitholes’ but it is considered impolite to mention that in public.

Federal employees with ties to ‘shithole’ countries are at least as loyal, if not more so, than your average staffer.

That said, I’m disappointed that media outlets did not immediately call upon federal employees with ties to ‘shithole’ countries to start leaking documents/data.

Here are some places documents can be leaked to:

More generally, see Here’s how to share sensitive leaks with the press and their excellent listing of SecureDrop resources for anonymous submission of documents.

If you have heard of the Panama Papers or the Paradise Papers, then you are thinking about the International Consortium of Investigative Journalists. They do excellent work, but like the other journalists mentioned, are obsessed with being in control of the distribution of your leak.

Every outrage, whether a shooting, unjust imprisonment, racist remarks, religious bigotry, is an opportunity to incite leaking by members of a group.

Not calling for leaking speaks volumes about your commitment to the status quo and its current injustices.

The art of writing science

Thursday, January 11th, 2018

The art of writing science by Kevin W. Plaxco

From the post:

The value of writing well should not be underestimated. Imagine, for example, that you hold in your hand two papers, both of which describe precisely the same set of experimental results. One is long, dense, and filled with jargon. The other is concise, engaging, and easy to follow. Which are you more likely to read, understand, and cite? The answer to this question hits directly at the value of good writing: writing well leverages your work. That is, while even the most skillful writing cannot turn bad science into good science, clear and compelling writing makes good science more impactful, and thus more valuable.

The goal of good writing is straightforward: to make your reader’s job as easy as possible. Realizing this goal, though, is not so simple. I, for one, was not a natural-born writer; as a graduate student, my writing was weak and rambling, taking forever to get to the point. But I had the good fortune to postdoc under an outstanding scientific communicator, who taught me the above-described lesson that writing well is worth the considerable effort it demands. Thus inspired, I set out to teach myself how to communicate more effectively, an effort that, some fifteen years later, I am still pursuing.

Along the way I have learned a thing or two that I believe make my papers easier to read, a few of which I am pleased to share with you here. Before I share my hard-won tips, though, I have an admission: there is no single, correct way to write. In fact, there are a myriad of solutions to the problem of writing well (see, e.g., Refs.1–4). The trick, then, is not to copy someone else’s voice, but rather to study what works—and what does not—in your own writing and that of others to formulate your own guide to effective communication. Thus, while I present here some of my most cherished writing conventions (i.e., the rules that I force on my own students), I do not mean to imply that they represent the only acceptable approach. Indeed, you (or your mentor) may disagree strongly with many of the suggestions I make below. This, though, is perfectly fine: my goal is not to convince you that I have found the one true way, but instead simply to get people thinking and talking about writing. I do so in the hope that this will inspire a few more young scientists to develop their own effective styles.

The best way to get the opportunity to do a great presentation for Balisage 2018 is to write a great paper for Balisage 2018. A great paper is step one towards being accepted and having a chance to bask in the admiration of other markup geeks.

OK, so it’s not so much basking as trying to see by star light on a cloudy night.

Still, a great paper will impress the reviewers and if accepted, readers when it appears in the proceedings for this year.

Strong suggestion: Try Plaxco’s first sentence of the paragraph test on your paper (or any you are reviewing). If it fails, start over.

I volunteer to do peer review for Balisage so I’m anticipating some really well-written papers this year.

The David Attenborough Style of Scientific Presentation (Historic First for Balisage?)

Thursday, January 11th, 2018

The David Attenborough Style of Scientific Presentation by Will Ratcliff.

From the post:

One of the biggest hurdles to giving a good talk is convincing people that it’s worth their mental energy to listen to you. This approach to speaking is designed to get that buy-in from the audience, without them even realizing they are doing so. The key to this is exploitation of a simple fact: people are curious creatures by nature and will pay attention to a cool story as long as that story remains absolutely clear.

In the D.A. style of speaking, you are the narrator of an interesting story. The goal is to have a visually streamlined talk where the audience is so engaged with your presentation that they forget you’re standing in front of them speaking. Instead, they’re listening to your narrative and seeing the visuals that accompany your story, at no point do they have to stop and try to make sense of what you just said.

A captivating two (2) page summary of the David Attenborough (DA) style for presentations, but at first, since I don’t travel any longer, I wasn’t going to mention it.

On a second or third read, the blindingly obvious hit me:

Rules that work for live conference presentations, also work for video podcasts, lectures, client presentations, anywhere you are seeking to effectively communicate to others. (I guess that rules out White House press briefings.)

Paper submission dates aren’t out yet for Balisage 2018 but your use of DA style for your presentation would be a historic first, so far as I know. 😉

No promises, but a video of a presentation in “normal” style alongside the same presentation in DA style could be an interesting data point.

Introduction to reverse engineering and Assembly (Suicidal Bricking by Ubuntu Servers)

Thursday, January 11th, 2018

Introduction to reverse engineering and Assembly by Youness Alaoui.

From the post:

Recently, I’ve finished reverse engineering the Intel FSP-S “entry” code, that is from the entry point (FspSiliconInit) all the way to the end of the function and all the subfunctions that it calls. This is only some initial foray into reverse engineering the FSP as a whole, but reverse engineering is something that takes a lot of time and effort. Today’s blog post is here to illustrate that, and to lay the foundations for understanding what I’ve done with the FSP code (in a future blog post).

Over the years, many people asked me to teach them what I do, or to explain to them how to reverse engineer assembly code in general. Sometimes I hear the infamous “How hard can it be?” catchphrase. Last week someone I was discussing with thought that the assembly language is just like a regular programming language, but in binary form—it’s easy to make that mistake if you’ve never seen what assembly is or looks like. Historically, I’ve always said that reverse engineering and ASM is “too complicated to explain” or that “If you need help to get started, then you won’t be able to finish it on your own” and various other vague responses—I often wanted to explain to others why I said things like that but I never found a way to do it. You see, when something is complex, it’s easy to say that it’s complex, but it’s much harder to explain to people why it’s complex.

I was lucky to recently stumble onto a little function while reverse engineering the Intel FSP, a function that was both simple and complex, where figuring out what it does was an interesting challenge that I can easily walk you through. This function wasn’t a difficult thing to understand, and by far, it’s not one of the hard or complex things to reverse engineer, but this one is “small and complex enough” that it’s a perfect example to explain, without writing an entire book or getting into the more complex aspects of reverse engineering. So today’s post serves as a “primer” guide to reverse engineering for all of those interested in the subject. It is a required read in order to understand the next blog posts I would be writing about the Intel FSP. Ready? Strap on your geek helmet and let’s get started!
… (emphasis in original)

Intel? Intel? I heard something recently about Intel chips. You? 😉

No, this won’t help you specifically with Spectre and Meltdown, but it’s a step in the direction of building such skills.

The Project Zero team at Google did not begin life with the skills necessary to discover Spectre and Meltdown.

It took 20 years for those vulnerabilities to be discovered.

What vulnerabilities await discovery by you?

PS: Word on the street is that Ubuntu 16.04 servers are committing suicide rather than run more slowly with patches for Meltdown and Spectre. Meltdown and Spectre Patches Bricking Ubuntu 16.04 Computers. The attribution of intention to Ubuntu servers may be a bit overdone but the bricking part is true.

W. E. B. Du Bois as Data Scientist

Thursday, January 11th, 2018

W. E. B. Du Bois’s Modernist Data Visualizations of Black Life by Allison Meier.

From the post:

For the 1900 Exposition Universelle in Paris, African American activist and sociologist W. E. B. Du Bois led the creation of over 60 charts, graphs, and maps that visualized data on the state of black life. The hand-drawn illustrations were part of an “Exhibit of American Negroes,” which Du Bois, in collaboration with Thomas J. Calloway and Booker T. Washington, organized to represent black contributions to the United States at the world’s fair.

This was less than half a century after the end of American slavery, and at a time when human zoos displaying people from colonized countries in replicas of their homes were still common at fairs (the ruins of one from the 1907 colonial exhibition in Paris remain in the Bois de Vincennes). Du Bois’s charts (recently shared by data artist Josh Begley on Twitter) focus on Georgia, tracing the routes of the slave trade to the Southern state, the value of black-owned property between 1875 and 1889, comparing occupations practiced by blacks and whites, and calculating the number of black students in different school courses (2 in business, 2,252 in industrial).

Ellen Terrell, a business reference specialist at the Library of Congress, wrote a blog post in which she cites a report by Calloway that laid out the 1900 exhibit’s goals:

It was decided in advance to try to show ten things concerning the negroes in America since their emancipation: (1) Something of the negro’s history; (2) education of the race; (3) effects of education upon illiteracy; (4) effects of education upon occupation; (5) effects of education upon property; (6) the negro’s mental development as shown by the books, high class pamphlets, newspapers, and other periodicals written or edited by members of the race; (7) his mechanical genius as shown by patents granted to American negroes; (8) business and industrial development in general; (9) what the negro is doing for himself though his own separate church organizations, particularly in the work of education; (10) a general sociological study of the racial conditions in the United States.

Georgia was selected to represent these 10 points because, according to Calloway, “it has the largest negro population and because it is a leader in Southern sentiment.” Rebecca Onion on Slate Vault notes that Du Bois created the charts in collaboration with his students at Atlanta University, examining everything from the value of household and kitchen furniture to the “rise of the negroes from slavery to freedom in one generation.”

The post is replete with images created by Du Bois for the exposition, of which this is an example:

As we all know, but rarely say in public, data science and visualization of data isn’t a new discipline.

The data science/visualization by Du Bois merits notice during Black History Month (February) but the rest of the year as well. It’s part of our legacy in data science and we should be proud of it.