Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

May 31, 2017

Malware Subscriptions and the Long Tail of Patching (What you get for $100)

Filed under: Cybersecurity,Security — Patrick Durusau @ 8:20 pm

Hacker Fantastic and x0rz have been deriding the crowd-funding effort described in Shadow Brokers Response Team is creating open & transparent crowd-funded analysis of leaked NSA tools.

In part because whitehats will get the data at the same time.

Even if whitehats could instantly generate patches for all the vulnerabilities in each monthly release, any vulnerabilities that do have value (always an open question) will retain that value for years, even more than a decade.

Why?

Roger Grimes recites the folk wisdom:


Folk wisdom says that patching habits can be divided into quarters: 25 percent of people patch within the first week; 25 percent patch within the first month; 25 percent patch after the first month, and 25 percent never apply the patch. The longer the wait, the greater the increased risk.

Or to put that another way:


50% of all vulnerable systems remain so 30+ days after the release.

25% of all vulnerable systems remain so forever.
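
To see how that folk wisdom plays out over time, here is a toy sketch of the quartiles above; the cutoffs are the folk-wisdom numbers treated as assumptions, not measurements:

```python
# Toy model of the patching quartiles quoted above: 25% patch within a week,
# 25% within the first month, 25% some time later, 25% never.
def still_vulnerable(days_since_patch_release):
    """Rough fraction of systems still exposed N days after a patch ships."""
    if days_since_patch_release < 7:
        return 1.00      # assume nobody has patched yet
    if days_since_patch_release < 30:
        return 0.75      # the first quartile has patched
    if days_since_patch_release < 365:
        return 0.50      # half have patched within a month
    return 0.25          # the quartile that never patches

for days in (0, 7, 30, 180, 365, 3650):
    print(f"{days:>5} days out: ~{still_vulnerable(days):.0%} still vulnerable")
```

Crude as it is, it makes the long tail visible: a decade after a patch ships, a quarter of the target population is still wide open.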

Here’s a “whitehat” graphic that makes a similar point:

(From: Website Security Statistics Report 2015)

For $100 each, paid by 2500 people, assuming there are vulnerabilities in the first Shadow Brokers monthly release, you get:

Vulnerabilities for 25% of systems forever, vulnerabilities for 50% of systems for more than a month (assuming patches are even possible), and for some industries, especially government systems, years of vulnerability.

For a $100 investment?

Modulo my preference for a group buy, then distribute model, that’s not a bad deal.

If there are no meaningful vulnerabilities in the first release, then don’t spend the second $100.

A commodity marketplace for malware weakens the NSA and its kindred. That’s reason enough for me to invest.

Disclosure = No action/change/consequences

Filed under: Cybersecurity,Security — Patrick Durusau @ 3:42 pm

What would you do if you discovered:


A cache of more than 60,000 files were discovered last week on a publicly accessible Amazon server, including passwords to a US government system containing sensitive information, and the security credentials of a lead senior engineer at Booz Allen Hamilton, one of the nation’s top intelligence and defense contractors. What’s more, the roughly 28GB of data contained at least a half dozen unencrypted passwords belonging to government contractors with Top Secret Facility Clearance.

?

Dell Cameron reports in: Top Defense Contractor Left Sensitive Pentagon Files on Amazon Server With No Password this result:


UpGuard cyber risk analyst Chris Vickery discovered the Booz Allen server last week while at his Santa Rosa home running a scan for publicly accessible s3 buckets (what Amazon calls its cloud storage devices).

The mission of UpGuard’s Cyber Risk Team is to locate and secure leaked sensitive records, so Vickery’s first email on Wednesday was to Joe Mahaffee, Booz Allen’s chief information security officer. But after receiving no immediate response, he went directly to the agency. “I emailed the NGA at 10:33am on Thursday. Public access to the leak was cut off nine minutes later,” he said.

What an unfortunate outcome.

Not faulting Chris Vickery, who was doing his job.

But responsible disclosure to Booz Allen Hamilton and then the NGA will result in no change to Booz Allen Hamilton’s position as a government IT supplier.

Public distribution of these files might not result in significant changes at government agencies and their IT contractors.

On the other hand, no consequences for agencies and their IT contractors hasn’t improved security.

Shouldn’t we give real world consequences a chance?

May 30, 2017

Trillion-Edge Graphs – Dodging Cost and the NSA

Filed under: Graph Database Benchmark,Graphs — Patrick Durusau @ 7:48 pm

Mosaic: processing a trillion-edge graph on a single machine by Adrian Colyer.

From the post:

Mosaic: Processing a trillion-edge graph on a single machine Maass et al., EuroSys’17

Unless your graph is bigger than Facebook’s, you can process it on a single machine.

With the inception of the internet, large-scale graphs comprising web graphs or social networks have become common. For example, Facebook recently reported their largest social graph comprises 1.4 billion vertices and 1 trillion edges. To process such graphs, they ran a distributed graph processing engine, Giraph, on 200 machines. But, with Mosaic, we are able to process large graphs, even proportional to Facebook’s graph, on a single machine.

In this case it’s quite a special machine – with Intel Xeon Phi coprocessors and NVMe storage. But it’s really not that expensive – the Xeon Phi used in the paper costs around $549, and a 1.2TB Intel SSD 750 costs around $750. How much do large distributed clusters cost in comparison? Especially when using expensive interconnects and large amounts of RAM.

So Mosaic costs less, but it also consistently outperforms other state-of-the-art out of core (secondary storage) engines by 3.2x-58.6x, and shows comparable performance to distributed graph engines. At one trillion edge scale, Mosaic can run an iteration of PageRank in 21 minutes (after paying a fairly hefty one-off set-up cost).

(And remember, if you have a less-than-a-trillion edges scale problem, say just a few billion edges, you can do an awful lot with just a single thread too!).

Another advantage of the single machine design, is a much simpler approach to fault tolerance:

… handling fault tolerance is as simple as checkpointing the intermediate stale data (i.e., vertex array). Further, the read-only vertex array for the current iteration can be written to disk parallel to the graph processing; it only requires a barrier on each superstep. Recovery is also trivial; processing can resume with the last checkpoint of the vertex array.

There’s a lot to this paper. Perhaps the two most central aspects are design sympathy for modern hardware, and the Hilbert-ordered tiling scheme used to divide up the work. So I’m going to concentrate mostly on those in the space available.

A publicly accessible version of the paper: Mosaic: Processing a trillion-edge graph on a single machine. Presentation slides.

Definitely a paper for near the top of my reading list!
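
For a sense of what the “single thread” remark in the quote is pointing at, here is a minimal PageRank sketch over an in-memory edge list. It is nothing like Mosaic’s Hilbert-ordered tiling or NVMe streaming, just the baseline such engines compete against, and the toy edge list is assumed for illustration:

```python
# Minimal single-threaded PageRank over an edge list: the per-edge arithmetic
# is the same whether the edges fit in a Python list or stream off NVMe.
from collections import defaultdict

edges = [(0, 1), (1, 2), (2, 0), (2, 1), (3, 2)]   # toy graph, assumed data

def pagerank(edges, damping=0.85, iterations=20):
    nodes = {n for edge in edges for n in edge}
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        contrib = defaultdict(float)
        for src, dst in edges:
            contrib[dst] += rank[src] / out_degree[src]
        # rank held by nodes with no outgoing edges is redistributed uniformly
        dangling = sum(rank[n] for n in nodes if out_degree[n] == 0)
        rank = {n: (1 - damping) / len(nodes)
                   + damping * (contrib[n] + dangling / len(nodes))
                for n in nodes}
    return rank

print(pagerank(edges))
```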

Shallow but broad graphs (think telephone surveillance data) are all the rage but how would relatively narrow but deep graphs fare when being processed by Mosaic?

Using top-end but not uncommon hardware may enable your processing requirements to escape the notice of the NSA. Another benefit to commodity hardware.

Enjoy!

Crowd-Funding Public Access to NSA Tools!

Filed under: Cybersecurity,Government,NSA,Security — Patrick Durusau @ 6:51 pm

Awesome! (with a caveat below)

Shadow Brokers Response Team is creating open & transparent crowd-funded analysis of leaked NSA tools.

The group calling itself the Shadow Brokers have released several caches of exploits to date. These caches and releases have had a detrimental outcome on the Internet at large, one leak especially resulted in the now in-famous WannaCry ransomware worm – others have been used by criminal crackers to illegally access infrastructure. Many have been analysing the data to determine its authenticity and impact on infrastructure, as a community it has been expressed that the harm caused by exploits could have been mitigated against had the Shadow Brokers been paid for their disclosures.

The leaks of information seen so far have included weaponized reliable exploits for the following platforms:

  • Cisco
  • Juniper
  • Solaris
  • Microsoft Windows
  • Linux

The Shadow Brokers have announced they are offering a “monthly dump” service which requires a subscription of 100 ZCASH coins. Currently this is around £17688.29 but could change due to the fleeting nature of cryptocurrency. By paying the Shadow Brokers the cash they asked for we hope to pool resources and avert any future WannaCry type incidents. This patreon is a chance for those who may not have large budgets (SME, startups and individuals) in the ethical hacking and whitehat community to pool resources and buy a subscription for the new monthly released data.

The goal here is to raise sufficient funds from interested parties to purchase a subscription to the new data leak. We are attempting to perform the following task:

  • Raise funds to purchase 100 ZCASH coins
  • Purchase 100 ZCASH coins from a reputable exchange
  • Transfer 100 ZCASH coins to ShadowBrokers with email address
  • Access the data from the ShadowBrokers and distribute to backers
  • Perform analysis on data leak and ascertain risk / perform disclosures

The Shadow Brokers have implied that the leak could be any of the following items of interest:

  • web browser, router, handset exploits and tools
  • newer material from NSA ops disk including Windows 10 exploits
  • misc compromised network data (SWIFT or Nuclear programmes)
  • … (emphasis in original)

An almost excellent plan that, with enough contributors, reduces the risk to any one person to a manageable level.

Two hundred and fifty contributors at $100 each makes the $25,000 goal. That’s quite doable.
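
The back-of-envelope arithmetic, with the prices quoted above treated as assumptions (ZEC and exchange rates move constantly):

```python
# Back-of-envelope for the group buy. The GBP figure is the one quoted in
# the campaign text; the exchange rate is an assumption, not a live quote.
gbp_for_100_zec = 17_688.29
usd_per_gbp = 1.29                    # assumed May 2017-ish rate
backers, share = 250, 100             # the $25,000 framing above

print(f"Subscription cost: ~${gbp_for_100_zec * usd_per_gbp:,.0f}")
print(f"Raised at {backers} backers x ${share}: ${backers * share:,}")
```

The $25,000 goal leaves a couple of thousand dollars of slack for exchange fees and price movement.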

My only caveat is the “…whitehat ethical hacker…” language for sharing the release. Buying a share in the release should be just that, buying a share. What participants do or don’t do with their share is not a concern.

Kroger clerks don’t ask me if I am going to use flour to bake bread for the police and/or terrorists.

Besides, the alleged NSA tools weren’t created by “…whitehat ethical hackers….” Yes? No government has a claim on others to save them from their own folly.

Any competing crowd-funded subscriptions to the Shadow Brokers release?

May 29, 2017

Launch of the PhilMath Archive

Filed under: Mathematics,Philosophy,Philosophy of Science,Science — Patrick Durusau @ 8:39 pm

Launch of the PhilMath Archive: preprint server specifically for philosophy of mathematics

From the post:

PhilSci-Archive is pleased to announce the launch of the PhilMath-Archive, http://philsci-archive.pitt.edu/philmath.html a preprint server specifically for the philosophy of mathematics. The PhilMath-Archive is offered as a free service to the philosophy of mathematics community. Like the PhilSci-Archive, its goal is to promote communication in the field by the rapid dissemination of new work. We aim to provide an accessible repository in which scholarly articles and monographs can find a permanent home. Works posted here can be linked to from across the web and freely viewed without the need for a user account.

PhilMath-Archive invites submissions in all areas of philosophy of mathematics, including general philosophy of mathematics, history of mathematics, history of philosophy of mathematics, history and philosophy of mathematics, philosophy of mathematical practice, philosophy and mathematics education, mathematical applicability, mathematical logic and foundations of mathematics.

For your reference, the PhilSci-Archive.

Enjoy!

Innovations In Security: Put All Potential Bombs In Cargo

Filed under: Security,Terrorism — Patrick Durusau @ 7:38 pm

US Wants to Extend Laptop Ban to All International Flights by Catalin Cimpanu.

From the post:

US Secretary of Homeland Security Gen. John Kelly revealed in an interview over the weekend that the US might expand its current laptop ban to all flights into the US in the near future.

“I might,” said Gen. Kelly yesterday on Fox News Sunday. “There’s a real threat. There’s numerous threats against aviation. That’s really the thing they’re really obsessed with, the terrorists, the idea of knocking down an airplane in flight, particularly if it is a US carrier, particularly if it is full of mostly US folks.”

Is there a FOIA exemption that blocks obtaining the last fitness report on US Secretary of Homeland Security Gen. John F. Kelly from his time in the Marines?

Loading fire-prone laptops, which may potentially also contain bombs, into a plane’s cargo hold for “safety” raises serious questions about Kelly’s mental competence.

Banning laptops could be a ruse to get passengers to use cloud services for their data, making it more easily available to the NSA.

As the general says, there are people obsessed with “the idea of knocking down an airplane in flight,” but those are mostly found in the Department of Homeland Security.

You need not take my word for it: the Wikipedia timeline of airline bombings shows eight such bombings since December of 2001. I find it difficult to credit “obsession” when, worldwide, there is only one bomb attack on an airline every two years.

Moreover, the GAO in Airport Perimeter and Access Control Security Would Benefit from Risk Assessment and Strategy Updates (2016) found the TSA has not evaluated the vulnerability at 81% of the 437 commercial airports. US airports are vulnerable and the TSA can’t say which ones or by how much.

If terrorists truly were “obsessed,” in General Kelly’s words, the abundance of vulnerable US airports should see US aircraft dropping like flies. Except they’re not.

PS: Anticipating a complete ban on laptops, now would be a good time to invest in airport laptop rental franchises.

Deep Learning – Dodging The NSA

Filed under: Deep Learning,Machine Learning — Patrick Durusau @ 4:30 pm

The $1700 great Deep Learning box: Assembly, setup and benchmarks by Slav Ivanov.

Ivanov’s motivation for local deep learning hardware came from monthly AWS bills.

You may suffer from those or be training on data sets you’d rather not share with the NSA.

For whatever reason, follow these detailed descriptions to build your own deep learning box.
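
Once the box is built, a quick check that the GPU is actually visible to your framework saves a lot of head-scratching; a minimal sketch, assuming a PyTorch install (any framework has an equivalent probe):

```python
# Sanity check that the freshly built box exposes its GPU to PyTorch.
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"CUDA device: {name} ({mem_gb:.1f} GB)")
else:
    print("No CUDA device visible -- check the driver and CUDA toolkit install.")
```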

Caution: If a month or more has elapsed since this post and you’re starting to build a system, check all the update links. Hardware and prices change rapidly.

The “blue screen of death” lives! (Humorous HTML Links)

Filed under: Cybersecurity,Humor,Microsoft,Security — Patrick Durusau @ 3:54 pm

A simple file naming bug can crash Windows 8.1 and earlier by Steve J. Vaughan-Nichols.

From the post:

In a blast from the past, a Russian researcher has uncovered a simple bug in the NTFS file system that consistently crashed Windows Vista to 8.1 PCs.

Like the infamous Windows 95/98 /con/con bug, by simply entering a file name with “$MFT” the file-system bug locks up Windows at best, or dumps it into a “blue screen of death” at worse.

The bug won’t deliver malware but since it works in URLs (except for Chrome), humorous HTML links in emails are the order of the day.

Enjoy!

Combating YouTube Censorship (Carry Banned Videos Yourself)

Filed under: Censorship,Free Speech,Terrorism — Patrick Durusau @ 3:17 pm

Memorial Day is always a backwards looking holiday, but reading How Terrorists Slip Beheading Videos Past YouTube’s Censors by Rita Katz, felt like time warping to the 1950’s.

Other jihadi propaganda on the video-sharing platform may be visually more low-key, but are just as insidious in their own ways.

There is a grim bit in comedian Dave Chappelle’s new Netflix special about clicking “don’t like” on an Islamic State beheading video.

“How is this guy cutting peoples’ heads off on YouTube?” Chappelle asks, noting the absurdity of it.

Don’t like. Click.

In reality, reports of extremist content littering YouTube aren’t new. But when hundreds of major advertisers began suspending contracts with YouTube and Google in recent months, boycotting the massive video-sharing platform over concerns with such explicit content, things got a lot more real.

Google services—namely YouTube—are the most plentiful and important links used by terrorist organizations to disseminate their propaganda. And despite all of YouTube’s efforts to keep them out thus far, such groups still manage to sneak their media onto its servers.
… (emphasis in original)

Whatever label you want to apply to another group, “terrorist,” “al Qaeda,” etc., censorship is and remains censorship.

Censorship and intimidation were practiced during the Red Scare of the 1940’s/50’s, lives/careers were ruined, and we weren’t one whit safer than without it.

Want to combat YouTube censorship?

When videos are censored by YouTube, carry them on your site.

Suggested header: Banned on YouTube to make it easy to find.

It won’t stop YouTube’s censorship but it can defeat its intended outcome.

Data Journalists! Data Gif Tool (Google)

Filed under: Graphics,Journalism,News,Reporting,Visualization — Patrick Durusau @ 10:03 am

While not hiding its prior salary discrimination against women, Google has created and released a tool for creating data gifs.

Make your own data gifs with our new tool by Simon Rogers.

From the post:

Data visualizations are an essential storytelling tool in journalism, and though they are often intricate, they don’t have to be complex. In fact, with the growth of mobile devices as a primary method of consuming news, data visualizations can be simple images formatted for the device they appear on.

Enter data gifs.

(gif omitted)

These animations can be used for a variety of sophisticated storytelling approaches among data journalists: one example is Lena Groeger, who has become *the* expert in working with data gifs.

Today we are releasing Data Gif Maker, a tool to help journalists make these visuals, which show share of search interest for two competing topics.

A good way to get your feet wet with simple data gifs.

Don’t be surprised that Google does good things for the larger community while engaging in evil conduct.

Racist sheriffs who used water cannon and dogs on Black children loved their own children and remembered their birthdays. WWII death camp guards attended church and were kind to small animals.

People and their organizations are complicated and the reading public is ill-served by shallow reporting of only one aspect or another as the “true” view.

May 28, 2017

Ethics, Data Scientists, Google, Wage Discrimination Against Women

Filed under: Data Science,Ethics — Patrick Durusau @ 4:50 pm

Accused of underpaying women, Google says it’s too expensive to get wage data by Sam Levin.

From the post:

Google argued that it was too financially burdensome and logistically challenging to compile and hand over salary records that the government has requested, sparking a strong rebuke from the US Department of Labor (DoL), which has accused the Silicon Valley firm of underpaying women.

Google officials testified in federal court on Friday that it would have to spend up to 500 hours of work and $100,000 to comply with investigators’ ongoing demands for wage data that the DoL believes will help explain why the technology corporation appears to be systematically discriminating against women.

Noting Google’s nearly $28bn annual income as one of the most profitable companies in the US, DoL attorney Ian Eliasoph scoffed at the company’s defense, saying, “Google would be able to absorb the cost as easy as a dry kitchen sponge could absorb a single drop of water.”

Disclosure: I assume Google is resisting disclosure because it in fact has a history of engaging in discrimination against women. It may or may not be discriminating this month/year, but if known, the facts will support the government’s claim. The $100,000 alleged cost is chump change to prove such a charge groundless. Resistance signals the charge has merit.

Levin’s post gives me reason to doubt Google will prevail on this issue or on the merits in general. Read it in full.

My question is what of the ethical obligations of data scientists at Google?

Should data scientists inside Google come forward with the requested information?

Should data scientists inside Google stage a work slowdown to protest Google’s resistance?

Exactly what should ethical data scientists do when their employer is the 500 pound gorilla in their field?

Do you think Google executives need a memo from their data scientists cluing them in on the ethical issues here?

Possibly not; this is old-fashioned gender discrimination.

Google’s resistance signals to all of its mid-level managers that gender based discrimination will be defended.

Does that really qualify for “Don’t be evil?”

May 26, 2017

Thank You, Scott – SNL

Filed under: Facebook,Social Media,Twitter — Patrick Durusau @ 8:49 pm

I posted this to Facebook; search for “Thanks Scott SNL” to find my post or those of others.

Included this note (with edits):

Appropriate social media warriors (myself included). From sexism and racism to fracking and pipelines, push back in the real world if you [want] change. Push back on social media for a warm but meaningless feeling of solidarity.

For me the “real world,” includes cyberspace, where pushing can have consequences.

You?

May 25, 2017

Hacking Fingerprints (Yours, Mine, Theirs)

Filed under: Cybersecurity,Government,Security — Patrick Durusau @ 4:46 pm

Neural networks just hacked your fingerprints by Thomas McMullan.

From the post:

Fingerprints are supposed to be unique markers of a person’s identity. Detectives look for fingerprints in crime scenes. Your phone’s fingerprint sensor means only you can unlock the screen. The truth, however, is that fingerprints might not be as secure as you think – at least not in an age of machine learning.

A team of researchers has demonstrated that, with the help of neural networks, a “masterprint” can be used to fool verification systems. A masterprint, like a master key, is a fingerprint that can open many different doors. In the case of fingerprint identification, it does this by tricking a computer into thinking the print could belong to a number of different people.

“Our method is able to design a MasterPrint that a commercial fingerprint system matches to 22% of all users in a strict security setting, and 75% of all users at a looser security setting,” the researchers ­– Philip Bontrager, Julian Togelius and Nasir Memon – claim in a paper.

The tweet that brought this post to my attention didn’t seem to take this as good news.

But it is, very good news!

Think about it for a moment. Who is most likely to have “strict security settings?”

Your average cubicle dweller/home owner or …, large corporation or government entity?

What is more, if you, as a cubicle dweller, are ever accused of a breach of security, leaking fingerprint-protected files, etc., what better defense than known spoofing of fingerprints?

Not that you would be guilty of such an offense but it’s always nice to have a credible defense in addition to being innocent!

For further details:

DeepMasterPrint: Generating Fingerprints for Presentation Attacks by Philip Bontrager, Julian Togelius, Nasir Memon.

Abstract:

We present two related methods for creating MasterPrints, synthetic fingerprints that a fingerprint verification system identifies as many different people. Both methods start with training a Generative Adversarial Network (GAN) on a set of real fingerprint images. The generator network is then used to search for images that can be recognized as multiple individuals. The first method uses evolutionary optimization in the space of latent variables, and the second uses gradient-based search. Our method is able to design a MasterPrint that a commercial fingerprint system matches to 22% of all users in a strict security setting, and 75% of all users at a looser security setting.

Defeating fingerprints as “conclusive proof” of presence is an important step towards freedom for us all.
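
To make the latent-variable search in the abstract concrete, here is a minimal sketch of the idea: search a generator’s latent space for one input that a matcher accepts for many enrolled users. The generator and matcher below are toy stand-ins, not the authors’ models:

```python
# Toy version of the MasterPrint search: a (1+1) evolutionary walk through a
# stand-in generator's latent space, scoring candidates by how many enrolled
# templates a stand-in matcher accepts them for.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))          # stand-in "trained generator" weights

def generator(z):
    # Stand-in for a trained GAN generator: latent vector -> feature vector
    return np.tanh(z @ W)

def match_score(features, enrolled):
    # Stand-in for a commercial matcher: fraction of enrolled templates matched
    return float(np.mean([features @ t / len(t) > 0.1 for t in enrolled]))

enrolled = [np.sign(rng.standard_normal(64)) for _ in range(100)]

best_z = rng.standard_normal(16)
best = match_score(generator(best_z), enrolled)
for _ in range(2000):
    candidate = best_z + 0.1 * rng.standard_normal(16)
    score = match_score(generator(candidate), enrolled)
    if score > best:
        best_z, best = candidate, score

print(f"Toy 'MasterPrint' candidate matches {best:.0%} of enrolled templates")
```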

Samba Flaw In Linux PCs

Filed under: Cybersecurity,Linux OS — Patrick Durusau @ 4:04 pm

Samba Flaw Allows Hackers Access Thousands of Linux PCs Remotely

From the post:

A remote code execution vulnerability in Samba has potentially exposed a large number of Linux and UNIX machines to remote attackers. The code vulnerability (CVE-2017-7494) affects all machines with Samba versions newer than the 3.5.0 released last March 2010, making it a 7-year old flaw in the system.

Samba is a software that runs on most of the operating systems used today like Windows, UNIX, IBM, Linux, OpenVMS, and System 390. Due to its open source nature resulting from the reimplementation of the SMB (Server Message Block) networking protocol, Samba enables non-Windows operating systems like Mac OS X or GNU/Linux to give access to folders, printers, and files with Windows OS.

All affected machines can be remotely controlled by uploading a shared library to a writable program. Another command can then be used to cause the server to execute the code. This allows hackers access Linux PC remotely according to the published advisory by Samba last Wednesday, May 24.

Cited but not linked in that article: the Rapid7 Community post in particular has good details.

Not likely a repeat of WannaCry. It’s hard to imagine NHS trusts running Linux.

😉
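
If you administer Linux or UNIX boxes, a quick comparison of the installed Samba version against the affected range is worth a minute; a rough sketch, assuming smbd is on the PATH and using the first fixed releases (4.4.14, 4.5.10, 4.6.4) named in the advisory:

```python
# Rough local check against CVE-2017-7494: affected from 3.5.0 onward,
# fixed in 4.4.14 / 4.5.10 / 4.6.4. Branches newer than 4.6 post-date the
# fix and are treated here as patched.
import re
import subprocess

out = subprocess.run(["smbd", "--version"], capture_output=True, text=True).stdout
m = re.search(r"(\d+)\.(\d+)\.(\d+)", out)
if not m:
    raise SystemExit(f"Could not parse a Samba version from: {out!r}")

version = tuple(int(x) for x in m.groups())
fixed_in = {(4, 6): (4, 6, 4), (4, 5): (4, 5, 10), (4, 4): (4, 4, 14)}
fix = fixed_in.get(version[:2])

vulnerable = ((3, 5, 0) <= version and version[:2] <= (4, 6)
              and (fix is None or version < fix))
print(f"Samba {'.'.join(map(str, version))}: "
      f"{'likely in the affected range' if vulnerable else 'outside the affected range or already patched'}")
```

The advisory also describes a workaround for hosts that can’t be patched promptly, at the cost of some Windows client functionality.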

Banking Malware Tip: Don’t Kill The Goose

Filed under: Cybersecurity,Security — Patrick Durusau @ 1:56 pm

Dridex: A History of Evolution by Nikita Slepogin.

From the post:

The Dridex banking Trojan, which has become a major financial cyberthreat in the past years (in 2015, the damage done by the Trojan was estimated at over $40 million), stands apart from other malware because it has continually evolved and become more sophisticated since it made its first appearance in 2011. Dridex has been able to escape justice for so long by hiding its main command-and-control (C&C) servers behind proxying layers. Given that old versions stop working when new ones appear and that each new improvement is one more step forward in the systematic development of the malware, it can be concluded that the same people have been involved in the Trojan’s development this entire time. Below we provide a brief overview of the Trojan’s evolution over six years, as well as some technical details on its latest versions.

Compared to the 2015 GDP of the United States at ~$18 trillion, the ~$40 million damage from Dridex is a rounding error.

The Dridex authors are not killing the goose that lays golden eggs.

Compare the WannaCry ransomware attack, which provoked a worldwide, all hands on deck response, including Microsoft releasing free patches for unsupported software!

Maybe you can breach an FBI file server and dump its contents to Pastebin. That attracts a lot of attention and is likely to be your only breach of that server.

Strategy is as important in cyberwarfare as in more traditional warfare.

Critical: Draw Coffee Cup In TeX/LaTeX

Filed under: Humor,TeX/LaTeX — Patrick Durusau @ 10:12 am

How to draw a coffee cup.

I’m sure everyone who has ever seen a post, article, book on TeX/LaTeX has lost sleep over how to draw a coffee cup.

Thanks to a tweet from @TeXtip, we can all rest easier. Or at least be bothered by other problems.

@TeXtip points to answers to vexing questions such as how to draw a coffee cup and acts as a reminder to use a little TeX/LaTeX everyday.

Enjoy!

Sanborn Fire Insurance Maps Now Online (25K, Goal: ~500K)

Filed under: Library,Mapping,Maps — Patrick Durusau @ 9:59 am

Sanborn Fire Insurance Maps Now Online

From the post:

The Library of Congress has placed online nearly 25,000 Sanborn Fire Insurance Maps, which depict the structure and use of buildings in U.S. cities and towns. Maps will be added monthly until 2020, for a total of approximately 500,000.

The online collection now features maps published prior to 1900. The states available include Arizona, Arkansas, Colorado, Delaware, Iowa, Kentucky, Louisiana, Michigan, Nebraska, Nevada, North Dakota, South Dakota, Vermont, Wisconsin and Wyoming. Alaska is also online, with maps published through the early 1960s. By 2020, all the states will be online, showing maps from the late 1880s through the early 1960s.

In collaboration with the Library’s Geography and Map Division, Historical Information Gatherers digitized the Sanborn Fire Insurance Maps during a 16-month period at the Library of Congress. The Library is in the process of adding metadata and placing the digitized, public-domain maps on its website.

The Sanborn Fire Insurance Maps are a valuable resource for genealogists, historians, urban planners, teachers or anyone with a personal connection to a community, street or building. The maps depict more than 12,000 American towns and cities. They show the size, shape and construction materials of dwellings, commercial buildings, factories and other structures. They indicate both the names and width of streets, and show property boundaries and how individual buildings were used. House and block numbers are identified. They also show the location of water mains, fire alarm boxes and fire hydrants.

In the 19th century, specialized maps were originally prepared for the exclusive use of fire insurance companies and underwriters. Those companies needed accurate, current and detailed information about the properties they were insuring. The Sanborn Map Company was created around 1866 in the United States in response to this need and began publishing and registering maps for copyright. The Library of Congress acquired the maps through copyright deposit, and the collection grew to 700,000 individual sheets. The insurance industry eventually phased out use of the maps and Sanborn stopped producing updates in the late 1970s.

The Sanborn Maps Collection.

From the collection page:


Fire insurance maps are distinctive because of the sophisticated set of symbols that allows complex information to be conveyed clearly. In working with insurance maps, it is important to remember that they were made for a very specific use, and that although they are now valuable for a variety of purposes, the insurance industry dictated the selection of information to be mapped and the way that information was portrayed. Knowledge of the keys and colors is essential to proper interpretation of the information found in fire insurance maps.

The collection page relates that the keys and the use of the keys change over time, so use of a topic map with scoping topics is highly recommended.
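
For readers new to scoping, a minimal sketch of what it buys you here: the same symbol on a Sanborn sheet as a topic whose meaning depends on which edition’s key is in scope. The symbols and date ranges below are invented for illustration, not taken from Sanborn’s actual keys:

```python
# Toy illustration of scoped names: one map symbol, different meanings
# depending on which edition's key is in scope. All values are made up.
symbol_key = {
    "X": [
        {"scope": ("1884", "1899"), "meaning": "stable or shed"},
        {"scope": ("1900", "1923"), "meaning": "auto garage"},
    ],
}

def lookup(symbol, year):
    for entry in symbol_key.get(symbol, []):
        start, end = entry["scope"]
        if start <= year <= end:
            return entry["meaning"]
    return "unknown for that edition"

print(lookup("X", "1895"))   # stable or shed
print(lookup("X", "1910"))   # auto garage
```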

There aren’t many maps for Georgia, but my hometown in Louisiana has good coverage through 1900. My reasoning: roughly knowing the geography and history of an area will help with map interpretation.

Enjoy!

How Not To Be Wrong

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:03 am

How Not To Be Wrong by Winny de Jong.

From the post:

At the intersection of data and journalism, lots can go wrong. Merely taking precautions might not be enough.

“It’s very well possible that your story is true but wrong,” New York Times data journalist Robert Gebeloff explained at the European Investigative Journalism Conference & Dataharvest, which was recently held in Mechelen, a city 20 minutes outside of Brussels.

“When I work on a big story, I want to know everything about the topic.” To make sure he doesn’t miss out, Gebeloff gets all the data sources he can, examines it in all relevant ways and publishes only what he believes to be true.

The best part of this post is the distillation of Gebeloff’s presentation into a How Not To Be Wrong Checklist.

De Jong’s checklist is remarkably similar to requirements for replication of experiments in science.

It would make a great PDF file to share with data scientists in general.

May 24, 2017

Leaking Photos Of: “Sophisticated Bomb Parts”

Filed under: Journalism,News,Reporting,Terrorism — Patrick Durusau @ 4:45 pm

Theresa May to tackle Donald Trump over Manchester bombing evidence by Heather Stewart, Robert Booth and Vikram Dodd.

From the post:


British officials were infuriated on Wednesday when the New York Times published forensic photographs of sophisticated bomb parts that UK authorities fear could complicate the expanding investigation into the lethal blast in which five further arrests have been made in the UK and two more in Libya.

See for yourself: Found at the Scene in Manchester: Shrapnel, a Backpack and a Battery by C. J. Chivers.

Let’s see, remains of a backpack, detonator, metal scrap, battery.

Do you see any sophisticated bomb parts?

Sophistication, skill, encryption, etc., are emphasized after terrorist attacks, I assume to excuse the failure of authorities to prevent such attacks.

That’s more generous than assuming UK authorities are so untrained they consider this a “sophisticated” bomb. Just guessing from the parts, hardly.

“Click Bait” at The Kicker – Covering Manchester

Filed under: Journalism,News,Reporting,Terrorism — Patrick Durusau @ 4:10 pm

The Kicker: The media’s model for covering terrorist attacks is broken by Pete Vernon.

From the webpage:

ON THE LATEST EPISODE of The Kicker, we run through some of the week’s biggest media stories, including a ratings leaderboard shakeup for cable news, a spurious conspiracy that consumed the right-wing media universe, and a new study that says–surprise–journalists drink too much caffeine and alcohol. Then, we move on to the media coverage of the terrorist attack in Manchester, and tackle why we think the industry’s model for covering terror attacks is broken. Finally, CJR’s David Uberti interviews Clara Jeffery, editor in chief of Mother Jones. They discuss the magazine’s novel approach to funding its political coverage as well as the role Mother Jones played in breaking the Trump-Russia story.

Subscribe via iTunes · Stitcher · RSS Feed · SoundCloud.

The podcast.

Leading with the promise of The media’s model for covering terrorist attacks is broken, I listened to The Kicker today.

If you like podcasts, you will like The Kicker, but it illustrates for me the difficulties associated with podcasts.

First, the podcast covered five separate stories in a little over thirty minutes, ranging from cable news ratings, Seth Rich and fake news, and the drinking habits of journalists to the media model for terrorist coverage (the story of interest to me) and the role of Mother Jones in the continuing From Russia With Love connection to Donald Trump.

As “click bait” for the podcast, the media reporting on terrorism segment starts at approximately 8:20 and ends at approximately 16:50, some 8 minutes and 30 seconds of coverage, much shorter than the account concerning Mother Jones (16:49 – 31:14).

Second, what discussion occurred included insights such as “…breaking news rooms, larger news rooms, don’t have the privilege of deciding whether to cover a story…?” To be fair, that was followed by discussions of “how to cover stories,” the use of raw/unexplained user video, and the appropriateness of experts discussing politics immediately following such events.

The point that got dropped in the podcast was Christie Chisholm‘s remark:

…breaking news rooms, larger news rooms, don’t have the privilege of deciding whether to cover a story…

Why so?

I may be reading entirely too much into Christie’s comment, but it implies that some news rooms must fill N minutes of coverage on breaking events, whether there is meaningful content to be delivered or not. Yes?

If that is the case, that coverage of breaking events requires wall-to-wall coverage for N minutes, then raw, unexplained video, expert opinions with no facts, reporters asking for each other’s reactions, and spontaneous speculation and condemnation become easily explainable.

There is too little content and too much media time available to cover it.

Building on Christie’s insight, The Kicker could have created a timeline of “facts” with regard to the explosion in Manchester as a way to illustrate when facts became known about the explosion and contrast that with the drone of factless coverage of the event.

That would have made a rocking podcast and a pointed one at that.

PS: The podcast did discuss other issues with media coverage of Manchester but the lack of depth and time prevented substantive analysis or proposals. Media coverage of terrorist events certainly merits extended treatment by podcast or otherwise.

Music Encoding Initiative

Filed under: Music,Music Retrieval,Text Encoding Initiative (TEI),Texts — Patrick Durusau @ 12:26 pm

Music Encoding Initiative

From the homepage:

The Music Encoding Initiative (MEI) is an open-source effort to define a system for encoding musical documents in a machine-readable structure. MEI brings together specialists from various music research communities, including technologists, librarians, historians, and theorists in a common effort to define best practices for representing a broad range of musical documents and structures. The results of these discussions are formalized in the MEI schema, a core set of rules for recording physical and intellectual characteristics of music notation documents expressed as an eXtensible Markup Language (XML) schema. It is complemented by the MEI Guidelines, which provide detailed explanations of the components of the MEI model and best practices suggestions.

MEI is hosted by the Akademie der Wissenschaften und der Literatur, Mainz. The Mainz Academy coordinates basic research in musicology through editorial long-term projects. This includes the complete works of composers from Brahms to Weber. Each of these (currently 15) projects has a duration of at least 15 years, and some (like Haydn, Händel and Gluck) are running since the 1950s. Therefore, the Academy is one of the most prominent institutions in the field of scholarly music editing. Several Academy projects are using MEI already (c.f. projects), and the Academy’s interest in MEI is a clear recommendation to use standards like MEI and TEI in such projects.

This website provides a Gentle Introduction to MEI, introductory training material, and information on projects and tools that utilize MEI. The latest MEI news, including information about additional opportunities for learning about MEI, is displayed on this page.

If you want to become an active MEI member, you’re invited to read more about the community and then join us on the MEI-L mailing list.

Any project that cites and relies upon Standard Music Description Language (SMDL) merits a mention on my blog!

If you are interested in encoding music or just complex encoding challenges in general, MEI merits your attention.

May 23, 2017

China Draws Wrong Lesson from WannaCry Ransomware

Filed under: Cybersecurity,Government,NSA,Open Source,Security — Patrick Durusau @ 7:48 pm

Chinese state media says US should take some blame for cyberattack

From the post:


China’s cyber authorities have repeatedly pushed for what they call a more “equitable” balance in global cyber governance, criticizing U.S. dominance.

The China Daily pointed to the U.S. ban on Chinese telecommunication provider Huawei Technologies Co Ltd, saying the curbs were hypocritical given the NSA leak.

Beijing has previously said the proliferation of fake news on U.S. social media sites, which are largely banned in China, is a reason to tighten global cyber governance.

The newspaper said that the role of the U.S. security apparatus in the attack should “instill greater urgency” in China’s mission to replace foreign technology with its own.

The state-run People’s Daily compared the cyber attack to the terrorist hacking depicted in the U.S. film “Die Hard 4”, warning that China’s role in global trade and internet connectivity opened it to increased risks from overseas.

China is certainly correct to demand a place at the table for itself and other world powers in global cyber governance.

But China is drawing the wrong lesson from the WannaCry ransomware attacks if that is used as a motivation for closed source Chinese software to replace “foreign” technology.

NSA staffers may well be working for Microsoft and/or Oracle, embedding NSA produced code in their products. With closed source code, it isn’t possible to verify the absence of such code or to prevent its introduction.

Sadly, the same is true if closed source code is written by Chinese programmers, some of whom may have agendas, domestic or foreign, of their own.

The only defense to rogue code is to invest in open source projects. Not everyone will read every line of code, but being available to be read is a deterrent to obvious subversion of an application’s security.

China should have “greater urgency” to abandon closed source software, but investing in domestic closed source only replicates the mistake of investing in foreign closed source software.

Open source projects cover every office, business and scientific need.

Chinese government support for Chinese participation in existing and new open source projects can make these projects competitors to closed source and potential spyware products.

The U.S. made the closed source mistake for critical cyber infrastructure. China should not make the same mistake.

Fiscal Year 2018 Budget

Filed under: Government,Government Data,Politics,Transparency — Patrick Durusau @ 7:23 pm

Fiscal Year 2018 Budget.

In the best pay-to-play tradition, the Government Printing Office (GPO) has these volumes for sale:

America First: A Budget Blueprint To Make America Great Again By: Executive Office of the President, Office of Management and Budget. GPO Stock # 041-001-00719-9 ISBN: 9780160937620. Price: $10.00.

Budget of the United States Government, FY 2018 (Paperback Book) By: Executive Office of the President, Office of Management and Budget. GPO Stock # 041-001-00723-7 ISBN: 9780160939228. Price: $38.00.

Appendix, Budget of the United States Government, FY 2018 By: Executive Office of the President, Office of Management and Budget GPO Stock # 041-001-00720-2 ISBN: 9780160939334. Price: $79.00.

Budget of the United States Government, FY 2018 (CD-ROM) By: Executive Office of the President, Office of Management and Budget GPO Stock # 041-001-00722-9 ISBN: 9780160939358. Price: $29.00.

Analytical Perspectives, Budget of the United States Government, FY 2018 By: Executive Office of the President, Office of Management and Budget. GPO Stock # 041-001-00721-1 ISBN: 9780160939341. Price: $56.00.

Major Savings and Reforms: Budget of the United States Government, Fiscal Year 2018 By: Executive Office of the President, Office of Management and Budget. GPO Stock # 041-001-00724-5 ISBN: 9780160939457. Price: $35.00.

If someone doesn’t beat me to it (very likely), I will be either uploading the CD-ROM and/or pointing you to a location with the contents of the CD-ROM.

As citizens, whether you voted or not, you should have the opportunity to verify news accounts, charges and counter-charges with regard to the budget.

C Reference Manual (D.M. Ritchie, 1974)

Filed under: C/C++,Documentation,Programming — Patrick Durusau @ 4:19 pm

C Reference Manual (D.M. Ritchie, 1974)

I mention the C Reference Manual, now forty-three (43) years old, as encouragement to write good documentation.

It may have a longer life than you ever expected!

For example, in 1974 Ritchie writes:

2.2 Identifier (Names)

An identifier is a sequence of letters and digits: the first character must be alphabetic.

Which we find replicated years later in ISO/IEC 8879 : 1986 (SGML):

4.198 name: A name token whose first character is a name start character.

4.201 name start character: A character that can begin a name: letters and others designated by the concrete syntax.

And in production [53]:


name start character =
LC Letter |
UC Letter |
LCNMSTRT |
UCNMSTRT

Where Figure 1 of 9.2.1 SGML Character defines LC Letter as a-z, UC Letter as A-Z, LCNMSTRT as (none), UCNMSTRT as (none), in the concrete syntax.

And in 1997, the letter vs. digit distinction finds its way into Extensible Markup Language (XML) 1.0.


[4] NameChar ::= Letter | Digit | ‘.’ | ‘-‘ | ‘_’ | ‘:’ | CombiningChar | Extender
[5] Name ::= (Letter | ‘_’ | ‘:’) (NameChar)*

“Letter” is a link to a production referencing all the qualifying Unicode characters which is too long to include here.

What started off as an arbitrary choice, “alphabetic” characters as name start characters in 1974, is picked up some 12 years later (1986) in ISO/IEC 8879 (SGML), both of which were bound by a restricted character set.

When the opportunity came to abandon the letter versus digit distinction in name start characters (XML 1.0), the result is a larger character repertoire for name start characters, but digits continue as second-class citizens.
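
To see the inherited rule next to XML’s relaxation of it, here is a minimal sketch; the patterns are ASCII-only simplifications (real XML 1.0 names draw on a much larger Unicode repertoire), and the 1974 manual counts the underscore as alphabetic:

```python
# The 1974-style identifier rule vs. a simplified XML 1.0 name rule.
# Both exclude digits as the first character; only XML admits '.', '-', ':'.
import re

c_identifier = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")        # letters/digits, alphabetic start
xml_name     = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*\Z")    # ASCII slice of productions [4]/[5]

for candidate in ("price", "_tmp", "2nd-quarter", "ns:part", "part-2"):
    print(f"{candidate:12} C: {bool(c_identifier.match(candidate))!s:5} "
          f"XML: {bool(xml_name.match(candidate))}")
```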

Can you point to an explanation of why Ritchie preferred alphabetic characters over digits for name start characters?

The power of algorithms and how to investigate them (w/ resources)

Filed under: Algorithms,Journalism,News,Reporting — Patrick Durusau @ 2:03 pm

The power of algorithms and how to investigate them by Katrien Vanherck.

From the post:

Most Americans these days get their main news from Google or Facebook, two tools that rely heavily on algorithms. A study in 2015 showed that the way a search engine like Google selects and prioritises search results on political candidates can have an influence on voters’ preferences.

Similarly, it has been shown that by tweaking the algorithms behind the Facebook newsfeed, the turnout of voters in American elections can be influenced. If Marc Zuckerberg were ever to run for president, he would theoretically have an enormously powerful tool at his disposal. (Note: as recent article in The Guardian investigated the misuse of big data and social media in the context of the Brexit referendum).

Algorithms are everywhere in our everyday life and are exerting a lot of power in our society. They prioritise, classify, connect and filter information, automatically making decisions on our behalf all the time. But as long as the algorithms remain a ‘black box’, we don’t know exactly how these decisions are made.

Are these algorithms always fair? Examples of possible racial bias in algorithms include the risk analysis score that is calculated for prisoners that are up for parole or release (white people appear to get more favourable scores more often) and the service quality of Uber in Washington DC (waiting times are shorter in predominantly white neighbourhoods). Maybe such unfair results are not only due to the algorithms, but the lack of transparency remains a concern.

So what is going on in these algorithms, and how can we make them more accountable?
… (emphasis in original)

A great inspirational keynote but short on details for investigation of algorithms.

Such as failing to mention that the algorithms of both Google and Facebook are secret.

Reverse engineering those from results would be a neat trick.

Google would be the easier of the two, since you could script searches domain by domain with a list of search terms to build up a data set of its results. That would not result in the algorithm per se but you could detect some of its contours.
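
A minimal sketch of that kind of collection, assuming you go through Google’s Custom Search JSON API rather than scraping result pages; the API key, engine ID and query list are placeholders:

```python
# Collect top results for a list of queries so orderings can be compared over
# time or across query phrasings. Key, engine ID and queries are placeholders.
import json
import time

import requests

API_KEY = "YOUR_API_KEY"            # placeholder
ENGINE_ID = "YOUR_ENGINE_ID"        # placeholder
QUERIES = ["candidate a policy", "candidate b policy"]   # assumed search terms

dataset = {}
for query in QUERIES:
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": ENGINE_ID, "q": query, "num": 10},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    dataset[query] = [(rank + 1, item["link"]) for rank, item in enumerate(items)]
    time.sleep(1)                   # stay well under the API's rate limits

print(json.dumps(dataset, indent=2))
```

One caveat: a Custom Search engine’s ranking is not guaranteed to mirror google.com’s, so treat any contours you find as indicative rather than definitive.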

Google has been accused of liberal bias, Who would Google vote for? An analysis of political bias in internet search engine results, bias in favor of Hillary Clinton, Google defends its search engine against charges it favors Clinton, and, bias in favor of the right wing, How Google’s search algorithm spreads false information with a rightwing bias.

To the extent you identify Hillary Clinton with the rightwing, those results may be expressions of the same bias.

In any event, you can discern from those studies some likely techniques to use in testing Google search/auto-completion results.

Facebook is harder because you don’t have access to or control over the content it is manipulating for delivery, although by manipulating social media identities you could test and compare the content that Facebook delivers.

May 22, 2017

Breaking News Consumer’s Handbook

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:18 pm

From a tweet by @onthemedia, see their website: onthemedia.org.

If you follow #2:

2. Don’t trust anonymous sources.

Skip political reports in the New York Times and Washington Post.

Is there a market for delayed news?

I ask because I understand there was an explosion in Manchester Arena in England, 10:35 PM their local. Even as I type this, mis-information is flooding social media channels from any number of sources.

What if there was a news service with a variable delay, say minimum 7 days but maximum of 14 days, that delivered a coherent and summarized version of breaking events?

As opposed to the click-bait teasers that get shared/forwarded/re-tweeted without anyone reading the mis-information behind the click-bait.

Weaponizing GPUs (Terrorism)

Filed under: Deep Learning,GPU,NVIDIA,Terrorism — Patrick Durusau @ 8:54 pm

Nvidia reports in: Modeling Cities in 3D Using Only Image Data:

ETH Zurich scientists leveraged deep learning to automatically stitch together millions of public images and video into a three-dimensional, living model of the city of Zurich.

The platform called “VarCity” combines a variety of different image sources: aerial photographs, 360-degree panoramic images taken from vehicles, photos published by tourists on social networks and video material from YouTube and public webcams.

“The more images and videos the platform can evaluate, the more precise the model becomes,” says Kenneth Vanhoey, a postdoc in the group led by Luc Van Gool, a Professor at ETH Zurich’s Computer Vision Lab. “The aim of our project was to develop the algorithms for such 3D city models, assuming that the volume of available images and videos will also increase dramatically in the years ahead.”

Using a cluster of GPUs including Tesla K40s with cuDNN to train their deep learning models, the technology recognizes image content such as buildings, windows and doors, streets, bodies of water, people, and cars. Without human assistance, the 3D model “knows”, for example, what pavements are and – by evaluating webcam data – which streets are one-way only.

The data/information gap between nation states and non-nation state groups grows narrower everyday. Here, GPUs and deep learning, produce planning data terrorists could have only dreamed about twenty years ago.

Technical advances make precautions such as:

Federal, state, and local law enforcement let people know that if they take pictures or notes around monuments and critical infrastructure facilities, they could be subject to an interrogation or an arrest; in addition to the See Something, Say Something awareness campaign, DHS also has broader initiatives such as the Buffer Zone Protection Program, which teach local police and security how to spot potential terrorist activities. (DHS focus on suspicious activity at critical infrastructure facilities)

sound old fashioned and quaint.

Such measures annoy tourists, but unless potential terrorists are as dumb as the underwear bomber, they accomplish little against a skilled adversary.

I guess that’s the question isn’t it?

Are you planning to fight terrorists from shallow end of the gene pool or someone a little more challenging?

The Secrets of Technical Writing

Filed under: Documentation,Writing — Patrick Durusau @ 8:26 pm

The Secrets of Technical Writing by Matthew Johnston.

From the post:

The process of writing code, building apps, or developing websites is always evolving, with improvements in coding tools and practices constantly arriving. But one aspect hasn’t really been brought along for the journey, passed-by in the democratisation of learning that the internet has brought about, and that’s the idea of writing about code.

Technical writing is one of the darkest of dark arts in the domain of code development: you won’t find too many people talking about it, you won’t find too many great examples of it, and even hugely successful tech companies have no idea how to handle it.

So, in an effort to change that, I’m going to share with you what I’ve learnt about technical writing from building Facebook’s Platform docs, providing documentation assistance to their Open Source projects, and creating a large, multi-part tutorial for Facebook’s F8 conference in 2016. When I talk about the struggles of writing docs, I’ve seen it happen at the biggest and best of tech companies, and I’ve experienced how difficult it can be to get it right.

These tips aren’t perfect, they aren’t applicable to everything, and I’m not at an expert-level of technical writing, but I think it’s important to share thoughts on this, and help bring technical writing up to par with the rest of code development.

Note that this is from the perspective of writing technical docs, it can just as easily apply to shorter tutorials, blog posts, presentations, or talks.

The best tip of the lot: start early! Don’t wait until just before launch to hack some documentation together.

If you don’t have the cycles, I know someone who might. 😉

May 21, 2017

More Dicking With The NSA

Filed under: Cybersecurity,NSA,Privacy,Tails — Patrick Durusau @ 9:01 pm

Privacy-focused Debian 9 ‘Stretch’ Linux-based operating system Tails 3.0 reaches RC status by Brian Fagioli.

From the post:

If you want to keep the government and other people out of your business when surfing the web, Tails is an excellent choice. The Linux-based operating system exists solely for privacy purposes. It is designed to run from read-only media such as a DVD, so that there are limited possibilities of leaving a trail. Of course, even though it isn’t ideal, you can run it from a USB flash drive too, as optical drives have largely fallen out of favor with consumers.

Today, Tails achieves an important milestone. Version 3.0 reaches RC status — meaning the first release candidate (RC1). In other words, it may soon be ready for a stable release — if testing confirms as much. If you want to test it and provide feedback, you can download the ISO now.

Fagioli covers some of the details but the real story is this:

The sooner testers (that can include you) confirm the stability, etc., of Tails Version 3.0 (RC1), the sooner it can be released for general use.

In part, the release schedule for Tails Version 3.0 (RC1) depends on you.

Your response?

Check Fagioli’s post for links to the release and docs.

immersive linear algebra

Filed under: Algebra,Mathematics — Patrick Durusau @ 4:56 pm

immersive linear algebra by J. Ström, K. Åström, and T. Akenine-Möller.

Billed as:

The world’s first linear algebra book with fully interactive figures.

From the preface:

“A picture says more than a thousand words” is a common expression, and for text books, it is often the case that a figure or an illustration can replace a large number of words as well. However, we believe that an interactive illustration can say even more, and that is why we have decided to build our linear algebra book around such illustrations. We believe that these figures make it easier and faster to digest and to learn linear algebra (which would be the case for many other mathematical books as well, for that matter). In addition, we have added some more features (e.g., popup windows for common linear algebra terms) to our book, and we believe that those features will make it easier and faster to read and understand as well.

After using linear algebra for 20 years times three persons, we were ready to write a linear algebra book that we think will make it substantially easier to learn and to teach linear algebra. In addition, the technology of mobile devices and web browsers have improved beyond a certain threshold, so that this book could be put together in a very novel and innovative way (we think). The idea is to start each chapter with an intuitive concrete example that practically shows how the math works using interactive illustrations. After that, the more formal math is introduced, and the concepts are generalized and sometimes made more abstract. We believe it is easier to understand the entire topic of linear algebra with a simple and concrete example cemented into the reader’s mind in the beginning of each chapter.

Please contact us if there are errors to report, things that you think should be improved, or if you have ideas for better exercises etc. We sincerely look forward to hearing from you, and we will continuously improve this book, and add contributing people to the acknowledgement.
… (popups omitted)

Unlike some standards I could mention, but won’t, the authors number just about everything, making it easy to reference equations, illustrations, etc.

Enjoy!
