If You Believe in Parliaments

July 19th, 2017

If you believe in parliaments, other than as examples of how governments don’t “get it,” then the Law Library of Congress, Global Legal Research Center, has a treat for you!

Fifty (50) countries and seventy (70) websites surveyed in: Features of Parliamentary Websites in Selected Jurisdictions.

From the summary:

In recent years, parliaments around the world have enhanced their websites in order to improve access to legislative information and other parliamentary resources. Innovative features allow constituents and researchers to locate and utilize detailed information on laws and lawmaking in various ways. These include tracking tools and alerts, apps, the use of open data technology, and different search functions. In order to demonstrate some of the developments in this area, staff from the Global Legal Research Directorate of the Law Library of Congress surveyed the official parliamentary websites of fifty countries from all regions of the world, plus the website of the European Parliament. In some cases, information on more than one website is provided where separate sites have been established for different chambers of the national parliament, bringing the total number of individual websites surveyed to seventy.

While the information on the parliamentary websites is primarily in the national language of the particular country, around forty of the individual websites surveyed were found to provide at least limited information in one or more other languages. The European Parliament website can be translated into any of the twenty-four official languages of the members of the European Union.

All of the parliamentary websites included in the survey have at least basic browse tools that allow users to view legislation in a list format, and that may allow for viewing in, for example, date or title order. All of the substantive websites also enable searching, often providing a general search box for the whole site at the top of each page as well as more advanced search options for different types of documents. Some sites provide various facets that can be used to further narrow searches.

Around thirty-nine of the individual websites surveyed provide users with some form of tracking or alert function to receive updates on certain documents (including proposed legislation), parliamentary news, committee activities, or other aspects of the website. This includes the ability to subscribe to different RSS feeds and/or email alerts.

The ability to watch live or recorded proceedings of different parliaments, including debates within the relevant chamber as well as committee hearings, is a common feature of the parliamentary websites surveyed. Fifty-eight of the websites surveyed featured some form of video, including links to dedicated YouTube channels, specific pages where users can browse and search for embedded videos, and separate video services or portals that are linked to or viewable from the main site. Some countries also make videos available on dedicated mobile-friendly sites or apps, including Denmark, Germany, Ireland, the Netherlands, and New Zealand.

In total, apps containing parliamentary information are provided in just fourteen of the countries surveyed. In comparison, the parliamentary websites of thirty countries are available in mobile-friendly formats, enabling easy access to information and different functionalities using smartphones and tablets.

The table also provides information on some of the additional special features available on the surveyed websites. Examples include dedicated sites or pages that provide educational information about the parliament for children (Argentina, El Salvador, Germany, Israel, Netherlands, Spain, Taiwan, Turkey); calendar functions, including those that allow users to save information to their personal calendars or otherwise view information about different types of proceedings or events (available on at least twenty websites); and open data portals or other features that allow information to be downloaded in bulk for reuse or analysis, including through the use of APIs (application programming interfaces) (at least six countries).

With differing legal vocabularies and multi-nationals appearing under different local identities, this survey is a starting point for transparency work based upon topic maps.

I first saw this in a tweet by the Global Investigative Journalism Network (GIJN).

Twitter – Government Censor’s Friend

July 15th, 2017

Governments that keep secrets from the public, whether democratic, non-democratic, monarchical, etc., share a common enemy in Wikileaks.

Wikileaks self-describes in part as:

WikiLeaks is a multi-national media organization and associated library. It was founded by its publisher Julian Assange in 2006.

WikiLeaks specializes in the analysis and publication of large datasets of censored or otherwise restricted official materials involving war, spying and corruption. It has so far published more than 10 million documents and associated analyses.

“WikiLeaks is a giant library of the world’s most persecuted documents. We give asylum to these documents, we analyze them, we promote them and we obtain more.” – Julian Assange, Der Spiegel Interview.

WikiLeaks has contractual relationships and secure communications paths to more than 100 major media organizations from around the world. This gives WikiLeaks sources negotiating power, impact and technical protections that would otherwise be difficult or impossible to achieve.

Although no organization can hope to have a perfect record forever, thus far WikiLeaks has a perfect record in document authentication and resistance to all censorship attempts.

Those same governments share a common ally in Twitter, which has engaged in systematic actions to diminish the presence/influence of Julian Assange on Twitter.

Caitlin Johnstone documents Twitter’s intentional campaign against Assange in Twitter Is Using Account Verification To Stifle Leaks And Promote War Propaganda.

Catch Johnstone’s post for the details but then:

  1. Follow @JulianAssange on Twitter (watch for minor variations that are not this account).
  2. Tweet to your followers, at least once a week, urging them to follow @JulianAssange
  3. Investigate and support non-censoring alternatives to Twitter.

You can verify Twitter’s dilution of Julian Assange for yourself.

Type “JulianAssange_” in the Twitter search box (my results):

Twitter was a remarkably good idea, but has long since poisoned itself with censorship and pettiness.

Your suggested alternative?

Next Office of Personnel Management (OPM) Leak, When, Not If

July 14th, 2017

2 Years After Massive Breach, OPM Isn’t Sufficiently Vetting IT Systems by Joseph Marks.

From the post:

More than two years after suffering a massive data breach, the Office of Personnel Management still isn’t sufficiently vetting many of its information systems, an auditor found.

In some cases, OPM is past due to re-authorize IT systems, the inspector general’s audit said. In other cases, OPM did reauthorize those systems but did it in a haphazard and shoddy way during a 2016 “authorization sprint,” the IG said.

“The lack of a valid authorization does not necessarily mean that a system is insecure,” the auditors said. “However, it does mean that a system is at a significantly higher risk of containing unidentified security vulnerabilities.”

The full audit provides more details but suffice it to say OPM security is as farcical as ever.

Do you think use of https://www.opm.gov/ in hacking examples and scripts would call greater attention to flaws at the OPM?

Detecting Leaky AWS Buckets

July 14th, 2017

Experts Warn Too Often AWS S3 Buckets Are Misconfigured, Leak Data by Tom Spring.

From the post:

A rash of misconfigured Amazon Web Services storage servers leaking data to the internet have plagued companies recently. Earlier this week, data belonging to anywhere between six million and 14 million Verizon customers were left on an unprotected server belonging to a partner of the telecommunications firm. Last week, wrestling giant World Wrestling Entertainment accidentally exposed personal data of three million fans. In both cases, it was reported that data was stored on AWS S3 storage buckets.

Reasons why this keeps on happening vary. But, Detectify Labs believes many leaky servers trace back to common errors when it comes to setting up access controls for AWS Simple Storage Service (S3) buckets.

In a report released Thursday, Detectify’s Security Advisor Frans Rosén said network administrators too often gloss over rules for configuring AWS’ Access Control Lists (ACL) and the results are disastrous.

Jump to the report released Thursday for the juicy details.

Any thoughts on the going rate for discovery of leaky AWS buckets?

Could be something, could be nothing.

In any event, you should be checking your own AWS buckets.
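If your own buckets are in scope, a first pass can be a short script. A minimal sketch, assuming boto3 is installed and AWS credentials are configured:

  import boto3

  # canned group URIs that make a bucket world-accessible
  PUBLIC_GRANTEES = {
      "http://acs.amazonaws.com/groups/global/AllUsers",
      "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
  }

  s3 = boto3.client("s3")
  for bucket in s3.list_buckets()["Buckets"]:
      name = bucket["Name"]
      for grant in s3.get_bucket_acl(Bucket=name)["Grants"]:
          grantee = grant["Grantee"]
          if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GRANTEES:
              print(name, grant["Permission"], grantee["URI"])

This only covers bucket-level ACLs, the misconfiguration Rosén describes; object-level ACLs and bucket policies deserve the same scrutiny.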

Successful Phishing Subject Lines

July 14th, 2017

Gone Phishing: The Top 10 Attractive Lures by Roy Urrico.

From the post:

The list shows there’s still a lot of room to train employees on how to spot a phishing or spoofed email. Here they are:

  • Security Alert – 21%
  • Revised Vacation and Sick Time Policy – 14%
  • UPS Label Delivery 1ZBE312TNY00015011 – 10%
  • BREAKING: United Airlines Passenger Dies from Brain Hemorrhage – VIDEO – 10%
  • A Delivery Attempt was made – 10%
  • All Employees: Update your Healthcare Info – 9%
  • Change of Password Required Immediately – 8%
  • Password Check Required Immediately – 7%
  • Unusual sign-in activity – 6%
  • Urgent Action Required – 6%

*Capitalization is as it was in the phishing test subject line

A puff piece for KnowBe4 but a good starting point. KnowBe4 has an online phishing test among others. The phishing test requires registration.

Enjoy!

Targets of Government Cybercriminal Units

July 14th, 2017

The Unfortunate Many: How Nation States Select Targets

From the post:

Key Takeaways

  • It’s safe to assume that all governments are developing and deploying cyber capabilities at some level. It’s also safe to assume most governments are far from open about the extent of their cyber activity.
  • If you take the time to understand why nation states get involved with cyber activity in the first place, you’ll find their attacks are much more predictable than they seem.
  • Each nation state has its own objectives and motivations for cyber activity. Even amongst big players like China, Russia, and the U.S. there’s a lot of variation.
  • Most nation states develop national five-year plans that inform all their cyber activities. Understanding these plans enables an organization to prioritize preparations for the most likely threats.

There’s a name for those who rely on governments, national or otherwise, to protect their cybersecurity: victims.

Recorded Future gives a quick overview of factors that may drive the objectives of government cybercriminal units.

I use “cybercriminal units” to avoid the false dichotomy between alleged “legitimate” government hacking and that of other governments and individuals.

We’re all adults here and realize government is a particular distribution of reward and stripes, nothing more. It has no vision, no goal beyond self-preservation and certainly, beyond your locally owned officials, no interest in you or yours.

That is to say, governments undertake hacking to further a “particular distribution of reward and stripes,” and their choices are no more (or less) legitimate than anyone else’s.

Government choices are certainly no more legitimate than your choices. Although governments claim a monopoly on criminal prosecutions, which accounts for why criminals acting on their behalf are never prosecuted. That monopoly also explains why governments, assuming they have possession of your person, may prosecute you for locally defined “criminal” acts.

Read the Recorded Future post to judge your odds of being a victim of a national government. Then consider which governments should be your victims.

Summer Pocket Change – OrientDB Code Execution

July 14th, 2017

SSD Advisory – OrientDB Code Execution

From the webpage:

Want to get paid for a vulnerability similar to this one?

Contact us at: ssd@beyondsecurity.com

Vulnerability Summary

The following advisory reports a vulnerability in OrientDB which allows users of the product to cause it to execute code.

OrientDB is a Distributed Graph Database engine with the flexibility of a Document Database all in one product. The first and best scalable, high-performance, operational NoSQL database.

Credit

An independent security researcher, Francis Alexander, has reported this vulnerability to Beyond Security’s SecuriTeam Secure Disclosure program.

Vendor response

The vendor has released patches to address this vulnerability.

For more information: https://github.com/orientechnologies/orientdb/wiki/OrientDB-2.2-Release-Notes#security.

Some vulnerabilities require deep code analysis, others, well, just asking the right questions.

If you are looking for summer pocket change, check out default users, permissions, etc. on popular software.
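A minimal sketch of that approach, probing OrientDB’s REST interface for its documented default accounts (the /connect endpoint and the “demodb” database name are assumptions on my part; only probe systems you are authorized to test):

  import requests

  # OrientDB's documented default accounts
  DEFAULTS = [("admin", "admin"), ("reader", "reader"), ("writer", "writer")]

  # /connect/<database> is expected to return 204 on successful login
  URL = "http://localhost:2480/connect/demodb"  # placeholder database name

  for user, password in DEFAULTS:
      resp = requests.get(URL, auth=(user, password))
      print(user, "->", resp.status_code)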

Locate Your Representative/Senator In Hell

July 13th, 2017

Mapping Dante’s Inferno, One Circle of Hell at a Time by Anika Burgess.

From the post:

“I found myself, in truth, on the brink of the valley of the sad abyss that gathers the thunder of an infinite howling. It was so dark, and deep, and clouded, that I could see nothing by staring into its depths.”

This is the vision that greets the author and narrator upon entry into the first circle of Hell—Limbo, home to honorable pagans—in Dante Alighieri’s Inferno, the first part of his 14th-century epic poem, Divine Comedy. Before Dante and his guide, the classical poet Virgil, encounter Purgatorio and Paradiso, they must first journey through a multilayered hellscape of sinners—from the lustful and gluttonous of the early circles to the heretics and traitors that dwell below. This first leg of their journey culminates, at Earth’s very core, with Satan, encased in ice up to his waist, eternally gnawing on Judas, Brutus, and Cassius (traitors to God) in his three mouths. In addition to being among the greatest Italian literary works, Divine Comedy also heralded a craze for “infernal cartography,” or mapping the Hell that Dante had created.
… (emphasis in original)

Burgess has collected seven (7) traditional maps of the Inferno. I take them to be early essays in the art of visualization. They are by no means, individually or collectively, the definitive visualizations of the Inferno.

The chief deficit of all seven, to me, is the narrowness of the circles/ledges. As I read the Inferno, Dante and Virgil are not pressed for space. Expanding and populating the circles more realistically is one starting point.

The Inferno has no shortage of characters in each circle, Dante predicting the fate of Pope Boniface VIII to place him in the eighth circle of Hell (simoniacs, a subclass of fraud). (Use the online Britannica with caution. Its entry for Boniface VIII doesn’t even mention the Inferno, as of July 13, 2017.)

I would like to think being condemned to Hell by no less than Dante would rate at least a mention in my biography!

Sadly, Dante is no longer around to add to the populace of the Inferno but new visualizations could take the opportunity to update the resident list for Hell!

It’s an exercise in visualization, mapping, 14th century literature, and an excuse to learn the names of your representative and senators.

Enjoy!

DigitalGlobe Platform

July 12th, 2017

DigitalGlobe Platform

The Maps API offers:

Recent Imagery

A curated satellite imagery layer of the entire globe. More than 80% of the Earth’s landmass is covered with high-resolution (30 cm-60 cm) imagery, supplemented with cloud-free LandSat 8 as a backdrop.

Street Map

An accurate, seamless street reference map. Based on contributions from the OpenStreetMap community, this layer combines global coverage with essential “locals only” perspectives.

Terrain Map

A seamless, visually appealing terrain perspective of the planet. Shaded terrain with contours guide you through the landscape, and OpenStreetMap reference vectors provide complete locational context.

Prices start at $5/month and go up. (5,000 map views for $5.)

BTW, 30 cm is 11.811 inches, just a little less than a foot.

For planning constructive or disruptive activities, that should be sufficient precision.

I haven’t tried the service personally but the resolution of the imagery compels me to mention it.

Enjoy!

Graphing the distribution of English letters towards…

July 11th, 2017

Graphing the distribution of English letters towards the beginning, middle or end of words by David Taylor.

From the post:

(partial image)

Some data visualizations tell you something you never knew. Others tell you things you knew, but didn’t know you knew. This was the case for this visualization.

Many choices had to be made to visually present this essentially semi-quantitative data (how do you compare a 3- and a 13-letter word?). I semi-exhaustively explain everything on my other, geekier blog, prooffreaderplus, and provide the code I used; I’ll just repeat the most crucial here:

The counts here were generated from the Brown corpus, which is composed of texts printed in 1961.
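The underlying computation is easy to reproduce. A minimal sketch, assuming NLTK and its Brown corpus are installed (the bucket count and normalization choices here are mine, not Taylor’s):

  from collections import defaultdict
  from nltk.corpus import brown  # requires nltk.download('brown')

  BINS = 10  # normalized positions, word start (0) to word end (BINS - 1)
  counts = defaultdict(lambda: [0] * BINS)

  for word in brown.words():
      word = word.lower()
      if not word.isalpha() or len(word) < 2:
          continue
      for i, ch in enumerate(word):
          # map the letter index onto a common scale so that 3- and
          # 13-letter words are comparable
          bucket = int(i / (len(word) - 1) * (BINS - 1))
          counts[ch][bucket] += 1

  for ch in sorted(counts):
      total = sum(counts[ch])
      print(ch, " ".join("%.2f" % (c / total) for c in counts[ch]))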

Take Taylor’s post as an inducement to read both Prooffreader Plus and Prooffreader on a regular basis.

Media Verification Assistant + Tweet Verification Assistant

July 11th, 2017

Media Verification Assistant

From the welcome screen:

Who

We are a joint team of engineers and investigators from CERTH-ITI and Deutsche Welle, aiming to build a comprehensive tool for image verification on the Web.

Features

The Media Verification Assistant features a multitude of image tampering detection algorithms plus metadata analysis, GPS Geolocation, EXIF Thumbnail extraction and integration with Google reverse image search.

Alpha

It is constantly being developed, expanded and upgraded – our ambition is to include most state-of-the-art verification technologies currently available on the Web, plus unique implementations of numerous experimental algorithms from the research literature. As the platform is currently in its Alpha stage, errors may occur and some algorithms may not operate as expected.

Feedback

For comments, suggestions and error reports, please contact verifymedia@iti.gr.

Sharing

The source code of the Java back-end is freely distributed at GitHub.

Even in alpha, this is a great project!

Even though images can be easily altered with tools like Photoshop and Gimp, they continue to be admissible in court, so long as a witness testifies the image is a fair and accurate representation of the subject matter.
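One of the simpler checks the Assistant bundles, metadata analysis, is easy to try yourself. A minimal sketch using Pillow (the file name is a placeholder; _getexif() is the JPEG EXIF accessor in Pillow releases of this era):

  from PIL import Image
  from PIL.ExifTags import TAGS, GPSTAGS

  img = Image.open("photo.jpg")  # placeholder path
  exif = img._getexif() or {}

  for tag_id, value in exif.items():
      name = TAGS.get(tag_id, tag_id)
      if name == "GPSInfo":
          # GPS data nests its own tag ids (latitude, longitude, etc.)
          for gps_id, gps_val in value.items():
              print("GPS:", GPSTAGS.get(gps_id, gps_id), gps_val)
      else:
          print(name, value)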

This project has spawned a related project: Tweet Verification Assistant, which leverages the image algorithms to verify tweets with an image or video.

Another first stop before retweeting or re-publishing an image with a story.

Open Islamicate Texts Initiative (OpenITI)

July 11th, 2017

Open Islamicate Texts Initiative (OpenITI)

From the description (Annotation) of the project:

Books are grouped into authors. All authors are grouped into 25 AH periods, based on the year of their death. These repositories are the main working loci—if any modifications are to be added or made to texts or metadata, all has to be done in files in these folders.

There are three types of text repositories:

  • RAWrabicaXXXXXX repositories include raw texts as they were collected from various open-access online repositories and libraries. These texts are in their initial (raw) format and require reformatting and further integration into OpenITI. The overall current number of text files is over 40,000; slightly over 7,000 have been integrated into OpenITI.
  • XXXXAH are the main working folders that include integrated texts (all coming from collections included into RAWrabicaXXXXXX repositories).
  • i.xxxxx repositories are instantiations of the OpenITI corpus adapted for specific forms of analysis. At the moment, these include the following instantiations (in progress):
    • i.cex with all texts split mechanically into 300 word units, converted into cex format.
    • i.mech with all texts split mechanically into 300 word units.
    • i.logic with all texts split into logical units (chapters, sections, etc.); only tagged texts are included here (~130 texts at the moment).
    • i.passim_new_mech with all texts split mechanically into 300 word units, converted for the use with new passim (JSON).
    • [not created yet] i.passim_new_mech_cluster with all text split mechanically into 900 word units (3 milestones) with 300 word overlap; converted for the use with new passim (JSON).
    • i.passim_old_mech with all texts split mechanically into 300 word units, converted for the use with old passim (XML, gzipped).
    • i.stylo includes all texts from OpenITI (duplicates excluded) that are renamed and slightly reformatted (Arabic orthography is simplified) for the use with stylo R-package.
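The mechanical splitting several of these instantiations describe is straightforward to reproduce. A minimal sketch (mine, not the project’s own tooling):

  def mechanical_split(text, unit=300):
      # break a text into consecutive units of `unit` words each
      words = text.split()
      return [" ".join(words[i:i + unit])
              for i in range(0, len(words), unit)]

  with open("some_text.txt", encoding="utf-8") as f:  # placeholder file
      chunks = mechanical_split(f.read())
  print(len(chunks), "units of up to 300 words")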

A project/site to join to hone your Arabic NLP and reading skills.

Enjoy!

The Classical Language Toolkit

July 11th, 2017

The Classical Language Toolkit

From the webpage:

The Classical Language Toolkit (CLTK) offers natural language processing (NLP) support for the languages of Ancient, Classical, and Medieval Eurasia. Greek and Latin functionality are currently most complete.

Goals

  • compile analysis-friendly corpora;
  • collect and generate linguistic data;
  • act as a free and open platform for generating scientific research.

You are sure to find one or more languages of interest.
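Getting started is quick. A minimal Latin sketch, assuming the CLTK API as of mid-2017 (0.1.x; later releases reorganized these modules):

  from cltk.corpus.utils.importer import CorpusImporter
  from cltk.tokenize.word import WordTokenizer

  # one-time download of the Latin models
  CorpusImporter('latin').import_corpus('latin_models_cltk')

  tokenizer = WordTokenizer('latin')
  print(tokenizer.tokenize('Gallia est omnis divisa in partes tres'))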

Collecting, analyzing and mapping Tweets can be profitable and entertaining, but tomorrow or perhaps by next week, almost no one will read them again.

The texts in this project survived by hand preservation for thousands of years. People are still reading them.

How about you?

Truth In Terrorism Labeling (TITL) – A Starter Set

July 11th, 2017

Sam Biddle‘s recent post: Facebook’s Tough-On-Terror Talk Overlooks White Extremists, is a timely reminder that “terrorism” and “terrorist” are labels with no agreed upon meaning.

To illustrate, here are some common definitions with suggestions for specifying the definition in use:

Terrorist/terrorism(Biddle): ISIS, Al Qaeda, and US white extremists. But not Tibetans and Uyghurs.

Terrorist/terrorism(China): From: How China Sees ISIS Is Not How It Sees ‘Terrorism’:

… in Chinese discourse, terrorism is employed exclusively in reference to Tibetans and Uyghurs. Official government statements typically avoid identifying acts of violence with a specific ethnic group, preferring more generic descriptors like “Xinjiang terrorists,” “East Turkestan terror forces and groups,” the “Tibetan Youth Congress,” or the “Dalai clique.” In online Chinese chat-rooms, however, epithets like “Uyghur terrorist” or “Tibetan splittist” are commonplace and sometimes combine with homophonic racial slurs like “dirty Tibetans” or “raghead Uyghurs.”

Limiting “terrorism” to Tibetans and Uyghurs excludes ISIS, Al Qaeda, and US white extremists from that term.

Terrorist/terrorism(Facebook): ISIS, Al Qaeda, but no US white extremists (following the US)

Terrorist/terrorism(Russia): Putin’s Flexible Definition of Terrorism

Who, exactly, counts as a terrorist? If you’re Russian President Vladimir Putin, the definition might just depend on how close or far the “terror” is from Moscow. A court in the Nizhniy Novgorod regional center last week gave a suspended two year sentence to Stanislav Dmitriyevsky, Chair of the local Russian-Chechen Friendship Society, and editor of Rights Defense bulletin. Dmitriyevsky was found guilty of fomenting ethnic hatred, simply because in March 2004, he published an appeal by Chechen rebel leader Aslan Maskhadov — later killed by Russian security services — and Maskhadov’s envoy in Europe, Akhmet Zakayev.

Maskhadov, you see, is officially a terrorist in the eyes of the Kremlin. Hamas, however, isn’t. Putin said so at his Kremlin press-conference on Thursday, where he extended an invitation — eagerly accepted — to Hamas’s leaders to Moscow for an official visit.

In fairness to Putin, as a practical matter, who is or is not a “terrorist” for the US depends on the state of US support: groups the US supports are not terrorists, groups the US does not support likely are.

Terrorist/terrorism(US): Generally ISIS, Al Qaeda, no US white extremists, for details see: Terrorist Organizations.

By appending parentheses and Biddle, China, Facebook, Russia, or US to terrorist or terrorism, the reading public has some chance to understand your usage of “terrorism/terrorist.”

Otherwise they are nodding along using their definitions of “terrorism/terrorist” and not yours.

Or was that vagueness intentional on your part?

New York Times, Fact Checking and DaCosta’s First OpEd

July 7th, 2017

Cutbacks on editors/fact-checking at the New York Times came at an unfortunate time for Marc DaCosta’s first OpEd, The President Wants to Keep Us in the Dark (New York Times, 28 June 2017).

DaCosta decries the lack of TV cameras at several recent White House press briefings. Any proof the lack of TV cameras altered the information available to reporters covering the briefings? Here’s DaCosta on that point:


But the truth is that the decision to prevent the press secretary’s comments on the day’s most pressing matters from being televised is an affront to the spirit of an open and participatory government. It’s especially chilling in a country governed by a Constitution whose very First Amendment protects the freedom of the press.

Unfortunately, the slow death of the daily press briefing is only part of a larger assault by the Trump administration on a precious public resource: information.

DaCosta’s implied answer is no: the lack of TV cameras did not diminish the information available from the press conference. But then his hyperbole gland kicks in, and he cites disjointed events claimed to diminish public access to information.

For example, Trump’s non-publication of visitor records:


Immediately after Mr. Trump took office, the administration stopped publishing daily White House visitor records, reversing a practice established by President Obama detailing the six million appointments he and administration officials took at the White House during his eight years in office. Who is Mr. Trump meeting with today? What about Mr. Bannon? Good luck finding out.

Really? Mark J. Rozell summarizes the “detailing the six million appointments he and administration officials took…” this way:


Obama’s action clearly violated his own pledge of transparency and an outpouring of criticism of his action somewhat made a difference. He later reversed his position when he announced that indeed the White House visitor logs would be made public after all.

Unfortunately, the president decided only to release lengthy lists of names, with no mention of the purpose of White House visits or even differentiation between tourists and people consulted on policy development.

This action enabled the Obama White House to appear to be promoting openness while providing no substantively useful information. If the visitor log listed “Michael Jordan,” there was no way to tell if the basketball great or a same-named industry lobbyist was the person at the White House that day and the layers of inquiry required to get that information were onerous. But largely because the president had appeared to have reversed himself in reaction to criticism for lack of transparency, the controversy died down, though it should not have.

Much of the current reaction to President Trump’s decision has contrasted that with the action of his predecessor, and claimed that Obama had set the proper standard by opening the books. The reality is different though, as Obama’s action set no standard at all for transparency.
…(Trump should open White House visitor logs, but don’t flatter Obama, The Hill, 18 April 2017)

That last line on White House visitor records under Obama is worth repeating:

The reality is different though, as Obama’s action set no standard at all for transparency.

Obama-style opaqueness would not answer the questions:

Who is Mr. Trump meeting with today? What about Mr. Bannon? [Questions by DaCosta.]

A fact-checker and/or editor at the New York Times knew that answer (hint to NYT management).

Even more disappointing is the failure of DaCosta, as the co-founder of Enigma, to bring any data to a claim that White House press briefings are of value.

One way to test the value of White House press briefings is to extract the “facts” announced during the briefing and compare those to media reports in the prior twenty-four hours.

If DaCosta thought of such a test, the reason it went unperformed isn’t hard to guess:


The Senate had just released details of a health care plan that would deprive 22 million Americans of health insurance, and President Trump announced that he did not, as he had previously hinted, surreptitiously record his conversations with James Comey, the former F.B.I. director.
… (DaCosta)

First, a presidential press briefing isn’t an organ for the US Senate and second, Trump had already tweeted the news about not recording his conversations with James Comey. None of those “facts” broke at the presidential press briefing.

DaCosta is 0 for 2 for new facts at that press conference.

I offer no defense for the current administration’s lack of transparency, but fact-free and factually wrong claims against it don’t advance DaCosta’s cause:


Differences of belief and opinion are inseparable from the democratic process, but when the facts are in dispute or, worse, erased altogether, public debate risks breaking down. To have a free and democratic society we all need a common and shared context of facts to draw from. Facts or data will themselves never solve any problem. But without them, finding solutions to our common problems is impossible.

We should all expect better of President Trump, the New York Times and Marc DaCosta (@marc_dacosta).

Deanonymizing the Past

July 6th, 2017

What Ever Happened to All the Old Racist Whites from those Civil Rights Photos? by Johnny Silvercloud raises an interesting question but never considers it from a modern technology perspective.

Silvercloud includes this lunch counter image:

I count almost twenty (20) full or partial faces in this one image. Thousands if not hundreds of thousands of other images from the civil rights era capture similar scenes.

Then it occurred to me: unlike prior generations, left with volumes of photographs populated by anonymous bystanders/perpetrators to/of infamous acts, we now have the capacity to deanonymize the past.

As a starting point, may I suggest Deep Face Recognition by Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, one of the more popular papers in this area, with 429 citations as of today (06 July 2017).

Abstract:

The goal of this paper is face recognition – from either a single photograph or from a set of faces tracked in a video. Recent progress in this area has been due to two factors: (i) end to end learning for the task using a convolutional neural network (CNN), and (ii) the availability of very large scale training datasets.

We make two contributions: first, we show how a very large scale dataset (2.6M images, over 2.6K people) can be assembled by a combination of automation and human in the loop, and discuss the trade off between data purity and time; second, we traverse through the complexities of deep network training and face recognition to present methods and procedures to achieve comparable state of the art results on the standard LFW and YTF face benchmarks.

That article was written in 2015 so consulting a 2017 summary update posted to Quora is advised for current details.
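To see how low the barrier has become, the open-source face_recognition library (built on dlib) reduces the matching step to a few lines. A minimal sketch; the file names are placeholders and each image is assumed to contain at least one detectable face:

  import face_recognition

  known = face_recognition.load_image_file("identified_person.jpg")
  known_encoding = face_recognition.face_encodings(known)[0]

  crowd = face_recognition.load_image_file("lunch_counter_photo.jpg")
  locations = face_recognition.face_locations(crowd)
  encodings = face_recognition.face_encodings(crowd, locations)

  for loc, enc in zip(locations, encodings):
      # True where a crowd face plausibly matches the known face
      if face_recognition.compare_faces([known_encoding], enc)[0]:
          print("possible match at (top, right, bottom, left):", loc)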

Banks, governments and others are using facial recognition for their own purposes; let’s also use it to hold people responsible for their moral choices.

Moral choices at lunch counters, police riots, soldiers and camp guards from any number of countries and time periods, etc.

Yes?

Kaspersky: Is Source Code Disclosure Meaningful?

July 6th, 2017

Responding to a proposed ban of Kaspersky Labs software, Eugene Kaspersky, chief executive of Kaspersky, is quoted in Russia’s Kaspersky Lab offers up source code for US government scrutiny, as saying:

The chief executive of Russia’s Kaspersky Lab says he’s ready to have his company’s source code examined by U.S. government officials to help dispel long-lingering suspicions about his company’s ties to the Kremlin.

In an interview with The Associated Press at his Moscow headquarters, Eugene Kaspersky said Saturday that he’s also ready to move part of his research work to the U.S. to help counter rumors that he said were first started more than two decades ago out of professional jealousy.

“If the United States needs, we can disclose the source code,” he said, adding that he was ready to testify before U.S. lawmakers as well. “Anything I can do to prove that we don’t behave maliciously I will do it.”

Personally I think Kaspersky is about to be victimized by anti-Russia hysteria, where repetition of rumors, not facts, is the coin of the realm.

Is source code disclosure meaningful? The question applies to Kaspersky disclosures to U.S. government officials, as well as Microsoft or Oracle disclosures of source code to foreign governments.

My answer is no, at least if you mean source code disclosure limited to governments or other clients.

Here’s why:

  • Limited competence: For the FBI in particular, source code disclosure is meaningless. Recall the FBI blew away $170 million in the Virtual Case File project with nothing to show and no prospect of a timeline, after four years of effort.
  • Limited resources: Guido Vranken’s The OpenVPN post-audit bug bonanza demonstrates that after two (2) manual audits, vulnerabilities remain to be found in OpenVPN. Unlike OpenVPN, any source code given to a government will be reviewed at most once and then only by a limited number of individuals. Contrast that with OpenVPN, which has been reviewed for years by a large number of people and yet flaws remain to be discovered.
  • Limited staff: Closely related to my point about limited resources, the people in government who are competent to undertake a software review are already busy with other tasks. Most governments don’t have a corps of idle but competent programmers waiting for source code disclosures to evaluate. Whatever source code review takes place, it will be the minimum required and that only as other priorities allow.

If Kaspersky Labs were to open source but retain copyright on their software, then their source code could be reviewed by:

  • As many competent programmers as are interested
  • On an ongoing basis
  • By people with varying skills and approaches to software auditing

Setting a new standard, open source but copyrighted security software, would be to the advantage of leaders in Gartner’s Magic Quadrant; others, not so much.

It’s entirely possible for someone to compile source code and avoid paying a license fee but seriously, is anyone going to pursue pennies on the ground when there are $100 bills blowing overhead? Auditing, code review, transparency, trust. (I know, the RIAA chases pennies but it’s run by delusional paranoids.)

Three additional reasons for Kaspersky to go open source but copyrighted:

  • Angst among its more poorly managed competitors will soar.
  • Example for government mandated open source but copyright for domestic sales. (Think China, EU, Russia.)
  • Front page news featuring Kaspersky Labs as breaking away from the pack.

Entirely possible for Kaspersky to take advantage of the narrow-minded nationalism now so popular in some circles of the U.S. government. Not to mention changing the landscape of security software to its advantage.

Full Fact is developing two new tools for automated fact-checking

July 6th, 2017

Full Fact is developing two new tools for automated fact-checking by Mădălina Ciobanu.

From the post:

The first tool, Live, is based on the assumption that people, especially politicians, repeat themselves, Babakar explained, so a claim that is knowingly or unknowingly false or inaccurate is likely to be said more than once by different people.

Once Full Fact has fact-checked a claim, it becomes part of their database, and the next step is making sure that data is available every time the same assertion is being made, whether on TV or at a press conference. “That’s when it gets interesting – how can you scale the fact check so that it can be distributed in a much grander way?”

Live will be able to monitor live TV subtitles and eventually perform speech-to-text analysis, taking a live transcript from a radio programme or a press conference and matching it against Full Fact’s database.

The second tool Full Fact is building is called Trends, and it aims to record every time a wrong or false claim is repeated, and by whom, to enable fact-checkers to track who or what is “putting misleading claims out into the world”.

Because part of Full Fact’s remit is also to get corrections on claims they verify, the team wants to be able to measure the impact of their work, by looking at whether a claim has been said again once they have fact-checked it and requested a correction for it.

The work on Live and Trends has just been funded and the tools are scheduled to appear in 2018.
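The claim-matching step at the heart of Live is conceptually simple, even if doing it at scale is not. A minimal sketch using only the Python standard library (the claims database is a stand-in, not Full Fact’s):

  from difflib import SequenceMatcher

  # stand-in for a database of already fact-checked claims
  checked_claims = {
      "crime has doubled in the last five years": "False: recorded crime fell.",
  }

  def match_claim(statement, threshold=0.8):
      statement = statement.lower().strip()
      for claim, verdict in checked_claims.items():
          if SequenceMatcher(None, statement, claim).ratio() >= threshold:
              return claim, verdict
      return None

  print(match_claim("Crime has doubled in the last five years."))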

They are hiring, by the way: Automated Factchecking at Full Fact. Full Fact is also a charity, in case you want to donate to support this work.

I wonder how Full Fact would rate stories such as the report from Crowdstrike, a security firm that lives in the back pocket of the Democratic Party (US), claiming Russian hacks of the DNC? A report it later revised.

Personally, since the claims were “confirmed” by a known liar, James Clapper, former Director of National Intelligence, I would downgrade such reports and repetitions by others to latrine gossip.

In case you haven’t read the various reports in detail: no records have been produced, just much that looks like “in our experience,” etc., and a positive dearth of facts. That interested “experts” say it is so, in the absence of evidence, doesn’t make their claims facts.

Looking forward to news on these projects as they develop!

The State of Automated Factchecking

July 5th, 2017

The State of Automated Factchecking by Mevan Babakar and Will Moy.

From the webpage:

The State of Automated Factchecking is an in-depth report looking at where we are with automated factchecking globally, and where we could get to with the necessary funding.

It sets out Full Fact’s roadmap for our own work on automated factchecking, and our design principles for the tools we are building.

We propose principles of collaboration for factchecking organisations, researchers and computer scientists around the world.

We hope that it will be the beginning of many fruitful conversations.

It’s split into two parts:

Part One: A roadmap for automated factchecking
Part Two: What we can do now and what remains to be done

Summary

  • We can scale up and speed up factchecking dramatically using technology that exists now.
  • We are months—and relatively small amounts of money—away from handing practical automated tools to factcheckers and journalists. This is not the horizon of artificial intelligence; it is simply the application of existing technology to factchecking.
  • Automated factchecking projects are taking place across the world, but they are fragmented. This means factcheckers and researchers are wasting time and money reinventing the wheel.
  • We propose open standards. Automated factchecking will come to fruition in a more coherent and efficient way if key players think in terms of similar questions and design principles, learn from existing language processing tasks, and build shared infrastructure.
  • International collaboration is vital so that the system works in several languages and countries.
  • Research into machine learning must continue, but we can make serious progress harnessing other technologies in the meantime.

Read the full report

Read The State of Automated Factchecking (pdf, 6Mb) and sign up below to keep up with the latest.

Stay up to date

To stay updated on our progress subscribe to our automated factchecking mailing list, or for any specific questions email Mevan Babakar at mevan@fullfact.org

I mention this report as background reading for the latest efforts by Full Fact to develop automated fact-checking tools.

Enjoy!

Media outrage on threatened violence against Assange?

July 2nd, 2017

Assange Compiles Media Figures, Establishment Democrats Calling For His Death by Elizabeth Vos.

From the post:

Wikileaks Editor-in-Chief Julian Assange tweeted extensively overnight regarding what he labeled tolerant liberals who have called for his assassination and torture. Assange called such media figures “blue-ticks.” It is not clear at this time what may have prompted the series of tweets. Assange also referenced the torture and murder of what he called “alleged sources” during the series of tweets. He also implicated Hillary Clinton in some of the references. Some understood this to be a reference to the upcoming anniversary of Seth Rich’s murder, but it is not clear at this time who Assange may have been specifically referring to.

I won’t repeat the latest dust-up between the US media and the village idiot they helped elect with their fascination for “man bites dog” type news. Candidate X said Y or did outrageous act Z, so, why is that news? The “media” that reports meetings between US presidents and aliens in the Rose Garden needs something to print. Could have left such stories to them.

Now, however, the candidate they favored with $millions if not $billions in free coverage is rough-housing with the media. Oh, my!

When you think about reporters in other countries who die on a regular basis, year in and year out, a little harsh talk pales by comparison.

Not to mention the hypocrisy of the US media that reacts to every unkind twitch of the current White House, but blandly reports calls for the murder of Julian Assange.

I disagree with Assange’s partial leaks*, but even with partial leaks, Assange has empowered public discussion of vital issues for years. You need to ask yourself why, in the face of that history, he is not attracting support from mainstream media. Or outrage at calls for violence, explicit calls, against him. (Care to comment, New York Times, Washington Post?)

* I disagree on partial leaks because full leaks are likely to be more damaging to those responsible for immoral and/or illegal activity. To that end, those harmed by leaks should have made better choices.

Wikipedia: The Text Adventure

June 30th, 2017

Wikipedia: The Text Adventure by Kevan Davis.

You can choose a starting location or enter any Wikipedia article as your starting point. Described by Davis as interactive non-fiction.

Interesting way to learn an area but the graphics leave much to be desired.

If you like games, see http://kevan.org/ for a number of games designed by Davis.

Paparazzi, for example, with some modifications, could be adapted into a data mining exercise with public data feeds. “Public” in the sense the data feeds are obtained from public cameras.

Images from congressional hearings for example. All of the people in those images, aside from the members of congress and the witness, have identities and possibly access to information of interest to you. The same is true for people observed going to and from federal or state government offices.

Crowd-sourcing identification of people in such images, assuming you have pre-clustered them by image similarity, could make government staff and visitors more transparent than they are at present.

Enjoy the Wikipedia text adventure and mine the list of games for ideas on building data-related games.

Mistaken Location of Creativity in “Machine Creativity Beats Some Modern Art”

June 30th, 2017

Machine Creativity Beats Some Modern Art

From the post:

Creativity is one of the great challenges for machine intelligence. There is no shortage of evidence showing how machines can match and even outperform humans in vast areas of endeavor, such as face and object recognition, doodling, image synthesis, language translation, a vast variety of games such as chess and Go, and so on. But when it comes to creativity, the machines lag well behind.

Not through lack of effort. For example, machines have learned to recognize artistic style, separate it from the content of an image, and then apply it to other images. That makes it possible to convert any photograph into the style of Van Gogh’s Starry Night, for instance. But while this and other work provides important insight into the nature of artistic style, it doesn’t count as creativity. So the challenge remains to find ways of exploiting machine intelligence for creative purposes.

Today, we get some insight into progress in this area thanks to the work of Ahmed Elgammal at the Art & AI Laboratory at Rutgers University in New Jersey, along with colleagues at Facebook’s AI labs and elsewhere.
… (emphasis in original)

This summary of CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms by Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, Marian Mazzone, repeats a mistake made by the authors, that is, the misplacement of creativity.

Creativity, indeed, even art itself, is easily argued to reside in the viewer (reader) and not the creator at all.

To illustrate, I quote a long passage from Stanley Fish’s How to Recognize a Poem When You See One below but a quick summary/reminder goes like this:

Fish was teaching back to back classes in the same classroom and for the first class, wrote a list of authors on the blackboard. After the first class ended but before the second class, a poetry class, arrived, he enclosed the list of authors in a rectangle and wrote a page number, as though the list was from a book. When the second class arrived, he asked them to interpret the “poem” that was on the board. Which they proceeded to do. Where would you locate creativity in that situation?

The longer and better written start of the story (by Fish):

[1] Last time I sketched out an argument by which meanings are the property neither of fixed and stable texts nor of free and independent readers but of interpretive communities that are responsible both for the shape of a reader’s activities and for the texts those activities produce. In this lecture I propose to extend that argument so as to account not only for the meanings a poem might be said to have but for the fact of its being recognized as a poem in the first place. And once again I would like to begin with an anecdote.

[2] In the summer of 1971 I was teaching two courses under the joint auspices of the Linguistic Institute of America and the English Department of the State University of New York at Buffalo. I taught these courses in the morning and in the same room. At 9:30 I would meet a group of students who were interested in the relationship between linguistics and literary criticism. Our nominal subject was stylistics but our concerns were finally theoretical and extended to the presuppositions and assumptions which underlie both linguistic and literary practice. At 11:00 these students were replaced by another group whose concerns were exclusively literary and were in fact confined to English religious poetry of the seventeenth century. These students had been learning how to identify Christian symbols and how to recognize typological patterns and how to move from the observation of these symbols and patterns to the specification of a poetic intention that was usually didactic or homiletic. On the day I am thinking about, the only connection between the two classes was an assignment given to the first which was still on the blackboard at the beginning of the second. It read:

Jacobs-Rosenbaum
Levin
Thorne
Hayes
Ohman (?)

[3] I am sure that many of you will already have recognized the names on this list, but for the sake of the record, allow me to identify them. Roderick Jacobs and Peter Rosenbaum are two linguists who have coauthored a number of textbooks and coedited a number of anthologies. Samuel Levin is a linguist who was one of the first to apply the operations of transformational grammar to literary texts. J. P. Thorne is a linguist at Edinburgh who, like Levin, was attempting to extend the rules of transformational grammar to the notorious irregularities of poetic language. Curtis Hayes is a linguist who was then using transformational grammar in order to establish an objective basis for his intuitive impression that the language of Gibbon’s Decline and Fall of the Roman Empire is more complex than the language of Hemingway’s novels. And Richard Ohmann is the literary critic who, more than any other, was responsible for introducing the vocabulary of transformational grammar to the literary community. Ohmann’s name was spelled as you see it here because I could not remember whether it contained one or two n’s. In other words, the question mark in parenthesis signified nothing more than a faulty memory and a desire on my part to appear scrupulous. The fact that the names appeared in a list that was arranged vertically, and that Levin, Thorne, and Hayes formed a column that was more or less centered in relation to the paired names of Jacobs and Rosenbaum, was similarly accidental and was evidence only of a certain compulsiveness if, indeed, it was evidence of anything at all.

[4] In the time between the two classes I made only one change. I drew a frame around the assignment and wrote on the top of that frame “p. 43.” When the members of the second class filed in I told them that what they saw on the blackboard was a religious poem of the kind they had been studying and I asked them to interpret it. Immediately they began to perform in a manner that, for reasons which will become clear, was more or less predictable. The first student to speak pointed out that the poem was probably a hieroglyph, although he was not sure whether it was in the shape of a cross or an altar. This question was set aside as the other students, following his lead, began to concentrate on individual words, interrupting each other with suggestions that came so quickly that they seemed spontaneous. The first line of the poem (the very order of events assumed the already constituted status of the object) received the most attention: Jacobs was explicated as a reference to Jacob’s ladder, traditionally allegorized as a figure for the Christian ascent to heaven. In this poem, however, or so my students told me, the means of ascent is not a ladder but a tree, a rose tree or rosenbaum. This was seen to be an obvious reference to the Virgin Mary who was often characterized as a rose without thorns, itself an emblem of the immaculate conception. At this point the poem appeared to the students to be operating in the familiar manner of an iconographic riddle. It at once posed the question, “How is it that a man can climb to heaven by means of a rose tree?” and directed the reader to the inevitable answer: by the fruit of that tree, the fruit of Mary’s womb, Jesus. Once this interpretation was established it received support from, and conferred significance on, the word “thorne,” which could only be an allusion to the crown of thorns, a symbol of the trial suffered by Jesus and of the price he paid to save us all. It was only a short step (really no step at all) from this insight to the recognition of Levin as a double reference, first to the tribe of Levi, of whose priestly function Christ was the fulfillment, and second to the unleavened bread carried by the children of Israel on their exodus from Egypt, the place of sin, and in response to the call of Moses, perhaps the most familiar of the old testament types of Christ. The final word of the poem was given at least three complementary readings: it could be “omen,” especially since so much of the poem is concerned with foreshadowing and prophecy; it could be Oh Man, since it is man’s story as it intersects with the divine plan that is the poem’s subject; and it could, of course, be simply “amen,” the proper conclusion to a poem celebrating the love and mercy shown by a God who gave his only begotten son so that we may live.

[5] In addition to specifying significances for the words of the poem and relating those significances to one another, the students began to discern larger structural patterns. It was noted that of the six names in the poem three–Jacobs, Rosenbaum, and Levin–are Hebrew, two–Thorne and Hayes–are Christian, and one–Ohman–is ambiguous, the ambiguity being marked in the poem itself (as the phrase goes) by the question mark in parenthesis. This division was seen as a reflection of the basic distinction between the old dispensation and the new, the law of sin and the law of love. That distinction, however, is blurred and finally dissolved by the typological perspective which invests the old testament events and heroes with new testament meanings. The structure of the poem, my students concluded, is therefore a double one, establishing and undermining its basic pattern (Hebrew vs. Christian) at the same time. In this context there is finally no pressure to resolve the ambiguity of Ohman since the two possible readings–the name is Hebrew, the name is Christian–are both authorized by the reconciling presence in the poem of Jesus Christ. Finally, I must report that one student took to counting letters and found, to no one’s surprise, that the most prominent letters in the poem were S, O, N.

The account by Fish isn’t long and is highly recommended if you are interested in this issue.

If readers/viewers interpret images as art, is the “creativity” of the process that brought it into being even meaningful? Or does polling of viewers measure their appreciation of an image as art, without regard to the process that created it? Exactly what are we measuring when polling such viewers?

By Fish’s account, such a poll tells us a great deal about the viewers but nothing about the creator of the art.

FYI, that same lesson applies to column headers, metadata keys, and indeed, data itself. Which means the “meaning” of what you wrote may be obvious to you, but not to anyone else.

Topic maps can increase your odds of being understood or discovering the understanding captured by others.

Neo4j 3.3.0-alpha02 (Graphs For Schemas?)

June 30th, 2017

Neo4j 3.3.0-alpha02

A bit late (release was 06/15/2017) but give Neo4j 3.3.0-alpha02 a spin over the weekend.

From the post:


Detailed Changes and Docs

For the complete list of all changes, please see the changelog. Look for 3.3 Developer manual here, and 3.3 Operations manual here.

Neo4j is one of the graph engines a friend wants to use for analysis/modeling of the ODF 1.2 schema. The traditional indented list is only one tree visualization out of the four major ones.

(From: Trees & Graphs by Nathalie Henry Riche, Microsoft Research)

Riche’s presentation covers a number of other ways to visualize trees and, if you relax the “tree” requirement for display, interesting graph visualizations that may give insight into a schema design.

The slides are part of the materials for CSE512 Data Visualization (Winter 2014), so references for visualizing trees and graphs need to be updated. Check the course resources link for more visualization resources.
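If you go the Neo4j route, loading a schema’s containment relations is a modest script. A minimal sketch using the official Python driver (recent releases import from neo4j; 1.x-era drivers used neo4j.v1), with placeholder URI, credentials, and ODF element names:

  from neo4j import GraphDatabase

  driver = GraphDatabase.driver("bolt://localhost:7687",
                                auth=("neo4j", "password"))

  with driver.session() as session:
      # MERGE keeps each element unique; CONTAINS edges form the tree
      session.run(
          "MERGE (p:Element {name: $parent}) "
          "MERGE (c:Element {name: $child}) "
          "MERGE (p)-[:CONTAINS]->(c)",
          parent="office:document", child="office:body")
  driver.close()

Repeat the MERGE for each parent/child pair extracted from the schema and the whole tree (or graph, once you relax that requirement) is queryable and ready for visualization.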

Reinventing Wheels with No Wheel Experience

June 30th, 2017

Rob Graham, @ErrataRob, captured an essential truth when he tweeted:

Wheel re-invention is inherent in every new programming language, every new library, and no doubt, nearly every new program.

How much “wheel experience” does every programmer have across the breadth of software vulnerabilities?

Hard to imagine meaningful numbers on the “wheel experience” of programmers in general, but vulnerability reports make it clear that either “wheel experience” is lacking or the lesson didn’t stick. Your call.

Vulnerabilities may occur in any release so standard practice is to check every release, however small. Have your results independently verified by trusted others.

PS: For the details on systemd, see: Sergey Bratus and the systemd thread.

If Silo Owners Love Their Children Too*

June 30th, 2017

* Apologies to Sting for the riff on the lyrics to Russians.

Topic Maps Now by Michel Biezunski.

From the post:

This article is my assessment on where Topic Maps are standing today. There is a striking contradiction between the fact that many web sites are organized as a set of interrelated topics — Wikipedia for example — and the fact that the name “Topic Maps” is hardly ever mentioned. In this paper, I will show why this is happening and advocate that the notions of topic mapping are still useful, even if they need to be adapted to new methods and systems. Furthermore, this flexibility in itself is a guarantee that they are still going to be relevant in the long term.

I have spent many years working with topic maps. I took part in the design of the initial topic maps model, and I started the process to transform the conceptual model into an international standard. We published the first edition of Topic Maps ISO/IEC 13250 in 2000, and an update a couple of years later in XML. Several other additions to the standard were published since then, the most recent one in 2015. During the last 15 years, I have helped clients create and manage topic map applications, and I am still doing it.

An interesting read, some may quibble over the details, but my only serious disagreement comes when Michel says:


When we created the Topic maps standard, we created something that turned out to be a solution without a problem: the possibility to merge knowledge networks across organizations. Despite numerous expectations and many efforts in that direction, this didn’t prove to meet enough demands from users.

On the contrary, the inability “…to merge knowledge networks across organizations” is a very real problem. It’s one that has existed since there was more than one record that captured information about the same subject, inconsistently. That original event has been lost in the depths of time.

The inability “…to merge knowledge networks across organizations” has persisted to this day, relieved only on occasion by the use of the principles developed as part of the topic maps effort.

If “mistake” it was, the “mistake” of topic maps was failing to realize that silo owners have an investment in the maintenance of their silos. Silos distinguish them from other silo owners, make them important both intra and inter organization, make the case for their budgets, their staffs, etc.

To argue that silos create inefficiencies for an organization is to mistake efficiency for a goal of the organization. There’s no universal ordering of the goals of organizations (commercial or governmental), but preservation or expansion of scope, budget, staff, prestige, and mission all trump “efficiency” for any organization.

Unfunded “benefits for others” (including the public) falls into the same category as “efficiency”: a non-goal of organizations, including governmental ones.

Want to appeal to silo owners?

Appeal to silo owners on the basis of extending their silos to consume the silos of others!

Market topic maps not as leading to a Kumbaya state of openness and stupor but of aggressive assimilation of other silos.

If the CIA assimilates part of the NSA, or the NSA assimilates part of the FSB, or the FSB assimilates part of the MSS, what is assimilated, on what basis, and which results are shared isn’t decided by topic maps. Those issues are decided by the silo owners paying for the topic map.

Topic maps and subject identity are non-partisan tools that enable silo poaching. If you want to share your results, that’s your call, not mine and certainly not topic maps.

Open data, leaking silos, envious silo owners, the topic maps market is so bright I gotta wear shades.**

** Unseen topic maps may be robbing you of the advantages of your silo even as you read this post. Whose silo(s) do you covet?

Fuzzing To Find Subjects

June 29th, 2017

Guido Vranken‘s post: The OpenVPN post-audit bug bonanza is an important review of bugs discovered in OpenVPN.

Jump to “How I fuzzed OpenVPN” for the details on Vranken fuzzing OpenVPN.

Not for the novice but an inspiration to devote time to the art of fuzzing.

The Open Web Application Security Project (OWASP) defines fuzzing this way:

Fuzz testing or Fuzzing is a Black Box software testing technique, which basically consists in finding implementation bugs using malformed/semi-malformed data injection in an automated fashion.
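
To make the definition concrete, here is a minimal sketch of the mutation approach, with Python’s json.loads standing in for the code under test. Real fuzzers (CERT BFF, AFL, and friends) add coverage feedback, corpus management, and crash triage.

    # Minimal mutation fuzzing sketch: randomly corrupt a valid input and
    # watch for unexpected exceptions in the target. Purely illustrative.
    import json
    import random

    SEED = b'{"user": "jacobs", "id": 42}'  # a known-valid input

    def mutate(data: bytes) -> bytes:
        buf = bytearray(data)
        for _ in range(random.randint(1, 4)):
            buf[random.randrange(len(buf))] = random.randrange(256)
        return bytes(buf)

    for i in range(10000):
        sample = mutate(SEED)
        try:
            json.loads(sample)      # stand-in for the code under test
        except ValueError:
            pass                    # expected: malformed input rejected
        except Exception as e:      # anything else is worth investigating
            print(f"iteration {i}: {type(e).__name__} on {sample!r}")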

OWASP’s fuzzing page mentions a number of resources and tools, but omits the Basic Fuzzing Framework by CERT. That’s odd, don’t you think?

The CERT Basic Fuzzing Framework (BFF) is current through 2016. Allen Householder has a description of version 2.8 at: Announcing CERT Basic Fuzzing Framework Version 2.8. For details on BFF, see: CERT BFF – Basic Fuzzing Framework.

Caution: One resource in the top ten (#9) for “fuzzing software” is: Fuzzing: Brute Force Vulnerability Discovery, by Michael Sutton, Adam Greene, and Pedram Amini. Great historical reference but it was published in 2007, some ten years ago. Look for more recent literature and software.

Fuzzing is obviously an important technique for finding subjects (read: vulnerabilities) in software, whether your intent is to fix those vulnerabilities or use them for your own purposes.

While reading Vranken‘s post, it occurred to me that “fuzzing” is also useful in discovering subjects in unmapped data sets.

Not all nine-digit numbers are Social Security Numbers, but if you find a column of such numbers alongside what look like street addresses and zip codes, it would not be a bad guess. Of course, if it’s a column of 16-digit numbers, a criminal opportunity (credit card numbers) may be knocking at your door.

While TMDM topic maps emphasized the use of URIs as subject identifiers, we all know that subject identification outside of topic maps is more complex than string matching and far messier.

How would you create “fuzzy” searches to detect subjects across different data sets? Are there general principles for classes of subjects?
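
As a minimal sketch of the nine-digit/sixteen-digit guessing above (the patterns and the 80% threshold are invented for illustration), a “fuzzy” column classifier can test values against loose patterns for each candidate subject, plus a checksum where one exists:

    # Sketch: guessing what subject a data column identifies.
    # Patterns and the 0.8 threshold are invented for illustration.
    import re

    def luhn_ok(number: str) -> bool:
        """Luhn checksum, which valid credit card numbers must pass."""
        digits = [int(d) for d in number][::-1]
        total = sum(digits[0::2]) + sum(sum(divmod(2 * d, 10)) for d in digits[1::2])
        return total % 10 == 0

    CANDIDATES = {
        "ssn": lambda v: re.fullmatch(r"\d{3}-?\d{2}-?\d{4}", v) is not None,
        "credit_card": lambda v: (re.fullmatch(r"\d{16}", v) is not None
                                  and luhn_ok(v)),
        "zip_code": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
    }

    def guess_subject(column, threshold=0.8):
        """Return candidate subjects matched by at least `threshold` of values."""
        hits = {}
        for name, test in CANDIDATES.items():
            matched = sum(1 for v in column if test(v))
            if matched / len(column) >= threshold:
                hits[name] = matched / len(column)
        return hits

    print(guess_subject(["078-05-1120", "219-09-9999", "457-55-5462"]))  # {'ssn': 1.0}

A real classifier would also weigh context (neighboring columns, header text), which is where the subject identity principles above earn their keep.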

While your results might be presented as a curated topic map, the grist for that map would originate in the messy details of diverse information.

This sounds like an empirical question to me, especially since most search engines offer API access.

Thoughts?

Tor descriptors à la carte: Tor Metrics Library 2

June 29th, 2017

Tor descriptors à la carte: Tor Metrics Library 2.

From the post:

We’re often asked by researchers, users, and journalists for Tor network data. How can you find out how many people use the Tor network daily? How many relays make up the network? How many times has Tor Browser been downloaded in your country? In order to get to these answers from archived data, we have to continuously fetch, parse, and evaluate Tor descriptors. We do this with the Tor Metrics Library.

Today, the Tor Metrics Team is proud to announce major improvements and launch Tor Metrics Library version 2.0.0. These improvements, supported by a Mozilla Open Source Support (MOSS) “Mission Partners” award, enhance our ability to monitor the performance and stability of the Tor network.
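
Tor Metrics Library itself is Java. As a rough Python analogue (my assumption, not part of this release), the stem library can fetch and parse a current consensus in a few lines:

    # Sketch: fetching and parsing Tor network data with stem (Python),
    # roughly the kind of work the (Java) Tor Metrics Library automates.
    from stem.descriptor.remote import DescriptorDownloader

    downloader = DescriptorDownloader()
    consensus = downloader.get_consensus().run()  # current network consensus

    print(f"{len(consensus)} relays in the consensus")
    for router in consensus[:5]:
        print(router.nickname, router.address, router.or_port)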

Tutorials too! How very cool!

From the tutorials page:

“Tor metrics are the ammunition that lets Tor and other security advocates argue for a more private and secure Internet from a position of data, rather than just dogma or perspective.”
— Bruce Schneier (June 1, 2016)

Rocks!

Encourage your family, friends, and visitors to use Tor. Consider an auto-updated display of Tor statistics to drive further use.

Relying on governments, vendors, and other interested parties for security is, by definition, insecurity.

Targeting Data: Law Firms

June 29th, 2017

Law Firm Cyber Security Scorecard

From the webpage:

If you believe your law firm is cyber secure, we recommend that you download this report. We believe you will be quite surprised at the state of the law firm industry as it relates to cyber security. This report demonstrates three key findings. First, law firms are woefully insecure. Second, billions of dollars are at risk from corporate and government clients. Third, there exists little transparency between firms and clients about this issue.

How do we know this? LOGICFORCE surveyed and assessed over 200 law firms, ranging in size from 1 to 450+ total attorneys, located throughout the United States, working in a full complement of practice areas. The insights in this study come from critical data points gathered through authorized collection of anonymized LOGICFORCE system monitoring data, responses to client surveys, our proprietary SYNTHESIS E-IT SECURE™ assessments and published industry information.

Key Findings:

  • Every law firm assessed was targeted for confidential client data in 2016-2017. Approximately 40% did not know they were breached.
  • We see consistent evidence that cyber attacks on law firms are non-discriminatory. Size and revenues don’t seem to matter.
  • Only 23% of firms have cybersecurity insurance policies.
  • 95% of assessments conducted by LOGICFORCE show firms are not compliant with their data governance and cyber security policies.
  • 100% of those firms are not compliant with their client’s policy standards.

LOGICFORCE does not want your law firm to make headlines for the wrong reasons. Download this report now so you can understand your risks and begin to take appropriate action.

The “full report,” which I downloaded, is a sales brochure for LOGICFORCE and not a detailed technical analysis (12 pages, including front and back covers).

It signals the general cyber vulnerability of law firms, but says little about what works, what doesn’t, security by practice area, etc.

The Panama Papers provided a start on much needed transparency for governments and the super wealthy. That start was the result of a breach at one (1) law firm.

Martindale.com lists over one million (1,000,000) lawyers and law firms from around the world.

The Panama Papers and following fallout were the result of breaching 1 out of 1,000,000+ lawyers and law firms.

Do you ever wonder what lies hidden in the remaining 1,000,000+ lawyers and law firms?

According to LOGICFORCE, that curiosity isn’t a difficult one to satisfy.

Fleeing the Country?

June 29th, 2017

Laws on Extradition of Citizens – Library of Congress Report.

Freedom/resistance fighters need to bookmark this report! A bit dated (2013) but still a serviceable guide to extradition laws in 157 countries.

The extradition map in the report, reduced in scale in the original post, is encouraging.

Always consult legal professionals for updated information and realize that governments make and choose the laws they will enforce. Your local safety in a “no extradition” country depends upon the whims and caprices of government officials.

Just like your cybersecurity, take multiple steps to secure yourself against unwanted government attention, both local and foreign.

ANTLR Parser Generator (4.7)

June 28th, 2017

ANTLR Parser Generator

From the about page:

ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It’s widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.

Aside from these big-name, high-profile projects, you can build all sorts of useful tools like configuration file readers, legacy code converters, wiki markup renderers, and JSON parsers. I’ve built little tools for object-relational database mappings, describing 3D visualizations, injecting profiling code into Java source code, and have even done a simple DNA pattern matching example for a lecture.

From a formal language description called a grammar, ANTLR generates a parser for that language that can automatically build parse trees, which are data structures representing how a grammar matches the input. ANTLR also automatically generates tree walkers that you can use to visit the nodes of those trees to execute application-specific code.

There are thousands of ANTLR downloads a month and it is included on all Linux and OS X distributions. ANTLR is widely used because it’s easy to understand, powerful, flexible, generates human-readable output, comes with complete source under the BSD license, and is actively supported.
… (emphasis in original)

A friend wants to explore the OpenOffice schema by visualizing a parse from the Multi-Schema Validator.

ANTLR is probably more firepower than needed, but the extra power may encourage creative thinking. Maybe.
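
As a sketch of the workflow, assume a hypothetical grammar Expr.g4 compiled for the Python target with antlr4 -Dlanguage=Python3 Expr.g4; the grammar, its start rule prog, and the input are stand-ins, not the schema grammar itself.

    # Sketch: parsing input with the classes ANTLR generates from a grammar.
    # ExprLexer/ExprParser come from "antlr4 -Dlanguage=Python3 Expr.g4";
    # the grammar, start rule "prog", and input are hypothetical stand-ins.
    from antlr4 import InputStream, CommonTokenStream
    from ExprLexer import ExprLexer
    from ExprParser import ExprParser

    stream = InputStream("1 + 2 * 3")
    lexer = ExprLexer(stream)
    tokens = CommonTokenStream(lexer)
    parser = ExprParser(tokens)

    tree = parser.prog()                    # invoke the start rule
    print(tree.toStringTree(recog=parser))  # LISP-style view of the parse tree

The parse tree (or a listener walking it) is exactly the raw material a visualization needs.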

Enjoy!