Aerial Informatics and Robotics Platform [simulator]

February 16th, 2017

Aerial Informatics and Robotics Platform (Microsoft)

From the webpage:

Machine learning is becoming an increasingly important artificial intelligence approach to building autonomous and robotic systems. One of the key challenges with machine learning is the need for many samples — the amount of data needed to learn useful behaviors is prohibitively high. In addition, the robotic system is often non-operational during the training phase. This requires debugging to occur in real-world experiments with an unpredictable robot.

The Aerial Informatics and Robotics platform solves for these two problems: the large data needs for training, and the ability to debug in a simulator. It will provide realistic simulation tools for designers and developers to seamlessly generate the copious amounts of training data they need. In addition, the platform leverages recent advances in physics and perception computation to create accurate, real-world simulations. Together, this realism, based on efficiently generated ground truth data, enables the study and execution of complex missions that might be time-consuming and/or risky in the real-world. For example, collisions in a simulator cost virtually nothing, yet provide actionable information for improving the design.

Open source simulator from Microsoft for drones.

How very cool!

Imagine training your drone to search for breaches of the Dakota Access pipeline.

Or how to react when it encounters hostile drones.
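Training for scenarios like those starts with much simpler scripted flights. Here is a minimal sketch of driving the simulator from Python; it assumes the airsim client package that later grew out of this platform, so treat the API surface as an assumption rather than a description of the February 2017 release:

```python
# Minimal scripted flight against a running AirSim simulator.
# Assumes the `airsim` Python package is installed and the simulator is up.
import airsim

client = airsim.MultirotorClient()   # connect to the simulator on localhost
client.confirmConnection()
client.enableApiControl(True)        # let the script, not the RC, fly the drone
client.armDisarm(True)

client.takeoffAsync().join()         # blocking takeoff
# Fly a short leg: x, y, z in meters (NED frame, so z is negative up), 5 m/s.
client.moveToPositionAsync(20, 0, -10, 5).join()
client.landAsync().join()

client.armDisarm(False)
client.enableApiControl(False)
```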

Enjoy!

behind the scenes: cleaning dirty data

February 16th, 2017

behind the scenes: cleaning dirty data

From the post:

Dirty Data. It’s everywhere! And that’s expected and ok and even frankly good imho — it happens when people are doing complicated things, in the real world, with lots of edge cases, and moving fast. Perfect is the enemy of good.

Alas it’s definitely behind-the-scenes work to find and fix dirty data problems, which means none of us learn from each other in the process. So — here’s a quick post about a dirty data issue we recently dealt with. Hopefully it’ll help you feel comradery, and maybe help some people using the BASE data.

We traced some oaDOI bugs to dirty records from PMC in the BASE open access aggregation database.

BASE = Bielefeld Academic Search Engine.

oaDOI = a service and identifier, similar to a DOI, that points to the open access version of an article.

PMC = PubMed Central.
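For a flavor of the behind-the-scenes work, here is a minimal sketch of one such check: flagging records whose DOI field is missing or malformed. The record structure and field names are hypothetical, not BASE's actual schema:

```python
# Flag records with missing or malformed DOIs before they propagate downstream.
# The `records` structure and its field names are hypothetical.
import re

DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # rough shape of a DOI

records = [
    {"id": "rec1", "doi": "10.1371/journal.pone.0012345"},
    {"id": "rec2", "doi": "PMC4321000"},   # an accession number, not a DOI
    {"id": "rec3", "doi": ""},             # empty field
]

dirty = [r for r in records if not DOI_PATTERN.match(r.get("doi", ""))]
for r in dirty:
    print(f"suspect record {r['id']}: doi={r['doi']!r}")
```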

Are you cleaning data or contributing more dirty data?

Can You Replicate Your Searches?

February 16th, 2017

A comment at PubMed raises the question of replicating reported literature searches:

From the comment:

Melissa Rethlefsen

I thank the authors of this Cochrane review for providing their search strategies in the document Appendix. Upon trying to reproduce the Ovid MEDLINE search strategy, we came across several errors. It is unclear whether these are transcription errors or represent actual errors in the performed search strategy, though likely the former.

For instance, in line 39, the search is “tumour bed boost.sh.kw.ti.ab” [quotes not in original]. The correct syntax would be “tumour bed boost.sh,kw,ti,ab” [no quotes]. The same is true for line 41, where the commas are replaced with periods.

In line 42, the search is “Breast Neoplasms /rt.sh” [quotes not in original]. It is not entirely clear what the authors meant here, but likely they meant to search the MeSH heading Breast Neoplasms with the subheading radiotherapy. If that is the case, the search should have been “Breast Neoplasms/rt” [no quotes].

In lines 43 and 44, it appears as though the authors were trying to search for the MeSH term “Radiotherapy, Conformal” with two different subheadings, which they spell out and end with a subject heading field search (i.e., Radiotherapy, Conformal/adverse events.sh). In Ovid syntax, however, the correct search syntax would be “Radiotherapy, Conformal/ae” [no quotes] without the subheading spelled out and without the extraneous .sh.

In line 47, there is another minor error, again with .sh being extraneously added to the search term “Radiotherapy/” [quotes not in original].

Though these errors are minor and are highly likely to be transcription errors, when attempting to replicate this search, each of these lines produces an error in Ovid. If a searcher is unaware of how to fix these problems, the search becomes unreplicable. Because the search could not have been completed as published, it is unlikely this was actually how the search was performed; however, it is a good case study to examine how even small details matter greatly for reproducibility in search strategies.

A great reminder that replication of searches is a non-trivial task and that search engines are literal to the point of idiocy.
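Checks like Rethlefsen's can be partly automated before a strategy is published. Here is a minimal sketch that flags the two slips she describes (periods where commas belong in multi-field tags, and a stray .sh after a heading or subheading). The patterns are simplified assumptions, not a full Ovid parser:

```python
# Flag two common transcription slips in Ovid MEDLINE search strategies:
#   1. periods instead of commas between field tags (e.g. ".sh.kw.ti.ab")
#   2. an extraneous ".sh" after a heading or subheading (e.g. "Radiotherapy/.sh")
# These patterns are simplified assumptions, not complete Ovid syntax rules.
import re

MULTI_FIELD_PERIODS = re.compile(r"\.[a-z]{2}(\.[a-z]{2})+\b")  # .sh.kw.ti.ab
EXTRANEOUS_SH = re.compile(r"/[A-Za-z ]*\.sh\b")                # .../adverse events.sh

def check_line(n: int, line: str) -> None:
    if MULTI_FIELD_PERIODS.search(line):
        print(f"line {n}: field tags separated by periods, expected commas")
    if EXTRANEOUS_SH.search(line):
        print(f"line {n}: extraneous .sh after a heading or subheading")

strategy = [
    "tumour bed boost.sh.kw.ti.ab",
    "Radiotherapy, Conformal/adverse events.sh",
]
for n, line in enumerate(strategy, start=1):
    check_line(n, line)
```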

Stopping DAPL – One Breach At A Time

February 16th, 2017

Despite years of opposition and a large number of donations, the Dakota Access pipeline is moving inexorably towards completion. Charlie Northcott writes in Dakota Access pipeline: Is the Standing Rock movement defeated?:

“Our hope is that the new administration in Washington will now provide North Dakota law enforcement the necessary resources to bring closure to the protests,” said Kyle Kirchmeier, the sheriff of the local Morton County Police, in a press release.

The last 1.5 mile (2.4 km) stretch of the pipeline is expected to be completed in less than 90 days.

Kyle “Bull Connor” Kirchmeier is the sheriff responsible for spraying Standing Rock protesters with water cannon in sub-freezing weather. A real piece of work.

For speculation purposes, let’s assume the government does overwhelm the protesters at Standing Rock.

Aside from completion, what do the 1,172 miles of DAPL require before they can be used?

[Image: Bakken pipeline map]

It must have no known holes.

That is to say that if the pipeline were breached and that breach was known to the operator (as well as members of the press), no oil would flow.

Yes?

What do we know about the DAPL pipeline?

First, since the pipeline can be approached from either side, there are 2,344 miles of land for staging actions against the integrity of the pipeline.

The pipeline’s right of way is described in: Dakota Access Pipeline Project, U.S. Fish and Wildlife Service, Environmental Assessment, Grassland and Wetland Easement Crossings (May 2016):


Construction of the new pipeline would require a typical construction right-of-way (ROW) width of 125 feet in uplands, 100 feet in non-forested wetlands, 85 feet in forested areas (wetlands and uplands), and up to 150 feet in agricultural areas. Following construction, a 50-foot wide permanent easement would be retained along the pipeline. … (page 12)

Which means staging areas for pipeline interference activities can be located less than 30 yards (for US football fans) from the DAPL pipeline on either side.

A propaganda site for the DAPL builders helpfully notes:

99.98% of the pipeline is installed on privately owned property in North Dakota, South Dakota, Iowa, and Illinois. The Dakota Access Pipeline does not enter the Standing Rock Sioux reservation at any point.

Which of course means that you can lawfully, with the land owner’s permission, park a backhoe,

[Image: backhoe loader]

or, a bulldozer,

[Image: bulldozer]

quite close to the location of the DAPL pipeline.

Backhoes, bulldozers and suitable heavy equipment come in a wide variety of makes and models so these images are illustrative only.

The propaganda site I mentioned earlier also notes:


The Dakota Access Pipeline is an entirely underground pipeline. Only where there are pump stations or valves of testing stations is there any portion of the pipeline above ground. The pipeline is buried nearly 4 feet deep in most areas and in all agricultural lands, two feet deeper than required by law.

which if you remember your army training:

[Image: infantry fighting position diagram]

(The Infantry Rifle Platoon and Squad, FM 3-21.8 (FM 7-8) March, 2007, page 8-35.)

puts the DAPL pipeline within easy reach of one of these:

[Image: USMC entrenching tool]

Of course, an ordinary shovel works just as well.

[Image: shovel]

Anyone breaching or damaging the pipeline will be guilty of a variety of federal and state crimes and therefore should not do so.

If you discover a breach in the pipeline, however, you should document its location with a GPS phone and send the image to both local law enforcement and news organizations.

You will need maps to make sure you have discovered a breach in DAPL for reporting. I have some maps that will help. More on 17 February 2017.

DataBASIC

February 16th, 2017

DataBASIC

Not for you but an interesting resource for introducing children to working with data.

Includes WordCounter, WTFcsv, SameDiff and ConnectTheDots.

The network template is a csv file with a header, two fields separated by commas.
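A minimal sketch of generating such a network file; the column names and example edges are made up:

```python
# Write a ConnectTheDots-style network file: a header row plus
# two comma-separated fields per line, one edge per row.
# Column names and example data are made up for illustration.
import csv

edges = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Alice"),
]

with open("network.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["source", "target"])  # header
    writer.writerows(edges)
```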

Pick the right text/examples and you could have a class captivated pretty quickly.

Enjoy!

Bypassing ASLR Protection on 22 CPU Architectures (Why This Is Good News!)

February 16th, 2017

A Simple JavaScript Exploit Bypasses ASLR Protection On 22 CPU Architectures by Swati Khandelwal.

From the post:

Security researchers have discovered a chip flaw that could nullify hacking protections for millions of devices regardless of their operating system or application running on them, and the worse — the flaw can not be entirely fixed with any mere software update.

The vulnerability resides in the way the memory management unit (MMU), a component of many CPUs, works and leads to bypass the Address Space Layout Randomization (ASLR) protection.

ASLR is a crucial security defense deployed by all modern operating systems from Windows and Linux to macOS, Android, and the BSDs.

In general, ASLR is a memory protection mechanism which randomizes the location where programs run in a device’s memory. This, in turn, makes it difficult for attackers to execute malicious payloads in specific spots in memory when exploiting buffer overflows or similar bugs.

In short, for attackers, it’s like an attempt to burglarize a house blindfolded.

But now a group of researchers, known as VUSec, from the Vrije University in the Netherlands have developed an attack that can bypass ASLR protection on at least 22 processor micro-architectures from popular vendors like Intel, AMD, ARM, Allwinner, Nvidia, and others.

The attack, dubbed ASLR Cache or AnC, is particularly serious because it uses simple JavaScript code to identify the base addresses in memory where system and application components are executed.

So, merely visiting a malicious site can trigger the attack, which allows attackers to conduct more attacks targeting the same area of the memory to steal sensitive information stored in the PC’s memory.

See Swati’s post for two videos demonstrating this unpatchable security flaw in action.

For a more formal explanation of the flaw,

ASLR on the Line: Practical Cache Attacks on the MMU by Ben Gras, et al.

Abstract:

Address space layout randomization (ASLR) is an important first line of defense against memory corruption attacks and a building block for many modern countermeasures. Existing attacks against ASLR rely on software vulnerabilities and/or on repeated (and detectable) memory probing.

In this paper, we show that neither is a hard requirement and that ASLR is fundamentally insecure on modern cache-based architectures, making ASLR and caching conflicting requirements (ASLR⊕Cache, or simply AnC). To support this claim, we describe a new EVICT+TIME cache attack on the virtual address translation performed by the memory management unit (MMU) of modern processors. Our AnC attack relies on the property that the MMU’s page-table walks result in caching page-table pages in the shared last-level cache (LLC). As a result, an attacker can derandomize virtual addresses of a victim’s code and data by locating the cache lines that store the page-table entries used for address translation.

Relying only on basic memory accesses allows AnC to be implemented in JavaScript without any specific instructions or software features. We show our JavaScript implementation can break code and heap ASLR in two major browsers running on the latest Linux operating system with 28 bits of entropy in 150 seconds. We further verify that the AnC attack is applicable to every modern architecture that we tried, including Intel, ARM and AMD. Mitigating this attack without naively disabling caches is hard, since it targets the low-level operations of the MMU. We conclude that ASLR is fundamentally flawed in sandboxed environments such as JavaScript and future defenses should not rely on randomized virtual addresses as a building block.

and,

Reverse Engineering Hardware Page Table Caches Using Side-Channel Attacks on the MMU by Stephan van Schaik, et al.

Abstract:

Recent hardware-based attacks that compromise systems with Rowhammer or bypass address-space layout randomization rely on how the processor’s memory management unit (MMU) interacts with page tables. These attacks often need to reload page tables repeatedly in order to observe changes in the target system’s behavior. To speed up the MMU’s page table lookups, modern processors make use of multiple levels of caches such as translation lookaside buffers (TLBs), special-purpose page table caches and even general data caches. A successful attack needs to flush these caches reliably before accessing page tables. To flush these caches from an unprivileged process, the attacker needs to create specialized memory access patterns based on the internal architecture and size of these caches, as well as on how the caches interact with each other. While information about TLBs and data caches are often reported in processor manuals released by the vendors, there is typically little or no information about the properties of page table caches on different processors. In this paper, we retrofit a recently proposed EVICT+TIME attack on the MMU to reverse engineer the internal architecture, size and the interaction of these page table caches with other caches in 20 different microarchitectures from Intel, ARM and AMD. We release our findings in the form of a library that provides a convenient interface for flushing these caches as well as automatically reverse engineering page table caches on new architectures.

So, Why Is This Good News?

Everything exists in a context and security flaws are no exception to that rule.

For example, H.J.Res.41 – Providing for congressional disapproval under chapter 8 of title 5, United States Code, of a rule submitted by the Securities and Exchange Commission relating to “Disclosure of Payments by Resource Extraction Issuers” reads in part:


Resolved by the Senate and House of Representatives of the United States of America in Congress assembled, That Congress disapproves the rule submitted by the Securities and Exchange Commission relating to “Disclosure of Payments by Resource Extraction Issuers” (published at 81 Fed. Reg. 49359 (July 27, 2016)), and such rule shall have no force or effect.
… (emphasis in original)

That may not sound like much until you read Disclosure of Payments by Resource Extraction Issuers, issued by the Security and Exchange Commission (SEC), which reads in part:


SUMMARY:

We are adopting Rule 13q-1 and an amendment to Form SD to implement Section 1504 of the Dodd-Frank Wall Street Reform and Consumer Protection Act relating to the disclosure of payments by resource extraction issuers. Rule 13q-1 was initially adopted by the Commission on August 22, 2012, but it was subsequently vacated by the U.S. District Court for the District of Columbia. Section 1504 of the Dodd-Frank Act added Section 13(q) to the Securities Exchange Act of 1934, which directs the Commission to issue rules requiring resource extraction issuers to include in an annual report information relating to any payment made by the issuer, a subsidiary of the issuer, or an entity under the control of the issuer, to a foreign government or the Federal Government for the purpose of the commercial development of oil, natural gas, or minerals. Section 13(q) requires a resource extraction issuer to provide information about the type and total amount of such payments made for each project related to the commercial development of oil, natural gas, or minerals, and the type and total amount of payments made to each government. In addition, Section 13(q) requires a resource extraction issuer to provide information about those payments in an interactive data format.
… (emphasis in original)

Or as Alex Guillén says in Trump signs bill killing SEC rule on foreign payments:

President Donald Trump Tuesday signed the first in a series of congressional regulatory rollback bills, revoking an Obama-era regulation that required oil and mining companies to disclose their payments to foreign governments.

The danger posed to global corruption by this SEC rule has passed.

What hasn’t passed is that the staffs of foreign governments and resource extraction issuers remain promiscuous web surfers.

Web surfers who will easily fall prey to a JavaScript exploit that bypasses ASLR protection!

Rather than protecting global corruption, H.J.Res 41 increases the incentives for breaching the networks of foreign governments and resource extraction issuers. You may find payment information and other embarrassing and/or incriminating information.

ASLR Cache or AnC gives you another tool for mining the world of the elites.

Rejoice at every new systemic security flaw. The elites have more to hide than youthful indiscretions and records of poor marital fidelity.

New MorphGNT Releases and Accentuation Analysis

February 16th, 2017

New MorphGNT Releases and Accentuation Analysis by James Tauber.

From the post:

Back in 2015, I talked about Annotating the Normalization Column in MorphGNT. This post could almost be considered Part 2.

I recently went back to that work and made a fresh start on a new repo gnt-accentuation intended to explain the accentuation of each word in the GNT (and eventually other Greek texts). There’s two parts to that: explaining why the normalized form is accented the way it is but then explaining why the word-in-context might be accented differently (clitics, etc). The repo is eventually going to do both but I started with the latter.

My goal with that repo is to be part of the larger vision of an “executable grammar” I’ve talked about for years where rules about, say, enclitics, are formally written up in a way that can be tested against the data. This means:

  • students reading a rule can immediately jump to real examples (or exceptions)
  • students confused by something in a text can immediately jump to rules explaining it
  • the correctness of the rules can be tested
  • errors in the text can be found

It is the fourth point that meant that my recent work uncovered some accentuation issues in the SBLGNT, normalization and lemmatization. Some of that has been corrected in a series of new releases of the MorphGNT: 6.08, 6.09, and 6.10. See https://github.com/morphgnt/sblgnt/releases for details of specifics. The reason for so many releases was I wanted to get corrections out as soon as I made them but then I found more issues!

There are some issues in the text itself which need to be resolved. See the Github issue https://github.com/morphgnt/sblgnt/issues/52 for details. I’d very much appreciate people’s input.

In the meantime, stay tuned for more progress on gnt-accentuation.

Was it random chance that I saw this announcement from James and Getting your hands dirty with the Digital Manuscripts Toolkit on the same day?

😉

I should mention that Codex Sinaiticus (second oldest witness to the Greek New Testament) and numerous other Greek NT manuscripts have been digitized by the British Library.

Pairing these resources together offers a great opportunity to discover the Greek NT text as choices made by others. (Same holds true for the Hebrew Bible as well.)

Getting your hands dirty with the Digital Manuscripts Toolkit

February 16th, 2017

Getting your hands dirty with the Digital Manuscripts Toolkit by Emma Stanford. 3 March 2017, 3.00pm–5.00pm, Centre for Digital Scholarship, Weston Library.

From the webpage:

In this workshop offered jointly by Bodleian Digital Library Systems and Services and the Centre for Digital Scholarship, you’ll learn how to make the most of the digitized resources at the Bodleian, the BnF, the Vatican Library and a host of other institutions, using software tools built around the International Image Interoperability Framework (IIIF). After a brief introduction to the main concepts of IIIF, you’ll learn how to use Mirador and the Digital Manuscripts Toolkit to gather images from different institutions into a single viewer; rearrange, remix and enhance image sequences and add new descriptive metadata; add transcriptions and annotations to digitized images; and embed zoomable images or whole manuscripts into your own website or blog. You’ll leave with your own virtual workspace, stocked with the images you’re using.

This event is open to all. No technological or scholarly expertise is necessary. The workshop will be most useful if you already have a few digitized books or manuscripts in mind that you’d like to work with, but if you don’t, we can help you find some. In addition to manuscripts, the tools can be applied to digitized printed books, maps, paintings and ephemera.

To participate in the workshop, you will need your own laptop, with internet access via eduroam or the Bodleian Libraries network.

If you are planning on being at the Bodleian on 3 March 2017, call ahead to reserve a seat for this free event!

If not, explore Mirador and the Digital Manuscripts Toolkit on your own.
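If you want to poke at IIIF before (or instead of) the workshop, here is a minimal sketch that fetches a Presentation API manifest and lists its canvases. The manifest URL is a placeholder and the JSON shape assumes the 2.x Presentation API:

```python
# Fetch a IIIF Presentation 2.x manifest and list its canvas labels.
# The manifest URL below is a placeholder; substitute one from the Bodleian,
# the BnF, the Vatican Library, or any other IIIF publisher.
import requests

MANIFEST_URL = "https://example.org/iiif/manifest.json"  # placeholder

manifest = requests.get(MANIFEST_URL, timeout=30).json()
print(manifest.get("label"))

for sequence in manifest.get("sequences", []):
    for canvas in sequence.get("canvases", []):
        # Each canvas is one page/opening; its images carry the IIIF Image API URLs.
        print(canvas.get("label"), canvas.get("@id"))
```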

Investigating A Cyberwar

February 16th, 2017

Investigating A Cyberwar by Juliana Ruhfus.

From the post:

Editor’s Note: As the Syrian civil war has played out on the battlefields with gunshots and mortars, a parallel conflict has been fought online. The Syrian Electronic Army (SEA), a pro-Assad government group of hackers, has wielded bytes and malware to obtain crucial information from opponents of the Assad regime. The extracted information has led to arrests and torture of dissidents. In this interview, GIJN’s Eunice Au talks to Al Jazeera’s Juliana Ruhfus about the methodology and challenges of her investigation into the SEA and the process of transforming the story into an online game.

How did the idea for a documentary on the SEA come about? Who was part of your investigative team and how long did it take?

I had the idea for the film when I came across a report called “Behind Syria’s Digital Frontline,” published by a company called FireEye, cybersecurity analysts who had come across a cache of 30,000 Skype conversations that pro-Assad hackers had stolen from anti-Assad fighters. The hack provided a unique insight into the strategic intelligence that had been obtained from the Skype conversations, including Google images plans that outlined the battle at Khirbet Ghazaleh and images of missiles which the rebels were trying to purchase.

The fascinating thing was, it also shed light on how the hack was carried out. Pro-Assad hackers had created female avatars who befriended fighters on the front line by telling them how much they admired them and eventually asked to exchange photos. These images were infected with malware which proved devastating once downloaded. Computers in the field are shared by many fighters, allowing the hackers to spy on a large number of targets at once.

When I read the report I had the Eureka moment that I wait for when I am looking for a new idea: I could visualize the “invisible” cyberwar story and, for the first time ever, I really understood the crucial role that social engineering plays in hacking, that is the hacker’s psychological skill to get someone to click on an infected link.

I then shot the film together with director Darius Bazargan. Ozgur Kizilatis and Alexander Niakaris both did camera work and Simon Thorne was the editor. We filmed in London, Turkey, and France, and all together the production took just under three months.
… (emphasis in original)

C-suite level material but quite good, if a bit heavy-handed in its support for rebel forces in Syria. I favor the foxes over the hounds as well but prefer a more balanced approach to the potential of cyberwarfare.

Cyberweapons have the potential to be great equalizers with conventional forces. Punishing the use or supplying of cyberweapons, as Juliana reports here, is more than a little short-sighted. True, the Assad regime may have the cyber advantage today, but what about tomorrow? Or other governments?

“Tidying” Up Jane Austen (R)

February 16th, 2017

Text Mining the Tidy Way by Julia Silge.

Thanks to Julia’s presentation I now know there is an R package with all of Jane Austen’s novels ready for text analysis.

OK, Austen may not be at the top of your reading list, but the Tidy techniques Julia demonstrates are applicable to a wide range of textual data.

Among those mentioned in the presentation, NASA datasets!

Julia, along with Dave Robinson, wrote: Text Mining with R: A Tidy Approach, available online now and later this year from O’Reilly.

EFF Dice-Generated Passphrases

February 15th, 2017

EFF Dice-Generated Passphrases

From the post:

Create strong passphrases with EFF’s new random number generators! This page includes information about passwords, different wordlists, and EFF’s suggested method for passphrase generation. Use the directions below with EFF’s random number generator member gift or your own set of dice.

Ah, EFF random number generator member gift. 😉

Or you can order five Bicycle dice from Amazon. (Search for dice while you are there. I had no idea there were so many distinct dice sets.)
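EFF's method is simple enough to script, if only to see how it works before you reach for physical dice. Here is a minimal sketch, assuming a local copy of EFF's large wordlist in its dice-index format (five digits, a tab, then a word); secrets.randbelow() stands in for real dice, which EFF recommends:

```python
# Diceware-style passphrase generation using EFF's large wordlist format:
# each line is "11111<TAB>abacus". Physical dice are EFF's recommendation;
# secrets.randbelow() stands in for them here.
import secrets

def load_wordlist(path="eff_large_wordlist.txt"):
    words = {}
    with open(path) as f:
        for line in f:
            index, word = line.split()
            words[index] = word
    return words

def roll_word(words):
    index = "".join(str(secrets.randbelow(6) + 1) for _ in range(5))  # five dice
    return words[index]

wordlist = load_wordlist()
passphrase = " ".join(roll_word(wordlist) for _ in range(6))  # six words
print(passphrase)
```

Six words from a 7,776-word list gives roughly 77 bits of entropy, which is why EFF suggests that length.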

It’s mentioned but not emphasized that many sites don’t allow passphrases. Which forces you to fall back onto passwords. A password manager enables you to use different, strong passwords for every account.

Password managers should always be protected by strong passphrases. Keys to the kingdom as it were.

big-list-of-naughty-strings

February 15th, 2017

big-list-of-naughty-strings by Max Woolf.

From the webpage:

The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

You won’t see any of these strings on the Tonight Show with Jimmy Fallon. 😉

They are “naughty” when used as user-input data.

For those searching for a starting point for legal liability, failure to test and/or document testing against this data set would be a good place to start.
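A minimal test harness looks something like this. handle_input is a stand-in for whatever function in your code accepts user data, and I am assuming the repository's blns.json file is available locally:

```python
# Run every naughty string through your own input handler and record failures.
# `handle_input` is a placeholder for your code under test; blns.json is the
# JSON form of the list from the big-list-of-naughty-strings repository.
import json

def handle_input(value: str) -> None:
    """Stand-in for the function, form handler, or API endpoint under test."""
    ...

with open("blns.json", encoding="utf-8") as f:
    naughty_strings = json.load(f)

failures = []
for s in naughty_strings:
    try:
        handle_input(s)
    except Exception as exc:   # any crash is a finding worth documenting
        failures.append((s, repr(exc)))

print(f"{len(failures)} of {len(naughty_strings)} strings caused exceptions")
```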

Have you tested against the big-list-of-naughty-strings?

Amazon Chime – AES 256-bit Encryption Secure – Using Whose Key?

February 15th, 2017

Amazon Chime, Amazon’s competitor to Skype, WebEx and Google Hangouts.

I’m waiting on answers about why the Chime Dialin Rates page omits all of Africa, as well as Burma, Cambodia, Laos and Thailand.

While I wait for that answer, have you read the security claim for Chime?

Security:


Amazon Chime is an AWS service, which means you benefit from a data center and network architecture built to meet the requirements of the most security-sensitive organizations. In addition, Amazon Chime features security capabilities built directly into the service. Messages, voice, video, and content are encrypted using AES 256-bit encryption. The visual roster makes it easy to see who has joined the meeting, and meetings can be locked so that only authenticated users can join.

We have all heard stories of the super strength of AES 256-bit encryption:


As shown above, even with a supercomputer, it would take 1 billion billion years to crack the 128-bit AES key using brute force attack. This is more than the age of the universe (13.75 billion years). If one were to assume that a computing system existed that could recover a DES key in a second, it would still take that same machine approximately 149 trillion years to crack a 128-bit AES key.
… (How secure is AES against brute force attacks? by Mohit Arora.)

Longer than the universe is old! That’s secure.

Or is it?

Remember, the age-of-the-universe example assumes a brute force attack.

What if an FBI agent shows up with a National Security Letter (NSL)?

Or a conventional search warrant demanding the decrypted content of a Chime conversation?

Unlocking AES encryption with the key is quite fast.

Yes?
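To put numbers on that, here is a minimal sketch using the cryptography package: whoever holds the 256-bit key decrypts in a fraction of a millisecond, no billion-billion-year brute force required.

```python
# AES-256-GCM: encryption and decryption are effectively instant for the key holder.
import os
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # the 256-bit key the service holds
aesgcm = AESGCM(key)
nonce = os.urandom(12)

ciphertext = aesgcm.encrypt(nonce, b"meeting audio chunk", None)

start = time.perf_counter()
plaintext = aesgcm.decrypt(nonce, ciphertext, None)
print(f"decrypted in {(time.perf_counter() - start) * 1e6:.1f} microseconds")
```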

PS: This isn’t a weakness limited to Chime. Any encryption where the key is not under your control is by definition insecure.

Unmet Needs for Analyzing Biological Big Data… [Data Integration #1 – Spells Market Opportunity]

February 15th, 2017

Unmet Needs for Analyzing Biological Big Data: A Survey of 704 NSF Principal Investigators by Lindsay Barone, Jason Williams, David Micklos.

Abstract:

In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principle investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multi-step workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC, acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.

In particular, the needs topic maps can address rank #1, #2, #6, #7, and #10, or as the authors found:


A majority of PIs—across bioinformatics/other disciplines, larger/smaller groups, and the four NSF programs—said their institutions are not meeting nine of 13 needs (Figure 3). Training on integration of multiple data types (89%), on data management and metadata (78%), and on scaling analysis to cloud/HP computing (71%) were the three greatest unmet needs. High performance computing was an unmet need for only 27% of PIs—with similar percentages across disciplines, different sized groups, and NSF programs.

or graphically (figure 3):

So, cloud, distributed, parallel, pipelining, etc., processing is insufficient?

Pushing undocumented and unintegratable data at ever increasing speeds is impressive but gives no joy?

This report will provoke another round of Esperanto fantasies, that is the creation of “universal” vocabularies, which if used by everyone and back-mapped to all existing literature, would solve the problem.

The number of Esperanto fantasies and the cost/delay of back-mapping to legacy data defeats all such efforts. Those defeats haven’t prevented repeated funding of such fantasies in the past, present and no doubt the future.

Perhaps those defeats are a question of scope.

That is rather than even attempting some “universal” interchange of data, why not approach it incrementally?

I suspect the PI’s surveyed each had some particular data set in mind when they mentioned data integration (which itself is a very broad term).

Why not seek out, develop and publish data integrations in particular instances, as opposed to attempting to theorize what might work for data yet unseen?

The need topic maps wanted to meet remains unmet. With no signs of lessening.

Opportunity knocks. Will we answer?

The Rise of the Weaponized AI Propaganda Machine

February 14th, 2017

The Rise of the Weaponized AI Propaganda Machine by Berit Anderson and Brett Horvath.

From the post:

“This is a propaganda machine. It’s targeting people individually to recruit them to an idea. It’s a level of social engineering that I’ve never seen before. They’re capturing people and then keeping them on an emotional leash and never letting them go,” said professor Jonathan Albright.

Albright, an assistant professor and data scientist at Elon University, started digging into fake news sites after Donald Trump was elected president. Through extensive research and interviews with Albright and other key experts in the field, including Samuel Woolley, Head of Research at Oxford University’s Computational Propaganda Project, and Martin Moore, Director of the Centre for the Study of Media, Communication and Power at Kings College, it became clear to Scout that this phenomenon was about much more than just a few fake news stories. It was a piece of a much bigger and darker puzzle — a Weaponized AI Propaganda Machine being used to manipulate our opinions and behavior to advance specific political agendas.

By leveraging automated emotional manipulation alongside swarms of bots, Facebook dark posts, A/B testing, and fake news networks, a company called Cambridge Analytica has activated an invisible machine that preys on the personalities of individual voters to create large shifts in public opinion. Many of these technologies have been used individually to some effect before, but together they make up a nearly impenetrable voter manipulation machine that is quickly becoming the new deciding factor in elections around the world.

Before you get too panicked, remember the techniques attributed to Cambridge Analytica were in use in the 1960 Kennedy presidential campaign. And have been in use since then by marketeers for every known variety of product, including politicians.

It’s hard to know if Anderson and Horvath are trying to drum up more business for Cambridge Analytica or if they are genuinely concerned for the political process.

Granting that Cambridge Analytica has more data than was available in the 1960s, many people, not just Cambridge Analytica, have labored on the manipulation of public opinion since then.

If people were as easy to sway, politically speaking, as Anderson and Horvath posit, then why is there any political diversity at all? Shouldn’t we all be marching in lock step by now?

Oh, it’s a fun read so long as you don’t take it too seriously.

Besides, if a “weaponized AI propaganda machine” is that dangerous, isn’t the best defense a good offense?

I’m all for cranking up a “demonized AI propaganda machine” if you have the funding.

Yes?

We’re Bringing Learning to Rank to Elasticsearch [Merging Properties Query Dependent?]

February 14th, 2017

We’re Bringing Learning to Rank to Elasticsearch.

From the post:

It’s no secret that machine learning is revolutionizing many industries. This is equally true in search, where companies exhaust themselves capturing nuance through manually tuned search relevance. Mature search organizations want to get past the “good enough” of manual tuning to build smarter, self-learning search systems.

That’s why we’re excited to release our Elasticsearch Learning to Rank Plugin. What is learning to rank? With learning to rank, a team trains a machine learning model to learn what users deem relevant.

When implementing Learning to Rank you need to:

  1. Measure what users deem relevant through analytics, to build a judgment list grading documents as exactly relevant, moderately relevant, not relevant, for queries
  2. Hypothesize which features might help predict relevance such as TF*IDF of specific field matches, recency, personalization for the searching user, etc.
  3. Train a model that can accurately map features to a relevance score
  4. Deploy the model to your search infrastructure, using it to rank search results in production

Don’t fool yourself: underneath each of these steps lie complex, hard technical and non-technical problems. There’s still no silver bullet. As we mention in Relevant Search, manual tuning of search results comes with many of the same challenges as a good learning to rank solution. We’ll have more to say about the many infrastructure, technical, and non-technical challenges of mature learning to rank solutions in future blog posts.

… (emphasis in original)
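To make steps 1 through 3 concrete, here is a minimal sketch: a judgment list in the common learning-to-rank text convention (grade, query id, feature values), with scikit-learn's linear regression standing in for the RankLib-style models the plugin actually consumes. The format and features are illustrative assumptions, not the plugin's requirements:

```python
# Train a toy "learning to rank" model from a judgment list.
# Each line: <grade> qid:<query id> <feature>:<value> ... # <doc id>
# LinearRegression is a stand-in for a real LTR model (LambdaMART, etc.).
from sklearn.linear_model import LinearRegression

judgments = [
    "3 qid:1 1:0.9 2:0.1 # doc_14",   # exactly relevant
    "1 qid:1 1:0.4 2:0.6 # doc_51",   # moderately relevant
    "0 qid:1 1:0.1 2:0.9 # doc_02",   # not relevant
]

X, y = [], []
for line in judgments:
    body = line.split("#")[0].split()
    y.append(float(body[0]))
    X.append([float(tok.split(":")[1]) for tok in body[2:]])  # skip grade and qid

model = LinearRegression().fit(X, y)
print(model.predict([[0.8, 0.2]]))   # predicted relevance for a new document
```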

A great post as always but of particular interest for topic map fans is this passage:


Many of these features aren’t static properties of the documents in the search engine. Instead they are query dependent – they measure some relationship between the user or their query and a document. And to readers of Relevant Search, this is what we term signals in that book.
… (emphasis in original)

Do you read this as suggesting the merging exhibited to users should depend upon their queries?

That two or more users, with different query histories could (should?) get different merged results from the same topic map?

Now that’s an interesting suggestion!

Enjoy this post and follow the blog for more of same.

(I have a copy of Relevant Search waiting to be read so I had better get to it!)

Fundamentals of Functional Programming (email lessons)

February 14th, 2017

Learn the fundamentals of functional programming — for free, in your inbox by Preethi Kasireddy.

From the post:

If you’re a software developer, you’ve probably noticed a growing trend: software applications keep getting more complicated.

It falls on our shoulders as developers to build, test, maintain, and scale these complex systems. To do so, we have to create well-structured code that is easy to understand, write, debug, reuse, and maintain.

But actually writing programs like this requires much more than just practice and patience.

In my upcoming course, Learning Functional JavaScript the Right Way, I’ll teach you how to use functional programming to create well-structured code.

But before jumping into that course (and I hope you will!), there’s an important prerequisite: building a strong foundation in the underlying principles of functional programming.

So I’ve created a new free email course that will take you on a fun and exploratory journey into understanding some of these core principles.

Let’s take a look at what the email course will cover, so you can decide how it fits into your programming education.
…(emphasis in original)

I haven’t taken an email-oriented course in quite some time, so I am interested to see how this contrasts with video lectures, etc.

Enjoy!

Deep Learning (MIT Press Book) – Published (and still online)

February 13th, 2017

Deep Learning by Yoshua Bengio, Ian Goodfellow and Aaron Courville.

From the introduction:


1.1 Who Should Read This Book?

This book can be useful for a variety of readers, but we wrote it with two main target audiences in mind. One of these target audiences is university students (undergraduate or graduate) learning about machine learning, including those who are beginning a career in deep learning and artificial intelligence research. The other target audience is software engineers who do not have a machine learning or statistics background, but want to rapidly acquire one and begin using deep learning in their product or platform. Deep learning has already proven useful in many software disciplines including computer vision, speech and audio processing, natural language processing, robotics, bioinformatics and chemistry, video games, search engines, online advertising and finance.

This book has been organized into three parts in order to best accommodate a variety of readers. Part I introduces basic mathematical tools and machine learning concepts. Part II describes the most established deep learning algorithms that are essentially solved technologies. Part III describes more speculative ideas that are widely believed to be important for future research in deep learning.

Readers should feel free to skip parts that are not relevant given their interests or background. Readers familiar with linear algebra, probability, and fundamental machine learning concepts can skip part I, for example, while readers who just want to implement a working system need not read beyond part II. To help choose which chapters to read, figure 1.6 provides a flowchart showing the high-level organization of the book.

We do assume that all readers come from a computer science background. We assume familiarity with programming, a basic understanding of computational performance issues, complexity theory, introductory level calculus and some of the terminology of graph theory.

This promises to be a real delight, whether read for an application space or to get a better handle on deep learning.

How to Listen Better [Not Just For Reporters]

February 13th, 2017

How to Listen Better by Josh Stearns.

From the post:

In my weekly newsletter, The Local Fix, I compiled a list of guides, tools, and examples of how newsrooms can listen more deeply to local communities. I’m sharing it here in case it can be useful to others, and to encourage people to add to the list.

See which of Josh’s resources resonate with you.

These resources are in the context of news/reporting but developing good listening skills is an asset in any field.

Here’s a free tip since you are likely sitting in front of your computer monitor:

If someone comes to talk to you, turn away from your monitor and pay attention to the person speaking.

Seriously, try that for a week and see if your communication with co-workers improves.

PS: Do read posts before you tweet responses to them. As they say, “reading is fundamental.”

Designing a Business Card in LaTeX (For Your New Alt-Identities)

February 13th, 2017

Designing a Business Card in LaTeX by Olivier Pieters

From the post:

In 2017, I will graduate from Ghent University. This means starting a professional career, either in academia or in industry. One of the first things that came to mind was that I needed a good curriculum vitæ, and a business card. I already have the former, but I still needed a business card. Consequently, I looked a bit online and was not all that impressed by the tools people used to design them. I did not want to change some template everybody’s using, but do my own thing. And suddenly, I realised: what better tool than LaTeX to make it!

I know, I already hear some saying “why not use the online tools?” or “Photoshop?”. I picked LaTeX because I want to have a platform independent implementation and because why not? I really like making LaTeX documents, so this seemed like something other than creating long documents.

So, how are we going to create it? First, we’ll make a template for the front and back sides. Then, we will modify this to our needs and have a perfectly formatted and aligned business card.

One of the few fun tasks in the creation of an alternative identity should be the creation of a new business card.

Olivier’s post gets you started on the LaTeX side, although an eye-catching design is on you.

It’s too late for some of us to establish convincing alternative identities.

On the other hand, alternative identities should be established for children before they are twelve or so. Complete with interlocking financial, social, digital, and other records for each one.

It doesn’t make you a bad parent if you haven’t done so but a verifiable and alternative identity could be priceless in an uncertain world.

Do You Feel Chilled? W3C and DRM

February 13th, 2017

Indefensible: the W3C says companies should get to decide when and how security researchers reveal defects in browsers by Cory Doctorow.

From the post:

The World Wide Web Consortium has just signaled its intention to deliberately create legal jeopardy for security researchers who reveal defects in its members’ products, unless the security researchers get the approval of its members prior to revealing the embarrassing mistakes those members have made in creating their products. It’s a move that will put literally billions of people at risk as researchers are chilled from investigating and publishing on browsers that follow W3C standards.

It is indefensible.

I enjoy Cory’s postings and fiction but I had to read this one more than once to capture the nature of Cory’s complaint.

As I understand it the argument runs something like this:

1. The W3C is creating a “…standardized DRM system for video on the World Wide Web….”

2. Participants in the W3C process must “…surrender the right to invoke their patents in lawsuits as a condition of participating in the W3C process….” (The keyword here is participants. No non-participant waives their patent rights as a result of W3C policy.)

3. The W3C isn’t requiring waiver of DMCA 1201 rights as a condition for participating in the video DRM work.

All true, but I don’t see how Cory gets to the conclusion:

…deliberately create legal jeopardy for security researchers who reveal defects in its members’ products, unless the security researchers get the approval of its members prior to revealing the embarrassing mistakes those members have made in creating their products.

Whether the W3C requires participants in the DRM system for video to waive DMCA 1201 rights or not, the W3C process has no impact on non-participants in that process.

Secondly, security researchers are in jeopardy if and only if they incriminate themselves when publishing defects in DRM products. As security researchers, they are capable of anonymously publishing any security defects they find.

Third, legal liability flows from statutory law and not the presence or absence of consensual agreement among a group of vendors. Private agreements can only protect you from those agreeing.

I don’t support DRM and never have. Personally I think it is a scam and tax on content creators. It’s unfortunate that fear that someone, somewhere might not be paying full rate, is enough for content creators to tax themselves with DRM schemes and software. None of which is free.

Rather than arguing about W3C policy, why not point to the years of wasted effort and expense by content creators on DRM? With no measurable return. That’s a plain ROI question.

DRM software vendors know the pot of gold content creators are chasing is at the end of an ever receding rainbow. In fact, they’re counting on it.

Oxford Dictionaries Thesaurus Data – XQuery

February 12th, 2017

Retrieve Oxford Dictionaries API Thesaurus Data as XML with XQuery and BaseX by Adam Steffanick.

From the post:

We retrieved thesaurus data from the Oxford Dictionaries application programming interface (API) and returned Extensible Markup Language (XML) with XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial illustrates how to retrieve thesaurus data—synonyms and antonyms—as XML from the Oxford Dictionaries API with XQuery and BaseX.

The Oxford Dictionaries API returns JavaScript Object Notation (JSON) responses that yield undesired XML structures when converted automatically with BaseX. Fortunately, we’re able to use XQuery to fill in some blanks after converting JSON to XML. My GitHub repository od-api-xquery contains XQuery code for this tutorial.
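The same fill-in-the-blanks problem shows up outside XQuery. Here is a minimal Python sketch of shaping a thesaurus-style JSON response into the XML you actually want, rather than accepting an automatic conversion; the JSON structure is a simplified assumption, not the API's real response:

```python
# Convert a (simplified, assumed) thesaurus JSON response into deliberate XML,
# instead of relying on an automatic JSON-to-XML mapping.
import json
import xml.etree.ElementTree as ET

response = json.loads("""
{"word": "ace", "synonyms": ["expert", "master"], "antonyms": ["amateur"]}
""")

entry = ET.Element("entry", word=response["word"])
for kind in ("synonyms", "antonyms"):
    group = ET.SubElement(entry, kind)
    for term in response.get(kind, []):
        ET.SubElement(group, "term").text = term

print(ET.tostring(entry, encoding="unicode"))
```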

If you are having trouble staying at your computer during this unreasonably warm spring, this XQuery/Oxford Dictionary tutorial may help!

Ok, maybe that is an exaggeration but only a slight one. 😉

Enjoy!

Objective-See – OS X Malware Research

February 10th, 2017

Objective-See

Patrick Wardle‘s OS X security site described as:

As Macs become more prevelant, so does OS X malware. Unfortunately, current Mac security and anti-virus software is fairly trivial to generically bypass.

Objective-See was created to provide simple, yet effective OS X security tools. Always free of charge – no strings attached!

I don’t see news about OS X malware very often but following @patrickwardle and authors seen there will cure that problem.

Macs may be popular in the current regime in Washington, among those not using:

[Image: Etch A Sketch]

😉

BTW, since an Etch-a-Sketch uses aluminum powder, has anyone checked the concealment properties of an Etch-a-Sketch?

That is, would the aluminum powder block scanners and, if so, how well?

Asking for a friend of course! 😉

PS: In case you need an Etch-A-Sketch for research purposes, http://etchasketch.com/.

As the aluminum powder is removed by the stylus, blocking of EMF would go down. Making me wonder about online drawing games for the Etch-A-Sketch that would have the user removing the EMF barrier in close proximity to the computer.

Extracting any information would be a challenge but then releasing viruses in the wild to attack secure nuclear facilities relies on luck as well.

Macs Gaining Market Share? – First Mac Word Macro Malware Spotted In Wild

February 10th, 2017

Watch Out! First-Ever Word Macro Malware for Apple Mac OS Discovered in the Wild by Swati Khandelwal.

From the post:


Denying permission can save you, but if enabled ignoring warnings, the embedded macro executes a function, coded in Python, that downloads the malware payload to infect the Mac PCs, allowing hackers to monitor webcams, access browser history logs, and steal password and encryption keys.

According to a blog post published this week by Patrick Wardle, director of research at security firm Synack, the Python function is virtually identical to EmPyre – an open source Mac and Linux post-exploitation agent.

“It’s kind of a low-tech solution, but on one hand it’s abusing legitimate functionality so it’s not going to crash like a memory corruption or overflow might, and it’s not going to be patched out,” said Wardle.

Wardle tracked the IP address from which the malicious Word documents were spread to Russia and that IP has previously been associated with malicious activities like phishing attacks.

Granted, this isn’t on the same level of technology as the in-memory viruses I mentioned yesterday, but an attack vector that exploits human error and isn’t going to be ‘patched’ out is a good find.

With the present Republican regime in the United States, human error may be all that is necessary to peel government IT like an orange.

Besides, it isn’t the sophistication of the attack that counts (outside of BlackHat conferences) but the results you obtain without getting caught.

Yes?

Fast and Flexible Query Analysis at MapD with Apache Calcite [Merging Data?]

February 9th, 2017

Fast and Flexible Query Analysis at MapD with Apache Calcite by Alex Şuhan.

From the post:

After evaluating a few other options, we decided for Apache Calcite, an incubation stage project at the time. It takes SQL queries and generates extended relational algebra, using a highly configurable cost-based optimizer. Several projects use Calcite already for SQL parsing and query optimization.

One of the main strengths of Calcite is its highly modular structure, which allows for multiple integration points and creative uses. It offers a relational algebra builder, which makes moving to a different SQL parser (or adding a non-SQL frontend) feasible.

In our product, we need runtime functions which are not recognized by Calcite by default. For example, trigonometric functions are necessary for on-the-fly geo projections used for point map rendering. Fortunately, Calcite allows specifying such functions and they become first-class citizens, with proper type checking in place.

Calcite also includes a highly capable and flexible cost-based optimizer, which can apply high-level transformations to the relational algebra based on query patterns and statistics. For example, it can push part of a filter through a join in order to reduce the size of the input, like the following figure shows:

[Image: join filter pushdown diagram]

You can find this example and more about the cost-based optimizer in Calcite in this presentation on using it in the Apache Phoenix project. Such optimizations complement the low-level optimizations we do ourselves to achieve great speed improvements.

Relational algebra example
Let’s take a simple query: SELECT A.x, COUNT(*) FROM test JOIN B ON A.x = B.x WHERE A.y > 41 GROUP BY A.x; and analyze the relational algebra generated for it.

In Calcite relational algebra, there are a few main node types, corresponding to the theoretical extended relational algebra model: Scan, Filter, Project, Aggregate and Join. Each type of node, except Scan, has one or more (in the case of Join) inputs and its output can become the input of another node. The graph of nodes connected by data flow relationships is a directed acyclic graph (abbreviated as “DAG”). For our query, Calcite outputs the following DAG:

[Image: relational algebra DAG for the example query]

The Scan nodes have no inputs and output all the rows and the columns in tables A and B, respectively. The Join node specifies the join condition (in our case A.x = B.x) and its output contains the columns in A and B concatenated. The Filter node only allows the rows which pass the specified condition and its output preserves all columns of input. The Project node only preserves the specified expressions as columns in the output. Finally, the Aggregate specifies the group by expressions and aggregates.

The physical implementation of the nodes is up to the system using Calcite as a frontend. Nothing in the Join node mandates a certain implementation of the join operation (equijoin in our case). Indeed, using a condition which can’t be implemented as a hash join, like A.x < B.x, would only be reflected by the condition in the Filter node.
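To see the same Scan → Join → Filter → Project → Aggregate pipeline outside Calcite, here is the example query spelled out step by step with pandas. A rough analogue for intuition, not what MapD executes:

```python
# The example query, SELECT A.x, COUNT(*) FROM A JOIN B ON A.x = B.x
# WHERE A.y > 41 GROUP BY A.x, written as the same node sequence pandas-style.
import pandas as pd

A = pd.DataFrame({"x": [1, 2, 2, 3], "y": [40, 42, 50, 99]})   # Scan A
B = pd.DataFrame({"x": [2, 3, 3]})                              # Scan B

joined = A.merge(B, on="x")          # Join on A.x = B.x
filtered = joined[joined["y"] > 41]  # Filter A.y > 41
projected = filtered[["x"]]          # Project A.x
result = projected.groupby("x").size().reset_index(name="count")  # Aggregate
print(result)
```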

You’re not MapD today but that’s no excuse for poor query performance.

Besides, learning Apache Calcite will increase your attractiveness as data and queries on it become more complex.

I haven’t read all the documentation but the “metadata” in Apache Calcite is as flat as any you will find.

Which means integration of different data sources is either luck of the draw or you asked someone the “meaning” of the metadata.

The tutorial has this example:

[Image: Apache Calcite tutorial screenshot showing column headers, including GENDER and NAME]

The column header “GENDER” for example appears to presume the common male/female distinction. But without further exploration of the data set, there could be other genders encoded in that field as well.

If “GENDER” seems too easy, what would you say about “NAME,” bearing in mind that Japanese family names are written first and given names written second. How would those appear under “NAME?”

Apologies! My screen shot missed field “S.”

I have utterly no idea what “S” may or may not represent as a field header. Do you?

If the obviousness of field headers fails with “GENDER” and “NAME,” what do you suspect will happen with less “obvious” field headers?

How successful will merging of data be?

Where would you add subject identity information and how would you associate it with data processed by Apache Calcite?
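One incremental approach, sketched below: before integrating, attach explicit subject identity to each field header and only merge columns whose identities match. Every field name and identifier here is hypothetical.

```python
# Attach subject identity to column headers before integration, so "GENDER"
# from one source only merges with a column that identifies the same subject.
# All field names and identifiers below are hypothetical.

SOURCE_A = {
    "GENDER": "http://example.org/id/administrative-gender",
    "NAME":   "http://example.org/id/personal-name-western-order",
    "S":      None,  # unknown subject: do not merge until someone documents it
}

SOURCE_B = {
    "sex":       "http://example.org/id/administrative-gender",
    "full_name": "http://example.org/id/personal-name-japanese-order",
}

def mergeable(col_a: str, col_b: str) -> bool:
    id_a, id_b = SOURCE_A.get(col_a), SOURCE_B.get(col_b)
    return id_a is not None and id_a == id_b

print(mergeable("GENDER", "sex"))        # True: same documented subject
print(mergeable("NAME", "full_name"))    # False: different name orders
print(mergeable("S", "sex"))             # False: undocumented field stays unmerged
```

Documenting identity for one pair of sources at a time is exactly the incremental, per-instance integration suggested above.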

Opening Secure Channels for Confidential Tips [Allocating Risk for Leaks]

February 9th, 2017

Opening Secure Channels for Confidential Tips by Martin Shelton.

From the post:

In Shields Up, security user researcher Martin Shelton writes about security threats and defenses for journalists. Below, his first installment. —eds

To make it easier for tipsters to share sensitive information, a growing number of news organizations are launching resources for confidential tips. While there is some overlap between the communication channels that each news organization supports, it’s not always clear which channels are the most practical for routine use. This short guide will describe some basics around how to think about security on behalf of your sources before thinking about tools and practices. I’ll also describe common communication channels for accepting sensitive tips and tradeoffs when using each channel. When thinking about tradeoffs, consider which channels are right for you.
… (emphasis in original)

Martin does a great job of surveying your current security options but doesn’t address the allocation of risk between leakers and news organizations that I covered in U.S. Leaking Law: You Go To Jail – I Win A Pulitzer and/or the option of leaking access rather than the risk of leaking data/documents, How-To: Leaking In Two Steps.

Here’s the comment I’m posting to his post and I will report back on his response, probably in a separate post:

Martin, great job on covering the security options for tips and their tradeoffs!

I do have a question though about the current model of leaking, which puts all of the risk on the leaker. A leaker undertakes the burden of liberating data and/or documents, takes the risk of copying/removing them and then the risk of getting them securely to a news organization.

All of which requires technical skills that aren’t common.

As an alternative, why shouldn’t leakers leak access to such networks/servers and enable news organizations, who have greater technical resources, to undertake the risks of retrieval of such documents?

I mentioned this to another news person and they quickly pointed out the dangers of the Computer Fraud and Abuse Act (CFAA) for a news organization but the same holds true for the leaker. Who very likely has fewer technical skills than any news organization.

My thinking is that news organizations can decide to serve the interests of government (follow the CFAA) or decide to serve the public interest. In my view, those are not synonymous.

I am still refining ways that leakers could securely leak access, but at present, using standard subscription forms to submit access information instead of identifying details offers both a trustworthy target (the news organization) and a multiplicity of places to leak, which prevents effective monitoring. I have written more than once about this topic; two posts of particular interest are U.S. Leaking Law: You Go To Jail – I Win A Pulitzer and How-To: Leaking In Two Steps.

Before anyone protests the “ethics” of breaking laws such as the CFAA, recall governments broke faith with their citizens first. Laws like the CFAA are monuments to that breach of faith. Nothing more.

Fileless attacks against enterprise networks

February 9th, 2017

Kaspersky Lab reports in Fileless attacks against enterprise networks the discovery of malware that hides in memory to avoid detection.

Its summary:

During incident response, a team of security specialists needs to follow the artefacts that attackers have left in the network. Artefacts are stored in logs, memories and hard drives. Unfortunately, each of these storage media has a limited timeframe when the required data is available. One reboot of an attacked computer will make memory acquisition useless. Several months after an attack the analysis of logs becomes a gamble because they are rotated over time. Hard drives store a lot of needed data and, depending on its activity, forensic specialists may extract data up to a year after an incident. That’s why attackers are using anti-forensic techniques (or simply SDELETE) and memory-based malware to hide their activity during data acquisition. A good example of the implementation of such techniques is Duqu2. After dropping on the hard drive and starting its malicious MSI package it removes the package from the hard drive with file renaming and leaves part of itself in the memory with a payload. That’s why memory forensics is critical to the analysis of malware and its functions. Another important part of an attack are the tunnels that are going to be installed in the network by attackers. Cybercriminals (like Carbanak or GCMAN) may use PLINK for that. Duqu2 used a special driver for that. Now you may understand why we were very excited and impressed when, during an incident response, we found that memory-based malware and tunnelling were implemented by attackers using Windows standard utilities like “SC” and “NETSH“.

Kaspersky reports 140 enterprises in 40 countries have been affected by the malware:

[Image from the Kaspersky report: affected enterprises by country.]

The reported focus has been on banking/financial targets, which implies to me that political targets are not preparing for this type of attack.

If you are going to “play in the street,” an American expression for putting yourself in harm’s way, be sure to read the attribution section carefully and repeatedly. Your skills aren’t useful to anyone if you are in prison.
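
Whichever side of the street you are on, the NETSH tunnelling Kaspersky describes leaves artifacts you can enumerate on a Windows host. Below is a minimal sketch in Python, assuming the tunnel was set up with netsh’s portproxy feature (the report names NETSH; the exact commands are my assumption).

    # Minimal sketch: list netsh portproxy forwards, a common tunnelling artifact.
    # Assumes a Windows host with Python 3.7+; any entry printed deserves an explanation.
    import subprocess

    def portproxy_entries():
        out = subprocess.run(
            ["netsh", "interface", "portproxy", "show", "all"],
            capture_output=True, text=True, check=True,
        ).stdout
        # Keep lines that look like "listen_addr listen_port connect_addr connect_port".
        return [line.strip() for line in out.splitlines()
                if line.strip() and line.strip()[0].isdigit()]

    if __name__ == "__main__":
        entries = portproxy_entries()
        if entries:
            print("Port forwards found (verify each one):")
            for entry in entries:
                print(" ", entry)
        else:
            print("No portproxy forwards configured.")

Pair that with a review of recently created services (the SC side of the attack) and you have a start on detection, or on cleaning up after yourself.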

Republican Regime Creates New Cyber Market – Burner Twitter/Facebook Accounts

February 9th, 2017

The current Republican regime has embarked upon creating a new cyber market, less than a month after taking office.

Samantha Dean (Tech Times) reports:

Planning a visit to the U.S.? Your passport is not the only thing you may have to turn in at the immigration counter, be prepared to relinquish your social media account passwords as well to the border security agents.

That’s right! According to a new protocol from the Homeland Security that is under consideration, visitors to the U.S. may have to give their Twitter and Facebook passwords to the border security agents.

The news comes close on the heels of the Trump administration issuing the immigration ban, which resulted in a massive state of confusion at airports, where several people were debarred from entering the country.

John F. Kelly, the Homeland Security Secretary, shared with the Congress on Feb. 7 that the Trump administration was considering this option. The measure was being weighed as a means to sieve visa applications and sift through refugees from the Muslim majority countries that are under the 90-day immigration ban.

I say burner Twitter/Facebook accounts because, if you plan on making a second trip to the US, you will need burner accounts that have been maintained over the years.

The need for burner Twitter/Facebook accounts, ones you can freely disclose to border security agents, presents a wide range of data science issues.

In no particular order:

  • Defeating Twitter/Facebook security on a large scale. Not trivial but not the hard part either
  • Creating accounts with the most common names
  • Automated posting to accounts in their native language
  • Posts must be indistinguishable from human user postings, i.e., no auto-retweets of Sean Spicer
  • Profile of tweets/posts shows consistent usage

I haven’t thought about burner bank account details before, but that certainly should be doable, especially if you have a set of banks on the Net that don’t have much overhead but exist to keep records for one another.

Burner bank accounts could be useful to more than just travelers to the United States.

Kudos to the new Republican regime and their market creation efforts!

State of Washington & State of Minnesota v. Trump [Press Resource]

February 9th, 2017

State of Washington & State of Minnesota v. Trump, the 9th Circuit Court of Appeals webpage on case 17-35105.

The clerk of the Ninth Circuit has created a listing of all the pleadings, hearings, etc., in date order (most recent at the top of the list) for your research and reading pleasure.

I won’t repeat the listing here as it would be quickly out of date.

Please include: State of Washington & State of Minnesota v. Trump, https://www.ca9.uscourts.gov/content/view.php?pk_id=0000000860 as a hyperlink in all your postings on this case.

Your readers deserve the opportunity to read, hear and see the arguments and briefs in this case for themselves.

PS: It appears to be updated after the close of business for the clerk’s office, so today’s filings aren’t yet reflected on the page.

Turning Pixelated Faces Back Into Real Ones

February 9th, 2017

Google’s neural networks turn pixelated faces back into real ones by John E. Dunn.

From the post:

Researchers at Google Brain have come up with a way to turn heavily pixelated images of human faces into something that bears a usable resemblance to the original subject.

In a new paper, the company’s researchers describe using neural networks put to work at two different ends of what should, on the face of it, be an incredibly difficult problem to solve: how to resolve blocky 8 x 8 pixel images of faces or indoor scenes containing almost no information?

It’s something scientists in the field of super resolution (SR) have been working on for years, using techniques such as de-blurring and interpolation that are often not successful for this type of image. As the researchers put it:

When some details do not exist in the source image, the challenge lies not only in “deblurring” an image, but also in generating new image details that appear plausible to a human observer.

Their method involves getting the first “conditioning” neural network to resize 32 x 32 pixel images down to 8 x 8 pixels to see if that process can find a point at which they start to match the test image.

John raises a practical objection:


The obvious practical application of this would be enhancing blurry CCTV images of suspects. But getting to grips with real faces at awkward angles depends on numerous small details. Emphasise the wrong ones and police could end up looking for the wrong person.

True, but John presumes the “suspects” are unknown. That’s true for the typical convenience store robbery on the 10 PM news, but not so for “suspects” under intentional surveillance.

In those cases, multiple ground truth images from a variety of angles are likely to be available.
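
Even without Google’s networks, a handful of ground truth images makes a pixelated probe far less anonymous. The Python sketch below is crude nearest-neighbour matching, not the paper’s method: downsample each known image to 8 x 8 and rank by pixel distance to the probe. The file names are hypothetical.

    # Minimal sketch: rank ground-truth images of a known subject against an
    # 8 x 8 probe by downsampling and comparing pixel distances.
    # Nearest-neighbour matching only; not Google Brain's neural approach.
    import numpy as np
    from PIL import Image

    def to_lowres(path, size=(8, 8)):
        # Grayscale, downsample, normalise to [0, 1].
        img = Image.open(path).convert("L").resize(size, Image.LANCZOS)
        return np.asarray(img, dtype=np.float64) / 255.0

    probe = to_lowres("cctv_probe_8x8.png")  # the pixelated face (hypothetical file)
    gallery = ["subject_front.jpg", "subject_left.jpg", "subject_right.jpg"]

    for path in sorted(gallery, key=lambda p: np.linalg.norm(to_lowres(p) - probe)):
        print(f"{np.linalg.norm(to_lowres(path) - probe):.4f}  {path}")

The paper’s contribution is generating plausible detail where none exists; with a known subject and multiple angles, even this crude comparison narrows the field.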