Archive for June, 2017

Wikipedia: The Text Adventure

Friday, June 30th, 2017

Wikipedia: The Text Adventure by Kevan Davis.

You can choose a suggested starting location or enter any Wikipedia article as your starting point. Davis describes it as interactive non-fiction.

An interesting way to learn an area, but the graphics leave much to be desired.

If you like games, see http://kevan.org/ for a number of games designed by Davis.

Paparazzi, for example, with some modifications, could be adapted into a data-mining exercise with public data feeds. “Public” in the sense that the data feeds are obtained from public cameras.

Images from congressional hearings for example. All of the people in those images, aside from the members of congress and the witness, have identities and possibly access to information of interest to you. The same is true for people observed going to and from federal or state government offices.

Crowd-sourcing identification of people in such images, assuming you have pre-clustered them by image similarity, could make government staff and visitors more transparent than they are at present.
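Pre-clustering by image similarity is less exotic than it sounds. Here is a rough sketch, assuming 64-bit perceptual hashes have already been computed for each image with an imaging library (the hash values and file names below are invented): images group together when their hashes differ in only a few bits.

```python
# Sketch: grouping images by perceptual-hash similarity.
# Assumes 64-bit perceptual hashes were already computed elsewhere;
# the hashes and file names here are made-up values for illustration.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")

def cluster_by_similarity(hashes: dict, threshold: int = 10) -> list:
    """Greedy single-linkage clustering: an image joins the first
    cluster containing any member within `threshold` bits of it."""
    clusters = []
    for name, h in hashes.items():
        for cluster in clusters:
            if any(hamming(h, hashes[m]) <= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical hashes: img_a and img_b are near-duplicates.
images = {
    "img_a.jpg": 0xF0F0F0F0F0F0F0F0,
    "img_b.jpg": 0xF0F0F0F0F0F0F0F1,  # 1 bit away from img_a
    "img_c.jpg": 0x0F0F0F0F0F0F0F0F,  # far from both
}
print(cluster_by_similarity(images))
```

Once clustered, each cluster becomes one identification task for the crowd, rather than thousands of near-duplicate frames.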

Enjoy the Wikipedia text adventure and mine the list of games for ideas on building data-related games.

Mistaken Location of Creativity in “Machine Creativity Beats Some Modern Art”

Friday, June 30th, 2017

Machine Creativity Beats Some Modern Art

From the post:

Creativity is one of the great challenges for machine intelligence. There is no shortage of evidence showing how machines can match and even outperform humans in vast areas of endeavor, such as face and object recognition, doodling, image synthesis, language translation, a vast variety of games such as chess and Go, and so on. But when it comes to creativity, the machines lag well behind.

Not through lack of effort. For example, machines have learned to recognize artistic style, separate it from the content of an image, and then apply it to other images. That makes it possible to convert any photograph into the style of Van Gogh’s Starry Night, for instance. But while this and other work provides important insight into the nature of artistic style, it doesn’t count as creativity. So the challenge remains to find ways of exploiting machine intelligence for creative purposes.

Today, we get some insight into progress in this area thanks to the work of Ahmed Elgammal at the Art & AI Laboratory at Rutgers University in New Jersey, along with colleagues at Facebook’s AI labs and elsewhere.
… (emphasis in original)

This summary of CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms by Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, and Marian Mazzone repeats a mistake made by the authors: the misplacement of creativity.

Creativity, indeed, even art itself, is easily argued to reside in the viewer (reader) and not the creator at all.

To illustrate, I quote a long passage from Stanley Fish’s How to Recognize a Poem When You See One below, but a quick summary/reminder goes like this:

Fish was teaching back to back classes in the same classroom and for the first class, wrote a list of authors on the blackboard. After the first class ended but before the second class, a poetry class, arrived, he enclosed the list of authors in a rectangle and wrote a page number, as though the list was from a book. When the second class arrived, he asked them to interpret the “poem” that was on the board. Which they proceeded to do. Where would you locate creativity in that situation?

The longer and better written start of the story (by Fish):

[1] Last time I sketched out an argument by which meanings are the property neither of fixed and stable texts nor of free and independent readers but of interpretive communities that are responsible both for the shape of a reader’s activities and for the texts those activities produce. In this lecture I propose to extend that argument so as to account not only for the meanings a poem might be said to have but for the fact of its being recognized as a poem in the first place. And once again I would like to begin with an anecdote.

[2] In the summer of 1971 I was teaching two courses under the joint auspices of the Linguistic Institute of America and the English Department of the State University of New York at Buffalo. I taught these courses in the morning and in the same room. At 9:30 I would meet a group of students who were interested in the relationship between linguistics and literary criticism. Our nominal subject was stylistics but our concerns were finally theoretical and extended to the presuppositions and assumptions which underlie both linguistic and literary practice. At 11:00 these students were replaced by another group whose concerns were exclusively literary and were in fact confined to English religious poetry of the seventeenth century. These students had been learning how to identify Christian symbols and how to recognize typological patterns and how to move from the observation of these symbols and patterns to the specification of a poetic intention that was usually didactic or homiletic. On the day I am thinking about, the only connection between the two classes was an assignment given to the first which was still on the blackboard at the beginning of the second. It read:

Jacobs-Rosenbaum
Levin
Thorne
Hayes
Ohman (?)

[3] I am sure that many of you will already have recognized the names on this list, but for the sake of the record, allow me to identify them. Roderick Jacobs and Peter Rosenbaum are two linguists who have coauthored a number of textbooks and coedited a number of anthologies. Samuel Levin is a linguist who was one of the first to apply the operations of transformational grammar to literary texts. J. P. Thorne is a linguist at Edinburgh who, like Levin, was attempting to extend the rules of transformational grammar to the notorious irregularities of poetic language. Curtis Hayes is a linguist who was then using transformational grammar in order to establish an objective basis for his intuitive impression that the language of Gibbon’s Decline and Fall of the Roman Empire is more complex than the language of Hemingway’s novels. And Richard Ohmann is the literary critic who, more than any other, was responsible for introducing the vocabulary of transformational grammar to the literary community. Ohmann’s name was spelled as you see it here because I could not remember whether it contained one or two n’s. In other words, the question mark in parenthesis signified nothing more than a faulty memory and a desire on my part to appear scrupulous. The fact that the names appeared in a list that was arranged vertically, and that Levin, Thorne, and Hayes formed a column that was more or less centered in relation to the paired names of Jacobs and Rosenbaum, was similarly accidental and was evidence only of a certain compulsiveness if, indeed, it was evidence of anything at all.

[4] In the time between the two classes I made only one change. I drew a frame around the assignment and wrote on the top of that frame “p. 43.” When the members of the second class filed in I told them that what they saw on the blackboard was a religious poem of the kind they had been studying and I asked them to interpret it. Immediately they began to perform in a manner that, for reasons which will become clear, was more or less predictable. The first student to speak pointed out that the poem was probably a hieroglyph, although he was not sure whether it was in the shape of a cross or an altar. This question was set aside as the other students, following his lead, began to concentrate on individual words, interrupting each other with suggestions that came so quickly that they seemed spontaneous. The first line of the poem (the very order of events assumed the already constituted status of the object) received the most attention: Jacobs was explicated as a reference to Jacob’s ladder, traditionally allegorized as a figure for the Christian ascent to heaven. In this poem, however, or so my students told me, the means of ascent is not a ladder but a tree, a rose tree or rosenbaum. This was seen to be an obvious reference to the Virgin Mary who was often characterized as a rose without thorns, itself an emblem of the immaculate conception. At this point the poem appeared to the students to be operating in the familiar manner of an iconographic riddle. It at once posed the question, “How is it that a man can climb to heaven by means of a rose tree?” and directed the reader to the inevitable answer: by the fruit of that tree, the fruit of Mary’s womb, Jesus. Once this interpretation was established it received support from, and conferred significance on, the word “thorne,” which could only be an allusion to the crown of thorns, a symbol of the trial suffered by Jesus and of the price he paid to save us all. 
It was only a short step (really no step at all) from this insight to the recognition of Levin as a double reference, first to the tribe of Levi, of whose priestly function Christ was the fulfillment, and second to the unleavened bread carried by the children of Israel on their exodus from Egypt, the place of sin, and in response to the call of Moses, perhaps the most familiar of the old testament types of Christ. The final word of the poem was given at least three complementary readings: it could be “omen,” especially since so much of the poem is concerned with foreshadowing and prophecy; it could be Oh Man, since it is man’s story as it intersects with the divine plan that is the poem’s subject; and it could, of course, be simply “amen,” the proper conclusion to a poem celebrating the love and mercy shown by a God who gave his only begotten son so that we may live.

[5] In addition to specifying significances for the words of the poem and relating those significances to one another, the students began to discern larger structural patterns. It was noted that of the six names in the poem three–Jacobs, Rosenbaum, and Levin–are Hebrew, two–Thorne and Hayes–are Christian, and one–Ohman–is ambiguous, the ambiguity being marked in the poem itself (as the phrase goes) by the question mark in parenthesis. This division was seen as a reflection of the basic distinction between the old dispensation and the new, the law of sin and the law of love. That distinction, however, is blurred and finally dissolved by the typological perspective which invests the old testament events and heroes with new testament meanings. The structure of the poem, my students concluded, is therefore a double one, establishing and undermining its basic pattern (Hebrew vs. Christian) at the same time. In this context there is finally no pressure to resolve the ambiguity of Ohman since the two possible readings–the name is Hebrew, the name is Christian–are both authorized by the reconciling presence in the poem of Jesus Christ. Finally, I must report that one student took to counting letters and found, to no one’s surprise, that the most prominent letters in the poem were S, O, N.

The account by Fish isn’t long and is highly recommended if you are interested in this issue.

If readers/viewers interpret images as art, is the “creativity” of the process that brought it into being even meaningful? Or does polling of viewers measure their appreciation of an image as art, without regard to the process that created it? Exactly what are we measuring when polling such viewers?

By Fish’s account, such a poll tells us a great deal about the viewers but nothing about the creator of the art.

FYI, that same lesson applies to column headers, metadata keys, and indeed, data itself, which means the “meaning” of what you wrote may be obvious to you, but not to anyone else.

Topic maps can increase your odds of being understood or discovering the understanding captured by others.

Neo4j 3.3.0-alpha02 (Graphs For Schemas?)

Friday, June 30th, 2017

Neo4j 3.3.0-alpha02

A bit late (release was 06/15/2017) but give Neo4j 3.3.0-alpha02 a spin over the weekend.

From the post:


Detailed Changes and Docs

For the complete list of all changes, please see the changelog. Look for 3.3 Developer manual here, and 3.3 Operations manual here.

Neo4j is one of the graph engines a friend wants to use for analysis/modeling of the ODF 1.2 schema. The traditional indented list is only one tree visualization out of the four major ones.

(From: Trees & Graphs by Nathalie Henry Riche, Microsoft Research)

Riche’s presentation covers a number of other ways to visualize trees and, if you relax the “tree” requirement for display, graph visualizations that may give insight into a schema design.
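To make the indented-list versus node-link distinction concrete, here is a stdlib-only sketch rendering the same tree both ways. The schema fragment is a made-up stand-in, not the actual ODF 1.2 schema; the edge-list form is what a graph engine such as Neo4j would ingest.

```python
# Sketch: the same tree rendered two ways, the familiar indented list
# and a flat (parent, child) edge list suitable for a graph engine.
# The element names are an invented ODF-like fragment.

tree = ("office:document", [
    ("office:meta", []),
    ("office:body", [
        ("office:text", []),
        ("office:spreadsheet", []),
    ]),
])

def indented(node, depth=0):
    """Classic indented-list view of a (name, children) tree."""
    name, children = node
    lines = ["  " * depth + name]
    for child in children:
        lines.extend(indented(child, depth + 1))
    return lines

def edges(node):
    """Node-link view: a flat list of (parent, child) edges."""
    name, children = node
    out = []
    for child in children:
        out.append((name, child[0]))
        out.extend(edges(child))
    return out

print("\n".join(indented(tree)))
print(edges(tree))
```

The edge list carries exactly the same information as the indented list, but frees the layout engine to draw radial, treemap, or force-directed views.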

The slides are part of the materials for CSE512 Data Visualization (Winter 2014), so references for visualizing trees and graphs need to be updated. Check the course resources link for more visualization resources.

Reinventing Wheels with No Wheel Experience

Friday, June 30th, 2017

Rob Graham, @ErrataRob, captured an essential truth when he tweeted:

Wheel re-invention is inherent in every new programming language, every new library, and no doubt, nearly every new program.

How much “wheel experience” does any programmer have across the breadth of software vulnerabilities?

Hard to imagine meaningful numbers on the “wheel experience” of programmers in general, but vulnerability reports make it clear that either “wheel experience” is lacking or the lesson didn’t stick. Your call.

Vulnerabilities may occur in any release so standard practice is to check every release, however small. Have your results independently verified by trusted others.

PS: For the details on systemd, see: Sergey Bratus and the systemd thread.

If Silo Owners Love Their Children Too*

Friday, June 30th, 2017

* Apologies to Sting for the riff on the lyrics to Russians.

Topic Maps Now by Michel Biezunski.

From the post:

This article is my assessment on where Topic Maps are standing today. There is a striking contradiction between the fact that many web sites are organized as a set of interrelated topics — Wikipedia for example — and the fact that the name “Topic Maps” is hardly ever mentioned. In this paper, I will show why this is happening and advocate that the notions of topic mapping are still useful, even if they need to be adapted to new methods and systems. Furthermore, this flexibility in itself is a guarantee that they are still going to be relevant in the long term.

I have spent many years working with topic maps. I took part in the design of the initial topic maps model, I started the process to transform the conceptual model into an international standard. We published the first edition of Topic Maps ISO/IEC 13250 in 2000, and an update a couple of years later in XML. Several other additions to the standard were published since then, the most recent one in 2015. During the last 15 years, I have helped clients create and manage topic map applications, and I am still doing it.

An interesting read, some may quibble over the details, but my only serious disagreement comes when Michel says:


When we created the Topic maps standard, we created something that turned out to be a solution without a problem: the possibility to merge knowledge networks across organizations. Despite numerous expectations and many efforts in that direction, this didn’t prove to meet enough demands from users.

On the contrary, the inability “…to merge knowledge networks across organizations” is a very real problem. It has existed since there was more than one record that captured information about the same subject, inconsistently. That original event has been lost in the depths of time.

The inability “…to merge knowledge networks across organizations” has persisted to this day, relieved only on occasion by the use of the principles developed as part of the topic maps effort.

If “mistake” it was, the “mistake” of topic maps was failing to realize that silo owners have an investment in the maintenance of their silos. Silos distinguish them from other silo owners, make them important both intra and inter organization, make the case for their budgets, their staffs, etc.

To argue that silos create inefficiencies for an organization is to mistake efficiency as a goal of the organization. There’s no universal ordering of the goals of organizations (commercial or governmental) but preservation or expansion of scope, budget, staff, prestige, mission, all trump “efficiency” for any organization.

Unfunded “benefits for others” (including the public) falls into the same category as “efficiency.” Unfunded “benefits for others” is also a non-goal of organizations, including governmental ones.

Want to appeal to silo owners?

Appeal to silo owners on the basis of extending their silos to consume the silos of others!

Market topic maps not as leading to a Kumbaya state of openness and stupor but of aggressive assimilation of other silos.

If the CIA assimilates part of the NSA, or the NSA assimilates part of the FSB, or the FSB assimilates part of the MSS, what is assimilated, on what basis, and which results are shared isn’t decided by topic maps. Those issues are decided by the silo owners paying for the topic map.

Topic maps and subject identity are non-partisan tools that enable silo poaching. If you want to share your results, that’s your call, not mine and certainly not topic maps.

Open data, leaking silos, envious silo owners, the topic maps market is so bright I gotta wear shades.**

** Unseen topic maps may be robbing you of the advantages of your silo even as you read this post. Whose silo(s) do you covet?

Fuzzing To Find Subjects

Thursday, June 29th, 2017

Guido Vranken‘s post: The OpenVPN post-audit bug bonanza is an important review of bugs discovered in OpenVPN.

Jump to “How I fuzzed OpenVPN” for the details on Vranken fuzzing OpenVPN.

Not for the novice but an inspiration to devote time to the art of fuzzing.

The Open Web Application Security Project (OWASP) defines fuzzing this way:

Fuzz testing or Fuzzing is a Black Box software testing technique, which basically consists in finding implementation bugs using malformed/semi-malformed data injection in an automated fashion.
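The definition above fits in a few lines of code. This toy sketch uses Python’s JSON parser as a stand-in target; a real fuzzer such as BFF adds instrumentation, crash triage, and much smarter mutation. It mutates a valid input at random, in an automated loop, and records every case the target chokes on:

```python
# Toy illustration of fuzzing: flip a few bytes of an otherwise valid
# input, feed it to the target in an automated loop, and collect the
# inputs that make the target raise. Python's json parser stands in
# for real software under test.

import json
import random

def mutate(data: bytes, flips: int = 3) -> bytes:
    """Flip a few random bytes of a valid input."""
    buf = bytearray(data)
    for _ in range(flips):
        i = random.randrange(len(buf))
        buf[i] = random.randrange(256)
    return bytes(buf)

def fuzz(target, seed: bytes, rounds: int = 200):
    """Run the target on many malformed variants; collect failures."""
    crashers = []
    for _ in range(rounds):
        case = mutate(seed)
        try:
            target(case)
        except Exception:  # a real fuzzer distinguishes crash types
            crashers.append(case)
    return crashers

random.seed(0)  # deterministic for the example
found = fuzz(lambda b: json.loads(b.decode("utf-8", "replace")),
             b'{"user": "alice", "admin": false}')
print(f"{len(found)} mutated inputs were rejected by the parser")
```

A parse error is harmless here; the interesting finds in C code are the inputs that corrupt memory instead of raising a clean error.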

OWASP’s fuzzing page mentions a number of resources and tools, but omits the Basic Fuzzing Framework by CERT. That’s odd, don’t you think?

The CERT Basic Fuzzing Framework (BFF) is current through 2016. Allen Householder has a description of version 2.8 at: Announcing CERT Basic Fuzzing Framework Version 2.8. For details on BFF, see: CERT BFF – Basic Fuzzing Framework.

Caution: One resource in the top ten (#9) for “fuzzing software” is: Fuzzing: Brute Force Vulnerability Discovery, by Michael Sutton, Adam Greene, and Pedram Amini. Great historical reference but it was published in 2007, some ten years ago. Look for more recent literature and software.

Fuzzing is obviously an important technique for finding subjects (read: vulnerabilities) in software, whether your intent is to fix those vulnerabilities or use them for your own purposes.

While reading Vranken‘s post, it occurred to me that “fuzzing” is also useful in discovering subjects in unmapped data sets.

Not all nine-digit numbers are Social Security Numbers, but if you find a column of such numbers alongside what look like street addresses and zip codes, it would not be a bad guess. Of course, if it is a 16-digit number, a criminal opportunity may be knocking at your door (a credit card number).
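A first cut at such guessing can be sketched with value-shape heuristics. The patterns below are deliberately naive illustrations: real SSN or credit-card detection needs checksums (e.g., Luhn for card numbers) and surrounding context, not just digit counts.

```python
# Sketch: guess what a column of strings holds from the "shape" of
# its values. Naive on purpose; real detection needs checksums and
# context. Patterns are checked in order, first sufficient match wins.

import re

PATTERNS = {
    "ssn?":         re.compile(r"^\d{3}-?\d{2}-?\d{4}$"),
    "credit-card?": re.compile(r"^\d{16}$"),
    "zip?":         re.compile(r"^\d{5}(-\d{4})?$"),
}

def guess_column(values, min_ratio=0.9):
    """Label a column if at least `min_ratio` of values fit a pattern."""
    for label, pat in PATTERNS.items():
        hits = sum(1 for v in values if pat.match(v))
        if hits >= min_ratio * len(values):
            return label
    return "unknown"

print(guess_column(["123-45-6789", "987-65-4321"]))  # ssn?
print(guess_column(["4111111111111111"]))            # credit-card?
print(guess_column(["30301", "90210-1234"]))         # zip?
```

The trailing question marks are the point: these are candidate subject identifications to be confirmed, not assertions.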

While TMDM topic maps emphasized the use of URIs for subject identifiers, we all know that subject identifications outside of topic maps are more complex than string matching and far messier.

How would you create “fuzzy” searches to detect subjects across different data sets? Are there general principles for classes of subjects?

While your results might be presented as a curated topic map, the grist for that map would originate in the messy details of diverse information.

This sounds like an empirical question to me, especially since most search engines offer API access.

Thoughts?

Tor descriptors à la carte: Tor Metrics Library 2

Thursday, June 29th, 2017

Tor descriptors à la carte: Tor Metrics Library 2.

From the post:

We’re often asked by researchers, users, and journalists for Tor network data. How can you find out how many people use the Tor network daily? How many relays make up the network? How many times has Tor Browser been downloaded in your country? In order to get to these answers from archived data, we have to continuously fetch, parse, and evaluate Tor descriptors. We do this with the Tor Metrics Library.

Today, the Tor Metrics Team is proud to announce major improvements and launch Tor Metrics Library version 2.0.0. These improvements, supported by a Mozilla Open Source Support (MOSS) “Mission Partners” award, enhance our ability to monitor the performance and stability of the Tor network.

Tutorials too! How very cool!

From the tutorials page:

“Tor metrics are the ammunition that lets Tor and other security advocates argue for a more private and secure Internet from a position of data, rather than just dogma or perspective.”
— Bruce Schneier (June 1, 2016)

Rocks!

Encourage your family, friends, and visitors to all use Tor. Consider an auto-updated display of Tor statistics to drive further use.

Relying on governments, vendors, and other interested parties for your security is, by definition, insecurity.

Targeting Data: Law Firms

Thursday, June 29th, 2017

Law Firm Cyber Security Scorecard

From the webpage:

If you believe your law firm is cyber secure, we recommend that you download this report. We believe you will be quite surprised at the state of the law firm industry as it relates to cyber security. This report demonstrates three key findings. First, law firms are woefully insecure. Second, billions of dollars are at-risk from corporate and government clients. Third, there exists little transparency between firms and clients about this issue.

How do we know this? LOGICFORCE surveyed and assessed over 200 law firms, ranging in size from 1 to 450+ total attorneys, located throughout the United States, working in a full complement of practice areas. The insights in this study come from critical data points gathered through authorized collection of anonymized LOGICFORCE system monitoring data, responses to client surveys, our proprietary SYNTHESIS E-IT SECURE™ assessments and published industry information.

Key Findings:

  • Every law firm assessed was targeted for confidential client data in 2016-2017. Approximately 40% did not know they were breached.
  • We see consistent evidence that cyber attacks on law firms are non-discriminatory. Size and revenues don’t seem to matter.
  • Only 23% of firms have cybersecurity insurance policies.
  • 95% of assessments conducted by LOGICFORCE show firms are not compliant with their data governance and cyber security policies.
  • 100% of those firms are not compliant with their client’s policy standards.

LOGICFORCE does not want your law firm to make headlines for the wrong reasons. Download this report now so you can understand your risks and begin to take appropriate action.

The “full report,” which I downloaded, is a sales brochure for LOGICFORCE and not a detailed technical analysis. (12 pages including cover front and back.)

It signals the general cyber vulnerability of law firms, but says little about what works, what doesn’t, security by practice area, etc.

The Panama Papers provided a start on much needed transparency for governments and the super wealthy. That start was the result of a breach at one (1) law firm.

Martindale.com lists over one million (1,000,000) lawyers and law firms from around the world.

The Panama Papers and following fallout were the result of breaching 1 out of 1,000,000+ lawyers and law firms.

Do you ever wonder what lies hidden in the remaining 1,000,000+ lawyers and law firms?

According to LOGICFORCE, that desire isn’t a difficult one to satisfy.

Fleeing the Country?

Thursday, June 29th, 2017

Laws on Extradition of Citizens – Library of Congress Report.

Freedom/resistance fighters need to bookmark this report! A bit dated (2013) but still a serviceable guide to extradition laws in 157 countries.

The extradition map, reduced in scale here, is encouraging:

Always consult legal professionals for updated information and realize that governments make and choose the laws they will enforce. Your local safety in a “no extradition” country depends upon the whims and caprices of government officials.

Just like your cybersecurity, take multiple steps to secure yourself against unwanted government attention, both local and foreign.

ANTLR Parser Generator (4.7)

Wednesday, June 28th, 2017

ANTLR Parser Generator

From the about page:

ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It’s widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.

Aside from these big-name, high-profile projects, you can build all sorts of useful tools like configuration file readers, legacy code converters, wiki markup renderers, and JSON parsers. I’ve built little tools for object-relational database mappings, describing 3D visualizations, injecting profiling code into Java source code, and have even done a simple DNA pattern matching example for a lecture.

From a formal language description called a grammar, ANTLR generates a parser for that language that can automatically build parse trees, which are data structures representing how a grammar matches the input. ANTLR also automatically generates tree walkers that you can use to visit the nodes of those trees to execute application-specific code.

There are thousands of ANTLR downloads a month and it is included on all Linux and OS X distributions. ANTLR is widely used because it’s easy to understand, powerful, flexible, generates human-readable output, comes with complete source under the BSD license, and is actively supported.
… (emphasis in original)
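ANTLR’s generated code is far richer than this, but the two ideas in the quote, a parse tree built from input and a walker that visits its nodes, can be illustrated with a stdlib-only toy. The grammar (a sum of numbers) and all function names are invented for the example; this is not ANTLR’s API.

```python
# Not ANTLR itself, just a sketch of the two ideas: a parse tree and
# a tree walker. Toy grammar: expr := NUMBER ('+' NUMBER)*
# Nodes are (label, children) tuples; leaves hold the matched text.

def parse(text):
    """Build a parse tree for sums like '1 + 2 + 3'."""
    numbers = [t.strip() for t in text.split("+")]
    return ("expr", [("number", [n]) for n in numbers])

def walk(node, visitor, depth=0):
    """Depth-first walker calling the visitor at every tree node."""
    label, children = node
    visitor(label, children, depth)
    for child in children:
        if isinstance(child, tuple):  # skip leaf text
            walk(child, visitor, depth + 1)

seen = []
walk(parse("1 + 2 + 3"),
     lambda label, children, depth: seen.append((depth, label)))
print(seen)  # [(0, 'expr'), (1, 'number'), (1, 'number'), (1, 'number')]
```

ANTLR generates the equivalent of `parse` from a grammar file and hands you listener/visitor hooks instead of a bare callback, which is where the real labor savings come from.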

A friend wants to explore the OpenOffice schema by visualizing a parse from the Multi-Schema Validator.

ANTLR is probably more firepower than needed but the extra power may encourage creative thinking. Maybe.

Enjoy!

MS Streamlines Malware Delivery

Tuesday, June 27th, 2017

Microsoft is building a smart antivirus using 400 million PCs by Alfred Ng.

Malware delivery takes a giant leap forward with the MS Fall Creators Update:


If new malware is detected on any computer running Windows 10 in the world, Microsoft said it will be able to develop a signature for it and protect all the other users worldwide. The first victim will be safe as well because the virus will be set off in a virtual sandbox on the cloud, not on the person’s device.

Microsoft sees artificial intelligence as the next solution for security as attacks get more sophisticated.

“If we’re going to stay on top of anything that is changing that fast, you have to automate,” Lefferts said.

About 96 percent of detected cyberattacks are brand new, he noted.

With Microsoft’s current researchers working at their fastest pace, it can take a few hours to develop protections from the first moment they detect malware.

It’s during those few hours when people are really hit by malware. Using cloud data from Microsoft Office to develop malware signatures is crucial, for example, because recent attacks relied on Word vulnerabilities.

Two scenarios immediately come to mind:

  1. The “malware” detection is false, the file/operation/URL is benign, but now 400 million computers see it as “malware,” or,
  2. Due to man-in-the-middle (MITM) attacks, false reports are sent to Windows computers on a particular sub-net.
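The mechanism described reduces to a shared blocklist of file hashes, and sketching it shows how scenario 1 plays out. All names and data here are invented; real cloud protection uses far richer signals than a single hash.

```python
# Sketch of cloud-distributed signature matching: one detection
# anywhere adds a hash to a shared blocklist that every endpoint
# consults. The same mechanism propagates a false positive to every
# endpoint. Names and data are invented.

import hashlib

cloud_signatures = set()  # stands in for the shared cloud service

def report_malware(sample: bytes):
    """A detection on any endpoint publishes the sample's hash."""
    cloud_signatures.add(hashlib.sha256(sample).hexdigest())

def endpoint_allows(sample: bytes) -> bool:
    """Every endpoint blocks anything whose hash is on the list."""
    return hashlib.sha256(sample).hexdigest() not in cloud_signatures

benign = b"quarterly-report.docx contents"
assert endpoint_allows(benign)  # initially fine everywhere

report_malware(benign)          # one mistaken detection...
print(endpoint_allows(benign))  # False: now blocked on every endpoint
```

The speed that makes the scheme attractive is exactly what makes a bad signature, however it got onto the list, a global event.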

Global security decision making is a great leap, but the question is in what direction?

PS: Did you notice the claim “96 percent of detected cyberattacks are brand new…”? I ask because that’s inconsistent with the documented long lives of cyber exploits, Website Security Statistics Report 2015 (WhiteHat Security).

Re-imagining Legislative Data – Semantic Integration Alert

Tuesday, June 27th, 2017

Innovate, Integrate, and Legislate: Announcing an App Challenge by John Pull.

From the post:

This morning, on Tuesday, June 27, 2017, Library of Congress Chief Information Officer Bernard A. Barton, Jr., is scheduled to make an in-person announcement to the attendees of the 2017 Legislative Data & Transparency Conference in the CVC. Mr. Barton will deliver a short announcement about the Library’s intention to launch a legislative data App Challenge later this year. This pre-launch announcement will encourage enthusiasts and professionals to bring their app-building skills to an endeavor that seeks to create enhanced access and interpretation of legislative data.

The themes of the challenge are INNOVATE, INTEGRATE, and LEGISLATE. Mr. Barton’s remarks are below:

Here in America, innovation is woven into our DNA. A week from today our nation celebrates its 241st birthday, and those years have been filled with great minds who surveyed the current state of affairs, analyzed the resources available to them, and created devices, systems, and ways of thinking that created a better future worldwide.

The pantheon includes Benjamin Franklin, George Washington Carver, Alexander Graham Bell, Bill Gates, and Steve Jobs. It includes first-generation Americans like Nikolai Tesla and Albert Einstein, for whom the nation was an incubator of innovation. And it includes brilliant women such as Grace Hopper, who led the team that invented the first computer language compiler, and Shirley Jackson, whose groundbreaking research with subatomic particles enabled the inventions of solar cells, fiber-optics, and the technology that brings us something we use every day: call waiting and caller ID.

For individuals such as these, the drive to innovate takes shape through understanding the available resources, surveying the landscape for what’s currently possible, and taking it to the next level. It’s the 21st Century, and society benefits every day from new technology, new generations of users, and new interpretations of the data surrounding us. Social media and mobile technology have rewired the flow of information, and some may say it has even rewired the way our minds work. So then, what might it look like to rewire the way we interpret legislative data?

It can be said that the legislative process – at a high level – is linear. What would it look like if these sets of legislative data were pushed beyond a linear model and into dimensions that are as-yet-unexplored? What new understandings wait to be uncovered by such interpretations? These understandings could have the power to evolve our democracy.

That’s a pretty grand statement, but it’s not without basis. The sets of data involved in this challenge are core to a legislative process that is centuries old. It’s the source code of American government. An informed citizenry is better able to participate in our democracy, and this is a very real opportunity to contribute to a better understanding of the work being done in Washington. It may even provide insights for the people doing the work around the clock, both on the Hill, and in state and district offices. Your innovation and integration may ultimately benefit the way our elected officials legislate for our future.

Improve the future, and be a part of history.

The 2017 Legislative Data App Challenge will launch later this summer. Over the next several weeks, information will be made available at loc.gov/appchallenge, and individuals are invited to connect via appchallenge@loc.gov.

I mention this as a placeholder only because Pull’s post is general enough to mean several things, their opposites, or something entirely different.

The gist of the post is that later this summer (2017), a challenge involving an “app” will be announced. The “app” will access/deliver/integrate legislative data. Beyond that, no further information is available at this time.

Watch for future posts as more information becomes available.

Impact of Microsoft Leaks On Programming Practice

Tuesday, June 27th, 2017

Mohit Kumar’s great graphic:

leads his story: Microsoft’s Private Windows 10 Internal Builds and Partial Source Code Leaked Online.

The use of MS source code for discovery of vulnerabilities is obvious.

Less obvious questions:

  • Do programmers follow leaked MS source code?
  • Do programmers following leaked MS source code commit similar vulnerability errors?

Evidence for a public good argument for not spreading leaked MS source code anyone?

Alert! IAB workshop on Explicit Internet Naming Systems

Monday, June 26th, 2017

IAB workshop on Explicit Internet Naming Systems by Cindy Morgan.

From the post:

Internet namespaces rely on Internet connected systems sharing a common set of assumptions on the scope, method of resolution, and uniqueness of the names. That set of assumptions allowed the creation of URIs and other systems which presumed that you could authoritatively identify a service using an Internet name, a service port, and a set of locally-significant path elements.

There are now multiple challenges to maintaining that commonality of understanding.

  • Some naming systems wish to use URIs to identify both a service and the method of resolution used to map the name to a serving node. Because there is no common facility for varying the resolution method in the URI structure, those naming systems must either mint new URI schemes for each resolution service or infer the resolution method from a reserved name or pattern. Both methods are currently difficult and costly, and the effort thus scales poorly.
  • Users’ intentions to refer to specific names are now often expressed in voice input, gestures, and other methods which must be interpreted before being put into practice. The systems which carry on that interpretation often infer which intent a user is expressing, and thus what name is meant, by contextual elements. Those systems are linked to existing systems who have no access to that context and which may thus return results or create security expectations for an unintended name.
  • Unicode allows for both combining characters and composed characters when local language communities have different practices. When these do not have a single normalization, context is required to determine which to produce or assume in resolution. How can this context be maintained in Internet systems?

While any of these challenges could easily be the topic of a stand-alone effort, this workshop seeks to explore whether there is a common set of root problems in the explicitness of the resolution context, heuristic derivation of intent, or language matching. If so, it seeks to identify promising areas for the development of new, more explicit naming systems for the Internet.

We invite position papers on this topic to be submitted by July 28, 2017 to ename@iab.org. Decisions on accepted submissions will be made by August 11, 2017.

Proposed dates for the workshop are September 28th and 29th, 2017 and the proposed location is in the Pacific North West of North America. Finalized logistics will be announced prior to the deadline for submissions.

When I hear “naming” and “Internet” in the same sentence, I think of the line, “Oh no, no, please God help me!,” from Black Sabbath’s Black Sabbath.

Well, except that the line needs to read:

Oh no, no, please God help us!

since any proposal is likely to impact users across the Internet.

The most frightening part of the call for proposals reads:

While any of these challenges could easily be the topic of a stand-alone effort, this workshop seeks to explore whether there is a common set of root problems in the explicitness of the resolution context, heuristic derivation of intent, or language matching. If so, it seeks to identify promising areas for the development of new, more explicit naming systems for the Internet.

Are we doing a clean reboot on the problem of naming? “…[A] common set of root problems….[?]”

Research on and design of “more” explicit naming systems for the Internet could result in proposals subject to metric evaluations. Looking for common “root problems” in naming systems is a recipe for navel gazing.
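The third challenge in the call, combining versus composed characters, is easy to demonstrate. A minimal Python sketch (my illustration, not from the workshop call):

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9)
composed = "caf\u00e9"
# "é" as "e" followed by a combining acute accent (U+0301)
combining = "cafe\u0301"

# The two strings render identically but are not equal
# code-point-for-code-point, so a naive resolver would
# treat them as two different names.
print(composed == combining)  # False

# Normalizing both to NFC (or NFD) restores a single identity.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", combining)
print(nfc_a == nfc_b)  # True
```

Every Internet system in the resolution path has to agree on when (and whether) that normalization happens, which is exactly the context problem the workshop describes.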

Improved Tracking of .onion links by Facebook

Sunday, June 25th, 2017

Improved sharing of .onion links on Facebook by Will Shackleton.

From the post:

Today we are rolling out two new features on Facebook to improve the experience of sharing, discovering and clicking .onion links to Tor hidden services especially for people who are not on Tor.

First, Facebook can now show previews for .onion links. Hidden service owners can use Open Graph tags to customise these previews, much like regular websites do.

Second, people who are not using Tor and click on .onion links will now see a message informing them that the link they clicked may not work. The message enables people to find out more about Tor and – for hidden services which have opted in – helps visit the service’s equivalent regular website. For people who are already using Tor, we send them straight through to the hidden service without showing any message.

Try sharing your favorite .onion link on Facebook and let us know in the comments what you think about our improvements!

This is a very bad plan!

If you are:

not using Tor and click on .onion links will now see a message informing them that the link they clicked may not work.

then Facebook has captured your non-Tor access of that link.

Accessing .onion links on Facebook without using Tor is, in the words of Admiral Ackbar: “It’s a trap!”
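The routing Facebook describes implies a check something like the following sketch. The function names and return values are my invention, not Facebook’s code; the point is that the click is observable either way:

```python
from urllib.parse import urlparse

def is_onion_link(url: str) -> bool:
    """True if the URL points at a Tor hidden service."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")

def handle_click(url: str, user_on_tor: bool) -> str:
    # Hypothetical routing: Tor users pass straight through to the
    # hidden service, everyone else gets the warning interstitial.
    # Either branch runs on Facebook's side of the click.
    if not is_onion_link(url):
        return "normal"
    return "pass-through" if user_on_tor else "warn"

print(handle_click("http://facebookcorewwwi.onion/", user_on_tor=False))  # warn
print(handle_click("http://facebookcorewwwi.onion/", user_on_tor=True))   # pass-through
```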

Consumer Warning: Stale Passwords For Sale

Sunday, June 25th, 2017

Russian hackers are selling British officials’ passwords by Alfred Ng.

The important takeaway: the passwords are from a 2012 LinkedIn breach.

Unless you like paying for and mining low-grade ore, consider passing on this offer.

Claims of stolen government passwords don’t make someone trustworthy. 😉

Statistical Functions for XQuery 3.1 (see OpenFormula)

Saturday, June 24th, 2017

simple-statsxq by Tim Thompson.

From the webpage:

Loosely inspired by the JavaScript simple-statistics project. The goal of this module is to provide a basic set of statistical functions for doing data analysis in XQuery 3.1.

Functions are intended to be implementation-agnostic.

Unit tests were written using the unit testing module of BaseX.

OpenFormula (Open Document Format for Office Applications (OpenDocument)) defines eighty-seven (87) statistical functions.

There are fifty-five (55) financial functions defined by OpenFormula, just in case you are interested.
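The flavor of the “basic set” such a module targets can be sketched with Python’s standard statistics module (Python rather than XQuery, so purely illustrative of the functions, not of simple-statsxq itself):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Descriptive basics that simple-statistics and the OpenFormula
# statistical functions both define.
print(statistics.mean(data))    # 5.0
print(statistics.median(data))  # 4.5
print(statistics.mode(data))    # 4.0
print(statistics.pstdev(data))  # 2.0 (population standard deviation)
```

An implementation-agnostic XQuery 3.1 version of the same functions is exactly what simple-statsxq sets out to provide.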

See Through Walls With WiFi!

Thursday, June 22nd, 2017

Drones that can see through walls using only Wi-Fi

From the post:

A Wi-Fi transmitter and two drones. That’s all scientists need to create a 3D map of the interior of your house. Researchers at the University of California, Santa Barbara have successfully demonstrated how two drones working in tandem can ‘see through’ solid walls to create 3D model of the interiors of a building using only, and we kid you not, only Wi-Fi signals.

As astounding as it sounds, researchers Yasamin Mostofi and Chitra R. Karanam have devised this almost superhero-level X-ray vision technology. “This approach utilizes only Wi-Fi RSSI measurements, does not require any prior measurements in the area of interest and does not need objects to move to be imaged,” explains Mostofi, who teaches electrical and computer engineering at the University.

For the paper and other details, see: 3D Through-Wall Imaging With Unmanned Aerial Vehicles and WiFi.

Before some contractor creates the Stingray equivalent for law enforcement, researchers and electronics buffs need to create new and improved versions for the public.

Government and industry offices are more complex than this demo but the technology will continue to improve.

I don’t have the technical ability to carry out the experiment, but I wonder whether measuring a strong signal from any source as it enters a building, and again as it exits on the far side, would serve the same purpose.

My reasoning: government/industry buildings may become shielded against some signals, but in an age of smartphones, not all of them.
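The entry/exit measurement idea can be sketched with a toy path-loss model: each wall a signal crosses subtracts a roughly fixed number of dB, so the gap between expected and observed strength hints at how much structure lies in between. All numbers below are illustrative, not from the paper:

```python
import math

def free_space_loss_db(distance_m: float, freq_mhz: float = 2400.0) -> float:
    """Standard free-space path loss (FSPL) in dB, distance in meters."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55

def received_dbm(tx_dbm: float, distance_m: float, walls: int,
                 wall_loss_db: float = 5.0) -> float:
    # Toy model: free-space loss plus a fixed per-wall penalty.
    return tx_dbm - free_space_loss_db(distance_m) - walls * wall_loss_db

# Same transmitter and distance, differing only in walls crossed:
clear = received_dbm(20.0, 30.0, walls=0)
blocked = received_dbm(20.0, 30.0, walls=3)

# The gap is the accumulated wall penalty -- the quantity a
# through-wall imager tries to invert into interior structure.
print(round(clear - blocked, 1))  # 15.0
```

The UCSB work does something far more sophisticated (many RSSI measurements along drone routes, solved as a tomography problem), but the per-wall attenuation above is the physical signal it exploits.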

Enjoy!

.Rddj (data journalism with R)

Wednesday, June 21st, 2017

.Rddj Hand-curated, high quality resources for doing data journalism with R by Timo Grossenbacher.

From the webpage:

The R Project is a great software environment for doing all sorts of data-driven journalism. It can be used for any of the stages of a typical data project: data collection, cleaning, analysis and even (interactive) visualization. And it’s all reproducible and transparent! Sure, it requires a fair amount of scripting, yet…

Do not fear! With this hand-curated (and opinionated) list of resources, you will be guided through the thick jungle of countless R packages, from learning the basics of R’s syntax, to scraping HTML tables, to a guide on how to make your work comprehensible and reproducible.

Now, enjoy your journey.

Some more efforts at persuasion: As I work in the media, I know how a lot of journalists are turned off by everything that doesn’t have a graphical interface with buttons to click on. However, you don’t need to spend days studying programming concepts in order to get started with R, as most operations can be carried out without applying scary things such as loops or conditionals – and, nowadays, high-level abstractions like dplyr make working with data a breeze. My advice if you’re new to data journalism or data processing in general: Better learn R than Excel, ’cause getting to know Excel (and the countless other tools that each do a single thing) doesn’t come for free, either.

This list is (partially) inspired by R for Journalists by Ed Borasky, which is another great resource for getting to know R.

… (emphasis in original)

The topics are familiar:

  • RStudio
  • Syntax and basic R programming
  • Collecting Data (from the Web)
  • Data cleaning and manipulation
  • Text mining / natural language processing
  • Exploratory data analysis and plotting
  • Interactive data visualization
  • Publication-quality graphics
  • Reproducibility
  • Examples of using R in (data) journalism
What makes this list of resources different from search results?

Hand curation.

How much of a difference?

Compare the search results of “R” + any of these categories to the resources here.

Bookmark .Rddj for data journalism and R, then ping me with the hand-curated list of resources you are creating.

Save yourself and the rest of us from search. Thanks!

    Storyzy A.I. Fights Fake Quotes (Ineffective Against Trump White House)

    Wednesday, June 21st, 2017

    In the battle against fake news, Storyzy A.I. fights fake quotes

    From the post:

    The Quote Verifier launched today by Storyzy takes the battle against fake news to a whole new automated level by conveniently flagging fake quotes on social networks and search engines with +50,000 new authentic quotes added daily.

    Storyzy aims to help social networks and search engines by spotting fake quotes. To fulfill this ambition Storyzy developed a tool (currently available in Beta version) that verifies whether a quote is authentic or not by checking if a person truly said that or not.
    … (emphasis in original)

    A tool for your short-list of verification tools to use on a daily basis.
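Storyzy’s internals aren’t public, but one naive approximation of quote verification, fuzzy-matching a candidate quote against a corpus of verified and attributed quotes, can be sketched with the standard library (the corpus, threshold, and function are all my invention):

```python
from difflib import SequenceMatcher

# Hypothetical corpus of verified, attributed quotes.
verified = {
    "the only thing we have to fear is fear itself":
        "Franklin D. Roosevelt",
}

def verify(quote: str, claimed_speaker: str,
           threshold: float = 0.9) -> bool:
    """True if the quote closely matches a verified quote by that speaker."""
    q = quote.lower().strip()
    for known, speaker in verified.items():
        score = SequenceMatcher(None, q, known).ratio()
        if score >= threshold and speaker == claimed_speaker:
            return True
    return False

print(verify("The only thing we have to fear is fear itself.",
             "Franklin D. Roosevelt"))  # True
print(verify("I never said that.", "Franklin D. Roosevelt"))  # False
```

A production system would need entity resolution, paraphrase handling, and a corpus measured in millions of quotes, which is presumably where Storyzy’s 50,000 new authentic quotes a day come in.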

    It’s ineffective against the Trump White House because accurate quotes can still be “false.”

    “Truthful quotes,” as per Trump White House policy, issue only from the President and must reflect what he meant to say. Subject to correction by the President.

A “truthful quote” consists of three parts:

    1. Said by the President
    2. Reflects what he meant to say
    3. Includes any subsequent correction by the President (one or more)

There is a simple solution for avoiding “false” quotes from President Trump:

    Never quote him or his tweets at all.

    Quote his lackeys, familiars and sycophants, but not him.

    A Dictionary of Victorian Slang (1909)

    Tuesday, June 20th, 2017

Passing English of the Victorian era, a dictionary of heterodox English, slang and phrase (1909) by J. Redding Ware.

    Quoted from the Preface:

HERE is a numerically weak collection of instances of ‘Passing English’. It may be hoped that there are errors on every page, and also that no entry is ‘quite too dull’. Thousands of words and phrases in existence in 1870 have drifted away, or changed their forms, or been absorbed, while as many have been added or are being added. ‘Passing English’ ripples from countless sources, forming a river of new language which has its tide and its ebb, while its current brings down new ideas and carries away those that have dribbled out of fashion. Not only is ‘Passing English’ general; it is local; often very seasonably local. Careless etymologists might hold that there are only four divisions of fugitive language in London: west, east, north and south. But the variations are countless. Holborn knows little of Petty Italia behind Hatton Garden, and both these ignore Clerkenwell, which is equally foreign to Islington proper; in the South, Lambeth generally ignores the New Cut, and both look upon Southwark as linguistically out of bounds; while in Central London, Clare Market (disappearing with the nineteenth century) had, if it no longer has, a distinct fashion in words from its great and partially surviving rival through the centuries, the world of Seven Dials, which is in St Giles’s, St James’s being practically in the next parish. In the East the confusion of languages is a world of ‘variants’; there must be half-a-dozen of Anglo-Yiddish alone, all, however, outgrown from the Hebrew stem. ‘Passing English’ belongs to all the classes, from the peerage class who have always adopted an imperfection in speech or frequency of phrase associated with the court, to the court of the lowest costermonger, who gives the fashion to his immediate entourage.

    A healthy reminder that language is no more fixed and unchanging than the people who use it.

    Enjoy!

    My Last Index (Is Search A Form of Discrimination?)

    Tuesday, June 20th, 2017

    My Last Index by Judith Pascoe.

    From the post:

    A casual reader of authors’ acknowledgment pages will encounter expressions of familial gratitude that paper over years of spousal neglect and missed cello recitals. A keen reader of those pages may happen upon animals that were essential to an author’s well-being—supportive dogs, diverting cats, or, in one instance, “four very special squirrels.” But even an assiduous reader of acknowledgments could go a lifetime without coming across a single shout-out to a competent indexer.

    That is mostly because the index gets constructed late in the book-making process. But it’s also because most readers pay no mind to indexes, especially at this moment in time when they are being supplanted by Amazon and Google. More and more, when I want to track down an errant tidbit of information about a book, I use Amazon’s “Search inside this book” function, which allows interested parties to access a book’s front cover, copyright, table of contents, first pages (and sometimes more), and index. But there’s no reason to even use the index when you can “Look Inside!” to find anything you need.

    I had plenty of time to ponder the unsung heroism of indexers when I was finishing my latest book. Twice before, I had assembled an indexer’s tools of trade: walking down the stationery aisles of a college book store, pausing to consider the nib and color of my Flair pens, halting before the index cards. But when I began work on this index, I was overcome with thoughts of doom that Nancy Mulvany, author of Indexing Books, attributes to two factors that plague self-indexing authors: general fatigue and too much self-involvement. “Intense involvement with one’s book,” Mulvany writes, “can make it very difficult to anticipate the index user’s needs accurately.”

    Perhaps my mood was dire because I’d lost the services of my favorite proofreader, a woman who knew a blackberry from a BlackBerry, and who could be counted on to fix my flawed French. Perhaps it was because I was forced to notice how often I’d failed to include page citations in my bibliography entries, and how inconsistently I’d applied the protocol for citing Web sites—a result of my failure to imagine a future index user so needy as to require the exact date of my visit to theirvingsociety.org.uk. Or perhaps it was because my daughter was six months away from leaving home for college and I was missing her in advance.

    Perhaps for all of those reasons, I could only see my latest index as a running commentary on the fragility of all human endeavor. And so I started reading indexes while reluctantly compiling my own.

    A highly instructive tale on the importance of indexing (and hiring a professional indexer) that includes this reference to Jonathan Swift:


    Jonathan Swift, in his 1704 A Tale of a Tub, describes two means of using books: “to serve them as men do lords—learn their titles exactly and then brag of their acquaintance,” or “the choicer, the profounder and politer method, to get a thorough insight into the index, by which the whole book is governed and turned, like fishes by the tail.”

    In full context, the Swift passage is even more amusing:


    The whole course of things being thus entirely changed between us and the ancients, and the moderns wisely sensible of it, we of this age have discovered a shorter and more prudent method to become scholars and wits, without the fatigue of reading or of thinking. The most accomplished way of using books at present is twofold: either first to serve them as some men do lords, learn their titles exactly, and then brag of their acquaintance; or, secondly, which is indeed the choicer, the profounder, and politer method, to get a thorough insight into the index by which the whole book is governed and turned, like fishes by the tail. For to enter the palace of learning at the great gate requires an expense of time and forms, therefore men of much haste and little ceremony are content to get in by the back-door. For the arts are all in a flying march, and therefore more easily subdued by attacking them in the rear. Thus physicians discover the state of the whole body by consulting only what comes from behind. Thus men catch knowledge by throwing their wit on the posteriors of a book, as boys do sparrows with flinging salt upon their tails. Thus human life is best understood by the wise man’s rule of regarding the end. Thus are the sciences found, like Hercules’ oxen, by tracing them backwards. Thus are old sciences unravelled like old stockings, by beginning at the foot. (The Tale of a Tub by Jonathan Swift)

    Searching, as opposed to indexing (good indexing at any rate), is the equivalent of bragging of the acquaintance of a lord. Yes, you did find term A or term B in the text, but you don’t know what other terms appear in the text, nor do you know what other statements were made about term A or term B.

    Search is at best a partial solution and one that varies based on the skill of the searcher.

    Indexing, on the other hand, can reflect an accumulation of insights, made equally available to all readers.
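The contrast can be sketched in code: substring search finds only the literal term you asked for, while a curated index maps a concept to every relevant locus, including pages where the term never appears. A toy illustration (mine, not Pascoe’s):

```python
# A tiny "book" mapping page numbers to text.
pages = {
    1: "The fishes are governed and turned by the tail.",
    2: "Swift mocks scholars who skim rather than read.",
    3: "Satire as a weapon against false learning.",
}

def search(term: str) -> list[int]:
    # What "Look Inside!" gives you: literal matches only.
    return [n for n, text in pages.items() if term.lower() in text.lower()]

# What a human indexer gives you: judgment about what a page is about.
index = {
    "satire": [2, 3],          # page 2 never uses the word "satire"
    "indexes, Swift on": [1, 2],
}

print(search("satire"))  # [3] -- misses page 2 entirely
print(index["satire"])   # [2, 3]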

    Hmmm, equally made available to all readers.

    Is search a form of discrimination?

    Is search a type of access with disproportionate (read disadvantageous) impact on some audiences and not others?

    Any research on the social class, racial, ethnic impact of search you would suggest?

    All leads and tips appreciated!

    Manning Leaks — No Real Harm (Database of Government Liars Anyone?)

    Tuesday, June 20th, 2017

    Secret Government Report: Chelsea Manning Leaks Caused No Real Harm by Jason Leopold.

    From the post:

    In the seven years since WikiLeaks published the largest leak of classified documents in history, the federal government has said they caused enormous damage to national security.

    But a secret, 107-page report, prepared by a Department of Defense task force and newly obtained by BuzzFeed News, tells a starkly different story: It says the disclosures were largely insignificant and did not cause any real harm to US interests.

    Regarding the hundreds of thousands of Iraq-related military documents and State Department cables provided by the Army private Chelsea Manning, the report assessed “with high confidence that disclosure of the Iraq data set will have no direct personal impact on current and former U.S. leadership in Iraq.”

The 107-page report, in its redacted form, runs 35 pages. Thanks to BuzzFeed News for prying that much of a semblance of the truth out of the government.

It is further proof that US prosecutors and other federal government representatives lie to the courts, the press and the public, whenever it suits their purposes.

Anyone with transcripts from the original Manning hearings should identify statements by prosecutors at variance with this report, noting the prosecutor’s name and rank and recording the page/line reference in the transcript.

    That individual prosecutors and federal law enforcement witnesses lie is a commonly known fact. What I haven’t seen, is a central repository of all such liars and the lies they have told.

    I mention a central repository because to say one or two prosecutors have lied or been called down by a judge grabs a headline, but showing a pattern over decades of lying by the state, that could move to an entirely different level.
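Such a repository needs little more than the fields already named above: the official, their rank, the statement, the page/line citation, and the contradicting source. A minimal sketch of a record (the schema and the entry are my guesses at what would be useful, entirely hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ContestedStatement:
    official: str            # prosecutor or witness name
    rank: str
    statement: str           # what was said in court
    transcript_ref: str      # page/line in the hearing transcript
    contradicted_by: str     # source contradicting the claim

repository: list[ContestedStatement] = []

repository.append(ContestedStatement(
    official="Jane Doe",     # hypothetical entry, for illustration only
    rank="Captain",
    statement="The disclosures caused grave damage.",
    transcript_ref="p. 212, ll. 4-9",
    contradicted_by="DoD task force report, p. 35",
))

# The pattern-over-decades argument only emerges when entries
# accumulate and can be grouped by official:
by_official: dict[str, list[ContestedStatement]] = {}
for rec in repository:
    by_official.setdefault(rec.official, []).append(rec)
print(len(by_official["Jane Doe"]))  # 1
```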

    Judges, even conservative ones (especially conservative ones?), don’t appreciate being lied to by anyone, including the state.

    The state has chosen lying as its default mode of operation.

    Let’s help them wear that banner.

    Interested?

    Concealed Vulnerability Survives Reboots – Consumers Left in Dark

    Monday, June 19th, 2017

    New Vulnerability Could Give Mirai the Ability to Survive Device Reboots by Catalin Cimpanu

    From the post:

    Until now, all malware targeting IoT devices survived only until the user rebooted his equipment, which cleared the device’s memory and erased the malware from the user’s equipment.

Intense Internet scans for vulnerable targets meant that devices survived only minutes until they were reinfected again, which meant that users needed to secure devices with unique passwords or place them behind firewalls to prevent exploitation.

    New vulnerability allows for permanent Mirai infections

    While researching the security of over 30 DVR brands, researchers from Pen Test Partners have discovered a new vulnerability that could allow the Mirai IoT worm and other IoT malware to survive between device reboots, permitting for the creation of a permanent IoT botnet.

    “We’ve […] found a route to remotely fix Mirai vulnerable devices,” said Pen Test Partners researcher Ken Munro. “Problem is that this method can also be used to make Mirai persistent beyond a power off reboot.”

    Understandably, Munro and his colleagues decided to refrain from publishing any details about this flaw, fearing that miscreants might weaponize it and create non-removable versions of Mirai, a malware known for launching some of the biggest DDoS attacks known today.

Do security researchers realize that concealing vulnerabilities prevents market forces from deciding the fate of insecure systems?

Should security researchers’ marketing of vulnerabilities to manufacturers be more important than the operation of market forces on those products?

    More important than your right to choose products based on the best and latest information?

    Market forces are at work here, but they aren’t ones that will benefit consumers.

    E-Cigarette Can Hack Your Computer (Is Nothing Sacred?)

    Monday, June 19th, 2017

    Kavita Iyer has the details on how an e-cigarette can be used to hack your computer at: Know How E-Cigarette Can Be Used By Hackers To Target Your Computer.

    I’m guessing you aren’t so certain that expensive e-cigarette you “found” is harmless after all?

    Malware in e-cigarettes seems like a stretch given the number of successful phishing emails every year.

But a recent non-smoker may be the security lapse you need.

    Key DoD Officials – September 1947 to June 2017

    Monday, June 19th, 2017

    While looking for a particular Department of Defense official, I stumbled on: Department of Defense Key Officials September 1947–June 2017.

    Yes, almost seventy (70) years worth of key office holders at the DoD. It’s eighty (80) pages long, produced by the Historical Office of the Secretary of Defense.

    One potential use, aside from giving historical military fiction a ring of authenticity, would be to use this as a starting set of entities to trace through the development of the military/industrial complex.

    Everyone, including me, refers to the military/industrial complex as though it is a separate entity, over there somewhere.

    But as everyone discovered with the Panama Papers, however tangled and corrupt even world-wide organizations can be, we have the technology to untangle those knots and to shine bright lights into obscure corners.

    Interested?

    DoD Audit Ready By End of September (Which September? Define “ready.”)

    Monday, June 19th, 2017

    For your Monday amusement: Pentagon Official: DoD will be audit ready by end of September by Eric White.

    From the post:

    In today’s Federal Newscast, the Defense Department’s Comptroller David Norquist said the department has been properly preparing for its deadline for audit readiness.

    The Pentagon’s top financial official said DoD will meet its deadline to be “audit ready” by the end of September. DoD has been working toward the deadline for the better part of seven years, and as the department pointed out in its most recent audit readiness update, most federal agencies haven’t earned clean opinions until they’ve been under full-scale audits for several years. But newly-confirmed comptroller David Norquist said now’s the time to start. He said the department has already contracted with several outside accounting firms to perform the audits, both for the Defense Department’s various components and an overarching audit of the entire department.

    I’m reminded of the alleged letter by the Duke of Wellington to Whitehall:

    Gentlemen,

    Whilst marching from Portugal to a position which commands the approach to Madrid and the French forces, my officers have been diligently complying with your requests which have been sent by H.M. ship from London to Lisbon and thence by dispatch to our headquarters.

    We have enumerated our saddles, bridles, tents and tent poles, and all manner of sundry items for which His Majesty’s Government holds me accountable. I have dispatched reports on the character, wit, and spleen of every officer. Each item and every farthing has been accounted for, with two regrettable exceptions for which I beg your indulgence.

Unfortunately the sum of one shilling and ninepence remains unaccounted for in one infantry battalion’s petty cash and there has been a hideous confusion as to the number of jars of raspberry jam issued to one cavalry regiment during a sandstorm in western Spain. This reprehensible carelessness may be related to the pressure of circumstance, since we are at war with France, a fact which may come as a bit of a surprise to you gentlemen in Whitehall.

    This brings me to my present purpose, which is to request elucidation of my instructions from His Majesty’s Government so that I may better understand why I am dragging an army over these barren plains. I construe that perforce it must be one of two alternative duties, as given below. I shall pursue either one with the best of my ability, but I cannot do both:

1. To train an army of uniformed British clerks in Spain for the benefit of the accountants and copy-boys in London, or perchance

    2. To see to it that the forces of Napoleon are driven out of Spain.

    Your most obedient servant,

    Wellington

    The primary function of any military organization is suppression of the currently designated “enemy.”

Congress should direct the Department of Homeland Security (DHS) to audit the DoD.

Instead of chasing fictional terrorists, DHS staff would be chasing known-to-exist dollars and alleged expenses.

    TensorFlow 1.2 Hits The Streets!

    Saturday, June 17th, 2017

    TensorFlow 1.2

I’m not copying the features and improvements here; better that you download TensorFlow 1.2 and experience them for yourself!

    The incomplete list of models at TensorFlow Models:

    • adversarial_crypto: protecting communications with adversarial neural cryptography.
    • adversarial_text: semi-supervised sequence learning with adversarial training.
    • attention_ocr: a model for real-world image text extraction.
    • autoencoder: various autoencoders.
    • cognitive_mapping_and_planning: implementation of a spatial memory based mapping and planning architecture for visual navigation.
    • compression: compressing and decompressing images using a pre-trained Residual GRU network.
    • differential_privacy: privacy-preserving student models from multiple teachers.
    • domain_adaptation: domain separation networks.
    • im2txt: image-to-text neural network for image captioning.
    • inception: deep convolutional networks for computer vision.
    • learning_to_remember_rare_events: a large-scale life-long memory module for use in deep learning.
    • lm_1b: language modeling on the one billion word benchmark.
    • namignizer: recognize and generate names.
    • neural_gpu: highly parallel neural computer.
    • neural_programmer: neural network augmented with logic and mathematic operations.
    • next_frame_prediction: probabilistic future frame synthesis via cross convolutional networks.
    • object_detection: localizing and identifying multiple objects in a single image.
    • real_nvp: density estimation using real-valued non-volume preserving (real NVP) transformations.
    • resnet: deep and wide residual networks.
    • skip_thoughts: recurrent neural network sentence-to-vector encoder.
    • slim: image classification models in TF-Slim.
    • street: identify the name of a street (in France) from an image using a Deep RNN.
    • swivel: the Swivel algorithm for generating word embeddings.
    • syntaxnet: neural models of natural language syntax.
    • textsum: sequence-to-sequence with attention model for text summarization.
    • transformer: spatial transformer network, which allows the spatial manipulation of data within the network.
    • tutorials: models described in the TensorFlow tutorials.
    • video_prediction: predicting future video frames with neural advection.

    And your TensorFlow model is ….?

    Enjoy!

    If You Don’t Think “Working For The Man” Is All That Weird

    Saturday, June 17th, 2017

J.P. Morgan’s massive guide to machine learning and big data jobs in finance by Sara Butcher.

    From the post:

    Financial services jobs go in and out of fashion. In 2001 equity research for internet companies was all the rage. In 2006, structuring collateralised debt obligations (CDOs) was the thing. In 2010, credit traders were popular. In 2014, compliance professionals were it. In 2017, it’s all about machine learning and big data. If you can get in here, your future in finance will be assured.

J.P. Morgan’s quantitative investing and derivatives strategy team, led by Marko Kolanovic and Rajesh T. Krishnamachari, has just issued the most comprehensive report ever on big data and machine learning in financial services.

    Titled, ‘Big Data and AI Strategies’ and subheaded, ‘Machine Learning and Alternative Data Approach to Investing’, the report says that machine learning will become crucial to the future functioning of markets. Analysts, portfolio managers, traders and chief investment officers all need to become familiar with machine learning techniques. If they don’t they’ll be left behind: traditional data sources like quarterly earnings and GDP figures will become increasingly irrelevant as managers using newer datasets and methods will be able to predict them in advance and to trade ahead of their release.

    At 280 pages, the report is too long to cover in detail, but we’ve pulled out the most salient points for you below.

How important is Sara’s post and the report by J.P. Morgan?

Let me put it this way: Sara’s post is the first business-type post I have saved as a complete webpage so I can clean it up and print it without all the clutter. This year. Perhaps last year as well. It’s that important.

Sara’s post is a quick guide to the languages, talents and tools you will need to start “working for the man.”

If that catches your interest, then Sara’s post is pure gold.

    Enjoy!

PS: I’m still working on a link for the full 280-page report. The switchboard is down for the weekend, so I will be following up with J.P. Morgan on Monday next.

    The Quartz Directory of Essential Data (Directory of Directories Is More Accurate)

    Saturday, June 17th, 2017

    The Quartz Directory of Essential Data

    From the webpage:

    A curated list of useful datasets published by important sources. Please remember that “important” does not mean “correct.” You should vet these data as you would with any human source.

    Switch to the “Data” tab at the bottom of this spreadsheet and use Find (⌘ + F) to search for datasets on a particular topic.

    Note: Just because data is useful, doesn’t mean it’s easy to use. The point of this directory is to help you find data. If you need help accessing or interpreting one of these datasets, please reach out to your friendly Quartz data editor, Chris.

    Slack: @chris
    Email: c@qz.com

A directory of 77 data directories. The breadth of the organizing topics (health, trade, and government, for example) creates a need for repeated data mining by every new user.

    A low/no-friction method for creating more specific and re-usable directories has remained elusive.
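As a rough illustration of what a more re-usable directory might look like, here is a minimal sketch that filters a CSV export of such a directory-of-directories down to a topic-specific sub-directory. The column names (`topic`, `name`, `url`) and the sample rows are assumptions for illustration, not the actual layout of the Quartz spreadsheet.

```python
import csv
import io

# Hypothetical CSV export of a directory of data directories; the real
# spreadsheet's columns and entries will differ.
sample = """topic,name,url
health,WHO Global Health Observatory,https://www.who.int/data/gho
trade,UN Comtrade,https://comtrade.un.org/
health,CDC WONDER,https://wonder.cdc.gov/
government,Data.gov,https://www.data.gov/
"""

def subdirectory(rows, topic):
    """Return only the directory entries tagged with the given topic."""
    return [row for row in rows if row["topic"] == topic]

rows = list(csv.DictReader(io.StringIO(sample)))
for entry in subdirectory(rows, "health"):
    print(entry["name"], "->", entry["url"])
```

Once entries carry topic tags like this, each new user can regenerate a focused sub-directory with one call instead of re-mining the full list.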