Archive for August, 2016

NiFi 1.0

Wednesday, August 31st, 2016

NiFi 1.0 (download page)

NiFi 1.0 dropped today!

From the NiFi homepage:

Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of Apache NiFi include:

  • Web-based user interface
    • Seamless experience between design, control, feedback, and monitoring
  • Highly configurable
    • Loss tolerant vs guaranteed delivery
    • Low latency vs high throughput
    • Dynamic prioritization
    • Flow can be modified at runtime
    • Back pressure
  • Data Provenance
    • Track dataflow from beginning to end
  • Designed for extension
    • Build your own processors and more
    • Enables rapid development and effective testing
  • Secure
    • SSL, SSH, HTTPS, encrypted content, etc…
    • Multi-tenant authorization and internal authorization/policy management

I haven’t been following this project but the expression language for manipulation of data in a flow looks especially interesting.

Next Gen Tor?

Wednesday, August 31st, 2016

Building a new Tor that can resist next-generation state surveillance by J.M. Porup.

A great survey of both the current status of Tor as well as projects that wish they could replace it.

Highly recommended, except for its tone of “Tor is not perfect” and the suggestion that some as-yet-unknown solution will be stronger.

Perhaps, perhaps not. The key insight should be that no security solution is perfect. Not now, not ever.

The Snowden leak, which is mentioned in the post, is evidence that even practically unlimited budgets are no guarantee of security.

Silos You Will Always Have With You (Apologies to the Apostle Matthew)

Wednesday, August 31st, 2016

Apologies to the Apostle Matthew, but “The poor you will always have with you…” (Matt. 26:11), sprang to mind when I read the interview with Mayur Gupta, Chief Digital Officer of Healthgrades, in Digital Transformation in Healthcare with Mayur Gupta, Chief Digital Officer, Healthgrades.

Or at least my rendering of that passage as:

Silos You Will Always Have With You

when I read:

I think two things and this is again this is something that I’ve learned through my career and continue to learn. First and foremost, is break down those silos. Connect the dots, drive convergence in every single aspect of your business. Whether that is how you organized, how you’re structured, the kind of talent to bring in. How you look at data, how you look at technology. It doesn’t matter what vertical it is, but just drive convergence.

We living in a world that is all about ecosystems and platforms not about silo products, silo technologies, silo experiences. And I think the best way to think about it is from a consumer standpoint she does not see the silos you know. She does not see a channel. All she expects is the best experience, the best service, the best product, at the best price. You know at a location in touchpoint at a time of her own choice. And the only way we as brands and technologists and marketers can make that happen is when we break down those silos and we drive conversions in our own world and we stopped looking at digital as the thing, because we now live, operate and breathe in an intrinsically digital world.

Silos have been a recognized issue since organized record keeping began.

The universal solution: Let’s build another, bigger silo!

Think about it. There is never going to be a time when new and different information systems and data will not be appearing.

Rather than flailing against silos, along with all the political costs of the same, why not keep your current silos and use topic maps to map across those silos?

Those interested in preserving “silos” (you know who you are) will be happy because their systems and their sovereignty over them are preserved, yet other stakeholders can combine that data with new data, for other purposes.
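A minimal sketch of mapping across silos without touching them, in Python. The silo contents, field names, and the `subject_map` structure are all hypothetical illustrations; a real topic map would carry far richer rules for subject identity.

```python
# Two hypothetical silos that stay exactly as their owners keep them.
billing_silo = {
    "C-1001": {"name": "J. Smith", "balance": 120.0},
    "C-1002": {"name": "A. Jones", "balance": 0.0},
}
clinic_silo = {
    "P-77": {"patient": "John Smith", "last_visit": "2016-08-01"},
    "P-78": {"patient": "Ann Jones", "last_visit": "2016-07-15"},
}

# The mapping lives outside both silos: each subject lists the key
# that identifies it in each system (illustrative data only).
subject_map = {
    "person:john-smith": {"billing": "C-1001", "clinic": "P-77"},
    "person:ann-jones": {"billing": "C-1002", "clinic": "P-78"},
}

SILOS = {"billing": billing_silo, "clinic": clinic_silo}


def merged_view(subject_id):
    """Collect what every silo says about one subject, without modifying any silo."""
    view = {}
    for silo_name, local_key in subject_map[subject_id].items():
        record = SILOS[silo_name].get(local_key, {})
        for field, value in record.items():
            # Prefix each field with its silo so nothing is overwritten.
            view[f"{silo_name}.{field}"] = value
    return view


print(merged_view("person:john-smith"))
```

Neither silo changes and neither owner gives anything up; the new “silo” is only the mapping.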

Do you have the political capital to defeat current silos while trying to erect your own?

As I said, “silos you will always have with you….”

You can accept that and use topic maps to create your new “silo” or fight against existing silos.

Your call.

Upgrade or I’ll Tell! [SWIFT To Banks (Where the money is)]

Wednesday, August 31st, 2016

SWIFT Discloses New Cyber-Heists, Urges Banks to Boost Security Measures by Maritza Santillan.

From the post:

SWIFT, the messaging network used by financial institutions to complete transactions, announced on Tuesday it has discovered new cyber-theft attempts against its member banks.

According to a report by Reuters, the company sent out a private letter to global clients, warning that new cyber-heists have occurred since June this year.

“Customers’ environments have been compromised, and subsequent attempts (were) made to send fraudulent payment instructions,” read a copy of the letter, which was obtained by Reuters.

Furthermore, SWIFT announced it plans to suspend banks with poor security practices. In the letter, the firm notified banks they must install the latest version of its software by November 19, or they could be reported to regulators and banking partners.

The source for the “suspend banks” claim said SWIFT was considering all its options, so I would not take that threat very seriously.

One doubts a majority of its members could survive a garden variety SQLi attack on one or more of their locations. That won’t get you to SWIFT but it’s a good gauge of how serious security is, or rather isn’t, taken by your local bank.

Start hacking on the current version of the SWIFT software as some banks will upgrade by the November 19, 2016 target date.

Like the man says, it’s where the money is.

The Next Generation R Documentation System [Dynamic R Documentation?]

Wednesday, August 31st, 2016

The R Documentation Task Force: The Next Generation R Documentation System by Joseph Rickert and Hadley Wickham.

From the post:

Andrew Redd received $10,000 to lead a new ISC working group, The R Documentation Task Force, which has a mission to design and build the next generation R documentation system. The task force will identify issues with documentation that currently exist, abstract the current Rd system into an R compatible structure, and extend this structure to include new considerations that were not concerns when the Rd system was first implemented. The goal of the project is to create a system that allows for documentation to exist as objects that can be manipulated inside R. This will make the process of creating R documentation much more flexible enabling new capabilities such as porting documentation from other languages or creating inline comments. The new capabilities will add rigor to the documentation process and enable the system to operate more efficiently than any current methods allow. For more detail have a look at the R Documentation Task Force proposal (Full Text).

The task force team hopes to complete the new documentation system in time for the International R Users Conference, UseR! 2017, which begins July 4th 2017. If you are interested in participating in this task force, please contact Andrew Redd directly via email. Outline your interest in the project, your experience with documentation, and any special skills you may have. The task force team is particularly interested in experience with documentation systems for languages other than R and C/C++.

OK, I have a weakness for documentation projects!

See the full proposal for all the details but:

There are two aspects of R documentation I intend to address which will make R an exemplary system for documentation.

The first aspect is storage. The mechanism of storing documentation in separate Rd files hinders the development process and ties documentation to the packaging system, and this need not be so. Life does not always follow the ideal; code and data are not always distributed via nice packages. Decoupling the documentation from the packaging system will allow for more dynamic and flexible documentation strategies, while also simplifying the process of transitioning to packages distributed through CRAN or other outlets.

The second aspect is flexibility of defining documentation. R is a language of flexibility and preference. There are many paths to the same outcome in R. While this has often been a source of confusion to new users of R, however it is also one of R’s greatest strengths. With packages flexibility has allowed for many contributions, some have fallen in favor while others have proven superior. Adding flexibility in documentation methods will allow for newer, and ideally improved, methods to be developed.

Have you seen the timeline?

  • Mid-August 2016 notification of approval.
  • September 1, 2016 Kickoff for the R Documentation Task Force with final members.
  • September 16, 2016 Deadline for submitting posts to the R-consortium blog, the R-announce, Rpackage-devel, and R-devel mailing lists, announcing the project.
  • September 1 through November 27th 2016 The task force conducts bi-weekly meetings via Lync to address issues in documentation.
  • November 27th, 2016 Deadline for preliminary recommendations of documentation extensions. Recommendations and conflicts written up and submitted to the R journal to be published in the December 2016 issue.
  • December 2016 Posts made to the R Consortium blog, and R mailing lists to coincide with the R Journal article to call for public participation.
  • January 27, 2017 Deadline for general comments on recommendations. Work begins to finalize new documentation system.
  • February 2017 Task force meets to finalize decisions after public input.
  • February-May 2017 Task force meets monthly as necessary to monitor progress on code development.
  • May 2017 Article is submitted outlining final recommendations and the subsequent tools developed to the R Journal for review targeting the June 2017 issue.
  • July 4-7, 2017 Developments will be presented at the International R users conference in Brussels, Belgium.

A very ambitious schedule, and one that leaves me wondering: if December of 2016 is the first opportunity for public participation, will notes/discussions from the bi-weekly meetings be published before then?

I am probably incorrect, but I have the impression from the proposal that documentation is regarded as a contiguous mass of text. Yes?

I ask because the “…contiguous mass of text…” model for documentation is a very poor one.

Documentation can present to a user as though it were a “…contiguous mass of text…,” but as I said, that is a very poor model for documentation itself.

Imagine R documentation that automatically updates itself from R-Bloggers, for example, to include the latest tutorials on a package.

Or that updates to include new data sets, issued since the last documentation update.

Treating documentation as though it must be episodically static should have been abandoned years ago.

Neither the use of R nor R development is static. Why should its documentation be?

Elsevier Awarded U.S. Patent For “Online Peer Review System and Method” [Sack Pool?]

Tuesday, August 30th, 2016

Elsevier Awarded U.S. Patent For “Online Peer Review System and Method” by Gary Price.

Gary quotes the abstract:

An online document management system is disclosed. In one embodiment, the online document management system comprises: one or more editorial computers operated by one or more administrators or editors, the editorial computers send invitations and manage peer review of document submissions; one or more system computers, the system computers maintain journals, records of submitted documents and user profiles, and issue notifications; and one or more user computers; the user computers submit documents or revisions to the document management system; wherein one or more of the editorial computers coordinate with one or more of the system computers to migrate one or more documents between journals maintained by the online document management system.

Is there a pool on whether the staff who recommended and pursued that patent will be given the sack in the next week?

Revamped L0phtCrack 7… [SQLi as “…highly sophisticated…”]

Tuesday, August 30th, 2016

Revamped L0phtCrack 7 Audits Windows and Unix Passwords Up to 500 Times Faster

From the post:

August 30, 2016: Today, L0pht Holdings, LLC, developer of L0phtCrack, the original Windows password auditor, announces the immediate availability of the fully revamped L0phtCrack 7. This new version has an all new cracking engine which takes optimal advantage of multi-core CPUs and multi-core GPUs. A 4-core CPU running a brute force audit with L0phtCrack 7 is now 5 times faster than L0phtCrack 6. If you have a GPU such as the AMD Radeon Pro Duo the speedup is an astounding 500 times!

L0phtCrack was first released 19 years ago. Its password cracking capability forced Microsoft to make improvements to the way Windows stored password hashes. Microsoft eventually deprecated the weak LANMAN password hash and switched to only the stronger NTLM password hash it still uses today. Yet, hardware and password cracking algorithms have improved greatly in the intervening years. The new release of L0phtCrack 7 demonstrates that current Windows passwords are easier to crack today than they were 18 years ago when Microsoft started making much needed password strength improvements.

On a circa-1998 computer with a Pentium II 400 MHz CPU, the original L0phtCrack could crack a Windows NT, 8 character long alphanumeric password in 24 hours. On a 2016 gaming machine, at less hardware cost, L0phtCrack 7 can crack the same passwords stored on the latest Windows 10 in 2 hours. Windows passwords have become much less secure over time and are now much more easily cracked than in the era of Windows NT. Other OSes, such as Linux, offer much more secure password hashing, including the NSA recommended SHA-512.

The ease of abusing weak Windows domain user passwords is not lost on attackers. In fact, a recent study[1] by Praetorian of 100 penetration tests for 75 organizations found that the most prevalent insecure finding in the kill chain, at 66% of the time, is weak domain user passwords. L0phtCrack 7 can easily audit your Windows domain to discover weak domain user passwords in a few hours. Then, with a few clicks, remediate the vulnerability with forced password resets or by disabling unused accounts completely.

In addition to auditing passwords much faster, L0phtCrack 7 includes improvements in its easy to use password auditing wizard, scheduling, and reporting. An updated password hash importer works seamlessly locally and remotely with all versions of Windows, up to and including Windows 10 “Anniversary Edition”. There is also support for many new types of UNIX password hashes. A new plugin interface will allow 3rd parties to build password importers and password hash crackers for new types of passwords in the future.

Full details on features, licensing, pricing, and the complete documentation is available on our website. A 15 day free trial download is available. Test your password strength today!
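The arithmetic behind claims like these is easy to check. A rough sketch in Python, assuming (my assumption, not the announcement’s) a 62-symbol case-sensitive alphanumeric alphabet and an exhaustive search of the whole keyspace:

```python
def keyspace(alphabet_size, length):
    """Number of candidate passwords of exactly `length` symbols."""
    return alphabet_size ** length


def guesses_per_second(total_guesses, hours):
    """Average rate needed to try every candidate within `hours`."""
    return total_guesses / (hours * 3600)


# Assumption: 8 characters drawn from a-z, A-Z, 0-9 (62 symbols).
candidates = keyspace(62, 8)

rate_24h = guesses_per_second(candidates, 24)
rate_2h = guesses_per_second(candidates, 2)

print(f"{candidates:,} candidates")
print(f"{rate_24h:.2e} guesses/s to finish in 24 hours")
print(f"{rate_2h:.2e} guesses/s to finish in 2 hours")
```

Over the same keyspace, finishing in 2 hours rather than 24 takes twelve times the guessing rate. The real figures depend on the hash type and the character set actually in use, which this sketch does not model.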

L0phtCrack 7 is there in case you want to move up a level from SQLi attacks, which a message from the Illinois State Board of Elections characterized as:

· The method used was SQL injection. The offenders were able to inject SQL database queries into the IVRS database in order to access information. This was a highly sophisticated attack most likely from a foreign (international) entity.

With that degree of ignorance, voter fraud in Illinois becomes quite credible.
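For anyone who doubts how unsophisticated SQL injection is, here is a minimal sketch using Python’s standard `sqlite3` module. The table, column names, and data are invented for illustration and have nothing to do with the actual IVRS database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (id INTEGER PRIMARY KEY, name TEXT, ssn_last4 TEXT)")
conn.executemany(
    "INSERT INTO voters (name, ssn_last4) VALUES (?, ?)",
    [("Alice", "1234"), ("Bob", "5678")],
)


def lookup_unsafe(name):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = "SELECT name FROM voters WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(name):
    # Parameterized: the driver treats the input as a value, never as SQL.
    return conn.execute("SELECT name FROM voters WHERE name = ?", (name,)).fetchall()


payload = "nobody' OR '1'='1"
unsafe_rows = lookup_unsafe(payload)  # the WHERE clause is now always true
safe_rows = lookup_safe(payload)      # no voter has that literal name

print(len(unsafe_rows), len(safe_rows))
```

One string concatenation is the whole “attack,” and one placeholder is the whole defense.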

The Wrong Way to Teach Grammar [Programming?]

Tuesday, August 30th, 2016

The Wrong Way to Teach Grammar by Michelle Navarre Cleary.

From the post:

A century of research shows that traditional grammar lessons—those hours spent diagramming sentences and memorizing parts of speech—don’t help and may even hinder students’ efforts to become better writers. Yes, they need to learn grammar, but the old-fashioned way does not work.

This finding—confirmed in 1984, 2007, and 2012 through reviews of over 250 studies—is consistent among students of all ages, from elementary school through college. For example, one well-regarded study followed three groups of students from 9th to 11th grade where one group had traditional rule-bound lessons, a second received an alternative approach to grammar instruction, and a third received no grammar lessons at all, just more literature and creative writing. The result: No significant differences among the three groups—except that both grammar groups emerged with a strong antipathy to English.

There is a real cost to ignoring such findings. In my work with adults who dropped out of school before earning a college degree, I have found over and over again that they over-edit themselves from the moment they sit down to write. They report thoughts like “Is this right? Is that right?” and “Oh my god, if I write a contraction, I’m going to flunk.” Focused on being correct, they never give themselves a chance to explore their ideas or ways of expressing those ideas. Significantly, this sometimes-debilitating focus on “the rules” can be found in students who attended elite private institutions as well as those from resource-strapped public schools.

(Three out of five links here are pay-per-view. Sorry.)

It’s only a century of research. Don’t want to rush into anything. 😉

How would you adapt this finding to teaching programming and/or hacking?


Security Lessons Learned from Harry Potter

Tuesday, August 30th, 2016

POPsec Part 1: Security Lessons Learned from Harry Potter by Elle Armageddon.

From the post:

There are a lot of security lessons we can learn by examining popular media, analyzing mistakes which are made, and striving not to repeat them. The Harry Potter series is rich with such lessons, and while the following contains all kinds of spoilers (for every one of the books/movies), it’s also full of important life lessons we can take away by scrutinizing the mishaps which take place in the Wizarding World.

Being a Harry Potter fan increased my enjoyment but the lessons are valuable to everyone.

Looking forward to more installments of POPsec!

PS: Where do you learn your security lessons?

The rich are getting more secretive with their money [Calling All Cybercriminals]

Tuesday, August 30th, 2016

The rich are getting more secretive with their money by Rachael Levy.

From the post:

You might think the Panama Papers leak would cause the ultrarich to seek more transparent tax havens.

Not so, according to Jordan Greenaway, a consultant based in London who caters to the ultrawealthy.

Instead, they are going further underground, seeking walled-up havens such as the Marshall Islands, Lebanon, and Antigua, Greenaway, who works for the PR agency Right Angles, told Business Insider.

The Panama Papers leak around Mossack Fonseca, a law firm that helped politicians and businesspeople hide their money, has increased anxiety among the rich over being exposed, Greenaway told New York reporters in a meeting last week.

“The Panama Papers sent them to the ground,” he said.

I should hope so.

The Panama Papers leak, what we know of it (hint, hint to data hoarders), was like giants capturing dwarfs in a sack. It takes some effort but not a lot.

Especially when someone dumps the Panama Papers data in your lap. News organizations have labored to make sense of that massive trove of data but its acquisition wasn’t difficult.

From Rachael’s report, the rich want to up their game on data acquisition. Fair enough.

But 2016 cybersecurity reports leave you agreeing that “sieve” is a generous description of current information security.

Cybercriminals are reluctant to share their exploits, but after exploiting data fully, they should dump their data to public repositories.

That will protect their interests (I didn’t say legitimate) in their exploits and at the same time, enable others to track the secrets of the wealthy, albeit with a time delay.

The IRS and EU tax authorities will both subscribe to RSS feeds for such data.

‘Trump Revealed’: The reporting archive

Tuesday, August 30th, 2016

‘Trump Revealed’: The reporting archive by The Washington Post.

A big jump from category theory to Donald Trump but that’s how it came in. 😉

From the post:

The Post is making public today a sizable portion of the raw reporting used in the development of “Trump Revealed,” a biography of the Republican presidential nominee published August 23 by Scribner. Drawn from the work of more than two dozen Post journalists, the archive contains 398 documents, comprising thousands of pages of interview transcripts, court filings, financial reports, immigration records and other material. Interviews conducted off the record were removed, as was other material The Post did not have the right to publish. The archive is searchable and navigable in a number of ways. It is meant as a resource for other journalists and a trove to explore for our many readers fascinated by original documents.

Kudos to The Washington Post for this document dump!

Mapping this data to your data and data of other users? A lot of duplicate effort that has been left to the reader.


Category Theory 1.2

Tuesday, August 30th, 2016

Category Theory 1.2 by Bartosz Milewski.

Brief notes on the first couple of minutes:

Our toolset includes:

Abstraction – lose the details – things that were different are now the same

Composition –


Identity – what is identical or considered to be identical

Composition and Identity define category theory.
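A small sketch of composition and identity, using Python functions as the arrows. The function names and the spot-check on a handful of integers are my own illustration, not something taken from the video, and checking samples is of course not a proof.

```python
def identity(x):
    """The identity arrow: returns its argument unchanged."""
    return x


def compose(g, f):
    """Return the composite 'g after f', i.e. x -> g(f(x))."""
    return lambda x: g(f(x))


# Three illustrative arrows (plain functions on integers).
def add_one(x):
    return x + 1


def double(x):
    return x * 2


def minus_three(x):
    return x - 3


samples = range(-5, 6)

# Associativity: h . (g . f) agrees with (h . g) . f
assoc_holds = all(
    compose(minus_three, compose(double, add_one))(x)
    == compose(compose(minus_three, double), add_one)(x)
    for x in samples
)

# Identity: id . f agrees with f, which agrees with f . id
identity_holds = all(
    compose(identity, add_one)(x) == add_one(x) == compose(add_one, identity)(x)
    for x in samples
)

print(assoc_holds, identity_holds)
```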

Despite the bad press about category theory, I was disappointed when the video ended after approximately 48 minutes.

Yes, it was that entertaining!

If you ever shied away from category theory, start with Category Theory 1.1 and follow on!

Or try Category Theory for Programmers: The Preface, also by Bartosz Milewski.

Flash Alert for SQLi?

Tuesday, August 30th, 2016

Never missing a chance to stir the pot of public panic, the FBI issued a “Flash Alert” on an SQLi hack of a state voter database.

If you are missing tools or cheatsheets for SQLi attacks, see my post: Developer Liability For Egregiously Poor Software. I list five cheatsheets along with an SQL scanner list.

SQLi has been the top hack every year since they started keeping such statistics, and the technique is now 18 years old.

A Flash Alert for an SQLi attack may as well be:

Flash Alert: Sexual intercourse between humans may lead to pregnancy and venereal disease.

You have been warned! 😉

PS: Consider yourself as having a “DIRECT NEED TO KNOW” before viewing the Flash Alert.

Mapping U.S. wildfire data from public feeds

Monday, August 29th, 2016

Mapping U.S. wildfire data from public feeds by David Clark.

From the post:

With the Mapbox Datasets API, you can create data-based maps that continuously update. As new data arrives, you can push incremental changes to your datasets, then update connected tilesets or use the data directly in a map.

U.S. wildfires have been in the news this summer, as they are every summer, so I set out to create an automatically updating wildfire map.

An excellent example of using public data feeds to create a resource not otherwise available.
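The first step in any such pipeline is turning feed records into map-ready features. A rough sketch in Python, assuming a feed already parsed into dictionaries. The field names (`name`, `lat`, `lon`, `acres`) and the sample values are hypothetical, and the upload to Mapbox is deliberately left out rather than guessed at.

```python
import json


def to_feature(record):
    """Convert one feed record into a GeoJSON Point feature."""
    return {
        "type": "Feature",
        "id": record["name"],
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [record["lon"], record["lat"]],
        },
        "properties": {"name": record["name"], "acres": record["acres"]},
    }


def to_feature_collection(records):
    """Wrap all records in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": [to_feature(r) for r in records]}


# Made-up sample records standing in for a parsed public feed.
sample_feed = [
    {"name": "Example Fire A", "lat": 44.5, "lon": -110.6, "acres": 1200},
    {"name": "Example Fire B", "lat": 37.1, "lon": -119.3, "acres": 350},
]

collection = to_feature_collection(sample_feed)
print(json.dumps(collection, indent=2))
```

Giving each feature a stable `id` is what makes incremental updates possible: a later run can replace the feature for a fire that has grown instead of adding a duplicate.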

Historical fire data can be found at: Federal Wildland Fire Occurrence Data, spanning 1980 through 2015.

The Outlooks page of the National Interagency Coordination Center provides four-month (from the current month) and weekly fire potential outlook reports and maps.

Looking For Your Next Cyber Jedi

Monday, August 29th, 2016

DoD Taps DEF CON Hacker Traits For Cybersecurity Training Program by Kelly Jackson Higgins.

The Department of Defense sends Frank DiGiovanni, director of force training in DoD’s Office of the Assistant Secretary of Defense for Readiness, to DEF CON 24.

His mission?

“My purpose was to really learn from people who come to DEF CON … Who are they? How do I understand who they are? What motivates them? What sort of attributes” are valuable to the field, the former Air Force officer and pilot who heads overall training policy for the military, says.

DiGiovanni interviewed more than 20 different security industry experts and executives during DEF CON. His main question: “If you’re going to hire someone to either replace you or eventually be your next cyber Jedi, what are you looking for?”

The big takeaway from DiGiovanni’s DEF CON research: STEM, aka science, technology, engineering, and mathematics, was not one of the top skills organizations look for in their cyber-Jedis. “Almost no one talked about technical capabilities or technical chops,” he says. “That was the biggest revelation for me.”

DiGiovanni compiled a list of attributes for the cyber-Jedi archetype based on his interviews. The ultimate hacker/security expert, he found, has skillsets such as creativity and curiosity, resourcefulness, persistence, and teamwork, for example.
… (emphasis added)

The DoD has $millions to throw at creating cyber-Jedis.

If you plan to stay ahead, now would be a good time to start.

PS: If you attend the next DEF CON, keep an eye out for Frank.


ISIS Turns To Telegram App After Twitter Crackdown [Farce Alert + My Telegram Handle]

Monday, August 29th, 2016

ISIS Turns To Telegram App After Twitter Crackdown

From the post:

With the micro-blogging site Twitter coming down heavily on ISIS-sponsored accounts, the terrorist organisation and its followers are fast joining the heavily-encrypted messaging app Telegram built by a Russian developer.

On Telegram, the ISIS followers are laying out detailed plans to conduct bombing attacks in the west, reported on Monday.

France and Germany have issued statements that they now want a crackdown against them on Telegram.

“Encrypted communications among terrorists constitute a challenge during investigations. Solutions must be found to enable effective investigation… while at the same time protecting the digital privacy of citizens by ensuring the availability of strong encryption,” the statement said.


Oh, did you notice the source? “ reported on Monday.”

If you skip over to that post: IS Followers Flock to Telegram After being Driven from Twitter (I don’t want to shame the author so omitting their name), it reads in part:

With millions of IS loyalists communicating with one another on Telegram and spreading their message of radical Islam and extremism, France and Germany last week said that they want a continent wide effort to allow for a crackdown on Telegram.

“Encrypted communications among terrorists constitute a challenge during investigations,” France and Germany said in a statement. “Solutions must be found to enable effective investigation… while at the same time protecting the digital privacy of citizens by ensuring the availability of strong encryption.”

On private Telegram channels, IS followers have laid out detailed plans to poison Westerners and conduct bombing attacks, reports say.

What? “…millions of IS loyalists…?” IS in total has about 30K active fighters, maybe. Millions of loyalists? Documentation? Citation of some sort? Being the Voice of America, I’d say they pulled that number out of a dark place.

Meanwhile, while complaining about the strong encryption, they are party to:

detailed plans to poison Westerners and conduct bombing attacks, reports say.

You do know wishing Westerners would choke on their Fritos doesn’t constitute a plan. Yes?

Nor does wishing to have an unspecified bomb, to be exploded at some unspecified location, at no particular time, constitute planning.

Not to mention that “reports say” is a euphemism for: “…we just made it up.”

Get yourself to Telegram!



They left out my favorite:

Annoy governments seeking to invade a person’s privacy.

Reclaim your privacy today! Telegram!

Caveat: I tried using one device for the SMS to set up my smartphone. Nada, nyet, no joy. Had to use my cellphone number to set up the account on the cellphone. OK, but annoying.

BTW, on Telegram, my handle is @PatrickDurusau.

Yes, my real name. Which excludes this account from anything requiring OpSec. 😉

Hunters Bag > 400 Database Catalogs

Monday, August 29th, 2016

Transparency Hunters Capture More than 400 California Database Catalogs by Dave Maass.

The post in its entirety:

A team of over 40 transparency activists aimed their browsers at California this past weekend, collecting more than 400 database catalogs from local government agencies, as required under a new state law. Together, participants in the California Database Hunt shined light on thousands upon thousands of government record systems.

California S.B. 272 requires every local government body, with the exception of educational agencies, to post inventories of their “enterprise systems,” essentially every database that holds records on members of the public or is used as a primary source of information. These database catalogs were required to be posted online (at least by agencies with websites) by July 1, 2016.

EFF, the Data Foundation, the Sunlight Foundation, and Level Zero, combined forces to host volunteers in San Francisco, Washington, D.C., and remotely. More than 40 volunteers scoured as many local agency websites as we could in four hours—cities, counties, regional transportation agencies, water districts, etc. Here are the rough numbers:

680 – The number of unique agencies that supporters searched

970 – The number of searches conducted (Note: agencies found on the first pass not to have catalogs were searched a second time)

430 – Number of agencies with database catalogs online

250 – Number of agencies without database catalogs online, as verified by two people

Download a spreadsheet of the local government database catalogs we found: Excel/TSV

Download a spreadsheet of cities and counties that did not have S.B. 272 catalogs: Excel/TSV

Please note that for each of the cities and counties identified as not posting database catalogs, at least two volunteers searched for the catalogs and could not find them. It is possible that those agencies do in fact have S.B. 272-compliant catalogs posted somewhere, but not in what we would call a “prominent location,” as required by the new law. If you represent an agency that would like its database catalog listed, please send an email to

We owe a debt of gratitude to the dozens of volunteers who sacrificed their Saturday afternoons to help make local government in California a little less opaque. Check out this 360-degree photo of our San Francisco team on Facebook.

In the coming days and weeks, we plan to analyze and share the data further. Stay tuned, and if you find anything interesting perusing these database catalogs, please drop us a line at

Of course, bagging the database catalogs is like having a collection of Christmas catalogs. It’s great, but there are more riches within!

What data products would you look for first?

Updated to mirror changes (clarification) in original.

DataScience+ (R Tutorials)

Monday, August 29th, 2016


From the webpage:

We share R tutorials from scientists at academic and scientific institutions with a goal to give everyone in the world access to a free knowledge. Our tutorials cover different topics including statistics, data manipulation and visualization!

I encountered DataScience+ while running down David Kun’s RDBL post.

As of today, there are 120 tutorials with 451,129 reads.

That’s impressive, whether you are looking for tutorials or looking to post your R tutorial where it will be appreciated.


RDBL – manipulate data in-database with R code only

Monday, August 29th, 2016

RDBL – manipulate data in-database with R code only by David Kun.

From the post:

In this post I introduce our own package RDBL, the R DataBase Layer. With this package you can manipulate data in-database without writing SQL code. The package interprets the R code and sends out the corresponding SQL statements to the database, fully transparently. To minimize overhead, the data is only fetched when absolutely necessary, allowing the user to create the relevant joins (merge), filters (logical indexing) and groupings (aggregation) in R code before the SQL is run on the database. The core idea behind RDBL is to let R users with little or no SQL knowledge to utilize the power of SQL database engines for data manipulation.

It is important to note that the SQL statements generated in the background are not executed unless explicitly requested by the command Hence, you can merge, filter and aggregate your dataset on the database side and load only the result set into memory for R.

In general the design principle behind RDBL is to keep the models as close as possible to the usual data.frame logic, including (as shown later in detail) commands like aggregate, referencing columns by the $ operator and features like logical indexing using the [] operator.

RDBL supports a connection to any SQL-like data source which supports a DBI interface or an ODBC connection, including but not limited to Oracle, MySQL, SQLite, SQL Server, MS Access and more.

Not as much fun as surfing mall wifi for logins/passwords, but it is something you can use at work.

The best feature is that you load resulting data sets only. RDBL uses databases for what they do well. Odd but efficient practices do happen from time to time.
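To make the "build the query first, fetch last" idea concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is not RDBL and does not use RDBL's API; the class LazyQuery, its methods, and the sales table are all made up for illustration. The only point is that filters and aggregations are recorded first, and SQL reaches the database only when fetch() is called.

```python
import sqlite3

ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}
ALLOWED_AGGS = {"SUM", "AVG", "COUNT", "MIN", "MAX"}


class LazyQuery:
    """Hypothetical lazy query: records operations, runs SQL only on fetch()."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        self.filters = []  # list of (column, operator, value)
        self.agg = None    # (function, column, group-by column)

    def where(self, column, op, value):
        # Record a filter; nothing is sent to the database yet.
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        self.filters.append((column, op, value))
        return self

    def aggregate(self, func, column, by):
        # Record an aggregation; still no SQL is executed.
        func = func.upper()
        if func not in ALLOWED_AGGS:
            raise ValueError(f"unsupported aggregate: {func}")
        self.agg = (func, column, by)
        return self

    def to_sql(self):
        # Translate the recorded operations into one SQL statement.
        # Identifiers are assumed to be trusted; values are bound as parameters.
        if self.agg:
            func, column, by = self.agg
            select = f'"{by}", {func}("{column}")'
            tail = f' GROUP BY "{by}" ORDER BY "{by}"'
        else:
            select, tail = "*", ""
        clauses = " AND ".join(f'"{c}" {op} ?' for c, op, _ in self.filters)
        where = f" WHERE {clauses}" if clauses else ""
        params = [v for _, _, v in self.filters]
        return f'SELECT {select} FROM "{self.table}"{where}{tail}', params

    def fetch(self):
        # The only place where the database does any work.
        sql, params = self.to_sql()
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.0), ("west", 1.0)],
)

query = LazyQuery(conn, "sales").where("amount", ">", 2).aggregate("sum", "amount", by="region")
result = query.fetch()  # only the aggregated rows come back into memory
print(result)
```

Only the two aggregated rows cross from the database into the program. The filtering and grouping happen on the database side, which is the efficiency the post is after.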

I first saw this in a tweet by Christophe Lalanne.

Wifi Fishing

Monday, August 29th, 2016

4th grader’s project on cyber security proves people will click on anything by Erin Cargile.

Evan Robertson programmed a mobile hot spot with this pop-up to connect:

“…You allow any and all data you transmit to be received, reused, modified and/or redistributed in any way we deem fit. You agree to allow your connecting device to be accessed and/or modified by us in any way, including but not limited to harvesting personal information, reading and responding to your emails…If you are still reading this you should definitely not connect to this network. It’s not radical, dude. Also, we love cats. Have a good day!”

More than half of the people who connected accepted the terms!

Sounds like a great group project for the holidays! Especially if you will be at the shopping mall anyway.

Come to think of it, use a bank logo with more reasonable terms and you will attract unwary hackers as well.

For an extra webpage or two, you may collect some logins and passwords as well.


Open Source Software & The Department of Defense

Monday, August 29th, 2016

Open Source Software & The Department of Defense by Ben FitzGerald, Peter L. Levin, and Jacqueline Parziale.

A great resource for sharing with Department of Defense (DoD) staff who may be in positions to influence software development and acquisition policies.

In particular you may want to point to the “myths” about security and open source software:

Discussion of open source software in national security is often dismissed out of hand because of technical security concerns. These are unfounded.

To debunk a few myths:

  • Using open source licensing does not mean that changes to the source code must be shared publicly.
  • The ability to see source code is not the same as the ability to modify deployed software in production.
  • Using open source components is not equivalent to creating an entire system that is itself open sourced.

As In-Q-Tel’s Chief Information Security Officer Dan Geer explains, security is “the absence of unmitigatable surprise.” It is particularly difficult to mitigate surprise with closed proprietary software, because the source code, and therefore the ability to identify and address its vulnerabilities, is hidden. “Security through obscurity” is not an effective defense against today’s cybersecurity threats.

In this context, open source software can generate better security outcomes than proprietary alternatives. Conventional anti-malware scanning and intrusion detection are inadequate for many reasons, including their “focus on known vulnerabilities” that miss unknown threats, such as zero-day exploits. As an example, a DARPA-funded team built a flight controller for small quadcopter drones based on an open source autopilot readily downloaded from the Internet. A red team “found no security flaws in six weeks with full access [to the] source code,” making their UAV the most secure on the planet.

Except that “security” to a DoD contractor has little to do with software security.

No, for a DoD contractor, “security” means change orders, which trigger additional, largely unauditable software development cycles, software testing, and changes to documentation, all of which could be negatively impacted by “…an open source autopilot.”

If open source is used, there are fewer billing opportunities and that threatens the “security” of DoD contractors.

The paper makes a great case for why the DoD should make greater use of open source software and development practices, but the DoD will have to break the stranglehold of a number of current DoD contractors to do so.

Status of the Kernel Self Protection Project

Monday, August 29th, 2016

Status of the Kernel Self Protection Project by Kees (“Case”) Cook.

Slides from the Linux Security Summit 2016.

Kernel Self Protection Project links:

kernel-hardening mailing list archive.

Kernel Self Protection Project – wiki page.

Kees’ review of bug classes provides a guide to searching for new bugs and capturing data about existing ones.


PS: Motivation to participate in this project:

Every bug fix makes users safer from cybercriminals and incrementally diminishes government spying.

Ethics for Powerful Algorithms

Sunday, August 28th, 2016

Ethics for Powerful Algorithms by Abe Gong.

Abe’s four questions:

  1. Are the statistics solid?
  2. Who wins? Who loses?
  3. Are those changes to power structures healthy?
  4. How can we mitigate harms?

They remind me of my favorite scene from Labyrinth:


Sarah: That’s not fair!
Jareth: You say that so often, I wonder what your basis for comparison is?

Isn’t the question of “fairness” one for your client?

Twitter Said to Work on Anti-Harassment Keyword Filtering Tool [Good News!]

Sunday, August 28th, 2016

Twitter Said to Work on Anti-Harassment Keyword Filtering Tool by Sarah Frier.

From the post:

Twitter Inc. is working on a keyword-based tool that will let people filter the posts they see, giving users a more effective way to block out harassing and offensive tweets, according to people familiar with the matter.

The San Francisco-based company has been discussing how to implement the tool for about a year as it seeks to stem abuse on the site, said the people, who asked not to be identified because the initiative isn’t public. By using keywords, users could block swear words or racial slurs, for example, to screen out offenders.

Nice to have good news to report about Twitter!

Suggestions before the code gets set in stone:

  • Enable users to “follow” filters of other users
  • Enable filters to match nicknames both in content and as sender
  • Regexes anyone?
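The suggestions above fit in a few lines of code. What follows is a rough Python sketch, not a description of Twitter's tool, whose design is not public. The class KeywordFilter, its methods, and the sample timeline are all hypothetical. It shows regex patterns applied to content, nicknames matched both as sender and as mentions in content, and one user "following" another user's filter.

```python
import re

MENTION = re.compile(r"@(\w+)")


def normalize(nickname):
    # Compare nicknames case-insensitively and without the leading "@".
    return nickname.lstrip("@").lower()


class KeywordFilter:
    """Hypothetical per-user filter: regexes for content plus blocked nicknames."""

    def __init__(self, patterns=(), blocked_nicknames=()):
        self.patterns = [re.compile(p, re.IGNORECASE) for p in patterns]
        self.blocked = {normalize(n) for n in blocked_nicknames}
        self.followed = []  # filters borrowed from other users

    def follow(self, other):
        # "Follow" another user's filter so its rules apply here too.
        self.followed.append(other)

    def blocks(self, sender, text, _seen=None):
        # _seen guards against two filters that follow each other.
        seen = _seen if _seen is not None else set()
        if id(self) in seen:
            return False
        seen.add(id(self))
        # Nickname as sender.
        if normalize(sender) in self.blocked:
            return True
        # Nickname mentioned in content.
        if any(normalize(m) in self.blocked for m in MENTION.findall(text)):
            return True
        # Regex patterns against content.
        if any(p.search(text) for p in self.patterns):
            return True
        return any(f.blocks(sender, text, seen) for f in self.followed)


shared = KeywordFilter(patterns=[r"\bspam+\b"], blocked_nicknames=["@troll42"])
mine = KeywordFilter(patterns=[r"\bbuy\s+now\b"])
mine.follow(shared)

timeline = [
    ("alice", "Lovely weather today"),
    ("troll42", "Hello there"),
    ("bob", "I agree with @Troll42 on this"),
    ("carol", "BUY   NOW while stocks last"),
    ("dave", "so much spammm in my feed"),
]
visible = [(sender, text) for sender, text in timeline if not mine.blocks(sender, text)]
print(visible)
```

Followed filters are consulted live rather than copied, so a change to a shared filter takes effect for everyone who follows it. Whether that is desirable is a design choice for the real tool.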

A big step towards empowering users!

srez: Image super-resolution through deep learning

Sunday, August 28th, 2016

srez: Image super-resolution through deep learning by David Garcia.

From the webpage:

Image super-resolution through deep learning. This project uses deep learning to upscale 16×16 images by a 4x factor. The resulting 64×64 images display sharp features that are plausible based on the dataset that was used to train the neural net.

Here’s an random, non cherry-picked, example of what this network can do. From left to right, the first column is the 16×16 input image, the second one is what you would get from a standard bicubic interpolation, the third is the output generated by the neural net, and on the right is the ground truth.


Once you have collected names, you are likely to need image processing.

Here’s an interesting technique using deep learning. It handles only face-on images at the moment, but you can expect that to improve.

The Court That Rules The World

Sunday, August 28th, 2016

The Court That Rules The World by Chris Hamby.

If the Trans-Pacific Partnership (TPP) and investor-state dispute settlement (ISDS) don’t sound dangerous to you, this series will change your mind.

Imagine a private, global super court that empowers corporations to bend countries to their will.

Say a nation tries to prosecute a corrupt CEO or ban dangerous pollution. Imagine that a company could turn to this super court and sue the whole country for daring to interfere with its profits, demanding hundreds of millions or even billions of dollars as retribution.

Imagine that this court is so powerful that nations often must heed its rulings as if they came from their own supreme courts, with no meaningful way to appeal. That it operates unconstrained by precedent or any significant public oversight, often keeping its proceedings and sometimes even its decisions secret. That the people who decide its cases are largely elite Western corporate attorneys who have a vested interest in expanding the court’s authority because they profit from it directly, arguing cases one day and then sitting in judgment another. That some of them half-jokingly refer to themselves as “The Club” or “The Mafia.”

And imagine that the penalties this court has imposed have been so crushing — and its decisions so unpredictable — that some nations dare not risk a trial, responding to the mere threat of a lawsuit by offering vast concessions, such as rolling back their own laws or even wiping away the punishments of convicted criminals.

This system is already in place, operating behind closed doors in office buildings and conference rooms in cities around the world. Known as investor-state dispute settlement, or ISDS, it is written into a vast network of treaties that govern international trade and investment, including NAFTA and the Trans-Pacific Partnership, which Congress must soon decide whether to ratify.

These trade pacts have become a flashpoint in the US presidential campaign. But an 18-month BuzzFeed News investigation, spanning three continents and involving more than 200 interviews and tens of thousands of documents, many of them previously confidential, has exposed an obscure but immensely consequential feature of these trade treaties, the secret operations of these tribunals, and the ways that business has co-opted them to bring sovereign nations to heel.

The BuzzFeed News investigation explores four different aspects of ISDS. In coming days, it will show how the mere threat of an ISDS case can intimidate a nation into gutting its own laws, how some financial firms have transformed what was intended to be a system of justice into an engine of profit, and how America is surprisingly vulnerable to suits from foreign companies.

(emphasis in original)

Read carefully and take names.

Few, if any, are beyond one degree of separation from the Internet.

Do Your Part! Illegally Download Scientific Papers

Sunday, August 28th, 2016


From Rob Beschizza’s post, Do Your Part! Illegally Download Scientific Papers, which has a poster-size version at 1940 x 2521 pixels.

Text To Image Synthesis Using Thought Vectors

Sunday, August 28th, 2016

Text To Image Synthesis Using Thought Vectors by Paarth Neekhara.


This is an experimental tensorflow implementation of synthesizing images from captions using Skip Thought Vectors. The images are synthesized using the GAN-CLS Algorithm from the paper Generative Adversarial Text-to-Image Synthesis. This implementation is built on top of the excellent DCGAN in Tensorflow. The following is the model architecture. The blue bars represent the Skip Thought Vectors for the captions.

OK, that didn’t grab my attention, but this did:


Full size image.

Not quite “Tea, Earl Grey, Hot,” but a step in that direction!

D3 in Depth

Saturday, August 27th, 2016

D3 in Depth by Peter Cook.

From the introduction:

D3 is an open source JavaScript library for:

  • data-driven manipulation of the Document Object Model (DOM)
  • working with data and shapes
  • laying out visual elements for linear, hierarchical, network and geographic data
  • enabling smooth transitions between user interface (UI) states
  • enabling effective user interaction

Let’s unpick these one by one.

Peter forgets to mention that there will be illustrations:


Same data as a packed circle:


Same data as a treemap:


The first two chapters are up and I’m waiting for more!


PS: Follow Peter at: @animateddata.

SkySafari 5 for Android

Saturday, August 27th, 2016

SkySafari 5 for Android

I say go for the SkySafari 5 Pro!

SkySafari 5

SkySafari 5 shows you 119,000 stars, 220 of the best-known star clusters, nebulae, and galaxies in the sky; including all of the Solar System’s major planets and moons, and more than 500 asteroids, comets, and satellites. ($1.49)

SkySafari 5 Plus

SkySafari 5 Plus shows you 2.6 million stars, and 31,000 deep sky objects; including the entire NGC/IC catalog, and 18,000 asteroids, comets, and satellites with updatable orbits. Plus, state of the art mobile telescope control. ($7.49)

SkySafari 5 Pro

SkySafari 5 Pro includes over 27 million stars, 740,000 galaxies down to 18th magnitude, and 620,000 solar system objects; including every comet and asteroid ever discovered. Plus, state of the art mobile telescope control. ($19.99)

(prices as of today and as always, subject to change)

I may start using my smartphone for more than monitoring my tweet stream. 😉