Standard Driven Bugs – Must Watch Presentation For Standards Geeks

December 16th, 2017

From the description:

Web standards are ever-evolving and determine what browsers can do. But new features can also lead to new vulnerabilities as they exercise existing functionality in new and unexpected ways. This talk discusses some of the more interesting and unusual features of JavaScript, and how they lead to bugs in a variety of software, including Adobe Flash, Chrome, Microsoft Edge and Safari.

Natalie Silvanovich is a security researcher at Google Project Zero.

Whether you are looking for the origin of bugs in a standard or playing the long game by building bugs into standards (the NSA, for example), this is a must-watch video!

A transcript with CVE links, etc., would be especially useful.

Russians? Nation State? Dorm Room? Mirai Botnet Facts

December 16th, 2017

How a Dorm Room Minecraft Scam Brought Down the Internet by Garett M. Graff.

From the post:

The most dramatic cybersecurity story of 2016 came to a quiet conclusion Friday in an Anchorage courtroom, as three young American computer savants pleaded guilty to masterminding an unprecedented botnet—powered by unsecured internet-of-things devices like security cameras and wireless routers—that unleashed sweeping attacks on key internet services around the globe last fall. What drove them wasn’t anarchist politics or shadowy ties to a nation-state. It was Minecraft.

Graff’s account is mandatory reading for:

  • Hackers who want to avoid discovery by the FBI
  • Journalists who want to avoid false and/or misleading claims about cyberattacks
  • Manufacturers who want to avoid producing insecure devices (a very small number)
  • Readers who are interested in how the Mirai botnet hype played out

Enjoy!

“It is more blessed to give than to receive.” Mallers, WiFiPhisher Can Help You With That!

December 16th, 2017

Acts 20:35 records Jesus as saying, in part: “It is more blessed to give than to receive.”

Mall shoppers may honor that admonition without their knowledge (or consent).

Automated WPA Phishing Attacks: WiFiPhisher

From the webpage:

Wifiphisher is a security tool that mounts automated victim-customized phishing attacks against WiFi clients in order to obtain credentials or infect the victims with malwares. It is primarily a social engineering attack that unlike other methods it does not include any brute forcing. It is an easy way for obtaining credentials from captive portals and third party login pages (e.g. in social networks) or WPA/WPA2 pre-shared keys.

Security advice for mallers:

  • Go hard copy, shop with cash/checks.
  • Leave all wifi devices at home, not in your car, at home.

Otherwise, you may have a very blessed holiday shopping experience.

Statistics vs. Machine Learning Dictionary (flat text vs. topic map)

December 16th, 2017

Data science terminology (UBC Master of Data Science)

From the webpage:

About this document

This document is intended to help students navigate the large amount of jargon, terminology, and acronyms encountered in the MDS program and beyond. There is also an accompanying blog post.

Stat-ML dictionary

This section covers terms that have different meanings in different contexts, specifically statistics vs. machine learning (ML).
… (emphasis in original)

Gasp! You don’t mean that the same words have different meanings in machine learning and statistics!

Even more shocking, some words/acronyms have the same meaning!

Never fear, a human reader can use this document to distinguish the usages.

Automated processors, not so much.

If these terms were treated as occurrences of topics, where the topics had the respective scopes of statistics and machine learning, then for any scoped document, an enhanced view with the correct definition could be supplied to the unsteady reader.

Static markup of legacy documents is not required, as annotations can be added as a document is streamed to a reader. That opens the potential, of course, for different annotations depending upon the skill and interest of the reader.

If, for each term/subject, properties beyond the scope (statistics, machine learning, or both) were supplied, users of the topic map could search on those properties to match terms not included here. For example, which type of bias (in statistics) does “bias” mean in your paper? A casually written Wikipedia article reports twelve types and, with refinement, the number could be higher.
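A minimal sketch of the idea, assuming a toy in-memory “topic map” (a plain dict) rather than any real topic map engine, and not drawn from the UBC document's actual entries. The two definitions of “bias” are standard ones supplied for illustration; `TOPICS`, `annotate_stream` and `find_terms` are hypothetical names. Annotations are added as lines stream past, so the source text is never marked up, and the `properties` field is what a reader would search on.

```python
import re

# Hypothetical mini "topic map": one definition per scope for each term,
# plus properties a reader could search on.
TOPICS = {
    "bias": {
        "statistics": {
            "definition": "difference between an estimator's expected value and the true parameter",
            "properties": {"kind": "estimator bias"},
        },
        "machine-learning": {
            "definition": "intercept term added to a model's weighted inputs",
            "properties": {"kind": "model parameter"},
        },
    },
}

TERM_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, TOPICS)) + r")\b", re.IGNORECASE)


def annotate_stream(lines, scope):
    """Yield each line with definitions for the given scope appended.

    The source lines are never modified; annotations are added on the way out.
    """
    for line in lines:
        notes = []
        for term in sorted({m.group(1).lower() for m in TERM_PATTERN.finditer(line)}):
            entry = TOPICS[term].get(scope)
            if entry:
                notes.append(f"[{term} ({scope}): {entry['definition']}]")
        yield line + (" " + " ".join(notes) if notes else "")


def find_terms(**props):
    """Return (term, scope) pairs whose properties match every keyword given."""
    return [
        (term, scope)
        for term, scopes in TOPICS.items()
        for scope, entry in scopes.items()
        if all(entry["properties"].get(k) == v for k, v in props.items())
    ]


doc = ["Reducing bias increases variance.", "Plots are in the appendix."]
for out in annotate_stream(doc, scope="statistics"):
    print(out)
print(find_terms(kind="model parameter"))
```

Switching `scope` to `"machine-learning"` gives the same text a different annotation, which is the point: the author's distinction travels with the document instead of being rediscovered by each reader.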

Flat text is far easier to write than a topic map but tasks every reader with re-discovering the distinctions already known to the author of the document.

Imagine your office’s, department’s, or agency’s vocabulary and its definitions captured and then used to annotate internal or external documentation for your staff.

Instead of every new staffer asking (hopefully) “What do we mean by (your common term)?”, the definition appears with a mouse-over in a document.

Are you capturing the soft knowledge of your staff?

Evil Foca [Encourage Upgrades from Windows XP]

December 16th, 2017

Network Security Testing: Evil Foca

From the webpage:

Evil Foca is a tool for security pentesters and auditors whose purpose it is to test security in IPv4 and IPv6 data networks. The software automatically scans the networks and identifies all devices and their respective network interfaces, specifying their IPv4 and IPv6 addresses as well as the physical addresses through a convenient and intuitive interface.

The tool is capable of carrying out various attacks such as:

  • MITM over IPv4 networks with ARP Spoofing and DHCP ACK Injection.
  • MITM on IPv6 networks with Neighbor Advertisement Spoofing, SLAAC attack, fake DHCPv6.
  • DoS (Denial of Service) on IPv4 networks with ARP Spoofing.
  • DoS (Denial of Service) on IPv6 networks with SLAAC DoS.
  • DNS Hijacking.

Requirements

  • Windows XP or later.

ATMs and users running Windows XP are justification for possessing Windows XP.

But upgrading from Windows XP as an operations platform should be encouraged. For any purpose.

Yes?

Otherwise, what’s next? A luggable computer for your next assignment?

getExploit (utility)

December 15th, 2017

getExploit

From the webpage:

Python script to explore exploits from exploit-db.com. Exist a similar script in Kali Linux, but in difference this python script will have provide more flexibility at search and download time.

Looks useful, modulo the added risk of a local copy.

Yeti (You Are What You Record)

December 15th, 2017

Open Distributed Threat Intelligence: Yeti

From the webpage:

Yeti is a platform meant to organize observables, indicators of compromise, TTPs, and knowledge on threats in a single, unified repository. Yeti will also automatically enrich observables (e.g. resolve domains, geolocate IPs) so that you don’t have to. Yeti provides an interface for humans (shiny Bootstrap-based UI) and one for machines (web API) so that your other tools can talk nicely to it.

Yeti was born out of frustration of having to answer the question “where have I seen this artifact before?” or Googling shady domains to tie them to a malware family.

In a nutshell, Yeti allows you to:

  • Submit observables and get a pretty good guess on the nature of the threat.
  • Inversely, focus on a threat and quickly list all TTPs, Observables, and associated malware.
  • Let responders skip the “Google the artifact” stage of incident response.
  • Let analysts focus on adding intelligence rather than worrying about machine-readable export formats.
  • Visualize relationship graphs between different threats.

This is done by:

  • Collecting and processing observables from a wide array of different sources (MISP instances, malware trackers, XML feeds, JSON feeds…)
  • Providing a web API to automate queries (think incident management platform) and enrichment (think malware sandbox).
  • Export the data in user-defined formats so that they can be ingested by third-party applications (think blocklists, SIEM).

Yeti sounds like a good tool, but always remember: You Are What You Record.

Innocent activities captured in your Yeti repository could be made to look like plans for criminal activity.

Just a word to the wise.

KubeCon/CloudNativeCon [Breaking Into Clouds]

December 15th, 2017

KubeCon/CloudNativeCon just concluded in Austin, Texas, with 179 videos now available on YouTube.

A sortable list of presentations: https://kccncna17.sched.com/. How long that will persist isn’t clear.

If you missed Why The Federal Government Warmed Up To Cloud Computing, take a minute to review it now. It’s a promotional piece, but the essential takeaway (government data is moving to the cloud) remains valid.

To detect security failures during migration and post-migration, you will need to know cloud technology better than the average migration tech.

The videos from KubeCon/CloudNativeCon 2017 are a nice starter set in that direction.

Colorized Math Equations [Algorithms?]

December 15th, 2017

Colorized Math Equations by Kalid Azad.

From the post:

Years ago, while working on an explanation of the Fourier Transform, I found this diagram:

(source)

Argh! Why aren’t more math concepts introduced this way?

Most ideas aren’t inherently confusing, but their technical description can be (e.g., reading sheet music vs. hearing the song.)

My learning strategy is to find what actually helps when learning a concept, and do more of it. Not the stodgy description in the textbook — what made it click for you?

The checklist of what I need is ADEPT: Analogy, Diagram, Example, Plain-English Definition, and Technical Definition.

Here’s a few reasons I like the colorized equations so much:

  • The plain-English description forces an analogy for the equation. Concepts like “energy”, “path”, “spin” aren’t directly stated in the equation.
  • The colors, text, and equations are themselves a diagram. Our eyes bounce back and forth, reading the equation like a map (not a string of symbols).
  • The technical description — our ultimate goal — is not hidden. We’re not choosing between intuition or technical, it’s intuition for the technical.

Of course, we still need examples to check our understanding, but 4/5 ain’t bad!

Azad includes a LaTeX template that he uses to create colorized math equations.
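Azad’s template isn’t reproduced here. As a rough sketch of the same idea, assuming only the standard xcolor package, each colored piece of a discrete Fourier transform is paired with a same-colored phrase of plain English. The wording is a paraphrase of the well-known colorized DFT explanation, and the 1/N normalization and the sign of the exponent follow one common convention among several.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{xcolor}

\begin{document}

% Each colored piece of the equation matches a same-colored phrase below.
\[
  \textcolor{red}{X_k} =
  \textcolor{violet}{\frac{1}{N} \sum_{n=0}^{N-1}}
  \textcolor{teal}{x_n}
  \textcolor{blue}{e^{-i 2\pi k n / N}}
\]

To find \textcolor{red}{how much of frequency $k$ is in a signal},
\textcolor{blue}{spin} your \textcolor{teal}{signal} around a circle at that
frequency, and \textcolor{violet}{average} the points along that path.

\end{document}
```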

Consider the potential use of color + explanation for algorithms, being mindful that color presents accessibility issues that will require cleverness on your part.

Another tool for your explanation quiver!

THC-Hydra – Very Fast Network Logon Cracker

December 15th, 2017

Very Fast Network Logon Cracker: THC-Hydra

From the webpage:

Number one of the biggest security holes are passwords, as every password security study shows. Hydra is a parallized login cracker which supports numerous protocols to attack. New modules are easy to add, beside that, it is flexible and very fast. This fast, and many will say fastest network logon cracker supports many different services. Deemed ‘The best parallelized login hacker’: for Samba, FTP, POP3, IMAP, Telnet, HTTP Auth, LDAP, NNTP, MySQL, VNC, ICQ, Socks5, PCNFS, Cisco and more. Includes SSL support and is part of Nessus.

If you don’t know CyberPunk, they have great graphics:

If you have found the recent 1.4 billion password dump, THC-Hydra is in your near future.

IndonesiaLeaks [Leak early, Leak often]

December 15th, 2017

IndonesiaLeaks: New Platform for Whistleblowers and Muckrakers

From the post:

Ten media houses and five civil society organizations in Indonesia announced a collaboration this week to form a digital platform for whistleblowers.

IndonesiaLeaks will allow the public a platform to anonymously and securely submit information, documents and data sets related to the public interest. The information received by IndonesiaLeaks will then be vetted and verified for use in investigative reports by the ten affiliated media organizations.

The secure online platform is crucial in Indonesia due to the lack of whistleblower protection schemes. Those who take risks leaking information on offenses happening in their institutions are often prosecuted and intimidated.

“IndonesiaLeaks is designed as a collaborative platform between ten media houses to share tasks, responsibilities and resources, as well as risks,” said Wahyu Dhyatmika, the editor of IndonesiaLeaks member publication Tempo.co, at the platform’s launch in Jakarta on Thursday. “By creating this partnership, we hope the impacts of investigative journalism will be bigger and spread widely.”

A welcome surprise as a hard year for the media draws to a close. The chest-pounding antics of the American President aren’t the only woes for the media in 2017, but they have been some of the most visible.

IndonesiaLeaks promises to give the sordid side of government (is there another side?) greater visibility. This collaboration will provide strength in numbers and resources for its participants, furthering their ability to practice investigative journalism.

I don’t read Indonesian, but the website is attractive and focuses on the secure submission of documents. I rather like that: clean, focused, and to the point.

The collaboration partners to date:

Support these collaborators and other investigative journalists at every opportunity. You never know when one of their stories will impact your reporting on a frothing, tantrum-throwing press hater closer to home, in the United States.

Spatial Microsimulation with R – Public Policy Advocates Take Note

December 14th, 2017

Spatial Microsimulation with R by Robin Lovelace and Morgane Dumont.

Apologies for the long quote below but spatial microsimulation is unfamiliar enough that it merited an introduction in the authors’ own prose.

We have all attended public meetings where developers, polluters, landfill operators, etc., had charts, studies, etc., and the public was armed with, well, its opinions.

Spatial Microsimulation with R can put you in a position to offer alternative analyses and to meaningfully ask for the data used in other studies; in short, to arm yourself with weapons long abused in public policy discussions.

From Chapter 1, 1.2 Motivations:


Imagine a world in which data on companies, households and governments were widely available. Imagine, further, that researchers and decision-makers acting in the public interest had tools enabling them to test and model such data to explore different scenarios of the future. People would be able to make more informed decisions, based on the best available evidence. In this technocratic dreamland pressing problems such as climate change, inequality and poor human health could be solved.

These are the types of real-world issues that we hope the methods in this book will help to address. Spatial microsimulation can provide new insights into complex problems and, ultimately, lead to better decision-making. By shedding new light on existing information, the methods can help shift decision-making processes away from ideological bias and towards evidence-based policy.

The ‘open data’ movement has made many datasets more widely available. However, the dream sketched in the opening paragraph is still far from reality. Researchers typically must work with data that is incomplete or inaccessible. Available datasets often lack the spatial or temporal resolution required to understand complex processes. Publicly available datasets frequently miss key attributes, such as income. Even when high quality data is made available, it can be very difficult for others to check or reproduce results based on them. Strict conditions inhibiting data access and use are aimed at protecting citizen privacy but can also serve to block democratic and enlightened decision making.

The empowering potential of new information is encapsulated in the saying that ‘knowledge is power’. This helps explain why methods such as spatial microsimulation, that help represent the full complexity of reality, are in high demand.

Spatial microsimulation is a growing approach to studying complex issues in the social sciences. It has been used extensively in fields as diverse as transport, health and education (see Chapter ), and many more applications are possible. Fundamental to the approach are approximations of individual level data at high spatial resolution: people allocated to places. This spatial microdata, in one form or another, provides the basis for all spatial microsimulation research.

The purpose of this book is to teach methods for doing (not reading about!) spatial microsimulation. This involves techniques for generating and analysing spatial microdata to get the ‘best of both worlds’ from real individual and geographically-aggregated data. Population synthesis is therefore a key stage in spatial microsimulation: generally real spatial microdata are unavailable due to concerns over data privacy. Typically, synthetic spatial microdatasets are generated by combining aggregated outputs from Census results with individual level data (with little or no geographical information) from surveys that are representative of the population of interest.

The resulting spatial microdata are useful in many situations where individual level and geographically specific processes are in operation. Spatial microsimulation enables modelling and analysis on multiple levels. Spatial microsimulation also overlaps with (and provides useful initial conditions for) agent-based models (see Chapter 12).

Despite its utility, spatial microsimulation is little known outside the fields of human geography and regional science. The methods taught in this book have the potential to be useful in a wide range of applications. Spatial microsimulation has great potential to be applied to new areas for informing public policy. Work of great potential social benefit is already being done using spatial microsimulation in housing, transport and sustainable urban planning. Detailed modelling will clearly be of use for planning for a post-carbon future, one in which we stop burning fossil fuels.

For these reasons there is growing interest in spatial microsimulation. This is due largely to its practical utility in an era of ‘evidence-based policy’ but is also driven by changes in the wider research environment inside and outside of academia. Continued improvements in computers, software and data availability mean the methods are more accessible than ever. It is now possible to simulate the populations of small administrative areas at the individual level almost anywhere in the world. This opens new possibilities for a range of applications, not least policy evaluation.

Still, the meaning of spatial microsimulation is ambiguous for many. This book also aims to clarify what the method entails in practice. Ambiguity surrounding the term seems to arise partly because the methods are inherently complex, operating at multiple levels, and partly due to researchers themselves. Some uses of the term ‘spatial microsimulation’ in the academic literature are unclear as to its meaning; there is much inconsistency about what it means. Worse is work that treats spatial microsimulation as a magical black box that just ‘works’ without any need to describe, or more importantly make reproducible, the methods underlying the black box. This book is therefore also about demystifying spatial microsimulation.

If that wasn’t impressive enough, the authors write:


We’ve put Spatial Microsimulation with R on-line because we want to reduce barriers to learning. We’ve made it open source via a GitHub repository because we believe in reproducibility and collaboration. Comments and suggests are most welcome there. If the content of the book helps your research, please cite it (Lovelace and Dumont, 2016).

How awesome is that!

Definitely a model for all of us to emulate!
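For a taste of the population synthesis step described in the quote above (combining aggregate census counts with individual survey records), here is a minimal Python sketch of iterative proportional fitting (IPF), a widely used reweighting technique. The book itself works in R; the respondents, zone totals and the `ipf` function below are hypothetical. Each pass rescales individual weights so the weighted counts match one constraint at a time; with consistent totals the weights typically settle on values that satisfy all constraints at once.

```python
from collections import defaultdict

# Hypothetical survey respondents (individual level, no geography).
individuals = [
    {"age": "young", "sex": "m"},
    {"age": "young", "sex": "f"},
    {"age": "old", "sex": "m"},
    {"age": "old", "sex": "f"},
    {"age": "old", "sex": "f"},
]

# Hypothetical census totals for one zone; each constraint sums to 100 people.
constraints = {
    "age": {"young": 60, "old": 40},
    "sex": {"m": 45, "f": 55},
}


def ipf(individuals, constraints, iterations=20):
    """Return one weight per individual so weighted counts approximate the constraints.

    Every category in `constraints` needs at least one respondent,
    otherwise the ratio below divides by zero.
    """
    weights = [1.0] * len(individuals)
    for _ in range(iterations):
        for var, targets in constraints.items():
            # Current weighted count of each category of this variable.
            current = defaultdict(float)
            for person, w in zip(individuals, weights):
                current[person[var]] += w
            # Scale each weight by target / current for the person's category.
            weights = [
                w * targets[person[var]] / current[person[var]]
                for person, w in zip(individuals, weights)
            ]
    return weights


for person, w in zip(individuals, ipf(individuals, constraints)):
    print(person, round(w, 2))
```

The resulting weights say how many people in the zone each respondent stands for; running the same procedure per zone is what puts “people allocated to places.”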

Twitter Bot Template – If You Can Avoid Twitter Censors

December 14th, 2017

Twitter Bot Template

From the webpage:

Boilerplate for creating simple, non-interactive twitter bots that post periodically. My comparisons bot, @botaphor, is an example of how I use this template in practice.

This is intended for coders familiar with Python and bash.

If you can avoid Twitter censors (new rules, erratically enforced, a regular “feature”), then this Twitter bot template may interest you.

Make tweet filtering a commercial opportunity, and Twitter could drop the cost of tweet censorship, a cost with no profit center behind it.

Unlikely, because policing other people is such a power turn-on.

Still, this is the season for wishes.

Visual Domain Decathlon

December 14th, 2017

Visual Domain Decathlon

From the webpage:

The goal of this challenge is to solve simultaneously ten image classification problems representative of very different visual domains. The data for each domain is obtained from the following image classification benchmarks:

  1. ImageNet [6].
  2. CIFAR-100 [2].
  3. Aircraft [1].
  4. Daimler pedestrian classification [3].
  5. Describable textures [4].
  6. German traffic signs [5].
  7. Omniglot. [7]
  8. SVHN [8].
  9. UCF101 Dynamic Images [9a,9b].
  10. VGG-Flowers [10].

The union of the images from the ten datasets is split in training, validation, and test subsets. Different domains contain different image categories as well as a different number of images.

The task is to train the best possible classifier to address all ten classification tasks using the training and validation subsets, apply the classifier to the test set, and send us the resulting annotation file for assessment. The winner will be determined based on a weighted average of the classification performance on each domain, using the scoring scheme described below. At test time, your model is allowed to know the ground-truth domain of each test image (ImageNet, CIFAR-100, …) but, of course, not its category.

It is up to you to make use of the data, and you can either train a single model for all tasks or ten independent ones. However, you are not allowed to use any external data source for training. Furthermore, we ask you to report the overall size of the model(s) used.

The competition is over, but you can continue to submit results and compare them on the leaderboard. (There’s an idea that merits repetition.)

Will this be your entertainment game for the holidays?

Enjoy!

98% Fail Rate on Privileged Accounts – Transparency in 2018

December 14th, 2017

Half of companies fail to tell customers about data breaches, claims study by Nicholas Fearn.

From the post:

Half of organisations don’t bother telling customers when their personal information might have been compromised following a cyber attack, according to a new study.

The latest survey from security firm CyberArk comes with the full implementation of the European Union General Data Protection Regulation (GDPR) just months away.

Organisations that fail to notify the relevant data protection authorities of a breach within 72 hours of finding it can expect to face crippling fines of up to four per cent of turnover – with companies trying to hide breaches likely to be hit with the biggest punishments.

The findings have been published in the second iteration the CyberArk Global Advanced Threat Landscape Report 2018, which explores business leaders’ attitudes towards IT security and data protection.

The survey found that, overall, security “does not translate into accountability”. Some 46 per cent of organisations struggle to stop every attempt to breach their IT infrastructure.

And 63 per cent of business leaders acknowledge that their companies are vulnerable to attacks, such as phishing. Despite this concern, 49 per cent of organisations don’t have the right knowledge about security policies.

You can download the report cited in Fearn’s post at: CyberArk Global Advanced Threat Landscape Report 2018: The Business View of Security.

If you think that report has implications for involuntary/inadvertent transparency, CyberArk Global Advanced Threat Landscape Report 2018: Focus on DevOps reports this gem:


It’s not just that businesses underestimate threats. As noted above, they also do not seem to fully understand where privileged accounts and secrets exist. When asked which IT environments and devices contain privileged accounts and secrets, responses (IT decision maker and DevOps/app developer respondents) were at odds with the claim that most businesses have implemented a privileged account security solution. A massive 98% did not select at least one of the ‘containers’, ‘microservices’, ‘CI/CD tools’, ‘cloud environments’ or ‘source code repositories’ options. At the risk of repetition, privileged accounts and secrets are stored in all of these entities.

A fail rate of 98% on identifying “privileged accounts and secrets”?

Reports like this make you wonder about the clamor for transparency of organizations and governments. Why bother?

Information in 2018 is kept secure by a lack of interest in collecting it.

Remember that for your next transparency discussion.
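If you want to check one of those categories, secrets committed to source code repositories, in your own house, a crude first pass can be scripted. A Python sketch using only the standard library; the three patterns are illustrative assumptions (the `AKIA` prefix is the format of long-term AWS access key IDs, and `AKIAIOSFODNN7EXAMPLE` is AWS’s documented example key), and dedicated secret scanners use far more patterns plus entropy checks and commit-history scanning.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners ship hundreds.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password_assignment": re.compile(
        r"(?i)(?:password|passwd|secret|api_key)\w*\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for suspicious lines in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_tree(root):
    """Scan every readable file under root (skipping .git internals)."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings


sample = 'db_password = "hunter2hunter2"\naws_key = "AKIAIOSFODNN7EXAMPLE"\nprint("hello")\n'
print(scan_text(sample))
```

Point `scan_tree` at a checkout to get a per-file list of suspicious lines. Note that it only sees the current working tree, not secrets that were committed and later deleted.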

Deep Learning: Practice and Trends [NIPS 2017]

December 13th, 2017

Deep Learning: Practice and Trends by Scott Reed, Nando de Freitas, Oriol Vinyals.

NIPS 2017 Tutorial, Long Beach, CA.

The image is easier to read as the first slide, but the dark blue line represents registrations over time for the 2017 NIPS conference.

The hyperlinks for the authors are to their Twitter accounts. Need I say more?

Trivia question (before you review the slides): Name two early computer scientists who rejected the use of logic as the key to intelligence?

No prize, just curious if you know without the slides.

Game Theory (Open Access textbook with 165 solved exercises)

December 13th, 2017

Game Theory (Open Access textbook with 165 solved exercises) by Giacomo Bonanno.

Not the video Bonanno references in Chapter 1 but close enough.

Game theory provides you with the tools necessary to analyze this game as well as more complex ones.
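As a small taste of those tools, a Python sketch that finds the pure-strategy Nash equilibria of a two-player game by checking, cell by cell, whether either player could do better by switching alone. The payoffs are the usual textbook Prisoner’s Dilemma numbers chosen for illustration, not taken from Bonanno’s book or the video, and `pure_nash_equilibria` is a hypothetical name.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """payoffs[(row, col)] = (row player's payoff, column player's payoff).

    A cell is an equilibrium if neither player gains by deviating alone.
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        row_pay, col_pay = payoffs[(r, c)]
        # Row player: no other row does better against column c.
        row_best = all(payoffs[(r2, c)][0] <= row_pay for r2 in rows)
        # Column player: no other column does better against row r.
        col_best = all(payoffs[(r, c2)][1] <= col_pay for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


prisoners_dilemma = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}

print(pure_nash_equilibria(prisoners_dilemma))
```

Brute force is fine for small payoff tables; mixed-strategy equilibria, which many games need, are beyond this check.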

Enjoy!

A Guide To Kernel Exploitation: Attacking the Core (source files)

December 13th, 2017

If you know or are interested in A Guide To Kernel Exploitation: Attacking the Core by Enrico Perla and Massimiliano Oldani, the source files are now available at: https://github.com/yrp604/atc-sources.

The website that accompanied the book is now reported to be defunct. Thanks to yrp604 for preserving these files.

Enjoy!

Making an Onion List and Checking It Twice (or more)

December 13th, 2017

Bash script to check if .onions and other urls are alive or not

From the post:

The basic idea of this bash script is to feed a list of .onion urls and use torsocks and wget to check if the url is active or not, surely there are many other alternatives but it always nice to have another option.

Useful script and daily reminder:

Privacy is a privilege you work for, it doesn’t happen by accident.
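The same idea in Python rather than bash, as a sketch assuming Tor is running locally and the `torsocks` and `wget` binaries are on your PATH. This is not the linked script, and the function names are hypothetical. `wget --spider` checks the page without saving it, and the exit status decides alive or not. The checker is a parameter so you can swap in another test (or a stub) without touching the loop.

```python
import subprocess


def torsocks_wget_alive(url, timeout=30):
    """True if `torsocks wget --spider` reaches url (assumes Tor, torsocks and wget)."""
    try:
        result = subprocess.run(
            ["torsocks", "wget", "--spider", "--quiet", "--tries=1",
             f"--timeout={timeout}", url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def load_urls(lines):
    """Drop blank lines, '#' comments and duplicates, keeping the list's order."""
    seen, urls = set(), []
    for line in lines:
        url = line.strip()
        if url and not url.startswith("#") and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def check_urls(urls, checker=torsocks_wget_alive):
    """Return {url: True/False} using whatever checker is supplied."""
    return {url: checker(url) for url in urls}


# Usage (hypothetical file name):
# with open("onions.txt") as f:
#     for url, alive in check_urls(load_urls(f)).items():
#         print("UP  " if alive else "DOWN", url)
```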

SIGINT for Anyone

December 12th, 2017

SIGINT for Anyone – The Growing Availability of Signals Intelligence in the Public Domain by Cortney Weinbaum, Steven Berner, Bruce McClintock.

From the webpage:

This Perspective examines and challenges the assumption that signals intelligence (SIGINT) is an inherently governmental function by revealing nongovernmental approaches and technologies that allow private citizens to conduct SIGINT activities. RAND researchers relied on publicly available information to identify SIGINT capabilities in the open market and to describe the intelligence value each capability provides to users. They explore the implications each capability might provide to the United States and allied governments.

The team explored four technology areas where nongovernmental SIGINT is flourishing: maritime domain awareness; radio frequency (RF) spectrum mapping; eavesdropping, jamming, and hijacking of satellite systems; and cyber surveillance. They then identified areas where further research and debate are needed to create legal, regulatory, policy, process, and human capital solutions to the challenges these new capabilities provide to government.

This was an exploratory effort, rather than a comprehensive research endeavor. The team relied on unclassified and publicly available materials to find examples of capabilities that challenge the government-only paradigm. They identified ways these capabilities and trends may affect the U.S. government in terms of emerging threats, policy implications, technology repercussions, human capital considerations, and financial effects. Finally, they identified areas for future study for U.S. and allied government leaders to respond to these changes.

More an enticement than a practical guide to SIGINT, this report should encourage NGOs to consider SIGINT.

I say “consider” SIGINT because small organizations can’t measure intelligence success by the quantity of under-used/unexplored data on hand. Some large governments do (cf. 9/11).

Where SIGINT offers a useful addition to other intelligence sources, it should be among the data feeds into an intelligence topic map.

IJCAI – Proceedings 1969-2016 Treasure Trove of AI Papers

December 12th, 2017

IJCAI – Proceedings 1969-2016

From the about page:

International Joint Conferences on Artificial Intelligence is a non-profit corporation founded in California, in 1969 for scientific and educational purposes, including dissemination of information on Artificial Intelligence at conferences in which cutting-edge scientific results are presented and through dissemination of materials presented at these meetings in form of Proceedings, books, video recordings, and other educational materials. IJCAI conferences present premier international gatherings of AI researchers and practitioners. IJCAI conferences were held biennially in odd-numbered years since 1969. They are sponsored jointly by International Joint Conferences on Artificial Intelligence Organization (IJCAI), and the national AI societie(s) of the host nation(s).

While looking for a paper on automatic concept formulation for Jack Park, I found this archive of prior International Joint Conferences on Artificial Intelligence proceedings.

The latest proceedings (2016) run to six volumes and approximately 4,276 pages.

Enjoy!

A Little Story About the `yes` Unix Command

December 12th, 2017

A Little Story About the `yes` Unix Command by Matthias Endler.

From the post:

What’s the simplest Unix command you know?

There’s echo, which prints a string to stdout and true, which always terminates with an exit code of 0.

Among the rows of simple Unix commands, there’s also yes. If you run it without arguments, you get an infinite stream of y’s, separated by a newline:

Ever installed a program, which required you to type “y” and hit enter to keep going? yes to the rescue!

Endler sets out to re-implement the yes command in Rust.

Why re-implement Unix tools?

The trivial program yes turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun and makes me appreciate the nifty tricks, which make our computers fast.

Endler’s story is unlikely to replace any of your holiday favorites, but unlike those, it has the potential to make you a better programmer.
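To see the buffering trick without reading Rust, here is a Python sketch (not Endler’s code): build one large buffer of repeated lines and write the whole buffer per loop instead of one line at a time. The 64 KiB buffer size and the `max_writes` parameter are assumptions added so the demo terminates. Python will not approach GNU `yes` throughput, and the memory-alignment tricks Endler describes have no equivalent here.

```python
import io
import sys


def make_buffer(text="y", size=64 * 1024):
    """Repeat text + newline to fill about `size` bytes, whole lines only."""
    line = (text + "\n").encode()
    return line * max(1, size // len(line))


def yes(text="y", out=None, max_writes=None):
    """Write text repeatedly, one large write per loop iteration.

    max_writes=None loops forever, like the real `yes`.
    """
    out = sys.stdout.buffer if out is None else out
    buf = make_buffer(text)
    writes = 0
    while max_writes is None or writes < max_writes:
        out.write(buf)
        writes += 1


# Demo against an in-memory stream so the sketch terminates.
sink = io.BytesIO()
yes(out=sink, max_writes=2)
print(len(sink.getvalue()), sink.getvalue()[:6])
```

Calling `yes()` with no arguments loops forever on stdout like the real command; piped into `head`, Python raises `BrokenPipeError` once `head` exits, so a standalone script should catch it.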

Connecting R to Keras and TensorFlow

December 12th, 2017

Connecting R to Keras and TensorFlow by Joseph Rickert.

From the post:

It has always been the mission of R developers to connect R to the “good stuff”. As John Chambers puts it in his book Extending R:

One of the attractions of R has always been the ability to compute an interesting result quickly. A key motivation for the original S remains as important now: to give easy access to the best computations for understanding data.

From the day it was announced a little over two years ago, it was clear that Google’s TensorFlow platform for Deep Learning is good stuff. This September (see announcment), J.J. Allaire, François Chollet, and the other authors of the keras package delivered on R’s “easy access to the best” mission in a big way. Data scientists can now build very sophisticated Deep Learning models from an R session while maintaining the flow that R users expect. The strategy that made this happen seems to have been straightforward. But, the smooth experience of using the Keras API indicates inspired programming all the way along the chain from TensorFlow to R.

The Redditor ‘deepfakes’, of AI-Assisted Fake Porn fame, mentions Keras as one of his tools. Is that an endorsement?

Rickert’s post is a quick start to Keras and TensorFlow, but he does mention:

the MEAP from the forthcoming Manning Book, Deep Learning with R by François Chollet, the creator of Keras, and J.J. Allaire.

I’ve had good luck with Manning books in general, so I’m looking forward to this one as well.

AI-Assisted Fake Porn Is Here… [Endless Possibilities]

December 12th, 2017

AI-Assisted Fake Porn Is Here and We’re All Fucked by Samantha Cole.

From the post:

Someone used an algorithm to paste the face of ‘Wonder Woman’ star Gal Gadot onto a porn video, and the implications are terrifying.

There’s a video of Gal Gadot having sex with her stepbrother on the internet. But it’s not really Gadot’s body, and it’s barely her own face. It’s an approximation, face-swapped to look like she’s performing in an existing incest-themed porn video.

The video was created with a machine learning algorithm, using easily accessible materials and open-source code that anyone with a working knowledge of deep learning algorithms could put together.

It’s not going to fool anyone who looks closely. Sometimes the face doesn’t track correctly and there’s an uncanny valley effect at play, but at a glance it seems believable. It’s especially striking considering that it’s allegedly the work of one person—a Redditor who goes by the name ‘deepfakes’—not a big special effects studio that can digitally recreate a young Princess Leia in Rogue One using CGI. Instead, deepfakes uses open-source machine learning tools like TensorFlow, which Google makes freely available to researchers, graduate students, and anyone with an interest in machine learning.
… (emphasis in original)

Posts and tweets lamenting “fake porn” abound, but where others see terrifying implications, I see boundless potential.

Spoiler: The nay-sayers are on the wrong side of history – The Erotic Engine: How Pornography has Powered Mass Communication, from Gutenberg to Google by Patchen Barss.

or,


“The industry has convincingly demonstrated that consumers are willing to shop online and are willing to use credit cards to make purchases,” said Frederick Lane in “Obscene Profits: The Entrepreneurs of Pornography in the Cyber Age.” “In the process, the porn industry has served as a model for a variety of online sales mechanisms, including monthly site fees, the provision of extensive free material as a lure to site visitors, and the concept of upselling (selling related services to people once they have joined a site). In myriad ways, large and small, the porn industry has blazed a commercial path that other industries are hastening to follow.”
… (PORN: The Hidden Engine That Drives Innovation In Tech)

Enough time remains before the 2018 mid-terms for you to learn the technology used by ‘deepfakes’ to produce campaign imagery.

Paul Ryan, current Speaker of the House, isn’t going to (voluntarily) participate in a video where he steals food from children or steps on their hands as they grab for bread crusts in the street.

The same techniques that produce fake porn could be used to produce viral videos of those very scenes and more.

Some people, well-intentioned no doubt, will protest that this isn’t informing the electorate or debating the issues. For them I have only one question: Why do you like losing so much?

I would wager one good viral video against 100,000 pages of position papers, unread by anyone other than the tiresome drones who produce them.

If you insist on total authenticity, then take Ryan film clips on why medical care can’t be provided for children and run them split-screen with close-up death rattles of dying children. 100% truthful. See how that plays in your local TV market.

Follow ‘deepfakes’ on Reddit and start experimenting today!

Mathwashing:…

December 11th, 2017

Mathwashing: How Algorithms Can Hide Gender and Racial Biases by Kimberley Mok.

From the post:

Scholars have long pointed out that the way languages are structured and used can say a lot about the worldview of their speakers: what they believe, what they hold sacred, and what their biases are. We know humans have their biases, but in contrast, many of us might have the impression that machines are somehow inherently objective. But does that assumption apply to a new generation of intelligent, algorithmically driven machines that are learning our languages and training from human-generated datasets? By virtue of being designed by humans, and by learning natural human languages, might these artificially intelligent machines also pick up on some of those same human biases too?

It seems that machines can and do indeed assimilate human prejudices, whether they are based on race, gender, age or aesthetics. Experts are now finding more evidence that supports this phenomenon of algorithmic bias. As sets of instructions that help machines to learn, reason, recognize patterns and perform tasks on their own, algorithms increasingly pervade our lives. And in a world where algorithms already underlie many of those big decisions that can change lives forever, researchers are finding that many of these algorithms aren’t as objective as we assume them to be.

If you have ever suffered from the delusion that algorithms, any algorithm, can be “objective,” this post is a must-read. Or re-read it to remind yourself that “objectivity” is a claim used to put one’s position beyond question out of self-interest. Nothing more.

For my part, I’m not sure what’s unclear about the data collected, the algorithms chosen, and the interpretation of results all being products of bias.

There may be acceptable biases, or degrees of bias, but the goal of any measurement is a result, which automatically biases a measurer in favor of phenomena that can be measured by a convenient technique. Phenomena that cannot be easily measured, no matter how important, won’t be included.

By the same token, “bias-correction” is the introduction of an acceptable bias, and/or the limiting of bias to what the person judging its presence sees as an acceptable level.

Bias is omnipresent and while evaluating algorithms is important, always bear in mind you are choosing acceptable bias over unacceptable bias.

Or to mis-quote the Princess Bride: “Bias is everywhere. Anyone who says differently is selling something.”

“Smart” Cock Ring Medical Hazard

December 10th, 2017

World’s first ‘smart condom’ collects intimate data during sex and tells men whether their performance is red-hot or a total flop.

From the post:


The smart condom is a small band which fits around the bottom of a man’s willy, which means wearers will still need to strap on a normal condom to get full protection.

It is waterproof and features a band that’s ‘extraordinarily flexible to ensure maximum comfort for all sizes’.

Bizarrely, it even lights up to provide illumination for both partners’ nether regions.

Or better, a picture:

With a hand so you can judge its size:

It’s either the world’s shortest condom or it’s a cock ring. Calling it a condom doesn’t make it one.

The distinction between a condom vs. cock ring is non-trivial. Improperly used, a cock ring can lead to serious injury.

Refer any friends you are asking for to: Post coital penile ring entrapment: A report of a non-surgical extrication method.

Catalin Cimpanu @campuscodi tweeted this as “Security disaster waiting to happen…,” but competing against others poses a health risk as well.

Incomplete Reporting – How to Verify A Dark Web Discovery?

December 10th, 2017

1.4 Billion Clear Text Credentials Discovered in a Single Database by Julio Casal.

From the post:

Now even unsophisticated and newbie hackers can access the largest trove ever of sensitive credentials in an underground community forum. Is the cyber crime epidemic about to become exponentially worse?

While scanning the deep and dark web for stolen, leaked or lost data, 4iQ discovered a single file with a database of 1.4 billion clear text credentials — the largest aggregate database found in the dark web to date.

None of the passwords are encrypted, and what’s scary is that we’ve tested a subset of these passwords and most of them have been verified to be true.

The breach is almost two times larger than the previous largest credential exposure, the Exploit.in combo list that exposed 797 million records. This dump aggregates 252 previous breaches, including known credential lists such as Anti Public and Exploit.in, decrypted passwords of known breaches like LinkedIn as well as smaller breaches like Bitcoin and Pastebin sites.

This is not just a list. It is an aggregated, interactive database that allows for fast (one second response) searches and new breach imports. Given the fact that people reuse passwords across their email, social media, e-commerce, banking and work accounts, hackers can automate account hijacking or account takeover.

This database makes finding passwords faster and easier than ever before. As an example searching for “admin,” “administrator” and “root” returned 226,631 passwords of admin users in a few seconds.

The data is organized alphabetically, offering examples of trends in how people set passwords, reuse them and create repetitive patterns over time. The breach offers concrete insights into password trends, cementing the need for recommendations, such as the NIST Cybersecurity Framework.
… (emphasis in original)
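The post doesn’t say how 4iQ’s database answers queries in about a second. One plausible ingredient, given that “the data is organized alphabetically,” is that sorted data can be searched by bisection instead of by scanning every line. Here is a minimal Rust sketch under that assumption. The “username:password” layout, the made-up sample records, and the function name `prefix_matches` are all hypothetical, not 4iQ’s:

```rust
/// Return the contiguous run of entries in `sorted` that start with `prefix`.
/// `sorted` must already be in ascending byte-wise order.
fn prefix_matches<'a>(sorted: &'a [&'a str], prefix: &str) -> &'a [&'a str] {
    // Binary search for the first entry that is >= prefix.
    let start = sorted.partition_point(|entry| *entry < prefix);
    // Entries sharing the prefix sit together right after `start`,
    // so a second binary search finds where that run ends.
    let len = sorted[start..].partition_point(|entry| entry.starts_with(prefix));
    &sorted[start..start + len]
}

fn main() {
    // Made-up sample records in a hypothetical "username:password" layout.
    let mut records = vec![
        "root:x1", "alice:x2", "admin:x3", "administrator:x4", "bob:x5", "admin:x6",
    ];
    records.sort(); // alphabetical, as the post says the real dump is

    for hit in prefix_matches(&records, "admin") {
        println!("{}", hit);
    }
}
```

Each lookup costs two binary searches plus the size of the answer, which is why a search like “admin” can return both “admin” and “administrator” accounts quickly. A real system holding 1.4 billion lines would more likely rely on an on-disk index, which this sketch does not attempt.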

The full post goes on to discuss sources of the data, details of the dump file, freshness and password reuse. See Casal’s post for those details.

But no links were provided to the:

“…largest trove ever of sensitive credentials in an underground community forum.”

How would you go about verifying such a discovery?

The post offers the following hints:

  1. “…single file … 1.4 billion clear text credentials”
  2. dump contains file “imported.log”
  3. list shown from “imported.log” has 55 unique file names

With #1, clear text credentials, I should be able to search for #2 “imported.log” and one of fifty-five (55) unique file names to come up with a fairly narrow set of search results. Not perfect but not a lot of manual browsing.

All of the onion search engines below have .onion addresses.

Ahmia: Never got to try one of the file names; “imported.log” returns 0 results.

Caronte: I entered “imported.log,” but Caronte searches for “imported log.” Sigh, I really tire of corrective search interfaces. You? No useful results.

Haystack: 0 results for “imported.log.”

Not Evil: 3,973 “hits” for “imported.log.” With search refinement, still no joy.

Bottom line: No verification of the reported credentials discovery.

Possible explanations:

  • Files have been moved or renamed
  • Forum is password protected
  • Used the wrong Dark Web search engines

Verification is all the rage in mainstream media.

How do you verify reports of content on the Dark Web? Or do you?

Releasing Failed Code to Distract from Accountability

December 10th, 2017

Dutch government publishes large project as Free Software by Carmen Bianca Bakker.

From the post:

The Dutch Ministry of the Interior and Kingdom Relations released the source code and documentation of Basisregistratie Personen (BRP), a 100M€ IT system that registers information about inhabitants within the Netherlands. This comes as a great success for Public Code, and the FSFE applauds the Dutch government’s shift to Free Software.

Operation BRP is an IT project by the Dutch government that has been in the works since 2004. It has cost Dutch taxpayers upwards of 100 million Euros and has endured three failed attempts at revival, without anything to show for it. From the outside, it was unclear what exactly was costing taxpayers so much money with very little information to go on. After the plug had been pulled from the project earlier this year in July, the former interior minister agreed to publish the source code under pressure of Parliament, to offer transparency about the failed project. Secretary of state Knops has now gone beyond that promise and released the source code as Free Software (a.k.a. Open Source Software) to the public.

In 2013, when the first smoke signals showed, the former interior minister initially wanted to address concerns about the project by providing limited parts of the source code to a limited amount of people under certain restrictive conditions. The ministry has since made a complete about-face, releasing a snapshot of the (allegedly) full source code and documentation under the terms of the GNU Affero General Public License, with the development history soon to follow.

As far as the “…complete about-face…” goes, the American expression is: “You’ve been had.”

By appearing to agonize over the release of the source code, the “former interior minister” has made it appear that the public has won a great victory for transparency.

Actually not.

Does the “transparency” offered by the source code show who authorized the expenditure of each part of the 100M€ total and who was paid that 100M€? Does source code “transparency” disclose project management decisions and which government officials approved them? For that matter, does source code “transparency” disclose discussions of project choices at all, and who was present at those discussions?

It’s not hard to see that source code “transparency” is a deliberate failure on the part of the Dutch Ministry of the Interior and Kingdom Relations to be transparent. It has withheld, quite deliberately, any information that would enable Dutch citizens, programmers or otherwise, to form informed opinions about the failure of this project, or to hold anyone accountable for it.

This may be:

…an unprecedented move of transparency by the Dutch government….

but only if the Dutch government is a black hole in terms of meaningful accountability for its software projects.

Which appears to be the case.

PS: Assuming Dutch citizens can pry project documentation out of the secretive Dutch Ministry of the Interior and Kingdom Relations, I know some Dutch topic mappers could assist with establishing transparency. If that’s what you want.

Introducing Data360R — data to the power of R [On Having an Agenda]

December 9th, 2017

Introducing Data360R — data to the power of R

From the post:

Last January 2017, the World Bank launched TCdata360 (tcdata360.worldbank.org/), a new open data platform that features more than 2,000 trade and competitiveness indicators from 40+ data sources inside and outside the World Bank Group. Users of the website can compare countries, download raw data, create and share data visualizations on social media, get country snapshots and thematic reports, read data stories, connect through an application programming interface (API), and more.

The response to the site has been overwhelmingly enthusiastic, and this growing user base continually inspires us to develop better tools to increase data accessibility and usability. After all, open data isn’t useful unless it’s accessed and used for actionable insights.

One such tool we recently developed is data360r, an R package that allows users to interact with the TCdata360 API and query TCdata360 data, metadata, and more using easy, single-line functions.

So long as you remember the World Bank has an agenda and all the data it releases serves that agenda, you should suffer no permanent harm.

Don’t take that as meaning other sources of data have less of an agenda, although you may find their agendas differ from that of the World Bank.

The recent “discovery” that machine learning algorithms can conceal social or racist bias was long overdue.

Anyone who took coursework in social science survey methodology in the last half of the 20th century would report that data collection itself, not to mention its processing, is fraught with unavoidable bias.

It is certainly possible, in the physical sense, to give students standardized tests, but what test results mean for any given question, such as teacher competence, is far from clear.

Or to put it differently, just because something can be measured is no guarantee the measurement is meaningful. The same applies to the data that results from any measurement process.

Take advantage of data360r certainly, but keep a wary eye on data from any source.

Clojure 1.9 Hits the Streets!

December 9th, 2017

Clojure 1.9 by Alex Miller.

From the post:

Clojure 1.9 is now available!

Clojure 1.9 introduces two major new features: integration with spec and command line tools.

spec (rationale, guide) is a library for describing the structure of data and functions with support for:

  • Validation
  • Error reporting
  • Destructuring
  • Instrumentation
  • Test-data generation
  • Generative test generation
  • Documentation

Clojure integrates spec via two new libraries (still in alpha):

This modularization facilitates refinement of spec separate from the Clojure release cycle.

The command line tools (getting started, guide, reference) provide:

  • Quick and easy install
  • Clojure REPL and runner
  • Use of Maven and local dependencies
  • A functional API for classpath management (tools.deps.alpha)

The installer is available for Mac developers in brew, for Linux users in a script, and for more platforms in the future.

Being interested in documentation, I followed the link to spec rationale and found:


Map specs should be of keysets only

Most systems for specifying structures conflate the specification of the key set (e.g. of keys in a map, fields in an object) with the specification of the values designated by those keys. I.e. in such approaches the schema for a map might say :a-key’s type is x-type and :b-key’s type is y-type. This is a major source of rigidity and redundancy.

In Clojure we gain power by dynamically composing, merging and building up maps. We routinely deal with optional and partial data, data produced by unreliable external sources, dynamic queries etc. These maps represent various sets, subsets, intersections and unions of the same keys, and in general ought to have the same semantic for the same key wherever it is used. Defining specifications of every subset/union/intersection, and then redundantly stating the semantic of each key is both an antipattern and unworkable in the most dynamic cases.

Decomplect maps/keys/values

Keep map (keyset) specs separate from attribute (key→value) specs. Encourage and support attribute-granularity specs of namespaced keyword to value-spec. Combining keys into sets (to specify maps) becomes orthogonal, and checking becomes possible in the fully-dynamic case, i.e. even when no map spec is present, attributes (key-values) can be checked.

Sets (maps) are about membership, that’s it

As per above, maps defining the details of the values at their keys is a fundamental complecting of concerns that will not be supported. Map specs detail required/optional keys (i.e. set membership things) and keyword/attr/value semantics are independent. Map checking is two-phase, required key presence then key/value conformance. The latter can be done even when the (namespace-qualified) keys present at runtime are not in the map spec. This is vital for composition and dynamicity.
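The two-phase check described in the rationale is easy to sketch outside Clojure. The following is not clojure.spec’s API. It is a minimal Rust sketch that assumes string keys and string values, with hypothetical names (`KeysSpec`, `check`, the `person/...` keys). The map spec lists only required keys, attribute specs live in a separate registry, and phase two checks every key that has an attribute spec, even keys the map spec never mentions:

```rust
use std::collections::HashMap;

/// A value predicate for one attribute (key), independent of any map.
type AttrSpec = fn(&str) -> bool;

/// A map spec says only which keys must be present: set membership, nothing more.
struct KeysSpec {
    required: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
enum Problem {
    MissingKey(String),
    BadValue(String),
}

/// Two-phase check: required-key presence first, then key/value conformance.
fn check(
    registry: &HashMap<&'static str, AttrSpec>,
    spec: &KeysSpec,
    data: &HashMap<String, String>,
) -> Vec<Problem> {
    let mut problems = Vec::new();

    // Phase 1: are the required keys present?
    for key in &spec.required {
        if !data.contains_key(*key) {
            problems.push(Problem::MissingKey(key.to_string()));
        }
    }

    // Phase 2: check every key that has an attribute spec, whether or not
    // the map spec mentions it. Keys without a spec pass through unchecked.
    let mut keys: Vec<&String> = data.keys().collect();
    keys.sort(); // deterministic report order
    for key in keys {
        if let Some(pred) = registry.get(key.as_str()) {
            if !pred(data[key].as_str()) {
                problems.push(Problem::BadValue(key.clone()));
            }
        }
    }
    problems
}

fn is_nonempty(v: &str) -> bool {
    !v.is_empty()
}

fn is_positive_int(v: &str) -> bool {
    v.parse::<u64>().map_or(false, |n| n > 0)
}

fn main() {
    // Attribute specs are registered once, per key, not per map.
    let mut registry: HashMap<&'static str, AttrSpec> = HashMap::new();
    registry.insert("person/name", is_nonempty);
    registry.insert("person/age", is_positive_int);

    let spec = KeysSpec { required: vec!["person/name"] };

    let mut data = HashMap::new();
    data.insert("person/name".to_string(), "Ada".to_string());
    // Not named in the map spec, but still checked because a spec exists for it.
    data.insert("person/age".to_string(), "-3".to_string());

    println!("{:?}", check(&registry, &spec, &data));
}
```

Keeping attribute specs in one registry gives `person/age` the same meaning in every map that carries it, which is the rationale’s “same semantic for the same key wherever it is used.”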

The idea of checking keys separately from their values strikes me as valuable for processing topic maps.

Keys not allowed in a topic or proxy could signal an error, as in authoring; could be silently discarded, depending upon your processing goals; or could be retained but neither considered nor processed for merging purposes.
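Those three options amount to a policy choice made at processing time. A minimal Rust sketch, assuming a topic is a flat map of string keys to string values and that the set of allowed keys is known in advance (both simplifications; `UnknownKeyPolicy`, `apply_policy` and the key names are hypothetical):

```rust
use std::collections::{BTreeMap, HashSet};

/// What to do with a key the topic (or proxy) is not allowed to carry.
#[derive(Clone, Copy, Debug)]
enum UnknownKeyPolicy {
    Reject,      // signal an error, as when authoring
    Discard,     // silently drop the key
    KeepIgnored, // keep it, but leave it out of merging
}

/// A topic split into keys used for merging and keys merely carried along.
struct Checked {
    merge_keys: BTreeMap<String, String>,
    carried: BTreeMap<String, String>,
}

fn apply_policy(
    allowed: &HashSet<&str>,
    topic: BTreeMap<String, String>,
    policy: UnknownKeyPolicy,
) -> Result<Checked, String> {
    let mut out = Checked { merge_keys: BTreeMap::new(), carried: BTreeMap::new() };
    for (key, value) in topic {
        if allowed.contains(key.as_str()) {
            out.merge_keys.insert(key, value);
            continue;
        }
        match policy {
            UnknownKeyPolicy::Reject => return Err(format!("key not allowed: {}", key)),
            UnknownKeyPolicy::Discard => {}
            UnknownKeyPolicy::KeepIgnored => {
                out.carried.insert(key, value);
            }
        }
    }
    Ok(out)
}

fn main() {
    let allowed: HashSet<&str> = ["name", "subject-identifier"].iter().cloned().collect();

    let mut topic = BTreeMap::new();
    topic.insert("name".to_string(), "Clojure".to_string());
    topic.insert("color".to_string(), "blue".to_string()); // not an allowed key

    let policies = [
        UnknownKeyPolicy::Reject,
        UnknownKeyPolicy::Discard,
        UnknownKeyPolicy::KeepIgnored,
    ];
    for policy in policies.iter().copied() {
        match apply_policy(&allowed, topic.clone(), policy) {
            Err(e) => println!("{:?}: error: {}", policy, e),
            Ok(c) => println!("{:?}: merge on {:?}, carried {:?}", policy, c.merge_keys, c.carried),
        }
    }
}
```

Only `merge_keys` would feed a merging step. Under `KeepIgnored`, the extra keys travel with the topic without affecting its identity.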

Thoughts?