Archive for February, 2015

Algorithmia

Saturday, February 28th, 2015

Algorithmia

Algorithmia was born in 2013 with the goal of advancing the art of algorithm development, discovery and use. As developers ourselves we believe that given the right tools the possibilities for innovation and discovery are limitless.

Today we build what we believe to be the next era of programming: a collaborative, always live and community driven approach to making the machines that we interact with better.

The community drives the Algorithmia API. One API that exposes the collective knowledge of algorithm developers across the globe.

Currently in private beta, but it sounds very promising!

I first saw Algorithmia mentioned in Algorithmia API Exposes Collective Knowledge of Developers by Martin W. Brennan.

MI5 accused of covering up sexual abuse at boys’ home

Saturday, February 28th, 2015

MI5 accused of covering up sexual abuse at boys’ home by Vikram Dodd and Richard Norton-Taylor.

From the post:

MI5 is facing allegations it was complicit in the sexual abuse of children, the high court in Northern Ireland will hear on Tuesday.

Victims of the abuse are taking legal action to force a full independent inquiry with the power to compel witnesses to testify and the security service to hand over documents.

The case, in Belfast, is the first in court over the alleged cover-up of British state involvement at the Kincora children’s home in Northern Ireland in the 1970s. It is also the first of the recent sex abuse cases allegedly tying in the British state directly. Victims allege that the cover-up over Kincora has lasted decades.

The victims want the claims of state collusion investigated by an inquiry with full powers, such as the one set up into other sex abuse scandals chaired by the New Zealand judge Lowell Goddard.

Amnesty International branded Kincora “one of the biggest scandals of our age” and backed the victims’ calls for an inquiry with full powers: “There are longstanding claims that MI5 blocked one or more police investigations into Kincora in the 1970s in order to protect its own intelligence-gathering operation, a terrible indictment which raises the spectre of countless vulnerable boys having faced further years of brutal abuse.

It’s too early to claim victory, but see Belfast boys’ home abuse victims win legal bid by Henry McDonald:

Residents of a notorious Northern Ireland boys’ home are to be allowed to challenge a decision to exclude it from the UK-wide inquiry into establishment paedophile rings.

A high court judge in Belfast on Tuesday granted a number of former inmates from the Kincora home a judicial review into the decision to keep this scandal out of the investigation, headed by judge Lowell Goddard from New Zealand.

The Kincora boys’ home has been linked to a paedophile ring, some of whose members were allegedly being blackmailed by MI5 and other branches of the security forces during the Troubles.

Until now, the home secretary, Theresa May, has resisted demands from men who were abused at the home – and Amnesty International – that the inquiry be widened to include Kincora.

The campaigners want to establish whether the security services turned a blind eye to the abuse and instead used it to compromise a number of extreme Ulster loyalists guilty of abusing boys at the home.

If you read carefully, you will see the abuse victims have won only the right to challenge the exclusion of the boys’ home from a UK-wide investigation. That is a long way from forcing MI5 and other collaborators in the sexual abuse of children to provide an accounting in the clear light of day.

Leaked documents, caches of spy cables and other spy files consistently show agents of government protecting war criminals and paedophiles, and engaging in torture, rape and other dishonorable conduct.

My question is: why does the mainstream media honor the fiction that government secrets are meant to protect the public? Government secrets are used to protect the guilty, the dishonorable and the despicable. What’s unclear about that?

ClojureScript Tutorial

Friday, February 27th, 2015

ClojureScript Tutorial by Andrey Antukh.

From the webpage:

This tutorial provides an introduction to clojurescript, from a very basic setup to a more complex application, in small incremental steps.

It includes:

  • Setup initial clojure app layout.
  • Setup initial clojurescript app layout.
  • First contact with the clojurescript language.
  • Working with dom events.
  • Working with routing in the browser.
  • Working with ajax requests.
  • First contact with core.async.
  • Working with events and ajax using core.async.
  • First contact with om/reactjs.
  • Working with om and time traveling.
  • Working with om and persistent state.
  • Little bonus: browser enabled repl.

I mention this because it will be helpful background for:

From the description:

Facebook’s React uses a virtual DOM diff implementation for high performance. It updates the view only when it’s needed. But David Nolen’s Om library (ClojureScript wrapper over React) goes even further. It stores application state in one place and passes “branches” of that state to a number of components. Data is immutable, and components are reusable. No more juggling around with JavaScript object literals. If anyone likes data as much as I do they will enjoy working with Om. It’s a great tool for building user interfaces around your data.

This talk will show how to combine core.async, liberator and om with JavaScript visualisation libraries to create interactive charts and maps. I will walk everyone through how to:

  • Generate a resource that can be used in a route and use it to pull data for visualisations
  • Use om to create reusable components out of JavaScript libraries: dimple.js and leaflet.js
  • Create core.async channels and use them communicate user clicks and selections between those components, e.g. selection on a checkbox component triggers reloading of data on a chart component.

The talk will be a practical guide to building small web applications and will be accessible to Clojurians with a basic knowledge of Clojure and HTML.
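If core.async channels are new to you, here is a rough analogy in Python (asyncio queues standing in for channels; illustrative only, not from the talk):

```python
import asyncio

# Rough analogy only: a core.async channel behaves much like an
# asyncio.Queue. One component puts UI events on the channel; another
# takes them off and reacts, with no direct coupling between the two.

async def checkbox_component(events: asyncio.Queue) -> None:
    # Simulate a user toggling a selection on a checkbox component.
    await events.put({"source": "checkbox", "selected": "2015"})

async def chart_component(events: asyncio.Queue) -> str:
    # The chart component waits on the channel, then reloads its data.
    event = await events.get()
    return f"reloading chart for {event['selected']}"

async def main() -> str:
    events: asyncio.Queue = asyncio.Queue()
    await checkbox_component(events)
    return await chart_component(events)

result = asyncio.run(main())
print(result)
```

The decoupling is the point: the checkbox never calls the chart directly, it only puts an event on the channel.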

Enjoy!

Onion.city – a search engine bringing the Dark Web into the light

Friday, February 27th, 2015

Onion.city – a search engine bringing the Dark Web into the light by Mark Stockley.

From the post:

The Dark Web is reflecting a little more light these days.

On Monday I wrote about Memex, DARPA’s Deep Web search engine. Memex is a sophisticated tool set that has been in the hands of a few select law enforcement agencies for a year now, but it isn’t available to regular users like you and me.

There is another search engine that is though.

Just a few days before I wrote that article, on 11 February, user Virgil Griffith went onto the Tor-talk mailing list and announced Onion City, a Dark Web search engine for the rest of us.

The search engine delves into the anonymous Tor network, finds .onion sites and makes them available to regular users on the ordinary World Wide Web.

[Image: Onion City search engine]

Search and Access to Onion sites for Amusement ONLY! All of your activities are transparent to anyone capturing your web traffic.

If you need security and privacy, use a Tor client.

With that understanding: Onion City awaits your requests.

Is there demand for a search engine internal to the Tor network, supported by advertising internal to Tor? Or is most Tor “marketing” by referral?

300 Data journalism blogs [1 Feedly OPML File]

Friday, February 27th, 2015

Data journalism blogs by Winny De Jong.

From the post:

At the News Impact Summit in Brussels I presented my workflow for getting ideas. Elsewhere on the blog is a recap, including interesting links. The RSS reader Feedly is a big part of my setup: together with Pocket it’s my most used app. Both are true lifesavers when reading is your default.

Since a lot of people in the News Summit audience use Feedly as well, I made this page to share my Feedly OPML file. If you’re not sure what an OPML file is, read this page at Feedly.com.

Download my Feedly OPML export containing 300+ data journalism related sites here
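If OPML is unfamiliar: it is just XML, and every subscribed feed is an outline element with an xmlUrl attribute. A minimal sketch of pulling the feed URLs out of such an export (the sample data is mine, not Winny’s):

```python
import xml.etree.ElementTree as ET

# Minimal sketch: list every feed URL in a Feedly OPML export.
# Assumes the usual OPML shape -- <outline> elements carrying an
# xmlUrl attribute, nested under category outlines.
OPML_SAMPLE = """<?xml version="1.0"?>
<opml version="1.0">
  <body>
    <outline text="Data journalism">
      <outline text="FlowingData" type="rss"
               xmlUrl="https://flowingdata.com/feed" />
    </outline>
  </body>
</opml>"""

def feed_urls(opml_text: str) -> list:
    root = ET.fromstring(opml_text)
    return [node.attrib["xmlUrl"]
            for node in root.iter("outline")
            if "xmlUrl" in node.attrib]

print(feed_urls(OPML_SAMPLE))
```

Run against the full export, the same function hands you all 300+ feeds at once.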

Now that is a great way to start the weekend!

With a file of three hundred (300) data blogs!

Enjoy!

Comparing supervised learning algorithms

Friday, February 27th, 2015

Comparing supervised learning algorithms by Kevin Markham.

From the post:

In the data science course that I instruct, we cover most of the data science pipeline but focus especially on machine learning. Besides teaching model evaluation procedures and metrics, we obviously teach the algorithms themselves, primarily for supervised learning.

Near the end of this 11-week course, we spend a few hours reviewing the material that has been covered throughout the course, with the hope that students will start to construct mental connections between all of the different things they have learned. One of the skills that I want students to be able to take away from this course is the ability to intelligently choose between supervised learning algorithms when working a machine learning problem. Although there is some value in the “brute force” approach (try everything and see what works best), there is a lot more value in being able to understand the trade-offs you’re making when choosing one algorithm over another.

I decided to create a game for the students, in which I gave them a blank table listing the supervised learning algorithms we covered and asked them to compare the algorithms across a dozen different dimensions. I couldn’t find a table like this on the Internet, so I decided to construct one myself! Here’s what I came up with:

Eight (8) algorithms compared across a dozen (12) dimensions.
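The table is about conceptual trade-offs, but the “brute force” approach Kevin mentions is easy to sketch. Here is one in Python with scikit-learn (my choice of library and algorithms, purely illustrative):

```python
# A hedged sketch of the "brute force" baseline: fit several
# classifiers on the same data and compare cross-validated accuracy.
# scikit-learn and the algorithm list are my assumptions here; the
# post itself is tool-agnostic.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Which is exactly why the table matters: the loop tells you what worked on this dataset, not why, or what trade-offs you accepted.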

What algorithms would you add? What dimensions would you add or take away?

Looks like the start of a very useful community resource.

Po’ Boy MapReduce

Friday, February 27th, 2015

[Image: Po’ Boy MapReduce cartoon]

Posted by Mirko Krivanek as What Is MapReduce?, credit @Tgrall

Have You Tried DRAKON Comrade? (Russian Space Program Specification Language)

Friday, February 27th, 2015

DRAKON

From the webpage:

DRAKON is a visual language for specifications from the Russian space program. DRAKON is used for capturing requirements and building software that controls spacecraft.

The rules of DRAKON are optimized to ensure easy understanding by human beings.

DRAKON is gaining popularity in other areas beyond software, such as medical textbooks. The purpose of DRAKON is to represent any knowledge that explains how to accomplish a goal.


DRAKON Editor is a free tool for authoring DRAKON flowcharts. It also supports sequence diagrams, entity-relationship and class diagrams.

With DRAKON Editor, you can quickly draw diagrams for:

  • software requirements and specifications;
  • documenting existing software systems;
  • business processes;
  • procedures and rules;
  • any other information that tells “how to do something”.

DRAKON Editor runs on Windows, Mac and Linux.

The user interface of DRAKON Editor is extremely simple and straightforward.

Software developers can build real programs with DRAKON Editor. Source code can be generated in several programming languages, including Java, Processing.org, D, C#, C/C++ (with Qt support), Python, Tcl, Javascript, Lua, Erlang, AutoHotkey and Verilog.

I note with amusement that the DRAKON editor has no “save” button. Rest easy! It saves all input automatically, removing the need for one. About time!

Download DRAKON editor.

I am in the middle of an upgrade so look for sample images next week.

Banning p < .05 In Psychology [Null Hypothesis Significance Testing Procedure (NHSTP)]

Friday, February 27th, 2015

The recent banning of the Null Hypothesis Significance Testing Procedure (NHSTP) in psychology should be a warning to would-be data scientists that even “well established” statistical procedures may be deeply flawed.

Sorry, you may not have seen the news. In Basic and Applied Social Psychology (BASP), Banning Null Hypothesis Significance Testing Procedure (NHSTP) (2015), David Trafimow and Michael Marks write:

The Basic and Applied Social Psychology (BASP) 2014 Editorial emphasized that the null hypothesis significance testing procedure (NHSTP) is invalid, and thus authors would be not required to perform it (Trafimow, 2014). However, to allow authors a grace period, the Editorial stopped short of actually banning the NHSTP. The purpose of the present Editorial is to announce that the grace period is over. From now on, BASP is banning the NHSTP.

You may be more familiar with seeing p < .05 rather than Null Hypothesis Significance Testing Procedure (NHSTP).

In the 2014 editorial warning about NHSTP, David Trafimow cites his earlier work, Hypothesis Testing and Theory Evaluation at the Boundaries: Surprising Insights From Bayes’s Theorem (2003), as justifying non-use and the later ban of NHSTP.

His argument is summarized in the introduction:

Despite a variety of different criticisms, the standard null-hypothesis significance-testing procedure (NHSTP) has dominated psychology over the latter half of the past century. Although NHSTP has its defenders when used “properly” (e.g., Abelson, 1997; Chow, 1998; Hagen, 1997; Mulaik, Raju, & Harshman, 1997), it has also been subjected to virulent attacks (Bakan, 1966; Cohen, 1994; Rozeboom, 1960; Schmidt, 1996). For example, Schmidt and Hunter (1997) argue that NHSTP is “logically indefensible and retards the research enterprise by making it difficult to develop cumulative knowledge” (p. 38). According to Rozeboom (1997), “Null-hypothesis significance testing is surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students” (p. 336). The most important reason for these criticisms is that although one can calculate the probability of obtaining a finding given that the null hypothesis is true, this is not equivalent to calculating the probability that the null hypothesis is true given that one has obtained a finding. Thus, researchers are in the position of rejecting the null hypothesis even though they do not know its probability of being true (Cohen, 1994). One way around this problem is to use Bayes’s theorem to calculate the probability of the null hypothesis given that one has obtained a finding, but using Bayes’s theorem carries with it some problems of its own, including a lack of information necessary to make full use of the theorem. Nevertheless, by treating the unknown values as variables, it is possible to conduct some analyses that produce some interesting conclusions regarding NHSTP. These analyses clarify the relations between NHSTP and Bayesian theory and quantify exactly why the standard practice of rejecting the null hypothesis is, at times, a highly questionable procedure.
In addition, some surprising findings come out of the analyses that bear on issues pertaining not only to hypothesis testing but also to the amount of information gained from findings and theory evaluation. It is possible that the implications of the following analyses for information gain and theory evaluation are as important as the NHSTP debate.

The most important lines for someone who was trained with the null hypothesis as an undergraduate many years ago:

The most important reason for these criticisms is that although one can calculate the probability of obtaining a finding given that the null hypothesis is true, this is not equivalent to calculating the probability that the null hypothesis is true given that one has obtained a finding. Thus, researchers are in the position of rejecting the null hypothesis even though they do not know its probability of being true (Cohen, 1994).

If you don’t know the probability of the null hypothesis, any conclusion you draw is on very shaky ground.
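A few illustrative numbers (mine, not Trafimow’s) make the point concrete. The posterior probability of the null hypothesis depends on a prior that NHSTP never supplies:

```python
# Illustrative numbers only: the same "significant" P(finding | H0)
# yields very different values of P(H0 | finding) depending on the
# prior probability of the null hypothesis.

def posterior_h0(p_finding_given_h0: float,
                 p_finding_given_h1: float,
                 prior_h0: float) -> float:
    """P(H0 | finding) via Bayes's theorem."""
    evidence = (p_finding_given_h0 * prior_h0
                + p_finding_given_h1 * (1.0 - prior_h0))
    return p_finding_given_h0 * prior_h0 / evidence

# A p-value of .04 "rejects" H0 under NHSTP, yet the posterior ranges
# widely as the prior changes:
for prior in (0.1, 0.5, 0.9):
    post = posterior_h0(0.04, 0.6, prior)
    print(f"prior P(H0) = {prior}: posterior P(H0) = {post:.3f}")
```

With a prior of 0.9, the null hypothesis stays quite plausible (posterior around 0.375) despite the “significant” finding, which is exactly Cohen’s complaint.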

Do you think any of the big data “shake-n-bake” mining/processing services are going to call that problem to your attention? True enough, such services may “empower” users but if “empowerment” means producing meaningless results, no thanks.

Trafimow cites Jacob Cohen’s The Earth is Round (p < .05) (1994) in his 2003 work. Cohen is angry and in full voice as only a senior scholar can afford to be.

Take the time to read both Trafimow and Cohen. Many errors are lurking outside your door, and reading them will help you recognize this one.

Making Master Data Management Fun with Neo4j – Part 1, 2

Friday, February 27th, 2015

Making Master Data Management Fun with Neo4j – Part 1 by Brian Underwood.

From Part 1:

Joining multiple disparate data-sources, commonly dubbed Master Data Management (MDM), is usually not a fun exercise. I would like to show you how to use a graph database (Neo4j) and an interesting dataset (developer-oriented collaboration sites) to make MDM an enjoyable experience. This approach will allow you to quickly and sensibly merge data from different sources into a consistent picture and query across the data efficiently to answer your most pressing questions.

To start I’ll just be importing one data source: StackOverflow questions tagged with neo4j and their answers. In future blog posts I will discuss how to integrate other data sources into a single graph database to provide a richer view of the world of Neo4j developers’ online social interactions.

I’ve created a GraphGist to explore questions about the imported data, but in this post I’d like to briefly discuss the process of getting data from StackOverflow into Neo4j.

Part 1 imports data from Stack Overflow into Neo4j.
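The core MDM trick, as I read it, is merging on a stable key so the same entity from different sources collapses into one node. A hypothetical sketch (the Cypher, labels and properties are my illustration, not Brian’s actual model):

```python
# Hypothetical sketch: MERGE on a stable key (here, a login) so that a
# user seen in Stack Overflow data and again in GitHub data ends up as
# a single node, with a flag per source. Labels and properties are
# illustrative, not Brian Underwood's actual model.

def merge_user_cypher(source: str) -> str:
    # Parameterized Cypher; $login is supplied at execution time.
    return (f"MERGE (u:User {{login: $login}}) "
            f"SET u.{source}_seen = true")

for source in ("stackoverflow", "github"):
    print(merge_user_cypher(source))
# Run against Neo4j with the same $login, both statements touch the
# same :User node -- that is the merge at the heart of MDM.
```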

Making Master Data Management Fun with Neo4j – Part 2 imports GitHub data:

All together I was able to import:

  • 6,337 repositories
  • 6,232 users
  • 11,011 issues
  • 474 commits
  • 22,676 comments

In my next post I’ll show the process of how I linked the original StackOverflow data with the new GitHub data. Stay tuned for that, but in the meantime I’d also like to share the more technical details of what I did for those who are interested.

Definitely looking forward to seeing the reconciliation of data between StackOverflow and GitHub.

Data journalism: How to find stories in numbers

Friday, February 27th, 2015

Data journalism: How to find stories in numbers by Sandra Crucianelli.

From the post:

Colleagues often ask me what data journalism is. They’re confused by why it needs its own name — don’t all journalists use data?

The term is shorthand for ‘database journalism’ or ‘data-driven journalism’, where journalists find stories, or angles for stories, within large volumes of data.

It overlaps with investigative journalism in requiring lots of research, sometimes against people’s wishes. It can also overlap with data visualisation, as it requires close collaboration between journalists and digital specialists to find the best ways of presenting data.

So why get involved with spreadsheets and visualisation tools? At its most basic, adding data can give a story a new, factual dimension. But delving into datasets can also reveal new stories, or new aspects to them, that may not have otherwise surfaced.

Data journalism can also sometimes tell complicated stories more easily or clearly than relying on words alone — so it’s particularly useful for science journalists.

It can seem daunting if you’re trained in print or broadcast media. But I’ll introduce you to some new skills, and show you some excellent digital tools, so you too can soon find your feet as a data journalist.

Sandra gives as good an introduction to data journalism as you are likely to find. Her post covers everything from finding story ideas, researching relevant data, data processing and of course, presenting your findings in a persuasive way.

A must read for starting journalists but also for anyone needing an introduction to looking at data that supports a story (or not).

Gregor Aisch – Information Visualization, Data Journalism and Interactive Graphics

Thursday, February 26th, 2015

Gregor has two sites that I wanted to bring to your attention on information visualization, data journalism and interactive graphics.

The first one, driven-by-data.net, collects graphics from New York Times stories created by Gregor and others. Impressive graphics. If you are looking for visualization ideas, it is not a bad place to stop.

The second one, vis4.net, is a blog that features Gregor’s work. But it is more than a blog; the navigation links at the top of the page are:

Color – Posts on color.

Code – Posts focused on code.

Cartography – Posts on cartography.

Advice – Advice (not for the lovelorn).

Archive – Archive of his posts.

Rather than a long list of categories (ahem), Gregor has divided his material into easy to recognize and use divisions.

Always nice when you see a professional at work!

Enjoy!

Data Visualization with JavaScript

Thursday, February 26th, 2015

Data Visualization with JavaScript by Stephen A. Thomas.

From the webpage:

It’s getting hard to ignore the importance of data in our lives. Data is critical to the largest social organizations in human history. It can affect even the least consequential of our everyday decisions. And its collection has widespread geopolitical implications. Yet it also seems to be getting easier to ignore the data itself. One estimate suggests that 99.5% of the data our systems collect goes to waste. No one ever analyzes it effectively.

Data visualization is a tool that addresses this gap.

Effective visualizations clarify; they transform collections of abstract artifacts (otherwise known as numbers) into shapes and forms that viewers quickly grasp and understand. The best visualizations, in fact, impart this understanding intuitively. Viewers comprehend the data immediately—without thinking. Such presentations free the viewer to more fully consider the implications of the data: the stories it tells, the insights it reveals, or even the warnings it offers. That, of course, defines the best kind of communication.

If you’re developing web sites or web applications today, there’s a good chance you have data to communicate, and that data may be begging for a good visualization. But how do you know what kind of visualization is appropriate? And, even more importantly, how do you actually create one? Answers to those very questions are the core of this book. In the chapters that follow, we explore dozens of different visualizations, techniques, and tool kits. Each example discusses the appropriateness of the visualization (and suggests possible alternatives) and provides step-by-step instructions for including the visualization in your own web pages.

With a publication date of March 2015, it’s hard to get more current information on data visualization and JavaScript!

You can view the text online or buy a proper ebook/hard copy.

Enjoy!

Structure and Interpretation of Computer Programs (LFE Edition)

Thursday, February 26th, 2015

Structure and Interpretation of Computer Programs (LFE Edition)

From the webpage:

This Gitbook (available here) is a work in progress, converting the MIT classic Structure and Interpretation of Computer Programs to Lisp Flavored Erlang. We are forever indebted to Harold Abelson, Gerald Jay Sussman, and Julie Sussman for their labor of love and intelligence. Needless to say, our gratitude also extends to the MIT press for their generosity in licensing this work as Creative Commons.

Contributing

This is a huge project, and we can use your help! Got an idea? Found a bug? Let us know!

Writing, or re-writing if you are transposing a CS classic into another language, is far harder than most people imagine. It is probably even more difficult than writing the original, because your range of creativity is bound by the organization and themes of the underlying text.

I may have some cycles to donate to proofreading. Anyone else?

Making A Mouse Seem Like A Dragon

Thursday, February 26th, 2015

Ishaan Tharoor writes of a new edition of ‘Mein Kampf’ in What George Orwell said about Hitler’s ‘Mein Kampf’, saying in part:

But, in my view, the most poignant section of Orwell’s article dwells less on the underpinnings of Nazism and more on Hitler’s dictatorial style. Orwell gazes at the portrait of Hitler published in the edition he’s reviewing:

It is a pathetic, dog-like face, the face of a man suffering under intolerable wrongs. In a rather more manly way it reproduces the expression of innumerable pictures of Christ crucified, and there is little doubt that that is how Hitler sees himself. The initial, personal cause of his grievance against the universe can only be guessed at; but at any rate the grievance is here. He is the martyr, the victim, Prometheus chained to the rock, the self-sacrificing hero who fights single-handed against impossible odds. If he were killing a mouse he would know how to make it seem like a dragon. One feels, as with Napoleon, that he is fighting against destiny, that he can’t win, and yet that he somehow deserves to.

The line:

If he were killing a mouse he would know how to make it seem like a dragon.

is particularly appropriate in a time of defense budgets at all-time highs, restrictions on travel and social media, “homeland” a/k/a “fatherland” security, torture as an instrument of democratic governments, etc.

Where is this dragon that threatens us so? Multiple smallish bands of people with no country, no national industrial base, no navy, no air force, no armored divisions, no ICBMs, no nuclear weapons, no CBW, who are most skilled with knives and light arms.

How many terrorists? In How Many Terrorists Are There: Not As Many As You Might Think, Becky Akers does the math based on the helpful report from the U.S. Department of State, Country Reports on Terrorism.

Before I give you Becky’s total, which errs on the generous side of rounding up, know that the Department of Homeland Security already has them outnumbered.

Try 184,000.

Yep, just 184,000. Even big, bad “Al-Qa’ida (AQ)” and its three affiliates (“Al-Qa’ida in the Arabian Peninsula”; “Al-Qa’ida in Iraq”; and “Al-Qa’ida in the Islamic Maghreb”) boast only 4000 bad guys combined. (The main Al-Qa’ida’s “strength” is “impossible to estimate,” but the Reports admits that its “core has been seriously degraded” following “the death or arrest of dozens of mid- and senior-level AQ operatives.” “Dozens,” not “hundreds.” Hmmm.)

And remember, 184,000 is a ridiculously inflated figure – both because of our generous accounting and also because governments often expand a word’s meaning well beyond the dictionary’s. You may recall the Feds’ contending with straight faces in 2004 that if “a little old lady in Switzerland gave money to a charity for an Afghan orphanage, and the money was passed to al Qaeda,” she met the definition of “enemy combatant.” Five years later, a federal Fusion Center decreed that “if you’re an anti-abortion activist, or if you display political paraphernalia supporting a third-party candidate or [Ron Paul], if you possess subversive literature, you very well might be a member of a domestic paramilitary group.” No telling how many confused Swiss grandmothers and readers of Techdirt’s subversive articles cluster among those 184,000.

That number grows even more absurd when we compare it with the aforementioned Homeland Security’s 240,000 Warriors on Terror. Meanwhile, something like 780,000 cops stalk us nationwide, whose duties also encompass tilting at terrorism’s windmill. And that’s to say nothing of the scores of other bureaucracies at the national, state, and local levels hunting these same 184,000 guerrillas as well as an additional 1,368,137 troops from the armed forces [click on “Rank/Grade – current month”].

Even if you round the absurd number of terrorists up to 200,000 and round our total down to 2,000,000, at present the United States alone has the terrorists outnumbered 10 to 1. Now add in Europe, China, India, etc. and you get the idea that terrorists really are the mice of the world.

Personally I’m glad they are re-printing ‘Mein Kampf.’

Good opportunity to be reminded that leaders who are making dragons out of the mice of terrorism aren’t planning on sacrificing themselves; they are going to sacrifice us, each and every one.

Category Theory – Reading List

Thursday, February 26th, 2015

Category Theory – Reading List by Peter Smith.

Notes along with pointers to other materials.

About Peter Smith:

These pages are by me, Peter Smith. I retired in 2011 from the Faculty of Philosophy at Cambridge. It was my greatest good fortune to have secure, decently paid, university posts for forty years in leisurely times, with a very great deal of freedom to follow my interests wherever they led. Like many of my generation, I am sure I didn’t at the time really appreciate just how lucky I and my contemporaries were. Some of the more student-orientated areas of this site, then, such as the Teach Yourself Logic Guide, constitute my small but heartfelt effort to give something back by way of thanks.

There is much to explore at Peter’s site besides his notes on category theory.

The Spy Cables [Videos]

Thursday, February 26th, 2015

The Spy Cables by AJ+.

As of today, the following nine (9) videos on the Spy Cables are on YouTube:

If you ever have an unkind word for some governments, watch Are You A Terrorist? first.

The Aussies have a checklist for potential terrorists that they have shared with other spy agencies that includes “denouncing Western countries and governments, particularly the United States and Israel.” I’ll concede that the United States is a Western country/government but putting Israel in that category makes me doubt the state of education in Australia.

Oh, and purchasing explosives will get you on a terrorist checklist, which AJ+ concedes, but I don’t think they realize that gasoline and alcohol are both highly flammable materials. A large amount of both is sold every day in the United States.

Hopefully more Spy Cable videos are on the way and will be posted to this YouTube channel.

Enjoy!

PS: What’s really ironic is that for all the huffing and puffing about secrecy, when secrets do come out, you find that government agencies and leaders are as petty and spiteful as anyone you read about in the Hollywood tabloids. Is that what they are trying to hide in the name of “national security”?

How Rdio Onboards New Users

Thursday, February 26th, 2015

How Rdio Onboards New Users

User Onboarding does a teardown of Rdio, a highly successful music streaming site.

Highly successful does not equal perfect onboarding!

Interesting exercise to duplicate with your web/application interface.

CVE Details

Thursday, February 26th, 2015

CVE Details: The Ultimate Security Vulnerability Datasource

From the webpage:

www.cvedetails.com provides an easy to use web interface to CVE vulnerability data. You can browse for vendors, products and versions and view CVE entries and vulnerabilities related to them. You can view statistics about vendors, products and versions of products. CVE details are displayed in a single, easy to use page; see a sample here.

CVE vulnerability data are taken from National Vulnerability Database (NVD) XML feeds provided by the National Institute of Standards and Technology.

Additional data from several sources like exploits from www.exploit-db.com, vendor statements and additional vendor supplied data, Metasploit modules are also published in addition to NVD CVE data. Vulnerabilities are classified by cvedetails.com using keyword matching and cwe numbers if possible, but they are mostly based on keywords.

Unless otherwise stated CVSS scores listed on this site are “CVSS Base Scores” provided in NVD feeds. Vulnerability data are updated daily using NVD feeds. Please visit nvd.nist.gov for more details.

It is hard to say how much data about security issues is kept secret versus how much is made public. What is clear, however, is that organizing the public information leaves a lot to be desired.

Take the CVE advisory on the Superfish issue:

Vulnerability Details : CVE-2015-2077.

In addition to the information on the page you are invited to:

Search Twitter

Search YouTube

Search Google

No peeking! Without checking those links, what search string do you think appears in each one?

  • Komodia Redirector
  • man-in-the-middle attackers
  • Superfish

Would you believe, none of the above?

The actual search string is: “CVE-2015-2077.”

Yep, the identifier assigned by the CVE site is used as the search string.
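Reproducing those links takes one line per service: the bare CVE identifier is simply URL-encoded into each site’s query string. A minimal Python sketch (the URL patterns are the standard public search endpoints, not taken from the CVE Details source):

```python
from urllib.parse import quote_plus

def search_links(cve_id):
    """Build search URLs that use the bare CVE identifier as the query."""
    q = quote_plus(cve_id)  # e.g. "CVE-2015-2077"
    return {
        "twitter": f"https://twitter.com/search?q={q}",
        "youtube": f"https://www.youtube.com/results?search_query={q}",
        "google": f"https://www.google.com/search?q={q}",
    }
```

Which is precisely the limitation: any discussion that never mentions the string “CVE-2015-2077” — a blog post that only says “Superfish” or “Komodia” — falls outside every one of these searches.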

The same is true for the External Links drop-down menu, which searches Secunia Advisories, XForce Advisories, Vulnerability Details at NVD, Vulnerability Details at Mitre, and Nessus Plugins (the one exception: the First CVSS Guide link goes to A Complete Guide to the Common Vulnerability Scoring System Version 2.0 rather than searching on the identifier).

Don’t get me wrong, CVE Details is a great information resource, but it is bound by the use of its own identifiers. Searching by identifier alone, you are going to miss blog posts, tweets, and other materials.

BTW, CVE = Common Vulnerabilities and Exposures.

Enjoy!

The World’s ‘Most Secure’ Operating System Adds a Bitcoin Wallet

Thursday, February 26th, 2015

The World’s ‘Most Secure’ Operating System Adds a Bitcoin Wallet by Ian DeMartino.

From the post:

Tails OS, which experts consider the world’s most secure operating system and that the NSA called a “threat,” has released a new version that includes a Bitcoin wallet option.

In the fight for privacy, bitcoin has been an invaluable tool. While those in the know will be the first to tell you that bitcoin isn’t completely anonymous, the pseudonymous nature of bitcoin gives it far more privacy than credit card transactions, particularly if certain precautions are taken.

However, there is a larger battle in the war for privacy, and that is the battle for privacy of communication. A major advancement in this field is Tails OS, which was famously used by Glenn Greenwald and the other journalists that broke the Edward Snowden leaks. Yesterday, Tails OS announced that they have released version 1.3, and it includes an option for adding an Electrum Bitcoin wallet.

Just in case you are being proactive about your security and collecting funds in Bitcoins, you will find this of interest.

I was surprised to find that Tails comes with LibreOffice and not Emacs as an editor. I can always add it, but I am curious why it isn’t present by default.

Enjoy!

Periodic Table of Machine Learning Libraries

Thursday, February 26th, 2015

Periodic Table of Machine Learning Libraries

Interesting visual display, but it isn’t apparent how libraries were associated with particular elements.

That is, why would I look for GATE at element 106?

What I would find more interesting would be a listing of all of these machine learning libraries with pointers to additional resources for each one.

Just a thought.

PACKT Publishing – FREE LEARNING – HELP YOURSELF

Wednesday, February 25th, 2015

PACKT Publishing – FREE LEARNING – HELP YOURSELF

I’m not sure when this started but according to the webpage, there will be one free book per day until March 5, 2015.

I will be checking back tomorrow to see if the selection changes day to day.

Worth a trip just to see if there is anything of interest.

Enjoy!

Elon Musk Must Be Wringing His Hands, Again

Wednesday, February 25th, 2015

Google develops computer program capable of learning tasks independently by Hannah Devlin.

From the post:

Google scientists have developed the first computer program capable of learning a wide variety of tasks independently, in what has been hailed as a significant step towards true artificial intelligence.

The same program, or “agent” as its creators call it, learnt to play 49 different retro computer games, and came up with its own strategies for winning. In the future, the same approach could be used to power self-driving cars, personal assistants in smartphones or conduct scientific research in fields from climate change to cosmology.

The research was carried out by DeepMind, the British company bought by Google last year for £400m, whose stated aim is to build “smart machines”.

Demis Hassabis, the company’s founder, said: “This is the first significant rung of the ladder towards proving a general learning system can work. It can work on a challenging task that even humans find difficult. It’s the very first baby step towards that grander goal … but an important one.”

Truly a remarkable achievement.

I haven’t found a more detailed description of the strategies developed by the “agent,” but it would be interesting to try those out on retro computer games.
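For the curious, the core of what the DeepMind agent does is Q-learning, with a deep network standing in for the value table. A toy tabular sketch on a five-cell corridor “game” (reward only at the right end) — my own illustration, not anything from the paper:

```python
import random

random.seed(0)

N_STATES = 5            # corridor cells; the "game" is won at the right end
ACTIONS = (-1, +1)      # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

# Q[state][action_index] -- the table a deep Q-network approximates
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def greedy(s):
    """Index of the best-looking action in state s."""
    return 0 if Q[s][0] >= Q[s][1] else 1

for _ in range(200):                       # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.randrange(2) if random.random() < EPSILON else greedy(s)
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: nudge Q toward reward + discounted best future value
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

policy = [greedy(s) for s in range(N_STATES - 1)]
print(policy)  # learned greedy action per non-terminal cell
```

The agent is never told that moving right is the point of the game; the strategy emerges from score feedback alone, which is exactly the property that makes the Atari result striking.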

The post is a good one and worth your time to read.

It closes by contrasting Elon Musk’s fears of an AI apocalypse with Google’s assurance that any danger is decades away.

I take a great deal of reassurance from the “agent” being supplied with the retro video games.

The “agent” did not choose to become a master of Asteroids, with the intent of being the despair of all other gamers at the local arcade.

However good an “agent” may become at any task, from video games to surgery, the question is: who chooses the task to be performed? Granted, we probably want to lock out commands like “Make me a suitcase-sized nuclear weapon” and that sort of thing.

Working with Small Files in Hadoop – Part 1, Part 2, Part 3

Wednesday, February 25th, 2015

Working with Small Files in Hadoop – Part 1, Part 2, Part 3 by Chris Deptula.

From the post:

Why do small files occur?

The small file problem is an issue Inquidia Consulting frequently sees on Hadoop projects. There are a variety of reasons why companies may have small files in Hadoop, including:

  • Companies are increasingly hungry for data to be available near real time, causing Hadoop ingestion processes to run every hour/day/week with only, say, 10MB of new data generated per period.
  • The source system generates thousands of small files which are copied directly into Hadoop without modification.
  • The configuration of MapReduce jobs using more than the necessary number of reducers, each outputting its own file. Along the same lines, if there is a skew in the data that causes the majority of the data to go to one reducer, then the remaining reducers will process very little data and produce small output files.

Does it sound like you have small files? If so, this series by Chris is what you are looking for.
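A quick way to check is to count files smaller than your HDFS block size. The sketch below walks a local directory as a stand-in; on a real cluster you would list paths with `hdfs dfs -ls -R` or a client library instead:

```python
import os

BLOCK_SIZE = 128 * 1024 * 1024  # a common HDFS block size; adjust to your cluster

def small_file_report(root, threshold=BLOCK_SIZE):
    """Count files under threshold bytes and the total space they occupy."""
    count, total = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if size < threshold:
                count += 1
                total += size
    return count, total
```

If the count runs into the tens of thousands while the total is a few gigabytes, you have the problem the series describes: NameNode memory and MapReduce task overhead scale with file count, not data size.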

Learning Data Visualization using Processing

Wednesday, February 25th, 2015

Learning Data Visualization using Processing by C.P. O’Neill.

From the post:

Learning data visualization techniques using the Processing programming language has always been a skill that has been on my list of things to learn really well and I finally got around to getting started. I’ve used other technologies and methods before for data visualization, most notably R and RStudio, so when I got the opportunity to learn how to take that skill to the next level I jumped at it. Here is a visualization of all the meteor strikes that have been collected around the world. The bigger the circles, the larger the impact. I’m not going to go into a huge analysis since I’m sure it’s been done many times before, but I am excited to get cracking on other data sets in the near future.

GitHub: repo

Skillshare Class: Data Visualization: Designing Maps with Processing and Illustrator

A nice reminder about Processing.

I have seen the usual visualization of arms exporters (the U.S. is #1, by the way) but wonder about a visualization of the deaths attributable to world leaders during their terms in office (20th/21st centuries). Some of the counts are iffy, and how do you allocate Russian deaths between Germany and the Allies (for not supporting Russia)? Still, it could be an interesting exercise.
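One detail worth copying from visualizations like this: when “the bigger the circles, the larger the impact,” the circle’s area, not its radius, should scale with the value, or large values look exaggerated. A small helper, independent of any particular plotting library:

```python
import math

def circle_radius(value, max_value, max_radius=30.0):
    """Radius such that circle AREA (not radius) is proportional to value."""
    if max_value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)
```

Doubling the value then doubles the visual area the eye actually compares, instead of quadrupling it, which is what naive radius scaling does.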

I first saw this in a tweet by Stéphane Fréchette.

DataStax – New for 2015 – Free Online Instructor Led Training

Wednesday, February 25th, 2015

DataStax – New for 2015 – Free Online Instructor Led Training

I count six (6) free online courses in March 2015.

As of today, two of the courses report being “sold out,” and you can join a waiting list.

If you take one or more of these courses, don’t keep your attendance a secret. Provide feedback to DataStax and post your comments about the experience online.

High quality online training isn’t cheap and positive feedback will strengthen the hand of those responsible for these free training classes.

Everyone is an IA [Information Architecture]

Wednesday, February 25th, 2015

Everyone is an IA [Information Architecture] by Dan Ramsden.

From the post:

This is a post inspired by my talk from World IA Day. On the day I had 20 minutes to fill – I did a magic trick and talked about an imaginary uncle. This post has the benefit of an edit, but recreates the central argument – everyone makes IA.

Information architecture is everywhere, it’s a part of every project, every design includes it. But I think there’s often a perception that because it requires a level of specialization to do the most complicated types of IA, people are nervous about how and when they engage with it – no one likes to look out of their depth. And some IA requires a depth of thinking that deserves justification and explanation.

Even when you’ve built up trust with teams of other disciplines or clients, I think one of the most regular questions asked of an IA is probably, ‘Is it really that complicated?’ And if we want to be happier in ourselves, and spread happiness by creating meaningful, beautiful, wonderful things – we need to convince people that complex is different from complicated. We need to share our conviction that IA is a real thing and that thinking like an IA is probably one of the most effective ways of contributing to a more meaningful world.

But we have a challenge: IAs are usually the minority. At the BBC we have a team of about 140 in UX&D, and IAs are the minority – we’re not quite 10%. It’s my job to work out how those less than 1 in 10 can be as effective as possible and have the biggest positive impact on the work we do and the experiences we offer to our audiences. I don’t think this is unique. A lot of the time IAs don’t work together, or there are not enough IAs to work on every project that could benefit from an IA mindset, which is every project.

This is what troubled me. How could I make sure that it is always designed? My solution to this is simple. We become the majority. And because we can’t do that just by recruiting a legion of IAs we do it another way. We turn everyone in the team into an information architect.

Now this is a bit contentious. There’s legitimate certainty that IA is a specialism and that there are dangers of diluting it. But last year I talked about an IA mindset, a way of approaching any design challenge from an IA perspective. My point then was that the way we tend to think and therefore approach design challenges is usually a bit different from other designers. But I don’t believe we’re that special. I think other people can adopt that mindset and think a little bit more like we do. I think if we work hard enough we can find ways to help designers to adopt that IA mindset more regularly.

And we know the benefits on offer when every design starts from the architecture up. Well-architected things work better. They are more efficient, connected, resilient and meaningful – they’re more useful.

Dan goes onto say that information is everywhere. Much in the same way that I would say that subjects are everywhere.

Just as users must describe information architectures as they experience them, the same is true for users identifying the subjects that are important to them.

There is never a doubt that more IAs and more subjects exist, but the best anyone can do is to tell you about the ones that are important to them and how they have chosen to identify them.

To no small degree, I think terminology has been used to disenfranchise users from discussing subjects as they understand them.

From my own background, I remember a database project where the head of membership services, who ran reports by rote out of R&R, insisted on saying where data needed to reside in tables during a complete re-write of the database. I kept trying, with little success, to get them to describe what they wanted to store and what capabilities they needed.

In retrospect, I should have allowed membership services to use their terminology to describe the database because whether they understood the underlying data architecture or not wasn’t a design goal. The easier course would have been to provide them with a view that accorded with their idea of the database structure and to run their reports. That other “views” of the data existed would have been neither here nor there to them.

As “experts,” we should listen to the description of information architectures and/or identifications of subjects and their relationships as a voyage of discovery. We are discovering the way someone else views the world, not for our correction to the “right” way but so we can enable their view to be more productive and useful to them.

That approach takes more work on the part of “experts” but think of all the things you will learn along the way.

Download the Hive-on-Spark Beta

Wednesday, February 25th, 2015

Download the Hive-on-Spark Beta by Xuefu Zhang.

From the post:

The Hive-on-Spark project (HIVE-7292) is one of the most watched projects in Apache Hive history. It has attracted developers from across the ecosystem, including from organizations such as Intel, MapR, IBM, and Cloudera, and gained critical help from the Spark community.

Many anxious users have inquired about its availability in the last few months. Some users even built Hive-on-Spark from the branch code and tried it in their testing environments, and then provided us valuable feedback. The team is thrilled to see this level of excitement and early adoption, and has been working around the clock to deliver the product at an accelerated pace.

Thanks to this hard work, significant progress has been made in the last six months. (The project is currently incubating in Cloudera Labs.) All major functionality is now in place, including different flavors of joins and integration with Spark, HiveServer2, and YARN, and the team has made initial but important investments in performance optimization, including split generation and grouping, supporting vectorization and cost-based optimization, and more. We are currently focused on running benchmarks, identifying and prototyping optimization areas such as dynamic partition pruning and table caching, and creating a roadmap for further performance enhancements for the near future.

Two month ago, we announced the availability of an Amazon Machine Image (AMI) for a hands-on experience. Today, we even more proudly present you a Hive-on-Spark beta via CDH parcel. You can download that parcel here. (Please note that in this beta release only HDFS, YARN, Apache ZooKeeper, and Hive are supported. Other components, such as Apache Pig, Apache Oozie, and Impala, might not work as expected.) The “Getting Started” guide will help you get your Hive queries up and running on the Spark engine without much trouble.

We welcome your feedback. For assistance, please use user@hive.apache.org or the Cloudera Labs discussion board.

We will update you again when GA is available. Stay tuned!
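For reference, once the parcel is installed, switching a Hive session from MapReduce to the Spark engine is a one-line setting (the property can also be set in hive-site.xml; the table name below is a placeholder):

```sql
-- In a Hive session (e.g. Beeline against HiveServer2):
set hive.execution.engine=spark;

-- Subsequent queries in the session now run on Spark:
SELECT count(*) FROM my_table;
```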

If you are snowbound this week, this may be what you have been looking for!

I have listed this under both Hive and Spark separately, but am confident enough of its success that I created a Hive-on-Spark category as well.

Enjoy!

Typography Teardown of Advertising Age

Wednesday, February 25th, 2015

Typography Teardown of Advertising Age by Jeremiah Shoaf.

From the post:

I’m a huge fan of Samuel Hulick’s user onboarding teardowns so I thought it would be fun to try a new feature on Typewolf where I do a “typography teardown” of a popular website. I’ll review the design from a typographic perspective and discuss what makes the type work and what could potentially have been done better.

In this first edition I’m going to take a deep dive into the type behind the Advertising Age website. But first, a disclaimer.

Disclaimer: The following site was created by designers way more talented than myself. This is simply my opinion on the typography and how, at times, I may have approached things differently. Rules in typography are meant to be broken.

As you already know, I’m at least graphically challenged if not worse. 😉

Still, it doesn’t prevent me from enjoying graphics and layouts, I just have a hard time originating them. And I keep trying by reading resources such as this one.

While a website is reviewed by Jeremiah, the same principles should apply to an application interface.

Enjoy!

Google As Censor

Wednesday, February 25th, 2015

Google bans sexually explicit content on Blogger by Lisa Vaas.

From the post:

Google hasn’t changed its policy’s messaging around censorship, stating that “censoring this content is contrary to a service that bases itself on freedom of expression.”

How Google will manage, with Blogger, to increase “the availability of information, [encourage] healthy debate, and [make] possible new connections between people” while still curbing “abuses that threaten our ability to provide this service and the freedom of expression it encourages” remains to be seen.

I wrote an entire post, complete with Supreme Court citations, etc., on the basis that Google was really trying to be a moral censor without saying so. As I neared the end of the post, the penny dropped and the explanation for Google’s banning of “sexually explicit content” became clear.

Read that last part of the Google quote carefully:

“abuses that threaten our ability to provide this service and the freedom of expression it encourages”

Who would have the power to threaten Google’s sponsorship of Blogger and “the freedom of expression it encourages?”

Hmmm, does China come to mind?

China relaxes on pornography but YouTube is still blocked by Malcolm Moore.

Whether China is planning on new restrictions on pornography in general or Google is attempting to sweeten a deal with China by self-policing isn’t clear.

I find that a great deal more plausible than thinking Google has suddenly lost interest in what can be highly lucrative content.

When they see “sexually explicit content,” Google and its offended Chinese censor buddies:

could effectively avoid further bombardment of their sensibilities simply by averting their eyes.

Cohen v. California, 403 U.S. 15 (1971).

Averting your eyes is even easier with a web browser because you have to seek out the offensive content. If material offends you, don’t go there. Problem solved.

Google’s role as censor isn’t going to start with deleting large numbers of books from Google Books and heavy handed censoring of search results.

No, Google will start by censoring IS and other groups unpopular with one government or another. Then, as here, Google will move up to making some content harder to post, again at the behest of some government. By the time Google censorship reaches you, the principle of censorship will be well established and the only question left being where the line is drawn.

PS: Obviously I am speculating that China is behind the censoring of Blogger by Google but let’s first call this action what it is in fact: censorship. I don’t have any cables between China and Google but I feel sure someone does. Perhaps there is a leaky Google employee who can clear up this mystery for us all.