Archive for the ‘Visualization’ Category

The Hitchhiker’s Guide to d3.js [+ a question]

Sunday, May 14th, 2017

The Hitchhiker’s Guide to d3.js by Ian Johnson.

From the post:

[graphic omitted: see post]

The landscape for learning d3 is rich, vast and sometimes perilous. You may be intimidated by the long list of functions in d3’s API documentation or paralyzed by choice reviewing the dozens of tutorials on the home page. There are over 20,000 d3 examples you could learn from, but you never know how approachable any given one will be.

[graphic omitted: see post]

If all you need is a quick bar or line chart, maybe this article isn’t for you, there are plenty of charting libraries out there for that. If you’re into books, check out Interactive Data Visualization for the Web by Scott Murray as a great place to start. D3.js in Action by Elijah Meeks is a comprehensive way to go much deeper into some regions of the API.

This guide is meant to prepare you mentally as well as give you some fruitful directions to pursue. There is a lot to learn besides the d3.js API, both technical knowledge around web standards like HTML, SVG, CSS and JavaScript as well as communication concepts and data visualization principles. Chances are you know something about some of those things, so this guide will attempt to give you good starting points for the things you want to learn more about.

Depending on your needs and learning style, The Hitchhiker’s Guide to d3.js (the Guide) may be just what you need.

The Guide focuses on how to use d3.js and not on: What visualization should I create?

Suggestions on what to consider when moving from raw data to a visualization? Resources?

Thanks!

How to Spot Visualization Lies

Monday, May 8th, 2017

How to Spot Visualization Lies : Keep your eyes open by Nathan Yau.

From the post:

It used to be that we’d see a poorly made graph or a data design goof, laugh it up a bit, and then carry on. At some point though — during this past year especially — it grew more difficult to distinguish a visualization snafu from bias and deliberate misinformation.

Of course, lying with statistics has been a thing for a long time, but charts tend to spread far and wide these days. There’s a lot of them. Some don’t tell the truth. Maybe you glance at it and that’s it, but a simple message sticks and builds. Before you know it, Leonardo DiCaprio spins a top on a table and no one cares if it falls or continues to rotate.

So it’s all the more important now to quickly decide if a graph is telling the truth. This is a guide to help you spot the visualization lies.

Warning: Your blind acceptance/enjoyment of news graphics may be diminished by this post. You have been warned.

Beautifully illustrated as always.

Perhaps Nathan will produce a double-sided, laminated version to keep by your TV chair. A great graduation present!

Interactive Data Visualization (D3, 2nd Ed) / Who Sank My Battleship?

Wednesday, May 3rd, 2017

Interactive Data Visualization for the Web, 2nd Edition: An Introduction to Designing with D3 by Scott Murray.

From the webpage:

Interactive Data Visualization for the Web addresses people interested in data visualization but new to programming or web development, giving them what they need to get started creating and publishing their own data visualization projects on the web. The recent explosion of interest in visualization and publicly available data sources has created need for making these skills accessible at an introductory level. The second edition includes greatly expanded geomapping coverage, more real-world examples, a chapter on how to put together all the pieces, and an appendix of case studies, in addition to other improvements.

It’s pre-order time!

Estimated to appear in August of 2017 at $49.99.

This shipping map, created by Kiln, based on data from the UCL Energy Institute, should inspire you to try D3.

The Interactive version, using 2012 data, illustrates the ability to select types of shipping:

  • Container
  • Dry Bulk
  • Gas Bulk
  • Tanker
  • Vehicles

with locations, port information and a variety of other information.

All of which reminds me of the Who Sank My Battleship? episode with Gen. Paul Van Riper (ret.), who, during war games, used pleasure craft and highly original tactics to sink the vast majority of the opposing American fleet. So much so that the American fleet had to be “refloated” to continue the games with any chance of winning. See War game was fixed to ensure American victory, claims general.

Given the effect Gen. Van Riper’s tactics had on military vessels, you can imagine how unarmored civilian shipping would fare. You don’t need a self-immolating F-35 or a nuclear sub to damage civilian shipping.

What you need is shipping broken down into targeting categories with their locations (see https://www.shipmap.org/), one or more pleasure craft stuffed with explosives and some rudimentary planning.


For the details of what I call the Who Sank My Battleship? episode, the official report, U.S. Joint Forces Command Millennium Challenge 2002: Experiment Report, runs some 752 pages.

AI Brain Scans

Monday, March 13th, 2017

‘AI brain scans’ reveal what happens inside machine learning


The ResNet architecture is used for building deep neural networks for computer vision and image recognition. The image shown here is the forward (inference) pass of the 50-layer ResNet network used to classify images after being trained using the Graphcore neural network graph library.

Credit: Graphcore / Matt Fyles

The image is great eye candy, but if you want to see images annotated with information, check out: Inside an AI ‘brain’ – What does machine learning look like? (Graphcore)

From the product overview:

Poplar™ is a scalable graph programming framework targeting Intelligent Processing Unit (IPU) accelerated servers and IPU accelerated server clusters, designed to meet the growing needs of both advanced research teams and commercial deployment in the enterprise. It’s not a new language, it’s a C++ framework which abstracts the graph-based machine learning development process from the underlying graph processing IPU hardware.

Poplar includes a comprehensive, open source set of Poplar graph libraries for machine learning. In essence, this means existing user applications written in standard machine learning frameworks, like Tensorflow and MXNet, will work out of the box on an IPU. It will also be a natural basis for future machine intelligence programming paradigms which extend beyond tensor-centric deep learning. Poplar has a full set of debugging and analysis tools to help tune performance and a C++ and Python interface for application development if required.

The IPU-Appliance for the Cloud is due out in 2017. I have looked at Graphcore but came up dry on the Poplar graph libraries and/or an emulator for the IPU.

Perhaps those will both appear later in 2017.

Optimized hardware for graph calculations sounds promising, but rapidly processing nodes that may or may not represent the same subject seems like a defect waiting to make itself known.

Many approaches rapidly process uncertain big data, but being no more ignorant than your competition is hardly a selling point.

9 Powerful Maps: Earthquakes, Elections, and Space Exploration

Saturday, February 25th, 2017

9 Powerful Maps: Earthquakes, Elections, and Space Exploration by Marisa Krystian.

Nine really great maps with links:

  1. NOAA Science On a Sphere — Earthquakes
  2. The New York Times — Election Results
  3. Pop Chart Lab — Space Exploration
  4. Tomorrow — Electricity Map
  5. NASA — Hottest Year on Record
  6. Radio Garden — Share Music
  7. Facebook — Visualizing Friendships
  8. Transparency International — Corruption
  9. NOAA — Daily Real-Time Satellite Imagery

Two added bonuses:

  1. infogr.am offers a newsletter on visualization techniques
  2. There is an Infogram Ambassadorship program.

I just signed up for the newsletter and am pondering the Ambassadorship program.

If you sign up for the Ambassadorship program, be sure to share your experience and ping me with a link.

Repulsion On A Galactic Scale (Really Big Data/Visualization)

Tuesday, January 31st, 2017

Newly discovered intergalactic void repels Milky Way by Rol Gal.

From the post:

For decades, astronomers have known that our Milky Way galaxy—along with our companion galaxy, Andromeda—is moving through space at about 1.4 million miles per hour with respect to the expanding universe. Scientists generally assumed that dense regions of the universe, populated with an excess of galaxies, are pulling us in the same way that gravity made Newton’s apple fall toward earth.

In a groundbreaking study published in Nature Astronomy, a team of researchers, including Brent Tully from the University of Hawaiʻi Institute for Astronomy, reports the discovery of a previously unknown, nearly empty region in our extragalactic neighborhood. Largely devoid of galaxies, this void exerts a repelling force, pushing our Local Group of galaxies through space.

Astronomers initially attributed the Milky Way’s motion to the Great Attractor, a region of a half-dozen rich clusters of galaxies 150 million light-years away. Soon after, attention was drawn to a much larger structure called the Shapley Concentration, located 600 million light-years away, in the same direction as the Great Attractor. However, there has been ongoing debate about the relative importance of these two attractors and whether they suffice to explain our motion.

The work appears in the January 30 issue of Nature Astronomy and can be found online here.

Additional images, video, and links to previous related productions can be found at http://irfu.cea.fr/dipolerepeller.

If you are looking for processing/visualization of data on a galactic scale, this work by Yehuda Hoffman, Daniel Pomarède, R. Brent Tully & Hélène M. Courtois hits the spot!

It is also a reminder that when you look up from your social media device, there is a universe waiting to be explored.

Interactive Color Wheel

Thursday, January 12th, 2017

Interactive Color Wheel

[graphic omitted: interactive color wheel; see post]

You will need to visit this interactive color wheel to really appreciate its capabilities.

What I find most helpful is the display of hex codes for the colors. I can distinguish colors but getting the codes right can be a real challenge.
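
If you just need to move between RGB values and hex codes, that much is easy to script. A minimal Python sketch (the sample color is arbitrary):

```python
def rgb_to_hex(r, g, b):
    """Convert 0-255 RGB components to a #RRGGBB hex code."""
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

def hex_to_rgb(code):
    """Convert a #RRGGBB hex code back to an (r, g, b) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(rgb_to_hex(255, 99, 71))   # '#ff6347' (tomato)
print(hex_to_rgb("#ff6347"))     # (255, 99, 71)
```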

Enjoy!

Good visualizations optimize for the human visual system

Saturday, December 31st, 2016

How Humans See Data by John Rauser.

Apologies to John for stepping on his title, but at time mark 3:26 he says:

Good visualizations optimize for the human visual system.

That one insight sets a basis for distinguishing between good visualizations and bad ones.

Do watch the rest of the video; it is all as good as that moment.

What’s your favorite moment?

From the description:

John Rauser explains a few of the most important results from research into the functioning of the human visual system and the question of how humans decode information presented in graphical form. By understanding and applying this research when designing statistical graphics, you can simplify difficult analytical tasks as much as possible.

Links:

R/ggplot2 code for all plots in the presentation.

Slides for Good visualizations optimize for the human visual system

Graphical Perception and Graphical Methods for Analyzing Scientific Data by William S. Cleveland and Robert McGill. (cited in the presentation)

The Elements of Graphing Data by William S. Cleveland. (also cited in the presentation)

2017/18 – When you can’t believe your eyes

Friday, December 23rd, 2016

Artificial intelligence is going to make it easier than ever to fake images and video by James Vincent.

From the post:

Smile Vector is a Twitter bot that can make any celebrity smile. It scrapes the web for pictures of faces, and then it morphs their expressions using a deep-learning-powered neural network. Its results aren’t perfect, but they’re created completely automatically, and it’s just a small hint of what’s to come as artificial intelligence opens a new world of image, audio, and video fakery. Imagine a version of Photoshop that can edit an image as easily as you can edit a Word document — will we ever trust our own eyes again?

“I definitely think that this will be a quantum step forward,” Tom White, the creator of Smile Vector, tells The Verge. “Not only in our ability to manipulate images but really their prevalence in our society.” White says he created his bot in order to be “provocative,” and to show people what’s happening with AI in this space. “I don’t think many people outside the machine learning community knew this was even possible,” says White, a lecturer in creative coding at Victoria University School of design. “You can imagine an Instagram-like filter that just says ‘more smile’ or ‘less smile,’ and suddenly that’s in everyone’s pocket and everyone can use it.”

Vincent reviews a number of exciting advances this year and concludes:


AI researchers involved in these fields are already getting a firsthand experience of the coming media environment. “I currently exist in a world of reality vertigo,” says Clune. “People send me real images and I start to wonder if they look fake. And when they send me fake images I assume they’re real because the quality is so good. Increasingly, I think, we won’t know the difference between the real and the fake. It’s up to people to try and educate themselves.”

An image sent to you may appear to be very convincing, but like the general in War Games, you have to ask: does it make any sense?

Verification, subject identity in my terminology, requires more than an image. What do we know about the area? Or the people (if any) in the image? Where were they supposed to be today? And many other questions that depend upon the image and its contents.

Unless you are using a subject-identity based technology, where are you going to store that additional information? Or express your concerns about authenticity?

Low fat computing

Thursday, December 22nd, 2016

Low fat computing by Karsten Schmidt

A summary by Malcolm Sparks of Schmidt’s presentation, along with the presentation itself.

The first 15 minutes or so cover Schmidt’s background, with lots of strange, 3-D printable eye candy. It really starts to rock around 20 minutes in, with Forth code and very low-level programming.

To get a better idea of what Schmidt has been doing, see his website thi.ng, his Forth REPL in JavaScript at http://forth.thi.ng/, or his GitHub repository: thi.ng.

Stop by http://toxiclibs.org/ as well, although the material there looks dated.

Poor Presentation – Failure to Communicate

Sunday, December 11th, 2016

If you ask about the age of a city, do you expect to be told its founding date or its age?

If you said founding date, you will be as confused as I was by:

[graphic omitted: map of German city ages; see post]

You can see the map in its full confusion.

The age of Augsburg is indeed 2013 years, but its founding date of 15 BCE (on the orders of the Emperor Augustus) establishes the same fact with less effort on the part of the reader.

Making users work for information is always a poor communication strategy. Always.

Four Experiments in Handwriting with a Neural Network

Tuesday, December 6th, 2016

Four Experiments in Handwriting with a Neural Network by Shan Carter, David Ha, Ian Johnson, and Chris Olah.

While the handwriting experiments are compelling and entertaining, the authors have a more profound goal for this activity:


The black box reputation of machine learning models is well deserved, but we believe part of that reputation has been born from the programming context into which they have been locked into. The experience of having an easily inspectable model available in the same programming context as the interactive visualization environment (here, javascript) proved to be very productive for prototyping and exploring new ideas for this post.

As we are able to move them more and more into the same programming context that user interface work is done, we believe we will see richer modes of human-ai interactions flourish. This could have a marked impact on debugging and building models, for sure, but also in how the models are used. Machine learning research typically seeks to mimic and substitute humans, and increasingly it’s able to. What seems less explored is using machine learning to augment humans. This sort of complicated human-machine interaction is best explored when the full capabilities of the model are available in the user interface context.

Setting up a search alert for future work from these authors!

War and Peace & R

Friday, December 2nd, 2016

No, not a post about R versus Python but about R and Tolstoy‘s War and Peace.

Using R to Gain Insights into the Emotional Journeys in War and Peace by Wee Hyong Tok.

From the post:

How do you read a novel in record time, and gain insights into the emotional journey of main characters, as they go through various trials and tribulations, as an exciting story unfolds from chapter to chapter?

I remembered my experiences when I start reading a novel, and I get intrigued by the story, and simply cannot wait to get to the last chapter. I also recall many conversations with friends on some of the interesting novels that I have read awhile back, and somehow have only vague recollection of what happened in a specific chapter. In this post, I’ll work through how we can use R to analyze the English translation of War and Peace.

War and Peace is a novel by Leo Tolstoy, and captures the salient points about Russian history from the period 1805 to 1812. The novel consists of the stories of five families, and captures the trials and tribulations of various characters (e.g. Natasha and Andre). The novel consists of about 1400 pages, and is one of the longest novels that have been written.

We hypothesize that if we can build a dashboard (shown below), this will allow us to gain insights into the emotional journey undertaken by the characters in War and Peace.

Impressive work, even though I would not use it as a short-cut to “read a novel in record time.”

Rather, I take this as an alternative way of reading War and Peace, one that can capture insights a casual reader may miss.

Moreover, the techniques demonstrated here could be used with other works of literature, or even non-fictional works.
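
The original analysis is in R, but the core idea, score each chapter and paint the scores, fits in a few lines of Python. A rough sketch, assuming a local copy of the Gutenberg text as war_and_peace.txt (hypothetical filename) and using a toy lexicon where a real one (NRC, Bing, AFINN) belongs:

```python
import re
from collections import Counter

import matplotlib.pyplot as plt

# Toy sentiment lexicon; a real analysis would use NRC, Bing, or AFINN.
POSITIVE = {"love", "joy", "happy", "peace", "hope"}
NEGATIVE = {"war", "death", "fear", "pain", "grief"}

text = open("war_and_peace.txt", encoding="utf-8").read()
chapters = re.split(r"\bCHAPTER\b", text)[1:]  # crude chapter split

scores = []
for chapter in chapters:
    words = Counter(re.findall(r"[a-z']+", chapter.lower()))
    pos = sum(words[w] for w in POSITIVE)
    neg = sum(words[w] for w in NEGATIVE)
    total = sum(words.values()) or 1
    scores.append((pos - neg) / total)  # net sentiment, length-normalized

# One-row heatmap: chapters left to right, color = net sentiment.
plt.figure(figsize=(12, 1.5))
plt.imshow([scores], aspect="auto", cmap="RdYlGn")
plt.yticks([])
plt.xlabel("Chapter")
plt.title("Net sentiment by chapter (toy lexicon)")
plt.colorbar(label="net sentiment")
plt.show()
```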

Imagine conducting this analysis over the full CIA Torture Report, reportedly more than 7,000 pages, for example.

A heatmap does not connect any dots, but points a user towards places where interesting dots may be found.

Certainly a tool for exploring large releases/leaks of text data.

Enjoy!

PS: Know of any large, tiresome, obscure-on-purpose government reports to practice this method on?

Visualizing XML Schemas

Tuesday, November 29th, 2016

I don’t have one of the commercial XML packages at the moment and was casting about for a free visualization technique for a large XML schema when I encountered:

[graphic omitted: spatial schema visualization; see post]

I won’t be trying it on my schema until tomorrow but I thought it looked interesting enough to pass along.

Further details: Visualizing Complex Content Models with Spatial Schemas by Joe Pairman.

This looks almost teachable.

Thoughts?

Other “free” visualization tools to suggest?
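
In the meantime, one no-cost fallback is to extract the element hierarchy yourself and hand it to GraphViz. A rough Python sketch using only the standard library (schema.xsd is a hypothetical local file; only named elements and complex types are walked):

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

tree = ET.parse("schema.xsd")  # hypothetical local schema file
root = tree.getroot()

edges = []

def walk(node, parent_name):
    """Record parent -> child edges for named schema components."""
    for child in node:
        name = child.get("name")
        if child.tag in (XS + "element", XS + "complexType") and name:
            edges.append((parent_name, name))
            walk(child, name)
        else:
            # Anonymous wrappers (sequence, choice, ...) keep the parent.
            walk(child, parent_name)

walk(root, "schema")

# Emit GraphViz DOT; render with `dot -Tsvg schema.dot -o schema.svg`.
with open("schema.dot", "w") as out:
    out.write("digraph schema {\n  rankdir=LR;\n")
    for parent, child in edges:
        out.write('  "{}" -> "{}";\n'.format(parent, child))
    out.write("}\n")
```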

Resources to Find the Data You Need, 2016 Edition

Monday, November 21st, 2016

Resources to Find the Data You Need, 2016 Edition by Nathan Yau.

From the post:

Before you get started on any data-related project, you need data. I know. It sounds crazy, but it’s the truth. It can be frustrating to sleuth for the data you need, so here are some tips on finding it (the openly available variety) and some topic-specific resources to begin your travels.

This is an update to the guide I wrote in 2009, which as it turns out, is now mostly outdated. So, 2016. Here we go.

If you know Nathan Yau’s work, FlowingData, then you know this is “the” starting list for data.

Enjoy!

How the Ghana Floods animation was created [Animating Your Local Flood Data With R]

Monday, November 7th, 2016

How the Ghana Floods animation was created by David Quartey.

From the post:

Ghana has always been getting flooded, but it seems that only floods in Accra are getting attention. I wrote about it here, and the key visualization was an animated map showing the floods in Ghana, and built in R. In this post, I document how I did it, hopefully you can do one also!

David’s focus is on Ghana, but the same techniques work for data of more local interest.
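
David works in R; for the Python-inclined, the skeleton of such an animation looks something like this (the flood events below are invented for illustration):

```python
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Invented (year, longitude, latitude) flood events for illustration.
events = [
    (2010, -0.19, 5.60), (2011, -1.62, 6.67), (2012, -0.85, 9.40),
    (2013, -2.50, 8.05), (2014, -0.20, 5.55), (2015, -0.21, 5.56),
]
years = sorted({year for year, _, _ in events})

fig, ax = plt.subplots()

def draw_year(i):
    """Draw all events up to and including the current year."""
    ax.clear()
    ax.set_xlim(-3.5, 1.5)   # rough longitude range for Ghana
    ax.set_ylim(4.5, 11.5)   # rough latitude range for Ghana
    shown = [(x, y) for year, x, y in events if year <= years[i]]
    ax.scatter([x for x, _ in shown], [y for _, y in shown], color="red")
    ax.set_title("Floods through {}".format(years[i]))

anim = FuncAnimation(fig, draw_year, frames=len(years), interval=800)
anim.save("floods.gif", writer="pillow")
```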

No Frills Gephi (8.2) Import of Clinton/Podesta Emails (1-18)

Wednesday, October 26th, 2016

Using Gephi 8.2, you can create graphs of the Clinton/Podesta emails based on terms in subject lines or the body of the emails. You can interactively work with all 30K+ (as of today) emails and extract networks based on terms in the posts. No programming required. (Networks based on terms will appear tomorrow.)

If you have Gephi 8.2 (I can’t find the import spigot in 9.0 or 9.1), you can import the Clinton/Podesta Emails (1-18) for analysis as a network.

To save you the trouble of regressing to Gephi 8.2, I performed a no frills/default import and exported that file as podesta-1-18-network.gephi.gz.

Download and uncompress podesta-1-18-network.gephi.gz, then you can pick up at time mark 3:49.

Open the file (your location may differ):

[screenshot omitted: Gephi open-file dialog]

Obligatory hair-ball graph visualization. 😉

[screenshot omitted: initial graph view]

Considerably less appealing than Jennifer Golbeck’s, but be patient!

First step, Layout -> Yifan Hu. My results:

[screenshot omitted: graph after Yifan Hu layout]

Second step, Network Diameter statistics (right side, run).

No visible impact on the graph, but now you can change the color and size of nodes in the graph. That is, they have attributes on which you can base the assignment of color and size.

Tutorial gotcha: not one of Jennifer’s tutorials, but I was watching a Gephi tutorial that skipped the part about running statistics on the graph prior to assigning color and size. Or I just didn’t hear it. The menu options appear in the documentation, but you can’t access them unless and until you run network statistics or have attributes for the assignment of color and size. Run statistics first!

Next, assign colors based on betweenness centrality:

[screenshot omitted: graph colored by betweenness centrality]

The densest node is John Podesta, but if you remove his node, rerun the network statistics and re-layout the graph, here is part of what results:

[screenshot omitted: graph after removing the Podesta node]

A no-frills import of 31,819 emails results in a graph of 3,235 nodes and 11,831 edges.

That’s because nodes combine (“merge” to you topic map readers) when they have the same identifier, and edges combine when they run between the same pair of nodes.

Subject to correction, when that combining/merging occurs, the properties on the respective nodes/edges are accumulated.
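
You can watch that combining behavior in a few lines of networkx. A sketch, with made-up addresses standing in for parsed email headers:

```python
import networkx as nx

# Made-up (from, to) pairs standing in for parsed email headers.
messages = [
    ("john@example.com", "ann@example.com"),
    ("john@example.com", "ann@example.com"),  # repeat pair
    ("ann@example.com", "bob@example.com"),
]

G = nx.DiGraph()
for sender, recipient in messages:
    if G.has_edge(sender, recipient):
        # Same identifier pair: the edge "merges"; accumulate a weight.
        G[sender][recipient]["weight"] += 1
    else:
        G.add_edge(sender, recipient, weight=1)

print(G.number_of_nodes())  # 3, not 6 -- nodes merged on identifier
print(G.number_of_edges())  # 2, not 3 -- repeated edge merged
print(G["john@example.com"]["ann@example.com"]["weight"])  # 2
```

Applied to the real mailboxes, that merging is what collapses the address lists of 31,819 emails into 3,235 nodes.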

Topic mappers already realize there are important subjects missing, some 31,819 of them. That is, the emails themselves don’t by default appear as nodes in the network.

Ian Robinson, Jim Webber & Emil Eifrem illustrate this lossy modeling in Graph Databases this way:

[graphic omitted: lossy email modeling illustration from Graph Databases]

Modeling emails without the emails is rather lossy. 😉

Other nodes/subjects we might want:

  • Multiple to: emails – Is who was also addressed important?
  • Multiple cc: emails – Same question as with to:.
  • Date sent as properties? So evolution of network/emails can be modeled.
  • Capture “reply-to” for relationships between emails?

Other modeling concerns?

Bear in mind that we can suppress a large amount of the detail so you can interactively explore the graph and only zoom into/display data after finding interesting patterns.

Some helpful links:

https://archive.org/details/PodestaEmailszipped: The email collection as bulk download, thanks to Michael Best, @NatSecGeek.

https://github.com/gephi/gephi/releases: Where you can grab a copy of Gephi 8.2.

Going My Way? – Explore 1.2 billion taxi rides

Friday, September 30th, 2016

Explore 1.2 billion taxi rides by Hannah Judge.

From the post:

Last year the New York City Taxi and Limousine Commission released a massive dataset of pickup and dropoff locations, times, payment types, and other attributes for 1.2 billion trips between 2009 and 2015. The dataset is a model for municipal open data, a tool for transportation planners, and a benchmark for database and visualization platforms looking to test their mettle.

MapD, a GPU-powered database that uses Mapbox for its visualization layer, made it possible to quickly and easily interact with the data. Mapbox enables MapD to display the entire results set on an interactive map. That map powers MapD’s dynamic dashboard, updating the data as you zoom and pan across New York.

Very impressive demonstration of the capabilities of MapD!

Imagine how you can visualize data from your hundreds of users geo-spotting security forces with their smartphones.

Or visualizing data from security forces tracking your citizens.

Technology cuts both ways.

The question is whether the sharper technology sword will be in your hands or those of your opponents.

How Mapmakers Make Mountains Rise Off the Page

Saturday, September 17th, 2016

How Mapmakers Make Mountains Rise Off the Page by Greg Miller.

From the post:

The world’s most beautiful places are rarely flat. From the soaring peaks of the Himalaya to the vast chasm of the Grand Canyon, many of the most stunning sites on Earth extend in all three dimensions. This poses a problem for mapmakers, who typically only have two dimensions to work with.

Fortunately, cartographers have some clever techniques for creating the illusion of depth, many of them developed by trial and error in the days before computers. The best examples of this work use a combination of art and science to evoke a sense of standing on a mountain peak or looking out an airplane window.

One of the oldest surviving maps, scratched onto an earthenware plate in Mesopotamia more than 4,000 years ago, depicts mountains as a series of little domes. It’s an effective symbol, still used today in schoolchildren’s drawings and a smartphone emoji, but it’s hardly an accurate representation of terrain. Over the subsequent centuries, mapmakers made mostly subtle improvements, varying the size and shape of their mountains, for example, to indicate that some were bigger than others.

But cartography became much more sophisticated during the Renaissance. Topographic surveys were done for the first time with compasses, measuring chains, and other instruments, resulting in accurate measurements of height. And mapmakers developed new methods for depicting terrain. One method, called hachuring, used lines to indicate the direction and steepness of a slope. You can see a later example of this in the 1807 map below of the Mexican volcano Pico de Orizaba. Cartographers today refer (somewhat dismissively) to mountains depicted this way as “woolly caterpillars.”

Stunning illusions of depth on maps, techniques for creating depth in two dimensions (think computer monitors), and the history of mapmaking are all reasons to read this post.

What seals it for me is that the quest for the “best” depth illusion continues. It’s not a “solved” problem. (No spoiler, see the post.)

Physical topography to one side, how are you going to bring “depth” to your topic map?

Some resources in a topic map may have great depth and others, unfortunately, may be like Wikipedia articles marked as:

This article has multiple issues.

How do you define and then enable navigation of your topic maps?

D3 in Depth

Saturday, August 27th, 2016

D3 in Depth by Peter Cook.

From the introduction:

D3 is an open source JavaScript library for:

  • data-driven manipulation of the Document Object Model (DOM)
  • working with data and shapes
  • laying out visual elements for linear, hierarchical, network and geographic data
  • enabling smooth transitions between user interface (UI) states
  • enabling effective user interaction

Let’s unpick these one by one.

Peter forgets to mention that there will be illustrations:

[graphic omitted: tree view]

Same data as a packed circle:

[graphic omitted: packed circle view]

Same data as a treemap:

[graphic omitted: treemap view]

The first two chapters are up and I’m waiting for more!

You?

PS: Follow Peter at: @animateddata.

Hair Ball Graphs

Friday, August 26th, 2016

An example of a non-useful “hair ball” graph visualization:

[graphic omitted: “hair ball” standard layout]

That image is labeled as “standard layout” at a site that offers this cohesion-adapted layout alternative:

[graphic omitted: cohesion-adapted layout]

The full-size image is quite impressive.

If you were attempting to visualize vulnerabilities, which one would you pick?
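
If you want to experiment with escaping the hair ball yourself, grouping nodes before layout goes a long way. A small networkx sketch of the idea, using a random graph with planted communities as stand-in data:

```python
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# A random graph with planted communities as stand-in data.
G = nx.planted_partition_graph(4, 25, p_in=0.3, p_out=0.01, seed=42)

# One shell per detected community; structure becomes visible.
communities = greedy_modularity_communities(G)
shells = [sorted(c) for c in communities]

fig, axes = plt.subplots(1, 2, figsize=(12, 6))
nx.draw(G, nx.spring_layout(G, seed=1), ax=axes[0], node_size=20)
axes[0].set_title("Spring layout (hair ball)")
nx.draw(G, nx.shell_layout(G, nlist=shells), ax=axes[1], node_size=20)
axes[1].set_title("Shell layout by community")
plt.show()
```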

The Ethics of Data Analytics

Sunday, August 21st, 2016

The Ethics of Data Analytics by Kaiser Fung.

Twenty-one slides on ethics by Kaiser Fung, author of Junk Charts (a data visualization blog) and Big Data, Plainly Spoken (comments on media use of statistics).

Fung challenges you to reach your own ethical decisions and acknowledges there are a number of guides to such decision making.

Unfortunately, Fung does not include professional responsibility requirements, such as the now outdated Canon 7 of the ABA Model Code of Professional Responsibility:

A Lawyer Should Represent a Client Zealously Within the Bounds of the Law

That canon has a much-storied history, which is capably summarized in Whatever Happened To ‘Zealous Advocacy’? by Paul C. Sanders.

In what became known as Queen Caroline’s Case, the House of Lords sought to dissolve the marriage of King George IV

[portrait omitted: George IV, 1821]

to Queen Caroline

[portrait omitted: Caroline of Brunswick, 1795]

on the grounds of her adultery, effectively removing her as queen of England.

Queen Caroline was represented by Lord Brougham, who had evidence of a secret prior marriage by King George IV to a Catholic, Mrs Fitzherbert (such a marriage was illegal).

[portrait omitted: Mrs Maria Fitzherbert, wife of George IV]

Brougham’s speech is worth reading in full, but the portion most often cited for zealous defense reads as follows:


I once before took leave to remind your lordships — which was unnecessary, but there are many whom it may be needful to remind — that an advocate, by the sacred duty of his connection with his client, knows, in the discharge of that office, but one person in the world, that client and none other. To save that client by all expedient means — to protect that client at all hazards and costs to all others, and among others to himself — is the highest and most unquestioned of his duties; and he must not regard the alarm, the suffering, the torment, the destruction, which he may bring upon any other; nay, separating even the duties of a patriot from those of an advocate, he must go on reckless of the consequences, if his fate it should unhappily be, to involve his country in confusion for his client.

The name Mrs. Fitzherbert never slips Lord Brougham’s lips, but the House of Lords has been warned that might not remain the case, should it choose to proceed. The House of Lords did grant the divorce but didn’t enforce it. A saving fact, one supposes. Queen Caroline died less than a month after the coronation of George IV.

For data analysis, cybersecurity, or any of the other topics I touch on in this blog, I take the last line of Lord Brougham’s speech:

To save that client by all expedient means — to protect that client at all hazards and costs to all others, and among others to himself — is the highest and most unquestioned of his duties; and he must not regard the alarm, the suffering, the torment, the destruction, which he may bring upon any other; nay, separating even the duties of a patriot from those of an advocate, he must go on reckless of the consequences, if his fate it should unhappily be, to involve his country in confusion for his client.

as the height of professionalism.

Post-engagement of course.

If ethics are your concern, have that discussion with your prospective client before you are hired.

Otherwise, clients have goals and the task of a professional is how to achieve them. Nothing more.

National Food Days

Thursday, August 18th, 2016

All the National Food Days by Nathan Yau.

Nathan has created an interactive calendar of all the U.S. national food days.

Here is a non-working replica to entice you to see his interactive version:

[graphic omitted: national food days calendar; see post]

What’s with July having a national food day every day?

Lobby for your favorite food and month!

Eduard Imhof – Swiss Cartographer (Video)

Thursday, August 11th, 2016

Eduard Imhof – Swiss Cartographer

A TV documentary on the Swiss cartographer Eduard Imhof.

In Swiss German but this English sub-title caught my eye:

But what can be extracted again from the map is also important.

A concern that should be voiced with attractive but complex visualizations.

The production of topographical maps at differing scales is a recurring theme in the video.

How to visualize knowledge at different scales is an open question. Not to mention an important one as more data becomes available for visualization.

Imhof tells a number of amusing anecdotes, including answering the question: Which two cantons in Switzerland have the highest density of pigs?

Enjoy!

For background:

Virtual Library Eduard Imhof

Eduard Imhof (1895-1986) was professor of cartography at the Swiss Federal Institute of Technology Zurich from 1925 to 1965. His fame far beyond the Institute of Technology was based on his school maps and atlases. In 1995 it was 100 years since his birthday. On this occasion several exhibitions celebrated his life and work, among others in Zurich, Bern, Bad Ragaz, Küsnacht/ZH, Barcelona, Karlsruhe and Berlin. The last such exhibition took place in summer 1997 in the Graphische Sammlung of the ETH. There it was possible to show a large number of maps and pictures in the original. At the conclusion of the exhibition Imhof’s family bequeathed his original works to the ETH-Bibliothek Zurich. Mrs. Viola Imhof, the widow of Eduard Imhof, being very much attached to his work, had a major part in making it accessible to the public.

Imhof wie ein Kartographische Rockstar (Imhof as a Cartographic Rockstar)

Eduard Imhof was born in Schiers on 25 Jan 1895 to the geographer Dr. Eduard Imhof and his wife Sophie.1 At the age of 19 he enrolled in ETH Zürich,2 and after several interruptions for military service, was awarded a geodesist/surveyor diploma in 1919.

He returned to ETH as an assistant to his mentor Prof. Fridolin Becker, himself a cartographic god widely viewed as the inventor of the Swiss style shaded relief map.3 In 1925, the year after Becker’s death, Imhof became an assistent professor and founded the Kartographische Institut (Institute of Cartography). Although the Institute was initially little more than a hand-painted sign above his small office, it was nevertheless the first of its kind in the world.

In 1925 he produced his first major work – the Schulkarte der Schweiz 1:500 000 (the School map of Switzerland). Over the years he would update the national school map several times as well as produce school maps for nearly half of the cantons in the Federation. He even did the school map for the Austrian Bundesländer of Vorarlberg. (footnotes omitted)

Failure of Thinking and Visualization

Wednesday, August 10th, 2016

Richard Bejtlich posted this image (thumbnail, select for full size) with the note:

When I see senior military schools create slides like this, I believe PPT is killing campaign planning. @EdwardTufte

[slide omitted: center of gravity analysis slide; see post]

I am loath to defend PPT, but the problem here lies with the author and not PPT.

Or quite possibly with the concept of “center of gravity analysis.”

Whatever your opinion about the imperialistic use of U.S. military force, 😉 , the U.S. military is composed of professional warriors who study their craft in great detail.

On the topic “center of gravity analysis,” try Addressing the Fog of COG: Perspectives on the Center of Gravity in US Military Doctrine, Celestino Perez, Jr., General Editor. A no-holds barred debate by military professionals on COG.

With or without a background on COG, how do your diagrams compare to this one?

A Taxonomic Map of Philosophy

Wednesday, August 10th, 2016

A Taxonomic Map of Philosophy by Justin W.

From the post:

Some people go to PhilPapers, get the information they need, and then just go. Not Valentin Lageard, a graduate student in philosophy at Université Paris-Sorbonne. The Categories page at the site caught his eye. He says:

The completeness of their taxonomy was striking and I thought: “Could it be possible to map this taxonomy?” I decided it was a nice idea and I started to work on it.

The first step was to select the kind of graph, and since their taxonomy includes a hierarchy permitting sub-categories to be children of more than one parent category, I selected a concentric circles graph.

Because I’m a Python user, I chose Networkx for the graph part and BeautifulSoup for the scraping part. Furthermore, since PhilPapers gives the article count for each category, I decided to add this data to my graph.

After some configuration of the display, I finally reached my goal: a map of the taxonomy of philosophy. And it was quite beautiful.

Agreed.

[See update, below, for the more detailed 5-layer version]


NEW UPDATE: Here is the 5-layer version. You can view it in more detail here (open it in a new tab or window for best results).

Impressive but is it informative?

In order to read the edges, I had to magnify the graph to several times its original size, which made navigation problematic.

Despite the beauty of the image, a graph file that enables filtering of nodes and edges would be far more useful for exploring the categories as well as the articles therein.
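
That graph file is straightforward to produce from the same pipeline. A rough sketch of the networkx half, with invented (parent, child, article count) rows standing in for the scraped Categories page, ending in a GEXF export you can filter interactively in Gephi:

```python
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical rows scraped from the PhilPapers Categories page.
rows = [
    ("Philosophy", "Metaphysics", 12000),
    ("Philosophy", "Epistemology", 9000),
    ("Metaphysics", "Causation", 2500),
    ("Metaphysics", "Free Will", 3100),
    ("Epistemology", "Skepticism", 1800),
]

G = nx.DiGraph()
for parent, child, count in rows:
    G.add_node(child, articles=count)
    G.add_edge(parent, child)

# Concentric circles: one shell per depth from the root category.
depths = nx.shortest_path_length(G.to_undirected(), "Philosophy")
max_depth = max(depths.values())
shells = [[n for n, d in depths.items() if d == level]
          for level in range(max_depth + 1)]
pos = nx.shell_layout(G, nlist=shells)

nx.draw(G, pos, with_labels=True, node_size=300, font_size=8)
plt.show()

# Exporting lets you filter nodes/edges interactively in Gephi.
nx.write_gexf(G, "philosophy_taxonomy.gexf")
```

Loaded into Gephi, the GEXF file gives you interactive filtering by depth or article count.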

For example:

[graphic omitted: PhilPapers category detail; see post]

If you are wondering what falls under “whiteness,” it apparently includes studies of “whiteness” in the racial sense but also authors whose surname is “White.”

As the top of the categories page for whiteness advises:

This category needs an editor. We encourage you to help if you are qualified.

Caution: You may encounter resources at PhilPapers that render you unable to repeat commonly held opinions. Read at your own risk.

Enjoy!

Node XL (641 Pins)

Friday, August 5th, 2016

Node XL

Just a quick sample:

[graphic omitted: sample of NodeXL pins; see post]

That’s only a sample; another 629 await your viewing (perhaps more by the time you read this post).

I have a Pinterest account, but this is the first set of pins I have chosen to follow.

Suggestions of similar visualization boards at Pinterest?

Enjoy!

OnionRunner, ElasticSearch & Maltego

Wednesday, August 3rd, 2016

OnionRunner, ElasticSearch & Maltego by Adam Maxwell.

From the post:

Last week Justin Seitz over at automatingosint.com released OnionRunner which is basically a python wrapper (because Python is awesome) for the OnionScan tool (https://github.com/s-rah/onionscan).

At the bottom of Justin’s blog post he wrote this:

For bonus points you can also push those JSON files into Elasticsearch (or modify onionrunner.py to do so on the fly) and analyze the results using Kibana!

Always being up for a challenge, I’ve done just that. The onionrunner.py script outputs each scan result as a json file; you have two options for loading this into ElasticSearch. You can either load your results after you’ve run a scan or you can load them into ElasticSearch as a scan runs. Now this might sound scary but it’s not, let’s tackle each option separately.

A great enhancement to Justin’s original OnionRunner!
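
For the load-after-a-scan option, the bulk helper in the Elasticsearch Python client covers the basics. A sketch, assuming one JSON result file per host in ./onionscan_results/ (directory and index name are made up):

```python
import json
import os

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumes a local node

RESULTS_DIR = "onionscan_results"  # hypothetical output directory

def generate_docs():
    """Yield one bulk action per OnionScan JSON result file."""
    for filename in os.listdir(RESULTS_DIR):
        if not filename.endswith(".json"):
            continue
        with open(os.path.join(RESULTS_DIR, filename)) as f:
            doc = json.load(f)
        yield {"_index": "onionscan", "_source": doc}

ok, errors = helpers.bulk(es, generate_docs(), raise_on_error=False)
print("indexed {} documents, {} errors".format(ok, len(errors)))
```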

You will need a version of Maltego to perform the visualization as described. Not a bad idea to become familiar with Maltego in general.

Data is just data, until it is analyzed.

Enjoy!

Interactive 3D Clusters of all 721 Pokémon Using Spark and Plotly

Wednesday, August 3rd, 2016

Interactive 3D Clusters of all 721 Pokémon Using Spark and Plotly by Max Woolf.

[graphic omitted: interactive 3D Pokémon clusters; see post]

My screen capture falls far short of doing justice to the 3D image, not to mention it isn’t interactive. See Max’s post if you really want to appreciate it.

From the post:

There has been a lot of talk lately about Pokémon due to the runaway success of Pokémon GO (I myself am Trainer Level 18 and on Team Valor). Players revel in the nostalgia of 1996 by now having the ability to catch the original 151 Pokémon in real life.

However, while players most-fondly remember the first generation, Pokémon is currently on its sixth generation, with the seventh generation beginning later this year with Pokémon Sun and Moon. As of now, there are 721 total Pokémon in the Pokédex, from Bulbasaur to Volcanion, not counting alternate Forms of several Pokémon such as Mega Evolutions.

In the meantime, I’ve seen a few interesting data visualizations which capitalize on the frenzy. A highly-upvoted post on the Reddit subreddit /r/dataisbeautiful by /u/nvvknvvk charts the Height vs. Weight of the original 151 Pokémon. Anh Le of Duke University posted a cluster analysis of the original 151 Pokémon using principal component analysis (PCA), by compressing the 6 primary Pokémon stats into 2 dimensions.

However, those visualizations think too small, and only on a small subset of Pokémon. Why not capture every single aspect of every Pokémon and violently crush that data into three dimensions?

If you need encouragement to explore the recent release of Spark 2.0, Max’s post offers that in abundance!
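
The general recipe travels well even without Spark: standardize the stats, crush them to three dimensions, cluster, and plot. A minimal scikit-learn/Plotly sketch (random numbers stand in for the real Pokédex table):

```python
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for the 721 x 6 table of base stats.
rng = np.random.default_rng(0)
stats = pd.DataFrame(
    rng.integers(20, 160, size=(721, 6)),
    columns=["hp", "attack", "defense", "sp_atk", "sp_def", "speed"],
)

# Violently crush six dimensions down to three, then cluster.
scaled = StandardScaler().fit_transform(stats)
coords = PCA(n_components=3).fit_transform(scaled)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(coords)

fig = px.scatter_3d(
    x=coords[:, 0], y=coords[:, 1], z=coords[:, 2],
    color=labels.astype(str),
    labels={"x": "PC1", "y": "PC2", "z": "PC3"},
    title="3D PCA clusters (stand-in data)",
)
fig.show()
```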

Caveat: Pokémon is popular outside of geek/IT circles. Familiarity with Pokémon may result in social interaction with others and/or interest in Pokémon. You have been warned.

Who Chose Trump and Clinton?

Monday, August 1st, 2016

If you have been wondering who is responsible for choosing Trump and Clinton as the presidential nominees in 2016, you will find Only 9% of America Chose Trump and Clinton as the Nominees by Alicia Parlapiano and Adam Pearce quite interesting.

Using a fixed grid on the left-hand side of the page that represents 324 million Americans (1 square = 1 million people), the article inscribes boundaries on the grid for a series of factual statements.

For example, the first statement after the grid reads:

103 million of them are children, noncitizens or ineligible felons, and they do not have the right to vote.

For that statement, the grid displays:

[graphic omitted: grid with the 103 million ineligible marked]

An excellent demonstration that effective visualization requires a lot of thought and not necessarily graphics that jump and buzz with every movement of the mouse.

Successive statements reduce the area of people who voted in the primaries and even further by who voted for Trump or Clinton.

Eventually you are left with the 9% who chose the current nominees.
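
The device itself, a waffle grid where one square is one million people, takes only a few lines to reproduce. A hypothetical matplotlib sketch using the article’s first statement:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

TOTAL = 324       # 324 squares, 1 square = 1 million Americans
INELIGIBLE = 103  # children, noncitizens, ineligible felons

cols = 18
rows = TOTAL // cols  # an 18 x 18 grid
grid = np.zeros(TOTAL)
grid[:INELIGIBLE] = 1  # mark the ineligible region
grid = grid.reshape(rows, cols)

plt.figure(figsize=(6, 6))
plt.pcolormesh(grid, edgecolors="white", linewidth=1,
               cmap=ListedColormap(["#d0d0d0", "#2166ac"]))
plt.gca().invert_yaxis()  # fill from the top, as in the article
plt.axis("off")
plt.title("324 million Americans; 103 million cannot vote")
plt.show()
```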

To be safe, you need 5% of the voting population to secure the nomination. Check the voting rolls for who votes in primaries and pay them directly. Cheaper than media campaigns and has the added advantage of not annoying the rest of the electorate with your ads.

If that sounds “undemocratic,” tell me what definition of democracy you are using where 9% of the population chooses the candidates and a little more than 30% will choose the winner.