Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 2, 2015

CartoDB and Plotly Analyze Earthquakes

Filed under: CartoDB,Mapping,Maps,Plotly — Patrick Durusau @ 8:06 pm

CartoDB and Plotly Analyze Earthquakes

From the post:

CartoDB lets you easily make web-based maps driven by a PostgreSQL/PostGIS backend, so data management is easy. Plotly is a cloud-based graphing and analytics platform with Python, R, & MATLAB APIs where collaboration is easy. This IPython Notebook shows how to use them together to analyze earthquake data.

Assuming your data/events have geographic coordinates, this post should enable you to plot that information as easily as earthquakes.

For example, if you had traffic accident locations, delays caused by those accidents and weather conditions, you could plot where the most disruptive accidents happen and the weather conditions in which they occur.
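To make that concrete, here is a minimal Python sketch (mine, not from the post) of plotting point events on a world map with Plotly. The file name and column names are hypothetical, and the post's notebook targets the 2015 Plotly API rather than the current plotly package, so treat this as an illustration only.

import pandas as pd
import plotly.graph_objects as go

# Hypothetical CSV with one row per event and "lat", "lon", "mag" columns.
events = pd.read_csv("earthquakes.csv")

fig = go.Figure(
    go.Scattergeo(
        lon=events["lon"],
        lat=events["lat"],
        mode="markers",
        marker=dict(size=events["mag"] * 2, opacity=0.6),
        text=events["mag"].astype(str),   # hover text
    )
)
fig.update_layout(title="Event locations", geo=dict(showland=True))
fig.show()

Swap the magnitude column for delay times or damage estimates and the same few lines cover the traffic accident example.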

Drilling Down: A Quick Guide to Free and Inexpensive Data Tools

Filed under: Data Mining,Journalism,News,Reporting — Patrick Durusau @ 7:35 pm

Drilling Down: A Quick Guide to Free and Inexpensive Data Tools by Nils Mulvad.

From the post:

Newsrooms don’t need large budgets for analyzing data–they can easily access basic data tools that are free or inexpensive. The summary below is based on a five-day training session at Delo, the leading daily newspaper in Slovenia. Anuška Delić, journalist and project leader of DeloData at the paper, initiated the training with the aim of getting her team to work on data stories with easily available tools and a lot of new data.

“At first it seemed that not all of the 11 participants, who had no or almost no prior knowledge of this exciting field of journalism, would ‘catch the bug’ of data-driven thinking about stories, but soon it became obvious” once the training commenced, said Delić.

Encouraging story about data journalism as well as a source for inexpensive tools.

Even knowing the most basic tools will make you stand out from people who repeat the government or party line (depending on where you are located).

Code for DeepMind & Commentary

Filed under: Deep Learning,Machine Learning — Patrick Durusau @ 6:59 pm

If you are following the news of Google’s Atari buster, ;-), the following items will be of interest:

Code for Human-Level Control through Deep Reinforcement Learning, which offers the source code to accompany the Nature article.

DeepMind’s Nature Paper and Earlier Related Work by Jürgen Schmidhuber. Jürgen takes issue with some of the claims made in the abstract of the Nature paper. Quite usefully he cites references and provides links to numerous other materials on deep learning.

How soon before this comes true?

In an online multiplayer game, no one knows you are an AI.

Azure Machine Learning Videos: February 2015

Filed under: Azure Marketplace,Machine Learning — Patrick Durusau @ 5:59 pm

Azure Machine Learning Videos: February 2015 by Mark Tabladillo.

From the post:

With the general availability of Azure Machine Learning, Microsoft released a collection of eighteen new videos which accurately summarize what the product does and how to use it. Most of the videos are short, and some of the material overlaps: I don’t have a recommended order, but you could play the shorter ones first. In all cases, you can download a copy of each video for your own library or offline use.

Eighteen new videos of varying lengths; the shortest and longest are:

Getting Started with Azure Machine Learning – Step3 (35 seconds).

Preprocessing Data in Azure Machine Learning Studio (10 minutes 52 seconds).

Believe it or not, it is possible to say something meaningful in 35 seconds. Not a lot but enough to suggest an experiment based on information from a previous module.

For those of you on the MS side of the house or anyone who likes a range of employment options.

Enjoy!

Operationalizing a Hadoop Eco-System

Filed under: Hadoop,Hive — Patrick Durusau @ 4:04 pm

Operationalizing a Hadoop Eco-System (Part 1: Installing & Configuring a 3-node Cluster) by Louis Frolio.

From the post:

The objective of DataTechBlog is to bring the many facets of data, data tools, and the theory of data to those curious about data science and big data. The relationship between these disciplines and data can be complex. However, if careful consideration is given to a tutorial, it is a practical expectation that the layman can be brought online quickly. With that said, I am extremely excited to bring this tutorial on the Hadoop Eco-system. Hadoop & MapReduce (at a high level) are not complicated ideas. Basically, you take a large volume of data and spread it across many servers (HDFS). Once at rest, the data can be acted upon by the many CPU’s in the cluster (MapReduce). What makes this so cool is that the traditional approach to processing data (bring data to cpu) is flipped. With MapReduce, CPU is brought to the data. This “divide-and-conquer” approach makes Hadoop and MapReduce indispensable when processing massive volumes of data. In part 1 of this multi-part series, I am going to demonstrate how to install, configure and run a 3-node Hadoop cluster. Finally, at the end I will run a simple MapReduce job to perform a unique word count of Shakespeare’s Hamlet. Future installments of this series will include topics such as: 1. Creating an advanced word count with MapReduce, 2. Installing and running Hive, 3. Installing and running Pig, 4. Using Sqoop to extract and import structured data into HDFS. The goal is to illuminate all the popular and useful tools that support Hadoop.

Operationalizing a Hadoop Eco-System (Part 2: Customizing Map Reduce)

Operationalizing a Hadoop Eco-System (Part 3: Installing and using Hive)

Be forewarned that Louis suggests hosting three Linux VMs on a fairly robust machine. He worked on a Windows 7 x64 machine with 1 TB of storage and 24G of RAM. (How much of that was used by Windows and Office he doesn’t say. 😉 )

The last post in this series was in April 2014 so you may have to look elsewhere for tutorials on Pig and Sqoop.
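To give a feel for the word count job Louis runs at the end of Part 1, here is a rough Hadoop Streaming sketch in Python. The tutorial itself uses Java MapReduce, so this is only an illustration of the map and reduce steps, not his code.

import sys

def mapper():
    # Emit "word<TAB>1" for every token read from stdin.
    for line in sys.stdin:
        for word in line.strip().lower().split():
            print(word + "\t1")

def reducer():
    # Hadoop sorts mapper output by key, so counts for a word arrive together.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(n)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()

With Hamlet loaded into HDFS you would run it under Hadoop Streaming along the lines of: hadoop jar hadoop-streaming.jar -input /data/hamlet.txt -output /out -mapper "python wordcount.py map" -reducer "python wordcount.py reduce" -file wordcount.py (paths and jar name will vary by installation).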

Enjoy!

‘Keep Fear Alive.’ Keep it alive.

Filed under: Data Mining,Security — Patrick Durusau @ 2:50 pm

Why Does the FBI Have To Manufacture Its Own Plots If Terrorism And ISIS Are Such Grave Threats? by Glenn Greenwald.

From the post:

The FBI and major media outlets yesterday trumpeted the agency’s latest counterterrorism triumph: the arrest of three Brooklyn men, ages 19 to 30, on charges of conspiring to travel to Syria to fight for ISIS (photo of joint FBI/NYPD press conference, above). As my colleague Murtaza Hussain ably documents, “it appears that none of the three men was in any condition to travel or support the Islamic State, without help from the FBI informant.” One of the frightening terrorist villains told the FBI informant that, beyond having no money, he had encountered a significant problem in following through on the FBI’s plot: his mom had taken away his passport. Noting the bizarre and unhinged ranting of one of the suspects, Hussain noted on Twitter that this case “sounds like another victory for the FBI over the mentally ill.”

In this regard, this latest arrest appears to be quite similar to the overwhelming majority of terrorism arrests the FBI has proudly touted over the last decade. As my colleague Andrew Fishman and I wrote last month — after the FBI manipulated a 20-year-old loner who lived with his parents into allegedly agreeing to join an FBI-created plot to attack the Capitol — these cases follow a very clear pattern:

The known facts from this latest case seem to fit well within a now-familiar FBI pattern whereby the agency does not disrupt planned domestic terror attacks but rather creates them, then publicly praises itself for stopping its own plots.

….

In an update to the post, Greenwald quotes former FBI assistant director Thomas Fuentes as saying:

If you’re submitting budget proposals for a law enforcement agency, for an intelligence agency, you’re not going to submit the proposal that “We won the war on terror and everything’s great,” cuz the first thing that’s gonna happen is your budget’s gonna be cut in half. You know, it’s my opposite of Jesse Jackson’s ‘Keep Hope Alive’—it’s ‘Keep Fear Alive.’ Keep it alive. (emphasis in the original)

The FBI-run terror operations lend a ring of validity to the imagined plots that the rest of the intelligence and law enforcement community is alleged to be fighting.

It’s unfortunate that the mainstream media can’t divorce itself from the government long enough to notice the shortage of terrorists in the United States. As in zero, judging from terrorist attacks on government and many other institutions.

For example, the federal, state and local governments employ 21,831,255 people. Let’s see, how many died last year in terrorist attacks against any level of government? Err, that would be 0, empty set, nil.

What about all the local, state, federal elected officials? Certainly federal officials would be targets for terrorists. How many died last year in terrorist attacks? Again, 0, empty set, nil.

Or the 900,000 police officers? Again, 0, empty set, nil. (About 150 police officers die every year in the line of duty. Auto accidents, violent encounters with criminals, etc. but no terrorists.)

That covers some of the likely targets for any terrorist and we came up with zero deaths. Either terrorists aren’t in the United States or their mother won’t let them buy a gun.

Either way, you can see why everyone should be rejecting the fear narrative.

PS: Suggestion: Let’s cut all the terrorist-related budgets in half and, if there are no terrorist attacks within a year, halve them again. Then there would be no budget crisis, we could pay down the national debt, save Social Security and not live in fear.

Beginning deep learning with 500 lines of Julia

Filed under: Deep Learning,Julia,Machine Learning — Patrick Durusau @ 1:43 pm

Beginning deep learning with 500 lines of Julia by Deniz Yuret.

From the post:

There are a number of deep learning packages out there. However most sacrifice readability for efficiency. This has two disadvantages: (1) It is difficult for a beginner student to understand what the code is doing, which is a shame because sometimes the code can be a lot simpler than the underlying math. (2) Every other day new ideas come out for optimization, regularization, etc. If the package used already has the trick implemented, great. But if not, it is difficult for a researcher to test the new idea using impenetrable code with a steep learning curve. So I started writing KUnet.jl which currently implements backprop with basic units like relu, standard loss functions like softmax, dropout for generalization, L1-L2 regularization, and optimization using SGD, momentum, ADAGRAD, Nesterov’s accelerated gradient etc. in less than 500 lines of Julia code. Its speed is competitive with the fastest GPU packages (here is a benchmark). For installation and usage information, please refer to the GitHub repo. The remainder of this post will present (a slightly cleaned up version of) the code as a beginner’s neural network tutorial (modeled after Honnibal’s excellent parsing example).

This tutorial “begins” with you coding deep learning. If you need a bit more explanation on deep learning, you could do far worse than consulting Deep Learning: Methods and Applications or Deep Learning in Neural Networks: An Overview.
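If terms like relu, softmax loss and SGD in Deniz’s list are unfamiliar, here is a tiny NumPy sketch (mine, in Python rather than Julia, and not KUnet.jl code) of one training step for a one-hidden-layer network, just to show how few moving parts are involved:

import numpy as np

# Toy network: x -> relu(W1 x + b1) -> softmax(W2 h + b2), trained with SGD.
rng = np.random.default_rng(0)
n_in, n_hid, n_out, batch = 4, 8, 3, 32
W1 = rng.normal(0, 0.1, (n_hid, n_in))
b1 = np.zeros((n_hid, 1))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
b2 = np.zeros((n_out, 1))

x = rng.normal(size=(n_in, batch))       # fake minibatch of inputs
y = rng.integers(0, n_out, batch)        # fake integer labels

# Forward pass.
h = np.maximum(0, W1 @ x + b1)                                   # relu
scores = W2 @ h + b2
scores -= scores.max(axis=0, keepdims=True)                      # stability
p = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)   # softmax
loss = -np.log(p[y, np.arange(batch)]).mean()                    # cross-entropy

# Backward pass: softmax + cross-entropy gradient, then the relu layer.
dscores = p.copy()
dscores[y, np.arange(batch)] -= 1
dscores /= batch
dW2 = dscores @ h.T
db2 = dscores.sum(axis=1, keepdims=True)
dh = W2.T @ dscores
dh[h <= 0] = 0                                                   # relu gradient
dW1 = dh @ x.T
db1 = dh.sum(axis=1, keepdims=True)

# One SGD step.
lr = 0.1
for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
    param -= lr * grad

print("loss before update:", loss)

Dropout, momentum, ADAGRAD and the rest are refinements layered on top of exactly this loop, which is why 500 readable lines of Julia can cover so much ground.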

If you are already at the programming stage of deep learning, enjoy!

For Julia itself, the Julia homepage, the Julia online manual, and juliablogger.com (a Julia blog aggregator) should be enough to get you started.

I first saw this in a tweet by Andre Pemmelaar.

March 1, 2015

Let Me Get That Data For You (LMGTDFY)

Filed under: Bing,Open Data,Python — Patrick Durusau @ 8:22 pm

Let Me Get That Data For You (LMGTDFY) by U.S. Open Data.

From the post:

LMGTDFY is a web-based utility to catalog all open data file formats found on a given domain name. It finds CSV, XML, JSON, XLS, XLSX, XML, and Shapefiles, and makes the resulting inventory available for download as a CSV file. It does this using Bing’s API.

This is intended for people who need to inventory all data files on a given domain name—these are generally employees of state and municipal government, who are creating an open data repository, and performing the initial step of figuring out what data is already being emitted by their government.

LMGTDFY powers U.S. Open Data’s LMGTDFY site, but anybody can install the software and use it to create their own inventory. You might want to do this if you have more than 300 data files on your site. U.S. Open Data’s LMGTDFY site caps the number of results at 300, in order to avoid winding up with an untenably large invoice for using Bing’s API. (Microsoft allows 5,000 searches/month for free.)

Now there’s a useful utility!
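The core of the idea is small enough to sketch. Assuming you already have a list of URLs for a domain (from Bing, a crawler or a sitemap), the inventory step looks roughly like the following toy Python sketch, which is my illustration and not the LMGTDFY code; the example URLs and the extension list are mine.

import csv
from collections import Counter
from urllib.parse import urlparse

# Data-file extensions to inventory (adjust to taste).
DATA_EXTENSIONS = {".csv", ".xml", ".json", ".xls", ".xlsx", ".shp", ".zip"}

def inventory(urls):
    # Keep (url, format) pairs for URLs whose path ends in a data extension.
    rows = []
    for url in urls:
        path = urlparse(url).path.lower()
        for ext in DATA_EXTENSIONS:
            if path.endswith(ext):
                rows.append((url, ext.lstrip(".")))
                break
    return rows

urls = [
    "http://example.gov/budget2014.csv",
    "http://example.gov/parcels.zip",
    "http://example.gov/index.html",
]
rows = inventory(urls)
print(Counter(fmt for _, fmt in rows))   # Counter({'csv': 1, 'zip': 1})

with open("inventory.csv", "w", newline="") as f:
    csv.writer(f).writerows([("url", "format")] + rows)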

Enjoy!

I first saw this in a tweet by Pycoders Weekly.

John Snow, and OpenStreetMap

Filed under: Medical Informatics,Open Street Map — Patrick Durusau @ 6:06 pm

John Snow, and OpenStreetMap by Arthur Charpentier.

From the post:

[Image from the post: John Snow cholera map illustration]

While I was working for a training on data visualization, I wanted to get a nice visual for John Snow’s cholera dataset. This dataset can actually be found in a great package of famous historical datasets.

You know the story, right? Cholera epidemic in Soho, London, 1854. After Snow established that the Broad Street water pump was at the center of the outbreak, the Broad Street pump handle was removed.

But the story doesn’t end there, Wikipedia notes:

After the cholera epidemic had subsided, government officials replaced the Broad Street pump handle. They had responded only to the urgent threat posed to the population, and afterward they rejected Snow’s theory. To accept his proposal would have meant indirectly accepting the oral-fecal method transmission of disease, which was too unpleasant for most of the public to contemplate.

Government has been looking out for public opinion, rather than public health and well-being, for quite some time.

Replicating the Snow analysis is important, but it is even more important to realize that the equivalents of cholera are present in modern urban environments: not cholera itself so often, but street violence, bad drugs, high-interest loans, food deserts, lack of child care and the like.

What if a John Snow-like mapping demonstrated that living in particular areas made you some N% more likely to spend X years in a state prison? Do you think that would affect the property values of housing owned by slum lords? Or impact the allocation of funds for schools and libraries?

Enjoy!

Big Data Never Sleeps

Filed under: BigData — Patrick Durusau @ 5:34 pm

[Image: Big Data Never Sleeps infographic]

Suggestion: Enlarge and print out this graphic on 8 1/2 x 11 (or A4 outside of the US) paper. When “big data” sales people come calling, hand them a copy of it and ask them to outline the relevancy of any of the shown data to your products and business model.

Don’t get me wrong, there are areas, foreseen and unforeseen, where big data is going to have unimaginable impacts.

However, big data solutions will be sold where appropriate and where not. The only way to protect yourself is to ask the same questions of big data sales people as you would of any vendor selling you more conventional equipment for your business. What is the cost? What benefits do you gain? How does it impact your profit margins? Will it result in new revenue streams and what has been the experience of others with those streams?

Or do you want to be YouTube, still not making a profit? If you like churn, perhaps so, but churn is a business model for hedge fund managers, for the most part.

I first saw this in a tweet by Veronique Milsant.

Clojure and Overtone Driving Minecraft

Filed under: Clojure,Games,Music — Patrick Durusau @ 4:53 pm

Clojure and Overtone Driving Minecraft by Joseph Wilk.

From the post:

Using Clojure we create interesting 3D shapes in Minecraft to the beat of music generated from Overtone.

We achieve this by embedding a Clojure REPL inside a Java Minecraft server which loads Overtone and connects to an external Supercollider instance (What Overtone uses for sound).

Speaking of functional programming, you may find this useful.

The graphics and music are impressive!

Help! Lost Source! (for story)

Filed under: Functional Programming — Patrick Durusau @ 4:35 pm

I read a delightful account of functional versus imperative programming yesterday while in the middle of a major system upgrade. Yes, I can tell by your laughter that you realize I either failed to bookmark the site or lost it somewhere along the way. Yes, I have tried searching for it, but with all the interest in functional programming, I was about as successful as the NSA in predicting the next terror attack.

Let me relate to you as much of it as I remember, in no particular order, and perhaps you will recognize the story. It was quite clever and I want to cite it properly as well as excerpt parts of it for this blog.

The story starts off talking about functional programming and how this is the author’s take on that subject. They start with Turing and the Turing machine and observe that the Turing machine writes down results on a tape, results that are consulted by later operations.

After a paragraph or two, they move on to Church and the lambda calculus. Rather than writing down results, with functional programming the results are passed from function to function.

I thought it was an excellent illustration of why Turing machines have “state” (marks on the tape) whereas functional languages (in theory at any rate) have none.

Other writers have made the same distinction, but I found the author’s focus on whether results are captured or not to be the clearest I have seen.
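To make that distinction concrete (my toy illustration in Python, not from the lost post): the imperative version keeps writing its intermediate result into mutable state, the way a Turing machine marks its tape, while the functional version stores nothing and simply hands each result to the next call.

# Imperative: the intermediate result is "written down" and overwritten in place.
def total_imperative(numbers):
    running = 0                  # the "tape"
    for n in numbers:
        running = running + n    # overwrite the recorded result
    return running

# Functional: no stored state; each partial result is passed along.
from functools import reduce

def total_functional(numbers):
    return reduce(lambda acc, n: acc + n, numbers, 0)

print(total_imperative([1, 2, 3]))   # 6
print(total_functional([1, 2, 3]))   # 6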

My impression is that the piece was fairly recent, in the last month or two, but I could be mistaken in that regard. It was a blog post and not terribly long. (So exclude published articles and the like.)

Are you the author? Know of the author?

Pointers are most welcome!
