Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 27, 2015

A Certain Tendency Of The Database Community

Filed under: Consistency,Database,Networks — Patrick Durusau @ 7:16 pm

A Certain Tendency Of The Database Community by Christopher Meiklejohn.

From the post:

Abstract

We posit that striving for distributed systems that provide “single system image” semantics is fundamentally flawed and at odds with how systems operate in the physical world. We realize the database as an optimization of this system: a required, essential optimization in practice that facilitates central data placement and ease of access to participants in a system. We motivate a new model of computation that is designed to address the problems of computation over “eventually consistent” information in a large-scale distributed system.

Eventual Consistency

When we think about the world we live in, we do not usually say it is eventually consistent, for this is a term usually applied to computing systems, made up of multiple machines, that have to operate with shared information.

Eventual consistency is a consistency model for replicated, shared state. A consistency model is a contract between an application developer and a system that application will run on. A contract between a developer and a system states the following: given the developer follows the rules defined by the system, certain outcomes from the system are guaranteed. This makes it possible for developers to build successful applications, for without this contract, applications would have no guarantee that the actions they perform would have a correct outcome.

(italics in original)

A very accessible and great read on “eventual consistency.”

Christopher points out that any “state” of knowledge is a snapshot under a given set of constraints:

For instance, if the leading researchers on breast cancer were to document the state-of-the-art in a book, as the document is being written it would no longer reflect the state-of-the-art. The collective knowledge of this group is always changing, and as long as we continue to rewrite the document it will only be approaching the combined knowledge of the group. We can think of this somewhat formally: if we had a way to view the group’s knowledge as an omniscient observer and we represent that knowledge as a linear function, the recorded text would be asymptotic to the function of the sum of global knowledge.

He concludes with this question:

…Can we build computational abstractions that allow devices to communicate peer-to-peer, acknowledging the true source of truth for a particular piece of information and scale to the amount of information that exists, not only between all computers in a planetary-scale distributed system, but all entities in the universe[?]

I’m not sure about “all entities in the universe,” or even a “planetary-scale distributed system,” but we do know that NetWare Directory Services (NDS), now eDirectory, was a replicated, distributed, sharded database with eventual convergence, first shipped in 1993.

We have had the computational abstractions for a replicated, distributed, sharded database with eventual convergence for a number of years.
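To make “eventual convergence” concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest convergent replicated data types; the two-replica scenario is illustrative:

    # Each replica increments only its own slot; merging takes element-wise
    # maxima, so replicas that exchange state converge regardless of the
    # order in which updates and merges arrive.
    class GCounter:
        def __init__(self, replica_id, n_replicas):
            self.replica_id = replica_id
            self.counts = [0] * n_replicas

        def increment(self):
            self.counts[self.replica_id] += 1

        def merge(self, other):
            self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

        def value(self):
            return sum(self.counts)

    a, b = GCounter(0, 2), GCounter(1, 2)
    a.increment(); a.increment(); b.increment()
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 3  # replicas agree once state is exchanged

The “contract” Christopher describes is exactly this: follow the rules (increment only your own slot, merge with max) and convergence is guaranteed.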

I would adjust Christopher’s “true source of truth” to “source of truth as defined by users,” to avoid the one-world-truth position that crippled the Semantic Web even before FOL and RDF syntax arrived.

October 26, 2015

How to Get Free Access to Academic Papers on Twitter [3 Rules]

Filed under: Computer Science,Twitter — Patrick Durusau @ 8:56 pm

How to Get Free Access to Academic Papers on Twitter by Aamna Mohdin.

From the post:

Most academic journals charge expensive subscriptions and, for those without a login, fees of $30 or more per article. Now academics are using the hashtag #icanhazpdf to freely share copyrighted papers.

Scientists are tweeting a link of the paywalled article along with their email address in the hashtag—a riff on the infamous meme of a fluffy cat’s “I Can Has Cheezburger?” line. Someone else who does have access to the article downloads a pdf of the paper and emails the file to the person requesting it. The initial tweet is then deleted as soon as the requester receives the file.

3 rules to remember:

  1. Paywall link + #icanhazpdf + your email.
  2. Delete tweet when paper arrives.
  3. Don’t ask/Don’t tell.

Enjoy!

Avoiding Big Data: More Business Intelligence Than You Would Think

Filed under: BigData,Business Intelligence — Patrick Durusau @ 8:29 pm

Observing that boosters of “big data” are in a near panic about the slow adoption of “big data” technologies requires no reference.

A recent report from Iron Mountain and PwC may shed some light on the reasons for slow adoption of “big data”:

[Iron Mountain/PwC chart: two-thirds of businesses extract little or no value from their information]

If you are in the 66% that extract little or no value from your data, it makes no business sense to buy into “big data” when you can’t derive value from the data you already have.

Does anyone seriously disagree with that statement? Other than people marketing services whether the client benefits or not.

The numbers get even worse:

From the executive summary:

We have identified a large ‘misguided majority’ – three in four businesses (76%) that are either constrained by legacy, culture, regulatory data issues or simply lack any understanding of the potential value held by their information. They have little comprehension of the commercial benefits to be gained and have therefore not made the investment required to obtain the information advantage.

Now we are up to 3/4 of the market that could not benefit from “big data” tools if they dropped from the sky tomorrow.

To entice you to download Seizing the Information Advantage (the full report):

Typical attributes and behaviours of the mis-guided majority

  • Information and exploitation of value from information is not a priority for senior leadership
  • An information governance oversight body, if it exists, is dominated by IT
  • Limited appreciation of how to exploit their information or the business benefits of doing so
  • Progress is allowed to be held back by legacy issues, regulatory issues and resources
  • Where resources are deployed to exploit information, this is often IT led, and is not linked to the overall business strategy
  • Limited ability to identify, manage and merge large amounts of data sources
  • Analytical capability may exist in the business but is not focused on information value
  • Excessive use of Excel spreadsheets with limited capacity to extract insight

Hmmm, eight attributes and behaviours of the misguided majority (76%), and how many of those issues are addressed by big data technology?

Err, one. Yes?

Limited ability to identify, manage and merge large amounts of data sources

The other seven (7) attributes or behaviours that impede business from deriving value from data have little or no connection to big data technology.

Those are management, resources and social issues that no big data technology can address.

Avoidance of adoption of big data technology reveals a surprising degree of “business intelligence” among those surveyed.

A number of big data technologies will be vital to business growth, but only if the management and human issues that stand in the way of their effective use are addressed.

Put differently, investment in big data technologies without addressing related management and human issues is a waste of resources. (full stop)


The report wasn’t all that easy to track down on the Iron Mountain site, so here are some useful links:

Executive Summary

Seizing the Information Advantage (“free” but you have to give up your contact information)

Infographic Summary


I first saw this at: 96% of Businesses Fail to Unlock Data’s Full Value by Bob Violino. Bob did not include a link to the report or sufficient detail to be useful.

46-billion-pixel Photo is Largest Astronomical Image of All Time

Filed under: Astroinformatics,BigData — Patrick Durusau @ 7:28 pm

46-billion-pixel Photo is Largest Astronomical Image of All Time by Suzanne Tracy.

From the post:

With 46 billion pixels, a 194 gigabyte file size and numerous stars, a massive new Milky Way photo has been assembled from astronomical observation data gathered over a five-year period.

Astronomers headed by Prof. Dr. Rolf Chini have been monitoring our Galaxy in a search for objects with variable brightness. The researchers explain that these phenomena may, for example, include stars in front of which a planet is passing, or may include multiple systems where stars orbit each other and where the systems may obscure each other at times. The researchers are analyzing how the brightness of stars changes over long stretches of time.

Now, using an online tool, any interested person can

  • view the complete ribbon of the Milky Way at a glance
  • zoom in and inspect specific areas
  • use an input window, which provides the position of the displayed image section, to search for specific objects. (i.e. if the user types in “Eta Carinae,” the tool moves to the respective star; entering the search term “M8” leads to the lagoon nebula.)

You can view the entire Milky Way photo at http://gds.astro.rub.de and read more on the search for variable objects at http://rubin.rub.de/en/sky-more-crowded-we-thought?.

Great project and a fun read for anyone interested in astronomy!

For big data types, confirmation that astronomy remains in the lead with regard to making big data and the power to process that big data freely available to all comers.

I first saw this in a tweet by Kirk Borne.

perv_magnet (10 years of troll abuse published by violinist)

Filed under: Privacy — Patrick Durusau @ 2:24 pm

10 years of troll abuse published by violinist by Lisa Vaas.

From the post:


Matsumiya, who’s based in Los Angeles, describes herself as a violinist and a “perv magnet.” Reading the messages published on Instagram under her perv_magnet account, it’s obvious that she’s not exaggerating.

She’s using Instagram to demonstrate the violence, aggression and volume of messages she’s received and captured via screenshot over 10 years, in an effort to show how relentlessly women are abused online.

I won’t quote any of the abusive messages but you can see the entire collection at: perv_magnet.

Lisa does a good job of covering the issue and possible responses and then says:

The rest of us non-celebrities must bear in mind that taking on trolls can have extremely dangerous consequences.

I’m thinking here of swatting.

Unless you are cultivating marijuana in the basement with grow lights or running a meth lab, I’m not sure how dangerous “swatting” is to an ordinary citizen.

To be sure, don’t make sudden moves when heavily armed police burst into your home but how many of us are likely to be cleaning an AK-47 or RPG when that happens?

The real danger in dealing with trolls is going it alone. One on one, since trolls don’t appear to have any life beyond their closets and keyboards, a troll can spend more time on you than you can possibly spend replying to them.

But what if there were a network of troll fighters? Could the average troll deal with five people responding? What about 50? For really troublesome trolls, how about 500? Or more? In combination with hackers who push back against them in the real world.

No one wants government regulation of the web, so it falls to its users to reduce any cause for government intervention.

Simple ignorance on my part, but are there any anti-troll networks currently in operation? I would like to volunteer some cycles to their efforts.

PS: I am untroubled by the “freedom of speech” argument of trolls. In the struggle between abusers and the abused, I chose sides a long time ago. Helps get down to concrete cases without a lot of theoretical machinery.

Software Vendors: You have been Pwned by the DoJ!

Filed under: Cybersecurity,Security — Patrick Durusau @ 1:40 pm

DoJ to Apple: your software is licensed, not sold, so we can force you to decrypt by Cory Doctorow.

Cory summarizes the latest diseased imaginings from the minds at the DoJ in their effort to compel Apple to assist in bypassing the security of an iPhone.

The basis for pwning every software vendor with a “license” EULA has been posed by the Department of Justice in IN RE ORDER REQUIRING APPLE INC. TO ASSIST IN THE EXECUTION OF A SEARCH WARRANT ISSUED BY THE COURT, No. 15-MC-1902 (JO).

From the brief:

First, Apple is not “so far removed from the underlying controversy that its assistance could not be permissibly compelled.” Apple designed, manufactured, and sold the Target Phone that is the subject of the search warrant. But that is only the beginning of Apple’s relationship to the phone and to this matter. Apple wrote and owns the software that runs the phone, and this software is thwarting the execution of the warrant. Apple’s software licensing agreement specifies that iOS 7 software is “licensed, not sold” and that users are merely granted “a limited non-exclusive license to use the iOS Software.” See “Notices from Apple,” Apple iOS Software License Agreement ¶¶ B(1)-(2), attached hereto as Exhibit C. Apple also restricts users’ rights to sell or lease the iOS Software: although users may make a “one-time permanent transfer of all” license rights, they may not otherwise “rent, lease, lend, sell, redistribute, or sublicense the iOS Software.” Ex. C, ¶ B(3). Apple cannot reap the legal benefits of licensing its software in this manner and then later disclaim any ownership or obligation to assist law enforcement when that same software plays a critical role in thwarting execution of a search warrant.

Apple does not dispute that the iPhone’s passcode mechanism is in part software-based; Apple notes that each device “includes both hardware and software security features.” Apple Br. at 2. Apple’s software impedes the execution of the search warrant in at least two ways. First, it includes the passcode feature that locks the Target Phone and prevents government access to stored information without further assistance from Apple. Second, Apple’s software includes an “erase data” feature which, if enabled by the user, will render the data on the iPhone inaccessible after multiple failed passcode attempts. See “Use a passcode with your iPhone, iPad, or iPod touch,” Apple, https://support.apple.com/en-us/HT204060 (last visited Oct. 22, 2015), attached hereto as Exhibit D. This feature effectively prevents the government from attempting to execute the search warrant without Apple’s assistance. In addition, through the iOS software, Apple provides other ongoing services to device owners, including one that may be used to thwart the execution of a search warrant: “erase your device” which allows a user to send a command remotely to erase data on an iPhone. See “iCloud: Erase your device,” https://support.apple.com/kb/PH2701 (last visited Oct. 22, 2015), attached hereto as Exhibit E. As described above, in this case, someone sent an erase command to the Target Phone after the government seized the phone. Had the phone obtained a network connection while agents examined it, that erase command could have resulted in the data on the phone becoming permanently inaccessible. Given the role Apple’s software plays in thwarting execution of the warrant, by preventing access and permitting post-seizure deletion of data, Apple is not “so far removed from the underlying controversy that its assistance could not be permissibly compelled.”

Vendor licensing of software leaves them connected to it enough to compel them to assist the DoJ.

How’s that for unexpected liability from a licensing agreement? I wonder if it is now legal malpractice to recommend software licensing agreements to vendors. If not, it will be soon enough.

Bear in mind this argument would extend to the Internet of Things.

Tell me, how does it feel to be at the beck and call of the DoJ?

If that weren’t bad enough news, the government’s brief summarizes all the times Apple has cheerfully helped law enforcement to invade the privacy of its users.

Apple has an established track record of assisting law enforcement agents by extracting data from passcode-locked iPhones pursuant to court orders issued under the All Writs Act. The government has confirmed that Apple has done so in numerous federal criminal cases around the nation, and the vast majority of these cases have been resolved without any need for Apple to testify. In the course of handling these requests, Apple has, on multiple occasions, informed the government that it can extract data from a passcode-locked device and provided the government with the specific language it seeks in the form of a court order to do so.

You must comply with lawful court orders or face contempt, but nowhere are you required to volunteer or to assist law enforcement beyond the confines of a valid court order.

Every request should be rebuffed until accompanied by a valid court order. No exceptions, no helping.

The privacy that is protected may well be your own.

October 25, 2015

Republic, Lost v2 released

Filed under: Government,Politics — Patrick Durusau @ 8:39 pm

Republic, Lost v2 released by Lawrence Lessig.

Lessig announces a new version of Republic, Lost.

From the Amazon page:

In an era when special interests funnel huge amounts of money into our government - driven by shifts in campaign-finance rules and brought to new levels by the Supreme Court in Citizens United v. Federal Election Commission - trust in our government has reached an all-time low. More than ever before, Americans believe that money buys results in Congress, and that business interests wield control over our legislature.

With heartfelt urgency and a keen desire for righting wrongs, Harvard law professor Lawrence Lessig takes a clear-eyed look at how we arrived at this crisis: how fundamentally good people, with good intentions, have allowed our democracy to be co-opted by outside interests, and how this exploitation has become entrenched in the system. Rejecting simple labels and reductive logic - and instead using examples that resonate as powerfully on the Right as on the Left - Lessig seeks out the root causes of our situation. He plumbs the issues of campaign financing and corporate lobbying, revealing the human faces and follies that have allowed corruption to take such a foothold in our system. He puts the issues in terms that nonwonks can understand, using real-world analogies and real human stories. And ultimately he calls for widespread mobilization and a new Constitutional Convention, presenting achievable solutions for regaining control of our corrupted - but redeemable - representational system. In this way, Lessig plots a roadmap for returning our republic to its intended greatness.

While America may be divided, Lessig vividly champions the idea that we can succeed if we accept that corruption is our common enemy and that we must find a way to fight against it. In REPUBLIC, LOST, he not only makes this need palpable and clear - he gives us the practical and intellectual tools to do something about it.

Ahem,

…have allowed our democracy to be co-opted by outside interests, and how this exploitation has become entrenched in the system.

Really?

I’m sure Republic, Lost v2 is a great read but I can say without reading it that “our democracy” hasn’t been “co-opted by outside interests,” as a matter of historical fact.

If you recall even a little American history, apportionment of seats in the House of Representatives counted slaves as three-fifths of a person.

Section 2 of Article 1, United States Constitution:

Representatives and direct taxes shall be apportioned among the several states . . . by adding to the whole number of free persons, including those bound to service for a term of years, and excluding Indians not taxed, three-fifths of all other persons.

Moreover, “our democracy” has restricted the right to vote on the basis of land ownership, race, gender and a variety of other means. Winning the Vote: A History of Voting Rights.

Truth be told, calling the United States a democracy and/or a republic is a distortion beyond recognition of either word.

Lessig wants to create a new vision of rights and obligations in a democracy but let’s not pretend it is a correction of the present system.

The present system was designed, part and parcel, to favor property owners over all other classes and wealthy property owners over the less well-to-do. Those advantages are baked into the present constitution and law.

Changing those advantages will require a new Constitutional Convention. But let’s remember how the last one turned out. All “rights” will be in play, not just the ones Lessig would redefine.

What if state and local governments become liable for “lost anticipated profits” because of health or regulatory activities? Could happen. The TPP has language to that effect right now.

What if the rights of criminal defendants are curtailed? What if “hate speech” is banned in the new Constitution?

And for that matter, what is to prevent corporations and the wealthy from buying influence at a new constitutional convention? Are delegates going to magically become incorruptible and civic minded?

If you have that cure, why not distribute it to elected officials now?

Corrupt institutions are corrupt because people with the power to corrupt them like it that way.

Think long and hard before you give corrupt people a shot at re-writing all the rules.

What a Deep Neural Network thinks about your #selfie

Filed under: Image Processing,Image Recognition,Machine Learning,Neural Networks — Patrick Durusau @ 8:02 pm

What a Deep Neural Network thinks about your #selfie by Andrej Karpathy.

From the post:

Convolutional Neural Networks are great: they recognize things, places and people in your personal photos, signs, people and lights in self-driving cars, crops, forests and traffic in aerial imagery, various anomalies in medical images and all kinds of other useful things. But once in a while these powerful visual recognition models can also be warped for distraction, fun and amusement. In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones. Just because it’s easy and because we can. And in the process we might learn how to take better selfies 🙂

A must read for anyone interested in deep neural networks and image recognition!

Selfies provide abundant and amusing data to illustrate neural network techniques that are being used every day.
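The recipe Andrej describes, taking a large pretrained convnet and retraining it to separate good selfies from bad, is a standard fine-tuning move. A minimal sketch (PyTorch here, not the post’s original tooling; the data pipeline supplying image tensors and 0/1 labels is assumed):

    import torch
    import torchvision

    # Start from an ImageNet-pretrained backbone, swap in a two-way head.
    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    model.fc = torch.nn.Linear(model.fc.in_features, 2)  # good vs. bad selfie

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    def train_step(images, labels):
        # images: (N, 3, 224, 224) float tensor; labels: (N,) long tensor of 0/1
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

The hard part, as the post makes clear, is the labels: ranking selfies by engagement relative to audience size is the step no library gives you for free.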

Andrej provides numerous pointers to additional materials and references on neural networks. Good thing, considering how much interest his post is going to generate!

October 24, 2015

Obfuscation: how leaving a trail of confusion can beat online surveillance [Book]

Filed under: Books,Cybersecurity,Security — Patrick Durusau @ 7:23 pm

Obfuscation: how leaving a trail of confusion can beat online surveillance by Julia Powles.

From the post:

At the heart of Cambridge University, there’s a library tower filled with 200,000 forgotten books. Rumoured by generations of students to hold the campus collection of porn, Sir Gilbert Scott’s tower is, in fact, filled with pocket books. Guides, manuals, tales and pamphlets for everyday life, deemed insufficiently scholarly for the ordinary collection, they stand preserved as an extraordinary relic of past preoccupations.

One new guide in the handbook tradition – and one that is decidedly on point for 2015 – is the slim, black, cloth-bound volume, Obfuscation: A User’s Guide for Privacy and Protest, published by MIT Press. A collaboration between technologist Finn Brunton and philosopher Helen Nissenbaum, both of New York University, Obfuscation packs utility, charm and conviction into its tightly-composed 100-page core. This is a thin book, but its ambition is vast.

Brunton and Nissenbaum aim to start a “big little revolution” in the data-mining and surveillance business, by “throwing some sand in the gears, kicking up dust and making some noise”. Specifically, the authors champion the titular term, obfuscation, or “the addition of ambiguous, confusing, or misleading information to interfere with surveillance and data collection projects”. The objective of such measures is to thwart profiling, “to buy time, gain cover, and hide in a crowd of signals”.

Read Julia’s review and then order Obfuscation: A User’s Guide for Privacy and Protest or add it to your wish list!

MIT Press gives this description:

With Obfuscation, Finn Brunton and Helen Nissenbaum mean to start a revolution. They are calling us not to the barricades but to our computers, offering us ways to fight today’s pervasive digital surveillance—the collection of our data by governments, corporations, advertisers, and hackers. To the toolkit of privacy protecting techniques and projects, they propose adding obfuscation: the deliberate use of ambiguous, confusing, or misleading information to interfere with surveillance and data collection projects. Brunton and Nissenbaum provide tools and a rationale for evasion, noncompliance, refusal, even sabotage—especially for average users, those of us not in a position to opt out or exert control over data about ourselves. Obfuscation will teach users to push back, software developers to keep their user data safe, and policy makers to gather data without misusing it.

Brunton and Nissenbaum present a guide to the forms and formats that obfuscation has taken and explain how to craft its implementation to suit the goal and the adversary. They describe a series of historical and contemporary examples, including radar chaff deployed by World War II pilots, Twitter bots that hobbled the social media strategy of popular protest movements, and software that can camouflage users’ search queries and stymie online advertising. They go on to consider obfuscation in more general terms, discussing why obfuscation is necessary, whether it is justified, how it works, and how it can be integrated with other privacy practices and technologies.

In hardcover, Obfuscation retails at $19.95, for 136 pages.

MIT should issue a paperback version for $5.00 (or less in bulk), to put Obfuscation in the range of conference swag.

The underlying principles and discussion are all very scholarly I’m sure (I haven’t read it yet), but obfuscation can only flourish when practiced in large numbers. Cf. the “I’m Spartacus!” scene: Spartacus (IMDB), Spartacus (film) (Wikipedia).

To paraphrase the Capital One ad: How many different identities do you have in your wallet?

Howler Monkeys with the Louder Voices have Smaller Testicles

Filed under: Humor,Marketing — Patrick Durusau @ 4:12 pm

Howler Monkeys with the Louder Voices have Smaller Testicles by Donald V. Morris.

This was too funny to pass up.

Reminds me of pitch people for technologies that gloss over the details and distort reality beyond mere exaggeration.

Claims of impending world domination when your entire slice of the market for a type of technology is less than one percent, for example. That’s not “impending” in any recognizable sense of the word.

Add your own commentary/remarks and pass this along to your co-workers.

I first saw this in a tweet by Violet Blue.

PS: Yes, I saw that howler monkeys with smaller testicles live with harems. Consider that a test of how many people will forward the article without reading it first. 😉

October 23, 2015

Information Cartography

Filed under: Cartography,Information Theory — Patrick Durusau @ 8:29 pm

Information Cartography by Carlos Guestrin and Eric Horvitz. (cacm.acm.org/magazines/2015/11/193323)

Brief discussion of the CACM paper that I think will capture your interest.

From the introduction:

We demonstrate that metro maps can help people understand information in many areas, including news stories, research areas, legal cases, even works of literature. Metro maps can help them cope with information overload, framing a direction for research on automated extraction of information, as well as on new representations for summarizing and presenting complex sets of interrelated concepts.
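The representation itself is easy to picture: a metro map is a small set of “lines,” each a coherent chain of articles or concepts, and stops shared between lines mark where storylines intersect. A toy sketch (the story names are invented):

    from collections import Counter

    # Toy "metro map": named lines over shared story stops.
    metro_map = {
        "debt crisis": ["bailout talks", "referendum", "third bailout"],
        "domestic politics": ["elections", "referendum", "parliament vote"],
    }

    # Stops appearing on two or more lines are the map's interchanges,
    # the points where otherwise separate storylines meet.
    stop_counts = Counter(stop for line in metro_map.values() for stop in line)
    print([stop for stop, n in stop_counts.items() if n > 1])  # ['referendum']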

Spend some time this weekend with this article and its references.

More to follow next week!

October 22, 2015

“The first casualty, when war comes, is truth”

Filed under: Government,Image Processing,Image Recognition — Patrick Durusau @ 7:47 pm

The quote, “The first casualty, when war comes, is truth,” is commonly attributed to Hiram Johnson, a Republican politician from California, speaking in 1917. Johnson died on August 6, 1945, the day the United States dropped an atomic bomb on Hiroshima.

The ARCADE: Artillery Crater Analysis and Detection Engine is an effort to make it possible for anyone to rescue bits of the truth, even during war, at least with regard to the use of military ordnance.

From the post:

Destroyed buildings and infrastructure, temporary settlements, terrain disturbances and other signs of conflict can be seen in freely available satellite imagery. The ARtillery Crater Analysis and Detection Engine (ARCADE) is experimental computer vision software developed by Rudiment and the Centre for Visual Computing at the University of Bradford. ARCADE examines satellite imagery for signs of artillery bombardment, calculates the location of artillery craters, the inbound trajectory of projectiles to aid identification of their possible origins of fire. An early version of the tool that demonstrates the core capabilities is available here.

The software currently runs on Windows with MATLAB, but if there is enough interest, it could be ported to an open toolset built around OpenCV.
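Should that port happen, the core detection step is easy to prototype. A purely illustrative sketch (not ARCADE’s actual pipeline) of crater-like circle detection using OpenCV’s Hough transform, with the file name and all parameters hypothetical:

    import cv2
    import numpy as np

    img = cv2.imread("satellite_tile.png", cv2.IMREAD_GRAYSCALE)  # hypothetical tile
    if img is None:
        raise SystemExit("tile not found")
    img = cv2.medianBlur(img, 5)  # suppress speckle noise before detection

    circles = cv2.HoughCircles(
        img, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
        param1=100, param2=30, minRadius=3, maxRadius=40)

    if circles is not None:
        for x, y, r in np.round(circles[0]).astype(int):
            print(f"candidate crater at ({x}, {y}), radius {r} px")

Real craters need more than circles, ARCADE also estimates inbound trajectories from crater shape, but this shows how little code the entry point requires.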

Everyone who is interested in military actions anywhere in the world should be a supporter of this project.

Given the poverty of Western reporting on bombings by the United States government around the world, I am very interested in the success of this project.

The post is a great introduction to the difficulties and potential uses of satellite data to uncover truths governments would prefer to remain hidden. That alone should be enough justification for supporting this project.

October 21, 2015

Learning Topic Map Concepts Through Topic Map Completion Puzzles

Filed under: Education,Teaching,Topic Maps — Patrick Durusau @ 4:49 pm

Enabling Independent Learning of Programming Concepts through Programming Completion Puzzles — Kyle Harms by Felienne Hermans.

From the post:

There are lots of puzzle programming tutorials currently in fashion: Code.org, Gidget and Parson’s programming puzzles. But, we don’t really know if they work? There is work [1] that shows that completion exercises do work well, but what about puzzles? That is what Kyle wants to find out.

Felienne is live-blogging presentations from VL/HCC 2015, the IEEE Symposium on Visual Languages and Human-Centric Computing.

The post is quick read and should generate interest in both programming completion puzzles as well as similar puzzles for authoring topic maps.

There is a pre-print: Enabling Independent Learning of Programming Concepts through Programming Completion Puzzles.

Before you question the results based on the sample size, 27 students, realize that is 27 more test subjects than were used for a database project meant to replace all the outward-facing services for 5K+ users. Fortunately, very fortunately, a group was able to convince management to tank the entire project. Quite a nightmare and a slur on “agile development.”

The lesson here is that puzzles are useful and some test subjects are better than no test subjects at all.

Suggestions for topic map puzzles?

Query the Northwind Database as a Graph Using Gremlin

Filed under: DataStax,Graphs,Gremlin,SQL Server,Titan — Patrick Durusau @ 3:33 pm

Query the Northwind Database as a Graph Using Gremlin by Mark Kromer.

From the post:

One of the most popular and interesting topics in the world of NoSQL databases is graph. At DataStax, we have invested in graph computing through the acquisition of Aurelius, the company behind TitanDB, and are especially committed to ensuring the success of the Gremlin graph traversal language. Gremlin is part of the open source Apache TinkerPop graph framework project and is a graph traversal language used by many different graph databases.

I wanted to introduce you to a superb web site that our own Daniel Kuppitz maintains called “SQL2Gremlin” (http://sql2gremlin.com) which I think is a great way to start learning how to query graph databases for those of us who come from the traditional relational database world. It is full of excellent sample SQL queries from the popular public domain RDBMS dataset Northwind and demonstrates how to produce the same results by using Gremlin. For me, learning by example has been a great way to get introduced to graph querying and I think that you’ll find it very useful as well.

I’m only going to walk through a couple of examples here as an intro to what you will find at the full site. But if you are new to graph databases and Gremlin, then I highly encourage you to visit the sql2gremlin site for the rest of the complete samples. There is also a nice example of an interactive visualization / filtering, search tool here that helps visualize the Northwind data set as it has been converted into a graph model.

I’ve worked with (and worked for) Microsoft SQL Server for a very long time. Since Daniel’s examples use T-SQL, we’ll stick with SQL Server for this blog post as an intro to Gremlin and we’ll use the Northwind samples for SQL Server 2014. You can download the entire Northwind sample database here. Load that database into your SQL Server if you wish to follow along.

When I first saw the title to this post,

Query the Northwind Database as a Graph Using Gremlin (emphasis added)

I thought this was something else. A database about the Northwind album.

Little did I suspect that the Northwind Database is a test database for SQL Server 2005 and SQL Server 2008. Yikes!

Still, I thought some of you might have access to such legacy software and so I am pointing you to this post. 😉
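To give a flavor of the SQL-to-Gremlin correspondence the site teaches, here is a sketch using the gremlinpython driver (which postdates this post); the server address, vertex labels and property names are illustrative guesses, not sql2gremlin.com’s exact schema:

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.process.traversal import P
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # hypothetical server
    g = traversal().withRemote(conn)

    # SQL: SELECT ProductName FROM Products WHERE UnitPrice > 50
    expensive = (g.V().hasLabel("product")
                  .has("unitPrice", P.gt(50.0))
                  .values("productName")
                  .toList())

    # SQL: SELECT CategoryName, COUNT(*) FROM Products
    #      JOIN Categories ON ... GROUP BY CategoryName
    per_category = (g.V().hasLabel("product")
                     .out("inCategory")
                     .groupCount()
                     .by("categoryName")
                     .next())

    conn.close()

Note how the JOIN disappears: the traversal simply walks the edge that already encodes the relationship, which is much of graph querying’s appeal.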

PSA:

Extended support for SQL Server 2005 ends April 12, 2016 (that’s next April).

Mainstream support for SQL Server 2008 ended July 8, 2014. Ouch! You are more than a year into a dangerous place. Upgrade, migrate or get another job. Hard times are coming and blame will be assigned.

Clojure for the Brave and True Update!

Filed under: Clojure,Functional Programming,Programming — Patrick Durusau @ 3:08 pm

Clojure for the Brave and True by Daniel Higginbotham.

From the webpage:

Clojure for the Brave and True is now available in print! You can use the coupon code ZOMBIEHUGS to get 30% off at No Starch (plus you’ll get a free sticker), or buy it from Amazon.

The web site has been updated, too! (Don’t forget to force refresh.) One of the reasons I went with No Starch as a publisher was that they supported the idea of keeping the entire book available for free online. It makes me super happy to release the professionally-edited, even better book for free. I hope it makes you laugh, cry, and give up on object-oriented programming forever.

Writing this book was one of the most ambitious projects of my life, and I appreciate all the support I’ve gotten from friends, family, and readers like you. Thank you from the bottom of my crusty heart!

[Update] I got asked for a list of the major differences. Here they are:

  • Illustrations!
  • Almost every chapter now has exercises
  • The first macro chapter, Read and Eval, is massively improved. I’m hoping this will give readers an excellent conceptual foundation for working with macros
  • There’s now a joke about melting faces
  • There used to be two Emacs chapters (basic emacs and using Emacs for Clojure dev), now there’s just one
  • The concurrency chapter got split into two chapters
  • Appendices on Leiningen and Boot were added
  • The “Do Things” chapter is much friendlier
  • I spend a lot more time explaining some of the more obscure topics, like lazy sequences.
  • Many of the chapters got massive overhauls. The functional programming chapter, for example, was turned completely inside out, and the result is that it’s much, much clearer
  • Overall, everything should be clearer

Daniel has taken the plunge and quit his job to have more time for writing. If you can, buy a print copy and recommend Clojure for the Brave and True to a friend!

We need to encourage people like Daniel and publishers like No Starch. Vote with your feet and your pocketbooks.

Follow Daniel on twitter @nonrecursive

The Future Of News Is Not An Article

Filed under: Journalism,News,Publishing,Reporting — Patrick Durusau @ 2:22 pm

The Future Of News Is Not An Article by Alexis Lloyd.

Alexis challenges readers to reconsider their assumptions about the nature of “articles,” beginning with the model for articles inherited from traditional print media: whatever appeared in an article yesterday must be re-created today if there is a new article on the same subject. Not surprising, since print media lacks the means to transclude content from a prior article into a new one.

She saves her best argument for last:


A news organization publishes hundreds of articles a day, then starts all over the next day, recreating any redundant content each time. This approach is deeply shaped by the constraints of print media and seems unnecessary and strange when looked at from a natively digital perspective. Can you imagine if, every time something new happened in Syria, Wikipedia published a new Syria page, and in order to understand the bigger picture, you had to manually sift through hundreds of pages with overlapping information? The idea seems absurd in that context and yet, it is essentially what news publishers do every day.

While I agree fully with the advantages Alexis summarizes as Enhanced tools for journalists, Summarization and synthesis, and Adaptive Content (see her post), there are technical and non-technical roadblocks to such changes.

First and foremost, people are being paid to re-create redundant content every day, and their comfort levels, to say nothing of their remuneration for repetitive reporting of the same content, will loom large in the adoption of the technology Alexis imagines.

I recall a disturbing story from a major paper where reporters didn’t share leads or research for fear that other reporters would “scoop” them. That sort of protectionism isn’t limited to journalists. Rumor has it that Oracle sales reps refused to enter potential sales leads in a company-wide database.

I don’t understand why that sort of pettiness is tolerated but be aware that it is, both in government and corporate environments.

Second, and almost as importantly, Alexis needs to raise the question of semantic ROI for any semantic technology. Take her point about adoption of the Semantic Web:

but have not seen universal adoption because of the labor costs involved in doing so.

To adopt a single level of semantic encoding for all content, without regard to its value, historical or current, is a sure budget buster. Perhaps the business community was paying closer attention to the Semantic Web than many of us thought, hence its adoption failure.

Some content may need machine driven encoding, more valuable content may require human supervision and/or encoding and some content may not be worth encoding at all. Depends on your ROI model.

I should mention that the Semantic Web manages statements about statements, a.k.a. “facts about facts,” poorly (whether the inner statements are its own or another semantic system’s). Although I hate to use the term “facts”: the very notion of a “fact” is misleading and tricky under the best of circumstances.

However universal (universal = among people you know) knowledge of a “fact” may seem, the better argument is that it is only a “fact” from a particular point of view. Semantic Web based systems have difficulty with such concepts.

Third, and not mentioned by Alexis, is that semantic systems should capture and preserve trails created by information explorers. Reporters at the New York Times use databases every day, but each search starts from scratch.

If re-making redundant information over and over again is absurd, repeating the same searches (more or less successfully) over and over again is insane.

Capturing search trails as data would enrich existing databases, especially if searchers could annotate their trails and data they encounter along the way. The more intensively searched a resource becomes, the richer its semantics. As it is today, all the effort of searchers is lost at the end of each search.

Alexis is right, let’s stop entombing knowledge in articles, papers, posts and books. It won’t be quick or easy, but worthwhile journeys rarely are.

I first saw this in a tweet by Tim Strehle.

Who’s talking about what [BBC News Labs]

Filed under: BBC,News,Searching — Patrick Durusau @ 10:21 am

Who’s talking about what – See who’s talking about what across hundreds of news sources.

Imagine comparing the coverage of news feeds from approximately 350 sources (you choose), with granular date ranges (instead of last 24 hours, last week, last month, last year), plus “…AND, OR, NOT and parenthesis in queries.” The interface shows co-occurring topics as well.

BBC News Labs did more than +1 a great idea; they implemented it and posted their code.

From the webpage:

Inspired by a concept created by Adam Ramsay, Zoe Blackler & Iain Collins at the Center for Investigative Reporting design sprint on Climate Change

Implementation by Iain Collins and Sylvia Tippmann, using data from the BBC News Labs Juicer | View Source

What conclusions would you draw from reports from September 1, 2015 to date for “violence AND Israel”?

[Screenshots: “violence AND Israel” topic comparison across news sources]

One story only illustrates the power of this tool to create comparisons between news sources. Drawing conclusions about news sources requires systematic study of sources across a range of stories. The ability to do precisely that has fallen into your lap.
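To make the boolean filtering concrete, here is a toy sketch (nothing like the Juicer’s actual implementation) of how a query such as “violence AND Israel” narrows a set of articles; the snippets are invented:

    # Toy illustration of boolean filtering like "violence AND Israel".
    articles = [
        {"source": "A", "text": "Violence flared in Israel on Monday"},
        {"source": "B", "text": "Markets rallied as oil prices fell"},
    ]

    def matches(text, must_have=(), must_not=()):
        words = set(text.lower().split())
        return all(t in words for t in must_have) and not any(t in words for t in must_not)

    hits = [a["source"] for a in articles
            if matches(a["text"], must_have={"violence", "israel"})]
    print(hits)  # ['A']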

I first saw this in a tweet by Nick Diakopoulos.

October 20, 2015

Pixar Online Library

Filed under: Graphics,Visualization — Patrick Durusau @ 9:52 pm

Pixar Online Library

The five most recent titles:

  • Vector Field Processing on Triangle Meshes
  • Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains
  • Approximate Reflectance Profiles for Efficient Subsurface Scattering
  • Subspace Condensation: Full Space Adaptivity for Subspace Deformations
  • A Data-Driven Light Scattering Model for Hair

Even with help from PIXAR, your app isn’t going to be compelling enough to make users forgo breaks, etc.

But, on the other hand, you won’t know until you try. 😉

I was surprised that a list of Pixar films didn’t have an edgy one in the bunch.

The techniques valid for G-rated fare can be amped up for your app.

What graphics or sounds would you program for bank apps?

I first saw this in a tweet by Ozge Ozcakir.

Making Learning Easy by Design

Filed under: Design,Interface Research/Design,Learning — Patrick Durusau @ 9:28 pm

Making Learning Easy by Design – How Google’s Primer team approached UX by Sandra Nam.

From the post:

How can design make learning feel like less of a chore?

It’s not as easy as it sounds. Flat out, people usually won’t go out of their way to learn something new. Research shows that only 3% of adults in the U.S. spend time learning during their day.¹

Think about that for a second: Despite all the information available at our fingertips, and all the new technologies that emerge seemingly overnight, 97% of people won’t spend any time actively seeking out new knowledge for their own development.

That was the challenge at hand when our team at Google set out to create Primer, a new mobile app that helps people learn digital marketing concepts in 5 minutes or less.

UX was at the heart of this mission. Learning has several barriers to entry: you need to figure out what, where, how you want to learn, and then you need the time, money, and energy to follow through.

A short read that makes it clear that designing a learning experience is not easy or quick.

Take fair warning from:

only 3% of adults in the U.S. spend time learning during their day

when you plan on users “learning” a better way from your app or software.

Targeting 3% of a potential audience isn’t a sound marketing strategy.

Google is targeting the other 97%. Shouldn’t you too?

Python at the Large Hadron Collider and CERN

Filed under: Particle Physics,Physics — Patrick Durusau @ 7:14 pm

Python at the Large Hadron Collider and CERN hosted by Michael Kennedy.

From the webpage:

The largest machine ever built is the Large Hadron Collider at CERN. Its primary goal was the discovery of the Higgs Boson: the fundamental particle which gives all objects mass. The LHC team of 1000’s of physicists achieved that goal in 2012 winning the Nobel Prize in physics. Kyle Cranmer is here to share how Python was at the core of this amazing achievement!

You’ll learn about the different experiments, including ATLAS and CMS. We talk a bit about the physics involved in the discovery before digging into the software and computer technology used at CERN. The collisions generate a tremendous amount of data and the technology to filter, gather, and understand the data is super interesting.

You’ll also learn about Crayfis, the app that turns your phone into a cosmic ray detector. No joke. Kyle is taking citizen science to a whole new level.

Bio on Kyle Cranmer:

Kyle Cranmer is an American physicist and a professor at New York University at the Center for Cosmology and Particle Physics and Affiliated Faculty member at NYU’s Center for Data Science. He is an experimental particle physicist working, primarily, on the Large Hadron Collider, based in Geneva, Switzerland. Cranmer popularized a collaborative statistical modeling approach and developed statistical methodology, which was used extensively for the discovery of the Higgs boson at the LHC in July, 2012.

CRAYFIS – Join the first and only crowd-sourced cosmic ray detector. You might just help discover something big.

Not heavy with technical information but a nice glimpse into the computing side of CERN.

Share with students to encourage them to pick up programming skills as we once did typing.

Neural Networks Demystified (videos)

Filed under: Neural Networks — Patrick Durusau @ 2:50 pm

I first saw this video series in a tweet by Jason Baldridge.

You know what a pig’s breakfast YouTube’s related videos can be. No matter which part I looked at, there was no full listing of the other parts.

To save you that annoyance, here are all the videos in this series. (That’s a partial definition of curation, saving other people time and expense in finding information.)

Faster Graph Processing [Almost Linear Time Construction Of Spectral Sparsifier For Any Graph]

Filed under: Graph Analytics,Graphs,Spectral Graph Theory — Patrick Durusau @ 12:17 pm

Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time by Yin Tat Lee, He Sun.

Abstract:

We present the first almost-linear time algorithm for constructing linear-sized spectral sparsification for graphs. This improves all previous constructions of linear-sized spectral sparsification, which requires $$\Omega(n^2)$$ time.

A key ingredient in our algorithm is a novel combination of two techniques used in literature for constructing spectral sparsification: Random sampling by effective resistance, and adaptive constructions based on barrier functions.

Apologies to the paper’s authors for my liberties with their title, Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time, but I wanted to capture eyes that might glaze past the more formal version.

The press release where I saw this article reads as follows:

In the second paper, Constructing linear-sized spectral sparsification in almost-linear time, Dr He Sun, Lecturer in Computer Science in the University’s Department of Computer Science and Yin Tat Lee, a PhD student from MIT, have presented the first algorithm for constructing linear-sized spectral sparsifiers that runs in almost-linear time.

More and more applications from today’s big data scenario need to deal with graphs of millions of vertices. While traditional algorithms can be applied directly in these massive graphs, these algorithms are usually too slow to be practical when the graph contains millions of vertices. Also, storing these practical massive graphs is very expensive.

Dr He Sun said: “Over the past decade, there have been intensive studies in order to overcome these two bottlenecks. One notable approach is through the intermediate step called spectral sparsification, which is the approximation of any input graph by a very sparse graph that inherits many properties of the input graph. Since most algorithms run faster in sparse graphs, spectral sparsification is used as a key intermediate step in speeding up the runtime of many practical graph algorithms, including finding approximate maximum flows in an undirected graph, and approximately solving linear systems, among many others.”

Using spectral sparsification, the researchers ran many algorithms in a sparse graph, and obtained approximately the correct results as well. This general framework allowed them to speed up the runtime of a wide range of algorithms by a magnitude. However, to make the overall approach practical, a key issue was to find faster constructions of spectral sparsification with fewer edges in the resulting sparsifiers. There have been many studies looking at this area in the past decade.

The researchers have proved that, for any graph, they can construct its spectral sparsifier in almost-linear time, and in the output sparsifier every vertex has only a constant number of adjacent vertices. This result is almost optimal with respect to the time complexity of the algorithm and the number of edges in the spectral sparsifier.
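For readers new to the area, the object being constructed has a crisp standard definition (my paraphrase, not the paper’s wording): a reweighted subgraph $$H$$ of $$G$$ is a spectral sparsifier if it preserves the Laplacian quadratic form,

$$(1-\epsilon)\, x^{\top} L_G\, x \;\le\; x^{\top} L_H\, x \;\le\; (1+\epsilon)\, x^{\top} L_G\, x \quad \text{for all } x \in \mathbb{R}^{n},$$

where $$L_G$$ and $$L_H$$ are the graph Laplacians. “Linear-sized” means $$H$$ retains only $$O(n/\epsilon^{2})$$ reweighted edges, and the effective-resistance ingredient mentioned in the abstract samples each edge with probability proportional to its weight times its effective resistance.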

Very heavy sledding in the paper but you don’t have to be able to originate the insight in order to take advantage of the technique.

Enjoy!

October 19, 2015

Introduction to Data Science (3rd Edition)

Filed under: Data Science,R — Patrick Durusau @ 9:05 pm

Introduction to Data Science, 3rd Edition by Jeffrey Stanton.

From the webpage:

In this Introduction to Data Science eBook, a series of data problems of increasing complexity is used to illustrate the skills and capabilities needed by data scientists. The open source data analysis program known as “R” and its graphical user interface companion “R-Studio” are used to work with real data examples to illustrate both the challenges of data science and some of the techniques used to address those challenges. To the greatest extent possible, real datasets reflecting important contemporary issues are used as the basis of the discussions.

A very good introductory text on data science.

I originally saw a tweet about the second edition but searching on the title and Stanton uncovered this later version.

In the timeless world of the WWW, the amount of outdated information vastly exceeds the latest. Check for updates before broadcasting your latest “find.”

CrowdTruth

Filed under: Crowd Sourcing,Philosophy — Patrick Durusau @ 8:42 pm

CrowdTruth

From the webpage:

The CrowdTruth Framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The approach is focussed specifically on collecting gold standard data for training and evaluation of cognitive computing systems. The original framework was inspired by the IBM Watson project for providing improved (multi-perspective) gold standard (medical) text annotation data for the training and evaluation of various IBM Watson components, such as Medical Relation Extraction, Medical Factor Extraction and Question-Answer passage alignment.

The CrowdTruth framework supports the composition of CrowdTruth gathering workflows, where a sequence of micro-annotation tasks can be configured and sent out to a number of crowdsourcing platforms (e.g. CrowdFlower and Amazon Mechanical Turk) and applications (e.g. Expert annotation game Dr. Detective). The CrowdTruth framework has a special focus on micro-tasks for knowledge extraction in medical text (e.g. medical documents, from various sources such as Wikipedia articles or patient case reports). The main steps involved in the CrowdTruth workflow are: (1) exploring & processing of input data, (2) collecting of annotation data, and (3) applying disagreement analytics on the results. These steps are realised in an automatic end-to-end workflow, that can support a continuous collection of high quality gold standard data with feedback loop to all steps of the process. Have a look at our presentations and papers for more details on the research.
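Step (3), the disagreement analytics, has a simple core: each worker’s annotations for a unit form a vector over the candidate answers, the unit’s vector is the sum over workers, and quality scores are cosine similarities against it. A simplified sketch of that idea (the framework’s actual metrics are richer, and the votes below are invented):

    import numpy as np

    def unit_answer_score(worker_vectors, answer_idx):
        # Cosine between the summed "unit vector" and the basis vector for
        # one candidate answer: high score = low disagreement on that answer.
        unit_vec = np.sum(worker_vectors, axis=0).astype(float)
        basis = np.zeros_like(unit_vec)
        basis[answer_idx] = 1.0
        return float(unit_vec @ basis /
                     (np.linalg.norm(unit_vec) * np.linalg.norm(basis)))

    # Three workers annotate one sentence over four candidate relations:
    votes = np.array([[1, 0, 0, 0],
                      [1, 1, 0, 0],
                      [0, 1, 0, 0]])
    print(unit_answer_score(votes, 0))  # ~0.71: real but contested support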

An encouraging quote from Truth is a Lie by Lora Aroyo.

the idea of truth is a fallacy for semantic interpretation and needs to be changed

I don’t disagree, but observe that a “crowdtruth” with disagreements is a variant of “truth.” Which variant of “truth” interests your client is an important issue.

CIA analysts, for example, have little interest in crowdtruths that threaten their prestige and/or continued employment. “Accuracy” is only one aspect of any truth.

If your client is sold on crowdtruths, by all means take up the banner on their behalf. Always remembering:

There are no facts, only interpretations. (Nietzsche)

Which interpretation interests you?

Holographic Embeddings of Knowledge Graphs [Are You Blinding/Gelding Raw Data?]

Filed under: Holographic Embeddings,Knowledge Graph,Tensors — Patrick Durusau @ 10:41 am

Holographic Embeddings of Knowledge Graphs by Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio.

Abstract:

Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator HolE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets. In extensive experiments we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction in knowledge graphs and relational learning benchmark datasets.

Heavy sledding but also a good candidate for practicing How to Read a Paper.
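The compositional operator at the heart of the paper, circular correlation, is also the easiest piece to try out: it can be computed in O(d log d) with FFTs, and HolE scores a triple (s, p, o) as the sigmoid of the relation vector dotted with the correlation of the two entity vectors. A minimal sketch, with random vectors standing in for learned embeddings:

    import numpy as np

    def circular_correlation(a, b):
        # [a (star) b]_k = sum_i a_i * b_{(i+k) mod d}, via the FFT identity
        return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

    def hole_score(e_s, e_o, r_p):
        # sigma(r_p . (e_s star e_o)): plausibility of the triple (s, p, o)
        return 1.0 / (1.0 + np.exp(-r_p @ circular_correlation(e_s, e_o)))

    rng = np.random.default_rng(0)
    d = 8  # embedding dimension (toy size)
    print(hole_score(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)))

Unlike the tensor products it generalizes, the correlation keeps the composed representation at a fixed width d, the property the authors plan to exploit for higher-arity relations (see the quote below).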

I suggest that in part because of this comment by the authors in the conclusion:

In future work we plan to further exploit the fixed-width representations of holographic embeddings in complex scenarios, as they are especially suitable to model higher-arity relations (e.g., taughtAt(John, AI, MIT)) and facts about facts (e.g., believes(John, loves(Tom, Mary))).

Any representation where statements of “higher-arity relations” and “facts about facts” are not easily recorded and processed is seriously impaired when it comes to capturing human knowledge.

Perhaps capturing only triples and “facts” explains the multiple failures of the U.S. intelligence community. It is working with tools that blind and geld its raw data. The rich nuances of intelligence data are lost in a grayish paste suitable for computer consumption.

A line of research worth following. Maximilian Nickel‘s homepage at MIT is a good place to start.

I first saw this in a tweet by Stefano Bertolo.

October 18, 2015

Tracie Powell: “We’re supposed to challenge power…

Filed under: Government,Journalism,Social Media — Patrick Durusau @ 10:01 pm

Tracie Powell: “We’re supposed to challenge power…it seems like we’ve abdicated that to social media” by Laura Hazard Owen.

From the post:

Tracie Powell tries not to use the word “diversity” anymore.

“When you talk about diversity, people’s eyes glaze over,” Powell, the founder of All Digitocracy, told me. The site covers tech, policy, and the impact of media on communities that Powell describes as “emerging audiences” — people of color and of different sexual orientations and gender identities.

I first heard Powell speak at the LION conference for hyperlocal publishers in Chicago earlier this month, where she stood in front of the almost entirely white audience to discuss how journalists and news organizations can get better at reporting for more people.

I followed up with Powell, who is currently a John S. Knight Journalism Fellow at Stanford, to hear more. “If we [as journalists] don’t do a better job at engaging with these audiences, we’re dead,” Powell said. “Our survival depends on reaching these emerging audiences.”

Here’s a lightly condensed and edited version of our conversation.

Warning: Challenging power is far riskier than supporting fiery denunciations of the most vulnerable and least powerful in society.

From women facing hard choices about pregnancy to rape victims, survivors of physical and emotional abuse, and those who live with doubt, discrimination and deprivation as day-to-day realities, victims of power aren't hard to find.

One of the powers that needs to be challenged is the news media itself. Take, for example, the near-constant emphasis on gun violence and mass shootings. If you were to take the news media at face value, you would be frightened to go outside.

But a 2013 Pew Research Center report, Gun Homicide Rate Down 49% Since 1993 Peak; Public Unaware, tells a different tale:

[Pew Research Center chart: gun homicide rate down 49% from its 1993 peak]

Not as satisfying as taking down a representative or senator, but in the long run, influencing the mass media may be a more reliable path to challenging power.

Teaching Deep Convolutional Neural Networks to Play Go [Networks that can’t explain their play]

Filed under: Deep Learning,Neural Networks — Patrick Durusau @ 9:21 pm

Teaching Deep Convolutional Neural Networks to Play Go by Christopher Clark, Amos Storkey.

Abstract:

Mastering the game of Go has remained a long-standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more ‘humanlike’ way to play the game would be to rely on pattern recognition abilities rather than brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to ‘hard code’ symmetries that are expected to exist in the target function, and demonstrate in an ablation study that they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well-known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. It is also able to win some games against the state of the art Go playing program Fuego while using a fraction of the play time. This success at playing Go indicates high level principles of the game were learned.

The last line of the abstract caught my eye:

This success at playing Go indicates high level principles of the game were learned.

That statement is expanded in 4.3 Playing Go:

The results are very promising. Even though the networks are playing using a ‘zero step look ahead’ policy, and using a fraction of the computation time as their opponents, they are still able to play better than GNU Go and take some games away from Fuego. Under these settings GNU Go might play at around a 6-8 kyu ranking and Fuego at 2-3 kyu, which implies the networks are achieving a ranking of approximately 4-5 kyu. For a human player, reaching this ranking would normally require years of study. This indicates that sophisticated knowledge of the game was acquired. This also indicates great potential for a Go program that integrates the information produced by such a network.

An interesting limitation is that the network can't communicate what it has learned; it can only produce an answer for a given situation. In gaming, that opaqueness isn't immediately objectionable.

But what if the decision were fire/don't fire in combat? Would the limitation that the network can only say yes or no, with no way to explain its answer, be acceptable?

Is that any worse than humans inventing explanations for decisions that weren’t the result of any rational thinking process?
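Back on the technical side, the symmetry idea from the abstract is worth a closer look. A Go position is unchanged in value under the eight rotations and reflections of the board. The paper bakes that into the network by tying weights; a cruder way to get a similar effect, shown here only for intuition, is to average any predictor over all eight transforms (Python with NumPy; toy_predict is a stand-in I made up, not the paper's network):

    import numpy as np

    def symmetric_predict(predict, board):
        # average move scores over the 8 board symmetries, mapping each
        # prediction back to the original orientation before summing
        total = np.zeros(board.shape, dtype=float)
        for k in range(4):
            rotated = np.rot90(board, k)
            # plain rotation: undo with the opposite rotation
            total += np.rot90(predict(rotated), -k)
            # rotation plus horizontal flip: undo the flip, then the rotation
            total += np.rot90(np.fliplr(predict(np.fliplr(rotated))), -k)
        return total / 8.0

    def toy_predict(board):
        # stand-in predictor: favor moves adjacent to existing stones
        scores = np.zeros(board.shape)
        scores[:-1, :] += np.abs(board[1:, :])
        scores[1:, :] += np.abs(board[:-1, :])
        scores[:, :-1] += np.abs(board[:, 1:])
        scores[:, 1:] += np.abs(board[:, :-1])
        return scores

    board = np.zeros((19, 19))
    board[3, 3], board[15, 15] = 1, -1  # one black stone, one white stone
    print(symmetric_predict(toy_predict, board).max())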

Some additional Go resources you may find useful: American Go Association, Go Game Guru (with a printable Go board and stones), GoBase.org (has a Japanese dictionary). Those sites will lead you to many other Go sites.

Text Analysis Without Programming

Filed under: Text Analytics,Text Mining — Patrick Durusau @ 8:53 pm

Text Analysis Without Programming by Lynn Cherny.

My favorite line in the slideshow reads:

PDFs are a sad text data reality

The slides give a good overview of a number of simple tools for text analysis.

And Cherny doesn’t skimp on pointing out issues with tools such as word clouds, where she says:

People don’t know what they indicate (and at the bottom of the slide: “But geez do people love them.”)

I suspect her observation on the uncertainty of what word clouds indicate is partially responsible for their popularity.

No matter what conclusion you draw about a word cloud, how could anyone offer a contrary argument?
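Part of the answer may be that a word cloud is nothing but a frequency table with the numbers hidden behind font sizes. A few lines of Python recover the table (my illustration, not from Cherny's slides):

    from collections import Counter
    import re

    text = ("word clouds look pretty but word clouds hide "
            "how often each word actually occurs")
    counts = Counter(re.findall(r"[a-z']+", text.lower()))

    # a word cloud encodes exactly this table, mapping counts to font sizes
    for word, n in counts.most_common(5):
        print(word, n)

Once the counts are visible, arguments about what the text emphasizes at least have something to stand on.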

A coding talk is promised and I am looking forward to it.

Enjoy!

16+ Free Data Science Books

Filed under: Books,Data Science — Patrick Durusau @ 8:25 pm

16+ Free Data Science Books by William Chen.

From the webpage:

As a data scientist at Quora, I often get asked for my advice about becoming a data scientist. To help those people, I’ve taken some time to compile my top recommendations of quality data science books that are either available for free (by generosity of the author) or are Pay What You Want (PWYW) with $0 minimum.

Please bookmark this place and refer to it often! Click on the book covers to take yourself to the free versions of the book. I’ve also provided Amazon links (when applicable) in my descriptions in case you want to buy a physical copy. There’s actually more than 16 free books here since I’ve added a few since conception, but I’m keeping the name of this website for recognition.

The authors of these books have put in much effort to produce these free resources – please consider supporting them through avenues that the authors provide, such as contributing via PWYW or buying a hard copy [Disclosure: I get a small commission via the Amazon links, and I am co-author of one of these books].

Some of the usual suspects are here along with some unexpected titles, such as A First Course in Design and Analysis of Experiments by Gary W. Oehlert.

From the introduction:

Researchers use experiments to answer questions. Typical questions might be:

  • Is a drug a safe, effective cure for a disease? This could be a test of how AZT affects the progress of AIDS.
  • Which combination of protein and carbohydrate sources provides the best nutrition for growing lambs?
  • How will long-distance telephone usage change if our company offers a different rate structure to our customers?
  • Will an ice cream manufactured with a new kind of stabilizer be as palatable as our current ice cream?
  • Does short-term incarceration of spouse abusers deter future assaults?
  • Under what conditions should I operate my chemical refinery, given this month’s grade of raw material?

This book is meant to help decision makers and researchers design good experiments, analyze them properly, and answer their questions.

It isn’t short, six hundred and fifty-nine pages, but taken in small doses you will learn a great deal about experimental design. Not only how to properly design experiments but how to spot when they aren’t well designed.

Think of it as training to go big-game hunting in the latest issue of Nature or Science. Adds a bit of competitiveness to the enterprise.

Drone Registration Coming! Call the NRA!

Filed under: Government,Privacy — Patrick Durusau @ 7:09 pm

US government will reportedly require all drone purchases to be registered by Chris Welch.

From the post:

The US government plans to make it a mandatory requirement that all drone purchases, including those made by consumers, be formally registered. NBC News reports that the Department of Transportation will announce the new plan on Monday, with hopes to have this drone registry implemented by the holidays, when drones will likely prove a popular gift. The Obama administration and DoT have yet to announce any such press conference for Monday.

Chris promises more details, so follow @chriswelch.

Registering drones isn't going to help regulate them unless, of course, the drones carry identifying marks and/or broadcast their registration. Yes?

In other words, registration of drones is a means of further government surveillance on where and when you fly your drone.

If you want an unregistered drone, buy one before regulations requiring registration go into effect.

So long as you are obeying all aviation laws, the government has no right to know where and when you fly your drone.

Hopefully the NRA will realize that preserving gun ownership where the government tracks:

  • All your phone calls.
  • All your emails.
  • All your web traffic.
  • All your cell phones.
  • All your credit cards.
  • All your purchases.
  • All your use of drones.

isn’t all that meaningful by itself.

Tracking the government and its servants is a first step towards ending the current surveillance state.
