Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 4, 2015

Fear Cultists in Atlanta

Filed under: News,Security — Patrick Durusau @ 2:02 pm

[Image: the duct-taped device on the Atlanta connector; original image by Ben Gray, bgray@ajc.com]

Fear cultists shut down the main connector running through downtown Atlanta for two hours on Monday afternoon.

Why? The device you see covered with duct tape above was part of an art project using time-lapse cameras. See How a College Art Project Shut Down an Atlanta Highway by Emily Shapiro.

Unfortunately, the fear cultists didn’t know what the device was, and therefore it had the potential to be a smaller-than-a-briefcase thermonuclear device. So they blew it up. Good move there!

The number of devices that fear cultists do not recognize is rather large. Some estimates put it as high as your odds of winning the lottery.

The traditional remedy would be to put a rope around the device and simply pull it off. It was obviously too small to do any serious damage and badly placed if it were meant to be a bomb. That would take maybe fifteen minutes and stop traffic for only a few of them.

But no, the fear cultists have to turn every art project, empty suitcase or backpack, and cardboard box into a major security incident. How else to convince the public to subscribe to the “terrorists are just around the corner” fantasy?

One person was saying on the news that an apology, one assumes from the art project, wasn’t enough. If any apology is owed, it is owed by the fear cultists of Atlanta, for interfering with a legitimate educational activity and annoying drivers with their anti-terror antics.

Yes, had it been a real bomb, badly placed, it might have damaged a few cars and possibly killed someone. You can watch the Atlanta news every night to hear about cars being damaged and people killed. Why a terrorist incident makes a difference to fear cultists, well, you would have to ask them; I have no idea.

To spend your security funds wisely, stay away from fear cultists who dodge and squeal at every unknown. Not only is it tiresome, it is also wasteful.

February 3, 2015

Marko Reassures the TinkerPop Community

Filed under: DataStax,TinkerPop — Patrick Durusau @ 8:24 pm

How the DataStax Acquisition of Aurelius Will Affect TinkerPop by Marko A. Rodriguez.

From the post:

As you may already know, Aurelius has been acquired by DataStax — http://thinkaurelius.com/2015/02/03/aurelius-acquired-by-datastax/. Aurelius is the graph computing company behind Titan which also provides a good number of contributors to TinkerPop. DataStax is the distributed database company behind Cassandra. Matthias and I are very excited about this acquisition. With DataStax’s resources, the graph community is going to see a powerful, rock-solid, Titan-inspired distributed graph database in the near future — enterprise support, focused large-team development, and 1000+ node cluster testing for each release. A great thing for the graph community, indeed. Also, a great thing for TinkerPop —

Looking forward to seeing how a pairing of DataStax resources and Marko’s vision for graph computing expresses itself. This could be a lot of fun!

Wandora 2015-02-03 Release!

Filed under: Topic Map Software,Wandora — Patrick Durusau @ 8:02 pm

Wandora 2015-02-03 Release!

From the change log:

This release enhances Wandora’s UI.

  • View menu has been restructured. Now the View menu contains a submenu for each open topic panel.
  • Search (topic) panel. Open any number of distinct searches and queries. Use menu option View > New panel > Search.
  • Tree (topic) panel. Open any number of distinct topic trees. Add new tree with menu option View > New panel > Tree.
  • Layer info (topic panel). Keep topic map layer info panel open while you edit the topic map. View layer info with menu option View > New panel > Layer info.
  • Drop extractor (topic panel). Yes, the drag-and-drop extractor is back. The drop extractor is very handy when you need to build a topic map using local resources such as files.
  • Numerous smaller fixes and enhancements.

Looking forward to testing out the new UI features!

Comments/suggestions?

Handicapping Judges and Lawyers [and Legislators?]

Filed under: Government,Law — Patrick Durusau @ 6:24 pm

Stanford-Bred Startup Uses Moneyball Stats To Handicap Judges, Lawyers by Daniel Fisher.

From the post:

If you’re being sued for patent infringement before U.S. District Judge Lucy Koh in the heart of California’s Silicon Valley, there’s something you ought to know. Koh is tough on defendants and only grants 18% of motions for summary judgment, less than half the national average. So your lawyers had better be on their game before they try to convince Koh to chuck out the case, and you’d better have a fallback strategy in case she doesn’t.

These stats are just a sample of what a venture-funded startup with roots in a project funded by Cisco Systems and Apple is doing to bring mathematical analysis to the arcane world of litigation. Where Lexis and Westlaw tell attorneys what the law is, Lex Machina tells them what actually happens in the courtroom.

The system, focusing for now on the litigation-intense world of patent law, has compiled an exhaustive database of patent cases going back to 2000 so companies and lawyers can determine how many times a patent has been the subject of litigation and how the lawsuits were resolved. Lex Machina also handicaps law firms based on their win-loss records before specific judges with specific procedural maneuvers, so in-house attorneys can determine who to hire.

As Daniel points out, this service is limited to patent law at the moment but it is a wide open field otherwise. The data is in the public domain (court records) and the real rub is going to be efficient collection of the data.

To a future startup: Be mindful that the same techniques demonstrated here can also be applied to legislators, legislation and campaign contributions at local, state and federal levels.

Your clients can avoid over-paying for largely ineffectual members of Congress and avoid under-bidding for state house speakers when losing isn’t an option.

Handicapping judges, lawyers, and legislators won’t ever be 100%, modulo illegal inducements, but you can play your best hand more often.

Realize that, other than the offensive terminology (“handicapping”) and the level of detail, this sort of knowledge of local judges has always been accumulated ad hoc by lawyers in a particular jurisdiction. It is one of the main reasons for hiring “local” counsel.

The advantage that Lex Machina offers is that the knowledge is detailed and can be offered to anyone willing to pay for it.

Now would be a good time for public-spirited foundations to start public handicapping of judges and lawyers projects. Just as an example, rather than election-year or episodic reports of judges discriminating against battered women, such a project could provide a real-time view into the operation of the judiciary.

I first saw this in a tweet by Carl Anderson.

Raspberry Pi gets 6x the power, 2x the memory…

Filed under: Parallel Programming,Supercomputing — Patrick Durusau @ 5:40 pm

Raspberry Pi gets 6x the power, 2x the memory and still costs $35 by Stacey Higginbotham.

From the post:

Makers, academics and generally anyone who likes to play with computers: get ready for some awesomesauce. Raspberry Pis, the tiny Linux computers that currently sell for $35 are getting a makeover that will give a tremendous boost to their compute power and double their memory while still keeping their price the same.

The Pi 2 boards will be available today, and Pi creator and CEO of Raspberry Pi (Trading) Ltd. Eben Upton says the organization has already built 100,000 units, so buyers shouldn’t have to wait like they did at the original Pi launch. The Pi 2 will have the following specs:

  • SoC : Broadcom BCM2836 (CPU, GPU, DSP, SDRAM, and single USB port)
  • CPU: 900 MHz quad-core ARM Cortex A7 (ARMv7 instruction set)
  • GPU: Broadcom VideoCore IV @ 250 MHz, OpenGL ES 2.0 (24 GFLOPS), 1080p30 MPEG-2 and VC-1 decoder (with license), 1080p30 h.264/MPEG-4 AVC high-profile decoder and encoder
  • Memory: 1 GB (shared with GPU)
  • Total backwards compatibility (in terms of multimedia, form-factor and interfacing) with Pi 1

Why order a new Raspberry Pi?

Well, Kevin Trainor had Ontopia running on the first version: Ontopia Runs on Raspberry Pi [This Rocks!]

Hadoop on a Raspberry Pi

And bear in mind my post: 5,000 operations per second – Computations for Hydrogen Bomb. What are you going to design with your new Raspberry Pi?

If 5,000 operations per second could design a hydrogen bomb, what can you do with a 24 GFLOPS video chip, roughly 4.8 million times that rate? Faster Pac-Man, more detailed WarCraft, or Call of Duty, Future Warfare 4?

Money makers, no doubt, but at the end of the day they are still substitutes for changing the world.

February 2, 2015

Neo4j 2.2 Milestone 3 Release

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:47 pm

Highlights of the Neo4j 2.2 Milestone 3 release:

From the post:

Three of the key areas being tackled in this release are:

      1. Highly Concurrent Performance

      With Neo4j 2.2, we introduce a brand new page cache designed to deliver extreme performance and scalability under highly concurrent workloads. This new page cache helps overcome the limitations imposed by the current IO systems to support larger applications with hundreds of read and/or write IO requirements. The new cache is auto-configured and matches the available memory without the need to tune memory mapped IO settings anymore.

      2. Transactional & Batch Write Performance

      We have made several enhancements in Neo4j 2.2 to improve both transactional and batch write performance by orders of magnitude under highly concurrent load. Several things are changing to make this happen.

      • First, the 2.2 release improves coordination of commits between Lucene, the graph, and the transaction log, resulting in a much more efficient write channel.
      • Next, the database kernel is enhanced to optimize the flushing of transactions to disk for high number of concurrent write threads. This allows throughput to improve significantly with more write threads since IO costs are spread across transactions. Applications with many small transactions being piped through large numbers (10-100+) of concurrent write threads will experience the greatest improvement.
      • Finally, we have improved and fully integrated the “Superfast Batch Loader”. Introduced in Neo4j 2.1, this utility now supports large scale non-transactional initial loads (of 10M to 10B+ elements) with sustained throughputs around 1M records (node or relationship or property) per second. This seriously fast utility is (unsurprisingly) called neo4j-import, and is accessible from the command line.

      3. Cypher Performance

      We’re very excited to be releasing the first version of a new Cost-Based Optimizer for Cypher, under development for nearly a year. While Cypher is hands-down the most convenient way to formulate queries, it hasn’t always been as fast as we’d like. Starting with Neo4j 2.2, Cypher will determine the optimal query plan by using statistics about your particular data set. Both the cost-based query planner, and the ability of the database to gather statistics, are new, and we’re very interested in your feedback. Sample queries & data sets are welcome!

The most recent milestone is here.
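
If you want a first taste of the batch loader, here is a minimal sketch, not taken from the Neo4j docs: it writes two tiny CSV files in the header format neo4j-import expects and hands them to the tool. The file names, the sample data, the bin/neo4j-import path, and the graph.db target are my assumptions about a stock 2.2 tarball install; adjust to taste.

```python
import csv
import subprocess

# Minimal node file: an :ID column, one property, and a label column.
with open("nodes.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id:ID", "name", ":LABEL"])
    w.writerow(["1", "Alice", "Person"])
    w.writerow(["2", "Bob", "Person"])

# Minimal relationship file: start node, end node, relationship type.
with open("rels.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow([":START_ID", ":END_ID", ":TYPE"])
    w.writerow(["1", "2", "KNOWS"])

# Hand both files to the non-transactional bulk loader described above.
subprocess.check_call([
    "bin/neo4j-import",
    "--into", "graph.db",
    "--nodes", "nodes.csv",
    "--relationships", "rels.csv",
])
```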

Now is the time to take Neo4j 2.2 for a spin!

Master Concurrent Processes with core.async

Filed under: Clojure,Functional Programming,Programming — Patrick Durusau @ 7:21 pm

Master Concurrent Processes with core.async

From the post:

One day, while you are walking down the street, you will be surprised, intrigued, and a little disgusted to discover a hot dog vending machine. Your scalp tingling with guilty curiosity, you won’t be able to help yourself from pulling out three Sacagawea dollars and seeing if this contraption actually works. After accepting your money, it will emit a click and a whirr, and out will pop a fresh hot dog, bun and all.

The vending machine exhibits simple behavior: when it receives money, it releases a hot dog, then readies itself for the next purchase. When it’s out of hot dogs, it stops. All around us are hot dog vending machines in different guises, independent entities concurrently responding to events in the world according to their nature. The espresso machine at your favorite coffee shop, the pet hamster you loved as a child – everything can be modeled in terms of behavior using the general form, “when x happens, do y.” Even the programs we write are just glorified hot dog vending machines, each one an independent process waiting for the next event, whether it’s a keystroke, a timeout, or the arrival of data on a socket.


Clojure’s core.async library allows you to create independent processes within a single program. This chapter describes a useful model for thinking about this style of programming as well as the practical details you need to actually write stuff. You’ll learn how to use channels, alts, and go blocks to create independent processes and communicate between them, and you’ll learn a bit about how Clojure uses threads and something called “parking” to allow this all to happen efficiently.

One of the things I like best about Clojure for the Brave and True by Daniel Higginbotham is the writing style.

Yes, most processes are more complicated than a hot dog vending machine, but then the inside of a hot dog vending machine is probably more complicated than you would think as well. The example captures enough of the essence of the vending machine to make it work.

That’s a rare authoring talent and you should take advantage of it whenever you see it. (As in Daniel’s Clojure site.)

Who asks the questions? [Big Data]

Filed under: BigData,Government,Politics — Patrick Durusau @ 7:04 pm

“Who asks the questions?” is a section header in Follow The Data Down The Rabbit Hole by Mark Gazit.

The question of human bias hangs like a shadow over the accuracy and efficiency of big data analytics, and thus the viability of answers obtained thereof. If different humans can look at the same data and come to different conclusions, just how reliable can those deductions be?

There is no question that using data science to extract knowledge from raw data provides tremendous value and opportunity to organizations in any sector, but the way it is analyzed has crucial bearing on that value.

In order to extract meaningful answers from big data, data scientists must decide which questions to ask of the algorithms. However, as long as humans are the ones asking the questions, they will forever introduce unintentional bias into the equation. Furthermore, the data scientists in charge of choosing the queries are often much less equipped to formulate the “right questions” than the organization’s specialized domain experts.

For example, a compliance manager would ask much better questions about her area than a scientist who has no idea what her day-to-day work entails. The same goes for a CISO or the executive in charge of insider threats. Does this mean that your data team will have to involve more people all the time? And what happens if one of those people leaves the company?

Data science is necessary and important, and as data grows, so does the need for experienced data scientists. But at the same time, leaving all the computational work to humans makes it slower, less scientific, and quick to degrade in quality because the human mind cannot keep up with the quantum leap that big data is undergoing. (emphasis added)

I find Big Data hype such as:

But at the same time, leaving all the computational work to humans makes it slower, less scientific, and quick to degrade in quality because the human mind cannot keep up with the quantum leap that big data is undergoing. (emphasis added)

deeply problematic.

The “human mind” is responsible for the creation of “big data” and our biases and assumptions are built into the hardware and software that process it.

Why should that be any different for the questions we ask of “Big Data?” Who is there to pose questions other than the “human mind?” Or to set into motion a framework that asks questions within parameters that originated with a “human mind?”

Claims that “…the human mind cannot keep up…” are references to “your” human mind, not the “human mind” of the person making the statement. They are about to claim to have the correct interpretation of some data set. Just as statisticians (or, to be fair, people claiming to be statisticians) for years claimed there was no link between smoking and lung cancer.

Claims about the human (your) brain are always made with an agenda. An agenda that puts some “fact,” some policy, some principle beyond being questioned. Identify that “fact,” policy, or principle because it is where their evidence is the weakest, else they would not try to put it beyond question.

Configuring IPython Notebook Support for PySpark

Filed under: Python,Spark — Patrick Durusau @ 4:50 pm

Configuring IPython Notebook Support for PySpark by John Ramey.

From the post:

Apache Spark is a great way for performing large-scale data processing. Lately, I have begun working with PySpark, a way of interfacing with Spark through Python. After a discussion with a coworker, we were curious whether PySpark could run from within an IPython Notebook. It turns out that this is fairly straightforward by setting up an IPython profile.

A quick setup note for a useful configuration of PySpark and the IPython Notebook.
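
The gist of the approach is an IPython profile whose startup script puts PySpark on the Python path. A rough sketch of such a file follows; the profile name, the SPARK_HOME fallback, and the py4j version are assumptions you would adjust to your install, so check John’s post for the exact steps.

```python
# ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys

# Locate the Spark distribution (fallback path is an assumption; set SPARK_HOME).
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")

# Put PySpark and its py4j bridge on the path (py4j version may differ).
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Run the PySpark shell bootstrap, which binds a SparkContext to `sc`.
exec(open(os.path.join(spark_home, "python/pyspark/shell.py")).read())
```

With a profile like that in place, starting the notebook server under it should give you notebooks with `sc` ready to use.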

A good example of it being unnecessary to solve every problem to make a useful contribution.

Enjoy!

Hortonworks Establishes Data Governance Initiative

Filed under: Data Governance,Hadoop,Hortonworks — Patrick Durusau @ 9:53 am

Hortonworks Establishes Data Governance Initiative

From the post:

Hortonworks® (NASDAQ:HDP), the leading contributor to and provider of enterprise Apache™ Hadoop®, today announced the creation of the Data Governance Initiative (DGI). DGI will develop an extensible foundation that addresses enterprise requirements for comprehensive data governance. In addition to Hortonworks, the founding members of DGI are Aetna, Merck, and Target and Hortonworks’ technology partner SAS.

Enterprises adopting a modern data architecture must address certain realities when legacy and new data from disparate platforms are brought under management. DGI members will work with the open source community to deliver a comprehensive solution; offering fast, flexible and powerful metadata services, deep audit store and an advanced policy rules engine. It will also feature deep integration with Apache Falcon for data lifecycle management and Apache Ranger for global security policies. Additionally, the DGI solution will interoperate with and extend existing third-party data governance and management tools by shedding light on the access of data within Hadoop. Further DGI investment roadmap phases will be released in the coming weeks.

Supporting quotes

“This joint engineering initiative is another pillar in our unique open source development model,” said Tim Hall, vice president, product management at Hortonworks. “We are excited to partner with the other DGI members to build a completely open data governance foundation that meets enterprise requirements.”

“As customers are moving Hadoop into corporate data and processing environments, metadata and data governance are much needed capabilities. SAS participation in this initiative strengthens the integration of SAS data management, analytics and visualization into the HDP environment and more broadly it helps advance the Apache Hadoop project. This additional integration will give customers better ability to manage big data governance within the Hadoop framework,” said SAS Vice President of Product Management Randy Guard.

Further reading

Enterprise Hadoop: www.hortonworks.com/hadoop

Apache Falcon: http://hortonworks.com/hadoop/falcon/

Hadoop and a Modern Data Architecture: www.hortonworks.com/mda


Quite possibly an opportunity to push for topic map-like capabilities in an enterprise setting.

That will require affirmative action on the part of the TM community, as it is unlikely Hortonworks and others will educate themselves on topic maps.

Suggestions?

Best of the Visualization Web… December 2014

Filed under: Graphics,Visualization — Patrick Durusau @ 9:38 am

Best of the Visualization Web… December 2014 by Andy Kirk.

From the post:

At the end of each month I pull together a collection of links to some of the most relevant, interesting or thought-provoking web content I’ve come across during the previous month. Here’s the latest collection from December 2014.

Andy lists:

  • Forty (40) links to visualizations.
  • Thirteen (13) links to articles.
  • Seven (7) links to learning and development.
  • Seven (7) links on visualization as a subject.
  • Six (6) sundry links that may be of interest.

Out of seventy-three (73) links, not one visual!

I rather like that because you can scan Andy’s one-line descriptions far faster than you could scroll through a sample from each site.

Worth bookmarking and returning to on a regular basis.

February 1, 2015

Mapping the Blind Spots:…

Filed under: Data Science,Mapping,Maps,Privacy — Patrick Durusau @ 4:48 pm

Mapping the Blind Spots: Developer Unearths Secret U.S. Military Bases by Lorenzo Franceschi-Bicchierai.

From the post:

If you look closely enough on Google or Bing Maps, some places are blanked out, hidden from public view. Many of those places disguise secret or sensitive American military facilities.

The United States military has a foothold in every corner of the world, with military bases on every continent. It’s not even clear how many there are out there. The Pentagon says there are around 5,000 in total, and 598 in foreign countries, but those numbers are disputed by the media.

But how do these facilities look from above? To answer that question, you first need to locate the bases. Which, as it turns out, is relatively easy.

That’s what Josh Begley, a data artist, found out when he embarked on a project to map all known U.S. military bases around the world, collect satellite pictures of them using Google Maps and Bing Maps, and display them all online.

The project, which he warns is ongoing, was inspired by Trevor Paglen’s book “Blank Spots on the Map” which goes inside the world of secret military bases that are sometimes censored on maps.

A great description of how to combine public data to find information that others would prefer went unfound.

I suspect the area is well enough understood to make a great high school science fair project, particularly if countries that aren’t as open as the United States were used as targets for filling in the blank spaces. It would involve obtaining public maps for a country, determining what areas are “blank,” photo analysis of imagery, and correlation with press and other reports.

Or detection of illegal cutting of forests, mining, or other ecological crimes. All of those are too large scale to be secret.

Better imagery is only a year or two away, perhaps sufficient to start tracking polluters who truck industrial wastes to particular states for dumping.

With satellite/drone imagery and enough eyes, no crime is secret.

The practices of illegal forestry, mining, pollution, virtually any large scale outdoor crime will wither under public surveillance.

That might not be a bad trade-off in terms of privacy.

Data Sources on the Web

Filed under: Data,R — Patrick Durusau @ 4:23 pm

Data Sources on the Web

From the post:

The following list of data sources has been modified as of January 2015. Most of the data sets listed below are free, however, some are not.

If an (R!) appears after source this means that the data are already in R format or there exist R commands for directly importing the data from R. (See http://www.quantmod.com/examples/intro/ for some code.) Otherwise, i have limited the list to data sources for which there is a reasonably simple process for importing csv files. What follows is a list of data sources organized into categories that are not mutually exclusive but which reflect what's out there.

Want to add to or update this list? Send to mran@revolutionanalytics.com

As you know, there are any number of data lists on the Net. This one is different: it is a maintained data list.
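
The quantmod example above is R, but the csv-only entries are just as easy from Python. A minimal sketch, with a hypothetical URL standing in for any link on the list:

```python
import pandas as pd

# Hypothetical URL standing in for any csv entry on the list.
url = "https://example.com/some-dataset.csv"

df = pd.read_csv(url)  # pandas fetches the URL and parses the csv
print(df.head())
```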

Enjoy!

I first saw this in a tweet by David Smith.

Parallel Programming with GPUs and R

Filed under: Data Science,GPU,Parallel Programming — Patrick Durusau @ 1:49 pm

Parallel Programming with GPUs and R by Norman Matloff.

From the post:

You’ve heard that graphics processing units — GPUs — can bring big increases in computational speed. While GPUs cannot speed up work in every application, the fact is that in many cases it can indeed provide very rapid computation. In this tutorial, we’ll see how this is done, both in passive ways (you write only R), and in more direct ways, where you write C/C++ code and interface it to R.

Norman provides as readable an introduction to GPUs as I have seen in a while, a quick overview of R packages for accessing GPUs, and then a quick look at writing CUDA code and the problems you may have compiling it.

Of particular note is Norman’s reference to a draft of his new book, Parallel Computation for Data Science, which introduces parallel computing with R.

It won’t be long before parallel or not processing will be a detail hidden from the average programmer. Until then, see this post and Norman’s book for parallel processing and R.

Harry Potter eBooks

Filed under: Books,Literature,Publishing — Patrick Durusau @ 1:26 pm

All the Harry Potter ebooks are now on subscription site Oyster by Laura Hazard Owen.

Laura reports that the Harry Potter books are available on Oyster and Amazon. She says that Oyster has the spin-off titles from the original series, while Amazon does not.

Both offer $9.95 per month subscription rates, with Oyster claiming “over a million” books and Amazon over 700,000. After reading David Mason’s How many books will you read in your lifetime?, I am not sure the difference in raw numbers matters much.

Access to electronic texts will certainly make creating topic maps for popular literature a good deal easier.

Enjoy!

Singularity University and Google

Filed under: Education,Government,Politics — Patrick Durusau @ 11:30 am

Singularity University Announces Google Support for Increased Global Access and Diversity in Tech

From the post:

Singularity University (SU), the technology-focused education institute and global business accelerator has announced a new multi-million dollar agreement with Google aimed at breaking down barriers to technology innovation by creating opportunities for a more diverse group of entrepreneurs from around the world.

Through the agreement, Google will provide $1.5 million annually for the next two years to help fund qualified and selected candidates to SU’s flagship Graduate Studies Program (GSP) – a 10-week immersive experience that educates and empowers the best minds to use exponential technologies to solve the world’s greatest challenges. While SU’s sponsored Global Impact Competitions (GIC) winners will continue to comprise a substantial portion of the GSP class, the new Google funding will enable SU to also make the remaining seats in the program available free of charge to direct applicants. GSP participants are engaged in twelve tracks of exponential technology development and mentored by leaders and investors in the technology sector with the focus of abating poverty and creating innovative solutions in the areas of clean energy, water, education, security, and healthcare.

Applications are now open for the 2015 Graduate Studies Program through SU’s Direct Admission online application: http://apply2015.singularityu.org/

Recently MapR made Hadoop training and certification available for free, and now Google is supporting Singularity University to make its Graduate Studies Program free as well.

A marked contrast to state-supported colleges and universities, where tuition continues to rise faster than inflation. Not to mention educational loans, made at no risk to lenders, which continue to burden students for years after graduation.

What does the “free market” know about the return on education that the “public sector” seems to have forgotten?

Rather than investing $trillions in the pursuit of terrorist bogeymen, paying off all student debt and making higher education free for everyone would be a much better investment.

DJA Newsletter [If you can’t see the data, it’s not news, just rumor.]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 11:04 am

DJA Newsletter: The best of Data Journalism every month

From the about page:

The Global Editors Network is a cross-platform community of editors-in-chief and media innovators committed to sustainable, high-quality journalism, empowering newsrooms through a variety of programmes designed to inspire, connect and share. Our online community allows media innovators from around the world to interact and collaborate on projects created through our programmes. The GEN Board Members support this mission and have signed the GEN Manifesto.

We are driven by a journalistic imperative and a common goal: Content and Engagement First. To that end, we support all kinds of organizations and media outlets, to define a vision for the future of journalism and enhance its quality through innovation and cooperation. Freedom of information and independence of the news media are, and will remain, the main credo of the Global Editors Network and we will back all efforts to enhance press freedom worldwide.

The links in this month’s newsletter:

  1. Every active satellite orbiting earth
  2. Islam in Europe – the gap between perceptions and reality
  3. What news sources does China block?
  4. What happens when you scrape AirBnB data?
  5. RiseUp revolutions

Looking forward to seeing more issues of the DJA newsletter!
