Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

December 6, 2012

Hadoop for Dummies

Filed under: Hadoop,MapReduce — Patrick Durusau @ 11:37 am

Hadoop for Dummies by Robert D. Schneider.

Courtesy of IBM, it’s what you think it is.

I am torn between thinking that educating c-suite executives is a good idea and wondering what sort of mis-impressions will follow from that education.

I suppose that could be an interesting sociology experiment. IT departments could forward the link to their c-suite executives and then keep track of the number and type of mis-impressions.

Collected at some common website, organized by industry, the results could create a baseline for c-suite explanations of technology. 😉

Advanced Data Analysis from an Elementary Point of View

Filed under: Data Analysis,Mathematics,Statistics — Patrick Durusau @ 11:35 am

Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi. (UPDATE: 2014 draft)

From the Introduction:

These are the notes for 36-402, Advanced Data Analysis, at Carnegie Mellon. If you are not enrolled in the class, you should know that it’s the methodological capstone of the core statistics sequence taken by our undergraduate majors (usually in their third year), and by students from a range of other departments. By this point, they have taken classes in introductory statistics and data analysis, probability theory, mathematical statistics, and modern linear regression (“401”). This class does not presume that you have learned but forgotten the material from the pre-requisites; it presumes that you know that material and can go beyond it. The class also presumes a firm grasp on linear algebra and multivariable calculus, and that you can read and write simple functions in R. If you are lacking in any of these areas, now would be an excellent time to leave.

36-402 is a class in statistical methodology: its aim is to get students to understand something of the range of modern[1] methods of data analysis, and of the considerations which go into choosing the right method for the job at hand (rather than distorting the problem to fit the methods the student happens to know). Statistical theory is kept to a minimum, and largely introduced as needed.

[Footnote 1] Just as an undergraduate “modern physics” course aims to bring the student up to about 1930 (more specifically, to 1926), this class aims to bring the student up to about 1990.

A very recent introduction to data analysis. Shalizi includes a list of concepts in the introduction that are best mastered before tackling this material.

According to footnote 1, when you have mastered this material, you have another twenty-two years to make up in general and on your problem in particular.

Still, knowing it cold will put you ahead of a lot of data analysis you are going to encounter.

I first saw this in a tweet by Gene Golovchinsky.

December 5, 2012

Army intelligence awards $149 million IT contract to SAIC [Example of Insanity]

Filed under: Marketing,Topic Maps — Patrick Durusau @ 5:51 pm

Army intelligence awards $149 million IT contract to SAIC

I assume you have heard Einstein’s definition of insanity?

Insanity: doing the same thing over and over again and expecting different results.

You may remember SAIC for soaking the FBI for $170 million on the Virtual Case File project.

A somewhat older piece covers SAIC up until 2008.

Or another report, which:

Found sufficient evidence that the company acted with reckless disregard or deliberate ignorance;

So, the Army is expecting some result other than failure from SAIC.

I mention this to suggest a topic map application that has a link labeled “Failures, Fraud, Other Crimes of Contractor’s Name” displayed on the contract award page.

It does raise a timeline issue, since it would be unfair to list failures, frauds and other crimes that occurred after the contract award.

Suggestions?

Don’t feed the semantic black holes [Dangers of Semantic Promiscuity]

Filed under: Linked Data,Security,Virus — Patrick Durusau @ 4:42 pm

Don’t feed the semantic black holes by Bernard Vatant.

From the post:

If I remember correctly, it was at Knowledge Technologies 2001 that Ann Wrightson explained to us, during the informal RDF-Topic Maps session, how to build a semantic virus for Topic Maps through abuse of subject indicators. At the time OWL and its now infamous owl:sameAs were not yet around, but the idea was identical: if several “topics” A, B, C, … indicate the same “subject” X, then they should be merged into a single topic. In linked data land ten years later it’s the same story: if RDF descriptions A, B, C … declare an owl:sameAs link to X, then A and B are merged together with the current description of X.

Hence the very simple semantic virus concept :

1. Harvest all the topic identifiers you can grab from distributed topic maps (read today: URIs from distributed linked data).

2. Publish a new topic map adding a common subject indicator to every topic description you have harvested (read today: add owl:sameAs X to all resource descriptions).

Now if you query the resulting database for the description of any topic (resource) in it, you get all the elements of description of everything about anything. The whole map collapses onto a single heavy and meaningless node. An irreversible semantic collapse.

True, but that’s like having unprotected sex with a hooker in the bushes near a truck stop in India.

Reliance on non-verified sources of data is like unprotected sex, except for the lack of enjoyable parts.

As Bernard points out, this can lead to very bad consequences.

I would not wait for Bernard’s provenance indication using named graphs. Do you think people who would create malicious owl:sameAs statements would also create false statements about their graphs? Gasp! 😉

Trusting evil-doers to respect provenance conventions meant to exclude their content is a low percentage bet.

One solution, possibly a commercially viable one, would be to harvest and test linked data, becoming a canonical and trusted source for that data. Any semantic black holes would be detected and blocked before reaching you.

A prophylactic service as it were.
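
The harvesting and testing does not need to be sophisticated to be useful. Here is a minimal sketch (Python with rdflib; the file name and threshold are hypothetical, purely for illustration) that counts owl:sameAs fan-in and flags suspiciously popular merge targets:

    from collections import Counter

    from rdflib import Graph
    from rdflib.namespace import OWL

    # Load a local copy of the harvested linked data (file name is hypothetical).
    g = Graph()
    g.parse("harvested.nt", format="nt")

    # Count how many owl:sameAs statements point at each target.
    fan_in = Counter(obj for _, obj in g.subject_objects(OWL.sameAs))

    # Targets with an implausibly large fan-in are candidate "black holes":
    # merging around them would collapse many unrelated descriptions into one node.
    THRESHOLD = 1000  # arbitrary cut-off, chosen for illustration only
    suspects = {t: n for t, n in fan_in.items() if n > THRESHOLD}

    for target, count in sorted(suspects.items(), key=lambda kv: -kv[1]):
        print(target, "is the target of", count, "owl:sameAs statements")

A high fan-in is not proof of malice, but it is cheap to compute and a reasonable trigger for closer inspection before merging.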

YASGUI: Web-based SPARQL client with bells ‘n whistles

Filed under: RDF,SPARQL — Patrick Durusau @ 4:13 pm

YASGUI: Web-based SPARQL client with bells ‘n whistles

From the post:

A few months ago Laurens Rietveld was looking for a query interface from which he could easily query any other SPARQL endpoint.

But he couldn’t find any that fit his requirements.

So he decided to make his own!

Give it a try at: http://aers.data2semantics.org/sparql/

Future work (next year probably):

In case you are interested in SPARQL per se, or want to extract information for re-use in a topic map, this could be interesting.

Good to see mention of our friends at Mondeca.

PragPub #42, December 2012

Filed under: Haskell,Programming — Patrick Durusau @ 3:48 pm


PragPub #42, December 2012, available as HTML, PDF, epub, or mobi.

Where you will find:

Web Programming in Haskell, Part One by Paul Callaghan.

The next few articles in this series on the Haskell language will look at web programming via Haskell. Web programming is supported in various ways, from low-level libraries for the basic operations like communicating over http and creation of html documents, up to sophisticated frameworks for building apps and performant web servers for running the apps. January’s article will look at the frameworks, primarily Yesod [U1]. This article will look at Fay [U2], which is a Haskell alternative to CoffeeScript.

Replicating the libraries and frameworks of other languages is not too difficult. The interesting question—following a key theme of this series of articles—is how we can use Haskell’s facilities to improve the experience of web programming. For example, less coding and better maintainability via more powerful abstractions, more safety through using the language style and type system to avoid certain problems, or even improved performance by supporting certain kinds of optimization behind the scenes. Even more interesting is to understand the cases and phenomena where Haskell isn’t enough and where more powerful techniques, like dependent types, might provide additional leverage.

The closer web browsers come to being the default app interface, the more important they become for delivery of topic maps.

This issue of PragPub, as always, has several items you will find interesting.

An upcoming issue will have a Raspberry Pi project.

Enjoy!

Algorithms: Design and Analysis, Part 2

Filed under: Algorithms,CS Lectures — Patrick Durusau @ 12:31 pm

Algorithms: Design and Analysis, Part 2 by Tim Roughgarden. (Coursera)

From the course description:

In this course you will learn several fundamental principles of advanced algorithm design: greedy algorithms and applications; dynamic programming and applications; NP-completeness and what it means for the algorithm designer; the design and analysis of heuristics; and more.

The course started December 3, 2012, so if you are going to join, best do so soon.
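
For a sense of the dynamic programming portion, here is a classic textbook example (my own sketch, not taken from the course materials): the 0/1 knapsack problem solved with a bottom-up table.

    def knapsack(values, weights, capacity):
        """Maximum value achievable within capacity, taking each item at most once."""
        best = [0] * (capacity + 1)  # best[w] = best value with total weight <= w
        for value, weight in zip(values, weights):
            # iterate weights downward so each item is used at most once
            for w in range(capacity, weight - 1, -1):
                best[w] = max(best[w], best[w - weight] + value)
        return best[capacity]

    print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))  # 220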

Update: Introduction to Complexity [Santa Fe Institute]

Filed under: Cellular Automata,Complexity,Fractals — Patrick Durusau @ 12:19 pm

The Santa Fe Institute has released the FAQ and syllabus for its “Introduction to Complexity” course in 2013.

The course starts January 28, 2013 and will last for eleven (11) weeks.

Lecture units:

  1. What is Complexity?
  2. Dynamics, Chaos, and Fractals
  3. Information, Order, and Randomness
  4. Cellular Automata
  5. Genetic Algorithms
  6. Self-Organization in Nature
  7. Modeling Social Systems
  8. Networks
  9. Scaling
  10. Cities as Complex Systems
  11. Course Field Trip; Final Exam

Funding permitting, there may be a Complexity Part II in the summer of 2013.

Your interest and participation in this course may help drive the appearance of the second course.

An earlier post on the course: Introduction to Complexity [Santa Fe Institute].

Normalizing company names with SPARQL and DBpedia

Filed under: DBpedia,RDF,SPARQL — Patrick Durusau @ 12:01 pm

Normalizing company names with SPARQL and DBpedia

Bob DuCharme writes:

Wikipedia page redirection data, waiting for you to query it.

If you send your browser to http://en.wikipedia.org/wiki/Big_Blue, you’ll end up at IBM’s page, because Wikipedia knows that this nickname usually refers to this company. (Apparently, it’s also a nickname for several high schools and universities.) This data pointing from nicknames to official names is also stored in DBpedia, which means that we can use SPARQL queries to normalize company names. You can use the same technique to normalize other kinds of names—for example, trying to send your browser to http://en.wikipedia.org/wiki/Bobby_Kennedy will actually send it to http://en.wikipedia.org/wiki/Robert_F._Kennedy—but a query that sticks to one domain will have a simpler job. Description Logics and all that.

As always Bob is on the cutting edge of the use of a markup standard!

Possible topic map analogies:

  • create a second name cluster and the “normalized name” is an additional base name
  • move the “nickname” to a variant name (scope?) and update the base name to be the normalized name (with changes to sort/display as necessary)

I am assuming that Bob’s lang(?redirectsTo) = "en" operates like scope in topic maps.

Except that scope in topic maps is represented by one or more topics, which means merging can occur between topics that represent the same language.
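
If you want to experiment with the normalization itself, here is a rough approximation of the idea (Python with SPARQLWrapper against the public DBpedia endpoint; this is my sketch, not Bob's exact query):

    from SPARQLWrapper import SPARQLWrapper, JSON

    def normalize_name(nickname):
        """Follow a DBpedia redirect (e.g. Big_Blue -> IBM) and return the English label."""
        sparql = SPARQLWrapper("http://dbpedia.org/sparql")
        sparql.setQuery("""
            PREFIX dbo: <http://dbpedia.org/ontology/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT ?label WHERE {
                <http://dbpedia.org/resource/%s> dbo:wikiPageRedirects ?target .
                ?target rdfs:label ?label .
                FILTER (lang(?label) = "en")
            }
        """ % nickname)
        sparql.setReturnFormat(JSON)
        bindings = sparql.query().convert()["results"]["bindings"]
        return bindings[0]["label"]["value"] if bindings else None

    print(normalize_name("Big_Blue"))  # expected: IBM

The lang() filter is where the scope analogy comes in: the same normalization machinery yields a different "base name" depending on the language you filter for.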

Mathematical Writing

Filed under: Mathematics,Writing — Patrick Durusau @ 7:09 am

Mathematical Writing by Don Knuth, Tracy Larrabee, and Paul M. Roberts.

From the course catalog:

CS 209. Mathematical Writing—Issues of technical writing and the effective presentation of mathematics and computer science. Preparation of theses, papers, books, and “literate” computer programs. A term paper on a topic of your choice; this paper may be used for credit in another course.

Stanford University, Fall, 1987.

An admixture of notes, citations, examples, and war stories by several legends in the world of CS.

While making a number of serious points about writing, the materials are also highly entertaining.

I first saw this in Christophe Lalanne’s A bag of tweets / November 2012.

The Elements of Statistical Learning (2nd ed.)

Filed under: Machine Learning,Mathematics,Statistical Learning,Statistics — Patrick Durusau @ 6:50 am

The Elements of Statistical Learning (2nd ed.) by Trevor Hastie, Robert Tibshirani and Jerome Friedman. (PDF)

The authors note in the preface to the first edition:

The field of Statistics is constantly challenged by the problems that science and industry brings to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of “data mining”; statistical and computational problems in biology and medicine have created “bioinformatics.” Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and understand “what the data says.” We call this learning from data.

I’m sympathetic to that sentiment but with the caveat that it is our semantic expectations of the data that give it any meaning to be “learned.”

Data isn’t lurking outside our door with “meaning” captured separate and apart from us. Our fancy otherwise obscures our role in the origin of the “meaning” we attach to data, in part to bolster the claim that the “facts/data say….”

It is we who take up the gauntlet for our mute friends, facts/data, and make claims on their behalf.

If we recognized those as our claims, perhaps we would be more willing to listen to the claims of others. Perhaps.

I first saw this in a tweet by Michael Conover.

Fast Parallel Sorting Algorithms on GPUs

Filed under: Algorithms,GPU,Parallel Programming,Sorting — Patrick Durusau @ 6:00 am

Fast Parallel Sorting Algorithms on GPUs by Bilal Jan, Bartolomeo Montrucchio, Carlo Ragusa, Fiaz Gul Khan, Omar Khan.

Abstract:

This paper presents a comparative analysis of the three widely used parallel sorting algorithms: OddEven sort, Rank sort and Bitonic sort in terms of sorting rate, sorting time and speed-up on CPU and different GPU architectures. Alongside we have implemented novel parallel algorithm: min-max butterfly network, for finding minimum and maximum in large data sets. All algorithms have been implemented exploiting data parallelism model, for achieving high performance, as available on multi-core GPUs using the OpenCL specification. Our results depicts minimum speed-up of 19x of bitonic sort against oddeven sorting technique for small queue sizes on CPU and maximum of 2300x speed-up for very large queue sizes on Nvidia Quadro 6000 GPU architecture. Our implementation of full-butterfly network sorting results in relatively better performance than all of the three sorting techniques: bitonic, odd-even and rank sort. For min-max butterfly network, our findings report high speed-up of Nvidia quadro 6000 GPU for high data set size reaching 2^24 with much lower sorting time.
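
If bitonic sort is new to you, the sequential version of the network is compact enough to read in a few lines. A sketch (Python, power-of-two input lengths only, no GPU or OpenCL involved):

    def bitonic_sort(xs, ascending=True):
        """Sort a list whose length is a power of two using the bitonic network."""
        if len(xs) <= 1:
            return list(xs)
        half = len(xs) // 2
        # build a bitonic sequence: first half ascending, second half descending
        first = bitonic_sort(xs[:half], True)
        second = bitonic_sort(xs[half:], False)
        return bitonic_merge(first + second, ascending)

    def bitonic_merge(xs, ascending):
        if len(xs) <= 1:
            return list(xs)
        xs = list(xs)
        half = len(xs) // 2
        # compare-and-swap pairs a fixed distance apart; these comparisons are
        # independent of one another, which is what makes the network attractive
        # for data-parallel hardware like GPUs
        for i in range(half):
            if (xs[i] > xs[i + half]) == ascending:
                xs[i], xs[i + half] = xs[i + half], xs[i]
        return bitonic_merge(xs[:half], ascending) + bitonic_merge(xs[half:], ascending)

    print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]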

Is there a GPU in your topic map processing future?

I first saw this in a tweet by Stefano Bertolo.

Impala Beta (0.3) + Cloudera Manager 4.1.2 [Get’m While They’re Hot!]

Filed under: Cloudera,Hadoop,Impala,MapReduce — Patrick Durusau @ 5:46 am

Cloudera Impala Beta (version 0.3) and Cloudera Manager 4.1.2 Now Available by Vinithra Varadharajan.

If you are keeping your Hadoop ecosystem skills up to date, drop by Cloudera for the latest Impala beta and a new release of Cloudera Manager.

Vinithra reports that new releases of Impala are going to drop every two to four weeks.

You can either wait for the final release of Impala or read along and contribute to the final product with your testing and comments.

December 4, 2012

Ontopia Runs on Raspberry Pi [This Rocks!]

Filed under: Ontopia,Parallel Programming,Supercomputing — Patrick Durusau @ 3:18 pm

Ontopia Runs on Raspberry Pi by Kevin Trainor.

From the post:

I am pleased to report that I have had the Ontopia Topic Maps software running on my Raspberry Pi for the past week. Ontopia is a suite of open source tools for building, maintaining and deploying Topic Maps-based applications. The Raspberry Pi is an ultra-affordable ARM GNU/Linux box based upon the work of the Raspberry Pi Foundation. My experience in running the out-of-the-box Ontopia apps (Ontopoly topic map editor, Omnigator topic map browser, and Vizigator topic map vizualizer) has been terrific. Using the Raspberry Pi to run the Apache Tomcat server that hosts the Ontopia software, response time is as good or better than I have experienced when hosting the Ontopia software on a cloud-based Linux server at my ISP. Topic maps open quickly in all three applications and navigation from topic to topic within each application is downright snappy.

As you will see in my discussion of testing below, I have experienced good results with up to two simultaneous users. So, my future test plans include testing with more simultaneous users and testing with the Ontopia RDBMS Backend installed. Based upon the performance that I have experienced so far, I have high hopes. Stay tuned for further reports.

What a great way to introduce topic maps to experimenters!

Thanks Kevin!

Awaiting future results! (And for a Raspberry Pi to arrive!)

See also: A Raspberry Pi Supercomputer

Continuum Unleashes Anaconda on Python Analytics Community

Filed under: Analytics,Python — Patrick Durusau @ 1:05 pm

Continuum Unleashes Anaconda on Python Analytics Community

From the post:

Python-based data analytics solutions and services company, Continuum Analytics, today announced the release of the latest version of Anaconda, its collection of libraries for Python that includes Numba Pro, IOPro and wiseRF all in one package.

Anaconda enables large-scale data management, analysis, and visualization for business intelligence, scientific analysis, engineering, machine learning, and more. The latest release, version 1.2.1, includes improved performance and feature enhancements for Numba Pro and IOPro.

Available for Windows, Mac OS X and Linux, Anaconda includes more than 80 popular numerical and scientific Python libraries used by scientists, engineers and data analysts, with a single integrated and flexible installer. The company says its goal is to seamlessly support switching between multiple versions of Python and other packages, via a “Python environments” feature that allows mixing and matching different versions of Python, Numpy and Scipy.

New features and upgrades in the latest version of Anaconda include performance and feature enhancements to Numba Pro and IOPro, improved conda command and in addition, Continuum has added Qt to Linux versions and has also added mdp, MLTK and pytest.

Oh, you might like the Continuum Analytics link.

And the direct Anaconda link as well.

I expect people to go elsewhere after reading my analysis or finding a resource of interest.

Isn’t that what the web is supposed to be about?

INSA Highlights Increasing Importance of Open Source

Filed under: Government,Government Data,Intelligence — Patrick Durusau @ 12:52 pm

INSA Highlights Increasing Importance of Open Source

From Recorded Future*:

The Intelligence and National Security Alliance (INSA) Rebalance Task Force recently released its new white paper “Expectations of Intelligence in the Information Age”.

We’re obviously big fans of open source analysis, so some of the lead observations reported by the task force really hit home. Here they are, as written by INSA:

  • The heightened expectations of decision makers for timely strategic warning and current intelligence can be addressed in significant ways by the IC through “open sourcing” of information.
  • “Open sourcing” will not replace traditional intelligence; decision makers will continue to expect the IC to extract those secrets others are determined to keep from the United States.
  • However, because decision makers will access open sources as readily as the IC, they will expect the IC to rapidly validate open source information and quickly meld it with that derived from espionage and traditional sources of collection to provide them with the knowledge desired to confidently address national security issues and events.

You can check out an interactive version of the full report here, and take a moment to visit Recorded Future to see how we’re embracing this synthesis of open source and confidential intelligence.

I have confidence that the IC will find ways to make its collection, recording, analysis and synthesis of open source information and its traditional intelligence sources incompatible with each other.

After all, we are less than five (5) years away from some unknown level of sharing of traditional intelligence data: Read’em and Weep.

Let’s say there is some sort of intelligence sharing by 2017 (2012 + 5). That’s sixteen (16) years after 9/11.

Being mindful that sharing doesn’t mean the data is integrated into the information flow of the respective agencies.

How does that saying go?

Once is happenstance.

Twice is coincidence.

Three times is enemy action?

Where does the continuing failure to share intelligence fall on that list?

(Topic maps can’t provide the incentives to make sharing happen, but they do make sharing possible for people with incentives to share.)


* I listed the entry as originating from Recorded Future. Why some blog authors find it difficult to identify themselves I cannot say.

Python Scientific Lecture Notes

Filed under: Programming,Python — Patrick Durusau @ 12:24 pm

Python Scientific Lecture Notes edited by Valentin Haenel, Emmanuelle Gouillart and Gaël Varoquaux.

From the description:

Teaching material on the scientific Python ecosystem, a quick introduction to central tools and techniques. The different chapters each correspond to a 1 to 2 hours course with increasing level of expertise, from beginner to expert.

Coverage? Here is the top level of the table of contents:

1. Getting started with Python for science
1.1. Scientific computing with tools and workflow
1.2. The Python language
1.3. NumPy: creating and manipulating numerical data
1.4. Getting help and finding documentation
1.5. Matplotlib: plotting
1.6. Scipy : high-level scientific computing
2. Advanced topics
2.1. Advanced Python Constructs
2.2. Advanced Numpy
2.3. Debugging code
2.4. Optimizing code
2.5. Sparse Matrices in SciPy
2.6. Image manipulation and processing using Numpy and Scipy
2.7. Mathematical optimization: finding minima of functions
2.8. Traits
2.9. 3D plotting with Mayavi
2.10. Sympy : Symbolic Mathematics in Python
2.11. scikit-learn: machine learning in Python

The contents are available in single and double sided PDF, HTML and example files, plus source code.
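
As a small taste of the material in sections 1.3 and 1.5 (my example, not one from the notes; assumes NumPy and Matplotlib are installed):

    import numpy as np
    import matplotlib.pyplot as plt

    # creating and manipulating numerical data (section 1.3)
    x = np.linspace(0, 2 * np.pi, 200)
    noisy = np.sin(x) + np.random.normal(scale=0.1, size=x.shape)

    # plotting (section 1.5)
    plt.plot(x, np.sin(x), label="sin(x)")
    plt.plot(x, noisy, ".", alpha=0.4, label="noisy samples")
    plt.legend()
    plt.show()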

I first saw this in a tweet from Scientific Python.

New to Hadoop

Filed under: Cloudera,Hadoop,MapReduce — Patrick Durusau @ 12:08 pm

New to Hadoop

Cloudera has organized a seven step program for learning Hadoop!

  1. Read Up on Background
  2. Install Locally, Install a VM, or Spin Up on Cloud
  3. Explore Tutorials
  4. Get Trained Up
  5. Read Books
  6. Contribute!
  7. Participate!

It doesn’t list every possible resource but all the ones listed are high quality.

Following this program will build a solid basis for exploring the Hadoop ecosystem on your own.

How to Contribute to Apache Hadoop Projects, in 24 Minutes

Filed under: Hadoop,Programming — Patrick Durusau @ 11:54 am

How to Contribute to Apache Hadoop Projects, in 24 Minutes by Justin Kestelyn.

From the webpage:

So, you want to report a bug, propose a new feature, or contribute code or doc to Apache Hadoop (or a related project), but you don’t know what to do and where to start? Don’t worry, you’re not alone.

Let us help: in this 24-minute screencast, Clouderan Jeff Bean (@jwfbean) offers a step-by-step tutorial that explains why and how to contribute. Apache JIRA ninjas need not view, but anyone else with slight (or less) familiarity with that curious beast will find this information very helpful.

I have mentioned a number of Hadoop ecosystem projects and this is a nice overview of how to contribute to those projects.

Other than the examples, the advice is generally useful for any Apache project (or other projects for that matter).

I first saw this in a tweet from Cloudera.

node-webkit

Filed under: CSS3,HTML5,Interface Research/Design,Javascript — Patrick Durusau @ 10:14 am

node-webkit

From the webpage:

node-webkit is an app runtime based on Chromium and node.js. You can write native apps in HTML and Javascript with node-webkit. It also lets you call Node.js modules directly from the DOM and enables a new way of writing native applications with all Web technologies.

Will HTML5 and Javascript, via apps like node-webkit, free users from developer-based interfaces?

Developer-based interfaces are intended to be useful for others, or at least I suspect so, but quite often fall short of the mark.

Apps like node-webkit should encourage rapid prototyping and less reluctance to trash yesterday’s interface code. (I said “should,” whether it will or not remains to be seen.)

Rather than a “full featured” topic map editor, how would you divide the task of authoring a topic map into pieces?

I first saw this in Christophe Lalanne’s A bag of tweets / November 2012.

NodeBox

Filed under: Graphics,NodeBox,Visualization — Patrick Durusau @ 9:53 am

NodeBox

From the webpage:

Using our open-source tools we enable designers to automate boring production challenges, visualize large sets of data and access the raw power of the computer without thinking in ones and zeroes. Our tools integrate with traditional design applications and run on many platforms.

NodeBox 3 requires either Mac OS or Windows.

NodeBox makes it easy to do data visualisations, generative design and complex production challenges.

NodeBox OpenGL

NodeBox for OpenGL is a free, cross-platform library for generating 2D animations with Python programming code. It is built on Pyglet and adopts the drawing API from NodeBox for Mac OS X (http://nodebox.net). It has built-in support for paths, layers, motion tweening, hardware-accelerated image effects, simple physics and interactivity.

I was reminded about NodeBox by a tweet from Christophe Lalanne’s A bag of tweets / November 2012.

PS: While looking at NodeBox OpenGL, you will enjoy reading the description of “City in a Bottle.”

The game environment is based on the principle of emergence (Goldstein, 1999). Organisms (plants and insects) start off with basic behaviouristic rules and goals. If an opponent is edible, attack it. If an opponent is stronger, flee. When cornered, fight back. Hide in a flock of relatives to minimize the chance of being singled out. Follow a food trail marked by a relative. Expand and defend a productive environment. Grow colourful feathers/flowers to incite reproduction.

Complex social behavior then emerges by itself as organisms interact with each other. Species with a good strategy will survive and evolve over time, will adapt, will look different. The gaming environment changes procedurally, there is no preprogrammed story or pathway. We don’t control the biotope. The creatures will find their own way and either co-exist or fight for limited space and food.
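
The behavioural rules in that description are simple enough to sketch; the interest lies in what emerges when many such agents interact. A toy version (Python, entirely my own invention, nothing from the actual game code):

    import random

    class Organism:
        """An agent following the basic behaviouristic rules quoted above."""
        def __init__(self, name, strength):
            self.name = name
            self.strength = strength

        def react(self, opponent, cornered=False):
            if cornered:
                return "fight back"    # when cornered, fight back
            if opponent.strength < self.strength:
                return "attack"        # if an opponent is edible, attack it
            return "flee"              # if an opponent is stronger, flee

    population = [Organism("bug-%d" % i, random.randint(1, 10)) for i in range(6)]
    for agent in population:
        other = random.choice([o for o in population if o is not agent])
        print(agent.name, agent.strength, "meets", other.name, other.strength,
              "->", agent.react(other))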

Functional Composition [Overtone/Clojure]

Filed under: Clojure,Functional Programming,Music — Patrick Durusau @ 6:06 am

Functional Composition by Chris Ford.

From the webpage:

A live-coding presentation on music theory and Bach’s “Canone alla Quarta” by @ctford.

Based on Overtone:

Overtone is an open source audio environment being created to explore musical ideas from synthesis and sampling to instrument building, live-coding and collaborative jamming. We use the SuperCollider synth server as the audio engine, with Clojure being used to develop the APIs and the application. Synthesizers, effects, analyzers and musical generators can be programmed in Clojure.

Come and join the Overtone Google Group if you want to get involved in the project or have any questions about how you can use Overtone to make cool sounds and music.

An inducement to learn Clojure and to better understand the influence of music on the second edition of HyTime.

I first saw this in Christophe Lalanne’s A bag of tweets / November 2012.

Interactive Models and La nuit blanche

Filed under: D3,Graphics,Visualization — Patrick Durusau @ 5:46 am

Jerome Cukier has posted a collection of simulations (Interactive Models) and a potential aid for museum visitors (La nuit blanche) using d3.

Hard to say which one I like better. The models page is an impressive demonstration of d3, but La nuit blanche has more immediate application with topic maps.

I first saw this in Christophe Lalanne’s A bag of tweets / November 2012.

December 3, 2012

Cloudera – Videos from Strata + Hadoop World 2012

Filed under: Cloudera,Hadoop,MapReduce — Patrick Durusau @ 7:20 pm

Cloudera – Videos from Strata + Hadoop World 2012

The link is to the main resources page, where you can find many other videos and other materials.

If you want Strata + Hadoop World 2012 videos specifically, search on Hadoop World 2012.

As of today, that pulls up 41 entries. Should be enough to keep you occupied for a day or so. 😉

10 Tips on Writing from David Ogilvy

Filed under: Writing — Patrick Durusau @ 7:13 pm

10 Tips on Writing from David Ogilvy by Maria Popova.

From the post:

How is your new year’s resolution to read more and write better holding up? After tracing the fascinating story of the most influential writing style guide of all time and absorbing advice on writing from some of modern history’s most legendary writers, here comes some priceless and pricelessly uncompromising wisdom from a very different kind of cultural legend: iconic businessman and original “Mad Man” David Ogilvy. On September 7th, 1982, Ogilvy sent the following internal memo to all agency employees, titled “How to Write”:

The one where I fail is: Never write more than two pages on any subject.

And I could use the tip about storing emails overnight and re-editing them the next day.

Which ones would benefit you the most?

Crowd Sourcing Reflected Intelligence Using Search and Big Data [Webinar]

Filed under: LucidWorks,MapR — Patrick Durusau @ 5:07 pm

Crowd Sourcing Reflected Intelligence Using Search and Big Data

Date: December 13, 2012

Time: 10:00 am PT / 1:00 pm ET

From the webpage:

Anyone interested in drawing insights from their Big Data repository/project/application should attend this informative webinar brought to you by MapR and LucidWorks. LucidWorks Search is a development platform that accelerates and simplifies building highly secure, scalable, and cost-effective search applications.

This webinar will show:

  • how search users’ search behavior can be mined
  • how big data analytics can be applied to that raw data
  • how to redeploy that data back to the users to improve their experience

Experts from MapR and Lucidworks will show the strengths of combining the easiest, most dependable and fastest distribution for Hadoop with the real-time, ad hoc data accessibility of LucidWorks Search to provide analytic capabilities along with scalable machine learning algorithms for deeper insight into both content and user behavior.

Speakers: Grant Ingersoll, Chief Scientist for LucidWorks and Ted Dunning, Chief Application Architect for MapR.

I have seen Grant on video and it was great. If Ted is anywhere close to as good as Grant, this is going to be a webinar to remember!
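
The "reflected intelligence" loop the webinar describes is easy to sketch, even if the production version runs on Hadoop and LucidWorks Search rather than a dictionary. A toy version (Python; the log format and names are hypothetical):

    from collections import Counter, defaultdict

    # Hypothetical click log: (query, id of the document the user clicked)
    click_log = [
        ("topic maps", "doc-42"),
        ("topic maps", "doc-42"),
        ("topic maps", "doc-7"),
        ("hadoop tutorial", "doc-99"),
    ]

    # Mine the search behavior: count clicks per (query, document) pair.
    clicks = defaultdict(Counter)
    for query, doc in click_log:
        clicks[query][doc] += 1

    def boosts_for(query):
        """Redeploy the mined data: turn click counts into relative boost factors."""
        counts = clicks.get(query)
        if not counts:
            return {}
        total = sum(counts.values())
        return {doc: round(count / total, 2) for doc, count in counts.items()}

    print(boosts_for("topic maps"))  # {'doc-42': 0.67, 'doc-7': 0.33}

At scale the counting moves into MapReduce and the boosts feed back into the search engine's ranking, but the loop is the same: mine behavior, analyze it, redeploy it.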

“I Have Been Everywhere” by Johnny Cash

Filed under: Humor,Mapping,Maps,Music — Patrick Durusau @ 3:31 pm

A Real-Time Map of the Song “I Have Been Everywhere” by Johnny Cash

From the post:

Freelance web developer Iain Mullan has developed a map mashup titled “Johnny Cash Has Been EVERYWHERE (Man)!” [iainmullan.com].

The concept is simple yet funny: using a combination of an on-demand music service, an online lyrics catalog and some Google Maps programming magic, all the cities mentioned in the song are displayed simultaneously as they are mentioned during the song, as performed by Johnny Cash.

Some maps are meant to amuse.

BTW, Johnny prefers Safari or Chrome (as in it won’t work with Firefox, and I suspect IE as well).

Solving Problems with Graphs

Filed under: Faunus,Fulgora,Graphs,Titan — Patrick Durusau @ 3:20 pm

Solving Problems with Graphs by Marko A. Rodriguez.

Marko covers solving problems with graphs in general and then gives an overview of Titan (a distributed graph database), Faunus (graph analytic engine) and Fulgora (graph processor).

My only misgiving about graphs is that we know very little of the world’s data is stored in graph format. And that is unlikely to change in the foreseeable future. ETL will suffice to convert some data to obtain the advantages of graph processing, but what of data that isn’t converted?

Unlike the W3C, I have a high degree of confidence that the world is not going to adapt itself to any one solution or even a range of solutions.

The majority of data (from a current perspective), will be in “legacy” formats, the next largest portion in the successful formats just prior to the latest one, and the smallest portion, the latest proposed new format.

Big data should address the “not my format” problem in addition to running after large amounts of sensor data.

AXLE: Advanced Analytics for Extremely Large European Databases

Filed under: Analytics,BigData,Orange,PostgreSQL — Patrick Durusau @ 2:25 pm

AXLE: Advanced Analytics for Extremely Large European Databases

From the webpage:

The objectives of the AXLE project are to greatly improve the speed and quality of decision making on real-world data sets. AXLE aims to make these improvements generally available through high quality open source implementations via the PostgreSQL and Orange products.

The project started in early November 2012. Will be checking back to see what is proposed for PostgreSQL and/or Orange.

Meld

Filed under: Meld,Versioning — Patrick Durusau @ 2:11 pm

Meld

From the webpage:

What is Meld?

Meld is a visual diff and merge tool targeted at developers. Meld helps you compare files, directories, and version controlled projects. It provides two- and three-way comparison of both files and directories, and has support for many popular version control systems.

Meld helps you review code changes and understand patches. It might even help you to figure out what is going on in that merge you keep avoiding.

Features

  • Two- and three-way comparison between files and directories
  • Auto-merge mode (in development version)
  • Comparisons update as you type
  • Visualisations make it easier to compare your files
  • Actions on diff chunks make for easier merges
  • Supports Git, Bazaar, Mercurial, Subversion, etc.
  • …and more

Coming on the heels of Kevlin Henney’s Cool Code [Chess Program in 4.8 Tweets], I may have paid closer attention to this program than otherwise.

Still, source code has semantics and different ways of expressing those semantics, just like the usual topic map examples.

I first saw this in a tweet by Scientific Python.
