Archive for September, 2012

T-Shirt Ideas for the Hadoop Team

Sunday, September 30th, 2012

T-Shirt Ideas for the Hadoop Team

Start the week with a smile!

Now suggest t-shirt ideas for topic maps!

Local News?

Sunday, September 30th, 2012

The Pew Internet project has released: How people get local news and information in different communities, which shows that how people get local news varies from community to community.

From the overview:

“Interest in community news on all kinds of topics is quite high in every type of community,” noted Kristen Purcell of the Pew Research Center’s Internet & American Life Project, a co-author of the report. “Still, people get local information in different ways depending on the type of community in which they live, and they differ in the degree to which digital and mobile platforms factor into their mix of sources.”

How would you use topic maps in the delivery of “local” information?

Assume that we are talking about campus based information since that is a fairly accessible community.

What means would you use to establish a baseline and to measure differences before and after implementation of a topic map based solution?

I first saw this at Full Text Reports.

Quantcast File System for Hadoop

Sunday, September 30th, 2012

Quantcast File System for Hadoop

From Alex Popescu’s myNoSQL, news of a new file system for Hadoop.


Pointers to comments?

Twitter Semantics Opportunity?

Sunday, September 30th, 2012

Carl Bialik (Wall Street Journal) writes in Timing Twitter about the dangers of reading too much into tweet statistics and then says:

She [Twitter spokeswoman Elaine Filadelfo] noted that the company is being conservative in its counting, and that the true counts likely are higher than the ones reported by Twitter. For instance, the company didn’t include “Ryan” in its search terms for the Republican convention, to avoid picking up tweets about, say, Ryan Gosling rather than those about Republican vice-presidential candidate Paul Ryan. And it has no way to catch tweets such as “beautiful dress” that are referring to presenters’ outfits during the Emmy Awards telecast. “You follow me during the Emmys, and you know I’m talking about the Emmys,” Filadelfo said of the hypothetical “beautiful dress” tweet. But Twitter doesn’t know that and doesn’t count that tweet.

Twitter may not “know” about the Emmys (they need to get out more) but certainly followers on Twitter did.

Followers were probably bright enough to know which presenter was being identified in the tweet.

Imagine a crowd sourced twitter application where you follow particular people and add semantics to their tweets.

It might not return big bucks for the people adding semantics, but if they were donating their time to an organization or group, it could reach commercial mass.

We can keep waiting for computers to become dumb, at least, or we can pitch in to cover the semantic gap.

What do you think?

Moments that Matter … Moments that Don’t

Sunday, September 30th, 2012

Moments that Matter … Moments that Don’t by Doug Klein.

Another marketing jewel for the topic map crowd.

Doug writes (in part):

What’s needed is a new approach engineered around what the customer wants to hear from us, not what we want to say to them.

I could personally fill a tome or two with stuff that I want to say about topic maps and/or using topic maps, that doesn’t have much (anything?) to do with what customers want to hear about.

What about you?

Here are Doug’s take aways:

So what can we take away from all of this?

  1. Commit to learning why customers buy. It is critical for successful experience planning and innovation to update our traditional purchase behavior knowledge with insights into new channel-device integrations.
  2. Revisit how customers shop today. Really understanding how customers interact with your brand across the media and device landscape, both online and offline, results in understanding the “moments that matter” to your customers, and the ones that don’t.
  3. Focus on “attracting” customers, not acquiring them. Resist the temptation to overwhelm your customers by trying to be everywhere at all times and you will have a better chance at creating meaningful relationships at key buying moments.
  4. Understand that the traditional marketing funnel is dead. Customers control more of the decision cycle and are spending more time researching on their own. As a result, programs and properties need to be retooled with information in key channels like search, social, and CRM to answer customer questions before they have to ask.
  5. Be an environmentalist. As digital professionals, we need to stop polluting the airwaves. We must prioritize, focus, and do the few things that really matter to customers exceptionally well, instead of scrambling to do too many things only moderately well. In the end, a focused approach is a win-win for brands, their customers, and the entire community of experience designers and marketers.

Here is my fractured version for topic maps:

So what can we take away from all of this?

  1. Commit to learning why customers buy. Learn the price points for enhanced information. At what point will customers pay for better information and/or information services?
  2. Revisit how customers shop today. In our case, where do they look for information? What makes it valuable to them?
  3. Focus on “attracting” customers, not acquiring them. Some things are better done with a pencil and note card. Others with a relational database. Push topic maps where they will make the most difference to your customer.
  4. Understand that the traditional marketing funnel is dead. Not only broaden marketing channels but I would suggest powering marketing channels with topic maps.
  5. Be an environmentalist. Read from above. I can’t add anything to it.

What moments are you going to focus on?

Vowpal Wabbit, version 7.0

Sunday, September 30th, 2012

Vowpal Wabbit, version 7.0

From the post:

A new version of VW is out. The primary changes are:

  1. Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct:
    1. Multiclass Classification --oaa or --ect.
    2. Cost Sensitive Multiclass Classification --csoaa or --wap.
    3. Contextual Bandit Classification --cb.
    4. Sequential Structured Prediction --searn or --dagger

    In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning.

  2. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over time.
  3. Windows port: The priority of a windows port jumped way up once we moved to Microsoft. The only feature which we know doesn’t work at present is automatic backgrounding when in daemon mode.
  4. New update rule: Stephane visited us this summer, and we fixed the default online update rule so that it is unit invariant.

There are also many other small updates including some contributed utilities that aid the process of applying and using VW.

Plans for the near future involve improving the quality of various items above, and of course better documentation: several of the reductions are not yet well documented.

A good test for your understanding of a subject is your ability to explain it.

Writing good documentation for projects like Vowpal Wabbit would benefit the project. And demonstrate your chops with the software. Something to consider.
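For anyone new to the term, a learning reduction solves one kind of learning problem by calling a learner for a simpler kind. Here is a minimal sketch of the one-against-all idea in plain Python. It is not VW code and makes no claim about how VW implements it: the perceptron base learner and the names `train_binary`, `train_oaa` and `predict_oaa` are made up for illustration.

```python
def train_binary(X, y, epochs=20):
    """Perceptron base learner (a stand-in). Labels are -1 or +1.
    Returns the weights with the bias stored as the last element."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * score(w, x) <= 0:      # misclassified or on the boundary
                for i, xi in enumerate(x):
                    w[i] += label * xi
                w[-1] += label
    return w


def score(w, x):
    """Linear score; zip stops before the bias, which is added separately."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_oaa(X, y, num_classes):
    """The reduction: one binary problem per class, 'class k' versus 'the rest'."""
    return [
        train_binary(X, [1 if label == k else -1 for label in y])
        for k in range(num_classes)
    ]


def predict_oaa(models, x):
    """Predict the class whose binary learner is most confident."""
    scores = [score(w, x) for w in models]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: three classes, one indicator feature each.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [0, 1, 2]
models = train_oaa(X, y, num_classes=3)
print([predict_oaa(models, x) for x in X])
```

The point of the sketch is that the multiclass code never looks inside the base learner. Swap in a better binary learner and the reduction is unchanged.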

On Legislative Collaboration and Version Control

Saturday, September 29th, 2012

On Legislative Collaboration and Version Control

John Wonderlich of the Sunlight Foundation writes:

We often are confronted with the idea of legislation being written and tracked online through new tools, whether it’s Clay Shirky’s recent TED talk, or a long, long list of experiments and pilot projects (including Sunlight’s and Rep. Issa’s MADISON) designed to give citizens a new view and voice in the production of legislation.

Proponents of applying version control systems to law have a powerful vision: a bill or law, with its history laid bare and its sections precisely broken out, and real names attached prominently to each one. Why shouldn’t we be able to have that? And since version control systems are helpful to the point of absolute necessity in any collaborative software effort, why wouldn’t Congress employ such an approach?

When people first happen upon this idea, their reaction tends to fall into two camps, which I’ll refer to as triumphalist and dismissive.

John’s and the Sunlight Foundation’s view that legislative history of acts of Congress is a form of transparency is the view taught to high school civics classes. And about as naive as it comes.

True enough, there are extensive legislative histories for every act passed by Congress. That has very little to do with how laws come to be written, by whom and for whose interests.

Say for example a lobbyist who has contributed to a Senator’s campaign is concerned with the rules for visas for computer engineers. He/she visits the Senator and just happens to have a draft of amendments, created by a well known Washington law firm, that addresses their needs. That document is studied by the Senator’s staff.

Lo and behold, similar language appears in a bill introduced by the Senator. (Or as an amendment to some other bill.)

The Senator will even say that he is sponsoring the legislation to further the interests of those “job creators” in the high tech industry. What gets left out is the access to the Senator by the lobbyist and the assistance in bringing that legislation to the fore.

Indulging governments in their illusions of transparency is the surest way to avoid meaningful transparency.

Now you have to ask yourself, who has an interest in avoiding meaningful transparency?

I first saw this at Legal Informatics (which has other links that will interest you).

Lucene’s new analyzing suggester [Can You Say Synonym?]

Saturday, September 29th, 2012

Lucene’s new analyzing suggester by Mike McCandless.

From the post:

Live suggestions as you type into a search box, sometimes called suggest or autocomplete, is now a standard, essential search feature ever since Google set a high bar after going live just over four years ago.

In Lucene we have several different suggest implementations, under the suggest module; today I’m describing the new AnalyzingSuggester (to be committed soon; it should be available in 4.1).

To use it, you provide the set of suggest targets, which is the full set of strings and weights that may be suggested. The targets can come from anywhere; typically you’d process your query logs to create the targets, giving a higher weight to those queries that appear more frequently. If you sell movies you might use all movie titles with a weight according to sales popularity.

You also provide an analyzer, which is used to process each target into analyzed form. Under the hood, the analyzed form is indexed into an FST. At lookup time, the incoming query is processed by the same analyzer and the FST is searched for all completions sharing the analyzed form as a prefix.

Even though the matching is performed on the analyzed form, what’s suggested is the original target (i.e., the unanalyzed input). Because Lucene has such a rich set of analyzer components, this can be used to create some useful suggesters:

One of the use cases that Mike mentions is use of the AnalyzingSuggester to suggest synonyms of terms entered by a user.

That presumes that you know the target of the search and likely synonyms that occur in it.

Use standard synonym sets and you will get standard synonym results.

Develop custom synonym sets and you can deliver time/resource saving results.
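To make the mechanics concrete, here is a toy sketch in Python. It is not Lucene's API and it does not use an FST: a sorted list with binary search stands in for the FST, and the analyzer, stop words, synonym table and the `ToySuggester` class are all invented for illustration. What it does keep from Mike's description is the shape of the idea: match on the analyzed form, suggest the original target.

```python
import bisect
import re

STOP_WORDS = {"a", "an", "the", "of"}     # illustrative only
SYNONYMS = {"wireless": "wifi"}           # a tiny "custom synonym set"


def analyze(text):
    """Toy analyzer: lowercase, tokenize, map synonyms, drop stop words."""
    tokens = [SYNONYMS.get(t, t) for t in re.findall(r"[a-z0-9]+", text.lower())]
    return " ".join(t for t in tokens if t not in STOP_WORDS)


class ToySuggester:
    def __init__(self, targets):
        # targets: iterable of (original text, weight)
        self._entries = sorted(
            (analyze(text), -weight, text) for text, weight in targets
        )
        self._keys = [entry[0] for entry in self._entries]

    def lookup(self, query, limit=5):
        """Return original targets whose analyzed form starts with the
        analyzed query, heaviest first."""
        prefix = analyze(query)
        start = bisect.bisect_left(self._keys, prefix)
        hits = []
        for key, neg_weight, text in self._entries[start:]:
            if not key.startswith(prefix):
                break                     # sorted keys: no later match possible
            hits.append((neg_weight, text))
        return [text for _, text in sorted(hits)[:limit]]


suggester = ToySuggester([
    ("The Ghost of Christmas Past", 50),
    ("Ghostbusters", 80),
    ("Wireless Router Setup", 30),
    ("WiFi Password Recovery", 40),
])
print(suggester.lookup("ghost of c"))
print(suggester.lookup("wireless"))
```

Note the limit of the toy: a synonym only fires once the user has typed the whole token, so "wirel" finds nothing here. The quality of the suggestions is also only as good as the synonym table you feed it, which is the point made above about custom synonym sets.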

Twitter Social Network by @aneeshs (video lecture)

Saturday, September 29th, 2012

Video Lecture: Twitter Social Network by @aneeshs by Marti Hearst.

From the post:

Learn about weak ties, triadic closures, and personal pagerank, and how they all relate to the Twitter social graph from Aneesh Sharma:

Just when you think the weekend can’t get any better!


Amazon RDS Now Supports SQL Server 2012

Saturday, September 29th, 2012

Amazon RDS Now Supports SQL Server 2012

From the post:

The Amazon Relational Database Service (RDS) now supports SQL Server 2012. You can now launch the Express, Web, and Standard Editions of this powerful database from the comfort of the AWS Management Console. SQL Server 2008 R2 is still available, as are multiple versions and editions of MySQL and Oracle Database.

If you are from the Microsoft world and haven't heard of RDS, here's the executive summary: You can run the latest and greatest offering from Microsoft in a fully managed environment. RDS will install and patch the database, make backups, and detect and recover from failures. It will also provide you with a point-and-click environment to make it easy for you to scale your compute resources up and down as needed.

What's New?
SQL Server 2012 supports a number of new features including contained databases, columnstore indexes, sequences, and user-defined roles:

  • A contained database is isolated from other SQL Server databases including system databases such as "master." This isolation removes dependencies and simplifies the task of moving databases from one instance of SQL Server to another.
  • Columnstore indexes are used for data warehouse style queries. Used properly, they can greatly reduce memory consumption and I/O requests for large queries.
  • Sequences are counters that can be used in more than one table.
  • The new user-defined role management system allows users to create custom server roles.

Read the SQL Server What's New documentation to learn more about these and other features.

I almost missed this!

It is about the only way I am going to get to play with SQL Server. I don’t have a local Windows sysadmin to maintain the server, etc.

Amazon RDS for Oracle Database – Now Starting at $30/Month

Saturday, September 29th, 2012

Amazon RDS for Oracle Database – Now Starting at $30/Month by Jeff Barr.

From the post:

You can now create Amazon RDS database instances running Oracle Database on Micro instances.

This new option will allow you to build, test, and run your low-traffic database-backed applications at a cost starting at $30 per month ($0.04 per hour) using the License Included option. If you have a more intensive application, the micro instance enables you to get hands on experience with Amazon RDS before you scale up to a larger instance size. You can purchase Reserved Instances in order to further lower your effective hourly rate.

These instances are available now in all AWS Regions. You can learn more about using Amazon RDS for managing Oracle database instances by attending this webinar.

Oracle databases aren’t for the faint of heart but they are everywhere in enterprise settings.

If you are or aspire to be working with enterprise information systems, the more you know about Oracle databases the more valuable you become.

To your employer and your clients.

Hadoop as Java Ecosystem “MVP”

Saturday, September 29th, 2012

Apache Hadoop Wins Duke’s Choice Award, is a Java Ecosystem “MVP” by Justin Kestelyn.

From the post:

For those of you new to it, the Duke’s Choice Awards program was initiated by Sun Microsystems in 2002 in an effort to “celebrate extreme innovation in the world of Java technology” – in essence, it’s the “MVP” of the Java ecosystem. Since it acquired Sun in 2009, Oracle has continued the tradition of bestowing the award, and in fact has made the process more community-oriented by accepting nominations from the public and involving Java User Groups in the judging effort.

For the 2012 awards, I’m happy to report that Apache Hadoop is among the awardees – which also include the United Nations High Commission for Refugees, Liquid Robotics, and Java cloud company Jelastic Inc., among others.

Very cool!

Kudos to the Apache Hadoop project!

MySQL Schema Agility on SSDs

Saturday, September 29th, 2012

MySQL Schema Agility on SSDs by Tim Callaghan.

From the post:

TokuDB v6.5 adds the ability to expand certain column types without downtime. Users can now enlarge char, varchar, varbinary, and integer columns with no interruption to insert/update/delete statements on the altered table. Prior to this feature, enlarging one of these column types required a full table rebuild. InnoDB blocks all insert/update/delete operations to a table during column expansion as it rebuilds the table and all indexes.

Not sure how often you will need the ability to enlarge column types without downtime, but when you do, I suspect it will be mission critical.

Something to keep in mind while planning for uncertain data futures.

Balancing Your “….Political News Reading Habits”

Saturday, September 29th, 2012

Browser Plugin Helps People Balance Their Political News Reading Habits

From the post:

As the U.S. presidential election approaches, many voters become voracious consumers of online political news. A tool by a University of Washington researcher tracks whether all those articles really provide a balanced view of the debate — and, if not, suggests some sites that offer opinions from the other side of the political spectrum.

Balancer, a free plug-in for Google’s Chrome browser, was developed this summer by Sean Munson, a new UW assistant professor of Human Centered Design and Engineering. The tool analyzes a person’s online reading habits for a month and calculates the political bias in that behavior. It then suggests sites that represent a different point of view and continues to monitor reading behavior and offer feedback.

“I was a bit surprised when I was testing out the tool to learn just how slanted my own reading behavior was,” Munson said. “Even self-discovery is a valuable outcome, just being aware of your own behavior. If you do agree that you should be reading the other side, or at least aware of the dialogue in each camp, you can use it as a goal: Can I be more balanced this week than I was last week?”

The tool classifies more than 10,000 news websites and sections of news websites on a spectrum ranging from far left to far right, using results of previous studies and existing media-bias indices. For a few popular sites the tool also tries to classify individual columnists whose views may be different from those of the overall publication’s slant.

If you think being “informed,” as opposed to owning a stable of elected officials, makes a difference, this is the plugin for you.

The same principle could monitor your technical reading, to keep your reading a mixture of classic and new material.

A service that maps across terminology differences to send you the latest research could be quite useful.

So you would not have to put forth all that effort to remain current.

Visual Clues: A Brain “feature,” not a “bug”

Saturday, September 29th, 2012

You will read in When Your Eyes Tell Your Hands What to Think: You’re Far Less in Control of Your Brain Than You Think that:

You’ve probably never given much thought to the fact that picking up your cup of morning coffee presents your brain with a set of complex decisions. You need to decide how to aim your hand, grasp the handle and raise the cup to your mouth, all without spilling the contents on your lap.

A new Northwestern University study shows that, not only does your brain handle such complex decisions for you, it also hides information from you about how those decisions are made.

“Our study gives a salient example,” said Yangqing ‘Lucie’ Xu, lead author of the study and a doctoral candidate in psychology at Northwestern. “When you pick up an object, your brain automatically decides how to control your muscles based on what your eyes provide about the object’s shape. When you pick up a mug by the handle with your right hand, you need to add a clockwise twist to your grip to compensate for the extra weight that you see on the left side of the mug.

“We showed that the use of this visual information is so powerful and automatic that we cannot turn it off. When people see an object weighted in one direction, they actually can’t help but ‘feel’ the weight in that direction, even when they know that we’re tricking them,” Xu said. (emphasis added)

I never quite trusted my brain and now I have proof that it is untrustworthy. Hiding stuff indeed! 😉

But that’s the trick of subject identification/identity isn’t it?

That our brains “recognize” all manner of subjects without any effort on our part.

Another part of the effortless features of our brains. But it hides from us the information we need to integrate information stores from ourselves and others.

Or rather, it makes digging that information out more work than we are usually willing to devote.

When called upon to be “explicit” about subject identification, or even worse, to imagine how other people identify subjects, we prefer to stay at home consuming passive entertainment.

Two quick points:

First, we need to think about how to incorporate this “feature” into delivery interfaces for users.

Second, what subjects would users pay others to mine/collate/identify for them? (Delivery being a separate issue.)

Topic Map Modeling of Sequestration Data (Help Pls!)

Saturday, September 29th, 2012

With the political noise in the United States over presidential and other elections, it is easy to lose sight of a looming “sequestration” that on January 2, 2013 will result in:

10.0% reduction non-exempt defense mandatory funding
9.4% reduction non-exempt defense discretionary funding
8.2% reduction non-exempt nondefense discretionary funding
7.6% reduction non-exempt nondefense mandatory funding
2.0% reduction Medicare

The report is not a model of clarity/transparency. See: U.S. Sequestration Report – Out of the Shadows/Into the Light?.

Report caveats make it clear cited amounts are fanciful estimates that can change radically as more information becomes available.

Be that as it may, a topic map based on the reported accounts as topics can capture the present day conjectures. To say nothing of capturing future revelations of exact details.

Whether from sequestration or from efforts to avoid sequestration.

Tracking/transparency has to start somewhere and it may as well be here.

In evaluating the data for creation of a topic map, I have encountered an entry with a topic map modeling issue.

I could really use your help.

Here is the entry in question:

Department of Health and Human Services, Health Resources and Services Administration, 009-15-0350, Health Resources and Services, Nondefense Function, Mandatory (page 80 of Appendix A, page 92 of the pdf of the report):

BA Type                         BA Amount   Sequester Percentage   Sequester Amount
Sequestrable BA                       514                    7.6                 39
Sequestrable BA – special rule       1352                    2.0                 27
Exempt BA                              10
Total Gross BA                       1876
Offsets                               -16
Net BA                               1860

If it read as follows, no problem.

Example: Not Accurate

BA Type                         BA Amount   Sequester Percentage   Sequester Amount
Sequestrable BA                       514                    7.6                 39
Sequestrable BA – special rule       1352                    2.0                 27
Total Gross BA                       1876

In that case there would be no “Exempt BA” or “Offsets” to relate to either “Sequestrable BA” or “Sequestrable BA – special rule.” I would just report both of them with the percentages and total amounts to be withheld.

True, the percentages don’t change, nor does the amount to be withheld change, because of the “Exempt BA” or the “Offsets.” (Trusting soul that I am, I did verify the calculations. 😉 )
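For the record, the check is short enough to script. This Python sketch uses only the figures from the entry above. The one assumption is mine: that the report rounds each sequester amount to the nearest whole unit.

```python
# Figures copied from the entry for account 009-15-0350.
rows = {
    "Sequestrable BA": (514, 7.6, 39),
    "Sequestrable BA - special rule": (1352, 2.0, 27),
}
exempt_ba = 10
total_gross_ba = 1876
offsets = -16
net_ba = 1860

# Sequester amount = BA amount x percentage, rounded (rounding is my assumption).
computed = {
    name: round(amount * percent / 100)
    for name, (amount, percent, _reported) in rows.items()
}
reported = {name: row[2] for name, row in rows.items()}

# Exempt BA counts toward the gross total but carries no sequester amount.
gross_check = sum(amount for amount, _, _ in rows.values()) + exempt_ba
net_check = total_gross_ba + offsets

print(computed == reported)
print(gross_check == total_gross_ba, net_check == net_ba)
```

So “Exempt BA” and “Offsets” do participate in the totals, even though they play no part in either sequester calculation. That is exactly the relationship I am unsure how to model.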

Problem: How do I represent the relationship between the “Exempt BA” and “Offsets” to either/or/both “Sequestrable BA,” “Sequestrable BA – special rule?”

Of the 1318 entries in Appendix A of this report, including this one, it is the only entry with this issue. (A number of accounts are split into discretionary/mandatory parts. I am counting each part as a separate “entry.”)

If I ignore “Exempt BA” and “Offsets” in this case, my topic map is an incomplete representation of Appendix A.

It is also the case that I want to represent the information “as written.” There may be some external explanation that clarifies this entry, but that would be an “addition” to the original topic map.


Alan Gates CHUGs HCatalog in Windy City

Friday, September 28th, 2012

Alan Gates CHUGs HCatalog in Windy City (Chicago Hadoop User Group) by Kim Truong

From the post:

Alan Gates presented HCatalog to the Chicago Hadoop User Group (CHUG) on 9/17/12. There was a great turnout, and the strength of CHUG is evidence that Chicago is a Hadoop city. Below are some kind words from the host, Mark Slusar.

“On 9/17/12, the Chicago Hadoop User Group (CHUG) was delighted to host Hortonworks Co-Founder Alan Gates to give an overview of HCatalog. In addition to downtown Chicago meetups, Allstate Insurance Company in Northbrook, IL hosts regular Chicago Hadoop User Group Meetups. After noshing on refreshments provided by Hortonworks, attendees were treated to an in-depth overview of HCatalog, its history, as well as how and when to use it. Alan’s experience and expertise were an excellent contribution to CHUG. Alan made a great connection with every attendee. With his detailed lecture, he answered many questions, and also joined a handful of attendees for drinks after the meetup. CHUG would be thrilled to have Alan & Hortonworks team return in the future!” – Mark Slusar

What a great way to start the weekend!


Mathematics at Google

Friday, September 28th, 2012

Mathematics at Google by Javier Tordable.

A high-spots history of Google with an emphasis on the mathematics that came into play.

Highly motivating!

I first saw this at: Four short links: 28 September 2012 by Nat Torkington.

Nat remarks this should encourage high school and college students to do their homework.

True, but post-college folk should also maintain math literacy.

Not just having math skills but also recognizing the unspoken assumptions in mathematical techniques.

Representing Solutions with PMML (ACM Data Mining Talk)

Friday, September 28th, 2012

Representing Solutions with PMML (ACM Data Mining Talk)

Dr. Alex Guazzelli’s talk on PMML and Predictive Analytics to the ACM Data Mining Bay Area/SF group at the LinkedIn auditorium in Sunnyvale, CA.


Data mining scientists work hard to analyze historical data and to build the best predictive solutions out of it. IT engineers, on the other hand, are usually responsible for bringing these solutions to life, by recoding them into a format suitable for operational deployment. Given that data mining scientists and engineers tend to inhabit different information worlds, the process of moving a predictive solution from the scientist’s desktop to the operational environment can get lost in translation and take months. The advent of data mining specific open standards such as the Predictive Model Markup Language (PMML) has turned this view upside down: the deployment of models can now be achieved by the same team who builds them, in a matter of minutes.

In this talk, Dr. Alex Guazzelli not only provides the business rationale behind PMML, but also describes its main components. Besides being able to describe the most common modeling techniques, as of version 4.0, released in 2009, PMML is also capable of handling complex pre-processing tasks. As of version 4.1, released in December 2011, PMML has also incorporated complex post-processing to its structure as well as the ability to represent model ensemble, segmentation, chaining, and composition within a single language element. This combined representation power, in which an entire predictive solution (from pre-processing to model(s) to post-processing) can be represented in a single PMML file, attests to the language’s refinement and maturity.

I hesitated at the story of replacing IT engineers with data scientists. Didn’t we try that one before?

But then it was replacing programmers with business managers. And it was called COBOL. 😉

Nothing against COBOL, it is still in use today. Widespread use as a matter of fact.

But all tasks, including IT engineering, look easy from a distance. Only after getting poor results is that lesson learned. Again.

What have your experiences been with PMML?

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges

Friday, September 28th, 2012

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges by Michael J. Bannister, Christopher DuBois, David Eppstein, Padhraic Smyth.


We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle queries that ask for the number of connected components, number of components that contain cycles, number of vertices whose degree equals or is at most some predetermined value, number of vertices that can be reached from a starting set of vertices by time-increasing paths, and related queries.

Among other interesting questions, it raises the issue of what time span of connections constitutes a network of interest. More than being “dynamic.” A definitional issue for the social network in question.
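To see what a "window" query looks like, here is a naive baseline in Python. It is emphatically not the data structure from the paper: it rebuilds a union-find for every query, so each query costs time roughly linear in the window size, where the paper claims sublogarithmic query time after preprocessing. The edge list and the function name `count_components` are invented for illustration, and I am assuming that only vertices touched by an edge in the window count.

```python
def count_components(edges, start, stop):
    """Count connected components of the subgraph formed by
    edges[start:stop], where edges are sorted by timestamp."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = 0
    for _timestamp, u, v in edges[start:stop]:
        for vertex in (u, v):
            if vertex not in parent:        # first time this vertex appears
                parent[vertex] = vertex
                components += 1
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # edge joins two components
            parent[root_u] = root_v
            components -= 1
    return components


# (timestamp, vertex, vertex), already in time order.
edges = [
    (1, "a", "b"),
    (2, "b", "c"),
    (3, "d", "e"),
    (4, "c", "a"),
    (5, "e", "f"),
    (6, "f", "a"),
]
for window in [(0, 2), (0, 3), (2, 5), (0, 6)]:
    print(window, count_components(edges, *window))
```

Even on six edges the answer depends on which window you pick, which is the definitional issue in miniature.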

If you are working with social networks, a must read.

PS: You probably need to read: Relational events vs graphs, a posting by David Eppstein.

David details several different terms for “relational event data,” and says there are probably others they did not find. (Topic maps anyone?)

Parametric matroid of rough set

Friday, September 28th, 2012

Parametric matroid of rough set by Yanfang Liu, William Zhu. ( for the first author, DBLP for the second.)


Rough set is mainly concerned with the approximations of objects through an equivalence relation on a universe. Matroid is a combinatorial generalization of linear independence in vector spaces. In this paper, we define a parametric set family, with any subset of a universe as its parameter, to connect rough sets and matroids. On the one hand, for a universe and an equivalence relation on the universe, a parametric set family is defined through the lower approximation operator. This parametric set family is proved to satisfy the independent set axiom of matroids, therefore it can generate a matroid, called a parametric matroid of the rough set. Three equivalent representations of the parametric set family are obtained. Moreover, the parametric matroid of the rough set is proved to be the direct sum of a partition-circuit matroid and a free matroid. On the other hand, since partition-circuit matroids were well studied through the lower approximation number, we use it to investigate the parametric matroid of the rough set. Several characteristics of the parametric matroid of the rough set, such as independent sets, bases, circuits, the rank function and the closure operator, are expressed by the lower approximation number.
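For a foothold on the abstract, the lower approximation operator it leans on is easy to state in code. This Python sketch uses the standard rough set definitions, with the equivalence relation given as its partition into equivalence classes. It does not attempt the paper's parametric set family or the matroid construction, and the function names and example universe are mine.

```python
def lower_approximation(partition, subset):
    """Union of the equivalence classes wholly contained in subset."""
    subset = set(subset)
    result = set()
    for block in partition:
        if set(block) <= subset:
            result |= set(block)
    return result


def upper_approximation(partition, subset):
    """Union of the equivalence classes that intersect subset."""
    subset = set(subset)
    result = set()
    for block in partition:
        if set(block) & subset:
            result |= set(block)
    return result


# Universe {1, ..., 6} with three equivalence classes.
partition = [{1, 2}, {3}, {4, 5, 6}]
X = {1, 2, 3, 4}

print(lower_approximation(partition, X))
print(upper_approximation(partition, X))
```

Element 4 is in X but its class {4, 5, 6} is not wholly inside X, so it drops out of the lower approximation and drags 5 and 6 into the upper one.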

If you are guessing this isn’t the “simpler” side of topic maps, you are right in one!

There are consumers of information/services (herein of “simpler” services of topic maps), authors of information/services (herein of semantic diversity by whatever tools), and finally, semantic intermediaries, map makers that cross the boundaries of semantic universes of discourse (here be dragons).

Not every aspect of topic maps is for everyone and we should not pretend otherwise.

Procedia Computer Science

Friday, September 28th, 2012

Procedia Computer Science. Elsevier.

From about this journal:

Launched in 2009, Procedia Computer Science is an electronic product focusing entirely on publishing high quality conference proceedings. Procedia Computer Science enables fast dissemination so conference delegates can publish their papers in a dedicated online issue on ScienceDirect, which is then made freely available worldwide.

Only ten (10) volumes but open access.

The Proceedings of the International Conference on Computational Science, 2010, 2011, 2012, are all 2,000+ pages. With two hundred and twenty-five (225) articles in the 2012 volume, I am sure you will find something interesting.

Don’t neglect the other volumes but that’s where I am starting.

2013 Workshop on Interoperability in Scientific Computing

Friday, September 28th, 2012

2013 Workshop on Interoperability in Scientific Computing

From the post:

The 13th annual International Conference on Computational Science (ICCS 2013) will be held in Barcelona, Spain from 5th – 7th June 2013. ICCS is an ERA 2010 ‘A’-ranked conference series. For more details on the main conference, please visit The 2nd Workshop on Interoperability in Scientific Computing (WISC ’13) will be co-located with ICCS 2013.

Approaches to modelling take many forms. The mathematical, computational and encapsulated components of models can be diverse in terms of complexity and scale, as well as in published implementation (mathematics, source code, and executable files). Many of these systems are attempting to solve real-world problems in isolation. However the long-term scientific interest is in allowing greater access to models and their data, and to enable simulations to be combined in order to address ever more complex issues. Markup languages, metadata specifications, and ontologies for different scientific domains have emerged as pathways to greater interoperability. Domain specific modelling languages allow for a declarative development process to be achieved. Metadata specifications enable coupling while ontologies allow cross platform integration of data.

The goal of this workshop is to bring together researchers from across scientific disciplines whose computational models require interoperability. This may arise through interactions between different domains, systems being modelled, connecting model repositories, or coupling models themselves, for instance in multi-scale or hybrid simulations. The outcomes of this workshop will be to better understand the nature of multidisciplinary computational modelling and data handling. Moreover we hope to identify common abstractions and cross-cutting themes in future interoperability research applied to the broader domain of scientific computing.

How is your topic map information product going to make the lives of scientists simpler?

Introducing BOSS Geo – the next chapter for BOSS

Friday, September 28th, 2012

Introducing BOSS Geo – the next chapter for BOSS

From the post:

Today, the Yahoo! BOSS team is thrilled to announce BOSS Geo, new additions to our Search API that’s designed to help foster innovation in the search industry. BOSS Geo, comprised of two popular services – PlaceFinder and PlaceSpotter – now offers powerful, new geo services to BOSS developers.

Geo is increasingly important in today’s always-on, mobile world and adding features like these have been among the most requested we’ve received from our developers. With mobile devices becoming more pervasive, users everywhere want to be able to quickly pull up relevant geo information like maps or addresses. By adding PlaceFinder and PlaceSpotter to BOSS, we’re arming developers with rich new tools for driving more valuable and personalized interactions with their users.

PlaceFinder – Geocoding made simple

PlaceFinder is a geocoder (and reverse geocoder) service. The service helps developers convert an address into a latitude/longitude and alternatively, if you provide a latitude/longitude it can resolve it to an address. Whether you are building a check-in service or want to show an address on a map, we’ve got you covered. PlaceFinder already powers several popular applications like foursquare, which uses it to power check-ins on their mobile application. BOSS PlaceFinder offers tiered pricing and one simple monthly bill.

(graphics omitted)

PlaceSpotter – Adding location awareness to your content

The PlaceSpotter API (formerly known as PlaceMaker) allows developers to take a piece of content, pull rich information about the locations mentioned and provide meaning to those locations. A news article is no longer just text but has rich, meaningful geographical information associated with it. For instance, the next time your users are reading a review of a cool new coffee shop in the Mission neighborhood in San Francisco, they can discover another article about a hip new bakery in the same neighborhood. Learn more on the new PlaceSpotter service.

What information would you merge using address data as a link point?

Amsterdam (Netherlands) is included. Perhaps sexual preferences in multiple languages, keyed to your cell phone’s location? (Or does that exist already?)


We intend to shut down the current free versions of PlaceFinder and PlaceMaker on November 17, 2012.

Development using YQL tables will still be available.

Are Topic Maps Making Your Life Simpler?

Friday, September 28th, 2012

Megan Geyer in Attributes of Innovation dives into the details behind the call:

“Be innovative!”

You may have heard this from your boss or colleagues. Everyone wants to be ahead of the curve and lead their industry—to set an example for others to follow. In the digital sphere, customer- and service-oriented products are in the midst of a great many innovations right now, with the emergence of elements like cloud computing, tablets, mobile location services, and social media integration. Now is a fertile time for innovation. But what does it take to be innovative? What does an innovative product look like?

There are attributes that materialize differently depending on the product or service, but they are attributes that all innovations have in common. When these attributes of innovation are combined, the resulting product or service often exceeds the expectations of current user experiences and pushes the field of UX design forward. In particular scenarios such as enterprise IT or the public sector, these common attributes can seem daunting. They can sometimes even seem irrelevant. But in successful and innovative ideas, they are always present. (emphasis added)

All of the points she makes are relevant to topic maps but none more than:

Innovation is Simple

Think about some of the most innovative ideas and products you’ve seen in the last 20 years. What’s a common factor they all share? They make your life easier. They do not add complexity or confusion. They simplify things, make things more accessible, or bring comfort to your life. You may have to spend some time learning to use the new product or service. It may take a while for it to become ingrained in your everyday life. But when you use it, the innovation makes your life easier. (emphasis added)

Ask yourself: Are topic maps making your life simpler?

If your answer is no, that signals a problem that needs to be addressed in marketing topic maps. (The same argument applies to RDF, which after $billion in funding, puff pieces in SciAM, etc., is still struggling.)

My answer to: “What can I do with topic maps?” of “Anything that you want.” is about as helpful as a poke with a sharp stick.

Users aren’t interested in doing “anything” or even “everything.” They have a particular “something” they want to do.

Promoting topic maps requires finding “somethings” that have value for users.

Let the “somethings” be what sells topic maps.

Three Ways that Fractal Tree Indexes Improve SSD for MySQL

Friday, September 28th, 2012

Three Ways that Fractal Tree Indexes Improve SSD for MySQL

The three advantages:

  • Advantage 1: Index maintenance performance.
  • Advantage 2: Compression.
  • Advantage 3: Reduced wear.

See the post for details and the impressive numbers one expects from Fractal tree indexes.

Merging Data Sets Based on Partially Matched Data Elements

Friday, September 28th, 2012

Merging Data Sets Based on Partially Matched Data Elements by Tony Hirst.

From the post:

A tweet from @coneee yesterday about merging two datasets using columns of data that don’t quite match got me wondering about a possible R recipe for handling partial matching. The data in question related to country names in a datafile that needed fusing with country names in a listing of ISO country codes.

Reminds me of the techniques used in record linkage (epidemiology). There, unlike topic maps, the records from diverse sources were mapped to a target record layout and then analyzed.

Quite powerful but lossy with regard to the containers of the original data.
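
Tony's recipe is in R; as a rough illustration of the same idea, here is a minimal Python sketch that uses the standard library's `difflib.get_close_matches` to line up misspelled country names with a reference list before merging. The sample names, the tiny code table and the 0.6 cutoff are assumptions chosen for the example, not data from the post.

```python
import difflib


def match_names(names, reference, cutoff=0.6):
    """Map each name to its closest reference name, or None if nothing clears the cutoff."""
    # Compare in lower case, but hand back the reference spelling.
    lowered = {ref.lower(): ref for ref in reference}
    result = {}
    for name in names:
        hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
        result[name] = lowered[hits[0]] if hits else None
    return result


# Hypothetical reference listing: country name -> ISO code.
iso = {"United Kingdom": "GB", "United States": "US", "Netherlands": "NL"}

# Hypothetical data file column with near-miss spellings and one non-match.
data = ["United Kingdon", "Unted States", "Netherlands", "Atlantis"]

matches = match_names(data, iso)

# Keep the original spelling next to the matched name and code.
merged = [(name, ref, iso.get(ref)) for name, ref in matches.items()]
for row in merged:
    print(row)
```

Keeping the original spelling in each merged row is a small nod to the lossiness point above: the source value survives alongside the name it was matched to.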

JSONize Anything in Pig with ToJson

Thursday, September 27th, 2012

JSONize Anything in Pig with ToJson by Russell Jurney.

The critical bit reads:

That is precisely what the ToJson method of pig-to-json does. It takes a bag or tuple or nested combination thereof and returns a JSON string.

See Russell’s post for the details.
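
To make the idea concrete without a Pig cluster, here is a Python analogue only: it does not use pig-to-json or any Pig API. It assumes named tuples stand in for Pig tuples with a schema and lists stand in for bags; the `to_json` helper and the sample record are hypothetical.

```python
import json
from collections import namedtuple


def to_json(value):
    """Serialize nested named tuples ("tuples") and lists ("bags") to a JSON string."""

    def convert(v):
        # Named tuples become JSON objects keyed by field name.
        if isinstance(v, tuple) and hasattr(v, "_asdict"):
            return {key: convert(item) for key, item in v._asdict().items()}
        # Lists and plain tuples become JSON arrays.
        if isinstance(v, (list, tuple)):
            return [convert(item) for item in v]
        return v

    return json.dumps(convert(value))


# Hypothetical nested record: a tuple holding a tuple and a bag of tuples.
Address = namedtuple("Address", ["name", "address"])
Email = namedtuple("Email", ["sender", "recipients"])

record = Email(
    sender=Address("Ann", "ann@example.com"),
    recipients=[Address("Bob", "bob@example.com")],
)

print(to_json(record))
```

The point is the same as in the quoted passage: one call turns an arbitrarily nested structure into a single JSON string.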

About Apache Flume FileChannel

Thursday, September 27th, 2012

About Apache Flume FileChannel by Brock Noland.

From the post:

This blog post is about Apache Flume’s File Channel. Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

FileChannel is a persistent Flume channel that supports writing to multiple disks in parallel and encryption.

Just in case you are one of those folks with large amounts of data to move about.