Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 2, 2013

UK Income Tax and National Insurance

Filed under: Graphics,Marketing,Visualization — Patrick Durusau @ 8:53 am

http://www.youtube.com/watch?feature=player_embedded&v=C9ZMgG9NiUs

Now if I could just come up with the equivalent of this video for semantic diversity.

Suggestions?

I first saw this at Randy Krum’s Cool Infographics.

May 31, 2013

Going Bright… [Hack Shopping Mall?]

Filed under: Cybersecurity,Marketing,Security — Patrick Durusau @ 9:40 am

Going Bright: Wiretapping without Weakening Communications Infrastructure by Steven M. Bellovin, Matt Blaze, Sandy Clark, and Susan Landau (unofficial version). (Steven M. Bellovin, Matt Blaze, Sandy Clark, Susan Landau, “Going Bright: Wiretapping without Weakening Communications Infrastructure,” IEEE Security & Privacy, vol. 11, no. 1, pp. 62-72, Jan.-Feb. 2013, doi:10.1109/MSP.2012.138)

Abstract:

Mobile IP-based communications and changes in technologies, including wider use of peer-to-peer communication methods and increased deployment of encryption, has made wiretapping more difficult for law enforcement, which has been seeking to extend wiretap design requirements for digital voice networks to IP network infrastructure and applications. Such an extension to emerging Internet-based services would create considerable security risks as well as cause serious harm to innovation. In this article, the authors show that the exploitation of naturally occurring weaknesses in the software platforms being used by law enforcement’s targets is a solution to the law enforcement problem. The authors analyze the efficacy of this approach, concluding that such law enforcement use of passive interception and targeted vulnerability exploitation tools creates fewer security risks for non-targets and critical infrastructure than do design mandates for wiretap interfaces.

The authors argue against an easy on-ramp for law enforcement to intercept digital communications.

What chance is there a non-law enforcement person could discover such back doors and also be so morally depraved as to take advantage of them?

What could possibly go wrong with a digital back door proposal? 😉

No lotteries for 0-day vulnerabilities, but the article does mention:

Secunia, https://secunia.com/community/advisories

VulnerabilityLab, www.vulnerability-lab.com

Vupen, www.vupen.com/english/services/solutions-gov.php

ZDI, http://dvlabs.tippingpoint.com/advisories/disclosure-policy

as offering

subscription services that make available varying levels of access information about 0-day vulnerabilities to their clients.

As far as the FBI is concerned, they should adapt to changing technology and stop being a drag on communications technology.

You do know they still don’t record interviews with witnesses?

How convenient when it comes time for a trial on obstruction of justice or perjury. All the evidence is an agent’s notes of the conversation.

BTW, in case you are looking for a cybersecurity/advertising opportunity: have you seen those services that gather up software packages for comparison price shopping?

Why not a service that gathers up software packages and displays unresolved (and/or historical) hacks on those products?

With ads from security services, hackers, etc.

A topic map powered hack shopping mall as it were.

May 28, 2013

Why Tumblr Was a Massive Steal for Yahoo [There will be a test.]

Filed under: Marketing,Topic Maps — Patrick Durusau @ 4:07 pm

Why Tumblr Was a Massive Steal for Yahoo by Adam Rifkin.

Adam makes two critical points for topic maps marketing:

First, Tumblr is an interest graph, not a social graph. That is, people are looking for content of interest to them.

Second, as Adam writes:

Writers have time but no money. Certain groups are going to be overrepresented: Students, stay-at-home moms, the underemployed, retirees. Epinions, which paid for product reviews, especially ran into issues with writers whose relationship to reviewed products lay more in the realm of fantasy than reality. Writers are also going to have the time and emotional commitment to give your site a lot of feedback about their needs and desires … many of which will be counter to the best interests of the business.

Readers have money but no time. They don’t want to spend hours combing the Internet for photos of vintage jewelry. They want to see a picture of a watch they like, and buy it now. If readers don’t find your content valuable, they’re not going to send you a long email about what they don’t like. They’ll just silently hit the back button and get gone.

Test:

In marketing a topic map, which of the following is more important?

  1. Writers can contribute their input to a public topic map interface?
  2. Readers can purchase access to high quality content?

You have until your VC funding runs out to decide.

May 27, 2013

Zero Day / Leaker’s Lottery

Filed under: Cybersecurity,Marketing,Security — Patrick Durusau @ 1:09 pm

This graphic at the Economist:

lottery graphic

made me think of an alternative to brokers for zero day exploits, a Zero Day Lottery!

Take a known reliable source of zero day exploits like “the Grugq” (see: Shopping For Zero-Days: A Price List For Hackers’ Secret Software Exploits) and set up a weekly lottery for zero day exploits.

Every week without a winner rolls another zero day exploit into the final prize package.

The details would have to be worked out, but authors of zero day exploits included in the prize would share in some percentage of the cash spent on lottery tickets.

The runner of the lottery should get, say, 20% of the bets, with some percentage of the remaining funds being used for contests to develop zero day exploits.

Same principles apply for a Leaker’s Lottery!

Except that there, some of the proceeds from a leak would be split among the leakers.

Could you be a news or government agency and refuse to buy a ticket?

Or even a large block of tickets?

Consider what the Pentagon Papers would have attracted as a lottery prize.

Zero Day / Leakers Lotteries have the potential to put hacking/leaking on a firm financial basis.

Interested?

May 25, 2013

King.com’s Climb to the Social Gaming Throne [TM Incentives]

Filed under: Games,Marketing — Patrick Durusau @ 1:53 pm

King.com’s Climb to the Social Gaming Throne by Karina Babcock.

From the post:

This week I’d like to highlight King.com, a European social gaming giant that recently claimed the throne for having the most daily active users (more than 66 million). King.com has methodically and successfully expanded its reach beyond mainstream social gaming to dominate the mobile gaming market — it offers a streamlined experience that allows gamers to pick up their gaming session from wherever they left off, in any game and on any device. King.com’s top games include “Candy Crush Saga” and “Bubble Saga”.

And — you guessed it — King.com runs on CDH.

With a business model that offers all games for free, King.com relies on advertising and in-game products like boosters and extra lives to generate revenue. In other words, it has to be smart in every communication with customers in order to create value for both the gamer and the advertiser.

King.com uses Hadoop to process, store, and analyze massive volumes of log data generated from the games along with other data sources such as daily currency exchange rates from the European Central bank, multiple metadata feeds, and advertising servers’ log files.

Karina ends with links to more details on the Hadoop setup at King.com.

I don’t know how to make a useful topic map as easy as “Candy Crush Saga” or “Bubble Saga,” but you might.

Or perhaps a combination of topic maps and games.

For example, what about buying up extra lives for popular games and awarding them as incentives for using a topic map interface?

You can search G.*e with no prize or use Topic Map X, with a prize.

Which one would you choose?

Protests of unfairness from a house that rigs counts aren’t going to bother me.

You?

May 15, 2013

Causium Sales Model

Filed under: Marketing,Topic Maps — Patrick Durusau @ 3:16 pm

Atlassian’s Causium Sales Model Reaches $2.5 Million Charity Donations by Kit Eaton.

From the post:

Back in May 2010 Atlassian, a large innovative software company, revealed that its alternative business model causium had allowed it to donate $500,000 to international literacy improvement charity Room to Read. Now the firm says it has surpassed $2.5 million in donations, and is holding a special event with the charity on May 14th to celebrate.

Causium is an alternative to the freemium business model that many companies–from the Wall Street Journal to Babbel–follow. Under freemium thinking, Atlassian would give away some of its enterprise-grade code for free in order to attract business for its paid services. But instead, the company charges a nominal $10 fee, which it then donates to charity. The fee works in two ways–as a boost to charitable causes, and also to demonstrate to the software’s end-users that the code itself has value.

Atlassian’s President Jay Simons spoke to Fast Company, explaining that the plan has worked better than they expected: “We didn’t appreciate at the time that we were effectively building this annuity stream. Customers that buy the 10-user license will buy it again the following year.” The first year of the plan resulted in some $300,000 in charity donations, and the growth of the company’s reputation since means they donated the same amount in the first quarter of 2013. The donations are important to Room to Read, Simons says, because “they have a reliable funding source” on a regular basis.

It is important to note that Atlassian had the market presence to make a causium sales model work.

On that score, see:

Why Atlassian is to Software as Apple is to Design by Mark Fidelman.

and, of course:

Atlassian.

Important lessons if you hope to make your software or service a success.

May 3, 2013

Is Search a Thing of the Past

Filed under: Marketing,Searching,Topic Maps — Patrick Durusau @ 4:12 pm

Is Search a Thing of the Past by April Holmes.

April covers a survey of 2277 private technology firms that were acquired in 2012.

See her post for the details, but the bottom line was:

None of them were search companies.

I can’t remember anyone ever saying they had a “great” search experience.

Can you?

If not, what would you want to replace present search interfaces? (Leaving technical feasibility aside for the moment.)

April 30, 2013

Patterns of information use and exchange:…

Filed under: Design,Interface Research/Design,Marketing,Usability,Users — Patrick Durusau @ 3:05 pm

Patterns of information use and exchange: case studies of researchers in the life sciences

From the post:

A report of research patterns in life sciences revealing that researcher practices diverge from policies promoted by funders and information service providers

This report by the RIN and the British Library provides a unique insight into how information is used by researchers across life sciences. Undertaken by the University of Edinburgh’s Institute for the Study of Science, Technology and Innovation, and the UK Digital Curation Centre and the University of Edinburgh’s Information Services, the report concludes that one-size-fits-all information and data sharing policies are not achieving scientifically productive and cost-efficient information use in life sciences.

The report was developed using an innovative approach to capture the day-to-day patterns of information use in seven research teams from a wide range of disciplines, from botany to clinical neuroscience. The study undertaken over 11 months and involving 56 participants found that there is a significant gap between how researchers behave and the policies and strategies of funders and service providers. This suggests that the attempts to implement such strategies have had only a limited impact. Key findings from the report include:

  • Researchers use informal and trusted sources of advice from colleagues, rather than institutional service teams, to help identify information sources and resources
  • The use of social networking tools for scientific research purposes is far more limited than expected
  • Data and information sharing activities are mainly driven by needs and benefits perceived as most important by life scientists rather than top-down policies and strategies
  • There are marked differences in the patterns of information use and exchange between research groups active in different areas of the life sciences, reinforcing the need to avoid standardised policy approaches

Not the most recent research in the area, but a good reminder that users do as users do, not as system/software/ontology architects would have them do.

What approach does your software take?

Does it make users perform their tasks the “right” way?

Or does it help users do their tasks “their” way?

April 28, 2013

Topic Maps Logo?

Filed under: Advertising,Marketing — Patrick Durusau @ 10:32 am

While writing about Drake, I was struck by the attractiveness of the project logo:

Drake logo

So I decided to look at some other projects’ logos, just to get some ideas about what other projects were doing:

Hadoop logo

Mahout logo

Chukwa logo

But the most famous project at Apache has the simplest logo of all:

HTTPD logo

To be truthful, when someone says web server, I automatically think of the Apache server. Others exist and new ones are invented, but Apache server is nearly synonymous with web server.

Perhaps the lesson is the logo did not make it so.

Has anyone written a history of the Apache web server?

A cross between a social history and a technical one, that illustrates how the project responded to user demands and requirements. That could make a very nice blueprint for other projects to follow.

April 24, 2013

How to Go Viral, Every Time

Filed under: Marketing,Topic Maps — Patrick Durusau @ 8:52 am

How to Go Viral, Every Time by Jess Bachman.

From the post:

Everyone wants their content to go viral. It’s the holy grail of marketing. It can turn companies and product into the talk of the town, even if they sell toiletries. The ROI on content with more than a million views is almost unmeasurable. So how do you make sure your content will go viral?

The secret is simple. Be incredibly lucky.

Luck is the third piece of the virality triumvirate and obviously the hardest to bank on. In fact, you cannot achieve true virality without it. With great content and powerful tactics you can certainly get millions of views on a consistent basis, but if lady luck doesn’t give her blessing, you will end up with a good – but not great – ROI.

What do you think would make good viral material for a topic map video?

And of course:

Anyone with skills at producing videos interested in a topic map video?

Balisage Advice: How To Organize A Talk

Filed under: Marketing — Patrick Durusau @ 8:01 am

How To Organize A Talk

From the post:

Say you are speaking for an hour to an audience of 100. It’s just a fact of human nature that nobody in the audience is going to be paying close attention to what you are saying for more than 1/4 of the time. The other 45 minutes of the time people will be thinking, talking, or just daydreaming. You must accept this as an unavoidable constraint.

Absent any intervention on your part then you will get a randomly selected 15 minutes of attention from each member of the audience. This means that at any one point in time you will have the attention of only 1/4 of your audience or 25 out of the 100 people. The very important things you will have to say will be processed and potentially remembered by 1/4 of your audience, the same fraction that will be paying attention to the least important things you have to say.
….

A forty-five minute time slot means you have about 11 minutes to say your important ideas.

See the post for some tips on doing exactly that.
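The arithmetic behind the quoted model is easy to check. Below is a minimal Python sketch, assuming (as the post does) that each listener attends for a randomly chosen quarter of the talk; the function name `attention_by_minute` and its parameters are illustrative, not from the post. Because total attention is fixed (100 listeners times 15 minutes), the average number of attentive listeners per minute always comes out to 25, whatever the random draw.

```python
import random


def attention_by_minute(audience=100, talk_minutes=60, attentive_fraction=0.25, seed=0):
    """Count how many listeners are paying attention in each minute of a talk.

    Assumed model (following the quoted post): each listener attends for a
    randomly chosen set of minutes totalling attentive_fraction of the talk.
    """
    rng = random.Random(seed)
    attentive_minutes = round(talk_minutes * attentive_fraction)
    counts = [0] * talk_minutes
    for _ in range(audience):
        # Pick this listener's attentive minutes at random, without repeats.
        for minute in rng.sample(range(talk_minutes), attentive_minutes):
            counts[minute] += 1
    return counts


counts = attention_by_minute()
print(sum(counts) / len(counts))  # 25.0 listeners attentive per minute, on average
print(min(counts), max(counts))   # individual minutes scatter around that average

# The same quarter applied to a 45-minute slot:
print(45 * 0.25)  # 11.25 minutes of attention per listener
```

The model only holds "absent any intervention on your part"; the post’s tips are about influencing which minutes get that attention.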

I suspect the same is true for discussions with potential/actual customers as well.

They are not stupid, they just aren’t paying attention to what you are saying.

One response would be to wire them up like mice that get shocked at random. (That may be illegal in some jurisdictions.)

Another response would be to accept that people are as they are and not as we might want them to be.

The second response is likely to be the more successful, if less satisfying. 😉

Not easy to do, but explanations built on complex diagrams, to which ever more complexity is added, haven’t set the woods on fire as a marketing tool.

April 22, 2013

Marketing: Heads I Win, Tails You Lose

Filed under: Marketing — Patrick Durusau @ 3:50 pm

I haven’t seen this marketing tip in any of the manuals:

Watch for bad news, then explain how your technology saved the day!

Like the claim by FLIR Corp. that their thermal imager helped spot Dzhokhar Tsarnaev (Boston Marathon bomber) hiding in a boat.

Or more precisely:

FLIR’s thermal imaging gear was able to discern a live, moving individual hiding in a recreational boat being stored in the backyard of a Watertown home, even though the human being could not been seen beneath a covering tarpaulin by video surveillance cameras or the naked eye.

It is not clear from announcements by law enforcement authorities, and news accounts, whether it was the FLIR system that first discovered the wounded alleged terrorist, Dzhokhar Tsarnaev, and led police on the ground to surround the boat and eventually take Tsarnaev into custody. Or whether it was the tipoff from a man living in the Watertown house to blood on the tarpaulin that first led police to the injured alleged terrorist. (From Thermal imager from FLIR Corp. helps spot Boston Marathon terrorist beneath boat tarp)

It’s not that “unclear”:

The manhunt for Dzhokar Tsarnaev lasted all day Friday and left Boston streets deserted as police asked everyone to stay indoors. Then after the request was lifted, authorities got a tip: A Watertown man told police someone was hiding in his boat in the backyard, bleeding. It was their suspect, Watertown police Chief Edward Deveau said.

Officers spotted Tsarnaev poking through the tarp covering the boat, and a shootout erupted, Deveau said. Police used “flash-bangs,” devices meant to stun people with a loud noise, and negotiated with Tsarnaev for about half an hour.

“We used a robot to pull the tarp off the boat,” David Procopio of the Massachusetts State Police said. “We were also watching him with a thermal imaging camera in our helicopter. He was weakened by blood loss — injured last night, most likely.” (From: As Boston reeled, younger bombing suspect partied)

After the boat was pointed out, the thermal imager could see the suspect through a cover.

Not as impressive, is it?

If you are going to market based on bad news, pick something that isn’t contradicted in published news accounts.

If you are reading marketing, read carefully, very carefully.

April 20, 2013

Data Science Markets [Marketing]

Filed under: Data Science,Marketing,Topic Maps — Patrick Durusau @ 8:38 am

Data Visualization: The Data Industry by Sean Gonzalez.

From the post:

In any industry you either provide a service or a product, and data science is no exception. Although the people who constitute the data science workforce are in many cases rebranded from statistician, physicist, algorithm developer, computer scientist, biologist, or anyone else who has had to systematically encode meaning from information as the product of their profession, data scientists are unique from these previous professions in that they operate across verticals as opposed to diving ever deeper down the rabbit hole.

Sean identifies five (5) market segments in data science and a visualization product for each one:

  1. New Recruits
  2. Contributors
  3. Distillers
  4. Consultants
  5. Traders

See Sean’s post for the details.

Have you identified market segments and the needs they have for topic map based data and/or software?

Yes, I said their needs.

You may want a “…more just, verdant, and peaceful world” but that’s hardly a common requirement.

Starting with a potential customer’s requirements is more likely to result in a sale.

April 17, 2013

HowTo: Develop Your First Google Glass App [Glassware]

Filed under: Marketing,Topic Maps — Patrick Durusau @ 2:50 pm

HowTo: Develop Your First Google Glass App [Glassware] by Tarandeep Singh.

From the post:

Google has raised curtains off it’s Glass revealing detailed Tech Specs. Along with the specs came the much awaited Mirror API – The API for Glass apps.

So you had that killer app idea for Google Glass? Now its time for you to put those ideas into code!

The race is on to produce the first topic map based Google Glass App!

A response to a request can be a machine generated guess or a human curated answer.

Which one do you think users would prefer?

April 11, 2013

Glass – Another Topic Map Medium?

Filed under: Marketing,Topic Maps — Patrick Durusau @ 4:29 pm

If you haven’t seen Glass, go to: http://www.google.com/glass/start/

If lame search results are annoying on your desktop, pad or cellphone, imagine not being able to escape them.

Or for a positive spin, would you want a service provider with better results?

Bad data “in your face” may be the selling point we need.

Spreadsheet is Still the King of all Business Intelligence Tools

Filed under: Business Intelligence,Marketing,Spreadsheets,Topic Maps — Patrick Durusau @ 4:01 pm

Spreadsheet is Still the King of all Business Intelligence Tools by Jim King.

From the post:

The technology consulting firm Gartner Group Inc. once precisely predicated that BI would be the hottest technology in 2012. The year of 2012 witnesses the sharp and substantial increase of BI. Unexpectedly, spreadsheet turns up to be the one developed and welcomed most, instead of the SAP BusinessObjects, IBM Cognos, QlikTech Qlikview, MicroStrateg, or TIBCO Spotfire. In facts, no matter it is in the aspect of total sales, customer base, or the increment, the spreadsheet is straight the top one.

Why the spreadsheet is still ruling the BI world?

See Jim’s post for the details, but the bottom line was:

It is the low technical requirement, intuitive and flexible calculation capability, and business-expert-oriented easy solution to the 80% BI problems that makes the spreadsheet still rule the BI world.

Question:

How do you translate:

  • low technical requirement
  • intuitive and flexible calculation capacity (or its semantic equivalent)
  • business-expert-oriented solution to the 80% of BI problems

into a topic map application?

Selling Topic Maps: One Feature At A Time?

Filed under: Marketing,Topic Maps — Patrick Durusau @ 3:45 pm

Dylan Jones writes in Data Quality: One Habit at a Time:

I started learning about data quality management back in 1992. Back then there were no conferences, limited publications and if you received an email via the internet the excitement lasted for hours.

Fast forward to today. We are practically swamped with data quality knowledge outlets. Sites like the Data Roundtable, OCDQ Blog and scores of other data quality bloggers provide practical ideas and techniques on an almost hourly basis.

We never lack for ideas and methods for implementing data quality management, and of course this is hugely beneficial for professionals looking to mature data quality in their organisation.

However, with all this knowledge comes a warning. Data quality management can only succeed when behaviours are changed, but to change a person’s behaviour requires the formation of new habits. This is where many projects will ultimately fail.

Have you ever started the New Year with a promise to change your ways and introduce new habits? Perhaps the guilt of festive excesses drove you to join a gym or undertake some other new health regime. How was that health drive looking in March? How about September?

The problem of habit formation is exacerbated when we attempt to change multiple habits. Perhaps we want to combine a regular running regime with learning new skills. The result is often failure.

Does your topic maps sales pitch require too much change? (I know mine does.)

Or do you focus on the one issue/problem that your client needs solving?

Sure, topic maps enable robust integration of diverse data stores, but if that’s not your client’s issue, why bring it up?

Can we sell more by promising less?

April 10, 2013

Free Data Mining Tools [African Market?]

Filed under: Data Mining,jHepWork,Knime,Mahout,Marketing,Orange,PSPP,RapidMiner,Rattle,Weka — Patrick Durusau @ 10:17 am

The Best Data Mining Tools You Can Use for Free in Your Company by Mawuna Remarque KOUTONIN.

Short descriptions of the usual suspects, plus a couple (jHepWork and PSPP) that were new to me.

  1. RapidMiner
  2. RapidAnalytics
  3. Weka
  4. PSPP
  5. KNIME
  6. Orange
  7. Apache Mahout
  8. jHepWork
  9. Rattle

An interesting site in general.

Consider the following pitch for business success in Africa:

Africa: Your Business Should be Profitable in 45 days or Die

And the reasons for that claim:

1. “It’s almost virgin here. There are lot of opportunities, but you have to fight!”

2. “Target the vanity class with vanity products. The “new rich” have lot of money. They are though on everything except their big ego and social reputation”

3. “Target the lazy executives and middle managers. Do the job they are paid for as a consultant. Be good, and politically savvy, and the money is yours”

4. “You’ll make more money in selling food or opening a restaurant than working for the Bank”

5. “You can’t avoid politics, but learn to think like the people your are talking with. Always finish your sentence with something like “the most important is the country’s development, not power. We all have to work in that direction”

6. “It’s about hard work and passion, but you should first forget about managing time like in Europe.

Take time to visit people, go to the vanity parties, have the patience to let stupid people finish their long empty sentences, and make the politicians understand that your project could make them win elections and strengthen their positions”

7. “Speed is everything. Think fast, Act fast, Be everywhere through friends, family and informants”

With the exception of #1, all of these points are advice I would give to someone marketing topic maps on any continent.

It may be easier to market topic maps where there are few legacy IT systems that might feel threatened by a new technology.

…Cloud Integration is Becoming a Bigger Issue

Filed under: Cloud Computing,Data Integration,Marketing — Patrick Durusau @ 5:27 am

Survey Reports that Cloud Integration is Becoming a Bigger Issue by David Linthicum.

David cites a survey by KPMG that found thirty-three percent of executives complained of higher than expected costs for data integration in cloud projects.

One assumes those were the brighter thirty-three percent of those surveyed. The remainder apparently did not recognize data integration issues in their cloud projects.

David writes:

Part of the problem is that data integration itself has never been sexy, and thus seems to be an issue that enterprise IT avoids until it can’t be ignored. However, data integration should be the life-force of the enterprise architecture, and there should be a solid strategy and foundational technology in place.

Cloud computing is not the cause of this problem, but it’s shining a much brighter light on the lack of data integration planning. Integrating cloud-based systems is a bit more complex and laborious. However, the data integration technology out there is well proven and supports cloud-based platforms as the source or the target in an integration chain. (emphasis added)

The more diverse data sources become, the larger data integration issues will loom.

Topic maps offer data integration efforts in cloud projects a choice:

1) You can integrate one-off, with either in-house or third-party tools, only to redo all that work with each new data source, or

2) You can integrate using a topic map (for integration or to document integration) and re-use the expertise from prior data integration efforts.
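Option 2 can be sketched in a few lines of Python. This is a hypothetical illustration, not a topic map engine or any particular product: the source names, field names, and `MAPPINGS` table are invented, and it assumes the sources already share a subject identifier. The point is that what was learned about each source lives in a reusable mapping, so adding a source means adding a mapping rather than rewriting the merge.

```python
from collections import defaultdict

# Hypothetical per-source mappings: which local field identifies the subject,
# and how local field names map to shared property names. Adding a new data
# source means adding one entry here; merge() below is reused unchanged.
MAPPINGS = {
    "crm": {"id": "cust_no", "fields": {"cust_name": "name", "tel": "phone"}},
    "billing": {"id": "customer_id", "fields": {"full_name": "name", "balance": "balance"}},
}


def merge(records_by_source, mappings):
    """Merge records that share a subject identifier into one view per subject.

    Each view lists the sources that contributed to it, so the integration is
    documented rather than hidden. When two sources supply the same shared
    property, the first value seen is kept (a simplification for this sketch).
    """
    merged = defaultdict(lambda: {"sources": []})
    for source, records in records_by_source.items():
        mapping = mappings[source]
        for record in records:
            subject = merged[record[mapping["id"]]]
            subject["sources"].append(source)
            for local_name, shared_name in mapping["fields"].items():
                if local_name in record:
                    subject.setdefault(shared_name, record[local_name])
    return dict(merged)


data = {
    "crm": [{"cust_no": "C-17", "cust_name": "Acme Ltd", "tel": "555-0100"}],
    "billing": [{"customer_id": "C-17", "full_name": "ACME Limited", "balance": 420.0}],
}
print(merge(data, MAPPINGS)["C-17"])
# {'sources': ['crm', 'billing'], 'name': 'Acme Ltd', 'phone': '555-0100', 'balance': 420.0}
```

Real sources rarely share identifiers this neatly; recording how identifiers correspond is exactly the expertise worth keeping from one integration effort to the next.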

Suggest pitching topic maps as a value-add proposition.

April 6, 2013

Data-Plundering at Amazon

Filed under: Cybersecurity,Marketing,Security,Topic Maps — Patrick Durusau @ 10:52 am

Amazon S3 storage buckets set to ‘public’ are ripe for data-plundering by Ted Samson.

From the post:

Using a combination of relatively low-tech techniques and tools, security researchers have discovered that they can access the contents of one in six Amazon Simple Storage Service (S3) buckets. Those contents range from sales records and personal employee information to source code and unprotected database backups. Much of the data could be used to stage a network attack, to compromise users accounts, or to sell on the black market.

All told, researchers managed to discover and explore nearly 2,000 buckets from which they gathered a list of more than 126 billion files. They reviewed over 40,000 publicly visible files, many of which contained sensitive information, according to Rapid 7 Senior Security Consultant Will Vandevanter.

….

The root of the problem isn’t a security hole in Amazon’s storage cloud, according to Vandevanter. Rather, he credited Amazon S3 account holders who have failed to set their buckets to private — or to put it more bluntly, organizations that have embraced the cloud without fully understanding it. The fact that all S3 buckets have predictable, publically accessible URLs doesn’t help, though.

That was close!

From the headline I thought Chinese government hackers had carelessly left Amazon S3 storage buckets open after downloading. 😉

If you want an even lower-tech technique for hacking into your network, try the following (with permission):

Call users from your internal phone system and say system passwords have been stolen and IT will monitor all logins for 72 hours. To monitor access, IT needs users’ logins and passwords to put tracers on accounts. Could make the difference in next quarter earnings being up or being non-existent.

After testing, are you in more danger from your internal staff than external hackers?

As you might suspect, I would be using a topic map to provide security accountability across both IT and users.

With the goal of helping security risks become someone else’s security risks.

K-Nearest Neighbors: dangerously simple

Filed under: Data Mining,K-Nearest-Neighbors,Marketing,Topic Maps — Patrick Durusau @ 10:31 am

K-Nearest Neighbors: dangerously simple by Cathy O’Neil.

From the post:

I spend my time at work nowadays thinking about how to start a company in data science. Since there are tons of companies now collecting tons of data, and they don’t know what do to do with it, nor who to ask, part of me wants to design (yet another) dumbed-down “analytics platform” so that business people can import their data onto the platform, and then perform simple algorithms themselves, without even having a data scientist to supervise.

After all, a good data scientist is hard to find. Sometimes you don’t even know if you want to invest in this whole big data thing, you’re not sure the data you’re collecting is all that great or whether the whole thing is just a bunch of hype. It’s tempting to bypass professional data scientists altogether and try to replace them with software.

I’m here to say, it’s not clear that’s possible. Even the simplest algorithm, like k-Nearest Neighbor (k-NN), can be naively misused by someone who doesn’t understand it well. Let me explain.

The devil is all in the detail of what you mean by close. And to make things trickier, as in easier to be deceptively easy, there are default choices you could make (and which you would make) which would probably be totally stupid. Namely, the raw numbers, and Euclidean distance.

Read and think about Cathy’s post.

All those nice, clean, clear number values and a simple math equation, muddied by meaning.

Undocumented meaning.

And undocumented relationships between the variables the number values represent.

You could document your meaning and the relationships between variables and still make dumb decisions.

The hope is you or your successor will use documented meaning and relationships to make better decisions.

For documentation you can:

  • Try to remember the meaning of “close” and the relationships for all uses of K-Nearest Neighbors where you work.
  • Write meaning and relationships down on sticky notes collected in your desk drawer.
  • Write meaning and relationships on paper or in electronic files, the latter somewhere on the server.
  • Document meaning and relationships with a topic map, so you can leverage information already known, including identifiers for the VP who ordered you to use particular values, for example. (Along with digitally signed copies of the email(s) in question.)

Which one are you using?

PS: This link was forwarded to me by Sam Hunting.

April 5, 2013

Concurrent and Parallel Programming

Filed under: Graphics,Marketing,Visualization — Patrick Durusau @ 1:39 pm

Concurrent and Parallel Programming by Joe Armstrong.

Joe explains the difference between concurrency and parallelism to a five-year-old.

This is the type of stark clarity that I am seeking for topic map explanations.

At least the first ones someone sees. Time enough later for the gory details.

Suggestions welcome!

April 4, 2013

Targeting Developers?

Filed under: Marketing,Topic Maps — Patrick Durusau @ 11:15 am

Most topic map software, either explicitly or implicitly, is targeted at developers.

I ran across a graphic today that highlights what I consider to be a flaw in that strategy.

The original graphic concerns the number of students enrolled in computer science:

CS enrollment

I first saw that in a tweet by Matt Asay.

I need to practice (read: learn) my GIMP skills, so my first attempt to re-purpose the graphic was:

CS student enrollment

But that leaves my main point implied, so after some fiddling, I got:

Marketing image

Even without a marketing degree, I can pick the better marketing target.

What about you?

BTW, the experience with Hadoop supports my side, not the targeting-developers argument.

Yes, a lot of Hadoop tools are difficult to use, if not black arts.

However, Hadoop marketing has more hand waving and arm flapping than you will see among Democrats on entitlement reform and Republicans on tax reform, combined.

The Hadoop ecosystem (which I like a lot by the way) is billed to consumers as curing everything but AIDS and that is just a matter of application.

Consumer demand, from people who aren’t going to run Hadoop clusters, write Pig scripts, etc., is driving developers to build better tools and to learn the harder ones.

Suggestions on how to build consumer oriented marketing of topic maps will be greatly appreciated!

April 3, 2013

Outing Censors

Filed under: Marketing,Topic Maps — Patrick Durusau @ 1:45 pm

You may already be aware of threats and legal proceedings by Edwin Mellen Press against criticism of itself and its publications.

For one recent update, see: Posts Removed Because We’ve Received Letters From Edwin Mellen Press’ Attorney by Kent Anderson.

For further background, see: When Sellers and Buyers Disagree — Edwin Mellen Press vs. a Critical Librarian by Rick Anderson.

The thought occurs to me that over the years there must be a treasure trove of letters and other communications from Edwin Mellen Press, not to mention litigation files, depositions, etc.

But any story about Edwin Mellen Press will be written with access to only part of that historical information.

What if McMaster University were to publicize the “…demands and considerable pressure from the Edwin Mellen Press…”? And what if those demands could be mapped to demands made against others?

The demands by Edwin Mellen Press have been made against librarians. The very people who excel at the collection and creation of archives.

Is it time for the library community to pool its knowledge about Edwin Mellen Press?

My time resources are limited but I would be willing to contribute as I am able to such an effort.

You?

Information Management – Gartner 2013 “Predictions”

Filed under: Data Management,Marketing,Topic Maps — Patrick Durusau @ 10:58 am

I hesitate to call Gartner reports “predictions.”

The public ones I have seen are c-suite summaries of information already known to the semi-informed.

Are Gartner “predictions” about what c-suite types may become informed about in the coming year?

That qualifies for the dictionary sense of “prediction.”

More importantly, what c-suite types may become informed about offers clues on how to promote topic maps.

If you don’t have access to the real Gartner reports, Andy Price has summarized information management predictions in: IT trends: Gartner’s 2013 predictions for information management.

The ones primarily relevant to topic maps are:

  • Big data
  • Semantic technologies
  • The logical data warehouse
  • NoSQL DBMSs
  • Information stewardship applications
  • Information valuation/infonomics

One possible way to capitalize on these “predictions” would be to create a word cloud from the articles reporting on these “predictions.”

Every article will use slightly different language, and the most popular terms are the ones to use for marketing.

The thinking is that those terms will be repeated often enough to resonate with potential customers.
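Finding “the most popular terms” can start with a plain term-frequency count, which is also what sizes the words in most word clouds. A minimal sketch, assuming you have already collected the article texts as strings; the three snippets and the tiny stop-word list below are invented placeholders, not quotes from real articles. It counts single words only, so a phrase like “big data” shows up as two separate terms.

```python
import re
from collections import Counter

# Placeholder snippets standing in for the articles that report on the
# Gartner "predictions"; swap in the real article texts.
articles = [
    "Big data and semantic technologies will reshape information management.",
    "Gartner says big data drives demand for information stewardship.",
    "The logical data warehouse and NoSQL DBMSs meet big data head on.",
]

# A deliberately tiny stop-word list; a real one would be far longer.
STOP_WORDS = {"a", "an", "and", "the", "will", "for", "says", "on", "meet", "head"}


def term_counts(texts, stop_words=STOP_WORDS):
    """Count lower-cased words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


# The most frequent terms are the marketing candidates
# (and the biggest words in a word cloud).
for term, n in term_counts(articles).most_common(3):
    print(term, n)
# data 4
# big 3
# information 2
```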

Capturing the business needs answered by those terms would be a separate step.

April 1, 2013

On the Eighth Day

Filed under: Marketing,Topic Maps — Patrick Durusau @ 7:45 pm

On the eighth day of creation [language and time units are for the convenience of the reader; the celestial court exists outside of their strictures].

I started this post off as an April Fools’ Day gag, but the keyboard ran away from me.

See what you think.

L = Lord

O = Other member(s) of the celestial court

L: The Tower of Babel is another example of bad PR from my own followers.

O: How so? Didn’t you confuse their languages to prevent an assault on Heaven?

L: Look around you. Is it likely I would be fearful of someone piling up bricks to assault Heaven?

O: Well, now that you mention it, no, it doesn’t seem likely. (In an uncertain tone of voice.)

L: Would it help if I explained why humans invented the story of the Tower of Babel?

O: (Nodding quickly.)

L: Arrogance.

O: Arrogance?

L: Think about it. There are two types of people. One type thinks they know what and how everyone else should be thinking. The other type knows who should be telling others what and how to think.

The Tower of Babel story blames me for the competition to force others to a single way of thinking.

What’s ironic is their arrogance multiplies the number of languages and approaches to languages. Every generation denigrates what went before, for a new bumper crop of shiny “truths.”

Need an example?

Take their “when in the beginning there was FORTRAN….”

Now look at any listing of major programming languages, never mind the smaller ones.

No Tower of Babel story there.

O: What about the Tower of Babel as an explanation for different languages?

L: Glad you asked.

I’ll give you one guess who thinks they are entitled to an explanation for everything.

More listening to others and less whining about not being in charge would be a start towards less confusion of languages.

New Book Explores the P-NP Problem [Explaining Topic Maps]

Filed under: Marketing,Mathematical Reasoning,Mathematics — Patrick Durusau @ 5:24 pm

New Book Explores the P-NP Problem by Shar Steed.

From the post:

The Golden Ticket: P, NP, and the Search for the Impossible, written by CCC Council and CRA board member, Lance Fortnow is now available. The inspiration for the book came in 2009 when Fortnow published an article on the P-NP problem for Communications of the ACM. With more than 200,000 downloads, the article is one of the website’s most popular, which signals that this is an issue that people are interested in exploring. The P-NP problem is the most important open problem in computer science because it attempts to measure the limits of computation.

The book is written to appeal to readers outside of computer science and shed light on the fact that there are deep computational challenges that computer scientists face. To make it relatable, Fortnow developed the “Golden Ticket” analogy, comparing the P-NP problem to the search for the golden ticket in Charlie and the Chocolate Factory, a story many people can relate to. Fortnow avoids mathematical and technical terminology and even the formal definition of the P-NP problem, and instead uses examples to explain concepts.

“My goal was to make the book relatable by telling stories. It is a broad based book that does not require a math or computer science background to understand it.”

Fortnow also credits CRA and CCC for giving him inspiration to write the book.

Fortnow has explained the P-NP problem while avoiding “…mathematical and technical terminology and even the formal definition of the P-NP problem….”

Now, what were we saying about how difficult it is to explain topic maps?

Suggest we all read this as a source of inspiration for better (more accessible) explanations and tutorials on topic maps.

(I just downloaded it to the Kindle reader on a VM running on my Ubuntu box. This promises to be a great read!)

March 29, 2013

Speaking of Business Cases

Filed under: Marketing,Topic Maps — Patrick Durusau @ 4:35 am

The Telenor post reminded me of my argument that topic maps save users time by sparing them from (re)searching for information already found.

In Telenor’s case, there was someone, customers in fact, who wanted faster and more accurate information.

Is there a business case for avoiding (re)searching for information already found?

Say, where research is being billed to a client by the hour?

The more attorneys, CPAs, paralegals, etc. that find the same information = more billable hours.

Where a topic map = fewer billable hours.

And where billable hours aren’t an issue, what do users do with the time they used to spend on the appearance of working by searching?

I am reminded of a department manager (at the time) who described themselves as “…doing market research…” by reading the latest issue of Computer Shopper. That was nearly twenty (20) years ago, but even then there were more effective means of such research.

On the other hand, there may be cases where use of topic maps by one side may force others to improve their game.

Intelligence gathering and processing for example.

Topic maps need not disrupt current layers of contracting, feathered nests and revolving doors, to say nothing of the turf guardians.

But topic maps could envelop such systems, in place, to provide access to integrated inter-agency intelligence, long before agreement is reached (if ever) on what intelligence to share.

How NoSQL Paid Off for Telenor

Filed under: Lucene,Marketing,Neo4j,Solr — Patrick Durusau @ 4:07 am

How NoSQL Paid Off for Telenor by Sebastian Verheughe and Katrina Sponheim.

A presentation I encountered while searching for something else.

Makes a business case for Lucene/Solr and Neo4j solutions to improve customer access to data.

As opposed to the “world will be a better place” case.

What information process/need have you encountered where you can make a business case for topic maps?

March 27, 2013

The Three Y’s of Topic Maps

Filed under: Marketing,Topic Maps — Patrick Durusau @ 10:58 am

Thinking of ways that topic maps are the same or different from other information technologies.

Your Data: I think all information technologies would claim to handle your data. Some focus more on structured data than others, but in general, all handle “your data.”

Your Model: This is where topic maps and key/value stores depart from the Semantic Web.

You don’t get “your model” with the Semantic Web, you get a prefab logical model.

Contrast that with topic maps, where you can have FOL (first order logic), SOL (second order logic) or any other logic or non-logic you choose to have. It’s your model and it operates as you think it should. It could even be: “go ask Steve” for some operations.

The Semantic Web types will protest that not using their model means your data won’t work with their software. Which is one of their main reasons for touting their model: it works with their software.

Personally I prefer models that fit my use cases. As opposed to models whose first requirement is to work on a particular class of software.

Guess it depends on whether you want to further the well being of Semantic Web software developers or your own.

Your Vocabulary: Another point where topic maps and key/value stores depart from the Semantic Web.

The vocabulary you choose for your model is your own.

Which is very likely to be more familiar to you, and one you can apply more accurately.

How do topic maps differ from key/value stores?

Two things come to mind. First, topic maps have inherent machinery for the representation of relationships between subjects.

Not that you couldn’t do that with a key/value store, but in some sense a key/value store is more primitive than a topic map. You would have to build up such structures yourself.

Second, “as is,” key/value stores (at least the ones I have seen, which isn’t all of them) don’t have a well-developed notion of subject identity.

That is, keys and values are both treated as primitives. If your key and my key aren’t the same, then they must be different. Or if they are the same, then they must be the same thing. Same for values.

That may not be a disadvantage in some cases where information aggregation or merging isn’t a requirement. But it is becoming harder and harder to think of use cases where aggregation/merging isn’t ever going to be an issue.
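A minimal Python sketch of the difference. The `Topic` class and `merge_topics` function are hypothetical and implement only one topic map merging rule: topics that share a subject identifier are merged (the Topic Maps Data Model defines further rules, such as those for subject locators and item identifiers). The Wikipedia URLs are example identifiers chosen for illustration. In the dictionary, “Thomas Edison” and “T. A. Edison” are just two unequal strings; as topics sharing a subject identifier, they collapse into one.

```python
from dataclasses import dataclass, field

# Key/value view: keys are primitives, so unequal strings are different things.
kv_store = {
    "Thomas Edison": {"born": "1847"},
    "T. A. Edison": {"invented": "phonograph"},
}


@dataclass
class Topic:
    """Hypothetical, much-simplified topic: names, subject identifiers
    and a flat dict of properties (not a full TMDM topic)."""
    names: set
    subject_identifiers: set
    properties: dict = field(default_factory=dict)


def merge_topics(topics):
    """Merge topics that share at least one subject identifier.

    Merging is transitive: if A shares an identifier with B and B with C,
    all three end up in one topic. On conflicting property keys, one value
    simply overwrites the other; a real system would need a policy here.
    """
    merged = []  # invariant: no two topics in this list share an identifier
    for topic in topics:
        # Work on a copy so the caller's topics are left untouched.
        current = Topic(set(topic.names), set(topic.subject_identifiers),
                        dict(topic.properties))
        i = 0
        while i < len(merged):
            if merged[i].subject_identifiers & current.subject_identifiers:
                other = merged.pop(i)
                current.names |= other.names
                current.subject_identifiers |= other.subject_identifiers
                current.properties = {**other.properties, **current.properties}
                i = 0  # the enlarged topic may now overlap ones already checked
            else:
                i += 1
        merged.append(current)
    return merged


EDISON = "http://en.wikipedia.org/wiki/Thomas_Edison"  # example identifier
TESLA = "http://en.wikipedia.org/wiki/Nikola_Tesla"    # example identifier

topics = [
    Topic({"Thomas Edison"}, {EDISON}, {"born": "1847"}),
    Topic({"T. A. Edison"}, {EDISON}, {"invented": "phonograph"}),
    Topic({"Nikola Tesla"}, {TESLA}),
]

print("key/value entries for Edison:", len(kv_store))  # 2, unrelated to each other
for t in merge_topics(topics):
    print(sorted(t.names), t.properties)
# ['T. A. Edison', 'Thomas Edison'] {'born': '1847', 'invented': 'phonograph'}
# ['Nikola Tesla'] {}
```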


I need to cover issues like the differences between topic maps and key/value stores more fully.

Would you be interested in longer pieces that could eventually form a book on topic maps?

Perhaps even by subscription?

