Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 15, 2012

People and Process > Prescription and Technology

Filed under: Project Management,Semantic Diversity,Semantics,Software — Patrick Durusau @ 3:55 pm

Factors that affect software systems development project outcomes: A survey of research by Laurie McLeod and Stephen G. MacDonell. ACM Computing Surveys (CSUR), Volume 43, Issue 4, October 2011, Article No. 24, DOI: 10.1145/1978802.1978803.

Abstract:

Determining the factors that have an influence on software systems development and deployment project outcomes has been the focus of extensive and ongoing research for more than 30 years. We provide here a survey of the research literature that has addressed this topic in the period 1996–2006, with a particular focus on empirical analyses. On the basis of this survey we present a new classification framework that represents an abstracted and synthesized view of the types of factors that have been asserted as influencing project outcomes.

As with most survey work, particularly work that summarizes 177 papers, this is a long article, some fifty-six pages.

Let me try to tempt you into reading it by quoting from Angelica de Antonio’s review of it (in Computing Reviews, Oct. 2012):

An interesting discussion about the very concept of project outcome precedes the survey of factors, and an even more interesting discussion follows it. The authors stress the importance of institutional context in which the development project takes place (an aspect almost neglected in early research) and the increasing evidence that people and process have a greater effect on project outcomes than technology. A final reflection on why projects still continue to fail—even if we seem to know the factors that lead to success—raises a question on the utility of prescriptive factor-based research and leads to considerations that could inspire future research. (emphasis added)

Before you run off to the library or download a copy of the survey, two thoughts to keep in mind:

First, if “people and process” are more important than technology, where should we place the emphasis in projects involving semantics?

Second, if “prescription” can’t cure project failure, what are its chances with semantic diversity?

Thoughts?

Requirements Engineering (3rd ed.)

Filed under: Project Management,Requirements — Patrick Durusau @ 3:23 pm

Requirements Engineering (3rd ed.) by Elizabeth Hull, Ken Jackson and Jeremy Dick. Springer, 3rd ed., 2011, XVIII, 207 p., 131 illus., ISBN 978-1-84996-404-3.

From the webpage:

Using the latest research and driven by practical experience from industry, the third edition of this popular book provides useful information to practitioners on how to write and structure requirements.

  • Explains the importance of Systems Engineering and the creation of effective solutions to problems
  • Describes the underlying representations used in system modelling and introduces the UML2
  • Considers the relationship between requirements and modelling
  • Covers a generic multi-layer requirements process
  • Discusses the key elements of effective requirements management
  • Explains the important concept of rich traceability

In this third edition the authors have updated the overview of DOORS to include the changes featured in version 9.2. An expanded description of Product Family Management and a more explicit definition of Requirements Engineering are also included. Requirements Engineering is written for those who want to develop their knowledge of requirements engineering, whether practitioners or students.

I saw a review of this work in the October 2012 issue of Computing Reviews, where Diego Merani remarks:

The philosopher Seneca once said: “There is no fair wind for one who knows not whither he is bound.” This sentence encapsulates the essence of the book: the most common reasons projects fail involve incomplete requirements, poor planning, and the incorrect estimation of resources, risks, and challenges.

Requirements, and the consequences of their absence, ring true across software and other projects, including the authoring of topic maps.

Requirements: Don’t leave home without them!

Can information be beautiful when information doesn’t exist?

Filed under: Graphics,Infographics,Visualization — Patrick Durusau @ 2:54 pm

Can information be beautiful when information doesn’t exist? by Kaiser Fung.

From the post:

Reader Steve S. sent in this article that displays nominations for the “Information is Beautiful” award (link). I see “beauty” in many of these charts but no “information”. Several of these charts have appeared on our blog before.

Kaiser comments on a number of the graphics that I pointed to in: Information is Beautiful Awards – The Results Are In!

Kaiser is far better qualified than I am to comment on the suitability of the graphics chosen.

I am less confident in his ability to judge the information contained by a graphic.

The information content of a graphic, like its semantics, doesn’t exist separate and apart from the reader/viewer.

If you think it does, do you have an example of information or semantics in the absence of a reader/viewer?

Do read Kaiser’s comments to get a different take on some of the graphics.

Exploring Splunk: Search Processing Language (SPL) Primer and Cookbook

Filed under: Data Analysis,Searching,Splunk — Patrick Durusau @ 2:35 pm

Exploring Splunk: Search Processing Language (SPL) Primer and Cookbook by David Carraso.

From the webpage:

Splunk is probably the single most powerful tool for searching and exploring data you will ever encounter. Exploring Splunk provides an introduction to Splunk — a basic understanding of Splunk’s most important parts, combined with solutions to real-world problems.

Part I: Exploring Splunk

  • Chapter 1 tells you what Splunk is and how it can help you.
  • Chapter 2 discusses how to download Splunk and get started.
  • Chapter 3 discusses the search user interface and searching with Splunk.
  • Chapter 4 covers the most commonly used search commands.
  • Chapter 5 explains how to visualize and enrich your data with knowledge.

Part II: Solution Recipes

  • Chapter 6 covers the most common monitoring and alerting solutions.
  • Chapter 7 covers the most common transaction solutions.
  • Chapter 8 covers the most common lookup table solutions.

My Transaction Searching: Unifying Field Names post is based on an excerpt from this book.

You can download the book in ePub, pdf or Kindle versions or order a hardcopy.

Documentation that captures the interest of a reader.

Not documentation that warns readers the software is going to be painful, even if beneficial in the long term.

Most projects could benefit from using “Exploring Splunk” as a model for introductory documentation.

Transaction Searching: Unifying Field Names

Filed under: Merging,Splunk — Patrick Durusau @ 2:14 pm

Transaction Searching: Unifying Field Names posted by David Carraso.

From the post:

Problem

You need to build transactions from multiple data sources that use different field names for the same identifier.

Solution

Typically, you can join transactions with common fields like:

... | transaction username

But when the username identifier is called different names (login, name, user, owner, and so on) in different data sources, you need to normalize the field names.

If sourcetype A only contains field_A and sourcetype B only contains field_B, create a new field called field_Z which is either field_A or field_B, depending on which is present in an event. You can then build the transaction based on the value of field_Z.

sourcetype=A OR sourcetype=B
| eval field_Z = coalesce(field_A, field_B)
| transaction field_Z

Looks a lot like a topic map merging operation, doesn’t it?

But “looks a lot like” doesn’t mean it is “the same as” a topic map merging operation.

How would you say it is different?

While the outcome may be the same as a merging operation (which is legend-defined), I would say that I don’t know how we got from A or B to Z.

That is, next month or six months from now, or even two years down the road, I have C and I want to modify this transaction.

Question: Can I safely modify this transaction to add C?

I suspect the answer is:

“We don’t know. Have to go back to confirm what A and B (as well as C) mean and get back to you on that question.”

For a toy example that seems like overkill, but what if you have thousands of columns spread over hundreds of instances of active data systems?

Still feel confident about offering an answer without researching it?

Topic map based merging could give you that confidence.
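
As a rough sketch of the difference (Python, with entirely hypothetical field names and basis notes; neither Splunk nor any topic map engine records it exactly this way), a legend keeps not just the mapping but why the mapping holds, so the question about C has a recorded answer:

legend = {
    "field_A": {"maps_to": "field_Z",
                "basis": "sourcetype A docs: field_A is the login name"},
    "field_B": {"maps_to": "field_Z",
                "basis": "confirmed with team B: field_B is the login name"},
}

def add_source_field(field, basis=None):
    # Admit a new source field only once its basis is documented.
    if not basis:
        return "Unknown -- no documented basis for %s yet." % field
    legend[field] = {"maps_to": "field_Z", "basis": basis}
    return "Added %s; later readers can see why it maps to field_Z." % field

Two years down the road, add_source_field("field_C") answers “unknown” until someone supplies the basis, which is exactly the research step the coalesce one-liner hides.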

Even if, like Scotty, you say two weeks and have the answer later that day, letting it age a bit before delivering it ahead of schedule. 😉

Using oDesk for microtasks [Data Semantics – A Permanent Wait?]

Filed under: Crowd Sourcing,oDesk — Patrick Durusau @ 11:01 am

Using oDesk for microtasks by Panos Ipeirotis.

From the post:

Quite a few people keep asking me about Mechanical Turk. Truth be told, I have not used MTurk for my own work for quite some time. Instead I use oDesk to get workers for my tasks, and, increasingly, for my microtasks as well.

When I mention that people can use oDesk for micro-tasks, people get often surprised: “oDesk cannot be used through an API, it is designed for human interaction, right?” Oh well, yes and no. Yes, most jobs require some form of interviewing, but there are certainly jobs where you do not need to manually interview a worker before engaging them. In fact, with most crowdsourcing jobs having both the training and the evaluation component built in the working process, the manual interview is often not needed.

For such crowdsourcing-style jobs, you can use the oDesk API to automate the hiring of workers to work on your tasks. You can find the API at http://developers.odesk.com/w/page/12364003/FrontPage (Saying that the API page is, ahem, badly designed, is an understatement. Nevertheless, it is possible to figure out how to use it, relatively quickly, so let’s move on.)

Panos promises future posts with the results of crowd-sourcing experiments with oDesk.

Looking forward to it because waiting for owners of data to disclose semantics looks like a long wait.

Perhaps a permanent wait.

And why not?

If the owners of data “know” the semantics of their data, what advantage do they get from telling you? What is their benefit?

If you guessed “none,” go to the head of the class.

We can either wait for crumbs of semantics to drop off the table or we can set up our own table to produce semantics.

Which one sounds quicker to you?

item similarity by bipartite graph dispersion

Filed under: Graphs,Similarity — Patrick Durusau @ 10:39 am

item similarity by bipartite graph dispersion by Mat Kelcey.

From the post:

the basis of most recommendation systems is the ability to rate similarity between items. there are lots of different ways to do this.

one model is based the idea of an interest graph where the nodes of the graph are users and items and the edges of the graph represent an interest, whatever that might mean for the domain.

if we only allow edges between users and items the graph is bipartite.
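
One of the simplest measures over such a bipartite graph (a sketch only, not Mat’s dispersion measure; the (user, item) pairs are made up) scores two items by the overlap of the users interested in them:

def item_similarity(edges, item_a, item_b):
    # Jaccard overlap of the user neighborhoods of two items in a
    # bipartite interest graph given as (user, item) edge pairs.
    users_a = {user for user, item in edges if item == item_a}
    users_b = {user for user, item in edges if item == item_b}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / len(users_a | users_b)

# item_similarity([("u1", "x"), ("u1", "y"), ("u2", "x")], "x", "y") -> 0.5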

As Mat says, there are lots of ways to judge similarity.

How “similar” two subject representatives need be to represent the same subject is also wide open.

For reliable interchange, however, it is best that you declare (disclose) your similarity measure and what happens when it is met.

The Topic Maps Data Model declares similarity based on matching IRIs within sets of IRIs and prescribes what follows when matches are found, if you need an example of disclosure.
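
A minimal sketch of such a disclosure in Python (this is only the identifier-matching rule; the TMDM also covers subject locators, item identifiers and what merging does to names and occurrences):

def same_subject(item_a, item_b):
    # Disclosed measure: two topic items represent the same subject when
    # their sets of subject identifiers (IRIs) share at least one member.
    return bool(item_a["subject_identifiers"] & item_b["subject_identifiers"])

What happens when the measure is met is equally disclosed: the matching items are merged into one.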

Real-time Big Data Analytics Engine – Twitter’s Storm

Filed under: BigData,Hadoop,Storm — Patrick Durusau @ 8:44 am

Real-time Big Data Analytics Engine – Twitter’s Storm by Istvan Szegedi.

From the post:

Hadoop is a batch-oriented big data solution at its heart and leaves gaps in ad-hoc and real-time data processing at massive scale so some people have already started counting its days as we know it now. As one of the alternatives, we have already seen Google BigQuery to support ad-hoc analytics and this time the post is about Twitter’s Storm real-time computation engine which aims to provide solution in the real-time data analytics world. Storm was originally developed by BackType and running now under Twitter’s name, after BackType has been acquired by them. The need for having a dedicated real-time analytics solution was explained by Nathan Marz as follows: “There’s no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing…. The lack of a “Hadoop of realtime” has become the biggest hole in the data processing ecosystem. Storm fills that hole.”

Introduction to Storm, including a walk through the word count topology example that comes with the current download.

A useful addition to your toolkit!

Information needs of public health practitioners: a review of the literature [Indexing Needs]

Filed under: Biomedical,Indexing,Medical Informatics,Searching — Patrick Durusau @ 4:37 am

Information needs of public health practitioners: a review of the literature by Jennifer Ford and Helena Korjonen.

Abstract:

Objective

To review published literature covering the information needs of public health practitioners and papers highlighting gaps and potential solutions in order to summarise what is already known about this population and models tested to support them.

Methods

The search strategy included bibliographic databases LISTA, LISA, PubMed and Web of Knowledge. The results of this literature review were used to create two tables displaying published literature.

Findings

The literature highlighted that some research has taken place into different public health subgroups with consistent findings. Gaps in information provision have also been identified by looking at the information services provided.

Conclusion

There is a need for further research into information needs in subgroups of public health practitioners as this group is diverse, has different needs and needs varying information. Models of informatics that can support public health must be developed and published so that the public health information community can share experiences and solutions and begin to build an evidence-base to produce superior information systems for the goal of a healthier society.

One of the key points for topic map advocates:

The need for improved indexing of public health information was highlighted by Alpi, discussing the role of expert searching in public health information retrieval.2 Existing taxonomies such as the MeSH system used by PubMed/Medline are perceived as inadequate for indexing the breadth of public health literature and are seen to be too clinically focussed.2 There is also concern at the lack of systematic indexing of grey literature.2 Given that more than one study has highlighted the high level of use of grey literature by public health practitioners, this lack of indexing should be of real concern to public health information specialists and practitioners. LaPelle also found that participants in her research had experienced difficulties with search terms for public health which is indicative of the current inadequacy of public health indexing.1

Other opportunities for topic maps are apparent in the literature review, but inadequate indexing should be topic maps’ bread and butter.

What you hear could depend on what your hands are doing [Interface Subtleties]

Filed under: Interface Research/Design — Patrick Durusau @ 4:17 am

What you hear could depend on what your hands are doing

Probably not ready for the front of the interface queue but something you should keep in mind.

There are subtleties of information processing that are difficult to dig out but that you can ignore only at the peril of an interface that doesn’t quite “work,” though no one can say why.

I will have to find the reference but I remember some work years ago where poor word spacing algorithms made text measurably more difficult to read, without the reader being aware of the difference.

What if you had information you would prefer readers not pursue beyond a certain point? Could altering the typography make the cognitive load so high that they would “remember” reading a section but not recall that they quit before understanding it in detail?

How would you detect such a strategy if you encountered it?

From the post:

New research links motor skills and perception, specifically as it relates to a second finding—a new understanding of what the left and right brain hemispheres “hear.” Georgetown University Medical Center researchers say these findings may eventually point to strategies to help stroke patients recover their language abilities, and to improve speech recognition in children with dyslexia.

The study, presented at Neuroscience 2012, the annual meeting of the Society for Neuroscience, is the first to match human behavior with left brain/right brain auditory processing tasks. Before this research, neuroimaging tests had hinted at differences in such processing.

“Language is processed mainly in the left hemisphere, and some have suggested that this is because the left hemisphere specializes in analyzing very rapidly changing sounds,” says the study’s senior investigator, Peter E. Turkeltaub, M.D., Ph.D., a neurologist in the Center for Brain Plasticity and Recovery. This newly created center is a joint program of Georgetown University and MedStar National Rehabilitation Network.

Turkeltaub and his team hid rapidly and slowly changing sounds in background noise and asked 24 volunteers to simply indicate whether they heard the sounds by pressing a button.

“We asked the subjects to respond to sounds hidden in background noise,” Turkeltaub explained. “Each subject was told to use their right hand to respond during the first 20 sounds, then their left hand for the next 20 sounds, then right, then left, and so on.” He says when a subject was using their right hand, they heard the rapidly changing sounds more often than when they used their left hand, and vice versa for the slowly changing sounds.

October 14, 2012

Big data cube

Filed under: BigData,Database,NoSQL — Patrick Durusau @ 7:40 pm

Big data cube by John D. Cook.

From the post:

Erik Meijer’s paper Your Mouse is a Database has an interesting illustration of “The Big Data Cube” using three axes to classify databases.

Enjoy John’s short take, then spend some time with Erik’s paper.

Some serious time with Erik’s paper.

You won’t be disappointed.

The Units Ontology: a tool for integrating units of measurement in science

Filed under: Measurement,Ontology,Science — Patrick Durusau @ 4:40 pm

The Units Ontology: a tool for integrating units of measurement in science by Georgios V. Gkoutos, Paul N. Schofield, and Robert Hoehndorf. (Database (2012) 2012: bas033, DOI: 10.1093/database/bas033)

Abstract:

Units are basic scientific tools that render meaning to numerical data. Their standardization and formalization caters for the report, exchange, process, reproducibility and integration of quantitative measurements. Ontologies are means that facilitate the integration of data and knowledge allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurements.

As the paper acknowledges, there are many measurement systems in use today.

Leaves me puzzled: what happens to data that follows some other drummer than this one?

I assume any coherent system has no difficulty integrating data written in that system.

So how does adding another coherent system assist in that integration?

Unless everyone universally moves to the new system. Unlikely, don’t you think?

“Microsoft Dynamics CRM Compatible” On the Outside?

Filed under: Marketing,Searching — Patrick Durusau @ 4:22 pm

Sonoma Partners Releases Universal Search for Microsoft Dynamics CRM

From the post:

Sonoma Partners, a leading Microsoft Dynamics CRM consultancy with expertise in enterprise mobility, announced today the release of Universal Search, a free add-on tool for Dynamics CRM that provides enhanced search functionality and increased productivity.

Universal Search allows Microsoft Dynamics CRM 2011 users to view search results from multiple entities by executing a single search. Without this tool, Microsoft Dynamics CRM users have been limited to searching from within one entity at a time. With Universal Search, fields can be returned across multiple entities, including accounts, leads and opportunities.

With Universal Search, Microsoft Dynamics CRM administrators can configure which entities are searched, which attributes to search by and what information is displayed. The tool is conveniently located in the ribbon of Microsoft Dynamics 2011 and is available to users at any time within the system.

“We developed Universal Search to create a convenient way for Microsoft Dynamics CRM users to greatly streamline the experience of searching for records, even if they don’t know what type of record it is,” said Mike Snyder, principal of Sonoma Partners. “With this free add-on, we hope to enable Dynamics CRM users to utilize their on-premise or online deployment to the fullest.”

Universal Search from Sonoma Partners is available for Microsoft Dynamics 2011 on-premise and online deployments….

Rather amazing that something so basic as searching across entities should come as an add-on, even a free one.

Should give you an idea of the gap between common search capability and what you can provide using topic maps.

And in cases like this one, you only have to improve an existing product, one that is being marketed by someone else.

I first saw this at Beyond Search.

The Graph Database: One Option for Exploring Big Data

Filed under: BigData,Graphs — Patrick Durusau @ 3:44 pm

The Graph Database: One Option for Exploring Big Data by Loraine Lawson.

While it is certainly true that graph databases can explore big data, I’m not sure that was the question.

From the post:

“The majority of companies are on the sidelines because they think they can’t readily access the data they have, they don’t have in house tools or talent to analyze it and don’t have the ability to put the data to use anyway,” writes Matthew Crowl in a recent CAN blog post.

One answer may be the graph database, which uses nodes, properties and edges rather than traditional indexing to store data. In other words, it allows you to create a graph of connections between people, objects and data.

The first paragraph makes it sound like a lack of tools, talent and process are preventing the use of big data.

If those are in fact the problems, then a graph (or any other) database isn’t going to be the answer.

Or at least not a useful answer from the client’s perspective.

However attractive it may be from a vendor’s perspective.

Tech That Protects The President, Part 1: Data Mining

Filed under: Data Mining,Natural Language Processing,Semantics — Patrick Durusau @ 3:41 pm

Tech That Protects The President, Part 1: Data Mining by Alex Popescu.

From the post:

President Obama’s appearance at the Democratic National Convention in September took place amid a rat’s nest of perils. But the local Charlotte, North Carolina, police weren’t entirely on their own. They were aided by a sophisticated data mining system that helped them identify threats and react to them quickly. (Part 1 of a 3-part series about the technology behind presidential security.)

The Charlotte-Mecklenberg police used a software from IxReveal to monitor the Internet for associations between Obama, the DNC, and potential threats. The company’s program, known as uReveal, combs news articles, status updates, blog posts, discussion forum comments. But it doesn’t simply search for keywords. It works on concepts defined by the user and uses natural language processing to analyze plain English based on meaning and context, taking into account slang and sentiment. If it detects something amiss, the system sends real-time alerts.

“We are able to read and alert almost as fast as [information] comes on the Web, as opposed to other systems where it takes hours,” said Bickford, vice president of operations of IxReveal.

In the past, this kind of task would have required large numbers of people searching and then reading huge volumes of information and manually highlighting relevant references. “Normally you have to take information like an email and shove it in to a database,” Bickford explained. “Someone has to physically read it or do a keyword search.

uReveal, on the other hand, lets machines do the reading, tracking, and analysis. “If you apply our patented technology and natural language processing capability, you can actually monitor that information for specific keywords and phrases based on meaning and context,” he says. The software can differentiate between a Volkswagen bug, a computer bug and an insect bug, Bickford explained – or, more to the point, between a reference to fire from a gun barrel and on to fire in a fireplace.

Bickford says the days of people slaving over sifting through piles of data, or ETL (extract, transform and load) data processing capabilities are over. “It’s just not supportable.”

I understand product promotion but do you think potential assassins are publishing letters to the editor, blogging or tweeting about their plans or operational details?

Granted, contract killers in Georgia are caught when someone tries to hire an undercover police officer as a “hit” man.

Does that expectation of dumbness apply in other cases as well?

Or, is searching large amounts of data like the drunk looking for his keys under the street light?

A case of “the light is better here?”

October 13, 2012

Data visualisation: how Alberto Cairo creates a functional art

Filed under: Graphics,Visualization — Patrick Durusau @ 7:13 pm

Data visualisation: how Alberto Cairo creates a functional art by Simon Rogers.

From the post:

It’s not enough for visualisations to string the correct numbers together, they should – in the words of William Morris – be beautiful and useful.

And one of the leading experts in making data beautiful is Alberto Cairo – who teaches information graphics and visualisation at the University of Miami’s School of Communication.

His latest book, The Functional Art, is a comprehensive guide not only to how to do it; but how to get it right, too. And, if you’re interested in data visualisation, you must not only read this but absorb each of the lessons he teaches so patiently.

Definitely one for the wish list at Amazon!

See Simon’s post for links and other information.

Five User Experience Lessons from Johnny Depp

Filed under: Authoring Topic Maps,Interface Research/Design,Usability,Users — Patrick Durusau @ 7:01 pm

Five User Experience Lessons from Johnny Depp by Steve Tengler.

Print this post out and pencil in your guesses for the Johnny Depp movies that illustrate these lessons:

Lesson #1: It’s Not About the Ship You Rode In On

Lesson #2: Good UXers Plan Ahead to Assimilate External Content

Lesson #3: Flexibility on Size Helps Win the Battle

Lesson #4: Design for What Your Customer Wants … Not for What You Want

Lesson #5: Tremendous Flexibility Can Lead to User Satisfaction

Then pass a clean copy to the next cubicle and see how they do.

Funny how Lesson #4 keeps coming up.

I had an Old Testament professor who said laws against idol worship were evidence people were engaged in idol worship. Laws rarely prohibit what isn’t a problem.

I wonder if #4 keeps coming up because designers keep designing for themselves?

What do you think?

If that is true, then it must be true that authors write for themselves. (Ouch!)

So how do authors discover (or do they) how to write for others?

We know the ones that succeed in commercial trade by their sales. But that is after the fact and not explanatory.

Important question if you are authoring curated content with a topic map for sale.

Modeling Question: What Happens When Dots Don’t Connect?

Filed under: Associations,Modeling — Patrick Durusau @ 6:35 pm

I am working with a data set and have run across a different question than the vagueness/possibility of relationships. (See Topic Map Modeling of Sequestration Data (Help Pls!) if you want to help with that one.)

What if when analyzing the data I determine there is no association between two subjects?

I am assuming that if there is no association, there are no roles at play.

How do I record the absence of the association?

I don’t want to trust the next user will “notice” the absence of the association.

A couple of use cases come to mind:

I suspect there is an association but have no proof. The cheating husband/wife scenario. (I suppose there I would know the “roles.”)

What about corporations or large organizations? Allegations are made but with no connection to identifiable actors.

Corporations act only through agents. A charge that names the responsible agents is different from a general allegation.

How do I distinguish those? Or make it clear no agent has been named?

Wouldn’t that be interesting?

We read now: XYZ corporation pleads guilty to government contract fraud.

We could read: A, B, and C of XYZ corporation and L, M, and N, government contract officers, managed the XYZ government contract. XYZ pleaded guilty to contract fraud and was fined $.

We could keep better score of private and public employees who keep turning up in contract fraud cases.

One test for transparency is accountability.

No accountability, no transparency.

Standards and Infrastructure for Innovation Data Exchange [#6000]

Filed under: Data Integration,Data Silos,Standards — Patrick Durusau @ 4:14 pm

Standards and Infrastructure for Innovation Data Exchange by Laurel L. Haak, David Baker, Donna K. Ginther, Gregg J. Gordon, Matthew A. Probus, Nirmala Kannankutty and Bruce A. Weinberg. (Science 12 October 2012: Vol. 338 no. 6104 pp. 196-197 DOI: 10.1126/science.1221840)

Appropriate that post number six thousand (6000) should report an article on data exchange standards.

But the article seems to be at war with itself.

Consider:

There is no single database solution. Data sets are too large, confidentiality issues will limit access, and parties with proprietary components are unlikely to participate in a single-provider solution. Security and licensing require flexible access. Users must be able to attach and integrate new information.

Unified standards for exchanging data could enable a Web-based distributed network, combining local and cloud storage and providing public-access data and tools, private workspace “sandboxes,” and versions of data to support parallel analysis. This infrastructure will likely concentrate existing resources, attract new ones, and maximize benefits from coordination and interoperability while minimizing resource drain and top-down control.

As quickly as the authors say “[t]here is no single database solution,” they take a deep breath and outline the case for a uniform data-sharing structure.

If there is no “single database solution,” it stands to reason there is no single infrastructure for sharing data. The same diversity that blocks the single database, impedes the single exchange infrastructure.

We need standards, but rather than unending quests for enlightened permanence, we should focus on temporary standards, to be replaced by other temporary standards, when circumstances or needs change.

A narrow range of adoption required to demonstrate benefits from temporary standards is a plus as well. A standard enabling data integration between departments at a hospital, one department at a time, will show benefits (if there are any to be had) far sooner than a standard that requires universal adoption prior to any benefits appearing.

The Topic Maps Data Model (TMDM) is an example of a narrow range standard.

While the TMDM can be extended, in its original form, subjects are reliably identified using IRIs (along with data about those subjects). All that is required is that one or more parties use IRIs as identifiers, and not even the same IRIs.

The TMDM framework enables one or more parties to use their own IRIs and data practices, without prior agreement, and still have reliable merging of their data.
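
A toy sketch of that in Python (the IRIs are invented, and this ignores subject locators, item identifiers and the rest of the TMDM machinery): each party keeps its own local IRIs and naming conventions, shares one published identifier, and the two records merge with no other agreement.

def merge(topics):
    # Merge topics whose sets of subject identifiers (IRIs) intersect.
    # A single pass is enough for this toy example.
    merged = []
    for topic in topics:
        target = next((m for m in merged
                       if m["identifiers"] & topic["identifiers"]), None)
        if target is None:
            merged.append({"identifiers": set(topic["identifiers"]),
                           "names": set(topic["names"])})
        else:
            target["identifiers"] |= topic["identifiers"]
            target["names"] |= topic["names"]
    return merged

party_a = {"identifiers": {"http://a.example/people/42",
                           "http://psi.example/person/pdurusau"},
           "names": {"Durusau, Patrick"}}
party_b = {"identifiers": {"http://b.example/staff/7",
                           "http://psi.example/person/pdurusau"},
           "names": {"Patrick Durusau"}}

print(merge([party_a, party_b]))  # one topic, both names, all three IRIs

Each party’s local IRIs survive the merge, so either side can still find its own records afterward.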

I think it is the without prior agreement part that distinguishes the Topic Maps Data Model from other data interchange standards.

We can skip all the tiresome discussion about who has the better name/terminology/taxonomy/ontology for subject X and get down to data interchange.

Data interchange is interesting, but what we find following data interchange is even more so.

More on that to follow, sooner rather than later, in the next six thousand posts.

(See the Donations link. Your encouragement would be greatly appreciated.)

Bio4j 0.8 is here!

Filed under: Bio4j,Bioinformatics,Biomedical,Genome — Patrick Durusau @ 1:57 pm

Bio4j 0.8 is here! by Pablo Pareja Tobes.

You will find “5.488.000 new proteins and 3.233.000 genes” and other improvements!

Whether you are interested in graph databases (Neo4j), bioinformatics or both, this is welcome news!

October 12, 2012

Leon Panetta Plays Chicken Little

Filed under: Government,Government Data,Security,Telecommunications — Patrick Durusau @ 4:53 pm

If you haven’t seen DOD: Hackers Breached U.S. Critical Infrastructure Control Systems, or similar coverage of Leon Panetta’s portrayal of Chicken Little (aka “Henny Penny”), you may find this interesting.

The InformationWeek Government article says:

Warning of more destructive attacks that could cause loss of life if successful, Panetta urged Congress to pass comprehensive legislation in the vein of the Cybersecurity Act of 2012, a bill co-sponsored by Sens. Joe Lieberman, I-Conn., Susan Collins, R-Maine, Jay Rockefeller, D-W.Va., and Dianne Feinstein, D-Calif., that failed to pass in its first attempt earlier this year by losing a cloture vote in the Senate.

“Congress must act and it must act now,” he said. “This bill is victim to legislative and political gridlock like so much else in Washington. That frankly is unacceptable and it should be unacceptable not just to me, but to you and to anyone concerned with safeguarding our national security.”

Specifically, Panetta called for legislation that would make it easier for companies to share “specific threat information without the prospect of lawsuits” but while still respecting civil liberties. He also said that there must be “baseline standards” co-developed by the public and private sector to ensure the cybersecurity of critical infrastructure IT systems. The Cybersecurity Act of 2012 contained provisions that would arguably fit the bill on both of those accounts.

While Panetta said that “there is no substitute” for legislation, he noted that the Obama administration has been working on an executive order on cybersecurity as an end-around on Congress. “We need to move as far as we can” even in the face of Congressional inaction, he said. “We have no choice because the threat that we face is already here.”

I particularly liked the lines:

“…That frankly is unacceptable and it should be unacceptable not just to me, but to you and to anyone concerned with safeguarding our national security.”

“We have no choice because the threat that we face is already here.”

Leon is old enough to remember (too old perhaps?) the Cold War when we had the Russians, the Chinese and others to defend ourselves against. Without the Cybersecurity Act of 2012.

Oh, you don’t know what the Cybersecurity Act of 2012 says do you?

The part Leon is lusting after would make private entities exempt from:

[Sec 701]….chapter 119, 121, or 206 of title 18, United States Code, the Foreign Intelligence Surveillance Act of 1978 (50 U.S.C. 1801 et seq.), and the Communications Act of 1934 (47 U.S.C. 151 et seq.), ..

I’m sorry, that still doesn’t help does it?

Try this:

[Title 18, United States Code] CHAPTER 119—WIRE AND ELECTRONIC COMMUNICATIONS INTERCEPTION AND INTERCEPTION OF ORAL COMMUNICATIONS (§§ 2510–2522)

[Title 18, United States Code] CHAPTER 121—STORED WIRE AND ELECTRONIC COMMUNICATIONS AND TRANSACTIONAL RECORDS ACCESS (§§ 2701–2712)

[Title 18, United States Code] CHAPTER 206—PEN REGISTERS AND TRAP AND TRACE DEVICES (§§ 3121–3127)

[Title 47, United States Code, start here and following]CHAPTER 5—WIRE OR RADIO COMMUNICATION (§§ 151–621)

[Title 50, United States Code, start here and following]CHAPTER 36—FOREIGN INTELLIGENCE SURVEILLANCE (§§ 1801–1885c)

Just reading the section titles should give you the idea:

The Cybersecurity Act of 2012 exempts all private entities from criminal and civil penalties for monitoring, capturing and reporting any communication by anyone. Well, except for whatever the government is doing, that stays secret.

During the Cold War, facing nuclear armageddon, we had the FBI, CIA and others, subject to the laws you read above, to protect us from our enemies. And we did just fine.

Now we are facing a group of ragamuffins and Leon wants to re-invent the Stasi. Put us all to spying and reporting on each other. Free of civil and criminal liability.

A topic map could connect half-truths, lies and the bed wetters who support this sort of legislation together. (They aren’t going to go away.)

Interested?

PS: A personal note for Leon Panetta:

Leon, before you repeat any more idle latrine gossip, talk to some of the more competent career security people at the Pentagon. They will tell you about things like separation of secure from unsecure networks. Not allowing recordable magnetic media (including Lady Gaga CDs) access to secure networks, and a host of other routine security measures already in place.

Computer security didn’t just become an issue since 9/11. Every sane installation has been aware of computer security issues for decades.

Two kinds of people are frantic about computer security now:

  1. Decision makers who don’t understand computer security.
  2. People who want to sell the government computer security services.

Our military computer experts can fashion plans within the constitution and legal system to deal with what is a routine security issue.

You just have to ask them.

DNA Big Data Research Stuns Stephen Colbert

Filed under: Humor — Patrick Durusau @ 3:37 pm

DNA Big Data Research Stuns Stephen Colbert

Stephen is given 20 million copies of George Church’s Regenesis. His quick wit appears to retreat.

Watch the video at the link. What do you think?

Is this a good test for technology?

That it can stun Stephen Colbert into silence?

That may be too high a bar for topic maps. 😉

(Thought you might like something amusing. My next post is fairly grim.)

Lucene 4.0 and Solr 4.0 Released!

Filed under: Lucene,Solr — Patrick Durusau @ 3:29 pm

Lucene 4.0 and Solr 4.0 Released!

I am omitting the lengthy list of new features, both since the beta and since the last version.

There will be more complete reviews, and a list of changes is in the distribution. No need for me to repeat it here.

Grab copies of both and enjoy the weekend!

Best Practices for Scientific Computing

Filed under: Computer Science,Programming — Patrick Durusau @ 3:22 pm

Best Practices for Scientific Computing by D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, Katy Huff, Ian Mitchell, Mark Plumbley, Ben Waugh, Ethan P. White, Greg Wilson, and Paul Wilson.

Abstract:

Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.

If programming, or should I say good practices at programming, isn’t second nature to you, you will find something to learn/relearn in this paper.

I first saw this at Simply Statistics.

PathNet: A tool for pathway analysis using topological information

Filed under: Bioinformatics,Biomedical,Genome,Graphs,Networks — Patrick Durusau @ 3:12 pm

PathNet: A tool for pathway analysis using topological information by Bhaskar Dutta, Anders Wallqvist and Jaques Reifman. (Source Code for Biology and Medicine 2012, 7:10 doi:10.1186/1751-0473-7-10)

Abstract:

Background

Identification of canonical pathways through enrichment of differentially expressed genes in a given pathway is a widely used method for interpreting gene lists generated from high-throughput experimental studies. However, most algorithms treat pathways as sets of genes, disregarding any inter- and intra-pathway connectivity information, and do not provide insights beyond identifying lists of pathways.

Results

We developed an algorithm (PathNet) that utilizes the connectivity information in canonical pathway descriptions to help identify study-relevant pathways and characterize non-obvious dependencies and connections among pathways using gene expression data. PathNet considers both the differential expression of genes and their pathway neighbors to strengthen the evidence that a pathway is implicated in the biological conditions characterizing the experiment. As an adjunct to this analysis, PathNet uses the connectivity of the differentially expressed genes among all pathways to score pathway contextual associations and statistically identify biological relations among pathways. In this study, we used PathNet to identify biologically relevant results in two Alzheimers disease microarray datasets, and compared its performance with existing methods. Importantly, PathNet identified deregulation of the ubiquitin-mediated proteolysis pathway as an important component in Alzheimers disease progression, despite the absence of this pathway in the standard enrichment analyses.

Conclusions

PathNet is a novel method for identifying enrichment and association between canonical pathways in the context of gene expression data. It takes into account topological information present in pathways to reveal biological information. PathNet is available as an R workspace image from http://www.bhsai.org/downloads/pathnet/.

Important work for genomics but also a reminder that a list of paths is just that, a list of paths.

The value-add and creative aspect of data analysis is in the scoring of those paths in order to wring more information from them.

How is it for you? Just lists of paths or something a bit more clever?

Mirror, Mirror, on the Screen

Filed under: Interface Research/Design,Usability — Patrick Durusau @ 3:03 pm

Mirror, Mirror, on the Screen by David Moskovic.

From the post:

According to Don Norman (author of Emotional Design: Why We Love (or Hate) Everyday Things) there are three levels of cognitive processing. The visceral level is the most immediate and is the one marketing departments look to when trying to elicit trigger responses and be persuasive. Behavioral processing is the middle level, and is the concern of traditional usability or human factors practitioners designing for ergonomics and ease of use. The third level is reflective processing.

Reflective processing is when our desires for uniqueness and cultural or aesthetic sophistication influence our preferences. Simply put, it is about seeing ourselves positively reflected in the products we use. What that means to individuals and their own self-images is highly subjective (see the picture at upper-left), however—and again according to Norman—designing for reflection is the most powerful way to build long-term product/user relationships.

Unfortunately, reflective processing is often dismissed by interaction designers as a style question they shouldn’t concern themselves with. To be fair, applying superficial style has too often been used in ways that cause major usability issues—a fairly common occurrence with brand websites for consumer packaged goods. One that comes to mind (although perhaps not the most egregious) is Coors.com, with its wood paneling background image where the navigation gets lost. It is superficial style with no reflective trade-off because not only is its usability quite poor, it is also completely product-centric rather than customer-centric. On the flip side, and what seems to be a recurring problem, is that many very usable digital products and services fail to generate the levels of adoption, engagement, and retention their creators were after because they lack that certain je ne sais quoi that connects with users at a deeper level.

The point of this article is to make the case for reflective processing design in a way that does not detract from usability’s chief concerns. When reflection-based design goes deeper than superficial stylization tricks and taps into our reflected sense of self, products become much more rewarding and life-enhancing, and have a higher potential for a more engaged and longer-lasting customer relationship.

Equally important, and deserving of attention from a UX and user-centered design perspective, is the fact that products that successfully address the reflective level are almost unanimously perceived as more intuitive and easier to use. Norman famously makes that case by pointing out how the original iPod click-wheel navigation was perhaps not the most usable solution but was perceived as the easiest because of Apple’s amazing instinct for reflection-based design.

Questions:

1. Does your application connect with your customers at a deeper level?

Or

2. Does your application connect with your developers at a deeper level?

If #2 is yes, hope your developers buy enough copies to keep the company afloat.

Otherwise, work to make the answer to #1 yes.

See David’s post for suggestions.

43 Big Data experts to follow on Twitter

Filed under: BigData,Tweets — Patrick Durusau @ 2:36 pm

43 Big Data experts to follow on Twitter by David Smith.

David points to a list of forty-three big data experts to follow on Twitter.

Who do you follow on Twitter for:

  • ElasticSearch
  • Graphs
  • Indexing
  • Lucene
  • Neo4j
  • Search Engines
  • Semantics
  • Solr

?

What else should I have listed? Suggest experts for those as well. Thanks!

Lacking Data Integration, Cloud Computing Suffers

Filed under: Cloud Computing,Data Integration — Patrick Durusau @ 2:11 pm

Lacking Data Integration, Cloud Computing Suffers by David Linthicum.

From the post:

The findings of the Cloud Market Maturity study, a survey conducted jointly by Cloud Security Alliance (CSA) and ISACA, show that government regulations, international data privacy, and integration with internal systems dominate the top 10 areas where trust in the cloud is at its lowest.

The Cloud Market Maturity study examines the maturity of cloud computing and helps identify market changes. In addition, the report provides detailed information on the adoption of cloud services at all levels within global companies, including senior executives.

Study results reveal that cloud users from 50 countries expressed the lowest level of confidence in the following (ranked from most reliable to least reliable):

  • Government regulations keeping pace with the market
  • Exit strategies
  • International data privacy
  • Legal issues
  • Contract lock in
  • Data ownership and custodian responsibilities
  • Longevity of suppliers
  • Integration of cloud with internal systems
  • Credibility of suppliers
  • Testing and assurance

Questions:

As “big data” gets “bigger,” will cloud integration issues get better or worse?

Do you prefer disposable data integration or reusable data integration? (For bonus points, why?)

Ten Reasons Users Won’t Use Your Topic Map

Filed under: Interface Research/Design,Marketing,Usability — Patrick Durusau @ 1:28 pm

Ian Nicholson’s analysis of why business intelligence applications aren’t used equally applies to topic maps and topic map applications.

From: Ten Reasons Your Users Won’t Use Your Business Intelligence Solution.

  • Project Stalled or Went Over Deadline/Budget
  • The Numbers Cannot Be Trusted
  • Reports Take Too Long To Run
  • Requirements Have Changed Since The Project Began
  • The World Has Moved On After Delivery
  • Inadequate Training
  • Delivery Did Not Meet User Expectations
  • Your BI Solution is Not Available to Everyone
  • Reports Too Static – No Self-Serve Reporting
  • Users Simply Won’t Give Up Excel or Whatever It Is They Use

Ian also offers possible solutions to these issues.

Questions:

Do any of the issues sound familiar?

Do the solutions sound viable in a topic maps context?

Estimating Subject Sameness?

Filed under: Graphs,Networks,Similarity — Patrick Durusau @ 5:26 am

If you think about it, graph isomorphism is a type of subject sameness problem.

Sadly, graph isomorphism remains a research problem, so it is not immediately applicable to problems you encounter with topic maps.

However, Alex Smola in The Weisfeiler-Lehman algorithm and estimation on graphs covers some “cheats” that you may find useful.

Imagine you have two graphs \(G\) and \(G’\) and you’d like to check how similar they are. If all vertices have unique attributes this is quite easy:

FOR ALL vertices \(v \in G \cup G’\) DO

  • Check that \(v \in G\) and that \(v \in G’\)
  • Check that the neighbors of v are the same in \(G\) and \(G’\)

This algorithm can be carried out in linear time in the size of the graph, alas many graphs do not have vertex attributes, let alone unique vertex attributes. In fact, graph isomorphism, i.e. the task of checking whether two graphs are identical, is a hard problem (it is still an open research question how hard it really is). In this case the above algorithm cannot be used since we have no idea which vertices we should match up.

The Weisfeiler-Lehman algorithm is a mechanism for assigning fairly unique attributes efficiently. Note that it isn’t guaranteed to work, as discussed in this paper by Douglas – this would solve the graph isomorphism problem after all. The idea is to assign fingerprints to vertices and their neighborhoods repeatedly. We assume that vertices have an attribute to begin with. If they don’t then simply assign all of them the attribute 1. Each iteration proceeds as follows:
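
To make the iteration concrete, here is a minimal sketch of the usual relabeling step in Python (not Alex’s code; initial attributes default to 1, as he suggests for graphs without attributes):

import hashlib
from collections import Counter

def wl_label_multiset(adj, labels=None, iterations=3):
    # adj maps each vertex to a list of its neighbors; labels maps each
    # vertex to an initial attribute. Each pass fingerprints a vertex by
    # its own label plus the sorted multiset of its neighbors' labels.
    labels = dict(labels) if labels else {v: 1 for v in adj}
    for _ in range(iterations):
        new_labels = {}
        for v, neighbors in adj.items():
            signature = "{}|{}".format(
                labels[v], ",".join(sorted(str(labels[n]) for n in neighbors)))
            new_labels[v] = hashlib.sha1(signature.encode()).hexdigest()[:8]
        labels = new_labels
    return Counter(labels.values())

If the label multisets for \(G\) and \(G’\) differ, the graphs cannot be isomorphic; if they agree, the test says nothing either way.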

Curious if you find the negative approach, “these two graphs are not isomorphic,” as useful as a positive one (where it works), “these two graphs are isomorphic?”

Or is it sufficient to reliably know that graphs are different?
