Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 15, 2016

People NOT Technology Produce Data ROI

Filed under: BigData,Data,Data Science,Data Silos — Patrick Durusau @ 4:00 pm

Too many tools… not enough carpenters! by Nicholas Hartman.

From the webpage:

Don’t let your enterprise make the expensive mistake of thinking that buying tons of proprietary tools will solve your data analytics challenges.

tl;dr = The enterprise needs to invest in core data science skills, not proprietary tools.

Most of the world’s largest corporations are flush with data, but frequently still struggle to achieve the vast performance increases promised by the hype around so called “big data.” It’s not that the excitement around the potential of harvesting all that data was unwarranted, but rather these companies are finding that translating data into information and ultimately tangible value can be hard… really hard.

In your typical new tech-based startup the entire computing ecosystem was likely built from day one around the need to generate, store, analyze and create value from data. That ecosystem was also likely backed from day one with a team of qualified data scientists. Such ecosystems spawned a wave of new data science technologies that have since been productized into tools for sale. Backed by mind-blowingly large sums of VC cash many of these tools have set their eyes on the large enterprise market. A nice landscape of such tools was recently prepared by Matt Turck of FirstMark Capital (host of Data Driven NYC, one of the best data science meetups around).

Consumers stopped paying money for software a long time ago (they now mostly let the advertisers pay for the product). If you want to make serious money in pure software these days you have to sell to the enterprise. Large corporations still spend billions and billions every year on software and data science is one of the hottest areas in tech right now, so selling software for crunching data should be a no-brainer! Not so fast.

The problem is, the enterprise data environment is often nothing like that found within your typical 3-year-old startup. Data can be strewn across hundreds or thousands of systems that don’t talk to each other. Devices like mainframes are still common. Vast quantities of data are generated and stored within these companies, but until recently nobody ever really envisioned accessing — let alone analyzing — these archived records. Often, it’s not initially even clear how all the data generated by these systems directly relates to a large blue chip’s core business operations. It does, but a lack of in-house data scientists means that nobody is entirely even sure what data is really there or how it can be leveraged.

I would delete “proprietary” from the above because non-proprietary tools create data problems just as easily.

Thus I would re-write the second quote as:

Tools won’t replace skilled talent, and skilled talent doesn’t typically need many particular tools.

I substituted “particular” to avoid religious questions about specific non-proprietary tools.

Understanding data, recognizing where data integration is profitable and where it is a dead loss, creating tests to measure potential ROI, etc., are all tasks for a human data analyst, not for any proprietary or non-proprietary tool.

The notion that all enterprise data has some intrinsic value that can be extracted, if only it were accessible, is an article of religious faith, not business ROI.

If you want business ROI from data, start with human analysts and not the latest buzzwords in technological tools.

February 8, 2016

The Danger of Ad Hoc Data Silos – Discrediting Government Experts

Filed under: Data Science,Data Silos,Government — Patrick Durusau @ 8:48 am

This Canadian Lab Spent 20 Years Ruining Lives by Tess Owen.

From the post:

Four years ago, Yvonne Marchand lost custody of her daughter.

Even though child services found no proof that she was a negligent parent, that didn’t count for much against the overwhelmingly positive results from a hair test. The lab results said she was abusing alcohol on a regular basis and in enormous quantities.

The test results had all the trappings of credible forensic science, and were presented by a technician from the Motherisk Drug Testing Laboratory at Toronto’s Sick Kids Hospital, Canada’s foremost children’s hospital.

“I told them they were wrong, but they didn’t believe me. Nobody would listen,” Marchand recalls.

Motherisk hair test results indicated that Marchand had been downing 48 drinks a day, for 90 days. “If you do the math, I would have died drinking that much,” Marchand says. “There’s no way I could function.”

The court disagreed, and determined Marchand was unfit to have custody of her daughter.

Some parents, like Marchand, pursued additional hair tests from independent labs in a bid to fight their cases. Marchand’s second test showed up as negative. But, because the lab technician couldn’t testify as an expert witness, the second test was thrown out by the court.

Marchand says the entire process was very frustrating. She says someone should have noticed a pattern when parents repeatedly presented hair test results from independent labs which completely contradicted Motherisk results. Alarm bells should have gone off sooner.

Tess’ post and a 366-page report make it clear that Motherisk has impaired the fairness of a large number of child-protection service cases.

Child services, the courts, state representatives (the only ones who would have been aware of contradictions of Motherisk results across multiple cases) had no interest in “connecting the dots.”

Each case, with each attorney, was an ad hoc data silo that could not present the pattern necessary to challenge the systematically poor science from Motherisk.

The point is that not all data silos involve big data or nation-state-sized intelligence services. Data silos can and do regularly have a tragic impact on ordinary citizens.

Privacy would be an issue, but mechanisms need to be developed so that lawyers and other advocates can share notices of contradicted state-agency evidence, allowing patterns such as Motherisk’s to be discovered, documented and, hopefully, ended sooner rather than later.
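What might such a mechanism look like? As a minimal sketch in Python (the record fields, labs, figures and the alarm threshold are all invented for illustration), pooling contradiction notices and grouping them by lab would be enough to surface a Motherisk-like pattern:

    from collections import defaultdict

    # Each notice: (lab, case id, lab result, independent re-test result)
    notices = [
        ("Motherisk", "case-001", "positive", "negative"),
        ("Motherisk", "case-002", "positive", "negative"),
        ("OtherLab",  "case-003", "positive", "positive"),
    ]

    totals = defaultdict(int)
    contradicted = defaultdict(int)
    for lab, _case, lab_result, retest in notices:
        totals[lab] += 1
        if lab_result != retest:
            contradicted[lab] += 1

    for lab in totals:
        rate = contradicted[lab] / totals[lab]
        if rate > 0.5:  # alarm threshold, tune to taste
            print(f"Alarm: {lab} contradicted in {rate:.0%} of pooled cases")

The hard part is not the code, it is the sharing: getting the notices out of each attorney’s files and into a common pool without compromising client privacy.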

BTW, there is an obvious explanation for why:

“No forensic toxicology laboratory in the world uses ELISA testing the way [Motherisk] did.”

Child services did not send hair samples to Motherisk to decide whether or not to bring proceedings.

Child services had already decided to remove children and sent hair samples to Motherisk to bolster their case.

How bright did Motherisk need to be to realize that positive results were the expected outcome?

Does your local defense bar collect data on police/state forensic experts and their results?

Looking for suggestions?

July 20, 2015

Why Are Data Silos Opaque?

Filed under: Data Integration,Data Silos — Patrick Durusau @ 1:32 pm

As I pointed out in In Praise of the Silo [Listening NSA?], quoting Neil Ward-Dutton:

Every organisational feature – including silos – is an outcome of some kind of optimisation. By talking about trying to destroy silos, we’re denying the sometimes very rational reasons behind their creation.

While working on a presentation for Balisage 2015, it occurred to me to ask: Why Are Data Silos Opaque?

A popular search engine reports that sans duplicates, there were three hundred and thirty-three (333) “hits” on “data silo” that were updated in the last year. Far more reports than I want to list or that you want to read.

The common theme, of course, is the difficulty of accessing data silos.

OK, I’ll bite, why are data silos opaque?

Surely if our respective data silos are based on relational database technology (still a likely bet, even with NoSQL in the mix), our programmers know about JDBC drivers. Doesn’t connecting to the data silo solve the problem?

Can we assume, then, that data silos are not opaque due to accessibility? That is, drivers exist for accessing data stores, modulo the necessity of system security. Yes?
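To make that concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for any store that has a driver (the table and fields are invented; on the Java side a JDBC connection would play the same role):

    import sqlite3

    # Stand-in for a silo: an in-memory table behind a standard driver.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (cust_id TEXT, acct_no TEXT)")
    conn.execute("INSERT INTO customers VALUES ('C-1001', 'A-42')")

    for row in conn.execute("SELECT cust_id, acct_no FROM customers"):
        print(row)

    # The driver hands the rows over without complaint. What it cannot
    # say is what 'cust_id' means here, or whether it matches the
    # 'customer_key' in the silo next door.
    conn.close()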

Data silos aren’t opaque to the users who rely on them or to the DBAs who maintain them. So opacity isn’t inherent in the data silo itself; we know of people who successfully use what we call a data silo.

What do you think makes data silos opaque?

If we knew where the problem comes from, it might be possible to discuss solutions.

July 1, 2014

Slaying Data Silos?

Filed under: Data Silos,Data Virtualization,Integration — Patrick Durusau @ 6:50 pm

Krishnan Subramanian’s Modern Enterprise: Slaying the Silos with Data Virtualization keeps coming up in my Twitter feed.

In speaking of breaking down data silos, Krishnan says:

A much better approach to solving this problem is abstraction through data virtualization. It is a powerful tool, well suited for the loose coupling approach prescribed by the Modern Enterprise Model. Data virtualization helps applications retrieve and manipulate data without needing to know technical details about each data store. When implemented, organizational data can be easily accessed using a simple REST API.

Data Virtualization (or an abstracted Database as a Service) plugs into the Modern Enterprise Platform as a higher-order layer, offering the following advantages:

  • Better business decisions due to organization wide accessibility of all data
  • Higher organizational agility
  • Loosely coupled services making future proofing easier
  • Lower cost

I find that troubling because there is no mention of data integration.

In fact, in more balanced coverage of data virtualization, which recites the same advantages as Krishnan, we read:

For some reason there are those who sell virtualization software and cloud computing enablement platforms who imply that data integration is something that comes along for the ride. However, nothing gets less complex and data integration still needs to occur between the virtualized data stores as if they existed on their own machines. They are still storing data in different physical data structures, and the data must be moved or copied, and the difference with the physical data structures dealt with, as well as data quality, data integrity, data validation, data cleaning, etc. (The Pros and Cons of Data Virtualization)
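The integration work that supposedly comes along for the ride is easy to see in miniature. A sketch (store layouts, identifiers and the exchange rate are all invented for illustration):

    # Two 'virtualized' stores, both reachable, both describing the same
    # customer, still disagreeing on names, types and units.
    store_a = {"cust_id": "C-1001", "revenue_usd": 1200.0}
    store_b = {"customerKey": "C-1001", "revenue": 1100, "currency": "EUR"}

    EUR_TO_USD = 1.08  # assumed rate, for illustration only

    def integrate(a, b):
        """Reconcile two views of the same customer into one record."""
        if a["cust_id"] != b["customerKey"]:
            raise ValueError("identity mapping needed before anything else")
        b_usd = b["revenue"] * EUR_TO_USD if b["currency"] == "EUR" else b["revenue"]
        return {"cust_id": a["cust_id"],
                "total_revenue_usd": a["revenue_usd"] + b_usd}

    print(integrate(store_a, store_b))

Every line of integrate() is work that virtualization did not do for us.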

Krishnan begins his post:

There’s a belief that cloud computing breaks down silos inside enterprises. Yes, the use of cloud and DevOps breaks down organizational silos between different teams but it only solves part of the problem. The bigger problem is silos between data sources. Data silos, as I would like to refer the problem, is the biggest bottlenecks enterprises face as they try to modernize their IT infrastructure. As I advocate the Modern Enterprise Model, many people ask me what problems they’ll face if they embrace it. Today I’ll do a quick post to address this question at a more conceptual level, without getting into the details.

If data silos are the biggest bottleneck enterprises face, why is the means of addressing them, data integration, a detail?

Every hand-waving approach to data integration fuels unrealistic expectations, even among people who should know better.

There are no free lunches and there are no free avenues for data integration.

January 3, 2014

Data Without Meaning? [Dark Data]

Filed under: Data,Data Analysis,Data Mining,Data Quality,Data Silos — Patrick Durusau @ 5:47 pm

I was reading IDC: Tons of Customer Data Going to Waste by Beth Schultz when I saw:

As much as companies understand the need for data and analytics and are evolving their relationships with both, they’re really not moving quickly enough, Schaub suggested during an IDC webinar earlier this week about the firm’s top 10 predictions for CMOs in 2014. “The aspiration is know that customer, and know what the customer wants at every single touch point. This is going to be impossible in today’s siloed, channel orientation.”

Companies must use analytics to help take today’s multichannel reality and recreate “the intimacy of the corner store,” she added.

Yes, great idea. But as IDC pointed out in the prediction I found most disturbing — especially with how much we hear about customer analytics — gobs of data go unused. In 2014, IDC predicted, “80% of customer data will be wasted due to immature enterprise data ‘value chains.’ ” That has to set CMOs to shivering, and certainly IDC found it surprising, according to Schaub.

Neither is all that surprising: not the 80%, and not the cause being “immature enterprise data ‘value chains.’”

What did surprise me was:

IDC’s data group researchers say that some 80% of data collected has no meaning whatsoever, Schaub said.

I’m willing to bet the wasted 80% of consumer data and the “no meaning” 80% of consumer data are the same 80%.

Think about it.

If your information chain isn’t associating meaning with the data you collect, the data may as well be streaming to /dev/null.

The data isn’t without meaning; you just failed to capture its meaning. Not the same thing as the data having “no meaning.”

Failing to capture meaning along with data is one way to produce what I call “dark data.”
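The difference is easy to show in miniature (field names and the sensor are invented for illustration): the first value is headed for /dev/null in all but the literal sense, while the second carries its meaning with it.

    reading_dark = 47.3  # of what? Celsius? kilograms? click-through rate?

    reading_lit = {
        "value": 47.3,
        "unit": "celsius",
        "sensor": "boiler-room-2",
        "measured_at": "2014-01-03T17:02:00Z",
    }

    # Only the record that captured its meaning can answer a question later:
    if reading_lit["unit"] == "celsius" and reading_lit["value"] > 45:
        print(f"Overheat at {reading_lit['sensor']}")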

I first saw this in a tweet by Gregory Piatetsky.

August 14, 2013

Social Remains Isolated From ‘Business-Critical’ Data

Filed under: Data Integration,Data Silos,Social Media — Patrick Durusau @ 2:29 pm

Social Remains Isolated From ‘Business-Critical’ Data by Aarti Shah.

From the post:

Social data — including posts, comments and reviews — are still largely isolated from business-critical enterprise data, according to a new report from the Altimeter Group.

The study considered 35 organizations — including Caesar’s Entertainment and Symantec — that use social data in context with enterprise data, defined as information collected from CRM, business intelligence, market research and email marketing, among other sources. It found that the average enterprise-class company owns 178 social accounts and 13 departments — including marketing, human resources, field sales and legal — are actively engaged on social platforms.

“Organizations have invested in social media and tools are consolidating but it’s all happening in a silo,” said Susan Etlinger, the report’s author. “Tools tend to be organized around departments because that’s where budgets live…and the silos continue because organizations are designed for departments to work fairly autonomously.”

Somewhat surprisingly, the report finds social data is often difficult to integrate because it is touched by so many organizational departments, all with varying perspectives on the information. The report also notes the numerous nuances within social data make it problematic to apply general metrics across the board and, in many organizations, social data doesn’t carry the same credibility as its enterprise counterpart. (emphasis added)

Isn’t the definition of a silo the organization of data from a certain perspective?

If so, why would it be surprising that different views on data make it difficult to integrate?

Viewing data from one perspective isn’t the same as viewing it from another perspective.

It’s not really a question of integration but of how easy or hard it is to view data from a variety of equally legitimate perspectives.

Rather than a quest for “the” view, shouldn’t we be asking users: “What view serves you best?”
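As a sketch of what asking that question could mean in practice (the record, departments and views are all invented for illustration), a “view” can be as thin as a projection function chosen by its audience:

    record = {"handle": "@acme_fan", "text": "Great support!", "region": "EMEA",
              "resolved": True, "minutes_to_reply": 12}

    views = {
        "marketing": lambda r: {"handle": r["handle"],
                                "sentiment": "positive"},  # stand-in for a classifier
        "support":   lambda r: {"resolved": r["resolved"],
                                "minutes_to_reply": r["minutes_to_reply"]},
        "legal":     lambda r: {"region": r["region"], "text": r["text"]},
    }

    for dept, view in views.items():
        print(dept, view(record))

Same record, three legitimate perspectives, no “integration” required.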

July 14, 2013

Unlocking the Big Data Silos Through Integration

Filed under: BigData,Data Integration,Data Silos,ETL,Silos — Patrick Durusau @ 7:01 pm

Unlocking the Big Data Silos Through Integration by Theo Priestly.

From the post:

Big Data, real-time and predictive analytics present companies with the unparalleled ability to understand consumer behavior and ever-shifting market trends at a relentless pace in order to take advantage of opportunity.

However, organizations are entrenched and governed by silos; data resides across the enterprise in the same way, waiting to be unlocked. Information sits in different applications, on different platforms, fed by internal and external sources. It’s a CIO’s headache when the CEO asks why the organization can’t take advantage of it. According to a recent survey, 54% of organizations state that managing data from various sources is their biggest challenge when attempting to make use of the information for customer analytics.

(…)

Data integration. Again?

A problem that just keeps on giving. The result of every ETL operation is a data set that needs another ETL operation sooner or later.
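Why it keeps on giving is easy to sketch (source, schemas and fields all invented for illustration): every ETL bakes in one target schema and drops whatever it does not need, so the next requirement sends you back to the source for another ETL.

    source = {"cust_id": "C-1001", "name": "Jane Doe", "segment": "SMB",
              "signup_channel": "referral"}

    def etl_v1(rec):
        # v1 target schema: only id and name survive the transform
        return {"id": rec["cust_id"], "name": rec["name"]}

    warehouse = [etl_v1(source)]
    print(warehouse)
    # New question: revenue by signup channel. The channel was dropped,
    # so we are back at the source, writing ETL v2.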

If topic maps were seen not as a competing model but as a way to model your information for re-integration, time after time, that would be a competitive advantage.

Both for topic maps and your enterprise.

March 31, 2013

Opening Standards: The Global Politics of Interoperability

Filed under: Data Silos,Interoperability,Silos,Standards — Patrick Durusau @ 10:26 am

Opening Standards: The Global Politics of Interoperability Edited by Laura DeNardis.

Overview:

Openness is not a given on the Internet. Technical standards–the underlying architecture that enables interoperability among hardware and software from different manufacturers–increasingly control individual freedom and the pace of innovation in technology markets. Heated battles rage over the very definition of “openness” and what constitutes an open standard in information and communication technologies. In Opening Standards, experts from industry, academia, and public policy explore just what is at stake in these controversies, considering both economic and political implications of open standards. The book examines the effect of open standards on innovation, on the relationship between interoperability and public policy (and if government has a responsibility to promote open standards), and on intellectual property rights in standardization–an issue at the heart of current global controversies. Finally, Opening Standards recommends a framework for defining openness in twenty-first-century information infrastructures.

Contributors discuss such topics as how to reflect the public interest in the private standards-setting process; why open standards have a beneficial effect on competition and Internet freedom; the effects of intellectual property rights on standards openness; and how to define standard, open standard, and software interoperability.

If you think “open standards” have impact, what would you say about “open data?”

At a macro level, “open data” has many of the same issues as “open standards.”

At a micro level, “open data” has unique social issues that drive the creation of silos for data.

So far as I know, a serious investigation of the social dynamics of data silos has yet to be written.

Understanding the dynamics of data silos might, no guarantees, lead to better strategies for dismantling them.

Suggestions for research/reading on the social dynamics of data silos?

March 21, 2013

Google Keep: Another Temporary Google Data Silo

Filed under: Data Silos — Patrick Durusau @ 2:00 pm

Google launches Google Keep, an app to help you remember things by Laura Hazard Owen.

I report this only to ask: is anyone tracking new data silos as they appear?

If anyone is tracking them I would be willing to submit candidates as I encounter them.

Thanks!

PS: When Google decides to close Google Keep, please let me know if you create an export-to-topic-map script for it.

January 12, 2013

SDDC And The Elephant In the Room

Filed under: Data Silos,SDDC,Virtualization — Patrick Durusau @ 6:59 pm

SDDC And The Elephant In the Room by Chuck Hollis.

From the post:

Like many companies, we at EMC start our new year with a leadership gathering. We gather to celebrate, connect, strategize and share. They are *always* great events.

I found this year’s gathering was particularly rewarding in terms of deep content. The majority of the meeting was spent unpacking the depth behind the core elements of EMC’s strategy: cloud, big data and trust.

We dove in from a product and technology perspective. We came at it from a services view. Another take from a services and skills viewpoint. And, finally, the organizational and business model implications.

For me, it was like a wonderful meal that just went on and on. Rich, detailed and exceptionally well-thought out — although your head started to hurt after a while.

Underlying much of the discussion was the central notion of a software-defined datacenter (SDDC for short), representing the next generation of infrastructure and operational models. All through the discussion, that was clearly the conceptual foundation for so much of what needed to happen in the industry.

And I started to realize we still have a lot of explaining to do: not only around the concepts themselves, but what they mean to IT groups and the organizations they support.

I’ve now had some time to think and digest, and I wanted to add a few different perspectives to the mix.

The potential of software-defined datacenters (SDDC) comes across loud and clear in Chuck’s post, particularly for ad hoc integration of data sources for new purposes.

But then I remembered, silos aren’t built by software. Silos are built by users; software is just a means for building a silo.

Silos won’t become less frequent because of software-defined datacenters, unless users stop building silos.

There will be a potential for fewer silos and more pressure on users to build fewer silos, maybe, but that is no guarantee of fewer silos.

Even a subject-defined datacenter (SubDDC) cannot guarantee no silos.

A SubDDC that defines subjects in its data, structures and software offers a chance to move across silo barriers.

How much of a chance depends on its creator and the return from crossing silo barriers.

October 13, 2012

Standards and Infrastructure for Innovation Data Exchange [#6000]

Filed under: Data Integration,Data Silos,Standards — Patrick Durusau @ 4:14 pm

Standards and Infrastructure for Innovation Data Exchange by Laurel L. Haak, David Baker, Donna K. Ginther, Gregg J. Gordon, Matthew A. Probus, Nirmala Kannankutty and Bruce A. Weinberg. (Science 12 October 2012: Vol. 338 no. 6104 pp. 196-197 DOI: 10.1126/science.1221840)

Appropriate that post number six thousand (6000) should report an article on data exchange standards.

But the article seems to be at war with itself.

Consider:

There is no single database solution. Data sets are too large, confidentiality issues will limit access, and parties with proprietary components are unlikely to participate in a single-provider solution. Security and licensing require flexible access. Users must be able to attach and integrate new information.

Unified standards for exchanging data could enable a Web-based distributed network, combining local and cloud storage and providing public-access data and tools, private workspace “sandboxes,” and versions of data to support parallel analysis. This infrastructure will likely concentrate existing resources, attract new ones, and maximize benefits from coordination and interoperability while minimizing resource drain and top-down control.

As quickly as the authors say “[t]here is no single database solution,” they take a deep breath and outline the case for a uniform data-sharing structure.

If there is no “single database solution,” it stands to reason there is no single infrastructure for sharing data. The same diversity that blocks the single database, impedes the single exchange infrastructure.

We need standards, but rather than unending quests for enlightened permanence, we should focus on temporary standards, to be replaced by other temporary standards, when circumstances or needs change.

The narrow range required to demonstrate benefits from a temporary standard is a plus as well. A standard enabling data integration between departments at a hospital, one department at a time, will show benefits (if there are any to be had) far sooner than a standard that requires universal adoption before any benefits appear.

The Topic Maps Data Model (TMDM) is an example of a narrow range standard.

While the TMDM can be extended, in its original form subjects are reliably identified using IRIs (along with data about those subjects). All that is required is that the parties use IRIs as identifiers, and not even the same IRIs.

The TMDM framework enables one or more parties to use their own IRIs and data practices, without prior agreement, and still have reliable merging of their data.
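A sketch of that merging rule (the IRIs, names and dict layout are invented for illustration; the TMDM itself specifies far more): topics merge when their sets of subject identifiers intersect, with no prior agreement on names.

    def merge_topics(topics):
        merged = []
        for topic in topics:
            for existing in merged:
                if existing["ids"] & topic["ids"]:  # shared subject identifier
                    existing["ids"] |= topic["ids"]
                    existing["names"] |= topic["names"]
                    break
            else:
                merged.append({"ids": set(topic["ids"]),
                               "names": set(topic["names"])})
        return merged  # a full implementation would also cascade merges

    party_a = {"ids": {"http://a.example/people/42"},
               "names": {"J. Smith"}}
    party_b = {"ids": {"http://b.example/emp/jsmith",
                       "http://a.example/people/42"},  # one shared IRI suffices
               "names": {"John Smith"}}

    print(merge_topics([party_a, party_b]))

Each party kept its own IRI; a single shared identifier was enough for reliable merging.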

I think it is the without prior agreement part that distinguishes the Topic Maps Data Model from other data interchange standards.

We can skip all the tiresome discussion about who has the better name/terminology/taxonomy/ontology for subject X and get down to data interchange.

Data interchange is interesting, but what we find following data interchange is even more so.

More on that to follow, sooner rather than later, in the next six thousand posts.

(See the Donations link. Your encouragement would be greatly appreciated.)

June 27, 2012

The Scourge of Data Silos

Filed under: Data,Data Silos — Patrick Durusau @ 1:28 pm

The Scourge of Data Silos by Rick Sherman.

From the post:

“Those who cannot remember the past are condemned to repeat it.” [1]

Over the years there have been many technology waves related to the design, development and deployment of Business Intelligence (BI). As BI technologies evolved, they have been able to significantly expand their functionality by leveraging the incredible capacity growth of CPUs, storage, disk I/O, memory and network bandwidth. New technologies have emerged as enterprises’ data needs keep expanding in variety, volume and velocity.

Technology waves are occurring more frequently than ever. Current technology waves include Big Data, data virtualization, columnar databases, BI appliances, in-memory analytics, predictive analytics, and self-service BI.

Common Promises

Each wave brings with it the promise of faster, easier to use and cheaper BI solutions. Each wave promises to be the breakthrough that makes the “old ways” archaic, and introduces a new dawn of pervasive BI responsive to business needs. No more spreadsheets or reports needed!

IT and product vendors are ever hopeful that the latest technology wave will be the magic elixir for BI, however, people seem to miss that it is not technology that is the gating factor to pervasive BI. What has held back BI has been the reluctance to address the core issues of establishing enterprise data management, information architecture and data governance. Those core issues are hard and the perpetual hope is that one of these technology waves will be the Holy Grail of BI and allow enterprises to skip the hard work of transforming and managing information. We have discussed these issues many times (and will again), but what I want to discuss is the inevitable result in the blind faith in the latest technology wave.

Rick does a good job of pointing out “the inevitable result in the blind faith in the latest technology wave.”

His cool image of silos at the top hints at his conclusion.


I have railed about data silos, along with everyone else, for years. But the line of data silos seems to be endless. As indeed I have come to believe it is.

Endless, that is. We can’t build data structures or collections of data without building data silos. Sometimes with enough advantages to justify a new silo, sometimes not.

Rather than “kick against the bricks” of data silos, our time would be better spent making our data silos as transparent as need be.

Not completely, and in some cases not at all. Simply not worth the effort. In those cases, we can always fall back on ETL, or simply ignore the silo altogether.

I posted recently about open data passing the one millionth data set. Data that is trapped in data silos of one sort or another.

We can complain about the data that is trapped inside or we can create mechanisms to free it and data that will inevitably be contained in future data silos.

Even topic map syntaxes and/or models are data silos. But that’s the point isn’t it? We are silo builders and that’s ok.

What we need to add to our skill set is making windows in silos and sharing those windows with others.

June 7, 2012

Breaking Silos – Carrot or Stick?

Filed under: Data Governance,Data Integration,Data Silos,Silos — Patrick Durusau @ 2:17 pm

Alex Popescu, in Silos Are Built for a Reason quotes Greg Lowe saying:

In a typical large enterprise, there are competitions for resources and success, competing priorities and lots of irrelevant activities that are happening that can become distractions from accomplishing the goals of the teams.

Another reason silos are built has to do with affiliation. This is by choice, not by edict. By building groups where you share a shared set of goals, you effectively have an area of focus with a group of people interested in the same area and/or outcome.

There are many more reasons and impacts of why silos are built, but I simply wanted to establish that silos are built for a purpose with legitimate business needs in mind.

Alex then responds:

Legitimate? Maybe. Productive? I don’t really think so.

Greg’s original post is: Breaking down silos, what does that mean?

Greg asks about the benefits of breaking down silos:

  • Are the silos mandatory?
  • What would breaking down silos enable in the business?
  • What do silos do to your business today?
  • What incentive is there for these silos to go away?
  • Is your company prepared for transparency?
  • How will leadership deal with “Monday morning quarterbacks?”

As you can see, there are many benefits to silos as well as challenges. By developing a deeper understanding of the silos and why they get created, you can then have a better handle on whether the silos are beneficial or detrimental to the organization.

I would add to Greg’s question list:

  • Which stakeholders benefit from the silos?
  • What is that benefit?
  • Is there a carrot or stick that outweighs that benefit? (in the view of the stakeholder)
  • Do you have the political capital to take the stakeholders on and win?

If your answers are:

  • List of names
  • List of benefits
  • Yes, list of carrots/sticks
  • No

Then you are in good company.

Intelligence silos persist despite the United States being at war with identifiable terrorist groups.

A generalized benefit, or a generalized penalty for failure, isn’t a winning argument for breaking a data silo.

Specific benefits and penalties must matter to stakeholders. Then you have a chance to break a data silo.

Good luck!

May 9, 2012

Making Intelligence Systems Smarter (or Dumber)

Filed under: Data Silos,Distributed Sensemaking,Intelligence,Sensemaking — Patrick Durusau @ 10:02 am

Picking the Brains of Strangers….[$507 Billion Dollar Prize (at least)] had three keys to its success:

  • Use of human analysts
  • Common access to data and prior efforts
  • Reuse of prior efforts by human analysts

Intelligence analysts spend their days with snippets and bits of data, trying to wring sense out of it, only to pigeonhole their results in silos.

Other analysts have to know the data exists even to request it. And analysts holding information must understand that their information will help others with their own sensemaking.

All contrary to the results in Picking the Brains of Strangers….

What information will result in sensemaking for one or more analysts is unknown. And cannot be known.

Every firewall, every silo, every compartment, every clearance level, makes every intelligence agency and the overall intelligence community dumber.

Until now, the intelligence community has chosen to be dumber and more secure.

In a time of budget cuts and calls for efficiency in government, it is time for more effective intelligence work, even if less secure.

Take the leak of the diplomatic cables. The only people unaware of the general nature of the cables were the public and perhaps the intelligence agency of Zambia. All other intelligence agencies probably had them, or their own version, pigeonholed in their own systems.

With robust intelligence sharing, the NSA could do all the signal capture and expense it out to other agencies, rather than various agencies maintaining duplicate systems.

And perhaps a public data flow of analysis of foreign news sources in their original languages. Members of the public may not have clearances, but they may have insights into cultures and languages that are rare in intelligence agencies.

But that presumes an interest in smarter intelligence systems, not dumber ones by design.

May 16, 2011

The Filter Bubble: Algorithm vs. Curator & the Value of Serendipity

Filed under: Data Silos,Filters,Personalization — Patrick Durusau @ 3:33 pm

The Filter Bubble: Algorithm vs. Curator & the Value of Serendipity by Maria Popova.

Covers the same TED presentation that I mention at On the dangers of personalization but with the value-add that Maria both interviews Eli Pariser and talks about his new book, The Filter Bubble.

I remain untroubled by filtering.

We filter the information we give others around us.

Advertisers filter the information they present in commercials.

For example, I don’t recall any Toyota ads that end with: Buy a Toyota ****, your odds of being in a recall are 1 in ***. That’s filtering.

Two things would increase my appreciation for Google filtering:

First, much better filtering, where I can choose narrow-band filter(s) based on my interests.

Second, the ability to turn the filters off at my option.
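As a sketch of those two features (the filter terms and the stream are invented for illustration):

    my_filters = [
        lambda item: "topic maps" in item.lower(),
        lambda item: "semantics" in item.lower(),
    ]
    filters_enabled = True  # the off switch

    def filter_stream(items):
        if not filters_enabled:
            return items  # the unfiltered view, on demand
        return [i for i in items if any(f(i) for f in my_filters)]

    stream = ["Topic Maps at Balisage", "Celebrity gossip", "Web semantics 101"]
    print(filter_stream(stream))

The point is who holds the knobs: the filters are mine, and so is the switch.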

You see, I don’t agree that there is information I need to know as determined by someone else.

Here’s an interesting question: What information would you filter from: www.cnn.com?

May 7, 2011

On the dangers of personalization

Filed under: Data Silos,Filters,Personalization — Patrick Durusau @ 6:06 pm

On the dangers of personalization

From the post:

We’re getting our search results seriously edited and, I bet, most of us don’t even know it. I didn’t. One Google engineer says that their search engine uses 57 signals to personalize your search results, even when you’re logged out.

Do we really want to live in a web bubble?

What I find interesting about this piece is that it describes a data silo but from the perspective of an individual.

Think about it.

A data silo is based on data that is filtered and stored.

Personalization is based on data that is filtered and presented.

Do you see any difference?

May 24, 2010

Knowledge Is Power

Filed under: Data Silos,Mapping,Marketing — Patrick Durusau @ 7:01 pm

Sir Francis Bacon originated the aphorism “Knowledge is power.” (Actually he said, “nam et ipsa scientia potestas est”….)

How powerful?

The 9/11 Report points out:

Agencies uphold a “need-to-know” culture of information protection rather than promoting a “need-to-share” culture of integration. (page 417)

Fast forward seven years and we find:

[Information Sharing Environment – ISE] Gaps exist in….(3) determining the results to be achieved by the ISE (that is, how information sharing is improved) along with associated milestones, performance measures, and the individual projects. (Information Sharing [2008])

Seven years later and there are gaps in “how information sharing is improved…..”?

The power of not sharing knowledge is enough to maintain data silos even in the face of national peril.

Topic maps can help you breach any silo you can access. Make that access meaningful and effective.

Not just national security data silos. Consider mapping the data silos of a regulated industry, say financial institutions: a mapping that grows with every audit or investigation.

Your choices are: 1) Wait for someone to relinquish power, or 2) Increase your power by breaching their data silo. Which one is for you?

April 19, 2010

Why Semantic Technologies Remain Orphans (Lack of Adoption)

Filed under: Data Silos,Heterogeneous Data,Mapping,Semantic Diversity,Topic Maps — Patrick Durusau @ 6:54 pm

In the debate over Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 1, Kingsley Idehen has noted the lack of widespread adoption of semantic technologies.

Everyone prefers their own world view. We see some bright, shiny future if everyone else, at their expense, would adopt our view of the world. That hasn’t been persuasive.

And why should it be? What motivation do I have to change how I process/encode my data, in the hopes that if everyone else in my field does the same thing, then at some unknown future point, I will have some unquantifiable advantage over how I process data now?

I am not advocating that everyone adopt XTM syntax or the TMDM as a data model. Just as there are an infinite number of semantics there are an infinite number of ways to map and combine those semantics. I am advocating a disclosed mapping strategy that enables others to make meaningful use of the resulting maps.

Let’s take a concrete case.

The Christmas Day “attack” by a terrorist who set his pants on fire (Christmas Day Attack Highlights US Intelligence Failures) illustrates a failure to share intelligence data.

One strategy, the one most likely to fail, is the development of a common data model for sharing intelligence data. The Guide to Sources of Information for Intelligence Officers, Analysts, and Investigators, Updated gives you a feel for the scope of such a project. (100+ pages listing sources of intelligence data)

A disclosed mapping strategy for the integration of intelligence data would enable agencies to keep their present systems, data structures, interfaces, etc.

Disclosing the basis for mapping, whatever the target (such as RDF), will mean that users can combine the resulting map with other data. Or not. But it will be a meaningful choice. A far saner (and more cost effective) strategy than a common data model.
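A sketch of what “disclosed” could mean in practice (the source name, fields and target vocabulary are invented for illustration): the mapping is itself data, published next to the data it maps, so anyone can inspect, reuse or contest it.

    DISCLOSED_MAPPING = {
        "source": "agency-x/watchlist-v3",
        "rules": {
            "surname":    "person.family_name",
            "dob":        "person.birth_date",
            "alias_list": "person.aka",
        },
    }

    def apply_mapping(record, mapping):
        """Re-key a record according to a published mapping."""
        return {target: record[src]
                for src, target in mapping["rules"].items()
                if src in record}

    rec = {"surname": "Doe", "dob": "1985-02-01", "alias_list": ["JD"]}
    print(apply_mapping(rec, DISCLOSED_MAPPING))

The agency keeps its own fields; the disclosure is what makes the output combinable with other data.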

Semantic diversity is our strength. So why not play to our strength, rather than against it?

March 9, 2010

Schlepping From One Data Silo to Another (Part 1)

Filed under: Data Silos — Patrick Durusau @ 6:59 pm

Talking about data silos is popular. Particularly with a tone of indignation, about someone else’s data silo. But, like the weather, everyone talks about data silos and nobody does anything about them. In fact, if you look closely, all solutions to data silos are (drum roll please!) more data silos.

To be sure, some data silos are more common than others but every data format is a data silo to someone who doesn’t use that format. Take the detailed records from the Federal Election Commission (FEC) on candidates, parties and other committees as an example. Important stuff for US residents interested in who bought access to their local representative or senator.

The tutorial on how to convert the files to MS Access clues you in that the files are in fixed width fields, or as the tutorial puts it: “Notice that a columns’ start value is the previous columns’ start value plus its’ width value (except for the first column, which is always “1”).” That sounds real familiar.

But, we return to the download page where we read about how to handle overpunch characters. Overpunch characters? Oh, as in COBOL. Now that’s an old data silo.
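Both quirks are mechanical once you know they are there. A sketch (the column layout is invented; the real layout is in the FEC documentation) of fixed-width slicing plus signed-overpunch decoding:

    # Overpunch: the last character carries both the final digit and the sign.
    POSITIVE = {"{": 0, **{c: i for i, c in enumerate("ABCDEFGHI", 1)}}
    NEGATIVE = {"}": 0, **{c: i for i, c in enumerate("JKLMNOPQR", 1)}}

    def decode_overpunch(field):
        digits, last = field[:-1], field[-1]
        if last in POSITIVE:
            return int(digits + str(POSITIVE[last]))
        if last in NEGATIVE:
            return -int(digits + str(NEGATIVE[last]))
        return int(field)  # plain unsigned field

    line = "SMITH, JOHN         0000123J"   # made-up record
    name = line[0:20].strip()               # fixed width: slice by column
    amount = decode_overpunch(line[20:28])  # "0000123J" -> -1231
    print(name, amount)

Neither step is hard; each is just one more wall of one more silo.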

The point being that for all the talk about data silos we never escape them. Data formats are data silos. Get over it.

What we can do is make it possible to view information in one data silo as though it were held by another data silo. And if you must switch from one data silo to another, the time, cost and uncertainty of the migration can be reduced. (to be continued)
