Archive for the ‘Information Integration’ Category

IRI-DIM 2014…

Monday, December 30th, 2013

IRI-DIM 2014 : The Third IEEE International Workshop on Data Integration and Mining


April 4, 2014 Regular paper submission deadline (midnight PST)
May 4, 2014 Acceptance Notification
May 14, 2014 Camera-ready paper due
May 14, 2014 Conference author registration due
Aug. 13-15, 2014 Conference (San Francisco)

From the call for papers:

Given the emerging global Information-centric IT landscape that has tremendous social and economic implications, effectively processing and integrating humongous volumes of information from diverse sources to enable effective decision making and knowledge generation have become one of the most significant challenges of current times. Information Reuse and Integration (IRI) seeks to maximize the reuse of information by creating simple, rich, and reusable knowledge representations and consequently explores strategies for integrating this knowledge into systems and applications. IRI plays a pivotal role in the capture, representation, maintenance, integration, validation, and extrapolation of information; and applies both information and knowledge for enhancing decision-making in various application domains.

This conference explores three major tracks: information reuse, information integration, and reusable systems. Information reuse explores theory and practice of optimizing representation; information integration focuses on innovative strategies and algorithms for applying integration approaches in novel domains; and reusable systems focus on developing and deploying models and corresponding processes that enable Information Reuse and Integration to play a pivotal role in enhancing decision-making processes in various application domains.

Looks like I need to pull up the prior IRI proceedings. 😉

How many technologies can you name that can address data structures as subjects? With properties and the ability to declare synonyms for components of data structures?

Did you say something other than topic maps?

Use owl:sameAs as an example. How would you represent properties of owl:sameAs?

This sounds very much like a topic maps conference!
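
To make the owl:sameAs question concrete, here is a minimal sketch (no RDF library, plain tuples standing in for triples, all `ex:` names invented) of treating an owl:sameAs assertion itself as a subject, so that properties such as provenance and confidence can be attached to it:

```python
# Sketch: an owl:sameAs assertion reified as a statement node, so the
# assertion itself can carry properties. Names are illustrative only.

OWL_SAMEAS = "owl:sameAs"

def reify(graph, subject, predicate, obj, stmt_id):
    """Add a triple plus reification triples that describe it."""
    graph.append((subject, predicate, obj))
    graph.extend([
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", subject),
        (stmt_id, "rdf:predicate", predicate),
        (stmt_id, "rdf:object", obj),
    ])
    return stmt_id

graph = []
stmt = reify(graph, "ex:Twain", OWL_SAMEAS, "ex:Clemens", "ex:stmt1")

# Now the owl:sameAs assertion is a subject in its own right:
graph.append((stmt, "ex:assertedBy", "ex:catalog-merge"))
graph.append((stmt, "ex:confidence", 0.9))

def properties_of(graph, subject):
    """All (predicate, object) pairs asserted about a subject."""
    return {(p, o) for s, p, o in graph if s == subject}

print(properties_of(graph, "ex:stmt1"))
```

The point of the exercise: once the assertion is addressable as a subject, declaring who made it, when, and with what confidence is routine, which is exactly the capability the questions above are probing for.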

Teiid (8.2 Final Released!) [Component for TM System]

Thursday, November 22nd, 2012

Teiid

From the homepage:

Teiid is a data virtualization system that allows applications to use data from multiple, heterogeneous data stores.

Teiid is comprised of tools, components and services for creating and executing bi-directional data services. Through abstraction and federation, data is accessed and integrated in real-time across distributed data sources without copying or otherwise moving data from its system of record.

Teiid Parts

  • Query Engine: The heart of Teiid is a high-performance query engine that processes relational, XML, XQuery and procedural queries from federated data sources.  Features include support for homogeneous schemas, heterogeneous schemas, transactions, and user-defined functions.
  • Embedded: An easy-to-use JDBC Driver that can embed the Query Engine in any Java application. (as of 7.0 this is not supported, but on the roadmap for future releases)
  • Server: An enterprise-ready, scalable, manageable runtime for the Query Engine that runs inside JBoss AS and provides additional security, fault-tolerance, and administrative features.
  • Connectors: Teiid includes a rich set of Translators and Resource Adapters that enable access to a variety of sources, including most relational databases, web services, text files, and LDAP.  Need data from a different source? Custom translators and resource adapters can easily be developed.
  • Tools:

Teiid 8.2 final was released on November 20, 2012.

Like most integration services, not strong on integration between integration services.

Would make one helluva component for a topic map system.

That is, a system with an inter-integration mapping layer in addition to the capabilities of Teiid.
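
To make the federation idea concrete, here is a toy sketch in the spirit of Teiid's query engine (not Teiid's actual API; the tables, file, and names are all invented): one query answered over two heterogeneous sources, a SQL table and a CSV file, without copying either into a central store.

```python
# Toy federation: join customer names (SQL) with order totals (CSV)
# at query time, leaving each source in place.
import csv
import io
import sqlite3

# Source 1: a relational store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Acme"), (2, "Globex")])

# Source 2: a flat file (an in-memory stand-in here).
orders_csv = io.StringIO("customer_id,total\n1,250\n2,75\n1,30\n")

def totals_by_customer():
    """Federate the two sources into one answer, copying nothing."""
    totals = {}
    for row in csv.DictReader(orders_csv):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0) + int(row["total"])
    return {name: totals.get(cid, 0)
            for cid, name in db.execute("SELECT id, name FROM customers")}

result = totals_by_customer()
print(result)  # {'Acme': 280, 'Globex': 75}
```

A real engine adds query planning, pushdown, transactions, and security on top of this pattern; the sketch only shows the abstraction-and-federation core.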

Boy Scout Expulsions – Oil Drop Semantics

Monday, October 22nd, 2012

Data on decades of Boy Scout expulsions released by Nathan Yau.

Nathan points to an interactive map, searchable list, and downloadable data from the Los Angeles Times, drawn from Boy Scouts of America files on people expelled from the Boy Scouts on suspicion of sexual abuse.

The LA Times has done a great job with this data set (and the story) but it also illustrates a limitation in current data practices.

All of these cases occurred in jurisdictions with laws against sexual abuse of children.

If a local sheriff or district attorney reads about this database, how do they tie it into their databases?

Not as simple as saying “topic map,” if that’s what you were anticipating.

Among the issues that would need addressing:

  • Confidentiality – Law enforcement and courts have their own rules about sharing data.
  • Incompatible System Semantics – The typical problem that is encountered in business enterprises, writ large. Every jurisdiction is likely to have its own rules, semantics and files.
  • Incompatible Data Semantics – Assuming systems talk to each other, the content and its semantics will vary from one jurisdiction to another.
  • Subjects Evading Identification – The subjects (sorry!) in question are trying to avoid identification.
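
A hypothetical sketch of the “incompatible data semantics” problem: two jurisdictions record the same offense under different field names and codes, and the mapping between the two local vocabularies is itself the asset worth keeping. All field names, codes, and records below are invented.

```python
# Two jurisdictions, two vocabularies, one shared view.
county_a = {"subj_name": "DOE, JOHN", "offense_cd": "288A"}
county_b = {"defendant": "John Doe", "charge": "sexual abuse of a minor"}

# The mappings record how each local vocabulary lines up with a shared one.
A_TO_SHARED = {"subj_name": "name", "offense_cd": "offense"}
B_TO_SHARED = {"defendant": "name", "charge": "offense"}
A_CODES = {"288A": "sexual abuse of a minor"}  # local code -> shared label

def to_shared(record, field_map, code_map=None):
    """Translate a local record into the shared vocabulary."""
    out = {}
    for local, shared in field_map.items():
        value = record[local]
        if code_map and value in code_map:
            value = code_map[value]
        out[shared] = value
    return out

shared_a = to_shared(county_a, A_TO_SHARED, A_CODES)
shared_b = to_shared(county_b, B_TO_SHARED)
print(shared_a["offense"] == shared_b["offense"])
```

Note what the lunch meetings described below produce: exactly the contents of `A_TO_SHARED`, `B_TO_SHARED`, and `A_CODES`, written down one source at a time.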

You could get funding for a conference of police administrators to discuss how to organize additional meetings to discuss potential avenues for data sharing and get the DHS to fund a large screen digital TV (not for the meeting, just to have one). Consultants could wax and whine about possible solutions if someday you decided on one.

I have a different suggestion: Grab your records guru and meet up with an overlapping or neighboring jurisdiction’s data guru and one of their guys. For lunch.

Bring note pads and sample records. Talk about how you share information between officers (that is, between you and your counterpart). Let the data gurus talk about how they can share data.

Stick to practical questions: how do we share data, and what does your data mean right now? Make no global decisions, award no medals for attending, etc.

Do that once or twice a month for six months. Write down what worked, what didn’t work (just as important). Each of you picks an additional partner. Share what you have learned.

That documentation and practice of information sharing will be the foundation for more formal information-sharing systems. Systems based on documented sharing practices, not on how administrators imagine sharing works.

Think of it as “oil drop semantics.”

Start small and increase only as more drops are added.

The goal isn’t a uniform semantic across law enforcement but understanding what is being said. That understanding can be mapped into a topic map or other information sharing strategy. But understanding comes first, mapping second.

Are You An IT Hostage?

Monday, August 13th, 2012

As I promised last week in From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report], here is the key finding that is missing from Oracle’s summary:

Executives’ Biggest Data Management Gripes:*

#1 Don’t have the right systems in place to gather the information we need (38%)

#2 Can’t give our business managers access to the information they need; need to rely on IT (36%)

Ask your business managers: Do they feel like IT hostages?

You are likely to be surprised at the answers you get.

IT’s vocabulary acts as an information clog.

A clog that impedes the flow of information in your organization.

Information that can improve the speed and quality of business decision making.

The critical point is: Information clogs are bad for business.

Do you want to borrow my plunger?

From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report]

Friday, August 10th, 2012

From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report]

Summary:

IT powers today’s enterprises, which is particularly true for the world’s most data-intensive industries. Organizations in these highly specialized industries increasingly require focused IT solutions, including those developed specifically for their industry, to meet their most pressing business challenges, manage and extract insight from ever-growing data volumes, improve customer service, and, most importantly, capitalize on new business opportunities.

The need for better data management is all too acute, but how are enterprises doing? Oracle surveyed 333 C-level executives from U.S. and Canadian enterprises spanning 11 industries to determine the pain points they face regarding managing the deluge of data coming into their organizations and how well they are able to use information to drive profit and growth.

Key Findings:

  • 94% of C-level executives say their organization is collecting and managing more business information today than two years ago, by an average of 86% more
  • 29% of executives give their organization a “D” or “F” in preparedness to manage the data deluge
  • 93% of executives believe their organization is losing revenue – on average, 14% annually – as a result of not being able to fully leverage the information they collect
  • Nearly all surveyed (97%) say their organization must make a change to improve information optimization over the next two years
  • Industry-specific applications are an important part of the mix; 77% of organizations surveyed use them today to run their enterprise—and they are looking for more tailored options

What key finding did they miss?

They cover it in the forty-two (42) page report but it doesn’t appear here.

Care to guess what it is?

Forgotten key finding post coming Monday, 13 August 2012. Watch for it!

I first saw this at Beyond Search.

Does She or Doesn’t She?

Saturday, June 16th, 2012

Information Processing: Adding a Touch of Color

From the post:

An innovative computer program brings color to grayscale images.

Creating a high-quality realistic color image from a grayscale picture can be challenging. Conventional methods typically require the user’s input, either by using a scribbling tool to color the image manually or by using a color transfer. Both options can result in poor colorization quality limited by the user’s degree of skill or the range of reference images available.

Alex Yong-Sang Chia at the A*STAR’s Institute for Infocomm Research and co-workers have now developed a computer program that utilizes the vast amount of imagery available on the internet to find suitable color matches for grayscale images. The program searches hundreds of thousands of online color images, cross-referencing their key features and objects in the foreground with those of grayscale pictures.

“We have developed a method that takes advantage of the plentiful supply of internet data to colorize gray photos,” Chia explains. “The user segments the image into separate major foreground objects and adds semantic labels naming these objects in the gray photo. Our program then scans the internet using these inputs for suitable object color matches.”

If you think about it for a moment, it appears that subject recognition in images is being performed here. As the researchers concede, it’s not 100%, but then it doesn’t need to be. They have human users in the loop.

I wonder whether human users have to correct the colorization of an image more than once for the same source color image? That is, does the system “remember” earlier choices?

The article doesn’t say, so I will follow up with an email.

Keeping track of user-corrected subject recognition would create a bread crumb trail for other users confronted with the same images. (In other words, a topic map.)
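
A minimal sketch of that bread crumb trail (the image IDs, region labels, and colors are invented; a real system would key on image features rather than an ID): remember each human correction keyed by (image, region label) so later users see the corrected choice first.

```python
# Bread crumb trail: remembered human corrections override the
# automatic guess for the same image region.
corrections = {}  # (image_id, region_label) -> corrected color

def suggest_color(image_id, region_label, automatic_guess):
    """Prefer a remembered human correction over the automatic guess."""
    return corrections.get((image_id, region_label), automatic_guess)

def record_correction(image_id, region_label, color):
    corrections[(image_id, region_label)] = color

# First user: the system guesses, the user corrects.
assert suggest_color("img42", "sky", "gray-blue") == "gray-blue"
record_correction("img42", "sky", "sunset orange")

# A later user confronted with the same image gets the trail.
assert suggest_color("img42", "sky", "gray-blue") == "sunset orange"
```

The `corrections` table is the topic-map germ: a mapping from an identified subject (this region of this image) to what users have said about it.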

Knowledge Economics II

Sunday, March 18th, 2012

My notes about Steve Newcomb’s economic asset approach to knowledge/information integration were taking me too far afield from the conference proper.

As an economic asset, take information liberated from alphabet soup agency (ASA) #1 to be integrated with your information. Your information could be from unnamed sources, public records, your records, etc. Reliable integration requires semantic knowledge of ASA #1’s records or trial-and-error processing. Unless, of course, you have a mapping enabling reliable integration of ASA #1’s information with your own.

How much is that “mapping” worth to you? Is it reusable? Or should I say, “retargetable?”

You can, as people are wont to do, hire a data mining firm to go over thousands of documents (like the State Department cables, which revealed the trivial nature of State Department secrets) and get a one-off result. But what happens the next time? Do you do the same thing over again? And how does that fit with your prior results?

That’s really the question, isn’t it? Not how do you process the current batch of information (although that can be important), but how does it integrate with your prior data? So that your current reporters will not have to duplicate all the searching your earlier reporters did to find the same information.

Perhaps they will uncover relationships that were not apparent from only one batch of leaked information. Perhaps they will purchase from the airlines their travel data to be integrated with reported sightings from their own sources. Or telephone records from carriers not based in the United States.

But data integration opportunities are not just for governments and the press.

Your organization has lots of information. Information on customers. Information on suppliers. Information on your competition. Information on which patients were taking which drugs, with what results. (Would you give that information to drug companies or sell it to them? I know my answer. How about yours?)

What will you answer when a shareholder asks: What is our KROI? Knowledge Return on Investment?

You have knowledge to sell. How are you going to package it up to attract buyers? (inquiries welcome)

Documenting decisions separately from use cases

Thursday, March 15th, 2012

Documenting decisions separately from use cases by James Taylor.

From the post:

I do propose making decisions visible. By visible, I mean a separate and explicit step for each decision being made. These steps help the developer identify where possible alternate and exception paths may be placed. These decision points occur when an actor’s input drives the scenario down various paths.

I could not have put this better myself. I am a strong believer in this kind of separation, and of documenting how the decision is made independently of the use case so it can be reused. The only thing I would add is that these decisions need to be decomposed and analyzed, not simply documented. Many of these decisions are non-trivial and decomposing them to find the information, know-how and decisions on which they depend can be tremendously helpful.

James describes development and documentation of use cases and decisions in a context broader than software development. His point on decomposition of decisions is particularly important for systems designed to integrate information.

He describes decomposition of decisions as leading to discovery of “information, know-how and decisions on which they depend….”

Compare and contrast that with simple mapping decisions that map one column in a table to another. Can you say on what basis that mapping was made? Or with more complex systems, what “know-how” is required or on what other decisions that mapping may depend?

If your integration software/practice/system doesn’t encourage or allow such decomposition of decisions, you may need another system.
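
A sketch of what decomposing a mapping decision might look like in practice: each column mapping carries the basis on which it was made and the know-how or prior decisions it depends on, not just the source and target names. The field names and rationales below are invented for illustration.

```python
# Each mapping decision is documented, not just recorded.
from dataclasses import dataclass, field

@dataclass
class MappingDecision:
    source: str
    target: str
    basis: str                      # why this mapping was made
    depends_on: list = field(default_factory=list)  # know-how it relies on

decisions = [
    MappingDecision("cust_no", "customer_id",
                    basis="both are the billing-system customer key",
                    depends_on=["billing system is the system of record"]),
    MappingDecision("dob", "birth_date",
                    basis="same field; source stores MM/DD/YYYY",
                    depends_on=["date format confirmed with source DBA"]),
]

def apply_mappings(record, decisions):
    """Translate a record using the documented mappings."""
    return {d.target: record[d.source] for d in decisions}

row = {"cust_no": "A-17", "dob": "04/05/1970"}
print(apply_mappings(row, decisions))
# When someone asks "on what basis?", the answer is on record:
print(decisions[0].basis)
```

The mapping function is trivial; the value is in `basis` and `depends_on`, which are exactly what simple column-to-column mappings throw away.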

James also covers some other decision management materials that you may find useful in designing, authoring, and evaluating information systems. (I started to say “semantic information systems” but all information systems have semantics, so that would be prepending an unnecessary noise word.)

Information Heterogeneity and Fusion

Thursday, May 12th, 2011

2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)

Important Dates:

Paper submission deadline: 25th July 2011
Notification of acceptance: 19th August 2011
Camera-ready version due: 12th September 2011
Workshop: 23rd or 27th October 2011

Datasets are also being made available. Just in case you can’t find any heterogeneous data lying around. 😉

Looks like a perfect venue for topic map papers. (Not to mention that a re-usable mapping between recommender systems looks like a commercial opportunity.)

From the website:

In recent years, increasing attention has been given to finding ways for combining, integrating and mediating heterogeneous sources of information for the purpose of providing better personalized services in many information seeking and e-commerce applications. Information heterogeneity can indeed be identified in any of the pillars of a recommender system: the modeling of user preferences, the description of resource contents, the modeling and exploitation of the context in which recommendations are made, and the characteristics of the suggested resource lists.

Almost all current recommender systems are designed for specific domains and applications, and thus usually try to make best use of a local user model, using a single kind of personal data, and without explicitly addressing the heterogeneity of the existing personal information that may be freely available (on social networks, homepages, etc.). Recognizing this limitation, among other issues: a) user models could be based on different types of explicit and implicit personal preferences, such as ratings, tags, textual reviews, records of views, queries, and purchases; b) recommended resources may belong to several domains and media, and may be described with multilingual metadata; c) context could be modeled and exploited in multi-dimensional feature spaces; d) and ranked recommendation lists could be diverse according to particular user preferences and resource attributes, oriented to groups of users, and driven by multiple user evaluation criteria.

The aim of HetRec workshop is to bring together students, faculty, researchers and professionals from both academia and industry who are interested in addressing any of the above forms of information heterogeneity and fusion in recommender systems. We would like to raise awareness of the potential of using multiple sources of information, and look for sharing expertise and suitable models and techniques.

Another dire need is for strong datasets, and one of our aims is to establish benchmarks and standard datasets on which the problems could be investigated. In this edition, we make available on-line datasets with heterogeneous information from several social systems. These datasets can be used by participants to experiment and evaluate their recommendation approaches, and be enriched with additional data, which may be published at the workshop website for future use.

12th IEEE International Conference on Information Reuse and Integration (IEEE IRI-2011)

Tuesday, March 8th, 2011

12th IEEE International Conference on Information Reuse and Integration (IEEE IRI-2011)

From the announcement:

Given the emerging global Information-centric IT landscape that has tremendous social and economic implications, effectively processing and integrating humongous volumes of information from diverse sources to enable effective decision making and knowledge generation have become one of the most significant challenges of current times. Information Reuse and Integration (IRI) seeks to maximize the reuse of information by creating simple, rich, and reusable knowledge representations and consequently explores strategies for integrating this knowledge into systems and applications. IRI plays a pivotal role in the capture, representation, maintenance, integration, validation, and extrapolation of information; and applies both information and knowledge for enhancing decision-making in various application domains.

This conference explores three major tracks: information reuse, information integration, and reusable systems. Information reuse explores theory and practice of optimizing representation; information integration focuses on innovative strategies and algorithms for applying integration approaches in novel domains; and reusable systems focus on developing and deploying models and corresponding processes that enable Information Reuse and Integration to play a pivotal role in enhancing decision-making processes in various application domains.

Important dates:

March 28, 2011 Submission of abstract (Recommended)
April 5, 2011 Paper submission deadline
May 14, 2011 Notification of acceptance
May 28, 2011 Camera-ready paper due
May 28, 2011 Presenting author registration due
June 30, 2011 Advance (discount) registration for general public and other co-author
July 15, 2011 Hotel reservation (special discount rate) closing date
August 3-5, 2011 Conference events

12th IEEE International Conference on Information Reuse and Integration (IEEE IRI-2011)

Tuesday, January 11th, 2011

12th IEEE International Conference on Information Reuse and Integration (IEEE IRI-2011)

From the announcement:

Given the emerging global Information-centric IT landscape that has tremendous social and economic implications, effectively processing and integrating humongous volumes of information from diverse sources to enable effective decision making and knowledge generation have become one of the most significant challenges of current times. Information Reuse and Integration (IRI) seeks to maximize the reuse of information by creating simple, rich, and reusable knowledge representations and consequently explores strategies for integrating this knowledge into systems and applications. IRI plays a pivotal role in the capture, representation, maintenance, integration, validation, and extrapolation of information; and applies both information and knowledge for enhancing decision-making in various application domains.

This conference explores three major tracks: information reuse, information integration, and reusable systems. Information reuse explores theory and practice of optimizing representation; information integration focuses on innovative strategies and algorithms for applying integration approaches in novel domains; and reusable systems focus on developing and deploying models and corresponding processes that enable Information Reuse and Integration to play a pivotal role in enhancing decision-making processes in various application domains.

All three tracks depend on subject identity, whether explicitly recognized or not. It would be nice to have topic map representatives at the conference.

Important Dates:

Paper submission deadline February 15, 2011

Notification of acceptance April 15, 2011

Camera-ready paper due May 1, 2011

Presenting author registration due May 1, 2011

Advance (discount) registration for general public and other co-author June 30, 2011

Hotel reservation (special discount rate) closing date July 15, 2011

Conference events August 3-5, 2011

Just picking at random from prior proceedings, I noticed:

Inconsistency: the good, the bad, and the ugly by Du Zhang from the 9th annual meeting.

Definitely a topic map sort of conference.