Archive for the ‘Knowledge Representation’ Category

Knowledge Map At The Washington Post (Rediscovery of HyperText)

Saturday, August 1st, 2015

How The Washington Post built — and will be building on — its “Knowledge Map” feature by Shan Wang.

From the post:

The Post is looking to create a database of “supplements” — categorized pieces of text and graphics that help give context around complicated news topics — and add it as a contextual layer across lots of different Post stories.

The Washington Post’s Knowledge Map aims to diminish that frustration by embedding context and background directly in a story. (We wrote about it briefly when it debuted earlier this month.) Highlighted links and buttons within the story allow readers to click on and then read brief overviews — called “supplements” — on the right-hand side of the same page, without having to leave it (currently the text and supplements are not tethered, so if you scroll away in the main story, there’s no easy way to jump back to the phrase or name you clicked on initially).

Knowledge Map sprouted a few months ago out of a design sprint (based on a five-day brainstorming method outlined by Google Ventures) that included the Post’s New York-based design and development team WPNYC and members of the data science team in the D.C. office, as well as engineers, designers, and other product people. After narrowing down a list of other promising projects, the team presented to the Post newsroom and to its engineering team an idea for providing readers with better summaries and context for the most complicated, long-evolving stories.

That idea of having context built into a story “really resonated” with colleagues, Sampsel said, so her team quickly created a proof-of-concept using an existing Post story, recruiting their first round of testers for the prototype via Craigslist. Because they had no prior data on what sort of key phrases or figures readers might want explained for any given story, the team relied on trial and error to settle on the right level of detail.

Not to take anything away from the Washington Post, but doesn’t that scenario sound a lot like HTML: <a> links with JavaScript “hover” content? Perhaps the content is a bit long for hover; a pop-up window on mouseover, then? Hold the context data locally for response-time reasons.

Has the potential of hypertext been so muted by advertising, graphics, interactivity and > 1 MB pages that it takes a “design sprint” to bring some of that potential back to the fore?

I’m very glad that:

That idea of having context built into a story “really resonated” with colleagues,

but it isn’t a new idea.

Perhaps the best way to move the Web forward at this point would be to re-read (or read) some of the early web conference proceedings.

Rediscover what the web was like before “Google-driven” was an accurate description of it.

Other suggestions?

Groups: knowledge spreadsheets for symbolic biocomputing [Semantic Objects]

Tuesday, September 17th, 2013

Groups: knowledge spreadsheets for symbolic biocomputing by Michael Travers, Suzanne M. Paley, Jeff Shrager, Timothy A. Holland and Peter D. Karp.

Abstract:

Knowledge spreadsheets (KSs) are a visual tool for interactive data analysis and exploration. They differ from traditional spreadsheets in that rather than being oriented toward numeric data, they work with symbolic knowledge representation structures and provide operations that take into account the semantics of the application domain. ‘Groups’ is an implementation of KSs within the Pathway Tools system. Groups allows Pathway Tools users to define a group of objects (e.g. groups of genes or metabolites) from a Pathway/Genome Database. Groups can be transformed (e.g. by transforming a metabolite group to the group of pathways in which those metabolites are substrates); combined through set operations; analysed (e.g. through enrichment analysis); and visualized (e.g. by painting onto a metabolic map diagram). Users of the Pathway Tools-based BioCyc.org website have made extensive use of Groups, and an informal survey of Groups users suggests that Groups has achieved the goal of allowing biologists themselves to perform some data manipulations that previously would have required the assistance of a programmer.

Database URL: BioCyc.org.

Not my area, so a biologist would have to comment on the substantive aspects of using these particular knowledge spreadsheets.

But there is much in this article that could be applied more broadly.

From the introduction:

A long-standing problem in computing is that of providing non-programmers with intuitive, yet powerful tools for manipulating and analysing sets of entities. For example, a number of bioinformatics database websites provide users with powerful tools for composing database queries, but once a user obtains the query results, they are largely on their own. What if a user wants to store the query results for future reference, or combine them with other query results, or transform the results, or share them with a colleague? Sets of entities of interest arise in other contexts for life scientists, such as the entities that are identified as significantly perturbed in a high-throughput experiment (e.g. a set of differentially occurring metabolites), or a set of genes of interest that emerge from an experimental investigation.

We observe that spreadsheets have become a dominant form of end-user programming and data analysis for scientists. Although traditional spreadsheets provide a compelling interaction model, and are excellent tools for the manipulation of the tables of numbers that are typical of accounting and data analysis problems, they are less easily used with the complex symbolic computations typical of symbolic biocomputing. For example, they cannot perform semantic transformations such as converting a gene list to the list of pathways the genes act in.

We coined the term knowledge spreadsheet (KS) to describe spreadsheets that are characterized by their ability to manipulate semantic objects and relationships instead of just numbers and strings. Both traditional spreadsheets and KSs represent data in tabular structures, but in a KS the contents of a cell will typically be an object from a knowledge base (KB) [such as a MetaCyc (1) frame or a URI entity from an RDF store]. Given that a column in a KS will typically contain objects of the same ontological type, a KS can offer high-level semantically knowledgeable operations on the data. For example, given a group with a column of metabolites, a semantic operation could create a parallel column in which each cell contained the reactions that produced that metabolite. Another difference between our implementation of KSs and traditional spreadsheets is that cells in our KSs can contain multiple values.
(…)
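To make the transform idea concrete, here is a minimal sketch of a KS-style column operation. The tiny knowledge base below is invented for illustration and is not drawn from MetaCyc; a real KS would pull its objects from a KB or RDF store.

```python
# A minimal sketch of a "knowledge spreadsheet" column transform.
# The metabolite/reaction names below are invented for illustration.

# Toy KB relation: which reactions produce each metabolite.
PRODUCED_BY = {
    "pyruvate": ["pyruvate-kinase-rxn", "alanine-transaminase-rxn"],
    "ATP": ["atp-synthase-rxn"],
}

def transform_column(column, relation):
    """Create a parallel column by following a semantic relation.

    Each input cell is a KB object; each output cell holds the
    (possibly multiple) related objects, mirroring a KS in which
    a single cell can contain more than one value.
    """
    return [relation.get(obj, []) for obj in column]

metabolites = ["pyruvate", "ATP"]
reactions = transform_column(metabolites, PRODUCED_BY)
```

Note that each output cell can hold several values, matching the multi-valued cells the authors describe.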

Can you think of any domain that would not benefit from better handling of “semantic objects”?

As you read the article closely, any number of ideas or techniques for manipulating “semantic objects” will come to mind.

Some principles of intelligent tutoring

Tuesday, February 12th, 2013

Some principles of intelligent tutoring by Stellan Ohlsson. (Instructional Science May 1986, Volume 14, Issue 3-4, pp 293-326)

Abstract:

Research on intelligent tutoring systems is discussed from the point of view of providing moment-by-moment adaptation of both content and form of instruction to the changing cognitive needs of the individual learner. The implications of this goal for cognitive diagnosis, subject matter analysis, teaching tactics, and teaching strategies are analyzed. The results of the analyses are stated in the form of principles about intelligent tutoring. A major conclusion is that a computer tutor, in order to provide adaptive instruction, must have a strategy which translates its tutorial goals into teaching actions, and that, as a consequence, research on teaching strategies is central to the construction of intelligent tutoring systems.

Be sure to notice the date: 1986, when you could write:

The computer offers the potential for adapting instruction to the student at a finer grain-level than the one which concerned earlier generations of educational researchers. First, instead of adapting to global traits such as learning style, the computer tutor can, in principle, be programmed to adapt to the student dynamically, during on-going instruction, at each moment in time providing the kind of instruction that will be most beneficial to the student at that time. Said differently, the computer tutor takes a longitudinal, rather than cross-sectional, perspective, focussing on the fluctuating cognitive needs of a single learner over time, rather than on stable inter-individual differences. Second, and even more important, instead of adapting to content-free characteristics of the learner such as learning rate, the computer can, in principle, be programmed to adapt both the content and the form of instruction to the student’s understanding of the subject matter. The computer can be programmed, or so we hope, to generate exactly that question, explanation, example, counter-example, practice problem, illustration, activity, or demonstration which will be most helpful to the learner. It is the task of providing dynamic adaptation of content and form which is the challenge and the promise of computerized instruction*

That was written decades before we became habituated to users adapting to the interface, rather than the other way around.
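The moment-by-moment adaptation Ohlsson describes can be sketched with a toy student model that is updated after every answer and used to pick the next topic to tutor. The topic names, scores, and update rule below are invented for illustration, not taken from the paper.

```python
# A minimal sketch of dynamic adaptation: update a per-topic student
# model after each answer, then tutor the currently weakest topic.
# Topics and mastery scores here are invented for illustration.

def update_model(model, topic, correct, step=0.25):
    """Nudge the mastery estimate for a topic after one answer."""
    delta = step if correct else -step
    model[topic] = min(1.0, max(0.0, model[topic] + delta))
    return model

def next_topic(model):
    """Adapt moment-by-moment: pick the weakest topic right now."""
    return min(model, key=model.get)

model = {"fractions": 0.5, "decimals": 0.5}
model = update_model(model, "fractions", correct=True)
model = update_model(model, "decimals", correct=False)
```

However crude, this is the longitudinal perspective in miniature: instruction tracks one learner’s fluctuating state rather than stable inter-individual differences.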

More on point, the quote from Ohlsson, the Principle of Non-Equifinality of Learning, was preceded by:

But there are no canonical representations of knowledge. Any knowledge domain can be seen from several different points of view, each view showing a different structure, a different set of parts, differently related. This claim, however broad and blunt – almost impolite – it may appear when laid out in print, is I believe, incontrovertible. In fact, the evidence for it is so plentiful that we do not notice it, like the fish in the sea who never thinks about water. For instance, empirical studies of expertise regularly show that human experts differ in their problem solutions (e.g., Prietula and Marchak, 1985); at the other end of the scale, studies of young children tend to show that they invent a variety of strategies even for simple tasks, (e.g., Young, 1976; Svenson and Hedenborg, 1980). As a second instance, consider rational analyses of thoroughly codified knowledge domains such as the arithmetic of rational numbers. The traditional mathematical treatment by Thurstone (1956) is hard to relate to the didactic analysis by Steiner (1969), which, in turn, does not seem to have much in common with the informal, but probing, analyses by Kieren (1976, 1980) – and yet, they are all experts trying to express the meaning of, for instance, “two-thirds”. In short, the process of acquiring a particular subject matter does not converge on a particular representation of that subject matter. This fact has such important implications for instruction that it should be stated as a principle.

The first two sentences capture the essence of topic maps as well as any I have ever seen:

But there are no canonical representations of knowledge. Any knowledge domain can be seen from several different points of view, each view showing a different structure, a different set of parts, differently related.
(emphasis added)

Single knowledge representations, such as those in bank accounting systems, can be very useful. But when multiple banks with different accounting systems try to roll knowledge up to the Federal Reserve, different (not better) representations may be required.

It could even require representations that support robust mappings between different representations.
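A sketch of such a mapping between representations, with invented field names and records (not any real bank’s schema): each local vocabulary is mapped to a shared set of subject keys, so records can be merged on subject identity rather than on field labels.

```python
# A minimal sketch of mapping between two representations of the same
# kind of fact. Field names and records are invented for illustration.

# Bank A and Bank B use different vocabularies for the same subjects.
bank_a = {"acct_no": "12345", "bal": 100.0}
bank_b = {"accountNumber": "67890", "currentBalance": 25.0}

# Per-source mappings from local field names to shared subject keys.
MAPPINGS = {
    "bank_a": {"acct_no": "account-number", "bal": "balance"},
    "bank_b": {"accountNumber": "account-number", "currentBalance": "balance"},
}

def normalize(record, source):
    """Re-express a record in the shared vocabulary."""
    mapping = MAPPINGS[source]
    return {mapping[key]: value for key, value in record.items()}

rolled_up = [normalize(bank_a, "bank_a"), normalize(bank_b, "bank_b")]
total = sum(record["balance"] for record in rolled_up)
```

Neither bank’s representation is “better”; the roll-up simply needs a representation that both can be mapped into.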

What do you think?

Principle of Non-Equifinality of Learning

Tuesday, February 12th, 2013

In “Educational Concept Maps: a Knowledge Based Aid for Instructional Design.” by Giovanni Adorni, Mauro Coccoli, Giuliano Vivanet (DMS 2011: 234-237), you will find the following passage:

…one of the most relevant problems concerns the fact that there are no canonical representations of knowledge structures and that a knowledge domain can be structured in different ways, starting from various points of view. As Ohlsson [2] highlighted, this fact has such relevant implications for authoring systems, that it should be stated as the “Principle of Non-Equifinality of Learning”. According to this, “The state of knowing the subject matter does not correspond to a single well-defined cognitive state. The target knowledge can always be represented in different ways, from different perspectives; hence, the process of acquiring the subject matter have many different, equally valid, end states”. Therefore it is necessary to re-think learning models and environments in order to enable users to better build represent and share their knowledge. (emphasis in original)

Nominees for representing “target knowledge…in different ways, from different perspectives”?

In the paper, the authors detail their use of topic maps, XTM topic maps in particular, and the Vizigator for visualization of their topic maps.

Sorry, I was so excited about the quote I forgot to post the article abstract:

This paper discusses a knowledge-based model for the design and development of units of learning and teaching aids. The idea behind this model originates from both the analysis of the open issues in instructional authoring systems, and the lack of a well-defined process able to merge pedagogical strategies with systems for the knowledge organization of the domain. In particular, it is presented the Educational Concept Map (ECM): a, pedagogically founded (derived from instructional design theories), abstract annotation system that was developed with the aim of guaranteeing the reusability of both teaching materials and knowledge structures. By means of ECMs, it is possible to design lessons and/or learning paths from an ontological structure characterized by the integration of hierarchical and associative relationships among the educational objectives. The paper also discusses how the ECMs can be implemented by means of the ISO/IEC 13250 Topic Maps standard. Based on the same model, it is also considered the possibility of visualizing, through a graphical model, and navigate, through an ontological browser, the knowledge structure and the relevant resources associated to them.

BTW, you can find the paper in the DMS 2011 Proceedings. Warning: complete proceedings, 359 pages, 26.3 MB PDF file. Might not want to try it on your cellphone.

And yes, this is the paper that I found this morning that triggered a number of posts as I ran it to ground. 😉 At least I will have sign-posts for some of these places next time.

Molecules from scratch without the fiendish physics

Sunday, February 12th, 2012

Molecules from scratch without the fiendish physics by Lisa Grossman.

From the post:

But because the equation increases in complexity as more electrons and protons are introduced, exact solutions only exist for the simplest systems: the hydrogen atom, composed of one electron and one proton, and the hydrogen molecule, which has two electrons and two protons.

This complexity rules out the possibility of exactly predicting the properties of large molecules that might be useful for engineering or medicine. “It’s out of the question to solve the Schrödinger equation to arbitrary precision for, say, aspirin,” says von Lilienfeld.

So he and his colleagues bypassed the fiendish equation entirely and turned instead to a computer-science technique.

Machine learning is already widely used to find patterns in large data sets with complicated underlying rules, including stock market analysis, ecology and Amazon’s personalised book recommendations. An algorithm is fed examples (other shoppers who bought the book you’re looking at, for instance) and the computer uses them to predict an outcome (other books you might like). “In the same way, we learn from molecules and use them as previous examples to predict properties of new molecules,” says von Lilienfeld.

His team focused on a basic property: the energy tied up in all the bonds holding a molecule together, the atomisation energy. The team built a database of 7165 molecules with known atomisation energies and structures. The computer used 1000 of these to identify structural features that could predict the atomisation energies.

When the researchers tested the resulting algorithm on the remaining 6165 molecules, it produced atomisation energies within 1 per cent of the true value. That is comparable to the accuracy of mathematical approximations of the Schrödinger equation, which work but take longer to calculate as molecules get bigger (Physical Review Letters, DOI: 10.1103/PhysRevLett.108.058301). (emphasis added)
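The learn-from-examples pattern described above is easy to sketch: fit on molecules with known atomisation energies, then predict energies for unseen molecules. The sketch below uses a 1-nearest-neighbour predictor over invented toy descriptors; the actual study used kernel ridge regression over Coulomb-matrix descriptors of 7165 real molecules.

```python
# A sketch of the learn-from-examples pattern: train on molecules with
# known atomisation energies, predict the energy of a new molecule.
# Descriptors and energies below are invented toy numbers.
import math

def nearest_neighbor_predict(train, query):
    """Return the energy of the training molecule closest to `query`."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, energy = min(train, key=lambda item: dist(item[0], query))
    return energy

# (structural descriptor, atomisation energy) pairs for "known" molecules.
train = [
    ((1.0, 0.0), -10.0),
    ((0.0, 1.0), -20.0),
    ((1.0, 1.0), -30.0),
]

# Predict for an unseen molecule whose descriptor resembles the first.
prediction = nearest_neighbor_predict(train, (0.9, 0.1))
```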

One way to look at this research is to say we have three avenues to discovering the properties of molecules:

  1. Formal logic – but that would require far more knowledge than we have at the moment
  2. Schrödinger equation – but that may be intractable for some molecules
  3. Knowledge-based approach – may be less precise than 1 & 2, but works now

A knowledge-based approach allows us to make progress now. Topic maps can be annotated with other methods, such as math or research results, up to and including formal logic.

The biggest difference with topic maps is that the information you wish to record or act upon is not restricted ahead of time.

SIGKDD 2011 Conference

Tuesday, September 6th, 2011

A pair of posts from Ryan Rosario on the SIGKDD 2011 Conference.

Day 1 (Graph Mining and David Blei/Topic Models)

Tough sledding on Probabilistic Topic Models but definitely worth the effort to follow.

Days 2/3/4 Summary

Useful summaries and pointers to many additional resources.

If you attended SIGKDD 2011, do you have pointers to other reviews of the conference or other resources?

I added a category for SIGKDD.

Knowledge Representation and Reasoning with Graph Databases

Wednesday, April 13th, 2011

Knowledge Representation and Reasoning with Graph Databases

Just in case you aren’t following Marko A. Rodriguez:

A graph database and its ecosystem of technologies can yield elegant, efficient solutions to problems in knowledge representation and reasoning. To get a taste of this argument, we must first understand what a graph is. A graph is a data structure. There are numerous types of graph data structures, but for the purpose of this post, we will focus on a type that has come to be known as a property graph. A property graph denotes vertices (nodes, dots) and edges (arcs, lines). Edges in a property graph are directed and labeled/typed (e.g. “marko knows peter”). Both vertices and edges (known generally as elements) can have any number of key/value pairs associated with them. These key/value pairs are called properties. From this foundational structure, a suite of questions can be answered and problems solved.
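A property graph of the kind Rodriguez describes can be sketched with plain data structures. The class below is an illustrative invention (not the Blueprints/TinkerPop API), built around the quote’s own example, “marko knows peter”.

```python
# A minimal sketch of a property graph: directed, labeled edges, with
# key/value properties on both vertices and edges.

class PropertyGraph:
    def __init__(self):
        self.vertices = {}   # vertex id -> properties dict
        self.edges = []      # (out id, label, in id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, out_id, label, in_id, **props):
        self.edges.append((out_id, label, in_id, props))

    def out_neighbors(self, vid, label):
        """Follow outgoing edges with the given label."""
        return [i for o, l, i, _ in self.edges if o == vid and l == label]

g = PropertyGraph()
g.add_vertex("marko", type="person")
g.add_vertex("peter", type="person")
g.add_edge("marko", "knows", "peter", since=2005)  # "marko knows peter"
```

From this foundational structure, traversals (chains of `out_neighbors` calls) begin to answer the kinds of reasoning questions the post goes on to discuss.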

See the post for the details.

Deep Knowledge Representation Challenge Workshop

Friday, March 25th, 2011

Deep Knowledge Representation Challenge Workshop

From the website:

This workshop will provide a forum to discuss difficult problems in representing complex knowledge needed to support deep reasoning, question answering, explanation and justification systems. The goals of the workshop are: (1) to create a comprehensive set of knowledge representation (KR) challenge problems suitable for a recurring competition, and (2) begin to develop KR techniques to meet those challenges. A set of difficult to represent sentences from a biology textbook are included as an initial set of KR challenges. Cash prizes will be awarded for the most creative and comprehensive solutions to the selected challenges.

The workshop will be a highly interactive event with brief presentations of problems and solutions followed by group discussion. To submit a paper to the workshops, the participants should select a subset of the challenge sentences and present approaches for representing them along with an approach to use that representation in a problem solving task (question answering or decision support). Participants are free to add to the list of challenge sentences, for example, from other chapters of the textbook, or within the spirit of their own projects and experience but should base their suggestions on concrete examples, if possible, from real applications.

Important Dates:

  • 7 May: Submissions due
  • 16 May: Notification of participants
  • 13 June: Final camera ready material for workshop web site and all material for discussion
  • 25 June: Initial workshop, with report back and further discussion, during KCAP. Details to be announced.

I mention this because deep knowledge, in the sense of identifying subjects and navigating to them, is part and parcel of topic maps.

It seems to me that any “…deep reasoning, question answering, explanation and justification system.” is going to succeed or fail based on its identification of subjects.

Or to put it differently, it is difficult to reason effectively if you don’t know what you are talking about. (I could mention several examples from recent news casts but I will forego the opportunity.)