In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a ‘cobordism’: a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and ‘quantum topology’. But this was just the beginning: similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is an extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of ‘closed symmetric monoidal category’. We assume no prior knowledge of category theory, proof theory or computer science.

While this is an “expository” paper, at some 66 pages (sans the references), you had best set aside some of your best thinking/reading time to benefit from it.

The set-theoretic notion of truth proposed by Tarski is the basis of most work in machine semantics and probably has its roots in the work and influence of Aristotle. We take it for granted that the world can be described, not in shades of grey, but in terms of statements and propositions that are either true or false – and it seems most of western science stands on the same principle. This assumption at the core of our training as scientists should be questioned, because it stands in direct opposition to our human experience. Is there any statement that can be made that can actually be reduced to true or false? Only, it seems, in the artificial human-created realms of mathematics, games, and logic. We have been investigating a different mode of truth, inspired by results in Crowdsourcing, which allows for a highly dimensional notion of semantic interpretation that makes true and false look like a childish simplifying assumption.

Chris was the keynote speaker at the Third International Workshop on Linked Data for Information Extraction (LD4IE2015). (Proceedings)

I wasn’t able to find a video for that presentation but I did find “Chris Welty formerly IBM Watson Team – Cognitive Computing GDG North Jersey at MSU” from about ten months ago.

Great presentation on “cognitive computing.”

Enjoy!

Posted in Logic, Semantics | Comments Off on Goodbye to True: Advancing semantics beyond the black and white

The Logical Language Group (LLG) began developing Lojban in 1987. The LLG sought to realize Loglan’s purposes, and further improve the language by making it more usable and freely available (as indicated by its official full English title, “Lojban: A Realization of Loglan”). After a long initial period of debating and testing, the baseline was completed in 1997, and published as The Complete Lojban Language. In an interview in 2010 with the New York Times, Arika Okrent, the author of In the Land of Invented Languages, stated: “The constructed language with the most complete grammar is probably Lojban—a language created to reflect the principles of logic.”

Lojban was developed to be a worldlang; to ensure that the gismu (root words) of the language sound familiar to people from diverse linguistic backgrounds, they were based on the six most widely spoken languages as of 1987—Mandarin, English, Hindi, Spanish, Russian, and Arabic. Lojban has also taken components from other constructed languages, notably the set of evidential indicators from Láadan.

I mention this just in case someone proposes to you that a universal language would increase communication and decrease ambiguity, resulting in better, more accurate communication in all fields.

Yes, yes it would. And several already exist. Including Lojban. Their language can take its place alongside other universal languages, i.e., it can increase the number of languages that make up the present matrix of semantic confusion.

In case you are wondering: what part of “new languages increase the potential for semantic confusion” seems unclear?

Recursive neural network models and their accompanying vector representations for words have seen success in an array of increasingly semantically sophisticated tasks, but almost nothing is known about their ability to accurately capture the aspects of linguistic meaning that are necessary for interpretation or reasoning. To evaluate this, I train a recursive model on a new corpus of constructed examples of logical reasoning in short sentences, like the inference of “some animal walks” from “some dog walks” or “some cat walks,” given that dogs and cats are animals. This model learns representations that generalize well to new types of reasoning pattern in all but a few cases, a result which is promising for the ability of learned representation models to capture logical reasoning.

From the introduction:

Natural language inference (NLI), the ability to reason about the truth of a statement on the basis of some premise, is among the clearest examples of a task that requires comprehensive and accurate natural language understanding [6].

I stumbled over that line in Samuel’s introduction because it implies, at least to me, that there is a notion of truth that resides outside of ourselves as speakers and hearers.

Take his first example:

Consider the statement all dogs bark. From this, one can infer quite a number of other things. One can replace the first argument of all (the first of the two predicates following it, here dogs) with any more specific category that contains only dogs and get a valid inference: all puppies bark; all collies bark.

Contrast that with one of the premises that starts my day:

All governmental statements are lies of omission or commission.

Yet, firmly holding that as a “fact” of the world, I write to government officials, post ranty blog posts about government policies, urge others to attempt to persuade government to take certain positions.

It’s not that I think Samuel is incorrect about monotonicity for “logical reasoning” but monotonicity is a far cry from how people reason day to day.
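For reference, the monotonicity pattern in the quoted example can be written down by hand in a few lines. This is only a sketch in Python: the taxonomy, the triple format, and the names `HYPERNYM`, `is_subcategory` and `entails` are my own assumptions for illustration, and it is a hard-coded rule, not the learned model the paper trains.

```python
# Hypothetical toy taxonomy: each category maps to the broader category containing it.
HYPERNYM = {"puppy": "dog", "collie": "dog", "dog": "animal", "cat": "animal"}


def is_subcategory(narrow, broad):
    """True if `narrow` equals `broad` or sits somewhere below it in the taxonomy."""
    while narrow is not None:
        if narrow == broad:
            return True
        narrow = HYPERNYM.get(narrow)
    return False


def entails(premise, hypothesis):
    """Check entailment between (quantifier, noun, verb) triples.

    "all" is downward monotone in its first argument:
        all dogs bark  =>  all puppies bark
    "some" is upward monotone in its first argument:
        some dog walks  =>  some animal walks
    """
    q1, n1, v1 = premise
    q2, n2, v2 = hypothesis
    if q1 != q2 or v1 != v2:
        return False
    if q1 == "all":
        return is_subcategory(n2, n1)  # hypothesis noun must be more specific
    if q1 == "some":
        return is_subcategory(n1, n2)  # hypothesis noun must be more general
    return False
```

Note what the sketch cannot do: nothing in it lets me hold “all governmental statements are lies” and still write to my representative.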

Rather than creating “reasoning” that is such a departure from human inference, why not train a deep learning system to “reason” by exposing it to the same inputs and decisions made by human decision makers? Imitation doesn’t require understanding of human “reasoning,” just the ability to engage in the same behavior under similar circumstances.

That would reframe Samuel’s question to read: Can recursive neural tensor networks learn human reasoning?

An important result in computer science and type theory is that a type system corresponds to a particular logic system.

How does this work? The basic idea is that of the Curry-Howard Correspondence. A type is interpreted as a proposition, and a value is interpreted as a proof of the proposition corresponding to its type. Most standard logical connectives can be derived from this idea: for example, the values of the pair type (A, B) are pairs of values of types A and B, meaning they’re pairs of proofs of A and B, which means that (A, B) represents the logical conjunction “A && B”. Similarly, logical disjunction (“A || B”) corresponds to what’s called a “tagged union” type: a value (proof) of Either A B is either a value (proof) of A or a value (proof) of B.

This might be a lot to take in, so let’s take a few moments for concrete perspective.

Types like Int and String are propositions – you can think of simple types like these as just stating that “an Int exists” or “a String exists”. 1 is a proof of Int, and "hands" is a proof of String. (Int, String) is a simple tuple type, stating that “there exists an Int and there exists a String”. (1, "hands") is a proof of (Int, String). Finally, the Either type is a bit more mysterious if you aren’t familiar with Haskell, but the type Either a b can contain values of type a tagged as the “left” side of an Either or values of type b tagged as the “right” side of an Either. So Either Int String means “either there exists an Int or there exists a String”, and it can be proved by either Left 1 or Right "hands". The tags ensure that you don’t lose any information if the two types are the same: Either Int Int can be proved by Left 1 or Right 1, which can be distinguished from each other by their tags.
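A rough Python rendering of the quoted examples may help readers who do not use Haskell. It is a sketch under stated assumptions: `Left`, `Right`, `Either`, `and_elim_left`, `or_intro_left` and `or_elim` are names I made up here, and Python only records the type hints; a separate checker such as mypy would be needed to actually enforce them, so the "proofs" are weaker than in a statically typed language.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Tuple, TypeVar, Union

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Left(Generic[A]):
    """A proof of the left side of a disjunction, tagged as such."""
    value: A


@dataclass(frozen=True)
class Right(Generic[B]):
    """A proof of the right side of a disjunction, tagged as such."""
    value: B


# Tagged union: a value is either a Left carrying an A or a Right carrying a B.
Either = Union[Left[A], Right[B]]


def and_elim_left(proof: Tuple[A, B]) -> A:
    """'A && B implies A': from a pair of proofs, keep the first."""
    return proof[0]


def or_intro_left(proof: A) -> "Either[A, B]":
    """'A implies A || B': tag a proof of A as the left case."""
    return Left(proof)


def or_elim(proof: "Either[A, B]",
            on_left: Callable[[A], C],
            on_right: Callable[[B], C]) -> C:
    """Case analysis: if C follows from A and from B, it follows from A || B."""
    if isinstance(proof, Left):
        return on_left(proof.value)
    return on_right(proof.value)


pair_proof = (1, "hands")   # a "proof" of (Int, String)
left_one = Left(1)          # one "proof" of Either Int Int
right_one = Right(1)        # a different "proof" of Either Int Int
```

The last two lines show the point about tags: `Left(1)` and `Right(1)` carry the same value but remain distinguishable.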

…

Heavy sledding but should very much be on your reading list.

It has gems like:

truth is useless for computation and proofs are not

I would have far fewer objections to some logic/ontology discussions if they limited their claims to computation.

People are free to accept or reject any result of computation. Depends on their comparison of the result to their perception of the world.

Case in point, the five year old who could not board a plane because they shared a name with someone on the no-fly list.

One person, a dull TSA agent, could not see beyond the result of a calculation on the screen.

Everyone else could see a five year old who, while cranky, wasn’t on the no-fly list.

Our lives now run on software. Bugs are becoming not just annoyances for software developers, but sources of potentially catastrophic failures. A careless programmer mistake could leak our social security numbers or crash our cars. While testing provides some assurance, it is difficult to test all possibilities in complex systems–and practically impossible in concurrent systems. For the critical systems in our lives, we should demand mathematical guarantees that the software behaves the way the programmer expected.

A single paper influenced much of the work towards providing these mathematical guarantees. C.A.R. Hoare’s seminal 1969 paper “An Axiomatic Basis for Computer Programming” introduces a method of reasoning about program correctness now known as Hoare logic. In this paper, Hoare provides a technique that 1) allows programmers to express program properties and 2) allows these properties to be automatically checked. These ideas have influenced decades of research in automated reasoning about software correctness.

In this talk, I will describe the main ideas in Hoare logic, as well as the impact of these ideas. I will talk about my personal experience using Hoare logic to verify memory guarantees in an operating system. I will also discuss takeaway lessons for working programmers.
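To make the idea of a Hoare triple {P} C {Q} concrete, here is a sketch in Python of a small integer-division routine with its precondition, loop invariant and postcondition written as assertions. The function name `divide` is my own choice, and the caveat matters: assertions only check the properties at runtime for the inputs you happen to try, whereas Hoare logic is about proving them for all inputs.

```python
def divide(x: int, y: int):
    """Return (q, r) with x == q * y + r and 0 <= r < y, by repeated subtraction."""
    # Precondition P: non-negative dividend, positive divisor.
    assert x >= 0 and y > 0

    q, r = 0, x
    while r >= y:
        # Loop invariant: holds before every iteration of the loop body.
        assert x == q * y + r and r >= 0
        r -= y
        q += 1

    # Postcondition Q: invariant plus the negated loop condition.
    assert x == q * y + r and 0 <= r < y
    return q, r
```

The postcondition is exactly the invariant combined with the fact that the loop has stopped, which is the shape of reasoning the while rule in Hoare logic captures.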

The slides are impressive enough! I will be updating this post to include a pointer to the video when posted.

How important is correctness of merging in topic maps?

If you are the unfortunate individual whose personal information includes an incorrectly merged detail describing you as a terrorist, correctness of merging may be very important, at least to you.

The same would be true for information systems containing arrest warrants, bad credit information, incorrect job histories, education records, and banking records, just to mention a few.

What guarantees can you provide clients concerning merging of data in your topic maps?

Or is that the client and/or victim’s problem?

Posted in Logic, Programming, Proof Theory | Comments Off on Jean Yang on An Axiomatic Basis for Computer Programming

Despite advancements in graphics, physics, and artificial intelligence, modern video games are still lacking in believable dialogue generation. The more complex and interactive stories in modern games may allow the player to experience different paths in dialogue trees, but such trees are still required to be manually created by authors. Recently, there has been research on methods of creating emergent believable behaviour, but these are lacking true dialogue construction capabilities. Because the mapping of natural language to meaningful computational representations (logical forms) is a difficult problem, an important first step may be to develop a means of representing in-game dialogue as logical expressions. This thesis introduces and describes a system for representing dialogue as first-order logic predicates, demonstrates its equivalence with current dialogue authoring techniques, and shows how this representation is more dynamic and flexible.

If you remember the Knights and Knaves from Labyrinth or other sources, you will find this an enjoyable read. After solving the puzzle, Kaylen’s discussion shows that a robust solution requires information hiding and the capacity for higher-order questioning.
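If the puzzle has slipped your mind, the two-guard version (one always lies, one always tells the truth, one door is safe) can be brute-forced in a few lines. This is my own sketch in Python, not the thesis's representation; the function names are assumptions for illustration. It checks the nested, "higher-order" question: ask either guard what the other guard would say about a door.

```python
from itertools import product


def answer(is_liar: bool, truth: bool) -> bool:
    """What a guard says when the true answer to a yes/no question is `truth`."""
    return (not truth) if is_liar else truth


def ask_about_other(asked_is_liar: bool, door_is_safe: bool) -> bool:
    """Reply to: 'Would the other guard say this door is safe?'"""
    other_is_liar = not asked_is_liar           # exactly one guard lies
    other_would_say = answer(other_is_liar, door_is_safe)
    return answer(asked_is_liar, other_would_say)


# Enumerate every combination of who was asked and whether the door is safe.
results = {
    (asked_is_liar, door_is_safe): ask_about_other(asked_is_liar, door_is_safe)
    for asked_is_liar, door_is_safe in product((False, True), repeat=2)
}
```

In every case the reply is the opposite of the truth about the door, so you take the other door without ever learning which guard lies.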

Clojure fans will appreciate the use of clojure.core.logic.

This tutorial will guide you through the magic and fun of combining relational programming (also known as logic programming) with functional programming. This tutorial does not assume that you have any knowledge of Lisp, Clojure, Java, or even functional programming. The only thing this tutorial assumes is that you are not afraid of using the command line and you have used at least one programming language before in your life.

A fairly short tutorial but one where “relational” in the title is likely to result in confusion. Here “relational” is meant in the sense of “logical.”

Powerful insights arise from linking two fields of study previously thought separate. Examples include Descartes’s coordinates, which links geometry to algebra, Planck’s Quantum Theory, which links particles to waves, and Shannon’s Information Theory, which links thermodynamics to communication. Such a synthesis is offered by the principle of Propositions as Types, which links logic to computation. At first sight it appears to be a simple coincidence—almost a pun—but it turns out to be remarkably robust, inspiring the design of automated proof assistants and programming languages, and continuing to influence the forefronts of computing.

Propositions as Types is a notion with many names and many origins. It is closely related to the BHK Interpretation, a view of logic developed by the intuitionists Brouwer, Heyting, and Kolmogorov in the 1930s. It is often referred to as the Curry-Howard Isomorphism, referring to a correspondence observed by Curry in 1934 and refined by Howard in 1969 (though not published until 1980, in a Festschrift dedicated to Curry). Others draw attention to significant contributions from de Bruijn’s Automath and Martin-Löf’s Type Theory in the 1970s. Many variant names appear in the literature, including Formulae as Types, Curry-Howard-de Bruijn Correspondence, Brouwer’s Dictum, and others.

Propositions as Types is a notion with depth. It describes a correspondence between a given logic and a given programming language, for instance, between Gentzen’s intuitionistic natural deduction (a logic) and Church’s simply-typed lambda calculus (which may be viewed as a programming language). At the surface, it says that for each proposition in the logic there is a corresponding type in the programming language—and vice versa…

Important work even if it is very heavy sledding!

BTW, Wadler mentions two textbook treatments of the subject:

M. H. Sørensen and P. Urzyczyn. Lectures on the Curry-Howard isomorphism. Elsevier, 2006. Amazon has it listed for $146.33.

S. Thompson. Type Theory and Functional Programming. Addison-Wesley, 1991. Better luck here, out of print and posted online by the author: https://www.cs.kent.ac.uk/people/staff/sjt/TTFP/. Errata page was last updated October 2013.

I just glanced at 4.10 Equality and 5.1 Assumptions – 5.2 Naming and abbreviations in Thompson and it promises to be an interesting read!

The most widely known of these four siblings is operator number 11. This operator is called the “material conditional”. It is used to test if a statement fits the logical pattern “P implies Q”. It is equivalent to !P || Q by the material implication.

I only know one language that implements this operation: VBScript.
…

The post has a good example of why material conditional is useful.

Will your next language have a material conditional operator?
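In the meantime, the equivalence with !P || Q makes it a one-liner in most languages. A sketch in Python, where `implies` and `TRUTH_TABLE` are names I picked for illustration:

```python
def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


# Full truth table, keyed by (p, q).
TRUTH_TABLE = {
    (p, q): implies(p, q)
    for p in (False, True)
    for q in (False, True)
}
```

The row people tend to trip over is the false antecedent: `implies(False, q)` is true whatever `q` is.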

Reasoned Programming by Krysia Broda, Susan Eisenbach, Hessam Khoshnevisan, and Steve Vickers.

From the preface:

Can we ever be sure that our computer programs will work reliably? One approach to this problem is to attempt a mathematical proof of reliability, and this has led to the idea of Formal Methods: if you have a formal, logical specification of the properties meant by `working reliably’, then perhaps you can give a formal mathematical proof that the program (presented as a formal text) satisfies them.

Of course, this is by no means trivial. Before we can even get started on a formal proof we must turn the informal ideas intended by `working reliably’ into a formal specification, and we also need a formal account of what it means to say that a program satisfies a specification (this amounts to a semantics of the programming language, an account of the meaning of programs). None the less, Formal Methods are now routinely practised by a number of software producers.

However, a tremendous overhead derives from the stress on formality, that is to say, working by the manipulation of symbolic forms. A formal mathematical proof is a very different beast from the kind of proof that you will see in mathematical text books. It includes the minutest possible detail, both in proof steps and in background assumptions, and is not for human consumption – sophisticated software support tools are needed to handle it. For this reason, Formal Methods are often considered justifiable only in `safety critical’ systems, for which reliability is an overriding priority.

The aim of this book is to present informal formal methods, showing the benefits of the approach even without strict formality: although we use logic as a notation for the specifications, we rely on informal semantics – a programmer’s ordinary intuitions about what small, linear stretches of code actually do – and we use proofs to the level of rigour of ordinary mathematics.

A bit dated (1994). It teaches Miranda, a functional programming language, and uses it to reason about imperative programming.

Even thinking about a “specification” isn’t universally admired these days but the authors cover that point when they say:

This `precise account of the users’ needs and wants’ is called a specification, and the crucial point to understand is that it is expressing something quite different from the code, that is, the users’ interests instead of the computer’s. If the specification and code end up saying the same thing in different ways – and this can easily happen if you think too much from the computer’s point of view when you specify – then doing both of them is largely a waste of time. (emphasis added, Chapter 1, Section 1.3)

That’s blunt enough. 😉

You can pick up Miranda (homesite) or translate the examples into a more recent functional language; Clojure comes to mind.

This book is aimed at newcomers to the field of logical reasoning, particularly those who, to borrow a phrase from Pascal, are so made that they understand best through visuals. I have selected a small set of common errors in reasoning and visualized them using memorable illustrations that are supplemented with lots of examples. The hope is that the reader will learn from these pages some of the most common pitfalls in arguments and be able to identify and avoid them in practice.

A delightfully written and illustrated book on bad arguments.

Computational linguistics employs mathematical models to represent morphological, syntactic, and semantic structures in natural languages. The course introduces several such models while insisting on their underlying logical structure and algorithmics. Quite often these models will be related to mathematical objects studied in other MPRI courses, for which this course provides an original set of applications and problems.

The course is not a substitute for a full cursus in computational linguistics; it rather aims at providing students with a rigorous formal background in the spirit of MPRI. Most of the emphasis is put on the symbolic treatment of words, sentences, and discourse. Several fields within computational linguistics are not covered, prominently speech processing and pragmatics. Machine learning techniques are only very sparsely treated; for instance we focus on the mathematical objects obtained through statistical and corpus-based methods (i.e. weighted automata and grammars) and the associated algorithms, rather than on automated learning techniques (which is the subject of course 1.30).

Abundant supplemental materials, slides, notes, further references.

In particular you may like Notes on Computational Aspects of Syntax by Sylvain Schmitz, which cover the first part of Logical and Computational Structures for Linguistic Modeling.

As with any model, there are trade-offs and assumptions built into nearly every choice.

Knowing where to look for those trade-offs and assumptions will give you a response to: “Well, but the model shows that….”

The publishing platform I set up for my forecasting book has now been extended to cover more books and greater functionality. Check it out at www.otexts.org.

This talk will sketch some connections at the foundations of semantics (of programming languages, logics, formal systems in general). In various degrees of abbreviation, we will present Galois Connections, Lawvere Theories, adjoint functors and their relationship to syntax and semantics, and the core notion behind abstract interpretation. At each step we’ll draw connections, trying to show why these are good tools to think with even as we’re solving real world problems and building tools and libraries others will find simple and elegant to use.

In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology: namely, a linear operator behaves very much like a “cobordism”. Similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is an extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of “closed symmetric monoidal category”. We assume no prior knowledge of category theory, proof theory or computer science.

The authors set out to create a Rosetta stone for the areas of physics, topology, logic and computation on the subject of categories.

Seventy (70)+ pages of heavy reading but worth the effort (at least so far)!

I believe there are two other emerging trends in programming worth watching that will impact the data world.

Logic Programming, like FP, is actually not new at all, but it is seeing a resurgence of interest, especially in the Clojure community. Rules engines, like Drools, are an example category of logic programming that has been in use for a long time.

…

We’re on the verge of moving to the next level, probabilistic programming languages and systems that make it easier to build probabilistic models, where the modeling concepts are promoted to first-class primitives in new languages, with underlying runtimes that do the hard work of inferring answers, similar to the way that logic programming languages work already. The ultimate goal is to enable end users with limited programming skills, like domain experts, to build effective probabilistic models, without requiring the assistance of Ph.D.-level machine learning experts, much the way that SQL is widely used today.

The description of the meetup is probably the clearest introduction to the paper:

Developers are increasingly choosing datastores that sacrifice strong consistency guarantees in exchange for improved performance and availability. Unfortunately, writing reliable distributed programs without the benefit of strong consistency can be very challenging.

….

In this talk, I’ll discuss work from our group at UC Berkeley that aims to make it easier to write distributed programs without relying on strong consistency. Bloom is a declarative programming language for distributed computing, while CALM is an analysis technique that identifies programs that are guaranteed to be eventually consistent. I’ll then discuss our recent work on extending CALM to support a broader range of programs, drawing upon ideas from CRDTs (A Commutative Replicated Data Type).
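For readers who have not met CRDTs, a grow-only counter is about the smallest example. The following is a sketch in Python, not Bloom code and not taken from the talk; the function names and the dict-per-replica representation are my own assumptions. Each replica only ever increases its own entry, and merging takes the per-replica maximum.

```python
def increment(state: dict, replica: str, amount: int = 1) -> dict:
    """Return a new state in which `replica` has counted `amount` more events."""
    new_state = dict(state)
    new_state[replica] = new_state.get(replica, 0) + amount
    return new_state


def merge(a: dict, b: dict) -> dict:
    """Combine two replica states by taking the larger count for each replica."""
    return {key: max(a.get(key, 0), b.get(key, 0)) for key in a.keys() | b.keys()}


def value(state: dict) -> int:
    """Total count across all replicas."""
    return sum(state.values())


# Two replicas that have diverged without coordinating.
replica_1 = increment(increment({}, "r1"), "r1")   # r1 counted twice
replica_2 = increment({}, "r2", 3)                 # r2 counted three times
```

Because `merge` is commutative, associative and idempotent, replicas can exchange states in any order, any number of times, and still converge on the same total. State that only grows in this way is the kind of monotonicity that an analysis like CALM looks for.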

If you have an eye towards understanding the future then this is for you.

Do note that the Bloom language is treated more extensively in Datalog Reloaded. You may recall that the basis for tolog (a topic map query language) was Datalog.

Jurimetrics, The Journal of Law, Science, and Technology (ISSN 0897-1277), published quarterly, is the journal of the American Bar Association Section of Science & Technology Law and the Center for Law, Science & Innovation.

Jurimetrics is a forum for the publication and exchange of ideas and information about the relationships between law, science and technology in all areas, including:

Physical, life and social sciences

Engineering, aerospace, communications and computers

Logic, mathematics and quantitative methods

The uses of science and technology in law practice, adjudication and court and agency administration

Policy implications and legislative and administrative control of science and technology.

Jurimetrics was first published in 1959 under the leadership of Layman Allen as Modern Uses of Logic in Law (MULL). The current name was adopted in 1966. Jurimetrics is the oldest journal of law and science in the United States, and it enjoys a circulation of more than 8,000, which includes all members of the ABA Section of Science & Technology Law.

A nice cartoon illustration of the problem with inductive arguments in a social context. It is a video on YouTube, so there is an ad popup. Best watched as a loop to appreciate the full point:

Makes me wish I knew how to do animation.

Will make you re-consider the use of induction in your topic map!

Posted in Induction, Logic | Comments Off on Video about a Problem of Inductive Arguments

Exploring the rationality of some syntactic merging operators (extended version) by José Luis Chacón and Ramón Pino Pérez

Abstract:

Most merging operators are defined by semantics methods which have very high computational complexity. In order to have operators with a lower computational complexity, some merging operators defined in a syntactical way have been proposed. In this work we define some syntactical merging operators and explore their rationality properties. To do that we constrain the belief bases to be sets of formulas very close to logic programs and the underlying logic is defined through forward chaining rule (Modus Ponens). We propose two types of operators: arbitration operators when the inputs are only two bases and fusion with integrity constraints operators. We introduce a set of postulates inspired by the postulates LS, proposed by Liberatore and Schaerf and then we analyzed the first class of operators through these postulates. We also introduce a set of postulates inspired by the postulates KP, proposed by Konieczny and Pino Pérez and then we analyzed the second class of operators through these postulates.
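The “forward chaining rule (Modus Ponens)” mentioned in the abstract is easy to picture. Here is a minimal sketch in Python covering only that inference step, not the merging operators the paper defines; the rule format (a set of body atoms and a single head atom) and the name `forward_chain` are my assumptions.

```python
def forward_chain(facts, rules):
    """Return every atom derivable from `facts` by repeatedly applying `rules`.

    `rules` is an iterable of (body, head) pairs, where `body` is a frozenset
    of atoms that must all be known before `head` can be added.
    """
    known = set(facts)
    changed = True
    while changed:                      # stop once a full pass adds nothing
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)         # Modus Ponens: body holds, so head holds
                changed = True
    return known


# Hypothetical belief base: a gives b; b together with c gives d.
RULES = [
    (frozenset({"a"}), "b"),
    (frozenset({"b", "c"}), "d"),
]
```

Each pass is a scan over the rules and the loop ends as soon as nothing new is derived, which gives some feel for why a syntactic approach built on it can be cheaper than semantic methods.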

Another paper on logic based merging.

I created a separate tag, “merging operators,” to distinguish this from the merging we experience with TMDM based topic maps.

The merging here refers to merging of beliefs to form a coherent view of the world.

A topic map, not subject to other constraints, can “merge” data about a subject that leads to different inferences or is even factually contradictory.

Even if logical consistency post-merging isn’t your requirement, this is a profitable paper to read.

I will see what other resources I can find on logic based merging.

Posted in Logic, Merging Operators | Comments Off on Exploring the rationality of some syntactic merging operators (extended version)

I took the precaution to say “Not only Logic” so I would not have to reach back and invent a soothing explanation for saying “NoLogic.”

The marketing reasons for parroting “NoSQL” are obvious and I won’t belabor them here.

There are some less obvious reasons for saying “NoLogic.”

Logic, as in formal logic (description logic for example), is rarely used by human users. Examples mainly exist in textbooks and journal articles. And of late, in semantic web proposals.

Ask anyone in your office to report the number of times they used formal logic to make a decision in the last week. We both know the most likely answer, by a very large margin.

But we rely upon searches every day that are based upon the use of digital logic.

Searches that are quite useful in assisting non-logical users but we limit ourselves in refining those search results. By more logic. Which we don’t use ourselves.

Isn’t that odd?

Or take the “curse of dimensionality.” Viewed from the perspective of data mining, Baeza-Yates & Ribeiro-Neto point out that “…a large feature space might render document classifiers impractical.” p.320

Those are features that can be identified with the document.

What of the dimensions of a user who is a former lawyer, theology student, markup editor, Ancient Near Eastern amateur, etc., all of which have an impact on how they view any particular document and its relevance to a search result? Or to make connections to another document?

Some of those dimensions would be shared by other users, some would not.

But in either case, human users are untroubled by the “curse of dimensionality.” In part I would suggest because “NoLogic” comes easy for the human user. We may not be able to articulate all the dimensions, but we are likely to pick results similar users will find useful.

We should not forgo logic, either as digital logic or formal reasoning systems, when those assist us.

We should be mindful that logic does not represent all views of the world.

In other words, not only logic (NoLogic).

Posted in Logic, Marketing | Comments Off on NoLogic (Not only Logic) – #5,000

I was chasing down proceedings from prior “Large Heterogeneous Data” workshops (damn, that’s a fourth name), when I ran across this jewel as the location of some of the archives.

Has lots of other interesting papers, software, activities.

Sing out if you see something you think needs to appear on this blog.

That may not be an artful way to frame the question. Perhaps better:

When an author uses some means for identification, whatever that may be, can it be recognized differently by different users?

One case that comes to mind is the interpretation of Egyptian Hieroglyphics over time. In addition to the attempts in the 16th and 17th centuries, which are now thought to be completely fantastic, there are the modern “accepted” translations as well as ancient Egyptian texts where it appears the scribe did not understand what was being copied.

If we are going to faithfully record the history of interpretation of such literature, we cannot flatten the “translated” texts to have the meanings we would assign to them today. The references of the then current literature would make no sense if we did.

Google Books is a valuable service but it is also a dangerous one for research purposes. In part because semantic drift occurs in any living language (or the interpretation of dead ones) and the results are reported without any warnings about such shifts.

Did you know, for example, that “cab” at one time was a slang reference to a house of prostitution? Gives new meaning to the statement “I will call you a cab,” doesn’t it?

Before we can assign semantics to any word, we need to know what is being identified by that word. And we need to know that any one word may represent multiple identifications.

Requirement: A system of identification must support the same identifiers resolving to different identifications.
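To show what the requirement asks of a data structure, here is a sketch in Python. Everything in it is hypothetical: the class name `IdentifierRegistry`, its methods, and the idea of keying each identification by a free-text context are assumptions for illustration, not a description of any existing system.

```python
from collections import defaultdict


class IdentifierRegistry:
    """Maps one identifier to every identification recorded for it."""

    def __init__(self):
        # identifier -> list of (context, identification) pairs
        self._senses = defaultdict(list)

    def register(self, identifier, identification, context):
        """Record that `identifier` picks out `identification` in `context`."""
        self._senses[identifier].append((context, identification))

    def resolve(self, identifier, context=None):
        """Return all identifications, or only those recorded for `context`."""
        senses = self._senses.get(identifier, [])
        if context is None:
            return [identification for _, identification in senses]
        return [identification for ctx, identification in senses if ctx == context]


registry = IdentifierRegistry()
registry.register("cab", "taxi", context="modern usage")
registry.register("cab", "house of prostitution", context="historical slang")
```

The design choice is that `resolve` returns a list rather than a single answer; a lookup that returned one value would have flattened the history before anyone had a chance to ask about it.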

The consequences of deciding otherwise on such a requirement, I will try to take up tomorrow.

This is the first of a number of posts on what we can do, facing this tidal wave of data with nary a logician or ontologist in sight.

I have a question that I think we need to answer before we get to the question of semantics.

Is it fair to say that identification comes before semantics? That is, we have to recognize something (whatever that may be) before we can talk about its semantics?

I ask because I think it is important to take the requirements for data and its semantics one step at a time. And in particular to not jump ahead of ourselves with half-remembered bits of doggerel from grade school to propose syntactic solutions.

Or to put it differently, let’s make sure of what order steps need to be taken before we trip over our own feet.

That would be the requirements phase, as is well known to the successful programmers and startup folks among the audience.

So, is requirement #1 that something be recognized? Whether that is a file, format, subject of any sort or description. I don’t know, but I suspect we can’t even use logic on things we have yet to recognize.

Just to give you a hint about tomorrow, or perhaps the next day (I have meetings tomorrow): can something be recognized more than once?

This may seem like a slow start but the time will pass more quickly than you think it will. There are a number of “perennial” issues that I will argue can be side-lined, in part because they have no answer other than personal preference.

More than 1.8 zettabytes of information will be created and stored in 2011, according to the latest IDC Digital Universe Study sponsored by EMC. That’s a mind-boggling figure, equivalent to 1.8 trillion gigabytes, enough information to fill 57.5 billion 32GB Apple iPads. It also illustrates the challenge in storing and managing all that data.

I know it is apples and oranges to some degree but compare the figures for data and linked data:

Data (bytes): 1,800,000,000,000,000,000,000

Triples: 31,634,213,770
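For a rough sense of scale, the two figures can be divided. This is only back-of-the-envelope arithmetic on the numbers quoted in this post, and it is still apples and oranges: bytes created in one year against a count of triples. The variable names are mine.

```python
# Figures as quoted in this post; the comparison is illustrative only.
data_bytes = 1_800_000_000_000_000_000_000  # 1.8 zettabytes (IDC estimate for 2011)
triples = 31_634_213_770                    # linked data triple count

# Bytes of newly created data for every existing triple.
bytes_per_triple = data_bytes / triples
print(f"{bytes_per_triple:.3e} bytes of new data per triple")
```

That works out to tens of gigabytes of new data in a single year for every triple that exists.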

Not to mention that the semantics of data is constantly evolving. If not business and scientific data, recall that “texting” was unknown little more than a decade ago.

It is clear that we don’t have enough logicians and ontologists (who have yet to agree on a common upper ontology) to keep up with the increasing flow of data. For that matter, the truth is they have been constantly falling behind for centuries. Systems are proposed, cover some data, only to become data that has to be covered by subsequent systems.

Some options to deal with this crisis:

Universal Logician/Ontologist Conscription Act – All 18-year-olds worldwide have to spend 6 years in the LogoOnto Corps. First four years learning the local flavor of linked data and the last two years coding data.

Excess data to /dev/null – Pipe all non-Linked data to /dev/null until logicians/ontologists can catch up. Projected to be sometime after 5500, perhaps late 5500’s. (According to Zager and Evans.)

???

There are other options. Propose yours and/or wait for some suggestions here next week!

An in-depth discussion of relational, NoSQL and graph database views of the world.

I must admit to being surprised when James Frazer’s Golden Bough came up in the presentation. It was used quite effectively as an illustration but I have learned to not expect humanities references or examples in CS presentations. This was a happy exception.

I agree with Ian that the relational world view remains extremely useful but also that it limits the data that can be represented and queried.

Complex relationships between entities simply don’t come up with relational databases because they aren’t easy (if possible at all) to represent.

I would take Ian’s point a step further and point out that logic, as in RDF and the Semantic Web, is a similar constraint.

Logic can be very useful in any number of areas, just like relational databases, but it only represents a very small slice of the world. A slice of the world that can be represented quite artificially without contradictions, omissions, inconsistencies, or any of the other issues that make logic systems fall over clutching their livers.

BTW, topic mappers need to take a look at timemark 34.26. The representation of the companies who employ workers and the “contagion” relationships. (You will have to watch the video to find out why I say “contagion.” It is worth the time.) Does that suggest to you that I could point topics to a common node based on their possession of some property, say a subject identifier? Such that when I traverse any of those topics I can go to the common node and produce a “merged” result if desired?

I say that because any topic could point to more than one common node, depending upon the world view of an author. That could be very interesting in terms of comparing how authors would merge topics.
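Here is a rough sketch in Python of what I have in mind. It is not Neo4j code and not any topic map API; the topic records, the example.org subject identifiers and the function names are all hypothetical. Topics that share a subject identifier point to the same common node, and a “merged” view is produced only when it is asked for.

```python
from collections import defaultdict

# Hypothetical topics: an id, some names, and subject identifiers.
topics = [
    {"id": "t1", "names": {"Golden Bough"},
     "subject_ids": {"http://example.org/golden-bough"}},
    {"id": "t2", "names": {"The Golden Bough (Frazer)"},
     "subject_ids": {"http://example.org/golden-bough"}},
    {"id": "t3", "names": {"Neo4j"},
     "subject_ids": {"http://example.org/neo4j"}},
]


def build_common_nodes(topics):
    """Map each subject identifier (the common node) to the topics pointing at it."""
    common = defaultdict(list)
    for topic in topics:
        for sid in topic["subject_ids"]:
            common[sid].append(topic["id"])
    return common


def merged_view(topic_id, topics, common):
    """Go from a topic to its common nodes and gather what the other topics say.

    Only one hop is followed; chains of shared identifiers are not chased.
    """
    by_id = {t["id"]: t for t in topics}
    start = by_id[topic_id]
    members = {topic_id}
    names = set(start["names"])
    for sid in start["subject_ids"]:
        for other_id in common[sid]:
            members.add(other_id)
            names |= by_id[other_id]["names"]
    return {"members": sorted(members), "names": sorted(names)}


common = build_common_nodes(topics)
print(merged_view("t1", topics, common))
```

The topics themselves are never changed, so a different author could build a different set of common nodes over the same topics and the two merged results could be compared.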

Posted in Logic, Neo4j, NoSQL | Comments Off on New Opportunities for Connected Data (logic, contagion relationships and merging)

I really dislike the so called “Boolean Operators” (“AND”, “OR”, and “NOT”) and generally discourage people from using them. It’s understandable that novice users may tend to think about the queries they want to run in those terms, but as you become more familiar with IR concepts in general, and what Solr specifically is capable of, I think it’s a good idea to try to “set aside childish things” and start thinking (and encouraging your users to think) in terms of the superior “Prefix Operators” (“+”, “-”).

Don’t hold back Chris! It’s not good for you. Tell us how you feel about “Boolean Operators.” 😉

Seriously, Chris makes a very good case for using “Prefix Operators” and you will learn about powerful searching in both Lucene and Solr.
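If the prefix operators are new to you, the idea is that “+” marks a term that must appear, “-” marks a term that must not appear, and an unmarked term is optional. The toy matcher below is my own sketch in Python of that reading. It is not the Lucene or Solr query parser, it ignores scoring entirely, and the function names are made up for the illustration.

```python
def parse_prefix_query(query):
    """Split a query into required (+), prohibited (-) and optional terms."""
    required, prohibited, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            prohibited.add(token[1:])
        else:
            optional.add(token)
    return required, prohibited, optional


def matches(query, document):
    """Toy yes/no match of a whitespace-tokenized document against a query."""
    required, prohibited, optional = parse_prefix_query(query)
    words = set(document.lower().split())
    if prohibited & words:
        return False  # a prohibited term is present
    if required:
        return required <= words  # every required term must be present
    # With no required terms, at least one optional term has to match.
    return bool(optional & words)


print(matches("+solr -boolean search", "solr prefix search"))
print(matches("+solr -boolean", "solr boolean operators"))
```

In a real engine the optional terms would also affect ranking, which this sketch leaves out.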