You know, links in the middle of sentences.

I’ve been thinking lately about the visionary optimism of the days when people dreamed of the promise of large-scale hypertext systems. I’m pretty sure they didn’t mean linkless content down the middle of a screen with columns of ads to the left and right of it, which is much of what we read off of screens these days. I certainly don’t want to start one of those rants of “the World Wide Web is deficient because it’s missing features X and Y, which by golly we had in the HyperThingie™ system that I helped design back in the 80s, and the W3C should have paid more attention to us” because I’ve seen too many of those. The web got so popular because Tim Berners-Lee found such an excellent balance between which features to incorporate and which (for example, central link management) to skip.

The idea of inline links, in which words and phrases in the middle of sentences link to other documents related to those words and phrases, was considered an exciting thing back when we got most of our information from printed paper. A hypertext system had links between the documents stored in that system, and the especially exciting thing about a “world wide” hypertext system was that any document could link to any other document in the world.

But who does, in 2016? The reason I’ve been thinking more about the past and present of hypertext (a word that, sixteen years into the twenty-first century, is looking a bit quaint) is that since adding a few links to something I was writing at work recently, I’ve been more mindful of which major web sites include how many inline links and how many of those links go to other sites. For example, while reading the article Bayes’s Theorem: What’s the Big Deal? on Scientific American’s site recently, I found myself thinking “good for you guys, with all those useful links to other web sites right in the body of your article!”

My experience with contemporary hyperlinks has been like Bob’s. There are sites that cite only themselves but there are also sites that do point to external sources. Perhaps the most annoying failure to hyperlink is when a text mentions a document, report or agreement, and then fails to link the reader to that document.

The New York Times has a distinct and severe poverty of external links to original source materials. Some stories do have external links but not nearly all of them. Which surprises me for any news reporting site, much less the New York Times.

More hypertext linking would be great, but being able to compose documents from other documents, not our cut-n-paste of today but transclusion into a new document, that would be much better.

The topic of web annotation continues to grow in interest and importance.

Here’s how the World Wide Web Consortium (W3C) describes the topic:

Web annotations are an attempt to recreate and extend that functionality as a new layer of interactivity and linking on top of the Web. It will allow anyone to annotate anything anywhere, be it a web page, an ebook, a video, an image, an audio stream, or data in raw or visualized form. Web annotations can be linked, shared between services, tracked back to their origins, searched and discovered, and stored wherever the author wishes; the vision is for a decentralized and open annotation infrastructure.

A Few Examples

In recent weeks and months a W3C Web Annotation working group got underway, a company that has been working in this area for several years (and one we’ve mentioned several times on infoDOCKET) formally launched a web annotation extension for Chrome, the Mellon Foundation awarded $750,000 in research funding, and The Journal of Electronic Publishing began offering annotation for each article in the publication.

New Video

Today, a 15-minute video was posted (embedded below) in which several experts share some of their perspectives on the topic of web annotation (why the interest in the topic, the biggest challenges, future plans, etc.).

The video was recorded at the recent W3C TPAC 2014 Conference in Santa Clara, CA.

I am puzzled by more than one speaker on the video referring to the lack of robust addressing as a reason why annotation has not succeeded in the past. Perhaps they are unaware of the XLink and XPointer work at the W3C? Or HyTime for that matter?

True, none of those efforts were widely supported but that doesn’t mean that robust addressing wasn’t available.

I for one will be interested in comparing the capabilities of prior efforts against what is marketed as “new web annotation” capabilities.

Annotation, particularly what was known as “extended XLinks,” is very important for the annotation of materials to which you don’t have read/write access. Think about annotating texts distributed by a vendor on DVD. Or annotating texts that are delivered over a web stream. A separate third-party value-add product. Like a topic map, for instance.
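To make the idea of annotating material you cannot edit a little more concrete, here is a minimal sketch in Python. The annotation lives in its own record, apart from the text, and points into the text by quoting it. The field names loosely follow the Web Annotation Data Model that the W3C later published (a TextQuoteSelector with prefix, exact and suffix); the URL, the note and the resolve() helper are made up for illustration and are not part of any of the efforts discussed here.

```python
# A stand-off annotation: stored separately from the text it describes,
# so no write access to the target document is needed.
# The source URL and the body text are hypothetical examples.
annotation = {
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "Compare with XPointer ranges."},
    "target": {
        "source": "http://example.com/vendor-text.html",
        "selector": {
            "type": "TextQuoteSelector",
            "prefix": "lack of ",
            "exact": "robust addressing",
            "suffix": " as a reason",
        },
    },
}


def resolve(selector, text):
    """Return (start, end) of the quoted span in text, or None if absent.

    The prefix and suffix are used only to disambiguate the quoted passage;
    the returned offsets cover the "exact" string alone.
    """
    prefix = selector.get("prefix", "")
    suffix = selector.get("suffix", "")
    pos = text.find(prefix + selector["exact"] + suffix)
    if pos == -1:
        return None
    start = pos + len(prefix)
    return start, start + len(selector["exact"])
```

Calling resolve(annotation["target"]["selector"], text) against a copy of the target text gives the character offsets of the annotated passage, or None if the passage has changed. Note that the "address" here is nothing more than a quotation with some context around it.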

See videos from I Annotate 2014

The Internet is an endlessly rich world of sites, pages and posts — until it all ends with a click and a “404 not found” error message. While the hyperlink was conceived in the 1960s, it came into its own with the HTML protocol in 1991, and there’s no doubt that the first broken link soon followed.

On its surface, the problem is simple: A once-working URL is now a goner. The root cause can be any of a half-dozen things, however, and sometimes more: Content could have been renamed, moved or deleted, or an entire site could have evaporated. Across the Web, the content, design and infrastructure of millions of sites are constantly evolving, and while that’s generally good for users and the Web ecosystem as a whole, it’s bad for existing links.

In its own way, the Web is also a very literal-minded creature, and all it takes is a single-character change in a URL to break a link. For example, many sites have stopped using “www,” and even if their content remains the same, the original links may no longer work. The rise of CMS platforms such as WordPress has led to the fall of static HTML sites with their .htm and .html extensions, and with each relaunch, untold thousands of links die.

Even if a core URL remains the same, many sites frequently append login information or search terms to URLs, and those are ephemeral. And as the Web has grown, the problem has been complicated by Google and other search engines that crawl the Web and archive — briefly — URLs and pages. Many work, but their long-term stability is open to question.
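The failure modes described above (a dropped “www,” a vanished .htm or .html extension, ephemeral query strings) are mechanical enough to sketch in code. The following Python is only an illustration of how one might generate alternative URLs to try when a link is dead; it is not a method taken from the article. The function names and the list of “ephemeral” parameters are my own assumptions, and a real site may use entirely different ones.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters assumed to be session- or
# search-specific. This list is an assumption, not a standard.
EPHEMERAL_PARAMS = {"sessionid", "sid", "q", "utm_source", "utm_medium", "utm_campaign"}


def strip_ephemeral(url):
    """Drop query parameters assumed to be ephemeral; also drops any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in EPHEMERAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def candidate_urls(url):
    """Return plausible variants of a URL, original form first, no duplicates."""
    parts = urlsplit(strip_ephemeral(url))

    # Try the host both with and without a leading "www."
    host = parts.netloc
    hosts = [host, host[4:] if host.startswith("www.") else "www." + host]

    # Try the path with and without a static-HTML extension.
    paths = [parts.path]
    for ext in (".html", ".htm"):
        if parts.path.endswith(ext):
            stem = parts.path[: -len(ext)]
            paths.extend([stem, stem + "/"])

    candidates = []
    for h in hosts:
        for p in paths:
            candidate = urlunsplit((parts.scheme, h, p, parts.query, ""))
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

For a link such as http://www.example.com/post.html?sid=abc, candidate_urls() returns the cleaned original followed by variants without the extension and without “www.” Whether any of them resolves is a separate question; the sketch only enumerates guesses and does not check them.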

Hmmm, link rot, do you think that impacts the Semantic Web? 😉

If you can have multiple IRIs for the same subject, well, you can have a different result.

Leighton has a number of suggestions to lessen your own link rot. For the link rot (as far as identifiers) of others, I suggest topic maps.

I first saw this at Full Text Reports as: Website linking: The growing problem of “link rot” and best practices for media and online publishers.

Bob republished an old tilt at a windmill that attempts to claim “linking” as beginning in the 12th century CE.

It’s an interesting read but I disagree with his dismissal of quoting of a work as a form of linking. Bob says:

Quoting of one work by another was certainly around long before the twelfth century, but if an author doesn’t identify an address for his source, his reference can’t be traversed, so it’s not really a link. Before the twelfth century, religious works had a long tradition of quoting and discussing other works, but in many traditions (for example, Islam, Theravada Buddhism, and Vedic Hinduism) memorization of complete religious works was so common that telling someone where to look within a work was unnecessary. If one Muslim scholar said to another “In the words of the Prophet…” he didn’t need to name the sura of the Qur’an that the quoted words came from; he could assume that his listener already knew. Describing such allusions as “links” adds heft to claims that linking is thousands of years old, but a link that doesn’t provide an address for its destination can’t be traversed, and a link that can’t be traversed isn’t much of a link. And, such claims diminish the tremendous achievements of the 12th-century scholars who developed new techniques to navigate the accumulating amounts of recorded information they were studying. (emphasis added)

Bob’s error is too narrow a view of the term “address.” Quoted text of the Hebrew Bible acts as an “address,” assuming you are familiar enough with the text. The same is true for the examples of the Qur’an and Vedic Hinduism. It is as certain and precise as a chapter and verse reference, but it does require a degree of knowledge of the text in question.

That does not take anything away from 12th century scholars who created addresses that did not require as much knowledge of the underlying text. Faced with more and more information, their inventions assisted in navigating texts with a new type of address, one that could be used by anyone.

Taking a broader view of addressing creates a continuum of addressing that encompasses web-based linking. Rather than using separate systems of physical addresses to locate information in books, users now have electronic addresses that can deliver them to particular locations in a work.

Here is my continuum of linking:

Linking            Requires
Quoting            Memorized Text
Reference System   Copy of Text
WWW Hyperlink      Access to Text

The question to ask about Bob’s point about quoting “…his reference can’t be traversed…” is, “Who can’t traverse that link?” Anyone who has memorized the text can quite easily.

Oh, people who have not memorized the text cannot traverse the link. And? If I don’t have access to the WWW, I can’t traverse hyperlinks. Does that make them any less links?

Or does it mean I haven’t met the conditions for exercising the link?

Instead of diminishing the work of 12th century scholars, recognizing prior linking practices allows us to explore what changed and for whom as a result of their “…tremendous achievements….”

Transclusion? You be the judge.

All I can say is that it appears to be so.


PS: I want to start cheering, loudly, but without more, I can’t. Not yet.

When Vannevar Bush’s “As We May Think” first appeared in The Atlantic’s pages in July 1945, it set off an intellectual chain reaction that resulted, more than four decades later, in the creation of the World Wide Web.

In that landmark essay, Bush described a hypothetical machine called the Memex: a hypertext-like device capable of allowing its users to comb through a large set of documents stored on microfilm, connected via a network of “links” and “associative trails” that anticipated the hyperlinked structure of today’s Web.

Historians of technology often cite Bush’s essay as the conceptual forerunner of the Web. And hypertext pioneers like Douglas Engelbart, Ted Nelson, and Tim Berners-Lee have all acknowledged their debt to Bush’s vision. But for all his lasting influence, Bush was not the first person to imagine something like the Web.

Alex identifies several inventors in the early 20th century who proposed systems quite similar to Vannevar Bush’s, prior to the publication of “As We May Think.” It is a starting place that may get you interested in learning the details of these alternate proposals.

Personally, I would separate the notion of “hypertext” from the notion of networking remote sites together (proposed not by Bush but by others), and that pushes the history of hypertext much further back in time.


