Archive for the ‘WWW’ Category

How Bad Is Wikileaks Vault7 (CIA) HTML?

Thursday, March 9th, 2017

How bad?

Unless you want to hand correct 7809 html files to use with XQuery, grab the latest copy of Tidy

It’s not the worst HTML I have ever seen, but put that in the context of having seen a lot of really poor HTML.

I’ve “tidied” up a test collection and will grab a fresh copy of the files before producing and releasing a clean set of the HTML files.

Producing a document collection for XQuery processing. Working towards something suitable for application of NLP and other tools.

SEO Tools: The Complete List (153 Free and Paid Tools) [No IEO Tools?]

Tuesday, July 5th, 2016

SEO Tools: The Complete List (153 Free and Paid Tools) by Brian Dean.

Updated as of May 20, 2016.

There is a PDF version but that requires sacrifice of your email address, indeterminate waiting for the confirmation email, etc.

The advantage of the PDF version isn’t clear, other than you can print it on marketing’s color printer. Something to cement that close bond between marketing and IT.

With the abundance of search engine optimization tools, have you noticed the lack of index engine optimization (IEO) tools?

When an indexing engine is “optimized,” settings of the indexing engine are altered to produce a “better” result. So far as I know, the data being indexed isn’t normally changed to alter the behavior of the indexing engine.

In contrast to an indexing engine, it is expected data destined for a search engine can and will change/optimize itself to alter the behavior of the search engine.

What if data were index engine optimized, say to distinguish terms with multiple meanings, at the time of indexing? Say articles in the New York Times were paired with vocabulary lists of the names, terms, etc. that appear within them.

Bi-directional links so that an index of the vocabulary lists would at the same time be an index of the articles themselves.

Thoughts?

Pride Goeth Before A Fall – DMCA & Security Researchers

Friday, June 24th, 2016

Cory Doctorow has written extensively on the problems with present plans to incorporate DRM in HTML5:

W3C DRM working group chairman vetoes work on protecting security researchers and competition – June 18, 2016.

An Open Letter to Members of the W3C Advisory Committee – May 12, 2016.

Save Firefox: The W3C’s plan for worldwide DRM would have killed Mozilla before it could start – May 11, 2016.

Interoperability and the W3C: Defending the Future from the Present – March 29, 2016.

among others.

In general I agree with Cory’s reasoning but I don’t see:

…Once DRM is part of a full implementation of HTML5, there’s a real risk to security researchers who discover defects in browsers and want to warn users about them…. (from Cory’s latest post)

Do you remember the Sony “copy-proof” CDs? Sony “copy-proof” CDs cracked with a marker pen Then, just as now, Sony is about to hand over bushels of cash to the content delivery crowd.

When security researchers discover flaws in the browser DRM, what prevents them from advising users?

Cory says the anti-circumvention provisions of the DMCA prevent security researchers from discovering and disclosing such flaws.

That’s no doubt true, if you want to commit a crime (violate the DMCA) and publish evidence of that crime with your name attached to it on the WWW.

Isn’t that a case of pride goeth before a fall?

If I want to alert other users to security defects in their browsers, possibly equivalent to the marker pen for Sony CDs, I post that to the WWW anonymously.

Or publish code to make that defect apparent to even a casual user.

What I should not do is put my name on either a circumvention bug report or code to demonstrate it. Yes?

That doesn’t answer Cory’s points about impairing innovation, etc. but once Sony realizes it has been had, again, by the content delivery crowd, what’s the point of more self-inflicted damage?

I feel sorry for content owners. Their greed makes them easy prey for people selling patented DRM medicine for the delivery of their content. In the long run it only hurts themselves (the DRM tax) and users. In fact, the only people making money off of DRM are the people who deliver content.

Should DRM appear as proposed in HTML5, any suggestions for a “marker pen” logo to be used by hackers of a Content Decryption Module?

PS: Another approach to opposing DRM would be to inform shareholders of Sony and other content owners they are about to be raped by content delivery systems.

PPS: In private email Cory advised me to consider the AACS encryption key controversy, where public posting of an encryption key was challenged with take down requests. However, in the long run, such efforts only spread the key more widely, not the effect intended by those attempted to limit its spread.

And there is the Dark Web, ahem, where it is my understanding that non-legal content and other material can be found.

APIs.guru Joins Growing List of API Indexes [Index of Indexes Anyone?]

Sunday, June 12th, 2016

APIs.guru Joins Growing List of API Indexes by Benjamin Young.

From the post:

APIs.guru is the latest entry into the API definition indexing, curation, and discovery space.

The open source (MIT-licensed) community curated index currently includes 236 API descriptions which cover 6,271 endpoints. APIs.guru is focused on becoming the "Wikipedia for REST APIs."

APIs.guru is entering an increasingly crowded market with other API indexing sites including The API Stack, API Commons, APIs.io, AnyAPI, and older indexes such as ProgrammableWeb's API Directory. These API indexes share a common goal says APIEvangelist.com blogger Kin Lane:

Developers around the world are using these definitions in their work, and modern API tooling and service providers are using them to define the value they bring to the table. To help the API sector reach the next level, we need you to step up and share the API definitions you have with API Stack, APIs.io, or APIs.guru, and if you have the time and skills, we could use your help crafting other new API definitions for popular services available today.

The APIs.guru content is curated primarily by its creator, Ivan Goncharov. According to a DataFire Blog entry, the initial content was populated "using a combination of automated scraping and human curation to crawl the web for machine-readable API definitions."

The empirical evidence from Spain indicates that the more places that link to you, the more traffic you enjoy. Even for news sites.

From that perspective, a multitude of over-lapping, duplicative API indexes is a good thing.

From my perspective, that is a one-stop shop for APIs, it’s a nightmare.

Which one you see depend on your use case.

Enjoy!

Kindermädchen (Nanny) Court Protects Facebook Users – Hunting Down Original Sources

Friday, January 22nd, 2016

Facebook’s Friend Finder found unlawful by Germany’s highest court by Lisa Vaas.

From the post:

Reuters reports that a panel of the Federal Court of Justice has ruled that Facebook’s Friend Finder feature, used to encourage users to market the social media network to their contacts, constituted advertising harassment in a case that was filed in 2010 by the Federation of German Consumer Organisations (VZBV).

Friends Finder asks users for permission to snort the e-mail addresses of their friends or contacts from their address books, thereby allowing the company to send invitations to non-Facebook users to join up.

There was a time when German civil law and the reasoning of its courts were held in high regard. I regret to say it appear that may not longer be the case.

This decision on Facebook asking users to spread the use of Facebook being a good example.

From the Reuters account, it appears that sending of unsolicited email is the key to the court’s decision.

It’s difficult to say much more about the court’s decision because finding something other than re-tellings of the Reuters report is difficult.

You can start with the VZBV press release on the decision: Wegweisendes BGH-Urteil: Facebooks Einladungs-E-Mails waren unlautere Werbung, but it too is just a summary.

Unlike the Reuters report, it at least has: Auf anderen Webseiten Pressemitteilung des BGH, which takes you to: Bundesgerichtshof zur Facebook-Funktion “Freunde finden,” a press release by the court about its decision. 😉

The court’s press release offers: Siehe auch: Urteil des I. Zivilsenats vom 14.1.2016 – I ZR 65/14 –, which links to a registration facility to subscribe for a notice of the opinion of the court when it is published.

No promises on when the decision will appear. I subscribed today, January 22nd and the decision was made on January 14, 2016.

I did check Aktuelle Entscheidungen des Bundesgerichtshofes (recent decisions), but it refers you back to the register for the opinion to appear in the future.

Without the actual decision, it’s hard to tell if the court is unaware of the “delete” key on German keyboards or if there is some other reason to inject itself into a common practice on social media sites.

I will post a link to the decision when it becomes available. (The German court makes its decisions available for free to the public and charges a document fee for profit making services, or so I understand the terms of the site.)

PS: For journalists, researchers, bloggers, etc. I consider it a best practice to always include pointers to original sources.

PPS: The German keyboard does include a delete key (Entf) if you had any doubts:

880px-German-T2-Keyboard-Prototype-May-2012

(Select the image to display a larger version.)

The past and present of hypertext

Sunday, January 17th, 2016

The past and present of hypertext by Bob DuCharme.

From the post:

You know, links in the middle of sentences.

I’ve been thinking lately about the visionary optimism of the days when people dreamed of the promise of large-scale hypertext systems. I’m pretty sure they didn’t mean linkless content down the middle of a screen with columns of ads to the left and right of it, which is much of what we read off of screens these days. I certainly don’t want to start one of those rants of “the World Wide Web is deficient because it’s missing features X and Y, which by golly we had in the HyperThingie™ system that I helped design back in the 80s, and the W3C should have paid more attention to us” because I’ve seen too many of those. The web got so popular because Tim Berners-Lee found such an excellent balance between which features to incorporate and which (for example, central link management) to skip.

The idea of inline links, in which words and phrases in the middle of sentences link to other documents related to those words and phrases, was considered an exciting thing back when we got most of information from printed paper. A hypertext system had links between the documents stored in that system, and the especially exciting thing about a “world wide” hypertext system was that any document could link to any other document in the world.

But who does, in 2016? The reason I’ve been thinking more about the past and present of hypertext (a word that, sixteen years into the twenty-first century, is looking a bit quaint) is that since adding a few links to something I was writing at work recently, I’ve been more mindful of which major web sites include how many inline links and how many of those links go to other sites. For example, while reading the article Bayes’s Theorem: What’s the Big Deal? on Scientific American’s site recently, I found myself thinking “good for you guys, with all those useful links to other web sites right in the body of your article!”

My experience with contemporary hyperlinks has been like Bob’s. There are sites that cite only themselves but there are also sites that do point to external sources. Perhaps the most annoying failure to hyperlink is when a text mentions a document, report or agreement, and then fails to link the reader to that document.

The New York Times has a distinct and severe poverty of external links to original source materials. Some stories do have external links but not nearly all of them. Which surprises me for any news reporting site, much less the New York Times.

More hypertext linking would be great, but being able to compose documents from other documents, not our cut-n-paste of today but transclusion into a new document, that would be much better.

URLs Are Porn Vulnerable

Monday, June 22nd, 2015

Graham Cluley reports in Heinz takes the heat over saucy porn QR code that some bottles of Heinz Hot Ketchup provide more than “hot” ketchup. The QR code on the bottle leads to a porn site. (It is hard to put a “prize” in a ketchup bottle.)

Graham observes a domain registration lapsed for Heinz and the new owner wasn’t in the same line of work.

Are you presently maintaining every domain you have ever registered?

The lesson here is that URLs (as identifiers) are porn vulnerable.

CNIL Anoints Itself Internet Censor

Thursday, June 18th, 2015

France seeks to extend Google ‘right to be forgotten’.

From the post:

Google has 15 days to comply with a request from France’s data watchdog to extend the “right to be forgotten” to all its search engines.

Last year a European Court of Justice ruling let people ask Google to delist some information about them.

However, the data deleting system only strips information from searches done via Google’s European sites.

French data regulator CNIL said Google could face sanctions if it did not comply within the time limit.

In response, Google said in a statement: “We’ve been working hard to strike the right balance in implementing the European Court’s ruling, co-operating closely with data protection authorities.

“The ruling focused on services directed to European users, and that’s the approach we are taking in complying with it.”

(emphasis in the original)

The first news I saw of this latest round of censorship from the EU was dated June 12, 2015. Assuming that started the fifteen (15) days running, Google has until the 27th of June, 2015, to comply.

Plenty enough time to reach an agreement with the other major search providers to go dark in the EU on the 27th of June, 2015.

By working with the EU at all on the fantasy right-to-be-forgotten, Google has encouraged a step towards Balkanization of the Internet, where what resources you may or may not see, will depend upon your physical location.

Not only does that increase the overhead for providers of Internet content, but it also robs the Internet of its most powerful feature, the free exchange of ideas, education and resources.

Eventually, even China will realize that the minor social eddies caused by use of the Internet pale when compared to the economic activity spurred by it. People do blame/credit the Internet with social power but where it has worked, the people who lost should have been removed long ago by other means.

Internet advocates are quick to take credit for things the Internet has not done, much as Unitarians of today want to claim Thomas Jefferson as a Unitarian. I would not credit the view of advocates as being a useful measure of the Internet’s social influence.

If that were the case, then why does sexism, rape, child porn, violence, racism, discrimination, etc. still exist? Hmmm, maybe the Internet isn’t as powerful as people think? Maybe the Internet reflects the same social relationships and short falls that exist off of the Internet? Could be.

Google needs to agree with other search providers to go dark for the EU for some specified time period. EU residents can see how the Internet looks with effective search tools. Perhaps they will communicate their wishes with regard to search engines to their duly elected representatives.

PS: Has anyone hacked CNIL lately? Just curious.

W3C Validation Tools – New Location

Wednesday, June 3rd, 2015

W3C Validation Tools

The W3C graciously hosts the following free validation tools:

CSS Validator – Checks your Cascading Style Sheets (CSS)

Internationalization Checker – Checks level of internationalization-friendliness.

Link Checker – Checks your web pages for broken links.

Markup Validator – Checks the markup of your Web documents (HTML or XHTML).

RSS Feed Validator – Validator for syndicated feeds. (RSS and Atom feeds)

RDF Validator – Checks and visualizes RDF documents.

Unicorn – Unified validator. HTML, CSS, Links & Mobile.

Validator.nu Checks HTML5.

I mention that these tools are free to emphasize there is no barrier to their use.

Just as you wouldn’t submit a research paper with pizza grease stains on it, use these tools to proof draft standards before you submit them for review.

.sucks

Sunday, May 24th, 2015

New gTLDs: .SUCKS Illustrates Potential Problems for Security, Brand Professionals by Camille Stewart.

From the post:

The launch of the .SUCKS top-level domain name (gTLD) has reignited and heightened concerns about protecting brands and trademarks from cybersquatters and malicious actors. This new extension, along with more than a thousand others, has been approved by the Internet Corporation for Assigned Names and Numbers (ICANN) as part of their new gTLD program. The program was designed to spark competition and innovation by opening up the market to additional gTLDs.

Not surprisingly, though, complaints are emerging that unscrupulous operators are using .SUCKS to extort money from companies by threatening to use it to create websites that could damage their brands. ICANN is now reportedly asking the Federal Trade Commission (FTC) and Canada’s Office of Consumer Affairs to weigh in on potential abuses so it can address them. Recently, Congress weighed in on the issue, holding a hearing about. SUCKS and other controversial domains like .PORN .

Vox Populi Registry Ltd. began accepting registrations for .SUCKS domains on March 30 from trademark holders and celebrities before it opened to public applicants. It recommended charging $2,499 a year for each domain name registration, and according to Vox Populi CEO John Berard, resellers are selling most of the names for around $2,000 a year. Berard asserts that the extension is meant to create destinations for companies to interact with their critics, and called his company’s business “well within the lines of ICANN rules and the law.”

If you follow the link to the statement by Vox Populi CEO John Berard, that post concludes with:

The new gTLD program is about increasing choice and competition in the TLD space, it’s not supposed to be about applicants bilking trademark owners for whatever they think they can get away with.

A rather surprising objection considering that trademark (and copyright) owners have been bilking/gouging consumers for centuries.

Amazing how sharp the pain can be when a shoe pinches on a merchant’s foot.

How many Disney properties could end in .sucks? (Research question)

Metaflop: Hello World

Friday, May 8th, 2015

Metaflop: Hello World

From the webpage:

Metaflop is an easy to use web application for modulating your own fonts. Metaflop uses Metafont, which allows you to easily customize a font within the given parameters and generate a large range of font families with very little effort.

With the Modulator it is possible to use Metafont without dealing with the programming language and coding by yourself, but simply by changing sliders or numeric values of the font parameter set. This enables you to focus on the visual output – adjusting the parameters of the typeface to your own taste. All the repetitive tasks are automated in the background.

The unique results can be downloaded as a webfont package for embedding on your homepage or an OpenType PostScript font (.otf) which can be used on any system in any application supporting otf.

Various Metafonts can be chosen from our type library. They all come along with a small showcase and a preset of type derivations.

Metaflop is open source – you can find us on Github, both for the source code of the platform and for all the fonts.

If metafont rings any bells, congratulations! Metafont was invented by Don Knuth for TeX.

Metaflop provides a web interface to the Metafont program and with parameters that can be adjusted.

Only A-Z, a-z, and 0-9 are available for font creation.

In the FAQ, the improvement over Metafont is said to be:

  • font creators are mostly designers, not engineers. so metafont is rather complicated to use, you need to learn programming.
  • it has no gui (graphical user interface).
  • the native export is to bitmap fonts which is a severe limitation compared to outline fonts.

Our contribution to metafont is to address these issues. we are aware that it is difficult to produce subtle and refined typographical fonts (in the classical meaning). Nevertheless we believe there is a undeniable quality in parametric font design and we try to bring it closer to the world of the designers.

While Metaflop lacks the full generality of Metafont, it is a big step in the right direction to bring Metafont to a broader audience.

With different underlying character sets, certainly of interest to anyone interested in pre-printing press texts. Glyphs can transliterate to the same characters but which glyph was used can be important information to both capture and display.

Turning the MS Battleship

Saturday, March 21st, 2015

Improving interoperability with DOM L3 XPath by Thomas Moore.

From the post:

As part of our ongoing focus on interoperability with the modern Web, we’ve been working on addressing an interoperability gap by writing an implementation of DOM L3 XPath in the Windows 10 Web platform. Today we’d like to share how we are closing this gap in Project Spartan’s new rendering engine with data from the modern Web.

Some History

Prior to IE’s support for DOM L3 Core and native XML documents in IE9, MSXML provided any XML handling and functionality to the Web as an ActiveX object. In addition to XMLHttpRequest, MSXML supported the XPath language through its own APIs, selectSingleNode and selectNodes. For applications based on and XML documents originating from MSXML, this works just fine. However, this doesn’t follow the W3C standards for interacting with XML documents or exposing XPath.

To accommodate a diversity of browsers, sites and libraries wrap XPath calls to switch to the right implementation. If you search for XPath examples or tutorials, you’ll immediately find results that check for IE-specific code to use MSXML for evaluating the query in a non-interoperable way:

It seems like a long time ago that a relatively senior Microsoft staffer told me that turning a battleship like MS takes time. No change, however important, is going to happen quickly. Just the way things are in a large organization.

The important thing to remember is that once change starts, that too takes on a certain momentum and so is more likely to continue, even though it was hard to get started.

Yes, I am sure the present steps towards greater interoperability could have gone further, in another direction, etc. but they didn’t. Rather than complain about the present change for the better, why not use that as a wedge to push for greater support for more recent XML standards?

For my part, I guess I need to get a copy of Windows 10 on a VM so I can volunteer as a beta tester for full XPath (XQuery?/XSLT?) support in a future web browser. MS as a full XML competitor and possible source of open source software would generate some excitement in the XML community!

UI Events (Formerly DOM Level 3 Events) Draft Published

Thursday, March 19th, 2015

UI Events (Formerly DOM Level 3 Events) Draft Published

From the post:

The Web Applications Working Group has published a Working Draft of UI Events (formerly DOM Level 3 Events). This specification defines UI Events which extend the DOM Event objects defined in DOM4. UI Events are those typically implemented by visual user agents for handling user interaction such as mouse and keyboard input. Learn more about the Rich Web Client Activity.

If you are planning on building rich web clients, now would be the time to start monitoring W3C drafts in this area. To make sure your use cases are met.

People have different expectations with regard to features and standards quality. Make sure your expectations are heard.

DYI Web Server

Monday, March 16th, 2015

Let’s Build A Web Server. Part 1. by Ruslan Spivak.

From the post:

Out for a walk one day, a woman came across a construction site and saw three men working. She asked the first man, “What are you doing?” Annoyed by the question, the first man barked, “Can’t you see that I’m laying bricks?” Not satisfied with the answer, she asked the second man what he was doing. The second man answered, “I’m building a brick wall.” Then, turning his attention to the first man, he said, “Hey, you just passed the end of the wall. You need to take off that last brick.” Again not satisfied with the answer, she asked the third man what he was doing. And the man said to her while looking up in the sky, “I am building the biggest cathedral this world has ever known.” While he was standing there and looking up in the sky the other two men started arguing about the errant brick. The man turned to the first two men and said, “Hey guys, don’t worry about that brick. It’s an inside wall, it will get plastered over and no one will ever see that brick. Just move on to another layer.”1

The moral of the story is that when you know the whole system and understand how different pieces fit together (bricks, walls, cathedral), you can identify and fix problems faster (errant brick).

What does it have to do with creating your own Web server from scratch?

I believe to become a better developer you MUST get a better understanding of the underlying software systems you use on a daily basis and that includes programming languages, compilers and interpreters, databases and operating systems, web servers and web frameworks. And, to get a better and deeper understanding of those systems you MUST re-build them from scratch, brick by brick, wall by wall. (emphasis in original)

You probably don’t want to try this with an office suite package but for a basic web server this could be fun!

More installments to follow.

Enjoy!

Google Chrome (Version 41.0.2272.89 (64-bit)) WARNING!

Saturday, March 14th, 2015

An update of Google Chrome on Ubuntu this morning took my normal bookmark manager list of small icons and text to:

google-bookmarks

What do the kids say these days?

That sucks!

Some of you may prefer the new display. Good for you.

As far as I can tell, Chrome does not offer an option to revert to the previous display.

I keep quite a few bookmarks with an active blog so the graphic images are a waste of screen space and force me to scroll far more often than otherwise. I often work with the bookmark manager open in a separate screen.

For people who like this style, great. My objection is to it being forced on users who may prefer the prior style of bookmarks.

Here’s your design tip for the day: Don’t help users without giving them the ability to decline the help. Especially with display features.

Redefining “URL” to Invalidate Twenty-One (21) Years of Usage

Saturday, February 21st, 2015

You may be interested to know that efforts are underway to bury the original meaning of URL and to replace it with another meaning.

Our trail starts with the HTML 5 draft of 17 December 2012, which reads in part:

2.6 URLs

This specification defines the term URL, and defines various algorithms for dealing with URLs, because for historical reasons the rules defined by the URI and IRI specifications are not a complete description of what HTML user agents need to implement to be compatible with Web content.

The term “URL” in this specification is used in a manner distinct from the precise technical meaning it is given in RFC 3986. Readers familiar with that RFC will find it easier to read this specification if they pretend the term “URL” as used herein is really called something else altogether. This is a willful violation of RFC 3986. [RFC3986]

2.6.1 Terminology

A URL is a string used to identify a resource.

A URL is a valid URL if at least one of the following conditions holds:

  • The URL is a valid URI reference [RFC3986].
  • The URL is a valid IRI reference and it has no query component. [RFC3987]
  • The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]
  • The URL is a valid IRI reference and the character encoding of the URL’s Document is UTF-8 or a UTF-16 encoding. [RFC3987]

You may not like the usurpation of URL and its meaning but at least it is honestly reported.

Compare Editor’s Draft 13 November 2014, which reads in part:

2.5 URLs

2.5.1 Terminology

A URL is a valid URL if it conforms to the authoring conformance requirements in the WHATWG URL standard. [URL]

A string is a valid non-empty URL if it is a valid URL but it is not the empty string.

Hmmm, all the references to IRIs and violating RFC3986 has disappeared.

But there is a reference to the WHATWG URL standard.

If you follow that internal link to the bibliography you will find:

[URL]
URL (URL: http://url.spec.whatwg.org/), A. van Kesteren. WHATWG.

Next stop: URL Living Standard — Last Updated 6 February 2015, which reads in part:

The URL standard takes the following approach towards making URLs fully interoperable:

  • Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process. (E.g. spaces, other “illegal” code points, query encoding, equality, canonicalization, are all concepts not entirely shared, or defined.) URL parsing needs to become as solid as HTML parsing. [RFC3986] [RFC3987]
  • Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

A specification being developed by WHATWG.org.

Not nearly as clear and forthcoming as the HTML5 draft as of 17 December 2012. Yes?

RFC3986 and RFC3987 are products of the IETF. If revisions of those RFCs are required, shouldn’t that work be at IETF?

Or at a minimum, why is a foundation for HTML5 not at the W3C, if not at IETF?

The conflating URLs (RFC3986) and IRIs (RFC3987) is taking place well away from the IETF and W3C processes.

A conflation that invalidates twenty-one (21) years of use of URL in books, papers, presentations, documentation, etc.

BTW, URL was originally defined in 1994 in RFC1738.

Is popularity of an acronym worth that cost?

One Week of Harassment on Twitter

Thursday, January 29th, 2015

One Week of Harassment on Twitter by Anita Sarkeesian.

From the post:

Ever since I began my Tropes vs Women in Video Games project, two and a half years ago, I’ve been harassed on a daily basis by irate gamers angry at my critiques of sexism in video games. It can sometimes be difficult to effectively communicate just how bad this sustained intimidation campaign really is. So I’ve taken the liberty of collecting a week’s worth of hateful messages sent to me on Twitter. The following tweets were directed at my @femfreq account between 1/20/15 and 1/26/15.

The limited vocabularies of the posters to one side, one hundred and fifty-six (156) hate messages is an impressive number. I pay no more attention to postings by illiterates than I do to cat pictures but I can understand why that would get to be a drag.

Many others have commented more usefully on the substance of this topic than I can but as a technical matter, how would you:

  • Begin to ferret out the origins and backgrounds on these posters?
  • Automate response networks (use your imagination about the range of responses)?
  • Automate filtering for an account under such attacks?

Lacking any type of effective governance structure, think any unexplored and ungoverned territory, security and safety on the Internet is a question of alliances for mutual protection. Eventually governance will evolve for the Internet but since that will require relinquishing of some national sovereignty, I don’t expect to see it in our lifetimes.

In the meantime, we need stop-gap measures that can set the tone for the governance structures that will eventually evolve.

Suggestions?

PS: Some people urge petitioning current governments for protection. Since their interests are in inherent conflict with the first truly transnational artifact (the Internet), I don’t see that as being terribly useful. I prefer whatever other stick comes to hand.

I first saw this in a tweet by kottke.org.

The Cobweb: Can the Internet be archived?

Monday, January 26th, 2015

The Cobweb: Can the Internet be archived? by Jill Lepore.

From the post:

Malaysia Airlines Flight 17 took off from Amsterdam at 10:31 A.M. G.M.T. on July 17, 2014, for a twelve-hour flight to Kuala Lumpur. Not much more than three hours later, the plane, a Boeing 777, crashed in a field outside Donetsk, Ukraine. All two hundred and ninety-eight people on board were killed. The plane’s last radio contact was at 1:20 P.M. G.M.T. At 2:50 P.M. G.M.T., Igor Girkin, a Ukrainian separatist leader also known as Strelkov, or someone acting on his behalf, posted a message on VKontakte, a Russian social-media site: “We just downed a plane, an AN-26.” (An Antonov 26 is a Soviet-built military cargo plane.) The post includes links to video of the wreckage of a plane; it appears to be a Boeing 777.

Two weeks before the crash, Anatol Shmelev, the curator of the Russia and Eurasia collection at the Hoover Institution, at Stanford, had submitted to the Internet Archive, a nonprofit library in California, a list of Ukrainian and Russian Web sites and blogs that ought to be recorded as part of the archive’s Ukraine Conflict collection. Shmelev is one of about a thousand librarians and archivists around the world who identify possible acquisitions for the Internet Archive’s subject collections, which are stored in its Wayback Machine, in San Francisco. Strelkov’s VKontakte page was on Shmelev’s list. “Strelkov is the field commander in Slaviansk and one of the most important figures in the conflict,” Shmelev had written in an e-mail to the Internet Archive on July 1st, and his page “deserves to be recorded twice a day.”

On July 17th, at 3:22 P.M. G.M.T., the Wayback Machine saved a screenshot of Strelkov’s VKontakte post about downing a plane. Two hours and twenty-two minutes later, Arthur Bright, the Europe editor of the Christian Science Monitor, tweeted a picture of the screenshot, along with the message “Grab of Donetsk militant Strelkov’s claim of downing what appears to have been MH17.” By then, Strelkov’s VKontakte page had already been edited: the claim about shooting down a plane was deleted. The only real evidence of the original claim lies in the Wayback Machine.

If you aren’t a daily user of the the Internet Archive (home of the WayBack Machine) you are missing out on a very useful resource.

Jill tells the story about the archive, its origins and challenges as well as I have heard it told. Very much worth your time to read.

Hopefully after reading the story you will find ways to contribute/support the Internet Archive.

Without the Internet Archive, the memory of the web would be distributed, isolated and in peril of erasure and neglect.

I am sure many governments and corporations wish the memory of the web could be altered, let’s disappoint them!

Crawling the WWW – A $64 Question

Saturday, January 24th, 2015

Have you ever wanted to crawl the WWW? To make a really comprehensive search? Waiting for a private power facility and server farm? You need wait no longer!

Ross Fairbanks details in WikiReverse data pipeline details the creation of Wikireverse:

WikiReverse is a reverse web-link graph for Wikipedia articles. It consists of approximately 36 million links to 4 million Wikipedia articles from 900,000 websites.

You can browse the data at WikiReverse or downloaded from S3 as a torrent.

The first thought that struck me was the data set would be useful for deciding which Wikipedia links are the default subject identifiers for particular subjects.

My second thought was what a wonderful starting place to find links with similar content strings, for the creation of topics with multiple subject identifiers.

My third thought was, $64 to search a CommonCrawl data set!

You can do a lot of searches at $64 per before you get to the cost of a server farm, much less a server farm plus a private power facility.

True, it won’t be interactive but then few searches at the NSA are probably interactive. 😉

The true upside being you are freed from the tyranny of page-rank and hidden algorithms by which vendors attempt to guess what is best for them and secondarily, what is best for you.

Take the time to work through Ross’ post and develop your skills with the CommonCrawl data.

Spatial Data on the Web Working Group

Monday, January 12th, 2015

Spatial Data on the Web Working Group

From the webpage:

The mission of the Spatial Data on the Web Working Group is to clarify and formalize the relevant standards landscape. In particular:

  • to determine how spatial information can best be integrated with other data on the Web;
  • to determine how machines and people can discover that different facts in different datasets relate to the same place, especially when ‘place’ is expressed in different ways and at different levels of granularity;
  • to identify and assess existing methods and tools and then create a set of best practices for their use;

where desirable, to complete the standardization of informal technologies already in widespread use.

The Spatial Data on the Web WG is part of the Data Activity and is explicitly chartered to work in collaboration with the Open Geospatial Consortium (OGC), in particular, the Spatial Data on the Web Task Force of the Geosemantics Domain Working Group. Formally, each standards body has established its own group with its own charter and operates under the respective organization’s rules of membership, however, the ‘two groups’ will work together very closely and create a set of common outputs that are expected to be adopted as standards by both W3C and OGC and to be jointly branded.

Read the charter and join the Working Group.

Just when I think the W3C has broken free of RDF/OWL, I see one of the deliverables is “OWL Time Ontology.”

Some people never give up.

There is a bright shiny lesson about the success of deep learning. It doesn’t start with any rules. Just like people don’t start with any rules.

Logic isn’t how we get anywhere. Logic is how we justify our previous arrival.

Do you see the difference?

I first saw this in a tweet by Marin Dimitrov.

Wouldn’t it be fun to build your own Google?

Thursday, December 11th, 2014

Wouldn’t it be fun to build your own Google? by Martin Kleppmann.

Martin writes:

Imagine you had your own copy of the entire web, and you could do with it whatever you want. (Yes, it would be very expensive, but we’ll get to that later.) You could do automated analyses and surface the results to users. For example, you could collate the “best” articles (by some definition) written on many different subjects, no matter where on the web they are published. You could then create a tool which, whenever a user is reading something about one of those subjects, suggests further reading: perhaps deeper background information, or a contrasting viewpoint, or an argument on why the thing you’re reading is full of shit.

Unfortunately, at the moment, only Google and a small number of other companies that have crawled the web have the resources to perform such analyses and build such products. Much as I believe Google try their best to be neutral, a pluralistic society requires a diversity of voices, not a filter bubble controlled by one organization. Surely there are people outside of Google who want to work on this kind of thing. Many a start-up could be founded on the basis of doing useful things with data extracted from a web crawl.

He goes on to discuss current search efforts such a Common Crawl and Wayfinder before hitting full stride with his suggestion for a distributed web search engine. Painting in the broadest of strokes, Martin makes it sound almost plausible to contemplate such an effort.

While conceding the technological issues would be many, it is contended that the payoff would be immense, but in ways we won’t know until it is available. I suspect Martin is right but if so, then we should be able to see a similar impact from Common Crawl. Yes?

Not to rain on a parade I would like to join, but extracting value from a web crawl like Common Crawl is not a guaranteed thing. A more complete crawl of the web only multiplies those problems, it doesn’t make them easier to solve.

On the whole I think the idea of a distributed crawl of the web is a great idea, but while that develops, we best hone our skills at extracting value from the partial crawls that already exist.

Eurotechnopanic

Saturday, November 29th, 2014

Eurotechnopanic by Jeff Jarvis.

From the post:

I worry about Germany and technology. I fear that protectionism from institutions that have been threatened by the internet — mainly media giants and government — and the perception of a rising tide of technopanic in the culture will lead to bad law, unnecessary regulation, dangerous precedents, and a hostile environment that will make technologists, investors, and partners wary of investing and working in Germany.

I worry, too, about Europe and technology. Germany’s antiprogress movement is spreading to the EU — see its court’s decision creating a so-called right to be forgotten — as well as to members of the EU — see Spain’s link tax.

I worry mostly about damage to the internet, its freedoms and its future, limiting the opportunities an open net presents to anyone anywhere. Three forces are at work endangering the net: control, protectionism, and technopanic.

Jeff pens a wonderful essay and lingers the longest on protectionism and eurotechnopanic. His essay is filled with examples both contemporary and from history. Except for EU officials who are too deep into full panic to listen, it is very persuasive.

Jeff proposes a four-part plan for Google to overcome eurotechnopanic:

  • Address eurotechnopanic at a cultural and political level
  • Invest in innovation in German startups
  • Teach and explain the benefits of sharing information
  • Share the excitement of the net and technology

I would like to second all of those points, but Jeff forgets that German economic and social stability are the antithesis of the genetic makeup of Google.

Take one of Jeff’s recommendations: Invest in innovation in German startups.

Really? Show of hands. How many people have known German startups with incompetent staff who could not be fired?

Doubtful on that score?

Terminating Employees in Germany is extremely complicated and subject to a myriad of formal and substantive requirements. Except for small businesses, employers as generally required to show cause and are not free to select whom to terminate. Social criteria such as seniority, age, and number of dependants must be considered. Termination of employees belonging to certain classes such as pregnant women and people with disabilities requires prior involvement of and approval from government agencies. If a workers’ council has been established, it must be heard prior to terminating an employee in most instances. It is good practice to involve counsel already in the preparation of any termination or layoff. Any termination or layoff will most likely trigger a lawsuit. Most judges are employee friendly and most employees have insurance to cover their attorney fees. SIEGWART GERMAN AMERICAN LAW advises employers on all issues regarding termination of employment and alternative buyout strategies in Germany.

Notice Requirements: Even if the employer can show cause, the employee must be given notice. Notice periods under German law, which can be found in statutes and collective bargaining agreements, vary depending on seniority and can be more than six months long.

Firing Employees in Germany: Employees can be fired for good cause under extraordinary circumstances. Counsel should be involved immediately since the right to fire an employee is waived if the employer does not act within two weeks. SIEGWART GERMAN AMERICAN LAW has the experience and expertise to evaluate the circumstances of your case, develop a strategy, and make sure all formal requirements are timely met. [None of this is legal advice. Drawn from: http://www.siegwart-law.com/Sgal-en/lawyer-german-employment-law-germany.html]

That’s just not the world Google lives in. Not to say innovative work doesn’t happen in Germany and the EU because it does. But that innovative work is despite the government and not fostered by it.

Google should address eurotechnopanic by relocating bright Europeans to other countries that are startup and innovation friendly. Not necessarily to the United States. The future economic stars are said to be India, China, Korea, all places with innovative spirits and good work ethics.

Eventually citizens of the EU and the German people in particular will realize they have been betrayed by people seeking to further their own careers at the expense of their citizens.

PS: I wonder how long German banking would survive if the ISPs and Telcos decided enough was enough? Parochialism isn’t something that should be long tolerated.

505 Million Internet Censors and Growing (EU) – Web Magna Carta

Friday, November 28th, 2014

EU demands ‘right to be forgotten’ be applied globally by James Temperton.

From the post:

Google will be told to extend the “right to be forgotten” outside the EU in a move that is likely to cause a furrowing of brows at the search giant. EU regulators are looking to close a loophole in the controversial online privacy legislation that effectively renders it useless.

Currently people only have the right to be forgotten on versions of Google within the EU. Anyone wanting to see uncensored Google searches can simply use the US version of Google instead. The EU has taken a tough line against Google, expressing annoyance at its approach to removing search results.

The right to be forgotten allows people to remove outdated, irrelevant or misleading web pages from search results relating to their name. The EU will now ask Google to apply the ruling to the US version of its site, sources told Bloomberg Businessweek.

The latest demographic report shows the EU with five hundred and five million potential censors of search results with more on the way.

Not content to be an island of ignorance, the EU now wants to censor search results on behalf of the entire world.

In other words, 7.3% of the world’s population will decide what search results can be seen by the other 92.7%.

Tim Berners-Lee’s call for a new Magna Carta won’t help with this problem:

Berners-Lee’s Magna Carta plan is to be taken up as part of an initiative called “the web we want”, which calls on people to generate a digital bill of rights in each country – a statement of principles he hopes will be supported by public institutions, government officials and corporations.

A statement of principles? Really?

As I recall the original Magna Carta, a group of feudal lords forced King John to accept the agreement. Had the original Magna Carta been only a statement of principles, it would not be remembered today. It was the enforcement of those principles that purchased its hold on our imaginations.

So long as the Web is subject to the arbitrary and capricious demands of geographically bounded government entities, it will remain a hostage of those governments.

We have a new class of feudal lords, the international ISPs. The real question is whether they will take up a Magna Carta for the Web and enforce its terms on geographically bounded government entities?

The first provision of such a Magna Carta should not be:

1. Freedom of expression online and offline
(From: https://webwewant.org/about_us)

The first provision should be:

1. No government shall interfere with the delivery of content of any nature to any location connected to the Web. (Governments are free to penalize receipt or possession of information but in no way shall hinder its transfer and delivery on the Web.)

Such that when some country decides Google or others must censor information, which is clearly interference with delivery of content, the feudal lords, in this case, ISPs, will terminate all Internet access for that country.

It will be amusing to see how long spy agencies, telephone services, banks, etc., can survive without free and unfettered access to the global network.

The global network deserves and needs a global network governance structure separate and apart from existing government infrastructures. Complete with a court system with a single set of laws and regulations, an assembly to pass laws and other structures as are needful.

Don’t look so surprised. It is a natural progression from small hamlets to larger regional governance and to the geographically bounded governments of today. Which have proven themselves to be best at carry for their own rather than their citizens. A global service like the Net needs global governance and there are no existing bodies competent to take up the mantle.*

ISPs need to act as feudal lords to free themselves and by implication us, from existing government parochialism. Only then will we realize the full potential of the Web.

* You may be tempted to say the United Nations could govern the Web but consider that the five (5) permanent members of the Security Council can veto any resolution they care block from the other one hundred and ninety three (193) members. Having the ISPs govern the Web would be about as democratic, if not more so.

Launching in 2015: A Certificate Authority to Encrypt the Entire Web

Tuesday, November 18th, 2014

Launching in 2015: A Certificate Authority to Encrypt the Entire Web by Peter Eckersley.

From the post:

encrypt the web

Today EFF is pleased to announce Let’s Encrypt, a new certificate authority (CA) initiative that we have put together with Mozilla, Cisco, Akamai, Identrust, and researchers at the University of Michigan that aims to clear the remaining roadblocks to transition the Web from HTTP to HTTPS.

Although the HTTP protocol has been hugely successful, it is inherently insecure. Whenever you use an HTTP website, you are always vulnerable to problems, including account hijacking and identity theft; surveillance and tracking by governments, companies, and both in concert; injection of malicious scripts into pages; and censorship that targets specific keywords or specific pages on sites. The HTTPS protocol, though it is not yet flawless, is a vast improvement on all of these fronts, and we need to move to a future where every website is HTTPS by default.With a launch scheduled for summer 2015, the Let’s Encrypt CA will automatically issue and manage free certificates for any website that needs them. Switching a webserver from HTTP to HTTPS with this CA will be as easy as issuing one command, or clicking one button.

The biggest obstacle to HTTPS deployment has been the complexity, bureaucracy, and cost of the certificates that HTTPS requires. We’re all familiar with the warnings and error messages produced by misconfigured certificates. These warnings are a hint that HTTPS (and other uses of TLS/SSL) is dependent on a horrifyingly complex and often structurally dysfunctional bureaucracy for authentication.

This shouldn’t bother US security services since they were only gathering metadata and not content. Yes? 😉

The Wikipedia article on HTTPS reads in part:

Because HTTPS piggybacks HTTP entirely on top of TLS, the entirety of the underlying HTTP protocol can be encrypted. This includes the request URL (which particular web page was requested), query parameters, headers, and cookies (which often contain identity information about the user). However, because host (website) addresses and port numbers are necessarily part of the underlying TCP/IP protocols, HTTPS cannot protect their disclosure. In practice this means that even on a correctly configured web server, eavesdroppers can infer the IP address and port number of the web server (sometimes even the domain name e.g. www.example.org, but not the rest of the URL) that one is communicating with as well as the amount (data transferred) and duration (length of session) of the communication, though not the content of the communication.

No guarantees of security but it is a move in the right direction.

I first saw this in a tweet by Tim Bray.

Terms of Service

Saturday, November 8th, 2014

Terms of Service: understanding our role in the world of Big Data by Michael Keller and Josh Neufeld.

Caution: Readers of Terms of Service will discover they are products and only incidentally consumers of digital services. Surprise, dismay, depression, and despair are common symptoms post-reading. You have been warned.

Al Jazeera uses a comic book format to effectively communicate privacy issues raised by Big Data, the Internet of Things, the Internet, and “free” services.

The story begins with privacy concerns over scanning of Gmail content (remember that?) and takes the reader up to present and likely future privacy concerns.

I quibble with the example of someone being denied a loan because they failed to exercise regularly. The authors innocently assume that banks make loans with the intention of being repaid. That’s the story in high school economics but a long way from how lending works in practice.

The recent mortgage crisis in the United States was caused by banks inducing borrowers to over state their incomes, financing a home loan and its down payment, etc. Banks don’t keep such loans but package them as securities which they then foist off onto others. Construction companies make money building the houses, local government gain tax revenue, etc. Basically a form of churn.

But the authors are right that in some theoretical economy loans could be denied because of failure to exercise. Except that would exclude such a large market segment in the United States. Did you know they are about to change the words “…the land of the free…” to “…the land of the obese…?”

That is a minor quibble about what is overall a great piece of work. In only forty-six (46) pages it brings privacy issues into a sharper focus than many longer and more turgid works.

Do you know of any comparable exposition on privacy and Big Data/Internet?

Suggest it for conference swag/holiday present. Write to Terms-of-Service.

I first saw this in a tweet by Gregory Piatetsky.

HTML5 is a W3C Recommendation

Tuesday, October 28th, 2014

HTML5 is a W3C Recommendation

From the post:

(graphic omitted) The HTML Working Group today published HTML5 as W3C Recommendation. This specification defines the fifth major revision of the Hypertext Markup Language (HTML), the format used to build Web pages and applications, and the cornerstone of the Open Web Platform.

Today we think nothing of watching video and audio natively in the browser, and nothing of running a browser on a phone,” said Tim Berners-Lee, W3C Director. “We expect to be able to share photos, shop, read the news, and look up information anywhere, on any device. Though they remain invisible to most users, HTML5 and the Open Web Platform are driving these growing user expectations.

HTML5 brings to the Web video and audio tracks without needing plugins; programmatic access to a resolution-dependent bitmap canvas, which is useful for rendering graphs, game graphics, or other visual images on the fly; native support for scalable vector graphics (SVG) and math (MathML); annotations important for East Asian typography (Ruby); features to enable accessibility of rich applications; and much more.

The HTML5 test suite, which includes over 100,000 tests and continues to grow, is strengthening browser interoperability. Learn more about the Test the Web Forward community effort.

With today’s publication of the Recommendation, software implementers benefit from Royalty-Free licensing commitments from over sixty companies under W3C’s Patent Policy. Enabling implementers to use Web technology without payment of royalties is critical to making the Web a platform for innovation.

Read the Press Release, testimonials from W3C Members, and
acknowledgments. For news on what’s next after HTML5, see W3C CEO Jeff Jaffe’s blog post: Application Foundations for the Open Web Platform. We also invite you to check out our video Web standards for the future.

Just in case you have been holding off on HTML5 until it became an W3C Recommendation. 😉

Enjoy!

The Anatomy of a Large-Scale Hypertextual Web Search Engine (Ambiguity)

Saturday, October 25th, 2014

If you search for “The Anatomy of a Large-Scale Hypertextual Web Search Engine” by Sergey Brin and Lawrence Page, will you get the “long” version or the “short” version?

The version found at: http://infolab.stanford.edu/pub/papers/google.pdf reports in its introduction:

(Note: There are two versions of this paper — a longer full version and a shorter printed version. The full version is available on the web and the conference CD-ROM.)

However, it doesn’t say whether it is the “longer full version” or the “shorter printed version.” Length, twenty (20) pages.

The version found at: http://snap.stanford.edu/class/cs224w-readings/Brin98Anatomy.pdf claims the following citation: “Computer Networks and ISDN Systems 30 (1998) 107-117.” Length, eleven (11) pages. It “looks” like a journal printed article.

Ironic that the search engine fails to distinguish between these two versions of such an important paper.

Perhaps the search confusion is justified to some degree because Lawrence Page’s publications at: http://research.google.com/pubs/LawrencePage.html reports:

Lawrence Page pub info

But if you access the PDF, you get the twenty (20) page version, not the eleven page version published at: Computer Networks and ISDN Systems 30 (1998) 107-117.

BTW, if you want to automatically distinguish the files, the file sizes on the two versions referenced above are: 123603 (the twenty (20) page version) and 1492735 (the eleven (11) page version). (The published version has the publisher logo, etc. that boosts the file size.)

If Google had a mechanism to accept explicit crowd input, that confusion and the typical confusion between slides and papers with the same name could be easily solved.

The first reader who finds either the paper or slides, types it as paper or slides. The characteristics of that file become the basis for distinguishing those files into paper or slides. When the next searcher is returned results including those files, they get a pointer to paper or slides?

If they don’t submit a change for paper or slides, that distinction becomes more certain.

I don’t know what the carrot would be for typing resources returned in search results, perhaps five (5) minutes of freedom from ads! 😉

Thoughts?

I first saw this in a tweet by onepaperperday.

Supporting Open Annotation

Friday, October 10th, 2014

Supporting Open Annotation by Gerben.

From the post:

In its mission to connect the world’s knowledge and thoughts, the solution Hypothes.is pursues is a web-wide mechanism to create, share and discover annotations. One of our principal steps towards this end is providing a browser add-on that works with our annotation server, enabling people to read others’ annotations on any web page they visit, and to publish their own annotations for others to see.

I spent my summer interning at Hypothes.is to work towards a longer term goal, taking annotation sharing to the next level: an open, decentralised approach. In this post I will describe how such an approach could work, how standardisation efforts are taking off to get us there, and how we are involved in making this happen – the first step being support for the preliminary Open Annotation data model.

An annotation ecosystem

While we are glad to provide a service enabling people to create and share annotations on the web, we by no means want to become the sole annotation service provider, as this would imply creating a monopoly position and a single point of failure. Rather, we encourage anyone to build annotation tools and services, possibly using the same code we use. Of course, a problematic consequence of having multiple organisations each running separate systems is that even more information silos emerge on the web, and quite likely the most popular service would obtain a monopoly position after all.

To prevent either fragmentation or monopolisation of the world’s knowledge, we would like an ecosystem to evolve, comprising interoperable annotation services and client implementations. Such an ecosystem would promote freedom of innovation, prevent dependence on a single party, and provide scalability and robustness. It would be like the architecture of the web itself.

Not a bad read if you accept the notion that interoperable annotation servers are an acceptable architecture for annotation of web resources.

Me? I would just as soon put:

target

on my annotation and mail the URL to the CIA, FBI, NSA and any foreign intelligence agencies that I can think of with a copy of my annotaton.

You can believe that government agencies will follow the directives of Congress with regard to spying on United States citizens, but then that was already against the law. Remember the old saying, “Fool me once, shame on you. Fool me twice, shame on me.”? That is applicable to government surveillance.

We need robust annotation mechanisms but not ones that make sitting targets out of our annotations. Local, encrypted annotation mechanisms that can cooperate with other local, encrypted annotation mechanisms would be much more attractive to me.

I first saw this in a tweet by Ivan Herman.

WWW 2015 Call for Research Papers

Saturday, September 20th, 2014

WWW 2015 Call for Research Papers

From the webpage:

Important Dates:

  • Research track abstract registration:
    Monday, November 3, 2014 (23:59 Hawaii Standard Time)
  • Research track full paper submission:
    Monday, November 10, 2014 (23:59 Hawaii Standard Time)
  • Notifications of acceptance:
    Saturday, January 17, 2015
  • Final Submission Deadline for Camera-ready Version:
    Sunday, March 8, 2015
  • Conference dates:
    May 18 – 22, 2015

Research papers should be submitted through EasyChair at:
https://easychair.org/conferences/?conf=www2015

For more than two decades, the International World Wide Web (WWW) Conference has been the premier venue for researchers, academics, businesses, and standard bodies to come together and discuss latest updates on the state and evolutionary path of the Web. The main conference program of WWW 2015 will have 11 areas (or themes) for refereed paper presentations, and we invite you to submit your cutting-edge, exciting, new breakthrough work to the relevant area. In addition to the main conference, WWW 2015 will also have a series of co-located workshops, keynote speeches, tutorials, panels, a developer track, and poster and demo sessions.

The list of areas for this year is as follows:

  • Behavioral Analysis and Personalization
  • Crowdsourcing Systems and Social Media
  • Content Analysis
  • Internet Economics and Monetization
  • Pervasive Web and Mobility
  • Security and Privacy
  • Semantic Web
  • Social Networks and Graph Analysis
  • Web Infrastructure: Datacenters, Content Delivery Networks, and Cloud Computing
  • Web Mining
  • Web Search Systems and Applications

Great conference, great weather (weather for Florence in May) and it is in Florence, Italy. What other reasons do you need to attend? 😉

The growing problem of “link rot” and best practices for media and online publishers

Sunday, September 14th, 2014

The growing problem of “link rot” and best practices for media and online publishers by Leighton Walter Kille.

From the post:

The Internet is an endlessly rich world of sites, pages and posts — until it all ends with a click and a “404 not found” error message. While the hyperlink was conceived in the 1960s, it came into its own with the HTML protocol in 1991, and there’s no doubt that the first broken link soon followed.

On its surface, the problem is simple: A once-working URL is now a goner. The root cause can be any of a half-dozen things, however, and sometimes more: Content could have been renamed, moved or deleted, or an entire site could have evaporated. Across the Web, the content, design and infrastructure of millions of sites are constantly evolving, and while that’s generally good for users and the Web ecosystem as a whole, it’s bad for existing links.

In its own way, the Web is also a very literal-minded creature, and all it takes is a single-character change in a URL to break a link. For example, many sites have stopped using “www,” and even if their content remains the same, the original links may no longer work. The rise of CMS platforms such as WordPress have led to the fall of static HTML sites with their .htm and .html extensions, and with each relaunch, untold thousands of links die.

Even if a core URL remains the same, many sites frequently append login information or search terms to URLs, and those are ephemeral. And as the Web has grown, the problem has been complicated by Google and other search engines that crawl the Web and archive — briefly — URLs and pages. Many work, but their long-term stability is open to question.

Hmmm, link rot, do you think that impacts the Semantic Web? 😉

If you can have multiple IRI’s for the same subject, well, you can have a different result.

Leighton has a number of suggestions to lessen your own link rot. For the link rot (as far as identifiers) of others, I suggest topic maps.

I first saw this at Full Text Reports as: Website linking: The growing problem of “link rot” and best practices for media and online publishers.