Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

June 8, 2014

Laboratory for Web Algorithmics

Filed under: Algorithms,Search Algorithms,Search Engines,Webcrawler — Patrick Durusau @ 2:53 pm

Laboratory for Web Algorithmics

From the homepage:

The Laboratory for Web Algorithmics (LAW) was established in 2002 at the Dipartimento di Scienze dell’Informazione (now merged in the Computer Science Department) of the Università degli studi di Milano.

The LAW is part of the NADINE FET EU project.

Research at LAW concerns all algorithmic aspects of the study of the web and of social networks. More in detail…

The details include:

  • High-performance web crawling: including an open source web crawler
  • Compression of web graphs and social networks: compression of web crawling results
  • Analysis of web graphs and social networks: research and algorithms for exploration of web graphs

Deeply impressive project and one with several papers and resources that I will be covering in more detail in future posts.

I first saw this in a tweet by Network Fact.

The Clojure Style Guide

Filed under: Clojure,Programming,Search Engines — Patrick Durusau @ 1:36 pm

The Clojure Style Guide by Bozhidar Batsov.

From the webpage:

Role models are important.
— Officer Alex J. Murphy / RoboCop

This Clojure style guide recommends best practices so that real-world Clojure programmers can write code that can be maintained by other real-world Clojure programmers. A style guide that reflects real-world usage gets used, and a style guide that holds to an ideal that has been rejected by the people it is supposed to help risks not getting used at all — no matter how good it is.

The guide is separated into several sections of related rules. I’ve tried to add the rationale behind the rules (if it’s omitted, I’ve assumed that it’s pretty obvious).

I didn’t come up with all the rules out of nowhere; they are mostly based on my extensive career as a professional software engineer, feedback and suggestions from members of the Clojure community, and various highly regarded Clojure programming resources, such as “Clojure Programming” and “The Joy of Clojure“.

The guide is still a work in progress; some sections are missing, others are incomplete, some rules are lacking examples, some rules don’t have examples that illustrate them clearly enough. In due time these issues will be addressed — just keep them in mind for now.

Please note that the Clojure developing community maintains a list of coding standards for libraries, too.

You can generate a PDF or an HTML copy of this guide using Transmuter.

This is another case where Ungoogleable Symbols from Clojure may be of interest.

A good index to Clojure resources needs to overcome the limitations of Google's search engine as well as others.

I first saw this in a tweet by LE Minh Triet.

June 7, 2014

A heuristic for sorting science stories in the news

Filed under: News,Reporting — Patrick Durusau @ 7:49 pm

A heuristic for sorting science stories in the news by David Spiegelhalter.

From the post:

Dominic Lawson’s article in the Sunday Times today [paywall] quotes me as having the rather cynical heuristic: “the very fact that a piece of health research appears in the papers indicates that it is nonsense.” I stand by this, but after a bit more consideration I would like to suggest a slightly more refined version for dealing with science stories in the news, particularly medical ones.

Ask yourself: if the study had come up with a negative result, would I be hearing about it? If NO, then don’t bother to read or listen to the story

(emphasis in the original)

This is a great post and deserves to be read in full.

After reading it, how would you answer this question: Would you use the same criteria for social media reports?

Granted, there is a lot of noise in some social media streams, but at the same time, some streams are quite high quality.

As for the “mainstream” news, you are dumber for having heard it.

Lifehack: using your microwave oven against NSA snooping

Filed under: Humor — Patrick Durusau @ 7:35 pm

Lifehack: using your microwave oven against NSA snooping

The author says placing your cellphone in a microwave oven will prevent tracking of your cellphone. The oven acts as a Faraday cage.

Perhaps so, but tracking someone walking around with a microwave oven under their arm isn’t going to be very difficult.

Yes?

Enhanced Clojure Cheatsheet

Filed under: Clojure,Functional Programming,Programming — Patrick Durusau @ 7:19 pm

Enhanced Clojure Cheatsheet

Clojure Weekly, June 3rd, 2014 reports:

An enhanced version of the official clojure.org/cheatsheet with added popup tooltip and search box! The cheatsheet is quite useful to have a glance of the standard library functions grouped by functionality type. With this one you just hover over a function to see the docs and typing in the search box will filter down the list. Thanks Andy Fingerhut for posting this in the group list.

I like the pop-ups but wish they contained embedded links.

Enjoy!

Introducing the Solr Scale Toolkit

Filed under: Lucene,SolrCloud — Patrick Durusau @ 7:05 pm

Introducing the Solr Scale Toolkit by Timothy Potter.

From the post:

SolrCloud is a set of features in Apache Solr that enable elastic scaling of distributed search indexes using sharding and replication. One of the hurdles to adopting SolrCloud has been the lack of tools for deploying and managing a SolrCloud cluster. In this post, I introduce the Solr Scale Toolkit, an open-source project sponsored by LucidWorks (www.lucidworks.com), which provides tools and guidance for deploying and managing SolrCloud in cloud-based platforms such as Amazon EC2. In the last section, I use the toolkit to run some performance benchmarks against Solr 4.8.1 to see just how “scalable” Solr really is.

Motivation

When you download a recent release of Solr (4.8.1 is the latest at the time of this writing), it’s actually quite easy to get a SolrCloud cluster running on your local workstation. Solr allows you to start an embedded ZooKeeper instance to enable “cloud” mode using a simple command-line option: -DzkRun. If you’ve not done this before, I recommend following the instructions provided by the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/SolrCloud

Once you’ve worked through the out-of-the-box experience with SolrCloud, you quickly realize you need tools to help you automate deployment and system administration tasks across multiple servers. Moreover, once you get a well-configured cluster running, there are ongoing system maintenance tasks that also should be automated, such as doing rolling restarts, performing off-site backups, or simply trying to find an error message across multiple log files on different servers.

Until now, most organizations had to integrate SolrCloud operations into an existing environment using tools like Chef or Puppet. While those are still valid approaches, the Solr Scale Toolkit provides a simple, Python-based solution that is easy to install and use to manage SolrCloud. In the remaining sections of this post, I walk you through some of the key features of the toolkit and encourage you to follow along. To begin there’s a little setup that is required to use the toolkit.

If you are looking to scale Solr, Timothy’s post is the right place to start!
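Once you have even a local -DzkRun cluster running, it is worth knowing how to sanity-check it programmatically. A minimal sketch in Python (my own, not part of the toolkit), assuming a SolrCloud node at localhost:8983 and using the Collections API's CLUSTERSTATUS action:

```python
import json
from urllib.request import urlopen

# Ask the Collections API for the overall cluster state.
url = ("http://localhost:8983/solr/admin/collections"
       "?action=CLUSTERSTATUS&wt=json")
status = json.load(urlopen(url))

# Walk collections -> shards -> replicas and report replica states.
for name, coll in status["cluster"]["collections"].items():
    for shard, info in coll["shards"].items():
        states = [r["state"] for r in info["replicas"].values()]
        print(name, shard, states)
```

Anything other than “active” replica states is a hint to reach for the toolkit’s log-gathering and restart commands.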

Take serious heed of the following advice:

One of the most important tasks when planning to use SolrCloud is to determine how many servers you need to support your index(es). Unfortunately, there’s not a simple formula for determining this because there are too many variables involved. However, most experienced SolrCloud users do agree that the only way to determine computing resources for your production cluster is to test with your own data and queries. So for this blog, I’m going to demonstrate how to provision the computing resources for a small cluster but you should know that the same process works for larger clusters. In fact, the toolkit was developed to enable large-scale testing of SolrCloud. I leave it as an exercise for the reader to do their own cluster-size planning.

If anyone offers you a fixed-rate SolrCloud, you should know they have calculated the cluster to be good for them and, if possible, good for you.

You have been warned.

Sin of Omission

Filed under: Cybersecurity,NSA,Security — Patrick Durusau @ 12:43 pm

Barb Darrow interviews IBM’s Lance Crosby (SoftLayer CEO within IBM) and, at about the 27:00 mark, asks about trusting data to U.S.-based companies:

Crosby responds:

My response is protect your data against any third party — whether it’s the NSA, other governments, hackers, terrorists, whatever…” he noted. “I say let’s stop worrying about the NSA and start talking about encryption and VPNs and all the ways you can protect yourself. Yes the NSA got caught but they’re not the first and won’t be the last.

Who did Crosby omit?

Your U.S.-based vendor.

How do you protect your data from your own vendor? That’s the question that Crosby so neatly ducks.

Crosby is speaking at Structure 2014. If you are attending, be sure to ask him,

“How do we protect our data from IBM if IBM is our cloud vendor?”

In all fairness, please revise and ask the same question of other cloud vendors as well.

Today, all of your data held by a U.S.-based vendor is just one court order away from being in the possession of the United States government, which can use it, share it with competitors, etc.

Governments that want to promote the cloud will create mechanisms that exempt data centers, their owners, and their staff from all government requests for data.

Governments that don’t want to promote the cloud and economic growth, well, they won’t have such mechanisms.

This could be the golden moment for multi-national vendors to become truly multi-national, as opposed to being surrogates for the United States government.

Quotes from and commentary based on: Why we need to stop freaking out about the NSA and get on with business.

June 6, 2014

Modern GPU

Filed under: GPU,Parallel Programming — Patrick Durusau @ 7:10 pm

Modern GPU by Sean Baxter.

From the webpage:

Modern GPU is code and commentary intended to promote new and productive ways of thinking about GPU computing.

This project is a library, an algorithms book, a tutorial, and a best-practices guide. If you are new to CUDA, start here. If you’re already familiar with CUDA, are ready for a challenge, and want to learn design patterns for parallel programming, enjoy this series. (emphasis in original)

Just in time for the weekend! And you don’t need an iOS device to read the text. 😉

Skimming the FAQ, this is serious work, but it will pay serious dividends as well.

Enjoy!

A Methodology for Empirical Analysis of LOD Datasets

Filed under: Bioinformatics,Biomedical,LOD — Patrick Durusau @ 6:52 pm

A Methodology for Empirical Analysis of LOD Datasets by Vit Novacek.

Abstract:

CoCoE stands for Complexity, Coherence and Entropy, and presents an extensible methodology for empirical analysis of Linked Open Data (i.e., RDF graphs). CoCoE can offer answers to questions like: Is dataset A better than B for knowledge discovery since it is more complex and informative?, Is dataset X better than Y for simple value lookups due its flatter structure?, etc. In order to address such questions, we introduce a set of well-founded measures based on complementary notions from distributional semantics, network analysis and information theory. These measures are part of a specific implementation of the CoCoE methodology that is available for download. Last but not least, we illustrate CoCoE by its application to selected biomedical RDF datasets. (emphasis in original)

A deeply interesting work on the formal characteristics of LOD datasets, but as we learned in Community detection in networks:…, a relationship between topology (another formal characteristic) and some hidden fact(s) may or may not exist.

Or to put it another way, formal characteristics are useful for rough evaluation of data sets but cannot replace a grounded actor considering their meaning. That would be you.
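CoCoE’s actual measures are defined in the paper, but to get a feel for the information-theoretic ingredient, here is a minimal sketch (mine, not CoCoE’s implementation) computing the Shannon entropy of a graph’s degree distribution with networkx:

```python
import math
from collections import Counter

import networkx as nx

def degree_entropy(g: nx.Graph) -> float:
    """Shannon entropy (in bits) of the degree distribution of g."""
    counts = Counter(d for _, d in g.degree())
    n = g.number_of_nodes()
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A flat, star-shaped graph carries little degree information;
# a preferential-attachment graph carries considerably more.
print(degree_entropy(nx.star_graph(20)))                 # low
print(degree_entropy(nx.barabasi_albert_graph(200, 3)))  # higher
```

A measure like this supports the rough evaluation; it still takes a grounded actor to decide whether “more entropy” means “better for my task.”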

I first saw this in a tweet by Marin Dimitrov.

Myth Busting Doubts About Formal Methods

Filed under: Formal Methods,Modeling,Programming — Patrick Durusau @ 6:38 pm

Use of Formal Methods at Amazon Web Services by Chris Newcombe, Tim Rath, Fan Zhang, Bogdan Munteanu, Marc Brooker, and Michael Deardeuff. (PDF)

From the paper:

Since 2011, engineers at Amazon Web Services (AWS) have been using formal specification and model-checking to help solve difficult design problems in critical systems. This paper describes our motivation and experience, what has worked well in our problem domain, and what has not. When discussing personal experiences we refer to authors by their initials.

At AWS we strive to build services that are simple for customers to use. That external simplicity is built on a hidden substrate of complex distributed systems. Such complex internals are required to achieve high-availability while running on cost-efficient infrastructure, and also to cope with relentless rapid business-growth. As an example of this growth; in 2006 we launched S3, our Simple Storage Service. In the 6 years after launch, S3 grew to store 1 trillion objects [1]. Less than a year later it had grown to 2 trillion objects, and was regularly handling 1.1 million requests per second [2].

S3 is just one of tens of AWS services that store and process data that our customers have entrusted to us. To safeguard that data, the core of each service relies on fault-tolerant distributed algorithms for replication, consistency, concurrency control, auto-scaling, load-balancing, and other coordination tasks. There are many such algorithms in the literature, but combining them into a cohesive system is a major challenge, as the algorithms must usually be modified in order to interact properly in a real-world system. In addition, we have found it necessary to invent algorithms of our own. We work hard to avoid unnecessary complexity, but the essential complexity of the task remains high.

The authors are not shy about arguing for the value of formal methods for complex systems:

In industry, formal methods have a reputation of requiring a huge amount of training and effort to verify a tiny piece of relatively straightforward code, so the return on investment is only justified in safety-critical domains such as medical systems and avionics. Our experience with TLA+ has shown that perception to be quite wrong. So far we have used TLA+ on 6 large complex real-world systems. In every case TLA+ has added significant value, either finding subtle bugs that we are sure we would not have found by other means, or giving us enough understanding and confidence to make aggressive performance optimizations without sacrificing correctness. We now have 7 teams using TLA+, with encouragement from senior management and technical leadership. (emphasis added)

Hard to argue with “real-world” success. Yes?

Well, at least if you want your system to be successful. Compare Amazon’s S3 with the ill-fated healthcare site.

The paper also covers what formal methods cannot do and recounts how this was sold to programmers within Amazon.

I suggest reading the paper more than once and following all the links in the bibliography, but if you are in a hurry, at least see these:

Lamport, L. The TLA Home Page; http://research.microsoft.com/en-us/um/people/lamport/tla/tla.html

Lamport, L. The Wildfire Challenge Problem; http://research.microsoft.com/en-us/um/people/lamport/tla/wildfire-challenge.html

Public forum of the TLA+ user community; https://groups.google.com/forum/?hl=en&fromgroups#!forum/tlaplus

Which leaves me with the question: How do you create a reliability guarantee for a topic map? Manual inspection doesn’t scale.

I first saw this in a tweet by Marc Brooker.

UK Houses of Parliament launches Open Data portal

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 6:05 pm

UK Houses of Parliament launches Open Data portal

From the webpage:

Datasets related to the UK Houses of Parliament are now available via data.parliament.uk – the institution’s new dedicated Open Data portal.

Site developers are currently seeking feedback on the portal ahead of the next release, details of how to get in touch can be found by clicking here.

From the alpha release of the portal:

Welcome to the first release of data.parliament.uk – the home of Open Data from the UK Houses of Parliament. This is an alpha release and contains a limited set of features and data. We are seeking feedback from users about the platform and the data on it so please contact us.

I would have to agree that the portal presently contains “limited data.” 😉

What would be helpful, for non-U.K. data miners as well as ones in the U.K., would be some sense of what data is available.

A PDF file listing data that is currently maintained on the UK Houses of Parliament, their members, record of proceedings, transcripts, etc. would be a good starting point.

Pointers anyone?

Regex Crossword

Filed under: Entertainment — Patrick Durusau @ 4:51 pm

Regex Crossword

From the webpage:

Welcome to the fantastic world of nerdy regex fun! Start playing by selecting one of the puzzle challenges below. There are a wide range of difficulties from beginner to expert.

From the “How to play” page:

Regex Crossword is a game similar to sudoku or the traditional crossword puzzle, where you must guess the correct letters in the horizontal and vertical lines of a grid. In Regex Crossword you are not given a word to guess, but a pattern that tells you which letters are allowed.
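The mechanics are easy to reproduce: each cell’s letter must satisfy both its row pattern and its column pattern. A minimal checker in Python, with patterns in the spirit of the site’s beginner puzzles:

```python
import re

# Row and column patterns for a tiny 2x2 grid.
rows = [r"HE|LL|O+", r"[PLEASE]+"]
cols = [r"[^SPEAK]+", r"EP|IP|EF"]

def solved(grid):
    """True if every row and column string fully matches its pattern."""
    row_ok = all(re.fullmatch(p, "".join(r)) for p, r in zip(rows, grid))
    col_ok = all(re.fullmatch(p, "".join(c)) for p, c in zip(cols, zip(*grid)))
    return row_ok and col_ok

print(solved([("H", "E"), ("L", "P")]))  # True: the grid spells HELP
```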

The NYT Crossword editors don’t need to start polishing their resumes. 😉

On the other hand, I can foresee office competitions!

Or at conferences? Someone needs to carry the word to the Balisage organizers.

Ungoogleable Symbols from Clojure

Filed under: Clojure,Search Engines — Patrick Durusau @ 4:03 pm

The Weird and Wonderful Characters of Clojure by James Hughes.

From the post:

A reference collection of characters used in Clojure that are difficult to “google”. Descriptions sourced from various blogs, StackOverflow, Learning Clojure and the official Clojure docs – sources attributed where necessary. Type the symbols into the box below to search (or use CTRL-F). Sections not in any particular order but related items are grouped for ease. If I’m wrong or missing anything worthy of inclusion tweet me @kouphax or mail me at james@yobriefca.se.

Before reading further, do you agree/disagree that symbols are hard to search for?

Jot your reasons down.

Now try searching for each of the following strings:

#

#{

#"

Hmmm, the post is on the WWW and indexed by Google.

I can prove that; search using Google for: “The Weird and Wonderful Characters of Clojure”.

I can understand the result for "#". There are a variety of subjects that are all represented by "#", so that result isn’t surprising. You would have to distinguish the different subjects represented by "#", something search engines don’t do.

That is, search engines operate on surface strings only.

What is less understandable is the total failure on #{ and #", with and without surrounding quotes.

If you are going to return results on "#", it seems like you would return results on other arbitrary strings.
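Part of the failure, by the way, is plumbing rather than indexing: in a URL, # begins the fragment identifier, so a raw #{ or #" typed into a query string never reaches the search engine at all unless it is percent-encoded. A quick Python demonstration:

```python
from urllib.parse import quote, urlparse

raw = "https://www.google.com/search?q=" + "#{"
parts = urlparse(raw)
print(parts.query)     # 'q='  -- the query arrives empty
print(parts.fragment)  # '{'   -- '#' was read as the fragment marker

# Percent-encoding keeps the symbols inside the query string.
print(quote("#{", safe=""))  # '%23%7B'
```

Even with proper encoding, though, engines that tokenize away punctuation are back to matching surface strings.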

Can someone comment without violating their NDA with Google?

I first saw this in a tweet by Rob Stuttaford.

Offshore Leaks:…Azerbaijan

Filed under: Graphs,Neo4j — Patrick Durusau @ 2:26 pm

How to use Neo4j to analyse the Offshore Leaks : the case of Azerbaijan by Jean Villedieu.

From the post:

Introduction to Problem

The Offshore Leaks released in 2013 by the ICIJ is a rarity. It is a big dataset of real information about some of the most secret places on earth : the offshore financial centers. The investigation of the ICIJ brought to the surface many interesting stories including the suspicious activities of the President of Azerbaijan. We are going to see how graph technologies can help us make sense of the complex data in the Offshore Leaks.

Our data model for the Offshore Leaks

We want to know how the President of Azerbaijan is connected to offshore accounts. This means that we will need to focus on the network he uses to control his assets stored in offshore entities. These networks include family members and a complex set of intermediaries or partners. We want to see how things are connected so we are going to have to represent each of these entities as distinct nodes in a graph.

A good tutorial on Neo4j, Cypher (query language) and modeling data.
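To give a flavor of Cypher from Python, here is a hedged sketch using the official neo4j driver; the labels, relationship depth, and credentials are illustrative, not the tutorial’s actual schema:

```python
from neo4j import GraphDatabase  # pip install neo4j

# Connection details and schema below are illustrative only.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

# Find entities reachable from a person within three relationships.
query = """
MATCH (p:Officer {name: $name})-[r*1..3]->(e:Entity)
RETURN e.name AS entity, size(r) AS hops
ORDER BY hops
"""

with driver.session() as session:
    for record in session.run(query, name="Ilham Aliyev"):
        print(record["entity"], record["hops"])

driver.close()
```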

Notice I didn’t say “modeling data with graphs.” That is the result in this case but modeling data should inform your choice of storage or analytical solutions. Saying that graphs can model any data is a truism that doesn’t lead to informed IT choices.

In this particular case I would suggest using graphs, in part because the relationships between actors and their types aren’t known in advance. Some aspects of stock trading systems would not present the same issues.

Graphs don’t have this as an inherent limitation, but if several groups were gathering information about President Ilham Aliyev, quite possibly using different names/identifiers, how would you merge those graphs together? Would you have to re-create the relationships between actors if new nodes had to replace old ones?

Graphs are very good for some data. Distributed and collaborative graphs are even better.

Further information on Offshore Leaks.

I first saw this in a tweet by GraphemeDB.

June 5, 2014

Community detection in networks:…

Filed under: Clustering,Networks,Topology — Patrick Durusau @ 7:17 pm

Community detection in networks: structural clusters versus ground truth by Darko Hric, Richard K. Darst, and Santo Fortunato.

Abstract:

Algorithms to find communities in networks rely just on structural information and search for cohesive subsets of nodes. On the other hand, most scholars implicitly or explicitly assume that structural communities represent groups of nodes with similar (non-topological) properties or functions. This hypothesis could not be verified, so far, because of the lack of network datasets with ground truth information on the classification of the nodes. We show that traditional community detection methods fail to find the ground truth clusters in many large networks. Our results show that there is a marked separation between structural and annotated clusters, in line with recent findings. That means that either our current modeling of community structure has to be substantially modified, or that annotated clusters may not be recoverable from topology alone.

Deeply interesting work if you are trying to detect “subjects” by clustering nodes in a network.

I would heed the warning that topology may not accurately represent hidden information.

Beyond this particular case, I would test any assumption that some known factor represents an unknown factor(s) for any data set. Better that the results surprise you than your client.
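And the test is usually cheap to run. A minimal sketch with networkx, using Zachary’s karate club graph, which ships with an annotated “club” label on every node:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

g = nx.karate_club_graph()

# Structural clusters, found from topology alone.
structural = greedy_modularity_communities(g)

# Annotated ("ground truth") clusters from the node metadata.
annotated = {}
for node, data in g.nodes(data=True):
    annotated.setdefault(data["club"], set()).add(node)

print(len(structural), "structural communities")  # typically 3
print(len(annotated), "annotated communities")    # 2
# When counts and memberships disagree, topology and annotation
# are telling you different stories, which is the paper's point.
```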

I first saw this in a tweet by Brian Keegan.

PS: As you already know, “ground truth” depends upon your point of view. Don’t risk your work on the basis of someone else’s “ground truth.”

Leipzig from scratch

Filed under: Clojure,Music — Patrick Durusau @ 6:53 pm

Leipzig from scratch (GitHub) by Chris Ford.

From the description:

I show you how to make a simple track using Leipzig, Overtone and Clojure, from “lein new” onwards.

And you can type along with the video!

Enjoy!

UniArab:…

Filed under: Linguistics,Translation — Patrick Durusau @ 4:18 pm

UniArab: An RRG Arabic-to-English Machine Translation Software by Dr. Brian Nolan and Yasser Salem.

A slide deck introducing UniArab.

I first saw this mentioned in a tweet by Christopher Phipps.

Which was enough to make me curious about the software and perhaps the original paper.

UniArab: An RRG Arabic-to-English Machine Translation Software (paper) by Brian Nolan and Yasser Salem.

Abstract:

This paper presents a machine translation system (Hutchins 2003) called UniArab (Salem, Hensman and Nolan 2008). It is a proof-of-concept system supporting the fundamental aspects of Arabic, such as the parts of speech, agreement and tenses. UniArab is based on the linking algorithm of RRG (syntax to semantics and vice versa). UniArab takes MSA Arabic as input in the native orthography, parses the sentence(s) into a logical meta-representation based on the fully expanded RRG logical structures and, using this, generates perfectly grammatical English output with full agreement and morphological resolution. UniArab utilizes an XML-based implementation of elements of the Role and Reference Grammar theory in software. In order to analyse Arabic by computer we first extract the lexical properties of the Arabic words (Al-Sughaiyer and Al-Kharashi 2004). From the parse, it then creates a computer-based representation for the logical structure of the Arabic sentence(s). We use the RRG theory to motivate the computational implementation of the architecture of the lexicon in software. We also implement in software the RRG bidirectional linking system to build the parse and generate functions between the syntax-semantic interfaces. Through seven input phases, including the morphological and syntactic unpacking, UniArab extracts the logical structure of an Arabic sentence. Using the XML-based metadata representing the RRG logical structure, UniArab then accurately generates an equivalent grammatical sentence in the target language through four output phases. We discuss the technologies used to support its development and also the user interface that allows for the addition of lexical items directly to the lexicon in real time. The UniArab system has been tested and evaluated generating equivalent grammatical sentences, in English, via the logical structure of Arabic sentences, based on MSA Arabic input with very significant and accurate results (Izwaini 2006). At present we are working to greatly extend the coverage by the addition of more verbs to the lexicon. We have demonstrated in this research that RRG is a viable linguistic model for building accurate rule-based semantically oriented machine translation software. Role and Reference Grammar (RRG) is a functional theory of grammar that posits a direct mapping between the semantic representation of a sentence and its syntactic representation. The theory allows a sentence in a specific language to be described in terms of its logical structure and grammatical procedures. RRG creates a linking relationship between syntax and semantics, and can account for how semantic representations are mapped into syntactic representations. We claim that RRG is very suitable for machine translation of Arabic, notwithstanding well-documented difficulties found within Arabic MT (Izwaini, S. 2006), and that RRG can be implemented in software as the rule-based kernel of an Interlingua bridge MT engine. The version of Arabic (Ryding 2005, Alosh 2005, Schulz 2005), we consider in this paper is Modern Standard Arabic (MSA), which is distinct from classical Arabic. In the Arabic linguistic tradition there is not a clear-cut, well defined analysis of the inventory of parts of speech in Arabic.

At least as of today, http://informatics.itbresearch.id/~ysalem/ times out. Other pointers?

Interesting work on Arabic translation. Makes me curious about adaptation of these techniques to map between semantic domains.


Presenting: structure story and support

Filed under: Communication,Presentation — Patrick Durusau @ 3:13 pm

Presenting: structure story and support by Felienne Hermans.

From the description:

Conference presentations are the moment to share your results, and to connect with researchers about future directions. However, presentations are often created as an afterthought and as a result they are often not as exciting as they could be.

In this slidedeck Felienne Hermans shares hands-on techniques to engage an audience.

The talk covers the entire spectrum of presenting: we start with advice on how to structure a talk and how to incorporate a core message into it. Once we have addressed the right structure for a talk, we will work on adding stories and arcs of tension to your presentation. Finally, to really perform as a presenter, we will talk about how slide design and body language can support your presentation.

If you want to effectively present topic maps or other technologies, this is a slide deck you cannot miss!

If Felienne Hermans is presenting this or some future version of this presentation at a conference, that alone is a reason to register.

Seriously.

I first saw this in a tweet by Olga Liskin.

Extending The Apple Ghetto

Filed under: Programming — Patrick Durusau @ 2:49 pm

Get Started With Apple’s Swift Programming Language With a Free eBook by Patrick Allan.

Patrick’s post summarizes a free book on the Swift programming language, noting:

Keep in mind, you will need an iOS device to read the book. (emphasis added)

Can you imagine the hue and cry if Microsoft published a book that required Windows 8 to read it?

I suppose that is one way to keep a market share, make a righteous ghetto out of it.

New SSL Issues

Filed under: Cybersecurity,NSA,Privacy,Security — Patrick Durusau @ 2:35 pm

OpenSSL Security Advisory [05 Jun 2014]

Seven new SSL bugs have been documented. See the advisory for details.

Given how insecure the Net is at present, I have to wonder about the effectiveness of Reset The Net at stopping mass surveillance.

I agree with ending mass surveillance but mostly because storing all that data is contractor waste.

I first saw this in a tweet by Nick Sullivan.

Wandora 2014-06-05 Available!

Filed under: Topic Map Software,Wandora — Patrick Durusau @ 10:46 am

Wandora 2014-06-05 Available!

From the webpage:

Read carefully Wandora’s system requirements and license before downloading and installing the application. The Wandora application requires Java Runtime Environment (JRE) or Java Development Kit (JDK) version 7. Neither JRE nor JDK is included in the distribution packages. We emphasize that Wandora is an ongoing project and the software is incomplete, absolutely contains bugs and the feature set may change without notice. Download Wandora’s latest version (build date 2014-06-05, see Change log):

Of particular interest:

Twitter extractor has been updated to reflect Twitter API changes.

Enjoy!

A Topic Map Classic

Filed under: Topic Maps — Patrick Durusau @ 10:12 am

I ran into a classic topic map problem today.

I have been trying to find a way to move a very large refrigerator out of an alcove to clean its coils.

Searching the web, I found a product by Airsled that is the perfect solution for me. Unfortunately, the cheapest one is over $500.00 US.

I really want to move the refrigerator, but for something done once every four or five years, that seems really expensive.

Reasoning that other people would have the same reaction, I started calling equipment rental places, describing the tool and calling it by the manufacturer’s name, Airsled.

The last place I talked to this afternoon offered several other solutions but no, they had no such device.

This evening I was searching the web again, added “rental” to my search for Airsled, and got the last place I called today.

You already know what the problem turns out to be.

Their name for the device?

700lb Air Appliance Mover Dolly

But when you compare the Airsled:

[image: Airsled]

to the 700 Pound Appliance Mover Dolly:

[image: 700 pound appliance mover]

you get the idea they are the same thing.

Yes?

But for the chance finding of the reference to the local rental store and following it, they would have lost the rental, my refrigerator would not get moved, etc.

That’s just one experience. Imagine all the similar experiences today.

June 4, 2014

Sharing key indexes

Filed under: Saxon,XPath,XSLT — Patrick Durusau @ 7:15 pm

Sharing key indexes by Michael Kay.

From the post:

For ever and a day, Saxon has tried to ensure that when several transformations are run using the same stylesheet and the same source document(s), any indexes built for those documents are reused across transformations. This has always required some careful juggling of Java weak references to ensure that the indexes are dropped from memory as soon as either the executable stylesheet or the source document are no longer needed.

I’ve now spotted a flaw in this design. It wasn’t reported by a user, and it didn’t arise from a test case, it simply occurred to me as a theoretical possibility, and I have now written a test case that shows it actually happens. The flaw is this: if the definition of the key includes a reference to a global variable or a stylesheet parameter, then the content of the index depends on the values of global variables, and these are potentially different in different transformations using the same stylesheet.
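The caching pattern Kay describes, indexes that live only while both the stylesheet and the source document are alive, has a compact analogue in Python’s weakref module. A sketch of the idea (not Saxon’s implementation):

```python
import weakref

class Document:
    """Stands in for a parsed source document."""

class IndexCache:
    """Caches one index per document; entries vanish automatically
    once the document they were built for is garbage-collected."""

    def __init__(self):
        self._cache = weakref.WeakKeyDictionary()

    def index_for(self, doc: Document):
        if doc not in self._cache:
            self._cache[doc] = self._build_index(doc)  # the expensive step
        return self._cache[doc]

    def _build_index(self, doc):
        return {"built-for": id(doc)}

cache = IndexCache()
doc = Document()
idx = cache.index_for(doc)
del doc  # with no strong reference left, the cached index is dropped too
```

The flaw Kay found is the part this sketch glosses over: if the index depends on global variables, the document alone is the wrong cache key.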

Michael discovers a very obscure bug entirely on his own and yet resolves to fix it.

That is so unusual that I thought it merited mentioning.

It should give you great confidence in Saxon.

How that impacts your confidence on other software I cannot say.

Xanadu is Launching!

Filed under: Hypertext,Xanadu — Patrick Durusau @ 6:53 pm

Xanadu

Transclusion? You be the judge.

All I can say is that it appears to be so.

Comments?

PS: I want to start cheering, loudly, but without more, I can’t. Not yet.

Python for Data Science

Filed under: Data Science,Python — Patrick Durusau @ 6:37 pm

Python for Data Science by Joe McCarthy.

From the post:

This short primer on Python is designed to provide a rapid “on-ramp” to enable computer programmers who are already familiar with concepts and constructs in other programming languages to learn enough about Python to facilitate the effective use of open-source and proprietary Python-based machine learning and data science tools.

Uses an IPython Notebook for delivery.

This is a tutorial you will want to pass on to others! Or emulate if you want to cover another language or subject.

I first saw this in a tweet by Tom Brander.

RenderMan/RIS

Filed under: Graphics,Visualization — Patrick Durusau @ 6:28 pm

RenderMan/RIS and the start of next 25 years by Mike Seymour.

From the post:

At SIGGRAPH last July, Pixar celebrated 25 years of RenderMan (see our story here). Today the company has announced new breakthrough technology, a new commitment to R&D and massive pricing changes including free access to RenderMan for non-commercial use. Ed Catmull, President, Walt Disney and Pixar Animation Studios, along with Dana Batali, VP of RenderMan Products, Chris Ford, RenderMan’s Business Director and the Pixar RenderMan team have introduced sweeping changes to the way RenderMan will be developed, sold and the very latest technology that will ship before SIGGRAPH 2014 in Vancouver. This is clearly the start of the next 25 years.

The new product is a combination of RenderMan Pro Server and RenderMan Studio. There will now be one product, used by artists or on the farm, and movable between the two. The new RenderMan has a powerful bi-directional path tracer and serious new technology from Disney Animation, which underlines a new unified approach to rendering from the House of Mouse – the amazing powerhouse that is Disney today.
….

If you appreciate high-end graphics, you owe it to yourself to read Mike’s post and watch the videos.

And if you want to try the software, you have to appreciate the simplicity of their license:

There is only one RenderMan and the free non-commercial RenderMan is exactly the same as the commercial version. There are no watermarks, no time limits, and no reduced functionality. The only limitation is that upon acceptance of the EULA at initial installation, the software is to be only used for non-commercial purposes. We want to keep it very simple, and as importantly, RenderMan highly accessible.

Enjoy!

Health Intelligence

Filed under: Data Mining,Intelligence,Visualization — Patrick Durusau @ 4:55 pm

Health Intelligence: Analyzing health data, generating and communicating evidence to improve population health, by Ramon Martinez.

I was following a link to Ramon’s Data Sources page when I discovered his site. The list of data resources is long and impressive.

But there is so much more under Resources!

  • Data Tools
  • Database (DB) Blogs
  • Data Visualization Tools
  • Data Viz Blogs
  • Reading for Data Visualizations
  • Best of the Web…
  • Tableau Training
  • Going to School
  • Reading for Health Analysis

You will probably like the rest of the site as well!

Data tools/visualization are very ecumenical.

Introduction to Functional Programming

Filed under: Functional Programming,Haskell — Patrick Durusau @ 4:21 pm

Introduction to Functional Programming

October 2014, 8 weeks

From the description:

Broadly speaking, functional programming is a style of programming in which the primary method of computation is the application of functions to arguments. Among other features, functional languages offer a compact notation for writing programs, powerful abstraction methods for structuring programs, and a simple mathematical basis that supports reasoning about programs.

Functional languages represent the leading edge of programming language design, and the primary setting in which new programming concepts are introduced and studied. All contemporary programming languages such as Hack/PHP, C#, Visual Basic, F#, C++, JavaScript, Python, Ruby, Java, Scala, Clojure, Groovy, Racket, … support higher-order programming via the concept of closures or lambda expressions.

This course will use Haskell as the medium for understanding the basic principles of functional programming. While the specific language isn’t all that important, Haskell is a pure functional language so it is entirely appropriate for learning the essential ingredients of programming using mathematical functions. It is also a relatively small language, and hence it should be easy for you to get up to speed with Haskell.

Once you understand the Why, What and How that underlies pure functional programming and learned to “think like a fundamentalist”, we will apply the concepts of functional programming to “code like a hacker” in mainstream programming languages, using Facebook’s novel Hack language as our main example.

This course assumes no prior knowledge of functional programming, but assumes you have at least one year of programming experience in a regular programming language such as Java, .NET, Javascript or PHP.

Or, I could have just said:

Erik Meijer is teaching this course. Enough said. 😉
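If “higher-order programming via closures” is the part that sounds abstract, a minimal Python illustration:

```python
def make_counter(start=0):
    count = start
    def bump(step=1):
        nonlocal count  # bump() closes over count
        count += step
        return count
    return bump

tick = make_counter()
print(tick(), tick(), tick())  # 1 2 3

# Higher-order style: functions passed as arguments.
print(list(map(lambda n: n * n, range(5))))  # [0, 1, 4, 9, 16]
```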

Overview and Splitting PDF Files

Filed under: News,PDF,Reporting — Patrick Durusau @ 4:09 pm

I have been seeing tweets from the Overview Project saying that, as of today, you can split PDF files into pages without going through DocumentCloud or other tools.

I don’t have Overview installed so I can’t confirm that statement but if true, it is a step in the right direction.
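If you want the same result outside Overview, splitting a PDF into single pages is only a few lines with the pypdf library (a sketch of the general technique, not Overview’s mechanism):

```python
from pypdf import PdfReader, PdfWriter  # pip install pypdf

reader = PdfReader("report.pdf")
for i, page in enumerate(reader.pages, start=1):
    writer = PdfWriter()
    writer.add_page(page)
    # One output file per page: report-p001.pdf, report-p002.pdf, ...
    with open(f"report-p{i:03}.pdf", "wb") as out:
        writer.write(out)
```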

Think about it for a moment.

If you “tag” a one-hundred-page PDF file with all the “tags” you need to return to that document, what happens? Sure, you can go back to that document, but then you have to search for the material you were tagging.

It is a question of the granularity of your “tagging.” Now imagine tagging a page in PDF. Is it now easier for you to return to that one page? Can you also say it would be easier for someone else to return to the same page following your path?

Which makes you wonder about citation practices that simply cite an article and not a location within the article.

Are they trying to make your job as a reader that much harder?

Unicode Character Table

Filed under: Unicode — Patrick Durusau @ 1:03 pm

Unicode Character Table

A useful webpage that I first saw in a tweet by Scott Chamberlain.

Displays Unicode characters on “buttons” that, when selected, display the Unicode hex code and HTML code for the selected character.

Quite useful when all you need is one entity value for a post.

If you need more information try Unicode Table – The Unicode Character Reference, which for “Latin Small Letter D” displays:

Unicode Character Information

  • Unicode Hex: U+0064
  • Character Name: LATIN SMALL LETTER D
  • General Category: Lowercase Letter [Code: Ll]
  • Canonical Combining Class: 0
  • Bidirectional Category: L
  • Mirrored: N
  • Uppercase Version: U+0044
  • Titlecase Version: U+0044

Unicode Character Encodings

  • HTML Entity: &#100; (decimal entity), &#x64; (hex entity)
  • Windows Key Code: Alt 0100 or Alt +0064
  • Programming Source Code Encodings: Python hex: u"\u0064", hex for C++ and Java: "\u0064"
  • UTF-8 Hexadecimal Encoding: 0x64
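Python’s unicodedata module will recover most of that table programmatically; a small sketch:

```python
import unicodedata

ch = "d"  # U+0064
print(f"U+{ord(ch):04X}")             # U+0064
print(unicodedata.name(ch))           # LATIN SMALL LETTER D
print(unicodedata.category(ch))       # Ll (Lowercase Letter)
print(unicodedata.combining(ch))      # 0
print(unicodedata.bidirectional(ch))  # L
print(unicodedata.mirrored(ch))       # 0, i.e. not mirrored
print(f"U+{ord(ch.upper()):04X}")     # U+0044 (uppercase version)
print(ch.encode("utf-8").hex())       # 64
```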

Or, if you need all the information available on Unicode from the canonical source, see http://www.unicode.org/
