Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

May 23, 2014

Early Canadiana Online

Filed under: Data,Language,Library — Patrick Durusau @ 6:50 pm

Early Canadiana Online

From the webpage:

These collections contain over 80,000 rare books, magazines and government publications from the 1600s to the 1940s.

This rare collection of documentary heritage will be of interest to scholars, genealogists, history buffs and anyone who enjoys reading about Canada’s early days.

The Early Canadiana Online collection of rare books, magazines and government publications has over 80,000 titles (3,500,000 pages) and is growing. The collection includes material published from the time of the first European settlers to the first four decades of the 20th Century.

You will find books written in 21 languages including French, English, 10 First Nations languages and several European languages, Latin and Greek.

Every online collection such as this one increases the volume of information that is accessible and also increases the difficulty of finding related information for any given subject. But the latter is such a nice problem to have!

I first saw this in a tweet from Lincoln Mullen.

Capability URLs: We Need Your Feedback

Filed under: Cybersecurity,Security — Patrick Durusau @ 4:23 pm

Capability URLs: We Need Your Feedback by Daniel Appelquist.

From the post:

The battle for web security and privacy is fought at many levels. Sometimes common practice in web application design can lead to data leakage with unintended consequences for users. A good example of this came up recently where confidential files shared through common web-based document sharing services were being exposed unintentionally to third parties because the private URLs used to share them had been unintentionally leaked.

URLs that allow a user to access an otherwise privileged resource or information are called Capability URLs, and while they can be powerful, they can also cause potential problems when used improperly.

TAG member Jeni Tennison has been working on a draft defining the space of capability URLs and outlining some good practices for usage. We think this document should be useful for web builders who are thinking about incorporating this pattern into their applications. We think it’s pretty good, but we need your feedback before we finalize it and release it as a TAG finding.

The draft may be found here: http://www.w3.org/TR/capability-urls/ and if you have feedback you are encouraged to raise an issue on github or e-mail us on the TAG public mailing list. Thanks!

The most common example that Jeni mentions is a password reset URL, which allows anyone using that URL to reset a user’s password.

Interesting document and one that merits your review and any comments you may have.

It would not work in an ordinary browser, but I wonder about generating a “capability URL” via challenge-response authentication.

Using the password example, assume that I have selected “Lost my password,” and the server returns a URL that ends with a challenge token that requires some calculation on my part. That is, I get a “capability URL,” but the “capability URL” that I must return is different.

It should be as secure as your challenge-response authentication. Yes?

That may be an edge case but if we are outside of browser land, I could see that being built into an application.
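
To make the idea concrete, here is a minimal sketch in Python (every name, URL and secret below is hypothetical, not a real service): the server issues a “Lost my password” URL carrying a random challenge, and the client derives the capability URL it must return by computing an HMAC over that challenge with a shared secret.

import hmac
import hashlib
import secrets

SHARED_SECRET = b"per-user secret established out of band"  # assumption

def issue_challenge_url(base="https://example.com/reset"):
    """Server side: return a URL carrying a random challenge token."""
    challenge = secrets.token_urlsafe(16)
    return f"{base}?challenge={challenge}", challenge

def answer_challenge(challenge):
    """Client side: compute the response the server expects."""
    return hmac.new(SHARED_SECRET, challenge.encode(), hashlib.sha256).hexdigest()

def capability_url(base, challenge):
    """The URL the client actually returns -- different from the one issued."""
    return f"{base}?response={answer_challenge(challenge)}"

issued, challenge = issue_challenge_url()
print("issued:  ", issued)
print("returned:", capability_url("https://example.com/reset", challenge))
# The server verifies the response with hmac.compare_digest() before honoring
# the reset, so a leaked challenge URL alone is not enough to reset the password.

A leaked “issued” URL by itself is useless to a third party without the shared secret, which is the point of the exercise.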

Suggestions?

Data Analytics Handbook

Filed under: Analytics,BigData,Data Analysis — Patrick Durusau @ 3:58 pm

Data Analytics Handbook

The “handbook” appears in three parts, the first of which you download, while links to parts 2 and 3 are emailed to you for participating in a short survey. The survey collects your name, email address, educational background (STEM or not), and whether you are interested in a new resource that is being created to teach data analysis.

Let’s be clear up front that this is NOT a technical handbook.

Rather all three parts are interviews with:

Part 1: Data Analysts + Data Scientists

Part 2: CEO’s + Managers

Part 3: Researchers + Academics

Technical handbooks abound but this is one of the few (only?) books that covers the “soft” side of data analytics. By the “soft” side I mean the people and personal relationships that make up the data analytics industry. Technical knowledge is a must, but being able to work well with others is as important, if not more so.

The interviews are wide ranging and don’t attempt to provide cut-n-dried answers. Readers will need to be inspired by and adapt the reported experiences to their own circumstances.

Of all the features of the books, I suspect I liked the “Top 5 Take Aways” the best.

In the interest of full disclosure, that may be because part 1 reported:

2. The biggest challenge for a data analyst isn’t modeling, it’s cleaning and collecting

Data analysts spend most of their time collecting and cleaning the data required for analysis. Answering questions like “where do you collect the data?”, “how do you collect the data?”, and “how should you clean the data?” requires much more time than the actual analysis itself.

Well, when someone puts your favorite hobby horse at #2, see how you react. 😉

I first saw this in a tweet by Marin Dimitrov.

Kafka-Storm-Starter

Filed under: Avro,Kafka,Storm — Patrick Durusau @ 3:27 pm

Kafka-Storm-Starter by Michael G. Noll.

From the webpage:

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+, while using Apache Avro as the data serialization format.

If you aren’t excited already (from their respective homepages):

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.

Apache Storm is a free and open source distributed realtime computation system.

Apache Avro™ is a data serialization system.

Now are you excited?

Good!

Note the superior organization of the project documentation!

Following the table of contents you find:

Quick Start

Show me!

$ ./sbt test

Short of the project starting up remotely and entering your data for you, I can’t imagine an easier way to begin project documentation.
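
For a rough feel of the moving parts outside the JVM, here is a hedged sketch (mine, not from the starter project) of publishing an Avro-encoded record to a Kafka topic, assuming the kafka-python and fastavro packages; the schema, topic name and broker address are all made up for illustration.

import io
from fastavro import parse_schema, schemaless_writer  # assumption: fastavro installed
from kafka import KafkaProducer                        # assumption: kafka-python installed

# Toy schema standing in for whatever the starter project actually defines.
schema = parse_schema({
    "type": "record",
    "name": "Tweet",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "text", "type": "string"},
    ],
})

def encode(record):
    """Serialize a record to Avro binary (no file header) for use as a Kafka payload."""
    buf = io.BytesIO()
    schemaless_writer(buf, schema, record)
    return buf.getvalue()

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed broker
producer.send("tweets", encode({"user": "miguno", "text": "hello, storm"}))
producer.flush()

The Storm side of the starter project does the mirror image on the JVM: consume from the topic, decode the Avro payload, and feed it into a topology.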

It’s a long weekend in the United States so check out Michael G. Noll’s GitHub repository for other interesting projects.

Learning Everything About Anything (sort of)

Filed under: Artificial Intelligence,Machine Learning — Patrick Durusau @ 2:52 pm

Meet the algorithm that can learn “everything about anything” by Derrick Harris.

From the post:

One of the more interesting projects is a system called LEVAN, which is short for Learn EVerything about ANything and was created by a group of researchers out of the Allen Institute for Artificial Intelligence and the University of Washington. One of them, Carlos Guestrin, is also co-founder and CEO of a data science startup called GraphLab. What’s really interesting about LEVAN is that it’s neither human-supervised nor unsupervised (like many deep learning systems), but what its creators call “webly supervised.”

(image omitted)

What that means, essentially, is that LEVAN uses the web to learn everything it needs to know. It scours Google Books Ngrams to learn common phrases associated with a particular concept, then searches for those phrases in web image repositories such as Google Images, Bing and Flickr. For example, LEVAN now knows that “heavyweight boxing,” “boxing ring” and “ali boxing” are all part of the larger concept of “boxing,” and it knows what each one looks like.

When I said “sort of” in the title I didn’t mean any disrespect for LEVAN. On the contrary, the researchers’ limiting LEVAN to Google Books Ngrams and images is a brilliant move. It limits LEVAN to the semantic debris that can be found in public image repositories, but depending upon your requirements, that may be more than sufficient.

The other upside is that despite a pending patent, sigh, the source code is available for research/academic purposes.

What data sets make useful limits for your AI/machine learning algorithm? Your application need not understand intercepted phone conversations, Barbara Walters, or popular music, if those are not in your requirements. Simplifying your AI problem may be the first step towards solving it.

The Secret History of Hypertext

Filed under: Hypertext,WWW — Patrick Durusau @ 2:23 pm

The Secret History of Hypertext by Alex Wright.

From the post:

When Vannevar Bush’s “As We May Think” first appeared in The Atlantic’s pages in July 1945, it set off an intellectual chain reaction that resulted, more than four decades later, in the creation of the World Wide Web.

In that landmark essay, Bush described a hypothetical machine called the Memex: a hypertext-like device capable of allowing its users to comb through a large set of documents stored on microfilm, connected via a network of “links” and “associative trails” that anticipated the hyperlinked structure of today’s Web.

Historians of technology often cite Bush’s essay as the conceptual forerunner of the Web. And hypertext pioneers like Douglas Engelbart, Ted Nelson, and Tim Berners-Lee have all acknowledged their debt to Bush’s vision. But for all his lasting influence, Bush was not the first person to imagine something like the Web.

Alex identifies several inventors in the early 20th century who proposed systems quite similar to Vannevar Bush’s, prior to the publication of “As We May Think”. A starting place that may get you interested in learning the details of these alternate proposals.

Personally, I would separate the notion of “hypertext” from the notion of networking remote sites together (proposed not by Bush but by others), and that separation pushes the history of hypertext much further back in time.

Enjoy!

I first saw this in a tweet by Ed H. Chi.

May 22, 2014

texblog

Filed under: TeX/LaTeX — Patrick Durusau @ 7:33 pm

texblog

From the about page:

My intention is to provide valuable tips and tricks for your daily LaTeX editing. In addition, I’ll try to give answers to questions which are not easily found on the web.

If you have a topic you think I should write an article about, any question on LaTeX or if there are errors/bad links in my blog, please let me know.

I get quite a few questions/problems which I try to answer/solve in order to help with your editing. Quite often, however, the answer to your question can be found within the previous comments. Please include a minimal example that illustrates your problem if you are posting a comment with a question or problem.

If you are interested in serious typography, this is a blog you need to follow.

Or if you are interested in indexing TeX/LaTeX files.

ClojureDocs

Filed under: Clojure,Topic Maps — Patrick Durusau @ 7:23 pm

ClojureDocs

From the webpage:

ClojureDocs is a community-powered documentation and examples repository for the Clojure programming language.

Currently in beta but has an API to allow integration of ClojureDocs’ backend with other applications.

I like the integration of a quick reference with examples, when there are examples.

What would make this resource better would be the integration of existing source code, keyed to Clojure vars.

That sounds like a topic map doesn’t it?

Everything is Broken

Filed under: Cybersecurity,NSA,Security — Patrick Durusau @ 7:06 pm

Everything is Broken by Quinn Norton.

From the post:

Once upon a time, a friend of mine accidentally took over thousands of computers. He had found a vulnerability in a piece of software and started playing with it. In the process, he figured out how to get total administration access over a network. He put it in a script, and ran it to see what would happen, then went to bed for about four hours. Next morning on the way to work he checked on it, and discovered he was now lord and master of about 50,000 computers. After nearly vomiting in fear he killed the whole thing and deleted all the files associated with it. In the end he said he threw the hard drive into a bonfire. I can’t tell you who he is because he doesn’t want to go to Federal prison, which is what could have happened if he’d told anyone that could do anything about the bug he’d found. Did that bug get fixed? Probably eventually, but not by my friend. This story isn’t extraordinary at all. Spend much time in the hacker and security scene, you’ll hear stories like this and worse.

It’s hard to explain to regular people how much technology barely works, how much the infrastructure of our lives is held together by the IT equivalent of baling wire.

Computers, and computing, are broken.

Your reaction may be different but I took Quinn’s essay as a breath of fresh air.

Seriously. The predictions of a computer-assisted nirvana emerging from big data, graphs, participation, etc. are tiresome. Not to mention false.

Quinn does a great job of outlining the current problems with computers and computing as well as fixing the blame for the same.

Take a look in the mirror.

Yep, it isn’t some evildoer lurking behind a tree.

True, evildoers may take advantage of the system we have allowed to happen, but that’s a symptom and not a cause.

Read Quinn’s essay and decide how your participation in what we have wrought is going to change.

I first saw this in Nat Torkington’s Four short links: 21 May 2014.

PDFium

Filed under: PDF — Patrick Durusau @ 6:54 pm

PDFium

From the webpage:

PDFium is an open-source PDF rendering engine.

Just in case you need a PDF rendering engine for your topic map application and/or want to make subjects out of the internal structure of PDF files.

I first saw this at Nat Torkington’s Four short links: 22 May 2014.

Nomad and Historic Information

Filed under: Archives,Data,Documentation — Patrick Durusau @ 10:55 am

You may remember Nomad from the Star Trek episode The Changeling. Not quite on that scale, but NASA has signed an agreement to allow citizen scientists to “wake up” a thirty-five (35) year old spacecraft this August.

NASA has given a green light to a group of citizen scientists attempting to breathe new scientific life into a more than 35-year old agency spacecraft.

The agency has signed a Non-Reimbursable Space Act Agreement (NRSAA) with Skycorp, Inc., in Los Gatos, California, allowing the company to attempt to contact, and possibly command and control, NASA’s International Sun-Earth Explorer-3 (ISEE-3) spacecraft as part of the company’s ISEE-3 Reboot Project. This is the first time NASA has worked such an agreement for use of a spacecraft the agency is no longer using or ever planned to use again.

The NRSAA details the technical, safety, legal and proprietary issues that will be addressed before any attempts are made to communicate with or control the 1970’s-era spacecraft as it nears the Earth in August.

“The intrepid ISEE-3 spacecraft was sent away from its primary mission to study the physics of the solar wind, extending its mission of discovery to study two comets,” said John Grunsfeld, astronaut and associate administrator for the Science Mission Directorate at NASA headquarters in Washington. “We have a chance to engage a new generation of citizen scientists through this creative effort to recapture the ISEE-3 spacecraft as it zips by the Earth this summer.” NASA Signs Agreement with Citizen Scientists Attempting to Communicate with Old Spacecraft

Do you have any thirty-five (35) year old software you would like to start re-using? 😉

What information should you have captured for that software?

The crowdfunding is in “stretch mode,” working towards $150,000. Support at: ISEE-3 Reboot Project by Space College, Skycorp, and SpaceRef.

May 21, 2014

Govcode

Filed under: Open Source,Programming — Patrick Durusau @ 8:11 pm

Govcode: Government Open Source Projects

This is a handy collection of government projects from GitHub.

Weren’t we just talking about a corpus of software earlier today? Are you thinking a corpus of government open source projects would give you insight into their non-open source code?

Like handwriting, don’t programmers code the same way for open as well as closed source software?

Interesting.

Tie people to projects, code, agencies, and magic happens.

Mapped with a purpose

Filed under: Mapping,Maps — Patrick Durusau @ 8:02 pm

Mapped with a purpose by Andrew Janes.

From the post:

A few years ago, a colleague asked me for help in finding a map. What he wanted, he told me, was a fairly up-to-date map that showed Great Britain at ‘a normal scale’.

After laughing briefly at him, and then apologising for my rudeness, I plucked a road atlas from one of the bookshelves in our staff reading room. Fortunately, this seemed to fit the bill perfectly.

With the benefit of hindsight, I think that there are two morals to this story:

1. What is obvious to one person is not obvious to another. Like any good researcher, my colleague should have tried to explain what he wanted in a less ambiguous way. Equally, like any good public services archivist, I should have helped him to shape his request into something more sensible.

2. There is no such thing as a ‘normal’ map. The features and attributes of any map (including its scale) depend on its purpose. Why was it made and what was it intended to be used for? Two maps of the same place made at roughly the same time, but for different reasons, can look quite unlike one another.

Andrew illustrates his point with three maps of Nottingham made during the early 20th century. Vastly different even to the unpracticed eye.

The maps are quite fascinating but I leave it to you to visit Andrew’s post for those.

Andrew then concludes with:

If you want to look for an old map and you think that The National Archives might have what you want, the map pages on our website are your best starting point.

When searching, bear in mind that a map showing the right place at the right date may not suit your needs in other ways. In other words, think about ‘what’ and ‘why’, as well as ‘where’ and ‘when’.

Those are some of the same rules for writing and reading topic maps.

I say “some” because Andrew is presuming a location on a particular sphere, whereas topic maps don’t start with that presumption. 😉

Corpus-based Empirical Software Engineering

Filed under: Corpora,Programming — Patrick Durusau @ 7:50 pm

Corpus-based Empirical Software Engineering – Ekaterina Pek by Felienne Hermans.

Felienne was live blogging Ekaterina’s presentation and defense (Defense of Ekaterina Pek May 21, 2014) today.

From the presentation notes:

The motivation for Kate’s work, she tells us, is the work of Knuth who empirically studied punchcards with FORTRAN code, in order to discover ‘what programmers really do’, as opposed to ‘what programmers should do’

Kate has the same goal: she wants to measure use of languages:

  • frequency counts -> How often are parts of the language used?
  • coverage -> What parts of the language are used?
  • footprint -> How much of each language part is used?

In order to be able to perform such analyses, we need a ‘corpus’, a big set of language data to work on. Knuth even collected punch cards from garbage bins, because it was so important for him to get more data.

And it is not just code she looked at: libraries, bugs, emails and commits are also taken into account. But some have to be sanitized in order to be usable for the corpus.
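
As a toy version of those frequency counts, here is a minimal sketch (my own, not Kate’s tooling) that treats a directory of Python files as the corpus and counts how often each AST node type appears, which is the “how often are parts of the language used?” question in miniature.

import ast
import pathlib
from collections import Counter

def language_part_frequencies(corpus_dir):
    """Count AST node types across every .py file under corpus_dir."""
    counts = Counter()
    for path in pathlib.Path(corpus_dir).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # real corpora need sanitizing, as the talk notes point out
        counts.update(type(node).__name__ for node in ast.walk(tree))
    return counts

if __name__ == "__main__":
    freqs = language_part_frequencies(".")
    for node_type, n in freqs.most_common(10):
        print(f"{node_type:20} {n}")

Coverage and footprint fall out of the same walk: which node types ever appear at all, and how much of each file they account for.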

Now there is an interesting sea of subjects.

Imagine exploring such a corpus for patterns of bugs and merging in patterns found in bug reports.

After all, bugs are introduced when programmers program as they do in real life, not as they would in theory.

Getting functional with Erlang

Filed under: Erlang,Functional Programming — Patrick Durusau @ 6:56 pm

Getting functional with Erlang by Mark Nijhof.

From the webpage:

This book will get you started writing Erlang applications right from the get go. After some initial chapters introducing the language syntax and basic language features we will dive straight into building Erlang applications. While writing actual code you will discover and learn more about the different Erlang and OTP features. Each application we create is geared towards a different use-case, exposing the different mechanics of Erlang. 

I want this to become the book I would have read myself, simple and to the point. Something to help you get functional with Erlang quickly. I imagine you; with one hand holding your e-reader while typing code with the other hand.

I have made a broad assumption: Because only smart people would want to learn Erlang (that is you), that you are then also smart enough to find your way to all the language specifics when needed. So this book is not meant as a complete reference guide for Erlang. But it will teach you enough to give you a running start.

When you have reached the end of this book you will be able to build a full blown Erlang application and release it into production. You will understand the core Erlang features like; pattern matching, message passing, working with processes, and hot code swapping.

I haven’t bought a copy, but that is a reflection on my book budget and not Mark’s book.

Take a look and pass this along to others. Mark is using a publishing model that merits encouragement.

OpenIntro Statistics

Filed under: Mathematics,Statistics — Patrick Durusau @ 6:44 pm

OpenIntro Statistics

From the about page:

The mission of OpenIntro is to make educational products that are free, transparent, and lower barriers to education.

The site includes a textbook, labs (R), videos, teachers resources, forums and extras, including data.

A good template for courses in other technical areas.

I first saw this in Chris Blattman’s Links I liked

Online Statistics Education:…

Filed under: Mathematics,Statistics — Patrick Durusau @ 4:58 pm

Online Statistics Education: An Interactive Multimedia Course of Study. Project Leader: David M. Lane, Rice University.

From the project homepage:

Online Statistics: An Interactive Multimedia Course of Study is a resource for learning and teaching introductory statistics. It contains material presented in textbook format and as video presentations. This resource features interactive demonstrations and simulations, case studies, and an analysis lab.

A far cry from introductory statistics pre-Internet. Definitely a resource to recommend to others.

I first saw this in Chris Blattman’s Links I liked

Your own search engine…

Filed under: Search Engines,Searching,Solr — Patrick Durusau @ 4:46 pm

Your own search engine (based on Apache Solr open-source enterprise-search)

From the webpage:

Tools for easier searching with free software on your own server

  • search in many documents, images and files
    • full text search with powerful search operators
    • in many different formats (text, word, openoffice, PDF, sheets, csv, doc, images, jpg, video and many more)
    • get an overview by explorative search and comfortable and powerful navigation with faceted search (easy to use interactive filters)
  • analyze documents (preview, extracted text, wordlists and visualizations with wordclouds and trend charts)
  • structure your research, investigation, navigation, metadata or notes (semantic wiki for tagging documents, annotations and structured notes)
  • OCR: automatic text recognition for images and graphical content or scans inside PDF, i.e. for scanned or photographed documents
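
Faceted search, one of the items in the list above, is easy to poke at directly once you have any Solr core running; a minimal sketch with the requests library, where the core name, field name and host are all assumptions:

import requests

# Assumes a local Solr core named "documents" with an "author_s" field.
resp = requests.get(
    "http://localhost:8983/solr/documents/select",
    params={
        "q": "topic maps",          # full text query
        "rows": 10,
        "facet": "true",
        "facet.field": "author_s",  # the interactive filter offered to the user
        "wt": "json",
    },
)
data = resp.json()
print("hits:", data["response"]["numFound"])
# Facet counts come back as a flat [value, count, value, count, ...] list.
counts = data["facet_counts"]["facet_fields"]["author_s"]
for value, count in zip(counts[::2], counts[1::2]):
    print(f"{value}: {count}")

Even a toy query like this shows there is machinery behind every search box.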

Do you think this would be a way to pull back the curtain on search a bit? To show people that even results like we see from Google require more than casual effort?

I ask because Jeni Tennison tweeted earlier today:

#TDC14 @emckean “search is the hammer that makes us think everything is a nail that can be searched for”

Is a common misunderstanding of search making “improved” finding methods a difficult sell?

Not that I have a lot of faith or interest in educating potential purchasers. Finding a way to use the misunderstanding seems like a better marketing strategy to me.

Suggestions?

How we built interactive heatmaps…

Filed under: Design,Heatmaps,Interface Research/Design — Patrick Durusau @ 2:22 pm

How we built interactive heatmaps using Solr and Heatmap.js by Chris Becker.

From the post:

One of the things we obsess over at Shutterstock is the customer experience. We’re always aiming to better understand how customers interact with our site in their day to day work. One crucial piece of information we wanted to know was which elements of our site customers were engaging with the most. Although we could get that by running a one-off report, we wanted to be able to dig into that data for different segments of customers based on their language, country, purchase decisions, or a/b test variations they were viewing in various periods of time.

To do this we built an interactive heatmap tool to easily show us where the “hot” and “cold” parts of our pages were — where customers clicked the most, and where they clicked the least. The tool we built overlaid this heatmap on top of the live site, so we could see the site the way users saw it, and understand where most of our customer’s clicks took place. Since customers are viewing our site in many different screen resolutions we wanted the heatmap tool to also account for the dynamic nature of web layouts and show us heatmaps for any size viewport that our site is used in.
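
The heart of such a tool is just binning clicks; here is a minimal sketch (mine, not Shutterstock’s code) that normalizes click coordinates by the viewport they were recorded in, so clicks from different screen resolutions land in the same grid.

import numpy as np

def click_heatmap(clicks, grid=(50, 50)):
    """clicks: iterable of (x, y, viewport_w, viewport_h) tuples.
    Returns a grid of click counts, with coordinates normalized per viewport."""
    heat = np.zeros(grid)
    for x, y, w, h in clicks:
        col = min(int(x / w * grid[1]), grid[1] - 1)
        row = min(int(y / h * grid[0]), grid[0] - 1)
        heat[row, col] += 1
    return heat

sample = [(100, 40, 1280, 800), (105, 42, 1280, 800), (640, 400, 1920, 1080)]
print(click_heatmap(sample, grid=(10, 10)))

Rendering such counts as the “hot” and “cold” overlay on the live page is a separate step, which is where Heatmap.js comes in for their stack.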

If you are offering a web interface to a topic map (or other information services), this is a great way to capture user feedback on your UI.

PS: shutterstock-heatmap-toolkit (GitHub)

Practical Relevance Ranking for 11 Million Books, Part 1

Filed under: Relevance,Searching — Patrick Durusau @ 1:54 pm

Practical Relevance Ranking for 11 Million Books, Part 1 by Tom Burton-West.

From the post:

This is the first in a series of posts about our work towards practical relevance ranking for the 11 million books in the HathiTrust full-text search application.

Relevance is a complex concept which reflects aspects of a query, a document, and the user as well as contextual factors. Relevance involves many factors such as the user’s preferences, the user’s task, the user’s stage in their information-seeking, the user’s domain knowledge, the user’s intent, and the context of a particular search.

While many different kinds of relevance have been discussed in the literature, topical relevance is the one most often used in testing relevance ranking algorithms. Topical relevance is a measure of “aboutness”, and attempts to measure how much a document is about the topic of a user’s query.

At its core, relevance ranking depends on an algorithm that uses term statistics, such as the number of times a query term appears in a document, to provide a topical relevance score. Other ranking features that try to take into account more complex aspects of relevance are built on top of this basic ranking algorithm.

In many types of search, such as e-commerce or searching for news, factors other than the topical relevance (based on the words in the document) are important. For example, a search engine for e-commerce might have facets such as price, color, size, availability, and other attributes, that are of equal importance to how well the user’s query terms match the text of a document describing a product. In news retrieval, recency[iii] and the location of the user might be factored into the relevance ranking algorithm. (footnotes omitted)

Great post that discusses the impact of the length of a document on its relevancy ranking by Lucene/Solr. That impact is well known, but how to move from relevancy studies on short documents to long documents (books) isn’t known.
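
To see where document length enters a term-statistics ranking function, here is a minimal sketch of BM25, one of the ranking functions available in Lucene/Solr (the numbers are toy values, not HathiTrust’s):

import math

def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Score contribution of one query term for one document.
    tf: term frequency in the document; df: number of documents containing the term."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # b controls how strongly longer-than-average documents are penalized.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Same term frequency and corpus statistics, very different document lengths:
short_doc = bm25_term_score(tf=5, df=1000, n_docs=11_000_000, doc_len=2_000, avg_doc_len=100_000)
long_doc = bm25_term_score(tf=5, df=1000, n_docs=11_000_000, doc_len=700_000, avg_doc_len=100_000)
print(f"short document: {short_doc:.3f}  long document: {long_doc:.3f}")

With the default parameters, five occurrences of a term in a 700,000-word book contribute less than half the score of five occurrences in a 2,000-word pamphlet, which is exactly the tuning question for book-length documents.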

I am looking forward to Part 2, which will cover the relationship between relevancy and document length.

Apache Lucene/Solr 4.8.1 (Bug Fixes)

Filed under: Lucene,Solr — Patrick Durusau @ 1:23 pm

From the Lucene News:

The Lucene PMC is pleased to announce the availability of Apache Lucene 4.8.1 and Apache Solr 4.8.1.

Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html and Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Both releases contain a number of bug fixes.

See the Lucene CHANGES.txt and Solr CHANGES.txt files included with the release for a full list of details.

It’s upgrade time again!

May 20, 2014

Free Tools For Offensive Security

Filed under: Cybersecurity,Security — Patrick Durusau @ 7:21 pm

Free Tools For Offensive Security by John H. Sawyer.

From the post:

There are a lot of excellent offensive security tools available online for free, thanks to open-source licenses and the security professionals who’ve created tools in an effort to give back to the community. But because they are created by individuals or open-source efforts without the marketing and promotion resources of a vendor, these tools may not be well known in the enterprise.

Two years ago I wrote a Tech Insight on offensive security tools that defenders can leverage to help find vulnerabilities and secure their environments. Today, I want to update that list with some currently available tools that should be included in every offensive and defensive security professional’s toolbox.

I truly believe that a security professional focused on defense or offense must understand the tools and techniques used by the other side. Those who defend a network should be aware of the attacks they will face and the ways that attackers avoid detection. To become familiar with these approaches, they should try out some of these same attack methods.

Government surveillance, corporate data gathering, and network security breaches are proof that not everyone is on the Kumbaya Information train. Choose your security strategy accordingly.

Interactive Maps with D3.js, Three.js, and Mapbox

Filed under: D3,MapBox,Mapping,Maps — Patrick Durusau @ 7:05 pm

Interactive Maps with D3.js, Three.js, and Mapbox by Steven Hall.

From the post:

Over the past couple of weeks I have been experimenting with creating 2D maps that can be explored in three dimensional space using D3.js and Three.js.  The goal was to produce some highly polished prototypes with multiple choropleth maps that could be easily navigated on a single page.  Additionally, I wanted to make sure to address some of the common tasks that arise when presenting map data such as applying well-formatted titles, legends and elegantly handling mouse-over events. The two examples presented below use D3.js for generating nested HTML elements that contain the maps, titles and labeling information and use Three.js to position the elements in 3D space using CSS 3D transforms.  Importantly, there is no WebGL used in these examples.  Everything is rendered in the DOM using CSS 3D transforms which, at the time of writing, has much wider browser support than WebGL.

This article is an extension of two of my previous articles on D3.js and Three.js that can be found here and here.   Below, I’ll go into more depth about how the examples are produced and some of the roadblocks I encountered in putting these demos together, but for more background on the general process it may be good to look at the first article in this series: D3.js, Three.js and CSS 3D Transforms.

The maps here are geographical maps but what Steve covers could be easily applied to other types of maps.

Community Detection in Graphs — a Casual Tour

Filed under: Graphs,Networks,Social Networks — Patrick Durusau @ 4:43 pm

Community Detection in Graphs — a Casual Tour by Jeremy Kun.

From the post:

Graphs are among the most interesting and useful objects in mathematics. Any situation or idea that can be described by objects with connections is a graph, and one of the most prominent examples of a real-world graph that one can come up with is a social network.

Recall, if you aren’t already familiar with this blog’s gentle introduction to graphs, that a graph G is defined by a set of vertices V, and a set of edges E, each of which connects two vertices. For this post the edges will be undirected, meaning connections between vertices are symmetric.

One of the most common topics to talk about for graphs is the notion of a community. But what does one actually mean by that word? It’s easy to give an informal definition: a subset of vertices C such that there are many more edges between vertices in C than from vertices in C to vertices in V - C (the complement of C). Try to make this notion precise, however, and you open a door to a world of difficult problems and open research questions. Indeed, nobody has yet come to a conclusive and useful definition of what it means to be a community. In this post we’ll see why this is such a hard problem, and we’ll see that it mostly has to do with the word “useful.” In future posts we plan to cover some techniques that have found widespread success in practice, but this post is intended to impress upon the reader how difficult the problem is.

I am thinking that, for some purposes, communities of nodes could well be subjects in a topic map. But we would have to be able to find them. And Jeremy says that’s a hard problem.
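
The informal definition, at least, is easy to check for a candidate set of vertices; a minimal sketch using networkx (an assumed dependency, and not Jeremy’s code) that compares edges inside C with edges crossing its boundary:

import networkx as nx

def community_score(G, C):
    """Ratio of edges inside C to edges crossing the boundary of C.
    Higher means C looks more like the informal notion of a community."""
    C = set(C)
    internal = G.subgraph(C).number_of_edges()
    crossing = sum(1 for u, v in G.edges() if (u in C) != (v in C))
    return internal / max(crossing, 1)

G = nx.karate_club_graph()  # the classic toy social network
candidate = [0, 1, 2, 3, 7, 13]
print(f"internal/crossing ratio: {community_score(G, candidate):.2f}")

Scoring a candidate is the easy part; the hard problem Jeremy describes is finding candidates worth scoring and deciding what “useful” means.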

Looking forward to more posts on communities in graphs from Jeremy.

Theorizing the Web, an experience

Filed under: WWW — Patrick Durusau @ 4:14 pm

Theorizing the Web, an experience by Chas Emerick.

From the post:

Last week, I attended Theorizing the Web (TtW). I can say without hesitation that it was one of the most challenging, enlightening, and useful conference experiences I’ve ever had. I’d like to provide a summary account of my experience, and maybe offer some (early, I’m still processing) personal takeaways that might be relevant to you, especially if you are involved professionally in building the software and technology that is part of what is theorized at TtW.

The first thing you need to know is that TtW is not a technology conference. Before I characterize it positively though, it’s worth considering the conference’s own statement:

Theorizing the Web is an inter- and non-disciplinary annual conference that brings together scholars, journalists, artists, activists, and commentators to ask big questions about the interrelationships between the Web and society.

While there were a few technologists in attendance, even fewer were presenting. As it said on the tin, TtW was fundamentally about the social, media, art, legal, and political aspects and impacts of the internet and related technologies.

Before I enumerate some of my highlights of TtW, I want to offer some context of my own, a thread that mostly winds around:

When I saw the tweet by Chas, I thought this was a technical conference, but I quickly learned my error. 😉

Before you watch videos from the conference, Theorizing the Web, take a slow read of Chas’ post.

Whether you will draw the same conclusions as Chas or different ones remains to be seen. What is clear from his post is that this conference covered many subjects that aren’t visible at many other conferences.

If you have a favorite video from the conference let me know. I will be watching at least some of them before offering my perspective.

Fun with CRDTs

Filed under: Consistency,CRDT,Merging — Patrick Durusau @ 3:32 pm

Fun with CRDTs by Richard Dallaway.

From the post:

At the end of last year I had some fun implementing a CRDT. These are data structures designed to combine together when you have no control over order of changes, timing of changes, or the number of participants in the data structure. The example I looked at was a sequential datatype, namely the WOOT CRDT for collaborative text editing.

Doesn’t:

combine together when you have no control over order of changes, timing of changes, or the number of participants in the data structure.

sound familiar? 😉
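
WOOT itself takes some machinery, but the flavor of “combine in any order” is easy to show with the simplest CRDT, a grow-only set; a minimal sketch (mine, not from Richard’s talk):

class GSet:
    """Grow-only set: merge is set union, so replicas converge no matter the
    order, timing, or number of merges."""
    def __init__(self, items=()):
        self.items = set(items)

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        return GSet(self.items | other.items)

# Three replicas edited independently, then merged in two different orders:
a, b, c = GSet({"x"}), GSet({"y"}), GSet({"z"})
left = a.merge(b).merge(c)
right = c.merge(a).merge(b)
assert left.items == right.items == {"x", "y", "z"}
print(left.items)

Sequential datatypes like WOOT add the hard part: preserving the intended order of characters while keeping merge just as order-insensitive.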

Richard points to:

Slides.

Video.

He also recommends that you watch: Reconciling Eventually-Consistent Data with CRDTs by Noel Welsh, before viewing his video.

Great stuff!

Elm 0.12.3

Filed under: Elm,Graphics,Visualization — Patrick Durusau @ 3:06 pm

Elm 0.12.3: Hardware accelerated 3D rendering with WebGL

From the post:

Elm now supports 3D rendering with WebGL! Huge thank you to John P. Mayer for designing and implementing such a simple API for this. It has been really fun to work with so far and we are excited to see what people can do with it!

This is the first public exploration of using alternate renderers with Elm. Our goal is to be great for all kinds of UI tasks, so 3D is just the first step on the road to more traditional renderers such as the D3 backend for Elm. Future exploration will focus on more traditional kinds of UI, all super easy to embed as a component in an existing JS app.

This release also comes with some changes to the Color library, making it easier to create colors programmatically. The initial motivation was to make Color play nice with WebGL, but the library came out a lot friendlier to use in general.

If you want to become a functional programming shop, use Elm to experiment with 3D UI components. Or UIs in general for that matter.

I first saw this in a tweet by Paul Smith.

Madagascar

Filed under: Geophysical,Publishing,TeX/LaTeX — Patrick Durusau @ 2:31 pm

Madagascar

From the webpage:

Madagascar is an open-source software package for multidimensional data analysis and reproducible computational experiments. Its mission is to provide

  • a convenient and powerful environment
  • a convenient technology transfer tool

for researchers working with digital image and data processing in geophysics and related fields. Technology developed using the Madagascar project management system is transferred in the form of recorded processing histories, which become “computational recipes” to be verified, exchanged, and modified by users of the system.

Interesting tool for “reproducible documents” and data analysis.

The file format, Regularly Sampled Format (RSF) sounds interesting:

For data, Madagascar uses the Regularly Sampled Format (RSF), which is based on the concept of hypercubes (n-D arrays, or regularly sampled functions of several variables), much like the SEPlib (its closest relative), DDS, or the regularly-sampled version of the Javaseis format (SVF). Up to 9 dimensions are supported. For 1D it is conceptually analogous to a time series, for 2D to a raster image, and for 3D to a voxel volume. The format (actually a metaformat) makes use of an ASCII file with metadata (information about the data), including a pointer (in= parameter) to the location of the file with the actual data values. Irregularly sampled data are currently handled as a pair of datasets, one containing data and the second containing the corresponding irregular geometry information. Programs for conversion to and from other formats such as SEG-Y and SU are provided. (From Package Overview)

In case you are interested, SEG-Y and SU (Seismic Unix data format) are both formats for geophysical data.
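
Based on that description, an RSF header is just key=value text with an in= pointer to the raw data, so reading one is straightforward; a hedged sketch in Python, where everything beyond the in= parameter (the n1= sample count, the 32-bit float assumption) is a guess for illustration rather than a statement about the format:

import numpy as np

def read_rsf_header(path):
    """Parse whitespace-separated key=value pairs from an RSF-style ASCII header."""
    params = {}
    with open(path) as f:
        for token in f.read().split():
            if "=" in token:
                key, _, value = token.partition("=")
                params[key] = value.strip('"')
    return params

def read_rsf(path):
    """Load the binary payload that the header's in= parameter points to."""
    hdr = read_rsf_header(path)
    data = np.fromfile(hdr["in"], dtype=np.float32)  # dtype is an assumption
    if "n1" in hdr:
        data = data.reshape(-1, int(hdr["n1"]))
    return hdr, data

Madagascar itself ships far more capable readers; the sketch is only to show how little stands between the metadata file and the hypercube of samples.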

I first saw this in a tweet by Scientific Python.

<oXygen/> XML Editor 16.0

Filed under: Editor,XML — Patrick Durusau @ 12:51 pm

<oXygen/> XML Editor 16.0

From the post:

<oXygen/> XML Editor 16 increases your productivity for XSLT development with the addition of Quick Fixes and improvements to refactoring actions. Saxon-CE specific extensions are supported and you can now apply XPath queries on multiple files.

If you use Ant to orchestrate build processes then <oXygen/> will support you with a powerful Ant editor featuring validation, content completion, outline view, syntax highlight and search and refactoring actions.

Working with conditional content is a lot easier now as you can set different colors and styles for each condition or focus exclusively on a specific deliverable by hiding all excluded content. You can modify DITA and DocBook tables easily using the new table properties action.

You can customize the style of the <oXygen/> WebHelp output to look exactly as you want using the new WebHelp skin builder.

As usual, the new version includes many component updates and new API functionality.
….

Too many changes and new features to list!

Not cheap, and it has a learning curve, but if you are looking for a top-end XML editor, you need look no further.

Follow the Money (OpenTED)

Filed under: EU,Open Data,Open Government — Patrick Durusau @ 10:42 am

Opening Up EU Procurement Data by Friedrich Lindenberg.

From the post:

What is the next European dataset that investigative journalists should look at? Back in 2012 at the DataHarvest conference, Brigitte, investigative superstar from FarmSubsidy and co-host of the conference, had a clear answer: let’s open up TED (Tenders Electronic Daily). TED is the EU’s shared procurement mechanism, and is at the heart of the EU contracting process. Opening it up would shine a light on the key questions of who receives public money, and what they receive it for.

Her suggestion triggered a two-year project, OpenTED, which, as of last week, has finally matured into a useful resource for journalists and researchers. While gaps remain, we hope it will now start to be used by journalists, NGOs, analysts and citizens to get information on everything from large scale trends to local municipal developments.

(image omitted)

OpenTED

TED collects tender notices for large public projects so that companies from all EU countries can bid on those contracts. For journalists, there are many exciting questions such a database would be able to answer: What major projects are being announced? Who is winning the contracts for these projects, and is that decision made prudently and impartially? Who are the biggest suppliers in a particular country or industry?

A data dictionary for the project remains unfinished and there are plenty of other opportunities to contribute to this project.

The phrase “large public project” means projects with budgets in excess of €200,000. If experience in the United States holds true for the EU, there can be a lot of FGC (Fraud, Greed, Corruption) in contracts under €200,000.

If you are looking for volunteer opportunities, the data needs to be used and explored, a data dictionary remains unfinished, current code can be improved and I assume documentation would be appreciated.

Certainly the type of project that merits widespread public support.

I find the project interesting because once you connect the players based on this data set, folding in other sets of connections, such as school, social, club, agency, and employer, will improve the value of the original data set. Topic maps, of course, are my preferred method for the folding.

I first saw this in a tweet by ePSIplatform.
