Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

September 21, 2012

Discovery Informatics: Science Challenges for Intelligent Systems

Filed under: Discovery Informatics,Informatics — Patrick Durusau @ 2:52 pm

Discovery Informatics: Science Challenges for Intelligent Systems by Erwin Gianchandani.

From the post:

This past February in Arlington, VA, Yolanda Gil (University of Southern California Information Sciences Institute) and Haym Hirsh (Rutgers University) co-organized a workshop on discovery informatics, assembling over 50 participants from academia, industry, and government “to investigate the opportunities that scientific discoveries present to information sciences and intelligent systems as a new area of research called discovery informatics.” A report summarizing the key themes that emerged during discussions at that workshop is now available.

From the workshop homepage:

What is Discovery Informatics?

Discovery Informatics focuses on computing advances aimed at identifying scientific discovery processes that require knowledge assimilation and reasoning, and applying principles of intelligent computing and information systems in order to understand, automate, improve, and innovate any aspects of those processes.

No surprise that I think we need to focus on the human aspects of computing and information systems.

It isn’t like our machines are going to come up with interesting questions on their own.

Local Search – How Hard Can It Be? [Unfolding Searches?]

Filed under: Local Search,Search Behavior,Search Engines,Searching — Patrick Durusau @ 2:20 pm

Local Search – How Hard Can It Be? by Matthew Hurst.

From the post:

This week, Apple got a rude awakening with its initial foray into the world of local search and mapping. The media and user backlash to their iOS upgrade which removes Google as the maps and local search partner and replaces it with their own application (built on licensed data) demonstrates just how important the local scenario is to the mobile space.

While the pundits are reporting various (and sometimes amusing) issues with the data and the search service, it is important to remind ourselves how hard local search can be.

For example, if you search on Google for Key Arena – a major venue in Seattle located in the famous Seattle Center, you will find some severe data quality problems.

See Matthew’s post for the details, but I am mostly interested in his final observation:

One of the ironies of local data conflation is that landmark entities (like stadia, large complex hotels, hospitals, etc.) tend to have lots of data (everyone knows about them) and lots of complexity (the Seattle Center has lots of things within it that can be confused). These factors conspire to make the most visible entities in some ways the entities more prone to problems.

Every library student is (or should be) familiar with the “reference interview.” A patron asks a question (consider this to be the search request, “Key Arena”) and a librarian uses the reference interview to further identify the information being requested.

Contrast that unfolding of the search request, which at any juncture offers different paths to different goals, with the “if you can identify it, you can find it” approach of most search engines.

Computers have difficulty searching complex entities such as “Key Arena” successfully, whereas the same query, put to a librarian, poses no such difficulty.

Doesn’t that suggest to you that “unfolding” searches may be a better model for computer searching than simple identification?

More than static facets: a dynamic presentation of the details most likely to distinguish the subjects users search for under similar circumstances.

Sounds like the sort of heuristic knowledge that topic maps could capture quite handily.
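As a toy sketch of what an “unfolding” search loop might look like (the candidate data below is invented purely for illustration):

```python
# Toy sketch of an "unfolding" search: rather than resolving "Key Arena"
# in one shot, present the distinguishing details and let the user pick
# a path. The candidate data is invented for illustration.
CANDIDATES = {
    "Key Arena": [
        {"name": "KeyArena (venue)", "detail": "sports/concert venue at Seattle Center"},
        {"name": "KeyArena box office", "detail": "ticketing office inside the venue"},
        {"name": "Seattle Center (campus)", "detail": "the complex containing the arena"},
    ],
}

def unfold(query):
    matches = CANDIDATES.get(query, [])
    if len(matches) <= 1:
        return matches
    print("Did you mean:")
    for i, m in enumerate(matches, 1):
        print("  %d. %s -- %s" % (i, m["name"], m["detail"]))
    choice = int(input("Pick one: ")) - 1
    # A real system would unfold again here, offering further distinctions.
    return [matches[choice]]

print(unfold("Key Arena"))
```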

The First Three Seconds: How Users Are Lost

Filed under: Interface Research/Design,Usability,Users,Visualization — Patrick Durusau @ 2:02 pm

The First Three Seconds: How Users Are Lost by Zac Gery.

From the post:

In the time it takes to read this sentence, someone has viewed this post and moved on. They probably didn’t even read this sentence. Why did they leave? What were they looking for? Users searching on the internet have a short attention span. It is commonly referred to as the “3 Second Rule.” Although not specifically three seconds, the rule accentuates the limited time a website has to make a first impression. The goal of any website is to clarify, then build interest. Interest drives return visits and recommendations. As a user’s visit extends so does the chance for a return visit.

On the web, first impressions start with speed. From the moment users request a web page, they begin to evaluate. Displaying a modern website is a coordinated effort of content, CSS files, JavaScript files, images, and more. Too many requests or large files can increase a website’s load time. Tools such as Firebug, YSlow, Webkit’s Inspector, and Fiddler offer an excellent overview of load times. Browser caching can help with additional requests, but most websites are not afforded a second look. Investigate the number of files required for a web page. Sprites are a great way to reduce multiple image files and overall size. Compression tools can also help to reduce wasted space in JavaScript and CSS files.

A little longer than Love or Hate in 50 Milliseconds, but it still sets a far higher bar than the thirty (30) second elevator speech.

Are you measuring user reactions to your interfaces in milliseconds?

Or do you ask your manager for their reaction?

Care to guess which test is most often used by successful products?

I first saw this at DZone.

September 20, 2012

Interacting with Weka from Jython

Filed under: Machine Learning,Weka — Patrick Durusau @ 8:08 pm

Interacting with Weka from Jython by Christophe Lalanne.

From the post:

I discovered a lovely feature: You can use WEKA directly with Jython in a friendly interactive REPL.
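To give a flavor of what that looks like, here is a minimal sketch (my own, not from the post), assuming weka.jar is on the classpath when Jython starts; the ARFF path is hypothetical:

```python
# Jython session sketch: drive Weka's Java API interactively.
# Assumes weka.jar is on the CLASSPATH; the dataset path is hypothetical.
from weka.core.converters import ConverterUtils
from weka.classifiers.trees import J48

source = ConverterUtils.DataSource("iris.arff")
data = source.getDataSet()
data.setClassIndex(data.numAttributes() - 1)  # last attribute is the class

tree = J48()                                  # C4.5-style decision tree
tree.buildClassifier(data)
print(tree)                                   # pretty-prints the learned tree
```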

There are days when I think I need more than multiple workspaces on multiple monitors. I need an extra set of hands and eyes. 😉

Enjoy!

25th ACM Symposium on Parallelism in Algorithms and Architectures

Filed under: Algorithms,Conferences,Parallel Programming — Patrick Durusau @ 7:59 pm

25th ACM Symposium on Parallelism in Algorithms and Architectures

Submission Deadlines:
  • Abstracts: February 11 (11:59 pm EST)
  • Full papers: February 13 (11:59 pm EST)
  • Notification: April 15
  • Camera-ready copy due: May 14
These are firm deadlines. No extensions will be granted.

From the call for papers:

This year, SPAA is co-located with PODC. SPAA defines the term “parallel” broadly, encompassing any computational system that can perform multiple operations or tasks simultaneously. Topics include, but are not limited to:

  • Parallel and Distributed Algorithms
  • Parallel and Distributed Data Structures
  • Green Computing and Power-Efficient Architectures
  • Management of Massive Data Sets
  • Parallel Complexity Theory
  • Parallel and Distributed Architectures
  • Multi-Core Architectures
  • Instruction Level Parallelism and VLSI
  • Compilers and Tools for Concurrent Programming
  • Supercomputer Architecture and Computing
  • Transactional Memory Hardware and Software
  • The Internet and the World Wide Web
  • Game Theory and Collaborative Learning
  • Routing and Information Dissemination
  • Resource Management and Awareness
  • Peer-to-Peer Systems
  • Mobile Ad-Hoc and Sensor Networks
  • Robustness, Self-Stabilization and Security
  • Synergy of Parallelism in Algorithms, Programming and Architecture

Montreal, Canada, July 23 – 25, 2013.

Think about it. Balisage won’t be that far away; you could put some vacation time together with the conferences at either end.

“Communicating the User Experience” (Book Review)

Filed under: Documentation,Interface Research/Design,Usability — Patrick Durusau @ 7:47 pm

“Communicating the User Experience” – reviewed by Jane Pyle.

From the post:

I’ll admit it. I haven’t spent a lot of time in my career creating beautiful wireframes. For the past four years I’ve been designing mobile apps for internal use in a large corporation and the first casualty in every project has been design documentation. I’ve been able to successfully communicate my designs using sketches, dry erase boards, and/or rapid prototyping, but the downside of this approach became quite clear when our small team disbanded. As a new team was formed, the frequently asked question of “so where is the documentation for this project” was met with my sheepish gaze.

So I was very curious to read Communicating the User Experience and perhaps learn some practical methods for creating UX documentation on a shoestring time budget. What’s the verdict? Have I seen the documentation light and decided to turn over a new leaf? Read on.

As Jane discovers, there are no shortcuts to documentation, UX or otherwise.

A guide to tools for creating a particular style of documentation can be helpful to beginners, as Jane notes, but not beyond that.

Creating documentation is not a tool-driven activity. It is more creative work than writing software or building an interface.

Software works with deterministic machines and can be tested as such. Documentation has to work with non-deterministic users.

The only test for documentation is whether it is understood by those non-deterministic users.

Rather than facing the harder task of documentation, many prefer to grunt and wave their sharpies in the air.

It may be amusing, but it’s not documentation.

HCatalog Meetup at Twitter

Filed under: Hadoop,HCatalog,Pig — Patrick Durusau @ 7:22 pm

HCatalog Meetup at Twitter by Russell Jurney.

From the post:

Representatives from Twitter, Yahoo, LinkedIn, Hortonworks and IBM met at Twitter HQ on Thursday to talk HCatalog. Committers from HCatalog, Pig and Hive were on hand to discuss the state of HCatalog and its future.

Apache HCatalog is a table and storage management service for data created using Apache Hadoop.

See Russell’s post for more details.

Then brush up on HCatalog (if you aren’t already following it).

Pig as Duct Tape, Part Three: TF-IDF Topics with Cassandra, Python Streaming and Flask

Filed under: Cassandra,Pig — Patrick Durusau @ 7:15 pm

Pig as Duct Tape, Part Three: TF-IDF Topics with Cassandra, Python Streaming and Flask by Russell Jurney.

From the post:

Apache Pig is a dataflow oriented, scripting interface to Hadoop. Pig enables you to manipulate data as tuples in simple pipelines without thinking about the complexities of MapReduce.

But Pig is more than that. Pig has emerged as the ‘duct tape’ of Big Data, enabling you to send data between distributed systems in a few lines of code. In this series, we’re going to show you how to use Hadoop and Pig to connect different distributed systems to enable you to process data from wherever and to wherever you like.

Working code for this post as well as setup instructions for the tools we use and their environment variables are available at https://github.com/rjurney/enron-python-flask-cassandra-pig and you can download the Enron emails we use in the example in Avro format at http://s3.amazonaws.com/rjurney.public/enron.avro. You can run our example Pig scripts in local mode (without Hadoop) with the -x local flag: pig -x local. This enables new Hadoop users to try out Pig without a Hadoop cluster.
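The series computes TF-IDF with Pig at scale; as a reminder of what the statistic itself measures, here is a small pure-Python sketch (mine, not code from the post):

```python
# TF-IDF in miniature: a term scores high in a document when it is
# frequent there but rare across the corpus. Toy data for illustration.
import math
from collections import Counter

docs = [
    "power trading schedule".split(),
    "trading desk schedule meeting".split(),
    "meeting agenda power outage".split(),
]

df = Counter()                      # document frequency of each term
for doc in docs:
    df.update(set(doc))

def tf_idf(doc):
    tf = Counter(doc)
    return {t: (tf[t] / len(doc)) * math.log(len(docs) / df[t]) for t in tf}

for i, doc in enumerate(docs):
    top = sorted(tf_idf(doc).items(), key=lambda kv: -kv[1])[:2]
    print(i, top)                   # the two most distinctive terms per doc
```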

Parts one and two can get you started with Pig if you’re not already familiar with it.

With this post in the series, “duct tape” made it into the title.

In case you don’t know (I didn’t), Flask is a “lightweight web application framework in Python.”

Just once I would like to see a “heavyweight, cumbersome, limited and annoying web application framework in (insert language of your choice).”

Just for variety.

Rather than characterizing software, say what it does.
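In that spirit, here is what Flask does; a complete application really is this small (a standard hello-world, not code from the post):

```python
# A complete, runnable Flask application.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

if __name__ == "__main__":
    app.run()  # serves on http://127.0.0.1:5000 by default
```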

Sorry, I have been converting one of the most poorly edited documents I have ever seen into a CSV file. Proofing will follow the conversion, but I hope to finish that by the end of next week.

Misinformation: Why It Sticks and How to Fix It

Filed under: Interface Research/Design,Usability — Patrick Durusau @ 4:43 pm

Misinformation: Why It Sticks and How to Fix It

From the post:

Childhood vaccines do not cause autism. Barack Obama was born in the United States. Global warming is confirmed by science. And yet, many people believe claims to the contrary.

Why does that kind of misinformation stick? A new report published in Psychological Science in the Public Interest, a journal of the Association for Psychological Science, explores this phenomenon. Psychological scientist Stephan Lewandowsky of the University of Western Australia and colleagues highlight the cognitive factors that make certain pieces of misinformation so “sticky” and identify some techniques that may be effective in debunking or counteracting erroneous beliefs.

The main reason that misinformation is sticky, according to the researchers, is that rejecting information actually requires cognitive effort. Weighing the plausibility and the source of a message is cognitively more difficult than simply accepting that the message is true — it requires additional motivational and cognitive resources. If the topic isn’t very important to you or you have other things on your mind, misinformation is more likely to take hold.

And when we do take the time to thoughtfully evaluate incoming information, there are only a few features that we are likely to pay attention to: Does the information fit with other things I believe in? Does it make a coherent story with what I already know? Does it come from a credible source? Do others believe it?

Misinformation is especially sticky when it conforms to our preexisting political, religious, or social point of view. Because of this, ideology and personal worldviews can be especially difficult obstacles to overcome.

Useful information for designing interfaces in general and topic maps in particular.

I leave it for others to decide which worldviews support “information,” as opposed to “misinformation.”

But whatever your personal view of some “facts,” the same techniques should serve equally well.

PS: The effort required to reject information is a theme explored in Thinking, Fast and Slow.

September 19, 2012

Five Lessons Learned Doing User Research in Asia

Filed under: Interface Research/Design,Usability,Users — Patrick Durusau @ 7:20 pm

Five Lessons Learned Doing User Research in Asia by Carissa Carter.

From the post:

If you have visited any country in Asia recently, you have probably seen it. Turn your head in any direction; stand up; go shopping; or check an app on your phone and you will notice products from Western companies lurking about. Some of these products are nearly identical to their counterparts overseas, and others are brand new, launched specifically for the local market.

As more and more companies are taking their products abroad, the need for user research in these new markets is increasing in importance. I spent a year spanning 2010 and 2011 living in Hong Kong and leading user research campaigns—primarily in China, Japan, and India. Through a healthy balance of trial and error (and more error), I learned a lot about leading these studies in cultures incredibly different than my own. Meta understanding with a bit of methodology mixed in, I offer you my top five lessons learned while conducting and applying user research in Asia.

Successful user interface designs change across cultures.

Is that a clue as to what happens with subject identifications?

Topic Based Authoring (Webinar)

Filed under: Authoring Topic Maps,Topic Maps — Patrick Durusau @ 7:02 pm

Topic Based Authoring

Date: Thursday, October 4, 2012
Time: 11:00 AM PDT | 2:00 PM EDT

From the description:

Using a topic-based approach can improve consistency and usability of information and make it easier to reuse topics in different contexts. It can also simplify maintenance, speed up the review process, and facilitate shared authoring.

All of those benefits sound great. But which ones really matter to you, your business, and your customers? It’s important to know why you want to change your content strategy, and how you’ll evaluate whether you’ve been successful.

Topic-based authoring implementations often focus on learning writing patterns, techniques, and technologies like DITA and CCMS. Those are important and useful, but topic-based authoring doesn’t exist in a vacuum. Decisions you make about your content need to be tied to business goals and user needs. Too often, the activity of thinking through the business goals and user needs gets neglected.

This 45-minute webinar will define topic-based authoring and help you understand not only the benefits of this approach but also walk you through the critical steps to defining and implementing a successful program.

GT-VMT…Graph Transformation and Visual Modeling Techniques

Filed under: Conferences,Graphs,Networks,Visualization — Patrick Durusau @ 4:43 pm

GT-VMT 2013 : 12th International Workshop on Graph Transformation and Visual Modeling Techniques

  • Abstract Submission: December 7, 2012
  • Paper Submission: December 14, 2012
  • Notification to Authors: January 18, 2013
  • Camera Ready Submission: February 1, 2013
  • Workshop Dates: March 23-24, 2013

From the call for papers:

GT-VMT 2013 is the twelfth workshop of a series that serves as a forum for all researchers and practitioners interested in the use of visual notations (especially graph-based), techniques and tools for the specification, modeling, validation, manipulation and verification of complex systems. The aim of the workshop is to promote engineering approaches that provide effective sound tool support for visual modeling languages, enhancing formal reasoning at the syntactic as well as semantic level (e.g., for model specification, model analysis, model transformation, and model consistency management) in different domains, such as UML, Petri Nets, Graph Transformation or Business Process/Workflow Models.

This year’s workshop has a special theme of the analysis of non-functional / extra-functional / quality properties like performance, real-time, safety, reliability, energy consumption. We particularly encourage submissions that focus on the definition and the evaluation of such properties using visual/graph specification techniques, ranging from underlying theory through to their utility in complex system design.

As a summary, topics relevant to the scope of the workshop include (but are not restricted to) the following:

  • visual languages definition and syntax (incl. meta-modelling, grammars and graphical parsing);
  • static and dynamic semantics of visual languages (incl. OCL, graph constraints, simulation, animation, compilation);
  • visual/graph-based analysis in software engineering (incl. testing, verification & validation, static & dynamic analysis techniques);
  • visual/graph constraints (incl. definition, expressiveness, analysis techniques involving constraints);
  • model transformations and their application in model-driven development (incl. in particular, transformations between graphical and textual formalisms);
  • visual modeling techniques and graph transformation applied to patterns;
  • visual modeling techniques and graph transformations for systems with quality properties like performance, real-time, safety, reliability, energy consumption;
  • case studies and novel application areas (e.g. within engineering, biology, etc);
  • tool support and efficient algorithms.

Did I forget to mention the workshop will be held in Rome, Italy? 😉 (March is a great time to be in Rome.)

Towards a Universal SMILES representation…

Filed under: Cheminformatics — Patrick Durusau @ 4:25 pm

Towards a Universal SMILES representation – A standard method to generate canonical SMILES based on the InChI by Noel M O’Boyle. Journal of Cheminformatics 2012, 4:22 doi:10.1186/1758-2946-4-22

Abstract:

Background

There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures, while SMILES strings are widely used for storage and interchange of chemical structures, but no standard exists to generate a canonical SMILES string.

Results

I describe how to use the InChI canonicalisation to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalisations (Inchified SMILES) or not (Universal SMILES). This is the first description of a method to generate canonical SMILES that takes stereochemistry into account. When tested on the 1.1 m compounds in the ChEMBL database, and a 1 m compound subset of the PubChem Substance database, no canonicalisation failures were found with Inchified SMILES. Using Universal SMILES, 99.79% of the ChEMBL database was canonicalised successfully and 99.77% of the PubChem subset.

Conclusions

The InChI canonicalisation algorithm can successfully be used as the basis for a common standard for canonical SMILES. While challenges remain — such as the development of a standard aromatic model for SMILES — the ability to create the same SMILES using different toolkits will mean that for the first time it will be possible to easily compare the chemical models used by different toolkits.

Noel notes much work remains to be done but being able to reliably compare the output of different toolkits sounds like a step in the right direction.
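To see the underlying issue concretely, here is a sketch using RDKit; it illustrates canonicalisation in general, not the paper’s Universal SMILES procedure:

```python
# Three different SMILES spellings of ethanol all map to one canonical
# SMILES and one InChI. (Illustrates canonicalisation generally, not the
# paper's method; requires an RDKit build with InChI support.)
from rdkit import Chem

for s in ["OCC", "C(O)C", "CCO"]:
    mol = Chem.MolFromSmiles(s)
    print(s, "->", Chem.MolToSmiles(mol), Chem.MolToInchi(mol))
# All three lines print the same canonical SMILES and the same InChI,
# which is exactly the guarantee a canonicalisation scheme must provide.
```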

Process group in erlang: some thoughts about the pg module

Filed under: Distributed Systems,Erlang — Patrick Durusau @ 10:58 am

Process group in erlang: some thoughts about the pg module by Paolo D’Incau.

From the post:

One of the most common ways to achieve fault tolerance in distributed systems, consists in organizing several identical processes into a group, that can be accessed by a common name. The key concept here is that whenever a message is sent to the group, all members of the group receive it. This is a really nice feature, since if one process in the group fails, some other process can take over for it and handle the message, doing all the operations required.

Process groups also allow abstraction: when we send a message to a group, we don’t need to know who the members are or where they are. In fact, process groups are anything but static. Any process can join an existing group or leave one at runtime; moreover, a process can be part of more than one group at the same time.
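Erlang’s pg module implements this across nodes; as a language-neutral sketch of the broadcast-to-a-named-group idea (threads standing in for processes):

```python
# Toy version of a process group: every member mailbox receives each
# message sent to the group name. Threads stand in for Erlang processes.
import queue
import threading

groups = {}  # group name -> list of member mailboxes

def join(group, mailbox):
    groups.setdefault(group, []).append(mailbox)

def send(group, msg):
    for mailbox in groups.get(group, []):  # all members receive it
        mailbox.put(msg)

def worker(name, mailbox):
    msg = mailbox.get()
    print("%s handled %r" % (name, msg))

for i in range(3):
    mb = queue.Queue()
    join("db_writers", mb)
    threading.Thread(target=worker, args=("worker-%d" % i, mb)).start()

send("db_writers", {"op": "insert", "row": 42})  # all three receive it
```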

Fault tolerance is going to be an issue if you are using topic maps and/or social media in an operational context.

Having really “cool” semantic capabilities isn’t worth much if the system fails at a critical point.

Analyzing Twitter Data with Hadoop [Hiding in a Public Data Stream]

Filed under: Cloudera,Flume,Hadoop,HDFS,Hive,Oozie,Tweets — Patrick Durusau @ 10:46 am

Analyzing Twitter Data with Hadoop by Jon Natkins

From the post:

Social media has gained immense popularity with marketing teams, and Twitter is an effective tool for a company to get people excited about its products. Twitter makes it easy to engage users and communicate directly with them, and in turn, users can provide word-of-mouth marketing for companies by discussing the products. Given limited resources, and knowing we may not be able to talk to everyone we want to target directly, marketing departments can be more efficient by being selective about whom we reach out to.

In this post, we’ll learn how we can use Apache Flume, Apache HDFS, Apache Oozie, and Apache Hive to design an end-to-end data pipeline that will enable us to analyze Twitter data. This will be the first post in a series. The posts to follow will describe, in more depth, how each component is involved and how the custom code operates. All the code and instructions necessary to reproduce this pipeline are available on the Cloudera Github.

Looking forward to more posts in this series!

Social media is a focus for marketing teams for obvious reasons.

Analysis of snaps, crackles and pops en masse.

What if you wanted to communicate securely with others using social media?

Thinking of something more robust and larger than two (or three) lovers agreeing on code words.

How would you hide in a public data stream?

Or the converse, how would you hunt for someone in a public data stream?

How would you use topic maps to manage the semantic side of such a process?

September 18, 2012

Topic Map Cheat Sheets?

Filed under: Authoring Topic Maps,Topic Maps — Patrick Durusau @ 9:17 pm

I have run across several collections of cheat sheets recently.

Would it be helpful to have “cheat sheets” for topic maps?

And if so, would it be more helpful to have “cheat sheets” that were subject specific?

I’m thinking of subjects I helped identify for a map involving chemicals. We used a standard set of identifiers, plus alternate identifiers as well. Some might be commonly known, others possibly not.

Thoughts? Suggestions? Volunteers?

Mind maps just begging for RDF triples…. [human understanding = computer interpretation?]

Filed under: Mind Maps,RDF — Patrick Durusau @ 9:05 pm

Mind maps just begging for RDF triples and formal models by Kerstin Forsberg.

From the post:

Earlier this week the CDISC English Speaking User Group (ESUG) Committee arranged a webinar: “CDISC SHARE – How SHARE is developing as a project/standard” with Simon Bishop, Standards and Operations Director, GSK. I did find the comprehensive presentation from Simon, and his colleague Diana Wold, very interesting.

The presentation is interesting as it exemplifies, in an excellent way, how “Current standards (company standards, SDTM standards, other standards) do not currently deliver the capability we require.” I also find it interesting as it exemplifies mind maps as a way forward: “Diagrams help us understand clinical processes and how this translates into datasets and variables.” (Quotes from slide 20 in the presentation: Conclusions.)

Below are a couple of examples of mind maps from the presentation, and also the background to my thinking that they are mind maps just begging for RDF triples and formal models of the clinical and biomedical reality to make them fully ready “both for human understanding and for computer interpretation.”

Interesting post but the:

both for human understanding and for computer interpretation

is what caught my attention.

Always a good thing to improve the ability of computers to find things for us. To the extent RDF can do that, great!

But human understanding is far deeper and more complex than any computer, by RDF or other means, can achieve.

I think we need to keep the distinction between human understanding and computer interpretation firmly in mind.
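For concreteness, here is one mind-map edge rendered as an RDF triple with rdflib; the namespace and term names are hypothetical:

```python
# A single mind-map edge ("study design" has part "treatment arm")
# expressed as an RDF triple. Namespace and terms are hypothetical.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/clinical#")
g = Graph()
g.add((EX.StudyDesign, EX.hasPart, EX.TreatmentArm))

print(g.serialize(format="turtle"))
```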

I first saw this at the SemanticWeb.com.

Designing Data Apps with R at Periscopic [No maps?]

Filed under: Graphics,R,Visualization — Patrick Durusau @ 8:31 pm

Designing Data Apps with R at Periscopic by Andrew Winterman.

From the post:

The Hewlett Foundation contacted us a few months ago because they were interested in exploring ways to visualize the distribution and impact of their grantmaking efforts over the last ten years. They hoped to make a tool with three functions: It would provide insight into where the Foundation has made the largest impact; provide grant seekers context for their applications; and help the Foundation’s officers make decisions about new grantmaking efforts, based on their existing portfolio. They had one request: No maps.

The data arrived, as it so often does, in the rough: An Excel document compiled quickly, by hand, with the primary goal of providing an overview, rather than complete accuracy. At this point in the process, we paint with broad brushes. We learn the data’s characteristics, determine which facets are interesting, and prototype visualization ideas.

At the beginning of a project, I always explore a few simple visualization techniques to get a feel for the data. For example, simple bar charts as shown in Figure 1, scatter plots, and choropleths, are great ways to get a visual sense of what the data is saying.

I was surprised at the request for “no maps” but after you think about it for a minute, it probably encouraged visual exploration of the data.

Do you experiment with visualizations of data before you start designing the final deliverable?
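Periscopic works in R, but the habit translates anywhere; here is a rough Python sketch of the same first-look workflow on invented grant data:

```python
# First-look plots on invented grant data (the post uses R; this is an
# analogous sketch with matplotlib).
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(1)
years = np.arange(2002, 2012)
totals = rng.gamma(shape=2.0, scale=5.0, size=len(years))  # $M per year
sizes = rng.lognormal(mean=0.0, sigma=1.0, size=200)       # grant sizes

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(years, totals)              # broad brush: totals over time
ax1.set(title="Grants per year ($M)", xlabel="Year")
ax2.hist(sizes, bins=30)            # broad brush: size distribution
ax2.set(title="Grant sizes ($M)", xlabel="Size")
plt.tight_layout()
plt.show()
```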

A Model of Consumer Search Behaviour

Filed under: Search Behavior,Search Interface,Searching — Patrick Durusau @ 8:08 pm

A Model of Consumer Search Behaviour by Tony Russell-Rose.

From the post:

A couple of weeks ago I posted the slides to my talk at EuroHCIR on A Model of Consumer Search Behaviour. Finally, as promised, here is the associated paper, which is co-authored with Stephann Makri (and also available as a pdf in the proceedings). I hope it addresses the questions that the slide deck provoked, and provides further food for thought 🙂

ABSTRACT

In order to design better search experiences, we need to understand the complexities of human information-seeking behaviour. In previous work [13], we proposed a model of information behavior based on an analysis of the information needs of knowledge workers within an enterprise search context. In this paper, we extend this work to the site search context, examining the needs and behaviours of users of consumer-oriented websites and search applications.

We found that site search users presented significantly different information needs to those of enterprise search, implying some key differences in the information behaviours required to satisfy those needs. In particular, the site search users focused more on simple “lookup” activities, contrasting with the more complex, problem-solving behaviours associated with enterprise search. We also found repeating patterns or ‘chains’ of search behaviour in the site search context, but in contrast to the previous study these were shorter and less complex. These patterns can be used as a framework for understanding information seeking behaviour that can be adopted by other researchers who want to take a ‘needs first’ approach to understanding information behaviour.

Take the time to read the paper.

How would you test the results?

Placeholder: probably beyond the bounds of the topic maps course, but a guest lecture on designing UI tests could be very useful for library students. They will be selecting interfaces to be used by patrons, and knowing how to test candidate interfaces could be valuable.

September 17, 2012

Statistical Data Mining Tutorials

Filed under: Data Mining,Statistics — Patrick Durusau @ 6:25 pm

Statistical Data Mining Tutorials by Andrew Moore.

From the post:

The following links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.

These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.

Perhaps a bit dated but not seriously so.

And one never knows when a slightly different explanation will make something obscure suddenly clear.
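For instance, the k-means clustering the tutorials cover reduces to a few lines today (scikit-learn postdates the tutorials; this just shows the algorithm in action):

```python
# k-means on two obvious blobs: the fitted centers land near the blob
# means. scikit-learn is used for brevity; the tutorials predate it.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
points = np.vstack([rng.normal(loc=0.0, size=(50, 2)),
                    rng.normal(loc=5.0, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.cluster_centers_)  # approximately (0, 0) and (5, 5)
```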

Probability and Statistics Cookbook

Filed under: Mathematics,Probability,Statistics — Patrick Durusau @ 6:16 pm

Probability and Statistics Cookbook by Matthias Vallentin.

From the webpage:

The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations.

A very summary presentation, so better suited as a quick-reminder resource.

I was particularly impressed by the univariate distribution relationships map on the last page.

In that regard, you may want to look at John D. Cook’s Diagram of distribution relationships and the links therein.
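Two classic examples of the relationships such a map encodes:

```latex
% A sum of iid exponentials is Gamma (Erlang), and the Binomial
% standardises to the Normal in the limit (de Moivre-Laplace).
X_1,\dots,X_n \overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda)
  \;\Longrightarrow\; \sum_{i=1}^{n} X_i \sim \operatorname{Gamma}(n,\lambda)

X \sim \operatorname{Binomial}(n,p)
  \;\Longrightarrow\; \frac{X-np}{\sqrt{np(1-p)}} \xrightarrow{\;d\;} \mathcal{N}(0,1)
  \quad \text{as } n \to \infty
```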

Make cool images with emergent algorithm

Filed under: Graphics,Visualization — Patrick Durusau @ 6:05 pm

Make cool images with emergent algorithm by Nathan Yau.

Nathan finds very cool visualizations on a regular basis.

Visit Nathan’s site and post here how you would use this technique with a topic map.

What of subjects that arise from social interaction? Marriage, birth?

Are those best modeled as static subjects or ones that “emerge” from social interaction?

Trees – A Primer

Filed under: Programming,Trees — Patrick Durusau @ 4:51 pm

Trees – A Primer by Jeremy Kun.

From the post:

This post comes in preparation for a post on decision trees (a specific type of tree used for classification in machine learning). While most mathematicians and programmers are familiar with trees, we have yet to discuss them on this blog. For completeness, we’ll give a brief overview of the terminology and constructions associated with trees, and describe a few common algorithms on trees. We will assume the reader has read our first primer on graph theory, which is a light assumption. Furthermore, we will use the terms node and vertex interchangeably, as mathematicians use the latter and computer scientists the former.

A nice introduction/refresher on tree data structures.
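The primer’s core vocabulary fits in a few lines of Python (a sketch of my own, not Jeremy’s code):

```python
# A node, its children, and a depth-first traversal: the basic tree
# vocabulary from the primer. A leaf is just a node with no children.
class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

    def add(self, value):
        child = Node(value)
        self.children.append(child)
        return child

def depth_first(node, depth=0):
    print("  " * depth + str(node.value))
    for child in node.children:
        depth_first(child, depth + 1)

root = Node("root")
left = root.add("left")
root.add("right")
left.add("leaf")
depth_first(root)   # indentation shows each node's depth
```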

U.S. Sequestration Report – Out of the Shadows/Into the Light?

Filed under: Government,Government Data,Topic Maps — Patrick Durusau @ 10:25 am

Due to personalities, pettiness and partisan politics too boring to recount, the U.S. budget is about to be automatically cut (sequestered). To accomplish that goal, the OMB Report Pursuant to the Sequestration Transparency Act of 2012 (P. L. 112–155) has been released. I first saw this report in Obama releases sequestration report by Amber Corrin (Federal Computer Weekly).

Can it be that U.S. government spending has stepped out of the shadows and into the light?

The report identifies specific programs and the proposed cuts to each one.

As you can imagine, howls of “dire consequences” are issuing from agencies, grantees, elected officials and of course, government staff.

Some of which are probably true. Some of them.

Does the sequestration report give us an opportunity to determine which claims of “dire consequences” are true and which ones are false?

Let’s take an easy one:

001-05-0127 Sergeant at Arms and Doorkeeper of the Senate

Present         Cut            Remaining
$131 million    $11 million    $120 million

Can you name (identify) a specific “dire consequence” to reducing the “Sergeant at Arms and Doorkeeper of the Senate” budget by 11 $Million?

Want to represent the public interest? Ask your elected representatives to say what “dire consequences” they see from specific items in the sequestration report.

Do not accept hand waving generalities of purported “dire consequences.”

To qualify as a possible “dire consequence,” it should at least be identified by name. Such as: “Cut X means we can’t run metal scanners at government buildings.” Or “Cut 001-05-0130 means we can’t afford refreshments for Senate offices. (Yes, it’s really in there.)”

That would enable a meaningful debate over “dire consequences.”

Part of that debate should be around who claims “dire consequences” and what “dire consequences” are being claimed.

Can you capture that without using a topic map?

Identities and Identifications: Politicized Uses of Collective Identities

Filed under: Identification,Identifiers,Identity — Patrick Durusau @ 3:56 am

Identities and Identifications: Politicized Uses of Collective Identities

Deadline for Panels 15 January 2013
Deadline for Papers 1 March 2013
Conference 18-20 April 2013, Zagreb, Croatia

From the call for panels and papers:

Identity is one of the crown jewelleries in the kingdom of ‘contested concepts’. The idea of identity is conceived to provide some unity and recognition while it also exists by separation and differentiation. Few concepts were used as much as identity for contradictory purposes. From the fragile individual identities as self-solidifying frameworks to layered in-group identifications in families, orders, organizations, religions, ethnic groups, regions, nation-states, supra-national entities or any other social entities, the idea of identity always shows up in the core of debates and makes everything either too dangerously simple or too complicated. Constructivist and de-constructivist strategies have led to the same result: the eternal return of the topic. Some say we should drop the concept, some say we should keep it and refine it, some say we should look at it in a dynamic fashion while some say it’s the reason for resistance to change.

If identities are socially constructed and not genuine formations, they still hold some responsibility for inclusion/exclusion – self/other nexuses. Looking at identities in a research oriented manner provides explanatory tools for a wide variety of events and social dynamics. Identities reflect the complex nature of human societies and generate reasonable comprehension for processes that cannot be explained by tracing pure rational driven pursuit of interests. The feelings of attachment, belonging, recognition, the processes of values’ formation and norms integration, the logics of appropriateness generated in social organizations are all factors relying on a certain type of identity or identification. Multiple identifications overlap, interact, include or exclude, conflict or enhance cooperation. Identities create boundaries and borders; define the in-group and the out-group, the similar and the excluded, the friend and the threatening, the insider and the ‘other’.

Beyond their dynamic fuzzy nature that escapes exhaustive explanations, identities are effective instruments of politicization of social life. The construction of social forms of organization and of specific social practices together with their imaginary significations requires all the time an essentialist or non-essentialist legitimating act of belonging; a social glue that extracts its cohesive function from the identification of the in-group and the power of naming the other. Identities are political. Multicultural slogans populate extensively the twenty-first century yet the distance between the ideal and the real multiculturalism persists while the virtues of inclusion coexist with the adversity of exclusion. Dealing with the identities means to integrate contestation into contestation until potentially an nth degree of contestation. Due to the confusion between identities and identifications some scholars demanded that the concept of identity shall be abandoned. Identitarian issues turned out to be efficient tools for politicization of a ‘constraining dissensus’ while universalizing terms included in the making of the identities usually tend or intend to obscure the localized origins of any identitarian project. Identities are often conceptually used as rather intentional concepts: they don’t say anything about their sphere but rather defining the sphere makes explicit the aim of their usage. It is not ‘identity of’ but ‘identity to’.

Quick! Someone get them a URL! 😉 Just teasing.

Enjoy the conference!

September 16, 2012

Get More Out Of Google

Filed under: Searching — Patrick Durusau @ 7:04 pm

Get More Out Of Google

I saw this in the Sunday Data/Statistics Link Roundup (9/9/12).

You probably won’t find it useful but may know someone who will.

Sketching User Experiences: The Workbook

Filed under: Design,Graphics,Interface Research/Design,Visualization — Patrick Durusau @ 4:42 pm

Sketching User Experiences: The Workbook by Saul Greenberg, Sheelagh Carpendale, Nicolai Marquardt, and Bill Buxton.

Description:

In Sketching User Experiences: The Workbook, you will learn, through step-by-step instructions and exercises, various sketching methods that will let you express your design ideas about user experiences across time. Collectively, these methods will be your sketching repertoire: a toolkit where you can choose the method most appropriate for developing your ideas, which will help you cultivate a culture of experience-based design and critique in your workplace.

  • Features standalone modules detailing methods and exercises for practitioners who want to learn and develop their sketching skills
  • Extremely practical, with illustrated examples detailing all steps on how to do a method
  • Excellent for individual learning, for classrooms, and for a team that wants to develop a culture of design practice
  • Perfect complement to Buxton’s Sketching User Experience or any UX text

My first time to encounter this book.

Comments/suggestions?

Similar materials?

Interfaces are as much about mapping as anything we do inside topic maps.

Which implies the ability to map from “your” interface to one I find more congenial, doesn’t it?

New Army Guide to Open-Source Intelligence

Filed under: Intelligence,Open Data,Open Source,Public Data — Patrick Durusau @ 4:06 pm

New Army Guide to Open-Source Intelligence

If you don’t know Full Text Reports, you should.

A top-tier research professional’s hand-picked selection of documents from academe, corporations, government agencies, interest groups, NGOs, professional societies, research institutes, think tanks, trade associations, and more.

You will winnow some chaff but also find jewels like Open Source Intelligence (PDF).

From the post:

  • Provides fundamental principles and terminology for Army units that conduct OSINT exploitation.
  • Discusses tactics, techniques, and procedures (TTP) for Army units that conduct OSINT exploitation.
  • Provides a catalyst for renewing and emphasizing Army awareness of the value of publicly available information and open sources.
  • Establishes a common understanding of OSINT.
  • Develops systematic approaches to plan, prepare, collect, and produce intelligence from publicly available information from open sources.

Impressive intelligence overview materials.

It would be nice to re-work this into a topic map intelligence approach document, with the ability to insert a client’s name and industry-specific examples. It has that militaristic tone that is hard to capture with civilian writers.

Supreme Court Database–Updated [US]

Filed under: Law,Law - Sources,Legal Informatics — Patrick Durusau @ 1:30 pm

Supreme Court Database–Updated

Michael Heise writes:

An exceptionally helpful source of data for those interested in US Supreme Court decisions was recently updated to include data from OT2011. The Supreme Court Database (2012 release, v.01, here) “contains over two hundred pieces of information about each case decided by the Court between the 19[46] and 20[11] terms. Examples include the identity of the court whose decision the Supreme Court reviewed, the parties to the suit, the legal provisions considered in the case, and the votes of the Justices.” An online codebook for this leading compilation of Supreme Court decisions (particularly for political scientists) can be found here.

The Supreme Court Database sponsors this dataset, along with tools for analysis and training materials to assist you with both.

Very useful for combining with other data and analysis, ranging from political science and history to more traditional legal approaches.

