Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 7, 2011

Haskell – Typeclassopedia

Filed under: Computer Science,Examples — Patrick Durusau @ 7:01 am

Typeclassopedia appears in The Monad.Reader Issue 13

I was looking for Calculating Monads with Category Theory by Derek Elkins (in this issue of the Monad.Reader) when I ran across The Typeclassopedia by Brent Yorgey.

From the abstract:

The standard Haskell libraries feature a number of type classes with algebraic or category-theoretic underpinnings. Becoming a fluent Haskell hacker requires intimate familiarity with them all, yet acquiring this familiarity often involves combing through a mountain of tutorials, blog posts, mailing list archives, and IRC logs.

The goal of this article is to serve as a starting point for the student of Haskell wishing to gain a firm grasp of its standard type classes. The essentials of each type class are introduced, with examples, commentary, and extensive references for further reading.

Doesn’t combing through a mountain of tutorials, blog posts, mailing list archives, and IRC logs just cry topic map?

I will be using this article as a jumping-off point for exploring two questions: what would an interface for authoring a topic map about Haskell look like, and what would an interface for using a topic map about Haskell look like?

Quite serious about this being an exploration because I don’t think there is a one size fits all authoring or using/viewing interface.

Your thoughts, suggestions, comments, etc. are most welcome.

First step: I am going to start mapping out this article and not worry about other sources of information. Want to start from a known source and then incorporate other sources.

February 6, 2011

“Astonish Us” – Post

Filed under: Marketing,Topic Maps — Patrick Durusau @ 7:08 am

Astonish Us has to be one of the more persuasive pieces of any genre I have read in a very long time.

The advice you find there is applicable to funding for topic maps research, marketing topic maps to potential customers, promoting topic maps among fellow researchers, in short, every aspect of spreading the word about topic maps.

Go read the post and then come back here (using the back button) to post your comments: what astonishes you about a topic map on some particular subject? How do you want to convey that astonishment to others?

Redis Command Page – Topic Map Improvements?

Filed under: Examples,Redis — Patrick Durusau @ 6:38 am

Via Alex Popescu Redis: One Page Command References, an alternative to the Redis command page with Redis One Page command listing.

Take a look at both and then come back to this post.

Do you notice anything odd about the command information from Redis? Redis command page

While I appreciate the Time complexity information, that seems like a misnomer for most of the information present. The content is mostly an explanation of the command.

One reason I mention it is that I am thinking a topic map of the commands could certainly treat time complexity as a subject and therefore present all commands of a given time complexity. That could be useful in terms of planning a series of commands.
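A minimal sketch of what treating time complexity as a subject could look like, assuming a small hand-entered table of commands and complexities. The table, its coverage, and the function name `group_by_complexity` are illustrative assumptions, not something taken from either command page or from any topic map software.

```python
from collections import defaultdict

# Hand-entered sample for illustration only; check the Redis command
# reference for the authoritative complexity of each command.
COMMANDS = {
    "GET": "O(1)",
    "SET": "O(1)",
    "LLEN": "O(1)",
    "KEYS": "O(N)",
    "SMEMBERS": "O(N)",
}


def group_by_complexity(commands):
    """Invert command -> complexity into complexity -> sorted command names."""
    groups = defaultdict(list)
    for name, complexity in commands.items():
        groups[complexity].append(name)
    return {complexity: sorted(names) for complexity, names in groups.items()}


if __name__ == "__main__":
    for complexity, names in sorted(group_by_complexity(COMMANDS).items()):
        print(complexity, ", ".join(names))
```

Each complexity class plays the role of a topic here, with the commands as its players. A real topic map would also carry the explanatory text and examples as occurrences.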

Another subject would be the examples. How many of them are shared and by which commands?

Will have to push it back and forth this week to see what develops.

Interface suggestions? It occurs to me that the traditional Unix man page layout, albeit with enhanced information that could be text/hyperlink bound, such as “example -> occurs in -> here be all the commands where it occurs”, might be a useful one.

Just enough addition to a traditional interface to make the additional information from a topic map available.

(The traditional topic map interface qualifies for the “an ill-favoured thing sir, but mine own” comment. I think we can and should do better.)

February 5, 2011

InfiniteGraph 1.1 Release!

Filed under: Graphs,InfiniteGraph,Software — Patrick Durusau @ 11:12 am

InfiniteGraph 1.1 Release!

From the website:

InfiniteGraph 1.1, the distributed graph database, was released today with a new indexing framework that gives users greater performance on indexing, data ingest and lookups. The improvements will help developers more quickly develop and deploy with InfiniteGraph, to process larger graph datasets and collections.

How much faster is this version? We’ve seen 100x faster performance in some scenarios, such as processing multiple indexed fields with large index sizes.

….

(general description of InfiniteGraph)

InfiniteGraph is a distributed, scalable graph database and developer API which enables large-scale graph processing, data analytics and discovery in systems and services developed around social networking, business intelligence, scientific research, national security and other advanced, mission critical requirements. InfiniteGraph offers a unique, graph database solution based on a highly-scalable, distributed data persistence technology that has been deployed in some of the most advanced and mission-critical enterprise and government systems in operation today. Organizations can use this solution to discover complex relationships in their data and develop applications with significant time-to-market advantages and technical cost savings.

On my short list of graph databases to evaluate in 2011 in connection with topic maps.

Not to mention also being folks I need to evangelize about topic maps.

Comments or suggestions on both of those tasks welcome!

Mapping Wikileaks’ Cablegate topics using Python, MongoDB, Neo4j and Gephi

Filed under: Gephi,MongoDB,Neo4j — Patrick Durusau @ 7:55 am

Mapping Wikileaks’ Cablegate topics using Python, MongoDB, Neo4j and Gephi

Data and slides and movies while the conference is ongoing! Oh My!

This is the sort of effort that topic maps needs to step up to and compete against.

I have some thoughts on what that would take with the Afghan war diaries that I will be posting later today.

Subject (defined)

Filed under: Subject Identity — Patrick Durusau @ 6:13 am

Just in case you were thinking that ISO has a handle on the definition of subject in ISO/IEC 13250:

  • subject
    • person in whose ear canal the hearing aid performance is being characterized (ISO 12124:2001)
    • in the most generic sense, a “subject” is any thing whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever (ISO/IEC 13250:2003)
    • Any concept or combination of concepts representing a theme in a document. (ISO 5963:1985)
    • an entity within the TSC that causes operations to be performed. (ISO/IEC 15408-1:2005)
    • anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever (ISO/IEC 13250-2:2006)
    • particular information item which corresponds to the object of interest of the natural-language assertions and typically is matched by the context expression of a rule (ISO/IEC 19757-3:2006, yes, the DSDL standard)
    • entity whose public key is certified in a public key certificate (ISO 15782-2:2001)
    • condition under which two or more entities separately have key fragments which, individually, convey no knowledge of the resultant cryptographic key entity whose public key is certified in a public key certificate [split knowledge subject] (ISO 15782-1:2003)
    • individual who participates in a clinical investigation, either as a recipient of the device under investigation or as a control (ISO 14155-1:2003)
    • end-user whose biometric data is intended to be enrolled or compared (ISO/IEC 24713-1:2008)
    • entity whose public key is certified in the certificate (ISO/TS 21091:2005)
    • entity whose public key is certified in a public key certificate (ISO 21188:2006)
    • entity whose public key is certified in a public key certificate (ISO 15782-1:2009)
    • active entity in the TOE that performs operations on objects (ISO/IEC 15408-1:2009)

Questions:

  1. How would you distinguish these uses of subject in a topic map?
  2. How would these uses impact searching across texts?
  3. What if anything would you suggest to minimize the impact of these definitions on searching?

This website, The ISO Concept Database (ISO/CDB), apparently powered by Apache CentOS, is a good example of inappropriate use of open source software.

Perform a search, then go to an item returned by that search, then choose Back to previous search. The application will fail. Close the tab. Then try again from the homepage. That is where you get the CentOS pages.

I said inappropriate, perhaps the better term is poor. It reflects badly on open source software to have it poorly used.

ISO Concepts Database

Filed under: Semantic Diversity — Patrick Durusau @ 6:10 am

ISO Concepts Database

The ISO Concepts Database has appeared online and will give us a window into semantic diversity at ISO.

As soon as the site stops crashing, I will be posting a report about the term subject. There are thirteen different definitions for that term.

February 4, 2011

TopicView

Filed under: Conferences,Examples,Marketing,Topic Maps — Patrick Durusau @ 3:04 pm

TopicView

TopicView is a project by Morpheus on behalf of the Amsterdam police to bridge the practical and semantic boundaries between their information systems.

That is to say it is a solution that allows existing systems to remain in place, but creating bridges between them to enable the police to make more effective use of the information they do have and to share information across systems.

Do be aware that I used Google’s translate feature to read the homepage of this project so some of my appreciation of it is based on surmises based on my knowledge of topic maps.

I did stumble in some places, such as where the translation reports: Bandages stay hidden for Verbanden blijven verborgen.*

Perhaps fuller information will appear in the future.
*****
*I suspect way off base but since it is a police topic map, I would assume that sources of information can remain hidden, even as the information they provide is shown.

Artificial Intelligence | Natural Language Processing (Topic Maps by Problem Solving)

Filed under: Natural Language Processing,Topic Maps — Patrick Durusau @ 10:27 am

Artificial Intelligence | Natural Language Processing Stanford course with Christopher D. Manning.

From the website:

This course is designed to introduce students to the fundamental concepts and ideas in natural language processing (NLP), and to get them up to speed with current research in the area. It develops an in-depth understanding of both the algorithms available for the processing of linguistic information and the underlying computational properties of natural languages. Word-level, syntactic, and semantic processing from both a linguistic and an algorithmic perspective are considered. The focus is on modern quantitative techniques in NLP: using large corpora, statistical models for acquisition, disambiguation, and parsing. Also, it examines and constructs representative systems.

Only the lecture notes, quizzes, etc. are available. Update: 29 April 2011 – Lecture notes, quizzes, and videos of the lectures are online.

Still, quite an interesting resource.

I am particularly interested in Manning’s approach of not building the class around an edifice to be mastered but rather around problems to be solved.

As primarily a theorist that is rather disturbing but at the same time, it is strangely attractive.

Wondering what a topic map class would look like that started with two or even three related but distinct data sets?

The sort of data sets that lead to topic maps, so the class could walk through what problems we want to solve and unfold topic maps along the way.

Would be an opportunity to use other software, indexing software for example, to see how they compare with topic maps or can be used in their construction.

Thoughts, suggestions, comments?

The Best Machine Learning Course on the Web

Filed under: Machine Learning — Patrick Durusau @ 9:06 am

The Best Machine Learning Course on the Web by Shubhendu Trivedi.

A bit dated (2008) and reported by Trivedi to be a very broad survey type course.

Can anyone suggest anything more recent, of equal quality?

Erlang Factory – SF Bay Area 2010
2011 Coming Up!

Filed under: Conferences,CS Lectures,Erlang — Patrick Durusau @ 8:51 am

Erlang Factory – SF Bay Area 2010

From the website:

The Erlang Factory SFBay Area was a resounding success with 34 speakers delivering talks in three tracks to an audience of over 120! The event was held at the San Francisco Airport Hilton and proved to be the largest Erlang event in the US so far, overtaking last year’s despite the continuing effects of the downturn in the marketplace.

There were delegates and speakers from Argentina, Brazil, Israel, Japan, Canada, Sweden, Denmark, Germany, France, and Italy as well as from all parts of the US. This resulted in a very stimulating environment where Erlang was discussed and user’s experiences compared.

The long term and successful use of Erlang in telecommunications makes me suspect that it has a lot to offer designers of distributed topic map systems.

The presentations and slides from the 2010 conference are available for your viewing.

The Erlang Factory – SF Bay Area 2011 conference is coming up, 21-25 March 2011.

Please post a note if you are working on topic maps using Erlang. Thanks!

Unix – The Hole Hawg

Filed under: Humor,Marketing — Patrick Durusau @ 5:29 am

Unix – The Hole Hawg by Neal Stephenson.

I assume everyone on the Net has seen and enjoyed this item but it’s Friday and I could not resist repeating it.

I particularly liked the line: “….when I got ready to use the Hole Hawg my heart actually began to pound with atavistic terror.”

Don’t know that I want people to view using topic maps with atavistic terror but I would not mind if the people topic maps were being used against had that feeling. 😉

Nothing markets a product better than fear that someone else has a better version (and is about to do something to you with the better version).

Admittedly I don’t have or know of a demonstration of topic maps that would strike fear in anyone, but I suspect that is only a matter of time.

Use The Index, Luke

Filed under: Indexing — Patrick Durusau @ 5:11 am

Use The Index, Luke (A Guide to SQL Performance by Markus Winand) is an interesting site devoted to improving the use of B-tree indexes in relational databases.

Improving the uses of indexes in general is a good idea and given the use of relational databases to persist topic maps, it seemed appropriate to mention it here.

It would be interesting to see comparisons of the uses of B-tree and other indexing structures for a known topic map.
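As a starting point for such a comparison, here is a minimal sketch using Python's standard `sqlite3` module, which stores ordinary indexes as B-trees. The one-table layout (`topic_names`), the index name, and the helper `plan_for_name_lookup` are hypothetical and far simpler than any real topic map store. The sketch only shows how to ask the database whether a lookup uses the index.

```python
import sqlite3


def plan_for_name_lookup(use_index):
    """Return SQLite's query plan text for a lookup on topic names."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE topic_names (topic_id INTEGER, value TEXT)")
    conn.executemany(
        "INSERT INTO topic_names VALUES (?, ?)",
        [(i, "name-%d" % i) for i in range(1000)],
    )
    if use_index:
        conn.execute("CREATE INDEX idx_names_value ON topic_names (value)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT topic_id FROM topic_names WHERE value = ?",
        ("name-42",),
    ).fetchall()
    conn.close()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("without index:", plan_for_name_lookup(False))
    print("with index:   ", plan_for_name_lookup(True))
```

Without the index the plan reports a full table scan, and with it the plan names the index. Timing the two against a known topic map would be the next step.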

Personally I suspect that the amount of local memory would make more of an impact than any algorithm but that would depend on whether access to stored topic maps is being measured or merging of topic maps. That is another open research question.

If this resource helps with planning persistence of your topic maps or if you have other comments about indexing and/or this resource, please post them.

TweetDeck and Topic Maps

Filed under: Authoring Topic Maps,Marketing — Patrick Durusau @ 4:51 am

If you don’t know TweetDeck.com you need to slide by to take a look.

As an admittedly slow and still uncertain adopter of all this social software, I would appreciate any feedback you have on this or other alternatives.

But, onto the topic map relevant part of this post!

I noticed that TweetDeck 0.37 has a feature: Hide repeated retweets.

I think they should go one better than that and scan tweets for the same shortened URL and offer an option to display what we would call a topic with multiple occurrences.

That is, there would be the one shortened URL, which you could follow if you like, with occurrences under that one tweet that list all the various tweets that contain it.

Would certainly shorten up my tweet windows in TweetDeck a good bit. Most of the repeats aren’t marked as retweets so the software isn’t catching them.

Now, if TweetDeck or equivalent software wanted to be really clever, they could make associations with the senders of those tweets so I could see a list of all the users who sent that resource.
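A minimal sketch of the idea in Python, assuming tweets arrive as `(sender, text)` pairs. The function name `group_by_url`, the sample data, and the simple URL pattern are my own illustrative assumptions and have nothing to do with how TweetDeck actually works.

```python
import re
from collections import defaultdict

# Deliberately simple pattern; real tweet parsing needs more care.
URL_PATTERN = re.compile(r"https?://\S+")


def group_by_url(tweets):
    """Map each URL to the tweets (occurrences) and senders that mention it."""
    topics = defaultdict(lambda: {"occurrences": [], "senders": set()})
    for sender, text in tweets:
        # Strip trailing punctuation and count each URL once per tweet.
        urls = {u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text)}
        for url in urls:
            topics[url]["occurrences"].append(text)
            topics[url]["senders"].add(sender)
    return dict(topics)


SAMPLE = [
    ("alice", "Great read on graphs http://example.com/a1"),
    ("bob", "RT this: http://example.com/a1"),
    ("carol", "Unrelated link http://example.com/b2."),
]

if __name__ == "__main__":
    for url, topic in sorted(group_by_url(SAMPLE).items()):
        print(url, len(topic["occurrences"]), sorted(topic["senders"]))
```

The URL string is the only identity test here, so two different short links to the same page would not be merged. Resolving the short links first would be the obvious refinement.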

*****
PS: This would be a case where TweetDeck need not offer the generic in your face topic map interface but could offer some of the advantages of topic maps (de-duping content and gathering up all the authors of the same content).

Topic Map Competition

Filed under: Authoring Topic Maps,Interface Research/Design,Marketing,News,Topic Maps — Patrick Durusau @ 4:34 am

The idea of a topic map competition seems like a good one to me.

We need to demonstrate that topic map development isn’t like a trip to the ontological dentist or proctologist.

Just some random thoughts that hopefully can firm up in the near future.

Suggest starting off with two contests, with two different data sets.

24-Hour Topic Map

A 24 hour contest, with points, in part, for inclusion of participants in different time zones. To encourage the spread of topic maps around the globe.

Each team would be encouraged (required?) to keep a blog while developing the topic map so that the progress of the map, interaction with others, etc., could be documented.

Points to be awarded for participants in different time zones (up to 24 points), up to 25 points for extraction of subjects/creation of topic map structures, up to 25 points for the interface/delivery, and up to 26 points for generality of the scripts/software used in generating the map.

The greatest number of points being for generality of scripts/software so we can encourage others to try these techniques on their own data sets.

7-Day Topic Map

Not unlike the 24-Hour Topic Map (24HTM) contest except that with a much longer time period, the expectations for the results are much higher.

Points should still be awarded for participants in different time zones but should drop to 12 points, extraction/subject map structures should remain at up to 25 points, interfaces/delivery should go up to 31 points and scripts/software, up to 32 points.
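For anyone wanting to tally scores, the two point schemes can be written down as a small Python sketch. The category labels, the `score` helper, and the rule of capping each category at its maximum are my assumptions, since the draft does not say how judging would work.

```python
# Maximum points per category, as proposed for each contest.
RUBRICS = {
    "24-Hour Topic Map": {
        "time zones": 24,
        "subject extraction / structures": 25,
        "interface / delivery": 25,
        "generality of scripts / software": 26,
    },
    "7-Day Topic Map": {
        "time zones": 12,
        "subject extraction / structures": 25,
        "interface / delivery": 31,
        "generality of scripts / software": 32,
    },
}


def score(contest, awarded):
    """Total a team's points, capping each category at its maximum (assumed rule)."""
    rubric = RUBRICS[contest]
    return sum(min(awarded.get(category, 0), cap) for category, cap in rubric.items())


if __name__ == "__main__":
    for contest, rubric in RUBRICS.items():
        print(contest, "maximum:", sum(rubric.values()))
    print(score("24-Hour Topic Map", {"time zones": 30, "interface / delivery": 20}))
```

Both schemes total 100 points, and in both the largest single share goes to generality of scripts/software.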

Since the teams will be composed of multiple individuals, I suspect prizes are going to be limited to award certificates, listing on public websites as the winners, etc.

Any number of governments are mandating a transition to digital records (including XML) as though that will solve their access problems. For those seeking contracts, being recognized for work with a data set from a particular government could not hurt.

I suppose that may depend on whether the government views you as having permission to work with the data set. 😉

This is a very rough draft and needs a lot more details before being something practical.

PS: Should either one or both or some other variation of this suggestion prove popular, contests could be run on a monthly basis.

February 3, 2011

Scala Update with Martin Odersky

Filed under: Scala — Patrick Durusau @ 7:56 pm

Scala Update with Martin Odersky

From the website:

This episode is an update on the developments around the Scala language. We covered the new features in 2.7 and 2.8, as well as what’s planned for 2.9. We then discussed briefly the different “proficiency levels” of Scala programmers. The main part of the episode centered around Martin’s new research project: the polymorphic embedding of DSLs for expressing concurrency into Scala.

Scala is important for a number of uses, not the least of which is noted in: Introduction to Category Theory in Scala

At this Scala Update, you will find: The research project. Follow it. Takes you to a notice about a 5 year European Research Grant that was won by the Scala Research Group. Looks very important and quite possibly an area where topic maps might want to play.

Software Engineering Radio

Filed under: CS Lectures — Patrick Durusau @ 4:36 pm

Software Engineering Radio

I ran across this while following up a lead on a Scala update (to be covered in a separate post).

Just scanning a few of the archived pod-casts I saw materials on Agile programming, JUnit, on being a consultant, NoSQL, etc.

I am reminded that at its inception and even now, in the better projects, software projects aren’t limited to programmers or engineers but have a rich mixture of humanists, mathematicians, logic types, historians (of information as well as the domain), librarians, domain specialists, users (not just user representatives), and others.

Not every project needs them in the same proportions or can even afford to have them all.

But re-inventing semantic diversity using IRIs instead of word tokens is a good example of a lack of diversity in both input and decision making for that project.

That diversity argument applies to other aspects of software projects as well, not just to engineers so don’t start feeling too smug.

Humanists need to learn more about software processes, we can all learn from librarians, project leads can learn from users, etc.

Selected Best Paper Awards – 1996 to date

Filed under: CS Lectures — Patrick Durusau @ 4:16 pm

Selected Best Paper Awards – 1996 to date

Jeff Huang has collected the best CS paper awards since 1996 into a single listing.

Three things occur to me:

  1. We all owe Jeff a kind mention for completing such an interesting listing of papers!
  2. We should contribute to this listing to extend it beyond 1996.
  3. At least for students in my class, you choose two papers to summarize and describe how they are relevant to topic maps. (2-3 pages, no citations, one summary due mid-term, the second summary due at the end of the term.)

PyBrain: The Python Machine Learning Library

PyBrain: The Python Machine Learning Library

From the website:

PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.

PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive “Backronym”.

How is PyBrain different?

While there are a few machine learning libraries out there, PyBrain aims to be a very easy-to-use modular library that can be used by entry-level students but still offers the flexibility and algorithms for state-of-the-art research. We are constantly working on more and faster algorithms, developing new environments and improving usability.

What PyBrain can do

PyBrain, as its written-out name already suggests, contains algorithms for neural networks, for reinforcement learning (and the combination of the two), for unsupervised learning, and evolution. Since most of the current problems deal with continuous state and action spaces, function approximators (like neural networks) must be used to cope with the large dimensionality. Our library is built around neural networks in the kernel and all of the training methods accept a neural network as the to-be-trained instance. This makes PyBrain a powerful tool for real-life tasks.

Another tool kit to assist in the construction of topic maps.

And another likely contender for the Topic Map Competition!

MALLET: MAchine Learning for LanguagE Toolkit
Topic Map Competition (TMC) Contender?

MALLET: MAchine Learning for LanguagE Toolkit

From the website:

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

MALLET includes sophisticated tools for document classification: efficient routines for converting text to “features”, a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.

Topic models are useful for analyzing large collections of unlabeled text. The MALLET topic modeling toolkit contains efficient, sampling-based implementations of Latent Dirichlet Allocation, Pachinko Allocation, and Hierarchical LDA.

Many of the algorithms in MALLET depend on numerical optimization. MALLET includes an efficient implementation of Limited Memory BFGS, among many other optimization methods.

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of “pipes”, which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.

An add-on package to MALLET, called GRMM, contains support for inference in general graphical models, and training of CRFs with arbitrary graphical structure.

Another tool to assist in the authoring of a topic map from a large data set.
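To make the “pipes” idea in the description above concrete, here is a rough Python analogue of a three-stage pipeline (tokenize, remove stopwords, count). This is not MALLET's Java API. The function names, the tiny stopword list, and `run_pipes` are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list, not MALLET's.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


def count_vector(tokens):
    """Turn a token sequence into token -> count."""
    return Counter(tokens)


def run_pipes(data, pipes):
    """Feed the output of each pipe into the next one."""
    for pipe in pipes:
        data = pipe(data)
    return data


PIPES = [tokenize, remove_stopwords, count_vector]

if __name__ == "__main__":
    print(run_pipes("The map of the topic and the topic map", PIPES))
```

Counts like these are the sort of raw material from which candidate subjects for a topic map could be proposed, with a human still deciding which ones are worth keeping.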

It would be interesting but beyond the scope of the topic maps class, to organize a competition around several of the natural language processing packages.

To have a common data set, to be released on X date, with topic maps due say within 24 hours (there is a TV show with that in the title or so I am told).

Will have to give that some thought.

Could be both interesting and entertaining.

Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS) 2010

Filed under: Biomedical,Conferences,Neural Networks — Patrick Durusau @ 3:18 pm

Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS) 2010

Another treasure trove of conference presentations, tutorials and other materials of interest to anyone working on information systems.

From the website:

You are invited to participate in the Twenty-Fourth Annual Conference on Neural Information Processing Systems, which is the premier scientific meeting on Neural Computation.

A one-day Tutorial Program offered a choice of six two-hour tutorials by leading scientists. The topics span a wide range of subjects including Neuroscience, Learning Algorithms and Theory, Bioinformatics, Image Processing, and Data Mining.

The NIPS Conference featured a single track program, with contributions from a large number of intellectual communities. Presentation topics include: Algorithms and Architectures; Applications; Brain Imaging; Cognitive Science and Artificial Intelligence; Control and Reinforcement Learning; Emerging Technologies; Learning Theory; Neuroscience; Speech and Signal Processing; and Visual Processing.

There were two Posner Lectures named in honor of Ed Posner who founded NIPS. Ed worked on communications and information theory at Caltech and was an early pioneer in neural networks. He organized the first NIPS conference and workshop in Denver in 1989 and incorporated the NIPS Foundation in 1992. He was an inspiring teacher and an effective leader. His untimely death in a bicycle accident in 1993 was a great loss to our community. Posner Lecturers were Josh Tenenbaum and Michael Jordan.

The Poster Sessions offered high-quality posters and an opportunity for researchers to share their work and exchange ideas in a collegial setting. The majority of contributions accepted at NIPS were presented as posters.

The Demonstrations enabled researchers to highlight scientific advances, systems, and technologies in ways that go beyond conventional poster presentations. It provided a unique forum for demonstrating advanced technologies — both hardware and software — and fostering the direct exchange of knowledge.

February 2, 2011

Mapping Wikileaks’ Cablegate using Python, mongoDB and Gephi – Saturday, 5 February 2011

Filed under: Gephi,MongoDB,Natural Language Processing — Patrick Durusau @ 10:34 am

Mapping Wikileaks’ Cablegate using Python, mongoDB and Gephi

From the website:

Text analysis and graph visualization on the Wikileaks Cablegate dataset.

We propose to present a complete work-flow of textual data analysis, from acquisition to visual exploration of a complex network. Through the presentation of a simple software specifically developed for this talk, we will cover a set of productive and widely used softwares and libraries in text analysis, then introduce some features of Gephi, an open-source network visualization & analysis software, using the data collected and transformed with cablegate-semnet.

See: cablegate-semnet

If you are in (or can be) Brussels, Belgium this coming Saturday and Sunday, don’t miss this presentation!

There will be many others worthy of your attention as well.

Data Governance, Data Architecture and Metadata Essentials – Webinar

Filed under: Data Governance,Data Integration,Marketing — Patrick Durusau @ 9:20 am

Data Governance, Data Architecture and Metadata Essentials

Date: February 24, 2011 Time: 9:00AM PT

Speaker: David Loshin

From the website:

The absence of data governance standards is a critical failure point for enterprise data repurposing. As the rates of data volume grows, you want to make sure you are employing the correct practices and standards to make the most of this volume of information. Data can be your company’s best or worst asset. Join David Loshin, industry expert on data governance for this informative webcast.

I suppose it goes without saying that an absence of data governance means that a topic map effort to use outside data is going to be even more expensive. Or perhaps not.

People have been urging documentation of data practices since before the advent of the digital computer. That is still the starting point for any data governance.

What you don’t know about you can’t govern. It’s just that simple. (Can’t merge it with outside data either. But if your internal systems are toast, topic maps aren’t going to save you.)

CrowdFlower

Filed under: Authoring Topic Maps,Crowd Sourcing,Interface Research/Design — Patrick Durusau @ 9:16 am

CrowdFlower

From the website:

Like Cloud computing with People.

Computers can’t do every task. Luckily, we have people to help.

We provide instant access to an elastic labor force. And our statistical quality control technology yields results you can trust.

From CrowdFlower Gets Gamers to Do Real Work for Virtual Pay

Here’s how it works. CrowdFlower embeds tasks in online games like FarmVille, Restaurant City, It Girl, Happy Aquarium, Happy Pets, Happy Island and Pop Boom. This means that the estimated 80 million gamers — from teens to homemakers — who are hooked on FarmVille, Zynga’s popular virtual farming game on Facebook, can be transformed into a virtual workforce.

To get to the next level in FarmVille, for example, the gamer might need 600 XP (XP means “experience” in Farmville parlance). So the gamer might buy a bed and breakfast building for $60 in FarmVille cash, which would earn him 600 XP. But for many gamers, revenue — and XP — from crop harvesting comes too slowly.

To earn game money quickly, the gamer can click a tab on the FarmVille page that links to real-world tasks to be performed by crowdsourced workers. Once the task is successfully completed, the gamer gets his FarmVille cash and CrowdFlower is paid by the client. The latter pays in real money, usually with a 10 percent markup.

Like any number of crowd sourcing services but I was struck by the notion of embedding tasks inside games for virtual payment.

Not the answer to all topic map authoring tasks but certainly worth thinking about.

Question: Does anyone have experience with creating topic maps by embedding tasks in online games?

Pivot Labs – Talks

Filed under: Software — Patrick Durusau @ 9:15 am

Pivot Labs – Talks

Putting this under software alone is accurate but so insufficient.

Quite a range of videos and the few that I have watched so far proved to be interesting.

Available in both mpeg-4 and mp3 formats.

Interview with Salvatore Sanfilippo on Redis – Podcast

Filed under: NoSQL,Redis — Patrick Durusau @ 9:14 am

Redis with Salvatore Sanfilippo Podcast from MyNoSQL by Alex Popescu.

Sanfilippo says that Redis is a key/value database, to be sure, but from another point of view the values themselves have specific data models. (See A fifteen minutes introduction to Redis data types.)

See the Redis homepage

Which reminds me of a post on the nature of keys that I have been meaning to finish. More on that topic soon.

Design Patterns for Efficient Graph Algorithms in MapReduce

Filed under: Graphs,MapReduce — Patrick Durusau @ 8:32 am

Design Patterns for Efficient Graph Algorithms in MapReduce Authors: Jimmy Lin and Michael Schatz

Abstract:

Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of protein-protein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.

I wonder if the partitioning into similar domains (their term with no prompting from me) would have the same impact on merging in a topic map?
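For readers new to the "message passing" style the abstract mentions, here is a minimal single-process sketch of PageRank written as a map step and a reduce step. It is not the authors' code and it does not implement any of their three design patterns; the function names, the toy graph, and the damping value of 0.85 are my assumptions for illustration. It also assumes every vertex appears as a key in the graph and has at least one outgoing edge.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Each vertex sends an equal share of its rank to every neighbor."""
    for vertex, neighbors in graph.items():
        if not neighbors:
            continue  # dangling vertices are simply skipped in this sketch
        share = ranks[vertex] / len(neighbors)
        for neighbor in neighbors:
            yield neighbor, share  # the "message"


def reduce_phase(messages, vertices, damping=0.85):
    """Sum the messages arriving at each vertex and apply the damping factor."""
    totals = defaultdict(float)
    for target, share in messages:
        totals[target] += share
    n = len(vertices)
    return {v: (1 - damping) / n + damping * totals[v] for v in vertices}


def pagerank(graph, iterations=20, damping=0.85):
    """Repeat map + reduce a fixed number of times (hypothetical helper)."""
    vertices = list(graph)
    ranks = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), vertices, damping)
    return ranks


# Toy graph, assumed for illustration: adjacency lists keyed by vertex.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_ranks = pagerank(toy_graph)
```

In a real MapReduce job the messages would be shuffled across machines between the two phases, which is where partitioning of the graph starts to matter.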

February 1, 2011

About Version Vectors (a.k.a. Vector Clocks)

Filed under: Topic Map Systems,Version Vectors — Patrick Durusau @ 9:11 pm

About Version Vectors (a.k.a. Vector Clocks) by Kresten Krab Thorup.

Using spreadsheets as an example, Kresten explains how version vectors can solve a large class of versioning issues but not all.

Assuming you are interested in distributed topic map systems, versioning that leads to acceptable (not perfect) results will be of interest to you.

This is going to become more important as topic maps develop into distributed systems.
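To make the idea concrete, here is a minimal sketch of version vectors as plain dictionaries mapping a node name to an update counter. It is my own illustration, not Kresten's code; the function names and the "alice"/"bob" replicas are assumptions. The point to notice is the "concurrent" result: the vectors can detect that two versions conflict, but deciding what the merged content should be is left to the application.

```python
def increment(vv, node):
    """Return a copy of the version vector with node's counter bumped by one."""
    new = dict(vv)
    new[node] = new.get(node, 0) + 1
    return new


def compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for a relative to b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # b has seen everything a has seen
    if b_le_a:
        return "after"    # a has seen everything b has seen
    return "concurrent"   # neither descends from the other: a conflict


def merge(a, b):
    """Element-wise maximum: the vector for a version that has seen both."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Two hypothetical replicas edit the same item independently.
base = increment({}, "alice")           # {'alice': 1}
alice_edit = increment(base, "alice")   # {'alice': 2}
bob_edit = increment(base, "bob")       # {'alice': 1, 'bob': 1}
merged = merge(alice_edit, bob_edit)    # {'alice': 2, 'bob': 1}
```

Here `compare(base, alice_edit)` reports that the base came first, while `compare(alice_edit, bob_edit)` reports a conflict that something other than the vectors has to resolve.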

STRING – Known and Predicted Protein-Protein Interactions

Filed under: Associations,Bioinformatics,Biomedical — Patrick Durusau @ 7:43 pm

STRING – Known and Predicted Protein-Protein Interactions

From the website:

STRING is a database of known and predicted protein interactions.

The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources:

  • Genomic Context
  • High-throughput Experiments
  • (Conserved) Coexpression
  • Previous Knowledge

STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 2,590,259 proteins from 630 organisms. (Note: I changed the presentation of the sources for the interactions from the table used on the website to a list.)

Looks like fertile ground for research on associations.

A Short eBook on Scaling MongoDB

Filed under: MongoDB,NoSQL — Patrick Durusau @ 7:52 am

A Short eBook on Scaling MongoDB

Kristina Chodorow’s blog, Snail in a Turtleneck, announced a short eBook by Kristina.

I haven’t read it, yet, but am sure to be doing so in the near future.

Comments on the same are welcome!
