Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 4, 2013

Crowdsourcing Multi-Label Classification for Taxonomy Creation

Filed under: Crowd Sourcing,Decision Making,Machine Learning,Taxonomy — Patrick Durusau @ 5:19 pm

Crowdsourcing Multi-Label Classification for Taxonomy Creation by Jonathan Bragg, Mausam and Daniel S. Weld.

Abstract:

Recent work has introduced CASCADE, an algorithm for creating a globally-consistent taxonomy by crowdsourcing microwork from many individuals, each of whom may see only a tiny fraction of the data (Chilton et al. 2013). While CASCADE needs only unskilled labor and produces taxonomies whose quality approaches that of human experts, it uses significantly more labor than experts. This paper presents DELUGE, an improved workflow that produces taxonomies with comparable quality using significantly less crowd labor. Specifically, our method for crowdsourcing multi-label classification optimizes CASCADE’s most costly step (categorization) using less than 10% of the labor required by the original approach. DELUGE’s savings come from the use of decision theory and machine learning, which allow it to pose microtasks that aim to maximize information gain.
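To make “maximize information gain” a little more concrete, here is a minimal Python sketch (my own illustration, not DELUGE’s actual model) of how a labeling workflow might pick the next item/category question: score each candidate by the expected drop in uncertainty about membership, given an assumed worker accuracy, and ask the highest-scoring one first.

```python
import math

def entropy(p):
    """Binary entropy of a Bernoulli(p) belief, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def expected_info_gain(p_member, worker_accuracy=0.8):
    """Expected entropy reduction from asking one worker
    'does this item belong in this category?' given the current
    belief p_member and an assumed worker accuracy."""
    # Probability the worker answers "yes" under the noise model.
    p_yes = worker_accuracy * p_member + (1 - worker_accuracy) * (1 - p_member)
    # Posterior beliefs after each possible answer (Bayes' rule).
    post_yes = worker_accuracy * p_member / p_yes
    post_no = (1 - worker_accuracy) * p_member / (1 - p_yes)
    expected_posterior = p_yes * entropy(post_yes) + (1 - p_yes) * entropy(post_no)
    return entropy(p_member) - expected_posterior

# Hypothetical beliefs about (item, category) membership; ask about the
# pair whose answer is expected to teach us the most.
beliefs = {("doc-17", "jellyfish"): 0.5,
           ("doc-17", "larval fish"): 0.9,
           ("doc-42", "jellyfish"): 0.1}
best = max(beliefs, key=lambda pair: expected_info_gain(beliefs[pair]))
print("next microtask:", best)
```

A workflow built this way can also stop asking about a pair once the expected gain falls below the cost of another judgment, which is where the labor savings come from.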

An extension of work reported at Cascade: Crowdsourcing Taxonomy Creation.

While the reduction in required work is interesting, the ability to sustain more complex workflows looks like the more important result.

That will require developing workflows that can be optimized, at least for subject identification.

Or should I say validation of subject identification?

What workflow do you use for subject identification and/or validation of subject identification?

September 24, 2013

Crowdsourcing.org

Filed under: Crowd Sourcing — Patrick Durusau @ 12:45 pm

Crowdsourcing.org

From the about page:

Crowdsourcing.org is the leading industry resource offering the largest online repository of news, articles, videos, and site information on the topic of crowdsourcing and crowdfunding.

Founded in 2010, crowdsourcing.org, is a neutral professional association dedicated solely to crowdsourcing and crowdfunding. As one of the most influential and credible authorities in the crowdsourcing space, crowdsourcing.org is recognized worldwide for its intellectual capital, crowdsourcing and crowdfunding practice expertise and unbiased thought leadership.

Crowdsourcing.org’s mission is to serve as an invaluable source of information to analysts, researchers, journalists, investors, business owners, crowdsourcing experts and participants in crowdsourcing and crowdfunding platforms. (emphasis in original)

If you are interested in crowdsourcing, there are worse places to start searching. 😉

Seriously, Crowdsourcing.org hosts a directory of 2,482 crowdsourcing and crowdfunding sites (as of September 24, 2013), along with numerous other resources.

September 23, 2013

…Crowd-Sourcing to Classify Strange Oceanic Creatures

Filed under: Classification,Crowd Sourcing — Patrick Durusau @ 3:48 pm

Plankton Portal Uses Crowd-Sourcing to Classify Strange Oceanic Creatures

From the post:

Today an online citizen-science project called “Plankton Portal” launches, created by researchers at the University of Miami Rosenstiel School of Marine and Atmospheric Sciences (RSMAS) in collaboration with the National Oceanic and Atmospheric Administration (NOAA), the National Science Foundation (NSF) and developers at Zooniverse.org. Plankton Portal allows you to explore the open ocean from the comfort of your own home. You can dive hundreds of feet deep, and observe the unperturbed ocean and the myriad animals that inhabit Earth’s last frontier.

The goal of the site is to enlist volunteers to classify millions of underwater images to study plankton diversity, distribution and behavior in the open ocean. It was developed under the leadership of Dr. Robert K. Cowen, UM RSMAS Emeritus Professor in Marine Biology and Fisheries (MBF) and now the Director of Oregon State University’s Hatfield Marine Science Center, and by Research Associate Cedric Guigand and MBF graduate students Jessica Luo and Adam Greer.

Millions of plankton images are taken by the In Situ Ichthyoplankton Imaging System (ISIIS), a unique underwater robot engineered at the University of Miami in collaboration with Charles Cousin at Bellamare LLC and funded by NOAA and NSF. ISIIS operates as an ocean scanner that casts the shadow of tiny and transparent oceanic creatures onto a very high resolution digital sensor at very high frequency. So far, ISIIS has been used in several oceans around the world to detect the presence of larval fish, small crustaceans and jellyfish in ways never before possible. This new technology can help answer important questions ranging from how plankton disperse, interact and survive in the marine environment, to predicting the physical and biological factors that could influence the plankton community.

You can go to Zooniverse.org or jump directly to the Plankton Portal.

If plankton don’t excite you all that much, consider one of the other projects at Zooniverse:

Galaxy Zoo
How do galaxies form?
NASA’s Hubble Space Telescope archive provides hundreds of thousands of galaxy images.
Ancient Lives
Study the lives of ancient Greeks
The data gathered by Ancient Lives helps scholars study the Oxyrhynchus collection.
Moon Zoo
Explore the surface of the Moon
We hope to study the lunar surface in unprecedented detail.
WhaleFM
Hear Whales communicate
You can help marine researchers understand what whales are saying
Solar Stormwatch
Study explosions on the Sun
Explore interactive diagrams to learn about the Sun and the spacecraft monitoring it.
Seafloor Explorer
Help explore the ocean floor
The HabCam team and the Woods Hole Oceanographic Institution need your help!
PlanetHunters.org
Find planets around stars
Lightcurve changes from the Kepler spacecraft can indicate transiting planets.
Bat Detective
You’re hot on the trail of bats!
Help scientists characterise bat calls recorded by citizen scientists.
The Milky Way Project
How do stars form?
We’re asking you to help us find and draw circles on infrared image data from the Spitzer Space Telescope.
Snapshot Serengeti
Go wild in the Serengeti!
We need your help to classify all the different animals caught in millions of camera trap images.
Planet Four
Explore the Red Planet
Planetary scientists need your help to discover what the weather is like on Mars.
Notes from Nature
Take Notes from Nature
Transcribe museum records to take notes from nature, contribute to science.
SpaceWarps
Help us find gravitational lenses
Imagine a galaxy, behind another galaxy. Think you won’t see it? Think again.
Plankton Portal
No plankton means no life in the ocean
Plankton are a critically important food source for our oceans.
oldWeather
Model Earth’s climate using historic ship logs
Help scientists recover Arctic and worldwide weather observations made by US Navy and Coast Guard ships.
Cell Slider
Analyse real life cancer data.
You can help scientists from the world’s largest cancer research institution find cures for cancer.
CycloneCenter
Classify over 30 years of tropical cyclone data.
Scientists at NOAA’s National Climatic Data Center need your help.
Worm Watch Lab
Track genetic mysteries
We can better understand how our genes work by spotting the worms laying eggs.

I count eighteen (18) projects and this is just one of many crowdsourced project collections.

Question: We overcome semantic impedance to work cooperatively on these projects, so what is it that creates semantic impedance in other projects?

Or perhaps better: How do we or others benefit from the presence of semantic impedance?

The second question might lead to a strategy that replaces that benefit with a bigger one from using topic maps.

September 15, 2013

Are crowdsourced maps the future of navigation? [Supplying Context?]

Filed under: Crowd Sourcing,Mapping,Maps,Open Street Map — Patrick Durusau @ 3:05 pm

Are crowdsourced maps the future of navigation? by Kevin Fitchard.

From the post:

Given the craziness of the first two weeks in September in the tech world an interesting hire that should have gotten more attention slipped largely through the cracks. Steve Coast, founder of the OpenStreetMap project, has joined Telenav, signaling a big move by the navigation outfit toward crowdsourced mapping.

OpenStreetMap is the Wikipedia of mapping. OSM’s dedicated community of 1.3 million editors have gathered GPS data while driving, biking and walking the streets of the world to build a map from the ground up. They’ve even gone so far as to mark objects that exist on few other digital maps, from trees to park benches. That map was then offered up free to all comers.

Great story about mapping, crowd sourcing, etc., but it also has this gem:

For all of its strengths, OSM primarily has been a display map filled with an enormous amount of detail — Coast said editors will spend hours placing individual trees on boulevards. Many editors often don’t want to do the grunt work that makes maps truly useful for navigation, like filling in address data or labeling which turns are allowed at an intersection. (emphasis added)

Sam Hunting has argued for years that hobbyists, sports fans, etc., are naturals for entering data into topic maps.

Well, assuming an authoring interface with a low enough learning curve.

I went to the OpenStreetMap project, discovered an error in Covington, GA (where I live), created an account, watched a short editing tutorial and completed my first edit in about ten (10) minutes. I refreshed my browser and the correction is in place.

Future edits/corrections should be on the order of less than two minutes.

Care to name a topic map authoring interface that is that easy to use?

Not an entirely fair question because the geographic map provided me with a lot of unspoken context.

For example, I did not create associations between my correction and the City of Covington, Newton County, Georgia, United States, Western Hemisphere, Northern Hemisphere, Earth, or fill in types or roles for all those associations. Or remove any of the associations, types or roles that were linked to the incorrect information.
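If “associations,” “types” and “roles” are unfamiliar, here is a small, purely hypothetical Python sketch of the containment chain behind that single correction (the type and role names are mine, not from any existing map):

```python
# Hypothetical containment chain for one geographic correction.
# Each association has a type and two typed roles (containee / container).
# Hemisphere memberships would be parallel associations, omitted here.
places = ["Covington", "Newton County", "Georgia", "United States", "Earth"]

associations = [
    {"type": "located-in", "roles": {"containee": inner, "container": outer}}
    for inner, outer in zip(places, places[1:])
]

for a in associations:
    print(f'{a["roles"]["containee"]} --located-in--> {a["roles"]["container"]}')
```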

Baseball fans are reported to be fairly fanatical. But can you imagine any fan starting a topic map of baseball from scratch? I didn’t think so either. But on the other hand, what if there was an interface styled in a traditional play by play format, that allowed fans to capture games in progress? And as the game progresses, the associations and calculations on those associations (stats) are updated.

All the fan is doing is entering familiar information, allowing the topic map engine to worry about types, associations, etc.

Is that the difficulty with semantic technology interfaces?

That we require users to do more than enter the last semantic mile?

July 19, 2013

Designing Topic Map Languages

Filed under: Crowd Sourcing,Graphics,Visualization — Patrick Durusau @ 2:00 pm

A graphical language for explaining, discussing, planning topic maps has come up before. But no proposal has ever caught on.

I encountered a paper today that describes how to author a notation language with a 300% increase in semantic transparency for novices and a reduction of interpretation errors by a factor of 5.

Interested?

Visual Notation Design 2.0: Designing User-Comprehensible Diagramming Notations by Daniel L. Moody, Nicolas Genon, Patrick Heymans, Patrice Caire.

Designing notations that business stakeholders can understand is one of the most difficult practical problems and greatest research challenges in the IS field. The success of IS development depends critically on effective communication between developers and end users, yet empirical studies show that business stakeholders understand IS models very poorly. This paper proposes a radical new approach to designing diagramming notations that actively involves end users in the process. We use i*, one of the leading requirements engineering notations, to demonstrate the approach, but the same approach could be applied to any notation intended for communicating with non-experts. We present the results of 6 related empirical studies (4 experiments and 2 nonreactive studies) that conclusively show that novices consistently outperform experts in designing symbols that are comprehensible to novices. The differences are both statistically significant and practically meaningful, so have implications for IS theory and practice. Symbols designed by novices increased semantic transparency (their ability to be spontaneously interpreted by other novices) by almost 300% compared to the existing i* diagramming notation and reduced interpretation errors by a factor of 5. The results challenge the conventional wisdom about visual notation design, which has been accepted since the beginning of the IS field and is followed unquestioningly today by groups such as OMG: that it should be conducted by a small team of technical experts. Our research suggests that instead it should be conducted by large numbers of novices (members of the target audience). This approach is consistent with principles of Web 2.0, in that it harnesses the collective intelligence of end users and actively involves them as codevelopers (“prosumers”) in the notation design process rather than as passive consumers of the end product. The theoretical contribution of this paper is that it provides a way of empirically measuring the user comprehensibility of IS notations, which is quantitative and practical to apply. The practical contribution is that it describes (and empirically tests) a novel approach to developing user comprehensible IS notations, which is generalised and repeatable. We believe this approach has the potential to revolutionise the practice of IS diagramming notation design and change the way that groups like OMG operate in the future. It also has potential interdisciplinary implications, as diagramming notations are used in almost all disciplines.

This is a very exciting paper!

I thought the sliding scale from semantic transparency (mnemonic) to semantic opacity (conventional) to semantic perversity (false mnemonic) was particularly good.

Not to mention that their process is described in enough detail for others to use the same process.

For designing a Topic Map Graphical Language?

What about designing the next Topic Map Syntax?

We are going to be asking “novices” to author topic maps. Why not ask them to author the language?

And not just one language. A language for each major domain.

Talk about stealing a march on competing technologies!

June 26, 2013

Choosing Crowdsourced Transcription Platforms at SSA 2013

Filed under: Crowd Sourcing,Manuscripts — Patrick Durusau @ 1:45 pm

Choosing Crowdsourced Transcription Platforms at SSA 2013

Transcript of Ben Brumfield’s presentation at the Society of Southwestern Archivists. Audio available.

Ben covers the principles of crowd sourced projects, such as:

Now, I’m an open source developer, and in the open source world we tend to differentiate between “free as in beer” or “free as in speech”.

Crowdsourcing projects are really “free as in puppy”. The puppy is free, but you have to take care of it; you have to do a lot of work. Because volunteers that are participating in these things don’t like being ignored. They don’t like having their work lost. They’re doing something that they feel is meaningful and engaging with you, therefore you need to make sure their work is meaningful and engage with them.

For the details on tools, Ben points us to: Collaborative Transcription Tools.

You will need the technology side for a crowd sourced project, topic map related or not.

But don’t neglect the human side of such a project. At least if you want a successful project.

June 12, 2013

State of the OpenStreetMap [Watching the Watchers?]

Filed under: Crowd Sourcing,OpenStreetMap — Patrick Durusau @ 3:13 pm

State of the OpenStreetMap by Nathan Yau.

Nathan reminds us to review the OpenStreetMap Data Report, which includes a dynamic map showing changes as they are made.

OpenStreetMap has exceeded 1,000,000 users and 1,000 mappers contribute every day.

I wonder if the OpenStreetMap would be interested in extending its Features to include a “seen-at” tag?

So people can upload cellphone photos with geotagging of watchers.

With names, if known, if not, perhaps other users can supply names.

June 5, 2013

Crowdsourcing + Machine Learning…

Filed under: Crowd Sourcing,Machine Learning,Manuscripts — Patrick Durusau @ 9:20 am

Crowdsourcing + Machine Learning: Nicholas Woodward at TCDL by Ben W. Brumfield.

I was so impressed by Nicholas Woodward’s presentation at TCDL this year that I asked him if I could share “Crowdsourcing + Machine Learning: Building an Application to Convert Scanned Documents to Text” on this blog.

Hi. My name is Nicholas Woodward, and I am a Software Developer for the University of Texas Libraries. Ben Brumfield has been so kind as to offer me an opportunity to write a guest post on his blog about my approach for transcribing large scanned document collections that combines crowdsourcing and computer vision. I presented my application at the Texas Conference on Digital Libraries on May 7th, 2013, and the slides from the presentation are available on TCDL’s website. The purpose of this post is to introduce my approach along with a test collection and preliminary results. I’ll conclude with a discussion on potential avenues for future work.

Before we delve into algorithms for computer vision and what-not, I’d first like to say a word about the collection used in this project and why I think it’s important to look for new ways to complement crowdsourcing transcription. The Guatemalan National Police Historical Archive (or AHPN, in Spanish) contains the records of the Guatemalan National Police from 1882-2005. It is estimated that AHPN contains more than 80 million pages of documents (8,000 linear meters) such as handwritten journals and ledgers, birth certificate and marriage license forms, identification cards and typewritten letters. To date, the AHPN staff have processed and digitized approximately 14 million pages of the collection, and they are publicly available in a digital repository that was developed by UT Libraries.

While unique for its size, AHPN is representative of an increasingly common problem in the humanities and social sciences. The nature of the original documents precludes any economical OCR solution on the scanned images (See below), and the immense size of the collection makes page-by-page transcription highly impractical, even when using a crowdsourcing approach. Additionally, the collection does not contain sufficient metadata to support browsing via commonly used traits, such as titles or authors of documents.

A post at the intersection of many of my interests!

Imagine pushing this just a tad further to incorporate management of subject identity, whether visible to the user or not.

crowdcrafting

Filed under: Crowd Sourcing,Interface Research/Design,Usability — Patrick Durusau @ 8:04 am

crowdcrafting

Crowdcrafting is an instance of PyBossa:

From the about page:

PyBossa is a free, open-source crowd-sourcing and micro-tasking platform. It enables people to create and run projects that utilise online assistance in performing tasks that require human cognition such as image classification, transcription, geocoding and more. PyBossa is there to help researchers, civic hackers and developers to create projects where anyone around the world with some time, interest and an internet connection can contribute.

PyBossa is different to existing efforts:

  • It’s 100% open-source.
  • Unlike, say, “mechanical turk” style projects, PyBossa is not designed to handle payment or money — it is designed to support volunteer-driven projects.
  • It’s designed as a platform and framework for developing and deploying crowd-sourcing and microtasking apps rather than being a crowd-sourcing application itself. Individual crowd-sourcing apps are written as simple snippets of Javascript and HTML which are then deployed on a PyBossa instance (such as CrowdCrafting.org). This way one can easily develop custom apps while using the PyBossa platform to store your data, manage users, and handle workflow.

You can read more about the architecture in the PyBossa Documentation and follow the step-by-step tutorial to create your own apps.
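To see what that “platform plus thin apps” split looks like from the task-creation side, here is a rough Python sketch that posts tasks to a PyBossa server over its REST API. The endpoint path, the api_key parameter and the app_id field are my assumptions based on PyBossa documentation from this period (later versions use project_id), so treat this as a sketch rather than a recipe:

```python
import requests

PYBOSSA_URL = "http://crowdcrafting.org"   # or your own PyBossa instance
API_KEY = "your-api-key"                   # hypothetical placeholder
APP_ID = 123                               # id of an app you have already created

# Each task carries an arbitrary JSON "info" payload; the Javascript/HTML
# app decides how to present it to volunteers.
images = [
    "http://example.org/plankton/0001.jpg",
    "http://example.org/plankton/0002.jpg",
]

for url in images:
    resp = requests.post(
        f"{PYBOSSA_URL}/api/task",
        params={"api_key": API_KEY},
        json={"app_id": APP_ID, "info": {"image_url": url}},
    )
    resp.raise_for_status()
    print("created task", resp.json().get("id"))
```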

Are interfaces for volunteer projects better than for-hire projects?

Do they need to be?

How would you overcome the gap between “…this is how I see the interface (the developers)…” versus the interface that users prefer?

Hint: 20th century advertising discovered that secret decades ago. See: Predicting What People Want and especially the reference to Selling Blue Elephants.

June 3, 2013

(Re)imagining the Future of Work

Filed under: Crowd Sourcing,Interface Research/Design — Patrick Durusau @ 2:19 pm

(Re)imagining the Future of Work by Tatiana.

From the post:

Here at CrowdFlower, our Product and Engineering teams are a few months into an ambitious project: building everything we’ve learned about crowdsourcing in the past five years as industry leaders into a new, powerful and intuitive platform.

Today, we’re excited to kick off a monthly blog series that gives you an insider pass to our development process.

Here, we’ll cover the platform puzzles CrowdFlower wrestles with everyday:

  • How do we process 4 million human judgments per day with a relatively small engineering team?
  • Which UX will move crowdsourcing from the hands of early adopters into the hands of every business that requires repetitive, online work?
  • What does talent management mean in an online crowd of millions?
  • Can we become an ecosystem for developers who want to build crowdsourcing apps and tools for profit?
  • Most of all: what’s it like to rebuild a platform that carries enormous load… a sort of pit-crewing of the car while it’s hurtling around the track, or multi-organ transplant.

Our first post next week will dive into one of our recent projects: the total rewrite of our worker interface. It’s common lore that engaging in a large code-rewrite project is risky at best, and a company-killer at worst. We’ll tell you how we made it through with only a few minor scrapes and bruises, and many happier workers.

Questions:

How is a crowd different from the people who work for your enterprise?

If you wanted to capture the institutional knowledge of your staff, would the interface look like a crowd-source UI?

Should capturing institutional knowledge be broken into small tasks?

Important lessons for interfaces may emerge from this series!

May 14, 2013

Cascade: Crowdsourcing Taxonomy Creation

Filed under: Crowd Sourcing,Taxonomy — Patrick Durusau @ 12:52 pm

Cascade: Crowdsourcing Taxonomy Creation by Lydia B. Chilton, Greg Little, Darren Edge, Daniel S. Weld, James A. Landay.

Abstract:

Taxonomies are a useful and ubiquitous way of organizing information. However, creating organizational hierarchies is difficult because the process requires a global understanding of the objects to be categorized. Usually one is created by an individual or a small group of people working together for hours or even days. Unfortunately, this centralized approach does not work well for the large, quickly-changing datasets found on the web. Cascade is an automated workflow that creates a taxonomy from the collective efforts of crowd workers who spend as little as 20 seconds each. We evaluate Cascade and show that on three datasets its quality is 80-90% of that of experts. The cost of Cascade is competitive with expert information architects, despite taking six times more human labor. Fortunately, this labor can be parallelized such that Cascade will run in as fast as five minutes instead of hours or days.

In the introduction the authors say:

Crowdsourcing has become a popular way to solve problems that are too hard for today’s AI techniques, such as translation, linguistic tagging, and visual interpretation. Most successful crowdsourcing systems operate on problems that naturally break into small units of labor, e.g., labeling millions of independent photographs. However, taxonomy creation is much harder to decompose, because it requires a global perspective. Cascade is a unique, iterative workflow that emergently generates this global view from the distributed actions of hundreds of people working on small, local problems.

The authors demonstrate the potential for time and cost savings in the creation of taxonomies but I take the significance of their paper to be something different.

As the paper demonstrates, taxonomy creation does not require a global perspective.

Any one of the individuals who participated contributed localized knowledge that, when combined with other localized knowledge, can be formed into what an observer would call a taxonomy.
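As a toy illustration of that point (my own simplification, not Cascade’s actual workflow): even if workers only ever vote on whether individual items belong to individual candidate categories, a global nesting can be recovered afterwards by checking which categories’ item sets (nearly) contain which others.

```python
from itertools import permutations

# Hypothetical aggregated judgments: for each candidate category, the set of
# items that workers voted into it. No single worker saw the whole picture.
category_items = {
    "animals": {"cat", "dog", "sparrow", "eagle"},
    "birds": {"sparrow", "eagle"},
    "pets": {"cat", "dog"},
}

def nests_under(child, parent, threshold=0.9):
    """child nests under parent if (almost) all of child's items are also parent's."""
    overlap = len(category_items[child] & category_items[parent])
    return overlap / len(category_items[child]) >= threshold

edges = [(c, p) for c, p in permutations(category_items, 2) if nests_under(c, p)]
print(edges)   # e.g. [('birds', 'animals'), ('pets', 'animals')]
```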

A critical point since every user represents/reflects slightly varying experiences and viewpoints, while the most learned expert represents only one.

Does “your” taxonomy reflect your views or some expert’s?

May 9, 2013

Help Map Historical Weather From Ship Logs

Filed under: Climate Data,Crowd Sourcing,History,Topic Maps — Patrick Durusau @ 1:05 pm

Help Map Historical Weather From Ship Logs by Caitlin Dempsey.

From the post:

The Old Weather project is a crowdsourcing data gathering endeavor to understand and map historical weather variability. The data collected will be used to understand past weather patterns and extremes in order to better predict future weather and climate. The project is headed by a team of collaborators from a range of agencies such as NOAA, the Met Office, the National Archives, and the National Maritime Museum.

Information about historical weather, in the form of temperature and pressure measurements, can be gleaned from old ship logbooks. For example, Robert FitzRoy, the Captain of the Beagle, and his crew recorded weather conditions in their logs at every point the ship visited during Charles Darwin’s expedition. The English East India Company from the 1780s to the 1830s made numerous trips between the United Kingdom and China and India, with the ship crews recording weather measurements in their log books. Other expeditions to Antarctica provide rare historical measurements for that region of the world.

By utilizing a crowdsourcing approach, the Old Weather project team aims to use the collective efforts of public participation to gather data and to fact check data recorded from log books. There are 250,000 log books stored in the United Kingdom alone. Clive Wilkinson, a climate historian and research manager for the Recovery of Logbooks and International Marine Data (RECLAIM) Project, a part of NOAA’s Climate Database Modernisation Program, notes there are billions of unrecorded weather observations stored in logbooks around the world that could be captured and used to improve climate prediction models.

In addition to climate data, I suspect that ships’ logs would make interesting records to dovetail, using a topic map, with other records, such as those of the ports along their voyages.

Tracking the identities of passengers and crew, cargoes, social events/conditions along the way.

Standing on their own, logs and other historical materials are of interest, but integrated with other historical records a fuller historical tapestry emerges.

Crowdsourced Astronomy…

Filed under: Astroinformatics,Crowd Sourcing — Patrick Durusau @ 10:52 am

Crowdsourced Astronomy – A Talk By Carolina Ödman-Govender by Bruce Berriman.

From the post:

This is a talk by Carolina Ödman-Govender, given at the re:publica 13 meeting on May 8, 2013. She gives a fine general introduction to the value of crowdsourcing in astronomy, and invites people to get in touch with her if they want to get involved.

Have you considered crowdsourcing for development of a topic map corpus?

April 27, 2013

The Motherlode of Semantics, People

Filed under: Conferences,Crowd Sourcing,Semantic Web,Semantics — Patrick Durusau @ 8:08 am

1st International Workshop on “Crowdsourcing the Semantic Web” (CrowdSem2013)

Submission deadline: July 12, 2013 (23:59 Hawaii time)

From the post:

1st International Workshop on “Crowdsourcing the Semantic Web” in conjunction with the 12th International Semantic Web Conference (ISWC 2013), 21-25 October 2013, in Sydney, Australia. This interactive workshop takes stock of the emergent work and charts the research agenda with interactive sessions to brainstorm ideas and potential applications of collective intelligence to solving AI-hard semantic web problems.

The Global Brain Semantic Web—a Semantic Web interleaving a large number of human and machine computation—has great potential to overcome some of the issues of the current Semantic Web. In particular, semantic technologies have been deployed in the context of a wide range of information management tasks in scenarios that are increasingly significant in both technical (data size, variety and complexity of data sources) and economical terms (industries addressed and their market volume). For many of these tasks, machine-driven algorithmic techniques aiming at full automation do not reach a level of accuracy that many production environments require. Enhancing automatic techniques with human computation capabilities is becoming a viable solution in many cases. We believe that there is huge potential at the intersection of these disciplines – large scale, knowledge-driven, information management and crowdsourcing – to solve technically challenging problems purposefully and in a cost effective manner.

I’m encouraged.

The Semantic Web is going to start asking the entities (people) that originate semantics about semantics.

Going to the motherlode of semantics.

Now to see what they do with the answers.

April 5, 2013

Crowdsourcing Chemistry for the Community…

Filed under: Authoring Topic Maps,Cheminformatics,Crowd Sourcing — Patrick Durusau @ 12:57 pm

Crowdsourcing Chemistry for the Community — 5 Years of Experiences by Antony Williams.

From the description:

ChemSpider is one of the internet’s primary resources for chemists. ChemSpider is a structure-centric platform and hosts over 26 million unique chemical entities sourced from over 400 different data sources and delivers information including commercial availability, associated publications, patents, analytical data, experimental and predicted properties. ChemSpider serves a rather unique role to the community in that any chemist has the ability to deposit, curate and annotate data. In this manner they can contribute their skills, and data, to any chemist using the system. A number of parallel projects have been developed from the initial platform including ChemSpider SyntheticPages, a community generated database of reaction syntheses, and the Learn Chemistry wiki, an educational wiki for secondary school students.

This presentation will provide an overview of the project in terms of our success in engaging scientists to contribute to crowdsourcing chemistry. We will also discuss some of our plans to encourage future participation and engagement in this and related projects.

Perhaps not encouraging in terms of the rate of participation but certainly encouraging in terms of the impact of those who do participate.

I suspect the ratio of contributors to users isn’t that far off from those observed in open source projects.

On the whole, I take this as a plus sign for crowd-sourced curation projects, including topic maps.

I first saw this in a tweet by ChemConnector.

March 18, 2013

Crowdsourced Chemistry… [Documents vs. Data]

Filed under: Cheminformatics,Crowd Sourcing,Curation — Patrick Durusau @ 5:01 am

Crowdsourced Chemistry: Why Online Chemistry Data Needs Your Help by Antony Williams. (video)

From the description:

This is the Ignite talk that I gave at ScienceOnline2010 #sci010 in the Research Triangle Park in North Carolina on January 16th 2010. This was supposed to be a 5 minute talk highlighting the quality of chemistry data on the internet. Ok, it was a little tongue in cheek because it was an after dinner talk and late at night but the data are real, the problem is real and the need for data curation of chemistry data online is real. On ChemSpider we have provided a platform to deposit and curate data. Other videos will show that in the future.

Great demonstration of the need for curation in chemistry.

And of the impact that re-usable information can have on the quality of information.

The errors in chemical descriptions you see in this video could be corrected in:

  • In an article.
  • In a monograph.
  • In a webpage.
  • In an online resource that can be incorporated by reference.

Which one do you think would propagate the corrected information more quickly?

Documents are a great way to convey information to a reader.

They are an incredibly poor way to store/transmit information.

Every reader has to extract the information in a document for themselves.

Not to mention that the data in a document is fixed, unless it incorporates information by reference.

Funny isn’t it? We are still storing data as we did when clay tablets were the medium of choice.

Isn’t it time we separated presentation (documents) from storage/transmission (data)?

March 10, 2013

Tom Sawyer and Crowdsourcing

Filed under: Crowd Sourcing,Marketing — Patrick Durusau @ 3:15 pm

Crowdsource from your Community the Tom Sawyer Way – Community Nuggets Vol.1 (video by Dave Olson)

Crowdsource From Your Community – the Tom Sawyer Way (article by Connor Meakin)

Deeply impressive video/article.

More of the nuts and bolts of the social side of crowd sourcing.

The side that makes it so successful (or not) depending on how well you do the social side.

Makes me wonder how to adapt the lessons of crowd sourcing both for the development of topic maps and for topic maps standardization?

Suggestions/comments?

February 20, 2013

Crowdsourcing Cybersecurity: A Proposal (Part 2)

Filed under: Crowd Sourcing,Cybersecurity,Security — Patrick Durusau @ 9:29 pm

As you may already suspect, my proposal for increasing cybersecurity is transparency.

A transparency borne of crowdsourcing cybersecurity.

What are the consequences of the current cult of secrecy around cybersecurity?

Here’s my short list (feel free to contribute):

  • Governments have no source of reliable information on the security of their contractors, vendors, etc.
  • Corporations have no source of reliable information on the security of their contractors, partners and others.
  • Sysadmins outside the “inner circle” have no notice of the details of hacks, with which to protect their systems.
  • Consumers of software have no source of reliable information on how insecure software may or may not be.

Secrecy puts everyone at greater cybersecurity risk, not less.

Let’s end cybersecurity secrecy and crowdsource cybersecurity.

Here is a sketch of one way to do just that:

  1. Establish or re-use an agency or organization to offer bounties on hacks into systems.
  2. A sliding scale where penetration using published root passwords is worth less than more sophisticated hacks. But even a minimal hack is worth, say, $5,000.
  3. To collect the funds, a hacker must provide full hack details and proof of the hack.
  4. A hacker submitting a “proof of hackability” attack has legal immunity (civil and criminal).
  5. Hack has to be verified using the hack as submitted.
  6. Upon verification of the hack, the hacker is paid the bounty.
  7. One Hundred and Eighty (180) days after the verification of the hack, the name of the hacked organization, the full details of the hack and the hacker’s identity (subject to their permission), are published to a public website.

Finance such a proposal, if run by a government, by fines on government contractors who get hacked.

Defense contractors who aren’t cybersecure should not be defense contractors.

That’s how you stop loss of national security information.

Surprised it hasn’t occurred to anyone inside the beltway.


With greater transparency, hacks, software, origins of software, authors of software, managers of security, all become subject to mapping.

Would you hire your next security consultant from a firm that gets hacked on a regular basis?

Or would you hire a defense contractor that changed its skin to avoid identification as an “easy” hack?

Or retain a programmer who keeps being responsible for security flaws?

Transparency and a topic map could give you better answers to those questions than you have today.

Crowdsourcing Cybersecurity: A Proposal (Part 1)

Filed under: Crowd Sourcing,Cybersecurity,Security — Patrick Durusau @ 9:28 pm

Mandiant’s provocative but hardly conclusive report has created a news wave on cybersecurity.

Hardly conclusive because as Mandiant states:

we have analyzed the group’s intrusions against nearly 150 victims over seven years (page 2)

A little over twenty-one victims a year. And I thought hacking was commonplace. 😉

Allegations of hacking should require a factual basis other than “more buses were going the other way.” (A logical fallacy because you get on the first bus going your way.)

Here we have a tiny subset (if general hacking allegations have any credibility) of all hacking every year.

Who is responsible for the intrusions?

It is easy and commonplace to blame hackers, but there are other responsible parties.

The security industry that continues to protect the identity of the “victims” of hacks and shares hacking information with a group of insiders comes to mind.

That long standing cult of secrecy has not prevented, if you believe the security PR, a virtual crime wave of hacking.

In fact, every non-disclosed hack leaves thousands if not hundreds of thousands of users, institutions, governments and businesses with no opportunity to protect themselves.

And, if you are hiring a contractor, say a defense contractor, isn’t their record with protecting your data from hackers a relevant concern?

If users, institutions, governments and businesses had access to the details of hacking reports, who was hacked, who in the organization was responsible for computer security, how the hack was performed, etc., then we could all better secure our computers.

Or be held accountable for failing to secure our computers. By management, customers and/or governments.

Decades of diverting attention from poor security practices, hiding those who practice poor security, and cultivating a cult of secrecy around computer security, hasn’t diminished hacking.

What part of that lesson is unclear?

Or do you deny the reports by Mandiant and others?

It really is that clear: either Mandiant and others are inventing hacking figures out of whole cloth or the cult of cybersecurity secrecy has failed to stop hacking.

Interested? See Crowdsourcing Cybersecurity: A Proposal (Part 2) for my take on a solution.


Just as a side note, President Obama’s Executive Order — Improving Critical Infrastructure Cybersecurity appeared on February 12, 2013. Compare: Mandiant Releases Report Exposing One of China’s Cyber Espionage Groups released February 19, 2013.

Is Mandiant trying to ride on the President’s coattails as they say?

Or just being opportunistic with the news cycle?

Connected into the beltway security cult?

Hard to say, probably impossible to know. Interesting timing nonetheless.

I wonder who will be on the various panels, experts, contractors under the Cybersecurity executive order?

Don’t you?

February 17, 2013

Models and Algorithms for Crowdsourcing Discovery

Filed under: Crowd Sourcing — Patrick Durusau @ 5:15 pm

Models and Algorithms for Crowdsourcing Discovery by Siamak Faridani. (PDF)

From the abstract:

The internet enables us to collect and store unprecedented amounts of data. We need better models for processing, analyzing, and making conclusions from the data. In this work, crowdsourcing is presented as a viable option for collecting data, extracting patterns and insights from big data. Humans in collaboration, when provided with appropriate tools, can collectively see patterns, extract insights and draw conclusions from data. We study different models and algorithms for crowdsourcing discovery.

In each section in this dissertation a problem is proposed, the importance of it is discussed, and solutions are proposed and evaluated. Crowdsourcing is the unifying theme for the projects that are presented in this dissertation. In the first half of the dissertation we study different aspects of crowdsourcing like pricing, completion times, incentives, and consistency with in-lab and controlled experiments. In the second half of the dissertation we focus on Opinion Space and the algorithms and models that we designed for collecting innovative ideas from participants. This dissertation specifically studies how to use crowdsourcing to discover patterns and innovative ideas.

We start by looking at the CONE Welder project, which uses a robotic camera in a remote location to study the effect of climate change on the migration of birds. In CONE, an amateur birdwatcher can operate a robotic camera at a remote location from within her web browser. She can take photos of different bird species and classify different birds using the user interface in CONE. This allowed us to compare the species presented in the area from 2008 to 2011 with the species presented in the area that are reported by Blacklock in 1984 [Blacklock, 1984]. Citizen scientists found eight avian species previously unknown to have breeding populations within the region. CONE is an example of using crowdsourcing for discovering new migration patterns.

Crowdsourcing has great potential.

Especially if you want to discover the semantics people are using rather than dictating the semantics they ought to be using.

I think the former is more accurate than the latter.

You?

I first saw this at Christophe Lalanne’s A bag of tweets / January 2013.

February 10, 2013

The Power of Semantic Diversity

Filed under: Bioinformatics,Biology,Contest,Crowd Sourcing — Patrick Durusau @ 3:10 pm

Prize-based contests can provide solutions to computational biology problems by Karim R Lakhani, et al. (Nature Biotechnology 31, 108–111 (2013) doi:10.1038/nbt.2495)

From the article:

Advances in biotechnology have fueled the generation of unprecedented quantities of data across the life sciences. However, finding analysts who can address such ‘big data’ problems effectively has become a significant research bottleneck. Historically, prize-based contests have had striking success in attracting unconventional individuals who can overcome difficult challenges. To determine whether this approach could solve a real big-data biologic algorithm problem, we used a complex immunogenomics problem as the basis for a two-week online contest broadcast to participants outside academia and biomedical disciplines. Participants in our contest produced over 600 submissions containing 89 novel computational approaches to the problem. Thirty submissions exceeded the benchmark performance of the US National Institutes of Health’s MegaBLAST. The best achieved both greater accuracy and speed (1,000 times greater). Here we show the potential of using online prize-based contests to access individuals without domain-specific backgrounds to address big-data challenges in the life sciences.

….

Over the last ten years, online prize-based contest platforms have emerged to solve specific scientific and computational problems for the commercial sector. These platforms, with solvers in the range of tens to hundreds of thousands, have achieved considerable success by exposing thousands of problems to larger numbers of heterogeneous problem-solvers and by appealing to a wide range of motivations to exert effort and create innovative solutions [18, 19]. The large number of entrants in prize-based contests increases the probability that an ‘extreme-value’ (or maximally performing) solution can be found through multiple independent trials; this is also known as a parallel-search process [19]. In contrast to traditional approaches, in which experts are predefined and preselected, contest participants self-select to address problems and typically have diverse knowledge, skills and experience that would be virtually impossible to duplicate locally [18]. Thus, the contest sponsor can identify an appropriate solution by allowing many individuals to participate and observing the best performance. This is particularly useful for highly uncertain innovation problems in which prediction of the best solver or approach may be difficult and the best person to solve one problem may be unsuitable for another [19].
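The “extreme-value” argument is easy to see numerically. A toy simulation (solution quality as an arbitrary normal variable, nothing to do with the paper’s data): the expected best-of-N submission keeps improving as N grows, which is what a broadcast contest buys you over a handful of pre-selected experts.

```python
import random
import statistics

random.seed(42)

def best_of(n, trials=2000):
    """Average maximum quality over `trials` contests with n independent solvers."""
    return statistics.mean(
        max(random.gauss(0, 1) for _ in range(n)) for _ in range(trials)
    )

for n in (1, 5, 50, 500):
    print(f"{n:4d} solvers -> expected best quality {best_of(n):.2f}")
```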

An article that merits wider reading than it is likely to get from behind a pay-wall.

A semantically diverse universe of potential solvers is more effective than a semantically monotone group of selected experts.

An indicator of what to expect from the monotone logic of the Semantic Web.

Good for scheduling tennis matches with Tim Berners-Lee.

For more complex tasks, rely on semantically diverse groups of humans.

I first saw this at: Solving Big-Data Bottleneck: Scientists Team With Business Innovators to Tackle Research Hurdles.

January 26, 2013

Human Computation and Crowdsourcing

Filed under: Artificial Intelligence,Crowd Sourcing,Human Computation,Machine Learning — Patrick Durusau @ 1:42 pm

Announcing HCOMP 2013 – Conference on Human Computation and Crowdsourcing by Eric Horvitz.

From the conference website:

Where

Palm Springs, California
Venue information coming soon

When

November 7-9, 2013

Important Dates

All deadlines are 5pm Pacific time unless otherwise noted.

Papers

Submission deadline: May 1, 2013
Author rebuttal period: June 21-28
Notification: July 16, 2013
Camera Ready: September 4, 2013

Workshops & Tutorials

Proposal deadline: May 10, 2013
Notification: July 16, 2013
Camera Ready: September 4, 2013

Posters & Demonstrations

Submission deadline: July 25, 2013
Notification: August 26, 2013
Camera Ready: September 4, 2013

From the post:

Announcing HCOMP 2013, the Conference on Human Computation and Crowdsourcing, Palm Springs, November 7-9, 2013. Paper submission deadline is May 1, 2013. Thanks to the HCOMP community for bringing HCOMP to life as a full conference, following on the successful workshop series.

The First AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2013) will be held November 7-9, 2013 in Palm Springs, California, USA. The conference was created by researchers from diverse fields to serve as a key focal point and scholarly venue for the review and presentation of the highest quality work on principles, studies, and applications of human computation. The conference is aimed at promoting the scientific exchange of advances in human computation and crowdsourcing among researchers, engineers, and practitioners across a spectrum of disciplines. Papers submissions are due May 1, 2013 with author notification on July 16, 2013. Workshop and tutorial proposals are due May 10, 2013. Posters & demonstrations submissions are due July 25, 2013.

I suppose it had to happen.

Instead of asking adding machines for their opinions, someone would decide to ask the creators of adding machines for theirs.

I first saw this at: New AAAI Conference on Human Computation and Crowdsourcing by Shar Steed.

December 13, 2012

Crowdsourcing campaign spending: …

Filed under: Crowd Sourcing,Government Data,Journalism — Patrick Durusau @ 3:43 pm

Crowdsourcing campaign spending: What ProPublica learned from Free the Files by Amanda Zamora.

From the post:

This fall, ProPublica set out to Free the Files, enlisting our readers to help us review political ad files logged with the Federal Communications Commission. Our goal was to take thousands of hard-to-parse documents and make them useful, helping to reveal hidden spending in the election.

Nearly 1,000 people pored over the files, logging detailed ad spending data to create a public database that otherwise wouldn’t exist. We logged as much as $1 billion in political ad buys, and a month after the election, people are still reviewing documents. So what made Free the Files work?

A quick backstory: Free the Files actually began last spring as an effort to enlist volunteers to visit local TV stations and request access to the “public inspection file.” Stations had long been required to keep detailed records of political ad buys, but they were only available on paper and required actually traveling to the station.

In August, the FCC ordered stations in the top 50 markets to begin posting the documents online. Finally, we would be able to access a stream of political ad data based on the files. Right?

Wrong. It turns out the FCC didn’t require stations to submit the data in anything that approaches an organized, standardized format. The result was that stations sent in a jumble of difficult to search PDF files. So we decided if the FCC or stations wouldn’t organize the information, we would.

Enter Free the Files 2.0. Our intention was to build an app to help translate the mishmash of files into structured data about the ad buys, ultimately letting voters sort the files by market, contract amount and candidate or political group (which isn’t possible on the FCC’s web site), and to do it with the help of volunteers.

In the end, Free the Files succeeded in large part because it leveraged data and community tools toward a single goal. We’ve compiled a bit of what we’ve learned about crowdsourcing and a few ideas on how news organizations can adapt a Free the Files model for their own projects.

The team who worked on Free the Files included Amanda Zamora, engagement editor; Justin Elliott, reporter; Scott Klein, news applications editor; Al Shaw, news applications developer, and Jeremy Merrill, also a news applications developer. And thanks to Daniel Victor and Blair Hickman for helping create the building blocks of the Free the Files community.

The entire story is golden but a couple of parts shine brighter for me than the others.

Design consideration:

The success of Free the Files hinged in large part on the design of our app. The easier we made it for people to review and annotate documents, the higher the participation rate, the more data we could make available to everyone. Our maxim was to make the process of reviewing documents like eating a potato chip: “Once you start, you can’t stop.”

Let me re-say that: The easier it is for users to author topic maps, the more topic maps they will author.

Yes?

Semantic Diversity:

But despite all of this, we still can’t get an accurate count of the money spent. The FCC’s data is just too dirty. For example, TV stations can file multiple versions of a single contract with contradictory spending amounts — and multiple ad buys with the same contract number means radically different things to different stations. But the problem goes deeper. Different stations use wildly different contract page designs, structure deals in idiosyncratic ways, and even refer to candidates and groups differently.

All true but knowing the semantics vary ahead of time, station to station, why not map the semantics in the markets ahead of time?

Granted, I second their request to the FCC for standardized data, but having standardized blocks doesn’t mean the information has the same semantics.
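“Mapping the semantics ahead of time” could be as mundane as a per-station mapping table maintained before the documents arrive. A minimal sketch, with call signs and field labels invented purely for illustration:

```python
# Per-station mapping from local field labels to one canonical vocabulary.
FIELD_MAP = {
    "WXIA": {"Agreed Amount": "contract_total", "Advertiser": "sponsor"},
    "WSB":  {"Gross Total":   "contract_total", "Client":     "sponsor"},
}

def normalize(station, record):
    """Rewrite one station's ad-buy record into canonical field names."""
    mapping = FIELD_MAP.get(station, {})
    return {mapping.get(k, k): v for k, v in record.items()}

print(normalize("WSB", {"Gross Total": "$12,400", "Client": "Restore Our Future"}))
# -> {'contract_total': '$12,400', 'sponsor': 'Restore Our Future'}
```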

The OMB can’t keep the same semantics for a handful of terms in one document.

What chance is there with dozens and dozens of players in multiple documents?

November 19, 2012

Georeferencer: Crowdsourced Georeferencing for Map Library Collections

Georeferencer: Crowdsourced Georeferencing for Map Library Collections by Christopher Fleet, Kimberly C. Kowal and Petr Přidal.

Abstract:

Georeferencing of historical maps offers a number of important advantages for libraries: improved retrieval and user interfaces, better understanding of maps, and comparison/overlay with other maps and spatial data. Until recently, georeferencing has involved various relatively time-consuming and costly processes using conventional geographic information system software, and has been infrequently employed by map libraries. The Georeferencer application is a collaborative online project allowing crowdsourced georeferencing of map images. It builds upon a number of related technologies that use existing zoomable images from library web servers. Following a brief review of other approaches and georeferencing software, we describe Georeferencer through its five separate implementations to date: the Moravian Library (Brno), the Nationaal Archief (The Hague), the National Library of Scotland (Edinburgh), the British Library (London), and the Institut Cartografic de Catalunya (Barcelona). The key success factors behind crowdsourcing georeferencing are presented. We then describe future developments and improvements to the Georeferencer technology.
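For readers wondering what georeferencing actually computes, here is a generic sketch (not Georeferencer’s implementation) of fitting an affine pixel-to-longitude/latitude transform from volunteer-supplied ground control points by least squares:

```python
import numpy as np

# Hypothetical ground control points: (pixel_x, pixel_y) -> (lon, lat),
# the kind of correspondences volunteers supply by clicking matching spots.
pixels = np.array([[120, 80], [950, 110], [130, 700], [900, 680]], dtype=float)
lonlat = np.array([[-3.30, 55.98], [-3.05, 55.97], [-3.29, 55.90], [-3.06, 55.91]])

# Affine model: [lon, lat] = [px, py, 1] @ A, solved by least squares.
design = np.hstack([pixels, np.ones((len(pixels), 1))])
A, *_ = np.linalg.lstsq(design, lonlat, rcond=None)

def pixel_to_lonlat(px, py):
    """Map a pixel coordinate on the scanned map to an approximate lon/lat."""
    return np.array([px, py, 1.0]) @ A

print(pixel_to_lonlat(500, 400))
```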

If your institution has a map collection or if you are interested in maps at all, you need to read this article.

There is an introduction video if you prefer: http://www.klokantech.com/georeferencer/.

Either way, you will be deeply impressed by this project.

And wondering: Can the same lessons be applied to crowd source the creation of topic maps?

November 11, 2012

Why I decided to crowdfund my research

Filed under: Crowd Sourcing,Funding,Marketing — Patrick Durusau @ 1:02 pm

Why I decided to crowdfund my research by Ethan O. Perlstein.

From the post:

For the last five years, I ran a lab in Princeton University as an independent researcher through a $1 million grant. That money ran out in September. Now my option is to apply for government grants where I have a slim chance of success. And, if unsuccessful, I have to stop research.

Over 80% of grant applications to funding agencies in the United States fail. The government is planning to make further cuts to the science budget. More disturbing is the fact that now scientists receive their first big grant at the age of 42, nearly a decade after surviving graduate school, postdoctoral fellowships and temporary faculty appointments.

That’s why I decided to experiment with the way experiments are funded. I am trying to crowdfund a basic research project. Kickstarter brought the concept of crowdfunding to my attention years ago. However, it was only in the last year that I learned about the SciFund Challenge, a “by scientists, for scientists” initiative to finance small-scale ($200 – $2,000) projects, mostly in ecology and related fields, but not much in the biomedical sciences.

Ethan researched the models used by other crowdfunded projects and this post includes pointers to that research as well as other lessons he learned along the way, including how to visualize the network of supporters for his campaign and, consequently, how to reach out to new supporters.

Not for the first time, I wonder if crowdfunding would work for the production of subject specific topic maps?

That is to pick some area, a defined data set with a proposed deliverable, and then promote it for funding?

I would shy away from secret government documents unless I ran across a funder who had read the Pentagon Papers from cover to cover. It’s a classic: “something that everybody wants to have read and nobody wants to read.”

My problem, which you may share, is that I know what I like, but I’m not so good at knowing what other people like. As in other people willing to contribute money.

Suggestions as to sources on what “other” people like?

Twitter trends? News programs? Movie/music reviews?

The next big question: How can topic maps increase their enjoyment of X?

I first saw news of Ethan O. Perlstein in a tweet by Duncan Hall.

October 22, 2012

New version of Get-Another-Label available

Filed under: Crowd Sourcing,Mechanical Turk,oDesk,Semantics — Patrick Durusau @ 8:49 am

New version of Get-Another-Label available by Panos Ipeirotis.

From the post:

I am often asked what type of technique I use for evaluating the quality of the workers on Mechanical Turk (or on oDesk, or …). Do I use gold tests? Do I use redundancy?

Well, the answer is that I use both. In fact, I use the code “Get-Another-Label” that I have developed together with my PhD students and a few other developers. The code is publicly available on Github.

We have updated the code recently, to add some useful functionality, such as the ability to pass (for evaluation purposes) the true answers for the different tasks, and get back answers about the quality of the estimates of the different algorithms.
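Get-Another-Label itself implements a Dawid–Skene style estimator; as a much smaller illustration of the two ingredients Panos mentions (gold tests and redundancy), the sketch below scores each worker against gold answers and then weights a redundant vote by those scores. This is my own simplification, not the project’s code:

```python
from collections import defaultdict

# Hypothetical labels: worker -> {task: label}; a few tasks have known gold answers.
labels = {
    "w1": {"t1": "spam", "t2": "ham",  "t3": "spam"},
    "w2": {"t1": "ham",  "t2": "ham",  "t3": "spam"},
    "w3": {"t1": "spam", "t2": "spam", "t3": "ham"},
}
gold = {"t1": "spam", "t2": "ham"}

# 1. Estimate each worker's accuracy from the gold tasks (Laplace-smoothed).
accuracy = {}
for worker, answers in labels.items():
    graded = [(task, lab) for task, lab in answers.items() if task in gold]
    correct = sum(1 for task, lab in graded if lab == gold[task])
    accuracy[worker] = (correct + 1) / (len(graded) + 2)

# 2. Resolve the remaining tasks by an accuracy-weighted vote.
votes = defaultdict(lambda: defaultdict(float))
for worker, answers in labels.items():
    for task, lab in answers.items():
        if task not in gold:
            votes[task][lab] += accuracy[worker]

resolved = {task: max(opts, key=opts.get) for task, opts in votes.items()}
print(accuracy)
print(resolved)   # t3 decided by the higher-accuracy workers
```

Get-Another-Label goes further by estimating per-worker confusion matrices and item labels jointly, but the gold-plus-redundancy intuition is the same.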

Panos continues his series on the use of crowd sourcing.
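
To make the gold-plus-redundancy idea concrete, here is a minimal sketch (my own, not the Get-Another-Label code) that scores each worker against a few gold questions and then uses those scores to weight redundant labels. All worker IDs, tasks, and labels below are made up.

```python
from collections import defaultdict

# Made-up labels: worker -> {task: label}. Tasks g1/g2 are "gold" questions
# with known answers seeded into the job; t1/t2 are the real tasks.
worker_labels = {
    "w1": {"t1": "cat", "t2": "dog", "g1": "cat", "g2": "dog"},
    "w2": {"t1": "cat", "t2": "cat", "g1": "cat", "g2": "cat"},
    "w3": {"t1": "dog", "t2": "dog", "g1": "dog", "g2": "dog"},
}
gold = {"g1": "cat", "g2": "dog"}

# 1. Estimate each worker's accuracy from the gold questions alone.
accuracy = {}
for worker, labels in worker_labels.items():
    graded = [(task, answer) for task, answer in labels.items() if task in gold]
    correct = sum(1 for task, answer in graded if answer == gold[task])
    accuracy[worker] = correct / len(graded) if graded else 0.5  # neutral prior

# 2. Resolve the redundant labels on the real tasks by accuracy-weighted voting.
votes = defaultdict(lambda: defaultdict(float))
for worker, labels in worker_labels.items():
    for task, answer in labels.items():
        if task not in gold:
            votes[task][answer] += accuracy[worker]

consensus = {task: max(options, key=options.get) for task, options in votes.items()}
print(accuracy)   # {'w1': 1.0, 'w2': 0.5, 'w3': 0.5}
print(consensus)  # {'t1': 'cat', 't2': 'dog'}
```

The real code goes well beyond this sketch, of course, but the division of labor is the same: gold tests estimate worker quality, and redundancy plus those estimates resolves everything else.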

Just a thought experiment at the moment but could semantic gaps between populations be “discovered” by use of crowd sourcing?

That is, create tasks that require “understanding” some implicit semantic in the task, and then collect the answers.

There would be no “incorrect” answers, only answers that reflect the differing perceptions of the semantics of the task.

A way to get away from using small groups of college students for such research? (Nothing against small groups of college students, but they best represent small groups of college students. We may need a broader semantic range.)

October 20, 2012

Why oDesk has no spammers

Filed under: Crowd Sourcing,oDesk — Patrick Durusau @ 2:37 pm

Why oDesk has no spammers by Panos Ipeirotis.

From the post:

So, in my last blog post, I described a brief outline on how to use oDesk to execute automatically a set of tasks, in a “Mechanical Turk” style (i.e., no interviews for hiring and completely computer-mediated process for posting a job, hiring, and ending a contract).

A legitimate question appeared in the comments:

“Well, the concept is certainly interesting. But is there a compelling reason to do microtasks on oDesk? Is it because oDesk has a rating system?”

So, here is my answer: If you hire contractors on oDesk you will not run into any spammers, even without any quality control. Why is that? Is there a magic ingredient at oDesk? Short answer: Yes, there is an ingredient: Lack of anonymity!

Well, when you put it that way. 😉

Question: How open are your topic maps?

Question: Would you use lack of anonymity to prevent spam in a publicly curated topic map?

Question: If we want a lack of anonymity to provide transparency and accountability in government, why isn’t that the case with public speech?

October 15, 2012

Using oDesk for microtasks [Data Semantics – A Permanent Wait?]

Filed under: Crowd Sourcing,oDesk — Patrick Durusau @ 11:01 am

Using oDesk for microtasks by Panos Ipeirotis.

From the post:

Quite a few people keep asking me about Mechanical Turk. Truth be told, I have not used MTurk for my own work for quite some time. Instead I use oDesk to get workers for my tasks, and, increasingly, for my microtasks as well.

When I mention that people can use oDesk for micro-tasks, people get often surprised: “oDesk cannot be used through an API, it is designed for human interaction, right?” Oh well, yes and no. Yes, most jobs require some form of interviewing, but there are certainly jobs where you do not need to manually interview a worker before engaging them. In fact, with most crowdsourcing jobs having both the training and the evaluation component built in the working process, the manual interview is often not needed.

For such crowdsourcing-style jobs, you can use the oDesk API to automate the hiring of workers to work on your tasks. You can find the API at http://developers.odesk.com/w/page/12364003/FrontPage (Saying that the API page is, ahem, badly designed, is an understatement. Nevertheless, it is possible to figure out how to use it, relatively quickly, so let’s move on.)
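
I have not tried the oDesk API myself, but the shape of such automation is easy to sketch: authenticate, post a small fixed-price job, and let the built-in training and evaluation steps replace the interview. The sketch below uses Python’s requests library against placeholder endpoints and field names — they are not the documented oDesk API, just the shape of the idea.

```python
import requests

API_BASE = "https://www.odesk.com/api"   # placeholder, not the documented base URL
TOKEN = "your-api-token"                 # placeholder credential

def post_microtask_job(title, description, budget_usd):
    """Post a small fixed-price job that workers can take without an interview.
    The endpoint path and field names below are hypothetical placeholders."""
    payload = {
        "title": title,
        "description": description,
        "budget": budget_usd,
        "job_type": "fixed-price",       # hypothetical field name
    }
    response = requests.post(
        f"{API_BASE}/jobs",              # hypothetical path
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    job = post_microtask_job(
        "Categorize 100 product descriptions",
        "Pick the best category for each description; instructions included.",
        20,
    )
    print(job)
```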

Panos promises future posts with the results of crowd-sourcing experiments with oDesk.

Looking forward to it because waiting for owners of data to disclose semantics looks like a long wait.

Perhaps a permanent wait.

And why not?

If the owners of data “know” the semantics of their data, what advantage do they get from telling you? What is their benefit?

If you guessed “none,” go to the head of the class.

We can either wait for crumbs of semantics to drop off the table or we can set up our own table to produce semantics.

Which one sounds quicker to you?

October 11, 2012

Verification: In God We Trust, All Others Pay Cash

Filed under: Authoring Topic Maps,Crowd Sourcing — Patrick Durusau @ 10:56 am

Crowdsourcing is a valuable technique, at least if accurate information is the result. Incorrect information or noise is still incorrect information or noise, crowdsourced or not.

From PLOS ONE (not Nature or Science) comes news of progress on verification of crowdsourced information: Naroditskiy V, Rahwan I, Cebrian M, Jennings NR (2012) Verification in Referral-Based Crowdsourcing. PLoS ONE 7(10): e45924. doi:10.1371/journal.pone.0045924.

Abstract:

Online social networks offer unprecedented potential for rallying a large number of people to accomplish a given task. Here we focus on information gathering tasks where rare information is sought through “referral-based crowdsourcing”: the information request is propagated recursively through invitations among members of a social network. Whereas previous work analyzed incentives for the referral process in a setting with only correct reports, misreporting is known to be both pervasive in crowdsourcing applications, and difficult/costly to filter out. A motivating example for our work is the DARPA Red Balloon Challenge where the level of misreporting was very high. In order to undertake a formal study of verification, we introduce a model where agents can exert costly effort to perform verification and false reports can be penalized. This is the first model of verification and it provides many directions for future research, which we point out. Our main theoretical result is the compensation scheme that minimizes the cost of retrieving the correct answer. Notably, this optimal compensation scheme coincides with the winning strategy of the Red Balloon Challenge.

UCSD Jacobs School of Engineering, in Making Crowdsourcing More Reliable, reported the following experience with this technique:

The research team has successfully tested this approach in the field. Their group accomplished a seemingly impossible task by relying on crowdsourcing: tracking down “suspects” in a jewel heist on two continents in five different cities, within just 12 hours. The goal was to find five suspects. Researchers found three. That was far better than their nearest competitor, which located just one “suspect” at a much later time.

It was all part of the “Tag Challenge,” an event sponsored by the U.S. Department of State and the U.S. Embassy in Prague that took place March 31. Cebrian’s team promised $500 to those who took winning pictures of the suspects. If these people had been recruited to be part of “CrowdScanner” by someone else, that person would get $100. To help spread the word about the group, people who recruited others received $1 per person for the first 2,000 people to join the group.

This has real potential!
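
For intuition on the compensation scheme, recall the MIT team’s winning Red Balloon strategy: the finder of a balloon reportedly received $2,000, the person who recruited the finder $1,000, that person’s recruiter $500, and so on, halving the reward up the referral chain. A toy sketch of that recursive split (names made up):

```python
def referral_payouts(chain, finder_reward=2000.0):
    """Toy sketch of the MIT Red Balloon scheme: the finder gets the full
    finder reward, and each person up the referral chain gets half of what
    the person they recruited received.

    `chain` lists people from the finder back to the root recruiter,
    e.g. ["finder", "recruiter", "recruiter_of_recruiter"].
    """
    payouts = {}
    reward = finder_reward
    for person in chain:
        payouts[person] = reward
        reward /= 2
    return payouts

print(referral_payouts(["dana", "carol", "bob", "alice"]))
# {'dana': 2000.0, 'carol': 1000.0, 'bob': 500.0, 'alice': 250.0}
```

The Tag Challenge scheme above ($500 for a winning photo, $100 to the recruiter, $1 per early recruit) is a flatter variant of the same idea: reward the find, and reward the chain that made the find possible.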

Could use money, but what of other inducements?

What if department professors agree to substitute participation in a verified crowdsourced bibliography in place of the usual 10% class participation?

Motivation and task structure are both open areas for experimentation and research.

Suggestions on areas for topic maps using this methodology?

Some other resources you may find of interest:

Tag Challenge website

Tag Challenge – Wikipedia (Has links to team pages, etc.)

August 27, 2012

Experts vs. Crowds (How to Distinguish, CIA and Drones)

Filed under: Crowd Sourcing,Intelligence — Patrick Durusau @ 9:22 am

Reporting on the intelligence community’s view of crowd-sourcing, Ken Dilanian writes:

“I don’t believe in the wisdom of crowds,” said Mark Lowenthal, a former senior CIA and State Department analyst (and 1988 “Jeopardy!” champion) who now teaches classified courses about intelligence. “Crowds produce riots. Experts produce wisdom.”

I would modify Lowenthal’s assessment to read:

Crowds produce diverse judgements. Experts produce highly similar judgements.

Or to put it another way: the smaller the group, the less variation in opinion you will find over time, and the further the group’s opinion diverges from reality as experienced by non-group members.

No real surprise that Beltway denizens failed to predict the Arab Spring. None of the concerns that led to the Arab Spring are part of the “experts’” concerns, not just on a conscious level but as a social experience.

The more diverse the opinion/experience pool, the less likely a crowd judgement is to be completely alien to reality as experienced by others.

Which is how I would explain the performance of the crowd thus far in the experiment.

Dilanian’s speculation:

Crowd-sourcing would mean, in theory, polling large groups across the 200,000-person intelligence community, or outside experts with security clearances, to aggregate their views about the strength of the Taliban, say, or the likelihood that Iran is secretly building a nuclear weapon.

reflects a failure to appreciate the nature of crowd-sourced judgements.

First, crowd-sourcing will be more effective if the “intelligence community” is only a small part of the crowd. Choosing only people with security clearances, I suspect, automatically excludes many Taliban sympathizers. You are not going to get good results if the crowd is poorly chosen.

Think of it as trying to re-create the “dance” that bees do to communicate the location of pollen. I would trust the CIA to build a beehive with only drones, and then complain that crowd behavior didn’t work.

Second, crowd-sourcing can do factual questions, like guessing the weight of an animal, but only if everyone has the same information. Otherwise, use crowd-sourcing to gauge the likely impact of policies, changes in policies, etc. Pulse of the “public” as it were.

The “likelihood that Iran is secretly building a nuclear weapon” isn’t a crowd-source question. No amount of aggregation can make up for the lack of information, and there is no information because, yes, Iran is keeping it secret.
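
That distinction is easy to see with a toy simulation: averaging many independent guesses works when each guess is a noisy observation of something real, and it recovers nothing when no one has an observation to be noisy about. All numbers below are made up.

```python
import random

random.seed(0)
true_weight = 540   # kg; a quantity the crowd can actually observe (made up)

# Everyone sees the ox: each guess is the truth plus personal noise,
# and the individual errors largely cancel in the average.
observed = [true_weight + random.gauss(0, 60) for _ in range(1000)]

# The "secret" case: no one can observe the quantity, so guesses scatter
# around prior belief (here 300 kg) and averaging cannot recover the truth.
uninformed = [300 + random.gauss(0, 60) for _ in range(1000)]

print(round(sum(observed) / len(observed)))     # close to 540
print(round(sum(uninformed) / len(uninformed))) # close to 300
```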

Properly used, crowd-sourcing can be a very valuable tool.

The ad agencies call it public opinion polling.

Imagine appropriate polling activities on the ground in the Middle East, asking ordinary people about their hopes, desires, and dreams. If credited over the summarized and sanitized results of experts, it could lead to policies that benefit the people, not to say the governments, of the Middle East. (Another reason some prefer experts: experts support current governments.)

Ken Dilanian’s full report appeared in the Los Angeles Times as: U.S. intelligence tests crowd-sourcing against its experts.
