Archive for the ‘Contest’ Category

British Library Labs – Competition 2013

Sunday, May 5th, 2013

British Library Labs – Competition 2013

Deadline for entry: Wednesday 26 June, 2013 (midnight GMT)

From the webpage:

We want you to propose an innovative and transformative project using the British Library’s digital collections and if your idea is chosen, the Labs team will work with you to make it happen and you could win a prize of up to £3,000.

From the digitisation of thousands of books, newspapers and manuscripts, the curation of UK websites, bird sounds or location data for our maps, over the last two decades we’ve been faithfully amassing a vast and wide-ranging number of digital collections for the nation. What remains elusive, however, is understanding what researchers need in place in order to unlock the potential for new discoveries within these fascinating and diverse sets of digital content.

The Labs competition is designed to attract scholars, explorers, trailblazers and software developers who see the potential for new and innovative research and development opportunities lurking within these immense digital collections. Through soliciting imaginative and transformative projects utilising this content you will be giving us a steer as to the types of new processes, platforms, arrangements, services and tools needed to make it more accessible. We’ll even throw the Library’s resources behind you to make your idea a reality.

Numerous ways to get support for developing your idea before submission.

In terms of PR for your solution (hopefully topic maps based) do note:

Prizes

Winners will get direct curatorial and financial support for completing their project from the Labs team, which may involve an expenses paid residency at the British Library for a mutually agreed period of time (dependent on the winners’ circumstances, the winning ideas, access to resources and budget allowing).

  • Winners will receive £3000 for completing their project
  • Runners-up will receive £1000 for completing their project

The work will take place between Saturday July 6 and Monday 4 November, 2013, with the completed projects being showcased during November 2013 when prizes will be awarded.

What happens to your ideas?

All ideas will be posted on the Labs website after they have been judged. All project ideas submitted for the competition can continue to be worked on and where possible the Labs team will provide support (time and resources permitting). Well developed projects will be showcased together with the competition winners during November 2013.

This is also a good excuse to spend more time at the British Library website. I don’t spend nearly enough time there myself.

KDD Cup 2013 – Author-Paper Identification Challenge

Thursday, April 18th, 2013

KDD Cup 2013 – Author-Paper Identification Challenge

Started: 3:47 am, Thursday 18 April 2013 UTC
Ends: 12:00 am, Wednesday 12 June 2013 UTC (54 total days)

From the post:

The ability to search literature and collect/aggregate metrics around publications is a central tool for modern research. Both academic and industry researchers across hundreds of scientific disciplines, from astronomy to zoology, increasingly rely on search to understand what has been published and by whom.

Microsoft Academic Search is an open platform that provides a variety of metrics and experiences for the research community, in addition to literature search. It covers more than 50 million publications and over 19 million authors across a variety of domains, with updates added each week. One of the main challenges of providing this service is caused by author-name ambiguity. On one hand, there are many authors who publish under several variations of their own name. On the other hand, different authors might share a similar or even the same name.

As a result, the profile of an author with an ambiguous name tends to contain noise, resulting in papers that are incorrectly assigned to him or her. This KDD Cup task challenges participants to determine which papers in an author profile were truly written by a given author.

$7,500 and bragging rights.

Is there going to be a topic map entry this year?
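To make the name-ambiguity problem in the quoted description concrete, here is a minimal sketch (not the contest's method, and not anything Microsoft Academic Search is known to use). It reduces each name to a crude "surname plus first initial" key, assuming given names come first. The function names and the sample names are hypothetical. The same key that merges variants of one author also collides different authors, which is exactly the noise the task describes.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(full_name: str) -> str:
    """Reduce a 'Given Family' name to 'family initial', e.g. 'J. Smith' -> 'smith j'.

    Assumption: given names come first; 'Smith, John' style input is not handled.
    """
    # Strip accents and lowercase so that accented and plain spellings compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
        .lower()
    )
    parts = re.findall(r"[a-z]+", ascii_name)
    if not parts:
        return ""
    return f"{parts[-1]} {parts[0][0]}"


def group_by_key(names):
    """Group raw name strings by their crude key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample names, not taken from the contest data.
    sample = ["John Smith", "J. Smith", "John A. Smith", "Jane Smith", "José García"]
    for key, members in group_by_key(sample).items():
        print(key, "->", members)
```

Note that "Jane Smith" lands in the same group as the John Smith variants. Separating them needs more evidence than the name string (co-authors, venues, affiliations), which is where subject identity gets interesting.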

Increasing Interoperability of Data for Social Good [$100K]

Saturday, March 23rd, 2013

Increasing Interoperability of Data for Social Good

March 4, 2013 through May 7, 2013 11:30 AM PST

Each Winner to Receive $100,000 Grant

Got your attention? Good!

From the notice:

The social sector is full of passion, intuition, deep experience, and unwavering commitment. Increasingly, social change agents from funders to activists, are adding data and information as yet one more tool for decision-making and increasing impact.

But data sets are often isolated, fragmented and hard to use. Many organizations manage data with multiple systems, often due to various requirements from government agencies and private funders. The lack of interoperability between systems leads to wasted time and frustration. Even those who are motivated to use data end up spending more time and effort on gathering, combining, and analyzing data, and less time on applying it to ongoing learning, performance improvement, and smarter decision-making.

It is the combining, linking, and connecting of different “data islands” that turns data into knowledge – knowledge that can ultimately help create positive change in our world. Interoperability is the key to making the whole greater than the sum of its parts. The Bill & Melinda Gates Foundation, in partnership with Liquidnet for Good, is looking for groundbreaking ideas to address this significant, but solvable, problem. See the website for more detail on the challenge and application instructions. Each challenge winner will receive a grant of $100,000.

From the details website:

Through this challenge, we’re looking for game-changing ideas we might never imagine on our own and that could revolutionize the field. In particular, we are looking for ideas that might provide new and innovative ways to address the following:

  • Improving the availability and use of program impact data by bringing together data from multiple organizations operating in the same field and geographical area;
  • Enabling combinations of data through application programming interface (APIs), taxonomy crosswalks, classification systems, middleware, natural language processing, and/or data sharing agreements;
  • Reducing inefficiency for users entering similar information into multiple systems through common web forms, profiles, apps, interfaces, etc.;
  • Creating new value for users trying to pull data from multiple sources;
  • Providing new ways to access and understand more than one data set, for example, through new data visualizations, including mashing up government and other data;
  • Identifying needs and barriers by experimenting with increased interoperability of multiple data sets;
  • Providing ways for people to access information that isn’t normally accessible (for example, using natural language processing to pull and process stories from numerous sources) and combining that information with open data sets.

Successful Proposals Will Include:

  • Identification of specific data sets to be used;
  • Clear, compelling explanation of how the solution increases interoperability;
  • Use case;
  • Description of partnership or collaboration, where applicable;
  • Overview of how solution can be scaled and/or adapted, if it is not already cross-sector in nature;
  • Explanation of why the organization or group submitting the proposal has the capacity to achieve success;
  • A general approach to ongoing sustainability of the effort.

I could not have written a more topic map oriented challenge. You?

They suggest the usual social data sites:

Apache Solr 4 Cookbook (Win a free copy)

Saturday, March 16th, 2013

Apache Solr 4 Cookbook (Win a free copy)

Deadline 28.03.2013.

From the post:

Readers would be pleased to know that we have teamed up with Packt Publishing to organize a Giveaway of the Apache Solr 4 Cookbook. Two lucky winners will win a copy of the book (in eBook format). Keep reading to find out how you can be one of the Lucky Winners.

Let’s start with a little reminder about the book:

  • Learn how to make Apache Solr search faster, more complete, and comprehensively scalable
  • Solve performance, setup, configuration, analysis, and query problems in no time
  • Get to grips with, and master, the new exciting features of Apache Solr 4

Read more about this book and download free Sample Chapter.

How to Enter?

All you need to do is head on over to the book page (Apache Solr 4 Cookbook) and look through the product description of the book and drop a line via the comments below this post to let us know what interests you the most about this book. It’s that simple.

Product Description: http://www.packtpub.com/apache-solr-4-cookbook/book

Deadline

The contest will close on 28.03.2013. Winners will be contacted by email, so be sure to use your real email address when you comment!

Who Will Win?

The winners will be chosen randomly by the Solr.pl team from readers entering the competition who replied with an on-topic comment.

If you want to increase your chances of winning, write a small review of the book using the sample chapter on Amazon.com and also forward the same post to bhavins@packtpub.com.

You would know I see this contest two (2) days after purchasing an electronic copy of this book!

I may enter the contest anyway so I can forward someone the “extra” copy of it.

Netflix Cloud Prize [$10K plus other stuff]

Friday, March 15th, 2013

Netflix Cloud Prize

Duration of Contest: 13th March 2013 to 15th September 2013.

From github:

This contest is for software developers.

Step 0 – You need your own GitHub account

Step 1 – Read the rules in the Wiki

Step 2 – Fork this repo to your own GitHub account

Step 3 – Send us your email address

Step 4 – Modify your copy of the repo as your Submission

Categories/Prizes:

We want you to build something cool using or modifying our open source software. Your submission will be a standalone program or a patch for one of our open source projects. Your submission will be judged in these categories:

  1. Best Example Application Mash-Up

  2. Best New Monkey

  3. Best Contribution to Code Quality

  4. Best New Feature

  5. Best Contribution to Operational Tools, Availability, and Manageability

  6. Best Portability Enhancement

  7. Best Contribution to Performance Improvements

  8. Best Datastore Integration

  9. Best Usability Enhancement

  10. Judges Choice Award

If you win, you’ll get US$10,000 cash, US$5000 AWS credits, a trip to Las Vegas for two, a ticket to Amazon’s user conference, and fame and notoriety (at least within Netflix Engineering).

I can see several of those categories where topic maps would make a nice fit.

You?

Yes, I have an ulterior motive. Having topic maps underlying one or more winners or even runners-up in this contest would promote topic maps and gain needed visibility.

I first saw this at: $10k prizes up for grabs in Netflix cloud contest by Elliot Bentley.

Competition: visualise open government data and win $2,000

Wednesday, February 13th, 2013

Competition: visualise open government data and win $2,000 by Simon Rogers.

Closing date: 23:59 BST on 2 April 2013

What can you do with the thousands of open government datasets? With Google and Open Knowledge Foundation we are launching a competition to find the best dataviz out there. You might even win a prize.

(graphic omitted)

Governments around the world are releasing a tidal wave of open data – on everything from spending through to crime and health. Now you can compare national, regional and city-wide data from hundreds of locations around the world.

But how good is this data? We want to see what you can do with it. What apps and visualisations can you make with this data? We want to see how the data changes the way you see the world.

In conjunction with Google and the Open Knowledge Foundation (who will be helping us judge the results), see if you can win the $2,000 prize.

All we want you to do is to take an open dataset from any government open data website (there’s a list of them at the bottom of this article) and visualise it.

The competition is open to citizens of the UK, US, France, Germany, Spain, Netherlands, Sweden. The winner will take home $2,000 and the result will be published on the Guardian Datastore on our Show and Tell site.

Here are some of the key datasets we’ve found (list below) – and feel free to bring your own data to the party – we only ask that it is freely available and open as in OpenDefinition.org.

You are visualizing data anyway, why not take a chance on free PR and $2,000?

The Power of Semantic Diversity

Sunday, February 10th, 2013

Prize-based contests can provide solutions to computational biology problems by Karim R Lakhani, et al. (Nature Biotechnology 31, 108–111 (2013) doi:10.1038/nbt.2495)

From the article:

Advances in biotechnology have fueled the generation of unprecedented quantities of data across the life sciences. However, finding analysts who can address such ‘big data’ problems effectively has become a significant research bottleneck. Historically, prize-based contests have had striking success in attracting unconventional individuals who can overcome difficult challenges. To determine whether this approach could solve a real big-data biologic algorithm problem, we used a complex immunogenomics problem as the basis for a two-week online contest broadcast to participants outside academia and biomedical disciplines. Participants in our contest produced over 600 submissions containing 89 novel computational approaches to the problem. Thirty submissions exceeded the benchmark performance of the US National Institutes of Health’s MegaBLAST. The best achieved both greater accuracy and speed (1,000 times greater). Here we show the potential of using online prize-based contests to access individuals without domain-specific backgrounds to address big-data challenges in the life sciences.

….

Over the last ten years, online prize-based contest platforms have emerged to solve specific scientific and computational problems for the commercial sector. These platforms, with solvers in the range of tens to hundreds of thousands, have achieved considerable success by exposing thousands of problems to larger numbers of heterogeneous problem-solvers and by appealing to a wide range of motivations to exert effort and create innovative solutions [18, 19]. The large number of entrants in prize-based contests increases the probability that an ‘extreme-value’ (or maximally performing) solution can be found through multiple independent trials; this is also known as a parallel-search process [19]. In contrast to traditional approaches, in which experts are predefined and preselected, contest participants self-select to address problems and typically have diverse knowledge, skills and experience that would be virtually impossible to duplicate locally [18]. Thus, the contest sponsor can identify an appropriate solution by allowing many individuals to participate and observing the best performance. This is particularly useful for highly uncertain innovation problems in which prediction of the best solver or approach may be difficult and the best person to solve one problem may be unsuitable for another [19].
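The parallel-search point in the passage above can be illustrated with a little arithmetic. This is a sketch under a strong simplifying assumption that is mine, not the article's: every entrant independently has the same small probability p of producing an extreme-value solution. Under that assumption, the chance that at least one of n entrants succeeds is 1 - (1 - p)^n.

```python
def p_at_least_one(p_single: float, n_trials: int) -> float:
    """Probability that at least one of n independent trials succeeds.

    Assumption (not from the article): trials are independent and share
    the same success probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_trials


if __name__ == "__main__":
    # Compare a small hand-picked panel with a large open contest,
    # using an illustrative 1% chance per entrant.
    for n in (5, 50, 600):
        print(n, round(p_at_least_one(0.01, n), 4))
```

Even with a low per-entrant chance, the probability climbs quickly as the pool grows, which is the argument for casting a wide net rather than preselecting experts.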

An article that merits wider reading than it is likely to get behind a pay-wall.

A semantically diverse universe of potential solvers is more effective than a semantically monotone group of selected experts.

An indicator of what to expect from the monotone logic of the Semantic Web.

Good for scheduling tennis matches with Tim Berners-Lee.

For more complex tasks, rely on semantically diverse groups of humans.

I first saw this at: Solving Big-Data Bottleneck: Scientists Team With Business Innovators to Tackle Research Hurdles.

Call for KDD Cup Competition Proposals

Sunday, February 10th, 2013

Call for KDD Cup Competition Proposals

From the post:

Please let us know if you are interested in being considered for the 2013 KDD Cup Competition by filling out the form below.

This is the official call for proposals for the KDD Cup 2013 competition. The KDD Cup is the well known data mining competition of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD-2013 conference will be held in Chicago from August 11 – 14, 2013. The competition will last between 6 and 8 weeks and the winners should be notified by end-June. The winners will be announced in the KDD-2013 conference and we are planning to run a workshop as well.

A good competition task is one that is practically useful, scientifically or technically challenging, can be done without extensive application domain knowledge, and can be evaluated objectively. Of particular interest are non-traditional tasks/data that require novel techniques and/or thoughtful feature construction.

Proposals should involve data and a problem whose successful completion will result in a contribution of some lasting value to a field or discipline. You may assume that Kaggle will provide the technical support for running the contest. The data needs to be available no later than mid-March.

If you have initial questions about the suitability of your data/problem feel free to reach out to claudia.perlich [at] gmail.com.

Do you have:

non-traditional tasks/data that require[s] novel techniques and/or thoughtful feature construction?

Is collocation of information on the basis of multi-dimensional subject identity a non-traditional task?

Does extraction of multiple dimensions of a subject identity from users require novel techniques?

If so, what data sets would you suggest using in this challenge?

I first saw this at: 19th ACM SIGKDD Knowledge Discovery and Data Mining Conference.

International Space Apps Challenge

Monday, February 4th, 2013

International Space Apps Challenge

From the webpage:

The International Space Apps Challenge is a two-day technology development event during which citizens from around the world will work together to address current challenges relevant to both space exploration and social need.

NASA believes that mass collaboration is key to creating and discovering state-of-the-art technology. The International Space Apps Challenge aims to engage YOU in developing innovative solutions to our toughest challenges.

Join us on April 20-21, 2013, as we join together cities around the world to be part of pioneering the future. Sign up to be notified when registration opens in early 2013!

The list of challenges will be released around March 15th, spaceappschallenge.org.

I won’t be able to attend in person but would be interested in participating with others should a semantic integration challenge come up.

I first saw this at: NASA launches second International Space Apps Challenge by Alex Howard.

Saturday 23rd February is Open Data Day 2013!

Thursday, January 31st, 2013

Saturday 23rd February is Open Data Day 2013! from AIMS.

From the post:

Open Data Day is a gathering of citizens in cities around the world to write applications, liberate data, create visualizations and publish analyses using open public data to show support for and encourage the adoption of open data policies by the world’s local, regional and national governments. There are Open Data Day events taking place all around the world.

Are you planning to organize or participate in one of these events? Are you going to launch new open data catalogs on the Open Data Day? Share with us your plans and highlight events that might be of interest for the agricultural information management community.

Know more at http://opendataday.org/

As of today: 52 events.

Anyone interested in a virtual event on Open Data Day using open data and topic maps?

App-lifying USGS Earth Science Data

Thursday, January 10th, 2013

App-lifying USGS Earth Science Data

Challenge Dates:

Submissions: January 9, 2013 at 9:00am EST – Ends April 1, 2013 at 11:00pm EDT.

Public Voting: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Judging: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Winners Announced: April 26, 2013 at 5:00pm EDT.

From the webpage:

USGS scientists are looking for your help in addressing some of today’s most perplexing scientific challenges, such as climate change and biodiversity loss. To do so requires a partnership between the best and the brightest in Government and the public to guide research and identify solutions.

The USGS is seeking help via this platform from many of the Nation’s premier application developers and data visualization specialists in developing new visualizations and applications for datasets.

USGS datasets for the contest consist of a range of earth science data types, including:

  • several million biological occurrence records (terrestrial and marine);
  • thousands of metadata records related to research studies, ecosystems, and species;
  • vegetation and land cover data for the United States, including detailed vegetation maps for the National Parks; and
  • authoritative taxonomic nomenclature for plants and animals of North America and the world.

Collectively, these datasets are key to a better understanding of many scientific challenges we face globally. Identifying new, innovative ways to represent, apply, and make these data available is a high priority.

Submissions will be judged on their relevance to today’s scientific challenges, innovative use of the datasets, and overall ease of use of the application. Prizes will be awarded to the best overall app, the best student app, and the people’s choice.

Of particular interest for the topic maps crowd:

Data used – The app must utilize a minimum of 1 DOI USGS Core Science and Analytics (CSAS) data source, though they need not include all data fields available in a particular resource. A list of CSAS databases and resources is available at: http://www.usgs.gov/core_science_systems/csas/activities.html. The use of data from other sources in conjunction with CSAS data is encouraged.

CSAS has a number of very interesting data sources. Classifications, thesauri, data integration, metadata and more.

The contest wins you recognition and bragging rights, not to mention visibility for your approach.

The “Ask Bigger Questions” Contest!

Monday, November 19th, 2012

The “Ask Bigger Questions” Contest! by Ryan Goldman. (Deadline, Feb. 1 2013)

From the post:

Have you helped your company ask bigger questions? Our mission at Cloudera University is to equip Hadoop professionals with the skills to manage, process, analyze, and monetize more data than they ever thought possible.

Over the past three years, we’ve heard many great stories from our training participants about faster cluster deployments, complex data workflows made simple, and superhero troubleshooting moments. And we’ve heard from executives in all types of businesses that staffing Cloudera Certified professionals gives them confidence that their Hadoop teams have the skills to turn data into breakthrough insights.

Now, it’s your turn to tell us your bigger questions story! Cloudera University is seeking tales of Hadoop success originating with training and certification. How has an investment in your education paid dividends for your company, team, customer, or career?

The most compelling stories chosen from all entrants will receive prizes like Amazon gift cards, discounted Cloudera University training, autographed copies of Hadoop books from O’Reilly Media, and Cloudera swag. We may even turn your story into a case study!

Sign up to participate here. Submissions must be received by Friday, Feb. 1, 2013 to qualify for a prize.

A good marketing technique that might bear imitation.

Don’t have to seek out success stories. Incentives for people to bring them to you.

You get good marketing material that is likely to resonate with other users.

Something to think about.

NASA Tournament Lab to Launch Big Data Challenge Series for U.S. Government Agencies

Thursday, October 4th, 2012

Big Data Challenge Series: NASA Tournament Lab to Launch Big Data Challenge Series for U.S. Government Agencies

Contest ends: Nov 12, 2012 05:00 PM EST

From the webpage:

NASA, the National Science Foundation (NSF), and the Department of Energy’s Office of Science, announced Oct. 3, 2012, the launch of the Big Data Challenge – a series of ideation competitions hosted through the NASA Tournament Lab (NTL). The Big Data Challenge series will apply the process of Open Innovation (OI) to the goal of conceptualizing new and novel approaches to utilizing “Big Data” information sets residing in various agency silos while remaining consistent with individual United States agencies’ missions related to the field of health, energy and earth sciences.

Competitors will be tasked with imagining analytical techniques and software tools that utilize Big Data from discrete government information domains and then describing how they may be shared as universal, cross-agency solutions that transcend the limitations of individual silos. The competition will be run by the NASA Tournament Lab (NTL), a collaboration between Harvard University and TopCoder, a competitive community of digital creators.

“The ability to create new applications and algorithms using diverse data sets is a key element of the NTL,” said Jason Crusan, Director of Advanced Exploration Systems at NASA’s Human Exploration and Operations Mission Directorate. “NASA is excited to see the results that open innovation can provide to these big data applications.”

You have to go to: studio.topcoder.com and have a topcoder account (but you have that already).

More than beer money and in time for the holiday season. Something to think about.

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

Sunday, August 12th, 2012

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

From the contest page:

Patent documents contain important research that is valuable to the industry, business, law, and policy-making communities. Take the patent documents from the United States Patent and Trademark Office (USPTO) as examples. The structured data include: filing date, application date, assignees, UPC (US Patent Classification) codes, IPC codes, and others, while the unstructured segments include: title, abstract, claims, and description of the invention. The description of the invention can be further segmented into field of the invention, background, summary, and detailed description.

Given a set of “Source” patents or documents, we can use text mining to identify patents that are “similar” and “relevant” for the purpose of discovery of drug variants. These relevant patents could further be clustered and visualized appropriately to reveal implicit, previously unknown, and potentially useful patterns.

The eventual goal is to obtain a focused and relevant subset of patents, relationships and patterns to accelerate discovery of variations or evolutions of the drugs represented by the “source” patents.

Timeline:

  • July 19, 2012 – Start of the Contest Part 1
  • August 23, 2012 – Deadline for Submission of Ontology deliverables
  • August 24 to August 29, 2012 – Crowdsourced And Expert Evaluation for Part 1. NO SUBMISSIONS ACCEPTED for contest during this week.
  • Milestone 1: August 30, 2012 – Winner for Part 1 contest announced and Ontology release to the community for Contest Part 2
  • Aug. 31 to Sept. 21, 2012 – Contest Part 2 Begins – Data Exploration / Text Mining of Patent Data
  • Milestone 2: Sept. 21, 2012 – Deadline for Submission Contest Part 2. FULL CONTEST CLOSING.
  • Sept. 22 to Oct. 5, 2012 – Crowdsourced and Expert Evaluation for contest Part 2
  • Milestone 3: Oct. 5, 2012 – Conditional Winners Announcement 

Possibly fertile ground for demonstrating the value of topic maps.

Particularly if you think of topic maps as curating search strategies and results.

Think about that for a moment: curating search strategies and results.

We have all asked reference librarians or other power searchers for assistance and watched while they discovered resources we didn’t imagine existed.

What if for medical expert searchers, we curate the “search request” along with the “search strategy” and the “result” of that search?

Such that we can match future search requests up with likely search strategies?

What we are capturing is the expert’s understanding and recognition of subjects not apparent to the average user, captured in such a way that we can make use of it again in the future.
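As a rough sketch of what curating search requests, strategies and results might look like as data, consider the following. Everything here is hypothetical: the CuratedSearch record, the field names, and the use of simple word overlap (Jaccard similarity) to match a new request against past ones. A real system would want proper subject identity rather than token matching.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedSearch:
    """One curated episode: what was asked, how the expert searched, what came back."""
    request: str
    strategy: str
    result: tuple  # identifiers of the resources the expert found


def _tokens(text: str) -> set:
    """Lowercased alphanumeric words of a request."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Overlap of two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_strategies(new_request: str, curated, top_n: int = 3):
    """Return past curated searches whose requests best overlap the new one."""
    query = _tokens(new_request)
    scored = [(jaccard(query, _tokens(c.request)), c) for c in curated]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical curated records, for illustration only.
    library = [
        CuratedSearch(
            request="statin variants for cholesterol",
            strategy="search claims for compound synonyms, then filter by classification code",
            result=("patent-A", "patent-B"),
        ),
        CuratedSearch(
            request="antibiotic resistance patents",
            strategy="start from assignee names, then expand through citations",
            result=("patent-C",),
        ),
    ]
    for match in suggest_strategies("cholesterol drug variants", library):
        print(match.request, "->", match.strategy)
```

The point of the record is that the strategy travels with the request and the result, so the expert's judgment is reusable rather than lost when the search session ends.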

If you aren’t interested in medical research, how about: Accelerating Discovery of Trolls by Text Mining of Patents? ;-)

I first saw this at KDNuggets.


Update: 13 August 2012

Tweet by Lars Marius Garshol points to: Patent troll Intellectual Ventures is more like a HYDRA.

Even a low-end estimate – the patents actually recorded in the USPTO as being assigned to one of those shells – identifies around 10,000 patents held by the firm.

At the upper end of the researchers’ estimates, Intellectual Ventures would rank as the fifth-largest patent holder in the United States and among the top fifteen patent holders worldwide.

As sad as that sounds, remember this is one (1) troll. There are others.

MyMoneyAppUp by U.S. Department of the Treasury – $25,000

Wednesday, June 27th, 2012

MyMoneyAppUp by U.S. Department of the Treasury – $25,000

Submission period: June 27 – August 12, 2012.

Prizes:

1st: $10,000

2nd: $5,000 (2)

3rd: $2,500 (2)

From the webpage:

The MyMoneyAppUp Challenge, launched by the U.S. Treasury Department in partnership with the D2D Fund and Center for Financial Services Innovation, is a contest intended to motivate American entrepreneurs, software developers, the public, and students to propose the best ideas and designs for next-generation mobile tools to help Americans control and shape their financial futures. The Challenge calls for mobile app ideas (IdeaBank) and designs (App Design), with cash prizes awarded to the best submissions. Competitors are encouraged to propose mobile apps that incorporate data to empower consumers, as part of Treasury’s initiative to promote Smart Disclosure. MyMoneyAppUp competitors who want to take their winning ideas to the next step and develop prototypes may enter the FinCapDev Competition, a complementary competition sponsored exclusively by D2D and CFSI at the conclusion of the MyMoneyAppUp Challenge. Support for prizes and the administration of the Challenge by CFSI and D2D for the MyMoneyAppUp Challenge comes from the Ford Foundation, Omidyar Network, and the Citi Foundation.

Sounds like a place where topic maps could play a role.

From something as simple as integrating balances from specified accounts or drafts on those accounts, to provide users with projected balances. Could even include projected credit card balances with interest rates.
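A minimal sketch of that arithmetic follows, with assumptions of my own: balances and pending drafts are already pulled from the accounts as decimal amounts, and the credit card projection compounds monthly at one twelfth of the annual rate with no payments or new charges. The function names are hypothetical and the hard part (getting the heterogeneous account data merged in the first place) is not shown.

```python
from decimal import Decimal


def projected_balance(balances, pending_drafts) -> Decimal:
    """Sum of current balances minus drafts that have not cleared yet."""
    return sum(balances, Decimal("0")) - sum(pending_drafts, Decimal("0"))


def projected_card_balance(balance: Decimal, annual_rate: Decimal, months: int) -> Decimal:
    """Card balance after `months` of monthly compounding.

    Assumption: no payments and no new charges during the period.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    monthly_rate = annual_rate / Decimal("12")
    grown = balance * (Decimal("1") + monthly_rate) ** months
    return grown.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Illustrative numbers only.
    checking_and_savings = [Decimal("100.00"), Decimal("50.25")]
    drafts = [Decimal("20.00")]
    print(projected_balance(checking_and_savings, drafts))
    print(projected_card_balance(Decimal("1000"), Decimal("0.12"), 2))
```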

Need a kill switch for the credit card one, at least while you are buying me a book present online. No particular holiday required. ;-)

It’s not a lot of money but a good opportunity to build street cred for topic maps.

Heterogeneous data structures are the rule in the finance community.

PS: When some friend of yours says, “Oh, but we can use X to map between heterogeneous data structures,” your response should be: “Sure, and when you move up in management, how do we know why that mapping exists? Or how to add to it?”

Fixed mappings are useful, but also repetitively expensive.

Data Prospecting

Thursday, June 7th, 2012

Derrick Harris writes in: Kaggle is now crowdsourcing big data creativity about a new product from Kaggle, Kaggle Prospect:

The Kaggle Prospect homepage says:

Kaggle Prospect is an open data exploration and problem identification platform that lets organizations with large datasets solicit proposals from the best minds in our 40,000 strong community of predictive modeling and machine learning experts. The experts will peer-review each other’s ideas and we’ll present you with the short list of what problems your data could answer.

If you are sitting on a gold mine of data, but aren’t sure where to start digging, Kaggle Prospect is the place to start.

Kaggle Prospect has a great deal of promise, assuming enough users can pry data out of data silos for submission. ;-)

If you are not familiar with Kaggle contests, see: Kaggle.

PS: I like the Kaggle headline:

We’re making data science a sport.™

Google BigQuery and the Github Data Challenge

Wednesday, May 2nd, 2012

Google BigQuery and the Github Data Challenge

Deadline May 21, 2012

From the post:

Github has made data on its code repositories, developer updates, forks etc. from the public GitHub timeline available for analysis, and is offering prizes for the most interesting visualization of the data. Sounds like a great challenge for R programmers! The R language is currently the 26th most popular on GitHub (up from #29 in December), and it would be interesting to visualize the usage of R compared to other languages, for example. The deadline for submissions to the contest is May 21.

Interestingly, GitHub has made this data available on the Google BigQuery service, which is available to the public today. BigQuery was free to use while it was in beta test, but Google is now charging for storage of the data: $0.12 per gigabyte per month, up to $240/month (the service is limited to 2TB of storage – although there is a Premier offering that supports larger data sizes … at a price to be negotiated). While members of the public can run SQL-like queries on the GitHub data for free, Google is charging subscribers to the service 3.5 cents per GB processed in the query: this is measured by the source data accessed (although columns of data not referenced aren't counted); the size of the result set doesn't matter.

Watch your costs but thoughts on how you would visualize the data?
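To make "watch your costs" concrete, here is a rough sketch of the arithmetic using only the 2012 prices quoted above. The function names and the example workload are hypothetical, and 2TB is taken as 2,000 GB because that is what makes the quoted $240/month cap work out.

```python
STORAGE_USD_PER_GB_MONTH = 0.12  # quoted storage price
QUERY_USD_PER_GB = 0.035         # quoted price: 3.5 cents per GB processed
STORAGE_LIMIT_GB = 2000          # quoted 2TB limit, read as 2,000 GB (assumption)


def monthly_storage_cost(stored_gb: float) -> float:
    """Monthly storage charge for `stored_gb` gigabytes."""
    if stored_gb > STORAGE_LIMIT_GB:
        raise ValueError("above the quoted 2TB limit; Premier pricing is negotiated")
    return round(stored_gb * STORAGE_USD_PER_GB_MONTH, 2)


def query_cost(gb_scanned_per_query: float, queries: int) -> float:
    """Charge for `queries` runs that each scan `gb_scanned_per_query` GB.

    Only the columns a query references count toward the scanned size;
    the size of the result set does not matter.
    """
    return round(gb_scanned_per_query * queries * QUERY_USD_PER_GB, 2)


print(monthly_storage_cost(2000))  # storage at the quoted limit
print(query_cost(10, 100))         # hypothetical: 100 queries scanning 10 GB each
```

Since unreferenced columns are not counted, selecting only the columns you need is the obvious way to keep the second number down.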

NYC BigApps

Wednesday, April 25th, 2012

NYC BigApps

From the webpage:

New York City is challenging software developers to create apps that use city data to make NYC better.

There are three completed contests (one just ended) that resulted in very useful applications.

NYC BigApps 3.0 resulted in:

NYC Facets: Best Overall Application – Grand Prize – Explores and visualizes more than 1 million facts about New York City.

Work+: Best Overall Application – Second Prize – Working from home not working for you? Discover new places to get things done.

Funday Genie: Investor’s Choice Application – The Funday Genie is an application for planning a free day. Our unique scheduling and best route algorithm creates a smart personalized day-itinerary of things to do, including events, attractions, restaurants, shopping, and more, based on the user’s preferences. Everyday can be a Funday.

among others.

Quick question: How would you interchange information between any two of these apps? Or if you like, any other two apps in this or prior contests?

Second question: How would you integrate additional information into any of these apps, prepared for use by another application?

Topic maps can:

  • collate information for display.
  • power re-usable and extensible mappings of data into other formats.
  • augment data for applications that lack merging semantics.
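As a rough illustration of the points above, the sketch below merges records from two apps whose data lacks shared field names. It is written in the spirit of topic map merging and is not an implementation of any topic maps standard. The record layouts, field names and identifiers are invented for the example; nothing here reflects the real data behind NYC Facets or Work+.

```python
# Hypothetical exports from two apps that describe some of the same places.
facets_records = [{"place_id": "nyc:bryant-park", "facts": 12}]
workplus_records = [
    {"venue": "nyc:bryant-park", "wifi": True},
    {"venue": "nyc:ace-hotel", "wifi": True},
]

# The mapping itself is data: which field in each source identifies the subject.
# Extending the merge to a third app means adding one entry here.
IDENTITY_KEYS = {"facets": "place_id", "workplus": "venue"}


def merge(sources):
    """Collate records from every source that identify the same subject."""
    merged = {}
    for source_name, records in sources.items():
        key = IDENTITY_KEYS[source_name]
        for record in records:
            subject = merged.setdefault(
                record[key], {"id": record[key], "sources": []}
            )
            subject["sources"].append(source_name)
            for field_name, value in record.items():
                if field_name != key:
                    # Prefix with the source so provenance is not lost.
                    subject[f"{source_name}.{field_name}"] = value
    return merged


merged = merge({"facets": facets_records, "workplus": workplus_records})
for subject in merged.values():
    print(subject)
```

Keeping the identity mapping as inspectable data, rather than burying it in conversion code, is what makes it re-usable and lets someone later see why two records were treated as the same subject.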

Where is your data today and where would you like for it to be tomorrow?

Third Challenge on Large Scale Hierarchical Text Classification

Monday, April 16th, 2012

ECML/PKDD 2012 Discovery Challenge: Third Challenge on Large Scale Hierarchical Text Classification

Important dates:

- March 30, start of the challenge
- April 20, opening of the evaluation
- June 29, closing of evaluation
- July 20, paper submission deadline
- August 3, paper notifications

From the website:

This year’s discovery challenge hosts the third edition of the successful PASCAL challenges on large scale hierarchical text classification. The challenge comprises three tracks and it is based on two large datasets created from the ODP web directory (DMOZ) and Wikipedia. The datasets are multi-class, multi-label and hierarchical. The number of categories ranges between 13,000 and 325,000 roughly and the number of documents between 380,000 and 2,400,000.

The tracks of the challenge are organized as follows:

1. Standard large-scale hierarchical classification
a) On collection of medium size from Wikipedia
b) On a large collection from Wikipedia

2. Multi-task learning, based on both DMOZ and Wikipedia category systems

3. Refinement-learning
a) Semi-Supervised approach
b) Unsupervised approach

In order to register for the challenge and gain access to the datasets you must have an account at the challenge Web site.

More fun than repeating someone’s vocabulary. Yes?

CloudSpokes Coding Challenge Winners – Build a DynamoDB Demo

Saturday, April 14th, 2012

CloudSpokes Coding Challenge Winners – Build a DynamoDB Demo

From the post:

Last November CloudSpokes was invited to participate in the DynamoDB private beta. We spent some time kicking the tires, participating in the forums and developing use cases for their Internet-scale NoSQL database service. We were really excited about the possibilities of DynamoDB and decided to crowdsource some challenge ideas from our 38,000 strong developer community. Needless to say, the release generated quite a bit of buzz.

When Amazon released DynamoDB in January, we launched our CloudSpokes challenge Build an #Awesome Demo with Amazon DynamoDB along with a blog post and a sample “Kiva Loan Browser Demo” application to get people started. The challenge requirements were wide open and all about creating the coolest application using Amazon DynamoDB. We wanted to see what the crowd could come up with.

The feedback we received from numerous developers was extremely positive. The API was very straightforward and easy to work with. The SDKs and docs, as usual, were top-notch. Developers were able to get up to speed fast as DynamoDB’s simple storage and query methods were easy to grasp. These methods allowed developers to store and access data items with a flexible number of attributes using the simple “Put” or “Get” verbs that they are familiar with. No surprise here, but we had a number of comments regarding the speed of both read and write operations.

When our challenge ended a week later we were pleasantly surprised with the applications and chose to highlight the following top five:

I don’t think topic maps has 38,000 developers but challenges do seem to pull people out of the woodwork.

Any thoughts on what would make interesting/attractive challenges? Other than five-figure prizes? ;-)

Plastic Surgeon Holds Video Contest, Offers Free Nose Job to Winner

Wednesday, March 14th, 2012

Plastic Surgeon Holds Video Contest, Offers Free Nose Job to Winner by Tim Nudd.

From the post:

Plastic surgeons aren’t known for their innovating marketing. But then, Michael Salzhauer isn’t your ordinary plastic surgeon. He’s “Dr. Schnoz,” the self-described “Nose King of Miami,” and he’s got an unorthodox offer for would-be patients—a free nose job to the winner of a just-announced video contest.

Can’t give away a nose job but what about a topic map?

What sort of contest should we have?

What would you do for a topic map?

Flavorwocky

Saturday, February 25th, 2012

Flavorwocky

Another Neo4j challenge contender!

Lists foods that go well together.

I tried “rice,” but “red beans” did not come up. :-(

I will have to add that tomorrow.

FrostyMug – Beer Rating/Recommendation Service

Saturday, February 25th, 2012

Similarity-based Recommendation Engines by Josh Adell.

From the post:

I am currently participating in the Neo4j-Heroku Challenge. My entry is a — as yet, unfinished — beer rating and recommendation service called FrostyMug. All the major functionality is complete, except for the actual recommendations, which I am currently working on. I wanted to share some of my thoughts and methods for building the recommendation engine.

I hear “similarity” as a measure of subject identity: beers recommended to X; movies enjoyed by Y users, even though those are group subjects.

Or perhaps better, as a possible means of subject identity. A person could list all the movies they have enjoyed, and that list could be the same as a recommendation list. Same subject, just a different method of identification. (Unless the means of subject identification has an impact on the subject you think is being identified.)
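To make "similarity" concrete, here is a small sketch that uses Jaccard similarity over sets of liked items. This is an assumption chosen for illustration, not necessarily the measure FrostyMug uses, and the users and beers are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Shared items divided by all items; 1.0 means the two lists are identical."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical users and the beers each one rated highly.
likes = {
    "x": {"stout", "porter", "ipa"},
    "y": {"stout", "porter", "lager"},
    "z": {"pilsner"},
}


def recommend(user: str, likes: dict) -> list:
    """Recommend what the most similar other user likes that `user` has not listed."""
    scored = [
        (jaccard(likes[user], items), name)
        for name, items in likes.items()
        if name != user
    ]
    _, nearest = max(scored)
    return sorted(likes[nearest] - likes[user])


print(jaccard(likes["x"], likes["y"]))
print(recommend("x", likes))
```

A score of 1.0 is the case described above: a list someone wrote down and a list produced by recommendation that pick out the same subject by different means.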

3rd Globals Challenge

Wednesday, January 25th, 2012

3rd Globals Challenge

Contest starts: 10 Feb 12 18:00 EST
Contest ends: 17 Feb 12 18:00 EST

Topic mappers take note:

All applications must be built using Globals. However, you are also allowed to use additional technologies to supplement Globals (emphasis added, additional technologies, unlike some linked data competitions)

The email I got reports:

  • A cash prize of USD $3,500 for the winning entry
  • A press release announcing the winning participant and solution
  • A chance to win a free registration for the InterSystems Global Summit

You might want to drop by Globals to grab a copy of the software and read up on the documentation.

You can also see the prior challenges. These are non-trivial events but that also means you will learn a lot in the process.

Neo4j Challenge – Seed the Cloud

Wednesday, January 18th, 2012

Neo4j Challenge

Important Dates: January 18 – February 13, 2012

From the challenge webpage:

Challenge: Seed the Cloud

Join Neo4j on Heroku, then help others get started by creating a Heroku-ready template or demo application using Neo4j.

The best project templates will win recognition and prizes. Use any language, any framework, with Neo4j!

  1. Create a Project using the Neo4j Add-on
  2. Share the Project as a Template on Gensen
  3. Win a place in the clouds (and cool prizes)

Neo4j has thrown down their gage. Will you be the one that picks it up?

Digging into Data Challenge

Thursday, January 5th, 2012

Digging into Data Challenge

From the homepage:

What is the “challenge” we speak of? The idea behind the Digging into Data Challenge is to address how “big data” changes the research landscape for the humanities and social sciences. Now that we have massive databases of materials used by scholars in the humanities and social sciences — ranging from digitized books, newspapers, and music to transactional data like web searches, sensor data or cell phone records — what new, computationally-based research methods might we apply? As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these everyday materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st century scholarship.

Winners for Round 2, some 14 projects out of 67, were announced on 3 January 2012.

I am interested to hear your comments on the projects, as I am sure the project teams would be as well.

Topical Classification of Biomedical Research Papers

Monday, January 2nd, 2012

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers

From the webpage:

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers, is a special event of Joint Rough Sets Symposium (JRS 2012, http://sist.swjtu.edu.cn/JRS2012/) that will take place in Chengdu, China, August 17-20, 2012. The task is related to the problem of predicting topical classification of scientific publications in a field of biomedicine. Money prizes worth 1,500 USD will be awarded to the most successful teams. The contest is funded by the organizers of the JRS 2012 conference, Southwest Jiaotong University, with support from University of Warsaw, SYNAT project and TunedIT.

Introduction: Development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. Rapidly increasing size of scientific article meta-data and text repositories, such as MEDLINE [1] or PubMed Central (PMC) [2], emphasizes the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search through biomedical documents for information regarding diagnostics, drugs dosage and effect or possible complications resulting from specific treatments. In the queries, they use highly sophisticated terminology, that can be properly interpreted only with a use of a domain ontology, such as Medical Subject Headings (MeSH) [3]. In order to facilitate the searching process, documents in a database should be indexed with concepts from the ontology. Additionally, the search results could be grouped into clusters of documents, that correspond to meaningful topics matching different information needs. Such clusters should not necessarily be disjoint since one document may contain information related to several topics. In this data mining competition, we would like to raise both of the above mentioned problems, i.e. we are interested in identification of efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology, that were automatically assigned by our tagging algorithm. In our opinion, this challenge may be appealing to all members of the Rough Set Community, as well as other data mining practitioners, due to its strong relations to well-founded subjects, such as generalized decision rules induction [4], feature extraction [5], soft and rough computing [6], semantic text mining [7], and scalable classification methods [8]. 
In order to ensure scientific value of this challenge, each of participating teams will be required to prepare a short report describing their approach. Those reports can be used for further validation of the results. Apart from prizes for top three teams, authors of selected solutions will be invited to prepare a paper for presentation at JRS 2012 special session devoted to the competition. Chosen papers will be published in the conference proceedings.

Data sets became available today.

This is one of those “praxis” opportunities for topic maps.

USEWOD 2012 Data Challenge

Wednesday, December 7th, 2011

USEWOD 2012 Data Challenge

From the website:

The USEWOD 2012 Data Challenge invites research and applications built on the basis of USEWOD 2012 Dataset.

Accepted submissions will be presented at USEWOD2012, where a winner will be chosen. Examples of analyses and research that could be done with the dataset are the following (but not limited to those):

  • correlations between linked data requests and real-world events
  • types of structured queries
  • linked data access vs. conventional access
  • analysis of user agents visiting the sites
  • geographical analysis of requests
  • detection and visualisation of trends
  • correlations between site traffic and available datasets
  • etc. – let your imagination run wild!

USEWOD 2012 Dataset

The USEWOD dataset consists of server logs from two major web servers publishing datasets on the Web of linked data. In particular, the dataset contains logs from:

  • DBPedia: slices of log data spanning several months from the linked data twin of Wikipedia, one of the focal points of the Web of data. The logs were kindly made available to us for the challenge by OpenLink Software! Further details about this part of the dataset to follow.
  • SWDF: Semantic Web Dog Food is a constantly growing dataset of publications, people and organisations in the Web and Semantic Web area, covering several of the major conferences and workshops, including WWW, ISWC and ESWC. The logs contain two years of requests to the server from about 12/2008 until 12/2010.
  • Linked Open Geo Data: A dataset about geographical data.
  • Bio2RDF: Linked Data for life sciences.

Data sets are still under construction. Organizers advise that data sets should be available next week.

Your results should be reported as short papers and are due by 15 February 2012.

2nd Globals Challenge

Friday, December 2nd, 2011

2nd Globals Challenge

Just a few hours left until the start of the 2nd Globals Challenge so I am sending this on its way.

Details being released at 18:00 EST on 2 December 2011!

Prizes to be awarded!

Maybe by this time next year we could organize something like this for topic maps. That would be way cool!