Archive for the ‘Challenges’ Category

Re-imagining Legislative Data – Semantic Integration Alert

Tuesday, June 27th, 2017

Innovate, Integrate, and Legislate: Announcing an App Challenge by John Pull.

From the post:

This morning, on Tuesday, June 27, 2017, Library of Congress Chief Information Officer Bernard A. Barton, Jr., is scheduled to make an in-person announcement to the attendees of the 2017 Legislative Data & Transparency Conference in the CVC. Mr. Barton will deliver a short announcement about the Library’s intention to launch a legislative data App Challenge later this year. This pre-launch announcement will encourage enthusiasts and professionals to bring their app-building skills to an endeavor that seeks to create enhanced access and interpretation of legislative data.

The themes of the challenge are INNOVATE, INTEGRATE, and LEGISLATE. Mr. Barton’s remarks are below:

Here in America, innovation is woven into our DNA. A week from today our nation celebrates its 241st birthday, and those years have been filled with great minds who surveyed the current state of affairs, analyzed the resources available to them, and created devices, systems, and ways of thinking that created a better future worldwide.

The pantheon includes Benjamin Franklin, George Washington Carver, Alexander Graham Bell, Bill Gates, and Steve Jobs. It includes first-generation Americans like Nikola Tesla and Albert Einstein, for whom the nation was an incubator of innovation. And it includes brilliant women such as Grace Hopper, who led the team that invented the first computer language compiler, and Shirley Jackson, whose groundbreaking research with subatomic particles enabled the inventions of solar cells, fiber-optics, and the technology that brings us something we use every day: call waiting and caller ID.

For individuals such as these, the drive to innovate takes shape through understanding the available resources, surveying the landscape for what’s currently possible, and taking it to the next level. It’s the 21st Century, and society benefits every day from new technology, new generations of users, and new interpretations of the data surrounding us. Social media and mobile technology have rewired the flow of information, and some may say it has even rewired the way our minds work. So then, what might it look like to rewire the way we interpret legislative data?

It can be said that the legislative process – at a high level – is linear. What would it look like if these sets of legislative data were pushed beyond a linear model and into dimensions that are as-yet-unexplored? What new understandings wait to be uncovered by such interpretations? These understandings could have the power to evolve our democracy.

That’s a pretty grand statement, but it’s not without basis. The sets of data involved in this challenge are core to a legislative process that is centuries old. It’s the source code of American government. An informed citizenry is better able to participate in our democracy, and this is a very real opportunity to contribute to a better understanding of the work being done in Washington. It may even provide insights for the people doing the work around the clock, both on the Hill, and in state and district offices. Your innovation and integration may ultimately benefit the way our elected officials legislate for our future.

Improve the future, and be a part of history.

The 2017 Legislative Data App Challenge will launch later this summer. Over the next several weeks, information will be made available at, and individuals are invited to connect via

I mention this as a placeholder only because Pull’s post is general enough to mean several things, their opposites, or something entirely different.

The gist of the post is that later this summer (2017), a challenge involving an “app” will be announced. The “app” will access/deliver/integrate legislative data. Beyond that, no further information is available at this time.

Watch for future posts as more information becomes available.

Microsoft Quantum Challenge [Deadline April 29, 2016.]

Tuesday, February 2nd, 2016

Microsoft Quantum Challenge

From the webpage:

Join students from around the world to investigate and solve problems facing the quantum universe using Microsoft’s simulator, LIQUi|>.

Win big prizes, or the opportunity to interview for internships at Microsoft Research.

Objectives of the Quantum Challenge

The Quantum Architectures and Computing Group (QuArC) is seeking exceptional students!

We want to find students who are eager to expand their knowledge of quantum computing and who can translate thoughts into programs, thereby expanding the use of Microsoft’s quantum simulator, LIQUi|>.

How to enter

First of all, REGISTER for the Challenge so that you can receive updates about the contest.

In the challenge you will use the LIQUi|> simulator to solve a novel problem and then report on your findings. So, think of a project. Then, download the simulator from GitHub and work with it to solve your problem. Finally, write a report about your findings and submit it. Your report submission will enter you into the Challenge.

In the report, present a description of the project including goals, methods, challenges, and any result obtained using LIQUi|>. You do not need to submit circuits and the software you develop, however, sample input and output for LIQUi|> must be submitted to show you used the simulator in the project. Your entry must consist of six pages or less, in PDF format.
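This is not LIQUi|> itself (which you drive from F#), but to show the kind of computation such a simulator performs, here is a minimal hand-rolled sketch in Python that builds a Bell state from the standard Hadamard and CNOT gates:

```python
import math

# State vector for two qubits, basis order |00>, |01>, |10>, |11>.
state = [1.0, 0.0, 0.0, 0.0]  # start in |00>

def apply_hadamard_q0(s):
    """Hadamard on the first qubit: mixes the |0x> and |1x> amplitudes."""
    h = 1 / math.sqrt(2)
    return [h * (s[0] + s[2]), h * (s[1] + s[3]),
            h * (s[0] - s[2]), h * (s[1] - s[3])]

def apply_cnot(s):
    """CNOT with qubit 0 as control: swaps the |10> and |11> amplitudes."""
    return [s[0], s[1], s[3], s[2]]

bell = apply_cnot(apply_hadamard_q0(state))
print(bell)  # ~[0.707, 0, 0, 0.707]: the entangled Bell state
```

LIQUi|> does this with sparse and stabilizer representations so it can go far beyond the handful of qubits a dense state vector allows.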

The Challenge is open to students at colleges and universities world-wide (with a few restrictions) and aged 18+. NO PURCHASE NECESSARY. For full details, see the Official Rules.

The prizes

The Quantum Challenge is your chance to win a big prize!

  • First Prize:  $5,000
  • Second Prizes:   Four at $2,500
  • Honorary Mention: Certificates will be presented to runner-up entries

Extra – visits or internship interviews

As a result of the challenge, some entrants could be invited to visit the QuArC team at Microsoft Research in Redmond, or have an opportunity to interview for internships at Microsoft Research. Internships are highly prestigious and involve working with the QuArC team for 12 weeks on cutting edge research.

If you are young enough to enter, just a word of warning about the “big prize.” $5,000 today isn’t a “big prize.” Maybe a nice weekend if you keep it low-key, but only just.

Interaction with the QuArC team, either by winning or in online discussions is the real prize.

Besides, who needs $5,000 if you can break quantum encrypted bank transfer orders? 😉

Yelp Dataset Challenge

Saturday, February 21st, 2015

Yelp Dataset Challenge

From the webpage:

Yelp Dataset Challenge is doubling up: Now 10 cities across 4 countries! Two years, four highly competitive rounds, over $35,000 in cash prizes awarded and several hundred peer-reviewed papers later: the Yelp Dataset Challenge is doubling up. We are proud to announce our latest dataset that includes information about local businesses, reviews and users in 10 cities across 4 countries. The Yelp Challenge dataset is much larger and richer than the Academic Dataset. This treasure trove of local business data is waiting to be mined and we can’t wait to see you push the frontiers of data science research with our data.

The Challenge Dataset:

  • 1.6M reviews and 500K tips by 366K users for 61K businesses
  • 481K business attributes, e.g., hours, parking availability, ambience.
  • Social network of 366K users for a total of 2.9M social edges.
  • Aggregated check-ins over time for each of the 61K businesses

The deadline for the fifth round of the Yelp Dataset Challenge is June 30, 2015. Submit your project to Yelp by visiting You can submit a research paper, video presentation, slide deck, website, blog, or any other medium that conveys your use of the Yelp Dataset Challenge data.
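If you do grab the dataset, it ships as JSON-lines files (one JSON object per line). A quick sketch of the kind of aggregation you would start with, using a few inline records shaped like the review file (field names per the dataset documentation):

```python
import json
from collections import Counter

# Three inline sample records standing in for review.json.
sample = """\
{"business_id": "b1", "stars": 5, "text": "Great tacos."}
{"business_id": "b1", "stars": 2, "text": "Slow service."}
{"business_id": "b2", "stars": 4, "text": "Solid coffee."}"""

reviews = [json.loads(line) for line in sample.splitlines()]

# Review count and mean star rating per business.
counts = Counter(r["business_id"] for r in reviews)
means = {b: sum(r["stars"] for r in reviews if r["business_id"] == b) / n
         for b, n in counts.items()}
print(counts["b1"], means["b1"])  # 2 reviews averaging 3.5 stars
```

The same one-pass pattern scales to the full 1.6M reviews without ever holding more than a line in memory.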

    Pitched at students but it is an interesting dataset.

    I first saw this in a tweet by Marin Dimitrov.

    Bringing biodiversity information to life

    Thursday, December 11th, 2014

    Bringing biodiversity information to life

    From the post:

    The inaugural GBIF Ebbe Nielsen Challenge aims to inspire scientists, informaticians, data modelers, cartographers and other experts to create innovative applications of open-access biodiversity data.


    For the past 12 years, GBIF has awarded the Ebbe Nielsen Prize to recognize outstanding contributions to biodiversity informatics while honouring the legacy of Ebbe Nielsen, one of the principal founders of GBIF, who tragically died just before it came into being.

    The Science Committee, working with the Secretariat, has revamped the award for 2015 as the GBIF Ebbe Nielsen Challenge. This open incentive competition seeks to encourage innovative uses of the more than half a billion species occurrence records mobilized through GBIF’s international network. These creative applications of GBIF-mediated data may come in a wide variety of forms and formats—new analytical research, richer policy-relevant visualizations, web and mobile applications, improvements to processes around data digitization, quality and access, or something else entirely. Judges will evaluate submissions on their innovation, functionality and applicability.

    As a simple point of departure, participants may wish to review the visual analyses of trends in mobilizing species occurrence data at global and national scales recently unveiled on Challenge submissions may build on such creations and propose uses or extensions that make GBIF-mediated data even more useful to researchers, policymakers, educators, students and citizens alike.
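If half a billion occurrence records sounds daunting, the mechanics of a “mobilization trend” analysis are simple. A sketch over a few invented records (real GBIF records carry many more fields: taxonKey, coordinates, datasetKey, and so on):

```python
from collections import Counter

# Made-up occurrence records standing in for GBIF-mediated data.
occurrences = [
    {"species": "Puma concolor", "country": "US", "year": 2012},
    {"species": "Puma concolor", "country": "MX", "year": 2013},
    {"species": "Ara macao", "country": "MX", "year": 2013},
    {"species": "Ara macao", "country": "BR", "year": 2014},
    {"species": "Ara macao", "country": "BR", "year": 2014},
]

# National-scale mobilization trend: records per (country, year).
per_country_year = Counter((o["country"], o["year"]) for o in occurrences)
print(per_country_year[("BR", 2014)])  # 2
```

Swap the inline list for paged calls to GBIF's public occurrence API and the same counter gives you the national trends the visual analyses display.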

    A jury composed of experts from the biodiversity informatics community will judge the Round One entries collected through this ChallengePost website on their innovation, functionality and applicability, before selecting three to six finalists to compete for a €20,000 First Prize later in 2015.

    You can’t argue with the judging criteria:


    • Innovation: How novel is the submission? A significant portion of the submission should be developed for the challenge. A submission based largely (or entirely) on work published or developed prior to the challenge start date will not be eligible for submission.

    • Functionality: Does the submission work and show or do something useful?

    • Applicability: Can the GBIF and biodiversity informatics communities use and/or build on the submission?

    Deadline: Tuesday, 3 March 2015 at 5pm CET.

    An obvious opportunity to introduce the biodiversity community to topic maps!

    Oh, there is a €20,000 first prize and €5,000 second prize. Just something to pique your interest. 😉

    Data Science Challenge 3

    Friday, October 24th, 2014

    Data Science Challenge 3

    From the post:

    Challenge Period

    The Fall 2014 Data Science Challenge runs October 11, 2014 through January 21, 2015.

    Challenge Prerequisite

    You must pass Data Science Essentials (DS-200) prior to registering for the Challenge.

    Challenge Description

    The Fall 2014 Data Science Challenge incorporates three independent problems derived from real-world scenarios and data sets. Each problem has its own data, can be solved independently, and should take you no longer than eight hours to complete. The Fall 2014 Challenge includes problems dealing with online travel services, digital advertising, and social networks.

    Problem 1: SmartFly
    You have been contacted by a new online travel service called SmartFly. SmartFly provides its customers with timely travel information and notifications about flights, hotels, destination weather, and airport traffic, with the goal of making your travel experience smoother. SmartFly’s product team has come up with the idea of using the flight data that it has been collecting to predict whether customers’ flights will be delayed in order to respond proactively. The team has now contacted you to help test out the viability of the idea. You will be given SmartFly’s data set from January 1 to September 30, 2014 and be asked to return a list of upcoming flights sorted from the most likely to the least likely to be delayed.
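The SmartFly problem is, at heart, “rank flights by estimated delay probability.” A deliberately naive sketch on invented data, using smoothed historical delay rates per (carrier, departure-hour) bucket:

```python
from collections import defaultdict

# Toy stand-in for SmartFly's history: (carrier, departure_hour, delayed?).
history = [
    ("AA", 8, True), ("AA", 8, True), ("AA", 8, False),
    ("AA", 17, True), ("UA", 8, False), ("UA", 8, False),
    ("UA", 17, True), ("UA", 17, False),
]

tally = defaultdict(lambda: [0, 0])       # bucket -> [delays, flights]
for carrier, hour, delayed in history:
    tally[(carrier, hour)][0] += int(delayed)
    tally[(carrier, hour)][1] += 1

def delay_score(carrier, hour, prior=0.5, weight=1.0):
    """Laplace-smoothed delay rate, so unseen buckets fall back to the prior."""
    d, n = tally.get((carrier, hour), (0, 0))
    return (d + prior * weight) / (n + weight)

upcoming = [("F100", "AA", 8), ("F101", "UA", 8), ("F102", "UA", 17)]
ranked = sorted(upcoming, key=lambda f: delay_score(f[1], f[2]), reverse=True)
print([f[0] for f in ranked])  # most to least likely to be delayed
```

A serious entry would replace the per-bucket rate with a proper classifier over many features, but the output contract — a sorted list — stays exactly this.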

    Problem 2: Almost Famous
    Congratulations! You have just published your first book on data science, advanced analytics, and predictive modeling. You’ve also decided to use your skills as a data scientist to build and optimize a website that promotes your book, and you have started several ad campaigns on a popular search engine in order to drive traffic to your site. Using your skills in data munging and statistical analysis, you will be asked to evaluate the performance of a series of campaigns directed towards site visitors using the log data in Hadoop as your source of truth.

    Problem 3: WINKLR
    WINKLR is a curiously popular social network for fans of the 1970s sitcom Happy Days. Users can post photos, write messages, and, most importantly, follow each other’s posts. This helps members keep up with new content from their favorite users. To help its users discover new people to follow on the site, WINKLR is building a new machine learning system called The Fonz to predict who a given user might like to follow. Phase One of The Fonz project is underway. The engineers can export the entire user graph as tuples. You have joined the Fonz project to implement Phase Two, which improves on this result. Given the user graph and the list of frequent-click tuples, you are being asked to select a 70,000 tuple subset in “user1,user2” format, where you believe user1 is most likely to want to follow user2. These will result in emails to the users, inviting them to follow the recommended user.
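For WINKLR, a classic baseline for “who should user1 follow?” is common-neighbour counting on the follow graph. A sketch on a toy graph (a serious entry would fold in the frequent-click tuples as well):

```python
from collections import defaultdict

# Toy follow graph as (follower, followee) tuples, like the exported user graph.
edges = [("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"), ("d", "e"), ("b", "e")]

follows = defaultdict(set)
for u, v in edges:
    follows[u].add(v)
followees = {v for _, v in edges}

def score(u, v):
    """How many accounts u follows that themselves follow v."""
    return sum(1 for w in follows[u] if v in follows[w])

# Score every non-edge (u, v) pair and rank the candidates.
candidates = [(u, v) for u in sorted(follows) for v in followees
              if v != u and v not in follows[u]]
ranked = sorted(candidates, key=lambda p: score(*p), reverse=True)
print(ranked[0])  # the strongest follow recommendation
```

Take the top 70,000 pairs from `ranked` and you have a submission in the required “user1,user2” format.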

    Prize for success: CCP: Data Scientist status

    Great way to start 2015!

    I first saw this in a tweet by Sarah.

    Virtual Workshop and Challenge (NASA)

    Tuesday, June 24th, 2014

    Open NASA Earth Exchange (NEX) Virtual Workshop and Challenge 2014

    From the webpage:

    President Obama has announced a series of executive actions to reduce carbon pollution and promote sound science to understand and manage climate impacts for the U.S.

    Following the President’s call for developing tools for climate resilience, OpenNEX is hosting a workshop that will feature:

    1. Climate science through lectures by experts
    2. Computational tools through virtual labs, and
    3. A challenge inviting participants to compete for prizes by designing and implementing solutions for climate resilience.

    Whether you win any of the $60K in prize money or not, this looks like a great way to learn about climate data, approaches to processing climate data and the Amazon cloud all at one time!

    Processing in the virtual labs is on the OpenNEX (Open NASA Earth Exchange) nickel. You can experience cloud computing without fear of the bill for computing services. Gain valuable cloud experience and possibly make a contribution to climate science.


    the HiggsML challenge

    Saturday, May 24th, 2014

    the HiggsML challenge

    The challenge runs from May 12th to September 2014.

    From the challenge:

    In a nutshell, we provide a data set containing a mixture of simulated signal and background events, built from simulated events provided by the ATLAS collaboration at CERN. Competitors can use or develop any algorithm they want, and the one who achieves the best signal/background separation wins! Besides classical prizes for the winners, a special “HEP meets ML” prize will also be awarded with an invitation to CERN; we are also seeking to organise a NIPS workshop.

    For this HEP challenge we deliberately picked one of the most recent and hottest playgrounds: the Higgs decaying into a pair of tau leptons. The first ATLAS results were made public in December 2013 in a CERN seminar, ATLAS sees Higgs boson decay to fermions. The simulated events that participants will have in their hands are the same that physicists used. Participants will be working in realistic conditions although we have simplified the original problem quite a bit so that it became tractable without any background in physics.
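For scoring, the challenge used the approximate median significance (AMS) of the selected events; per the challenge documentation, AMS = sqrt(2((s + b + b_r) ln(1 + s/(b + b_r)) − s)) with regularization term b_r = 10, where s and b are the weighted signal and background counts in your selection. In Python:

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate median significance of a selection with s (weighted)
    signal and b background events; b_reg is the regularization term."""
    return math.sqrt(2 * ((s + b + b_reg) * math.log(1 + s / (b + b_reg)) - s))

# Selecting no signal scores zero; more signal at fixed background scores higher.
print(ams(0, 1000), round(ams(100, 1000), 3))
```

The metric rewards a clean selection, so where you put the cut on your classifier’s output matters as much as the classifier itself.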

    HEP physicists, even ATLAS physicists, who have experience with multivariate analysis, neural nets, boosted decision trees and the like are warmly encouraged to compete with machine learning experts.

    The Laboratoire de l’Accélérateur Linéaire (LAL) is a French lab located in the vicinity of Paris. It is overseen by both the CNRS (IN2P3) and University Paris-Sud. It counts 330 employees (125 researchers and 205 engineers and technicians) and brings internationally recognized contributions to experimental Particle Physics, Accelerator Physics, Astroparticle Physics, and Cosmology.

    Contact : for any question of general interest about the challenge, please consult and use the forum provided on the Kaggle web site. For private comments, we are also reachable at

    Now there is a machine learning challenge for the summer!

    Not to mention more science being done on the basis of public data sets.

    Be sure to forward this to both your local computer science and physics department.

    NASA’s Asteroid Grand Challenge Series

    Tuesday, March 11th, 2014

    NASA’s Asteroid Grand Challenge Series

    From the webpage:

    Welcome to the Asteroid Grand Challenge Series sponsored by the NASA Tournament Lab! The Asteroid Grand Challenge Series will comprise a series of topcoder challenges to get more people from around the planet involved in finding all asteroid threats to human populations and figuring out what to do about them. In an increasingly connected world, NASA recognizes the value of the public as a partner in addressing some of the country’s most pressing challenges. Click here to learn more and participate in our debut challenge, Asteroid Data Hunter – launching 03/17/14!

    From the details page:

    The Asteroid Data Hunter challenge tasks competitors to develop significantly improved algorithms to identify asteroids in images from ground-based telescopes. The winning solution must increase the detection sensitivity, minimize the number of false positives, ignore imperfections in the data, and run effectively on all computers.
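The core trick behind many detection pipelines is frame differencing: subtract two registered exposures of the same sky field and flag pixels that changed by more than the noise; a moving object shows up in the difference while stars cancel. A cartoon version on two tiny invented frames:

```python
# Two registered exposures of the same field, as pixel-count grids.
frame1 = [
    [10, 11,  9, 10],
    [10, 80, 10, 11],   # bright source (a star) at (1, 1) in both frames
    [ 9, 10, 10, 10],
]
frame2 = [
    [10, 10, 10, 10],
    [11, 81, 10, 10],
    [ 9, 10, 55, 10],   # new bright pixel at (2, 2): a candidate mover
]

THRESHOLD = 20  # counts above expected noise; would be tuned per instrument

candidates = [(r, c)
              for r, row in enumerate(frame2)
              for c, v in enumerate(row)
              if abs(v - frame1[r][c]) > THRESHOLD]
print(candidates)  # the static star cancels out; only the mover remains
```

The hard part the challenge is really after — registration errors, varying seeing, and cosmic-ray hits — is exactly what makes the false-positive requirement difficult.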

    This is radically cool!

    Lots of data, difficult problem, high stakes (ELE (extinction level event) prevention).

    Data Science Challenge

    Tuesday, March 11th, 2014

    Data Science Challenge

    Some details from the registration page:

    Prerequisite: Data Science Essentials (DS-200)
    Schedule: Twice per year
    Duration: Three months from launch date
    Next Challenge Date: March 31, 2014
    Language: English
    Price: USD $600

    From the webpage:

    Cloudera will release a Data Science Challenge twice each year. Each semi-annual project is based on a real-world data science problem involving a large data set and is open to candidates for three months to complete. During the open period, candidates may work on their project individually and at their own pace.

    Current Data Science Challenge

    The new Data Science Challenge: Detecting Anomalies in Medicare Claims will be available starting March 31, 2014, and will cost USD $600.

    In the U.S., Medicare reimburses private providers for medical procedures performed for covered individuals. As such, it needs to verify that the type of procedures performed and the cost of those procedures are consistent and reasonable. Finally, it needs to detect possible errors or fraud in claims for reimbursement from providers. You have been hired to analyze a large amount of data from Medicare and try to detect abnormal data — providers, areas, or patients with unusual procedures and/or claims.
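The “detect abnormal providers” task invites robust statistics, since the very outliers you want to find inflate a naive mean/stdev rule until nothing trips it. A sketch using the median/MAD modified z-score (Iglewicz & Hoaglin, cutoff 3.5) on invented claims:

```python
import statistics

# Toy claim amounts for one procedure code; real Challenge data is far larger.
claims = [
    ("provider1", 100), ("provider2", 110), ("provider3", 95),
    ("provider4", 105), ("provider5", 900),   # wildly out of line
]

amounts = [amt for _, amt in claims]
med = statistics.median(amounts)
mad = statistics.median([abs(a - med) for a in amounts])

# Modified z-score: robust to the outlier itself, unlike mean/stdev,
# which the $900 claim would drag upward.
flagged = [(prov, amt) for prov, amt in claims
           if 0.6745 * abs(amt - med) / mad > 3.5]
print(flagged)  # [('provider5', 900)]
```

Grouping claims by procedure code (and region, and provider specialty) before applying the rule is where the real modeling work lives.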

    Register for the challenge.

    Build a Winning Model

    CCP candidates compete against each other and against a benchmark set by a committee including some of the world’s elite data scientists. Participants who surpass evaluation benchmarks receive the CCP: Data Scientist credential.

    Lead the Field

    Those with the highest scores from each Challenge will have an opportunity to share their solutions and promote their work on and via press and social media outlets. All candidates retain the full rights to their own work and may leverage their models outside of the Challenge as they choose.

    Useful way to develop some street cred in data science.

    Fostering Innovation?

    Thursday, March 6th, 2014

    How Academia and Publishing are Destroying Scientific Innovation: A Conversation with Sydney Brenner by Elizabeth Dzeng.

    From the post:

    I recently had the privilege of speaking with Professor Sydney Brenner, a professor of Genetic medicine at the University of Cambridge and Nobel Laureate in Physiology or Medicine in 2002. My original intention was to ask him about Professor Frederick Sanger, the two-time Nobel Prize winner famous for his discovery of the structure of proteins and his development of DNA sequencing methods, who passed away in November. I wanted to do the classic tribute by exploring his scientific contributions and getting a first-hand account of what it was like to work with him at Cambridge’s Medical Research Council’s (MRC) Laboratory of Molecular Biology (LMB) and at King’s College where they were both fellows. What transpired instead was a fascinating account of the LMB’s quest to unlock the genetic code and a critical commentary on why our current scientific research environment makes this kind of breakthrough unlikely today.

    If you or any funders you know are interested in fostering innovation, that is actually enabling innovation to happen, this is a must read interview for you.

    If you or any funders you know are interested in boasting about “fostering innovation,” creating “new breakthroughs” while funding the usual suspects, etc., just pass this one by.

    One can only hope that observations of proven innovators like Sydney Brenner will carry more weight than political ideologies in the research funding process.

    I first saw this in a tweet by Ivan Herman.

    Data Science – Chicago

    Monday, March 3rd, 2014

    OK, I shortened the headline.

    The full headline reads: Accenture and MIT Alliance in Business Analytics launches data science challenge in collaboration with Chicago: New annual contest for MIT students to recognize best data analytics and visualization ideas.: The Accenture and MIT Alliance in Business Analytics

    Don’t try that without coffee in the morning.

    From the post:

    The Accenture and MIT Alliance in Business Analytics have launched an annual data science challenge for 2014 that is being conducted in collaboration with the city of Chicago.

    The challenge invites MIT students to analyze Chicago’s publicly available data sets and develop data visualizations that will provide the city with insights that can help it better serve residents, visitors, and businesses. Through data visualization, or visual renderings of data sets, people with no background in data analysis can more easily understand insights from complex data sets.

    The headline is longer than the first paragraph of the story.

    I didn’t see an explanation for why the challenge is limited to:

    The challenge is now open and ends April 30. Registration is free and open to active MIT students 18 and over (19 in Alabama and Nebraska). Register and see the full rules here:

    Find a sponsor and setup an annual data mining challenge for your school or organization.

    Although I would suggest you take a pass on Bogotá, Mexico City, Rio de Janeiro, Moscow, Washington, D.C. and similar places where truthful auditing could be hazardous to your health.

    Or as one of my favorite Dilbert cartoons had the pointy-haired boss observing:

    When you find a big pot of crazy it’s best not to stir it.

    Knight News Challenge

    Sunday, March 2nd, 2014

    Knight News Challenge

    Phases of the Challenge:

    Submissions (February 27 – March 18)
    Feedback (March 18 – April 18)
    Refinement (April 18 – 28)
    Evaluation (Begins April 28)

    From the webpage:

    How can we strengthen the Internet for free expression and innovation?

    This is an open call for ideas. We want to discover projects that make the Internet better. We believe that access to information is key to vibrant and successful communities, and we want the Internet to remain an open, equitable platform for free expression, commerce and learning. We want an Internet that fuels innovation through the creation and sharing of ideas.

    We don’t have specific projects that we’re hoping to see in response to our question. Instead, we want this challenge to attract a range of approaches. In addition to technologies, we’re open to ideas focused on journalism, policy, research, education– any innovative project that results in a stronger Internet.

    So we want to know what you think– what captures your imagination when you think about the Internet as a place for free expression and innovation? In June we will award $2.75 million, including $250,000 from the Ford Foundation, to support the most compelling ideas.

    Breaking the stranglehold of PageRank is on top of my short list. There is a great deal to be said for the “wisdom” of crowds, but one of those things is that it doesn’t respond well to the passage of time. Old material keeps racking up credibility long past its “ignore by date.”

    More granular date sorting would be a strong second on my list.
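One way to loosen that grip: discount a static authority score by document age. A sketch with an exponential half-life decay (the scores, ages, and half-life are all invented):

```python
import math

# (title, link_score, age_in_days): link_score stands in for a
# PageRank-style authority value.
pages = [
    ("classic-tutorial", 0.9, 3650),
    ("fresh-writeup",    0.4,   30),
    ("recent-survey",    0.6,  365),
]

HALF_LIFE_DAYS = 730  # authority halves every two years; a tunable choice

def decayed(score, age_days):
    """Exponentially discount a static score by the document's age."""
    return score * math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)

ranked = sorted(pages, key=lambda p: decayed(p[1], p[2]), reverse=True)
print([p[0] for p in ranked])  # the decade-old page no longer wins
```

The half-life is the whole argument: set it per query type (news vs. reference) and “ignore by date” stops being a complaint and becomes a parameter.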

    What’s on your short list?

    Yelp Dataset Challenge

    Monday, November 25th, 2013

    Yelp Dataset Challenge

    Deadline: Monday, February 10, 2014.

    From the webpage:

    Yelp is proud to introduce a deep dataset for research-minded academics from our wealth of data. If you’ve used our Academic Dataset and want something richer to train your models on and use in publications, this is it. Tired of using the same standard datasets? Want some real-world relevance in your research project? This data is for you!

    Yelp is bringing you a generous sample of our data from the greater Phoenix, AZ metropolitan area including:

    • 11,537 businesses
    • 8,282 checkin sets
    • 43,873 users
    • 229,907 reviews


    If you are a student and come up with an appealing project, you’ll have the opportunity to win one of ten Yelp Dataset Challenge awards for $5,000. Yes, that’s $5,000 for showing us how you use our data in insightful, unique, and compelling ways.

    Additionally, if you publish a research paper about your winning research in a peer-reviewed academic journal, then you’ll be awarded an additional $1,000 as recognition of your publication. If you are published, Yelp will also contribute up to $500 to travel expenses to present your research using our data at an academic or industry conference.

    If you are a student, see the Yelp webpage for more details. If you are not a student, pass this along to someone who is.

    Yes, this is dataset mentioned in How-to: Index and Search Data with Hue’s Search App.

    BRDI Announces Data and Information Challenge

    Thursday, October 10th, 2013

    BRDI Announces Data and Information Challenge by Stephanie Hagstrom.

    From the post:

    The National Academy of Sciences Board on Research Data and Information (BRDI; announces an open challenge to increase awareness of current issues and opportunities in research data and information. These issues include, but are not limited to, accessibility, integration, discoverability, reuse, sustainability, perceived versus real value and reproducibility.

    A Letter of Intent is requested by December 1, 2013 and the deadline for final entries is May 15, 2014.

    Awardees will be invited to present their projects at the National Academy of Sciences in Washington DC as part of a symposium of the regularly scheduled Board on Research Data and Information meeting in the latter half of 2014.

    More information is available at Please contact Cheryl Levey ( with any questions.

    This looks quite interesting.

    The main site reports:

    The National Academy of Sciences Board on Research Data and Information (BRDI; is holding an open challenge to increase awareness of current issues and opportunities in research data and information. These issues include, but are not limited to, accessibility, integration, discoverability, reuse, sustainability, perceived versus real value and reproducibility. Opportunities include, but are not limited to, analyzing such data and information in new ways to achieve significant societal benefit.

    Entrants are expected to describe one or more of the following:

    • Novel ideas
    • Tools
    • Processes
    • Models
    • Outcomes

    using research data and information. There is no restriction on the type of data or information, or the type of innovation that can be described. All data and tools that form the basis of a contestant’s entry must be made freely and openly available. The challenge is held in memory of Lee Dirks, a pioneer in scholarly communication.

    Anticipated outcomes of the challenge include the potential for original and innovative solutions to societal problems using existing research data and information, national recognition for the successful contestants and possibly their institutions.

    Looks ideal for a topic map-based proposal.

    Suggestions on data sets?

    Legislative XML Data Mapping [$10K]

    Friday, September 13th, 2013

    Legislative XML Data Mapping (Library of Congress)

    First, the important stuff:

    First Place: $10K

    Entry due by: December 31 at 5:00pm EST

    Second, the details:

    The Library of Congress is sponsoring two legislative data challenges to advance the development of international data exchange standards for legislative data. These challenges are an initiative to encourage broad participation in the development and application of legislative data standards and to engage new communities in the use of legislative data. Goals of this initiative include:
    • Enabling wider accessibility and more efficient exchange of the legislative data of the United States Congress and the United Kingdom Parliament,
    • Encouraging the development of open standards that facilitate better integration, analysis, and interpretation of legislative data,
    • Fostering the use of open source licensing for implementing legislative data standards.

    The Legislative XML Data Mapping Challenge invites competitors to produce a data map for US bill XML and the most recent Akoma Ntoso schema and UK bill XML and the most recent Akoma Ntoso schema. Gaps or issues identified through this challenge will help to shape the evolving Akoma Ntoso international standard.
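A data map is, at bottom, a correspondence table between element names plus the code that walks it. A toy sketch with Python’s stdlib ElementTree; both the US-bill and Akoma Ntoso element names here are illustrative stand-ins, not authoritative schema:

```python
import xml.etree.ElementTree as ET

# A toy fragment in the style of US bill XML; real bills are far richer.
us_bill = ET.fromstring(
    "<bill><section><enum>1.</enum>"
    "<header>Short title</header>"
    "<text>This Act may be cited as the Example Act.</text>"
    "</section></bill>")

# Map each US element to an (assumed) Akoma Ntoso counterpart.
MAPPING = {"enum": "num", "header": "heading", "text": "content"}

akn_bill = ET.Element("bill")
for section in us_bill.iter("section"):
    akn_section = ET.SubElement(akn_bill, "section")
    for child in section:
        target = ET.SubElement(akn_section, MAPPING[child.tag])
        target.text = child.text

print(ET.tostring(akn_bill, encoding="unicode"))
```

The challenge’s real payoff is everything that does not fit such a table: the elements with no counterpart are exactly the “gaps or issues” the Library wants surfaced.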

    The winning solution will win $10,000 in cash, as well as opportunities for promotion, exposure, and recognition by the Library of Congress. For more information about prizes please see the Official Rules.

    Can you guess what tool or technique I would suggest that you use? 😉

    The winner is announced February 12, 2014 at 5:00pm EST.

    Too late for the holidays this year, too close to Valentine’s Day, what holiday will you be wanting to celebrate?

    KDD Cup 2013 – Author-Paper Identification Challenge

    Thursday, April 18th, 2013

    KDD Cup 2013 – Author-Paper Identification Challenge

    Started: 3:47 am, Thursday 18 April 2013 UTC
    Ends: 12:00 am, Wednesday 12 June 2013 UTC (54 total days)

    From the post:

    The ability to search literature and collect/aggregate metrics around publications is a central tool for modern research. Both academic and industry researchers across hundreds of scientific disciplines, from astronomy to zoology, increasingly rely on search to understand what has been published and by whom.

    Microsoft Academic Search is an open platform that provides a variety of metrics and experiences for the research community, in addition to literature search. It covers more than 50 million publications and over 19 million authors across a variety of domains, with updates added each week. One of the main challenges of providing this service is caused by author-name ambiguity. On one hand, there are many authors who publish under several variations of their own name. On the other hand, different authors might share a similar or even the same name.

    As a result, the profile of an author with an ambiguous name tends to contain noise, resulting in papers that are incorrectly assigned to him or her. This KDD Cup task challenges participants to determine which papers in an author profile were truly written by a given author.
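    A toy sketch of the task (the matching rule and sample data below are my own, not the KDD Cup baseline): given an author profile, score each assigned paper by whether its listed author name is even compatible with the profile name.

    ```python
    def name_compatible(profile_name, paper_name):
        """Crude test: same surname and compatible leading initials.
        A real entry would combine many signals (coauthors, venues, topics)."""
        def parse(n):
            parts = n.lower().replace(".", "").split()
            return parts[-1], [p[0] for p in parts[:-1]]
        surname_a, initials_a = parse(profile_name)
        surname_b, initials_b = parse(paper_name)
        if surname_a != surname_b:
            return False
        # initials must agree up to the shorter list
        return all(a == b for a, b in zip(initials_a, initials_b))

    # Invented profile with noisy paper assignments:
    papers = [
        ("Graph sketching", "J. Smith"),
        ("Quantum widgets", "John A. Smith"),
        ("Zoology notes", "Jane Smyth"),
    ]
    mine = [title for title, author in papers
            if name_compatible("John Smith", author)]
    print(mine)  # ['Graph sketching', 'Quantum widgets']
    ```

    Name compatibility alone cannot separate two genuine "J. Smith"s, of course, which is why the contest is interesting.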

    $7,500 and bragging rights.

    Is there going to be a topic map entry this year?

    Knight News Challenge – 40 Finalists

    Monday, April 8th, 2013

    Knight News Challenge – 40 Finalists

    There are 78 days (as of today) before the evaluation of the forty (40) finalists in the Knight News Challenge closes.

    You will need to average better than one every two days in order to see all of them.

    Worthwhile because:

    • Your comments may help improve a project.
    • Your comments may assist in evaluation of a project.
    • You may get some great ideas for another project.
    • You may see ways to incorporate topic maps in one or more projects. (or not)

    It is important to learn to contribute to projects that are not your own and may not be your top choice.

    You may discover ideas, techniques and even people who you would otherwise miss.

    The GitHub Data Challenge II

    Friday, April 5th, 2013

    The GitHub Data Challenge II

    From the webpage:

    There are millions of projects on GitHub. Every day, people from around the world are working to make these projects better. Opening issues, pushing code, submitting Pull Requests, discussing project details — GitHub activity is a paper trail of progress. Have you ever wondered what all that data looks like? There are millions of stories to tell; you just have to look.

    Last year we held our first data challenge. We saw incredible visualizations, interesting timelines and compelling analysis.

    What stories will be told this year? It’s up to you!

    To Enter

    Send a link to a GitHub repository or gist with your graph(s) along with a description to before midnight, May 8th, 2013 PST.

    Approaching 100M rows, how would you visualize the data and what questions would you explore?

    Increasing Interoperability of Data for Social Good [$100K]

    Saturday, March 23rd, 2013

    Increasing Interoperability of Data for Social Good

    March 4, 2013 through May 7, 2013, 11:30 AM PST

    Each Winner to Receive $100,000 Grant

    Got your attention? Good!

    From the notice:

    The social sector is full of passion, intuition, deep experience, and unwavering commitment. Increasingly, social change agents from funders to activists, are adding data and information as yet one more tool for decision-making and increasing impact.

    But data sets are often isolated, fragmented and hard to use. Many organizations manage data with multiple systems, often due to various requirements from government agencies and private funders. The lack of interoperability between systems leads to wasted time and frustration. Even those who are motivated to use data end up spending more time and effort on gathering, combining, and analyzing data, and less time on applying it to ongoing learning, performance improvement, and smarter decision-making.

    It is the combining, linking, and connecting of different “data islands” that turns data into knowledge – knowledge that can ultimately help create positive change in our world. Interoperability is the key to making the whole greater than the sum of its parts. The Bill & Melinda Gates Foundation, in partnership with Liquidnet for Good, is looking for groundbreaking ideas to address this significant, but solvable, problem. See the website for more detail on the challenge and application instructions. Each challenge winner will receive a grant of $100,000.

    From the details website:

    Through this challenge, we’re looking for game-changing ideas we might never imagine on our own and that could revolutionize the field. In particular, we are looking for ideas that might provide new and innovative ways to address the following:

    • Improving the availability and use of program impact data by bringing together data from multiple organizations operating in the same field and geographical area;
    • Enabling combinations of data through application programming interfaces (APIs), taxonomy crosswalks, classification systems, middleware, natural language processing, and/or data sharing agreements;
    • Reducing inefficiency for users entering similar information into multiple systems through common web forms, profiles, apps, interfaces, etc.;
    • Creating new value for users trying to pull data from multiple sources;
    • Providing new ways to access and understand more than one data set, for example, through new data visualizations, including mashing up government and other data;
    • Identifying needs and barriers by experimenting with increased interoperability of multiple data sets;
    • Providing ways for people to access information that isn’t normally accessible (for example, using natural language processing to pull and process stories from numerous sources) and combining that information with open data sets.

    Successful Proposals Will Include:

    • Identification of specific data sets to be used;
    • Clear, compelling explanation of how the solution increases interoperability;
    • Use case;
    • Description of partnership or collaboration, where applicable;
    • Overview of how solution can be scaled and/or adapted, if it is not already cross-sector in nature;
    • Explanation of why the organization or group submitting the proposal has the capacity to achieve success;
    • A general approach to ongoing sustainability of the effort.

    I could not have written a more topic map oriented challenge. You?

    They suggest the usual social data sites:

    International Space Apps Challenge

    Monday, February 4th, 2013

    International Space Apps Challenge

    From the webpage:

    The International Space Apps Challenge is a two-day technology development event during which citizens from around the world will work together to address current challenges relevant to both space exploration and social need.

    NASA believes that mass collaboration is key to creating and discovering state-of-the-art technology. The International Space Apps Challenge aims to engage YOU in developing innovative solutions to our toughest challenges.

    Join us on April 20-21, 2013, as we join together cities around the world to be part of pioneering the future. Sign up to be notified when registration opens in early 2013!

    The list of challenges will be released around March 15th.

    I won’t be able to attend in person but would be interested in participating with others should a semantic integration challenge come up.

    I first saw this at: NASA launches second International Space Apps Challenge by Alex Howard.

    Seeking Creative Use Cases for Thomson Reuters Web of Knowledge

    Thursday, January 31st, 2013

    Seeking Creative Use Cases for Thomson Reuters Web of Knowledge

    From the post:

    AWARD: $10,000 USD | DEADLINE: 2/24/13 | ACTIVE SOLVERS: 467 | POSTED: 1/18/13

    This Challenge seeks use cases for Thomson Reuters Web of Knowledge content, tools, and APIs (Application Programming Interface) that would enable users to engage in creative new behaviors, beyond what is currently possible with online research portals. How will users want to search and discover scholarly content throughout the next 5 years?

    This Challenge is an Ideation Challenge, with a guaranteed award for at least one submitted solution. In this first phase, the Seeker is looking for creative ideas/use cases; no programming or code delivery is required.

    See the post for details and links.

    Only the best idea is required. No eye candy to cover up a poor idea.

    App-lifying USGS Earth Science Data

    Thursday, January 10th, 2013

    App-lifying USGS Earth Science Data

    Challenge Dates:

    Submissions: January 9, 2013 at 9:00am EST – Ends April 1, 2013 at 11:00pm EDT.

    Public Voting: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

    Judging: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

    Winners Announced: April 26, 2013 at 5:00pm EDT.

    From the webpage:

    USGS scientists are looking for your help in addressing some of today’s most perplexing scientific challenges, such as climate change and biodiversity loss. To do so requires a partnership between the best and the brightest in Government and the public to guide research and identify solutions.

    The USGS is seeking help via this platform from many of the Nation’s premier application developers and data visualization specialists in developing new visualizations and applications for datasets.

    USGS datasets for the contest consist of a range of earth science data types, including:

    • several million biological occurrence records (terrestrial and marine);
    • thousands of metadata records related to research studies, ecosystems, and species;
    • vegetation and land cover data for the United States, including detailed vegetation maps for the National Parks; and
    • authoritative taxonomic nomenclature for plants and animals of North America and the world.

    Collectively, these datasets are key to a better understanding of many scientific challenges we face globally. Identifying new, innovative ways to represent, apply, and make these data available is a high priority.

    Submissions will be judged on their relevance to today’s scientific challenges, innovative use of the datasets, and overall ease of use of the application. Prizes will be awarded to the best overall app, the best student app, and the people’s choice.

    Of particular interest for the topic maps crowd:

    Data used – The app must utilize a minimum of 1 DOI USGS Core Science and Analytics (CSAS) data source, though it need not include all data fields available in a particular resource. A list of CSAS databases and resources is available at the challenge site. The use of data from other sources in conjunction with CSAS data is encouraged.

    CSAS has a number of very interesting data sources. Classifications, thesauri, data integration, metadata and more.

    The contest wins you recognition and bragging rights, not to mention visibility for your approach.

    Health Design Challenge [$50K in Prizes – Deadline 30th Nov 2012]

    Thursday, October 25th, 2012

    Health Design Challenge

    More details at the site but:

    ONC & VA invite you to rethink how the medical record is presented. We believe designers can use their talents to make health information patient-centered and improve the patient experience.

    Being able to access your health information on demand can be lifesaving in an emergency situation, can help prevent medication errors, and can improve care coordination so everyone who is caring for you is on the same page. However, too often health information is presented in an unwieldy and unintelligible way that makes it hard for patients, their caregivers, and their physicians to use. There is an opportunity for talented designers to reshape the way health records are presented to create a better patient experience.

    Learn more at

    The purpose of this effort is to improve the design of the medical record so it is more usable by and meaningful to patients, their families, and others who take care of them. This is an opportunity to take the plain-text Blue Button file and enrich it with visuals and a better layout. Innovators will be invited to submit their best designs for a medical record that can be printed and viewed digitally.

    This effort will focus on the content defined by a format called the Continuity of Care Document (CCD). A CCD is a common template used to describe a patient’s health history and can be output by electronic medical record (EMR) software. Submitted designs should use the sections and fields found in a CCD. See for CCD sections and fields.

    Entrants will submit a design that:

    • Improves the visual layout and style of the information from the medical record
    • Makes it easier for a patient to manage his/her health
    • Enables a medical professional to digest information more efficiently
    • Aids a caregiver such as a family member or friend in his/her duties and responsibilities with respect to the patient

    Entrants should be conscious of how the wide variety of personas will affect their design. Our healthcare system takes care of the following types of individuals:

    • An underserved inner-city parent with lower health literacy
    • A senior citizen who has a hard time reading
    • A young adult who is engaged with technology and mobile devices
    • An adult whose first language is not English
    • A patient with breast cancer receiving care from multiple providers
    • A busy mom managing her kids’ health and helping her aging parents

    This is an opportunity for talented individuals to touch the lives of Americans across the country through design. The most innovative designs will be showcased in an online gallery and in a physical exhibit at the Annual ONC Meeting in Washington DC.

    That should be enough to capture your interest.

    Winners will be announced December 12, 2012.

    Only the design is required, no working code.

    Still, a topic map frame of mind may give you more options than other approaches.

    Using (Spring Data) Neo4j for the Hubway Data Challenge [Boston Biking]

    Thursday, October 11th, 2012

    Using (Spring Data) Neo4j for the Hubway Data Challenge by Michael Hunger.

    From the post:

    Using Spring Data Neo4j it was incredibly easy to model and import the Hubway Challenge dataset into a Neo4j graph database, to make it available for advanced querying and visualization.

    The Challenge and Data

    Tonight @graphmaven pointed me to the article about the Hubway Data Challenge.

    (graphics omitted)

    Hubway is a bike sharing service which is currently expanding worldwide. In the Data challenge they offer the CSV-data of their 95 Boston stations and about half a million bike rides up until the end of September. The challenge is to provide answers to some posted questions and develop great visualizations (or UI’s) for the Hubway data set. The challenge is also supported by MAPC (Metropolitan Area Planning Council).

    Useful import tips for data into Neo4j and on modeling this particular dataset.

    Not to mention the resulting database as well!
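    If you want to prototype the model before reaching for Neo4j, the ride-as-relationship shape Hunger describes ((:Station)-[:RIDE]->(:Station)) can be sketched in a few lines. The column names and sample rows here are invented stand-ins for the real Hubway CSV:

    ```python
    import csv
    import io
    from collections import Counter

    # Toy stand-in for the Hubway trips CSV (real column names may differ):
    trips_csv = """start_station,end_station
    South Station,Chinatown
    South Station,Chinatown
    Chinatown,Back Bay
    """

    # Model each ride as a directed edge between station nodes and count
    # edge multiplicity -- enough to answer "most popular route" queries.
    edges = Counter()
    for row in csv.DictReader(io.StringIO(trips_csv)):
        edges[(row["start_station"].strip(), row["end_station"].strip())] += 1

    route, count = edges.most_common(1)[0]
    print(route, count)  # ('South Station', 'Chinatown') 2
    ```

    The graph database earns its keep once the questions involve paths and neighborhoods rather than simple edge counts.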

    PS: From the challenge site:

    Submission will open here on Friday, October 12, 2012. The deadline is MIDNIGHT (11:59 p.m.) on Halloween, Wednesday, October 31, 2012.

    Winners will be announced on Wednesday, November 7, 2012.

    Prizes include:

    • A one-year Hubway membership
    • Hubway T-shirt
    • Bern helmet
    • A limited edition Hubway System Map—one of only 61 installed in the original Hubway stations.

    For other details, see the challenge site.

    NASA Tournament Lab to Launch Big Data Challenge Series for U.S. Government Agencies

    Thursday, October 4th, 2012

    Big Data Challenge Series: NASA Tournament Lab to Launch Big Data Challenge Series for U.S. Government Agencies

    Contest ends: Nov 12, 2012 05:00 PM EST

    From the webpage:

    NASA, the National Science Foundation (NSF), and the Department of Energy’s Office of Science announced Oct. 3, 2012, the launch of the Big Data Challenge – a series of ideation competitions hosted through the NASA Tournament Lab (NTL). The Big Data Challenge series will apply the process of Open Innovation (OI) to the goal of conceptualizing new and novel approaches to utilizing “Big Data” information sets residing in various agency silos, while remaining consistent with individual United States agencies’ missions in the fields of health, energy and earth sciences.

    Competitors will be tasked with imagining analytical techniques and software tools that utilize Big Data from discrete government information domains and then describing how they may be shared as universal, cross-agency solutions that transcend the limitations of individual silos. The competition will be run by the NASA Tournament Lab (NTL), a collaboration between Harvard University and TopCoder, a competitive community of digital creators.

    “The ability to create new applications and algorithms using diverse data sets is a key element of the NTL,” said Jason Crusan, Director of Advanced Exploration Systems at NASA’s Human Exploration and Operations Mission Directorate. “NASA is excited to see the results that open innovation can provide to these big data applications.”

    You have to go to: and have a TopCoder account (but you have that already).

    More than beer money and in time for the holiday season. Something to think about.

    The Ultimate Data Geek Challenge

    Saturday, September 22nd, 2012

    The Ultimate Data Geek Challenge by Nic Smith.

    From the post:

    Are You the Ultimate Data Geek?

    The time has come to show off your inner geek and let the rest of world know your data skills are second to none.

    We’re excited to announce the Ultimate Data Geek Challenge. Grab your data and share your visual creation in a video, screen capture, or blog post on the SCN. Once you enter, you’ll have a chance to be crowned the Ultimate Data Geek.

    How Do I Enter?

    It’s easy – just four simple steps. (Steps omitted. See the post.)

    Important note: Challenge entries will be accepted up until November 30, 2012, at 11:59 p.m. Pacific.

    There are videos and other materials to help you learn SAP Visual Intelligence.

    Another tool to find subjects and data about subjects. I haven’t looked at SAP Visual Intelligence so would appreciate a shout if you have.

    I first saw this at: Dancing With Dirty Data Thanks to SAP Visual Intelligence

    23 Mathematical Challenges [DARPA – A Modest Challenge]

    Tuesday, August 28th, 2012

    23 Mathematical Challenges [DARPA]

    From the webpage:

    Discovering novel mathematics will enable the development of new tools to change the way the DoD approaches analysis, modeling and prediction, new materials and physical and biological sciences. The 23 Mathematical Challenges program involves individual researchers and small teams who are addressing one or more of the following 23 mathematical challenges, which if successfully met, could provide revolutionary new techniques to meet the long-term needs of the DoD:

    • Mathematical Challenge 1: The Mathematics of the Brain
    • Mathematical Challenge 2: The Dynamics of Networks
    • Mathematical Challenge 3: Capture and Harness Stochasticity in Nature
    • Mathematical Challenge 4: 21st Century Fluids
    • Mathematical Challenge 5: Biological Quantum Field Theory
    • Mathematical Challenge 6: Computational Duality
    • Mathematical Challenge 7: Occam’s Razor in Many Dimensions
    • Mathematical Challenge 8: Beyond Convex Optimization
    • Mathematical Challenge 9: What are the Physical Consequences of Perelman’s Proof of Thurston’s Geometrization Theorem?
    • Mathematical Challenge 10: Algorithmic Origami and Biology
    • Mathematical Challenge 11: Optimal Nanostructures
    • Mathematical Challenge 12: The Mathematics of Quantum Computing, Algorithms, and Entanglement
    • Mathematical Challenge 13: Creating a Game Theory that Scales
    • Mathematical Challenge 14: An Information Theory for Virus Evolution
    • Mathematical Challenge 15: The Geometry of Genome Space
    • Mathematical Challenge 16: What are the Symmetries and Action Principles for Biology?
    • Mathematical Challenge 17: Geometric Langlands and Quantum Physics
    • Mathematical Challenge 18: Arithmetic Langlands, Topology and Geometry
    • Mathematical Challenge 19: Settle the Riemann Hypothesis
    • Mathematical Challenge 20: Computation at Scale
    • Mathematical Challenge 21: Settle the Hodge Conjecture
    • Mathematical Challenge 22: Settle the Smooth Poincare Conjecture in Dimension 4
    • Mathematical Challenge 23: What are the Fundamental Laws of Biology?

    (Details of each challenge omitted. See the webpage for descriptions.)

    Worthy mathematical challenges all but what about a more modest challenge? One that may help solve a larger one?

    Such as cutting across the terminology barriers of approaches and fields of mathematics to collate the prior, present and ongoing research on each of these challenges?

    Not only would the curated artifact be useful to researchers, but the act of curation, the reading and mapping of what is known on a particular problem, could spark new approaches to the main problem as well.

    DARPA should consider a history curation project on one or more of these challenges.

    Could produce a useful information artifact for researchers, train math graduate students in searching across approaches/fields, and might trigger a creative insight into a possible challenge solution.

    I first saw this at Beyond Search: DARPA May Be Hilbert

    Mozilla Ignite [Challenge – $15,000]

    Friday, June 15th, 2012

    Mozilla Ignite

    From the webpage:

    Calling all developers, network engineers and community catalysts. Mozilla and the National Science Foundation (NSF) invite designers, developers and everyday people to brainstorm and build applications for the faster, smarter Internet of the future. The goal: create apps that take advantage of next-generation networks up to 250 times faster than today, in areas that benefit the public — like education, healthcare, transportation, manufacturing, public safety and clean energy.

    Designing for the internet of the future

    The challenge begins with a “Brainstorming Round” where anyone can submit and discuss ideas. The best ideas will receive funding and support to become a reality. Later rounds will focus specifically on application design and development. All are welcome to participate in the brainstorming round.


    What would you do with 1 Gbps? What apps would you create for deeply programmable networks 250x faster than today? Now through August 23rd, let’s brainstorm. $15,000 in prizes.

    The challenge is focused specifically on creating public benefit in the U.S. The deadline for idea submissions is August 23, 2012.

    Here is the entry website.

    I assume the 1Gbps is actual and not as measured by the marketing department of the local cable company. 😉

    That would have to be from a source that can push 1 Gbps to you and you be capable of handling it. (Upstream limitations being what chokes my local speed down.)

    I went looking for an example of what that would mean and came up with: “…[you] can download 23 episodes of 30 Rock in less than two minutes.”

    On the whole, I would rather not.

    What other uses would you suggest for 1Gbps network speeds?

    Assuming you have the capacity to push back at the same speed, I wonder what that means in terms of querying/viewing data as a topic map?

    Transformation to a topic map only for a subset of data?

    Looking forward to seeing your entries!

    Talking with Neo4j Graphs

    Sunday, February 19th, 2012

    Talking with Neo4j Graphs by Tomás Augusto Müller.

    From the post:

    In this post I will be covering all main details regarding the development of my entry for the Neo4j Challenge.

    The main objective of this challenge is to create a Heroku-ready template or demo application using Neo4j. So, I thought to myself: – what kind of application would be nice to show up in this contest?

    After many ideas, here it is!

    In short, the application is a Stock Exchange symbol lookup using Neo4j and your voice.

    Looks like competitors are starting to emerge in the Neo4j challenge!

    Very cool!

    Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2011)

    Monday, June 6th, 2011

    Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2011)

    Full Day Workshop in conjunction with the 10th International Semantic Web Conference 2011, 23/24 October 2011, Bonn, Germany

    Important Dates

    Deadline for paper submission: 8 August 2011 23:59 (11:59pm) Hawaii time
    Notification of Acceptance: 29 August 2011 23:59 (11:59pm) Hawaii time
    Camera-ready version: 8 September 2011
    Workshop: 23 or 24 October 2011


    The goal of DeRiVE 2011 is to strengthen the participation of the semantic web community in the recent surge of research on the use of events as a key concept for representing knowledge and organizing and structuring media on the web. The workshop invites contributions to three central questions, and the goal is to formulate answers to these questions that advance and reflect the current state of understanding of events in the semantic web. Each submission will be expected to address at least one question explicitly, and, if possible, include a system demonstration. We have released an event challenge dataset for use in the preparation of contributions, with the goal of supporting a shared understanding of their impact. A prize will be awarded for the best use(s) of the dataset; but the use of other datasets will also be allowed.

    See the CFP for questions papers must address.

    Also note the anticipated release of a dataset:

    We will release a dataset of event data. In addition to regular papers, we invite everybody to submit a Data Challenge paper describing work on this dataset. We welcome analyses, extensions, alignments or modifications of the dataset, as well as applications and demos. The best Data Challenge paper will get a prize.

    The dataset consists of over 100,000 events from three sources: a music website and two entertainment websites. All three are represented in the LODE schema. Next to events, they contain artists, venues, and location and time information. Some links between the instances of the three datasets are provided.
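    As a starting point for that discussion, here is a minimal sketch of merging events that appear in more than one source on a shared identity key, which is the same move a topic map makes with subject identifiers. The field names follow the LODE vocabulary only loosely, and the sample records are invented:

    ```python
    # Two descriptions of the same real-world event from different sources.
    events = [
        {"source": "music-site", "title": "Jazz Night", "date": "2011-05-01",
         "place": "Bonn", "agents": ["Trio X"]},
        {"source": "listings-site", "title": "Jazz Night", "date": "2011-05-01",
         "place": "Bonn", "agents": ["Trio X", "DJ Y"]},
    ]

    merged = {}
    for e in events:
        # Crude identity rule: same title, date, and place means same event.
        key = (e["title"], e["date"], e["place"])
        m = merged.setdefault(key, {"sources": set(), "agents": set()})
        m["sources"].add(e["source"])
        m["agents"].update(e["agents"])

    print(len(merged))  # 1
    ```

    In topic map terms the key plays the role of a subject identifier and the merged record accumulates names and occurrences from every source; the hard part, as always, is choosing an identity rule that neither splits one event nor conflates two.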

    Suggestions for modeling events in topic maps?