Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 10, 2013

Call for KDD Cup Competition Proposals

Filed under: Contest,Data Mining,Dataset,Knowledge Discovery — Patrick Durusau @ 1:17 pm

Call for KDD Cup Competition Proposals

From the post:

Please let us know if you are interested in being considered for the 2013 KDD Cup Competition by filling out the form below.

This is the official call for proposals for the KDD Cup 2013 competition. The KDD Cup is the well known data mining competition of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD-2013 conference will be held in Chicago from August 11 – 14, 2013. The competition will last between 6 and 8 weeks and the winners should be notified by end-June. The winners will be announced in the KDD-2013 conference and we are planning to run a workshop as well.

A good competition task is one that is practically useful, scientifically or technically challenging, can be done without extensive application domain knowledge, and can be evaluated objectively. Of particular interest are non-traditional tasks/data that require novel techniques and/or thoughtful feature construction.

Proposals should involve data and a problem whose successful completion will result in a contribution of some lasting value to a field or discipline. You may assume that Kaggle will provide the technical support for running the contest. The data needs to be available no later than mid-March.

If you have initial questions about the suitability of your data/problem feel free to reach out to claudia.perlich [at] gmail.com.

Do you have:

non-traditional tasks/data that require[s] novel techniques and/or thoughtful feature construction?

Is collocation of information on the basis of multi-dimensional subject identity a non-traditional task?

Does extraction of multiple dimensions of a subject identity from users require novel techniques?

If so, what data sets would you suggest using in this challenge?

I first saw this at: 19th ACM SIGKDD Knowledge Discovery and Data Mining Conference.

February 4, 2013

International Space Apps Challenge

Filed under: Challenges,Contest,NASA — Patrick Durusau @ 7:12 pm

International Space Apps Challenge

From the webpage:

The International Space Apps Challenge is a two-day technology development event during which citizens from around the world will work together to address current challenges relevant to both space exploration and social need.

NASA believes that mass collaboration is key to creating and discovering state-of-the-art technology. The International Space Apps Challenge aims to engage YOU in developing innovative solutions to our toughest challenges.

Join us on April 20-21, 2013, as we join together cities around the world to be part of pioneering the future. Sign up to be notified when registration opens in early 2013!

The list of challenges will be released around March 15th, spaceappschallenge.org.

I won’t be able to attend in person but would be interested in participating with others should a semantic integration challenge come up.

I first saw this at: NASA launches second International Space Apps Challenge by Alex Howard.

January 31, 2013

Saturday 23rd February is Open Data Day 2013!

Filed under: Contest,Open Data — Patrick Durusau @ 7:24 pm

Saturday 23rd February is Open Data Day 2013! from AIMS.

From the post:

Open Data Day is a gathering of citizens in cities around the world to write applications, liberate data, create visualizations and publish analyses using open public data to show support for and encourage the adoption of open data policies by the world’s local, regional and national governments. There are Open Data Day events taking place all around the world.

Are you planning to organize or participate in one of these events? Are you going to launch new open data catalogs on the Open Data Day? Share with us your plans and highlight events that might be of interest for the agricultural information management community.

Know more at http://opendataday.org/

As of today: 52 events.

Anyone interested in a virtual event on Open Data Day using open data and topic maps?

January 10, 2013

App-lifying USGS Earth Science Data

Filed under: Challenges,Contest,Data,Geographic Data,Science — Patrick Durusau @ 1:49 pm

App-lifying USGS Earth Science Data

Challenge Dates:

Submissions: January 9, 2013 at 9:00am EST – Ends April 1, 2013 at 11:00pm EDT.

Public Voting: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Judging: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Winners Announced: April 26, 2013 at 5:00pm EDT.

From the webpage:

USGS scientists are looking for your help in addressing some of today’s most perplexing scientific challenges, such as climate change and biodiversity loss. To do so requires a partnership between the best and the brightest in Government and the public to guide research and identify solutions.

The USGS is seeking help via this platform from many of the Nation’s premier application developers and data visualization specialists in developing new visualizations and applications for datasets.

USGS datasets for the contest consist of a range of earth science data types, including:

  • several million biological occurrence records (terrestrial and marine);
  • thousands of metadata records related to research studies, ecosystems, and species;
  • vegetation and land cover data for the United States, including detailed vegetation maps for the National Parks; and
  • authoritative taxonomic nomenclature for plants and animals of North America and the world.

Collectively, these datasets are key to a better understanding of many scientific challenges we face globally. Identifying new, innovative ways to represent, apply, and make these data available is a high priority.

Submissions will be judged on their relevance to today’s scientific challenges, innovative use of the datasets, and overall ease of use of the application. Prizes will be awarded to the best overall app, the best student app, and the people’s choice.

Of particular interest for the topic maps crowd:

Data used – The app must utilize a minimum of 1 DOI USGS Core Science and Analytics (CSAS) data source, though they need not include all data fields available in a particular resource. A list of CSAS databases and resources is available at: http://www.usgs.gov/core_science_systems/csas/activities.html. The use of data from other sources in conjunction with CSAS data is encouraged.

CSAS has a number of very interesting data sources. Classifications, thesauri, data integration, metadata and more.

Winning the contest gets you recognition and bragging rights, not to mention visibility for your approach.

November 19, 2012

The “Ask Bigger Questions” Contest!

Filed under: Cloudera,Contest,Hadoop — Patrick Durusau @ 7:17 pm

The “Ask Bigger Questions” Contest! by Ryan Goldman. (Deadline, Feb. 1 2013)

From the post:

Have you helped your company ask bigger questions? Our mission at Cloudera University is to equip Hadoop professionals with the skills to manage, process, analyze, and monetize more data than they ever thought possible.

Over the past three years, we’ve heard many great stories from our training participants about faster cluster deployments, complex data workflows made simple, and superhero troubleshooting moments. And we’ve heard from executives in all types of businesses that staffing Cloudera Certified professionals gives them confidence that their Hadoop teams have the skills to turn data into breakthrough insights.

Now, it’s your turn to tell us your bigger questions story! Cloudera University is seeking tales of Hadoop success originating with training and certification. How has an investment in your education paid dividends for your company, team, customer, or career?

The most compelling stories chosen from all entrants will receive prizes like Amazon gift cards, discounted Cloudera University training, autographed copies of Hadoop books from O’Reilly Media, and Cloudera swag. We may even turn your story into a case study!

Sign up to participate here. Submissions must be received by Friday, Feb. 1, 2013 to qualify for a prize.

A good marketing technique that might bear imitation.

You don’t have to seek out success stories. The incentives lead people to bring them to you.

You get good marketing material that is likely to resonate with other users.

Something to think about.

October 4, 2012

NASA Tournament Lab to Launch Big Data Challenge Series for U.S. Government Agencies

Filed under: BigData,Challenges,Contest — Patrick Durusau @ 3:34 pm

Big Data Challenge Series: NASA Tournament Lab to Launch Big Data Challenge Series for U.S. Government Agencies

Contest ends: Nov 12, 2012 05:00 PM EST

From the webpage:

NASA, the National Science Foundation (NSF), and the Department of Energy’s Office of Science, announced Oct. 3, 2012, the launch of the Big Data Challenge – a series of ideation competitions hosted through the NASA Tournament Lab (NTL). The Big Data Challenge series will apply the process of Open Innovation (OI) to the goal of conceptualizing new and novel approaches to utilizing “Big Data” information sets residing in various agency silos while remaining consistent with individual United States agencies missions related to the field of health, energy and earth sciences.

Competitors will be tasked with imagining analytical techniques and software tools that utilize Big Data from discrete government information domains and then describing how they may be shared as universal, cross-agency solutions that transcend the limitations of individual silos. The competition will be run by the NASA Tournament Lab (NTL), a collaboration between Harvard University and TopCoder, a competitive community of digital creators.

“The ability to create new applications and algorithms using diverse data sets is a key element of the NTL,” said Jason Crusan, Director of Advanced Exploration Systems at NASA’s Human Exploration and Operations Mission Directorate. “NASA is excited to see the results that open innovation can provide to these big data applications.”

You have to go to: studio.topcoder.com and have a topcoder account (but you have that already).

More than beer money and in time for the holiday season. Something to think about.

August 12, 2012

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

Filed under: Contest,Data Mining,Drug Discovery,Patents,Text Mining — Patrick Durusau @ 1:34 pm

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

From the contest page:

Patent documents contain important research that is valuable to the industry, business, law, and policy-making communities. Take the patent documents from the United States Patent and Trademark Office (USPTO) as examples. The structured data include: filing date, application date, assignees, UPC (US Patent Classification) codes, IPC codes, and others, while the unstructured segments include: title, abstract, claims, and description of the invention. The description of the invention can be further segmented into field of the invention, background, summary, and detailed description.

Given a set of “Source” patents or documents, we can use text mining to identify patents that are “similar” and “relevant” for the purpose of discovery of drug variants. These relevant patents could further be clustered and visualized appropriately to reveal implicit, previously unknown, and potentially useful patterns.

The eventual goal is to obtain a focused and relevant subset of patents, relationships and patterns to accelerate discovery of variations or evolutions of the drugs represented by the “source” patents.

Timeline:

  • July 19, 2012 – Start of the Contest Part 1
  • August 23, 2012 – Deadline for Submission of Ontology deliverables
  • August 24 to August 29, 2012 – Crowdsourced And Expert Evaluation for Part 1. NO SUBMISSIONS ACCEPTED for contest during this week.
  • Milestone 1: August 30, 2012 – Winner for Part 1 contest announced and Ontology release to the community for Contest Part 2
  • Aug. 31 to Sept. 21, 2012 – Contest Part 2 Begins – Data Exploration / Text Mining of Patent Data
  • Milestone 2: Sept. 21, 2012 – Deadline for Submission Contest Part 2. FULL CONTEST CLOSING.
  • Sept. 22 to Oct. 5, 2012 – Crowdsourced and Expert Evaluation for contest Part 2
  • Milestone 3: Oct. 5, 2012 – Conditional Winners Announcement 

Possibly fertile ground for demonstrating the value of topic maps.

Particularly if you think of topic maps as curating search strategies and results.

Think about that for a moment: curating search strategies and results.

We have all asked reference librarians or other power searchers for assistance and watched while they discovered resources we didn’t imagine existed.

What if for medical expert searchers, we curate the “search request” along with the “search strategy” and the “result” of that search?

Such that we can match future search requests up with likely search strategies?

What we are capturing is the expert’s understanding and recognition of subjects not apparent to the average user, captured in such a way that it can be used again in the future.
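One way to picture such a curated record, as a minimal sketch only: the `CuratedSearch` class, its field names, and the crude word-overlap matching are all hypothetical illustrations, not an existing topic map API or anything the contest specifies.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedSearch:
    """One expert search kept whole: what was asked, how it was searched, what was found."""
    request: str
    strategy: list[str]
    results: list[str] = field(default_factory=list)


def words(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real subject recognition."""
    return set(text.lower().split())


def likely_strategies(new_request: str, curated: list[CuratedSearch]) -> list[CuratedSearch]:
    """Rank curated searches by how many words their request shares with the new one."""
    wanted = words(new_request)
    scored = [(len(wanted & words(c.request)), c) for c in curated]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop curated searches that share nothing with the new request.
    return [c for score, c in ranked if score > 0]
```

An expert's real matching would rest on recognizing subjects, not on counting shared words; the point of the sketch is only that request, strategy, and result travel together as one record.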

If you aren’t interested in medical research, how about: Accelerating Discovery of Trolls by Text Mining of Patents? 😉

I first saw this at KDNuggets.


Update: 13 August 2012

Tweet by Lars Marius Garshol points to: Patent troll Intellectual Ventures is more like a HYDRA.

Even a low-end estimate – the patents actually recorded in the USPTO as being assigned to one of those shells – identifies around 10,000 patents held by the firm.

At the upper end of the researchers’ estimates, Intellectual Ventures would rank as the fifth-largest patent holder in the United States and among the top fifteen patent holders worldwide.

As sad as that sounds, remember this is one (1) troll. There are others.

June 27, 2012

MyMoneyAppUp by U.S. Department of the Treasury – $25,000

Filed under: Contest,Funding — Patrick Durusau @ 10:25 am

MyMoneyAppUp by U.S. Department of the Treasury – $25,000

Submission period: June 27 – August 12, 2012.

Prizes:

1st: $10,000

2nd: $5,000 (2)

3rd: $2,500 (2)

From the webpage:

The MyMoneyAppUp Challenge, launched by the U.S. Treasury Department in partnership with the D2D Fund and Center for Financial Services Innovation, is a contest intended to motivate American entrepreneurs, software developers, the public, and students to propose the best ideas and designs for next-generation mobile tools to help Americans control and shape their financial futures. The Challenge calls for mobile app ideas (IdeaBank) and designs (App Design), with cash prizes awarded to the best submissions. Competitors are encouraged to propose mobile apps that incorporate data to empower consumers, as part of Treasury’s initiative to promote Smart Disclosure. MyMoneyAppUp competitors who want to take their winning ideas to the next step and develop prototypes may enter the FinCapDev Competition, a complementary competition sponsored exclusively by D2D and CFSI at the conclusion of the MyMoneyAppUp Challenge. Support for prizes and the administration of the Challenge by CFSI and D2D for the MyMoneyAppUp Challenge comes from the Ford Foundation, Omidyar Network, and the Citi Foundation.

Sounds like a place where topic maps could play a role.

From something as simple as integrating balances from specified accounts or drafts on those accounts, to provide users with projected balances. Could even include projected credit card balances with interest rates.

Need a kill switch for the credit card one, at least while you are buying me a book present online. No particular holiday required. 😉
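A rough sketch of the projected-balance arithmetic mentioned above. The function names, the sample figures, and the choice of monthly compounding with no payments are my assumptions for illustration, not anything specified by the Challenge.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def projected_balance(balances, pending_drafts):
    """Sum current balances across accounts, minus drafts that have not yet cleared."""
    total = sum(balances, Decimal("0")) - sum(pending_drafts, Decimal("0"))
    return total.quantize(CENT, rounding=ROUND_HALF_UP)


def projected_card_balance(balance, apr_percent, months):
    """Project a card balance forward, assuming monthly compounding and no payments.

    Pass apr_percent as a string (e.g. "12") so the Decimal conversion is exact.
    """
    monthly_rate = Decimal(apr_percent) / Decimal("1200")
    grown = balance * (1 + monthly_rate) ** months
    return grown.quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical figures: two accounts, one uncleared draft, one card at 12% APR.
print(projected_balance([Decimal("1200.00"), Decimal("350.50")], [Decimal("200.25")]))
print(projected_card_balance(Decimal("1000.00"), "12", 2))
```

The arithmetic is the easy part. Getting balances out of accounts held in different data structures is where the mapping work lies.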

It’s not a lot of money but a good opportunity to build street cred for topic maps.

Heterogeneous data structures are the rule in the finance community.

PS: When some friend of yours says, “Oh, but we can use X to map between heterogeneous data structures,” your response should be: “Sure, and when you move up in management, how do we know why that mapping exists? Or how to add to it?”

Fixed mappings are useful, but also repetitively expensive.

June 7, 2012

Data Prospecting

Filed under: Contest,Data Analysis — Patrick Durusau @ 2:17 pm

Derrick Harris writes in Kaggle is now crowdsourcing big data creativity about a new product from Kaggle, Kaggle Prospect.

The Kaggle Prospect homepage says:

Kaggle Prospect is an open data exploration and problem identification platform that lets organizations with large datasets solicit proposals from the best minds in our 40,000 strong community of predictive modeling and machine learning experts. The experts will peer-review each other’s ideas and we’ll present you with the short list of what problems your data could answer.

If you are sitting on a gold mine of data, but aren’t sure where to start digging, Kaggle Prospect is the place to start.

Kaggle Prospect has a great deal of promise. Assuming enough users can pry data out of data silos for submission. 😉

If you are not familiar with Kaggle contests, see: Kaggle.

PS: I like the Kaggle headline:

We’re making data science a sport.™

May 2, 2012

Google BigQuery and the Github Data Challenge

Filed under: Contest,Data,Google BigQuery — Patrick Durusau @ 10:54 am

Google BigQuery and the Github Data Challenge

Deadline May 21, 2012

From the post:

Github has made data on its code repositories, developer updates, forks etc. from the public GitHub timeline available for analysis, and is offering prizes for the most interesting visualization of the data. Sounds like a great challenge for R programmers! The R language is currently the 26th most popular on GitHub (up from #29 in December), and it would be interesting to visualize the usage of R compared to other languages, for example. The deadline for submissions to the contest is May 21.

Interestingly, GitHub has made this data available on the Google BigQuery service, which is available to the public today. BigQuery was free to use while it was in beta test, but Google is now charging for storage of the data: $0.12 per gigabyte per month, up to $240/month (the service is limited to 2TB of storage – although there is a Premier offering that supports larger data sizes … at a price to be negotiated). While members of the public can run SQL-like queries on the GitHub data for free, Google is charging subscribers to the service 3.5 cents per GB processed in the query: this is measured by the source data accessed (although columns of data not referenced aren't counted); the size of the result set doesn't matter.

Watch your costs, but any thoughts on how you would visualize the data?
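For watching those costs, a back-of-the-envelope sketch using only the rates quoted in the post. The function names are hypothetical, the rates are the 2012 figures above and not current pricing, and the 2TB cap is taken as 2,000 GB because that is what makes the quoted $240/month ceiling work out.

```python
# Rates as quoted in the post (2012); illustrative only, not current pricing.
STORAGE_USD_PER_GB_MONTH = 0.12
QUERY_USD_PER_GB_PROCESSED = 0.035
STORAGE_LIMIT_GB = 2000  # assumption: "2TB" read as 2,000 GB, matching the $240 cap


def monthly_storage_cost(stored_gb):
    """Monthly storage charge in dollars for data held in the service."""
    if not 0 <= stored_gb <= STORAGE_LIMIT_GB:
        raise ValueError("stored_gb must be between 0 and the 2TB limit")
    return round(stored_gb * STORAGE_USD_PER_GB_MONTH, 2)


def query_cost(gb_processed):
    """Charge in dollars for one query, based on source data scanned, not result size."""
    if gb_processed < 0:
        raise ValueError("gb_processed must not be negative")
    return round(gb_processed * QUERY_USD_PER_GB_PROCESSED, 2)


print(monthly_storage_cost(2000))  # the storage ceiling quoted in the post
print(query_cost(100))             # a query that scans 100 GB of referenced columns
```

Because only referenced columns count toward the data processed, selecting just the columns a visualization needs is the obvious way to keep the per-query charge down.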

April 25, 2012

NYC BigApps

Filed under: Contest,Mapping,Marketing — Patrick Durusau @ 6:25 pm

NYC BigApps

From the webpage:

New York City is challenging software developers to create apps that use city data to make NYC better.

There are three completed contests (one just ended) that resulted in very useful applications.

NYC BigApps 3.0 resulted in:

NYC Facets: Best Overall Application – Grand Prize – Explores and visualizes more than 1 million facts about New York City.

Work+: Best Overall Application – Second Prize – Working from home not working for you? Discover new places to get things done.

Funday Genie: Investor’s Choice Application – The Funday Genie is an application for planning a free day. Our unique scheduling and best route algorithm creates a smart personalized day-itinerary of things to do, including events, attractions, restaurants, shopping, and more, based on the user’s preferences. Everyday can be a Funday.

among others.

Quick question: How would you interchange information between any two of these apps? Or if you like, any other two apps in this or prior contests?

Second question: How would you integrate additional information into any of these apps, prepared for use by another application?

Topic maps can:

  • collate information for display.
  • power re-usable and extensible mappings of data into other formats.
  • augment data for applications that lack merging semantics.
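As a minimal sketch of the collating and merging idea, assume each app can export records that carry some shared subject identifier. The `subject_id` key, the field names, and the sample records are all hypothetical and not taken from any BigApps entry.

```python
from collections import defaultdict


def merge_by_subject(*sources):
    """Collate records from several sources that share a subject identifier."""
    merged = defaultdict(lambda: defaultdict(set))
    for records in sources:
        for record in records:
            subject = record["subject_id"]
            for key, value in record.items():
                if key != "subject_id":
                    # Keep every value seen, so sources never overwrite each other.
                    merged[subject][key].add(value)
    return {subject: dict(props) for subject, props in merged.items()}


# Hypothetical records from two different apps describing the same place.
app_a = [{"subject_id": "place:example-cafe", "name": "Example Cafe", "wifi": "yes"}]
app_b = [{"subject_id": "place:example-cafe", "name": "Example Café", "events": "open mic"}]

merged = merge_by_subject(app_a, app_b)
print(merged["place:example-cafe"])
```

Keeping values in sets means differing names survive the merge instead of one silently replacing the other. Real topic map merging on subject identity is richer than matching a single key, but the shape of the result is the same: one subject, everything known about it in one place.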

Where is your data today and where would you like for it to be tomorrow?

April 16, 2012

Third Challenge on Large Scale Hierarchical Text Classification

Filed under: Classification,Contest — Patrick Durusau @ 7:12 pm

ECML/PKDD 2012 Discovery Challenge: Third Challenge on Large Scale Hierarchical Text Classification

Important dates:

– March 30, start of the challenge
– April 20, opening of the evaluation
– June 29, closing of evaluation
– July 20, paper submission deadline
– August 3, paper notifications

From the website:

This year’s discovery challenge hosts the third edition of the successful PASCAL challenges on large scale hierarchical text classification. The challenge comprises three tracks and it is based on two large datasets created from the ODP web directory (DMOZ) and Wikipedia. The datasets are multi-class, multi-label and hierarchical. The number of categories ranges between 13,000 and 325,000 roughly and the number of documents between 380,000 and 2,400,000.

The tracks of the challenge are organized as follows:

1. Standard large-scale hierarchical classification
a) On collection of medium size from Wikipedia
b) On a large collection from Wikipedia

2. Multi-task learning, based on both DMOZ and Wikipedia category systems

3. Refinement-learning
a) Semi-Supervised approach
b) Unsupervised approach

In order to register for the challenge and gain access to the datasets you must have an account at the challenge Web site.

More fun than repeating someone’s vocabulary. Yes?

April 14, 2012

CloudSpokes Coding Challenge Winners – Build a DynamoDB Demo

Filed under: Amazon DynamoDB,Amazon Web Services AWS,Contest,Dynamo — Patrick Durusau @ 6:27 pm

CloudSpokes Coding Challenge Winners – Build a DynamoDB Demo

From the post:

Last November CloudSpokes was invited to participate in the DynamoDB private beta. We spent some time kicking the tires, participating in the forums and developing use cases for their Internet-scale NoSQL database service. We were really excited about the possibilities of DynamoDB and decided to crowdsource some challenge ideas from our 38,000 strong developer community. Needless to say, the release generated quite a bit of buzz.

When Amazon released DynamoDB in January, we launched our CloudSpokes challenge Build an #Awesome Demo with Amazon DynamoDB along with a blog post and a sample ”Kiva Loan Browser Demo” application to get people started. The challenge requirements were wide open and all about creating the coolest application using Amazon DynamoDB. We wanted to see what the crowd could come up with.

The feedback we received from numerous developers was extremely positive. The API was very straightforward and easy to work with. The SDKs and docs, as usual, were top-notch. Developers were able to get up to speed fast as DynamoDB’s simple storage and query methods were easy to grasp. These methods allowed developers to store and access data items with a flexible number of attributes using the simple “Put” or “Get” verbs that they are familiar with. No surprise here, but we had a number of comments regarding the speed of both read and write operations.

When our challenge ended a week later we were pleasantly surprised with the applications and chose to highlight the following top five:

I don’t think topic maps has 38,000 developers but challenges do seem to pull people out of the woodwork.

Any thoughts on what would make interesting/attractive challenges? Other than five figure prizes? 😉

March 14, 2012

Plastic Surgeon Holds Video Contest, Offers Free Nose Job to Winner

Filed under: Contest,Marketing — Patrick Durusau @ 7:35 pm

Plastic Surgeon Holds Video Contest, Offers Free Nose Job to Winner by Tim Nudd.

From the post:

Plastic surgeons aren’t known for their innovating marketing. But then, Michael Salzhauer isn’t your ordinary plastic surgeon. He’s “Dr. Schnoz,” the self-described “Nose King of Miami,” and he’s got an unorthodox offer for would-be patients—a free nose job to the winner of a just-announced video contest.

Can’t give away a nose job but what about a topic map?

What sort of contest should we have?

What would you do for a topic map?

February 26, 2012

February 25, 2012

Flavorwocky

Filed under: Contest,Heroku,Neo4j — Patrick Durusau @ 7:40 pm

Flavorwocky

Another Neo4j challenge contender!

Lists foods that go well together.

I tried “rice,” and “red beans” did not come up. 🙁

I will have to add that tomorrow.

FrostyMug – Beer Rating/Recommendation Service

Filed under: Contest,Heroku,Neo4j,Recommendation — Patrick Durusau @ 7:39 pm

Similarity-based Recommendation Engines by Josh Adell.

From the post:

I am currently participating in the Neo4j-Heroku Challenge. My entry is a — as yet, unfinished — beer rating and recommendation service called FrostyMug. All the major functionality is complete, except for the actual recommendations, which I am currently working on. I wanted to share some of my thoughts and methods for building the recommendation engine.

I hear “similarity” as a measure of subject identity: beers recommended to X; movies enjoyed by Y users, even though those are group subjects.

Or perhaps better, as a possible means of subject identity. A person could list all the movies they have enjoyed, and that list could be the same as a recommendation list. Same subject, just a different method of identification. (Unless the means of subject identification has an impact on the subject you think is being identified.)
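To make the comparison of lists concrete, here is a small sketch using Jaccard similarity over sets. This is my own illustration and not the method from Josh Adell's post; the names and sample lists are invented.

```python
def jaccard(a, b):
    """Share of items two lists have in common: 1.0 means the same set, 0.0 means disjoint."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, others):
    """Name of the list in `others` that is most similar to `target`."""
    return max(others, key=lambda name: jaccard(target, others[name]))


# Hypothetical lists: what one person says they enjoyed versus two recommendation lists.
enjoyed = ["Movie A", "Movie B", "Movie C"]
lists = {
    "recommended_to_group_1": ["Movie C", "Movie A", "Movie B"],
    "recommended_to_group_2": ["Movie C", "Movie D"],
}

print(jaccard(enjoyed, lists["recommended_to_group_1"]))
print(most_similar(enjoyed, lists))
```

When the score is 1.0 the two lists pick out the same set of movies, which is the "same subject, different method of identification" case described above.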

January 25, 2012

3rd Globals Challenge

Filed under: Contest,Globalsdb,NoSQL — Patrick Durusau @ 3:25 pm

3rd Globals Challenge

Contest starts: 10 Feb 12 18:00 EST
Contest ends: 17 Feb 12 18:00 EST

Topic mappers take note:

All applications must be built using Globals. However, you are also allowed to use additional technologies to supplement Globals (emphasis added, additional technologies, unlike some linked data competitions)

The email I got reports:

  • A cash prize of USD $3,500 for the winning entry
  • A press release announcing the winning participant and solution
  • A chance to win a free registration for the InterSystems Global Summit

You might want to drop by Globals to grab a copy of the software and read up on the documentation.

You can also see the prior challenges. These are non-trivial events but that also means you will learn a lot in the process.

January 18, 2012

Neo4j Challenge – Seed the Cloud

Filed under: Contest,Graphs,Neo4j — Patrick Durusau @ 7:59 pm

Neo4j Challenge

Important Dates: January 18 – February 13, 2012

From the challenge webpage:

Challenge: Seed the Cloud

Join Neo4j on Heroku, then help others get started by creating a Heroku-ready template or demo application using Neo4j.

The best project templates will win recognition and prizes. Use any language, any framework, with Neo4j!

  1. Create a Project using the Neo4j Add-on
  2. Share the Project as a Template on Gensen
  3. Win a place in the clouds (and cool prizes)

Neo4j has thrown down their gage. Will you be the one that picks it up?

January 5, 2012

Digging into Data Challenge

Filed under: Archives,Contest,Data Mining,Library,Preservation — Patrick Durusau @ 4:09 pm

Digging into Data Challenge

From the homepage:

What is the “challenge” we speak of? The idea behind the Digging into Data Challenge is to address how “big data” changes the research landscape for the humanities and social sciences. Now that we have massive databases of materials used by scholars in the humanities and social sciences — ranging from digitized books, newspapers, and music to transactional data like web searches, sensor data or cell phone records — what new, computationally-based research methods might we apply? As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these everyday materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st century scholarship.

Winners for Round 2, some 14 projects out of 67, were announced on 3 January 2012.

I would be interested to hear your comments on the projects, as I am sure the project teams would be as well.

January 2, 2012

Topical Classification of Biomedical Research Papers

Filed under: Bioinformatics,Biomedical,Contest,Medical Informatics,MeSH,PubMed — Patrick Durusau @ 6:36 pm

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers

From the webpage:

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers, is a special event of Joint Rough Sets Symposium (JRS 2012, http://sist.swjtu.edu.cn/JRS2012/) that will take place in Chengdu, China, August 17-20, 2012. The task is related to the problem of predicting topical classification of scientific publications in a field of biomedicine. Money prizes worth 1,500 USD will be awarded to the most successful teams. The contest is funded by the organizers of the JRS 2012 conference, Southwest Jiaotong University, with support from University of Warsaw, SYNAT project and TunedIT.

Introduction: Development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. Rapidly increasing size of scientific article meta-data and text repositories, such as MEDLINE [1] or PubMed Central (PMC) [2], emphasizes the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search through biomedical documents for information regarding diagnostics, drugs dosage and effect or possible complications resulting from specific treatments. In the queries, they use highly sophisticated terminology, that can be properly interpreted only with a use of a domain ontology, such as Medical Subject Headings (MeSH) [3]. In order to facilitate the searching process, documents in a database should be indexed with concepts from the ontology. Additionally, the search results could be grouped into clusters of documents, that correspond to meaningful topics matching different information needs. Such clusters should not necessarily be disjoint since one document may contain information related to several topics. In this data mining competition, we would like to raise both of the above mentioned problems, i.e. we are interested in identification of efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology, that were automatically assigned by our tagging algorithm. In our opinion, this challenge may be appealing to all members of the Rough Set Community, as well as other data mining practitioners, due to its strong relations to well-founded subjects, such as generalized decision rules induction [4], feature extraction [5], soft and rough computing [6], semantic text mining [7], and scalable classification methods [8]. 
In order to ensure scientific value of this challenge, each of participating teams will be required to prepare a short report describing their approach. Those reports can be used for further validation of the results. Apart from prizes for top three teams, authors of selected solutions will be invited to prepare a paper for presentation at JRS 2012 special session devoted to the competition. Chosen papers will be published in the conference proceedings.

Data sets became available today.

This is one of those “praxis” opportunities for topic maps.

December 7, 2011

USEWOD 2012 Data Challenge

Filed under: Contest,Linked Data,RDF,Semantic Web,Semantics — Patrick Durusau @ 8:08 pm

USEWOD 2012 Data Challenge

From the website:

The USEWOD 2012 Data Challenge invites research and applications built on the basis of USEWOD 2012 Dataset.

Accepted submissions will be presented at USEWOD2012, where a winner will be chosen. Examples of analyses and research that could be done with the dataset are the following (but not limited to those):

  • correlations between linked data requests and real-world events
  • types of structured queries
  • linked data access vs. conventional access
  • analysis of user agents visiting the sites
  • geographical analysis of requests
  • detection and visualisation of trends
  • correlations between site traffic and available datasets
  • etc. – let your imagination run wild!

USEWOD 2012 Dataset

The USEWOD dataset consists of server logs from two major web servers publishing datasets on the Web of linked data. In particular, the dataset contains logs from:

  • DBPedia: slices of log data spanning several months from the linked data twin of Wikipedia, one of the focal points of the Web of data. The logs were kindly made available to us for the challenge by OpenLink Software! Further details about this part of the dataset to follow.
  • SWDF: Semantic Web Dog Food is a constantly growing dataset of publications, people and organisations in the Web and Semantic Web area, covering several of the major conferences and workshops, including WWW, ISWC and ESWC. The logs contain two years of requests to the server from about 12/2008 until 12/2010.
  • Linked Open Geo Data A dataset about geographical data.
  • Bio2RDF Linked Data for life sciences.

Data sets are still under construction. Organizers advise that data sets should be available next week.

Your results should be reported as short papers and are due by 15 February 2012.
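One of the suggested analyses, a breakdown of the user agents visiting the sites, can be sketched in a few lines of Python. This is only a rough illustration: the post does not describe the log layout, so the sketch assumes Apache Combined Log Format, and the sample lines, `LOG_PATTERN` and `count_user_agents` are hypothetical names made up for the example.

```python
import re
from collections import Counter

# Assumption: the logs use Apache Combined Log Format.
# The post does not confirm the format, so adjust the pattern as needed.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_user_agents(lines):
    """Count requests per user agent, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines, for illustration only (not taken from the dataset).
SAMPLE_LINES = [
    '192.0.2.1 - - [01/Jul/2010:00:00:01 +0200] "GET /resource/Berlin HTTP/1.1" 303 0 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Jul/2010:00:00:02 +0200] "GET /sparql?query=SELECT HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [01/Jul/2010:00:00:03 +0200] "GET /page/Berlin HTTP/1.1" 200 1024 "-" "ExampleBot/1.0"',
    'not a log line',
]

if __name__ == "__main__":
    for agent, n in count_user_agents(SAMPLE_LINES).most_common():
        print(agent, n)
```

The same named groups (`host`, `time`, `request`) would be the starting point for the geographical or trend analyses on the list, if the real logs turn out to match this layout.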

December 2, 2011

2nd Globals Challenge

Filed under: Contest,Globalsdb — Patrick Durusau @ 10:48 am

2nd Globals Challenge

Just a few hours left until the start of the 2nd Globals Challenge so I am sending this on its way.

Details being released at 18:00 EST on 2 December 2011!

Prizes to be awarded!

Maybe by this time next year we could organize something like this for topic maps. That would be way cool!

November 20, 2011

FantasySCOTUS

Filed under: Contest,Legal Informatics — Patrick Durusau @ 4:16 pm

Fantasy SCOTUS

Another example of imaginative use of technology to interest people in what is often seen as “boring” material. Supreme Court cases have outcomes that affect real people. I haven’t played, but I suspect participants gain a lot of knowledge about the facts and law in each case.

Not to mention that there are monthly drawings for $200 Amazon gift certificates. See the site for details.

From the about page:

FantasySCOTUS is the Internet’s Premier Supreme Court Fantasy League. Last year, over 5,000 attorneys, law students, and other avid Supreme Court followers made predictions about all cases that the Supreme Court decided. On average, members of the league correctly predicted the cases nearly 60% of the time, and accurately predicted that Elena Kagan would be nominated as the 100th Associate Justice of the Supreme Court. Justin Donoho, who received the highest score out of 5,000+ members, was nominated and confirmed as the inaugural Chief Justice of FantasySCOTUS.

FantasySCOTUS is brought to you by the Harlan Institute. The Harlan Institute’s mission is to bring a stylized law school experience into the high school classroom to ensure that our next generation of leaders has a proper understanding of our most fundamental laws. By utilizing the expertise of leading legal scholars and the interactivity of online games, Harlan will introduce students to our Constitution, the cases of the United States Supreme Court, and our system of justice. Harlan’s long term strategic goal is to develop condensed law school courses that can be taught at no cost in high schools across the country using engaging online programs.

This and the Crowdsourcing Scientific Research I mentioned yesterday make me think that perhaps TREC in 2012 should have a crowdsourced component, one where the data set is available over the WWW and interfaces are proposed and tested to interest the general public in participating. What was that they said about all bugs being shallow if you just had enough eyes?

Up to now, TREC has had a small set of eyes with very powerful machines and algorithms. It would be interesting to see what a crowd, plus an imaginative interface and fast interaction, could do. It could be a path towards a distributed knowledge economy where users log onto tasks/interfaces that interest them.

November 18, 2011

Our second $5000 information design challenge is on!

Filed under: Contest — Patrick Durusau @ 9:38 pm

Our second $5000 information design challenge is on!

From the post:

We’ve got another pot of info-design gold to give away – and this time your work might land you on the Guardian Datablog.

Last month we ran our first visualization challenge. And, boy, did you peeps really rise to it.

And here’s our second challenge: MON€Y PANIC$!

The financial system, debt crises, recession fears, Wall St occupation, currency devaluation, collapse of the markets, the END OF THE WORLD! It’s all getting rather mind-boggling.

So we and the Guardian have found some juicy datasets we want you to use to explain what in the world is going on. Clearly. Understandably. Visibly.

What are you waiting here for? Go see about the contest and then come back to read more about topic maps!

You might even learn something about visual design that you can apply to your next topic map project!

November 16, 2011

Yandex – Relevance Prediction Challenge

Filed under: Contest,Relevance — Patrick Durusau @ 8:18 pm

Yandex – Relevance Prediction Challenge

Important Dates:

Oct 15, 2011 – Challenge opens

Dec 22, 2011 (originally Dec 15) – End of challenge

Dec 25, 2011 – Winners candidacy notification

Jan 20, 2012 – Reports deadline

Feb 12, 2012 – WSCD workshop at WSDM 2012, Winners announcement

Sorry, you are already late getting started. Here are some of the details; see the website for more:

From the webpage:

The Relevance Prediction Challenge provides a unique opportunity to consolidate and scrutinize the work from industrial labs on predicting the relevance of URLs using user search behavior. It provides a fully anonymized dataset shared by Yandex which has clicks and relevance judgements. Predicting relevance based on clicks is difficult, and is not a solved problem. This Challenge and the shared dataset will enable a whole new set of researchers to conduct such experiments.

The Relevance Prediction Challenge is a part of series of contests organized by Yandex called Internet Mathematics. This year’s event is the sixth since 2004. Participants will again compete in finding solutions to a real-life problem based on real-life data. In previous years, participants tried to learn to rank documents, predict traffic jams and find similar images.
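To give a feel for what predicting relevance from clicks can mean at its very simplest, here is a toy baseline: score each query/URL pair by a smoothed click-through rate. It is a sketch under stated assumptions, not the challenge's method. The record layout `(query_id, url_id, clicked)`, the function `click_through_scores` and the smoothing parameters are all hypothetical; see the challenge site for the actual data format.

```python
from collections import defaultdict


def click_through_scores(impressions, prior_clicks=1.0, prior_views=2.0):
    """Score (query, url) pairs by smoothed click-through rate.

    impressions: iterable of (query_id, url_id, clicked) records.
    This record layout is an assumption, not the challenge's format.
    The priors pull pairs with few views towards prior_clicks / prior_views.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for query_id, url_id, clicked in impressions:
        key = (query_id, url_id)
        views[key] += 1
        if clicked:
            clicks[key] += 1
    return {
        key: (clicks[key] + prior_clicks) / (views[key] + prior_views)
        for key in views
    }


# Invented toy log, for illustration only.
TOY_LOG = [
    ("q1", "u1", True),
    ("q1", "u1", True),
    ("q1", "u2", False),
    ("q1", "u2", True),
    ("q1", "u3", False),
]

if __name__ == "__main__":
    scores = click_through_scores(TOY_LOG)
    for (query_id, url_id), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(query_id, url_id, round(score, 3))
```

A baseline like this ignores position bias (users click higher-ranked results more often regardless of relevance), which is one reason the quoted page calls the problem difficult and unsolved.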

I can’t think of very many “better” days to find out you won such a contest!

November 7, 2011

Challenge.gov

Filed under: Contest,Marketing — Patrick Durusau @ 7:27 pm

Challenge.gov

From the FAQ:

About challenges

What is a challenge?

A government challenge or contest is exactly what the name suggests: it is a challenge by the government to a third party or parties to identify a solution to a particular problem or reward contestants for accomplishing a particular goal. Prizes (monetary or non–monetary) often accompany challenges and contests.

Challenges can range from fairly simple (idea suggestions, creation of logos, videos, digital games and mobile applications) to proofs of concept, designs, or finished products that solve the grand challenges of the 21st century. Find current federal challenges on Challenge.gov.

About Challenge.gov

Why would the government run a challenge?

Federal agencies can use challenges and prizes to find innovative or cost–effective submissions or improvements to ideas, products and processes. Government can identify the goal without first choosing the approach or team most likely to succeed, and pay only for performance if a winning submission is submitted. Challenges and prizes can tap into innovations from unexpected people and places.

Hard to think of better PR for topic maps than being the solution to one or more of these challenges.

If you know of challenges in other countries or by other organizations, please post or email pointers to them.

November 5, 2011

How to enter a data contest – machine learning for newbies like me

Filed under: Contest,Data Contest,Machine Learning — Patrick Durusau @ 6:43 pm

How to enter a data contest – machine learning for newbies like me

From the post:

I’ve not had much experience with machine learning, most of my work has been a struggle just to get data sets that are large enough to be interesting! That’s a big reason why I turned to the Kaggle community when I needed a good prediction algorithm for my current project. I wasn’t completely off the hook though, I still needed to create an example of our current approach, limited as it is, to serve as a benchmark for the teams. While I was at it, it seemed worthwhile to open up the code too, so I’ve created a new Github project:

https://github.com/petewarden/MLloWorld

It actually produces very poor results, but does demonstrate the basics of how to pull in the data and apply one of scikit-learn’s great collection of algorithms. If you get the itch there’s lots of room for improvement, and the contest has another two weeks to run!

There is a case to be made for machine learning in the production of topic maps and what better motivation than contests for learning it?

Which makes me wonder how to structure something similar for topic maps. Contests, that is, for creating topic maps from one or more data sets. Coming up with funding for a meaningful prize would not be as hard as setting up a task that is neither too easy nor too hard. At least not for the early contests anyway. 😉

For the early ones, pride of first place might be enough.

Suggestions/Comments?

April 6, 2011

Open Data Challenge

Filed under: Contest,Dataset — Patrick Durusau @ 6:19 pm

Open Data Challenge

EU residents and organizations with operations in the EU can compete in four basic categories:

  • Ideas – Anyone can suggest an idea for projects which reuse public information to do something interesting or useful.
  • Apps – Teams of developers can submit working applications which reuse public information.
  • Visualisations – Designers, artists and others can submit interesting or insightful visual representations of public information.
  • Datasets – Public bodies can submit newly opened up datasets, or developers can submit derived datasets which they’ve cleaned up, or linked together

Runs 5 April to 5 June, 2011

See the site for various rules and details.
