The Problem

Problem Description:
The contest will be declared open during the Machine Learning event presentation on February 28, 2014. Once the contest is open, the first two datasets (training and validation) will be available for download. There are three datasets:
  1. Training data consisting of {image_id, image, category label}.
  2. Validation data consisting of {image_id, image}. A participating team can upload the output labels its algorithm produces for this validation data. The score will be displayed immediately to help the team evaluate its model.
  3. Test data consisting of {image_id, image}. This data will be published only on the day of final submission and not before. Winners will be judged on the model's performance on this test data, not on the validation data.

The datasets will be available for download from the ‘Resources’/‘Downloads’ tab on this website. Participants need to use the images and category labels from the training data to learn a classifier, and then use this classifier to produce outputs for the validation data and the test data. Note that USE OF EXTERNAL DATA SOURCES WHILE MAKING PREDICTIONS ON THE VALIDATION / TEST DATA IS STRICTLY PROHIBITED. (Please note that the category label will be one of 'Human Faces', 'Flowers', 'Buildings', 'Cars', or 'Shoes'. As is evident, this is a multi-class classification problem.)


Evaluation:
For the validation data, every submission of output labels in the correct format will be evaluated and assigned a score, and will be eligible for display on the leaderboard. Only the best score per team will be displayed on the leaderboard. The leaderboard will remain on display throughout the contest, and teams can upload output labels for the validation data as many times as they like while they improve their algorithm. However, these scores are only meant to help you tune your model; they will not affect your final score on the test data.
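The "best score per team" rule can be illustrated with a minimal Python sketch. The function name, the (team name, score) pair layout, and the sample scores are all hypothetical; the contest's actual leaderboard implementation is not described here.

```python
def best_scores(submissions):
    """Keep only the highest validation score per team.

    `submissions` is an iterable of (team_name, score) pairs in any order.
    This layout is an assumption made for illustration only.
    """
    best = {}
    for team, score in submissions:
        # Record the score if the team is new or has improved on its best.
        if team not in best or score > best[team]:
            best[team] = score
    # Highest score first, as a leaderboard would typically be displayed.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical submissions from two teams.
example_submissions = [
    ("alpha", 0.62),
    ("beta", 0.70),
    ("alpha", 0.81),
    ("beta", 0.55),
]
leaderboard = best_scores(example_submissions)
```

Under this sketch, a later submission that scores lower than a team's earlier one leaves that team's leaderboard entry unchanged.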

On the final day of submission, the test data will be available for download. The candidates need to submit the output labels for this test data, a writeup of their algorithm, and the code they have used. Please look for instructions about the required submission formats in the ‘Resources’ tab. The output labels must be uploaded exactly as produced by the model, in the required format. These labels will be matched in the backend against the actual labels, and the classification accuracy (that is, the fraction of correct labels) will be shown. Both the classification accuracy and the final submitted code will be used to judge the winner(s).
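As a rough illustration of the metric, classification accuracy is the number of correctly labelled images divided by the total number of images. The Python sketch below assumes labels are held in dictionaries keyed by image_id and that an image with no submitted label counts as incorrect; both are assumptions for illustration, since the official submission format is given in the ‘Resources’ tab.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of images whose predicted label matches the actual label.

    `predicted` and `actual` map image_id -> category label (assumed layout).
    An image_id missing from `predicted` is treated as a wrong answer (assumption).
    """
    if not actual:
        raise ValueError("no actual labels to score against")
    correct = sum(
        1 for image_id, label in actual.items()
        if predicted.get(image_id) == label
    )
    return correct / len(actual)


# Hypothetical image ids and labels, using the contest's category names.
actual_labels = {
    "img1": "Cars",
    "img2": "Shoes",
    "img3": "Flowers",
    "img4": "Buildings",
}
predicted_labels = {
    "img1": "Cars",
    "img2": "Flowers",  # the only wrong prediction
    "img3": "Flowers",
    "img4": "Buildings",
}
accuracy = classification_accuracy(predicted_labels, actual_labels)  # 3 of 4 correct
```

Teams that hold out part of the training data can use a function like this to estimate accuracy locally before uploading labels.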

Challenges:
The data can be expected to be noisy (some images may have incorrect labels). Hence, preprocessing the training data and feature extraction will be critical to obtaining high accuracy on the validation data and the test data.

Another crucial part of the problem will be feature representation, since it may determine how well the classifier can be trained and hence how well it performs. In due course, we will provide hints for feature representation.

The choice of classifier will also affect performance on the validation data and the test data.

Rules:
  1. A team can have at most 2 members.
  2. The team must register at any time during the event (until March 10) with valid email-ids of its members and a team name (this will be used for display on the leaderboard). Please note that email-ids will not be disclosed online. However, the names of the winning team's members will be disclosed after the event has ended.
  3. No participant is allowed to participate in multiple teams. If a person participates in multiple teams, then those teams will be disqualified.
  4. After registration, no change in team composition is allowed.
  5. The event will be open for two weeks, from 1st March to 14th March, 11:59pm IST (GMT+5:30 hours).
  6. The test data will be available on 14th March at 12:01am IST. The final submission has to be made before 14th March 11:59pm IST (GMT+5:30 hours). Trial submissions (that is, uploading labels for the validation data) will continue until the event ends. For the final submission on March 14th, multiple submissions can be made, but only the latest will be considered. Please follow the ‘Dates’ and the ‘Resources’ tabs for updates.
  7. To be eligible for a prize, the participating team should upload a writeup (maximum 2 pages, describing the algorithm and implementation) as well as fully working code with proper documentation during the final submission. Details about the submission format will be provided with the dataset in the ‘Resources’ tab.
  8. Please note that classification accuracy alone will not be used for judging winners. The writeup and properly documented code will also be used to decide winners.