Monday, March 31, 2014

Generating your categorization labels in 20 minutes

Imagine you need to deliver quantitative results to your supervisor on a new dataset, and you only have your samples and no annotations. How do you do it in practice in less than 30 minutes? I am assuming here that you have:
  • a CrowdFlower / Amazon Mechanical Turk account
  • an Amazon AWS account
First of all, upload all your images into an S3 bucket:

aws s3 sync /media/myimages/  s3://bucket-name

Set the correct permissions by right-clicking on the bucket link, then Properties -> Edit Bucket Policy, and inserting the following policy:

      "Principal": {
            "AWS": "*"
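
Only a fragment of the policy is shown above. As a rough sketch, assuming the intent is the usual "anyone can read objects" policy, the full JSON could be generated like this. The helper name `build_public_read_policy` and the `Sid` value are my own choices for illustration, and `bucket-name` is the placeholder from the sync command:

```python
import json


def build_public_read_policy(bucket_name):
    """Return a bucket policy (as a dict) that lets anyone read objects.

    Assumption: a plain public-read policy is what is wanted here;
    the exact policy used in the post is not shown in full.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # free-form label, hypothetical
                "Effect": "Allow",
                "Principal": {"AWS": "*"},  # same principal as the fragment above
                "Action": "s3:GetObject",  # read-only access to objects
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the bucket policy editor.
    print(json.dumps(build_public_read_policy("bucket-name"), indent=2))
```

Keep in mind that this makes every object in the bucket readable by anyone who knows its URL, so use a bucket that only holds images you are happy to share.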
Expose it with CloudFront and wait for about 15 minutes. Then you have your list of files exposed worldwide. Generate a list of URLs like
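
One possible way to build that list is to walk the local folder that was synced and prefix every relative path with the distribution's address. This is only a sketch: `build_image_urls`, the extension filter, and the `example.cloudfront.net` domain are assumptions for illustration, so substitute the domain CloudFront assigned to you.

```python
import os
from urllib.parse import quote

# Assumption: only these extensions count as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def build_image_urls(local_root, base_url):
    """Map every image under local_root to its public URL.

    `aws s3 sync local_root s3://bucket` stores each file under a key equal
    to its path relative to local_root, so the URL is base_url + that key.
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            if not name.lower().endswith(IMAGE_EXTENSIONS):
                continue  # skip anything that is not an image
            rel = os.path.relpath(os.path.join(dirpath, name), local_root)
            key = rel.replace(os.sep, "/")  # S3 keys always use forward slashes
            # quote() percent-encodes spaces etc. but leaves "/" untouched
            urls.append(base_url.rstrip("/") + "/" + quote(key))
    return sorted(urls)


if __name__ == "__main__":
    # Hypothetical domain; use the one from your own distribution.
    for url in build_image_urls("/media/myimages/", "https://example.cloudfront.net"):
        print(url)
```
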
Now log in to CrowdFlower, create a new job, and in the CML

Click on Manage Data -> Upload and download the CSV template.

Open it with MS Excel (it is very important not to mess around with the .csv extension) and copy the list of URLs that you want to crowdsource into the image_url column. Follow the rest of the instructions and wait! You should be able to download the results within minutes.
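
If you would rather not go through Excel, the same file can be written with a few lines of Python. This is a minimal sketch that assumes the template has a single `image_url` column; if the template you downloaded has more columns, keep its header as it is. The function name `write_image_url_csv` and the example URLs are made up for illustration.

```python
import csv


def write_image_url_csv(urls, csv_path):
    """Write one URL per row under an `image_url` header."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image_url"])  # header expected by the job (assumed)
        for url in urls:
            writer.writerow([url])


if __name__ == "__main__":
    # Hypothetical URLs; in practice use the list generated for your bucket.
    write_image_url_csv(
        [
            "https://example.cloudfront.net/a.jpg",
            "https://example.cloudfront.net/b.jpg",
        ],
        "upload.csv",
    )
```
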
