
Monday, March 31, 2014

Generating your categorization labels in 20 minutes

Imagine you need to deliver quantitative results to your supervisor on a new dataset, but you only have your samples and no annotations. How do you do it in practice in less than 30 minutes? I am assuming here that you have:
  • a Crowdflower / Amazon Mechanical Turk account
  • an Amazon AWS account
First of all, upload all your images into an S3 bucket:

aws s3 sync /media/myimages/  s3://bucket-name

Set the correct permissions with the following policy: right-click on the bucket link, then go to Properties -> Edit bucket policy and insert the following:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::bucket-name/*"]
    }
  ]
}
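If you would rather not hand-edit the JSON for each new bucket, the policy can be generated. The following is only a minimal sketch: the function name public_read_policy is made up for this post, and "bucket-name" is the same placeholder used in the sync command. It just prints the text to paste into the policy editor.

```python
import json


def public_read_policy(bucket_name):
    """Build a policy dict that lets anyone read objects in bucket_name."""
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "AllowPublicRead",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": ["s3:GetObject"],
                # Only the objects inside the bucket are exposed, not the bucket listing.
                "Resource": ["arn:aws:s3:::%s/*" % bucket_name],
            }
        ],
    }


# "bucket-name" is a placeholder; use your own bucket name.
policy_text = json.dumps(public_read_policy("bucket-name"), indent=2)
print(policy_text)
```

Keep in mind that this makes every object in the bucket publicly readable, so only put images there that you are happy to show to the crowd.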
Expose the bucket with CloudFront and wait for about 15 minutes. Then your files are reachable worldwide. Generate a list of URLs like:
http://d3sdfsdfsodfsfdn.cloudfront.net/mysuperfile1.jpg
http://d3sdfsdfsodfsfdn.cloudfront.net/mysuperfile2.jpg
http://d3sdfsdfsodfsfdn.cloudfront.net/mysuperfile3.jpg
http://d3sdfsdfsodfsfdn.cloudfront.net/mysuperfile4.jpg
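Typing such a list by hand gets old quickly, so here is a rough sketch of how it could be generated. The helper name cloudfront_urls is invented for this example, the domain is the dummy one from the list above, and I assume the object keys are simply the file names that were synced to the bucket.

```python
from urllib.parse import quote


def cloudfront_urls(keys, domain):
    """Turn S3 object keys into public URLs on the given CloudFront domain."""
    # quote() escapes spaces and other unsafe characters but keeps "/" as is.
    return ["http://%s/%s" % (domain, quote(key)) for key in keys]


# Placeholder domain and file names; replace with your own.
domain = "d3sdfsdfsodfsfdn.cloudfront.net"
keys = ["mysuperfile1.jpg", "mysuperfile2.jpg", "mysuperfile3.jpg"]

for url in cloudfront_urls(keys, domain):
    print(url)
```

To cover a whole folder, the keys list could be built with os.listdir on the directory you synced, assuming it has no subfolders.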
Now log in to Crowdflower, create a new job and in the CSM



Click on Manage data -> Upload and download the CSV template.

Open it with MS Excel (it is very important not to mess around with the .csv extension) and copy into the image_url column the list of URLs that you want to crowdsource. Follow the rest of the instructions and wait! You should be able to download the results in minutes.
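If the list is long, the file can probably be written by a script instead of pasted in Excel. This is a sketch under the assumption that the template has a single column named image_url; check the header of the template you downloaded and adjust if it has more columns. The function name write_image_csv and the output file name are made up here.

```python
import csv
import io


def write_image_csv(urls, out):
    """Write a header row followed by one URL per row to the file-like object out."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["image_url"])  # assumed column name, must match the template
    for url in urls:
        writer.writerow([url])


# Dummy URLs in the same style as the list above.
urls = [
    "http://d3sdfsdfsodfsfdn.cloudfront.net/mysuperfile1.jpg",
    "http://d3sdfsdfsodfsfdn.cloudfront.net/mysuperfile2.jpg",
]

# Written to memory here so the result can be inspected; to produce a real file use
# open("images.csv", "w", newline="") and pass that file object instead.
buffer = io.StringIO()
write_image_csv(urls, buffer)
print(buffer.getvalue())
```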
