Luca's pills for computer vision and machine learning experiments on linux: python


Monday, January 5, 2015

how to print percentages in python

A very simple pattern for showing a percentage in a printed string:

>>> print "%.0f%%" % (100.0 * 1/3)
33%
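As a small sketch, here is the same pattern next to two equivalent spellings. The variable names are made up for illustration, and the block uses print() so that it also runs under Python 3.

```python
# Three equivalent ways to render a ratio as a whole-number percentage.
ratio = 1.0 / 3

old_style = "%.0f%%" % (100.0 * ratio)       # the pattern from the post; %% prints a literal %
new_style = "{:.0f}%".format(100.0 * ratio)  # str.format equivalent
percent_spec = "{:.0%}".format(ratio)        # the % presentation type multiplies by 100 itself

print(old_style)
print(new_style)
print(percent_spec)
```

All three give the same string; the last one saves you the multiplication by 100.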

Tuesday, August 19, 2014

Switch your EC2 instance from command line with python and BOTO

First things first: security. Before you start messing around with your AWS credentials, make sure you take the necessary precautions.

You don't want to wake up and find that someone has launched 60 c3.8xlarge servers across 5 regions using the credentials that you left in a backed-up directory...

First, create a user if you don't have one yet: go to the IAM console and create a new user. That's it.

Then the more interesting part: attach a user policy to that user. This limits how much damage the user can do in your AWS account.

Click on the user -> in the user's policies -> click on "Attach User Policy"
Click on the custom policy generator
Give your policy a name (e.g. 'my-restricted-policy')
Copy and paste the following policy:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:DescribeInstances", "ec2:DescribeImages",
      "ec2:DescribeAvailabilityZones",
      "ec2:StopInstances", "ec2:StartInstances"
    ],
    "Resource": "*"
  }]
}
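A missing bracket or comma in the policy is easy to overlook, so one option is to parse the text with Python's json module before pasting it into the console. This is only a local syntax check under the assumption that the policy is meant to allow exactly the five actions listed; the variable names are illustrative and nothing here talks to AWS.

```python
import json

# The policy text, with the "Action" list closed before "Resource".
policy_text = """
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:DescribeInstances", "ec2:DescribeImages",
      "ec2:DescribeAvailabilityZones",
      "ec2:StopInstances", "ec2:StartInstances"
    ],
    "Resource": "*"
  }]
}
"""

# json.loads raises ValueError if a bracket or comma is missing.
policy = json.loads(policy_text)

# Check that nothing beyond describe/start/stop slipped into the policy.
allowed = set(policy["Statement"][0]["Action"])
expected = {
    "ec2:DescribeInstances", "ec2:DescribeImages",
    "ec2:DescribeAvailabilityZones",
    "ec2:StopInstances", "ec2:StartInstances",
}
unexpected = allowed - expected
print("actions:", sorted(allowed))
print("unexpected actions:", sorted(unexpected))
```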

You are almost done: you just need to generate security credentials for the user with the restricted policy.

So click on the user -> select Security Credentials, click on Manage Access Keys and then Create access key.

At this point you should have something that looks like the following

aws_access_key_id = 'DSFSDFSDFWEFEWF'

aws_secret_access_key = 'ldfjjs8wnoliencdnscdmsfkmsdkfml32'

The first part is done; now a few more lines of Python and that's it:

import boto.ec2


name_of_the_instance = 'put-here-the-name-of-your-instance'
name_of_the_region ='put-here-the-name-of-the-region-where-the-machine-is-located-eg-eu-west-1' 

# access key only for on / off
aws_access_key_id_     = 'DSFSDFSDFWEFEWF'
aws_secret_access_key_ = 'ldfjjs8wnoliencdnscdmsfkmsdkfml32'


conn = boto.ec2.connect_to_region(name_of_the_region,
                                  aws_access_key_id     = aws_access_key_id_,
                                  aws_secret_access_key = aws_secret_access_key_ )

# look the instance up by its Name tag, then stop it (inst.start() switches it back on)
inst = conn.get_all_instances(filters={'tag:Name': name_of_the_instance})[0].instances[0]
print inst.stop()

Friday, April 25, 2014

filling the beautiful (and damn) nparray in a loop

Simple: first append the vectors one by one to a list

 
import numpy as np

tmp_list = []

for i in myiterations:
    tmp_list.append(s_tr.values)

and then convert the list into a numpy array

tst=np.array(tmp_list,dtype=float)
tst[:,1]

Wednesday, September 18, 2013

Super-simple Lookup table from textual file

import sys

# Usage:
# python lut.py <first file> <second file>
# first file: two columns, key and value
# second file: the keys whose values should be retrieved

lut = dict([f.strip().split() for f in open(sys.argv[1], 'r').readlines()])
lines = [l.strip().split() for l in open(sys.argv[2]).readlines()]

for l in lines:
    print lut.get(l[0])
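The same idea can be sketched as two small functions that close their files and tolerate blank lines. The function names, file names and sample contents below are invented for the demo, and the block uses Python 3 style print().

```python
import os
import tempfile


def load_lut(path):
    """Read a two-column text file into a dict mapping key -> value."""
    lut = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                lut[fields[0]] = fields[1]
    return lut


def lookup(lut, keys_path, default=None):
    """Return the value for the first field of every non-blank line."""
    with open(keys_path) as handle:
        return [lut.get(line.split()[0], default) for line in handle if line.strip()]


# Demo with two throwaway files (hypothetical contents).
workdir = tempfile.mkdtemp()
table_path = os.path.join(workdir, "table.txt")
keys_path = os.path.join(workdir, "keys.txt")
with open(table_path, "w") as handle:
    handle.write("cat 1\ndog 2\n\n")
with open(keys_path, "w") as handle:
    handle.write("dog\nbird\ncat\n")

values = lookup(load_lut(table_path), keys_path)
print(values)
```

Keys that are missing from the table come back as None, just as with lut.get in the one-liner version.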

Saturday, August 10, 2013

figures in python notebook

Just remember to start the notebook with this command:

ipython notebook --pylab inline

Thursday, June 6, 2013

line of python to get the youtube auto caption in a text file

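import re, sys

# the markup around (.*?) did not survive in this listing: set these two
# placeholders to the opening and closing tag that wrap each caption
open_tag = ''
close_tag = ''

for l in re.findall(open_tag + '(.*?)' + close_tag, sys.stdin.read(), re.DOTALL):
    print l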

Sunday, May 19, 2013

tuples

They are immutable lists
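A quick sketch of what immutable means in practice; the names are only for illustration.

```python
point = (3, 4)

# Item assignment on a tuple raises TypeError.
try:
    point[0] = 10
    mutated = True
except TypeError:
    mutated = False

# To change something, build a list from the tuple and modify that instead.
as_list = list(point)
as_list[0] = 10

# Because tuples cannot change, they can be used as dictionary keys.
labels = {point: "corner"}

print(mutated, as_list, labels[(3, 4)])
```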

Lists

Lists are ordered collections of items: they are Python's arrays, but more flexible. They are mutable (they can be changed in place).

Good practices:
- always use append to add items
- remember that operations like sort change the list object in place, which means that after

l.sort()

you don't need to copy the result into another list (sort itself returns None)
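A short sketch of the difference between sorting in place and getting a sorted copy; the variable names are illustrative.

```python
scores = [3, 1, 2]

# list.sort() reorders the list itself and returns None.
returned = scores.sort()

# sorted() leaves its argument alone and returns a new list.
original = [3, 1, 2]
ordered_copy = sorted(original)

print(scores, returned, original, ordered_copy)
```

So writing l = l.sort() throws the list away; use sorted(l) when you need the original order preserved.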

# super simple way to deal with a list (array) of values
from numpy import arange

tpr1 = []
fpr1 = []
for th in arange(1, 10, 1):
    tpr1.append(th)
    fpr1.append(th)

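You can also store the values in a dictionary keyed by a counter. For consecutive integer keys this is not faster than a list; dictionaries pay off when you need to look values up by an arbitrary key:

tpr1 = {}
c = 0
for th in arange(1, 10, 1):
    tpr1[c] = th
    c += 1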

To pipe text into a Python script:

import sys
data = sys.stdin.readlines()
print "Counted", len(data), "lines."

deal with classification scores obtained with a log loss

# script for sigmoid transform and L1 normalization of classification scores obtained with log loss

from numpy import loadtxt, savetxt, exp, tile, transpose, size, sum
import sys, os

filename_f=sys.argv[1]

# load matrix of scores NxM ( N= images, M=categories)
f=loadtxt(filename_f)

# sigmoid to get probabilities
pf=1./(1+exp(-f))

# L1 normalization
pf_n=pf / transpose( tile( sum(pf,1), (size(f,1),1) ) )

filename_pf_n=os.path.splitext(filename_f)[0]+'Norm'+os.path.splitext(filename_f)[1]

savetxt(filename_pf_n,pf_n,fmt='%.6e')

NOTE WELL:

a. import only the functions you are using from numpy (it avoids repeated attribute lookups)
b. to split a path into its base name and extension use os.path.splitext
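To see what the two steps do to actual numbers, here is a worked example in plain Python on a tiny 2x2 score matrix. The scores are made up, and this version uses loops from the standard library rather than the numpy calls of the script, so treat it as an illustration of the arithmetic only.

```python
import math


def sigmoid(x):
    """Map a raw score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def normalize_rows(scores):
    """Apply the sigmoid to every score, then divide each row by its sum (L1)."""
    normalized = []
    for row in scores:
        probs = [sigmoid(value) for value in row]
        total = sum(probs)
        normalized.append([p / total for p in probs])
    return normalized


# Two images, two categories (hypothetical scores).
scores = [[0.0, 0.0],
          [2.0, -2.0]]
result = normalize_rows(scores)

for row in result:
    print(["%.4f" % value for value in row], "row sum = %.4f" % sum(row))
```

After normalization every row sums to 1, so the values of one image can be compared across categories.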

Dictionaries

Dictionaries are unordered collections of items that are stored and fetched by key!

c={}   # dictionary
c=[]   # list

c=dict(enumerate(set(open('/home/lmarches/cvpr13/visual/features_ml/lists_mlfu300/class.txt').read().split())))

set = get unique elements
enumerate = generate the sequence
dict= assembles the dictionary

433: 'you_need',
434: 'your_camera',
435: 'your_entry',
key: value

If you want to invert it:

c_inv = dict([(value, key) for (key, value) in c.iteritems()])

'you_need': 433,
'your_camera': 434,
'your_entry': 435,

if you want to search by value

[key for key,value in c.items() if value=='your_entry' ][0]

435
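The whole round trip can be tried on a handful of words instead of a file. The word list below is invented, the set is sorted so that the numbering is reproducible, and items() is used instead of iteritems() so the sketch also runs under Python 3.

```python
# Stand-in for the contents of a class list file (hypothetical words).
words = "your_entry you_need your_camera you_need".split()

# set() drops the duplicate, sorted() fixes the order, enumerate() numbers the items.
c = dict(enumerate(sorted(set(words))))

# Invert the mapping: value -> key.
c_inv = {value: key for key, value in c.items()}

# Search by value without building the inverse.
key_for_entry = [key for key, value in c.items() if value == 'your_entry'][0]

print(c)
print(c_inv)
print(key_for_entry)
```

Building c_inv once is the better choice when you search by value many times, since the list comprehension scans the whole dictionary on every search.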

Strings: indexing, single, double and triple quotes...

For indexing remember that you can index from the start and from the end of a string:

s='luca'

print s[:2]

print s[:-2]
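A few more slices on the same string, as a sketch of indexing from both ends; the variable names are just for illustration.

```python
s = 'luca'

first_two = s[:2]          # from the start up to, not including, index 2
all_but_last_two = s[:-2]  # everything except the last two characters
last_two = s[-2:]          # negative indexes count from the end
last_char = s[-1]

print(first_two, all_but_last_two, last_two, last_char)
```

For a four-letter string the first two slices happen to give the same result; they differ as soon as the string gets longer.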

Triple quotes are useful when you need to include really long strings (e.g. several paragraphs of informational text): terminating each line with \n\ is annoying, especially if you want to reformat the text occasionally with a powerful text editor like Emacs. For such situations, triple-quoted strings can be used, e.g.

        hello = """

            This string is bounded by triple double quotes (3 times ").
        Unescaped newlines in the string are retained, though \
        it is still possible\nto use all normal escape sequences.

            Whitespace at the beginning of a line is
        significant.  If you need to include three opening quotes
        you have to escape at least one of them, e.g. \""".

            This string ends in a newline.
        """

Monday, May 13, 2013

intersect two lists of files

useful check to ensure that there's zero overlap between training/ test/ val splits:

set(open('testFree.jpgl')) & set(open('trainFree.jpgl'))

If you put it into a bash script, it should look like this:

#!/bin/bash
echo -e "for i in set(open('$1')) & set(open('$2')):print i" | python

credit: @larsmans in stackoverflow
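One thing to watch with the set trick: it compares raw lines, so a last line without a trailing newline will not match the same name followed by a newline in the other file. A minimal sketch that strips each line first is shown here; the function name, file names and contents are made up for the demo.

```python
import os
import tempfile


def overlap(path_a, path_b):
    """Return the entries that appear in both list files, ignoring line endings."""
    with open(path_a) as a, open(path_b) as b:
        entries_a = {line.strip() for line in a if line.strip()}
        entries_b = {line.strip() for line in b if line.strip()}
    return entries_a & entries_b


# Demo: the second file has no newline after its last entry.
workdir = tempfile.mkdtemp()
train_path = os.path.join(workdir, "train.txt")
test_path = os.path.join(workdir, "test.txt")
with open(train_path, "w") as handle:
    handle.write("a.jpg\nb.jpg\n")
with open(test_path, "w") as handle:
    handle.write("c.jpg\nb.jpg")

shared = overlap(train_path, test_path)
raw_shared = set(open(train_path)) & set(open(test_path))
print(shared, raw_shared)
```

An empty result from overlap() means the two splits share no file.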

Wednesday, April 24, 2013

stats about my blog entries

from scipy import signal
from pylab import hstack, zeros, array, plot
import re

# length of every blog entry in the 2012 and 2011 logs
g = []
g11 = []
for bl in re.findall('\\\\begin{BLOG}.*?\\\\end{BLOG}', open('2012_log.tex').read(), re.DOTALL):
    g.append(len(bl))
for bl in re.findall('\\\\begin{BLOG}.*?\\\\end{BLOG}', open('../2011/all_test_.tex').read(), re.DOTALL):
    g11.append(len(bl))
g11.reverse()
g11_p = hstack((zeros(15), g11))

# raw lengths plus a smoothed spline for each year, then the two splines together
plot(range(0, len(g11_p)), g11_p, range(0, len(g11_p)), signal.cspline1d(array(g11_p, dtype='float'), 10.0))
plot(range(0, len(g)), g, range(0, len(g)), signal.cspline1d(array(g, dtype='float'), 10.0))
plot(range(0, len(g11_p)), signal.cspline1d(array(g11_p, dtype='float'), 10.0), range(0, len(g)), signal.cspline1d(array(g, dtype='float'), 10.0))

Monday, April 22, 2013

MATLAB commands in numerical Python

Saturday, April 20, 2013

dealing with matrices "a la matlab"

Given a list :

your_list=[]

you play with your list :

your_list.append(3)

The most important thing is to handle it as a numpy array:

your_array_np = np.array(your_list, dtype='int8')

and then you can do fancy things like indexing it with vectors of indexes, exactly as in MATLAB:

gt=[]

c_idx=1
for l in labl:
    if (len(re.findall(classtxt[c_idx],l))>0):
        gt.append(1)
    else:
        gt.append(0)

gta=np.array(gt,dtype='int8')

# count the true positives: prediction and ground truth agree and the label is 1
len(find(gta[find((gta[:]-int8(sgdMat[:,c_idx]>th))==0)]!=0))

Monday, April 15, 2013

check if file exists

try:
   with open('filename'): pass
except IOError:
   print 'Oh dear.'
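If you only need a yes/no answer and do not want to open the file, os.path.isfile is an alternative. The sketch below uses a throwaway temporary file, and the helper name is made up.

```python
import os
import tempfile


def file_exists(path):
    """True if path points to an existing regular file."""
    return os.path.isfile(path)


# Create a temporary file, check it, remove it, check again.
handle, temp_path = tempfile.mkstemp()
os.close(handle)
present = file_exists(temp_path)
os.remove(temp_path)
absent = file_exists(temp_path)

print(present, absent)
```

The try/open version is still the safer one when you are about to read the file anyway, because the file can disappear between the check and the open.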

Matching strings in python

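Use == (not is, which tests whether two names point to the same object rather than whether the strings are equal), but be careful about:

leading and trailing spaces => use strip()
newlines => use rstrip() and lstrip()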

An even better strategy from stackoverflow

s = s.strip(' \t\n\r')

This will strip any space, \t, \n, or \r characters from both ends of the string.

To debug string matching, always print the strings with a visible end marker:

print "[DEBUG]: \nnew :"+reservation_new_time+"<"+"\nstd :"+reservation_stored+"<"+" cmp: "+ str(cmp(reservation_new_time,reservation_stored))
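Another way to make invisible characters show up is the %r format, which prints the repr of the string with quotes and escape sequences. The values below are invented, and the sketch avoids cmp, which exists only in Python 2.

```python
new_time = "10:30\n"  # e.g. read from a file, still carrying its newline
stored = "10:30"

raw_equal = new_time == stored                    # the newline makes them differ
clean_equal = new_time.strip() == stored.strip()  # equal once both ends are stripped

# %r shows the quotes and the escaped newline explicitly.
debug_line = "[DEBUG] new=%r std=%r equal=%s" % (new_time, stored, raw_equal)
print(debug_line)
```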

Friday, March 22, 2013

sparse matrices from matlab to python

import scipy.io
import scipy.sparse

# load your MATLAB matrix in Python
testFree=scipy.io.loadmat('TestFreeSparse.mat')

#get the matrix
Test=testFree['TestSparse']

#to dense
TestDense=Test.todense()

# get the indexes of non-zero values
I,J= Test.nonzero()

# get the data
V=Test.data

G = scipy.sparse.csr_matrix((V,(I,J)))
G[0,112342]

Monday, March 11, 2013

The 3 things you absolutely need to know about regexp

1. By default all quantifiers are greedy, which means that they match as much text as they can! If you want to match several separate instances of the same pattern instead, add ? after the quantifier to make it non-greedy:

re.findall('.*?', page)

2. By default . does not match newlines, so you can either remove the newlines with

re.findall(r'\\begin{itemize}.*?\\end{itemize}', page.replace('\n', ''))

or pass re.DOTALL:

re.findall(r'\\begin{itemize}.*?\\end{itemize}', page, re.DOTALL)

3. To get a number (float or integer) you can use ? again, but this time to make a character optional:

re.findall(r'\d+\.?\d*', page)
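The three points can be checked on a small made-up page. The sample text and variable names are assumptions for the demo; the patterns are raw strings so that the backslashes reach the regexp engine untouched.

```python
import re

# A tiny LaTeX-like page with two itemize blocks (hypothetical content).
page = ("\\begin{itemize}\n\\item a\n\\end{itemize}\n"
        "text\n"
        "\\begin{itemize}\n\\item b\n\\end{itemize}")

# 1. Greedy .* runs from the first \begin to the last \end: one big match.
greedy = re.findall(r'\\begin{itemize}.*\\end{itemize}', page, re.DOTALL)

# Non-greedy .*? stops at the nearest \end: one match per block.
lazy = re.findall(r'\\begin{itemize}.*?\\end{itemize}', page, re.DOTALL)

# 2. Without re.DOTALL the dot stops at newlines, so nothing matches here.
no_dotall = re.findall(r'\\begin{itemize}.*?\\end{itemize}', page)

# 3. Optional decimal point: matches integers and floats alike.
numbers = re.findall(r'\d+\.?\d*', "3 apples cost 4.50 or 12")

print(len(greedy), len(lazy), len(no_dotall), numbers)
```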