Introduction to Python

DRAFT version 1.6

Prepared for:
Clemson University Cyberinfrastructure Technology Integration (CITI)

Mark Smotherman
June 2011

Table of Contents

Introduction
Data Types
1. Ordered Compound Types
  1. Strings
  2. Lists
2. Unordered Compound Types
  1. Sets
  2. Dictionaries
Statements
Functions
Input/Output
Modules
Functional Style Programming
OO Style Programming
Regular Expressions
OS Module
Bioinformatics Examples
1. Biopython Example
2. Molecular Modeling Tool Kit Example
Scientific Computation Examples
1. NumPy Example
2. SciPy Example
Text Processing Examples
1. BeautifulSoup HTML Parsing Example
2. Natural Language Tool Kit Example
Resources

Introduction

Python programming language

background
- designed by Guido van Rossum ca. 1990
- design goals: extensible, obvious, and fun
- there should be one obvious way to accomplish something (as opposed to Perl's many ways to do the same thing)
- named for Monty Python's Flying Circus
versions
- Python 2.0 introduced in 2000
- Python 3.0 introduced in 2008; it is not backward compatible with Python 2
- version 2.4.3 is installed on Palmetto

Python is an interpreted language

Python can be used in interactive mode

>>> indicates command prompt
... indicates inside-block prompt
press Enter on an empty line to leave the inside-block prompt and execute the block
press Ctrl-D (or Ctrl-Z followed by Enter on Windows) to exit the interpreter

Interpreter Example (the user types the text after each prompt, plus the control character)

[cmd-line-prompt] python
Python 2.4.3 (#1, Nov 11 2010, 13:30:19)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(0,5):
...     print i,
...
0 1 2 3 4
>>> Ctrl-D
[cmd-line-prompt]

Python can also run a saved program file

create and edit a file with suffix .py
run the file by passing its name to the interpreter: python my_prog.py

Command Line Example (the user types the text after each prompt, plus the control character)

[cmd-line-prompt] cat > test.py
for i in range(0,5):
    print i,
Ctrl-D
[cmd-line-prompt] python test.py
0 1 2 3 4
[cmd-line-prompt]

you can also easily turn a Python program into an executable script on a Unix/Linux system

include a special first line that invokes the Python interpreter
change the file permissions to make the file executable

Executable Script Example (the user types the text after each prompt, plus the control character)

[cmd-line-prompt] cat > test
#!/usr/bin/env python
Ctrl-D
[cmd-line-prompt] cat test.py >> test
[cmd-line-prompt] cat test
#!/usr/bin/env python
for i in range(0,5):
    print i,
[cmd-line-prompt] chmod u+x test
[cmd-line-prompt] ./test
0 1 2 3 4
[cmd-line-prompt]

Python supports imperative, functional, and OO programming styles
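no variable declarations

dynamic typing: the type belongs to the object, so a variable can refer to objects of different types at different times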

Dynamic Typing Example

>>> a = 1
>>> type(a)
<type 'int'>
>>> a = 'a'
>>> type(a)
<type 'str'>

each variable name is an object reference (i.e., a key that is used to access a symbol-and-value table)
variable names are case-sensitive

referencing an undefined name is a run-time error (a NameError exception)

Undefined Object Example

>>> z
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'z' is not defined

automatic memory management and garbage collection
block structuring
- blocks are introduced by a colon
- no begin/end keywords or braces - Python uses indentation instead
- you can indent using space(s) or tab(s), but you must be consistent within the block
no semicolons to end statements
- a statement ends at a newline
- a semicolon can be used to separate two statements on the same line
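- a continuation character (backslash) at the end of a line is needed to continue a statement on the next line, except inside parentheses, brackets, or braces, where a statement can span several lines without one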
you can easily incorporate compiled C/C++ modules
many extensions, including Biopython for bioinformatics, NumPy and SciPy for scientific computation, and NLTK for natural language processing
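A small sketch of the statement and block rules above: a semicolon separates two statements on one line, a trailing backslash continues a statement, an expression inside parentheses or brackets spans lines without a backslash, and indentation marks the body of a block. The variable names are illustrative only.

```python
# two statements on one line, separated by a semicolon
x = 1; y = 2

# explicit continuation with a trailing backslash
total = x + \
        y

# implicit continuation: no backslash needed inside brackets or parentheses
values = [1, 2,
          3, 4]
longer_total = (x + y +
                sum(values))

# the indented lines form the bodies of the if and else blocks
if longer_total > total:
    message = 'bigger'
else:
    message = 'not bigger'
```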

Data Types

Overview

primitive types
- int
- float (double precision only)
- bool
- str (but also treated as an ordered compound type)
compound types and containers
- complex number - real and imaginary parts
- ordered types (also called sequences): strings, lists, and tuples
- unordered types: sets and dictionaries
some types are mutable, i.e., values can be changed in-place
some types are immutable
- values cannot be changed in-place
- instead a new object must be allocated as the result of an operation
- e.g., strings and tuples

Ordered Compound Types

Overview

a string is an immutable data type
- uses single or double quotes
- '123abc'
a list is a dynamic array and is a mutable data type
- uses brackets
- [1,2,3]
- a list can contain objects of different data types
- ['a',1,'23bc',3.5]
a tuple is an immutable data type
- uses parentheses
- (1,2,3)
- can think of a tuple as a read-only list
- can also contain objects of different data types
strings, lists, and tuples can be indexed and sliced
- 0-origin indexing
- brackets used for indexing regardless of string, list, or tuple
- slices - a[i:j] a[i:] a[:j] - where i is the beginning index and j is the limit (i.e., one beyond the last element included)
- see examples below
infix operators on strings, lists, and tuples
- + concatenates two ordered types of the same kind
- * repeats an ordered type a given number of times
- but note that a comma will create a tuple
- see examples below
len is a built-in function that returns the length of an ordered compound type
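The overview gives no tuple example, so here is a short sketch showing that a tuple indexes and slices like a list, supports + and *, reports its length with len, and rejects in-place assignment. Catching TypeError is only for demonstration.

```python
t = (1, 'two', 3.0)          # a tuple may mix types
first = t[0]                 # indexing: 1
tail = t[1:]                 # slicing returns a new tuple: ('two', 3.0)
joined = t + (4,)            # (4,) is a one-element tuple; the comma makes it a tuple
doubled = (0, 1) * 2         # (0, 1, 0, 1)
size = len(t)                # 3

# tuples are immutable: item assignment raises TypeError
try:
    t[0] = 99
    changed = True
except TypeError:
    changed = False

# a comma-separated series of values creates (packs) a tuple,
# and a tuple can be unpacked into several names
packed = 5, 6
a, b = packed
```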

Strings

in Python a string is one of the basic data types (str); unlike C and many other languages, there is no separate, more primitive character data type (a single character is simply a string of length 1)
you can use either single quote (') or double quote (")
- 'spam'
- "eggs"
for embedded quotes you can use the alternate quote character or use a backslash escape to include the same quote character
- "Lumberjacks enjoy 'spam and eggs'"
- 'Lumberjacks enjoy \'spam and eggs\''

multi-line strings are allowed with three quotes at beginning and end

"""Mr. Praline: Look, I took the liberty of examining that
parrot when I got it home, and I discovered the only reason
that it had been sitting on its perch in the first place was
that it had been NAILED there."""

note that there is no special, Perl-like interpolation of strings when you use double quotes; use the % formatting operator instead
- 'Lumberjacks also enjoy %s' % 'buttered scones'
string variants
- byte string - b'eggs' (the b prefix is accepted starting with Python 2.6)
- Unicode string - u'spam is pronounced sp\u00E6m'
- raw string - the r prefix turns off backslash-escape processing, e.g., r'\n' is a two-character string (a backslash and an n) equivalent to '\\n', whereas '\n' is a single newline character
strings are immutable so operations on strings must allocate and return new strings
a string is an ordered data type (sequence) and can be indexed and sliced
String Indexing and Slicing Example
>>> s = 'this is a test'
>>> s[3]
's'
>>> s[0:4]
'this'
>>> s[:5]
'this '
>>> s[5:]
'is a test'
indexing from the right uses negative indices
String Negative Indexing Example
>>> s = 'this is a test'
>>> s[-3]
'e'
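A sketch of the string points above: item assignment fails because strings are immutable, methods and slicing build new strings instead, % substitutes values into a format string, and a raw string keeps its backslash. The variable names are illustrative.

```python
s = 'this is a test'

# strings cannot be changed in place; item assignment raises TypeError
try:
    s[0] = 'T'
except TypeError:
    pass

# methods return new strings and leave the original unchanged
t = s.replace('this', 'that')   # 'that is a test'
capitalized = 'T' + s[1:]       # a new string built from pieces: 'This is a test'

# % formatting substitutes values into a format string
msg = 'Lumberjacks also enjoy %s' % 'buttered scones'
pair = '%s has %d characters' % (s, len(s))

# a raw string keeps the backslash: two characters instead of one newline
raw = r'\n'
cooked = '\n'
```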

a string has + and * infix operators

String Infix Operators Example

>>> 'abc' + 'def'
'abcdef'
>>> 'abc'*2
'abcabc'
>>> 'abc','def'
('abc', 'def')

the built-in function len returns the length of a string
String Length Example
>>> s='abcde'
>>> len(s)
5
string methods include:
- count - return the number of occurrences of a substring in a string
- find - return the lowest index in a string where a substring is found
- join - return a string which is the concatenation of the strings in the sequence passed in as an argument, with the separator being the string providing this method (there is a join example shown in the list section below)
- replace - return a copy of a string with all occurrences of a substring replaced by a new substring
String Methods Example
>>> s = 'this is a test'
>>> s.count('is')
2
>>> s.find('a')  # 0-origin
8
>>> s.replace('a','the')
'this is the test'
while running the interpreter, you can use dir(str) to show the attributes and methods for the string data type, and you can use help(str) to review the expected arguments

Lists

a list is an ordered data type (sequence) and can be indexed and sliced
List Indexing and Slicing Example
>>> a = [1,2,3,4]
>>> a[1]
2
>>> a[1:1]
[]
>>> a[1:2]
[2]
>>> a[1:3]
[2, 3]
>>> a[:3]
[1, 2, 3]
indexing from the right uses negative indices
List Negative Indexing Example
>>> a = [1,2,3,4]
>>> a[-2]
3

a list has + and * infix operators

List Infix Operators Example

>>> a = [1,2]
>>> b = [3,4]
>>> a+b
[1, 2, 3, 4]
>>> a*2
[1, 2, 1, 2]
>>> a,b
([1, 2], [3, 4])

the built-in function len returns the length of a list
List Length Example
>>> a = [1,2,3,4]
>>> len(a)
4

since a list is a mutable data type, you can assign to slices, possibly changing the length of the list

List Slice Assignment Example

>>> a = [1,2,3,4]
>>> a[2] = [5,6,7]  # single element
>>> a
[1, 2, [5, 6, 7], 4]
>>> len(a)
4
>>> b = [1,2,3,4]
>>> b[2:3] = [5,6,7]  # slice
>>> b
[1, 2, 5, 6, 7, 4]
>>> len(b)
6
>>> b[1:4] = []
>>> b
[1, 7, 4]

del can be used to delete individual list entries or slices
List Delete Element Example
>>> a = [1,2,3,4]
>>> del a[2]
>>> a
[1, 2, 4]
>>> del a[0:2]
>>> a
[4]
list methods include:
- append - add object to end of list
- count - return number of times an object appears in the list
- extend - add multiple objects to end of list
- index - return index of first occurrence of object
- insert - insert object before the given index position
- pop - remove and return the last object (optionally: the object at a given index)
- remove - remove first instance of object
- reverse - reverse the order of the list in place
- sort - sort the list in place
List Methods Example
>>> a = [1,'x',3.0]
>>> a.append('y')
>>> a
[1, 'x', 3.0, 'y']
>>> a.pop()
'y'
>>> a
[1, 'x', 3.0]
>>> a.reverse()
>>> a
[3.0, 'x', 1]
while running the interpreter, you can use dir(list) to show the attributes and methods for the list data type, and you can use help(list) to review the expected arguments
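The example above shows only append, pop, and reverse; this sketch walks through the other methods in the list, with the list contents after each step noted in the comments.

```python
a = [3, 1, 2]
a.append(5)            # [3, 1, 2, 5]
a.extend([4, 1])       # add several objects: [3, 1, 2, 5, 4, 1]
a.insert(0, 9)         # insert before index 0: [9, 3, 1, 2, 5, 4, 1]
ones = a.count(1)      # 1 appears twice: 2
pos = a.index(5)       # index of the first 5: 4
a.remove(1)            # removes only the first 1: [9, 3, 2, 5, 4, 1]
last = a.pop()         # removes and returns 1: [9, 3, 2, 5, 4]
second = a.pop(1)      # removes and returns a[1], which is 3: [9, 2, 5, 4]
a.sort()               # sorts in place (and returns None): [2, 4, 5, 9]
```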

you can split a string into a list of substrings, and you can join a list of strings into a single string using a separator string

Split and Join Example

>>> a = 'this is a test of split and join'
>>> b = a.split()
>>> b
['this', 'is', 'a', 'test', 'of', 'split', 'and', 'join']
>>> c = '-'.join(b)
>>> c
'this-is-a-test-of-split-and-join'
>>> '; '.join(['1','2','3'])
'1; 2; 3'

nested lists can be used for n-dimensional arrays (but note that the NumPy package described below provides a more efficient implementation of n-dimensional arrays)
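A sketch of a 3x3 matrix stored as a list of rows and indexed as m[row][col]. The list comprehension builds a separate list for each row; the tempting shortcut [[0]*3]*3 repeats a reference to a single row list, so one assignment changes every row.

```python
rows, cols = 3, 3

# build each row as a separate list (one new list per row)
m = [[0] * cols for r in range(rows)]
m[1][2] = 7            # row 1, column 2

# the shortcut below repeats ONE row object three times (aliases)
aliased = [[0] * cols] * rows
aliased[1][2] = 7      # changes column 2 of every row

# transpose with nested list comprehensions: t[c][r] == m[r][c]
t = [[m[r][c] for r in range(rows)] for c in range(cols)]
```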

Unordered Compound Types

Overview

a set is an unordered collection of elements
a dictionary is an unordered collection of key/value pairs (also known as an associative array)

Sets

a set is mutable and can have members of different immutable types, e.g., set([1,'two',3.0])
a frozenset is an immutable set
set operators include:
- s1 - s2: set difference (members in s1 but not in s2)
- s1 | s2: set union
- s1 & s2: set intersection
- s1 ^ s2: symmetric difference (members in exactly one of the two sets)
Set Operations Example
>>> s1 = set(['a','b'])
>>> s2 = set(['b','c'])
>>> s1 | s2
set(['a', 'c', 'b'])
>>> s1 & s2
set(['b'])
>>> s1 ^ s2
set(['a', 'c'])
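The example above omits set difference and the operations that change a set. This sketch shows difference, membership, adding and discarding members, a frozenset used as a dictionary key, and the common use of set to remove duplicates.

```python
s1 = set(['a', 'b'])
s2 = set(['b', 'c'])

only_in_s1 = s1 - s2          # set(['a'])
has_b = 'b' in s1             # membership test: True

s1.add('d')                   # sets are mutable
s1.discard('z')               # discard ignores a missing member (remove would raise KeyError)

frozen = frozenset(['x', 'y'])
lookup = {frozen: 'immutable, so usable as a dictionary key'}

unique = sorted(set([3, 1, 3, 2, 1]))   # duplicates removed, then sorted: [1, 2, 3]
```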

Dictionaries

a dictionary is an associative array, that is, a collection of key and value pairs that can be accessed using the keys
keys must be immutable types and unique, but keys of different types can be used in the same dictionary
values can be of mutable or immutable type and can also have different types
insert a new entry or change an existing key and value pair by assigning dictionary_name[key] = value
delete an entry by using del dictionary_name[key]

use the in operator to test existence of a key in a dictionary

Dictionary Example

>>> assoc_array_names = { 'Snobol4':'table', 'Perl':'hash', 'C++':'map' }
>>> assoc_array_names['Snobol4']
'table'
>>> assoc_array_names['Python']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'Python'
>>> 'Python' in assoc_array_names
False
>>> 'C++' in assoc_array_names
True
>>> assoc_array_names['Python'] = 'dictionary'
>>> del assoc_array_names['Snobol4']
>>> assoc_array_names
{'Python': 'dictionary', 'C++': 'map', 'Perl': 'hash'}
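A sketch of a few more dictionary methods: get returns a default instead of raising KeyError, keys and items expose the contents (in no guaranteed order, so sorted is used for a predictable result), and get also supports the common counting idiom.

```python
assoc_array_names = {'Snobol4': 'table', 'Perl': 'hash', 'C++': 'map'}

# get returns the second argument when the key is missing
name = assoc_array_names.get('Python', 'unknown')   # 'unknown'

# sorted() gives a predictable order for keys and (key, value) pairs
languages = sorted(assoc_array_names.keys())        # ['C++', 'Perl', 'Snobol4']
pairs = sorted(assoc_array_names.items())

# counting idiom: start from 0 the first time a key is seen
counts = {}
for word in 'spam eggs spam spam'.split():
    counts[word] = counts.get(word, 0) + 1
```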

Statements

Overview

# starts a comment, which continues until the end of the current line
a docstring is a string literal that appears as the first statement in a function body (also in a class or module); it documents the function and is available as its __doc__ attribute
pass is the no-operation statement
assignment statements
- a = 1
- a,b = 3,4 (multiple assignments)
- a = b = 6
- unlike C, assignment statements do not return values

Deep Versus Shallow Copies

assigning a list to another name does not copy the list at all; both names then refer to the same list object (an alias)

this means that, without special care, changes made through one name will appear through every other name for the same list

Aliasing Example

>>> a = 1
>>> b = a
>>> a
1
>>> b
1
>>> a = 2
>>> a  # b is unchanged
2
>>> b
1
>>> a = [1,2]
>>> b = a
>>> a[0] = 3  # changes b[0] also!
>>> a
[3, 2]
>>> b
[3, 2]

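use deepcopy from the copy module to allocate new memory for an independent copy of the list; deepcopy also copies any nested objects, whereas a slice such as a[:] or copy.copy(a) makes a shallow copy that duplicates only the top-level list

Deep Copy Example

>>> a = [1,2]
>>> b = a
>>> from copy import deepcopy
>>> c = deepcopy(a)
>>> a[0] = 3
>>> a
[3, 2]
>>> b  # points to same place as a
[3, 2]
>>> c  # unchanged; different memory
[1, 2]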
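The difference between a shallow and a deep copy only shows up with nested objects. In this sketch, an alias, a shallow copy, and a deep copy are made of a list of lists, and then the original is modified.

```python
import copy

a = [[1, 2], [3, 4]]
alias = a                  # no copy: another name for the same list
shallow = copy.copy(a)     # new outer list, but the inner lists are shared (same as a[:])
deep = copy.deepcopy(a)    # new outer list AND new inner lists

a[0][0] = 99               # modify an inner list through a
a.append([5, 6])           # modify the outer list through a

# alias   -> [[99, 2], [3, 4], [5, 6]]  (sees everything)
# shallow -> [[99, 2], [3, 4]]          (sees the inner change only)
# deep    -> [[1, 2], [3, 4]]           (sees nothing)
```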

Conditional Statements

keywords are if, elif, and else

Conditional Statement Example

>>> a = 55
>>> if a < 0:
...     sign = -1
... elif a == 0:
...     sign = 0
... else:
...     sign = 1
...
>>> sign
1

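== compares by value
is compares by reference (whether two names refer to the same object)
in tests membership
not negates a condition; not in and is not are the negated forms of in and is
comparisons can be chained, e.g., a < b < c means a < b and b < c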
comparisons can be joined by and, or
compound comparisons use short-circuit evaluation
lists and strings are compared in lexicographic order
'ABC' < 'C' < 'Pascal' < 'Python'
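A sketch of the comparison rules above: == versus is for two equal but distinct lists, a chained comparison, not in, and short-circuit evaluation preventing a division by zero.

```python
a = [1, 2]
b = [1, 2]
c = a

same_value = (a == b)      # True: equal contents
same_object = (a is b)     # False: two different list objects
alias = (c is a)           # True: c refers to the same object as a

x = 5
in_range = 0 < x < 10      # chained: same as 0 < x and x < 10
not_member = 3 not in a    # True

# short-circuit: the division is never evaluated when d is 0
d = 0
safe = d != 0 and 10 / d > 1   # False, and no ZeroDivisionError

ordered = 'ABC' < 'C' < 'Pascal' < 'Python'   # True (lexicographic order)
```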

Looping

a for loop can walk a list

Loop Iterating Through List Example

>>> a = [1,2,3]
>>> for x in a:
...     print x,
...
1 2 3

range generates a list that can be used for iteration
- optional start value (default is 0)
- limit value (not included in the list)
- optional increment value (default is 1)
Range Generation Example
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(5, 10)
[5, 6, 7, 8, 9]
>>> range(0, 10, 3)
[0, 3, 6, 9]
xrange is preferred when iterating over a large range since it generates the values as needed rather than initially storing all of them in one big list

a for loop can optionally have an else clause, which executes when the loop finishes normally, i.e., without being exited by break

For Loop with Else Example

>>> for i in range(3,6):
...     print 'in loop with i = ',i
... else:
...     print 'outside loop with i = ',i
...
in loop with i =  3
in loop with i =  4
in loop with i =  5
outside loop with i =  5
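The else clause matters mainly when the loop contains a break. A minimal sketch, assuming we want the first multiple of 7 in a list; the function name first_multiple_of_7 is hypothetical.

```python
def first_multiple_of_7(numbers):
    """Return the first multiple of 7 in numbers, or None if there is none."""
    for n in numbers:
        if n % 7 == 0:
            found = n
            break          # leaving via break skips the else clause
    else:
        found = None       # runs only when the loop was not exited by break
    return found
```

For example, first_multiple_of_7([3, 14, 21]) returns 14, while first_multiple_of_7([1, 2, 3]) returns None.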

a while loop executes its body repeatedly as long as its condition is true
break exits the innermost enclosing loop (one level only)
continue skips the rest of the current iteration and advances to the next iteration
list comprehension generates a list with given properties
List Comprehension Example [from Rossum tutorial]
>>> [str(round(355/113.0, i)) for i in range(1,6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']
enumerate provides both index value and object from list
Enumerate Example [from Rossum tutorial]
>>> for i,v in enumerate(['tic', 'tac', 'toe']):
...     print i,v
...
0 tic
1 tac
2 toe
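A sketch combining while, continue, and break, as described above: add up a list of numbers, skipping zeros and stopping at the first negative value. The function name sum_until_negative is hypothetical.

```python
def sum_until_negative(values):
    """Add up values, skipping zeros and stopping at the first negative value."""
    total = 0
    i = 0
    while i < len(values):   # repeat as long as the condition is true
        v = values[i]
        i += 1               # advance before any continue, so the loop always progresses
        if v == 0:
            continue         # skip the rest of this iteration
        if v < 0:
            break            # leave the loop entirely
        total += v
    return total
```

For example, sum_until_negative([1, 0, 2, -1, 5]) returns 3 because the loop stops at -1.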

zip pairs up corresponding objects from two (or more) separate lists

Zip Example [from Rossum tutorial]

>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print 'What is your %s? It is %s.' % (q,a)
...
What is your name? It is lancelot.
What is your quest? It is the holy grail.
What is your favorite color? It is blue.

iteritems allows iteration over the key and value pairs in a dictionary

Dictionary Traversal Example [from Rossum tutorial]

>>> knights = {'gallahad':'the pure', 'robin':'the brave'}
>>> for k,v in knights.iteritems():
...     print k,v
...
gallahad the pure
robin the brave

Functions

Overview

functions are defined by def

Fibonacci Procedure and Function Examples [from Rossum tutorial]

>>> def fib(n):    # write Fibonacci series up to n
...     """Print a Fibonacci series up to n."""
...     a,b = 0,1
...     while b < n:
...         print b,
...         a,b = b,a+b
...
>>> fib(2000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
>>> def fib2(n):    # return Fibonacci series up to n
...     """Return a list containing the Fibonacci series up to n."""
...     result = []
...     a,b = 0,1
...     while b < n:
...         result.append(b)
...         a,b = b,a+b
...     return result
...
>>> f100 = fib2(100)
>>> f100
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

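arguments are passed as call-by-object-reference: the function receives a reference to the caller's object, so rebinding the parameter inside the function does not affect the caller (which looks like call-by-value), while in-place changes to a mutable argument are visible to the caller
Argument Passing Example
>>> def f(x):
...     x = 3
...
>>> a = 1
>>> f(a)
>>> a
1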
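The example above only rebinds the parameter. This sketch contrasts rebinding with mutating a list argument in place, which the caller does see; the function names are hypothetical.

```python
def rebind(x):
    x = [0]            # rebinds the local name only: the caller is not affected

def mutate(x):
    x.append(0)        # changes the shared object itself: the caller sees it

a = [1, 2]
rebind(a)              # a is still [1, 2]
b = [1, 2]
mutate(b)              # b is now [1, 2, 0]
```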

you can use a global declaration in a function (before the name is used) to allow assignment to a global variable; reading a global variable does not require the declaration

Global Variable Example

>>> a = 55
>>> def f2(x):
...     a = x
...
>>> def f3(x):
...     global a
...     a = x
...
>>> a
55
>>> f2(22)
>>> a
55
>>> f3(33)
>>> a
33

you can specify default argument values, but note that defaults are evaluated once, when the def statement is executed, rather than at each function call

Default Arguments Example

>>> a = 1
>>> b = 2
>>> def f(arg1=20,arg2=b):
...     print arg1,arg2
...
>>> f(a,2)
1 2
>>> f(3)
3 2
>>> b = 4
>>> f(3)
3 2
>>> f
<function f at 0x2ac232a94938>
>>> f()
20 2
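Because a default value is created only once, a mutable default such as a list is shared by every call that relies on it. A sketch of the problem and of the common workaround using None as a sentinel; the function names are hypothetical.

```python
def append_bad(item, items=[]):
    items.append(item)      # the same default list is reused on every call
    return items

def append_good(item, items=None):
    if items is None:       # create a fresh list on each call instead
        items = []
    items.append(item)
    return items

first = append_bad(1)       # the default list, now [1]
second = append_bad(2)      # the SAME list, now [1, 2]
fresh1 = append_good(1)     # [1]
fresh2 = append_good(2)     # [2]
```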

keyword arguments and variable-length argument lists are also supported
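A sketch of both features: a parameter written *args collects extra positional arguments into a tuple, **kwargs collects extra keyword arguments into a dictionary, and any parameter can be passed by keyword. The function describe is hypothetical.

```python
def describe(name, *args, **kwargs):
    """Join name, any extra positional arguments, and any keyword arguments."""
    parts = [name]
    for a in args:                 # args is a tuple of the extra positional arguments
        parts.append(str(a))
    for key in sorted(kwargs):     # kwargs is a dictionary of the extra keyword arguments
        parts.append('%s=%s' % (key, kwargs[key]))
    return ' '.join(parts)

plain = describe('parrot')                              # 'parrot'
extra = describe('parrot', 1, 2)                        # 'parrot 1 2'
keywords = describe(name='parrot', state='resting')     # 'parrot state=resting'
mixed = describe('parrot', 3, state='nailed', age=2)    # 'parrot 3 age=2 state=nailed'
```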

generator functions produce a series of values on demand for iteration; use yield rather than return to hand back each value

Generator Function Example [from Scullin tutorial]

>>> def squares(lastterm):
...     for n in range(lastterm):
...         yield n**2
...
>>> for i in squares(4):
...     print i,
...
0 1 4 9
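As a sketch, the Fibonacci function shown earlier can also be written as a generator, which produces one value per request instead of building the whole list; list() collects all the values. A generator expression (the parenthesized form below) is a compact alternative for simple cases.

```python
def fib_gen(n):
    """Yield the Fibonacci numbers less than n, one at a time."""
    a, b = 0, 1
    while b < n:
        yield b            # execution pauses here until the next value is requested
        a, b = b, a + b

first_few = list(fib_gen(100))            # same values as fib2(100)
squares_gen = (x * x for x in range(4))   # a generator expression
```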

Input/Output

Overview

terminal input
- raw_input returns a string that can then be converted into the appropriate type
- optional argument is the prompting string
- i = int(raw_input('prompt: '))
output
- print can be used with or without formatting
  - print 'the values of i and j are',i,'and',j
  - print 'the values of i and j are %d and %d' % (i,j)
- C-like printf formatting is available that provides field widths, number of digits in fraction, etc.
- a trailing comma suppresses the newline
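A few % format specifications, as a sketch of the field widths and fraction digits mentioned above; the variable names are illustrative. Each formatted string could be passed to print.

```python
i, j = 7, 42
x = 3.14159

plain = 'the values of i and j are %d and %d' % (i, j)
widths = '[%5d] [%-5d] [%05d]' % (i, i, i)   # right-justified, left-justified, zero-padded
fixed = '%.2f' % x                            # 2 digits after the decimal point: '3.14'
padded = '%8.3f' % x                          # field width 8, 3 decimals: '   3.142'
sci = '%e' % 12345.678                        # exponential notation: '1.234568e+04'
table_row = '%-10s|%6.1f' % ('spam', 2.5)     # 'spam      |   2.5'
```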

reading a file

the list returned by readlines can be iterated over in a for loop
for large files use xreadlines (or iterate over the file object itself, e.g., for line in f:) instead of readlines, since readlines reads the entire file into memory at one time
readline provides one line at a time

File Input Example

[cmd-line-prompt] cat numbers.txt
1
2.3
4.0
[cmd-line-prompt] cat numbers.py
# for loop over lines in file
data = open('numbers.txt','r')
for d in data.xreadlines():
    # remove trailing newline from d
    e = d.rstrip()
    print e
data.close()
# while loop version with line-at-a-time reads
data = open('numbers.txt','r')
d = data.readline()
while d:
    print d.rstrip()
    d = data.readline()
data.close()
[cmd-line-prompt] python numbers.py
1
2.3
4.0
1
2.3
4.0

writing a file

File Output Example

[cmd-line-prompt] cat numbers2.py
# write one string at a time to the file
data = open('numbers2.txt','w')
for i in range(3):
    s = 'record ' + str(i) + '\n'
    data.write(s)
data.close()
# create a list and then write to file
data = open('numbers3.txt','w')
lines = []
for i in range(3):
    s = 'record ' + str(i) + '\n'
    lines.append(s)
data.writelines(lines)
data.close()
[cmd-line-prompt] python numbers2.py
[cmd-line-prompt] cat numbers2.txt
record 0
record 1
record 2
[cmd-line-prompt] cat numbers3.txt
record 0
record 1
record 2

the Python library provides the sys module, which includes definitions of standard input (stdin) and standard output (stdout), as well as command line arguments (argv).
refer to the library documentation of the sys module for more details
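A minimal sketch of a script that adds up the numbers given on its command line. sys.argv[0] is the script name and the remaining entries are the arguments, all as strings. The work is done in a function that takes the argument list, so it can be tried without a real command line; the name sum_args is hypothetical.

```python
import sys

def sum_args(argv):
    """Convert each command-line argument after the script name to float and add them."""
    total = 0.0
    for arg in argv[1:]:           # argv[0] is the script name
        total += float(arg)
    return total

if __name__ == '__main__':
    # run as a script, e.g.: python sum_args.py 1 2.5 3
    sys.stdout.write('%g\n' % sum_args(sys.argv))
```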

Modules

Overview

assume you have defined functions fn1 and fn2 in a file named my_functions.py; this defines a module named my_functions
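For concreteness, here is one possible my_functions.py. The bodies of fn1 and fn2 are hypothetical; only the message fn1 prints is taken from the interpreter example. The final if block runs only when the file is executed directly, not when it is imported.

```python
# my_functions.py: hypothetical contents of the module used in these examples
"""Two small example functions."""

def fn1(x):
    """Print a message showing the argument (matches the example output)."""
    print('call to fn1 with value %s' % x)

def fn2(x):
    """Return twice the argument (a hypothetical second function)."""
    return 2 * x

if __name__ == '__main__':
    # executed only by "python my_functions.py", not by "import my_functions"
    fn1(fn2(1))
```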

you can access the functions after importing the module name

Module Import Example

>>> import my_functions
>>> dir(my_functions)
['__builtins__', '__doc__', '__file__', '__name__', 'fn1', 'fn2']
>>> my_functions.fn1(1)
call to fn1 with value 1

you can assign a local name to be able to directly call the function: fn1 = my_functions.fn1
alternatively you can import one, some, or all names directly
- from my_functions import fn1,fn2
- use * for all: from my_functions import *
note that assigning local names and importing names from modules will override the previous definitions of these names

Functional Style Programming

Overview

filter(fn,seq) - returns list of objects in sequence for which fn(object) is true
map(fn,seq) - returns list of objects returned by the function when applied to each object in the sequence
reduce(fn,seq) - returns the single object produced by successive application of fn to objects in the sequence
reduce(fn,seq,start) - gives a starting value, which is returned as the reduction of an empty sequence
Reduction Example [from Rossum tutorial]
>>> def sum(seq):
...     def add(x,y): return x+y
...     return reduce(add, seq, 0)
...
>>> sum(range(1, 11))
55
>>> sum([])
0
lambda allows short anonymous functions to be defined without using a function name
List Sorting Example with Key Function Defined Using Lambda Form
>>> a = [[1,5],[2,3],[3,6]]
>>> sorted(a,key = lambda x:x[1])
[[2, 3], [1, 5], [3, 6]]
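The examples above show only reduce and lambda; this sketch applies filter, map, and reduce with lambda functions and then writes the same filter and map steps as list comprehensions. The list() calls are harmless in Python 2 and needed in Python 3, where filter and map return iterators; importing reduce from functools assumes Python 2.6 or later (in older Python 2, reduce is simply a built-in).

```python
from functools import reduce   # a built-in in Python 2; functools.reduce in 2.6+ and 3

nums = range(1, 11)

evens = list(filter(lambda n: n % 2 == 0, nums))    # [2, 4, 6, 8, 10]
squares = list(map(lambda n: n * n, nums))          # [1, 4, 9, ..., 100]
product = reduce(lambda x, y: x * y, nums, 1)       # 10! = 3628800

# list comprehensions express the same filter and map steps
evens_lc = [n for n in nums if n % 2 == 0]
squares_lc = [n * n for n in nums]
```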

OO Style Programming

Overview

Python provides classes for object-oriented programming, similar to C++

Simple Class Example

>>> class stack_example:
...     def __init__(self):
...         self._list = []
...     def push(self,item):
...         self._list.append(item)
...     def pop(self):
...         if len(self._list) > 0:
...             return self._list.pop()
...         else:
...             return 0
...     def empty(self):
...         return len(self._list) == 0
...
>>> x = stack_example()
>>> x.empty()
True
>>> x.push(1)
>>> x.empty()
False
>>> x.pop()
1

the following example shows operator overloading and also prints the id of the object that responds to each method call

Class Example with + Operator Overloading

[cmd-line-prompt] cat verbose_class.py
object_counter = 1
class verbose_example:
    def __init__(self,data=0):
        global object_counter
        self._value = data
        self._object_id = object_counter
        object_counter += 1
        print ' constructor for object',self._object_id,
        print 'with value',self._value
    def __repr__(self):
        print ' string representation for object',self._object_id
        return repr(self._value)
    def __add__(self,other):
        print ' addition for object',self._object_id
        try:
            print ' (1) first assume addition is to another object'
            return verbose_example(self._value + other._value)
        except AttributeError:
            print ' (2) if not, assume addition is to constant'
            return verbose_example(self._value + other)
    # assignment operator cannot be overloaded
    def __del__(self):
        print ' destructor for object',self._object_id
print '-- instantiate y and z --'
y = verbose_example()
z = verbose_example(10)
print '-- add y and z and assign to x: x = y + z --'
x = y + z
print '-- print x'
print x
print '-- add x and 20 and assign to x: x = x + 20 --'
x = x + 20
print '-- print x --'
print x
print '-- assign x to a: a = x (creates alias to same object) --'
a = x
print '-- print a --'
print a
print '-- end program --'
[cmd-line-prompt] python verbose_class.py
-- instantiate y and z --
 constructor for object 1 with value 0
 constructor for object 2 with value 10
-- add y and z and assign to x: x = y + z --
 addition for object 1
 (1) first assume addition is to another object
 constructor for object 3 with value 10
-- print x
 string representation for object 3
10
-- add x and 20 and assign to x: x = x + 20 --
 addition for object 3
 (1) first assume addition is to another object
 (2) if not, assume addition is to constant
 constructor for object 4 with value 30
 destructor for object 3
-- print x --
 string representation for object 4
30
-- assign x to a: a = x (creates alias to same object) --
-- print a --
 string representation for object 4
30
-- end program --
 destructor for object 4
 destructor for object 1
 destructor for object 2

derived classes inherit the variables and methods of the parent

Inheritance Example

[cmd-line-prompt] cat inherit.py
class parent:
    def __init__(self,data=0):
        self._value = data
    def __repr__(self):
        return repr(self._value)
    def reset(self):
        self._value = 0
class child(parent):
    def update(self,data):
        self._value += data
x = child(10)
print x
x.update(1)
print x
x.reset()
print x
[cmd-line-prompt] python inherit.py
10
11
0
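A sketch extending the inheritance example: a derived class can override an inherited method (here __init__ and reset) and still call the parent's version explicitly through the parent class name. The class noisy_child and its _label attribute are hypothetical additions, not part of the original example.

```python
class parent:
    def __init__(self, data=0):
        self._value = data
    def __repr__(self):
        return repr(self._value)
    def reset(self):
        self._value = 0

class noisy_child(parent):
    def __init__(self, data=0, label='child'):
        parent.__init__(self, data)                # run the parent's constructor first
        self._label = label
    def reset(self):                               # overrides parent.reset
        parent.reset(self)                         # reuse the parent's behavior ...
        self._label = self._label + ' (reset)'     # ... then add to it

x = noisy_child(10, 'sample')
before = repr(x)      # '10'
x.reset()
after = repr(x)       # '0'
```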

refer to the Python tutorial for more details

Regular Expressions

Overview

the Python library provides the re module to support the use of regular expressions in string searching, replacing, and splitting, similar to the facilities available directly in Perl
re methods include:
- compile - create a pattern object for matching
- match - if the pattern matches at the beginning of a string, return a match object (otherwise None)
- search - scan the whole string and return a match object for the first location where the pattern matches (otherwise None)
- split - split a string into a list based on pattern matching
- sub - return a string with replacements
use the re.split method instead of the basic string split when there are multiple separator characters
special characters for regular expressions include:
- (default matching rules are noted; there are special flags to modify the rules)
- . - match any character except newline
- ^ - match start of string
- $ - match end of string
- * - match zero or more occurrences
- + - match one or more occurrences
- ? - match zero or one occurrence
- | - or
- [ ] - brackets indicate matching any one of a set of characters, e.g., [a-z] matches any lower case letter
- [^ ] - brackets with a caret as the first character match any character not in the set
- ( ) - parentheses indicate a group of matched characters that can be referenced later
- \ - escape character or when followed by a number indicates a previously-matched parenthesized group
- \b - match the empty string at the beginning or end of a word
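A short sketch of the re functions listed above on one sample string: match versus search, retrieving parenthesized groups from a match object, splitting on several separator characters, and substitution.

```python
import re

text = 'spam, eggs;  ham'

# match only succeeds at the start of the string; search looks anywhere
m1 = re.match(r'eggs', text)            # None
m2 = re.search(r'eggs', text)           # a match object
where = m2.start()                      # 6: index where the match begins

# parentheses capture parts of the match as numbered groups
m3 = re.search(r'(\w+), (\w+)', text)
first, second = m3.group(1), m3.group(2)    # 'spam', 'eggs'

# split on any run of commas, semicolons, or spaces
words = re.split(r'[,; ]+', text)       # ['spam', 'eggs', 'ham']

# sub replaces every match
vowelless = re.sub(r'[aeiou]', '', text)
```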

regular expressions are very powerful but can be hard to understand; the following example matches pairs of repeated lower-case words and removes the second occurrence of each repeated lower-case word

the leading r indicates a raw string so that you do not have to double-escape the backslashes
\b matches the empty string at a word boundary, here the beginning of a word
[a-z] matches a single lower case letter
+ indicates one or more lower case letters match
the parentheses delimit the matching letters as a group
\1 matches the first group and is used in the initial pattern to indicate a repeat of the matched group and is also used in the replacement string
the first re.sub statement fails to find any repeated words, and no replacement is made
the second and third re.sub statements each find a pair of repeated 'a's and remove one occurrence of 'a'; the third statement demonstrates that matching does not start anew from the beginning of the string after a replacement but instead resumes just past the end of the previous match
the fourth re.sub statement finds two pairs of repeated 'a's and removes one 'a' from each pair
note that the pattern would not match "It's It's" since there are uppercase letters and single quotes in these two words, so the pattern would need to be (\b[a-zA-Z\']+) instead

Regular Expression Example

>>> import re
>>> re.sub(r'(\b[a-z]+) \1', r'\1', "It's a fair cop")
"It's a fair cop"
>>> re.sub(r'(\b[a-z]+) \1', r'\1', "It's a a fair cop")
"It's a fair cop"
>>> re.sub(r'(\b[a-z]+) \1', r'\1', "It's a a a fair cop")
"It's a a fair cop"
>>> re.sub(r'(\b[a-z]+) \1', r'\1', "It's a a a a fair cop")
"It's a a fair cop"

if the same pattern is going to be used repeatedly, it is faster to compile the pattern once and use the resulting pattern object

Compiled Regular Expression Example

>>> import re
>>> repeated_word_pattern = re.compile(r'(\b[a-z]+) \1')
>>> repeated_word_pattern.sub(r'\1',"It's a a fair cop")
"It's a fair cop"
>>> repeated_word_pattern.sub(r'\1',"It's a a a fair cop")
"It's a a fair cop"
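One refinement worth considering: the pattern above does not require a word boundary after \1, so it also matches a word followed by a longer word that begins with it, and "the theory" becomes "theory". Adding \b after \1 restricts matches to whole repeated words (moving the leading \b outside the group does not change its meaning). This stricter variant is our own sketch, not part of the original example.

```python
import re

loose = re.compile(r'(\b[a-z]+) \1')
strict = re.compile(r'\b([a-z]+) \1\b')   # the repeat must also end at a word boundary

damaged = loose.sub(r'\1', 'the theory')         # 'theory': a false match was removed
kept = strict.sub(r'\1', 'the theory')           # 'the theory': unchanged
fixed = strict.sub(r'\1', "It's a a fair cop")   # "It's a fair cop"
```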

refer to the library documentation of the re module for more details

OS Module

Overview

Python supports a number of calls to the operating system through the os module
examples
- os.name is a string naming the operating-system-dependent module that import os has loaded; on Linux systems this is 'posix'
- os.getenv() reads an environment variable; os.putenv() sets one for child processes, but it does not update os.environ, so assigning to os.environ[name] is usually preferred
- a full set of process management and file operations is provided, including os.walk(), which generates the directory and file names in a directory tree
- you can pass command lines to the shell by using os.system(command), which calls the standard C function system() and returns the command's exit status
Shell Command Example
>>> import os
>>> os.name
'posix'
>>> os.system('echo "this is a test"')
this is a test
0
>>> status = os.system('echo "this is a test"')
this is a test
>>> status
0
refer to the library documentation of the os module for more details
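A sketch of os.walk and os.getenv. To keep it self-contained, it builds a small throwaway directory tree with tempfile and removes it afterwards with shutil; os.path.relpath assumes Python 2.6 or later.

```python
import os
import shutil
import tempfile

# build a small directory tree in a temporary location
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, 'sub'))
for name in [os.path.join(top, 'a.txt'), os.path.join(top, 'sub', 'b.txt')]:
    f = open(name, 'w')
    f.write('data\n')
    f.close()

# os.walk yields (directory path, subdirectory names, file names) for each directory
found = []
for dirpath, dirnames, filenames in os.walk(top):
    for fname in filenames:
        found.append(os.path.relpath(os.path.join(dirpath, fname), top))
found.sort()

# read an environment variable, with a default when it is not set
home = os.getenv('HOME', 'not set')

shutil.rmtree(top)   # clean up the temporary tree
```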

Bioinformatics Examples

the following examples show the use of Python for different types of computations important for bioinformatics
strands of DNA are typically represented as strings in Python
- DNA is composed of four primary nucleotide bases: cytosine, guanine, adenine, and thymine; RNA contains uracil instead of thymine
- these bases are typically designated by the upper case letters CGATU or the lower case letters cgatu
- bioinformatics files can contain genetic sequences in either upper or lower case, and sometimes even in mixed case where the case difference may be significant (e.g., indicating more or less confidence in the identity of the base)
- while a Python list could be used to hold each base of a genetic sequence as a separate element, it is more common to see a Python string used to represent a sequence

the following example shows the calculation of GC content (or guanine-cytosine content), which is the percentage of nitrogenous bases in a DNA molecule that are either guanine or cytosine; GC content varies along a DNA strand, and large regions of relatively uniform GC content are called isochores

GC Calculation Example [from Xie tutorial]

>>> dna = "gcatgacgttattacgactctg"
>>> len(dna)
22
>>> dna.count("a")
5
>>> gc = (100 * (dna.count("c")+dna.count("g"))) / float(len(dna))
>>> "%.2f" % gc
'45.45'
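The same calculation as a reusable function, sketched under the assumption that upper- and lower-case bases should be counted alike (the text above notes that files may mix cases). Multiplying by 100.0 keeps the division in floating point, avoiding Python 2's integer division; the name gc_content is hypothetical.

```python
def gc_content(dna):
    """Return the percentage of bases in dna that are G or C (case-insensitive)."""
    if not dna:
        return 0.0                      # avoid dividing by zero for an empty string
    seq = dna.lower()                   # treat upper- and lower-case bases alike
    gc = seq.count('g') + seq.count('c')
    return 100.0 * gc / len(seq)

percent = '%.2f' % gc_content('gcatgacgttattacgactctg')   # '45.45'
mixed = gc_content('GCatGC')                               # 4 of 6 bases
```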

the following example shows a reverse complement function; in DNA, adenine pairs with thymine, and cytosine pairs with guanine; a reverse complement reverses the sequence and takes the complement of the AT and CG pairs

Reverse Complement Function Example [from Xie tutorial]

>>> from string import *
>>> dna = "gcatgacgttattacgactctg"
>>> def revcomp(dna):
...     """ reverse complement of a DNA sequence """
...     comp = dna.translate(maketrans("AGCTagct","TCGAtcga"))
...     lcomp = list(comp)
...     lcomp.reverse()
...     return join(lcomp,"")
...
>>> dna
'gcatgacgttattacgactctg'
>>> revcomp(dna)
'cagagtcgtaataacgtcatgc'
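The version above relies on string.maketrans and string.join, which exist only in Python 2. A sketch of an alternative that uses a dictionary and reversed() instead and also runs under Python 3; unlike the translate version, it raises KeyError for an unexpected character such as 'n' rather than passing it through unchanged.

```python
# map each base to its complement, keeping the case of the input
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G',
              'a': 't', 't': 'a', 'g': 'c', 'c': 'g'}

def revcomp(dna):
    """Reverse complement of a DNA sequence."""
    # reversed(dna) walks the string backwards; each base is replaced by its complement
    return ''.join([COMPLEMENT[base] for base in reversed(dna)])
```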

the following example shows multiple operations on a string that represents a strand of DNA; a search is made for the recognition site (gaattc) of EcoRI, a restriction enzyme isolated from strains of E. coli

String Find and Replace Example [from Schuerer tutorial]

>>> dna = """tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgga
... tccctagctaagatgtattattctgctgtgaattcgatcccactaaagat"""
>>> count(dna,'a')  # call of unqualified function
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'count' is not defined
>>> dna.count('a')
33
>>> from string import count, find, replace
>>> dna
'tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgga\ntccctagctaagatgtattattctgctgtgaattcgatcccactaaagat'
>>> replace(dna,'\n','')
'tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctggatccctagctaagatgtattattctgctgtgaattcgatcccactaaagat'
>>> dna = replace(dna,'\n','')
>>> dna
'tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctggatccctagctaagatgtattattctgctgtgaattcgatcccactaaagat'
>>> EcoRI = 'gaattc'
>>> count(dna,EcoRI)  # now defined from import statement
2
>>> find(dna,EcoRI)  # find position of first match
1
>>> find(dna,EcoRI,2)  # find position of second match
87

the following example demonstrates a shortcut in assigning long strings: the replace method is called directly on the string literal as part of the assignment, so no import is needed

Long String Example [from Schuerer tutorial]

>>> dna = """tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgga
... tccctagctaagatgtattattctgctgtgaattcgatcccactaaagat""".replace('\n','')

the following example shows a function to count codons, which are tri-nucleotide units that represent individual amino acids; the function scans the input string three characters at a time, and a dictionary named "usage" is used to collect the counts of codons

Codon Count Example [from Schuerer tutorial]

>>> cds = "atgagtgaacgtctgagcattaccccgctggggccgtatatcggcgcacaataa" >>> def count_codons(cds): ... usage = {} ... for i in range(0,len(cds),3): ... codon = cds[i:i+3] ... if usage.has_key(codon): ... usage[codon] += 1 ... else: ... usage[codon] = 1 ... return usage ... >>> count_codons(cds) {'acc': 1, 'atg': 1, 'atc': 1, 'gca': 1, 'agc': 1, 'ggg': 1, 'att': 1, 'ctg': 2, 'taa': 1, 'ggc': 1, 'tat': 1, 'ccg': 2, 'agt': 1, 'caa': 1, 'cgt': 1, 'gaa': 1}

the following example shows a function to translate a coding sequence into one-letter amino acid codes (with * marking a stop codon); note that the function is recursive and translates one three-character codon per invocation; translation is done by using the codon as a key into the code dictionary

Codon Translation Example [from Schuerer tutorial]

>>> code = {
...     'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
...     'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
...     'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
...     'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
...     'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
...     'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
...     'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
...     'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
...     'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
...     'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
...     'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
...     'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
...     'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
...     'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
...     'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
...     'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
...     }
>>> def rectranslate(cds, code):
...     if cds == "":
...         return ""
...     else:
...         codon = cds[:3]
...         return code[codon] + rectranslate(cds[3:], code)
...
>>> cds = "atgagtgaacgtctgagcattaccccgctggggccgtatatcggcgcacaataa"
>>> rectranslate(cds,code)
'MSERLSITPLGPYIGAQ*'
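Because rectranslate adds one stack frame per codon, a long coding sequence can exceed Python's default recursion limit (about 1000 frames). A loop-based sketch avoids that. It also builds the same 64-entry table compactly, assuming the common trick of listing the standard genetic code in t, c, a, g order for each codon position; AMINO and translate are illustrative names, not library APIs:

```python
from itertools import product

BASES = "tcag"
# standard genetic code, one letter per codon in tcag x tcag x tcag order
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
code = dict(("".join(codon), aa)
            for codon, aa in zip(product(BASES, repeat=3), AMINO))

def translate(cds, code):
    """Translate cds codon by codon with a loop instead of recursion."""
    return "".join(code[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

cds = "atgagtgaacgtctgagcattaccccgctggggccgtatatcggcgcacaataa"
print(translate(cds, code))   # MSERLSITPLGPYIGAQ*
```

Joining a list of letters once is also cheaper than the repeated string concatenation in the recursive version.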

Biopython Example

Biopython is a set of Python tools for use in bioinformatics
- Biopython includes modules to read files in many standard bioinformatics formats (FASTA, GenBank, PubMed, etc.)
- Biopython provides interfaces to common bioinformatics programs such as BLAST and ClustalW
- Biopython includes modules for performing common biological computations, such as translation, transcription, and weight calculations
- Biopython is open source software distributed under the Biopython license
- Biopython 1.57 is the latest release, but note that Biopython 1.56 is the last release to support Python 2.4

the following example shows the use of Biopython in reading records from a FASTA file and sorting according to length

Sorting a Sequence by Length from a FASTA File Using Biopython [from Biopython tutorial]

# Suppose you wanted to sort a sequence file by length (e.g., a set
# of contigs from an assembly), and you are working with a file format
# like FASTA or FASTQ which Bio.SeqIO can read, write (and index).
#
# If the file is small enough, you can load it all into memory at
# once as a list of SeqRecord objects, sort the list, and save it:
from Bio import SeqIO
records = list(SeqIO.parse("ls_orchid.fasta","fasta"))
records.sort(cmp=lambda x,y: cmp(len(x),len(y)))
SeqIO.write(records, "sorted_orchids.fasta", "fasta")
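To make clear what Bio.SeqIO does for you in this example, here is a standard-library-only Python 3 sketch of reading, sorting, and writing simple FASTA text. read_fasta and write_fasta are hypothetical helpers, not Biopython API; they assume well-formed input (each record starts with a '>' header line) and do no validation, so Biopython remains the better choice for real files:

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-format text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)

def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs, wrapping sequences at width columns."""
    for header, seq in records:
        handle.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")

sample = io.StringIO(">long\nACGTACGT\nACGT\n>short\nAC\n>mid\nACGTA\n")
records = sorted(read_fasta(sample), key=lambda rec: len(rec[1]))
print([header for header, seq in records])   # ['short', 'mid', 'long']
```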

Molecular Modeling Tool Kit Example

MMTK is the Molecular Modeling Tool Kit developed by Konrad Hinsen
- MMTK uses an object-oriented model of molecular systems with special support for proteins and nucleic acids; typical usage is for simulation and modeling applications similar to those of the CHARMM and GROMOS packages
- MMTK is almost completely written in Python, with only a small time-critical energy-evaluation function written in C
- MMTK is open source software distributed under the CeCILL free software license agreement

the following example shows an integration scheme that might be used to calculate particle trajectories in a molecular dynamics simulation (Velocity Verlet is the name of the integration scheme)

Molecular Dynamics Example from MMTK [MD integrator example]

# A Velocity-Verlet integrator implemented in Python.
# Use this as a starting point for modified integrators.
#
from MMTK import *
from MMTK.Proteins import Protein
from MMTK.ForceFields import Amber99ForceField
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator

# Velocity Verlet integrator in Python
def doVelocityVerletSteps(delta_t, nsteps,
                          equilibration_temperature = None,
                          equilibration_frequency = 1):
    configuration = universe.configuration()
    velocities = universe.velocities()
    gradients = ParticleVector(universe)
    inv_masses = 1./universe.masses()
    evaluator = universe.energyEvaluator()
    energy, gradients = evaluator(gradients)
    dv = -0.5*delta_t*gradients*inv_masses
    time = 0.
    snapshot(data={'time': time, 'potential_energy': energy})
    for step in range(nsteps):
        velocities += dv
        configuration += delta_t*velocities
        universe.setConfiguration(configuration)
        energy, gradients = evaluator(gradients)
        dv = -0.5*delta_t*gradients*inv_masses
        velocities += dv
        universe.setVelocities(velocities)
        time += delta_t
        snapshot(data={'time': time, 'potential_energy': energy})
        if equilibration_temperature is not None \
           and step % equilibration_frequency == 0:
            universe.scaleVelocitiesToTemperature(equilibration_temperature)

# Define system
universe = InfiniteUniverse(Amber99ForceField())
universe.protein = Protein('bala1')

# Create trajectory and snapshot generator
trajectory = Trajectory(universe, "md_trajectory.nc", "w",
                        "Generated by a Python integrator")
snapshot = SnapshotGenerator(universe,
                             actions = [TrajectoryOutput(trajectory, ["all"],
                                                         0, None, 1)])

# Initialize velocities
universe.initializeVelocitiesToTemperature(50.*Units.K)

# Heat and equilibrate
for temperature in [50., 100., 200., 300.]:
    doVelocityVerletSteps(delta_t = 1.*Units.fs, nsteps = 500,
                          equilibration_temperature = temperature*Units.K,
                          equilibration_frequency = 1)
doVelocityVerletSteps(delta_t = 1.*Units.fs, nsteps = 500,
                      equilibration_temperature = 300*Units.K,
                      equilibration_frequency = 10)

# Production run
doVelocityVerletSteps(delta_t = 1.*Units.fs, nsteps = 5000)
trajectory.close()
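The heart of the MMTK loop is the Velocity Verlet update: a half-step velocity change from the current force, a full position step, a new force evaluation, and a second half-step velocity change. The sketch below applies the same scheme to a hypothetical one-dimensional system (a unit mass on a spring with k = 1) in plain Python 3, with no MMTK objects; the names velocity_verlet and force are illustrative. A useful check is that total energy stays nearly constant, which is the main reason this scheme is popular for molecular dynamics:

```python
def velocity_verlet(x, v, force, mass, dt, nsteps):
    """Integrate one particle with the velocity Verlet scheme.

    Returns the list of (position, velocity) pairs, including the start.
    """
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(nsteps):
        v += 0.5 * dt * f / mass      # first half-step velocity update
        x += dt * v                   # full-step position update
        f = force(x)                  # force at the new position
        v += 0.5 * dt * f / mass      # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

# hypothetical test system: unit mass on a spring with k = 1
k, m = 1.0, 1.0
traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, 0.01, 10000)
energies = [0.5 * m * v * v + 0.5 * k * x * x for x, v in traj]
print(max(energies) - min(energies))   # small: energy is nearly conserved
```

In MMTK, gradients plays the role of -force, which is why the original computes dv with a minus sign.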

Scientific Computation Examples

Python provides a math module which includes:
- rounding functions ceil and floor
- absolute value function fabs
- modulo function fmod (whose result has the sign of the first argument, whereas the result of % has the sign of the second)
- power and logarithmic functions exp, log, log10, pow, and sqrt
- trigonometric functions acos, asin, atan, cos, sin, tan, and hypot
- angular conversion functions degrees and radians
- hyperbolic functions cosh, sinh, and tanh
- constants pi and e
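The difference between fmod and % shows up only when the operands have different signs. A short sketch (it uses print with a single argument, which behaves the same in Python 2 and Python 3):

```python
import math

# % returns a result with the sign of the divisor (second operand);
# math.fmod returns a result with the sign of the dividend (first operand)
print(-7 % 3)              # 2
print(math.fmod(-7, 3))    # -1.0
print(7 % -3)              # -2
print(math.fmod(7, -3))    # 1.0
print(math.hypot(3, 4))    # 5.0, i.e. sqrt(3*3 + 4*4)
```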
Math Module Example
>>> import math
>>> math.exp(2)
7.3890560989306504
the following example shows that in Python 2, dividing one integer by another with / produces an integer (floor) result; to obtain a floating-point result, at least one of the operands must be a float, for example a literal written with a decimal point
Integer and Floating-Point Example
>>> a = 21
>>> a/7
3
>>> a/8
2
>>> a/8.
2.625
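Python 3 changes this rule: / always performs true division, and // is the explicit floor-division operator (// is also available in Python 2.2 and later, and a Python 2 module can opt into the new / with `from __future__ import division` as its first statement). A small Python 3 sketch:

```python
a = 21
print(a / 8)     # 2.625 in Python 3 (2 in Python 2 without the __future__ import)
print(a // 8)    # 2: floor division
print(-a // 8)   # -3: floor division rounds toward negative infinity
print(a / 8.)    # 2.625 in both versions
```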
the following example demonstrates Python's built-in complex numbers; the imaginary part of a literal is written with a j suffix
Imaginary Number Example
>>> z = 2+3j
>>> z
(2+3j)
>>> z.real
2.0
>>> z.imag
3.0
>>> z.conjugate()
(2-3j)
the following example demonstrates the availability of integers of arbitrary size in Python
Large Integer Example
>>> a = 2**100
>>> a
1267650600228229401496703205376L
>>> a / 5
253530120045645880299340641075L

the following example shows a factoring function from Jim Carlson, Computations in Number Theory Using Python: A Brief Introduction, March 2003 (pdf, 24 pages)

Factoring Example [from Carlson]

>>> def factor3(n):
...     d = 2
...     factors = [ ]
...     while n % d == 0:
...         factors.append(d)
...         n = n//d        # // is floor division: same as / on ints in Python 2, and stays integer in Python 3
...     d = 3
...     while n > 1 and d*d <= n:
...         if n % d == 0:
...             factors.append(d)
...             n = n//d
...         else:
...             d = d + 2
...     if n > 1:
...         factors.append(n)
...     return factors
...
>>> factor3(99)
[3, 3, 11]
>>> factor3(100)
[2, 2, 5, 5]
>>> factor3(1234)
[2, 617]
>>> a = 2**10
>>> factor3(a)
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
>>> factor3(a+1)
[5, 5, 41]

floating-point arithmetic in Python is done in 64-bit double precision; because decimal fractions such as 0.1 have no exact binary representation, small rounding errors appear and accumulate, as the following example demonstrates

Floating Point Numbers Example

>>> 0.1
0.10000000000000001
>>> step = 0.1
>>> sum = 0.0
>>> for i in range(10):
...     sum += step
...
>>> sum
0.99999999999999989
>>> 1.0 - sum
1.1102230246251565e-16
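Two practical consequences, sketched in Python 3: compare floats with a tolerance rather than ==, and when the accumulated error matters use math.fsum (Python 2.6 and later), which tracks the low-order bits lost in each addition, or the decimal module, which represents 0.1 exactly. The variable is named total here to avoid shadowing the built-in sum used on the last line:

```python
import math
from decimal import Decimal

step = 0.1
total = 0.0
for i in range(10):
    total += step                    # each addition rounds to the nearest double
print(total == 1.0)                  # False
print(abs(total - 1.0) < 1e-9)       # True: compare with a tolerance instead
print(math.fsum([step] * 10))        # 1.0
print(sum([Decimal("0.1")] * 10))    # 1.0 (exact decimal arithmetic)
```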

Python provides a random module that includes an extensive set of functions, such as random (a float in [0.0, 1.0)), randint, choice, shuffle, sample, gauss, and seed
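A short Python 3 sketch of the random module. Using a random.Random instance with a fixed seed, rather than the module-level functions, gives a private generator whose stream can be reproduced, which makes simulations repeatable; the seed value 42 is arbitrary:

```python
import random

rng = random.Random(42)                          # private generator, fixed seed
rolls = [rng.randint(1, 6) for _ in range(5)]    # integers in [1, 6], inclusive
picks = rng.sample(range(100), 3)                # 3 distinct values from 0..99
x = rng.random()                                 # float in [0.0, 1.0)
g = rng.gauss(0.0, 1.0)                          # normal deviate, mean 0, sd 1
bases = list("ACGT")
rng.shuffle(bases)                               # in-place random permutation

# re-seeding reproduces the same stream
again = random.Random(42)
print(rolls == [again.randint(1, 6) for _ in range(5)])   # True
```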

NumPy Example

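NumPy is a Python package, created by Travis Oliphant from the earlier Numeric and Numarray packages, that provides an efficient multidimensional array type; standard Python has no built-in multidimensional array (its array module provides only one-dimensional arrays of a single numeric type)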
- NumPy provides many of the standard mathematical operators on arrays; it also provides additional random number generators
- NumPy is open source software distributed under the BSD license

the following examples show the use of arrays in NumPy with a dot product, polynomial curve fits, and transpositions

NumPy Examples [from NumPy Example List]

>>> from numpy import *
>>> x = array([[1,2,3],[4,5,6]])
>>> x.shape
(2, 3)
>>> y = array([[1,2],[3,4],[5,6]])
>>> y.shape
(3, 2)
>>> dot(x,y)         # matrix multiplication (2,3) x (3,2) -> (2,2)
array([[22, 28],
       [49, 64]])
>>> x = array([1,2,3,4,5])
>>> y = array([6, 11, 18, 27, 38])
>>> polyfit(x,y,2)   # fit a 2nd degree polynomial to the data, result is x**2 + 2x + 3
array([ 1., 2., 3.])
>>> polyfit(x,y,1)   # fit a 1st degree polynomial (straight line), result is 8x-4
array([ 8., -4.])
>>> a = arange(30)
>>> a = a.reshape(2,3,5)
>>> a
array([[[ 0, 1, 2, 3, 4],
        [ 5, 6, 7, 8, 9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
>>> b = a.transpose()
>>> b
array([[[ 0, 15],
        [ 5, 20],
        [10, 25]],

       [[ 1, 16],
        [ 6, 21],
        [11, 26]],

       [[ 2, 17],
        [ 7, 22],
        [12, 27]],

       [[ 3, 18],
        [ 8, 23],
        [13, 28]],

       [[ 4, 19],
        [ 9, 24],
        [14, 29]]])
>>> b.shape
(5, 3, 2)
>>> b = a.transpose(1,0,2)   # First axis 1, then axis 0, then axis 2
>>> b
array([[[ 0, 1, 2, 3, 4],
        [15, 16, 17, 18, 19]],

       [[ 5, 6, 7, 8, 9],
        [20, 21, 22, 23, 24]],

       [[10, 11, 12, 13, 14],
        [25, 26, 27, 28, 29]]])
>>> b.shape
(3, 2, 5)
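For two-dimensional arrays, dot computes an ordinary matrix product: entry (i, j) is the sum of products of row i of the first matrix with column j of the second. A pure-Python 3 sketch on nested lists (matmul is an illustrative name) reproduces the result above; NumPy performs the same arithmetic in compiled loops over contiguous arrays, which is where its speed advantage comes from:

```python
def matmul(x, y):
    """Multiply an (n, m) matrix by an (m, p) matrix, both given as nested lists."""
    if len(x[0]) != len(y):
        raise ValueError("inner dimensions differ")
    cols = list(zip(*y))                       # columns of y as tuples
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in x]

x = [[1, 2, 3], [4, 5, 6]]
y = [[1, 2], [3, 4], [5, 6]]
print(matmul(x, y))   # [[22, 28], [49, 64]]
```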

see also David Mertz, Charming Python: Numerical Python, IBM Developer Works, October 2003, and the NumPy tutorial at SciPy.org

SciPy Example

SciPy is a Python package developed by Enthought that includes access to routines for FFTs, integration, linear algebra, optimization, and other common scientific and engineering computation tasks
- SciPy is built using the numeric array data types from NumPy and runs on Python 2.4 and 2.5
- many SciPy routines are wrappers around established Fortran and C libraries, such as LAPACK for linear algebra and QUADPACK for numerical integration
- the recommended 2-D plotting package for use with SciPy is Matplotlib
- SciPy is open source software distributed under the BSD license
for comparisons with MATLAB, see NumPy for Matlab Users at SciPy.org and a Comparison of MATLAB, OCTAVE, and Python/SciPy/PyLab Examples (pdf, 2 pages) by Ed Bueler for his course at Univ. of Alaska

the following example shows the use of single integration in SciPy (integrate.quad); double and triple integration routines (integrate.dblquad and integrate.tplquad) are also available

SciPy Examples [from SciPy Example List and John Cook's Getting started with the SciPy library]

>>> from scipy import integrate
>>> from numpy import inf
>>> import math
>>> value, err = integrate.quad(func=pow, a=0., b=1., args=(5,))
>>> value            # integral of x^5 over [0,1]
0.16666666666666669
>>> integrate.quad(lambda x: math.exp(-x*x), -inf, inf)   # integral of exp(-x^2), which equals sqrt(pi)
(1.7724538509055159, 1.4202636780944923e-008)
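integrate.quad returns both an estimate and an error bound because it uses adaptive QUADPACK routines. To show what a numerical integration rule does, here is a different and much simpler technique: a fixed composite Simpson's rule in plain Python 3 (simpson is an illustrative name, not a SciPy function). It samples the integrand at evenly spaced points and weights them 1, 4, 2, 4, ..., 2, 4, 1:

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule with n (even) subintervals on [a, b]."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / float(n)
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points alternate weights 4 (odd i) and 2 (even i)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

print(simpson(lambda x: x ** 5, 0.0, 1.0))   # close to 1/6
```

Unlike quad, this fixed rule gives no error estimate and cannot handle infinite limits.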

see also the SciPy overview prepared by Dave Kuhlman in 2006 and the index of scientific software available in Python prepared by SciPy.org

Text Processing Examples

the following word-indexing example uses a dictionary to accumulate a list of the line numbers of the lines on which a word appears in an input file

the input file is read using stdin from the sys module; command line I/O redirection is used to specify the file name
a regular string split is used on each input line, and punctuation (commas, colons, question marks, and periods) is removed from the resulting substrings using the sub method of a compiled regular expression from the re module; note that the single quote is not removed, and thus a contraction is kept as a single word
each substring is then transformed into a lower-case word by the string's lower method
if the word is already in the dictionary, the current line number is appended to the corresponding list; if not, the word is added to the dictionary with an initial single-element list
the words are then sorted and used to print the corresponding lists of line numbers
lines are numbered on a 0-origin basis, but this can be easily changed to 1-origin by appending n+1 to a list rather than n (and starting with [n+1] instead of [n])

Word Index Example [adapted from Alex Martelli]

[cmd-line-prompt] cat index.py
# build a word -> line numbers mapping from an input file
from re import compile
from sys import stdin

# punctuation marks you want to match (? and . are escaped)
regex_punct = compile(r"[,:\?\.]")

# create a dictionary in which each key is a word and each
# value is a list of line numbers on which that word appears
idx = {}
for n,line in enumerate(stdin):
    for word in line.split():
        word = regex_punct.sub('',word)
        word = word.lower()
        if idx.has_key(word):          # shortcut to the if-else is to use
            idx[word].append(n)        # idx.setdefault(word,[]).append(n)
        else:
            idx[word] = [n]

# display by alphabetically-sorted word
words = idx.keys()
words.sort()
for word in words:
    print "%s:" % word,
    for n in idx[word]:
        print n,
    print
[cmd-line-prompt] cat input
A man walks into an office.

Man: Ah. I'd like to have an argument, please.

Receptionist: Certainly sir. Have you been here before?

Man: No, this is my first time.

Receptionist: I see. Well, do you want to have the full argument,
or were you thinking of taking a course?
[cmd-line-prompt] python index.py < input
a: 0 9
ah: 2
an: 0 2
argument: 2 8
been: 4
before: 4
certainly: 4
course: 9
do: 8
first: 6
full: 8
have: 2 4 8
here: 4
i: 8
i'd: 2
into: 0
is: 6
like: 2
man: 0 2 6
my: 6
no: 6
of: 9
office: 0
or: 9
please: 2
receptionist: 4 8
see: 8
sir: 4
taking: 9
the: 8
thinking: 9
this: 6
time: 6
to: 2 8
walks: 0
want: 8
well: 8
were: 9
you: 4 8 9
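In Python 3, has_key and the print statement are gone, and dict.keys() no longer returns a sortable list. A sketch of the same index as reusable functions (build_index and show_index are illustrative names) uses collections.defaultdict in place of the if/else or setdefault, and an origin parameter that switches between 0-origin and 1-origin line numbers. One small design change: tokens that consist only of punctuation are skipped instead of being indexed as an empty word:

```python
import re
from collections import defaultdict

PUNCT = re.compile(r"[,:?.]")   # same marks as index.py; inside [] no escapes are needed

def build_index(lines, origin=0):
    """Map each lower-cased word to the line numbers on which it appears."""
    idx = defaultdict(list)
    for n, line in enumerate(lines, origin):
        for word in line.split():
            word = PUNCT.sub("", word).lower()
            if word:                      # skip tokens that were pure punctuation
                idx[word].append(n)
    return idx

def show_index(idx):
    """Print 'word: n1 n2 ...' lines in alphabetical order."""
    for word in sorted(idx):
        print("%s: %s" % (word, " ".join(str(n) for n in idx[word])))

text = ["A man walks into an office.",
        "",
        "Man: Ah. I'd like to have an argument, please."]
show_index(build_index(text))
```

To index a file given by shell redirection, pass sys.stdin as lines.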

the following tokenizer example shows:

processing command line arguments using argv from the sys module, including defaulting to reading from stdin unless an input file is named as a command argument
splitting each input line into a list of words using re.split to allow multiple characters as separators
directly printing the contents list for each line, which results in the list brackets and list-element quote characters being printed

Input Line Tokenizer Example

[cmd-line-prompt] cat splitter
#!/usr/bin/env python
# command to split input lines based on multiple separators:
#   one or more spaces, comma, open parenthesis, close parenthesis
#
# will read from stdin unless file name specified on command line
#
# command line options are -r for reverse and -s for sorted;
# they can be combined for reverse sorted order
import re,sys

# default values
options = ''
file = sys.stdin

# process command line arguments
args = sys.argv
args.pop(0)                  # removes initial command name
while len(args) > 0:
    # get next argument
    argument = args.pop(0)
    # an option argument has a leading dash
    if argument[0] == '-':
        if argument[1] == 'r':
            options = options + 'r'
        elif argument[1] == 's':
            options = options + 's'
        else:
            print 'unrecognized option:',argument
    # otherwise treat argument as file name
    else:
        file = open(argument,'r')

# process input line by line
input_line = file.readline()
while input_line:
    # separators: one or more spaces, comma, "(", or ")"
    contents = re.split(r' +|,|\(|\)',input_line.strip())
    # remove any empty string members
    while contents.count(''):
        contents.remove('')
    # process line according to options
    if 's' in options:
        contents.sort()
    if 'r' in options:
        contents.reverse()
    print contents
    input_line = file.readline()
file.close()
[cmd-line-prompt] chmod a+x splitter
[cmd-line-prompt] cat input.txt
a,b,(c,d)
spams and eggs
dead (parrot)
[cmd-line-prompt] ./splitter input.txt
['a', 'b', 'c', 'd']
['spams', 'and', 'eggs']
['dead', 'parrot']
[cmd-line-prompt] ./splitter < input.txt
['a', 'b', 'c', 'd']
['spams', 'and', 'eggs']
['dead', 'parrot']
[cmd-line-prompt] ./splitter -s input.txt
['a', 'b', 'c', 'd']
['and', 'eggs', 'spams']
['dead', 'parrot']
[cmd-line-prompt] ./splitter -r -s input.txt
['d', 'c', 'b', 'a']
['spams', 'eggs', 'and']
['parrot', 'dead']
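The splitting and option logic can be separated from the command-line handling so that it is easy to test. A Python 3 sketch (tokenize and SEPARATORS are illustrative names) compiles the separator pattern once and drops empty strings with a list comprehension instead of the repeated count/remove loop:

```python
import re

SEPARATORS = re.compile(r" +|,|\(|\)")   # spaces, comma, open or close parenthesis

def tokenize(line, sort=False, reverse=False):
    """Split line on SEPARATORS, drop empty strings, optionally sort/reverse."""
    tokens = [t for t in SEPARATORS.split(line.strip()) if t]
    if sort:
        tokens.sort()
    if reverse:
        tokens.reverse()
    return tokens

for line in ["a,b,(c,d)", "spams and eggs", "dead (parrot)"]:
    print(tokenize(line, sort=True, reverse=True))
# ['d', 'c', 'b', 'a']
# ['spams', 'eggs', 'and']
# ['parrot', 'dead']
```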

BeautifulSoup HTML Parsing Example

BeautifulSoup is a Python package developed by Leonard Richardson that parses well-formed or arbitrarily-broken HTML
- it is recommended that you use the latest version (3.2) of BeautifulSoup if you are running Python 3.0
- for Python 2.x systems, download BeautifulSoup.2.1.1.py (or via the 2.1.1 version page at the Python Package Index) and rename it as BeautifulSoup.py
- read the documentation for the 2.1.1 version here
- BeautifulSoup is open source software distributed under the Python Software Foundation License (PSFL)

the following example shows the use of the BeautifulSoup parser to pretty-print some HTML input

HTML Pretty-Print Example Using BeautifulSoup

>>> from BeautifulSoup import BeautifulSoup >>> import re >>> input = '''<html> ... <head><title>Page title</title></head> ... <body> ... This is paragraph one. ... This is paragraph two. ... </html>''' >>> soup = BeautifulSoup(input) >>> pretty_print = soup.prettify() >>> print pretty_print <html> <head> <title>Page title </title> </head> <body> This is paragraph one . This is paragraph two . </body> </html>

see also David Mertz, Charming Python: Easy Web Data Collection with Mechanize and Beautiful Soup, IBM Developer Works, November 2009

Natural Language Tool Kit Example

NLTK is the Natural Language Tool Kit developed by Steven Bird, Ewan Klein, and Edward Loper
- NLTK is written in Python and has an extensive set of modules for parsing text, classifying text, metrics, etc.
- NLTK currently runs on Python 2.4, 2.5, or 2.6
- NLTK is open source software distributed under the Apache Version 2.0 license

the following example shows the use of two tokenizers from NLTK

Tokenizer Example [from NLTK HowTo]

>>> from nltk import word_tokenize, wordpunct_tokenize
>>> s = ("Good muffins cost $3.88\nin New York. Please buy me\n"
...      "two of them.\n\nThanks.")
>>> word_tokenize(s)
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> wordpunct_tokenize(s)
['Good', 'muffins', 'cost', '$', '3', '.', '88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
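The wordpunct_tokenize output is consistent with a simple regular-expression rule: a token is either a run of word characters or a run of characters that are neither word characters nor whitespace. Assuming that this is the pattern NLTK's WordPunctTokenizer uses, the following Python 3 sketch with only the re module reproduces the second result above; it is an approximation for illustration, not NLTK itself:

```python
import re

# assumed rule: runs of word characters, or runs of other non-space characters
WORDPUNCT = re.compile(r"\w+|[^\w\s]+")

s = ("Good muffins cost $3.88\nin New York. Please buy me\n"
     "two of them.\n\nThanks.")
tokens = WORDPUNCT.findall(s)
print(tokens)
```

This explains why "$3.88" becomes four tokens here but stays a single number token under word_tokenize.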

see also David Mertz, Charming Python: Get started with the Natural Language Toolkit, IBM Developer Works, June 2004

Resources

Web resources
- Guido van Rossum, Python Tutorial, version 2.4.3, March 2006.
- Robert Brunner, nine-part "Discover Python" series at IBM Developer Works, beginning with Getting started with Python, a powerful object-oriented scripting language, May 2005.
- Katja Schuerer, Corinne Maufrais, Catherine Letondal, Eric Deveaud, and Marie-Agnes Petit, Introduction to Programming using Python: Programming Course for Biologists at the Pasteur Institute, February 2008.
- Xiaohui Xie, Python Course in Bioinformatics, March 2009. (pdf, 35 slides)
- William Scullin, James Snyder, Massimo Di Pierro, and Jussi Enkovaara, Python for Scientific and High Performance Computing, November 2009. (pdf, 132 slides)
- Python wiki
- PyPI - the Python Package Index
Books
- Jason Kinser, Python For Bioinformatics, Jones and Bartlett Series in Biomedical Informatics, June 2008, 417 pages.
- Mitchell Model, Bioinformatics Programming Using Python: Practical Programming for Biological Data, O'Reilly Media, December 2009, 528 pages.
- Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O'Reilly Media, June 2009, 512 pages.
- John Hughes, Real World Instrumentation with Python: Automated Data Acquisition and Control Systems, O'Reilly Media, November 2010, 624 pages.