So my labmate recently wrote a very nice blog post comparing some different solutions for speeding up Python code. The solutions he explored are either replacement interpreters, (restricted) python -> vm bytecode compilers, or (restricted) python -> C++ compilers. Some of these solutions look very promising. However, they are not all mature yet, and most only support a subset of the current language. I recently had a need to speed up some multithreaded python code that I had written, and in the process, I learned a bit about improving the performance of Python itself (i.e. not replacing the interpreter/bytecode compiler).
In particular, there are two changes that significantly sped up my code; converting while loops to for loops, and replacing list comprehensions with conditional guards with filter statements. So the first of these improvements is rather obvious in retrospect, as a for loop can be unrolled somewhat, whereas the condition guard on the while loop must be re-evaluated by the interpreter for every loop iteration. However, what was not obvious at first was how to convert my while loop to a for loop. The loop in question traverses the edges in a tree from an internal node until it reaches the root. My initial code looked something like this:
[cc lang=’python’ ]
def pathToRoot(G, n):
path = [n]
while len(G.predecessors(n)) > 0:
pred = G..predecessors(n)[0]
path.append(n)
n = pred
return path
[/cc]
Simple enough, right? So how do we turn this into a for loop? The solution upon which I settled was to make an initial pass over the tree and to label each node with its depth. Therefore, for an arbitrary node n, n[‘depth’] holds the number of predecessors I must traverse to reach the root. This allows the above code to be written as:
[cc lang=’python’ ]
def pathToRoot(G, n):
path = [n]
ndepth = n[‘depth’]
for i in xrange(ndepth):
pred = G..predecessors(n)[0]
path.append(n)
n = pred
return path
[/cc]
It’s a very minor change, but it led to a significant speedup in my program.
The other change I made was even more simple, but likewise, it yielded improved performance. This was simply to replace a list comprehension with a filter statement. So initially, the list comprehension looked pretty much like a filter itself
[cc lang=’python’]
result = [ x for x in input if fun(x) ]
[/cc]
This simply becomes
[cc lang=’python’]
result = filter( fun, input )
[/cc]
Unlike the previous example, there’s no real though required here. However, I was surprised to discover than the filter statement was significantly faster than the list comprehension. For some (obviously misguided) reason, I thought they would both lead to the same bytecode; they do not. Anyway, these are two very simple Python performance tricks that are easy to implement and can lead to significantly improved performance. If I stumble upon other such tricks in the future, I’ll be sure to drop them here.
I found the following times on my netbook for these variants:
17.8 seconds:
print timeit.timeit(‘[x for x in xrange(100000) if (lambda y: y==5)(x)]’, number=100)
11.4 seconds:
print timeit.timeit(‘l5 = lambda x: x==5; [x for x in xrange(100000) if l5(x)]’, number=100)
9.8 seconds:
print timeit.timeit(‘filter(lambda x: x==5, xrange(100000))’, number=100)
print timeit.timeit(‘l5 = lambda x: x==5; filter(l5, xrange(100000))’, number=100)
3.5 seconds:
print timeit.timeit(‘[x for x in xrange(100000) if x==5]’, number=100)
When the filter can be written as a statement, under my test conditions the list comprehension is fastest. Have you tried inlining your filter function? Python has historically required such ugly tricks for best performance.
I’d really like to know how big your list is and what you do in your filtering function.
Hi Cortland,
Thanks for the timing tests. My lists consist of a sets of nodes for synthetically generated protein phylogeny networks, with size ranges varying from ~10^3 to ~10^4. The structure of each network is essentially a forest of trees, and the goal of my filtering function is to extract the root nodes from each forest. My filter based code looks something like this:
def rootSet(network):
def isRoot(network, n):
if len(network.predecessors(n)) == 0:
return n
return filter( partial(isRoot, network), network )
However, after reading over your response and looking again at the NetworkX documentation, I found a simpler and more efficient solution.
def rootSet(network):
return [ x[0] for x in network.in_degree_iter( network.nodes_iter() ) if x[1] == 0 ]
Even on a set of very small networks (average size of 500 nodes each) second approach is about a 0.1s faster on each network. However, this adds up to a significant savings as I am calling rootSet on a fairly large number of networks. When I move to larger networks, the savings should be even more significant.
I think my example supports your conclusion. In addition to performing my filter as a list comprehension, choosing the proper NetworkX functions allowed me to replace my function call with a simple statement. Thanks again for your input.