-
Notifications
You must be signed in to change notification settings - Fork 133
Python Optimization
Table of Contents
While RAVEN takes great advantage of the power and flexibility of Python, sometimes the most efficient methods in Python are unintuitive. To that end, we gather here a list of suggestions for rapid development.
If you find additional tips, let us know!
In general, python is quite efficient and it is seldom worth spending significant time optimizing every line of code. However, there are a few instances where optimization is very worthwhile:
- A line of code could be evaluated millions of times per RAVEN run,
- A line of code is in the serial portions of a RAVEN run, instead of the parallel portion.
In these cases, it may be worth trying to improve the speed of the code. For finding bottlenecks in the code, see Line Profiling below.
As of PR #1318, line profiling through the RAVEN command line has been automated. By running raven_framework
with the flag --profile
, a printout will be provided at the end of the RAVEN run showing several useful pieces of information about select methods.
NOTE: To use this feature, the RAVEN-optional library line_profiler
needs to be installed via the RAVEN library installation tools.
To enable timings and printouts for a method in the code, import Decorators
in the module containing the method, and decorate the desired method with @Decorators.timingProfile
. Both module-level and class-level methods can be decorated. For example, in Simulation.py
:
import Decorators
[...]
@Decorators.timingProfile
def setInputFiles(self,inputFiles):
"""
Method that can be used to set the input files that the program received.
These are currently used for cluster running where the program
needs to be restarted on a different node.
@ In, inputFiles, list, input files list
@ Out, None
"""
self.runInfoDict['SimulationFiles' ] = inputFiles
Then, running any given input with the --profile
flag will give line-by-line information about all tagged methods, such as:
$ cd ~/projects/raven/tests/framework/Samplers/Restart
$ ~/projects/raven/raven_framework --profile test_restart_MC.xml
provides the following printout:
Wrote profile results to Driver.py.lprof
Timer unit: 1e-06 s
Total time: 8e-06 s
File: /Users/talbpw/projects/raven/framework/Simulation.py
Function: setInputFiles at line 348
Line # Hits Time Per Hit % Time Line Contents
==============================================================
348 @Decorators.timingProfile
349 def setInputFiles(self,inputFiles):
350 """
351 Method that can be used to set the input files that the program received.
352 These are currently used for cluster running where the program
353 needs to be restarted on a different node.
354 @ In, inputFiles, list, input files list
355 @ Out, None
356 """
357 1 8.0 8.0 100.0 self.runInfoDict['SimulationFiles' ] = inputFiles
Note in particular the columns, which show the line number in the module, the number of calls to each line during the run, the total time (in microseconds) spent on each line, the time-per-call, and the percentage time in that method spent on each line. In this uninspiring example, there is only one line of actionable code, so 100% of the time is spent on that line.
Once a bottleneck is discovered, the developer can add the decorator to bottleneck methods in the hierarchy until improvements, if any, are found.
In general, loops in Python are quite slow, and most optimization techniques revolve around avoiding them. Any time operations can be done in vectors or without looping they should be. Here are some ways to mitigate looping.
If we want to populate a list with items, we might be tempted to use an algorithm such as the following:
a = []
for x in range(100):
a.append(x)
# time: 1.307e-5
Obviously this specific code is poor and not very useful; however, the same structure is seen frequently. On a particular test machine, averaging over 100,000 samples, this code took 1.307e-5 seconds per evaluation. Using list comprehensions, instead, we could do
a = list(x for x in range(100))
# time: 1.286e-6
On the same test machine, averaging over 100,000 samples, this code took 1.286e-6 seconds per evaluation, or roughly an order of magnitude improvement in speed.
Furthermore, if a list is being constructed only for iterating over, an iterator is faster than an actual list, as it doesn't evaluate until it is needed. Keeping with our previous example, consider the following:
a = list(x for x in range(100))
for x in a:
print x
# time: 3.230e-4
versus
a = (x for x in range(100))
for x in a:
print x
# time: 3.194e-4
Again, not particularly useful code, but the pattern gets used. On the same test machine, the first took 3.230e-4 seconds, while the second took 3.194e-4 seconds. This saves less time than comprehensions, but does save on memory as well.
If we want to find the element-wise product of two lists, we might try
a = range(1,1001)
b = range(1000,2000)
c = list(a[i]*b[i] for i in range(len(a)))
# time: 1.406e-4
Using numpy
arrays,
import numpy as np
a = np.arange(1,1001)
b = np.arange(1000,2000)
c = a*b
# time: 4.81e-6
For vector or matrix operations of floats, numpy
methods tend to be far superior to python loops.
While in small dictionaries the lookup if key in myDict.keys()
looks innocuous, it is rather expensive to look up the key. One option is to store the lookup as sorted tuples and use the bisect algorithm. For example, searching the list keys
for the entry find
:
keys = itertools.product('abcd',repeat=10) #creates all combinations of these letters in 10-letter words, sorted
find = 'c'*10 # 'cccccccccc'
Naively,
if find in keys:
print 'yes'
# time: 8.048e-3 seconds
Using bisect
,
i = bisect.bisect_left(keys,find)
if i != len(keys) and keys[i] == find:
print 'yes'
# time: 1.044e-6 seconds
However, bisect
only works on sorted lists.
Python natively uses assert
statements to check boolean evaluations that are necessary when developing or debugging, but not necessary during run time. For example:
if True: pass
# time: 7.701e-8 seconds
assert(True)
# time: 5.984e-8 seconds
Furthermore, running Python with the -O
option erases these assertion statements,
# using "python -O test.py"
assert(True)
# time: 3.600e-8 seconds
By default, running raven_framework
uses the -O
option. This is the expected run for RAVEN users. For developers, however, assert
statements can help debug development. To run RAVEN in debug mode, pass the -D
flag to raven_framework
, as
raven_framework -D test.py