Python math performance

Discussion in 'Programmer's Corner' started by someonesdad, Sep 14, 2010.

  1. someonesdad

    Thread Starter Senior Member

    Jul 7, 2009
    1,585
    141
    I've written a script that will measure the execution speed of some of the math functions in python.

    I have written a number of these programs over the years, yet I keep managing to lose them or not remember where they are or what I named them. So, I just wrote another one; I've attached both the code and the results of some runs on Windows 7 and Linux.

    I'd be interested in seeing this thread collect a few posts of the results others get from running this script on their machines. If you choose to run the script, save it in a directory, open a shell window, cd to that directory, then type 'python python_perf.py'. You'll probably want to redirect the results to a file. I always run the program twice, then use a graphical diff tool like kdiff3 or WinMerge to look at the differences.

    The script rounds the times in ns to two significant digits. Interpreting exactly what's going on takes a bit of work, as there can be overhead due to setting up function calls and type conversions. But the numbers are useful for getting a rough idea about the inherent cost of computing some python math functions. If you compare the Windows 7 and Linux results, you'll see that Linux is a bit faster for the elementary functions, probably due to better libc implementations. My system has a 2.5 GHz Intel Quad core processor.

    If you have mpmath installed, the script will also print out timings for the mpmath library (see http://code.google.com/p/mpmath). You'll see that the times for mpmath's functions are orders of magnitude larger than the functions done in hardware; this shouldn't be too surprising and, unless you need to use lots of mpmath calls, you'll still get pretty good performance).

    If there is a reasonable number of responses from others, I'll collect all the data and put it into a single spreadsheet.
     
  2. bitrex

    Member

    Dec 13, 2009
    79
    4
    Thanks! I'm learning Python, in hopes of using it and NumPy as an alternative to MATLAB. So this was of interest to me. Here are my results, on my Windows Vista machine with a 2.8 GHz Phenom II 720 three-core processor:

    Code ( (Unknown Language)):
    1.           Python performance measures Sat Sep 25 18:41:57 2010
    2.           -----------------------------------------------------
    3.                             log10(N) = 5
    4.                                                                          Time
    5.   Operation               Expression                                     in ns
    6.   ---------------------   -----------------------------------------      -----
    7.  
    8. Arithmetic
    9.   Integer addition        3 + 4                                             32
    10.   Integer subtraction     3 - 4                                             31
    11.   Integer multiplication  3*4                                               31
    12.   Integer division        3//4                                              33
    13.   Long addition           30000000000000000 + 40000000000000000             32
    14.   Long subtraction        30000000000000000 - 40000000000000000             32
    15.   Long multiplication     30000000000000000*40000000000000000               32
    16.   Long division           30000000000000000//40000000000000000              32
    17.   Float addition          3. + 4.                                           32
    18.   Float subtraction       3. - 4.                                           32
    19.   Float multiplication    3.*4.                                             32
    20.   Float division          3./4.                                             77
    21.   Complex addition        (1. + 2.j) + (3. + 4.j)                           85
    22.   Complex subtraction     (1. + 2.j) - (3. + 4.j)                           87
    23.   Complex multiplication  (1. + 2.j)*(3. + 4.j)                             93
    24.   Complex division        (1. + 2.j)/(3. + 4.j)                            150
    25.  
    26. Elementary real functions
    27.   sin                     sin(1.)                                          170
    28.   cos                     cos(1.)                                          170
    29.   tan                     tan(1.)                                          180
    30.   asin                    asin(1.)                                         150
    31.   acos                    acos(1.)                                         160
    32.   atan                    atan(1.)                                         190
    33.   atan2                   atan2(1., 2.)                                    320
    34.   hypot                   hypot(1., 2.)                                    420
    35.   sinh                    sinh(1.)                                         210
    36.   cosh                    cosh(1.)                                         230
    37.   tanh                    tanh(1.)                                         240
    38.   asinh                   asinh(1.)                                        250
    39.   acosh                   acosh(1.)                                        140
    40.   atanh                   atan(1.)                                         190
    41.   fabs                    fabs(-1.2)                                       180
    42.   factorial(10)           factorial(10)                                    420
    43.   factorial(100)          factorial(100)                                17,000
    44.   floor                   floor(-1.2)                                      150
    45.   fmod                    fmod(10.1, 0.2)                                  220
    46.   frexp                   frexp(10.1)                                      290
    47.   ldexp                   ldexp(10.1, 2)                                   320
    48.   modf                    modf(10.1)                                       230
    49.   trunc                   trunc(10.1)                                      290
    50.   exp                     exp(10.1)                                        160
    51.   log                     log(10.1)                                        230
    52.   log1p                   log1p(10.1)                                      190
    53.   log10                   log10(10.1)                                      180
    54.   pow                     pow(10.1, 2)                                     290
    55.   sqrt                    sqrt(10.1)                                       140
    56.   degrees                 degrees(10.1)                                     76
    57.   radians                 radians(10.1)                                     75
    58.  
    59. Elementary complex functions
    60.   sin                     sin(1.j)                                         560
    61.   cos                     cos(1.j)                                         570
    62.   tan                     tan(1.j)                                         570
    63.   asin                    asin(1.j)                                      1,100
    64.   acos                    acos(1.j)                                      1,200
    65.   atan                    atan(0.5j)                                       460
    66.   atan                    atan(0.5)                                        460
    67.   sinh                    sinh(1.j)                                        470
    68.   cosh                    cosh(1.j)                                        510
    69.   tanh                    tanh(1.j)                                        450
    70.   asinh                   asinh(1.j)                                       700
    71.   acosh                   acosh(1.j)                                     1,200
    72.   atanh                   atan(0.5j)                                       460
    73.   exp                     exp(1.j)                                         430
    74.   log                     log(1.j)                                         580
    75.   log10                   log10(1.j)                                       630
    76.   sqrt                    sqrt(1.j)                                        560
    77.   phase                   phase(1.j)                                       270
    78.   polar                   polar(1.j)                                       630
    79.   rect                    rect(1., 2.)                                     340
    80.  
    81. Number information
    82.   Max int                           2147483647
    83.   Largest negative int              -2147483648
    84.   Floating point info:
    85.     Number of digits                15
    86.     Mantissa digits                 53
    87.     Exponent radix                  2
    88.     Maximum floating point number   1.79769313486e+308
    89.     Minimum floating point number   2.22507385851e-308
    90.     Maximum exponent for radix      1024
    91.     Maximum exponent for 10         308
    92.     Minimum exponent for radix      -1021
    93.     Minimum exponent for 10         -307
    94.     (First number > 1) - 1          2.22044604925e-16
    95.     Addition rounds                 1
    96.  
    97. System information
    98.   2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)]
    99.   flags                             sys.flags(debug=0, py3k_warning=0, division_warning=0, division_new=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, tabcheck=0, verbose=0, unicode=0, bytes_warning=0)
    100.   Platform                          Windows-Vista-6.0.6001-SP1
    101.                                     win32
    102.   Python implementation             CPython
    103.   API version                       1013
    104.   Java version                      ('', '', ('', '', ''), ('', '', ''))
    105.   Win32 version                     ('Vista', '6.0.6001', 'SP1', u'Multiprocessor Free')
    106.   Mac version                       ('', ('', '', ''), '')
    107.   Linux version                     ('', '', '')
    108.   libc version                      ('', '')
    109.   log2(maximum container size)      31.00
    110.  
    111. Total time to execute = 2.15 s
     
    Last edited: Sep 25, 2010
  3. sceadwian

    New Member

    Jun 1, 2009
    499
    37
    How many loops of those instructions is that code measuring someonesdad and have you taken into account the loop delay? That can't possible be timing a single instruction or the value should be in the pico second range for a modern CPU.
     
  4. bitrex

    Member

    Dec 13, 2009
    79
    4
    At the top of the results it says log(N) = 5, so I assume that means that it's running each instruction 100k times.
     
  5. sceadwian

    New Member

    Jun 1, 2009
    499
    37
    100,000 doesn't make any sense either, that'd be the femto second range. 1000 maybe I have no idea what log(n) = 5 means in that situations because the log10 of 5 (if n = 5) isn't even a whole number.
     
  6. someonesdad

    Thread Starter Senior Member

    Jul 7, 2009
    1,585
    141
    The timings were done using the timeit module of python; this module is used for timing small snippets of python code. The basic method is to pass a code object to the timeit code along with n, the number of times to repeat (the default value in the code was 1e5 times; read the code for a few exceptions to this). The number output is the time in seconds returned by the timeit call divided by n and multiplied by 1e9.

    The timings produced are relative numbers and should only be roughly compared to numbers produced by the same script on other machines. I haven't read the code for the timeit module, but I'll assume the designer did his due diligence in accounting for various things like loop overhead (the man page says they do turn off garbage collection). And timings on a multitasking OS can be challenging because of task switching. However, this script with its defaults runs quickly if mpmath isn't installed, so you can see how well the numbers repeat in subsequent calls.

    You can make fairly reasonable predictions with the results. For example, if it takes 10 seconds to run the script with n set to 1e5, you can be confident it will take about 30 seconds with n set to 3e5.
     
  7. bitrex

    Member

    Dec 13, 2009
    79
    4
    Ok, checking the script documentation it says:
    So it looks like timeit function gets the running time of the function, and the output is an average of N= 100,000 measurements ( log10(N) = 5 ). Then 32 ns does indeed seem like a long time to calculate 3 + 4!
     
  8. sceadwian

    New Member

    Jun 1, 2009
    499
    37
    someonesdays, the relative numbers that it produced don't make sense though. Even the basic single byte math is saying 30 nano seconds for a single byte add! That's off from the reality of the processing power of the type of chip that bitrex is using by several orders of magnitude! 32mips for a single byte add? There are micro controllers that can do that (factoring out the branch delays)

    If you're using 1e5 that's only five loops, and if this is the result it gives then there is no way that the loop delay is being accounted for and operating system overhead can't possibly account for the difference unless he was playing a high end video game or processing video images during the benchmark.

    I'm just trying to understand what this is actually doing, I don't know the libraries so I'm not going to even think of trying to read the code directly, but the results are not consistent with anything that makes sense in the metrics that it's producing.
     
  9. sceadwian

    New Member

    Jun 1, 2009
    499
    37
    I see what you ment by the 100k now bitrex, sorry that really confused me =) It seems an ass backwards way of displaying the number of loops, I'd have just shown a simple integer value or basic exponent not a logarithm, the only time I'd use a log is if the number of iterations was seriously large, like trillions.

    Alright at any rate, assuming 100,000 iterations for each instruction what can we determine? 100,000 iterations at 32ns for the group of adds is 32... FEMTO seconds? that would require a processor with a MIPS rating of 300ghz.... does not compute.
     
    Last edited: Sep 26, 2010
  10. bitrex

    Member

    Dec 13, 2009
    79
    4
    I don't think it's running the calculations 100,000 times and then summing the times for each iteration, but instead timing each of the iterations and then taking an average. So doing a single integer addition seems to indeed take 32 ns, on average, as far as the timings generated by this program are concerned.
     
  11. sceadwian

    New Member

    Jun 1, 2009
    499
    37
    That's what disturbs me bitrex. Those numbers don't mean anything. I don't care how bad the python code is there is no way it's taking that long to run those instructions. Even as a comparative metric it doesn't help anyone.
     
  12. someonesdad

    Thread Starter Senior Member

    Jul 7, 2009
    1,585
    141
    The numbers definitely have meaning -- if you're confused, I suggest reading the code to see what's being done. The script measures the time it takes to evaluate different expressions.

    All the work is being done by line 289 in the script. Let's take a floating point addition as an example: the expression is "3. + 4.". The time it takes for that expression to be evaluated by python is gotten by the following code:

    Code ( (Unknown Language)):
    1.  
    2. T = timeit.Timer(expr, setup)
    3. t = T.timeit(n)  # This is line 289
    4. out(fmt % (op, expr, TwoFig(t/n*1e9)))
    5.  
    Here, expr is the string "3. + 4.". n is the number of iterations and, in the script, defaults to 1e5 (setup is a string that is empty in the case we're talking about). The T.timeit() method call measures the time to evaluate that expression the indicated number of times. Python is an interpreted language and lots of things are going on to evaluate this expression, such as passing it to a parser, determining that it's an arithmetical expression, conversion from string to floating point, and, finally, the actual addition. One would have to go look at python's C code to actually see the steps that were being done -- I'm just guessing at some of the steps.

    The time t in seconds that it took to evaluate n expressions is then divided by n and multiplied by 1e9 to give the time per operation in ns. Thus, you see that the time given is the approximate time it takes the python interpreter to evaluate the expression given by the string "3. + 4.".

    Again, remember this script is evaluating math expressions that start out as strings, so don't assume it's doing a single low-level thing like adding two numbers in registers -- far from it!

    Actually, if you're interested, you can type the same code in at the command line in an interactive python session and you'll get similar numbers to what the script generates:

    Code ( (Unknown Language)):
    1.  
    2. >>> import timeit
    3. >>> T = timeit.Timer("3. + 4.")
    4. >>> n = int(1e5)      
    5. >>> t = T.timeit(n)
    6. >>> print t/n*1e9
    7. 31.3401222229
    8.  
     
  13. sceadwian

    New Member

    Jun 1, 2009
    499
    37
    Yes I understand that someonesdad, but a 1 byte interger add on a modern processor should take pico seconds not nano seconds. The results given are comparable to an ARM micro controller run at a few hundred mhz... Those results coming from a 2.4ghz based system do not make logical sense, the metrics being displayed are not representative of any possible reality, I don't know what anyone says about code bloat, but it can't result in a dispersion of results of THAT order on the given hardware, there is a flaw in the methods used in the software or libraries.

    If those are in fact the results the time it library produces then try increasing the number of iterations to 10*10^12th iterations and see what you get. As a comparatibe metric the numbers do not fit even a basic sanity check, even for a compiled language run from a hard drive!
     
    Last edited: Sep 27, 2010
  14. sceadwian

    New Member

    Jun 1, 2009
    499
    37
    Try t/(n*1e9)
     
Loading...