Skip to content Skip to sidebar Skip to footer

Changing Something From Iterating Over A Numpy Array To Vectorization

I am trying to speed up the piece of code below by vectorization: [rows,cols] = flow_direction_np.shape elevation_gain = np.zeros((rows,cols), np.float) for [i, j], flow in np.nde

Solution 1:

You can significantly improve your performance using np.where to get the indices where your conditions happen:

ind = np.where( flow_direction_np==32 )

you will see that ind is a tuple with two elements, the first is the indices of the first axis and the second of the second axis of your flow_direction_np array.

You can work out with this indices to apply the shifts: i-1, j-1 and so on...

ind_32 = (ind[0]-1, ind[1]-1)

Then you use fancy indexing to update the arrays:

elevation_gain[ ind_32 ] += sediment_transport_np[ ind ]

EDIT: applying this concept to your case would give something like this:

lookup= {32:(-1, -1),
          16:(0, -1),
           8:(+1, -1),
           4:(+1,  0),
          64:(-1,  0),
         128:(-1, +1),
           1:(0, +1),
           2:(+1, +1)}

fornum,shiftinlookup.iteritems():ind=np.where(flow_direction_np==num)ind_num=ind[0]+shift[0],ind[1]+shift[1]elevation_gain[ind_num]+=sediment_transport_np[ind]

Solution 2:

The reason that you're getting different results is due to the way python handles negative indexing.


For other folks reading, this question (and answer) are a follow up from here: Iterating through a numpy array and then indexing a value in another array

First off, I apologize that the "vectorized" code is as dense as it is. There's a through explanation in my earlier answer, so I won't repeat it here.


Your original code (in the original question) is actually subtly different than the version you posted here.

Basically, before you had

for [i, j], flow in np.ndenumerate(flow_direction_np):
    try:
        if flow == 32:
            ...
        elif ...
            ...

and you were getting an index error when i+1 or j+1 was greater than the size of the grid.

Just doing:

for [i, j], flow in np.ndenumerate(flow_direction_np):
    try:
        if flow == 32:
            ...
        elif ...
            ...
    except IndexError:
        elevation_change[i, j] = 0

is actually incorrect because it defines different boundary conditions on different sides of the grid.

In the second case, when j-1 or i-1 is negative, the value from the opposite side of the grid will be returned. However, when j+1 or i+1 is greater than the size of the grid, 0 will be returned. (Thus the "different boundary conditions".)

In the vectorized version of the code, 0 is returned both when the indices are negative and when they're beyond the bounds of the grid.


As a quick example, notice what happens with the following:

In [1]: x = [1, 2, 3]

In [2]: x[0]
Out[2]: 1In [3]: x[1]
Out[3]: 2In [4]: x[2]
Out[4]: 3In [5]: x[3]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent calllast)
<ipython-input-5-ed224ad0520d>in<module>()
----> 1 x[3]

IndexError: list index outofrangeIn [6]: x[-1]
Out[6]: 3In [7]: x[-2]
Out[7]: 2In [8]: x[-3]
Out[8]: 1In [9]: x[-4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent calllast)
<ipython-input-9-f9c639f21256>in<module>()
----> 1 x[-4]

IndexError: list index outofrangeIn [10]:

Notice that negative indices up to the size of the sequence are valid and return the "opposite end" of the sequence. So, x[3] raises an error, while x[-1] just returns the other end.

Hopefully that's a bit clearer.

Post a Comment for "Changing Something From Iterating Over A Numpy Array To Vectorization"