17 Jun 2015

Numpy Masked Arrays

I have a 2D numpy array of data generated from topography. One of its columns holds a mean slope value, and for the analysis I am performing I want to filter out any row whose mean slope is above 0.4.

I could do this by filtering the data as it is read from the file, but this is pretty slow, and it means I cannot preallocate the array because I don't know in advance how many rows will pass the filter. One solution would be to traverse each data file twice: once to count the rows with slope > 0.4, and once to copy the valid data into the array. I want to avoid this, as the files are fairly large and it seems very clumsy.

So I started looking at masked arrays in numpy. I have used them before to filter nodata values out of raster plots:

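import numpy as np

# Load hillshade data into a numpy array. raster, data_path and hillshade_file
# come from the author's own setup and are not shown here.
hillshade, hillshade_header = raster.read_flt(data_path + hillshade_file)

# Ignore nodata values (-9999)
hillshade = np.ma.masked_where(hillshade == -9999, hillshade)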
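Because raster.read_flt is part of the author's own tooling, here is a self-contained sketch that swaps in a small hypothetical array for the hillshade raster. It shows that once the nodata cells are masked, reductions such as count() and mean() skip them.

```python
import numpy as np

# Hypothetical stand-in for a hillshade raster; -9999 marks nodata cells.
hillshade = np.array([[120.0, 135.0, -9999.0],
                      [140.0, -9999.0, 150.0]])

# Mask every cell equal to the nodata value.
hillshade = np.ma.masked_where(hillshade == -9999, hillshade)

print(hillshade.count())  # number of unmasked cells: 4
print(hillshade.mean())   # mean of the valid cells only: 136.25
```

np.ma.masked_equal(hillshade, -9999) is a shorter way to build the same mask.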

But what if I want to filter a row based on a value in a single column? I found an answer on Stack Overflow that got me most of the way there.

First we declare a test array, a:

import numpy as np

a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])

Next we create the masked array:

masked_a = np.ma.MaskedArray(a, mask=(np.ones_like(a)*(a[:,2]>0.4)).T)

The mask= argument builds a row mask from the condition > 0.4 in the last column. np.ones_like(a) creates a new array with the same shape as a, filled with ones:

[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]

(a[:,2]>0.4) evaluates the condition > 0.4 for each element in column 2 (the third column) of the array, resulting in:

[False False True]

This is a 1D array in which the True corresponds to the value a[2][2]. When we multiply it by the 2D array of ones, broadcasting applies it to every row, so we get:

[[ 0.  0.  1.]
 [ 0.  0.  1.]
 [ 0.  0.  1.]]
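The row repetition above is NumPy broadcasting: an array of shape (3,) is aligned with the last axis of a (3, 3) array, so the same row of flags is applied to every row. A small sketch to confirm this, reusing the test array a:

```python
import numpy as np

a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])

row_flags = a[:, 2] > 0.4             # shape (3,): [False False  True]
product = np.ones_like(a) * row_flags  # bools are treated as 0.0 / 1.0

# The 1D flags are broadcast along the last axis, so every row is [0, 0, 1].
print(product.shape)  # (3, 3)
print(product)
```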

Nearly there! Now we just use .T to transpose the array, swapping its rows and columns so that each flag runs across its whole row instead of down a column. Now we can check out our mask and the masked data:

masked_a =
[[8.0 5.0 0.1]
 [2.0 4.0 0.39]
 [-- -- --]]

masked_a.mask =
[[False False False]
 [False False False]
 [ True  True  True]]
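The whole square-array recipe can be reproduced in one place as a sketch. One small deviation from the code above: the mask is cast to bool explicitly with astype(bool), which is an assumption made here for clarity rather than something the original does.

```python
import numpy as np

a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])

# Broadcast the row flags across the ones, then transpose so each row's
# flag fills that row. This only works because a is square (3 x 3).
mask = (np.ones_like(a) * (a[:, 2] > 0.4)).T.astype(bool)
masked_a = np.ma.MaskedArray(a, mask=mask)

print(masked_a)
print(masked_a.mask)
```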

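Unfortunately this transpose trick only works for square arrays (where the number of rows equals the number of columns), like our 3 x 3 example; otherwise the 1D array of flags cannot be broadcast against the array of ones. My real data is not so square, but as is almost always the case, someone else has had this problem before.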
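To see the failure concretely, here is a sketch using a 3 x 4 array (the same rows with an extra column). The multiplication itself raises a ValueError, before the transpose is ever reached:

```python
import numpy as np

# A 3 x 4 array: three rows, with the slope value in column 3.
a = np.array([[8, 5, 9, 0.1],
              [2, 4, 5, 0.39],
              [3, 1, 4, 0.45]])

row_flags = a[:, 3] > 0.4  # shape (3,)

failed = False
try:
    # (3, 4) * (3,) cannot broadcast: the trailing dimensions 4 and 3 differ.
    mask = (np.ones_like(a) * row_flags).T
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```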

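The solution is to use np.newaxis, which turns the 1D array of row flags (shape (n,)) into a column (shape (n, 1)). A column broadcasts across every element of its row, so the mask ends up with the same shape as the input array, a. The final steps are as follows: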

a = np.array([[8, 5, 9, 0.1],
              [2, 4, 5, 0.39],
              [3, 1, 4, 0.45]])

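# Boolean mask with the same shape as a
mask = np.empty(a.shape, dtype=bool)
# (a[:,3] > 0.4) has shape (3,); [:, np.newaxis] makes it (3, 1),
# so each row's flag is broadcast across all of that row's columns
mask[:, :] = (a[:, 3] > 0.4)[:, np.newaxis]
masked_a = np.ma.MaskedArray(a, mask=mask)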

>>> masked_a
masked_array(data =
 [[8.0 5.0 9.0 0.1]
 [2.0 4.0 5.0 0.39]
 [-- -- -- --]],
             mask =
 [[False False False False]
 [False False False False]
 [ True  True  True  True]],
       fill_value = 1e+20)

# axis=0: drop every row that contains a masked value
final_a = np.ma.compress_rowcols(masked_a, axis=0)

>>> final_a
array([[ 8.  ,  5.  ,  9.  ,  0.1 ],
       [ 2.  ,  4.  ,  5.  ,  0.39]])

The final step uses np.ma.compress_rowcols to get rid of the rows that are masked out. This is not always needed, but it will make my life easier for my current project.
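The steps above can be wrapped in a small helper. This is a sketch rather than the code used in the project: the function name filter_rows and its parameters are hypothetical, and np.repeat is used here in place of the np.empty assignment to expand the column of flags. For comparison, it also checks the result against plain boolean indexing, a different technique that selects the rows directly without building a masked array.

```python
import numpy as np

def filter_rows(a, col, threshold):
    """Hypothetical helper: drop every row of the 2D array `a` whose
    value in column `col` is greater than `threshold`."""
    # (n,) -> (n, 1): one flag per row, as a column.
    row_flags = (a[:, col] > threshold)[:, np.newaxis]
    # Repeat the column across all m columns to get an (n, m) mask.
    mask = np.repeat(row_flags, a.shape[1], axis=1)
    masked = np.ma.MaskedArray(a, mask=mask)
    # axis=0 removes every row that contains a masked value.
    return np.ma.compress_rowcols(masked, axis=0)

a = np.array([[8, 5, 9, 0.1],
              [2, 4, 5, 0.39],
              [3, 1, 4, 0.45]])

final_a = filter_rows(a, col=3, threshold=0.4)
print(final_a)

# Boolean indexing keeps the rows where the condition is NOT met.
direct = a[~(a[:, 3] > 0.4)]
print(np.array_equal(final_a, direct))  # True
```

The masked-array route is handy when you also want to keep the masked rows around (for plotting or inspection); if you only need the filtered result, boolean indexing does it in one step.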