I have a 2D NumPy array of data generated from topography. One of its columns is a mean slope value, and for the analysis I am performing I want to filter out any row with a mean slope above 0.4.

I could do this by filtering the data as it is read from the file, but this is pretty slow, and I would run into problems because I can't preallocate the array without knowing how many rows survive. One solution would be to traverse each data file twice: once to count the rows with slope > 0.4, and once to copy the valid data into the array. I want to avoid this because the files are fairly large and it seems very clumsy.

So I started looking at masked arrays in NumPy. I have used them before to filter nodata values out of raster plots:

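```python
import numpy as np

# assumes `raster` is the author's own I/O module providing read_flt(),
# and that data_path and hillshade_file are set elsewhere
# load hillshade data into a numpy array, hillshade
hillshade, hillshade_header = raster.read_flt(data_path + hillshade_file)
# ignore nodata values
hillshade = np.ma.masked_where(hillshade == -9999, hillshade)
```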
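Because `raster.read_flt` and the data files are not available here, this minimal self-contained sketch applies the same `np.ma.masked_where` call to a small hypothetical raster (values invented for illustration). It shows what the mask buys you: masked cells are skipped by reductions such as `mean()` and `count()`.

```python
import numpy as np

# Hypothetical 3x3 raster; -9999 marks cells with no data
hillshade = np.array([[120.0, 135.0, -9999.0],
                      [140.0, -9999.0, 150.0],
                      [155.0, 160.0, 165.0]])

# Mask every cell equal to the nodata value
hillshade = np.ma.masked_where(hillshade == -9999, hillshade)

print(hillshade.count())  # 7 valid (unmasked) cells
print(hillshade.mean())   # mean of the 7 valid cells only
```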

But what if I want to filter a row based on a value in a single column? I found an answer on Stack Overflow which got me most of the way there.

First we declare a test array, `a`:
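The original test array was not preserved, so here is a hypothetical 3×3 stand-in (values assumed). Its last column (index 2) plays the role of the mean slope, and only `a[2][2]` exceeds 0.4, which matches the description that follows.

```python
import numpy as np

# Hypothetical test array: columns 0-1 are arbitrary data,
# column 2 is the mean slope; only the last row has slope > 0.4
a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])

print(a.shape)  # (3, 3)
```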

Next we create the masked array:
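A sketch of that step, reconstructed from the explanation of `np.ones_like`, the column test and `.T` rather than copied from the original answer, on a hypothetical 3×3 array (values assumed). The `.astype(bool)` is an addition that just makes the mask's dtype explicit.

```python
import numpy as np

# Hypothetical 3x3 array; column 2 is the mean slope
a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])

# ones * row flags gives a column pattern; .T turns it into a row pattern
mask = (np.ones_like(a) * (a[:, 2] > 0.4)).T.astype(bool)
masked_a = np.ma.MaskedArray(a, mask=mask)

print(masked_a)  # masked entries are displayed as --
```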

The `mask=` section is creating a row mask based on the condition `>0.4` in the last column.
`np.ones_like(a)` creates a new array of the same shape as `a`, filled with ones:
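A quick check on a hypothetical 3×3 array with the slope in column 2 (values assumed, not from the original post):

```python
import numpy as np

a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])  # hypothetical values

ones = np.ones_like(a)  # same shape (3, 3) and dtype (float64) as a
print(ones)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]
```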

`(a[:,2]>0.4)` evaluates the expression `>0.4` for each cell in column 2 of the array, resulting in:
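On a hypothetical 3×3 array where only the last row's slope (column 2) exceeds 0.4 (values assumed), the comparison gives one boolean per row:

```python
import numpy as np

a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])  # hypothetical values

row_flags = a[:, 2] > 0.4  # one boolean per row, shape (3,)
print(row_flags)  # [False False  True]
```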

This is a 1D array, where the `True` corresponds to the value `a[2][2]`. If we multiply this array with the 2D array of ones, broadcasting repeats it along every row, so the flags end up running down the last column, and we get:
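A sketch of the multiplication on a hypothetical 3×3 array (values assumed). NumPy aligns the `(3,)` flags with the last axis, so every row receives the same `[0, 0, 1]` pattern and `True` becomes `1.0`:

```python
import numpy as np

a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])  # hypothetical values

# Broadcasting lines the (3,) flags up with the LAST axis, so the
# flag for row 2 lands in column 2 of every row
product = np.ones_like(a) * (a[:, 2] > 0.4)
print(product)
# [[0. 0. 1.]
#  [0. 0. 1.]
#  [0. 0. 1.]]
```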

Nearly there! Now we just use `.T` to transpose the array, swapping its rows and columns so that the flags run along the last row instead of down the last column. So now we can check out our mask, and the masked data:
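A sketch of the transpose and the resulting masked array, again on a hypothetical 3×3 array (values assumed). The float mask is cast to boolean before it is handed to `np.ma.MaskedArray`; that cast is an addition for clarity.

```python
import numpy as np

a = np.array([[8, 5, 0.1],
              [2, 4, 0.39],
              [3, 1, 0.45]])  # hypothetical values

# Transposing moves the flags from the last column into the last row
mask = (np.ones_like(a) * (a[:, 2] > 0.4)).T
print(mask)
# [[0. 0. 0.]
#  [0. 0. 0.]
#  [1. 1. 1.]]

masked_a = np.ma.MaskedArray(a, mask=mask.astype(bool))
print(masked_a)  # the last row is displayed as -- -- --
```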

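Unfortunately this transpose trick only works with square arrays (where `dim1 == dim2`), as in our 3×3 example. Broadcasting lines the row flags, one per row, up with the last axis, which has one entry per column, so for a non-square array the multiplication fails. My real data is not so square, but as is almost always the case, someone else has had this problem before.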
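To see the failure concretely, here is a sketch using the 3×4 array from the final example below: multiplying a `(3, 4)` array by a `(3,)` vector raises a `ValueError`, because broadcasting compares the vector's length (3) with the last axis (4).

```python
import numpy as np

# 3 rows x 4 columns; mean slope in the last column (index 3)
a = np.array([[8, 5, 9, 0.1],
              [2, 4, 5, 0.39],
              [3, 1, 4, 0.45]])

row_flags = a[:, 3] > 0.4  # shape (3,)

try:
    # (3, 4) * (3,): the 3 flags are matched against the 4 columns -> error
    mask = (np.ones_like(a) * row_flags).T
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Broadcasting failed:", err)
```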

The solution is to use `np.newaxis`, which reshapes the 1D row flags into an `(n, 1)` column so that each row's flag broadcasts across every column of the input array, `a`. The final steps are as follows:

```python
>>> import numpy as np
>>> a = np.array([[8, 5, 9, 0.1],
...               [2, 4, 5, 0.39],
...               [3, 1, 4, 0.45]])
>>> mask = np.empty(a.shape, dtype=bool)  # boolean mask with the same shape as a
>>> mask[:, :] = (a[:, 3] > 0.4)[:, np.newaxis]  # (3, 1) row flags broadcast across all 4 columns
>>> masked_a = np.ma.MaskedArray(a, mask=mask)
>>> masked_a
masked_array(data =
 [[8.0 5.0 9.0 0.1]
 [2.0 4.0 5.0 0.39]
 [-- -- -- --]],
             mask =
 [[False False False False]
 [False False False False]
 [ True  True  True  True]],
       fill_value = 1e+20)
>>> final_a = np.ma.compress_rowcols(masked_a, axis=0)  # axis=0: drop masked rows only
>>> final_a
array([[ 8.  ,  5.  ,  9.  ,  0.1 ],
       [ 2.  ,  4.  ,  5.  ,  0.39]])
```

The final step uses `np.ma.compress_rowcols` with `axis=0` to get rid of the rows that are masked out. This is not always needed, but will make my life easier for my current project.
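For comparison, here is a short sketch (not from the original post) of two shortcuts on the same 3×4 array. `np.ma.compress_rows` is NumPy's shorthand for `compress_rowcols(..., axis=0)`. Plain boolean indexing is a different technique that skips the masked array altogether and keeps the rows whose slope is at most 0.4.

```python
import numpy as np

a = np.array([[8, 5, 9, 0.1],
              [2, 4, 5, 0.39],
              [3, 1, 4, 0.45]])

too_steep = a[:, 3] > 0.4                 # shape (3,)
print(too_steep[:, np.newaxis].shape)     # (3, 1): one flag per row

# Masked-array route: mask whole rows, then drop them
mask = np.zeros(a.shape, dtype=bool)
mask[:, :] = too_steep[:, np.newaxis]     # (3, 1) broadcasts to (3, 4)
via_mask = np.ma.compress_rows(np.ma.MaskedArray(a, mask=mask))

# Boolean-indexing route: keep the rows that are not too steep
via_indexing = a[~too_steep]

print(np.array_equal(via_mask, via_indexing))  # True
```

Boolean indexing is usually the shorter option when the filtered array is all you need. The masked-array route is handy when you also want to keep the full array around with the steep rows hidden.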