I have a 2D numpy array of data generated from topography. One of these columns of data is a mean slope value and for the analysis I am performing I want to filter out any row which has a mean slope value above 0.4.
I could do this by filtering the data as it is read from the file, but this is pretty slow, and I will run into problems of being unable to preallocating the array. One solution would be to traverse each data file twice, once to count the instances of slope > 0.4 and once to allocate the valid data to the array. I want to avoid this as the files are fairly large and this seems very clumsy.
So I started looking at masked arrays in numpy, I have used them before to filter no data values out of raster plots:
1
2
3
4
5
#load hillshade data into a numpy array, hillshade
hillshade, hillshade_header = raster.read_flt(data_path + hillshade_file)
#ignore nodata values
hillshade = np.ma.masked_where(hillshade == -9999, hillshade)
But what if I want to filter a row based on a value in a single column? I found this answer on stackoverflow which got me most of the way there.
First we delare a test array, a:
Next we create the masked array
The mask=
section is creating a row mask based on the condition >0.4
in the last column.
np.ones_like(a)
creates a new array of the same shape as a
filled with ones:
(a[:,2]>0.4)
evaluates the expression >0.4
for each cell in column 2 of the array,
resulting in:
This is a 1D array, where the True
corresponds to the value a[2][2]
but if we multiply
this array with the 2D array of ones, we get:
Nearly there! Now we just use .T
to transpose the array, effectively rotating it through
90 degrees. So now we can check out our mask, and the masked data:
Unfortunately this transpose trick only works with arrays where dim1 == dim2
so in our example we had a 3*3 array. My real data is not so square. But as is almost
always the case, someone else has had this problem before
The solution is to use np.newaxis
which ensures that the mask is created in the
same dimensions as the input array, a
. The final steps are as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
a = np.array([[8, 5, 9, 0.1],
[2, 4, 5, 0.39],
[3, 1, 4, 0.45]])
mask = np.empty(a.shape,dtype=bool)
mask[:,:] = (a[:,3] > 0.4)[:,np.newaxis]
masked_a = np.ma.MaskedArray(a,mask=mask)
>>> masked_a
masked_array(data =
[[8.0 5.0 9.0 0.1]
[2.0 4.0 5.0 0.39]
[-- -- -- --]],
mask =
[[False False False False]
[False False False False]
[ True True True True]],
fill_value = 1e+20)
final_a = np.ma.compress_rowcols(masked_a,axis=0)
>>> final_a
array([[ 8. , 5. , 9. , 0.1 ],
[ 2. , 4. , 5. , 0.39]])
The final step uses np.ma.compress_rowcols
to get rid of the rows that are masked out,
this is not always needed, but will make my life easier for my current project.