Finding Rows In Numpy Array With Specific Condition Efficiently

I have two 2D NumPy arrays. What I want to do is find specific rows of np_weight in np_sentence. For example: #rows are features, columns are clusters or whatever np_weight =

Solution 1:

Here is one approach: The function f below creates a boolean mask with the same shape as the weight array (plus one dummy row of False values), marking the n_top largest entries in each column (five in the example) with True.
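To make the mask construction concrete, here is a minimal sketch on a made-up 4x2 weight array with n_top = 2. The names w, kth and thresh are illustrative only and do not appear in the answer's code; the argpartition call follows the same pattern as in f.

```python
import numpy as np

# Toy weights: 4 rows (features) x 2 columns (clusters); values are made up.
w = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.4],
              [0.7, 0.6]])
n_top = 2
N, M = w.shape

# Row index of the n_top-th largest value in each column.
kth = w.argpartition(N - n_top, axis=0)[N - n_top]
# The value at that position is the per-column threshold.
thresh = w[kth, np.arange(M)]

# One extra all-False row at index N serves as the dummy row.
mask = np.zeros((N + 1, M), dtype=bool)
mask[:N] = w >= thresh
print(mask)
```

Because the comparison is `>=` against a threshold value, a column with ties at the threshold could end up with more than n_top entries marked.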

It then uses np_sentence to index into the mask, counts the True values for each (row, column) pair, and compares the count with the threshold n_hit (two in the example).
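The indexing-and-counting step can be sketched in isolation as follows. The small mask and the index array idx (standing in for np_sentence) are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical mask: 3 real rows plus one all-False dummy row, 2 columns.
mask = np.array([[True, False],
                 [False, True],
                 [True, True],
                 [False, False]])
# Each row of idx lists row indices into mask (stand-in for np_sentence).
idx = np.array([[0, 2],
                [1, 3]])

# Integer-array indexing: axes are (idx row, position in that row, mask column).
picked = mask[idx]
# Count the True entries over the positions of each idx row.
hits = np.count_nonzero(picked, axis=1)
print(hits >= 2)
```

Here `picked` has shape (2, 2, 2), and summing over axis 1 collapses the positions within each index row, leaving one count per (index row, mask column) pair.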

The only complication: we must suppress duplicate values within each row of np_sentence, so that a repeated index is not counted twice. To that end, we sort each row and then redirect every index that equals its left neighbor to the dummy row in the mask.
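A small sketch of the duplicate suppression, using two rows taken from the example data and assuming the dummy row sits at index N = 7. The names s and dup are illustrative.

```python
import numpy as np

N = 7  # index of the dummy row, i.e. the number of real rows in the mask
s = np.array([[4, 2, 4],
              [0, 0, 0]])

s = np.sort(s, axis=1)        # returns a sorted copy; equal values become adjacent
dup = s[:, 1:] == s[:, :-1]   # True where an entry repeats its left neighbor
s[:, 1:][dup] = N             # s[:, 1:] is a view, so this writes into s
print(s)
```

After this step each distinct index appears once per row, and every repeat points at the all-False dummy row, so it contributes nothing to the count.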

This function returns a mask. The last line of the script demonstrates how to convert that mask to indices.

import numpy as np

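def f(a1, a2, n_top, n_hit):
    N, M = a1.shape
    # Mask with one extra all-False dummy row at index N.
    mask = np.zeros((N+1, M), dtype=bool)
    # Mark entries >= the n_top-th largest value of their column.
    np.greater_equal(
        a1, a1[a1.argpartition(N-n_top, axis=0)[N-n_top], np.arange(M)],
        out=mask[:N])
    # Sort each row (np.sort returns a copy) so duplicates become adjacent,
    # then point every repeated index at the dummy row.
    a2 = np.sort(a2, axis=1)
    a2[:, 1:][a2[:, 1:] == a2[:, :-1]] = N
    # Count hits per (a2 row, a1 column) pair and apply the threshold.
    return np.count_nonzero(mask[a2], axis=1) >= n_hit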

a1 = np.matrix("""[[9.96859395 8.65543961 6.07429382 4.58735497]
 [3.21776471 8.33560037 2.11424961 8.89739975]
 [9.74560314 5.94640798 6.10318198 7.33056421]
 [6.60986206 2.36877835 3.06143215 7.82384351]
 [9.49702267 9.98664568 3.89140374 5.42108704]
 [1.93551346 8.45768507 8.60233715 8.09610975]
 [5.21892795 4.18786508 5.82665674 8.28397111]]"""[2:-2].replace("]\n [",";")).A

a2 = np.matrix("""[[2 5 1]
 [1 6 4]
 [0 0 0]
 [2 3 6]
 [4 2 4]]"""[2:-2].replace("]\n [",";")).A

print(f(a1,a2,5,2))

from itertools import groupby
from operator import itemgetter

print([[*map(itemgetter(1),grp)] for k,grp in groupby(np.argwhere(f(a1,a2,5,2).T),itemgetter(0))])

Output:

[[False  True  True  True]
 [ True  True  True  True]
 [False False False False]
 [ True False  True  True]
 [ True  True  True False]]
[[1, 3, 4], [0, 1, 4], [0, 1, 3, 4], [0, 1, 3]]
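
As a sanity check on the output above, here is a slow loop-based sketch that computes the same mask with Python sets. The function name f_loops is hypothetical and not part of the original answer, and the sketch assumes there are no ties among the top values of a column. It also shows an alternative way to turn the mask into per-column index lists with np.flatnonzero.

```python
import numpy as np

a1 = np.array([[9.96859395, 8.65543961, 6.07429382, 4.58735497],
               [3.21776471, 8.33560037, 2.11424961, 8.89739975],
               [9.74560314, 5.94640798, 6.10318198, 7.33056421],
               [6.60986206, 2.36877835, 3.06143215, 7.82384351],
               [9.49702267, 9.98664568, 3.89140374, 5.42108704],
               [1.93551346, 8.45768507, 8.60233715, 8.09610975],
               [5.21892795, 4.18786508, 5.82665674, 8.28397111]])
a2 = np.array([[2, 5, 1],
               [1, 6, 4],
               [0, 0, 0],
               [2, 3, 6],
               [4, 2, 4]])

def f_loops(a1, a2, n_top, n_hit):
    """Loop-based reference for f: explicit but slow (assumes no ties)."""
    M = a1.shape[1]
    out = np.zeros((len(a2), M), dtype=bool)
    for j in range(M):
        # Rows holding the n_top largest values of column j.
        top = set(np.argsort(a1[:, j])[-n_top:].tolist())
        for i, row in enumerate(a2):
            # Converting the row to a set drops duplicate indices.
            out[i, j] = len(top & set(row.tolist())) >= n_hit
    return out

result = f_loops(a1, a2, 5, 2)
print(result)

# Per-column lists of matching a2 row indices, as plain Python ints.
indices = [np.flatnonzero(col).tolist() for col in result.T]
print(indices)
```

Unlike the groupby version, the list comprehension keeps an empty list for any column without a match instead of skipping that column.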
