Return Rows With Unique Pairs Across Columns

I'm trying to find rows that have unique pairs of values across 2 columns, so this dataframe:

A  B
1  0
2  0
3  0
0  1
2  1
3  1
0  2
1  2
3  2
0  3
1  3
2

Solution 1:

I think you can use apply with sorted + drop_duplicates:

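# result_type='broadcast' keeps the original index and columns;
# without it, recent pandas versions return a Series of lists here
df = df.apply(sorted, axis=1, result_type='broadcast').drop_duplicates()
print(df)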
   A  B
0  0  1
1  0  2
2  0  3
4  1  2
5  1  3
8  2  3

Faster solution with numpy.sort:

df = (pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns)
        .drop_duplicates())
print(df)
   A  B
0  0  1
1  0  2
2  0  3
4  1  2
5  1  3
8  2  3

Solution without sorting with DataFrame.min and DataFrame.max:

a = df.min(axis=1)
b = df.max(axis=1)
df['A'] = a
df['B'] = b
df = df.drop_duplicates()
print(df)
   A  B
0  0  1
1  0  2
2  0  3
4  1  2
5  1  3
8  2  3
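
As a quick sanity check, the two approaches above can be compared on the sample data. This is only a sketch: the integer values are copied from the data listed in Solution 2, the names by_sort and by_minmax are made up for illustration, and assign is used so that the original frame is left unmodified.

```python
import numpy as np
import pandas as pd

# Sample data: the same values that Solution 2 loads (as integers here).
df = pd.DataFrame({
    "A": [1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2],
    "B": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Approach 1: sort the values within each row, then drop repeated rows.
by_sort = pd.DataFrame(
    np.sort(df.to_numpy(), axis=1), index=df.index, columns=df.columns
).drop_duplicates()

# Approach 2: row-wise min and max; assign() returns a new frame, so df is untouched.
by_minmax = df.assign(A=df.min(axis=1), B=df.max(axis=1)).drop_duplicates()

# Both keep the first occurrence of each unordered pair.
assert by_sort.equals(by_minmax)
print(by_minmax)
```

Note that the min/max approach relies on there being exactly two columns, while sorting each row also works for wider frames.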

Solution 2:

Loading the data:

import numpy as np
import pandas as pd

# split() with no argument splits on any run of whitespace
a = np.array("1 2   3   0   2   3   0   1   3   0   1   2".split(), dtype=np.double)
b = np.array("0 0   0   1   1   1   2   2   2   3   3   3".split(), dtype=np.double)
df = pd.DataFrame(dict(A=a,B=b))

In case you don't need to sort the entire DF:

df["trans"] = df.apply(
  lambda row: (min(row['A'], row['B']), max(row['A'], row['B'])), axis=1
)
df.drop_duplicates("trans")
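
The same idea can probably be written without a row-wise apply. The sketch below is one possible variant, not part of the original answer: it builds the order-independent key with NumPy's element-wise minimum and maximum and filters with duplicated. The names key, lo, hi and result are hypothetical, and the data is the sample from above written as integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2],
    "B": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Element-wise min/max of the two columns give an order-independent key per row.
key = pd.DataFrame({
    "lo": np.minimum(df["A"], df["B"]),
    "hi": np.maximum(df["A"], df["B"]),
})

# duplicated() flags repeats of an earlier key; inverting it keeps first occurrences.
result = df[~key.duplicated()]
print(result)
```

Because the key lives in a separate frame, the rows in result keep their original A and B values and no helper column is added to df.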
