
Detecting POS Tag Patterns Along With Specified Words

I need to identify certain POS tags before/after certain specified words, for example the following tagged sentence: [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'V

Solution 1:

Assuming you want to check literally for "would" followed by "be", followed by some adjective, you can do this:

def would_be(tagged):
    # Slide a window of three tokens: word, word, POS tag.
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

The input is a POS tagged sentence (list of tuples, as per NLTK).

It checks whether any three consecutive elements of the list consist of "would", then "be", then a word tagged as an adjective ('JJ'). Because any() consumes a generator expression, it returns True as soon as this "pattern" is matched.

You can do something very similar for the second type of sentence:

def am_able_to(tagged):
    # Slide a window of four tokens: three literal words, then a POS tag.
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

Here's a driver for the program:

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

def would_be(tagged):
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))

This correctly outputs:

Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True

If you'd like to generalize this, you can change whether you're checking the literal words or their POS tag.
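One way to generalize is to describe each position in the pattern as either a literal word or a POS tag, and let a single function do the matching. The following is only a sketch: the name matches_pattern and the ('word', ...) / ('tag', ...) pair format are made up for this illustration and are not part of NLTK.

```python
# Hypothetical helper (not an NLTK API): match a mixed word/tag pattern
# against a POS-tagged sentence given as a list of (word, tag) tuples.
FIELD = {'word': 0, 'tag': 1}  # which tuple slot each pattern kind inspects


def matches_pattern(tagged, pattern):
    """Return True if `pattern` matches some run of consecutive tokens.

    `pattern` is a list of (kind, value) pairs, where kind is 'word'
    (compare the token text) or 'tag' (compare the POS tag).
    """
    if not pattern:
        return False  # assumption: an empty pattern matches nothing
    n = len(pattern)
    return any(
        all(tagged[i + j][FIELD[kind]] == value
            for j, (kind, value) in enumerate(pattern))
        for i in range(len(tagged) - n + 1)
    )


WOULD_BE_ADJ = [('word', 'would'), ('word', 'be'), ('tag', 'JJ')]
AM_ABLE_TO_VERB = [('word', 'am'), ('word', 'able'), ('word', 'to'), ('tag', 'VB')]

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'),
      ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

print(matches_pattern(s1, WOULD_BE_ADJ))
print(matches_pattern(s2, AM_ABLE_TO_VERB))
```

With this shape, switching a position from a literal word to a tag (for example, ('tag', 'MD') instead of ('word', 'would')) only changes the pattern list, not the matching code.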
