
Lxml.html Extract A String By Searching For A Keyword

I have a portion of HTML like the one below:
  • The text
  • I want to get

    Solution 1:

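    from lxml import html

    # Sample markup from the question
    s = '<li><label>The Keyword:</label><span><a href="../../..">The text</a></span></li>'

    tree = html.fromstring(s)
    # text_content() returns all text inside the element, label included
    text = tree.text_content()
    print(text)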
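
Note that `text_content()` concatenates every text node under the element, so the result includes the label as well as the link text. A minimal sketch of trimming it down to the value, assuming the keyword appears literally in the text (the `keyword`, `full_text`, and `value` names are only illustrative):

```python
from lxml import html

s = '<li><label>The Keyword:</label><span><a href="../../..">The text</a></span></li>'
keyword = "The Keyword:"

tree = html.fromstring(s)
full_text = tree.text_content()  # label text and link text, concatenated

# Keep only what follows the keyword; fall back to None if it is absent.
_, found, value = full_text.partition(keyword)
value = value.strip() if found else None
print(value)
```

This only works when the keyword and the wanted text sit in the same element, which is why an XPath-based lookup is usually the sturdier choice.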
    

    Solution 2:

    You can modify the XPath slightly to work with your current structure: find the element whose text contains the keyword, step up to its parent, then search below that parent for the a element and take its text:

    >>> tree.xpath('//*[contains(text(), "The Keyword:")]/..//a/text()')
    ['The text']

    But that may not be flexible enough...
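
If more flexibility is needed, one option is to anchor on the `label` itself and take whatever element follows it, rather than relying on an `a` being present. The following is only a sketch under assumptions not in the question: the helper name `extract_by_keyword` is hypothetical, the second `li` is made-up sample data, and the value is assumed to be the label's first following sibling.

```python
from lxml import html


def extract_by_keyword(tree, keyword):
    """Return the text of the element right after the matching <label>, or None."""
    # $kw is an XPath variable; lxml binds it from the keyword argument,
    # which avoids quoting problems inside the expression.
    matches = tree.xpath(
        '//label[normalize-space(.) = $kw]/following-sibling::*[1]',
        kw=keyword,
    )
    if not matches:
        return None
    return matches[0].text_content().strip()


# Hypothetical sample: the question's <li> plus one extra item.
s = (
    '<ul>'
    '<li><label>The Keyword:</label><span><a href="../../..">The text</a></span></li>'
    '<li><label>Other:</label><span>Something else</span></li>'
    '</ul>'
)
tree = html.fromstring(s)
print(extract_by_keyword(tree, "The Keyword:"))
print(extract_by_keyword(tree, "Missing:"))
```

Matching on `normalize-space(.)` tolerates stray whitespace around the label text, and returning `None` for a missing keyword keeps the caller from indexing into an empty list.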
