Is It Possible To Just Get The Tags Without A Class Or Id With Beautifulsoup?

August 21, 2024

I have several thousand HTML sites and I am trying to filter the text from these sites. I am doing this with Beautiful Soup. get_text() gives me too much unnecessary information fr

Solution 1:

You can pass False for class and id, and find_all() will return only the tags that have neither attribute:

soup.find_all('p', {'class': False, 'id': False})

or, with keyword arguments (class_ has a trailing underscore because class is a reserved keyword in Python):

soup.find_all('p', class_=False, id=False)

from bs4 import BeautifulSoup as BS

text = '<p class="A">text A</p>  <p>text B</p>  <p id="C">text C</p>'

soup = BS(text, 'html.parser')

# Variant 1: attribute filters passed as a dict

all_items = soup.find_all('p', {'class': False, 'id': False})

for item in all_items:
    print(item.text)

# Variant 2: the same filters passed as keyword arguments

all_items = soup.find_all('p', class_=False, id=False)

for item in all_items:
    print(item.text)
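
For contrast, here is a minimal sketch of how the two boolean filter values behave: True matches tags that carry the attribute with any value, while False matches tags that lack it. The variable names (with_id, without_id, plain) are only illustrative and are not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '<p class="A">text A</p><p>text B</p><p id="C">text C</p>'
soup = BeautifulSoup(html, 'html.parser')

# True: the attribute must be present (any value)
with_id = [p.text for p in soup.find_all('p', id=True)]

# False: the attribute must be absent
without_id = [p.text for p in soup.find_all('p', id=False)]

# Both filters together: neither class nor id
plain = [p.text for p in soup.find_all('p', class_=False, id=False)]

print(with_id)
print(without_id)
print(plain)
```

Only the last query reproduces the result from the answer above; the first two are there to show what each value does on its own.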

EDIT: If you want tags without any attributes at all, you can filter the items with not item.attrs:

for item in all_items:
    if not item.attrs:
        print(item.text)

from bs4 import BeautifulSoup as BS

text = '<p class="A">text A</p><p>text B</p><p id="C">text C</p><p data="D">text D</p>'

soup = BS(text, 'html.parser')

all_items = soup.find_all('p')

for item in all_items:
    if not item.attrs:
        print(item.text)
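
The same "no attributes at all" check can probably also be done inside the search itself, since find_all() accepts a function that receives each tag and returns True or False. This is a sketch of that alternative, not the approach from the answer above; the helper name is_bare_p is hypothetical.

```python
from bs4 import BeautifulSoup

html = ('<p class="A">text A</p><p>text B</p>'
        '<p id="C">text C</p><p data="D">text D</p>')
soup = BeautifulSoup(html, 'html.parser')


def is_bare_p(tag):
    """Return True for <p> tags that carry no attributes at all."""
    # tag.attrs is a dict of the tag's attributes; empty means "bare"
    return tag.name == 'p' and not tag.attrs


# find_all() calls is_bare_p on every tag and keeps the matches
bare_texts = [p.text for p in soup.find_all(is_bare_p)]
print(bare_texts)
```

Keeping the condition in a named function makes it easy to reuse across many documents, which may help when processing thousands of pages.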

theprettymind1987
