Is It Possible To Just Get The Tags Without A Class Or Id With Beautifulsoup?
I have several thousand HTML pages and I am trying to extract the text from them. I am doing this with Beautiful Soup. get_text() gives me too much unnecessary information fr
Solution 1:
You can set class and id to False, and it will match only tags that have neither a class nor an id:
soup.find_all('p', {'class': False, 'id': False})
or (the keyword argument is spelled class_, with a trailing underscore, because class is a reserved keyword in Python):
soup.find_all('p', class_=False, id=False)
from bs4 import BeautifulSoup as BS

text = '<p class="A">text A</p> <p>text B</p> <p id="C">text C</p>'
soup = BS(text, 'html.parser')

# Variant 1: pass the attribute filter as a dictionary
all_items = soup.find_all('p', {'class': False, 'id': False})
for item in all_items:
    print(item.text)

# Variant 2: pass the same filter as keyword arguments
all_items = soup.find_all('p', class_=False, id=False)
for item in all_items:
    print(item.text)
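As an alternative sketch (not part of the original answer), the same filter can be written as a CSS selector and passed to select(). This assumes a Beautiful Soup version that ships with CSS selector support for :not() and attribute selectors; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = '<p class="A">text A</p> <p>text B</p> <p id="C">text C</p>'
soup = BeautifulSoup(html, 'html.parser')

# :not([class]) and :not([id]) exclude tags that carry those attributes
plain_paragraphs = soup.select('p:not([class]):not([id])')
texts = [p.get_text() for p in plain_paragraphs]
print(texts)
```

Whether find_all() or select() reads better is mostly a matter of taste; both should pick out the same tags here.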
EDIT: If you want tags without any attributes at all, you can filter the items using not item.attrs:
for item in all_items:
    if not item.attrs:
        print(item.text)
from bs4 import BeautifulSoup as BS

text = '<p class="A">text A</p><p>text B</p><p id="C">text C</p><p data="D">text D</p>'
soup = BS(text, 'html.parser')

all_items = soup.find_all('p')
for item in all_items:
    # item.attrs is an empty dict when the tag has no attributes
    if not item.attrs:
        print(item.text)
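Since the question involves thousands of pages, the attribute check can also be wrapped in a small helper. The following is only a sketch: text_from_bare_tags is a hypothetical name, and it assumes you want the text of every tag of a given name that carries no attributes. It passes a function to find_all() so the filtering happens in a single call.

```python
from bs4 import BeautifulSoup

def text_from_bare_tags(html, name='p'):
    """Return the text of every <name> tag that has no attributes (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # find_all() accepts a function that receives each tag and returns True to keep it
    bare = soup.find_all(lambda tag: tag.name == name and not tag.attrs)
    return [tag.get_text(strip=True) for tag in bare]

sample = '<p class="A">text A</p><p>text B</p><p id="C">text C</p><p data="D">text D</p>'
print(text_from_bare_tags(sample))
```

The helper could then be called once per file, for example on the contents of each HTML file read from disk.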