Using the library
from bs4 import BeautifulSoup
1. Removing the attributes of all elements
soup = BeautifulSoup(html_code, 'html.parser')
def remove_all_attributes(soup: BeautifulSoup) -> str:
    # Clear every tag's attribute dictionary, then serialize the tree back to HTML
    for tag in soup.find_all():
        tag.attrs = {}
    return str(soup)
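As a quick check of the attribute-stripping function, here is a minimal self-contained sketch. The `sample_html` string is a made-up input for illustration and does not come from the original text.

```python
from bs4 import BeautifulSoup


def remove_all_attributes(soup: BeautifulSoup) -> str:
    # Clear every tag's attribute dictionary, then serialize the tree back to HTML
    for tag in soup.find_all():
        tag.attrs = {}
    return str(soup)


# Hypothetical sample input (assumption, for illustration only)
sample_html = '<div id="main" class="box"><a href="https://example.com" title="t">link</a></div>'

cleaned = remove_all_attributes(BeautifulSoup(sample_html, "html.parser"))
print(cleaned)  # <div><a>link</a></div>
```

The tags and their text are kept; only `id`, `class`, `href`, and `title` are dropped.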
2. Removing all span elements from the code
# Remove span elements
soup = BeautifulSoup(html_code, 'html.parser')
all_spans = soup.find_all('span')
for span in all_spans:
    span.decompose()
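Note that `decompose()` removes a tag together with everything inside it. If the goal is to drop only the span wrapper while keeping its text, `unwrap()` may be the better fit. The sketch below compares the two on a made-up `sample_html` string (an assumption for illustration).

```python
from bs4 import BeautifulSoup

# Hypothetical sample input (assumption, for illustration only)
sample_html = "<p>Hello <span>big</span> world</p>"

# decompose(): the span and its contents are destroyed
soup_a = BeautifulSoup(sample_html, "html.parser")
for span in soup_a.find_all("span"):
    span.decompose()
removed = str(soup_a)

# unwrap(): only the span tag is removed, its inner text stays in place
soup_b = BeautifulSoup(sample_html, "html.parser")
for span in soup_b.find_all("span"):
    span.unwrap()
unwrapped = str(soup_b)

print(removed)    # the word "big" is gone
print(unwrapped)  # <p>Hello big world</p>
```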
3. Removing elements with a specific class
Delete every span element whose class is "cosd-citation-citationId":
soup = BeautifulSoup(html_code, 'html.parser')
# class_ (with a trailing underscore) is used because class is a reserved word in Python
citation_spans = soup.find_all('span', class_='cosd-citation-citationId')
for span in citation_spans:
    span.decompose()
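One detail worth keeping in mind: when filtering with `class_`, BeautifulSoup matches against any one of a tag's classes, so spans carrying additional classes are removed too. A small sketch, using a made-up `sample_html` string (an assumption for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical sample input (assumption, for illustration only)
sample_html = (
    '<p>Text'
    '<span class="cosd-citation-citationId">1</span>'
    '<span class="note cosd-citation-citationId">2</span>'
    '<span class="note">keep</span>'
    '</p>'
)

soup = BeautifulSoup(sample_html, "html.parser")
# Matches both the single-class span and the span that has two classes
for span in soup.find_all("span", class_="cosd-citation-citationId"):
    span.decompose()

result = str(soup)
print(result)  # <p>Text<span class="note">keep</span></p>
```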
Beyond this, BeautifulSoup can also be used to add, delete, modify, and look up HTML elements.
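To make the four operations concrete, here is a minimal sketch that looks up, modifies, adds, and deletes elements in one small tree. The `sample_html` string and the variable names are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input (assumption, for illustration only)
sample_html = '<ul id="items"><li class="old">first</li><li>second</li></ul>'
soup = BeautifulSoup(sample_html, "html.parser")

# Look up: find the list, then the first item inside it
ul = soup.find("ul", id="items")
first_li = ul.find("li", class_="old")

# Modify: replace the text and change an attribute
first_li.string = "FIRST"
first_li["class"] = "new"

# Add: create a new tag and append it to the list
new_li = soup.new_tag("li")
new_li.string = "third"
ul.append(new_li)

# Delete: remove the second item (index 1) together with its contents
ul.find_all("li")[1].decompose()

result = str(soup)
print(result)  # <ul id="items"><li class="new">FIRST</li><li>third</li></ul>
```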