I Can't Locate A Recurring Element From A Bs4 Object
This issue is driving me crazy. I am trying to pull text from the Pro Football Reference website. The information I need is in a td element displaying QB hurries. In the
Solution 1:
The issue is that this field sits inside an HTML comment, so the first parse does not expose it as a tag.
Here is one way to resolve it:
import bs4
import requests
res = requests.get('https://www.pro-football-reference.com/players/D/DonaAa00.htm')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# The stats table is wrapped in a comment inside this div.
extract = soup.find('div', {'id': 'all_detailed_defense'})
# Grab the comment node directly instead of relying on a loop variable after the loop.
comment = extract.find(string=lambda text: isinstance(text, bs4.Comment))
# Parse the comment's contents as a separate document.
soup2 = bs4.BeautifulSoup(comment, 'html.parser')
totalQbHurrys = soup2.find('td', {'data-stat': 'qb_hurry'})
print(totalQbHurrys)
PS: I used the trick from this answer: https://stackoverflow.com/a/52874885/2186074
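To see why the first lookup fails, here is a minimal offline sketch of the same idea. The SAMPLE_HTML string is a hypothetical, heavily simplified stand-in for the real page (only the div id and the data-stat attribute are taken from the answer above), so treat it as an illustration rather than the site's actual markup:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, simplified stand-in for the real page markup.
SAMPLE_HTML = """
<div id="all_detailed_defense">
  <!--
  <table>
    <tr><th data-stat="year_id">2019</th><td data-stat="qb_hurry">32</td></tr>
  </table>
  -->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The cell lives inside a comment, so the first parse does not see it as a tag.
direct_hit = soup.find("td", {"data-stat": "qb_hurry"})

# Grab the comment text and parse it as its own document.
div = soup.find("div", {"id": "all_detailed_defense"})
comment = div.find(string=lambda text: isinstance(text, Comment))
inner_soup = BeautifulSoup(comment, "html.parser")
cell = inner_soup.find("td", {"data-stat": "qb_hurry"})
hurries = cell.get_text(strip=True)

print(direct_hit)  # None
print(hurries)     # 32
```

Calling get_text(strip=True) returns just the cell's text, whereas printing the tag itself shows the whole td element.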
Solution 2:
from io import StringIO
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import pandas as pd
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get("https://www.pro-football-reference.com/players/D/DonaAa00.htm")
# page_source is the HTML after the browser has rendered the page.
# Wrapping it in StringIO avoids the pandas deprecation warning for literal HTML strings.
df = pd.read_html(StringIO(driver.page_source), attrs={
    'class': 'row_summable sortable stats_table now_sortable'}, header=1)[0]
print(df.loc[1, 'Hrry'])
driver.quit()
Output:
32
Solution 3:
The HTML you need is inside a comment, so it is not directly visible in the soup. You first need to grab the comment and then parse it as a new soup object. From that object you can then locate the tr rows and their td and th cells. For example:
from bs4 import BeautifulSoup, Comment
import requests
res = requests.get('https://www.pro-football-reference.com/players/D/DonaAa00.htm')
soup = BeautifulSoup(res.text, 'html.parser')
div = soup.find('div', {'id':'all_detailed_defense'})
comment_html = div.find(string=lambda text: isinstance(text, Comment))
comment_soup = BeautifulSoup(comment_html, 'html.parser')
for tr in comment_soup.find_all('tr'):
    row = [td.text for td in tr.find_all(['td', 'th'])]
    print(row)
Giving you:
['', 'Games', 'Pass Coverage', 'Pass Rush', 'Tackles']
['Year', 'Age', 'Tm', 'Pos', 'No.', 'G', 'GS', 'Int', 'Tgt', 'Cmp', 'Cmp%', 'Yds', 'Yds/Cmp', 'Yds/Tgt', 'TD', 'Rat', 'DADOT', 'Air', 'YAC', 'Bltz', 'Hrry', 'QBKD', 'Sk', 'Prss', 'Comb', 'MTkl', 'MTkl%']
['2018*+', '27', 'LAR', 'DT', '99', '16', '16', '0', '1', '0', '0.0%', '0', '', '0.0', '0', '39.6', '-2.0', '0', '0', '0', '30', '19', '20.5', '70', '59', '6', '9.2%']
['2019*+', '28', 'LAR', 'DT', '99', '16', '16', '0', '0', '0', '', '0', '', '', '0', '', '', '0', '0', '0', '32', '9', '12.5', '55', '48', '6', '11.1%']
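Once the rows are available as lists, a single value can be picked out by pairing each data row with the column-name row. This is a minimal sketch, assuming the layout shown above (a group-header row, then the column names, then one row per season). The helper name rows_to_records is hypothetical and not part of bs4:

```python
# Rows copied from the output above; in practice they would come from the tr loop.
rows = [
    ['', 'Games', 'Pass Coverage', 'Pass Rush', 'Tackles'],
    ['Year', 'Age', 'Tm', 'Pos', 'No.', 'G', 'GS', 'Int', 'Tgt', 'Cmp', 'Cmp%',
     'Yds', 'Yds/Cmp', 'Yds/Tgt', 'TD', 'Rat', 'DADOT', 'Air', 'YAC', 'Bltz',
     'Hrry', 'QBKD', 'Sk', 'Prss', 'Comb', 'MTkl', 'MTkl%'],
    ['2018*+', '27', 'LAR', 'DT', '99', '16', '16', '0', '1', '0', '0.0%',
     '0', '', '0.0', '0', '39.6', '-2.0', '0', '0', '0',
     '30', '19', '20.5', '70', '59', '6', '9.2%'],
    ['2019*+', '28', 'LAR', 'DT', '99', '16', '16', '0', '0', '0', '',
     '0', '', '', '0', '', '', '0', '0', '0',
     '32', '9', '12.5', '55', '48', '6', '11.1%'],
]


def rows_to_records(rows, header_index=1):
    """Pair each data row with the column names found at header_index."""
    header = rows[header_index]
    return [dict(zip(header, row)) for row in rows[header_index + 1:]]


records = rows_to_records(rows)
# Map each season label to its QB hurries value (still a string at this point).
hurries_by_year = {rec['Year']: rec['Hrry'] for rec in records}
print(hurries_by_year)  # {'2018*+': '30', '2019*+': '32'}
```

The 2019 value matches the 32 printed by Solution 2. The values stay as strings here; convert with int() if you need numbers.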