Python Webscraping - NoneObject Failure - Broken HTML?

September 27, 2022

I've got a problem with my parsing script in Python. I've already tried it on another page (Yahoo Finance) and it worked fine. On Morningstar, however, it's not working. I get the

Solution 1:

The table is loaded dynamically by a separate XHR call to an endpoint that returns a JSONP response, so it is not part of the page's initial HTML, which would explain the None result. Simulate that request, extract the JSON string from the JSONP response, load it with json, take the HTML from the "componentData" key, and parse it with BeautifulSoup:

import json
import re

import requests
from bs4 import BeautifulSoup

# make a request
url = 'http://financials.morningstar.com/financials/getFinancePart.html?&callback=jsonp1450279445504&t=XNAS:SBUX&region=usa&culture=en-US&cur=&order=asc&_=1450279445578'
response = requests.get(url)

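# strip the JSONP wrapper "callback(...)" and extract the HTML under "componentData"
# (use response.text, not response.content: in Python 3 a str pattern cannot be applied to bytes)
payload = re.sub(r'^\s*[\w.]*\(|\);?\s*$', '', response.text)
data = json.loads(payload)["componentData"]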

# parse HTML
soup = BeautifulSoup(data, "html.parser")
table = soup.find('table', attrs={'class': 'r_table1 text2'})
print(table.prettify())
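
The JSONP unwrapping step can be tried offline before hitting the network. The following is a minimal sketch, not part of the original answer: the helper name unwrap_jsonp and the sample response body are made up for illustration, and the real endpoint may return a differently shaped payload.

```python
import json
import re

# Matches the leading "callback(" and the trailing ")" or ");" of a JSONP body.
# Both alternatives are anchored, so parentheses inside the payload are left alone.
_JSONP_WRAPPER = re.compile(r'^\s*[\w.$]*\s*\(|\)\s*;?\s*$')


def unwrap_jsonp(text):
    """Hypothetical helper: decode the JSON payload inside a JSONP response body."""
    return json.loads(_JSONP_WRAPPER.sub('', text))


# Made-up sample body, shaped like the response described above (an assumption).
sample = (
    'jsonp1450279445504({"componentData": '
    '"<table class=\\"r_table1 text2\\"><tr><td>Revenue (USD)</td></tr></table>"});'
)

payload = unwrap_jsonp(sample)
html = payload["componentData"]
print(html)
```

Anchoring the pattern to the start and end of the string keeps text such as "(USD)" inside the embedded HTML intact, which an unanchored pattern could strip.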
