Python Webscraping - NoneObject Failure - Broken HTML?
I've got a problem with my parsing script in Python. I've already tried it on another page (Yahoo Finance) and it worked fine. On Morningstar, however, it's not working. I get the
Solution 1:
The table is loaded dynamically by a separate XHR call to an endpoint that returns a JSONP response. Simulate that request, extract the JSON string from the JSONP response, load it with json, extract the HTML from the componentData key, and load it with BeautifulSoup:
import json
import re
import requests
from bs4 import BeautifulSoup
# make a request
url = 'http://financials.morningstar.com/financials/getFinancePart.html?&callback=jsonp1450279445504&t=XNAS:SBUX&region=usa&culture=en-US&cur=&order=asc&_=1450279445578'
response = requests.get(url)
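# strip the JSONP wrapper and extract the HTML under the "componentData" key
# (response.text is a str; in Python 3, response.content is bytes and re.sub with a str pattern would raise TypeError)
data = json.loads(re.sub(r'([a-zA-Z_0-9\.]*\()|(\);?$)', '', response.text))["componentData"]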
# parse HTML
soup = BeautifulSoup(data, "html.parser")
table = soup.find('table', attrs={'class': 'r_table1 text2'})
print(table.prettify())
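The JSONP-unwrapping step can also be pulled out into a small helper, which makes it easier to test without hitting the network. The sketch below is only an illustration: the function name jsonp_to_dict and the sample payload are made up for this example and are not part of the Morningstar response or of any library. It matches the wrapper once, from the first opening parenthesis to the last closing one, instead of deleting every "name(" sequence in the text.

```python
import json
import re


def jsonp_to_dict(jsonp_text):
    """Strip a JSONP wrapper such as callback({...}); and parse the JSON inside.

    Hypothetical helper for illustration; the name is not from any library.
    """
    # Capture everything between the first "(" and the final ")" (optionally
    # followed by ";"), so parentheses inside the payload are left untouched.
    match = re.search(r'^[^(]*\((.*)\)\s*;?\s*$', jsonp_text.strip(), re.DOTALL)
    if match is None:
        raise ValueError("text does not look like a JSONP response")
    return json.loads(match.group(1))


# Made-up sample standing in for the real endpoint's response body.
sample = 'jsonp1450279445504({"componentData": "<table class=\\"r_table1 text2\\"></table>"})'
payload = jsonp_to_dict(sample)
html = payload["componentData"]
print(html)
```

With the real response, html would then be passed to BeautifulSoup as in the answer above. Since soup.find returns None when no matching element exists, checking "if table is None" before calling table.prettify() avoids the NoneType error from the question title.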