findall returns a list containing everything captured by the parentheses (the capturing group) in the regular expression. I used re.DOTALL so that the dot also matches newline characters.
I used \s* because I was not sure whether there would be any whitespace.
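Solution 1's exact regex is not reproduced in this excerpt, so the following is only a hypothetical pattern in the same spirit. It shows the three points above: findall returns just the captured group, \s* tolerates any amount of whitespace (including none), and re.DOTALL lets `.` cross line breaks. The sample HTML is invented for illustration.

```python
import re

# Hypothetical pattern: Solution 1's exact regex is not shown in this excerpt.
# \s* allows zero or more whitespace characters (including newlines) between tags.
pattern = r'<HR>\s*<font size="\+1">(.*?)</font>\s*<BR>'

html = """<HR>
  <font size="+1">First
heading</font>
<BR>
<HR><font size="+1">Second heading</font><BR>"""

# findall returns only the text captured by the (...) group, one item per match.
with_dotall = re.findall(pattern, html, re.DOTALL)
print(with_dotall)  # ['First\nheading', 'Second heading']

# Without DOTALL, '.' stops at '\n', so the heading that spans two lines is missed.
without_dotall = re.findall(pattern, html)
print(without_dotall)  # ['Second heading']
```

Note that \s* matches newlines even without DOTALL; the flag only changes what `.` can match.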
Solution 2:
This works, but may not be very robust:
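import re
# Raw string so that \s and \+ reach the regex engine unchanged
r = re.compile(r'<HR>\s?<font size="\+1">(.+?)</font>\s?<BR>', re.IGNORECASE)
html = '<HR> <font size="+1">Some heading</font> <BR>'  # sample input (hypothetical)
print(r.findall(html))  # ['Some heading']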
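To make "may not be very robust" concrete, here is a small sketch that runs Solution 2's pattern (written as a raw string, and assuming the tag is `<font size="+1">` with a space) against a few invented variations of the same markup. Only the first sample matches.

```python
import re

# Solution 2's pattern as a raw string; the space in <font size=...> is assumed.
r = re.compile(r'<HR>\s?<font size="\+1">(.+?)</font>\s?<BR>', re.IGNORECASE)

samples = {
    "one space": '<HR> <font size="+1">Title</font> <BR>',
    "two spaces": '<HR>  <font size="+1">Title</font> <BR>',
    "single quotes": "<HR><font size='+1'>Title</font><BR>",
    "title on two lines": '<HR><font size="+1">Long\ntitle</font><BR>',
}

# \s? allows at most one whitespace character, the quotes are hard-coded,
# and '.' does not match '\n' because DOTALL is not set.
results = {name: r.findall(html) for name, html in samples.items()}
for name, found in results.items():
    print(f"{name}: {found}")
```

Each failure comes from a detail of the markup that HTML itself treats as insignificant, which is the usual argument for a real parser.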
You will be better off using a proper HTML parser. BeautifulSoup is excellent and easy to use. Look it up.
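BeautifulSoup is a third-party package, so to keep this sketch dependency-free it swaps in the standard library's `html.parser.HTMLParser` instead; this is not BeautifulSoup's API. The class name `FontTextCollector` is hypothetical, and for simplicity it collects the text of every `<font size="+1">` element without checking that it sits between `<HR>` and `<BR>`.

```python
from html.parser import HTMLParser


class FontTextCollector(HTMLParser):
    """Hypothetical helper: collects text inside <font size="+1"> elements."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and strips attribute quotes
        if tag == "font" and dict(attrs).get("size") == "+1":
            self.inside = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "font":
            self.inside = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite
        if self.inside:
            self.results[-1] += data


html = """<HR>  <FONT SIZE='+1'>First
heading</FONT>
<BR><hr><font size="+1">Second heading</font><br>"""

parser = FontTextCollector()
parser.feed(html)
parser.close()
print(parser.results)  # ['First\nheading', 'Second heading']
```

Case, quoting style, and whitespace no longer matter because the parser normalizes them. With BeautifulSoup installed, the rough equivalent would be `BeautifulSoup(html, "html.parser").find_all("font", size="+1")`, calling `get_text()` on each result.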