Locating specific <p> tags after an <h1> tag with a Python HTML parser

I'm attempting to parse a series of webpages and grab just the 3 paragraphs that follow the header on each of these pages. They all have the same format (I think). I'm using urllib2 and Beautiful Soup, but I'm not quite sure how to jump to the header and then grab the few <p> tags that follow it. I know the first split("<h1>") is not correct, but it's my only decent attempt so far. Here's my code:

from __future__ import print_function  # lets the bare print() calls emit a blank line on Python 2

from bs4 import BeautifulSoup
import urllib2
from HTMLParser import HTMLParser

BANNED = ["/events/new"]

def main():

    soup = BeautifulSoup(urllib2.urlopen('http://b-line.binghamton.edu').read())

    for link in soup.find_all('a'):
        link = link.get('href')
        if link is not None and link not in BANNED and "/events/" in link:
            print()
            print(link)          
            eventPage = "http://b-line.binghamton.edu" + link
            bLineSubPage = urllib2.urlopen(eventPage)   
            bLineSubPageStr = bLineSubPage.read()
            headAccum = 0  
            for data in bLineSubPageStr.split("<h1>"):
                if(headAccum < 1):
                    accum = 0 
                    for subData in data.split("<p>"):
                        if(accum < 5):
                            try:
                                print(BeautifulSoup(subData).get_text())
                            except Exception as e:
                                print(e) 
                            accum+=1
                    print()
                headAccum += 1           
            bLineSubPage.close()         
            print()

main()

Answers


>>> import bs4, urllib2
>>> page_txt = urllib2.urlopen("http://b-line.binghamton.edu/events/9305").read()
>>> soup = bs4.BeautifulSoup(page_txt.split("<h1>", 1)[-1])
>>> print soup.find_all("p")[:3]

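Is that what you want? Splitting on the first <h1> discards everything before the header, so find_all("p")[:3] returns the first three paragraphs that follow it.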
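Since the question also imports HTMLParser, here is a minimal sketch of the same idea without splitting the raw string: a small parser subclass that starts collecting <p> text only once the first <h1> has been seen. It is written for Python 3, where the module is named html.parser (Python 2 calls it HTMLParser). The class name ParagraphsAfterHeader, the helper paragraphs_after_h1, and the sample markup are all made up for illustration; the real b-line pages may be structured differently.

```python
from html.parser import HTMLParser  # Python 2 names this module HTMLParser


class ParagraphsAfterHeader(HTMLParser):
    """Collect the text of the first `limit` <p> elements after the first <h1>."""

    def __init__(self, limit=3):
        super().__init__()
        self.limit = limit
        self.seen_h1 = False   # becomes True once the first <h1> opens
        self.in_p = False      # True while inside a <p> that is being collected
        self.paragraphs = []   # collected paragraph texts

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.seen_h1 = True
        elif tag == "p":
            # Only collect paragraphs that come after the header, up to the limit.
            self.in_p = self.seen_h1 and len(self.paragraphs) < self.limit
            if self.in_p:
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still appended to the open <p>.
        if self.in_p:
            self.paragraphs[-1] += data


def paragraphs_after_h1(html, limit=3):
    """Hypothetical helper: return up to `limit` paragraph texts after the first <h1>."""
    parser = ParagraphsAfterHeader(limit)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.paragraphs]


# Made-up sample page, not the real b-line markup.
sample = """
<html><body>
<p>Navigation text before the header</p>
<h1>Event title</h1>
<p>When: <b>Friday</b> at 7pm</p>
<p>Where: Lecture Hall 1</p>
<p>Description of the event.</p>
<p>Footer paragraph that should be skipped</p>
</body></html>
"""

print(paragraphs_after_h1(sample))
```

If BeautifulSoup is already in use, soup.find("h1").find_all_next("p", limit=3) should express the same idea more directly, provided the page actually contains an <h1> (find returns None otherwise).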

