html
HTML 4.0 module

This module is a Python wrapper for HTML tags. It contains classes representing all HTML 4.0 tags, allowing easy construction and parsing of HTML pages from pure Python.

Parsing of HTML pages is somewhat limited. The module does no magic at all and relies exclusively on Python's HTMLParser, so the HTML to parse has to be "well-formed". No effort is made to generate nicely formatted output either. Both are left as an exercise for tidylib or the like.

The module does some error checking, such as reporting attributes that are not supported for a given tag, or unsupported child tags.

The documentation is split into the following subsections:
The html module defines the following errors:
The html module provides the following functions:
from html import *

# constructing a simple page
page = (
    HtmlFile()
    (
        Doctype(),
        Html()
        (
            Head()
            (
                Title()('my title')
            ),
            Body()
            (
                'Hello, World!'
            )
        )
    )
)
page.save(outpath="myfile.html")

# parsing a page (and tidy markup)
page = HtmlFile(url="some/url", tidy="path/to/tidy")
for tag in page.walk():
    print(tag)
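
Since parsing relies exclusively on Python's HTMLParser, it may help to see what that layer does by itself. The following is a minimal sketch using only the standard library, not this module's code: `TagCollector` and `collect_tags` are hypothetical names chosen for illustration. It assumes Python 3, where the parser lives in the standard library's own `html.parser` package (on Python 2 it was the `HTMLParser` module), so it should be run without this module shadowing the standard `html` package.

```python
from html.parser import HTMLParser  # standard library parser (Python 3 location)


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


def collect_tags(markup):
    """Feed markup to the parser and return the collected (tag, attributes) pairs."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


tags = collect_tags('<html><body><p class="x">Hello, World!</p></body></html>')
for name, attrs in tags:
    print(name, attrs)
```

HTMLParser only reports start and end tags as it encounters them; it does not repair or rebalance markup, which is why the input has to be well-formed (or run through a tidy tool first).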
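
The `Tag()(children)` idiom in the example, together with the attribute checking described earlier, can be illustrated with a small stand-alone sketch. This is not the module's implementation: the `Tag` base class, the `render` method, the `allowed_attributes` sets, and the `UnsupportedAttributeError` name are all assumptions made up for this illustration. The idea is that the constructor takes (and validates) attributes, while calling the instance adds children and returns the tag itself so that calls nest.

```python
from xml.sax.saxutils import escape, quoteattr  # standard library escaping helpers


class UnsupportedAttributeError(Exception):
    """Hypothetical error name; the module's real error classes may differ."""


class Tag:
    """Hypothetical base class: attributes in the constructor, children via call."""

    name = "div"
    allowed_attributes = frozenset({"id", "lang"})  # illustrative subset only

    def __init__(self, **attributes):
        unknown = set(attributes) - self.allowed_attributes
        if unknown:
            raise UnsupportedAttributeError(
                "<%s> does not support: %s" % (self.name, ", ".join(sorted(unknown)))
            )
        self.attributes = attributes
        self.children = []

    def __call__(self, *children):
        # Returning self is what allows Html()(Head()(...), Body()(...)).
        self.children.extend(children)
        return self

    def render(self):
        attrs = "".join(
            " %s=%s" % (key, quoteattr(value))
            for key, value in sorted(self.attributes.items())
        )
        inner = "".join(
            child.render() if isinstance(child, Tag) else escape(str(child))
            for child in self.children
        )
        return "<%s%s>%s</%s>" % (self.name, attrs, inner, self.name)


class Html(Tag):
    name = "html"


class Head(Tag):
    name = "head"


class Title(Tag):
    name = "title"


class Body(Tag):
    name = "body"


page = Html()(Head()(Title()("my title")), Body(id="main")("Hello, World!"))
print(page.render())
```

Passing an attribute outside the allowed set, for example `Body(bgcolour="red")`, raises the hypothetical `UnsupportedAttributeError` at construction time, which mirrors the kind of error checking the module describes.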