xmlua.HTMLSAXParser class
It's a class for parsing HTML with SAX (Simple API for XML).
SAX differs from DOM in how it processes a document. A SAX parser processes the document piece by piece as it reads it, whereas a DOM parser processes it only after reading the whole document into memory. As a result, SAX can parse documents with much less memory and is usually faster.
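You can register callback functions that are called when the events below occur.
Callback event list: start_document, end_document, processing_instruction, cdata_block, ignorable_whitespace, comment, start_element, end_element, text, error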
xmlua.HTMLSAXParser.new() -> HTMLSAXParser
It creates an HTMLSAXParser object.
You can create an xmlua.HTMLSAXParser object as in the following example.
Example:
local xmlua = require("xmlua")
local parser = xmlua.HTMLSAXParser.new()
parse(html) -> boolean
html: HTML string to be parsed.
It parses the given HTML. If parsing succeeds, this method returns true. If parsing fails, this method returns false.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
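The commented-out one-liner above leaves the file handle open and raises an error if the file cannot be opened. A minimal sketch of a safer way to read the file, using only Lua's standard io library; read_file is a hypothetical helper name, not part of xmlua:

```lua
-- Hypothetical helper (not part of xmlua): read a whole file into a string.
-- Returns the content, or nil plus an error message if the file can't be opened.
local function read_file(path)
  local file, err = io.open(path, "rb")
  if not file then
    return nil, err
  end
  local content = file:read("*a")
  file:close()
  return content
end
```

The returned string can then be passed to parser:parse(html).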
finish() -> boolean
It finishes parsing HTML with SAX.
If you started parsing with parse, you should call this method.
If you don't call this method, the end_document event doesn't occur.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
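Because parse and finish go together, the pairing can be wrapped in one function. The following is only a sketch: parse_and_finish is a hypothetical name, and it mirrors the example above by skipping finish when parse reports failure.

```lua
-- Hypothetical helper (not part of xmlua): parse, then finish on success.
-- `parser` is expected to provide parse(html) and finish(), as described above.
local function parse_and_finish(parser, html)
  if not parser:parse(html) then
    return false
  end
  return parser:finish()
end
```

With a real parser, the call would look like parse_and_finish(xmlua.HTMLSAXParser.new(), html).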
start_document
Register a user callback function as follows.
local parser = xmlua.HTMLSAXParser.new()
parser.start_document = function()
  -- Code you want to execute
end
The registered function is called when the parser starts parsing the document.
In the example below, it is called when the parser reaches <html>.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.start_document = function()
  print("Start document")
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
Start document
end_document
Register a user callback function as follows.
local parser = xmlua.HTMLSAXParser.new()
parser.end_document = function()
  -- Code you want to execute
end
The registered function is called when finish is called.
In the example below, it is called when parser:finish() is called.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.end_document = function()
  print("End document")
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
End document
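The start_document and end_document callbacks can be combined to track whether a document has been fully processed. A minimal sketch, where track_document is a hypothetical helper that accepts any table that takes callbacks the way the parser does:

```lua
-- Hypothetical helper (not part of xmlua): record document lifecycle events.
local function track_document(parser)
  local state = {started = false, ended = false}
  parser.start_document = function()
    state.started = true
  end
  parser.end_document = function()
    state.ended = true
  end
  return state
end
```

With a real parser, state.ended should stay false until parser:finish() is called, following the description above.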
processing_instruction
Register a user callback function as follows.
Your callback receives the attributes of the processing instruction as its arguments: target and data_list in the example below.
local parser = xmlua.HTMLSAXParser.new()
parser.processing_instruction = function(target, data_list)
  -- Code you want to execute
end
The registered function is called when the parser parses a processing instruction.
In the example below, it is called when the parser parses <?target This is PI>.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<?target This is PI>
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.processing_instruction = function(target, data_list)
  print("Processing instruction target: " .. target)
  print("Processing instruction data: " .. data_list)
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
Processing instruction target: target
Processing instruction data: This is PI
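When a document contains processing instructions for several targets, the callback can route each one by its target. The following sketch assumes data_list arrives as a string, as the output above suggests; dispatch_processing_instructions and the handlers table are hypothetical names, not part of xmlua.

```lua
-- Hypothetical helper (not part of xmlua): route processing instructions
-- to per-target handler functions. Targets without a handler are ignored.
local function dispatch_processing_instructions(parser, handlers)
  parser.processing_instruction = function(target, data_list)
    local handler = handlers[target]
    if handler then
      handler(data_list)
    end
  end
end
```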
cdata_block
Register a user callback function as follows.
Your callback receives the content of the script element as its argument: cdata_block in the example below.
local parser = xmlua.HTMLSAXParser.new()
parser.cdata_block = function(cdata_block)
  -- Code you want to execute
end
The registered function is called when the parser parses a script element.
In the example below, it is called when the parser parses <script>alert(\"Hello world!\")</script>.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<body>
<p>Hello</p>
</body>
<script>alert(\"Hello world!\")</script>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.cdata_block = function(cdata_block)
  print("CDATA block: " .. cdata_block)
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
CDATA block: alert(\"Hello world!\")
ignorable_whitespace
Register a user callback function as follows.
Your callback receives the ignorable whitespace in the HTML as its argument: ignorable_whitespace in the example below.
local parser = xmlua.HTMLSAXParser.new()
parser.ignorable_whitespace = function(ignorable_whitespace)
  -- Code you want to execute
end
The registered function is called when the parser parses ignorable whitespace.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html> <body><p>Hello</p></body> </html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.ignorable_whitespace = function(ignorable_whitespace)
  print("Ignorable whitespace: \"" .. ignorable_whitespace .. "\"")
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
Ignorable whitespace: " "
Ignorable whitespace: " "
Ignorable whitespace: "
"
comment
Register a user callback function as follows.
Your callback receives the content of the HTML comment as its argument: comment in the example below.
local parser = xmlua.HTMLSAXParser.new()
parser.comment = function(comment)
  -- Code you want to execute
end
The registered function is called when the parser parses an HTML comment.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<!--This is comment.-->
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.comment = function(comment)
  print("Comment: " .. comment)
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
Comment: This is comment.
start_element
Register a user callback function as follows.
Your callback receives the name and attributes of the element as its arguments.
local parser = xmlua.HTMLSAXParser.new()
parser.start_element = function(local_name, attributes)
  -- Code you want to execute
end
The registered function is called when the parser parses the start of an element.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html id="top" class="top-level">
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.start_element = function(local_name, attributes)
  print("Start element: " .. local_name)
  if #attributes > 0 then
    print(" Attributes:")
    for i, attribute in pairs(attributes) do
      local name
      if attribute.prefix then
        name = attribute.prefix .. ":" .. attribute.local_name
      else
        name = attribute.name
      end
      if attribute.uri then
        name = name .. "{" .. attribute.uri .. "}"
      end
      print(" " .. name .. ": " .. attribute.value)
    end
  end
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
Start element: html
Attributes:
id: top
class: top-level
Start element: body
Start element: p
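For lookups by attribute name, the attribute list can be flattened into a table. This is a sketch under an assumption: it treats attributes as an array of tables with the fields used in the example above (name, local_name, prefix, value). attributes_to_map is a hypothetical helper, not part of xmlua.

```lua
-- Hypothetical helper (not part of xmlua): flatten an attribute list into
-- a name -> value table. Assumes `attributes` is an array of tables with the
-- fields used in the example above (name, local_name, prefix, value).
local function attributes_to_map(attributes)
  local map = {}
  for _, attribute in ipairs(attributes) do
    local name
    if attribute.prefix then
      name = attribute.prefix .. ":" .. attribute.local_name
    else
      name = attribute.name
    end
    map[name] = attribute.value
  end
  return map
end
```

Inside a start_element callback, attributes_to_map(attributes).id would then give the value of the id attribute, or nil if the element has none.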
end_element
Register a user callback function as follows.
Your callback receives the name of the element as its argument.
local parser = xmlua.HTMLSAXParser.new()
parser.end_element = function(name)
  -- Code you want to execute
end
The registered function is called when the parser parses the end of an element.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html id="top" class="top-level">
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.end_element = function(name)
  print("End element: " .. name)
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
End element: p
End element: body
End element: html
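Used together, start_element and end_element make it possible to track nesting depth. The sketch below builds an indented outline of element names. It assumes every start_element call is eventually matched by an end_element call, and attach_outline is a hypothetical helper, not part of xmlua.

```lua
-- Hypothetical helper (not part of xmlua): build an indented outline of
-- element names from start_element and end_element callbacks.
local function attach_outline(parser)
  local lines = {}
  local depth = 0
  parser.start_element = function(local_name, attributes)
    -- Indent by two spaces per nesting level.
    lines[#lines + 1] = string.rep("  ", depth) .. local_name
    depth = depth + 1
  end
  parser.end_element = function(name)
    depth = depth - 1
  end
  return lines
end
```

After parsing, table.concat(lines, "\n") gives the outline as a single string.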
text
Register a user callback function as follows.
Your callback receives the content of the text node as its argument.
local parser = xmlua.HTMLSAXParser.new()
parser.text = function(text)
  -- Code you want to execute
end
The registered function is called when the parser parses a text node.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html><body><p>Hello</p></body></html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.text = function(text)
  print("Text: " .. text)
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
Text: Hello
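A document with several text nodes triggers the callback once for each piece of text. To obtain all of the text as one string, the pieces can be accumulated and joined at the end. A minimal sketch, where attach_text_collector is a hypothetical helper, not part of xmlua:

```lua
-- Hypothetical helper (not part of xmlua): accumulate every text chunk the
-- parser reports and return a function that joins them on demand.
local function attach_text_collector(parser)
  local chunks = {}
  parser.text = function(text)
    chunks[#chunks + 1] = text
  end
  return function()
    return table.concat(chunks)
  end
end
```

Collecting into a table and joining once with table.concat avoids building many intermediate strings through repeated concatenation.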
error
Register a user callback function as follows.
Your callback receives information about the SAX parse error as its argument.
local parser = xmlua.HTMLSAXParser.new()
parser.error = function(error)
  -- Code you want to execute
end
The registered function is called when parsing fails. The error information has the following structure:
{
  domain
  code
  message
  level
  line
}
domain takes one of the values defined in the Error domain list.
code takes one of the values defined in the Error code list.
level takes one of the values defined in the Error level list.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML with SAX
local parser = xmlua.HTMLSAXParser.new()
parser.error = function(error)
  print("Error domain : " .. error.domain)
  print("Error code : " .. error.code)
  print("Error message: " .. error.message)
  print("Error level : " .. error.level)
  print("Error line : " .. error.line)
end
local success = parser:parse(html)
if not success then
  print("Failed to parse HTML with SAX")
  os.exit(1)
end
parser:finish()
The above example prints the following:
Error domain : 5
Error code : 68
Error message: htmlParseStartTag: invalid element name
Error level : 2
Error line : 1
Failed to parse HTML with SAX
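For logging, the error table can be condensed into a single line. The sketch below relies only on the five fields listed above and assumes domain, code, level and line are integers, as in the sample output. format_error is a hypothetical helper, not part of xmlua, and its parameter is named err to avoid shadowing Lua's built-in error function.

```lua
-- Hypothetical helper (not part of xmlua): render an error table as one line.
-- Assumes the fields listed above: domain, code, message, level, line.
local function format_error(err)
  -- Drop any trailing whitespace so the result stays on one line.
  local message = err.message:gsub("%s+$", "")
  return string.format("line %d: %s (domain=%d, code=%d, level=%d)",
                       err.line, message, err.domain, err.code, err.level)
end
```

It could be registered as parser.error = function(err) print(format_error(err)) end.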