xmlua.HTML class
It's a class for parsing HTML.
The parsed document is returned as an xmlua.Document object.
Example:
local xmlua = require("xmlua")
local document = xmlua.HTML.parse("<html><body></body></html>")
-- Call xmlua.Document:root method
document:root() -- -> Root element
xmlua.HTML.parse(html, options=nil) -> xmlua.Document
html: HTML string to be parsed.
options: Parse options as a table.
Here are the available options:
url: The base URL of the HTML. The default is nil, which means that no base URL is specified.
encoding: The encoding of the HTML. The default is nil, which means that the encoding is detected automatically.
prefer_meta_charset: Whether the HTML5 <meta charset="ENCODING"> tag is used to detect the encoding. The default is true when encoding is nil and false when encoding is not nil.
It parses the given HTML and returns an xmlua.Document object.
If parsing fails, it raises an error only when the failure is critical. Otherwise, all errors are collected in xmlua.Document.errors.
Normally, you don't need to specify any options.
Example:
local xmlua = require("xmlua")
-- HTML to be parsed
local html = [[
<html>
<body>
<p>Hello</p>
</body>
</html>
]]
-- If you want to parse text in a file,
-- you need to read file content by yourself.
-- local html = io.open("example.html"):read("*all")
-- Parses HTML
local success, document = pcall(xmlua.HTML.parse, html)
if not success then
  local message = document
  print("Failed to parse HTML: " .. message)
  os.exit(1)
end
-- Gets the root element
local root = document:root() -- -> <html> element as xmlua.Element
-- Prints the root element name
print(root:name()) -- -> html
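The commented-out line in the example above reads a whole file in one expression, which raises an unhelpful error if the file cannot be opened and never closes the handle. A minimal sketch of a safer helper follows. `read_file` is a hypothetical name, not part of xmlua, and it uses only the standard `io` library.

```lua
-- Hypothetical helper (not part of xmlua): reads a whole file into a string.
-- Returns the content on success, or nil plus an error message on failure.
local function read_file(path)
  local file, err = io.open(path, "rb")
  if not file then
    return nil, err
  end
  local content = file:read("*a")
  file:close()
  return content
end
```

The returned string can be passed to xmlua.HTML.parse in place of the inline html string used above.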
If you know the correct encoding, you can specify it with the encoding option.
Example:
local xmlua = require("xmlua")
local html = [[
<html>
<body><p>Hello</p></body>
</html>
]]
-- Parses HTML with the specified encoding
local document = xmlua.HTML.parse(html, {encoding = "UTF-8"})
-- Prints the <body> element content
print(document:search("//body"):text())
-- Hello
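The other documented options go in the same table. The sketch below combines all three. The base URL is a placeholder, and `parse_page` is a hypothetical wrapper rather than part of xmlua.

```lua
-- All three documented options in one table.
local options = {
  url = "http://example.com/",   -- placeholder base URL
  encoding = "UTF-8",            -- skip automatic detection
  prefer_meta_charset = false,   -- already the default when encoding is set
}

-- Hypothetical wrapper (not part of xmlua): parses with the options above.
-- xmlua is loaded inside the function, so defining it has no side effects.
local function parse_page(html)
  local xmlua = require("xmlua")
  return xmlua.HTML.parse(html, options)
end
```

Calling `parse_page(html)` should then behave like the encoding example above, with the base URL set as well.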
You can get error details from xmlua.Document.errors.
Example:
local xmlua = require("xmlua")
-- Invalid HTML. "&" is invalid.
local html = [[
<html>
<body><p>&</p></body>
</html>
]]
-- Parses HTML loosely
local document = xmlua.HTML.parse(html)
-- "&" is parsed as "&amp;"
print(document:search("//body"):to_html())
-- <body><p>&amp;</p></body>
for i, err in ipairs(document.errors) do
  print("Error" .. i .. ":")
  print("Line=" .. err.line .. ": " .. err.message)
  -- Line=2: htmlParseEntityRef: no name
end
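Because non-critical errors are collected rather than raised, callers may want to turn them into a single report, for example for logging. The sketch below is a hypothetical helper (`format_errors` is not part of xmlua). It assumes only that each entry has the `line` and `message` fields used in the example above, so it works on any table of that shape.

```lua
-- Hypothetical helper (not part of xmlua): formats a list of error tables
-- into one string, one "Line=N: message" entry per line.
-- Assumes each entry has `line` and `message` fields, as in the example above.
local function format_errors(errors)
  local lines = {}
  for i, err in ipairs(errors) do
    -- Messages may end with a newline; trim trailing whitespace.
    local message = tostring(err.message):gsub("%s+$", "")
    lines[i] = "Line=" .. tostring(err.line) .. ": " .. message
  end
  return table.concat(lines, "\n")
end
```

With a parsed document, `print(format_errors(document.errors))` would print the whole report at once.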
xmlua.HTML.build(document_tree={ELEMENT, {ATTRIBUTE1, ATTRIBUTE2, ...}, ...}[, uri][, public_id]) -> xmlua.Document
If you give it a table in the form below, it builds the corresponding document tree.
{ -- Only elements, attributes, and text are supported.
  "Element name", -- The 1st item is the element name.
  { -- The 2nd item is the attributes. If the element has no attributes, this table is empty.
    ["Attribute name1"] = "Attribute value1",
    ["Attribute name2"] = "Attribute value2",
    ...,
    ["Attribute name n"] = "Attribute value n",
  },
  -- The 3rd and later items are child nodes.
  "Text node1", -- If an item is a string, it is a text node.
  { -- If an item is a table, it is an element node.
    "Child node name1",
    {
      ["Attribute name1"] = "Attribute value1",
      ["Attribute name2"] = "Attribute value2",
      ...,
      ["Attribute name n"] = "Attribute value n",
    },
  },
  "Text node2",
  ...
}
This method creates a new xmlua.Document.
If you give an empty table, it returns an empty xmlua.Document (a document that has no root element).
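Writing the nested tables by hand gets noisy for larger trees. As a sketch, a small constructor function can produce the shape described above (name first, attribute table second, children after). `element` is a hypothetical helper, not part of xmlua.

```lua
-- Hypothetical helper (not part of xmlua): builds a node table in the
-- {name, attributes, child1, child2, ...} shape described above.
local function element(name, attributes, ...)
  local node = {name, attributes or {}}
  for _, child in ipairs({...}) do
    node[#node + 1] = child
  end
  return node
end

-- A root element with two attributes, one text node, and one child element.
local doc_tree = element("html", {class = "A", id = "1"},
  "This is text.",
  element("child", {class = "B", id = "2"})
)
```

The resulting table has the same layout as a hand-written one, so it can be passed to xmlua.HTML.build in the same way.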
Example:
local xmlua = require("xmlua")
local doc_tree = {
  "html",
  {
    ["class"] = "A",
    ["id"] = "1"
  },
  "This is text.",
  {
    "child",
    {
      ["class"] = "B",
      ["id"] = "2"
    }
  }
}
-- Make a new document from the table.
local document = xmlua.HTML.build(doc_tree)
print(document:to_html())
-- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
-- <html></html>
You can also specify the external subset of the DTD with a system ID or a public ID, as below.
Example:
local xmlua = require("xmlua")
-- Specify the external subset with a system ID
local uri = "file:///usr/local/share/test.dtd"
local tree = {"html"}
local document = xmlua.HTML.build(tree, uri)
print(document:to_html())
-- <!DOCTYPE html SYSTEM "file:///usr/local/share/test.dtd">
-- <html></html>
-- Specify the external subset with a public ID
uri = "http://www.w3.org/TR/html4/strict.dtd"
local public_id = "-//W3C//DTD HTML 4.01//EN"
tree = {"html"}
document = xmlua.HTML.build(tree, uri, public_id)
print(document:to_html())
-- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
-- <html></html>
If you don't specify the external subset of the DTD, the default DTD is used, as below.
Example:
local xmlua = require("xmlua")
-- Don't specify the external subset of the DTD
local tree = {"html"}
local document = xmlua.HTML.build(tree)
print(document:to_html())
-- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
-- <html></html>
xmlua.Document: The class for HTML and XML documents.