This document describes how to use XMLua step by step. If you don't install XMLua yet, install XMLua before you read this document.
You need to parse document at first to use XMLua. XMLua supports HTML document and XML document.
You can use xmlua.HTML.parse
to parse HTML.
Example:
-- Requires "xmlua" module
local xmlua = require("xmlua")
local html = [[
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>World</p>
</body>
</html>
]]
-- Parses HTML
local document = xmlua.HTML.parse(html)
-- Serializes as HTML
print(document:to_html())
-- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
-- <html>
-- <head>
-- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-- <title>Hello</title>
-- </head>
-- <body>
-- <p>World</p>
-- </body>
-- </html>
You can use xmlua.XML.parse
to parse XML.
Example:
-- Requires "xmlua" module
local xmlua = require("xmlua")
local xml = [[
<root>
<sub>text1</sub>
<sub>text2</sub>
<sub>text3</sub>
</root>
]]
-- Parses XML
local document = xmlua.XML.parse(xml)
-- Serializes as XML
print(document:to_xml())
-- <?xml version="1.0" encoding="UTF-8"?>
-- <root>
-- <sub>text1</sub>
-- <sub>text2</sub>
-- <sub>text3</sub>
-- </root>
You must pass HTML or XML as string
. If you want to parse HTML or XML in a file, you need to read it by yourself.
Example:
local xmlua = require("xmlua")
local xml_file = io.open("example.xml")
local xml = xml_file:read("*all")
xml_file:close()
local document = xmlua.XML.parse(xml)
xmlua.HTML.parse
and xmlua.XML.parse
may fail. For example, they fail with invalid document. If they fail, they raises an error.
If you need to assume that document may be invalid, you need to handle error with pcall
.
Example:
local xmlua = require("xmlua")
local invalid_xml = "<root>"
-- Parses invalid XML
local success, document = pcall(xmlua.XML.parse, invalid_xml)
if success then
print("Succeeded to parse XML")
else
-- If pcall returns not success, the second return value is error
-- message not document.
local message = document
print("Failed to parse XML: " .. message)
-- -> Failed to parse XML: ./xmlua/xml.lua:15: Premature end of data in tag root line 1
end
Both parsed HTML and parsed XML are xmlua.Document
object in XMLua. You can handle both of them in the same way.
You can use XPath to search elements. You can use xmlua.Searchable:search
method for it.
xmlua.Document
can use the method.
The method returns a xmlua.NodeSet
object. It's a normal array with convenience methods. You can use normal array features such as #
and []
.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub>text1</sub>
<sub>text2</sub>
<sub>text3</sub>
</root>
]]
local document = xmlua.XML.parse(xml)
-- Searches all <sub> elements under the <root> element
local all_subs = document:search("/root/sub")
-- You can use "#" for getting the number of matched nodes
print(#all_subs) -- -> 3
-- You can access the N-th node by "[]".
print(all_subs[1]:to_xml()) -- -> <sub1>text1</sub1>
print(all_subs[2]:to_xml()) -- -> <sub2>text2</sub2>
print(all_subs[3]:to_xml()) -- -> <sub3>text3</sub3>
xmlua.NodeSet
object can also use search
method. It means that you can search against searched result.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub class="A"><subsub1/></sub>
<sub class="B"><subsub2/></sub>
<sub class="A"><subsub3/></sub>
</root>
]]
local document = xmlua.XML.parse(xml)
-- Searches all <sub class="A"> elements
local class_a_subs = document:search("//sub[@class='A']")
-- Searches all elements under <sub class="A"> elements
local subsubs_in_class_a = class_a_subs:search("*")
print(#subsubs_in_class_a) -- -> 2
-- It's /root/sub[@class="A"]/subsub1
print(subsubs_in_class_a[1]:to_xml())
-- <subsub1/>
-- It's /root/sub[@class="A"]/subsub3
print(subsubs_in_class_a[2]:to_xml())
-- <subsub3/>
The search
method is xmlua.NodeSet:search
. It's not xmlua.Searchable:search
method.
You can also use xmlua.Searchable:search
method against element.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub>text1</sub>
<sub>text2</sub>
<sub>text3</sub>
</root>
]]
local document = xmlua.XML.parse(xml)
-- Searches the <root> element
local roots = document:search("/root")
local root = roots[1]
-- Searches all <sub> elements under the <root> element
local all_subs = root:search("sub")
-- You can use "#" for getting the number of matched nodes
print(#all_subs) -- -> 3
-- You can access the N-th node by "[]".
print(all_subs[1]:to_xml()) -- -> <sub1>text1</sub1>
print(all_subs[2]:to_xml()) -- -> <sub2>text2</sub2>
print(all_subs[3]:to_xml()) -- -> <sub3>text3</sub3>
search
may fail. For example. it fails with invalid XPath. If it fails, it raises an error.
If you need to assume that XPath may be invalid, you need to handle error with pcall
.
Example:
local xmlua = require("xmlua")
local xml = "<root/>"
local document = xmlua.XML.parse(xml)
local invalid_xpath = "..."
local success, node_set = pcall(function()
return document:search(invalid_xpath)
end)
if success then
print("Succeeded to search by XPath")
else
-- If pcall returns not success, the second return value is error
-- message not node set.
local message = node_set
print("Failed to search by XPath: " .. message)
-- -> Failed to search by XPath: ./xmlua/searchable.lua:57: Invalid expression
end
You need to get xmlua.Element
object to get attribute value in element.
Normally, you can get xmlua.Element
by xmlua.Document:root
or xmlua.Searchable:search
.
xmlua.Document:root
returns the root element of the document.
xmlua.Searchable:search
returns elements as xmlua.NodeSet
. You can get the first element by node_set[1]
.
Example:
local xmlua = require("xmlua")
local document = xmlua.XML.parse("<root class='A'/>")
-- Gets the root element: <root>
local root = document:root()
-- Gets the root element: <root>
local root = document:search("/root")[1]
You can get attribute value by one of the followings:
element.attribute_name
: Dot syntax.
element["attribute_name"]
: []
syntax.
element:get_attribute("attribute_name")
: Method syntax.
Normally, dot syntax and []
syntax are recommended.
If attribute name is valid Lua identifier such as "class
", dot syntax is recommended.
If attribute name isn't valid Lua identifier such as "xml:lang
", []
syntax is recommended.
If attribute name conflicts with method names available in xmlua.Element
such as search
and parent
, method syntax is recommended.
Example:
local xmlua = require("xmlua")
local document = xmlua.XML.parse("<root class='A'/>")
local root = document:root()
-- Uses dot syntax to get attribute value
print(root.class)
-- -> A
-- Uses [] syntax to get attribute value
print(root["class"])
-- -> A
-- Uses get_attribute method to get attribute value
print(root:get_attribute("class"))
-- -> A
If the specified attribute doesn't exist, nil
is returned.
Example:
local xmlua = require("xmlua")
local document = xmlua.XML.parse("<root class='A'/>")
local root = document:root()
-- Returns nil for nonexistent attribute name
print(root.nonexistent)
-- -> nil
If the attribute what you find has namespace, you can use "namespace-prefix:local-name"
syntax attribute name such as "xml:lang"
. In this case, you can't use dot syntax because ":"
is included in attribute name.
If the specified namespace prefix doesn't exist, the whole specified attribute name is treated as attribute name instead of namespace URI and local name.
Example:
local xmlua = require("xmlua")
local xml = [[
<root xmlns:example="http://example.com/"
example:attribute="value-example"
attribute="value"
nonexistent-namespace:attribute="value-nonexistent-namespace"/>
]]
local document = xmlua.XML.parse(xml)
local root = document:root()
-- With namespace prefix
print(root["example:attribute"])
-- -> value-example
-- Without namespace prefix
print(root["attribute"])
-- -> value
-- With nonexistent namespace prefix
print(root["nonexistent-namespace:attribute"])
-- -> value-nonexistent-namespace
TODO.
See xmlua.Element:text
and xmlua.NodeSet:text
for now.
You can get the sibling elements by xmlua.Element:previous
and xmlua.Element:next
.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub1/>
<sub2/>
<sub3/>
</root>
]]
local document = xmlua.XML.parse(xml)
local sub2 = document:search("/root/sub2")[1]
-- Gets the previous sibling element of <sub2>
print(sub2:previous():to_xml())
-- <sub1/>
-- Gets the next sibling element of <sub2>
print(sub2:next():to_xml())
-- <sub3/>
If there are no sibling elements, they returns nil
.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub1/>
<sub2/>
<sub3/>
</root>
]]
local document = xmlua.XML.parse(xml)
local sub1 = document:search("/root/sub1")[1]
local sub3 = document:search("/root/sub3")[1]
-- Gets the previous sibling element of <sub1>
print(sub1:previous())
-- nil
-- Gets the next sibling element of <sub3>
print(sub3:next():to_xml())
-- nil
You can get the parent element by xmlua.Element:parent
.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub1/>
<sub2/>
<sub3/>
</root>
]]
local document = xmlua.XML.parse(xml)
local sub2 = document:search("/root/sub2")[1]
-- Gets the parent element of <sub2>
print(sub2:parent():to_xml())
-- <root>
-- <sub1/>
-- <sub2/>
-- <sub3/>
-- </root>
If the element is the root element, it returns xmlua.Document
.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub1/>
<sub2/>
<sub3/>
</root>
]]
local document = xmlua.XML.parse(xml)
local root = document:root()
-- Gets the parent of <root>: xmlua.Document
print(root:parent():to_xml())
-- <?xml version="1.0" encoding="UTF-8"?>
-- <root>
-- <sub1/>
-- <sub2/>
-- <sub3/>
-- </root>
xmlua.Document:parent
also exists. It always returns nil
. It just exist for consistent API.
Example:
local xmlua = require("xmlua")
local xml = [[
<root>
<sub1/>
<sub2/>
<sub3/>
</root>
]]
local document = xmlua.XML.parse(xml)
-- Gets the parent of document
print(document:parent())
-- nil
You can serialize document and element as HTML and XML.
xmlua.Serializable:to_html
serializes target as HTML.
Example:
local xmlua = require("xmlua")
local document = xmlua.HTML.parse([[
<html>
<head>
<title>Hello</title>
</head>
<body>World</body>
</html>
]])
-- Serializes as HTML
print(document:to_html())
-- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
-- <html>
-- <head>
-- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-- <title>Hello</title>
-- </head>
-- <body>World</body>
-- </html>
-- Serializes <body> element as HTML
print(document:search("/html/body")[1]:to_html())
-- <body>World</body>
xmlua.Serializable:to_xml
serializes target as XML.
Example:
local xmlua = require("xmlua")
local document = xmlua.XML.parse([[
<root>
<sub1>text1</sub1>
<sub2>text2</sub2>
<sub3>text3</sub3>
</root>
]])
-- Serializes as XML
print(document:to_xml())
-- <?xml version="1.0" encoding="UTF-8"?>
-- <root>
-- <sub1>text1</sub1>
-- <sub2>text2</sub2>
-- <sub3>text3</sub3>
-- </root>
-- Serializes <body> element as XML
print(document:search("/root/sub1")[1]:to_xml())
-- <sub1>text1</sub1>
xmlua.NodeSet
also has to_html
and to_xml
. They just concatenate serialized strings of nodes in the node set. They are useful just for debug.
Example:
local document = xmlua.HTML.parse([[
<html>
<head>
<title>Hello</title>
</head>
<body>World</body>
</html>
]])
-- All elements under <html> (<head> and <body>)
local node_set = document:search("/html/*")
-- Serializes all elements as HTML and concatenates serialized HTML
print(node_set:to_html())
-- <head>
-- <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">
-- <title>Hello</title>
-- </head><body>World</body>
-- FYI: <head> serialization
print(node_set[1]:to_html())
-- <head>
-- <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">
-- <title>Hello</title>
-- </head>
-- FYI: <body> serialization
print(node_set[2]:to_html())
-- <body>World</body>
local xmlua = require("xmlua")
local document = xmlua.XML.parse([[
<root>
<sub1>text1</sub1>
<sub2>text2</sub2>
<sub3>text3</sub3>
</root>
]])
-- All elements under <root> (<sub1>, <sub2> and <sub3>)
local node_set = document:search("/root/*")
-- Serializes all elements as XML and concatenates serialized XML
print(node_set:to_xml())
-- <sub1>text1</sub1><sub2>text2</sub2><sub3>text3</sub3>
-- FYI: <sub1> serialization
print(node_set[1]:to_xml())
-- <sub1>text1</sub1>
-- FYI: <sub2> serialization
print(node_set[2]:to_xml())
-- <sub2>text2</sub2>
-- FYI: <sub3> serialization
print(node_set[3]:to_xml())
-- <sub3>text3</sub3>
You can use XMLua with multiple threads. But you need more codes to do it.
You must call xmlua.init
in the main thread before you create any threads that use XMLua.
Example:
local xmlua = require("xmlua")
-- You must call xmlua.init in main thread before you create threads
-- when you use XMLua with multiple threads.
xmlua.init()
local thread = require("cqueues.thread")
-- ...
You can call xmlua.cleanup
in the main thread after you finish all threads that use XMLua and you finish all MXLua use.
Example:
local xmlua = require("xmlua")
xmlua.init()
local thread = require("cqueues.thread")
-- ...
-- You can call xmlua.cleanup in main thread to free all resources
-- used by XMLua. You must ensure that all threads are finished and
-- all XMLua related objects aren't used anymore.
xmlua.cleanup()
os.exit()
If you work on GNU/Linux and your LuaJIT isn't linked with pthread, require("xmlua")
isn't thread safe. Here are solutions:
Link libpthread.so.0
by LD_PRELOAD
environment variable.
Don't execute require("xmlua")
at the same time.
Here are example command lines to use LD_PRELOAD
.
Debian GNU/Linux and Ubuntu:
% LD_PRELOAD=/lib/x86_64-linux-gnu/libpthread.so.o luajit XXX.lua
CentOS:
% LD_PRELOAD=/lib64/libpthread.so.o luajit XXX.lua
If you use cqueues, you can use connection returned by cqueues.thread.start
to prevent executing require("xmlua")
at the same time.
Example:
local xmlua = require("xmlua")
xmlua.init()
local thread = require("cqueues.thread")
local n = 10
local workers = {}
local connections = {}
for i = 1, n do
local worker, connection = thread.start(function(connection)
-- require("xmlua") isn't thread safe
local xmlua = require("xmlua")
-- Notify that require("xmlua") is finished
connection:write("ready\n")
for job in connection:lines("*l") do
-- local html = ...
-- local document = xmlua.HTML.parse(html)
-- document:search("...")
end
end)
-- Wait until require("xmlua") is finished
connection:read("*l")
table.insert(workers, worker)
table.insert(connections, connection)
end
for _, connection in ipairs(connections) do
-- Put jobs to workers
connection:write("Job1\n")
connection:write("Job2\n")
-- ...
end
for _, connection in ipairs(connections) do
-- Finish providing jobs
connection:close()
end
for _, worker in ipairs(workers) do
worker:join()
end
xmlua.cleanup()
Now, you knew all major XMLua features! If you want to understand each feature, see reference manual for each feature.