Related: How can I pretty-print JSON in (unix) shell script?
Is there a (unix) shell script to format XML in human-readable form?
Basically, I want it to transform the following:
<root><foo a="b">lorem</foo><bar value="ipsum" /></root>
... into something like this:
<root>
<foo a="b">lorem</foo>
<bar value="ipsum" />
</root>
libxml2-utils
This utility comes with libxml2-utils:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
xmllint --format -
Perl's XML::Twig
This command comes with the XML::Twig Perl module, sometimes packaged as xml-twig-tools:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
xml_pp
xmlstarlet
This command comes with xmlstarlet:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
xmlstarlet format --indent-tab
tidy
Check the tidy package:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
tidy -xml -i -
Python
Python's xml.dom.minidom can format XML (both python2 and python3):
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
python -c 'import sys;import xml.dom.minidom;s=sys.stdin.read();print(xml.dom.minidom.parseString(s).toprettyxml())'
saxon-lint
You need saxon-lint:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
saxon-lint --indent --xpath '/' -
saxon-HE
You need saxon-HE:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
java -cp /usr/share/java/saxon/saxon9he.jar net.sf.saxon.Query \
-s:- -qs:/ '!indent=yes'
Good, quick answer. The first option seems like it'll be more ubiquitous on modern *nix installs. A minor point, but can it be called without working through an intermediate file? I.e.,
echo '<xml .. />' | xmllint --some-read-from-stdin-option
?
The package is libxml2-utils in my beautiful Ubuntu.
Note that "cat data.xml | xmllint --format - | tee data.xml" does not work, because tee can truncate data.xml before cat has finished reading it. On my system it sometimes worked for small files, but always truncated huge files. If you really want to do anything in place, read backreference.org/2011/01/29/in-place-editing-of-files
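For what it's worth, the usual workaround can be sketched as a small shell function: send the formatter's output to a temporary file and replace the original only if the formatter succeeded. The name rewrite_in_place is made up for this example, and it assumes the command you pass reads stdin and writes stdout (as xmllint --format - does).

```shell
# Hypothetical helper: rewrite_in_place FILE COMMAND [ARGS...]
# Runs COMMAND with FILE on stdin and replaces FILE with its stdout,
# but only when COMMAND exits successfully.
rewrite_in_place() {
    file=$1
    shift
    # Create the temp file next to the original so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if "$@" < "$file" > "$tmp"; then
        # Caveat: the result carries mktemp's restrictive permissions;
        # chmod afterwards if that matters.
        mv -- "$tmp" "$file"
    else
        # Formatter failed: leave the original untouched.
        rm -f -- "$tmp"
        return 1
    fi
}

# Usage (assuming xmllint is installed):
#   rewrite_in_place data.xml xmllint --format -
```

Because the original is only replaced after the formatter has finished, a parse error or an interrupted run leaves data.xml as it was.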
To solve
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 805: ordinal not in range(128)
in the Python version, define PYTHONIOENCODING="UTF-8":
cat some.xml | PYTHONIOENCODING="UTF-8" python -c 'import sys;import xml.dom.minidom;s=sys.stdin.read();print(xml.dom.minidom.parseString(s).toprettyxml())' > pretty.xml
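An alternative sketch that sidesteps the stdin/stdout text encoding altogether, assuming python3 is on the PATH: read and write raw bytes and let the XML parser work out the encoding from the document itself. pretty_xml_bytes is just a name made up here, not something the module provides.

```shell
# Hypothetical helper: pretty-print XML from stdin to stdout as UTF-8 bytes.
pretty_xml_bytes() {
    python3 -c '
import sys
import xml.dom.minidom

# Parse from the binary stdin stream so the XML parser, rather than the
# locale or PYTHONIOENCODING, decides how to decode the input.
doc = xml.dom.minidom.parse(sys.stdin.buffer)
# With encoding= set, toprettyxml() returns bytes; write them unmodified.
sys.stdout.buffer.write(doc.toprettyxml(indent="  ", encoding="utf-8"))
'
}

# Usage:
#   pretty_xml_bytes < some.xml > pretty.xml
```

This is Python 3 only; on python2 the PYTHONIOENCODING approach above is the one to use.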
Note that tidy can also format XML that has no single root element. This is useful for formatting XML fragments (e.g. extracted from logs) through a pipe.
echo '<x></x><y></y>' | tidy -xml -iq