Chilkat Software

HTML to XML Conversion Sample #1

This is the first of several examples describing the details of how the Chilkat HTML-to-XML library converts HTML into well-formed XML.

We'll begin with the following HTML and then describe the features of the generated XML:

<html>
<head>
<title>This is a test</title>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
<body>
<h1>This is the heading</h1>
<p>Lorem ipsum dolor sit amet, <b>consectetur</b> adipisicing elit, sed do eiusmod tempor incididunt ut labore <br> et dolore magna aliqua.
<p>Ut enim ad minim veniam, <a href="http://www.google.com/">quis nostrud exercitation</a> ullamco laboris nisi ut aliquip ex ea commodo consequat.
</body>
</html>

The XML output is shown below, after three points worth noting about it:

  • The XML is written in the same encoding as the HTML. The HTML above declares charset=windows-1252, so the encoding attribute of the XML declaration is set to windows-1252.
  • The root node of the XML document is always <root>, and the <html> node is found directly beneath it. The <root> wrapper exists because poorly formed HTML may contain more than one root-level node, whereas well-formed XML allows only one.
  • All text content is placed under <text> nodes.
<?xml version="1.0" encoding="windows-1252" ?>

<root>
    <html>
        <head>
            <title>
                <text>This is a test</text>
            </title>
            <meta http-equiv="Content-Language" content="en-us"></meta>
            <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></meta>
        </head>
        <body>
            <h1>
                <text>This is the heading</text>
            </h1>
            <p>
                <text>Lorem ipsum dolor sit amet,  consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore  et dolore magna aliqua.
                </text>
            </p>
            <p>
                <text>Ut enim ad minim veniam, </text>
                <a href="http://www.google.com/">
                    <text>quis nostrud exercitation</text>
                </a>
                <text>ullamco laboris nisi ut aliquip ex ea commodo consequat.
                </text>
            </p>
        </body>
    </html>
</root>
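
Because the output is well-formed XML, any XML parser can consume it. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, with a trimmed copy of the output above embedded as a string. The helper names text_nodes and links are illustrative only and are not part of the Chilkat API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML output shown above (assumption: embedded as a
# string here so the example is self-contained).
XML_OUTPUT = """<?xml version="1.0" encoding="windows-1252" ?>
<root>
    <html>
        <head>
            <title>
                <text>This is a test</text>
            </title>
        </head>
        <body>
            <h1>
                <text>This is the heading</text>
            </h1>
            <p>
                <text>Ut enim ad minim veniam, </text>
                <a href="http://www.google.com/">
                    <text>quis nostrud exercitation</text>
                </a>
            </p>
        </body>
    </html>
</root>"""


def text_nodes(doc):
    """Return the stripped content of every <text> node, in document order."""
    return [node.text.strip() for node in doc.iter("text") if node.text]


def links(doc):
    """Return (href, link text) pairs for every <a> element."""
    return [(a.get("href"), (a.findtext("text") or "").strip())
            for a in doc.iter("a")]


doc = ET.fromstring(XML_OUTPUT)
print(doc.tag)                       # root
print([child.tag for child in doc])  # ['html']
print(text_nodes(doc))
print(links(doc))
```

Since every piece of character data sits in its own <text> node, collecting the document's text is a single iteration over elements named text, with no need to handle text mixed between child elements.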

(The Chilkat HTML-to-XML API is offered across many programming languages: Ruby, Perl, Python, Java, C#, VB.NET, etc.)
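
To experiment with the three conventions described earlier (encoding taken from the HTML charset, a <root> wrapper, and <text> nodes for character data) without the library, they can be imitated with Python's standard html.parser module. This is a toy sketch, not Chilkat's implementation: the class ToyHtmlToXml, the function convert, the VOID_TAGS set, the rule that a new <p> closes an open one, and the utf-8 fallback are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumption: a small set of tags treated as having no content.
VOID_TAGS = {"meta", "br", "img", "hr", "input", "link"}
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


class ToyHtmlToXml(HTMLParser):
    """Hypothetical converter imitating the conventions described above."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # XML fragments, in output order
        self.open_tags = []  # stack of elements not yet closed

    def handle_starttag(self, tag, attrs):
        # Assumption: a new <p> implicitly closes a still-open <p>.
        if tag == "p" and "p" in self.open_tags:
            self.handle_endtag("p")
        attr_text = "".join(
            f" {name}={quoteattr(value or '')}" for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.parts.append(f"<{tag}{attr_text}></{tag}>")
        else:
            self.parts.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close down to the matching tag.
        if tag not in self.open_tags:
            return
        while True:
            top = self.open_tags.pop()
            self.parts.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # Whitespace-only runs are dropped; other text goes in a <text> node.
        if data.strip():
            self.parts.append(f"<text>{escape(data)}</text>")

    def result(self, charset):
        # Close anything the HTML left open, then wrap everything in <root>.
        while self.open_tags:
            self.parts.append(f"</{self.open_tags.pop()}>")
        body = "".join(self.parts)
        return f'<?xml version="1.0" encoding="{charset}" ?>\n<root>{body}</root>'


def convert(html):
    """Convert an HTML string to XML text using the toy rules above."""
    match = CHARSET_RE.search(html)
    charset = match.group(1) if match else "utf-8"  # assumed fallback
    parser = ToyHtmlToXml()
    parser.feed(html)
    parser.close()
    return parser.result(charset)


print(convert("<p>first<p>second"))
```

The last line feeds in HTML with two root-level <p> elements and no end tags. The <root> wrapper is what lets the result remain a single-rooted XML document.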


Copyright 2000-2017 Chilkat Software, Inc. All rights reserved.