Difference between XML and HTML?

anonymous

2008-03-06 04:00:05 UTC

The Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages.[1] It is classified as an extensible language because it allows its users to define their own elements. Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly via the Internet,[2] and it is used both to encode documents and to serialize data. In the latter context, it is comparable with other text-based serialization languages such as JSON and YAML.[3]
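To make the serialization point concrete, here is a minimal sketch that writes the same small record as XML and as JSON using only the Python standard library. The record, the element names and the helper functions `to_xml` and `from_xml` are made up for illustration; they are not part of any standard.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"title": "Example Title", "year": "2008"}


def to_xml(data: dict, root_name: str = "book") -> str:
    """Serialize a flat dict as XML, one child element per key."""
    root = ET.Element(root_name)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Read the flat XML produced by to_xml back into a dict."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


xml_text = to_xml(record)       # custom elements: <book>, <title>, <year>
json_text = json.dumps(record)  # the same data as JSON
```

Both forms round-trip back to the same dictionary; the XML version simply uses element names that the author chose, which is what "extensible" refers to.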

It started as a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible. By adding semantic constraints, application languages can be implemented in XML. These include XHTML,[4] RSS, MathML, GraphML, Scalable Vector Graphics, MusicXML, and thousands of others. Moreover, XML is sometimes used as the specification language for such application languages.

XML is recommended by the World Wide Web Consortium. It is a fee-free open standard. The W3C recommendation specifies both the lexical grammar and the requirements for parsing.

HTML, an initialism of HyperText Markup Language, is the predominant markup language for web pages. It provides a means to describe the structure of text-based information in a document — by denoting certain text as links, headings, paragraphs, lists, and so on — and to supplement that text with interactive forms, embedded images, and other objects. HTML is written in the form of tags, surrounded by angle brackets. HTML can also describe, to some degree, the appearance and semantics of a document, and can include embedded scripting language code (such as JavaScript) which can affect the behavior of web browsers and other HTML processors.

HTML is also often used to refer to content of the MIME type text/html or even more broadly as a generic term for HTML whether in its XML-descended form (such as XHTML 1.0 and later) or its form descended directly from SGML (such as HTML 4.01 and earlier).

By convention, HTML files use the file extension .html or .htm.

This may help you see why you would use one of the two languages rather than the other:

differences:

Although HTML and XHTML appear to have similarities in their syntax, they are significantly different in many ways.

MIME Types

* XHTML must be served with an XML MIME type, such as application/xml or application/xhtml+xml.

* HTML must be served as text/html.

It is the MIME type that determines what type of document you are using. If you attempt to send XHTML as text/html, you are actually just using HTML, possibly with syntax errors.

Technically, according to the spec, XHTML 1.0 is allowed to be served as text/html. But, due to the above reason, such a document is considered to be an HTML document, not an XHTML document.
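As a rough sketch of the rule above, the function below decides which parser a client would choose from the Content-Type header alone. The function name `parser_for` and the return labels are hypothetical, and real browsers apply more rules than this.

```python
# MIME types that select XML parsing. text/xml is included as another
# common XML type in addition to the two named above.
XML_MIME_TYPES = {"application/xml", "application/xhtml+xml", "text/xml"}


def parser_for(content_type: str) -> str:
    """Return "xml", "html" or "other" for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in XML_MIME_TYPES or mime.endswith("+xml"):
        return "xml"
    if mime == "text/html":
        return "html"
    return "other"
```

Note that the markup itself is never inspected: the same bytes are treated as XHTML under application/xhtml+xml and as HTML under text/html.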

Parsing

XHTML uses XML parsing requirements. HTML uses its own parsing requirements, which are defined to match much more closely the way browsers actually handle HTML today.

* In XHTML, well-formedness errors are fatal. In HTML, error handling rules are much more graceful. Well-formedness errors, which are also syntax errors in HTML, include the following:

o Unencoded ampersands (& instead of &amp;), and less than signs (< instead of &lt;) (This does not apply to CDATA). (Note: in HTML, an unencoded ampersand is allowed in some cases.)

o Comments containing extra pairs of hyphens or ending with a hyphen, e.g. <!-- a -- b --> or <!-- a --->.

o Mismatched end tags (does not apply to elements with optional tags)

o Unclosed tags.

o Unexpected characters occurring in or before attribute names.

o Unexpected occurrence of EOF.

o Unexpected characters before the DOCTYPE name.

o Missing DOCTYPE name.

o A PUBLIC identifier in a DOCTYPE without a SYSTEM identifier (Note: including either of these is a syntax error in HTML5; but, in XML only the SYSTEM identifier is allowed to occur on its own).

o End tags with attributes.

o Unexpected end tags (in HTML, an unexpected </p> or </br> can cause the start tag to be implied before it).

* The internal subset is permitted in XML, but meaningless (and forbidden) in HTML.

o In some cases, an internal subset in HTML would end up being partly rendered inline.

* The sequence of characters "]]>" in content when it does not mark the end of a CDATA section is a well-formedness error in XHTML, but valid in HTML.

* In XHTML, <![CDATA[...]]> is a CDATA section. In HTML, it's a bogus comment.

* In XHTML, <?...?> is a processing instruction. In HTML, it's a bogus comment.

* In HTML, the trailing slash used for the empty element syntax is a parse error for non-void elements (see below), but is ignored in all cases.

* In HTML, the script and style elements are parsed as CDATA. (Note: the definition of CDATA differs from that in XML). In XML, they're parsed as normal elements (which means that comments are treated as real comments, and things that look like start tags actually are start tags).

* In HTML, the title and textarea elements are parsed as RCDATA. (Note: The definition of RCDATA differs from that in SGML and there is no RCDATA in XML).

* In HTML, if scripting is enabled, the noscript element is parsed as CDATA. If scripting is disabled, it's parsed as PCDATA. In XHTML, the element has no effect, and can't really be used to stop content from being present when script is disabled.

* In HTML, the iframe, noembed and noframes elements are parsed as CDATA. In XHTML, they are parsed as normal elements, and therefore do not stop content from being used.

* White space characters in attribute values are normalized to spaces in XHTML.

* In HTML, elements with optional tags are implied in certain conditions.

* In HTML, title elements with tags occurring in the body are moved into the head. In XHTML, they stay where they were specified.

* In HTML, tags for certain elements, which appear out of context, are ignored. This includes caption, col, colgroup, frame, frameset, head, option, optgroup, tbody, td, tfoot, th, thead, tr.

* The plaintext element has a special parsing requirement in HTML. (It is, however, forbidden.)

* HTML also has many other special-case rules for edge cases and error conditions, not all of which are listed here.
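The difference between fatal and graceful error handling can be seen with a small experiment. This is only a sketch: it uses Python's xml.etree.ElementTree as the XML parser and html.parser as a stand-in for a lenient HTML parser. html.parser is a tolerant tokenizer, not an implementation of the browser parsing algorithm, and the sample markup and helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical markup with an unencoded ampersand and unclosed <p> tags.
BROKEN = "<p>Fish & chips<p>Second paragraph"


def xml_accepts(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Well-formedness errors are fatal: no document is produced.
        return False


class TextCollector(HTMLParser):
    """Collects the text content seen by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(markup: str) -> str:
    """Return all text the HTML tokenizer recovered from the markup."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)
```

Here `xml_accepts(BROKEN)` returns False, while `html_text(BROKEN)` still recovers the text of both paragraphs.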

Syntax

* In HTML, the doctype is required. In XHTML, it is optional.

* In XHTML, tag names and attribute names are case sensitive. In HTML, they are case insensitive.

* In XHTML, non-empty elements require both a start and an end tag. In HTML, certain elements allow the omission of either or both:

o html (both)

o head (both)

o body (both)

o li (end tag)

o dt (end tag)

o dd (end tag)

o p (end tag)

o colgroup (both)

o thead (end tag)

o tbody (both)

o tfoot (end tag)

o tr (end tag)

o td (end tag)

o th (end tag)

* In XHTML, empty elements may use either the empty element syntax (<br/>) or have an end tag immediately follow the start tag (<br></br>). In HTML, the empty element syntax (trailing slash) is allowed on void elements, but forbidden on other elements. However, it serves no purpose whatsoever and can be omitted. End tags for void elements are forbidden.

o base, link, meta, hr, br, img, embed, param, area, col and input

o Note: the following are treated as void elements for the purposes of the parsing requirements, but, as they are obsolete and non-standard, the trailing slash is not permitted: basefont, bgsound, spacer, wbr (although, since these elements are not permitted anyway, it doesn't make much difference).

* HTML allows attribute minimisation (i.e. omitting the value); XHTML does not.

* HTML allows the use of unquoted attribute values; XHTML does not.

* XHTML allows the use of CDATA sections; HTML does not.

* XHTML allows the use of processing instructions; HTML does not.

* In HTML, all entity references are predefined and do not require a DTD. But because there is no DTD for XHTML5, entity references cannot be used in XHTML (excluding the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;).

o You may provide your own DTD for use with your own validating parser, but be aware that browsers do not use validating parsers and will not read the DTD.

* The set of valid Unicode characters in XML 1.0 is more restricted than that in HTML.

* Namespace prefixes are permitted in XHTML. They are forbidden in HTML.
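Several of the syntax points above (case sensitivity, attribute minimisation, unquoted attribute values) can be checked with a short script. As before, this is a sketch that assumes Python's html.parser as a lenient HTML tokenizer and xml.etree.ElementTree as the XML parser; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class StartTagLogger(HTMLParser):
    """Records every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def html_start_tags(markup: str) -> list:
    """Return the (tag, attributes) pairs the HTML tokenizer reports."""
    parser = StartTagLogger()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Upper-case names, an unquoted value and a minimised attribute.
html_style = "<INPUT TYPE=checkbox CHECKED>"
# The XML-compatible spelling of the same element.
xml_style = '<input type="checkbox" checked="checked"/>'
```

The HTML tokenizer lower-cases the names and reports the minimised attribute with no value, whereas the XML parser rejects `html_style` outright and also rejects `<P>text</p>` because the tag names differ in case.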

Markup

* The namespace declaration (xmlns attribute) is required in XHTML. The xmlns attribute is also allowed to appear on the html element in HTML on the condition that it has the value "http://www.w3.org/1999/xhtml".

o <html xmlns="http://www.w3.org/1999/xhtml">

o In HTML, the xmlns attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in the null namespace.

o In XML (with an XML Namespaces-aware parser), an xmlns attribute is part of the namespace declaration mechanism, and an element cannot actually have an xmlns attribute in the null namespace. In DOM implementations, the attribute ends up in the "http://www.w3.org/2000/xmlns/" namespace.

* XHTML allows non-XHTML elements and attributes (in different namespaces) to be used; HTML does not.

* XHTML uses the xml:lang attribute; HTML uses lang instead.

* XML ID introduces xml:id, wh
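The different treatment of the xmlns attribute can be sketched as follows, again assuming Python's xml.etree.ElementTree as a namespace-aware XML parser and html.parser as a stand-in for an HTML parser (it has no notion of namespaces at all, so it only shows that the attribute is treated as ordinary data). The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"
DOC = f'<html xmlns="{XHTML_NS}"><body></body></html>'


def xml_root(markup: str) -> ET.Element:
    """Parse the markup as XML and return the root element."""
    return ET.fromstring(markup)


class FirstTagAttrs(HTMLParser):
    """Remembers the attributes of the first start tag only."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = attrs


def html_root_attrs(markup: str) -> list:
    """Return the attribute list the HTML tokenizer sees on the root tag."""
    parser = FirstTagAttrs()
    parser.feed(markup)
    parser.close()
    return parser.attrs
```

With the XML parser the declaration becomes part of the element's name (`{http://www.w3.org/1999/xhtml}html`) and does not appear among its attributes; with the HTML tokenizer it is just an attribute named xmlns.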

David D

2008-03-06 03:58:58 UTC

HTML is a language to describe the semantics and structure of text documents, along with their relationships to other documents.

HTML is an application of SGML, which is a general language for marking up text (i.e. SGML describes how tags and attributes work; HTML describes what a particular set of tags and attributes mean).

XML is a simplified version of SGML.

The only reason to use XML directly is if you need to design a new markup language, which you would do if one doesn't exist to describe the information you need to describe.

e.g. SVG is an XML application for describing images, while FOAF describes people, and Atom describes collections of articles (with information like publication date).

anonymous

2008-03-06 04:04:12 UTC

XML is not a replacement for HTML.

XML and HTML were designed with different goals:

XML was designed to transport and store data, with focus on what data is.

HTML was designed to display data, with focus on how data looks.

HTML is about displaying information, while XML is about carrying information.

HTML is used to display the data in a formatted way. You can apply styles and use different layouts to display the data in an HTML file. The data that is displayed in an HTML file could come from an XML file.

To put it in simple words, HTML displays the data and XML holds the data!
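Here is a small sketch of that division of labour: an XML document holds the data and a few lines of Python turn it into HTML for display. The catalog format, the element names and the function `render_html` are invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data file: XML says what the data is, not how it looks.
CATALOG_XML = """
<catalog>
  <item><name>Green tea</name><price>4.50</price></item>
  <item><name>Salt &amp; pepper</name><price>2.00</price></item>
</catalog>
"""


def render_html(xml_text: str) -> str:
    """Turn the catalog XML into an HTML list for display."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        # Escape the text so characters such as & stay valid in HTML.
        name = escape(item.findtext("name", default=""))
        price = escape(item.findtext("price", default=""))
        rows.append(f"<li>{name}: {price}</li>")
    return "<ul>" + "".join(rows) + "</ul>"


page_fragment = render_html(CATALOG_XML)
```

The same XML could just as easily be rendered as a table or fed to another program, which is the sense in which XML carries information and HTML displays it.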

matthiasderuyver

2008-03-06 04:10:34 UTC

One starts with X, and another starts with HT.

I think HTML is used most.

icpooreman

2008-03-06 04:02:43 UTC

http://www.youtube.com/watch?v=6gmP4nk0EOE

haha video gets me pumped glayvin.

Kamren Z

2008-03-06 08:44:53 UTC

After reading the above posts your head might be swimming so here is how I explain it.

There are two different foundations for writing web pages: HTML and XHTML. HTML is the way we have been writing pages for years; XHTML is the newcomer.

Push XML and HTML together and you get XHTML.

As you pointed out, XHTML is more complex than HTML, so why use it? Think of XHTML as the business person who goes to work in a suit and tie, while HTML is the local surfer dude who is working at the beach. Each person has their pros and cons.

The business person is strict. All of his code makes sense, it is orderly, it is on the cutting edge (well actually just the new standard for web development).

The surfer dude is laid back. He can write sloppy code and get the job done.

If you are just trying to get a website coded use HTML. If you want to follow the new industry standards use XHTML.

For the most part XHTML doesn't add a lot of extra work. As a designer you have to close tags, so don't use <br>; use <br /> instead. That is a simple example.

After you have decided which language to use you pick the appropriate DOCTYPE to tell the browser what language you are using.

...

Why not take a class with us?

http://developintelligence.com/catalog/html-xhtml-training.php

Kamren Z

Software Technologist @ DevelopIntelligence

http://developintelligence.com