Unicode is a computing industry standard that allows computers to consistently represent and manipulate text expressed in most of the world's writing systems. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and a set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as rules for normalization, decomposition, collation, rendering and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic or Hebrew, and left-to-right scripts).
The Unicode Consortium, the non-profit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with multilingual environments.
Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language, the Microsoft .NET Framework and modern operating systems.
Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters), the now-obsolete UCS-2 (which uses 2 bytes for all characters, but does not include every character in the Unicode standard), and UTF-16 (which extends UCS-2, using 4 bytes to encode characters missing from UCS-2).
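As a rough illustration of those size differences, the short Python sketch below (the characters are arbitrary examples) encodes a few characters with the standard library and prints their byte counts in UTF-8 and UTF-16:

```python
# Byte counts for the same characters under UTF-8 and UTF-16.
# The characters below are arbitrary picks: ASCII, Latin-1 range, BMP, non-BMP.
for ch in ["A", "é", "€", "𝄞"]:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")       # big-endian, no byte-order mark
    print(f"U+{ord(ch):04X} {ch}: UTF-8 {len(utf8)} byte(s), UTF-16 {len(utf16)} byte(s)")

# U+0041 A: UTF-8 1 byte(s), UTF-16 2 byte(s)
# U+00E9 é: UTF-8 2 byte(s), UTF-16 2 byte(s)
# U+20AC €: UTF-8 3 byte(s), UTF-16 2 byte(s)
# U+1D11E 𝄞: UTF-8 4 byte(s), UTF-16 4 byte(s)
```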
What Unicode Is Used For:
Unicode has become the dominant scheme for the internal processing of text, and sometimes for its storage (though a great deal of text is still stored in legacy encodings). Early adopters tended to use UCS-2 and later moved to UTF-16, as this was the least disruptive way to add support for non-BMP characters. The best-known such system is Windows NT (and its descendants, Windows 2000, Windows XP and Windows Vista), which uses Unicode as the sole internal character encoding. The Java and .NET bytecode environments, Mac OS X, and KDE also use it for internal representation.
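The difference between UCS-2 and UTF-16 only shows up for characters outside the Basic Multilingual Plane, which UTF-16 represents as a pair of surrogate code units. A minimal Python sketch (the musical clef is just an arbitrary non-BMP example):

```python
# UCS-2 covers only the BMP (code points up to U+FFFF); UTF-16 encodes
# characters beyond it as a surrogate pair of two 16-bit code units.
ch = "𝄞"                                  # U+1D11E, MUSICAL SYMBOL G CLEF (non-BMP)
units = ch.encode("utf-16-be")
high = int.from_bytes(units[:2], "big")
low = int.from_bytes(units[2:], "big")
print(f"U+{ord(ch):X} -> surrogate pair {high:#06x}, {low:#06x}")
# U+1D11E -> surrogate pair 0xd834, 0xdd1e
```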
UTF-8 (originally developed for Plan 9) has become the main storage encoding on most Unix-like operating systems (though others are also used by some libraries) because it is a relatively easy replacement for traditional extended ASCII character sets.
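That backward compatibility is easy to check: any text consisting only of ASCII characters yields identical bytes in ASCII and UTF-8 (a trivial sketch):

```python
# Pure ASCII text produces exactly the same bytes under ASCII and UTF-8,
# which is what makes UTF-8 a drop-in replacement in ASCII-based systems.
text = "plain ASCII text"
assert text.encode("ascii") == text.encode("utf-8")
```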
Multilingual text-rendering engines which use Unicode include Uniscribe for Microsoft Windows, ATSUI for Mac OS X and Pango, a free software engine used by GTK+ (and hence the GNOME desktop).
Because keyboard layouts cannot have simple key combinations for all characters, several operating systems provide alternative input methods that allow access to the entire repertoire.
ISO 14755[28] standardises several methods for entering Unicode characters from their code points. In the Basic method, a beginning sequence is followed by the hexadecimal representation of the code point and the ending sequence. There is also a screen-selection entry method, in which the characters are listed in a table on screen, as in a character map program.
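At its core, the Basic method is a mapping from a typed hexadecimal code point to the character it identifies; a tiny Python sketch of that mapping (the code point is an arbitrary example):

```python
# The Basic method amounts to mapping a hexadecimal code point to the
# character it names; chr() and ord() express that mapping directly.
codepoint = 0x05E7
print(chr(codepoint))        # ק  (HEBREW LETTER QOF)
print(hex(ord("ק")))         # 0x5e7, the reverse direction
```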
All W3C recommendations have used Unicode as their document character set since HTML 4.0. Web browsers have supported Unicode, especially UTF-8, for many years. Display problems result primarily from font-related issues; in particular, versions of Microsoft Internet Explorer do not render many code points unless explicitly told to use a font that contains them.
Although syntax rules may affect the order in which characters are allowed to appear, both HTML 4 and XML (including XHTML) documents, by definition, comprise characters from most of the Unicode code points, with the exception of: most of the C0 and C1 control codes, the permanently-unassigned code points D800–DFFF, and any code point ending in FFFE or FFFF.
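A hedged sketch of those exclusions as a predicate (the function name is mine; allowing tab, newline and carriage return among the controls follows the usual XML 1.0 convention):

```python
# A sketch of the exclusions listed above.
def allowed_in_document(cp: int) -> bool:
    if cp in (0x09, 0x0A, 0x0D):            # control codes that remain allowed
        return True
    if cp < 0x20 or 0x7F <= cp <= 0x9F:     # remaining C0 controls, DEL and the C1 block
        return False
    if 0xD800 <= cp <= 0xDFFF:              # the D800–DFFF range
        return False
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):   # code points ending in FFFE or FFFF
        return False
    return cp <= 0x10FFFF

print(allowed_in_document(0x41))      # True
print(allowed_in_document(0xD800))    # False
print(allowed_in_document(0x1FFFE))   # False
```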
These characters manifest either directly as bytes according to the document's encoding, if the encoding supports them, or users may write them as numeric character references based on the character's Unicode code point. For example, the references &#916;, &#1049;, &#1511;, &#1605;, &#3671;, &#12354;, &#21494;, &#33865;, and &#47568; (or the same numeric values expressed in hexadecimal, with &#x as the prefix) display on browsers as Δ, Й, ק, م, ๗, あ, 叶, 葉, and 말.
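Producing such a reference is mechanical: take the character's code point and wrap it as a decimal or hexadecimal reference. A small sketch (the helper name is illustrative):

```python
# Build the decimal and hexadecimal numeric character references for a
# character, the two forms used in the example above.
def ncr(ch: str) -> tuple[str, str]:
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"

print(ncr("ק"))    # ('&#1511;', '&#x5E7;')
print(ncr("葉"))   # ('&#33865;', '&#x8449;')
```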
When specifying URIs, for example as URLs in HTTP requests, non-ASCII characters must be percent-encoded.
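In practice the characters are first encoded, normally as UTF-8, and each resulting byte is then written as %XX; Python's standard library does this directly (the path is only for illustration):

```python
from urllib.parse import quote

# Each non-ASCII character is encoded (normally as UTF-8) and every resulting
# byte is written as %XX; '/' is left untouched by default.
path = "/wiki/말"
print(quote(path))   # /wiki/%EB%A7%90  (the UTF-8 bytes of U+B9D0)
```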
Free and retail fonts based on Unicode are commonly available, since TrueType and OpenType support Unicode.