Can file saved in ASCII be declared UTF-8?

Question:

Master Asker

2011-02-06 15:14:53 UTC

I have shared hosting (with hostgator) and I noticed that its file manager seems to save files as US-ASCII by default. Since UTF-8 is fully compatible with ASCII, can I use the UTF-8 meta tag in a file which was apparently saved as ASCII?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

ALSO, even though the file was apparently saved in ASCII, I did a browser test (Tools - Encoding) and it said the encoding was ISO-8859-1... what the *#$%

I have a lot of HTML files saved as .php so that I can use PHP includes, and the HEAD section is included from one file into all the other pages (I forget / don't know / don't care about the technical terms).

BASIC QUESTION: Am I safe putting <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> in a file which is basic text/html and was saved as ASCII?

Three answers:

Jallan

2011-02-07 14:14:22 UTC

Saying that a file you have saved in ASCII is really UTF-8 should work in all circumstances.

But make sure that the file really is only 7-bit ASCII and contains no extended characters like é, £, or ¢, because they are not in pure ASCII and are encoded differently in Unicode and old-style 256-character character sets.
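One way to check that a file really is 7-bit ASCII is to look for any byte above 0x7F. The following is a minimal Python sketch, not something from the original answer; the function name `first_non_ascii` and the sample strings are made up for illustration.

```python
def first_non_ascii(data: bytes) -> int:
    """Return the offset of the first byte above 0x7F, or -1 if every byte is 7-bit ASCII."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset
    return -1


# Illustrative samples (assumptions, not taken from the asker's files).
plain = "<p>Hello, world</p>".encode("ascii")
accented_latin1 = "caf\u00e9".encode("iso-8859-1")  # é stored as the single byte 0xE9
accented_utf8 = "caf\u00e9".encode("utf-8")         # é stored as the two bytes 0xC3 0xA9

print(first_non_ascii(plain))            # no high bytes, safe to label as UTF-8
print(first_non_ascii(accented_latin1))  # offset of the é byte
print(accented_latin1 == accented_utf8)  # the two encodings of é differ
```

To test a real file, read it in binary mode and pass the bytes in, for example `first_non_ascii(open("header.php", "rb").read())`, where `header.php` is a hypothetical file name.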

UTF-8 is one of the three official encodings of the Unicode character set (the others are UTF-16 and UTF-32). “Languages” are not subsets of Unicode (or UTF-8). Rather, almost all “scripts” currently in use are covered by Unicode. A single script may cover more than one language, and in some cases the same language is commonly written in more than one script.

As an example, versions of the Latin script are used for English, French, Icelandic, Ojibway, Estonian, and for many other languages. Azeri, the language of Azerbaijan, has been written in three different scripts: the Perso-Arabic script, the Latin script, and the Cyrillic script.

In the early days of the Internet, before Unicode, people published on the net using many different character sets. Hence it was necessary to indicate in the coding which character set was being used, so that the browser could attempt to select appropriate fonts and the correct characters within the fonts it was using.

Unicode covers all the characters in all the character sets officially recognized on the web and is supported by 99% or more of the computers in use, so it is usually the obvious character set to use. But HTML entities could be used, and were used, in pages coded in an old character set to indicate Unicode characters that are not part of that old set. This works just as well as UTF-8, and is a little more compact when the main text is in a script whose characters need two or three bytes each in UTF-8 but only one byte each in the old character set. For example, in a page written in the Cyrillic script used for Russian, most of the letters take two bytes each in UTF-8 but one byte each in KOI8-R or Windows-1251, and scripts such as Thai or Devanagari take three bytes per character in UTF-8.
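The byte costs are easy to measure rather than assume. Here is a small Python sketch using the standard codecs; the helper `encoded_size` and the sample words are illustrative assumptions, and the last lines show how a character missing from an old character set can be written as a numeric HTML entity instead.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


russian = "Привет"  # six Cyrillic letters
thai = "ไทย"        # three Thai characters

print(encoded_size(russian, "utf-8"))   # two bytes per Cyrillic letter
print(encoded_size(russian, "koi8_r"))  # one byte per letter in a legacy set
print(encoded_size(thai, "utf-8"))      # three bytes per Thai character
print(encoded_size(thai, "cp874"))      # one byte per character in a legacy set

# The euro sign is not in ISO-8859-1, so Python's "xmlcharrefreplace"
# error handler writes it as a numeric character reference instead.
with_entity = "Price: 5 \u20ac".encode("iso-8859-1", errors="xmlcharrefreplace")
print(with_entity)
```

The entity approach keeps the file in the old character set while still letting the browser display the Unicode character.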

Accordingly, the old character sets are still used, either from force of habit by the programmer(s) or for the sake of efficiency. HTML must continue to recognize that these older character sets are still being used, and therefore character sets must continue to be identified when coding HTML.

David D

2011-02-06 15:17:06 UTC

ASCII is a subset of ISO-8859-1 and UTF-8.

Any ASCII document can safely be treated as either of them.
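A quick way to convince yourself of this is to decode the same ASCII bytes under all three labels and compare the results. This is only a sketch; the helper name `decodes_identically` is made up for illustration.

```python
def decodes_identically(data: bytes,
                        encodings=("ascii", "utf-8", "iso-8859-1")) -> bool:
    """True if data decodes without error, and to the same text, under every encoding."""
    try:
        results = {data.decode(enc) for enc in encodings}
    except UnicodeDecodeError:
        return False
    return len(results) == 1


ascii_bytes = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
latin1_bytes = b"caf\xe9"  # contains a byte above 0x7F, so it is not ASCII

print(decodes_identically(ascii_bytes))   # same text whichever label is used
print(decodes_identically(latin1_bytes))  # the labels no longer agree
```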

Response to update:

> Aren't ALL living languages subsets of UTF-8, so can UTF-8 not be used on EVERY webpage?

There might be a couple of obscure ones that aren't covered (but they tend not to be covered by anything)

> So then what is all this about character encoding if UTF-8 can be used and is recommended in (apparently) ALL situations?

There are times when other Unicode encodings are a better choice. For example, documents in East Asian languages can be smaller in UTF-16, since most Chinese, Japanese, and Korean characters take two bytes there but three bytes in UTF-8.
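The size difference can be checked directly. Below is a minimal Python sketch; the helper `byte_sizes` and the sample strings are assumptions for illustration, and `utf-16-le` is used so that no byte order mark is counted.

```python
def byte_sizes(text: str) -> dict:
    """Bytes needed for text in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


japanese = "日本語のテキスト"  # eight characters, all in the Basic Multilingual Plane
english = "plain text"         # ten ASCII characters

print(byte_sizes(japanese))  # UTF-16 is smaller for this text
print(byte_sizes(english))   # UTF-8 is smaller for ASCII text
```

Note that HTML markup itself is ASCII, which costs one byte per character in UTF-8 and two in UTF-16, so the advantage shrinks on pages with a lot of tags.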

As for that, ISO-8859-* is legacy, superseded many years ago, but some people still haven't moved on.

2014-03-29 17:00:40 UTC

Hi,

Maybe this site can help you:

http://webhostingsecretsonline.com

Regards,


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.
