HTML Character Sets

To display an HTML page correctly, a browser must know which character set (character encoding) the page uses.

UTF-8 is the default character encoding for HTML5, but this was not always the case. Earlier, ASCII was the character set in use, and ISO-8859-1 was the default character set from HTML 2.0 through HTML 4.01.

Encoding problems remained, however, and many of them were solved when UTF-8 arrived with HTML5 and XML.

Let's see more details about character sets.

ASCII

ASCII was the first widely adopted character encoding standard (also called a character set).

ASCII stands for American Standard Code for Information Interchange. It is based on the English alphabet and encodes 128 characters as 7-bit binary integers, since computers store all information as binary ones and zeros. For example, the letter E is stored as 01000101, with a leading zero filling the eighth bit of the byte.

You can see an ASCII chart above.
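The 7-bit encoding described above can be checked with a few lines of Python. This is only a minimal sketch: the helper name ascii_bits is made up for illustration, and it relies on nothing beyond the built-in ord and format functions.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit binary form of a single ASCII character."""
    code = ord(ch)              # numeric code, e.g. 69 for "E"
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")  # zero-padded to 7 binary digits


for ch in ("E", "e", "0"):
    # Stored in a full byte, the code gets one extra leading zero.
    print(ch, ord(ch), ascii_bits(ch), "0" + ascii_bits(ch))
```

For the letter E, the byte form is the same 01000101 pattern quoted in the text.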

The biggest problem with ASCII is that it has no non-English letters. It is still in common use, because every character set described below keeps the 128 ASCII characters unchanged.
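The limitation is easy to reproduce. The sketch below assumes Python's built-in "ascii" codec as a stand-in for the ASCII standard. The function name fits_in_ascii is hypothetical.

```python
def fits_in_ascii(text: str) -> bool:
    """Report whether every character of text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character falls outside the 128 ASCII codes.
        return False


print(fits_in_ascii("resume"))   # plain English letters only
print(fits_in_ascii("résumé"))   # contains "é", which ASCII lacks
```

Any text containing an accented letter fails the check. The later character sets were designed to close this gap.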

ANSI

ANSI, also called Windows-1252, was the default character set for Windows up to Windows 95. It is an extension of ASCII that adds international characters, using a full byte (8 bits) to represent up to 256 characters.

Because ANSI was the default character set of Windows, all browsers supported it.
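As a rough illustration of the one-byte-per-character idea, the following sketch encodes a few characters with Python's "cp1252" codec, which is Python's name for Windows-1252. The sample characters are chosen only for this example.

```python
samples = ["A", "é", "€"]

# "cp1252" is the name of Python's codec for Windows-1252.
encoded = {ch: ch.encode("cp1252") for ch in samples}

for ch, raw in encoded.items():
    # Each character occupies exactly one byte.
    print(ch, raw.hex(), len(raw))
```

The letter A keeps its ASCII code, while é and € receive codes above 127, in the half of the byte range that ASCII leaves unused.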

ISO-8859-1

ISO-8859-1 became the default character encoding in HTML 2.0, because most countries use characters that are not in ASCII. Like ANSI, it is an extension of ASCII that adds international characters. ISO-8859-1 also uses a full byte, which lets it represent twice as many characters as ASCII.
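The "twice as many characters" claim can be sketched in Python, assuming its built-in "iso-8859-1" and "ascii" codecs match the standards described here. The variable names are illustrative only.

```python
all_bytes = bytes(range(256))

# ISO-8859-1 assigns a character to every one of the 256 byte values.
latin1_text = all_bytes.decode("iso-8859-1")

# ASCII covers only the lower half, byte values 0 through 127.
ascii_text = all_bytes[:128].decode("ascii")

print(len(latin1_text), len(ascii_text))
print(latin1_text[:128] == ascii_text)   # lower half is plain ASCII
print(latin1_text[0xE9])                 # a character from the upper half
```

The comparison on the second printed line shows why ISO-8859-1 counts as an extension of ASCII: the first 128 codes are identical.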

If an HTML 4 page uses a character encoding other than ISO-8859-1, the encoding must be declared in a <meta> tag:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">
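To see why the declaration matters, the sketch below simulates what happens to the bytes of an ISO-8859-8 (Hebrew) page. It is a simplified model of a browser, not how any real browser is implemented, and the sample word is an assumption made for the example.

```python
# The Hebrew word "shalom", written with escapes to avoid display issues.
original = "\u05e9\u05dc\u05d5\u05dd"

# Bytes as they would be stored for a page saved as ISO-8859-8.
raw = original.encode("iso-8859-8")

# Decoding with the declared charset recovers the original text.
declared = raw.decode("iso-8859-8")

# Falling back to ISO-8859-1 maps the same bytes to Latin letters.
fallback = raw.decode("iso-8859-1")

print(raw.hex())
print(declared == original)
print(fallback)
```

The bytes are the same in both cases. Only the charset used to interpret them decides whether the reader sees Hebrew or unrelated accented Latin letters.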

Unicode UTF-8

UTF-8 is the default character encoding for HTML5.

Because the character sets mentioned above are limited in size, the Unicode Consortium developed the Unicode Standard.

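The Unicode Standard covers almost all the characters, punctuation marks, and symbols used in the world. UTF-8 is an encoding of Unicode that uses one to four bytes per character and keeps the ASCII characters as single bytes.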

In HTML5, the charset attribute of the <meta> tag declares the character encoding:

<meta charset="UTF-8">
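Unlike the single-byte character sets above, UTF-8 spends a different number of bytes on different characters. The following minimal sketch uses Python's built-in "utf-8" codec. The helper name utf8_length and the sample characters are chosen for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "€", "😀"]:
    # Print the character, its UTF-8 bytes in hex, and the byte count.
    print(ch, ch.encode("utf-8").hex(), utf8_length(ch))
```

ASCII characters such as A still take a single byte with the same value as in ASCII, so plain English text is encoded identically in both.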