HTML Character Sets

The browser needs to know which character set (character encoding) a page uses in order to display it correctly.
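To see why the encoding matters, here is a minimal Python sketch that decodes the same bytes with two different character sets. Python is used only for illustration, and the sample text is an assumption, not part of any HTML standard.

```python
# The same bytes read with two different character sets.
text = "café"                     # sample text (illustrative)
data = text.encode("utf-8")       # the bytes as a server might send them

right = data.decode("utf-8")      # decoded with the encoding that was used
wrong = data.decode("cp1252")     # decoded as Windows-1252 by mistake

print(right)
print(wrong)
```

The second line of output is garbled because the two bytes that UTF-8 uses for "é" are read as two separate Windows-1252 characters.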

UTF-8 is the default character encoding for HTML5, but this was not always the case. The earliest character set was ASCII, and ISO-8859-1 was the default character set from HTML 2.0 through HTML 4.01.

These encodings still caused problems, and many of them were solved when HTML5 and XML adopted UTF-8 as their default encoding.

Let's see more details about character sets.

ASCII

ASCII (American Standard Code for Information Interchange) was the first widely adopted character encoding standard. A character encoding standard is also called a character set.

ASCII assigns a unique binary number to each storable character: the uppercase and lowercase letters (A-Z, a-z), the digits 0-9, and a set of special characters. It is based on the English alphabet and encodes 128 characters as 7-bit integers, because computers store all information as binary ones and zeros (for example, the letter E is stored as 01000101).

Below, you can see an ASCII chart.

The biggest problem with ASCII is that it has no non-English letters. It is still in use, and its 128 characters keep the same codes in ISO-8859-1, Windows-1252, and UTF-8.
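As a rough illustration of the 7-bit scheme and its limitation, the following Python sketch looks up the code of one letter and then tries to encode a non-English letter. The sample characters are assumptions chosen for the example.

```python
# ASCII assigns each character a number from 0 to 127 (7 bits).
code = ord("E")                         # numeric code of the letter E
bits = format(code, "08b")              # the same code as 8 binary digits

ascii_bytes = "Hello".encode("ascii")   # one byte per character

# Characters outside the English alphabet have no ASCII code.
try:
    "é".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(code, bits, list(ascii_bytes), encodable)
```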


ANSI

ANSI, the common name for Windows-1252, was the default character set in Windows up to Windows 95. It is an extension of ASCII that adds international characters. It uses a full byte (8 bits) per character, which allows up to 256 different characters.

Because ANSI was the default character set in Windows, all browsers supported it.
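The one-byte-per-character idea can be sketched in Python, where Windows-1252 is available as the "cp1252" codec. The sample characters below are assumptions chosen for the example.

```python
# Windows-1252 ("cp1252" in Python) stores every character in one byte.
samples = ["A", "é", "ñ", "€"]
encoded = {ch: ch.encode("cp1252") for ch in samples}

for ch, raw in encoded.items():
    print(ch, raw.hex(), len(raw))

# Plain English text produces the same bytes as ASCII.
same_as_ascii = "Hello".encode("cp1252") == "Hello".encode("ascii")
print(same_as_ascii)
```

Only the byte values above 7F are new; the lower half of the table is unchanged ASCII.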

ISO-8859-1

ISO-8859-1 became the default character encoding in HTML 2.0, because most languages need characters that are not in ASCII. Like ANSI, it is an extension of ASCII that adds international characters. ISO-8859-1 also uses a full byte per character, which gives it 256 code values, twice as many as ASCII.
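A short Python sketch can make the "twice as many as ASCII" point concrete. Python calls this encoding "latin-1"; the variable names are illustrative.

```python
# ISO-8859-1 ("latin-1" in Python) maps each byte value 0-255 to the
# character with the same code number.
all_bytes = bytes(range(256))
chars = all_bytes.decode("latin-1")

ascii_part = chars[:128]        # identical to ASCII
extended_part = chars[128:]     # values made available by the eighth bit

print(len(ascii_part), len(extended_part))
print(chars[0xE9], chars[0xF1])
```

Not all of the upper 128 values are printable letters: the range 80 to 9F holds control codes.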


In HTML 4, the character set is declared in a <meta> tag:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

If an HTML 4 page uses a character encoding other than ISO-8859-1, the encoding must be declared in the <meta> tag:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">

All HTML 4 processors also support UTF-8:

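<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

When a browser detects ISO-8859-1, it commonly treats the page as ANSI (Windows-1252). The two are identical except for the 32 byte values from 80 to 9F, which ISO-8859-1 reserves for control codes and ANSI mostly uses for printable characters such as the euro sign and curly quotes.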
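The difference between the two character sets can be checked with a small Python sketch. It relies on Python's own "latin-1" and "cp1252" codecs, and the sample bytes are an assumption chosen to fall in the range where the two differ.

```python
# Bytes 0x80-0x9F: control codes in ISO-8859-1, mostly printable in Windows-1252.
raw = b"\x80 \x93quoted\x94"

as_latin1 = raw.decode("latin-1")
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(as_cp1252)

# Count how many of the 32 positions Python's cp1252 codec assigns.
assigned = 0
for value in range(0x80, 0xA0):
    try:
        bytes([value]).decode("cp1252")
        assigned += 1
    except UnicodeDecodeError:
        pass                    # this byte is undefined in Python's cp1252 table
print(assigned)
```

Read as Windows-1252, the sample shows a euro sign and a pair of curly quotes; read as ISO-8859-1, the same bytes are invisible control codes.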

Unicode UTF-8

UTF-8 is the default character encoding for HTML5.

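Because the character sets described above are limited in size, the Unicode Consortium developed the Unicode Standard.

The Unicode Standard covers almost all the characters, punctuation marks, and symbols used in the world. UTF-8 is one encoding of Unicode: it stores each character in one to four bytes, and the ASCII characters keep their single-byte codes.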
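UTF-8 encodes Unicode characters in a variable number of bytes, from one to four. The Python sketch below measures this for a few sample characters, which are assumptions chosen to cover each length.

```python
# UTF-8 uses one to four bytes per character.
samples = ["A", "é", "€", "😀"]
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(lengths)

# ASCII text is byte-for-byte identical in UTF-8.
same = "Hello".encode("utf-8") == "Hello".encode("ascii")
print(same)
```

Because ASCII text is unchanged, an English-only page saved as ASCII is already valid UTF-8.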

In HTML5, the charset attribute of the <meta> tag declares the character encoding:

<meta charset="UTF-8">

All HTML5 processors support ANSI (Windows-1252), ISO-8859, and UTF-8. XML processors are required to support UTF-8 and UTF-16.
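As a rough sketch of how a tool might read either form of the declaration, the Python example below uses the standard library's html.parser module. The class name CharsetFinder and the function declared_charset are hypothetical helpers written for this example; this is not the detection algorithm that browsers actually use.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Hypothetical helper: records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 form: <meta charset="UTF-8">
            self.charset = attrs["charset"].strip()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 form: content="text/html;charset=ISO-8859-1"
            content = attrs.get("content") or ""
            for part in content.split(";"):
                name, _, value = part.partition("=")
                if name.strip().lower() == "charset" and value.strip():
                    self.charset = value.strip()
                    break


def declared_charset(html):
    """Return the charset named in the first <meta> declaration, or None."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


print(declared_charset('<meta charset="UTF-8">'))
print(declared_charset(
    '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'
))
```

The function returns None when a page has no declaration, which is the case where a browser has to fall back to a default.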
