Before talking about Unicode character sets, it helps to learn a little about Unicode and its creator, the Unicode Consortium.
There are many languages and writing systems in the world, each with its own symbols. However, when computers and the internet became widespread, the existing character sets were not enough for multilingual environments.
The Unicode Consortium is a non-profit organization that was founded to solve this problem. The aim of the organization is to replace the existing character sets with Unicode and its standard Unicode Transformation Formats (UTF).
The organization and its work were successful, and today the Unicode standard is supported by many operating systems and browsers.
The Unicode Consortium also cooperates with leading standards development organizations.
Unicode Character Sets
Unicode can be implemented by several different character encodings. The most widely used are UTF-8 and UTF-16.
UTF-8: A UTF-8 character is from 1 to 4 bytes long. UTF-8 can represent every character in the Unicode standard. It is backward compatible with ASCII. It is the preferred encoding for e-mails and webpages.
UTF-16: UTF-16 is a variable-length encoding for Unicode in which each character takes 2 or 4 bytes. It is used in major operating systems and environments such as Windows, Java and .NET.
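The byte lengths described above can be checked with a short Python sketch. The sample characters are arbitrary picks for illustration, and the helper name `encoded_lengths` is hypothetical. The `utf-16-le` codec is used only so that Python does not prepend a byte order mark to the output.

```python
# Sample characters chosen for illustration (not taken from the text above).
samples = ["A", "é", "€", "😀"]

def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-le" avoids the 2-byte byte order mark that plain "utf-16" adds.
    utf16_len = len(ch.encode("utf-16-le"))
    return utf8_len, utf16_len

for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# ASCII compatibility: plain ASCII text produces identical bytes in UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))
```

In this sketch the ASCII letter needs a single byte in UTF-8, while the emoji needs 4 bytes in both encodings because UTF-16 stores it as a surrogate pair.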
HTML4 and HTML5 pages can both be written in UTF-8. HTML5 parsers also recognize UTF-16, but UTF-8 is the recommended encoding for webpages.
The Unicode Consortium developed the Unicode Standard because the ISO-8859 character sets were limited and not suited to multilingual environments. In contrast to ISO-8859, the Unicode Standard covers almost all characters, punctuation marks and symbols in the world.
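The limitation can be seen by trying to encode characters with Python's built-in codecs. This is only a sketch: the helper name `fits_latin1` is hypothetical, and ISO-8859-1 (Latin-1) stands in for the whole ISO-8859 family.

```python
def fits_latin1(ch):
    """Return True if the character can be encoded as ISO-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # The character has no slot in the 256-entry Latin-1 table.
        return False

print(fits_latin1("é"))     # Western European letter
print(fits_latin1("Ω"))     # Greek capital omega
print("Ω".encode("utf-8"))  # UTF-8 can still encode it
```

The accented Latin letter fits in ISO-8859-1, but the Greek letter does not, while UTF-8 encodes both.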
UTF-8 is the default character encoding in HTML5. If an HTML5 webpage uses a different character set, it must be declared in a <meta> tag inside the page's <head>, for example:
<meta charset="ISO-8859-1">
HTML UTF-8 Character Codes
Some of the Unicode character blocks that HTML5 pages can use, with their code ranges in decimal and hexadecimal:
|Character block||Decimal||Hexadecimal|
|C0 Controls and Basic Latin||0-127||0000-007F|
|C1 Controls and Latin-1 Supplement||128-255||0080-00FF|
|Greek and Coptic||880-1023||0370-03FF|
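As a rough illustration of how these ranges are used, the Python sketch below looks up the block of a character and builds its decimal HTML character reference. The `BLOCKS` list copies only the three rows shown here, and the names `BLOCKS`, `block_of` and `numeric_reference` are hypothetical.

```python
import html

# Only the three ranges listed in the table above; Unicode has many more blocks.
BLOCKS = [
    ("C0 Controls and Basic Latin", 0x0000, 0x007F),
    ("C1 Controls and Latin-1 Supplement", 0x0080, 0x00FF),
    ("Greek and Coptic", 0x0370, 0x03FF),
]

def block_of(ch):
    """Return the block name for a character, or None if it is not listed."""
    code_point = ord(ch)
    for name, start, end in BLOCKS:
        if start <= code_point <= end:
            return name
    return None

def numeric_reference(ch):
    """Build the decimal HTML character reference, e.g. '&#937;'."""
    return f"&#{ord(ch)};"

for ch in ["A", "é", "Ω"]:
    print(ch, ord(ch), f"{ord(ch):04X}", block_of(ch), numeric_reference(ch))

# html.unescape turns a character reference back into the character.
print(html.unescape(numeric_reference("Ω")) == "Ω")
```

The decimal code in the table is the number that goes into the `&#...;` reference, and the hexadecimal code is the same value in base 16.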