What is the Java Unicode System?



This article will help you understand what the Java Unicode System is.

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. For example: A = U+0041, B = U+0042.
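The idea that a character is stored as a number can be seen directly in Java. The following is a minimal sketch, not a definitive implementation; the class name CharToNumber and the helper toUnicodeNotation are hypothetical names chosen for illustration.

```java
class CharToNumber {
    // Formats a char's numeric value in the conventional U+XXXX notation.
    // (Hypothetical helper, introduced only for this illustration.)
    static String toUnicodeNotation(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char letter = 'A';
        int number = letter; // a char widens to its numeric value

        System.out.println(number);                 // 65
        System.out.println(toUnicodeNotation('A')); // U+0041
        System.out.println(toUnicodeNotation('B')); // U+0042
        System.out.println((char) 0x0043);          // C
    }
}
```

The hexadecimal value 0041 and the decimal value 65 are the same number; the U+ notation simply writes it in hexadecimal.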

Why do we need Unicode?

Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use. These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially a server) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data runs the risk of corruption.
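The conflict between legacy encodings can be reproduced with a single byte. In the sketch below, the byte value 0xE9 is decoded once as ISO-8859-1 (Western European) and once as ISO-8859-7 (Greek), and comes out as two different characters. The class and method names are hypothetical, and the example assumes that the JDK in use ships the ISO-8859-7 charset, which the Java platform does not strictly guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCollision {
    // Decodes one byte under the given encoding and returns the
    // Unicode code point of the resulting character.
    static int decodedCodePoint(byte b, Charset charset) {
        String decoded = new String(new byte[] { b }, charset);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        byte value = (byte) 0xE9;

        Charset latin1 = StandardCharsets.ISO_8859_1;
        // Assumption: this JDK provides the Greek charset ISO-8859-7.
        Charset greek = Charset.forName("ISO-8859-7");

        // Same byte, two different characters:
        // U+00E9 is Latin small letter e with acute,
        // U+03B9 is Greek small letter iota.
        System.out.println(String.format("U+%04X", decodedCodePoint(value, latin1))); // U+00E9
        System.out.println(String.format("U+%04X", decodedCodePoint(value, greek)));  // U+03B9
    }
}
```

If the receiver of the byte guesses the wrong encoding, the text is silently corrupted, which is the risk described above.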

Unicode System

Unicode provides a unique number for every character, no matter what the platform, no matter what the language, no matter what the program. For example, A = U+0041, B = U+0042, C = U+0043, D = U+0044.
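To make the "one number per character" idea concrete, the sketch below lists the Unicode numbers (code points) of the characters in a string, for Latin letters as well as for letters from other scripts. The class name CodePointDemo and the method describe are hypothetical. The non-Latin characters are written as \u escapes so that the example does not depend on the encoding of the source file.

```java
import java.util.stream.Collectors;

class CodePointDemo {
    // Returns the code point of each character in U+XXXX notation,
    // separated by spaces.
    static String describe(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(describe("ABCD"));
        // U+0041 U+0042 U+0043 U+0044

        // Cyrillic Zhe, Hebrew Alef, and a CJK ideograph
        System.out.println(describe("\u0416\u05D0\u4E2D"));
        // U+0416 U+05D0 U+4E2D

        // A Unicode escape in source code denotes the same character
        System.out.println("\u0041".equals("A")); // true
    }
}
```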

The Unicode system has been adopted by such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), CORBA 3.0, WML, LDAP etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.

Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.

Unicode and ASCII are both schemes for representing characters as numbers. ASCII stands for “American Standard Code for Information Interchange” and defines only 128 characters. That is enough for basic English text, but not for most other languages. Unicode defines well over 100,000 characters, so by adopting it, Java lets programmers work with written languages from around the world.
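The difference in range between ASCII and Unicode can be checked from Java itself. The following sketch (with the hypothetical class name AsciiVsUnicode and helper fitsInAscii) asks the standard US-ASCII encoder whether it can represent a string, and prints the highest code point that Java's Character class accepts.

```java
import java.nio.charset.StandardCharsets;

class AsciiVsUnicode {
    // True if every character of the text is one of the 128 ASCII characters.
    static boolean fitsInAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String english = "Hello";
        String spanish = "Espa\u00F1ol"; // contains U+00F1, n with tilde

        System.out.println(fitsInAscii(english)); // true
        System.out.println(fitsInAscii(spanish)); // false

        // A Unicode encoding such as UTF-8 can represent both strings;
        // the non-ASCII letter simply takes two bytes instead of one.
        System.out.println(spanish.getBytes(StandardCharsets.UTF_8).length); // 8

        // Highest code point in the Unicode code space
        System.out.println(String.format("U+%X", Character.MAX_CODE_POINT)); // U+10FFFF
    }
}
```

Strings in Java are Unicode text already, so no special setup is needed to hold the Spanish word; the limitation only appears when the text has to be squeezed into ASCII.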

