How many bits are used to represent Unicode, ASCII, UTF-16, and UTF-8 characters in Java?


In general, data is stored in a computer in the form of bits (1 or 0). Various coding schemes specify the pattern of bits used to represent each character.

ASCII − Stands for American Standard Code for Information Interchange. It was developed by the American Standards Association and is one of the most widely used coding systems. It represents characters using 7 bits and includes 128 characters: the upper- and lowercase Latin alphabet, the digits 0-9, and a number of punctuation and control characters.
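As a quick illustration (a minimal sketch assuming a standard JDK; the class name and sample character are arbitrary), an ASCII character fits in 7 bits but occupies one full byte in memory:

import java.nio.charset.StandardCharsets;

public class AsciiBits {
   public static void main(String[] args) {
      // 'A' is code point 65; it fits in 7 bits but is stored in one full byte.
      byte[] bytes = "A".getBytes(StandardCharsets.US_ASCII);
      System.out.println(bytes.length);                     // 1 byte
      System.out.println(Integer.toBinaryString(bytes[0])); // 1000001 (7 bits)
   }
}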

Unicode (UTF) − Stands for Unicode Transformation Format. It is developed by the Unicode Consortium. If you want to create documents that use characters from multiple character sets, you can do so using a single Unicode encoding. It provides three encoding forms, compared in the sketch after the list below.

  • UTF-8 − It comes in 8-bit units (bytes); a character in UTF-8 can be 1 to 4 bytes long, making UTF-8 variable-width.
  • UTF-16 − It comes in 16-bit units (shorts); a character can be 1 or 2 units long, making UTF-16 variable-width.
  • UTF-32 − It comes in 32-bit units (an int in Java). It is a fixed-width format: every character is exactly one unit long.
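Here is a minimal sketch comparing the three forms (assuming a standard JDK; UTF-32 ships with common JDK distributions but is not among the charsets the Java specification guarantees, and the class name and sample strings are arbitrary):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingWidths {
   public static void main(String[] args) {
      // An ASCII letter, an accented letter, a CJK character, and an emoji
      // outside the Basic Multilingual Plane.
      String[] samples = { "A", "é", "あ", "😀" };
      for (String s : samples) {
         int utf8  = s.getBytes(StandardCharsets.UTF_8).length;
         int utf16 = s.getBytes(StandardCharsets.UTF_16BE).length; // BE variant avoids the BOM
         int utf32 = s.getBytes(Charset.forName("UTF-32BE")).length;
         System.out.printf("%s -> UTF-8: %d, UTF-16: %d, UTF-32: %d bytes%n",
               s, utf8, utf16, utf32);
      }
   }
}

On such a JVM this prints 1/2/4 bytes for "A", 2/2/4 for "é", 3/2/4 for "あ", and 4/4/4 for "😀", matching the widths described above.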

Representation in Java

The following table lists the number of bits used in Java to represent various coding standards.

Representation | Bits used
ASCII          | 7 bits (stored in an 8-bit byte)
UTF-8          | 8-, 16-, 24- or 32-bit patterns (1 to 4 bytes)
UTF-16         | 16- or 32-bit patterns (one or two 16-bit units)
UTF-32         | 32-bit patterns (fixed width)
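Java's char type is itself a 16-bit UTF-16 code unit, which is why a character outside the Basic Multilingual Plane occupies two chars (a surrogate pair). A small sketch (class name arbitrary):

public class CharWidth {
   public static void main(String[] args) {
      // A Java char is a single 16-bit UTF-16 code unit.
      System.out.println(Character.SIZE); // 16

      // Characters outside the BMP need a surrogate pair: two chars.
      String emoji = "😀";
      System.out.println(emoji.length());                          // 2 code units
      System.out.println(emoji.codePointCount(0, emoji.length())); // 1 character
   }
}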
