Convert UTF-8 to Unicode in Java


Before looking at the conversion, let us review what Unicode and UTF-8 are.

Unicode is an international character encoding standard that can represent most of the world's written languages. Each character is assigned a numeric code point, conventionally written in hexadecimal, ranging from U+0000 to U+10FFFF. Java stores a String as a sequence of 16-bit char values (UTF-16 code units), each between \u0000 and \uFFFF. Code points above U+FFFF are stored as two char values, called a surrogate pair.
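The difference between a 16-bit char and a full Unicode code point can be seen in a minimal sketch. The class name CodePointDemo and the helper describe are made up for this illustration; the code point U+1F600 is just one example of a character outside the 16-bit range.

```java
class CodePointDemo {
    // Hypothetical helper: formats a code point in the usual U+XXXX notation.
    static String describe(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // U+1F600 does not fit in 16 bits, so Java stores it as a surrogate pair.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                       // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));  // 1 (Unicode code point)
        System.out.println(describe(s.codePointAt(0)));       // U+1F600
    }
}
```

Here length() counts char values, while codePointCount() counts actual Unicode characters.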

UTF-8 is a variable-width character encoding. It is as compact as ASCII for ASCII text, yet it can encode every Unicode character at the cost of a larger file. UTF stands for Unicode Transformation Format. The '8' signifies that it uses 8-bit blocks (bytes) to encode a character. The number of bytes needed to represent a character varies from 1 to 4.
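The variable width can be checked directly by encoding a few characters and counting the bytes. This is a small sketch; the class name Utf8Widths and the helper utf8Length are illustrative, and the sample characters were chosen to cover each width.

```java
import java.nio.charset.StandardCharsets;

class Utf8Widths {
    // Hypothetical helper: number of bytes in the UTF-8 encoding of the text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("h"));       // 1 (ASCII letter, U+0068)
        System.out.println(utf8Length("\u00E9"));  // 2 (U+00E9)
        System.out.println(utf8Length("\u6366"));  // 3 (U+6366)
        // U+1F600 lies above U+FFFF and needs the maximum of four bytes.
        System.out.println(utf8Length(new String(Character.toChars(0x1F600)))); // 4
    }
}
```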

To convert UTF-8 to Unicode in Java, we create a String object, passing the constructor the UTF-8 byte array and the name of the charset the bytes are encoded in, i.e. UTF-8.
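When the UTF-8 bytes come from outside the program (a file or a network socket, for instance), the same constructor applies. The sketch below assumes the bytes are already in an array and uses the StandardCharsets.UTF_8 constant instead of the charset name, which avoids the checked UnsupportedEncodingException. The class name DecodeUtf8 and the helper decode are illustrative.

```java
import java.nio.charset.StandardCharsets;

class DecodeUtf8 {
    // Hypothetical helper: decodes UTF-8 bytes into a Java String.
    static String decode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "hey" as three ASCII bytes, followed by U+6366 encoded as E6 8D A6.
        byte[] utf8Bytes = {0x68, 0x65, 0x79, (byte) 0xE6, (byte) 0x8D, (byte) 0xA6};
        String text = decode(utf8Bytes);
        System.out.println(text.length());                        // 4
        System.out.println(Integer.toHexString(text.charAt(3)));  // 6366
    }
}
```

Six bytes of input become four characters, because the last three bytes together encode a single character.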

Let us see a program to convert UTF-8 to Unicode by creating a new String Object.

Example

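public class Example {
   public static void main(String[] args) throws Exception {
      String str = "hey\u6366";
      // Encode the string as UTF-8 bytes so that there is something to decode
      byte[] charset = str.getBytes("UTF-8");
      // Decode the UTF-8 bytes back into a Java (Unicode) string
      String result = new String(charset, "UTF-8");
      System.out.println(result);
   }
}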

Output

hey捦

Let us understand the above program. First, we encoded a Unicode string as UTF-8 using the getBytes() method, so that we have UTF-8 bytes to convert back −

String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");

Then we converted the byte array (named charset in the program) back to a Unicode string by creating a new String object as follows −

String result = new String(charset, "UTF-8");
System.out.println(result);
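Note that the String constructor does not fail on malformed UTF-8; it replaces invalid sequences with the replacement character U+FFFD. If invalid input should be detected instead, one option is a CharsetDecoder configured to report errors. The following is a sketch under that assumption; the class name StrictUtf8 and the helpers decodeStrict and isValidUtf8 are illustrative names, not part of the program above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {
    // Hypothetical helper: decodes UTF-8 strictly, so malformed input raises
    // an exception instead of being replaced with U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Hypothetical helper: true if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The letter h followed by an incomplete three-byte sequence.
        byte[] truncated = {0x68, (byte) 0xE6, (byte) 0x8D};
        // The String constructor silently substitutes U+FFFD.
        System.out.println(new String(truncated, StandardCharsets.UTF_8).contains("\uFFFD")); // true
        System.out.println(isValidUtf8(truncated)); // false
    }
}
```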

Updated on: 26-Jun-2020
