- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Convert Unicode to UTF-8 in Java
Before moving onto their conversions, let us learn about Unicode and UTF-8.
Unicode is an international standard of character encoding which has the capability of representing a majority of written languages all over the globe. Unicode uses hexadecimal to represent a character. Unicode is a 16-bit character encoding system. The lowest value is \u0000 and the highest value is \uFFFF.
UTF-8 is a variable width character encoding. UTF-8 has the ability to be as condensed as ASCII but can also contain any Unicode characters with some increase in the size of the file. UTF stands for Unicode Transformation Format. The '8' signifies that it allocates 8-bit blocks to denote a character. The number of blocks needed to represent a character varies from 1 to 4.
In order to convert Unicode to UTF-8 in Java, we use the getBytes() method. The getBytes() method encodes a String into a sequence of bytes and returns a byte array.
Declaration - The getBytes() method is declared as follows.
public byte[] getBytes(String charsetName)
where charsetName is the specific charset by which the String is encoded into an array of bytes.
Let us see a program to convert Unicode to UTF-8 in Java using the getBytes() method.
Example
public class Example { public static void main(String[] args) throws Exception { String str1 = "\u0000"; String str2 = "\uFFFF"; byte[] arr = str1.getBytes("UTF-8"); byte[] brr = str2.getBytes("UTF-8"); System.out.println("UTF-8 for \u0000"); for(byte a: arr) { System.out.print(a); } System.out.println("
UTF-8 for \uffff" ); for(byte b: brr) { System.out.print(b); } } }
Output
UTF-8 for \u0000 0 UTF-8 for \uffff -17-65-65
Let us understand the above program. We have created two Strings.
String str1 = "\u0000"; String str2 = "\uFFFF";
String str1 is assigned \u0000 which is the lowest value in Unicode. String str2 is assigned \uFFFF which is the highest value in Unicode.
To convert them into UTF-8, we use the getBytes(“UTF-8”) method. This gives us an array of bytes as follows −
byte[] arr = str1.getBytes("UTF-8"); byte[] brr = str2.getBytes("UTF-8");
Then to print the byte array, we use an enhanced for loop as follows −
for(byte a: arr) { System.out.print(a); } for(byte b: brr) { System.out.print(b); }
- Related Articles
- Convert UTF-8 to Unicode in Java
- Convert String to UTF-8 bytes in Java
- How many bits are used to represent Unicode, ASCII, UTF-16, and UTF-8 characters in java?
- How to read and write unicode (UTF-8) files in Python?
- Convert ASCII TO UTF-8 Encoding in PHP?
- How to represent Unicode strings as UTF-8 encoded strings using Tensorflow and Python?
- How to convert wrongly encoded data to UTF-8 in MySQL?
- How to convert an MySQL database characterset and collation to UTF-8?
- UTF-8 Validation in C++
- Change MySQL default character set to UTF-8 in my.cnf?
- How to convert Unicode values to characters in JavaScript?
- How to convert Date to String in Java 8?
- How to convert an integer to a unicode character in Python?
- How can Tensorflow text be used to split the UTF-8 strings in Python?
- How to Convert a Java 8 Stream to an Array?
