Reading UTF-8 data from a file using Java


In general, data is stored in a computer in the form of bits (1 or 0). Various character encoding schemes specify the sequence of bytes that represents each character.

Unicode (UTF) − UTF stands for Unicode Transformation Format. Unicode is developed by the Unicode Consortium. If you want to create documents that use characters from multiple character sets, you can do so using a single Unicode character encoding. Unicode provides 3 types of encodings.

  • UTF-8 − It uses 8-bit units (bytes). A character in UTF-8 can be from 1 to 4 bytes long, making UTF-8 variable width.

  • UTF-16 − It uses 16-bit units (shorts). A character can be 1 or 2 units long, making UTF-16 variable width.

  • UTF-32 − It uses 32-bit units (the size of a Java int). It is a fixed-width format: every character is exactly one 32-bit unit long.
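To make the variable widths concrete, here is a minimal sketch that counts how many bytes a few characters occupy in UTF-8 and UTF-16. The class name EncodingWidths, its helper methods, and the sample characters are chosen for illustration only; UTF-32 is left out because StandardCharsets has no UTF-32 constant in older Java versions.

```java
import java.nio.charset.StandardCharsets;

class EncodingWidths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as UTF-16
    // (big-endian variant, which writes no byte order mark).
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Illustrative samples: ASCII letter, accented Latin letter,
        // euro sign, Telugu letter, and an emoji outside the basic plane.
        String[] samples = {"A", "\u00E9", "\u20AC", "\u0C1F", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(s + " -> UTF-8: " + utf8Bytes(s)
                    + " bytes, UTF-16: " + utf16Bytes(s) + " bytes");
        }
    }
}
```

The ASCII letter takes 1 byte in UTF-8, the Telugu letter takes 3, and the emoji takes 4. In UTF-16 the emoji needs two 16-bit units (4 bytes), while the other samples need one unit (2 bytes).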

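Reading UTF data from a file

The readUTF() method of the java.io.DataInputStream class reads data that is in modified UTF-8 encoding into a String and returns it. It expects the format produced by the writeUTF() method of java.io.DataOutputStream, in which each string is preceded by a two-byte length. Therefore, to read UTF data from a file −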

  • Instantiate the FileInputStream class by passing a String value representing the path of the required file, as a parameter.

  • Instantiate the DataInputStream class by passing the above-created FileInputStream object as a parameter.

  • Read UTF data from the DataInputStream object using the readUTF() method.

Example

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
public class UTF8Example {
   public static void main(String args[]) {
      StringBuilder buffer = new StringBuilder();
      //Instantiating the FileInputStream and DataInputStream classes;
      //try-with-resources closes both streams when the block exits
      try (FileInputStream fileIn = new FileInputStream("D:\\test.txt");
           DataInputStream inputStream = new DataInputStream(fileIn)) {
         //Reading UTF data from the DataInputStream
         //(the file must have been written using DataOutputStream.writeUTF())
         while(inputStream.available()>0) {
            buffer.append(inputStream.readUTF());
         }
      }
      catch(IOException ex) {
         //EOFException and UTFDataFormatException are subclasses of IOException
         System.out.println(ex.toString());
      }
      System.out.println("Contents of the file: "+buffer.toString());
   }
}

Output

Contents of the file: టుటోరియల్స్ పాయింట్ కి స్వాగతిం
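
The example above only works if the file was written in the modified UTF-8 format that readUTF() expects. A plain text file saved as UTF-8 from an editor would normally fail or produce garbage, because its first two bytes would be interpreted as a length. The following sketch shows a full round trip under that assumption. The class name ModifiedUtfRoundTrip, its helper methods, and the temporary file are hypothetical and used only for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ModifiedUtfRoundTrip {
    // Writes each string with writeUTF(), which stores a 2-byte length
    // followed by the string's bytes in modified UTF-8.
    static void write(Path file, String... values) throws IOException {
        try (DataOutputStream out =
                 new DataOutputStream(new FileOutputStream(file.toFile()))) {
            for (String value : values) {
                out.writeUTF(value);
            }
        }
    }

    // Reads the strings back with readUTF() until no bytes remain,
    // concatenating them into one String.
    static String read(Path file) throws IOException {
        StringBuilder result = new StringBuilder();
        try (DataInputStream in =
                 new DataInputStream(new FileInputStream(file.toFile()))) {
            while (in.available() > 0) {
                result.append(in.readUTF());
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only for this demonstration.
        Path file = Files.createTempFile("utf-demo", ".bin");
        try {
            write(file, "Hello, ", "\u0C38\u0C4D\u0C35\u0C3E\u0C17\u0C24\u0C02");
            System.out.println("Contents of the file: " + read(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because every writeUTF() call adds its own length prefix, the reader must call readUTF() once per string written, which is what the loop does.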

The newBufferedReader() method of the java.nio.file.Files class accepts a Path object representing the path of the file and a Charset object representing the encoding of the characters to be read, and returns a BufferedReader object that can read data in the specified encoding.

The value for the Charset could be StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16, StandardCharsets.US_ASCII, or StandardCharsets.ISO_8859_1.

Therefore, to read UTF-8 data from a file −
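
The choice of Charset matters because the same bytes decode to different text under different encodings. Here is a small sketch of that effect, using a hypothetical class CharsetMismatch and an in-memory byte array instead of a file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded in UTF-8
        // as the two bytes 0xC3 0xA9.
        byte[] bytes = "\u00E9".getBytes(StandardCharsets.UTF_8);

        String correct = decode(bytes, StandardCharsets.UTF_8);
        String wrong = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println("Decoded as UTF-8: " + correct.length() + " character(s)");
        System.out.println("Decoded as ISO-8859-1: " + wrong.length() + " character(s)");
    }
}
```

Decoded as UTF-8, the two bytes yield the single original character. Decoded as ISO-8859-1, each byte becomes its own character, so the result is two unrelated characters.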

  • Create/get an object of the Path class representing the required path using the get() method of the java.nio.file.Paths class.

  • Create/get a BufferedReader object that can read UTF-8 data, by passing the above-created Path object and StandardCharsets.UTF_8 as parameters.

  • Read the contents of the file using the readLine() method of the BufferedReader object.

Example

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class UTF8Example {
   public static void main(String args[]) throws Exception{
      //Getting the Path object
      String filePath = "D:\\samplefile.txt";
      Path path = Paths.get(filePath);
      StringBuilder buffer = new StringBuilder();
      //Creating a BufferedReader object; try-with-resources closes it automatically
      try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
         //Reading the UTF-8 data from the file line by line
         String line = reader.readLine();
         while(line != null) {
            buffer.append(line);
            line = reader.readLine();
            //readLine() strips the line terminator, so add one back between lines
            if(line != null) {
               buffer.append(System.lineSeparator());
            }
         }
      }
      System.out.println("Contents of the file: "+buffer.toString());
   }
}

Output

Contents of the file: టుటోరియల్స్ పాయింట్ కి స్వాగతిం
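
To try this approach without preparing a file by hand, the sketch below writes a few lines as UTF-8 with Files.newBufferedWriter() and reads them back with Files.newBufferedReader(). The class name Utf8LinesRoundTrip, its helper methods, and the temporary file are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Utf8LinesRoundTrip {
    // Writes each line to the file as UTF-8, followed by a line separator.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter writer =
                 Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // Reads the file as UTF-8 and returns its lines without terminators.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only for this demonstration.
        Path file = Files.createTempFile("utf8-demo", ".txt");
        try {
            writeLines(file, Arrays.asList("Hello", "\u0C38\u0C4D\u0C35\u0C3E\u0C17\u0C24\u0C02"));
            for (String line : readLines(file)) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Keeping the lines in a List preserves empty lines and line boundaries, which are lost if the results of readLine() are simply concatenated.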
raja
Published on 10-Sep-2019 15:37:15