How to remove non-ASCII characters from strings


The POSIX character class \p{ASCII} matches any ASCII character (code points 0 to 127), and the metacharacter ^ negates a character class when it is the first character inside the brackets.

That is, the following expression matches every non-ASCII character. It is written here as a Java string literal, so the backslash is doubled.

"[^\\p{ASCII}]"
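
As a quick check of what this character class matches, here is a minimal sketch that tests a few strings against the pattern. The class name AsciiClassDemo and the helper containsNonAscii are made up for illustration, and the non-ASCII letter is written as the Unicode escape \u00FF (ÿ) so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class AsciiClassDemo {
   // Precompiled pattern: any single character outside the ASCII range
   private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

   // Hypothetical helper: true if the text contains at least one non-ASCII character
   static boolean containsNonAscii(String text) {
      return NON_ASCII.matcher(text).find();
   }

   public static void main(String[] args) {
      System.out.println(containsNonAscii("why"));      // false
      System.out.println(containsNonAscii("wh\u00FF")); // true
      System.out.println(containsNonAscii("tab\t"));    // false: control characters are ASCII too
   }
}
```

Note that ASCII control characters such as tab and newline are inside the ASCII range, so this pattern leaves them alone.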

The replaceAll() method of the String class accepts a regular expression and a replacement string. It returns a new string in which every substring of the current string that matches the pattern is replaced with the replacement string.

Therefore, you can remove the matched characters by replacing them with the empty string "" using the replaceAll() method.

Example 1

import java.util.Scanner;
public class Exp {
   public static void main( String args[] ) {
      Scanner sc = new Scanner(System.in);
      // The backslash must be escaped inside a Java string literal
      String regex = "[^\\p{ASCII}]";
      System.out.println("Enter input data:");
      String input = sc.nextLine();
      String result = input.replaceAll(regex, "");
      System.out.println("Result: "+result);
   }
}

Output

Enter input data:
whÿ do we fall
Result: wh do we fall
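
For use outside an interactive program, the same call can be wrapped in a small method and tried on fixed inputs. The following is only a sketch: the class name AsciiFilter and the method stripNonAscii are hypothetical names chosen for this illustration, and the non-ASCII characters are written as Unicode escapes (\u00FF is ÿ, \u00EF is ï).

```java
public class AsciiFilter {
   // Hypothetical helper: returns a copy of the input with every non-ASCII character removed
   static String stripNonAscii(String input) {
      return input.replaceAll("[^\\p{ASCII}]", "");
   }

   public static void main(String[] args) {
      System.out.println(stripNonAscii("wh\u00FF do we fall")); // wh do we fall
      System.out.println(stripNonAscii("na\u00EFve"));          // nave
      System.out.println(stripNonAscii("plain text"));          // plain text
   }
}
```

A string that contains only ASCII characters is returned with the same contents, and removal does not substitute a look-alike letter: the ï in the second input is dropped rather than turned into i.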

Example 2

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
   public static void main( String args[] ) {
      Scanner sc = new Scanner(System.in);
      System.out.println("Enter input string: ");
      String input = sc.nextLine();
      //The backslash must be escaped inside a Java string literal
      String regex = "[^\\p{ASCII}]";
      //Creating a pattern object
      Pattern pattern = Pattern.compile(regex);
      //Matching the compiled pattern in the String
      Matcher matcher = pattern.matcher(input);
      //Creating an empty string buffer
      StringBuffer sb = new StringBuffer();
      while (matcher.find()) {
         matcher.appendReplacement(sb, "");
      }
      matcher.appendTail(sb);
      System.out.println("Result: \n"+ sb.toString() );
   }
}

Output

Enter input string:
whÿ do we fall
Result:
wh do we fall
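
When every match is replaced with the same fixed string, the find/appendReplacement loop can usually be shortened to a single Matcher.replaceAll() call. The sketch below puts both forms side by side on a fixed input. The class name MatcherComparison and both method names are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherComparison {
   // Compile once and reuse; Pattern objects are immutable
   private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

   // The explicit loop from Example 2, wrapped in a method
   static String removeWithLoop(String input) {
      Matcher matcher = NON_ASCII.matcher(input);
      StringBuffer sb = new StringBuffer();
      while (matcher.find()) {
         // Copies the text before the match, then appends the (empty) replacement
         matcher.appendReplacement(sb, "");
      }
      // Copies whatever follows the last match
      matcher.appendTail(sb);
      return sb.toString();
   }

   // Shorter form: Matcher.replaceAll() performs the same scan internally
   static String removeWithReplaceAll(String input) {
      return NON_ASCII.matcher(input).replaceAll("");
   }

   public static void main(String[] args) {
      String input = "wh\u00FF do we fall"; // \u00FF is the letter ÿ
      System.out.println(removeWithLoop(input));       // wh do we fall
      System.out.println(removeWithReplaceAll(input)); // wh do we fall
   }
}
```

The explicit loop remains useful when the replacement has to be computed separately for each match.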

Updated on: 21-Nov-2019
