
How to remove the HTML tags from a given string in Java?
In Java, String is a final class and its objects are immutable: once a String object is created, its contents cannot be changed, although a variable can be reassigned to refer to a different String. Methods that appear to modify a string therefore return a new one. HTML tags can be removed from a given string by using the following approaches:
Using replaceAll() Method
We can remove the HTML tags from a given string by using the replaceAll() method. The replaceAll() method accepts two parameters: a "regular expression" and a "replacement string". It replaces all substrings that match the regular expression with the specified replacement.
Syntax
Following is the syntax of the replaceAll() method:
public String replaceAll(String regex, String replacement)
Here,
- regex: A regular expression that defines the pattern to search for in the string.
- replacement: The new string to replace each match of the regular expression.
Example
public class RemoveHTMLTagsTest {
    public static void main(String[] args) {
        String str = "<p><b>Welcome to Tutorials Point</b></p>";
        System.out.println("Before removing HTML Tags: " + str);
        // "<.*?>" lazily matches everything from '<' to the nearest '>'
        str = str.replaceAll("<.*?>", "");
        System.out.println("After removing HTML Tags: " + str);
    }
}
Following is the output of the above program:
Before removing HTML Tags: <p><b>Welcome to Tutorials Point</b></p>
After removing HTML Tags: Welcome to Tutorials Point
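Because strings are immutable, replaceAll() does not modify the string it is called on; it returns a new string, which is why the example above assigns the result back to str. The following minimal sketch illustrates this. The class name ImmutabilityDemo and the helper stripTags are hypothetical names chosen for illustration only.

```java
public class ImmutabilityDemo {
    // Hypothetical helper: returns a copy of the input with tags removed.
    static String stripTags(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String original = "<p><b>Welcome</b></p>";
        String stripped = stripTags(original);

        // The original string is left untouched; only the returned copy differs.
        System.out.println("Original: " + original);
        System.out.println("Stripped: " + stripped);
    }
}
```

Calling replaceAll() without using its return value would therefore have no visible effect.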
Using Pattern and Matcher
Another approach is to use the Pattern and Matcher classes from the java.util.regex package. The Pattern class represents a compiled regular expression, while the Matcher class performs matching operations on an input string using that pattern.
Syntax of Pattern and Matcher
Pattern pattern = Pattern.compile("regex-pattern");
Matcher matcher = pattern.matcher("input-string");
Here,
- regex-pattern: This is a String that defines the pattern you are trying to match.
- input-string: This is the original String you want to search, match, or modify using the regex pattern.
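To see what a Matcher locates before anything is replaced, the matches can be collected one at a time with find() and group(). This is only a sketch under stated assumptions: the class name FindTagsDemo, the helper findTags, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindTagsDemo {
    // Hypothetical helper: collects every substring that looks like a tag.
    static List<String> findTags(String input) {
        Pattern pattern = Pattern.compile("<[^>]+>");
        Matcher matcher = pattern.matcher(input);
        List<String> tags = new ArrayList<>();
        // find() advances to the next match; group() returns the matched text.
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(findTags("<p>Hi <b>there</b></p>"));
        // prints [<p>, <b>, </b>, </p>]
    }
}
```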
Example
The following example uses Pattern and Matcher to remove the HTML tags from the string "<div>Hello from <span>Tutorialspoint</span></div>".
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "<div>Hello from <span>Tutorialspoint</span></div>";
        System.out.println("Before removing HTML tags: " + str);

        // Match '<', then one or more characters other than '>', then '>'
        Pattern tagPattern = Pattern.compile("<[^>]+>");
        Matcher matcher = tagPattern.matcher(str);

        StringBuffer sb = new StringBuffer();
        // Copy the text before each match and replace the match with ""
        while (matcher.find()) {
            matcher.appendReplacement(sb, "");
        }
        // Append whatever remains after the last match
        matcher.appendTail(sb);

        System.out.println("After removing HTML tags: " + sb.toString());
    }
}
The above program produces the following output:
Before removing HTML tags: <div>Hello from <span>Tutorialspoint</span></div>
After removing HTML tags: Hello from Tutorialspoint
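When every match is replaced with the same string, the find()/appendReplacement() loop can be shortened by calling Matcher.replaceAll(). The sketch below is one possible variation, not the approach used above: it compiles the pattern once into a constant so it can be reused across calls. The names TagStripper, TAG_PATTERN, and stripTags are hypothetical.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern TAG_PATTERN = Pattern.compile("<[^>]+>");

    // Hypothetical helper: replaces every match of TAG_PATTERN with "".
    static String stripTags(String input) {
        return TAG_PATTERN.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "<div>Hello from <span>Tutorialspoint</span></div>";
        System.out.println(stripTags(str)); // prints Hello from Tutorialspoint
    }
}
```

Reusing a compiled Pattern avoids recompiling the regular expression on every call, which String.replaceAll() does internally each time it is invoked.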