W3docs

Remove HTML tags from a String

To remove HTML tags from a string, you can use a regular expression to match and replace the tags with an empty string.

Here's an example of how you can do this in Java:


import java.util.regex.Pattern;

public class HtmlTagRemover {
  // Compile the regular expression once and reuse it:
  // '<', then one or more characters that are not '>', then '>'
  private static final Pattern HTML_TAG = Pattern.compile("<[^>]+>");

  public static String stripHtmlTags(String input) {
    if (input == null || input.isEmpty()) {
      return input;
    }
    // Replace all HTML tags with an empty string
    return HTML_TAG.matcher(input).replaceAll("");
  }
}

This method uses the regular expression <[^>]+>, which matches a < character, followed by one or more characters other than >, followed by the first > after it. Each match is replaced with an empty string, while the text between tags is kept, so for well-formed markup the result is the input with its tags removed.
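To see exactly what the pattern treats as a tag, the sketch below collects every match with Matcher.find() instead of deleting it. The class and method names (TagMatchDemo, findTags) are hypothetical and only used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMatchDemo {
  // Same pattern as above: '<', one or more non-'>' characters, then '>'
  private static final Pattern TAG = Pattern.compile("<[^>]+>");

  // Returns every substring the pattern would remove, in order of appearance
  public static List<String> findTags(String input) {
    List<String> tags = new ArrayList<>();
    Matcher matcher = TAG.matcher(input);
    while (matcher.find()) {
      tags.add(matcher.group());
    }
    return tags;
  }

  public static void main(String[] args) {
    // Prints [<p>, <b>, </b>, </p>]
    System.out.println(findTags("<p>This is a <b>test</b> string.</p>"));
    // Prints [] because [^>]+ needs at least one character, so "<>" is not a match
    System.out.println(findTags("a <> b"));
  }
}
```

Because [^>]+ requires at least one character, an empty pair such as <> is left in the output.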

Here's an example of how you can use the stripHtmlTags() method:


public class Main {
  public static void main(String[] args) {
    String input = "<p>This is a <b>test</b> string.</p>";
    String output = HtmlTagRemover.stripHtmlTags(input);
    System.out.println(output); // This is a test string.
  }
}

Note that this method only deletes text that looks like a tag; it does not parse the HTML. The text between tags is returned unchanged, so character entities such as &amp; are not decoded, and the contents of <script> and <style> elements remain in the output. The simple regex also fails on edge cases such as attribute values that contain > (for example <a title="a>b">) or malformed HTML. If you need to extract the text content of elements reliably or handle complex HTML, use an HTML parser such as Jsoup.
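The attribute edge case is easy to reproduce. The sketch below compares the original pattern with a stricter, quote-aware variant (QUOTE_AWARE, an assumption of this example rather than a standard pattern) that skips over > characters inside single- or double-quoted attribute values. It is still a regex heuristic, not an HTML parser: comments, <script> bodies, entities, and stray < characters in ordinary text are not handled.

```java
import java.util.regex.Pattern;

public class TagEdgeCases {
  // Original pattern: stops at the first '>', even inside a quoted attribute
  public static final Pattern SIMPLE = Pattern.compile("<[^>]+>");

  // Hypothetical quote-aware variant: the tag body is a run of double-quoted
  // strings, single-quoted strings, or characters that are not quotes or '>'
  public static final Pattern QUOTE_AWARE =
      Pattern.compile("<(?:\"[^\"]*\"|'[^']*'|[^'\">])+>");

  // Removes every match of the given pattern from the input
  public static String strip(Pattern pattern, String input) {
    return pattern.matcher(input).replaceAll("");
  }

  public static void main(String[] args) {
    String html = "<a title=\"a>b\">link</a>";
    System.out.println(strip(SIMPLE, html));      // b">link
    System.out.println(strip(QUOTE_AWARE, html)); // link
    // Entities are left as-is by both patterns
    System.out.println(strip(QUOTE_AWARE, "<p>Fish &amp; chips</p>")); // Fish &amp; chips
  }
}
```

The three alternatives inside the group start with different characters, so the regex engine never has to try more than one of them at a given position, which keeps matching fast even on long tags.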