How to HTML-encode a String

This tutorial covers several ways to HTML-encode and HTML-decode a string in JavaScript while avoiding cross-site scripting (XSS) vulnerabilities.

The following example encodes and decodes a string with a detached textarea element, which reduces the risk of XSS:

<!DOCTYPE html>
<html>
  <head>
    <title>Title of the document</title>
  </head>
  <body>
    <div id="encoded"></div>
    <div id="decoded"></div>
    <script>
      let string1 = "Html & Css & Javascript";
      let string2 = "Html &amp; Css &amp; Javascript";
      function htmlDecode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerHTML = input;
        return textArea.value;
      }
      function htmlEncode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerText = input;
        return textArea.innerHTML.split("<br>").join("\n");
      }
      document.getElementById("encoded").innerText = htmlEncode(string1);
      document.getElementById("decoded").innerText = htmlDecode(string2);
    </script>
  </body>
</html>

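The htmlEncode function sets the innerText of a detached textarea and reads back its innerHTML, which the browser returns with the special characters encoded. Because setting innerText turns line breaks into <br> elements, the function converts them back to newline characters. The htmlDecode function does the reverse: it sets the innerHTML of the textarea and reads back its value, which holds the decoded text.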

In the following HTML code, we use the functions defined above to take user input from a textarea and encode it to prevent XSS.

<!DOCTYPE html>
<html>
  <body>
    <textarea rows="6" cols="50" name="normalTXT" id="textId"></textarea>
    <button onclick="convert()">Convert</button>
    <br />
    <div>
      Encoding in URL:
      <input size="50" type="text" name="URL-ENCODE" id="URL-ENCODE" />
    </div>
    <div>
      Encoding in HTML:
      <input size="50" type="text" name="HTML-ENCODE" id="HTML-ENCODE" />
    </div>
    <script>
      function htmlDecode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerHTML = input;
        return textArea.value;
      }
      function htmlEncode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerText = input;
        return textArea.innerHTML.split("<br>").join("\n");
      }
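      function convert() {
        const textArea = document.getElementById("textId");
        // HTML-encode the input so it can be shown as text in a page.
        const htmlEncoded = htmlEncode(textArea.value);
        document.getElementById("HTML-ENCODE").value = htmlEncoded;
        // Percent-encode the input so it is safe inside a URL component.
        const urlEncoded = encodeURIComponent(textArea.value);
        document.getElementById("URL-ENCODE").value = urlEncoded;
      }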
    </script>
  </body>
</html>

Decoding through the innerHTML of an element works fine in many scenarios, but in some cases you will end up with an XSS vulnerability.

Consider passing the following string to a decoding function of this kind:

htmlDecode("<img src='dummy' onerror='alert(/xss/)'>");

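The string contains an unescaped HTML tag. If a decoder assigns it to the innerHTML of an ordinary element such as a div, the browser builds a real img element and runs the onerror handler instead of just decoding the string. A textarea treats its content as plain text, so the version above does not create the element, but the safety of the whole approach then rests on that one detail. To avoid depending on it, you can use DOMParser, which is supported in all major browsers: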

function htmlDecode(input) {
  // Parse the input into a separate, inert document.
  const doc = new DOMParser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}
alert(htmlDecode("&lt;img src='myimage.jpg'&gt;")); // "<img src='myimage.jpg'>"
alert(htmlDecode("<img src='dummy' onerror='alert(/xss/)'>")); // ""

The function does not run any JavaScript code as a side effect. HTML tags in the input are ignored, because only the text content is returned.

Another fast method escapes the characters directly with replace() and also encodes quotation marks:

function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')   // must run first, so later entities are not re-escaped
        .replace(/'/g, '&apos;')
        .replace(/"/g, '&quot;')
        .replace(/>/g, '&gt;')
        .replace(/</g, '&lt;');
}

// The opposite function:
function htmlUnescape(str) {
    return str
        .replace(/&apos;/g, "'")
        .replace(/&quot;/g, '"')
        .replace(/&gt;/g, '>')
        .replace(/&lt;/g, '<')
        .replace(/&amp;/g, '&');  // must run last, so "&amp;lt;" becomes "&lt;" and not "<"
}

To also escape the forward slash (/) for anti-XSS purposes, add the following call to the chain:

.replace(/\//g, '&#x2F;');
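
The chained replace() calls can also be written as a single pass over the string. The following is a minimal sketch of that idea, not a definitive implementation. The names ESCAPE_MAP, UNESCAPE_MAP, escapeHtml and unescapeHtml are hypothetical and were chosen for this illustration only.

```javascript
// Hypothetical lookup table: each special character and its entity.
const ESCAPE_MAP = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;",
  "/": "&#x2F;",
};

// Reverse table, built from ESCAPE_MAP so the two stay in sync.
const UNESCAPE_MAP = Object.fromEntries(
  Object.entries(ESCAPE_MAP).map(([char, entity]) => [entity, char])
);

// Escape every special character in one pass.
function escapeHtml(str) {
  return String(str).replace(/[&<>"'\/]/g, (char) => ESCAPE_MAP[char]);
}

// Reverse only the entities produced by escapeHtml, also in one pass,
// so "&amp;lt;" becomes "&lt;" and is not decoded a second time.
function unescapeHtml(str) {
  return String(str).replace(
    /&(?:amp|lt|gt|quot|apos|#x2F);/g,
    (entity) => UNESCAPE_MAP[entity]
  );
}

const risky = `<a href="/x?a=1&b=2">it's</a>`;
const safe = escapeHtml(risky);
console.log(safe);
console.log(unescapeHtml(safe) === risky); // true
```

Because each character is visited only once, the order of the entries in the table does not matter, unlike the order of the chained calls.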

The replace() Method

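The replace() method of String returns a new string in which the matches of a pattern are replaced by a replacement. The method takes two parameters: the first is the pattern to search for, given as a string or a regular expression, and the second is the replacement string (or a function that returns it). A string pattern replaces only the first match, whereas a regular expression with the g flag replaces every match. If the replacement is an empty string, the matched text is removed.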
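
The behaviour described above can be checked with a short example. The variable names below are hypothetical and serve only as an illustration.

```javascript
const text = "a-b-c";

// A string pattern replaces only the first match.
const first = text.replace("-", "+"); // "a+b-c"

// A regular expression with the g flag replaces every match.
const all = text.replace(/-/g, "+"); // "a+b+c"

// An empty replacement string removes the matched text.
const removed = text.replace(/-/g, ""); // "abc"

// replace() returns a new string, so the original is unchanged.
console.log(first, all, removed, text);
```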