May 14, 2026

How To Encode HTML Code

Encoding HTML means converting special characters into HTML entities so that a browser renders them as text rather than interpreting them as markup. For example, turning < into &lt; makes the browser display a literal less-than sign instead of opening a tag. This guide covers the HTML Living Standard (WHATWG) definition of entities, the five characters you must always encode, how encoding blocks XSS attacks, and practical code examples in JavaScript, PHP, and Python.

What Is HTML Entity Encoding?

The HTML Living Standard (maintained by WHATWG at html.spec.whatwg.org) defines an HTML character reference — commonly called an HTML entity — as either a named character reference such as &amp;, or a numeric character reference in decimal (e.g. &#38;) or hexadecimal (e.g. &#x26;) form. Both notations are valid in HTML5 and tell the parser to produce a specific Unicode code point rather than act on the raw character.
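
As a quick check of the three notations, the sketch below decodes each form with Python's standard-library html.unescape(). The variable names are illustrative only.

```python
import html

# The same ampersand written as a named, decimal, and hexadecimal reference.
references = ["&amp;", "&#38;", "&#x26;"]

decoded = [html.unescape(ref) for ref in references]
print(decoded)  # ['&', '&', '&']

# All three forms resolve to the same code point, U+0026.
assert all(ch == "\u0026" for ch in decoded)
```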

HTML encoding serves three distinct purposes:

  • Security (preventing XSS). If user-supplied text is inserted into an HTML document without encoding, an attacker can inject <script> tags or inline event handlers that execute in the victim's browser.
  • Display reserved characters as content. The characters <, >, and & have structural meaning in HTML. To display them as visible text you must encode them.
  • Non-ASCII characters in ASCII-only documents. Numeric references let you embed any Unicode code point even in documents that use ASCII or Latin-1 character encoding.
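
The third purpose can be sketched with Python's built-in "xmlcharrefreplace" error handler, which replaces characters the target encoding cannot represent with decimal numeric references. The sample string is made up for illustration. Note that this handler does not escape &, <, or >, so it complements HTML escaping rather than replacing it.

```python
import html

text = "café costs 3 €"

# The 'xmlcharrefreplace' error handler swaps every character that ASCII
# cannot represent for a decimal numeric character reference.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
ascii_html = ascii_bytes.decode("ascii")
print(ascii_html)  # caf&#233; costs 3 &#8364;

# Decoding the references restores the original text.
assert html.unescape(ascii_html) == text
```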

Use our HTML Encoder / Decoder tool to encode or decode HTML entities instantly without writing any code.

The Five Characters You Must Always Encode

Any content that may contain these five characters and is inserted into an HTML document must be encoded — no exceptions. They form the minimum safe set for preventing HTML injection in element content and attribute values.

| Character | Named Entity | Numeric Entity | Usage |
|---|---|---|---|
| & (ampersand) | &amp; | &#38; / &#x26; | Starts every entity; must be encoded first |
| < (less-than) | &lt; | &#60; / &#x3C; | Opens HTML tags |
| > (greater-than) | &gt; | &#62; / &#x3E; | Closes HTML tags |
| " (double quote) | &quot; | &#34; / &#x22; | Delimits attribute values in double quotes |
| ' (single quote) | &apos; (HTML5 / XML) | &#39; / &#x27; | Delimits attribute values in single quotes |

One important note on the single quote: &apos; is defined in XML and was formally added to the HTML5 named character reference list, but it was not part of HTML4. For broadest compatibility, the numeric form &#x27; is safer when targeting legacy parsers. Also encode the ampersand (&) first whenever you escape a string by successive replacements; otherwise the & inside the entities you have just produced is encoded again, and &lt; becomes &amp;lt;.
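
The ordering issue is easiest to see with a hand-rolled escaper. The two functions below are hypothetical helpers written only for this illustration; in real code, prefer a library function such as html.escape().

```python
def escape_amp_first(text: str) -> str:
    """Hypothetical helper: encode & before the other four characters."""
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#x27;")
    )


def escape_amp_last(text: str) -> str:
    """Deliberately wrong order, shown only to demonstrate the bug."""
    return (
        text.replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("&", "&amp;")  # also rewrites the & in &lt; and &gt;
    )


sample = "<a & b>"
print(escape_amp_first(sample))  # &lt;a &amp; b&gt;
print(escape_amp_last(sample))   # &amp;lt;a &amp; b&amp;gt;
```

With the ampersand handled last, a browser would display the literal text &lt;a & b&gt; instead of <a & b>.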

Other Commonly Encoded Characters

Beyond the essential five, these named entities appear frequently in web content:

  • Non-breaking space &nbsp; — prevents a line break between two words and produces a space that will not collapse.
  • © (copyright sign) &copy;
  • ® (registered trademark) &reg;
  • ™ (trade mark sign) &trade;
  • € (euro sign) &euro;
  • — (em dash) &mdash;

In modern UTF-8 HTML documents you can include these characters literally (the browser handles them correctly), but the named entities remain useful in templating systems, email HTML, or any environment where character encoding is uncertain.
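
To see which code point each of these named entities stands for, one option is the html5 table in Python's standard-library html.entities module. This is a small lookup sketch, not a replacement for an encoder.

```python
from html.entities import html5

# html5 maps entity names (including the trailing semicolon) to characters.
names = ["nbsp;", "copy;", "reg;", "trade;", "euro;", "mdash;"]

code_points = {name: f"U+{ord(html5[name]):04X}" for name in names}

for name, code_point in code_points.items():
    print(f"&{name} -> {code_point}")
```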

XSS Attacks and Why HTML Encoding Is the Fix

Cross-Site Scripting (XSS) is one of the most widespread web security vulnerabilities. It occurs when an application inserts untrusted data — typically something a user typed — directly into an HTML page without encoding it first. The browser cannot distinguish between the application's own markup and the attacker's injected markup, so it executes both.

Vulnerable output (do not do this)

<!-- User input: <script>alert('XSS')</script> -->
<!-- Server inserts it raw into the page: -->
<p>Hello, <script>alert('XSS')</script>!</p>
<!-- Result: the browser executes alert() -->

Safe output after HTML encoding

<!-- Same user input, HTML-encoded before insertion: -->
<p>Hello, &lt;script&gt;alert('XSS')&lt;/script&gt;!</p>
<!-- Result: the browser displays the text literally -->

After encoding, the angle brackets become &lt; and &gt;. The browser renders them as visible characters instead of tag boundaries, and no script executes. The rule is simple: always encode untrusted data for the context it is inserted into. HTML entity encoding covers element content and quoted attribute values; JavaScript string literals, CSS, and URL parameters each require their own encoding scheme.
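
A minimal sketch of context-specific encoding, using only the Python standard library. The markup, the /search path, and the q parameter are assumptions made up for the example.

```python
import html
from urllib.parse import quote

untrusted = "<script>alert('XSS')</script>"

# Element content and quoted attribute values: HTML entity encoding.
body = f"<p>Hello, {html.escape(untrusted)}!</p>"
attr = f'<input name="q" value="{html.escape(untrusted, quote=True)}">'

# URL parameter: percent-encode the value first, then HTML-escape the whole
# URL because it is placed inside an attribute value.
url = "/search?q=" + quote(untrusted, safe="")
link = f'<a href="{html.escape(url)}">search again</a>'

print(body)
print(attr)
print(link)
```

In all three outputs the payload arrives as inert text or as a percent-encoded parameter, never as a tag.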

How To HTML-Encode in JavaScript

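JavaScript (in a browser context) has no built-in encodeHTML() function. A common approach uses the DOM: assign the string to the textContent property of a temporary element, then read back innerHTML. The browser's own serialiser performs the encoding. Note that this serialiser escapes &, <, and > (plus the non-breaking space) but leaves " and ' untouched, so the result is safe for element content but not for attribute values.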

// Encode
function htmlEncode(str) {
  const el = document.createElement('div')
  el.textContent = str
  return el.innerHTML
}

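// Decode
// DOMParser builds an inert document: scripts do not run and images do not
// load, so untrusted input is safer here than when assigned to innerHTML.
function htmlDecode(str) {
  const doc = new DOMParser().parseFromString('<!doctype html><body>' + str, 'text/html')
  return doc.body.textContent
}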

// Examples
htmlEncode('<script>alert("xss")</script>')
// → '&lt;script&gt;alert("xss")&lt;/script&gt;'

htmlDecode('&lt;b&gt;Hello &amp; world&lt;/b&gt;')
// → '<b>Hello & world</b>'

The DOMParser API is not straightforward for encoding (it parses HTML rather than escaping it), so the textContent / innerHTML trick above is the widely recommended DOM-based approach. In a Node.js environment — where document is not available — use a library such as he or entities, both of which implement the full WHATWG named character reference list.

How To HTML-Encode in PHP

PHP provides two built-in functions. Understanding the difference is important for choosing the right one:

<?php

// RECOMMENDED — encodes only the five essential characters
$safe = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// ENT_QUOTES  → encode both " (double) and ' (single) quotes
// ENT_HTML5   → use HTML5 entity names where available
// 'UTF-8'     → treat the input as UTF-8

// NOT recommended for security use — encodes every character
// that has a named entity equivalent (many more than the five)
$overEncoded = htmlentities($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Decoding
$original = html_entity_decode($safe, ENT_QUOTES | ENT_HTML5, 'UTF-8');

?>

htmlspecialchars() is sufficient to prevent XSS in element content and quoted attribute values and is the standard recommendation. It encodes &, <, >, ", and ' (when ENT_QUOTES is passed). htmlentities() goes further and encodes every character that has a named entity, including accented letters and currency symbols; this produces unnecessarily verbose output and can cause issues when the document is already correctly served as UTF-8.

How To HTML-Encode in Python

Python's standard library includes the html module, which provides straightforward encode and decode functions: html.escape() (available since Python 3.2) and html.unescape() (available since Python 3.4).

import html

# RECOMMENDED — standard library, no dependencies
# quote=True also encodes " and ' (default is True)
safe = html.escape('<script>alert("xss")</script>', quote=True)
# → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

# Decoding
original = html.unescape('&lt;b&gt;Hello &amp; world&lt;/b&gt;')
# → '<b>Hello & world</b>'

# --- MarkupSafe (used by Jinja2) ---
# pip install markupsafe
from markupsafe import escape, Markup

safe_markup = escape('<b>User input</b>')
# → Markup('&lt;b&gt;User input&lt;/b&gt;')
# Returns a Markup object — won't be double-encoded if passed
# back into a Jinja2 template that already auto-escapes.

html.escape() always encodes &, <, and >; with quote=True (the default) it also encodes " as &quot; and ' as &#x27;. markupsafe.escape() encodes the same five characters (using the numeric references &#34; and &#39; for the quotes) but wraps the result in a Markup type that Jinja2 recognises as already safe, which prevents the double-encoding that would otherwise occur if auto-escaping is enabled in the template.
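
The effect of the quote parameter can be checked directly. This sketch uses only the standard library, and the sample string is invented for the example.

```python
import html

value = "Tom's \"quoted\" <b>bio</b>"

with_quotes = html.escape(value)                  # quote=True is the default
without_quotes = html.escape(value, quote=False)  # leaves " and ' untouched

print(with_quotes)
# Tom&#x27;s &quot;quoted&quot; &lt;b&gt;bio&lt;/b&gt;
print(without_quotes)
# Tom's "quoted" &lt;b&gt;bio&lt;/b&gt;

# Decoding restores the original string in both cases.
assert html.unescape(with_quotes) == value
assert html.unescape(without_quotes) == value
```

Output produced with quote=False is only appropriate for element content; inside an attribute value the unescaped quotes could end the attribute early.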

For related encoding topics, see our guide on what is URL encoding, or try our Base64 encoder / decoder for Base64 encoding tasks.

Frequently Asked Questions

What does it mean to encode HTML code?

Encoding HTML means replacing characters that carry structural meaning in the HTML grammar with their entity equivalents. For instance, a raw < character would start a new tag; after encoding it becomes &lt;, which the browser displays as a visible less-than sign.

Which characters must always be HTML-encoded?

The five are & (&amp;), < (&lt;), > (&gt;), " (&quot;), and ' (&#x27;). Encoding these five is the minimum requirement when inserting untrusted text into HTML element content or attribute values.

Why does HTML encoding prevent XSS?

XSS injection relies on the attacker's input being parsed as HTML markup. When you encode the five essential characters first, angle brackets lose their ability to open tags, and quotes can no longer break out of attribute values. The malicious payload is displayed as inert text instead of being executed.

How do I HTML-encode a string in JavaScript?

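There is no native encodeHTML() function. Use the DOM: const el = document.createElement('div'); el.textContent = str; return el.innerHTML; The browser serialiser escapes &, <, and >, which is enough for element content. It does not escape quotes, so encode " and ' separately before placing the result in an attribute value.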

What is the difference between htmlspecialchars and htmlentities in PHP?

htmlspecialchars() encodes only the five security-critical characters and is the recommended function for XSS prevention. htmlentities() additionally converts accented characters, currency symbols, and any other character with a named entity equivalent — which is unnecessary for security and can produce over-encoded output.
