When should I encode HTML entities?

Encode user-supplied content before inserting it into HTML to prevent XSS attacks. Characters like <, >, &, and " must be encoded so browsers treat them as text rather than HTML markup.

What is the difference between named and numeric entities?

Named entities use a short descriptor (e.g., &amp; for &, &lt; for <). Numeric entities use the Unicode code point in decimal (&#38;) or hex (&#x26;). Named entities are more readable; numeric entities work in any context, including XML, which predefines only five named entities.

Does encoding HTML entities affect page performance?

Negligibly. Modern browsers parse HTML entities extremely fast. The security benefit of proper encoding far outweighs any theoretical micro-cost.

Free Online HTML Entity Encoder & Decoder

Encode special HTML characters to their entity equivalents (& → &amp;) or decode HTML entities back to plain text. Essential for safe HTML templating and debugging.

HTML Entities: Encoding Special Characters for the Web

HTML entities are a fundamental part of web development: they ensure that special characters display correctly in browsers and are transmitted safely through HTML documents. When you need to display a less-than sign in a paragraph of text without the browser interpreting it as the start of an HTML tag, or when you want to include a copyright symbol, an em dash, or a non-breaking space, HTML entities provide the solution. Understanding entity encoding prevents display bugs, XSS vulnerabilities, and character corruption in web applications.

What Are HTML Entities?

An HTML entity is a string that begins with an ampersand and ends with a semicolon, representing a character that either cannot be directly included in HTML source code or that has special meaning in the HTML syntax. Entities can be written in two forms: named entities use a descriptive name (such as &lt; for the less-than sign) while numeric entities use either a decimal number (&#60;) or a hexadecimal number (&#x3C;) corresponding to the character's Unicode code point. Both forms produce the same output in the browser.
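The equivalence of the named and numeric forms can be checked with a few lines of Python. This is a small sketch that uses the standard library's html.unescape; the variable names are illustrative only.

```python
import html

# Three spellings of the less-than sign: named, decimal, and hexadecimal.
forms = ["&lt;", "&#60;", "&#x3C;"]

# Decode each form back to the character it represents.
decoded = [html.unescape(form) for form in forms]

# Every form should decode to the same single character, U+003C.
all_same = all(ch == "<" for ch in decoded)
print(decoded, all_same)
```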

The five characters that must always be encoded in HTML content are the less-than sign (<), the greater-than sign (>), the ampersand itself (&), the double quote ("), and the single quote/apostrophe ('). These characters have special meaning in HTML syntax, and including them unencoded can cause the browser to misinterpret your content as markup. The ampersand in particular is the escape character itself, so any literal ampersand in content must be encoded as &amp; to prevent it from being interpreted as the start of an entity.
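To make the role of the ampersand concrete, here is a minimal hand-written sketch of an encoder for the five characters. The function name escape_html is hypothetical, and in real code a library routine is usually the better choice. The point of the sketch is the ordering: the ampersand is replaced first so that the ampersands introduced by the other entities are not encoded a second time.

```python
def escape_html(text: str) -> str:
    """Escape the five reserved characters (hypothetical helper)."""
    # The ampersand must come first; otherwise the "&" inside "&lt;",
    # "&gt;", and so on would itself be turned into "&amp;".
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#39;"),
    ]
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text


sample = escape_html('Tom & "Jerry" <3')
print(sample)
```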

Security: Preventing Cross-Site Scripting

HTML entity encoding is a critical security measure for preventing cross-site scripting (XSS) attacks. XSS occurs when user-supplied input is embedded directly in HTML output without encoding, allowing attackers to inject malicious script tags or event handlers. If a user submits the name <script>alert('XSS')</script> and your application displays it unencoded, the browser will execute that JavaScript. Encoding the input as &lt;script&gt;alert('XSS')&lt;/script&gt; ensures the browser displays the text literally without executing it.
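The scenario above can be reproduced in Python with the standard library's html.escape. This is a sketch only; the variable names and the surrounding paragraph markup are assumptions for illustration. Note that html.escape also encodes the apostrophes (as &#x27;), which is harmless in text content and necessary inside quoted attribute values.

```python
import html

# Input as an attacker might submit it in a "name" field.
user_name = "<script>alert('XSS')</script>"

# html.escape always encodes &, <, and >; with quote=True (the default)
# it also encodes the double quote and the apostrophe.
safe_name = html.escape(user_name)

# The encoded value can now be placed into markup as plain text.
page = f"<p>Hello, {safe_name}!</p>"
print(page)
```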

Nearly every web framework and template engine provides HTML encoding functions for this purpose: htmlspecialchars() in PHP, Encode.forHtml() from the OWASP Java Encoder in Java, the escape filter in Jinja2 and Django templates, and the default behavior in React's JSX. These functions encode the critical characters automatically, although defaults for the single quote vary (PHP's htmlspecialchars() leaves it unencoded before PHP 8.1 unless ENT_QUOTES is passed). The security rule is simple: never output user-supplied data to an HTML page without encoding it first, unless you have explicitly verified the content is safe HTML and intentionally want to render it. Encoding by default and allowing HTML explicitly is far safer than the reverse.
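The "encode by default, allow HTML explicitly" rule can be sketched in a few lines of Python. The TrustedHtml marker class and the render function below are hypothetical names invented for this illustration, not the API of any particular framework. Real template engines implement the same idea with more care.

```python
import html


class TrustedHtml(str):
    """Hypothetical marker type for markup that was reviewed as safe."""


def render(template: str, **values: object) -> str:
    """Fill a str.format-style template, escaping every value by default."""
    prepared = {
        # Only values explicitly wrapped in TrustedHtml skip the encoder.
        name: value if isinstance(value, TrustedHtml) else html.escape(str(value))
        for name, value in values.items()
    }
    return template.format(**prepared)


comment = render(
    "<li>{author}: {body}</li>",
    author="Eve <eve@example.com>",                # untrusted, gets encoded
    body=TrustedHtml("<em>approved markup</em>"),  # explicit opt-in
)
print(comment)
```

With this design, forgetting to mark a value results in over-encoded text, which is a visible display bug, rather than in injected markup, which is a security hole.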

Common Named Entities and Their Uses

Beyond the five critical characters, a large set of named entities covers typographic characters, symbols, and characters from various writing systems. The non-breaking space (&nbsp;) prevents line breaks between two words that should stay together, which is useful for units like "10 kg" or names like "John Smith" where breaking between the parts would look awkward. The em dash (&mdash;) and en dash (&ndash;) are typographically correct substitutes for hyphens in various contexts: em dashes for parenthetical remarks—like this—and en dashes for ranges like 10–20.

Typographic quotation marks, the "curly" or "smart" quotes “ (&ldquo;) and ” (&rdquo;) for double quotes, and ‘ (&lsquo;) and ’ (&rsquo;) for single quotes, are preferred over straight quotation marks in well-typeset content. The copyright symbol © (&copy;), registered trademark ® (&reg;), trademark ™ (&trade;), and various mathematical symbols like × (&times;), ÷ (&divide;), ± (&plusmn;), and ∞ (&infin;) are all available as named entities.
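The names mentioned in this section can be looked up programmatically. The sketch below uses the codepoint2name table from Python's html.entities module, which covers the HTML 4 entity set; the helper name to_named_entity is hypothetical.

```python
from html.entities import codepoint2name


def to_named_entity(char: str) -> str:
    """Return the named entity for one character, or the character itself."""
    name = codepoint2name.get(ord(char))
    return f"&{name};" if name else char


# Copyright, registered, trademark, times, divide, plus-minus, infinity,
# em dash, en dash, written as escapes so the source file stays ASCII.
symbols = "\u00a9\u00ae\u2122\u00d7\u00f7\u00b1\u221e\u2014\u2013"
named = [to_named_entity(ch) for ch in symbols]
print(named)
```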

Unicode and Modern HTML

Modern HTML documents use UTF-8 encoding, declared with <meta charset="UTF-8">, which means virtually any Unicode character can be included directly in the source without entity encoding, provided the file is saved as UTF-8. You can write © directly instead of &copy;, paste em dashes directly, and use any international character in your content. The main reason to still use entities for non-ASCII characters is compatibility with systems that might not handle UTF-8 correctly, or when you need to include the character in an attribute value where direct Unicode might cause parsing issues.

For the five reserved characters, entity encoding remains mandatory regardless of charset. The ampersand, angle brackets, and quotes must be encoded in HTML content and attribute values where they could be misinterpreted. The charset declaration does not change this requirement: it affects which characters can be used without encoding, not whether the HTML structural characters need escaping.
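The two cases can be contrasted in a short Python sketch. For a UTF-8 page only the reserved characters are encoded; for a hypothetical legacy system that accepts only ASCII, every remaining non-ASCII character is additionally replaced with a decimal numeric entity. The function name encode_for_legacy_ascii is an assumption made for this example.

```python
import html


def encode_for_legacy_ascii(text: str) -> str:
    """Escape reserved characters, then replace every non-ASCII character
    with a decimal numeric entity (hypothetical helper)."""
    # Escape first, so the ampersands of the numeric entities added in
    # the next step are not encoded again.
    escaped = html.escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


source = "\u00a9 2024 Caf\u00e9 & Bar"

# UTF-8 output: only the reserved ampersand changes.
utf8_ready = html.escape(source)

# ASCII-only output: the copyright sign and the accented e become entities.
ascii_only = encode_for_legacy_ascii(source)
print(utf8_ready)
print(ascii_only)
```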

Entities in XML and XHTML

XML, XHTML, and SVG have stricter entity requirements than HTML5. While HTML5 parsers are forgiving about some encoding issues, XML parsers are not: an unencoded ampersand in XML will cause a parse error and stop processing entirely. This strictness is intentional: XML is designed to be machine-parseable without ambiguity. When generating XML output programmatically (for RSS feeds, Atom feeds, SOAP requests, SVG files, or XML configuration files), always use a proper XML serializer or ensure all special characters are entity-encoded. Manually constructing XML by string concatenation without encoding is a reliable recipe for malformed output and parsing failures.
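The difference between string concatenation and a serializer can be demonstrated with Python's xml.etree.ElementTree. This is an illustrative sketch; the item and title element names are assumptions loosely modeled on a feed entry.

```python
import xml.etree.ElementTree as ET

title_text = "Fish & Chips <today only>"

# Naive concatenation leaves the ampersand and angle brackets raw,
# so a strict XML parser rejects the result.
broken_xml = f"<item><title>{title_text}</title></item>"
try:
    ET.fromstring(broken_xml)
    parse_failed = False
except ET.ParseError:
    parse_failed = True

# Building a tree and serializing it lets the library do the escaping.
item = ET.Element("item")
ET.SubElement(item, "title").text = title_text
good_xml = ET.tostring(item, encoding="unicode")

# Parsing the serialized output recovers the original text unchanged.
round_trip = ET.fromstring(good_xml).findtext("title")
print(parse_failed, good_xml)
```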

SVG files embedded in HTML are a common source of entity encoding confusion. SVG uses XML syntax, so text content in SVG elements must be entity-encoded if it contains reserved characters. When embedding SVG inline in HTML5, the SVG content is parsed as HTML, which is more lenient. But when loading SVG as an external file (via an img or object element), it must be valid XML and all characters must be correctly encoded. Understanding this distinction prevents mysterious rendering failures when working with SVG graphics that contain text or content embedded through data URIs.