CodeUtil

HTML Entity Encoding - Preventing XSS Attacks

Learn how HTML entity encoding protects your web applications from Cross-Site Scripting (XSS) attacks. Understand character encoding, when to encode user input, framework auto-escaping, and common vulnerabilities developers miss.

2025-06-12 · 14 min
Related tool: HTML Encoder/Decoder

Use the tool alongside this guide for hands-on practice.

The XSS attack that made me paranoid

I'll never forget the first time I saw XSS exploited on a site I built. A comment field. Someone posted what looked like innocent text, but hidden inside was a script that stole session cookies. The client called me at 11 PM because users were getting logged out randomly. Turned out someone was hijacking sessions.

Cross-Site Scripting lets attackers inject malicious JavaScript into pages viewed by other users. When the browser renders that page, the injected code runs with full privileges - reading cookies, stealing tokens, modifying the page. It's been in the OWASP Top 10 forever, and I still catch it in code reviews regularly.

The three flavors of XSS

Not all XSS is created equal. Understanding the types helps you know where to look and how to defend.

  • Stored XSS: The payload lives in your database. Comment fields, user profiles, forum posts. Every visitor who loads the page executes the script. This is the nightmare scenario.
  • Reflected XSS: The payload comes from the URL or form submission. Attacker sends victim a malicious link. When they click, boom. Requires social engineering but still dangerous.
  • DOM-based XSS: The payload never hits your server. Client-side JavaScript reads location.hash or similar and unsafely writes it to the DOM. Server-side protections are useless here.
  • Stored XSS is worst - no user action needed beyond normal browsing
  • Reflected XSS needs victims to click sketchy links
  • DOM XSS bypasses your server completely - you might never see it in logs

How HTML encoding saves you

HTML entity encoding is your first line of defense. It converts dangerous characters into their HTML entity equivalents. The browser displays them as text instead of interpreting them as code.

When someone tries to inject <script>alert('gotcha')</script>, encoding turns it into visible text. The browser shows the literal characters instead of executing the script.

  • < becomes &lt; - no more opening tags
  • > becomes &gt; - no more closing tags
  • & becomes &amp; - prevents entity injection
  • " becomes &quot; - can't break out of attributes
  • ' becomes &#x27; - handles single-quoted attributes
  • / becomes &#x2F; - extra safety for closing tags
  • That nasty script tag? Now it's just: &lt;script&gt;alert(&#x27;gotcha&#x27;)&lt;&#x2F;script&gt;
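The character table above can be sketched as a tiny helper. This is a minimal illustration with a function name of my own - in production, use your framework's built-in escaping or a vetted library rather than rolling your own:

```javascript
// Minimal HTML entity encoder covering the six characters listed above.
// A single-pass replace means nothing gets double-encoded.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "/": "&#x2F;",
};

function escapeHtml(input) {
  return String(input).replace(/[&<>"'/]/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml("<script>alert('gotcha')</script>"));
// The browser renders the result as literal text, not executable code.
```

Note the single-pass `replace`: encoding `&` and `<` in separate sequential passes is a classic source of double-encoding bugs.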

Context matters - a lot

Here's where I see people mess up: HTML encoding isn't a silver bullet. WHERE you're inserting user data determines WHAT encoding you need.

  • HTML body (<p>USER_DATA</p>): HTML entity encoding works here
  • HTML attributes (<div title="USER_DATA">): Need attribute encoding plus ALWAYS quote your attributes
  • JavaScript strings (var name = "USER_DATA"): Need JavaScript encoding, not HTML
  • URL parameters (<a href="/search?q=USER_DATA">): Need URL encoding
  • CSS values (<div style="background: USER_DATA">): Need CSS encoding
  • NEVER put untrusted data directly in script tags, event handlers, or CSS
  • I've seen HTML encoding applied in JavaScript context - completely useless against XSS
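To make the context point concrete, here's what encoding for a JavaScript string context looks like, as opposed to HTML entity encoding. This is a sketch with a function name of my own - OWASP's encoder libraries implement this properly:

```javascript
// Encoding for a JavaScript string context: escape everything outside a
// conservative allowlist as \uXXXX, so the payload can't terminate the
// string or the surrounding <script> block. A sketch for illustration.
function encodeForJsString(input) {
  return String(input).replace(/[^a-zA-Z0-9,._ ]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// A payload aimed at breaking out of: var name = "USER_DATA";
const payload = '";alert(1);//';
console.log(encodeForJsString(payload));
// The quote becomes \u0022, so it can no longer close the string literal.
```

HTML-encoding that payload instead would leave it corrupted or exploitable depending on where it lands - which is exactly why the context, not the data, dictates the encoder.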

Framework auto-escaping: trust but verify

Modern frameworks escape output by default, which is amazing. But I've seen developers assume they're protected when they're not. Know your framework's limits.

  • React: {userInput} is safe. dangerouslySetInnerHTML is exactly as dangerous as the name suggests.
  • Vue: {{ interpolation }} is safe. v-html bypasses everything.
  • Angular: {{ interpolation }} is safe. [innerHTML] is not.
  • Django: Template variables are escaped. |safe filter turns off protection.
  • Auto-escaping only protects HTML body context - not JS, URLs, or CSS
  • Those bypass functions (dangerouslySetInnerHTML, v-html, |safe)? Only use with DOMPurify.
  • Server-side escaping is great, but your client-side code needs to be safe too

The XSS patterns I keep finding

Even with frameworks, certain patterns keep causing XSS. These slip through code reviews because they don't look obviously dangerous.

  • innerHTML: document.getElementById("output").innerHTML = userInput; - I see this constantly
  • href attributes: <a href="javascript:alert(1)"> executes even with HTML encoding
  • Event handlers: <div onclick="handler(USER_DATA)"> needs JS encoding, not HTML
  • URL fragments: location.hash is attacker-controlled. DOM XSS central.
  • JSON in scripts: <script>var data = USER_JSON;</script> can break out with </script>
  • Template literals with innerHTML: elem.innerHTML = `${userInput}` - concatenation with extra steps
  • Third-party embeds without sandboxing
  • Markdown renderers allowing raw HTML

Input validation is not output encoding

I get asked all the time: "Can't I just filter out script tags?" No. These are different tools for different jobs.

  • Input validation: Restricts what enters your app (length, format, allowed characters)
  • Output encoding: Makes data safe when rendered in a specific context
  • Validation helps but can't prevent all XSS - legitimate text contains <, >, &
  • Encoding happens at output time - same data might appear in HTML, JS, URLs
  • Denylist filtering ("remove script tags") fails. Attackers have endless bypasses.
  • Allowlist validation ("only these characters allowed") is much stronger
  • I use both. Defense in depth.
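Here's what the allowlist half of that combination looks like in practice. The username rule below is an example policy of my own, not a recommendation:

```javascript
// Defense in depth, input side: allowlist validation for structured
// fields. The policy here (3-20 word characters) is illustrative only.
function isValidUsername(input) {
  // Rejects <, >, quotes, and everything else outside the allowlist.
  return /^\w{3,20}$/.test(input);
}

console.log(isValidUsername("martin_s")); // true - passes the allowlist
console.log(isValidUsername("<script>")); // false - rejected before storage
```

Validation alone can't cover free-text fields, where legitimate input like "AT&T" or "x < y" must be accepted - those still need output encoding.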

Encoding functions I actually use

Don't write your own encoding function. Seriously. Use what's battle-tested.

  • JavaScript: Use textContent instead of innerHTML when inserting text
  • For HTML you need to render: DOMPurify.sanitize(userHTML)
  • PHP: htmlspecialchars($string, ENT_QUOTES, "UTF-8") - always set charset!
  • Python: html.escape(string) or let Django handle it
  • Java: OWASP Java Encoder - Encode.forHtml(input)
  • C#: System.Web.HttpUtility.HtmlEncode() or let Razor do it
  • Ruby/Rails: ERB escapes by default; CGI.escapeHTML() for manual
  • Always specify the charset (UTF-8) explicitly - it prevents charset-sniffing XSS bypasses

Content Security Policy: your backup plan

CSP is like a seatbelt for XSS. Even if your encoding fails somewhere, CSP limits what attackers can do. I implement it on everything now.

  • script-src 'self' blocks inline scripts - that <script>alert(1)</script> won't run
  • Without 'unsafe-eval', no eval() attacks
  • Nonce-based CSP: Only scripts with your secret nonce execute
  • My strict CSP: default-src 'self'; script-src 'self' 'nonce-abc123';
  • Start with report-only mode to catch issues without breaking things
  • CSP is defense in depth - it's not a replacement for encoding
  • Test thoroughly - strict CSP can break legitimate features

How I test for XSS

Every project at Šikulovi s.r.o. gets XSS testing. Manual plus automated. Here's my process:

  • Manual: <script>alert(1)</script> in every input field. Boring but essential.
  • Test all contexts: Where does input appear? HTML, attributes, JS, URLs?
  • My favorite payloads: <img src=x onerror=alert(1)>, " onmouseover="alert(1)
  • Automated: OWASP ZAP, Burp Suite scan everything
  • Static analysis: Semgrep, ESLint security plugins catch dangerous patterns in code
  • DevTools: Inspect rendered HTML to verify encoding is actually applied
  • Bypass attempts: URL encoding, mixed case, unicode alternatives
  • DOM XSS needs client-side testing - server scanners miss it completely

Patterns that keep me safe

After enough XSS incidents, these patterns became muscle memory. I don't even think about them anymore.

  • textContent/innerText for text. innerHTML only with DOMPurify.
  • createElement + appendChild instead of building HTML strings
  • Framework bindings (React JSX, Vue templates) instead of manual DOM work
  • DOMPurify.sanitize() before dangerouslySetInnerHTML or v-html
  • new URL(path, baseURL) for building URLs safely
  • Template parameters instead of string concatenation
  • Everything from users is untrusted: URL params, cookies, headers - all of it
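The URL pattern is worth spelling out, since javascript: links slip right past HTML encoding. A sketch of the idea - the scheme allowlist is my own convention:

```javascript
// Building links safely: new URL() parses instead of concatenating,
// and an explicit scheme allowlist blocks javascript: and data: URLs.
// A sketch - the allowlist choice is an example policy.
function safeHref(userUrl, base = "https://example.com/") {
  try {
    const url = new URL(userUrl, base);
    if (url.protocol === "https:" || url.protocol === "http:") {
      return url.href;
    }
  } catch (e) {
    // unparseable input falls through to the inert fallback
  }
  return "#";
}

console.log(safeHref("/search?q=hello"));     // resolved against the base
console.log(safeHref("javascript:alert(1)")); // "#"
```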

The real defense is layers

No single technique stops all XSS. I use HTML encoding as the foundation, but it's part of a system: context-aware encoding, framework auto-escaping, Content Security Policy, and safe coding patterns.

Know where user data flows through your app. Encode appropriately at every output point. Don't bypass framework protections without sanitization. Add CSP. Test regularly. XSS is one of those vulnerabilities that never fully goes away - it just waits for the one place you forgot to encode.

FAQ

Is HTML encoding enough to prevent all XSS attacks?

HTML entity encoding prevents XSS in HTML body context but is insufficient alone. Different contexts (JavaScript, URLs, CSS, HTML attributes) require context-specific encoding. Additionally, some XSS vectors like javascript: URLs in href attributes execute despite HTML encoding. Use a combination of context-aware encoding, framework auto-escaping, Content Security Policy, and safe coding patterns for full protection.

Why does React have dangerouslySetInnerHTML if it is dangerous?

React escapes all values by default, but sometimes you genuinely need to render HTML—for example, content from a CMS, markdown rendering, or rich text editors. The deliberately alarming name reminds developers to sanitize content with a library like DOMPurify before using it. The name is a feature, not a bug—it forces conscious decisions about bypassing XSS protection.

What is the difference between encoding and sanitization?

Encoding converts special characters to safe representations (< becomes &lt;) and preserves all input—the user sees exactly what they typed. Sanitization removes or modifies dangerous content (strips script tags, removes event handlers) and may alter the input. Use encoding when displaying plain text; use sanitization when you need to allow some HTML formatting while blocking dangerous elements.

Can Content Security Policy replace HTML encoding?

No, CSP is defense in depth, not a replacement for encoding. CSP limits what scripts can execute if XSS occurs but does not prevent all XSS impacts—attackers can still modify page content, steal form data via CSS, or exfiltrate data through allowed endpoints. Some applications cannot use strict CSP due to legacy code. Always encode output properly and use CSP as an additional layer.

How do I handle user-generated HTML content safely?

Use a sanitization library like DOMPurify (JavaScript), Bleach (Python), or HTML Purifier (PHP). These libraries parse HTML and remove dangerous elements (script, iframe) and attributes (onclick, onerror) while preserving safe formatting. Configure the sanitizer with an allowlist of permitted tags and attributes. Never build your own HTML sanitizer—the edge cases are numerous and subtle.

Why do XSS attacks still occur in applications using modern frameworks?

Modern frameworks provide strong defaults, but developers bypass them: using dangerouslySetInnerHTML/v-html without sanitization, inserting user data into href or src attributes, building DOM with innerHTML, embedding user data in inline scripts, or misunderstanding which contexts the framework protects. Framework protection only works when developers understand and follow the secure patterns.

Should I encode user input on storage or on display?

Encode on display (output), not on storage (input). Store data in its original form and encode when rendering. This allows the same data to be safely rendered in different contexts (HTML, JSON, CSV) with appropriate encoding for each. Encoding on input causes double-encoding issues and makes it impossible to properly encode for contexts you did not anticipate at storage time.

What are the most commonly exploited XSS patterns?

The most exploited patterns include innerHTML assignments with user data, href attributes with javascript: URLs, event handler attributes containing user data, JSON embedded in script tags without encoding, URL fragments processed unsafely in JavaScript, and markdown or rich text rendering without sanitization. Code review should specifically check for these patterns.

Martin Šikula

Founder of CodeUtil. Web developer building tools I actually use. When I'm not coding, I experiment with productivity techniques (with mixed success).
