Percent-Encoding and RFC 3986: The Specification Explained

The official specification for URL encoding is RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, published in 2005. It obsoletes RFC 2396 (along with RFCs 1808 and 2732) and updates RFC 1738. If you ever need the canonical answer to a question about URL encoding, RFC 3986 is where to look.

This article walks through the parts of the spec that matter for everyday use.

The three character sets

RFC 3986 defines three categories of characters with different encoding rules.

Unreserved characters never need encoding. The set is small and memorable:

A-Z   a-z   0-9   -   _   .   ~

That’s 66 characters total — the alphabet (52), digits (10), and four punctuation marks.

Reserved characters have syntactic meaning in URLs. They’re split into two subsets.

The gen-delims (general delimiters) separate the major URL components:

:   /   ?   #   [   ]   @

The sub-delims (subcomponent delimiters) separate parts within a component, such as & and = between query parameters or ; inside a path segment:

!   $   &   '   (   )   *   +   ,   ;   =

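Everything else must be percent-encoded: spaces, control characters, ASCII punctuation outside the two sets above (such as " < > { } | \ ^ `), and every non-ASCII byte. The percent sign itself is the escape character, so a literal % in data is written as %25.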
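The three sets translate directly into lookup tables. Here is a minimal Python sketch; classify is a hypothetical helper of our own, and its "other" bucket lumps together everything that has to be percent-encoded, including % itself.

```python
import string

# Character sets from RFC 3986, Sections 2.2 (reserved) and 2.3 (unreserved)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")


def classify(ch: str) -> str:
    """Return the RFC 3986 category of a single character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    # Spaces, controls, '%', other ASCII punctuation, and non-ASCII characters
    return "other"


for ch in ["~", "/", "&", " ", "é"]:
    print(repr(ch), classify(ch))
```

The length of UNRESERVED comes out to 66, matching the count above.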

When reserved characters get encoded

The rule: a reserved character must be encoded when it appears as data rather than structure, that is, whenever leaving it bare would make a parser read it as a delimiter in that component.

Example: a slash / in a path means “next path segment.” If you have a product name like “A/B Testing,” and you want it as a path segment, you must encode that slash as %2F — otherwise the server will see two segments.

In a query string, the ampersand & separates parameters. If a value legitimately contains an ampersand (a company name like “Smith & Sons”), you must encode it as %26.
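Python’s standard library can handle both cases. This sketch uses urllib.parse.quote with safe="" so the slash gets encoded too (quote leaves / alone by default); the example.com host and the company parameter name are made up for illustration.

```python
from urllib.parse import quote

# safe="" overrides quote()'s default safe set ("/"), so "/" becomes %2F.
# Since Python 3.7, quote() never encodes the RFC 3986 unreserved set, "~" included.
segment = quote("A/B Testing", safe="")
value = quote("Smith & Sons", safe="")

url = f"https://example.com/products/{segment}?company={value}"
print(url)  # https://example.com/products/A%2FB%20Testing?company=Smith%20%26%20Sons
```

Note that urllib.parse.quote_plus and urlencode follow the HTML form convention and write spaces as +. That is a form-encoding rule, not part of RFC 3986.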

The encoding rule itself

From RFC 3986 Section 2.1:

A percent-encoded octet is encoded as a character triplet, consisting of the percent character “%” followed by the two hexadecimal digits representing that octet's numeric value.

That’s the whole rule. %20 represents byte 0x20 (decimal 32, which is space in ASCII).

The spec also says hex digits can be either case (%2F and %2f are equivalent), but recommends uppercase for canonicalization — if you’re comparing two URLs for equality, normalize both to uppercase hex.
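The Section 2.1 rule and the case note fit in a few lines of Python. This is a minimal sketch; encode_octet, decode_triplet, and uppercase_hex are our own helper names, not a library API. It writes uppercase hex, as RFC 3986 recommends, and can normalize existing triplets before a comparison.

```python
import re

# A percent-encoded triplet: "%" followed by exactly two hex digits, either case
TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def encode_octet(byte: int) -> str:
    """Percent-encode one octet (0-255) as '%' plus two uppercase hex digits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("an octet must be in the range 0-255")
    return f"%{byte:02X}"


def decode_triplet(triplet: str) -> int:
    """Return the octet value of a '%HH' triplet; hex case does not matter."""
    match = TRIPLET.fullmatch(triplet)
    if match is None:
        raise ValueError(f"not a percent-encoded triplet: {triplet!r}")
    return int(match.group(1), 16)


def uppercase_hex(uri: str) -> str:
    """Rewrite every triplet in uri with uppercase hex; nothing else changes."""
    return TRIPLET.sub(lambda m: "%" + m.group(1).upper(), uri)


print(encode_octet(0x20))                              # %20
print(decode_triplet("%2f") == decode_triplet("%2F"))  # True
print(uppercase_hex("/a%2fb%7e"))                      # /a%2Fb%7E
```

Comparing uppercase_hex(a) == uppercase_hex(b) treats %2f and %2F as the same octet, which is the equivalence the spec describes, without touching any other part of the URL.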

Reserved vs. unreserved: the practical rule

RFC 3986 lays out one important guarantee about unreserved characters: they may always be decoded without changing the meaning of the URI. So if you see %7E in a URL, you can replace it with ~ safely.

The same isn’t true for reserved characters. %2F and / are NOT interchangeable: one is data, the other is structure. For this reason some servers refuse encoded slashes in paths by default; Apache httpd, for example, responds with 404 Not Found unless AllowEncodedSlashes is turned on.
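Decoding only the unreserved characters is a safe normalization step. A minimal sketch, assuming the input is an already percent-encoded URI string; decode_unreserved is a hypothetical helper name. It decodes a triplet only when the result is unreserved and leaves reserved and non-ASCII triplets exactly as they were.

```python
import re
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode percent-encoded unreserved characters; leave all others encoded."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        # Unreserved characters can be decoded without changing the URI's meaning;
        # reserved ones (like %2F) and bytes >= 0x80 stay as they are.
        return ch if ch in UNRESERVED else m.group(0)
    return TRIPLET.sub(repl, uri)


print(decode_unreserved("/users/%7Ealice/A%2FB%20Testing"))  # /users/~alice/A%2FB%20Testing
```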

Non-ASCII characters: the UTF-8 convention

RFC 3986 operates on bytes (octets), not characters. For text, Section 2.5 recommends encoding the characters as UTF-8 first and then percent-encoding each resulting byte that is not an unreserved character. RFC 3987 (Internationalized Resource Identifiers, IRIs), published at the same time, uses the same UTF-8 mapping to turn IRIs into URIs.

The letter é is Unicode code point U+00E9. In UTF-8, that’s two bytes: 0xC3 0xA9. Percent-encoded: %C3%A9.
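The two-step convention (UTF-8 first, then percent-encode the bytes) is short enough to write out by hand. A minimal sketch; pct_encode_utf8 is our own name, and it encodes every byte outside the unreserved set, which suits data placed inside a single path segment or query value.

```python
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def pct_encode_utf8(text: str) -> str:
    """Encode text as UTF-8, then percent-encode every byte that is not unreserved."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Bytes >= 0x80 map to non-ASCII characters, so they always get encoded
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


print(hex(ord("é")))                   # 0xe9
print("é".encode("utf-8"))             # b'\xc3\xa9'
print(pct_encode_utf8("é"))            # %C3%A9
print(pct_encode_utf8("café au lait")) # caf%C3%A9%20au%20lait
```

For the same input, this should match what urllib.parse.quote(text, safe="") returns on Python 3.7 or later.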

This wasn’t always universal — older systems used the character set of the page that produced the URL, which is why you sometimes see %E9 for é instead (that’s ISO-8859-1, not UTF-8). The legacy issue is the reason our decoder offers a character-set selector.
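The legacy problem is easy to reproduce: the same bytes decode differently depending on which charset you assume. This sketch uses Python’s urllib.parse.quote and unquote, whose encoding parameter defaults to UTF-8; "latin-1" is Python’s codec name for ISO-8859-1.

```python
from urllib.parse import quote, unquote

# The same character, percent-encoded under two different charsets
utf8_form = quote("é")                        # UTF-8 is quote()'s default
latin1_form = quote("é", encoding="latin-1")
print(utf8_form, latin1_form)                 # %C3%A9 %E9

# Decoding only works with the charset that produced the URL
print(unquote("%E9", encoding="latin-1"))     # é
print(unquote("%E9"))                         # U+FFFD: 0xE9 on its own is not valid UTF-8
print(unquote("%C3%A9", encoding="latin-1"))  # Ã© (mojibake)
```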

The URI vs URL vs URN distinction

RFC 3986 is technically about URIs (Uniform Resource Identifiers), the umbrella concept that includes:

URLs — identifiers that locate a resource (most everyday web addresses)
URNs — identifiers that name a resource without locating it (like urn:isbn:0451450523)

The encoding rules are identical for both. In modern usage, “URL” and “URI” are used interchangeably; the WHATWG spec that browsers actually implement is just called the URL Living Standard, and that’s the spec your browser’s new URL() constructor follows.

When RFC 3986 and the browser spec disagree

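The WHATWG URL spec is more permissive than RFC 3986 in a few places. It accepts some characters that RFC 3986 considers invalid, such as spaces or non-ASCII characters in a path, and silently percent-encodes them. It also strips leading and trailing spaces and control characters and removes tabs and newlines anywhere in the input.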

Practical impact: a URL that’s technically invalid per RFC 3986 may still parse cleanly in your browser. Don’t rely on this for portable code: many non-browser HTTP clients and URL parsers follow RFC 3986 more strictly and may reject URLs the browser accepts.
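A strict character-level check is easy to approximate. This minimal sketch only tests whether every character is one RFC 3986 allows (unreserved, reserved, or a complete %HH triplet); it does not check the full grammar, such as where [ and ] may appear. A URL typed with a raw space, like https://example.com/a b, fails it, even though a browser’s new URL() accepts it and encodes the space as %20. uses_only_rfc3986_chars is a hypothetical helper name.

```python
import re

# One allowed unit: an unreserved, gen-delim, or sub-delim character,
# or a complete %HH triplet. Anything else is invalid per RFC 3986.
_ALLOWED = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def uses_only_rfc3986_chars(uri: str) -> bool:
    """Character-level check only; it does not validate the full URI grammar."""
    return _ALLOWED.fullmatch(uri) is not None


print(uses_only_rfc3986_chars("https://example.com/a%20b"))  # True
print(uses_only_rfc3986_chars("https://example.com/a b"))    # False
```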

Why this matters

Most of the time you don’t need to know RFC 3986. The browser handles it; the framework handles it; the library does the right thing. But the edge cases — OAuth signing, deep URL routing, custom protocols, URL parsers in different languages disagreeing — require knowing the actual rules, not the rules-of-thumb. RFC 3986 is shorter than you’d expect (about 60 pages) and worth reading once.

