How URL Encoding Works — Step-by-Step Guide

The problem URL encoding solves

URLs travel through HTTP, which restricts them to a narrow set of ASCII characters. Anything else — spaces, accented letters, emoji, line breaks — would break the HTTP request line. URL encoding is the workaround: a way to represent any byte using only ASCII characters that are guaranteed safe in URLs.

The algorithm in three lines

For each character that needs encoding, take its UTF-8 bytes and replace each byte with a % followed by the two hexadecimal digits of that byte’s value. That’s it. ! is the single byte 0x21, so it becomes %21. A space is 0x20, so it becomes %20. The check mark ✓ is three bytes in UTF-8 (0xE2 0x9C 0x93), so it becomes %E2%9C%93.
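The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a library function: the name percent_encode and the UNRESERVED constant are made up for this example, and the sketch assumes the strictest policy (encode everything outside the unreserved set).

```python
# Characters that never need encoding (the RFC 3986 unreserved set).
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Convert the character to its UTF-8 bytes, then emit %XX per byte.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode("!"))      # %21
print(percent_encode(" "))      # %20
print(percent_encode("✓"))      # %E2%9C%93
print(percent_encode("a-b_c"))  # a-b_c
```

A multi-byte character produces one %XX group per byte, which is why ✓ expands to three groups.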

Which characters get encoded

RFC 3986 splits characters into three groups:

Unreserved characters are always safe and never need encoding: A-Z a-z 0-9 - _ . ~.

Reserved characters have special meaning in URL syntax: ! * ' ( ) ; : @ & = + $ , / ? # [ ]. These must be encoded when they’re part of data (rather than structure).

Everything else — control characters, spaces, non-ASCII bytes — must always be encoded.
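To make the three groups concrete, here is a small sketch that sorts individual characters into them. The function name classify and the group labels are assumptions for illustration; the two character sets are copied from the lists above.

```python
import string

# The two named sets from RFC 3986, as listed above.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")
RESERVED = set("!*'();:@&=+$,/?#[]")


def classify(ch: str) -> str:
    """Return the group a single character falls into."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    # Control characters, spaces, non-ASCII, and anything else left over.
    return "always-encode"


for ch in ["a", "~", "&", "/", " ", "é", "\n"]:
    print(repr(ch), classify(ch))
```

Only the "reserved" group requires a judgment call: whether to encode depends on whether the character is acting as data or as URL structure.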

How browsers handle it

Modern browsers do this automatically when you type into the address bar or submit a form. They encode spaces as %20 in paths, while HTML form submissions write spaces as + in the query string (the form-encoded variant, application/x-www-form-urlencoded). Browsers typically display a decoded form for readability but send the encoded form over the wire.

This dual representation is the source of many confusing bugs: the URL in your address bar looks different from what your server actually receives, and the difference matters for debugging.

The same algorithm, in code

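If you want to do this in your own code, every major language has a built-in. They do not all implement the same variant, so check how each one treats spaces and reserved characters:

JavaScript: encodeURIComponent(text) and decodeURIComponent(text) (leaves ! ' ( ) * unencoded)
Python: urllib.parse.quote(text, safe="") and urllib.parse.unquote(text) (without safe="", quote leaves / unencoded)
PHP: urlencode($text) and urldecode($text) (form-encoded variant) or rawurlencode/rawurldecode (RFC 3986)
Java: URLEncoder.encode(text, "UTF-8") and URLDecoder.decode(text, "UTF-8") (form-encoded variant: spaces become +)
Go: url.QueryEscape(text) and url.QueryUnescape(text) (form-encoded variant) or url.PathEscape/url.PathUnescape (path segments)
C#: Uri.EscapeDataString(text) and Uri.UnescapeDataString(text)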
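As a usage sketch for the Python entry, the snippet below round-trips a string through urllib.parse. One detail worth knowing: quote treats / as safe by default, so encoding a single component usually means passing safe="" explicitly. The sample string is arbitrary.

```python
from urllib.parse import quote, unquote

text = "a/b c?d"

# Default: "/" is considered safe and passes through unencoded.
print(quote(text))            # a/b%20c%3Fd

# safe="" encodes every reserved character, which suits a single component.
encoded = quote(text, safe="")
print(encoded)                # a%2Fb%20c%3Fd

# Decoding restores the original string.
print(unquote(encoded))       # a/b c?d
```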

Edge cases worth knowing

Plus signs and spaces. In query strings, + means space (the form-encoded variant). In path components, + is literal. The same character has different meanings depending on URL position.
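The two conventions map onto separate functions in Python's standard library, which makes the difference easy to see. This is a small demonstration with an arbitrary sample string: quote and unquote follow the path-style rules, while quote_plus and unquote_plus follow the form-encoded rules.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Encoding: the space and the literal plus sign come out differently.
print(quote("a b+c", safe=""))  # a%20b%2Bc  (path style)
print(quote_plus("a b+c"))      # a+b%2Bc    (form-encoded style)

# Decoding: "+" is only turned into a space by the form-encoded decoder.
print(unquote("a+b"))           # a+b
print(unquote_plus("a+b"))      # a b
```

Using the wrong decoder for the URL position is what turns a literal + into a space, or leaves a + where a space was intended.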

Tilde ~. Officially unreserved (no encoding needed). Older systems sometimes encoded it as %7E defensively. RFC 3986 treats the two forms as equivalent and recommends the unencoded ~, so decoders should accept either.

Reserved characters in data. The string a=b&c=d as a query parameter value would be parsed as two parameters (a=b and c=d) unless you encode the = as %3D and the & as %26, giving a%3Db%26c%3Dd.
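A short sketch of that failure and its fix, using Python's urllib.parse. The parameter name q is a hypothetical chosen for this example.

```python
from urllib.parse import parse_qs, quote, urlencode

value = "a=b&c=d"

# Unencoded: the "&" inside the value is read as a parameter separator.
raw_query = "q=" + value
print(parse_qs(raw_query))    # {'q': ['a=b'], 'c': ['d']}

# Encoded: the value survives as a single parameter.
safe_query = "q=" + quote(value, safe="")
print(safe_query)             # q=a%3Db%26c%3Dd
print(parse_qs(safe_query))   # {'q': ['a=b&c=d']}

# urlencode builds the query string and encodes values in one step.
print(urlencode({"q": value}))  # q=a%3Db%26c%3Dd
```

Building query strings with a helper such as urlencode, rather than string concatenation, avoids having to remember which characters are structural.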

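Percent itself. A literal % in URL data must be encoded as %25, because % introduces an encoded sequence; left unencoded, it can be misread as the start of an escape. The opposite mistake, encoding a string that is already encoded, turns every % into %25 (so %20 becomes %2520). That is the classic “double encoding” bug.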
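The effect of encoding twice is easy to reproduce. The following sketch uses an arbitrary sample string and Python's urllib.parse; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

original = "100% sure"

once = quote(original, safe="")
print(once)    # 100%25%20sure

# Encoding the already-encoded string re-encodes each "%" as "%25".
twice = quote(once, safe="")
print(twice)   # 100%2525%2520sure

# One decode only undoes one layer; it takes two to get the original back.
print(unquote(twice))           # 100%25%20sure
print(unquote(unquote(twice)))  # 100% sure
```

A stray %25 in a URL your server receives is usually a sign that some layer encoded a value that was already encoded.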

Privacy of this page

None of the explanations on this page require submitting anything. The decoder on the homepage runs entirely in your browser. No input is sent to our server.
