What is URL Encoding? A Plain-English Guide

You’ve seen them before — strings like %20, %3A, %2F scattered through URLs. They look cryptic, but they’re actually one of the simplest specifications on the web: URL encoding, also called percent-encoding.

The short version: URL encoding is how URLs carry text that contains characters the URL syntax wouldn’t otherwise allow. A space, an emoji, a Japanese ideograph, a comma used as data: each has to be written using only characters the URL spec considers safe. Percent-encoding is the agreed-upon translation scheme.

Why URLs need encoding at all

URLs travel through HTTP. The HTTP protocol’s request line looks like this:

GET /search?q=hello HTTP/1.1

That line is whitespace-delimited. The server splits on spaces to find the method (GET), the request target (/search?q=hello), and the protocol version (HTTP/1.1). If a raw space appeared in the target, the server would treat whatever follows it as the start of the version, and the request would be malformed.
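To make the failure concrete, here is a minimal Python sketch that splits a request line on spaces. It is a simplification for illustration only, not how any particular server parses requests, and split_request_line is a hypothetical helper.

```python
def split_request_line(line):
    """Split an HTTP/1.1 request line on single spaces (simplified)."""
    return line.split(" ")

# Space encoded as %20: three fields, as the protocol expects.
good = split_request_line("GET /search?q=hello%20world HTTP/1.1")
# Raw space in the target: a fourth, stray field appears.
bad = split_request_line("GET /search?q=hello world HTTP/1.1")

print(good)  # ['GET', '/search?q=hello%20world', 'HTTP/1.1']
print(bad)   # ['GET', '/search?q=hello', 'world', 'HTTP/1.1']
```

With the raw space there is no way to tell where the target ends and the version begins.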

Same story for other characters that have meaning in URLs: ? introduces the query string, & separates query parameters, # introduces the fragment. Characters with structural meaning can’t also appear as data unless they are encoded.
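As a rough illustration, Python’s standard urllib.parse.parse_qs shows what happens when an ampersand that was meant as data arrives unencoded. The parameter name q and the value Tom&Jerry are made up for the example.

```python
from urllib.parse import parse_qs

# Intended value: "Tom&Jerry". The raw "&" is read as a parameter separator.
raw = parse_qs("q=Tom&Jerry", keep_blank_values=True)

# With the ampersand percent-encoded, the value survives intact.
encoded = parse_qs("q=Tom%26Jerry")

print(raw)      # {'q': ['Tom'], 'Jerry': ['']}
print(encoded)  # {'q': ['Tom&Jerry']}
```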

The encoding rule

For each byte you want to encode, write a percent sign followed by the byte’s value as two hex digits. That’s the entire rule.

A space has byte value 0x20, so it’s encoded as %20. A question mark is 0x3F → %3F. The ampersand: 0x26 → %26.

For non-ASCII characters (like é or 你), the character is first encoded to its UTF-8 bytes, and each byte is then percent-encoded. The letter é is two bytes in UTF-8 (0xC3 0xA9), so it becomes %C3%A9.
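The rule is short enough to write out by hand. The sketch below encodes every UTF-8 byte of its input unconditionally, which is more than a real encoder would do, but it shows the mechanics; percent_encode_every_byte is a hypothetical name.

```python
def percent_encode_every_byte(text):
    """Encode text to UTF-8, then write each byte as % plus two hex digits."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

print(percent_encode_every_byte(" "))       # %20
print(percent_encode_every_byte("?"))       # %3F
print(percent_encode_every_byte("&"))       # %26
print(percent_encode_every_byte("\u00e9"))  # %C3%A9  (the letter é)
```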

Which characters need encoding

RFC 3986 calls characters that never need encoding the unreserved set: letters A-Z and a-z, digits 0-9, and four punctuation marks - _ . ~. Anything else either has reserved meaning in URL syntax or isn’t a safe ASCII character.

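In practice, over-encoding a single data value (one path segment, or one query parameter’s name or value) is safe, because a decoder turns every %XX back into the original byte. Under-encoding breaks things. The conservative move when you’re unsure is to encode everything in the value that isn’t in the unreserved set. Don’t apply this to a whole URL, though: encoding the delimiters themselves (the / between path segments, the ? before the query, the & between parameters) changes what the URL means.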
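One way to see the unreserved set in action is Python’s standard urllib.parse.quote. Passing safe="" asks it to encode everything except the unreserved characters; its default, safe="/", also leaves slashes alone. The sample value is invented for the example.

```python
from urllib.parse import quote

value = "a-b_c.d~e f/g?h"

# Encode everything outside the unreserved set (suits a single data value).
strict = quote(value, safe="")
# Default behaviour: "/" is also left as-is (suits a multi-segment path).
path_like = quote(value)

print(strict)     # a-b_c.d~e%20f%2Fg%3Fh
print(path_like)  # a-b_c.d~e%20f/g%3Fh
```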

The two flavors of URL encoding

You’ll see two slightly different conventions in the wild.

Standard percent-encoding (RFC 3986) encodes a space as %20 and treats + as a literal plus sign. This is what URLs use in path components and what encodeURIComponent produces in JavaScript.

Form encoding (the application/x-www-form-urlencoded MIME type) encodes a space as + and uses %2B for a literal plus. This is what HTML forms produce when submitted, so you’ll see it most often in query strings (GET forms) and in POST request bodies.

Both are valid in different contexts. The decoder in our tool handles either correctly via the “treat + as space” toggle.
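Python’s standard library happens to expose both conventions side by side, which makes the difference easy to see: quote and unquote follow the RFC 3986 style, while quote_plus and unquote_plus follow the form style. The sample strings are made up.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "1 + 1"

rfc_style = quote(text, safe="")  # space -> %20, plus -> %2B
form_style = quote_plus(text)     # space -> +,   plus -> %2B

print(rfc_style)   # 1%20%2B%201
print(form_style)  # 1+%2B+1

# Decoding the same input under each convention gives different results.
print(unquote("a+b"))       # a+b  (plus is a literal plus)
print(unquote_plus("a+b"))  # a b  (plus means space)
```

Picking the wrong decoder for the context is how spaces turn into plus signs, or the reverse.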

The classic mistake: double encoding

The most common URL-encoding bug is encoding something that’s already encoded. A %20 becomes %2520, because % itself is 0x25 in ASCII, which encodes to %25.

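Symptom: you see literal %20 instead of spaces in the rendered output. Cause: somewhere in your data pipeline, something encoded a value that was already encoded. Fix: find the layer that adds the second encoding pass and remove it. Decoding twice hides the symptom, but it also corrupts any value that legitimately contains a percent sign followed by two hex digits.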
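A short sketch with Python’s urllib.parse reproduces the bug, and also shows why blindly decoding twice is risky. The strings are illustrative only.

```python
from urllib.parse import quote, unquote

once = quote("hello world", safe="")  # hello%20world
twice = quote(once, safe="")          # hello%2520world (the % became %25)

print(unquote(twice))           # hello%20world  <- the symptom
print(unquote(unquote(twice)))  # hello world

# Why "decode twice" is a risky blanket fix: suppose the user's actual
# data is the three characters %41, correctly encoded a single time.
literal = quote("%41", safe="")   # %2541
print(unquote(literal))           # %41  (correct)
print(unquote(unquote(literal)))  # A    (data corrupted)
```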

That’s the whole concept

URL encoding is one of the few web standards that really is as simple as it appears: a percent sign and two hex digits, easy enough to do in your head. Knowing it well saves a lot of debugging time when something looks wrong in an address bar or a server log.
