How We Tidy Your Name for the Puzzle
Before your name becomes a prime, it goes through a tiny grooming routine. Here's why, and what it does.
There’s a famous puzzle that has tripped up software engineers for decades: how many ways can a computer write the name José?
The answer, depressingly, is several. You can write the é as one character (U+00E9). Or you can write it as a regular e followed by an invisible “put an accent on the previous letter” mark (U+0065 + U+0301). On screen they look identical. To a computer, they’re different bytes — and a different sequence of bytes will produce a different prime.
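To see the problem for yourself, here is a small Python sketch. It only uses the standard library; the variable names are ours, made up for illustration.

```python
import unicodedata

composed = "Jos\u00e9"      # é as a single code point, U+00E9
decomposed = "Jose\u0301"   # plain e followed by the combining acute accent, U+0301

print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 4 5
print(composed.encode("utf-8"))        # b'Jos\xc3\xa9'
print(decomposed.encode("utf-8"))      # b'Jose\xcc\x81'

# Once both spellings are normalized to the same form, they agree.
same = (unicodedata.normalize("NFKD", composed)
        == unicodedata.normalize("NFKD", decomposed))
print(same)                            # True
```

Same name on screen, two different byte strings underneath, and one normalization call brings them back together.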
If we ever want to take your prime and recover your name from it (and we do — that’s the whole a3.0 trick), we need both ends to agree on exactly what letters went in. So we run every name through a small, predictable cleanup before the math starts.
The grooming routine, in order
Five steps, performed every time:
- Unicode NFKD normalization. A standard recipe for unscrambling all the different ways a single character can be written, into one canonical form. It splits each accented letter into a plain base letter plus separate accent marks.
- Strip combining marks. That sneaky floating accent we just talked about? It gets dropped. é collapses to e, ñ to n, ü to u. (Your name is still yours; we just need a single agreed-upon spelling.)
- Uppercase everything. Alice, alice, and ALICE all become ALICE.
- Collapse runs of whitespace. Two spaces become one. A tab becomes one space. The line of accidental spaces from your phone’s autocomplete becomes one space.
- Trim the edges. Any leading or trailing whitespace is removed.
By the time we’re done, José Martí has become JOSE MARTI. Predictable, repeatable, no hidden surprises.
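The five steps can be sketched in a few lines of Python. Treat this as an illustration rather than our production code: the function name tidy_name is hypothetical, and we are assuming "combining mark" means any character in a Unicode mark category.

```python
import re
import unicodedata

def tidy_name(raw: str) -> str:
    """Hypothetical helper: apply the five cleanup steps in order."""
    # 1. NFKD: split accented letters into base letter + combining marks.
    text = unicodedata.normalize("NFKD", raw)
    # 2. Drop the combining marks (assumption: Unicode categories Mn, Mc, Me).
    text = "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("M"))
    # 3. Uppercase everything.
    text = text.upper()
    # 4. Collapse any run of whitespace (spaces, tabs, ...) to one space.
    text = re.sub(r"\s+", " ", text)
    # 5. Trim leading and trailing whitespace.
    return text.strip()

print(tidy_name("José Martí"))         # JOSE MARTI
print(tidy_name("  alice\t smith "))   # ALICE SMITH
```

Because every step is deterministic, typing José, Jose, or JOSÉ gives the same tidy result.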
What’s allowed in the tidy version
After cleanup, every character must come from this small, friendly set:
- The 26 capital letters A–Z
- A single space ' '
- An apostrophe '
- A hyphen -
That’s 29 symbols total. Enough for MARY O'BRIEN-SMITH or JEAN-LUC PICARD, but not enough for Mr. ☃️ — Esq., which we’ll politely refuse.
If anything else slips through (a stray punctuation mark, an emoji, a digit), we stop right there and ask for a cleaner version. Better to ask now than to mint you a beautiful prime that nobody can decode later.
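A minimal sketch of that check, assuming the name has already been through the cleanup. The names ALLOWED and check_alphabet are ours, invented for this example.

```python
import string

# 26 capital letters plus space, apostrophe, and hyphen.
ALLOWED = frozenset(string.ascii_uppercase + " '-")

def check_alphabet(tidy: str) -> None:
    """Hypothetical check: raise if any character is outside the allowed set."""
    bad = sorted({ch for ch in tidy if ch not in ALLOWED})
    if bad:
        raise ValueError(f"unsupported characters: {bad!r}")

print(len(ALLOWED))                    # 29
check_alphabet("MARY O'BRIEN-SMITH")   # passes silently
check_alphabet("JEAN-LUC PICARD")      # passes silently
```

A name such as R2-D2 would raise a ValueError here, because digits are not in the set.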
Two limits, one stricter than the other
Names have to fit into a fixed amount of space inside the puzzle. We use two rules:
- The format limit: 63 characters. The bit layout reserves 6 bits to record the character count. Six bits can hold 2⁶ = 64 different values, 0 through 63, so anything longer literally won’t fit in the box.
- The product limit: 25 characters. Stricter, on purpose. It keeps your prime from getting absurdly long, and it keeps the certificate readable.
So if your full given name is Maximilian Eduardo de Bourbon-Battenberg-Schleswig-Holstein, we’ll need a slightly snappier version. Sorry, Max.
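Here is how the two limits might look in code. This is a sketch under our own naming (FORMAT_LIMIT, PRODUCT_LIMIT, and length_problem are hypothetical), and it assumes the length is counted on the tidy version of the name.

```python
from typing import Optional

FORMAT_LIMIT = (1 << 6) - 1   # 63: the largest count a 6-bit field can hold
PRODUCT_LIMIT = 25            # the stricter limit described above

def length_problem(tidy: str) -> Optional[str]:
    """Hypothetical check: return a message if the name is too long, else None."""
    n = len(tidy)
    if n > FORMAT_LIMIT:
        return f"{n} characters exceeds the format limit of {FORMAT_LIMIT}"
    if n > PRODUCT_LIMIT:
        return f"{n} characters exceeds the product limit of {PRODUCT_LIMIT}"
    return None

max_name = "MAXIMILIAN EDUARDO DE BOURBON-BATTENBERG-SCHLESWIG-HOLSTEIN"
print(len(max_name))                       # 59
print(length_problem(max_name))            # 59 characters exceeds the product limit of 25
print(length_problem("JEAN-LUC PICARD"))   # None
```

Max's name would squeeze into the 6-bit count field, but it is well past the 25-character product limit.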
Why bother with all this?
Two reasons.
Reversibility. The decoder will run the same name through the same number-crunching machinery and expect to get the same bits back. If our two ends disagree about whether É is one character or two, the bits diverge and the decode fails. One canonical form keeps both halves of the puzzle in lockstep.
Human equivalence. People type their own names inconsistently. José one day, Jose the next, JOSÉ on a passport application. By collapsing all of those down to a single representation, we make sure your prime is yours — no matter how you typed it the day you bought it.
It’s not glamorous work. It’s the math equivalent of wiping the table before you set out the puzzle pieces. But every clean prime starts with a clean name.