
September 29, 2002

I missed an encoding.

Weblogging turns out to be a good way to gnaw at unresolved issues. I had an open problem -- a weirdly subtle one -- with characters moving through the pipe, and although just writing about data pipes wouldn't fix the problem, it did help me get to the "aha!" point of seeing what I'd missed.

The missing piece: UTF-8 encoding.

The code I'm writing here is JavaScript, and by default all strings are considered Unicode. So, to get (say) a "trademark" symbol, I can write String.fromCharCode(0x2122), or write XML markup saying "™" and get a character with numeric value way up in the thousands.

Weirdly, these very-high characters passed right through the pipe - in one end, out the other - in both directions, without breaking. Yet more mundane symbols such as the pound sign ("£") and Latin-1 accented characters would detonate along the way. (The actual symptom was a report that my SQL Server was stopped, but that was a red herring.)

So, that pound sign was character 163 ("%A3" after URL-escaping), but the URL wanted UTF-8, where the pound becomes two bytes: 194 163, or %C2%A3. UTF-8 is wonderful and slightly strange - don't ever try working backwards through a UTF-8 string!
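
Here is a minimal sketch of that mismatch. The post doesn't say which escaping function was in play, so this assumes the legacy escape() on one side and encodeURIComponent() (which always emits UTF-8) on the other; the variable names are made up for illustration. Note that escape() writes characters above 255 in a separate %uXXXX form, which may be why the trademark symbol survived while the pound sign did not.

```javascript
// Assumption: the original code used something like the legacy escape();
// the post does not name the function. All names here are illustrative.
const pound = String.fromCharCode(0xA3);        // "£", character 163
const trademark = String.fromCharCode(0x2122);  // "™", character 8482

// Legacy escaping: one Latin-1 byte for the pound, %uXXXX for high characters.
const legacyPound = escape(pound);              // "%A3"
const legacyTrademark = escape(trademark);      // "%u2122"

// UTF-8 escaping: every non-ASCII character becomes two or more bytes.
const utf8Pound = encodeURIComponent(pound);          // "%C2%A3"
const utf8Trademark = encodeURIComponent(trademark);  // "%E2%84%A2"

// The raw UTF-8 bytes for the pound sign: 194 163.
const poundBytes = Array.from(new TextEncoder().encode(pound));

console.log(legacyPound, utf8Pound, poundBytes);
console.log(legacyTrademark, utf8Trademark);
```

If the receiving end expects UTF-8, a lone %A3 is not a valid sequence, whereas %C2%A3 decodes back to the pound sign with decodeURIComponent().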

Comments

We are having some fun with the Euro sign, Notes databases, Domino HTTP stack, WebSphere and Weblogic Servers and a portal framework on top. And don't forget the ever stupid &%$§ Verity search engine.
