Decoding "quoted-printable" -- Help needed

R.H. roland.huettmann at gmail.com
Tue Nov 12 14:44:42 EST 2019


Even after a lot of research and comparing functions in C# and JavaScript, I
do not understand it yet.

In e-mail bodies, the content parts are often base64-encoded, which is no
problem, but some parts use another encoding called
"quoted-printable". This is text that in my case needs to be converted to
UTF-8.

Here every byte that is not plain ASCII is marked with an equals sign "="
(similar to the "%" in a URL-encoded string), and the following two
characters give the byte value in hex notation. A single character encoded
in UTF-8 can take one, two, three or even four such byte values.

Example: "F=C3=BCr". This contains the German umlaut "ü" and should render
as the string "Für". The "ü" is not part of plain ASCII and is therefore
encoded this way, here as the two bytes of its UTF-8 encoding.
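
For comparison, here is a minimal sketch of the decoding for the example
above. It is written in Python rather than LiveCode, purely as an
illustration, and it assumes the part is declared as UTF-8 in the mail
headers. The standard-library module quopri turns the "=XX" escapes back
into raw bytes, and those bytes are then interpreted as UTF-8:

```python
import quopri

# The quoted-printable text from the example, as it appears in the mail body.
encoded = b"F=C3=BCr"

# Step 1: undo the "=XX" escapes. The result is raw bytes, not yet text.
raw_bytes = quopri.decodestring(encoded)

# Step 2: interpret the raw bytes as UTF-8 (assumed charset of this part).
text = raw_bytes.decode("utf-8")

print(raw_bytes)
print(text)
```

The point of the two steps is that quoted-printable only describes bytes;
which characters those bytes stand for depends on the charset.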

Now, as you can see, there is not just one byte represented with "=C3".
There are actually two bytes, "=C3=BC": represented in hex by "C3" and "BC",
each individually converted to decimal notation as 195 and 188. If you
URL-decode the single bytes using "%" instead of "=", such as "%C3", each
one gives its own character, which will be "À". URL-decoding "%BC" gives
"Ä". So this does not help. I have to somehow look at the two bytes
together.

Converting the character to hex gives the expected result in other programs:
-- Link: https://www.rapidtables.com/convert/number/ascii-to-hex.html:
-- Enter: "ü"
-- Result: "C3,BC" --- what we are looking for when encoding: Two separate
byte representations.
-- But it only works when the character encoding is UTF-8.
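
The dependence on the character encoding can be checked with a small
sketch (again Python, only as an illustration; the variable names are made
up for this example). The same character produces different bytes
depending on the encoding chosen:

```python
char = "\u00fc"  # the character "ü"

# UTF-8 needs two bytes for this character.
utf8_hex = ",".join(f"{b:02X}" for b in char.encode("utf-8"))

# ISO-8859-1 (Latin-1) needs one byte, and its value equals the codepoint.
latin1_hex = ",".join(f"{b:02X}" for b in char.encode("latin-1"))

codepoint = ord(char)

print(utf8_hex, latin1_hex, codepoint)
```

So "C3,BC" is the UTF-8 form, while the single byte "FC" (decimal 252) is
the Latin-1 form and also the Unicode codepoint.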

How do I get from "=C3=BC" to codepoint("ü") = 252? What do I need to
calculate?
How do we decode such a "quoted-printable" encoded string to UTF-8?
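
One way to picture the calculation being asked about is the sketch below.
It is Python, not LiveCode, and the function names qp_to_bytes and
utf8_codepoints are hypothetical helpers invented for this illustration.
It assumes well-formed input, ignores quoted-printable soft line breaks
("=" at the end of a line), and does not validate continuation bytes. For
a two-byte sequence the arithmetic is ((0xC3 & 0x1F) << 6) | (0xBC & 0x3F)
= (3 * 64) + 60 = 252.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"

def qp_to_bytes(s: str) -> bytes:
    """Turn "=XX" escapes into byte values; other characters are assumed ASCII."""
    out = bytearray()
    i = 0
    while i < len(s):
        pair = s[i + 1:i + 3]
        if s[i] == "=" and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(int(pair, 16))   # e.g. "C3" -> 195
            i += 3
        else:
            out.append(ord(s[i]))       # plain ASCII character
            i += 1
    return bytes(out)

def utf8_codepoints(data: bytes) -> list:
    """Combine UTF-8 lead and continuation bytes into codepoints."""
    result = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:                   # 0xxxxxxx: one byte, plain ASCII
            extra, cp = 0, b0
        elif b0 >> 5 == 0b110:          # 110xxxxx: one continuation byte follows
            extra, cp = 1, b0 & 0x1F
        elif b0 >> 4 == 0b1110:         # 1110xxxx: two continuation bytes follow
            extra, cp = 2, b0 & 0x0F
        elif b0 >> 3 == 0b11110:        # 11110xxx: three continuation bytes follow
            extra, cp = 3, b0 & 0x07
        else:
            raise ValueError("unexpected lead byte")
        tail = data[i + 1:i + 1 + extra]
        if len(tail) < extra:
            raise ValueError("truncated sequence")
        for b in tail:                  # each continuation byte adds 6 bits
            cp = (cp << 6) | (b & 0x3F)
        result.append(cp)
        i += 1 + extra
    return result

codepoints = utf8_codepoints(qp_to_bytes("F=C3=BCr"))
decoded = "".join(chr(cp) for cp in codepoints)
print(codepoints, decoded)
```

The lead byte says how many continuation bytes follow, and each
continuation byte contributes its low six bits to the codepoint.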

Thanks in advance...
Roland


