Decoding "quoted-printable" -- Help needed -- Reopened - Solved 2nd
R.H.
roland.huettmann at gmail.com
Thu Nov 14 18:12:10 EST 2019
I am very sorry that I am overstressing this list. I keep on answering my
own questions.
The function needs to work on bytes, not on code points. I found this by
looking at some similar C# code:
# Code snippet from C#
# Source:
https://stackoverflow.com/questions/32083334/consecutive-control-characters-in-quoted-printable-not-decoding-correctly
---
string sHex = input;
sHex = sHex.Substring(i + 1, 2);
int hex = Convert.ToInt32(sHex, 16);
byte b = Convert.ToByte(hex);
output.Add(b);
i += 3;
---
I overlooked that each value must be a byte value. In any case, all of this
is new to me.
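A small cross-check outside LiveCode, here in Python, illustrates the difference between byte values and code point values. It assumes that the earlier attempt read each UTF-8 byte as a Windows-1252 character before taking its code point. That is a guess which happens to reproduce the "=E2=201A=AC" result; nothing in the thread confirms it.

```python
# UTF-8 bytes of the Euro symbol: E2 82 AC
data = "€".encode("utf-8")

# Byte-wise: every byte becomes "=" plus exactly two hex digits.
by_byte = "".join(f"={b:02X}" for b in data)

# Assumption: each byte is read as a Windows-1252 character and the
# Unicode code point of that character is converted to hex instead.
as_text = data.decode("cp1252")
by_code_point = "".join(f"={ord(ch):X}" for ch in as_text)

print(by_byte)        # =E2=82=AC
print(by_code_point)  # =E2=201A=AC
```

Under that assumption the middle byte 0x82 becomes the character U+201A, which is why only that group grows to four hex digits.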
So, the correct and tested conversion to and from "quoted-printable" with
UTF-8 encoded text in LiveCode 7 and later is:
---
local tChar
local tItem
local tCodedChar
local tByte
local tEncoded
local tDecoded
set the itemDelimiter to "="

// ENCODE EXAMPLE
put "€" into tChar
put textEncode ( tChar , "UTF-8" ) into tCodedChar
-- tCodedChar is binary data, so walk through it byte by byte
repeat for each byte tByte in tCodedChar
   -- note: baseConvert does not zero-pad, so a byte below 16 would need a leading "0"
   put "=" & baseConvert ( byteToNum ( tByte ) , 10 , 16 ) after tEncoded
end repeat
put tEncoded into msg -- "=E2=82=AC", the quoted-printable UTF-8 encoding of the Euro symbol "€"

// DECODE EXAMPLE
put "=E2=82=AC" into tEncoded
delete char 1 of tEncoded -- drop the leading "=" so that the first item is not empty
repeat for each item tItem in tEncoded
   put numToByte ( baseConvert ( tItem , 16 , 10 ) ) after tDecoded
end repeat
put textDecode ( tDecoded , "UTF-8" ) into msg -- the Euro symbol "€"
---
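For comparison, the same round trip can be sketched in Python. The helper names `qp_encode_all` and `qp_decode_all` are made up for this illustration. Like the example above, they encode every byte and expect input that consists only of "=XX" groups, so they are not a full quoted-printable implementation (no literal characters, no soft line breaks). The last line uses the standard library decoder as an independent check.

```python
import quopri


def qp_encode_all(text):
    """Encode every UTF-8 byte of text as "=XX" (hypothetical helper)."""
    # :02X pads to two hex digits, so the byte 0x0A becomes "=0A", not "=A".
    return "".join(f"={b:02X}" for b in text.encode("utf-8"))


def qp_decode_all(encoded):
    """Inverse of qp_encode_all; expects nothing but "=XX" groups."""
    hex_groups = encoded.split("=")[1:]  # skip the empty piece before the first "="
    return bytes(int(h, 16) for h in hex_groups).decode("utf-8")


print(qp_encode_all("€"))          # =E2=82=AC
print(qp_decode_all("=E2=82=AC"))  # €
print(qp_encode_all("\n"))         # =0A

# Independent check with the standard library decoder
print(quopri.decodestring(b"=E2=82=AC").decode("utf-8"))  # €
```

The bytes are collected first and decoded as UTF-8 only at the end, because a single character such as "€" spans several "=XX" groups.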
Thanks to all.
When I find a bit of time, I will post a solution in the Forum for decoding
blocks of e-mail text that are quoted-printable encoded UTF-8.
Roland
---
On Thu, 14 Nov 2019 at 20:41, R.H. <roland.huettmann at gmail.com> wrote:
>
> Oh, sorry, I was too quick declaring a solution.
>
> Even though the function runs fine and its result also converts back,
> "quoted-printable" with UTF-8 expects each codepoint to be encoded in Hex
> with exactly two ASCII characters representing a codepoint.
>
> For example, for the Euro symbol "€" we have three codepoints.
> The function below converts to "=E2=201A=AC" while it must be "=E2=82=AC".
> The "=" sign is just a delimiter in quoted-printable.
>
> Now, I do not know what is wrong in my thinking as I am not getting quite
> the same results.
> (The result is ok for other symbols such as 'ü'.)
>
> EXAMPLE:
>
> put "€" into tChar
> // First encode to UTF-8:
> put textEncode(tChar,"UTF-8") into tCodedChar
> // Repeat for each codepoint in the UTF-8 char
> repeat for each codePoint tCodePoint in tCodedChar
> // Encode each codepoint to its integer expression and convert to Hex value:
> put "="& BaseConvert ( codePointToNum (tCodePoint) , 10 , 16 ) after tEncoded
> end repeat
> put tEncoded into field "Show Codepoints" -- Expected ASCII representing Hex numbers
> -- Result: "=E2=201A=AC" -- Instead of "=E2=82=AC" , but valid and working.
>
> The actual "correct" UTF-8 result can be tested here:
> http://www.endmemo.com/unicode/unicodeconverter.php
>
> What am I missing?
>
> Thanks a lot
> Roland