offset() functions and Unicode: SOLUTIONS
Slava Paperno
slava.paperno at cornell.edu
Sun Jun 19 22:30:56 EDT 2011
I thought I would broadcast some good news for a change.
Good News 1) I was prepared to find that the offset() function is useless
with bilingual text (i.e. a mix of Roman and non-Roman, double-byte
characters) for the same reason as mouseCharChunk(), but no, it works fine.
I guess mouseCharChunk() hasn't been updated the way offset() has. If I am
wrong, please let us know.
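Bad News 1) wordOffset() works for some words, but fails if your text
contains an upper case Russian R (decimal 1056, stored in UTF-16 as bytes 32
and 4), and probably other characters with similarly confusing bytes: byte 32
is the ASCII space, so it gets mistaken for a word delimiter. Don't use
wordOffset() for Unicode.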
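The "bytes 32 and 4" problem can be checked outside LiveCode. The sketch below is plain Python, not LiveCode, and only mimics a byte-oriented word splitter; it assumes little-endian UTF-16 (the byte order implied by "32 and 4"), and the sample text is made up for illustration.

```python
# Illustration only: Python, not LiveCode. A byte-oriented splitter that
# treats every byte 32 as a space stands in for the engine's word chunking.

# Two words; the second starts with upper case Russian R (U+0420, decimal 1056).
# Escapes are used so the example does not depend on the file's encoding.
text = "\u0434\u0430 \u0420\u0438\u043c"

utf16 = text.encode("utf-16-le")  # assumption: little-endian UTF-16
utf8 = text.encode("utf-8")

# The upper case R on its own: low byte 32 (ASCII space), high byte 4.
r_bytes = list("\u0420".encode("utf-16-le"))

# In UTF-16 the splitter breaks at the real space AND inside the R: 3 pieces.
utf16_pieces = utf16.split(b" ")

# In UTF-8 byte 32 only ever encodes a real space: 2 pieces, as expected.
utf8_pieces = utf8.split(b" ")

print(r_bytes, len(utf16_pieces), len(utf8_pieces))
```

This is why converting to UTF-8 before asking for words or offsets is the safer route: no byte of a multi-byte UTF-8 character is ever 32.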
Good News 2) The "find " command works fine in bilingual texts, perhaps
because it matches a string and doesn't tell us where the match is. Even the
repeated "find " works fine and finds (and highlights) the next occurrence.
Good News 3) Although the replace() function fails for bilingual fields, you
can work around the problem like so:
on mouseUp
   -- This handler amounts to a home-made custom replace() function for
   -- UTF-16 fields.
   -- It searches field "InputField" for word 1 of field "WordToFind"
   -- and replaces the first occurrence with word 2 of field "WordToFind".
   -- Here is what we do:
   -- 1) the text of field "WordToFind" is stored in a variable and
   --    converted to UTF-8
   -- 2) word 1 and word 2 are retrieved from that UTF-8 variable (they
   --    can't be reliably retrieved directly from the field)
   -- 3) the text of field "InputField" (the field to search in) is stored
   --    in a variable and converted to UTF-8
   -- 4) offset() is called to find the position of the search target in
   --    the UTF-8 input string
   -- 5) the final result is created by concatenating the text in the input
   --    string up to the offset & the replacement string & the text in the
   --    input string that follows the search target
   -- 6) this final result is converted to UTF-16 and displayed in the field
   local locInputText
   local locFindReplaceText
   local locStrToFind, locReplacementStr
   local locOffset
   local locHead, locTail

   set caseSensitive to true

   put the unicodeText of field "WordToFind" of this card into locFindReplaceText
   put uniDecode(locFindReplaceText, "UTF8") into locFindReplaceText

   -- Both words are taken from the UTF-8 string. Word 1 would have been
   -- retrieved successfully from the original UTF-16 string, but word 2 and
   -- later words would not, especially if some of the double-byte characters
   -- happened to be byteNum 32 followed by byteNum X, like the Russian
   -- upper case R (decimal 1056).
   put word 1 of locFindReplaceText into locStrToFind
   put word 2 of locFindReplaceText into locReplacementStr

   -- This direct approach will not work with Unicode:
   --    replace locStrToFind with locReplacementStr in field "InputField" of this card
   -- Until LC engineers create a replaceUnicode command, use this approach:
   put the unicodeText of field "InputField" of this card into locInputText -- UTF-16
   put uniDecode(locInputText, "UTF8") into locInputText -- UTF-8

   put offset(locStrToFind, locInputText) into locOffset
   if (locOffset is not an integer) or (locOffset is 0) then
      set the unicodeText of field "SearchResult" to uniEncode("Your word 1 was not found.", "UTF8")
      exit mouseUp
   end if

   -- Splice: text before the match & replacement & text after the match
   put char 1 to (locOffset - 1) of locInputText into locHead
   put char (locOffset + length(locStrToFind)) to -1 of locInputText into locTail
   put locHead & locReplacementStr & locTail into locInputText -- UTF-8

   set the unicodeText of field "InputField" of this card to uniEncode(locInputText, "UTF8")
end mouseUp
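For anyone who wants to check the splice logic away from a stack, here is a minimal sketch of the same steps in plain Python (not LiveCode). The function name replace_first_utf8 is hypothetical, and the field contents are assumed to be little-endian UTF-16 bytes; like the handler, it is case-sensitive and replaces only the first occurrence.

```python
def replace_first_utf8(utf16_text: bytes, find: str, repl: str) -> bytes:
    """Replace the first occurrence of `find` with `repl` in UTF-16LE data.

    Hypothetical helper mirroring the handler: convert to UTF-8, locate the
    target by byte offset, splice head + replacement + tail, convert back.
    """
    utf8 = utf16_text.decode("utf-16-le").encode("utf-8")  # UTF-16 -> UTF-8
    target = find.encode("utf-8")

    pos = utf8.find(target)  # 0-based byte offset, -1 if not found
    if pos == -1:
        return utf16_text    # nothing to replace; return the input unchanged

    head = utf8[:pos]                 # bytes before the match
    tail = utf8[pos + len(target):]   # bytes after the match
    result = head + repl.encode("utf-8") + tail

    return result.decode("utf-8").encode("utf-16-le")      # UTF-8 -> UTF-16


# Made-up sample: "Rim i Riga" in Russian; both words start with upper case R.
field = "\u0420\u0438\u043c \u0438 \u0420\u0438\u0433\u0430".encode("utf-16-le")
updated = replace_first_utf8(field, "\u0420\u0438\u043c", "Rome")
print(updated.decode("utf-16-le"))
```

Searching by byte offset is safe here because a complete UTF-8 character sequence can only match at a character boundary, so the head and tail are never cut in the middle of a character.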
Slava