Merge and unicode

J. Landman Gay jacque at hyperactivesw.com
Tue Sep 10 14:14:40 EDT 2019


I extracted an example. The main issue is curly quotes. The text came 
from FileMaker as UTF-8, which I textDecode to UTF-16. You can assume 
that all text is LC native throughout the app.
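A minimal sketch of the decoding step, assuming the FileMaker response arrives as raw UTF-8 bytes. The handler names below are hypothetical. After decoding, each curly quote should be a single code point: U+201C (8220) for the left quote and U+201D (8221) for the right. Finding both confirms that the decode is not where things go wrong.

```livecode
-- Hypothetical helpers; pRawData is assumed to hold UTF-8 bytes as received from FileMaker
function decodeFileMakerText pRawData
   -- textDecode converts the bytes into LiveCode's internal text
   return textDecode(pRawData, "UTF-8")
end decodeFileMakerText

function curlyQuoteCodepoints pText
   -- Collect the decimal code point of every left/right curly double quote
   local tList
   repeat for each codepoint tCP in pText
      if codepointToNum(tCP) is among the items of "8220,8221" then
         put codepointToNum(tCP) & comma after tList
      end if
   end repeat
   if tList is not empty then delete the last char of tList
   return tList
end curlyQuoteCodepoints
```

If `curlyQuoteCodepoints(decodeFileMakerText(tRawData))` returns `8220,8221` for an entry like the one below, the quotes are intact before merge ever sees them.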

Here is the template I use for merge:
<p leftindent="10" spacebelow="20"><span metadata="[[tMETADATA]]"><font 
size="16" color="#C77C02">[[tSECTION]]</font>[[tCONCEPT]]</span></p>
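One way to keep merged values from interfering with the surrounding htmlText is to escape them before calling merge. This sketch follows the idea Paul raises in the quoted thread (Unicode in htmlText written as numeric references). It assumes the htmlText parser accepts decimal references such as `&#8220;`. Paul's message mentions hex, so if decimal is rejected, the hex form `"&#x" & baseConvert(tNum, 10, 16) & ";"` is the alternative to try. The sketch also escapes `&`, `<`, `>` and the straight double quote, because any of those inside the metadata attribute would break the markup. The function name is hypothetical.

```livecode
-- Hypothetical helper: escape a value before it is merged into an htmlText template.
-- Assumes the htmlText parser accepts decimal numeric character references.
function escapeForHtmlText pText
   local tResult, tNum
   repeat for each codepoint tCP in pText
      put codepointToNum(tCP) into tNum
      switch tNum
         case 38 -- ampersand
            put "&amp;" after tResult
            break
         case 60 -- less-than
            put "&lt;" after tResult
            break
         case 62 -- greater-than
            put "&gt;" after tResult
            break
         case 34 -- straight double quote
            put "&quot;" after tResult
            break
         default
            if tNum > 127 then
               -- non-ASCII code point, e.g. 8220 becomes &#8220;
               put "&#" & tNum & ";" after tResult
            else
               put tCP after tResult
            end if
      end switch
   end repeat
   return tResult
end escapeForHtmlText
```

In use, each value would pass through the function before the merge, e.g. `put escapeForHtmlText(tPath) into tMETADATA` followed by `get merge(tTemplate)`, where `tPath` and `tTemplate` are illustrative names.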

In the field, this text is displayed accurately with curly quotes:
The New Testament Scholar
 <soft break> “Dare to reason!”

Here is a result of the merge:
<p leftindent="10" spacebelow="20" bgcolor="#FFDD71"><span metadata="The 
New Testament Scholar	EN07_ힿ�Dare to 
reason!ힿ�"><font size="16" color="#C77C02">The New 
Testament Scholar</font>“Dare to reason!”</span></p>

Notice that the displayed text uses entity names (&ldquo; and &rdquo;), 
while the metadata, which was created from the same text block as the 
field text, has turned the quotes into two numbers in the high 5000s, 
with no difference between the left and right quotes. I couldn't paste 
the actual text here because my mail client refused to render it, but 
the two numeric references appear as a single pictograph in LC's 
variable watcher, and they do not match the card path I need, which in 
this case is:
EN07_The New Testament Scholar<tab>“Dare to reason!”
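The mail client will not render the characters, so listing the non-ASCII code points in hex is one way to compare the field text and the metadata directly. A minimal sketch with a hypothetical function name. For the expected path above, the curly quotes should come back as U+201C and U+201D.

```livecode
-- Hypothetical diagnostic: list every non-ASCII code point in pText as U+hex
function nonAsciiCodepoints pText
   local tList, tNum
   repeat for each codepoint tCP in pText
      put codepointToNum(tCP) into tNum
      if tNum > 127 then
         put "U+" & baseConvert(tNum, 10, 16) & comma after tList
      end if
   end repeat
   if tList is not empty then delete the last char of tList
   return tList
end nonAsciiCodepoints
```

Running it on both the field text and the metadata of the same entry (for example `the metadata of char 1 of line 1 of field "TOC"`, with "TOC" as a placeholder field name) shows exactly which code points the merge produced.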

Maybe you can make sense of this? I've written an ugly workaround that 
pieces together the reference I need, but it would be better if I could 
just use the metadata. The metadata works fine as long as there are no 
quotes.
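Dar's tests in the quoted replies suggest that setting the metadata property directly keeps Unicode intact. That points to a workaround that drops the metadata attribute from the template: set the htmlText without `metadata`, then attach the path to the entry's characters afterwards. A minimal sketch, assuming one entry per line. The handler names and the field name are hypothetical. It targets the characters rather than the line because, as noted in the quoted thread, setting the metadata of a line is not the same as setting it on all of the line's characters.

```livecode
-- Hypothetical helper: attach a card path to every character of one TOC entry.
-- Assumes the entry occupies line pLineNum of field pFieldName and was added
-- via htmlText without a metadata attribute.
command setEntryPath pFieldName, pLineNum, pPath
   set the metadata of char 1 to -1 of line pLineNum of field pFieldName to pPath
end setEntryPath

function entryPath pFieldName, pLineNum
   -- Read the path back from the first character of the entry
   return the metadata of char 1 of line pLineNum of field pFieldName
end entryPath
```

This keeps the path out of the HTML entirely, so merge only has to handle the visible text.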

On 9/9/19 11:35 PM, dsc--- via use-livecode wrote:
> I think I'm doing this wrong. This seems to work, too.
> 
> on mouseup
>     put empty into field 1
>     put numToCodepoint(0x2200) into x
>     put numToCodepoint(0x1040F) & "V-" into y
>     put merge(" é{ [[x]] }é [[y]]") into field 1
> end mouseup
> 
> 
>> On Sep 9, 2019, at 10:25 PM, dsc--- via use-livecode <use-livecode at lists.runrev.com> wrote:
>>
>> And this, too, looks OK to me.
>>
>> on mouseup
>>    put empty into field 1
>>    put "A" into field 1
>>    get numToCodepoint(0x2200) & numToCodepoint(0x1040F) & "V-"
>>    set the metadata of char 1 of field 1 to it
>>    put the metadata of char 1 of field 1 after field 1
>> end mouseup
>>
>> I guess the problem is in the merge as you thought.
>>
>> I did notice in the dictionary that setting the metadata of a line is not the same as setting the metadata of all of the characters of the line.
>>
>> Dar Scott
>>
>>
>>> On Sep 9, 2019, at 8:58 PM, Dar Scott Consulting via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>
>>> This quick check seems to work for me.
>>>
>>> on mouseup
>>>    put "A" into field 1
>>>    set the metadata of char 1 of field 1 to "é"
>>>    put the metadata of char 1 of field 1 after field 1
>>> end mouseup
>>>
>>>
>>>> On Sep 9, 2019, at 8:32 PM, J. Landman Gay via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>
>>>> Well, I've made some changes to the code since I started urlEncoding the text before merging so I'll check that again. Paul is right that unicode in htmltext needs to be in hex, but the numbers I'm getting back are very high (8,000+) and render in the field as strange pictographs. Elsewhere where there is no merge, curly quotes translate to the named quote or apostrophe entities and are correct.
>>>>
>>>> By metadata I mean the LC term (see the dictionary) that allows you to attach some text to a field text chunk. The metadata isn't displayed in the field but you can use it for anything you want. In my case the field is a list of clickable entries in a table of contents, each with its own metadata attached that provides a path to the stack and card the entry needs to open.
>>>>
>>>> When I use normal LC text as metadata, diacriticals aren't rendered correctly (curly quotes become question marks), so the path is incorrect and the click goes nowhere.
>>>>
>>>> Since LC is supposed to be unicode throughout, I'd expect metadata to be compatible. The same text appears correctly when not used as metadata.
>>>> --
>>>> Jacqueline Landman Gay | jacque at hyperactivesw.com
>>>> HyperActive Software | http://www.hyperactivesw.com
>>>> On September 9, 2019 7:25:28 PM Dar Scott Consulting via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>
>>>>> I think you are trying to think too much about the LC implementation of text. Maybe.
>>>>>
>>>>> Text in LC is an abstraction of a sequence of code points. Whether it is UTF-16 or not is (mostly) hidden from me.
>>>>>
>>>>> So,
>>>>>
>>>>> get textDecode( binaryFromServer, "UTF-8" )
>>>>>
>>>>> should put that into the correct form, if it is really UTF-8.
>>>>>
>>>>> Data (binary bytes) is interpreted as native encoding if one tries to use it as text. I recommend against this. I try to always textDecode() everything coming in, but I make exceptions at times for ASCII.
>>>>>
>>>>> I'm not sure what you mean by metadata. Are you referring to HTTP content-type?
>>>>>
>>>>> Sorry, if I am off on a bunny trail...
>>>>>
>>>>> Dar
>>>>>
>>>>>> On Sep 9, 2019, at 4:38 PM, J. Landman Gay via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>>>
>>>>>> It's UTF8 text from a server, which I textDecode to UTF16. When I use the UTF16 text in a merge, diacriticals and/or curly quotes get mangled. (Same with setting metadata on field text too.)
>>>>>>
>>>>>> On 9/9/19 4:16 PM, Dar Scott Consulting via use-livecode wrote:
>>>>>>> I'm not sure I understand.
>>>>>>> Do you mean "encoded to UTF-16"? In that case you should decode that to convert it to internal text. And then try merge. (Which still might have problems, I suppose.)
>>>>>>>> On Sep 9, 2019, at 12:08 PM, J. Landman Gay via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> It seems that the merge command doesn't respect unicode. Does anyone have a workaround? The text I'm inserting is already decoded to UTF16.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jacqueline Landman Gay         |     jacque at hyperactivesw.com
>>>>>>>> HyperActive Software           |     http://www.hyperactivesw.com
>>>>>>>>
>>>>>>>>


-- 
Jacqueline Landman Gay         |     jacque at hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com




