Still struggling with Unicode

Graham Samuel livfoss at mac.com
Sun Aug 31 08:05:58 EDT 2014


Richmond -

Sorry to hear you're in hospital, Richmond - get better soon! I wonder what Bulgarian hospital food is like...

What you tell me fills me with trepidation. My notion is that there will be a word-processing app, totally out of my control, running on some box (PC, Mac or Linux - probably the latter) which accepts input of symbols not easy to key in at my end but easy for that user (perhaps they have an Anglo-Saxon or a Sanskrit, or Greek, or even a Mathematical keyboard). The word processor allows styles (bold, italic etc.), and then it uses these frankly insane Unicode variants to incorporate some of the styles into the very characters (codepoints) themselves. Such text is then copied and pasted into my LC app, which is then doomed (if its job is to treat all or some of these characters as operators rather than just letters of some alphabet) either to recognise all the variants, or to put up with the fact that the user will be mystified when one version of a character (codepoint) is recognised and another, just a little bit different, isn't. This goes against years and years of treating styled text as an add-on to plain text which can be stripped out by program. Yuk.
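
One possible way out, as far as I can tell, is Unicode's compatibility normalisation (form NFKC), which is meant to fold the mathematical "styled" letters back onto their plain counterparts. Here is a minimal sketch, written in Python rather than LC purely so it can be run anywhere; the helper name fold_styles is my own invention, and whether the same approach suits an LC app is an assumption I haven't tested.

```python
import unicodedata

def fold_styles(text):
    """Fold compatibility variants (e.g. mathematical italic pi) onto plain characters."""
    # NFKC = compatibility decomposition followed by canonical composition.
    return unicodedata.normalize("NFKC", text)

# The styled small-pi code points mentioned in this thread (hypothetical test data).
variants = [0x1D6D1, 0x1D70B, 0x1D745, 0x1D77F, 0x1D7B9]

for cp in variants:
    folded = fold_styles(chr(cp))
    print(f"U+{cp:05X} -> U+{ord(folded):04X}")
```

Each of the five comes out as U+03C0, the ordinary Greek pi. One caution: NFKC folds plenty of other things too (ligatures, superscript digits, full-width forms), so it is probably safer to normalise a copy used for matching than to overwrite the user's text.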

Or maybe I was just unlucky in choosing pi. Pity, because that's one of the ones I'm really using in the current project.
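
On the "unlucky" question, a rough check suggests pi is not special: ordinary Latin letters seem to have just as many styled doubles in the Maths area. This is a sketch in Python (not LC), assuming the block runs from U+1D400 to U+1D7FF as the chart Richmond linked indicates; the function name styled_variants is made up for the example.

```python
import unicodedata

# Mathematical Alphanumeric Symbols block (U+1D400..U+1D7FF).
MATH_BLOCK = range(0x1D400, 0x1D800)

def styled_variants(plain):
    """Return the code points in the block that NFKC-normalise to `plain`."""
    return [cp for cp in MATH_BLOCK
            if unicodedata.normalize("NFKC", chr(cp)) == plain]

# Compare Greek small pi with a couple of everyday Latin letters.
for letter in ("\u03c0", "a", "x"):
    hexes = ", ".join(f"U+{cp:05X}" for cp in styled_variants(letter))
    print(f"U+{ord(letter):04X}: {hexes}")
```

If every letter has its own row of look-alikes, then the problem is general and any operator-like character I choose would need the same treatment.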

Cheers

Graham

On 31 Aug 2014, at 10:03, Richmond <richmondmathewson at gmail.com> wrote:

> 
> On 30.08.2014 19:50, Graham Samuel wrote:
>> Thanks for the swift response, Richmond. Are you in Bulgaria?
> 
> Yup: in hospital :(
> 
>> 
>> I kind of understand what you mean. But using your example, if someone had an Anglo-Saxon word processor, and chose to italicise 'thorn' (if that is meaningful!), and then pasted a string containing this from the word processor to an LC program, would it have to be recognised by the program separately from the 'plain text' version of 'thorn' or what? It seems to me that if Unicode actually includes styling information (italic, bold, different colours???) we are all doomed!
> 
> Unicode does not include styling info as such.
> 
> 1D70B lies in the Mathematical Alphanumeric Symbols area [ check that here: http://www.unicode.org/charts/ - from now on you may find that this website becomes your second home ] and is, indeed, an italicised Pi. All this really means is that whoever on the Unicode Consortium's committee makes the judgement calls re Maths symbols has made a judgement call.
> 
> However that Maths area [ http://www.unicode.org/charts/PDF/U1D400.pdf ] contains several Pi symbols: normal Pi, italic Pi, bold Pi with wiggly legs, apple Pi, and so on:
> 
> 1D6D1, 1D70B, 1D745, 1D77F; frankly one wonders what that committee member was smoking :)
> 
> However, you could be 'boring' and stick with 3C0 from the 'normal' Greek area.
> 
> Make yourself a stack (using version 7) with one button and 2 flds ("ff" & "gg")
> and put this code into your button:
> 
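> on mouseUp
>    -- Ask for a code point in hex, e.g. 1D70B or 3C0
>    ask "Character Number"
>    if it is not empty then
>       put it into MAGIC
>       put MAGIC into fld "ff"
>       -- Prefix "0x" so that the entry is read as a hexadecimal number
>       put "0x" & MAGIC into BMAGIC
>       -- Show the character at that code point (needs a font with the glyph)
>       put numToCodepoint(BMAGIC) into fld "gg"
>    end if
> end mouseUp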
> 
> The great advantage of version 7 is that one doesn't have to mess around with a calculator
> converting Hex numbers into Decimal ones.
> 
> Then you will be able to check whether the fonts on your system have the glyphs for those
> Unicode code points.
> 
> Richmond.
> 
>> 
>> TIA
>> 
>> Graham
>> 
>> On 30 Aug 2014, at 18:44, Richmond <richmondmathewson at gmail.com> wrote:
>> 
>>> On 30.08.2014 19:29, Graham Samuel wrote:
>>>> I know people are lining up for the conference (wish I was there!) so I am not sure who's listening, but here goes.
>>>> 
>>>> On advice from Fraser Gordon, I've been trying to use LC 7 to experiment with Unicode. I've had some tricky problems with the latest version in the LC 'downloads' catalogue (DP10), so I'm having to work somewhat in the abstract (I mean I can't get my actual app script to run, so I'm just using the Message Box).
>>>> 
>>>> I have been looking on the internet at various representations of Unicode characters (OK, codepoints). It seems that there are some forms that include formatting information and some that don't. For example, choosing that old chestnut, Greek letter lower case pi, a search appears to reveal:
>>>> 
>>>> U+1D70B seems to represent it in italic (written in LC as 0x1D70B)
>>>> U+1D7B9 in sans-serif bold italic (written in LC as 0x1D7B9)
>>>> 
>>>> but
>>>> 
>>>> U+03C0 (written in LC as 0x3C0) appears to be pi with the formatting ignored,
>>>> 
>>>> and finally I swear that some PDF I downloaded from the Unicode Consortium gave
>>>> 
>>>> U+1D77F as a legitimate representation of pi - (written 0x1D77F)
>>>> 
>>>> Sure enough, in the LC 7 message box, ALL these generate a pi glyph.
>>>> 
>>>> Can anyone explain what this means, and what I should do if someone pastes a Unicode string containing pi into my app - I mean how should I recognise it? Can I strip off the style info, and if so what is the rule for doing that?
>>>> 
>>>> If this happens for a little old symbol that we've all been using since childhood, what chance do we have with more exotic stuff?
>>>> 
>>>> Puzzled
>>>> 
>>>> Graham
>>>> 
>>> The Unicode standard organises characters into blocks, broadly by writing system.
>>> 
>>> So, thinking on the fly about pi, I would expect it to be in:
>>> 
>>> 1. Greek script.
>>> 
>>> 2. Coptic script.  and
>>> 
>>> 3. Mathematical signs. To say the least.
>>> 
>>> MY recommendation is to go here: http://www.unicode.org/charts/
>>> 
>>> and find the chart that has a Pi at the lowest Unicode address.
>>> 
>>> Just recently I fell into a hole by using an Anglo-Saxon 'thorn' from the "wrong place";
>>> by "wrong place" I mean that the character range containing the thorn I chose was not included
>>> in the standard fonts issued with operating systems.
>>> 
>>> Richmond.
>>> 
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode



