Is anyone working with Arabic?

Mark Schonewille m.schonewille at economy-x-talk.com
Tue Aug 31 09:39:58 EDT 2010


Hi Lars,

I worked with Arabic a lot over several years, and it has always been a bit of a struggle for me. Note that Arabic is Unicode text, while the filter command works on ASCII. Unicode text is binary data and can contain return characters in places where you don't expect them (as well as quotes and tabs). Therefore, you can't filter Unicode text directly. You can filter Arabic text after decoding both the Arabic text and the filter pattern into hexadecimal data. After decoding, replace the A0 with cr. After filtering, replace cr with A0 and encode back to binary.
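To illustrate why this happens, here is a minimal sketch in Python (not Revolution script). It assumes the text is stored as little-endian UTF-16; the function names are hypothetical and only meant to show the idea. It lists the bytes in the encoded text that a byte-oriented command could mistake for ASCII control characters, and shows a hexadecimal round trip, one 4-digit unit per UTF-16 code unit.

```python
# Bytes that a byte-oriented text command may treat as special.
SPECIAL = {0x00: "NUL", 0x09: "tab", 0x0A: "LF", 0x0D: "CR", 0x22: "quote"}


def special_bytes(text):
    """Return (offset, name) for each special byte in the UTF-16LE form of text."""
    raw = text.encode("utf-16-le")  # assumption: little-endian UTF-16 storage
    return [(i, SPECIAL[b]) for i, b in enumerate(raw) if b in SPECIAL]


def to_hex_units(text):
    """Return the text as a list of 4-digit hex strings, one per UTF-16 code unit."""
    raw = text.encode("utf-16-le")
    return [raw[i:i + 2].hex() for i in range(0, len(raw), 2)]


def from_hex_units(units):
    """Inverse of to_hex_units: rebuild the text from its hex units."""
    return bytes.fromhex("".join(units)).decode("utf-16-le")


# alef with madda (U+0622), beh (U+0628), space, beh
sample = "\u0622\u0628 \u0628"
print(special_bytes(sample))  # [(0, 'quote'), (5, 'NUL')]
print(to_hex_units(sample))   # ['2206', '2806', '2000', '2806']
```

In this sample the first Arabic letter contains the same byte as an ASCII double quote, and the space is followed by a zero byte. Working on the hex units instead of the raw bytes avoids both.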

Applying the sort command to binary data doesn't make sense. You will have to write your own sorting routine. Of course, you can decode the binary data and sort that, but the result will be unsatisfactory.
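As a rough illustration in Python (again assuming little-endian UTF-16, with hypothetical function names), sorting the raw bytes gives a different order from sorting the decoded characters by code point. Neither is a true alphabetical order, which would need language-aware collation rules.

```python
def sort_by_bytes(words):
    """Sort by the raw UTF-16LE bytes, as a byte-oriented sort would."""
    return sorted(words, key=lambda w: w.encode("utf-16-le"))


def sort_by_code_point(words):
    """Sort the decoded text; Python compares strings by code point."""
    return sorted(words)


# beh (U+0628), Latin b (U+0062), alef (U+0627)
words = ["\u0628", "b", "\u0627"]
print(sort_by_bytes(words))       # alef, beh, Latin b
print(sort_by_code_point(words))  # Latin b, alef, beh
```

The byte sort compares the low byte of each code unit first, so the Latin letter ends up after the Arabic ones, while the code point sort puts it first.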

--
Best regards,

Mark Schonewille

Economy-x-Talk Consulting and Software Engineering
Homepage: http://economy-x-talk.com
Twitter: http://twitter.com/xtalkprogrammer
KvK: 50277553

From 15th August, we'll have time for new projects! Be the first in line and contact me now!

Download the Installer Maker plugin for Runtime Revolution at http://qurl.tk/ce

On 31 aug 2010, at 14:49, Lars Brehmer wrote:

> I am working with Arabic for the first time and have a problem. First of all, I do not speak, read or write Arabic, so I can't spot things that might be obvious.  This project is for a friend who is learning Arabic. He will eventually enter his own content, and my Arabic content is just gibberish for testing purposes.
> 
> So far, most things work as predicted - the Arabic font, right to left text, etc. (looks weird by the way - the insertion point stays at the right as new characters are added/deleted at the left)
> 
> The only problem so far is this:
> 
> The bilingual content is stored as a tab delimited list in a custom property. When I filter the custom property without empty, I lose all of the content of the custom property except the first line or 2. I finally noticed after much frustration that this seems to happen when there is more than one Arabic word in an item; that is, I lose all of the content starting with the first space in an Arabic item.
> The only thing I could think of is that the ASCII number of a space using the Arabic keyboard layout differs from the ASCII number of a European keyboard layout, but they are both 32.
> 
> I found out a long time ago that when a " appears in such a tab delimited list custom property, it causes weird things to happen, but I can easily work around it, and it never caused content in the custom property to just disappear while filtering without empty. Is there something about Arabic text that I obviously don't know? For now, all I can do is NOT filter without empty, which I do routinely in scripts because an empty line in the custom property would be a big problem for this particular stack.
> 
> Anyone have an idea about this?
> 
> Also, when Runrev "supports" Arabic, does that mean that sorting alphabetically works correctly? I can see that the text lines are indeed sorted by the first letter, but I obviously can't tell whether it is alphabetical.
> 
> Cheers,
> 
> 
> Lars




