Converting from unicode to ASCII

Lagi Pittas iphonelagi at gmail.com
Wed Sep 23 16:17:44 EDT 2020


Hi Jacq,

Since you don't do Chinese, I think what I suggested would work, except
for Bulgarian and other non-Latin alphabets (for which you could use a
translation table).
It is also compatible with all the previous names, as the extraction and
tagging on the end will only happen with new Unicode file names.
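
To make the translation table idea concrete, here is a rough Python sketch.
The table is a deliberately incomplete, illustrative mapping for a few
Bulgarian Cyrillic letters, and the names CYRILLIC_TO_LATIN and
transliterate are made up for this example. A real table would need every
letter you expect to see, and the romanization you prefer.

```python
# Illustrative, incomplete table: a few Bulgarian Cyrillic letters only.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "и": "i", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "ч": "ch",
    "ш": "sh",
}


def transliterate(name):
    """Replace letters found in the table; leave everything else alone."""
    out = []
    for ch in name:
        lower = ch.lower()
        if lower in CYRILLIC_TO_LATIN:
            latin = CYRILLIC_TO_LATIN[lower]
            # Keep the capitalisation of the original letter.
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


print(transliterate("книга.txt"))  # kniga.txt
```

Anything not in the table passes through unchanged, so plain ASCII names
are not affected.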

Since most names will have at most 3 or 4 diacritics or non-ASCII
characters in them (big assumption), removing the non-ASCII characters but
tagging their &# codes on the end with a positional value
gives you the readable filename and the uniqueness in one hit - or am I
missing something?
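
Roughly what I have in mind, sketched in Python rather than LiveCode. The
function name munge, the "_" separator, the "position&#codepoint" tag
layout and putting the tags before the extension are all assumptions made
for this example, not anything already in your build tool. It also assumes
precomposed characters (one code point per accented letter).

```python
import os


def munge(filename):
    """Drop non-ASCII characters and tag each one on the end of the name."""
    stem, ext = os.path.splitext(filename)
    kept = []
    tags = []
    for pos, ch in enumerate(stem):
        if ord(ch) < 128:
            kept.append(ch)
        else:
            # Remember where the character was and which code point it had.
            tags.append(f"{pos}&#{ord(ch)}")
    if not tags:
        # Pure ASCII names come back untouched, so old names still match.
        return filename
    return "".join(kept) + "_" + "_".join(tags) + ext


print(munge("caf\u00e9.txt"))  # caf_3&#233.txt
print(munge("plain.txt"))      # plain.txt
```

Two names that differ only in an accent get different tags, so they stay
distinct, while the front of the name is still readable.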

Lagi

On Wed, 23 Sep 2020 at 20:59, J. Landman Gay via use-livecode <
use-livecode at lists.runrev.com> wrote:

> On 9/23/20 1:26 PM, Richard Gaskin via use-livecode wrote:
> > My only suggestion was to change how the existing munger works to
> > satisfy the two problem areas identified: that names not be too long,
> > and that any munger not remove so many characters as to make the file
> > name non-unique or empty.
>
> There's one more consideration though. The file name must be recognizable
> so that it can be replaced or updated on the server easily by a human.
> Hashes, UUIDs, html entity numbers, HEX, etc. would all be workable if
> that weren't the case.
>
> But consider the case where my client has made a few edits to the text in
> a stack and wants to replace the existing one. With descriptive names, the
> file is easy to find in the AWS bucket. But comparing long sequences of
> indecipherable text is cumbersome.
>
> I'm drifting toward the idea of removing non-ascii characters. That might
> satisfy all requirements, at least for now. We don't do Sanskrit or Chinese
> yet. Or alternately I could bite the bullet and convert my build tool to
> insert metadata into the clickable lists. That isn't terribly difficult, I
> was just wondering if there was a different way using what we already have.
>
> Devin, Paul and Scott suggested variations on the "removal" approach. I
> haven't tested much, but it looks like converting to UTF8 will quickly
> remove any non-ascii characters. Duplication of file names is unlikely
> given the way various product files are separately stored on AWS.
>
> But I'm still pondering. When I first asked the question, I wondered if
> there was a quick way to do what I want, though I didn't expect much. What
> I got back from this amazing list is a wealth of ideas and a very
> interesting discussion.
>
> --
> Jacqueline Landman Gay         |     jacque at hyperactivesw.com
> HyperActive Software           |     http://www.hyperactivesw.com
>


-- 
Kindest Regards Lagi


