Converting from unicode to ASCII
J. Landman Gay
jacque at hyperactivesw.com
Wed Sep 23 15:59:01 EDT 2020
On 9/23/20 1:26 PM, Richard Gaskin via use-livecode wrote:
> My only suggestion was to change how the existing munger works to satisfy the two problem areas
> identified: that names not be too long, and that any munger not remove so many characters as to
> make the file name non-unique or empty.
There's one more consideration, though. The file name must be recognizable so that it can be
replaced or updated on the server easily by a human. Hashes, UUIDs, HTML entity numbers, hex,
etc. would all be workable if that weren't the case.
But consider the case where my client has made a few edits to the text in a stack and wants to
replace the existing one. With descriptive names, the file is easy to find in the AWS bucket.
But comparing long sequences of indecipherable text is cumbersome.
I'm drifting toward the idea of removing non-ASCII characters. That might satisfy all
requirements, at least for now. We don't do Sanskrit or Chinese yet. Alternatively, I could
bite the bullet and convert my build tool to insert metadata into the clickable lists. That
isn't terribly difficult; I was just wondering if there was a different way using what we
already have.
Devin, Paul and Scott suggested variations on the "removal" approach. I haven't tested much,
but it looks like converting to UTF8 will quickly remove any non-ASCII characters. Duplication
of file names is unlikely given the way various product files are separately stored on AWS.
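
For anyone following along, here is a rough sketch of the "removal" idea, written in Python rather than LiveCode purely for illustration. The function name, the 64-character limit, and the "untitled" fallback are all made up for the example and are not from the build tool. It also adds one step beyond plain removal: it decomposes accented letters first (NFKD normalization) so the base letter survives and the name stays readable. It guards against the two problems Richard mentioned (names that are too long or that end up empty), but it does not check for duplicates.

```python
import unicodedata


def ascii_file_name(title, max_len=64, fallback="untitled"):
    """Return an ASCII-only, human-readable version of a title.

    Hypothetical helper: max_len and fallback are assumed values.
    """
    # Split accented letters into base letter + combining mark,
    # so that the base letter is kept when the mark is dropped.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop every character that has no ASCII representation.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace left behind by removed characters.
    cleaned = " ".join(ascii_only.split())
    # Keep the name from growing too long.
    cleaned = cleaned[:max_len].rstrip()
    # Never return an empty name.
    return cleaned or fallback


print(ascii_file_name("Café Menu"))
print(ascii_file_name("日本語"))
```

A title with no ASCII characters at all falls through to the fallback, which is exactly the case where a purely descriptive name stops working and metadata or some other identifier would be needed.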
But I'm still pondering. When I first asked the question, I wondered if there was a quick way
to do what I want, though I didn't expect much. What I got back from this amazing list is a
wealth of ideas and a very interesting discussion.
--
Jacqueline Landman Gay | jacque at hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com