Converting from unicode to ASCII

Wed Sep 23 15:59:01 EDT 2020

On 9/23/20 1:26 PM, Richard Gaskin via use-livecode wrote:
> My only suggestion was to change how the existing munger works to satisfy the two problem areas 
> identified: that names not be too long, and that any munger not remove so many characters as to 
> make the file name non-unique or empty.

There's one more consideration, though. The file name must be recognizable so that it can be 
replaced or updated on the server easily by a human. Hashes, UUIDs, HTML entity numbers, hex, 
etc. would all be workable if that weren't the case.

But consider the case where my client has made a few edits to the text in a stack and wants to 
replace the existing one. With descriptive names, the file is easy to find in the AWS bucket. 
But comparing long sequences of indecipherable text is cumbersome.

I'm drifting toward the idea of removing non-ASCII characters. That might satisfy all 
requirements, at least for now. We don't do Sanskrit or Chinese yet. Alternatively, I could 
bite the bullet and convert my build tool to insert metadata into the clickable lists. That 
isn't terribly difficult; I was just wondering if there was a different way using what we 
already have.

Devin, Paul and Scott suggested variations on the "removal" approach. I haven't tested much, 
but it looks like converting to UTF8 will quickly remove any non-ASCII characters. Duplication 
of file names is unlikely given the way various product files are separately stored on AWS.
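
For illustration only, here is a rough sketch of the "removal" idea. It is written in Python 
rather than LiveCode, and the function name, the 64-character limit and the "untitled" fallback 
are all made up for the example. It also adds one step nobody in the thread specified: accented 
letters are decomposed first, so that "é" is kept as "e" instead of vanishing. The length cap and 
the fallback address Richard's two concerns about names that are too long or end up empty.

```python
import unicodedata


def ascii_file_name(name, max_len=64, fallback="untitled"):
    """Return an ASCII-only, human-readable version of a file name.

    Hypothetical helper: the name, max_len and fallback are assumptions
    made for this sketch, not anything from the build tool.
    """
    # Split accented letters into a base letter plus a combining mark,
    # so the base letter survives the ASCII filter below.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop every character that has no ASCII representation.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace left behind by removed characters,
    # then enforce the length limit.
    cleaned = " ".join(ascii_only.split())[:max_len].strip()
    # Never return an empty name.
    return cleaned or fallback
```

A name made up entirely of characters outside ASCII (Chinese, for instance) comes back as the 
fallback, so this only stays useful as long as the titles are mostly Latin text.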

But I'm still pondering. When I first asked the question, I wondered if there was a quick way 
to do what I want, though I didn't expect much. What I got back from this amazing list is a 
wealth of ideas and a very interesting discussion.

-- 
Jacqueline Landman Gay         |     jacque at hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com