Converting from unicode to ASCII

Wed Sep 23 11:03:44 EDT 2020

J. Landman Gay wrote:

 > I'm looking for a way to create non-unicode file names
 > based on the string that comes out of the database.

Ah, public clouds...

Amazon's S3 docs say that just encoding names in UTF-8 should suffice, 
but they also list a lot of characters they consider "special" that 
common usage doesn't treat as special at all, so conflicts like this are 
apparently abundant.

One workaround for their storage name limitations I've seen used 
elsewhere is hash-based names, giving you a string that is plain ASCII, 
of a fixed and usable length, and is derived from the file name so 
systems don't need to maintain a lookup table to find the file based on 
a given string.

This will give you a 40-char string in plain ol' ASCII that is, for 
practical purposes, unique to the input:

     function CleanHash s
        get binaryDecode("h*", sha1Digest(s), tHash)
        return tHash
     end CleanHash

e.g.:

     get CleanHash("MyFile.txt")

...returns:

     d9275b8f757ce47c240d276c1e1192dae8585eba
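
One caveat, since the names in question contain Unicode: as I understand 
it, in LiveCode 7 and later sha1Digest operates on binary data, so text 
passed to it directly is first converted to the native encoding, and 
characters outside that encoding may not stay distinct. A minimal sketch 
that encodes to UTF-8 first (CleanHashUTF8 is just an illustrative name, 
and textEncode assumes LiveCode 7 or later):

     function CleanHashUTF8 pName
        local tBytes, tHash
        -- Turn the text into UTF-8 bytes so every character is hashed
        put textEncode(pName, "UTF-8") into tBytes
        -- Same hex conversion as CleanHash: 20 digest bytes -> 40 hex chars
        get binaryDecode("h*", sha1Digest(tBytes), tHash)
        return tHash
     end CleanHashUTF8

For plain-ASCII names the UTF-8 bytes are the same as the native ones, 
so this should return the same hash as CleanHash above.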

 > ...When the user selects a name from a list, the selection is munged
 > to match the server name and the download URL is obtained from the
 > cron job's lookup file.
 >
 > We don't have a field in the database for a file name.

Since a hash is derived from the file name, you don't need to maintain a 
lookup table as you would with an arbitrary string like a UUID.

If I understand your problem correctly, file identification need only 
work in one direction, so just add the hash as part of your existing 
munge and you're pretty much done.

-- 
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com