Converting from Unicode to ASCII

Wed Sep 23 14:26:58 EDT 2020

If I understand her problem correctly, file identification need only work 
in one direction.

As far as I can tell from the description, everything that needs to 
determine which file to access does so by using a string from which the 
hashed file name can be derived.
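To make the one-direction lookup concrete, here is a minimal LiveCode 
sketch. It assumes LiveCode 7 or later (for textEncode). The function 
name DownloadURLFor and the base URL are hypothetical placeholders, not 
part of her setup. It uses the same sha1Digest/binaryDecode approach as 
CleanHash in the quoted message, but encodes the selection as UTF-8 
first so non-ASCII characters are hashed rather than lost:

  -- Hypothetical: derive the server location from the user's selection.
  -- kBaseURL is a placeholder; substitute the real storage location.
  constant kBaseURL = "https://example.com/files/"

  function DownloadURLFor pSelection
     local tHash
     -- Hash the UTF-8 bytes of the selection, then hex-encode the digest
     get binaryDecode("h*", sha1Digest(textEncode(pSelection, "UTF-8")), tHash)
     return kBaseURL & tHash
  end DownloadURLFor

Nothing ever has to go from the hash back to the original name: the 
selection string is always at hand, so the hash is simply recomputed.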

That she already has a munger to derive the file name seems to reinforce 
that.

My only suggestion was to change how the existing munger works to 
satisfy the two problem areas identified: that names not be too long, 
and that any munger not remove so many characters as to make the file 
name non-unique or empty.
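A minimal sketch of such a munger, again assuming LiveCode 7 or later. 
MungedName and the 20-character prefix limit are hypothetical choices, 
not her existing code. It keeps a short human-readable prefix of plain 
ASCII characters and appends the SHA-1 hash of the full name, so the 
result is at most 61 characters and is never empty:

  -- Hypothetical munger: readable ASCII prefix plus SHA-1 hash of the full name.
  function MungedName pName
     local tPrefix, tHash
     -- Drop everything except ASCII letters, digits, dot, underscore and hyphen
     put replaceText(pName, "[^A-Za-z0-9._-]", "") into tPrefix
     -- Cap the readable part so the total length stays bounded
     put char 1 to 20 of tPrefix into tPrefix
     -- Hash the UTF-8 bytes of the original name, not the stripped prefix
     get binaryDecode("h*", sha1Digest(textEncode(pName, "UTF-8")), tHash)
     if tPrefix is empty then return tHash
     return tPrefix & "-" & tHash
  end MungedName

Because the hash comes from the unstripped name, two names that munge 
to the same prefix (or to nothing at all) still get different storage 
names, barring a practically improbable SHA-1 collision.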

In some respects the benefits of a hash in this case are similar to 
those of a UUID: a fixed-length, plain-ASCII name. But a UUID is 
arbitrary and therefore requires establishing and maintaining a lookup 
table. In contrast, a hash is directly derivable from the file name, 
providing the same benefit as a UUID for this case but without the need 
for a lookup table.

As the old saying goes, "There are only two hard things in computer 
science: cache invalidation and naming things."

Lookup tables are effectively a form of cache: a secondary copy of the 
data, very useful at times but best avoided unless truly necessary.

- Richard Gaskin
   Fourth World Systems

Bob Sneidar bobsneidar at iotecdigital.com

 > How do you get back to the filename?

 > On Sep 23, 2020, at 8:03 AM, Richard Gaskin wrote:
 >
 >> One workaround for their storage name limitations I've seen used
 >> elsewhere is hash-based names, giving you a string that is plain
 >> ASCII, of a fixed and usable length, and is derived from the file
 >> name so systems don't need to maintain a lookup table to find the
 >> file based on a given string.
 >>
 >> This will give you a 40-char string in plain ol' ASCII that is
 >> effectively unique to the input:
 >>
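 >>   function CleanHash s
 >>      local tHash
 >>      -- Encode as UTF-8 so non-ASCII characters are hashed, not lost
 >>      get binaryDecode("h*", sha1Digest(textEncode(s, "UTF-8")), tHash)
 >>      return tHash
 >>   end CleanHash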
 >>
 >> e.g.:
 >>
 >>   get CleanHash("MyFile.txt")
 >>
 >> ...returns:
 >>
 >>   d9275b8f757ce47c240d276c1e1192dae8585eba
 >>
 >>> ...When the user selects a name from a list, the selection is munged
 >>> to match the server name and the download URL is obtained from the
 >>> cron job's lookup file.
 >>>
 >>> We don't have a field in the database for a file name.
 >>
 >> Since a hash is derived from the file name, you don't need to
 >> maintain a lookup table as you would with an arbitrary string like
 >> a UUID.
 >
 >> If I understand your problem correctly, and file identification need
 >> only work in one direction, just add the hash to your existing
 >> munge and you're pretty much done.
 >>
 >> --
 >> Richard Gaskin