Converting from unicode to ASCII

Bob Sneidar bobsneidar at iotecdigital.com
Wed Sep 23 14:40:45 EDT 2020


Understood, but if it were reversible, it would eliminate the necessity of a lookup table as an intermediary. 
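
To make that trade-off concrete, here is a minimal sketch in Python rather than LiveCode, purely for illustration. The function names hash_name, hex_name and unhex_name are hypothetical and appear nowhere in this thread. It contrasts a one-way SHA-1 name with a reversible hex encoding of the UTF-8 bytes. The hex digits may not match CleanHash's output, since that depends on the nibble order binaryDecode uses.

```python
import hashlib


def hash_name(name: str) -> str:
    """One-way: always 40 hex characters; the name cannot be recovered from it."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def hex_name(name: str) -> str:
    """Reversible: plain-ASCII hex of the UTF-8 bytes; length grows with the name."""
    return name.encode("utf-8").hex()


def unhex_name(encoded: str) -> str:
    """Inverse of hex_name: turn the hex string back into the original name."""
    return bytes.fromhex(encoded).decode("utf-8")


if __name__ == "__main__":
    name = "MyFile.txt"
    print(len(hash_name(name)))                # fixed length regardless of input
    print(hex_name(name))                      # two hex characters per UTF-8 byte
    print(unhex_name(hex_name(name)) == name)  # round trip restores the name
```

The reversible form needs no lookup table, but it doubles the byte length of the name, so it does nothing for the "names not too long" requirement. The hash keeps the length fixed at the cost of being one-way.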

Bob S


> On Sep 23, 2020, at 11:26 AM, Richard Gaskin via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> If I understand her problem correctly, file identification need only be in one direction.
> 
> As far as I can tell from the description, everything that needs to determine which file to access does so by using a string from which the hashed file name can be derived.
> 
> That she already has a munger to derive the file name seems to reinforce that.
> 
> My only suggestion was to change how the existing munger works to satisfy the two problem areas identified: that names not be too long, and that any munger not remove so many characters as to make the file name non-unique or empty.
> 
> In some respects the benefits of a hash in this case are similar to using a UUID. But a UUID is arbitrary and therefore requires establishing and maintaining a lookup table. In contrast, a hash is directly derivable from the file name, providing the same benefit as a UUID for this case but without the need for a lookup table.
> 
> Like the old saying goes, "There are two hard problems in computer science: cache invalidation, and naming things".
> 
> Lookup tables are effectively a form of cache, a secondary replication of data, very useful at times but best avoided unless absolutely necessary.
> 
> 
> - Richard Gaskin
>  Fourth World Systems
> 
> 
> 
> Bob Sneidar bobsneidar at iotecdigital.com
> 
> > How do you get back to the filename?
> 
> > On Sep 23, 2020, at 8:03 AM, Richard Gaskin wrote:
> >
> >> One workaround for their storage name limitations I've seen used
> >> elsewhere is hash-based names, giving you a string that is plain
> >> ASCII, of a fixed and usable length, and is derived from the file
> >> name so systems don't need to maintain a lookup table to find the
> >> file based on a given string.
> >>
> >> This will give you a 40-char string in plain ol' ASCII that is, for
> >> practical purposes, unique to the input:
> >>
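> >>   function CleanHash s
> >>      local tHash
> >>      -- Hash the UTF-8 bytes of s so Unicode names hash consistently,
> >>      -- then unpack the 20-byte digest into 40 hex characters
> >>      get binaryDecode("h*", sha1Digest(textEncode(s, "UTF-8")), tHash)
> >>      return tHash
> >>   end CleanHash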
> >>
> >> e.g.:
> >>
> >>   get CleanHash("MyFile.txt")
> >>
> >> ...returns:
> >>
> >>   d9275b8f757ce47c240d276c1e1192dae8585eba
> >>
> >>> ...When the user selects a name from a list, the selection is munged
> >>> to match the server name and the download URL is obtained from the
> >>> cron job's lookup file.
> >>>
> >>> We don't have a field in the database for a file name.
> >>
> >> Since a hash is derived from the file name, you don't need to
> >> maintain a lookup table as you would with an arbitrary string like
> >> UUID.
> >
> >> If I understand your problem correctly, file identification need
> >> only work in one direction, so just add the hash as part of your
> >> existing munge and you're pretty much done.
> >>
> >> --
> >> Richard Gaskin
> 
> 