Converting from unicode to ASCII
Bob Sneidar
bobsneidar at iotecdigital.com
Wed Sep 23 14:40:45 EDT 2020
Understood, but if it were reversible, it would eliminate the necessity of a lookup table as an intermediary.
Bob S
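
For illustration only, here is a minimal sketch of the one-directional use discussed in the quoted thread below. It assumes the client already holds the list of display names, so the hash never has to be turned back into a name (SHA-1 is one-way). The handler name ServerNameFor, the field name "FileList", and the UTF-8 encoding step are assumptions for this sketch, not part of anyone's existing munger. The "h*" format is kept as in the original CleanHash.

```livecode
-- Hypothetical helper: derive the stored name from a display name.
-- Encoding to UTF-8 first is an assumption, so that non-ASCII names
-- are always hashed from the same bytes.
function ServerNameFor pDisplayName
   local tHash
   get binaryDecode("h*", sha1Digest(textEncode(pDisplayName, "UTF-8")), tHash)
   return tHash
end ServerNameFor

-- Hypothetical usage: the user picks a name from a list field, and the
-- client recomputes the hash instead of consulting a lookup table.
on mouseUp
   local tServerName
   put ServerNameFor(the selectedText of field "FileList") into tServerName
   -- tServerName is now a 40-character plain-ASCII string
end mouseUp
```

Going the other way, from a stored hash back to a display name, would still need either the original list or a lookup table.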
> On Sep 23, 2020, at 11:26 AM, Richard Gaskin via use-livecode <use-livecode at lists.runrev.com> wrote:
>
> If I understand her problem correctly, file identification need only be in one direction.
>
> As far as I can tell from the description, everything that needs to determine which file to access does so by using a string from which the hashed file name can be derived.
>
> That she already has a munger to derive the file name seems to reinforce that.
>
> My only suggestion was to change how the existing munger works to satisfy the two problem areas identified: that names not be too long, and that any munger not remove so many characters as to make the file name non-unique or empty.
>
> In some respects the benefits of a hash in this case are similar to using a UUID. But UUID is arbitrary and therefore requires establishing and maintaining a lookup table. In contrast, a hash is directly derivable from the file name, providing the same benefit as UUID for this case but without the need for a lookup table.
>
> Like the old saying goes, "There are two hard problems in computer science: cache invalidation, and naming things".
>
> Lookup tables are effectively a form of cache, a secondary replication of data, very useful at times but best avoided unless absolutely necessary.
>
>
> - Richard Gaskin
> Fourth World Systems
>
>
>
> Bob Sneidar bobsneidar at iotecdigital.com
>
> > How do you get back to the filename?
>
> > On Sep 23, 2020, at 8:03 AM, Richard Gaskin wrote:
> >
> >> One workaround for their storage name limitations I've seen used
> >> elsewhere is hash-based names, giving you a string that is plain
> >> ASCII, of a fixed and usable length, and is derived from the file
> >> name so systems don't need to maintain a lookup table to find the
> >> file based on a given string.
> >>
> >> This will give you a 40-char string in plain ol' ASCII unique to the
> >> input:
> >>
> >> function CleanHash s
> >> get binaryDecode("h*", sha1Digest(s), tHash)
> >> return tHash
> >> end CleanHash
> >>
> >> e.g.:
> >>
> >> get CleanHash("MyFile.txt")
> >>
> >> ...returns:
> >>
> >> d9275b8f757ce47c240d276c1e1192dae8585eba
> >>
> >>> ...When the user selects a name from a list, the selection is munged
> >>> to match the server name and the download URL is obtained from the
> >>> cron job's lookup file.
> >>>
> >>> We don't have a field in the database for a file name.
> >>
> >> Since a hash is derived from the file name, you don't need to
> >> maintain a lookup table as you would with an arbitrary string like
> >> UUID.
> >
> >> If I understand your problem correctly, and file identification need
> >> only be in one direction, just add the hash as part of your existing
> >> munge and you're pretty much done.
> >>
> >> --
> >> Richard Gaskin
>
>