Converting from Unicode to ASCII
Richard Gaskin
ambassador at fourthworld.com
Wed Sep 23 14:26:58 EDT 2020
If I understand her problem correctly, file identification need only work
in one direction: from a name to the stored file, never back again.
As far as I can tell from the description, everything that needs to
determine which file to access does so by using a string from which the
hashed file name can be derived.
That she already has a munger to derive the file name seems to reinforce
that.
My only suggestion was to change the existing munger so it addresses
the two problems identified: names must not be too long, and the munger
must not strip so many characters that the resulting file name is no
longer unique, or is empty.
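A minimal LiveCode sketch of that change follows. It assumes a munge rule of "keep only ASCII letters, digits, dot, underscore and hyphen" and a 40-character cap on the readable part; both are hypothetical, since the thread does not show the existing munger. MungedName and kMaxPrefix are illustrative names, not from the thread. The hash is decoded with "h*", as in the CleanHash function quoted below.

```livecode
-- Hypothetical cap on the human-readable part of the stored name
constant kMaxPrefix = 40

-- Hypothetical helper: returns a readable ASCII prefix of pName followed
-- by a 40-digit hex SHA-1 hash of the full original name. The result is
-- never empty and its length is bounded (at most 40 + 1 + 40 chars).
function MungedName pName
   local tPrefix, tHash
   -- Assumed munge rule: drop anything that is not an ASCII letter,
   -- digit, dot, underscore or hyphen
   put replaceText(pName, "[^A-Za-z0-9._-]", "") into tPrefix
   -- Cap the readable part so the whole name stays short
   put char 1 to kMaxPrefix of tPrefix into tPrefix
   -- Hash the UTF-8 bytes of the *original* name, so two names that
   -- munge to the same prefix still get different hashes
   get binaryDecode("h*", sha1Digest(textEncode(pName, "UTF-8")), tHash)
   -- If munging removed everything, the hash alone is still a valid name
   if tPrefix is empty then
      return tHash
   end if
   return tPrefix & "_" & tHash
end MungedName
```

Because the hash covers the full original name, names that differ only in characters the munge strips (accented letters, for example) still map to different files, and a name made entirely of non-ASCII characters falls back to the bare hash instead of an empty string.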
In some respects a hash offers the same benefit here as a UUID. But a
UUID is arbitrary, so it requires establishing and maintaining a lookup
table. A hash, in contrast, is derived directly from the file name,
giving the same benefit as a UUID without the need for a lookup table.
As the old saying goes, "There are two hard problems in computer
science: cache invalidation, and naming things".
Lookup tables are effectively a form of cache, a secondary replication
of data: very useful at times, but best avoided unless absolutely necessary.
- Richard Gaskin
Fourth World Systems
Bob Sneidar bobsneidar at iotecdigital.com
> How do you get back to the filename?
> On Sep 23, 2020, at 8:03 AM, Richard Gaskin wrote:
>
>> One workaround for their storage name limitations I've seen used
>> elsewhere is hash-based names, giving you a string that is plain
>> ASCII, of a fixed and usable length, and is derived from the file
>> name so systems don't need to maintain a lookup table to find the
>> file based on a given string.
>>
>> This will give you a 40-char string in plain ol' ASCII that is, for
>> all practical purposes, unique to the input:
>>
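>> function CleanHash s
>> -- encode as UTF-8 first: sha1Digest works on binary data, and
>> -- non-ASCII characters could otherwise be lost in conversion
>> get binaryDecode("h*", sha1Digest(textEncode(s, "UTF-8")), tHash)
>> return tHash
>> end CleanHash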
>>
>> e.g.:
>>
>> get CleanHash("MyFile.txt")
>>
>> ...returns:
>>
>> d9275b8f757ce47c240d276c1e1192dae8585eba
>>
>>> ...When the user selects a name from a list, the selection is munged
>>> to match the server name and the download URL is obtained from the
>>> cron job's lookup file.
>>>
>>> We don't have a field in the database for a file name.
>>
>> Since a hash is derived from the file name, you don't need to
>> maintain a lookup table as you would with an arbitrary string like
>> UUID.
>
>> If I understand your problem correctly, and file identification need
>> only work in one direction, just add the hash as part of your
>> existing munge and you're pretty much done.
>>
>> --
>> Richard Gaskin
More information about the use-livecode mailing list