is it safe to rely on the hash of a livecode variable from a character encoding standpoint?

Mark Waddingham mark at livecode.com
Fri Jan 26 13:31:37 EST 2018


On 2018-01-26 18:50, Tom Glod via use-livecode wrote:
> Hi Everyone,
> 
> I want to ask how likely it is that at some point in the future some
> change in character encoding could start producing a different hash for
> the same sentence? just thinking about the nightmare scenarios facing a
> project that heavily uses hashing to verify and address
> content......in international characters......to boot.

The hash/digest functions (e.g. sha1Digest) operate on binary data. So 
if you do:

   put sha1Digest("foobar")

Then "foobar" is first converted to binary data using the native 
encoding (i.e. the backwards-compatibility rule we have), then that is 
hashed.

In every case where you produce a hash you have to explicitly choose an 
encoding - so pick your favourite (Unicode-friendly!) encoding, such as 
UTF-8, and do:

   get sha1Digest(textEncode(tMyString, tMyEncoding))
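
As a rough sketch of that pattern (the handler name utf8Sha1Hex is made 
up for illustration, and UTF-8 is only an assumed choice of encoding), 
you could wrap the encode-then-hash step in a function that also turns 
the binary digest into hex digits, so it is easy to store or compare:

   function utf8Sha1Hex pText
      local tDigest, tHex
      -- Encode explicitly so the bytes do not depend on the native encoding
      put sha1Digest(textEncode(pText, "UTF-8")) into tDigest
      -- sha1Digest returns binary data; unpack it as hex digits
      get binaryDecode("H*", tDigest, tHex)
      return tHex
   end utf8Sha1Hex

As long as every producer and consumer of the hash uses the same 
encoding, the same string should always give the same digest.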

If you are generating hashes of strings to send to existing things, then 
it should say *somewhere* in the docs of the thing you are sending what 
encoding to use before applying the hash.

Also be aware that Unicode allows the 'same' string to be encoded in 
multiple ways - so it's probably wise to choose a normalization form 
first too (see normalizeText) - otherwise you could have two strings 
which look the same (e.g. 'e' followed by a combining acute accent versus 
the single precomposed e-acute character) but hash to different values.
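
A minimal sketch of normalizing before hashing might look like this 
(normalizedSha1Hex is a hypothetical handler name, and NFC and UTF-8 are 
just assumed choices; any form and encoding work as long as they are 
applied consistently):

   function normalizedSha1Hex pText
      local tHex
      -- Normalize first, then encode, then hash
      put normalizeText(pText, "NFC") into pText
      get binaryDecode("H*", sha1Digest(textEncode(pText, "UTF-8")), tHex)
      return tHex
   end normalizedSha1Hex

   on mouseUp
      local tComposed, tDecomposed
      -- U+00E9: precomposed e-acute
      put numToCodepoint(233) into tComposed
      -- "e" followed by U+0301, the combining acute accent
      put "e" & numToCodepoint(769) into tDecomposed
      -- Compare the digests of the two spellings
      put normalizedSha1Hex(tComposed) is normalizedSha1Hex(tDecomposed)
   end mouseUp

With the normalizeText step in place the two spellings should hash to 
the same value; without it they would be encoded as different bytes and 
so give different digests.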

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



