Linux filenames in LC Server

matthias_livecode_150811 at m-r-d.de
Mon Aug 14 07:12:55 EDT 2023


Hi Mark,

When I read Neville's post I also thought about normalization. I really don't have a clue about the whole Unicode stuff, but I remembered that the standalone builder makes use of the normalizeText function. ;)

So I used this script on LC Server to write the seconds to a file with an a-umlaut (ä) in its name:

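-- build the filename in composed (NFC) form
put normalizeText("testä.txt", "NFC") into tFile
-- write the current seconds to that file
put the seconds into URL ("binfile:" & tFile)
-- empty on success, otherwise an error message
put the result
put "<br><br>"
-- list the files in the current folder
put the files
put "<br><br>"
put tFile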

But that does not work: "the result" returns "can't open file".
As I already wrote, I have no clue about Unicode, so I also tried NFD and the other two forms (NFKC and NFKD), but without success.
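To see what each form actually passes to the filesystem, a small diagnostic can print the codepoint count and UTF-8 bytes of the name in both forms and check whether a file with that exact name exists. This is a hypothetical sketch, not part of the original script; it assumes LiveCode 7 or later and builds the a-umlaut from its codepoint (U+00E4, decimal 228) so the test does not depend on how the literal "ä" in the .lc file is decoded.

```livecode
-- hypothetical diagnostic: compare the NFC and NFD forms of one filename
-- U+00E4 (228) is the composed a-umlaut
put "test" & numToCodepoint(228) & ".txt" into tName
repeat for each item tForm in "NFC,NFD"
   put normalizeText(tName, tForm) into tCandidate
   -- hex dump of the UTF-8 bytes this form would use as the filename
   get binaryDecode("H*", textEncode(tCandidate, "UTF-8"), tHex)
   put tForm && "codepoints:" && (the number of codepoints in tCandidate) & "<br>"
   put tForm && "UTF-8 bytes:" && tHex & "<br>"
   put tForm && "file exists:" && (there is a file tCandidate) & "<br><br>"
end repeat
```

For the NFC form the name should have 9 codepoints, with the bytes C3 A4 for the umlaut; for NFD it should have 10, with 61 CC 88 (a followed by U+0308 COMBINING DIAERESIS). Comparing these with a hex dump of the directory listing taken in a terminal shows which form, if either, the names on disk use.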

Is there anything else one has to keep in mind to get this working?


Regards,
Matthias



> On 14.08.2023 at 12:22, Mark Waddingham via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> On 2023-08-14 02:45, Neville Smythe via use-livecode wrote:
>> OK, so macOS *is* using UTF-8 for its file names - the [e-acute] in the filename Carré.txt is encoded as two bytes [C3A9], not as the single-byte MacRoman encoding. I got tricked by copying the terminal listing into another program rather than hex dumping within the terminal, and somewhere in the process the native encoding was preferred.
>> So one must *not* textEncode a filename to UTF-8 before writing a file to disk (LC deals with the encoding), although you *should* textEncode its contents.
>> Which leaves the problem of why I can't get LC Server on Linux to write non-ASCII filenames.
> 
> So I suspect the problem here is normalization, rather than the inability of Linux to write non-ascii filenames.
> 
> Characters such as e-acute / e-grave have *two* representations in Unicode - the decomposed and the composed form.
> 
> The composed form maps directly from the native encodings and is a single codepoint; the decomposed form is two codepoints (e, combining acute/grave).
> 
> Depending on where the string comes from, it might be either composed or decomposed - macOS filenames are stored decomposed in the FS, but the higher-level parts of the OS make either form work (in a similar fashion to how macOS filesystems are case-insensitive by default).
> 
> Linux filesystems, however, are both case-sensitive and form-sensitive - a filename must match byte for byte with what it was created with (indeed, Linux filesystems care nothing for encodings; they see filenames as a sequence of bytes, which are interpreted relative to the user's current locale - the default locale on Linux these days is UTF-8).
> 
> If your app is managing the files completely on Linux (i.e. it is creating / deleting them and the filenames are not user-editable) then (if this is the case) the problem should be fixable by choosing a normalization form when you create / look up the file - i.e. pass all filenames on the server through `normalizeText(<str>, <form>)` - where form is either "NFC" (composed) or "NFD" (decomposed).
> 
> Warmest Regards,
> 
> Mark.
> 
> P.S. For all the gory details about Unicode normalization forms see - https://unicode.org/reports/tr15/
> 
> -- 
> Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
> LiveCode: Build Amazing Things
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
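Mark's suggestion of passing every filename on the server through `normalizeText` can be sketched as one helper used both when a file is created and when it is looked up, so the bytes written to disk and the bytes used for the lookup always come from the same form. The handler names below (`serverFileName`, `saveSeconds`, `secondsFileExists`) are hypothetical, NFC is an arbitrary choice, and the sketch assumes LiveCode 7 or later.

```livecode
-- hypothetical helper: every filename the server creates or looks up
-- is normalized to one form (NFC here; NFD would work equally well)
function serverFileName pName
   return normalizeText(pName, "NFC")
end serverFileName

-- hypothetical command: write the seconds to the normalized filename
command saveSeconds pName
   put the seconds into URL ("binfile:" & serverFileName(pName))
   -- pass the write's error (empty on success) back via the result
   return the result
end saveSeconds

-- hypothetical lookup: same helper, so the names match byte for byte
function secondsFileExists pName
   return (there is a file serverFileName(pName))
end secondsFileExists

-- usage
saveSeconds "testä.txt"
put the result & "<br>"
put secondsFileExists("testä.txt")
```

Because the helper only makes names consistent, it targets mismatches between the names of files the app created and the names it later asks for.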