Linux file names in LC Server

Neville Smythe neville.smythe at optusnet.com.au
Sun Aug 13 08:45:39 EDT 2023


As we know, with LC it is pretty straightforward to deal with internationalised text for remote databases and unknown user platforms by converting to UTF-8. But I have come across a problem with Linux filenames containing non-ASCII characters which has me befuddled.

My many-years-old app has until now just required all filenames to be in standard 7-bit ASCII, so it was way past time I brought it up to date.

The app talks to a database, media and web site on a Unix (DreamHost) server, using LC Server as intermediary.

I create a file, say “Carré.txt”, on a Mac. The non-ASCII character in that name is [e-acute]; I shall use this bracket convention from now on to ensure that what is displayed here on the forum is understood.

BTW, as far as I can determine, that character in the Mac file system is the single byte hex [8E], the classic MacRoman encoding, not its 2-byte UTF-8 encoding [C3 A9]. So I don’t understand how macOS handles Unicode in its file system, which it certainly does. We are exhorted to textEncode to UTF-8 when exporting anything outside LC, but perhaps not filenames? If I textEncode the filename and save with that name, I get a new file “Carr[squareroot][copyright].txt”. I am befuddled already (how does macOS distinguish MacRoman encoding from Unicode encoding when it displays a file name?), but that is another story for another place.
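As a sanity check outside LC, here is a minimal Python sketch of the byte values mentioned above. It only uses Python's standard `mac_roman` and `utf-8` codecs; it makes no claim about how LC or macOS store the name internally, and the variable names are purely illustrative.

```python
# Illustrative only: compare the two encodings of e-acute (U+00E9).
E_ACUTE = "\u00e9"

mac_roman_bytes = E_ACUTE.encode("mac_roman")  # single byte
utf8_bytes = E_ACUTE.encode("utf-8")           # two bytes

# Reading the UTF-8 bytes as if they were MacRoman gives the
# "squareroot copyright" pair seen in the mangled file name.
misread = utf8_bytes.decode("mac_roman")

print(mac_roman_bytes.hex(), utf8_bytes.hex(), misread)
```

So the “Carr[squareroot][copyright].txt” name is at least consistent with UTF-8 bytes being interpreted as MacRoman somewhere along the way.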

Oh, and another story: it ain't true that all text in LC is UTF-16. While it’s not possible using LC APIs to determine exactly what is inside the black box of an LC variable in memory, it is evidently platform dependent: that MacRoman [8E] is reported as being the relevant byte in the LC variable. What can be determined is what is on disk when a stack is saved: there, text appears to be encoded as a mixture of 7-bit ASCII where possible and UTF-16 for other characters. Not that we as consumers need to know how the magic is performed, as long as it works. Back to my story.

So now I want to upload this file to my remote Linux server. I POST a form, prepared with libURLMultiPartFormData, to an LC Server script, which is supposed to save the received file.

If I attempt to use the original Mac file name, the server responds “Cannot open file Carr[e-acute].txt”
(this is the Result error message from “open file tFileName for binary write”).

If I send textEncode(filename, "UTF-8") as the file name, the server responds “Cannot open file Carr[squareroot][copyright].txt”.

If I textEncode at the client end and then textDecode on the server, it responds “Cannot open file Carre[E-grave].txt”. (Where did THAT come from? Is there a bug in textDecode on Linux LCS? The native encoding on Linux is supposed to be ISO Latin-1, where E-grave is hex [C8]; in MacRoman it is [E9]. I see no apparent connection between them or the UTF-8 bytes.)
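For what it's worth, a quick byte-level check hints at one possible link, offered only as a guess: the single byte [E9] is e-acute in ISO Latin-1 and E-grave in MacRoman. The Python sketch below just decodes that one byte both ways; whether this is what actually happens between LC Server and the Mac client is an assumption I have not verified.

```python
# Illustrative only: one byte, two interpretations.
byte = b"\xe9"

as_latin1 = byte.decode("latin-1")      # e-acute (U+00E9)
as_mac_roman = byte.decode("mac_roman") # E-grave (U+00C8)

print(repr(as_latin1), repr(as_mac_roman))
```

If the server correctly decoded the name and wrote its error message in Latin-1, a client reading that message as MacRoman would show [E-grave] where [e-acute] was meant.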

And just as a piece of nonsense: if I send the raw un-encoded Mac file name but then textDecode on the server, the file is happily saved as “Carr.txt”. That is correct, since [8E] followed by “.” is illegal as UTF-8, so the [e-acute] is just skipped by textDecode.
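The same effect can be reproduced outside LC. This is a small Python sketch, not a statement about how textDecode is implemented: it shows that the raw MacRoman bytes are rejected by a strict UTF-8 decoder, and that a lenient decoder which drops invalid bytes yields “Carr.txt”.

```python
# Illustrative only: the MacRoman-encoded file name as raw bytes.
raw = b"Carr\x8e.txt"

# Strict decoding fails: 0x8E cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding silently drops the offending byte.
lenient = raw.decode("utf-8", errors="ignore")

print(strict_ok, lenient)
```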

Could it be that LC Server cannot create files on Linux with non-ASCII names?! That doesn’t seem believable. I can of course directly create files on the server with non-ASCII characters such as e-acute.

Either I am missing something, or surely our European users have seen this already, so someone should be able to unfuddle me!

Neville Smythe




