Linux filenames in LC Server

Wed Aug 16 19:33:16 EDT 2023

It seems it is possible to set environment variables using rewrite rules in .htaccess.

I added the following lines to my .htaccess:

RewriteEngine on
RewriteRule \.(lc) - [E=LANG:de_DE.UTF-8]

The second line tells Apache not to rewrite the URL (that is what the "-" means) but to apply the flag [E=...] whenever an .lc file is requested.
The E flag sets an environment variable: E=LANG:de_DE.UTF-8 tells Apache to set the variable LANG to the value de_DE.UTF-8.
It's even possible to set cookies this way, using the cookie flag (CO).

With those two lines I no longer received the 'can't open file' error, and the file with a non-ASCII filename (testä.txt in my script) was created by the .lc script.
In my FTP client the file testä.txt is shown as testä.txt, but I can still access the file from LC Server under its original name testä.txt.

> On 16.08.2023 at 09:34, Mark Waddingham via use-livecode <use-livecode at> wrote:
> On 2023-08-16 06:37, Neville Smythe via use-livecode wrote:
>> So I misunderstood, I thought we were talking about Apache environment variables. Indeed the Terminal app reports
>> as a system env variable. But if this is not specifically a server problem, wouldn’t
>> that mean we could see the same behaviour with LC Desktop on Linux machines running
>> vanilla Ubuntu or Debian (which is what Dreamhost uses)? I haven’t tried this yet,
>> as it is a bit of a pain to fire up my Linux emulator machine.
> So the situation here is similar to what you get on macOS. If you open Terminal, the (UNIX) environment (variable-wise) you get will be different from the one you get when you double-click an app to launch it. In the latter case, the executable is launched via the desktop environment's 'launcher' process and inherits the environment that process provides. Presumably, as Linux desktops mandate various things (like language settings), the locale and environment vars will be set appropriately.
>> An experiment, which makes me wonder if this counts as a configuration problem or an actual bug in LC Server:
>> In Terminal I type (actually paste) and execute
>> echo "éü😃" > Carré.txt
>>    (for Forum users like me who just see ? everywhere, that is [e-acute][u-umlaut][happyface emoji] in the content to be written to a file with [e-acute] in its name)
>>   This works without problem. The contents of the file are utf-8 encoded, which I didn’t
>> need to specify, but I guess that is what the pasteboard provided. Terminal had no problem
>> creating or finding the file without needing those env settings. Of course it cannot *display*
>> the file name without knowing the encoding, so ls reports the filename as 'Carr'$'\303\251''.txt'
>> (readable as an ASCII encoding, though not one I have seen before; note the single quotes)
> I'm guessing here that this is a remote SSH session to your Linux server, and you are using the macOS Terminal app to run and connect? If that is the case then the reason this works is because Terminal on macOS is UTF-8 (which is the *only* encoding macOS supports in its UNIX subsystem, so you don't get the variance problem you do with Linux). This means that pasting text from somewhere else will paste the UTF-8 bytes - i.e. they will get transmitted over SSH to the remote Linux machine.
> As filenames are just sequences of bytes on Linux this works fine - however when you ask the remote terminal to list the files, it can only interpret the ASCII chars (as the LANG is C) and thus emits octal escapes for the others - here \303 \251, i.e. 0xC3 0xA9, which is the UTF-8 encoding of e-acute.
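To make the byte-level view concrete, here is a small sketch that builds the same file name from the octal escapes ls printed and dumps its bytes as hex. It assumes only a POSIX shell with printf, od and tr; the variable names are made up for illustration.

```shell
# Build the file name from raw bytes: \303\251 (octal) is 0xC3 0xA9,
# the two-byte UTF-8 encoding of e-acute.
name=$(printf 'Carr\303\251.txt')

# Dump the bytes of the name as hex; this does not depend on LANG or LC_ALL.
name_hex=$(printf '%s' "$name" | od -An -tx1 | tr -d ' \n')
echo "$name_hex"   # prints 43617272c3a92e747874
```

The c3a9 pair in the middle is the e-acute; everything else is plain ASCII.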
>> If I setup the env variables Mark suggests in the Terminal session
>> export LC_ALL="en_US.UTF8"
>> export LANG="en_US.UTF8"
>> then Terminal is able to display the filename à la française.
> So now the remote terminal knows how to interpret the sequences of bytes present in the filenames, and thus can emit them appropriately.
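A rough way to see this difference without touching the session's own settings is to list the same directory twice with a per-command locale. This is only a sketch: it assumes GNU ls (whose -b option prints C-style escapes for nongraphic characters) and that an en_US.UTF-8 locale is installed; if it is not, the second listing will simply look like the first.

```shell
# Scratch directory with one file whose name contains the bytes C3 A9.
dir=$(mktemp -d)
touch "$dir/$(printf 'Carr\303\251.txt')"

# Same bytes on disk, interpreted under two different locales.
listing_c=$(LC_ALL=C ls -b "$dir")
listing_utf8=$(LC_ALL=en_US.UTF-8 ls -b "$dir")   # assumes this locale exists

printf 'C locale:     %s\n' "$listing_c"
printf 'UTF-8 locale: %s\n' "$listing_utf8"
```

The file itself is never renamed or re-encoded; only the interpretation of its name changes.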
>> Cyberduck reports this filename correctly using the [e-acute] without having to set encoding
>> knowledge. And I can also create the file using Cyberduck with no problems. So IT knows about/expects/sets
>> up the encoding as needed. I bet other Linux-aware apps would also open or list such files without
>> drama or special configuration.
> IT doesn't know - it assumes. I suspect that if you used Cyberduck to connect to a Linux server which is set up to *not* be utf-8 (so filenames are encoded with some other encoding), then it would display things incorrectly.
> Of course, if the protocols it deals with specify the text encoding as utf-8 *and* the daemons running on said server are set up correctly (i.e. so that they process the filenames and such relative to the server's encoding) *and* they correctly convert the filenames from that encoding to the encoding mandated by the protocol, then it would display fine.
> Certainly FTP treats filenames as sequences of bytes - so at least for that protocol the client would have to assume UTF-8 or be told the correct encoding to do the correct thing.
>> However: in LC Server when I call "the long files" for the enclosing folder: crash!
>> (Actually an in-line error reported for this code line). To my mind that qualifies as
>> bug, even if the source of the crash is the same as for open file.
> I take it by crash you mean a runtime error is logged, and that this only happens if the LANG / LC_ALL environment variables are not set?
> This is the same issue as opening a file - the low-level text conversion from ASCII to the internal encoding used by strings in the engine fails because it encounters non-ASCII bytes.
>>   On the other hand hopefully setting the environment variables as Mark suggests will
>> fix everything. Mark, could I clarify exactly how that "launcher script" is to be used…
>> I’m guessing the cgi configuration should point to that file to be executed when it wants
>> to open instead of pointing to the livecode-server executable (in which case
>> it might have to have a .cgi suffix rather than .txt), or is it a shell script to be
>> executed by livecode-server?
> The provided text should be put into a shell script which should be launched *instead* of livecode-server - so configure your CGI environment to call said shell script when it encounters a lc server script file to run. It will set the environment variables and then 'exec' livecode-server, which replaces the shell script with livecode-server (in the same process).
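For readers who do not have the text Mark refers to, a launcher of this kind might look roughly like the sketch below, written as the commands that create it. This is not Mark's script: the file name lc-launcher.sh, the install path /usr/local/bin/livecode-server, the LIVECODE_SERVER override and the en_US.UTF-8 locale are all assumptions for illustration and need to be replaced with the values for your host.

```shell
# Write a hypothetical launcher script into the current directory.
cat > lc-launcher.sh <<'EOF'
#!/bin/sh
# Assumed location of the engine; override with LIVECODE_SERVER if it differs.
server="${LIVECODE_SERVER:-/usr/local/bin/livecode-server}"

# Give the engine a UTF-8 locale so non-ASCII filenames can be converted.
LANG=en_US.UTF-8
LC_ALL=en_US.UTF-8
export LANG LC_ALL

# exec replaces this shell with the engine in the same process,
# passing along whatever arguments the CGI handler supplied.
exec "$server" "$@"
EOF
chmod +x lc-launcher.sh
```

The CGI configuration would then point at lc-launcher.sh rather than at the livecode-server executable.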
> Technically, while what the engine is doing is correct (relative to its need to have filenames represented as strings internally at least), it isn't ideal. There are two options to improve the situation (when the locale env vars are not set / set to C):
>  1) Rather than assume ASCII, assume native - this would preserve the bytes in the filename regardless of system encoding.
>  2) Rather than assume ASCII, assume utf-8 - this would correctly represent filenames which are valid UTF-8, but would still fail on filenames with bad encoding
> Here (1) has the advantage that filenames would be preserved, but with the slight caveat that if you combined them with other Unicode characters (in a report, say), the filenames would be displayed incorrectly (here 'display' would also include being sent as part of some protocol response).
> Here (2) has the advantage of everything working as expected assuming the server in question is utf-8 - it would still fail on filenames which are badly encoded though. However the latter could be mitigated by making the sys-string<->lc-string conversion slightly less strict - i.e. bad utf-8 chars map to/from '?' as they do in textEncode/Decode - so at least you could see the bad filenames.
> I suspect (2) is overall better - its only downside is that you would not be able to manipulate files on the server which had badly encoded utf-8 names. However, that seems like an extreme edge case, and one which you could work around by setting the LANG env var to a native encoding and putting appropriate code in your app to deal with it.
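Since option (2) hinges on whether a filename is valid UTF-8, it can be useful to check names on the server from a shell. The sketch below is one way to do that and says nothing about how the engine itself converts strings. The helper name is_valid_utf8 is hypothetical, and it assumes an iconv command that exits with a non-zero status on invalid input (as the glibc one does).

```shell
# Hypothetical helper: succeeds only if its argument is a valid UTF-8 byte sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

good=$(printf 'Carr\303\251.txt')   # e-acute encoded as UTF-8 (C3 A9)
bad=$(printf 'Carr\351.txt')        # e-acute encoded as Latin-1 (E9), invalid as UTF-8

for name in "$good" "$bad"; do
    if is_valid_utf8 "$name"; then
        echo "valid UTF-8"
    else
        echo "not valid UTF-8"
    fi
done
```

Names that fail this check are the "badly encoded" ones that would still be a problem under option (2).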
