Linux filenames in LC Server

Mark Waddingham mark at livecode.com
Tue Aug 15 04:44:15 EDT 2023


On 2023-08-15 08:42, Neville Smythe via use-livecode wrote:
> So if I understand Mark correctly, while one can create utf-8 encoded 
> filenames directly in a terminal
> session,  LC Server internally accesses Apache environment variables to 
> encode/decode the filename
> before opening a file rather than directly using the shell. Presumably 
> this has something to do with
> the engine being a server app having to respect the server environment.

So what is actually happening here is that there is a notion of a 
'SysString' in the engine. A 'SysString' is a string represented as a 
sequence of bytes in whatever encoding the host platform understands in 
its APIs. The engine converts its internal string representation to a 
sys string whenever it accesses a system API - e.g. for opening files.

In the case of Linux what encoding such 'sys strings' need to use 
depends on the environment - the encoding *could* be anything and thus 
the engine uses the UNIX 'iconv' library to convert from internal 
representation to the encoded bytes needed. I think this is what is 
causing the failure of the file APIs - iconv is refusing to convert a 
string with non-ascii characters to the 'default' 'C' locale as it can't 
(there is no mapping from, say, e-acute to ascii).
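The failure mode can be reproduced outside the engine with the iconv command-line tool. This is only a sketch: it uses the utility rather than the library calls the engine makes, and it assumes a glibc-style iconv that exits non-zero on an unconvertible character. The variable names are illustrative.

```shell
#!/bin/sh
# "café" in UTF-8; \303\251 are the two bytes of e-acute.
name=$(printf 'caf\303\251')

# Converting to ASCII should fail: ASCII has no byte for e-acute.
if printf '%s' "$name" | iconv -f UTF-8 -t ASCII >/dev/null 2>&1; then
    to_ascii="ok"
else
    to_ascii="failed"
fi

# Converting UTF-8 to UTF-8 changes nothing and should succeed.
if printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    to_utf8="ok"
else
    to_utf8="failed"
fi

echo "UTF-8 -> ASCII: $to_ascii"
echo "UTF-8 -> UTF-8: $to_utf8"
```

If the first conversion reports a failure on your host, that is the same kind of refusal the engine would hit when the locale is 'C'.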

I should point out that textEncode/Decode do not use system APIs - the 
conversions between UTF* forms and 'native' are all built into the 
engine - so that part is fine - it's the low-level connection between 
commands like 'open file' and calling the UNIX open API which is 
throwing an error on file name conversion.

> On Dreamhost, as far as I can determine, the LANG and LC_ALL variables 
> are *not* set (though WordPress
> is running and it adds support for a swathe of languages, so surely has 
> support for non-ascii filenames?)
> The site is a shared hosting, so I do not have permissions to change 
> the Apache conf files. I tried adding
> the SetEnv commands in the .htaccess file but that didn’t work, 
> although I could well be doing it wrong,
> I am fumbling around in the dark here.

The only thing I've found so far is SetEnv, which looks like it can 
only be configured in the host config for a domain - slightly 
irksome. However, there is a way to launch the CGI engine with any vars 
needed.

I'm not sure how Dreamhost sets things up - indeed it might be worth 
asking their support if there is a way to configure environment 
variables which are passed through to CGI executables.

If there isn't then it can be done with a launcher script:

```
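#!/bin/sh
# Give the engine a UTF-8 locale before it starts. The locale must be
# installed on the host; `locale -a` lists the available names.
export LC_ALL="en_US.UTF-8"
export LANG="en_US.UTF-8"
# Replace with the full path to the livecode-server executable.
exec livecode-server "$@"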
```

This would be a text file which has been made executable - and needs to 
be configured as the executable which is launched when a livecode server 
script is launched (livecode-server in the above needs to be the 
location of the livecode-server executable in the hosting setup).

I know others here use (or have used) Dreamhost in the past - so they 
might know more about how the above could be configured (although, 
again, Dreamhost support can probably help).


> Unless there is some way to fix the configuration, it would seem that 
> not only will opening files
> fail but the detailed files (the long files) command will also fail if 
> non-ascii characters are
> encountered since it uses textEncode. I presume that using shell 
> commands could be used as a workaround
> for accessing the filesystem, as long as LC doesn’t do an internal 
> textEncode as it passes the
> variables to the shell!
> However it also means one cannot use textDecode/Encode at all, not just 
> for the filenames but also
> content; and that could be a bummer. I haven’t encountered this so far 
> because to this point I have
> encoded content before uploading binary files to the server, but I can 
> envision situations where I
> would want to encode or decode server-side.

The problem isn't with textEncode/Decode - they work fine as mentioned 
above - it's just that the engine doesn't have the necessary information (due 
to lack of env vars) to know how to interpret/create the filenames the 
system APIs need.

> I’m puzzled that this problem hasn’t been raised before. Surely the 
> vast majority of website host
> providers use Linux servers, and the Dreamhost configuration for shared 
> hosting is most likely
> standard. So has no-one in Europe (or Asia..) using LC Server wanted to 
> create native-language
> filenames? I think LC Server is a magnificent tool, but perhaps it is 
> not as widely used as it
> deserves! Or: they all found the fix and haven’t told us.

This is almost certainly a server setup/config thing - I guess Apache 
runs CGIs in the most 'raw' environment possible by default.
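One way to check what a CGI actually receives is a tiny diagnostic script dropped next to the site's other CGIs and made executable. This is a sketch only: the function name is made up for illustration, and it assumes the host allows plain shell CGIs and has the standard `locale` utility.

```shell
#!/bin/sh
# Diagnostic CGI sketch: report the locale variables the web server
# passes through. locale_report is an illustrative name, not an API.
locale_report() {
    printf 'LANG=%s\n' "${LANG:-<unset>}"
    printf 'LC_ALL=%s\n' "${LC_ALL:-<unset>}"
    printf 'LC_CTYPE=%s\n' "${LC_CTYPE:-<unset>}"
    # `locale charmap` prints the codeset implied by the variables above.
    printf 'charmap=%s\n' "$(locale charmap 2>/dev/null || echo unknown)"
}

# CGI response: header line, blank line, then the plain-text body.
printf 'Content-Type: text/plain\r\n\r\n'
locale_report
```

If all three variables come back as `<unset>`, the engine is being started with no locale information at all, which would match what is being seen here.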

The observation about WordPress is interesting - certainly before PHP 
was 'unicodified' - the encoding of filenames was up to the script - 
i.e. you had to encode/decode filenames appropriately yourself and I 
guess utf-8 was just assumed. With PHP7 I believe it handles unicode 
transparently a bit like LC does, so I'll see if I can see what PHP7+ 
uses to determine the system encoding. Indeed, it might do no harm at 
all to just assume UTF-8 encoding for Linux in the engine if the locale 
vars are not set (which appears to be the case here) which would resolve 
the problem transparently.
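Until the engine does something like that, the same fallback can be approximated in a launcher script. What follows is a rough sketch rather than anything the engine does today: the function name is hypothetical, "en_US.UTF-8" is an assumed locale name that must exist on the host, and the engine path in the comment is a placeholder.

```shell
#!/bin/sh
# Sketch: only supply a UTF-8 locale when the server passes none.
# needs_default_locale is an illustrative helper name.
needs_default_locale() {
    # True when none of the variables that select the character
    # encoding (LC_ALL, LC_CTYPE, LANG) is set to a non-empty value.
    [ -z "${LC_ALL:-}" ] && [ -z "${LC_CTYPE:-}" ] && [ -z "${LANG:-}" ]
}

if needs_default_locale; then
    # Assumed locale name; check `locale -a` on the host.
    LANG="en_US.UTF-8"
    export LANG
fi

# A real launcher would finish by handing over to the engine, e.g.:
#   exec /path/to/livecode-server "$@"
```

Unlike the unconditional version, this leaves any locale the host does provide untouched, so it only changes behaviour in the 'nothing set' case.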

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things


