Linux filenames in LC Server
Mark Waddingham
mark at livecode.com
Tue Aug 15 04:44:15 EDT 2023
On 2023-08-15 08:42, Neville Smythe via use-livecode wrote:
> So if I understand Mark correctly, while one can create utf-8 encoded
> filenames directly in a terminal
> session, LC Server internally accesses Apache environment variables to
> encode/decode the filename
> before opening a file rather than directly using the shell. Presumably
> this has something to do with
> the engine being a server app having to respect the server environment.
So what is actually happening here is that there is a notion of a
'SysString' in the engine. A 'SysString' is a string represented as a
sequence of bytes in whatever encoding the host platform understands in
its APIs. The engine converts its internal string representation to a
sys string whenever it accesses a system API - e.g. for opening files.
In the case of Linux what encoding such 'sys strings' need to use
depends on the environment - the encoding *could* be anything and thus
the engine uses the UNIX 'iconv' library to convert from internal
representation to the encoded bytes needed. I think this is what is
causing the failure of the file APIs - iconv is refusing to convert a
string with non-ascii characters to the 'default' 'C' locale as it can't
(there is no mapping from, say, e-acute to ascii).
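The conversion failure described above can be reproduced outside the engine with the `iconv` command-line tool. This is only a sketch of the idea, not what the engine does internally: it assumes a glibc-style `iconv` is on the PATH, uses ASCII as a stand-in for the codeset of the 'C' locale, and the helper name `can_encode` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the UTF-8 string $2 can be
# represented in the codeset named by $1, fails otherwise.
can_encode() {
    printf '%s' "$2" | iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

# "cafe.txt" with an e-acute, written as explicit UTF-8 bytes so the
# script does not depend on the encoding of this file.
name=$(printf 'caf\303\251.txt')

for codeset in ASCII UTF-8; do
    if can_encode "$codeset" "$name"; then
        echo "$codeset: filename converts"
    else
        echo "$codeset: filename cannot be converted"
    fi
done
```

A plain ASCII name such as `plain.txt` converts in both cases, which matches the symptom that only non-ascii filenames fail.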
I should point out that textEncode/Decode do not use system APIs - the
conversions between UTF* forms and 'native' are all built into the
engine - so that part is fine - it's the low-level connection between
commands like 'open file' and calling the UNIX open API which is
throwing an error on file name conversion.
> On Dreamhost, as far as I can determine, the LANG and LC-ALL variables
> are *not* set (though WordPress
> is running and it adds support for a swathe of languages, so surely has
> support for non-ascii filenames?)
> The site is a shared hosting, so I do not have permissions to change
> the Apache conf files. I tried adding
> the SetEnv commands in the .htaccess file but that didn’t work,
> although I could well be doing it wrong,
> I am fumbling around in the dark here.
The only thing I've found so far is SetEnv which does look like it can
only be configured in the host config for a domain which is slightly
irksome. However, there is a way to launch the CGI engine with any vars
needed.
I'm not sure how Dreamhost sets things up - indeed it might be worth
asking their support if there is a way to configure environment
variables which are passed through to CGI executables.
If there isn't then it can be done with a launcher script:
```
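#!/bin/sh
# Give the engine a UTF-8 locale so it can convert filenames.
export LC_ALL="en_US.UTF-8"
export LANG="en_US.UTF-8"
# Replace livecode-server with the full path to the engine on your host.
exec livecode-server "$@"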
```
This would be a text file which has been made executable (e.g. with
chmod +x), and it needs to be configured as the executable which is run
when a LiveCode Server script is requested (livecode-server in the above
needs to be the location of the livecode-server executable in the
hosting setup).
I know others here use (or have used) Dreamhost in the past - so they
might know more about how the above could be configured (although,
again, Dreamhost support can probably help).
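Whichever route is taken, it may help to check what environment a CGI process actually receives from the web server. The following is a minimal sketch of a diagnostic CGI written as a shell script. The function name `report_locale` is hypothetical, and it assumes the host allows shell scripts to run as CGI and that the `locale` utility is available.

```shell
#!/bin/sh
# Hypothetical diagnostic CGI: reports the locale settings that the
# web server passes to CGI programs.
report_locale() {
    printf 'LANG=%s\n' "${LANG-(unset)}"
    printf 'LC_ALL=%s\n' "${LC_ALL-(unset)}"
    # 'locale charmap' prints the codeset implied by the current settings.
    printf 'charmap=%s\n' "$(locale charmap 2>/dev/null || echo unknown)"
}

# A CGI response needs a header block followed by a blank line.
printf 'Content-Type: text/plain\r\n\r\n'
report_locale
```

If both variables come back as unset when the script is requested through the web server, that would be consistent with the engine having no locale information to work from.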
> Unless there is some way to fix the configuration, it would seem that
> not only will opening files
> fail but the detailed files (the long files) command will also fail if
> non-ascii characters are
> encountered since it uses textEncode. I presume that using shell
> commands could be used as a workaround
> for accessing the filesystem, as long as LC doesn’t do an internal
> textEncode as it passes the
> variables to the shell!
> However it also means one cannot use textDecode/Encode at all, not just
> for the filenames but also
> content; and that could be a bummer. I haven’t encountered this so far
> because to this point I have
> encoded content before uploading binary files to the server, but I can
> envision situations where I
> would want to encode or decode server-side.
The problem isn't with textEncode/Decode - they work fine as mentioned
above - it's just the engine doesn't have the necessary information (due
to lack of env vars) to know how to interpret/create the filenames the
system APIs need.
> I’m puzzled that this problem hasn’t been raised before. Surely the
> vast majority of website host
> providers use Linux servers, and the Dreamhost configuration for shared
> hosting is most likely
> standard. So has no-one in Europe (or Asia..) using LC Server wanted to
> create native-language
> filenames? I think LC Server is a magnificent tool, but perhaps it is
> not as widely used as it
> deserves! Or: they all found the fix and haven’t told us.
This is almost certainly a server setup/config thing - I guess Apache
runs CGIs in the most 'raw' environment possible by default.
The observation about WordPress is interesting - certainly before PHP
was 'unicodified' - the encoding of filenames was up to the script -
i.e. you had to encode/decode filenames appropriately yourself and I
guess utf-8 was just assumed. With PHP7 I believe it handles unicode
transparently a bit like LC does, so I'll see if I can see what PHP7+
uses to determine the system encoding. Indeed, it might do no harm at
all to just assume UTF-8 encoding for Linux in the engine if the locale
vars are not set (which appears to be the case here) which would resolve
the problem transparently.
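Until something like that exists in the engine, the same fallback idea can be approximated at the launcher level. This is only a sketch under stated assumptions: the function name `ensure_utf8_locale` is made up, the `en_US.UTF-8` locale is assumed to be installed on the host, and `/path/to/livecode-server` is a placeholder. It relies on the POSIX precedence order in which LC_ALL overrides LC_CTYPE, which overrides LANG.

```shell
#!/bin/sh
# Sketch: default to a UTF-8 locale only when the environment names none.
# POSIX precedence for character encoding: LC_ALL, then LC_CTYPE, then LANG.
ensure_utf8_locale() {
    if [ -z "${LC_ALL:-}${LC_CTYPE:-}${LANG:-}" ]; then
        # Assumption: the en_US.UTF-8 locale is installed on the host.
        LC_ALL="en_US.UTF-8"
        export LC_ALL
    fi
}

ensure_utf8_locale
# A real launcher would now hand over to the engine, for example:
#   exec /path/to/livecode-server "$@"
# (placeholder path, adjust for the hosting setup)
```

Unlike a launcher that always exports the variables, this leaves any locale the host does provide untouched.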
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things