Unicode is not "everywhere"...

Dar Scott Consulting dsc at swcp.com
Tue Aug 27 10:55:29 EDT 2019


The added-parameter approach looks very similar to an enhancement suggestion that has been around for a while. I'd mention the bug number, but bugs and I are not getting along at the moment.

Dar

> On Aug 27, 2019, at 5:54 AM, Mark Waddingham via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> On 2019-08-22 20:53, Paul Dupuis via use-livecode wrote:
>> I just want it consistent and documented and able to return more than
>> just ASCII data
>> Currently, OSX shell returns UTF8 which may mean that it is returning
>> binary as it is returning 8-bit bytes where Unicode text has been
>> encoded as UTF8
> 
> The output of the terminal commands on macOS is UTF-8 encoded for two reasons:
> 
>  1) Various environment variables make it so (the 'system encoding')
> 
>  2) The terminal commands you are calling are written to respect the system encoding and emit text encoded in that way - because they are actually emitting text.
> 
> In contrast - 'cat' will just dump the contents of the file you specify byte by byte - and files could contain data in any encoding.
> 
> There is absolutely no way to tell whether a command is 'ls' like and thus emits text, or 'cat' like and thus emits binary.
> 
>> Windows returns CP1252 text, not binary and any Unicode results, which
>> DOS displays as Unicode just fine, can be returned without elaborate
>> work-arounds.
>> That by definition is a bug.
> 
> No - that isn't the definition of a bug - it is a difference of behavior because you are dealing with platform-specific details.
> 
> The /U switch which Dar suggested (and which appears to work for DIR and friends at least) seems to be only applicable to 'internal commands' (according to https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/cmd) so it isn't clear what, if anything, it would do to an arbitrary Windows terminal command.
> 
>> I would advocate that shell should return binary data. Text being
>> returned should be UTF8 encoded, that way people expecting ASCII do
>> not need to do anything, they can just work with the returned text.
>> People expecting Unicode can use textDecode to get the UTF8 converted
>> to LC native 16-bit Unicode, and people expecting binary can use the
>> byte chunk to process what comes back however they want.
> 
> The problem here is that it is up to the command being called what it outputs - nothing else - so this isn't an achievable goal. You have to know what the commands you are calling do, and how they work - and ensure you set the environment up when calling them to return what you want.
> 
> The current situation with shell is irksome though. The internal platform-dependent code returns binary data and does nothing to it, but the higher-level wrapper (i.e. the 'shell()' function implementation) basically leaves it as binary data (converted to a native string - native strings and binary strings are essentially interchangeable) and then performs EOL conversion on it on Windows and in server engines. This means it kinda returns text, but not really. Unfortunately, this behavior has existed for so long that it is 'just the way things are', so it isn't going to change.
> 
> Moving forward, a second parameter to shell() would probably be the best way to resolve the above anomaly - empty would mean legacy behavior, binary would mean do nothing at all.
> 
> It would be nice to be able to specify 'text' as well...
> 
> On UNIX-based systems it is clear what that should do (textDecode the output based on the 'system' encoding, which is determined from the environment variables of the calling process).
> 
> On Windows it is not clear to me what such a setting could do - /U certainly doesn't sound like it helps arbitrary processes, but there might be some way to change the codepage (analogous to the 'system encoding') of the command being called so that some attempt can be made to text decode and EOL convert appropriately.
> 
> Warmest Regards,
> 
> Mark.
> 
> -- 
> Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
> LiveCode: Everyone can create apps
> 
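For what it's worth, the three modes discussed above (empty for legacy, 'binary', and 'text') can be sketched roughly as follows. This is Python rather than LiveCode, only so that it can be run standalone. The function name decode_shell_output, the mode strings, and the use of Latin-1 as a stand-in for the platform's native encoding are all assumptions made for illustration; none of this is the engine's actual code or an existing shell() signature.

```python
import locale
from typing import Optional, Union


def decode_shell_output(raw: bytes, mode: str = "",
                        encoding: Optional[str] = None) -> Union[bytes, str]:
    """Hypothetical model of what a shell() 'mode' parameter might do.

    raw      -- the bytes a command wrote to its output
    mode     -- "" (legacy), "binary", or "text" (names are assumptions)
    encoding -- overrides the system encoding in "text" mode
    """
    if mode == "binary":
        # Do nothing at all: hand back exactly what the command emitted.
        return raw

    if mode == "":
        # Legacy: each byte becomes one 8-bit character, then EOLs are
        # converted. Latin-1 is only a stand-in for the native encoding
        # because it maps bytes to characters one-to-one. The thread says
        # the real engine does the EOL step only on Windows and in server
        # engines; this sketch applies it unconditionally for simplicity.
        return raw.decode("latin-1").replace("\r\n", "\n")

    if mode == "text":
        # Decode using the system encoding, which on UNIX-like systems is
        # derived from the environment (LANG, LC_ALL and friends).
        enc = encoding or locale.getpreferredencoding(False)
        # 'cat'-like commands may emit bytes that are not valid in that
        # encoding, so undecodable bytes are replaced instead of raising.
        return raw.decode(enc, errors="replace").replace("\r\n", "\n")

    raise ValueError("unknown mode: %r" % (mode,))
```

Feeding the same bytes through each mode, for example b'\xc3\xa9\r\n' (an e-acute encoded as UTF-8, followed by CRLF), shows the difference: 'binary' returns the four bytes untouched, legacy returns two 8-bit characters plus a line feed, and 'text' with a UTF-8 system encoding returns the single character plus a line feed. It does not solve the underlying problem that nothing can tell whether a command is 'ls' like or 'cat' like; the caller still has to know and pick the mode.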
