Make numberFormat even better

Mark Waddingham mark at livecode.com
Tue Apr 25 04:38:47 EDT 2017


On 2017-04-25 09:01, Curry Kenworthy via use-livecode wrote:
> To handle arrays better without requiring more lines of code, how
> about making numberFormat itself an array with at least two parts?
> Similar to the way the clipboardData has parts.
> 
> Using it without a key works exactly like before, by setting both 
> parts, ie:
> 
> -- (assume i=binary 5 and starting format=default)
> 
> set numberformat to "0.0"
> put "#" & i into a[i] --> a["5.0"] = "#5.0"
> 
> If you want just the regular strings formatted and not any array keys,
> or vice versa, use the appropriate part:
> 
> set numberformat["text"] to "0.0"
> put "#" & i into a[i] --> a["5"] = "#5.0"
> 
> (or)
> 
> set numberformat["keys"] to "0.0"
> put "#" & i into a[i] --> a["5.0"] = "#5"

This is certainly an idea - however, I wonder whether it quite addresses
the crux of the issue.

Arrays in LiveCode serve two purposes - both lists (if integer keyed,
dense, starting from 1) and dictionaries (all other types). The
distinction is made purely on the string structure of the keys (for all
intents and purposes) and on which keys are present. Indeed, for a
LiveCode array to be treated as a list the keys *must* be very strictly
formatted as integers - there must be no fractional part (i.e. "1.0" is
*not* considered the same key as "1").
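As a rough illustration of the list-versus-dictionary rule just
described, here is a minimal Python sketch. The function name
is_sequence and the choice to treat an empty key set as "not a list" are
assumptions made for the example; this models the rule as stated here,
not the engine's actual implementation.

```python
def is_sequence(keys):
    """Return True if the string keys are exactly "1", "2", ..., "n".

    Keys are compared as strings, so "1.0" is not the same key as "1".
    """
    keys = list(keys)
    # The key set a dense, 1-based list of this size must have.
    expected = {str(i) for i in range(1, len(keys) + 1)}
    # Assumption for this sketch: an empty array is not treated as a list.
    return len(keys) > 0 and set(keys) == expected


print(is_sequence(["1", "2", "3"]))  # dense, starts at 1
print(is_sequence(["1.0", "2"]))     # "1.0" is not the key "1"
print(is_sequence(["1", "3"]))       # gap at "2"
```

Under this model, a numberFormat of "0.0" applied to an index turns the
key "1" into "1.0", which is enough to stop the array being list-like.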

For dictionaries, having the numberFormat affect your numbers isn't
really an issue - the keys have to be strings, so there is no other
option.

For lists, however, what is really happening is that they are being
indexed by integers. So, could it be that the problem in LiveCode
is that it doesn't distinguish between integers and non-integers?

This actually alludes to another problem I've been struggling with
recently (and, indeed, partly why this numberFormat problem is so
interesting).

At the moment LiveCode's arithmetic uses 64-bit doubles universally.
This gives a relatively consistent arithmetic, but it has a significant
limit: you cannot represent 64-bit integers accurately - only up to
53 bits. Indeed, the current semantics basically mean that you can do
integer computations up to 53 bits and then things 'gracefully' degrade
in precision. The problem is that I'm not sure it is feasible to extend
the allowed size of integers (having them arbitrary precision would be
really neat!) whilst preserving this graceful degradation to double (at
least not in a performant way).
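The 53-bit limit is a property of IEEE 754 doubles in general, so it can
be shown in any language that uses them. A small Python sketch (Python
floats are 64-bit doubles, and Python ints are arbitrary precision,
which makes the contrast easy to see):

```python
LIMIT = 2 ** 53  # largest power of two up to which doubles count exactly

# Arbitrary-precision integer arithmetic keeps the extra 1.
exact = LIMIT + 1

# Double arithmetic cannot represent LIMIT + 1; the sum rounds back down.
as_double = float(LIMIT) + 1.0

print(exact)                          # 9007199254740993
print(int(as_double))                 # 9007199254740992
print(as_double == float(LIMIT))      # True: the + 1 was lost
```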

One solution here is to have (internally at least) two distinct numeric
types - integers and (inexact) reals - with the common-sense rules:

    integer int_op integer -> integer (+, -, *, div, mod are int_ops)
    integer op real -> real (+, -, *, /, div, mod are ops)
    real op integer -> real
    real op real -> real
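A minimal Python sketch of these promotion rules, with Python's int
standing in for the proposed integer type and float for the real type.
The function evaluate and the operator tables are hypothetical names for
the example. Two points are assumptions rather than things stated above:
that div truncates towards zero, and that integer / integer yields a
real (since / is listed only among the general ops).

```python
import math


def _trunc_div(a, b):
    """Integer division truncating towards zero (assumed div semantics)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


# Operators that keep two integers in the integer domain.
INT_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "div": _trunc_div,
    "mod": lambda a, b: a - b * _trunc_div(a, b),
}

# Operators used whenever at least one operand is a real.
REAL_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "div": lambda a, b: float(math.trunc(a / b)),
    "mod": math.fmod,
}


def evaluate(a, op, b):
    """Apply op using the integer/real rules from the table above."""
    both_int = isinstance(a, int) and isinstance(b, int)
    if both_int and op in INT_OPS:
        return INT_OPS[op](a, b)          # integer int_op integer -> integer
    return REAL_OPS[op](float(a), float(b))  # anything else -> real


print(evaluate(7, "div", 2))      # 3
print(evaluate(7, "/", 2))        # 3.5
print(evaluate(1, "+", 2.0))      # 3.0
print(evaluate(2 ** 64, "+", 1))  # 18446744073709551617
```

The last line is the point of the exercise: once integers are a separate
type, they are free to go beyond 53 bits without losing precision.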

The question now is how we would distinguish between a string we want
to auto-convert to an integer, and a string we want to auto-convert to
a real.

One option here is to make it so that integers are always of the form

   [1-9][0-9]*

With a string being considered a real if it is any other valid numeric
string:

   "1" is an integer
   "1.", "1e10", "1.03", "1.0" are all reals.
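A short Python sketch of this classification. It uses [1-9][0-9]* for
the integer form so that a single digit such as "1" matches, as the
examples above require. The pattern used for "any other valid numeric
string" is an assumption made for the example (LiveCode's actual numeric
grammar may differ), and classify is a hypothetical name.

```python
import re

# Integer form from the proposal: a non-zero digit followed by more digits.
INTEGER_RE = re.compile(r"[1-9][0-9]*")

# Assumed stand-in for "any other valid numeric string": optional sign,
# digits with an optional fractional part, optional exponent.
NUMERIC_RE = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def classify(text):
    """Return "integer", "real" or "string" for the given text."""
    if INTEGER_RE.fullmatch(text):
        return "integer"
    if NUMERIC_RE.fullmatch(text):
        return "real"
    return "string"


for sample in ["1", "1.", "1e10", "1.03", "1.0", "abc"]:
    print(sample, "->", classify(sample))
```

Note that the integer form as written does not cover "0" or signed
values; in this sketch they fall through to "real", and how they should
be treated is left open.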

If we made this distinction, then it would be viable to make it so that
numberFormat *only* affects reals:

   put 1 + 0 into tInteger
   put 1. + 0 into tReal
   set the numberFormat to "0.00"
   put tInteger , tReal
     => "1,1.00"

This would certainly solve the sequence-style-array 'ambiguity' and
would perhaps make sorting out numeric literals not being touched by
numberFormat a little more sane:

   set the numberFormat to "0.00"
   put "This is an integer: " & 1
     => This is an integer: 1
   put "This is a real: " & 1.
     => This is a real: 1.00
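The "numberFormat only affects reals" behavior can be modelled in a few
lines of Python. This is a sketch of the proposal, not of how LiveCode
behaves today; format_number is a hypothetical helper, and the decimals
parameter stands in for a numberFormat of "0.00".

```python
def format_number(value, decimals=2):
    """Format reals with a fixed number of decimals; leave integers alone."""
    if isinstance(value, int):
        return str(value)              # integers ignore the number format
    return f"{value:.{decimals}f}"     # reals are formatted, e.g. "0.00"


t_integer = 1 + 0     # integer + integer stays an integer
t_real = 1.0 + 0      # real + integer becomes a real

print(f"{format_number(t_integer)},{format_number(t_real)}")  # 1,1.00
print("This is an integer: " + format_number(1))
print("This is a real: " + format_number(1.0))
```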

The problem here is that I have no idea if that distinction is too
subtle (for LiveCode, at least). It is certainly something you get used
to pretty quickly in pretty much any other language - most (if not
all!) have a strict distinction between integer and real.

To put the potential subtlety in context, it only arises in:

   - numeric literals

   - strings/text/files taken as inputs

In the former case, if you actually wanted a real, then you'd
just have to add a '.' to the literal (an advantage here is that
it completely expresses your intent as to the type of number). In the
latter case, if you are processing string inputs:

   - For an integer, do:
       put tStringFromOutside + 0    -> integer if tStringFromOutside is an integer
   - For a real, do:
       put tStringFromOutside + 0.0  -> real
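The same idea modelled in Python, as a sketch only: to_number is a
hypothetical helper that applies the integer-versus-real rule to an
input string (using [1-9][0-9]* for the integer form), after which
adding 0 preserves the type and adding 0.0 forces a real.

```python
import re


def to_number(text):
    """Convert text to an int if it looks like an integer, else to a float."""
    if re.fullmatch(r"[1-9][0-9]*", text):
        return int(text)
    return float(text)  # raises ValueError if text is not numeric


from_outside = "42"

as_integer = to_number(from_outside) + 0    # stays an integer
as_real = to_number(from_outside) + 0.0     # forced to a real

print(type(as_integer).__name__, as_integer)  # int 42
print(type(as_real).__name__, as_real)        # float 42.0
```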

The advantages of this approach are:

   1) The implementation of integers is free to be whatever it needs to
      be.

   2) It explicitly means you can indicate your numeric intent (integer
      operations and real operations have very different semantics)
      in your scripts.

   3) It potentially opens the door to a much less 'surprising'
      numberFormat with regard to indexing arrays.

   4) It potentially opens the door to a much less 'surprising'
      numberFormat with regard to numeric literals in scripts.

The disadvantages of this approach are:

   1) The distinction between integer and real could be considered
      quite subtle: "1" vs "1.".

   2) It is not backwards-compatible.

   3) You have to think a little more about transforming your input
      strings (although only insofar as using + 0.0 rather than + 0 to
      force a toNumber conversion).

In regards to (1), in reality there is a *huge* distinction between
these two things due to the semantics of the arithmetic operations on
integers vs doubles, so perhaps LiveCode should make you choose
*explicitly*.

In regards to (2), it is not backwards-compatible, but the current
semantics would be subsumed by the new ones, since all we are doing (in
effect) is saying 'don't represent a subset of numbers (the integers)
as doubles as we currently do, but use actual integers instead'. i.e.
the new semantics can get back the old behavior by simply converting
integer-like strings to doubles as they do now. i.e. this new behavior
is a binary flag which affects *only* num->string and string->num.

In regards to (3), this happens now - quite frequently people ask 'why
is this code dealing with numbers and strings not working quite right'
and the answer is generally 'oh, you need to put a + 0 in there
somewhere to force a numeric conversion'.

> Doing a simple job keeps it simple, and it does that job well.
> However, it would be possible to have both the classic simplicity and
> more power, by adding more optional symbols and rules.

That is very true - however, this thread has made me question what job
numberFormat *actually* does. In LiveCode it is just a formatting
property - in comparison to HyperCard it has no effect on arithmetic.

> -- (assume i=binary 5555)
> 
> set numberformat to "$#,##0.00"
> put "Cost:" && i into x --> x = "Cost: $5,555.00"

Therefore, why not (as you suggest) make it a much better formatter?

There are so many precedents for number formatting out there (as Richard
points out) there is plenty to choose from. Ideally, we'd allow both
explicit formatting *and* formatting based on locale.
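As one example of such a precedent, Python's format specification
mini-language can produce the output of the quoted "$#,##0.00" pattern
with explicit formatting. This is only an illustration of the kind of
formatter being discussed; the pattern syntax in the quoted proposal is
not an existing numberFormat feature, and format_cost is a hypothetical
helper.

```python
def format_cost(amount):
    """Format amount with thousands separators and two decimal places."""
    # "," adds thousands separators, ".2f" fixes two decimals.
    return f"Cost: ${amount:,.2f}"


print(format_cost(5555))  # Cost: $5,555.00
```

This is explicit formatting only; locale-based formatting would need
the separators and currency symbol to come from the user's locale
instead of being fixed in the pattern.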

Certainly something to think about :)

Warmest Regards,

Mark.



-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



