NULL characters and sorting

Richard Gaskin ambassador at fourthworld.com
Fri Mar 20 11:26:30 EDT 2009


Paul Looney wrote:

> One of my customers had a problem displaying all of their archived  
> orders.
> There should have been 16,020 archived records but only 3,879 were  
> showing up in the list.
...
> I started checking the contents of variables in the appropriate  
> handler and discovered there were the proper number of records just  
> before a sort by column. After the sort, records were missing.
> Phil, the Great, Davis - Wizard of West Linn - suggested checking for  
> and removing NULLs (because they terminate a line in C). There were  
> 131,023 NULLs in the pre-sort variable.
> When I removed them before the sort, the number of listed records  
> jumped from 3,879 to 16,020.
> This leaves some questions:
> How can 131,023 NULL "characters" reduce the displayed "lines" by  
> 12,141?

The sort command has a limit which I don't believe is currently 
documented:  it can only be used reliably on data sets in which no line 
is longer than 65,535 characters.

I've submitted a request to have this noted in the docs:
<http://quality.runrev.com/qacenter/show_bug.cgi?id=7823>

I've also submitted a request to have this limit raised:
<http://quality.runrev.com/qacenter/show_bug.cgi?id=7824>
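
Until the limit is documented or raised, a data set can be checked for
over-long lines before it is sorted. What follows is a minimal sketch in
Python rather than Revolution script, so it only illustrates the check
and says nothing about how the engine's sort works internally. The names
find_long_lines and SORT_LINE_LIMIT are hypothetical; the 65,535 figure
is the one quoted above, and line length is counted in characters as
Python counts them.

```python
# Hypothetical pre-sort check: list every line longer than the limit.
# SORT_LINE_LIMIT is the per-line figure quoted in this message.
SORT_LINE_LIMIT = 65535


def find_long_lines(data, limit=SORT_LINE_LIMIT):
    """Return (line_number, length) for each line longer than limit."""
    report = []
    for number, line in enumerate(data.split("\n"), start=1):
        if len(line) > limit:
            report.append((number, len(line)))
    return report


# Three lines; only the second exceeds the limit.
sample = "short\n" + "x" * 70000 + "\nalso short"
print(find_long_lines(sample))  # prints [(2, 70000)]
```

An empty result means no line exceeds the limit, so by the reasoning
above the sort should not drop records for that reason.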

I learned about this when I encountered a similarly mystifying bug with 
one of my WebMerge customers, back when Dr. Raney managed the engine. 
His response was that if an item in a line pushes the line beyond that 
limit, that's a good argument for putting that data somewhere else. ;)

At the time I argued with him, but working around it led me to a useful 
feature in my product: the ability to reference external files from 
within a record.  In my case one of the editors at a major Mac magazine 
was using WebMerge on some data from FileMaker, in which he stored full 
articles.  With the addition of the new feature he was able to write 
those articles in any tool and store them anywhere, merely referencing 
the file from his database; my product would then find it and include it 
as though it were part of the data.

While it worked out well for me and my customers, I still see the 
occasional data set in which some lines are longer than 65,535 chars, 
and still believe it would be a useful enhancement to the engine.

In your case it may not be a problem at all now that the NULLs are 
removed, presumably bringing every line well below that limit. 
64K characters is roughly 28 pages of text, so it's pretty rare for a 
single line of actual data to exceed it.
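
Whether stripping the NULLs really brings every line under the limit can
be confirmed by measuring the longest line before and after the cleanup.
This is a rough sketch in Python rather than Revolution script; the
helper names strip_nuls and longest_line and the sample data are made up
for illustration.

```python
def strip_nuls(data):
    """Return (cleaned_data, nul_count) with every NUL character removed."""
    nul_count = data.count("\x00")
    return data.replace("\x00", ""), nul_count


def longest_line(data):
    """Return the length of the longest line in data."""
    return max(len(line) for line in data.split("\n"))


# Made-up sample: two tab-delimited records padded with NULs.
sample = "id1\tfoo\x00\x00\x00\nid2\tbar\x00\n"
cleaned, removed = strip_nuls(sample)

print(removed)               # prints 4
print(longest_line(sample))  # prints 10
print(longest_line(cleaned)) # prints 7
```

If the longest cleaned line is still above 65,535 characters, removing
the NULLs alone would not be enough to make the sort reliable.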


Now that you've removed the NULLs, the next trick is to figure out how 
those got into the data in the first place.  I'm tempted to place a bet 
that it's related to pasting data that had been copied from MS Word. 
I'd be interested to learn how that happened either way.

--
  Richard Gaskin
  Fourth World
  Revolution training and consulting: http://www.fourthworld.com
  Webzine for Rev developers: http://www.revjournal.com


