LC7 and 8 - Non responsive processing large text files

Mark Waddingham mark at livecode.com
Thu Apr 14 05:43:45 EDT 2016


Hi Roland,

On 2016-04-13 12:03, Roland Huettmann wrote:
> NON-RESPONSIVENESS USING "UNTIL" READ
> 
> I found opening a very large text file (my file is 26 GB), simply reading
> from it up to 90 MB of data in each iteration, using an offset() function
> processing the read chunk of data, and iterating through the file, is not a
> problem and performs acceptably.
> 
> But reading something into memory using "read from file <filename> UNTIL
> <string>" and doing this many times over in such large text file creates
> non-responsiveness of LC (tested on 7 and 8).

When you say 'non-responsiveness' I take it you mean that Windows thinks 
that the application has 'hung'? (i.e. the windows go slightly opaque).

If that is the case then the problem here is probably that the 'read 
until' command is running in a very tight loop without 'tickling' the 
event loop. The first thing to try is to open the file for binary read, 
and make sure you encode '<string>' in the appropriate encoding for the 
text file you are reading and see if that helps.
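
As a rough sketch of that first suggestion, the handler below opens the 
file for binary read and encodes the delimiter before passing it to 
'read until'. It assumes a UTF-8 file whose records end with a line feed; 
'processRecord' is a hypothetical handler standing in for your own 
per-record code, and I have not timed this against a file of your size.

------

on readUntilBinary pPath
   -- Assumption: the file is UTF-8 and records end with a line feed (10)
   local tDelimiter, tResult
   put textEncode(numToCodepoint(10), "UTF-8") into tDelimiter

   open file pPath for binary read
   if the result is not empty then
      throw "cannot open file"
   end if

   repeat forever
      read from file pPath until tDelimiter
      -- Capture the result straight away, before any other handler runs
      put the result into tResult
      if it is not empty then
         -- 'it' holds raw bytes, so decode before treating it as text
         -- (processRecord is hypothetical)
         processRecord textDecode(it, "UTF-8")
      end if
      if tResult is "eof" then
         exit repeat
      end if
   end repeat

   close file pPath
end readUntilBinary

----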

If that doesn't improve matters then I'd guess that the string being 
searched for is not that common, and so the engine is having to wade 
through large sections of the file on each command invocation. This is 
probably taking longer than Windows will tolerate before thinking the 
app is non-responsive.

It isn't entirely clear to me at the moment how we could 'fix' this in 
the engine right now with the way the file processing code currently 
works. However, you may find that replacing 'read until' with a script 
solution makes things work better:

------

on test
    local tFile
    bufferedFileOpen "~/Desktop/largefile.txt", "binary", tFile
    put empty into field 1
    repeat forever
       local tLine, tIsEof
       bufferedFileReadUntilExact tFile, numToChar(13), tLine
       if the result is "eof" then
          put tLine & return after field 1
          exit repeat
       end if
       put tLine & return after field 1
    end repeat
    bufferedFileClose tFile
end test

command bufferedFileOpen pFilename, pEncoding, @xFileHandle
   open file pFilename for binary read
   if the result is not empty then
     throw "cannot open file"
   end if

   -- The 'file' key stores the filename to use to read from.
   put pFilename into xFileHandle["file"]

   -- The 'encoding' is used to encode a string we search for appropriately
   put pEncoding into xFileHandle["encoding"]

   -- The 'buffer' contains data we have read but not yet consumed
   put empty into xFileHandle["buffer"]
end bufferedFileOpen

command bufferedFileClose @xFileHandle
   if xFileHandle["file"] is empty then
     exit bufferedFileClose
   end if
   close file xFileHandle["file"]
   put empty into xFileHandle
end bufferedFileClose

command bufferedFileReadUntilExact @xFileHandle, pString, @rRead
   -- First encode the string as binary data. If the encoding of
   -- the file is 'binary' then we assume pString is binary too.
   local tEncodedString
   if xFileHandle["encoding"] is "binary" then
     put pString into tEncodedString
   else
     put textEncode(pString, xFileHandle["encoding"]) into tEncodedString
   end if

   -- Now compute the length in bytes of the string we are searching for
   local tEncodedStringLength
   put the number of bytes in tEncodedString into tEncodedStringLength

   -- We store the last position we searched up until in the current buffer
   -- so that we aren't continually searching the same data for the string.
   local tBytesToSkip
   put 0 into tBytesToSkip

   -- We now loop, accumulating the output string, until we find the
   -- string we are searching for.
   local tIsEof
   put false into tIsEof
   repeat forever
     -- If the unsearched part of the buffer is no longer than the string
     -- we are searching for then read in another 64kb of data. (Testing
     -- the whole buffer length here would mean no further reads happen
     -- after the first failed search, so the loop would never end.)
     if (the number of bytes in xFileHandle["buffer"]) - tBytesToSkip <= tEncodedStringLength then
       read from file xFileHandle["file"] for 65536 bytes
       if the result is "eof" then
         put true into tIsEof
       else if the result is not empty then
         throw "error reading from file"
       end if
       put it after xFileHandle["buffer"]
     end if

     -- See if we can find the string in the buffer
     local tEncodedStringOffset
     put byteOffset(tEncodedString, xFileHandle["buffer"], tBytesToSkip) into tEncodedStringOffset
     if tEncodedStringOffset is not 0 then
       put byte 1 to (tBytesToSkip + tEncodedStringOffset + tEncodedStringLength - 1) of xFileHandle["buffer"] into rRead
       delete byte 1 to (tBytesToSkip + tEncodedStringOffset + tEncodedStringLength - 1) of xFileHandle["buffer"]
       -- Return here rather than exit the loop: unconsumed data may remain
       -- in the buffer even when the last read hit eof, so a successful
       -- match must not report "eof" to the caller.
       return empty
     end if

     -- If we failed to find the string and the file is at eof, we are done.
     if tIsEof then
       put xFileHandle["buffer"] into rRead
       put empty into xFileHandle["buffer"]
       exit repeat
     end if

     -- We failed to find the string in the buffer so we need to accumulate
     -- more data. As the string was not found in the current buffer, we
     -- know we can skip all bytes up to (buffer length - tEncodedStringLength)
     put the number of bytes in xFileHandle["buffer"] - tEncodedStringLength into tBytesToSkip
   end repeat

   if tIsEof then
     return "eof"
   end if

   return empty
end bufferedFileReadUntilExact

----
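
The 'test' handler above uses the "binary" encoding. For a text encoding, 
a usage sketch might look like the following. It assumes a UTF-8 file 
with line-feed terminated records, and simply counts records rather than 
displaying them; the handler name and the counting are illustrative only.

------

on testUtf8
   local tFile, tLine, tStatus, tCount
   -- Assumption: UTF-8 text, records terminated by a line feed (10)
   bufferedFileOpen "~/Desktop/largefile.txt", "UTF-8", tFile
   put 0 into tCount
   repeat forever
      bufferedFileReadUntilExact tFile, numToCodepoint(10), tLine
      put the result into tStatus
      if tLine is not empty then
         -- tLine is binary data and includes the delimiter; decode it
         -- before using it as text
         put textDecode(tLine, "UTF-8") into tLine
         add 1 to tCount
      end if
      if tStatus is "eof" then
         exit repeat
      end if
   end repeat
   bufferedFileClose tFile
   put tCount && "records read"
end testUtf8

----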

I wrote the above whilst waiting for my Windows VM to spin up... 
However, having since tried the above on Windows and *still* observing a 
'non-responsive' window after a while I don't think the problem you are 
seeing has anything to do with processing speed of the 'read until' 
command.

Windows will assume that an app is 'non responsive' if it does not 
process any UI events for more than 5 seconds 
(https://msdn.microsoft.com/en-gb/library/windows/desktop/dd744765(v=vs.85).aspx). 
Now, the engine does periodically 'poke' the event queue to look for 
'Ctrl-.' key presses which will abort repeat loops and any long running 
process - however, this does not appear to be (any longer?) enough to 
stop an app from becoming 'non-responsive'. What is even more strange is 
that after an app does become 'non-responsive' (according to the OS), 
the 'Ctrl-.' key press will no longer appear to work. At this stage, I'm 
not entirely sure what can be done to prevent Windows from marking an 
app as 'non-responsive' without explicit script being used (the Ctrl-. 
not working whilst 'non-responsive' is potentially fixable although it 
is a little tricky to work out what is going on).

One thing to try is to call 'wait with messages' (rather than plain 
'wait') at reasonable intervals in your top-level processing loop. This 
does mean you will have to disable the UI whilst your processing is 
taking place, and balance the periodic calling of 'wait with messages' 
with the speed of the processing loop. (You don't want to call it too 
often because otherwise it will impact performance.)
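
As a minimal sketch of that idea, the loop below yields to the event 
loop on a timer rather than on every iteration. The 200 millisecond 
interval is an arbitrary assumption chosen to sit well under the 5 second 
limit, and 'processChunk' is a hypothetical handler standing in for your 
own processing.

------

on processLargeFile pPath
   local tResult, tLastYield, tYieldInterval
   -- Assumption: yield roughly every 200 ms; tune against your loop speed
   put 200 into tYieldInterval

   -- Disable any controls that could re-enter this handler here, since
   -- 'wait with messages' lets UI messages through while we are running.

   open file pPath for binary read
   if the result is not empty then
      throw "cannot open file"
   end if

   put the milliseconds into tLastYield
   repeat forever
      read from file pPath for 65536 bytes
      put the result into tResult
      if tResult is not empty and tResult is not "eof" then
         throw "error reading from file"
      end if

      -- processChunk is hypothetical
      processChunk it

      -- Only service the event queue once the interval has elapsed
      if the milliseconds - tLastYield > tYieldInterval then
         wait 0 milliseconds with messages
         put the milliseconds into tLastYield
      end if

      if tResult is "eof" then
         exit repeat
      end if
   end repeat

   close file pPath
end processLargeFile

----

Timing the yield off 'the milliseconds' rather than an iteration count 
keeps the interval roughly constant however long each chunk takes to 
process.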

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps




