Shifted Results from External

Dar Scott dsc at swcp.com
Wed May 3 16:36:01 EDT 2006


(Hello, everybody, I'm trying to crawl back out from my cave and see  
what's going on here.)

I have an external problem.  Well, that's where the symptoms show up.

I created an external that is a thin layer over some Win32 calls
(targeted to XP).  It does extensive error checking.  The code looks
clean, and the external has passed testing.

However, in the customer's environment, which also includes another
external of mine that runs a customer-supplied ActiveX module,
something strange happens after a while.  Based on the customer's
data, the results from system calls appear to be shifted by one.
That is, if the calls are f1(), g(), g(), f2(), g(), then at some
point the data returned is what should have been returned for the
previous call: empty, f1(), g(), g(), f2().  I have no queues in my
code, but it looks as if data is being queued and an extra value was
left in, or inserted, at some point.  (The customer also reported
growing memory usage.)

I suspect that some other module in my customer's environment is
breaking things: the other external, the ActiveX module, Rev 2.6.1,
XP, or maybe even the Transcript.  I think either the external's
memory (or the heap, or Rev's memory) is getting smashed, or
something is going wrong with malloc/free.  I'm pretty sure it is not
this external (famous last words).

The module statically links the C run-time library, and as best I can
tell, malloc is not being substituted.  In all cases *retString is
set.  (A quick check shows that gibberish is returned if it is not.)
Strong exception catching is used.  The results of all CRT and Win32
calls are checked.  I checked the calls to malloc and free in my
external, and they balance.  I make no calls to CRT functions that
use malloc (according to the MSDN documentation).  I haven't yet
looked into where malloc gets its memory; maybe the process heap?
Does anybody know?

These externals use my C++ libraries for externals, but those
libraries have worked for a long time and in lots of environments.
(More famous last words.)

The test stack does not seem to be blowing the Transcript call stack,
but it does make some interesting uses of wait with messages.

I haven't been able to reproduce this on any of my 3 machines.  I've
made an effort to make my environment match my customer's, but I was
in the middle of that when the troubleshooting effort was stopped.
The customer's test stack makes lots of different kinds of calls and
uses send a lot.  In any case, the test is not small, and it takes a
while to fail in the customer's environment.

Since I couldn't replicate the bug (I know how RunRev feels with some
of the Rev bugs), I sent some variations that might shift the
symptoms or even report what went wrong.  Unfortunately, one of them
(one that uses malloc less) did not show the problem, and testing of
the batch stopped right there, with most of the variations untried.

I realize this is very weird, and folks on this list, even external
builders, may not have seen anything like it, but I thought I'd give
it a try.

I hope my customer can get his product to run reliably and I want to  
vindicate this external.

I can come up with a model for almost anything, but this baffles me.   
What can cause this?

OK, here is a model, but it is pretty wild:  I know external calls
are slow, but I would be surprised if Rev were pushing and pulling
data through queues to another thread that runs the external calls.

Dar Scott
Rev guy on the northern Rio Grande




