[revServer] process timeout issue

Andre Garzia andre at andregarzia.com
Tue Aug 3 09:47:40 EDT 2010


I have never seen a RevServer bug like those timeouts and stuff and of course
Jerry is not making it up, so the logical conclusion is that there's some
specific case(s) where a RevServer script can go loco and fail.

After that conclusion, our first requirement is to draw a mental map of what
is involved. Now repeat after me: SETUP

S - Scenario (what are they building)
E - Environment (what is the surrounding environment)
T - Technology (what are they using)
U - Usage Case (what happened/when it happened)
P - Possible Points of Failure (critical things that need assessment)

=== Scenario ===
Jerry, Sarah and Mary are building some amazing tools. They have a new
development tool that is powered by a new xTalk-like language called LIST.
Their system works by converting that LIST language into an HTML5,
JavaScript and CSS web application. Their applications are AJAX powered
and a round trip to the server is needed to execute LIST ACTIONS (is this

=== Environment ===
Rodeo server was a RevServer-built solution hosted at a datacenter. Is it
hosted at On-Rev or is it hosted privately? Where is it hosted?

=== Technology ===
Is Rodeo on virtualized hardware? Shared accounts? VPS? Which database was
used? Is it being served with Apache?

=== Usage Case ===
Can the RevServer timeout be narrowed to some usage scenario? For example,
did it happen while huge conversions of LIST were taking place? Did it happen
while complex database calls were being executed? Was it completely random?

=== Possible Points of Failure ===
If RevServer failed for a SANE REASON, meaning it didn't simply explode
out of nowhere, then the most probable causes are:
* Memory Exhaustion: RevServer script took more than it was allowed to chew
and was terminated.
* CPU Thief: RevServer script decided that the CPU was his alone and maxed
it for more time than allowed. Terminated with prejudice.
* Timeout: RevServer script started pondering the meaning of life while
doing its chores, took forever, terminated by virtualization police.
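
To make those three limits concrete, here is a rough sketch of how a host
could impose them on a child process. It is Python rather than RevServer
script, it assumes a POSIX system, and run_with_limits and its parameters
are names made up for illustration. It is not a description of how On-Rev or
any other host is known to enforce its policies.

```python
import resource
import subprocess
import sys


def run_with_limits(argv, cpu_seconds=None, memory_bytes=None, wall_seconds=None):
    """Run argv under optional CPU, memory and wall-clock limits (POSIX only).

    Returns the child's return code, or the string "timeout" if the
    wall-clock limit expired and the child had to be killed.
    """

    def apply_limits():
        # Runs in the child just before exec, so only the child is limited.
        if cpu_seconds is not None:
            # Soft limit: the kernel sends SIGXCPU. Hard limit: SIGKILL.
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        if memory_bytes is not None:
            # Cap the address space; allocations beyond it fail in the child.
            resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    try:
        proc = subprocess.run(
            argv,
            preexec_fn=apply_limits,
            timeout=wall_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return "timeout"
    return proc.returncode


if __name__ == "__main__":
    # A trivial child that finishes well inside any limit.
    print(run_with_limits([sys.executable, "-c", "pass"], wall_seconds=5))
```

Running your own script under limits like these on a test machine is one way
to find out which of the three causes it trips first.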

These are the sane reasons for a process to be terminated by the system.
Could it be that RevServer is crashing? Crashing is not the same as being
terminated. Being terminated means that you are working correctly but for
some reason or policy your process is killed. Crashing means that somewhere
inside the RevServer engine something went nuts and it died.
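
The difference between terminated and crashed can often be read off the way
the process ended. Below is a minimal sketch in Python, assuming a POSIX host
where a negative return code from the subprocess module means "killed by
that signal number". classify_exit and the two signal groupings are
illustrative choices, a heuristic and not an official classification.

```python
import signal

# Signals usually sent TO a healthy process by the OS or a hosting policy
# (out-of-memory killer, CPU limit, an administrator or watchdog).
POLICY_SIGNALS = {signal.SIGKILL, signal.SIGTERM, signal.SIGXCPU}

# Signals usually raised BY a fault inside the process itself.
CRASH_SIGNALS = {
    signal.SIGSEGV,
    signal.SIGABRT,
    signal.SIGBUS,
    signal.SIGFPE,
    signal.SIGILL,
}


def classify_exit(returncode):
    """Guess how a child process ended from its subprocess return code."""
    if returncode == 0:
        return "ok"
    if returncode > 0:
        # The process exited on its own with an error status.
        return "error-exit"
    signum = -returncode  # negative return code means "killed by signal N"
    if signum in POLICY_SIGNALS:
        return "terminated"
    if signum in CRASH_SIGNALS:
        return "crashed"
    return "unknown-signal"
```

A web server log that records the exit status of the script process gives
you the input for a check like this.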

Now we don't know the answers to those questions, but those are questions
that everyone should ask when facing problems with server side stuff. Server
side is hard, and while on the desktop it is OK to make a little standalone
that allocates half a gig of memory, on the server side there's a big chance
that you will not be allowed to do that.

There are too many possible points of failure and things to pay attention
to, especially when you are building something as big as Rodeo. I am sure
that Jerry and Sarah investigated their possibilities and reasoned what to
do. In the end it is just a compromise about where you want to stop and work.
They decided to move that technology to another engine. It is OK. Had they
decided otherwise it would be OK too.

Yesterday Pierre and I did some stress testing on his site. I ran 25
concurrent accesses for 30 secs on his site. Of course I ran it from a
single machine and thus it is not the best benchmark possible, but I did this
multiple times and he did it at the same time as well, and maybe others did
too. In my tests there was not a single error. Some requests took longer
than others but this can happen for many reasons, including problems on my
machine and network; after all I am on a lousy VPN.
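
For anyone who wants to repeat that kind of test, here is a minimal sketch in
Python of "N concurrent clients for T seconds". It is not the tool used in
the test above; stress, worker and http_fetch are hypothetical names, and the
defaults of 25 workers and 30 seconds only mirror the numbers mentioned.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url, timeout=10.0):
    """Fetch one URL; raises on network errors, HTTP errors or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def worker(fetch, deadline):
    """Call fetch() in a loop until the deadline; count results and timings."""
    ok, errors, latencies = 0, 0, []
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            fetch()
            ok += 1
        except Exception:
            errors += 1
        latencies.append(time.monotonic() - start)
    return ok, errors, latencies


def stress(fetch, workers=25, duration=30.0):
    """Run `workers` concurrent loops of fetch() for `duration` seconds."""
    deadline = time.monotonic() + duration
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, fetch, deadline) for _ in range(workers)]
        results = [f.result() for f in futures]
    latencies = [t for r in results for t in r[2]]
    return {
        "ok": sum(r[0] for r in results),
        "errors": sum(r[1] for r in results),
        "slowest": max(latencies, default=0.0),
    }
```

You would call it as stress(lambda: http_fetch(url)) with the address of a
site you are allowed to test. Like the run described above, it all comes from
a single machine, so slow requests may be the client's fault and not the
server's.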

I am now building some huge RevServer based solutions and I am yet to face
those problems, that's why I believe there's a recipe for them or that they
are caused by a combination of factors related to the virtualization stack
(if the server was indeed virtualized). Remember folks, virtualization is
cool but nothing beats a real machine.

Sorry for the long post

