Doing user testing in a risk free way (Re: gitter)

Mark Waddingham mark at livecode.com
Sun Jul 30 01:16:04 EDT 2017


On 2017-07-30 01:04, Monte Goulding via use-livecode wrote:
> I’m actually not sure I see the connection between the ide submodule
> and the 8.1.6-rc-2.

Heh - the connection is indirect, however it is definitely there!

**The goal:

We want to do A/B testing on LiveCode's 'first run' experience. This 
experience has to go from finding the site, all the way through to 
downloading, installing, activating and actually using the product, to 
ensure the data that we get is of the highest and most useful quality. 
The experience of any new user (who is being used as a guinea pig) has 
to be indistinguishable from that of all other new users, except for the 
specific test flow they have been allocated to.

**The problem:

Iteration on the develop branch is too slow, and we can't guarantee with a 
high degree of confidence that any one DP would be in any way suitable 
for a brand new user to use. (We still reserve the right to make 
absolutely blooping mistakes in develop DPs - obviously we try not to, 
but we need to be able to test out invasive changes somewhere, and that 
is where we currently can!).

We have a well-defined and tightly controlled process for our 
maintenance branch. It is really important that this is kept as it is, 
and it is definitely *not* the place to do any sort of testing of any 
features whatsoever.

For A/B testing to work, tests need to be able to be rolled out quickly, 
iterated quickly, and pulled quickly if they are unsuccessful, or 
indeed, ill-posed.

**The primary observation:

A/B testing is centered around the outermost, user-visible parts of any 
release. Specifically, the IDE and supporting materials. Any engine work 
that might be required will either be new (internal) additions to 
support user-visible work or bug fixes (almost always the latter).

**The proposed solution:

All A/B tests are built *upon* but not *into* the maintenance builds.

If a specific test *requires* a bug fix, it will be placed in the first 
RC of the next maintenance build. If this means a specific test has to 
be deferred a short while before it can be deployed, then that is what 
must happen. (We have plenty of tests we could run, after all - it is 
not like we are going to run out anytime soon).

If a specific test *requires* an internal engine addition to achieve, 
then suitable analysis will be done to make sure it can and does have 
zero impact on the potential stability of a maintenance build. 
Basically, these things are constrained to internal features which are 
entirely 'bolted' on to the existing engine and have no interoperation 
with the existing code. Ideally, such things would be a widget/library 
or a similar extrinsic addition.

A test will be created by making a fork of the target maintenance 
release branch in the IDE repository. All work for the specific test 
happens on this branch. All active test branches are kept up to date, by 
merging progress of the underlying maintenance branch/release branch 
(e.g. develop-8.1 or release-8.1.7) into them.

Tests are not built into the maintenance releases - the entire IDE 
bundle for that test will be uploaded as a separate archive.

When a new user who is being put through a test generates a license, the 
license will be marked with the name of the test.

When the user activates and launches the IDE for the first time, the 
appropriate test (IDE) archive will be downloaded and *that* version of 
the IDE will be launched, rather than the one built into the release.
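
As a rough illustration of the launch-time decision described above, here 
is a minimal sketch in Python. It is not the actual LiveCode 
implementation: the `License` type, its `test_name` field, the 
`ACTIVE_TEST_ARCHIVES` table and the archive file name are all 
hypothetical, and falling back to the built-in IDE when a test is no 
longer active is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table: test name -> separately uploaded IDE archive.
ACTIVE_TEST_ARCHIVES = {
    "new-first-run-tutorial": "ide-new-first-run-tutorial.zip",
}

# Marker meaning "use the IDE that shipped inside the release".
BUILT_IN_IDE = "built-in"


@dataclass
class License:
    holder: str
    # Set when the license is generated for a user allocated to a test.
    test_name: Optional[str] = None


def ide_to_launch(license: License) -> str:
    """Return the test archive to download, or BUILT_IN_IDE if none applies."""
    if license.test_name is None:
        return BUILT_IN_IDE
    # Assumption: a license marked with a test that is no longer active
    # simply gets the IDE built into the release.
    return ACTIVE_TEST_ARCHIVES.get(license.test_name, BUILT_IN_IDE)
```

With this shape, an existing user's license (no test name) never touches 
a test archive, which is the property the approach relies on.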

If a test is successful, it will be merged into the next *suitable* 
version.

For small entirely first-run related things this might well be a 
maintenance release (e.g. new first run tutorials, new materials, new 
default preferences for first-run users).

For larger things (such as autocomplete or handler lists) we will use a 
bump in the *middle* (minor) version number of semver, and use a DP 
release to make sure we got it absolutely right.

Importantly, the latter has no impact on what new users see. We can merge 
all successful tests into all future tests so that our test profiles 
build on what we have done before. That is, new users will see the 
combined set of successful tests, whilst existing users will be able to 
help us finesse them and ensure that they are as good as they can be 
before they appear in a minor version update rather than a maintenance 
update.
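
The choice of target version for a successful test can be sketched as 
below. This is only an illustration under stated assumptions: the 
function name `next_version` and the boolean `large_change` 
classification are hypothetical, and the current version is assumed to 
be a plain `major.minor.patch` string.

```python
def next_version(current: str, large_change: bool) -> str:
    """Pick the version a successful test would be merged into.

    Small first-run changes go into the next maintenance release;
    larger features bump the middle (minor) number and start at DP 1.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if large_change:
        # Larger feature: bump the minor number and go through a DP first.
        return f"{major}.{minor + 1}.0-dp-1"
    # Small first-run change: next maintenance release.
    return f"{major}.{minor}.{patch + 1}"
```

For example, starting from 8.1.6, a large change maps to 8.2.0-dp-1 and 
a small one to 8.1.7.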

If a test is unsuccessful, the test just gets deleted and the code for 
it is never merged (but of course, we'll keep the code - it is something 
we might want to test again in the future).

**Why it works:

The issue which occurred with 8.1.6-rc-2 was due to the fact we were 
iterating on the actual releases which existing users rely on. This 
approach means that existing users will not see, nor even have on their 
system, the code for any of the tests which we run. We can keep the 
strict process for maintenance releases we have always had, whilst still 
using them to iterate rapidly on the first-run A/B testing side.

**What is the connection to the IDE submodule?

If the IDE were rolled into the engine repository, the above would not 
work. We would not be able to easily guarantee that an A/B test *was* 
actually based on a version of the engine line-for-line identical to 
that of the maintenance release. Keeping the IDE as a submodule 
*physically* forces us to observe the boundaries laid out above - which 
means that the risk of error is diminished to close to zero - everything 
can be tested where it should be, in the proper process.

Of course, it would work *better* if we had our repo structure the right 
way round - the IDE should really have the engine as a submodule. 
However, nothing is perfect, and the current situation can be made to 
work perfectly well, even if it is not as pure as it could be.

**How can all our existing users help with our user testing?

Well, the above approach has one further *really* useful aspect. For any 
licenses which are not marked with a test funnel (or indeed are marked 
with a dead test funnel) we can show a drop-down list in preferences for 
all currently available tests.

Choosing one would reboot the IDE into that test's IDE. The result? Any 
of you can play around with the things we are testing and give us 
feedback on them without it affecting your workflow or use of the IDE at 
all.
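
The rule for when the drop-down is offered can be sketched like this. 
Again, this is a hypothetical illustration rather than the real 
preferences code: `selectable_tests`, `license_test` and `live_tests` 
are names invented for the example.

```python
from typing import List, Optional, Set


def selectable_tests(license_test: Optional[str], live_tests: Set[str]) -> List[str]:
    """Return the tests to list in preferences for a given license.

    Licenses in a live test funnel get no choice (they must stay in
    their allocated flow); unmarked licenses, or licenses marked with a
    dead funnel, can pick any currently available test.
    """
    if license_test is not None and license_test in live_tests:
        return []
    return sorted(live_tests)
```

Sorting is only there to give the list a stable order in the example.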

**When will this happen?

Well, soon, hopefully. We are already looking to do an 8.2.0-dp-1 with 
the autocomplete features as we don't really have time to get this new 
process up and running for our next new user testing session - and 
neither Ali nor I are in any way comfortable with doing it 8.1.6 style.

We *think* we have time to do an 8.2.0-dp-1 and *hopefully* an 
8.2.0-rc-1 in that timescale if you all don't mind helping us ensure the 
feature is sound (Monte has done a great job on it so far - so hopefully 
the fact it is autocomplete will be enough of an incentive to try out 
these builds and rapidly help us get it to RC state).

So, yes - Mark and Dan, you were right - I cannot say that you weren't. 
Our previous approach was the wrong thing to do, and we only did it that 
way because we couldn't see how to do it in another way. Well, I think 
we have now, as the above explains.

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



