word counts - what is going on?

James Hale james at thehales.id.au
Tue Aug 14 03:07:44 EDT 2012


Hi,

I am processing a body of text and identifying each word and where it is within the text block.
I am after its word number and its line number.

Using a "repeat for each" loop, I step through the text and simply add 1 to a counter at each step.
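To make the counting concrete, here is a minimal sketch of that kind of loop. It is written in Python rather than LiveCode, and the function name index_words is made up for illustration. It assumes words are separated by whitespace only and that word numbers restart on each line.

```python
def index_words(text):
    """Return (line_number, word_number, word) for every word, both 1-based."""
    entries = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        # Assumption: words are whitespace-separated; numbering restarts per line.
        for word_no, word in enumerate(line.split(), start=1):
            entries.append((line_no, word_no, word))
    return entries


sample = "the quick brown\nfox jumps"
for entry in index_words(sample):
    print(entry)
```

The final length of the list plays the role of the total word count quoted in the results.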

My previous post on identifying the words included test results from two ways of cleaning up my words for identification.
I was interested in the times but there was something else amiss.

Here are the two results.

Script 1: 2.856504+1.876625 secs for 491081 words
Script 2: 0.502831+2.173185 secs for 488871 words

Now the thing is, the counting loop to give me the line and word number for each word is the same in both cases.
The word count given above is the final value of the word index I compute.

So the difference must lie in my preamble to this counting, i.e. tidying up the text.

After a bit of toing and froing I think I see the problem.
Word boundaries.
For example, quoted text is considered 1 word.
Removing the quotes is OK for finding the words inside the quoted text, but from then on the word numbers are out of whack with the original text.
By this I mean that if I see that word 45 of line 1223 is "World", for example, I can't simply hilite word 45 of line 1223 of mytext field and expect the hilite to fall on "World".
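The mismatch can be reproduced with a small model. This is a Python sketch, not LiveCode, and the word rule in WORD_RE is a simplifying assumption: a double-quoted run counts as one word, and anything else is split on whitespace. The helper names are hypothetical.

```python
import re

# Simplified model: a double-quoted run is one word; otherwise a word is a
# run of non-whitespace characters. This only approximates the real rule.
WORD_RE = re.compile(r'"[^"]*"|\S+')


def words_quoted(line):
    """Words as counted in the original text, with quoted runs kept whole."""
    return WORD_RE.findall(line)


def words_cleaned(line):
    """Words as counted after the quotes have been replaced by spaces."""
    return line.replace('"', " ").split()


line = 'He said "Hello World" to everyone'
original = words_quoted(line)
cleaned = words_cleaned(line)

print(len(original), len(cleaned))  # the two word counts differ
print(cleaned[3], "vs", original[3])  # same index, different words
```

In this sample the cleaned text has one more word than the original, and the fourth word is "World" in one list but "to" in the other, which is the kind of drift described above.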

It would seem that to avoid these contradictory requirements (need to keep quoted text versus identifying words within the quote) I might need to revisit character positions.
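One way the character-position idea might look, again as a Python sketch under stated assumptions: a word is taken to be any run of characters that is neither whitespace nor a double quote, and the name word_spans is made up for illustration. The text itself is never modified, so the recorded offsets always refer to the original.

```python
import re

# Assumption: a word is a run of characters that is neither whitespace nor a
# double quote, so the quotes never have to be stripped from the text.
TOKEN_RE = re.compile(r'[^\s"]+')


def word_spans(text):
    """Return (word, start, end) per word; offsets are 0-based, end exclusive."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


text = 'He said "Hello World" to everyone'
spans = word_spans(text)

word, start, end = spans[3]
print(word, start, end)
# Slicing the untouched text with the stored offsets gives the same word back.
print(text[start:end] == word)
```

If the target environment counts characters from 1 with an inclusive end, the equivalent range would be start + 1 to end.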

So, back to the drawing board.


James

james at thehales.id.au

Tel: +61 3 9386 2516    
Fax: +61 3 9386 1387

More information about the use-livecode mailing list