Word chunk includes punctuation
James Hale
james at thehales.id.au
Tue Aug 14 02:31:33 EDT 2012
Well, thank you all for your replies.
I wasn't sure about using tokens (token 1 of word), as I didn't know what the overhead would be.
However, the alternative was to remove all the pesky punctuation before running the text through my script.
I thought I would try both.
The first script replaced . , ? ! ; : both within and at the end of a line (but not within a word).
The second script replaced only the period, at the end of either a word or a line.
In both cases I then went through each word of my text.
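
For anyone following along, here is a minimal sketch of what the two approaches might look like. These are not the scripts that were timed: the handler names, the variable names and the simplified replacement rules are all assumptions for illustration only.

```livecode
-- Approach 1 (sketch): strip punctuation up front, then work with plain words.
-- Only handles a punctuation mark followed by a space or a return, plus a
-- single trailing mark at the very end of the text.
function stripPunctuation pText
   repeat for each char tPunct in ".,?!;:"
      replace tPunct & space with space in pText
      replace tPunct & return with return in pText
   end repeat
   if the last char of pText is in ".,?!;:" then delete the last char of pText
   return pText
end stripPunctuation

-- Approach 2 (sketch): remove only the periods, then let the token chunk
-- drop whatever other punctuation is still attached to each word.
command processWithTokens pText
   replace "." & space with space in pText
   replace "." & return with return in pText
   repeat for each word tWord in pText
      put token 1 of tWord into tClean
      -- per-word processing of tClean goes here
   end repeat
end processWithTokens
```

The first approach does twelve passes of replace over the whole text before the word loop starts; the second does two passes and moves the rest of the cleanup into the loop itself.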
Here are the times, in seconds, for passing through some 480,000-plus words.
The first figure is the replacement algorithm and the second is the word-processing routine.
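
One rough way to collect a pair of figures like these is to read the milliseconds before and after each phase. The sketch below is an assumption about how such a measurement could be set up, not the harness actually used; field "Source" and all variable names are hypothetical, and the replacement phase shown is the period-only variant.

```livecode
on mouseUp
   -- field "Source" is a hypothetical field holding the text under test
   put field "Source" into tText

   -- phase 1: the replacement routine
   put the milliseconds into tStart
   replace "." & space with space in tText
   replace "." & return with return in tText
   put (the milliseconds - tStart) / 1000 into tReplaceSecs

   -- phase 2: the word-processing loop
   put the milliseconds into tStart
   put 0 into tCount
   repeat for each word tWord in tText
      add 1 to tCount
      -- per-word processing goes here
   end repeat
   put (the milliseconds - tStart) / 1000 into tLoopSecs

   -- report in the same shape as the results in this post
   put tReplaceSecs & "+" & tLoopSecs && "secs for" && tCount && "words"
end mouseUp
```

Timing the two phases separately is what makes it possible to see where the saving comes from, rather than only comparing totals.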
Script 1
2.856504+1.876625 secs for 491081 words
Script 2
0.502831+2.173185 secs for 488871 words
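So, there is a slight overhead on getting token 1 of the word (about 0.3 secs over the whole word loop) but a massive saving on the pre-replacement routine (about 2.35 secs). Overall that is roughly 2.68 secs for Script 2 against 4.73 secs for Script 1.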
So tokens it is then.
James
P.S. In case you are wondering about the discrepancy in the word count, I will be asking about it in my next question :-)
More information about the use-livecode
mailing list