Word chunk includes punctuation

James Hale james at thehales.id.au
Tue Aug 14 02:31:33 EDT 2012


Well, thank you all for your replies.

I wasn't sure about using tokens (token 1 of word) because I didn't know what the overhead would be.
However, the alternative was to remove all the pesky punctuation before running the text through my script.

I thought I would try both.

The first script replaced . , ? ! ; : both within and at the end of a line (but not within a word).
The second script replaced only the period, at the end of either a word or a line.

I then went through each word of my text.
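
For anyone following along, the second approach might look roughly like the sketch below. It is not the script that was actually timed (that was not posted): the function name timeTokenPass and all variable names are hypothetical, and it assumes the text is passed in as a single string. It times the period replacement and the word pass separately, using the milliseconds.

```livecode
-- Sketch only: timeTokenPass and the variable names are hypothetical.
function timeTokenPass pText
   local tStart, tReplaceSecs, tWordSecs, tWord, tClean, tCount
   
   -- Pre-replacement: drop periods that end a word or a line.
   -- (A period at the very end of the text, with no trailing
   -- space or return, is not handled here.)
   put the milliseconds into tStart
   replace ". " with space in pText
   replace "." & return with return in pText
   put (the milliseconds - tStart) / 1000 into tReplaceSecs
   
   -- Word pass: take token 1 of each word rather than the raw word.
   put the milliseconds into tStart
   put 0 into tCount
   repeat for each word tWord in pText
      put token 1 of tWord into tClean
      -- ...process tClean here...
      add 1 to tCount
   end repeat
   put (the milliseconds - tStart) / 1000 into tWordSecs
   
   return tReplaceSecs & "+" & tWordSecs && "secs for" && tCount && "words"
end timeTokenPass
```

Calling something like "put timeTokenPass(field 1)" would put a line in the same shape as the timings reported in this post into the message box. The "repeat for each word" form is used because it walks the text once instead of re-counting words on every pass.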

Here are the times, in seconds, for passing through some 480,000-plus words.

The first figure is the replacement algorithm and the second is the word-processing routine.

Script 1
2.856504+1.876625 secs for 491081 words

Script 2
0.502831+2.173185 secs for 488871 words

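So there is a slight overhead in getting token 1 of each word (about 0.3 seconds on the word pass), but a massive saving on the pre-replacement routine (about 2.35 seconds). Overall that is roughly 2.68 seconds against 4.73.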

So tokens it is then.

James

P.S. In case you are wondering about the discrepancy in the word count, I will be asking about it in my next question :-)






