Is syntax a dead issue?
Geoff Canyon
gcanyon at gmail.com
Mon Jul 25 02:42:30 EDT 2022
As a meta point, I wrote a version closer to the actual requirements --
lowercase everything, process an external input line by line to allow for
arbitrary input size. The result is about 8-10x slower than most other
languages -- not as bad as I feared, not as good as I hoped. Here's the
code for that version:
on mouseUp
   answer file "choose input:"
   if it is empty then exit mouseUp
   put it into F
   lock screen
   put the long seconds into T
   open file F for read
   repeat
      read from file F for 1 line
      repeat for each word w in toLower(it)
         add 1 to R[w]
      end repeat
      if the result is not empty then exit repeat
   end repeat
   combine R using cr and tab
   sort R numeric descending by word 2 of each
   put the long seconds - T into T1
   put R into fld "output"
   put the long seconds - T into T2
   put T1 && T2
   close file F
end mouseUp
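
For anyone who wants a baseline to time against, here is a minimal Python sketch of the same line-by-line approach: lowercase each line, count the words, sort by count descending. The names count_words and timed_count are made up for illustration, and the UTF-8 encoding is an assumption about the input file. It also isn't an exact match for the handler above: Python's split() breaks on whitespace only, while LC's word chunk treats a quoted string as one word.

```python
import collections
import time


def count_words(lines):
    """Count lowercased, whitespace-separated words across an iterable of lines."""
    counts = collections.Counter()
    for line in lines:
        counts.update(line.lower().split())
    # most_common() returns (word, count) pairs, highest count first
    return counts.most_common()


def timed_count(path):
    """Hypothetical helper: time count_words over a file read line by line."""
    start = time.perf_counter()
    # Assumption: the input is UTF-8 text; iterating the file yields one line at a time
    with open(path, encoding="utf-8") as f:
        pairs = count_words(f)
    return pairs, time.perf_counter() - start
```

Calling timed_count on the same input file returns the sorted pairs and the elapsed seconds, which is roughly what T1 measures in the handler.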
On Sun, Jul 24, 2022 at 11:01 PM Geoff Canyon <gcanyon at gmail.com> wrote:
> On this Hacker News thread <https://news.ycombinator.com/item?id=32214419>,
> I read this programming interview question
> <https://benhoyt.com/writings/count-words/>. Roughly, the challenge is to
> count the frequency of words in input, and return a list with counts,
> sorted from most to least frequent. So input like this:
>
> The foo the foo the
> defenestration the
>
> would produce output like this:
>
> the 4
> foo 2
> defenestration 1
>
> Of course I smiled because LC is literally built for this problem. I took
> well under two minutes to write this function:
>
> function wordCount X
>    repeat for each word w in X
>       add 1 to R[w]
>    end repeat
>    combine R using cr and tab
>    sort R numeric descending by word 2 of each
>    return R
> end wordCount
>
> There are quibbles -- the examples given in the article work line by line,
> so input size isn't an issue for them; quotes would cause an issue, since
> LC treats a quoted string as a single word; and LC is case insensitive, so
> the counts come out right, but the output would look like this:
>
> The 4
> foo 2
> defenestration 1
>
> Generally, though, it works, and is super-easy to code. For the sake of
> argument, consider this Python solution given:
>
> import collections
> import sys
>
> counts = collections.Counter()
> for line in sys.stdin:
>     words = line.lower().split()
>     counts.update(words)
>
> for word, count in counts.most_common():
>     print(word, count)
>
> That requires a library, but it's also super-easy to code and understand,
> and it requires just the same number of lines. So, daydreaming extensions
> to LC syntax, this comes to mind:
>
> function wordCount X
>    add 1 to R[w] for each word w in X
>    return R combined using cr and tab and sorted numeric descending by word 2 of each
> end wordCount
>
> or if you prefer:
>
> function wordCount X
>    for each word w in X add 1 to R[w]
>    return (R combined using cr and tab) sorted numeric descending by word 2 of each
> end wordCount
>
> Or to really apply ourselves:
>
> function wordCount X
>    return the count of each word in X using cr and tab sorted numeric descending by word 2 of each
> end wordCount
>
> So: the xTalk syntax is over thirty years old; when was the last
> significant syntax update?
>
> (I'm not at all core to the process, so feel free to tell me how much I've
> missed lately!)