Is syntax a dead issue?

Mon Jul 25 02:42:30 EDT 2022

As a meta point, I wrote a version closer to the actual requirements --
lowercase everything, and process an external input line by line to allow
for arbitrary input size. The result is about 8-10x slower than the
solutions in most other languages -- not as bad as I feared, not as good
as I hoped. Here's the code for that version:

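on mouseUp
   -- ask for the input file; stop if the dialog is cancelled
   answer file "choose input:"
   if it is empty then exit mouseUp
   put it into F
   lock screen
   put the long seconds into T
   open file F for read
   repeat
      read from file F for 1 line
      -- the result is non-empty (eof) once the end of the file is reached;
      -- save it right away, before any other command can change it
      put the result into E
      repeat for each word w in toLower(it)
         add 1 to R[w]
      end repeat
      if E is not empty then exit repeat
   end repeat
   -- flatten the array into "word <tab> count" lines, most frequent first
   combine R using cr and tab
   sort R numeric descending by word 2 of each
   put the long seconds - T into T1 -- time to count and sort
   put R into fld "output"
   put the long seconds - T into T2 -- time including display
   put T1 && T2
   close file F
end mouseUp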
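
For comparison, a rough Python counterpart of the handler above might look
like the following. It is only a sketch: the names count_words and
timed_count are made up for illustration, the UTF-8 encoding is an
assumption, and it makes no claim about reproducing the timings mentioned
above.

```python
import collections
import time


def count_words(lines):
    """Count lowercased, whitespace-separated words from an iterable of lines."""
    counts = collections.Counter()
    for line in lines:
        # Reading line by line keeps memory use independent of input size
        counts.update(line.lower().split())
    # most_common() returns (word, count) pairs, highest count first
    return counts.most_common()


def timed_count(path):
    """Hypothetical helper: count the words in a file, return (pairs, seconds)."""
    start = time.perf_counter()
    with open(path, encoding="utf-8") as f:  # encoding is an assumption
        pairs = count_words(f)
    return pairs, time.perf_counter() - start
```

Calling timed_count on the same input file gives a list of (word, count)
pairs plus the elapsed seconds, roughly what T1 measures in the handler.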

On Sun, Jul 24, 2022 at 11:01 PM Geoff Canyon <gcanyon at gmail.com> wrote:

> On this Hacker News thread <https://news.ycombinator.com/item?id=32214419>,
> I read this programming interview question
> <https://benhoyt.com/writings/count-words/>. Roughly, the challenge is to
> count the frequency of words in input, and return a list with counts,
> sorted from most to least frequent. So input like this:
>
> The foo the foo the
> defenestration the
>
> would produce output like this:
>
> the 4
> foo 2
> defenestration 1
>
> Of course I smiled because LC is literally built for this problem. I took
> well under two minutes to write this function:
>
> function wordCount X
>    repeat for each word w in X
>       add 1 to R[w]
>    end repeat
>    combine R using cr and tab
>    sort R numeric descending by word 2 of each
>    return R
> end wordCount
>
> There are quibbles -- the examples given in the article work line by line,
> so input size isn't an issue, and of course quotes would cause an issue,
> and LC is case insensitive, so it works, but the output would look like
> this:
>
> The 4
> foo 2
> defenestration 1
>
> But generally, it works, and is super-easy to code. But for the sake of
> argument, consider this Python solution given:
>
> import collections
> import sys
>
> counts = collections.Counter()
> for line in sys.stdin:
>     words = line.lower().split()
>     counts.update(words)
>
> for word, count in counts.most_common():
>     print(word, count)
>
> That requires a library, but it's also super-easy to code and understand,
> and it requires just the same number of lines. So, daydreaming extensions
> to LC syntax, this comes to mind:
>
> function wordCount X
>    add 1 to R[w] for each word w in X
>    return R combined using cr and tab and sorted numeric descending \
>       by word 2 of each
> end wordCount
>
> or if you prefer:
>
> function wordCount X
>    for each word w in X add 1 to R[w]
>    return (R combined using cr and tab) sorted numeric descending \
>       by word 2 of each
> end wordCount
>
> Or to really apply ourselves:
>
> function wordCount X
>    return the count of each word in X using cr and tab \
>       sorted numeric descending by word 2 of each
> end wordCount
>
> So: the xTalk syntax is over thirty years old; when was the last
> significant syntax update?
>
> (I'm not at all core to the process, so feel free to tell me how much I've
> missed lately!)