problem with counting words
larry at significantplanet.org
Mon Oct 13 13:07:41 EDT 2014
Hi Richard,
in a word...
"I really enjoyed reading your post and I learned a lot!"
Larry
----- Original Message -----
From: "Richard Gaskin" <ambassador at fourthworld.com>
To: <use-livecode at lists.runrev.com>
Sent: Monday, October 13, 2014 9:03 AM
Subject: Re: problem with counting words
> Good post, Kay. Each of the examples you provided is among the reasons I
> like xTalk.
>
> But even though they demonstrate useful features of the language, neither
> is dependent on xTalk's trait of counting quoted text as a single word
> when using the word chunk type.
>
> Perhaps I should preface this by noting that I very much enjoy xTalk in
> general and LiveCode in particular, a love that's only grown in my 27
> years with this family of languages.
>
> But all programming languages have historical anomalies, and xTalk is no
> exception. Programming languages are, by nature,
> somewhat funky, attempting to communicate the richness of human thought to
> a machine too stupid to count past 1. All of them require trade-offs.
>
> In the first example you provided, the list of names, none of them
> includes quoted text. And even with the broader support of treating words
> as white-space delimited (breaking from the English rule of usually not
> including punctuation), as you noted at least one of the examples there
> will fail (sorry, Mr. Van Damme).
>
> Many other languages also provide means of dealing with multi-character
> white space (sed, awk, and Python come to mind), and none of them, not
> even xTalk, will reliably sort by last name unless we separate the first
> and last more explicitly, such as in separate fields or with a tab character,
> as is commonly done in any language where a last-name sort is important,
> even in LiveCode.
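
A minimal Python sketch of the point above, using three names from the list quoted further down. The tab-separated layout, the extra space in one name, and the variable names are assumptions made for illustration; none of them come from the original posts.

```python
# Names taken from the list quoted later in this message; the double space
# in "Peter  O'Toole" is added here to show multi-character white space.
names = ["Jean-Claude Van Damme", "Peter  O'Toole", "Daniel Day-Lewis"]

# str.split() with no argument treats any run of white space as a single
# delimiter, so the double space is harmless.
by_last_token = sorted(names, key=lambda n: n.split()[-1].lower())
# "Van Damme" is filed under "Damme" because only the last token is used.

# Assumed layout: given names and surname separated explicitly by a tab.
records = ["Jean-Claude\tVan Damme", "Peter\tO'Toole", "Daniel\tDay-Lewis"]
by_surname = sorted(records, key=lambda r: r.split("\t")[1].lower())

print(by_last_token)
print(by_surname)
```

With the explicit tab, "Van Damme" sorts under V as a whole surname; with white-space splitting alone it sorts under D.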
>
> In the second example in which a multi-word value is used as an object
> identifier, once again we're not asked to parse that using xTalk's "word"
> chunk type, but instead get to rely on the engine's expression evaluator,
> which works very much like JavaScript's and others' in which literal
> strings can be used as object identifiers. Useful as it is, it's neither
> unique to xTalk nor necessarily dependent on how we use the "word" chunk
> type.
>
> Object identifiers *can* become dependent on the word chunk type if you
> need to parse them yourself, as others have noted along with many other
> good examples to justify the HyperTalk team's implementation (though we
> might ask why we need to do this so often, such as why we don't have
> objectType or ownerStack functions).
>
> No matter how useful the current implementation is, the choice still
> requires justification. Even if that justification is sound, favoring a
> certain utility, it's still a trade-off, the downside being a redefinition
> of the word "word" from its more common definition in natural language.
>
> Larry's initial confusion is far from rare. xTalk's reliance on a unique
> definition of "word" that differs from its use in natural language is
> something we all had to learn. We may accept it, we may like it, we may
> even prefer it, but it's by no means intuitive to the native English
> speaker.
>
> xTalk was born more than a decade before Unicode was invented, so it
> couldn't have taken advantage of the vast pool of collective knowledge
> embodied in the Unicode spec, nor was there the luxury of having the
> computational horsepower needed to use such a spec efficiently.
>
> Today the LiveCode team has at last corrected this with the introduction
> of the "trueWord" token type, though I have to shrug my shoulders with an
> acknowledging chuckle in sharing Larry's initial observation that if xTalk
> were being designed today, with its ostensible emphasis on "English-like"
> syntax, the order is backwards:
>
> If we didn't have 27 years of code dependent on xTalk's unique
> redefinition of "word", to support the claim of "English-like" it might be
> more intuitive to have "word" act as "trueWord" does, and have some other
> token do what "word" currently does in xTalk's unique redefinition.
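
As a rough analogy only, the two ways of counting can be sketched in Python. This is not how LiveCode implements "word" or "trueWord": `shlex.split()` follows shell quoting rules (it also treats apostrophes as quotes and strips the quote characters), and the sample sentence is an assumption chosen for illustration.

```python
import shlex

text = 'She said "good morning" twice'

# Plain white-space splitting: every space separates tokens, quoted or not.
whitespace_tokens = text.split()

# shlex.split() keeps a double-quoted run together as one token, loosely
# comparable to the xTalk "word" chunk counting quoted text as one word.
quoted_tokens = shlex.split(text)

print(len(whitespace_tokens))  # 5
print(len(quoted_tokens))      # 4
```

The same sentence yields five tokens one way and four the other, which is the kind of difference behind the original word-count confusion.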
>
> But that's not the world we live in. Like every other language, LiveCode
> is a product of its unique history. Useful as its conventions are, they
> will from time to time require us to learn new ways of doing things.
>
> This is just one of many reasons I generally don't use the phrase
> "English-like" when giving talks on LiveCode. Our favorite language
> brings to the world's programming choices a uniquely valuable blend of
> features, but while it's certainly more readable than most it isn't
> particularly "English-like", nor does it really even try all that hard to
> be.
>
> And that's a good thing.
>
> Natural language is really tough stuff to parse, full of its own even
> longer and more nuanced history, and intended for a very different
> audience (the cognitive complexity of the human mind rather than the
> logical simplicity of computers).
>
> I think most of us (except Geoff Canyon who has a rare mind for this sort
> of stuff <g>) would agree that we're all glad this isn't a valid statement
> in xTalk, even though it's a perfectly valid sentence in English:
>
> "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."
>
> <http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo>
>
> :)
>
>
>
> Kay C Lan wrote:
>
>> On Mon, Oct 13, 2014 at 7:45 AM, Richard Gaskin wrote:
>>>
>>> I hear ya', but like so many other oddities in the language this one
>>> came from Apple,
>>>
>>
>> Sheer brilliance! One of the first analogies of HyperCard was that it was
>> an electronic rolodex. Here is a list of names:
>>
>> Abu Musab Al-Zarqawi
>> Camilla Parker-Bowles
>> Catherine Zeta-Jones
>> Claude Levi-Strauss
>> D'Arcy Corrigan
>> Daniel Day-Lewis
>> David Ben-Gurion
>> Dodi Al-Fayed
>> Florence Griffith-Joyner
>> Gilbert O'Sullivan
>> Gloria Macapagal-Arroyo
>> Jean-Claude Van Damme
>> Jimmy O'Dea
>> Justine Henin-Hardenne
>> Kareem Abdul-Jabbar
>> Karim Abdul-Jabbar
>> Kristin Scott-Thomas
>> Maddox Jolie-Pitt
>> Michael O'Leary
>> Olivia Newton-John
>> Peter O'Toole
>> Sinéad O'Connor
>> Tim Brooke-Taylor
>> Ralph Twisleton-Wykeham-Fiennes
>>
>> So let's say you want to sort these by surname - a kind of rolodex thing
>> to do.
>>
>> sort lines of myListOfNames by word -1 of each
>>
>> will result in only one mistake
>>
>> sort lines of myListOfNames by trueWord -1 of each -- if you are on LC 7.0
>>
>> will result in basically the same messed up result most other programming
>> languages will give you. Put it in a word processor and see how you go.
>>
>> Please feel free to try and write your own function that is more
>> successful and more efficient than the beautiful one liner Bill Atkinson
>> gave us. Even if you had wordDel it wouldn't help much. I can't imagine the
>> amount of hours that have been wasted, especially on genealogical websites,
>> trying to fathom why double barrelled names never sort correctly. This is
>> also compounded by the certain fact that some people will put a space
>> between the last given name and the Surname, some a tab, and some will
>> 'format' the data by placing multiple spaces in between names so that
>> things 'line up nicely' - and are then confused as to why it only looks
>> that way on their screen and not on someone else's. One of the reasons
>> double barrelled names have picked up the '-' is to help computers
>> recognise them as a single word.
>>
>> Also;
>>
>> put myVariable into fld Not A Variable
>>
>> doesn't work
>>
>> put myVariable into fld "Not A Variable"
>>
>> does. The ability to recognise words in quotes as a single entity is
>> extremely important. Yes, we don't typically think of such as a single
>> word, but when we understand that computers don't think like us, and we do
>> understand why things are the way they are, such oddities can be
>> manipulated in many powerful ways to our own advantage. It is also helpful
>> when we understand such things that we don't go around replacing one
>> character willy nilly with another character. ~ [tilde] for instance is one
>> character I'd never use as it has a special meaning in many computer
>> languages; as does / \ < > . * and many others. If we had some text that
>> contained both straight and curly quotes and replaced the straight quotes
>> with curly quotes so we could get a word count, and then changed the curly
>> quotes back to straight quotes, the final text is not the same as it
>> started - and this could cause problems. Today your function might work
>> perfectly for today's problem, but next month, or next year, when you start
>> expanding your LC skills and try working with SQL databases, or Servers and
>> network connections, every now and then someone will report a bug that your
>> app does something strange. You may never be able to track it down because
>> it just happens that once every million DB calls a random user happens to
>> use data that contains a character that you never use yourself and thought
>> no one else would. I have a particular liking for numToChar(127) myself.
>>
>> Yep, no other programming language might define a word like LC defines a
>> word, but I for one am EXTREMELY thankful for that.
>
>
> --
> Richard Gaskin
> Fourth World Systems
> Software Design and Development for the Desktop, Mobile, and the Web
> ____________________________________________________________________
> Ambassador at FourthWorld.com http://www.FourthWorld.com
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode