How to remove emojis from a Unicode string

scott at elementarysoftware.com
Sun Jan 13 19:51:00 EST 2019


Hello Richmond. I have found that emojis also cause the <open printing to pdf> command to fail silently. Being able to strip emojis would help with that as well.

I've been experimenting with your emoji-stripping stack.
Using codePointToNum(tEmojiChar) > 128511 doesn't seem to catch all the emoji characters...
this cat head < 🐱 > returns "128049", which is below that threshold.
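
For context: 128511 is U+1F5FF, the last code point of the Miscellaneous Symbols and Pictographs block (U+1F300 to U+1F5FF). A "> 128511" test therefore only catches U+1F600 (the Emoticons block) and above. The cat head is U+1F431, which sits inside that lower block. Emoji are spread across several blocks, so checking a list of ranges is closer to what is needed than a single threshold.

The sketch below is in Python, not LiveCode, and only illustrates the logic. EMOJI_RANGES, JOINERS, is_emoji_code_point and strip_emoji are hypothetical names. The range list is a simplification and not the official Unicode emoji property set, which is defined in emoji-data.txt from Unicode Technical Standard #51. The same per-code-point range test should carry over to a LiveCode loop that calls codePointToNum on each code point.

```python
# Hypothetical list of code-point ranges treated as "emoji" for stripping.
# This is a simplification, NOT the official Unicode emoji property set
# (see emoji-data.txt in Unicode Technical Standard #51 for that).
EMOJI_RANGES = [
    (0x2600, 0x26FF),    # Miscellaneous Symbols (includes the male sign U+2642)
    (0x2700, 0x27BF),    # Dingbats
    (0x1F1E6, 0x1F1FF),  # Regional indicator symbols (flag pairs)
    (0x1F300, 0x1F5FF),  # Misc Symbols and Pictographs (cat head U+1F431, skin tones)
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs (merman U+1F9DC)
    (0x1FA70, 0x1FAFF),  # Symbols and Pictographs Extended-A
]

# Invisible code points that glue emoji sequences together.
JOINERS = {0x200D, 0xFE0F}  # zero width joiner, variation selector-16


def is_emoji_code_point(cp: int) -> bool:
    """Return True if cp is a joiner or falls in one of the assumed emoji ranges."""
    if cp in JOINERS:
        return True
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def strip_emoji(text: str) -> str:
    """Drop every code point that is_emoji_code_point() flags."""
    return "".join(ch for ch in text if not is_emoji_code_point(ord(ch)))


if __name__ == "__main__":
    cat = "\U0001F431"
    print(ord(cat))                        # 128049
    print(ord(cat) > 128511)               # False: the single threshold misses it
    print(is_emoji_code_point(ord(cat)))   # True
    print(strip_emoji("Hi " + cat + " there"))  # prints: Hi  there (both spaces remain)
```

One trade-off is that Miscellaneous Symbols and Dingbats also contain characters some text legitimately uses (chess pieces, the male/female signs), so a range list like this can remove more than just emoji. It also misses keycap sequences such as 1️⃣, which are built from an ordinary digit plus U+FE0F and U+20E3.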

Also, some emojis now come in multiple skin tones < 🧜‍♂️🧜🏾‍♂️ >, and that seems to throw a monkey wrench into the works too. When I posted this to the forum, I saw that the second merman was followed by a <brown swatch> and then a <male sign>. However, the second merman I pasted was actually slightly browner than the first (and looked correct when originally pasted), but it does not seem to pass through the posting mechanism correctly: the brown swatch and male sign appear to be incorrectly parsed away from the second merman.

The "browner merman" is reported as 4 characters. You can even see this by deleting backwards over the merman with the Delete key (I'm on a Mac using LC 9.0.2): it changes color as each keypress removes a character, and the field may or may not display those characters. The merman also doesn't necessarily display at the size the field is set to. Selecting the merman and choosing "Use Owner's Size" from the Text menu can break the emoji if the field isn't wide enough to hold all the "hidden characters" on the same line.
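
The stray swatch and male sign make sense once the two mermen are written out as code points. A skin-toned emoji is a sequence: the base character, a skin-tone modifier, a zero width joiner, the male sign, and a variation selector. The Python sketch below lists those code points. It assumes the browner merman uses the medium-dark modifier U+1F3FE, as it appears in the archived text. describe and strip_base_only are hypothetical helper names. The naive filter mimics the original "> 128511" test and shows that removing only the base merman leaves the modifier, joiner, male sign and selector behind.

```python
import unicodedata

# The two mermen from the post, written as explicit code points.
# Assumption: the "browner" one uses the medium-dark skin-tone modifier U+1F3FE.
plain_merman = "\U0001F9DC\u200D\u2642\uFE0F"
brown_merman = "\U0001F9DC\U0001F3FE\u200D\u2642\uFE0F"


def describe(seq: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in the sequence."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in seq]


def strip_base_only(seq: str) -> str:
    """Naive filter: remove only code points above 128511, like the original test."""
    return "".join(ch for ch in seq if ord(ch) <= 128511)


if __name__ == "__main__":
    for line in describe(brown_merman):
        print(line)
    # U+1F9DC MERPERSON
    # U+1F3FE EMOJI MODIFIER FITZPATRICK TYPE-5
    # U+200D ZERO WIDTH JOINER
    # U+2642 MALE SIGN
    # U+FE0F VARIATION SELECTOR-16
    print(len(plain_merman), len(brown_merman))  # 4 5 (code points, not chars)
    leftover = strip_base_only(brown_merman)
    print(describe(leftover))  # the skin tone, joiner, male sign and selector survive
```

Python counts code points here. How LiveCode's char chunk groups the same sequence depends on its own text segmentation, so the "4 characters" reported above need not match either number.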

Hmm… I just looked for a bug report and didn't find one exactly like this. I'm pretty ignorant about how Unicode actually works, but on the assumption that it should "just work" in LiveCode, I guess a bug report is my next stop.

-- cross-posted to forums --

Scott Morrow

Elementary Software
(Now with 20% less chalk dust!)
web       http://elementarysoftware.com/
email     scott at elementarysoftware.com
booth     1-800-615-0867
------------------------------------------------------


> On Jan 13, 2019, at 1:34 PM, Richmond via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> Cop a look at this:
> 
> http://forums.livecode.com/viewtopic.php?f=7&t=32030
> 
> On 13.01.19 22:17, Stephen MacLean via use-livecode wrote:
>> Hi All,
>> 
>> The recent conversations on using offset() with Unicode strings were very enlightening; thanks to all who took part!
>> 
>> I have data stored in UTF8mb4. I use textDecode after loading it from the DB to put it into a format that LC understands. I then use offset() to find certain tags, text, etc. to work with. However, if there are emoji in that string, the offset() function hard crashes with an out-of-range error.
>> 
>> Due to the trouble with offset(), I'm looking for a way to remove the emoji before I have to use the offset function.
>> 
>> Short of compiling a list of emoji and their decimal equivalents, does anyone have a way to do this in LC?
>> 
>> My offset code has been rock solid, except for these rare instances where there are emoji in the text, and I'm not really looking to change it if I don't have to; I'd prefer to just remove the emoji if possible.
>> 
>> TIA,
>> 
>> Steve MacLean
>> 
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 




