How to remove emoji's from unicode string
scott at elementarysoftware.com
Mon Jan 14 01:51:00 CET 2019
Hello Richmond.

I have found that emojis also cause the <open printing to pdf> command to fail silently. Being able to strip emojis would be helpful there as well.
I've been fooling about with your emoji stripping stack.
Using codePointToNum(tEmojiChar) > 128511 doesn't seem to catch all the emoji characters...
For example, this cat head < 🐱 > returns "128049", which is below that threshold.
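
A rough sketch of why the threshold misses the cat: 128049 is U+1F431, which sits in the Miscellaneous Symbols and Pictographs block (U+1F300 to U+1F5FF, i.e. 127744 to 128511), so a "> 128511" test skips that whole block. The snippet is Python rather than LiveCode, purely to illustrate the arithmetic. The range list and the helper name is_emoji_code_point are assumptions made for illustration, not a complete definition of what counts as an emoji.

```python
# Hypothetical, non-exhaustive list of code point ranges that contain emoji.
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs (cat face lives here)
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
]


def is_emoji_code_point(cp):
    """Return True if cp falls inside any of the assumed ranges."""
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


cat = "\U0001F431"  # cat face
print(ord(cat))                       # 128049
print(ord(cat) > 128511)              # False: the single threshold misses it
print(is_emoji_code_point(ord(cat)))  # True: a range test catches it
```

A range test like this would still need tuning against real data, since emoji are scattered across more blocks than the ones listed here.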
Also, some emojis now come in multiple skin colors < 🧜♂️🧜🏾♂️ >, and that seems to throw a monkey wrench into the works, too. When I posted this to the forum, I saw that the second merman is followed by a <brown swatch> and then a <male sign>. The second merman I pasted was actually slightly browner than the first (and looked correct when it was originally pasted), but this does not seem to pass through the posting mechanism correctly. The brown swatch and male sign seem to be incorrectly split off from the second merman.

The “browner merman” is reported as 4 characters. This can even be seen by using the delete key (I’m on a Mac using LC 9.0.2) to delete backwards over the merman: it changes color as each press removes characters which the field may or may not display. The merman doesn’t necessarily display at the size that the field is set to. Selecting the merman and choosing “Use Owner’s Size” from the Text menu can break the emoji if the field isn’t wide enough to contain all the “hidden characters” on the same line.
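
The "hidden characters" can be made visible by listing the code points of the sequence. The sketch below is Python rather than LiveCode, and it assumes the standard Unicode sequence for "merman: medium-dark skin tone" (base figure, skin tone modifier, zero width joiner, male sign, variation selector); what actually reached the field or the forum may differ, which could explain a different count. The names MERMAN, GLUE, SWEEP, code_points and strip_emoji are hypothetical, and the ranges are a deliberately broad guess rather than an official emoji list.

```python
# Assumed code points for "merman: medium-dark skin tone"; the text that
# actually reached the field or the forum may differ.
MERMAN = "\U0001F9DC\U0001F3FE\u200D\u2642\uFE0F"

# Invisible glue characters used inside emoji sequences.
GLUE = {0x200D, 0xFE0F}  # zero width joiner, variation selector-16

# Hypothetical broad sweep, not an official emoji list.
SWEEP = [
    (0x1F000, 0x1FAFF),  # pictographs, incl. skin tone modifiers U+1F3FB..U+1F3FF
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats, incl. male sign U+2642
]


def code_points(text):
    """Return the code points of text as U+XXXX labels."""
    return ["U+%04X" % ord(ch) for ch in text]


def strip_emoji(text):
    """Drop every code point that is glue or falls inside SWEEP."""
    kept = []
    for ch in text:
        cp = ord(ch)
        if cp in GLUE or any(lo <= cp <= hi for lo, hi in SWEEP):
            continue
        kept.append(ch)
    return "".join(kept)


print(code_points(MERMAN))
# ['U+1F9DC', 'U+1F3FE', 'U+200D', 'U+2642', 'U+FE0F']
print(repr(strip_emoji("tag" + MERMAN + "text")))
```

Removing the joiner and variation selector along with the visible parts avoids leaving stray invisible characters behind. One caveat: the zero width joiner is also used legitimately in some scripts, so stripping it everywhere may alter non-emoji text.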
Hmm… I just looked for a bug report and didn’t find one exactly like this. I’m pretty ignorant about how Unicode actually operates, but on the assumption that it should “just work” in LiveCode, I guess a bug report is my next stop.
--cross-posted to forums--
(Now with 20% less chalk dust!)
email scott at elementarysoftware.com
> On Jan 13, 2019, at 1:34 PM, Richmond via use-livecode <use-livecode at lists.runrev.com> wrote:
> Cop a look at this:
> On 13.01.19 22:17, Stephen MacLean via use-livecode wrote:
>> Hi All,
>> The recent conversations on using offset() with Unicode strings were very enlightening; thanks to all who took part!
>> I have data stored in UTF8mb4. I use textDecode after loading it from the DB to put it into a format that LC understands. I then use offset() to find certain tags, text, etc. to work with. However, if there are emoji in that string, the offset() function hard crashes with an out-of-range error.
>> Due to the troubles with offset(), I’m looking for a way to remove the emojis before I have to use the offset() function.
>> Short of compiling a list of emoji and their decimal equivalents, does anyone have a way to do this in LC?
>> My offset code has been rock solid, except for these rare instances where there are emoji in the text, and I am not really looking to change it if I don’t have to, preferring to just remove the emoji if possible.
>> Steve MacLean
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences: