Parsing (scraping) OpenGraph Tags from html HEAD
th.douez at gmail.com
Wed Aug 2 04:06:55 EDT 2017
2017-08-02 6:45 GMT+02:00 Sannyasin Brahmanathaswami:
> though I'm not yet sure whether using regEx is better than using Jacque's
Those are two different ways,
but with the regex one you get the exact key and value of each tag;
there is nothing more to do.
> Either way it would seem prudent to extract the head first before processing
Mmm, I don't really see why, but I've added a line of code for this too.
> Using Jacque's method just gets the list..
and we need to do more coding to get the array we need.
> But your method can only handle 1 tag.
I was aware of that, but I didn't know what you wanted to achieve, so I
left it to the reader.
However, this has nothing to do with the regex, only with the code inside the loop.
Here is another way to do it, changing only *1* line of code inside the loop
with the same regex as before:
-- to please BR's wishes, but not necessary:
-- erase everything from </head> onward
-- (?s) lets . cross line ends; no m flag, so $ is the end of the text
put replaceText( _Html, "(?s)</head>.*$", empty) into _Html
repeat while matchChunk( _Html, Rx, p1,p2,p3,p4 )
   -- append one "key <tab> value" line per tag instead of filling an array
   put char p1 to p2 of _Html & tab & char p3 to p4 of _Html & cr after Rslt
   delete char 1 to p4 of _Html
end repeat
delete last char of Rslt -- extra cr
put Rslt into fld 1
answer "Got " & the number of lines of Rslt & " og: meta tags!"
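For anyone who wants to try the loop on its own, here is a minimal sketch wrapped in a function. Please note that the regex in it is only an assumption for illustration and not the one from my earlier post: it expects property="og:..." followed by content="...", in that order and with double quotes. The names extractOgTags, pHtml, tRx and tRslt are hypothetical.

```livecode
-- Sketch only: tRx is an assumed pattern, not the regex from the earlier post.
-- \x22 is the PCRE escape for a double quote, which avoids building the
-- pattern with the quote constant.
function extractOgTags pHtml
   local tRx, tRslt, p1, p2, p3, p4
   put "(?i)<meta\s+property=\x22(og:[^\x22]+)\x22\s+content=\x22([^\x22]*)\x22" into tRx
   repeat while matchChunk( pHtml, tRx, p1, p2, p3, p4 )
      -- p1..p2 is the key (first group), p3..p4 is the value (second group)
      put char p1 to p2 of pHtml & tab & char p3 to p4 of pHtml & cr after tRslt
      -- drop what has been consumed so the next match starts after it
      delete char 1 to p4 of pHtml
   end repeat
   delete last char of tRslt -- extra cr
   return tRslt
end function
```

Calling extractOgTags( _Html ) should then return one "key, tab, value" line per og: tag. Real pages may put content before property or use single quotes, so the pattern would have to be adapted.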
To build a multi-dimensional array after the extraction,
a bit more work inside the repeat loop is needed,
but the extraction part remains valid.
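As a rough idea of what that extra work could look like, here is a sketch that turns the tab-separated lines into a two-level array, OG[ key ][ n ], so that a key occurring several times keeps all of its values. The function name ogLinesToArray and its variables are hypothetical, and the layout of the array is only one possible choice.

```livecode
-- Sketch: pLines is assumed to hold "key <tab> value" lines, as Rslt does.
function ogLinesToArray pLines
   local tOG, tLine, tKey, tCount
   set the itemDelimiter to tab
   repeat for each line tLine in pLines
      put item 1 of tLine into tKey
      -- number of values already stored under this key, plus one
      put the number of lines in keys( tOG[ tKey ] ) + 1 into tCount
      -- item 2 to -1 keeps the whole value even if it contains a tab
      put item 2 to -1 of tLine into tOG[ tKey ][ tCount ]
   end repeat
   return tOG
end function
```

With this layout, the first value of a key would be read as tOG[ "og:image" ][ 1 ], the second as tOG[ "og:image" ][ 2 ], and so on.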
Finally, if you are not at ease with regex, go with Jacque's way and
everything will be fine.
Fundamentally, there is not much difference between the two ways.
> On 7/31/17, 12:31 AM, "use-livecode on behalf of Thierry Douez" wrote:
> So, here is the code:
> local Rx, Rslt, _Html, OG
> put empty into Rslt
> put URL "https://www.youtube.com/user/kauaiaadheenam" into _Html
> put IT into Rx
> repeat while matchChunk( _Html, Rx,p1,p2,p3,p4 )
> put char p3 to p4 of _Html into OG[ char p1 to p2 of _Html ]
> delete char 1 to p4 of _Html
> end repeat
> and you can test it this way:
> combine OG using return and ":"
> put OG into fld 1
> HTH and feel free to ask any question...
> Kind regards,
Thierry Douez - sunny-tdz.com
sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage