Parsing and Extracting Text from ePub XHTML
m.schonewille at economy-x-talk.com
Sun Jul 26 22:46:12 CEST 2015
This works on LC 6.7.3:
put fld 1 into x
if the platform is not "MacOS" then
   // not sure why this works
   put isoToMac(x) into x
end if
put uniDecode(uniEncode(x,"UTF8")) into x
set the htmlText of fld 2 to x
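On LC 7 and later, a possibly simpler route is to read the file as binary and decode it explicitly with textDecode, instead of round-tripping through isoToMac and uniEncode. This is only a minimal sketch: it assumes the XHTML file is UTF-8 encoded, tPath and the other variable names are placeholders, and I have not run it against the sample quoted below.

```livecode
on mouseUp
   local tPath, tBytes, tText
   -- let the user pick one XHTML file from an unzipped ePub
   answer file "Choose an XHTML file from the unzipped ePub"
   if it is empty then exit mouseUp
   put it into tPath
   -- read the raw bytes so no platform text conversion is applied
   put URL ("binfile:" & tPath) into tBytes
   -- LC 7+: convert the UTF-8 bytes into LiveCode text (assumes UTF-8)
   put textDecode(tBytes, "UTF-8") into tText
   set the htmlText of fld 2 to tText
end mouseUp
```

Tags that the field does not understand will probably need to be stripped or mapped first; that part is not covered here.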
Economy-x-Talk Consulting and Software Engineering
On 7/26/2015 21:31, Brahmanathaswami wrote:
> We do a lot of work with the contents of ePubs. For those who don't know
> the spec:
> "someBook.epub" is just "someBook.zip"
> which when inflated has a mini-portable web site based on responsive CSS
> (all percentages). You get
> /ops # "Open Package Structure"
> / fonts
> / images
> / styles
> / xhtml
> The xhtml folder then has all these files:
> The text is pretty advanced in the sense that it uses Unicode (I
> think) for rendering diacritics, em dashes, etc.
> If I simply import the raw file unprocessed into a LC field (7.0.5)... I
> get the usual, expected text:
> <h3 class="h3s"><samp>Is Monistic Theism Found in the <span
> <h4 class="h4"><samp><span class="smallcap"><samp>ŚLOKA
> <p class="noindent"><samp><span class="cmbold"><samp>Again and again in
> the <em>Vedas </em>and from <em>satgurus </em>we hear ‚ÄúAhaṁ
> Brahmāsmi,‚Äù ‚ÄúI am God,‚Äù and that God is both immanent and
> transcendent. Taken together, these are clear statements of monistic
> theism. Aum Namaḥ Śivāya.</samp></span></samp></p>
> <h4 class="h4"><samp><span
> <p class="noindent"><samp>Monistic theism is the philosophy of the <span
> class="cmitalic"><samp>Vedas</samp></span>. Scholars have long noted
> that the Hindu scriptures are alternately monistic, describing the
> oneness of the individual soul and God, and theistic, describing the
> reality of the Personal God. One cannot read the <span
> class="cmitalic"><samp>Vedas</samp></span>, <span
> class="cmitalic"><samp>Śaiva Āgamas</samp></span> and hymns
> of the saints without being overwhelmed with theism as well as monism.
> Monistic theism is the essential teaching of Hinduism, of Śaivism.
> It is the conclusion of Tirumular, Vasugupta, Gorakshanatha, Bhaskara,
> Srikantha, Basavanna, Vallabha, Ramakrishna, Yogaswami, Nityananda,
> Radhakrishnan and thousands of others. It encompasses both
> Siddhānta and Vedānta. It says, God is and is in all things.
> It propounds the hopeful, glorious, exultant concept that every soul
> will finally merge with Śiva in undifferentiated oneness, none
> left to suffer forever because of human transgression. The <span
> class="cmitalic"><samp>Vedas</samp></span> wisely proclaim, ‚ÄúHigher
> and other than the world-tree, time and forms is He from whom this
> expanse proceeds‚Äîthe bringer of <span
> class="cmitalic"><samp>dharma,</samp></span> the remover of evil, the
> lord of prosperity. Know Him as in one‚Äôs own Self, as the immortal
> abode of all.‚Äù Aum Namaḥ Śivāya.</samp></p>
> Goal is to create a tool for volunteers to go in and extract quotes to
> allow them to grab a few sentences, which we will then push to an online
> So: What is the best way to get this text rendered? Do I go the path of
> setting the field's Unicode? But then what about the HTML markup? If we
> create a browser object... can users select text and does LC know that
> there is a selected chunk if it is inside a browser object?
> Before I start wading into this I thought to see if anyone else has some
> good guidance in advance,
> Swasti Astu, Be Well!
> Kauai's Hindu Monastery
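Since the .epub is just a zip archive, the XHTML files could also be pulled straight out of it with the revZip library, without inflating the book to disk first. The following is a rough sketch under stated assumptions: the handler names epubXhtmlItems and epubItemText are made up for illustration, the "*xhtml/*" filter assumes the folder layout described in the quoted message, and the content is assumed to be UTF-8 (textDecode needs LC 7+).

```livecode
-- Return the archive item names that live in an xhtml folder, one per line
function epubXhtmlItems pEpubPath
   local tItems
   revZipOpenArchive pEpubPath, "read"
   if the result is not empty then return empty
   put revZipEnumerateItems(pEpubPath) into tItems
   revZipCloseArchive pEpubPath
   -- keep only paths that contain the xhtml folder (layout is an assumption)
   filter tItems with "*xhtml/*"
   return tItems
end epubXhtmlItems

-- Return the decoded text of one item inside the archive
function epubItemText pEpubPath, pItemName
   local tBytes
   revZipOpenArchive pEpubPath, "read"
   if the result is not empty then return empty
   -- the third argument is the NAME of the variable that receives the data
   revZipExtractItemToVariable pEpubPath, pItemName, "tBytes"
   revZipCloseArchive pEpubPath
   return textDecode(tBytes, "UTF-8")
end epubItemText
```

Usage would be along the lines of: put epubItemText(tEpubPath, line 1 of epubXhtmlItems(tEpubPath)) into tText, followed by setting the htmlText of a field to tText.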