Parsing XML Nodes w/Same Tag

Sannyasin Brahmanathaswami brahma at hindu.org
Fri Feb 24 14:45:12 EST 2017


My longest-running in-house production app is an audio transcriber: a very successful little gadget that has been running in xTalk since 2001.

We have over 1,000 XML files from an audio archive of transcripts.

Now I'm digging in and getting the data out.

I'm not facile with the XML routines, but I did my best with the help of Bernd's new, actually usable, dictionary.

But I ran into a bug in 9 DP5 (I think…), or I am doing something wrong.

Given transcripts formatted with nodes like this:

<?xml version="1.0" encoding="UTF-8"?>
<audio_transcript>
<header>
  <audio_filename>CAS0886_radio-pilot_Inspired-Talks.A.mp3</audio_filename>
  <date_given>1980-01-03</date_given>
  <given_by>Gurudeva</given_by>
  <subject>Three Words of Existence</subject>
  <category>God and Lords of Dharma</category>
  <duration>18 min, 36 secs</duration>
  <given_location>San Francisco</given_location>
  <transcribed_by>Brahmanathaswami</transcribed_by>
  <description>
          Subtopic: three worlds: 0:3:56
          Subtopic: temple: 0:4:7
          </description>
</header>
          <transcript_text>
                   <p>
                             [Radio Announcer: Ravi Peruman introduces Gurudeva]
                   </p>
                   <p>
                             Gurudeva says ......
                   </p>
                   <p>
                             More content here
                   </p>
                   <p>
                             Subtopic: three worlds: 0:3:56
                   </p>
                   <p>
                             All about temple
                   </p>
                   <p>
                             Subtopic: temple: 0:4:7
                   </p>
          </transcript_text>
</audio_transcript>

My script looks like this:

put revXMLChildContents(pTree, "/audio_transcript/header",tab,return,false,4) into fld "productionNotes"  # this works… I get all the contents
put revXMLNodeContents(pTree,"/audio_transcript/transcript_text/p") into tText # this works but we only get the first <p> content

# so I presume (like I said… parsing xml is new to me) we need to loop/iterate over the sibling <p> tags..
put revXMLNumberOfChildren(pTree,"/audio_transcript/transcript_text/","p",4) # returns 6

# the following line should provide us what we need, I think, to set up a repeat loop  using the indexed node function
# and this is a) according to the dictionary b) and the script will compile:

put revXMLChildNames(pTree,"/audio_transcript/transcript_text/", return,"p",true)

I get a "green" OK in the script editor, and when I run it we get this output, which is expected:

p[1]
p[2]
p[3]
p[4]
p[5]
p[6]

Presumably I can now use that list to fetch the contents of all those nodes (I haven't figured that out yet).
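
One way the indexed names might be used, as an untested sketch only: the node paths accept the bracketed index, so each "p[n]" can be appended to the parent path and passed to revXMLNodeContents. The function name transcriptParagraphs and its local variables are hypothetical, not part of the original script; the count call is the same one shown above that returns 6.

```livecode
-- Hypothetical helper (not from the original script): returns the text of
-- every <p> child of <transcript_text>, one paragraph per line.
function transcriptParagraphs pTree
   local tBase, tCount, tPara, tText, i
   put "/audio_transcript/transcript_text/" into tBase
   -- same call as in the script above; it reports the number of <p> nodes
   put revXMLNumberOfChildren(pTree, tBase, "p", 4) into tCount
   repeat with i = 1 to tCount
      -- builds .../transcript_text/p[1], .../transcript_text/p[2], and so on
      put revXMLNodeContents(pTree, tBase & "p[" & i & "]") into tPara
      -- the revXML functions report failures as text starting with "xmlerr"
      if tPara begins with "xmlerr" then next repeat
      -- strip the leading and trailing whitespace left by the indented XML
      put word 1 to -1 of tPara & return after tText
   end repeat
   delete char -1 of tText -- drop the trailing return
   return tText
end transcriptParagraphs
```

With that in place, something like "put transcriptParagraphs(pTree) into tText" would stand in for the single revXMLNodeContents call that only returns the first <p>.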

But the engine fires an error message (even though the script compiled without complaining) when we run it:

button "Load Transcript": execution error at line 22 (Handler: can't find handler) near "", char 89

It breaks at the end of this line:

put revXMLChildNames(pTree,"/audio_transcript/transcript_text/", return,"p",true)


Even though the script compiles… isn't this a bug? If it a) is what the dictionary says it should be and b) compiles, why the error?

If not, what am I doing wrong?

The full button script is below, and you can see my "fumbling" to fetch the content of all the "p" nodes. There seems to be some oddity relating to multiple nodes all having the same tag.

global theTape
on mouseUp
   put theTape into tTranscript
   set the itemdel to "."
   put "xml" into item -1 of tTranscript
   if there is a file tTranscript then
      put url ("file:/" & tTranscript) into tTranscriptXML
   else
      answer "Sorry, there is no transcript in the same folder as the audio" with "OK"
      exit to top
   end if
   put revXMLCreateTree(tTranscriptXML,false, true,true) into pTree
   if pTree is not an integer then
      answer "Problem with the XML. Open in a text editor" with "OK"
   end if
   put revXMLChildContents(pTree, "/audio_transcript/header",tab,return,false,4) into fld "productionNotes"
   put revXMLNodeContents(pTree,"/audio_transcript/transcript_text/p") into tText
   put revXMLNumberOfChildren(pTree,"/audio_transcript/transcript_text/","p",4)
   put revXMLChildNames(pTree,"/audio_transcript/transcript_text/", return,"p",true)

   # this script compiles, but breaks on the above line when run
   --put revXMLNextSibling(pTree,"/audio_transcript/transcript_text/p") into nextSibling
   --put revXMLNodeContents(pTree,nextSibling) after tText # feeble attempt fails, need to do some loop but don't know how.
   # no robust examples to follow, any help appreciated!
   --put revXMLNodeContents(pTree, "audio_transcript/header/duration") into tTranscriptHTML # works for single node (of course)
   --set the htmltext of fld "transcript" of stack "Audio_transcriber" to tTranscriptHTML

end mouseUp
