Transforming XML Data into JSON Documents Using jQuery

Recently, someone very close to me asked me to develop an app that would be useful in the religious scene. I am not exactly the religious type, but I figured it could be fun to use technology to do something boring in a new and exciting way. I immediately thought about doing something with the Bible. I remember trying to read the Bible years and years ago. It all seemed so monotonous: so many pages, such cramped text. Finding the next verse during the priest's sermon seemed to take quite a few seconds. It was, basically, a boring process.

So why not create something that enables high-performance, full-text, instant search of the Bible, with audio input in browsers that support it (such as Chrome)? Something that can find exact phrase matches in the Bible in a few milliseconds. Something that can answer statistical questions about the Bible: how many times is the word ‘Lord’ used? What about ‘Jesus’? What about any other word? In a few milliseconds! All of this, from the browser! Why not go one step further and turn this service into a programming interface (API) for the Bible? An API upon which other apps could be built: apps for the browser, mobile phones, the iPad, whatever! An API built on the RESTful principles of the web, accessible anywhere over HTTP.

My first idea was to split a digital version of the Bible into its smallest meaningful unit: verses. But first I had to find a digital version of the Bible, so I set about searching the web. Luckily, I came across an XML data dump of the Bible at …, downloaded the data, and decided to use the King James Version (KJV). When I opened KJV.xml, I was happy with the XML structure used to describe the data. Below is a snippet of KJV.xml.

<bible translation="KJV">
	<testament name="Old">
		<book name="Genesis">
			<chapter number="1">
				<verse number="1">In the beginning God ....</verse>
				<verse number="2">And the earth was without .... </verse>
				<verse number="3">And God said, Let there .....</verse>
				<!-- ... -->
			</chapter>
			<!-- ... -->
		</book>
		<!-- ... -->
	</testament>
	<testament name="New">
		<book name="Matthew">
			<chapter number="1">
				<verse number="1"> .... </verse>
				<!-- ... -->
			</chapter>
			<!-- ... -->
		</book>
		<!-- ... -->
	</testament>
</bible>

Now I just had to parse this file and extract each verse into its own JSON document, like the one shown below:

{
   "translation": "KJV",
   "testament": "Old",
   "book": "Genesis",
   "chapter": 1,
   "versenumber": 1,
   "versetext": "In the beginning God created the heaven and the earth."
}

I did not feel like wiring up a StAX XML parser in Java, so I decided to do it all in jQuery/JavaScript. jQuery is very good at traversing XML. Web developers usually use jQuery to manipulate HTML, but its traversal methods (find(), each(), attr(), text() and friends) work just as well on an XML document, so walking an arbitrary XML tree is where it shines. Furthermore, we are trying to create new JSON documents from the XML we read in, and JSON stands for JavaScript Object Notation. So we can read our XML using JavaScript (in the form of jQuery) and build our documents as plain old JavaScript objects too! Dream combo.
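To make the "dream combo" concrete: a plain JavaScript object becomes a JSON document with the built-in JSON.stringify, and JSON.parse turns it back. Here is a minimal sketch using a hard-coded verse rather than one read from KJV.xml:

```javascript
// A single verse built as a plain JavaScript object, using the same field
// names as the JSON example above. Values are hard-coded for illustration.
const doc = {
  translation: "KJV",
  testament: "Old",
  book: "Genesis",
  chapter: parseInt("1", 10),     // XML attribute values arrive as strings
  versenumber: parseInt("1", 10),
  versetext: "In the beginning God created the heaven and the earth."
};

// JSON.stringify serializes the object to a JSON document; the third
// argument indents the output by three spaces per level.
const json = JSON.stringify(doc, null, 3);
console.log(json);

// JSON.parse turns the text back into an object; numbers stay numbers.
const roundTrip = JSON.parse(json);
console.log(roundTrip.chapter === 1); // true
```

JSON.stringify writes string keys in the order they were added, so the output follows the same field order as the document layout above.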

Below is the jQuery code I wrote to do the parsing.

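$.ajax({
	type: "GET",
	url: "KJV.xml",
	dataType: "xml",
	success: function(xml) {

		var doc = null;

		// The root <bible> element carries the translation name.
		$(xml).find('bible').each(function() {
			var translation = $(this).attr('translation');

			$(this).find('testament').each(function() {
				var testament = $(this).attr('name');

				$(this).find('book').each(function() {
					var book = $(this).attr('name');

					$(this).find('chapter').each(function() {
						// Attribute values are strings; converted to numbers below.
						var chapternumber = $(this).attr('number');

						$(this).find('verse').each(function() {

							var versenumber = $(this).attr('number');
							var versetext = $(this).text();

							// One JSON-ready object per verse, combining the values
							// read at each level of the traversal.
							doc = {};
							doc["translation"] = translation;
							doc["testament"] = testament;
							doc["book"] = book;
							doc["chapter"] = parseInt(chapternumber, 10);
							doc["versenumber"] = parseInt(versenumber, 10);
							doc["versetext"] = versetext;

							// doc now holds one verse; store or process it here.
						});
					});
				});
			});
		});
	}
});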
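The jQuery code needs a browser, and it overwrites doc on every verse. As a sketch of the same traversal that runs outside the browser, here is a plain JavaScript version that walks a nested object and collects every verse document into an array. The nested bible object and the flattenVerses function are hypothetical stand-ins for the parsed XML, not part of the original code:

```javascript
// Hypothetical nested object standing in for the parsed KJV.xml tree.
// Attribute values are kept as strings, just as attr() returns them.
const bible = {
  translation: "KJV",
  testaments: [
    {
      name: "Old",
      books: [{
        name: "Genesis",
        chapters: [{
          number: "1",
          verses: [
            { number: "1", text: "In the beginning God created the heaven and the earth." },
            { number: "3", text: "And God said, Let there be light: and there was light." }
          ]
        }]
      }]
    },
    {
      name: "New",
      books: [{
        name: "Matthew",
        chapters: [{
          number: "1",
          verses: [
            { number: "1", text: "The book of the generation of Jesus Christ, the son of David, the son of Abraham." }
          ]
        }]
      }]
    }
  ]
};

// Walk testament -> book -> chapter -> verse, carrying the values read at
// each outer level down to the innermost loop, and collect one flat
// document per verse instead of discarding it.
function flattenVerses(tree) {
  const docs = [];
  for (const testament of tree.testaments) {
    for (const book of testament.books) {
      for (const chapter of book.chapters) {
        for (const verse of chapter.verses) {
          docs.push({
            translation: tree.translation,
            testament: testament.name,
            book: book.name,
            chapter: parseInt(chapter.number, 10),
            versenumber: parseInt(verse.number, 10),
            versetext: verse.text
          });
        }
      }
    }
  }
  return docs;
}

const docs = flattenVerses(bible);
console.log(docs.length);  // 3
console.log(docs[2].book); // Matthew
```

Collecting into an array also makes the total easy to check: run over the whole of KJV.xml, docs.length gives the verse count directly.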



So there it is. In the innermost loop of the jQuery code above, a new doc object is created and its properties are set from values read at different levels of the XML traversal: the translation from the bible element, the testament and book names, the chapter number, and finally the verse number and text. Now that we have the doc objects, we can do something interesting with them; I will talk about that in upcoming blog posts. Oh, by the way, I think there are 31,102 verses in the Bible 😉
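As a taste of the statistical questions from the start of the post (how many times is a given word used?), here is a minimal sketch that counts word occurrences over an array of verse documents. The countWord function and the two-verse sample are hypothetical; a real count would run over all the documents produced from KJV.xml:

```javascript
// Hypothetical sample of verse documents in the shape shown earlier.
const verses = [
  { translation: "KJV", testament: "Old", book: "Genesis", chapter: 1, versenumber: 1,
    versetext: "In the beginning God created the heaven and the earth." },
  { translation: "KJV", testament: "Old", book: "Genesis", chapter: 1, versenumber: 3,
    versetext: "And God said, Let there be light: and there was light." }
];

// Count whole-word, case-insensitive occurrences of `word` across all verses.
// Words are runs of ASCII letters, so punctuation is ignored and "Lord's"
// yields the tokens "Lord" and "s".
function countWord(docs, word) {
  const target = word.toLowerCase();
  let count = 0;
  for (const doc of docs) {
    const tokens = doc.versetext.match(/[A-Za-z]+/g) || [];
    for (const token of tokens) {
      if (token.toLowerCase() === target) {
        count += 1;
      }
    }
  }
  return count;
}

console.log(countWord(verses, "God"));   // 2
console.log(countWord(verses, "light")); // 2
console.log(countWord(verses, "the"));   // 3 ("there" is not counted)
```

Lowercasing means that if the text distinguishes ‘Lord’ from ‘LORD’, both are counted together; compare token === word instead for a case-sensitive count. A linear scan like this is fine for a one-off statistic, but the millisecond answers described at the start would call for precomputing the counts or building an index.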

