Lab Assignment 2

Bible Word Search System

PROJECT OVERVIEW

The goal of this project is the development of a “Google-like” verse search for the Bible. The final goal will be to type in a series of words, and have the system return those verses which contain all of those words. Many words have multiple forms, e.g. “build, builder, builds, building”, and we do not want the matches to fail due to a different form. A “stemmer” program will be used, which takes English words, and attempt to find the “stem. For example, “build builder builds building” will all be converted to “build”.

One aspects of this project is the creation of a Bible word “inverted list”. An inverted list is simply a data structure which creates, for each word in the Bible, a list of every reference which contains that word. The problem is that creation of this structure requires the processing of all 880K words in the Bible. This takes 15-20 seconds on Hoth. We would like to display, and narrow, matches as fast as the user type the search terms.

A solution is to preload the inverted list into a map structure, within a separately running program. This CGI will then “query” the already running program, requesting a list of references for a given word.

A demo version of the system used uses AJAX, and creates and uses an inverted list of the Bible is on HOTH at /home/class/csc3004/P2P1-COPYME]

A demo of a client/server system is at: /home/class/csc3004/namesclientserver

LEARNING GOALS

This project has the following learning goals:

  1. Creation and use of inverted lists

  2. Use of STL list and map classes

  3. Use of STL transform algorithm

  4. Merge algorithms.

  5. Interprocess communications

  6. Client Server architecture

Project Milestones

As explained above, the goal of this code is to create a user-friendly application that permits the fast lookup of verses with a matching set of words. The goal is a high speed, active search the displays matching verses, on the fly, as the words are typed. This project is has several distinct aspects, each of which will solved in a stepwise manner.

  1. Display verses matching a groups of words using a submit button

  2. Designing a Client/Server solution

  3. Implementing a Client Server solution

  4. Critiquing and proposing solutions for problems

  5. Implementing final solution

Milestone 1

Display verses matching a groups of words using a submit button

In this milestone the goal is to gather together a list of words typed in by the user, and lookup the list of references for each. You are then to INTERSECT the resulting lists, and display nicely the list of matching verses.

The INTERSECT operation can be minimized by merging the REF lists for each word. The STL set template can be used to eliminate duplicates. before merging.

Merging can be done between pairs of reference lists by processing each list in order, stepping through each, always reading from the list with the minimal (eariler in the Bible) first. You will need to complete the operator< on the reference clas to do this. While you step through the list, create a new list of matching references.

You can merge the lists from two words, and then merge this list with the list for the next word, until you have merged the lists for all the unique words entered.

The resulting list of REFs can then be looked up and displayed using the Bible object.

Milestone 2

Designing a Client/Server solution

The system resulting from milestone 1 is slow, and not live or dynamic. The problem is that the inverted lists are created AFTER the words are entered, a time consuming operation.

The proposed solution is to create a client/server solution. A "server" process will run, preloading the Bible words inverted list. The CGI program called from the web page will recieve the words as they are typed in, and pass them to the server program to process the lists and find the matching references.

This solution needs to be carefully designed before implementing. Several outstanding questions must be answered. Where will the Bible verses be looked up (server or client)? Where will the merging be done? What object classes need to be created/changed? What, exactly will be passed between the client and the server. A communication "protocol" needs to be developed.

This part does NOT require coding, only CAREFUL creation of and architecture and design. This design will include:

  1. A diagram and description of overall architecture of the system. This shows the parts of the solutions, what functionality exists in each, and lines of communication.
  2. A complete UML diagram of the objects used in the system.
  3. A complete, unambiguous design of a communication protocol between the client and the server. This includes the message formats, and all command/responce possibilities.
  4. A complete explaination of the planned user interface.
  5. Justification for your design decisions.
I would like you to use VISIO for your design. You can use the UML modeling capability to model your object class structures. As an example of a simple design, see a design for the sample Names Statistics System here: ClientServerDemo. This includes 1 and 3 from above. 2 will be forthcoming.

Milestone 3

Implementing a Client Server solution

Here you will implement and test the results of Milestone 3.

Milestone 4

Critiquing and proposing solutions for problems, User interface design,

Several potentual problems will likely exist with your solution. Here we will explore possible problems, and demonstrait their existance. You will then propose a solution for each.

Another aspect of milestone for is the user interface. Your goal is to condier the characteristics of a good user interface, and design you user interface to be as good, or usable, as possible.

Milestone 5

Implementing final solution

Here you will implement the final solutions as agreed upon with the class instructor. This will include a solution for dealing with multiple concurrent web accesses (e.g. fix the pip race condition). It will also be a complete implementation of the well designed user interface.

User Interface

As before, a major goal of this assignment is to create a friendly and robust user interface. The interface must be intuitive and easy to use.

GRADING

• 50% - How well the program works with respect to the given specifications. (NO credit will be given for a program that doesn't run.)

• 20% - User friendliness. Should meet the criteria covered in class. Invalid user entries should not crash program. All input should be range checked. Error messages should be meaningful and helpful.

• 20% - Internal program structure. The program should be well organized into procedures. Remember "maximize cohesion, minimize coupling".

• 10% - Comments - The code must be clearly commented. Each procedure should be preceded by a detailed description. This should include a description of the function and any assumptions the calling routine must make. It should describe the function, purpose, and range of values for each parameter passed, as well as whether the value if passed, returned, or both.

• Up to 10% Extra Credit You may receive extra credit f or extra features. This includes multiple Bible versions integration, expanding search range to a given verse range to be considered a match (or a whole chapter as a range), improving the stemmer for the language used in the KJV of the Bible. Etc as proposed and approved.

Milestone Due dates

Milestone

Completion date

Points

Display matching verses using a submit button

March 3

100

Design Client/Server solution

March 8

100

Implemented Client Server solution

March 14

100

Critiquing and proposing solutions for problems

March 18

100

Implementing final solution

March 22

100

-- JimSkon - 2011-02-22

Topic revision: r4 - 2011-03-31 - JimSkon
 
This site is powered by the TWiki collaboration platformCopyright &© by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback