Revision History

Revision | Name | Reason for Changes | Date
1.0 | Grant Gipson | Initial revision | 10/20/10
2.0 | Grant Gipson | Extensive specifications and changes in design made | 11/14/10
2.1 | Grant Gipson | Elaborated on database schema; user table in particular | 02/07/11
2.2 | Grant Gipson | Added Python scripts to design | 02/10/11
2.3 | Guess who... | Elaborated on Master Program and associated pipe messages | 02/13/11
2.3.1 | | Further details; added pseudocode | 02/15/11
2.3.2 | | Elaborated on Master Program | 03/05/11
2.3.2   Elaborated on Master Program 03/05/11

1. Introduction

1.1 Purpose

The system is to provide a means by which a local group of users may download and share content from the web. Users will be able to view content they have downloaded without being connected to the network, and because content can be shared, the same content does not have to be downloaded more than once. The goal is to reduce network traffic and to let users view their downloaded content while working offline.

1.2 System Overview

The system will consist of a Mozilla Firefox plug-in on the client side and a series of components on the server side. The user will authenticate with the server through the plug-in and then use it to select content to be downloaded. Content is downloaded on the server and placed within the user's file space. The user will access the server via a web interface, through which he will manage, share, and archive content.

2. Design Considerations

2.1 Assumptions and Dependencies

AS-1: Users will be able to access the server using Mozilla Firefox and install the necessary plug-in.

AS-2: Users will be able to open archived web pages.

DE-1: The server must be able to deploy a crawler to navigate and download requested web pages.

2.2 Constraints

No significant constraints have yet been identified.

2.3 Operating Environment

OE-1: My CAP Project shall operate only with Mozilla Firefox due to the need for the client plug-in. I do not yet know which versions of Firefox will be supported, but I will do my best to support earlier versions.

OE-2: My CAP Project will operate on a server running Ubuntu Server.

OE-3: My CAP Project shall operate on the MVNU campus network. It is important that I keep the amount of traffic to and from my server to a minimum, or I risk disrupting the campus network.

2.4 Design Methodology

The design process is to be iterative. I have created a general specification of the system, but many of the final features and components will be decided during design and development. Once an initial prototype has been released, its design will continue to be refined over subsequent versions until an acceptable final system has been produced.

The design is also to be highly modular. Each component of the system must be designed so that its internals can be replaced without changing its interface with the rest of the system. This will make future changes easier and the system's design easier to understand.

2.5 Risks and Volatile Areas

RI-1: The project cannot be completed in time. Not likely.

RI-2: Any single component of the system grows in complexity and cannot be finished. Distinct possibility.

VA-1: The web interface is a volatile area because I have little experience with web development and many features will be delivered through this component. I have concluded that most of the features are feasible, given enough time.

VA-2: The content archiver is a volatile area because I will need to adapt the source code of the Mozilla Archive Format extension for my own purposes. I hope this will not turn into a very complex task.

VA-3: Security is a volatile area because it is a high priority but is also loosely defined.

3. Architecture

3.1 Overview

The server will act as a hub for all user activity. The user will view and manage content via the web interface and issue commands to download content via the client plug-in. The master program will coordinate the operation of the entire server. It will keep the database and file space synchronized and act as a gatekeeper between these and the other components in the system. The downloader is the only component with direct access to the Internet, because it is the only component that has any reason to communicate outside the server. The web server is an obvious exception, but for simplicity's sake the architecture diagram (CAP_system.png) shows only its relationship with the user.

4. Database Schema

The database will represent the one-to-many relationship between users and content. Each content item and folder has its own unique ID and is associated with only one user. If two users have downloaded the same content, the two copies are still considered different content items. In addition to the tables that represent these relationships, there will be tables for keeping various records of activity.

Database Schema

TODO

Tables and Fields

Table: user

Each user is given a unique, sequential ID number, assigned using MySQL's AUTO_INCREMENT attribute. This table holds any information about the user that is not directly related to content items or folders. In addition to personal information, usage statistics could be stored here.

Field Name | Data Type | Allow Nulls | Field Description
user_id | INT | No | Unique ID of user; PRIMARY KEY
last_name | VARCHAR(30) | No | Last name of user
first_name | VARCHAR(30) | No | First name of user
username | VARCHAR(30) | No | System username for user (any sequence of alphanumeric characters)
password | VARCHAR(20) | No | User password (no password criteria as of yet)
email | VARCHAR(50) | No | Email address of user
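The column limits above can be checked before a row is handed to MySQL. The following is a minimal sketch in Python, assuming the check happens in a CGI script or the master program before the INSERT; the function name validate_user is hypothetical, and treating empty strings as missing and "alphanumeric" as ASCII letters and digits are assumptions the table does not state.

```python
import re

# Column limits copied from the user table. The validator itself is a
# hypothetical application-side check; MySQL still enforces its own limits.
VARCHAR_LIMITS = {
    "last_name": 30,
    "first_name": 30,
    "username": 30,
    "password": 20,
    "email": 50,
}

# Assumption: "alphanumeric" means ASCII letters and digits only.
USERNAME_RE = re.compile(r"[A-Za-z0-9]+")


def validate_user(row: dict) -> list:
    """Return a list of problems with a proposed user row (empty if valid).

    user_id is not checked because MySQL assigns it via AUTO_INCREMENT.
    """
    problems = []
    for field, limit in VARCHAR_LIMITS.items():
        value = row.get(field)
        # No column allows NULL; this sketch also rejects empty strings.
        if value is None or value == "":
            problems.append(f"{field} is required")
        elif len(value) > limit:
            problems.append(f"{field} exceeds VARCHAR({limit})")
    username = row.get("username")
    if username and not USERNAME_RE.fullmatch(username):
        problems.append("username must be alphanumeric")
    return problems
```

Returning a list of problems rather than raising on the first one lets the web interface report every bad field at once.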

5. Detailed System Design

5.1 Common Design

Wrappers

The archiver and downloader will not be stand-alone components; they will be accessed through wrapper programs. The purpose of these wrappers is to reduce coupling: if an individual component is modified in the future, its interface will not change and no major recoding will be needed in other components.

Communication via Pipes

When the system is started, pipes will be created between each component and the master program. There will be two pipes for each component: one for messages from the component to the master program, and another for messages from the master program to the component. The messages sent through the pipes depend on the components involved; see the individual component sections for how each one handles its messages.

Log Files

capmaster.log
Anything and everything from the Master Program is logged here. The format is "[MM/DD/YYYY HH:MM] -<entry priority>- <message>". The entry priority is a single character indicating the type of entry: FATAL ERROR, ERROR, WARNING, INFORMATION, or UNKNOWN.

capweb.log
Anything and everything from the Python CGI scripts is logged here, in the same format as above.
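A minimal Python sketch of building one log line in the format above. The single-character codes (F, E, W, I, U) are hypothetical: the design says the priority is one character but has not yet assigned the characters.

```python
from datetime import datetime
from typing import Optional

# Hypothetical one-character codes; the design has not yet assigned them.
PRIORITY_CODES = {
    "FATAL ERROR": "F",
    "ERROR": "E",
    "WARNING": "W",
    "INFORMATION": "I",
    "UNKNOWN": "U",
}


def format_log_entry(priority: str, message: str, when: Optional[datetime] = None) -> str:
    """Build one line in the "[MM/DD/YYYY HH:MM] -<entry priority>- <message>" format."""
    when = when or datetime.now()
    # Unrecognized priorities fall back to the UNKNOWN code.
    code = PRIORITY_CODES.get(priority, PRIORITY_CODES["UNKNOWN"])
    return f"[{when.strftime('%m/%d/%Y %H:%M')}] -{code}- {message}"
```

For example, format_log_entry("WARNING", "pipe closed", datetime(2011, 2, 13, 2, 58)) yields "[02/13/2011 02:58] -W- pipe closed". Because capmaster.log and capweb.log share the format, the same helper could serve both the Master Program and the CGI scripts.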

Pipes

man2master.fifo, master2man.fifo
Used for communications between CAPManage and Master Program.

web2master.fifo, master2web.fifo
Used for communications between web server and Master Program.
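The pipes listed above are named pipes (FIFOs) in /var/run/cap/. A minimal Python sketch of creating them at startup follows. The 0o660 permission mode is an assumption, chosen on the guess that the web server's user shares a group with the Master Program; the archiver and downloader pipes are omitted because their names are not yet specified.

```python
import os
import stat

# Pipe names from the list above: one pipe in each direction per component.
PIPE_NAMES = [
    "man2master.fifo", "master2man.fifo",
    "web2master.fifo", "master2web.fifo",
]


def create_pipes(pipe_dir: str = "/var/run/cap") -> list:
    """Create any missing named pipes in pipe_dir and return their paths."""
    os.makedirs(pipe_dir, exist_ok=True)
    paths = []
    for name in PIPE_NAMES:
        path = os.path.join(pipe_dir, name)
        if not os.path.exists(path):
            # Assumed mode; the process umask may clear some of these bits.
            os.mkfifo(path, 0o660)
        elif not stat.S_ISFIFO(os.stat(path).st_mode):
            raise RuntimeError(f"{path} exists but is not a FIFO")
        paths.append(path)
    return paths
```

Reusing existing FIFOs, rather than failing, lets the Master Program restart without an administrator cleaning up /var/run/cap/ first.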

Component Locations

Location of system components in Linux server file system

Component | Path
Configuration file | /var/cap/capconf.xml
Master Program | /usr/bin/capmaster
CAPManage | /usr/bin/CAPManage
Archiver | /usr/bin/caparchive
Downloader | /usr/bin/capdownload
File space | /var/cap/
Logs | /var/log/
Pipes | /var/run/cap/
Client plug-in install | /var/www/
Master Program PID | /var/run/cap.pid

5.2 Master Program

Description

Acts as the coordinator of the entire system: any action that changes data must go through the master program. The program will be a single process that runs a perpetual loop, reading various pipes and responding to the data in each. The main loop will maintain a constant connection to the database so that changes can be made quickly and reflected in the web interface. Only one instance of the program will run at a time.

The Master Program will have the location of the configuration file passed to it when it starts. The configuration file will specify the locations and names of the other components the program must manage. If the Master Program is unable to read the configuration file or open a log file, it will terminate: a program that does not know where its components are, or cannot report problems, should not continue operating. Pseudocode for the Master Program is attached as Master_Program_Pseudocode.txt.

Command line: capmaster pid capconf.xml

The caller must specify the location of the master program's process ID (PID) file which is used to ensure that only one instance of the process is running. The second argument is the location of the XML configuration file for the system.
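One common way to use a PID file to enforce a single instance is an exclusive create, with a liveness check if the file already exists. The Python sketch below shows the argument order and that check. It is written as a guess at the startup logic rather than a transcription of the attached pseudocode, and the function names are hypothetical.

```python
import os
import sys


def pid_is_running(pid: int) -> bool:
    """Signal 0 checks whether a process exists without affecting it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_pid_file(path: str) -> bool:
    """Write this process's PID to path; return False if a live instance holds it."""
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            try:
                with open(path) as f:
                    old_pid = int(f.read().strip() or 0)
            except (OSError, ValueError):
                old_pid = 0
            if old_pid > 0 and pid_is_running(old_pid):
                return False
            # Stale or unreadable PID file: remove it and try again.
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
        return True


def main(argv: list) -> int:
    # Expected invocation: capmaster pid capconf.xml
    if len(argv) != 3:
        print("usage: capmaster pid capconf.xml", file=sys.stderr)
        return 2
    pid_path, conf_path = argv[1], argv[2]
    if not acquire_pid_file(pid_path):
        print("capmaster is already running", file=sys.stderr)
        return 1
    # Reading conf_path, opening the logs, and the main loop are not shown.
    return 0
```

Checking whether the recorded PID is still alive means that a crash, which leaves the file behind, does not block the next start.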

Messages

MSG_MASTERHERE [out]
Sent to the web server when the Master Program starts.

MSG_QUIT [in|out]
When received, the message is forwarded to the web server, archiver, and downloader. The program then continues reading from its pipes until they are empty, at which point it terminates.

MSG_UNKNOWN [in|out]
When received, an entry is made in the error log. Sent in response to an unrecognized message in a pipe.
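A minimal Python sketch of how the Master Program might dispatch the three messages above. It returns the replies to send instead of writing to pipes, so the logic can be read and tested on its own. The destination labels ("web", "archiver", "downloader") and the in-memory log are hypothetical stand-ins for the real pipes and capmaster.log, and component messages such as STATUS are left out.

```python
# Components that receive a forwarded MSG_QUIT (section 5.2).
FORWARD_QUIT_TO = ("web", "archiver", "downloader")


class MasterState:
    def __init__(self):
        self.draining = False  # set once MSG_QUIT arrives
        self.log = []          # stands in for capmaster.log


def handle_message(state: MasterState, source: str, header: str, body: str = "") -> list:
    """Return a list of (destination, header) replies for one incoming message."""
    if header == "MSG_QUIT":
        # Forward to the other components, then keep reading until the pipes are empty.
        state.draining = True
        return [(dest, "MSG_QUIT") for dest in FORWARD_QUIT_TO]
    if header == "MSG_UNKNOWN":
        # Another component did not recognize something we sent: log it.
        state.log.append(f"{source} did not recognize a message: {body}")
        return []
    # Anything else is unrecognized: log it and tell the sender.
    state.log.append(f"unrecognized message {header!r} from {source}")
    return [(source, "MSG_UNKNOWN")]
```

The draining flag tells the main loop to stop waiting for new input and to exit once every pipe reads empty.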

CAPManage

Description

A secondary program which takes command line arguments and passes them to the master program. This is the means by which the system administrator will manage the master program.

Commands

Command | Option | Description
start | | Starts the Master Program
stop | | Sends MSG_QUIT to terminate the Master Program

Messages

MSG_QUIT [out]
Sent to Master Program in response to stop command.

MSG_UNKNOWN [in|out]
When received, notify the user of the message that the Master Program did not recognize. Sent when the Master Program sends an unrecognizable message.
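A minimal Python sketch of CAPManage's two commands, using the paths from the Component Locations table and the line-based pipe format in section 5.11. The function names are hypothetical, and so is the choice to start capmaster with subprocess.Popen: the document does not say how CAPManage launches the Master Program.

```python
import subprocess

# Paths from the Component Locations table.
CAPMASTER = "/usr/bin/capmaster"
MAN2MASTER = "/var/run/cap/man2master.fifo"


def start_argv(pid_path: str, conf_path: str) -> list:
    """Command line for the start command: capmaster pid capconf.xml."""
    return [CAPMASTER, pid_path, conf_path]


def quit_message() -> bytes:
    """MSG_QUIT framed as header line, body-length line, and an empty body."""
    return b"MSG_QUIT\n0\n"


def capmanage(command: str, pid_path: str = "/var/run/cap.pid",
              conf_path: str = "/var/cap/capconf.xml", fifo: str = MAN2MASTER) -> None:
    if command == "start":
        subprocess.Popen(start_argv(pid_path, conf_path))
    elif command == "stop":
        # Opening a FIFO for writing blocks until the Master Program opens it for reading.
        with open(fifo, "wb") as pipe:
            pipe.write(quit_message())
    else:
        raise ValueError(f"unknown command: {command}")
```

Because opening the FIFO blocks when nothing is reading it, a real stop command would probably also need a timeout for the case where the Master Program is not running.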

5.3 Web server

Description

The web server will run on the latest version of the Apache HTTP Server. It is the only component permitted to bypass the master program and access the database directly, and it may perform only read-only operations. Forcing these queries through the master program would be slow and mostly unnecessary, because read access to database records can be restricted without such regulation. However, any commands to be executed must be sent to the master program for processing; the web server cannot add or update database records on its own.

Python

The web server will send requests to Python CGI scripts for processing. Python will act as the intermediary between the web server on one side and the database and master program on the other. There will also be a Python script responsible for processing HTTP messages to and from the client plug-in.

Messages

MSG_MASTERHERE [in]
Master Program has restarted; clear the unavailability flag and allow requests to be handled.

MSG_QUIT [in]
Set a flag indicating that the Master Program is unavailable; any HTTP requests that require a change to the system will be denied. As long as the database remains available, users can still view information.

MSG_UNKNOWN [in|out]
When received, write an entry in the error log and display an error message to the user. Sent in response to an unrecognized message in the pipe.
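A minimal Python sketch of the web server's side of these messages: one flag that MSG_QUIT sets and MSG_MASTERHERE clears, plus a check that only blocks requests that change the system. The class and method names are hypothetical. Each CGI request runs in a fresh process, so a real deployment would have to store this flag somewhere persistent, such as a file, rather than in memory.

```python
class MasterLink:
    """Tracks whether the Master Program is available to the CGI scripts."""

    def __init__(self):
        self.master_unavailable = False
        self.errors = []  # stands in for capweb.log and the user-facing error message

    def on_pipe_message(self, header: str) -> list:
        """Handle one message from master2web.fifo; return headers to send back."""
        if header == "MSG_MASTERHERE":
            self.master_unavailable = False
            return []
        if header == "MSG_QUIT":
            self.master_unavailable = True
            return []
        if header == "MSG_UNKNOWN":
            self.errors.append("Master Program did not recognize a message")
            return []
        self.errors.append(f"unrecognized message {header!r}")
        return ["MSG_UNKNOWN"]

    def may_handle(self, changes_system: bool) -> bool:
        """Read-only requests are always allowed; changes need the Master Program."""
        return not (changes_system and self.master_unavailable)
```

Read-only requests still depend on the database being up, which this sketch does not model.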

5.4 Archiver

Description

The archiver is responsible for combining content items into archive files. The finished archive files will be in the Mozilla Archive Format (MAFF), which is essentially the ZIP format with additional metadata for browsing the archived pages.
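Because MAFF is ZIP-based, the core of the archiving step is zipping up the archiver's working directory. The Python sketch below shows only that core. It is a simplification: a real MAFF file also carries metadata that browsers use to open the archived pages (such as an index.rdf file per page), and this sketch does not generate it. The function name is hypothetical.

```python
import os
import zipfile


def build_archive(work_dir: str, archive_path: str) -> list:
    """Zip every file under work_dir into archive_path; return the stored names.

    archive_path should lie outside work_dir so the archive does not include itself.
    MAFF metadata files are not generated here.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(work_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, work_dir)
                zf.write(full, arcname)
                names.append(arcname)
    return names
```

With the naming scheme from the file space section, the master program would then copy the resulting file into the user's directory as <ID>.content.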

Messages

STATUS, CANCEL, ARCHIVE

Operation

The master program sends a STATUS message to determine if the archiver is idle. If it is, then the master program copies content items into the archiver's working directory and sends an ARCHIVE message. Upon completion the archiver sends another STATUS message to the master program which then copies out the new archive file and clears the archiver's working directory.

5.5 Downloader

Description

The downloader is responsible for parsing download requests from users and downloading the requested content. It will consist of the GNU Wget program which will perform the actual downloading and a wrapper C program which will serve as the interface between Wget and the master program.
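The wrapper's main job is to turn a parsed request into a Wget command line. The Python sketch below shows one possible mapping from the plug-in's download options (section 6.4) to GNU Wget arguments. The flags used are standard Wget options, but the mapping itself, the mode names, and the starting URL chosen for domain mode are assumptions. The same argument list could be passed to execvp from the planned C wrapper.

```python
# Hypothetical mapping from plug-in download options to GNU Wget arguments.
def wget_argv(mode: str, value: str, work_dir: str) -> list:
    # Save into the downloader's working directory; fetch page images/CSS and
    # rewrite links so pages can be viewed offline.
    base = ["wget", "--directory-prefix=" + work_dir, "--page-requisites", "--convert-links"]
    if mode == "page":
        return base + [value]
    if mode == "domain":
        # value like "*.reptilesmagazine.com": follow links onto any host in the domain.
        domain = value[2:] if value.startswith("*.") else value
        return base + ["--recursive", "--span-hosts", "--domains=" + domain,
                       "http://" + domain + "/"]
    if mode == "host":
        # Recursive retrieval stays on the starting host by default.
        return base + ["--recursive", value]
    if mode == "directory":
        # --no-parent keeps Wget from climbing above the given directory.
        return base + ["--recursive", "--no-parent", value]
    if mode == "urllist":
        # value is the path of a file with one URL per line.
        return base + ["--input-file=" + value]
    raise ValueError("unknown download mode: " + mode)
```

Given OE-3, options that limit traffic (for example, recursion depth) would probably be worth adding here as well.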

Messages

STATUS, CANCEL, DOWNLOAD

Operation

The master program sends a STATUS message to determine if the downloader is idle. If it is, then the master program sends a DOWNLOAD message with the requests to be downloaded. Upon completion the downloader sends another STATUS message to the master program which then copies the downloaded content out of, and clears, the downloader's working directory.
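The last step of the exchange, after the downloader (or archiver) reports completion with STATUS, is the same for both components: copy everything out of the working directory, then clear it. A minimal Python sketch of that step follows. The function name is hypothetical, and clearing only after all copies succeed is a design choice made here, not one stated in the text.

```python
import os
import shutil


def collect_results(work_dir: str, dest_dir: str) -> list:
    """Copy everything out of a component's working directory, then clear it."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(work_dir)):
        src = os.path.join(work_dir, name)
        dst = os.path.join(dest_dir, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(name)
    # Clear only after every copy succeeded, so a failure leaves the originals in place.
    for name in copied:
        path = os.path.join(work_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return copied
```

An empty working directory is then a reasonable signal that the component is idle and can accept the next DOWNLOAD or ARCHIVE message.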

5.6 Database

The system will use a MySQL database to store user, content, and log data. There is no wrapper program for the database; the master program will access it directly through the MySQL C API. The database schema is described in section 4, Database Schema.

5.7 File space

Description

The file space is the folder on the server's hard drive that will contain all users' content. There will be one root folder containing a subfolder for each username in the system. User content will be stored under this path: $ContentRoot/$Username/

Each content item will be named after its ID number in the database. For example, a content item with an ID of 123456 will be stored in the file space as: $ContentRoot/$Username/123456.content

All content items, whether downloaded content or archives, will be named this way. Folder hierarchies will be stored in the database, so all of a user's content items will be stored in the single path given above. With this method, content does not need to be rearranged on the hard drive every time the user moves it to another folder.

Interfaces

There is no wrapper program for the file space; the master program performs file operations directly. The only operations that need to be performed on the file space are copying content into and out of the file space and permanently deleting content.
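A minimal Python sketch of the naming scheme and the copy-in and delete operations. It assumes that each content item is a single file and that usernames are alphanumeric (as the user table specifies), which keeps every path inside the content root. The function names are hypothetical.

```python
import os
import shutil


def content_path(content_root: str, username: str, content_id: int) -> str:
    """$ContentRoot/$Username/<id>.content, regardless of the item's folder."""
    # Relies on alphanumeric usernames so the path cannot escape content_root.
    return os.path.join(content_root, username, f"{content_id}.content")


def store_content(content_root: str, username: str, content_id: int, src: str) -> str:
    """Copy a file into the file space under its database ID."""
    dst = content_path(content_root, username, content_id)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)
    return dst


def delete_content(content_root: str, username: str, content_id: int) -> None:
    """Permanently delete a content item."""
    os.remove(content_path(content_root, username, content_id))
```

For example, content_path("/var/cap", "jdoe", 123456) gives "/var/cap/jdoe/123456.content". Moving an item between folders changes only its database record, so no function for that is needed here.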

5.8 Configuration file

A simple XML file that stores all of the configuration information for the system. The system only reads this file, so any changes must be made manually by the system administrator.
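The element layout of capconf.xml is not yet defined. The Python sketch below assumes a hypothetical layout, with one <component> element per managed program plus <filespace> and <pipes> elements, and uses paths from the Component Locations table. It raises an error when the file cannot be parsed, which lets the Master Program terminate as section 5.2 requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the design does not yet define capconf.xml's elements.
SAMPLE_CONF = """<capconf>
  <component name="archiver" path="/usr/bin/caparchive"/>
  <component name="downloader" path="/usr/bin/capdownload"/>
  <filespace path="/var/cap/"/>
  <pipes path="/var/run/cap/"/>
</capconf>"""


def load_config(xml_text: str) -> dict:
    """Map component names (plus 'filespace' and 'pipes') to paths.

    Raises ValueError if the file is malformed or incomplete.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"unreadable configuration: {exc}") from exc
    config = {c.get("name"): c.get("path") for c in root.findall("component")}
    for key in ("filespace", "pipes"):
        node = root.find(key)
        if node is None:
            raise ValueError(f"missing <{key}> element")
        config[key] = node.get("path")
    return config
```

Keeping every path in this one file means the Master Program never needs to hard-code where other components live.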

5.9 Client plug-in

Description

The client plug-in will be built around a Java program that receives requests to download content from the user and forwards those requests to the server for processing. It will be written in Java to allow operation on multiple platforms. The plug-in wraps each content download request in a messaging format and sends the resulting messages to the server over a connection it has already set up.

Components

The client plug-in will consist of a Mozilla Firefox extension that provides the user interface, a Java program that handles the user's requests, and a TCP socket to the server over which content requests are sent. The Java program will contain a public class that is exported to the user interface.

Interfaces

Mozilla Firefox extension, Java program

Communication between the Mozilla Firefox extension and the Java program will be done using LiveConnect. Content download requests from the user interface will be simple text strings, passed to the Java program exactly as they appear in the input dialog of the user interface. The download options available to the user are described in section 6.4, Client Plug-in.

Functions exported to user interface:

NOTE: All functions throw exceptions if errors occur.

Connect(String user, String passwd) -- Attempts to establish a TCP connection with the server and authenticate the user. Returns true if successful.

Disconnect() -- De-authenticates the user from the server and closes the TCP connection. Returns true if successful.

DownloadContent(String input) -- Takes the given input string and attempts to pass it to the server. No return value.

GetPreferences() -- Returns an XMLObject containing all of the client plug-in's preferences.

SetPreferences(XMLObject pref) -- Sets the client plug-in's preferences.

GetProgress() -- Requests an update from the server on the user's job progress. The server replies with this structure within a message: JobsProgress {double perc; String status; String msg;};

Java program, server

The Java program will establish a TCP connection with the server which will be used for communications between the client and server. All communications will use the same messaging format. If at any time the connection is lost, an exception is thrown to the user interface. The server never initiates contact with the client; if the client plug-in wants anything, it must ask the server for it.

5.10 Client-server message format

This is the message format that is used for communications between the client plug-in and the server. Messages are sent over a TCP connection.

MESSAGE HEADER

int size // size of message body

char[4] content // content/type of message

MESSAGE BODY

Content dependent on message type

Content types:

AUTH, used when client is being authenticated/de-authenticated. When sent from client, contains username and password. When sent from server, contains result code.

REQT, used for download content request. When sent from client, contains request. When sent from server, contains acknowledgement of request.

PROG, used to get the user's job progress. When sent from the client, contains nothing. When sent from the server, contains the JobsProgress structure specified in the client plug-in section (5.9).

TODO: When message is of type AUTH, it must be encrypted.
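The header above gives the fields but not their byte layout. The Python sketch below assumes a 32-bit signed size in network (big-endian) byte order followed by the four-byte content type. Big-endian is a natural fit because Java's DataOutputStream.writeInt writes integers in that order. The function names are hypothetical, and AUTH encryption, still a TODO, is not addressed.

```python
import struct

# Assumed header layout: 32-bit signed int (big-endian) + char[4] content type.
HEADER = struct.Struct("!i4s")
CONTENT_TYPES = {b"AUTH", b"REQT", b"PROG"}


def pack_message(content: bytes, body: bytes) -> bytes:
    if content not in CONTENT_TYPES:
        raise ValueError(f"unknown content type {content!r}")
    return HEADER.pack(len(body), content) + body


def unpack_message(data: bytes):
    """Return (content, body, remaining bytes) for the first message in data."""
    if len(data) < HEADER.size:
        raise ValueError("incomplete header")
    size, content = HEADER.unpack_from(data)
    if size < 0:
        raise ValueError("negative body size")
    end = HEADER.size + size
    if len(data) < end:
        raise ValueError("incomplete body")
    return content, data[HEADER.size:end], data[end:]
```

Returning the leftover bytes lets a reader on the TCP stream handle several messages that arrive in one read.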

5.11 Piped message format

Messages will be parsed by lines; a line is terminated by a newline character (ASCII code 0x0A). Rather than ending a message with a special character, the length of the message body is given on a predetermined line, so that the contents of the message are not limited.

Messages will follow this format:

A message header indicating the nature of the message: a single word of capital letters and underscores (e.g., MSG_QUIT)
An unsigned long integer indicating the length of the message body
The message body which may contain any sort of data, or no data
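A minimal Python sketch of encoding and decoding this format. It assumes that the body length counts bytes. The decoder returns None when the buffer does not yet hold a complete message, which suits reading from a pipe in chunks. The function names are hypothetical.

```python
import re

HEADER_RE = re.compile(r"[A-Z_]+")  # single word of capitals and underscores


def encode_pipe_message(header: str, body: bytes = b"") -> bytes:
    """Header line, body-length line, then the raw body (length in bytes, assumed)."""
    if not HEADER_RE.fullmatch(header):
        raise ValueError(f"bad header {header!r}")
    return f"{header}\n{len(body)}\n".encode("ascii") + body


def decode_pipe_message(data: bytes):
    """Return (header, body, remaining bytes), or None if data holds no complete message yet."""
    first = data.find(b"\n")
    if first < 0:
        return None
    second = data.find(b"\n", first + 1)
    if second < 0:
        return None
    header = data[:first].decode("ascii")
    length = int(data[first + 1:second])
    if length < 0:
        raise ValueError("negative body length")
    start = second + 1
    if len(data) < start + length:
        return None
    # The body may itself contain newlines; only the length decides where it ends.
    return header, data[start:start + length], data[start + length:]
```

For example, encode_pipe_message("MSG_QUIT") produces b"MSG_QUIT\n0\n".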

6. User Interface Design

6.1 Common Appearance and Behavior

All of the screens will have a logo header, a tree view of the folder hierarchy on the left, and a status bar at the bottom of the page. They will all support cut, copy, paste, and delete operations with text and any other applicable items which are selected. All of the screens will also support keyboard shortcuts for the above-mentioned operations as well as responding to the Return key by executing some default action for that screen.

6.2 Individual Screens

Home

Gives the user a broad overview of the content they own as well as the current job queue. The user will be shown miscellaneous information about the content they own. I will not try to specify this information now because the feature is highly flexible and will be implemented as particular information is needed. A pie graph will show the user how much of their allocated file space has been used and what makes up that used space. The user will also be able to view the job queue, which will display all jobs; jobs that the user does not own will be listed anonymously.
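A minimal Python sketch of producing the job-queue rows for the Home screen, where other users' jobs appear anonymously. The Job fields are hypothetical. Hiding the request URL as well as the owner is an added assumption, on the reasoning that a URL could reveal who submitted the job.

```python
from dataclasses import dataclass


@dataclass
class Job:
    owner: str      # username of the job's owner (hypothetical field)
    request: str    # the download request as entered by the user
    percent: float  # completion, 0 to 100


def queue_view(jobs: list, viewer: str) -> list:
    """Rows for the Home screen job queue; other users' jobs hide owner and request."""
    rows = []
    for position, job in enumerate(jobs, start=1):
        mine = job.owner == viewer
        owner = job.owner if mine else "anonymous"
        request = job.request if mine else ""
        rows.append((position, owner, request, job.percent))
    return rows
```

Keeping each job's queue position visible lets users see where their own jobs stand without learning who owns the others.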

Profile

Allows the user to view and change personal information about himself in the system. The user must answer his security question before he can access his profile page. This may seem like a hassle, but the profile page displays sensitive information that needs to be protected. The password and security question fields will indicate how long ago each was last changed. A list view will display all data in the database that references the user, supporting an honest privacy policy by showing the user what the system knows about him.

Archives

Allows user to view, run (compile content into archive), and manage archives. The controls of this screen will consist of a dropdown box which lists the user's archives, a button to run the archiver, a dropdown box with options to manage archives, and a button to download the archive (if it has been compiled). The user can create new archives, duplicate the current archive, and delete the current archive. When an archive is selected it displays information about that archive as well as a list of all the content in the archive.

6.3 Content

Viewing and managing content is done with a tree view, a list view, and a content view. The tree view is a folder hierarchy with two parent nodes: the user's root folder and a shared folder accessible by all users. The tree view allows only one folder to be accessed at a time and displays only folders, not the content within them. The list view displays a folder's subfolders, content, and archives. The list view will have a header that indicates the parent folder, as well as additional columns displaying details about the list items. Multiple items in the list can be selected, but this does not guarantee that an operation can be carried out on all of the selected items.

Drag-and-Drop Behavior

All items in the list view can be dragged and dropped into subfolders or into a folder in the tree view. Whenever an item is dropped on the Shared folder, the only action taken is that the item is shared; this overrides the normal behavior.

Dragged item → Drop target | Result
Tree view folder → Different tree view folder | Moves folder to new destination
Tree view folder → Same tree view folder | Does nothing
Tree view folder → List view | Same as above; destination is the folder being displayed in the list view
Tree view folder → List view subfolder | Same as above; destination is a subfolder of the folder being displayed in the list view
List view subfolder → Any folder | Equivalent to the above
Root or shared folder → Any folder | CANNOT BE DONE
Any folder → Content | CANNOT BE DONE
Any folder → Archive | Adds folder to archive
Content → Any folder | Moves content to destination folder
Content → Content | CANNOT BE DONE
Content → Archive | Adds content to archive
Archive → Any folder | Moves archive to destination folder
Archive → Content or archive | CANNOT BE DONE
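The drag-and-drop table, together with the Shared-folder override, can be expressed as one decision function. The Python sketch below uses hypothetical item kinds ("folder", "content", "archive") and flags. It reads "Root or shared folder → Any folder" as also blocking a drop of those folders onto Shared, one possible reading of the table.

```python
from typing import Optional


def drop_action(src_kind: str, dst_kind: str, *, src_special: bool = False,
                dst_is_shared: bool = False, same_folder: bool = False) -> Optional[str]:
    """Return "move", "share", or "add_to_archive", or None if the drop does nothing.

    src_special marks the user's root folder or the Shared folder being dragged.
    """
    if dst_kind == "folder":
        if src_kind == "folder" and (src_special or same_folder):
            return None  # root/shared cannot move; same folder does nothing
        # Dropping on the Shared folder only shares the item.
        return "share" if dst_is_shared else "move"
    if dst_kind == "archive" and src_kind in ("folder", "content"):
        return "add_to_archive"
    # Anything dropped on content, and archives dropped on archives.
    return None
```

Centralizing the rules in one function would let the web interface show the correct drop cursor and the server validate the same drop.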

Popup Menu

There will be a popup menu displayed upon a right-click within either the tree or list view.

Context | Available actions
Tree view (folder selected) | View, share, archive, cut, copy, paste, delete, and new folder
Tree view (no folder selected) | None
List view (item selected) | Same as the first row, plus a download option for content and archives
List view (no item selected) | Share (current folder), paste, and new folder

Actions for Selected Items

Action | Called by | Description
Cut | CTRL-X, popup menu | Deletes the selected item(s) and copies them to the clipboard. The clipboard applies only within the web interface; for example, the user cannot cut a file and then paste it on their local hard drive.
Copy | CTRL-C, popup menu | Copies the selected item(s) to the clipboard.
Paste | CTRL-V, popup menu | Pastes whatever item(s) are on the clipboard into the currently selected folder.
Delete | DEL, popup menu | PERMANENTLY deletes the selected item from the system.
View | Double-click, Enter when activated, View button, popup menu; single-click for a folder in the tree view | Folder: navigates to the selected folder. Content: displays in an HTML frame; the View button is toggled on while the frame is shown, and the view can be left by toggling the View button or pressing Esc. Archive: displays the selected archive in the Archives screen.
Download | Download button, popup menu | Begins an HTTP download of the selected content or archive. Folders cannot be downloaded.
Share | Share button, popup menu | Either copies this content, folder, or archive into the Shared directory or un-shares the item, depending on its shared status.
Archive | Archive button, popup menu | Adds the selected content items or folders to the archive selected in the dropdown box at the bottom of the screen.

6.4 Client Plug-in

Authenticating

The user must authenticate with the server before he can begin downloading web pages. If he is already logged in via the web interface, the plug-in will authenticate automatically. Otherwise, he must log in to the server via the plug-in.

Downloading Pages

There is a single button for downloading the URL currently being viewed. There are four options for downloading multiple pages.

Domain: specifying domain from which to download pages (e.g. *.reptilesmagazine.com)

Host: specifying host from which to download pages (e.g. http://reptilesmagazine.com)

Directory: specifying directory from which to download pages (e.g. http://reptilesmagazine.com/snakes/)

URL list: specifying a list of URLs to download

Each of these selections brings up an input dialog where the user can enter the desired values. A small control on the client plug-in displays the inputs given and can be clicked to edit those values. When either of the download buttons is selected, a download request message is created and sent to the server.

Other

There is a progress icon between the single-page and multiple-page download areas that displays the server's progress in downloading the pages. The progress shown is the overall completion of this user's jobs in the job queue. The logout button de-authenticates the user from the system. The preferences button brings up a standard preferences dialog.

6.5 Visual Mockups

Web_interface.pdf, revision 1.0

Client_plug-in.pdf, revision 1.0

-- GrantGipson - 2011-02-12

Topic attachments
Attachment | Size | Date | Uploaded by | Comment
CAP_system.png | 49.2 K | 2011-02-13 02:58 | GrantGipson | System Architecture Diagram 1.0
Client_plug-in.pdf | 97.2 K | 2011-02-13 02:59 | GrantGipson | Client plug-in Interface Diagram 1.0
Master_Program_Pseudocode.txt | 0.9 K | 2011-02-16 03:27 | GrantGipson | Master Program Pseudocode v. 1.0
Web_interface.pdf | 137.7 K | 2011-02-13 03:00 | GrantGipson | Web Interface Diagram 1.0
Topic revision: r7 - 2011-03-14 - RicardoRodriguez
 