Numerous web traffic monitoring packages exist today. Despite the wide variety of software available, we believe there is substantial room for fundamental improvement over current solutions. A core reason is that the tools created to date have been designed and implemented by programmers who were more focused on technical issues than on effective visualization and interaction design. In fact, none of the major traffic analysis packages in use today makes significant use of interaction. They are limited largely to long lists of tabular data in a scrolling window, with occasional graphical summaries. Zaap intends to change this by using interaction to create a tool that leverages human perception for the exploration of this information.
We recognize that tools of this nature support a wide variety of users, each with very different goals. Our purpose with Zaap is to create an application that supports each of these users' needs without introducing unnecessary complexity. We hope to create a general solution that addresses the needs of all users by making it possible for each of them to explore the dimensions of the data most relevant to them. At a minimum, therefore, an appropriate solution will address the needs of the following user groups:
The fundamental task of traffic visualization is to display traffic patterns over time. Traffic can be measured in a variety of useful ways: hits (user requests for a file on the web server), visits (which group hits into a single event when they come from the same visitor and occur within a fixed period of each other), and bandwidth usage. Our application is built around letting the user explore this display by filtering it on a number of different variables. These include:
Zaap is designed to process the traffic logs automatically generated by web servers such as Apache. These logs can contain a vast amount of data; one data set we collected for review contained 2 gigabytes of traffic data for a single site over 3 months. A typical log contains about 100,000 records for every month of activity. The data is encoded in a file with very strict formatting rules, but it is not formatted for human review. Clearly, a visualization tool is required to display this volume of data in a meaningful way. The following schema defines the form of the data that Zaap takes as input:
| Variable | Type | Description |
|---|---|---|
| Visitor IP | Nominal | Four 8-bit values (0-255 each) representing the address of the computer from which a request came. Possible values range from 0.0.0.0 through 255.255.255.255, but their ordering is not significant in any way. Through an external database, many IP addresses can be mapped to known physical locations, ranging in specificity from country to city. |
| Date | Ordinal | The date and time that a request was sent. Valid values range from the date the log begins to the last date recorded and are formatted as: "31/Oct/2006 13:50:58" |
| Offset | Quantitative | The offset of the server's local time (used to record the date above) from GMT. Values range from -1200 through +1400, with the first two digits giving hours and the last two giving minutes; subtracting the offset from the recorded local time yields GMT. |
| Path | Nominal | Any path to a page requested on the server, beginning with "/" to represent the document root of the domain the request was sent to (e.g., www.google.com). Typical characters include: [a-zA-Z0-9/._&?=-] |
| Protocol | Nominal | The protocol used to make a page request. Valid protocols include HTTP/1.0 and HTTP/1.1. |
| Referrer Page | Nominal | The last page the user visited before requesting this page. Valid values include any web URL, or blank or "-" to indicate the page was visited directly. |
| User Agent | Nominal | A string identifying the client that made the request, as reported by the visitor's browser or by a search engine crawler. A typical example, sent by Internet Explorer, follows: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)". To be a useful measure for the user, this string will need to be grouped by browser or search engine, because the exact user agent string can vary dramatically even for a single browser. We may also extract the visitor's operating system from this string to provide another variable by which the data may be filtered. |
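The schema above corresponds closely to Apache's "combined" log format. As a minimal sketch, assuming input in that format (where the raw timestamp appears as `[31/Oct/2006:13:50:58 -0700]`, with a colon between date and time and the offset inside the brackets), the following parses one log line into the schema's fields. The `Hit` class, the function names, and the `browser_family` substring rules are hypothetical illustrations, not Zaap's design; `bytes_sent` is not in the schema table but would be needed for the bandwidth measure.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^\]\s]+) (?P<offset>[+-]\d{4})\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

@dataclass
class Hit:  # hypothetical record type mirroring the schema
    ip: str
    date: datetime            # server-local time, as recorded in the log
    offset_minutes: int       # local time minus GMT, in minutes
    path: str
    protocol: str
    referrer: Optional[str]   # None when blank or "-" (direct visit)
    agent: str
    bytes_sent: int           # "-" in the log is treated as 0

    @property
    def gmt(self) -> datetime:
        """Recorded local time converted to GMT (local time minus offset)."""
        return self.date - timedelta(minutes=self.offset_minutes)

def parse_offset(text: str) -> int:
    """Convert '+HHMM' or '-HHMM' into signed minutes."""
    sign = -1 if text[0] == "-" else 1
    return sign * (int(text[1:3]) * 60 + int(text[3:5]))

def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format line; return None if it is malformed."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    return Hit(
        ip=m["ip"],
        date=datetime.strptime(m["date"], "%d/%b/%Y:%H:%M:%S"),
        offset_minutes=parse_offset(m["offset"]),
        path=m["path"],
        protocol=m["protocol"],
        referrer=None if m["referrer"] in ("", "-") else m["referrer"],
        agent=m["agent"],
        bytes_sent=0 if m["size"] == "-" else int(m["size"]),
    )

# Hypothetical, order-sensitive substring rules: crawlers and Opera are checked
# before MSIE because their strings may also contain "Mozilla" or "MSIE".
BROWSER_RULES = [
    ("Googlebot", "Googlebot"), ("Slurp", "Yahoo Slurp"), ("msnbot", "MSN Bot"),
    ("Opera", "Opera"), ("MSIE", "Internet Explorer"),
    ("Firefox", "Firefox"), ("Safari", "Safari"),
]

def browser_family(agent: str) -> str:
    """Group a raw user agent string into a coarse browser/crawler family."""
    for needle, family in BROWSER_RULES:
        if needle in agent:
            return family
    return "Other"

line = ('127.0.0.1 - - [31/Oct/2006:13:50:58 -0700] "GET /products/index.html HTTP/1.1" '
        '200 2326 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"')
hit = parse_line(line)
if hit is not None:
    print(hit.path, hit.gmt, browser_family(hit.agent))
    # → /products/index.html 2006-10-31 20:50:58 Internet Explorer
```

Substring rules like these are brittle; a real implementation would likely rely on a maintained user agent rule set, which is exactly why the schema notes that the raw string must be grouped before it is useful.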
Joe Corp. just launched some new products on its website, and its CEO, Bill, wants to know how the launch is affecting traffic to the pages for its existing products.
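Bill's question reduces to a filter on the Path and Date variables. A minimal sketch, assuming existing product pages live under a hypothetical `/products/` prefix and the new products under `/products/new/`, with hits supplied as (timestamp, path) pairs: it compares mean daily hits to existing product pages before and after a launch date. The prefixes, dates, and sample data are illustrative only.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, List, Tuple

def daily_hits(
    hits: Iterable[Tuple[datetime, str]],
    prefix: str,
    exclude: Tuple[str, ...] = (),
) -> Counter:
    """Count hits per calendar day for paths under `prefix`, skipping `exclude` prefixes."""
    counts: Counter = Counter()
    for when, path in hits:
        if path.startswith(prefix) and not path.startswith(exclude):
            counts[when.date()] += 1
    return counts

def _mean(values: List[int]) -> float:
    return sum(values) / len(values) if values else 0.0

def before_after(counts: Counter, launch: date) -> Tuple[float, float]:
    """Mean hits per active day before, and on or after, the launch date."""
    before = [n for day, n in counts.items() if day < launch]
    after = [n for day, n in counts.items() if day >= launch]
    return _mean(before), _mean(after)

# Hypothetical sample: existing pages, one new product page, one unrelated page.
sample = [
    (datetime(2006, 10, 30, 9, 0), "/products/widget.html"),
    (datetime(2006, 10, 30, 10, 0), "/products/gadget.html"),
    (datetime(2006, 10, 31, 11, 0), "/products/widget.html"),
    (datetime(2006, 11, 1, 12, 0), "/products/new/zapper.html"),
    (datetime(2006, 11, 1, 13, 0), "/products/widget.html"),
    (datetime(2006, 11, 2, 14, 0), "/about.html"),
]
counts = daily_hits(sample, "/products/", exclude=("/products/new/",))
print(before_after(counts, date(2006, 11, 1)))  # → (1.5, 1.0)
```

Days with no matching hits never enter the counter, so these are means per active day; a display for Bill would probably want to fill in zero-hit days before averaging or plotting.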
The schedule for this phase of development limits the time available for actually implementing the Zaap software, so we intend to focus our efforts on creating the most effective design possible. This will include multiple rounds of paper prototyping, user testing, and redesign at varying levels of fidelity.
For our evaluation we intend to define a set of specific tasks for test users to complete. Users will interact with paper prototypes for initial studies of behavior and rough interface design, but final prototyping and testing may be completed in an electronic environment such as Macromedia Flash. As subjects interact with the prototype, they will be observed and asked to think aloud. We will use these observations to identify problem areas in our design that need to be addressed to improve usability. Through these evaluations we are primarily concerned with ease of use, accuracy, user satisfaction, and any uses of the application that we had not considered in our initial design.
The complexity and scale of the data we are working with constitute the biggest barrier we face in this project. They make prototyping difficult, not because we do not understand the data, but because performing the kinds of analysis and display we hope to use, on a data set large enough to accurately simulate the actual user experience of this application, is nearly impossible without electronic tools. We have considered manually transforming a small subset of the data, but we believe that without a large number (say, at least 500) of well-analyzed data points this visualization cannot be truly useful.
We have been able to import our data into some existing data visualization tools, such as Tableau, but have found that they lack the flexibility to generate some of the statistics our application requires. While hit counts are fairly easy to calculate in these applications, other statistics, visits and entry pages in particular, are proving much more challenging. We are still determining the best approach for generating the displays in our prototypes. Thus far we have created charts that we believe are generally correct, based on user interactions and our basic conceptualization of a theoretical data set that users have been tasked to explore. This admittedly does not provide the level of accuracy the final application would, but we believe it is sufficient for evaluating the interface and more effective than using the drastically reduced data sets that could feasibly be constructed by hand.
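Visits and entry pages are harder than hits because they require grouping requests rather than simply counting them. A minimal sketch of one common approach, sessionization with an inactivity timeout: requests are grouped by a visitor key (here a hypothetical string such as an IP address), sorted by time, and split wherever the gap between consecutive requests exceeds a timeout. The 30-minute default is an assumption, not a value from our design. The entry page of a visit is then its first requested path.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Request = Tuple[str, datetime, str]  # (visitor key, timestamp, path)

def sessionize(
    requests: Iterable[Request],
    timeout: timedelta = timedelta(minutes=30),  # assumed inactivity timeout
) -> List[List[Request]]:
    """Group requests into visits: same visitor, gaps no longer than `timeout`."""
    by_visitor: Dict[str, List[Request]] = defaultdict(list)
    for req in requests:
        by_visitor[req[0]].append(req)

    visits: List[List[Request]] = []
    for reqs in by_visitor.values():
        reqs.sort(key=lambda r: r[1])  # chronological order per visitor
        current = [reqs[0]]
        for prev, req in zip(reqs, reqs[1:]):
            if req[1] - prev[1] > timeout:  # gap too long: start a new visit
                visits.append(current)
                current = []
            current.append(req)
        visits.append(current)
    return visits

def entry_pages(visits: List[List[Request]]) -> Counter:
    """Count the first path requested in each visit."""
    return Counter(visit[0][2] for visit in visits)

requests = [
    ("10.0.0.1", datetime(2006, 10, 31, 9, 0), "/index.html"),
    ("10.0.0.1", datetime(2006, 10, 31, 9, 10), "/products/widget.html"),
    ("10.0.0.1", datetime(2006, 10, 31, 11, 0), "/products/widget.html"),
    ("10.0.0.2", datetime(2006, 10, 31, 9, 5), "/products/gadget.html"),
]
visits = sessionize(requests)
print(len(visits), entry_pages(visits))
```

Here the first visitor's 110-minute gap splits their requests into two visits, giving three visits in total. Keying visitors on IP alone will merge distinct users behind a shared proxy; combining the IP with the user agent string is a common refinement, at the cost of splitting a single user who switches browsers.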
For our prototype evaluations, we created a paper prototype and a list of three user tasks, and we interviewed two other people by having them try to complete those tasks.
The interview process consisted of presenting the tasks to each user and explaining the process. Users were asked to trace with a pencil where they would have moved the cursor and to tap wherever they wanted to click, explaining their thought process throughout. Here are some of the results from each of our interviews. Our notes are much more extensive, but these capture the main ideas and results from each interview.
The tests yielded results we had not anticipated. The main issues were that the interface lacked some traditional design elements, such as checkboxes, and that some of our symbols were hard to understand. Some parts of the interface were also found to be more overwhelming than others, and we gained insight into how to redesign it so that the workflow is more understandable and easier to use.