Introducing Zaap

Numerous web traffic monitoring packages exist today. Despite the wide variety of software options available, we believe there are significant opportunities for fundamental improvement over current solutions. One core source of the problem is that the tools created to date were designed and implemented by programmers more focused on technical issues than on effective visualization and interaction design. In fact, not one of the major traffic analysis packages in use today makes use of interaction in any significant way. They are limited largely to long lists of tabular data in a scrolling window, with occasional graphical summaries. Zaap intends to change this by using interaction to create a tool that leverages human perception for the exploration of this information.

Target users

We recognize that tools of this nature support a wide variety of users, each with very different goals. Our purpose with Zaap is to create an application that supports each of these users' needs without introducing unnecessary complexity. We hope to create a general solution that addresses the needs of all users by letting each one explore the dimensions of the data relevant to them. An appropriate solution will therefore, at a minimum, address the needs of the following user groups:

  • Web developers are most interested in traffic data that offers insight into the technical requirements for site development and/or maintenance. This includes things like identifying problems with the site or collecting statistics on users' browser usage to inform support standards.
  • Site owners are concerned with evaluating the effectiveness of the site as an investment and for planning purposes in strategic initiatives. They are more concerned with higher level visitor behaviors, and often have less developed technical knowledge about the Internet. They may find too much technical detail exposed through the application confusing.
  • Marketing departments focus on traffic growth and decline over time. They are especially concerned with where that traffic is coming from, and the conversion statistics of users from different sources. A core objective of the marketing department is to make better spending decisions based on the relative success of different types of marketing campaigns.

Tasks

The fundamental task of traffic visualization is to display traffic patterns over time. This traffic can be measured in a variety of useful ways, from hits (user requests for a file on the web server), to visits (which group hits into a single event when they come from the same visitor and occur within a fixed period of each other), to bandwidth usage. Our application is built around allowing the user to explore this data display by filtering it on a number of different variables. These include:

  • Date - The date and time visitors made the page requests
  • Page - The pages requested by visitors
  • Browser - The browsers visitors used when viewing pages
  • Entry page - The first page a visitor accesses in a visit
  • HTTP status codes - The result of the request, ranging from a successful serving of the page to the visitor to one of many errors
  • Visitor location - Derived from a look-up on the visitor's IP address to determine the country (and possibly even the city) that the visitor made a request from. This will require an external database for the IP-to-city mappings
  • Purchasing behavior - How far through a purchasing process visitors got (added item to shopping cart? purchased item?). This will require some site-specific configuration to track effectively
  • Referral page / Search terms - The page visitors came from when they entered the site. We would include some special processing for search engine pages to display them in a more friendly way, emphasizing the search engine and the search terms
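
As an illustration of how the visit and entry page variables above could be derived from raw hits, here is a minimal Python sketch. It is not Zaap's implementation. It assumes a 30-minute inactivity timeout (the proposal says only "a fixed period"), identifies a visitor by IP address alone, and uses a hypothetical function name, group_into_visits.

```python
from datetime import datetime, timedelta

# Assumption: 30 minutes stands in for the unspecified "fixed period".
VISIT_TIMEOUT = timedelta(minutes=30)


def group_into_visits(hits, timeout=VISIT_TIMEOUT):
    """Group (ip, timestamp, path) hits into visits.

    A hit joins the visitor's current visit if it arrives within `timeout`
    of that visitor's previous hit; otherwise it starts a new visit.
    Identifying visitors by IP address alone is a simplification.
    """
    visits = []   # one dict per visit, in order of first hit
    current = {}  # ip -> that visitor's most recent visit
    for ip, when, path in sorted(hits, key=lambda h: h[1]):
        visit = current.get(ip)
        if visit is None or when - visit["last"] > timeout:
            # The first path seen in a new visit is its entry page.
            visit = {"ip": ip, "entry_page": path, "last": when, "hits": 0}
            visits.append(visit)
            current[ip] = visit
        visit["last"] = when
        visit["hits"] += 1
    return visits


# Made-up sample data for illustration only.
hits = [
    ("10.0.0.1", datetime(2006, 10, 31, 13, 50, 58), "/"),
    ("10.0.0.1", datetime(2006, 10, 31, 13, 52, 10), "/products"),
    ("10.0.0.2", datetime(2006, 10, 31, 13, 55, 0), "/about"),
    ("10.0.0.1", datetime(2006, 10, 31, 15, 0, 0), "/cart"),
]
for v in group_into_visits(hits):
    print(v["ip"], v["entry_page"], v["hits"])
```

With this sample data the four hits collapse into three visits, because the last request from 10.0.0.1 arrives more than 30 minutes after that visitor's previous one and so begins a new visit with /cart as its entry page.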

Data

Zaap is designed to process the traffic logs automatically generated by servers such as Apache. These logs can contain a vast amount of data: one data set we collected for review contained 2 gigabytes of traffic data for one site over 3 months. A typical example would contain about 100,000 records for every month of activity. This data is encoded in a file with very strict formatting rules, but it is not formatted for human review. Clearly a visualization tool is required to display this volume of data in a meaningful way. The following schema defines the form of the data that Zaap takes as input:

Variable | Type | Description
Visitor IP | Nominal | Four 8-bit values (0-255 each) representing the address of the computer from which a request came. Possible values range from 0.0.0.0 through 255.255.255.255, but their ordering is not significant in any way. Through the use of an external database, many IP addresses can be mapped to known physical locations, ranging in specificity from country to city.
Date | Ordinal | The date and time that a request was sent. Valid values range from the date the log begins to the last date recorded and are formatted as: "31/Oct/2006 13:50:58"
Offset | Quantitative | A time offset that is applied to get from GMT to the local time the server uses to record the date specified above. Ranges from -1200 through +1400, with the first two digits giving the hours and the last two the minutes to add to or subtract from GMT.
Path | Nominal | Any path to a page requested on the server, beginning with "/" to represent the document root, in this case the domain that the request was sent to (e.g. www.google.com). Valid characters include: [a-zA-Z0-9&?=-]
Protocol | Nominal | The protocol used to make a page request. Valid protocols include HTTP/1.0 and HTTP/1.1
Referrer Page | Nominal | The last page the user visited before requesting this page. Valid values include any web URL, or blank or "-" to indicate the page was visited directly
User Agent | Nominal | A string representing the User Agent indicated by the visitor's browser. This might be sent by Internet Explorer, or even a search engine crawler. A typical example follows: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)". For this to be a useful measure for the user, it will need to be transformed into groups by browser or search engine, because the exact user agent string can vary dramatically even for a single browser. We may also pull the visitor's operating system from this string to give another variable with which the data may be filtered.
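
To make the schema concrete, the following Python sketch shows one way a single log record might be read into these fields. It is not Zaap's parser. It assumes the raw input is in Apache's "combined" log format, where the date and offset appear together as [31/Oct/2006:13:50:58 -0800]. The names LOG_PATTERN, BROWSER_RULES, parse_line, and browser_family are hypothetical, and the short browser rule list is only an illustration of grouping user agent strings.

```python
import re
from datetime import datetime, timezone

# Assumed layout: Apache "combined" log format. Real input may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical, deliberately short substring rules. Order matters:
# AOL's browser also reports "MSIE", so it must be checked first.
BROWSER_RULES = [
    ("Googlebot", "Googlebot (crawler)"),
    ("AOL", "AOL"),
    ("MSIE", "Internet Explorer"),
    ("Firefox", "Firefox"),
]


def browser_family(user_agent):
    """Collapse a raw user agent string into a coarse browser group."""
    for needle, family in BROWSER_RULES:
        if needle in user_agent:
            return family
    return "Other"


def parse_line(line):
    """Return a dict of schema fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # %z reads the "-0800" style offset, so the result is timezone-aware.
    # %b assumes English month abbreviations (the default C locale).
    local = datetime.strptime(record["date"], "%d/%b/%Y:%H:%M:%S %z")
    record["date_gmt"] = local.astimezone(timezone.utc)
    record["status"] = int(record["status"])
    record["browser"] = browser_family(record["user_agent"])
    return record


# Made-up sample record for illustration only.
sample = (
    '192.0.2.7 - - [31/Oct/2006:13:50:58 -0800] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; '
    '.NET CLR 1.1.4322)"'
)
record = parse_line(sample)
print(record["ip"], record["date_gmt"].isoformat(), record["browser"])
```

In this sample the offset of -0800 means the server's local time is eight hours behind GMT, so the request recorded at 13:50:58 local time corresponds to 21:50:58 GMT.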

Scenario of use

Joe Corp. just launched some new products on their website, and their CEO, Bill, wants to know how that launch is affecting the traffic that views their existing products.

Evaluation

The time requirements on this phase of development limit the time we have available for the actual implementation of the Zaap software, so we intend to focus our efforts on creating the most effective design possible. This will include multiple rounds of paper prototyping, user testing, and re-design at multiple levels of quality.

For our evaluation we intend to define a set of specific tasks for test users to complete. Users will interact with paper prototypes for initial studies of behavior and rough interface design, but final prototyping and testing may be completed in an electronic environment such as Macromedia Flash. As subjects interact with the prototype they will be observed and asked to think out loud. We will use these observations to identify problem areas in our design that need to be addressed to improve usability. Through these evaluations we are primarily concerned with observing issues of ease of use, accuracy, user satisfaction, and any unforeseen uses of the application that we had not considered in our initial design.

Risks

The complexity and scale of the data we are working with constitute the biggest barrier we face in this project. They make prototyping difficult, not because we do not understand the data, but because performing the kinds of analysis and display we hope to use, on a data set large enough to accurately simulate the actual user experience of this application, is nearly impossible without electronic tools. We have considered manually transforming a small subset of the data, but we believe that without a large number (say, at least 500) of well-analyzed data points this visualization cannot be truly useful.

We have been able to import our data into some existing data visualization tools, such as Tableau, but have found that they lack the flexibility to generate some of the statistics our application would require. While the number of hits is fairly easy to calculate in these applications, other statistics, visits and entry pages in particular, are proving much more challenging. We are still considering the best approach for generating the displays in our prototypes. Thus far we have created what we believe are generally correct charts based on user interactions and our basic conceptualization of a theoretical data set that users have been tasked to explore. This admittedly does not provide the same level of accuracy as would be present in the final application, but we believe that it is sufficient for evaluating the interface and more effective than using the massively reduced data sets that could feasibly be constructed by hand.

Appendix I

Version 1

Our first prototype

Version 2

The second prototype

Test Version

The prototype we tested

Test Popup

The popup text for the prototype we tested

Appendix II

For our prototype evaluations we created a paper prototype and a list of three user tasks, then interviewed two other people by having them try to complete those pre-set tasks.

Evaluation Tasks:

  1. You are the server administrator for an online apple reseller. You were told that people are getting errors when trying to use your website, but you don't know where on the site those errors are occurring. You also know there have been recent server upgrades and want to know whether the errors were fixed after those upgrades. The upgrades were two weeks ago, on the 30th of October.
  2. You also want to know if anyone on AOL uses your website, and approximately how much of the traffic comes from the AOL browser, to see if it's worth testing your website's compatibility with the AOL browser.
  3. You want to know how your bandwidth is doing lately.

The interview process consisted of presenting the questions to the user and explaining the process. They were asked to follow where they would have placed the cursor with a pencil and click when they wanted to click, explaining their thought process all the while. Here are some of the results from each of our interviews. Our notes are much more extensive but these encompass the main ideas and results from each interview.

User 1 Results:

  1. Question 1 - The user seemed to have difficulty understanding what "HTTP Code" meant. She also relied heavily on the graph's tick marks for counting even though our prototype did not have accurate measurements. This lets us know that labeling the graph will be very important. She used the sliders and the rest of the visualization as expected.
  2. Question 2 - She unclicked the server error button but mentioned she would have liked a checkbox or a clearer button interface. She clicked on the "+" button as expected to see the popup, but was confused by the bar graphs and what the numbers signified in our popup. She requested a pie chart or some alternative to the bar graph. She did not reset the dates from the previous exercise, which makes it more important for us to implement "clear all" or "reset" buttons in the final visualization.
  3. Question 3 - Found it simple to find the proper dropdown and understand the graph.

User 2 Results:

  1. Question 1 - She was a little less familiar with the scenario and use of the visualization, but we found her feedback useful nonetheless. She was quite confused by the bottom section of the prototype and spent a great deal of time lost and trying to understand what things meant. She required assistance understanding what "HTTP Code" meant so she could continue her interview. She had a little difficulty narrowing the graph down and gave some ideas for ways to improve this interface. She also mentioned that, with the layout given, the "Line 1", "Line 2", etc. tabs were confusing. We should put more attention into making sure people know how to use these. This will likely be fixed when the full color version is made.
  2. Question 2 - She gave up looking at the "Browser" section when AOL was not there and continued to explore the other bottom sections looking for "AOL" before she noticed the "+" button. She tried to click the AOL button in the popup because she didn't understand the bar graphs and suggested the numbers should be a little more prominent.
  3. Question 3 - She did not understand bandwidth and it took her a while but eventually she found the dropdown at the top. Better labeling would be a good fix for this.

The tests yielded results that we had not anticipated. The main issues were that the interface didn't contain as many traditional design elements, such as checkboxes, as users expected, and that some of our symbols were hard to understand. Some parts were also found to be more overwhelming than others, and we gained insight into how to redesign the interface to make the user process more understandable and easier to use.