With the advent of tabbed browsing, it is now very convenient to have multiple web pages open at the same time. Oftentimes, a user would like to view interesting content in two different tabs, but it can become cumbersome to switch between pages. For example, a user may want to compare the content of two different pages, follow the score of a game while reading a news article, or watch a lecture online while reading the notes.
Our interaction addresses this problem by allowing users to identify specific content from each page, then view only this desired content on a single "aggregated" page. This interaction is intended to work for pages across multiple domains.
How to Use the Interaction
Aggregating content takes only a few steps:
- Once the script is installed, each page will have two buttons at the top: "Aggregate" and "Stop Aggregating." To start aggregating content, the user clicks the corresponding button.
- Now, when the user hovers over a page element, it is surrounded by a bright green border (a color that most pages are unlikely to use).
- The user then double-clicks on the desired content once it is highlighted.
- The user then opens the aggregated page, which has all the selected content. The location of this page is described in more detail in the next section.
- Content Selection: The green box is drawn by changing the style.border property of the element using the onMouseOver event handler. The original border is restored when the mouse moves out.
- HTML Extraction: Once the user selects the desired content, the script uses DOM facilities to extract the element's innerHTML, tag name, and attributes. These values specify the HTML used to generate the element.
- Document Style Extraction: In order to display the element as it appears on the original page, we need style information for the document. We extract the URLs for linked stylesheets and get the HTML for all <style> elements.
- POST Data: We then create a form, fill it with the URL, extracted HTML, and style information, and send the data to the server.
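The four client-side steps above can be sketched roughly as follows. This is a minimal illustration, not the original script: the border width and color value, the function names, and the form field names are all assumptions, and the highlighter uses a single pair of handlers driven by `event.target` rather than per-element handlers.

```javascript
// Assumed highlight style; the text only says "bright green".
const HIGHLIGHT_BORDER = '3px solid #00ff00';

// Content selection: returns mouseover/mouseout handlers that swap the
// hovered element's style.border and restore the original afterwards.
function makeHighlighter() {
  let current = null;
  let savedBorder = '';
  return {
    over(event) {
      if (event.target === current) return;
      if (current) current.style.border = savedBorder;
      current = event.target;
      savedBorder = current.style.border;
      current.style.border = HIGHLIGHT_BORDER;
    },
    out(event) {
      if (event.target !== current) return;
      current.style.border = savedBorder;
      current = null;
    },
  };
}

// HTML extraction: tag name, attributes, and innerHTML of the selection.
function extractElement(el) {
  const attributes = {};
  for (const attr of Array.from(el.attributes)) {
    attributes[attr.name] = attr.value;
  }
  return { tagName: el.tagName.toLowerCase(), attributes, innerHTML: el.innerHTML };
}

// Rebuild markup from the three extracted values.
function toHTML(part) {
  const escape = (v) => v.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  const attrs = Object.keys(part.attributes)
    .map((name) => ` ${name}="${escape(part.attributes[name])}"`)
    .join('');
  return `<${part.tagName}${attrs}>${part.innerHTML}</${part.tagName}>`;
}

// Document style extraction: linked stylesheet URLs plus <style> markup.
function extractStyles(doc) {
  const stylesheetURLs = Array.from(doc.querySelectorAll('link[rel="stylesheet"]'))
    .map((link) => link.href);
  const styleHTML = Array.from(doc.querySelectorAll('style'))
    .map((style) => style.outerHTML);
  return { stylesheetURLs, styleHTML };
}

// POST data: build a hidden form (hypothetical field names) and submit it.
function postSelection(doc, postURL, fields) {
  const form = doc.createElement('form');
  form.method = 'POST';
  form.action = postURL;
  for (const name of Object.keys(fields)) {
    const input = doc.createElement('input');
    input.type = 'hidden';
    input.name = name;
    input.value = fields[name];
    form.appendChild(input);
  }
  doc.body.appendChild(form);
  form.submit();
}
```

In a page, the handlers would be attached with `document.addEventListener('mouseover', h.over)` and `document.addEventListener('mouseout', h.out)`. A `dblclick` handler would call `h.out(event)` before `extractElement(event.target)`, so that the green border is not captured in the element's `style` attribute.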
We used a PHP server with a modular three-tiered architecture to manage the user's aggregated content.
- Save Data: The first file simply handles the POST request from the client script. It starts or resumes a session and stores the content in the session variable.
- Display Aggregated Page: The second file is a skeleton for the aggregated page. It dynamically creates one iframe for each piece of content that has been aggregated. Each iframe points to the third file and passes it a query string, which identifies which piece of content to return:
echo "<iframe src='http://.../file3.php?aggNum=" . $i . "'></iframe>";
Example URL with query string: http://.../file3.php?aggNum=2
- IFrame Content: The last file accepts the query string and includes the appropriate HTML. It also recreates the original page's style by including the extracted styles, and replaces relative paths (e.g., in links) with absolute paths.
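To make the division of work between the three files concrete, here is a rough sketch of the same logic. It is written in JavaScript for consistency with the client script, whereas the actual server is PHP. The `session` object stands in for the PHP session variable, and the function names, the item fields (`pageURL`, `html`, `stylesheetURLs`, `styleHTML`), and the zero-based `aggNum` numbering are assumptions made for illustration.

```javascript
// Tier 1 (save data): append one posted item to the session's list.
function saveContent(session, item) {
  if (!session.aggregated) session.aggregated = [];
  session.aggregated.push(item);
  return session.aggregated.length - 1; // index later used as aggNum
}

// Tier 2 (skeleton page): one iframe per stored item, each pointing at
// the content URL with an aggNum query string.
function renderSkeleton(session, contentURL) {
  const items = session.aggregated || [];
  return items
    .map((_, i) => `<iframe src='${contentURL}?aggNum=${i}'></iframe>`)
    .join('\n');
}

// Tier 3 helper: resolve quoted href/src values against the source page URL.
function absolutizePaths(html, pageURL) {
  return html.replace(/\b(href|src)=(["'])(.*?)\2/gi, (match, name, quote, value) => {
    try {
      return `${name}=${quote}${new URL(value, pageURL).href}${quote}`;
    } catch (e) {
      return match; // leave values that cannot be resolved untouched
    }
  });
}

// Tier 3 (iframe content): extracted styles first, then the selected HTML.
function renderContent(session, aggNum) {
  const item = (session.aggregated || [])[aggNum];
  if (!item) return '';
  const links = item.stylesheetURLs
    .map((url) => `<link rel="stylesheet" href="${url}">`)
    .join('');
  return links + item.styleHTML.join('') + absolutizePaths(item.html, item.pageURL);
}
```

The regular expression only handles quoted `href` and `src` attributes. Paths inside stylesheets or scripts would need separate treatment.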
Although we aimed to do most of the work on the client side, certain constraints forced us to use a server. Our interaction needs to store the aggregated content as the user moves between pages on different domains, and we found no way for the script to store that content on the client side. We therefore used a PHP server with sessions enabled, so that a user does not lose content between aggregations.
We currently send the form data to the server using a standard form POST, because the browser's same-origin policy blocks cross-domain AJAX requests. GreaseMonkey provides a method that allows such requests (GM_xmlhttpRequest), but we have not yet been able to get it working.
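For reference, a request through GreaseMonkey's `GM_xmlhttpRequest` could look roughly like the sketch below. It is untested against our server. `encodeForm` and `buildRequest` are hypothetical helper names, the field names are placeholders, and we have not verified whether the PHP session cookie accompanies such a request.

```javascript
// Encode fields in a form-compatible (application/x-www-form-urlencoded) way.
function encodeForm(fields) {
  return Object.keys(fields)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(fields[key]))
    .join('&');
}

// Build the details object that GM_xmlhttpRequest expects.
function buildRequest(postURL, fields, onDone) {
  return {
    method: 'POST',
    url: postURL,
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    data: encodeForm(fields),
    onload: function (response) { onDone(null, response.status); },
    onerror: function () { onDone(new Error('request failed')); },
  };
}

// Inside a GreaseMonkey script (GM_xmlhttpRequest is not defined elsewhere):
// GM_xmlhttpRequest(buildRequest(postURL, fields, function (err, status) {
//   if (err) { /* fall back to the form POST */ }
// }));
```

Unlike the form POST, this approach would not navigate the user away from the current page.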
Requirements and Multi-User Support
By changing the PHP server that the script points to, a user can customize how the aggregated content is displayed. The server does not depend on the page being aggregated or on the user. For example, Yahoo could use its own server to create a unique user experience across its various sites, such as sports.yahoo.com and finance.yahoo.com.
To point the script at a different server, change
var postURL = 'http://www.stanford.edu/~narenr/cgi-bin/file.php';
to
var postURL = 'http://yourServer/yourFile.php';
The interaction is also configurable by the user. GreaseMonkey lets the user control which pages the script runs on, and, as mentioned above, the user can customize the interaction by changing the server that the script points to.
Since the content aggregator extracts the desired HTML based on DOM elements, the content that can be selected is limited by the page layout. It is not possible, for example, for a user to draw an arbitrary rectangle and select all content that lies inside it. In addition, users cannot select dynamic content: the aggregator relies heavily on DOM manipulation, and dynamically generated elements do not exist in the DOM when the page is loaded.
At this point, we have developed a proof-of-concept prototype. Some useful additions may include automatically refreshing the aggregated content (like a sports score), providing a more customized experience with user logins, or allowing the user to drag, drop, and resize the iframes on the aggregated page.
This interaction is feasible and useful for static content. We believe that we have created the cleanest possible architecture considering the significant limitations imposed on cross-domain DOM manipulations. However, these limitations make it difficult to accurately recreate all possible content from every possible page. We also believe that we have not exposed any significant security vulnerabilities of the kind characteristic of cross-site scripting, because we developed on top of the widely used GreaseMonkey add-on. In addition, it appears that new forms of cross-domain storage and document access are being discussed for future DOM specifications, which would make the implementation significantly easier.