Stanford University Wiki

Customized content aggregation

Introduction

With the advent of tabbed browsing, it is now very convenient to have multiple web pages open at the same time. Oftentimes, a user would like to view interesting content in two different tabs, but it can become cumbersome to switch between pages. For example, a user may want to compare the content of two different pages, follow the score of a game while reading a news article, or watch a lecture online while reading the notes.

Our interaction addresses this problem by allowing users to identify specific content from each page, then view only this desired content on a single "aggregated" page. This interaction is intended to work for pages across multiple domains.

How to Use the Interaction

The content aggregator is very simple to set up. First, the user needs to install GreaseMonkey, which is a Firefox add-on that allows custom JavaScript to be inserted in a page. Then, the user needs to install the script so GreaseMonkey can insert the custom code.

The process of aggregating content is also quite easy:

  1. Once the script is installed, each page will have two buttons at the top: "Aggregate" and "Stop Aggregating." To start aggregating content, the user clicks "Aggregate."
  2. Now, when the user hovers over a page element, it is surrounded by a bright green border (a color that is unlikely to be used on most pages).
  3. The user then double-clicks on the desired content once it is highlighted.
  4. The user then opens the aggregated page, which has all the selected content. The location of this page is described in more detail in the next section.

Implementation

Summary

Our architecture splits functionality between the server and the browser. We use client-side JavaScript to interact with a PHP server. Since the script cannot store cross-domain content in the browser, we need a server to maintain the aggregated content for each user.

[Figure: Browser/Server Functionality Split]

Details

The client-side JavaScript utilizes many DOM facilities, which are described below.

  • Content Selection: The green box is drawn by changing the style.border property of the element using the onMouseOver event handler. The original border is restored when the mouse moves out.
  • HTML Extraction: Once the user selects the desired content, the script uses DOM facilities to extract the element's innerHTML, tag name, and attributes. These values specify the HTML used to generate the element.
  • Document Style Extraction: In order to display the element as it appears on the original page, we need style information for the document. We extract the URLs for linked stylesheets and get the HTML for all <style> elements.
  • POST Data: We then create a form, fill it with the URL, extracted HTML, and style information, and send the data to the server.
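
The first three client-side steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names, the exact border value, and the use of addEventListener (instead of an onMouseOver handler) are all assumptions made for the example.

```javascript
// Hypothetical sketch of content selection, HTML extraction, and style extraction.
var HIGHLIGHT_BORDER = '2px solid #00ff00'; // assumed value for the "bright green" border
var savedBorder = '';                        // border of the element currently highlighted

// Content selection: draw the green border, remembering the original one.
function highlight(el) {
  savedBorder = el.style.border;
  el.style.border = HIGHLIGHT_BORDER;
}

// Restore the original border when the mouse moves out.
function unhighlight(el) {
  el.style.border = savedBorder;
}

// Escape characters that would break a double-quoted attribute value.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// HTML extraction: rebuild the element's markup from its tag name,
// attributes, and innerHTML. Simplification: void elements such as <img>
// would also receive a closing tag here.
function serializeElement(el) {
  var tag = el.tagName.toLowerCase();
  var attrs = '';
  for (var i = 0; i < el.attributes.length; i++) {
    var a = el.attributes[i];
    attrs += ' ' + a.name + '="' + escapeAttr(a.value) + '"';
  }
  return '<' + tag + attrs + '>' + el.innerHTML + '</' + tag + '>';
}

// Document style extraction: URLs of linked stylesheets plus the
// contents of every <style> element.
function collectStyles(doc) {
  var links = doc.getElementsByTagName('link');
  var styles = doc.getElementsByTagName('style');
  var sheetUrls = [];
  var inlineCss = [];
  for (var i = 0; i < links.length; i++) {
    if (/\bstylesheet\b/i.test(links[i].rel)) {
      sheetUrls.push(links[i].href);
    }
  }
  for (var j = 0; j < styles.length; j++) {
    inlineCss.push(styles[j].innerHTML);
  }
  return { sheetUrls: sheetUrls, inlineCss: inlineCss };
}

// Wiring (browser only): highlight on hover, capture on double-click.
if (typeof document !== 'undefined') {
  document.addEventListener('mouseover', function (e) { highlight(e.target); }, false);
  document.addEventListener('mouseout', function (e) { unhighlight(e.target); }, false);
  document.addEventListener('dblclick', function (e) {
    unhighlight(e.target); // drop the green border before reading the attributes
    var payload = {
      url: location.href,
      html: serializeElement(e.target),
      styles: collectStyles(document)
    };
    // The payload would then be handed to the POST step.
  }, false);
}
```

Removing the highlight before serializing matters in this sketch because the green border lives in the element's style attribute and would otherwise be captured along with the content.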

We used a PHP server with a modular three-tiered architecture to manage the user's aggregated content.

  • Save Data: The first file simply handles the POST request from the client script. It starts or resumes a session and stores the content in the session variable.
  • Display Aggregated Page: The second file is a skeleton for the aggregated page. This page dynamically creates one iframe for each piece of content that has been aggregated. Each iframe links to the third file and passes it a query string that identifies which piece of content to return:
    Frame code (PHP string): "<iframe src='http://.../file3.php?aggNum=".$i."'></iframe>"
    Resulting URL with query string: http://.../file3.php?aggNum=2
  • IFrame Content: The last file accepts the query string and includes the appropriate HTML. It also recreates the original page's style by including the extracted styles, and replaces relative paths (e.g. in links) with absolute paths.
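
The server itself is written in PHP, but the two less obvious steps above (generating one iframe per item and rewriting relative paths) can be illustrated in JavaScript. The sketch below is an assumption-laden illustration rather than the project's code: buildFrames and absolutizePaths are hypothetical names, contentUrl stands in for the elided address of the third file, and aggNum is assumed to be zero-based.

```javascript
// Hypothetical helpers mirroring two of the server-side steps.

// Build one iframe per aggregated item. contentUrl is a placeholder for
// the URL of the file that returns a single piece of content.
function buildFrames(count, contentUrl) {
  var frames = [];
  for (var i = 0; i < count; i++) {
    frames.push("<iframe src='" + contentUrl + '?aggNum=' + i + "'></iframe>");
  }
  return frames.join('\n');
}

// Rewrite relative href/src values against the URL of the original page.
// A regular expression is a simplification; it only handles quoted values.
function absolutizePaths(html, pageUrl) {
  return html.replace(/\b(href|src)=(["'])(.*?)\2/gi, function (match, attr, quote, value) {
    try {
      return attr + '=' + quote + new URL(value, pageUrl).href + quote;
    } catch (e) {
      return match; // leave values that cannot be resolved untouched
    }
  });
}
```

For example, with the page URL http://example.test/news/story.html, a link to /scores/live would resolve to http://example.test/scores/live, so it still works when the content is displayed from the aggregation server's domain.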

Challenges

As already mentioned, the core of our interaction is a script that runs on every page that the user wishes to aggregate. In order to be most useful, we did not want the interaction to depend on the publisher of a web page adding the required JavaScript. Thus, we needed to include the script on the client side. GreaseMonkey is a Firefox add-on that allows us to do this. The user just needs to install GreaseMonkey and add our aggregator script to enable the interaction. It requires no intervention or work from the publisher.

Though we aimed to do most of the work client side, certain constraints forced us to use a server. Our interaction needs to store the aggregated content, and the script had no practical way to store content gathered from multiple domains on the client side. To handle this, we used a PHP server with sessions enabled to ensure that a user doesn't lose content between multiple aggregations.

The greatest challenge we faced related to security restrictions imposed to prevent cross-site scripting attacks. Many of the properties we needed were unavailable even for read access. For example, to render the aggregated page as closely as possible to the original page, we needed to extract the style elements associated with it. We initially planned to extract this information using the server, but due to security restrictions, browsers prevent a page from accessing the style elements of a webpage from a different domain. To overcome this, we extracted the style associated with the page using JavaScript and sent it to the server.

We are currently sending the form data to the server using the standard POST method. This is because it is not possible to send cross-domain AJAX requests. GreaseMonkey has a method that allows for these requests, but we have been unable to get it working at this time.
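
A standard form POST of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions: postToServer and the field names passed to it are hypothetical, and postURL is whatever server address the script is configured with.

```javascript
// Hypothetical sketch: send the extracted data with an ordinary form POST,
// which the browser allows across domains.
// doc      - the page's document object
// postURL  - address of the server file that saves the data
// fields   - plain object of name/value pairs (e.g. url, html, styles)
function postToServer(doc, postURL, fields) {
  var form = doc.createElement('form');
  form.method = 'POST';
  form.action = postURL;

  // One hidden input per field.
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      var input = doc.createElement('input');
      input.type = 'hidden';
      input.name = name;
      input.value = fields[name];
      form.appendChild(input);
    }
  }

  // The form must be attached to the document before it is submitted.
  doc.body.appendChild(form);
  form.submit();
  return form;
}
```

Note that submitting a form navigates the current tab to the server's response unless the form is given a target, such as a new tab or a hidden iframe; which of these the original script does is not stated here.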

Reusability

Requirements and Multi-User Support

Our interaction can run on any page without requiring any special support like JavaScript from the server hosting the webpage. The script running inside the client's browser is completely page agnostic. The server side of the interaction uses a standard PHP server. The server can handle multiple aggregations from the same user and can be used simultaneously for different users.

Customization

By changing the PHP server that the script points to, a user can customize how the aggregated content is displayed. The server does not depend on the page being aggregated or on the user. For example, Yahoo could use its own server to create a unique user experience across its various sites, such as sports.yahoo.com and finance.yahoo.com. To point the script to your own server (so that the aggregated content is displayed according to your own styling and requirements), only one line of the code needs to change: update postURL in the JavaScript from
var postURL = 'http://www.stanford.edu/~narenr/cgi-bin/file.php'; to
var postURL = 'http://yourServer/yourFile.php';

Strengths

The core logic of the interaction is implemented on the client side using JavaScript, and since this JavaScript only needs to be downloaded once, the interaction is very lightweight. Further, it doesn't add any overhead to the content publisher, because no JavaScript is required to be sent from the server hosting the webpage. As a result, a user can use this to aggregate content over a myriad of webpages without worrying about how the content is presented on a page.

The interaction is also completely configurable by the user. GreaseMonkey allows the user to control which pages the script runs on. As previously mentioned, the user can also customize the interaction by changing the server that the script points to.
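
One way GreaseMonkey exposes this control is through the metadata block at the top of a user script. The fragment below is a hypothetical example, not the header of the actual aggregator script: the name and namespace are placeholders, and the exclude rule is simply one plausible choice given the security advice under Limitations.

```javascript
// ==UserScript==
// @name        Content Aggregator (placeholder name)
// @namespace   http://example.test/aggregator
// @include     *
// @exclude     https://*
// ==/UserScript==
```

Here @include * runs the script on every page, while @exclude https://* keeps it off pages served over HTTPS. Both lists can be adjusted by the user after installation.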

Limitations

Since the content aggregator extracts the desired HTML based on DOM elements, the content that can be selected is limited by the page layout. It is not possible, for example, for a user to draw an arbitrary rectangle and select all content that lies inside. In addition, users cannot select dynamic content: the aggregator relies heavily on DOM manipulations performed when the page loads, and dynamically generated elements do not yet exist in the DOM at that point.

Most importantly, there are always security considerations when handling cross-domain content. The aggregator sends raw HTML to the server, which is then output on the aggregated page. A website could potentially include malicious JavaScript code that, when aggregated by a user, has access to all of the contents of the aggregated page. Thus, a user shouldn't use this interaction on secure pages or with sensitive content.

Extensions

At this point, we have developed a proof-of-concept prototype. Some useful additions may include automatically refreshing the aggregated content (like a sports score), providing a more customized experience with user logins, or allowing the user to drag, drop, and resize the iframes on the aggregated page.

Conclusion

This interaction is feasible and useful for static content. We believe that we have created the cleanest possible architecture considering the significant limitations imposed on cross-domain DOM manipulations. However, these limitations make it difficult to accurately recreate all possible content from every possible page. We also believe that we have not exposed any significant security vulnerabilities that are characteristic of cross-site scripting because we developed on top of the widely used GreaseMonkey add-on. In addition, it appears that new forms of cross-domain storage and document access are being discussed for future DOM specifications, which would make the implementation significantly easier.
