What: Yahoo BOSS API
Who: Nipun Bhatia and Tejaswi Tenneti
BOSS ('Build Your Own Search Service') is a web services platform that allows developers and companies to create and launch web-scale search products using the same infrastructure and technology that powers Yahoo! Search. It is an open platform that offers programmatic access to the entire Yahoo! Search index via an API. In effect, it turns web search into a web service and invites developers to build their own web search implementations on top of Yahoo's core search technology. Developers submit queries (with their associated parameters) and retrieve up to 50 web, image, news, or spelling results at a time, in XML or JSON format.
Supported Search Verticals
- Web search
- Image search
- News search
What's in it for developers?
- Ability to re-rank and blend results - Developers can reorder Yahoo results as they see fit to tailor their application. They can also blend the search results with proprietary and other web content, for example by augmenting search results with data from Digg.com or Yelp.com.
- Flexibility on presentation - Freedom to present search results using any user interface paradigm
- Mashups, mashups and more mashups - The BOSS API is augmented with the BOSS Mashup library, a Python library plus UI templates that lets developers mash up BOSS search results with other public data sources and frameworks. We discuss this in more detail in the BOSS Mashup Framework section below.
- Unlimited queries - There are no rate limits on the number of queries per day, although each request returns at most 50 results. A developer can iterate through all the results by changing the start and count arguments.
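Blending can be as simple as interleaving two ranked lists. The sketch below is a minimal illustration. It assumes that both the BOSS results and the proprietary results have already been parsed into dicts with a `url` key, like the sample result shown later. `blend` and the example URLs are hypothetical and are not part of BOSS or the Mashup library.

```python
from itertools import zip_longest


def blend(boss_results, own_results):
    """Interleave two ranked lists round-robin (own result first in each round)
    and keep only the first occurrence of each URL."""
    seen = set()
    blended = []
    # zip_longest pairs the i-th entry of each list, padding the shorter one with None.
    for pair in zip_longest(own_results, boss_results):
        for result in pair:
            if result is not None and result["url"] not in seen:
                seen.add(result["url"])
                blended.append(result)
    return blended


boss = [{"url": "http://a.example/"}, {"url": "http://b.example/"}]
own = [{"url": "http://b.example/"}, {"url": "http://c.example/"}]
print([r["url"] for r in blend(boss, own)])
# ['http://b.example/', 'http://a.example/', 'http://c.example/']
```

Giving the proprietary list the first slot in each round is an arbitrary choice. Swapping the arguments to zip_longest favors Yahoo's ranking instead.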
Using the API
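A developer can access the API using queries of the general form:
http://boss.yahooapis.com/ysearch/{vertical}/v1/{query}?appid={appid}&start={start}&count={count}&lang={lang}&region={region}&format={format}
The fields in the above query are: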
- vertical - specifies the type of vertical to query (Web, image or news)
- query - The input query
- appid - BOSS app id (Developer specific id obtained by registering for usage of the BOSS API)
- start - Ordinal position of the first result. The first position is 0, which is also the default. By advancing start, a developer can page through all the results, 50 at a time.
- count - Number of results to return per request. The maximum is 50 and the default is 10.
- lang - Specifies the language search product to query.
- region - Specifies which regional (country) search product to query. The default is 'us'. Must be used together with lang.
- format - The data format of the response. Value can be set to either 'xml' or 'json'. Default sets format to 'json'
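The request form and parameters above can be sketched as a small URL builder. This is a minimal sketch that assumes the host and path used in this article's examples (http://boss.yahooapis.com/ysearch/{vertical}/v1/{query}). `build_boss_url` and `page_urls` are hypothetical helpers, and the sketch only builds URLs; it does not contact the service.

```python
from urllib.parse import quote, urlencode

BASE = "http://boss.yahooapis.com/ysearch"  # host/path taken from the examples in this article
MAX_COUNT = 50  # BOSS returns at most 50 results per request


def build_boss_url(vertical, query, appid, start=0, count=10,
                   lang=None, region=None, fmt="json"):
    """Build one BOSS request URL from the parameters described above."""
    if not 1 <= count <= MAX_COUNT:
        raise ValueError("count must be between 1 and 50")
    if region is not None and lang is None:
        raise ValueError("region must be used together with lang")
    params = {"appid": appid, "format": fmt, "start": start, "count": count}
    if lang is not None:
        params["lang"] = lang
    if region is not None:
        params["region"] = region
    # The query is part of the URL path, so every reserved character is percent-encoded.
    return f"{BASE}/{vertical}/v1/{quote(query, safe='')}?{urlencode(params)}"


def page_urls(vertical, query, appid, total, page_size=MAX_COUNT, **kwargs):
    """Yield one URL per batch of up to page_size results, advancing start."""
    for start in range(0, total, page_size):
        count = min(page_size, total - start)
        yield build_boss_url(vertical, query, appid, start=start, count=count, **kwargs)


print(build_boss_url("web", "Apple Pie", "xyz", fmt="xml"))
# http://boss.yahooapis.com/ysearch/web/v1/Apple%20Pie?appid=xyz&format=xml&start=0&count=10
```

For example, `page_urls("web", "Django", "xyz", total=120)` yields three URLs, with start at 0, 50 and 100 and a final count of 20.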
Control over Search Space
The API gives the user finer control over the search space than the usual browser-based search:
- Phrase Words: To ensure that the query keywords appear in the search result exactly and in the given order, surround them with quotation marks (" ").
Eg: http://boss.yahooapis.com/ysearch/web/v1/"Apple Pie"?appid=xyz&format=xml
- -(minus) Operator: To exclude content that contains certain keywords, use the - (minus) operator. It can be used in conjunction with other operators.
- Site (Domain) search: Prepend 'site:' to a query to search for, or exclude, documents based on their domain.
- Title Search: Returns only pages that have the query in their title, so it can serve as a form of tag search.
- Get Related Sites: We can make a secondary request for each result to find related web pages. This has some performance impact, but it could be done as an AJAX request after the page has loaded.
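The phrase, minus and site: operators are plain query syntax, so they can be combined in a single query string before it is percent-encoded into the URL path. A minimal sketch: `build_query` is a hypothetical helper that only concatenates the operator syntax described above, and wikipedia.org is an arbitrary example domain.

```python
from urllib.parse import quote


def build_query(words="", phrase=None, exclude=(), site=None):
    """Compose a BOSS query string from the search operators (illustrative only)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')                # exact phrase, words in order
    if words:
        parts.append(words)
    parts.extend(f"-{term}" for term in exclude)   # minus operator excludes a term
    if site:
        parts.append(f"site:{site}")               # restrict results to one domain
    return " ".join(parts)


q = build_query(phrase="Apple Pie", exclude=["recipe"], site="wikipedia.org")
print(q)                   # "Apple Pie" -recipe site:wikipedia.org
print(quote(q, safe=""))   # %22Apple%20Pie%22%20-recipe%20site%3Awikipedia.org
```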
BOSS Mashup Framework
The API is augmented with an experimental Python library (the Yahoo BOSS Python API) that provides developers with SQL-like constructs for mashing up the BOSS API with third-party data sources. The library exposes data constructors that, on a "best effort" basis, unify response formats regardless of whether they are XML, JSON or RSS/RDF. In this paradigm, the developer does not need to specify parsing and conversion logic for each data source, which results in concise, declarative and easy-to-interpret code.
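The idea of inferring the format and flattening records into rows can be illustrated with a much simpler sketch that uses only the standard library. This is not the yos.yql.db implementation. `to_rows`, the `result` record tag and the JSON layout in the example are all assumptions.

```python
import json
import xml.etree.ElementTree as ET


def _first_record_list(node):
    """Depth-first search for the first non-empty list whose items are all dicts."""
    if isinstance(node, list) and node and all(isinstance(x, dict) for x in node):
        return node
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        found = _first_record_list(child)
        if found is not None:
            return found
    return None


def to_rows(payload, record_tag="result"):
    """Guess whether payload is XML or JSON and return its records as flat dicts."""
    text = payload.strip()
    if text.startswith("<"):
        root = ET.fromstring(text)
        # Every <record_tag> element becomes one row: child tag -> child text.
        return [{child.tag: child.text for child in rec} for rec in root.iter(record_tag)]
    # For JSON, take the first list of objects found anywhere in the document.
    return _first_record_list(json.loads(text)) or []


xml_doc = "<r><result><title>A</title><url>http://a.example/</url></result></r>"
json_doc = '{"response": {"results": [{"title": "A", "url": "http://a.example/"}]}}'
print(to_rows(xml_doc) == to_rows(json_doc))  # True
```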
Features of the Mashup Framework
- Support for SQL-like functions such as select, group (reduce), sort, union, inner join and user defined functions (map)
- Text normalization and duplicate removal
- Auto-transformation of resource-oriented API results into tables without specifying any parsing logic (unifies XML and JSON responses based on inferred data format)
- All-in-memory storage and retrieval operations
- Ability to join lists of tables via an arbitrary predicate function
- Mashup output available in XML or JSON
- Includes UI templates to allow developers to easily render mashed up search results on a customizable search results page template
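To show what "joining via an arbitrary predicate" means, here is a plain-Python nested-loop inner join over lists of dicts. It illustrates the concept only and is not the yos.yql.db API. `inner_join` and the Digg-style rows are hypothetical.

```python
def inner_join(left, right, predicate):
    """Nested-loop inner join: emit a merged row for every (lrow, rrow) pair that
    satisfies predicate(lrow, rrow). On key clashes the right row's value wins."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if predicate(lrow, rrow)]


web = [
    {"url": "http://a.example/", "title": "A"},
    {"url": "http://b.example/", "title": "B"},
]
diggs = [{"link": "http://a.example/", "diggs": 12}]

joined = inner_join(web, diggs, lambda w, d: w["url"] == d["link"])
print(joined)
# [{'url': 'http://a.example/', 'title': 'A', 'link': 'http://a.example/', 'diggs': 12}]
```

A nested loop costs O(len(left) × len(right)), which is acceptable for the small, all-in-memory tables the framework works with.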
Libraries provided by the Mashup API
- yos.yql.db: provides classes and functions for creating and remixing tables out of XML/JSON responses
- yos.boss.ysearch: provides a single function for fetching BOSS search results
- yos.yql.udfs: provides some handy user defined functions for yos.yql.db.select calls
- yos.util.text: provides some handy functions for processing and comparing text (strings)
- yos.util.console: provides a write function that prints messages to stdout despite encoding errors
Sample API Code
Given below is an example usage of the API to search the web for the term 'Django'
from yos.boss import ysearch
from yos.yql import db

# Search for the term "Django". lang and region are search parameters that
# specify the language and region, respectively.
data = ysearch.search("Django", lang='jp', region='jp')

# Create a new table to store the search results.
table = db.create(data=data)
Results: Given below is a single result of the above submitted query. Please refer to the appendix for a description of the response fields.
{u'dispurl': u'django.nqsblog.jp', u'title': u'Django Kumamoto', u'url': u'http://django.nqsblog.jp/', u'abstract': u'Django Kumamoto. \u718a\u672c\u30b8\u30e3\u30f3\u30b4\u306e\u30e9\u30a4\u30d6\u30fb\u30a2\u30fc\u30c6\u30a3\u30b9\u30c8\u30fb\u30d4\u30c3\u30af\u30a2\u30c3\u30d7\u60c5\u5831 ... \u718a\u672c\uff24\uff4a\uff41\uff4e\uff47\uff4f\u30e9\u30a4\u30d6\u30b9\u30b1\u30b8\u30e5\u30fc\u30eb ... Django. 07-01-12. \u53ea\u4eca\u597d\u8a55\u767a\u58f2\u4e2d! \u544a\u77e5\u6709\u96e3\u3046\u5fa1\u5ea7\u3044\u307e.. TIGER HOLE ...', u'clickurl': u'http://django.nqsblog.jp/', u'date': u'2008/07/07', u'size': u'23025'}
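The response-field descriptions in the appendix recommend linking through clickurl and showing dispurl only as display text. A minimal sketch of that rule, applied to part of the sample result above. `render_result` and its HTML markup are hypothetical.

```python
from html import escape


def render_result(result):
    """Render one BOSS web result as an HTML list item: the link goes through
    clickurl and dispurl is shown as the visible URL."""
    return '<li><a href="{href}">{title}</a> <cite>{disp}</cite></li>'.format(
        href=escape(result["clickurl"], quote=True),
        # BOSS returns titles with HTML highlight tags, so the title is inserted
        # as-is; sanitize it first if the source cannot be trusted.
        title=result["title"],
        disp=escape(result["dispurl"]),
    )


sample = {
    "dispurl": "django.nqsblog.jp",
    "title": "Django Kumamoto",
    "url": "http://django.nqsblog.jp/",
    "clickurl": "http://django.nqsblog.jp/",
}
print(render_result(sample))
# <li><a href="http://django.nqsblog.jp/">Django Kumamoto</a> <cite>django.nqsblog.jp</cite></li>
```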
Yahoo BOSS and Google App Engine Integrated
The features of the BOSS Mashup Framework (BMF) described above let developers build web applications quickly and effectively. Running BMF on top of Google App Engine is one of the easiest ways to deploy BOSS.
Integration with Google App Engine has been discussed in detail here
Applications built with Y! BOSS
A number of applications have been built on the Yahoo BOSS platform. They use the Yahoo search index and either augment it with a proprietary algorithm or present the search results through an innovative front end.
Hakia – A semantic search engine that uses BOSS to access Yahoo’s search index and augments it with its secret semantic sauce. It is one of the start-ups using Yahoo’s infrastructure and index to improve search relevance, applying its own algorithm for semantically analyzing the web to Yahoo’s results.
PlayerSearch – A sports search engine that pulls in content from a host of sources, including BOSS. It illustrates how useful BOSS is for building niche search engines without worrying about infrastructure and resources.
NewsLine – An interactive timeline that places news events in chronological order. It harnesses BOSS in conjunction with the Daylife news API.
Tianamo – Unlike other applications, which augment BOSS with some proprietary algorithm, Tianamo aims to give users a distinct search experience through its 3D visualization of the search results.
BuildaSearch – Aims to remove the programming needed to implement BOSS on your website. It simplifies setup by letting you pick just the colors, images, and scope of search results you want.
A custom search for the class wiki
We can build a custom search engine that searches the class wiki by using the site search discussed above.
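A minimal sketch of such a site-restricted search follows. It assumes the request form used in this article's examples (http://boss.yahooapis.com/ysearch/web/v1/{query}?appid=...). The wiki hostnames and `site_search_urls` are hypothetical. Issuing one query per site also covers searching several related class sites, whose results could then be blended.

```python
from urllib.parse import quote, urlencode


def site_search_urls(query, sites, appid, fmt="json"):
    """Return one BOSS web-search URL per site, each restricted with site:."""
    urls = []
    for site in sites:
        path_query = quote(f"{query} site:{site}", safe="")
        params = urlencode({"appid": appid, "format": fmt})
        urls.append(f"http://boss.yahooapis.com/ysearch/web/v1/{path_query}?{params}")
    return urls


urls = site_search_urls("relevance feedback", ["wiki.example.edu", "cs.example.edu"], "xyz")
print(urls[0])
# http://boss.yahooapis.com/ysearch/web/v1/relevance%20feedback%20site%3Awiki.example.edu?appid=xyz&format=json
```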
For a more sophisticated form of search, BOSS Custom allows developers to push data to Yahoo’s servers for indexing and then run highly customizable search queries against it. With BOSS Custom we can retrieve results weighted by a custom relevance model, and users can drill down by author, comments, date, etc.
For example, a search for 'Nipun Bhatia' could be implemented so that documents written by him are retrieved before documents on which he has left a comment. Many relevance models can be designed on top of BOSS Custom.
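BOSS Custom applies its relevance model on Yahoo's side, but the ordering in the example above can be approximated client-side by sorting retrieved documents. This sketch assumes that each document carries hypothetical `author`, `commenters` and `rank` fields. It is not the BOSS Custom API.

```python
def relevance_key(doc, person):
    """Smaller tuples sort first: docs authored by person, then docs they
    commented on, then everything else; ties keep the original BOSS rank."""
    if doc["author"] == person:
        tier = 0
    elif person in doc.get("commenters", ()):
        tier = 1
    else:
        tier = 2
    return (tier, doc["rank"])


docs = [
    {"title": "Week 3 notes", "author": "Tejaswi Tenneti", "commenters": ["Nipun Bhatia"], "rank": 0},
    {"title": "Other page", "author": "Someone Else", "rank": 1},
    {"title": "BOSS article", "author": "Nipun Bhatia", "rank": 2},
]
ranked = sorted(docs, key=lambda d: relevance_key(d, "Nipun Bhatia"))
print([d["title"] for d in ranked])  # ['BOSS article', 'Week 3 notes', 'Other page']
```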
Google Custom Search
Google provides developers with a search API that allows users to perform custom search. However, the two APIs differ considerably. Unlike BOSS, the Google API is extremely restrictive and caps the number of search queries. Google doesn’t allow developers to mix its results with those of other search engines, nor does it allow any re-ordering. The more open and flexible BOSS API empowers developers to create useful applications rather than just display search results, which seems to be the main aim of the Google API.
Many sites, like Techmeme, Digg and TechCrunch, do not have search features that provide web-level comprehensiveness. The biggest goal of BOSS is to help bootstrap sites like these with comprehensiveness and basic ranking for free, and to offer tools to re-rank, blend, and overlay the results in a way that revolutionizes the search experience.
One can rank web results by Digg and YouTube favorite counts, remove duplicates, and publish the results using a provided search results page template in less than 30 lines of code. No parsing logic for the data sources/APIs is needed, since in most cases the framework can infer the structure and unify the data formats automatically.
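The re-ranking and duplicate removal described above can be sketched in plain Python. This assumes the Digg and YouTube counts have already been fetched into dicts keyed by URL (hypothetical inputs). The two counts are weighted equally, which is an illustrative choice rather than what the Mashup Framework does.

```python
def rerank_by_popularity(results, digg_counts, youtube_favorites):
    """Drop duplicate URLs, then sort by combined Digg + YouTube counts."""
    seen, unique = set(), []
    for result in results:
        key = result["url"].rstrip("/")  # treat "http://x/" and "http://x" as the same page
        if key not in seen:
            seen.add(key)
            unique.append(result)

    def popularity(result):
        url = result["url"]
        return digg_counts.get(url, 0) + youtube_favorites.get(url, 0)

    # sorted() is stable even with reverse=True, so equally popular results keep BOSS's order.
    return sorted(unique, key=popularity, reverse=True)


results = [{"url": "http://a.example/"}, {"url": "http://b.example/"}, {"url": "http://a.example"}]
ranked = rerank_by_popularity(results, {"http://b.example/": 5}, {"http://a.example/": 2})
print([r["url"] for r in ranked])  # ['http://b.example/', 'http://a.example/']
```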
Yahoo BOSS radically opens up web search to a whole lot of new players. It enables Yahoo to spread its technological influence and monetize the long tail of search. However, in our opinion, what makes the BOSS API practical and useful is the BOSS Mashup Framework. It runs seamlessly on platforms like Google App Engine (GAE) and makes it easy for developers to build mashups and create valuable web applications.
As discussed above, the advent of BOSS has led to a wide range of applications and has enabled innovation on a number of fronts: improved search UI experiences (Tianamo), search platforms (BuildaSearch), social search browser plugins (Medium - http://blogme.dium.com/), novel search algorithms that augment BOSS (Hakia) and vertical search engines (NewsLine and PlayerSearch). It seems it can only get better and bigger from here.
Niche search engines have so far not been very good, primarily because they have access to only a very limited index of content. Indexing the whole web is expensive, and BOSS lowers this barrier to entry into search. Consider how useful it would be to run a custom search across just the class website, the class wiki and a handful of sites with information related to what is being taught in the class, without being overwhelmed by results from the entire web.
Yahoo BOSS, though a remarkable first release, can improve on certain fronts. On the technological side, results are reordered only after they have been retrieved, which can slow down an application that relies on reordering. The framework does not let developers pass parameters that tweak its standard relevance model and return reordered results on the fly. In addition, the approval process for BOSS Custom takes time, and Yahoo does not plan to allow developers to access its crawl of the Semantic Web.
Search Response fields
Web Search Response
The response for a web search query contains the following fields:
- COUNT: Indicates how many results to show per page.
- START: The first numeric result to display.
- TOTALHITS: A result count that reflects no duplicates. A normal use for totalhits is to determine how many pages of results to offer in search result navigation.
- DEEPHITS: It returns an approximate count that reflects duplicate documents and all documents from a host.
- ABSTRACT: Abstract with keywords highlighted with HTML tags
- TITLE: Title with keywords highlighted with HTML tags
- URL: URL of result
- CLICKURL: Returns a navigation URL that leads to the target URL for each result. A clickurl might lead through a redirect server, which provides Yahoo! with important usage data from search result sets.
- DISPURL: Returns the URL of the document matching the query, formatted for display. Use this field only for display purposes on result pages; to direct search users to the target document, use the clickurl value.
- SIZE: Returns the document’s size in bytes
- DATE: Returns date in YYYY/MM/DD format
An example web search XML response:
<ysearchresponse responsecode="200">
  <nextpage>/ysearch/web/v1/foo?format=xml&amp;start=10&amp;count=10</nextpage>
  <resultset_web count="10" start="0" totalhits="29440998" deephits="881000000">
    <result>
      <abstract><![CDATA[World soccer coverage from ESPN, including Premiership, Serie A, La Liga, and Major League Soccer. Get news headlines, live scores, stats, and tournament information.]]></abstract>
      <date>2008/06/08</date>
      <dispurl><![CDATA[www.soccernet.com]]></dispurl>
      <clickurl>http://us.lrd.yahoo.com/_ylc=X3oDMTFkNXVldGJyBGFwcGlkA2Jvc3NkZW1vBHBvcwMwBHNlcnZpY2UDWVNlYXJjaARzcmNwdmlkAw--/SIG=10u3e8260/**http%3A//www.soccernet.com/</clickurl>
      <size>94650</size>
      <title>ESPN Soccernet</title>
      <url>http://www.soccernet.com/</url>
    </result>
  </resultset_web>
</ysearchresponse>
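Here is a minimal parsing sketch that runs Python's standard xml.etree.ElementTree on a trimmed copy of the sample response. It reads the resultset attributes and, following the TOTALHITS description, computes how many pages of `count` results to offer. `parse_web_response` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response (ampersands escaped as &amp;).
SAMPLE = """<ysearchresponse responsecode="200">
  <nextpage>/ysearch/web/v1/foo?format=xml&amp;start=10&amp;count=10</nextpage>
  <resultset_web count="10" start="0" totalhits="29440998" deephits="881000000">
    <result>
      <date>2008/06/08</date>
      <size>94650</size>
      <title>ESPN Soccernet</title>
      <url>http://www.soccernet.com/</url>
    </result>
  </resultset_web>
</ysearchresponse>"""


def parse_web_response(xml_text):
    root = ET.fromstring(xml_text)
    resultset = root.find("resultset_web")
    count = int(resultset.get("count"))
    totalhits = int(resultset.get("totalhits"))
    results = [{child.tag: child.text for child in r} for r in resultset.findall("result")]
    return {
        "nextpage": root.findtext("nextpage"),       # relative URL of the next batch
        "pages": (totalhits + count - 1) // count,  # integer ceil(totalhits / count)
        "results": results,
    }


parsed = parse_web_response(SAMPLE)
print(parsed["pages"])                # 2944100
print(parsed["results"][0]["title"])  # ESPN Soccernet
```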
Image Search Response
The response for an image search query contains additional fields that specify, among other things, the filename, file size, format, and dimensions of the image.
An example Image search XML response:
<ysearchresponse responsecode="200">
  <nextpage>/ysearch/images/v1/soccer?format=xml&amp;start=10&amp;count=10</nextpage>
  <resultset_images count="10" start="0" totalhits="4195016" deephits="4195016">
    <result>
      <abstract>Now we don't know Crawford from Adam but he's a good friend of WCP-occasional-contributor Beans</abstract>
      <clickurl>http://us.lrd.yahoo.com/_ylc=X3o/SIG=12f47a1hh/**http%3A//wickedchopspoker.blogs.com/images/why_men_love_soccer.jpg</clickurl>
      <filename>why_men_love_soccer.jpg</filename>
      <size>31500</size>
      <format>jpeg</format>
      <height>213</height>
      <date>2006/06/09</date>
      <mimetype>image/jpeg</mimetype>
      <refererclickurl>http://wickedchopspoker.blogs.com/my_weblog/2006/06/world_cup_musin.html</refererclickurl>
      <refererurl>http://wickedchopspoker.blogs.com/my_weblog/2006/06/world_cup_musin.html</refererurl>
      <title>why_men_love_soccer.jpg</title>
      <url>http://wickedchopspoker.blogs.com/my_weblog/images/why_men_love_soccer.jpg</url>
      <width>230</width>
      <thumbnail_height>120</thumbnail_height>
      <thumbnail_url>http://re3.yt-thm-a01.yimg.com/image/25/m7/3979077535</thumbnail_url>
      <thumbnail_width>130</thumbnail_width>
    </result>
  </resultset_images>
</ysearchresponse>
News Search Response
The response for a news search contains additional fields that specify the last publication time of the story, the source of publication, and so on.
An example News search XML response:
<ysearchresponse responsecode="200">
  <nextpage>/ysearch/news/v1/soccer?format=xml&amp;start=10&amp;count=10</nextpage>
  <resultset_news count="10" start="0" totalhits="8775" deephits="8775">
    <result>
      <abstract>June 16 (Bloomberg) -- Adidas AG , the world's second - largest sporting-goods maker, will ``clearly exceed its full- year sales target for soccer-related goods and gain share in all major markets, Chief Executive Officer Herbert Hainer said. </abstract>
      <clickurl>http://www.bloomberg.com/apps/news?pid=20601100&amp;sid=aSSf0jMZtvBU</clickurl>
      <title>Adidas Will `Clearly Exceed' Soccer Sales Target, Hainer Says</title>
      <language>en english</language>
      <date>2008/06/16</date>
      <time>14:21:15</time>
      <sourceurl>http://www.bloomberg.com/</sourceurl>
      <url>http://www.bloomberg.com/apps/news?pid=20601100&amp;sid=aSSf0jMZtvBU</url>
    </result>
  </resultset_news>
</ysearchresponse>
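A timeline application like NewsLine needs stories in chronological order. The separate date (YYYY/MM/DD) and time fields of a news result can be combined for this purpose. The sketch assumes that all times are in the same time zone, which the response does not specify. `published_at` and the second story are hypothetical.

```python
from datetime import datetime


def published_at(result):
    """Combine a news result's date (YYYY/MM/DD) and time (HH:MM:SS) fields."""
    return datetime.strptime(f"{result['date']} {result['time']}", "%Y/%m/%d %H:%M:%S")


news = [
    {"title": "Adidas Will `Clearly Exceed' Soccer Sales Target, Hainer Says",
     "date": "2008/06/16", "time": "14:21:15"},
    {"title": "Hypothetical earlier story", "date": "2008/06/15", "time": "09:00:00"},
]
timeline = sorted(news, key=published_at)  # oldest story first
print([story["title"] for story in timeline])
```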