This research involved the development and study of an intelligent news-sifting client tool that tracks topics in a dynamic online news source. Multilevel interest profiles, utilizing both explicit and implicit feedback, track the user’s interest and are used to customize the news display.
The coarse-grain interest profile tracks interest in the general channels such as “Business” and “Technology”. The fine-grain profile tracks interest in the topics within each channel. Providing the additional fine-grain profile, which is not present in most online news personalization systems, allows the system to more accurately track the users’ interests.
NewsSifter uses the ClariNews online news feed service, distributed through USENET newsgroups. News articles from 384 ClariNews newsgroups are organized into 25 general interest channels and are retrieved and stored for use by NewsSifter. The update frequency of the channels ranges between 3 and 300 messages a day.
A multilevel interest profile, utilizing both explicit and implicit interest indicators, is used to track user interest. When a user starts using the system, they explicitly set their interest level in the channels and in the topics within the channels. Over time, the system monitors the user’s behavior and implicitly updates the interest levels in the channels and topics. This allows the system to track the user’s interest without interrupting their news reading experience to get explicit feedback. The user does have the opportunity to give explicit feedback on individual news articles, but this explicit feedback is not required.
The multilevel nature of the profile comes from modeling the users’ interests at the coarse-grain level of the channels and at the fine-grain level of the topics within each channel. The coarse-grain channel profile tracks the users’ interest in the general categories and controls which channels are displayed and the order in which they are displayed. The fine-grain topic profile tracks the users’ specific interest in the topics within each channel and controls the order in which the articles are displayed.
The channel profile is created from the 25 general interest categories derived from the ClariNet newsgroups. Categories in the channel profile do not change, but their order may change depending on the explicit and implicit feedback. The channel profile controls the display and order of the channel tabs within the main interface.
The topic profile is dynamically derived from the documents within each channel and controls the display order of the news articles. To create the profile, vocabulary detection and clustering are performed on the document set using components of the SIFTER system.
Vocabulary detection calculates tf.idf weights from the token frequencies and sorts the tokens by weight and then by token. Tokens that appear in at least D documents and are ranked between 1 and R are selected, where R should be a small number to ensure that only highly weighted terms are chosen.
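The selection rule can be illustrated with a minimal Python sketch. It is not the SIFTER implementation: the function name `select_vocabulary`, the parameter names `min_docs` (D) and `max_rank` (R), the use of collection-wide token frequency with a natural-log idf, and the choice to rank all tokens before applying the document-count filter are all assumptions made for illustration.

```python
import math
from collections import Counter


def select_vocabulary(docs, min_docs=2, max_rank=5):
    """Pick topic vocabulary from tokenized documents (hypothetical helper).

    docs     -- list of token lists, one per article
    min_docs -- D: a token must occur in at least this many documents
    max_rank -- R: a token must be ranked between 1 and R by tf.idf weight
    """
    n_docs = len(docs)
    tf = Counter()  # how often each token occurs in the whole collection
    df = Counter()  # in how many documents each token occurs
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))

    # Assumed weighting: tf * ln(N / df). Other tf.idf variants would also fit.
    weights = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

    # Sort by weight (highest first), then by token to break ties.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))

    return [
        token
        for rank, token in enumerate(ranked, start=1)
        if rank <= max_rank and df[token] >= min_docs
    ]
```

With a small R, a token that occurs in every document (weight zero under this idf) or in only one document is dropped, which leaves the terms that are both frequent and discriminating.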
A heuristic unsupervised clustering algorithm, called the Maximin-Distance algorithm, is used to determine the cluster centroids. The centroids are generated iteratively, and the distance between terms is calculated using a formula based on cosine similarity.
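As a rough illustration of the clustering step, the sketch below follows the textbook form of the maximin-distance heuristic: the term farthest from its nearest centroid becomes a new centroid while that distance exceeds a fraction of the mean distance between existing centroids. Several details are assumptions rather than facts from the paper: terms are represented as numeric vectors (for example, per-document weights), distance is taken as one minus cosine similarity, the first term in alphabetical order seeds the process, and `theta` is a hypothetical stopping parameter.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def maximin_centroids(vectors, theta=0.5):
    """Choose centroid terms iteratively. vectors maps term -> vector."""
    terms = sorted(vectors)
    centroids = [terms[0]]  # assumed seed: first term alphabetically
    while len(centroids) < len(terms):
        rest = [t for t in terms if t not in centroids]
        # Distance from each remaining term to its nearest centroid.
        nearest = {
            t: min(cosine_distance(vectors[t], vectors[c]) for c in centroids)
            for t in rest
        }
        candidate = max(rest, key=lambda t: nearest[t])
        if len(centroids) == 1:
            threshold = 0.0  # the second centroid is simply the farthest term
        else:
            pairs = [(a, b) for i, a in enumerate(centroids)
                     for b in centroids[i + 1:]]
            mean_gap = sum(cosine_distance(vectors[a], vectors[b])
                           for a, b in pairs) / len(pairs)
            threshold = theta * mean_gap
        if nearest[candidate] <= threshold:
            break
        centroids.append(candidate)
    return centroids


def assign_clusters(vectors, centroids):
    """Attach every term to its closest centroid term."""
    clusters = {c: [] for c in centroids}
    for term in sorted(vectors):
        best = min(centroids,
                   key=lambda c: cosine_distance(vectors[term], vectors[c]))
        clusters[best].append(term)
    return clusters
```

Because each centroid is itself a term, the centroid token can serve directly as the label of its cluster.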
The NewsSifter system provides a direct manipulation interface where the user sets their interest level in the channels and topics and browses the articles. Before using the system, a user must register with the system, setting up a login name and password to use for each session. The first time a new user logs in, they are prompted to set their interest level in the news channels, which creates initial settings for the channel profile. The user selects the channels they want to read and then sets their interest level in each channel using the sliders.
Once the user explicitly sets their initial interest in the channels, the system does not prompt them to change their settings. Over time, the system updates the channel profile using implicit feedback. The user may access the settings at any time and explicitly set their current interest levels. When the user chooses to explicitly update the channel profile, the channels are displayed sorted by their current interest levels.
Similarly, the first time a user accesses each channel, the current topic profile is downloaded and the user is prompted to set their initial interest levels in the topics, thus creating a starting point for the topic profile. The word listed as the topic is the token that is the centroid of that cluster. If the user holds the mouse pointer over the topic, a tool-tip listing all of the cluster members is shown.
As with the channel interest, the user selects the topics they are interested in and sets their interest level using the sliders. The user is prompted to explicitly set their interest only once, and over time the system updates the profile based on implicit feedback. The user may decide to explicitly change their topic profile, and if they do, the topics are displayed sorted by their current interest levels.
As time goes on, the user may want to detect new topics and use the new topics to sort the articles. If the user selects the menu item to update the topic profile, the most recent topic profile is downloaded and the user is prompted to set their initial interest level in the new topics.
The channels the user has indicated an interest in are displayed as a series of tabs. The channel with the highest interest occupies the left-most tab and is displayed by default, and the tabs to its right are in decreasing order of interest. Over time, the interest level is updated and the order of the tabs may change. In addition, if the interest level of a channel falls below a threshold, it will no longer be displayed in the tabs.
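The tab layout rule amounts to a filter followed by a sort. The sketch below is illustrative only: the function name `visible_tabs`, the 0-to-1 interest scale, the threshold value, and the alphabetical tie-break are assumptions, since the paper does not state them.

```python
def visible_tabs(channel_interest, threshold=0.2):
    """Return channel names in tab order, left-most first.

    channel_interest -- dict mapping channel name to interest level
    threshold        -- assumed cut-off; channels below it are hidden
    """
    shown = [
        (name, level)
        for name, level in channel_interest.items()
        if level >= threshold
    ]
    # Highest interest first; ties fall back to the channel name.
    shown.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in shown]


tabs = visible_tabs(
    {"Business": 0.9, "Technology": 0.6, "Sports": 0.1, "World": 0.6}
)
# tabs[0] is the channel shown by default
```

Recomputing this list whenever the channel profile changes reproduces both behaviors described above: tabs reorder as interest shifts, and a channel disappears once its level drops under the threshold.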
If the user decides they want to read a channel that is not currently displayed, they can either explicitly edit the channel interests through the “Configure” menu or they may select the channel they want to read through the “Channels” menu. When a channel is selected from the “Channels” menu, its interest level is automatically raised above the display threshold and it is added to the tabs. The news articles for the selected channel are classified and sorted based on the topic profile. When an article is double-clicked, it is displayed in a new window.
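The paper does not spell out how articles are classified against the topic profile, so the following is only one plausible sketch. It assumes that an article is assigned to the topic whose cluster members occur most often in its text, and that articles are then ordered by the user's interest in the assigned topic. The names `classify_article` and `sort_articles`, and the dictionary layout of an article, are hypothetical.

```python
from collections import Counter


def classify_article(tokens, clusters):
    """Return the topic whose cluster terms occur most often, or None.

    clusters -- dict mapping topic (centroid token) to its member tokens
    """
    counts = Counter(tokens)
    scores = {
        topic: sum(counts[member] for member in members)
        for topic, members in clusters.items()
    }
    best = max(sorted(scores), key=lambda topic: scores[topic])
    return best if scores[best] > 0 else None


def sort_articles(articles, clusters, topic_interest):
    """Order articles by interest in their assigned topic, highest first.

    Articles matching no topic get interest 0.0 and sort last; sorted()
    is stable, so equally ranked articles keep their original order.
    """
    def sort_key(article):
        topic = classify_article(article["tokens"], clusters)
        return -topic_interest.get(topic, 0.0)

    return sorted(articles, key=sort_key)
```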
The user has the option to give explicit feedback on the article, but they are not required to. If no explicit feedback is given then positive implicit feedback is given since the user chose to view the article. If the user scrolls to view the entire article, the amount of positive implicit feedback is increased.
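The feedback rule can be written as a small function. The numeric values are placeholders, not figures from the paper: `view_reward` and `scroll_bonus` are hypothetical parameters, and the rating scale for explicit feedback is assumed.

```python
def feedback_signal(explicit_rating=None, scrolled_to_end=False,
                    view_reward=0.5, scroll_bonus=0.25):
    """Turn one article view into a single feedback value.

    explicit_rating -- the user's rating if they chose to give one
    scrolled_to_end -- True if the user scrolled through the whole article
    """
    if explicit_rating is not None:
        # Explicit feedback, when present, is used as given.
        return explicit_rating
    # Opening the article counts as positive implicit feedback.
    signal = view_reward
    if scrolled_to_end:
        # Reading the entire article strengthens the positive signal.
        signal += scroll_bonus
    return signal
```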
Requesting explicit feedback when a profile is first accessed provides an accurate starting point for the profile. Updating the profile over time based on implicit feedback allows it to evolve with the user’s interest without the system interrupting the news reading process to request explicit feedback. A reinforcement learning algorithm is used to update the profile.
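The update algorithm itself is not described here, so the sketch below substitutes a simple incremental rule, which moves each interest level a fraction of the way toward the latest feedback value, purely to show how one feedback event could touch both levels of the profile. The function names, the profile layout, the 0-to-1 scale, and the `rate` parameter are all assumptions.

```python
def reinforce(level, signal, rate=0.1):
    """Move an interest level toward the feedback signal, clamped to [0, 1]."""
    updated = level + rate * (signal - level)
    return min(1.0, max(0.0, updated))


def apply_feedback(profile, channel, topic, signal, rate=0.1):
    """Update the coarse-grain and fine-grain levels for one article view.

    profile -- {"channels": {channel: level},
                "topics": {channel: {topic: level}}}  (hypothetical layout)
    """
    channels = profile["channels"]
    channels[channel] = reinforce(channels[channel], signal, rate)
    topics = profile["topics"][channel]
    topics[topic] = reinforce(topics[topic], signal, rate)
    return profile
```

A small `rate` keeps the explicitly set starting levels influential for longer, while a larger one lets implicit feedback reshape the profile quickly.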
Andrew J. Kurtz and Javed Mostafa. Topic detection and interest tracking in a dynamic online news source. In Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries, pages 122-124. IEEE Computer Society, 2003.