Ontology-based User Profile Learning from heterogeneous Web Resources in a Big Data Context

Tracking #: 608-1816

Anett Hoppe
Ana Roxin
Christophe Nicolle

Responsible editor: 
Pascal Hitzler

Submission type: 
Full Paper
With the emergence of real-time distribution of online advertising space ("real-time bidding"), user profiling from traces left by online navigation gains new importance. The ability to distinguish user interests based on implicit information, as contained in navigation logs, enables online advertisers to target customers without interfering with their activities. Current techniques apply traditional methods such as statistics and machine learning, but also suffer from their limitations. In response, the MindMinings research project aims to develop and evaluate a semantic-based profiling system that addresses these limitations.


Solicited Reviews:
Review #1
Anonymous submitted on 07/Jan/2014
Review Comment:

This paper presents an approach to building an ontological user profile by analysing web pages related to advertising topics.
The paper is written at a very high level and several important details are missing.

The title does not match the content of the paper. Generally, "heterogeneous Web resources" refers to the various Web objects that
users can access, such as web pages, chats, forums, images, news, blogs, etc. The paper, however, considers only web pages.
Even for web pages there are inconsistencies: in the Introduction the user profile is described as made up of several attributes of the activities
users perform when they interact with web pages (such as the standard time spent on a page, the bookmarked page, the printed page, etc.), yet in the rest of the paper these actions are not used to define any portion of the user profile.
The same happens in the related work section, where the authors give a lot of weight to approaches from the literature that analyze text to define ontologies related to the analyzed content. Here too, details are missing on the "machine learning" approach used to extract terms for defining an ontology related to the content of the web pages.
Several papers in the literature aim to build ontologies by analyzing the user's actions on web pages (or logs or cookies) and/or the content of textual information.
Thus, a comparison of some of these methods with the one presented by the authors is needed (see the survey at http://www.mdpi.com/1999-5903/2/4/533 or the two well-known papers written by Susan Gauch).

User profiles are defined by considering explicit (e.g., filling out questionnaires) and/or implicit (e.g., monitoring the user's activities) user information.
It is not clear whether the authors also use explicit information to obtain the gender or age of the individual; if so, how is this information automatically acquired by the system?

The example given at the end of the paper does not adequately illustrate the proposed approach. More details have to be provided concerning the definition of the user profile, namely which type of content is analyzed (the textual information from web pages or the user's actions obtained from logs), how the interests are extracted from the web page, and how, for this portion of the profile, the relationships used in the ontology are automatically populated.
Moreover, there seems to be a predefined set of concepts for the types of topical content of the considered web pages. It is not realistic to assume a
predefined pool of topics, as the advertising on web pages relates to a wide range of topics.

- In the "Introduction" and "Task description" sections the references are missing.
- In the Introduction the authors state that they do not use machine learning techniques to define their ontology; instead they use them to analyze the content of web pages/cookies/log files.
- The authors refer to an "industrial partner" without giving any details about this partner.
- Some details have to be provided concerning the structured information in the log files.
- In Section 4.3.2 the authors should give an explanation using description logic syntax.
- Figure 3 and Figure 4 do not represent the same ontology schema.

Review #2
Anonymous submitted on 12/Mar/2014
Review Comment:

The paper addresses the problem of how to represent user profiles by means of an
ontology. For this, a class hierarchy of 11 classes is developed (fig. 2)
that allows representing web pages (as bags of weighted keywords and topics),
user sessions (as sequences of page visits) and user profiles. The authors
especially stress that they can relate profiles to segments, which can be
defined using OWL restrictions or SWRL rules (section 4.3). To test the
prototype, segments for "mothers" (as "female parents") and "sportyMoms"
(as mothers who visited a webpage with the topic sports) are created, and
it is shown that users with the defining characteristics are correctly
inferred as belonging to the segment.
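
As a rough illustration of the segment inference summarized above, the two test segments can be sketched in plain Python. This is not the authors' OWL/SWRL implementation; the class and attribute names (`Page`, `Profile`, `gender`, `is_parent`, `visited`, `topics`) and the function `infer_segments` are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Hypothetical stand-in for a web page annotated with topics.
    url: str
    topics: set[str] = field(default_factory=set)


@dataclass
class Profile:
    # Hypothetical stand-in for a user profile with its visited pages.
    gender: str
    is_parent: bool
    visited: list[Page] = field(default_factory=list)


def infer_segments(profile: Profile) -> set[str]:
    """Return the segments whose defining conditions the profile satisfies."""
    segments: set[str] = set()
    # "mothers" is defined as "female parents".
    if profile.gender == "female" and profile.is_parent:
        segments.add("mothers")
        # "sportyMoms": mothers who visited a page with the topic sports.
        if any("sports" in page.topics for page in profile.visited):
            segments.add("sportyMoms")
    return segments
```

In an ontology the same conditions would be stated declaratively and left to a reasoner; the sketch only makes explicit which profile characteristics each segment depends on.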

The title of the paper is misleading: the paper is not about *learning*
user profiles but about representing them (and allowing automatic inferences
within the representation), nor does the big data context play any role
in the design or evaluation of the ontology.

The core contribution in my opinion is the ontology representation of user
profiles. As such, it suffers from two major issues:

1. There is no evaluation provided that assesses whether the ontology representation
has any benefits, e.g., a comparison of click-through rates of a status-quo
system that does not use ontologies vs. the proposed system. The authors
merely show that they can define new segments. But what is the impact of
such segments? And why can they not be found automatically?

2. There is no clear delineation against the state-of-the-art.
The authors review some of the literature of ontological user profiling,
but later on they never compare against existing mechanisms: what are the novel
aspects of their representation? Is their ontology more expressive than others?
Or does it allow faster inference?
-- Some papers are missing, e.g.,
Stuart E. Middleton, David C. De Roure, and Nigel R. Shadbolt. 2001.
Capturing knowledge of user preferences: ontologies in recommender systems.
In Proceedings of the 1st International Conference on Knowledge Capture
(K-CAP '01). ACM, New York, NY, USA, 100-107.
-- Also a comparison against other generic/flexible representation mechanisms
such as web data warehouses is missing.

Some aspects of the proposed ontology are not clear: what is "BID" (fig. 2,
not discussed in the text)? What is "Universe" (fig. 2)? The core concept
"profile" is not clearly defined: to which class do the individuals
"Age 15-24" belong and how are they linked to profiles? Where is this
shown in the class diagram? -- Some aspects of the proposed ontology
look rather use-case specific to me and not generic, e.g., that domains
are grouped by "partners".

Some parts of the paper are only loosely connected to the core contribution,
read more like a project description or a paragraph from a textbook, and thus
should be removed or at least shortened for a research paper, e.g.,
* section 1, last paragraph: description of the aims of the paper,
* section 2, 2nd and 3rd paragraph: description of authors' organizations,
* p. 3: the V's of big data,
* section 3.1: related work on resource profiling,
* section 4.3: the different ways to define a concept in OWL/SWRL.

The abstract refers to "limitations" of existing methods for placing online ads,
but these limitations are never detailed in the paper.