An Architecture of a Distributed Semantic Social Network

Authors: 
Sebastian Tramp, Philipp Frischmuth, Timofey Ermilov, Saeedeh Shekarpour, Sören Auer
Abstract: 
Online social networking has become one of the most popular services on the Web. However, current social networks are like walled gardens in which users do not have full control over their data, are bound to specific usage terms of the social network operator and suffer from a lock-in effect due to the lack of interoperability and standards compliance between social networks. In this paper we propose an architecture for an open, distributed social network, which is built solely on Semantic Web standards and emerging best practices. Our architecture combines vocabularies and protocols such as WebID, FOAF, Semantic Pingback and PubSubHubbub into a coherent distributed semantic social network, which is capable of providing all crucial functionalities known from centralized social networks. We present our reference implementation, which utilizes the OntoWiki application framework and take this framework as the basis for an extensive evaluation. Our results show that a distributed social network is feasible, while it also avoids the limitations of centralized solutions.
Submission type: 
Full Paper
Responsible editor: 
Guest Editors
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/special-issue-personal-and-soci...

Third round revision, now accepted, after "accept with major revisions" in round one, and "accept with minor revision" in round two. Reviews of the second round revision are below, followed by the reviews of the original submission.

Solicited review by Fabian Abel:

The authors have carefully revised the article and improved it according to the reviewers' remarks. In particular, the evaluation of the performance of the implementation of their reference architecture for distributed social networks has been redone completely. It is especially appealing that the evaluation now tests the performance on different devices ranging from server machines to smartphones.

Overall, I think that the article can be accepted and published if the following (A) minor issues and (B) comments on the text (+ further proof reading) are tackled.

A. MINOR ISSUES
----------------------

Some minor issues that should be tackled:
i. Section 3 - Implementation: clarify where/how the actual RDF data is stored

ii. Section 4: the structure should be slightly improved. Consider either splitting the qualitative evaluation and the quantitative evaluation into two sections or adding subsections: “4.1 Qualitative Evaluation: Social Web Acid Test” and “4.2 Quantitative Evaluation: DSSN Performance” (i.e. “4.2 Evaluation Framework” would then become a subsection 4.2.x)

iii. the structure within the quantitative evaluation part should also be improved: I would recommend to first explain what is going to be evaluated before detailing the evaluation framework. For example:
4.2 Quantitative Evaluation: DSSN Performance
Based on the proposed..... Consequently, there is a demand to investigate the following research questions:

To answer these research questions, we created an evaluation framework that allows for simulating the traffic within a DSSN. We apply this framework to measure the performance of our DSSN in a testbed using a large social network dataset.

4.2.1 Evaluation Framework
Figure 4 overviews the architecture of our evaluation framework...
4.2.2 Data Generation and Testbed Configuration

4.2.3 Results and Discussion
...

iv. Quantitative evaluation:
- what is the motivation for selecting one particular user who is closest to an “average user”? -> this has to be motivated and explained better because it is crucial for the experiment. If I understand correctly then the idea is to evaluate the “average performance”. By selecting an average user one assumes that there is a linear relationship between the runtime and the size of the user profiles (foaf:knows relations, number of triples generated by the user). Is this really true? If not then it would be better to either list the average runtimes as average over all users or plot for each user the query execution time (e.g. for each query one figure; in each figure one would plot three curves (for each device) where the x-axis refers to the x-th user and the y-axis refers to the runtime achieved for the x-th user; for each curve one would order the users by the runtime (e.g. in ascending order) that was achieved for the user).

- the results from Table 2 should be explained in more detail, for example: one should explain why Q4 is more expensive than the other queries and detail why Q4 cannot be answered on a smartphone (currently it just says that there is a "gap of working triple store implementations for HTML5"). Moreover, a one-sentence explanation is needed of why Q1, which is the second fastest query for the Server and FreedomBox, is the second slowest on the smartphone.
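
The alternative presentation suggested in the first point of iv. could be prepared along the following lines. This is only a minimal sketch in Python: the device names follow the three node categories mentioned in this review, while the runtime values, the `runtimes_ms` layout and the helper names are hypothetical and do not come from the paper.

```python
from statistics import mean

# Hypothetical runtimes in milliseconds: device -> one value per user.
# The numbers are made up for illustration and are not results from the paper.
runtimes_ms = {
    "server": [12.0, 15.0, 11.0, 90.0, 14.0],
    "freedombox": [40.0, 55.0, 38.0, 400.0, 47.0],
    "smartphone": [210.0, 260.0, 190.0, 2500.0, 240.0],
}


def sorted_curve(values):
    """Return (rank, runtime) points with users ordered by ascending runtime."""
    return list(enumerate(sorted(values), start=1))


def summarize(values, chosen_user_index):
    """Compare the mean over all users with the runtime of one chosen user."""
    return {
        "mean_over_users": mean(values),
        "chosen_user": values[chosen_user_index],
    }


for device, values in runtimes_ms.items():
    # One curve per device; a plotting library could draw these points directly.
    print(device, sorted_curve(values), summarize(values, chosen_user_index=4))
```

If the runtime of the single chosen user differs clearly from the mean over all users, as it does for the made-up values above, the "average user" shortcut is not a safe substitute for the full distribution.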

B. COMMENTS ON THE TEXT
--------------------------------------

I am not an English native speaker. I added "(?)" when I was not sure whether my suggestion is correct. Please also let other co-authors proof-read the article.

General issues (to be checked across the paper):
- Web vs. web
- Social Network vs. social network
- who is the actor of activities? -> in Section 2, it says several times that services/application "create activities"; I think that the actual actor would be the end-user while the services/applications report about the activities which the users perform
- allow: either one should write “allow someone to do something” or “allow for something” (e.g. “in order to allow object-centered push notification” -> “in order to allow for object-centered push notification”)
- social web vs. Social Web

1. Introduction
--------------------
-p1: "most popular service on the Web." -> most popular services on the Web.
-p1: "with its 600M+ million users creates" -> consider to add a reference (footnote to corresponding posts in Facebook's official blog)
-p1: "We argue that social networking must be distributed and users should" -> quite strong comment (may be OK, but one could also consider stating: "We argue that solutions to social networking should be engineered in a distributed fashion so that users are empowered to…")
-p2: "Privacy. Users of the DSSN " -> introduce the abbreviation DSSN first (and remove the introduction of DSSN later where it says " for a distributed semantic social network (DSSN)")
-p2: "due to the monopoly or duopoly " -> it is not clear what is meant by duopoly (maybe change to: "due to the oligopoly in the social networking market, which is dominated by big players such as Facebook, Google or Twitter")
-p2: "Data ownership. Users have full ownership" -> "Data ownership. Users can have full ownership"
-p2: "Also, here a DSSN would" -> A DSSN would moreover...
-p2: "Although extensibility is also easily to realize in the centralized setting (as is confirmed by various APIs, e.g., Open Social), a centralized Social Network setting could easily prohibit (or censor) certain extensions for commercial (or political) reasons." -> It is hard to follow this argumentation. Currently, it sounds as if extensibility is rather an advantage of centralized social networking platforms. Please clarify your argumentation.
-p2: "Arab Spring" -> provide some context, e.g. something like "As we observed recently during the Arab Spring, where social networking services helped protestors to…"
-p2 - footnote 2 -> if you could add a reference to this work where you experimented with activities like git communities then this would be perfect
-p2: "Combined these standards, protocols" -> "Together, these standards, protocols" (?) or "By combining these standards, protocols…"

2. Architecture…
-----------------------
-p2: "In this Section we" -> "In this Section, we.." (not sure, please check grammar)
- Fig. 1: "Foto" -> Photo (?) (see also other occurrences in the article)
- Fig. 1 - caption: "Resources announce services and feeds, feeds announce services – in particular a push service." -> try to explain more precisely (should it mean something like "resources provide descriptions of services"?)
-p3: "clean an reliable." -> "clean and reliable." -> whether this constraint increases reliability would need to be shown -> maybe: "clean and is intended to increase the reliability of the DSSN."
-p3 - Footnote 3: spec -> specification
-p4: "An example service relation is shown in Listing 4." -> remove this sentence
-p4 - Footnote 5: "we suggest to use it" -> "we suggest to use it."
-p4: "e.g. create activities" -> (1) at this point it is not clear what an activity is; (2) can agents and applications really "create activities" (e.g. perform a tagging activity) or do you mean that applications/agents can "report about user activities"
-p4: "tagging" -> "tag assignments"
-p4: "Media Artefacts are also created by services and applications" -> do you mean that (a) services themselves create artifacts or (b) users create artifacts and services report about the creation of an artifact?
-p4: "Feeds […] are usually not expressed as RDF. " -> feeds could be represented in RSS/RDF; either add some evidence for the "usually not", remove the "but are usually…"-sentence or re-phrase
-p4 - Footnote 7: "in their social network"
-p5: "From a more technical perspective the WebID protocol [20] (formerly known as a best practice [21])" -> align with Section 2.2.1, e.g.: explain "best practice" aspect (incl. reference to [21]) in Sec. 2.2.1. In Sec. 2.3.1 one can then just write "From a more technical perspective the WebID protocol incorporates…"
-p5: " X.509 certificate " -> add link to specification in a footnote
-p6: "We-bID" -> try to resolve line break issues
-p6: "If such … agent, access to the secured resource is then granted. " -> "If such ... agent, then access to the secured resource is granted. "
-p6: "In contrast to other access control solutions such as OAuth12, a user only has to maintain a central resource, her WebID." -> give some further context, i.e.: "In contrast to … OAuth, where …" (actually, I cannot follow; one could also apply OAuth-based authorization in a distributed fashion, right?)
-p6: "context of DSSN architecture" -> "context of a DSSN architecture"
-p6: " and establish a new connection (Friending)." -> " and establish a new connection (\emph{friending})."
-p6 - Footnote 10: "We have recently discussed the motivation and solution …and hope" -> this is slightly too informal. Try to phrase it slightly differently, e.g. "Our solution has been proposed to the W3C WebID incubator group (ADD link to the group)…who might integrate the access delegation principles into the … specification." -> remove "we hope"
-p6: " enables people, but also authors of RDF content" -> " allows casual users and authors of RDF content"
-p6: "(or other content, e.g. status message)" -> "(or other content, e.g. status messages)"
-p6: "Pingback service is shown in Listing 4." -> although Listing 4 is very short, please explain the meaning of the listing in one sentence (-> who is pinging whom?)
-p6: "tagging." -> "tagging activities."
-p6 - Footnote 13: "In fact, we experimented with ..and as a result we now prefer… This is described in…[22]" -> "we prefer" is not a solid design rationale; maybe "…from the results, which are described in [22] we conclude that HTTP post requests are more appropriate…because…"
-p7: “Alice publishes a foaf:knows relation to Bob in their WebID” -> in her WebID profile. (?)
-p7: “pings Bob’s WebID because of this new statement” -> maybe: pings Bob’s WebID to inform Bob about this new statement
-p7: “Bob approves of the relation by publishing it in his WebID,...” -> “Bob approves the relation by publishing it in his WebID profile,...” -> will exactly the same “relation” be stored in his WebID profile or the inverse statement (i.e. Bob foaf:knows Alice)?
-p7: add reference to PubSubHubbub specification or Google code project via which the specification is accessible (e.g. link to http://code.google.com/p/pubsubhubbub/ in a footnote)
-p7: “is not the best solution...from our perspective” -> “is not an optimal solution...from a Linked Data perspective” (?)
-p7: “The main work flow of establishing a PubSubHubbub connection is to advertise” -> “The main work flow of establishing a PubSubHubbub connection can be described as follows:...”
-p7: “two specific feeds are important and interlinked with a WebID to allow subscriptions to them:” -> “two specific feeds are important and interlinked with a WebID to allow for subscriptions:”
-p8 - Table 1: I think that the caption and labels of Table 1 are slightly misleading. In fact, Table 1 lists RDF statements that may cause a pingback, right? It would be good to phrase this more clearly. Moreover, it would be good to clarify “who is pinged”, e.g. given a sioc:reply_of, will the foaf:maker of the original post be pinged?
-p8 - Table 1: it might be a good idea to add to the caption a short explanation to which vocabularies “sioct”, “ctag” and “aair” are pointing
-p8: “Activity Distribution” is not an optimal name, I think, because one does not distribute activities but rather messages that report about activities
-p8: “(see next section)” -> “(see next section).”
-p8: “(e.g. for a group resource to receive updates” -> “(e.g. if a user is member of a certain group then she may subscribe to the feed of a group resource to receive updates”
-p8: “Since resource synchronization is not an issue which was raised in the context of DSSN at first, we have built our” -> “Resource synchronization is a well-known issue when dealing with distributed resources. We have designed our...”
-p8: “can do more than notification.” -> ”can do more than sending notifications.”
-p8: “dssn:syncFeed” -> add link to the DSSN ontology (e.g. add a footnote saying: The DSSN ontology is available at: …)
-p9: “application queries from applications” -> queries from applications
-p9: “However, this assumption is not true for all public Semantic Web search engine at the moment.” -> either remove this sentence or add some further explanations, e.g. are the big players providing SPARQL endpoints? which engine does not provide SPARQL endpoints?
-p9: “are secured by way of” -> are secured by means of
-p9: “DSSN applications work with the WebID” -> ...with a WebID
-p9 - Footnote 18: “an update service” -> “an update service.”

3. Implementation
--------------------------
- consider choosing a more specific title for Section 3, e.g.: “DSSN Implementation for OntoWiki”
- introduce OntoWiki in 1-2 sentences: what is OntoWiki?
- for the list of features in the introduction of Section 3, consider to add “(see Section 3.x)” at the end of each listed item (e.g. “other Linked Data-enabled resource. (see Section 3.1)”)
-p10: “activities inside her social network” -> activities inside their social network
-p10: “an PubSubHubbub-enabled” -> “a PubSubHubbub-enabled”
-p10: “we describe the implementation decisions made to allow these functionality” -> “we describe the implementation of these features and provide insights into our rationale for choosing certain technologies”
-p10: “will be Linked data enabled” -> “will be Linked Data enabled” (check document: “Linked Data enabled” vs. “Linked Data-enabled”)
-p10: “3.2 Maintaining Network Connections” -> slightly ambiguous, consider renaming to, e.g., “Maintaining Friend Connections”
-p10: “In order to maintain network connections (aka. friending)” -> “In order to maintain friend connections and other social network connections”
-p10: in the list where you explain the four steps it always says “we” (e.g. We fetch the data). Who is “we”? -> try to replace we by the corresponding component of the DSSN
-p10: “request, so the new” -> “request so that the new”
-p10: “news feed” -> does news feed refer to activity feed?
-p10: “The incoming atom activity entry were transformed to” -> “The incoming atom activity entries are transformed to”
-p10: “AAIR resources” -> if not done before (e.g. in Table 1) then add reference/link to AAIR
-p10: “for the friends profile” -> “for the friends’ profile”
-p11: “Since a user is normally not interested in all activities of his friends (e.g. certain games), there needs to be a feasibility to remove certain activities from the timeline.” -> “Since a user may not be interested in all activities of her friends (e.g. gaming activities), we offer functionality to hide certain activities in the timelines that visualize the activity streams to which a user is subscribed.”
-p11: “we use this component to keep users timeline clean.” -> “we apply this component to allow users to clean up their timelines: ”
-p11: “and sharing is made really easy.” -> “so that sharing is facilitated for the user.”
-p11: “All activities are represented as Linked Data resources and are equipped with a Pingback service as well as a feed” -> “All activities are represented as Linked Data resources that refer to a Pingback service and a corresponding activity feed.” (?)
-p11: “Each time someone comments on a resource (or otherwise uses it) externally” -> what is meant by “externally”?
-p11: “once he comments on a particular resource” -> once she comments on a particular resource
-p12: “currently commenting person” -> “commenting person”

4. Evaluation
-------------------

-p12: “SWAT level have clearly been described” -> “SWAT level have been specified”
-p12: “The following enumeration describes the details:” -> “The following enumeration describes the corresponding steps:”
-p12: “by creating a tag resource which” -> “by creating a tag resource (ctag:Tag) which” (?)
-p12: “WebID of user B.” -> “WebID of User B.”
-p12: “publishing the Tag.” -> “publishing the tag.”
-p12: step 6 is not 100% in line with the 6th step of SWAT0 because in your scenario, User B might not get notified. Add an explanation that states that you allow for a variation or provide additional functionality since User B can choose whether she wants to be notified or not
-p12: “However, most of the user stories are satisfied” -> “However, most of the \emph{user stories} are satisfied”
-p13: “Are the queries fast enough..” -> at this point it is not clear what is meant by “queries”. Please sketch what is meant by “queries” before.
-p13: “demonstate the usage of licensing in our architecture (the data ownership issue from the Introduction).” -> it is still not clear what is meant by “licensing”. Please clarify.
-p14: “facet based exploration module” -> “facet-based exploration module” or “faceted search module”
-p14: “This specific query asks for the” -> “The query that is depicted in Listing 9 asks for the”
-p14: “Since foaf:birthday values have a string datatype, comparison is also done by string order here.” -> “Since foaf:birthday values are of datatype xsd:string, a string comparison has to be executed.”
-p14: “This query is used” -> “Query Q3 (see Listing 10) is used”
-p15: “Query Q4 is used to prepare a” -> “Query Q4 is applied to prepare a”
-p15: what is meant by “vertical result set”?
-p15: “low-end hardware, which can be achieved by everyone” -> “low-end hardware, which everyone can afford”
-p15: “or which already exists in most households as DSL router or WLAN access points,” -> “or which already exists in most households (e.g. DSL routers or WLAN access points),”
-p15: “three prototypical categories of DSSN nodes where we wanted to test the query performance:” -> “three prototypical categories of DSSN nodes for which we would like to test the query performance:”
-p15: “will soon be solved in the near future” -> “will be solved in the near future”
-p15: “to get a feeling, which” -> “to overview what”
-p15: “are important: The number...and” -> “are important: (1) the number...and (2)”
-p15: “Figure 5 shows a scatter plot where each account is one point while the x axis” -> “Figure 5 shows a scatter plot where each account corresponds to one point. The x-axis”
-p15: “y axis is the amount” -> “y-axis depicts the amount”
-p15: “(profile triple and activity triple)” -> “(profile triples and activity triples)” (?)
-p15: the results shown in Figure 5 have to be explained in 1-2 sentences
-p15: “We cleaned the data by eliminating outlier in both dimensions.” -> Why?
-p15: “outlier” -> “outliers”
-p15: “is around 1500 triple.” -> “is around 1500 triples.”
-p15: “The average amount of related contacts in the cleaned data is 200.” -> please provide the exact number
-p16: "As a nice spin-off, this evaluation demonstrates how Social Network federation can be achieved based on semantic interoperability. " -> maybe: "As a nice spin-off, this evaluation demonstrates how Social Network federation can be achieved if semantic interoperability is guaranteed. "
-p16: "If a user wants to federate his DSSN node" -> "If a user wants to federate her DSSN node"
-p16: "This can even be an integral part of its own DSSN node and only accessible for him." -> not entirely sure what is meant by this sentence, maybe?: "The RDF translation can moreover be realized by the user's own DSSN node so that the user can be in full control of her data."

5. Related Work
-------------------

-p16: "This model addresses some drawbacks like lack of interoperability" -> "This model poses some drawbacks like lack of interoperability"
-p16: "in using private data like how data can be either used or transmitted." -> "in using or transmitting private data."
-p16: "network were proposed and numerous projects based on those were developed with respect to a convenient and secure way for users." -> "network have been proposed and numerous projects based on those architectures have been developed to provide convenient and secure functionality to the users."
-p16: "The distributed social network model emerged to deal with challenges due to centralized models." -> "The distributed social network model emerged to overcome shortcomings attributed to centralized models."
-p16: "which together referred to as " -> "which together are referred to as " (?)
-p16: "Rss, PubSubHubbub, Webfinger," -> some of these formats/protocols/specifications are already mentioned in the beginning of the article. One could think about listing the specifications that are important for this article in the references so that one can easily re-cite them here
-p16: "projects which were developed using these technologies are:StatusNet28 ,DiSo" -> are all these projects aiming to realize distributed social networks? Simply listing all these projects does not add too much value for the reader. It would be good to explain/overview these projects (maybe not all projects individually but maybe in groups, e.g. something like "A, B and C provide… while D, E and F focus on … and use XY technologies").
-p17: " employing XMPP for instance messaging" -> " employing XMPP [17] for instance messaging"
-p17: "While a federated model as a hybrid model improves some of those disadvantages, some of them still remain. " -> give an example of a problem that can be solved and a problem that remains
-p17: "employs individuals for personal data and relations." -> maybe: "employs individuals for maintaining personal data and relations."
-p17: "a SPARQL query which is associated with the user is at first created and monitored in an RDF triple store." -> does "user" refer in this setting to the agent that issues a query?
-p17: "SIOC (Semantically Interlinked Online Communities) project43" -> remove footnote 43 and use [6] instead

6. Conclusions and Future Work
------------------------------------
- it might be a good idea to also summarize the results of the evaluation (maybe as some sort of motivation for the future work (e.g. "scalability and the multi-client capabilities have to be improved.") that still has to be tackled)
-p18: "social networking apps " -> "social networking applications (\emph{apps})"

References
-------------
- check whether references are properly formatted (e.g. full names of authors vs. initial and last name (e.g. A. Passant); some references list the locations of venues (e.g. [5]) while others don't (e.g. [16]))
- [3] -> title of proceedings seems to be quite long and mentions two times ISWC 2006
- [7] and [8] seem to refer to the same specifications (I think it is a good idea to point to the specification (URL) as done in [8])
- [10] -> add "USA"

First round reviews:

Solicited review by Fabian Abel:

A. Summary of content of the paper:
The authors present an architecture and prototypical implementation of a distributed semantic social network (DSSN) implemented based on standards such as WebID and using Linked Data Principles. The architecture consists of three layers: (1) a data layer that features resources (e.g. foaf:Person resources, pictures a foaf:Person uploaded, etc.) and streams (activity streams = what activities did a foaf:Agent perform; history feeds = logging changes of resources), (2) a service layer that provides functionality such as updating resources, searching resources or pings within the social network (informing about updates) and (3) an application layer that allows for applications such as resource sharing, blogging etc.
The prototypical implementation is evaluated.

B. Summary of review:
The paper is well written. The topic of the article is very relevant for the special issue on the Personal and Social Semantic Web. The proposed solution is very interesting and makes good use of various existing (Semantic Web) standards. It nicely demonstrates how social networking functionality can be realized in a distributed fashion by exploiting Semantic Web technologies.

Motivations for a DSSN are given in the conclusions. I think it is very important to find some arguments why social networking must be distributed. At the moment, the introduction lacks motivation why a DSSN is required. In fact, given the stats that are mentioned about Facebook, one could also argue that Facebook is "the" social network, that no other networking services are required, and that applications on the Social Web that would like to exploit a user's social network should just connect to Facebook :-) In the conclusions, some interesting motivations for a DSSN are mentioned. These things could/should be discussed already in the introduction (e.g. from the perspective of "shortcomings of current centralized SNs such as Facebook").

Section 2 is, in my opinion, very nicely structured and explains the architecture very well. If the authors discussed shortcomings of traditional SNs and motivations for DSSNs already in the introduction, then it would be great to link these issues to the design principles in Sec. 2.1 so that the design rationale becomes more obvious.

In Section 3, it would be interesting to give examples of SPARQL queries that are issued in order to generate the view that is displayed in Fig. 3. For example, the SPARQL query that is used to get the upcoming birthdays of the user's friends. Given these queries, one could also stress more strongly where the data that is visualized in Fig. 3 is coming from (I assume that the data is distributed, right?). Moreover: would it make sense to present also a second view (e.g. the "network")?
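
To illustrate what such an "upcoming birthdays" lookup has to compute, here is a minimal sketch in plain Python. It is not SPARQL and not the authors' query; it assumes that foaf:birthday values use the zero-padded mm-dd string form, and the friend list, function name and dates are hypothetical.

```python
from datetime import date

# Hypothetical friend data: (name, birthday as an "MM-DD" string).
friends = [
    ("Alice", "03-15"),
    ("Bob", "11-02"),
    ("Carol", "03-20"),
]


def upcoming_birthdays(friends, today, window_end):
    """Return names of friends whose MM-DD birthday lies in [today, window_end].

    Both bounds are expected to fall in the same calendar year; a window
    that crosses the year boundary is not handled in this sketch.
    """
    lo = today.strftime("%m-%d")
    hi = window_end.strftime("%m-%d")
    # Zero-padded MM-DD strings sort in calendar order, so a plain string
    # comparison is sufficient here.
    hits = [(bday, name) for name, bday in friends if lo <= bday <= hi]
    return [name for bday, name in sorted(hits)]


print(upcoming_birthdays(friends, date(2012, 3, 10), date(2012, 3, 31)))
```

The same range filter would have to be expressed as a string comparison in a query language as well, which is why the datatype of the birthday values matters for the cost of the query.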

The evaluation that is presented in Section 4 is the weak part of this article and should be improved: The quantitative evaluation does not meet the expectations of the reader and should be improved. What research questions should actually be answered? For example, given statistics from Facebook or Twitter, one could estimate the runtime performance of certain SPARQL queries such as "count the number of friends that are interested in db:Dataportability". Right now, no numbers are given. Hence, one could even omit the sections about the quantitative analysis as no quantitative results are reported, i.e. the quantitative analysis currently does not add much value.

Section 4.3 raises some questions and comments:
- From how many users were the Facebook activities collected? The number of activities ranges between 1 and 24 per month, which does not seem representative of posting behavior in social networks: one would rather expect highly active users to perform 24 activities per day (or even per hour; e.g. there are Twitter users who post more than 1000 tweets per day). Hence, the data that was used to generate the artificial data probably does not feature typical characteristics that would result from real user behavior. It is probably better to get the required statistics from related work, e.g. take a Twitter paper such as [1] that reports on the distribution of "the number of followers/followees" or the distribution of "the number of tweets per user" (see Sec. 3 in [1]). The corresponding stats can be used to generate artificial data, e.g. the social network of the artificial data should show characteristics similar to those of a real social network (e.g. for Twitter: Fig. 1 in [1]; power-law exponent between 2 and 3).

- It is not clear how the artificial data was generated, e.g. what do you mean by a random process? What kind of probability distribution was used to simulate a user's activities? How do the stats from the real data come into play? It seems that artificial data is generated for each of the three classes of users (high, medium, low). Given such a discrete approach, one will not obtain characteristics that are similar to a real social network.

- large scale: The artificial dataset is not really large scale, e.g. 100 users is fairly small. One would rather expect something like 1M users.
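
The comments above on data generation can be made concrete with a small sketch. The following Python snippet draws the number of contacts per user from a power law with an exponent between 2 and 3 instead of from three discrete user classes. It uses standard inverse transform sampling; the function name, the default exponent of 2.5, the minimum degree and the seed are assumptions chosen for illustration only.

```python
import random


def sample_power_law_degrees(n_users, alpha=2.5, x_min=1, seed=42):
    """Draw one degree per user from a continuous power law, rounded down.

    Inverse transform sampling: x = x_min * (1 - u) ** (-1 / (alpha - 1))
    yields a density proportional to x ** (-alpha) for x >= x_min.
    """
    rng = random.Random(seed)
    degrees = []
    for _ in range(n_users):
        u = rng.random()  # uniform in [0, 1)
        x = x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))
        degrees.append(int(x))
    return degrees


degrees = sample_power_law_degrees(10_000)
ordered = sorted(degrees)
print("min:", ordered[0], "median:", ordered[len(ordered) // 2], "max:", ordered[-1])
```

Most sampled users end up with few contacts while a small number have very many, which is the heavy-tailed shape that a discrete high/medium/low split cannot reproduce.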

Regarding the quantitative evaluation, I think it would actually be better to take a real dataset. For example, social network datasets are available at http://snap.stanford.edu/data/index.html (only the social network) or a Twitter dataset is available via http://wis.ewi.tudelft.nl/iswc2011/ (authors can be contacted to get also the profile information about the users that published the tweets). Given such a dataset, one could evaluate the use cases listed in Sec. 4.1 or SPARQL queries that are required to generate the view (or particular elements of the view) that is depicted in Fig. 3 in a quantitative manner. For example, given the Twitter dataset one could show how "expensive" the "get upcoming birthdays of friends" query would be on average.

C. Minor things:

- Page 7: Are change set feeds the same as History Feeds? If so, one could use the term "history feed" instead. Otherwise it would be good to add a short explanation about change set feeds.

- Page 11: a Pingback request is send -> a Pingback request is sent

- Page 13: 1-2 sentences of explanation regarding the stats presented in Table 2 should be added. Moreover: what is the meaning of SD?

- Page 13: As can been seen > As can be seen

- Page 13: is shown in table 3. > is shown in Table 3.

- Page 15: Pointers to protocols Webfinger & Co. would be nice.

Solicited review by Laura Hollink:

This paper discusses the possibility of a social network that is distributed and based on semantic technologies, rather than centralized, closed, and proprietary. The paper fits very well within the scope of the special issue and could be an important contribution to the work on the Social Semantic Web. Moreover, it is well written. I am therefore happy to recommend publication.

Comments and questions:

On page 2, you state that your architecture "helps users of the DSSN to distinguish between their own data, which they share with and license to other people and services, and foreign data, which they create by using these services and which they do not own." However, in the rest of the paper this distinction is not made. What kind of foreign data do you foresee, and how would your implementation manage this?

Also on page 2, it says about user profiles that "The description of the owner can be performed in any (mix of) suitable vocabulary (-ies), but FOAF [6] has emerged as the ‘industry standard’ for that purpose." However, on page 6, you define 'friending' as "the process of establishing a symmetric foaf:knows relation between two WebIDs", which suggests that FOAF needs to be used. My general question is how you see the role of the FOAF vocabulary. Would profiles based on other vocabularies have to be linked (or 'dumbed down') to FOAF, e.g. in order to be used to express friends? How would you deal with social networks in which the friend relation is not symmetric?
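
The difference between a symmetric and a non-symmetric relation can be sketched as follows. The snippet models triples as plain Python tuples; the WebIDs and helper names are hypothetical, and it does not reflect how the reference implementation stores or queries its data.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical WebIDs; each triple is (subject, predicate, object).
ALICE = "http://alice.example/id#me"
BOB = "http://bob.example/id#me"
CAROL = "http://carol.example/id#me"

triples = {
    (ALICE, FOAF_KNOWS, BOB),
    (BOB, FOAF_KNOWS, ALICE),    # Bob stated the inverse, so the relation is symmetric
    (CAROL, FOAF_KNOWS, ALICE),  # one-directional, e.g. a follower-style link
}


def knows(graph, a, b):
    """True if the graph contains the statement `a foaf:knows b`."""
    return (a, FOAF_KNOWS, b) in graph


def are_friends(graph, a, b):
    """Friending in the quoted sense: foaf:knows holds in both directions."""
    return knows(graph, a, b) and knows(graph, b, a)


def unreciprocated(graph, person):
    """WebIDs that state foaf:knows towards `person` without a reciprocal statement."""
    return sorted(
        s for (s, p, o) in graph
        if p == FOAF_KNOWS and o == person and not knows(graph, person, s)
    )


print(are_friends(triples, ALICE, BOB), unreciprocated(triples, ALICE))
```

Under this reading, a network with non-symmetric relations would simply keep the one-directional statements and never require the second check.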

The paper features two evaluations, a qualitative and a quantitative one. I am a bit puzzled by the latter. What is the goal of this quantitative evaluation exactly? The result section 4.5 discusses scalability issues but most of the remarks there do not follow from the experimental results; they follow from expectations that the authors have about the amount of activity, triples, and the state-of-the-art of triple stores. The single remark "we did not reach a limit regarding the scalability on a single machine" does not seem to justify the long and detailed explanation of the setup of the dataset for this test.

Finally, I have a general question: could you please briefly explain how (and if) your architecture can be used for integration with (or porting of) existing social web data and services?

I found two typos:
-p.14 We did not expected -> expect.
-p.14 To make a guess on incoming triple per year -> triples.

Solicited review by anonymous reviewer:

The authors describe a reference architecture for a distributed semantic social network. They aim to overcome the drawbacks resulting from users working with many isolated social networks and present approaches for overcoming this isolation and interconnecting said networks.

Their approach focuses on the exploitation of Semantic Web technology, which makes it highly relevant to this journal.

The reference architecture is presented in theory; a real-world implementation illustrates potential usage and serves as a basis for the evaluation. The results are promising.

The paper lacks a deeper review of related work:

- Even Wikipedia provides a detailed overview (http://en.wikipedia.org/wiki/Distributed_social_network) which provides more insights than the paper does; the authors should also have a look at the external links being referenced. I searched briefly for related work and could find a lot of other related work. Anyway, I don't think it is a big issue; I have the feeling that the authors just ran out of space, are actually aware that there is more, and decided to focus on the most important related work.

The paper is clearly structured and easy to read and follow.

What I am not really convinced of is the novelty of the work presented.
