Thursday 13 May 2010

BRII Update

This is a short update on our activities and some ideas I have come up with.

Since the end of the BRII project we have been working on the registry and user engagement. We are adding more data continuously. We are also outlining data quality control processes and planning some activities with users to evaluate our work.

The outcomes of our work in BRII gave us some insight into what a successful registry service would require. For example:

From the departments' and individual users' point of view:
  • Breadth of coverage. By this we mean the inclusion of data from as many different sources as possible, both internal and external.
  • Depth of coverage in addition to breadth of coverage. This will make context clear and allow detailed questions to be answered. It will require as much data as possible about each entity, drawn from multiple sources.
  • The ability to find information that cannot easily be found elsewhere, such as all Oxford researchers working on a particular topic or collaborating with others in a specific geographical location.
  • An easy-to-use and flexible search option on the Blue Pages.
  • The ability to discover research connections between people and research interests, as well as gaps or islands of subjects (groups that are not related to anyone).
  • The ability to explore information across time, such as changes in people's roles, research interests and publications.
  • Being able to download relevant information in formats that can be easily manipulated by users.
From the University's point of view:
  • To provide services complementary to those provided by other systems (avoiding duplication).
  • The ease with which data harvesting can be repeated and supported in future will be critical in the long term to delivering a low-overhead, sustainable and affordable service.
We are working on meeting the above requirements. However, this could take some time, during which we need to constantly monitor the response from our users to see whether we are on track.

During the last few weeks I have been reading about privacy and data aggregation issues. Both topics are extremely relevant to BRII. Regarding privacy, I like this quote by boyd (2010):

"Fundamentally, privacy is about having control over how information flows. It's about being able to understand the social setting in order to behave appropriately. To do so, people must trust their interpretation of the context, including the people in the room and the architecture that defines the setting."

If I translate the above to the context of the registry and the Blue Pages, I would say that in order for researchers and departments to trust our work we need to help them understand what we are doing with their information and in which contexts we are going to disseminate it. Although we are not dealing with personal information, we are using information about researchers' work (which they sometimes want to keep private), information which can affect (hopefully positively) their reputations and future work.

Regarding data aggregation, the general concerns I gather from the literature can be summarized as follows: data aggregation can threaten privacy, can lead to security problems (e.g., identity theft), can mislead people (aggregated data is not always comprehensible), can violate contextual integrity (changing the data’s original meaning) and can result in data being used for purposes other than those originally intended.

With the above in mind, and placing it in BRII’s context (data restricted to one institution and limited to data about research), this reading has left me thinking about further requirements we may need to consider in our work. The diagram below explains what I am talking about.

The diagram is about building trust among our users. It complements the first list, which is more focused on technical developments and on data access and coverage. This second set of points, I think, would help our contributors as well as the users of the data. It will let them know in which ways we are going to use their information and in which ways we are allowing other parties to use that data. We need to reassure our contributors that their data will be kept secure and used lawfully, by constraining its use to research purposes and preserving the data’s contextual integrity. We have already been taking some of these points into account, but we may need to stress and publicise them more.

Relevant literature:
Ethics of data mining and aggregation
Data aggregation: Actually a threat?
van Wel, L. and Royakkers, L. (2004), “Ethical issues in web data mining,” Ethics and Information Technology, 6, pp. 129–140.
Nissenbaum, H. (1997), “Toward an approach to privacy in public: the challenges of information technology,” Ethics and Behavior, 7(3), pp. 207–219.
Nissenbaum, H. (1998), “Protecting Privacy in an Information Age: The Problem of Privacy in Public,” Law and Philosophy, 17, pp. 559–596.