Friday 27 November 2009

BRII's Entity Store

In our internal meetings we use the term Registry or Entity Registry to refer to the Research Information Infrastructure. Wanting to know a bit more about what this kind of store is and what its technological features are, I asked Anusha. She gave me the following lecture:

Cecilia: What is an Entity Registry and what is the difference with conventional stores?

Anusha: Let's first list our entities, so we are clear about what we mean when talking about an entity: person, organisational unit, publications (journal articles, books, chapters...), funder information and research activity information.
Now, the main difference between an entity store and a conventional store (typically a database) is that in a conventional store the columns correspond to the attributes of each entity, and they have to be defined when the database is created. So how is this a problem? Well,
  1. We need to think of all the attributes relating to the entity (for example, all the attributes that make up a person) at the time of creation, and we cannot change the structure very easily later on (the cost of change rises exponentially with time).
  2. All the people have to follow the same structure, so you cannot account for variations very easily.
  3. You cannot accommodate all of the multiple relationships easily, e.g. a person belongs to multiple departments/colleges, has multiple roles, has multiple titles, has multiple names.
When aggregating data from multiple sources, points 1, 2 and 3 are crucial. Designing a common structure that has all the attributes we are going to come across in the future is near impossible.
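
To make the contrast with fixed columns concrete, here is a minimal sketch in Python. It assumes an entity is simply a collection of attribute/value statements; the class name, identifiers and attribute names are all made up for illustration, and this is not how the BRII registry is actually implemented.

```python
class EntityStore:
    """Toy entity store: every entity is a bag of (attribute, value) statements."""

    def __init__(self):
        # entity id -> attribute -> list of values; there are no fixed columns
        self._statements = {}

    def add(self, entity_id, attribute, value):
        """Record one statement. A new attribute needs no change of structure."""
        attributes = self._statements.setdefault(entity_id, {})
        attributes.setdefault(attribute, []).append(value)

    def values(self, entity_id, attribute):
        """Return every value recorded for an attribute (empty list if none)."""
        return list(self._statements.get(entity_id, {}).get(attribute, []))

    def attributes(self, entity_id):
        """Return the attribute names this particular entity happens to have."""
        return sorted(self._statements.get(entity_id, {}))


# Hypothetical data: two people described by different sets of attributes.
store = EntityStore()
store.add("person:1", "name", "A. Example")
store.add("person:1", "name", "Alice Example")        # a second name
store.add("person:1", "member_of", "unit:department-x")
store.add("person:1", "member_of", "unit:college-y")  # a second affiliation
store.add("person:2", "name", "B. Sample")
store.add("person:2", "homepage", "http://example.org/sample")

print(store.values("person:1", "member_of"))
print(store.attributes("person:2"))
```

In this sketch, points 1 to 3 stop being a design problem: attributes can be added at any time, each person carries only the attributes a source provided, and any attribute can hold several values.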

Cecilia: This sounds obviously relevant to BRII because we are collecting information from all kinds of sources around the University, and most importantly because we do not have control over the content or format of that information.

But beyond these technological advantages, what are the benefits that this way of organising data brings to scholars?

Anusha: The extra benefit to visitors is that we can show them multiple and different relationships between entities very easily (like collaborators, linking funders, research activity, people, departments in whatever way we want).
The entity registry itself will be invisible to them, as they will only see things like the Blue Pages.
The entity registry will also be transparent in the sense of being open for all to access our data. If they are interested in the data, they can build tools to analyse it in whatever way they want (we haven't yet done this, but will be doing so).
With a service like the Blue Pages, a keen observer will notice the entity registry. Other users will not, and rightly so, as they need not know what's happening at the back (for example, we see nothing of how Google does its work). They can however observe that some entities have a lot of information and are linked in multiple ways to other entities, while others have hardly any information. The key thing is that the data can be linked very easily to other entities in multiple ways.

The power of the Blue Pages is mainly derived from three things
  1. Quantity of data - The more we have the better
  2. Variety - A one stop shop, visit one website rather than 10 different websites
  3. The way we present our data and the deductions / analysis we perform on our data (like finding collaborators). If we can think of more deductions like this, it would be useful and make our web service more powerful.
The Blue Pages is just one way of displaying the data. We can do much more with the information, like create a graph of collaborations and areas of research across Oxford University for example.
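
As an illustration of the "finding collaborators" deduction and of the collaboration graph mentioned above, here is a minimal Python sketch. It assumes that each publication or project simply lists its participants; the identifiers and the function name are hypothetical, and the real Blue Pages logic may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: each work (article, project...) lists its participants.
works = {
    "article:1": ["person:a", "person:b"],
    "project:1": ["person:a", "person:b", "person:c"],
    "article:2": ["person:c", "person:d"],
}


def collaboration_graph(works):
    """Build person -> collaborator -> set of works they share."""
    graph = defaultdict(lambda: defaultdict(set))
    for work_id, people in works.items():
        # Every pair of participants in the same work counts as collaborators.
        for first, second in combinations(sorted(set(people)), 2):
            graph[first][second].add(work_id)
            graph[second][first].add(work_id)
    return graph


graph = collaboration_graph(works)
for collaborator, shared in sorted(graph["person:a"].items()):
    print(collaborator, sorted(shared))
```

The more works the registry holds, the denser this graph becomes, which is one way of seeing why quantity and variety of data matter.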

Cecilia: and how do you see all this helping scholarly communication?

Anusha: Scholarly communication is more than just journals. Journals were, and to some extent still are, primary sources of communication, but they aren't the only sources. We now have institutional repositories which are helping with this. Also, "scholarly" in scholarly communication does not refer to the people, but to the type of communication. So it's anyone (not just scholars) communicating on a scholarly topic.

Cecilia: yes and I guess that having all these connections facilitates these communications.

Wednesday 18 November 2009

Blue Pages - User Tests 3

We have just started a third round of user tests of the Oxford Blue Pages. This time the tests focus on research collaborations. Collaborations can happen in projects, when writing (books, articles, etc.) or they can be informal exchanges of knowledge. The Blue Pages will display the collaborators of researchers as far as data is available. Names of collaborators will be extracted from publications, project websites and personal websites. See screenshot below.

Users will be able to change the grouping of data between two views. In the example below collaborations are organised by people. When the user expands one person, the Blue Pages will show the nature of that collaboration, which in the case of the example is one academic article and that person being listed on the researcher's website. When the user clicks on the group collaborations button, the Blue Pages will organise data by sets of collaborations, for example by research projects or academic articles, which when expanded will show their participants. Whenever data is available within the Blue Pages, names of collaborators and research outcomes will link to their corresponding profiles.
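
The two views are two groupings of the same underlying links. The Python sketch below shows one way this could work, assuming the data is held as (collaboration, participant) pairs; the identifiers and function names are invented for illustration and do not describe the actual Blue Pages code.

```python
from collections import defaultdict

# Hypothetical links between collaborations and their participants.
links = [
    ("article:1", "person:a"),
    ("article:1", "person:b"),
    ("project:1", "person:a"),
    ("project:1", "person:b"),
    ("project:1", "person:c"),
]


def members_by_collaboration(links):
    """Collect the set of participants of each collaboration."""
    members = defaultdict(set)
    for collaboration, person in links:
        members[collaboration].add(person)
    return members


def group_by_person(links, researcher):
    """View 1: collaborator -> collaborations shared with the researcher."""
    view = defaultdict(list)
    for collaboration, people in sorted(members_by_collaboration(links).items()):
        if researcher in people:
            for person in sorted(people - {researcher}):
                view[person].append(collaboration)
    return dict(view)


def group_by_collaboration(links, researcher):
    """View 2: collaboration -> the other participants in it."""
    return {
        collaboration: sorted(people - {researcher})
        for collaboration, people in sorted(members_by_collaboration(links).items())
        if researcher in people
    }


by_person = group_by_person(links, "person:a")
by_collaboration = group_by_collaboration(links, "person:a")
print(by_person)
print(by_collaboration)
```

Switching views does not require storing anything twice; the button only changes which key the same links are grouped by.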

Note: data is real but connections between people were made up for this screenshot.

We are also testing how useful connections between data objects are to users. Data objects for us are People (researchers), Research Activities (e.g. projects), Academic Units (e.g. departments) and Funders. The Blue Pages can connect all of these to one another, for example departments with people, people with people, and people with research projects.

The Blue Pages also use research keywords to find and connect objects. For example, users can search for research projects under a subject field, or find projects in similar areas to the one displayed on screen. Although some of the examples mentioned have not been implemented yet, we ask testers how they would like to access and see these data.