Friday, 27 November 2009

BRII's Entity Store

In our internal meetings we use the term Registry or Entity Registry to refer to the Research Information Infrastructure. Wanting to know a bit more about the meaning and technological features of this kind of store, I asked Anusha. She gave me the following lecture:

Cecilia: What is an Entity Registry, and how does it differ from conventional stores?

Anusha: Let's first list our entities, so we are clear about what we mean when talking about an entity: person, organisational unit, publications (journal articles, books, chapters...), funder information and research activity information.
Now, the main difference between an entity store and a conventional store (typically a database) is that in a conventional store the columns correspond to the attributes of each entity, and they need to be defined when the database is created. So how is this a problem? Well,
  1. We need to think of all the attributes relating to the entity (for example, all the attributes that make up a person) at the time of creation, and we cannot change our structure very easily later on (the cost of change rises exponentially with time).
  2. All the people have to follow the same structure, so you cannot account for variations very easily.
  3. You cannot accommodate all of the multiple relationships easily, e.g. a person belongs to multiple departments/colleges, has multiple roles, has multiple titles, has multiple names.
When aggregating data from multiple sources, points 1, 2 and 3 are crucial. Designing a common structure that has all the attributes we are going to come across in the future is going to be nearly impossible.
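
To make the three points above concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each entity is held as a bag of attribute/value statements rather than as a row with fixed columns; the post does not say how the BRII registry is actually implemented, and the class name, method names and identifiers such as "person:1" are all hypothetical.

```python
class EntityStore:
    """Toy entity store: an entity is a set of (attribute, value) statements.

    Illustrative assumption only, not the BRII implementation.
    """

    def __init__(self):
        # entity id -> attribute name -> list of values
        self._data = {}

    def add(self, entity_id, attribute, value):
        # New attributes can appear at any time; no schema is declared up front.
        values = self._data.setdefault(entity_id, {}).setdefault(attribute, [])
        if value not in values:
            values.append(value)

    def get(self, entity_id, attribute):
        # Missing entities or attributes simply yield an empty list.
        return list(self._data.get(entity_id, {}).get(attribute, []))

    def attributes(self, entity_id):
        # Each entity carries only the attributes it actually has.
        return sorted(self._data.get(entity_id, {}))


store = EntityStore()

# Point 3: one person, several names and several affiliations.
store.add("person:1", "name", "A. Example")
store.add("person:1", "name", "Alex Example")
store.add("person:1", "member_of", "dept:physics")
store.add("person:1", "member_of", "college:hypothetical")

# Points 1 and 2: a second person with a different shape, and an
# attribute ("homepage") nobody planned for when person:1 was added.
store.add("person:2", "name", "B. Sample")
store.add("person:2", "homepage", "http://example.org/b-sample")

print(store.get("person:1", "member_of"))
print(store.attributes("person:2"))
```

In a fixed-column table, the second name, the second affiliation and the unplanned "homepage" attribute would each need a schema change or an extra join table; here they are just further statements about the entity.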

Cecilia: This sounds obviously relevant to BRII because we are collecting information from all kinds of sources around the University, and most importantly because we do not have control over the content or format of that information.

But beyond these technological advantages, what are the benefits that this way of organising data brings to scholars?

Anusha: The extra benefit to visitors is that we can show them multiple and different relationships between entities very easily (like collaborators, or links between funders, research activities, people and departments, in whatever way we want).
Otherwise, the entity registry will be transparent to them in the sense of being invisible: they will only see things like the Blue Pages.
The entity registry will also be transparent in the sense of being open: anyone can access our data. If they are interested in the data, they can build tools to analyse it in whatever way they want (we haven't yet done this, but will be doing so).
With a service like the Blue Pages, the entity registry will be noticeable to a keen observer. Other users will not notice it, and rightly so, as they need not know what's happening behind the scenes (for example, we see nothing of how Google does its work). They can, however, observe that some entities have a lot of information and are linked in multiple ways to other entities, while others have hardly any information. The key thing is that the data can be linked very easily to other entities in multiple ways.

The power of the Blue Pages is mainly derived from three things:
  1. Quantity of data - The more we have, the better
  2. Variety - A one-stop shop: visit one website rather than 10 different websites
  3. The way we present our data and the deductions / analysis we perform on our data (like finding collaborators). If we can think of more deductions like this, it would be useful and make our web service more powerful.
The Blue Pages is just one way of displaying the data. We can do much more with the information, such as creating a graph of collaborations and areas of research across Oxford University.
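
As a rough illustration of a deduction like "finding collaborators", the sketch below builds a small collaboration graph in Python. It assumes that two people count as collaborators when they appear on the same publication; that rule, the function name and the sample identifiers are hypothetical, since the post does not describe how the Blue Pages derive collaborators.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_graph(authorship):
    """Map each person to the set of people they share a publication with.

    `authorship` maps a publication id to the ids of the people on it.
    Co-authorship as the definition of collaboration is an assumption.
    """
    graph = defaultdict(set)
    for people in authorship.values():
        # Every pair of distinct people on one publication is linked both ways.
        for a, b in combinations(sorted(set(people)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Made-up sample data, not taken from the registry.
authorship = {
    "pub:1": ["person:1", "person:2"],
    "pub:2": ["person:2", "person:3"],
    "pub:3": ["person:4"],
}

graph = collaboration_graph(authorship)
for person in sorted(graph):
    print(person, "->", sorted(graph[person]))
```

A person with only single-author publications has no edges and so does not appear in the graph at all, which matches the observation that some entities end up richly linked while others have hardly any links.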

Cecilia: And how do you see all this helping scholarly communication?

Anusha: Scholarly communication is more than just journals. Journals were, and to some extent still are, primary sources of communication, but they aren't the only sources. We now have institutional repositories, which are helping with this. Also, "scholarly" in scholarly communication does not refer to the people, but to the type of communication. So it's anyone (not just scholars) communicating on a scholarly topic.

Cecilia: Yes, and I guess that having all these connections facilitates these communications.

