Wednesday, 30 June 2010

BRII Summer Project

This is another update on our summer 2010 activities. At BRII we are working on a reporting system through which users can notify us and the official data sources about errors they find in Research Activity Data. Because we harvest data from other sources, the system lets users flag errors and sends notifications to the appropriate people (the sources and BRII). Each notification will contain enough information for the recipient to decide on a suitable action, which should help us and our sources improve the quality of the data.

Errors can originate in the content of the data themselves or in the aggregation we perform at BRII. Examples include misspellings and wrong information in the source data, information that has been aggregated under one person but belongs to different people with the same name, and information that belongs to one person but appears under two or more people with the same name.
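To make these categories concrete, here is a minimal sketch of how an error report might be represented and routed. All the names (ErrorKind, ErrorReport, recipients) are hypothetical, and the routing rule is an assumption for illustration only, not the design of the BRII system: content errors go to the source and to BRII, aggregation errors go to BRII alone.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    # Misspelling or wrong information in the source data
    SOURCE_CONTENT = "source_content"
    # Records of different people aggregated as one person
    WRONGLY_MERGED = "wrongly_merged"
    # One person's records shown under several people
    WRONGLY_SPLIT = "wrongly_split"


@dataclass
class ErrorReport:
    kind: ErrorKind
    record_id: str    # hypothetical identifier of the flagged record
    source: str       # name of the source the record was harvested from
    description: str  # free text entered by the user


def recipients(report):
    """Return who should be notified (assumed routing rule)."""
    if report.kind is ErrorKind.SOURCE_CONTENT:
        return [report.source, "BRII"]
    # Aggregation errors are introduced at BRII, so only BRII is notified.
    return ["BRII"]
```

Keeping the error kind as an explicit field means the notification itself tells the recipient whether the fix belongs in the source data or in the aggregation.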

To address aggregation errors, Anusha has been working hard to design a system that accurately identifies sets of data belonging to the same person. For example, Prof John Smith in source 1 and J. P. Smith in source 2 may or may not be the same person. To decide, she is using the extra information that comes with the data, such as affiliation. When her algorithm is finished we will be able to merge two or more "people" into one, or divide one "person" into two or more "people", as requested by administrators or by users who identify inaccuracies.
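The John Smith example can be sketched in a few lines. This is not Anusha's algorithm, only an illustration of the general idea under simple assumptions: two records are compared by surname and first initial, and affiliation is used as extra evidence when both records have one. The function names and the record layout (a dict with "name" and "affiliation" keys) are hypothetical.

```python
TITLES = {"prof", "dr", "mr", "mrs", "ms"}


def name_key(name):
    """Return (surname, first initial), ignoring titles and full stops."""
    parts = [p.strip(".").lower() for p in name.replace(".", ". ").split()]
    parts = [p for p in parts if p and p not in TITLES]
    return parts[-1], parts[0][0]


def compare(a, b):
    """Classify two records as 'match', 'no_match' or 'uncertain'."""
    if name_key(a["name"]) != name_key(b["name"]):
        return "no_match"
    aff_a, aff_b = a.get("affiliation"), b.get("affiliation")
    if aff_a and aff_b:
        same = aff_a.strip().lower() == aff_b.strip().lower()
        return "match" if same else "no_match"
    # Names are compatible but there is no affiliation to confirm it.
    return "uncertain"


source_1 = {"name": "Prof John Smith", "affiliation": "Computing Laboratory"}
source_2 = {"name": "J. P. Smith", "affiliation": "Computing Laboratory"}
result = compare(source_1, source_2)
```

An "uncertain" result is the kind of case where a report from a user or a decision by an administrator would be needed before merging or dividing "people".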

For the summer task we are collaborating with the Computing Laboratory (Comlab). Anusha is currently harvesting their data and Monica is working on the reporting/notification forms within the Blue Pages. We will soon contact Comlab again to check their harvested data and take part in tests. We would like to thank Thorsten Hauler, research facilitator, and Edward Crichton, web manager, from Comlab, who have kindly given us their time.

This summer project is part of the data quality control that we are trying to establish within the registry. I have talked about this in a previous post.

