Friday, 27 March 2009

Some thoughts about Data

One of the interviews I conducted this week left me thinking about the transformation of data and its different meanings. I was talking to a lady about her work with research activity data. She explained to me how she collects information from various sources, enters it in a spreadsheet and arranges it according to her needs. This arrangement involves organizing the data in columns, correcting errors, filtering records and adding new ones from other sources. All this transformed data will then be entered in a research portal.

I asked my interviewee whether she could contribute some of the data she has so far. She said I could get all that information from the same sources she used, and that all of those sources were public. And that made me think… On the one hand, I thought she was right. All the information she had came from other sources which we can all access. She had not created new data but just worked on existing data. On the other hand, the sets of data she had been working on represented new pieces of information. The work she carried out on that data transformed it into new data. Data + Work = NewData. So I guess new arrangements of data give new meanings to that information.

A simple example: you can access an international online database and download a list of publications on economics. You enter that list in a spreadsheet and filter the publications belonging to a particular Oxford University author. Then you attach that list to the author’s profile and you get his bibliography, all his publications, including those from before he joined Oxford. You can do that with all economists in Oxford and you will get a number of bibliographies from Oxford economists. Together, these bibliographies correspond to the output of Oxford economists across their careers.

You can also use the downloaded list to filter all the publications whose authors’ affiliation is Oxford University, whether they are current staff or staff who have already left, as long as they produced the publications while they were working in Oxford. This second list is the output of Oxford economists in Oxford.

Both lists come from the same source, and possibly they contain the same set of fields; however, they represent different things.
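
The two lists can be sketched in a few lines of Python. This is only a rough illustration: the records, the field names (author, affiliation, year) and the author names are all invented for the example and do not come from any real database.

```python
# Hypothetical records, as they might look after downloading a publication
# list into a spreadsheet. All names and values are made up for illustration.
publications = [
    {"title": "Paper A", "author": "J. Smith", "affiliation": "Oxford University", "year": 2007},
    {"title": "Paper B", "author": "J. Smith", "affiliation": "Another University", "year": 2001},
    {"title": "Paper C", "author": "M. Jones", "affiliation": "Oxford University", "year": 2004},
    {"title": "Paper D", "author": "L. Brown", "affiliation": "Another University", "year": 2006},
]

# Authors we treat as Oxford economists (an assumption for this sketch).
oxford_authors = {"J. Smith", "M. Jones"}


def career_output(records, authors):
    """Everything written by the given authors, wherever they worked at the time."""
    return [r for r in records if r["author"] in authors]


def output_at_oxford(records):
    """Only the publications produced under an Oxford affiliation."""
    return [r for r in records if r["affiliation"] == "Oxford University"]


career_titles = [r["title"] for r in career_output(publications, oxford_authors)]
oxford_titles = [r["title"] for r in output_at_oxford(publications)]

print(career_titles)  # ['Paper A', 'Paper B', 'Paper C']
print(oxford_titles)  # ['Paper A', 'Paper C']
```

Same source and same fields, but the two filters answer different questions: the first describes careers, the second describes what was produced in Oxford.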

When I did my PhD I came across a book called “Information, systems and information systems: making sense of the field” by Peter Checkland and Sue Holwell (1998), a must read if you are in the Information Systems field! This book is about information systems, their creation and their relation to IT. In chapter four the authors discuss the concepts of Data, Information and Knowledge, and they introduce the concept of Capta. These concepts may help you to understand these processes of transformation of data and how data can acquire different meanings. For Checkland and Holwell (1998), data represent the masses of facts, observations and concepts that exist in the Universe. Once data are captured as part of an information system, a conversation or any kind of interaction, they become Capta. Capta are therefore a subset of data which have been selected through a purposeful process, i.e., according to a criterion which fits a particular purpose. Capta are transformed into Information when they are given meaning and context by their interpreters. Because it depends on interpretation, a subjective process, information can have different meanings to different people. Finally, large structures of information form Knowledge.
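
As a toy model of these terms, here is a short Python sketch. It is my own simplification, not something taken from the book: the observations, the selection criterion and the two interpreting contexts are all hypothetical.

```python
# A toy model of the data -> capta -> information chain. The observations,
# the selection criterion and the two contexts are all hypothetical.
data = [
    {"kind": "publication", "who": "Researcher X", "what": "Journal article"},
    {"kind": "weather", "who": None, "what": "Rain in Oxford"},
    {"kind": "grant", "who": "Researcher X", "what": "Three-year award"},
]


def select_capta(observations, criterion):
    """Capta: the subset of data selected for a particular purpose."""
    return [o for o in observations if criterion(o)]


def is_research_activity(observation):
    # Purposeful criterion (an assumption): keep only research activity.
    return observation["kind"] in {"publication", "grant"}


def interpret(capta, context):
    """Information: capta given meaning by an interpreter in a context."""
    return [f"{context}: {c['who']} - {c['what']}" for c in capta]


capta = select_capta(data, is_research_activity)

# The same capta read by two different interpreters.
for_funder = interpret(capta, "Evidence of funded activity")
for_profile = interpret(capta, "Entry on a staff profile page")

print(len(capta))
print(for_funder[0])
print(for_profile[0])
```

The selection step is the same for both readers; only the interpretation step differs, which is where the different meanings come from.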

Now, how can I explain this in the context of the BRII project?
Well, all these processes of transformation of research activity data into capta and into information happen all over the University. People acquire research activity data from different internal or external sources and transform them according to their needs. New people may use these transformed data and give them new meanings, again according to their new contexts. This seems like a mess, but
  • BRII will sit in the middle, facilitating these processes.
  • BRII will extract capta from data and store it in the Research Information Infrastructure (RII).
Using Checkland and Holwell’s concepts, we can define Research Activity data as capta: data selected from vast sources which represent and describe only research activities. BRII will ignore data which do not fit this criterion. So the RII will be a container of capta in the sense that it will only host research activity data.
  • If we look at this from a different angle, from within the universe of the RII, and call its content data again, I can say that BRII will provide the means to reuse that data and transform it into capta: capta for every system or individual that accesses the RII looking for information. (Different purposes and different contexts.)
As explained in the example above, most sets of existing data have been filtered and worked on according to criteria which depend on the contexts and purposes of their owners. Another example: the list of researchers on a departmental website is a subset of the researchers of the University. This subset was selected by checking the affiliation of each researcher to the department, or his/her work within a research group or project in the department. The same researcher may appear on another website because he/she is involved in other research activities. However, this researcher does not appear on a given thematic website because his/her interests are different. Ideally the RII will hold a list of all researchers in Oxford, and by accessing the RII people will be able to extract these subsets of data. This data will acquire different meanings depending on the criteria used for its extraction and on the place and way it is shown.
  • The RII will be a tool to give meaning to huge, disparate, disconnected, complex sets of Research Activity data.
The RII will be a big Information System supporting and allowing other systems to exist on top of it. The RII itself and the web services built on top of it will allow the transformation of data into capta and information by its users.

What am I in this context?
I am the person who is looking for these sources of data and capta, trying to understand what they mean to their users and what new meanings can be obtained from them in the future.

Some issues
How do we know the purpose of capta which have been adapted by some people from other sources of capta, also within the University? Should the RII store only sets of raw data and allow its users to transform them? Should we use all these sources, data and capta? How?
Again, the answer lies in the semantic web. Semantic web technologies allow data to be labelled with labels which are meaningful to people and to computers, that is, they allow data to be tagged with meaning. From the context of the RII, these labels convert data into capta. From the context of the services accessing the RII, tags are the means through which users can convert data into capta for their own needs.
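
To give a rough idea of what tagging data with meaning looks like, here is a small Python sketch that stores labelled statements as (subject, predicate, object) triples, the basic shape used by semantic web data models. The identifiers and the predicate names (hasName, memberOf, hasAuthor, hasTitle) are invented for illustration; a real system would use an agreed vocabulary and a proper triple store, and nothing here describes how the RII is actually built.

```python
# Hypothetical triples (subject, predicate, object). The identifiers and the
# predicate names are invented; a real system would use an agreed vocabulary.
triples = [
    ("person:1", "hasName", "Researcher X"),
    ("person:1", "memberOf", "dept:economics"),
    ("person:2", "hasName", "Researcher Y"),
    ("person:2", "memberOf", "dept:history"),
    ("pub:10", "hasAuthor", "person:1"),
    ("pub:10", "hasTitle", "Paper A"),
]


def subjects_with(store, predicate, obj):
    """Return every subject labelled with the given predicate and object."""
    return [s for (s, p, o) in store if p == predicate and o == obj]


def value_of(store, subject, predicate):
    """Return the first object recorded for a subject and predicate, or None."""
    for s, p, o in store:
        if s == subject and p == predicate:
            return o
    return None


# One service asks: who belongs to the economics department?
economists = subjects_with(triples, "memberOf", "dept:economics")
names = [value_of(triples, person, "hasName") for person in economists]

# Another service asks: what has person:1 written?
pubs = subjects_with(triples, "hasAuthor", "person:1")
titles = [value_of(triples, pub, "hasTitle") for pub in pubs]

print(names)   # ['Researcher X']
print(titles)  # ['Paper A']
```

Because every statement carries its own label, each service can select the subset it needs without knowing how the others use the same store.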
