Friday, 27 March 2009

Some thoughts about Data

One of the interviews I conducted this week left me thinking about the transformation of data and its different meanings. I was talking to a lady about her work with research activity data. She explained to me how she collects information from various sources, enters it in a spreadsheet and adapts it to her needs. This adaptation involves organising data in columns, correcting errors, filtering records and adding new ones from other sources. All this transformed data will then be entered into a research portal.

I asked my interviewee whether she could contribute some of the data she has gathered so far. She said I could get all that information from the same sources she used, as all of them were public. That made me think. On the one hand, I thought she was right. All the information she had came from other sources which we can all access. She had not created new data but had just worked on existing data. On the other hand, however, the sets of data she had been working on represented new pieces of information. The work she carried out on that data transformed it into new data. Data+Work=NewData. So I guess new arrangements of data give new meanings to that information.

A simple example: you can access an international online database and download a list of publications on economics. You enter that list in a spreadsheet and filter the publications belonging to a particular Oxford University author. Then you attach that list to the author’s profile and you get his bibliography: all his publications, including those from before he joined Oxford. You can do that for all the economists in Oxford and you will get a set of bibliographies of Oxford economists. This list will correspond to the output of Oxford economists across their careers.

You can also use the downloaded list to select all the publications whose authors were affiliated with Oxford University when they wrote them: current staff, or staff who have since left but who produced the publications while they were working in Oxford. This second list is the output of Oxford economists in Oxford.

Both lists come from the same source and may well contain the same set of fields; however, they represent different things.
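To make the difference between the two lists concrete, here is a minimal Python sketch. The records, the author names and the functions `career_output` and `output_in_oxford` are all hypothetical, and the sketch assumes each downloaded record carries the affiliation the author had when the publication appeared.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Publication:
    title: str
    author: str
    affiliation: str  # affiliation recorded on the publication itself (assumed field)

# Hypothetical records standing in for a download from an online database.
publications = [
    Publication("Paper 1", "A. Smith", "LSE"),                   # Smith before joining Oxford
    Publication("Paper 2", "A. Smith", "University of Oxford"),
    Publication("Paper 3", "B. Jones", "University of Oxford"),  # Jones has since left Oxford
    Publication("Paper 4", "C. Brown", "Harvard University"),
]
current_oxford_economists = {"A. Smith"}

def career_output(pubs, authors):
    """First list: everything by current Oxford economists, wherever they wrote it."""
    return [p.title for p in pubs if p.author in authors]

def output_in_oxford(pubs):
    """Second list: everything written while the author was affiliated with Oxford."""
    return [p.title for p in pubs if p.affiliation == "University of Oxford"]

print(career_output(publications, current_oxford_economists))  # ['Paper 1', 'Paper 2']
print(output_in_oxford(publications))                           # ['Paper 2', 'Paper 3']
```

Same source, same fields, two different selection criteria, and therefore two lists that mean different things.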

When I did my PhD I came across a book called “Information, systems and information systems: making sense of the field” by Peter Checkland and Sue Holwell (1998), a must-read if you are in the Information Systems field! The book is about information systems, their creation and their relation to IT. In chapter four the authors discuss the concepts of Data, Information and Knowledge, and they introduce the concept of Capta. These concepts may help you understand these processes of data transformation and how data can acquire different meanings. For Checkland and Holwell (1998), data are the masses of facts, observations and concepts that exist in the Universe. Once data are captured as part of an information system, a conversation or any other kind of interaction, they become Capta. Capta are therefore a subset of data selected through a purposeful process, i.e., according to a criterion which fits a particular purpose. Capta become Information when their interpreters give them meaning and context. Because information depends on interpretation, a subjective process, it can have different meanings for different people. Finally, large structures of information form Knowledge.

Now, how can I explain this in the context of the BRII project?
Well, these processes of transforming research activity data into capta and into information happen all over the University. People acquire research activity data from different internal or external sources and transform them according to their needs. Other people may then use these transformed data and give them new meanings, again according to their own contexts. This seems like a mess, but
  • BRII will sit in the middle, facilitating these processes.
  • BRII will extract capta from data and store it in the Research Information Infrastructure (RII).
Using Checkland and Holwell’s concepts, we can define Research Activity data as capta: data selected from vast sources which represent and describe only research activities. BRII will ignore data which do not fit this criterion. So the RII will be a container of capta in the sense that it will only host research activity data.
  • If we look at this from a different angle, from within the universe of the RII, and call its content data again, then BRII will provide the means to reuse that data and transform it into capta: capta for every system or individual who accesses the RII looking for information (different purposes and different contexts).
As explained in the example above, most existing data sets have been filtered and worked on according to criteria which depend on the contexts and purposes of their owners. Another example: the list of researchers on a departmental website is a subset of the researchers of the University. This subset was selected by checking each researcher’s affiliation to the department, or his/her work in a research group or project within the department. The same researcher may appear on another website because he/she is involved in other research activities. However, this researcher may not appear on a thematic website because his/her interests are different. Ideally the RII will hold a list of all researchers in Oxford, and by accessing the RII people will be able to extract these subsets of data. This data will acquire different meanings depending on the criteria used to extract it and on where and how it is shown.
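The idea of one list of researchers from which each website extracts its own subset can be sketched in a few lines of Python. This is only an illustration under assumptions: the researcher records are hypothetical, the RII is reduced to an in-memory list, and `extract` is an invented helper, not part of any BRII design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Researcher:
    name: str
    department: str
    themes: frozenset = frozenset()  # cross-departmental themes (hypothetical field)

# Hypothetical RII content: one record per researcher, held in one place.
rii = [
    Researcher("Researcher 1", "Physiology, Anatomy and Genetics", frozenset({"Cardiovascular Science"})),
    Researcher("Researcher 2", "Physiology, Anatomy and Genetics"),
    Researcher("Researcher 3", "Cardiovascular Medicine", frozenset({"Cardiovascular Science"})),
]

def extract(records, criterion):
    """Return the names of the researchers that satisfy the given criterion."""
    return [r.name for r in records if criterion(r)]

# Each website applies its own criterion to the same source.
departmental_page = extract(rii, lambda r: r.department == "Physiology, Anatomy and Genetics")
theme_page = extract(rii, lambda r: "Cardiovascular Science" in r.themes)

print(departmental_page)  # ['Researcher 1', 'Researcher 2']
print(theme_page)         # ['Researcher 1', 'Researcher 3']
```

Researcher 1 appears on both pages from a single record, while Researcher 2 is absent from the theme page, just as in the example above.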
  • The RII will be a tool to give meaning to a huge, disparate, disconnected and complex set of Research Activity data.
The RII will be a big Information System supporting other systems and allowing them to exist on top of it. The RII itself and the web services built on top of it will allow its users to transform data into capta and information.

What am I in this context?
I am the person who looks for these sources of data and capta and tries to understand what they mean to their users and what new meanings can be obtained from them in the future.

Some issues
How do we know the purpose of capta that some people have adapted from other sources of capta, also within the University? Should the RII store only sets of raw data and allow its users to transform them? Should we use all these sources, data and capta? How?
Again, the answer lies in the Semantic Web. Semantic Web technologies allow data to be labelled with labels which are meaningful to people and to computers, that is, tagging data with meaning. From the point of view of the RII, these labels convert data into capta. From the point of view of the services accessing the RII, tags are the means through which users can convert data into capta for their own needs.
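Here is a rough sketch of what "tagging data with meaning" could look like, in the subject–predicate–object (triple) style that RDF uses. It is an illustration only: a real RII would presumably use an RDF store queried with SPARQL, whereas here plain Python sets stand in for that. The `ex:` vocabulary, the records and the `select` helper are all invented; only `rdf:type` is the standard RDF typing predicate.

```python
# Hypothetical labelled data as (subject, predicate, object) triples.
# The "ex:" vocabulary is invented for illustration.
triples = {
    ("ex:pub1", "rdf:type", "ex:Publication"),
    ("ex:pub1", "ex:subject", "ex:Economics"),
    ("ex:pub1", "ex:writtenAt", "ex:UniversityOfOxford"),
    ("ex:pub2", "rdf:type", "ex:Publication"),
    ("ex:pub2", "ex:subject", "ex:Economics"),
    ("ex:pub2", "ex:writtenAt", "ex:LSE"),
    ("ex:grant1", "rdf:type", "ex:Grant"),
    ("ex:grant1", "ex:subject", "ex:Economics"),
}

def subjects_with(data, predicate, obj):
    """All subjects that carry the label predicate=obj."""
    return {s for s, p, o in data if p == predicate and o == obj}

def select(data, *labels):
    """Capta for one purpose: the subjects that carry every requested label."""
    result = None
    for predicate, obj in labels:
        matches = subjects_with(data, predicate, obj)
        result = matches if result is None else result & matches
    return result or set()

# Two users, two purposes, the same labelled data.
print(sorted(select(triples, ("rdf:type", "ex:Publication"),
                    ("ex:subject", "ex:Economics"),
                    ("ex:writtenAt", "ex:UniversityOfOxford"))))  # ['ex:pub1']
print(sorted(select(triples, ("ex:subject", "ex:Economics"))))   # ['ex:grant1', 'ex:pub1', 'ex:pub2']
```

The labels do the selecting: whoever queries decides which combination of labels defines the capta they need.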

Thursday, 19 March 2009

Research Websites in the MSD

Yesterday I attended an interesting presentation at the Oxford Libraries Staff Conference 2009, titled “Research Websites in the MSD (or letting cats herd themselves)”, presented by Anne Bowtell, the Medical Sciences Divisional Web Manager. It focused on the complex and confusing structures of that division and the difficulties of designing numerous websites while keeping all their content accurate, up to date and organised.

Anne is a member of the BRII team, and the work she has been carrying out at MedSci overlaps a lot with BRII. Anne receives requests from all areas within MedSci to create websites that help them raise their research profiles. Publicly available information on websites increases their visibility within and outside Oxford. So here we have one area of overlap with BRII: information about research which MedSci wants to make visible. Websites are like windows through which everyone can see the research done in MedSci, particularly the subjects being researched and the people involved in each subject. Having their information grouped by subject makes their work with funding agencies easier, for example, because funding agencies usually have pockets of money earmarked for particular subject areas. Even if there is no actual research group or institute devoted to an area, only individual efforts or tangential interests, websites can be created to group those people and efforts, even if they belong to different parts of the University. So we can create a website to give a face to something that is not a physical entity. An example of this is the Themes, such as the Cardiovascular Science theme.

Anne explained that one problem she had in creating these websites was data duplication and accuracy. For example, one researcher within MedSci may appear on his/her departmental website, research group website, theme website and so on. This researcher’s information may look different in each source; it may be outdated or wrong. So here we have the issue of keeping control of data which may be visible from different windows. Common sense would tell us that we should have only one reliable source for that data and that each website should access that single source. In theory that sounds sensible; in practice, however, it can be very difficult to achieve. Hmmm, and actually this may not make sense in a research environment. Research environments are fast-moving, free environments. Every day researchers come up with new ideas, and the ways of expressing those ideas change from person to person and from subject area to subject area. It would make more sense for researchers to have the freedom to update their own information the way they believe is best. But that leaves us in a vicious circle: if we let researchers create information independently, we can end up with a huge mess. On the other hand, if we try to control or centralise what they produce, we could be hindering the researchers’ work. The truth is, you can't herd cats! So how can we break this vicious circle? Anne thinks the answer lies in the Semantic Web. And of course BRII shares the same belief.

Tuesday, 17 March 2009

Interviews v Online Surveys

Interviews and online surveys are both research data collection methods, and both are useful in different ways. For example, interviews allow the researcher to meet the people s/he wants to talk to; they allow for more rapport building and perhaps more honest and thoughtful responses. During interviews the interviewer can rephrase questions, clarify points and ask new questions depending on the respondent’s answers, and can also gain insights from the respondent’s reactions, gestures, tone of voice, etc.

On the other hand, online surveys, or surveys in general, are (or should be) quicker to fill in, and have (or should have) clearer and more concise questions. Questions can offer a list of options to choose from, so as to keep the range of possible responses short and controlled. Online surveys can be completed wherever and whenever is most convenient for the respondent... so there is no need to arrange meetings, etc.

In general, interviews are considered a better means of collecting data, especially qualitative data. However, surveys can also be designed to collect qualitative data. Interviews and surveys can work together in many ways. For example, surveys can be used before a series of interviews to explore a little-known area of research. The surveys would raise some general topics of discussion which could then be investigated in detail through the interviews. This, of course, could take a long time.

For BRII's stakeholder analysis we need qualitative data: data that allow us to understand how Research Management Data are created, used, updated, exchanged, shared, etc. These data should also give us insights into what the people creating and using Research Management Data DO, what they cannot do but would like to do, what they think of the quality of the data they have, and what they think of their current processes and how these could be improved. We want all this and more. In my opinion the best way to get all this information would be through interviews: talking to stakeholders face-to-face in their work environments, and maybe asking them to show me what they have and what they do in situ. However, there are some constraints which will not allow me to do that in all cases.

As BRII has some time constraints, I thought about ways to use that little time more productively. Not only is BRII a short project (for such an ambitious aim!), but our stakeholders face even tighter time constraints. People who work with Research Management data are usually busy people. They range from Project Managers to Principal Investigators, and from Research Facilitators to Departmental Administrators. BRII aims to help them in their work, but it needs some help from them as well. I am aware that some, or perhaps most, of our stakeholders may not have time for interviews.

So I have thought about running the interviews and online surveys in parallel, both serving the same purpose, i.e., one will not be used to get information for the other; both will be designed to get the same kind of information. Interviews will be used when stakeholders have the time to meet with me (30 mins to 1 hr), and online surveys will be sent out by email when a stakeholder is busy.

Clearly, data from interviews would be richer, but as things are in Oxford, surveys could yield a larger quantity of data and perhaps many more perspectives from people who work in different areas of the University.

Friday, 6 March 2009

Stakeholder Analysis

Another week into the project....

And now I have set foot in the analysis process, writing project briefs and sending emails like crazy... hmmm, well, not like crazy, I am still in first gear but I am gaining speed!

I have explained in previous posts that Oxford University is so big and so complex that I needed a few weeks to absorb and understand all these names, acronyms, hierarchies, structures, organisational culture, reasons for doing things the way they are done, traditions, history, etc., etc. I am still digesting. Just yesterday I attended a consultation meeting where we discussed a draft of a strategic plan for OULS. As I am new to Oxford University I asked many questions, and was surprised by the answers. They revealed an institution which takes pride in its traditions and achievements. OULS, like (I think) the rest of the University, wants to develop new state-of-the-art systems by building on the strengths and opportunities inscribed in its traditions. BRII is a good example of this, as it has been designed so that groups around the University can continue to work as they are used to and will not have to transfer to a new technical system to take advantage of the project work. That is, BRII will cause little disruption to its stakeholders.

Anyhow, as BRII has to collect sample data sets from its stakeholders, I had to think about which departments or research areas would be suitable places to start. I thought one strategy could be to shoot everything that moves and ask everyone on my mailing list! But then I would probably not get good feedback or good quality data. So I decided to choose according to the type of area (depending on different characteristics) and, of course, on the availability and willingness of stakeholders to contribute. As we need variety to demonstrate that BRII can cope with Oxford University's heterogeneity, I came up with the following:
  • Choose one department from each of the following divisions: Humanities, Mathematical, Physical & Life Sciences and Social Sciences. People belonging to those divisions are from different species. They will provide us with different kinds of data in terms of the kinds of research they do and also different perspectives on the needs and new uses that Research Management data can fulfil.
I have set my eyes on the Phonetics Laboratory in Humanities, ComLab in MPLS and the OII in Social Sciences. I have contacted people in the first two and I am still waiting for a response from someone in the third.
  • As Medical Sciences is part of, and the main stakeholder in, BRII, I should get more departments from that division involved. I am looking for variety here as well. In a conversation with Simon Neil, Medical Sciences Administrative Officer, he suggested I should look for clinical and non-clinical departments (I don’t understand the difference between them, but I guess they do research in different ways). He also suggested looking for departments which use the standard divisional CMS (content management system) to develop their websites, and departments which use different approaches. Finally, he suggested one big department and one small one. With all that I should get sample data that represent the whole division.
Here I have set my eyes on the following departments or units:
  1. Nuffield Department of Anaesthetics – smallest department in the division, no CMS
  2. Nuffield Department of Clinical Medicine – clinical department, uses a different CMS, biggest department in the division
  3. Department of Physiology, Anatomy and Genetics – non-clinical department, uses standard CMS
  4. Department of Cardiovascular Medicine – clinical department, uses standard CMS
  5. Health Economics Research Centre – They are users of ORA
  6. Babylab – they have connections with Phonetics Laboratory
  • Get data from one University College
  • Get data from one central administrative department, such as Research Services.
Now, I am sending emails to people working in those areas, introducing myself and the BRII project. I am first asking for a quick face-to-face or telephone conversation to inform them about the project and the possible ways that each area could get involved. I am talking about interviews and online surveys but I will explain that in another post. I have followed Simon Neil’s advice and contacted departmental administrators and research facilitators. They are people who have an overall view of their areas; they know people and may have access to some data.