6.4. Data processing and documentation

Data processing is a key step in the data lifecycle, and one that researchers must undertake to make data useful for analysis (Paine et al. 2015). Many scholars in information science and other fields point out that knowledge and understanding of the context of data creation are necessary to analyse, share, and reuse data (e.g. Faniel and Jacobsen 2010). Initially, research data are often referred to as 'raw', meaning that they have yet to undergo processing following their creation. However, Gitelman's (2013) impressive edited volume 'Raw data is an oxymoron' emphasised that data are never raw and always already embody decisions. Embracing the FAIR principles helps to ensure that data processing decisions remain explicit and are documented.

There is a bewildering diversity of processes and practices that fall under 'data processing'. Among other things, 'processing' can mean entering data into lists, transcribing recorded conversations, checking data, validating data, cleaning data, anonymising data, describing data using metadata, choosing appropriate data formats, and choosing appropriate repositories. Research fields (sometimes) differ markedly in all these parameters, for example in the extent to which data need to be cleaned before further analysis can happen (Paine et al. 2015), in the extent to which data from different sources need to be integrated into new data products to answer research questions, and in finding common data formats. On the one hand, appropriate standards need to be followed in order to make research data as FAIR as possible; on the other hand, the variability of disciplinary or domain-specific research processes is considerable. Meeting these disciplinary or domain-specific standards may therefore require specific sets of knowledge and skills from researchers and/or research support staff.

Support services at the institutional level can usually only provide general guidelines. The minutiae of discipline- and method-specific practices need to be provided and supported at the departmental and research group level. In order to make data reusable and interoperable, there should be clear expectations and support at each level to help researchers to:

think about how the data generated might be used by others, and under what conditions;
think about the information others will need to be able to reuse data and translate that information into documentation complying with appropriate metadata standards (if applicable);
make sure they appropriately document every step, ranging from raw to processed data and on to research-ready data;
save and back up documentation alongside important iterations of both primary and processed data; and
think about the appropriateness of file formats in use and for sharing/publication.
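
As one possible illustration of documenting each step from raw to processed data and keeping that documentation alongside the data, the following is a minimal Python sketch. It is not a prescribed method: the function names (`sha256_of`, `record_step`), the JSON log layout, and the field names are all hypothetical choices made for this example, and a discipline-specific metadata standard should take precedence where one exists.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file so each iteration can be identified."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_step(log_path: Path, description: str, source: Path, result: Path) -> dict:
    """Append one processing step to a JSON log kept alongside the data.

    The log layout is an assumption made for this sketch, not a standard.
    """
    # Load the existing log if there is one, otherwise start a new list of steps.
    if log_path.exists():
        steps = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        steps = []

    entry = {
        "step": len(steps) + 1,
        "description": description,
        "source_file": source.name,
        "source_sha256": sha256_of(source),
        "result_file": result.name,
        "result_sha256": sha256_of(result),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    steps.append(entry)

    # Write the whole log back so it can be saved and backed up with the data.
    log_path.write_text(json.dumps(steps, indent=2), encoding="utf-8")
    return entry
```

A call such as `record_step(Path("processing_log.json"), "Removed duplicate rows", Path("raw.csv"), Path("cleaned.csv"))` (file names are placeholders) would add one entry describing what was done, to which file, and which file resulted. Storing checksums is one way to tie each description to a specific iteration of the primary or processed data.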

Learn more:

Lesson Plan 3: Documentation
Lesson Plan 4: Data creation
Lesson Plan 6: Metadata
Lesson Plan 7: Data standardisation and ontologies

