Socio-Cultural characteristics of usability of bioinformatics databases and tools

Abstract

With the increasing importance of the usability of bioinformatics systems and databases, this paper examines the socio-cultural characteristics that may affect the usability of such tools. We understand socio-cultural characteristics to be the norms, values, and beliefs that mediate the interactions between the structures and institutions of science (i.e. disciplines, universities, funding organizations), and its practitioners. These factors are not necessarily distinct from the technical features of a database, but do nevertheless affect the context in which one chooses to use a particular set of tools. We have developed three socio-cultural characteristics of bioinformatics database usability: accessibility, utility, and portability. By ‘accessibility’, we mean the social and cultural attributes that make resources open and available for use, such as intellectual property arrangements or institutional reputation and prestige. ‘Utility’ in this context means the perceived usefulness of a database, which can be determined by non-technical matters such as trust and taste. ‘Portability’ refers to the social aspects of criteria such as maintenance funding, and input and storing standards that allow a database to move through space and time. In this article, we call for a social science research programme on these, and other, socio-cultural characteristics of usability. We invite researchers in human–computer interaction, bioinformatics, usability engineering and other areas to extend their work to examine the social contexts in which these systems are used, and the socio-cultural factors that mediate their use. Such a research programme would increase the multidisciplinary nature of these emergent fields, and help address the complexities of work in the post-genomic era.

Keywords

Usability; Social sciences; Multidisciplinary research

Introduction

The availability and importance of bioinformatics databases and software tools have been increasing at a rapid rate, reflecting accelerating demand. Given the expanding number of bioinformatics resources now available for the analysis of genomic data, this paper examines the socio-cultural characteristics that may affect the usability of such tools for researchers lacking computational backgrounds.

The need to integrate social sciences into these kinds of scientific and technological developments was identified in a Special Issue of Briefings in Bioinformatics on building successful biological databases. The guest editor Russ Altman (2004, 5) summarized:

… we cannot help but observe that every one of these reports stresses the non-technical aspects of creating and maintaining a successful database. To be sure, there are important challenges and lessons on the technical software side and the biological domain side, but these challenges seem to pale in comparison with the formidable sociological challenges associated with convincing scientists to share their data at a level of detail rarely matched in the published literature (emphasis added)

While this paper does not provide instructions on how to convince scientists to share their data, we agree with Altman and other authors in the Special Issue that significant non-technical, sociological factors are involved in the construction of databases. Here, we extend this position to include the importance of socio-cultural factors in the use of databases. We understand socio-cultural characteristics to be the norms, values, and beliefs that mediate the interactions between the structures and institutions of science (i.e. disciplines, universities, and funding organizations), and the practitioners of science. These factors are not necessarily distinct from the technical features of a database, but they nevertheless affect the context in which a particular set of tools is chosen and used. These characteristics entail factors such as the prestige and reputation of the database's host institution; openness, availability and ownership of the resources; as well as the degree to which a network of users exists for the tools in question. Our dual purpose in describing and understanding these socio-cultural features is to extend the potential of bioinformatics resources for user groups that lack computational backgrounds, and to expand the scope of usability studies.

The notion of usability has already begun to figure in the process of evaluating the merits of bioinformatics systems. For example, recent work by Bolchini and colleagues (2009a, 407) has addressed user testing of such systems in order to understand technical ‘design characteristics of web bioinformatics resources’ that ultimately affect usability. Our paper builds on this work by focusing on the socio-cultural elements that shape the usability of platforms and tools. We argue that a research programme that examines usability from a socio-cultural perspective may offer practical guidance to the users and developers of bioinformatics systems.¹

We have developed three key socio-cultural characteristics of bioinformatics database usability: accessibility, utility, and portability. By ‘accessibility’, we mean the social and cultural attributes that make resources open and available for use, such as IP arrangements or institutional reputation and prestige. ‘Utility’ in this context means the perceived usefulness of the database, which can be determined by non-technical matters such as trust and taste. ‘Portability’ refers to the social aspects of factors such as maintenance funding and input and storing standards that allow a database to persist through space and time. Each of these characteristics of usability is composed of, and defined by, a number of dimensions that we explore in the core of the paper.

Previous work on usability in bioinformatics systems

The concept of ‘usability’ is beginning to be adopted by those involved in the design and evaluation of bioinformatics databases and tools.² The roots of usability research can be traced back at least two decades. Rebhan and colleagues (1998) conducted usability analysis on GeneCards over a decade ago,³ and Rainer Fuchs’ editorial questioned whether ‘point-and-click user interfaces and sophisticated graphics output are enough to ensure that the capabilities of such programs are exploited in the most productive – and scientifically sound – way’ (2000, 491). In computer science, and specifically software design, the importance of usability has been established since the mid-1980s, when Gould and Lewis (1985) set out basic principles to guide the design of more user-friendly computer systems. At the end of the decade, Fred Davis published highly influential and verifiable ‘new scales for two specific variables, perceived usefulness and perceived ease of use, which are hypothesized to be fundamental determinants of user acceptance’ (1989, 319). This research helped to clear the way for more work in this area and, in the early 1990s, Jakob Nielsen published Usability engineering (1993), which established early definitions of usability, as well as parameters and methods for usability testing. More recently, human–computer interaction research has allowed the concept of usability to influence software development (Preece et al. 2002) and teaching (Rosson and Carroll 2002).

In work directed at designing ‘better bioinformatics systems through usability analysis’, Bolchini and colleagues draw on the international standard ISO 9241-11 for a definition of usability as ‘the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use’ (2009a, 406). While we do not take issue with this definition, it leaves open the question: what makes a bioinformatics system effective, efficient and satisfactory? Our position is that criteria of accessibility, utility, and portability are significant influences on usability and should be included as key characteristics. Further, all characteristics have socio-cultural determinants: individual users do not exist within a social and institutional vacuum. Rather, they are located within academic and industrial laboratories, user communities, and research networks and consortia, all of which are governed by norms, values, and beliefs that shape approaches to database usability.

The purpose of invoking socio-cultural characteristics and involving social scientists in discussions of usability is not to design a better evaluation tool, or to conduct evaluation studies of bioinformatics systems. Such programmes exist already and our expertise lies elsewhere. Further, while we acknowledge existing initiatives that deploy social science research methods, such as ethnographic fieldwork, to examine how bioinformaticians use systems (Javahery et al. 2004), we believe usability research should include a more comprehensive social science research programme. The absence of social science approaches to usability represents a gap in the literature. More importantly, social science approaches represent an exciting new avenue in usability research that will complement existing understandings.

Characterizing databases and their users: A multidisciplinary approach

There are a great number of publicly available bioinformatics databases and tools that range from simple nucleotide sequence repositories to more specialized resources. While it is impossible to categorize all databases, some can be grouped into several generalized types:

sequence repositories that contain raw sequence data

integrated sequence databases that contain annotated sequence data

gene expression repositories for miRNA, RNA, protein, etc.

integrated model organism databases that incorporate multiple types of data for a single organism

integrated specialized databases that contain specialized data for specific biological topics.

The major nucleotide sequence repositories are used to provide fast access to new, raw sequence data, and as such are rudimentary in form, with no additional annotation. The International Nucleotide Sequence Database Collaboration (INSDC) is a joint effort of three partner nucleotide sequence databases, those of the European Molecular Biology Laboratory (EMBL), the DNA Data Bank of Japan (DDBJ) and GenBank. The partners synchronize sequence data on a daily basis, including single genes, whole genomes, RNA, expressed sequence tags (ESTs), cDNAs and synthetic sequences. Other nucleotide sequence databases include RefSeq, which provides an integrated and non-redundant set of sequences including genomic DNA, RNA and protein sequences. Similarly, Ensembl is a joint effort by the European Bioinformatics Institute (EBI) and the Wellcome Trust that produces and maintains fast automatic annotation of raw genomic sequence for selected genomes. There are also several major protein sequence databases including GenPept, Entrez Protein and UniProt, as well as a number of microarray and serial analysis of gene expression (SAGE) databases (e.g. Gene Expression Omnibus, GEO) and protein structure databases (e.g. Protein Data Bank, Cambridge Structural Database). Integrated databases on model organisms include WormBase, FlyBase, Rat Genome Database and Mouse Genome Informatics. Other specialized integrated databases provide information on single nucleotide polymorphisms (SNPs), genome-wide association studies (GWAS), miRNAs and DNA regulatory elements. The sheer number of databases and database types shows how crucial data-sharing is in bioinformatics.

Emergent alongside this variety of databases is a similarly diverse set of users who co-evolve with technological systems such as bioinformatics resources. The importance of this phenomenon has been recognized in fields such as science and technology studies (STS) (Oudshoorn and Pinch 2003), innovation studies (von Hippel 2005) and the sociology and anthropology of science and technology (Pinch and Bijker 1984). Research in these areas has shown that users, along with their internal norms, values, and beliefs, are crucial actors in the technological development process, and not simply passive recipients of end-products. In the case of bioinformatics, with different groups of biologists and computer scientists charged with different tasks and research questions, databases can be seen to be (socially) shaped accordingly. For example, the development of interface-driven software programs in bioinformatics can be seen to evolve alongside the increasing involvement of biologists who lack computational expertise but need to use these web resources either to access the wealth of data they contain or to contribute to it by publishing their own sequence data. In contrast, biologists with advanced computational experience (who may identify themselves as ‘bioinformaticians’⁴) are involved in the use and construction of analytical software that allows them to analyse their data and direct future research. Computer scientists (who may also identify themselves as ‘bioinformaticians’), for their part, are more interested in accessing full datasets in order to develop new computational and analytic tools to support future research. The diversity of bioinformatics systems and their users calls for a richer understanding of how, or whether, these seemingly disparate but tightly connected ‘communities of users’ come together in an organized manner. Our characteristics of usability should be seen as socio-cultural factors that might rally or dissipate user communities of all varieties.

The bioinformatics resources that form the basis for our comments here [i.e. DiscoverySpace (Robertson et al. 2007), InnateDB (Lynn et al. 2008) and WormBase (Harris et al. 2010)] represent different kinds of genomics databases, complete with diverse user communities. DiscoverySpace is a gene expression repository with SAGE data; InnateDB is an integrated specialized database with a range of functionalities and specialized annotated information related to immune system regulation; and WormBase is an integrated model organism database incorporating multiple types of data for the nematode model organism Caenorhabditis elegans. Examination of these three different databases offers an excellent opportunity to investigate socio-cultural characteristics of usability across a range of bioinformatics resources, and to point to more general processes that are not limited to one kind of system.

Characterizing usability

Socio-cultural characteristics make bioinformatics systems more usable or less usable. In the sections that follow, we show how characteristics of accessibility, utility, and portability affect usability, and that these characteristics themselves are subject to socio-cultural influences. It is our position that these characteristics are not purely socio-cultural in nature; some of the components we examine are partly technical (e.g. the degree to which a database is open source and open access). Rather than focusing on the technical nature of these components, however, we will identify and focus on the socio-cultural milieu in which they are embedded. We clarify what a social science perspective brings to the characteristics of usability, and we provide a set of dimensions that highlight the socio-cultural nature of what may appear at first to be purely technical determinants of usability.

The usability characteristics of accessibility, utility, and portability are not intended to be exhaustive or mutually exclusive. We anticipate that other factors may come to characterize usability as this research programme matures. Further, the complexities within each characteristic suggest significant areas of overlap that we will return to in the conclusion.

Characteristics of usability: Accessibility

Open source, open access, open science

The usability of bioinformatics databases and software is mediated by the intellectual property (IP) policies and practices that regulate access. Such policies and practices are informed by historical and institutional contexts, and by choices grounded in the cultural, political and social perspectives of decision-makers in university–industry liaison offices (also known as technology transfer offices) responsible for securing IP and negotiating its transfer to third parties. The dramatic ‘private versus public’ race to sequence the human genome in the late 1990s and early 2000s gave rise to ethical and social concerns related to the proprietary status of genomic sequence information housed in databases (Sulston and Ferry 2002). Researchers in genomics and other fields of biology were troubled that access to this fundamental resource would be constrained by private ownership, and many argued that such a practice would impede basic science. Early work by pioneering sociologist of science Robert K. Merton argued that the institution of science was governed by a set of norms, which included a kind of communalism that valued transparency and sharing so that collaboration could be fostered for the advancement of science (Merton 1942; 1973). Secrecy and proprietary strategies effectively breach those norms and can impede knowledge production. At the time of the Human Genome Project, concerns about the availability of sequence data were also associated with worries about the impact of gene sequence patent practices on the open practice of science (Sulston and Ferry 2002) and on the pursuit of research and innovation (Heller and Eisenberg 1998; Cook-Deegan and Dedeurwaerdere 2006).

In the case of bioinformatics systems, database information and associated software tools are automatically protected by copyright. Absent a license granting permission, users of such systems are legally obligated not to copy, modify or distribute any component of the work. In order to create a more open status for the tools and information that make up a database and associated software/plug-ins, some curators or owners choose to deploy the GNU General Public License (GPL), a ‘copyleft’ license. Choosing this approach does not relinquish copyright in the work; rather, the copyright holder uses it to give users the right to freely copy, modify, and distribute data and source code. The GPL is often described as viral in nature, as it requires users to apply the same open terms to the iterations/improvements of the data or code that they distribute.

Creative Commons (CC) licenses are similar to copyleft licenses, but are variable in application, allowing creators to determine which rights they want to reserve and which they will waive for the benefit of recipients and other creators. Data may also be components of patented inventions, for example, gene sequences. If a gene sequence is referenced in an ‘open access’ GPL database, then there may be a perception that all the information contained therein is freely available. However, researchers using the sequence may find themselves infringing a patent. Thus, from a legal IP standpoint, the nature of accessibility can be measured in terms of openness. In other words, the various ways in which materials (data and code) can be legally accessed determine the level of openness and therefore the usability of the resources.

It should also be noted that scientific fields have different internal cultures regarding IP. Bioinformatics has emerged from the discipline of computer science, which has a strong cultural tradition of open source and open access (Hope 2008), whereas the field of biotechnology has experienced a tension between the norms of communalism and pressures to commercialize findings and products. While there are practical and technical justifications for the development of open source and open access norms in bioinformatics, these reasons cannot be easily separated from their socio-cultural contexts. Norms and values concerning IP affect whether a resource is fully or partly accessible, which, in turn, has clear implications for the overall usability of a database.

Mutual understanding

Accessibility also concerns the way systems are understood by the prospective user community.⁵ Technical features can help users from varying computational backgrounds make sense of a database. To a large degree, however, accessibility depends on mutuality: the extent to which designers understand how a system will be used, and how users grasp the complexities and capabilities of the tool at hand. According to the founding investigators of the database Ensembl,

… there are occasions both when biologists do not appreciate aspects of running databases, and IT professionals are naïve about the challenge they face in implementing a successful biological database. (Birney and Clamp 2004, 31)

Mutual understanding between the developers and users of a database is fundamental to its usability. Practitioners who have participated in interdisciplinary research teams know that each participant brings different analytical tools for knowing and understanding the world (i.e. epistemologies) to the table. Karin Knorr-Cetina (1991, 107) refers to these differences as ‘epistemic cultures’ and ‘cultural structure[s] of scientific methodology’. Forging disparate epistemic cultures into a functional and productive interdisciplinary team capable of producing a viable interdisciplinary bioinformatics product requires intense social negotiation. This negotiation cannot be reduced to the ironing-out of ‘mere’ technical differences between team members. When such cultural realignments are successful, the usability of end-products can be greatly improved through enhancements to comprehension and overall accessibility.

Institutional reputation and prestige

The accessibility of a bioinformatics system is mediated by the communities, networks and institutions within which it is located. Valuable research by Sine et al. (2003) shows how a university's prestige influences the extent to which research innovations get licensed. Merton's classic work on the ‘Matthew effect’ (1968) demonstrated how scientific research from renowned institutions receives more citations and recognition than work of the same quality from lesser known institutions. We argue that bioinformatics resources and systems are subject to similar socio-cultural dynamics in that open source/open access bioinformatics tools produced by more prestigious bioinformaticians, bioinformatics labs, and universities are likely to receive more use than those from lesser-known sources. This is not to say that a particular lesser-known source is not deserving of recognition, nor does it imply that systems linked to prestigious institutions are scientifically superior. Rather, we are suggesting that the relative accessibility of a system is mediated not only by its technical capabilities but also by the reputation of the lab wherein it is located.

Characteristics of usability: Utility

Given the number of bioinformatics resources available, what qualities of a tool make it more likely or less likely to be used? One response might be that the ‘best’ tool is the one chosen for the job; in other words, the most appropriate resource is selected for the specific task at hand. It is not always evident, however, why one bioinformatics tool may be more ‘appropriate’ or ‘better’ than another. Is utility determined by the tool's user-interface or analytical suite? How do the quality and curation of data influence utility; are issues of trust involved? How effectively does the resource facilitate the completion of a scientist's research goals? From our perspective, these questions relate to the socio-cultural factors that influence choice and perceptions of the resource's utility and overall usability.

Trust

According to Arnold S. Relman, former editor of The New England Journal of Medicine, although trust is an integral part of science, it runs counter to the rational basis of the scientific enterprise:

[i]t seems paradoxical that scientific research, in many ways one of the most questioning and skeptical of human activities, should be dependent on personal trust. But the fact is that without trust the research enterprise could not function … Research is a collegial activity that requires its practitioners to trust the integrity of their colleagues. (Hardwig 1991, 693)

Modern science is rarely conducted by isolated individuals, and this has led scientists and social theorists alike to suggest that trust is an increasingly important element in information sharing and in social systems such as science (Wilkie 1996; Hart and Saunders 1997; Luhmann 2000). The field of bioinformatics is no exception. A database's utility is tempered by the extent to which users trust the information/data carried within it. The presence of false data, or scepticism about the quality of curation and maintenance, is likely to undermine trust and affect the ways in which users engage with the resources. To achieve buy-in, users must have confidence that neither technological nor socio-cultural changes will lead to the discontinuation of the database, and that the currency of the dataset will be maintained. Disruptive technologies that stand to completely alter previous ways of working and the kinds of problems that can be tackled, such as next-generation sequencing technologies, can generate an increased volume of data and, subsequently, improved iterations of datasets. For a database to be useful to its user communities, they need to trust that such disruptions will affect the database positively rather than negatively. Consistent communication is essential for trust-building, whether it concerns changes in database content or changes in how database inputs are curated. Trust can also be built at an institutional level, with the institution's reputational capital casting a halo effect on the perception of the resource, at least in the initial stages. In addition, users may be more patient concerning database updates if the resource is housed at a reputable institution.

Network effects, path dependence and technological lock-in

The utility of a system can also be determined by the extent to which it is integrated into routine research practices (e.g. uploading local data into larger meta-databases) and/or other technologies, neither of which is necessarily related to a database's technical optimization. In this sense, utility is partly dependent on network externalities and network effects. Economists have long established that the value of a particular product can depend in part on the number of other people who have and use it, or what is referred to as ‘consumption externalities’ (Katz and Shapiro 1985). We see similar processes at work in how utility is inscribed into bioinformatics systems. Database developers can choose to design visualization tools specifically as plug-ins for established systems that have a large community of users. Cytoscape is one example (see cytoscape.org). In tapping into the positive network externalities associated with Cytoscape, developers of smaller bioinformatics resources are able to maximize the overall utility of their tools, even if their rendering within Cytoscape is less than optimal. Similarly, database development teams can choose to construct the data within their system to be easily integrated into larger systems such as GEO. While this decision may not be necessary for a database's technical operation (in fact, it may cost extra time and money), compatibility improves utility, as data from both systems can then be examined side-by-side. In both of the above examples, database developers seek not only to maximize positive network effects by tapping their technology into an existing community of users, but also to do so in particular systems that can be characterized by certain degrees of ‘path dependence’. The theory of path dependence refers to a ‘stochastic process… whose asymptotic distribution evolves as a consequence (function of) the process's own history’ (David 2001, 19). Put another way, path dependence suggests that the adoption and persistence of a technology is not necessarily based on whether it is ‘the best tool for the job’. Rather than efficiency, factors such as first-to-market advantages and the development of peripheral support services specific to the technology may account for its ‘lock-in’ as the preferred tool. Subsequent developments are ‘path-dependent’ to the extent that, in order to find a market, they must support the now-dominant technology. Further research would be needed to identify what in the history of Cytoscape led to its lock-in as one of the preferred gene expression data visualization platforms, or of GEO as one of the central gene expression databases. Nevertheless, it seems clear that the utility of any new resource is determined, to a large extent, by its network associations with path-dependent systems that are steeped in historically contextualized use.

User interfaces and tastes

User interfaces (UI) can be the single biggest factor affecting the utility of a database. If users are unable to rapidly decipher how to format and submit sequences, fix errors within the system, and interpret the output, the tool may simply be abandoned in favour of another. Most early developments in bioinformatics took the form of algorithms without web systems or training datasets (Fuchs 2000; Ouzounis and Valencia 2003). Newer approaches integrate fields such as information visualization (infovis) (Card et al. 1999) and visual analytics (Thomas and Cook 2005) in the construction process to improve accessibility. Infovis principles are increasingly deployed in the development of layout and visualization tools in order to ‘enable biologists without a computational background to explore their data in a more systems-oriented manner’ (Lynn et al. 2008, 2). Making a database more intuitive and comprehensible to a broad user-community is a function of its technical configurations, but the manner by which ‘tastes’ surrounding UI are chosen, and the processes by which users are configured in the deployment of infovis techniques, are highly socially mediated. According to lead investigators of Ensembl,

We have a particular style of designing and running databases, but we recognize that this is just one style; other people run other databases differently with equal or better results. We expect readers might be interested in our ‘best practice’ rules, but this advice is more ‘to taste’ for each individual. (Birney and Clamp 2004, 34)

Diversity in and around the bioinformatics community is one of the reasons ‘tastes’ figure so centrally in the development of systems, and it is therefore not surprising that UI developers struggle with the challenge of predicting the needs and interests of heterogeneous users. Sociological research has shown how structures such as family and education can impart cultural cues and tastes to their members, which make their navigation of other institutions such as the workplace or healthcare more manageable (Bourdieu 1984). We propose that socio-cultural contributions to a usability research programme would similarly examine how tastes are constructed within particular communities of practice, and then deployed to facilitate the use of bioinformatics systems.

Characteristics of usability: Portability

Bioinformatics resources are generally considered to be web-based software systems/tools and digital databases available to those with internet access. In this context, ‘portability’ does not describe the capacity of these resources to be physically packed-up and moved from one location to another. Rather, portability is the ability to move a resource through conceptual space and time. Central to this understanding of portability would be the degree to which a resource could be used outside of the lab in which it was developed, or the extent to which the resource can be deployed in multiple contexts of use. Databases with such characteristics would be what Hine calls ‘transportable technologies for doing science’ (2006, 291). Is it the case, for instance, that a database is only useful for those conducting research into a specific area of systems-level biology? Or are the tools and information within the database of use to others outside of a particular project team or even scientific field/discipline?

Data format and storing standards

The technical design of bioinformatics resources has been shown to affect their usability (Bolchini et al. 2009a). The use of appropriate data format standards is key to ensuring functionality, and user communities are encouraged to agree on the database standards and modular structure that are most appropriate for their needs. Community decisions must also be made concerning the type of software/plug-in language that will be used to interpret database information. Data format standards and software language also affect the degree to which a database can be used in contexts and disciplines for which it was not originally envisioned. Storage standards for databases also need to be determined by the community. Depending on the rate of submissions to a database and transactions between databases, the chosen database technology may prove inadequate to accommodate rapid growth. Many of these decisions are commonly guided by practical and technical considerations, but the processes by which agreements are reached and standards are constructed, as in any other instance of negotiating agreement and constructing language, are subject to the social and cultural norms of the community in question.

Maintenance

For a database to demonstrate portability over time, the resource needs to be maintained beyond the life of the grant that originally funded its creation. As with other features of usability, maintenance is in part a technical issue: hardware requires updating and replacing; programming languages change; and disruptive technologies can alter the entire research landscape. While usability demands that these technical issues be dealt with, a recent editorial in Nature (2009) has pointed out the central importance of non-technical issues such as funding in maintaining biological databases. For resources with proven utility, the alignment of national and international science funding agendas is needed to ensure that the investments made to develop resources for the scientific commons are not squandered. Cooperative strategies, which are likely to more closely resemble international science treaties than maintenance funding applications, are needed so that front-line biology research can continue (Editorial, Nature, 2009).

The maintenance of the back-end of systems (i.e. software design, data processing components, and programming code) can be further complicated by the turn-over of personnel, and the associated loss of tacit knowledge. While programming languages that alter and maintain the infrastructure of these databases are standardized, the deployment of the code to carry out upgrades and changes to the system is very much a matter of personal preference and style. Not every programmer codes in the same way, and it may be that they have a tacit, rather than codified, understanding of how the system is structured and how it is to be maintained. While Konagaya (2006) has argued that knowledge grids could be designed ‘for sharing tacit knowledge among a community’, the individual stylization of programming can nevertheless cause problems for long-term maintenance of biological databases.

Application to other contexts of use

Alongside the standards affecting portability of a resource through space, and maintenance issues that allow for portability through time, we have identified a number of socio-cultural factors that affect the extent to which a system can be moved into differing contexts of use. According to Hine's social science work on the development of a database aimed at assisting in the mapping of the mouse genome,

[o]n their own, then, databases do not promote data re-use: the way that the package is presented and researchers’ expectations about the interpretation of data influence both whether and how data can be taken up and incorporated into new projects (Hine 2006, 292).

When a resource can be mobile enough to be applied both within and beyond the community that developed the system, then it may be effectively considered a transportable technology for doing science. (Hine 2006, 291)

A social science research programme that takes seriously the variety of technical and non-technical factors that stand to affect usability would then ask: what are the socio-cultural factors that affect a database being ‘transportable’? As Hine suggests, the ‘presentation’ of the package/system stands to play a role in this kind of portability. The ‘presentation’ of the system may entail dissemination and communication activities (such as publication and conference presentations) that lie outside of the technical purview of database construction. ‘Presentation’ may also refer to the user-interface, and here we recall the processes detailed above that describe how ‘tastes’ can be socially constructed.

The contextual portability of a system is also subject to its relative openness — an issue we have discussed in the section on ‘accessibility’. It is worthwhile reiterating that not all systems — or all components of systems — are open source and open access, and decisions to make them so are based in part on technical justifications that include ‘free’ system improvements through a larger user-community and the wider diffusion of the system in question. However, decisions to make a system portable in terms of access are also grounded in non-technical factors such as the building and enhancement of reputations (von Hippel 2005) and a sense of communitarian ‘giving back’ to those that freely revealed their own data in order to make the current work possible (Lakhani and Wolf 2005). These socio-cultural elements that affect a system's accessibility will also affect its portability from one context of use to another.

It is also worthwhile noting at least one of the barriers to such portability. In Hine's reference to other social science work on databases, she notes that, in a particular case, ‘the re-use of data proved more problematic, since conventions were less established and researchers still felt it was vital to understand the specific experimental conditions that generated the data’ (Hine 2006, 292). Von Hippel (2005) has evidenced how learning in the innovation process acts as a key motivating factor for users to be involved in the production of tools that they themselves will deploy. Not being able to benefit from these processes may be a barrier for the contextual portability of bioinformatics systems as the learning fostered in the development of the database and its tools may not be experienced when moved to other contexts of use.

Discussion and conclusion

We have shown why bioinformatics database and tool usability merits investigation from a social science perspective, and have outlined the possible types of contributions to such an approach. Direct applications of usability principles that emanate from the computer sciences are beginning to appear in bioinformatics to improve the technical design of these resources (Bolchini et al. 2009a; 2009b). However, a social science research programme is needed on the construction and use of bioinformatics systems to highlight the non-technical factors at work, and to broaden the understanding of what makes for usable systems. Specifically, we suggest that technical considerations of usability need to be supplemented by understandings of accessibility, utility, and portability, and that each of these characteristics is itself subject to socio-cultural determinants. We have pointed to how the accessibility of a system can be subject to particular IP arrangements, or be affected by the reputation of the particular lab or institution within which it is located. In addition, we have illustrated how the utility of a bioinformatics resource can be dependent on levels of trust, or the extent to which it is linked into positive network externalities and pre-existing path-dependent systems. We have stressed that elements that affect usability, such as trust, prestige and sharing, are not solely driven by socio-cultural forces. Rather, our position is that usability cannot be exclusively understood from a technical perspective, and that research is needed to explicitly examine how socio-cultural factors operate within technical contexts. Within this framework, social science research might be able to address some of the non-technical challenges expressed by Altman (2004) in his editorial on building successful biological databases.
A social science research programme of this nature might ask about the relationship between open source/open access systems and the culture of sharing and free revealing that is so pervasive in computer sciences: how do these cultural elements work to shape technological products, and how do the particular configurations of the technology enable or constrain certain cultural behaviours such as free-revealing?

Throughout, we have stressed that our understanding of usability does not run counter to existing definitions in and around computer science. We have also emphasized that our characteristics of accessibility, utility, and portability are neither exhaustive nor mutually exclusive. In fact, we observe important areas of overlap in which a socio-cultural characteristic such as openness/IP can affect a database's usability alongside its accessibility. In the same way that one particular gene can be involved in multiple and different interactions and pathways within a biological system, so too can a particular sociocultural characteristic have various effects in a socio-technical system such as bioinformatics. We see the complexities surrounding socio-cultural characteristics and their interactions with the technical features of databases as motivation for research in this area. We invite researchers in human–computer interaction, STS, bioinformatics, usability engineering and other related areas to extend their work to examine the social contexts in which these systems are used, and the socio-cultural factors that mediate their use. Such a research programme would not only increase the multidisciplinary nature of these emergent fields, but would also help address the complexities of work in the post-genomic era.

Notes

¹ We based this review on our observations of three established genomics databases: WormBase (http://www.wormbase.org/), DiscoverySpace (http://www.bcgsc.ca/discoveryspace/), and InnateDB (http://www.innatedb.ca/). We were integrated as multidisciplinary researchers in projects which developed, deployed, and expanded on these bioinformatics databases. As such, we draw on our participation within these projects to present general insights on the use and uptake of bioinformatics resources.

² A search of the archive of one of the leading journals in the field of bioinformatics, Briefings in Bioinformatics, returned only 14 hits on the use of the term ‘usability’ in any form in any article (search conducted 27 April 2010). While the term was first used in Briefings in 2006, we have recently seen a rise in attention paid to this area, with 7 of the 14 hits returned in 2009–2010. This can be taken as a sign that ‘usability’ is becoming increasingly important in the pages of Briefings, and in the field of bioinformatics more generally.

³ GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.

⁴ As an emergent area of study and work, the nature of bioinformatics is unclear. Is it a research tool or a scientific discipline? If the latter, who belongs to it? How does one differentiate between insiders and outsiders? What are the norms and values that regulate activity? While all these questions would be relevant to a sociology of bioinformatics, they are simply not the subject of our discussion here.

⁵ We acknowledge that the extent to which users understand a bioinformatics system also affects the resource's utility, which is discussed in the following section.

Notes on contributors

Dr Conor Douglas is a sociologist of science, and science and technology studies (STS) scholar. At the time of writing he was a Postdoctoral Research Fellow with the Translational Genomics Research Group (TGRG) at the W. Maurice Young Centre for Applied Ethics at the University of British Columbia in Vancouver, Canada. His work focuses on the social patterns within science, as well as the ramifications of new genetic medical technologies on broader social organization. His current work at the Vrije University (Amsterdam) Section Community Genetics concerns user configurations in Dutch biobanking, and European health policy implications of synthetic biology that is being carried out in the Rathenau Institute in The Hague. Email: cm.douglas@vumc.nl

At the time of writing Dr Rebecca Goulding was a Postdoctoral Fellow with the Intellectual Property and Policy Research Group (IPPRG) at the W. Maurice Young Centre for Applied Ethics at the University of British Columbia in Vancouver, Canada. Rebecca's research with the IPPRG was funded by Genome British Columbia and Genome Canada. Rebecca is currently working as a policy analyst consultant in global health and drug research and development in Vancouver, BC.

At the time of writing Lily Farris was a Research Associate in the Intellectual Property and Policy Research Group (IPPRG) at the Centre for Applied Ethics residing at the University of British Columbia in Vancouver, Canada. Lily's work with the IPPRG was funded by Genome British Columbia and Genome Canada. Lily is currently a Quality Analyst at the BC Children's Hospital Child and Youth Mental Health Program in Vancouver, BC.

Dr Atkinson-Grosjean is a Senior Research Fellow at UBC's WM Young Centre for Applied Ethics where she leads the Translational Genomics Research Group. Focusing on large-scale science, her work examines how institutional and organizational arrangements affect the production and translation of scientific knowledge. She has developed a dialectical model of translational science that captures iterative relationships between worlds of discovery (basic research) and worlds of utility (clinical, commercial, and civic translation). Dr Atkinson-Grosjean's postdoctoral training was in applied ethics and policy; she holds an interdisciplinary PhD in science and technology studies (STS) (UBC, 2002) and an MA in Liberal Studies with an STS focus (SFU, 1996).

Acknowledgements

The authors acknowledge Genome Canada and Genome BC funding that supports members of the Translational Genomics Research Group (TGRG) and the Intellectual Property and Policy Research Group (IPPRG) at UBC through a series of integrated GE³LS projects: Pathogenomics of innate immunity (PI2), MORGEN and C. elegans Gene Knockout Consortium, and one stand-alone GE³LS project — Building a GE³LS Architecture (GE³LS Arch). GE³LS is Genome Canada's term for the study of the ethical, economic, environmental, legal and social issues arising in genomics research. The authors would also like to thank the anonymous reviewers of this journal who provided excellent and detailed feedback, and we also acknowledge the extensive networks of collaborators across these above mentioned projects — in particular, Dr Jennifer Gardy who provided feedback on earlier drafts.

References

Altman. 2004. Editorial: Building successful biological databases. Briefings in Bioinformatics 5: 4–5.

Birney, Clamp. 2004. Biological database design and implementation. Briefings in Bioinformatics 5: 31–8.

Bolchini, Finkelstein, Perrone, et al. 2009a. Better bioinformatics through usability analysis. Bioinformatics 25: 406–12.

Bolchini, Finkelstein, Paolini. 2009b. Designing usable bio-information architectures. In Human-computer interaction, ed. J. A. Jacko, 653–62. Berlin, Heidelberg: Springer-Verlag.

Bourdieu. 1984. Distinction. Cambridge, MA: Harvard University Press.

Card, Mackinlay, Shneiderman. 1999. Readings in information visualization: Using vision to think. San Francisco, CA: Morgan Kaufmann Publishers.

Cook-Deegan, Dedeurwaerdere. 2006. The science commons in life science research: structure, function, and value of access to genetic diversity. International Social Science Journal 58: 299–317.

David. 2001. Path dependence, its critics and the quest for ‘historical economics’. In Evolution and path dependence in economic ideas: Past and present, ed. Pierre Garrouste and Stavros Ioannides, 15–40. Cheltenham, UK: Edward Elgar Press.

Davis. 1989. Usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 13: 319–40.

Editorial. 2009. Nature 462(7271): 252.

Fuchs. 2000. Analyze this… or: intelligent help for the rest of us. Bioinformatics 16: 491–3.

Gould, J. D., Lewis. 1985. Designing for usability: Key principles and what designers think. Communications of the ACM 28: 300–11.

Hardwig. 1991. The role of trust in knowledge. Journal of Philosophy 88: 693–708.

Harris, Antoshechkin, Bieri, et al. 2010. WormBase: a comprehensive resource for nematode research. Nucleic Acids Research 38: D463–67.

Hart, Saunders. 1997. Power and trust: Critical factors in the adoption and use of electronic data interchange. Organization Science 8(1): 23–42.

Heller, Eisenberg. 1998. Can patents deter innovation? The anticommons in biomedical research. Science 280: 698–701.

Hine. 2006. Databases as scientific instruments and their role in the ordering of scientific work. Social Studies of Science 36: 269–98.

Hope. 2008. Biobazaar: The open source revolution and biotechnology. Cambridge, MA: Harvard University Press.

Javahery, Seffah, Radhakrishnan. 2004. Beyond power: Making bioinformatics tools user-centered. Communications of the ACM 47: 59–62.

Katz, Shapiro. 1985. Network externalities, competition, and compatibility. American Economic Review 75: 424–40.

Knorr-Cetina. 1991. Epistemic cultures: Forms of reason in science. History of Political Economy 23: 105–22.

Konagaya. 2006. Trends in life science grid: From computing grid to knowledge grid. BMC Bioinformatics 7(Suppl 5): S10.

Lakhani, Wolf. 2005. Why Hackers do what they do: Understanding motivation and effort in free/open source software projects. In Perspectives on free and open source software, ed. Feller, Fitzgerald, Hissam, Lakhani. Cambridge, MA: MIT Press.

Luhmann. 2000. Familiarity, confidence, trust: Problems and alternatives. In Trust: Making and breaking cooperative relations, ed. Diego Gambetta. electronic edn, Department of Sociology, University of Oxford, chapter 6, 94–107. http://www.sociology.ox.ac.uk/papers/luhmann94–107.pdf.

Lynn, D., et al. 2008. InnateDB: Facilitating systems-level analyses of the mammalian innate immune response. Molecular Systems Biology 4: 218.

Merton. 1942. The sociology of science. Chicago: The University of Chicago Press.

Merton, Robert K. 1973 [1942]. The normative structure of science. In The sociology of science: Theoretical and empirical investigations, ed. Merton, Storer, 267–78. Chicago, IL: The University of Chicago Press.

Merton. 1968. The Matthew effect in science. Science 159: 56–63.

Nielsen. 1993. Usability engineering. London, UK: Academic Press.

Oudshoorn, Pinch, ed. 2003. How users matter: The co-construction of users and technologies. Cambridge, MA: MIT Press.

Ouzounis, Valencia. 2003. Early bioinformatics: The birth of a discipline — a personal view. Bioinformatics 19: 2176.

Pinch, Bĳker. 1984. The social construction of facts and artefacts: Or how the sociology of science and the sociology of technology might benefit each other. Social Studies of Science 14: 399–441.

Preece, Rogers, Sharp, ed. 2002. Interaction design: Beyond human-computer interaction. John Wiley Press.

Rebhan, Chalifa-Caspi, Prilusky, et al. 1998. GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 14: 656–64.

Robertson, Oveisi-Fordorei, Zuyderduyn, et al. 2007. DiscoverySpace: an interactive data analysis application. Genome Biology 8(1): R6.

Rosson, Carroll. 2002. Usability engineering: Scenario-based development of human-computer interaction. San Diego, CA: Academic Press.

Sine, Shane, Di Gregorio. 2003. The Halo Effect and technology licensing: The influence of institutional prestige on the licensing of university inventions. Management Science 49: 478–96.

Sulston, Ferry. 2002. The common thread: A story of science, politics, ethics, and the human genome. Washington, DC: Joseph Henry Press.

Thomas, J. J., Cook, K. A., ed. 2005. Illuminating the path: The R&D agenda for visual analytics. Richland, WA: National Visualization and Analytics Center.

von Hippel. 2005. Democratizing innovation. Cambridge, MA: MIT Press.

Wilkie. 1996. Sources in science: Who can we trust? Lancet 347(9011): 1308–11.