Analysis and synthesis of metadata goals for scientific data by Craig Willis, Jane Greenberg and Hollie White. (Willis, C., Greenberg, J. and White, H. (2012), Analysis and synthesis of metadata goals for scientific data. J. Am. Soc. Inf. Sci. doi: 10.1002/asi.22683)
Abstract:
The proliferation of discipline-specific metadata schemes contributes to artificial barriers that can impede interdisciplinary and transdisciplinary research. The authors considered this problem by examining the domains, objectives, and architectures of nine metadata schemes used to document scientific data in the physical, life, and social sciences. They used a mixed-methods content analysis and Greenberg’s () metadata objectives, principles, domains, and architectural layout (MODAL) framework, and derived 22 metadata-related goals from textual content describing each metadata scheme. Relationships are identified between the domains (e.g., scientific discipline and type of data) and the categories of scheme objectives. For each strong correlation (>0.6), a Fisher’s exact test for nonparametric data was used to determine significance (p < .05). Significant relationships were found between the domains and objectives of the schemes. Schemes describing observational data are more likely to have “scheme harmonization” (compatibility and interoperability with related schemes) as an objective; schemes with the objective “abstraction” (a conceptual model exists separate from the technical implementation) also have the objective “sufficiency” (the scheme defines a minimal amount of information to meet the needs of the community); and schemes with the objective “data publication” do not have the objective “element refinement.” The analysis indicates that many metadata-driven goals expressed by communities are independent of scientific discipline or the type of data, although they are constrained by historical community practices and workflows as well as the technological environment at the time of scheme creation. The analysis reveals 11 fundamental metadata goals for metadata documenting scientific data in support of sharing research data across disciplines and domains. 
The authors report these results and highlight the need for more metadata-related research, particularly in the context of recent funding agency policy changes.
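For readers unfamiliar with the significance test named in the abstract, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table. The counts are invented purely for illustration and are not the paper's data; the function name and the table layout (rows: scheme describes observational data or not; columns: scheme lists "scheme harmonization" as an objective or not) are my own assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Add up every table that is no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Invented counts for nine hypothetical schemes, NOT taken from the paper.
p_value = fisher_exact_two_sided(4, 0, 1, 4)
print(f"p = {p_value:.4f}")
```

With only nine schemes the table cells are tiny, which is the usual reason to prefer an exact test over a chi-square approximation.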
The authors remark on the scope of metadata:
Scope is a broad term, but is commonly used in the software requirements and metadata communities to identify what is included as part of a system or scheme. In the context of metadata for scientific data, it seems that each community has scoped their metadata based on discipline-specific needs and practices. This observation makes sense, given that the metadata efforts examined are initiated within silos, embedded in the scientific practice of the community. To extend this research, it seems that more questions are needed to address these fundamental requirements in the context of communities’ approaches to science and communication.
The authors later advocate studying metadata in broader communities to break down the barriers created by silos.
The quest to avoid/abandon silos is a quixotic one.
A more realistic goal would be to build bigger silos that encompass related areas of science/data by providing trans-silo mappings from more specialized silos.
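To make "trans-silo mapping" concrete, here is a minimal sketch of a crosswalk from two specialized schemes into one broader vocabulary. Every scheme name, element name, and record below is hypothetical and chosen only for illustration; none comes from the paper or from a real metadata standard.

```python
# Hypothetical crosswalks: specialized element name -> shared element name.
CROSSWALKS = {
    "ecology_scheme": {
        "datasetTitle": "title",
        "originator": "creator",
        "collectionDate": "date",
    },
    "survey_scheme": {
        "studyTitle": "title",
        "principalInvestigator": "creator",
        "fieldworkDate": "date",
    },
}


def to_shared(scheme, record):
    """Translate a record into the shared vocabulary.

    Returns (shared, unmapped). Elements without a mapping are kept
    separately instead of being silently dropped.
    """
    mapping = CROSSWALKS[scheme]
    shared, unmapped = {}, {}
    for element, value in record.items():
        if element in mapping:
            shared[mapping[element]] = value
        else:
            unmapped[element] = value
    return shared, unmapped


# A made-up record from the hypothetical ecology scheme.
shared, unmapped = to_shared(
    "ecology_scheme",
    {"datasetTitle": "Pond survey", "originator": "A. Researcher", "plotId": "P-17"},
)
print(shared)
print(unmapped)
```

Keeping the unmapped elements visible is a deliberate choice in this sketch: whatever fails to cross over is exactly what the specialized silo knows that the bigger one does not.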
Any metadata that we establish today will be a “silo” when viewed ten years hence.
We can create mechanisms that ease our transition from one silo to the next, or we can continue the pretense that we can move beyond them.
Which will you choose?