-
BMC Research Notes Sep 2023
OBJECTIVES
This release note describes the Maize GxE project datasets within the Genomes to Fields (G2F) Initiative. The Maize GxE project aims to understand genotype by environment (GxE) interactions and use the information collected to improve resource allocation efficiency and increase genotype predictability and stability, particularly in scenarios of variable environmental patterns. Hybrids and inbreds are evaluated across multiple environments and phenotypic, genotypic, environmental, and metadata information are made publicly available.
DATA DESCRIPTION
The datasets include phenotypic data for the hybrids and inbreds evaluated at 30 locations across the US and one location in Germany in 2020 and 2021; soil and climatic measurements and metadata for all environments (each a combination of year and location); and ReadMe and description files for each data type. A set of common hybrids is present in each environment to connect with previous evaluations. Each environment had a collaborator responsible for collecting and submitting the data; the GxE coordination team then combined all the collected information and removed obviously erroneous data. Collaborators received the combined data to use and verify, and to declare that the data generated in their own environments were accurate. The combined data are released to the public with minimal filtering to maintain fidelity to the original data.
Topics: Zea mays; Seasons; Genotype; Germany; Resource Allocation
PubMed: 37710302
DOI: 10.1186/s13104-023-06430-y
-
Open Research Europe 2023
BACKGROUND
The recent COVID-19 (Corona Virus Disease 2019) pandemic dramatically underlined the multi-faceted nature of health research, requiring input from basic biological sciences, pharmaceutical technologies, clinical research, social sciences, public health and social engineering. Systems that could work across different disciplines would therefore seem to be a useful idea to explore. In this study we investigated whether metadata schemas and vocabularies used for discovering scientific studies and resources in the social sciences and in clinical research are similar enough to allow information from different source disciplines to be easily retrieved and presented together.
METHODS
As a first step, a literature search was performed to identify example studies and resources in which data from the social sciences have been usefully employed or integrated with data from clinical research and clinical trials. In a second step, a comparison of metadata schemas and related resource catalogues in ECRIN (European Clinical Research Infrastructure Network) and CESSDA (Consortium of European Social Science Data Archives) was performed. The focus was on discovery metadata, here defined as the metadata elements used to identify and locate scientific resources.
RESULTS
A close look at the metadata schemas of CESSDA and ECRIN and their basic discovery metadata, together with a crosswalk between the two schemas, showed considerable resemblance between them.
CONCLUSIONS
The resemblance could serve as a promising starting point to implement a common search mechanism for ECRIN and CESSDA metadata. In the paper four different options for how to proceed with implementation issues are presented.
PubMed: 37965479
DOI: 10.12688/openreseurope.16284.1
-
The Lancet. Digital Health Oct 2023
Review
Data sharing is central to the rapid translation of research into advances in clinical medicine and public health practice. In the context of COVID-19, there has been a rush to share data marked by an explosion of population-specific and discipline-specific resources for collecting, curating, and disseminating participant-level data. We conducted a scoping review and cross-sectional survey to identify and describe COVID-19-related platforms and registries that harmonise and share participant-level clinical, omics (eg, genomic and metabolomic data), imaging data, and metadata. We assess how these initiatives map to the best practices for the ethical and equitable management of data and the findable, accessible, interoperable, and reusable (FAIR) principles for data resources. We review gaps and redundancies in COVID-19 data-sharing efforts and provide recommendations to build on existing synergies that align with frameworks for effective and equitable data reuse. We identified 44 COVID-19-related registries and 20 platforms from the scoping review. Data-sharing resources were concentrated in high-income countries and siloed by comorbidity, body system, and data type. Resources for harmonising and sharing clinical data were less likely to implement FAIR principles than those sharing omics or imaging data. Our findings are that more data sharing does not equate to better data sharing, and the semantic and technical interoperability of platforms and registries harmonising and sharing COVID-19-related participant-level data needs to improve to facilitate the global collaboration required to address the COVID-19 crisis.
Topics: Humans; COVID-19; Cross-Sectional Studies; Information Dissemination; Registries; Metadata
PubMed: 37775189
DOI: 10.1016/S2589-7500(23)00129-2
-
The British Journal of Radiology Nov 2023
Review
In radiography, much valuable associated data (metadata) is generated during image acquisition. The current setup of picture archive and communication systems (PACS) can make extraction of this metadata difficult, especially as it is typically stored with the image. The aim of this work is to examine the current challenges in extracting image metadata and to discuss the potential benefits of using this rich information. This work focuses on breast screening, though the conclusions are applicable to other modalities. The data stored in PACS contain information that is currently underutilised and is of great benefit for auditing and improving imaging and radiographic practice. From the literature, we present examples of the potential clinical benefit, such as audits of dose and radiographic practice, as well as more advanced research highlighting the effects of radiographic practice, e.g. cancer detection rates affected by imaging technology. This review considers the challenges in extracting data, namely: the search tools for data on most PACS are inadequate, being both time-consuming and limited in the elements that can be searched; security and information governance considerations; anonymisation of data, if required; and data curation. The review describes some solutions that have been successfully implemented: retrospective extraction by direct query on PACS; prospective data extraction; use of structured reports; and use of trusted research environments. Ultimately, the data access process will be made easier by inclusion during PACS procurement. Auditing data from PACS can be used to improve quality of imaging and workflow, all of which will be a clinical benefit to patients.
Topics: Humans; Radiology Information Systems; Retrospective Studies; Workflow; Metadata
PubMed: 37698251
DOI: 10.1259/bjr.20230104
-
Biodiversity Data Journal 2023
The standardization of data, encompassing both primary and contextual information (metadata), plays a pivotal role in facilitating data (re-)use, integration, and knowledge generation. However, the biodiversity and omics communities, converging on omics biodiversity data, have historically developed and adopted their own distinct standards, hindering effective (meta)data integration and collaboration. In response to this challenge, the Task Group (TG) for Sustainable DwC-MIxS Interoperability was established. Convening experts from the Biodiversity Information Standards (TDWG) and the Genomic Standards Consortium (GSC) alongside external stakeholders, the TG aimed to promote sustainable interoperability between the Minimum Information about any (x) Sequence (MIxS) and Darwin Core (DwC) specifications. To achieve this goal, the TG utilized the Simple Standard for Sharing Ontology Mappings (SSSOM) to create a comprehensive mapping of DwC keys to MIxS keys. This mapping, combined with the development of the MIxS-DwC extension, enables the incorporation of MIxS core terms into DwC-compliant metadata records, facilitating seamless data exchange between MIxS and DwC user communities. Through the implementation of this translation layer, data produced in either MIxS- or DwC-compliant formats can now be efficiently brokered, breaking down silos and fostering closer collaboration between the biodiversity and omics communities. To ensure its sustainability and lasting impact, TDWG and GSC have both signed a Memorandum of Understanding (MoU) on creating a continuous model to synchronize their standards. These achievements mark a significant step forward in enhancing data sharing and utilization across domains, thereby unlocking new opportunities for scientific discovery and advancement.
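The translation layer described above can be pictured with a minimal sketch. It assumes a small SSSOM-style tab-separated table; the column names `subject_id`, `predicate_id` and `object_id` follow the SSSOM format, but the term pairs, the predicate filter, and the helper names `load_crosswalk` and `translate` are hypothetical illustrations, not the Task Group's published mapping or tooling.

```python
import csv
import io

# Hypothetical SSSOM-style mapping rows (tab-separated). The term pairs are
# illustrative only and are NOT taken from the published DwC-MIxS mapping.
SSSOM_TSV = (
    "subject_id\tpredicate_id\tobject_id\n"
    "dwc:eventDate\tskos:exactMatch\tmixs:collection_date\n"
    "dwc:decimalLatitude\tskos:closeMatch\tmixs:lat_lon\n"
)


def load_crosswalk(tsv_text, accepted=("skos:exactMatch",)):
    """Return {subject_id: object_id} for rows whose predicate is accepted."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["subject_id"]: row["object_id"]
        for row in reader
        if row["predicate_id"] in accepted
    }


def translate(record, crosswalk):
    """Rename mapped keys; keep keys without an accepted mapping separately."""
    translated, unmapped = {}, {}
    for key, value in record.items():
        if key in crosswalk:
            translated[crosswalk[key]] = value
        else:
            unmapped[key] = value
    return translated, unmapped


crosswalk = load_crosswalk(SSSOM_TSV)
dwc_record = {"dwc:eventDate": "2021-06-01", "dwc:decimalLatitude": "52.1"}
mixs_record, leftover = translate(dwc_record, crosswalk)
```

Filtering on the mapping predicate is one possible design choice: only exact matches are renamed automatically, while looser matches are set aside for manual review rather than silently converted.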
PubMed: 37829294
DOI: 10.3897/BDJ.11.e112420
-
Scientific Reports Sep 2023
For many machine learning applications in drug discovery, only limited amounts of training data are available. This typically applies to compound design and activity prediction and often restricts machine learning, especially deep learning. For low-data applications, specialized learning strategies can be considered to limit required training data. Among these is meta-learning that attempts to enable learning in low-data regimes by combining outputs of different models and utilizing meta-data from these predictions. However, in drug discovery settings, meta-learning is still in its infancy. In this study, we have explored meta-learning for the prediction of potent compounds via generative design using transformer models. For different activity classes, meta-learning models were derived to predict highly potent compounds from weakly potent templates in the presence of varying amounts of fine-tuning data and compared to other transformers developed for this task. Meta-learning consistently led to statistically significant improvements in model performance, in particular, when fine-tuning data were limited. Moreover, meta-learning models generated target compounds with higher potency and larger potency differences between templates and targets than other transformers, indicating their potential for low-data compound design.
Topics: Drug Discovery; Electric Power Supplies; Machine Learning
PubMed: 37752164
DOI: 10.1038/s41598-023-43046-5
-
Scientific Data May 2024
Datasets consist of measurement data and metadata. Metadata provides context, essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information resides at differing stages across the data-lifecycle. Often, this information is defined and standardized only at publication stage, which can lead to data loss and workload increase. In this study, we developed Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and systematic screening of data repositories. It aligns with the data-lifecycle allowing synchronous metadata recording within Microsoft Excel, a widespread data recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features like automation, dynamic adaption, metadata integrity checks, and export options for various metadata standards. By design and due to its extensive documentation, the proposed metadata standard simplifies recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
Topics: Biomedical Research; Data Management; Metadata; Software
PubMed: 38778016
DOI: 10.1038/s41597-024-03349-2
-
Frontiers in Human Neuroscience 2023
With the ever-increasing adoption of tools for online research, for the first time we have visibility on macro-level trends in research that were previously unattainable. However, until now this data has been siloed within company databases and unavailable to researchers. Between them, the online study creation and hosting tool Gorilla Experiment Builder and the recruitment platform Prolific hold metadata gleaned from millions of participants and over half a million studies. We analyzed a subset of this data (over 1 million participants and half a million studies) to reveal critical information about the current state of the online research landscape that researchers can use to inform their own study planning and execution. We analyzed this data to discover basic benchmarking statistics about online research that all researchers conducting their work online may be interested to know. In doing so, we identified insights related to: the typical study length, average completion rates within studies, the most frequent sample sizes, the most popular participant filters, and gross participant activity levels. We present this data in the hope that it can be used to inform research choices going forward and provide a snapshot of the current state of online research.
PubMed: 37484919
DOI: 10.3389/fnhum.2023.1228365
-
Frontiers in Public Health 2023
The COVID-19 pandemic has exemplified the importance of interoperable and equitable data sharing for global surveillance and to support research. While many challenges could be overcome, at least in some countries, many hurdles within the organizational, scientific, technical and cultural realms still remain to be tackled to be prepared for future threats. We propose to (i) continue supporting global efforts that have proven to be efficient and trustworthy toward addressing challenges in pathogen molecular data sharing; (ii) establish a distributed network of Pathogen Data Platforms to (a) ensure high quality data, metadata standardization and data analysis, (b) perform data brokering on behalf of data providers both for research and surveillance, (c) foster capacity building and continuous improvements, also for pandemic preparedness; (iii) establish an International One Health Pathogens Portal, connecting pathogen data isolated from various sources (human, animal, food, environment), in a truly One Health approach and following FAIR principles. To address these challenging endeavors, we have started an ELIXIR Focus Group where we invite all interested experts to join in a concerted, expert-driven effort toward sustaining and ensuring high-quality data for global surveillance and research.
Topics: Animals; Humans; COVID-19; Pandemics; Capacity Building; Information Dissemination
PubMed: 38074768
DOI: 10.3389/fpubh.2023.1289945
-
Database : the Journal of Biological... Nov 2023
Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards.
Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.
Topics: Animals; Data Management; Plant Breeding; Genomics; Databases, Factual; Information Dissemination
PubMed: 37971715
DOI: 10.1093/database/baad076