PLoS Biology, Dec 2020
Researchers face many, often seemingly arbitrary, choices in formulating hypotheses, designing protocols, collecting data, analyzing data, and reporting results. Opportunistic use of "researcher degrees of freedom" aimed at obtaining statistical significance increases the likelihood of obtaining and publishing false-positive results and overestimated effect sizes. Preregistration is a mechanism for reducing such degrees of freedom by specifying designs and analysis plans before observing the research outcomes. The effectiveness of preregistration may depend, in part, on whether the process facilitates sufficiently specific articulation of such plans. In this preregistered study, we compared 2 formats of preregistration available on the OSF: Standard Pre-Data Collection Registration and Prereg Challenge Registration (now called "OSF Preregistration," http://osf.io/prereg/). The Prereg Challenge format was a "structured" workflow with detailed instructions and an independent review to confirm completeness; the "Standard" format was "unstructured" with minimal direct guidance to give researchers flexibility for what to prespecify. Results of comparing random samples of 53 preregistrations from each format indicate that the "structured" format restricted the opportunistic use of researcher degrees of freedom better (Cliff's Delta = 0.49) than the "unstructured" format, but neither eliminated all researcher degrees of freedom. We also observed very low concordance among coders about the number of hypotheses (14%), indicating that they are often not clearly stated. We conclude that effective preregistration is challenging, and registration formats that provide effective guidance may improve the quality of research.
Topics: Data Collection; Humans; Quality Control; Registries; Research Design
PubMed: 33296358
DOI: 10.1371/journal.pbio.3000937
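The headline effect size in the study above is Cliff's Delta, a nonparametric measure of how often a score drawn from one group exceeds a score drawn from the other. A minimal sketch of the computation, with hypothetical coder ratings standing in for the paper's actual per-preregistration scores:

```python
def cliffs_delta(xs, ys):
    """Cliff's Delta: P(X > Y) - P(X < Y) over all cross-group pairs.

    Ranges from -1 to 1; 0 indicates complete overlap between groups.
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Hypothetical ratings of how well each preregistration restricted
# researcher degrees of freedom (higher = more restricted).
structured = [4, 5, 3, 5, 4, 4, 5]
unstructured = [2, 3, 3, 1, 4, 2, 3]
print(cliffs_delta(structured, unstructured))  # ~0.84 for this toy data
```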
Bulletin of the World Health Organization, Feb 2021
Topics: Data Collection; Health Information Systems; Humans; Information Dissemination
PubMed: 33551510
DOI: 10.2471/BLT.19.237248
American Journal of Public Health, Dec 2021
High-quality data are accurate, relevant, and timely. Large national health surveys have always balanced the implementation of these quality dimensions to meet the needs of diverse users. The COVID-19 pandemic shifted these balances, with both disrupted survey operations and a critical need for relevant and timely health data for decision-making. The National Health Interview Survey (NHIS) responded to these challenges with several operational changes to continue production in 2020. However, data files from the 2020 NHIS were not expected to be publicly available until fall 2021. To fill the gap, the National Center for Health Statistics (NCHS) turned to 2 online data collection platforms, the Census Bureau's Household Pulse Survey (HPS) and the NCHS Research and Development Survey (RANDS), to collect COVID-19-related data more quickly. This article describes the adaptations of NHIS and the use of HPS and RANDS during the pandemic in the context of the recently released Framework for Data Quality from the Federal Committee on Statistical Methodology. (Am J Public Health. 2021;111(12):2167-2175. https://doi.org/10.2105/AJPH.2021.306516)
Topics: Bias; COVID-19; Cross-Sectional Studies; Data Collection; Health Surveys; Humans; Internet; Interviews as Topic; National Center for Health Statistics, U.S.; Pandemics; SARS-CoV-2; Sociodemographic Factors; Telephone; United States
PubMed: 34878857
DOI: 10.2105/AJPH.2021.306516
BMC Medical Informatics and Decision Making, Apr 2022
BACKGROUND
Electronic sources (eSources) can improve data quality and reduce clinical trial costs. Our team has developed an innovative eSource record (ESR) system in China. This study aims to evaluate the efficiency, quality, and system performance of the ESR system in data collection and data transcription.
METHODS
The study used time-efficiency and data transcription accuracy indicators to compare the eSource and non-eSource data collection workflows in a real-world study (RWS). The two workflows were traditional data collection with manual transcription (the non-eSource method) and ESR-based source data collection with electronic transmission (the eSource method). Participants' experience of using the ESR was evaluated with the System Usability Scale (SUS) and additional scales covering system security, system compatibility, and record quality.
RESULTS
For source data collection (the total time required to write electronic medical records (EMRs)), the ESR system reduced the time required by 39% on average compared with the EMR system. For data transcription (electronic case report form (eCRF) filling and verification), the ESR reduced the time required by 80% compared with the non-eSource method (difference: 223 ± 21 s). The accuracy of ESR-filled eCRF fields was 96.92%. The SUS score of the ESR was 66.9 ± 16.7, a grade of D, very close to the acceptable margin and indicating that optimization work is needed.
CONCLUSIONS
This preliminary evaluation shows that in the clinical medical environment, the ESR-based eSource method can improve the efficiency of source data collection and reduce the workload required to complete data transcription.
Topics: Data Accuracy; Data Collection; Electronic Health Records; Humans; Research Design; Workflow
PubMed: 35410214
DOI: 10.1186/s12911-022-01824-7
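The SUS figure reported above (66.9 ± 16.7) comes from a standard instrument: ten alternating positively and negatively worded items answered on a 1-5 scale, rescaled to 0-100. A sketch of the conventional scoring rule (Brooke's SUS), applied to one hypothetical participant:

```python
def sus_score(responses):
    """System Usability Scale score from the 10 item responses (each 1-5).

    Odd-numbered items are positively worded (item score = response - 1);
    even-numbered items are negatively worded (item score = 5 - response).
    The sum of item scores is multiplied by 2.5 to give a 0-100 scale.
    """
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# One hypothetical participant's responses to the 10 SUS items.
print(sus_score([4, 2, 4, 3, 4, 2, 3, 2, 4, 3]))  # 67.5, near the reported mean
```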
Journal of Medical Internet Research, May 2020
BACKGROUND
Roadside observational studies play a fundamental role in designing evidence-informed strategies to address the pressing global health problem of road traffic injuries. Paper-based data collection has been the standard method for such studies, although digital methods are gaining popularity in all types of primary data collection.
OBJECTIVE
This study aims to understand the reliability, productivity, and efficiency of paper vs digital data collection based on three different road user behaviors: helmet use, seatbelt use, and speeding. It also aims to understand the cost and time efficiency of each method and to evaluate potential trade-offs among reliability, productivity, and efficiency.
METHODS
A total of 150 observational sessions were conducted simultaneously for each risk factor in Mumbai, India, across two rounds of data collection. We matched the simultaneous digital and paper observation periods by date, time, and location, and compared the reliability by subgroups and the productivity using Pearson correlations (r). We also conducted logistic regressions separately by method to understand how similar results of inferential analyses would be. The time to complete an observation and the time to obtain a complete dataset were also compared, as were the total costs in US dollars for fieldwork, data entry, management, and cleaning.
RESULTS
Productivity was higher with paper than digital methods in each round for each risk factor. However, the sample sizes across both methods provided a precision of 0.7 percentage points or smaller. The gap between digital and paper data collection productivity narrowed across rounds, with correlations improving from r=0.27-0.49 to r=0.89-0.96. Reliability in risk factor proportions was between 0.61 and 0.99, improving between the two rounds for each risk factor. The results of the logistic regressions were also largely comparable between the two methods. Differences in regression results were largely attributable to small sample sizes in some variable levels or to random error in variables where the prevalence of the outcome was similar among variable levels. Although data collectors were able to complete an observation more quickly using paper, the digital dataset was available approximately 9 days sooner. Although fixed costs were higher for digital data collection, variable costs were much lower, resulting in a 7.73% (US $3011/$38,947) lower overall cost.
CONCLUSIONS
Our study did not face trade-offs among time efficiency, cost efficiency, statistical reliability, and descriptive comparability when deciding between digital and paper, as digital data collection proved equivalent or superior on these domains in the context of our project. As trade-offs among cost, timeliness, and comparability (and the relative importance of each) could be unique to every data collection project, researchers should carefully consider the questionnaire complexity, target sample size, implementation plan, cost and logistical constraints, and geographical context when making the decision between digital and paper.
Topics: Accidents, Traffic; Data Collection; Efficiency; Humans; Information Technology; Paper; Prevalence; Risk Factors; Surveys and Questionnaires; Telemedicine
PubMed: 32348273
DOI: 10.2196/17129
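The productivity comparison above rests on Pearson correlations between matched paper and digital observation sessions. A self-contained sketch of that calculation, using hypothetical matched session counts rather than the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical counts of observed road users from sessions matched by date,
# time, and location (paper vs digital collectors observing the same site).
paper = [112, 98, 141, 87, 123, 105]
digital = [104, 95, 150, 80, 119, 99]
print(round(pearson_r(paper, digital), 2))
```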
American Journal of Public Health, Dec 2021
While underscoring the need for timely, nationally representative data in ambulatory, hospital, and long-term-care settings, the COVID-19 pandemic posed many challenges to traditional methods and mechanisms of data collection. To continue generating data from health care and long-term-care providers and establishments in the midst of the COVID-19 pandemic, the National Center for Health Statistics had to modify survey operations for several of its provider-based National Health Care Surveys, including quickly adding survey questions that captured the experiences of providing care during the pandemic. With the aim of providing information that may be useful to other health care data collection systems, this article presents some key challenges that affected data collection activities for these national provider surveys, as well as the measures taken to minimize the disruption in data collection and to optimize the likelihood of disseminating quality data in a timely manner. (Am J Public Health. 2021;111(12):2141-2148. https://doi.org/10.2105/AJPH.2021.306514)
Topics: Ambulatory Care; COVID-19; Data Collection; Electronic Health Records; Health Care Surveys; Hospitalization; Humans; Long-Term Care; Pandemics; SARS-CoV-2; Time Factors; United States
PubMed: 34878878
DOI: 10.2105/AJPH.2021.306514
Journal of Medical Internet Research, Feb 2021
BACKGROUND
The World Health Organization has recognized the importance of assessing population-level mental health during the COVID-19 pandemic. During a global crisis such as the COVID-19 pandemic, a timely surveillance method is urgently needed to track the impact on public mental health.
OBJECTIVE
This brief systematic review focused on the efficiency and quality of data collection in mental health studies conducted during the COVID-19 pandemic.
METHODS
We searched the PubMed database using the following search strings: ((COVID-19) OR (SARS-CoV-2)) AND ((Mental health) OR (psychological) OR (psychiatry)). We screened the titles, abstracts, and texts of the published papers to exclude irrelevant studies. We used the Newcastle-Ottawa Scale to evaluate the quality of each research paper.
RESULTS
Our search yielded 37 relevant mental health surveys of the general public that were conducted during the COVID-19 pandemic, as of July 10, 2020. All these public mental health surveys were cross-sectional in design, and the journals efficiently made these articles available online in an average of 18.7 (range 1-64) days from the date they were received. The average duration of recruitment periods was 9.2 (range 2-35) days, and the average sample size was 5137 (range 100-56,679). However, 73% (27/37) of the selected studies had Newcastle-Ottawa Scale scores of <3 points, which suggests that these studies are of very low quality for inclusion in a meta-analysis.
CONCLUSIONS
The studies examined in this systematic review used an efficient data collection method, but there was a high risk of bias, in general, among the existing public mental health surveys. Therefore, following recommendations to avoid selection bias, or employing novel methodologies that combine a longitudinal design with high temporal resolution, would help provide a strong basis for the formation of national mental health policies.
Topics: COVID-19; Cross-Sectional Studies; Data Collection; Health Surveys; Humans; Mental Health; Pandemics; SARS-CoV-2
PubMed: 33481754
DOI: 10.2196/25118
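The review's search can be reproduced programmatically against PubMed via NCBI's public E-utilities esearch endpoint. A minimal sketch using only the Python standard library; the authors' cutoff of July 10, 2020 would additionally require esearch's date-range parameters, omitted here for brevity:

```python
import json
import urllib.parse
import urllib.request

# The review's PubMed query, reproduced from the Methods above; esearch is
# NCBI's public E-utilities search endpoint (light use needs no API key).
query = ("((COVID-19) OR (SARS-CoV-2)) AND "
         "((Mental health) OR (psychological) OR (psychiatry))")
params = urllib.parse.urlencode({"db": "pubmed", "term": query,
                                 "retmode": "json", "retmax": 20})
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + params

with urllib.request.urlopen(url) as resp:
    result = json.load(resp)["esearchresult"]

print(result["count"])   # total number of matching records
print(result["idlist"])  # first 20 PMIDs, e.g. for title/abstract screening
```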
Structure (London, England: 1993), Dec 2000
To increase the efficiency of diffraction data collection for protein crystallographic studies, an automated system designed to store frozen protein crystals, mount them sequentially, align them to the X-ray beam, collect complete data sets, and return the crystals to storage has been developed. Advances in X-ray data collection technology including more brilliant X-ray sources, improved focusing optics, and faster-readout detectors have reduced diffraction data acquisition times from days to hours at a typical protein crystallography laboratory [1,2]. In addition, the number of high-brilliance synchrotron X-ray beam lines dedicated to macromolecular crystallography has increased significantly, and data collection times at these facilities can be routinely less than an hour per crystal. Because the number of protein crystals that may be collected in a 24 hr period has substantially increased, unattended X-ray data acquisition, including automated crystal mounting and alignment, is a desirable goal for protein crystallography. The ability to complete X-ray data collection more efficiently should impact a number of fields, including the emerging structural genomics field [3], structure-directed drug design, and the newly developed screening by X-ray crystallography [4], as well as small molecule applications.
Topics: Crystallization; Crystallography, X-Ray; Data Collection; Drug Design; Drug Storage; Protein Engineering; Proteins; Robotics; Software
PubMed: 11188700
DOI: 10.1016/s0969-2126(00)00535-9
PLoS ONE, 2015
In the social sciences, there is a longstanding tension between data collection methods that facilitate quantification and those that are open to unanticipated information. Advances in technology now enable new, hybrid methods that combine some of the benefits of both approaches. Drawing inspiration from online information aggregation systems like Wikipedia and from traditional survey research, we propose a new class of research instruments called wiki surveys. Just as Wikipedia evolves over time based on contributions from participants, we envision an evolving survey driven by contributions from respondents. We develop three general principles that underlie wiki surveys: they should be greedy, collaborative, and adaptive. Building on these principles, we develop methods for data collection and data analysis for one type of wiki survey, a pairwise wiki survey. Using two proof-of-concept case studies involving our free and open-source website www.allourideas.org, we show that pairwise wiki surveys can yield insights that would be difficult to obtain with other methods.
Topics: Data Collection; Humans; Internet; Social Sciences; Surveys and Questionnaires
PubMed: 25992565
DOI: 10.1371/journal.pone.0123483
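In a pairwise wiki survey, each respondent is shown two ideas and picks one, so every response is a (winner, loser) pair. The authors develop a more careful statistical estimator; the sketch below computes only the naive baseline, each idea's share of contests won, on hypothetical votes:

```python
from collections import defaultdict

def win_rates(votes):
    """Naive score per idea: the share of its pairwise contests it won.

    votes is an iterable of (winner, loser) pairs, one per response.
    This baseline is only meant to convey the shape of pairwise wiki
    survey data, not the paper's estimator.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {idea: wins[idea] / appearances[idea] for idea in appearances}

# Hypothetical responses: each tuple is (chosen idea, rejected idea).
votes = [("bike lanes", "more parking"), ("bike lanes", "new benches"),
         ("new benches", "more parking"), ("more parking", "new benches")]
print(win_rates(votes))
```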
Health Technology Assessment, May 2012
Review
BACKGROUND
Many studies in health sciences research rely on collecting participant-reported outcomes and attention is increasingly being paid to the mode of data collection. Consideration needs to be given to the validity of response via different modes and the impact that choice of mode might have on study conclusions.
OBJECTIVES
(1) To provide an overview of the theoretical models of survey response and how they relate to health research; (2) to review all studies comparing two modes of administration for subjective outcomes and assess the impact of mode of administration on response quality; (3) to explore the impact of findings for key identified health-related measures; and (4) to inform the analysis of multimode studies.
DATA SOURCES
A broad range of databases (e.g., EMBASE, PsycINFO, MEDLINE, EconLit, SPORTDiscus) were chosen to allow as comprehensive a selection as possible and were searched up to the end of 2004.
REVIEW METHODS
The abstracts were reviewed against inclusion/exclusion criteria. Full papers were retrieved for all selected abstracts and then screened again using more detailed inclusion criteria related to the measures used. Papers that remained were reviewed in full, and detailed data were extracted. At each stage, abstracts or papers were reviewed by a single reviewer.
RESULTS
The search strategy identified 39,253 unique references, of which 2156 were considered as full papers, with 381 finally included in the review. Two features of mode were clearly associated with bias in response; however, none of the features of mode was associated with changes in precision. How the measure was administered, by an interviewer or by the person themselves, was highly significantly associated with bias (p < 0.001). A difference in sensory stimuli was also significant (p = 0.03). When both of these were present, the average overall bias was < 1 point on a percentage scale. In terms of mediating factors, there was some suggestion of an interaction between data collection by telephone or computer and date of publication, supporting the theory that differences disappear as new technologies become commonplace. Single-item measures were also related to greater degrees of bias than multi-item scales (p = 0.01). Individual analysis of the 36-item Short Form Health Survey (SF-36) and the Minnesota Multiphasic Personality Inventory (MMPI) showed a varied pattern across the different subscales, with conflicting results between the two types of study. None of the MMPI measures used to detect deviant responding showed a relationship with the mode features tested. The limits-of-agreement analysis showed how variable measures were between modes at an individual rather than a group-mean level.
LIMITATIONS
The search strategy covered the period up to 2004, so any new and emerging technologies were not included. Not all potential mode features were tested and there was limited information on potential mediating factors.
CONCLUSIONS
Researchers need to be aware of the different mode features that could have an impact on their results when selecting a mode of data collection for subjective outcomes. Further mode comparison studies that manipulate mode features and directly assess impact over time would be beneficial.
Topics: Confidence Intervals; Data Collection; Health Surveys; Humans; Reproducibility of Results; Research Design; Self Report
PubMed: 22640750
DOI: 10.3310/hta16270
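The limits-of-agreement analysis mentioned above is the Bland-Altman approach: for paired measurements of the same respondents under two modes, compute the mean difference and the interval mean ± 1.96 SD of the differences. A minimal sketch on hypothetical paired scores:

```python
import math

def limits_of_agreement(mode_a, mode_b):
    """Bland-Altman 95% limits of agreement for paired measurements.

    Returns (mean difference, lower limit, upper limit), where the
    limits are mean_diff +/- 1.96 * SD of the paired differences.
    """
    diffs = [a - b for a, b in zip(mode_a, mode_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d, mean_d - 1.96 * sd, mean_d + 1.96 * sd

# Hypothetical paired scores: the same respondents measured once by
# interviewer administration (mode_a) and once by self-completion (mode_b).
mode_a = [72, 65, 80, 58, 90, 77, 63, 85]
mode_b = [70, 68, 78, 55, 93, 74, 66, 82]
print(limits_of_agreement(mode_a, mode_b))
```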