Journal of Medical Internet Research, May 2021
BACKGROUND
Perioperative quantitative monitoring of neuromuscular function in patients receiving neuromuscular blockers has become internationally recognized as an absolute and core necessity in modern anesthesia care. Because of their kinetic nature, artifactual recordings from acceleromyography-based neuromuscular monitoring devices are not unusual. These artifacts generate considerable skepticism among anesthesiologists and constitute an obstacle to the widespread adoption of such devices. Through outlier analysis techniques, monitoring devices can learn to detect and flag signal abnormalities. Outlier analysis (or anomaly detection) refers to the problem of finding patterns in data that do not conform to expected behavior.
OBJECTIVE
This study was motivated by the development of a smartphone app intended for neuromuscular monitoring based on combined accelerometric and angular hand movement data. During the paired comparison stage of this app against existing acceleromyography monitoring devices, it was noted that the results from the two devices did not always concur. This study aims to engineer a set of features that enable the detection of outliers in the form of erroneous train-of-four (TOF) measurements from an acceleromyography-based device. The potential of these features to detect erroneous TOF measurements is then tested by developing an outlier detection algorithm.
METHODS
A data set encompassing 533 high-sensitivity TOF measurements from 35 patients was created based on a multicentric, open-label trial of a purpose-built accelerometer- and gyroscope-based neuromuscular monitoring app. A basic set of features was extracted from the raw data, while a second set of features was purpose-engineered based on TOF pattern characteristics. Two cost-sensitive logistic regression (CSLR) models were deployed to evaluate the performance of these feature sets. The final output of the developed models was a binary classification indicating whether a TOF measurement was an outlier or not.
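The pattern-based feature idea can be illustrated with a minimal, hypothetical sketch. The feature names and thresholds below are invented for illustration and are not the study's actual feature set or model; the only domain facts assumed are the conventional definition of the TOF ratio as T4/T1 and the expectation that twitch amplitudes fade (decrease) along the train during partial neuromuscular block.

```python
# Illustrative sketch only: neither the study's actual features nor its model.
# A train-of-four (TOF) measurement is four evoked twitch amplitudes T1..T4;
# the TOF ratio is conventionally T4/T1.  One plausible pattern-based feature
# flags physiologically implausible responses, since amplitudes should not
# increase along the train during partial block.

def tof_features(twitches):
    """Return a small feature dict for one TOF measurement (list of 4 floats)."""
    t1, t2, t3, t4 = twitches
    ratio = t4 / t1 if t1 > 0 else 0.0
    # "Fade" means successive twitches weaken; an increase mid-train is suspect.
    monotonic_fade = all(a >= b for a, b in zip(twitches, twitches[1:]))
    return {"tof_ratio": ratio, "monotonic_fade": monotonic_fade}

def looks_like_outlier(twitches):
    f = tof_features(twitches)
    # Ratios above 1.2 or non-fading patterns are flagged as candidate outliers
    # (the 1.2 cutoff is an arbitrary illustrative choice).
    return f["tof_ratio"] > 1.2 or not f["monotonic_fade"]

print(looks_like_outlier([1.0, 0.8, 0.6, 0.5]))  # plausible fade -> False
print(looks_like_outlier([1.0, 0.4, 0.9, 1.6]))  # erratic pattern -> True
```

In the study itself, such features were fed to a cost-sensitive classifier rather than hard-coded rules; the sketch only shows how pattern characteristics can be turned into inputs for that kind of model.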
RESULTS
A total of 7 basic features were extracted from the raw data, while another 8 features were engineered based on TOF pattern characteristics. The model training and testing were based on separate data sets: one with 319 measurements (18 outliers) and a second with 214 measurements (12 outliers). The F1 score (95% CI) was 0.86 (0.48-0.97) for the CSLR model with engineered features, significantly larger than that of the CSLR model with the basic features (0.29 [0.17-0.53]; P<.001).
CONCLUSIONS
The set of engineered features and their corresponding incorporation in an outlier detection algorithm have the potential to increase overall neuromuscular monitoring data consistency. Integrating outlier flagging algorithms within neuromuscular monitors could potentially reduce overall acceleromyography-based reliability issues.
TRIAL REGISTRATION
ClinicalTrials.gov NCT03605225; https://clinicaltrials.gov/ct2/show/NCT03605225.
Topics: Accelerometry; Humans; Machine Learning; Neuromuscular Blockade; Neuromuscular Monitoring; Reproducibility of Results
PubMed: 34152273
DOI: 10.2196/25913
Journal of Applied Statistics, 2020
Outlier detection can be seen as a pre-processing step for locating data points in a data sample that do not conform to the majority of observations. Various techniques and methods for outlier detection can be found in the literature dealing with different types of data. However, many data sets are inflated by true zeros and, in addition, some components/variables may be of compositional nature. Important examples of such data sets are the Structural Earnings Survey, the Structural Business Statistics, the European Statistics on Income and Living Conditions, tax data or - as in this contribution - household expenditure data, which are used, for example, to estimate the Purchasing Power Parity of a country. In this work, robust univariate and multivariate outlier detection methods are compared by a complex simulation study that considers various challenges found in data sets, namely structural (true) zeros, missing values, and compositional variables. These circumstances make it difficult or impossible to flag true outliers and influential observations with well-known outlier detection methods. Our aim is to assess the performance of outlier detection methods in terms of their effectiveness in identifying outliers when applied to challenging data sets such as the household expenditure data surveyed all over the world. The different methods are evaluated through a close-to-reality simulation study, and differences in performance between univariate and multivariate robust techniques for outlier detection, as well as their shortcomings, are reported. We found that robust multivariate methods outperform robust univariate methods. The best-performing methods in finding the outliers and in providing a low false discovery rate were found to be the generalized S estimators (GSE), the BACON-EEM algorithm and a compositional method (CoDa-Cov).
In addition, these methods also performed best when the outliers were imputed based on the corresponding outlier detection method and indicators were estimated from the data sets.
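As a point of reference, the univariate robust baseline that such studies compare against can be sketched in a few lines. The modified z-score rule below (median and median absolute deviation with the usual 0.6745 scaling constant) is a standard textbook method, not one of the multivariate estimators (GSE, BACON-EEM, CoDa-Cov) evaluated in the paper, and the data are invented.

```python
import statistics

def mad_outliers(values, cutoff=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds `cutoff`.

    0.6745 rescales the MAD so the score is comparable to an ordinary
    z-score under normality; 3.5 is the commonly cited default cutoff.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]

spend = [100, 102, 98, 105, 97, 101, 950]  # one inflated expenditure value
print(mad_outliers(spend))  # only the last value is flagged
```

Unlike the mean and standard deviation, the median and MAD are barely moved by the single extreme value, which is exactly the robustness property the paper's multivariate methods generalize to many dimensions.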
PubMed: 35707025
DOI: 10.1080/02664763.2019.1671961
Genes, Oct 2021
Domestication of teleost fish is a recent development, and in most cases started less than 50 years ago. Shedding light on the genomic changes in key economic traits during the domestication process can provide crucial insights into the evolutionary processes involved and help inform selective breeding programmes. Here we report on the recent domestication of a native marine teleost species in New Zealand, the Australasian snapper. Specifically, we use genome-wide data from a three-generation pedigree of this species to uncover genetic signatures of domestication selection for growth. Genotyping-by-sequencing (GBS) was used to generate genome-wide SNP data from the three-generation pedigree to calculate generation-wide averages of FST between every generation pair. The level of differentiation between generations was further investigated using ADMIXTURE analysis and principal component analysis (PCA). Genome scans using BayeScan, LFMM and XP-EHH were then applied to identify SNP variants under putative selection following selection for growth. Finally, genes near candidate SNP variants were annotated to gain functional insights. The analysis showed that FST values between generations slightly increased as generational time increased. The extent of these changes was small, and both the ADMIXTURE analysis and the PCA were unable to form clear clusters. Genome scans revealed a number of SNP outliers, indicative of selection, of which a small number overlapped across analysis methods and populations. Genes of interest in proximity to putatively selected SNPs were associated with growth, immunity, neural development and behaviour, and tumour repression. Even though few genes overlapped between outlier SNP methods, gene functionalities showed greater overlap between methods.
While the genetic changes observed were small in most cases, a number of outlier SNPs could be identified, some of which were found by more than one method. Multiple outlier SNPs appeared to be predominantly linked to gene functionalities that modulate growth and survival. Ultimately, the results help shed light on the genomic changes occurring during the early stages of domestication selection in teleost fish species such as snapper, and provide useful candidates for ongoing and future selective breeding of this and related species.
Topics: Animals; Biological Evolution; Domestication; Genome; Genome-Wide Association Study; Genotyping Techniques; New Zealand; Pedigree; Perciformes; Phenotype; Polymorphism, Single Nucleotide; Selective Breeding
PubMed: 34828341
DOI: 10.3390/genes12111737
Journal of Applied Statistics, 2021
This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to simultaneously achieve outlier detection and robust variable selection. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies are conducted to evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods outperform existing ones when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.
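The robustness that motivates least absolute deviation (LAD) over least squares can be illustrated with a toy constant-only model, where the squared-error fit is the sample mean and the absolute-error fit is the sample median; the data below are invented and this is not the paper's PWLAD procedure.

```python
import statistics

# For a constant-only model, least squares picks the mean, LAD picks the
# median.  A single response outlier drags the mean but barely moves the
# median -- the property PWLAD exploits in full regression.
clean = [2.0, 2.1, 1.9, 2.0, 2.2]
with_outlier = clean + [20.0]

print(statistics.mean(clean), statistics.median(clean))            # ~2.04, 2.0
print(statistics.mean(with_outlier), statistics.median(with_outlier))
# The mean jumps from ~2.04 to ~5.03; the median moves only to ~2.05.
```

The weighting and penalization in PWLAD build on this basic insensitivity of absolute-error fits to large response outliers.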
PubMed: 35707691
DOI: 10.1080/02664763.2020.1722079
Entropy (Basel, Switzerland), May 2023
Outliers are often present in data, and many algorithms exist to find them. Often we can verify these outliers to determine whether they are data errors or not. Unfortunately, checking such points is time-consuming, and the underlying issues leading to the data error can change over time. An outlier detection approach should therefore be able to make optimal use of the knowledge gained from the verification of the ground truth and adjust accordingly. With advances in machine learning, this can be achieved by applying reinforcement learning to a statistical outlier detection approach. The approach combines an ensemble of proven outlier detection methods with a reinforcement learner that tunes the coefficients of the ensemble with every additional bit of data. The performance and the applicability of the reinforcement learning outlier detection approach are illustrated using granular data reported by Dutch insurers and pension funds under the Solvency II and FTK frameworks. The application shows that outliers can be identified by the ensemble learner. Moreover, applying the reinforcement learner on top of the ensemble model can further improve the results by optimising the coefficients of the ensemble learner.
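The idea of feedback-tuned ensemble coefficients can be sketched minimally. This is not the paper's algorithm: the update rule, learning rate, and scores below are invented to show the mechanism, namely that detectors whose scores agree with verified labels gradually receive more weight.

```python
# Minimal sketch (not the paper's algorithm): an ensemble combines the
# scores of several outlier detectors through coefficients, and feedback
# from verified cases nudges the coefficients toward the detectors that
# agreed with the ground truth.

def ensemble_score(scores, weights):
    """Weighted combination of per-detector outlier scores in [0, 1]."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

def update_weights(weights, scores, is_true_outlier, lr=0.5):
    """Reward detectors whose score was close to the verified label."""
    target = 1.0 if is_true_outlier else 0.0
    return [w * (1 + lr * (1 - abs(s - target))) for w, s in zip(weights, scores)]

weights = [1.0, 1.0]   # two detectors, initially weighted equally
scores = [0.9, 0.2]    # detector 1 says outlier, detector 2 does not
print(round(ensemble_score(scores, weights), 3))   # 0.55 before feedback

weights = update_weights(weights, scores, is_true_outlier=True)
print(round(ensemble_score(scores, weights), 3))   # detector 1 now counts more
```

Repeating the update over a stream of verified cases shifts the ensemble toward whichever detectors track the ground truth, which is the adaptive behaviour the abstract describes.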
PubMed: 37372186
DOI: 10.3390/e25060842
Mikrochimica Acta, Jan 2024
Review
In this tutorial review, we provide a guiding reference on good practice in building calibration and correlation experiments, and we explain how the results should be evaluated and interpreted. The review centers on calibration experiments where the relationship between response and concentration is expected to be linear, although some of the described principles of good practice can be applied to non-linear systems as well. Furthermore, it gives prominence to the meaning and correct interpretation of some of the statistical terms commonly associated with calibration and regression. To reach a mutual understanding in this significant field, we present, through a practical example, a step-by-step procedure that deals with typical challenges related to linearity and outlier assessment, calculation of the error associated with a predicted concentration, and limits of detection. The use of regression lines to compare analytical methods is also elaborated. The regression and correlation results are obtained with Microsoft Excel, perhaps one of the most widely used and user-friendly software packages in education and research.
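The core of the workflow the review walks through in Excel can be sketched with stdlib Python: fit a straight calibration line by ordinary least squares, then inspect residuals to assess linearity and spot a candidate outlier. The concentrations, responses, and the residual-inspection step below are illustrative, not taken from the review.

```python
# Stdlib-only sketch of a linear calibration fit plus residual inspection.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def residuals(xs, ys, a, b):
    return [y - (a + b * x) for x, y in zip(xs, ys)]

conc = [0, 1, 2, 3, 4, 5]                    # standard concentrations
resp = [0.02, 1.01, 2.00, 2.98, 4.60, 5.01]  # responses; the point at x=4 is off

a, b = fit_line(conc, resp)
res = residuals(conc, resp, a, b)
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print(worst)  # index of the largest residual -> the candidate outlier (4)
```

A formal outlier decision would use a statistical test on the residual rather than just picking the largest one, which is exactly the kind of step-by-step assessment the review elaborates.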
PubMed: 38191690
DOI: 10.1007/s00604-023-06157-4
Sensors (Basel, Switzerland), Apr 2021
For statistical space-time adaptive processing (STAP), a critical issue is estimating the clutter covariance matrix (CCM). However, in the heterogeneous environments realistically faced by airborne radar, it is difficult to obtain sufficient training samples that satisfy the independent and identically distributed (IID) condition. Moreover, contaminated training samples should be eliminated before CCM estimation. To address the computational complexity and outlier susceptibility of the traditional generalized inner product (GIP) method, a clutter-subspace-based training-sample selection method is proposed that exploits the specific distribution of the clutter spectrum in the space-time plane. Theoretical analysis and simulation results verify that the proposed method constructs the CCM easily and has lower computational complexity and lower sensitivity to outliers.
PubMed: 33946952
DOI: 10.3390/s21093108
The Journal of Trauma and Acute Care..., Aug 2019
BACKGROUND
Expected performance rates for various outcome metrics are a hallmark of the hospital quality indicators used by the Agency for Healthcare Research and Quality, the Centers for Medicare and Medicaid Services, and the National Quality Forum. The identification of outlier hospitals with above- and below-expected mortality for emergency general surgery (EGS) operations is therefore of great value for EGS quality improvement initiatives. The aim of this study was to determine hospital variation in mortality after EGS operations and to compare characteristics between outlier hospitals.
METHODS
Using data from the California State Inpatient Database (2010-2011), we identified patients who underwent one of eight common EGS operations. Expected mortality was obtained from a Bayesian model, adjusting for both patient- and hospital-level variables. A hospital-level standardized mortality ratio (SMR) was constructed (ratio of observed to expected deaths). Only hospitals performing three or more of each operation were included. An "outlier" hospital was defined as having an SMR with 80% confidence interval that did not cross 1.0. High- and low-mortality SMR outliers were compared.
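The outlier rule described above can be illustrated with a toy calculation: SMR is observed over expected deaths, and a hospital is an "outlier" when the confidence interval of its SMR excludes 1.0. The crude interval below is built from a rough Poisson-like standard error on the observed count; the study itself derived expected deaths and intervals from a Bayesian model, and the counts are invented.

```python
import math

def smr_outlier(observed, expected, z=1.28):
    """Return (SMR, is_outlier) using a crude 80% CI (z ~ 1.28).

    Rough sketch: treats the observed count as Poisson, so the standard
    error of observed/expected is approximated by sqrt(observed)/expected.
    """
    smr = observed / expected
    half = z * math.sqrt(observed) / expected
    lo, hi = smr - half, smr + half
    return smr, not (lo <= 1.0 <= hi)

print(smr_outlier(40, 16))   # SMR 2.5, CI excludes 1.0 -> flagged
print(smr_outlier(15, 14))   # SMR near 1.0, CI covers 1.0 -> not flagged
```

The same mechanics explain the study's thresholds: only hospitals whose entire 80% interval sits above or below 1.0 were labeled high or low SMR outliers.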
RESULTS
There were 140,333 patients included from 220 hospitals. Standardized mortality ratio varied from a high of 2.6 (mortality, 160% higher than expected) to a low of 0.2 (mortality, 80% lower than expected); 12 hospitals were high SMR outliers, and 28 were low SMR outliers. Standardized mortality was over three times worse in the high SMR outliers compared with the low SMR outliers (1.7 vs. 0.5; p < 0.001). Hospital-, patient-, and operative-level characteristics were equivalent in each outlier group.
CONCLUSION
There exists significant hospital variation in standardized mortality after EGS operations. High SMR outliers have significant excess mortality, while low SMR outliers have superior EGS survival. Common hospital-level characteristics do not explain the wide gap between underperforming and overperforming outlier institutions. These findings suggest that SMR can help guide assessment of EGS performance across hospitals; further research is essential to identify and define the hospital processes of care which translate into optimal EGS outcomes.
LEVEL OF EVIDENCE
Epidemiologic Study, level III.
Topics: California; Emergencies; Female; Hospital Mortality; Hospitals; Humans; Male; Middle Aged; Quality Improvement; Quality Indicators, Health Care; Surgical Procedures, Operative
PubMed: 30908450
DOI: 10.1097/TA.0000000000002271
Cureus, Mar 2023
Objectives
Clinical discoveries are heralded by observing unique and unusual clinical cases. The effort of identifying such cases rests on the shoulders of busy clinicians. We assess the feasibility and applicability of an augmented intelligence framework to accelerate the rate of clinical discovery in preeclampsia and hypertensive disorders of pregnancy, an area that has seen little change in its clinical management.
Methods
We conducted a retrospective exploratory outlier analysis of participants enrolled in the folic acid clinical trial (FACT, N=2,301) and the Ottawa and Kingston birth cohort (OaK, N=8,085). We applied two outlier analysis methods: extreme misclassification contextual outlier and isolation forest point outlier. The extreme misclassification contextual outlier approach is based on a random forest predictive model for the outcome of preeclampsia in FACT and hypertensive disorder of pregnancy in OaK. We defined outliers in the extreme misclassification approach as mislabelled observations with a confidence level of more than 90%. Within the isolation forest approach, we defined outliers as observations with an average path length z score less than or equal to -3 or greater than or equal to 3. Content experts reviewed the identified outliers and determined whether they represented a potential novelty that could conceivably lead to a clinical discovery.
Results
In the FACT study, we identified 19 outliers using the isolation forest algorithm and 13 outliers using the random forest extreme misclassification approach; three (15.8%) and 10 (76.9%), respectively, were determined to be potential novelties. Out of 8,085 participants in the OaK study, we identified 172 outliers using the isolation forest algorithm and 98 outliers using the random forest extreme misclassification approach; four (2.3%) and 32 (32.7%), respectively, were potential novelties. Overall, the outlier analysis part of the augmented intelligence framework identified a total of 302 outliers. These were subsequently reviewed by content experts, representing the human part of the augmented intelligence framework. The clinical review determined that 49 of the 302 outliers represented potential novelties.
Conclusions
Augmented intelligence using extreme misclassification outlier analysis is a feasible and applicable approach for accelerating the rate of clinical discoveries. The extreme misclassification contextual outlier analysis approach yielded a higher proportion of potential novelties than the more traditional isolation forest point outlier approach, a finding consistent in both the clinical trial and the real-world cohort study data. Using augmented intelligence through outlier analysis has the potential to speed up the process of identifying potential clinical discoveries. The approach can be replicated across clinical disciplines and could exist within electronic medical record systems to automatically flag outliers within clinical notes for clinical experts.
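The point-outlier rule stated in the methods (|z| >= 3 on isolation-forest average path lengths) is simple enough to sketch directly. The path lengths below are invented; in practice they would come from an isolation forest implementation (e.g. scikit-learn's IsolationForest), which is not reproduced here.

```python
import statistics

def path_length_outliers(path_lengths, cutoff=3.0):
    """Flag observations whose average path length has |z| >= cutoff.

    Anomalous points are isolated quickly by an isolation forest, so they
    tend to have unusually short average path lengths.
    """
    mu = statistics.mean(path_lengths)
    sd = statistics.stdev(path_lengths)
    return [abs((p - mu) / sd) >= cutoff for p in path_lengths]

# Eleven ordinary observations and one with a markedly short path length.
lengths = [9.8, 9.9, 10.0, 10.1, 10.2, 9.8, 9.9, 10.0, 10.1, 10.2, 10.0, 4.0]
print(path_length_outliers(lengths))  # only the short path is flagged
```

The extreme misclassification approach, by contrast, needs a fitted predictive model and its per-observation confidence, so it is not reducible to a one-line threshold in the same way.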
PubMed: 37009347
DOI: 10.7759/cureus.36909
Journal of Computational Biology: a..., Jun 2023
Detection of omics sample outliers is important for preventing erroneous biological conclusions, developing robust experimental protocols, and discovering rare biological states. Two recent publications describe robust algorithms for detecting transcriptomic sample outliers, but neither algorithm had been incorporated into a software tool for scientists. Here we describe Ensemble Methods for Outlier Detection (EnsMOD) which incorporates both algorithms. EnsMOD calculates how closely the quantitation variation follows a normal distribution, plots the density curves of each sample to visualize anomalies, performs hierarchical cluster analyses to calculate how closely the samples cluster with each other, and performs robust principal component analyses to statistically test if any sample is an outlier. The probabilistic threshold parameters can be easily adjusted to tighten or loosen the outlier detection stringency. EnsMOD can be used to analyze any omics dataset with normally distributed variance. Here it was used to analyze a simulated proteomics dataset, a multiomic (proteome and transcriptome) dataset, a single-cell proteomics dataset, and a phosphoproteomics dataset. EnsMOD successfully identified all of the simulated outliers, and subsequent removal of a detected outlier improved data quality for downstream statistical analyses.
Topics: Software; Algorithms; Gene Expression Profiling; Proteomics; Multiomics
PubMed: 37042708
DOI: 10.1089/cmb.2022.0243