MedRxiv: the Preprint Server For... Jun 2023
INTRODUCTION
Neuroanatomical normative modelling can capture individual variability in Alzheimer's Disease (AD). We used neuroanatomical normative modelling to track individuals' disease progression in people with mild cognitive impairment (MCI) and patients with AD.
METHODS
Cortical thickness and subcortical volume neuroanatomical normative models were generated using healthy controls (n ≈ 58,000). These models were used to calculate regional Z-scores in 4361 T1-weighted MRI time-series scans. Regions with Z-scores < -1.96 were classified as outliers, mapped on the brain, and summarised by total outlier count (tOC).
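The Z-score and tOC computation described above can be sketched as follows. The region names and normative means/SDs below are invented for illustration; in the study they come from normative models fitted on the ~58k healthy controls.

```python
# Hypothetical normative (mean, sd) per region -- illustrative values only.
NORMATIVE = {
    "hippocampus": (3.2, 0.30),
    "entorhinal":  (3.0, 0.25),
    "precuneus":   (2.5, 0.20),
}

def regional_z_scores(scan):
    """Z-score each regional measure of one scan against the normative model."""
    return {r: (scan[r] - mu) / sd for r, (mu, sd) in NORMATIVE.items()}

def total_outlier_count(scan, threshold=-1.96):
    """tOC: number of regions whose Z-score falls below the outlier cutoff."""
    return sum(z < threshold for z in regional_z_scores(scan).values())

scan = {"hippocampus": 2.4, "entorhinal": 2.9, "precuneus": 2.45}
```

Tracking the annual rate of change in tOC across a subject's scan time series then yields the individual-level progression marker used in the abstract.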
RESULTS
Rate of change in tOC increased in AD and in people with MCI who converted to AD and correlated with multiple non-imaging markers. Moreover, a higher annual rate of change in tOC increased the risk of MCI progression to AD. Brain Z-score maps showed that the hippocampus had the highest rate of atrophy change.
CONCLUSIONS
Individual-level atrophy rates can be tracked by using regional outlier maps and tOC.
PubMed: 37398392
DOI: 10.1101/2023.06.15.23291418
Analytical Chemistry Feb 2020
Previously, we introduced an approach for calculating the full object distance within the framework of Principal Component Analysis that can be applied to data exploration and classification. Here, a similar approach has been developed for regression problems, in which a total distance can be calculated for every sample in projection modeling. Based on the total distance, a threshold for outlier detection has been developed by means of a data-driven estimation of the degrees of freedom and scaling parameters for the partial distances in the projection models. A joint threshold serves as the basis for a sequential outlier detection procedure. The iterative nature of the procedure helps to overcome the masking effect among outliers, and a backward step eliminates swamping effects. Two real examples are used for illustration. The first dataset represents capsules filled with specially prepared mixtures of an active pharmaceutical ingredient and a number of excipients. This dataset is used to illustrate the behavior of possible outliers in the regression model and their corresponding locations in the X- and XY-distance plots. The second dataset consists of spectra of 135 whole wheat samples used for the prediction of protein, gluten, and moisture content. This dataset demonstrates the step-by-step application of the sequential procedure for outlier detection.
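The sequential-distance idea can be caricatured with plain PCA score and orthogonal distances. This is a simplified sketch, not the paper's method: the data-driven degrees-of-freedom estimation is replaced here by a crude mean + 3·SD cutoff on the orthogonal distance.

```python
import numpy as np

def pca_distances(X, n_components=2):
    """Score distance (within the projection model) and orthogonal distance
    (residual to it) for every sample."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_components].T                   # scores in the model plane
    sd = np.sqrt(((T / T.std(axis=0, ddof=1)) ** 2).sum(axis=1))
    od = np.linalg.norm(Xc - T @ Vt[:n_components], axis=1)
    return sd, od

def sequential_outliers(X, sd_cut=3.0, max_iter=10):
    """Remove the worst sample exceeding a joint distance threshold, refit,
    and repeat -- the refitting counters the masking effect."""
    idx = np.arange(len(X))
    outliers = []
    for _ in range(max_iter):
        sd, od = pca_distances(X[idx])
        od_cut = od.mean() + 3 * od.std(ddof=1)    # crude data-driven cutoff
        total = np.hypot(sd / sd_cut, od / od_cut)
        worst = int(np.argmax(total))
        if total[worst] <= 1.0:
            break
        outliers.append(int(idx[worst]))
        idx = np.delete(idx, worst)
    return outliers
```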
PubMed: 31880430
DOI: 10.1021/acs.analchem.9b04611
Bioinformatics (Oxford, England) Aug 2022
MOTIVATION
It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are so poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank.
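ODIN's influence measure is defined under a hierarchical generalized linear model; as a loose illustration of the underlying idea only, one can compute the leave-one-out influence of each subject's adjacency matrix on the group mean connectome (this simplification is not the ODIN algorithm).

```python
import numpy as np

def influence_scores(adjs):
    """Leave-one-out influence of each subject's adjacency matrix on the
    group mean connectome, measured in Frobenius norm."""
    adjs = np.asarray(adjs, dtype=float)
    n = len(adjs)
    mean_all = adjs.mean(axis=0)
    return np.array([
        np.linalg.norm(mean_all - (mean_all * n - a) / (n - 1))
        for a in adjs
    ])
```

A subject whose matrix is mostly zero (a failed reconstruction) moves the group mean far more than a typical subject and therefore receives a large score.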
RESULTS
ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications.
AVAILABILITY AND IMPLEMENTATION
ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Humans; Algorithms; Neuroimaging; Brain; Software
PubMed: 35762974
DOI: 10.1093/bioinformatics/btac431
Entropy (Basel, Switzerland) Mar 2023
Rate distortion theory was developed for optimizing lossy compression of data, but it also has applications in statistics. In this paper, we illustrate how rate distortion theory can be used to analyze various datasets. The analysis involves testing, identification of outliers, choice of compression rate, calculation of optimal reconstruction points, and assigning "descriptive confidence regions" to the reconstruction points. We study four models or datasets of increasing complexity: clustering, Gaussian models, linear regression, and a dataset describing orientations of early Islamic mosques. These examples illustrate how rate distortion analysis may serve as a common framework for handling different statistical problems.
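One concrete instance of calculating optimal reconstruction points is Lloyd's algorithm for a 1-D dataset at a fixed rate of log2(k) bits. This is a textbook sketch of that single step, not the paper's full procedure.

```python
def lloyd_1d(data, k=2, iters=50):
    """Lloyd's algorithm: k reconstruction points minimizing the squared
    distortion of a 1-D dataset (assumes k >= 2)."""
    data = sorted(data)
    # Initialise reconstruction points spread over the data range.
    pts = [data[int(i * (len(data) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        # Assign each sample to its nearest reconstruction point...
        cells = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda j: (x - pts[j]) ** 2)
            cells[j].append(x)
        # ...then move each point to the centroid of its cell.
        pts = [sum(c) / len(c) if c else p for c, p in zip(cells, pts)]
    return pts
```

Samples that remain far from every reconstruction point after convergence are natural outlier candidates in this framing.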
PubMed: 36981344
DOI: 10.3390/e25030456
IEEE Transactions on Neural Networks... Apr 2021
Few-shot learning (FSL) focuses on distilling transferable knowledge from existing experience to cope with novel concepts for which labeled data are scarce. A typical assumption in FSL is that the training examples of novel classes are all clean, with no outlier interference. In many realistic applications where examples are provided by users, however, data are potentially noisy or unreadable. In this context, we introduce a novel research topic, robust FSL (RFSL), in which we aim to address two types of outliers within user-provided data: the representation outlier (RO) and the label outlier (LO). Moreover, we introduce a metric for estimating robustness and use it to investigate the performance of several advanced FSL methods when faced with user-provided outliers. In addition, we propose robust attentive profile networks (RapNets) to achieve outlier suppression. The results of a comprehensive evaluation on benchmark data sets demonstrate the shortcomings of current FSL methods and the superiority of the proposed RapNets when dealing with RFSL problems, establishing a benchmark for follow-up studies.
PubMed: 32310797
DOI: 10.1109/TNNLS.2020.2984710
Scientific Reports Jan 2022
Recognition of anomalous events is a challenging but critical task in many scientific and industrial fields, especially when the properties of anomalies are unknown. In this paper, we introduce a new anomaly concept called "unicorn" or unique event and present a new, model-free, unsupervised detection algorithm to detect unicorns. The key component of the new algorithm is the Temporal Outlier Factor (TOF), which measures the uniqueness of events in continuous data sets from dynamic systems. The concept of unique events differs significantly from traditional outliers in many respects: while repetitive outliers are no longer unique events, a unique event is not necessarily an outlier; it does not necessarily fall outside the distribution of normal activity. The performance of our algorithm was examined on different types of simulated data sets containing anomalies and compared with the Local Outlier Factor (LOF) and discord discovery algorithms. TOF outperformed LOF and discord detection algorithms even in recognizing traditional outliers, and it also detected unique events that they did not. The benefits of the unicorn concept and the new detection method are illustrated by example data sets from very different scientific fields. Our algorithm successfully retrieved unique events in cases where they were already known, such as the gravitational waves of a binary black hole merger in LIGO detector data and the signs of respiratory failure in ECG data series. Furthermore, unique events were found in the LIBOR data set covering the last 30 years.
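The abstract benchmarks TOF against the Local Outlier Factor; for reference, a minimal LOF run with scikit-learn on synthetic data looks like this (default parameters; TOF itself is not implemented here).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# A small 2-D cloud with one far-away point appended at the end.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)              # -1 marks outliers
scores = -lof.negative_outlier_factor_   # larger = more outlying
```

LOF scores a point by how sparse its neighbourhood is relative to its neighbours' neighbourhoods; TOF instead scores temporal uniqueness, which is why the two flag different events.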
PubMed: 34996940
DOI: 10.1038/s41598-021-03526-y
Environmental Science and Pollution... Dec 2023
Grid-based approaches render an efficient framework for data clustering in the presence of incomplete, inexplicit, and uncertain data. This paper proposes an entropy-based grid approach (EGO) for outlier detection in clustered data. Given hard clusters obtained from a hard clustering algorithm, EGO uses the entropy of the dataset as a whole, or of an individual cluster, to detect outliers. EGO works in two steps: explicit outlier detection and implicit outlier detection. Explicit outlier detection is concerned with data points that are isolated in the grid cells: they lie either far from any dense region or isolated near one, and are therefore declared explicit outliers. Implicit outlier detection is concerned with outliers that deviate from the normal pattern in less obvious ways. Such outliers are determined from the change in entropy of the dataset, or of a specific cluster, for each deviation. An elbow criterion based on the trade-off between entropy and object geometry optimizes the outlier detection process. Experimental results on CHAMELEON datasets and other similar datasets suggest that the proposed approaches detect outliers more precisely and extend outlier detection capability by an additional 4.5% to 8.6%. Moreover, the resulting clusters become more precise and compact when the entropy-based gridding approach is applied on top of hard clustering algorithms. The performance of the proposed algorithms is compared with well-known outlier detection algorithms, including DBSCAN, HDBSCAN, RE3WC, LOF, LoOP, ABOD, CBLOF, and HBOS. Finally, a case study on detecting outliers in environmental data was carried out using the proposed approach, with results generated on our synthetically prepared datasets. The performance shows that the proposed approach may serve as an industry-oriented solution to outlier detection in environmental monitoring data.
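A minimal sketch of the grid-and-entropy machinery, assuming 2-D points and a fixed cell size (the thresholds and neighbourhood rule below are illustrative simplifications, not EGO's actual criteria):

```python
import math
from collections import Counter

def grid_cells(points, cell=1.0):
    """Assign each 2-D point to a grid cell."""
    return [(math.floor(x / cell), math.floor(y / cell)) for x, y in points]

def cell_entropy(points, cell=1.0):
    """Shannon entropy of the occupancy distribution over grid cells."""
    counts = Counter(grid_cells(points, cell))
    n = len(points)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def explicit_outliers(points, cell=1.0, min_count=2):
    """Flag points sitting nearly alone in a cell with no occupied
    neighbouring cells -- a crude stand-in for explicit outlier detection."""
    cells = grid_cells(points, cell)
    counts = Counter(cells)
    occupied = set(counts)
    out = []
    for i, c in enumerate(cells):
        neighbours = {(c[0] + dx, c[1] + dy)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)} - {c}
        if counts[c] < min_count and not (neighbours & occupied):
            out.append(i)
    return out
```

Implicit detection then compares `cell_entropy` before and after removing a candidate point: removing a true outlier lowers the entropy of the occupancy distribution.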
Topics: Entropy; Algorithms; Cluster Analysis
PubMed: 37306879
DOI: 10.1007/s11356-023-26780-1
Knee Surgery, Sports Traumatology,... Feb 2023
PURPOSE
The primary aim was to evaluate the accuracy of navigation in opening wedge high tibial osteotomy (HTO). The secondary aim was to examine mid-term outcomes after HTO.
METHODS
Inclusion criteria were patients with medial compartment knee osteoarthritis who underwent computer-assisted HTOs. Mechanical axis (MA), percentage MA (%MA), and change in posterior tibial slope (ΔPTS) were displayed on the navigation screen. Radiographic examinations included hip-knee-ankle (HKA) angle, medial proximal tibial angle (MPTA), joint line convergence angle (JLCA), and PTS. Preoperative and 5 weeks postoperative standing radiographs of the whole lower extremity and knee were used. Clinical evaluations were performed using American Knee Society knee score and function score both preoperatively and at last follow-up. Radiographic evaluations were performed by orthopedic surgeons. Intraoperative navigation after osteotomy and postoperative standing radiograph were compared. MA (HKA), %MA, and ΔPTS were compared. Outliers were defined as > 3° in MA, > 10% in %MA, and > 10° in ΔPTS. Outlier and non-outlier groups were compared. The rate of conversion to arthroplasty was examined.
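The outlier definitions above translate directly into a simple check like the following (variable names are illustrative; only the three cutoffs come from the study):

```python
def classify_outlier(ma_error_deg, pct_ma_error, delta_pts_deg):
    """Apply the study's cutoffs: >3 deg error in mechanical axis (MA),
    >10% in %MA, and >10 deg change in posterior tibial slope."""
    return {
        "MA":   abs(ma_error_deg) > 3.0,
        "%MA":  abs(pct_ma_error) > 10.0,
        "dPTS": abs(delta_pts_deg) > 10.0,
    }
```

With the reported mean errors (2.1°, 9.3%, 1.2°) no criterion fires, even though 18%, 39%, and 5% of individual knees exceeded the respective cutoffs.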
RESULTS
This study involved 38 patients (44 knees); last follow-up was at a mean of 5 years (range, 1-9 years). Mean American Knee Society knee score and function score improved significantly from 59 and 69 preoperatively to 95 and 85 at last follow-up, respectively. Absolute values of mean errors for MA, %MA, and ΔPTS were 2.1°, 9.3%, and 1.2°, respectively. Outlier rates were 18% for MA, 39% for %MA, and 5% for ΔPTS. No significant factors were found for MA and ΔPTS. For %MA, preoperative JLCA was significantly higher in the outlier group than in the non-outlier group. No knees underwent conversion to total knee arthroplasty, and no differences in outcomes were found between outlier and non-outlier groups.
CONCLUSION
Although rates of outlier values in computer-assisted opening wedge HTO were high, mid-term outcomes were excellent.
LEVEL OF EVIDENCE
IV.
Topics: Humans; Osteoarthritis, Knee; Knee Joint; Tibia; Osteotomy; Arthroplasty, Replacement, Knee; Computers; Retrospective Studies
PubMed: 34738158
DOI: 10.1007/s00167-021-06788-1
Genes Feb 2023
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. Hence, an accuracy that is either too pessimistic or too optimistic is reported, and the estimated model performance cannot be reproduced on independent data. It then also becomes doubtful whether the classifier qualifies for clinical usage. We estimated classifier performance in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we used two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample, and evaluated classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably. For the most part, removing outliers improved the classification results. Given that there are various, sometimes unclear, reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in training and test data. This provides a more complete picture of a classifier's performance and prevents reporting models that later turn out to be inapplicable for clinical diagnoses.
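The bootstrap step can be sketched as follows; the robust z-score detector here is a placeholder of my own choosing, standing in for the paper's two outlier detection methods.

```python
import numpy as np

def bootstrap_outlier_probability(X, n_boot=200, z_cut=3.0, seed=0):
    """Estimate, for each sample, how often it is flagged as an outlier when
    the detector is refit on bootstrap resamples. Samples are only scored in
    iterations where they are out-of-bag."""
    rng = np.random.default_rng(seed)
    n = len(X)
    flagged = np.zeros(n)
    counted = np.zeros(n)
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)
        med = np.median(X[boot], axis=0)
        mad = np.median(np.abs(X[boot] - med), axis=0) + 1e-12
        z = np.abs(X - med) / (1.4826 * mad)   # robust z-score detector
        oob = np.setdiff1d(np.arange(n), boot)
        flagged[oob] += (z[oob] > z_cut).any(axis=1)
        counted[oob] += 1
    return flagged / np.maximum(counted, 1)
```

Samples with a high estimated outlier probability would then be removed before re-running cross-validation, and both performance figures reported side by side.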
Topics: Transcriptome; Gene Expression Profiling; Probability; Research Design
PubMed: 36833313
DOI: 10.3390/genes14020387
BMC Public Health Jan 2023
BACKGROUND
One of the seminal events since 2019 has been the outbreak of the SARS-CoV-2 pandemic. Countries have adopted various policies to deal with it, but they also differ in their socio-geographical characteristics and public health care facilities. Our study aimed to investigate differences between epidemiological parameters across countries.
METHOD
The analysed data come from the SARS-CoV-2 repository provided by Johns Hopkins University. Separately for each country, we estimated recovery and mortality rates using the SIRD model applied to the first 30, 60, 150, and 300 days of the pandemic. Moreover, a mixture of normal distributions was fitted to the numbers of confirmed cases and deaths during the first 300 days. The estimates of the peaks' means and variances were used to identify countries with outlying parameters.
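A minimal discrete-time SIRD sketch; the parameter values below are invented, whereas the study estimates the recovery rate (gamma) and mortality rate (mu) per country from the data.

```python
def sird_step(S, I, R, D, beta, gamma, mu, N):
    """One Euler step (dt = 1 day) of the SIRD compartmental model."""
    new_inf = beta * S * I / N   # new infections
    new_rec = gamma * I          # recoveries
    new_dead = mu * I            # deaths
    return (S - new_inf,
            I + new_inf - new_rec - new_dead,
            R + new_rec,
            D + new_dead)

def simulate_sird(days, beta=0.3, gamma=0.1, mu=0.01, N=1_000_000, I0=100):
    S, I, R, D = N - I0, float(I0), 0.0, 0.0
    traj = [(S, I, R, D)]
    for _ in range(days):
        S, I, R, D = sird_step(S, I, R, D, beta, gamma, mu, N)
        traj.append((S, I, R, D))
    return traj
```

Fitting beta, gamma, and mu to each country's case, recovery, and death curves, then flagging countries whose fitted parameters lie far from the rest, mirrors the outlier analysis described above.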
RESULTS
For the 300-day window, Belgium, Cyprus, France, the Netherlands, Serbia, and the UK were classified as outliers by all three outlier detection methods. Yemen was classified as an outlier in each of the four considered timeframes, due to high mortality rates. During the first 300 days of the pandemic, most countries underwent three peaks in the number of confirmed cases; Australia and Kazakhstan were exceptions, with two peaks.
CONCLUSIONS
Considering recovery and mortality rates, we observed heterogeneity between countries. Liechtenstein was a "positive" outlier, with low mortality and high recovery rates; by contrast, Yemen was a "negative" outlier, with high mortality for all four considered periods and low recovery at 30 and 60 days.
Topics: Humans; SARS-CoV-2; COVID-19; Pandemics; Disease Outbreaks; France
PubMed: 36681790
DOI: 10.1186/s12889-023-15092-1