-
Water, Air, and Soil Pollution 2018
Low-cost urban air quality sensor networks are increasingly used to study the spatio-temporal variability in air pollutant concentrations. Recently installed low-cost urban sensors, however, are more prone to producing erroneous data than conventional monitors, leading, for example, to outliers. Commonly applied outlier detection methods are unsuitable for air pollutant measurements that have large spatial and temporal variations, as occur in urban areas. We present a novel outlier detection method based upon a spatio-temporal classification, focusing on hourly NO2 concentrations. We divide a full year's observations into 16 spatio-temporal classes, reflecting urban background vs. urban traffic stations, weekdays vs. weekends, and four periods per day. For each spatio-temporal class, we detect outliers using the mean and standard deviation of the normal distribution underlying the truncated normal distribution of the NO2 observations. Applying this method to a low-cost air quality sensor network in the city of Eindhoven, the Netherlands, we identified 0.1-0.5% of the observations as outliers. Outliers could reflect measurement errors or unusually high air pollution events. Additional evaluation using expert knowledge is needed to decide on treatment of the identified outliers. We conclude that our method is able to detect outliers while maintaining the spatio-temporal variability of air pollutant concentrations in urban areas.
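The per-class idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the six-hour period boundaries, the record field names, and the threshold `k` are assumptions made here for illustration, and the sketch uses the plain sample mean and standard deviation of each class instead of fitting the normal distribution underlying a truncated normal.

```python
from collections import defaultdict
from statistics import mean, stdev


def class_key(station_type, weekday, hour):
    """Assign one of 2 x 2 x 4 = 16 spatio-temporal classes.

    station_type: "background" or "traffic"
    weekday: 0 (Monday) .. 6 (Sunday)
    hour: 0 .. 23, split into four six-hour periods (assumed boundaries)
    """
    day_type = "weekend" if weekday >= 5 else "weekday"
    return (station_type, day_type, hour // 6)


def flag_outliers(records, k=3.0):
    """Return indices of records farther than k standard deviations
    from the mean of their own spatio-temporal class."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        key = class_key(rec["station_type"], rec["weekday"], rec["hour"])
        groups[key].append(i)

    flagged = []
    for indices in groups.values():
        values = [records[i]["value"] for i in indices]
        if len(values) < 2:
            continue  # a standard deviation needs at least two points
        mu, sigma = mean(values), stdev(values)
        for i in indices:
            if abs(records[i]["value"] - mu) > k * sigma:
                flagged.append(i)
    return sorted(flagged)
```

Because the threshold is computed per class, a concentration that is ordinary for a traffic station at rush hour is not judged against the statistics of a background station at night.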
PubMed: 29563652
DOI: 10.1007/s11270-018-3756-7 -
IEEE Transactions on Neural Networks... May 2021
In this brief, a new outlier-resistant state estimation (SE) problem is addressed for a class of recurrent neural networks (RNNs) with mixed time-delays. The mixed time-delays comprise both discrete and distributed delays that occur frequently in signal transmissions among artificial neurons. Measurement outputs are sometimes subject to abnormal disturbances (probably resulting from sensor aging/outages/faults/failures and unpredictable environmental changes), leading to measurement outliers that would deteriorate the estimation performance if taken directly into the innovation in the estimator design. We propose to use a certain confidence-dependent saturation function to mitigate the side effects of the measurement outliers on the estimation error dynamics (EEDs). Using a combination of a Lyapunov-Krasovskii functional and inequality manipulations, a delay-dependent criterion is established for the existence of the outlier-resistant state estimator, ensuring that the corresponding EED achieves asymptotic stability with a prescribed H∞ performance index. Then, the explicit characterization of the estimator gain is obtained by solving a convex optimization problem. Finally, a numerical simulation is carried out to demonstrate the usefulness of the derived theoretical results.
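The role of a saturation function on the innovation can be shown with a toy example. This is only a sketch of the general idea, assuming a scalar linear system without delays; it is not the RNN estimator of the paper, and the parameters `a`, `c`, `gain`, and `limit` are hypothetical values chosen for illustration.

```python
def saturate(value, limit):
    """Clip value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def estimator_step(x_hat, y, a=0.9, c=1.0, gain=0.5, limit=1.0):
    """One update of a scalar observer with a saturated innovation.

    x_hat: current state estimate
    y: new measurement (may contain an outlier)
    a, c: assumed scalar system and output coefficients
    gain: assumed estimator gain
    limit: assumed saturation level bounding the innovation
    """
    innovation = y - c * x_hat
    return a * x_hat + gain * saturate(innovation, limit)
```

With these assumed values, a measurement of 100 moves the estimate no further than a measurement of 1 would, so a single abnormal reading cannot drive the estimate arbitrarily far.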
PubMed: 32452774
DOI: 10.1109/TNNLS.2020.2991151 -
JCO Clinical Cancer Informatics Oct 2022
PURPOSE
Artificial intelligence (AI) models for medical image diagnosis are often trained and validated on curated data. However, in a clinical setting, images that are outliers with respect to the training data, such as those representing rare disease conditions or acquired using a slightly different setup, can lead to wrong decisions. It is not practical to expect clinicians to be trained to discount results for such outlier images. Toward clinical deployment, we have designed a method to train cautious AI that can automatically flag outlier cases.
MATERIALS AND METHODS
Our method, ClassClust, forms tight clusters of training images using supervised contrastive learning, which helps it identify outliers during testing. We compared ClassClust's ability to detect outliers with three competing methods on four publicly available data sets covering pathology, dermatoscopy, and radiology. We held out certain diseases, artifacts, and types of images from training data and examined the ability of various models to detect these as outliers during testing. We also compared the decision accuracy of the models on held-out nonoutlier images. We visualized the regions of the images that the models used for their decisions.
RESULTS
Area under receiver operating characteristic curve for outlier detection was consistently higher using ClassClust compared with the previous methods. Average accuracy on held-out nonoutlier images was also higher, and the visualizations of image regions were more informative using ClassClust.
CONCLUSION
The ability to flag outlier test cases need not be at odds with the ability to accurately classify nonoutliers in AI models. Although the latter capability has received research and regulatory attention, AI models for clinical deployment should possess the former as well.
Topics: Artificial Intelligence; Data Collection; Humans; ROC Curve; Trust
PubMed: 36228179
DOI: 10.1200/CCI.22.00067 -
IEEE Transactions on Neural Networks... Apr 2019
It is a grand challenge to identify the outliers existing in subspaces from a high-dimensional data set. A brute-force method is computationally prohibitive since it requires examining an exponential number of subspaces. Current state-of-the-art methods explore various heuristics to significantly prune subspaces, facing the tradeoff between the subspace completeness and search efficiency. In this brief, we discuss a principal type of subspace outliers whose behaviors are different from the others on individual attributes. We formulate such outliers by a novel notion of the Markov boundary-based (MBB) outliers. The central idea is that for each attribute T in a data set, we consider only the subspace representing the knowledge needed to predict the behavior on T, which is captured by the Markov boundary (MB) of T. Then, the outliers whose behavior is different from others on T can be detected in the subspace of the MB, and thus, our approach reduces the number of possible subspaces from exponential to linear with respect to dimensionality. Using both synthetic and real data sets, we validate the effectiveness and efficiency of our method.
PubMed: 30130240
DOI: 10.1109/TNNLS.2018.2861743 -
Sensors (Basel, Switzerland) Jul 2020
Sensor networks in real-world environments, such as smart cities or ambient intelligent platforms, provide applications with large and heterogeneous sets of data streams. Detecting outliers (observations that do not conform to an expected behavior) has therefore become a crucial task for establishing and maintaining secure and reliable databases on this kind of platform. However, the procedures to obtain accurate models for erratic observations have to operate with low complexity in terms of storage and computational time, in order to accommodate the limited processing and storage capabilities of the sensor nodes in these environments. In this work, we analyze three binary classifiers based on three statistical prediction models, namely ARIMA (Auto-Regressive Integrated Moving Average), GAM (Generalized Additive Model), and LOESS (LOcal RegrESSion), for outlier detection with low memory consumption and computational time rates. As a result, we provide (1) the best classifier and settings to detect outliers, based on the ARIMA model, and (2) two real-world classified datasets as ground truths for future research.
PubMed: 32751248
DOI: 10.3390/s20154217 -
Sensors (Basel, Switzerland) Sep 2023
Outliers can be generated in the power system due to aging system equipment, faulty sensors, incorrect line connections, etc. The existence of these outliers poses a threat to the safe operation of the power system, reduces the quality of the data, affects the completeness and accuracy of the data, and thus affects the monitoring, analysis, and control of the power system. Therefore, timely identification and treatment of outliers are essential to ensure stable and reliable operation of the power system. In this paper, we consider the problem of detecting and localizing outliers in power systems. The paper proposes a Minorization-Maximization (MM) algorithm for outlier detection and localization and for estimating the unknown parameters of the Gaussian mixture model (GMM). To verify the performance of the method, we conduct simulation experiments covering different test scenarios in the IEEE 14-bus system. Numerical examples show that in the presence of outliers, the MM algorithm can detect outliers better than the traditional algorithm and can accurately locate outliers with a probability of more than 95%. Therefore, the algorithm provides an effective method for handling outliers in the power system, which helps to improve the ability to monitor, analyze, and control the power system and to ensure its stable and reliable operation.
PubMed: 37836883
DOI: 10.3390/s23198053 -
Sensors (Basel, Switzerland) Apr 2021
The aim of this paper is to provide an extended analysis of outlier detection, using probabilistic and AI techniques, applied in a pilot demonstration project on demand response in blocks of buildings, based on real experiments and energy data collection with detected anomalies. A numerical algorithm was created to differentiate between natural energy peaks and outliers, so that the data could first be cleaned. Then, the impact on the energy baseline for the demand response computation was calculated, with improved precision relative to other referenced methods and to the original data processing. For the pilot project implemented in the Technical University of Cluj-Napoca block of buildings, without cleaning the energy baseline data, in some cases it was impossible to compute the established key performance indicators (peak power reduction, energy savings, cost savings, CO2 emissions reduction), or the resulting values were far higher (>50%) and not realistic. Therefore, in real-case business models, it is crucial to use outlier removal. In recent years, both companies and academic communities have pooled their efforts to generate input consisting of new abstractions, interfaces, approaches for scalability, and crowdsourcing techniques. Quantitative and qualitative methods were created with the aim of reducing errors and have been covered in multiple surveys and overviews of outlier detection.
PubMed: 33922298
DOI: 10.3390/s21092946 -
Heart (British Cardiac Society) May 2019
OBJECTIVE
To assess the effect of various evaluation and reporting strategies in determining outlier surgeons, defined by having worse-than-expected mortality after cardiac surgery.
METHODS
Our study included 33 394 isolated coronary artery bypass graft (CABG) procedures performed by 136 surgeons and 12 172 surgical aortic valve replacement (SAVR) procedures performed by 113 surgeons between 2010 and 2014. Three current methodologies based on the framework of comparing observed and expected (O/E ratio) mortality, with different distributional assumptions, were examined. We further assessed the consistency of outliers detected by these three methods and the impact of using different time windows and aggregating data of CABG and SAVR procedures.
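The observed/expected framework can be illustrated with a short sketch. The abstract does not specify the three methods or their distributional assumptions, so the Poisson model, the function names, and the threshold `alpha` below are assumptions chosen purely for illustration.

```python
import math


def oe_ratio(observed, expected):
    """Ratio of observed deaths to risk-adjusted expected deaths."""
    return observed / expected


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), an assumed model."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


def is_outlier(observed, expected, alpha=0.025):
    """Flag worse-than-expected mortality at an assumed one-sided level."""
    return poisson_upper_tail(observed, expected) < alpha
```

Under these assumptions, a surgeon with 2 observed deaths against 0.5 expected has an O/E ratio of 4 but is not flagged, because with so few expected events a count of 2 is not unusual under the assumed model.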
RESULTS
The three methods were consistent and detected the same outliers, with the least conservative method detecting additional outliers (outliers detected for methods 1, 2 and 3: CABG 3 (2.2%), 2 (1.5%) and 8 (5.9%); SAVR 1 (0.9%), 0 (0.0%) and 11 (9.7%)). When the numbers of cases recorded were low and events were rare, the two more conservative methods were unlikely to detect outliers unless the O/E ratios were extremely high. However, these two methods were more consistent in detecting the same surgeons as outliers across different time windows for assessment. Of the surgeons who performed both CABG and SAVR, none was an outlier for both procedures when assessed separately. Aggregating data from CABG and SAVR may lead to results being dominated by the procedure that had the higher caseload.
CONCLUSIONS
The choices of outlier assessment method, time window for assessment and data aggregation have an intertwined impact on detecting outlier surgeons, often representing different value assumptions toward patient protection and provider penalty. It is desirable to use different methods as sensitivity analyses, avoid aggregating procedures and avoid rare-event endpoints if possible.
Topics: Cardiac Surgical Procedures; Health Services Research; Humans; Mandatory Reporting; Quality of Health Care; Retrospective Studies; Surgeons; United States
PubMed: 30415207
DOI: 10.1136/heartjnl-2018-313650 -
Perception Mar 2024
The accurate perception of groups with outliers can help us identify potential risks. However, it is unclear how outliers affect the perception of group emotion. To address this question, we conducted a study on group emotion perception in the context of facial identity. We presented 74 participants with pictures of crowds, and asked them to evaluate the valence ratios and intensity of the crowd by means of the Emotional Aperture Measure. The results revealed that outlier emotions were often overestimated within crowds. Moreover, we found that the emotional expression of a close friend modulated the perception of outliers. Specifically, when a close friend expressed the group emotion, participants overestimated the outlier less than when a close friend expressed the outlier emotion. These results suggest that people can detect outliers within groups, and that their perception of group emotion is influenced by close friends. Thus, we provide evidence that facial identity affects group emotion perception.
Topics: Humans; Emotions; Facial Expression
PubMed: 38158215
DOI: 10.1177/03010066231218519 -
Sensors (Basel, Switzerland) May 2018
The aim of structural identification is to provide accurate knowledge of the behaviour of existing structures. In most situations, finite-element models are updated using behaviour measurements and field observations. Error-domain model falsification (EDMF) is a multi-model approach that compares finite-element model predictions with sensor measurements while taking into account epistemic and stochastic uncertainties, including the systematic bias that is inherent in the assumptions behind structural models. Compared with alternative model-updating strategies such as residual minimization and traditional Bayesian methodologies, EDMF is easy to use for practising engineers and does not require precise knowledge of values for uncertainty correlations. However, wrong parameter identification and flawed extrapolation may result when undetected outliers occur in the dataset. Moreover, when datasets consist of a limited number of static measurements rather than continuous monitoring data, the existing signal-processing and statistics-based algorithms provide little support for outlier detection. This paper introduces a new model-population methodology for outlier detection that is based on the expected performance of the as-designed sensor network. Thus, suspicious measurements are identified even when few measurements, collected with a range of sensors, are available. The structural identification of a full-scale bridge in Exeter (UK) is used to demonstrate the applicability of the proposed methodology and to compare its performance with existing algorithms. The results show that outliers capable of compromising EDMF accuracy are detected. Moreover, a metric that separates the impact of powerful sensors from the effects of measurement outliers has been included in the framework. Finally, the impact of outlier occurrence on parameter identification and model extrapolation (for example, reserve capacity assessment) is evaluated.
PubMed: 29795035
DOI: 10.3390/s18061702