-
Nutricion Hospitalaria Feb 2015 (Review)
When performing nutritional epidemiology studies, missing values and outliers inevitably appear. Missing values appear, for example, because of the difficulty in collecting data in dietary surveys, leading to a lack of data on the amounts of foods consumed or a poor description of these foods. Inadequate treatment during the data processing stage can create biases and loss of accuracy and, consequently, misinterpretation of the results. The objective of this article is to provide some recommendations about the treatment of missing and outlier data, and guidance on existing software for determining sample sizes and performing statistical analysis. Some recommendations about data collection are provided, since data collection is an important preliminary step in any nutritional research. We discuss methods used for dealing with missing values, especially case deletion, simple imputation and multiple imputation, with indications and examples. The identification of outlier values, their impact on statistical analysis and the options available for treating them adequately are explained, with some illustrative examples. Finally, current software that fully or partially addresses these questions is mentioned, especially the free software available.
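To make two of the approaches named above concrete, the following is a minimal sketch of case deletion and simple (mean) imputation. The record layout, the field names `energy_kcal` and `fibre_g`, and the values are hypothetical and are not taken from the article; multiple imputation is not shown.

```python
import statistics

def complete_cases(rows):
    """Case deletion: keep only records with no missing (None) field."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_impute(rows, field):
    """Simple imputation: replace missing values of `field` by the observed mean."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = statistics.mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

# Hypothetical dietary survey records; None marks a missing value.
survey = [
    {"id": 1, "energy_kcal": 2100, "fibre_g": 22},
    {"id": 2, "energy_kcal": None, "fibre_g": 18},
    {"id": 3, "energy_kcal": 1900, "fibre_g": 25},
    {"id": 4, "energy_kcal": 2300, "fibre_g": None},
]

print([r["id"] for r in complete_cases(survey)])
print(mean_impute(survey, "energy_kcal")[1])
```

Case deletion discards half of this small sample, while mean imputation keeps every record but shrinks the variance of the imputed variable, which is one reason the choice of method matters.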
Topics: Data Interpretation, Statistical; Epidemiologic Methods; Epidemiology; Humans; Nutrition Surveys
PubMed: 25719786
DOI: 10.3305/nh.2015.31.sup3.8766 -
Journal of Applied Statistics 2023
Discriminative subspace clustering (DSC) uses linear discriminant analysis (LDA) to reduce the dimension of the data, so that high-dimensional data can be clustered effectively by clustering low-dimensional data in the discriminant subspace. However, most existing DSC algorithms do not consider the noise and outliers that data sets may contain, and they often perform poorly when applied to such data. In this paper, we address the sensitivity of DSC to noise and outliers. Replacing the Euclidean distance in the objective function of LDA with an exponential non-Euclidean distance, we first develop a noise-insensitive LDA (NILDA) algorithm. Then, combining the proposed NILDA with a noise-insensitive fuzzy clustering algorithm, AFKM, we propose a noise-insensitive discriminative subspace fuzzy clustering (NIDSFC) algorithm. Experiments on some benchmark data sets show the effectiveness of the proposed NIDSFC algorithm.
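The abstract does not state the exact form of the exponential distance. As an illustration only, the sketch below assumes the bounded form 1 - exp(-beta * ||x - y||^2), which is often used in robust fuzzy clustering; the function names and the parameter `beta` are hypothetical. It shows why such a distance limits the influence of far-away points: the value saturates at 1 instead of growing without bound.

```python
import math

def squared_euclidean(x, y):
    """Ordinary squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def exponential_distance(x, y, beta=1.0):
    """Assumed bounded distance: 1 - exp(-beta * ||x - y||^2), always in [0, 1]."""
    return 1.0 - math.exp(-beta * squared_euclidean(x, y))

centre = (0.0, 0.0)
for point in [(0.5, 0.0), (3.0, 0.0), (100.0, 100.0)]:
    print(point, squared_euclidean(centre, point), exponential_distance(centre, point))
```

Under this assumed form, a gross outlier contributes at most 1 to an objective function, whereas its squared Euclidean contribution can dominate the sum.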
PubMed: 36819072
DOI: 10.1080/02664763.2021.1937583 -
Economics Letters Feb 2022
COVID-19 hit the economy in an unprecedented way, changing the data generating process of many series. We compare different seasonal adjustment methods through simulations, introducing outliers in the trend and seasonality to reproduce the heterogeneity in the series during COVID-19.
PubMed: 34931098
DOI: 10.1016/j.econlet.2021.110206 -
Conservation Biology : the Journal of... Jun 2018
Conservation operates within complex systems with incomplete knowledge of the system and the interventions utilized. This frequently results in the inability to find generally applicable methods to alleviate threats to Earth's vanishing wildlife. One approach used in medicine and the social sciences has been to develop a deeper understanding of positive outliers. Where such outliers share similar characteristics, they may be considered exceptional responders. We devised a 4-step framework for identifying exceptional responders in conservation: identification of the study system, identification of the response structure, identification of the threshold for exceptionalism, and identification of commonalities among outliers. Evaluation of exceptional responders provides additional information that is often ignored in randomized controlled trials and before-after control-intervention experiments. Interrogating the contextual factors that contribute to an exceptional outcome allows exceptional responders to become valuable pieces of information leading to unexpected discoveries and novel hypotheses.
Topics: Conservation of Natural Resources
PubMed: 28856730
DOI: 10.1111/cobi.13006 -
Sensors (Basel, Switzerland) Aug 2022
Analysing human physiological data gives access to the health state and the state of mind of the individual. Whenever a person is sick, having a panic attack, happy or scared, physiological signals will be different. Among physiological signals, we focus in this manuscript on monitoring breathing patterns. The scope can be extended to also address heart rate and other variables. We describe an analysis of breathing rate patterns during activities including resting, walking, running and watching a movie. We model normal breathing behaviours by statistically analysing signals, processed to represent quantities of interest. We consider the moving maximum/minimum, the amplitude and the Fourier transform of the respiration signal, working with different window sizes. We then learn a statistical model for the basal behaviour, per individual, and detect outliers. When outliers are detected, a system that incorporates our approach would send a visible signal through a smart garment or through other means. We describe alert generation performance in two datasets: one from the literature and one collected as a field study for this work. In particular, when learning personal rest distributions for the breathing signals of 14 subjects, we see alerts generated more often when the same individual is running than when they are tested in rest conditions.
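The pipeline described above can be sketched roughly as follows: compute a windowed amplitude feature, learn a per-individual rest baseline, and flag windows that deviate from it. This is a sketch under stated assumptions, not the authors' model: the Gaussian z-score rule, the threshold of 3, the window length and the synthetic signals are all hypothetical, and the Fourier-transform feature is omitted.

```python
import math
import statistics

def moving_amplitude(signal, window):
    """Moving maximum minus moving minimum over each sliding window."""
    return [max(signal[i:i + window]) - min(signal[i:i + window])
            for i in range(len(signal) - window + 1)]

def fit_baseline(features):
    """Per-individual rest model, assumed here to be a mean and standard deviation."""
    return statistics.mean(features), statistics.stdev(features)

def detect_alerts(features, baseline, z_threshold=3.0):
    """Indices of windows whose feature lies more than z_threshold SDs from rest."""
    mean, sd = baseline
    return [i for i, f in enumerate(features) if abs(f - mean) > z_threshold * sd]

# Synthetic respiration-like signals (not real data): slow, shallow breathing
# at rest and faster, deeper breathing while running.
rest = [math.sin(0.3 * t) + 0.05 * math.sin(1.7 * t) for t in range(200)]
run = [2.5 * math.sin(0.9 * t) for t in range(200)]

window = 30
rest_features = moving_amplitude(rest, window)
run_features = moving_amplitude(run, window)
baseline = fit_baseline(rest_features)

rest_alerts = detect_alerts(rest_features, baseline)
run_alerts = detect_alerts(run_features, baseline)
print(len(rest_alerts), len(run_alerts))
```

Fitting the baseline per individual, as the abstract describes, means the same absolute amplitude can be normal for one person and an alert for another.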
Topics: Humans; Models, Statistical; Respiration; Respiratory Rate; Rest
PubMed: 36016067
DOI: 10.3390/s22166306 -
Transfusion Medicine (Oxford, England) Jun 2023
OBJECTIVES
To investigate if time to initiate a blood transfusion after an informative laboratory test could feasibly be used by the transfusion medicine service as a metric to monitor for transfusion delays.
BACKGROUND
Delayed transfusions may result in patient morbidity and mortality, but no standards for timely transfusion have been developed. Information technology tools could be implemented to identify gaps in provision of blood and to recognise areas of improvement.
MATERIALS AND METHODS
Data were obtained from a children's hospital's data science platform. The time from the release of laboratory results to the initiation of transfusion was calculated, and weekly medians were used for trend analyses. Outlier events were identified using locally estimated scatterplot smoothing and the generalised extreme studentized deviate test.
RESULTS
Overall, the number of outlier events in the timing of transfusions based on patients' haemoglobin level and platelet count was small (n = 1 and n = 0 over 139 weeks, respectively). Investigation of these events for adverse clinical outcomes was non-significant.
CONCLUSIONS
Herein, we propose that the trends and outlier events could be further investigated and used to make decisions and implement protocols to improve patient care.
Topics: Child; Humans; Blood Transfusion; Platelet Count
PubMed: 36807938
DOI: 10.1111/tme.12960 -
Journal of Neurotrauma Jan 2024
Blood biomarkers have been studied to improve the clinical assessment and prognostication of patients with moderate-severe traumatic brain injury (mo/sTBI). To assess their clinical usability, one needs to know the factors that might cause outlier values and affect clinical decision making. In a prospective study, we recruited patients with mo/sTBI (n = 85) and measured the blood levels of eight protein brain pathophysiology biomarkers, including glial fibrillary acidic protein (GFAP), S100 calcium-binding protein B (S100B), neurofilament light (Nf-L), heart-type fatty acid-binding protein (H-FABP), interleukin-10 (IL-10), total tau (T-tau), amyloid β40 (Aβ40) and amyloid β42 (Aβ42), within 24 h of admission. Similar analyses were conducted for controls (n = 40) with an acute orthopedic injury without any head trauma. The patients with TBI were divided into subgroups of normal versus abnormal (n = 9/76) head computed tomography (CT) and favorable (Glasgow Outcome Scale Extended [GOSE] 5-8) versus unfavorable (GOSE <5) (n = 38/42, 5 missing) outcome. Outliers were sought individually from all subgroups and from the whole TBI patient population. Biomarker levels below Q1 - 1.5 interquartile range (IQR) or above Q3 + 1.5 IQR were considered outliers. The medical records of each outlier patient were reviewed in a team meeting to determine possible reasons for outlier values. A total of 29 patients (34%) combined from all subgroups and 12 patients (30%) among the controls showed outlier values for one or more of the eight biomarkers. Nine patients with TBI and five control patients had outlier values in more than one biomarker (up to 4). All outlier values were > Q3 + 1.5 IQR. A logical explanation was found for almost all cases, except the amyloid proteins. Explanations for outlier values included extremely severe injury, especially for GFAP and S100B. In the case of H-FABP and IL-10, the explanation was extracranial injuries (thoracic injuries for H-FABP and multi-trauma for IL-10); in some cases these were also associated with abnormally high S100B. Timing of sampling and demographic factors such as age and pre-existing neurological conditions (especially for T-tau) explained some of the abnormally high values, especially for Nf-L. Similar explanations also emerged in controls, where the outlier values were caused especially by pre-existing neurological diseases. To utilize blood-based biomarkers in clinical assessment of mo/sTBI, very severe or fatal TBIs, various extracranial injuries, timing of sampling, and demographic factors such as age and pre-existing systemic or neurological conditions must be taken into consideration. Very high levels seem to be often associated with poor prognosis and mortality (GFAP and S100B).
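The outlier criterion used in this study (values outside Q1 - 1.5 IQR or Q3 + 1.5 IQR) can be sketched as below. The abstract does not say how the quartiles were computed, so the "inclusive" interpolation method is an assumption, and the sample values are hypothetical rather than study data.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Quartile method is an assumption; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Values lying strictly outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical biomarker levels in arbitrary units.
levels = [10, 12, 11, 13, 12, 14, 11, 95]
print(tukey_fences(levels))
print(flag_outliers(levels))
```

Because biomarker concentrations are bounded below by zero and typically right-skewed, flagged values tend to fall above the upper fence, which is consistent with the study's finding that all outliers exceeded Q3 + 1.5 IQR.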
Topics: Humans; Fatty Acid Binding Protein 3; Interleukin-10; Prospective Studies; Brain Injuries, Traumatic; Biomarkers; S100 Calcium Binding Protein beta Subunit; Glial Fibrillary Acidic Protein
PubMed: 37725575
DOI: 10.1089/neu.2023.0120 -
Entropy (Basel, Switzerland) Apr 2022
Outlier detection is an important research direction in the field of data mining. To address the unstable detection results and low efficiency caused by the random division of data set features in the Isolation Forest algorithm, an algorithm called CIIF (Cluster-based Improved Isolation Forest), which combines clustering and Isolation Forest, is proposed. CIIF first uses the k-means method to cluster the data set, selects a specific cluster to construct a selection matrix based on the results of the clustering, and implements the selection mechanism of the algorithm through the selection matrix; it then builds multiple isolation trees. Finally, outlier scores are calculated from the average search length of each sample across the isolation trees, and the Top-n objects with the highest outlier scores are regarded as outliers. Comparative experiments with six algorithms on eleven real data sets show that the CIIF algorithm performs better. Compared to the Isolation Forest algorithm, the average AUC (Area under the Curve of ROC) value of the proposed CIIF algorithm is improved by 7%.
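For orientation, the sketch below implements the baseline Isolation Forest scoring step (average search length across isolation trees, converted to an outlier score), not CIIF itself: the k-means clustering and the selection matrix are not reproduced, because the abstract does not specify them. All function names, parameter defaults and the synthetic data are assumptions.

```python
import math
import random

def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup on n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(point, tree, depth=0):
    """Search length of a point, with the usual correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return path_length(point, child, depth + 1)

def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Outlier score in (0, 1) per point; higher means easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))  # assumes at least two points
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c_factor(psi)))
    return scores

# Synthetic data: one Gaussian cluster plus a single distant point.
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(60)] + [(12.0, 12.0)]
scores = isolation_scores(data)
top = max(range(len(data)), key=scores.__getitem__)
print(data[top])
```

Each tree here picks its split feature uniformly at random, which is the source of the instability the abstract mentions; CIIF replaces that random choice with a clustering-driven selection mechanism.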
PubMed: 35626495
DOI: 10.3390/e24050611 -
Entropy (Basel, Switzerland) Nov 2021
With the advent of big data and the popularity of black-box deep learning methods, it is imperative to address the robustness of neural networks to noise and outliers. We propose the use of Winsorization to recover model performance when the data may contain outliers and other aberrant observations. We provide a comparative analysis of several probabilistic artificial intelligence and machine learning techniques for supervised learning case studies. Broadly, Winsorization is a versatile technique for accounting for outliers in data. However, different probabilistic machine learning techniques have different levels of efficiency when used on outlier-prone data, with or without Winsorization. We observe that Gaussian processes are extremely vulnerable to outliers, while deep learning techniques in general are more robust.
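Winsorization clips extreme values to chosen percentiles rather than deleting them. The following is a minimal sketch; the percentile limits, the order-statistic rule used to locate them and the example data are assumptions, since the abstract does not give the authors' settings.

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip each value to the order statistics at the given percentile positions."""
    ordered = sorted(values)
    n = len(ordered)
    # Simple rule (an assumption): take the order statistic at floor(p * (n - 1)).
    low = ordered[int(lower_pct * (n - 1))]
    high = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, low), high) for v in values]

# Hypothetical training targets with one aberrant observation.
targets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(winsorize(targets, 0.1, 0.9))
```

Unlike trimming, the sample size is unchanged, so the cleaned values can be passed to any downstream learner without re-aligning features and targets.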
PubMed: 34828244
DOI: 10.3390/e23111546 -
Pharmaceutical Statistics May 2020
Potency bioassays are used to measure biological activity. Consequently, potency is considered a critical quality attribute in manufacturing. Relative potency is measured by comparing the concentration-response curve of a manufactured test batch with that of a reference standard. If the curve shapes are deemed similar, the test batch is said to exhibit constant relative potency with the reference standard, a critical requirement for calibrating the potency of the final drug product. Outliers in bioassay potency data may result in the false acceptance of a bad sample or the false rejection of a good one and, if the sample is accepted, may yield a biased relative potency estimate. To avoid these issues, the USP<1032> recommends screening bioassay data for outliers prior to performing a relative potency analysis. In a recently published work, the effects of one or more outliers, outlier size, and outlier type on similarity testing and estimation of relative potency were thoroughly examined, confirming the USP<1032> outlier guidance. As a follow-up, several outlier detection methods, including those proposed by the USP<1010>, are evaluated and compared in this work through computer simulation. Two novel outlier detection methods are also proposed. The effects of outlier removal on similarity testing and estimation of relative potency were evaluated, resulting in recommendations for best practice.
Topics: Biological Assay; Data Interpretation, Statistical; Dose-Response Relationship, Drug; Models, Statistical; Reference Standards; Research Design
PubMed: 31762118
DOI: 10.1002/pst.1984