-
Journal of Biomedical Informatics, Feb 2016
The identification of similar entities represented by records in different databases has drawn considerable attention in many application areas, including the health domain. One important type of entity matching application that is vital for quality healthcare analytics is the identification of similar patients, known as similar patient matching. A key component of identifying similar records is the calculation of the similarity of the values in attributes (fields) between these records. Due to increasing privacy and confidentiality concerns, using the actual attribute values of patient records to identify similar records across different organizations is becoming non-trivial because the attributes in such records often contain highly sensitive information such as personal and medical details of patients. Therefore, the matching needs to be based on masked (encoded) values while being effective and efficient enough to allow matching of large databases. Bloom filter encoding has been widely used as an efficient masking technique for privacy-preserving matching of string and categorical values. However, no work on Bloom filter-based masking of numerical data, such as integer (e.g. age), floating-point (e.g. body mass index), and modulus values (numbers that wrap around upon reaching a certain value, e.g. date and time), which are commonly required in the health domain, has been presented in the literature. We propose a framework with novel methods for masking numerical data using Bloom filters, thereby facilitating the calculation of similarities between records. We conduct an empirical study on publicly available real-world datasets which shows that our framework provides efficient masking and achieves matching accuracy similar to that of matching the actual unencoded patient records.
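The general idea of Bloom filter masking described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the filter size, the number of hash functions, the choice of hash algorithms, and the helper names (`bloom_encode`, `qgrams`, `numeric_tokens`, `dice`) are all assumptions made for illustration. Strings are split into character bigrams, and a number is represented by itself plus a few neighbouring values on a fixed grid, so that close numbers share tokens and therefore share bits.

```python
import hashlib


def bloom_encode(tokens, size=1000, num_hashes=10):
    """Return the set of bit positions switched on for the given tokens.

    Uses double hashing (h1 + i * h2) over two different hash functions;
    size and num_hashes are illustrative values, not taken from the paper.
    """
    bits = set()
    for token in tokens:
        data = token.encode("utf-8")
        h1 = int.from_bytes(hashlib.sha256(data).digest(), "big")
        h2 = int.from_bytes(hashlib.blake2b(data).digest(), "big")
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def qgrams(value, q=2):
    """Split a string into padded character q-grams (bigrams by default)."""
    padded = "_" + value.lower() + "_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def numeric_tokens(value, step=1, neighbours=5):
    """Represent a number by its grid index and the indices around it.

    Hypothetical scheme: values that lie close together share most of
    their neighbour tokens, so their Bloom filters overlap heavily.
    """
    centre = round(value / step)
    return {str(centre + k) for k in range(-neighbours, neighbours + 1)}


def dice(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    total = len(bits_a) + len(bits_b)
    return 2 * len(bits_a & bits_b) / total if total else 1.0


# Similar names and similar ages should score higher than dissimilar ones.
name_close = dice(bloom_encode(qgrams("smith")), bloom_encode(qgrams("smyth")))
name_far = dice(bloom_encode(qgrams("smith")), bloom_encode(qgrams("jones")))
age_close = dice(bloom_encode(numeric_tokens(40)), bloom_encode(numeric_tokens(42)))
age_far = dice(bloom_encode(numeric_tokens(40)), bloom_encode(numeric_tokens(70)))
```

In this sketch the similarity is computed only on the bit positions, so the party doing the comparison never needs the original names or ages. A real deployment would also need a secret key shared between the data owners, which is omitted here.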
Topics: Algorithms; Computer Security; Confidentiality; Electronic Health Records; Humans; Medical Informatics; Privacy
PubMed: 26707453
DOI: 10.1016/j.jbi.2015.12.004 -
BMC Medical Research Methodology, Jan 2022
BACKGROUND
Privacy preserving record linkage (PPRL) methods using Bloom filters have shown promise for use in operational linkage settings. However, real-world evaluations are required to confirm their suitability in practice.
METHODS
An extract of records from the Western Australian (WA) Hospital Morbidity Data Collection 2011-2015 and WA Death Registrations 2011-2015 was encoded to Bloom filters and then linked using privacy-preserving methods. Results were compared to a traditional, un-encoded linkage of the same datasets using the same blocking criteria to enable direct investigation of the comparison step. The encoded linkage was carried out in a blinded setting, where there was no access to un-encoded data or a 'truth set'.
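The comparison step under a shared blocking criterion can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's linkage software: the record layout (`id`, `block`, `bits`), the blocking values, the Dice threshold of 0.8, and the function name `link_encoded` are all hypothetical.

```python
from collections import defaultdict


def dice(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    total = len(bits_a) + len(bits_b)
    return 2 * len(bits_a & bits_b) / total if total else 1.0


def link_encoded(records_a, records_b, threshold=0.8):
    """Compare only records that share a blocking key; keep pairs above threshold.

    Each record is a dict with hypothetical fields:
      id    - record identifier
      block - blocking key (already encoded or otherwise non-identifying)
      bits  - set of bit positions switched on in the record's Bloom filter
    """
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[rec["block"]].append(rec)

    matches = []
    for rec_a in records_a:
        # Records in other blocks are never compared, which limits the work.
        for rec_b in blocks.get(rec_a["block"], []):
            score = dice(rec_a["bits"], rec_b["bits"])
            if score >= threshold:
                matches.append((rec_a["id"], rec_b["id"], score))
    return matches


# Toy data: bit sets stand in for real Bloom filters.
hospital = [
    {"id": "h1", "block": "1950", "bits": {1, 2, 3, 4, 5}},
    {"id": "h2", "block": "1962", "bits": {10, 11, 12}},
]
deaths = [
    {"id": "d1", "block": "1950", "bits": {1, 2, 3, 4, 6}},
    {"id": "d2", "block": "1950", "bits": {20, 21}},
    {"id": "d3", "block": "1971", "bits": {10, 11, 12}},
]
pairs = link_encoded(hospital, deaths)
```

Because the same blocking is applied to the encoded and clear-text runs, any difference in output can be attributed to the comparison step, which is the design the abstract describes. Note that `h2` and `d3` have identical bits but are never compared, since they fall in different blocks.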
RESULTS
The PPRL method using Bloom filters provided similar linkage quality to the traditional un-encoded linkage, with 99.3% of 'groupings' identical between privacy preserving and clear-text linkage.
CONCLUSION
The Bloom filter method appears suitable for use in situations where clear-text identifiers cannot be provided for linkage.
Topics: Australia; Computer Security; Humans; Medical Record Linkage; Medical Records Systems, Computerized; Privacy
PubMed: 35034615
DOI: 10.1186/s12874-022-01510-2 -
Journal of Immunology (Baltimore, Md. :... Mar 1997
To understand whether the distinct VHDJH gene utilization by natural polyreactive Abs reflects the developmentally restricted Ig VHDJH rearrangements putatively expressed by B-1 cells, we generated 11 (8 IgM, 1 IgG3, 2 IgA1), 7 (6 IgM, 1 IgG1), and 7 (2 IgM, 3 IgG1, 2 IgG3) mAb-producing lines using B-1a (surface CD5+, CD45RAlow), B-1b (surface CD5-, CD45RAlow, CD5 mRNA+), and B-2 (surface CD5-, CD45RAhigh, CD5 mRNA-) cells, respectively, sorted from adult human peripheral blood. Most B-1a and B-1b, but no B-2, cell-derived mAbs were polyreactive; i.e., they bound different self and foreign Ags with different affinities. B-1a and B-2 mAbs preferentially utilized VH4 (p = 0.003) and VH3 (p = 0.010) genes, respectively. All three mAb populations utilized DXP, DLR, DN DH genes, and JH6, but no mAb utilized DHQ52. There were fewer unencoded nucleotide (N) additions in the VHDJH junctions of B-1b (3.00 ± 2.52, mean ± SD) than of B-1a (12.45 ± 3.93, p = 1.23 × 10^-5) or B-2 (8.29 ± 4.75, p = 0.020) mAbs. Partly due to the fewer N additions and a paucity of D-D fusions, the B-1b mAb CDR3s were significantly shorter than the B-1a mAb CDR3s (p = 0.013), which contained a nonrandom Tyr distribution (p = 0.003). Finally, all but two B-1 cell-derived mAbs were mutated, in a fashion similar to that of the Ag-selected B-2 mAbs. Thus, in the human adult, B-1 cells that make natural polyreactive Abs may not be representative of the predominantly B-1 developmental waves of colonization of the fetal and neonatal B cell repertoires, and are somatically selected.
Topics: Adult; Amino Acid Sequence; Antibodies, Monoclonal; Antigen-Antibody Reactions; B-Lymphocyte Subsets; Base Sequence; CD5 Antigens; Cell Line; Genes, Immunoglobulin; Humans; Immunoglobulin Heavy Chains; Immunoglobulin Joining Region; Immunoglobulin Variable Region; Molecular Sequence Data; Point Mutation; RNA, Messenger
PubMed: 9037000
DOI: No ID Found -
International Journal of Population... May 2019
INTRODUCTION
Available and practical methods for privacy preserving linkage have shortcomings: methods utilising anonymous linkage codes provide limited accuracy while methods based on Bloom filters have proven vulnerable to frequency-based attacks.
OBJECTIVES
In this paper, we present and evaluate a novel protocol that aims to meld the accuracy of the Bloom filter method with the privacy achievable through the anonymous linkage code methodology.
METHODS
The protocol involves creating multiple match-keys for each record, with the composition of each match-key depending on attributes of the underlying datasets being compared. The protocol was evaluated through de-duplication of four administrative datasets and two synthetic datasets; the 'answers' outlining which records belonged to the same individual were known for each dataset. The results were compared against results achieved with un-encoded linkage and other privacy preserving techniques on the same datasets.
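The idea of several match-keys per record can be sketched as below. This is a simplified illustration, not the published protocol: the field names, the three field combinations in `KEY_FIELDS`, the use of HMAC-SHA256 with a shared secret, and the rule that any one shared key counts as a match are all assumptions. In the paper, the composition of each match-key depends on the attributes of the datasets being compared.

```python
import hashlib
import hmac

# Hypothetical field combinations; each tuple yields one match-key.
KEY_FIELDS = [
    ("first_name", "last_name", "dob"),
    ("last_name", "dob", "postcode"),
    ("first_name", "dob", "postcode"),
]


def match_keys(record, secret, key_fields=KEY_FIELDS):
    """Return the set of keyed hashes (match-keys) for one record.

    A combination is skipped when any of its fields is missing or empty,
    so incomplete records simply produce fewer match-keys.
    """
    keys = set()
    for idx, fields in enumerate(key_fields):
        values = [str(record.get(f, "")).strip().lower() for f in fields]
        if not all(values):
            continue
        # Prefix with the combination index so different combinations
        # can never produce the same hash from the same values.
        message = f"{idx}|" + "|".join(values)
        digest = hmac.new(secret, message.encode("utf-8"), hashlib.sha256)
        keys.add(digest.hexdigest())
    return keys


def same_entity(rec_a, rec_b, secret):
    """Treat two records as a match if they share at least one match-key."""
    return bool(match_keys(rec_a, secret) & match_keys(rec_b, secret))


secret = b"shared-linkage-secret"  # assumed to be agreed between data owners
anna = {"first_name": "Anna", "last_name": "Smith", "dob": "1980-02-01", "postcode": "6000"}
anna_typo = {"first_name": "Anna", "last_name": "Smyth", "dob": "1980-02-01", "postcode": "6000"}
other = {"first_name": "Anna", "last_name": "Smith", "dob": "1975-07-19", "postcode": "6000"}
```

Here `anna` and `anna_typo` still agree on the key built from first name, date of birth, and postcode, so a single misspelt field does not prevent a match. Each key is an exact hash of a field combination rather than a bit pattern that preserves similarity, which is the property the abstract contrasts with Bloom filters.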
RESULTS
The multiple match-key protocol presented here achieved high quality across all datasets, performing better than record-level Bloom filters and the SLK, but worse than field-level Bloom filters.
CONCLUSION
The presented method provides high linkage quality while avoiding the frequency-based attacks that have been demonstrated against the Bloom filter approach. The method appears promising for real-world use.
PubMed: 32935028
DOI: 10.23889/ijpds.v4i1.1094