We're all familiar with the issue of data duplication -- who hasn't received multiple pieces of junk mail from the same company? This may be mildly irritating, but it has no real consequences beyond increased costs for whoever is sending the same marketing flyers.
If you go to a hospital, you might assume it has a higher standard of data management -- but you would be wrong.
According to the American Health Information Management Association, the average hospital has a 10% duplication rate in its patient records. Consider the consequences: Patients may be given the wrong treatment or drugs, or critical allergy information may sit on a duplicate record rather than on the record clinicians are actually using. Duplicate medical records can lead to repeat testing, unnecessary X-rays, delayed diagnoses and possibly even incorrect surgery -- and accurate patient information is crucial in the case of blood transfusions.
Costs and causes of data duplication
A 2018 survey by Imprivata and healthsystemCIO.com found that 17% of U.S. healthcare CIOs from 55 hospitals reported an incident of patient harm at their medical facility due to patient data duplication, and 40% reported privacy breaches.
A survey of 1,392 health IT managers by Black Book Market Research conducted from the third quarter of 2017 to the first quarter of 2018 found that an average of 18% of patient records were duplicates, with an associated cost of $1,950 per inpatient stay.
Preventable medical error is the third-largest cause of death in the United States after heart disease and cancer. A 2014 study by Smart Card Alliance estimated that 195,000 deaths per year in the U.S. occur due to medical error, with 58% of these associated with wrong patient errors. According to the same study, just fixing the duplicate data in patient records can cost a large hospital more than $1 million per year.
What is the root cause of this? A 2007 study by Johns Hopkins Hospital found that 92% of such errors occur during patient registration, with misspelled names and failure to check documentation among the most common issues.
Certain sectors of the population are particularly at risk. Children typically lack official forms of identification -- especially because Social Security numbers are optional for newborns in the U.S. -- and cannot always speak for themselves, so pediatric records are more likely to contain incorrect information than adult records. In one example involving twin girls at Children's Hospital Colorado, correcting the resulting data issues took 16 staff members three months.
Approaches to data deduplication
Given the scale of the problem, what can actually be done about duplicate medical records? Clearly, having electronic health records (EHRs) is a start. Although EHRs are the norm these days, they are not universal, with paper-based systems still existing in many countries.
The U.S. moved to electronic records after being prodded by successive governments. And in the U.K., there is a plan to move entirely to electronic patient records by 2020. Having a unique patient identifier is an important step to combat data duplication, as relying on name and date of birth may not be enough -- as was shown in the case of the identical twins.
Technology can certainly help, as the data management industry has a long track record of identifying duplicate customer records and helping to match and merge such records. Data quality software uses a number of techniques to identify data duplication.
Deterministic matching uses common identifiers like first and last name, date of birth, address, and phone number to check whether records are duplicates; a unique patient record number is ideal for such an approach, but in the absence of this, duplicates may be missed, as no single field may be able to provide a reliable match between records.
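To make the idea concrete, here is a minimal sketch of deterministic matching in Python. The `PatientRecord` fields, the `normalize` helper and the sample values are all illustrative assumptions, not part of any real system; real data quality tools apply far more sophisticated normalization.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical identifiers commonly used for deterministic matching.
    first_name: str
    last_name: str
    dob: str    # ISO date, e.g. "1984-07-02"
    phone: str


def normalize(value: str) -> str:
    """Lowercase and strip punctuation/whitespace so trivial formatting
    differences do not defeat an exact comparison."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def deterministic_match(a: PatientRecord, b: PatientRecord) -> bool:
    """Flag two records as duplicates only if every identifier matches
    exactly after normalization -- any single discrepancy means no match."""
    return (
        normalize(a.first_name) == normalize(b.first_name)
        and normalize(a.last_name) == normalize(b.last_name)
        and a.dob == b.dob
        and normalize(a.phone) == normalize(b.phone)
    )
```

Note the brittleness this illustrates: records differing only in casing or phone formatting are caught, but a single misspelled name ("Ann" vs. "Anne") causes the duplicate to be missed entirely -- exactly the gap that probabilistic matching tries to close.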
Probabilistic matching uses algorithms to determine the likelihood of a match. Two records are compared field by field, and each field comparison is assigned a weight indicating how closely the two values agree; the weights are then combined into an overall match score. Rules-based algorithms apply preset confidence limits for particular data elements. Data quality tools may use a combination of these approaches to detect likely duplicate medical records.
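A toy sketch of the probabilistic, rules-based approach might look like the following. The field weights and the 0.9 confidence threshold are invented for illustration, and Python's standard-library `difflib.SequenceMatcher` stands in for the specialized string-comparison algorithms (such as Jaro-Winkler) that commercial tools typically use.

```python
from difflib import SequenceMatcher

# Illustrative per-field weights: how strongly agreement on each field
# suggests the two records refer to the same patient. Weights sum to 1.0.
FIELD_WEIGHTS = {"last_name": 0.35, "first_name": 0.25, "dob": 0.30, "phone": 0.10}


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] of how closely two field values agree."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities; 1.0 means identical records."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in FIELD_WEIGHTS.items())


def likely_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Rules-based decision: flag a likely duplicate when the combined
    score exceeds a preset confidence limit."""
    return match_score(rec_a, rec_b) >= threshold
```

Unlike the deterministic approach, a record with a slightly misspelled first name still scores high on the remaining fields and clears the threshold, while an unrelated patient does not.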
Putting people in place with the responsibility to fix these issues is also important, yet not every hospital has an enterprise master patient index team tasked with rooting out duplicates and correcting errors.
Even when records are digitized, there are still data duplication issues, and those can be compounded when hospitals start to share patient information. Such integration is tough, as the U.K. discovered some years ago when it tried an ambitious project to have a unified patient records database. The NHS Connecting for Health project was scrapped in 2013 after 10 years and around £20 billion in cost.
More modest attempts continue in different countries -- each one bringing not only potential benefits, but also the possibility of further data duplication.
Patient safety is not the only issue: Medical fraud is a major problem in the U.S., with stolen medical records used to illicitly obtain prescription drugs. According to the FBI, a stolen medical record fetches $60 on the black market compared to just $1 for a Social Security number. In the U.K., some patients try to misrepresent themselves as European Union residents in order to get free medical care.
Some hospitals are taking a more radical approach to the problem. In the 2018 healthsystemCIO.com survey mentioned above, 15% of respondents reported using biometrics like fingerprints, palm vein recognition or iris recognition to identify patients. This certainly has promise, with fingerprint recognition being 99% accurate -- and even higher if multiple fingers are tested.
Data quality may seem a dry or even dull topic, but as we have seen, data duplication in a medical context can move from being a mild annoyance or cost issue to something that may be life-threatening. Hospitals are gradually improving the situation as they migrate to EHR systems, but that on its own is no silver bullet.
Not enough use is being made of modern data quality tools, and a combination of better organizational processes and the latest technology will be needed to whittle down the scale of patient record duplication. Data deduplication can save lives, and much remains to be done.