It is a truth universally acknowledged, that duplicate records are bad. They hurt analytics, increase operational overheads, make compliance a pain, and increase risk. But there are so many challenges in the data stack; surely duplicate records are something we can live with? How bad can it be?

A recent survey by Black Book has quantified just how bad duplicate records can be. It found that an average hospital spends an extra 1.5 million USD a year due to duplicate and fragmented patient records. 1.5 million USD! Lack of a master patient index is clearly a very costly affair. The survey also found that at hospitals with more than 150 beds and hundreds of thousands of records, data cleaning with validation and normalisation took approximately 5 months. Those 5 months of data cleaning alone equal 625,000 USD of duplicate-data spend (1.5 million × 5/12), on top of the software and implementation costs.

Surely there is a faster and much cheaper way to get there? 😉
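
For the curious, here is a quick back-of-the-envelope sketch of that proration; the figures come from the survey above, and the variable names are just illustrative:

```python
# Prorate the survey's annual duplicate-record cost over the cleanup period.
annual_cost_usd = 1_500_000   # extra spend per year from duplicate/fragmented records
cleanup_months = 5            # typical cleanup time at 150+ bed hospitals

cost_during_cleanup = annual_cost_usd * cleanup_months / 12
print(f"{cost_during_cleanup:,.0f} USD")  # 625,000 USD
```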