Healthcare Analytics News reported on a new study, published in the journal Health Information Science and Systems, that may lead to billions of dollars in savings on Medicare reimbursements. Machine learning may become a useful tool for detecting Medicare fraud and reclaiming some of the $19 billion to $65 billion lost to fraud each year.
Researchers from Florida Atlantic University’s College of Engineering and Computer Science used Medicare Part B data, machine learning and advanced analytics to automate fraud detection. They tested six different machine learners on balanced and imbalanced data sets, ultimately finding the RF100 random forest algorithm to be the most effective at identifying possible instances of fraud. They also found that imbalanced data sets are preferable to balanced data sets when scanning for fraud.
“There are so many intricacies involved in determining what is fraud and what is not fraud, such as clerical error,” Richard A. Bauder, senior author and a Ph.D. student at the school, said. “Our goal is to enable machine learners to cull through all of this data and flag anything suspicious. Then we can alert investigators and auditors, who will only have to focus on 50 cases instead of 500 cases or more.”
In the study, Bauder and colleagues examined Medicare Part B data from 2012 to 2015, which held 37 million cases, for instances such as patient abuse, neglect and billing for medical services that never occurred. The team narrowed the data set to 3.7 million cases, a number that would still represent a challenge for human investigators who are typically charged with pinpointing Medicare fraud.
The authors used the National Provider Identifier — a unique ID number issued by the government to healthcare providers — to match fraud labels to Medicare Part B data, which comprised provider details, payment and charge information, procedure codes, total procedures performed and medical specialty.
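The matching step described above can be pictured as a simple lookup keyed on the NPI. The following is a minimal sketch in Python, not the researchers' actual pipeline: the record fields, the sample values and the `excluded_npis` set are all hypothetical stand-ins for the Medicare Part B records and the source of fraud labels.

```python
# Hypothetical claim records; field names and values are invented for illustration.
claims = [
    {"npi": "1000000001", "specialty": "Cardiology", "procedures": 120, "payment": 8400.0},
    {"npi": "1000000002", "specialty": "Dermatology", "procedures": 310, "payment": 21500.0},
    {"npi": "1000000003", "specialty": "Cardiology", "procedures": 95, "payment": 6100.0},
]

# Hypothetical set of NPIs that a separate source has labeled as fraudulent.
excluded_npis = {"1000000002"}


def label_claims(claims, excluded_npis):
    """Return copies of the claim records with a boolean 'fraud' label added."""
    return [{**row, "fraud": row["npi"] in excluded_npis} for row in claims]


labeled = label_claims(claims, excluded_npis)
for row in labeled:
    print(row["npi"], row["fraud"])
```

Because the NPI is unique to each provider, a set lookup like this is enough to attach a label to every record without altering the original data.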
When researchers matched the NPI to the Medicare data, they flagged potentially fraudulent providers in a separate database. “If we can predict a physician’s specialty accurately based on our statistical analyses, then we could potentially find unusual physician behaviors and flag these as possible fraud for further investigation,” Taghi M. Khoshgoftaar, Ph.D., co-author and a professor at the school, said.
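Khoshgoftaar's idea of predicting a specialty and flagging mismatches can be illustrated with a toy example. The sketch below is an assumption-heavy stand-in, not the study's method: it swaps in a simple cosine-similarity comparison of procedure counts in place of a trained classifier, and the provider IDs, procedure codes and counts are made up.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical providers: declared specialty plus counts per (made-up) procedure code.
providers = [
    {"npi": "A", "specialty": "Cardiology", "codes": {"C1": 50, "C2": 30}},
    {"npi": "B", "specialty": "Cardiology", "codes": {"C1": 40, "C2": 35}},
    {"npi": "C", "specialty": "Dermatology", "codes": {"D1": 60, "D2": 20}},
    {"npi": "D", "specialty": "Dermatology", "codes": {"D1": 45, "D2": 30}},
    {"npi": "E", "specialty": "Dermatology", "codes": {"C1": 70, "C2": 20, "D1": 2}},
]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_mismatches(providers):
    """Flag providers whose billing looks more like another specialty's profile."""
    # Build one aggregate procedure profile per declared specialty.
    profiles = defaultdict(Counter)
    for p in providers:
        profiles[p["specialty"]].update(p["codes"])

    flagged = []
    for p in providers:
        # "Predict" the specialty whose profile is most similar to this provider.
        predicted = max(profiles, key=lambda s: cosine(p["codes"], profiles[s]))
        if predicted != p["specialty"]:
            flagged.append(p["npi"])
    return flagged


flagged = flag_mismatches(providers)
print(flagged)
```

Here provider E declares one specialty but bills almost entirely like the other, so it is the only one flagged. A flag of this kind is a prompt for further investigation, not a finding of fraud.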
Surprisingly, researchers found that keeping the data set 90 percent normal and 10 percent fraudulent was the “sweet spot” for machine-learning algorithms tasked with identifying Medicare fraud. They had expected that the ratio would need to include more fraudulent providers for the learners to be effective.
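One common way to reach a fixed class ratio like 90:10 is random undersampling, in which all fraud-labeled records are kept and only a random subset of the normal records is retained. The article does not spell out how the researchers built their data sets, so the sketch below is an assumption; the `undersample` helper and the synthetic records are hypothetical.

```python
import random


def undersample(records, fraud_fraction=0.10, seed=0):
    """Keep every fraud record and a random subset of normal records so that
    fraud makes up roughly `fraud_fraction` of the result."""
    fraud = [r for r in records if r["fraud"]]
    normal = [r for r in records if not r["fraud"]]

    # Number of normal records needed to hit the target ratio,
    # capped at however many normal records actually exist.
    n_normal = round(len(fraud) * (1 - fraud_fraction) / fraud_fraction)
    n_normal = min(n_normal, len(normal))

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(normal, n_normal) + fraud
    rng.shuffle(sample)
    return sample


# Synthetic, heavily imbalanced data: 20 fraud records among 1,020 in total.
records = [{"id": i, "fraud": i < 20} for i in range(1020)]
sample = undersample(records)
print(len(sample), sum(r["fraud"] for r in sample))
```

With 20 fraud records, a 10 percent target leaves room for 180 normal records, so the sampled set holds 200 records in total.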