I don’t see why we should give up so easily. Wrong is wrong. If you make a bad decision based on wrong data, you can’t just shrug it off. The fact that there’s a lot of bad data out there isn’t a reason for accepting the status quo.
I think King also slides too easily from issues of quantity to conclusions about quality: “It seems pretty obvious that the more data you collect, the more mistakes will be embedded in the data.” Well, no, it’s not obvious to me. I guess if you do something badly, then doing it a lot will add up to a lot of badness. But it’s up to us whether we accept poor-quality data as a given.
King also says: “The main driver of big data is unstructured information and almost by definition, unstructured information is inexact...” By definition, unstructured data is unstructured. It’s not inexact. A photo of a face is unstructured data, but it’s not inexact.
What’s the oldest saying in IT? Garbage in, garbage out. “IT’s DNA” has been rejecting errors since the first programmer squished the first bug. King finishes by saying “data quality efforts have to be consistent and ongoing”. I agree. But I believe data quality is everybody’s business. Ensure the quality of data when it enters the system. Don’t encourage poor data habits with the idea that data personnel will be able to clean it up down the line. That’s a recipe for disaster.

Melissa Data