Jim Harris is an eloquent opponent of the idea that you need data standards to make sense of big data. One of the examples he cites to support his position is Google Translate, which does an excellent job of translating human languages and, like many of Google's signature innovations, depends on statistical comparisons of very large quantities of messy data rather than on any overt theory of language.
It's certainly true that many tasks originally associated with Artificial Intelligence have turned out to yield much more quickly to statistical algorithms than to more theoretically “intelligent” processes. At the same time, researchers are studying the statistical capabilities of the brain. As a quick and obvious example, your brain constructs a three-dimensional world from the two flat images supplied by your eyes without, um, thinking about it.
But. But. But. The world we humans live in generates messy data. And yet there's a world within that world where the data is not inherently messy. This is the business world. If you find messy data in your business, it means you haven't defined your purposes and processes well enough.
A business is a machine, not an organism. Yes, it is a machine built and operated by human beings for human reasons. But it is still a consciously designed system. The world's inherently messy data is yielding to analytics. But organizations must not assume that analytic success with, say, human language is an argument for letting all data be messy.
Look at it this way. You could get a million people each to saw a piece of wood, then pick the piece that comes out the right size for your project. Or you could figure out the size you need ahead of time, measure the wood, and saw it. Once.
We see computer storage and processing as pretty much free resources, so it's tempting to go the “messy” route. But every time you opt to go this way, you entrench your dependence on systems. You add a little sliver to the server population and to energy requirements. Most importantly, your organization becomes a little less intelligent.