
Data Science, Past & Future

Domino Data Lab

One is data quality, cleaning up data, the lack of labelled data. In 2005, a colleague had moved to Seattle, and he was on a new project, and he kept calling me with these really weird questions about a new kind of service. They’re years away from being up to that point. You know what?


Measuring Validity and Reliability of Human Ratings

The Unofficial Google Data Science Blog

We normally have many labelers and items in our dataset, and priors provide a form of regularization that better handles cases where data are sparse and makes the model less prone to overfitting. We derive our measure of data quality, the intraclass correlation coefficient (ICC), from the variance parameters in the model. Instead, we measure with error.
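As a rough illustration of how an ICC can be read off a model's variance parameters, here is a minimal sketch. It assumes a simple random-effects model where each rating is an item effect plus a labeler effect plus residual noise, and it treats the ICC as the share of total variance due to true differences between items. The function name `icc_from_variances` and the choice to put the labeler variance in the denominator are illustrative assumptions, not necessarily the exact definition used in the original post.

```python
def icc_from_variances(var_item: float, var_residual: float, var_labeler: float = 0.0) -> float:
    """Hypothetical helper: ICC as the fraction of rating variance explained by items.

    Assumes ratings follow rating = mu + item_effect + labeler_effect + noise,
    with independent effects whose variances are var_item, var_labeler, and
    var_residual (e.g. posterior means from a fitted model).
    Passing var_labeler=0.0 ignores labeler bias (a "consistency"-style ICC).
    """
    components = (var_item, var_residual, var_labeler)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total variance must be positive")
    # High ICC: most variation reflects real differences between items,
    # so labelers agree; low ICC: ratings are dominated by noise or bias.
    return var_item / total


# Usage: item variance 2.0, residual 1.0, labeler 1.0 -> ICC of 0.5
print(icc_from_variances(2.0, 1.0, var_labeler=1.0))
```

Whether labeler variance belongs in the denominator depends on whether systematic labeler bias should count against reliability; including it gives a stricter, "absolute agreement"-style measure.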