I’ve been fooling around with an anonymized data set on the side. Although this can be frustrating in its own way, it occurred to me that it has the advantage of forcing me to see the data the same way my algorithms see it: as nothing more than anonymous strings, values and identifiers. To the code, strings like “120 minute IPA” or “Dogfish Head Brewery” have no more significance than “Beer-12” or “Brewer-5317”, and the anonymous identifiers remove any tendency of mine, conscious or subconscious, to read more meaning into an identifier string than is available to the algorithms.
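
To make that concrete, here is a minimal sketch in Python. The `anonymize` and `frequency_profile` helpers and the sample rows are hypothetical, made up for illustration rather than taken from the actual data set. It replaces each distinct name with an opaque identifier and checks that a simple statistic an algorithm might compute comes out the same either way.

```python
from collections import Counter


def anonymize(values, prefix):
    """Map each distinct value to an opaque ID such as 'Beer-1' (first-seen order)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = f"{prefix}-{len(mapping) + 1}"
    return [mapping[value] for value in values], mapping


def frequency_profile(values):
    """Counts of each distinct value, largest first, ignoring the labels themselves."""
    return sorted(Counter(values).values(), reverse=True)


# Hypothetical sample rows; only the first name appears in the text above.
beers = ["120 minute IPA", "Some Other Ale", "120 minute IPA", "Some Stout"]
anon_beers, beer_ids = anonymize(beers, "Beer")

# The label-free structure is identical before and after anonymization.
same_structure = frequency_profile(beers) == frequency_profile(anon_beers)
```

Anything that depends only on equality between identifiers, such as counts, joins or co-occurrence, survives the renaming untouched, which is roughly what "seeing the data the way the algorithm sees it" amounts to here.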

On the other hand, anonymous identifiers prevent me from drawing any inspiration from semantics in the data that an algorithm might genuinely be able to leverage. However, my current goal is to produce tools that are generically useful across data domains. In that respect, I think developing on anonymized data could actually be helping. Time will tell.