One of the most frustrating parts of my PhD research has also been one of the most rewarding, or vice versa. A significant portion of my research deals with gathering and analyzing medical data. These data tend to look simple at first and turn out to be much more complex once you start digging into them. You might start with a simple query to gather all the encounters for which an alert was triggered, which sounds fairly straightforward. But then you have to decide whether you want the first overall encounter for each patient or the first encounter of each specific type (inpatient vs. ED). And then you find some encounters where a patient went to the ED but was admitted directly to the hospital as part of the same encounter. So you develop a method (hack?) for classifying things the way you want. But then later you discover that, for some reason, there are cases where the ED report for the encounter was actually filed after the discharge summary, meaning your classification hack (method?) only works for most of the encounters.
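To make that "first encounter" decision concrete, here is a minimal Python sketch. The records, the tuple layout, and the `first_by` helper are all hypothetical and made up for illustration; they are not the actual data or query from the study.

```python
from datetime import date

# Hypothetical records: (patient_id, encounter_type, encounter_date)
encounters = [
    ("p1", "ED", date(2020, 1, 5)),
    ("p1", "inpatient", date(2020, 1, 9)),
    ("p1", "ED", date(2020, 3, 2)),
    ("p2", "inpatient", date(2020, 2, 1)),
    ("p2", "ED", date(2020, 2, 20)),
]

def first_by(records, key):
    """Keep the earliest record for each distinct key value."""
    first = {}
    # Walk the records in date order; setdefault keeps only the first one seen per key.
    for rec in sorted(records, key=lambda r: r[2]):
        first.setdefault(key(rec), rec)
    return first

# Option 1: the first encounter overall for each patient.
first_overall = first_by(encounters, key=lambda r: r[0])

# Option 2: the first encounter of each type for each patient.
first_per_type = first_by(encounters, key=lambda r: (r[0], r[1]))

print(len(first_overall))   # 2
print(len(first_per_type))  # 4
```

Same data, two defensible answers: two encounters under one definition and four under the other. And that is before the ED-to-inpatient edge cases show up.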
Finding these subtle distinctions is not generally something that happens on the first pass through the data. Generally it's something I find as I reach what I think should be the end of things. And yes, it's rewarding to know that I really am starting to fully understand the data. At the same time, it can be frustrating when a subtle change only affects 1% of the encounters, but I still feel like it would be dishonest not to redo the analysis with that change included, because while the numbers change, the results almost always stay the same. I mean, really, I'm the only one who would know the difference, and when the results don't change, it's almost not worth redoing the analysis just to adjust the numbers slightly to reflect reality. And really, given the limitations of the study, these numbers only approximate reality anyway.
I had one of these moments today. Recycling as we speak.