Review: To Explain or to Predict?
Things I have learned (new things I see):
1.Explanatory modeling is grounded in theory: theoretical constructs and the operationalization that connects the explanatory construct to the dependent construct. Data are only measurements of those constructs, and the end goal of explanatory modeling is to understand the relationships between the constructs. An extreme example would be studying the relationship between intelligence (IQ) and long-term success in life. A less extreme example is studying the relationship between the share of certain terrain types in a country and its GDP, where the variables are tangible measurements.
Predictive modeling, by contrast, deals with and learns directly from data, and its end goal is predicting the data as well. In terms of bias-variance analysis, explanatory modeling aims to minimize bias, while predictive modeling wants to reduce the sum of variance and squared bias (which I thought was just some metric before).
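The bias-variance point can be sketched with a tiny simulation (toy numbers of my own, with a shrinkage factor chosen purely for illustration): a deliberately biased estimator of a mean can still have lower expected squared error than the unbiased one, because what prediction cares about is variance + bias², not bias alone.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 5.0, 10   # true mean, noise sd, sample size (toy values)

# Two estimators of mu: the unbiased sample mean, and a shrunk
# (deliberately biased) version that trades bias for lower variance.
trials = 20000
xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
shrunk = 0.7 * xbar            # shrinkage factor picked for illustration

mse_unbiased = np.mean((xbar - mu) ** 2)   # = variance, since bias = 0
mse_shrunk = np.mean((shrunk - mu) ** 2)   # = variance + bias^2

print(mse_unbiased, mse_shrunk)            # the shrunk estimator wins here
```

With these numbers the unbiased mean has MSE ≈ σ²/n = 2.5, while the shrunk version has MSE ≈ 0.49·2.5 + (0.3·2)² ≈ 1.59, so chasing zero bias is not the same as chasing predictive accuracy.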
2.Explanatory studies have to eliminate some variables due to endogeneity issues (cases where the dependent variable could be interpreted as a cause of an explanatory variable), or at least researchers should recognize whether such an issue exists in their studies.
3.In dealing with missing values, besides throwing them out or imputing them, we could model the missingness itself and learn a lot from it. (I have also learned some patterns from corrupted/incorrect data, mostly concentrated on or around a particular date.)
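One simple way to keep the information in the missingness, rather than imputing it away, is a missing-value indicator column alongside the filled value (a minimal sketch on a made-up toy column `income`):

```python
import numpy as np
import pandas as pd

# Toy data: 'income' is missing for some rows; the fact that it is
# missing may itself carry signal, so keep it as a feature instead
# of only imputing it away.
df = pd.DataFrame({"income": [50.0, np.nan, 62.0, np.nan, 48.0]})

df["income_missing"] = df["income"].isna().astype(int)        # 1 where missing
df["income_filled"] = df["income"].fillna(df["income"].median())

print(df)
```

A downstream model can then use `income_missing` as a predictor in its own right, which is one concrete form of "building models and learning from" the missing values.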
4.I think there are more conflated terms that need clarification. Here are a few similarly "conflated" pairs:
"Learning" and "Teaching": People tend to assume that an expert in a field would make a good teacher largely because he/she is an expert, or that a good learner would make a good teacher largely because he/she is a good learner.
"Explaining why things happened in the past" and "Changing the future" (retrospective vs. prospective): Some people spend time making sense of how things happened in the past (probably mostly historians), while others work hard to change/create the future. Likewise, understanding why I made certain mistakes may do little to reduce them. The idea of clarifying conflated terms/concepts in this paper helps me stop over-thinking how I made certain big mistakes and focus more on avoiding similar mistakes in the future (for now the latter is much more important than the former).
5.When I approach a problem, I focus heavily on the explanatory part. For instance, if I were to compete in the Netflix competition, I would probably put more effort into building a theoretical model of how people rate movies, considering and digging into questions like:
How does an actor, a particular cinema, or another factor such as watch time affect a movie's rating? Or thinking hard about how people interpret the 1-to-5 scale differently: some people rate anything they find moderately good a 5, while others apply much stricter rules. How do we separate those people?
I think those are interesting research questions, but they may be less important than tuning some machine learning models if my goal is to improve predictive accuracy.