So the dependent variable, y, is a measure of productivity.
One possible independent variable that would explain 100% (R squared = 1) of the variation in y is my level of motivation. This shows that a high R squared doesn't necessarily mean a model is useful, because this one isn't: of course I'm more motivated when I get more work done. But why? What I'd really like to find out is what causes me to be more motivated. If I can explain the variation in my motivation level, I can use that knowledge to turn lazy days into productive ones.
So I came up with a list of other possible independent variables to test:
SLEEP = number of hours of sleep the night before
JOG = number of miles I jogged in the morning
ZUMBA = a dummy variable equal to 1 if I had Zumba and 0 otherwise
LIST = a dummy variable equal to 1 if I made a to-do list and 0 otherwise
WINE = a dummy variable equal to 1 if I drank the night before and 0 otherwise
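Written out as a regression, and assuming a simple linear form (the list itself doesn't specify one), the candidate model might look like the sketch below. The betas are coefficients to be estimated and epsilon is the error term; both are notation I'm introducing here for illustration.

```latex
y = \beta_0 + \beta_1\,\text{SLEEP} + \beta_2\,\text{JOG} + \beta_3\,\text{ZUMBA} + \beta_4\,\text{LIST} + \beta_5\,\text{WINE} + \varepsilon
```

Under this form, the coefficient on each dummy variable would be the average difference in y between days where the dummy is 1 and days where it is 0, holding the other variables fixed.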
There are many problems with these possible variables.
SLEEP: While it may be true that, up to a point, more sleep leads to higher productivity, this doesn't hold for too much sleep (for example, 13 hours makes me feel like I woke up out of a black hole, and I don't want to accomplish anything). In other words, a relationship that holds over one range of the data may not hold over another. We see the same problem when conclusions drawn from data on developing countries are applied to the United States. For example, eating cheeseburgers in some African countries could make people there "healthier," because any calories are better than no calories. But eating cheeseburgers where food is abundant makes us less healthy.
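Here is a rough sketch of the SLEEP problem using made-up data. It assumes productivity peaks at 8 hours of sleep and falls off on either side; the peak, the numbers, and the helper name `ols_slope` are all my own assumptions for illustration, not real measurements.

```python
import random

random.seed(1)

def ols_slope(points):
    """Slope of the least-squares line through a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

# Made-up data: productivity peaks at 8 hours of sleep (an assumption)
# and drops off on either side, plus some random noise.
nights = []
for _ in range(2000):
    sleep = random.uniform(3, 13)
    y = 10 - 0.5 * (sleep - 8) ** 2 + random.gauss(0, 1)
    nights.append((sleep, y))

# One straight line through everything, then one line per half of the range.
slope_all = ols_slope(nights)
slope_short = ols_slope([p for p in nights if p[0] < 8])
slope_long = ols_slope([p for p in nights if p[0] >= 8])

print(f"all nights:      {slope_all:+.2f}")
print(f"under 8 hours:   {slope_short:+.2f}")
print(f"8 hours or more: {slope_long:+.2f}")
```

With data shaped like this, the single straight line comes out roughly flat, even though sleep clearly matters: the slope is positive for short nights and negative for long ones. A line fitted to one range tells you little about the other.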
JOG: This variable will likely show a high correlation with my productivity level, but it is unclear whether being productive is what causes me to jog or jogging is what causes me to be productive. This is a reason to question statistics such as "eating breakfast makes you healthier." Maybe people eat breakfast because they are health conscious, rather than being health conscious because they eat breakfast. Put this way, the statistic doesn't make as much sense.
LIST: Maybe before I make the list, I am already destined to be productive because I have a lot to do, and making the to-do list merely indicates that I have to be productive rather than actually causing productivity. For this reason, we should question statistics such as "reading to your baby in the womb will make them smarter." Maybe you read to your baby because you are
ZUMBA: I could arguably fudge this statistic so that it appears that Zumba increases my productivity. For example, I could teach Zumba only on weekdays, so that on weekends, when I am less productive, I don't have Zumba. I would then leave out the variable WINE so that it appears I am more productive when I have Zumba, when the real reason is that I didn't go out the night before. That way, people will think Zumba causes productivity and want to come to my class. In other words, I could have an ulterior motive behind this study. The same thing can be seen when cereal companies tell you that eating breakfast helps you lose weight, or dairy companies tell you that milk decreases your chance of osteoporosis.
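The Zumba trick can be sketched with simulated days. Everything here is invented for illustration: the probabilities, the size of the WINE effect, and the helper name `zumba_gap`. In this fake world Zumba has no effect on productivity at all.

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

# Simulated days. All numbers are made up for illustration.
days = []
for _ in range(5000):
    weekend = random.random() < 2 / 7
    # Assumption: wine the night before is much more likely on weekends.
    wine = 1 if random.random() < (0.8 if weekend else 0.1) else 0
    # Assumption: Zumba is only ever taught on weekdays.
    zumba = 0 if weekend else (1 if random.random() < 0.6 else 0)
    # True model: WINE lowers productivity by 3; ZUMBA has NO effect.
    y = 10 - 3 * wine + random.gauss(0, 1)
    days.append((zumba, wine, y))

def zumba_gap(rows):
    """Mean productivity on Zumba days minus mean on non-Zumba days."""
    with_z = [y for z, w, y in rows if z == 1]
    without_z = [y for z, w, y in rows if z == 0]
    return mean(with_z) - mean(without_z)

# Leaving WINE out: same as regressing y on the ZUMBA dummy alone.
naive_gap = zumba_gap(days)

# Holding WINE fixed: compare only days with the same WINE value.
gap_no_wine = zumba_gap([d for d in days if d[1] == 0])
gap_wine = zumba_gap([d for d in days if d[1] == 1])

print(f"WINE left out:       {naive_gap:+.2f}")
print(f"only no-wine days:   {gap_no_wine:+.2f}")
print(f"only wine days:      {gap_wine:+.2f}")
```

When WINE is left out, Zumba days look clearly more productive. Once the comparison is limited to days with the same WINE value, the gap shrinks to about zero, which is the true effect in this simulation. The apparent Zumba boost was WINE all along.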
Can you think of possible issues with WINE? What about possible issues with other statistics you have heard? What leads to higher productivity is a question that still needs to be answered. But I do hope that I have encouraged you to question statistics that you would normally accept without question. What may appear to be true may not be true at all.