The main reason why almost all econometric models are wrong
By Lars Syll
Since econometrics does not content itself with only making optimal predictions, but also aspires to explain things in terms of causes and effects, econometricians need loads of assumptions — most important of these are additivity and linearity. Important, simply because if they are not true, your model is invalid and descriptively incorrect. And when the model is wrong — well, then it is wrong.
Limiting model assumptions in economic science always have to be closely examined since if we are going to be able to show that the mechanisms or causes that we isolate and handle in our models are stable in the sense that they do not change when we ‘export’ them to our ‘target systems,’ we have to be able to show that they do not only hold under ceteris paribus conditions and a fortiori only are of limited value to our understanding, explanations or predictions of real economic systems.
Our admiration for technical virtuosity should not blind us to the fact that we have to have a cautious attitude towards probabilistic inferences in economic contexts. We should look out for causal relations, but econometrics can never be more than a starting point in that endeavour since econometric (statistical) explanations are not explanations in terms of mechanisms, powers, capacities or causes. Firmly stuck in an empiricist tradition, econometrics is only concerned with the measurable aspects of reality. But there is always the possibility that there are other variables – of vital importance and although perhaps unobservable and non-additive, not necessarily epistemologically inaccessible – that were not considered for the model. Those which were can hence never be guaranteed to be more than potential causes, and not real causes. A rigorous application of econometric methods in economics really presupposes that the phenomena of our real world economies are ruled by stable causal relations between variables. A perusal of the leading econom(etr)ic journals shows that most econometricians still concentrate on fixed parameter models and that parameter-values estimated in specific spatio-temporal contexts are presupposed to be exportable to totally different contexts. To warrant this assumption one, however, has to convincingly establish that the targeted acting causes are stable and invariant so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.
Real-world social systems are not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established, are laws and relations about entities in models that presuppose causal mechanisms being atomistic and additive. When causal mechanisms operate in real-world systems they only do it in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. If economic regularities obtain they do it (as a rule) only because we engineered them for that purpose. Outside man-made ‘nomological machines’ they are rare, or even non-existent. Unfortunately, that also makes most of the achievements of econometrics – as most of the contemporary endeavours of mainstream economic theoretical modelling — rather useless.
Even in statistics, the researcher has many degrees of freedom. In statistics — as in economics and econometrics — the results we get depend on the assumptions we make in our models. Changing those assumptions — playing a more important role than the data we feed into our models — leads to far-reaching changes in our conclusions. Using statistics is no guarantee we get at any ‘objective truth.’
On the limits of ‘statistical causality’
Causality in social sciences — and economics — can never solely be a question of statistical inference. Causality entails more than predictability, and to really in depth explain social phenomena requires theory. Analysis of variation — the foundation of all econometrics — can never in itself reveal how these variations are brought about. First, when we are able to tie actions, processes or structures to the statistical relations detected, can we say that we are getting at relevant explanations of causation?
Most facts have many different, possible, alternative explanations, but we want to find the best of all contrastive (since all real explanation takes place relative to a set of alternatives) explanations. So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep, causal, features and mechanisms that we have warranted and justified reasons to believe in. Statistical — especially the variety based on a Bayesian epistemology — reasoning generally has no room for these kinds of explanatory considerations. The only thing that matters is the probabilistic relation between evidence and hypothesis. That is also one of the main reasons I find abduction — inference to the best explanation — a better description and account of what constitute actual scientific reasoning and inferences.
In the social sciences … regression is used to discover relationships or to disentangle cause and effect. However, investigators have only vague ideas as to the relevant variables and their causal order … I see no cases in which regression equations, let alone the more complex methods, have succeeded as engines for discovering causal relationships.
David Freedman (1997:60)
Since statisticians and econometricians have not been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence, additivity as being ontologically isomorphic to real-world economic systems, there are still strong reasons to be critical of the econometric project. There are deep epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyze repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.
If contributions made by statisticians to the understanding of causation are to be taken over with advantage in any specific field of inquiry, then what is crucial is that the right relationship should exist between statistical and subject-matter concerns … The idea of causation as consequential manipulation is apt to research that can be undertaken primarily through experimental methods … However, the extension of the manipulative approach into sociology would not appear promising, other than in rather special circumstances … The more fundamental difficulty is that under the — highly anthropocentric — principle of ‘no causation without manipulation,’ the recognition that can be given to the action of individuals as having causal force is in fact peculiarly limited.
John H Goldthorpe (2000:159)
Why statistics and econometrics are not very helpful for understanding economies
As social researchers, we should never equate science with mathematics and statistical calculation. All science entail human judgement, and using mathematical and statistical models do not relieve us of that necessity. They are no substitutes for thinking and doing real science.
Most work in econometrics is made on the assumption that the researcher has a theoretical model that is ‘true.’ But — to think that we are being able to construct a model where all relevant variables are included and correctly specify the functional relationships that exist between them, is not only a belief without support, it is a belief impossible to support.
The theories we work with when building our econometric regression models are insufficient. No matter what we study, there are always some variables missing, and we do not know the correct way to functionally specify the relationships between the variables.
Every econometric model constructed is miss-specified. There is always an endless list of possible variables to include, and endless possible ways to specify the relationships between them. So every applied econometrician comes up with his own specification and ‘parameter’ estimates. The econometric Holy Grail of consistent and stable parameter-values is nothing but a dream.
The theoretical conditions that have to be fulfilled for econometrics to really work are nowhere even closely met in reality. Making outlandish statistical assumptions do not provide a solid ground for doing relevant social science and economics. Although econometrics has become the most used quantitative method in economics today, it is still a fact that the inferences made are as a rule invalid.
Econometrics is basically a deductive method. Given the assumptions, it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Conclusions can only be as certain as their premises — and that also applies to econometrics.
On randomness and probability
Modern mainstream economics relies to a large degree on the notion of probability. To at all be amenable to applied economic analysis, economic observations have to be conceived as random events that are analyzable within a probabilistic framework. But is it really necessary to model the economic system as a system where randomness can only be analyzed and understood when based on an a priori notion of probability?
When attempting to convince us of the necessity of founding empirical economic analysis on probability models, neoclassical economics actually forces us to (implicitly) interpret events as random variables generated by an underlying probability density function.
This is at odds with reality. Randomness obviously is a fact of the real world. Probability, on the other hand, attaches (if at all) to the world via intellectually constructed models, and a fortiori is only a fact of a probability generating (nomological) machine or a well-constructed experimental arrangement or ‘chance set-up.’
Just as there is no such thing as a ‘free lunch,’ there is no such thing as a ‘free probability.’
To be able at all to talk about probabilities, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events – in statistics one refers to any process where you observe or measure as an experiment (rolling a die) and the results obtained as the outcomes or events (number of points rolled with the die, being e. g. 3 or 5) of the experiment – there strictly seen is no event at all.
Probability is a relational element. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be shown to coincide with (or at least converge to) real data generating processes or structures – something seldom or never done.
And this is the basic problem with economic data. If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions.
We simply have to admit that the socio-economic states of nature that we talk of in most social sciences – and certainly in economics – are not amenable to analyze as probabilities, simply because in the real world open systems there are no probabilities to be had!
The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot be maintained that it even should be mandatory to treat observations and data – whether cross-section, time series or panel data – as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes – at least outside of nomological machines like dice and roulette-wheels – are not self-evidently best modelled with probability measures.
When economists and econometricians – often uncritically and without arguments — simply assume that one can apply probability distributions from statistical theory on their own area of research, they are really skating on thin ice. If you cannot show that data satisfies all the conditions of the probabilistic nomological machine, then the statistical inferences made in mainstream economics lack sound foundations.
Statistical — and econometric — patterns should never be seen as anything other than possible clues to follow. Behind observable data, there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.
Statistics cannot establish the truth value of a fact. Never has. Never will.
Sometimes we do not know because we cannot know
To understand real world ‘non-routine’ decisions and unforeseeable changes in behaviour, ergodic probability distributions are of no avail. In a world full of genuine uncertainty – where real historical time rules the roost – the probabilities that ruled the past are not those that will rule the future.
Time is what prevents everything from happening at once. To simply assume that economic processes are ergodic and concentrate on ensemble averages – and a fortiori in any relevant sense timeless – is not a sensible way for dealing with the kind of genuine uncertainty that permeates open systems such as economies.
When you assume the economic processes to be ergodic, ensemble and time averages are identical. Let me give an example: Assume we have a market with an asset priced at 100 €. Then imagine the price first goes up by 50% and then later falls by 50%. The ensemble average for this asset would be 100 € — because we here envision two parallel universes (markets) where the asset-price falls in one universe (market) with 50% to 50 €, and in another universe (market) it goes up with 50% to 150 €, giving an average of 100 € ((150+50)/2). The time average for this asset would be 75 € – because we here envision one universe (market) where the asset-price first rises by 50% to 150 €, and then falls by 50% to 75 € (0.5*150).
From the ensemble perspective nothing really, on average, happens. From the time perspective lots of things really, on average, happen.
Assuming ergodicity there would have been no difference at all. What is important with the fact that real social and economic processes are nonergodic is the fact that uncertainty – not risk – rules the roost. That was something both Keynes and Knight basically said in their 1921 books. Thinking about uncertainty in terms of ‘rational expectations’ and ‘ensemble averages’ has had seriously bad repercussions on the financial system.
Knight’s uncertainty concept has an epistemological founding and Keynes’ definitely an ontological founding. Of course, this also has repercussions on the issue of ergodicity in a strict methodological and mathematical-statistical sense.
The most interesting and far-reaching difference between the epistemological and the ontological view is that if one subscribes to the former — Knightian – view, you open up for the mistaken belief that with better information and greater computer-power we somehow should always be able to calculate probabilities and describe the world as an ergodic universe. As Keynes convincingly argued, that is ontologically just not possible.
To Keynes, the source of uncertainty was in the nature of the real — nonergodic — world. It had to do, not only — or primarily — with the epistemological fact of us not knowing the things that today are unknown, but rather with the much deeper and far-reaching ontological fact that there often is no firm basis on which we can form quantifiable probabilities and expectations at all.
Sometimes we do not know because we cannot know.
Keynes’ critique of econometrics — still valid after all these years
To apply statistical and mathematical methods to the real-world economy, the econometrician, as we have seen, has to make some quite strong assumptions. In a review of Tinbergen’s econometric work — published in The Economic Journal in 1939 — John Maynard Keynes gave a comprehensive critique of Tinbergen’s work, focusing on the limiting and unreal character of the assumptions that econometric analyzes build on:
(1) Completeness: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of all the relevant factors to avoid misspecification and spurious causal claims. Usually, this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes (1940:155) is, to put it mildly, unconvinced:
It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his a priori, that would make a difference to the outcome.
(2) Homogeneity: To make inductive inferences possible — and being able to apply econometrics — the system we try to analyze has to have a large degree of ‘homogeneity.’ According to Keynes most social and economic systems — especially from the perspective of real historical time — lack that ‘homogeneity.’ It is not always possible to take repeated samples from a fixed population when we were analyzing real-world economies. In many cases, there simply are no reasons at all to assume the samples to be homogenous.
(3) Stability: Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyze. But Keynes argued that it was not really possible to make inductive generalisations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.
(4) Measurability: Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions if it is possible to adequately quantify and measure things like expectations and political and psychological factors. And more than anything, he questioned — both on epistemological and ontological grounds — that it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, ‘only lead to error and delusion.’
(5) Independence: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models from that kind of simplistic and unrealistic assumptions risk producing nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic evolving organic systems — such as economies.
Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we cannot show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real-world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.
(6) Linearity: To make his models tractable, Tinbergen assumes the relationships between the variables he study to be linear. This is still standard procedure today, but as Keynes (1939:564) writes:
It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.
To Keynes, it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).
Econometric modelling should never be a substitute for thinking. From that perspective, it is really depressing to see how much of Keynes’ critique of the pioneering econometrics is still relevant today.
The limits of probabilistic reasoning
Probabilistic reasoning in science — especially Bayesianism — reduces questions of rationality to questions of internal consistency (coherence) of beliefs, but, even granted this questionable reductionism, it is not self-evident that rational agents really have to be probabilistically consistent. There is no strong warrant for believing so. Rather, there is strong evidence for us encountering huge problems if we let probabilistic reasoning become the dominant method for doing research in social sciences on problems that involve risk and uncertainty.
In many of the situations that are relevant to economics, one could argue that there is simply not enough of adequate and relevant information to ground beliefs of a probabilistic kind and that in those situations it is not possible, in any relevant way, to represent an individual’s beliefs in a single probability measure.
Say you have come to learn (based on own experience and tons of data) that the probability of you becoming unemployed in Sweden is 10%. Having moved to another country (where you have no own experience and no data) you have no information on unemployment and a fortiori nothing to help you construct any probability estimate on. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1 if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign probability 10% to become unemployed and 90% to become employed.
That feels intuitively wrong though, and I guess most people would agree. Bayesianism cannot distinguish between symmetry-based probabilities from information and symmetry-based probabilities from an absence of information. In these kinds of situations, most of us would rather say that it is simply irrational to be a Bayesian and better instead to admit that we ‘simply do not know’ or that we feel ambiguous and undecided. Arbitrary and ungrounded probability claims are more irrational than being undecided in the face of genuine uncertainty, so if there is not sufficient information to ground a probability distribution it is better to acknowledge that simpliciter, rather than pretending to possess a certitude that we simply do not possess.
We live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but rational expectations. Sometimes we ‘simply do not know.’ According to Bayesian economists, expectations tend to be distributed as predicted by theory.’ I rather think, as did Keynes, that we base our expectations on the confidence or ‘weight’ we put on different events and alternatives. Expectations are a question of weighing probabilities by ‘degrees of belief,’ beliefs that have preciously little to do with the kind of stochastic probabilistic calculations made by the rational agents modelled by probabilistically reasoning Bayesian economists.
We always have to remember that economics and statistics are two quite different things, and as long as economists cannot identify their statistical theories with real-world phenomena there is no real warrant for taking their statistical inferences seriously.
If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous ‘nomological machines’ for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice in science. You have to come up with some really good arguments if you want to persuade people to believe in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions! Not doing that, you simply conflate statistical and economic inferences.
The present ‘machine learning’ and ‘big data’ hype shows that many social scientists — falsely — think that they can get away with analysing real-world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for. Theory matters.
Some economists using statistical methods think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like ‘faithfulness’ or ‘stability’ is to assume what has to be proven. Deductive-axiomatic methods used in statistics do no produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived statistical theorems and the real-world, well, then we have not really obtained the causation we are looking for.
Freedman, David (1997). From Association to Causation via Regression. Advances in Applied Mathematics.
Goldthorpe, John H (2000). On sociology: numbers, narratives, and the integration of research and theory. Oxford: Oxford University Press
Keynes, J M (1939). Professor Tinbergen’s Method. Economic Journal.
Keynes, J M (1940). Comment. Economic Journal.
From: pp.5-10 of WEA Commentaries 8(3), June 2018