Abstract

This article shows that Friedman's experiences with the research conducted at the Statistical Research Group (SRG) significantly shaped his 1953 methodology. These experiences gave him the ideas for a methodology for statistical analysis when dealing with “lousy data,” that is, when the conditions under which they are generated are not clear, when the data are “uncontrolled experiences.” Two views emerged from his SRG work. One is his view on theoretical assumptions: they only specify and do not determine under which conditions a model or device is expected to work. The other is his view on testing: the working of a device should be tested in comparison with a standard. Assessing the performance of a model against a benchmark model makes irrelevant the uncontrollable, only partly known, complex mass of circumstances under which both models perform. The two views were thus two sides of the same coin.

The role of statistics is not to discover truth.

—Milton Friedman and Rose Friedman (1998)

1. Introduction

Since its publication in 1953, Milton Friedman's essay on the methodology of positive economics has remained a puzzle: What kind of methodology is actually proposed? Its vagueness has allowed scholars to interpret its message in a wide variety of different directions. “Surely, if the essay had been really lucid, scholars should not today still having different opinions about what it says” (Friedman 2009: 355). While most philosophical debates have been centered around the question of which kind of “ism” it comes closest to—pragmatism, instrumentalism, falsificationism, or realism—historians have shown more consensus by emphasizing Friedman's prewar education, training, and work at Columbia University and NBER for a better understanding of the 1953 essay. But, remarkably, almost none of these histories take into account the rise of econometrics, which took shape in the same period in which the essay was written, and none, as far as I know, relates the essay to his wartime involvement in the Statistical Research Group (SRG) Division of War Research at Columbia University, for the Applied Mathematics Panel of the National Defense Research Committee, which was part of the Office of Scientific Research and Development.1 This article shows that Friedman's experiences with the research done at the SRG significantly shaped his 1953 methodology.

The reason that different readers get different messages from the 1953 essay is that it is a composite of different messages. Friedman's experiences at the SRG shaped one of its main messages, namely the message about the validity of a hypothesis: “The only relevant test of the validity of a hypothesis is comparison of its predictions with experience” (Friedman 1953: 8–9). But this validity criterion is not sufficient: “Observed facts are necessarily finite in number; possible hypotheses, infinite. If there is one hypothesis that is consistent with the available evidence, there are always an infinite number that are” (9). The problem, then, is how to reduce the class of all possible hypotheses that are consistent with the evidence to a single one.

The additional problem is the inability to conduct “controlled experiments” in economics, and therefore the available evidence is based on “uncontrolled experience” (Friedman 1953: 10). Such evidence is “far more difficult to interpret. It is frequently complex and always indirect and incomplete. Its collection is often arduous, and its interpretation generally requires subtle analysis and involved chains of reasoning, which seldom carry real conviction” (10–11).

His experiences at the SRG gave him the ideas for a methodology for statistical analysis when dealing with “lousy data,” that is, when the conditions under which they are generated are not clear, when the data are uncontrolled experiences (Friedman and Friedman 1998: 145).

The SRG was organized, and its initial staff selected, by Warren Weaver, chief of the Applied Mathematics Panel, on July 1, 1942. The staff was drawn from universities and research organizations throughout the United States. The group focused on statistical analysis of military problems and operated until its dissolution on September 30, 1945. W. Allen Wallis was its director of research throughout its existence. Friedman played a prominent role in the SRG; “effectively” he was its “Deputy Director” (Wallis 1980: 322).

The SRG is best known for having created sequential analysis. Although this technique was mainly developed by Abraham Wald, the initial ideas came from Friedman and Wallis (Wallis 1980). Friedman also played a crucial role in the development of the more general technique of sampling inspection, in particular the usage of the “operating characteristic curve” in deciding on the most appropriate inspection plan.

Besides being involved with the development of the more general method of inspection plans, Friedman was also involved with the design of specific inspection plans, such as plans to investigate the sensitivity of a radio proximity fuze and to find the best composition of a high-temperature alloy for jet engines.

This article will first explore more precisely these contributions to the sampling inspection techniques developed by the SRG, in section 2, and then in what way the SRG research outcomes shaped his early postwar contributions to economic methodology, which culminated in the 1953 essay on positive methodology (Friedman 1953). He started to develop the first ideas for this essay, about shifting from “descriptive validity” to “analytical relevance,” in late 1947 (presented in section 4), when he had just started at Chicago, where the Cowles Commission was then housed. His interactions with some of the Cowles researchers, discussed in section 3, show nicely what he had learned from his research experiences at the SRG. This article will show how his statistical methodology grew out of his work on designing inspection plans.

2. Friedman's Contributions to the Statistical Research Group

In his Memoirs, written together with Rose (Friedman and Friedman 1998: 133), Friedman sorts the ninety-eight reports and memos he wrote for the SRG into five major topics: aircraft vulnerability, proximity fuze, sequential analysis, sampling inspection, and high-temperature alloys. Each topic, in varying degrees, had an impact on his later ideas about economic analysis.

2.1. Aircraft Vulnerability and Proximity Fuze

Although the results of an early assignment that was mentioned in his Memoirs (Friedman and Friedman 1998: 134) have not been published, unlike the results of his research on the other topics, they nevertheless give a good impression of the kind of research that he did for the SRG.2 The assignment was to evaluate a possibly more effective antiaircraft shell. The standard shell shattered into a large number of fragments varying widely in size and shape. The idea was that an improvement in effectiveness could be made by using a shrapnel shell, which, on explosion, would project forward a stream of uniform spherical pellets, because one would then be able to control the size of the pellets, find their optimal size, and compare the effectiveness of the resulting shell with the standard shell. The basic problem of finding an optimum size of the pellets was to find the optimum trade-off between size and number.

The most extensive work on aircraft vulnerability, however, was his statistical research in connection with the development of a new type of fuze for antiaircraft projectiles, a “radio proximity fuze.” The standard fuze used on antiaircraft rockets was a time-delay fuze that could be set to explode after a specific period. Therefore, three numbers had to be fed in to aim such a projectile: azimuth, elevation, and range. A so-called director controlling the gun converted the range into time delay.3 A radio proximity fuze was designed to explode the projectile when it was in the neighborhood of the target, so only two numbers had to be fed in: azimuth and elevation. This advantage was important particularly for air-to-air combat or defense against dive bomber attacks on ships because in both cases the ranges change so rapidly in the course of the combat or defense that accurate estimates were almost impossible.

Friedman was closely involved in the development of these fuzes to the extent that, at some point, he, “for his extensive and accurate knowledge of the way the fuzes actually performed” (Wallis 1980: 323), was considered one of the experts to be consulted by the army during the Battle of the Bulge in December 1944 about their best settings for air bursts of artillery shells against ground troops.

The research itself was to analyze the data from a large number (1,211 to be precise) of test-firings of rockets containing a fuze against simulated targets in order to determine the distribution of the point of burst in relation to the target. The design of the statistical analysis of this problem was published after the war as “Planning an Experiment for Estimating the Mean and Standard Deviation of a Normal Distribution from Observations on the Cumulative Distribution” (Friedman 1947a). This article provides some important insights into the limitations of the type of research conducted at the SRG, namely, the lack of control over relevant experimental conditions.

The concrete problem was the design of an experiment to study the “sensitivity” of a radio proximity fuze, that is, the determination of “the probability that a nondefective fuze will operate if it passes within a certain distance of the target” (Friedman 1947a: 342). It was assumed that each fuze is characterized by a definite sensitivity, that is, a maximum distance at which it will function, and that these sensitivities are normally distributed among fuzes. The proportion T(d) of fuzes passing a given distance d from a target is then the cumulative normal distribution from d to ∞. It was, however, not possible to observe the sensitivity of a fuze directly, only its minimum distance from the target. The observations obtained are therefore observations on T(d) and not directly on the normal distribution of the sensitivities. The aim of the experiment was then to estimate the mean and variance of this normal distribution.

Although the design of such an experiment looks simple, the analysis of the data had to face some epistemological limitations. The idea of an experiment is that the test conditions are fixed, but in practice the data are collected under different conditions. “Theoretical considerations gave little indication of the way in which the performance of the fuze would be affected by the difference in conditions” (Friedman 1947a: 344). To somewhat reduce the variations of conditions, the scale of the experiment had to be limited to at most 250 rounds. The statistical problem then is that of distributing these rounds over different possible distances from the target in such a way that the estimates of the mean and variance are obtained as accurately as possible. “This problem serves to illustrate a number of considerations pertinent to such experiments; while no general, theoretical solution is offered, the mode of approach should be at least suggestive in similar situations” (344–45).

Another epistemological limitation in designing experiments was that “the most efficient design depends on the true value of the parameter under investigation, but if this were known there would be no need for the experiment” (Friedman 1947a: 345). This problem was usually solved by basing the experimental design on some advance estimate of the parameter. But in the case of the sensitivity of the proximity fuze, two parameters (mean and variance) had to be estimated and “the experimental design that is best for estimating one is not best for estimating the other” (345). The most efficient experiment is the one that minimizes both the variance of the estimate of the mean and the variance of the estimate of the variance of the normal distribution. The design must therefore be such that the experiment arrives at an “appropriate compromise” (346).
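Although Friedman offered no general theoretical solution, the estimation problem he describes can be illustrated with a small numerical sketch. Everything below is hypothetical: the distances, the allocation of rounds, and the true parameter values are invented. The sketch only assumes, as in the text, that the proportion of fuzes operating at distance d is the upper tail of a normal distribution, T(d) = 1 − Φ((d − μ)/σ), and that μ and σ are estimated by maximum likelihood from the counts observed at a few chosen distances.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom, norm

# Hypothetical design: rounds fired at a few distances from the target
# (the actual distances and allocations used at the SRG are not reported here).
distances = np.array([20.0, 40.0, 60.0, 80.0])   # distance of closest approach
rounds    = np.array([60, 70, 70, 50])           # at most 250 rounds in total

# Simulate "observations on the cumulative distribution": at distance d a
# nondefective fuze operates with probability T(d) = 1 - Phi((d - mu)/sigma).
true_mu, true_sigma = 55.0, 15.0
rng = np.random.default_rng(0)
p_operate = 1.0 - norm.cdf((distances - true_mu) / true_sigma)
operated = rng.binomial(rounds, p_operate)

def neg_log_likelihood(params):
    """Binomial likelihood of the observed counts, given mu and sigma of the
    unobservable sensitivity distribution."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    p = np.clip(1.0 - norm.cdf((distances - mu) / sigma), 1e-9, 1 - 1e-9)
    return -np.sum(binom.logpmf(operated, rounds, p))

# Maximum-likelihood estimates of the mean and standard deviation, recovered
# only through the observations on T(d).
result = minimize(neg_log_likelihood, x0=[50.0, 10.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(f"estimated mean = {mu_hat:.1f}, estimated std. dev. = {sigma_hat:.1f}")
```

The design question Friedman addressed, namely how to distribute the limited number of rounds over the distances so that both estimates are as accurate as possible, enters this sketch only through the choice of distances and rounds; and, as he noted, the best choice depends on the very parameters the experiment is meant to estimate.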

2.2. Sequential Analysis and Sampling Inspection

The best-known output of the SRG is the development of sequential analysis. Wallis's (1980) history of the SRG focuses primarily on the origins of sequential analysis. It is generally recognized that the founder of sequential analysis is Wald (see, e.g., Berger 2018: 12198), but such recognition is often accompanied by the comment that Friedman and Wallis had provided “substantial motivational and collaborative support,” which draws on Wald's own account of this origin in his first two publications on sequential analysis:

It was conjectured by W. Allen Wallis and Milton Friedman jointly that sequential test would have far reaching consequences for the theoretical foundations of statistics, beyond their immediate practical applications, and that in a sense they are more efficient than the current best tests, i.e., the “most powerful” test. More precisely, it was conjectured that there exists a sequential test that controls the errors of the first and second kind to exactly the same extent . . . as the current most powerful test, and at the same time requires an expected number of trials considerably smaller than the number of trials by the current most powerful test procedure. It was this conjecture of Friedman and Wallis which gave the author the incentive for carrying out the present investigation. (Wald 1943: 5–6; see also Wald 1945: 120–21)

To supplement Wald's brief historical account, Wallis (1980) provides a more detailed history of the role he and Friedman played in the early development of sequential analysis. It is based on a letter he wrote to Weaver in March 1950 and a memorandum, dated April 3, 1943, he wrote for Wald.4

Early in 1943, sequential analysis arose in connection with a specific problem posed by Navy Captain Garret L. Schuyler. To achieve a desirable degree of precision and certainty in ordnance testing, the required samples run easily to many thousands of rounds, which to Schuyler seemed to be too wasteful:

If a wise and seasoned ordnance expert like Schuyler were on the premises, he would see after the first few thousand or even few hundred [rounds] that the experiment need not be completed, either because the new method is obviously inferior or because it is obviously superior beyond what was hoped for. He said that you cannot give any leeway to Dahlgren [where the tests were carried out] personnel, whom he seemed to think often lack judgment and experience, but he thought it would be nice if there were some mechanical rule which could be specified in advance stating the conditions under which the experiment might be terminated earlier than planned. (Wallis 1980: 325)

When Wallis brought Schuyler's suggestion up during a lunch, Friedman also got interested and both started to work on this problem. They came to realize that some economy in sampling can be achieved merely by applying an ordinary single-sampling test sequentially.

The fact that a test designed for its optimum properties with a sample of predetermined size could be still better if that sample size were made variable naturally suggested that it might pay to design a test in order to capitalize on this sequential feature; that is, it might pay to use a test which would not be as efficient as the classical tests if a sample of exactly N were to be taken, but which would more than offset this disadvantage by providing a good chance of terminating earlier when used sequentially. (Wallis 1980: 325)

Friedman explored this idea and composed “a rather pretty but simple example involving Student's t-test” (Wallis 1980: 325). But they came to realize that “to squeeze the juice out of the idea” they needed “someone more expert in mathematical statistics” than themselves (325). They presented the problem to Wald in general terms for its basic theoretical interest; and as a practical example, they cited the problem of comparing two fire control devices with a hit or miss classification of each round. At first Wald was “not enthusiastic and was completely noncommittal”; he thought that “nothing would come of it; his hunch was that tests of a sequential nature might exist but would be found less powerful than existing tests” (326). But within a few days, he “phoned that he had found that such tests do exist and are more powerful, and furthermore he could tell us how to make them” (326). This idea became the procedure for sequential analysis of “double dichotomies” and was first published as an appendix to Wald 1943, in September 1943.5 “The problem of double dichotomies was the immediate practical problem out of which the sequential idea grew, and is one frequently encountered in the testing and developmental work of the National Defense Research Committee, the Army, and the Navy” (Wald 1943: 7).

An extended version of this appendix was published almost a year later as “Sequential Analysis When the Result of a Single Observation Is a Classification as Good or Bad and When the Result of the Test Is a Decision between Two Methods or Product,” section 3 of Sequential Analysis of Statistical Data: Applications (SRG 1944, 1945).6 “Sequential analysis is a new technique for testing statistical significance. In other words, it is a new technique for deciding whether numerical data conform with some standard. We may, for example, need to decide whether the data on the muzzle velocity of a number of shells from a new lot show that the average muzzle velocity of the new lot is below some standard” (Wald 1943: iii). Wald 1943 provided the theoretical underpinning of sequential analysis, “of interest chiefly in connection with the mathematical theory of statistics”; Wald had “not attempted to expound them for readers whose interest are primarily in applications” (vi). For that reason, the “companion report” Applications (SRG 1944, 1945) was published.

The most striking difference between Theory and Applications is the use of graphs to clarify sequential analysis and to make it applicable for practical purposes. These graphs do not show up in any of Wald's work and will not be found in any later econometrics paper on sequential analysis, but they are relevant because they show that Friedman's involvement in the development of sequential analysis was more substantial than providing only “motivational and collaborative support.” The graphs are also helpful for briefly clarifying the basic elements of sequential analysis. To do so, the problem that originally led to the development of sequential analysis is taken as an exemplary case, the sequential analysis of double dichotomies (see sec. 3 of SRG 1945): the problem of choosing between two processes, a “standard” and an “experimental.” The two possible outcomes of a particular trial with either process are “success” and “failure.” The example given is the comparison between an experimental and a standard gun, where the criterion of comparison is hitting a target. The decision to be made is either that the experimental process (shooting at a target with the experimental gun) is superior to the standard process (shooting with the standard gun) or that the standard is superior, where superior means producing “a higher proportion of successes” (SRG 1945: 3.02).

The procedure of determining which of the two processes is superior is by taking an observation on each process and then making one of the three possible decisions:

  1. The experimental process is superior to the standard process.

  2. The standard process is superior.

  3. Further data are necessary to make a decision with sufficiently small risk of error.

When the third decision is made, an additional observation is taken on the two processes, and the three decisions are reconsidered. Each observation consists of an outcome of the standard process and an outcome of the experimental process. But only the pairs that consist of one failure and one success yield information about which of the two processes is superior; these are called the “pairs favorable to one process or the other,” or “favorable pairs” for short. This procedure was graphically presented by a chart (see fig. 1), whose vertical scale indicates the number of pairs favorable to the experimental process, E, and whose horizontal scale indicates the number of all favorable pairs, n. The criteria for the three decisions are represented by a pair of parallel lines defined by their intercepts h1 and h2 and the same slope s. The investigator plots the total number of pairs favorable to the experimental process observed in the sequential testing against the total number of favorable pairs. The testing continues as long as the plotted points fall between the two lines. It terminates when a plotted point falls on or outside either line. If the point falls on or above the upper line, the experimental process is judged superior; if it falls on or below the lower line, the standard process is judged superior.

The sequential procedure is therefore defined by the values of three parameters: the slope, s, and the intercepts, h1 and h2, of the parallel lines. These three parameters represent the characteristics of a test. The first characteristic is the odds ratio u. The effectiveness of a process is expressed by its success odds, that is, the ratio of the proportion of successes, p, to the proportion of failures, 1 − p, namely, p/(1 − p). The ratio of the success odds of the two processes is then taken as a measure of the difference between the experimental (E) and standard (S) process: u = [pE/(1 − pE)]/[pS/(1 − pS)], where pE and pS are the success proportions of the experimental and standard processes.

The second characteristic of a test consists of the two risks that the wrong process will be judged superior on the basis of the sample evidence. As a result, the values of the three parameters, s, h1, and h2, are determined by the following values of the test characteristics:

  • u1: the odds ratio below which the standard process is superior

  • u2: the odds ratio above which the experimental process is superior

  • α: the maximum risk of calling the experimental process superior when in fact the standard process is superior

  • β: the maximum risk of calling the standard process superior when in fact the experimental process is superior

Thus the “sequential plan” is actually determined by the choice of the values of these four variables.
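How the four test characteristics fix the three parameters of the chart can be sketched with the standard boundary formulas for Wald's sequential probability ratio test applied to double dichotomies. The formulas below follow the common textbook presentation of that test rather than the SRG manual itself, and the numerical values of the characteristics are purely illustrative.

```python
from math import log

def sequential_plan(u1, u2, alpha, beta):
    """Slope and intercepts of the two parallel decision lines, derived from
    the four test characteristics (standard SPRT boundaries; a sketch, not a
    reproduction of the SRG's own tables)."""
    g = log(u2 / u1)
    s = log((1 + u2) / (1 + u1)) / g        # common slope of both lines
    h1 = log(beta / (1 - alpha)) / g        # intercept of the lower line (negative)
    h2 = log((1 - beta) / alpha) / g        # intercept of the upper line
    return s, h1, h2

def decide(favorable_pairs, u1, u2, alpha, beta):
    """Plot E (pairs favorable to the experimental process) against n (all
    favorable pairs) and stop as soon as a point falls on or outside a line."""
    s, h1, h2 = sequential_plan(u1, u2, alpha, beta)
    E = 0
    for n, favors_experimental in enumerate(favorable_pairs, start=1):
        E += int(favors_experimental)
        if E >= s * n + h2:
            return "experimental superior", n
        if E <= s * n + h1:
            return "standard superior", n
    return "no decision yet", len(favorable_pairs)

# Illustrative plan: the standard is superior below u1 = 1/1.5, the
# experimental above u2 = 1.5, with maximum risks alpha = beta = 0.05.
print(sequential_plan(1 / 1.5, 1.5, 0.05, 0.05))
print(decide([True, False, True, True, True, True, True, True, True, True],
             1 / 1.5, 1.5, 0.05, 0.05))
```

Different combinations of the four characteristics can yield the same slope and intercepts, a point taken up next.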

According to Wallis (1980: 327), “One of the big theoretical developments after Wald's initial results was the discovery of methods of deriving power functions or operating characteristics curves,” a discovery made by Friedman. The sequential test is described by four values, the two odds ratios and the two maximum risks. When the test is constructed, however, it turns out to involve two parallel straight lines, which are described by three values: the slope and the two intercepts. “Friedman was the first to point this out, and he naturally asked, ‘Where is the vanished parameter?’ More specifically he pointed out that there must be many combinations of the original four parameters which lead to the same final set of three” (327). What connects the many combinations of the four test characteristics that lead to a given set of the three parameters is the probability (Lu) that the standard process is declared superior for each value of the odds ratio (u). Lu can be represented by the “operating characteristic curve”; see figure 2. Its complement is the power function.

The shape of this curve shows the sensitivity of the test. If u is very small, indicating that the standard process is actually far superior, the probability Lu that the sample data will lead to a decision in favor of the superiority of the standard is very high. And if u is very large, indicating the superiority of the experimental process, the probability of deciding in favor of the standard is very small. In between these two values, the probability that the sample data will lead to a decision in favor of the standard process declines as the superiority of the experimental process increases. The steeper the curve, the better the discrimination between the two processes. The two odds ratios and the two maximum risks correspond to two points on the operating characteristic curve, and so the three points (0, 1), (u1, 1 − α), and (u2, β) determine its shape.

The main difference between the 1944 Applications (SRG 1944) and the 1945 Applications (SRG 1945) is the addition of these operating characteristic curves and the “average sample number curves.” The latter curves show the average number of favorable pairs for each odds ratio, in other words, the expected number of trials needed to make a decision. Friedman derived these curves for the binomial case, and Wald (1944) obtained general formulas for them (SRG 1945: iv). Both types of curves played a more prominent role in Sampling Inspection, published in 1948. Although this work was originally developed because of a request in February 1945 by the navy “to prepare a manual on sampling inspection, including tables, procedures, and principles,” the 1948 publication was made suitable for the “peacetime needs of industry” to determine the “quality and acceptability” of products (SRG 1948: vii). The manuscript was prepared by Freeman, Friedman, Frederick Mosteller, L. Jimmie Savage, David H. Schwartz, and Wallis. Friedman, however, “bore the main burden of planning the Navy manual, of integrating the contributions of others, and of finally editing the entire manual” (vii). The first three parts, taking up less than half of the book, presented the principles of sampling inspection and a description of a standard procedure. The larger half consisted of two parts: part 4 explained the “construction of sampling tables and standard procedure,” “almost entirely the work of Mr. Friedman” (vii), and part 5 provided “an extensive catalogue of sampling plans” covering 157 pages, each of which contained the “plans” for single, double, or sequential sampling, classified by the sample size; the requested level of quality and acceptability of the product to be tested; and the corresponding operating characteristic curves, drawn in three different colors in one chart. One cannot find a single mathematical expression in the whole book; the different inspection plans were presented only in terms of numbers and graphs. The operating characteristics can be computed for any sampling plan, but because the computations are difficult, the charts were drawn as accurately as possible, so that one can read from them the chance that a submitted inspection lot of any given quality will be accepted. For the selection of an appropriate sampling plan, the shape of the operating characteristic curve was informative. Its steepness reveals its discriminating power and so indicates the required sample size.
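The two kinds of curves can also be traced numerically rather than read off the printed charts. The following simulation sketch, which reuses the illustrative values of u1, u2, α, and β from the previous sketch and is in no way a reconstruction of the SRG's own computations, estimates for a few values of the odds ratio u the probability Lu that the standard process is declared superior and the average number of favorable pairs needed to reach a decision.

```python
import random
from math import log

def simulate_oc_asn(u, u1, u2, alpha, beta, n_trials=2000, seed=1):
    """Monte Carlo estimate of the operating characteristic L(u) -- the chance
    that the standard process is declared superior -- and of the average
    number of favorable pairs needed for a decision, at true odds ratio u."""
    g = log(u2 / u1)
    s = log((1 + u2) / (1 + u1)) / g          # slope of the decision lines
    h1 = log(beta / (1 - alpha)) / g          # lower intercept
    h2 = log((1 - beta) / alpha) / g          # upper intercept
    rng = random.Random(seed)
    p_exp = u / (1 + u)      # chance that a favorable pair favors the experimental
    standard_wins, total_pairs = 0, 0
    for _ in range(n_trials):
        E, n = 0, 0
        while True:
            n += 1
            E += rng.random() < p_exp
            if E >= s * n + h2:               # experimental declared superior
                break
            if E <= s * n + h1:               # standard declared superior
                standard_wins += 1
                break
        total_pairs += n
    return standard_wins / n_trials, total_pairs / n_trials

# A few points of the operating characteristic and average sample number
# curves for the illustrative plan u1 = 1/1.5, u2 = 1.5, alpha = beta = 0.05.
for u in (0.4, 1 / 1.5, 1.0, 1.5, 2.5):
    L_u, asn = simulate_oc_asn(u, 1 / 1.5, 1.5, 0.05, 0.05)
    print(f"u = {u:.2f}: L(u) ~ {L_u:.2f}, average favorable pairs ~ {asn:.0f}")
```

A steeper drop of Lu between u1 and u2 means better discrimination between the two processes, but the average sample number shows the price paid in additional observations.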

2.3. High-Temperature Alloys

In the middle of 1944, Friedman embarked on an assignment “that was to prove both challenging and instructive” (Friedman and Friedman 1998: 142) and was “to have a major effect on my approach to empirical work for the rest of my life” (143). The assignment was “to serve as a statistical consultant to a number of projects seeking to develop an improved alloy for use in airplane turbo-superchargers and as a lining for jet engines. The goal was to develop alloys that could withstand the highest possible temperature, since the efficiency of a turbine (or its equivalent) rises very rapidly with the temperature at which it can safely operate” (Friedman 1991: 48).

Before I address in what way this assignment was “instructive,” I will first present his involvement in this project, which resulted in two publications (Friedman 1947b and Friedman and Savage 1947), based on his own account given in Friedman 1991.7 This involvement went far beyond his assumed role as “a clearing agency for the results of the various experiments in progress, as an adviser on statistical design of experiments, and as an analyst of the results” (Friedman 1991: 48):

The procedure in testing an experimental alloy was to hang a specified weight on a standard turbine blade made from the alloy, put it in a furnace capable of generating a very high temperature, and measure the time it took for the blade to break. At one point, I combined the test data from all the separate experiments. . . . I ended up with a single proposed regression that expressed time to fracture as a function of stress, temperature, and variables describing the composition of the alloy. I assured myself that the equation was consistent with metallurgical theory. (48)

Friedman was “delighted” with the regression he found: it had “a high multiple correlation, low standard error of estimate, and high t values for all of the coefficients, and it satisfied every other test statistic that I know of more than 40 years ago” (Friedman 1991: 48). To create a proposal for a better alloy, based on this regression, Friedman had “to go outside the joint range of [his] sample set of independent variables, but [he] was careful to stay as close as [he] could and to be within the limits used in prior experiments for each variable separately” (48). He arrived at two new alloys, which he labeled F-1 and F-2. According to the regression, “each would take several hundred hours to rupture at the very high temperature . . . a sizable multiple of the best recorded time for any previous alloy” (49). An MIT lab constructed and tested the two alloys, with the result that the two alloys “had ruptured in something like 1–4 hours, a much poorer outcome than for many prior alloys” (49).

From this disappointing experience, Friedman (1991: 49) drew the following lesson:

Ever since, I have been extremely skeptical of relying on projections from a multiple regression, however well it performs on the body of data from which it is derived; and the more complex the regression, the more skeptical I am. In the course of decades, that skepticism has been justified time and again. In my view, regression analysis is a good tool for deriving hypotheses. But any hypothesis must be tested with data or non-quantitative evidence other than that used in deriving the regression or available when the regression was derived. Low standard errors of estimate, high t values, and the like are often tributes to the ingenuity and tenacity of the statistician rather than reliable evidence of the ability of the regression to predict data not used in constructing it.

The irony of this lesson is that it carries the same message as his prewar review of Jan Tinbergen's work for the League of Nations (Friedman 1940), which echoed Wesley C. Mitchell's criticism of attempts to find curves that fit the data with the highest attainable correlation coefficient: “Tinbergen is seldom satisfied with a correlation coefficient less than .98. But these attractive correlation coefficients create no presumption that the relationships they describe hold in the future” (Friedman 1940: 659). Before the war, the assessment of a hypothesis, theory, or model8 based on predictive performance was already a core element of Friedman's statistical methodology. But it was his involvement in sampling inspections for the military at the SRG that allowed him to develop the tools for the assessment of predictive performance. The sampling inspection procedures were based on the comparison of a “new device” or “experimental process” with an “old” or “standard” one. Since both devices were tested under the same but uncontrollable conditions, the comparison made control over these conditions less important.
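Friedman's cautionary tale can be restated as a small exercise in out-of-sample validation. The data below are entirely synthetic and the functional form is invented, since his account reproduces no actual alloy data; the sketch only illustrates the pattern he reports: a regression with an excellent in-sample fit that fails once one extrapolates beyond the range of the data used to estimate it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "alloy" data: log time-to-fracture depends mildly nonlinearly on
# temperature, but only a limited region of the design space is observed.
n = 80
stress = rng.uniform(10, 20, n)
temp = rng.uniform(700, 800, n)

def true_log_time(stress, temp):
    return 12 - 0.3 * stress - 0.01 * temp - 0.0005 * (temp - 750) ** 2

log_time = true_log_time(stress, temp) + rng.normal(0, 0.1, n)

# Fit a linear regression on the observed region; the in-sample fit looks good.
X = np.column_stack([np.ones(n), stress, temp])
coef, *_ = np.linalg.lstsq(X, log_time, rcond=None)
fitted = X @ coef
r2 = 1 - np.sum((log_time - fitted) ** 2) / np.sum((log_time - log_time.mean()) ** 2)

# Extrapolate, as Friedman did, just outside the joint range of the sample.
new_design = np.array([1.0, 9.0, 850.0])          # hypothetical "best" alloy
predicted = float(new_design @ coef)
actual = true_log_time(9.0, 850.0)

print(f"in-sample R^2: {r2:.2f}")
print(f"predicted log time at the new design: {predicted:.2f}")
print(f"actual log time at the new design:    {actual:.2f}")
```

The fit statistics computed on the estimation sample say nothing about the extrapolated design point, which is the lesson Friedman drew from F-1 and F-2.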

3. Friedman and the Cowles Commission

The kind of impact the research at the SRG had on Friedman's ideas of economic analysis can best be studied by looking closely at Friedman's interactions with the Cowles Commission at Chicago, almost immediately after the war.9 After the SRG's dissolution on September 30, 1945, Friedman first had a one-year appointment at the University of Minnesota before he “came home” to Chicago in September 1946 (Friedman and Friedman 1998: 156).

In 1943, the Cowles Commission started to have regular seminars, initially every three or four weeks but soon on a biweekly basis. Friedman recalled the seminars as “exciting events in which I and other members of our department participated regularly and actively. . . . Similarly, Cowles staff participated in and contributed to departmental seminars” (Friedman and Friedman 1998: 197). The Cowles Commission annual reports (1948, 1953, and 1954) mention three seminars at which Friedman presented papers. The first was on October 23, 1947, when he discussed “Utility Analysis of Gambling and Insurance”; the second was on January 10, when he presented “Price, Income, and Monetary Changes in Three Wartime Periods”; and the topic of the third, on November 20, 1952, was the effect of individual choice on the income distribution.10

But Friedman's interactions with the Cowles Commission went further than participating in these seminars alone. In addition to participating in some discussion meetings of the Cowles Commission,11 he played a contributing role in developing tests for the models Lawrence Klein was developing at Cowles, especially the naive model tests.

In 1945, Klein was commissioned to construct a model of the US economy in the spirit of Tinbergen's modeling method. A major concern was whether any relationship that was discovered as holding during the period 1921–41 would also hold after World War II. Reflecting this concern and the resulting realization that new data had to be utilized as soon as they became available so that any existing finding could be updated, it was decided that Andrew W. Marshall, a graduate economics student at Chicago, be commissioned to compare data for 1946 and 1947 with the figures predicted by Klein's equations. Marshall's (1949b) study rejected a number of equations of Klein's model. Carl Christ (1949a, 1949b) thereupon modified Klein's model based on Marshall's test results to construct a new model for the United States. In his revised version of Klein's model, Christ acknowledged Friedman's “assistance and helpful criticism” (Christ 1949b: 1).12 This assistance consisted of two elements. One was Friedman's proposal of an “idealized procedure” for choosing the best structure out of “an infinity of structures which explain any given set of observations” (Christ 1949b: 3). The other element was the proposal of the naive model test.

Both agreed that the best structure is “the one which gives the most accurate predictions of the future” (Christ 1949b: 3). But because we only know this after the model has been built, the choice of the best structure must be made on the basis of “immediately available criteria” (3) for which Friedman suggested the following:

  1. Generality.

  2. Simplicity.

  3. Correspondence with our theoretical ideas of what to expect (but if we have a poor theory, this criterion will mislead us).

  4. Accuracy of explanation of past observations (though we must be careful with this criterion, because it is necessary but not sufficient. . . . Remember that it is always possible to fit an nth degree polynomial exactly to a set of n + 1 plane points, and that this very seldom makes for good prediction) (Christ 1949b: 3).

Despite explicitly mentioning Friedman's proposal for structure selection, Christ chose to follow the procedure that is “common to most econometric studies” (3), that is, the procedure developed at Cowles at that time: the construction of a model consisting of G independent simultaneous equations in G endogenous variables (suggested by theory), and then estimating the model's parameters with the available statistics.13 This procedure was “preferable because of its comparative simplicity. . . . The equations of the model can be set up once for all on the basis of previous knowledge, theoretical or empirical, and then the parameters can be estimated by straightforward statistical processes” (Christ 1949b: 5). Although Christ admits that there is a disadvantage with this procedure, as noted by Friedman, in that the “structures one would choose as ‘best’ for different sets of observations might not all be within the same model,” this is put aside as not being serious because “usually the econometrician knows nearly enough what observations to expect so that he can construct a model which does contain the ‘best’ structure for each likely set of observations” (5). Friedman's criticism was that theoretical criteria or the econometricians’ knowledge may identify a different structure than the available statistics.14

Christ and Friedman, however, agreed that the ultimate test of an econometric model is “checking up on the predictions it leads to” (Christ 1949b: 18). Marshall (1949a) had estimated tolerance intervals within which the unexplained residual of each equation would fall with a specific high probability if that equation continued to hold and were subject to the same kind of random disturbances. But in order to be “completely happy with the model,” an additional test condition was needed: the errors of prediction “should be no larger, on the average over a number of years if not every year, than the errors made by some naive noneconomic hypothesis such as ‘next year's value of any variable is equal to this year's value plus a random disturbance’” (23). Christ noted that the name “naive model” given to this noneconomic hypothesis had been suggested by Marshall (23) but that the procedure itself was attributable to Friedman (24).15
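Operationally, the test Friedman proposed is a simple comparison of forecast errors. The series and forecasts below are invented for illustration only; they are not Klein's, Marshall's, or Christ's numbers. They merely show the form of the comparison: the model's errors should be no larger, on average, than those of the rule that next year's value equals this year's value.

```python
import numpy as np

def rmse(errors):
    return float(np.sqrt(np.mean(np.square(errors))))

# Invented annual observations of one endogenous variable and the forecasts
# produced for it by an econometric model (illustrative values only).
actual         = np.array([104.0, 108.0, 111.0, 109.0, 115.0, 121.0])
model_forecast = np.array([103.0, 109.5, 112.0, 112.0, 113.5, 118.0])

# Naive model: "next year's value of any variable is equal to this year's
# value plus a random disturbance," so the forecast for each year is simply
# the previous year's observed value.
previous_year  = np.array([101.0, 104.0, 108.0, 111.0, 109.0, 115.0])
naive_forecast = previous_year

model_rmse = rmse(actual - model_forecast)
naive_rmse = rmse(actual - naive_forecast)

print(f"model RMSE: {model_rmse:.2f}, naive-model RMSE: {naive_rmse:.2f}")
print("model passes the naive-model test" if model_rmse < naive_rmse
      else "model fails the naive-model test")
```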

The results from Christ's work were presented to a conference on business cycle research organized by the NBER in November 1949, where Friedman was invited as one of the commentators. Friedman's (1951) comment on Christ's paper focused mainly on Marshall and Christ's postmodel tests: “It is one of our chief defects that we place all too much emphasis on the derivation of hypotheses and all too little on testing their validity. This distortion of emphasis is frequently unavoidable, resulting from the absence of widely accepted and objective criteria for testing the validity of hypotheses in the social sciences” (Friedman 1951: 107). Friedman emphasized that the naive models should be considered as “standards of comparison,” the “‘natural’ alternative hypotheses—or ‘null’ hypotheses—against which to test the hypothesis that the econometric model makes good predictions” (109).

Friedman argued that the validity of the model's equations should not be determined by high correlation coefficients: “The fact that the equations fit the data from which they were derived is a test primarily of the skill and patience of the analyst; it is not a test of the validity of the equations for any broader body of data” (Friedman 1951: 108). Although this appears to be a repetition of his earlier criticism of Tinbergen's modeling method, there is one crucial difference between his views on testing before and after the war: statistical testing of models should not be done solely by assessing how well they perform on data that are not used to build them, but this performance should also be assessed by comparing it to a standard, such as a naive model.

4. Friedman's Statistical Methodology

Despite these initial constructive interactions with the Cowles people, especially with Christ, Friedman became increasingly critical of the Cowles Commission's methodology, culminating in the 1953 essay. Although it was published in 1953, J. Daniel Hammond (2009) was able to identify the earliest draft of the essay, with the title “Descriptive Validity vs. Analytical Relevance in Economic Theory,” which Friedman began writing in late 1947 or early 1948 and finished in the summer of 1948. It is this first version that is particularly important here because it was written in a period closer to Friedman's experiences with the research done at the SRG, probably just before he became engaged with Christ's work on testing econometric models (see n. 12).16

Friedman opens the first draft with the remark that theories are often judged by two criteria: the validity of their assumptions and the validity of their implications, but he argues that “the two tests, when examined critically, reduce to one, since the only relevant criterion for judging whether the assumptions are sufficiently good approximations for the purpose in hand is whether they lead to valid conclusions” (Friedman, quoted in Hammond 2009: 69). He distinguishes between “mere” validity of a theory and significance or importance, arguing that significant theories have assumptions that are “inaccurate representations of reality.” This is because a significant theory necessarily abstracts the crucial elements from the complex mass of circumstances surrounding the phenomenon under explanation.

To illustrate the analytical irrelevance of testing a hypothesis by its assumptions, he discusses the law of falling bodies in vacuum, mathematically expressed by the formula s = ½gt² relating the distance a body falls to the acceleration of gravity (g) and the time the body is falling (t). The relevant question about the validity of this formula is to know for which conditions outside the vacuum it “works.” “What are commonly regarded as the assumptions of a theory are in fact parts of the specification of conditions under which the theory is expected to work. And the only way to see which conditions are critical or at what point any of them become critical in the context of the others is to test the theory by its implications” (Hammond 2009: 70).

Hammond (2009: 70) notes that Friedman, in a letter to Stigler dated November 19, 1947, already used the formula of falling bodies to introduce Stigler to his idea that the only way to determine how closely assumptions correspond to reality is by checking the theory's prediction. It is in this letter that Friedman writes that he has gotten involved in discussions of “scientific methodology” and that he had been led “to go further than I had before in distinguishing between description and analysis” (Friedman, quoted in Hammond 2009: 69).

Another way to contextualize the development of Friedman's methodology is by exploring the statistical approaches Friedman was aware of during this period. According to David Teira (2007), Friedman was successively exposed to the statistical approaches of Ronald Fisher (via Harold Hotelling), Jerzy Neyman, and Savage.

In his Memoirs (Friedman and Friedman 1998: 44), Friedman recalls that, when he moved from Chicago to Columbia for his second year of graduate study, it was Hotelling who influenced him most during that year. Hotelling's classes were based on the techniques introduced by Fisher, “probably as they appeared in his Statistical Methods for Research Workers” (Teira 2007: 513). According to Teira, a Fisherian assessment of statistical prediction amounted to answering the question, “How do we justify our estimation of the value of a given parameter in a population by an analysis of a particular sample?” (514), and the most appropriate method of estimation was maximum likelihood, recommended and popularized by Fisher. For the assessment of the correlation coefficient, Fisher's “analysis of variance” was applied. Although, according to Teira, Friedman exploited the virtues of the analysis of variance throughout his career, when the issue of predictive accuracy was raised in the 1953 essay, this particular approach was “not defended, nor ever mentioned” (515).

In 1937, Friedman attended the lectures by Neyman on his theory of sampling and must have been familiar with Neyman's approach by the early 1940s (see Teira 2007: 518). Moreover, Wald's theory of sequential analysis is based on Neyman and Egon S. Pearson's theory of hypothesis testing, which Wald (1943) called the “current procedure of testing statistical hypotheses” and which was outlined in chapter 1 of his Theory: the testing of a “null hypothesis” H0 against an “alternative hypothesis” H1 with the two main testing criteria being the “error of the first kind,” rejecting H0 when it is true, and the “error of the second kind,” accepting H0 when H1 is true, usually denoted respectively by α and β (cf. the sequential test characteristics above).

Whether Savage had some influence on Friedman's statistical methodology is harder to see. It is more likely that, if one wishes to talk in terms of “influence,” it was the other way around (see Teira 2007: 521). Teira (2007: 522) concludes that “Friedman's commitment to Savage as to the foundations of statistics must have been only in principle.” Moreover, as Teira notes, Friedman “did not credit Savage for the inspiration provided either in his 1953 methodological paper or in any other publication he authored at that time” (522). I would like to add that the view on the role of statistics, that the aim is not to discover truth, which Friedman attributed to Savage (see the epigraph to this article), is very characteristic of the Neyman and Pearson approach of hypothesis testing, where hypotheses are only accepted or rejected but never “proved” to be true (see, e.g., Friedman 1953: 9).

According to Friedman (1953: 7), “The ultimate goal of a positive science is the development of a ‘theory’ or ‘hypothesis’ that yields valid and meaningful (i.e., not truistic) predictions about phenomena not yet observed.” The importance of this goal is that consensus on “correct” economic policy then can depend “on the progress of positive economics yielding conclusions that are, and deserve to be widely accepted” (6). But Teira (2007: 523) concludes that Friedman's Essay in this sense must be considered as “incomplete”: “economic predictions failed to accomplish the social task they were assigned in 1953” because predictive success was not defined with respect to a particular standard. Teira argues that there were no “standards on which to decide which is the correct way to make statistical predictions” (512), even though they existed in Fisher's, Neyman's, and Savage's approaches and Friedman was well aware of them.

Although standards were not explicitly mentioned in Friedman's essay, they played a crucial role in his statistical methodology. The standard of evaluating economic predictions was the “alternative hypothesis” of the Neyman and Pearson framework of testing: “The hypothesis is rejected if its predictions are contradicted (‘frequently’ or more often than predictions from an alternative hypothesis); it is accepted if its predictions are not contradicted” (Friedman 1953: 9).

Sequential analysis was based on this framework. It was also this framework of assessment that Friedman proposed to the Cowles researchers for assessing Klein's models: the naive model as “natural alternative hypothesis” against which to evaluate their predictive performance.

The probable reason why the Cowles researchers were so open to Friedman's test proposals is that, in the same period in which Wald was introducing the Neyman-Pearson framework to the SRG statisticians, Trygve Haavelmo (1944: iv) did the same for the Cowles econometricians:17 “The general principles of statistical inference introduced in this study are based on the Neyman-Pearson theory of testing statistical hypotheses.” “By introducing a few very general—and, in themselves, very simple—principles of testing statistical hypotheses and estimation, [Neyman and Pearson] opened up the way for a whole stream of high-quality work, which gradually is lifting statistical theory to a real scientific level” (Haavelmo 1944: 60).

According to Mary Morgan (1990: 152), the Neyman-Pearson testing procedures were not accepted into econometrics until the 1940s, because they depend on a probability approach. It was due to Haavelmo's “probabilistic revolution” that the Neyman-Pearson framework became the “blueprint for econometrics” (251).

5. Conclusion

In answering the question of how Friedman's experiences with statistical research for the military during the war years shaped his statistical methodology, one can detect two views that emerged from his SRG work. One is his view on theoretical assumptions—that they only specify and do not determine under which conditions a model or device is expected to work—and the other is his view on testing: that the working of a device should be tested in comparison with a standard. And both views were two sides of the same coin.

When designing a sampling inspection plan, one had to face two epistemological limitations. The first was that the data were not generated by an ideal lab experiment, and so the conditions under which the data were generated varied and were only partly known. The problem is that theoretical considerations give little indication of the way in which the performance of the device will be affected by changing conditions (the law of falling bodies tells us only what happens in a vacuum). According to Friedman, this was the reason that social scientists were hired to do the statistical research:

Surprisingly, social scientists turned out to be extremely useful in wartime operational research—indeed, typically more useful than natural scientists. The reason was simple. Social scientists were accustomed to working with lousy data, and wartime operational data certainly fitted that category. Natural scientists were accustomed to dealing with carefully controlled experimental data, and were often at a loss how to handle the kind of data turned out in wartime. (Friedman and Friedman 1998: 145)

To Friedman, “refined statistical techniques were of no great importance” for making progress in dealing with these kinds of problems (Friedman and Friedman 1998: 142). He relied on such refined techniques once, when he tried to develop the optimal composition for an alloy that could withstand the highest possible temperature. His model for this alloy was based on a regression that expressed time to fracture as a function of stress, temperature, and variables describing the composition of the alloy, with a high multiple correlation, a low standard error of estimate, and high t values for all of the coefficients, and it was consistent with metallurgical theory. But the alloys designed on the basis of this model performed worse than many already existing alloys. It appeared to Friedman that such a model is helpful only in the design of an alloy but does not guarantee that it will function as expected. Separate tests are needed.

This experience was “instructive” because it told Friedman that the value of theoretical assumptions, model-structure equations, or hypotheses was not that they provide accurate descriptions but that they specify the conditions under which the theory, model, or hypothesis is expected to work. And the only way to see whether they “identify” the critical conditions is to test them by their implications.

This problem came back in his discussions with Christ on model building. A model that is accurate with respect to a specific data set is not necessarily equally accurate for another data set. There is no guarantee that the Cowles approach would “identify” the real structure. Additional tests were needed to verify whether the model also works for another data set. Because any data set includes a “complex mass of circumstances” that are not crucial to the structure, one still needs to figure out under which specific conditions a model is expected to work in other circumstances. Its predictive performance is a way to verify this. An accurate description is not yet analytically relevant.

That a model should predict well was already widely recognized, including by the Cowles econometricians. What Friedman, however, added to this kind of model validation is a procedure for assessing the model's predictive performance: it should be done in comparison with a benchmark, a standard. From his earliest assignment (on the development of optimal antiaircraft shells), the SRG tasks had always been to test statistically whether a new design was an improvement with respect to a “standard” device. At the SRG, this kind of testing was theoretically grounded in a Neyman-Pearson framework in which any hypothesis to be tested was compared with an “alternative hypothesis.” Assessing the predictive performance of a model with a benchmark model makes irrelevant the uncontrollable, only partly known, complex mass of circumstances under which both models perform.

Friedman's war research experience brought him new ideas for an “alternative approach to analyzing economic data,” for there is “no magic formula for wringing reasonable conjectures from refractory and inaccurate evidence” (Friedman and Schwartz 1991: 39). One of the main lessons he drew from his war experiences was the need for a methodology for “uncontrolled experience”:

The necessity of relying on uncontrolled experience rather than on controlled experiment makes it difficult to produce dramatic and clear-cut evidence to justify the acceptance of tentative hypotheses. Reliance on uncontrolled experience does not affect the fundamental methodological principle that a hypothesis can be tested only by the conformity of its implications or predictions with observable phenomena; but it does render the task of testing hypotheses more difficult and gives greater scope for confusion about the methodological principles involved. (Friedman 1953: 40)

The methodological principle he learned from his research at the SRG was comparison with a benchmark.

Notes

1.

See, however, Klein 2000 and Mirowski 2002 for other relevant historical accounts on Friedman at the SRG.

2.

To give an idea about the SRG’s work, Wallis (1980: 323–24) provides a list of 13 of the 572 substantive reports, memoranda, and letters that were produced at the SRG, that is, the first, and every fiftieth thereafter, and the last. This list mentions a classified report by Friedman, dated August 28, 1945, titled “Relative Effectiveness of Caliber .50, Caliber .60, and 20 mm Guns as Armament for Multiple Anti-Aircraft Machine Gun Turrets.”

3.

See Klein 2016 for an excellent history of how the development of these kinds of devices, via exponential smoothing models, diffused into Friedman's work on adaptive expectations.

4.

Both are reproduced in Wallis 1980. Comparing the memorandum to Wald with Wald's own account of the role of Wallis and Friedman, it strongly appears that Wald used this memorandum for his own historical account.

5.

The term “double dichotomy” was used to indicate that the data were generated by two different processes, for example, shooting with an experimental gun and shooting with a standard one, each of which had two different outcomes: success and failure.

6.

SRG 1945 (SRG Report 255 / AMP Report 30.2R, published on September 15, 1945) was edited by Harold A. Freeman, M. A. Girshick, and Wallis, with the cooperation of Kenneth J. Arnold, Friedman, and Edward Paulson, and appeared as a ring binder of eight booklets (seven sections and one containing two appendixes). It is a revision and expansion of SRG 1944 (SRG Report 255 / OSRD Report 3926), a ring binder of seven booklets, edited by Freeman and published on July 15, 1944. SRG 1945 was expanded with sec. 7, “Sequential Analysis When Quality Is Measured by the Number of Defects per Unit and When the Question Is Whether a Standard Is Exceeded (Poisson Distribution).” All other sections, except sec. 5, were revised.

7.

This 1991 “cautionary tale” is almost reproduced verbatim in Friedman and Friedman 1998: 143–44.

8.

Because Friedman often used these terms as synonyms, or at least did not distinguish them clearly, I follow this usage.

9.

See Boumans 2016 for a more detailed history of these interactions.

11.

At least he attended the discussion meeting on July 10, 1947, about Herman Rubin’s discussion paper titled “Systems of Linear Stochastic Equation” (Statistics, no. 301). Minutes of Cowles Commission Discussion Meeting, no. 104.

12.

Discussion Paper Econ 269 (Christ 1949b) is dated October 7, 1949. But the interaction with Friedman related to the revision of Klein’s model must have started at the latest early January of that year, because Christ (1949a: 14) acknowledges Friedman for providing him with a useful definition of equilibrium in Discussion Paper Econ 241, which is dated January 10. But there is a good reason that their discussions about modeling must have started earlier, in late 1948; namely, Christ (1949b: 3) notes that the “idealized procedure was first suggested to me about a year ago by Milton Friedman.”

13.

This latter procedure was presented in a paper written by Koopmans, Rubin, and Roy B. Leipnik, “Measuring the Equation System of Dynamic Economics.” Although it was not published until 1950, the procedure had already been presented at a Cowles Commission conference in 1945 on statistical inference in economics.

14.

Friedman gave a more detailed discussion of this “identification problem” in a long footnote (11) of his 1953 essay.

15.

The suggestion that Friedman was the originator of the naive-model test and that Marshall was only its name giver was later confirmed by Christ (1951: 57), by Koopmans (1951: 4; 1956: 1; 1957: 203), and in Cowles Commission 1950.

16.

Hammond 2009 sketches the evolution of the 1953 essay through two early drafts into the final published version; he places the second draft in the fall of 1952.

17.

Haavelmo’s paper had been circulated among econometricians in unpublished form in 1941 and became the basis of a new research program initiated at Cowles by Marschak in 1943 (Morgan 1990: 251).

References

Berger, James O. 2018. “Sequential Analysis.” In The New Palgrave Dictionary of Economics, edited by M. Vernengo, E. Perez Caldentey, and B. J. Rosser, 12196–98. London: Palgrave Macmillan.
Boumans, Marcel. 2016. “Friedman and the Cowles Commission.” In Milton Friedman: Contributions to Economics and Public Policy, edited by R. A. Cord and J. D. Hammond, 585–604. Oxford: Oxford University Press.
Christ, Carl F. 1949a. “Further Comments on L. R. Klein's Economic Fluctuations in the United States 1921–1941 (Second Draft).” Discussion Paper Econ 241, Cowles Commission.
Christ, C. F. 1949b. “A Revised Klein Econometric Model for the United States, 1921–1947.” Discussion Paper Econ 269, Cowles Commission.
Christ, C. F. 1951. “A Test of an Econometric Model for the United States, 1921–1947.” In Conference on Business Cycles, 35–107. New York: NBER.
Cowles Commission. 1948. Cowles Commission for Research in Economics: Report for 1947. University of Chicago.
Cowles Commission. 1950. Cowles Commission for Research in Economics: Report for Period 1949–1950. University of Chicago.
Cowles Commission. 1953. Economic Theory and Measurement: A Twenty-Year Research Report, 1932–1952. Chicago: University of Chicago Press.
Cowles Commission. 1954. Cowles Commission for Research in Economics: Report for Period July 1, 1952–June 30, 1954. University of Chicago.
Friedman, Milton. 1940. Review of Business Cycles in the United States of America, 1919–1932, by J. Tinbergen. American Economic Review 30, no. 3: 657–60.
Friedman, M. 1947a. “Planning an Experiment for Estimating the Mean and Standard Deviation of a Normal Distribution from Observations on the Cumulative Distribution.” In SRG 1947: 339–52.
Friedman, M. 1947b. “Utilization of Limited Experimental Facilities When the Cost of Each Measurement Depends on Its Magnitude.” In SRG 1947: 319–28.
Friedman, M. 1951. “Comment.” In Conference on Business Cycles, 107–14. New York: NBER.
Friedman, M. 1953. “The Methodology of Positive Economics.” In Essays in Positive Economics, 3–43. Chicago: University of Chicago Press.
Friedman, M. 1991. “Appendix: A Cautionary Tale about Multiple Regression.” Addendum to Friedman and Schwartz 1991. American Economic Review 81, no. 1: 48–49.
Friedman, M. 2009. “Final Word.” In The Methodology of Positive Economics, edited by U. Mäki, 355. Cambridge: Cambridge University Press.
Friedman, Milton, and Rose D. Friedman. 1998. Two Lucky People: Memoirs. Chicago: University of Chicago Press.
Friedman, Milton, and L. J. Savage. 1947. “Planning Experiments Seeking Maxima.” In SRG 1947: 363–72.
Friedman, M., and L. J. Savage. 1948. “The Utility Analysis of Choices Involving Risk.” Journal of Political Economy 56, no. 4: 279–304.
Friedman, M., and A. J. Schwartz. 1991. “Alternative Approaches to Analyzing Economic Data.” American Economic Review 81, no. 1: 39–49.
Haavelmo, Trygve. 1944. “The Probability Approach in Econometrics.” Econometrica 12 (supplement): iii–vi, 1–115.
Hammond, J. D. 2009. “Early Drafts of Friedman's Methodology Essay.” In The Methodology of Positive Economics, edited by U. Mäki, 68–89. Cambridge: Cambridge University Press.
Klein, Judy L. 2000. “Economics for a Client: The Case of Statistical Quality Control and Sequential Analysis.” In Toward a History of Applied Economics, edited by Roger E. Backhouse and Jeff Biddle. History of Political Economy 32 (supplement): 25–70.
Klein, J. L. 2016. “Disturbed Gun Sights and Exponential Smoothing, 1940–1968. Part A of Protocols of War and the Mathematical Invasion of Policy Space, 1940–1947.” Unpublished manuscript.
Koopmans, Tjalling C. 1951. “Comments on Macro-Economic Model Construction.” Discussion Paper Econ 2008, Cowles Commission.
Koopmans, T. C. 1956. “The Klein-Goldberger Forecasts for 1951, 1952 and 1954, Compared with Naive-Model Forecasts.” Discussion Paper 12, Cowles Foundation.
Koopmans, T. C. 1957. Three Essays on the State of Economic Science. New York: McGraw-Hill.
Koopmans, Tjalling C., Herman Rubin, and Roy B. Leipnik. 1950. “Measuring the Equation System of Dynamic Economics.” In Statistical Inference in Dynamic Economic Models, edited by T. C. Koopmans, 53–237. Cowles Commission Monograph 10. New York: Wiley.
Marshall, Andrew W. 1949a. “A Note on the Use of Tolerance Intervals as Test Regions.” Discussion Paper Stat 335, Cowles Commission.
Marshall, A. W. 1949b. “A Test of Klein's Model III for Changes of Structure.” Unpublished thesis, University of Chicago.
Mirowski, Philip. 2002. Machine Dreams: Economics Becomes a Cyborg Science. Cambridge: Cambridge University Press.
Morgan, Mary S. 1990. The History of Econometric Ideas. Cambridge: Cambridge University Press.
SRG (Statistical Research Group). 1944. Sequential Analysis of Statistical Data: Applications. Edited by H. A. Freeman. New York: Statistical Research Group, Columbia University.
SRG (Statistical Research Group). 1945. Sequential Analysis of Statistical Data: Applications. Edited by H. A. Freeman, M. A. Girshik, and W. A. Wallis. New York: Columbia University Press.
SRG (Statistical Research Group). 1947. Selected Techniques of Statistical Analysis for Scientific and Industrial Research and Production and Management Engineering. Edited by C. Eisenhart, M. W. Hastay, and W. A. Wallis. New York: McGraw-Hill.
SRG (Statistical Research Group). 1948. Sampling Inspection: Principles, Procedures, and Tables for Single, Double, and Sequential Sampling in Acceptance Inspection and Quality Control Based on Percent Defective. Edited by H. A. Freeman, M. Friedman, F. Mosteller, and W. A. Wallis. New York: McGraw-Hill.
Teira, David. 2007. “Milton Friedman, the Statistical Methodologist.” History of Political Economy 39, no. 3: 511–27.
Wald, Abraham. 1943. Sequential Analysis of Statistical Data: Theory. New York: Statistical Research Group, Columbia University.
Wald, A. 1944. “A General Method of Deriving the Operating Characteristics of Any Sequential Probability Ratio Test.” Unpublished SRG memorandum.
Wald, A. 1945. “Sequential Test of Statistical Hypotheses.” Annals of Mathematical Statistics 16: 117–86.
Wallis, W. Allen. 1980. “The Statistical Research Group, 1942–1945.” Journal of the American Statistical Association 75, no. 370: 320–30.