Historically, correlation was not interpreted by its inventor, Sir Francis Galton (not Karl Pearson, as many people assume), in the way we interpret it today, as a measure of the linear relation between two characteristics.
Like his cousin Charles Darwin, Galton was fascinated by genetics and heredity, and this fascination led him to invent the modern notions of correlation and regression. He was trying to measure the influence of the parent generation on the offspring generation for various characteristics. He approached the problem by examining self-fertilized sweet peas (self-fertilization minimizes the effect of multiple parental sources). He plotted the size of the parent peas on the X-axis and the size of the offspring peas on the Y-axis and found that extremely large or small mother peas produced less extreme daughter peas. In other words, the average size of the offspring of mothers of a given size tended to move, or "regress", toward the average size in the population as a whole.
He tried to obtain a regression coefficient by fitting a line through the median sizes of the offspring peas for each given size of mother pea.
Although he fitted the line freehand, an important concept emerged from this work: the relation between the variability of each characteristic (the sizes of mother and daughter peas) and the dependency between the characteristics (the slope of the line, that is, the change in daughter-pea size with a change in mother-pea size). He found that if the degree of association between two variables (his heredity constant, today's correlation) was held constant, then the slope of the regression line could be described if the variability of the two measures was known. At that time Galton believed he had estimated a single heredity constant that was generalizable to many or most inherited characteristics (see …). In his view, although there is a single heredity constant, the slope differs between properties of the pea (such as size and color) because the variability of mother and daughter peas differs from one property to another.
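In modern notation, the relation Galton noticed between slope, association, and variability is usually written as follows. The symbols are assumptions introduced here for illustration and are not Galton's own: b is the slope of the regression of offspring on parent, r is the correlation, and s_x and s_y are the standard deviations of the parent and offspring measurements.

```latex
b = r \,\frac{s_y}{s_x}
% With the same r, two traits can have different slopes:
%   r = 0.5,\; s_y = s_x   \;\Rightarrow\; b = 0.5
%   r = 0.5,\; s_y = 2 s_x \;\Rightarrow\; b = 1.0
```

Read this way, a single fixed r is compatible with a different slope for each characteristic, provided the ratio of the two variabilities differs.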
In 1896, Pearson published his first rigorous treatment of correlation and regression in the Philosophical Transactions of the Royal Society of London. Pearson credited Bravais (1846) with ascertaining the initial mathematical formulae for correlation. Pearson noted that Bravais happened upon the product-moment (that is, the "moment" or mean of a set of products) method for calculating the correlation coefficient but failed to prove that this provided the best fit to the data. Using an advanced statistical proof (involving a Taylor expansion), Pearson demonstrated that optimum values of both the regression slope and the correlation coefficient could be calculated from the product-moment, $\sum xy / n$, where x and y are deviations of observed values from their respective means and n is the number of pairs.
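The product-moment calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Pearson's own procedure: the function name `product_moment_stats` and the sample numbers are made up for the example, and the standard deviations are taken with divisor n to match the product-moment.

```python
from math import sqrt

def product_moment_stats(xs, ys):
    """Return (product-moment, correlation, regression slope of y on x).

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # x and y as deviations of the observed values from their means
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    # Product-moment: the mean of the products of paired deviations
    pm = sum(a * b for a, b in zip(dx, dy)) / n
    # Standard deviations with divisor n, consistent with the product-moment
    sd_x = sqrt(sum(a * a for a in dx) / n)
    sd_y = sqrt(sum(b * b for b in dy) / n)
    r = pm / (sd_x * sd_y)      # correlation coefficient
    slope = pm / (sd_x ** 2)    # regression slope of y on x
    return pm, r, slope

# Made-up illustrative data, not Galton's measurements
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
pm, r, slope = product_moment_stats(xs, ys)
print(pm, r, slope)
```

Both the correlation and the slope come from the same product-moment; they differ only in what it is divided by.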
Soon after he had collected and analyzed his sweet pea data, Galton realized that the generations prior to the immediate parents could also influence individual characteristics (Pearson, 1930). He even noticed that certain characteristics occasionally skipped one or more generations; a man may appear more similar to his grandfather than to his father in certain respects. In an 1898 paper in the journal Nature (cited in Pearson (1930)), Galton published a clever diagram that partitioned a unit square into successively smaller squares, where each square represented the ever-diminishing influence of previous generations of ancestors on the present individual. Galton's conceptualization of the multiple influences of progenitors on the characteristics of the present-day individual was entirely parallel to the modern conception of multiple regression.
Bravais, A. (1846), "Analyse Mathematique sur les Probabilites des Erreurs de Situation d'un Point," Memoires par divers Savans, 9, 255-332.
Pearson, K. (1930), The Life, Letters and Labors of Francis Galton, Cambridge University Press.