Voices in Statistics

The motto of the Statistics Group is to expand knowledge through co-operation in Statistics. Voices in Statistics tries to summarize the discussions among the three parts of the knowledge base: experts, novices and users (professionals).

Summaries of the discussions will be compiled in book form. The book may reuse content from users' blogs. Currently eight chapters -

  1. Basic Statistical Concept,
  2. Tools,
  3. Data Processing,
  4. Data Center,
  5. Discussion,
  6. Readers' Corner,
  7. Your Project, and
  8. Useful Sites

are planned. More chapters may be added depending on the needs felt by the members of the group.

Users may add different sections to this book. The place of a section will be decided according to the taxonomy of the book and its topics. Authors may revise the content of their sections in response to feedback given through comments and discussions in the relevant forums. Revised content will be displayed with proper highlighting and with appropriate references to the original source.

The purpose of the book is to present content organized by consensus in book form, so that it may be printed if required. It will help readers get a summarized opinion without going into the details of discussions and comments.

Basic Concepts in Statistics

This section will cover basic statistical concepts, such as the concept of variation, the central limit theorem and hypothesis testing, which are essential for using statistical tools. Proposed topics for the book may be seen in the taxonomy of topics.

Statistical Thinking

The term "Statistical Thinking" entered discussions along with the growth of computational power. Earlier, it was commonly assumed that the use of statistics was limited mainly by the available computational power.

This assumption turned out to be a myth once far greater computational capability became available. It became clear that a lack of statistical thinking is the main obstacle to using statistics to its full potential. In fact, it was realized that statistical thinking provides a sound philosophical framework for using statistical methods in the correct perspective.

History

Although the term Statistical Thinking seems to relate to whatever we do in the name of Statistics, it has a specific meaning that gained attention in the 1990s. This newer meaning was discussed by Snee (1990) and Moore (1990). This way of thinking was further promoted by the Statistics Division of the American Society for Quality (ASQ) in 1994, when it set a goal to “enable broad application of statistical thinking” (see Torback, 2001).

Although the origin of the concept of Statistical Thinking has been attributed to the field of quality control (which is based on interconnected processes), it is now used in many areas, such as medicine and market research. The field of statistics teaching in particular has given this concept a prominent place.

Concept

The key points of Statistical Thinking are:

  1. All work occurs in a system of interconnected processes.
  2. Variation exists in all processes.
  3. Process output varies because of special variation (due to special causes) and random variation (due to random causes). Understanding and reducing variation is the key to success.
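Point 3 can be made concrete with a control-chart style check. The sketch below is a simplified Shewhart-type rule, assuming (for illustration only) that limits are set at the baseline mean plus or minus three sample standard deviations; practical control charts usually estimate sigma from subgroup or moving ranges instead. Observations outside the limits are flagged as candidates for special-cause variation, while those inside are treated as random variation. The fill-weight data are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower, centre, upper) limits at mean +/- 3 sample standard deviations."""
    centre = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the in-control baseline
    return centre - 3 * sigma, centre, centre + 3 * sigma


def special_cause_points(values, limits):
    """Return indices of observations outside the limits (possible special causes)."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical fill weights (grams) recorded while the process ran normally
baseline = [500.2, 499.8, 500.1, 499.9, 500.0, 500.3, 499.7, 500.1, 499.9, 500.0]
limits = control_limits(baseline)

new_batch = [500.1, 499.9, 502.5, 500.0]
print(special_cause_points(new_batch, limits))  # [2]: only 502.5 falls outside
```

Points inside the limits are not "explained away"; they are simply treated as the ordinary random variation of the process.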

These concepts come from the context of quality control. To give Statistical Thinking a broader meaning, so that the concept may be generalized, it has been reshaped using the following ideas (see Wild and Pfannkuch):

  1. Statistics is the science of variation.
  2. Statistical tools work between the context area and the data.
  3. The ultimate aim of using a statistical tool is to enrich knowledge of the context area.

On this site, the concept of Statistical Thinking is used in a broader sense while keeping its original spirit intact. For our purposes (using statistical thinking in areas other than engineering), the concept of Statistical Thinking is based on the following components:

  1. Study of variation
  2. Data
  3. Enriching context area
  4. Simulation

The diagram represents how "Statistical Thinking" may be perceived to accommodate current needs.

Example

Whenever we compare two sets of data, such as the incomes of people in two different geographical regions, the emphasis is generally on comparing the centers of the two data sets without considering the variation in the data. In such situations, misleading results may be obtained. Through examples and diagrams, Chris Wild and others have shown (in the later part of their article) how one may get a misleading view of reality and how variation should be incorporated when comparing two groups.
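A small numerical sketch shows why comparing centers alone can mislead. In both pairs of hypothetical income lists below, region B's mean is 2 higher than region A's, yet the chance that a randomly chosen person from B earns more than one from A depends strongly on the spread. This pairwise comparison is only one simple way of bringing variation into a two-group comparison; it is not the method used in the Wild et al. article.

```python
from itertools import product


def prob_b_exceeds_a(a, b):
    """Share of all (a, b) pairs in which the value from group B is larger."""
    pairs = list(product(a, b))
    return sum(1 for x, y in pairs if y > x) / len(pairs)


# Hypothetical monthly incomes (thousand rupees); in both cases B's mean is 2 higher than A's
low_spread_a, low_spread_b = [9, 10, 10, 11], [11, 12, 12, 13]
high_spread_a, high_spread_b = [2, 6, 14, 18], [4, 8, 16, 20]

for name, a, b in [("low spread", low_spread_a, low_spread_b),
                   ("high spread", high_spread_a, high_spread_b)]:
    print(name, prob_b_exceeds_a(a, b))
# low spread 0.9375
# high spread 0.625
```

The same difference in centers says "B is almost always richer" in one case and "B is richer only a little more often than not" in the other.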

Teaching Tips

Courtesy of rktyagi

It is harder to get a feel for variation than for the center of data. Even for the center, the median is harder to visualize than the average. To give students a better feel for data, show them different scatter plots and ask them to classify the groups in terms of "less" or "more" variation. The teacher can then judge students' sensitivity to variation for the different types of tools used for Measuring Variation (see ..)
Later, plot mixed data from two or three groups as a scatter plot and ask students how many groups there might be.
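For the exercise of guessing the number of groups, a teacher needs mixed data whose true grouping is known. The sketch below generates such data with Python's standard random module; the group centres, group size and spread are arbitrary illustrative choices, and the group label is kept only for the teacher's answer key. Any plotting tool can then draw the (x, y) points as a scatter plot.

```python
import random
from collections import Counter


def make_mixture(centres, n_per_group=20, spread=1.0, seed=1):
    """Return shuffled (x, y, group) points scattered normally around each centre."""
    rng = random.Random(seed)  # fixed seed so the same exercise can be reproduced
    points = [
        (rng.gauss(cx, spread), rng.gauss(cy, spread), group)
        for group, (cx, cy) in enumerate(centres)
        for _ in range(n_per_group)
    ]
    rng.shuffle(points)  # hide any ordering that would reveal the groups
    return points


# Three hypothetical groups; a larger `spread` makes the groups overlap more
points = make_mixture([(2, 2), (6, 3), (4, 7)])
student_view = [(x, y) for x, y, _ in points]            # what the class sees
answer_key = Counter(group for _, _, group in points)    # points per true group
print(len(student_view), sorted(answer_key.items()))     # 60 [(0, 20), (1, 20), (2, 20)]
```

Raising `spread` step by step is one way to show students how growing variation within groups blurs the boundaries between them.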
Five types of thinking are considered fundamental elements of Statistical Thinking: recognition of the need for data, transnumeration, consideration of variation, reasoning with statistical models, and integrating the statistical with the contextual.
Are there particular ways of teaching that can elicit such thinking? How does the teacher draw students' attention to noticing and attending to this thinking? How is such a habit of thinking communicated in a curriculum document?
Some tips for teaching Statistical Thinking may be obtained from a note by Maxine Pfannkuch.
Pedagogical work on Statistical Thinking has not matured yet, but a framework based on three core issues may be considered: (1) Teachers and researchers need to reach a common understanding of what they mean by the term statistical thinking, so that they can communicate. (2) Teachers need to reflect critically on their current teaching and identify areas that act as barriers to the development of their students' statistical thinking. (3) The constraints imposed externally on teaching need to be recognized and acknowledged. In this regard, the case study by Maxine Pfannkuch and Chris Wild may help.

Using Tips

It is difficult to measure variation in categorical data (see Data Type). Tools that measure the variation of ratio-scale or interval-scale data (see Data Type) cannot be applied to categorical data. Instead, various types of diversity measures (see Measurement of Variation), such as the entropy measure, may be used for this purpose.
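As an illustration of an entropy measure, the sketch below computes the Shannon entropy (natural logarithm, so the unit is nats) of a list of category labels. It is 0 when every observation falls in one category and reaches its maximum, log(k), when all k categories are equally frequent. The travel-mode data are hypothetical.

```python
from collections import Counter
from math import log


def shannon_entropy(labels):
    """Shannon entropy (in nats) of the category frequencies in `labels`."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


# Hypothetical mode of travel to work for eight people in two localities
locality_a = ["bus"] * 7 + ["car"]
locality_b = ["bus", "bus", "car", "car", "walk", "walk", "cycle", "cycle"]

print(round(shannon_entropy(locality_a), 3))  # 0.377: little variation, most take the bus
print(round(shannon_entropy(locality_b), 3))  # 1.386: log(4), the maximum for four modes
```

Dividing by log(k) gives a 0-to-1 scale, which can help when comparing variables with different numbers of categories.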

References

Snee, R. (1990). Statistical Thinking and its Contribution to Quality. The American Statistician, 44(2), 116-121.
Moore, D. (1990). Uncertainty. In L. Steen (Ed.), On the shoulders of giants: new approaches to numeracy (pp. 95-137). Washington, D.C.: National Academy Press.
Torback, L.D. (2001). Statistical Thinking. Pharmaceutical Technology. (Link: http://pharmtech.findpharma.com/pharmtech/data/articlestandard//pharmtech/252002/22855/article.pdf)
Wild, C. and Pfannkuch, M. What Is Statistical Thinking? (Link: http://icots6.haifa.ac.il/download_documents/word_documents/icots6_sample_paper_1.doc)

Other Useful Links

New approach to teaching: This note gives a picture of how a course based on Statistical Thinking may differ from the older approach.
Using Statistical Thinking in Plant: These 20 slides outline statistical thinking applied to different areas of a plant (managerial, sales, strategic etc.). It is worth noting that such processes were earlier used only in the production system. How the outline presented here can be used in other types of statistical work is a subject for discussion.
Statistical Thinking in Technological Environment: This is a chapter in the book Research on the Role of Teaching and Learning of Statistics. This edited book (proceedings of the 1996 IASE round table conference) gives guidelines for new directions in statistics at different levels of education.

Causality

Most of the time, when dealing with uncertainty, we want to understand the causality associated with some pattern. Statistical tools are used both to measure causal relationships and to discover them. Understanding causality from a statistical point of view is one of the biggest challenges, particularly in the social sciences, where experiments are often impossible and observational studies are the norm. Yet introductory statistics deals with causality mainly through correlation and the coefficient of determination, and most of the time we simply repeat the phrase “correlation is not causality." This denial makes interpreting the results of statistical tools more complex, because causal reasoning is a deep-rooted way in which we interpret uncertainty. To better serve the interests of users of statistics, we must give causality more emphasis in teaching statistics. Many things about causality can be taught that are not discipline specific. Students should be taught how to detect the causal connotations of words and phrases. They must be taught to be proactive in seeking alternative explanations for differences, ratios and correlations in observational studies. They must also be taught the causal differences between description, prediction and explanation. Statistics should be expanded to include causality in ways that are discipline independent and professionally appropriate. For an overview of causality, see http://en.wikipedia.org/wiki/Causality

Tools

As mentioned on the home page of the Statistics Group, discussion of statistical tools is the central activity of the group. Statistical tools are the interface between data and conceptual images of the context area (which may take the form of hypotheses or relationships in mathematical form). In this way, statistical tools are extensions of statistical methods across different context areas (like economics, sociology etc.), different software platforms (like SPSS, Stata etc.) and different setups (classical and Bayesian). Thus, the group will try to look at different statistical methods on a larger canvas so that they may be more useful in the current scenario. Topics proposed as statistical tools may be seen in the taxonomy of topics. How a topic may be composed (its subsections) may be seen in the taxonomy of the book.

Correlation and Regression

Correlation and Regression are among the most commonly used terms for applied statisticians. Their popularity is due to their different flavors, from line fitting to statistical modelling. They can be used both as mathematical tools and as statistical tools. They come with different levels (sets) of assumptions, from weaker to stronger. With stronger assumptions one can get more powerful results, but such assumptions are difficult to justify in real field situations. So it is very important to judge which version of the tool suits the field situation.

Motivation

The relation between two or more characteristics is one of the fundamental questions in the development of human thought. Such relationships have been studied in different paradigms, such as causal systems, control systems and knowledge systems. These paradigms rest on different types of belief: that something is the effect of some causal factors (event-based causal system), that something may be controlled by some control factors (control system), or that something may be explained by some explanatory factors (knowledge system). For example, one may want to know:

  1. What is the relationship between education and income? For each year of education, how much does income increase (on average)?

  2. What will be the rate of return on investment? For each dollar invested, how much will sales increase?

  3. For a political candidate, how many votes will he get for each unit of money he spends on advertising?

  4. With what confidence can height data be used to decide shoe size? In other words, how much of the variation in shoe size is explained by height?

  5. With how much confidence can we predict the weather on the basis of the barometer reading?

  6. On the basis of the data, do we have sufficient reason to consider parental education level as a cause of the child's maximum level of education?

  7. Is the Marginal Propensity to Consume (MPC) less than 1, as assumed by Keynes?

In statistics there are many tools for answering such questions from data on the relationship between characteristics (as in the examples above). Regression is one of them. It works within the boundaries set by its assumptions and is based on the idea of a dependent characteristic (effect) and independent characteristics (causes). Although the main answers from regression are a measure of the degree of closeness between cause and effect and the change in effect for a unit change in cause, it may also be used for prediction, for validating causal factors, and for substituting costly or unavailable information with a set of other information. In fact, dependent and independent characteristics have different names in different frameworks (not only `cause’ and `effect’); see Gujarati (2004), p. 50. Although the basic statistical tool is the same, different frameworks may yield different types of answers (Schield 1995).
Correlation is a close associate of Regression and is even more widely used. Regression is a technique, while Correlation is a measurement of the degree of relationship between two variables (which may be generalized to more variables). Generally speaking, correlation is a common noun synonymous with ‘association’. In this non-technical sense, correlation is necessary for causality. But in statistics, Correlation signifies a proper noun: the Pearson linear product-moment Correlation. In this technical sense, Correlation is not necessary for causality. The two concepts, Correlation and Regression, are so intermingled that it is difficult to understand one without the other.
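The technical sense of Correlation can be seen in a few lines of code. The sketch below computes the Pearson product-moment coefficient directly from deviations about the means. For an exact straight-line relation it gives 1, but for y = x² on values symmetric about zero it gives 0, even though y is completely determined by x: Pearson's r measures only linear association. The data are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


xs = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # 1.0: perfect linear relation
print(pearson_r(xs, [x * x for x in xs]))      # 0.0: perfect, but non-linear, dependence
```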

Prerequisites

Data Types, Scatter Plot, Straight Line in Cartesian Plane, Normal Distribution, Expectation, Median, Variance, Parameters, Causality

History

Historically, Correlation was not interpreted by its inventor, Sir Francis Galton (not Karl Pearson, as many people assume), in the same manner as we do today (as a measure of the linear relation between two characteristics).
Like his cousin Charles Darwin, Galton was fascinated by genetics and heredity, and this fascination led him to invent the modern notion of Correlation and Regression. He was trying to measure the impact of the parent generation on the child generation for various characteristics. He approached this problem by examining self-fertilized sweet peas (to minimize the impact of multiple parental sources). He plotted the size of the parent sweet pea on the X-axis and the size of the offspring pea on the Y-axis, and found that extremely large or small mother peas generated less extreme daughter peas. In other words, the average size of offspring born of a mother of a given size tended to move, or “Regress”, toward the average size in the population as a whole.
He tried to obtain the regression coefficient by fitting a line through the median size of the offspring peas for each given size of mother pea.
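Galton's procedure can be imitated on simulated data. The sketch below assumes a hypothetical model in which standardized mother and daughter sizes are bivariate normal with correlation 0.5 (an illustrative value, not Galton's estimate). Mothers are grouped into size classes, the median daughter size is taken in each class, and a line is fitted through the class medians by least squares rather than freehand. The fitted slope comes out well below 1, which is the regression toward the population average that Galton observed.

```python
import random
from statistics import median


def simulate_pairs(n, rho, seed=0):
    """Standardized (mother, daughter) sizes with correlation rho (hypothetical model)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        mother = rng.gauss(0, 1)
        daughter = rho * mother + (1 - rho**2) ** 0.5 * rng.gauss(0, 1)
        pairs.append((mother, daughter))
    return pairs


def slope_through_class_medians(pairs):
    """Group mothers into size classes, take medians per class, fit a least-squares line."""
    classes = {}
    for mother, daughter in pairs:
        size_class = round(mother)        # classes centred on -2, -1, 0, 1, 2, ...
        if -2 <= size_class <= 2:         # drop the sparse extreme classes
            classes.setdefault(size_class, []).append((mother, daughter))
    xs = [median(m for m, _ in members) for members in classes.values()]
    ys = [median(d for _, d in members) for members in classes.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


pairs = simulate_pairs(20_000, rho=0.5)
print(round(slope_through_class_medians(pairs), 2))  # close to 0.5, i.e. well below 1
```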
Although he used a freehand line-fitting technique, an important concept emerged from his work: the interrelation between the variability of the characteristics (the sizes of mother and daughter peas) and the dependency between them (the slope of the line, i.e. the change in daughter pea size with a change in mother pea size). He found that if the degree of association between two variables (the heredity constant, today's Correlation) was held constant, then the slope of the regression line could be described if the variability of the two measures was known. At that time Galton believed he had estimated a single heredity constant that was generalizable to many or most inherited characteristics (see …). In his opinion, although there is a single heredity constant, the different slopes for different properties of the pea (like size and color) are due to the different variability in mother and daughter peas.
In 1896, Pearson published his first rigorous treatment of correlation and regression in the Philosophical Transactions of the Royal Society of London. Pearson credited Bravais (1846) with ascertaining the initial mathematical formulae for correlation. Pearson noted that Bravais happened upon the product-moment method (that is, the "moment" or mean of a set of products) for calculating the correlation coefficient but failed to prove that this provided the best fit to the data. Using an advanced statistical proof (involving a Taylor expansion), Pearson demonstrated that optimum values of both the regression slope and the correlation coefficient could be calculated from the product-moment, $\frac{1}{n}\sum xy$, where x and y are deviations of observed values from their respective means and n is the number of pairs.
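In modern notation, with x and y as deviations from the means and n the number of pairs, the product-moment leads to both the correlation coefficient and the regression slope of y on x. This is standard textbook algebra, not a reproduction of Pearson's proof; it also makes Galton's observation explicit, since for a fixed degree of association r the slope is determined by the variabilities of the two measures:

```latex
\sigma_x^2 = \frac{1}{n}\sum x^2, \qquad \sigma_y^2 = \frac{1}{n}\sum y^2

r = \frac{\frac{1}{n}\sum xy}{\sigma_x \sigma_y}

b_{yx} = r\,\frac{\sigma_y}{\sigma_x}
       = \frac{\frac{1}{n}\sum xy}{\sigma_x^2}
       = \frac{\sum xy}{\sum x^2}
```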
Galton realized soon after he had collected and analyzed his sweet pea data that generations before the immediate parents could also influence individual characteristics (Pearson, 1930). He even noticed that certain characteristics occasionally skipped one or more generations; a man may appear more similar to his grandfather than to his father in certain respects. In an 1898 paper in the journal Nature (cited in Pearson (1930)), Galton published a clever diagram that partitioned a unit square into successively smaller squares, each representing the ever-diminishing influence of earlier generations of ancestors on the present individual. Galton's conception of the multiple influences of progenitors on the characteristics of the present-day individual was entirely parallel to the modern conception of multiple regression.
Bravais, A. (1846). Analyse Mathématique sur les Probabilités des Erreurs de Situation d'un Point. Mémoires par divers Savans, 9, 255-332.
Pearson, K. (1930). The Life, Letters and Labors of Francis Galton. Cambridge University Press.

Analogy

The simplest form of regression can be understood as a relation between two types of characteristics (of the same entity, like a person or household) that may be represented as continuous data (variables; see Data Type). Examples are the relation between a person's earnings and schooling, or between score and hours of labour for individuals. The general technique for studying such a relationship is to draw a graph on the XY plane. The data shown on the Y axis is called the Dependent variable (DV or endogenous variable) and its counterpart on the X axis is called the Independent variable (IV or exogenous variable). Figure 1 shows a scatter plot of monthly income in Rupees (DV) against level of education (IV). Many straight lines (like HH, FF, LL) can be drawn freehand to summarize the relationship between the characteristics shown on the Y and X axes. The best fit for the scatter plot is obtained by choosing the line that minimizes the sum of the squared vertical distances (e1² + e2² + … + en²) from the points to the proposed line (like FF); this is the method of least squares. The plain sum e1 + e2 + … + en is not a usable criterion, because positive and negative distances cancel.

Figure 1

Line Fitting
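The method of least squares, which chooses the line minimizing the sum of squared vertical distances, can be checked numerically. The sketch below uses hypothetical education and income figures (not the data behind Figure 1), computes the least-squares intercept and slope from deviations about the means, and compares the sum of squared errors with that of a flat line through the mean income. It also shows why the plain sum of errors cannot pick the best line: the flat line's errors sum to zero as well, yet it fits much worse.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def errors(xs, ys, a, b):
    """Vertical distances e_i = y_i - (a + b * x_i) from each point to the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(xs, ys, a, b):
    """Sum of squared errors for the line y = a + b * x."""
    return sum(e * e for e in errors(xs, ys, a, b))


# Hypothetical data: years of education (IV) and monthly income in thousand rupees (DV)
education = [0, 5, 8, 10, 12, 15]
income = [4, 7, 9, 12, 13, 17]

a, b = least_squares_line(education, income)
flat = (sum(income) / len(income), 0.0)  # horizontal line through the mean income

print(f"intercept={a:.2f}, slope={b:.2f}")                         # intercept=3.18, slope=0.86
print(abs(sum(errors(education, income, a, b))) < 1e-9)            # True
print(abs(sum(errors(education, income, *flat))) < 1e-9)           # True: raw sum cannot decide
print(sse(education, income, a, b) < sse(education, income, *flat))  # True
```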

Data Processing

Statistics revolves around data. The data processing cycle for large-scale surveys and censuses needs special attention to obtain good-quality data in a short span of time. Through this chapter, the group will summarize challenges and solutions for processing quantitative and qualitative data. Proposed components of data processing for this chapter may be seen in the taxonomy of the book.

Data Center

This chapter will record experiences of handling data available at different places on the web. These data may be used as examples for explaining statistical tools. Data collected through users' personal efforts, for understanding patterns of variation in day-to-day activities as well as in academic projects, will also be presented in this chapter. Proposed components for this chapter may be seen in the taxonomy of the book.

Discussions

Statistics has social value. Misuse of statistics (through non-representative data and methodology) has generated a lot of social tension. Amid the noise of statistics, it is difficult to detect the ill motives of individuals and organizations. This chapter will summarize discussions aimed at changing the environment for better use of statistics. In this regard, the chapter will filter out those voices (from Voices in Statistics) that try to bring the academic community and government organizations closer by strengthening both. Proposed components of discussion for this chapter may be seen in the taxonomy of the book.

Reader's Corner

In this chapter, links to useful articles (related to statistics) available on the web will be presented along with comments (notes, examples etc.) by different users. Such comments may be generated through the annotation tool 0snote.

Your Project

One can add sections about real-life projects in this chapter and in this way discuss a project with members of the group. Each section of Your Project will contain details of different aspects of the project (like formulation of the project, schedule preparation, data collection, data processing etc.). Components for this chapter may be seen in the taxonomy of the book.

Useful Sites

In this chapter we will record useful sites for statistics that help in expanding the knowledge base directly or indirectly. Sites mentioned in different sections of this chapter may be treated as part of a Virtual Library for the Statistics Group. One may ask whether this virtual library is useful simply because the content available here is free. That may be one reason, but not the most important one. The most important reason is that these materials may be used for collective reading (one way is to use the reframeit add-on for the browser). These materials are also easier to modify, link and share than the books available in the physical libraries of universities.

Members of the group can send information about such sites if they are not mentioned in the lists of this chapter. There will be a separate section for each kind of site. Users can send their comments about a particular site (especially how they would like to use it) through the comments of the section.

Although useful links may be mentioned in different chapters of this book, they will be in the form of individual web pages, not whole sites. Discussion of sites will help the Statistics Group coordinate the information available at different sites (organizing and expanding the library). Validation of these sites through users' comments will enrich the content of this chapter.

History of Statistics

The following sites and links provide information for understanding the historical background of various concepts used in statistics:

Gateway of free resources

The following sites are good sources of free resources, including software and training materials. They provide entry points into the world of free web-based materials. Their coverage is vast, so exploring them takes patience. Certainly, a group effort will make the work easier.

  1. International Statistical Institute (ISI): Although its list of free tools and content is not as wide as those of Global Sociology and Stat Pages, its links generally lead to good materials. With multiple links for a particular area, the site suggests the best starting point in that area. It also provides good resources for teachers through its International Statistical Literacy Project. The International Association for Statistical Education (IASE) is the education section of the International Statistical Institute (ISI), but it may also be joined independently by those who wish to participate in IASE's activities, or simply to support its work on improving statistics education and extending its outreach. One of the main contributions of IASE is free teaching resources through the Statistics Education Research Journal (SERJ).

  2. Global Sociology: The goal of the site is global sociology, i.e. bringing together knowledge of the social, political and economic world. The site is a good entry point for free material in sociology, economics and political science. It also provides a page listing free statistical software, along with mapping, spreadsheet and database tools and other material for data analysis or management. It also offers free material on statistical methods used in the socio-economic area, especially qualitative data analysis. The site may be useful for economists, sociologists and political scientists who want to learn important theories (used in sociology and economics) and methods (including statistical ones).

  3. Stat Pages: A well-categorized list of software may be obtained at StatPages.org. It provides lists of free and partially free software in different areas, like general packages, survey software, programming languages for statistical use, macros etc. It also gives links to online statistical computation tools, which are very useful for online courses. The Statistics Online Computational Resource (SOCR) is one of them. It is designed to disseminate knowledge freely. It provides portable online aids for probability and statistics education, technology-based instruction and statistical computing. SOCR tools and resources include a repository of interactive applets, computational and graphing tools, and instructional and course materials. Details of the interactive materials are also provided as a wiki page.

  4. The Community College Consortium for Open Educational Resources (CCCOER) is a joint effort by many community colleges and university partners to develop and use open educational resources (OER), especially open textbooks, in community college courses. It has a good collection of statistics books too, which are basically wikibooks. Links to a large number of OER sites are placed on the right side of the home page. The OER revolution is a great help for e-learning. One can search OER content through the UNESCO OER Toolkit/Finding and Using Open Educational Resources.

  5. StatLib was founded in April 1989 by Mike Meyer, then a Senior Research Scientist in the Department of Statistics at Carnegie Mellon University, to distribute statistical software, datasets and information by electronic mail, FTP and WWW.

  6. System Analysis Laboratory: The material on this site mainly concerns decision theory (from a statistical point of view), but the site can be used to find links to free statistical resources and web tools. The links are organised in a well-structured manner. It also provides links to interactive tools (like applets).

  7. Site of Betty C Jung: This site provides links on statistics related to public health, especially biostatistics. It also provides links for training in basic statistics and numerical literacy.

  8. Sharing of teaching experiences: An online service for sharing experiences is provided by the Australian Learning and Teaching Council. Through its Exchange program, the council works to identify, disseminate and embed good individual and institutional practice in the higher education sector. The Exchange supports networking and the development of communities of practice across the higher education sector. This site helps teachers exchange ideas based on teaching experiments.

  9. Free Statistics: Free Statistics provides links to basic and advanced learning resources from various fields, like engineering. Although the links are limited, they lead to relatively good sites.

  10. Statistical Literacy and Education: The International Association for Statistical Education (IASE) is a major source for statistics education. The International Statistical Literacy Project plays an important role in expanding statistical literacy. It is a joint project of IASE and the International Statistical Institute. The project promotes statistical literacy across the world, among young people and adults, in all walks of life. To this end, it provides an online repository of international resources and news on Statistical Literacy.
    Teaching Statistics is an international quarterly online journal that first appeared in 1979. It is published by the Royal Statistical Society Center for Statistics Education. Its aim is to support teachers in teaching statistics to students up to age 19.
    Another site, Statlit.org, provides links to resources for numerical literacy. Many of the links (for books) on the site are not free, but the articles are mostly free. The page by Milo Schield may be useful for resources on numerical literacy.
    Many government organizations, like Statistics Canada and Statistics Australia, are working on statistical literacy. Such organizations can be found in the Partners of Statistics section.

Certainly these sites are useful for users of statistics, mainly for those searching for free statistical resources (especially software). The sites say nothing about how good the software is, whether it crashes, etc. Sharing experiences with this free software would be a significant addition to the knowledge base in statistics. Most of the sites mentioned in the next sections may also be found as links on the four link pages mentioned above.

Consortium for the Advancement of Undergraduate Statistics Education (Cause)

The Consortium for the Advancement of Undergraduate Statistics Education (Cause) is a national organization whose mission is to support and advance undergraduate statistics education in four target areas: resources, professional development, outreach and research. This site is useful for undergraduate teaching in the same way as MSTE is useful for school-level statistics teaching and statistical literacy. The site has a large number of applets for teaching statistics. It has many articles for the professional development of statistics teachers (Readings) as well as teaching tips. It also provides materials for starting workshop-oriented teaching (Presenters Guide). The site also provides information obtained from research on the teaching of statistics.

This site is very useful for teachers. Anyone interested in this site may find the book Research on the Role of Teaching and Learning of Statistics useful. This edited book (proceedings of the 1996 IASE round table conference) gives guidelines for new directions in statistics at different levels of education.

Open text books for statistics application courses

Free statistics books on the web are available in four forms: (1) books in printable format, (2) e-books (hypertext books), (3) interactive books, and (4) books available in the form of notes and slides. We will try to cover all types of books (or materials) that may be used as textbooks for a statistics course or can assist such courses.

Sites for link of books

  1. CCCOER: The Community College Consortium for Open Educational Resources (CCCOER) is a joint effort by many community colleges and university partners to develop and use open educational resources (OER), especially open textbooks, in community college courses. It has a good collection of statistics books too, which are basically wikibooks.

  2. Textbook Revolution: Another site where one can get many types of free books and e-books, including books on statistics.

  3. Teaching resources: It provides links to a variety of teaching resources, including books, tutorials, notes, data, interactive tools etc.

  4. Web-oriented Teaching Resources in Probability and Statistics: The books linked from this site are those with exercises and dynamic demonstrations. The linked sites carry an implicit ranking: the better and more extensive a resource, the closer it is to the top of the table.

  5. UCLA Statistics Collection: It has a large number of preprints, statistical theses and dissertations.

Introductory books

  1. A new view of statistics: Although this e-book has been written for researchers and students in the sport and exercise sciences but it can help students and researchers struggling to understand stats in other disciplines. Basically it is easy to translate many problems in life as problem of sports and exercise. Although this book has discussed some advance topics like binomial regression and Baysian technique but its language is very simple, computational complexity has been avoided. Intuition, interpretation has got more emphasis. 

  2. Book for Life Sciences with emphasis on graphs: Class notes by C. J. Schwarz, Department of Statistics and Actuarial Science, Simon Fraser University, cover a wide range of topics in the statistics curriculum for the life sciences. The notes summarize the important points of a life-sciences statistics course; the notes on graphical representation may be of particular interest. They are not intended as a complete replacement for a textbook, nor as a reference: in many sections details are deliberately omitted because they are usually covered in class.

  3. Site of David M. Lane: It contains an online introductory statistics textbook and an online tutorial for help in statistics courses. It mixes free content with advertisement links. The site also has good links, including a stat primer and a collection of jokes.

  4. Concepts and Applications of Inferential Statistics: A full-length, and occasionally interactive, statistics textbook. It is a companion site of VassarStats, a web site for statistical computation. Each chapter and chapter part can be saved as a PDF file for easy printing or off-line study.

  5. Online Statistics: An Interactive Multimedia Course of Study is an introductory-level statistics book. The material is presented both as a standard textbook and as a multimedia presentation. The book features interactive demonstrations and simulations, case studies, and an analysis lab. 

  6. Site of visual statistics: It provides the work of selected professors, including books.

  7. The Little Handbook of Statistical Practice: A book by Gerard E. Dallal, available at the Tufts University site. It covers basic topics used in applied statistics courses.

Books on advanced topics

  1. StatSoft: This e-book covers advanced topics from undergraduate and graduate statistics courses and a wide variety of applications, including laboratory research (biomedical, agricultural, etc.), business statistics and forecasting, social science statistics and survey research, data mining, engineering and quality control applications, and many others.
        The Electronic Textbook begins with an overview of the relevant elementary (pivotal) concepts and continues with a more in-depth exploration of specific areas of statistics, organized by "modules" that are accessible by buttons and represent classes of analytic techniques. A glossary of statistical terms and a list of references for further study are included.

  2. Site of StatLink: It provides links on a wide range of topics, including Bayesian statistics, computational statistics, stochastic processes, spatial statistics, time series and generalized linear models. The site also covers topics from different application areas, such as social statistics, medical statistics, econometrics, quality control, environmetrics and astrostatistics.

  3. Advanced topics on methods of data and knowledge mining using the decision tree approach.

  4. Book on Time Series: Time series data have distinctive features that affect their analysis. Unlike most statistical tools, time series tools are not based on the assumption of independence. A book on time series can be found at the Federal Forecasters Consortium. It covers traditional and some advanced topics in time series, and the site also provides links to data.
    Another book (covering some advanced models) can be found at the site of Professor Hossein Arsham. More advanced topics may be downloaded from MIT OpenCourseWare or from the site of the Australian Bureau of Statistics. Data for time series can be downloaded from the Time Series Data Library.
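To see why the independence assumption matters, the following sketch compares the lag-1 sample autocorrelation of independent noise with that of a simulated autoregressive series. It uses only the Python standard library; the AR(1) model, the value φ = 0.8, the series length and the seed are illustrative assumptions, not taken from the books above. For independent data the value should be close to 0, while for the AR(1) series it should be close to φ.

```python
import random

def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation: how strongly each value tracks the previous one."""
    n = len(xs)
    mean = sum(xs) / n
    # Denominator: total squared deviation from the mean.
    denom = sum((x - mean) ** 2 for x in xs)
    # Numerator: sum of products of consecutive deviations.
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    return num / denom

def simulate_ar1(phi, n, rng):
    """Illustrative AR(1) series: x_t = phi * x_{t-1} + e_t, with standard normal e_t."""
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(phi * xs[-1] + rng.gauss(0.0, 1.0))
    return xs

rng = random.Random(42)  # fixed seed so the run is reproducible
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # independent observations
ar1 = simulate_ar1(0.8, 2000, rng)                # dependent observations

print(f"independent noise: r1 = {lag1_autocorrelation(iid):.3f}")
print(f"AR(1), phi = 0.8:  r1 = {lag1_autocorrelation(ar1):.3f}")
```

A lag-1 autocorrelation far from zero is a warning that methods assuming independent observations (for example, ordinary standard errors of the mean) may be misleading for that series.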

Site for statistics for managerial decisions

The site prepared by Prof. Hossein Arsham is useful for managerial decisions. It covers a wide range of topics used in statistical decision making, as well as the computational environment of Excel. It also discusses the challenges facing statistics and the statistics community, and provides teaching techniques for statistical decision making.

Site for Mathematics, Science and Technology Education (MSTE)

The Site for Mathematics, Science and Technology Education (MSTE) program at the University of Illinois at Urbana-Champaign has evolved as a learning-system community of practice that functions as a bridge among other such communities, promoting collaboration between widely dispersed academic researchers, school teachers, administrators and students at all levels. The site offers different types of courses, such as algebra, geometry, measurement, data analysis and probability theory, and problem solving, and it teaches concepts through projects. It is very useful for beginners in statistics, while for advanced learners it presents model teaching techniques.

Anyone interested in this site may find the book Research on the Role of Teaching and Learning of Statistics useful. This edited book (proceedings of the 1996 IASE Round Table Conference) gives guidelines for new directions in statistics at different levels of education.

Online courses for application of statistics

A lot of online statistics courses are available on the web, but most of them are paid. Unlike textbooks on the web, these courses use the full power of the web and provide a highly interactive environment for learners (through exercises and projects). The following sites offer free online statistics courses.

  1. Open Learning Project: Through the Open Learning Initiative (OLI), Carnegie Mellon is working to help the World Wide Web deliver good and effective online education. The site provides free online courses in subjects such as Statistics, Causal and Statistical Reasoning, Biology, Economics and Physics. OLI also provides projects based on the use of statistics in real situations. Although a lot of material is available on the web for different courses, OLI has a distinct identity because it tries to use virtual tools in the true sense: virtual laboratories, group experiments and simulations. The project has a definite plan for evaluating its online training programs. A detailed list of courses is available on the list of courses page.

  2. CAST: CAST stands for Computer-Assisted Statistics Textbooks and consists of a collection of electronic textbooks (e-books). Three e-books cover material in introductory statistical methods courses with data and scenarios from different application areas. Other e-books teach more advanced topics.

  3. Site of Tufts University: It is one of the important sources for the Open Educational Resources (OER) movement, bringing access to educational content, tools, and infrastructure to educators, students, and self-learners. It has a variety of courses, including physics, medicine, epidemiology and biostatistics. Its statistics course, ConStats, is available as OER content.
    ConStats is a learning tool designed for introductory statistics students to actively experiment with statistical ideas and reasoning. Unlike data analysis programs, ConStats modules give hands-on experience with statistical concepts, deepening understanding of the science of statistics.

  4. Connections: The site calls itself Connexions (not connections). It provides a platform to view and share educational material made of small knowledge chunks called modules, which can be organized as courses, books, reports, etc. The site says its contents are modular and can be linked in different ways, and it explains how to create a collection of modules for a specific purpose. Such modular courses may be useful for online training. Contents at the site fall under the open educational resources (OER) project. The site has a good amount of material on statistics and mathematics, including many collaborative books on statistics and related areas.

  5. Web Interface for Statistics Education (WISE): A special feature of WISE is the sequence of interactive tutorials on key statistical concepts (sampling distributions, the central limit theorem, hypothesis testing, and statistical power). The tutorials use dynamic applets that allow the user to explore relationships on their own. Guided exercises are designed to help the learner to take full advantage of the applets to gain a deeper understanding of the concepts and logic that underlie much of inferential statistics.
    Although its coverage is not wide from a statistics point of view, the site has a good orientation toward web-based statistics education. The Applet menu holds the site's own applets, and under the Link menu an applet item links to other sites that have applets for learning statistics.
    The site also provides teaching aids. Its unique feature is the use of signal detection theory for handling uncertainty in a broader sense. The note on using Java applets explains how students can use the interactive tools for learning statistics.

  6. MIT OpenCourseWare: It is a large collection of modules for different subjects. Statistics modules are placed under the mathematics department; it is better to use search to find all modules related to statistics. Notes from many modules are downloadable.

  7. Experiments at school: It is not exactly an online course, but it provides an opportunity to learn statistics from real data. Its supporting project is Census at School. These projects are part of the Royal Statistical Society's statistics education activities. Currently, data from India is not available for Census at School.

The links mentioned above may require a login account, so it is better to register at each site.

Statistics for engineers

This site is useful for engineers who want to understand the role of statistics in their field. Examples are taken from engineering, which increases the site's value for engineers. Although the site has little information of its own (especially on statistics), it collects links to useful sites, mainly ones used by engineers. Links to some applets are given, but some of them (the Bayes applets) have problems. A lot of material has been placed under "new stuff".

Anyone with an engineering background may also appreciate the engineers' forum.

A good resource for using statistics in engineering is available at the site of the National Institute of Standards and Technology (NIST). The site may sometimes be slow.

Useful ideas for using statistics in engineering may be found at the site of DEN, a volunteer-based, non-commercial electronic communications resource available internationally to individuals and organizations interested in the past, present, and future of Dr. W. Edwards Deming's System of Profound Knowledge and related philosophies.

Data visualization

Data visualization combines visual and statistical thinking. It is an important tool for presenting statistical conclusions as well as for getting a feel for the data. Like good writing, good graphical displays of data communicate ideas with clarity, precision, and efficiency. The following sites help improve data visualization skills.

  1. The Best and Worst of Statistical Graphics: This site has rich resources concerned with graphics. It not only tries to differentiate good and bad graphs but also presents milestones in the history of thematic cartography, statistical graphics and data visualization.

  2. Informative Presentation of Tables, Graphs and Statistics: This page of the University of Reading outlines how to communicate numerical information effectively in project reports and serious publications, such as scientific papers.

  3. Data Presentation: A Guide To Good Graphics: It provides 50 slides showing how graphs and charts should be used.

  4. Just Plain Data Analysis: Although this is the companion website for a book on graphics, it may help in understanding good graphs and creating them in Excel.

  5. Speaking of Graphics: This essay discusses graphics derived from tabulated observations or measurements.

Data store

Although a lot of data is available on the web, it is not easy to download it for use. The following problems arise when getting data from the web:

  1. Most of the data is archived and can be searched only through the site's own search engine.
  2. Some data lacks proper documentation explaining its nature.
  3. Files may not be available in the desired format.
  4. Proprietary issues must be resolved; for this reason, the same site may hold data with different access controls.
  5. It is difficult to judge the size of the data.

Keeping these problems in mind, one can visit the following sites to search for data:

  1. IQSS Dataverse Network: The Dataverse Network of the Institute for Quantitative Social Science (IQSS) is a virtual archive where one can store, permanently preserve, distribute, and generally or selectively share data, or list data from other dataverses. It ensures that data users properly cite the owners of the data. It is a wide network for quantitative and qualitative data and reports, and anyone can easily create their own dataverse in it. The Henry A. Murray Research Archive (a very large data archive) also belongs to this network. One can store data on this site as well as retrieve others' data. Detailed information about how the IQSS Dataverse works is available at a separate site.

  2. Inter University Consortium for Political and Social Research (ICPSR): ICPSR provides training in data access and methods of analysis for a diverse and expanding social science research community. The site address will change in mid-August. ICPSR provides a network of many data banks, such as the Terrorism & Preparedness Data Resource Center (TPDRC). TPDRC archives and distributes data collected by government agencies, non-governmental organizations (NGOs), and researchers about the nature of intra- (domestic) and international terrorism incidents, organizations, perpetrators, and victims; governmental and nongovernmental responses to terror; and citizens' attitudes towards terrorism, terror incidents, and the response to terror.

  3. The United Nations Statistics Division: UNdata provides links to data that may be obtained through different international and national agencies. Through its wiki page, it gives information about each of UNdata's sources, including links to the sources' home pages and databases, contact links, descriptions of the methodology used, and glossaries of terms, when available.

  4. Minnesota Population Center: It is a source of census-related demographic data, and its site hosts many data archive projects. The Integrated Public Use Microdata Series (IPUMS) is one of them. IPUMS-International is a project dedicated to collecting and distributing census data from around the world. It collects and preserves data and documentation, harmonizes the data and disseminates it freely. To get data, one must apply for registration, and approval may take time. One can visit the site as a guest to learn how the system works. A major limitation is that the data consists entirely of individual person and household records from population censuses; there are no macroeconomic, business, or aggregate statistics.

  5. Time Series Data Library: This is a collection of about 800 time series drawn from many different fields.

  6. DASL (Data And Story Library) is an online library of data files and stories that illustrate the use of basic statistics methods. Stories are abstracts that discuss the statistical concepts of a particular data file. It contains data from a wide variety of topics so that statistics teachers can find real-world examples that will be interesting to their students.

  7. University of Michigan Document Center: Links to data arranged in different categories.

  8. American Statistical Association: It provides data that may be useful for teaching. For each .dat data file there is a story file (.txt) that explains how the data in the corresponding .dat file should be interpreted.

  9. Dataset prepared by A. P. Gore, S. A. Paranjpe and M. B. Kulkarni of Pune University. The description and covering note explain the data nicely. This dataset is most useful for practical classes at university level.

Sites for interactive teaching tools (including applets)

Applets, used as training kits, play a significant role in interactive, simulation-based teaching. They increase student participation in learning.

These applets may be found at following sites.

  1. CHANCE (a good site for teaching material on probability theory) at Dartmouth College. Applets may be found on a dedicated page. A good number of applets are also available with the free probability book Introduction to Probability.

  2. Statistics Online Computational Resource (SOCR): Its goals are to design, validate and freely disseminate knowledge. Its resources specifically provide portable online aids for probability and statistics education, technology-based instruction and statistical computing. SOCR tools and resources include a repository of interactive applets, computational and graphing tools, and instructional and course materials.

  3. Onlinetextbook.com, a site for an interactive multimedia course in introductory-level statistics. Applets may be found on a dedicated page.

  4. Site of WISE (Web Interface for Statistics Education).

  5. Department of Statistics, North Carolina State University. Bayes applets may be found on a dedicated page.

  6. Links to a large number of interactive tools at the StatLink site.
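Many of the applets listed above animate a sampling distribution. As a rough, non-interactive stand-in, the sketch below draws repeated samples from a skewed population and checks that the sample means center on the population mean with a spread close to σ/√n, as the central limit theorem suggests. It uses only the Python standard library; the exponential population, sample size 30, 5000 repetitions and the seed are arbitrary illustrative choices, not taken from any of these sites.

```python
import math
import random
import statistics

def sample_means(sample_size, repetitions, rng):
    """Return the means of repeated samples drawn from an exponential
    population with rate 1 (population mean 1, population SD 1)."""
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]

# Illustrative settings (assumptions, not from the sites above).
SAMPLE_SIZE = 30
REPETITIONS = 5000

rng = random.Random(2024)  # fixed seed so the run is reproducible
means = sample_means(SAMPLE_SIZE, REPETITIONS, rng)

# Theory: standard error of the mean = population SD / sqrt(n) = 1 / sqrt(30).
theoretical_se = 1.0 / math.sqrt(SAMPLE_SIZE)
observed_se = statistics.stdev(means)

# Under a normal approximation, about 95% of sample means fall within
# 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) < 1.96 * theoretical_se for m in means) / REPETITIONS

print(f"mean of sample means: {statistics.mean(means):.3f}")
print(f"SD of sample means:   {observed_se:.3f} (theory {theoretical_se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

Changing SAMPLE_SIZE and re-running shows the same effect an applet slider would: larger samples give a narrower, more nearly normal distribution of sample means.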

Partners of Statistics - Associations and Organizations

Many associations and organizations work for the development of statistics, and they are of different kinds (commercial, volunteer, government). Many sites provide lists of statistical societies (such as the sites of Stata and Statistical Science) and government organizations, but a large number of the links at these sites are commercial or work in a limited sphere. The following list covers some leading organizations (some of them governmental) from which any individual can get free help for learning and using statistics.

  1. International Statistical Institute: The International Statistical Institute (ISI) is one of the oldest international scientific associations functioning in the modern world. Its first congresses were convened in 1853, and it was formally established in 1885. The Institute is an autonomous society that seeks to develop and improve statistical methods and their application through the promotion of international activity and co-operation. Many useful projects, such as the International Statistical Literacy Project, run under the International Association for Statistical Education (IASE) in association with ISI. The ISI site provides links to other useful sites through the pages Other Websites of Possible Interest and Free Statistical Tools on Web.

  2. Statistics Canada: Statistics Canada is a leading government statistical agency that is also involved in research. Apart from periodicals and series, it provides technical as well as analytical studies. From its site one can obtain the Common Metadata Framework for government statistical organizations. This valuable resource is being developed through the collective input of national and international statistical organizations. The site also has good learning resources on statistics for kids.

  3. Statistics Australia: The Australian Bureau of Statistics (ABS) is a leading government organization which, apart from providing statistics, works on training, education and expanding knowledge of statistics. The site also provides material to provoke public discussion on statistical matters on which ABS has not formed a view.

  4. US Census Bureau: The Census Bureau serves as the leading source of quality data about the American people and economy. It handles its vast data through the data tools given on the site. It also works to raise awareness of the use of census data, especially among kids.

  5. Ministry of Statistics and Programme Implementation (India): From this site one can get links to the Central Statistical Organization (CSO), the National Statistical Commission and the National Sample Survey. CSO is the main source of official statistics in India. The site also lists links to the Directorates of Economics and Statistics (DES) working in different states of India, along with the contact persons for these DESs.

  6. The United Nations Statistics Division: The site is very useful for standardized statistical methods, classifications and definitions. Its organizational chart shows how it works to expand the role of statistics. UNdata provides links to data that may be obtained through different international and national agencies. Through its wiki page, it gives information about each of UNdata's sources, including links to the sources' home pages and databases, contact links, descriptions of the methodology used, and glossaries of terms, when available.
    Some of its publications are very useful for standardizing surveys and for the functioning of government statistical agencies.

  7. The World Bank Institute (WBI) is one of the pioneering sources for developing individual, organizational, and institutional capacity through the exchange of knowledge (learning programs, publications) among countries. In particular, its Knowledge for Development (K4D) program may help in understanding the role of knowledge in development. It provides a variety of analytical reports that demonstrate the use of statistics in socio-economic and political life.