# AIOU Solved Assignments 1 & 2 Code 8614 Spring 2020

ASSIGNMENT No: 1 & 2
Introduction to Educational Statistics (8614) B.Ed 1.5 Years
Spring, 2020

Q1. Differentiate between primary and secondary data. Explain their uses and benefits with examples. (20)

Data collection plays a very crucial role in the statistical analysis. In research, there are different methods used to gather information, all of which fall into two categories, i.e. primary data, and secondary data. As the name suggests, primary data is one which is collected for the first time by the researcher while secondary data is the data already collected or produced by others.

# Difference between Primary and Secondary Data

| Basis for comparison | Primary data | Secondary data |
| --- | --- | --- |
| Meaning | Primary data refers to the first-hand data gathered by the researcher himself. | Secondary data means data collected by someone else earlier. |
| Data | Real-time data | Past data |
| Process | Very involved | Quick and easy |
| Source | Surveys, observations, experiments, questionnaire, personal interview, etc. | Government publications, websites, books, journal articles, internal records, etc. |
| Cost effectiveness | Expensive | Economical |
| Collection time | Long | Short |
| Specific | Always specific to the researcher's needs. | May or may not be specific to the researcher's needs. |
| Available in | Crude form | Refined form |
| Accuracy and reliability | More | Relatively less |

### Definition of Primary Data

Primary data is data originated for the first time by the researcher through direct effort and experience, specifically for the purpose of addressing the research problem. It is also known as first-hand or raw data. Primary data collection is quite expensive, because the research is conducted by the organisation or agency itself and requires resources such as investment and manpower. The data collection is under the direct control and supervision of the investigator.

The data can be collected through various methods like surveys, observations, physical testing, mailed questionnaires, questionnaire filled and sent by enumerators, personal interviews, telephonic interviews, focus groups, case studies, etc.

### Definition of Secondary Data

Secondary data implies second-hand information which is already collected and recorded by any person other than the user for a purpose, not relating to the current research problem. It is the readily available form of data collected from various sources like censuses, government publications, internal records of the organisation, reports, books, journal articles, websites and so on.

Secondary data offer several advantages: they are easily available and save the researcher time and cost. There are also disadvantages. Because the data were gathered for purposes other than the problem in mind, their usefulness for the current research may be limited in a number of ways.

A sequence of observations, made on a set of objects included in the sample drawn from population, is known as statistical data.

## Scope and purpose

Data analysis is the process of developing answers to questions through the examination and interpretation of data.  The basic steps in the analytic process consist of identifying issues, determining the availability of suitable data, deciding on which methods are appropriate for answering the questions of interest, applying the methods and evaluating, summarizing and communicating the results.

Analytical results underscore the usefulness of data sources by shedding light on relevant issues. Some Statistics Canada programs depend on analytical output as a major data product because, for confidentiality reasons, it is not possible to release the microdata to the public. Data analysis also plays a key role in data quality assessment by pointing to data quality problems in a given survey. Analysis can thus influence future improvements to the survey process. Data analysis is essential for understanding results from surveys, administrative sources and pilot studies; for providing information on data gaps; for designing and redesigning surveys; for planning new statistical activities; and for formulating quality objectives.

Results of data analysis are often published or summarized in official Statistics Canada releases.

## Principles

A statistical agency is concerned with the relevance and usefulness to users of the information contained in its data. Analysis is the principal tool for obtaining information from the data.

Data from a survey can be used for descriptive or analytic studies. Descriptive studies are directed at the estimation of summary measures of a target population, for example, the average profits of owner-operated businesses in 2005 or the proportion of 2007 high school graduates who went on to higher education in the next twelve months.  Analytical studies may be used to explain the behaviour of and relationships among characteristics; for example, a study of risk factors for obesity in children would be analytic.

To be effective, the analyst needs to understand the relevant issues both current and those likely to emerge in the future and how to present the results to the audience. The study of background information allows the analyst to choose suitable data sources and appropriate statistical methods. Any conclusions presented in an analysis, including those that can impact public policy, must be supported by the data being analyzed.

## Guidelines

### Initial preparation

• Prior to conducting an analytical study the following questions should be addressed:
• What are the objectives of this analysis? What issue am I addressing? What question(s) will I answer?
• Why is this issue interesting?  How will these answers contribute to existing knowledge? How is this study relevant?
• What data am I using? Why is it the best source for this analysis? Are there any limitations?
• Analytical methods. What statistical techniques are appropriate? Will they satisfy the objectives?
• Who is interested in this issue and why?

### Suitable data

• Ensure that the data are appropriate for the analysis to be carried out.  This requires investigation of a wide range of details such as whether the target population of the data source is sufficiently related to the target population of the analysis, whether the source variables and their concepts and definitions are relevant to the study, whether the longitudinal or cross-sectional nature of the data source is appropriate for the analysis, whether the sample size in the study domain is sufficient to obtain meaningful results and whether the quality of the data, as outlined in the survey documentation or assessed through analysis is sufficient.
• If more than one data source is being used for the analysis, investigate whether the sources are consistent and how they may be appropriately integrated into the analysis.

### Appropriate methods and tools

• Choose an analytical approach that is appropriate for the question being investigated and the data to be analyzed.
• When analyzing data from a probability sample, analytical methods that ignore the survey design can be appropriate, provided that sufficient model conditions for analysis are met. (See Binder and Roberts, 2003.) However, methods that incorporate the sample design information will generally be effective even when some aspects of the model are incorrectly specified.
• Assess whether the survey design information can be incorporated into the analysis and if so how this should be done such as using design-based methods.  See Binder and Roberts (2009) and Thompson (1997) for discussion of approaches to inferences on data from a probability sample.
• See Chambers and Skinner (2003), Korn and Graubard (1999), Lehtonen and Pahkinen (1995), Lohr (1999), and Skinner, Holt and Smith (1989) for a number of examples illustrating design-based analytical methods.
• For a design-based analysis consult the survey documentation about the recommended approach for variance estimation for the survey. If the data from more than one survey are included in the same analysis, determine whether or not the different samples were independently selected and how this would impact the appropriate approach to variance estimation.
• The data files for probability surveys frequently contain more than one weight variable, particularly if the survey is longitudinal or if it has both cross-sectional and longitudinal purposes. Consult the survey documentation and survey experts if it is not obvious as to which might be the best weight to be used in any particular design-based analysis.
• When analyzing data from a probability survey, there may be insufficient design information available to carry out analyses using a full design-based approach.  Assess the alternatives.
• Consult with experts on the subject matter, on the data source and on the statistical methods if any of these is unfamiliar to you.
• Having determined the appropriate analytical method for the data, investigate the software choices that are available to apply the method. If analyzing data from a probability sample by design-based methods, use software specifically for survey data since standard analytical software packages that can produce weighted point estimates do not correctly calculate variances for survey-weighted estimates.
• It is advisable to use commercial software, if suitable, for implementing the chosen analyses, since these software packages have usually undergone more testing than non-commercial software.
• Determine whether it is necessary to reformat your data in order to use the selected software.
• Include a variety of diagnostics among your analytical methods if you are fitting any models to your data.
• Data sources vary widely with respect to missing data.  At one extreme, there are data sources which seem complete – where any missing units have been accounted for through a weight variable with a nonresponse component and all missing items on responding units have been filled in by imputed values.  At the other extreme, there are data sources where no processing has been done with respect to missing data.  The work required by the analyst to handle missing data can thus vary widely. It should be noted that the handling of missing data in analysis is an ongoing topic of research.
• Refer to the documentation about the data source to determine the degree and types of missing data and the processing of missing data that has been performed.  This information will be a starting point for what further work may be required.
• Consider how unit and/or item nonresponse could be handled in the analysis, taking into consideration the degree and types of missing data in the data sources being used.
• Consider whether imputed values should be included in the analysis and if so, how they should be handled.  If imputed values are not used, consideration must be given to what other methods may be used to properly account for the effect of nonresponse in the analysis.
• If the analysis includes modelling, it could be appropriate to include some aspects of nonresponse in the analytical model.
• Report any caveats about how the approaches used to handle missing data could have an impact on results.

### Interpretation of results

• Since most analyses are based on observational studies rather than on the results of a controlled experiment, avoid drawing conclusions concerning causality.
• When studying changes over time, beware of focusing on short-term trends without inspecting them in light of medium- and long-term trends. Frequently, short-term trends are merely minor fluctuations around a more important medium- and/or long-term trend.
• Where possible, avoid arbitrary time reference points. Instead, use meaningful points of reference, such as the last major turning point for economic data, generation-to-generation differences for demographic statistics, and legislative changes for social statistics.

### Presentation of results

• Focus the article on the important variables and topics. Trying to be too comprehensive will often interfere with a strong story line.
• Arrange ideas in a logical order and in order of relevance or importance. Use headings, subheadings and sidebars to strengthen the organization of the article.
• Keep the language as simple as the subject permits. Depending on the targeted audience for the article, some loss of precision may sometimes be an acceptable trade-off for more readable text.
• Use graphs in addition to text and tables to communicate the message. Use headings that capture the meaning (e.g. “Women’s earnings still trail men’s”) in preference to traditional chart titles (e.g. “Income by age and sex”). Always help readers understand the information in the tables and charts by discussing it in the text.
• When tables are used, take care that the overall format contributes to the clarity of the data in the tables and prevents misinterpretation.  This includes spacing; the wording, placement and appearance of titles; row and column headings and other labeling.
• Explain rounding practices or procedures. In the presentation of rounded data, do not use more significant digits than are consistent with the accuracy of the data.
• Satisfy any confidentiality requirements (e.g. minimum cell sizes) imposed by the surveys or administrative sources whose data are being analysed.
• Include information about the data sources used and any shortcomings in the data that may have affected the analysis.  Either have a section in the paper about the data or a reference to where the reader can get the details.
• Include information about the analytical methods and tools used.  Either have a section on methods or a reference to where the reader can get the details.

Q2. Explain the measures of central tendency and measures of dispersion. How are these two concepts related? Suggest one measure of central tendency with logical reasons. (20)

Collecting data can be easy and fun. But sometimes it can be hard to tell other people about what you have found. That’s why we use statistics. Two kinds of statistics are frequently used to describe data. They are measures of central tendency and dispersion. These are often called descriptive statistics because they can help you describe your data.

### Range, variance and standard deviation

These are all measures of dispersion. They tell you how spread out the scores in a set are: are the scores close together or far apart? For example, if you were describing the heights of students in your class to a friend, they might want to know how much the heights vary. Are all the men about 5 feet 11 inches, give or take an inch or so? Or is there a lot of variation, with some men 5 feet tall and others 6 feet 5 inches? Measures of dispersion like the range, variance and standard deviation tell you about the spread of scores in a data set. Like measures of central tendency, they help you summarize a set of numbers with one or just a few numbers.
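As a minimal sketch of how the two kinds of descriptive statistics sit side by side, the example below computes the mean, median and mode (central tendency) and the range, variance and standard deviation (dispersion) for one small data set. The heights are hypothetical values invented for illustration, and only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical heights (in inches) of ten students; illustrative data only.
heights = [60, 64, 66, 67, 68, 68, 70, 71, 72, 77]

# Measures of central tendency: where the "middle" of the data lies.
mean_height = statistics.mean(heights)
median_height = statistics.median(heights)
mode_height = statistics.mode(heights)

# Measures of dispersion: how far the scores spread around that middle.
height_range = max(heights) - min(heights)
sample_variance = statistics.variance(heights)  # sample variance, divides by n - 1
sample_sd = statistics.stdev(heights)           # square root of the sample variance

print("mean:", mean_height, "median:", median_height, "mode:", mode_height)
print("range:", height_range)
print("variance:", round(sample_variance, 2), "standard deviation:", round(sample_sd, 2))
```

The two ideas are related in that the variance and standard deviation are measured as distances from the mean, so a measure of central tendency is usually reported together with a measure of dispersion.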

A population is the collection of all people, plants, animals, or objects of interest about which we wish to make statistical inferences (generalizations). The population may also be viewed as the collection of all possible random draws from a stochastic model; for example, independent draws from a normal distribution with a given population mean and population variance.

A population parameter is a numerical characteristic of a population. In nearly all statistical problems we do not know the value of a parameter because we do not measure the entire population. We use sample data to make an inference about the value of a parameter.

A sample is the subset of the population that we actually measure or observe.

A sample statistic is a numerical characteristic of a sample. A sample statistic estimates the unknown value of a population parameter. Sample statistics that summarize the data are sometimes referred to as descriptive statistics.

Here are the Notations that will be used:

$$X_{ij}$$ = Observation for variable $$j$$ in subject $$i$$

$$p$$ = Number of variables

$$n$$ = Number of subjects

In the example to come, we’ll have data on 737 people (subjects) and 5 nutritional outcomes (variables). So,

$$p$$ = 5 variables

$$n$$ = 737 subjects

In multivariate statistics we will always be working with vectors of observations. So in this case we are going to arrange the data for the p variables on each subject into a vector. In the expression below, $$\textbf{X}_i$$ is the vector of observations for the $$i$$th subject,  $$i$$ = 1 to $$n$$ (737). Therefore, the data for the $$j$$th variable will be located in the $$j$$th element of this subject’s vector, $$j$$ = 1 to $$p$$ (5).

$\mathbf{X}_i = \left(\begin{array}{l}X_{i1}\\X_{i2}\\ \vdots \\ X_{ip}\end{array}\right)$

### Learning Objectives & Outcomes

Upon completion of this lesson, you should be able to do the following:

• interpret measures of central tendency, dispersion, and association;
• calculate sample means, variances, covariances, and correlations using a hand calculator;
• use software like SAS or Minitab to compute sample means, variances, covariances, and correlations.

### Central Tendency: The Mean Vector

Throughout this course, we’ll use the ordinary notations for the mean of a variable. That is, the symbol $$\mu$$ is used to represent a (theoretical) population mean and the symbol $$\bar{x}$$ is used to represent a sample mean computed from observed data. In the multivariate setting, we add subscripts to these symbols to indicate the specific variable for which the mean is being given. For instance, $$\mu_1$$ represents the population mean for variable $$x_1$$ and $$\bar{x}_1$$ denotes a sample mean based on observed data for variable $$x_1$$.

The population mean is the measure of central tendency for the population. Here, the population mean for variable $$j$$ is

$$\mu_j = E(X_{ij})$$

The notation $$E$$ stands for statistical expectation; here $$E(X_{ij})$$ is the mean of $$X_{ij}$$ over all members of the population, or equivalently, over all random draws from a stochastic model. For example, $$\mu_j = E(X_{ij})$$ may be the mean of a normal variable.

The population mean $$\mu_j$$ for variable $$j$$ can be estimated by the sample mean

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n}X_{ij}$$

Note: the sample mean $$\bar{x}_j$$, because it is a function of our random data, is also going to have a mean itself. In fact, the population mean of the sample mean is equal to the population mean $$\mu_j$$; i.e.,

$$E(\bar{x}_j) = \mu_j$$

Therefore, $$\bar{x}_j$$ is unbiased for $$\mu_j$$.

Another way of saying this is that the mean of the $$\bar{x}_j$$’s over all possible samples of size $$n$$ is equal to $$\mu_j$$.
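The unbiasedness statement above can be checked informally by simulation. The sketch below assumes a normal population with a mean of 50 and a standard deviation of 10 (values chosen only for illustration), draws many samples of size 25, and averages their sample means. It approximates "all possible samples" with a finite number of random ones, so the result is close to, not exactly equal to, the population mean.

```python
import random
import statistics

random.seed(8614)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # assumed population mean and standard deviation
N = 25                   # size of each sample
REPLICATIONS = 5000      # number of repeated samples

sample_means = []
for _ in range(REPLICATIONS):
    # Draw one sample of size N from the assumed normal population.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

# The average of the sample means should land very near MU.
average_of_means = statistics.fmean(sample_means)
print("population mean:", MU)
print("average of sample means:", round(average_of_means, 3))
```

Any single sample mean can miss the population mean by a few points; it is the long-run average over repeated samples that settles on it.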

Recall that the population mean vector is $$\boldsymbol{\mu}$$, which collects the population means of the different variables:

$$\boldsymbol{\mu} = \left(\begin{array}{c} \mu_1 \\ \mu_2\\ \vdots\\ \mu_p \end{array}\right)$$

We can estimate this population mean vector, $$\boldsymbol{\mu}$$, by $$\bar{\mathbf{x}}$$. This is obtained by collecting the sample means of the variables in a single vector, as shown below.

$$\bar{\mathbf{x}} = \left(\begin{array}{c}\bar{x}_1\\ \bar{x}_2\\ \vdots \\ \bar{x}_p\end{array}\right) = \left(\begin{array}{c}\frac{1}{n}\sum_{i=1}^{n}X_{i1}\\ \frac{1}{n}\sum_{i=1}^{n}X_{i2}\\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n}X_{ip}\end{array}\right) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i$$

Just as the sample means $$\bar{x}_j$$ for the individual variables are unbiased for their respective population means, the sample mean vector is unbiased for the population mean vector.

$$E(\bar{\mathbf{x}}) = E\left(\begin{array}{c}\bar{x}_1\\\bar{x}_2\\ \vdots \\\bar{x}_p\end{array}\right) = \left(\begin{array}{c}E(\bar{x}_1)\\E(\bar{x}_2)\\ \vdots \\E(\bar{x}_p)\end{array}\right)=\left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right)=\boldsymbol{\mu}$$
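To make the sample mean vector concrete, here is a small sketch that stores the data as one row per subject and one column per variable, then averages each column. The four subjects and three variables are hypothetical values, not the nutritional data mentioned in the text.

```python
# Hypothetical data: n = 4 subjects (rows), p = 3 variables (columns).
# data[i][j] plays the role of X_ij (0-based in code, 1-based in the text).
data = [
    [2.0, 10.0, 100.0],
    [4.0, 20.0, 300.0],
    [6.0, 30.0, 200.0],
    [8.0, 40.0, 400.0],
]

n = len(data)      # number of subjects
p = len(data[0])   # number of variables

# Element j of the mean vector is (1/n) * (sum of X_ij over all subjects i).
mean_vector = [sum(row[j] for row in data) / n for j in range(p)]

print("n =", n, "p =", p)
print("sample mean vector:", mean_vector)
```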

Q3. Explain the concept of statistics in education. How does it help in analyzing the data, and can the interpretation be trusted for decision making? (20)

Statisticians have defined the term in different ways.

Some of the definitions are given below:

Longman Dictionary:

Statistics is a collection of numbers which represent facts or measurement.

Webster:

“Statistics are the classified facts representing the conditions of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.”

A.L. Bowley:

Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.

Horace Secrist:

“By statistics we mean aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.”

From the above definitions it can be said that statistics is:

1. Numerical facts which can be measured, enumerated and estimated.
2. Facts are homogeneous and related to each other.
3. Facts must be accurate.
4. It must be collected systematically.

Lovitt:

“Statistics is that which deals with the collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena.”

### Function of Statistics:

Statistics has numerous functions.

The following points explain the functions of statistics in summary:

1. It helps in collecting and presenting the data in a systematic manner.
2. It helps to understand unwieldy and complex data by simplifying it.
3. It helps to classify the data.
4. It provides basis and techniques for making comparison.
5. It helps to study the relationship between different phenomena.
6. It helps to indicate the trend of behaviour.
7. It helps to formulate the hypothesis and test it.
8. It helps to draw rational conclusions.

### Statistics in Education:

Measurement and evaluation are essential parts of the teaching-learning process. In this process we obtain scores and then interpret these scores in order to make decisions. Statistics enables us to study these scores objectively. It makes the teaching-learning process more efficient.

The knowledge of statistics helps the teacher in the following way:

1. It helps the teacher to provide the most exact type of description:

When we want to know about a pupil, we administer a test or observe the child. From the result we then describe the pupil’s performance or trait. Statistics helps the teacher to give an accurate description of the data.

2. It makes the teacher definite and exact in procedures and thinking:

Sometimes, due to a lack of technical knowledge, teachers become vague in describing a pupil’s performance. Statistics enables the teacher to describe the performance using proper language and symbols, which makes the interpretation definite and exact.

3. It enables the teacher to summarize the results in a meaningful and convenient form:

Statistics gives order to the data. It helps the teacher to make the data precise and meaningful and to express it in an understandable and interpretable manner.

4. It enables the teacher to draw general conclusions:

Statistics helps the teacher to draw conclusions from data. Statistical procedures also indicate how much faith should be placed in any conclusion and how far a generalization may be extended.

5. It helps the teacher to predict the future performance of the pupils:

Statistics enables the teacher to predict how much of a thing will happen under conditions we know and have measured. For example, the teacher can predict the probable score of a student in the final examination from his entrance test score. The prediction may be erroneous due to different factors, so statistical methods also indicate how much margin of error to allow in making predictions.

6. Statistics enables the teacher to analyse some of the causal factors underlying complex and otherwise bewildering events:

A behavioural outcome is commonly the result of numerous causal factors. The reasons why a particular student performs poorly in a particular subject are many and varied. With appropriate statistical methods we can hold extraneous variables constant and observe the cause of the pupil’s failure in that subject.
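The prediction idea described above (estimating a final examination score from an entrance test score, with an allowance for error) can be sketched with a simple least-squares line. The scores below are hypothetical, the function name `predict_final` is introduced here only for illustration, and the standard error of estimate is used as one rough indication of the margin of error, not as the only possible choice.

```python
import math

# Hypothetical entrance-test and final-exam scores for eight students.
entrance = [40, 45, 50, 55, 60, 65, 70, 75]
final = [48, 50, 57, 58, 66, 69, 72, 80]

n = len(entrance)
mean_x = sum(entrance) / n
mean_y = sum(final) / n

# Least-squares slope and intercept of the line: final = intercept + slope * entrance
sxx = sum((x - mean_x) ** 2 for x in entrance)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(entrance, final))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of estimate: typical size of the prediction errors (residuals).
residuals = [y - (intercept + slope * x) for x, y in zip(entrance, final)]
std_error = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))


def predict_final(entrance_score):
    """Predicted final-exam score for a given entrance-test score."""
    return intercept + slope * entrance_score


print("predicted final score for entrance score 60:", round(predict_final(60), 2))
print("standard error of estimate:", round(std_error, 2))
```

A prediction would then be reported as the predicted score plus or minus a multiple of the standard error, which is how the method expresses how far the forecast can be trusted.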

### Important Concepts in Statistics:

Data:

Data may be defined as information obtained from a survey, an experiment or an investigation.

Score:

Score is the numerical evaluation of the performance of an individual on a test.

Continuous Series:

A continuous series is a series of observations in which the possible values of the variable may differ by infinitesimal amounts. In such a series the variable can take any intermediate value within the range of the series.

Discrete Series:

A discrete series is a series in which the values of a variable are arranged according to magnitude or some ordering principle. In such a series the variable cannot take intermediate values within the range. Examples include merit, number of persons and census data.

Variable:

A variable is any trait or quality which can vary, that is, which has at least two points of measurement. It is the trait that changes from one case or condition to another.

Variability:

The spread of scores, usually indicated by quartile deviations, standard deviations, range etc.

Frequency:

Frequency may be defined as the number of occurrences of any given value or set of values. For example, if 8 students have scored 65, the score 65 has a frequency of 8.

Frequency Distribution:

It is a tabulation showing the frequencies of the values of a variable when these values are arranged in order of magnitude.
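As a small illustration of frequency and a frequency distribution, the sketch below counts how often each score occurs and lists the values in order of magnitude. The list of scores is hypothetical and was built so that the score 65 occurs 8 times, matching the example in the text.

```python
from collections import Counter

# Hypothetical test scores for 15 students; 65 appears 8 times.
scores = [65, 70, 65, 80, 75, 65, 70, 65, 90, 65, 80, 65, 65, 70, 65]

# Frequency: number of occurrences of each value.
frequency = Counter(scores)

# Frequency distribution: values in order of magnitude with their frequencies.
distribution = sorted(frequency.items())

print("Score  Frequency")
for value, count in distribution:
    print(f"{value:>5}  {count:>9}")
```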

Correlation:

Correlation means the interdependence between two or more random variables. It may be stated as the tendency for corresponding observations in two or more series to vary together from the averages of their respective series, that is, to have similar relative positions.

If corresponding observations tend to have similar relative positions in their respective series, the correlation is positive; if the corresponding values tend to be divergent in position in their respective series, the correlation is negative; the absence of any systematic tendency for the corresponding observations to be either similar or dissimilar in their relative positions indicates zero correlation.

Coefficient:

It is a statistical constant that is independent of the unit of measurement.

Coefficient of correlation:

It is a pure number, limited by the values +1.00 and −1.00, that expresses the degree of relationship between two continuous variables.
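The properties stated above (a pure number between −1.00 and +1.00 that does not depend on the unit of measurement) can be seen in a short sketch of the Pearson product-moment coefficient. The function name `pearson_r` and the study-hours, marks and absences figures are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for five students.
hours = [1, 2, 3, 4, 5]          # weekly study hours
marks = [52, 55, 61, 64, 68]     # test marks, rising with study hours
absences = [9, 7, 6, 4, 1]       # days absent, falling with study hours

print("hours vs marks:", round(pearson_r(hours, marks), 3))        # positive
print("hours vs absences:", round(pearson_r(hours, absences), 3))  # negative

# Changing the unit of measurement (marks out of 1000 instead of 100)
# leaves the coefficient unchanged.
marks_rescaled = [m * 10 for m in marks]
print("after rescaling:", round(pearson_r(hours, marks_rescaled), 3))
```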

Q4. How is a bar chart different from a pie chart? Discuss the pros and cons of both types of data presentation. (20)

There are several different types of charts and graphs. The four most common are probably line graphs, bar graphs and histograms, pie charts, and Cartesian graphs. They are generally used for, and best for, quite different things.

You would use:

Bar graphs to show numbers that are independent of each other. Example data might include things like the number of people who preferred each of Chinese takeaways, Indian takeaways and fish and chips.

Pie charts to show you how a whole is divided into different parts. You might, for example, want to show how a budget had been spent on different items in a particular year.

Line graphs show you how numbers have changed over time. They are used when you have data that are connected, and to show trends, for example, average night time temperature in each month of the year. Cartesian graphs have numbers on both axes, which therefore allow you to show how changes in one thing affect another. These are widely used in mathematics, and particularly in Algebra.

Axes

Graphs have two axes, the lines that run across the bottom and up the side. The line along the bottom is called the horizontal or x-axis, and the line up the side is called the vertical or y-axis. The x-axis may contain categories or numbers. You read it from the bottom left of the graph. The y-axis usually contains numbers, again starting from the bottom left of the graph. The numbers on the y-axis generally, but not always, start at 0 in the bottom left of the graph, and move upwards. Usually the axes of a graph are labelled to indicate the type of data they show. Beware of graphs where the y-axis doesn’t start at 0, as they may be trying to fool you about the data shown.

Bar Graphs and Histograms

Bar graphs generally have categories on the x-axis, and numbers on the y-axis. This means that you can compare numbers between different categories. The categories need to be independent, that is changes in one of them do not affect the others.

Throughout most of human history, data visualization was limited because data was limited. Then, thanks to various sciences, scads of information about demographics, economics, geography and weather patterns emerged, and people needed a way to analyze all this information more easily. By the early 19th century, several of the charts most used today were already in circulation: William Playfair introduced line and bar graphs in his breakthrough 1786 publication, “Commercial and Political Atlas,” and the pie chart in 1801.

Playfair decided to use his draftsman’s skills to illustrate economic data. At the time, such information was commonly represented in tables, but Playfair transformed the data into infographics. In one famous line graph, he charted the price of wheat against the cost of labor, countering the popular opinion that wages were driving up grain costs and demonstrating that wages were, in fact, rising much more slowly than the product’s cost. From their humble beginnings, charts and graphs have helped audiences make educated decisions based on data, as well as identify previously unknown trends. Over the years, statisticians from all walks of life developed and designed additional tools for visually plotting data until modern technology allowed an explosion of new data visualizations that illustrate quantitative values in ways never before imagined.

### Column Bar Graphs

The simplest and most straightforward way to compare various categories is often the classic column-based bar graph. The universally recognized graph features a series of bars of varying lengths.

One axis of a bar graph features the categories being compared, while the other axis represents the value of each. The length of each bar is proportionate to the value it represents. For example, \$4 could be represented by a rectangular bar four units long, while \$5 would equate to a five-unit-long bar. With one quick glance, audiences learn exactly how the various items size up against one another.

Bar graphs work well for visually presenting nearly any type of data, but they hold particular power in the marketing industry. The charts are commonly used to present financial forecasts and outcomes, and the graphs are ideal for comparing any sort of numeric value, including group sizes, inventories, ratings and survey responses.
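The rule that each bar's length is proportionate to its value can be sketched in a few lines. The example below uses hypothetical counts for the takeaway preferences mentioned earlier and draws a plain-text bar chart; the function name `bar_lengths` and the width of 36 characters are arbitrary choices made for this illustration.

```python
def bar_lengths(values, max_width=40):
    """Scale each value so the largest one gets a bar of max_width characters."""
    largest = max(values.values())
    return {label: round(value * max_width / largest)
            for label, value in values.items()}


# Hypothetical survey counts: number of people preferring each takeaway.
preferences = {"Chinese": 12, "Indian": 18, "Fish and chips": 9}

lengths = bar_lengths(preferences, max_width=36)

# One row per category: label, bar, and the value the bar stands for.
for label, value in preferences.items():
    print(f"{label:<15}{'#' * lengths[label]} {value}")
```

Because every bar is scaled from the same baseline of zero, the categories can be compared directly by length, which is the main strength of a bar chart.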

### Line Graphs

Line charts, or line graphs, are powerful visual tools that illustrate trends in data over a period of time or a particular correlation. For example, one axis of the graph might represent a variable value, while the other axis often displays a timeline.

Each value is plotted on the chart, then the points are connected to display a trend over the compared time span. Multiple trends can be compared by plotting lines of various colors or patterns.

For example, the popularity of various social-media networks over the course of a year can be visually compared with ease through the use of a line graph. Simply plot each company’s user base for each month of the 12-month span, then connect the dots with a line of a designated color.

Audiences will quickly recognize which social networks are the most and least successful, as well as which are experiencing growth or loss.

### Pie Charts

Pie charts are the simplest and most efficient visual tool for comparing parts of a whole. For example, a pie chart can quickly and effectively compare various budget allocations, population segments or market-research question responses.

Marketing content designers frequently rely on pie charts to compare the size of market segments. For example, a simple pie graph can clearly illustrate how the most popular mobile-phone manufacturers compare based on the sizes of their user bases.

Audiences can quickly understand that Apple and Samsung hold almost 75 percent of the mobile-communication market, with Apple slightly ahead. That message can be sent without printing a single numerical digit.
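How a part of a whole becomes a slice can be made concrete with a short Python sketch. It assumes the usual convention that a slice's angle is its share of the total multiplied by 360 degrees. The function name `slice_angles` and the budget figures are hypothetical and are not taken from the text.

```python
def slice_angles(amounts):
    """Map each category to (share of the whole, slice angle in degrees)."""
    total = sum(amounts.values())
    return {
        name: (amount / total, 360 * amount / total)
        for name, amount in amounts.items()
    }


# Hypothetical budget allocations, used only for illustration.
budget = {"Salaries": 50, "Rent": 30, "Supplies": 20}
for name, (share, angle) in slice_angles(budget).items():
    print(f"{name}: {share:.0%} of the whole, {angle:.0f} degrees")
```

Because every share is divided by the same total, the angles always add up to a full circle, which is what lets a pie chart show parts of a whole.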

### Mosaic or Mekko Charts

Basic line, bar and pie charts are excellent tools for comparing one or two variables in a few categories, but what happens when you need to compare multiple variables or multiple categories at the same time? What if those variables aren’t even numeric? A mosaic, or Mekko, chart might be the better choice.

Perhaps a market analyst, for example, wants to compare more than the size of various mobile-phone markets. What if, instead, he or she needs to compare the size of the user bases, as well as the age groups within each group? A mosaic chart would allow said marketer to illustrate all the variables in a clear and straightforward manner.

In the above example, one axis of the chart represents the categories being compared — mobile phone manufacturers — while the other axis lists various age ranges. The size and color of each cross-section of the chart corresponds with the market segment it represents.

### Population Pyramids

Market segments are often divided based on age and gender, and a population pyramid is an ideal visual representation of the two groups. The graph classically takes on the shape of a pyramid when a population is healthy and growing: the largest groups are the youngest, and each gender dwindles somewhat equally as the population ages, leaving the smallest groups at the top of the graph.

A population pyramid that veers away from its classic shape might indicate an irregularity in a population during a particular period, such as a famine or an economic boom that led to an increase in deaths or births.

Of course, population pyramids aren’t always used to compare populations by age, and therefore don’t always take on the graph’s namesake shape. A marketer, for example, might use the design to compare a population by income, weight or IQ, in which the smallest groups will often be at both the top and bottom. Regardless, the graph clearly depicts population trends, while it compares the sizes of two related groups.

### Spider Charts

When a statistician needs to visually compare three or more quantitative variables, he or she might choose to use a radar chart, also known as a spider or star chart. The chart usually consists of a series of radii, each representing a different category, that splay out from a center point like spokes.

The length of each “spoke” is proportionate to the value being compared. For each category, the spokes are then connected with a line of a designated pattern or color, forming a star-like shape with points equal to the number of categories. The result is a graphic representation that can reveal trends and compare categories at the same time.

Data analysis is the process of developing answers to questions through the examination and interpretation of data. The basic steps in the analytic process consist of identifying issues, determining the availability of suitable data, deciding on which methods are appropriate for answering the questions of interest, applying the methods and evaluating, summarizing and communicating the results.

Analytical results underscore the usefulness of data sources by shedding light on relevant issues. Some Statistics Canada programs depend on analytical output as a major data product because, for confidentiality reasons, it is not possible to release the microdata to the public. Data analysis also plays a key role in data quality assessment by pointing to data quality problems in a given survey. Analysis can thus influence future improvements to the survey process.

Data analysis is essential for understanding results from surveys, administrative sources and pilot studies; for providing information on data gaps; for designing and redesigning surveys; for planning new statistical activities; and for formulating quality objectives.

## Principles

A statistical agency is concerned with the relevance and usefulness to users of the information contained in its data. Analysis is the principal tool for obtaining information from the data.

Data from a survey can be used for descriptive or analytic studies. Descriptive studies are directed at the estimation of summary measures of a target population, for example, the average profits of owner-operated businesses in 2005 or the proportion of 2007 high school graduates who went on to higher education in the next twelve months.  Analytical studies may be used to explain the behaviour of and relationships among characteristics; for example, a study of risk factors for obesity in children would be analytic.

To be effective, the analyst needs to understand the relevant issues, both current and those likely to emerge in the future, and how to present the results to the audience. The study of background information allows the analyst to choose suitable data sources and appropriate statistical methods. Any conclusions presented in an analysis, including those that can impact public policy, must be supported by the data being analyzed.


Q5. What is the normal distribution? Explain the role of the normal distribution in decision making for data analysis. Write a note on skewness and kurtosis and explain their causes. (20)

The normal distribution refers to a family of continuous probability distributions described by the normal equation.

## The Normal Equation

The normal distribution is defined by the following equation:

Normal equation. The height Y of the normal curve at a value x is:

$$Y = \frac{1}{\sigma \sqrt{2\pi}} \; e^{-(x - \mu)^2 / (2\sigma^2)}$$

where x is a value of the normal random variable X, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.

The random variable X in the normal equation is called the normal random variable. The normal equation is the probability density function for the normal distribution.
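The normal equation can be evaluated directly. The following is a minimal Python sketch, not part of the original material: the function name `normal_pdf` and the example parameters (mean 0, standard deviation 1) are assumptions chosen for illustration, and the result is cross-checked against `statistics.NormalDist` from the standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height Y of the normal curve at x, computed from the normal equation."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Illustrative parameters: mean 0 and standard deviation 1.
y_at_mean = normal_pdf(0, 0, 1)
print(round(y_at_mean, 4))  # height of the curve at the mean

# Cross-check against the standard library's implementation.
print(math.isclose(y_at_mean, NormalDist(0, 1).pdf(0)))
```

Note that the value returned is a density (the height of the curve), not a probability; probabilities come from areas under the curve.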

## The Normal Curve

The graph of the normal distribution depends on two factors – the mean and the standard deviation. The mean of the distribution determines the location of the center of the graph, and the standard deviation determines the height and width of the graph. When the standard deviation is large, the curve is short and wide; when the standard deviation is small, the curve is tall and narrow. All normal distributions look like a symmetric, bell-shaped curve, as shown below. The curve on the left is shorter and wider than the curve on the right, because the curve on the left has a bigger standard deviation.
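The link between the standard deviation and the height of the curve can be checked numerically. The sketch below is an illustration under assumed values (a mean of 0 and standard deviations of 1, 2 and 4); the helper name `peak_height` is hypothetical. It relies on the fact that the curve is tallest at the mean.

```python
from statistics import NormalDist


def peak_height(sigma, mu=0.0):
    """Height of the normal curve at its centre, x = mu."""
    return NormalDist(mu, sigma).pdf(mu)


# Assumed standard deviations, chosen only for illustration.
for sigma in (1, 2, 4):
    print(f"sigma = {sigma}: peak height = {peak_height(sigma):.4f}")
```

Each doubling of the standard deviation halves the peak, so the curve must spread out to keep the total area equal to 1.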

## Probability and the Normal Curve

The normal distribution is a continuous probability distribution. This has several implications for probability.

• The total area under the normal curve is equal to 1.
• The probability that a normal random variable X equals any particular value is 0.
• The probability that X is greater than a equals the area under the normal curve bounded by a and plus infinity (as indicated by the non-shaded area in the figure below).
• The probability that X is less than a equals the area under the normal curve bounded by a and minus infinity (as indicated by the shaded area in the figure below).

Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the following “rule”.

• About 68% of the area under the curve falls within 1 standard deviation of the mean.
• About 95% of the area under the curve falls within 2 standard deviations of the mean.
• About 99.7% of the area under the curve falls within 3 standard deviations of the mean.

Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a normal distribution, most outcomes will be within 3 standard deviations of the mean.
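The three percentages can be verified from the cumulative distribution function. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is an assumption, and the default mean and standard deviation are arbitrary because the rule holds for any normal curve.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

Calling `area_within(1, mu=100, sigma=10)` gives the same area as the default call, which is the sense in which every normal curve obeys the rule.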

To find the probability associated with a normal random variable, use a graphing calculator, an online normal distribution calculator, or a normal distribution table. The examples below use Stat Trek’s Normal Distribution Calculator, a free online tool.

Example 1

An average light bulb manufactured by the Acme Corporation lasts 300 days with a standard deviation of 50 days. Assuming that bulb life is normally distributed, what is the probability that an Acme light bulb will last at most 365 days?

Solution: Given a mean of 300 days and a standard deviation of 50 days, we want to find the cumulative probability that bulb life is less than or equal to 365 days. Thus, we know the following:

• The value of the normal random variable is 365 days.
• The mean is equal to 300 days.
• The standard deviation is equal to 50 days.

We enter these values into the Normal Distribution Calculator and compute the cumulative probability. The answer is: P( X ≤ 365 ) = 0.90. Hence, there is a 90% chance that a light bulb will burn out within 365 days.
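The same cumulative probability can be reproduced without the online calculator. The sketch below assumes Python's `statistics.NormalDist` as a stand-in for it; the variable names are illustrative.

```python
from statistics import NormalDist

# Bulb life: mean 300 days, standard deviation 50 days (values from Example 1).
bulb_life = NormalDist(mu=300, sigma=50)

# Cumulative probability that a bulb lasts at most 365 days.
p_at_most_365 = bulb_life.cdf(365)
print(f"{p_at_most_365:.2f}")
```

Here 365 days lies 1.3 standard deviations above the mean, since (365 – 300) / 50 = 1.3, which is why the cumulative probability comes out at about 0.90.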

Example 2

Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard deviation of 10, what is the probability that a person who takes the test will score between 90 and 110?

Solution: Here, we want to know the probability that the test score falls between 90 and 110. The “trick” to solving this problem is to realize the following:

P( 90 < X < 110 ) = P( X < 110 ) – P( X < 90 )

We use the Normal Distribution Calculator to compute both probabilities on the right side of the above equation.

• To compute P( X < 110 ), we enter the following inputs into the calculator: The value of the normal random variable is 110, the mean is 100, and the standard deviation is 10. We find that P( X < 110 ) is 0.84.
• To compute P( X < 90 ), we enter the following inputs into the calculator: The value of the normal random variable is 90, the mean is 100, and the standard deviation is 10. We find that P( X < 90 ) is 0.16.

We use these findings to compute our final answer as follows:

P( 90 < X < 110 ) = P( X < 110 ) – P( X < 90 )
P( 90 < X < 110 ) = 0.84 – 0.16
P( 90 < X < 110 ) = 0.68

Thus, about 68% of the test scores will fall between 90 and 110.
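The subtraction used in Example 2 can be checked the same way. This is a minimal sketch that again assumes `statistics.NormalDist` in place of the online calculator; the variable names are illustrative.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 10 (values from Example 2).
iq = NormalDist(mu=100, sigma=10)

p_below_110 = iq.cdf(110)  # P( X < 110 )
p_below_90 = iq.cdf(90)    # P( X < 90 )
p_between = p_below_110 - p_below_90  # P( 90 < X < 110 )

print(f"{p_below_110:.2f} - {p_below_90:.2f} = {p_between:.2f}")
```

Because 90 and 110 lie exactly one standard deviation either side of the mean, the result agrees with the first line of the 68-95-99.7 rule.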
