STATISTICS TERMS AND EXAMPLES - Daily Math Guide

## STATISTICS

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting quantitative data.

- The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions (Mason, Lind, and Marchal).

USES OF STATISTICS:

Statistics is used in all fields of endeavor, including fisheries, agriculture, commerce, trade and industry, education, biology, economics, psychology, sociology, and chemistry.

FUNCTIONS OF STATISTICS:

• Provides researchers with the means to scientifically measure the conditions that may be involved in a given problem and to evaluate how these conditions are related.
• Shows the laws underlying facts and events that cannot be determined by individual observation.
• Reveals trends and behavior in related conditions that might otherwise remain unclear.

### TYPES OF STATISTICS

1. Descriptive

- Referring to, constituting, or grounded in matters of observation or experience.

- Concerned with the gathering, classification, and presentation of data and with the collection of summarizing values that describe group characteristics of the data. These values include percentages, measures of central tendency, measures of variability, and measures of skewness and kurtosis. (source: Masteral notes)

2. Inferential - relating to, involving, or resembling inference.

• Aims to give information about large groups of data without dealing with each and every element of these groups. Topics in this area include hypothesis testing with the t-test and z-test, correlation, analysis of variance, the chi-square test, regression analysis, and time series analysis. The basis of inferential statistics is the ability to make decisions about parameters without a complete census of the population. (source: Masteral notes)

### SOME DEFINITIONS OF TERMS USED IN STATISTICS:

• Quantitative variable – a variable whose values can be reported numerically.
• Sample – a portion or part of the population of interest.
• Population – the collection of all possible individuals, objects, or measurements of interest.
• Qualitative variable – a variable whose characteristic being studied is non-numeric.
• Discrete variable – a quantitative variable that can assume only certain values, such as the number of bedrooms in a house or the number of cars arriving at a tollbooth.
• Continuous variable – a quantitative variable that can assume any value within a specific range, such as the air pressure in a tire, the weight of a shipment of grain, or the duration of a flight from the USA to Manila.

### Collection of Data:

TWO TYPES OF DATA:

1. Primary data – refers to information gathered directly from an original source, or based on direct or first-hand experience.

Ex. First-person accounts, autobiographies, and diaries.

2. Secondary data – refers to information taken from published or unpublished data previously gathered by other individuals or agencies.

Ex. Published books, newspapers, magazines, business reports, etc.

Methods used in collecting data:

1. Direct or interview method – a person-to-person exchange between the interviewer and the interviewee.

- It provides consistent and more precise information.

- However, it is time-consuming, expensive, and has limited field coverage.

2. Indirect or questionnaire method – in this method, written responses are given to prepared questions.

- The questionnaire may be mailed or hand-carried.

- It is inexpensive and can cover a wide area in a shorter span of time.

3. Registration method – this type of information gathering is enforced by certain laws.

Ex. Registration of births, deaths, vehicles, and licenses.

4. Observation method – in this method, the investigator observes the behavior of persons or organizations and their outcomes. It is usually used when the subjects cannot talk or write. This method makes it possible to record behavior at the appropriate time.

5. Experiment method – this method is used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. Scientific researchers usually use this method.

### Frequency Distribution and Its Graphical Representation:

Frequency distribution – a grouping of data into mutually exclusive categories, showing the number of observations in each category.

– It is appropriate when the number of cases (N) is 30 or more.

Steps in constructing a frequency distribution table (FDT):

1. Find the range, which is the difference between the highest score (HS) and the lowest score (LS):

Range = HS – LS

2. Find the number of class intervals (CI). The recommended number of classes for a given number of observations is:

| Number of observations | Recommended number of classes |
|---|---|
| 9-16 | 4 |
| 17-32 | 5 |
| 33-64 | 6 |
| 65-128 | 7 |
| 129-256 | 8 |
| 257-512 | 9 |
| 513-1024 | 10 |

3. Determine the approximate size of the class interval by dividing the range by the desired number of class intervals.
4. Write the class intervals, starting with the lowest limit as determined by your choice as researcher.
5. Determine the class frequency of each class interval by referring to the tally column.
6. Compute the class mark by adding the lower and upper limits of the class interval and dividing the sum by 2. The class mark is the representative value of the corresponding interval.

Class boundaries (CB) – more precise expressions of the class limits, obtained by extending each limit by 0.5 when the data are recorded to the nearest whole number. A class boundary is situated midway between the upper limit of one interval and the lower limit of the next interval.

Given the Data:

120                   133                    180                 138
140                   150                    170                 153
161                  149                    124                 168
148                  139                    161                 142
130                  143                    137                 147
156                  151                    128                 118
165                  138                    147                 167
146                  150                    149                 129
142                  158                    152                 130
175                  148                    142                 159

Find the following from the given data:

• Range
• Class intervals (use 5)
• Less-than cumulative frequency (< cf)
• Greater-than cumulative frequency (> cf)
• Relative frequency
• Percentage frequency
• Class marks
• Class boundaries
• Sector sizes for the pie (circle) graph
• Draw the pie (circle) graph
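The FDT steps can be sketched in Python for the 40 scores listed above. This is only an illustrative sketch, not the article's own solution: the names `frequency_table` and `NUM_CLASSES` are made up for the example, the choice of 6 classes is an assumption taken from the 33-64 row of the recommended-classes table, and the class size is rounded up to a whole number so that the last interval reaches the highest score. The exercise's "(use 5)" may call for a different choice, which only requires changing `NUM_CLASSES`.

```python
# The 40 scores from the exercise, read row by row.
data = [
    120, 133, 180, 138, 140, 150, 170, 153, 161, 149,
    124, 168, 148, 139, 161, 142, 130, 143, 137, 147,
    156, 151, 128, 118, 165, 138, 147, 167, 146, 150,
    149, 129, 142, 158, 152, 130, 175, 148, 142, 159,
]

NUM_CLASSES = 6  # assumption: 40 observations fall in the 33-64 row of the table


def frequency_table(values, num_classes):
    """Return one dict per class interval, lowest class first (whole-number data)."""
    lowest, highest = min(values), max(values)
    # Whole-number class size, large enough for the last interval to reach `highest`.
    size = (highest - lowest) // num_classes + 1
    n = len(values)
    rows = []
    cumulative = 0
    for j in range(num_classes):
        lower = lowest + j * size
        upper = lower + size - 1
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "class_mark": (lower + upper) / 2,
            "f": freq,
            "less_than_cf": cumulative,               # scores up to this class
            "greater_than_cf": n - cumulative + freq,  # scores from this class upward
            "relative_f": freq / n,
            "percent_f": 100 * freq / n,
            "pie_degrees": 360 * freq / n,             # sector angle for a pie graph
        })
    return rows


score_range = max(data) - min(data)  # Range = HS - LS
table = frequency_table(data, NUM_CLASSES)

print("Range:", score_range)
for row in table:
    print(row)
```

The pie-graph column simply scales each relative frequency to 360 degrees, so the sector angles add up to a full circle.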
### Measures of Central Tendency (Grouped Data):

• Popularly known as averages.
• They are descriptive statistics because a single number describes the central value of a group of observations or individuals, and this central value represents all the figures in the group of which it is a part.

Arithmetic Mean:

- The most important and most widely used measure of central tendency.

Importance of the Mean, Median, and Mode:

• Each is a shorthand description of a group of quantitative data obtained from a sample.
• It is more economical, easier, and more meaningful to let one figure stand for a group than to remember all the particular numbers in the group.
• Each describes a sample obtained from a particular group of observations at a particular time in a particular way.
• Each also describes indirectly, but with some accuracy, the population from which the sample is drawn.

Characteristics of:

· Mean:
• The arithmetic mean is a frequently used measure of central tendency because it is subject to less error.
• It lends itself to algebraic manipulation.
• Its standard error is less than that of the median.
• The sum of the deviations of the cases about the mean is zero.

· Median:
• The sum of the absolute deviations about the median is less than or equal to the sum of the absolute deviations about any other value.

· Mode:
• It is entirely independent of extreme measures.
• Its position is not stable.
• Not all items in a series contribute to it.
• It may not always be well defined or possible to locate properly.
• A set of observations can be unimodal (one mode), bimodal (two modes), trimodal (three modes), or polymodal.
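Two of the characteristics listed above can be checked numerically: deviations about the mean sum to zero, and the sum of absolute deviations is smallest about the median. The short sketch below uses Python's standard `statistics` module on a small set of hypothetical scores (the data and the helper name `sum_abs_dev` are made up for illustration).

```python
import statistics

scores = [2, 4, 4, 5, 10]  # hypothetical scores, not taken from the article

mean = statistics.mean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)  # list of all most-frequent values

# Deviations about the mean cancel out.
sum_dev_mean = sum(x - mean for x in scores)


def sum_abs_dev(center):
    """Sum of absolute deviations of the scores about `center`."""
    return sum(abs(x - center) for x in scores)


print("mean:", mean, "median:", median, "mode(s):", modes)
print("sum of deviations about the mean:", sum_dev_mean)
print("sum of |deviations| about the median:", sum_abs_dev(median))
print("sum of |deviations| about the mean:", sum_abs_dev(mean))
```

Here the single large score pulls the mean above the median, while the mode stays at the most frequent value.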

Advantages of:

· Mean:
• It is the most reliable and most stable measure, with the least probable error.
• It is the most generally recognized measure of central tendency.
· Median:
• It is the best measure for irregular or skewed distributions.
• It may be located in an open-end distribution or when the data are incomplete.
· Mode:
• It is always a real value since it does not fall on zero.
• It is simple to approximate by observation, especially when the number of cases is small.
• It does not lend itself to algebraic manipulation.
• It does not require the arrangement of values.

Limitations of:

· Mean:
• It does not supply information about the homogeneity of the group.
• The more heterogeneous the set of observations or group of individuals is, the less satisfactory the mean is as a measure of central tendency.
· Median:
• It requires arranging the items according to size before it can be computed.
• It has a larger probable error than the mean.
• It does not lend itself to algebraic treatment.
• It is erratic when the data do not cluster at the center of the distribution.
· Mode:
• It is inapplicable to a small number of cases when the values may not be repeated.
• It is rigidly defined and is inapplicable to irregular distributions.
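### Point Measures (Grouped Data):

The different types of point measures are the quartile, the decile, and the percentile.

· Quartile – one of the points on a scale that divide the distribution into four equal parts.

Formula:

$Q_{k}=LB+\left [ \frac{\frac{kN}{4}-{<}cf}{f} \right ]i$

· Decile – one of the points on a scale that divide the distribution into ten equal parts.

Formula:

$D_{k}=LB+\left [ \frac{\frac{kN}{10}-{<}cf}{f} \right ]i$

· Percentile – one of the points on a scale that divide the distribution into one hundred equal parts.

Formula:

$P_{k}=LB+\left [ \frac{\frac{kN}{100}-{<}cf}{f} \right ]i$

In each formula, LB is the lower boundary of the class containing the point measure, N is the number of cases, <cf is the less-than cumulative frequency of the class just below that class, f is the frequency of the class containing the point measure, and i is the class size.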
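The three formulas differ only in the divisor (4, 10, or 100), so one function can cover all of them. The sketch below is a minimal illustration under stated assumptions: the grouped table is hypothetical, the function name `point_measure` is invented for the example, the cumulative frequency used is the less-than cumulative frequency of the class just below the one containing the point measure, and the class limits are whole numbers so that each lower boundary is the lower limit minus 0.5.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency) per class.
classes = [
    (118, 128, 4),
    (129, 139, 8),
    (140, 150, 14),
    (151, 161, 8),
    (162, 172, 4),
    (173, 183, 2),
]


def point_measure(k, parts, table):
    """k-th point measure when the distribution is divided into `parts` parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(f for _, _, f in table)
    target = k * n / parts          # kN/4, kN/10, or kN/100
    cf_before = 0                   # less-than cumulative frequency so far
    for lower, upper, f in table:
        if f > 0 and cf_before + f >= target:
            lb = lower - 0.5        # lower class boundary
            size = upper - lower + 1  # class size i
            return lb + (target - cf_before) / f * size
        cf_before += f
    raise ValueError("k must not exceed parts")


q1 = point_measure(1, 4, classes)
q3 = point_measure(3, 4, classes)
d5 = point_measure(5, 10, classes)
p90 = point_measure(90, 100, classes)
print("Q1:", q1, "Q3:", q3, "D5:", d5, "P90:", p90)
```

Because the second quartile, the fifth decile, and the 50th percentile all use the same target N/2, they return the same value, which is the median of the grouped data.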

### Measures of Variability (Grouped Data):

• They tell us the spread of the data.
• Measures of variability give information on how the data are scattered or spread, and they describe the mass of data. They give the total picture and characteristics of a set of data in terms of how it is dispersed.

Two Types of Measures of Variability:

1. Absolute Variability

a. Range – the simplest and easiest measure of variability. It is classified into absolute range, total range, and Kelly range, and it is most useful in representing the dispersion of small data sets.

- The absolute range is the difference between the highest and the lowest score (HS – LS).
- The total range is obtained by subtracting the lowest score from the highest score and adding one (HS – LS + 1).
- The Kelly range is obtained by subtracting the 10th percentile from the 90th percentile (P90 – P10).

b. Quartile deviation – half of the difference between the third and the first quartile, (Q3 – Q1) / 2.

c. Average deviation / mean deviation – (obtained by formulated steps).

d. Variance – (obtained by formulated steps). It is the square of the standard deviation.

e. Standard deviation – the most commonly used guide to the degree of dispersion or spread. It is also the most dependable measure for estimating the variability of the total population from which the sample came.

2. Relative Variability

a. Coefficient of variation
b. Coefficient of quartile deviation
c. Coefficient of mean deviation
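Several of the absolute measures listed above can be sketched for ungrouped data with Python's standard `statistics` module. The scores are hypothetical, and two choices here are assumptions rather than the article's method: the total range is taken as HS – LS + 1, and the percentiles and quartiles come from `statistics.quantiles` with its default "exclusive" method, which can differ slightly from values obtained with other percentile conventions.

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 10, 12, 19]  # hypothetical scores

hs, ls = max(scores), min(scores)
absolute_range = hs - ls
total_range = hs - ls + 1  # assumed convention for the total range

# n=10 returns the nine cut points P10, P20, ..., P90.
deciles = statistics.quantiles(scores, n=10)
kelly_range = deciles[-1] - deciles[0]  # P90 - P10

# n=4 returns Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2

mean = statistics.mean(scores)
mean_deviation = sum(abs(x - mean) for x in scores) / len(scores)

print("absolute range:", absolute_range)
print("total range:", total_range)
print("Kelly range:", kelly_range)
print("quartile deviation:", quartile_deviation)
print("mean deviation:", mean_deviation)
```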

The standard deviation is the most commonly used measure of the spread or dispersion of data around the mean. It is defined as the square root of the variance (V).

The variance is defined as the sum of the squared deviations from the mean, divided by n-1.

$s=\sqrt{\frac{\sum \left ( x_{i}-\bar{x} \right )^{2}}{n-1}}$

Although the standard deviation of analytical data may not vary much over limited ranges of  such data, it usually depends on the magnitude of such data: the larger the figures, the larger s. Therefore, for comparison of variations it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage and is then called coefficient of variation (CV).

Formula:

$RSD=\frac{s}{\bar{x}}$ and $CV=\frac{s}{\bar{x}}\times 100\%$

Note: When needed (e.g. for the F-test), the variance can be calculated by squaring the standard deviation:

$V=s^{2}$
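As a worked illustration of these formulas, the sketch below computes the variance, standard deviation, RSD, and CV step by step for a few hypothetical measurements, dividing by n-1 as in the definition above. The data and variable names are made up for the example; the standard library's `statistics.stdev` is printed alongside only as a cross-check.

```python
import math
import statistics

values = [4, 8, 6, 5, 3, 7, 9]  # hypothetical measurements

n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean, divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in values)
variance = sum_sq_dev / (n - 1)

s = math.sqrt(variance)  # standard deviation
rsd = s / mean           # relative standard deviation, as a fraction
cv = rsd * 100           # coefficient of variation, in percent

print("mean:", mean)
print("variance V:", variance)
print("standard deviation s:", s)
print("RSD:", rsd, "CV (%):", cv)
print("statistics.stdev cross-check:", statistics.stdev(values))
```

Because the CV divides by the mean, it lets data sets with very different magnitudes be compared on the same scale.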

Interquartile range (IQR)

To talk about the interquartile range, we first need to talk about percentiles.

The pth percentile of a data set is a measurement such that, after the data are ordered from smallest to largest, at most p% of the data are below this value and at most (100-p)% are above it.

Thus, the median is the 50th percentile. Also, Q1 = lower quartile = 25th percentile and Q3 = upper quartile = 75th percentile.

The interquartile range is the difference between the upper and lower quartiles and is denoted IQR.

IQR = Q3 - Q1 = upper quartile - lower quartile = 75th percentile - 25th percentile.

Note: The IQR is not affected by extreme values. It is thus a resistant measure of variability.
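The resistance of the IQR to extreme values can be seen with a small, hypothetical example: making the largest score ten times more extreme changes the range but leaves the IQR alone. The sketch uses `statistics.quantiles` from the Python standard library with its default "exclusive" method, which is one of several percentile conventions and is an assumption here; the helper name `iqr` is made up for the example.

```python
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1 using statistics.quantiles (default method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


scores = [2, 4, 4, 5, 7, 9, 10, 12, 100]          # hypothetical, one extreme value
more_extreme = [2, 4, 4, 5, 7, 9, 10, 12, 1000]   # same data, larger extreme value

for label, values in (("scores", scores), ("more extreme", more_extreme)):
    value_range = max(values) - min(values)
    print(label, "-> range:", value_range, "IQR:", iqr(values))
```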

### Measures of Correlation:

Correlation is a measure of the degree of relationship between two sets of variables, X and Y. It is also called linear correlation.

Measures of correlation are inferential statistics because they determine whether a significant relationship exists between the two variables.

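The Pearson Product-Moment Correlation Coefficient

Formula:

$r_{xy}=\frac{N\sum XY-\sum X\sum Y}{\sqrt{\left [ N\sum X^{2}-\left ( \sum X \right )^{2} \right ]\left [ N\sum Y^{2}-\left ( \sum Y \right )^{2} \right ]}}$

where:

$r_{xy}$ = Pearson Product-Moment Correlation Coefficient of X and Y
$\sum X$ = sum of test X
$\sum Y$ = sum of test Y
$\sum XY$ = sum of the products of X and Y
$\sum X^{2}$ = sum of squared X scores
$\sum Y^{2}$ = sum of squared Y scores
N = number of cases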

Interpretation of correlation values:

| Value of r | Interpretation |
|---|---|
| 0.00 to ±0.20 | negligible correlation |
| ±0.21 to ±0.40 | low or slight correlation |
| ±0.41 to ±0.70 | marked or moderate correlation |
| ±0.71 to ±0.90 | high correlation |
| ±0.91 to ±0.99 | very high correlation |
| ±1.00 | perfect correlation |

Compute the Pearson product-moment correlation coefficient (PP-MCC) for the weight-length relationship of milkfish cultured in fish cages using bread meal as supplemental feed. Interpret the results.

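| Weight (kg) | Length (m) |
|---|---|
| 0.43 | 0.52 |
| 0.54 | 0.62 |
| 0.41 | 0.51 |
| 0.63 | 0.68 |
| 0.55 | 0.63 |
| 0.42 | 0.50 |
| 0.58 | 0.62 |
| 0.57 | 0.61 |
| 0.48 | 0.54 |
| 0.62 | 0.68 |
| 0.60 | 0.65 |
| 0.59 | 0.62 |
| 0.65 | 0.70 |
| 0.59 | 0.63 |
| 0.50 | 0.55 |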
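One way to check a hand computation for this exercise is the sketch below. It assumes the usual raw-score form of Pearson's r, built from the sums named in the "where" list (sum of X, sum of Y, sum of XY, sums of squared X and Y, and N), and it classifies the result with the interpretation ranges given earlier, rounding r to two decimals first. The function names `pearson_r` and `interpret` are invented for the example.

```python
import math

# The 15 weight-length pairs from the exercise.
weight = [0.43, 0.54, 0.41, 0.63, 0.55, 0.42, 0.58, 0.57,
          0.48, 0.62, 0.60, 0.59, 0.65, 0.59, 0.50]
length = [0.52, 0.62, 0.51, 0.68, 0.63, 0.50, 0.62, 0.61,
          0.54, 0.68, 0.65, 0.62, 0.70, 0.63, 0.55]


def pearson_r(x, y):
    """Pearson r from raw-score sums (assumed raw-score form of the formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def interpret(r):
    """Label r using the article's ranges; the sign of r is ignored."""
    if math.isclose(abs(r), 1.0):
        return "perfect correlation"
    size = round(abs(r), 2)  # the ranges are stated to two decimals
    if size <= 0.20:
        return "negligible correlation"
    if size <= 0.40:
        return "low or slight correlation"
    if size <= 0.70:
        return "marked or moderate correlation"
    if size <= 0.90:
        return "high correlation"
    return "very high correlation"


r = pearson_r(weight, length)
print("r =", round(r, 4), "->", interpret(r))
```

The label describes only the strength of the linear relationship; whether the correlation is statistically significant is a separate test.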
