STATISTICS
 the science that studies the collection, organization, analysis, interpretation, and presentation of quantitative data.
 the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions. (Mason, Lind Marshall)
USES OF STATISTICS:
Statistics is used in all fields of endeavor, including fisheries, agriculture, commerce, trade and industry, education, biology, economics, psychology, sociology, and chemistry.
FUNCTIONS OF STATISTICS:
 Provides researchers with the means to scientifically measure the conditions that may be involved in a given problem and to evaluate the way in which these conditions are related.
 Shows the laws underlying facts and events that cannot be determined by individual observation.
 Observes trends and behavior in related conditions which otherwise may remain unclear.
TYPES OF STATISTICS
1. Descriptive
referring to, constituting, or grounded in matters of observation or experience.
 concerned with the gathering, classification, and representation of data and the collection of summarizing values to describe group characteristics of the data.
These summary values include percentages and measures of central tendency, of variability, and of skewness and kurtosis. (source: Masteral notes)
2. Inferential
 relating to, involving, or resembling inference.
 aims to give information about large groups of data without dealing with each and every element of these groups. Topics in this area include hypothesis testing using the t-test and z-test, correlation, analysis of variance, the chi-square test, regression analysis, and time series analysis. The basis of inferential statistics is the ability to make decisions about population parameters without having a complete census of the population.
(source: Masteral notes)
DEFINITIONS OF SOME TERMS USED IN STATISTICS:
 Quantitative Variable – when the variable studied can be reported numerically.
 Sample – A portion or part of the population of interest.
 Population – A collection of all possible individuals, objects or measurements of interest.
 Qualitative Variable – when the characteristic or variable being studied is non-numeric.
 Discrete variable – a quantitative variable that can assume only certain values, such as the number of bedrooms in a house or the number of cars arriving at a tollbooth.
 Continuous variable – a quantitative variable that can assume any value within a specific range, such as the air pressure in a tire, the weight of a shipment of grain, or the flight time from the USA to Manila.
Collection of Data:
TWO TYPES OF DATA:
1. Primary data – refers to information gathered directly from an original source, or based on direct or firsthand experience.
Ex. First-person accounts, autobiographies, and diaries.
2. Secondary data – refers to information taken from published or unpublished data previously gathered by other individuals or agencies.
Ex. Published books, newspapers, magazines, business reports, etc.
Methods used in collecting data:
1. Direct or interview method – a person-to-person exchange between the interviewer and the interviewee.
 Provides consistent and more precise information.
 However, it is time-consuming and expensive, and it has limited field coverage.
2. Indirect or questionnaire method – in this method, written responses are given to prepared questions.
 The questionnaire may be mailed or hand-carried.
 It is inexpensive and can cover a wide area in a shorter span of time.
3. Registration method – this type of information gathering is enforced by certain laws.
Ex. Registration of births, deaths, vehicles, and licenses.
4. Observation method – in this method, the investigator observes the behavior of persons or organizations and their outcomes. It is usually used when the subjects cannot talk or write. This method makes it possible to record behavior at the appropriate time.
5. Experiment method – this method is used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. Scientific researchers usually use this method.
Frequency Distributions and Their Graphical Representation:
Frequency Distribution – A grouping of data into categories showing the number of observations in each mutually exclusive category.
 A frequency distribution table (FDT) is appropriate if the number of cases (N) is 30 or more.
Steps in FDT construction:
1. Find the range (the difference between the highest score and the lowest score):
Range = HS – LS
2. Decide on the number of class intervals (CI).
Recommended number of classes for the number of observations:
No. of observations    No. of classes
9-16                   4
17-32                  5
33-64                  6
65-128                 7
129-256                8
257-512                9
513-1024               10
3. Determine the approximate size of the CI by dividing the range by the desired number of CI.
4. Write the CI, starting with the lowest score limit as determined by your choice (as researcher).
5. Determine the class frequencies for each class interval by referring to the tally column.
6. Compute the class mark by adding the lower and upper limits of the class interval and dividing the sum by 2. The class mark is the representative value of the corresponding interval.
Class Boundaries – more precise expressions of the class limits, extending each limit by 0.5 of the unit of measurement. A class boundary (CB) is situated midway between the upper limit of one interval and the lower limit of the next interval.
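The construction steps above can be sketched in Python. This is only a minimal illustration: the function name `frequency_table`, the sample scores, and the choice to round the class size up so that the last interval still reaches the highest score are assumptions made for the example, and integer scores are assumed throughout.

```python
import math

def frequency_table(scores, num_classes):
    """Group integer scores into num_classes intervals of equal size."""
    low, high = min(scores), max(scores)
    data_range = high - low                      # Range = HS - LS
    # Approximate class size: (range + 1) divided by the number of classes,
    # rounded up (assumption) so the last interval still reaches HS.
    size = math.ceil((data_range + 1) / num_classes)
    rows = []
    lower = low                                  # start at the lowest score
    for _ in range(num_classes):
        upper = lower + size - 1
        rows.append({
            "limits": (lower, upper),
            "f": sum(lower <= s <= upper for s in scores),  # class frequency
            "class_mark": (lower + upper) / 2,   # (lower + upper limit) / 2
            "boundaries": (lower - 0.5, upper + 0.5),
        })
        lower = upper + 1
    return data_range, size, rows

# Hypothetical scores: 12 observations, so 4 classes are used.
scores = [12, 15, 17, 18, 20, 21, 21, 23, 25, 26, 28, 30]
data_range, size, rows = frequency_table(scores, num_classes=4)
print("Range:", data_range, "Class size:", size)
for row in rows:
    print(row["limits"], row["f"], row["class_mark"], row["boundaries"])
```

A tally column is not needed here because the frequencies are counted directly from the scores.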
Given the data:
120  133  180  138
140  150  170  153
161  149  124  168
148  139  161  142
130  143  137  147
156  151  128  118
165  138  147  167
146  150  149  129
142  158  152  130
175  148  142  159
From the data given, find the following:
 Range
 Class intervals (use 5)
 Less-than cumulative frequency (<cf)
 Greater-than cumulative frequency (>cf)
 Relative frequency
 Percentage frequency
 Class mark
 Class boundaries
 Sector angles for the pie/circle graph
 Draw the pie/circle graph
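As a rough check on the exercise, the table columns can be generated in Python. This sketch reads "use 5" as a class size of 5 and starts the first interval at the lowest score; both readings are assumptions, as are the variable and column names.

```python
scores = [
    120, 133, 180, 138, 140, 150, 170, 153, 161, 149,
    124, 168, 148, 139, 161, 142, 130, 143, 137, 147,
    156, 151, 128, 118, 165, 138, 147, 167, 146, 150,
    149, 129, 142, 158, 152, 130, 175, 148, 142, 159,
]
CLASS_SIZE = 5  # assumption: "use 5" means a class size of 5

n = len(scores)
low, high = min(scores), max(scores)
print("Range:", high - low)

# Build the class intervals and count the frequency of each one.
rows = []
lower = low
while lower <= high:
    upper = lower + CLASS_SIZE - 1
    f = sum(lower <= s <= upper for s in scores)
    rows.append({"limits": (lower, upper), "f": f})
    lower = upper + 1

# Add the remaining columns of the frequency distribution table.
less_cf = 0
for row in rows:
    lower, upper = row["limits"]
    less_cf += row["f"]
    row["<cf"] = less_cf                      # cumulated from the lowest class
    row[">cf"] = n - less_cf + row["f"]       # cumulated from the highest class
    row["rf"] = row["f"] / n                  # relative frequency
    row["%f"] = 100 * row["f"] / n            # percentage frequency
    row["mark"] = (lower + upper) / 2         # class mark
    row["cb"] = (lower - 0.5, upper + 0.5)    # class boundaries
    row["angle"] = 360 * row["f"] / n         # pie-chart sector in degrees

for row in rows:
    print(row)
```

The sector angles add up to 360 degrees, which is a quick way to confirm that no observation was left out of the table.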
Measures of Central Tendency (Grouped Data):
 Popularly known as averages.
 They are descriptive statistics because a single number describes the central value of a group of observations or individuals, and this central value represents all the figures in the group of which it is a part.
Arithmetic Mean:
 The most important and widely used measure of central tendency.
Mean, Median, and Mode (their importance):
 Each is a shorthand description of a group of quantitative data obtained from a sample.
 It is more economical, easier, and more meaningful to let one figure stand for a group than to remember all the particular numbers in the group.
 It describes a sample obtained from a particular group of observations at a particular time in a particular way.
 It also describes indirectly, but with some accuracy, the population from which the sample is drawn.
Characteristics of:
· Mean:
 The arithmetic mean is a frequently used measure of central tendency because it is subject to less error.
 It lends itself to algebraic manipulation.
 Its standard error is less than that of the median.
 The sum of the deviations of the cases about the mean is zero.
· Median:
 The sum of the absolute deviations about the median is less than or equal to the sum of the absolute deviations about any other value.
· Mode:
 It is entirely independent of extreme measures.
 Its position is not stable.
 Not all items in a series contribute to it.
 It may not always be well defined or possible to locate properly.
 The set of observations can be unimodal (one mode), bimodal (two modes), trimodal (three modes), or polymodal.
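Two of the properties listed above, the zero sum of deviations about the mean and the minimum sum of absolute deviations about the median, can be checked numerically. The sample values and the helper name `sum_abs_dev` below are hypothetical, and the search over candidate centers is only a spot check, not a proof.

```python
from statistics import mean, median

def sum_abs_dev(values, center):
    """Sum of absolute deviations of the values about a chosen center."""
    return sum(abs(v - center) for v in values)

sample = [2, 4, 5, 9, 15]          # hypothetical observations
m, md = mean(sample), median(sample)

# Property of the mean: signed deviations cancel out.
print("Sum of deviations about the mean:", sum(v - m for v in sample))

# Property of the median: no other center gives a smaller absolute total.
candidates = [c / 2 for c in range(0, 41)]     # 0.0, 0.5, ..., 20.0
best = min(candidates, key=lambda c: sum_abs_dev(sample, c))
print("Center with the smallest absolute deviation sum:", best)
print("Median:", md)
```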
· Mean:
 Most reliable, most stable, and with the least probable error.
 Most generally recognized measure of central tendency.
· Median:
 The best measure for irregular or skewed distributions.
 It may be located in an open-ended distribution or when the data are incomplete.
· Mode:
 Always a real value since it does not fall on zero.
 Simple to approximate by observation, especially when the number of cases is small.
 It does not lend itself to algebraic manipulation.
 Does not require the arrangement of values.
Disadvantages of each:
· Mean:
 Does not supply information about the homogeneity of the group.
 The more heterogeneous the set of observations or group of individuals is, the less satisfactory the mean is as a measure of central tendency.
· Median:
 Requires arranging the items according to size before it can be computed.
 Has a larger probable error than the mean.
 It does not lend itself to algebraic treatment.
 Erratic when the data do not cluster at the center of the distribution.
· Mode:
 Inapplicable to a small number of cases when the values may not be repeated.
 It is not rigidly defined and is inapplicable to irregular distributions.
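The notes do not state the grouped-data formulas for the three averages, so the sketch below uses common textbook forms as assumptions: the mean as the sum of frequency times class mark divided by N, the median as LB + [(N/2 - <cf) / f] i, and a crude mode taken as the class mark of the most frequent class. The frequency table and function names are hypothetical.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 14, 3), (15, 19, 7), (20, 24, 10), (25, 29, 6), (30, 34, 4)]

def grouped_mean(classes):
    """Mean = sum of (frequency x class mark) divided by N."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    """Median = LB + [(N/2 - <cf) / f] * i for the median class."""
    n = sum(f for _, _, f in classes)
    size = classes[0][1] - classes[0][0] + 1      # class size i
    below = 0                                     # <cf of the classes below
    for lo, hi, f in classes:
        if below + f >= n / 2:
            return (lo - 0.5) + ((n / 2 - below) / f) * size
        below += f

def crude_mode(classes):
    """Crude mode = class mark of the class with the highest frequency."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

print(grouped_mean(classes), grouped_median(classes), crude_mode(classes))
```

In this roughly symmetric table the three values land close together; a skewed table would pull the mean away from the median and the mode.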
Point Measures (Grouped Data):
Different types of point measures: quartile, decile, and percentile.
· Quartile – a point on a scale that divides the distribution into four equal parts.
Formula:
Q_{k} = LB + [(kN/4 - <cf) / f] i
· Decile – a point on a scale that divides the distribution into ten equal parts.
Formula:
D_{k} = LB + [(kN/10 - <cf) / f] i
· Percentile – a point on a scale that divides the distribution into one hundred equal parts.
Formula:
P_{k} = LB + [(kN/100 - <cf) / f] i
where LB is the lower boundary of the class containing the point measure, N is the number of cases, <cf is the cumulative frequency of the classes below that class, f is the frequency of that class, and i is the class size.
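Because the three formulas differ only in the divisor (4, 10, or 100), one function can cover all of them. The sketch below is an illustration under assumptions: the name `point_measure` and the frequency table are hypothetical, and the cumulative frequency is taken from the classes below the one that contains the point measure.

```python
def point_measure(classes, k, parts):
    """Return Q_k (parts=4), D_k (parts=10) or P_k (parts=100).

    classes: list of (lower limit, upper limit, frequency) in ascending order.
    """
    n = sum(f for _, _, f in classes)             # N, number of cases
    size = classes[0][1] - classes[0][0] + 1      # i, class size
    target = k * n / parts                        # kN/4, kN/10 or kN/100
    below = 0                                     # cf of the classes below
    for lo, hi, f in classes:
        if f > 0 and below + f >= target:
            lb = lo - 0.5                         # LB, lower class boundary
            return lb + ((target - below) / f) * size
        below += f
    raise ValueError("k is out of range for this number of parts")

# Hypothetical grouped data with N = 30.
classes = [(10, 14, 3), (15, 19, 7), (20, 24, 10), (25, 29, 6), (30, 34, 4)]
print("Q1 :", point_measure(classes, 1, 4))
print("D5 :", point_measure(classes, 5, 10))
print("P90:", point_measure(classes, 90, 100))
```

Q_{2}, D_{5}, and P_{50} all return the same value, since each marks the middle of the distribution.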
Measures of Variability (Grouped Data):
 They tell us the spread of the data.
 Measures of variability give information on how the data are scattered or spread and describe the mass of data. They give the total picture of how the set of data is dispersed.
Two Types of Measures of Variability
1. Absolute Variability:
a. Range – the simplest and easiest measure of variability, classified into absolute range, total range, and Kelly range. The absolute range is the difference between the highest and lowest scores (HS – LS). The total range is obtained by subtracting the lowest score from the highest score and adding 1 (HS – LS + 1). The Kelly range is obtained by subtracting the 10^{th} percentile from the 90^{th} percentile (P_{90} – P_{10}).
 The range is most useful in representing the dispersion of small data sets.
b. Quartile deviation – half of the difference between the third and the first quartiles, (Q_{3} – Q_{1}) / 2.
c. Average deviation / mean deviation – (obtained by formulated steps).
d. Variance – (obtained by formulated steps). The square of the standard deviation.
e. Standard deviation – the most commonly used guide to the degree of dispersion or spread. It is also the most dependable measure for estimating the variability of the total population from which the sample came.
2. Relative Variability:
a. Coefficient of variation
b. Coefficient of quartile deviation
c. Coefficient of mean deviation
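The measures listed above can be tried on the 40 scores from the frequency distribution exercise. The sketch works on the raw (ungrouped) scores and rests on several assumptions: the total range is taken as HS - LS + 1, percentiles and quartiles come from the default method of Python's `statistics.quantiles`, and the two coefficients use common textbook definitions that the notes do not spell out.

```python
from statistics import mean, quantiles

scores = [
    120, 133, 180, 138, 140, 150, 170, 153, 161, 149,
    124, 168, 148, 139, 161, 142, 130, 143, 137, 147,
    156, 151, 128, 118, 165, 138, 147, 167, 146, 150,
    149, 129, 142, 158, 152, 130, 175, 148, 142, 159,
]

absolute_range = max(scores) - min(scores)          # HS - LS
total_range = absolute_range + 1                    # assumption: HS - LS + 1

deciles = quantiles(scores, n=10)                   # 9 cut points: P10 ... P90
kelly_range = deciles[-1] - deciles[0]              # P90 - P10

q1, _, q3 = quantiles(scores, n=4)                  # Q1, Q2, Q3
quartile_deviation = (q3 - q1) / 2

center = mean(scores)
mean_deviation = sum(abs(s - center) for s in scores) / len(scores)

# Relative variability (assumed definitions): spread divided by a typical value.
coef_quartile_deviation = (q3 - q1) / (q3 + q1)
coef_mean_deviation = mean_deviation / center

print(absolute_range, total_range, kelly_range)
print(quartile_deviation, mean_deviation)
print(coef_quartile_deviation, coef_mean_deviation)
```

Other percentile conventions give slightly different quartiles and deciles, so hand-computed answers may differ from these in the last digits.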
The standard deviation (s) is the most commonly used measure of the spread or dispersion of data around the mean. It is defined as the square root of the variance (V).
The variance is defined as the sum of the squared deviations from the mean, divided by n - 1:
V = \frac{\sum (x_{i} - \bar{x})^{2}}{n - 1}
s = \sqrt{V}
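The definition above translates directly into a few lines of Python. This sketch applies it to the 40 scores from the frequency distribution exercise and compares the result with the standard library's own sample variance and standard deviation; the variable names are illustrative.

```python
import math
import statistics

scores = [
    120, 133, 180, 138, 140, 150, 170, 153, 161, 149,
    124, 168, 148, 139, 161, 142, 130, 143, 137, 147,
    156, 151, 128, 118, 165, 138, 147, 167, 146, 150,
    149, 129, 142, 158, 152, 130, 175, 148, 142, 159,
]

n = len(scores)
x_bar = sum(scores) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in scores)   # squared deviations
variance = sum_sq_dev / (n - 1)                      # V, divisor n - 1
std_dev = math.sqrt(variance)                        # s = square root of V

print(f"mean = {x_bar:.3f}, V = {variance:.3f}, s = {std_dev:.3f}")
print("library check:", statistics.variance(scores), statistics.stdev(scores))
```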
Relative standard deviation (RSD) and coefficient of variation (CV)
Although the standard deviation of analytical data may not vary much over limited ranges of such data, it usually depends on the magnitude of the data: the larger the figures, the larger s. Therefore, for comparing variation it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage, in which case it is called the coefficient of variation (CV).
Formula:
RSD = s / \bar{x}
CV = (s / \bar{x}) \times 100\%
Note: When needed (e.g. for the F-test), the variance can, of course, be calculated by squaring the standard deviation:
V = s^{2}
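A small comparison shows why the relative measure is useful. In the sketch below the two samples are hypothetical, the second being the first multiplied by 10; the function name `coefficient_of_variation` is an assumption, and the sample standard deviation (divisor n - 1) is used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = (s / mean) x 100, with s the sample standard deviation."""
    return stdev(values) / mean(values) * 100

small = [10, 12, 14]        # hypothetical low-magnitude readings
large = [100, 120, 140]     # the same readings multiplied by 10

print("s :", stdev(small), stdev(large))
print("CV:", coefficient_of_variation(small), coefficient_of_variation(large))
```

The standard deviation grows tenfold with the figures, while the CV stays the same, so the CV is the fairer basis for comparing the two sets.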
Interquartile range (IQR)
To discuss the interquartile range, we first need percentiles. The pth percentile of a data set is a value such that, after the data are ordered from smallest to largest, at most p% of the data are below it and at most (100 - p)% are above it. Thus, the median is the 50th percentile.
Also, Q_{1} = lower quartile = 25th percentile and Q_{3} = upper quartile = 75th percentile.
The interquartile range, denoted IQR, is the difference between the upper and lower quartiles:
IQR = Q_{3} - Q_{1} = upper quartile - lower quartile = 75th percentile - 25th percentile.
Note: The IQR is not affected by extreme values. It is thus a resistant measure of variability.
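The resistance of the IQR can be seen by making one value extreme and recomputing. This is a sketch with hypothetical data; the quartiles come from the default method of Python's `statistics.quantiles`, which is one of several percentile conventions.

```python
from statistics import quantiles

def iqr(values):
    """IQR = Q3 - Q1, using the default method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

sample = [12, 15, 17, 20, 22, 25, 28]         # hypothetical data
with_outlier = [12, 15, 17, 20, 22, 25, 280]  # largest value made extreme

for data in (sample, with_outlier):
    print("range =", max(data) - min(data), " IQR =", iqr(data))
```

The range jumps when the largest value is replaced, but the IQR does not move because neither quartile depends on the extremes.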

Measures of Correlation:
Correlation is a measure of the degree of relationship between two sets of variables, X and Y. It is also called linear correlation.
Measures of correlation are inferential statistics because they determine whether a significant relationship exists between the two variables.
The Pearson Product-Moment Correlation Coefficient
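Formula:
r_{xy} = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{[N \sum X^{2} - (\sum X)^{2}] [N \sum Y^{2} - (\sum Y)^{2}]}}
where:
r_{xy} = Pearson Product-Moment Correlation Coefficient of X and Y
N = number of cases
\sum X = sum of test X
\sum Y = sum of test Y
\sum XY = sum of the products of X and Y
\sum X^{2} = sum of the squared X scores
\sum Y^{2} = sum of the squared Y scores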
Interpretation of correlation values:
Classification of r:
0.00 to ± 0.20; denotes negligible correlation
± 0.21 to ± 0.40; denotes low or slight correlation
± 0.41 to ± 0.70; denotes marked or moderate correlation
± 0.71 to ± 0.90; denotes high correlation
± 0.91 to ± 0.99; denotes very high correlation
± 1.00; denotes perfect correlation
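The classification above can be written as a small lookup. The function name `interpret_r` is hypothetical, and because the printed bands leave small gaps (for example between 0.20 and 0.21), the sketch assumes that a value falling in a gap belongs to the higher band.

```python
def interpret_r(r):
    """Return the verbal label for r from the classification above."""
    size = abs(r)                      # the sign only gives the direction
    if size > 1:
        raise ValueError("r must lie between -1 and +1")
    if size == 1:
        return "perfect correlation"
    # Upper limit of each band; values that fall between two printed
    # bands (e.g. 0.205) go to the higher band (assumption).
    bands = [
        (0.20, "negligible correlation"),
        (0.40, "low or slight correlation"),
        (0.70, "marked or moderate correlation"),
        (0.90, "high correlation"),
    ]
    for upper, label in bands:
        if size <= upper:
            return label
    return "very high correlation"

for value in (0.15, -0.55, 0.95, -1.0):
    print(value, "->", interpret_r(value))
```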
Compute the Pearson product-moment correlation coefficient (PPMCC) for the weight-length relationship of milkfish cultured in fish cages using bread meal as supplemental feed. Interpret the results.
Weight (kg)    Length (m)
0.43           0.52
0.54           0.62
0.41           0.51
0.63           0.68
0.55           0.63
0.42           0.50
0.58           0.62
0.57           0.61
0.48           0.54
0.62           0.68
0.60           0.65
0.59           0.62
0.65           0.70
0.59           0.63
0.50           0.55
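One way to check a hand computation for this exercise is the short Python sketch below. It uses the raw-score form of the Pearson formula, built from N and the sums of X, Y, XY, X squared, and Y squared; that choice of form and the function name `pearson_r` are assumptions, since any algebraically equivalent form gives the same coefficient.

```python
import math

weights = [0.43, 0.54, 0.41, 0.63, 0.55, 0.42, 0.58, 0.57,
           0.48, 0.62, 0.60, 0.59, 0.65, 0.59, 0.50]   # X, in kg
lengths = [0.52, 0.62, 0.51, 0.68, 0.63, 0.50, 0.62, 0.61,
           0.54, 0.68, 0.65, 0.62, 0.70, 0.63, 0.55]   # Y, in m

def pearson_r(x, y):
    """Raw-score Pearson product-moment correlation of paired lists."""
    n = len(x)                                   # N, number of cases
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(weights, lengths)
print(f"r = {r:.4f}")
```

For these 15 pairs r comes out at about 0.97, which the classification of r places in the very high correlation band: heavier milkfish in this sample are consistently longer.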