STATISTICS TERMS AND EXAMPLES

STATISTICS

- is the science and study of the collection, organization, analysis, interpretation, and presentation of data.

- the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions. (Mason, Lind, and Marchal)

Statistics is used in a wide range of fields and situations whenever you need to analyze data, make predictions, or draw conclusions based on evidence. Here are some common scenarios where statistics is utilized:

Scientific Research: Statistics is crucial for designing experiments, analyzing data, and drawing conclusions in fields like biology, chemistry, physics, and social sciences.
Business and Economics: Businesses use statistics for market research, forecasting sales, analyzing financial data, and making strategic decisions. Economists use statistics to study economic trends, evaluate policies, and forecast economic indicators.
Healthcare: Statistics is vital in medical research for clinical trials, epidemiological studies, analyzing patient data, and evaluating treatment effectiveness.
Quality Control: Industries use statistics to monitor product quality, control manufacturing processes, and ensure compliance with standards.
Finance: Statistics plays a key role in risk management, portfolio analysis, pricing financial instruments, and modeling financial markets.
Social Sciences: Statistics is used in sociology, psychology, political science, and other social sciences to analyze survey data, study social phenomena, and test hypotheses.
Education: Statistics is important in educational research for evaluating teaching methods, assessing student performance, and conducting educational assessments.
Sports Analytics: Statistics is widely used in sports to analyze player performance, optimize strategies, and make decisions in areas like player recruitment and game tactics.
Environmental Studies: Statistics is used to analyze environmental data, study climate change, assess environmental impacts, and model ecological systems.
Market Research: Statistics is employed to analyze consumer preferences, conduct surveys, segment markets, and predict market trends.

USES OF STATISTICS:

Used in all fields of endeavor, such as fisheries, agriculture, commerce, trade and industry, education, biology, economics, psychology, sociology, and chemistry.

FUNCTIONS OF STATISTICS:

Provides researchers the means to measure scientifically the conditions that may be involved in a given problem and to evaluate how these conditions are related.
Shows the laws underlying facts and events that cannot be determined by individual observation.
Reveals trends and behavior in related conditions that may otherwise remain unclear.

TYPES OF STATISTICS

1. Descriptive

- referring to, constituting, or grounded in matters of observation or experience.

- concerned with the gathering, classification, and presentation of data and the computation of summary values that describe the group characteristics of the data.

These values include percentages, measures of central tendency, measures of variability, and measures of skewness and kurtosis. (source: Masteral notes)

2. Inferential – relating to, involving, or resembling inference.

Inferential statistics aims to give information about large groups of data without dealing with each and every element of these groups. Among the topics included in this study are hypothesis testing using the t-test and z-test, correlation, analysis of variance, the chi-square test, regression analysis, and time series analysis. The basis of inferential statistics is the ability to make decisions about parameters without a complete census of the population.
(source: Masteral notes)

SOME DEFINITIONS OF TERMS USED IN STATISTICS:

Quantitative Variable – when the variable studied can be reported numerically.

Sample – A portion or part of the population of interest.
Population – A collection of all possible individuals, objects, or measurements of interest.
Qualitative Variable – when the characteristic or variable being studied is non-numeric.
Discrete variable – a quantitative variable that can assume only certain separate values, such as the number of bedrooms in a house or the number of cars arriving at a tollbooth.
Continuous variable – a quantitative variable that can assume any value within a specific range, such as the air pressure in a tire, the weight of a shipment of grain, or the flight time from the USA to Manila.

Collection of Data:

TWO TYPES OF DATA:

1. Primary data – refers to information that is gathered directly from an original source or that is based on direct or first-hand experience.

Ex. First-person accounts, autobiographies, and diaries.

2. Secondary data – refers to information taken from published or unpublished data that were previously gathered by other individuals or agencies.

Ex. Published books, newspapers, magazines, business reports, etc.

Methods used in collecting data:

1. Direct or Interview method – this is a method of person-to-person exchange between the interviewer and the interviewee.

- provides consistent and more precise information.

- However, it is time-consuming and expensive, and it has limited field coverage.

2. Indirect or questionnaire method – in this method, written responses are given to prepared questions.

- The questionnaire may be mailed or hand-carried.

- This method is inexpensive and can cover a wide area in a shorter period.

3. Registration Method – this type of information gathering is enforced by certain laws.

Ex. Registration of births, deaths, vehicles, and licenses.

4. Observation Method – in this method, the investigator observes the behavior of persons or organizations and their outcomes. It is usually used when the subjects cannot talk or write. This method makes it possible to record behavior at the appropriate time.

5. Experimental Method – this method is used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. Scientific researchers usually use this type of method.

Frequency Distribution and Its Graphical Representation:

Frequency Distribution – a grouping of data into categories showing the number of observations in each mutually exclusive category.

– It is appropriate when the number of cases (N) is 30 or more.

Steps in FDT construction:

1. Find the range (the difference between the highest score and the lowest score):

Range = HS – LS

2. Determine the number of class intervals. The recommended number of classes for a given number of observations is:

No. of observations	Recommended no. of classes
9–16	4
17–32	5
33–64	6
65–128	7
129–256	8
257–512	9
513–1024	10

3. Determine the approximate size of the class interval (i) by dividing the range by the desired number of class intervals.
4. Write the class intervals, starting with a lower limit that contains the lowest score (the exact starting point is the researcher's choice).
5. Determine the frequency of each class interval by counting the tallies in the tally column.
6. Compute the class mark by adding the lower and upper limits of each class interval and dividing the sum by 2. The class mark is the representative value of the corresponding interval.

Class Boundaries – more precise expressions of the class limits, obtained by subtracting 0.5 from each lower limit and adding 0.5 to each upper limit (for whole-number data). A class boundary lies midway between the upper limit of one interval and the lower limit of the next interval.
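The recommended numbers of classes in step 2 appear to follow the "2 to the k" rule: choose the smallest k such that 2^k is at least N. A minimal Python sketch of steps 1 to 3, assuming whole-number data; the function names are hypothetical, and the width is rounded up (and bumped by one when the division is exact) so that k whole-number classes always reach the highest score:

```python
def recommended_classes(n):
    """Smallest k with 2**k >= n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def class_width(highest, lowest, k):
    """Approximate class size: range / k, rounded up to a whole number.

    range // k + 1 equals the ceiling when the division is not exact and adds
    one when it is, so k whole-number classes always cover the highest score.
    """
    data_range = highest - lowest      # step 1: Range = HS - LS
    return data_range // k + 1         # step 3: size of the class interval


# Reproduce the boundary rows of the recommended-classes table.
for n in (9, 16, 17, 32, 33, 64, 65, 128, 129, 256, 257, 512, 513, 1024):
    print(n, recommended_classes(n))
```

For the 40 scores given below (HS = 180, LS = 118), `recommended_classes(40)` returns 6 and `class_width(180, 118, 6)` returns 11.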

Given the Data:

120 133 180 138

140 150 170 153

161 149 124 168

148 139 161 142

130 143 137 147

156 151 128 118

165 138 147 167

146 150 149 129

142 158 152 130

175 148 142 159

Find the following from the data given:

Range
Make the class intervals (use 5)
Show the less-than cumulative frequency (<cf)
Show the greater-than cumulative frequency (>cf)
Solve for the relative frequency
Solve for the percentage frequency
Solve for the class mark
Solve for the class boundaries
Solve for the sector angles (degrees) of the pie/circle graph
Draw the circle/pie graph/chart
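The exercise can be checked with a short Python sketch. It assumes that "use 5" means five class intervals (if a class size of 5 was intended instead, set `width = 5` and `K = data_range // 5 + 1`), that the data are whole numbers, and that the first class starts at the lowest score; all variable names are illustrative.

```python
# The 40 scores listed under "Given the Data".
data = [
    120, 133, 180, 138,
    140, 150, 170, 153,
    161, 149, 124, 168,
    148, 139, 161, 142,
    130, 143, 137, 147,
    156, 151, 128, 118,
    165, 138, 147, 167,
    146, 150, 149, 129,
    142, 158, 152, 130,
    175, 148, 142, 159,
]

K = 5                                   # assumption: "use 5" means five class intervals
N = len(data)
lowest, highest = min(data), max(data)
data_range = highest - lowest           # Range = HS - LS
width = data_range // K + 1             # class size i, large enough that K classes cover every score

rows = []
less_cf = 0
for j in range(K):
    lower = lowest + j * width          # lower class limit
    upper = lower + width - 1           # upper class limit (whole-number data)
    f = sum(lower <= x <= upper for x in data)
    less_cf += f
    rows.append({
        "limits": (lower, upper),
        "f": f,
        "<cf": less_cf,                          # observations in this class and all below it
        "rf": f / N,                             # relative frequency
        "pct": 100 * f / N,                      # percentage frequency
        "class_mark": (lower + upper) / 2,
        "boundaries": (lower - 0.5, upper + 0.5),
        "degrees": 360 * f / N,                  # sector angle for the pie chart
    })

greater_cf = N
for row in rows:                        # >cf: observations in this class and all classes above it
    row[">cf"] = greater_cf
    greater_cf -= row["f"]

print("Range =", data_range, " class size =", width)
for row in rows:
    print(row)
```

Under these assumptions the classes are 118–130, 131–143, 144–156, 157–169, and 170–182, with frequencies 7, 10, 13, 7, and 3.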

Measures of Central Tendency (Grouped Data):

Popularly known as averages.
They are descriptive statistics because a single number describes the central value of a group of observations or individuals, and this central value represents all the figures in the group of which it is a part.

Arithmetic Mean:

- The most important and widely used measure of central tendency.
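For grouped data the mean is usually computed from the class marks: x̄ = ΣfX / N, where X is the class mark and f the class frequency. A minimal Python sketch, assuming an illustrative five-class table built from the 40 scores given earlier (classes of size 13 starting at 118); the table and the function name are assumptions, not part of the original notes.

```python
# Illustrative five-class table built from the 40 scores given earlier:
# (lower limit, upper limit, frequency). Not part of the original notes.
table = [(118, 130, 7), (131, 143, 10), (144, 156, 13), (157, 169, 7), (170, 182, 3)]


def grouped_mean(table):
    """Mean of grouped data: sum of f * X divided by N, where X is the class mark."""
    n = sum(f for _, _, f in table)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return total / n


print(grouped_mean(table))
```

The result, 146.425, is close to the mean of the raw scores (147.075); the small difference comes from replacing every score in a class by its class mark.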

Mean, Median, and Mode (Their importance):

Each is a shorthand description of a group of quantitative data obtained from a sample.
It is more economical, easier, and more meaningful to let one figure stand for a group than to remember all the particular numbers in the group.
It describes a sample obtained from a particular group of observations at a particular time in a particular way.
It also describes indirectly, but with some accuracy, the population from which the sample is drawn.

Characteristics of:

· Mean:

The arithmetic mean is a frequently used measure of central tendency because it is subject to less error.

It lends itself to algebraic manipulation.

Its standard error is less than that of the median.
The sum of the deviations of the cases about the mean is zero.

· Median:

- The sum of absolute deviations about the median is less than or equal to the sum of absolute deviations about any other value.

· Mode:

It is entirely independent of the extreme values.

Its position is not stable.

Not all items in a series contribute to it.

It is not always well-defined or possible to locate properly.

The set of observations can be unimodal (one mode), bimodal ( two modes), trimodal (three modes), or polymodal.
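The properties listed above for the mean, median, and mode can be checked numerically. A short Python sketch using the standard `statistics` module on the 40 scores given in the frequency-distribution exercise; the helper `sum_abs_dev` is illustrative.

```python
import statistics

# The 40 scores listed under "Given the Data".
data = [
    120, 133, 180, 138, 140, 150, 170, 153, 161, 149,
    124, 168, 148, 139, 161, 142, 130, 143, 137, 147,
    156, 151, 128, 118, 165, 138, 147, 167, 146, 150,
    149, 129, 142, 158, 152, 130, 175, 148, 142, 159,
]

mean = statistics.mean(data)
median = statistics.median(data)


def sum_abs_dev(c):
    """Sum of absolute deviations of the scores about the value c."""
    return sum(abs(x - c) for x in data)


# Mean: the deviations about the mean add up to zero (apart from rounding error).
print("sum of deviations about the mean:", sum(x - mean for x in data))

# Median: no whole-number value from 100 to 200 gives a smaller sum of absolute deviations.
print(all(sum_abs_dev(median) <= sum_abs_dev(c) for c in range(100, 201)))

# Mode: multimode returns every most-frequent value; a single value means unimodal.
print("mode(s):", statistics.multimode(data))
```

For these scores the median is 147.5 and the only mode is 142 (it occurs three times), so the set is unimodal.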

Advantages of each:

· Mean:

Most reliable, most stable, and with the least probable error.
Most generally recognized measure of central tendency.

· Median:

The best measure for irregular or skewed distributions.

It may be located in an open-ended distribution or when the data are incomplete.

· Mode:

It is always an actual value in the data set.

Simple to approximate by observation, especially when the number of cases is small.

Does not require arranging the values.

Disadvantages of each:

· Mean:

Does not supply information about the homogeneity of the group.

The more heterogeneous the set of observations or group of individuals, the less satisfactory the mean is as a measure of central tendency.

· Median:

Requires arranging the items according to size before it can be computed.
Has a larger probable error than the mean.

It does not lend itself to algebraic treatment.

Erratic when the data do not cluster at the center of the distribution.

· Mode:

Inapplicable to a small number of cases when the values may not be repeated.

It does not lend itself to algebraic manipulation.

It is not rigidly defined and is inapplicable to irregular distributions.

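Point Measures (Grouped Data):

Different types of point measures: quartiles, deciles, and percentiles.

· Quartile – one of the points on a scale that divide the distribution into four equal parts.

Formula:

Q_k = LB + [(kN/4 - cf) / f] i

· Decile – one of the points on a scale that divide the distribution into ten equal parts.

Formula:

D_k = LB + [(kN/10 - cf) / f] i

· Percentile – one of the points on a scale that divide the distribution into one hundred equal parts.

Formula:

P_k = LB + [(kN/100 - cf) / f] i

where LB = lower class boundary of the class containing the kth quartile (decile, percentile), N = total number of cases, cf = less-than cumulative frequency (<cf) of the class just below that class, f = frequency of that class, and i = size of the class interval.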
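The three formulas differ only in the divisor (4, 10, or 100), so one function can cover all of them. A minimal Python sketch, assuming whole-number class limits in ascending order and an illustrative five-class table built from the 40 scores given earlier (classes of size 13 starting at 118); the table and `point_measure` are hypothetical.

```python
# Illustrative five-class table (lower limit, upper limit, f) in ascending order,
# built from the 40 scores given earlier with a class size of 13.
table = [(118, 130, 7), (131, 143, 10), (144, 156, 13), (157, 169, 7), (170, 182, 3)]


def point_measure(table, k, parts):
    """k-th point measure of grouped data.

    parts = 4 gives Q_k, parts = 10 gives D_k, parts = 100 gives P_k.
    """
    n = sum(f for _, _, f in table)
    target = k * n / parts              # kN/4, kN/10 or kN/100
    cf = 0                              # <cf of the classes below the current one
    for lower, upper, f in table:
        if f > 0 and cf + f >= target:  # first class whose <cf reaches the target
            lb = lower - 0.5            # lower class boundary (whole-number data)
            i = upper - lower + 1       # size of the class interval
            return lb + (target - cf) / f * i
        cf += f
    raise ValueError("k must be between 0 and parts")


q1 = point_measure(table, 1, 4)
median = point_measure(table, 2, 4)     # Q2 = D5 = P50
q3 = point_measure(table, 3, 4)
p90 = point_measure(table, 90, 100)
print(q1, median, q3, p90)
```

Because Q2, D5, and P50 all locate the point kN/parts = N/2, the same call gives the grouped median.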

Measures of Variability (Grouped Data):

It tells us the spread of the data.
Measures of variability give information on how the data are scattered or spread and describe the mass of data. They give the total picture and characteristics of the set of data on how they are dispersed.

Two Types of Measures of Variability

1. Absolute Variability:

a. Range – the simplest and easiest measure of variability. It is classified into absolute range, total range, and Kelly range. The absolute range is simply the difference between the highest and lowest scores (HS – LS). The total range is obtained by subtracting the lowest score from the highest score and adding 1 (HS – LS + 1).

- The Kelly range is obtained by subtracting the 10th percentile from the 90th percentile (P₉₀ – P₁₀).

- It is most useful in representing the dispersion of small data sets.

b. Quartile deviation (QD) – half the difference between the third and the first quartiles: QD = (Q₃ – Q₁)/2.

c. Average deviation / mean deviation (AD) – the average of the absolute deviations from the mean; for grouped data, AD = Σf|X – x̄| / N, where X is the class mark.

d. Variance – the square of the standard deviation; for grouped sample data, s² = Σf(X – x̄)² / (N – 1).

e. Standard deviation – the square root of the variance and the most commonly used guide to the degree of dispersion or spread. It is also the most dependable measure for estimating the variability of the population from which the sample came.

2. Relative Variability:

a. Coefficient of variation

b. Coefficient of quartile deviation

c. Coefficient of mean deviation
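A minimal Python sketch of the grouped-data average deviation, variance, standard deviation, and coefficient of variation, computed from class marks. It assumes the sample (N – 1) form of the variance, matching the ungrouped formula in the next section, and uses an illustrative five-class table built from the 40 scores given earlier; the table is an assumption, not part of the original notes.

```python
import math

# Illustrative five-class table built from the 40 scores given earlier:
# (lower limit, upper limit, frequency). Not part of the original notes.
table = [(118, 130, 7), (131, 143, 10), (144, 156, 13), (157, 169, 7), (170, 182, 3)]

n = sum(f for _, _, f in table)
marks = [((lower + upper) / 2, f) for lower, upper, f in table]   # (class mark X, f)
mean = sum(f * x for x, f in marks) / n

ad = sum(f * abs(x - mean) for x, f in marks) / n                 # AD = Σf|X - x̄| / N
variance = sum(f * (x - mean) ** 2 for x, f in marks) / (n - 1)   # s² = Σf(X - x̄)² / (N - 1)
sd = math.sqrt(variance)                                          # s = √s²
cv = sd / mean * 100                                              # CV = s / x̄ × 100

print(f"AD = {ad:.4f}  s^2 = {variance:.4f}  s = {sd:.4f}  CV = {cv:.2f}%")
```

The coefficient of variation expresses the spread relative to the mean, which is what makes it usable for comparing data sets with different magnitudes.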

Measures of Variability (Ungrouped Data)

Standard deviation:

This is the most commonly used measure of the spread or dispersion of data around the mean. The standard deviation is defined as the square root of the variance (V).

The variance is defined as the sum of the squared deviations from the mean, divided by n-1.

$s=\sqrt{\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}{n-1}}$


Relative standard deviation (RSD) and coefficient of variation (CV)

Although the standard deviation of analytical data may not vary much over limited ranges of such data, it usually depends on the magnitude of such data: the larger the figures, the larger s. Therefore, for comparison of variations it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage, and is then called the coefficient of variation (CV).

Formula:
$RSD=\frac{s}{\bar{x}}\qquad CV=\frac{s}{\bar{x}}\times 100\%$

Note: When needed (e.g., for the F-test), the variance can, of course, be calculated by squaring the standard deviation:

Hence:

V = s²
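A minimal Python sketch of the ungrouped formulas above, applied to the 40 scores from the frequency-distribution exercise (treated here as ungrouped data); the variable names are illustrative. The standard library's `statistics.stdev` and `statistics.variance` also divide by n – 1, so they serve as a cross-check.

```python
import math
import statistics

# The 40 scores listed under "Given the Data".
data = [
    120, 133, 180, 138, 140, 150, 170, 153, 161, 149,
    124, 168, 148, 139, 161, 142, 130, 143, 137, 147,
    156, 151, 128, 118, 165, 138, 147, 167, 146, 150,
    149, 129, 142, 158, 152, 130, 175, 148, 142, 159,
]

n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # V: squared deviations / (n - 1)
s = math.sqrt(variance)                                    # standard deviation
rsd = s / mean                                             # relative standard deviation (fraction)
cv = rsd * 100                                             # coefficient of variation (%)

print(f"s = {s:.3f}  V = {variance:.3f}  RSD = {rsd:.4f}  CV = {cv:.2f}%")

# The standard library uses the same n - 1 definitions.
print(math.isclose(s, statistics.stdev(data)), math.isclose(variance, statistics.variance(data)))
```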

Inter quartile range (IQR)

To talk about the interquartile range, we need to first talk about the percentile.

The pth percentile of a data set is a value such that, after the data are ordered from smallest to largest, at most p% of the data are below this value and at most (100 – p)% are above it.

Thus, the median is the 50th percentile.

Also, Q₁ = lower quartile = 25th percentile and Q₃ = upper quartile = 75th percentile.

The inter-quartile range is the difference between upper and lower quartiles and is denoted as IQR.

IQR = Q₃ – Q₁ = upper quartile – lower quartile = 75th percentile – 25th percentile.

Note: IQR is not affected by extreme values. It is thus a resistant measure of variability.
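The resistance of the IQR can be seen by replacing the highest score with an extreme value. A Python sketch on the 40 scores from the frequency-distribution exercise, using `statistics.quantiles` (its default "exclusive" interpolation method; other quartile conventions, including hand methods, can give slightly different Q₁ and Q₃). The helper `iqr` and the outlier value 1800 are illustrative.

```python
import statistics

# The 40 scores listed under "Given the Data".
data = [
    120, 133, 180, 138, 140, 150, 170, 153, 161, 149,
    124, 168, 148, 139, 161, 142, 130, 143, 137, 147,
    156, 151, 128, 118, 165, 138, 147, 167, 146, 150,
    149, 129, 142, 158, 152, 130, 175, 148, 142, 159,
]


def iqr(values):
    """Q3 - Q1, using the three cut points returned by quantiles(..., n=4)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Replace the highest score (180) with an extreme value.
with_outlier = [1800 if x == 180 else x for x in data]

print("IQR:  ", iqr(data), "->", iqr(with_outlier))
print("Range:", max(data) - min(data), "->", max(with_outlier) - min(with_outlier))
```

The range jumps from 62 to 1682, while the IQR stays the same because the quartiles depend only on the middle of the ordered data.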

Measures of Correlation:

Correlation – a measure used to determine the degree of relationship between two sets of variables, X and Y. It is also called linear correlation.

Measures of correlation are inferential statistics because they are used to determine whether a significant relationship exists between the two variables.

The Pearson Product-Moment Correlation Coefficient

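Formula:

$r_{xy}=\frac{N\sum XY-\left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^{2}-\left(\sum X\right)^{2}\right]\left[N\sum Y^{2}-\left(\sum Y\right)^{2}\right]}}$

where:

r_xy = Pearson product-moment correlation coefficient of X and Y

ΣX = sum of the test X scores

ΣY = sum of the test Y scores

ΣXY = sum of the products of X and Y

ΣX² = sum of the squared X scores

ΣY² = sum of the squared Y scores

N = number of cases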

Interpretation of correlation values:

Classifications of r from:

0.00 to ± 0.20 ; denotes negligible correlation

± 0.21 to ± 0.40 ; denotes low or slight correlation

± 0.41 to ± 0.70; denotes marked or moderate correlation

± 0.71 to ± 0.90 ; denotes high correlation

± 0.91 to ± 0.99; denotes very high correlation

± 1.00; denotes perfect correlation

Compute the Pearson product-moment correlation coefficient (PPMCC) for the weight-length relationship of milkfish cultured in fish cages with bread meal as supplemental feed. Interpret the results.

Weight (kg)	Length (m)
0.43	0.52
0.54	0.62
0.41	0.51
0.63	0.68
0.55	0.63
0.42	0.57
0.58	0.62
0.57	0.61
0.48	0.54
0.62	0.68
0.60	0.65
0.59	0.62
0.65	0.72
0.59	0.63
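A minimal Python sketch of the computational formula applied to the milkfish data above. The function names are hypothetical, and `interpret` simply encodes the classification of r given earlier, applied to |r| rounded to two decimals.

```python
import math

weight = [0.43, 0.54, 0.41, 0.63, 0.55, 0.42, 0.58, 0.57, 0.48, 0.62, 0.60, 0.59, 0.65, 0.59]  # X, kg
length = [0.52, 0.62, 0.51, 0.68, 0.63, 0.57, 0.62, 0.61, 0.54, 0.68, 0.65, 0.62, 0.72, 0.63]  # Y, m


def pearson_r(x, y):
    """r = [NΣXY - ΣXΣY] / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def interpret(r):
    """Verbal label from the classification of r, using |r| rounded to two decimals."""
    a = round(abs(r), 2)
    if a <= 0.20:
        return "negligible correlation"
    if a <= 0.40:
        return "low or slight correlation"
    if a <= 0.70:
        return "marked or moderate correlation"
    if a <= 0.90:
        return "high correlation"
    if a < 1.00:
        return "very high correlation"
    return "perfect correlation"


r = pearson_r(weight, length)
print(f"r = {r:.4f} ({interpret(r)})")
```

On these 14 pairs r comes out at roughly 0.93, which falls in the "very high correlation" band: in this sample, heavier milkfish tend to be longer.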

A SUCCESS STORY

In the early 2000s, Zambezia, a small nation grappling with severe health challenges, underwent a remarkable transformation under the leadership of Dr. Maria Nkosi, an epidemiologist with a passion for data-driven solutions. Dr. Nkosi spearheaded an initiative to harness the power of statistics to address Zambezia's pressing public health issues. Collaborating with a dedicated team, they embarked on an extensive data collection effort, gathering information on prevalent diseases, healthcare access, and demographic indicators. Through rigorous analysis, they uncovered alarming rates of infectious diseases like malaria, tuberculosis, and HIV/AIDS, particularly affecting vulnerable groups such as children and pregnant women. Armed with these insights, Dr. Nkosi devised targeted interventions, including vaccination campaigns, community health centers, and public awareness initiatives. Using statistical modeling, they optimized resource allocation and continuously evaluated program outcomes to ensure maximum impact. Over time, Zambezia witnessed significant improvements in health outcomes, with reduced disease incidence and improved maternal and child health. Dr. Nkosi's pioneering approach attracted international attention, positioning Zambezia as a beacon of success in evidence-based public health interventions, showcasing the transformative potential of statistics in saving lives and building resilient healthcare systems.
