Univariate Analysis 1

Greetings! This is Dr. Fisher, and this is a short presentation about univariate data analysis.

Statistics is a tool to help us understand a distribution of observations. One example would be ages. We could take the age of everybody in class and array the ages on a line. That gives us very detailed information, but it's hard to figure out who's the oldest, who's the youngest, where the middle of the distribution is, what the average is, and so on. And so with statistics we come up with a way to summarize and describe a distribution of observations in a single number.

So in this presentation we're going to look at ways in which we can come up with a summary statistic. Again, it's difficult to look at a string of numbers and make sense of it, and this is where any sort of statistical program comes in handy, because we can create tables and other ways of presenting the data that are more comprehensible.

So with univariate data analysis, a very simple approach is to create a frequencies table, and in Stata you've already been doing this by running a tabulate command for individual variables.

Measures of central tendency are a way to get a sense of where the center of a distribution is. It's important to remember the different levels of data, because the level, whether it's nominal, ordinal, interval, or ratio, determines the appropriate measure of central tendency to compute.

The other thing to remember is that anytime we come up with a statistic, what we're doing is compacting, or summarizing, a distribution. This means we're going to lose some detail, but in the end we get a better sense of how to describe the distribution when we can come up with these statistics.

The appropriate measure of central tendency for each level of data is the following: for nominal level data, the mode is the appropriate measure; for ordinal level data, the median; and for interval and ratio level data, the mean, which is normally what we think of as the average.

We use the mode with nominal level data, and the reason is that nominal level data has no sense of rank. The mode is the category where we find the most observations. The mode differs from the other two measures of central tendency in that you can have more than one mode. So if two categories out of five are tied for the highest number of observations, the mode would be those two categories. Again, the critical thing is that there are no assumptions about rank. In a sense, we're trying to get an understanding of which category has a preponderance of observations.
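
As a rough sketch of the tied-mode case just described, here is a small Python example (the lecture itself uses Stata; Python is used here only for illustration). The `majors` list is hypothetical data invented for this sketch, with five categories and two of them tied for the most observations.

```python
from statistics import multimode

# Hypothetical nominal data: each value is a category label with no rank.
majors = ["biology", "history", "biology", "art", "history", "math", "physics"]

# multimode returns every category tied for the highest count,
# so a distribution with two modes yields a two-item list.
modes = multimode(majors)
print(modes)
```

Because the labels have no order, the only thing counted here is how many observations land in each category.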

For the median, we can use ordinal level data, and here we're assuming there's rank. The median is the point where the middle case rests if you took every individual case and ranked them from low to high. So with a hundred cases, it's roughly where the 50th case falls (strictly, with an even number of cases the middle lies between the 50th and 51st). One of the things about the median is that it's not sensitive to extreme values. And again, remember that with ordinal level data we don't really have a sense of the distance between the categories, and that's why we're just looking for which category that middle case falls in.
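
To make the "which category does the middle case fall in" idea concrete, here is a minimal Python sketch. The agreement scale and the nine responses are hypothetical, invented for illustration. It uses `median_low` from the standard library, which always returns an actual data value, so the answer is a real category rather than a point between two categories.

```python
from statistics import median_low

# Hypothetical ordinal scale, listed from lowest to highest rank.
levels = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Hypothetical responses from nine cases.
responses = ["agree", "neutral", "disagree", "agree", "strongly agree",
             "neutral", "agree", "strongly disagree", "agree"]

# Convert each response to its rank so the cases can be ordered low to high.
ranks = [levels.index(r) for r in responses]

# Find the rank of the middle case, then map it back to its category label.
median_category = levels[median_low(ranks)]
print(median_category)
```

The ranks are used only for ordering; no arithmetic is done on them, which is in keeping with not knowing the distance between ordinal categories.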

For interval and ratio data, we're going to calculate a mean, which is normally what we think of as an average. That's simply adding up, say, the ages and dividing by the number of cases. One of the things about the mean is that it's sensitive to extreme values. If we took the ages of just the students in the class, we would probably come up with a mean somewhere in the 20s, but if you throw my age into the mix we're going to end up with a higher mean, even though most of the class is in their 20s. So that's one of the things to remember about the mean: it's sensitive to extreme values.
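
Here is a small Python sketch of that sensitivity. All of the ages are hypothetical, including the instructor's; none of these numbers come from the class. It computes the mean and the median with and without the one extreme value so the two can be compared.

```python
from statistics import mean, median

# Hypothetical student ages; invented for illustration only.
student_ages = [19, 20, 20, 21, 21, 21, 22, 22, 23]

# Hypothetical instructor age, standing in for the extreme value.
instructor_age = 60
ages_with_instructor = student_ages + [instructor_age]

# Mean: add up the ages and divide by the number of cases.
mean_students = mean(student_ages)
mean_all = mean(ages_with_instructor)

# Median: the middle case once the ages are ranked low to high.
median_students = median(student_ages)
median_all = median(ages_with_instructor)

print("mean:  ", mean_students, "->", mean_all)
print("median:", median_students, "->", median_all)
```

With this made-up data, one extra case pulls the mean up by several years while the median stays where it was.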

Measures of central tendency give us a sense of the middle of the distribution, but different statistics are going to paint different pictures, so it's always important to keep that in mind.

The other thing is that different measures are appropriate for different levels of data. The mode can be used for all levels of data, whether it's nominal, ordinal, interval, or ratio. The median is appropriate for ordinal, interval, and ratio level data, and again the nice thing about the median is that it's not sensitive to extreme values, so it's sometimes handy to use a median if you have a skewed distribution. And then the mean is only appropriate for interval or ratio level data.

Now do remember that Stata, or any other statistical program, doesn't know whether you're talking about nominal, ordinal, interval, or ratio data, so you have to make that judgment. It'll compute any sort of statistic regardless of the level of data.
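
The same caution applies in any language. As a hedged illustration in Python, with a hypothetical coding scheme and hypothetical observations, the sketch below shows software happily computing a mean of numeric codes that stand for nominal categories.

```python
from statistics import mean, mode

# Hypothetical coding scheme for a nominal variable: the numbers are labels only.
region_labels = {1: "north", 2: "south", 3: "east", 4: "west"}

# Hypothetical coded observations.
region_codes = [1, 4, 4, 2, 4, 1, 3, 4]

# The software computes a mean without complaint, but the result
# does not correspond to any region, so it is not interpretable.
meaningless_mean = mean(region_codes)

# The mode is the appropriate measure for nominal data.
modal_region = region_labels[mode(region_codes)]

print(meaningless_mean, modal_region)
```

Nothing stops the first calculation from running; deciding that only the second one makes sense is the analyst's job.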

Here we have a distribution of numbers, and just looking at it isn't going to give us a very good understanding of what's going on, so the appropriate thing to do is create a frequencies table. Here's that same distribution arrayed in a frequencies table.

The categories, the numbers, range from zero to nine, and the second column shows the number of observations in each of those categories. So we know there are three zeros, no ones, three twos, and so on. The third column is the percentage that those observations represent. So twelve percent of the observations are zero, twelve percent are two, and so on. And then the last column is the cumulative percentage. We start out with twelve; there's zero percent for category one, so it's still twelve; and then when we go to category two we add twelve percent to the original twelve and get twenty-four, and so on. When we get to the end of the table, under category nine, we reach a hundred percent.
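
The arithmetic behind those columns can be sketched in a few lines of Python. The data here is hypothetical: it has 25 observations so that three zeros come to twelve percent, and only the counts for zero, one, and two follow the table described above; the counts for three through nine are invented to fill out the example.

```python
from collections import Counter

# Hypothetical data: 25 observations with values 0-9. Only the counts for
# 0, 1, and 2 (three, none, three) follow the lecture; the rest are invented.
values = [0]*3 + [2]*3 + [3]*2 + [4]*4 + [5]*3 + [6]*2 + [7]*3 + [8]*2 + [9]*3

counts = Counter(values)
total = len(values)

table = []   # rows of (category, frequency, percent, cumulative percent)
running = 0  # running count of observations at or below the current category
for category in range(10):
    freq = counts[category]  # Counter returns 0 for a category with no cases
    running += freq
    table.append((category, freq, 100 * freq / total, 100 * running / total))

print(f"{'Value':>5} {'Freq.':>6} {'Percent':>8} {'Cum.':>7}")
for category, freq, pct, cum in table:
    print(f"{category:>5} {freq:>6} {pct:>8.2f} {cum:>7.2f}")
```

Each cumulative figure is the running count divided by the total, which is why the column stays at twelve for category one and reaches a hundred at the last category.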
