
Univariate Analysis 1

Greetings!
This is Dr. Fisher, and this
is a short presentation about univariate
data analysis.
Statistics is a tool to
help us understand a distribution of
observations.
One example would be ages.
So we could take the age of everybody in
class and array it on a line, and that
gives us very detailed information, but
it's hard to figure out who's the
oldest, who's the youngest, where the
middle of the distribution is, the
average, and so on.
And so with statistics we come up with a
way to try to summarize and describe a
distribution of observations in a
single number.
So in this particular
instance what we're going to do is look
at ways in which we can come up with a
summary statistic.
Again, it's difficult
to look at a string of numbers and make
sense of it, and so this is where any sort of
statistical program comes in handy,
because we can create tables and ways to
present the data that are more
comprehensible and understandable.
So with univariate data analysis a very
simple way is to create a frequencies
table and in Stata you've already been
doing this by running a tabulate command
for individual variables.
Measures of
central tendency are a way to get a
sense of where the center of a
distribution is, and
it's important to remember the different
levels of data because that will
determine the appropriate statistic that
you compute.
So whether it's nominal,
ordinal, interval, or ratio, that will
determine the appropriate measure of
central tendency.
The other thing to
remember is that anytime we come up with
a statistic, what we're doing is
compacting, summarizing a
distribution, and this means we're going
to lose some detail, but in the end we
get a better sense
of how to describe the distribution when
we can come up with these statistics.
The
appropriate measure of central tendency
for each level of data is the following:
for nominal level data the mode is the
appropriate measure, for ordinal level
data the median will be the appropriate
measure of central tendency, and then for
interval and ratio data the mean, which is
normally what we think of as the average,
will be the appropriate measure.
For the
mode, we use it with nominal level data,
and the reason is that nominal level data has
no sense of rank.
The mode is the
category where we find the most
observations, and the mode is
different from the other two measures of
central tendency in that you can have
more than one mode.
So if you have an
instance where two categories
out of five have an equal number
of observations, and that number is the
highest, then the mode would be those two
categories.
Again, the critical thing is
there are no assumptions about rank.
So in
a sense we're trying to get an
understanding of which category has a
preponderance of observations.
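
As a rough illustration of the two-mode case described above, here is a minimal Python sketch. The color data is made up for the example (it is not from the presentation), and `multimode` comes from Python's standard `statistics` module rather than from Stata.

```python
from statistics import multimode

# Hypothetical nominal data: five color categories with no rank order.
colors = ["red", "blue", "green", "blue", "red", "yellow", "purple"]

# multimode returns every category tied for the highest count.
modes = multimode(colors)
print(modes)  # "red" and "blue" each appear twice, more than any other
```

Because the categories have no rank, the result is just a list of the most common labels; nothing about their order matters.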
For the
median we can use ordinal level data, and
here we're assuming there's rank.
The median is the point where, if you took
every individual case and ranked them
from low to high, the middle case
rests.
So if it's a hundred cases, roughly where
the 50th case falls.
One of the things
about the median is it's not sensitive
to extreme values, and again remember
with ordinal level data we don't really
have a sense of the size of the
categories, and that's why we're just
looking for which category that
middle case falls in.
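
One way to sketch "which category does the middle case fall in" is shown below. The survey responses and labels are hypothetical, and `median_low` from Python's `statistics` module is used on the assumption that we want the answer to always be an actual category code rather than a value halfway between two codes.

```python
from statistics import median_low

# Hypothetical ordinal scale: codes carry rank but not equal spacing.
labels = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}

# Hypothetical responses from 11 cases.
responses = [4, 2, 5, 3, 4, 1, 4, 2, 3, 5, 4]

# median_low sorts the cases and returns the middle one
# (the lower of the two middle cases when the count is even).
middle_code = median_low(responses)
median_category = labels[middle_code]
print(median_category)
```

With 11 cases the middle one is the 6th after sorting, so the median is reported as a category label, not as a number to do arithmetic on.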
For interval and
ratio data, we're
going to calculate a mean, and that's
normally what we think of as an average,
and that's simply adding
up, say, ages and dividing by the number
of cases.
One of the things about the
mean is it's sensitive to extreme values,
so if we were to take the ages in the
class, if we took just those of the
students, we would probably come up with
a mean somewhere in the 20s, but if you
throw my age into the mix we're going to end
up with something much higher than
twenty, even though most of the class is
in their 20s.
So that's one of
the things to remember about the mean:
it's sensitive to extreme values.
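
The effect of one extreme value can be sketched with a few made-up ages. These numbers are purely illustrative (the presentation gives no actual ages), and the median is included only for comparison.

```python
from statistics import mean, median

# Hypothetical student ages, all in the low 20s.
student_ages = [20, 21, 22, 23]

# Hypothetical instructor age added to the same list.
class_ages = student_ages + [64]

mean_students = mean(student_ages)    # sum 86 divided by 4 cases
mean_class = mean(class_ages)         # sum 150 divided by 5 cases
median_students = median(student_ages)
median_class = median(class_ages)

print(mean_students, mean_class)      # the mean jumps by several years
print(median_students, median_class)  # the median barely moves
```

One added case pulls the mean from 21.5 up to 30, while the median only shifts from 21.5 to 22.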
And
with measures of central tendency, it's a
way to give us a sense of the middle of
the distribution.
Different statistics
are going to paint different pictures, so
it's always important to keep that in
mind.
The other thing is that different
measures are appropriate for different
levels of data.
So the
mode can be used for all levels of data,
whether it's nominal, ordinal, interval, or
ratio.
The median is appropriate for ordinal,
interval, and ratio level data, and
again the nice thing about the median is
it's not sensitive to extreme values, so
it's sometimes handy to use a median if
you have a skewed distribution.
And
then the mean is only appropriate for
interval or ratio level data.
Now do
remember that Stata, or any other
statistical program, doesn't
know whether you're talking about
nominal, ordinal, interval, or ratio data, so
you have to make that judgment.
It'll compute any sort of statistic
regardless of the level of data.
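
To make that judgment explicit, one possible sketch is a small lookup of which measures suit which level of data, checked before anything is computed. The `ALLOWED` table, the `summarize` helper, and the region codes are all hypothetical names and data invented for this example; nothing like this is built into Stata or Python.

```python
from statistics import mean, median_low, multimode

# Which measures are appropriate at each level, as described above.
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}

# median_low is an assumption here so the median is always an observed value.
FUNCS = {"mode": multimode, "median": median_low, "mean": mean}

def summarize(values, level, measure):
    """Compute a measure only if it suits the stated level of data."""
    if measure not in ALLOWED[level]:
        raise ValueError(f"{measure} is not appropriate for {level} data")
    return FUNCS[measure](values)

# Hypothetical nominal variable: numeric codes standing in for regions.
region_codes = [1, 2, 2, 3, 4, 4, 4]

naive_mean = mean(region_codes)  # the software computes it, but it means nothing
region_mode = summarize(region_codes, "nominal", "mode")
print(naive_mean, region_mode)
```

Calling `summarize(region_codes, "nominal", "mean")` raises a `ValueError`, which is the check the software will not do for you.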
Here we
have a distribution of numbers, and just
looking at it isn't going to give us a
very good understanding of what's going
on there, so the appropriate thing to do
is create a frequencies table.
So
here's that same distribution arrayed
in a frequencies table.
The categories,
the numbers, range from zero to nine, and
the second column shows the number of
observations in each of those categories.
So we know there are three zeros, there are
no ones, three twos, and so on.
The third
column is the percentage that those
observations represent.
So twelve percent
of the observations are zero.
We've got
twelve percent that are two, and so on.
And
then the cumulative percentage is the last column.
So we start out with twelve.
There's
zero percent for category one, so
it's still twelve, and then when we go to
category two we add 12 percent to the
original 12 and we get twenty-four,
and so on, so when we get to the end of
the table,
under category nine, we reach a hundred
percent.
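
The frequencies table described above can be sketched in Python as follows. Only the counts for categories zero, one, and two (three, none, three, each 12 percent, which implies 25 observations) come from the presentation; the counts for categories three through nine are hypothetical fillers chosen so the total is 25.

```python
from collections import Counter

# Counts for 0, 1, 2 follow the transcript; counts for 3..9 are hypothetical.
assumed_counts = {0: 3, 1: 0, 2: 3, 3: 2, 4: 4, 5: 3, 6: 4, 7: 2, 8: 2, 9: 2}

# Expand into a raw list of observations, as the data would arrive.
data = [value for value, count in assumed_counts.items() for _ in range(count)]

freq = Counter(data)
n = len(data)

rows = []
running = 0
for category in range(0, 10):
    count = freq.get(category, 0)   # categories with no cases get zero
    running += count                # cumulative number of cases so far
    rows.append((category, count, 100 * count / n, 100 * running / n))

print(f"{'value':>5} {'freq':>5} {'percent':>8} {'cum':>7}")
for category, count, percent, cumulative in rows:
    print(f"{category:>5} {count:>5} {percent:>8.1f} {cumulative:>7.1f}")
```

The cumulative column stays at 12 for category one, reaches 24 at category two, and ends at 100 for category nine, matching the walk-through above.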

October 3, 2018 12:00 PM
