What is the percent or proportion of observations within a category of a data set?

In SPSS, the Frequencies procedure can produce summary measures for categorical variables in the form of frequency tables, bar charts, or pie charts.

To run the Frequencies procedure, click Analyze > Descriptive Statistics > Frequencies.


A Variable(s): The variables for which Frequencies output will be produced. To include a variable in the analysis, double-click its name to move it to the Variable(s) box. Moving several variables to this box will create several frequency tables at once.

B Statistics: Opens the Frequencies: Statistics window, which contains various descriptive statistics.


The vast majority of the descriptive statistics available in the Frequencies: Statistics window are never appropriate for nominal variables and are rarely appropriate for ordinal variables. There are two exceptions:

  • The Mode (which is the most frequent response) has a clear interpretation when applied to most nominal and ordinal categorical variables.
  • The Values are group midpoints option can be applied to certain ordinal variables that have been coded so that each value is the midpoint of a range. For example, this would be the case if you had measured subjects' ages and had coded anyone between the ages of 20 and 29 as 25, and anyone between the ages of 30 and 39 as 35 (source: IBM SPSS Statistics Information Center).

If your categorical variables are coded numerically, it is very easy to misuse measures like the mean and standard deviation. SPSS will compute those statistics if they are requested, regardless of whether they are meaningful. It is up to the researcher to determine whether these measures are appropriate for their data. In general, you should never use any of these statistics for dichotomous or nominal variables, and should use them only with caution for ordinal variables.
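To see why a mean of numeric codes says nothing about a nominal variable, here is a minimal Python sketch (not SPSS output). The coding scheme (1 = blue, 2 = red, 3 = green) and the responses are hypothetical, made up for illustration. Swapping the arbitrary codes changes the mean, while the mode still points to the same category.

```python
from statistics import mean, mode

# Hypothetical coding scheme and responses, for illustration only
labels = {1: "blue", 2: "red", 3: "green"}
responses = [1, 1, 1, 2, 3, 3]

# The mean of the codes can be computed, but 1.83 is not a color
avg_code = mean(responses)

# The mode maps back to an actual category
modal_color = labels[mode(responses)]

# Swap the (arbitrary) codes for blue and green: same answers, new numbers
swap = {1: 3, 2: 2, 3: 1}
recoded = [swap[r] for r in responses]
recoded_labels = {3: "blue", 2: "red", 1: "green"}

avg_recoded = mean(recoded)                     # the mean changes
modal_recoded = recoded_labels[mode(recoded)]   # the modal category does not

print(round(avg_code, 2), modal_color)       # prints: 1.83 blue
print(round(avg_recoded, 2), modal_recoded)  # prints: 2.17 blue
```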

C Charts: Opens the Frequencies: Charts window, which contains various graphical options. Options include bar charts, pie charts, and histograms. For categorical variables, bar charts and pie charts are appropriate. Histograms should only be used for continuous variables; they should not be used for ordinal variables, and should never be used with nominal variables.


  • Bar chart displays the categories on the graph's x-axis and either the frequencies or the percentages on the y-axis.
  • Pie chart depicts the categories of a variable as "slices" of a circular "pie".

Note that the options in the Chart Values area apply only to bar charts and pie charts. In particular, these options determine whether the pie slices or the y-axis of the bar chart are labeled with counts or percentages. This setting will be greyed out if Histograms is selected.

D Format: Opens the Frequencies: Format window, which contains options for how to sort and organize the table output.


The Order by options affect only categorical variables:

  • Ascending values arranges the rows of the frequency table in increasing order with respect to the category values (alphabetically if string, or by numeric code if numeric).
  • Descending values arranges the rows of the frequency table in decreasing order with respect to the category values.
    • Note: If your categorical variable is coded numerically as 0, 1, 2, ..., sorting by ascending or descending values will arrange the rows with respect to the numeric codes, not with respect to any assigned value labels.
  • Ascending counts orders the rows of the frequency table from least frequent (lowest count) to most frequent (highest count).
  • Descending counts orders the rows of the frequency table from most frequent (highest count) to least frequent (lowest count).
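These sort orders can be mimicked outside SPSS. The minimal Python sketch below uses a hypothetical 0 to 3 agreement coding with made-up responses. It shows that sorting by value follows the numeric codes, which need not match the alphabetical order of the labels, and that sorting by count puts the most frequent category first.

```python
from collections import Counter

# Hypothetical numeric coding of an ordinal variable (illustration only)
labels = {0: "Strongly disagree", 1: "Disagree", 2: "Agree", 3: "Strongly agree"}
responses = [2, 1, 2, 3, 0, 2, 1, 2, 3, 3, 3, 3]

freq = Counter(responses)

# Ascending values: rows ordered by numeric code (0, 1, 2, 3)
ascending_values = sorted(freq.items())

# Sorting by label text instead gives a different, alphabetical order
by_label = sorted(freq.items(), key=lambda item: labels[item[0]])

# Descending counts: most frequent category first
descending_counts = sorted(freq.items(), key=lambda item: item[1], reverse=True)

for code, count in descending_counts:
    print(f"{labels[code]:<18}{count}")
```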

When working with two or more categorical variables, the Multiple Variables options affect only the order of the output. If Compare variables is selected, the frequency tables for all of the variables will appear first, followed by all of the graphs. If Organize output by variables is selected, the frequency table and graph for the first variable will appear together, then the frequency table and graph for the second variable, and so on.

E Display frequency tables: When checked, frequency tables will be printed. (This box is checked by default.) If this check box is not checked, no frequency tables will be produced, and the only output will come from supplementary options from Statistics or Charts. For categorical variables, you will usually want to leave this box checked.

By the end of this section, you will be able to...

  1. organize qualitative data in tables
  2. organize quantitative data into tables
  3. create cumulative frequency and relative frequency tables


Let's suppose you give a survey concerning favorite color, and the data you collect looks something like the table below.

blue red blue orange blue yellow green red pink
blue green blue purple blue blue green yellow pink
blue red pink green blue yellow green blue

Clearly, we need a better way to summarize the data. The most obvious thing to do would be to make a table with the list of favorite colors and the frequency for each.

favorite color frequency
blue 10
red 3
orange 1
yellow 3
green 5
pink 3
purple 1

Officially, we call this a frequency distribution.

A frequency distribution lists each category of data and the number of occurrences for each category.
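A frequency distribution is just a count per category. As a minimal sketch, Python's `collections.Counter` can tally the 26 favorite-color responses from the survey above; its categories come out in order of first appearance, which here matches the order of the table.

```python
from collections import Counter

# The 26 favorite-color responses from the survey example
colors = (
    "blue red blue orange blue yellow green red pink "
    "blue green blue purple blue blue green yellow pink "
    "blue red pink green blue yellow green blue"
).split()

# Frequency distribution: each category and its number of occurrences
frequency = Counter(colors)

for color, count in frequency.items():
    print(color, count)
```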

Sometimes, we really want to know the frequency of a particular category in reference to the total. We can do this just by finding the total, and dividing the frequency for each category by that total.

The relative frequency is the proportion (or percent) of observations within a category and is found using the formula

$$\text{relative frequency} = \frac{\text{frequency}}{\text{sum of all frequencies}}$$

A relative frequency distribution lists each category of data together with the relative frequency of each category.

favorite color relative frequency
blue 10/26 ≈ 0.38
red 3/26 ≈ 0.12
orange 1/26 ≈ 0.04
yellow 3/26 ≈ 0.12
green 5/26 ≈ 0.19
pink 3/26 ≈ 0.12
purple 1/26 ≈ 0.04
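The relative frequency formula translates directly into code. This minimal Python sketch divides each color's count by the sum of all frequencies (26) and rounds to two decimals, as in the table; the unrounded relative frequencies always add up to 1.

```python
from collections import Counter

colors = (
    "blue red blue orange blue yellow green red pink "
    "blue green blue purple blue blue green yellow pink "
    "blue red pink green blue yellow green blue"
).split()

frequency = Counter(colors)
total = sum(frequency.values())  # sum of all frequencies

# relative frequency = frequency / sum of all frequencies
relative = {color: count / total for color, count in frequency.items()}

for color, rf in relative.items():
    print(f"{color:<8}{frequency[color]}/{total} ≈ {rf:.2f}")
```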

Technology

Here's a quick overview of how to create frequency and relative frequency tables in StatCrunch.

  1. Enter or import the data.
  2. Select Stat > Tables > Frequency.
  3. Select the column(s) you want to summarize and click Next.
  4. Add any modifications for an "Other" category and how to order the categories.
  5. Click Calculate; a new window containing the calculated table will open.
  6. You can then choose Options > Copy to copy the output for use elsewhere.

Organizing Discrete Data into Tables

Recall from Section 1.2:

A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of values. (Countable means that the values result from counting - 0, 1, 2, 3, ...)

Since we can list all the possible values (that's essentially what countable means), one way to make a table is just to list the values along with their corresponding frequency.

Example 1

Here are some data I collected from students in a previous Mth120 course. Each value is the number of children in the student's family (including the student).

2 2 2 4 5 3 3 3 3
2 1 2 3 5 3 4 3 1
2 3 5 3 2 1 3 2

An easy way to compile the data would then be to make a frequency or relative frequency table as we did before.

children frequency relative frequency
1 3 3/26 ≈ 0.12
2 8 8/26 ≈ 0.31
3 10 10/26 ≈ 0.38
4 2 2/26 ≈ 0.08
5 3 3/26 ≈ 0.12
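For discrete data like this, a minimal Python sketch can build one row per possible value in increasing order. Looping over `range(min, max + 1)` also lists any in-between value that happened to have a count of zero, which `Counter` reports as 0.

```python
from collections import Counter

# Number of children in each student's family (Example 1)
children = [2, 2, 2, 4, 5, 3, 3, 3, 3,
            2, 1, 2, 3, 5, 3, 4, 3, 1,
            2, 3, 5, 3, 2, 1, 3, 2]

counts = Counter(children)
n = len(children)

# One row per value from smallest to largest, including zero counts
table = [(value, counts[value], counts[value] / n)
         for value in range(min(children), max(children) + 1)]

for value, freq, rel in table:
    print(f"{value}  {freq:>2}  {freq}/{n} ≈ {rel:.2f}")
```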

Sometimes, however, we have too many values to make a row for each one. In that case, we'll need to group several values together.

Example 2

A good example might be the scores on an exam, ranging from 1-100. Here are some data from a past Mth120 class.

62 87 67 58 95 94 91 69 52
76 82 85 91 60 77 72 83 79
63 88 79 88 70 75 87

In this case, we'll have to set up intervals of numbers called classes. Each class has a lower class limit and an upper class limit, along with a class width. The class width is the difference between successive lower class limits.

To be consistent, the class width should be the same for each class. One good option might look something like this:

exam score frequency relative frequency
50-59 2 2/25 = 0.08
60-69 5 5/25 = 0.20
70-79 7 7/25 = 0.28
80-89 7 7/25 = 0.28
90-99 4 4/25 = 0.16
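Grouping into classes can be sketched in Python by mapping each score to the lower limit of its class. This is a minimal sketch that assumes classes of width 10 starting at a lower class limit of 50 (50-59, 60-69, ..., 90-99); other starting points or widths would give a different, equally valid table.

```python
from collections import Counter

# Exam scores from Example 2
scores = [62, 87, 67, 58, 95, 94, 91, 69, 52,
          76, 82, 85, 91, 60, 77, 72, 83, 79,
          63, 88, 79, 88, 70, 75, 87]

# Assumed layout: class width 10, first lower class limit 50
start, width = 50, 10

def lower_limit(score):
    """Lower class limit of the class containing this score."""
    return start + (score - start) // width * width

counts = Counter(lower_limit(s) for s in scores)
n = len(scores)

# Walk every class from 50 up to the one holding the maximum score
for lower in range(start, max(scores) + 1, width):
    upper = lower + width - 1
    print(f"{lower}-{upper}  {counts[lower]}  {counts[lower]}/{n} = {counts[lower] / n:.2f}")
```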

Organizing Continuous Data into Tables

Organizing continuous data is similar to organizing multi-valued discrete data. We have to form classes that don't overlap. I usually try to choose a class width that's either logical (e.g., 10 points for the exam scores above) or that gives 5 to 8 classes when complete.
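When aiming for a target number of classes, one simple rule of thumb (an assumption here, not necessarily the author's method) is to take the smallest whole-number width that is strictly larger than the range divided by the number of classes, so that classes starting at the minimum reach past the maximum. A minimal Python sketch applied to the exam scores:

```python
import math

# Exam scores from Example 2
scores = [62, 87, 67, 58, 95, 94, 91, 69, 52,
          76, 82, 85, 91, 60, 77, 72, 83, 79,
          63, 88, 79, 88, 70, 75, 87]

def class_width(data, num_classes):
    """Smallest whole-number width w with w > range / num_classes,
    so num_classes classes starting at min(data) cover max(data)."""
    data_range = max(data) - min(data)
    return math.floor(data_range / num_classes) + 1

for k in range(5, 9):
    print(k, "classes -> width", class_width(scores, k))
```

Rounding such a width up to a "nice" value such as 10 is the kind of logical choice described above.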

Example 3

For this example, let's consider the average commute time in each of the 50 states. The data below show the average daily commute, in minutes, for a random sample of 15 states.

23.1 18.3 23.2 19.9 26.6
24.8 23.1 23.2 22.7 29.4
22.3 30.0 25.8 21.9 16.7
Source: US Census

Do you know why this is a continuous random variable and not discrete? (Hint: It's not because of the decimal.)


This is continuous because the variable we're measuring, time, can take any value in an interval rather than only countable values. When, say, a marketing agent measures her commute time, she actually rounds to the nearest minute. If she reports 32 minutes, it's not exactly 32 minutes; it's 32 minutes to the nearest minute. In reality, it might be 32.15323623245134... (you get the idea).

To make a frequency or relative frequency table for continuous data, we use the same strategy we'd use for multi-valued discrete data.

average commute frequency relative frequency
16-17.9 1 1/15 ≈ 0.07
18-19.9 2 2/15 ≈ 0.13
20-21.9 1 1/15 ≈ 0.07
22-23.9 6 6/15 = 0.40
24-25.9 2 2/15 ≈ 0.13
26-27.9 1 1/15 ≈ 0.07
28-29.9 1 1/15 ≈ 0.07
30-31.9 1 1/15 ≈ 0.07
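For continuous data, each class is really an interval that includes its lower limit but not the next class's lower limit, so "16-17.9" means "at least 16 but less than 18". A minimal Python sketch of that binning rule for the commute data, using a start of 16 and a width of 2 as in the table:

```python
import math
from collections import Counter

# Average daily commute (minutes) for the 15 sampled states
commute = [23.1, 18.3, 23.2, 19.9, 26.6,
           24.8, 23.1, 23.2, 22.7, 29.4,
           22.3, 30.0, 25.8, 21.9, 16.7]

start, width = 16, 2  # classes 16-17.9, 18-19.9, ..., 30-31.9

def class_index(x):
    # Class k covers [start + k*width, start + (k+1)*width), so a
    # commute of exactly 18.0 would fall in 18-19.9, not 16-17.9
    return math.floor((x - start) / width)

counts = Counter(class_index(x) for x in commute)
n = len(commute)
num_classes = class_index(max(commute)) + 1

for k in range(num_classes):
    lower = start + k * width
    print(f"{lower}-{lower + width - 0.1:.1f}  {counts[k]}  {counts[k] / n:.2f}")
```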

Once we have these tables, we'll need to learn how to create some charts to display the information, which is what the next few pages are about.

Technology

Here's a quick overview of how to create frequency and relative frequency tables for quantitative data in StatCrunch.

Discrete Data

  1. Enter or import the data.
  2. Select Stat > Tables > Frequency.
  3. Select the column(s) you want to summarize and click Next.
  4. Add any modifications for an "Other" category and how to order the categories, and click Calculate.

Continuous or Multi-valued Discrete Data:

  1. Enter or import the data.
  2. Select Data > Bin Column.
  3. Select the column containing the data, select "Use fixed width bins", and set the lowest class limit (Start bins at:) and class (bin) width.
  4. Click Calculate.
  5. Select Stat > Tables > Frequency.
  6. Select the newly created bin column and click Calculate.*

* Note that these classes seem to overlap, but a class labeled "0-k" does not include k itself.

Creating a relative frequency table from a frequency table

If you are given a frequency table and need to create a relative frequency table, use the following steps, assuming that "Frequency" is the label of the column containing the frequencies - edit as needed.

  1. Click on Data > Compute > Expression.
  2. Enter the text "Frequency/sum(Frequency)" in the Expression box.
  3. If desired, enter a column label.
  4. Click Compute.
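The expression Frequency/sum(Frequency) simply divides every entry of the frequency column by the column total. A minimal Python sketch of the same operation, applied to the frequency column of the commute table:

```python
# Frequency column from the commute table (one entry per class)
classes = ["16-17.9", "18-19.9", "20-21.9", "22-23.9",
           "24-25.9", "26-27.9", "28-29.9", "30-31.9"]
frequency = [1, 2, 1, 6, 2, 1, 1, 1]

# Equivalent of Frequency/sum(Frequency): divide each entry by the total
total = sum(frequency)
relative_frequency = [f / total for f in frequency]

for c, f, rf in zip(classes, frequency, relative_frequency):
    print(f"{c}  {f}  {rf:.2f}")
```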

Cumulative Tables

Cumulative tables are just what the name implies: they show the sum of the frequencies up to and including each category. As with regular tables, we can make both cumulative frequency and cumulative relative frequency tables.

Example 4

To illustrate the idea, let's look at the average commute data from Example 3.

average commute frequency cumulative frequency
16-17.9 1 1
18-19.9 2 3
20-21.9 1 4
22-23.9 6 10
24-25.9 2 12
26-27.9 1 13
28-29.9 1 14
30-31.9 1 15

average commute relative frequency cumulative relative frequency
16-17.9 1/15 ≈ 0.07 1/15 ≈ 0.07
18-19.9 2/15 ≈ 0.13 3/15 ≈ 0.20
20-21.9 1/15 ≈ 0.07 4/15 ≈ 0.27
22-23.9 6/15 = 0.40 10/15 ≈ 0.67
24-25.9 2/15 ≈ 0.13 12/15 = 0.80
26-27.9 1/15 ≈ 0.07 13/15 ≈ 0.87
28-29.9 1/15 ≈ 0.07 14/15 ≈ 0.93
30-31.9 1/15 ≈ 0.07 15/15 = 1.00
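A cumulative frequency is a running total. This minimal Python sketch uses `itertools.accumulate` on the commute frequencies and then divides each running total by n; `fractions.Fraction` keeps the ratios exact so the last cumulative relative frequency is exactly 1.

```python
from fractions import Fraction
from itertools import accumulate

# Frequencies of the commute classes, 16-17.9 through 30-31.9
frequency = [1, 2, 1, 6, 2, 1, 1, 1]
n = sum(frequency)

# Running total: each class plus all classes before it
cumulative = list(accumulate(frequency))

# Cumulative relative frequency = cumulative frequency / n
cumulative_relative = [Fraction(c, n) for c in cumulative]

for c, crf in zip(cumulative, cumulative_relative):
    print(f"{c:>2}  {c}/{n} ≈ {float(crf):.2f}")
```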

Technology

  1. Enter or import the data.
  2. Select Stat > Tables > Frequency.
  3. Select the column(s) you want to summarize.
  4. Select Cumulative frequency or Cumulative relative frequency as the Statistic(s).
  5. Click Compute.

Creating cumulative tables from a frequency table

Unfortunately, StatCrunch has no menu option for adding cumulative columns to an existing frequency table. Instead, you need to compute them yourself with an expression.

  1. Go to Data > Compute > Expression.
  2. Enter cumsum([column name]) (where [column name] is the column in which the frequencies or relative frequencies are stored).
  3. If desired, enter a Column label.
  4. Click Compute.
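One thing to watch with this step: if the column holds relative frequencies that were already rounded, the rounding errors accumulate. This Python sketch uses `itertools.accumulate` as a stand-in for StatCrunch's cumsum and compares cumulating the rounded commute relative frequencies with cumulating the frequencies and dividing by n afterwards.

```python
from itertools import accumulate

# Commute relative frequencies as typed after rounding to two decimals
rounded_rf = [0.07, 0.13, 0.07, 0.40, 0.13, 0.07, 0.07, 0.07]
frequency = [1, 2, 1, 6, 2, 1, 1, 1]

# Running total of the rounded column: small errors add up
from_rounded = [round(x, 2) for x in accumulate(rounded_rf)]

# Running total of the counts, divided by n afterwards: no build-up
n = sum(frequency)
from_counts = [round(c / n, 2) for c in accumulate(frequency)]

print(from_rounded[-2:], from_counts[-2:])  # prints: [0.94, 1.01] [0.93, 1.0]
```

Cumulating the frequencies first reproduces the 0.93 and 1.00 in the cumulative relative frequency table above.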