Now that Alex, Beth, and Cristina have their data from existing data sources and/or from their own data collection efforts, it is time to make sense of the data. This phase is called analysis and interpretation. There are generally two forms of data analysis: quantitative and qualitative. The goal of quantitative analysis is to use numbers and statistics to describe your sample and to use these data to help explain any differences that may exist between your sample and the population. The goal of qualitative analysis, on the other hand, is to use words and stories to provide a narrative or picture of a situation or individuals.
Please note: This module emphasizes quantitative data analysis. To learn more about qualitative data, view Video 4.3, featuring Dr. Richard Reddick.
Objectives & Keywords
- Describe common data analysis terms
- Analyze data to understand relationships among data related to your question
- Recognize common misconceptions about data (e.g., correlation vs. causation)
- Apply basic data analysis techniques to attempt to answer your question
- Formulate claims about your data
Causation: A relationship between variables in which one variable causes a change in another.
Correlation: A statistical relationship between variables in which a change in one variable is associated with a change in one or more other variables.
Critical question: An inquiry often asked after examining the results from a data analysis that attempts to explain those results and prompts next steps.
Data analysis: The use of a variety of qualitative and quantitative techniques to better understand data. Correlation is a common quantitative technique discussed in this module, but many others exist. Qualitative techniques involve thematic analysis of the text to discover and understand categories, patterns, and themes.
Dependent variable: The variable that is of specific interest and is affected by independent variables. For Alex, the dependent variable is students’ ability to solve physics problems. For Beth, the dependent variable is students’ writing skills. For Cristina, a dependent variable is the number or percent of students graduating with a degree in a particular STEM discipline.
Independent variable: The variable that has an effect on the dependent variable. Independent variables can include a type of pedagogy (instruction), a student’s age, or a student’s socioeconomic status.
Mean: The average of all the data in a specific dataset found by dividing the sum of the dataset by the number of individual data points in the set.
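Median: The middle value in an ordered dataset, found by sorting the dataset from highest to lowest (or lowest to highest) and selecting the data point located in the middle. When the dataset has an even number of data points, the median is the average of the two middle values.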
Mode: The most frequent value in a dataset.
Qualitative data: Non-numerical data that provide in-depth descriptions or narratives, such as interview transcripts, written comments from surveys, or observations.
Quantitative data: Numerical data, such as test scores, class rankings, grades, or numerical ratings.
Scatterplot: A graph that shows the relationship, including any patterns, between two variables.
Variable: A quantity that is under study, represented numerically for each student. Examples include tenth-grade math assignment scores, number of absences, or number of students majoring in a STEM discipline.
At the Classroom Level: Alex
Alex sat down at his computer and reviewed his existing datasets. He had more data than he could analyze and decided he needed to focus his analysis on certain datasets. Although he would have liked to include the percentage of homework problems that students attempted as a measure of effort, those data would have to wait until next year, after he had collected them. For now, he was most interested in exploring the relationships among the test scores and determining whether attendance correlated with test scores. He hoped the data on test scores and attendance would help him understand why his 11th-grade students had difficulty applying formulas to solve physics problems. Although he thought he was making progress by revising his teaching methods this year, the changes he made didn’t seem to have an effect on test scores. The datasets he analyzed included:
- 11th-grade physics test scores
- 10th-grade reading test scores
- 10th-grade math test scores
- number of days each student had been absent
- number of days each student had been tardy
Alex wanted to examine the relationship between each of his independent variables and his dependent variable, which was the physics exam score. He could have created a table of all of these scores, but patterns would have been difficult to see in a table. Instead, he decided to create a scatterplot for each independent variable, plotted against the physics scores. In a scatterplot, each student is represented by a dot; the horizontal position of the dot corresponds to one score (conventionally the independent variable), and the vertical position of the dot corresponds to the other score (here, the physics score).
When Alex reviewed his scatterplots of each independent variable compared to his dependent variable, he summarized his findings in the following table:
|Independent Variable||Dependent Variable||Interpretation|
|Days absent||Physics scores||No trend, no impact on physics scores|
|Days tardy||Physics scores||No trend, no impact on physics scores|
|Reading scores||Physics scores||No clear trend; low/high reading scores = low physics scores|
|Math scores||Physics scores||Low math scores = low physics scores; high math scores = high physics scores|
From his analysis, Alex concluded that there was no relationship between students’ attendance and their physics scores. At first he was disappointed with the results because he had spent so much time gathering and analyzing those data, but then he realized that the results helped guide his future analysis. The analysis was still informative in its own right, and it helped him focus his attention on the reading and math scores.
The scatterplot for the last independent variable (math scores) is displayed below. Based on his data, Alex could claim that students who had low math scores in 10th grade didn’t score well on the physics tests in 11th grade.
When Alex noticed this relationship between math test scores and physics test scores, he wondered if there was a correlation between the two. A correlation is a statistical association between two variables. A positive correlation exists when one variable tends to increase as the other increases. A negative correlation exists when one variable tends to decrease as the other increases. It is important to remember that correlation doesn’t mean causation. Just because two variables are associated doesn’t mean that one causes the other.
Alex looked up online how to use the CORREL (short for correlation) function in Excel to determine the correlation between his students’ 10th-grade math scores and their 11th-grade physics scores. He discovered that the correlation coefficient (the statistic describing the strength and direction of the correlation) was 0.98. A correlation coefficient ranges from -1.0 to 1.0; values close to either end of that range indicate a strong relationship, and values close to 0 indicate a weak one. Because 0.98 is close to 1.0, he determined that this correlation was very strong. Because the coefficient was positive, this was a positive correlation. As the scatterplot demonstrates, the data points fall close to a line running from bottom left to top right, so the physics scores increase as math scores increase. Based on his data analysis, Alex proposed that students with stronger math skills in 10th grade would perform better on physics problems in 11th grade.
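For readers who prefer code to a spreadsheet, the calculation behind a correlation coefficient can be sketched in a few lines of Python. Excel's CORREL function returns the Pearson correlation coefficient, which is what this sketch computes. The function name `pearson_r` and the six pairs of scores are made up for illustration; they are not Alex's data, and the result will not match his 0.98.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of each pair's deviations from the two means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_deviation / math.sqrt(spread_x * spread_y)


# Hypothetical scores for six students (illustration only, not Alex's data).
math_scores = [55, 62, 70, 78, 85, 93]
physics_scores = [50, 60, 68, 75, 88, 90]

r = pearson_r(math_scores, physics_scores)
print(f"correlation coefficient: {r:.2f}")
```

A result near 1.0 or -1.0 indicates a strong association, and a result near 0 indicates a weak one. As with the spreadsheet version, a strong coefficient describes association only and says nothing about cause.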
At the Department Level: Beth
Like Alex, Beth had various types of data. She had students’ grades from first-year composition courses, their first-year GPAs, and their SAT scores. In addition, she had the faculty ratings of each student’s writing ability she had recently collected.
Beth remembered back to her graduate program when she had a brief introduction to statistics. She knew there were some basic statistics that could help her answer her question, “What factors contribute to students’ writing skills in literature courses?” She thought of three descriptive statistics that might be useful: mean, median, and mode. Beth did a quick Internet search to refresh her memory about these concepts. She discovered that the mean is simply the average of all of the data in a specific dataset. The mode is the most frequent value in a dataset, and the median is the value in the middle that separates the upper half of data from the lower half.
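The three descriptive statistics Beth reviewed can also be computed with Python's standard `statistics` module. The sketch below uses a made-up list of seven grade-point values, not Beth's data, purely to show how the three measures can differ for the same dataset.

```python
import statistics

# Hypothetical grade points for one course (illustration only).
grades = [4.0, 3.0, 3.0, 2.0, 3.0, 4.0, 1.0]

# Mean: the sum of the values divided by how many values there are.
mean_grade = statistics.mean(grades)

# Median: the middle value once the data are sorted.
# With an even number of values, statistics.median averages the two middle ones.
median_grade = statistics.median(grades)

# Mode: the value that occurs most often.
mode_grade = statistics.mode(grades)

print(f"mean={mean_grade:.2f}, median={median_grade}, mode={mode_grade}")
```

Here the single low grade pulls the mean below the median and mode, which is one reason to look at all three measures rather than the mean alone.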
Using a spreadsheet, Beth entered the course grades for each student, which she had received from her institution’s registrar. She then assigned a numerical value to each letter grade. In the column next to the grade she entered the students’ first-year GPAs, and in additional columns she entered their SAT writing scores and the faculty ratings. She did this for each composition course. Using the basic formula function in the spreadsheet, Beth calculated the mean for each variable in each course and then compared these statistics across courses.
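The spreadsheet steps described above (convert letter grades to numbers, then average each variable within each course) can be sketched in Python as follows. The 4-point mapping in `GRADE_POINTS` is an assumption, since the text does not say which numerical values Beth assigned, and the course names and student records are hypothetical.

```python
from statistics import mean

# Assumed 4-point scale; the mapping Beth actually used is not given.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Hypothetical records: (course, letter grade, first-year GPA).
records = [
    ("Course 1", "A", 3.1),
    ("Course 1", "B", 2.9),
    ("Course 2", "B", 3.6),
    ("Course 2", "C", 3.4),
]


def course_means(rows):
    """Group rows by course and return the mean of each variable per course."""
    by_course = {}
    for course, letter, gpa in rows:
        columns = by_course.setdefault(course, {"grade": [], "gpa": []})
        columns["grade"].append(GRADE_POINTS[letter])  # letter -> number
        columns["gpa"].append(gpa)
    return {
        course: {name: mean(values) for name, values in columns.items()}
        for course, columns in by_course.items()
    }


summary = course_means(records)
for course, stats in summary.items():
    print(course, f"mean grade={stats['grade']:.2f}", f"mean GPA={stats['gpa']:.2f}")
```

Additional variables, such as SAT writing scores or faculty ratings, could be added as further columns in the same way.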
Beth noticed four courses had higher mean grades compared to the other courses. More surprising to Beth was that in the four courses that had higher mean grades, the students actually had lower mean first-year GPAs and lower mean SAT writing scores.
These data prompted a couple of critical questions. First, why were students in the four courses generally getting better grades, as measured by the mean? Second, why were students with lower mean first-year GPAs and lower mean SAT writing scores performing better in these four literature courses than students with higher mean first-year GPAs and higher mean SAT writing scores? These questions suggested a couple of possibilities. One possibility was that faculty members in these four courses may have had lower standards when grading their students. Another possibility, also consistent with the data, was that these faculty members were teaching more effectively. It was even possible that something else was going on.
Beth decided to look more closely at the mode and median grades of the four courses with the higher mean grades and to compare the mode and the median across all the courses to see if they might illuminate the difference in mean grades. She noticed that in two of the four courses, the median and the mode grades were higher than the overall median and mode for all the courses. A mode higher than the overall mode indicated that the most frequently occurring grade in each of those two courses was higher than the most frequently occurring grade across all courses. In the other two courses, however, the median and mode grades were comparable to the overall median and mode grades. These results puzzled Beth even more. She wondered if the results for the latter two courses might be explained by the instructors in those courses employing more effective teaching methods.
At the Institutional Level: Cristina
Often it is useful to summarize data in a tabular format, especially when variables of interest are categorical (e.g., male and female), rather than numerical. Typically, each column presents values from one variable of interest, and each row presents values from the other variable of interest.
Cristina reflected on her question, “What trends in STEM attainment exist at our institution, and how do those trends compare at our peer institutions?” She then accessed the Texas Higher Education Coordinating Board (THECB) website and downloaded data about her institution and three other institutions. Cristina created a table in which the rows represent individual institutions, including her own, and the columns represent different STEM disciplines. She also inserted an additional row and column to show aggregate data across all institutions and disciplines. The individual cells indicated a percentage change in the number of STEM degrees awarded between 2000 and 2011, which Cristina calculated by hand.
|Computer Science||Engineering||Math||Physics||All disciplines|
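The percentage change in each cell, which Cristina calculated by hand, follows the standard formula: the difference between the ending and starting counts, divided by the starting count, times 100. A minimal sketch, assuming hypothetical degree counts (these are not THECB figures) and a helper named `percent_change` introduced only for illustration:

```python
def percent_change(start, end):
    """Return the percentage change from start to end."""
    if start == 0:
        raise ValueError("percentage change is undefined when the starting count is 0")
    return (end - start) / start * 100.0


# Hypothetical degree counts for one discipline at one institution.
degrees_2000 = 200
degrees_2011 = 250

change = percent_change(degrees_2000, degrees_2011)
print(f"change from 2000 to 2011: {change:+.1f}%")
```

A positive result indicates growth in degrees awarded and a negative result indicates a decline. Percentages based on very small starting counts can look dramatic, so it helps to keep the raw counts alongside them.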
After reviewing her summary table, Cristina discovered that the overall increase in STEM degree attainment at her institution was on par with the average statewide numbers. She also noticed that the number of degrees conferred in the computer science and physics departments at her institution had increased significantly more than at the other institutions, while the number of degrees conferred in the engineering department had increased significantly less than at the other institutions. She also found that the institutions varied quite a bit by discipline. For example, UT Austin showed a substantial decrease only in the number of degrees conferred in Computer Science, while Texas A&M showed a substantial decrease only in Physics. These numbers indicated to Cristina that her institution seemed to be doing a good job of improving STEM degree attainment, as compared to peer institutions, in all areas except Engineering.
This was somewhat surprising to Cristina. Given that she had been part of anxious discussions about STEM degree attainment, she had expected to find her institution in worse shape, compared to peer institutions, than it actually was. Cristina thought back to the other issue raised in many of these STEM conversations—poor student performance in gateway STEM courses—and realized that the degree attainment data did not tell her much about that issue.
Module in Action
As you learned from the case studies, there are many different ways to analyze data; the type of analysis you perform depends both on what type(s) of data you have and what form you wish your results to take. Your own data-driven decision-making journey will require analytical techniques specific to your data and situation, but there are some basic analytical techniques that you are likely to employ. The activities in this module will help you practice these techniques and build your data analysis skills.
Conclusion & Review
Module 4 offered you the opportunity to practice some basic analytical techniques to address your question. You also learned how to think critically about the results of an analysis to understand both its implications and limitations. When you decide to use data to answer questions that arise in your own work, you may be able to apply the analytical techniques you learned here, or you may require different ones. In the latter case, you can look up other statistical methods on your own or partner with someone who already has that knowledge. In either case, you will want to ask the same kinds of critical questions as you did in the activities associated with Alex's, Beth's, and Cristina's scenarios. These critical questions will help you determine whether your analysis provides a possible explanation for your question and/or presents any limitations. In Module 5, we will address moving from data to action.
- What is the difference between an independent variable and a dependent variable?
- What does a positive correlation signify?
- What are the differences among the terms mean, median, and mode?