Overview
Solution and Impact
We designed and developed custom training and testing software. After training, users were able to reliably identify the composer of a piano clip in the test. So we showed experimentally that composer styles exist, and that they can be learned.
Problem
Music experts talk about composers having unique styles of composition. This is a key piece of how music is evaluated, sold, and taught. So it is important that there is strong evidence for composers’ styles. But there was no empirical evidence of this in the academic literature. So we investigated whether composers have unique, learnable styles of composition.
Methods
Literature review, Experiment Design, Software Design, Survey, Statistical Data Analysis
Approach
Part 1: Research and Methodology - Literature Review
We reviewed and synthesized existing research to better understand the problem and evaluate initial hypotheses.
Part 2: Research and Methodology - Experiment
We used a quasi-experimental study design: We tested participants, trained them to distinguish composers’ styles, then tested again.
Part 3: Design - Learning Program
We designed a custom learning program for the training. We prioritized functionality over aesthetics because the software was for research purposes only.
Part 4: Design - Composer Style Test
We designed a new test of composers’ styles for before and after training.
Part 5: Results and Data Analysis - Statistics and Data Visualizations
We statistically analyzed our data to answer our research question.
Part 1: Research and Methodology - Literature Review
Our initial hypotheses, based on team members’ experience with studying and listening to music, were:
Composers have distinct styles of composition.
Composer style is hard to learn.
We explored the validity of these hypotheses through reviewing research on music learning, musical style, auditory perception, and perceptual learning. We read and discussed papers in our weekly meeting, identifying how each contributed to our understanding of our topic. Then we synthesized the research, identifying insights and gaps in the literature:
Humans can learn complex patterns (e.g. Thai, Krasne, & Kellman, 2015), including musical themes (Java, Kaminska, & Gardiner, 1995)
Musical period style has been extensively studied by psychologists (e.g. Hasenfus, Martindale, & Birnbaum, 1983)
The only 2 studies relevant to learning composer style showed some learning (Crump, 2002; Tyler, 1946), but did not separate composer style from musical form (i.e. sonata, rondo, waltz, etc.)
We identified a gap in the research: whether composer style exists and can be learned. Two studies suggested that composer style may exist, but neither separated the effects of composer from the effects of musical form. There was no direct evidence about the difficulty of learning composer style, though the lack of research suggests that it may be difficult.
This gap may be hard to fill, requiring extensive training of participants and careful study design. But it is also a theoretically and practically important gap, so we decided to do our best to fill it.
Part 2: Research and Methodology - Experiment
Study Design and Procedure
We decided to use a quasi-experimental pretest-posttest design: test participants, train them, and then test them again (see Figure 2). The study was 9 days long, beginning and ending with a lab session, so it would have been difficult to recruit a large enough number of participants to have a control group in addition to the experimental group.
| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 |
|---|---|---|---|---|---|---|---|---|---|
| Lab | Survey + Test + Learn | | | | | | | | Test |
| Remote | | Learn (45 min.) | Learn (45 min.) | Learn (45 min.) | Learn (45 min.) | Learn (45 min.) | Learn (45 min.) | Learn (45 min.) | |
FIGURE 2. DIAGRAM OF THE STUDY PROCEDURE: LAB TESTING, REMOTE LEARNING, LAB TESTING.
We tested participants before and after training, so that we could use the first Composer Style Test as a baseline of untrained test performance. Even though we could not compare performance to a control group, we could compare trained performance to baseline.
Participants
Recruitment: Participants were recruited through an online participant recruitment platform, which allows potential participants to sign up for studies they are interested in and available for.
Demographics: 41 UCLA undergraduate students (19 men, 24 women) volunteered.
Survey
We wrote a short open-ended survey to learn about our participants’ musical experience. We asked about their formal musical training and instruments that they played. On average, participants had about 7 years of musical instrument training, typically on the piano. 17 participants did not have music training before the study.
Part 3: Design - Learning Program
Content
| Study and Learning Program Iteration | Baroque period composers in Learning Program | Romantic period composers in Learning Program | Reasoning |
|---|---|---|---|
| First Attempt | (n/a) | Beethoven, Chopin, Schubert, Schumann | 4 Romantic composers (2 early, 2 late) to hold musical period style relatively constant |
| Second Iteration (focus of this case study) | Bach, Handel | Chopin, Schumann | Very little learning with first iteration: maybe styles were too similar, maybe confusion from similar spelling of Schubert and Schumann; 2 distinct musical periods (omitting Classical period in between) with 2 composers in each, and distinct composer names |
FIGURE 3. COMPOSERS AND PERIODS IN LEARNING PROGRAM.
We collected 100 15-second clips of piano music for each composer. We discussed various ways of selecting clips, and eventually decided to randomly select a start time for each clip to minimize bias.
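Random start-time selection can be sketched in a few lines of Python. This is an illustration only, not the procedure we actually used: the function name `random_clip_start` and the assumption that start times were drawn uniformly over the recording are hypothetical.

```python
import random

CLIP_SECONDS = 15  # clip length stated in the text


def random_clip_start(recording_seconds, clip_seconds=CLIP_SECONDS, rng=random):
    """Return a uniformly random start time (in seconds) such that the
    whole clip still fits inside the recording.

    Assumption: uniform sampling over valid start times (hypothetical).
    """
    latest_start = recording_seconds - clip_seconds
    if latest_start < 0:
        raise ValueError("recording is shorter than the clip length")
    return rng.uniform(0, latest_start)
```

Passing a seeded `random.Random` instance as `rng` would make the selection reproducible, which helps when the clip set needs to be rebuilt later.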
We wanted our users to learn the composers’ styles, and not just their favorite instrumentation, so we controlled for that variable by only using solo piano music. We selected the above composers because we found many high-quality recordings of several different solo piano pieces for each composer.
We also considered other ways composers and clips of music can be different from each other: volume, tempo, major or minor key, emotional quality (happy or sad or calm), etc. We wanted users to learn the essence of each composer’s style, so we included clips across a range of these variables for each composer.
Learning Algorithm
Users did not learn for a fixed amount of time. Instead, the learning was adapted to each user’s performance using Insight Learning Technology’s learning algorithm: Each user’s trial-by-trial performance was tracked for each composer. Users completed the Learning Program when they had answered correctly on most of the last several trials for each composer.
The learning algorithm was designed for learning visual materials, which are presented across space. Unlike visual materials, music is presented across time. We adapted Insight’s learning algorithm by removing response time from it, and by adding a delay before advancing to the next trial for incorrect answers. We wanted to make sure that participants heard most of the clip and had time to process it, so if they answered incorrectly, they had to continue listening (see Figure 5).
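The completion rule described above (correct on most of the last several trials for each composer) can be sketched as follows. This is not Insight Learning Technology's algorithm; the class name, the window of 5 trials, and the threshold of 4 correct are hypothetical values chosen only to illustrate the idea.

```python
from collections import deque

WINDOW = 5    # hypothetical: number of recent trials considered per composer
REQUIRED = 4  # hypothetical: how many of those trials must be correct


class MasteryTracker:
    """Track trial-by-trial accuracy per composer and report completion."""

    def __init__(self, composers, window=WINDOW, required=REQUIRED):
        self.required = required
        # One fixed-length history per composer; old trials drop off automatically.
        self.history = {c: deque(maxlen=window) for c in composers}

    def record(self, composer, correct):
        self.history[composer].append(bool(correct))

    def mastered(self, composer):
        recent = self.history[composer]
        # Require a full window so early lucky guesses do not count as mastery.
        return len(recent) == recent.maxlen and sum(recent) >= self.required

    def finished(self):
        return all(self.mastered(c) for c in self.history)
```

Note that response time does not appear anywhere in this sketch, matching the adaptation described above for music.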
User Flows
The learning program had two parts: a user signup and login flow, and a flow for learning trials with feedback (see Figure 5).
Sign Up and Log In
Users set up accounts by creating a unique username and a password, and entering an email. Account creation required entering an email so that users could be contacted. Users accessed the learning program remotely by logging in with their username and password between laboratory sessions.
Learning Trials
The learning program was composed of learning trials (see video). In each trial, users heard a clip, saw “Which composer?”, clicked on a button labeled with a composer’s name, and then received feedback. The order of trials was customized for each user by the learning algorithm. We randomized the order of composer name buttons on each trial so that users learned the composer’s names and not a particular button location.
Trial Structure (see Figure 4)
Listen to clip
Attempt to identify composer of the clip by clicking on the composer’s name
Receive feedback on trial accuracy - feedback both helps users learn and motivates them
If the answer was correct, give feedback that user was correct, then continue to the next trial - assume that the user knows the composer
If the answer was incorrect, show the correct answer and listen to (at least) 10 total seconds of the clip, then continue to the next trial - assume that the user does not know the composer and needs to get some more musical information (which is presented over time) from the clip
If the user did not respond in 30 seconds, show the correct answer and count as incorrect, then continue to the next trial - assume that the user does not know the composer even after hearing the whole clip
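The trial structure above can be summarized as a small decision function. This is a sketch under stated assumptions, not the Learning Program's code: the function names and the returned dictionary are hypothetical, and we assume "10 total seconds" means the user keeps listening for whatever remains of 10 seconds after an incorrect answer.

```python
import random

MIN_LISTEN_AFTER_ERROR = 10.0  # seconds, from the trial structure
RESPONSE_TIMEOUT = 30.0        # seconds, from the trial structure


def shuffled_buttons(composers, rng=random):
    """Return the composer names in a fresh random order for one trial."""
    buttons = list(composers)
    rng.shuffle(buttons)
    return buttons


def resolve_trial(correct_composer, response, response_time):
    """Decide the feedback for one learning trial.

    response is None when the user never clicked a button.
    """
    if response is None or response_time >= RESPONSE_TIMEOUT:
        # No response in time: show the answer and count the trial as incorrect.
        return {"correct": False, "show_answer": correct_composer, "extra_listen": 0.0}
    if response == correct_composer:
        return {"correct": True, "show_answer": None, "extra_listen": 0.0}
    # Incorrect: show the answer and keep playing until 10 seconds have been heard.
    extra = max(0.0, MIN_LISTEN_AFTER_ERROR - response_time)
    return {"correct": False, "show_answer": correct_composer, "extra_listen": extra}
```

For example, an incorrect answer 4 seconds into a clip would leave 6 more seconds of enforced listening, while an incorrect answer after 12 seconds would leave none.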
Part 4: Design - Composer Style Test
Content
We wanted to test users’ ability to identify the composer of a short clip of piano music. We tested whether users learned the 4 composers in the Learning Program training.
We wanted to have a high chance of showing learning, so we decided to test users on more than just the composers included in the training. We debated exactly who else to include in the test, then decided to include another composer in each trained musical period (Baroque, Romantic) and one composer each from two other periods (pre-Baroque and post-Romantic, see Figure 5).
This was our reasoning: If users learned the styles of the composers in the training but confused the composers’ names, then users would show learning by distinguishing trained composers from the other composer in each trained period. If users learned the period styles only, then users would still show learning by distinguishing trained periods from untrained periods.
We had 24 clips in the test. We included more than one clip per composer to get more data points to combine into better estimates of learning, and so that users could not just respond to one particularly distinctive clip.
| | Baroque | Romantic | Other Periods |
|---|---|---|---|
| In Learning Program | Bach, Handel | Chopin, Schumann | n/a |
| Not in Learning Program | Scarlatti | Mendelssohn | pre-Baroque: Byrd; post-Romantic: Debussy |
FIGURE 5. COMPOSERS AND PERIODS IN TEST, BY INCLUSION IN LEARNING PROGRAM.
Because we were testing users more than once, we built more than one version of the test. We wanted the test to be new each time users experienced it, so that performance improvements were not (as much) from familiarity with the test.
We used each version as the first test for some participants and as the second test for other participants. We did this counterbalancing in case one version of the test happened to be harder, so that test difficulty would not bias our results.
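Counterbalancing of this kind can be sketched as a simple alternating assignment. The sketch assumes exactly two test versions, labeled "A" and "B" here; those labels, the function name, and the alternation by participant index are hypothetical.

```python
def assign_test_order(participant_index, versions=("A", "B")):
    """Alternate which version serves as the pretest, so that each version
    appears equally often as the first and as the second test.

    Assumption: two versions, assigned by participant order (hypothetical).
    """
    first = versions[participant_index % 2]
    second = versions[(participant_index + 1) % 2]
    return {"pretest": first, "posttest": second}
```

With this scheme, any difference in difficulty between the versions affects pretest and posttest scores about equally across the sample.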
Built in PsychoPy
We built the test in PsychoPy, a Python-based GUI for building psychology experiments. We (the researchers) had time regularly dedicated to this project, but our developers were working on several other projects that took priority for their time.
Instead of waiting weeks or months for developers to have time to build the test, we (the research team) built it ourselves. We decided to build in PsychoPy because it was easy and fast to learn and use. At the time, PsychoPy did not have the option of including clickable buttons, so we set up responding through keypresses.
User Flow
The test consisted of the 24 test trials. On each trial, users heard a clip and pressed a key to indicate who they believed was the composer of the clip.
The trials in the test were similar to the trials in the Learning Program, but simpler in structure: users heard a clip while seeing the screen (see Figure 6), and then pressed a key to indicate who they thought composed the clip. Test trials were different from learning trials in that users did not receive feedback, because we did not want users to learn during the tests. As soon as a user responded to a trial, the next trial began (see Figure 7).
Trial Structure:
Clip plays
Keypress response collected (1, 2, 3, 4, 5, 6, or 7)
(No feedback)
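Scoring a test trial amounts to comparing the pressed key with the correct response category for the clip's composer. The sketch below is plain Python rather than PsychoPy code. The pairing of keys to responses is an assumption (the study used keys 1 to 7, but the exact assignment is not given here), while the correct categories follow Figure 5.

```python
# Hypothetical key-to-response pairing; only the key range 1-7 comes from the text.
KEY_TO_RESPONSE = {
    "1": "Bach", "2": "Handel", "3": "Chopin", "4": "Schumann",
    "5": "Other Baroque", "6": "Other Romantic", "7": "Other Period",
}

# Correct response category for each composer in the test (see Figure 5).
CORRECT_RESPONSE = {
    "Bach": "Bach", "Handel": "Handel",
    "Chopin": "Chopin", "Schumann": "Schumann",
    "Scarlatti": "Other Baroque", "Mendelssohn": "Other Romantic",
    "Byrd": "Other Period", "Debussy": "Other Period",
}


def score_trial(composer, key):
    """Return True if the pressed key matches the clip's correct category."""
    return KEY_TO_RESPONSE[key] == CORRECT_RESPONSE[composer]
```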
Part 5: Results and Data Analysis - Statistics and Data Visualizations
Signal Detection
We noticed that participants were biased towards the “other” responses (“Other Baroque”, “Other Romantic”, “Other Period”), so we decided not to analyze the raw data. Instead, we decided to use signal detection statistics, because signal detection models the effects of sensitivity - the ability to tell if something is there or not - separately from response biases. If sensitivity is the signal, then bias is noise. (See my blog post on signal detection for more details.) We used formulas in Google Spreadsheets to transform the raw data into signal detection measures.
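For readers who prefer code to spreadsheet formulas, here is a sketch of the standard equal-variance signal detection measures in Python. It assumes sensitivity is measured as d' and applies a log-linear correction so that hit or false-alarm rates of exactly 0 or 1 do not produce infinite values; both the choice of correction and the function names are assumptions, not a record of our spreadsheet.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def _rates(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction (assumed here): add 0.5 to counts, 1 to totals.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return hit_rate, fa_rate


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    hit_rate, fa_rate = _rates(hits, misses, false_alarms, correct_rejections)
    return _z(hit_rate) - _z(fa_rate)


def criterion(hits, misses, false_alarms, correct_rejections):
    """Response bias: c = -(z(hit rate) + z(false-alarm rate)) / 2."""
    hit_rate, fa_rate = _rates(hits, misses, false_alarms, correct_rejections)
    return -(_z(hit_rate) + _z(fa_rate)) / 2
```

For a given response (say, "Bach"), a hit is answering "Bach" to a Bach clip and a false alarm is answering "Bach" to any other clip. A d' of zero means the participant cannot tell that composer's clips from the rest, regardless of how often they choose that response.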
ANOVA for Composers
We wanted to look at the effect of two variables, test time and composer, at the same time (see Figure 8), so we conducted a two-way within-subjects Analysis of Variance (ANOVA) in SPSS. A custom hypothesis test in SPSS syntax showed that participants’ sensitivity, averaged across composers, did not significantly differ from zero (no sensitivity) when they were tested at the beginning of the study (pretest - blue bars). Participants did not know composers’ styles before the study.
Averaged across composers, participants showed significantly higher sensitivity at the end of the study (posttest - red bars) than at the beginning. Participants showed improvement across composers, and for almost every individual composer.
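The pretest-versus-posttest comparison can be illustrated without SPSS. The sketch below is not the ANOVA we ran; it uses a paired t-test on each participant's sensitivity averaged across composers, which for a factor with only two levels (pretest, posttest) gives the same test as the within-subjects main effect (F equals t squared). The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for matched pre/post scores.

    pre and post are per-participant sensitivities averaged across composers,
    listed in the same participant order.
    """
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

Passing a list of zeros as `pre` turns this into a one-sample test of whether the scores in `post` differ from zero, which is the form of the pretest check described above.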
ANOVA for Musical Periods
We did not directly train participants on musical period style. Participants may have incidentally learned period styles from the Learning Program: A composer’s period was listed in parentheses on his button (see Figure 1 and video above). Additionally, we found evidence of sensitivity to period styles in previous research (see literature review).
To see the effects of test time and period style on sensitivity, we conducted another ANOVA (see Figure 9). Participants significantly improved their sensitivity from the initial test to the final test for both trained periods (Baroque, Romantic) and for the combination of the pre-Baroque and post-Romantic periods (Other Period). Even though period style was not directly trained, participants learned period styles.
Conclusions and Impact
Overall, we successfully trained users in composers’ styles, showing experimentally that composers’ styles exist and can be learned.
Users who volunteered for the study were already interested in music, and most had years of musical training. Despite their experience, the initial test showed that users could not reliably distinguish composers’ styles at the beginning of the study (blue bars in Figure 8).
After just a few hours of training through the Learning Program, users showed significant improvement on the final test (red bars in Figures 8 and 9). Users improved in their ability to correctly identify both the composer and the musical period of a clip of piano music. Users learned musical period styles even though the styles were not directly trained.
Reflection
Learnings - first iteration
First attempt: I joined the project as a research assistant with a background in psychology and music. I gained some of my first research experience outside of a classroom. With the lead researcher and my fellow research assistant, I helped design, build, and run participants in the first iteration of the Learning Program. Specifically, I helped choose included composers, suggested the longer enforced delay for incorrect trials, collected and analyzed musical clips, and ran participants.
Learnings - second iteration
Second iteration (focus of this case study): I was the lead researcher, partnering with the researcher who led the first iteration as my co-lead. Through managing the study and our team of research assistants, I gained valuable experience with a variety of research tasks. I led my first team of researchers, training research assistants in their tasks (collecting music clips, reading and discussing papers, running participants, helping program the test, helping transform the raw data into signal detection measures, presenting at lab meetings and conferences) and mentoring them in research design and ethics, analysis, and programming.
I also practiced collaborating in carefully designing complex research. Prior research could not clearly show whether composer style existed, so designing our study well was critical. Participants in the first iteration did not show much learning, so we redesigned the Learning Program to make it less challenging. I ensured that we had more than one composer per period, and that we controlled for other important variables. We also wrestled through designing a new test that would capture more learning from our new Learning Program. By working through the design with my more experienced colleagues, I learned more about how to think through a broad range of possibilities.
Additionally, I gained more experience statistically analyzing data with signal detection and ANOVA analyses. I taught the research assistants about signal detection, and supervised their calculation of signal detection measures. Using signal detection allowed us to model the ability to recognize the composer (signal) separately from the bias (noise) in our data. I conducted ANOVA analyses on the signal detection measures in SPSS, gaining more experience with running statistics and with SPSS.
Finally, I gained experience communicating findings visually, orally, and in writing. I was the lead author of our paper, leading the research and writing most of the text, in collaboration with my co-lead and manager. I also created a scientific poster based on the research, and presented it at the International Conference on Music Perception and Cognition. My co-lead and I also coached our research assistants through making and presenting a poster on the research for a competitive undergraduate research conference.
What I Would Do Next
If I were to follow up on this study, I would next run a true experiment to confirm the findings of this quasi-experimental design. I might also interview some experts on the composers in the Learning Program to collect qualitative data on how experts characterize the composers’ styles, and then quantitatively test those descriptions with musical novices by having the novices match the descriptions to longer clips of music.
I would also be interested in trying to replicate these findings in a very different kind of music, like electronic dance music (EDM). That would be an extreme test of how far our results generalize, and more relevant to the majority of the music industry.