Abstract
Introduction
Aesthetic experience, with its observable aspects and phenomenal qualities, has been the focus of philosophical aesthetics since its beginnings (Nadal & Vartanian, 2022). In recent years, the interest in aesthetic experience has acquired new momentum. No longer is aesthetic experience reduced to a recipient's cognitive understanding of the artwork. Experiential and phenomenal aspects are increasingly considered in addition to, or even instead of, cognitive appraisals. Shusterman (1997) identified four dimensions discussed within this wider, experiential context: The
Aesthetic experience is thus conceptualized as practice, which in the case of music may involve embodied experiencing through bodily movements, singing along, and affective responses. Becker (2007) viewed musical experience as a process of reenactment (
In the field of empirical aesthetics, several models of aesthetic experience have been developed (Berlyne, 1960; Brattico et al., 2013; Fechner, 1876; Leder et al., 2012; Leder & Nadal, 2014; Pelowski et al., 2016). None of these, however, have specifically addressed
Approaches considering the relations between listener, music, and context have so far focused only on single aspects of aesthetic experience, such as musical preferences (Greasley & Lamont, 2016; LeBlanc, 1982; North & Hargreaves, 2010) or music-induced emotions (Zentner & Scherer, 2001). A comprehensive account of the complexity of the phenomenal aspects of musical experience, and of how experience emerges from the association of subject- and object-related aspects and their situational and sociocultural framing, is still a desideratum.
In general, there is a growing correspondence between recent approaches in philosophical aesthetics and approaches that, informed by the cognitive sciences and explicitly relying on empirical data, emphasize the roles of the body and physical environment for musical experiencing (Cox, 2016; Krueger, 2009; Schiavio et al., 2014). This corporeal turn in (music) psychology is often termed
Empirical research on the experience of, and physiological responses to, music has become an important research field in psychology and systematic musicology, and has mainly focused on music- and person-related factors. Situational and framing factors were deemed less relevant than measures monitored predominantly in the laboratory; only a few studies have been conducted in ecologically valid concert settings (e.g., Egermann et al., 2013; Høffding et al., 2024; McAdams et al., 2004; Stevens et al., 2014; Tschacher, Greenwood, Egermann, et al., 2023).
The much-cited theoretical framework of emotional responses to music by Juslin and Västfjäll (2008) distinguished different theories of emotion induction and applied these to music. They listed the following psychological mechanisms for the induction of emotion:
Recently, Stevens et al. (2014) have carried out focus-group interviews to analyze the effect of varying locations in several concert environments, calling for a combination of different approaches in audience research. Given that questionnaires and self-report measures are the traditional methods of concert researchers, they promoted the future use of physiological data, brain scans, and observational data, since “recording a range of indicators of audience response to a live performance will shed light on the interplay between sensory modes such as vision, audition, kinesthesis, and changes in physiological arousal” (Stevens et al., 2014, p. 85).
We agree with this point of view and assume that embodied physiological and emotional states are insufficiently represented by the verbal reports of standardized surveys (Barsalou, 2008; Behne, 1982; Clarke, 2005; Tröndle et al., 2014). Only a combination of different data types, including listeners’ peripheral physiological responses, body movements, and facial expressions, can provide a holistic, integrated view of experiential processes. Consequently, and to include the mentioned additional data types, we are proposing an integrative approach to studying music experience in concerts. The methodological framework described in this article was designed to realize comprehensive data acquisition.
In an earlier project, we developed a research setup to study art experience in a museum context (eMotion – mapping museum experience 1 ). The rationale of that methodology, developed in a different realm of aesthetic research, served as a building block for the methodological framework presented here, which covers concert experience. An integrated research approach provides novel insights and a holistic understanding of aesthetic experience. In the following section, this methodological design, adapted to concert research, is introduced in detail.
This methodology was developed to address the physiological, experiential, and behavioral dimensions of aesthetic experience. Its goal is to collect dense information on listeners’ cognitive responses to the music, such as appraisals of its expressive, artistic, and performative qualities, as well as self-reported responses to the music in the situation (experiences, emotions). Data are captured as simultaneous time series, which opens the potential to model phenomena of entrainment and synchrony. We consider aspects of the concert frame (spatial and architectural disposition, room acoustics, the aura and reputation of a venue) and listeners’ relation to the presence of others (being part of the audience, relation to musicians). This methodological approach was elaborated to cover three different timescales: the whole concert (timescale 1), the pieces presented in the concert (timescale 2), and specific short music passages (timescale 3).
The main objective of the methodological framework is therefore to implement data acquisition under the field conditions of real live concerts open to the public. Guarding ecological validity and ensuring low invasiveness were major tenets of this framework. As it is a common concern that the act of measurement may alter the very data measured (the “Hawthorne effect”), we planned to include a control group that would fill out the self-report scales but not provide physiological recordings. A further objective was to allow for the integration of data sources by triangulating objective and subjective data. This demanded exact timestamping of the various data sources in order to merge self-reports with physiological measures in the post-concert survey.
Methods
The research design involved the integration of quantitative and qualitative methodologies to assess the behavior, physiology, and self-reported aesthetic-emotional experiences of concertgoers within live classical music concerts. The methodological framework was tested in 3 public pilot concerts and finally implemented in 11 public concerts of the large-scale project ECR–Experimental Concert Research. 2
Overall, quantitative data acquisition in the project included the collection and analysis of physiological measurements of electrodermal activity, cardiac activity, and respiration measured by sensors attached to one hand and a breathing belt. Videos were recorded of the audiences and the music ensemble from different perspectives and formatted to allow motion-capture techniques and detection of facial expressions in the audience. High-quality audio recordings of the concerts were collected to enable audio analysis. Self-reports of listeners were assessed through questionnaires administered by electronic hand-held devices in standardized pre- and post-concert surveys.
Qualitative research methods complemented the wide range of data acquisition methods; these included focus-group discussions with audience members after the concerts and interviews with the musicians. Further, the video recordings of the audiences were also accessible to qualitative analyses.
In the following subsections, we will first introduce the
Participants’ Journey

Questionnaire hall in Radialsystem, Berlin, 2022. Photo: Phil Dera.
After signing the consent form and receiving the token, each participant was assisted in filling out the pre-concert questionnaire on the provided tablet computer. Assistants had been trained in advance in all procedures and the sequence of methodological steps.
After completing the pre-concert questionnaire, participants proceeded into the concert hall. They were greeted by further assistants who showed them to their individually assigned seats with sensors, and control-group participants to their seats in the allocated row (Figure 2). Regular visitors independently entered the hall right before the concerts.

Participant journey – pre-concert and in-concert.

Custom-developed fabric glove with sensors for electrodermal activity (under the black cloth around two fingers), cardiac activity (gray finger clip), and black respiration belt (over the waist). Photo: Phil Dera.
After the sensors were attached, a sensor-placement check was carried out. Using the dashboard of the project’s web interface, the project engineer monitored each participant's incoming real-time data. Each of the three sensor types (for electrodermal activity [EDA], blood volume pulse [BVP], and respiration) was checked individually. Typically, a flat signal indicated that sensor placement was insufficient and needed better fixation by the respective assistant. Common issues also included loose respiration belts and loose BVP clips on thin fingers. The sensor-placement check greatly contributed to the quality of the physiological data. After sensor placement was completed, the musical program of approximately 70 min started.
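A minimal sketch of how such a flat-signal check could look, assuming short windows of raw samples per channel are available to the dashboard backend; the channel names, window length, and variance threshold are illustrative assumptions, not the project's actual code:

```python
import numpy as np

# Illustrative flat-signal check: a channel whose recent samples barely vary is
# flagged so an assistant can refit the sensor. The threshold is an assumption.
FLATNESS_THRESHOLD = 1e-3  # minimal acceptable variance (arbitrary units)

def check_placement(window: dict) -> dict:
    """Return True per channel if the last few seconds of data look alive."""
    return {channel: bool(np.var(samples) > FLATNESS_THRESHOLD)
            for channel, samples in window.items()}

# Example: a flat EDA trace indicates a placement problem.
window = {
    "eda": np.full(512, 0.02),               # flat -> needs better fixation
    "bvp": np.sin(np.linspace(0, 20, 512)),  # pulsatile -> OK
    "resp": np.sin(np.linspace(0, 4, 512)),  # slow breathing wave -> OK
}
print(check_placement(window))  # {'eda': False, 'bvp': True, 'resp': True}
```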

Participant journey – post-concert.
As soon as peak processing was finished, participants started filling out the questionnaires provided on the tablets. They also received headphones because short music segments together with video sequences (timescale 3) were presented as part of the post-concert questionnaire (see below,
The post-concert questionnaire assessed participants’ listening experiences and cognition during the concert (Behne, 1997; Rössel, 2011; Weining, 2022). To analyze the difference between pre-concert expectations and in-concert experiences, a group of corresponding items was included in both the pre- and post-concert questionnaires. The participants were asked to rate the concert as a whole (timescale 1) and also each musical piece separately (timescale 2) regarding their aesthetic experiences. The questionnaires will be made available online.
Furthermore, timescale 3 referred to short music segments representing specifically salient passages of the concert. The music of the just-completed concert was divided into 96 segments (mean duration, 40 s). These 96 segments were predefined by two of the authors with a strong musicological background so that each represented a musical unit following the inherent compositional logic. In analogy to language, each segment contained a single “phrase” or “sentence,” so the durations of segments differed slightly in order to represent meaningful musical entities. Out of the 96 segments of each concert, eight segments were presented to each participant in the post-concert questionnaire. Three segments (one for each piece) were so-called index segments, which were pre-determined and addressed the same passage across concerts for all participants. One segment was selected randomly for each participant. Four further segments were individualized for each participant based on their peak physiological responses.
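As a rough illustration of the selection logic described above, the following sketch assembles the eight segments for one participant (three fixed index segments, four individual peak segments, and one random segment). All identifiers and data structures are hypothetical:

```python
import random

def select_segments(index_segments, peak_segments, all_segments, rng=random):
    """Assemble the eight post-concert segments for one participant."""
    chosen = list(index_segments)            # same three passages for all participants
    chosen += list(peak_segments)[:4]        # this participant's four physiological peaks
    remaining = [s for s in all_segments if s not in chosen]
    chosen.append(rng.choice(remaining))     # one randomly drawn segment
    return chosen

all_segments = list(range(1, 97))     # the 96 predefined segments of the concert
index_segments = [7, 41, 80]          # illustrative indices, one per piece
peak_segments = [12, 55, 63, 90]      # from this participant's peak detection
print(select_segments(index_segments, peak_segments, all_segments))
```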
Participants were asked to evaluate each of these eight segments using a stimulated-recall method. Using headphones and the tablet computer, the participants were presented with the video snippet showing this segment exactly as it was played in the concert just attended (Figure 5). After presentation of each video, participants rated the segment using items of the Aesthetic Emotions Scale (AESTHEMOS, Schindler et al., 2017).

Presentation of a video snippet to a participant on a tablet computer. Photo: Phil Dera.
Technical Set-Up
Up to 88 participants had their electrodermal activity, cardiac activity, and respiration recorded while listening to the music. Only a few minutes after the end of the concert, and hence also the end of the recording, participants would take part in the questionnaire, which included video snippets of exactly those four “peak segments” during which each participant had shown peak physiological values. This demanded two processes running in parallel after the concert: First, the video recording of the concert had to be segmented to produce all 96 video snippets; hence, an exact logging of segments with the timestamps of segment starts and ends was needed during each concert. Second, from the raw physiological data of the sensors, meaningful physiological measures had to be extracted, namely heart rate (HR), heart-rate variability (HRV), respiration rate (RR), and skin-conductance response (SCR), in order to define the four peak segments. Additionally, the three index segments and one random segment had to be determined (Figure 6).
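The following sketch indicates, under simplifying assumptions, how such measures could be derived from the raw segments; the sampling rate, the peak-finding parameters, and the crude SCR and HRV definitions are illustrative and not the project's actual processing pipeline:

```python
import numpy as np
from scipy.signal import find_peaks

FS = 100  # assumed sampling rate in Hz; the real rate depends on the hub configuration

def heart_rate(bvp: np.ndarray) -> float:
    """Heart rate (beats/min) estimated by counting pulse peaks in a BVP segment."""
    peaks, _ = find_peaks(bvp, distance=FS * 0.4)   # refractory period, ~150 bpm max
    return len(peaks) / (len(bvp) / FS / 60)

def hrv_sdnn(bvp: np.ndarray) -> float:
    """HRV as SDNN: standard deviation of inter-beat intervals (in seconds)."""
    peaks, _ = find_peaks(bvp, distance=FS * 0.4)
    return float(np.std(np.diff(peaks) / FS))

def respiration_rate(resp: np.ndarray) -> float:
    """Respiration rate (breaths/min) from the breathing-belt signal."""
    peaks, _ = find_peaks(resp, distance=FS * 2)    # at most ~30 breaths/min
    return len(peaks) / (len(resp) / FS / 60)

def scr_amplitude(eda: np.ndarray) -> float:
    """Crude skin-conductance response measure: range of the EDA segment."""
    return float(np.max(eda) - np.min(eda))
```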

Plan of the informational infrastructure.

Custom Biosignalsplux 8-Channel Hub.

Seats in the concert hall prepared for participants with physiological measurement.
The set of three sensors was then plugged into a Biosignalsplux 8-channel hub (Figure 9). This modified hub had an extra micro-USB port that was attached via a shielded cable to the port of each participant's Raspberry Pi. The Raspberry Pis were connected via Gigabit Ethernet to the wired network. A wired network was chosen because it provided the lowest latency while ensuring higher reliability than a wireless connection. The hubs could be programmed to start and stop data acquisition and thus stream the live data from the sensors to the attached Raspberry Pi.

Unit holder for Biosignalsplux Hub linked with sensors (left) and Raspberry Pi (right).
The 88 Raspberry Pi computers were used to interface with the hubs. They acquired data via the USB connection, stored the data in an Apache Kafka topic (free software of the Apache Software Foundation, Wakefield, USA), and generated peaks from the stored data.
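A sketch of how each Raspberry Pi might publish its incoming samples to Kafka, assuming the kafka-python client; the topic name, broker address, and message layout are illustrative assumptions rather than the project's actual schema:

```python
import json
import time
from kafka import KafkaProducer  # kafka-python client

# Illustrative producer: each Raspberry Pi could forward its sensor samples to
# a Kafka topic roughly like this. Broker address and topic name are assumed.
producer = KafkaProducer(
    bootstrap_servers="kafka.local:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_sample(participant_id: str, eda: float, bvp: float, resp: float) -> None:
    sample = {
        "participant": participant_id,
        "timestamp": time.time(),   # later aligned with the central server clock
        "eda": eda,
        "bvp": bvp,
        "resp": resp,
    }
    producer.send("physiology-raw", value=sample)

publish_sample("seat-A12", eda=0.41, bvp=0.73, resp=0.12)
producer.flush()
```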
A project-developed glove held the three sensors close to the participant's hand. After the pilot studies, we decided to manufacture gloves for several reasons: the cables could be integrated into the gloves and thus fixed in place, reducing failures and artifacts in data collection. Participants were asked not to move the hand much and to rest it on the leg. As the sensors ruled out applauding by clapping, the team recommended showing appreciation by stamping the feet. This was adopted without problems by all audiences in the project concerts.

Top: Bird's-eye infrared cameras connected to Raspberry Pis. Bottom: Stills taken from the bird's-eye perspective (participants; ensemble).
Two Raspberry Pis, each with a 12-megapixel (12 MP) camera module and a USB sound card, were used to record the musicians on stage. The resulting videos, including the timestamp (Figure 11), were dynamically integrated into the segment ratings of the post-concert questionnaire.

Still taken from stage camera.
High-resolution infrared cameras (Geutebrück GmbH, Windhagen, Germany) were mounted above the stage and directed toward the audience to capture the participants’ facial expressions and gestures (Figure 12; four cameras at the Pierre Boulez Saal; eight at the Radialsystem). These data were recorded with the same timestamp as all other recordings (using Clapperboard software) and stored on a camera-specific server.

Still taken from the audience camera.
The basic software requirements included presenting the questionnaires on the tablet devices; acquiring and storing the sensor data; processing the sensor data and storing the segments; and recording the segment videos and audio and integrating them individually into each participant's post-concert questionnaire. All datasets received the timestamp of the central server to allow synchronized cross-analyses.
The open-source online-survey software LimeSurvey (LimeSurvey GmbH, Hamburg, Germany) was used because of its flexible customization options, which the project needed in order to dynamically integrate the segment videos into the questionnaires. The questionnaires ran on a standard LAMP stack (Linux, Apache, MySQL, PHP). The questionnaire data could be exported in numerous formats for subsequent analysis.
The starting and stopping of the data acquisition of all devices was controlled via the dashboard (web interface of this application: React.js/Node.js/MongoDB). The software had two main pages in its navigation, the Devices and the Logger pages.
On the

Monitoring of physiology sensors on the Devices page.
The

Logging of segments and events using the Logger page.
Logging determined the unique position of each of the 96 pre-defined segments across all pieces (Beethoven, Dean, Brahms). If the pieces were performed in a different sequence in a concert, the segments could thus still be identified. The start and finish properties represented the start and finishing times of each segment in Unix time.
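As a simple illustration, a logged segment could be represented roughly as follows; the field names mirror the description above (start and finish in Unix time), while the class itself and the example values are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SegmentLog:
    segment_id: int   # 1-96, unique across all pieces of the concert
    piece: str        # e.g., "Beethoven", "Dean", "Brahms"
    start: float      # Unix timestamp at which the segment began
    finish: float     # Unix timestamp at which the segment ended

    @property
    def duration(self) -> float:
        return self.finish - self.start

log = SegmentLog(segment_id=17, piece="Brahms", start=1667158123.4, finish=1667158165.9)
print(round(log.duration, 1))  # 42.5 (seconds)
```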
After testing in the pilot concerts, we decided to define each participant's peaks in relative terms rather than exclusively using the 15-s sliding window for peak detection. Thus, “peaks” were not necessarily those moments at which a participant's extreme value over the entire concert was reached, but moments that stood out within the local context of the respective movement. The reason for this decision was to avoid favoring from the start segments located in the more “exciting” or high-tempo movements. Correspondingly, the peak segments of HR, RR, SCR, and HRV were those four segments of each participant that showed the greatest deviation from the respective movement averages of HR, RR, SCR, and HRV. Peak detection was thus performed among segments within the same movement.
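A minimal sketch of this relative peak definition for a single measure, assuming mean values per segment and a mapping from segments to movements are already available; names and data structures are illustrative:

```python
import numpy as np

def peak_segment(segment_means: dict, segment_movement: dict) -> int:
    """Return the segment deviating most from the average of its own movement."""
    per_movement = {}
    for seg, movement in segment_movement.items():
        per_movement.setdefault(movement, []).append(segment_means[seg])
    movement_mean = {m: float(np.mean(vals)) for m, vals in per_movement.items()}
    deviation = {seg: abs(val - movement_mean[segment_movement[seg]])
                 for seg, val in segment_means.items()}
    return max(deviation, key=deviation.get)

# Toy example: segment 3 stands out most within its own movement.
segment_movement = {1: "I", 2: "I", 3: "I", 4: "II", 5: "II"}
hr_means = {1: 70.0, 2: 72.0, 3: 86.0, 4: 64.0, 5: 66.0}
print(peak_segment(hr_means, segment_movement))  # 3

# Applying the same function to HR, HRV, RR, and SCR yields the four peak segments.
```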
Results
In this section, we will first sketch the main findings that have resulted from implementing the presented methodological framework to date and also point to analyses suggested or enabled by the data. Subsequently, we will address the challenges and the benefits that came to the fore when the methodological framework was implemented.
Main Findings and Analyses
In the context of the quantitative analysis of facial expressions, the regions of interest were identified first. A Python script was then used to crop individual videos of all faces that allowed detection of facial expressions (in the present dataset, 537 faces were selected; participants with masks or reflecting glasses were excluded). Many participants were recorded by more than one camera, so the best video quality for each face was identified and duplicates were excluded. Ultimately, a total of 303 faces were analyzed using the software iMotions Affectiva (Figure 15). Based on the Facial Action Coding System (Ekman et al., 2002), Affectiva monitors muscle movement in 23 different facial action units. Seven basic emotions and five complex emotions can be derived from combinations of action units. Against the background of the questionnaire data and physiological measurements, it is possible to investigate the connection between the emotional facial expressions displayed by participants during the concert and their ratings of aesthetic emotions as well as the physiological recordings (Herget et al., 2023; Weth et al., 2015).

Automated facial expression analysis using Affectiva software. Rectangles show “regions of interest,” i.e., participants’ faces included in the analysis (green: included faces).
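A rough sketch of how a face region of interest could be cropped from an audience video using OpenCV, assuming fixed pixel rectangles per seat were determined beforehand; the file names and coordinates are placeholders, and this is not the project's actual script:

```python
import cv2  # OpenCV

def crop_roi(src_path: str, dst_path: str, x: int, y: int, w: int, h: int) -> None:
    """Write a new video containing only the given region of interest."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame[y:y + h, x:x + w])   # crop to the seat's face region
    cap.release()
    writer.release()

# Placeholder call for one seat:
# crop_roi("audience_cam1.mp4", "face_seat_A12.mp4", x=850, y=420, w=160, h=160)
```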
“

Printout of the artwork
Challenges
The methodological framework comprised a large array of variables encompassing multiple components of subjective emotion and experience relevant to music listening (Scherer, 2005): cognitive appraisals (in the present methodological framework, questionnaire data on appreciation), subjective experiences (questionnaire data on experiences), physiological arousal (continuous physiological monitoring), and expressive behavior (camera recordings of musicians’ and participants’ movements, applause intensity and duration). Whereas in controlled laboratory experiments most of these components may be reliably accessible, the implementation in field contexts such as public concerts posed considerable challenges owing to the large sample of participants whose data were recorded simultaneously.
Thus, the technical development of the software, hardware, and infrastructure was demanding. Although a variety of devices for physiological data collection in research settings is available, problems were encountered related to internal time processing, synchronization with external sources, and the connectivity of the devices for real-time data analysis. We therefore collaborated with the producers of Biosignalsplux to adapt their devices to our needs. As described in the Methods section, a modified infrastructure was necessary to arrive at a stable and flexible system.
Ensuring a smooth experimental procedure in the context of a live concert evening was another major challenge. A large, trained team of temporary assistants and professional coordinators was crucial. With a ratio of one assistant per two participants, each participant was guided comfortably through the evening. This ensured a pleasant personal experience for concertgoers and kept the levels of ecological validity and participant satisfaction high. Waiting queues emerged only at the venue entrance, and participants could move freely in the venues except when filling out the questionnaires and being fitted with the physiological sensors. For participants, the duration of the concert evening including the surveys was about 2.5 h, which seemed to be the maximum in terms of concentration and patience. In principle, future projects should consider condensing the various methodological steps wherever possible.
Studying the effects of variations of the concert format (i.e., the “concert frame”) on aesthetic experiences was one goal of the methodological framework. These variations were designed against the background of state-of-the-art concert formats and programs in the field of classical concert practice. Throughout the process and by analyzing the data, we learned that not all such variations turned out as expected (Tröndle, Weining, Uhde, et al., 2025). In some cases, the variations proved too subtle to exert a clear effect. Future research may consider testing the impact of variations before their actual experimental implementation. Further, an artistic realization of a given format idea (such as “audience participation” or “visual enhancement through lighting”) may in principle take numerous concrete forms. In our experiments, we studied the effects of only one realization per artistic idea, which does not yet allow generalization of findings.
Owing to the complexity of the hardware and software set-up, artifacts and missing data were to be expected, as in all physiological research, especially outside the lab. Countermeasures were implemented to reduce attrition to a minimum. In 2020, we therefore conducted a pilot study with three concerts to test the technical setup and design. Of the 141 participants who provided informed consent, 9 had to be excluded because they were younger than 18 years or left the venue before the concert. The physiological recordings showed considerable proportions of missing data, ranging from 9% for respiration and 35% for electrodermal activity to 53% for cardiac measures. This called for improvements to the finger-clip sensors for the blood-volume measures and better fixation of the electrodes, which were achieved in time for the final concerts. Considerable work was additionally invested in adapting the peak-detection algorithm, whose performance was found to be unsatisfactory in the pilot study. The resulting improvements were integrated into the 2022 setting of the regular project concerts. In other words, the rehearsal concerts of the pilot study were essential for the success of the final data acquisition.
Although the hardware and software were tested in the pilot-study concerts, the complexity of the design resulted in some unexpected dropouts and errors. Some limited data losses were caused by misunderstandings among musicians or researchers, which could have been avoided by written instructions, for instance a concise manual for the Logger page, and by even more extensive prior training.
Concerning the network design, we initially encountered problems with a wireless network architecture. It was therefore decided to use a wired network instead because of its higher reliability. This decision increased the effort and costs of the installation, as a dedicated network installation company had to be employed to lay hundreds of meters of network cabling. Networking issues occurred particularly with the simultaneous streaming of the locally hosted segment videos to participants. Some participants found the waiting time caused by peak processing unpleasant; computational load balancing may alleviate this delay in the future.
Some Biosignalsplux hub devices became unresponsive to dashboard calls right before the concert started. We countered this issue by routinely conducting short start-and-stop tests; unresponsive devices were thus identified and turned off and on again. The 15-min time window dedicated to sensor placement on participants therefore turned out to be very short and should be extended.
Benefits
In the face of the described challenges, and likely owing to coping with them at the right time, the project arrived at a successful implementation of this methodological framework, generating a rich and complex dataset. The integration of the data via the visitor tokens and a computational infrastructure providing a central timestamp allowed us to relate and join the various data types. This multiplied the analytic possibilities and offered novel insights into the aesthetic and social experience of people attending a classical music concert. The presented methodology is scalable and can readily be implemented in other kinds of performances, such as opera, theater, musicals, and film.
As described in the Methods section, only a modified infrastructure (building USB links connecting the Plux devices with the Raspberry Pis) finally allowed us to develop a stable and flexible system that could start and stop the devices at the same time and, if necessary, repeatedly; process data streams simultaneously and in parallel on the multiple Raspberry Pis; integrate the logger data; and synchronize the physiological data with the videos.
This complex methodology was able to generate individualized video sequences based on the physiological data that had just been recorded during the concert and then had undergone the peak detection process. By using high-performance servers, we managed to reduce the duration of post-concert data processing, simultaneously for up to 88 participants, to between 5 and 10 minutes. Presenting such data in time for ratings in the subsequent post-concert questionnaires is an innovation in empirical aesthetics research.
An essential methodological improvement was the monitoring of the real-time data streams on the dashboard, which allowed readjustment of participants’ sensors before the presentations started. This significantly increased the number of high-quality datasets from the pilot study in 2020 to the final study in 2022. In the 2022 data acquisition, 747 adult participants were allocated to the group with physiological data acquisition and 45 to the control group without. Participants reported little distraction resulting from the sensors; the general evaluation of the concert experience in the post-concert survey showed no significant differences between the participants and the control group without physiological measurements (ANOVA
Discussion
The methodology of data acquisition described in this article allows for types of analyses that are novel for musicology. The setting of live concerts offers ample information on the collective behavior and physiology of whole audiences. As the time series data are recorded simultaneously, physiological and motor synchronies of complete audiences can be derived from all participants’ coordination of physiology and movement. Using cross-correlation algorithms, such synchrony signatures can be quantified for entire audiences but also assigned to the individual participants, their “synchrony contributions.” These latter measures allow linking objective embodied synchronies to each participant's self-reported aesthetic assessments of the presented music, thus supporting fine-grained modeling of individual associations between physiology, body movement, and aesthetic experiences. Additionally, the synchronization of single participants with time series of music properties such as loudness, tempo or pitch (MIR data) offers data for individual associations. The pre-concert survey provides data on further states and traits of participants, their affective and mood states, personality traits, and musical backgrounds and attitudes.
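The following sketch illustrates one plausible way to quantify an individual “synchrony contribution” as the maximal lagged correlation between a participant's signal and the average of the remaining audience; the lag window and the rest-of-audience averaging are simplifying assumptions, not the project's specific algorithm:

```python
import numpy as np

def synchrony_contribution(signals: np.ndarray, person: int, max_lag: int = 10) -> float:
    """Max lagged correlation between one participant and the rest-of-audience mean."""
    a = signals[person]
    b = np.delete(signals, person, axis=0).mean(axis=0)   # average of all others
    n = len(a)
    corrs = []
    for lag in range(-max_lag, max_lag + 1):
        x = a[max(0, -lag): n - max(0, lag)]
        y = b[max(0, lag): n - max(0, -lag)]
        corrs.append(np.corrcoef(x, y)[0, 1])
    return float(np.max(corrs))

rng = np.random.default_rng(0)
signals = rng.standard_normal((5, 600))   # e.g., 5 participants, 600 heart-rate samples
print(synchrony_contribution(signals, person=0))
```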
Establishing the methodological framework of a large research project such as the one described here demands considerable financial resources, and much effort must be devoted to team-building. A willingness to overcome unforeseen hurdles is mandatory. Hurdles may include technical and practical problems; it is especially important to cope with the restrictions of scientific disciplines and compose a truly interdisciplinary team (Tröndle et al., 2019, 2022). The funding period of the present project, ECR–Experimental Concert Research, was six years, which does not include years needed for building the core team and the preparation of the proposal, nor most of the time for data analyses and publications after the official termination of the project.
The outlined methodological framework yielded a high proportion of valid data while adhering to an ecologically valid, non-invasive approach. Looking back on the data and insights gained to date, we believe that such knowledge can only be accumulated through transdisciplinary cooperation and the collection of large and diverse datasets. The complexity of aesthetic experience must be investigated through integrative studies of real performances “in the wild,” such as in the context of a public concert hall. In this way, reliable and generalizable knowledge of concert experience, and of aesthetic experience in general, comes within reach.
Footnotes
Action Editor
Alexander Refsum Jensenius, University of Oslo, RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion, & Department of Musicology.
Peer Review
Sara D'Amario, University of Oslo, RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion.
Jonna Vuoskoski, University of Oslo, RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion.
Author Contributions
MT: project management, writing; SG: setup of informational infrastructure, data integration, writing; CW: organization of data collection; CR: data integration; MW-F: musicological concert log, review; HG: review; A-KH: facial expression analysis; DH: music information retrieval; CS: review; WT: statistical analyses, writing.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The written informed consent included permission for recording and processing of participants’ data. The statement informed about privacy protection, data security, and the right to the deletion of data, including video and sound records. Permission was also requested for the usage of video and sound data for demonstration purposes including teaching and publications. The procedure adhered to the principles of the Declaration of Helsinki and ethics regulations in Germany and was approved by the Ethics Council of the Max Planck Society (#2702_12).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was conducted within the Experimental Concert Research project, which was substantially funded by VolkswagenStiftung. The concert series was additionally supported by the Aventis Foundation (grant number 93 263).
