Perceptions of Visual-to-Auditory Sensory Substitution (VASS) Technology Among Students With Visual Impairments and Special Educators

Abstract

While prior studies on visual-to-auditory sensory substitution (VASS) have focused on technological development, practical adoption remains limited. This study explores how students with visual impairment and special educators in South Korea perceive the usability and potential improvements of VASS after brief training. Drawing on semi-structured interviews and thematic analysis, the study identified four overarching themes (i.e., perceived value, influencing factors, real-world challenges, and suggestions for improvement) spanning 18 subthemes. Participants recognized the potential of VASS to enhance learning, spatial reasoning, creativity, and engagement, while also highlighting limitations, including high cognitive load, unclear comparative advantage over conventional assistive tools, and the absence of structured training protocols. The findings emphasize the importance of user-centered design, adaptive training, and multisensory integration for the future development of sensory substitution systems in educational and daily contexts.

Keywords

assistive technology; sensory substitution; user perceptions; visual impairment; visual-to-auditory sensory substitution (VASS)

Introduction

Sensory substitution refers to the process of delivering information acquired through one sense via another. It is grounded in the brain’s neuroplasticity—the capacity to reorganize and adapt to new sensory inputs (Bach-y-Rita, 1972; Bach-y-Rita et al., 1969). Visual-to-auditory sensory substitution (VASS) involves converting visual features such as shape, color, and spatial layout into auditory signals, enabling users to access visual information through sound.

Since the 1960s, advances in neuroscience, computer science, and engineering have led to increasingly sophisticated sensory substitution devices (SSDs). Notable examples include the vOICe (Meijer, 1992), which converts digital images into continuous soundscapes, and newer tools such as EyeMusic (Abboud et al., 2014) and EyeCane (Buchs et al., 2015; Maidenbaum et al., 2014), which integrate additional cues such as color or depth (Levy-Tzedek et al., 2012). Recent developments include real-time feedback systems and smartphone-based applications such as TADA (Zhao et al., 2024) and ChartA11y (Zhang et al., 2024). In South Korea, the Electronics and Telecommunications Research Institute (ETRI) has been developing the V2A (Visual to Audio) program since 2019 to support multisensory fusion.

Despite these technical advances, widespread adoption of VASS technologies among individuals with visual impairment remains limited, and much of the existing research has been conducted in controlled laboratory settings (Elli et al., 2014; Kristjánsson et al., 2016). This gap between technical capability and real-world use suggests that factors beyond technical performance, such as user perception and contextual relevance, may play an important role in successful implementation.

Bridging this gap requires a clearer understanding of how VASS is perceived and evaluated by its intended users. The Technology Acceptance Model (Davis, 1989) and Diffusion of Innovations theory (Rogers et al., 2014) emphasize that adoption depends not only on functionality but also on perceived usability, compatibility with existing practices, and contextual relevance. Recent studies have begun to address these user-centered dimensions of assistive technology, examining satisfaction and accessibility (Liang et al., 2024), barriers to educational inclusion (Hayes & Proulx, 2024), and preferences for controllable and comfortable sensory-substitution designs (Hamilton-Fletcher et al., 2016). However, these studies primarily focus on general awareness or conceptual barriers rather than users’ firsthand experiences with a specific VASS system in educational contexts.

Addressing this gap, the present study investigates perceptions of VASS technology among students with visual impairment and special educators in South Korea. Through semi-structured interviews and thematic analysis, this study explores how potential users interpret, evaluate, and envision the application of VASS in educational and daily contexts. Drawing on participants’ reflections, the study identifies both the opportunities and constraints of VASS implementation and offers user-informed insights to guide its future development.

Methods

Participants

Participants were recruited from three schools serving students with visual impairment located in different regions of South Korea. The final sample included 21 participants: 10 students and 11 special educators, each with at least 5 years of experience supporting students with visual impairment. Schools are a common setting for introducing assistive technologies and training students to use them, making them an important early implementation context. Students, as direct users, offer insights into real-world usability and sensory demands, while special educators, including several with visual impairment, provide perspectives on instructional integration, design fit, and classroom feasibility. Recruitment flyers were posted at each school, and interested individuals received study details and consent forms.

The student group consisted of two females and eight males, with a mean age of 16 years (range = 14–19, standard deviation (SD) = 1.66). Six students (60%) had total blindness, and none had hearing impairment (see Table 1). The educator group included four female and seven male teachers, with a mean age of 44 years (range = 35–55, SD = 6.5) and a mean of 15.3 years of teaching experience in visual impairment education (SD = 6.5) (see Table 2). Four educators (36%) had visual impairment, including three with total blindness, and none reported hearing impairment. Overall, this heterogeneous sample provided diverse perspectives across roles and sensory experiences.

Table 1.

Demographic Information of the Students.

ID	Gender	Age (years)	Vision status	Presence of hearing impairment	Cause of visual impairment	Age at onset of visual impairment (years)
sast1_vi	Male	19	Partially Blind	No	Retinal inflammation	0
sast2_vi	Male	16	Partially Blind	No	Brain tumor	4
sast3_vi	Male	15	Totally Blind	No	Unknown	0
scst4_vi	Male	18	Totally Blind	No	Glaucoma	0
sast5_vi	Male	16	Totally Blind	No	Retinopathy of prematurity	0
sbst1_vi	Female	17	Totally Blind	No	Unknown	0
sbst2_vi	Female	14	Partially Blind	No	Congenital glaucoma	0
scst1_vi	Male	15	Totally Blind	No	Unknown	0
scst2_vi	Male	14	Totally Blind	No	Medication side effects	10
scst3_vi	Male	15	Partially Blind	No	Unknown	0

Note. Participant IDs ending in “_vi” indicate individuals with visual impairment.

Table 2.

Demographic Information of the Educators.

ID	Gender	Age (years)	Years of teaching experience	Vision status	Presence of hearing impairment	Cause of visual impairment	Age at onset of visual impairment (years)
satr1	Female	42	13	Sighted	No	—	—
satr2_vi	Female	43	20	Partially Blind	No	Retinitis pigmentosa	0
satr3	Male	35	5	Sighted	No	—	—
satr4_vi	Male	44	11	Totally Blind	No	Retinopathy of prematurity	0
satr5	Male	50	13	Sighted	No	—	—
sbtr1	Male	48	16	Sighted	No	—	—
sbtr2	Female	39	14	Sighted	No	—	—
sbtr3	Female	37	14	Sighted	No	—	—
sbtr4_vi	Male	52	26	Totally Blind	No	Explosion accident	10
sbtr5	Male	39	10	Sighted	No	—	—
sctr1_vi	Male	55	26	Totally Blind	No	Retinal detachment	15

Note. Participant IDs ending in “_vi” indicate individuals with visual impairment. “—” indicates not applicable (i.e., for participants with sighted vision).
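The summary statistics reported in the Participants section can be recomputed directly from Tables 1 and 2. The short Python check below is a sketch for readers; it assumes the reported SDs are sample standard deviations (n − 1 denominator), which is the convention the reported values match.

```python
import statistics

# Values transcribed from Tables 1 and 2, in table order.
student_ages = [19, 16, 15, 18, 16, 17, 14, 15, 14, 15]
teacher_ages = [42, 43, 35, 44, 50, 48, 39, 37, 52, 39, 55]
teaching_years = [13, 20, 5, 11, 13, 16, 14, 14, 26, 10, 26]


def describe(values):
    """Return mean, sample SD (n - 1 denominator), minimum, and maximum."""
    return statistics.mean(values), statistics.stdev(values), min(values), max(values)


for label, data in [("student age", student_ages),
                    ("teacher age", teacher_ages),
                    ("teaching years", teaching_years)]:
    m, sd, lo, hi = describe(data)
    print(f"{label}: mean = {m:.1f}, SD = {sd:.2f}, range = {lo}-{hi}")
```

Running it prints a student mean age of 15.9 (reported as 16, SD 1.66), a teacher mean age of 44.0 (SD 6.50), and a mean of 15.3 years of teaching experience (SD 6.47, reported as 6.5).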

Apparatus and Materials

This study used the Visual-to-Audio (V2A) system developed by the Electronics and Telecommunications Research Institute (ETRI), which applies the vOICe algorithm to convert images into soundscapes. The algorithm scans each image column by column from left to right, encoding vertical position as pitch (frequency), horizontal position as time, and pixel brightness as loudness (white = loudest; black = silent). Each image was divided into a 64 × 64 grid and mapped to frequencies from 80 to 7,600 Hz, with one frame lasting 1.05 s, 16 gray levels, and click sounds marking frame boundaries. These parameters reflected ETRI's (2021) validated default configuration, optimized for perceptual clarity and temporal efficiency. A non-musical tone mapping was used to examine auditory-spatial processing and cognitive load. The study employed the Windows-based prototype because the mobile version, part of the broader multisensory-fusion initiative, was still under development during data collection.
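The column-by-column mapping described above can be sketched in a few lines. The Python example below is a simplified illustration, not ETRI's V2A implementation: the function and constant names are hypothetical, the exponential (log-spaced) distribution of row frequencies between 80 and 7,600 Hz is an assumption (the configuration above does not state the spacing), and the frame-boundary clicks are omitted.

```python
import math

# Parameter values from the configuration described above; names are hypothetical.
N_ROWS, N_COLS = 64, 64            # grid: rows -> pitch, columns -> time
F_LOW_HZ, F_HIGH_HZ = 80.0, 7600.0
FRAME_S = 1.05                     # duration of one left-to-right scan
GRAY_LEVELS = 16                   # brightness quantization levels


def row_frequencies(n_rows, f_low=F_LOW_HZ, f_high=F_HIGH_HZ):
    """One frequency per row; row 0 (top of the image) gets the highest pitch.

    Exponential spacing is an ASSUMPTION, not a documented V2A setting.
    """
    if n_rows < 2:
        raise ValueError("need at least two rows")
    ratio = f_high / f_low
    return [f_low * ratio ** ((n_rows - 1 - r) / (n_rows - 1)) for r in range(n_rows)]


def quantize(brightness, levels=GRAY_LEVELS):
    """Snap brightness in [0, 1] to one of `levels` amplitudes; black (0) stays silent."""
    return round(brightness * (levels - 1)) / (levels - 1)


def encode(image, sample_rate=22050, frame_s=FRAME_S):
    """Turn a rows x cols grid of brightness values into mono samples in [-1, 1]."""
    n_rows, n_cols = len(image), len(image[0])
    freqs = row_frequencies(n_rows)
    samples_per_col = int(sample_rate * frame_s / n_cols)
    samples = []
    for c in range(n_cols):  # horizontal position -> time, scanned left to right
        amps = [quantize(image[r][c]) for r in range(n_rows)]  # brightness -> loudness
        for i in range(samples_per_col):
            t = (c * samples_per_col + i) / sample_rate
            mix = sum(a * math.sin(2 * math.pi * f * t) for a, f in zip(amps, freqs) if a)
            samples.append(mix / n_rows)  # dividing by the row count keeps |sample| <= 1


    return samples


# Usage: a diagonal from bottom-left to top-right is heard as a rising sweep.
diagonal = [[1.0 if r == N_ROWS - 1 - c else 0.0 for c in range(N_COLS)]
            for r in range(N_ROWS)]
audio = encode(diagonal)
```

With a 64-column grid and a 1.05-s frame, each column occupies roughly 16.4 ms, which helps explain why participants later described the sounds as short and fleeting.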

Auditory signals were delivered via over-ear headphones. Eight simple geometric shapes, including circles, ellipses, triangles, and rectangles, were used as stimuli (see Figure 1). Simple, symmetrical shapes were selected so that participants could grasp VASS concepts quickly. Sighted participants viewed the shapes on a laptop screen, while participants with visual impairment explored 3D-embossed tactile graphics of the same shapes. The tactile graphics measured 11.6 × 11.6 cm (or 22 × 22 cm upon request) and used contrasting textures.

Figure 1.

VASS learning session screen.

A custom-developed presentation interface displayed two shapes per screen (or per tactile set) in each of four learning trials. Participants could replay each shape's soundscape as needed during learning. For the post-training test, participants were presented with a single tactile or visual shape and asked to identify the matching sound from two audio options. Each sound could be played up to two times (see Figure 2).

Figure 2.

VASS test session screen.

Procedure

After giving consent, participants completed a demographic questionnaire covering age, gender, vision and hearing status, and, for educators, years of teaching experience. Each participant then completed a 15-min VASS training session in a quiet room at their school. Sessions began with a standardized explanation of the vOICe system and sensory substitution (see Figure 3). Participants explored each shape, listened to its corresponding sound, and identified auditory patterns. No explicit instruction was provided on the mapping between sound and visual features. The training involved four sets of two shapes. Sighted participants viewed the digital shapes on the laptop screen, while participants with visual impairment used the tactile graphics only. All participants listened to the associated soundscapes via headphones.

Figure 3.

Introduction script for VASS technology.

The purpose of this brief training was not to assess mastery of the VASS system, but to provide participants with a basic understanding of how auditory signals encode visual information and to enable them to meaningfully engage with the system during subsequent tasks and interviews. The training was intentionally designed to introduce core principles of sound-shape mapping in an accessible and simplified manner, allowing participants to grasp the fundamental logic of sensory substitution within a limited time frame.

After training, participants completed a shape-sound recognition test to assess their understanding of the basic encoding principles introduced during the session. The task was not used as an outcome measure but served to ensure that participants had sufficient familiarity with the system prior to the interview. In each of the eight test trials, a single shape (tactile or visual) was presented; participants listened to two sound options (one correct, one distractor) and verbally identified the matching sound. Each sound could be repeated once upon request. Participants generally demonstrated a basic understanding of the encoding principles before proceeding to the interview. Each participant then completed a 30- to 60-min semi-structured interview about user experiences and perceptions (see Figure 4). Interviews were audio-recorded with consent and supplemented by field notes.

Figure 4.

Interview protocol.

Ethical Considerations

Ethical approval was granted by the Institutional Review Board of Korea University (Approval No. KUIRB-2023-0313-0).

Data Analysis

This study employed a qualitative approach to explore participants’ perceptions of VASS. Thematic analysis was conducted following Braun and Clarke’s (2006) framework. We began by reading interview transcripts repeatedly to build familiarity and generate initial impressions. Inductive coding was used to capture meaningful features, grounded in participants’ own language. Codes were reviewed, grouped into candidate themes, and refined through team discussions to ensure coherence. To enhance credibility, selected participants reviewed preliminary findings, and their feedback informed final themes.

Transcripts were generated using Clova, a Korean speech-to-text tool, and manually reviewed for accuracy. Coding was conducted in Taguette, an open-source qualitative analysis platform. To strengthen analytic rigor, we triangulated interviews with field notes, engaged multiple coders to reduce bias, and conducted informal member checking during interviews. A detailed audit trail was maintained throughout to ensure transparency. This process yielded four overarching themes (perceived value, influencing factors, real-world challenges, and suggestions for improvement) comprising 18 subthemes, summarized in Table 3.

Table 3.

Developed Themes.

Theme	Subthemes
Potential Value of VASS	Intuitiveness
	Academic Use
	Daily Life Use
	Auditory Sensitivity
	Creativity and Engagement
Factors Influencing Effectiveness	Auditory Abilities
	Cognitive Abilities
	Training and Practice
Anticipated Challenges in Real-world Application	Fleeting and Sequential Nature of Sound
	Mechanical Sound Discomfort
	Unclear Comparative Advantages
	Complexity and Risk of Misinterpretation
	Need for Long-Term Training
Suggestions for Improvement	Customizable Sound Options
	Standardized Sound Cues
	Selective Information Delivery
	Multisensory Integration
	Device Design and Assistive Options

To understand the prominence and distribution of these themes, we supplemented the thematic analysis with a frequency count of how many unique participants mentioned each theme at least once. We categorized responses by participant group and compared subtheme frequencies between students and teachers, and between teachers with and without visual impairment, to explore how themes varied by role and vision status. Finally, we examined whether responses clustered by school, but no consistent institutional patterns were observed. Although subgroup comparison was not the primary goal of this study, this breakdown provides useful descriptive context.
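The frequency count described above tallies participants rather than coded excerpts, so a participant who raised a subtheme several times still counts once. A minimal Python sketch of that tally is shown below. The helper names are hypothetical, and the rule for reading role from the ID (characters 3 and 4 are "st" for students or "tr" for teachers) is an assumption inferred from the ID pattern in Tables 1 and 2; only the "_vi" suffix is documented in the table notes. The example segments are a few quotation attributions from the Results, not the full coded dataset.

```python
from collections import defaultdict


def group_of(pid):
    """Hypothetical helper: (role, vision status) from a participant ID.

    ASSUMPTION: characters 3-4 of the ID are 'st' (student) or 'tr' (teacher);
    the '_vi' suffix marks visual impairment, as stated in the table notes.
    """
    role = {"st": "student", "tr": "teacher"}[pid[2:4]]
    vision = "vi" if pid.endswith("_vi") else "sighted"
    return role, vision


def unique_mentions(coded_segments, key):
    """Count participants (not excerpts) who mentioned each subtheme at least once.

    coded_segments: iterable of (participant_id, subtheme) pairs.
    key: function mapping a participant ID to the group label to tally by.
    """
    seen = defaultdict(set)  # (subtheme, group) -> set of participant IDs
    for pid, subtheme in coded_segments:
        seen[(subtheme, key(pid))].add(pid)  # a set ignores repeat mentions
    return {k: len(v) for k, v in seen.items()}


# Attributions taken from quotations reported in the Results (illustrative subset).
segments = [
    ("satr1", "Cognitive Abilities"),
    ("scst2_vi", "Cognitive Abilities"),
    ("sbtr3", "Training and Practice"),
    ("sbtr4_vi", "Training and Practice"),
    ("scst4_vi", "Training and Practice"),
]
by_role = unique_mentions(segments, key=lambda pid: group_of(pid)[0])
teacher_segments = [s for s in segments if group_of(s[0])[0] == "teacher"]
by_teacher_vision = unique_mentions(teacher_segments, key=lambda pid: group_of(pid)[1])
```

The same function supports both comparisons in this section: passing role as the key gives the student and teacher breakdown, and restricting to teachers with vision status as the key gives the comparison between teachers with and without visual impairment.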

Results

This section presents findings from the thematic analysis, organized into four overarching themes and 18 subthemes: (a) perceived value of VASS technology (intuitiveness, academic use, daily life use, auditory sensitivity, creativity and engagement); (b) influencing factors (auditory abilities, cognitive abilities, training and practice); (c) anticipated real-world challenges (fleeting and sequential nature of sound, mechanical sound discomfort, unclear comparative advantages, complexity and risk of misinterpretation, need for long-term training); and (d) suggestions for improvement (customizable sound options, standardized sound cues, selective information delivery, multisensory integration, device design and assistive options).

To contextualize the findings, we first present an overview of the frequency with which each subtheme was discussed across participants and group-level comparisons. Then, we offer an in-depth analysis of each theme, supported by illustrative excerpts.

Overview of Emerging Themes

The most commonly discussed subthemes were daily life use (N = 20), academic use (N = 18), complexity and risk of misinterpretation (N = 17), and device design and assistive options (N = 17). These patterns highlight participants' shared focus on the practical usability and real-world implementation of VASS. To explore how participant roles shaped their perspectives, we compared the frequency of subthemes across students and teachers. As shown in Table 4 and Figure 5, students more frequently expressed concerns about complexity and misinterpretation (%Diff = 36%) and suggested assistive device options (%Diff = 36%). In contrast, educators more often emphasized the importance of structured training and practice (%Diff = 35%) and noted concerns about the burden of long-term training (%Diff = 26%). They also discussed the need for consistent sound design through customizable (%Diff = 35%) and standardized sounds (%Diff = 25%), and highlighted VASS' potential to enhance students' auditory sensitivity (%Diff = 27%) and to support creativity and engagement (%Diff = 18%); students mentioned neither of these last two subthemes. Overall, these patterns suggest that students focused more on practical usability, whereas educators' concerns centered on pedagogical outcomes and classroom use of VASS.

Table 4.

Subtheme Frequency by Participant Role.

Theme	Subtheme	Total(n = 21)	Students(n = 10)	Teachers(n = 11)	% Difference (Teacher–Student)
Potential Value of VASS	Intuitiveness	12 (57%)	5 (50%)	7 (64%)	14
	Academic Use	18 (86%)	9 (90%)	9 (82%)	–8
	Daily Life Use	20 (95%)	10 (100%)	10 (91%)	–9
	Auditory Sensitivity	3 (14%)	0 (0%)	3 (27%)	27
	Creativity and Engagement	2 (10%)	0 (0%)	2 (18%)	18
Factors Influencing Effectiveness	Auditory Abilities	9 (43%)	5 (50%)	4 (36%)	–14
	Cognitive Abilities	3 (14%)	1 (10%)	2 (18%)	8
	Training and Practice	8 (38%)	2 (20%)	6 (55%)	35
Anticipated Challenges in Real-world Application	Fleeting and Sequential Nature of Sound	5 (24%)	2 (20%)	3 (27%)	7
	Mechanical Sound Discomfort	10 (48%)	5 (50%)	5 (45%)	–5
	Unclear Comparative Advantages	7 (33%)	2 (20%)	5 (45%)	25
	Complexity and Risk of Misinterpretation	17 (81%)	10 (100%)	7 (64%)	–36
	Need for Long-Term Training	5 (24%)	1 (10%)	4 (36%)	26
Suggestions for Improvement	Customizable Sound Options	8 (38%)	2 (20%)	6 (55%)	35
	Standardized Sound Cues	7 (33%)	2 (20%)	5 (45%)	25
	Selective Information Delivery	2 (10%)	0 (0%)	2 (18%)	18
	Multisensory Integration	3 (14%)	1 (10%)	2 (18%)	8
	Device Design and Assistive Options	17 (81%)	10 (100%)	7 (64%)	–36

Note. The % Difference column reflects the relative emphasis between the teacher and student groups, calculated as Teacher % − Student % (in percentage points). Positive values indicate stronger emphasis by teachers; negative values indicate stronger emphasis by students.
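The % Difference column can be reproduced from the raw counts in Table 4. The Python sketch below does this; the function names are illustrative, and because the table does not state whether differences were taken from rounded or unrounded percentages, the sketch rounds the unrounded difference. For every row of Table 4 the two approaches give the same value.

```python
# (teacher count, student count, reported % difference), transcribed from Table 4.
TABLE_4 = {
    "Intuitiveness": (7, 5, 14),
    "Academic Use": (9, 9, -8),
    "Daily Life Use": (10, 10, -9),
    "Auditory Sensitivity": (3, 0, 27),
    "Creativity and Engagement": (2, 0, 18),
    "Auditory Abilities": (4, 5, -14),
    "Cognitive Abilities": (2, 1, 8),
    "Training and Practice": (6, 2, 35),
    "Fleeting and Sequential Nature of Sound": (3, 2, 7),
    "Mechanical Sound Discomfort": (5, 5, -5),
    "Unclear Comparative Advantages": (5, 2, 25),
    "Complexity and Risk of Misinterpretation": (7, 10, -36),
    "Need for Long-Term Training": (4, 1, 26),
    "Customizable Sound Options": (6, 2, 35),
    "Standardized Sound Cues": (5, 2, 25),
    "Selective Information Delivery": (2, 0, 18),
    "Multisensory Integration": (2, 1, 8),
    "Device Design and Assistive Options": (7, 10, -36),
}


def pct(count, n):
    """Percentage of a group that mentioned a subtheme."""
    return 100 * count / n


def pct_difference(teacher_count, student_count, n_teachers=11, n_students=10):
    """Teacher % minus Student %, rounded to whole percentage points.

    Positive -> stronger emphasis by teachers; negative -> by students.
    """
    return round(pct(teacher_count, n_teachers) - pct(student_count, n_students))


for subtheme, (teachers, students, reported) in TABLE_4.items():
    computed = pct_difference(teachers, students)
    total = round(pct(teachers + students, 21))  # the "Total (n = 21)" column
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{subtheme:42s} total {total:3d}%  diff {computed:+4d}  {status}")
```

Every row prints "ok", confirming that the reported differences follow from the group counts and group sizes (10 students, 11 teachers).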

Figure 5.

Thematic emphases and coverage by participant group.

Although most participants with visual impairment were students (10 of 14) and all sighted participants were teachers (N = 7), the patterns largely mirrored role-based trends. The small group of teachers with visual impairment (N = 4) appeared to bridge the two perspectives, sharing students' focus on daily usability and perceptual challenges (e.g., complexity and misinterpretation: 100% for students and teachers with visual impairment vs. 43% for sighted teachers) while echoing other teachers' focus on structured training (e.g., training and practice: 20% for students, 50% for teachers with visual impairment, and 57% for sighted teachers). These descriptive contrasts suggest that teachers with visual impairment integrated user-oriented and pedagogical viewpoints rather than aligning entirely with one group.

Finally, we examined whether responses clustered by institution. No clear school-level patterns emerged. Participants from the same school often expressed distinct priorities, suggesting that perceptions of VASS were shaped more by individual roles and experiences than by institutional context.

In-Depth Thematic Findings

The following section provides a detailed analysis of each major theme, supported by representative excerpts. These quotations illustrate how participants experienced and evaluated VASS in relation to their perspectives and contexts of use.

Perceived Value of VASS Technology

Intuitiveness

Participants generally found the sound-to-shape translation intuitive. One teacher noted, "The eight shapes presented today were not difficult to distinguish" (sbtr1), while another added, "Broader shapes were represented by more grand sounds, and diagonal lines by lower-pitched sounds" (satr4_vi). Students echoed this view: "It was easy to distinguish and understand the shapes by touching them while listening to the sounds" (scst3_vi).

Academic Use

Participants highlighted VASS' value in academic subjects such as math, art, and geography. A teacher explained, "Students with visual impairment often give up on math because they can't see graphs. Braille has limitations, but representing graphs through sound could help them understand and retain the information better" (sctr1_vi). Several students also shared that VASS could support their understanding of graph directionality, coordinate planes, and geometric shapes across subjects.

Daily Life Use

Participants saw VASS as useful for real-world navigation and hazard detection. A teacher mentioned, "VASS could help users recognize traffic patterns and avoid hazards, like door handles or changes in ground elevation" (satr3). Several students described its usefulness for locating entrances and exits where tactile cues are unavailable or inconsistent. One student shared, "If VASS could notify me about door handles or steps, it would help me avoid wasting time trying to find them" (scst2_vi). Others noted that VASS could assist in identifying distant or unsafe objects and distinguishing fast-moving hazards such as bicycles.

Auditory Sensitivity

Educators highlighted VASS' potential to help students with visual impairment use hearing to compensate for limited vision and to improve their auditory sensitivity. "If auditory stimulation can activate the visual areas of the brain and help students compensate for visual limitations, it could significantly aid individuals in compensating for visual limitations" (sbtr1), one explained. Another added, "Auditory training is crucial. . .timely listening practice can significantly benefit students, especially for orientation skills" (sctr1_vi).

Creativity and Engagement

Educators also noted that VASS could foster creativity and engagement. As one expressed, “This could help develop creativity, as students realize, ‘this can be expressed in such ways’” (satr2_vi). Others described the experience as playful and mentally stimulating, suggesting that its multisensory nature could heighten student interest. Students did not mention creativity or engagement explicitly.

Influencing Factors on the Effectiveness of VASS

Auditory Abilities

Participants acknowledged variability in users' auditory perception. "We often think that individuals with visual impairment must all have good hearing and memory, but that's not always the case" (satr1), one teacher cautioned. Others agreed that individuals with stronger auditory perception would find VASS easier to use. One teacher noted, "There may be differences based on musical perception. . . students with better auditory perception would likely understand more easily" (sbtr3). Students agreed that good pitch perception or musical training could help, while emphasizing that clear sound design would make the system accessible even to those without musical backgrounds.

Cognitive Abilities

Concerns were raised about VASS' cognitive demands. One teacher commented, "I wonder if only students with high cognitive abilities could use this" (satr1). A student similarly noted, "If shapes all have different sounds, it's easy to forget a few and get confused. I think that depends on each person's memory skills" (scst2_vi).

Training and Practice

High-quality, structured training was seen as essential. One teacher explained, "With repeated training, students would become more familiar with the sounds corresponding to shapes" (sbtr3). Several teachers and students recommended introducing VASS principles before training; as one teacher remarked, "If we could provide a brief introduction on VASS principles on how the shapes are expressed in sound in advance of training, the training would be smoother and more effective" (sbtr4_vi). A student added, "If there were something like a manual, it would have been helpful to read it (beforehand) and then listen while learning" (scst4_vi). Other participants encouraged self-directed learning with labeled tactile aids and suggested using real-life examples and assessments to help users apply auditory information.

Anticipated Challenges in Real-World Implementation

Fleeting and Sequential Nature of Sound

Participants noted that the fleeting and sequential nature of auditory information increases cognitive load. “The sound is too short. Even though I was focused. . . I got it wrong” (sbtr5), said one teacher. A student echoed this, saying, “it would be better if the sound were a bit longer. It felt short” (sast3_vi).

The sequential nature of auditory information can also slow down recognition compared to visual input. This could hinder quick reactions, as one participant noted, “When we see a shape, we know what it is right away, but with VASS, it takes a few seconds of listening before we can identify it. If something happens quickly, like a ball flying toward you, can you recognize it as a ball and avoid it just by hearing the sound in a few seconds?” (satr1).

Continuous auditory feedback was seen as potentially distracting and exhausting, particularly when the information is excessive or irrelevant. "When I'm walking in real life, aren't all these things just obstacles? The sensors would keep making sounds over and over. . . It's not easy to distinguish between important things and unnecessary noise" (sctr1_vi), said one teacher. A student similarly shared, "If sound is coming from every object around me, even those that are not important, it could become noisy. I only want to know about specific objects, and if I hear everything, it might interfere with my ability to judge what really matters" (scst2_vi).

Mechanical Sound Discomfort

Some users found the current sound output unpleasant. One student commented, "The sound is a bit mechanical and unpleasant right now. It would be better to use a more stable sound" (scst2_vi). An educator noted, "The sound itself is somewhat irritating, like a sharp 'beep' noise. Rather than using a harsh mechanical sound, it would be better to use something softer. Individuals with visual impairment, who rely heavily on auditory senses as compensation, might find the current sounds even more jarring" (sbtr1).

Unclear Comparative Advantages

Participants questioned whether VASS offers clear advantages over tactile learning or voice-based assistance. One educator noted, “Tactile information seems to be more useful than auditory information. It is easier to convey the shape by allowing someone to feel it directly with their hands” (satr2_vi), while a student said, “I mostly use braille, so I haven’t really felt a strong need for this kind of auditory system” (sbst1_vi).

Other participants expressed a preference for voice feedback or suggested AI-generated speech. "Rather than interpreting sound waves, it might be more helpful if AI just provided the answer, like ChatGPT does. The era of requiring individuals with visual impairment to memorize everything has passed" (sctr1_vi), explained a teacher. Another added, "Unless the system names the object directly, it is hard to justify why someone would use VASS over more familiar tools. There are already too many accessible alternatives that require less effort to learn" (sbtr3). Students echoed this, expressing a preference for voice-based information over remembering complex sound-shape associations.

Complexity and Risk of Misinterpretation

Participants highlighted the complexity of real-world environments, noting that sound-based interpretation may be error-prone. One teacher explained, “VASS seems to rely on memory, while tactile information can be identified immediately through touch. With VASS, there’s the possibility of memory errors and the risk of incorrect generalization” (satr1). Concerns were also raised about the difficulty of identifying objects of similar shape: “The refrigerator, dining table, and desk are all rectangles, but how can we distinguish their sizes through sound alone?” (sctr1_vi).

Some participants pointed out that background noise and blind spots add complexity to real-world environments. One student noted, "If other noises overlap, I might not hear the VASS sound well. Imagine standing at a crosswalk when a bus passes by, I might not be able to hear anything at all" (sast1_vi). A teacher with visual impairment described similar challenges with ultrasonic canes: "Even if we manipulate pitch or waveforms for ultrasonic canes, it would be difficult to distinguish between a narrow pole and a wide wall based on sound alone. There are too many blind spots, and the number of sound cues one would need to memorize for full mobility would be overwhelming" (sctr1_vi).

Need for Long-Term Training

Participants also expressed concerns about the extensive training required to master VASS. “I understand simple shapes, but acquiring various types of information would require significant training” (satr1), one teacher commented. A student added, “It feels like this requires a lot of practice. I don’t think users with visual impairment can pick it up quickly, and it would take a long time to get used to” (scst1_vi).

Suggestions for Improvement and Design Considerations

Customizable Sound Options

Participants advocated for a wide range of sound schemes, including varied tones or musical notes, to represent different objects more effectively. One student noted, “Musical sounds help you focus, even with background noise like a passing bus” (sast1_vi). Others called for clearer pitch distinctions and simpler sound design, saying “The pitches are too similar—it would be better if the sound differences were more distinct” (scst1_vi), and “We need diverse sounds that are simple and convenient, not overly complicated” (scst3_vi). Customization options were also recommended, with one educator noting, “The Clover app allows users to select voices. It would be great if the VASS application could also allow users to choose sounds” (satr1).

Standardized Sound Cues

Educators emphasized the value of a consistent auditory pattern to support quicker and more accurate shape recognition. “This is also, in a way, like music. It needs to have a regular pattern that everyone can hear and understand. Establishing standards for pitch range or duration might make it more universally applicable,” said one educator (sbtr2). Educators also stressed the importance of accommodating individual differences in sound sensitivity, suggesting customizable sound settings. Another participant added, “If there is a way to define the middle note or set the highest and lowest notes, and present that at the start, it might help with better differentiation” (sbtr3).

Selective Information Delivery

To reduce cognitive load, participants recommended selecting only necessary auditory cues. “It would cause real fatigue. It is better if the feedback is given when the user is curious or uncertain” (scst1_vi), a student suggested. Another explained, “If the system keeps describing everything in front of me in real time, I might get tired and even miss important surroundings while trying to focus on the sounds. It would be better to only hear what I need, when I need it” (scst2_vi).

Multisensory Integration

Participants suggested combining VASS with other sensory inputs, such as tactile information, to enhance comprehension or safety. "If I could read it in braille and hear it at the same time it would be fine. . . I hope multiple senses could be used simultaneously" (sast5_vi), said one student. An educator added, "It's hard to protect ourselves just using vague cues. That's why it's better to use VASS with a white cane as a supplementary tool" (sbtr1). Another educator noted that multisensory cues could reduce anxiety and prevent misunderstandings: "Students with high anxiety perceive their surroundings as dangerous. I hope the program can provide diverse information through multiple senses" (sbtr3).

Device Design and Assistive Options

Participants expressed interest in an app-based format and in discreet, wearable devices. One teacher noted, “It would be great if they could develop an app. Without the need for a separate device, it could be more widely distributed” (satr4_vi), while a student added, “If it were developed as an app, it would be much easier to use in daily life” (sbst1_vi). Others suggested linking the app to Bluetooth earbuds or hearing aids to meet individual needs. To minimize cognitive burden, some participants proposed barcode-based interactions that would allow users to access information only when needed.

Participants also emphasized the need for auditory devices that do not isolate users from their environment. Bone-conduction earphones were frequently recommended, as they deliver audio without blocking the ears. “If students keep wearing earphones during class, they might not be able to respond properly to important alarms like fire alarms. This should be considered when developing the device” (sbtr1), said a teacher. Another added, “Bone-conduction earphones are better because they do not block environmental sounds—even with hats or hair” (sctr1_vi).

Some students also expressed discomfort with conventional headsets that could block surrounding sounds. To address these concerns, several suggested lightweight, discreet wearables such as small clip-on chips for clothing, devices mounted on eyeglasses, or badge-style attachments to minimize obtrusiveness and avoid social discomfort. One student explained, “Attaching a device to clothing might be safer than holding a phone while also using a cane” (sast1_vi).

Discussion

The study highlights users’ perspectives on VASS’ usefulness, the factors influencing its use, its challenges, and potential improvements. Across interviews, participants perceived VASS as a promising tool to support both educational and daily tasks, particularly in facilitating learning, spatial reasoning, and creativity through auditory input. These perceptions align with recent advances in sensory substitution technologies. Prior research demonstrates applications ranging from sonification with tactile exploration for academic content (e.g., TactualPlot; Chundury et al., 2023) to spatial orientation tools supporting mobility (Chebat et al., 2015; Ward & Meijer, 2010), as well as complex recognition tasks such as face-shape identification (Arbel et al., 2022) and climbing-based navigation (Richardson et al., 2022). Together, these developments underscore the expanding real-world applicability of sensory substitution approaches.

Consistent with prior studies, participants emphasized that VASS usability and engagement depend strongly on users’ auditory and cognitive capacities, as well as on the structure of training. While some prior research has explored musical ability in relation to VASS (e.g., Haigh et al., 2013; Hamilton-Fletcher & Chan, 2021), few studies have examined how broader cognitive and auditory skills are associated with or affect VASS outcomes. Regarding training, prior studies show that repeated training and feedback can lead to automated perception and generalized learning, improving recognition of both familiar and novel stimuli (Abboud et al., 2014; Netzer et al., 2019; Striem-Amit et al., 2012; Ward & Meijer, 2010). Immersive and interactive environments further promote transfer to real-world contexts (Ricci et al., 2023; Seki & Sato, 2010). Several studies have also explored adaptive training, which adjusts task difficulty, sound complexity, or feedback intensity, and multisensory training designed to reduce cognitive load and enhance spatial understanding. However, evidence on their effectiveness, and on which strategies work best, remains mixed (Abboud et al., 2014; Heimler et al., 2015). Further research is needed to systematically compare training approaches that account for individual differences among users and to identify strategies that best promote sustained learning, transferability, and long-term usability of VASS.

In addition to these facilitators, participants identified several barriers that limit the practical utility of VASS. These included the fleeting nature of auditory information, difficulty distinguishing complex shapes, discomfort with mechanical sounds, and the cognitive burden required to interpret unfamiliar auditory symbols. These challenges align with concerns raised in the literature on cognitive load in auditory processing (Brown et al., 2014; Brown & Proulx, 2016; Kristjánsson et al., 2016). When auditory inputs are overly dense or simultaneous, processing capacity can become overloaded, reducing recognition accuracy and increasing reliance on memory-based decoding. The findings are also consistent with research on auditory scene analysis showing that overlapping or harmonically consonant sounds can interfere with perceptual clarity in sonified visual stimuli (Brown et al., 2015). Together, these perceptual and cognitive barriers indicate that effective VASS design must balance information richness against users’ processing capacity.

Notably, several participants framed VASS use as a task requiring memory-based decoding rather than perceptual immersion. This distinction is critical, as it highlights a central tension between VASS’ promise as a sensory substitution technology and users’ current experiences with it in practice. Rather than enabling immediate perceptual access, participants often described VASS use as involving deliberate interpretation and recall, suggesting that perceptual integration has not yet fully emerged. This finding underscores the need to develop more intuitive sound mappings and scaffolded training that can help users move toward embodied perception and improve usability. Without this shift, users may struggle to progress beyond rote memorization and may experience fatigue or frustration with the tool.

Beyond technical and perceptual challenges, some participants questioned the added value of VASS compared to more familiar tools such as braille, tactile graphics, or voice-based assistance. These concerns point to the importance of clearly articulating the unique aspects of VASS, especially in contexts where other modalities are limited or less feasible. Enhancing communication about the advantages of VASS may increase user acceptance and engagement.

Taken together, these findings reconnect to the Technology Acceptance Model and Diffusion of Innovations frameworks and help clarify how users come to accept—or struggle to accept—VASS. While the Technology Acceptance Model emphasizes perceived usefulness and perceived ease of use, participants’ accounts show that ease of use depends on whether VASS is experienced perceptually or as a cognitively demanding decoding task. From a Diffusion of Innovations perspective, participants’ comparisons with braille, tactile graphics, and voice-based tools highlight the importance of comparative advantage and compatibility with existing sensory practices. Overall, these findings help explain why users may recognize VASS’ potential yet find it difficult to use comfortably and consistently in educational and daily contexts.

In light of these adoption-related considerations and participant feedback, we propose the following user-centered design recommendations to inform the next phase of VASS development:

Customization and Standardization. Offering customizable yet standardized auditory schemes (e.g., preset tones) may reduce cognitive load while maintaining consistent interpretability.

Focused Auditory Input. Prioritizing essential auditory cues can help reduce cognitive overload and enhance clarity.

Multisensory Integration. Adding tactile cues may enhance shape recognition and spatial understanding, reflecting multisensory integration and inclusive-design principles (Jicol et al., 2020; Lloyd-Esenkaya et al., 2020).

Accessible and Flexible Technology. A mobile application paired with wireless or bone-conduction earphones could improve usability while allowing users to remain aware of their surroundings.

Adaptive Training. Structured training programs tailored to each user’s cognitive and sensory characteristics may enable a smoother transition from symbolic decoding to perceptual interpretation. Building on prior work emphasizing progressive and multisensory training (Abboud et al., 2014; Heimler et al., 2015), adaptive approaches could dynamically adjust task difficulty and feedback modalities to sustain engagement and promote transferable learning.

AI-Enhanced Personalization. Integrating AI could enable personalized auditory mappings and training pathways. In addition, combining nonverbal auditory cues with AI-generated speech could support comprehension and navigation in certain settings by reducing ambiguity and aiding decision-making, much as lyrics layered onto a rhythm convey richer meaning than the rhythm alone.

Limitations

Several limitations of this study should be acknowledged. Although the research aimed to capture practical perceptions of VASS, the tasks were introduced in a short, controlled training setting. While the training allowed participants to understand the basic principles of sound-shape encoding, it was not sufficient for developing proficiency or perceptual fluency. Consequently, participants’ feedback likely reflects initial impressions and perceived usability rather than outcomes derived from extended learning or real-world use. Perceptions regarding the application of VASS in complex or dynamic contexts may therefore be based on inference rather than direct experience. In addition, although the study included participants with varying levels of visual impairment and different ages at onset of vision loss, small subgroup sizes limited our ability to investigate how these factors influenced user perceptions. They also limited analysis of how reliance on different sensory modalities (e.g., touch vs. residual vision) shapes user experiences with VASS. Larger and more diverse samples are needed to explore how visual status and sensory modality reliance affect the usability and effectiveness of VASS.

Implications for Future Research

Future research should evaluate VASS in more dynamic, real-world environments that reflect the challenges faced by individuals with visual impairment. Longitudinal studies with extended training periods are needed to assess whether users move from memorization to perceptual use, a key goal of sensory substitution. Comparative studies examining different training approaches (e.g., implicit exposure, structured feedback, task sequencing) could clarify how instructional methods impact training outcomes. Further investigation is also needed to clarify for whom VASS is most beneficial. Systematic assessments of individual differences, such as vision status, cognitive ability, auditory discrimination, and prior exposure to assistive technologies, could inform tailored protocols and device customization. Moreover, evaluating VASS’ application in specific academic domains, such as graph reading or science learning, would help determine where it adds instructional value beyond tactile or verbal tools. Finally, collaboration with educators or mobility specialists will be essential to align future VASS designs with classroom demands and orientation and mobility training goals, ensuring that the technology is both pedagogically sound and practically usable.

Conclusion

This study explores the potential and challenges of VASS technology and offers user-informed insights to guide its future development. Participants recognized the promise of VASS as a tool for learning and navigation but also emphasized cognitive demands and usability barriers that limit broader adoption. These findings underscore the need to clearly communicate the unique value of VASS and to prioritize user-centered design, intuitive interaction, and adaptive training in the future development of sensory substitution systems. With continued improvements such as customizable sound schemes, multisensory feedback, mobile accessibility, and training tailored to user capacities, VASS has the potential to become a more inclusive and effective tool for individuals with visual impairment across educational and daily settings.

Acknowledgements

We thank the research team at the Electronics and Telecommunications Research Institute (ETRI) for creating the experimental apparatus and for their valuable input in developing the training program. We also thank the students and educators from schools for the blind in South Korea who participated in this study.

ORCID iD

Sojung Park

Ethical Considerations

This study was approved by the Korea University Institutional Review Board (Approval No. KUIRB-2023-0313-0). The study was conducted in accordance with the Declaration of Helsinki, and all data were anonymized to ensure confidentiality.

Informed Consent

Written informed consent was obtained from all participants prior to participation in the study.

Author Contributions

So Jung Park: Conceptualized the study, led data collection and analysis, drafted the manuscript, and served as the corresponding author.

Sunhi Bak: Contributed to data collection and participated in qualitative coding and analysis, and critically reviewed and edited the manuscript for intellectual content.

Hyojin Lee: Contributed to data collection and participated in qualitative coding and analysis, and assisted in preparing the Methods section.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Due to ethical restrictions and the sensitive nature of qualitative data involving students with visual impairments, interview transcripts are not publicly available. Anonymized excerpts may be shared upon reasonable request to the corresponding author.

References

Abboud, Hanassy, Levy-Tzedek, Maidenbaum, & Amedi (2014). EyeMusic: Introducing a “visual” colorful experience for the blind using auditory sensory substitution. Restorative Neurology and Neuroscience, 32(2), 247–257. https://doi.org/10.3233/RNN-130338

Arbel, Heimler, & Amedi (2022). Face shape processing via visual-to-auditory sensory substitution activates regions within the face processing networks in the absence of visual experience. Frontiers in Neuroscience, 16, Article 921321. https://doi.org/10.3389/fnins.2022.921321

Bach-y-Rita (1972). Brain mechanisms in sensory substitution. Academic Press.

Bach-y-Rita, Collins, C. C., Saunders, F. A., White, & Scadden (1969). Vision substitution by tactile image projection. Nature, 221, 963–964. https://doi.org/10.1038/221963a0

Braun & Clarke (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.

Brown, D. J., & Proulx, M. J. (2016). Audio–vision substitution for blind individuals: Addressing human information processing capacity limitations. IEEE Journal of Selected Topics in Signal Processing, 10(5), 924–931. https://doi.org/10.1109/JSTSP.2016.2543678

Brown, D. J., Simpson, A. J., & Proulx, M. J. (2014). Visual objects in the auditory system in sensory substitution: How much information do we need? Multisensory Research, 27(5–6), 337–357. https://doi.org/10.1163/22134808-00002462

Brown, D. J., Simpson, A. J., & Proulx, M. J. (2015). Auditory scene analysis and sonified visual images: Does consonance negatively impact on object formation when using complex sonified stimuli? Frontiers in Psychology, 6, Article 1522. https://doi.org/10.3389/fpsyg.2015.01522

Buchs, Maidenbaum, Levy-Tzedek, & Amedi (2015). Integration and binding in rehabilitative sensory substitution: Increasing resolution using a new zooming-in approach. Restorative Neurology and Neuroscience, 34(1), 97–105. https://doi.org/10.3233/RNN-150592

Chebat, D. R., Maidenbaum, & Amedi (2015). Navigation using sensory substitution in real and virtual mazes. PLOS ONE, 10(6), Article e0126307. https://doi.org/10.1371/journal.pone.0126307

Chundury, Reyazuddin, Jordan, J. B., Lazar, & Elmqvist (2023). TactualPlot: Spatializing data as sound using sensory substitution for touchscreen accessibility. IEEE Transactions on Visualization and Computer Graphics, 30(1), 836–846. https://doi.org/10.1109/tvcg.2023.3326937

Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340. https://doi.org/10.2307/249008

Electronics and Telecommunications Research Institute. (2021). Final research report: Analysis of perceptual recognition and resolution of visual-to-auditory conversion signals among individuals with visual impairment.

Elli, G. V., Benetti, & Collignon (2014). Is there a future for sensory substitution outside academic laboratories? Multisensory Research, 27(5–6), 271–291. https://doi.org/10.1163/22134808-00002460

Haigh, Brown, D. J., Meijer, & Proulx, M. J. (2013). How well do you see what you hear? The acuity of visual-to-auditory sensory substitution. Frontiers in Psychology, 4, Article 330. https://doi.org/10.3389/fpsyg.2013.00330

Hamilton-Fletcher & Chan, K. C. (2021). Auditory scene analysis principles improve image reconstruction abilities of novice vision-to-audio sensory substitution users. In Proceedings of the 43rd annual international conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (pp. 5868–5871). IEEE. https://doi.org/10.1109/EMBC46164.2021.9630296

Hamilton-Fletcher, Obrist, Watten, Mengucci, & Ward (2016). “I always wanted to see the night sky”: Blind user preferences for sensory substitution devices. In Proceedings of the 2016 CHI conference on human factors in computing systems (pp. 2162–2174). ACM. https://doi.org/10.1145/2858036.2858241

Hayes & Proulx, M. J. (2024). Turning a blind eye? Removing barriers to science and mathematics education for students with visual impairments. British Journal of Visual Impairment, 42(2), 544–556. https://doi.org/10.1177/02646196221149561

Heimler, Striem-Amit, & Amedi (2015). Origins of task-specific sensory-independent organization in the visual and auditory brain: Neuroscience evidence, open questions, and clinical implications. Current Opinion in Neurobiology, 35, 169–177. https://doi.org/10.1016/j.conb.2015.09.001

Jicol, Lloyd-Esenkaya, Proulx, M. J., Lange-Smith, Scheller, M., O’Neill, & Petrini (2020). Efficiency of sensory substitution devices alone and in combination with self-motion for spatial navigation in sighted and visually impaired. Frontiers in Psychology, 11, Article 1443. https://doi.org/10.3389/fpsyg.2020.01443

Kristjánsson, Á., Moldoveanu, Jóhannesson, Ó. I., Balan, Spagnol, Valgeirsdóttir, V. V., & Unnthorsson (2016). Designing sensory-substitution devices: Principles, pitfalls, and potential. Restorative Neurology and Neuroscience, 34(5), 769–787. https://doi.org/10.3233/RNN-160647

Levy-Tzedek, Hanassy, Abboud, Maidenbaum, & Amedi (2012). Fast, accurate reaching movements with a visual-to-auditory sensory substitution device. Restorative Neurology and Neuroscience, 30(4), 313–323. https://doi.org/10.3233/RNN-2012-110219

Liang, Spencer, Scheller, Proulx, M. J., & Petrini (2024). Assessing people with visual impairments’ access to information, awareness and satisfaction with high-tech assistive technology. British Journal of Visual Impairment, 42(1), 149–163. https://doi.org/10.1177/02646196221131746

Lloyd-Esenkaya, V., O’Neill, & Proulx, M. J. (2020). Multisensory inclusive design with sensory substitution. Cognitive Research: Principles and Implications, 5(1), Article 37. https://doi.org/10.1186/s41235-020-00240-7

Maidenbaum, Hanassy, Abboud, Buchs, Chebat, D. R., Levy-Tzedek, & Amedi (2014). The “EyeCane,” a new electronic travel aid for the blind: Technology, behavior and swift learning. Restorative Neurology and Neuroscience, 32(6), 813–824. https://doi.org/10.3233/RNN-130351

Meijer, P. B. L. (1992). An experimental system for auditory image representations. IEEE Transactions on Biomedical Engineering, 39(2), 112–121. https://doi.org/10.1109/10.121642

Netzer, Novick, & Amedi (2019). The neural basis of learning visual concepts using sensory substitution. Scientific Reports, 9, Article 13259. https://doi.org/10.1038/s41598-019-49448-0

Ricci, Bisio, Ruggeri, & Trombetta (2023). Learning spatial representations through audio feedback in immersive environments: Insights for sensory substitution. Frontiers in Psychology, 14, Article 1145673. https://doi.org/10.3389/fpsyg.2023.1145673

Richardson, Petrini, & Proulx, M. J. (2022). Climb-o-vision: A computer-vision-driven sensory substitution device for rock climbing. In Proceedings of the 2022 CHI conference on human factors in computing systems (pp. 1–6). ACM. https://doi.org/10.1145/3491101.3519680

Rogers, E. M., Singhal, & Quinlan, M. M. (2014). Diffusion of innovations. In D. W. Stacks & M. B. Salwen (Eds.), An integrated approach to communication theory and research (2nd ed., pp. 432–448). Routledge.

Seki & Sato (2010). A training system of the sensory substitution using virtual environment. In Proceedings of the 2010 IEEE Symposium on 3D User Interfaces (pp. 71–74). https://doi.org/10.1109/3DUI.2010.5444729

Striem-Amit, Guendelman, & Amedi (2012). “Visual” acuity of the congenitally blind using visual-to-auditory sensory substitution. PLOS ONE, 7(3), Article e33136. https://doi.org/10.1371/journal.pone.0033136

Ward & Meijer (2010). Visual experiences in the blind induced by an auditory sensory substitution device. Consciousness and Cognition, 19(1), 492–500. https://doi.org/10.1016/j.concog.2009.10.006

Zhang, Thompson, J. R., Shah, Agrawal, Sarikaya, Wobbrock, J. O., Cutrell, & Lee (2024). ChartA11y: Designing accessible touch experiences of visualizations with blind smartphone users. In Proceedings of the 26th international ACM SIGACCESS conference on computers and accessibility (pp. 1–15). ACM.

Zhao, Nacenta, M. A., Sukhai, M. A., & Somanath (2024). TADA: Making node-link diagrams accessible to blind and low-vision people. In Proceedings of the 2024 CHI conference on human factors in computing systems (pp. 1–20). ACM.