Abstract
While prior studies on visual-to-auditory sensory substitution (VASS) have focused on technological development, practical adoption remains limited. This study explores how students with visual impairment and special educators in South Korea perceive the usability and potential improvements of VASS after brief training. Drawing on semi-structured interviews and thematic analysis, the study identified four overarching themes (i.e., perceived value, influencing factors, real-world challenges, and suggestions for improvement) spanning 18 subthemes. Participants recognized the potential of VASS to enhance learning, spatial reasoning, creativity, and engagement, while also highlighting limitations including high cognitive load, unclear comparative advantage over conventional assistive tools, and the absence of structured training protocols. The findings emphasize the importance of user-centered design, adaptive training, and multisensory integration for future development of sensory substitution systems in educational and daily contexts.
Keywords
Introduction
Sensory substitution refers to the process of delivering information acquired through one sense via another. It is grounded in the brain’s neuroplasticity—the capacity to reorganize and adapt to new sensory inputs (Bach-y-Rita, 1972; Bach-y-Rita et al., 1969). Visual-to-auditory sensory substitution (VASS) involves converting visual features such as shape, color, and spatial layout into auditory signals, enabling users to access visual information through sound.
Since the 1960s, advances in neuroscience, computer science, and engineering have led to the development of increasingly sophisticated sensory substitution devices (SSDs). Notable examples include the vOICe (Meijer, 1992), which converts digital images into continuous soundscapes, and newer tools like EyeMusic (Abboud et al., 2014) and EyeCane (Buchs et al., 2015; Maidenbaum et al., 2014), which integrate additional sensory cues such as color or depth (Levy-Tzedek et al., 2012). Recent developments include real-time feedback systems and smartphone-based applications such as TADA (Zhao et al., 2024) and ChartA11y (Zhang et al., 2024). In South Korea, the Electronics and Telecommunications Research Institute (ETRI) has been developing the V2A (Visual to Audio) program since 2019 to support multisensory fusion.
Despite these technical advancements, the widespread adoption of VASS technologies among individuals with visual impairment remains limited. Much of the existing research has been conducted in controlled laboratory settings (Elli et al., 2014; Kristjánsson et al., 2016). This gap between technical capability and real-world use suggests that factors beyond technical performance, such as user perception and contextual relevance, may play an important role in implementation success.
Bridging this gap requires a clearer understanding of how VASS is perceived and evaluated by its intended users. The Technology Acceptance Model (Davis, 1989) and Diffusion of Innovations theory (Rogers et al., 2014) emphasize that adoption depends not only on functionality but also on perceived usability, compatibility with existing practices, and contextual relevance. Recent studies have begun to address these user-centered dimensions of assistive technology, examining satisfaction and accessibility (Liang et al., 2024), barriers to educational inclusion (Hayes & Proulx, 2024), and preferences for controllable and comfortable sensory-substitution designs (Hamilton-Fletcher et al., 2016). However, these studies primarily focus on general awareness or conceptual barriers rather than users’ firsthand experiences with a specific VASS system in educational contexts.
Addressing this gap, the present study investigates perceptions of VASS technology among students with visual impairment and special educators in South Korea. Through semi-structured interviews and thematic analysis, this study explores how potential users interpret, evaluate, and envision the application of VASS in educational and daily contexts. Drawing on participants’ reflections, the study identifies both the opportunities and constraints of VASS implementation and offers user-informed insights to guide its future development.
Methods
Participants
Participants were recruited from three schools serving students with visual impairment located in different regions of South Korea. The final sample included a total of 21 participants (10 students and 11 special educators with over 5 years’ experience supporting students with visual impairment). Schools are a common setting for introducing and training students on assistive technologies, providing an important early user implementation context. Students, as direct users, offer insights into real-world usability and sensory demands, while special educators, including several with visual impairment, provide perspectives on instructional integration, design fit, and classroom feasibility. Recruitment flyers were posted at each school, and interested individuals received study details and consent forms.
The student group consisted of two females and eight males, with a mean age of 16 years (range = 14–19, standard deviation (SD) = 1.66). Six students (60%) had total blindness, and none had hearing impairment (see Table 1). The educator group included four female and seven male teachers, with a mean age of 44 years (range = 35–55, SD = 6.5) and a mean of 15.3 years of teaching experience in visual impairment education (SD = 6.5) (see Table 2). Four educators (36%) had visual impairment, including three with total blindness, and none reported hearing impairment. Overall, this heterogeneous sample provided diverse perspectives across roles and sensory experiences.
Demographic Information of the Students.
Note. Participant IDs ending in “_vi” indicate individuals with visual impairment.
Demographic Information of the Educators.
Note. Participant IDs ending in “_vi” indicate individuals with visual impairment. “—” indicates not applicable (i.e., for sighted participants).
Apparatus and Materials
This study used the Visual-to-Audio (V2A) system developed by the Electronics and Telecommunications Research Institute (ETRI), which applies the vOICe algorithm to convert visual images into auditory input. The algorithm scans each image column by column, encoding vertical position as sound frequency, horizontal position as time, and pixel brightness as sound intensity (white = loudest; black = silent). Each image was divided into a 64 × 64 grid, mapped to frequencies from 80 to 7,600 Hz over 1.05 s per frame, with 16 gray levels and click sounds marking frame boundaries. These parameters reflected ETRI’s (2021) validated default configuration, optimized for perceptual clarity and temporal efficiency. A non-musical tone mapping was used to examine auditory–spatial processing and cognitive load. The study employed the Windows-based prototype, as the mobile version was still under development as part of the broader multisensory-fusion initiative during data collection.
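The mapping described above can be illustrated with a minimal sketch. This is not the ETRI implementation: the sample rate, the exponential spacing of row frequencies, the convention that higher rows receive higher pitches, and all function names are assumptions made for illustration, and the click that marks frame boundaries is omitted. Only the grid size, frequency range, frame duration, and number of gray levels are taken from the text.

```python
import math

ROWS, COLS = 64, 64           # grid size reported above
F_MIN, F_MAX = 80.0, 7600.0   # frequency range reported above (Hz)
FRAME_S = 1.05                # duration of one left-to-right scan (s)
GRAY_LEVELS = 16              # brightness quantization reported above
SAMPLE_RATE = 16000           # assumption: any rate above 2 * F_MAX would do


def row_frequency(row):
    """Frequency for an image row; row 0 is the top row (highest pitch)."""
    # Assumption: exponential spacing between F_MIN and F_MAX.
    frac = (ROWS - 1 - row) / (ROWS - 1)
    return F_MIN * (F_MAX / F_MIN) ** frac


def quantize(brightness):
    """Snap brightness in [0, 1] to one of 16 amplitudes (0 = black = silent)."""
    return round(brightness * (GRAY_LEVELS - 1)) / (GRAY_LEVELS - 1)


def sonify(image):
    """Return the audio samples for one frame of a ROWS x COLS brightness grid."""
    n_samples = round(FRAME_S * SAMPLE_RATE)
    # For each column, keep only the rows that actually sound.
    columns = []
    for c in range(COLS):
        tones = []
        for r in range(ROWS):
            amp = quantize(image[r][c])
            if amp > 0.0:
                tones.append((row_frequency(r), amp))
        columns.append(tones)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        col = min(COLS - 1, n * COLS // n_samples)  # horizontal position -> time
        value = sum(a * math.sin(2 * math.pi * f * t) for f, a in columns[col])
        samples.append(value / ROWS)  # scale so the sum stays within [-1, 1]
    return samples
```

Under these assumptions, a shape confined to the left half of the image is heard only during the first half of the 1.05 s frame, which is the sense in which horizontal position is encoded as time.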
Auditory signals were delivered via over-ear headphones. Eight simple geometric shapes, including circles, ellipses, triangles, and rectangles, were used as stimuli (see Figure 1). Simple, symmetrical shapes were selected to allow participants to understand VASS concepts quickly. Visual stimuli were presented on a laptop screen for sighted participants, while participants with visual impairment explored 3D-embossed tactile graphics of the same shapes. The tactile graphics were 11.6 cm squares, or 22 cm squares upon request, and used contrasting textures.

VASS learning session screen.
A custom-developed presentation interface displayed two shapes per screen or tactile set across four learning trials. Participants could replay each shape’s soundscape as needed during learning. For the post-training test, participants were presented with a single tactile shape or visual shape and asked to identify the matching sound from two audio options. Each sound could be played up to two times (see Figure 2).

VASS test session screen.
Procedure
After providing consent, participants completed a demographic questionnaire on age, gender, vision and hearing status, and, for educators, years of teaching experience. Participants then completed a 15-min VASS training session in a quiet room at their respective schools. Sessions began with a standardized explanation of the vOICe system and sensory substitution (see Figure 3). Participants explored each shape, listened to its corresponding sound, and identified auditory patterns. No explicit instruction was provided on the mapping between sound and visual features. The training involved four sets of two shapes. Sighted participants viewed digital shapes on the laptop screen, while participants with visual impairment used tactile graphics only. All participants listened to the associated soundscapes via headphones.

Introduction script for VASS technology.
The purpose of this brief training was not to assess mastery of the VASS system, but to provide participants with a basic understanding of how auditory signals encode visual information and to enable them to meaningfully engage with the system during subsequent tasks and interviews. The training was intentionally designed to introduce core principles of sound-shape mapping in an accessible and simplified manner, allowing participants to grasp the fundamental logic of sensory substitution within a limited time frame.
After training, participants completed a shape-sound recognition test to assess their understanding of the basic encoding principles introduced during the session. The task was not used as an outcome measure but served to ensure that participants had sufficient familiarity with the system prior to the interview. For each of the eight test trials, a single shape (tactile or visual) was presented. Participants listened to two sound options, one correct and one distractor, and verbally identified the matching sound. Each sound could be repeated once upon request. Participants generally demonstrated a basic understanding of the encoding principles before proceeding to the interview. Each participant then completed a 30- to 60-min semi-structured interview about user experiences and perceptions (see Figure 4). Interviews were audio-recorded with consent and supplemented by field notes.

Interview protocol.
Ethical Considerations
Ethical approval was granted by the Institutional Review Board of Korea University (Approval No. KUIRB-2023-0313-0).
Data Analysis
This study employed a qualitative approach to explore participants’ perceptions of VASS. Thematic analysis was conducted following Braun and Clarke’s (2006) framework. We began by reading interview transcripts repeatedly to build familiarity and generate initial impressions. Inductive coding was used to capture meaningful features, grounded in participants’ own language. Codes were reviewed, grouped into candidate themes, and refined through team discussions to ensure coherence. To enhance credibility, selected participants reviewed preliminary findings, and their feedback informed final themes.
Transcripts were generated using Clova, a Korean speech-to-text tool, and manually reviewed for accuracy. Coding was conducted in Taguette, an open-source qualitative analysis platform. To strengthen analytic rigor, we triangulated interviews with field notes, engaged multiple coders to reduce bias, and conducted informal member checking during interviews. A detailed audit trail was maintained throughout to ensure transparency. This process yielded four overarching themes (perceived value, influencing factors, real-world challenges, and suggestions for improvement) comprising 18 subthemes, summarized in Table 3.
Developed Themes.
To understand the prominence and distribution of these themes, we supplemented the thematic analysis with a frequency count of how many unique participants mentioned each theme at least once. We categorized responses by participant group and compared subtheme frequencies between students and teachers, and between teachers with and without visual impairment, to explore how themes varied by role and vision status. Finally, we examined whether responses clustered by school, but no consistent institutional patterns were observed. Although subgroup comparison was not the primary goal of this study, this breakdown provides useful descriptive context.
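The counting rule described here (each participant contributes at most once per subtheme, however often they raise it) can be sketched as follows. The participant IDs, group labels, and coded segments are hypothetical placeholders, not study data, and the function names are illustrative only.

```python
from collections import defaultdict


def participants_per_subtheme(coded_segments):
    """Map each subtheme to the set of participants who mentioned it at least once."""
    mentioned = defaultdict(set)
    for participant_id, subtheme in coded_segments:
        mentioned[subtheme].add(participant_id)  # a set ignores repeat mentions
    return dict(mentioned)


def coverage_by_group(mentioned, groups):
    """Percent of each group's members who mentioned each subtheme."""
    sizes = defaultdict(int)
    for group in groups.values():
        sizes[group] += 1
    table = {}
    for subtheme, people in mentioned.items():
        table[subtheme] = {
            group: 100.0 * sum(1 for p in people if groups[p] == group) / size
            for group, size in sizes.items()
        }
    return table


# Hypothetical example: four participants, five coded segments.
groups = {"p1": "student", "p2": "student", "p3": "teacher", "p4": "teacher"}
coded_segments = [
    ("p1", "daily life use"),
    ("p1", "daily life use"),         # repeat mention, counted once
    ("p3", "daily life use"),
    ("p3", "training and practice"),
    ("p4", "training and practice"),
]
mentioned = participants_per_subtheme(coded_segments)
table = coverage_by_group(mentioned, groups)
```

Grouping by a different label (for example, vision status instead of role) only requires a different `groups` mapping, which is how the same counts could support both comparisons described above.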
Results
This section presents findings from the thematic analysis, organized into four overarching themes and 18 subthemes: (a) perceived value of VASS technology (intuitiveness, academic use, daily life use, auditory sensitivity, creativity and engagement); (b) influencing factors (auditory abilities, cognitive abilities, training and practice); (c) anticipated real-world challenges (fleeting auditory cues, discomfort with mechanical sounds, comparative advantages, complexity and misinterpretation, long-term training); and (d) suggestions for improvement (customizable sounds, standardized cues, selective auditory input, multisensory integration, assistive device options).
To contextualize the findings, we first present an overview of the frequency with which each subtheme was discussed across participants and group-level comparisons. Then, we offer an in-depth analysis of each theme, supported by illustrative excerpts.
Overview of Emerging Themes
The most commonly discussed subthemes included daily life application (N = 20) and academic use (N = 18), complexity and misinterpretation of the auditory information (N = 17), and device design and assistive options (N = 17). These patterns highlight participants’ shared focus on the practical usability and real-world implementation of VASS. To explore how participant roles shaped their perspectives, we compared the frequency of subthemes across students and teachers. As shown in Table 4 and Figure 5, students more frequently expressed concerns about complexity and misinterpretation (%Diff = 36%) and suggested assistive device options (%Diff = 36%). In contrast, educators more often emphasized the importance of structured training and practice (%Diff = 35%) and noted concerns about the burden of long-term training (%Diff = 26%). They also discussed the need for consistent design through customizable (%Diff = 35%) and standardized sounds (%Diff = 25%) and highlighted VASS’ potential to enhance students’ auditory sensitivity (%Diff = 27%) and to support students’ creativity and engagement (%Diff = 18%), themes not mentioned by students. Overall, these patterns suggest that while students focused on practical usability, educators’ concerns were rooted in pedagogical outcomes and classroom use of VASS.
Subtheme Frequency by Participant Role.
Note. The % Difference column reflects the relative emphasis between the teacher and student groups, calculated as Teacher % − Student %. Positive values indicate a stronger emphasis by teachers. Negative values indicate a stronger emphasis by students.

Thematic emphases and coverage by participant group.
Because most participants with visual impairment were students (10 out of 14) and most sighted participants were teachers (N = 7), the patterns by vision status largely mirrored role-based trends. The small group of teachers with visual impairment (N = 4) appeared to bridge the two perspectives, sharing students’ focus on daily usability and perceptual challenges (e.g., complexity and misinterpretation: 100% for students and teachers with visual impairment vs. 43% for sighted teachers), while echoing teachers’ focus on structured training (e.g., training and practice: 20% for students, 50% for teachers with visual impairment, and 57% for sighted teachers). These descriptive contrasts suggest that teachers with visual impairment integrated user-oriented and pedagogical viewpoints rather than aligning entirely with one group.
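As a consistency check, the subgroup percentages above can be recombined into the role-level difference (Teacher % − Student %). The sketch below assumes that each rounded percentage corresponds to the nearest whole number of participants in its subgroup; the implied counts are inferred for illustration and are not reported in the text, and the function names are hypothetical.

```python
N_STUDENTS = 10          # group sizes reported in the Participants section
N_TEACHERS_VI = 4
N_TEACHERS_SIGHTED = 7


def implied_count(percent, group_size):
    """Whole-number count implied by a rounded percentage (an inference)."""
    return round(percent / 100 * group_size)


def teacher_minus_student(student_pct, teacher_vi_pct, teacher_sighted_pct):
    """Return (total participants, Teacher % - Student %) for one subtheme."""
    students = implied_count(student_pct, N_STUDENTS)
    teachers = (implied_count(teacher_vi_pct, N_TEACHERS_VI)
                + implied_count(teacher_sighted_pct, N_TEACHERS_SIGHTED))
    n_teachers = N_TEACHERS_VI + N_TEACHERS_SIGHTED
    diff = 100 * teachers / n_teachers - 100 * students / N_STUDENTS
    return students + teachers, diff


# Training and practice: 20% / 50% / 57% as reported above.
total_training, diff_training = teacher_minus_student(20, 50, 57)
# Complexity and misinterpretation: 100% / 100% / 43% as reported above.
total_complexity, diff_complexity = teacher_minus_student(100, 100, 43)
```

Under this assumption, training and practice gives a difference of about +35 points and complexity and misinterpretation about −36 points with 17 participants in total, in line with the magnitudes and the N = 17 reported in the overview.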
Finally, we examined whether responses clustered by institution. No clear school-level patterns emerged. Participants from the same school often expressed distinct priorities, suggesting that perceptions of VASS were shaped more by individual roles and experiences than by institutional context.
In-Depth Thematic Findings
The following section provides a detailed analysis of each major theme, supported by representative excerpts. These quotations illustrate how participants experienced and evaluated VASS in relation to their perspectives and contexts of use.
Perceived Value of VASS Technology
Intuitiveness
Participants generally found the sound-to-shape translation intuitive. One teacher noted, “The eight shapes presented today were not difficult to distinguish” (sbtr1), while another teacher added, “Broader shapes were represented by more grand sounds, and diagonal lines by lower-pitched sounds” (satr4_vi). Students echoed this view: “It was easy to distinguish and understand the shapes by touching them while listening to the sounds” (scst3_vi).
Academic Use
Participants highlighted VASS’ value in academic subjects like math, art, and geography. A teacher explained, “Students with visual impairment often give up on math because they can’t see graphs. Braille has limitations, but representing graphs through sound could help them understand and retain the information better” (sctr1_vi). Several students also shared that VASS could support understanding of graph directionality, coordinate planes, and geometric shapes across subjects.
Daily Life Use
Participants saw VASS as useful for real-world navigation and hazard detection. A teacher mentioned, “VASS could help users recognize traffic patterns and avoid hazards, like door handles or changes in ground elevation” (satr3). Several students described its usefulness for locating entrances and exits where tactile cues are not available or consistent. One student shared, “If VASS could notify me about door handles or steps, it would help me avoid wasting time trying to find them” (scst2_vi). Others noted that VASS could assist in identifying distant or unsafe objects and distinguishing fast-moving hazards like bicycles.
Auditory Sensitivity
Educators highlighted VASS’ potential to draw on hearing to compensate for visual limitations among students with visual impairment and to improve their auditory sensitivity. “If auditory stimulation can activate the visual areas of the brain and help students compensate for visual limitations, it could significantly aid individuals in compensating for visual limitations” (sbtr1), one explained. Another added, “Auditory training is crucial. . .timely listening practice can significantly benefit students, especially for orientation skills” (sctr1_vi).
Creativity and Engagement
Educators also noted that VASS could foster creativity and engagement. As one expressed, “This could help develop creativity, as students realize, ‘this can be expressed in such ways’” (satr2_vi). Others described the experience as playful and mentally stimulating, suggesting that its multisensory nature could heighten student interest. Students did not mention creativity or engagement explicitly.
Influencing Factors on the Effectiveness of VASS
Auditory Abilities
Participants acknowledged variability in users’ auditory perception. “We often think that individuals with visual impairment must all have good hearing and memory, but that’s not always the case” (satr1), one teacher cautioned. Others agreed that individuals with stronger auditory perception would find VASS easier to use. One teacher noted, “There may be differences based on musical perception. . . students with better auditory perception would likely understand more easily” (sbtr3). Students agreed that good pitch perception or musical training could help, while emphasizing that clear sound design would make the system accessible even to those without musical backgrounds.
Cognitive Abilities
Concerns were raised about VASS’ cognitive demands. One teacher commented, “I wonder if only students with high cognitive abilities could use this” (satr1). A student similarly noted, “If shapes all have different sounds, it’s easy to forget a few and get confused. I think that depends on each person’s memory skills” (scst2_vi).
Training and Practice
High-quality, structured training was seen as essential. One teacher explained, “With repeated training, students would become more familiar with the sounds corresponding to shapes” (sbtr3). Several teachers and students recommended introducing VASS principles in advance, remarking, “If we could provide a brief introduction on VASS principles on how the shapes are expressed in sound in advance of training, the training would be smoother and more effective” (sbtr4_vi). A student added, “If there were something like a manual, it would have been helpful to read it (beforehand) and then listen while learning” (scst4_vi). Other participants encouraged self-directed learning using labeled tactile aids and suggested using real-life examples and assessments to help users apply auditory information.
Anticipated Challenges in Real-World Implementation
Fleeting and Sequential Nature of Sound
Participants noted that the fleeting and sequential nature of auditory information increases cognitive load. “The sound is too short. Even though I was focused. . . I got it wrong” (sbtr5), said one teacher. A student echoed this, saying, “it would be better if the sound were a bit longer. It felt short” (sast3_vi).
The sequential nature of auditory information can also slow down recognition compared to visual input. This could hinder quick reactions, as one participant noted, “When we see a shape, we know what it is right away, but with VASS, it takes a few seconds of listening before we can identify it. If something happens quickly, like a ball flying toward you, can you recognize it as a ball and avoid it just by hearing the sound in a few seconds?” (satr1).
Continuous auditory feedback was seen as potentially distracting and exhausting, particularly when the information is excessive or irrelevant. “When I’m walking in real life, aren’t all these things just obstacles? The sensors would keep making sounds over and over. . . It’s not easy to distinguish between important things and unnecessary noise” (sctr1_vi), said one teacher. A student similarly shared, “If sound is coming from every object around me, even those that are not important, it could become noisy. I only want to know about specific objects, and if I hear everything, it might interfere with my ability to judge what really matters” (scst2_vi).
Mechanical Sound Discomfort
Some users found the current sound output unpleasant. One student commented, “The sound is a bit mechanical and unpleasant right now. It would be better to use a more stable sound” (scst2_vi). An educator noted, “The sound itself is somewhat irritating, like a sharp ‘beep’ noise. Rather than using a harsh mechanical sound, it would be better to use something softer. Individuals with visual impairment, who rely heavily on auditory senses as compensation, might find the current sounds even more jarring” (sbtr1).
Unclear Comparative Advantages
Participants questioned whether VASS offers clear advantages over tactile learning or voice-based assistance. One educator noted, “Tactile information seems to be more useful than auditory information. It is easier to convey the shape by allowing someone to feel it directly with their hands” (satr2_vi), while a student said, “I mostly use braille, so I haven’t really felt a strong need for this kind of auditory system” (sbst1_vi).
Other participants expressed a preference for voice feedback or suggested AI-generated speech. “Rather than interpreting sound waves, it might be more helpful if AI just provides the answer, like ChatGPT does. The era of requiring individuals with visual impairment to memorize everything has passed” (sctr1_vi), explained a teacher. Another added, “Unless the system names the object directly, it is hard to justify why someone would use VASS over more familiar tools. There are already too many accessible alternatives that require less effort to learn” (sbtr3). Students echoed this, expressing a preference for voice-based information over remembering complex sound-shape associations.
Complexity and Risk of Misinterpretation
Participants highlighted the complexity of real-world environments, noting that sound-based interpretation may be error-prone. One teacher explained, “VASS seems to rely on memory, while tactile information can be identified immediately through touch. With VASS, there’s the possibility of memory errors and the risk of incorrect generalization” (satr1). Concerns were also raised about the difficulty of identifying objects of similar shape: “The refrigerator, dining table, and desk are all rectangles, but how can we distinguish their sizes through sound alone?” (sctr1_vi).
Some participants pointed out that background noise and blind spots can add complexity to the real-world environment. One student noted, “If other noises overlap, I might not hear the VASS sound well. Imagine standing at a crosswalk when a bus passes by. I might not be able to hear anything at all” (sast1_vi). A teacher with visual impairment described similar challenges when using ultrasonic canes: “Even if we manipulate pitch or waveforms for ultrasonic canes, it would be difficult to distinguish between a narrow pole and a wide wall based on sound alone. There are too many blind spots, and the number of sound cues one would need to memorize for full mobility would be overwhelming” (sctr1_vi).
Need for Long-Term Training
Participants also expressed concerns about the extensive training required to master VASS. “I understand simple shapes, but acquiring various types of information would require significant training” (satr1), one teacher commented. A student added, “It feels like this requires a lot of practice. I don’t think users with visual impairment can pick it up quickly, and it would take a long time to get used to” (scst1_vi).
Suggestions for Improvement and Design Considerations
Customizable Sound Options
Participants advocated for a wide range of sound schemes, including varied tones or musical notes, to represent different objects more effectively. One student noted, “Musical sounds help you focus, even with background noise like a passing bus” (sast1_vi). Others called for clearer pitch distinctions and simpler sound design, saying “The pitches are too similar—it would be better if the sound differences were more distinct” (scst1_vi), and “We need diverse sounds that are simple and convenient, not overly complicated” (scst3_vi). Customization options were also recommended, with one educator noting, “The Clover app allows users to select voices. It would be great if the VASS application could also allow users to choose sounds” (satr1).
Standardized Sound Cues
Educators emphasized the value of a consistent auditory pattern to support quicker and more accurate shape recognition. “This is also, in a way, like music. It needs to have a regular pattern that everyone can hear and understand. Establishing standards for pitch range or duration might make it more universally applicable,” said one educator (sbtr2). Educators also stressed the importance of accommodating individual differences in sound sensitivity, suggesting customizable sound settings. Another participant added, “If there is a way to define the middle note or set the highest and lowest notes, and present that at the start, it might help with better differentiation” (sbtr3).
Selective Information Delivery
To reduce cognitive load, participants recommended selecting only necessary auditory cues. “It would cause real fatigue. It is better if the feedback is given when the user is curious or uncertain” (scst1_vi), a student suggested. Another explained, “If the system keeps describing everything in front of me in real time, I might get tired and even miss important surroundings while trying to focus on the sounds. It would be better to only hear what I need, when I need it” (scst2_vi).
Multisensory Integration
Participants suggested combining VASS with other sensory inputs, like tactile information, to enhance comprehension or safety. “If I could read it in braille and hear it at the same time it would be fine. . . I hope multiple senses could be used simultaneously” (sast5_vi), said one student. An educator added, “It’s hard to protect ourselves just using vague cues. That’s why it’s better to use VASS with a white cane as a supplementary tool” (sbtr1). Another educator noted that multisensory cues could reduce anxiety and prevent misunderstandings: “Students with high anxiety perceive their surroundings as dangerous. I hope the program can provide diverse information through multiple senses” (sbtr3).
Device Design and Assistive Options
Participants were interested in app-based formats and discreet wearable devices. One teacher noted, “It would be great if they could develop an app. Without the need for a separate device, it could be more widely distributed” (satr4_vi), while a student added, “If it were developed as an app, it would be much easier to use in daily life” (sbst1_vi). Others suggested linking the app to Bluetooth earbuds or hearing aids to support individualized needs. To minimize cognitive burden, some participants proposed incorporating barcode-based interactions that allow users to access information only when needed.
Participants also emphasized the need for auditory devices that do not isolate users from their environment. Bone-conduction earphones were frequently recommended, as they deliver audio without blocking the ears. “If students keep wearing earphones during class, they might not be able to respond properly to important alarms like fire alarms. This should be considered when developing the device” (sbtr1), said a teacher. Another added, “Bone-conduction earphones are better because they do not block environmental sounds—even with hats or hair” (sctr1_vi).
Some students also expressed discomfort with conventional headsets that could block surrounding sounds. To address these concerns, several suggested lightweight, discreet wearables such as small clip-on chips for clothing, devices mounted on eyeglasses, or badge-style attachments to minimize obtrusiveness and avoid social discomfort. One student explained, “Attaching a device to clothing might be safer than holding a phone while also using a cane” (sast1_vi).
Discussion
The study highlights users’ perspectives on VASS’ usefulness, influencing factors, challenges, and potential improvements. Across interviews, participants perceived VASS as a promising tool to support both educational and daily tasks, particularly in facilitating learning, spatial reasoning, and creativity through auditory input. These perceptions align with recent advances in sensory substitution technologies. Prior research demonstrates applications ranging from sonification with tactile exploration for academic content (e.g., TactualPlot; Chundury et al., 2023) to spatial orientation tools supporting mobility (Chebat et al., 2015; Ward & Meijer, 2010), as well as complex recognition tasks such as face-shape identification (Arbel et al., 2022) and climbing-based navigation (Richardson et al., 2022). Together, these developments underscore the expanding real-world applicability of sensory substitution approaches.
Consistent with prior studies, participants emphasized that VASS usability and engagement depend strongly on users’ auditory and cognitive capacities, as well as on the structure of training. While some prior research has explored musical ability in relation to VASS (e.g., Haigh et al., 2013; Hamilton-Fletcher & Chan, 2021), few have examined how broader cognitive and auditory skills are associated with or impact VASS outcomes. Regarding training, prior studies show that repeated training and feedback can lead to automated perception and generalized learning, improving recognition of both familiar and novel stimuli (Abboud et al., 2014; Netzer et al., 2019; Striem-Amit et al., 2012; Ward & Meijer, 2010). Immersive and interactive environments further promote transfer to real-world contexts (Ricci et al., 2023; Seki & Sato, 2010). Several studies have also explored adaptive training, which adjusts task difficulty, sound complexity, or feedback intensity, and multisensory training designed to reduce cognitive load and enhance spatial understanding. However, evidence regarding their effectiveness—or which strategies work best—remains mixed (Abboud et al., 2014; Heimler et al., 2015). Further research is needed to systematically compare different training approaches that account for individual differences among users and to identify strategies that best promote sustained learning, transferability, and long-term usability of VASS.
In addition to these facilitators, participants identified several barriers that limit the practical utility of VASS. These included the fleeting nature of auditory information, difficulty distinguishing complex shapes, discomfort with mechanical sounds, and the cognitive burden required to interpret unfamiliar auditory symbols. These challenges align with concerns raised in the literature on cognitive load in auditory processing (Brown et al., 2014; Brown & Proulx, 2016; Kristjánsson et al., 2016). When auditory inputs are overly dense or simultaneous, processing capacity can become overloaded, reducing recognition accuracy and increasing reliance on memory-based decoding. These findings align with research on auditory scene analysis showing that overlapping or harmonically consonant sounds can interfere with perceptual clarity in sonified visual stimuli (Brown et al., 2015). These perceptual and cognitive barriers underscore that effective VASS design must balance information richness with processing capacity.
Notably, several participants framed VASS use as a task requiring memory-based decoding rather than as perceptual immersion. This distinction is critical, as it highlights a central tension between VASS’ promise as a sensory substitution technology and users’ current experiences with it in practice. Rather than enabling immediate perceptual access, participants often described VASS use as involving deliberate interpretation and recall, suggesting that perceptual integration has not yet fully emerged. This finding underscores the need to develop more intuitive sound mappings and scaffolded training that can help users move toward embodied perception and improve usability. Without this shift, users may struggle to progress beyond rote memorization and may encounter fatigue or frustration with the tool.
Beyond technical and perceptual challenges, some participants questioned the added value of VASS compared to more familiar tools such as braille, tactile graphics, or voice-based assistance. These concerns point to the importance of clearly articulating the unique aspects of VASS, especially in contexts where other modalities are limited or less feasible. Enhancing communication about the advantages of VASS may increase user acceptance and engagement.
Taken together, these findings reconnect to the Technology Acceptance Model and Diffusion of Innovations frameworks and help clarify how users come to accept—or struggle to accept—VASS. While the Technology Acceptance Model emphasizes perceived usefulness and perceived ease of use, participants’ accounts show that ease of use depends on whether VASS is experienced perceptually or as a cognitively demanding decoding task. From a Diffusion of Innovations perspective, participants’ comparisons with braille, tactile graphics, and voice-based tools highlight the importance of comparative advantage and compatibility with existing sensory practices. Overall, these findings help explain why users may recognize VASS’ potential yet find it difficult to use comfortably and consistently in educational and daily contexts.
In light of these adoption-related considerations and participant feedback, we propose the following user-centered design recommendations to inform the next phase of VASS development:
Limitations
Several limitations of this study should be acknowledged. Although the research aimed to capture practical perceptions of VASS, the tasks were introduced in a short, controlled training setting. While the training allowed participants to understand the basic principles of sound-shape encoding, it was not sufficient for developing proficiency or perceptual fluency. Consequently, participants’ feedback likely reflects initial impressions and perceived usability rather than outcomes derived from extended learning or real-world use. Perceptions regarding the application of VASS in complex or dynamic contexts may therefore be based on inference rather than direct experience. In addition, although the study included participants with varying levels of visual impairment and different timings of vision loss, small subgroup sizes limited our ability to investigate how these factors influenced user perceptions. The small subgroups also limited the analysis of how reliance on different sensory modalities (e.g., touch vs. residual vision) shapes user experiences with VASS. Larger and more diverse samples are needed to explore how visual status and sensory modality reliance affect the usability and effectiveness of VASS.
Implications for Future Research
Future research should evaluate VASS in more dynamic, real-world environments that reflect the challenges faced by individuals with visual impairment. Longitudinal studies with extended training periods are needed to assess whether users move from memorization to perceptual use, a key goal of sensory substitution. Comparative studies examining different training approaches (e.g., implicit exposure, structured feedback, task sequencing) could clarify how instructional methods impact training outcomes. Further investigation is also needed to clarify for whom VASS is most beneficial. Systematic assessments of individual differences, such as vision status, cognitive ability, auditory discrimination, and prior exposure to assistive technologies, could inform tailored protocols and device customization. Moreover, evaluating VASS’ application in specific academic domains, such as graph reading or science learning, would help determine where it adds instructional value beyond tactile or verbal tools. Finally, collaboration with educators or mobility specialists will be essential to align future VASS designs with classroom demands and orientation and mobility training goals, ensuring that the technology is both pedagogically sound and practically usable.
Conclusion
This study contributes by exploring the potential and challenges of VASS technology and offering user-informed insights to guide its future development. Participants recognized the promise of VASS as a tool for learning and navigation, but also emphasized cognitive demands and usability barriers that limit broader adoption. These findings underscore the need for future sensory substitution systems to clearly communicate the unique value of VASS and to prioritize user-centered design, intuitive interaction, and adaptive training. With continued improvements such as customizable sound schemes, multisensory feedback, mobile accessibility, and training tailored to user capacities, VASS has the potential to become a more inclusive and effective tool for individuals with visual impairment across educational and daily settings.
Footnotes
Acknowledgements
We thank the research team at the Electronics and Telecommunications Research Institute (ETRI) for creating the experimental apparatus and for their valuable input in developing the training program. We also thank the students and educators from schools for the blind in South Korea who participated in this study.
Ethical Considerations
This study was approved by the Korea University Institutional Review Board (Approval No. KUIRB-2023-0313-0). The study was conducted in accordance with the Declaration of Helsinki, and all data were anonymized to ensure confidentiality.
Informed Consent
Written informed consent was obtained from all participants prior to participation in the study.
Author Contributions
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Due to ethical restrictions and the sensitive nature of qualitative data involving students with visual impairments, interview transcripts are not publicly available. Anonymized excerpts may be shared upon reasonable request to the corresponding author.
