It’s not what you say,but how you say it: Using artifact intelligence to detect deception in natural speech in an HR recruiting context

Abstract

Human emotions trigger physical reactions of the body that can be interpreted using artificial intelligence (AI) methods. AI-enabled detection mechanisms offer new opportunities to gain deeper understanding of human emotions. AI is already able to recognize basic emotions, such as joy, anger, or fear, but is challenged to detect complex emotions accurately, such as when someone is lying. The human voice conveys a wealth of subconscious information. This research utilizes natural language processing (NLP) to present a novel AI artifact that analyzes solely the human voice to detect whether a person is speaking to their true conviction. This is useful for use cases where it is important for decision makers to know whether a person’s arguments are truthful. A suitable use case could be corporate recruitment, an area where people traditionally lie a lot. The artifact can support human resource (HR) personnel in making better decisions by overcoming their naturally weak ability to detect deception. To suit the sensitive context of recruitment talks, the artifact is designed to protect data privacy, and thus has no speech recognition capabilities and does not store information that identifies the speaker. The model detects deceptive statements with ∼82% accuracy. Integrating the artifact into corporate business processes would enable managers to more accurately detect deceptive behavior of their interview partners. The increasing use of novel AI artifacts to compensate for humans innately weak capabilities to detect other people’s complex emotions will challenge our theoretical conceptualization of organizational decision making in light of emerging human–AI hybrids.

Keywords

artificial intelligence natural language processing voice analysis deception detection complex emotions human resources organizational decision making human–AI hybrids

Introduction

Artificial intelligence (AI), defined as “the ability of machines to perform human-like cognitive tasks” (Benbya et al., 2020a: p. 9), has made enormous progress over the last years. Previously, AI was primarily used to support decision making by providing novel analytical information resulting from the automated analysis of huge amounts of data. Nowadays, modern technologies enable machines to learn and draw conclusions on a completely new level (Strich et al., 2021). Contemporary research has progressed into complex areas such as analyzing human emotions (Bagozzi et al., 2022) and predicting human behavior. These topics are high on scholars’ as well as practitioners’ agendas (Benbya et al., 2021; Huang and Rust 2024; Storey and Park 2022; Yang et al., 2021).

Corporations are interested in using AI to better understand their customers. They want to serve them better, increase retention (Ganesh et al., 2000), sell more products (Rainsberger 2022), reduce operational costs (Yang et al., 2021), and intensify their relationships with them (Huang and Rust 2024). Being able to automatically detect whether a person is being deceptive is another area of great interest (Suchotzki and Gamer 2024). This knowledge could enhance the capacities of insurance companies to identify false claims when customers report damages, or those of investment bankers to determine whether CEOs are truthfully reporting on the financial position of their company (Yang et al., 2023).

Many human interactions happen when people are not physically in the same room. Phone conversations, voice-assisted interactions over computers (Griffin 2021), and video conferencing are commonplace in business communications. This trend enables AI tools utilizing natural language processing (NLP) to analyze the voice of the speaker without the need for special equipment. For example, call centers could use AI to improve customer interactions by detecting anger or dissatisfaction in the caller’s voice (Sudarsan and Kumar 2019) and providing the handling agent with corresponding recommendations on how best to deal with the detected negative emotions of the caller (Ponomareff 2017). Customers in severe distress may be assigned to faster queues or to agents specifically trained to de-escalate the situation.

AI continues to advance into our everyday lives, more or less obviously and transparently. The societal implications of this advancement are yet not fully understood, and there is no way back (Tiwari 2023). Without doubt, unintended consequences will occur (Nyman et al., 2024) and people will be affected personally (Rainie et al., 2023), which can make many people afraid of AI. However, when corporations are able and willing to minimize the potential negative consequences of AI (Suchotzki and Gamer 2024), it offers a wealth of useful applications that can benefit society at large.

Analyzing the human voice offers unique opportunities because speech carries a lot more information than just the content of the spoken words. Contemporary NLP research is well advanced in detecting basic emotions, including joy, love, surprise, sadness, anger, and fear (Shaver et al., 1987). However, identifying complex emotions like loneliness, guilt, or nervousness still remains a challenge.

A specific area of practical and academic interest is detecting truthfulness, which is common in and key to understanding human interactions. Identifying whether someone is speaking the truth is complex and cannot be reduced to the simple question of whether or not the person is lying, which may be colored by different qualities, such as telling a white lie, not telling the whole truth, or twisting the truth. However, the common characteristic of all forms of deception is that the person aims to convince the audience of a statement that is not in accordance with what they believe to be true.

DePaulo et al. (2003) found initial evidence that the way a person speaks may be a cue to detecting deception, which Yang et al. (2023) findings support. We hypothesize that being dishonest triggers an emotional status that subconsciously affects the voice of the speaker, which AI artifacts can detect. We tested this hypothesis by developing and testing an AI-based artifact that can identify whether a person has deceptive intentions solely from analysis of natural speech.

A common challenge of NLP research is the limited availability of data sets for training purposes. To generate a data set of people’s emotions when giving false statements, we conducted a controlled experiment in a debate club setting and recorded the voices of 92 participants arguing for or against a randomly assigned polarizing topic. We used the dataset to test several AI models, the best of which correctly identified positives (argument represents true conviction) and negatives (argument does not represent true conviction) with 82.4% accuracy.

This artifact was developed to support a practical use case in a human resource management business process, namely determining whether applicants are telling the truth in a job interview/recruitment situation.

Information systems (IS) is an interdisciplinary research field that studies the design, development, use, and management of information technology (IT) and its impact on individuals, organizations, and society. This research contributes to theory in various ways. From a design science perspective, this research advances the automated detection of complex human emotions and explicates design principles to guide the development of related AI artifacts. Our research also advances the adjunct research field of computer science by providing novel insights into the capabilities of AI/NLP and by presenting an AI model that can detect deception in speech more accurately than comparable published models. In the field of psychology, our research contributes to the theoretical discussion of human emotions and especially their measurability, as well as the question of whether ‘lying’ is an emotion.

The artifact developed is not only a technical innovation, but also leads to a conceptual expansion of decision-support logic in a high-stakes environment that traditionally strongly depends on human intuition in decision making. It adds to our understanding of the working relationship between humans and advanced AI artifacts. By enabling humans to incorporate automated emotion detection into complex organizational decisions, this type of artifact paves the way for new types of hybrid organizational decision-making processes, extending our current conceptualization of these matters. Consequently, the emerging human–AI collaboration taxonomies need to be amended to reflect this new option.

The article is structured as recommended by Gregor and Hevner (2013) and follows the steps of the Design Science Research (DSR) process proposed by Peffers et al. (2007): Following this introduction, we identify and motivate the problem to be solved, provide some contextual background and report on related work. Thereafter, we derive objectives for the solution, based on the design requirements and design principles. Subsequently, we describe the experiment to gather the training data and present the design and development of the artifact. Finally, we demonstrate the applicability to practice and evaluate the artifact within the selected use case.

Following the reporting on the technical development, we elaborate on the implications for theory and practice, explicate the limitations of this work, provide avenues for further research, and close the paper with a conclusion.

Problem identification

This research was initially inspired by the case of George Santos, a former elected member of the US Congress, who knowingly provided false statements on his résumé (see, e.g., Kohli 2023). He apparently did not consider it wrong or illegal to provide incorrect information knowingly, defending his actions by saying, “most people stretch the truth on their résumé” (Boyle 2023). Closer investigation shows that this appears to be true:

• The reference checking company Checkster reported that 78% of the candidates who applied for a job admitted that they misrepresented themselves on their application (Checkster 2020).

• Resumelab stated that 93% of the respondents to their survey said they knew a person who lied on their résumé (Resumelab 2020).

• When studying online behavior, Weitzel et al. (2019) found that more than 20% of job seekers provide incorrect information on their social media profiles to attract recruitment companies and almost 40% include untrue specifics related to job advertisements on their online profile.

The recruitment manager of an IT company confirmed in an interview with a German national newspaper, “One has to assume that every applicant lies to put her-/himself in a better light. However, there are the irrelevant small cheats and the big lies bordering on fraud” (own translation from German) (Schenz and Zeitung 2018). The manager classified four types of lies to illustrate the spectrum of untruthfulness:

• Irrelevant cheats (e.g., made-up hobbies)

• Tactical lies (e.g., “business travel is no problem for me”)

• Justified lies if the question violates the statutory rights of the interviewee (e.g., “I do not intend to have children”)

• Fraud (e.g., forged certificates or providing incorrect marks/grades).

Whereas the smaller cheats are unimportant, the consequences of serious lies can be harsh for the employee and costly (in terms of additional expenses and lost time) for the company. As one of the participants of our focus group stated: “We are in a labor market that favors the applicant, you know, a war for talent. These days, every new employee costs me around 20,000 EUR for headhunters, recruitment camps, onboarding, etc. If we have to terminate employment due to serious lies in the résumé, this money is lost and we need to start the whole process all over again. This is not only a monetary loss, it also costs us months…” (Expert 4).

Recruitment in Germany is traditionally based on the written materials provided by the applicant and one or more personal interviews, sometimes even a professional assessment center (Careerbee 2025). It is uncommon to request or consult personal references, so the recruitment process depends heavily on applicants telling the truth.

What do German job seekers lie about? A recent study (Statista 2022) identified three primary areas:

• Exaggerating their responsibilities in the previous job

• Listing capabilities they do not have

• Lying about their tenure with a previous employer

Surveys show that the number of applicants providing incorrect information to potential employers rose sharply over the last years (Bolden-Barrett, 2017; Resume Templates, 2024). Anecdotal discussion with company representatives confirmed that this issue is in fact a great concern for many companies, specifically when hiring technical personnel. As the CEO of medium-sized software development company put it: “We have big problems, specifically when hiring graduates. The information they provide on their résumés often does not match their real life experiences” (personal communication). This raises the question whether it is possible to automatically detect whether a person is stating the truth before the costly part of the recruitment process, that is, in-person interviews, are conducted.

Research objectives and contextual background

Specification of research objectives

“Everybody lies” is the famous tagline of Dr. Gregory House in the American TV series House, M.D. The series main character refuses to talk to his patients because he does not trust their assertions. Instead, he solely relies on objective measures like laboratory test results. Although the series exaggerates this concept for dramatic purposes, there is a grain of truth in it. Research has found that not telling the truth or making statements that are different than one’s inner belief is frequent in human interactions (Serota et al., 2022). Unfortunately, people are notoriously bad at detecting false statements and are typically able to identify them only half of the time (Bond Jr. and DePaulo 2006; George and Robb 2021). In a large-scale experiment with human participants, Bond Jr. and DePaulo (2006) found that people with no training in detecting false statements were able to correctly identify deception with an average rate of only around 54%. Similar results are presented by Levitan et al. (2020), who developed a game in which participants without training in detecting false statements had to judge whether someone is telling the truth based solely on audio files. The participants detected the accuracy of the recordings 49% of the time, which corresponds to mere chance. More dramatically, in a recruitment context, Suen and Hung (2023) find that interviewers detect false statements less than 20% of the time. These concerning research findings were among the motivations to develop an AI artifact that is able to detect deceptive statements significantly better than people.

Dishonesty: Deception and lies

Deception is defined as the “deliberate attempt to mislead others” (DePaulo et al., 2003: p. 74). This deliberate act includes all forms of intentional dishonesty, but not unintended falsehoods communicated by people who are mistaken (human error), self-deceived (cognitive dissonance), or experience challenges like the Dunning-Kruger effect, according to which people unknowingly overexaggerate their own knowledge and capabilities (Kruger and Dunning 1999).

Research on deception has a long tradition in various areas of scientific research, including psychology, medical research, sociology and organizational sciences. This breadth may be explained by the fact that deception, although usually perceived as negative, is a very common social behavior that the average person performs on a daily basis (DePaulo et al., 2003) As Zuckerman et al. (1981, p. 2) put it: “Lying is a common aspect of interpersonal relationships.”

Research shows that there are various facets of deception, including “real lies, white lies, and gray lies” (Fox 2018). Common across all of the facets, however, is the fact that the speaker is aware of the wrongdoing, which leads to negative emotions like guilt and fear (Ekman 2004; Whissell 2023), that is, the “bad gut feeling” commonly felt when lying.

The role of emotions in deception detection

Not speaking the truth is a highly emotional issue for human beings. Emotions are complex constructs of human feelings (Fox 2018). In broad terms, one could say that “the term ‘emotion’ exemplifies [an] ‘umbrella’ concept that includes affective, cognitive, behavioral, expressive and physiological changes” (Tyng et al., 2017: p. 3). Thus, “an emotion is a complex state that combines feelings, thoughts, and behavior and is people’s psychophysiological reactions to internal or external stimuli” (Zhang et al., 2020: p. 104).

Shu et al. (2018) discuss several models of basic emotions and complex emotions containing mixed facets. These complex emotions remain mysterious and the available concepts remain highly theoretical. Ekman (2004) identifies lying as a complex emotional construct made up of several basic emotions, predominantly fear, guilt, and delight, while DePaulo et al. (2003) found evidence that guilt, shame, and fear are amongst the strongest basic emotions experienced by someone who is lying.

Emotions arise spontaneously rather than through conscious effort (Shu et al., 2018) and result in physiological reactions such as increased heart rate, sweating, and dilation or constriction of blood vessels (i.e., blushing and paling) (Soleymani et al., 2015; Zhang et al., 2020). Such physical reactions, including heartbeat, skin reactions, blood flow, muscle tensions, facial expressions, or changes in voice, can be measured using technical equipment (Shu et al., 2018). The polygraph, or lie detector, measures such bodily reactions (Grubin and Madsen 2005) that people exhibit when they are lying (Proverbio et al., 2013). However, the polygraph use is declining because the results are often inaccurate due to bias, provide incorrect measurements and other deficiencies, such as the need for trained personnel to administer the test and the difficulty of interpreting the results (Katz 1983).

Background on automated detection of emotions

Recognizing emotions with artificial intelligence

Wherever someone experiences measurable physiological reactions, an AI artifact can be developed to interpret those. Research has shown that emotions can be detected through visual and/or auditory modalities (Fox 2018; Pan et al., 2015) and by applying machine learning techniques to the data collected (Yang et al., 2023; Zhang et al., 2020). A popular approach to detecting emotions is by analyzing facial expressions (Adolphs 2002), specifically micro-expressions (Matsumoto and Hwang 2018). For example, Tivatansakul et al. (2014) designed a web-based system that detected six basic emotions based on facial expressions with 86% accuracy. Krishnamurthy et al. (2018) combined video, audio, verbal transcripts, and micro-expression features into a multimodal neural model and report 96% accuracy in detecting deception. They tested their model on the real-life deception detection (RLDD) database (Pérez-Rosas et al., 2015) a publicly available dataset of 121 video clips from courtroom trials. Using the same dataset, Mathur and Matarić (2020) applied unimodal support vector machines along with multimodal fusion methods to identify deceptions in real-life situations and achieved 91% accuracy. Bhamare et al. (2020) used a deep neural network on an undisclosed dataset of 323 interviews. By combining analysis of speech and micro-expressions, they report to detect deception with 98% accuracy. However, their results were highly dependent on the facial features: “When interviewees concealed their eyes with sunglasses, the system failed as was expected” Bhamare et al. (2020, p. 147). Karpova et al. (2020) experimented with 93 volunteers who purposefully lied and tried to hide this from a polytrophic test. Using video analysis and eye-tracking techniques, the authors trained an end-to-end convolutional neural network. Their best model achieved a mean balanced accuracy of 64% in discerning truth from deception. On a dataset with 180 videos from 88 politicians, Kamboj et al. (2021) detected political deception from audio, text and facial features by utilizing a decision tree with an accuracy of up to 69%. Electroencephalogram (EEG) signals also provide the currently best known indications of the emotional state of individuals (Proverbio et al., 2013). Zhang et al. (2020) reported that EEG electrodes applied to the frontal lobe of a subject revealed a classification accuracy of more than 90% across different emotional states. Karnati et al. (2022) built a deep neural network with audio, visual and EEG features on three multimodal datasets. The datasets are publicly available and were technically augmented to provide a larger dataset. Through the combination of three modalities, the authors claim up to 98% emotional state identification accuracy. However, using EEG headgear for signal measurement is inconvenient for the individual and it takes considerable time to correctly measure the multichannel signals, which makes it unsuitable for the common business environment.

Emotion detection based on speech analysis

If an intrusive approach like a polygraph or an EEG is not appropriate for the situation and no video recordings sufficiently high in quality to analyze micro-expressions are available, audio analysis is an appropriate alternative, as suggested by Scherer (1995). Early work in this area by Amir and Ron (1998) provided an automated classification algorithm to identify emotions in speech but their work was limited by the technological capabilities at that time. As technology improved, numerous innovative approaches to recognizing emotions based on audio speech analysis have emerged. Zhao et al. (2019), for example, were able to distinguish several basic emotions with a test accuracy of about 96% using speech analysis. This indicates that with increasing computational abilities, analysis of the human voice is a promising route to detect even the complex emotional status of a speaker.

Emotions in spoken language are recognizable based on acoustic features including amplitude, pitch, and formants (Demircan and Kahramanlı 2014; Slimi et al., 2020). A prevalent approach involves the extraction of spectral features, namely the mel-frequency cepstral coefficients (MFCCs) first introduced by Davis and Mermelstein (1980). MFCCs have been shown to be particularly promising and computationally efficient for recognizing patterns in speech (Dave 2013; Fraser et al., 2016; Gupta et al., 2018; Logan 2000). Additionally, they are commonly used in speech emotion recognition (SER) as the basis for analyzing an individual’s emotions (Kishore and Satish 2013; Lalitha et al., 2015; Pakyurek et al., 2020).

Hidden Markov models (HMM) have been used for speech emotion recognition for more than two decades. Utilizing this method, Schuller et al. (2003) identified seven emotional states with 78% accuracy. More recent approaches apply deep learning (DL) methods, including recurrent neural networks (RNNs) (Trigeorgis et al., 2016) and combinations of convolutional neural networks (CNNs) and RNNs (Soleymani et al., 2015; Zhao et al., 2019). Based on log-mel spectrograms, Zhao et al. (2019) identified seven emotional states with an average test accuracy of around 96%. Beyond spectrogram features like MFCCs, some algorithms include contextual features to increase accuracy (Fayek et al., 2017; Trigeorgis et al., 2016). Taking this approach, Fayek et al. (2017) distinguish five emotional states with 65% accuracy. While the SER systems focus on a single person at a time, Majumder et al. (2019) analyze the conversations of a set of individuals and use this data to classify emotions. Their approach can classify six emotions with an average accuracy of 64%.

Automated detection of deception from speech

Although speech-based emotion analyses using MFCCs have great potential to serve as lie detectors (Chamoli et al., 2017), published research on automated detection of deception remains scarce. Several publications present artifacts but do not share sufficient information regarding the data set they used or they use a dataset which did not correctly record the emotion of a lying person because the data collection experiment was set up incorrectly. Fernandes and Ullah (2021), for example, trained several models on various spectral and cepstral features from speech, including MFCC features, and reached an accuracy up to 100%. However, their dataset consists of only one person providing true and false statements at three different times of the day. Nasri et al. (2016) analyzed data of true and false statements expressed by 40 volunteers and reported between 85% and 88% by using 13 MFCC features with a support vector machine. However, the study fails to explain how the participants were encouraged to lie, which makes it difficult to assess the validity of the results.

There are, however, many examples of more robust and better documented studies. Mihalache and Burileanu (2022) utilized neural network architectures for deceptive speech detection. Their model achieved 64% accuracy when tested on the RLDD database. However, the audio recordings in this dataset come from only 56 unique speakers, limiting its diversity. Fu and Lei (2020) proposed a model tested on the Columbia-SRI-Colorado (CSC) corpus. The CSC corpus consists of approximately 7.5 hours of speech data from 32 unique speakers, including 54% deceptive and 46% true speech acts (Hirschberg et al., 2005). Their model achieved 68% accuracy. However, since the CSC corpus is a professional database, it may only partially represent the diversity of real-world speech patterns. Fan et al. (2015) developed a method for detecting deception in spoken Chinese, achieving 64% accuracy on a corpus containing around 9.4 hours of speech data from 30 unique native Chinese speakers, each providing a truthful and deceptive statement. Fu et al. (2023) developed a semi-supervised deception detection algorithm for speech that achieved 69% accuracy. The dataset utilized in the study contains 582 true and 521 deceptive speech samples from 40 speakers (28 males and 12 females). Bareeda et al. (2021) proposed a model that achieved 81% accuracy in detecting lies on the RLDD dataset. The methodology extracts MFCC features and a support vector machine classifier. More recently, Snehalakshmi et al. (2024) presented a platform for lie detection in interviews incorporating a stress measure from MFCC features, deception detection from facial features and a plagiarism checker. With their combined approach, they reached 72.2% accuracy. Because applicants are clearly visible in the videos, they state that “ethical concerns and technology-related issues need to be overcome” in order to apply their solution.

This review of published results shows that a robust algorithm that provides data privacy, uses solely natural speech (i.e., can be used in phone conversations) and delivers consistently high accuracy levels to detect false statements, has not yet been published in publicly available resources.

Objectives for the solution

The DSR method defines steps to ensure rigorously developed generalizable outcomes in the development of artifacts (Hevner et al., 2004). After the problem-centered initiation, that is, motivation of the problem and definition of the research objective, the next step is the objective-centered solution, that is, the definition of measurable objectives and functional and/or non-functional requirements towards the artifact.

Use case

In DSR, use cases are widely used to capture the objective-centered requirements towards an artifact to be developed (Engel et al., 2024). Use cases are a proven and robust tool to ensure all stakeholders involved in the development of the artifact share the same understanding, especially developers and users (Dutoit and Paech 2002). To anchor our research in practice, we chose the corporate recruiting process context. Recruiting is an area where innovative technical tools are often deployed to gain efficiency and raise decision quality (Fleiß et al., 2024) and where AI-enabled tools are increasingly used (Gao et al., 2025; Purohit and Banerjee 2025). The research team used HR textbooks and practitioner-oriented literature (e.g., Bogen and Rieke 2018) to develop the use case.

The company posts a job offering that individuals can apply for using an online tool on the corporate website. The applicants provided the required information in a structured format. The first computational stage in the process is an automated check for defined formalities like minimum GPA, university degree, and years of relevant work experience. Candidates passing this formal check are invited to take a standardized online test focusing on mathematical skills, abstract reasoning, and other skills customized to the respective position. These tests are evaluated automatically and successful candidates are invited to a next stage, which is new.

The applicant is asked to have a phone or video call with a company representative. In this conversation—without having time to prepare, which ensures free speech and preempts pre-formulated statements—the applicant is first asked to present the highlights of her/his résumé, to get the applicant talking and counter initial nervousness. After a few minutes, the recruiter then asks the applicant targeted and detailed questions that directly address an area of specific importance for the position, such as how they handled severe team conflicts in the past or how they solved a specific complex technical issue.

During this second phase, the algorithm analyzes the voice of the presenter and informs the company representative when it detects potential deception using a green/red gauge on the screen. Based on this information, the representative can enquire about questionable statements in more detail. Being alerted to potentially deceptive statements enables the representative to garner better information about areas of important for the position and advance people further along the application process more selectively.

Focus group workshop

To derive design requirements and formulate the objectives of the artifact, we conducted a focus group workshop with HR experts, using the initial use case presented above as starting input. To limit the complexity, we chose a homogeneous group of companies and invited experienced managers from six medium-sized enterprises (between 150 and 5,000 employees) in the manufacturing and software production industry. All participants are currently responsible for recruiting new personnel and have at least 5 years of experience in interviewing applicants (Table 1).

Table 1.

Description of workshop participants (experts).

Code	Role	Type of business	Recruiting mainly
Expert 1	CEO	Software provider	Software developers
Expert 2	CTO	Software provider	Software developers
Expert 3	Lawyer (labor law)	Manufacturing	All roles
Expert 4	Head of HR	Manufacturing	Managers
Expert 5	Recruitment specialist	Software provider	Managers
Expert 6	HR team leader	Manufacturing	Software developers

Participants agreed unanimously that such an artifact would enhance their decision quality and save costs by not inviting applicants to physical interviews who lacked key capabilities. The panel confirmed the use case and deemed it feasible, but considered it unsuitable for recruiting senior managers due to the delicate hiring process for top management personnel. Thus, the use case focuses on the recruitment process for IT personnel in junior positions, where the need for personnel and the costs associated with failed hires are high. In addition, extant research shows that people applying for technical positions are specifically prone to make deceptive statements on their job applications (Weiss and Feldman 2006).

All participating companies already use online recruitment instruments, mainly automated checking mechanisms such as filters and minimum criteria hurdles, and automated standardized tests, usually of math, abstract reasoning, and analytical skills. Applicants who pass the automated filters and tests are invited for physical interviews. Our workshop participants no longer use multi-day assessment centers.

Solution requirements

In a moderated discussion, the experts identified their requirements for an artifact that could support their recruitment work. To ensure a broad spectrum of input, the project team provided insights from literature on deceptive behavior in job interviews (e.g., Bill et al., 2020; Bill et al., 2024; Melchers et al., 2020; Renier et al., 2021) and the use of AI tools to detect deception (e.g., Constâncio et al., 2023; Li et al., 2021; Van Den Broek et al., 2021; Vrij et al., 2022; Vrij et al., 2023).

The requirements identified during the discussion were collected and clustered into groups:

• Use environment: The artifact needs to work with voice recordings of normal quality, such as phone calls or video conferences from home, and without specific recording equipment. No additional technical barrier for the interviewee.

• Recognition accuracy: The artifact must detect deception with significantly higher accuracy than the experts.

• Data privacy: The artifact may not store biometrical data and must conform with EU data privacy regulations (EU-GDPR).

• Legal regulations: The artifact must comply with sensitive HR processes and the applicable legal regulations.

• Ethics: The artifact needs to fulfill all ethical considerations, as discussed below.

• Biases: The artifact must operate free of gender, racial, and other biases. Applicants must always be treated fairly and objectively.

• Decision making: The artifact can support the recruitment expert’s decision making by indicating potential deception and providing an analytical assessment, but may not make decisions. The human recruitment expert always makes the decisions.

Ethical considerations and data privacy

The workshop participants discussed ethical considerations and data privacy concerns extensively. Every company wants to respect the explicit and implicit data privacy rights of applicants, especially in Germany (Prince and Wallsten 2022). Consequently, the design principles of the proposed artifact comply fully with the European General Data Protection Regulation (GDPR) (EU 2016) and the EU AI Act (EU 2024).

Beyond legal and regulatory compliance, the research team discussed ethical considerations extensively with academic experts on ethics. There was unanimous agreement that deploying the proposed artifact is ethically permissible if (a) the applicant is informed ahead of time their statements will be checked for validity and that statistical measures/AI algorithms will be used in this process, and (b) the applicant has the option to opt out without negative consequences.

The artifact is designed to prevent deception, which is unethical behavior. Therefore, a fraudulent person has no legitimate right not to be detected. After all, companies have been using various mechanisms to avoid deception (see, e.g., Vrij 2008), including fraudulent detection in job interviews (Ferran and Storck 1997; Mahbub and Pardede 2018) for decades.

With regard to data privacy, the artifact (a) does not store additional personal data and (b) has no semantic capabilities, that is, the algorithm can and does not process the meaning of the words the applicant speaks. After the voice stream has been analyzed, the artifact returns a single value, an assessment of the truthfulness of the voice stream. The voice stream does not need to be saved and can be discarded immediately after analysis. All further information is already stored by the company to process the application and as such subject to the data privacy agreement between the involved parties, typically acknowledged at the beginning of the application process. If the company wished to store the voice patterns for post-analysis use, it would need to explicate that in the data privacy agreement.

Statement of solution objectives

In summary, the artifact needs to fulfill the following objectives which will be the basis for its evaluation (see section 10): (1) No special equipment, (2) Data privacy, (3) Legal and ethical compliance, (4) Free of bias, (5) Decision support only, (6) Superior detection accuracy, (7) Superior performance towards available solutions. Please note that objective 7 was not brought up in the workshop but derives from the DSR method.

Data collection

Publicly available German language datasets are scarce (Xu et al., 2020), and a dataset focusing on deception is not available. Therefore, it was necessary to generate a new dataset to train the algorithm. Due to the sensitive nature of the recruitment use case, it is not possible to gather training data directly in the setting of a recruitment interview. Instead, in a controlled experiment, we collected audio data of verbal representations of opinions from native German speakers arguing for or against their inner conviction, which they identified only after the experiment.

Data collection process

To generate these very specific voice recordings we chose the setting of a debate club. We distributed flyers in local universities, social centers and on various social media channels inviting volunteers to participate in our study, which we referred to simply as “an experiment,” offering each participant a 10€ Amazon voucher as compensation.

The experiment followed typical debate club procedures: The participants were randomly grouped into pairs. Each pair was randomly assigned one of four topics (see 6.3 below), and whether the individual should argue pro or contra. We did not provide the participants with any information about the study’s objective beforehand to avoid possible bias. Notably, none of the participants asked about the underlying intention of the experiment.

The participants had 30 minutes to prepare their arguments and were instructed not to use the internet in their preparation. Participants were permitted to write down bullet point notes but not fully formulated statements. Each participant was then asked to argue their assigned position (pro or contra) on their topic in approximately 2 to 3 minutes of free speech.

After the debate, the research team provided each participant with feedback on their rhetoric performance/argumentation to help them improve their soft skills. Finally, participants were asked to provide suggestions on improving the operational handling of the experiment. No further interaction with the research team occurred to reduce social desirability response bias. Immediately after the feedback discussion, participants received an automated email asking them to complete a brief online survey of basic demographic data and whether they had argued for or against their true conviction about the debate topic. All participants completed the survey fully and within 1 hour of receiving it.

We collected this data in Germany in 2022 and 2023, recording the statements either personally or via videoconference. To ensure data privacy, only the audio was recorded and used. All participants were informed about the data privacy precautions (EU-GDPR compliant) and signed a consent form authorizing the use of the audio recording for research purposes before the experiment started.

Demographics of the participants

We conducted 46 debates, thus collecting 92 audio recordings. Due to audio problems, such as breaks-ups, transmission delays and interference due to bad internet connection, 12 recordings had to be removed, resulting in 80 individual audio files.

The participants included 41 employed non-students, 31 undergraduate and graduate students, and 8 high school students. All debates were conducted in German. 35 were women and 45 were men, and their ages ranged from 16 to 60 years (Ø = 27.15 years; median = 26 years). See Appendix for additional demographic details.

Topics

We deliberately chose highly polarizing topics for the discussions:

• Topic 1: Should the death penalty with public executions be reintroduced in Germany?

• Topic 2: Should cost-covering tuition fees be charged by universities in Germany? (Note: this means around 15,000 EUR per semester)¹

• Topic 3: Should the use of hard drugs such as heroin and crystal meth be legalized in Germany?

• Topic 4: Should restaurant chains serving unhealthy fast food, such McDonald’s or Burger King, be banned in Germany?

These polarizing topics were selected to increase the likelihood that participants would have a clear and unambiguous opinion. According to participants’ responses to the email questionnaire, the speakers’ true convictions were aligned with the research team’s expectations: none of the participants stated a true conviction in favor of reintroducing the death penalty, introducing cost-covering tuition fees, legalizing hard drugs or banning restaurant chains that serve unhealthy fast food. However, social desirability response bias may have influenced these stated convictions.

The distribution of the topics and the respective pro and contra positions the participants were assigned is provided in Table 2.

Table 2.

Allocation of topics and assigned pro/contra positions.

	Topic 1	Topic 2	Topic 3	Topic 4	Sum
	27	13	24	16	80
Pro	15	8	16	6	45
Contra	12	5	8	10	35

Table 3 shows for each topic how many participants argued for or against their conviction.

Table 3.

Inner conviction of the participants per topic.

	Topic 1	Topic 2	Topic 3	Topic 4	Sum
Argument reflected the speaker’s conviction	13	6	12	9	40
Argument did not reflect the speaker’s conviction	14	7	12	7	40

Development of the AI model

Feature descriptions

Based on the literature on emotion detection and automatic deception detection from speech, we pre-selected audio features that were used successfully in published studies. To train the models to classify true conviction, first the mel-frequency cepstral coefficients (MFCCs) (Davis and Mermelstein 1980) were extracted from the audio recordings. The MFCCs were derived from a time-frequency representation of the audio signal with the frequencies from 500 to 6000 Hz on a mel scale. Using MFCCs has two advantages: (1) it provides computational efficiency for recognizing patterns in speech (Dave 2013; Fraser et al., 2016; Gupta et al., 2018; Logan 2000), and (2) it generates strong results for audio files of inferior quality (Tivatansakul et al., 2014). In the next step, additional audio features were extracted, including the zero-crossing rate (ZCR), which is used to differentiate between voiced and unvoiced speech segments. ZCR reflects the number of times the signal crosses the zero axis in a given time window (Shete et al., 2014). In speech signals, voiced segments are characterized by periodic waveforms produced by the vibration of vocal cords. These waveforms result in a relatively low ZCR. In contrast, unvoiced segments, for example, those produced by airflow noise, are characterized by random and non-periodic waveforms, which result in a comparatively high ZCR (Bachu et al., 2010). Next, the spectral roll-off (McFee et al., 2015) was extracted from the audio files, which provides information about the spectral energy distribution in speech. It is used to characterize the timbre of speech tones, with brighter tones yielding higher values (Giannakopoulos et al., 2006). A further extracted feature includes statistics of the initial peaks of the sounds done during the recorded speeches. This way, variations in speech pauses can be detected. As basis-emotion recognition systems also consider pitch features, we furthermore extracted chromagrams corresponding to the twelve tone chromatic pitch classes (McFee et al., 2015). Finally, the fundamental frequency, which represents the lowest frequency in a periodic waveform, was assessed. It is a key characteristic of the speaker’s voice and can provide important information about the speaker’s emotional state (Hirst and De Looze 2021).

AI model training

The training procedure follows the approach of a support vector machine (SVM) (Hearst et al., 1998). An SVM classifier puts a separating hyperplane into a space so data points of one class are on one side of the hyperplane and the data points of the other class on the other side. SVM was chosen for three reasons: (1) The available dataset is relatively small and SVMs with its maximum-margin hyperplanes are better suited to achieve good results here; (2) SVMs are robust against noise and outliers, which frequently occur in audio data; and (3) SVMs can handle nonlinearly separable data, which is often the case in audio processing (Boateng and Kowatsch 2020). Additionally, SVM models are relatively easy to interpret and understand, which makes it easier to understand why a particular decision was made and identify improvement areas.

The following hyperparameters were applied to the training of the SVM. The radial basis function (RBF) (Hearst et al., 1998) kernel was used to implement the SVM for binary audio classification. The RBF kernel is a popular choice for nonlinear classification problems because it can map input data into high-dimensional feature spaces where linear classifiers can effectively separate the data, which allows the SVM to capture complex relationships between input variables and output class labels. The RBF kernel uses a radial basis function to measure the similarity between input data, and the kernel width hyperparameter, gamma, controls how far the influences of training examples reach. When training the SVM, the regularization hyperparameter C is essential in controlling the tradeoff between overfitting and underfitting. A higher value of C leads to a narrow margin, which may result in overfitting, and vice versa. As an overfitted model would memorize the training data and, thus, perform poorly on the test data, we want to avoid it. Hence, different values for the hyperparameters were tested to determine the optimal performance for the binary audio classification task.

Results

This section presents the average performance of the trained models. All performance metrics were derived through k-fold cross validation (Stone 1974) with specifically five folds and 25 repeats. For every repeat, the dataset was randomized and equally divided into five folds. According to the general procedure, one fold was kept for the test and the other four for the training of the model. Then, the folds were rotated until every fold has been once used for the test. This was then repeated 25 times. As an SVM does not do automatic feature selection, an appropriate choice of features is needed. Hence, the models were initially tested using 40 MFCC features, and in a second step with a reduced set of only 13 MFCC features. After careful analysis of the initial set of results, the MFCC features were further reduced to three features which generated the best results. This reduced set of MFCCs was then enriched by complementing values from ZCR, spectral roll-off, chromagrams, variations in speech pauses, and fundamental frequency to further improve the performance.

Thus, the results regarding all SVM architectures differing in C and gamma hyperparameters are presented. This emphasizes and highlights the final model’s importance as part of a dynamic process of model comparison. For every feature set combination, multiple SVM hyperparameter combinations were evaluated, which resulted in more than 100 models each. For ease of reference, only the five best performing models for each feature set are presented.

Performance metrics

The following performance metrics are applied to measure each model’s efficiency and reliability:

(1) True Positive (TP): How often the model correctly predicted that a speaker represented the position based on the debate stance.

(2) False Positive (FP): How often the model incorrectly predicted that a speaker represented the position, when the position was actually not represented.

(3) True Negative (TN): How often the model correctly predicted that a speaker did not represent the position based on the debate stance.

(4) False Negative (FN): How often the model incorrectly predicted that a speaker did not represent the position, when the position was actually represented.

These metrics alone are not sufficient to fully validate the results. For example, a high TP rate can be achieved when there is an extreme imbalance in the ratio between recordings of people whose convictions were not represented and speech data of people whose convictions were represented. To avoid this issue, a balanced ratio of input data is required, as well as a weighted consideration of the TP and TN rates (Aldwairi et al., 2018).

To compare the models’ performance, we applied the following metrics (Saxe and Hillary, 2018). Accuracy reflects the correctly classified proportion of a model composed of all data points. It is calculated by dividing the count of TP and TN by the summed count of TP, TN, FP, and FN. Precision reflects the share of TP-classified samples in the total number of samples predicted as positive (TP + FP). Sensitivity, also referred to as true-positive rate, demonstrates a model’s predictive power in terms of the proportion of correctly identified TPs to the total number of actual-positive observations (TP + FN). The F₁-score complements the accuracy measures because it balances unequal class distributions within a training dataset by calculating the harmonic mean between precision and sensitivity.

Model set 1: 40 MFCC features

All models were trained on 40 MFCC features with varying gamma and C hyperparameters, see Table 4. The best model trained, M1-40-E, achieved 61.9% accuracy across all repeated five folds. At 57.9% sensitivity, the model only predicts about 58 of 100 out of the speech events, where the speakers represent their positions, correctly as positive (TP) and 42 of 100 speeches incorrectly as negative (FN).

Table 4.

Models using 40 MFCC features.

Model	Gamma	C	Ø Accuracy	Ø Sensitivity	Ø Precision	Ø F₁-Score
M1-40-E	0.0075	100	61.9%	57.9%	59.5%	58.7%
M2-40-E	0.01	100	61.7%	57.3%	59.6%	58.5%
M3-40-E	0.01	10	61.4%	55.0%	60.2%	57.4%
M4-40-E	0.00025	100	70.0%	54.1%	60.0%	56.9%
M5-40-E	0.005	100	60.9%	57.8%	58.3%	58.1%

The non-impressive results show that utilizing many MFCC features does not produce satisfactory predictive performance.

Model set 2: 13 MFCC features

To improve performance, we assessed the number and composition of the MFCC features utilized. We started by including only the first 13 MFCC features in the training of the models, as this is a well-established method in speech pattern detection (Park et al., 2014). As shown in Table 5, the best performing model (M1-13-E) achieves an overall accuracy of 62.3%, performing a little better than M1-40-E.

Table 5.

Models using 13 MFCC features.

Model	Gamma	C	Ø Accuracy	Ø Sensitivity	Ø Precision	Ø F₁-Score
M1-13-E	0.0075	100	62.3%	53.7%	61.6%	57.3%
M2-13-E	0.01	100	61.9%	52.8%	61.8%	56.9%
M3-13-E	0.005	100	61.6%	49.5%	61.6%	54.9%
M4-13-E	0.0005	100	60.8%	46.7%	61.3%	53.0%
M5-13-E	0.1	3	60.8%	48.5%	60.6%	53.9%

Model set 3: Three MFCC features + acoustic features

To improve the performance of the models, numerous tests with different combinations of MFCC features and applied additional acoustic features like, for example, ZCR, spectral roll-off, or chromagrams were conducted. This led to significantly better results. The best accuracy was achieved by combining a reduced set of only three MFCC features with nine additional acoustic features. The best model trained, M1-6-C, achieves 82.4% accuracy (see Table 6). Moreover, 79.7% out of the true positives (conviction of a speaker is represented) will be correctly predicted as positive and 20.3% falsely as negative (conviction of a speaker is not represented) according to its sensitivity. Furthermore, when the model is predicting positive, then 84.0% (see precision) are actual positive and 16% negative. The complete list of features utilized by the best model is provided in the appendix.

Table 6.

Models using three MFCC and nine additional acoustic features.

Model	Gamma	C	Ø Accuracy	Ø Sensitivity	Ø Precision	Ø F1-Score
M1-6-C	0.0075	100	82.4%	79.7%	84.0%	81.8%
M2-6-C	0.0025	1000	82.2%	78.4%	84.1%	81.2%
M3-6-C	0.01	100	81.8%	78.6%	83.1%	80.8%
M4-6-C	0.005	100	81.7%	78.9%	82.9%	80.8%
M5-6-C	0.005	1000	81.0%	77.5%	82.2%	79.8%

Bias of the artifact

The AI artifact was trained using data generated from volunteers who participated in a debate club experiment. Although the demographics of the participants (see Section 6.2) indicate a heterogeneous mix, testing the artifact for potential bias poses an important task in developing AI models (Renier et al., 2021). The following tests were conducted to test for bias in the training data set:

Age and gender

To test the algorithm for bias towards age and gender, subsets of data for control groups with participants of different age and gender were constructed and tested. The relationships of the labels predicted by the algorithm and the given information for age/gender were tested using the chi-square test (David Freedman 2007). No significant relation was found. The test for bias in this context is particularly important as a recent study indicates that many recognition methods are prone to assign a lie to a female person (Mambreyan et al., 2022).

Nervosity

To rule out that the artifact is measuring the speaker’s nervosity instead of deception, an experiment was conducted to collect and analyze speech data in moments when the participants were highly nervous. The level of nervosity did not significantly affect the level of accuracy of the detection of deception (Jechle et al., 2026).

Professionals

When presenting the artifact in meetings with companies, a limited number of test recordings from practitioners were collected. These professionals were people working for several years in their respective roles and thus not directly comparable to the general population. It turned out that the detection accuracy is even slightly higher than in the general population groups. However, the number of participants is too small to make a solid statement, so this remains anecdotal evidence.

In the light of these tests, it can be concluded that the model indeed measures whether a speaker speaks in accordance with her/his inner conviction and no bias of age, gender or occupation was detected.

Demonstration of the artifact

Perceived usefulness of the artifact

To demonstrate the usefulness of the developed artifact, the HR professionals who provided feedback in the initial workshop were consulted again. The capabilities of the artifact were demonstrated and potential practical applications and hindering issues were discussed. In summary, all experts agreed that the artifact is useful. The more applications a company receives, the more likely they are to apply AI algorithms to help with the recruitment process.

Although the feedback was very positive, some issues were also mentioned.

• For more technical positions, like, for example, software developers, an experts said they would need to have specifically designed very deep/detailed test cases to see “whether the person really has the capabilities they claim to have” (Expert 6).

• Those companies who require their employees to have specific certifications (like SCRUM Master, PRINCE2, etc.) said it would be good have these certificates automatically checked to reduce the manual effort. The rationale is that people do not submit forged certifications if they know an AI algorithm will evaluate them.

However, although both points are relevant in a recruiting context, they are not within the scope of the proposed artifact.

One of the practitioners consulted, a lawyer specialized in labor law (Expert 3), also commented on the legal implications. He stated that he sees no legal roadblocks for deploying such an AI artifact if the applicant informed about the deployment of mathematical/statistical analysis of their voice transparently and ahead of time, and a clause is added to the data privacy agreement that is part of every interaction with applicants. He expressed hope that widespread deployment of such algorithms could reduce the number of applications containing false statements. He also suggested that adapting the recruitment process in this way may need the approval of the works council.

Design principles

The aim of the DSR method is to ensure that the findings will not only apply to one specific situation but also have a certain level of generalizability. Developing artifacts that benefit IS researchers and generate value for practitioners—one of the core missions of IS research (Iivari 2020)—requires design principles to be identified, developed, and communicated (Gregor and Hevner 2013). Design principles are “prescriptive statements that indicate how to do something to achieve a goal” (Gregor et al., 2020) and collectively advance the body of knowledge of our discipline by serving as guiding principles for future developments (Nguyen et al., 2021).

The design principles presented in the following are based on the solution requirements elicited in the focus group workshop (see Section 5.2). These were challenged by the project team during the development phase of the artifact and discussed and finalized with the experts during the demonstration of the artifact.

The AI artifact itself is a neutral technology which cannot abide by ethical and legal guidelines and regulations. The responsibility to use the artifact in a legal and ethical manner remains with the user. However, the artifact is designed to ensure data privacy in the sense that it has a minimal data design (i.e., only requests the data it really needs) and does not need to store collected data. This makes it technically possible to operate within the limits of the EU-GDPR. Still, the responsibility to obtain consent before the interview remains with the user—which is already current standard practice in the corporate environments. From an ethical perspective, the artifact is not designed to make a decision, but rather only to provide an assessment. The actual decision remains with the human being. This is also underscored by clearly communicating the ∼20% margin of error in assessments. Here again, it remains the responsibility of the user to inform the interviewee transparently and ahead of time and offer the option to opt out without consequences. Finally, the requirement to perform better than a human being was not listed as a design principle because it is an innately indispensable functional requirement, a conditio sine qua non, that is, if the AI artifact does not perform better than HR professionals, it will not be used anyway.

The following design principles may serve as generalized guidelines for the future development of speech-enabled AI artifacts in sensitive business processes (Table 7):

Table 7.

Design principles.

Design Principle	Description of the Design Principle
Limited audio quality	The artifact shall not require special technical equipment; it shall be able to analyze data recorded on standard phone calls or standard video conference equipment for home/private users.
Data privacy	To ensure compliance with common data privacy regulations, the artifact shall adopt a privacy-by-design approach. In other words, it shall not store biometrical information that identifies an individual and shall not request more data than absolutely necessary to conduct the assessment. Whenever data is needed, it shall not be stored in a non-anonymous way after the assessment is finished.
Legal and ethical compliance	The artifact shall not violate legal or ethical regulations. Potential areas of conflict (e.g., decision-support vs. decision-making) shall be explicated by the developers.
Free of bias	The artifact shall be tested for inherent bias before deployment and regularly during its deployment to avoid bias towards sub-groups of interviewees.

Summary of demonstration

In conclusion, the HR professionals were all interested in the technology and see potential to use it in their own recruitment process to help the recruiters to sort out deceptive applications early in the process. The experts were satisfied with the accuracy of the artifact and did not identify any major hindering barriers. The feedback received from the demonstrations was used to formulate generalizable design principles.

Consequently, the demonstration phase was concluded successfully by confirming the usefulness of the artifact to solve the stated solution objective in the selected use case.

Evaluation of the artifact

The next step following its demonstration is the evaluation of the artifact (Peffers et al., 2007). As discussed by Gregor and Hevner (2013), not every artifact can be evaluated in its envisioned setting. In those cases, a proof-of-concept is deemed appropriate. In our case, the artifact was tested on a population whose demographics could resemble the target audience in the eventual use case. However, the context of the voice recording was different. In line with Markowitz (2024), it is assumed that lying is a context-independent emotion (i.e., the emotion a human being experiences when lying is the same regardless of the situation). It can consequently be assumed that the artifact detects deception independent of the context. This hypothesis was supported in tests with practitioners during the presentation of the artifact and in recruitment interviews for positions at our research institute. The artifact consistently detected false statements with >80% accuracy. Although the number of real recruitment interview cases is currently too low to present solid statistical evidence, it supports the proof-of-concept that the artifact is suited for its intended purpose.

Formally, an artifact needs to be evaluated in terms of achieving its development objectives (Hevner et al., 2004) which were stated in Section 5.5. All seven stated objectives were met by the artifact:

No special equipment: The artifact works with data collected on a common phone line or video conference session. No special equipment like special microphones is necessary.

Data privacy: The artifact does not collect data that is not already collected as part of the recruitment process. This basic demographic data (gender, age, occupation) is used to rule out bias in the analysis. The voice recording is deleted after the analysis.

Legal and ethical compliance: Compliance with further legal and ethical requirements cannot be ensured by the artifact this is responsibility of the company that uses the artifact. The design of the solution ensured that no legal or ethical requirements are violated by the artifact itself.

Free of bias: All tests for bias conducted on the artifact were negative (see Section 8.5).

Decision support only: The artifact provides a numerical value which reflects its assessment of the truthfulness of the recording. This is only a data point but not a recruitment decision.

Superior Detection Accuracy: On average, people detect deception with around 50% accuracy. The artifact detects deception with >80% accuracy.

Superior Performance: The performance of the artifact needs to be evaluated against published models from studies of comparable scope and methodological approach. The comparable published studies identified are described in Section 4.3 and listed in Table 8 below.

Table 8.

Comparison to published research.

Study	Accuracy (rounded)	Data set	Number of unique speakers
Fan et al. (2015)	64%	Self-developed (Fan et al., 2015)	30
Mihalache and Burileanu (2022)	64%	Real-life deception detection (RLDD) (Pérez-Rosas et al., 2015)	56
Fu and Lei (2020)	68%	CSC deceptive speech corpus (Hirschberg et al., 2005)	32
Fu et al. (2023)	69%	Self-developed “H-Wolf corpus” (Fu et al., 2023)	40
Bareeda et al. (2021)	81%	Real-life deception detection (RLDD) (Pérez-Rosas et al., 2015)	56
This study	82%	Self-developed (see Section 6)	80

The artifact presented in this study is compared to other artifacts in terms of detection accuracy and breadth of the dataset. Only published studies of comparable scope (detection of deception) and methodological approach (analysis of human voice) were selected for comparison.

As can be seen from the table, the presented artifact provides the highest accuracy of all models. The work of Bareeda et al. (2021) provides a very close accuracy, however, the reliance on a single feature type (MFCC) limits its robustness. The artifact presented here not only utilizes a larger and more heterogeneous dataset, it also complements the analysis of the spectral characteristics (MFCCs) with acoustic features, which provides are more robust algorithm.

The combination of spectral and acoustic features, a diverse dataset, and the application of 25 repeated five-fold cross validation, enhances the model’s robustness and applicability to real-world scenarios. The achieved accuracy slightly surpasses the best performing model from reference studies.

In summary, the evaluation of the artifact is positive. It is achieving its stated objectives, applicability to the presented use case, and is technically superior towards the comparable published artifacts.

Contribution to theory and practice

Our findings contribute to theory from two main perspectives: (1) for design science researchers, our work presents a novel algorithm to the automated detection of complex human emotions and explicates design principles to guide the development of corresponding AI artifacts; (2) for behavioral scientists, the presented artifact advances the conceptualization of AI-supported organizational decision-making, especially in high stakes areas that were previously exclusively in the human decision-making sphere. This consequently adds to our understanding of the dynamically evolving working relationship between humans and AI artifacts and to an extension of the taxonomy of human–AI hybrids presented by Fabri et al. (2023).

Contribution to design science research

AI-enabled information systems to detect complex emotions

Innovative AI artifacts are becoming ever more integral to the business/IT-landscape in contemporary businesses. As interactions between AI agents and people continues to increase in breadth and depth (Bagozzi et al., 2022; Huang and Rust 2024), it becomes ever more important for the AI agent to emotionally understand the human agent. The automated detection of individuals’ complex emotions can enable researchers to deeper understand human behavior, specifically in emotionally stressful situations. This tool would help researchers to automatically detect individuals’ personality traits (Yang et al., 2022), understand the implications of using social media on personal well-being (Krasnova et al., 2015), or analyze rational and irrational behavior in algorithmic decision acceptance (Burton et al., 2020).

As the human–computer interface evolves rapidly from keyboard to voice (see, e.g., Phillips (2021)) our research opens up useful application to study numerous aspects of human behavior. One could also see various applications into the field of intelligence augmentation, the combination of artificial and human intelligence, as proposed by Jain et al. (2021). Another area closely connected to our research is honesty in the digital age (Cohn et al., 2022), the phenomenon that humans behave differently in digital environments than in the real world. Cohn et al. (2022) find that individuals cheat about three times more when they interact with a machine than when they interact with a person. For example, Ananthakrishnan et al. (2020) shed light on the issue of fraudulent reviews and Moravec et al. (2022) discuss the phenomenon of fake news. In a nutshell, individuals tend to be less truthful if no other person is present.

There is a long history of using technical support to analyze human behavior and, specifically, deceptive actions in IS Research. Almost 50 years ago, Jenkins and Johnson (1977) published their research into the relevance of body language to information analysts, Weinmann et al. (2022) later analyzed computer mouse movements to predict fraudulent insurance claims, and Yang et al. (2023) analyze multiple aspects of the human voice to assess the trustworthiness of managers on conference calls.

Although our work is focused on deceptive intentions, it shows that this class of algorithms is able to detect a variety of emotional states like stress, uncertainty, and fear. Thus, our findings support the research strand on automated detection of complex human emotions with high accuracy from speech. We presented evidence that the defining voice features essential to the speaker’s convictions are contained within the verbal statements’ spectral representation and that an SVM architecture previously used in basic emotion detection and speech recognition can also be applied to derive the complex emotions a lying person experiences. Thus, our findings provide researchers with a solid base to research the detection of other complex emotions using an SVM with a distinct set of MFCCs. Furthermore, as the model works without semantic understanding or conversion into text, it enables researchers working in jurisdictions with strict data privacy laws to extend their research and could pave the way for research in sensitive areas such as hospitals or psychiatric institutions.

Design principles to guide the development of artifacts to detect complex emotions

Designing an innovative IT-artifact to be deployed in a highly sensitive corporate business process poses multiple unique challenges. In the development process of the artifact presented, four design principles were derived, tested, and confirmed following a rigorous DSR methodology.

Design principles provide prescriptive knowledge that bridges the gap between theoretical considerations and practical knowledge in artifact design. They encapsulate what should be built and how, grounded in theoretical and/or empirical insights, and enable the systematic creation and evaluation of digital solutions (Gregor et al., 2020). Consequently, design principles act as prescriptive statements, offering actionable guidance on developing artifacts to achieve their previously specified objectives (Hevner et al., 2004). The explication of these principles contributes to theory by formalizing design knowledge from tacit experiential learning into structural components which support the efforts developers of non-identity-based AI-enabled artifacts which detect complex emotions from voice-based interactions. By outlining mechanisms and rationales, they translate abstract theories and unstructured insights into reproducible procedures for system design and implementation. Consequently, the stated design principles generalize lessons learned from the specific instance of this research project into generalized reusable constructs, thus supporting theoretical generalization for IS design theory.

Contribution to behavioral research

This novel AI-enabled artifact enhances organizational decision making by enabling experts and managers to overcome a typical human deficiency: our weak ability to detect deception. The following arguments draw upon extant research examining the emergent, multifaceted collaboration of human actors and AI systems (Baptista et al., 2020; Benbya et al., 2020b; Rai et al., 2019). The aim is to understand how novel AI tools can potentially affect organizational decision making in high-stakes contexts. In this vein, we seek to advance a more nuanced understanding of how AI-enabled technologies are being used to reconfigure organizational work practices and the implications for actors whose roles and activities are thereby transformed.

Effects of novel AI-enabled artifacts on organizational decision making

The strong growth of AI-enabled tools to support organizational decision making changes the dynamics and correspondingly challenges our understanding of the theoretical foundations of these processes. Contemporary research has a strong focus on identifying implications and providing corresponding frameworks to structure the new landscape of decision support systems (see, e.g., Fabri et al., 2023; Jöhnk et al., 2020; Pathirannehelage et al., 2024).

In organizational decision making, managers traditionally rely on both intuitive gut feelings (Hayashi 2001) and analytical practices when processing information (Dane et al., 2012). Advanced AI-enabled tools, such as the one presented here, combine both of these formerly segregated practices. The AI can analytically process much more information than a human being and now can additionally replace the intuitive gut feeling by an objectivized measurement of complex emotions. Whereas humans are notoriously bad at detecting deception, the AI-enabled artifact is able to unveil hidden information and implicit cues, thus enabling decision making beyond the traditional reliance on explicit statements. Consequently, innovative AI tools now allow decision makers to access this layer of implicit information embedded in the way something is said, by assessing paralinguistic cues.

In human interactions there is always a nonverbal layer involved in communication. Oral communication is richer and more complex than just the words spoken, and the subtle cues within natural language carries significant weight regarding a speaker’s confidence, sincerity, and even deceptive intentions. Using AI tools to analyze complex emotions allow decision makers to interpret this information more accurately, rather than simply taking the words at their face value or relying on their gut feelings. The ability to assess nonverbal cues will change organizational decision making and theories regarding decision support systems significantly, as foreseen by Grover et al. (2018) and others.

The trend towards the formation of hybrids, which combine human and AI capabilities (Fabri et al., 2023) can be seen in multiple organizational areas, like in hybrid software engineering (Abrahão et al., 2025), or the formation of hybrid team structures (Bayrak et al., 2021). In decision making, the boundary between analytical and intuitive practices has started to vanish. In individual decision making, hybrid approaches have gained traction quickly, for example, in the field of medicine (Reverberi et al., 2022). In organizational decision making, the diffusion of the new capabilities takes longer, mainly due to the complexity of the environment. Raisch and Fomina (2025) present a model of hybrid organizational decision making in a common management area and identify time and knowledge as main drivers, meaning that hybrid decision making enables management to include more information in their decisions and save time to come to a conclusion. Carter and Wynne (2024) discuss the effect a non-human collaborator has on team decision making in an organizational context.

There is currently little research available about hybrid organizational decision making in high stakes areas, like human resources. Contemporary behavioral research is mainly concerned with the ethical implications (see, e.g., Bankins 2021) and design science focusses on providing design principles for the development of such systems (see, e.g., Storey et al., 2024). The integration of advanced AI capabilities, such as detecting emotions, into hybrid organizational decision making remains largely unexplored in academic publications. The research presented enables other researchers to explore this route because there is now an AI-enabled artifact available, embedded in a practically relevant use case, that closes this gap.

Conceptualizing the interworking of humans and AI artifacts

Another interesting lens for analysis is the embedding of advanced AI-enabled artifacts into business processes and the resulting consequences for the actors involved. The integration of AI artifacts into processes and the corresponding impact is one of the most dynamic contemporary research areas in IS research. Fabri et al. (2023), for example, provide a taxonomy of human–AI hybrids to analyze the interworking relationship of AI capabilities that are deployed in various use cases. We applied that taxonomy to position our work and identified the need to extend the application of the taxonomy by defining a new archetype.

The AI artifact presented in this paper would qualify as human agency, where the human agent uses the AI capabilities to enhance their work quality. The human agents (recruitment personnel) use the human cognitive function of perceiving to have the artifact provide an opinion on the trustworthiness of the statement of the applicant. In this human to AI interaction the human agents keep full control over the decision and use the information provided by the artifact as one input factor (amongst others) to verify/challenge their opinion, that is, the artifact facilitates the decision process. We classified the human focus as sensemaking because the recruiters need to critically evaluate the artifact’s assessment and use their own judgment to derive the final decision.

However, from a functional perspective the setting described above calls for a new archetype, contextually situated between Archetype 3: Sequential Augmentation (Superpower-Giving AI) and Archetype 4: Sequential Co-Evolution (Assembly Line AI) as suggested by Fabri et al. (2023, p. 635). The new Archetype 3a: Singular Interaction Support (Advisor AI) is distinct from Archetype 3, as the AI performs tasks the human agent is not able to do (additional skill), can be used flexibly (either in sequence or multiple times in different steps during the process), and has a solely supporting role among many input factors. In an analogy to enterprise architecture management, this would compare to a web service in a service-oriented architecture: The AI artifact can be called upon either at a predefined step in the process (as suggested in the use case in this paper), or it can be utilized multiple times at different steps during the process, depending on the specific contextual situation. In any case, the role of the AI artifact remains solely supportive, providing advice to the human agents. Thus, its role is less prominent than Archetype 3, Superpower-giving AI. The new Archetype 3a is much more like a piece in the puzzle, offering advice and a second opinion, but not the stronger guidance of the example of risk assessments in court cited by Fabri et al. (2023) to describe Archetype 3.

The theoretical foundation for this proposed new archetype is that most existing models of human–AI collaboration models focus on AI’s capabilities to process more information in a shorter time than people can (Raisch and Fomina 2025; Trunk et al., 2020; Wen et al., 2025). However, the advanced AI capabilities of emotion recognition—which amalgamate with the human capacities—are not yet adequately reflected in the corresponding theoretical models.

Popular theoretical lenses in human–AI collaboration models are shared cognitive theories like shared mental models (SMM) or transactive memory models (TMM). They assume that collaborators function as distributed memory networks, where participants know “who knows what,” allowing cognition to be shared across agents rather than stored in one place. AI-enabled artifacts with the capability for automated emotion recognition fit into such shared cognitive theories, but with a different focus. The human capabilities are extended into a different sphere, the recognition of complex emotions with greater accuracy than before. This offers new opportunities to make better decisions that surpass the current conceptualization of AI to analyze more information in a shorter time. The user of the artifact gains an additional skill, the objective interpretation of a complex emotion, which enables them to make a better decision because the intuitive part of decision making is enhanced by a neutral and more accurate AI-enabled analysis. Such AI-assisted intuitive judgment could be categorized as cognitive augmentation because the AI artifact augments the cognitive capabilities of decision makers, when called upon, and thus helps them improve the quality of their organizational decision making.

An important capability of the presented artifact is that operates independent of the content of utterances, thereby responding to legitimate concerns about privacy and ethical data use. This manifests a new paradigm of privacy-preserving cognitive augmentation and presents a new theoretical lens for further development of human–AI collaboration models.

Contribution to practice

Applicants providing false statements on their résumés seem to be the rule rather than the exception (Schenz and Zeitung 2018). This was confirmed in our expert workshops: “We often have the problem, especially when recruiting for junior positions, that the capabilities stated in the CV do not match the experiences the applicants actually have” (Expert 4). This poses serious issues in terms of money and time lost: “We recruit people on a global basis, mainly in southern and eastern Europe, but also from India and Asia. Flying those people in for face-to-face interviews is costly, let alone the time it takes our staff to conduct the interviews. We estimate that each of these interviews costs us between €5,000 and €10,000” (Expert 1). Having an automated mechanism to reduce the number of interviewees who advance to the live interview stage of the recruitment process by providing false statements has great practical value.

Another area of potential application of the artifact presented is in the service industry. GDV, the German insurance association, estimates the damage from insurance fraud at 6 bn EUR annually (GDV 2024). GDV has determined that ten percent of insurance claims has a fraudulent component and around 5% of customers of member insurance companies have submitted at least one fraudulent claim (GDV 2020; GDV 2024). Applying AI techniques to detect whether a caller is speaking to her/his conviction when making a claim may not completely solve these issues, but it could help alert the claim agent to the need for more intensive questioning and give the caller the opportunity to correct their statements and avoid submitting a fraudulent claim.

It is an open secret that customers’ statements at helpdesks or the customer service desk do not always fully reflect the truth (Alton 2017). It would save time and resources if an algorithm could detect whether the customer truly rebooted their computer before they called the helpdesk, or whether the customer is being knowingly deceitful about a product malfunction. Such an algorithm could also help verify whether and when a product was really sent to the support team, resulting in productivity gains and a reduction of operational losses.

There are many potential applications in medical and therapeutic fields. Mobile assisted living systems help vulnerable patients manage their daily lives independently. An AI algorithm inquiring how much they drank during the day or whether they took their medication correctly could enhance these systems because patients often do not answer these questions truthfully (Fainzang 2002). The algorithm could identify answers that do not reflect the person’s convictions and trigger human intervention from a designated caregiver to ensure the well-being of the patient. Psychotherapists could employ such an algorithm to see whether the patient is speaking to her/his convictions and physicians could apply the algorithm to improve anamnesis and the resulting diagnosis, especially with regard to socially difficult topics such as sexually transmitted diseases, drug use, or drinking and eating habits (Palmieri and Stern 2009). Naturally, these applications require transparent disclosure to the patient and adherence to corresponding data privacy regulations.

Limitations and further research

The main limitations of the artifact are rooted in the need for further research. Currently we cannot rule out that “trained liars” (professional fraudsters), or trained speakers (actors) who have superior abilities to control their voices and emotions, can outsmart the algorithm. There is also a need to investigate the influence of the speaker’s cultural background, for example, in cultures where lying is differently connotated and does not trigger the same complex emotions. Further studies among individuals with psychological, physical or medical challenges that influence their speech are needed to determine to what degree the artifact correct classifies their speech acts. Despite the relatively small size of our model training dataset, we would not expect, but cannot rule out, significant differences with a larger or more diverse dataset in terms of age, gender and occupation.

Our findings offer a strong basis for further research. From a behavioral science perspective, the advanced AI-enabled artifact presented in this paper offers the opportunity to study AI-supported hybrid organizational decision making in high stakes areas. This area of research is currently largely underexplored. Whereas we have a strong understanding of why people use AI artifacts to support managerial decisions in an organizational context, our understanding of the mechanisms in high stakes areas like human resources is more limited. From a design science perspective, to enhance predictive capabilities of the model further studies should include a more heterogeneous sample in terms of regional dialect, age, work experience, and personality traits, as well as participants speaking different languages and from heterogeneous cultural backgrounds. To enhance the performance and accuracy of our model, we also encourage research to determine the minimum audio file length needed to reliably verify conviction. Given that many models are overfitted at some point along the learning process, further hyperparameter optimization will be needed. Also research to detect patterns in the selection of hyperparameters with respect to the phenomenon of memorization would be beneficial. Moreover, to increase predictive accuracy, further research should analyze transcripts of the audio files as described by Zhang et al. (2010) and merge the results with those gained from purely acoustic analysis to provide an enhanced multi-modality-based and therefore more robust system. This approach could be enhanced by extracting psychometric dimensions from the transcribed audio files as suggested by Ahmad et al. (2020). After the artifact is deployed in a productive environment, behavioral studies should be conducted to understand how recruiters and applicants respond to the new software and what challenges arise.

Conclusion

This paper presents a novel AI artifact that detects deceit among speakers by analyzing the acoustic and spectral features of their natural speech, but without voice recognition. We trained the AI model on a specifically generated German language dataset consisting of 80 audio files recorded in the experimental setting of a debate club. Out of the models evaluated, the best version detected deception with 82.4% accuracy. Our results demonstrate how support vector machines can be applied to perform acoustic speech analysis to detect whether or not a person is speaking to their true convictions.

An AI artifact that detects deception using only human voice characteristics, without voice recognition, pushes IS theory beyond traditional data processing and management. It forces a deeper engagement with the complexities of human communication, the implications for organizational decision making using AI artifacts in sensitive contexts, and the design of systems that can extract meaningful, non-identifying insights from rich, real-time data (Blohm et al., 2025). The application of such a novel AI artifact into organizational decision making enables managers to make better decisions by overcoming humans’ natural weak ability to detect deceptive intentions of other people accurately. Such artifacts advance the concept of human–AI hybrids, a re-conceptualization of organizational decision making concepts and an extension of the current taxonomies for human–AI collaboration.

Additionally, the presented AI artifact demonstrates that semantic understanding of a conversation is not necessary to detect complex human emotions. Our research shows that analyzing the spectral features of the human voice is sufficient to achieve accurate results. As the old saying goes: “It’s not what you say, but how you say it.”

Footnotes

ORCID iDs

Heiko Gewald

Stefan Faußer

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to thank the Technology Transfer Center, Günzburg, Germany for funding parts of the data collection for this study.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Note

Author biographies

Dr. Heiko Gewald is research professor of Information Management at Neu-Ulm University of Applied Sciences in Germany and Director of the Institute for Digital Innovation (IDI). His research focuses on the use of digital resources by the aging generation, digital innovations, and HealthIT. He is a frequent speaker on conferences contributing to these matters and his work has been published in numerous peer-reviewed journals.

Dr. Stefan Faußer is a Professor of Data Analytics at the University of Applied Sciences Neu-Ulm in Neu-Ulm, Germany and a senior research fellow at the Institute of Digital Innovation. He also serves as course director of the master's program in Artificial Intelligence. He studied computer science and earned his Ph.D. from Ulm University in 2015. Before his academic journey, he worked for more than eight years as a software engineer and data scientist. Among his research interests are machine learning, natural language processing, and complex emotion recognition from speech.

Appendix

References

Abrahão

Grundy

Pezzè

, et al. (2025) Software engineering by and for humans in an AI era. ACM Transactions on Software Engineering and Methodology 34(5): 1–46. https://doi.org/10.1145/3715111

Adolphs

(2002) Neural systems for recognizing emotion. Current opinions in neurobiology 12(2): 169–177. https://doi.org/10.1016/s0959-4388(02)00301-x

Ahmad

Abbasi

, et al. (2020) A deep learning architecture for psychometric natural language processing. ACM Transactions on Information Systems 38(1): 1–29. https://doi.org/10.1145/3365211

Aldwairi

Dilina

Mark

(2018) An evaluation of the performance of Restricted Boltzmann Machines as a model for anomaly network intrusion detection. Computer Networks 144: 111–119. https://doi.org/10.1016/j.comnet.2018.07.025

Alton

(2017) How do you handle lying customers? Retrieved 5 August 2025, from: https://customerthink.com/how-do-you-handle-lying-customers/

Amir

Ron

(1998) Towards an automatic classification of emotions in speech. In: Proceedings of the Fifth International Conference on Spoken Language Processing.

Ananthakrishnan

Smith

(2020) A tangled web: should online review portals display fraudulent reviews? Information Systems Research 31(3): 950–971. https://doi.org/10.1287/isre.2020.0925

Bachu

Kopparthi

Adapa

, et al. (2010) Voiced/unvoiced decision for speech signals based on zero-crossing rate and energy. In: Proceedings of the Advanced Techniques in Computing Sciences and Software Engineering. Springer, pp. 279–282.

Bagozzi

Brady

Huang

M-H

(2022) AI service and emotion. Journal of Service Research 25(4): 499–504. https://doi.org/10.1177/10946705221118579

10.

Bankins

(2021) The ethical use of artificial intelligence in human resource management: a decision-making framework. Ethics and Information Technology 23(4): 841–854. https://doi.org/10.1007/s10676-021-09619-6

11.

Baptista

Stein

M-K

Klein

, et al. (2020) Digital work and organisational transformation: emergent Digital/Human work configurations in modern organisations. The Journal of Strategic Information Systems 29(2): 101618. https://doi.org/10.1016/j.jsis.2020.101618

12.

Bareeda

Mohan

BSS

Muneer

KVA

(2021) Lie detection using speech processing techniques. Journal of Physics: Conference Series 1921(1): 1–8.

13.

Bayrak

McComb

Cagan

, et al. (2021) A strategic decision-making architecture toward hybrid teams for dynamic competitive problems. Decision Support Systems 144: 113490. https://doi.org/10.1016/j.dss.2020.113490

14.

Benbya

Davenport

Pachidi

(2020a) Artificial intelligence in organizations: current state and future opportunities. MIS Quarterly Executive 19(4): 9–21.

15.

Benbya

Nan

Tanriverdi

, et al. (2020b) Complexity and information systems research in the emerging digital world. MIS Quarterly 44(1): 1–18. https://doi.org/10.25300/misq/2020/13304

16.

Benbya

Pachidi

Jarvenpaa

(2021) Special issue editorial: artificial intelligence in organizations: implications for information systems research. Journal of the Association for Information Systems 22(2): 281–303. https://doi.org/10.17705/1jais.00662

17.

Bhamare

Katharguppe

Nancy

(2020) Deep neural networks for lie detection with attention on bio-signals. In: Proceedings of the 2020 7th International Conference on Soft Computing & Machine Intelligence (ISCMI). IEEE, pp. 143–147.

18.

Bill

Melchers

Buehl

A-K

, et al. (2020) An investigation of situational and dispositional antecedents of faking intentions in selection interviews. Frontiers in Psychology 11: 2034. https://doi.org/10.3389/fpsyg.2020.02034

19.

Bill

Schmitz-Wilhelmy

Heinzelmann

, et al. (2024) How do situational cues influence honest and deceptive impression management in selection interviews? A grounded theory study. Journal of Business and Psychology 39(6): 1427–1447. https://doi.org/10.1007/s10869-024-09962-7

20.

Blohm

Miranda

Leimeister

(2025) Next-generation IS research methods – towards a better understanding of complex and dynamic phenomena … and generative AI as the elephant in the room. Journal of Information Technology 40(2): 102–121. https://doi.org/10.1177/02683962251340699

21.

Boateng

Kowatsch

(2020) Speech emotion recognition among elderly individuals using multimodal fusion and transfer learning. In: International Conference on Multimodal Interaction. Association for Computing Machinery, pp. 12–16.

22.

Bogen

Rieke

(2018) Help wanted: an examination of hiring algorithms, equity, and bias. Upturn.

23.

Bond Jr

DePaulo

(2006) Accuracy of deception judgments. Personality and Social Psychology Review 10(3): 214–234. https://doi.org/10.1207/s15327957pspr1003_2

24.

Boyle

(2023) George Santo’s defense that everyone lies on their resumes has truth to it. bloomberg.com.

25.

Burton

Stein

Jensen

(2020) A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making 33(2): 220–239. https://doi.org/10.1002/bdm.2155

26.

Careerbee (2025) How long does it take to get hired in Germany? Retrieved 01 March 2025, from: https://www.careerbee.io/how-long-does-it-take-to-get-hired-in-germany-with-tips/

27.

Carter

Wynne

(2024) Integrating artificial intelligence into team decision‐making: toward a theory of AI–human team effectiveness. European Management Review 1–14. Available at: https://doi.org/10.1111/emre.12685

28.

Chamoli

Semwal

Saikia

(2017) Detection of emotion in analysis of speech using linear predictive coding techniques (LPC). In: Proceedings of the International Conference on Inventive Systems and Control (ICISC). IEEE.

29.

Checkster (2020) 78% of job seekers lie during the hiring process. Retrieved 05 March 2023, from: https://www.cnbc.com/2020/02/19/how-many-job-seekers-lie-on-their-job-application.html

30.

Cohn

Gesche

Maréchal

(2022) Honesty in the digital age. Management Science 68(2): 827–845. https://doi.org/10.1287/mnsc.2021.3985

31.

Constâncio

Tsunoda

Silva

HDFN

, et al. (2023) Deception detection with machine learning: a systematic review and statistical analysis. PLoS One 18(2): e0281323. https://doi.org/10.1371/journal.pone.0281323

32.

Dane

Rockmann

Pratt

(2012) When should I trust my gut? Linking domain expertise to intuitive decision-making effectiveness. Organizational Behavior and Human Decision Processes 119(2): 187–194. https://doi.org/10.1016/j.obhdp.2012.07.009

33.

Dave

(2013) Feature extraction methods LPC, PLP and MFCC in speech recognition. International Journal of Advanced Research in Engineering & Technology 1(6): 1–4.

34.

David Freedman

(2007) Statistics. 4th edition. W. W. Norton & Company.

35.

Davis

Mermelstein

(1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, & Signal Processing 28(4): 357–366. https://doi.org/10.1109/tassp.1980.1163420

36.

Demircan

Kahramanlı

(2014) Feature extraction from speech data for emotion recognition. Journal of Advances in Computer Networks 2(1): 28–30. https://doi.org/10.7763/jacn.2014.v2.76

37.

DePaulo

Lindsay

Malone

, et al. (2003) Cues to deception. Psychological Bulletin 129(1): 74–118. https://doi.org/10.1037/0033-2909.129.1.74

38.

Dutoit

Paech

(2002) Rationale-based use case specification. Requirements Engineering 7(1): 3–19. https://doi.org/10.1007/s007660200001

39.

Ekman

(2004) Emotions Revealed: Understanding Faces and Feelings. W&N.

40.

Engel

Elshan

Ebel

, et al. (2024) Stairway to heaven or highway to hell: a model for assessing cognitive automation use cases. Journal of Information Technology 39(39): 94–122. https://doi.org/10.1177/02683962231185599

41.

EU (2016). General Data Protection Regulation (GDPR) regulation (EU) 2016/679.

42.

EU (2024) The Artificial Intelligence Act - Regulation (EU) 2024/1689.

43.

Fabri

Häckel

Oberländer

, et al. (2023) Disentangling human-AI hybrids: conceptualizing the interworking of humans and AI-enabled systems. Business & Information Systems Engineering 65(6): 623–641. https://doi.org/10.1007/s12599-023-00810-1

44.

Fainzang

(2002) Lying, secrecy and power within the doctor-patient relationship. Anthropology & Medicine 9(2): 117–133. https://doi.org/10.1080/1364847022000034574

45.

Fan

Zhao

Chen

, et al. (2015) Distinguishing deception from non-deception in Chinese speech. In: Proceedings of the 2015 Sixth International Conference on Intelligent Control and Information Processing (ICICIP), pp. 268–273.

46.

Fayek

Lech

Cavedon

(2017) Evaluating deep learning architectures for Speech Emotion Recognition. Neural Networks 92: 60–68. https://doi.org/10.1016/j.neunet.2017.02.013

47.

Fernandes

Ullah

(2021) Use of machine learning for deception detection from spectral and cepstral features of speech signals. IEEE Access 9(9): 78925–78935. https://doi.org/10.1109/access.2021.3084200

48.

Ferran

Storck

(1997) Truth or deception: the impact of videoconferencing for job interviews. In: Proceedings of the 18th International Conference on Information Systems.

49.

Fleiß

Bäck

Thalmann

(2024) Mitigating algorithm aversion in recruiting: a study on explainable AI for conversational agents. The DATA BASE for Advances in Information Systems 55(1): 56–87.

50.

Fox

(2018) Perspectives from affective science on understanding the nature of emotion. Brain and Neuroscience Advances 2(2): 1–8. https://doi.org/10.1177/2398212818812628

51.

Fraser

Meltzer

Rudzicz

(2016) Linguistic features identify Alzheimer’s disease in narrative speech. Journal of Alzheimer’s Disease 49(2): 407–422. https://doi.org/10.3233/JAD-150520

52.

Lei

(2020) Speech deception detection algorithm based on denoising autoencoder and long short-term memory network. Journal of Computer Applications 40(2): 589–594.

53.

Wang

, et al. (2023) A semi-supervised speech deception detection algorithm combining acoustic statistical features and time-frequency two-dimensional features. Brain Sciences 13(5): 725. https://doi.org/10.3390/brainsci13050725

54.

Ganesh

Arnold

Reynolds

(2000) Understanding the customer base of service providers: an examination of the differences between switchers and stayers. Journal of Marketing 64(3): 65–87. https://doi.org/10.1509/jmkg.64.3.65.18028

55.

Gao

Lyu

Cheung

(2025) The implications of AI recruiting for workplace diversity, equity, and inclusion: a literature analysis and research agenda. Pacific Asia Journal of the Association for Information Systems 17(4): 1.

56.

GDV (2020) Zahlen bitte: Versicherungsbetrug. Positionen Magazin.

57.

GDV (2024) Versicherungsbetrug verursacht Schäden von über sechs Milliarden Euro im Jahr. Retrieved 01 May, 2026, from: https://www.gdv.de/gdv/medien/medieninformationen/versicherungsbetrug-verursacht-schaeden-von-ueber-sechs-milliarden-euro-im-jahr-176852

58.

George

Robb

(2021) Comparing Australians’ and Americans’ abilities to detect deception across cultures and communication media. Journal of the Midwest Association for Information Systems 2021(1): 9–25.

59.

Giannakopoulos

Kosmopoulos

Aristidou

, et al. (2006) Violence content classification using audio features. In: Proceedings of the 4th Helenic Conference on AI. Springer, pp. 502–507.

60.

Gregor

Hevner

(2013) Positioning and presenting design science research for maximum impact. MIS Quarterly 37(2): 337–355. https://doi.org/10.25300/misq/2013/37.2.01

61.

Gregor

Chandra Kruse

Seidel

(2020) Research perspectives: the anatomy of a design principle. Journal of the Association for Information Systems 21(6): 2–1652. https://doi.org/10.17705/1jais.00649

62.

Griffin

(2021) Voice Search Statistics for 2020. Retrieved 07.04.2021, from: https://www.thesearchreview.com/google-voice-searches-doubled-past-year-17111/

63.

Grover

Chiang

Liang

T-P

, et al. (2018) Creating strategic business value from big data analytics: a research framework. Journal of Management Information Systems 35(2): 388–423. https://doi.org/10.1080/07421222.2018.1451951

64.

Grubin

Madsen

(2005) Lie detection and the polygraph: a historical review. Journal of Forensic Psychiatry and Psychology 16(2): 357–369. https://doi.org/10.1080/14789940412331337353

65.

Gupta

Bansal

Choudhary

(2018) The state of the art of feature extraction techniques in speech recognition. In: Agrawal

S. S.

Devi

Wason

Bansal

(eds) Speech and Language Processing for human-machine Communications. Springer, pp. 195–207.

66.

Hayashi

(2001) When to trust your gut. Harvard Business Review 79(2): 58–65.

67.

Hearst

Dumais

Osuna

, et al. (1998) Support vector machines. IEEE Intelligent Systems and Their Applications 13(4): 18–28. https://doi.org/10.1109/5254.708428

68.

Hevner

March

Park

, et al. (2004) Design science in information systems research. MIS Quarterly 28: 75–105. https://doi.org/10.2307/25148625

69.

Hirschberg

JAB

Brenier

Enos

, et al. (2005) Distinguishing deceptive from non-deceptive speech. In: Interspeech 2005, pp. 1833–1836.

70.

Hirst

de Looze

(2021) Measuring Speech. Fundamental Frequency and Pitch. Cambridge University Press.

71.

Huang

M-H

Rust

(2024) The caring machine: feeling AI for customer care. Journal of Marketing 88(5): 1–23. https://doi.org/10.1177/00222429231224748

72.

Iivari

(2020) A critical look at theories in design science research. Journal of the Association for Information Systems 21(3): 502–519. https://doi.org/10.17705/1jais.00610

73.

Jain

Padmanabhan

Pavlou

, et al. (2021) Editorial for the special section on humans, algorithms, and augmented intelligence: the future of work, organizations, and society. Information Systems Research 32(3): 675–687. https://doi.org/10.1287/isre.2021.1046

74.

Jechle

Faußer

Gewald

(2026) Lies or nerves? A signal detection theory-guided approach to deception detection in natural speech. In: Rocha

Ferrás

Calvo

(eds) Lecture Notes in Networks and Systems. Springer.

75.

Jenkins

Johnson

(1977) What the information analyst should know about body language. MIS Quarterly 1: 33–47. https://doi.org/10.2307/248711

76.

Jöhnk

Weißert

Wyrtki

(2020) Ready or not, AI comes— an interview study of organizational AI readiness factors. Business & Information Systems Engineering 63(1): 5–20. https://doi.org/10.1007/s12599-020-00676-7

77.

Kamboj

Hessler

Asnani

, et al. (2021) Multimodal political deception detection. IEEE Multimedia 28(1): 94–102. https://doi.org/10.1109/mmul.2020.3048044

78.

Karnati

Seal

Yazidi

, et al. (2022) LieNet: a deep convolution neural network framework for detecting deception. IEEE Transactions on Cognitive and Developmental Systems 14(3): 971–984. https://doi.org/10.1109/tcds.2021.3086011

79.

Karpova

Popenova

Glebko

, et al. (2020) “Was it you who stole 500 rubles?” - The multimodal deception detection. In: International Conference on Multimodal Interaction. Association for Computing Machinery, pp. 112–119.

80.

Katz

(1983) Dilemmas of polygraph stipulations. Seton Hall Law Review 14: 285–303.

81.

Kishore

Satish

(2013) Emotion recognition in speech using MFCC and wavelet features. In: Proceedings of the 3rd IEEE International Advance Computing Conference (IACC). IEEE, pp. 842–847.

82.

Kohli

(2023) Everything we know So far about congressman George Santos lying about his resume. In: Time Magazin.

83.

Krasnova

Widjaja

Buxmann

, et al. (2015) Research note—why following friends can hurt you: an exploratory investigation of the effects of envy on social networking sites among college-age users. Information Systems Research 26(3): 585–605. https://doi.org/10.1287/isre.2015.0588

84.

Krishnamurthy

Majumder

Poria

, et al. (2018) A deep learning approach for multimodal deception detection. In: Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing.

85.

Kruger

Dunning

(1999) Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology 77(6): 1121–1134. https://doi.org/10.1037//0022-3514.77.6.1121

86.

Lalitha

Geyasruti

Narayanan

, et al. (2015) Emotion detection using MFCC and cepstrum features. Procedia Computer Science 70: 29–35. https://doi.org/10.1016/j.procs.2015.10.020

87.

Levitan

Tan

Hirschberg

(2020) LieCatcher: game framework for collecting human judgments of deceptive speech. In: International Conference on Multimodal Interaction. Association for Computing Machinery, pp. 762–763.

88.

Lassiter

, et al. (2021) Algorithmic hiring in practice: recruiter and HR Professional’s perspectives on AI use in hiring. In: Proceedings of the Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 166–176.

89.

Logan

(2000) Mel frequency cepstral coefficients for music modeling. In: Proceedings of the International Symposium on Music Information Retrieval.

90.

Mahbub

Pardede

(2018) Using contextual features for online recruitment fraud detection. In: 27th International Conference on Information Systems Development (ISD).

91.

Majumder

Poria

Hazarika

, et al. (2019) Dialoguernn: an attentive RNN for emotion detection in conversations. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 6818–6825.

92.

Mambreyan

Punskaya

Gunes

(2022) Dataset bias in deception detection. In: Proceedings of the 26th International Conference on Pattern Recognition (ICPR), pp. 1083–1089.

93.

Markowitz

(2024) Self and other-perceived deception detection abilities are highly correlated but unassociated with objective detection ability: examining the detection consensus effect. Scientific Reports 14(1): 17529. https://doi.org/10.1038/s41598-024-68435-2

94.

Mathur

Matarić

(2020) Introducing representations of facial affect in automated multimodal deception detection. In: International Conference on Multimodal Interaction. Association for Computing Machinery, pp. 305–314.

95.

Matsumoto

Hwang

(2018) Microexpressions differentiate truths from lies about future malicious intent. Frontiers in Psychology 9(9): 1–11. https://doi.org/10.3389/fpsyg.2018.02545

96.

McFee

Raffel

Liang

, et al. (2015) Librosa: audio and music signal analysis in python. In: Proceedings of the 14th Python in Science Conference, pp. 18–25.

97.

Melchers

Roulin

Buehl

(2020) A review of applicant faking in selection interviews. International Journal of Selection and Assessment 28(2): 123–142. https://doi.org/10.1111/ijsa.12280

98.

Mihalache

Burileanu

(2022) Using voice activity detection and deep neural networks with hybrid speech feature extraction for deceptive speech detection. Sensors 22(3): 1228. https://doi.org/10.3390/s22031228

99.

Moravec

Kim

Dennis

, et al. (2022) Do you really know if it’s true? How asking users to rate stories affects belief in fake news on social media. Information Systems Research 33(3): 887–907. https://doi.org/10.1287/isre.2021.1090

100.

Nasri

Ouarda

Alimi

(2016) ReLiDSS: novel lie detection system from speech signal. In: Proceedings of the 13th International Conference of Computer Systems and Applications (AICCSA). IEEE, pp. 1–8.

101.

Nguyen

Tuunanen

Gardner

, et al. (2021) Design principles for learning analytics information systems in higher education. European Journal of Information Systems 30(5): 541–568. https://doi.org/10.1080/0960085x.2020.1816144

102.

Nyman

Bødker

Blegind Jensen

(2024) Reforming work patterns or negotiating workloads? Exploring alternative pathways for digital productivity assistants through a problematization lens. Journal of Information Technology 39(3): 503–520. https://doi.org/10.1177/02683962231181602

103.

Pakyurek

Atmis

Kulac

, et al. (2020) Extraction of novel features based on histograms of MFCCs used in emotion classification from generated original speech dataset. Elektronika ir Elektrotechnika 26(1): 46–51. https://doi.org/10.5755/j01.eie.26.1.25309

104.

Palmieri

Stern

(2009) Lies in the doctor-patient relationship. Journal of Clinical Psychiatry 11(4): 163–168. https://doi.org/10.4088/PCC.09r00780

105.

Pan

Zhao

Zhou

(2015) The application of fractional Mel cepstral coefficient in deceptive speech detection. PeerJ 3: e1194. https://doi.org/10.7717/peerj.1194

106.

Park

Shim

Chatterjee

, et al. (2014) Computational analysis of persuasiveness in social multimedia: a novel dataset and multimodal prediction approach. In: Proceedings of the 16th International Conference on Multimodal Interaction, pp. 50–57.

107.

Pathirannehelage

Shrestha

von Krogh

(2024) Design principles for artificial intelligence-augmented decision making: an action design research study. European Journal of Information Systems 34(2): 207–229. https://doi.org/10.1080/0960085x.2024.2330402

108.

Peffers

Tuunanen

Rothenberger

, et al. (2007) A design science research methodology for information systems research. Journal of Management Information Systems 24(3): 45–77. https://doi.org/10.2753/mis0742-1222240302

109.

Pérez-Rosas

Abouelenien

Mihalcea

, et al. (2015) Deception detection using real-life trial data. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 59–66.

110.

Phillips

(2021) 15 Voice Search Trends & Statistics for 2021. Retrieved 07 April, 2021, from: https://www.webfx.com/blog/seo/voice-search-trends/

111.

Ponomareff

(2017) How AI is disrupting the role of call centres and call centre staff. Retrieved 07 April, 2021, from: https://www.computerweekly.com/opinion/How-AI-is-disrupting-the-role-of-call-centres-and-call-centre-staff

112.

Prince

Wallsten

(2022) How much is privacy worth around the world and across platforms? Journal of Economics and Management Strategy 31(4): 841–861. https://doi.org/10.1111/jems.12481

113.

Proverbio

Vanutelli

Adorni

(2013) Can you catch a liar? How negative emotions affect brain responses when lying or telling the truth. PLoS One 8(3): e59383. https://doi.org/10.1371/journal.pone.0059383

114.

Purohit

Banerjee

(2025) Artificial intelligence-based organizational decision-making in recruitment practice. Human Systems Management 44(1): 173–186. https://doi.org/10.3233/hsm-240044

115.

Rai

Constantinides

Sarker

(2019) Editor’s comments: Next-generation digital platforms: toward human–AI hybrids. MIS Quarterly 43(1): iii–ix.

116.

Rainie

Anderson

McClain

, et al. (2023) AI in hiring and evaluating workers: what Americans think. P.R. Center.

117.

Rainsberger

(2022) Relevance: how AI supports the sales process. In: AI-The New Intelligence in Sales. Springer, pp. 35–40.

118.

Raisch

Fomina

(2025) Combining human and artificial intelligence: hybrid problem-solving in organizations. Academy of Management Review 50(2): 441–464. https://doi.org/10.5465/amr.2021.0421

119.

Renier

Schmid Mast

Dael

, et al. (2021) Nonverbal social sensing: what social sensing can and cannot do for the study of nonverbal behavior from video. Frontiers in Psychology 12: 606548. https://doi.org/10.3389/fpsyg.2021.606548

120.

Resumelab (2020) Lying on a Resume (2020 Study).

121.

Reverberi

Rigon

Solari

, et al. (2022) Experimental evidence of effective human–AI collaboration in medical decision-making. Scientific Reports 12(1): 14952. https://doi.org/10.1038/s41598-022-18751-2

122.

Saxe

Hillary

(2018) Malware Data Science: Attack Detection and Attribution. No Starch Press.

123.

Schenz

(2018) Wie Viele Lügen Sind Im Lebenslauf Erlaubt? S. Zeitung.

124.

Scherer

(1995) Expression of emotion in voice and music. Journal of Voice 9(3): 235–248. https://doi.org/10.1016/s0892-1997(05)80231-0

125.

Schuller

Rigoll

Lang

(2003) Hidden Markov model-based speech emotion recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.

126.

Serota

Levine

Docan-Morgan

(2022) Unpacking variation in lie prevalence: prolific liars, bad lie days, or both? Communication Monographs 89(3): 307–331. https://doi.org/10.1080/03637751.2021.1985153

127.

Shaver

Schwartz

Kirson

, et al. (1987) Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52(6): 1061–1086. https://doi.org/10.1037//0022-3514.52.6.1061

128.

Shete

Patil

(2014) Zero crossing rate and energy of the speech signal of Devanagari script. IOSR-JVSP 4(1): 1–5.

129.

Shu

Xie

Yang

, et al. (2018) A review of emotion recognition using physiological signals. Sensors 18(7): 2074, online. https://doi.org/10.3390/s18072074

130.

Slimi

Hamroun

Zrigui

, et al. (2020) Emotion recognition from speech using spectrograms and shallow neural networks. In: 18th International Conference on Advances in Mobile Computing and Multimedia. Association for Computing Machinery, pp. 35–39.

131.

Snehalakshmi

Lakshmanan

Devanathan

(2024) An integrated platform for lie detection and plagiarism analysis in interviews. In: Proceedings of the IEEE Conference on Engineering Informatics (ICEI).

132.

Soleymani

Asghari-Esfeden

, et al. (2015) Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Transactions on Affective Computing 7(1): 17–28. https://doi.org/10.1109/taffc.2015.2436926

133.

Statista (2022) Lügen in Lebensläufen.

134.

Stone

(1974) Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B 36(2): 111–147. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x

135.

Storey

Park

(2022) An ontology of emotion process to support sentiment analysis. Journal of the Association for Information Systems 23(4): 999–1036. https://doi.org/10.17705/1jais.00749

136.

Storey

Hevner

Yoon

(2024) The design of human-artificial intelligence systems in decision sciences: a look back and directions forward. Decision Support Systems 182: 114230. https://doi.org/10.1016/j.dss.2024.114230

137.

Strich

Mayer

A-S

Fiedler

(2021) What do I do in a world of artificial intelligence? Investigating the impact of substitutive decision-making AI systems on employees’ professional role identity. Journal of the Association for Information Systems 22(2): 304–324. https://doi.org/10.17705/1jais.00663

138.

Suchotzki

Gamer

(2024) Detecting deception with artificial intelligence: promises and perils. Trends in Cognitive Sciences 28(6): 481–483. https://doi.org/10.1016/j.tics.2024.04.002

139.

Sudarsan

Kumar

(2019) Voice call analytics using natural language processing. International Journal of Statistics and Applied Mathematics 4: 133–136.

140.

Suen

H-Y

Hung

K-E

(2023) Comparing job applicant deception in asynchronous vs synchronous video interviews, with and without AI-assisted assessments. Information Technology & People 38(2): 963–983. https://doi.org/10.1108/itp-02-2023-0189

141.

Tivatansakul

Ohkura

Puangpontip

, et al. (2014) Emotional healthcare system: emotion detection by facial expressions using Japanese database. In: Proceedings of the 2014 6th Computer Science and Electronic Engineering Conference (CEEC). IEEE, pp. 41–46.

142.

Tiwari

(2023) Ethical and societal implications of AI and machine learning. International Journal of Scientific Research in Engineering and Management 7(1): 1–10. https://doi.org/10.55041/ijsrem17519

143.

Trigeorgis

Ringeval

Brueckner

, et al. (2016) Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In: Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 5200–5204.

144.

Trunk

Birkel

Hartmann

(2020) On the current state of combining human and artificial intelligence for strategic organizational decision making. Business Research 13(3): 875–919. https://doi.org/10.1007/s40685-020-00133-x

145.

Tyng

Amin

Saad

MNM

, et al. (2017) The influences of emotion on learning and memory. Frontiers in Psychology 8(1454): 1–22. https://doi.org/10.3389/fpsyg.2017.01454

146.

Van den Broek

Sergeeva

Huysman

(2021) When the machine meets the expert: an ethnography of developing AI for hiring. MIS Quarterly 45(3): 1557–1580. https://doi.org/10.25300/misq/2021/16559

147.

Vrij

(2008) Detecting Lies and Deceit: The Psychology of Lying and Implications for Professional Practice. John Wiley & Sons.

148.

Vrij

Granhag

Ashkenazi

, et al. (2022) Verbal lie detection: its past, present and future. Brain Sciences 20(12): 1644. https://doi.org/10.3390/brainsci12121644

149.

Vrij

Fisher

Leal

(2023) How researchers can make verbal lie detection more attractive for practitioners. Psychiatry, Psychology and Law 30(3): 383–396. https://doi.org/10.1080/13218719.2022.2035842

150.

Weinmann

Valacich

Schneider

, et al. (2022) The path of the righteous: using trace data to understand fraud decisions in real time. MIS Quarterly 46(4): 2317–2336. https://doi.org/10.25300/misq/2022/17038

151.

Weiss

Feldman

(2006) Looking good and lying to do it: deception as an impression management strategy in job interviews. Journal of Applied Social Psychology 36(4): 1070–1086. https://doi.org/10.1111/j.0021-9029.2006.00055.x

152.

Weitzel

Maier

Oehlhorn

, et al. (2019) Social recruiting and active sourcing. Ausgewählte Ergebnisse der Recruiting Trends.

153.

Wen

Wang

Chen

(2025) Trust and AI weight: human-AI collaboration in organizational management decision-making. Frontiers in Organizational Psychology 3(3): 1419403. https://doi.org/10.3389/forgp.2025.1419403

154.

Whissell

(2023) Emotion and the body. In: Whissell

(ed) Engaging with Emotion. Springer, pp. 65–87.

155.

Matta

Islam

, et al. (2020) German speech recognition system using DeepSpeech. In: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, pp. 102–106.

156.

Yang

Y-c.

Ying

Jin

, et al. (2021) Special issue editorial: information systems research in the age of smart services. Journal of the Association for Information Systems 22(3): 579–590. https://doi.org/10.17705/1jais.00673

157.

Yang

Lau

Abbasi

(2022) Getting personal: a deep learning artifact for text-based measurement of personality. Information Systems Research 34(1): 194–222. https://doi.org/10.1287/isre.2022.1111

158.

Yang

Qin

Fan

, et al. (2023) Unlocking the power of voice for financial risk prediction: a theory-driven deep learning design approach. MIS Quarterly 47(1): 63–96. https://doi.org/10.25300/misq/2022/17062

159.

Zhang

Narayanan

Choudhary

(2010) Voice of the customers: mining online customer reviews for product feature-based ranking. In: Proceedings of the 3rd Workshop on Online Social Networks.

160.

Zhang

Yin

Chen

, et al. (2020) Emotion recognition using multi-modal data and machine learning techniques: a tutorial and review. Information Fusion 59(59): 103–126. https://doi.org/10.1016/j.inffus.2020.01.011

161.

Zhao

Mao

Chen

(2019) Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomedical Signal Processing and Control 47: 312–323.

162.

Zuckerman

DePaulo

Rosenthal

(1981) Verbal and nonverbal communication of deception. Advances in Experimental Social Psychology 14: 1–59.