Abstract
The adaptive learning problem concerns how to create an individualized learning plan (also referred to as a learning policy) that chooses the most appropriate learning materials based on a learner’s latent traits. In this article, we study an important yet less-addressed adaptive learning problem—one that assumes continuous latent traits. Specifically, we formulate the adaptive learning problem as a Markov decision process. We assume latent traits to be continuous with an unknown transition model and apply a model-free deep reinforcement learning algorithm—the deep Q-learning algorithm—that can effectively find the optimal learning policy from data on learners’ learning process without knowing the actual transition model of the learners’ continuous latent traits. To efficiently utilize available data, we also develop a transition model estimator that emulates the learner’s learning process using neural networks. The transition model estimator can be used in the deep Q-learning algorithm so that it can more efficiently discover the optimal learning policy for a learner. Numerical simulation studies verify that the proposed algorithm is very efficient in finding a good learning policy. Especially with the aid of a transition model estimator, it can find the optimal learning policy after training using a small number of learners.
Introduction
In a traditional classroom, a teacher uses the same learning material (e.g., textbook, instruction pace) for all students. However, the selected material may be too hard for some students and too easy for some other students. Further, some students may take a longer time in learning than the others. Such a learning process may not be efficient. These issues can be solved if the teacher can make an individualized learning plan for each individual student: Select an appropriate learning material according to each student’s proficiency and let a student learn at their own pace. Considering that a very high teacher–student ratio is required, such an individualized adaptive learning plan may be too expensive to be applied to all students. As such, adaptive learning systems are developed to provide individualized adaptive learning for all students/learners. In particular, with the fast growth of digital platforms, globally integrated resources, and machine learning algorithms, adaptive learning systems are becoming increasingly more affordable, applicable, and efficient (Zhang & Chang, 2016).
An adaptive learning system (also referred to as a personalized/individualized learning system or an intelligent tutoring system) aims to provide a learner with an optimized and individualized learning experience or instructional materials so that the learner can reach a certain achievement level in the shortest time or reach as high an achievement level as possible in a fixed time period. An adaptive learning system typically works as follows. First, learners’ historical data are used to estimate their proficiency. Then, according to the level of their proficiency, the system selects the most appropriate learning material for the learner. After the learner finishes the learning material, an assessment is given to the learner, and their proficiency level is updated and used by the adaptive learning system to choose the next most appropriate learning material. This process repeats until the learner achieves a certain proficiency level. With a good adaptive learning system, teachers can focus on designing high-quality learning materials and tests without worrying about actively collecting feedback from students and selecting the most suitable materials for them. Meanwhile, such a system leverages state-of-the-art technologies and can easily provide high-quality learning environments around the world, which would particularly benefit students in areas lacking (experienced) teachers, promoting not only efficiency but also equity in education.
In previous studies, the proficiencies, which are latent traits, were typically characterized as vectors of binary latent variables (Chen, Li, et al., 2018; Li et al., 2018; Tang et al., 2019). However, it is important to consider the granularity of the latent traits in a complex learning and assessment environment, in which a knowledge domain consists of several fine-grained abilities. In some cases, it would be insufficient to model learners’ abilities as mastery or nonmastery. For example, when a learner is answering an item designed to measure several latent traits, even if the learner is measured to have mastered all the traits individually, there is no guarantee that the learner will answer the item correctly. A possible reason is that the so-called mastery is not full mastery of a latent trait. By measuring learners’ traits on continuous scales, the adaptive learning system can be designed to help learners learn and improve until they reach the target levels of certain abilities, so that the learners can achieve target scores in assessments. In practice especially, most assessments are designed to measure learners’ latent traits (McGlohen & Chang, 2008). In such scenarios, it is better to use a continuous scale to measure the latent traits, as item response theory (IRT) does. In this article, we develop an adaptive learning system that estimates learners’ abilities using measurement models in order to provide them with the most appropriate materials for further improvement.
Existing studies have focused on modeling learners’ learning paths (Chen, Culpepper, et al., 2018; Wang et al., 2018), accelerating learners’ memory speed (Reddy et al., 2017), providing model-based sequence recommendation (Chen, Li, et al., 2018; Lan & Baraniuk, 2016; Xu et al., 2016), tracing learners’ concept knowledge state transitions over time (Lan et al., 2014), and selecting materials for learners optimally based on model-free algorithms (Li et al., 2018; Tang et al., 2019). However, explicit models are typically needed to characterize learners’ learning progresses in these studies. While there exist studies that aim to find the optimal learning plan/policy (the term policy is adopted throughout the rest of this article) that chooses the most appropriate learning materials for learners using model-free algorithms, they all assume discrete latent traits. In addition, when the number of learners is too small for the system to learn an optimal policy from, these algorithms are inapplicable.
This article studies the important, yet less-addressed adaptive learning problem of finding the optimal learning policy based on continuous latent traits, and applies machine learning algorithms to deal with challenges, including limitation of the number of learners in historical data. Specifically, we formulate the adaptive learning problem as a Markov decision process (MDP), in which the state is the (continuous) latent traits of a learner, and the action is the (discrete) learning material given to the learner. However, the state transition model is unknown in practice, thus making the MDP unsolvable using conventional model-based algorithms, such as the value iteration algorithm (Sutton & Barto, 2018). To solve the issue, we apply a model-free deep reinforcement learning (DRL) algorithm, the so-called deep Q-learning algorithm, to search for the optimal learning policy. The model-free DRL algorithm is a class of machine learning algorithms that solve an MDP by learning an optimal policy represented by neural networks from a sequence of state transitions directly when the transition model itself is unknown (François-Lavet et al., 2018). DRL algorithms have been widely applied in solving a variety of problems in many different fields, such as playing Atari games (Mnih et al., 2015), bidding and pricing in electricity market (Xu et al., 2019), manipulating robotics (Gu et al., 2017), and localizing objects (Caicedo & Lazebnik, 2015). We refer interested readers to François-Lavet et al. (2018) for a detailed review on the theories and applications of DRL. Therefore, the adaptive learning system is embedded with the well-developed measurement models and the model-free DRL algorithm so as to be more flexible.
However, a deep Q-learning algorithm typically requires a large amount of state transition data so as to find an optimal policy, which may be difficult to obtain in practice. To cope with the challenge of insufficient state transition data, we develop a transition model estimator that emulates the learner’s learning process using neural networks. The transition model that is fitted using available historical transition data can be used in the deep Q-learning algorithm to further improve its performance with no additional cost.
The purpose of this article is to develop a fully adaptive learning system, in which (1) the learning material given to a learner is based on their continuous latent traits that indicate the levels of certain abilities and (2) the learning policy that maps the learner’s latent traits to the learning materials is found adaptively with minimal assumptions about the learners’ learning process. First, an MDP formulation for the adaptive learning problem by representing latent traits in a continuum is developed. Second, a model-free DRL algorithm—the deep Q-learning algorithm—is applied, to the best of our knowledge, for the first time in solving the adaptive learning problem. Third, a neural network-based transition model estimator is developed, which can greatly improve the performance of the deep Q-learning algorithm when the number of learners is inadequate. Last, some interesting simulation studies are conducted to serve as demonstration cases for the development of adaptive learning systems.
The remainder of this article is organized as follows. In the Preliminaries section, we briefly review measurement models and make some assumptions on the adaptive learning problem. In the Adaptive Learning Problem section, we introduce the conventional adaptive learning systems and develop an MDP formulation for the adaptive learning problem. Then, we apply the deep Q-learning algorithm to solve the MDP in the Optimal Learning Policy Discovery section, where a transition model estimator that emulates the learners is also developed. Two simulation studies are conducted in the Numerical Simulation section and some concluding remarks are made at the end of this article.
Preliminaries
In this section, we give a brief introduction to measurement models for continuous latent traits, which are an important component of adaptive learning systems. The representation of learners’ latent traits and the assumptions on them are also presented.
Measurement Models
In an adaptive learning system, a test is given to a learner/student after each learning cycle. The learner’s responses to the test items are collected by the system and their latent traits are estimated using measurement models, specifically IRT models (Lord et al., 1968; Rasch, 1960).
An appropriate IRT model needs to be chosen based on the test’s features, such as the test’s dimensional structure (Zhang, 2013) and its response categories. More specifically, when item responses are recorded as binary values indicating correct or incorrect answers, a test that evaluates only one latent trait uses unidimensional IRT models (Birnbaum, 1968; Lord, 1980; Rasch, 1960), whereas a test that involves more than one trait uses multidimensional IRT (MIRT) models (Mulaik, 1972; Reckase, 1972; Sympson, 1978; Whitely, 1980). When item responses have more than two categories, polytomous IRT models, such as the partial credit model (Masters, 1982), the generalized partial credit model (Muraki, 1992), and the graded response model (Samejima, 1969), are used for the unidimensional case. Their extensions can be applied in multidimensional cases.
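As an illustration of the kind of measurement model discussed here, the following minimal sketch evaluates a compensatory multidimensional two-parameter logistic (M2PL) item response function, a widely used MIRT model for binary responses. It is not necessarily the exact model used in this article; the function name `m2pl_probability`, the discrimination vector `a`, and the intercept `d` are illustrative assumptions.

```python
import math


def m2pl_probability(theta, a, d):
    """Probability of a correct response under a compensatory M2PL model.

    theta: latent trait vector of the learner (assumed continuous).
    a:     item discrimination vector, one entry per trait (hypothetical values).
    d:     item intercept (hypothetical value).
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    # Linear predictor: a weighted sum of the traits plus an intercept.
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    # Logistic link maps the predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# A two-trait item answered by a weaker and a stronger learner.
p_low = m2pl_probability([0.0, 0.0], [1.2, 0.8], -0.5)
p_high = m2pl_probability([1.0, 1.0], [1.2, 0.8], -0.5)
```

Because the model is compensatory, a high value on one trait can offset a low value on the other, which is one reason a continuous scale carries more information than a binary mastery indicator.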
The basic representation of an IRT model is expressed as
where
where
With an online calibration design, an accurately calibrated item bank can be acquired using previous learners’ response data for an adaptive learning system without large pretest subject pools (Makransky & Glas, 2014; Zhang & Chang, 2016). After item parameters are precalibrated, a variety of latent trait estimation methods can be applied to estimate learners’ abilities. Conventional methods such as maximum likelihood estimation (Lord et al., 1968), weighted likelihood estimation, and Bayesian methods (e.g., expected a posteriori estimation, maximum a posteriori) can accurately estimate latent traits in MIRT models. Their variations are also extended for estimating the latent traits in multiple dimensions. Many latent trait estimation methods result in a bias on the order of as small as
Assumptions
Denote
Adaptive Learning Problem
In this section, we first describe the adaptive learning problem and then formulate this problem as an MDP.
Problem Statement
A conventional adaptive learning system is illustrated in Figure 1. Such an adaptive learning system is typical in traditional classrooms and online courses such as massive open online courses (MOOCs; Lan & Baraniuk, 2016). In an adaptive learning system, the learner studies some learning materials to improve their latent traits. After the learner finishes the materials, a test or homework assignment is given to the learner. Then, the learner’s latent traits are estimated. Based on the estimated latent traits, the learning system adaptively determines the next learning material for the learner, which may take one of many forms, such as a textbook chapter, a lecture video, an interactive task, examples, or exercises. Note that the same learning materials may be reused to improve the students’ understanding of a subject. This matters because, as in classroom teaching, reviewing the same material is a common and effective way to help students understand the knowledge better. Meanwhile, examples and exercises can be abundant for students to use as learning materials. This cyclic learning process continues until the learner’s latent traits reach or are close to a prespecified level of proficiency, that is, when the values of the latent traits

Figure 1. Conventional adaptive learning system.
The tests in an adaptive learning system can be administered as computerized adaptive testing (CAT). CAT is a test mode that administers tests adapted to test takers’ trait levels (Chang, 2015). It provides more accurate trait estimates with a much smaller number of items (Weiss, 1982) by sequentially selecting and administering items tailored to each individual learner. Therefore, a relatively short test can assess learners’ latent traits with high accuracy.
Conventionally, the learning policy is provided by a teacher, as illustrated in Figure 1. As mentioned earlier, however, it is too expensive for teachers to make an individualized adaptive learning policy for each learner. In this article, we use a DRL algorithm to search for an optimal individualized adaptive learning policy for each learner. The algorithm selects the most appropriate learning material among all available materials for each learner based on their provisional estimated latent traits, which are obtained from their learning history and test performance. The adaptive selection of learning materials is intended to let the learner reach a prespecified proficiency level in the smallest number of learning cycles or reach as high a proficiency level as possible in a fixed number of learning cycles. That is, instead of resorting to an experienced teacher for the construction of a learning policy as illustrated in Figure 1, we develop a systematic method that enables the adaptive learning system to discover an optimal learning policy from the data that have been collected, which include historical learning materials, test responses, and estimated latent traits.
MDP Formulation
Primer on MDP
Before presenting the formulation for the adaptive learning problem, we first briefly review MDPs. An MDP is characterized by a 5-tuple
Let
The Markovian property of the transition model is that for any time step t
Essentially, the Markovian property requires that a future state is independent of all past states given the current state. Assume
Then, we can drop the superscript t and write the transition model as
Let
where
Furthermore, there is only one Q function that solves the Bellman optimality equation.
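To make the role of the Bellman optimality equation concrete, the sketch below applies it repeatedly (Q-value iteration) to a tiny discrete MDP whose transition model is fully known. In standard textbook notation (Sutton & Barto, 2018), the update is Q(s, a) <- sum over s' of P(s' | s, a) [r(s, a) + gamma max over a' of Q(s', a')]. The three-state toy problem, its probabilities, and the function name `q_value_iteration` are illustrative assumptions and are not taken from this article.

```python
def q_value_iteration(P, R, terminal, gamma=1.0, sweeps=500):
    """Repeatedly apply the Bellman optimality update to a known, discrete MDP.

    P[s][a] is a list of next-state probabilities, R[s][a] is the reward,
    and `terminal` is the set of absorbing states whose value is fixed at 0.
    """
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        # State values implied by acting greedily with respect to the current Q.
        V = [0.0 if s in terminal else max(Q[s]) for s in range(n_states)]
        Q = [
            [
                0.0 if s in terminal else
                sum(p * (R[s][a] + gamma * V[s2]) for s2, p in enumerate(P[s][a]))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return Q


# Hypothetical toy problem: proficiency levels 0, 1, 2 (2 = mastered, terminal).
# Action 1 works well for beginners; action 0 works better at level 1.
P = [
    [[0.5, 0.5, 0.0], [0.2, 0.8, 0.0]],
    [[0.0, 0.5, 0.5], [0.0, 0.8, 0.2]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
R = [[-1.0, -1.0], [-1.0, -1.0], [0.0, 0.0]]  # each learning step costs one unit
Q = q_value_iteration(P, R, terminal={2})
greedy_policy = [max(range(2), key=Q[s].__getitem__) for s in range(2)]
```

With a reward of -1 per step, the optimal state value equals the negative of the expected number of steps to mastery. This model-based procedure is only possible because `P` is known, which is exactly what is unavailable in the adaptive learning problem.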
The Bellman optimality equation is of central importance to solving the MDP. When both
Adaptive Learning Problem as MDP
We next formulate the adaptive learning problem as an MDP as follows.
State Space: Define the vector of parameters describing the learner’s latent traits as the state, that is,
Action Space: We can categorize learning materials into different sets, within which the materials are close in the sense that they cover the same topic/skill and have a similar level of difficulty. Let the learning material sets available in the adaptive learning system be indexed by
Reward Function: Recall that the objective of the adaptive learning system is to minimize the learning steps it takes before a learner’s latent traits reach the maximum, that is, for
where
Transition Model: The probability distributions of the latent trait and the change of trait are unknown. As a result, the transition model
Based on this MDP formulation, the adaptive learning problem is essentially to find an optimal learning policy, denoted by
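The MDP formulation above can be sketched as a small simulated environment with a continuous state, a discrete action, and a reward of -1 per learning step, so that the episode reward is the negative of the number of steps to mastery. Everything specific here is an assumption made for illustration: the class name `ToyLearnerEnv`, traits scaled to [0, 1], the mapping of materials 1, 2, 3 to traits, and the exponentially distributed gains. It is not the transition model used in this article's simulations.

```python
import random


class ToyLearnerEnv:
    """Hypothetical two-trait learner with a continuous state in [0, 1]^2."""

    # Assumed effect of each material: 1 -> trait 0, 2 -> trait 1, 3 -> both (half strength).
    GAINS = {1: (1.0, 0.0), 2: (0.0, 1.0), 3: (0.5, 0.5)}

    def __init__(self, mean_gain=0.15, seed=None):
        self.rng = random.Random(seed)
        self.mean_gain = mean_gain
        self.state = (0.0, 0.0)

    def reset(self, state=(0.0, 0.0)):
        """Start a new episode (one learner) from the given latent traits."""
        self.state = tuple(state)
        return self.state

    def step(self, action):
        """Apply one learning material; return (next_state, reward, done)."""
        weights = self.GAINS[action]
        # Exponential increments are concentrated near zero, so most steps
        # produce a small improvement; traits never decrease and are capped at 1.
        nxt = tuple(
            min(1.0, x + w * self.rng.expovariate(1.0 / self.mean_gain))
            for x, w in zip(self.state, weights)
        )
        self.state = nxt
        done = all(x >= 1.0 for x in nxt)
        return nxt, -1.0, done
```

A learning policy is then any function that maps the current state to one of the actions 1, 2, or 3, and the learning problem is to find the mapping that maximizes the expected episode reward.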
Optimal Learning Policy Discovery
In this section, we solve the adaptive learning problem by using the deep Q-learning algorithm, which can learn the action-value function directly from historical transition data without knowing the underlying transition model. To utilize the available transition information more efficiently, we further develop a transition model estimator and use it to train the deep Q-learning algorithm.
Action-Value Function as Deep Q-Network
Recall that the optimal learning policy can be readily obtained if we know the action-value function. When the state is continuous and the action is discrete, which is the case in the adaptive learning problem, the action-value function
Recall that in the adaptive learning problem, the state is continuous in
Once we have

Figure 2. Adaptive learning system with DQN. (DQN is used as action-value function.) DQN = deep Q-network.
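A rough sketch of how a network-based action-value function turns a continuous state into a learning-material choice is given below. It uses a small fully connected network with randomly initialized (untrained) weights written in plain Python; the layer sizes, the ReLU activation, and the helper names `init_mlp`, `q_values`, and `greedy_action` are assumptions for illustration rather than the architecture used in this article.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Create randomly initialized weights for a fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers


def q_values(layers, state):
    """Forward pass: continuous state in, one Q-value per discrete action out."""
    h = list(state)
    for i, (W, b) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers only
    return h


def greedy_action(layers, state):
    """Index of the action with the largest estimated Q-value."""
    q = q_values(layers, state)
    return max(range(len(q)), key=q.__getitem__)


# Two latent traits in, three learning-material sets out (indices 0..2).
net = init_mlp([2, 16, 3])
q = q_values(net, (0.2, 0.7))
action = greedy_action(net, (0.2, 0.7))
```

Because the output layer has one unit per action, a single forward pass scores every available learning material at once, which is what makes this representation convenient when the state is continuous and the action set is small and discrete.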
Learning Policy Discovery With Deep Q-Learning
The parameters of the DQN can be learned from the sequence of latent traits (states) and learning materials (actions) using the deep Q-learning algorithm proposed by Mnih et al. (2013). In order not to distract readers from grasping the main idea of the adaptive learning system we proposed, we defer technical details of this algorithm to the subsection “Deep Q Learning Algorithm” in the Online Appendix. We highlight here the key idea, which is formulating the problem of finding optimal parameters of the DQN as an optimization problem that aims to minimize the approximation error of the action-value function based on Theorem 1, given a set of state transitions extracted from learners’ learning processes. Recall that one episode represents a complete learning process of one learner. One important feature of this algorithm is that it improves the parameters of the DQN iteratively in each episode by interacting with the learner. Another important feature of this algorithm is that it needs to sufficiently explore the state-action space in order to obtain a good approximate action-value function, which is achieved by the so-called
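The two ingredients highlighted above, exploration of the state-action space and minimization of the approximation error of the action-value function, can be sketched as follows. The sketch assumes epsilon-greedy exploration and a one-step temporal-difference target, which are common choices in deep Q-learning (Mnih et al., 2013); the function names and the default discount factor are illustrative assumptions, and the full algorithm (replay memory, gradient updates) is not reproduced.

```python
import random


def epsilon_greedy(q, epsilon, rng=random):
    """With probability epsilon explore uniformly; otherwise exploit the best Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


def td_target(reward, next_q, done, gamma=1.0):
    """One-step target from the Bellman optimality equation."""
    if done:
        return reward  # no future value after the learner has reached the goal
    return reward + gamma * max(next_q)


def squared_td_error(q_sa, reward, next_q, done, gamma=1.0):
    """Per-transition loss whose average the network parameters are tuned to reduce."""
    return (td_target(reward, next_q, done, gamma) - q_sa) ** 2


# One observed transition: Q(s, a) = -3.5, reward -1, next-state Q-values shown.
loss = squared_td_error(-3.5, -1.0, [-3.0, -2.0, -2.5], done=False)
```

In practice epsilon is usually decreased over episodes, so that early learners are used mostly for exploration and later learners mostly receive the material currently believed to be best.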
Transition Model Estimator
The deep Q-learning algorithm requires a sufficiently large amount of historical transition data in order to find a good approximation of the action-value function, from which the learning policy is then derived. However, we may not be able to obtain enough transitions for several reasons, including the lack of adequate learners (sample size) and the long time it takes to acquire an individual learner’s learning path (transitions). Thus, it is desirable to develop an adaptive learning system that can efficiently discover the optimal learning policy after training on a relatively small number of learners. To this end, we develop a transition model estimator, which emulates the learning behavior of learners. Specifically, the transition model estimator can take a state
Conceptually, we can write the neural network that represents the transition model as
where
The adaptive learning system with the DQN and a transition model estimator is shown in Figure 3, where the DQN is trained against the transition model, instead of the actual learners.

Figure 3. Adaptive learning system with deep Q-network and transition model estimator.
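The idea of fitting a model to a limited set of observed transitions and then using it as a simulated learner can be sketched as below. To keep the example short and dependency-free, the neural network is replaced by a much simpler stand-in that predicts the average trait change observed for each learning material; this is a deliberate simplification and not the estimator proposed in this article. The class name `MeanIncrementModel` and the cap of 1.0 on the traits are assumptions.

```python
class MeanIncrementModel:
    """Simplified stand-in for a learned transition model.

    It ignores how the gain depends on the current state and returns a
    deterministic prediction, both of which a neural network could capture.
    """

    def __init__(self):
        self._sums = {}
        self._counts = {}

    def fit(self, transitions):
        """transitions: iterable of (state, action, next_state) tuples."""
        for state, action, next_state in transitions:
            delta = [n - s for s, n in zip(state, next_state)]
            if action not in self._sums:
                self._sums[action] = [0.0] * len(delta)
                self._counts[action] = 0
            self._sums[action] = [a + d for a, d in zip(self._sums[action], delta)]
            self._counts[action] += 1
        return self

    def predict(self, state, action, upper=1.0):
        """Predicted next state: current state plus the mean observed change."""
        if action not in self._counts:
            raise KeyError(f"no transitions observed for action {action!r}")
        n = self._counts[action]
        mean = [total / n for total in self._sums[action]]
        return tuple(min(upper, s + m) for s, m in zip(state, mean))


# A few hypothetical transitions from actual learners.
history = [
    ((0.0, 0.0), 1, (0.2, 0.0)),
    ((0.2, 0.0), 1, (0.3, 0.0)),
    ((0.3, 0.0), 2, (0.3, 0.4)),
]
model = MeanIncrementModel().fit(history)
predicted = model.predict((0.5, 0.5), 1)
```

Once fitted, calls to `model.predict` can stand in for interactions with actual learners, so the DQN can be trained for as many simulated episodes as needed without collecting further data.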
Discussion on Potential Real-World Applications
Due to the difficulty in developing a production-grade system to be used in the real world and acquiring real-world data, the scope of this work is limited to the conceptual and methodological development of an adaptive learning system with continuous latent traits and the illustration of its advantages via some numerical examples. However, we would like to discuss its potential real-world applications, which is ultimately what we aim for, before we proceed to the simulation results. As mentioned earlier, an adaptive learning system can be used to substitute part of the functionalities of teachers. Specifically, it can continuously collect feedback from students and select the most suitable materials for each student based on the student’s proficiency level, while teachers can focus on designing high-quality learning materials and tests that are fed into the adaptive learning system.
We envision that the adaptive learning system could serve as the core of a software product used by an online learning platform such as a MOOC provider. For a specific skill/topic to be learned, the latent traits are defined first. Then, their estimator is selected from existing ones that have proven effective in practice. Meanwhile, sets of learning materials need to be developed by teachers and provided to the adaptive learning system. In the initial stages of a MOOC, the assignment of learning materials is determined by a teacher as a routine that is not adapted to each student. As shown in Figure 3, the adaptive learning system could train a transition model from the limited data samples accumulated during the initial stage. Based on the estimated transition model, a deep Q-network can then be trained to determine the learning policy, which can now substitute for the teacher in assigning learning materials to each student adaptively according to the student’s proficiency level on a skill/topic. The transition model and deep Q-network will be iteratively improved as more data on students’ learning processes accumulate. Eventually, the transition model becomes unnecessary once sufficient data on students’ learning processes have been observed, and the deep Q-network can be learned from all real state transition processes, as shown in Figure 2. Note that models are trained for each topic/skill. The modeling of latent traits as continuous variables and the low requirements on historical data make this adaptive learning system highly flexible. Such an adaptive learning system is very helpful to MOOCs, as it can provide more personalized and efficient learning experiences to participating students, ideally emulating a learning environment provided by an experienced teacher.
Numerical Simulation
In this section, we show the performance of the adaptive learning system with and without the transition model estimator and also investigate the impacts of latent trait estimation errors through two simulation studies.
Simulation Overview
Consider a group of learners in a two-dimensional assessment and a learning environment with three sets of learning materials. We model the group of learners as a homogeneous MDP. Let the random vector
where
In addition, under Assumption
and
An intuitive example is how a learner learns addition “+” and subtraction “–.” A learning process usually takes a long time, and thus, a monotonic decreasing, zero-concentrated distribution is adopted to simulate the proficiency increase. In that case, each learning step will most likely lead to a small increase of the proficiency. Besides, in the distribution
Estimation errors ranging from 1% to 15% are also added to estimated latent traits to evaluate their impacts on the adaptive learning system. Denote the estimation error vector by
Two simulation cases are studied. In the first case, the DQN is trained against actual learners whose changes in ability follow the MDP with the transition kernels described above. In this case, it is presumed that the optimal learning policy can be trained on a sufficient number of learners. The resulting optimal learning policy is compared with a heuristic learning policy, which selects the next learning material that can improve an ability that is not yet fully mastered, and a random learning policy, which selects a material at random from the set of three. The impact of different estimation errors is also assessed. In the second case, the DQN is trained against an estimated transition model that is obtained using a small group of learners. The resulting optimal learning policy is compared with that obtained by training against actual learners. The parameters for the deep Q-learning algorithm used in the simulation are detailed in the subsection “Simulation Parameters” in the Online Appendix.
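The two baseline policies can be sketched as follows. The mapping of materials to traits follows the addition/subtraction illustration used in this article (material 1 targets the first trait, material 2 the second, material 3 both); the tie-breaking rule (a uniform random choice among eligible materials), the mastery threshold of 1.0, and the function names are assumptions, since the text does not spell them out.

```python
import random

# Traits (by index) that each learning material is assumed to improve.
MATERIAL_TARGETS = {1: (0,), 2: (1,), 3: (0, 1)}


def heuristic_policy(state, target=1.0, rng=random):
    """Pick a material that improves at least one trait not yet fully mastered."""
    unmastered = {i for i, x in enumerate(state) if x < target}
    if not unmastered:
        return None  # nothing left to learn; the episode is over
    candidates = [m for m, traits in MATERIAL_TARGETS.items() if unmastered & set(traits)]
    return rng.choice(candidates)


def random_policy(rng=random):
    """Pick any of the three materials uniformly at random, ignoring the state."""
    return rng.choice(sorted(MATERIAL_TARGETS))


# Once the first trait is mastered, the heuristic never selects material 1 again.
choice = heuristic_policy((1.0, 0.3), rng=random.Random(0))
```

Unlike a learned policy, the heuristic uses only which traits are unmastered, not how far each trait is from its target, so it cannot exploit differences in how effective a material is at different proficiency levels.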
Simulation Study I
Assume all learners are beginners on the two latent traits when using the adaptive learning system, that is,
Figure 4 presents the smoothed episode reward—the sum of reward at all steps in one episode or the negative of the TTM—under the deep Q-learning algorithm across the first 1,500 episodes with a smoothing window of 20. It can be seen that the reward converges to −15 after 600 episodes, which indicates the optimal learning policy is found after the DQN is trained using 600 learners.

Figure 4. Smoothed rewards under the deep Q-learning algorithm.
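The smoothing applied to the episode rewards can be reproduced with a simple moving average. The sketch below uses a trailing window (the mean of the current and preceding episodes, with a shorter window at the start); whether the figure uses a trailing or a centered window is not stated, so this choice and the function name `smooth` are assumptions.

```python
def smooth(values, window=20):
    """Trailing moving average; early entries average over the values seen so far."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


# Episode reward = -(number of learning steps the learner needed).
episode_rewards = [-40, -32, -25, -18, -16, -15, -15, -15]
smoothed = smooth(episode_rewards, window=4)
```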
Figure 5 and Table 1 compare the smoothed rewards (smoothing window of 20) across 200 new learners, labeled as episodes in Figure 5, under three policies: the learning policy found by the deep Q-learning algorithm after training for 2,000 episodes (referred to as the DQN learning policy), the heuristic learning policy, and the random learning policy. The larger the reward, the fewer learning steps a learner takes to fully master the two latent traits; in other words, the better the learning policy. Clearly, the rewards obtained by the deep Q-learning algorithm have a higher mean and a smaller SD than those obtained by the heuristic learning policy and the random learning policy. These results show that the learning policy found by the deep Q-learning algorithm is much better than the other two.

Figure 5. Smoothed rewards under deep Q-network, heuristic, and random learning policies.
Table 1. Comparison of Deep Q-Network (DQN), Heuristic, and Random Learning Policies
Figure 6 presents an example of two state transition paths that show how the latent traits change with a sequence of actions taken under the DQN learning policy obtained without considering estimation error. In Path 1, the initial state is (0, 0), and the optimal action sequence is 1,1,1,1,3,3,3,3,3,2,2,2,2,2,2. Meanwhile, in Path 2, the initial state is (0.5, 0.5) and the optimal action sequence is 2,2,2,2,1,1,3,3. Take the addition and subtraction test as an example. The first learning material is repeatedly selected to improve the learner’s proficiency of addition at the beginning. Then, the third material related to both addition and subtraction is selected. In the last few steps, the second learning material is chosen to further improve the learner’s proficiency of subtraction.

Figure 6. State transition path examples with different initial states.
Figure 7 compares rewards under the DQN and the heuristic learning policies when estimation errors with various SDs (

Figure 7. Comparison of rewards under deep Q-network and heuristic learning policies with various estimation errors.
Simulation Study II
Next, we show the performance of the adaptive learning system with a transition model estimator, which is represented using a neural network with one hidden layer that has 32 units. The prediction accuracy indices are presented in Table 2. The train and test scores are defined as the coefficient of determination in the training and test sets respectively, calculated by
where
Table 2. Accuracy of Transition Model Trained Against Various Numbers of Learners
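The train and test scores mentioned above are coefficients of determination. A minimal sketch of the standard definition for a scalar target, one minus the ratio of the residual sum of squares to the total sum of squares, is given below. How the score is aggregated across the dimensions of the latent trait vector is not specified in the text, so this scalar version and the function name are assumptions.

```python
def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot for a scalar target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Observed next-step trait values versus hypothetical model predictions.
observed = [0.10, 0.25, 0.40, 0.80]
predicted_values = [0.12, 0.22, 0.45, 0.78]
score = coefficient_of_determination(observed, predicted_values)
```

A score of 1 means the fitted transition model reproduces the observed next states exactly, while a score of 0 means it does no better than always predicting the mean; comparing the training and test scores indicates whether the model overfits the small group of learners it was fitted on.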
A DQN is trained on 2,000 episodes against the estimated transition model that is fitted using a certain number of actual learners; the learning policy corresponding to this DQN is referred to as the virtual DQN learning policy. For the purpose of comparison, another DQN is trained on the same number of actual learners; the learning policy corresponding to this DQN is referred to as the actual DQN learning policy. Essentially, these two learning policies differ in how the same set of actual learners is utilized. The actual learners are simulated according to the method discussed in the “Simulation Overview” section and are used to train the actual DQN learning policy directly. In contrast, for the virtual DQN learning policy, these actual learners are first used to fit a transition model, which is then used to train the policy; this allows the virtual DQN learning policy to be trained over as many episodes as it needs. Figure 8 compares rewards obtained by the two DQN learning policies when various numbers of actual learners are utilized. It shows that with no more than 200 actual learners, the use of the transition model can significantly improve the performance of the learning policy, generating much larger mean rewards than the algorithm without the transition model. When the number of learners is large enough, both approaches find optimal learning policies and yield similar rewards.

Figure 8. Comparison of rewards under actual and virtual deep Q-network learning policies.
Concluding Remarks and Future Directions
In this article, we developed an MDP formulation for an adaptive learning system by describing learners’ latent traits as continuous instead of simply classifying learners as having mastered or not mastered certain skills. The objective of the system is to improve learners’ abilities to the prespecified target levels. We applied the deep Q-learning algorithm, a model-free DRL algorithm that can effectively find the optimal learning policy from data on learners’ learning process without knowing the transition model of the learner’s latent traits. To cope with the challenge of insufficient state transition data, which may result in poor performance of the deep Q-learning algorithm, we developed a transition model estimator that emulates the learner’s learning process using neural networks, which can be used to further train the DQN and improve its performance.
The two simulation studies presented in the article verified that the proposed methodology is very efficient in finding a good learning policy for adaptive learning systems without any help from a teacher. The optimal learning policy found by the DQN algorithm outperformed the heuristic and random methods with much higher rewards or, equivalently, far fewer learning steps/cycles for learners to reach the target levels of all prespecified abilities. In particular, with the aid of a transition model estimator, the adaptive learning system can find a good learning policy efficiently after training on only a small number of learners.
Several directions could extend this research. First, the adaptive learning system could be applied to actual learners to further assess the efficiency of the proposed methodology; both the DQN algorithm and the transition model estimator can be adopted and evaluated through real data analysis on an online learning platform. Second, it would be interesting to design an adaptive learning system that excludes the latent trait estimator. The adaptive learning system in this article includes a latent trait estimator, which uses measurement models to estimate latent traits. However, the latent trait estimator inevitably introduces errors due to the nature of estimation methods. Some studies instead constructed the system assuming that learning materials influence learners’ responses to test items directly, without incorporating a latent trait estimator (Lan & Baraniuk, 2016; Lan et al., 2014). Following this direction, a learning process could be modeled and traced directly, and model-free algorithms could be proposed to find the optimal learning policy. Third, because each group of learners is assumed to follow a homogeneous MDP, future work could classify learners into groups before they use the adaptive learning system in order to find the optimal learning policy for each group. Finally, future studies could include the modeled learning paths (Chen, Culpepper, et al., 2018; Wang et al., 2018) as learners’ historical data to search for the optimal learning policy more efficiently.
Supplemental Material
Supplemental Material, sj-docx-1-jeb-10.3102_10769986221129847 for Deep Reinforcement Learning for Adaptive Learning Systems by Xiao Li, Hanchen Xu, Jinming Zhang and Hua-hua Chang in Journal of Educational and Behavioral Statistics
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.