Abstract
Regular aerobic physical activity (PA) increases exercise capacity and physical fitness (PF), which can lead to many health benefits. Accurate quantification of PA and PF is therefore essential for evaluating health outcomes and the effectiveness of intervention programmes. In this manuscript we review the assessment of physical activity and fitness. Three types of PA assessment methods can be distinguished: criterion methods, objective methods and subjective methods. Criterion methods like doubly labelled water, indirect calorimetry and direct observation are the most reliable and valid measurements against which all other PA assessment methods should be validated, but they also have important drawbacks. Objective PA assessment methods include activity monitors (pedometers and accelerometers) and heart rate monitoring. Finally, questionnaires and activity diaries are considered subjective methods. For the assessment of PF, we distinguish field tests and laboratory tests. The Eurofit for Adults is a test battery that is designed to assess health-related fitness of individuals, communities, sub-populations and populations. It is mainly used for evaluating the morphological component, the muscular component, the motor component and the cardio-respiratory component. In the laboratory, exercise capacity is preferentially assessed through maximal incremental exercise testing. Cardio-pulmonary exercise testing is a well-established procedure that provides a wealth of clinically diagnostic and prognostic information. The peak oxygen uptake is the gold standard in the assessment of exercise tolerance. When maximal exercise is contraindicated or not achievable, the ventilatory anaerobic threshold (VAT) or the submaximal slopes provide reasonable alternatives.
Introduction
Physical inactivity is a serious and growing health problem. Epidemiological studies have shown that a sedentary lifestyle contributes to the early onset and progression of atherothrombotic cardiovascular disease and is associated with a doubling of the risk of premature death [1–4]. In a descriptive review [5] and in a meta-analytic paper [6], the associations between professional physical activity (17 studies, 623 653 subjects) or leisure time physical activity (21 studies, 181 495 subjects) and cardiovascular mortality have been clearly demonstrated. Regular aerobic physical activity increases exercise capacity and physical fitness, which can lead to many health benefits [7, 8]. Physical fitness in turn has been related to total and cardiovascular mortality [9, 10], and even small improvements in fitness may lower mortality [11]. In chronic disease, physical activity has a favourable effect on patients with diabetes mellitus type II [12–14] and cancer [15, 16] and a protective effect on the development of hypertension [17–19] and obesity [20, 21]. The secondary preventive effect of a higher level of physical fitness in patients with ischaemic heart disease is well described [8, 22, 23].
Although both physical activity and physical fitness are related to mortality, the relationships between physical activity, fitness and health are complex. Bouchard and Shephard [24] proposed a conceptual approach to these relationships. Their health-related fitness concept indicates that physical activity interacts with health-related fitness and health (see Fig. 1). Health-related fitness refers to the state of physical and physiological characteristics that define the risk levels for the premature development of diseases or morbid conditions presenting a relationship with a sedentary mode of life. The health-related fitness of a person can be expressed in five major components: (1) a morphological component (body mass for height, body composition, subcutaneous fat distribution, abdominal visceral fat, bone density and flexibility); (2) a muscular component (power or explosive strength, isometric strength, muscular endurance); (3) a motor component (agility, balance, co-ordination, speed of movement); (4) a cardiorespiratory component (endurance or submaximal exercise capacity, maximal aerobic power, heart function, lung function, blood pressure); and (5) a metabolic component (glucose tolerance, insulin sensitivity, lipid and lipoprotein metabolism, substrate oxidation characteristics). The concept postulates that exercise has a direct influence on fitness, such as endurance, strength, flexibility and co-ordination, and on numerous health parameters (body composition, blood pressure, glucose tolerance, and lipid and lipoprotein levels). Bouchard and Shephard provide good arguments for the inclusion of all these components in the fitness definition. In clinical research, several of these components are considered precursors of disease and are usually identified as risk factors.

Fig. 1. Model of relations between physical activity, fitness and health. Adapted from [24].
There are numerous tests for measuring physical activity and fitness, ranging from questionnaires through simple field tests to more sophisticated laboratory tests. In this manuscript we review the assessment of physical activity and fitness.
How to assess physical activity?
Physical activity (PA) is not synonymous with physical fitness (PF) and can be defined as: ‘any bodily movement produced by skeletal muscles that results in caloric expenditure’ [25]. This broad concept implies that the larger the muscle mass involved, the larger the energy expenditure (EE). Physical activity-associated energy expenditure (AEE) is part of the total energy expenditure (TEE). Usually, TEE is divided into three components: resting metabolic rate (RMR) as the main component, diet-induced energy expenditure (DEE), and energy expenditure due to physical activity or muscular activity (AEE). The RMR represents the amount of energy (60–70% of TEE) required at rest to maintain body temperature and involuntary muscular contraction for functions including circulation and respiration. The RMR consists of the sleeping metabolic rate and the energy cost of arousal, and is about 5% higher than the sleeping metabolic rate. The RMR is affected by factors such as age, gender and body composition. Diet-induced energy expenditure (about 10% of TEE) is required to digest and assimilate food. However, AEE is the most important source of variation between individuals, and accounts for 20–30% of TEE. Physical activity-associated energy expenditure is influenced by body weight and by the movement efficiency of the subject. A wide range of activities contributes to AEE, including PA during occupation, leisure time, sports, home and household activities, personal care and transportation. In 1992, the American Heart Association released a report that identified physical inactivity as the fourth major modifiable CHD risk factor [26]. In this report, the health value of moderate amounts and intensities of exercise was recognized.
Therefore, accurate quantification of PA becomes essential in determining what dimensions of PA are of importance for a specific health outcome, in monitoring temporal events of PA, in evaluating the effectiveness of intervention programmes and studying dose–response relationships. However, PA is a complex concept, which can be determined by different indicators separately (e.g., frequency, which refers to ‘the number of events of PA during a specific time period’; duration, which refers to ‘time of participation of a single bout of PA’; and intensity, which refers to ‘the physiological effort associated with participating in a special type of PA’) [25]. Therefore, assessment of PA is based on quantification of these underlying indicators.
The PA assessment methods most frequently used in research are discussed in this paper. An overview of the strengths and limitations of the techniques should help in making a well-considered choice of an appropriate assessment method for a specific research question. Three types of PA assessment methods can be distinguished: criterion methods, objective methods and subjective methods. As PA is defined as bodily movement resulting in energy expenditure, it should be clear that ‘direct calorimetry’ (measuring EE by measuring heat production or heat loss) is the gold standard for PA assessment against which other methods should be validated. In most cases, however, this assessment method is not feasible for practical reasons. Indirect calorimetry, which estimates heat production or EE by measuring oxygen consumption and/or carbon dioxide production, should therefore be used as the criterion measurement for validation. Objective PA assessment methods include activity monitors (pedometers and accelerometers) and heart rate monitoring. Finally, questionnaires and activity diaries are considered subjective methods.
Gold standard
One of the earliest methods to assess PA is a direct behavioural observation of the (motor) activities by experienced observers. Different techniques exist for different PA settings (physical education or sport classes, free-living conditions), but the essence is to classify PA behaviours into distinct categories that can be quantified and analyzed by using codes [27]. The major strength of this method is the access to contextual information. Physical activity is reciprocally influenced by the environment and this information is of utmost importance for cognitive-behaviour research to change any – sedentary – behaviour. This method is often used to study PA patterns of children since other techniques (e.g., pedometers, questionnaires; see later) are not appropriate for this group. Unfortunately, direct observation is a very time-consuming and tedious job [28] and is therefore not convenient for large-scale studies.
The doubly labelled water method (DLW) is a variant of indirect calorimetry and is applicable to both laboratory and field studies. The strength of this method is that metabolic processes are measured, which are directly related to PA. The principle of DLW is to ingest a standardized amount of two stable isotopes (2H and 18O) as water (2H2 18O). The isotopes distribute themselves in equilibrium with body water (determined from a urine sample). Deuterium (2H) is eliminated from the body as water (2H2O); 18O is eliminated as water (H2 18O) and carbon dioxide (C18O2). The difference in elimination rates of the isotopes (over 5–14 days) provides a measure of the CO2 production and therefore of EE [29]. This method is accurate within 3–10% of calorimeter values in adults [29, 30], the within-subject variation for DLW is 8% [31], it is applicable to children and it provides accurate measurements in free-living conditions as it is not likely to influence PA patterns. However, DLW also has some limitations. Production and analysis of these isotopes are expensive, so the method is not suitable for large-scale studies. In addition, it can only measure total energy expenditure; consequently it cannot make a distinction between PA energy expenditure (AEE; ±10–30%), resting metabolic rate (RMR; ±60–70%) and diet-induced energy expenditure (DEE; ±10%) [32]. In that respect, combination of DLW with indirect calorimetry would be beneficial. This latter method measures EE from O2 consumption and CO2 production in a ventilated hood (or in a respiration chamber, which is more expensive). Room air with known concentrations of O2 and CO2 is pulled into the plastic canopy, and respiratory gases are pulled from the canopy for O2 and CO2 analysis. Food is chemically processed by using O2 to deliver energy to the body in the form of both heat and free energy for locomotion. This O2 consumption is dependent on the composition of the food being metabolized (carbohydrates versus fat). So, by measuring the O2 consumption, the energy expenditure, and thus the RMR and DEE, can be estimated indirectly [33]. Thus, TEE (from DLW) = RMR (from indirect calorimetry) + DEE (from indirect calorimetry) + AEE, so AEE can be derived from this equation. Like DLW, indirect calorimetry has too many practical problems to apply to a large sample. Therefore, these methods will remain a good standard for validation of other objective and subjective PA assessment methods until less expensive, portable and lightweight metabolic systems become available.
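The balance TEE = RMR + DEE + AEE can be rearranged to give AEE as the remainder. The following is a minimal sketch of that arithmetic only; the function name and the example values (in kcal/day) are hypothetical and are not taken from any of the cited studies.

```python
def activity_energy_expenditure(tee, rmr, dee):
    """Return AEE from the balance TEE = RMR + DEE + AEE.

    tee: total energy expenditure (e.g. from doubly labelled water)
    rmr: resting metabolic rate (e.g. from indirect calorimetry)
    dee: diet-induced energy expenditure (e.g. from indirect calorimetry)
    All values must share one unit, here assumed to be kcal/day.
    """
    aee = tee - rmr - dee
    if aee < 0:
        # A negative remainder means the inputs are inconsistent.
        raise ValueError("RMR + DEE exceeds TEE; check the measurements")
    return aee


# Hypothetical subject: TEE 2500, RMR 1650, DEE 250 kcal/day.
aee = activity_energy_expenditure(2500, 1650, 250)
print(aee, aee / 2500)  # AEE and its share of TEE
```

Because AEE is obtained by subtraction, measurement errors in TEE, RMR and DEE all accumulate in the derived value.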
Objective techniques
Motion sensors can register body motion. When a person moves, the body is accelerated in relation to the muscular forces responsible for the acceleration and thus in relation to EE [34]. The acceleration can be measured in one (vertical), two (vertical and medio-lateral) or three (vertical, medio-lateral and anterior-posterior) dimensions. Pedometers are small devices with a spring mechanism that register movements in the vertical direction; they are usually worn on the waistband in the midline of the thigh. They are used to count steps over a period of time, often from waking up until the person goes to sleep. These steps can be converted to distance when an average stride length is entered. Consequently, only walking or running-related physical activities can be registered. Cycling, swimming, movements of the upper body, carrying a load or moving on soft or graded terrain are not correctly monitored with this technique. However, since walking or running is part of most PA patterns, pedometers remain very valuable for estimating the total amount of daily movement. Pedometers are therefore very useful instruments for health campaigns such as ‘10,000 steps a day’. Crouter et al. [35] studied the validity of ten pedometers and concluded that ‘pedometers are most accurate for assessing steps, less accurate for assessing distance and even less accurate for assessing kilocalories’. This technique is not appropriate for people with a large proportion of activities without ‘vertical’ movement. Another disadvantage of this method is its inability to provide information about the intensity of the movement.
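The conversion from step counts to distance is a simple multiplication, sketched below. The function name and the 0.75 m stride length are hypothetical; the stride length is taken here to mean the distance covered per counted step and has to be entered for the individual.

```python
def steps_to_distance_km(steps, stride_length_m):
    """Convert a pedometer step count to distance in kilometres.

    stride_length_m is the average distance covered per counted step,
    an individual value that the user has to enter (assumed input).
    """
    if steps < 0 or stride_length_m <= 0:
        raise ValueError("steps must be >= 0 and stride length > 0")
    return steps * stride_length_m / 1000.0


# Hypothetical example: a '10,000 steps a day' target at 0.75 m per step.
print(steps_to_distance_km(10_000, 0.75))  # 7.5
```

Since a single average stride length is applied to every step, any change in gait (running, slopes, fatigue) biases the distance, which is consistent with distance being less accurate than the step count itself.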
These problems can be partially resolved by sophisticated, but small, accelerometers that monitor movements in more than one plane. Unlike pedometers, they do not rely on a mechanical lever but use piezoelectric transducers and microprocessors to quantify the magnitude and direction of the acceleration, expressed as dimensionless ‘counts’. Tri-axial accelerometers are, in theory, able to monitor all movements and should be considered the best accelerometers now available, although some of the limitations of the pedometer for complex movements (upper body, graded terrain, cycling, and so forth) remain. It has been shown that a linear relationship exists between accelerometry counts and EE [36, 37]. Consequently, the energy cost of physical activities can be estimated using linear regression equations with height, body weight, age and gender as co-variables. There is, however, general consensus that accelerometer-based monitors provide a valid estimate of overall PA but a less accurate indication of EE [38, 39], especially for point estimates of specific activities. Nevertheless, accelerometry is a very popular technique in PA research [40–43].
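The linear relationship between counts and EE can be illustrated with an ordinary least-squares fit. This is a simplified sketch with a single predictor (counts per minute); the co-variables mentioned above (height, body weight, age, gender) are omitted, and the calibration data are invented for illustration, not taken from any published equation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_ee(counts, slope, intercept):
    """Estimate EE (kcal/min) from accelerometer counts per minute."""
    return slope * counts + intercept


# Hypothetical calibration session: counts/min against measured EE (kcal/min).
counts = [0, 1000, 2000, 3000]
ee = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(counts, ee)
print(predict_ee(1500, slope, intercept))
```

A fit of this kind describes the average relation across the calibration activities, which is one reason why point estimates for a specific activity can deviate from the true EE.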
The third objective PA assessment technique is heart rate (HR) monitoring. Heart rate is an indication of the intensity of the relative stress that is placed upon the cardio-respiratory system during movement and is therefore an indirect measure of PA. The method relies on the linear relationship between heart rate and oxygen consumption in the moderate to vigorous range of PA. At rest and during low-intensity activities, this relationship is not linear and is confounded by factors other than energy demands (e.g., caffeine, stress, smoking, body position) [44]. When this relationship is known for a subject, the HR recording can be used to estimate the oxygen consumption and thus EE in free-living conditions [45, 46]. The HR is usually recorded minute-by-minute and can be stored for several hours or days, thus providing information about the duration, frequency and intensity of the activity but also on TEE. The FLEX HR method is a thoroughly examined approach for estimating EE from heart rate monitoring [39, 46, 47]. The goal is to determine an HR point above which a moderate activity is performed. As the HR at rest is confounded by many factors as mentioned earlier, it is convenient to know from which point the HR increase is caused by PA and not by the environment. Because of the large variability in HR data due to the different confounding factors, estimates of EE may be unreliable at the individual level [48], but the method seems to have good epidemiological validity [44]. In addition, HR monitoring is an unobtrusive, relatively inexpensive instrument for use in epidemiological studies, despite the fact that a calibration of the HR-VO2 regression is required for each individual to set the individual FLEX HR.
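A threshold-based estimate of this kind can be sketched as follows. The sketch assumes, as is commonly described for FLEX HR approaches but not spelled out above, that minutes at or below the FLEX HR are assigned a resting EE and that minutes above it use the individually calibrated linear HR-EE relation. All parameter names and numbers are hypothetical.

```python
def estimate_ee_flex_hr(hr_per_minute, flex_hr, resting_ee, slope, intercept):
    """Sum minute-by-minute EE (kcal) from a heart rate recording.

    hr_per_minute: one heart rate value (beats/min) per recorded minute
    flex_hr:       individual threshold separating rest from activity
    resting_ee:    EE assigned to minutes at or below flex_hr (kcal/min)
    slope, intercept: individual calibration, EE = slope * HR + intercept
                      (kcal/min), assumed valid only above flex_hr
    """
    total = 0.0
    for hr in hr_per_minute:
        if hr <= flex_hr:
            total += resting_ee  # HR here is confounded; use resting EE
        else:
            total += slope * hr + intercept  # linear range of the relation
    return total


# Hypothetical four-minute recording and calibration values.
recording = [60, 70, 120, 140]
print(estimate_ee_flex_hr(recording, flex_hr=90, resting_ee=1.2,
                          slope=0.1, intercept=-6.0))
```

The result depends strongly on where the threshold is placed, which illustrates why an individual calibration is needed before the method can be applied.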
The next generation of free-living PA assessment is a combined heart rate and movement sensor. The combination of synchronized data of heart rate monitoring and movement registration may improve precision of AEE [49].
Subjective techniques
Traditionally, PA questionnaires or surveys have been used to measure PA as they are inexpensive tools and easily applicable to large populations. This technique, however, relies on the subject's own interpretation of the questions and perception of his or her PA behaviour. Caution should be taken when using a questionnaire in a young or elderly population, as their memory can be impaired [50, 51]. In general, a time frame of 1 day to 1 week [52, 53] is considered as the reference period, with occasional exceptions of a few months or even an entire lifetime [54, 55]. Under- and over-estimation of PA can be influenced by many different factors (e.g., social desirability, age, complexity of the questionnaire, seasonal variation, length of period surveyed) [50, 56–58].
Survey techniques can be classified into four categories: self-report questionnaires, interviewer-assisted questionnaires, proxy-report questionnaires and diaries [34]. All these questionnaires should be validated against a criterion method (DLW, indirect calorimetry or direct observation) or an objective technique (pedometers, accelerometers or HR). Philippaerts et al. studied the reliability and validity against DLW of three frequently used PA questionnaires [59, 60] and concluded that the Tecumseh Community Health Study Questionnaire [61], the Five City Project Questionnaire [62] and the Baecke Questionnaire [63] provided reliable and valid PA data. Racette and colleagues [64] compared the 7-day PA recall questionnaire with DLW in obese women, and two PA questionnaires (the Zutphen PA questionnaire and the PA scale for the elderly) were validated against DLW in an elderly population [65, 66]. Results from these validation studies show that questionnaires in general might be valid to classify a population into distinct categories of PA behaviour (e.g., low, moderate, highly active) but they are not appropriate to quantify energy expenditure at the individual level [67, 68]. The reader is directed to an excellent review by Shephard [68] of the limitations of PA assessment by questionnaires. In the last decade, the development of information technology, such as computer networks, multimedia software and the Internet, has created the opportunity to develop electronic surveys useful for PA research [69–71]. Computerized questionnaires have several major advantages over traditional techniques like interviews or paper surveys [72]. First, information technology enables the researcher to administer the questionnaires to a large number of people simultaneously. Second, having the subjects enter the answers directly on the computer eliminates the coding errors that are still possible in interviews and paper and pencil surveys. Third, the subjects cannot omit questions. Moreover, depending on the answers of the subjects, the computer program skips unnecessary questions, resulting in a shorter administration time. Finally, some studies indicated that subjects might be more honest in reporting undesirable behaviour to a computer than in the paper and pencil format or to the researcher [73, 74].
Table 1. Overview of strengths and weaknesses of physical activity assessment methods
∗ PA, physical activity; EE, energy expenditure.
In summary, correct assessment of PA behaviour and energy expenditure related to PA is essential to study the effects of PA on potential health benefits or the effects of an intervention on the PA level. Criterion methods like doubly labelled water, indirect calorimetry and direct observation are the most reliable and valid measurements against which all other PA assessment methods should be validated, but they also have important drawbacks. Financial costs, invasiveness and limitation to mainly laboratory situations are the most important disadvantages. Objective methods like pedometers, accelerometers and heart rate monitoring all have their specific strengths and weaknesses. Pedometers and accelerometers are not appropriate for monitoring complex movements, cycling or movements on a graded terrain. Heart rate monitoring relies on the linear relationship between the intensity of the PA and the response of the cardio-respiratory system, though it is not reliable for sedentary or very light-intensity activities. However, these techniques are very often used as they are relatively inexpensive, easy to wear and unobtrusive, provide valid data for most common physical activities and can monitor free-living physical activities. Accelerometers have the advantage of monitoring the intensity of the movement and offer the possibility of estimating the energy expenditure. Finally, questionnaires come in many forms and should be validated against a criterion method, as they are prone to subjective interpretation and to variation in memory and reporting. The questionnaire technique is the most commonly used PA assessment in epidemiology as it is a very cheap method and easily applicable to large samples. Moreover, it is a good tool for assessment of the PA level of a group but it should not be applied for individual analysis. Table 1 summarizes the most commonly used PA assessment methods with their strengths and limitations.
How to assess physical fitness?
Since Sargent proposed the vertical jump as a physical performance test for men in 1921 [75], considerable change has taken place in our thinking about physical performance and physical fitness and about their measurement. Physical fitness has been defined in many ways. The American Academy of Physical Education adopted the following definition: ‘Physical fitness is the ability to carry out daily tasks with vigour and alertness, without undue fatigue and with ample energy to engage in leisure time pursuits and to meet the above-average physical stresses encountered in emergency situations’ [76]. Often the distinction is made between an organic component and a motor component. The organic component is defined as the capacity to adapt to and recover from strenuous exercise; it relates to energy production and work output performance. The motor component relates to development and performance of gross motor abilities. Since the beginning of the 1980s the distinction between health-related and performance-related physical fitness has come into common use [77]. Health-related fitness is then viewed as a state characterized by an ability to perform daily activities with vigour, and by traits and capacities that are associated with low risk of premature development of the hypokinetic diseases (i.e., those associated with physical inactivity) [77]. Health-related physical fitness includes cardio-respiratory endurance, body composition, muscular strength and flexibility. Performance-related fitness refers to the abilities associated with adequate athletic performance, and encompasses components such as isometric strength, power, speed–agility, balance and arm–eye co-ordination. Most recently the health-related fitness concept was redefined by Bouchard and Shephard [24], taking into consideration developments in exercise and clinical sciences. In their view, health-related fitness comprises five major components, as described in the introduction.
Within the context of this overview it would be impossible to review all five components. Consequently, the review describes field tests for motor, muscular and cardio-respiratory fitness, whereas the discussion of laboratory tests is limited to the cardio-respiratory component. It is common practice to validate field tests against the more sophisticated laboratory tests, although the criterion-related validity is only one aspect of the validation process [78].
Field-tests
The above indicates that considerable change has taken place in our thinking about physical performance and physical fitness and about their measurement. In many studies considerable effort was made to obtain tests that are objective, standardized, reliable and valid. For more information about test construction the reader is referred to Safrit [78] and Anastasi [79]. Although limited, some attempts were made to construct criterion-referenced norms [80]. Within the context of the health-related fitness concept, expert panels created standards of required fitness levels, for example, an oxygen uptake (VO2) of 42 ml/kg per min for young men and 35 ml/kg per min for young women. Very little empirical evidence is available to create such criterion-related standards for the other health-related fitness items.
In the early days the expression ‘general motor ability’ was used to indicate one's ‘general’ skill. The term was similar to the general intelligence factor used at that time. Primarily under the influence of Brace [81] and McCloy [82], a fairly large number of studies were undertaken and a multiple motor ability concept replaced the general ability concept. There is now considerable agreement among authors and experts that the fitness concept is multidimensional and several abilities can be identified. Ability refers to a more general trait of the individual, which can be inferred from response consistencies on a number of related tasks, whereas skill refers to the level of proficiency on a specific task or limited group of tasks. For example, a person is said to possess isometric strength when he or she performs well on a variety of isometric strength tests. Considerable attention has been devoted to fitness testing and research in the USA and Canada. The President's Council on Youth Fitness, the American Alliance for Health, Physical Education, Recreation and Dance [83–85] and the Canadian sister organization [86] have done an outstanding job in constructing and promoting fitness testing in schools. The fundamental works of Fleishman [87], and of the International Committee for the Standardisation of Physical Fitness Tests, now the International Council for Physical Activity and Fitness Research [88], have received considerable attention. These works served, for example, as the basis for nation-wide studies in Belgium [89, 90]. Furthermore, the fitness test battery constructed by Simons et al. [91] served as the basis for studies in The Netherlands [92] and for the construction of the Eurofit test battery [93]. Later the Eurofit for Adults was proposed [94]. This test battery is designed to assess health-related fitness of individuals, communities, sub-populations and populations. The target group for this health-related fitness battery is adults aged 18–65 years.
The components, factors and tests that are included in the Eurofit for Adults are given in Table 2. The components are the same as those used by Bouchard and Shephard [24], with the exclusion of the metabolic component. The factors are the underlying general abilities. These factors have been identified in factor analytic studies including a large number of fitness tests [87, 91]. They measure independent abilities of the total fitness domain [91], and the factor structure is stable over the growth period, including young adults, and is similar across different studies [87, 91].
Table 2. Eurofit for adults (adapted after Oja and Tuxworth, 1995 [94])
Components and factors according to Bouchard and Shephard, 1994 [24]. Tests are the Eurofit tests for adults (Oja and Tuxworth, 1995 [94]). 2nd and 3rd priority refer to tests that, at that time, showed no clear evidence of associations with health, but for which better evidence has subsequently been provided.
Components
It is perhaps of interest to define these factors and identify the tests and measurements that are used to measure these factors or components. The descriptions of the factors and tests are adapted from ACSM [95], Bouchard and Shephard [24] and Simons et al. [91].
Body mass for height, body composition and abdominal visceral fat all refer to components of body composition especially in view of their associations with obesity, type II diabetes, hyperlipidaemia, hypertension and cardiovascular disease. In field studies the body mass index, skin-folds preferably taken at several sites on the limbs and trunk, and waist and hip circumferences are used to quantify these factors. It is evident that in the laboratory setting more reliable (less measurement error) and valid indicators can be used, but this applies for all field tests. As mentioned, it is not the purpose of this overview to discuss in further detail the very extensive literature on body composition and the variety of techniques used to assess body composition.
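Two of the field measures named above reduce to simple ratios. The sketch below computes the body mass index with its standard definition (body mass in kilograms divided by the square of height in metres) and, as an assumption about how waist and hip circumferences are combined, the waist-to-hip ratio. The function names and example values are hypothetical.

```python
def body_mass_index(mass_kg, height_m):
    """Body mass index in kg/m^2 (body mass divided by height squared)."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm, hip_cm):
    """Ratio of waist to hip circumference (both in the same unit)."""
    if waist_cm <= 0 or hip_cm <= 0:
        raise ValueError("circumferences must be positive")
    return waist_cm / hip_cm


# Hypothetical subject: 70 kg, 1.75 m, waist 80 cm, hip 100 cm.
print(round(body_mass_index(70, 1.75), 1))
print(waist_to_hip_ratio(80, 100))
```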
Flexibility is the ability to move a joint through its complete range of motion. It is of importance in a variety of athletic performances but also in the capacity to carry out the activities of daily living.
Muscle strength refers to the maximal force that can be generated by a specific muscle or muscle group. It can be measured with a variety of devices including tensiometers, handgrip dynamometers and strength gauges.
Muscular endurance is the ability of a muscle group to execute repeated contractions over time or to maintain a maximal voluntary contraction for a prolonged period of time. Sit-ups, curl-ups, push-ups, and bent arm hangs are tests used to quantify this factor.
Explosive strength or power is the ability to carry out a maximal, dynamic contraction of a muscle or muscle group. It is the maximum rate of working of a muscle or muscle group. It is usually measured in a single effort such as the vertical jump or the standing long jump.
Balance is the ability to maintain over a period of time the whole body equilibrium. It is measured by a variety of tests on a beam, balance on one foot on the floor or on a beam, or by whole body sways.
Speed is the ability to move the whole body or parts of the body as quickly as possible over a distance. Running tests and tests of arm or leg movement speed are used.
Cardio-respiratory fitness is related to the ability to perform large muscle, dynamic, moderate-to-high intensity exercise for a prolonged period [95]. The performance of such exercises depends on the functional state of the cardiovascular, respiratory, and skeletal muscle systems. The Eurofit test battery includes a sub-maximal bicycle exercise test, a 2 km walk test and the multistage shuttle run. The sub-maximal bicycle test is probably the most objective, reliable and valid indicator of aerobic power but it is demanding in resources, especially when large groups are tested. The 2 km walk test is most suitable for mixed groups of adults. Large groups of adults can be tested within a short period. Subjects are required to walk briskly on level ground for 2 km, and their heart rate is recorded. The test result is a predicted VO2 max or a derived fitness index [94]. The multistage shuttle run test is a maximal, graded running test whereby the subject runs on a 20 m track at an imposed speed, dictated by a sound signal. At the beginning the pace is set at 8 km/h and every minute the pace increases by 0.5 km/h. The stage at which the subject drops out is the test result. This result can also be used to predict maximal aerobic power [94]. These tests have been selected for the Eurofit test battery for adults since they showed the best psychometric properties (objectivity, standardization, reliability, validity and availability of reference data). A variety of other tests have been used and some of these are quite popular, such as the Cooper 12-min test. The objective of this test is to cover the greatest distance in the allotted time period. The distance covered can likewise be converted into a predicted maximal aerobic power [95]. For use in chronic disease, this test was adapted to the 12-min or 6-min walk test.
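The pacing of the multistage shuttle run follows directly from the description above: 8 km/h at the start, plus 0.5 km/h for every further minute. The sketch below computes the imposed speed per one-minute stage and the number of 20 m lengths that speed corresponds to in a minute. Numbering the stages from 1 and the function names are assumptions made for illustration; the prediction of maximal aerobic power from the final stage is not reproduced here.

```python
START_SPEED_KMH = 8.0   # pace at the beginning of the test
INCREMENT_KMH = 0.5     # increase per one-minute stage
TRACK_LENGTH_M = 20.0   # length of the running track


def stage_speed_kmh(stage):
    """Imposed running speed for a stage, counting the first minute as 1."""
    if stage < 1:
        raise ValueError("stage numbering starts at 1")
    return START_SPEED_KMH + INCREMENT_KMH * (stage - 1)


def lengths_per_minute(stage):
    """Number of 20 m lengths covered in one minute at the stage speed."""
    metres_per_minute = stage_speed_kmh(stage) * 1000.0 / 60.0
    return metres_per_minute / TRACK_LENGTH_M


for stage in (1, 5, 9):
    print(stage, stage_speed_kmh(stage), round(lengths_per_minute(stage), 2))
```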
With increasing awareness about safety and risks involved in testing, some testing procedures have been adapted, for example, sit-ups were originally tested with straight legs and hands crossed behind the neck whereas in more recent procedures the arms are crossed over the chest, the knees are bent and the subject curls to a position in which the elbows touch the knees or thighs. In the latter procedure there is less risk of causing low back pain [95].
Laboratory tests
In the laboratory, exercise capacity is preferentially assessed through maximal incremental exercise testing. These tests are not a reflection of daily activity levels [96], but could to some extent be seen as reflecting the maximal capacity of subjects to carry out tasks of daily life. Maximal incremental exercise testing gives clinicians insight into maximal exercise capability. The peak oxygen uptake, which is largely independent of the work rate increment [97], is the gold standard in the assessment of exercise tolerance [10, 98]. Peak work rate is higher when larger increments are used, and should therefore not be used as a marker of exercise capacity. During the incremental exercise test, the systems engaged in performing exercise (heart, circulation, ventilation, pulmonary and peripheral gas exchange) are put under increasing stress. The variables obtained during maximal incremental exercise testing (see Table 3) give insight into the functioning of these different systems and how they cope with increasing exercise stress. This makes the incremental exercise test interesting for diagnostic and prognostic purposes. However, some limitations are worth mentioning. One of the critical points is the standardization and quality control of the test. As many variables are measured simultaneously, several measurement errors may occur. The maximal character of the test may also be influenced by the motivation of patients.
The variables obtained during clinical maximal incremental exercise testing
Value of cardiopulmonary exercise testing
Cardiopulmonary exercise testing is a well-established procedure that provides a wealth of clinically diagnostic and prognostic information. It is non-invasive, relatively inexpensive and evaluates an individual's capacity for dynamic exercise [99]. Exercise capacity is a strong and independent predictor of cardiovascular disease and mortality. The prognostic value of peak oxygen uptake has been well documented in patients with ischaemic heart disease [10, 22, 100], chronic heart failure [101–103], systemic [104, 105] and pulmonary hypertension [106], and other chronic conditions [107]. A recent study identified VO2peak as the only variable, besides age and comorbidity, to be predictive of future dependence in the elderly [108]. Peak oxygen consumption is also the gold standard to assess physiological progression after exercise training in patients with heart disease [109] and in the healthy elderly. In patients undergoing exercise training, the peak oxygen uptake obtained after the programme is a better predictor of risk than the peak oxygen uptake before the programme [109, 110].
Directly measured VO2 has been shown to be a reproducible marker of exercise tolerance, and it also provides objective additional information regarding the patient's clinical status and the factors that limit exercise performance [98, 103, 111, 112].
Cardiovascular system
When a non-invasive incremental exercise test is performed, the cardiovascular system is evaluated through the evolution of heart rate and systolic blood pressure in relation to the increase in oxygen consumption. Chronotropic incompetence, defined as failure to achieve a heart rate above 80–85% of the age-predicted maximum, has recently been confirmed as a negative prognostic sign in a large cohort of patients not taking beta-blockers [113]. Although the age-predicted maximum can be questioned, the window of 85% is probably large enough to be sensitive to abnormality. Obviously, in patients whose exercise tolerance is limited by reaching ventilatory, muscular or pulmonary gas exchange limits, the prognostic value of this age-predicted threshold can be questioned.
From the electrocardiogram (ECG), abnormalities in terms of arrhythmias, conduction disturbances and ST-T changes reflecting cardiac ischaemia during exercise are of major diagnostic importance. In the context of this paper, however, these will not be discussed further. For the diagnostic accuracy of exercise testing, the reader is referred to a seminar paper by Ashley et al. [114].
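The chronotropic incompetence criterion above can be written as a short check. This is only a sketch: the text does not say which formula gives the age-predicted maximum, so the common "220 minus age" rule is used here as a clearly labelled assumption, and the function names are hypothetical.

```python
def age_predicted_max_hr(age_years: float) -> float:
    """Age-predicted maximal heart rate (beats/min).

    Assumption: the widely used 220 - age rule; the source text does not
    specify a formula and notes that the age-predicted maximum can be
    questioned.
    """
    return 220.0 - age_years


def is_chronotropic_incompetent(peak_hr_bpm: float, age_years: float,
                                threshold: float = 0.85) -> bool:
    """True if peak heart rate fails to exceed the chosen fraction
    (0.80-0.85 in the text) of the age-predicted maximum."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be a fraction between 0 and 1")
    return not peak_hr_bpm > threshold * age_predicted_max_hr(age_years)


if __name__ == "__main__":
    # Illustrative values only, not patient data.
    print(is_chronotropic_incompetent(peak_hr_bpm=120, age_years=60))
    print(is_chronotropic_incompetent(peak_hr_bpm=150, age_years=60))
```

As the text stresses, such a flag is only meaningful in patients not taking beta-blockers and not limited by ventilatory, muscular or gas exchange factors before the heart rate limit is reached.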
The oxygen uptake/heart rate ratio or oxygen pulse has traditionally been used as a non-invasive measure of stroke volume. In fact, it reflects (after rewriting the Fick equation) the product of stroke volume and the difference in arterial and venous oxygen content (VO2/HR = SV × C(a − v)O2). Diseases affecting the arterial oxygen content, such as anaemia, increased carboxyhaemoglobin levels and severe arterial hypoxaemia, will reduce the O2-pulse [115], but in the absence of these anomalies, the oxygen pulse may be seen as an approximation of the stroke volume. Another calculated parameter of interest during exercise is the rate pressure product (RPP = heart rate × systolic blood pressure). The RPP is very closely related to myocardial oxygen consumption [115].
Lastly, the search for methods to evaluate cardiac output and stroke volume during exercise testing has a long history (dye- or thermodilution techniques [116, 117], cardio-impedance [118] and CO2-rebreathing [119, 120]). Recently, it was also shown that automated measures of cardiac output by means of CO2-rebreathing are reproducible and feasible during graded maximal exercise testing [121].
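The two derived indices in this paragraph are simple arithmetic, which a minimal sketch may make concrete. The function names, units and numerical values below are illustrative assumptions, not data from the studies cited.

```python
def oxygen_pulse_ml_per_beat(vo2_ml_min: float, hr_bpm: float) -> float:
    """Oxygen pulse = VO2 / HR, which by the Fick equation equals
    stroke volume x arterio-venous oxygen content difference."""
    if hr_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return vo2_ml_min / hr_bpm


def stroke_volume_ml(o2_pulse_ml_per_beat: float,
                     av_o2_diff_ml_per_ml: float) -> float:
    """Rearranged Fick equation: SV = (VO2 / HR) / C(a - v)O2.

    Assumption: the a-v difference is supplied by the caller; it is not
    measured during a non-invasive test.
    """
    if av_o2_diff_ml_per_ml <= 0:
        raise ValueError("a-v oxygen difference must be positive")
    return o2_pulse_ml_per_beat / av_o2_diff_ml_per_ml


def rate_pressure_product(hr_bpm: float, sbp_mmhg: float) -> float:
    """RPP = heart rate x systolic blood pressure."""
    return hr_bpm * sbp_mmhg


if __name__ == "__main__":
    # Illustrative values only.
    o2_pulse = oxygen_pulse_ml_per_beat(vo2_ml_min=2400.0, hr_bpm=160.0)
    print(o2_pulse)                                   # mL O2 per beat
    print(stroke_volume_ml(o2_pulse, 0.15))           # mL per beat
    print(rate_pressure_product(160.0, 200.0))        # beats/min x mmHg
```

The second function shows why the oxygen pulse only approximates stroke volume: any condition that changes the arterio-venous oxygen difference changes the oxygen pulse even when stroke volume is constant.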
Ventilatory system
The load on the ventilatory system is evaluated through the assessment of the pulmonary ventilation (VE). Dyspnea could be the result of a failure to further increase ventilation when the maximal ventilation is reached. This is often the case in patients with obstructive lung disease. In clinical routine, the maximum ventilatory capacity at rest is often evaluated by performing maximum voluntary ventilation (MVV) for 12–15 s [115]. Although the MVV is currently our most practical estimation of the maximal ventilatory capacity of a patient, it is important to realise that the test is effort dependent, and the breathing pattern during the manoeuvre does not represent the breathing pattern during exercise [122]. Recent technology, however, allows investigation of the tidal flow-volume loops obtained during exercise [123]. This also allows investigation of the operational lung volumes and eventual dynamic hyperinflation during exercise (i.e., gradually reduced inspiratory capacity). Dynamic hyperinflation increases the work of breathing, and is one of the best predictors of exercise-induced dyspnea [124, 125]. Dynamic hyperinflation is also suggested to reduce cardiac output during exercise [126], which may contribute to the reduced exercise tolerance. It is also a common feature in patients with cardiac disease [127] and may help to explain symptoms of dyspnea in these patients. Overall pulmonary gas exchange is evaluated through the assessment of arterial oxygenation. In healthy subjects, partial arterial oxygen tension (PaO2) and oxygen saturation remain unchanged throughout the incremental exercise test. Desaturation is common in lung disease and in patients with intrapulmonary [128] or cardiac [129] right-to-left shunting during exercise.
The ventilatory anaerobic threshold
The ventilatory anaerobic threshold (VAT) represents the point at which ventilation abruptly increases, despite linear increases in VO2 and work rate [130]. In most cases, the VAT is highly reproducible, although it remains dependent on the choice of ergometer, exercise protocol, method of detection and evaluator. It represents only one point during the exercise test and it may not be achieved or readily identified in some patients, particularly those with very poor exercise capacity [111, 131–134].
Slopes
The efficiency of peripheral gas exchange (reflected in VO2 and VCO2) with respect to pulmonary gas exchange (reflected in essence by ventilation) and cardiac output (to some extent assessed through heart rate, or more accurately by specific methods such as right heart catheterization or CO2 re-breathing) is another, often overlooked feature of the incremental exercise test. A steep increase in pulmonary ventilation (VE) for a given increase in CO2 production (VCO2) is generally indicative of high dead space ventilation [135, 136] or poor lung diffusing capacity and hence poor ventilatory efficiency (i.e., large pulmonary ventilation for low alveolar ventilation or poor pulmonary gas exchange). This is typically observed in patients with pulmonary oedema, interstitial or obstructive lung disease, cyanotic congenital heart disease [135] or pulmonary hypertension [137]. In the latter disease, specific treatment with pulmonary artery vasodilators significantly improved the VE/VCO2 [138]. This index hence introduces a variable reflecting pulmonary efficiency in the cardiopulmonary exercise test. It is therefore not surprising that high VE/VCO2 values (indicating poor pulmonary gas exchange efficiency) are a bad prognostic sign [139]. In chronic heart failure the VE/VCO2-slope has even been shown to be a better predictor of cardiac related mortality or hospitalization than VO2peak [140].
By contrast, a steep increase in ventilation for a given rise in oxygen consumption generally reflects deconditioning and early onset of lactic acidosis with compensatory early onset of hyperventilation to buffer the acidosis. Since the relation between VO2 and VE is exponential, the plot is generally inverted (VE on the X-axis and VO2 on the Y-axis) and log-transformed to achieve linearity. The slope of this relation, the oxygen uptake efficiency slope (OUES), is an index of physical fitness that is independent of the motivation of the patient to perform maximal exercise, a steeper slope indicating more efficient oxygen uptake [141, 142]. The advantage of this index is that it can be calculated in virtually every subject and uses the large number of data points obtained throughout the exercise test.
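The VE/VCO2 slope discussed above is, in essence, the slope of a straight line fitted through ventilation plotted against CO2 production. The following is a minimal sketch using ordinary least squares; the function names, the choice to fit all supplied data points and the sample values are assumptions made for illustration, and laboratories differ in which portion of the test they include.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ve_vco2_slope(ve_l_min: Sequence[float],
                  vco2_l_min: Sequence[float]) -> float:
    """Slope of VE (y-axis) against VCO2 (x-axis), both in L/min."""
    slope, _intercept = linear_fit(vco2_l_min, ve_l_min)
    return slope


if __name__ == "__main__":
    # Synthetic, illustrative data: VE = 30 * VCO2 + 4.
    vco2 = [0.5, 1.0, 1.5, 2.0]
    ve = [19.0, 34.0, 49.0, 64.0]
    print(ve_vco2_slope(ve, vco2))
```

A higher slope means more ventilation is needed per litre of CO2 eliminated, which is the poor ventilatory efficiency described in the text.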
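The log-transformed relation described above can be sketched as a regression of VO2 on the logarithm of VE. The base-10 logarithm and the units (VO2 in mL/min, VE in L/min) are assumptions of this sketch, since the text only states that the plot is log-transformed; the function name and data are likewise illustrative.

```python
import math
from typing import Sequence


def oues(vo2_ml_min: Sequence[float], ve_l_min: Sequence[float]) -> float:
    """Oxygen uptake efficiency slope: the slope a in
    VO2 = a * log10(VE) + b, fitted by ordinary least squares.

    Assumption: base-10 logarithm; VE values must be positive.
    """
    n = len(ve_l_min)
    if n != len(vo2_ml_min) or n < 2:
        raise ValueError("need paired samples, at least 2 points")
    if any(v <= 0 for v in ve_l_min):
        raise ValueError("ventilation values must be positive")
    x = [math.log10(v) for v in ve_l_min]   # VE on the X-axis, log scale
    y = list(vo2_ml_min)                    # VO2 on the Y-axis
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("ventilation values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic, illustrative data: VO2 = 2000 * log10(VE) - 1000.
    ve = [10.0, 20.0, 40.0, 80.0]
    vo2 = [2000.0 * math.log10(v) - 1000.0 for v in ve]
    print(round(oues(vo2, ve), 1))
```

Because every sample of the test contributes to the fit, the slope can be estimated from submaximal data, which is why the index does not depend on the patient reaching a true maximum.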
Test protocol
Testing of patients can be performed either on a treadmill or on a bicycle ergometer. The bicycle ergometer offers the convenience of a stable sitting position and is more familiar in Europe whereas the treadmill is the more common testing mode in the USA. A bicycle ergometer is also less expensive, occupies less space and is less noisy than a treadmill. Upper body motion is usually reduced, making it easier to obtain blood pressure measurements and to record ECG. Another advantage is the exact knowledge of the external work performed, allowing one to evaluate the VO2–work rate relationship. A major limitation to cycle ergometer testing is the discomfort and fatigue of the quadriceps muscles. Leg fatigue in an inexperienced subject may cause him or her to stop before reaching a true peak VO2. Studies demonstrated that the peak VO2, the ventilatory threshold, and minute ventilation are generally 10–20% higher with treadmill testing [98, 111]. This may be a benefit in cardiac stress testing.
The selection of an appropriate protocol for assessing exercise capacity is of critical importance. Protocols can differ considerably in terms of the rate at which work is incremented, the duration of time between stages, and total exercise time [98]. Recent exercise testing guidelines recommended that the exercise protocol be adapted to the subject, that the increments in work be reduced and that the total duration of the exercise test be kept between 8 and 12 min [95, 98, 111].
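One way to read the 8–12 min recommendation is as a constraint on the work rate increment. The sketch below assumes a linear ramp on a cycle ergometer starting from unloaded pedalling and an expected peak work rate supplied by the tester; both are assumptions of this illustration, as are the function names, and none of this is prescribed by the guidelines cited.

```python
def ramp_increment_w_per_min(expected_peak_w: float,
                             target_min: float = 10.0) -> float:
    """Work rate increment (W/min) so that a ramp starting at 0 W reaches
    the expected peak work rate after target_min minutes."""
    if expected_peak_w <= 0 or target_min <= 0:
        raise ValueError("expected peak and target duration must be positive")
    return expected_peak_w / target_min


def expected_duration_min(expected_peak_w: float,
                          increment_w_per_min: float) -> float:
    """Expected test duration (min) for a given increment."""
    if increment_w_per_min <= 0:
        raise ValueError("increment must be positive")
    return expected_peak_w / increment_w_per_min


def within_recommended_window(duration_min: float,
                              low: float = 8.0, high: float = 12.0) -> bool:
    """True if the duration falls in the 8-12 min window cited in the text."""
    return low <= duration_min <= high


if __name__ == "__main__":
    # Illustrative: a subject expected to reach about 150 W.
    print(ramp_increment_w_per_min(150.0))
    print(within_recommended_window(expected_duration_min(150.0, 30.0)))
```

The second call shows the point made in the text: an increment that is too large for the subject shortens the test to well under 8 min.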
Guidelines for exercise testing
Many organizations have established guidelines for cardiopulmonary exercise testing. These guidelines are generally slightly influenced by the focus of the organization [98, 99, 143–146]. Readers are referred to the appropriate guideline depending on their background. Contraindications for exercise testing are given by the American Heart Association [144] and the ESC [146]. In the presence of an absolute contraindication no exercise test should be performed. In the presence of a relative contraindication the need to obtain the results of the test should be balanced against the increased risk for the individual when performing the test. It is also of major importance, in both field tests and laboratory tests, that a risk stratification strategy is incorporated in the testing procedure [95, 147, 148]. This risk stratification is important for screening individuals for risk factors for various chronic cardiovascular, pulmonary, and metabolic diseases, in order to optimize safety during exercise testing and to develop effective exercise programmes.
Footnotes
Acknowledgements
W.H. is a postdoctoral fellow of the ‘Bijzonder Onderzoeksfonds KULeuven’. T.T. is a postdoctoral fellow of the ‘Fonds voor Wetenschappelijk Onderzoek-Vlaanderen’. L.V. is holder of the Faculty Chair ‘Health and Lifestyle’, Utrecht, the Netherlands.
