The Science Journal of the American Association for Respiratory Care


October 2002 / Volume 47 / Number 10 / Page 1200

Evaluating a New Blood Gas Sensor

Evaluating a new measuring device that promises benefits over an older device poses some interesting problems. We can compare measurements made with the new device to known values and determine its accuracy. Alternatively, we can compare the new device to an older device whose accuracy is assumed and determine if the new one will be a suitable replacement. The study designs and data analyses are similar in both cases. However, there are important underlying philosophical issues that are often confused. As a result, there is a tendency for authors to "go through the motions" of acceptable statistical analysis while missing the point in their conclusions. A recent article in RESPIRATORY CARE illustrates this.1

Meyers et al evaluated a continuous intravascular blood gas sensor (Neotrend, Diametrics Medical, Palo Alto, California) to determine if it "...would produce clinically acceptable bias and precision in comparison to laboratory values...."1 Sensor measurements were compared with arterial blood gas measurements made with a standard laboratory analyzer. The differences between sensor and laboratory values were used to calculate bias and precision, as suggested by Bland and Altman.2 The authors then concluded that the device was "accurate."

The study is well done and the authors are to be congratulated for conducting a useful study and using the appropriate data analysis. But there are 2 problems with their conclusion. First, studies of accuracy, by definition, require that measured values be compared to some type of accepted standard values.3-6 The accepted standard for blood gas measurement is whole blood, tonometered to specific values, using precision gas mixtures. In contrast, Meyers et al compared measured values from the intravascular sensor with values from a standard laboratory analyzer. The values from the laboratory analyzer are, strictly speaking, not the true values, because we know there are measurement errors. That is why the Bland-Altman plots of the data show the mean value of each data pair along the horizontal axis; we don't know which one of the pair is the true value, so our best guess is the average of the two.2 The method used in the Meyers et al study is appropriate for the evaluation of agreement between 2 methods, not for determining the accuracy of a new device. Bland and Altman never even mentioned the terms accuracy and precision.2

The idea behind evaluating agreement is to estimate the systematic error by calculating the mean difference between the new device and the conventional device. Then the random error is expressed as the standard deviation of the differences. The resultant total error or "limits of agreement" are then calculated as the mean difference ± 2 standard deviations. In other words, 95% of all future measurements with the new device should agree with the standard device by being within those limits. If the limits are small enough, we may conclude that the new device can be used in place of the standard device, with no effect on the quality of care.
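The calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not the analysis performed by Meyers et al: the function name `limits_of_agreement`, the paired PaCO2 readings, and the choice to take each difference as conventional minus new are all assumptions made for the example.

```python
import statistics


def limits_of_agreement(conventional, new_device, k=2.0):
    """Return (bias, sd, lower, upper) for paired measurements.

    bias  = mean of the paired differences (systematic error)
    sd    = sample standard deviation of the differences (random error)
    lower = bias - k * sd, upper = bias + k * sd (limits of agreement)
    Differences are taken as conventional minus new (an assumed convention).
    """
    if len(conventional) != len(new_device):
        raise ValueError("paired measurements must have equal length")
    diffs = [c - n for c, n in zip(conventional, new_device)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # requires at least 2 pairs
    return bias, sd, bias - k * sd, bias + k * sd


# Hypothetical paired PaCO2 readings in mm Hg (illustrative only, not study data).
lab = [40.0, 45.0, 38.0, 52.0, 47.0, 41.0]
sensor = [42.0, 44.0, 41.0, 55.0, 46.0, 45.0]

bias, sd, lower, upper = limits_of_agreement(lab, sensor)
print(f"bias = {bias:.2f} mm Hg, SD = {sd:.2f} mm Hg")
print(f"limits of agreement: {lower:.2f} to {upper:.2f} mm Hg")
```

Whether the resulting limits are "small enough" is a separate clinical judgment that the calculation itself cannot supply.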

Though my first objection to the study's conclusion may be splitting hairs, the second is more problematic because it could adversely affect patient outcomes. Meyers et al set out to determine if the intravascular blood gas sensor would produce "clinically acceptable" results. However, they never defined what clinically acceptable means. How large would the limits of agreement have to be to reject the device as clinically unacceptable?

As explained in detail elsewhere,5 acceptable differences between new and conventional measurements must be established a priori. These standards can be derived in a number of ways, but essentially one must decide if the expected difference will be of clinical importance (ie, that the clinical decision might differ depending on which instrument was used for the measurement). For example, in our blood gas laboratory the maximum acceptable difference in PO2 values on a split sample measured on 2 similar machines is 7 mm Hg. Thus, it would be reasonable for our lab to demand that any new device provide individual measurements whose limits of agreement are at most ± 7 mm Hg, to preserve our standard of care. This type of reasoning has been used in other studies of similar devices.7
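An a priori criterion of this kind reduces to a simple check once the limits of agreement are known. The sketch below is only illustrative: the function name and the two example pairs of limits are hypothetical, and the only figure taken from the text is the 7 mm Hg maximum acceptable PO2 difference.

```python
def meets_a_priori_limit(lower, upper, max_acceptable_diff):
    """True if both limits of agreement lie within +/- max_acceptable_diff."""
    if max_acceptable_diff < 0:
        raise ValueError("max_acceptable_diff must be non-negative")
    return max(abs(lower), abs(upper)) <= max_acceptable_diff


MAX_PO2_DIFF = 7.0  # mm Hg, the split-sample limit quoted in the text

# Hypothetical limits of agreement (mm Hg) for two imaginary devices.
print(meets_a_priori_limit(-5.0, 6.0, MAX_PO2_DIFF))   # both limits inside +/- 7
print(meets_a_priori_limit(-12.0, 4.0, MAX_PO2_DIFF))  # lower limit too wide
```

The point of fixing `MAX_PO2_DIFF` before looking at the data is that the threshold cannot then be adjusted to fit whatever limits the study happens to find.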

Meyers et al found that the worst-case expected difference between the intravascular sensor and the laboratory values (ie, bias minus 2 standard deviations) was as follows:

pH: -0.06
PaCO2: -11 mm Hg
PaO2: -31 mm Hg

with the negative sign indicating that the sensor value is larger than the laboratory value (because the limits were calculated as laboratory value minus sensor value).

The question that Meyers et al should have addressed in their conclusion is whether such differences are clinically acceptable. They could have simply said that, in their opinion, differences between laboratory and sensor values of 0.06 units for pH, 11 mm Hg for PaCO2, and 31 mm Hg for PaO2 are acceptable, and anything higher is not. They did not do that. Instead they stated that " from the Neotrend sensor fall within the accuracy range required for discrete blood gas analyzers." This statement is unexplained, unreferenced, and, in my opinion, misleading. As I suggested earlier, split sample results run on identical blood gas analyzers show less difference. Section 493.1213(b) of the Clinical Laboratory Improvement Amendments of 1988 (CLIA) standards states: "Although no specific guidelines exist for verifying a test method, each laboratory is responsible for determining the performance characteristics of its own methods... verification may be accomplished by comparison of split sample results with results obtained from a method that has been shown to provide clinically valid results." Since the College of American Pathologists guidelines for blood gas laboratories recommend split sample testing among similar analyzers as an ongoing quality control measure, laboratories approved by the College of American Pathologists should already have the type of agreement data necessary to make such comparisons. These are the criteria each lab should use in determining whether to accept new technology. Proficiency testing standards mentioned in CLIA are not really appropriate, because they were meant for comparing a laboratory's performance with 10 or more refereed laboratories. There are more sources of error between laboratories than within a single laboratory, so those criteria would be too loose for assessing the acceptability of a new device introduced to the practice of one hospital.

What happens if we accept less stringent standards? This is a question that is usually ignored in device evaluation studies. Most authors simply assert that the new device is or is not acceptable based on some arbitrary criteria and never consider the subsequent effect on clinical decisions. Keep in mind that the subjects in the Meyers et al study were neonates with respiratory failure, presumably on mechanical ventilation. Therefore, we might frame the question in terms of whether a clinical decision, such as a ventilator setting change, might be indicated simply as a result of the type of blood gas analyzer used, as opposed to a real clinical condition. In other words, would using the intravascular sensor in place of a laboratory analyzer cause any important change in the standard of care?

For example, a PaCO2 of ≥ 50 mm Hg might be a clinical decision point indicating the need to increase minute ventilation, whereas a PaCO2 of 35-45 mm Hg might indicate that no change was needed. Suppose the patient's true PaCO2 was 41 mm Hg. Using the limits of agreement found in the Meyers et al study, it would be possible for the laboratory analyzer to give a reading of 40 mm Hg while the intravascular sensor gave a reading of 51 mm Hg. Thus, if we were relying on the intravascular sensor in place of the normal laboratory measurement, we would be inclined to make an unnecessary ventilator change. Such a change might be to increase the tidal volume and hence increase the risk of lung injury. The more unnecessary changes we make, the more risk the patient might incur and the longer the duration of ventilation might be.

Is this just an extreme example? Let's conduct a simulation to see. Suppose that we have guidelines for ventilating neonates with respiratory failure.8 These guidelines provide a set of target ranges for pH, PaCO2, and PaO2. Measured values outside of the target ranges result in a change in the ventilator settings. A simulation is then created using a computerized spreadsheet program, such as Microsoft Excel, as follows:

1. Generate simulated laboratory analyzer values. Randomly select a value within some possible physiologic range for each blood gas variable. Because the current standard of care is based on laboratory values, we consider the random values to be the results of a standard blood gas analyzer.

2. Generate simulated intravascular sensor values. First we use the spreadsheet to simulate values for the difference between the intravascular sensor and laboratory readings by randomly selecting numbers from 3 normal distributions: 1 for pH, 1 for PCO2, and 1 for PO2, each with the mean and standard deviation values observed by Meyers et al. Then these differences are subtracted from the simulated lab values to get simulated sensor values (because, from the Meyers et al report, the difference was the laboratory value minus the sensor value).

3. Evaluate the need for a ventilator change. Finally, we compare the simulated lab values and sensor values to the target ranges for pH, PaCO2, and PaO2. If any variable is out of its target range, we make a ventilator change. Table 1 shows a portion of the data from 3,000 simulated blood gas determinations.

Table 1
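The 3 steps above can be sketched in Python rather than a spreadsheet. This is a rough illustration under stated assumptions, not a reproduction of the simulation reported here: the physiologic ranges, the target ranges, the uniform sampling of laboratory values, and the mean and standard deviation of each difference are all hypothetical. The difference parameters were chosen only so that mean minus 2 standard deviations matches the worst-case figures quoted in this letter; they are not the values reported by Meyers et al, and the output should not be expected to match the percentages given here.

```python
import random

# ASSUMED ranges, for illustration only; the letter does not list them.
PHYSIOLOGIC_RANGE = {"pH": (7.20, 7.50), "PaCO2": (30.0, 60.0), "PaO2": (40.0, 100.0)}
TARGET_RANGE = {"pH": (7.25, 7.45), "PaCO2": (35.0, 50.0), "PaO2": (50.0, 80.0)}

# ASSUMED (mean, SD) of the difference lab minus sensor. Hypothetical values
# chosen only so that mean - 2*SD equals the worst-case figures quoted in the
# letter (-0.06, -11 mm Hg, -31 mm Hg); they are not the reported study values.
DIFF_MEAN_SD = {"pH": (-0.02, 0.02), "PaCO2": (-3.0, 4.0), "PaO2": (-5.0, 13.0)}


def needs_change(values):
    """True if any variable lies outside its target range."""
    return any(
        not (TARGET_RANGE[name][0] <= value <= TARGET_RANGE[name][1])
        for name, value in values.items()
    )


def simulate(n=3000, seed=0, diff_mean_sd=DIFF_MEAN_SD):
    """Return the fraction of determinations indicating a ventilator change,
    as (based on lab values, based on sensor values)."""
    rng = random.Random(seed)
    lab_changes = 0
    sensor_changes = 0
    for _ in range(n):
        # Step 1: simulated laboratory values, uniform over the assumed range.
        lab = {name: rng.uniform(lo, hi) for name, (lo, hi) in PHYSIOLOGIC_RANGE.items()}
        # Step 2: sensor = lab - difference, since difference = lab - sensor.
        sensor = {name: lab[name] - rng.gauss(*diff_mean_sd[name]) for name in lab}
        # Step 3: compare each set of values with the target ranges.
        lab_changes += needs_change(lab)
        sensor_changes += needs_change(sensor)
    return lab_changes / n, sensor_changes / n


lab_fraction, sensor_fraction = simulate()
print(f"changes indicated by lab values:    {lab_fraction:.1%}")
print(f"changes indicated by sensor values: {sensor_fraction:.1%}")
```

Passing a `diff_mean_sd` in which every mean and standard deviation is zero makes the simulated sensor values identical to the lab values, so the two fractions coincide. Changing `seed` gives a different random draw, which is one way to see how stable the result is.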

Given the assumptions I used for this simulation, the simulated blood gas laboratory values indicated ventilator changes 79% of the time, compared to 86% of the time based on the intravascular sensor readings. That difference is significant: p < 0.0001. Repeated computer simulations give similar results. Thus it is reasonable to conclude that using the intravascular sensor in place of conventional laboratory analysis would result in unnecessary ventilator changes 5% of the time and thus change the standard of care. When you consider the number of ventilator changes made per month in a large nursery, 5% seems like a lot of wasted time. Whether those ventilator changes would affect patient outcomes is pure speculation, but we can say for sure that it would cost more in terms of labor hours per ventilator day.
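The text does not say which statistical test produced this p value. As one plausible check, the sketch below applies a pooled two-proportion z-test to the stated figures (79% versus 86% of 3,000 determinations each). Treating the two sets of decisions as independent samples is an assumption; because each simulated sensor value is derived from a laboratory value, a paired test such as McNemar's may be more appropriate, but it needs the per-pair results, which are not given here.

```python
import math


def two_proportion_z_test(p1, p2, n1, n2):
    """Return (z, two_sided_p) for a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p_value


# Figures from the text: 79% (lab) vs 86% (sensor) of 3,000 determinations.
z, p = two_proportion_z_test(0.79, 0.86, 3000, 3000)
print(f"z = {z:.2f}, p < 0.0001: {p < 0.0001}")
```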

Not surprisingly, as the differences between the laboratory values and sensor values become smaller (ie, smaller bias and imprecision for pH, PaCO2, and PaO2), the difference in clinical decisions becomes smaller. In fact, you can easily adjust the bias and precision values in the spreadsheet until the difference in percentage of Yes answers is acceptable. When bias and imprecision are set to zero, there is no longer any difference in the percentage of Yes answers. This may be a useful procedure for setting acceptable limits of agreement a priori. Though you still end up with an arbitrary threshold, at least it has a more obvious clinical relevance. The moral of the story is that, if there is any difference between old and new devices, there will always be some effect on clinical decision-making.

It is easy to be seduced by the convenience and speed of bedside measurement devices, especially when they offer other advantages such as reduced blood loss. But if we sacrifice scientifically based decisions for such assumed benefits, we may pay hidden costs in the long run. If we accept less precise blood gas measurements, our ventilator and drug therapy decisions will be less precise and hence might increase costs and length of stay and might affect outcomes. It seems to me that it would be a lot more practical to avoid this possibility by maintaining accuracy standards now than to try to prove in the future that reducing standards does no harm.

Robert L Chatburn RRT FAARC
Department of Respiratory Care
University Hospitals of Cleveland
Cleveland, Ohio


  1. Meyers PA, Worwa C, Trusty R, Mammel MC. Clinical validation of a continuous intravascular neonatal blood gas sensor introduced through an umbilical artery catheter. Respir Care 2002;47(6):682-687.
  2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307-310.
  3. Chatburn RL. Fundamentals of metrology: evaluation of instrument error and method agreement. Respir Care 1990;35(6):520-545.
  4. Chatburn RL. Evaluation of instrument error and method agreement. AANA J 1996;64(3):261-268. Reprinted in: Respir Care 1996;41(3):1092-1099.
  5. Chatburn RL. Principles of measurement. In: Tobin MJ. Principles and practice of intensive care monitoring. New York: McGraw-Hill; 1998:45-61.
  6. Chatburn RL, Volsko TA, Brougher P. Evaluating medical devices. In: Branson RD, Hess DR, Chatburn RL. Respiratory care equipment, 2nd edition. Philadelphia: Lippincott Williams & Wilkins; 1999:709-732.
  7. Wahr JA, Lau W, Tremper KK, Hallock L, Smith K. Accuracy and precision of a new, portable, handheld blood gas analyzer, the IRMA. J Clin Monit 1996;12(4):317-324.
  8. Chatburn RL, Carlo WA, Lough MD. Clinical algorithm for pressure-limited ventilation of neonates with respiratory distress syndrome. Respir Care 1983;28(12):1579-1586.

The authors respond:

We read with interest and thank Mr Chatburn for his comments regarding our recent study.1 The crux of the matter is his contention that the continuous measurement technique we describe may provide misleading information that could result in unnecessary or incorrect ventilator changes. He bases that conclusion on hypothetical examples of possible patient situations. The exercise is interesting, but we disagree with the conclusions drawn. He also comments on the analysis techniques used and discusses what "clinically acceptable" really means.

First, we are taken to task for our use of the term "accurate," and Mr Chatburn suggests we "go through the motions of acceptable statistical analysis while missing the point." Clinical trials are difficult and time-consuming to perform. To suggest we would design a trial only to "go through the motions of acceptable statistical analysis" implies either that we did not know the correct analysis to perform or that we did not care. Neither is true. We used the Bland-Altman technique precisely because of its appropriateness for this type of study, as a number of other authors have done.2-6 We used the term "accurate" in discussing our findings. We did not compare the sensor in vitro to tonometered gas specimens. We did, however, compare our measured values to a widely used and validated clinical laboratory analyzer that is generally accepted as producing "accurate" values, even though exact agreement between laboratory analyzers and tonometered specimens is neither found nor expected. As Mr Chatburn points out, we show agreement between the 2 methods as opposed to "accuracy" per se. Is this accurate enough? We believe the readers can evaluate our work, as well as that performed by others, and draw their own conclusion.

Regarding CLIA standards, we removed some of this information during the revision process. But in CLIA section 493.927 we find criteria for acceptable performance of PO2 = target value ± 3 mm Hg; PCO2 = target value ± 5 mm Hg; pH = target value ± 0.04 pH units.7 We used the laboratory analyzer as the target value; our findings fall within the latter ranges.

We agree with Mr Chatburn regarding the need for caution when a new monitoring or measurement tool is introduced. Nowhere did we suggest that this new monitoring technique should replace those currently in use. In addition to the Neotrend monitor, we use continuous oxygen saturation monitoring, intermittent laboratory analysis of arterial blood gases, and continuous monitoring of other physiologic variables. When one indicator is out of line with either the clinical situation or another similar monitor, further testing and investigation are necessary. This does not mean that continuous information should be discarded because it may not exactly reflect values generated in the central laboratory. Mr Chatburn's arguments could also be used to suggest that transcutaneous techniques such as continuous oxygen saturation monitoring should be discarded. We believe that this technique, like all others used in patient management, requires good clinical judgment in its application. When used appropriately, continuous monitoring has the potential to improve patient management.

Patricia A Meyers RRT
Mark C Mammel MD

Infant Diagnostic and Research Center
Children's Hospital
St Paul, Minnesota


  1. Meyers PA, Worwa C, Trusty R, Mammel MC. Clinical validation of a continuous intravascular neonatal blood gas sensor introduced through an umbilical artery catheter. Respir Care 2002;47(6):682-687.
  2. Goddard P, Keith I, Markovitch H, Roberton NRC, Rolfe P, Scopes JW. The use of a continuously recording intravascular oxygen electrode in the newborn. Arch Dis Child 1974;49(11):853-860.
  3. Morgan C, Newell SJ, Ducker DA, Hodgkinson J, White DK, Morley CJ, Church JM. Continuous neonatal blood gas monitoring using a multiparameter intra-arterial sensor. Arch Dis Child Fetal Neonatal Ed 1999;80(2):F93-F98.
  4. Zimmerman LJ, Dellinger RP. Initial evaluation of a new intra-arterial blood gas system in humans. Crit Care Med 1993;21(4):495-500.
  5. Tobias JD, Connors D, Strauser L, Johnson T. Continuous pH and pCO2 monitoring during respiratory failure in children with the Paratrend 7 inserted into the peripheral venous system. J Pediatr 2000;136(5):612-617.
  6. Weiss IK, Fink S, Harrison R, Feldman JD, Brill JE. Clinical use of continuous arterial blood gas monitoring in the pediatric intensive care unit. Pediatrics 1999;103(2):440-445.
  7. Naeve RA. Managing Laboratory Personnel. The CLIA and OSHA manual. New York: Morrison & Foerster, Thompson Publishing Group; 1994: appendix 1, 34-35.

The entire text of this article is available in the printed version of the October 2002 RESPIRATORY CARE.
