How We Test Televisions
Updated August 12, 2014
Reviewed.com Televisions tests TVs with a rigorous set of scientific methods that employ the same tools and techniques as the manufacturers. Rather than just looking at an ad hoc set of images and videos on the screen, we perform an in-depth quantitative analysis using advanced instrumentation and professional tools that look at the performance of the display, determining how it produces on-screen images in extreme detail.
While other sites watch a couple of movies and discuss how grizzled the hero looks in a particular scene, we determine via extensive measurements and data analysis the true extent of the color gamut, examine the transfer function for all of the primary colors (as well as white), determine how accurately the color temperature of the whites is maintained over the entire luminance range, and examine how the display scales lower resolution video sources to appear on the screen. And that’s just some of the testing we do, which is described in greater detail below; we also evaluate the remote control, the speakers, documentation, the ease of use, and all aspects of display performance and picture quality and accuracy.
To develop this testing methodology, we worked with Dr. Raymond Soneira, the creator of DisplayMate, an advanced industry-standard diagnostic program that helps consumers, technicians, and manufacturers alike when it comes to setup, calibration, and testing of displays. In recent months, we also supplemented our testing procedures with CalMan 5 Professional. We consider this comprehensive testing process to be the most in-depth and authoritative in the world.
Our testing software includes a very large set of incisive, challenging and sensitive test patterns to check and optimize display performance, to show the effects of a display's internal processing, and to highlight the differences between displays (see below for a few examples). Scripting capabilities automate many of our tests, and access to an extensive library of test screens and test photos enables us to highlight the performance of the display, determining its strengths and weaknesses along the way.
Instrumentation and Data Analysis
To analyze the chromaticity of a TV display, we use a Konica Minolta CS-200 Chroma Meter, a laboratory instrument that provides extremely accurate color measurements for all display technologies. It has a narrow one-degree acceptance angle, which is very important for accurately measuring TVs. The CS-200 can measure colored light sources in the range of 0.01 to 20,000,000 cd/m2, with an absolute accuracy of +/- 0.02 cd/m2. It is significantly more accurate than the instruments used by many other reviewers, which rely on a set of color filters instead of measuring the light spectrum; those filter-based instruments generally have a wide acceptance angle that contaminates the luminance measurements. The CS-200 connects to a PC via a USB port, and every data sample is logged.
When testing the luminance values of a TV, we use the Konica Minolta LS-100 Luminance Meter, an instrument designed specifically for the measurement of dynamic light values. The LS-100 is capable of measuring light sources ranging from 0.001 to 299,900 cd/m2, and as a handheld spot instrument it can report both absolute and variable readings very quickly with a ±2% degree of error.
Our testing process involves capturing many thousands of individual data points, which is done using a customized scripting system that automates the testing process. We then use a number of sophisticated mathematical tools to analyze this data and produce the results and scores that you see in the reviews on this site. For more details of what we test and how we analyze these results, scroll down to the individual tests below.
Almost all TVs arrive with preset picture modes that are chosen by the manufacturer so that the TV looks best in a brightly lit retail showroom. As a result, the TVs are set for maximum brightness and contrast rather than maximum picture quality. We adjust all of the user controls to deliver the best and most-accurate picture quality by using a series of test patterns together with advanced instrumentation measurements and user control adjustments. This enables us to find the user control settings that produce the best balance of performance.
Our process also locates the optimum settings for contrast controls (without saturation or clipping), sharpness, and many others. This generally results in a significant reduction in peak brightness in order to deliver optimal picture accuracy and quality. In this case, we also discuss in the review the maximum possible luminance of the display and the consequence of these settings; many displays can provide extremely high levels of brightness, but these settings involve serious compromises in color accuracy and image quality.
We do not score TVs by altering controls that are hidden or require special access codes or equipment to access (such as those designed for professional installers or for use in calibrating the TV at the factory); if the control is not easily accessible to an everyday user, we don’t use it to score the display. We do this because we want to get the same experience that a user would get if they bought the display and then set it up, and most users will not be able to get access to the service menus. As part of the calibration process, we also set the backlight control to maximum.
To note the black level of the display, we measure the luminance on an industry standard ANSI checkerboard screen in candelas per square meter (cd/m2). We measure the black level several times during testing, reporting on any variance we see with these multiple measurements and discussing any dynamic backlight or local dimming functions as we go. However, the main figure that we quote is for the black level at our calibrated settings, with the backlight on maximum. Our score is based on how dark the black is: the lower the luminance, the higher the score.
To measure the brightest white the TV can achieve, we use the same ANSI checkerboard screen and measure the luminance again in candelas per square meter. We do this after calibrating the TV as described above. Our score is based on how bright the white is after calibration; the brighter the white, the higher the score. When the peak white varies with the size of the test pattern area, as in the case of plasma displays, we perform several measurements with different areas, each with a different APL (Average Picture Level).
To calculate the contrast that the screen can achieve, we divide the peak white luminance by the deepest black luminance it can produce when showing normal video and not in a standby mode. So, if a display has a deepest black of 0.4 cd/m2 and a peak white of 400 cd/m2, the contrast ratio is 1000:1. Our score here is based on how high the ratio is; the higher the better.
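The arithmetic above can be sketched in a few lines of Python. This is an illustration only, not the software used in our lab; the function name is hypothetical.

```python
def contrast_ratio(peak_white_cd_m2: float, black_cd_m2: float) -> float:
    """Return peak white luminance divided by black luminance."""
    if black_cd_m2 <= 0:
        # A zero reading would make the ratio undefined.
        raise ValueError("black level must be positive to form a ratio")
    return peak_white_cd_m2 / black_cd_m2


# The worked example from the text: 400 cd/m2 white, 0.4 cd/m2 black.
ratio = contrast_ratio(400.0, 0.4)
print(f"{ratio:.0f}:1")  # prints 1000:1
```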
Note that our tests differ from the approach that manufacturers use to determine the contrast ratio; they test the peak white with the backlight on full, then the deepest black with the backlight at its lowest attainable setting (often called a "dynamic contrast ratio"). Our test determines the true contrast ratio with the backlight on full during the test (often called the "static full field contrast ratio"). For all displays, the ANSI checkerboard allocates luminance in a way that is closest to approximating the display's static contrast ratio, which is what you'll really see while watching. We gather all luminance-related data with the Konica Minolta LS-100 Luminance Meter described above.
For direct-view LCD and plasma displays, the ANSI checkerboard contrast ratio is generally within a few percent of the full field contrast ratio above. Reviewers who find a significant discrepancy between the two are measuring the veiling glare light contamination of their measuring instrument rather than the TV. See below.
The tests above tell us about the performance of the screen showing just pure whites and pure blacks, but not in the more real-world situation of mixed whites and blacks on screen. Some displays have problems here: in areas of high contrast, the whites bleed into the blacks, making them appear brighter than they should and reducing color saturation at the same time. To measure this, we do a test where a variable-width outer rectangular frame on the screen is set to peak white, and we then measure the luminance of a small black area at the center of the screen to see how much light bleeds to the center as the frame expands closer to the center. Some other sites have a much simpler test using a checkerboard pattern (and refer to this as checkerboard contrast), but our test gives much more information on how the increasing amount of white bleeds into the black area. Other sites also forget one important technical aspect of this test: that having white on the screen can lead to some of the light from the white screen area reaching the measuring instrument and creating an artificially high reading for the black (a problem called veiling glare, which produces very large measurement errors that lead to erroneous conclusions). We avoid this by using a special black Duvetyne mask to block the white areas of the display; any light that reaches the measuring device has come directly from the center target on the screen, not from the surrounding area on the screen. The score a display gets is based on how constant the black level remains; a constant black gets a higher score.
Another issue with peak white is that power management issues on some displays (particularly plasmas) require a reduction in peak white levels when the average screen brightness gets too high. We test this by putting up a number of images with varying amounts of white and measuring the luminance of the peak white. Our scoring for this test is based on how much the luminance varies with the different amounts of white on the screen.
This test examines the uniformity of the screen, looking at how even the lighting is across an entirely black or entirely white screen. We use the DisplayMate uniformity test screens to look for irregularities anywhere on-screen, which can either be hot spots (too bright) or cold spots (too dim) or mottled screens with widespread irregularities. We pinpoint and measure the irregularities with an LS-100. Points are deducted for corners or spots on screen that are not uniform, and also for any changes in luminance that are not gradual.
We determine the gamma of the grayscale transfer function by measuring the luminance of screens with varying signal intensities of gray from 0 to 255. The gamma is determined by measuring the slope of the transfer function on a logarithmic graph between 30 and 70 percent of peak signal, avoiding the bottom and top ends of the curve, which often include spurious irregularities.
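As a rough sketch of this calculation, the Python below fits a straight line to log(luminance) against log(signal) for readings between 30 and 70 percent of peak signal and reports its slope as the gamma. The function name, the least-squares fit, and the synthetic readings are assumptions made for illustration; the text does not specify how the slope is fitted.

```python
import math


def estimate_gamma(samples, lo=0.30, hi=0.70):
    """Slope of log(luminance) vs. log(signal) inside the [lo, hi] window.

    samples: list of (signal_level 0-255, luminance in cd/m2) pairs.
    """
    peak = max(lum for _, lum in samples)
    pts = [
        (math.log(level / 255.0), math.log(lum / peak))
        for level, lum in samples
        if lo <= level / 255.0 <= hi and lum > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two points in the fitting window")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    # Ordinary least-squares slope through the log-log points.
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Synthetic readings for an idealized gamma-2.2 display peaking at 200 cd/m2.
synthetic = [(lvl, 200.0 * (lvl / 255.0) ** 2.2) for lvl in range(0, 256, 5)]
gamma = estimate_gamma(synthetic)
print(f"gamma = {gamma:.2f}")
```

Restricting the fit to the middle of the curve mirrors the reasoning in the text: readings near black and near peak white are the ones most likely to contain irregularities.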
We test resolution scaling by examining a number of CalMan test screens in a variety of non-native resolution formats for the display under test. The test screens are designed to examine the way that the TV processes the screens and scales them to fit the screen, highlighting any problems such as moiré pattern interference or dithering patterns that compromise legibility.
The color of white that a TV produces can vary significantly with factory settings and picture modes. The exact color of white is specified precisely by its CIE chromaticity coordinates, and more commonly by its correlated color temperature: the temperature of the laboratory black body whose light most closely approximates that white, typically between 5,000 and 15,000 kelvins.
The Konica Minolta CS-200 Chroma Meter that we use can measure the chromaticity coordinates and correlated color temperature very accurately. We use this to measure the performance of the display being tested, measuring the red, green and blue primaries as well as the D65 point. We test by setting the display as close as possible to D65, which is a television and photographic industry standard. D65 approximates the color of daylight at noon on an overcast day and includes components of both blue sky and direct sunlight.
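To show how chromaticity coordinates relate to a correlated color temperature, here is a short Python sketch using McCamy's well-known polynomial approximation. This is not necessarily the method the CS-200 uses internally, and the function name is hypothetical; the approximation is only reasonable for whites near the daylight range.

```python
def mccamy_cct(x: float, y: float) -> float:
    """Approximate correlated color temperature (kelvins) from CIE 1931 (x, y)."""
    n = (x - 0.3320) / (0.1858 - y)
    return 449.0 * n**3 + 3525.0 * n**2 + 6823.3 * n + 5520.33


# D65 white point in CIE 1931 coordinates.
d65_cct = mccamy_cct(0.3127, 0.3290)
print(f"D65 is roughly {d65_cct:.0f} K")
```

The result lands within a few kelvins of the nominal 6,504 K usually quoted for D65.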
Color and Grayscale Tracking
For color and grayscale tracking, we display a number of screens at intensity levels between 255 (the brightest white) and 0 (complete black), measuring both the color temperature and color coordinates of each point in the range. The scoring for this test is based on the amount of variance from the maximum intensity chromaticity values, measured in the CIE 1976 uniform color space (u’, v’). Although we feature both the color temperature variation and the CIE 1976 color space distance in our review, the score is based on the latter, as this provides a better measure of how the white of the display shifts within the color space. We discount any shift of less than 0.004, as this is not noticeable by most observers. This distance is shown on our charts by the red circle.
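The distance described above can be sketched in Python as follows. The conversion from CIE 1931 (x, y) to CIE 1976 (u', v') is the standard one; the function names and the sample readings are hypothetical, and this is an illustration rather than our actual analysis code.

```python
import math

THRESHOLD = 0.004  # shifts smaller than this are discounted, per the text


def xy_to_uv(x, y):
    """Convert CIE 1931 (x, y) to CIE 1976 (u', v')."""
    d = -2.0 * x + 12.0 * y + 3.0
    return 4.0 * x / d, 9.0 * y / d


def delta_uv(xy_ref, xy_meas):
    """Euclidean distance between two chromaticities in (u', v')."""
    u0, v0 = xy_to_uv(*xy_ref)
    u1, v1 = xy_to_uv(*xy_meas)
    return math.hypot(u1 - u0, v1 - v0)


def noticeable_shifts(readings):
    """readings: dict of signal level -> measured (x, y).

    Returns the levels whose shift from the maximum-intensity white
    is at least THRESHOLD, with the size of each shift.
    """
    ref = readings[max(readings)]
    shifts = {lvl: delta_uv(ref, xy) for lvl, xy in readings.items()}
    return {lvl: d for lvl, d in shifts.items() if d >= THRESHOLD}


# Hypothetical readings: white drifts visibly only at the darkest level.
readings = {255: (0.3127, 0.3290), 128: (0.3130, 0.3292), 32: (0.3000, 0.3150)}
flagged = noticeable_shifts(readings)
```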
We determine the transfer function of a display for each of the primary colors by measuring the luminance of a screen for the range of signal intensities from 0 to 255. We then analyze the curve to determine the granularity and other characteristics. Our scoring is based on this analysis; issues such as excessive stepping, clipping and uneven response cost the display points.
We test how closely the display matches the standard primary colors of ITU-R BT.709 (generally referred to as Rec.709), which defines the color gamut of high definition TV signals. The scoring for this test is based on the distance between the measured and standard values; the greater the distance, the lower the score. We plot the measured and recommended gamut in the CIE 1976 Lu’v’ color space. Note that a color gamut that is greater than the standard is also undesirable; this will produce colors that are outside of the standard gamut, producing incorrect colors that are too saturated and not as the content producer intended.
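A minimal Python sketch of the gamut comparison follows. The Rec.709 primary coordinates are the published CIE 1931 values; the function names and the measured values are hypothetical, and how the distances map to points is not shown because the text does not specify it.

```python
import math

# Rec.709 primaries in CIE 1931 (x, y).
REC709_XY = {
    "red": (0.640, 0.330),
    "green": (0.300, 0.600),
    "blue": (0.150, 0.060),
}


def xy_to_uv(x, y):
    """Convert CIE 1931 (x, y) to CIE 1976 (u', v')."""
    d = -2.0 * x + 12.0 * y + 3.0
    return 4.0 * x / d, 9.0 * y / d


def primary_errors(measured_xy):
    """Distance in (u', v') between each measured primary and its Rec.709 target."""
    errors = {}
    for name, target in REC709_XY.items():
        ut, vt = xy_to_uv(*target)
        um, vm = xy_to_uv(*measured_xy[name])
        errors[name] = math.hypot(um - ut, vm - vt)
    return errors


# Hypothetical display whose green primary is more saturated than the standard.
measured = {"red": (0.640, 0.330), "green": (0.290, 0.610), "blue": (0.150, 0.060)}
errors = primary_errors(measured)
```

Because the distance is unsigned, an oversaturated primary is penalized just like an undersaturated one, which matches the point that a gamut larger than the standard is also undesirable.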
Our motion test uses a variety of test screens and video sources, including the Multimedia with Motion edition of DisplayMate and a number of movie sequences. We use these to judge the quality of the motion on the display, looking for issues with ghosting, shadowing, smearing and other common artifacts.
* For reviews published after March 5, 2011, Motion Smoothness and Motion Artifacting have been combined into a single section called “Motion Performance.” The score displayed in this section is the sum of the scores for Motion Smoothness and Motion Artifacting.
We test the 3:2 pulldown processing (which is also known as 2:3 pulldown) capabilities of the display with the HQV Benchmark test disc. We also evaluate the performance of the display with a video source that has been processed with the telecine effect.
To test the performance of the display with a 24 frames per second signal, we use a PlayStation 3 configured to output a 24 frames per second signal playing a Blu-ray disc.
Our viewing angle test examines the contrast ratio and color shift of the display at different viewing angles. We measure the contrast ratio at 5 degree increments from 0 degrees (straight on) to ±85 degrees. Our scoring for this test is based upon the point at which the contrast ratio has fallen by 50 percent from the maximum we measured at 0 degrees. This means that our ranges of satisfactory viewing angles are very different from the ones the manufacturers publish, which are generally based on the angle at which the contrast ratio falls to 10:1. We feel that this is far too low, since most displays have a face-on contrast ratio of over 1000:1, making a 10:1 contrast ratio unwatchable.
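The half-contrast point can be located from the 5-degree readings roughly as sketched below. The linear interpolation between neighboring readings is an assumption for illustration, as are the function name and the sample data.

```python
def half_contrast_angle(readings):
    """Angle at which contrast first falls to half its straight-on value.

    readings: list of (angle_degrees, contrast_ratio), sorted by angle,
    starting with the 0-degree (straight-on) measurement.
    Returns None if contrast never falls that far.
    """
    target = readings[0][1] / 2.0
    for (a0, c0), (a1, c1) in zip(readings, readings[1:]):
        if c0 >= target > c1:
            # Interpolate linearly between the two bracketing readings.
            return a0 + (c0 - target) * (a1 - a0) / (c0 - c1)
    return None


# Hypothetical readings for one side of the screen.
side = [(0, 1000.0), (5, 900.0), (10, 700.0), (15, 520.0), (20, 400.0), (25, 300.0)]
angle = half_contrast_angle(side)
print(f"contrast halves at about {angle:.1f} degrees")
```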
We examine how reflective the screen is, considering how much light is reflected from the screen surface in a standard light setting. The points for this test are based upon how much the reflection interferes with the screen image.
We test power consumption using a Watts Up Pro power meter connected to a computer. In order to make the test results comparable between displays with different luminance levels, we calibrate the monitor backlight or other controls to produce a peak luminance of 200 cd/m2. If a display cannot reach that luminance, we get as close as possible. We then test the power consumption playing back a standard video sequence of 10 minutes of 1080i video recorded from a Comcast digital cable signal, measuring the power drawn at several points during the playback and averaging the result.
We then use these figures to calculate the typical cost of using this TV, working on the basis of electricity costing 10.7 cents per kilowatt-hour (this is the 12-month average for the cost of electricity in the USA up to April 2008 from the EPA), with the viewer watching the TV for five hours a day, seven days a week, and leaving it in standby mode the rest of the time.
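The running-cost arithmetic can be sketched in Python as below. The electricity rate and the five hours of daily viewing come from the text; the wattage figures, the 365-day year, and the function name are assumptions for illustration.

```python
from statistics import mean

RATE_USD_PER_KWH = 0.107   # 10.7 cents per kilowatt-hour
HOURS_ON_PER_DAY = 5.0     # viewing time; standby for the rest of the day


def annual_cost(on_watts, standby_watts, days=365):
    """Yearly electricity cost in dollars for the stated usage pattern."""
    on_kwh = on_watts * HOURS_ON_PER_DAY * days / 1000.0
    standby_kwh = standby_watts * (24.0 - HOURS_ON_PER_DAY) * days / 1000.0
    return (on_kwh + standby_kwh) * RATE_USD_PER_KWH


# Hypothetical power readings taken at several points during playback.
playback_samples_w = [98.0, 102.0, 100.0]
average_w = mean(playback_samples_w)

cost = annual_cost(average_w, standby_watts=1.0)
print(f"about ${cost:.2f} per year")
```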
For LCDs we also record the wattage draw with the backlight on the minimum and maximum settings to provide a minimum and maximum figure for the power usage of the display. The weekly and yearly running costs for these figures are calculated in the same way.
How We Score
Every test we run results in a score, which allows us to compare displays directly, even if they are not tested side by side. Our rigorous, scientific scoring system ensures that our results are consistent, accurate, and representative of the strengths and weaknesses of a display. Many of our scores are open-ended; the score can climb beyond the nominal maximum of 10 as the performance of new models improves. Most reviewers use a fixed 1 to 10 scoring system, but this means that once a product has earned a top score, there is nowhere else to go; the reviewer has to reset the scoring system and start again. Our infinite score system allows us to keep going, so if a new technology comes along that provides radically better color or a more accurate color gamut, we can still score it, and compare it with other models that we tested before the new technology arrived.
* For reviews published after March 5, 2011, Motion Smoothness and Motion Artifacting have been combined into a single section called “Motion Performance.” The score displayed in this section is the sum of the scores for Motion Smoothness and Motion Artifacting. The sections Input Ports, Output Ports, Other Parts and Media have been combined into a single section called “Connectivity.” The score displayed in this section is a sum of the scores for Input Ports, Output Ports, Other Parts and Media. The Photo Playback and Music & Video Playback have been combined into a section called “Local Media Playback.” The score displayed in this section is a sum of the scores for Photo Playback and Music & Video Playback.
To create our overall score for each display, each individual score is multiplied by a weighting, which is based on how important we think the individual factor is to the typical consumer. The majority of the score is based upon the performance tests outlined above, but features also play a modest part.
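In outline, the weighting step looks like the Python sketch below. The test names, scores, and weights are invented for illustration and are not our actual weightings; summing the weighted scores without further normalization is also an assumption.

```python
def overall_score(scores, weights):
    """Sum of each individual score multiplied by its weighting."""
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weighting defined for: {sorted(missing)}")
    return sum(scores[name] * weights[name] for name in scores)


# Hypothetical scores and weightings; performance tests dominate.
scores = {"contrast": 8.0, "color_gamut": 6.0, "remote": 5.0}
weights = {"contrast": 0.5, "color_gamut": 0.4, "remote": 0.1}

total = overall_score(scores, weights)
print(f"overall: {total:.1f}")
```

Because individual scores are open-ended, nothing in this calculation caps the result at 10.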
SpectraCal, located in the Seattle area, is the world’s leading provider of video display calibration software for both professional and consumer needs. SpectraCal constantly drives innovations in the field, providing the tools and training necessary to achieve accurate digital images and assisting customers with the step-by-step process of screen optimization.