DESIGN AND DEVELOPMENT OF HUMAN
EQUIVALENT INSPECTION SYSTEM
A. Mazour, S. King
Canpolar East Inc.
44 Austin Street, St. John's, NF, Canada
A1B 4C2
Tel: (709) 722-6067
Email: info@canpolar.com
Web: http://www.vetech.com
ABSTRACT
The human visual system engages a wide field of view peripheral vision in conjunction with selectively scanned high resolution foveal vision. The effective scene resolution of the human eye is equivalent to a camera with 10^8 pixels. This performance is difficult if not impossible to match with available camera technologies.
Canpolar East has recently developed a machine vision system that utilizes a low resolution wide field camera plus a high resolution narrow field camera that is able to fixate at 30/60 frames per second. The system was specifically designed to match human visual performance in industrial inspection tasks.
The system includes software that selects objects of interest from the low resolution images for high resolution imaging. The system is capable of selection and fixation at about 25 "saccades" per second.
1.0 INTRODUCTION
Machine vision systems have been very successful in delivering precision, speed and unfatiguing vigilance that cannot be matched by human beings. However, there is a large class of inspection and surveillance tasks for which machine vision has not been able to provide performance competitive with human inspection. Typically this class of problem involves subtle distinctions of hue, intensity and morphology in a cluttered and often variable environment. Many such problems are inherent to food product inspection, for example the detection of pits in processed fruits and the candling of eggs and fish products.
Some of these inspection tasks can be extremely demanding even for a human eye-brain and cannot be emulated with existing technologies. Others are relatively simple tasks that, nevertheless, have eluded automation. Egg candling is a good example of a simple inspection task which human beings can perform at throughputs of 6 eggs per second but for which standard machine vision technology has not been successful to date.
Full emulation of human eye performance cannot be achieved using brute force methods. A human equivalent vision system would have to be capable of capturing and processing 1.26 x 10^8 pixels, corresponding to the number of rods and cones in a single human eye(1). Such technology in economical format does not exist even in imagination. A 5K x 5K CCD sensor costs over $100K. The cost of capturing and processing this very large amount of data is prohibitive.
Canpolar East has been working with industrial clients for several years to automate some of the more exacting but tedious food inspection tasks in which machine vision has not previously been successful. We have studied the human eye as a system in the performance of selected inspection tasks and identified the components of vision that are critical to the task.
2.0 ANALYSIS OF HUMAN EQUIVALENT INSPECTION
Designing a human equivalent system requires emulation of human eye functions. Spatial resolution, intensity discrimination and color discrimination are important functional parameters of the human vision system that must be replicated by a machine vision inspection system.
Tests of psychophysical performance of the human eye are not free from the effects of neurophysiological information processing which takes place in the neural ganglia overlying the retina (as well as post processing in the brain). The following analysis concentrates only on threshold detection phenomena, including spatial resolution, colour and intensity discrimination. All of these threshold functions are essentially retinal phenomena which do not involve extensive neurological processing. The intent of this exercise is to define the minimum retinal performance to be emulated by a camera system.
2.1 SPATIAL RESOLUTION (HUMAN EYE)
Three ways are commonly used to describe spatial resolution: the minimum resolvable separation, the modulation transfer function (MTF) and shape and geometry tests. Human visual acuity is the reciprocal of the angle, in minutes of arc, subtended by the smallest visible detail. If the smallest detail is defined as the minimum separable distance between two points, as illustrated in Figure 1, then the value of $\alpha$ can vary, under optimum conditions, from 25 arc sec (visual acuity 2.4) to 30 arc sec (visual acuity 2.0)(2). For a given working distance (a), the minimum separable distance between two points (b) is:
$$b = a \tan\alpha$$
where $\alpha$ is the spatial resolution of the human eye. The distance b can be expressed as:
$$b = \frac{1}{c}$$
Where c is the number of lines per millimetre. The best human visual acuity may enable the resolution of about 14 lines per millimetre or 7 line pairs per millimetre (LPMM) at 500mm range, while the average human can resolve about 4 LPMM.
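The figures quoted above can be checked numerically. The following is a minimal sketch, assuming the geometry of Figure 1 (b equals a times the tangent of the acuity angle, and c equals 1/b); the function names are illustrative and do not come from the original work.

```python
import math

# Conversion factor from arc seconds to radians.
ARCSEC_TO_RAD = math.pi / (180 * 3600)


def min_separation_mm(distance_mm, acuity_arcsec):
    """Minimum separable distance b (mm) at working distance a (mm)."""
    return distance_mm * math.tan(acuity_arcsec * ARCSEC_TO_RAD)


def lines_per_mm(distance_mm, acuity_arcsec):
    """Resolvable lines per millimetre, c = 1 / b."""
    return 1.0 / min_separation_mm(distance_mm, acuity_arcsec)


def line_pairs_per_mm(distance_mm, acuity_arcsec):
    """Line pairs per millimetre (LPMM): two lines make one pair."""
    return lines_per_mm(distance_mm, acuity_arcsec) / 2.0
```

For an acuity of 30 arc sec at 500 mm, `lines_per_mm(500, 30)` evaluates to roughly 13.75, in line with the "about 14 lines per millimetre or 7 LPMM" stated in the text.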
Another test used for eye function is the modulation transfer function (MTF). This test measures the response of an optical system to a periodic sinusoidal line pattern(3). Note that the MTF of the eye is sensitive to illumination levels and to contrast. The limiting resolution for the average eye is about 2 cycles/mrad, while the optimal resolution occurs at 0.5 to 1 cycles/mrad (4).
The shape/geometry tests (e.g., the Snellen Letter test) are based on discrimination of detail in complex objects such as alphabetical letters. Normal "20/20" eyesight entails the correct resolution of details which subtend 0.29 mrad (1 arc minute). Rare individuals have an acuity as high as 0.15 mrad (1/2 arc minute)(5).
Figure 1. Schematic of Human Visual Acuity.
The ergonomic working distance for many industrial inspection tasks is in the range from 300 to 500 mm. For typical food defect inspection, the working distance is about 500 mm. At this range, the average human eye would resolve about 4 LPMM.
The dynamic range and sensitivity of rods and cones in the eye is comparable to silicon photodetectors used in CCD cameras. Silicon has a spectral sensitivity peak at about 800 nm. This is usually modified for commercial video to provide a match to the human eye spectral range and color discrimination capability. There is no consequential contrast sensitivity difference between the biological and physical photodetector elements of a camera or an eye.
On the other hand, the eye does possess a number of contrast enhancing (lateral inhibition) and image dithering (tremor) mechanisms which would make it somewhat superior to an equivalent photodetector array. It is reasonable to conclude that a photodiode or CCD array with the same photodetector spacing as the eye might have a somewhat poorer performance in terms of spatial resolution. Consequently, a "human equivalent" vision system should have a photodetector angular subtense smaller than 150 µrad, the photodetector spacing in the eye.
An empirical investigation completed as part of this work concluded that machine vision resolution of about 5 LPMM would provide human equivalent performance for typical inspection tasks (pixel spacing on object plane of 78 µm)(6). In the case of fish fillet inspection, the required pixel array for a single fillet image of 200 mm x 400 mm would be 2564 x 5128: about 50 times the 512 x 512 resolution of a standard video camera.
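The pixel array size follows directly from the object-plane pixel spacing. The sketch below reproduces that arithmetic, assuming a spacing of 78 micrometres and simple rounding to whole pixels; the helper names are hypothetical.

```python
def pixels_required(extent_mm, pixel_spacing_um=78):
    """Number of pixels needed to span extent_mm at the given object-plane spacing."""
    return round(extent_mm * 1000 / pixel_spacing_um)


def array_size(width_mm, height_mm, pixel_spacing_um=78):
    """Pixel array (columns, rows) needed to cover a width x height scene."""
    return (pixels_required(width_mm, pixel_spacing_um),
            pixels_required(height_mm, pixel_spacing_um))


def ratio_to_standard(array, standard=(512, 512)):
    """How many times more pixels the array holds than a standard video frame."""
    return (array[0] * array[1]) / (standard[0] * standard[1])
```

Calling `array_size(200, 400)` gives `(2564, 5128)`, and `ratio_to_standard((2564, 5128))` is a little over 50, matching the figures in the text.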
It should be noted that the human eye does not subtend an entire scene in a single view. The high resolution segment of the eye, the fovea, is about 0.3mm in diameter subtending about 15mrad(7). For viewing at a range of 500mm the area subtended in a single saccadic fixation is about 7.5mm in diameter. Peripheral vision provides low resolution (high contrast, monochrome) information with respect to outlying objects; saccadic movement of the eye steers the foveal image to specific points of interest building up a full image. Successive "subframes" or saccades are acquired at 3 to 10Hz(8). The total number of cones in the fovea is about 6 million. However, the total data acquisition by the eye, including peripheral vision, includes processed data from about 126 million receptors at a rate of about 10 saccades/second(9).
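The foveal footprint quoted above follows from small-angle geometry. The sketch below is illustrative only: it assumes an eye focal length of about 20 mm (the value listed in Table 1) and the small-angle approximation, and the function names are not from the original work.

```python
def angular_subtense_mrad(feature_mm, focal_length_mm):
    """Angle (mrad) subtended by a retinal feature, small-angle approximation."""
    return 1000.0 * feature_mm / focal_length_mm


def footprint_mm(subtense_mrad, range_mm):
    """Diameter (mm) on the object plane covered by that angle at the given range."""
    return subtense_mrad / 1000.0 * range_mm
```

With a 0.3 mm fovea and a 20 mm focal length, `angular_subtense_mrad(0.3, 20)` is about 15 mrad, and `footprint_mm(15, 500)` is about 7.5 mm, as stated in the text.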
2.2 INTENSITY DISCRIMINATION (HUMAN EYE)
Contrast discrimination is related to general scene illumination as well as to target size. Luminosity contrast is defined as(10):
$$C_L = \frac{|B - B_1|}{B_1}$$
Where B is the target luminance and B_1 is the background luminance. The limiting value for C_L under ordinary conditions is about 0.02 or 2%(11). The dynamic range of the combined photopic (colour) and scotopic (monochrome) systems is about 10^9 over the luminance range of 10^-6 cd/m^2 to 10^3 cd/m^2(12).
Optimal contrast sensitivity can be obtained from the eye or from a camera only when the eye is accommodated to scene illumination or when the camera is adjusted so that the image signal occupies a large percentage of the sensor dynamic range. For a CCD camera with an 8 bit dynamic range, the contrast discrimination between two elements is 1 in 256 only when both elements are close to saturation. At 50% saturation the discrimination is only 1 in 128.
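The dependence of contrast discrimination on signal level can be illustrated with a short calculation. This is a simplified sketch that considers quantization only (sensor noise is ignored, which is an assumption), and the function names are hypothetical.

```python
def smallest_contrast_step(bits, saturation_fraction):
    """Relative size of one gray level when the signal sits at the given
    fraction of full scale (1.0 means close to saturation)."""
    full_scale = 2 ** bits
    return 1.0 / (full_scale * saturation_fraction)


def min_saturation_for_threshold(bits, contrast_threshold):
    """Lowest fraction of full scale at which a single gray level still
    corresponds to a contrast no larger than contrast_threshold."""
    return 1.0 / ((2 ** bits) * contrast_threshold)
```

For an 8 bit camera, `smallest_contrast_step(8, 1.0)` is 1/256 and `smallest_contrast_step(8, 0.5)` is 1/128. Under the same assumptions, `min_saturation_for_threshold(8, 0.02)` is roughly 0.2, suggesting that the signal must occupy at least about a fifth of the sensor range before one gray level resolves the 2% human contrast limit.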
The capability of an 8 bit camera to match the human eye depends upon the scene illumination, aperture, shutter speed and other camera factors which control the degree of saturation in individual pixels. The same considerations apply to the human eye but the mechanisms are taken for granted.
The dynamic range of electronic camera systems spans the same range as the human eye; however, most commercial cameras are not capable of adapting to extremes of illumination within a single scene. Some of the newer digital cameras can modulate in-frame sensitivity, avoid blooming and provide good image quality over a wide range of scene illumination. The human eye copes with scene illumination variations by building up a composite picture from smaller elements of the "apparent" field of view. Scene visualization is possible over a broad dynamic range.
2.3 COLOUR DISCRIMINATION (HUMAN EYE)
Evaluation of colour discrimination in humans is typically done with standard colour discrimination test sets such as the Farnsworth-Munsell 100-Hue Test(13). The primary use of this test is to group individuals into classes of superior, average and low colour discrimination. The test set consists of 85 numbered colour caps grouped into 4 series. Standard procedure for testing is to have the subject arrange the colour caps in each series according to colour. Scores are calculated by counting 4 for each 2-cap transposition and 8 for each 3-cap transposition.
Approximately 68% of the population score between 20 and 100 which is considered to be average discrimination. Another 16% generally score between 0 and 16 which is considered superior color discrimination. The remaining 16% score over 100 which constitutes low color discrimination.
Colour cameras provide RGB output at 8 bits per colour. The effective colour discrimination for the camera will depend on the specifics of the colour filter/receptor designs.
3.0 SYSTEM DESIGN AND TESTING
A human equivalent machine vision system must match or exceed human eye function with respect to spatial resolution, contrast discrimination and spectral sensitivity. Canpolar East has designed a camera system (VE-379) that emulates some of the peripheral and foveal aspects of human vision, thereby providing the capacity to perform inspection tasks such as fish candling, traffic monitoring and other tasks that require high resolution processing.
The VE-379 is a dual camera system in which a wide field of view "low resolution" camera acquires a "frame" image of the scene for analysis to identify features of interest such as blobs, bright spots, motion, etc. The system then steers the view of a second "high resolution" camera toward the identified areas of interest (see Figure 2). The wide field of view camera provides about 1 mrad resolution which is sufficient to resolve 0.5 LPMM at 500 mm range. The high resolution camera provides a "subframe" with resolution of about 0.1 mrad which is sufficient to resolve 5 LPMM at 500 mm range. The steering system is able to relocate the subframe field of view to any location in less than 1 millisecond. The system is therefore capable of acquiring a different image with each subframe at 30 Hz.
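The selection step described above can be sketched in a few lines. The code below is not the VE-379 software: it is a hypothetical illustration that finds bright spots in a low resolution frame by thresholding and 4-connected flood fill, then converts each centroid to object-plane coordinates that a steering stage could use. The threshold, the function names and the assumption that image rows span 300 mm and columns span 400 mm are all illustrative choices.

```python
from collections import deque


def find_blob_centroids(image, threshold):
    """Return (row, col) centroids of 4-connected regions brighter than threshold.

    image is a list of equal-length rows of gray levels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill collects one connected bright region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            mean_r = sum(p[0] for p in pixels) / len(pixels)
            mean_c = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((mean_r, mean_c))
    return centroids


def pixel_to_object_mm(centroid, image_shape, fov_mm=(300.0, 400.0)):
    """Map a pixel centroid to object-plane coordinates (mm), using pixel centres.

    Assumes rows span fov_mm[0] and columns span fov_mm[1].
    """
    (r, c), (rows, cols) = centroid, image_shape
    return ((r + 0.5) * fov_mm[0] / rows, (c + 0.5) * fov_mm[1] / cols)
```

Each returned coordinate pair is a candidate fixation point for the high resolution subframe. A practical system would likely also rank the candidates and filter them by size or shape before steering.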
Both VE-379 cameras are single CCD 24 bit color cameras with RGB output. In the standard configuration the standoff distance of the VE-379 from the target is 500 to 750 mm. The frame field of view is approximately 300mm x 400mm and the high resolution subframe field of view is approximately 20mm x 25mm.
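The fields of view quoted above indicate why selective fixation matters. The sketch below estimates how many subframes would be needed to cover the whole frame exhaustively, assuming non-overlapping tiles and pairing the 300 mm side with the 20 mm side; both are simplifying assumptions and the function names are hypothetical.

```python
import math


def subframes_to_tile(frame_mm=(300, 400), subframe_mm=(20, 25)):
    """Number of non-overlapping subframes needed to cover the frame field of view."""
    return (math.ceil(frame_mm[0] / subframe_mm[0])
            * math.ceil(frame_mm[1] / subframe_mm[1]))


def exhaustive_scan_seconds(n_subframes, rate_hz=30):
    """Time to acquire n_subframes at the given subframe rate."""
    return n_subframes / rate_hz
```

Under these assumptions `subframes_to_tile()` returns 240, which at 30 Hz corresponds to about 8 seconds for an exhaustive scan. Fixating only on objects of interest selected from the frame image avoids that cost.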
Figure 2. VE-379 Frame and Subframe Images. (Image quality is degraded due to reproduction)
To verify that the VE-379 satisfied the essential requirements for human equivalent inspection, a series of tests was conducted to quantify performance characteristics. Tests included spatial resolution, intensity discrimination, colour discrimination and subframe acquisition rate.
3.1 SPATIAL RESOLUTION TEST (VE-379)
The spatial resolution of both the frame and the subframe images was tested using a standard 1951 USAF glass test pattern. Lighting conditions for the tests were controlled to give 100% transmitted illumination and 0% reflected illumination from the target surface. At 675 mm range with a field of view of 300 mm x 400 mm the MTF for the frame image was measured as 0.32 at 0.5 LPMM. For the high resolution subframe image at this range with a 20 mm x 25 mm field of view the MTF was measured as 0.50 at 5 LPMM. Figures 3a and 3b show the MTF test results for the frame and subframe images across the frame field of view and for various target orientations.
3.2 COLOUR DISCRIMINATION TEST (VE-379)
The colour discrimination of the frame and subframe images was measured using the Farnsworth-Munsell 100-Hue Test. The 85 colour test caps were arranged on an opaque white polyethylene board measuring 400 mm x 200 mm with controlled overhead lighting. A frame image was taken of the 85 test targets followed by subframe images of each colour cap. These images were saved and analysed to measure colour discrimination. The VE-379 scored 270 (low discrimination) for colour discrimination in the frame image and 40 (average discrimination) in the subframe image. Figures 4a and 4b show the colour discrimination test results for the low resolution and high resolution images respectively.
The scores for each colour discrimination test were calculated as follows:
- Chromaticity values are grouped according to the colour region they occupy (e.g. Red-Yellow).
- The chromaticity values for each region are plotted on an x-y graph.
- Each chromaticity value is labelled based on the corresponding colour cap number.
- The order of the chromaticity values in the x direction is determined.
- A raw score is calculated by measuring the distance between the label numbers of each point and its two neighbours (a perfect result will give a raw score of two for each chromaticity value).
- The score for each region is calculated by subtracting the perfect score value (2 x No. of values in region) from the raw score.
- The total score is calculated by summing the scores for each region.
Scoring: 0 - 16 = Superior; 20 - 100 = Average; over 100 = Low
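The scoring steps listed above can be sketched as follows. This is an illustrative reading of the procedure, not the code used in the tests. It assumes that cap numbers within a region are consecutive, that the x chromaticity increases with cap number, and that hypothetical fixed anchor caps sit just outside each region so that the end caps also have two neighbours. Treating scores of 17 to 19 as Average is a further assumption, since the scale above leaves that gap undefined.

```python
from collections import defaultdict


def region_score(points):
    """Score one colour region; points is a list of (cap_number, x_chromaticity)."""
    # Order the caps by their measured x chromaticity.
    ordered = [cap for cap, _x in sorted(points, key=lambda p: p[1])]
    # Assumption: hypothetical anchor caps numbered just below and just above
    # the region give every cap two neighbours.
    padded = [min(ordered) - 1] + ordered + [max(ordered) + 1]
    # Raw score: label distance from each cap to both of its neighbours.
    raw = sum(
        abs(padded[i] - padded[i - 1]) + abs(padded[i] - padded[i + 1])
        for i in range(1, len(padded) - 1)
    )
    # A perfect arrangement contributes 2 per cap, so subtract that baseline.
    return raw - 2 * len(ordered)


def total_score(measurements):
    """measurements: iterable of (region_name, cap_number, x_chromaticity)."""
    regions = defaultdict(list)
    for region, cap, x in measurements:
        regions[region].append((cap, x))
    return sum(region_score(pts) for pts in regions.values())


def classify(score):
    """Map a total score onto the Superior / Average / Low classes."""
    if score <= 16:
        return "Superior"
    if score <= 100:
        return "Average"  # assumption: 17-19 is treated as Average
    return "Low"
```

In this sketch a correctly ordered region scores 0, a swap of two adjacent caps scores 4 and a reversed run of three caps scores 8, which agrees with the transposition penalties described in Section 2.3.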
3.3 INTENSITY DISCRIMINATION TEST (VE-379)
Intensity discrimination of the VE-379 was tested using a Neutral Density Test Target set with 32 distinct grayscale steps. The test targets were arranged on an opaque white polyethylene board measuring 400 mm x 200 mm. A frame image was taken of the 32 test targets followed by subframe images of each grayscale target. These images were saved and analysed to measure intensity discrimination. The results show that the VE-379 delivers 100% intensity discrimination for the 32 grayscale steps in both the frame and the subframe images. Figures 5a and 5b show the intensity discrimination test results for the frame and subframe images respectively. The theoretical intensity discrimination of 256 gray levels for the VE-379 cameras was not tested since the maximum number of grayscale steps available in an off-the-shelf test target is 32.
3.4 SUBFRAME RATE
The subframe acquisition rate of the VE-379 was tested using a custom test target. The target measured 200mm x 400mm and consisted of 30 black numbered circles on a white background. The VE-379 was able to process the frame image to identify the circle locations and consistently capture high resolution subframe images of 25 circles all within one second. The actual total number of subframes acquired depends on the time required to process the frame image to identify objects for subframe capture.
Figure 3a. VE-379 MTF Over Field of View
Figure 3b. VE-379 MTF For Various Target Orientations
Figure 4a. VE-379 Frame Colour Discrimination. (Averaged over a 10 x 10 pixel region)
Figure 4b. VE-379 Subframe Colour Discrimination. (Averaged over a 150 x 150 pixel region)
Figure 5a. VE-379 Frame Intensity Discrimination
Figure 5b. VE-379 Subframe Intensity Discrimination
4.0 CONCLUSION
The VE-379 machine vision system is capable of emulating several of the human eye functions essential for human equivalent inspection. Spatial resolution, intensity discrimination, colour discrimination and subframe acquisition rate of the VE-379 have been tested and results are comparable to human eye performance. Based on the test results the VE-379 is well suited for human equivalent industrial inspection tasks.
Table 1 shows the comparative performance between the VE-379 and the human eye.
Table 1: Comparative Performance Between Human Eye and VE-379.
| PARAMETER | HUMAN EYE | VE-379 |
|---|---|---|
| FIELD OF VIEW: Peripheral | 2.6 rad (14) | 1 rad |
| FIELD OF VIEW: Foveal / Subframe | 10-20 mrad (15) | 17 mrad |
| SPATIAL RESOLUTION: Foveal / Subframe | ~0.1 mrad | ~0.1 mrad |
| INTENSITY DISCRIMINATION | 0.02 (16) | 0.03 (17) |
| COLOUR DISCRIMINATION | Superior (0 - 16); Average (20 - 100); Low (100+) | Frame score = 270; Subframe score = 40 |
| SACCADIC FIXATION RATE / Subframe Rate | 10 Hz (18) | 30 Hz |
| SPATIAL RESOLUTION @ 500 mm: Foveal / Frame / Subframe | 4 LPMM | 0.5 LPMM @ MTF 0.5 (frame); 5 LPMM @ MTF 0.32 (subframe) |
| LENS FOCAL LENGTH | ~20 mm (19) | 6 mm (frame); 60-300 mm (subframe), adjustable |
| APERTURE (Variable) | 1.5 to 8 mm (20) | 0-4 mm (frame), adjustable; 10 mm (subframe), adjustable |
5.0 REFERENCES
1. Smith, W.J., Modern Optical Engineering, McGraw Hill, New York, p. 121. 1990.
2. Begbie, G.H., Seeing and The Eye: An Introduction to Vision, The Natural History Press, New York, p. 93. 1969.
3. Wilman, C.W., Seeing and Perceiving, Pergamon Press, Great Britain, 1976.
4. Overington, I., Vision and Acquisition, Pentech Press, London, p. 224. 1976.
5. Overington, I., Vision and Acquisition, Pentech Press, London, p. 86. 1976.
6. Gosine, R. G., Investigation of Computer Vision Techniques for Surface Parasite Detection. Contract Report for Canpolar East Inc. 1994.
7. Overington, I., Vision and Acquisition, Pentech Press, London, p. 7. 1976.
8. Overington, I., Vision and Acquisition, Pentech Press, London, p. 37. 1976.
9. Smith, W.J., Modern Optical Engineering, McGraw Hill, New York. 1990.
10. Overington, I., Vision and Acquisition, Pentech Press, London, p. 48. 1976.
11. Smith, W.J., Modern Optical Engineering, McGraw Hill, New York, p. 126. 1990.
12. Overington, I., Vision and Acquisition, Pentech Press, London, p. 50. 1976.
13. Farnsworth, D., The Farnsworth-Munsell 100-Hue Test for the Examination of Colour Discrimination, Macbeth, Division of Kollmorgen Instruments Corp., New York. 1957.
14. Smith, W.J., Modern Optical Engineering, McGraw Hill, New York, p. 122. 1990.
15. Overington, I., Vision and Acquisition, Pentech Press, London, p. 7. 1976.
16. Smith, W.J., Modern Optical Engineering, McGraw Hill, New York, p. 126. 1990.
17. Actual Intensity Discrimination test was limited to 32 gray levels which is the maximum number of steps in available grayscale targets.
18. Overington, I., Vision and Acquisition, Pentech Press, London, p. 37. 1976.
19. Smith, W.J., Modern Optical Engineering, McGraw Hill, New York, p. 121. 1990.
20. Overington, I., Vision and Acquisition, Pentech Press, London, p. 7. 1976.