S. Utku Ay
One day an optometrist was talking to his profoundly blind patient about the possibility of an eye implant that would give him 16 (4x4) pixels of visual information. The patient told the doctor, “Sometimes I just need one pixel; I want to see whether the light is on or off.”
Human beings are visually oriented in their daily life; they use the sense of sight more than any of the other senses with which they have been endowed. The modern understanding of human vision and its underlying principles was developed only in the past two centuries. The nineteenth and twentieth centuries witnessed the development of photographic and digital camera systems, which partially mimic the human visual system. We will open a small window on the history of human vision and camera systems, and try to compare today’s state-of-the-art cameras with the human visual system, focusing mainly on solid-state image sensors, or camera chips, and on the image-sensing element of the human visual system, the eye.
History of human vision
Human vision has been the subject of conflicting interpretations since ancient times. Many ancient physicians and philosophers believed in the theory of extramission, or the active eye. According to this theory, the eye perceives objects by emanating light and seizing them with its rays. It was in medieval Islamic culture that research on human vision and optics developed into a system resembling the modern theory of vision. Among others, Ibn Al-Haytham (Alhazen) (965-1040 A.D.), a Muslim physicist, astronomer, and mathematician, played a great part in this field by promoting the intromission theory, which states that vision occurs only because light rays enter the eye. Ibn Al-Haytham founded physiological optics, which distinguished the functioning of the eye from the behavior of light. Interestingly, ten centuries after Ibn Al-Haytham, Winer et al. (2002) found evidence that as many as 50% of American college students still believe in the extramission theory.1
Although the fundamental features, anatomy, and physiology of the eye were documented by Galen (129–200 A.D.), an ancient Greek physician, in the second century A.D., it was Kepler, a close reader of Ibn Al-Haytham, who offered the first theory of the retinal image and the correct operation of the eye in 1604. He proclaimed, “Therefore vision occurs through a picture of the visible things on the white, concave surface of the retina.” Progress came slowly after Kepler, because little was known about the nervous system until the nineteenth century, and only recently have scientists acquired a deeper knowledge of how the brain apprehends the retinal image. Many questions still elude us.
History of the camera
In parallel with curiosity about human vision, human beings have also tried to mimic it by capturing images of objects with instruments. Around 1000 A.D., Ibn Al-Haytham, also known as the father of modern optics, invented the pinhole camera,2 and explained why its image was upside down. It was Johannes Kepler who suggested the use of a lens to improve the pinhole camera in the 1600s. Capturing an image on a photographic plate was first achieved in the early 1800s, and photographic cameras began to be mass-marketed in the twentieth century. The photographic equipment with which we are all familiar today, such as the 35mm camera, the flash bulb, the Polaroid camera, and the point-and-shoot autofocus camera, was all developed in the twentieth century. The invention of the camera as we know it today paved the way for other technologies, including moving-image capture and, later, the digital camera, in which electronic image-capture devices are used. In 1972, Texas Instruments patented the first filmless electronic camera, and chemically processing an image onto photographic film was no longer its sole destination. Filmless electronic cameras were made possible by the invention of solid-state image-capture devices, the charge-coupled device (CCD) and the metal-oxide-semiconductor (MOS) image sensor, in the late 1960s. Since the invention of solid-state imagers, people have become more visually stimulated and oriented than ever before in history.
A comparison of camera chips and the human eye
The technological advancements in solid-state image-capture camera chip design and manufacturing during the past twenty-five years have made digital imaging more affordable and accessible to the general public. These advancements are most visible to consumers in mobile products, particularly cellular phones, which now carry still- and video-camera functions. Although digital cameras are easily available today, the state-of-the-art image sensor chips used in them exhibit a performance gap when compared with the capabilities of the human eye. How good are these image sensor chips compared with our eyes? That question is elaborated below.
It is possible to compare the capabilities of the human eye with those of the state-of-the-art image sensor chips used in cellular phones or in mainstream PC and digital still cameras. It is also possible to compare the capabilities of the human visual system as a whole, including the eyes, the optic nerve, and the visual cortex, with those of a digital camera system, which includes optics, image-capture and signal-processing chips, and other camera apparatus. The capabilities compared here include the ability to see different colors (spectral response), photo-element (pixel) characteristics (size, density, distribution), light sensitivity and light-intensity response range, operation principles, and signal-processing capabilities.
Spectral response
A single light-sensing element in a solid-state image sensor is called a pixel; in the human eye it is called a photoreceptor. Both elements convert impinging light, or photons, into electrical signals. The human eye sees in the so-called visible spectrum, between 380 nm (blue) and 750 nm (red), and utilizes two kinds of photoreceptors on the retina: rods and cones. The cones are used for color and daylight vision; the rods are responsible for night vision. There are three types of cone photoreceptors on the retina, containing different photosensitive pigments. The three types are L, M, and S, with pigments that respond best to light of long or red (peak at 564 nm), medium or green (peak at 534 nm), and short or blue (peak at 420 nm) wavelengths, respectively. The rods (R) are most sensitive at a wavelength of approximately 498 nm (green), as seen in Figure 1.3 Image sensor pixels in digital cameras mimic the photoreceptors of the human eye for color vision: they utilize three kinds of color filters (red, green, blue) on top of the pixels to convert light rays into electrical signals in different parts of the visible spectrum. Unlike the cones in the human eye, camera pixels and color filters can be designed to cover spectra that are not visible to the human eye, for instance the x-ray, ultraviolet, and infrared bands. In the category of spectral response range, then, camera pixels exhibit greater flexibility than the photoreceptors of the human eye. On the other hand, interestingly enough, human eyesight has spectral characteristics similar to those of sunlight: solar emission peaks in the visible spectrum, as seen in Figure 2.4 A rough numerical sketch of the receptor response curves follows the figures below.
Figure 1. Spectral absorption curves of the short (S), medium (M), and long (L) wavelength pigments in human cone and rod cells.3
Figure 2. The daylight solar spectral power distribution on earth.4
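To make the spectral-response comparison concrete, here is a minimal numerical sketch of the receptor curves in Figure 1. It uses Gaussian curves as a crude stand-in for the real pigment absorption spectra, which are asymmetric; the peak wavelengths come from the text above, but the curve widths are illustrative assumptions only.

```python
import math

# Illustrative Gaussian stand-ins for the pigment absorption curves in
# Figure 1. Peak wavelengths are from the text; the widths (sigma, in nm)
# are rough assumptions chosen only to make the curves overlap plausibly.
RECEPTORS = {
    "S (blue cone)":  (420.0, 35.0),
    "R (rod)":        (498.0, 40.0),
    "M (green cone)": (534.0, 40.0),
    "L (red cone)":   (564.0, 45.0),
}

def sensitivity(peak_nm: float, sigma_nm: float, wavelength_nm: float) -> float:
    """Normalized Gaussian response (1.0 at the peak wavelength)."""
    return math.exp(-0.5 * ((wavelength_nm - peak_nm) / sigma_nm) ** 2)

# Sample the visible band named in the text (380-750 nm).
for wl in range(380, 751, 50):
    responses = ", ".join(
        f"{name.split()[0]}={sensitivity(p, s, wl):.2f}"
        for name, (p, s) in RECEPTORS.items()
    )
    print(f"{wl} nm: {responses}")
```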
Pixel and array size
The size of pixels in today’s digital cameras is getting closer to the size of the photoreceptors in the human eye. The typical human eye contains an average of 130 million photoreceptors. The diameter of the rods and cones varies between 1.0 µm and 8.0 µm, depending on their location on the retina.5 Today’s state-of-the-art image sensor chips contain 10 to 30 million pixels, and each pixel can be as small as 1.4 µm across. To date, however, no image sensor has combined pixels as small as 1.4 µm with an array of more than 8 million of them. The human being, by contrast, has been equipped with photoreceptors as small as 1.0 µm, and more than 100 million of them, since the beginning of human existence. It is also estimated that the resolution of the human eye is equivalent to an image sensor chip of 576 million pixels with a 120-degree field of view.6 Thus we still have a long way to go in improving the pixel and array sizes of the image sensors used in cameras if we are to match the human eye.
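The 576-megapixel figure can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes a 120 x 120 degree field of view sampled uniformly at about 0.3 arc-minutes per point, the acuity figure such estimates typically use; real eyes sample densely only near the fovea, so this is an upper-bound style estimate.

```python
# Back-of-the-envelope reconstruction of the "576 megapixel" estimate.
# Assumptions: a 120 x 120 degree field of view, sampled uniformly at
# 0.3 arc-minutes per pixel (the acuity figure such estimates typically use).
FIELD_OF_VIEW_DEG = 120
ACUITY_ARCMIN = 0.3            # angular size of one "pixel"

pixels_per_side = FIELD_OF_VIEW_DEG * 60 / ACUITY_ARCMIN   # degrees -> arcmin
total_pixels = pixels_per_side ** 2

print(f"{pixels_per_side:.0f} x {pixels_per_side:.0f} "
      f"= {total_pixels / 1e6:.0f} million pixels")
# -> 24000 x 24000 = 576 million pixels
```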
Pixel distribution and formation
In the human eye, photoreceptor size and density change depending on location on the retina. For example, no rods exist at the focal center of the eye, which is called the fovea. The color-vision photoreceptors, which make up only 10% of the eye’s photoreceptors, are located mostly on the fovea. The distribution of photoreceptors is irregular and unique to every human being, like a fingerprint; yet we all see things the same way, such as colors (with the exception of people who are colorblind). In camera chips, however, pixels are arrayed regularly in two dimensions. Because the image-processing techniques and algorithms used in camera systems are linear and do not closely mimic the signal processing of the human visual system, regularly arrayed pixels are required.
Light sensitivity and response range
Although the pixel sizes in image-sensor chips are approaching the size of the photoreceptors in the human eye, camera systems are not yet close to matching the eye’s light sensitivity and response range. The human visual system and its photoreceptors easily adapt to very dim and very bright light, with a light-intensity response range of ten billion to one (10¹⁰:1).7 This range spans conditions from a bright sunny day down to dim night vision. A conventional consumer camera pixel, by contrast, typically has a light-intensity response range of one thousand to one (10³:1).8 In a camera system, details of a captured scene are either concealed in the dark regions or washed out by the bright light, depending on the exposure settings. Thus one could say that, in transferring scenes into images, the human visual system covers an intensity range ten million times (10⁷) wider than that of consumer cameras.
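To put the two response ranges on a common scale, the ratios can be converted into decibels and photographic stops. The sketch below uses the 20·log10 convention common in image-sensor datasheets; only the two ratios quoted above go into it.

```python
import math

def dynamic_range(ratio: float) -> tuple[float, float]:
    """Return (decibels, photographic stops) for an intensity ratio.

    Uses the 20*log10 convention common in image-sensor datasheets.
    """
    return 20 * math.log10(ratio), math.log2(ratio)

for label, ratio in [("human visual system", 1e10),
                     ("consumer camera pixel", 1e3)]:
    db, stops = dynamic_range(ratio)
    print(f"{label}: {ratio:.0e}:1 = {db:.0f} dB = {stops:.1f} stops")

# human visual system:   1e+10:1 = 200 dB = 33.2 stops
# consumer camera pixel: 1e+03:1 =  60 dB = 10.0 stops
```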
Operation principle
In terms of operation principles, the photoreceptors in the human eye convert light rays into electrical signals through extremely rapid electro-chemical reactions that can detect a single photon. In the image-sensor pixel of a digital camera, the photoelectric effect is typically used to convert impinging photons into electrical charge. The charge is collected and stored in each pixel during the exposure period, then amplified and converted into digital ones (logic-1) and zeros (logic-0) during image readout, before the image is sent to higher processing elements such as a personal computer or a digital still or video camera. Single-photon-counting cameras can be built, but they require very special, larger pixels and extra apparatus. Thus we could say that, with today’s state-of-the-art technology, it is almost impossible to build imaging pixels with both the capability and the dimensions of the photoreceptors of the human eye.
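The charge-collection chain described above can be sketched in a few lines. The quantum efficiency, full-well capacity, and ADC depth below are illustrative assumptions, not the values of any particular sensor.

```python
def expose_pixel(photon_flux: float, exposure_s: float,
                 quantum_efficiency: float = 0.5,   # assumed: electrons/photon
                 full_well_e: int = 10_000,         # assumed: charge limit
                 adc_bits: int = 10) -> int:
    """Model one pixel of the chain in the text: photons -> photoelectric
    conversion -> charge collection (clipped at the full well) -> digital code.
    """
    photons = photon_flux * exposure_s
    electrons = min(photons * quantum_efficiency, full_well_e)  # saturation
    levels = 2 ** adc_bits - 1
    return round(electrons / full_well_e * levels)              # quantization

# A dim and a bright patch under the same 10 ms exposure:
print(expose_pixel(photon_flux=2_000, exposure_s=0.01))      # dim -> low code
print(expose_pixel(photon_flux=5_000_000, exposure_s=0.01))  # bright -> 1023
```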
Signal processing capabilities
The image captured by the human eye is preprocessed before it is sent to the visual cortex of the brain. This preprocessing is a data-reduction operation in which nothing is lost, with a compression ratio of 130 to 1: only about 1 million optic nerve fibers leave each eye, carrying the information from 130 million photoreceptors. This compression allows the brain to process information at a rate of 25 to 150 scenes, or frames, per second. In an image sensor chip, by contrast, every pixel is typically first transferred to higher processing units; data compression, if applied at all, is carried out afterward, usually with some loss of detail in the image. The transfer of frames in camera chips typically takes place sequentially, limiting the speed of the image-capture operation, or frame rate. Different techniques are used to maintain a capture rate of, at most, 25 frames per second in typical camera chips, although with today’s technology image sensors with a capture rate of one million frames per second have been proposed and can be manufactured for scientific applications.
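The 130-to-1 figure follows directly from the element counts quoted above; the short sketch below makes that arithmetic, and the per-second sample reduction it implies at the cited frame rates, explicit.

```python
# Element-count arithmetic behind the 130:1 compression figure quoted above.
PHOTORECEPTORS_PER_EYE = 130_000_000
OPTIC_NERVE_FIBERS_PER_EYE = 1_000_000

ratio = PHOTORECEPTORS_PER_EYE / OPTIC_NERVE_FIBERS_PER_EYE
print(f"compression: {ratio:.0f}:1")  # -> 130:1

# At the 25-150 frames/s range cited in the text, the retina reduces
# receptor samples to nerve-fiber signals at roughly these rates:
for fps in (25, 150):
    print(f"{fps} fps: {PHOTORECEPTORS_PER_EYE * fps / 1e9:.2f} billion "
          f"samples/s -> {OPTIC_NERVE_FIBERS_PER_EYE * fps / 1e6:.0f} "
          f"million signals/s")
```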
The inherent imperfections of image capture in today’s image-sensor chips are hidden by exploiting the limitations of the human eye. For example, solid-state image sensors have always been produced with row- or column-wise stripe artifacts, which are easily picked up by the human eye. However, psycho-visual experiments have shown that the human eye can only detect the contrast between two adjacent gray lines when the difference is greater than 0.5%. Thus, if a camera chip is designed to have a column-to-column or row-to-row contrast of less than 0.5%, these odd stripes are not visible.
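That design rule is easy to express as a check. The sketch below flags a set of simulated column mean levels as visible striping whenever two adjacent columns differ by more than 0.5%; the column values themselves are made up for illustration.

```python
# Check simulated column mean levels against the ~0.5% visibility threshold
# for adjacent-stripe contrast mentioned above. Column values are made up.
VISIBILITY_THRESHOLD = 0.005  # 0.5% contrast between adjacent gray stripes

def stripes_visible(column_means: list[float]) -> bool:
    """True if any adjacent column pair differs by more than the threshold."""
    return any(
        abs(a - b) / max(a, b) > VISIBILITY_THRESHOLD
        for a, b in zip(column_means, column_means[1:])
    )

well_corrected = [100.0, 100.2, 99.9, 100.1]    # <0.5% column-to-column
poorly_corrected = [100.0, 101.5, 99.0, 100.8]  # >0.5% column-to-column
print(stripes_visible(well_corrected))    # False: below threshold, unseen
print(stripes_visible(poorly_corrected))  # True: stripes would be seen
```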
Conclusion
Humans are visually oriented, and without a doubt our eyes are our primary source of information. The human visual system is extremely complex, and this complexity has fascinated human beings throughout history. Yet the underlying principles and basic functions of human vision and the eye have only been discovered during the last two centuries. These discoveries have guided research into mimicking those functions, which has produced moving and still photographic camera equipment and the image-sensor chips used in digital cameras today. Even though human beings are only taking baby steps toward fully mimicking the human eye, curiosity and scientific inquiry allow us to discover functions and features of the eye and the visual pathways that will increase our knowledge and help us build better pixels and image-sensor chips.
References
1. Winer, G. A., Cottrell, J. E., Gregg, V., Fournier, J. S., & Bica, L. A., “Fundamentally misunderstanding visual perception: Adults’ beliefs in visual emissions.” American Psychologist, 57, 417-424, 2002.
2. Ertan Salik, “Pinhole Cameras, Imaging, and The Eye” The Fountain Magazine, Issue 54, pp. 30-33, April – June 2006.
3. URL: http://en.wikipedia.org/wiki/Image:Cone-response.png
4. URL: http://www.handprint.com/HP/WCL/color3.html
5. Stefan Winkler, Digital Video Quality – Vision Models and Metrics, John Wiley & Sons, Ltd., 2005.
6. URL: http://www.clarkvision.com/imagedetail/eye-resolution.html
7. R.C. Gonzalez and R.E. Woods, Digital Image Processing, Addison-Wesley, 1993.
8. M. Schanz, et al., “A high-dynamic-range CMOS image sensor for automotive applications,” IEEE Journal of Solid-State Circuits, vol. 35, no. 7, pp. 932-938, July 2000.