Exploring image retrieval systems

Increasingly there is a need to be able to efficiently search and browse large databases of images. For example, users often search the web looking for images of a certain type of object or scene: perhaps a picture of a beach at sunset. Typically searches such as these are performed using keywords. So for example, the user looking for a beach at sunset might enter the words "beach" and "sunset" into a search engine and browse the resulting matches looking for relevant images. It has been argued that a useful alternative mode of searching would be to provide a basic outline sketch of the image that is being searched for and that a search engine would find images conforming to this sketch. Alternatively, the user might already have an example of the kind of image they are looking for and would like to be able to search for similar images. In such cases, it would be useful were it possible to perform the search based on the content of the image. That is, we would like to be able to extract some information about the image content from an image and to then search for other images which have similar content.

Indeed, a number of researchers have developed quite sophisticated image search engines [9,10,1] which search using image content. In particular, many researchers have looked at using the colour information in an image to find similar images. Such an approach is often sensible; for example, consider again the case of an image of a beach at sunset: such images will likely contain large regions of yellow sand, a red or orange sun, and perhaps blue from the sky and the sea. Were we able to find all images containing such colours we might expect to find amongst them a large number of beach scenes.

In the Colour Group we are interested in further exploring the use of colour in these image retrieval and indexing applications. Below we provide a brief introduction to the use of colour as a tool in this area and outline the main research themes of the Colour Group in image retrieval.

Colour indexing

Before we look at using colour as an aid to image retrieval it is helpful to have an understanding of how colour information is represented in an image. To understand this, consider the image shown in Figure 1. The colour information in this image takes the form of an RGB triplet at each pixel. That is, at each pixel we have three numbers, R, G, and B, which tell us, respectively, how red, green, and blue the colour at that pixel is. Colour, then, is a 3-dimensional phenomenon and so can be conveniently represented in a 3-d co-ordinate system whose axes correspond to the R, G, and B information. So, the colour at a pixel can be represented as a point in this three-dimensional space. Figure 1b shows the colours of the image in Figure 1a represented as points in this space. Importantly, an image cannot contain arbitrary R, G, and B responses; rather, the triplet of values at a pixel is restricted to some finite range. For example, in a typical image each of R, G, and B is coded using 8 bits and so a response can take an integer value between 0 and 255. This implies that the set of possible colours observable in an image is restricted to a cube, often referred to as the RGB cube.
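To make this concrete, here is a small sketch of how each pixel's triplet becomes a point inside the RGB cube (the 2 x 2 image and its pixel values are made up for illustration):

```python
# A made-up 2x2 image of 8-bit (R, G, B) triplets.
image = [
    [(255, 200, 90), (250, 195, 85)],   # sandy yellows
    [(30, 60, 180), (255, 120, 40)],    # sea blue and sunset orange
]

# Each pixel is a point in the 3-d RGB cube: every channel lies in [0, 255].
points = [pixel for row in image for pixel in row]
assert all(0 <= channel <= 255 for p in points for channel in p)
```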

Figure 1: An image (left) and the distribution of its colours in RGB space (right).

An image then is a representation of the colour information in a scene but for the purposes of image retrieval it is not a very useful one. To see why, consider trying to use this representation to compare the colour information of two images. To do so would involve comparing colour information on a pixel by pixel basis - a computationally expensive procedure. Furthermore, other immediate problems arise such as how do we compare two images of different size? And what happens if the two images contain the same single object but differently located in each case?

Figure 2: An example of a 3-d colour histogram for the image shown in Figure 1. The blueness of a block corresponds to the number of pixels in the image with the corresponding RGB value.

Swain and Ballard [11], the first researchers to investigate the use of colour as an aid to image indexing, suggested that these problems could be overcome by using an alternative representation of the colour information in an image: the colour histogram. The colour histogram of an image is formed in the following way. First, the RGB cube is divided into a set of discrete cells or bins. Then, for each pixel in the image it is determined which of these bins its RGB value falls within. A count of the number of pixels assigned to each bin is kept and, after all pixels have been processed, these counts are divided by the total number of image pixels. Thus we have, at each bin, a measure of the proportion of image pixels which fall into that bin. Figure 2 is an illustration of the histogram for the image in Figure 1. Each region or bin of this figure is coloured more or less brightly according to the proportion of the image pixels which fall within it.
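The construction just described can be sketched in a few lines of Python. The image is assumed to be a flat list of 8-bit (R, G, B) tuples, and the choice of 4 bins per channel is an arbitrary illustrative one:

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Quantise the RGB cube into bins_per_channel**3 bins and return the
    proportion of pixels falling into each occupied bin (bin index -> fraction).
    bins_per_channel=4 is an arbitrary illustrative choice."""
    step = 256 // bins_per_channel                # bin width along each axis
    counts = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)   # which cell the pixel lands in
        counts[key] = counts.get(key, 0) + 1
    n = len(pixels)
    return {k: c / n for k, c in counts.items()}  # normalise by pixel count
```

For example, three pixels of which two are red-ish and one blue give a histogram with proportion 2/3 in the red corner bin and 1/3 in the blue corner bin.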

Swain and Ballard showed that by representing colour information in this way it was possible to discriminate between a reasonably large set of colour images. This work was only the beginning of an ongoing investigation into the problem of colour-based image retrieval. In the Colour Group we have continued this investigation in a number of important ways. First, it is well known that the RGBs recorded in an image depend not just on the properties of the surfaces being imaged but also on, amongst other things, the light under which they are imaged. If this fact is not taken into account it is easy to demonstrate [5] that colour indexing will fail. The Colour Group has made a number of contributions towards overcoming this problem, which are described elsewhere on these pages. A second issue with colour-based indexing as originally proposed also relates to the fact that "colour" is represented as RGB. In fact, there are many alternative ways to represent the colour information at a pixel, and we in the Colour Group have been investigating which of these representations is most appropriate to the task at hand. A third issue with the original colour-indexing work pertains to its use of the colour histogram to represent the colour information in an image. While this is a better representation than the image itself, comparing colour histograms can still be computationally demanding. To overcome this we have been investigating still more efficient representations of the colour information in an image. Some of the contributions we have made to the latter two issues are outlined below.
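For completeness: histograms built this way are typically compared with Swain and Ballard's histogram intersection measure. A minimal sketch, assuming histograms stored as dictionaries mapping bin indices to proportions (as in the snippet above is one possibility; any keyed container works):

```python
def histogram_intersection(h1, h2):
    """Swain and Ballard's match measure: the sum of the element-wise minima
    of two normalised histograms. 1.0 means identical colour distributions;
    0.0 means the images have no quantised colours in common."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))
```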

Compressing chromaticity histograms

Figure 3 shows an RGB image (left) and its corresponding 2-d chromaticity histogram (right). A chromaticity histogram is built in the same way as the colour histogram described above, except that each pixel is first reduced to a 2-d chromaticity: a representation of its colour which is independent of intensity. Here we have represented the chromaticity histogram itself as a grey-scale image: each pixel of the image corresponds to a bin of the quantised chromaticity space and the brighter the pixel the greater the corresponding histogram value. Thinking of the chromaticity histogram as an image is useful since it leads naturally to the idea of applying conventional image compression techniques to the compression and representation of the histogram information.

Figure 3: An image (left) and its corresponding 2-d chromaticity histogram displayed as an image (right).

The most widely used image compression technique is JPEG which is founded on the observation that a lot of image data is redundant and can be discarded with little or no loss in image quality. An example of this redundancy is the fact that most images contain large regions of identical or very similar pixel values. Another way of saying this is that pixel values change slowly across the spatial extent of an image. Information about how quickly pixels change across the image can be formally quantified by looking at a spatial frequency representation of the image. There are many transforms of an image which provide such a representation, for example the Discrete Fourier transform (DFT) or the Discrete Cosine Transform (DCT). Figure 4 shows a grey-scale version of the image in Figure 3 together with its corresponding Discrete Cosine Transform representation (the representation used by JPEG). In this representation each point (or pixel) corresponds to a particular spatial frequency and the value of the function at this point tells us something about how much of the image information is captured at that pixel. From the figure it is clear that most of the image information is contained in a few spatial frequency elements and that these elements correspond to low frequencies. JPEG works by throwing away information which is held in the higher spatial frequencies of the image, thus reducing the storage requirements for an image.

Figure 4: A grey-scale image (left) and its Discrete Cosine Transform representation (right).

In our work on coding colour histograms we have investigated whether such a frequency coding of a 2-d histogram is an efficient coding. We discovered that the answer is yes: the salient information held in a colour histogram is adequately represented by just a few spatial frequency components. To understand how the method works, consider a DCT of a 2-d histogram. Each element of the cosine transform corresponds to a 2-d cosine signal of a certain frequency. For example, Figure 5 shows the first three spatial frequency functions. The value at a point in the transform tells us how much the corresponding frequency contributes to the histogram. So, to reconstruct a histogram from its frequency representation we take a weighted sum of the frequency components, where the weights are given by the frequency representation. Up to now we have not reduced the amount of data needed to represent a histogram: there are as many spatial frequency elements as there are bins in our histogram. But, since most of the elements in the frequency representation are small, they contribute little to the weighted sum from which the histogram is reconstructed. Thus, by taking a weighted sum of just a few frequency components we can obtain a very good approximation to the original histogram. What is more, we can now represent the histogram by just a few numbers: the weights of those frequencies required to give a good approximation to the original histogram.
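The whole pipeline (transform, discard high-frequency weights, reconstruct from the few weights kept) can be illustrated with a pure-Python DCT on a made-up smooth 8 x 8 surface standing in for a histogram; keeping a 4 x 4 block of low-frequency coefficients leaves 16 numbers in place of 64:

```python
import math

# Unnormalised 1-d DCT-II and its inverse (DCT-III with rescaling).
def dct1(x):
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def idct1(X):
    N = len(X)
    return [X[0] / N + (2.0 / N) * sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                                       for k in range(1, N))
            for n in range(N)]

# 2-d transforms: apply the 1-d transform along rows, then along columns.
def dct2(a):
    rows = [dct1(r) for r in a]
    cols = [dct1(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def idct2(A):
    cols = [idct1(list(c)) for c in zip(*A)]
    back = [list(r) for r in zip(*cols)]
    return [idct1(r) for r in back]

# A made-up smooth 8x8 surface standing in for a 2-d histogram.
N = 8
hist = [[math.exp(-0.15 * (i + j)) for j in range(N)] for i in range(N)]

coeffs = dct2(hist)

# Keep only the 4x4 block of lowest-frequency coefficients:
# 16 weights now stand in for all 64 histogram bins.
kept = [[coeffs[i][j] if (i < 4 and j < 4) else 0.0 for j in range(N)]
        for i in range(N)]

approx = idct2(kept)
err = max(abs(hist[i][j] - approx[i][j]) for i in range(N) for j in range(N))
# err is small: the few low-frequency weights reconstruct the surface well
```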

Figure 5: The first 3 basis functions of the discrete cosine transform.

In our work [7] we have shown that very good histogram compression can be achieved in this way: typically, 16 x 16 2-d chromaticity histograms can be accurately represented by just 64 frequency components, which represents a factor of 4 reduction in data. We have shown also that a range of different frequency transforms (DCT, DFT, Hadamard, etc.) can all be used to successfully compress the data.

Hybrid Coding of Histograms

Frequency transforms such as the DCT, DFT, and Hadamard transform differ in their underlying spatial frequency functions. In the case of the DCT these functions are 2-d cosine functions of different frequencies (as shown in Figure 5), whereas in the case of the DFT the functions are combinations of sine and cosine functions. Since these functions form the basis of our histogram representation, it might be thought that the particular set of functions we choose would have an effect on the accuracy of the representation. This raises the further question of whether there is a particular representation which is optimal. It is well known both that the choice of basis function is important and that one particular transform - the Karhunen-Loeve transform - is optimal in the sense that it provides the most parsimonious description of the data. The drawback of this transform is that its basis functions are not fixed (as they are in the DCT, DFT, etc.) but must be discovered from the particular data being modelled. Essentially the basis functions are derived from a Principal Components Analysis [8] (PCA) of (in this case) the colour histograms. For such basis functions to be accurate it is important that the data from which they are derived adequately samples the underlying population.

Unfortunately for the particular domain of image retrieval and colour histograms it is difficult to ensure that basis functions derived in this way do adequately represent the data which is encountered in practice and so the Karhunen-Loeve transform is inappropriate for an analysis of colour histograms. A major reason for this is the fact that colour histograms tend to be quite sparse structures - many of the bins in a histogram are unoccupied and thus discovering a robust set of basis functions requires more training data than can feasibly be employed.

However, we have found that principal components analysis can be used to further compress the histograms once an initial compression has been performed. To understand this, consider that we have applied, for example, a DCT coding to our colour histograms so that each histogram is now represented by some small number (say 64) of DCT coefficients. That is, a histogram can now be represented as a point in a 64-dimensional space. Given a set of such points (corresponding to many different colour histograms) we can apply a PCA to these points to determine a set of basis functions. In doing this we have found that the 64-d DCT coefficient vectors corresponding to colour histograms can be adequately represented by a small number (say 8) of principal components, so that a second stage of compression can be achieved. Key to the success of PCA at this stage is the fact that the data being analysed (DCT coefficients) is no longer sparse. This means that the discovered components provide a robust representation of the underlying data.
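A sketch of this second stage, with synthetic "coefficient vectors" whose variation lies almost entirely along one direction. Power iteration on the covariance matrix stands in for a full PCA here, and all of the data is made up for illustration:

```python
import math
import random

def top_component(data, iters=200):
    """Leading principal component of a set of vectors, found by power
    iteration on the covariance matrix (a stand-in for a full PCA)."""
    d = len(data[0])
    mean = [sum(x[i] for x in data) / len(data) for i in range(d)]
    centred = [[x[i] - mean[i] for i in range(d)] for x in data]
    v = [1.0] * d
    for _ in range(iters):
        # one covariance multiply: C v = X^T (X v); overall scale is irrelevant
        proj = [sum(row[i] * v[i] for i in range(d)) for row in centred]
        v = [sum(p * row[i] for p, row in zip(proj, centred)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return mean, v

# Synthetic 4-d "DCT coefficient vectors" whose variation lies almost
# entirely along the direction (1, 2, 0, 0).
random.seed(0)
data = [[t, 2 * t, random.gauss(0, 0.01), random.gauss(0, 0.01)]
        for t in (random.uniform(-1, 1) for _ in range(200))]

mean, v = top_component(data)
# v points (up to sign) along (1, 2, 0, 0)/sqrt(5), so a single number per
# vector -- its projection onto v -- captures nearly all of the variation.
```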

In summary, then, our method reduces the data required to accurately represent the colour information in an image by a factor of approximately 32, by applying a two-stage hybrid coding scheme in which PCA is applied to a reduced number of DCT components derived, in turn, from the original colour histogram.

Representing Colour Information

In another strand to our work on image retrieval we have been investigating the role which the representation of "colour" plays in accurately retrieving images. There are a great many different representations of colour (RGB being just one example), each developed with a particular task in mind. The RGB representation loosely codes information about the "colour" of the underlying light it represents in terms of how much long-wave (red), medium-wave (green), or short-wave (blue) light it contains. Such a representation is the natural way to encode images captured by a camera, since RGB values correspond to what the camera actually measures. It is also useful if images are intended for display on a CRT monitor, since in this case the monitor output is driven by three values which control how much energy is emitted by three different phosphors, which again typically produce light in the long-, medium-, and short-wave regions of the spectrum.

In the case of image retrieval, however, it is not clear that the RGB representation of an image is necessarily the most efficient. In our work we have investigated alternative colour representations and in particular have focused on an opponent representation. This representation is a simple linear transform of RGB values:

    WB = R + G + B
    RG = R - G
    BY = B - (R + G)/2
This opponent representation codes colour as an intensity value (WB) and two chromatic values: redness/greenness (RG) and blueness/yellowness (BY). It is thought that the human visual system encodes colour in this way [6], and it has been shown that such an encoding is well justified in information-theoretic terms [3]; that is, it is an encoding which maximises the information in a signal.
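As a concrete sketch of such a coding (the specific coefficients here are one common choice from the literature, not necessarily the exact weights used in our work):

```python
def opponent(r, g, b):
    """Opponent coding of an RGB triplet. The coefficients are one common
    choice; the exact weights vary across the literature."""
    wb = r + g + b            # intensity (white/black)
    rg = r - g                # redness/greenness
    by = b - (r + g) / 2.0    # blueness/yellowness
    return wb, rg, by
```

Note that an achromatic grey, with equal R, G, and B, maps to zero in both chromatic channels, as an opponent coding should.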

We modify this opponent representation by working not with RGB values but rather with log RGB values, so that our coding of image colour is:

    WB = log R + log G + log B
    RG = log R - log G
    BY = log B - (log R + log G)/2
Working in this space is useful in dealing with the situation in which we would like to compare images captured under different illuminants. Since a change in illumination changes the RGBs in an image, it must follow that the colour histograms will also change. Such a change leads inevitably to a failure of image retrieval under varying illumination conditions. To solve this problem we must model and discount the effect of the illuminant. It is well known that under certain circumstances [4] the RGBs of a single surface captured under two different lights are related by a diagonal matrix transform:

    R' = αR,  G' = βG,  B' = γB
That is, image RGBs under the second illuminant are related to the original RGBs by three independent scale factors α, β, and γ. Now, observe what happens under this model in log-opponent space:

    WB' = WB + log α + log β + log γ
    RG' = RG + log α - log β
    BY' = BY + log γ - (log α + log β)/2
What this tells us is that under a change in illumination all image log-opponent co-ordinates change by a translational term. It follows that to remove the effect of illumination we need only subtract the mean chromaticity from each image pixel. From these mean-subtracted chromaticities we can form our colour histograms and compress them as described above, but now they have the advantage that they are invariant to a change in illumination. Experiments show [2] that such a representation leads to greatly improved retrieval rates under a change in illumination.
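This invariance argument can be checked numerically: under a diagonal illuminant change every log-opponent co-ordinate shifts by the same constant at every pixel, so the mean-subtracted co-ordinates are unchanged. The opponent coefficients below are one common choice, and the surface values and illuminant scale factors are made up:

```python
import math

def log_opponent(r, g, b):
    # Log-opponent coding: intensity, red/green, blue/yellow
    # (coefficients are one common choice).
    lr, lg, lb = math.log(r), math.log(g), math.log(b)
    return (lr + lg + lb, lr - lg, lb - (lr + lg) / 2.0)

def mean_subtracted(pixels):
    # Subtract the per-channel mean log-opponent co-ordinates
    # from every pixel's co-ordinates.
    coords = [log_opponent(*p) for p in pixels]
    means = [sum(c[i] for c in coords) / len(coords) for i in range(3)]
    return [tuple(c[i] - means[i] for i in range(3)) for c in coords]

# Two made-up surfaces under a first light, then the same surfaces under a
# second light modelled as independent per-channel scalings.
pixels1 = [(0.8, 0.5, 0.2), (0.3, 0.6, 0.9)]
alpha, beta, gamma = 1.4, 0.9, 0.6
pixels2 = [(alpha * r, beta * g, gamma * b) for (r, g, b) in pixels1]

inv1 = mean_subtracted(pixels1)
inv2 = mean_subtracted(pixels2)
# inv1 and inv2 agree: the illuminant's effect has been discounted
```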

Work is ongoing to further improve colour based indexing and retrieval. In particular we are investigating how to incorporate some image spatial structure into the features on which we index. 


  1. Bach, J., Fuller, C., Gupta, A., Hampapur, A., Horowitz, B., Humphrey, R., and Jain, R. The Virage image search engine: An open framework for image management. In SPIE Conf. on Storage and Retrieval for Image and Video Databases, volume 2670, pages 76-87, 1996.
  2. Berens, J., and Finlayson, G.D. Log-opponent chromaticity coding of colour space. In Proceedings of the 15th International Conference on Pattern Recognition, pages 1206-1211, September 2000.
  3. Buchsbaum, G., and Gottschalk, A. Chromaticity co-ordinates of frequency-limited functions. Journal of the Optical Society of America A, 1(8):885-887, 1984.
  4. Finlayson, G.D. Coefficient Colour Constancy. PhD thesis, Simon Fraser University, 1995.
  5. Funt, B., Barnard, K., and Martin, L. Is machine colour constancy good enough? In 5th European Conference on Computer Vision, pages 455-459. Springer, June 1998.
  6. Hurvich, L.M., and Jameson, D. An opponent-process theory of color vision. Psychological Review, 64, 1957.
  7. Finlayson, G.D., Berens, J., and Qiu, G. Image indexing using compressed colour histograms. In The Challenge of Image Retrieval, 1999.
  8. Jolliffe, I.T. Principal Component Analysis. Springer-Verlag, New York, 1986.
  9. Niblack, W., and Barber, R. The QBIC project: Querying images by content using color, texture and shape. In Storage and Retrieval for Image and Video Databases, volume 1908 of SPIE Proceedings Series, 1993.
  10. Stricker, M., and Orengo, M. Similarity of color images. In SPIE Conf. on Storage and Retrieval for Image and Video Databases III, volume 2420, pages 381-392, 1995.
  11. Swain, M.J., and Ballard, D.H. Color indexing. International Journal of Computer Vision, 7(1):11-32, 1991.