The development of diagnostic tools that automatically detect diagnostic findings, or that identify areas of interest in medical or microscopic images, has been the subject of research and development in recent decades, driven by advances in imaging equipment and in computer vision and pattern recognition.

Automatic detection and recognition of Mycobacterium tuberculosis in light microscopy images have been active areas of research in recent years (Sena, 2007; Costa et al., 2008; Sadaphal et al., 2008; Raof et al., 2008; Makkapati et al., 2009; Khutlang et al., 2010a; Khutlang et al., 2010b; Xavier, 2013; Levy, 2012; Costa Filho et al., 2012a; Costa Filho and Costa, 2012; Costa Filho, Costa and Kimura Junior, 2013).

Several studies reporting new techniques and algorithms for the detection and recognition of clinical findings have been published.

Nevertheless, many of the reported successes come from studies whose validation images do not represent the images captured in practical applications (they were captured in strictly controlled environments), or whose number of images is too small. Results obtained on such databases can favor one method over another. Robust databases, on the other hand, have great potential to help researchers evaluate and improve their object detection and recognition algorithms, among other purposes.

Performance comparison between algorithms is only meaningful when they are tested and validated on the same data set. In some areas, especially new research areas, this is an obstacle, because little attention has yet been given to building robust image databases that can serve as benchmarks for new algorithms.

In this context, the Biomedical Engineering research group of the Federal University of Amazonas, which published the first results on automatic detection of Mycobacterium tuberculosis in light microscopy images (Sena, 2007; Costa et al., 2008), has built the first image database of conventional sputum smear microscopy from tuberculosis patients. The database is now being made available to other research groups so that they can test their algorithms and compare them with other proposals. It is expected to accelerate the development of systems that support the automatic diagnosis of tuberculosis.

The sputum smear slides from which the database images were acquired had been used as staining controls in the project "Evaluation of alternative dyes for diagnosing Tuberculosis", carried out from 2008 to 2010 and approved without restrictions under protocol number 186/08 by the Ethics Committee on Human Research (CEP/INPA, National Institute for Research in the Amazon).

The sputum samples come from patients suspected of pulmonary TB. The slides were stained with the Kinyoun method, which is similar to Ziehl-Neelsen staining but does not involve heating the slides.

Two image databases were built:

Image database 1: Out-of-focus-blur-evaluation (TB_IMAGE_DB_FOCUS.V1)

Defocus is one of the important factors that can affect detection and recognition tasks. To investigate the effect of out-of-focus blur on the performance of bacillus detection systems, an image database was built. For each field of the sputum smear, 10 images were acquired at different focus positions, with a step of 2.5 μm between consecutive images.

Figure 1: Examples of two sets of microscopy images (ten images acquired at different focus positions for each set).
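As an illustration of how the focus series might be used, the Python sketch below lists the focus offset of each of the 10 images in a field (0 to 22.5 μm in 2.5 μm steps) and scores each image with the variance of a discrete Laplacian, a common sharpness measure. The choice of focus measure and the names `focal_offsets`, `laplacian_variance` and `sharpest_index` are assumptions for illustration, not part of the database; images are assumed to be loaded as 2-D grayscale NumPy arrays.

```python
import numpy as np

FOCAL_STEP_UM = 2.5     # step between consecutive images, from the database description
IMAGES_PER_FIELD = 10   # images acquired per field

def focal_offsets(n=IMAGES_PER_FIELD, step=FOCAL_STEP_UM):
    """Focus offset (micrometres) of each image relative to the first image."""
    return [i * step for i in range(n)]

def laplacian_variance(gray):
    """Variance of a 4-neighbour discrete Laplacian; higher values mean sharper.
    Assumption: this focus measure is only an illustrative choice."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def sharpest_index(stack):
    """Index of the image with the highest focus score in a focal stack."""
    scores = [laplacian_variance(img) for img in stack]
    return int(np.argmax(scores))
```

Plotting `laplacian_variance` against `focal_offsets()` for each field gives a focus curve, and a detector's accuracy can then be reported per offset from the sharpest image.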


Image database 2: bacilli-detection-evaluation (TB_IMAGE_DB_BACILLI.V1)

This image database is intended to enable performance comparison between algorithms and techniques for the detection and recognition of bacilli. It comprises 120 images obtained from sputum smear microscopy slides of 12 patients (10 fields per patient).
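Because the 120 images come from only 12 patients, one possible precaution when comparing algorithms is to keep all fields of a patient in the same partition, so that test images never share a patient with training images. The sketch below is hypothetical: the database description does not prescribe a split, and the (patient, field) identifiers are placeholders rather than the database's actual file names.

```python
import random

N_PATIENTS = 12          # from the database description
FIELDS_PER_PATIENT = 10  # from the database description

def image_index(n_patients=N_PATIENTS, fields=FIELDS_PER_PATIENT):
    """One (patient_id, field_id) pair per image; ids are hypothetical placeholders."""
    return [(p, f) for p in range(1, n_patients + 1) for f in range(1, fields + 1)]

def split_by_patient(index, n_test_patients, seed=0):
    """Hold out whole patients so that no patient contributes to both sets."""
    patients = sorted({p for p, _ in index})
    rng = random.Random(seed)
    test_patients = set(rng.sample(patients, n_test_patients))
    train = [item for item in index if item[0] not in test_patients]
    held_out = [item for item in index if item[0] in test_patients]
    return train, held_out
```

With three held-out patients, this yields 90 training images and 30 test images.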

Based on the background content, the database is divided into two groups. Group 1 consists of images with a high density of background content (HDB); Group 2 consists of images with a low density of background content (LDB). The HDB group is characterized by a strong presence of the methylene blue counterstain in the background, while the LDB group shows only a weak presence of the same counterstain (see examples in Figure 2).

Figure 2: (a) Image with high density of background content. (b) Image with low density of background content.

In all 120 images, a specialist enclosed each identified object in a geometric shape: a circle for a true bacillus, a rectangle for agglomerated bacilli, and a polygon for a doubtful bacillus (one whose focus or geometry does not allow clear identification). The annotated images can be used as a gold standard to evaluate the accuracy, sensitivity and specificity of bacillus recognition (see examples in Figure 3).


Figure 3: Examples of images with objects identified by the specialist: (a) LDB image; (b) HDB image (circle: true bacillus; polygon: doubtful bacillus; rectangle: agglomerated bacilli).
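The annotation scheme can be turned into counts for the accuracy, sensitivity and specificity mentioned above. In the Python sketch below, only the shape-to-label mapping follows the database description; everything else is an assumption: detector candidates are assumed to have already been matched to the specialist's shapes (matching is not shown), circles and rectangles are treated as positives, unmatched candidates as negatives, and doubtful bacilli (polygons) are excluded. `Counts`, `metrics` and `count_outcomes` are hypothetical names.

```python
from dataclasses import dataclass

# Annotation convention from the database description.
SHAPE_TO_LABEL = {
    "circle": "true_bacillus",
    "rectangle": "agglomerated_bacilli",
    "polygon": "doubtful_bacillus",
}

@dataclass
class Counts:
    tp: int
    fp: int
    tn: int
    fn: int

def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")

def metrics(c: Counts) -> dict:
    """Sensitivity, specificity and accuracy from confusion counts."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }

def count_outcomes(candidates):
    """candidates: iterable of (gold_shape or None, predicted_is_bacillus).
    Assumptions: None means no specialist annotation (a negative),
    circles/rectangles are positives, polygons (doubtful) are skipped."""
    tp = fp = tn = fn = 0
    for shape, predicted in candidates:
        label = SHAPE_TO_LABEL.get(shape)
        if label == "doubtful_bacillus":
            continue
        actual = label is not None
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return Counts(tp, fp, tn, fn)
```

Whether doubtful or agglomerated bacilli should count as positives is a design choice that each evaluation should state explicitly, since it changes the reported sensitivity and specificity.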