The genome of the H37Rv strain was published in 1998. It is about 4 million base pairs long and contains 3,959 genes; 40% of these genes have had their function characterised, and possible functions have been postulated for another 44%. The genome also contains six pseudogenes.
The genome contains 250 genes involved in fatty acid metabolism, 39 of which are involved in the polyketide metabolism that generates the waxy coat. Such a large number of conserved genes shows the evolutionary importance of the waxy coat to pathogen survival. M. tuberculosis can grow on the lipid cholesterol as a sole source of carbon, and genes involved in the cholesterol use pathways have been shown to be important during various stages of infection. Mycobacteria isolated from the lungs of infected mice were shown to preferentially use fatty acids over carbohydrate substrates, especially during the chronic phase of infection, when other nutritional sources are not available.
About 10% of the coding capacity is taken up by the PE/PPE gene families, which encode acidic, glycine-rich proteins. These proteins have a conserved N-terminal motif, deletion of which impairs growth in macrophages and granulomas.