Thursday, August 5, 2010

Beware the Library Search My Child!

Apologies to Lewis Carroll for butchering a famous line of his nonsense poem “Jabberwocky” (for a chemistry version of the poem called “Beware the Physical Chem” click here: https://www.alphachisigma.org/Page.aspx?pid=536 ). But I do mean what I say in the title: beware of library searching. It is a frequently abused procedure in infrared spectroscopy, and if it is used improperly it can be dangerous.

Back in the day before personal computers (boy, I’m giving my age away here), library searching was done by eye. Sadtler Research compiled thousands of paper copies of infrared spectra in green three-ring binders, and the user had to flip through them, comparing the sample spectrum to each reference visually. Thanks to modern computers, the comparison job has been automated. Library search programs use algorithms instead of visual comparison to make decisions about match quality. A number called the Hit Quality Index (HQI) is calculated for each comparison. Then the best matches are shown in a search report.
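To make the idea concrete, here is a minimal sketch of what a search program does, written in Python. Everything in it is an assumption for illustration: the function names, the toy spectra, and the choice of HQI, which here is a correlation score scaled so that 100 is a perfect match. Commercial search programs use their own algorithms and scales.

```python
import math

def hqi_correlation(sample, reference):
    """Hypothetical HQI: 100 * Pearson correlation, negative values clipped to 0."""
    if len(sample) != len(reference) or not sample:
        raise ValueError("spectra must share the same non-empty wavenumber grid")
    n = len(sample)
    mean_s = sum(sample) / n
    mean_r = sum(reference) / n
    ds = [x - mean_s for x in sample]
    dr = [y - mean_r for y in reference]
    num = sum(a * b for a, b in zip(ds, dr))
    den = math.sqrt(sum(a * a for a in ds) * sum(b * b for b in dr))
    if den == 0.0:
        return 0.0  # a completely flat spectrum has nothing to correlate
    return 100.0 * max(0.0, num / den)

def search(sample, library, top_n=3):
    """Score the sample against every library entry and return the best matches."""
    scored = sorted(
        ((hqi_correlation(sample, spec), name) for name, spec in library.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:top_n]]

# Made-up absorbance values on a shared five-point grid (illustration only).
library = {
    "reference A": [0.0, 2.0, 0.0, 4.0, 0.0],
    "reference B": [1.0, 0.0, 1.0, 0.0, 1.0],
    "reference C": [0.0, 0.0, 3.0, 0.0, 0.0],
}
sample = [0.0, 1.0, 0.0, 2.0, 0.0]

report = search(sample, library, top_n=2)
for name, score in report:
    print(f"{name}: HQI {score:.1f}")
```

Notice that the report always contains `top_n` entries as long as the library is big enough, no matter how poor the matches are. The second entry in this toy report is no match at all, yet it still appears.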

I have seen people do a library search, look at the first result in the search report, declare “that is it” and go on their merry way without ever looking at the spectra. This is a recipe for disaster! The library search program will always give you a result, even if it is a bad result. Just because the HQI is a number that comes out of a computer does not sanctify it. Remember, computers are programmed by people, and computer programs make mistakes as easily as people do.

Another pitfall people fall into is overinterpreting the HQI. When an HQI of 100 is a perfect match, I’ve seen people interpret a 95 as meaning “there is a 95% probability that I identified the sample correctly,” or “the samples are 95% the same,” or “the spectra are 95% similar.” All of these ideas are wrong. The HQI is not a probability or a percentage; it has no units. The value of the HQI varies with a number of things, including the search algorithm and the spectral regions used in the search. The HQI simply orders the matches for a given search…that is all!
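A small sketch in Python shows how much the number can move for the very same pair of spectra. The two scoring functions and the toy data are assumptions chosen for illustration, not the algorithms of any particular search program: one score is based on correlation, the other on Euclidean distance between unit-normalized spectra, and each is run over the full range and over a restricted region.

```python
import math

def hqi_correlation(sample, reference):
    """Hypothetical HQI: 100 * Pearson correlation, negative values clipped to 0."""
    n = len(sample)
    mean_s = sum(sample) / n
    mean_r = sum(reference) / n
    ds = [x - mean_s for x in sample]
    dr = [y - mean_r for y in reference]
    num = sum(a * b for a, b in zip(ds, dr))
    den = math.sqrt(sum(a * a for a in ds) * sum(b * b for b in dr))
    return 0.0 if den == 0.0 else 100.0 * max(0.0, num / den)

def hqi_euclidean(sample, reference):
    """Hypothetical HQI: 100 * (1 - d^2 / 2), d = distance between unit-length spectra."""
    norm_s = math.sqrt(sum(x * x for x in sample))
    norm_r = math.sqrt(sum(y * y for y in reference))
    if norm_s == 0.0 or norm_r == 0.0:
        return 0.0
    dist_sq = sum((x / norm_s - y / norm_r) ** 2 for x, y in zip(sample, reference))
    return 100.0 * (1.0 - dist_sq / 2.0)

# Made-up spectra: the sample has an extra band (index 4) that the reference lacks.
sample = [0.1, 0.9, 0.2, 0.1, 0.5, 0.1]
reference = [0.1, 0.8, 0.1, 0.1, 0.1, 0.1]

# Same pair of spectra, two algorithms, full range.
full_cor = hqi_correlation(sample, reference)
full_euc = hqi_euclidean(sample, reference)

# Same pair again, but searching only the first three points of the grid.
region = slice(0, 3)
region_cor = hqi_correlation(sample[region], reference[region])
region_euc = hqi_euclidean(sample[region], reference[region])

print(f"full range:  correlation {full_cor:.1f}, Euclidean {full_euc:.1f}")
print(f"restricted:  correlation {region_cor:.1f}, Euclidean {region_euc:.1f}")
```

Four different numbers come out for one sample and one reference. None of them is a probability, and the restricted search scores higher only because the region containing the mismatched band was left out.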

Another reason to beware the library search is the search algorithms themselves. The problem here is that we are trying to automate a visual comparison by substituting a calculation for it. As spectroscopists we know what the peaks mean and what noise and artifacts look like, but search algorithms do not. Two spectra of radically different samples can give a high HQI if, by coincidence, the noise and artifacts in their spectra are similar. Conversely, the spectra of similar samples can give a low HQI if the noise and artifacts in their spectra happen to differ.
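Here is a toy demonstration of both failure modes in Python. The assumptions are mine: the HQI is a correlation score on a 0 to 100 scale, the artifact is a sloping baseline, and the two "compounds" are made-up spectra with a single band each at different positions.

```python
import math

def hqi_correlation(sample, reference):
    """Hypothetical HQI: 100 * Pearson correlation, negative values clipped to 0."""
    n = len(sample)
    mean_s = sum(sample) / n
    mean_r = sum(reference) / n
    ds = [x - mean_s for x in sample]
    dr = [y - mean_r for y in reference]
    num = sum(a * b for a, b in zip(ds, dr))
    den = math.sqrt(sum(a * a for a in ds) * sum(b * b for b in dr))
    return 0.0 if den == 0.0 else 100.0 * max(0.0, num / den)

# Two made-up compounds with one band each, at different positions.
compound_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
compound_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

# Illustrative artifact: a sloping baseline that dwarfs the real bands.
baseline = [0.5 * i for i in range(8)]
a_sloped = [base + band for base, band in zip(baseline, compound_a)]
b_sloped = [base + band for base, band in zip(baseline, compound_b)]

# Different compounds, clean spectra: no bands in common.
different_clean = hqi_correlation(compound_a, compound_b)

# Different compounds, same artifact in both spectra.
different_shared_artifact = hqi_correlation(a_sloped, b_sloped)

# Same compound, artifact present in only one of the two spectra.
same_mismatched_artifact = hqi_correlation(compound_a, a_sloped)

print(f"A vs B, clean spectra:         {different_clean:.1f}")
print(f"A vs B, shared baseline:       {different_shared_artifact:.1f}")
print(f"A vs A, baseline on one only:  {same_mismatched_artifact:.1f}")
```

With these toy numbers the two different compounds score above 90 once they share the baseline, while the same compound scores below 10 against a copy of itself that carries the baseline. The calculation is matching the artifact, not the chemistry, which is obvious the moment you look at the spectra.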

There is one simple solution to the problems of library searching: ALWAYS VISUALLY COMPARE THE SPECTRA! Look at your sample spectrum and the library matches and draw your own conclusions about what is the best match. Do not rely on the HQI by itself to make the decision for you. In any competition between your eyeballs and the search algorithm, your eyeballs win. The computer is not smarter than you; it is just faster than you. The purpose of a library search is to narrow down the possibilities for you. It is your job as the human being performing the library search to interpret the results and arrive at your own conclusions.

If you follow this advice, you will no longer have to beware the library search.