Unsupervised Classification of Chemical Compounds

Abstract:

Clustering chemical compounds of similar structure is important in the pharmaceutical industry. One way of describing the structure is the chemical 'fingerprint'. The fingerprint is a string of binary digits, and typical data sets consist of very large numbers of fingerprints; a suitable clustering procedure must take account of the properties of this method of coding, and must be able to handle large data sets. This paper describes the analysis of a set of fingerprint data. The analysis was based on an appropriate distance measure derived from the fingerprints, followed by metric scaling into a low-dimensional space. An approximation to metric scaling, suitable for very large data sets, was investigated. Cluster analysis using two programs, mclust and AutoClass-C, was carried out on the scaled data.
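
The first two steps of the pipeline described above (a distance measure on binary fingerprints, then metric scaling) can be illustrated with a minimal sketch. The abstract does not say which distance measure the paper uses; the Tanimoto (Jaccard) distance below is an assumption, chosen only because it is a common choice for binary fingerprints. The function names `tanimoto_distance` and `classical_mds` and the toy `fingerprints` array are hypothetical and are not taken from the paper. The sketch builds the full n-by-n distance matrix, so it does not address the large-data approximation the abstract mentions.

```python
import numpy as np

def tanimoto_distance(fp):
    """Pairwise Tanimoto (Jaccard) distance between rows of a 0/1 array.

    Assumed distance measure; the paper's own choice is not given in the abstract.
    """
    fp = np.asarray(fp, dtype=float)
    common = fp @ fp.T                      # bits set in both fingerprints
    counts = fp.sum(axis=1)                 # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - common
    # Two all-zero fingerprints have an empty union; treat them as identical.
    sim = np.divide(common, union, out=np.ones_like(common), where=union > 0)
    return 1.0 - sim

def classical_mds(dist, k=2):
    """Classical (metric) scaling: embed n points in k dimensions from distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred matrix of squared distances (a Gram matrix if dist is Euclidean).
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]            # indices of the k largest
    top_vals = np.clip(eigvals[order], 0.0, None)    # guard against small negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy data: four 6-bit fingerprints forming two obvious groups (illustrative only).
fingerprints = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
])
dist = tanimoto_distance(fingerprints)
coords = classical_mds(dist, k=2)
```

The rows of `coords` are the scaled, low-dimensional points that a clustering program would then receive as input; the clustering step itself (mclust or AutoClass-C in the paper) is outside this sketch.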

Keywords: Chemical fingerprint; Cluster analysis; Metric scaling; Rand index

Document Type: Original Article

DOI: http://dx.doi.org/10.1111/1467-9876.00146

Affiliations: University of Oxford, UK

Publication date: January 1, 1999
