Clustering by Passing Messages Between Data Points
Brendan J. Frey* and Delbert Dueck
Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such "exemplars" can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called "affinity propagation," which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.
Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada.
* To whom correspondence should be addressed. E-mail: frey{at}psi.toronto.edu
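The message exchange described in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rules, so the code below uses the "responsibility" and "availability" updates as they are commonly described for affinity propagation. The function names, the damping factor, the stopping rule, the negative squared Euclidean similarity, and the median "preference" on the diagonal are all assumptions made for illustration, not details taken from this page.

```python
import numpy as np

def similarity_matrix(X):
    """Hypothetical helper: negative squared Euclidean distance between rows of X.
    The diagonal (each point's 'preference' to be an exemplar) is set to the
    median off-diagonal similarity, which is an assumed default."""
    diff = X[:, None, :] - X[None, :, :]
    S = -(diff ** 2).sum(axis=2)
    off_diag = ~np.eye(len(X), dtype=bool)
    np.fill_diagonal(S, np.median(S[off_diag]))
    return S

def affinity_propagation(S, damping=0.5, max_iter=200, stable_iters=15):
    """Sketch of affinity propagation on an (n, n) similarity matrix S.
    Returns (exemplar indices, exemplar index assigned to each point)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    previous, stable = None, 0

    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]  # keep r(k,k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new

        # A point is an exemplar when r(k,k) + a(k,k) > 0.
        exemplars = np.flatnonzero(np.diag(A + R) > 0)
        if previous is not None and np.array_equal(exemplars, previous):
            stable += 1
        else:
            stable = 0
        previous = exemplars
        if stable >= stable_iters and exemplars.size > 0:
            break

    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Assign every point to its most similar exemplar; exemplars keep themselves.
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels
```

Calling `affinity_propagation(similarity_matrix(X))` on an array `X` of points returns the chosen exemplar indices and, for each point, the index of its exemplar. In this sketch the number of clusters is not fixed in advance; it depends on the diagonal preference values, with lower values tending to yield fewer exemplars.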