TY - GEN
T1 - Visual saliency with side information
AU - Jiang, Wei
AU - Xie, Lexing
AU - Chang, Shih Fu
PY - 2009
Y1 - 2009
N2 - We propose novel algorithms for organizing large image and video datasets using both the visual content and the associated side-information, such as time, location, authorship, and so on. Earlier research have used side-information as pre-filter before visual analysis is performed, and we design a machine learning algorithm to model the join statistics of the content and the side information. Our algorithm, Diverse-Density Contextual Clustering (D 2C2), starts by finding unique patterns for each sub-collection sharing the same side-info, e.g., scenes from winter. It then finds the common patterns that are shared among all subsets, e.g., persistent scenes across all seasons. These unique and common prototypes are found with Multiple Instance Learning and subsequent clustering steps. We evaluate D 2C2 on two web photo collections from Flickr and one news video collection from TRECVID. Results show that not only the visual patterns found by D2C2 are intuitively salient across different seasons, locations and events, classifiers constructed from the unique and common patterns also outperform state-of-the-art bag-of-features classifiers.
AB - We propose novel algorithms for organizing large image and video datasets using both the visual content and the associated side-information, such as time, location, authorship, and so on. Earlier research have used side-information as pre-filter before visual analysis is performed, and we design a machine learning algorithm to model the join statistics of the content and the side information. Our algorithm, Diverse-Density Contextual Clustering (D 2C2), starts by finding unique patterns for each sub-collection sharing the same side-info, e.g., scenes from winter. It then finds the common patterns that are shared among all subsets, e.g., persistent scenes across all seasons. These unique and common prototypes are found with Multiple Instance Learning and subsequent clustering steps. We evaluate D 2C2 on two web photo collections from Flickr and one news video collection from TRECVID. Results show that not only the visual patterns found by D2C2 are intuitively salient across different seasons, locations and events, classifiers constructed from the unique and common patterns also outperform state-of-the-art bag-of-features classifiers.
KW - Image classification
KW - Pattern clustering methods
UR - http://www.scopus.com/inward/record.url?scp=70349220914&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2009.4959946
DO - 10.1109/ICASSP.2009.4959946
M3 - Conference contribution
SN - 9781424423545
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1765
EP - 1768
BT - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
T2 - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Y2 - 19 April 2009 through 24 April 2009
ER -