K-means Clustering via PCA

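Abstract: K-means clustering is a widely used unsupervised learning method for data clustering. Principal component analysis (PCA) is a widely used unsupervised statistical method for dimension reduction. Here we prove that the first K-1 principal components are the continuous solutions to the discrete cluster membership indicators of K-means clustering. In other words, we show that the subspace containing the cluster centroids is spanned by the first K-1 principal components of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Experimentally, our results suggest effective techniques for K-means clustering. We analyze Internet newsgroups to illustrate them, and the experiments show that the newly derived lower bound on the K-means objective function is within 0.5%-3.2% of the optimal value.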

Thesis keywords: K-means clustering; principal component analysis; singular value decomposition

Title: K-means Clustering via Principal Component Analysis

Abstract

K-means clustering is a commonly used data clustering method for unsupervised learning tasks. Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. Here we prove that the first K-1 principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. Equivalently, we show that the subspace spanned by the cluster centroids is given by the spectral expansion of the data covariance matrix truncated at K-1 terms. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. On learning, our results suggest effective techniques for K-means clustering. Internet newsgroups are analyzed to illustrate the results. Experiments indicate that the newly derived lower bounds on the K-means objective are within 0.5%-3.2% of the optimal values.

Keywords: K-means clustering; principal component analysis; singular value decomposition
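The relationship stated in the abstract can be illustrated with a minimal sketch. It centers the data, projects it onto the first K-1 principal directions, clusters in that subspace, and compares the resulting K-means objective with a PCA-based lower bound (total scatter minus the K-1 largest eigenvalues of the scatter matrix of the centered data). The function names, the plain Lloyd iteration, and the synthetic Gaussian data are assumptions made for illustration only. They are not taken from the thesis, and the sketch does not reproduce the newsgroup experiments or the 0.5%-3.2% figures.

```python
import numpy as np

def pca_lower_bound(X, k):
    """Total scatter minus the k-1 largest eigenvalues of the scatter matrix.

    For centered data this quantity cannot exceed the K-means objective
    of any partition into k clusters.
    """
    Y = X - X.mean(axis=0)                    # center the data
    s = np.linalg.svd(Y, compute_uv=False)    # singular values, descending
    eig = s ** 2                              # eigenvalues of Y^T Y
    return float(eig.sum() - eig[:k - 1].sum())

def kmeans_objective(X, labels):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration (an illustrative stand-in, not the thesis code)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def pca_kmeans(X, k, seed=0):
    """Cluster in the subspace of the first k-1 principal directions (k >= 2)."""
    Y = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    Z = Y @ Vt[:k - 1].T                      # n x (k-1) principal components
    return lloyd_kmeans(Z, k, seed=seed)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic data: three Gaussian blobs in 10 dimensions (assumed, for illustration)
    X = np.vstack([rng.normal(loc=m, scale=1.0, size=(50, 10)) for m in (0.0, 3.0, 6.0)])
    labels = pca_kmeans(X, k=3)
    obj = kmeans_objective(X, labels)
    bound = pca_lower_bound(X, k=3)
    print(f"objective = {obj:.2f}, lower bound = {bound:.2f}, "
          f"relative gap = {(obj - bound) / obj:.2%}")
```

The objective is evaluated in the original feature space, so the gap to the bound reflects both the quality of the partition and the tightness of the relaxation.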

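1 Introduction

1.1 Cluster analysis

1.1.1 Overview of cluster analysis

1.1.2 Connectivity-based clustering models

1.1.3 Partition-based clustering models

1.1.4 Distribution-based clustering models

1.1.5 Density-based clustering models

1.2 Principal component analysis

1.2.1 Overview of principal component analysis

1.2.2 Singular value decomposition

1.3 K-means and principal component analysis

1.4 Organization of this thesis

2 Theoretical analysis

2.1 Two clusters

2.2 K clusters

2.2.1 Regularized relaxation

2.2.2 Identifying the cluster centroid subspace

2.2.3 Recovering the K clusters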

2.2.4
