The Mahout project has implemented many algorithms to date. The main ones are listed below for reference.
I. Collaborative Filtering
1. User-Based Collaborative Filtering
2. Item-Based Collaborative Filtering
3. Matrix Factorization with Alternating Least Squares
4. Matrix Factorization with Alternating Least Squares on Implicit Feedback
5. Weighted Matrix Factorization, SVD++, Parallel SGD
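User-based and item-based collaborative filtering are collectively known as memory-based collaborative filtering. Both suffer from data sparsity, and both struggle with large datasets, which makes real-time results hard to deliver. Model-based collaborative filtering techniques were developed to address these problems.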
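To make the memory-based idea concrete, here is a minimal single-machine sketch of user-based collaborative filtering in plain Python. It is not Mahout's API: the function names `cosine` and `predict`, the dict-of-dicts rating layout, the toy data, and the choice of cosine similarity over co-rated items are all illustrative assumptions. Note that every prediction scans all other users, which is the cost that grows with data volume, and that a user with no overlapping neighbors gets no prediction at all, which is the sparsity problem in miniature.

```python
import math

def cosine(a, b):
    """Cosine similarity of two users, computed over the items both have rated.

    a, b: dicts mapping item -> rating.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(ratings, user, item, k=2):
    """Predict `user`'s rating for `item` as a similarity-weighted average
    of the ratings given by the k most similar users who rated `item`."""
    neighbors = [
        (cosine(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbors.sort(reverse=True)  # most similar users first
    top = [(sim, r) for sim, r in neighbors[:k] if sim > 0]
    if not top:
        return None  # no usable neighbors for this item
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}

print(predict(ratings, "alice", "c", k=1))  # prints 4.0 (bob is alice's nearest neighbor)
```

Raising `k` blends in less similar users; with `k=2` the prediction for alice moves below bob's rating because carol rated item `c` lower.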
II. Classification
1. Logistic Regression - trained via SGD
2. Naive Bayes / Complementary Naive Bayes - MapReduce
3. Random Forest - MapReduce
4. Hidden Markov Models - single machine
5. Multilayer Perceptron - single machine
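"Trained via SGD" means the model's weights are updated one sample at a time rather than from the whole dataset at once. The following is a minimal single-machine sketch of logistic regression trained with stochastic gradient descent on log-loss; it is not Mahout's implementation, and the function names, learning rate, epoch count, and toy data are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg_sgd(xs, ys, lr=0.1, epochs=500, seed=0):
    """Fit weights w and bias b by stochastic gradient descent on log-loss.

    xs: list of feature lists; ys: list of 0/1 labels.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    rng = random.Random(seed)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            x, y = xs[i], ys[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            g = p - y  # derivative of one sample's log-loss w.r.t. the linear score
            w = [wj - lr * g * xj for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b

def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical 1-D toy data: the label is 1 when the feature is large.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
w, b = train_logreg_sgd(xs, ys)
print(predict_proba(w, b, [0.5]), predict_proba(w, b, [2.5]))
```

Because each update touches a single sample, the same loop can consume a stream of examples that never fits in memory, which is the usual reason for choosing SGD over batch solvers.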
III. Clustering
1. Canopy Clustering - single machine / MapReduce (deprecated; will be removed once the k-Means implementation is mature enough)
2. k-Means Clustering - single machine / MapReduce
3. Fuzzy k-Means - single machine / MapReduce
4. Streaming k-Means - single machine / MapReduce
5. Spectral Clustering - MapReduce
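The core of k-Means is Lloyd's iteration: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. Below is a minimal single-machine sketch of that loop with Euclidean distance and random initial centroids; the function name, parameters, and toy points are assumptions for illustration, not Mahout's API.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-Means on a list of equal-length tuples.

    Returns (centroids, clusters) where clusters come from the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct input points as starting centroids
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new = []
        for c, members in enumerate(clusters):
            if members:
                new.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid where it was
        if new == centroids:
            break  # converged: assignments can no longer change
        centroids = new
    return centroids, clusters

# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

The result depends on the initial centroids, which is one reason cheaper pre-passes such as Canopy were used to seed k-Means.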
IV. Dimensionality Reduction
1. Singular Value Decomposition - single machine
2. Lanczos Algorithm - single machine / MapReduce
3. Stochastic SVD - single machine / MapReduce / Spark
4. Principal Component Analysis (via Stochastic SVD) - single machine / MapReduce
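The relationship between SVD and PCA in the list above can be shown with a much simpler technique than Lanczos or stochastic SVD: power iteration on $A^\top A$, which converges to the top right singular vector when the largest singular value is strictly larger than the second. This is a swapped-in illustration under that assumption, not how Mahout computes its decompositions; the helper names are hypothetical. PCA then amounts to centering each column before taking that vector.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def top_singular(a, iters=200, seed=0):
    """Largest singular value of A and its right singular vector, via power iteration on A^T A."""
    rng = random.Random(seed)
    at = transpose(a)
    v = [rng.random() + 0.1 for _ in range(len(a[0]))]  # random positive starting vector
    for _ in range(iters):
        w = matvec(at, matvec(a, v))  # apply A^T A without forming it explicitly
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, v  # A^T A v vanished: no dominant direction found from this start
        v = [x / norm for x in w]
    sigma = math.sqrt(sum(x * x for x in matvec(a, v)))  # ||A v|| equals sigma_1 for unit v
    return sigma, v

def first_principal_component(rows):
    """PCA sketch: center each column, then take the top right singular vector."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return top_singular(centered)[1]

print(top_singular([[3.0, 0.0], [0.0, 1.0]]))
print(first_principal_component([[1, 1], [2, 2], [3, 3]]))
```

Convergence speed depends on the ratio of the top two singular values; methods such as Lanczos and stochastic SVD exist largely to get several singular vectors quickly on large matrices.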
V. Topic Models
1. Latent Dirichlet Allocation (LDA) - single machine / MapReduce
VI. Miscellaneous
1. Frequent Pattern Mining - MapReduce
2. Row Similarity Job - compute pairwise similarities between the rows of a matrix - MapReduce
3. ConcatMatrices - combine 2 matrices or vectors into a single matrix - MapReduce
4. Collocations - find collocations of tokens in text - MapReduce
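As a rough picture of what a row similarity computation involves, here is a minimal single-machine sketch of pairwise cosine similarity between sparse rows. It organizes the work column by column, so only rows that share a nonzero column are ever paired; this layout is an assumption chosen for illustration because it splits naturally into a map-like pass and a summing pass, and it is not a description of the internals of Mahout's Row Similarity Job. The function name and data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def row_similarities(rows):
    """Pairwise cosine similarity between sparse rows.

    rows: dict row_id -> {column: value}. Returns {(i, j): cosine} with i < j,
    only for pairs that share at least one column.
    """
    norms = {r: math.sqrt(sum(v * v for v in vec.values())) for r, vec in rows.items()}
    # Invert the matrix so each column lists the rows that have a value in it.
    by_col = defaultdict(list)
    for r, vec in rows.items():
        for c, v in vec.items():
            by_col[c].append((r, v))
    # Each column adds v_i * v_j to the dot product of every row pair it contains.
    dots = defaultdict(float)
    for entries in by_col.values():
        for (ri, vi), (rj, vj) in combinations(sorted(entries), 2):
            dots[(ri, rj)] += vi * vj
    return {
        (i, j): d / (norms[i] * norms[j])
        for (i, j), d in dots.items()
        if norms[i] > 0 and norms[j] > 0
    }

# Hypothetical sparse matrix: row id -> {column: value}
rows = {
    "r1": {"a": 1.0, "b": 1.0},
    "r2": {"a": 1.0, "b": 1.0},
    "r3": {"b": 2.0, "c": 2.0},
    "r4": {"d": 5.0},
}
print(row_similarities(rows))
```

Pairs with no shared column (such as anything involving `r4` here) never appear in the output, which keeps the work proportional to the overlap rather than to the square of the row count.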
This is an original article; copyright belongs to 过往记忆大数据 (过往记忆). Reproduction without permission is not allowed.
Link to this article: [Algorithms Implemented in the Mahout Project](https://www.iteblog.com/archives/1130.html)