
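The goal of this section is to build a site similar to Douban that identifies users with similar opinions, articles with similar comments, and news stories with similar content. It can also recommend groups with shared interests to users.
3.3.1 Introduction
For each category on the site, find the top-ranked stories, together with the users who liked those stories and the lists of items those users posted.
For each item of each user, find 10 items posted by other users, based on content similarity.
Assume that the users have rated all of these items, using the same rating rules as before.
3.3.2 Finding friends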
package com.hankcs;

import iweb2.ch3.collaborative.data.*;
import iweb2.ch3.collaborative.model.SimilarUser;
import iweb2.ch3.collaborative.model.User;
import iweb2.ch3.collaborative.recommender.DiggDelphi;

public class ch3_5_Digg_ContentAndRatings
{
    public static void main(String[] args) throws Exception
    {
        // 1. Load the data
        // Load data from Digg and save them in a file
        // BaseDataset ds = DiggData.loadDataFromDigg("C:/iWeb2/data/ch03/digg_stories.csv");
        // Load previously saved data
        BaseDataset ds = DiggData.loadData("C:/iWeb2/data/ch03/digg_stories.csv");

        // 2. Pick a user randomly or by username
        // User user = ds.getUser(1);
        // Or pick some specific user by username
        // DiggData.showUsers();
        User user = ds.findUserByName("adamfishercox");

        // 3. Show similar users and recommendations for that user
        DiggDelphi delphi = new DiggDelphi(ds);
        SimilarUser[] similarUsers = delphi.findSimilarUsers(user);
        delphi.recommend(user);

        // Repeat for the user's best friend (assumes at least one similar user was found)
        User u2 = ds.findUserByName(similarUsers[0].getName());
        similarUsers = delphi.findSimilarUsers(u2);
        delphi.recommend(u2);

        // Repeat once more for the best friend of the best friend
        User u3 = ds.findUserByName(similarUsers[0].getName());
        delphi.findSimilarUsers(u3);
        delphi.recommend(u3);
    }
}
Running this produces an error:
ERROR:
Failed to load properties from resource: '/iweb2.properties'.
null
Exception in thread "main" java.lang.ExceptionInInitializerError
at iweb2.ch3.collaborative.similarity.SimilarityMatrixRepository.<init>(SimilarityMatrixRepository.java:14)
at iweb2.ch3.collaborative.recommender.Delphi.<init>(Delphi.java:50)
at iweb2.ch3.collaborative.data.DiggData.createItemContentDelphi(DiggData.java:311)
at iweb2.ch3.collaborative.data.DiggData.createDataset(DiggData.java:236)
at iweb2.ch3.collaborative.data.DiggData.loadData(DiggData.java:164)
at com.hankcs.ch3_5_Digg_ContentAndRatings.main(ch3_5_Digg_ContentAndRatings.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.RuntimeException: Failed to load properties from resource: '/iweb2.properties'.
at iweb2.util.config.IWeb2Config.<clinit>(IWeb2Config.java:50)
... 11 more
Caused by: java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at iweb2.util.config.IWeb2Config.<clinit>(IWeb2Config.java:45)
... 11 more
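The resource '/iweb2.properties' is looked up on the classpath, and the NullPointerException inside Properties.load is what you would expect when that lookup finds nothing. To fix it, copy iweb2.properties and the two other files from C:\iWeb2\deploy\conf (three files in total) into C:\iWeb2\out\production\iWeb2, the IDE's compile output directory, so that they end up on the runtime classpath.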
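To check whether a file is visible on the classpath before running the example, a small test along the following lines may help. This is only a sketch: the class `ClasspathPropertiesCheck` and its method `loadOrNull` are hypothetical names, not part of iweb2, and how IWeb2Config actually loads the file is assumed from the stack trace rather than taken from its source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of iweb2.
class ClasspathPropertiesCheck {

    // Returns the loaded properties, or null when the resource is not on the classpath.
    static Properties loadOrNull(String resourceName) {
        // getResourceAsStream returns null (it does not throw) when nothing is found
        try (InputStream in = ClasspathPropertiesCheck.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                return null;
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = loadOrNull("/iweb2.properties");
        System.out.println(props == null
                ? "/iweb2.properties is NOT on the classpath"
                : "Loaded " + props.size() + " properties");
    }
}
```

Passing a null stream straight to Properties.load, without the null check, fails with a NullPointerException much like the one in the trace.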
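The triangle effect
A's best friend is B, one of B's friends is C, and one of C's friends appears in A's friend list but not in B's. To be honest, I did not find anything interesting in this...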
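The relationship can be made concrete with plain set operations. The sketch below uses made-up friend lists and a hypothetical class `TriangleEffect`; it does not call iweb2, where the friend lists would come from findSimilarUsers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration with toy data, not part of iweb2.
class TriangleEffect {

    // Friends of c that appear in a's friend list but not in b's friend list.
    static Set<String> inAOnly(Map<String, List<String>> friends, String a, String b, String c) {
        Set<String> result = new TreeSet<String>(listOf(friends, c));
        result.retainAll(listOf(friends, a)); // keep only those also in a's list
        result.removeAll(listOf(friends, b)); // drop those that b has as well
        return result;
    }

    private static List<String> listOf(Map<String, List<String>> friends, String user) {
        List<String> list = friends.get(user);
        return list == null ? Collections.<String>emptyList() : list;
    }

    // Made-up friend lists: B is A's best friend (first entry), C is a friend of B.
    static Map<String, List<String>> demoFriends() {
        Map<String, List<String>> friends = new HashMap<String, List<String>>();
        friends.put("A", Arrays.asList("B", "D", "E"));
        friends.put("B", Arrays.asList("C", "E"));
        friends.put("C", Arrays.asList("D", "E"));
        return friends;
    }

    public static void main(String[] args) {
        System.out.println(inAOnly(demoFriends(), "A", "B", "C")); // prints [D]
    }
}
```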
3.3.3 How DiggDelphi works internally
DiggDelphi wraps the three recommenders from the previous section and combines their results into a single final result.
/**
 * Finds similar users by combining user-to-user rating similarity
 * with the similarity of the users' content.
 *
 * @param user the user to find friends for
 * @param topN the maximum number of similar users to return
 * @return the top N similar users
 */
public SimilarUser[] findSimilarUsers(User user, int topN)
{
    List<SimilarUser> similarUsers = new ArrayList<SimilarUser>();

    // Simply pool the users found by the two methods
    // Content based
    SimilarUser[] simU = delphiUC.findSimilarUsers(user, topN);
    similarUsers.addAll(Arrays.asList(simU));

    // Based on user-to-user rating similarity
    simU = delphiUR.findSimilarUsers(user, topN);
    similarUsers.addAll(Arrays.asList(simU));

    // SimilarUser.print(simU, "Top Friends for user " + user.getName() + ":");
    return SimilarUser.getTopNFriends(similarUsers, topN);
}
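Because the two lists are simply pooled, the same user could in principle show up twice. Whether SimilarUser.getTopNFriends removes such duplicates is not shown in this section, so the following is only a sketch of one possible merge. It assumes that keeping the higher similarity per user is what you want, and it uses a hypothetical `Friend` class instead of iweb2's SimilarUser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the iweb2 implementation.
class MergeTopFriends {

    static final class Friend {
        final String name;
        final double similarity;

        Friend(String name, double similarity) {
            this.name = name;
            this.similarity = similarity;
        }
    }

    // Pools both lists, keeps the highest similarity per user, and returns the top N.
    static List<Friend> topN(List<Friend> byContent, List<Friend> byRatings, int topN) {
        Map<String, Friend> best = new HashMap<String, Friend>();
        for (List<Friend> source : Arrays.asList(byContent, byRatings)) {
            for (Friend f : source) {
                Friend current = best.get(f.name);
                if (current == null || f.similarity > current.similarity) {
                    best.put(f.name, f);
                }
            }
        }
        List<Friend> merged = new ArrayList<Friend>(best.values());
        // Sort by similarity, highest first
        merged.sort((x, y) -> Double.compare(y.similarity, x.similarity));
        return new ArrayList<Friend>(merged.subList(0, Math.min(topN, merged.size())));
    }

    // Made-up similarities for three users.
    static List<Friend> demo() {
        List<Friend> byContent = Arrays.asList(new Friend("alice", 0.9), new Friend("bob", 0.4));
        List<Friend> byRatings = Arrays.asList(new Friend("bob", 0.7), new Friend("carol", 0.5));
        return topN(byContent, byRatings, 2);
    }

    public static void main(String[] args) {
        for (Friend f : demo()) {
            System.out.println(f.name + " " + f.similarity);
        }
    }
}
```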
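The results of the three recommenders go through the following steps:
- Each recommender computes its maximum predicted rating
- Scaling factor = the largest of the three maxima / this recommender's maximum
- Threshold = 0.5 * scaling factor
- Drop items whose scaled rating is below the threshold
- Otherwise, store the scaled rating
- Build the union of the items returned by the three recommenders
- Process each item in that union
- The final value is the average of the three ratings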
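The scaling and threshold steps can be tried out on plain numbers. The sketch below uses made-up ratings and a hypothetical class `ScaleAndFilter`; it is not the iweb2 code. Note that the rating and the threshold are multiplied by the same scaling factor, so the filter amounts to dropping items whose unscaled rating is below 0.5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical illustration with toy numbers, not part of iweb2.
class ScaleAndFilter {

    // Scales one recommender's ratings to the overall maximum and drops weak items.
    static List<Double> scaleAndFilter(List<Double> ratings, double ownMax, double overallMax) {
        double scaling = overallMax / ownMax;
        double threshold = 0.5 * scaling;
        List<Double> kept = new ArrayList<Double>(ratings);
        // A ListIterator allows removing or replacing elements during traversal
        for (ListIterator<Double> it = kept.listIterator(); it.hasNext(); ) {
            double scaled = it.next() * scaling;
            if (scaled < threshold) {
                it.remove();    // below the threshold: drop the item
            } else {
                it.set(scaled); // otherwise keep the scaled rating
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // This recommender's maximum is 4.0 and the overall maximum is 5.0,
        // so scaling = 1.25 and threshold = 0.625
        System.out.println(scaleAndFilter(Arrays.asList(4.0, 2.0, 0.4), 4.0, 5.0)); // prints [5.0, 2.5]
    }
}
```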
/**
 * Makes recommendations by combining user-item content similarity,
 * user similarity, and item similarity.
 *
 * @param user the user to recommend items for
 * @param topN the maximum number of recommendations to return
 * @return the top N predicted item ratings
 */
public List<PredictedItemRating> recommend(User user, int topN)
{
    List<PredictedItemRating> recommendations = new ArrayList<PredictedItemRating>();

    // Establish a relative scaling factor
    // Get the maximum predicted rating from each recommender
    double maxRatingDelphiUIC = delphiUIC.getMaxPredictedRating(user.getId());
    double maxRatingDelphiUR = delphiUR.getMaxPredictedRating(user.getId());
    double maxRatingDelphiIR = delphiIR.getMaxPredictedRating(user.getId());

    // Find the maximum predicted rating across all recommenders
    double[] sortedMaxR = {maxRatingDelphiUIC, maxRatingDelphiUR, maxRatingDelphiIR};
    Arrays.sort(sortedMaxR);
    double maxR = sortedMaxR[2]; // This is the maximum predicted rating

    // Auxiliary variable
    double scaledRating = 1.0d;

    // Recommender 1 -- User-to-Item content based
    // Scaling factor = overall maximum / this recommender's maximum
    double scaling = maxR / maxRatingDelphiUIC;
    // Set an ad hoc threshold and scale it: threshold = 0.5 * scaling factor
    double scaledThreshold = 0.5 * scaling;
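    List<PredictedItemRating> uicList = delphiUIC.recommend(user, topN);
    // Use an explicit java.util.Iterator: removing from the list inside a
    // for-each loop can throw ConcurrentModificationException
    for (Iterator<PredictedItemRating> it = uicList.iterator(); it.hasNext(); )
    {
        PredictedItemRating pR = it.next();
        scaledRating = pR.getRating(6) * scaling;
        if (scaledRating < scaledThreshold)
        {
            // Drop items below the threshold
            it.remove();
        }
        else
        {
            // Otherwise store the scaled rating
            pR.setRating(scaledRating);
        }
    }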
    // Recommender 2 -- User based collaborative filtering
    scaling = maxR / maxRatingDelphiUR;
    scaledThreshold = 0.5 * scaling;
    List<PredictedItemRating> urList = delphiUR.recommend(user, topN);
    // Explicit java.util.Iterator so that items can be removed safely while iterating
    for (Iterator<PredictedItemRating> it = urList.iterator(); it.hasNext(); )
    {
        PredictedItemRating pR = it.next();
        scaledRating = pR.getRating(6) * scaling;
        if (scaledRating < scaledThreshold)
        {
            it.remove();
        }
        else
        {
            pR.setRating(scaledRating);
        }
    }
    // Recommender 3 -- Item based collaborative filtering
    scaling = maxR / maxRatingDelphiIR;
    scaledThreshold = 0.5 * scaling;
    List<PredictedItemRating> irList = delphiIR.recommend(user, topN);
    // Explicit java.util.Iterator so that items can be removed safely while iterating
    for (Iterator<PredictedItemRating> it = irList.iterator(); it.hasNext(); )
    {
        PredictedItemRating pR = it.next();
        scaledRating = pR.getRating(6) * scaling;
        if (scaledRating < scaledThreshold)
        {
            it.remove();
        }
        else
        {
            pR.setRating(scaledRating);
        }
    }
    /*
     * At this point, uicList, urList, and irList contain ratings
     * that are scaled and exceed the threshold value.
     */
    double uicRating = 0;
    double urRating = 0;
    double irRating = 0;
    double vote = 0;

    // Build the set of items produced by any of the recommenders
    // (a union of the three lists, not an intersection)
    Set<Integer> allRecommendedItems = new HashSet<Integer>();
    for (PredictedItemRating pir : urList)
    {
        allRecommendedItems.add(pir.getItemId());
    }
    for (PredictedItemRating pir : irList)
    {
        allRecommendedItems.add(pir.getItemId());
    }
    for (PredictedItemRating pir : uicList)
    {
        allRecommendedItems.add(pir.getItemId());
    }
    // Process each item
    for (Integer itemId : allRecommendedItems)
    {
        // Initialize; a recommender that did not return the item contributes 0
        uicRating = 0;
        urRating = 0;
        irRating = 0;
        vote = 0;

        // Look the item up in uicList (the original listing iterated urList here)
        for (PredictedItemRating uic : uicList)
        {
            if (itemId.equals(uic.getItemId()))
            {
                uicRating = uic.getRating(6);
            }
        }
        for (PredictedItemRating ur : urList)
        {
            if (itemId.equals(ur.getItemId()))
            {
                urRating = ur.getRating(6);
            }
        }
        for (PredictedItemRating ir : irList)
        {
            if (itemId.equals(ir.getItemId()))
            {
                irRating = ir.getRating(6);
            }
        }

        // The final value is the average of the three ratings
        vote = (uicRating + urRating + irRating) / 3.0d;
        recommendations.add(new PredictedItemRating(user.getId(), itemId, vote));
    }

    rescale(recommendations, maxR);
    return PredictedItemRating.getTopNRecommendations(recommendations, topN);
}
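The combining step at the end of recommend can be reproduced on toy data. The sketch below is not the iweb2 code: the class `CombineVotes` is hypothetical, each recommender's output is modelled as a map from item id to scaled rating, and the numbers are made up. It shows that an item returned by only one recommender still enters the union, but its vote is divided by three, so it scores lower than an item all three agree on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration with toy data, not part of iweb2.
class CombineVotes {

    // Averages the ratings per item over all recommenders; a missing rating counts as 0.
    static Map<Integer, Double> combine(List<Map<Integer, Double>> perRecommender) {
        // Union of all item ids returned by any recommender
        Set<Integer> allItems = new TreeSet<Integer>();
        for (Map<Integer, Double> ratings : perRecommender) {
            allItems.addAll(ratings.keySet());
        }
        Map<Integer, Double> votes = new TreeMap<Integer, Double>();
        for (Integer itemId : allItems) {
            double sum = 0.0;
            for (Map<Integer, Double> ratings : perRecommender) {
                Double rating = ratings.get(itemId);
                sum += (rating == null) ? 0.0 : rating;
            }
            votes.put(itemId, sum / perRecommender.size());
        }
        return votes;
    }

    // Made-up scaled ratings: item 1 is returned by all three recommenders,
    // items 2 and 3 by one recommender each.
    static Map<Integer, Double> demo() {
        Map<Integer, Double> uic = new HashMap<Integer, Double>();
        uic.put(1, 3.0);
        uic.put(2, 6.0);
        Map<Integer, Double> ur = new HashMap<Integer, Double>();
        ur.put(1, 3.0);
        Map<Integer, Double> ir = new HashMap<Integer, Double>();
        ir.put(1, 3.0);
        ir.put(3, 1.5);
        List<Map<Integer, Double>> all = new ArrayList<Map<Integer, Double>>();
        all.add(uic);
        all.add(ur);
        all.add(ir);
        return combine(all);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {1=3.0, 2=2.0, 3=0.5}
    }
}
```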