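The goal of this section is to build a Douban-like site that can identify users with similar opinions, articles with similar comments, and news stories with similar content. It should also be able to recommend groups with shared interests to users.
3.3.1 Introduction
For each category of the site, find the top-ranked stories, along with the users who liked those stories and the lists of items those users posted.
For each item of each user, find 10 items posted by other users, based on content similarity.
Assume that all of these items have been rated by users, using the same rating rules as before.
3.3.2 Finding friends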
    package com.hankcs;

    import iweb2.ch3.collaborative.data.*;
    import iweb2.ch3.collaborative.model.SimilarUser;
    import iweb2.ch3.collaborative.model.User;
    import iweb2.ch3.collaborative.recommender.Delphi;
    import iweb2.ch3.collaborative.recommender.DiggDelphi;
    import iweb2.ch3.collaborative.similarity.RecommendationType;

    public class ch3_5_Digg_ContentAndRatings {
        public static void main(String[] args) throws Exception {
            // Load data from Digg and save them in a file
            // BaseDataset ds = DiggData.loadDataFromDigg("C:/iWeb2/data/ch03/digg_stories.csv");

            // Load previously saved data
            BaseDataset ds = DiggData.loadData("C:/iWeb2/data/ch03/digg_stories.csv");

            // 2. Pick a user randomly or by username
            // User user = ds.getUser(1);
            // Or pick some specific user by username
            // DiggData.showUsers();
            User user = ds.findUserByName("adamfishercox");

            // 3. Show similar users
            DiggDelphi delphi = new DiggDelphi(ds);
            SimilarUser[] similarUsers = delphi.findSimilarUsers(user);
            delphi.recommend(user);

            // Repeat for the most similar user of the first user
            User u2 = ds.findUserByName(similarUsers[0].getName());
            similarUsers = delphi.findSimilarUsers(u2);
            delphi.recommend(u2);

            // And once more for the most similar user of the second user
            User u3 = ds.findUserByName(similarUsers[0].getName());
            delphi.findSimilarUsers(u3);
            delphi.recommend(u3);
        }
    }
Running it fails with the following error:
ERROR:
Failed to load properties from resource: '/iweb2.properties'.
null
Exception in thread "main" java.lang.ExceptionInInitializerError
at iweb2.ch3.collaborative.similarity.SimilarityMatrixRepository.<init>(SimilarityMatrixRepository.java:14)
at iweb2.ch3.collaborative.recommender.Delphi.<init>(Delphi.java:50)
at iweb2.ch3.collaborative.data.DiggData.createItemContentDelphi(DiggData.java:311)
at iweb2.ch3.collaborative.data.DiggData.createDataset(DiggData.java:236)
at iweb2.ch3.collaborative.data.DiggData.loadData(DiggData.java:164)
at com.hankcs.ch3_5_Digg_ContentAndRatings.main(ch3_5_Digg_ContentAndRatings.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.RuntimeException: Failed to load properties from resource: '/iweb2.properties'.
at iweb2.util.config.IWeb2Config.<clinit>(IWeb2Config.java:50)
… 11 more
Caused by: java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at iweb2.util.config.IWeb2Config.<clinit>(IWeb2Config.java:45)
… 11 more
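The resource '/iweb2.properties' is looked up on the classpath, and the IDE's output directory does not contain it. To fix this, copy C:\iWeb2\deploy\conf\iweb2.properties (three files in total, including this one) into the C:\iWeb2\out\production\iWeb2 directory.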
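The trace ends in a NullPointerException inside Properties.load, which is what happens when a classpath lookup returns no stream. The following is a minimal sketch of that mechanism, assuming the configuration is read through Class.getResourceAsStream; the class ResourceCheck and the method tryLoad are hypothetical names for illustration, not iweb2 code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ResourceCheck {

    /** Returns true if the named classpath resource exists and loads as properties. */
    static boolean tryLoad(String resourceName) {
        // getResourceAsStream returns null (it does not throw) when the
        // resource is not on the classpath.
        try (InputStream in = ResourceCheck.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                return false; // passing null on to Properties.load would fail with an NPE
            }
            new Properties().load(in);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("/iweb2.properties found: " + tryLoad("/iweb2.properties"));
    }
}
```

Running this check from the same run configuration is a quick way to confirm whether the copied files actually ended up on the classpath.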
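The triangle effect
A's best friend is B, one of B's friends is C, and one of C's friends appears only in A's friend list but not in B's. To be honest, I don't find anything interesting about this...
3.3.3 The inner workings of DiggDelphi
DiggDelphi is a wrapper around the three recommenders from the previous section. It combines the results of the three recommenders to produce the final result.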
    /**
     * Finds similar users by combining user-to-user similarity with the
     * similarity of the users' content.
     *
     * @param user the user to find similar users for
     * @param topN the maximum number of similar users to return
     * @return the top N similar users
     */
    public SimilarUser[] findSimilarUsers(User user, int topN) {

        List<SimilarUser> similarUsers = new ArrayList<SimilarUser>();

        // Simply put the users found by the two methods together

        // Content-based
        SimilarUser[] simU = delphiUC.findSimilarUsers(user, topN);
        similarUsers.addAll(Arrays.asList(simU));

        // Based on user similarity
        simU = delphiUR.findSimilarUsers(user, topN);
        similarUsers.addAll(Arrays.asList(simU));

        // SimilarUser.print(simU, "Top Friends for user " + user.getName() + ":");

        return SimilarUser.getTopNFriends(similarUsers, topN);
    }
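How the merged list is cut down to the top N is hidden inside SimilarUser.getTopNFriends. The following is a minimal sketch of one plausible approach, assuming candidates are ranked by descending similarity and that a user found by both methods keeps the higher of the two scores. The Friend class and the topN method are hypothetical stand-ins, not the iweb2 implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopFriendsSketch {

    /** Hypothetical stand-in for SimilarUser: a name plus a similarity score. */
    static final class Friend {
        final String name;
        final double similarity;

        Friend(String name, double similarity) {
            this.name = name;
            this.similarity = similarity;
        }
    }

    /** Keeps the best score per name, then returns at most topN friends, best first. */
    static List<Friend> topN(List<Friend> candidates, int topN) {
        Map<String, Friend> best = new LinkedHashMap<>();
        for (Friend f : candidates) {
            Friend seen = best.get(f.name);
            if (seen == null || f.similarity > seen.similarity) {
                best.put(f.name, f);
            }
        }
        List<Friend> sorted = new ArrayList<>(best.values());
        sorted.sort((a, b) -> Double.compare(b.similarity, a.similarity));
        return new ArrayList<>(sorted.subList(0, Math.min(topN, sorted.size())));
    }

    public static void main(String[] args) {
        List<Friend> merged = new ArrayList<>();
        // Candidates from the content-based recommender (made-up values)
        merged.addAll(Arrays.asList(new Friend("amy", 0.9), new Friend("bob", 0.4)));
        // Candidates from the user-similarity recommender (made-up values)
        merged.addAll(Arrays.asList(new Friend("bob", 0.7), new Friend("cat", 0.5)));

        for (Friend f : topN(merged, 2)) {
            System.out.println(f.name + " " + f.similarity);
        }
    }
}
```

Whether the real method deduplicates at all is not visible from the code above; the sketch only shows why some merging rule is needed once two result lists are concatenated.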
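The results of the three recommenders go through the following steps:
1. Each recommender computes its maximum predicted rating.
2. Scaling factor = the largest of the three maxima / the current recommender's maximum.
3. Threshold = 0.5 * scaling factor.
4. Discard items whose scaled rating is below the threshold.
5. Otherwise, replace the item's rating with the scaled rating.
6. Build the union of the items returned by the three recommenders.
7. Process each item in that set.
8. The final value is the average of the three ratings.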
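The scale, filter, and average pipeline can be tried on toy data without the iweb2 classes. This is a minimal sketch, assuming each recommender's output is represented as a map from item id to predicted rating; CombineSketch, scaleAndFilter, and combine are hypothetical names, the numbers are made up, and the final rescaling performed by DiggDelphi is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CombineSketch {

    /** Scales one recommender's ratings and drops those below the scaled threshold. */
    static Map<Integer, Double> scaleAndFilter(Map<Integer, Double> ratings,
                                               double ownMax, double globalMax) {
        double scaling = globalMax / ownMax;   // largest maximum / this recommender's maximum
        double threshold = 0.5 * scaling;      // ad hoc threshold, scaled the same way
        Map<Integer, Double> kept = new HashMap<>();
        for (Map.Entry<Integer, Double> e : ratings.entrySet()) {
            double scaled = e.getValue() * scaling;
            if (scaled >= threshold) {
                kept.put(e.getKey(), scaled);
            }
        }
        return kept;
    }

    /** Averages the three ratings per item; a recommender lacking the item contributes 0. */
    static Map<Integer, Double> combine(Map<Integer, Double> a,
                                        Map<Integer, Double> b,
                                        Map<Integer, Double> c) {
        Set<Integer> union = new HashSet<>(a.keySet());
        union.addAll(b.keySet());
        union.addAll(c.keySet());
        Map<Integer, Double> votes = new HashMap<>();
        for (Integer id : union) {
            double sum = a.getOrDefault(id, 0.0)
                       + b.getOrDefault(id, 0.0)
                       + c.getOrDefault(id, 0.0);
            votes.put(id, sum / 3.0);
        }
        return votes;
    }

    public static void main(String[] args) {
        Map<Integer, Double> r1 = new HashMap<>();
        r1.put(1, 4.0);
        r1.put(2, 0.4);
        Map<Integer, Double> r2 = new HashMap<>();
        r2.put(1, 2.0);
        r2.put(3, 1.0);
        Map<Integer, Double> r3 = new HashMap<>();
        r3.put(3, 5.0);

        double globalMax = 5.0; // largest of the three maxima: 4.0, 2.0, 5.0
        Map<Integer, Double> votes = combine(
                scaleAndFilter(r1, 4.0, globalMax),
                scaleAndFilter(r2, 2.0, globalMax),
                scaleAndFilter(r3, 5.0, globalMax));
        System.out.println(votes);
    }
}
```

Because the threshold is scaled by the same factor as the ratings, the filter is equivalent to dropping items whose unscaled rating is below 0.5. Averaging over all three recommenders also means an item proposed by only one of them gets at most a third of its scaled rating.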
    /**
     * Makes recommendations based on user-item content similarity, user
     * similarity, and item similarity.
     * Note: this version needs java.util.Iterator in addition to the
     * collection classes already used.
     *
     * @param user the user to recommend items to
     * @param topN the maximum number of recommendations to return
     * @return the top N recommendations
     */
    public List<PredictedItemRating> recommend(User user, int topN) {

        List<PredictedItemRating> recommendations = new ArrayList<PredictedItemRating>();

        // Establish a relative scaling factor
        double maxR = -1.0d;

        // Get the maximum predicted rating from each recommender
        double maxRatingDelphiUIC = delphiUIC.getMaxPredictedRating(user.getId());
        double maxRatingDelphiUR = delphiUR.getMaxPredictedRating(user.getId());
        double maxRatingDelphiIR = delphiIR.getMaxPredictedRating(user.getId());

        // Find the maximum predicted rating across all recommenders
        double[] sortedMaxR = {maxRatingDelphiUIC, maxRatingDelphiUR, maxRatingDelphiIR};
        Arrays.sort(sortedMaxR);
        maxR = sortedMaxR[2]; // This is the maximum predicted rating

        // Auxiliary variable
        double scaledRating = 1.0d;

        // Recommender 1 -- User-to-Item content based
        // Scaling factor = largest of the three maxima / this recommender's maximum
        double scaling = maxR / maxRatingDelphiUIC;

        // Set an ad hoc threshold and scale it: threshold = 0.5 * scaling
        double scaledThreshold = 0.5 * scaling;

        List<PredictedItemRating> uicList = delphiUIC.recommend(user, topN);
        // Remove through the iterator: calling list.remove() inside a for-each
        // loop can throw ConcurrentModificationException
        for (Iterator<PredictedItemRating> it = uicList.iterator(); it.hasNext(); ) {
            PredictedItemRating pR = it.next();
            scaledRating = pR.getRating(6) * scaling;
            if (scaledRating < scaledThreshold) {
                it.remove();                 // discard items below the threshold
            } else {
                pR.setRating(scaledRating);  // otherwise store the scaled rating
            }
        }

        // Recommender 2 -- User based collaborative filtering
        scaling = maxR / maxRatingDelphiUR;
        scaledThreshold = 0.5 * scaling;

        List<PredictedItemRating> urList = delphiUR.recommend(user, topN);
        for (Iterator<PredictedItemRating> it = urList.iterator(); it.hasNext(); ) {
            PredictedItemRating pR = it.next();
            scaledRating = pR.getRating(6) * scaling;
            if (scaledRating < scaledThreshold) {
                it.remove();
            } else {
                pR.setRating(scaledRating);
            }
        }

        // Recommender 3 -- Item based collaborative filtering
        scaling = maxR / maxRatingDelphiIR;
        scaledThreshold = 0.5 * scaling;

        List<PredictedItemRating> irList = delphiIR.recommend(user, topN);
        for (Iterator<PredictedItemRating> it = irList.iterator(); it.hasNext(); ) {
            PredictedItemRating pR = it.next();
            scaledRating = pR.getRating(6) * scaling;
            if (scaledRating < scaledThreshold) {
                it.remove();
            } else {
                pR.setRating(scaledRating);
            }
        }

        /*
         * At this point, uicList, urList, and irList contain ratings
         * that are scaled and exceed the threshold value.
         */
        double uicRating = 0;
        double urRating = 0;
        double irRating = 0;
        double vote = 0;

        // Build the union of the items produced by all recommenders
        Set<Integer> allRecommendedItems = new HashSet<Integer>();
        for (PredictedItemRating pir : urList) {
            allRecommendedItems.add(pir.getItemId());
        }
        for (PredictedItemRating pir : irList) {
            allRecommendedItems.add(pir.getItemId());
        }
        for (PredictedItemRating pir : uicList) {
            allRecommendedItems.add(pir.getItemId());
        }

        // Process each item
        for (Integer itemId : allRecommendedItems) {

            // Initialize
            uicRating = 0;
            urRating = 0;
            irRating = 0;
            vote = 0;

            // Fixed: the original iterated over urList here instead of uicList
            for (PredictedItemRating uic : uicList) {
                if (itemId.equals(uic.getItemId())) {
                    uicRating = uic.getRating(6);
                }
            }
            for (PredictedItemRating ur : urList) {
                if (itemId.equals(ur.getItemId())) {
                    urRating = ur.getRating(6);
                }
            }
            for (PredictedItemRating ir : irList) {
                if (itemId.equals(ir.getItemId())) {
                    irRating = ir.getRating(6);
                }
            }

            // The final value is the average of the three ratings
            vote = (uicRating + urRating + irRating) / 3.0d;
            recommendations.add(new PredictedItemRating(user.getId(), itemId, vote));
        }

        rescale(recommendations, maxR);

        return PredictedItemRating.getTopNRecommendations(recommendations, topN);
    }