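My recent NLP programs have to process datasets of more than a hundred thousand items, and a single thread can no longer keep up. This small playground demonstrates splitting a task, synchronizing multiple threads, and merging the results.
Goal
Suppose there are 12 numbers, and adding one to each number takes 1 second. With 4 threads, each thread handles 3 numbers, so the whole task should finish in about 3 seconds instead of 12.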
List<Integer> dataList = new ArrayList<Integer>();
for (int i = 0; i < 12; ++i) {
    dataList.add(i);
}
System.out.println("Full dataset: " + dataList);

Full dataset: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Worker thread
static class WorkThread extends Thread {
    // The slice of the dataset this thread is responsible for
    private List<Integer> workDataList;

    WorkThread(String name, List<Integer> workDataList) {
        super(name);
        this.workDataList = workDataList;
    }

    @Override
    public void run() {
        System.out.println(getName() + " started processing " + workDataList);
        for (int i = 0; i < workDataList.size(); ++i) {
            // Add one in place, then sleep to simulate 1 second of work
            workDataList.set(i, workDataList.get(i) + 1);
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println(getName() + " finished processing " + workDataList);
    }

    public List<Integer> getResult() {
        return workDataList;
    }
}
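Splitting the task
Use List.subList to split the dataset into four parts. Note an interesting property of subList: it returns a view rather than a copy, so any change made to the sublist is reflected in the original list. Conversely, non-structural changes made to the original list are also reflected in the sublist. A structural change is one that changes the size of the list; after a structural change to the original list, the sublist should no longer be used.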
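The view behaviour described above can be checked with a small experiment. This is only a sketch: the class name SubListViewDemo and the helper range are made up for illustration, and the last step relies on how ArrayList reacts to a structural change (the List documentation only says the sublist's behaviour becomes undefined).

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class SubListViewDemo {
    /** Hypothetical helper: builds the list [0, 1, ..., n-1]. */
    static List<Integer> range(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; ++i) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> dataList = range(6);
        List<Integer> view = dataList.subList(0, 3);

        view.set(0, 100);      // write through the view
        dataList.set(1, 200);  // non-structural change on the original list
        System.out.println(dataList); // [100, 200, 2, 3, 4, 5]
        System.out.println(view);     // [100, 200, 2]

        dataList.add(6);       // structural change on the original list
        try {
            view.get(0);       // ArrayList detects the change and fails fast
        } catch (ConcurrentModificationException e) {
            System.out.println("the view is no longer usable");
        }
    }
}
```

This is why the worker threads can write into their own sublists and the main thread still sees the results in dataList, as long as nobody adds to or removes from dataList while the workers are running.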
WorkThread[] workThreadArray = new WorkThread[4];
for (int i = 0; i < workThreadArray.length; ++i) {
    // Thread i gets the three elements in [i * 3, (i + 1) * 3)
    workThreadArray[i] = new WorkThread("Thread-" + i, dataList.subList(i * 3, (i + 1) * 3));
    workThreadArray[i].start();
}
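The hard-coded factor 3 works because 12 divides evenly by 4. For a dataset whose size is not a multiple of the thread count, the split could be generalized roughly as follows. The class Partitioner and its partition method are hypothetical names introduced here, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;

class Partitioner {
    /**
     * Splits list into `parts` contiguous subList views whose sizes differ
     * by at most one. The first (size % parts) views get the extra element.
     */
    static <T> List<List<T>> partition(List<T> list, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<T>> result = new ArrayList<>(parts);
        int base = list.size() / parts;
        int extra = list.size() % parts;
        int from = 0;
        for (int i = 0; i < parts; ++i) {
            int to = from + base + (i < extra ? 1 : 0);
            result.add(list.subList(from, to));
            from = to;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> dataList = new ArrayList<>();
        for (int i = 0; i < 14; ++i) {
            dataList.add(i);
        }
        // 14 items over 4 parts gives sizes 4, 4, 3, 3
        for (List<Integer> chunk : partition(dataList, 4)) {
            System.out.println(chunk);
        }
    }
}
```

Each returned chunk is still a view of the original list, so it can be handed to a WorkThread exactly like the fixed-size sublists.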
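Thread synchronization
The main thread needs to wait until every worker thread has finished its share, then gather the results and display them.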
for (WorkThread aWorkThread : workThreadArray) {
    try {
        // Block until this worker has terminated
        aWorkThread.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
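Thread.join is not the only way to wait for the workers. A reader suggested java.util.concurrent.CountDownLatch; the sketch below shows roughly how the same wait could look with it. The class LatchDemo and the method incrementAll are made-up names, the 1-second sleep is left out, and the code assumes the list size is divisible by the number of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class LatchDemo {
    /** Adds one to every element using `threads` workers; returns once all are done. */
    static void incrementAll(List<Integer> dataList, int threads) {
        CountDownLatch done = new CountDownLatch(threads);
        int chunk = dataList.size() / threads; // assumption: divides evenly
        for (int t = 0; t < threads; ++t) {
            List<Integer> work = dataList.subList(t * chunk, (t + 1) * chunk);
            new Thread(() -> {
                try {
                    for (int i = 0; i < work.size(); ++i) {
                        work.set(i, work.get(i) + 1);
                    }
                } finally {
                    done.countDown(); // signal completion even if the work fails
                }
            }, "worker-" + t).start();
        }
        try {
            done.await(); // blocks until the count reaches zero
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        List<Integer> dataList = new ArrayList<>();
        for (int i = 0; i < 12; ++i) {
            dataList.add(i);
        }
        incrementAll(dataList, 4);
        System.out.println(dataList); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    }
}
```

One practical difference is that the waiting code does not need references to the Thread objects, which helps when the work is submitted to a thread pool whose threads never terminate. For a handful of dedicated threads as in this playground, join is just as good.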
Complete code
package com.hankcs;

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // Build the dataset
        List<Integer> dataList = new ArrayList<Integer>();
        for (int i = 0; i < 12; ++i) {
            dataList.add(i);
        }
        System.out.println("Full dataset: " + dataList);

        long start = System.currentTimeMillis();

        // Split the task: each thread works on its own subList view
        WorkThread[] workThreadArray = new WorkThread[4];
        for (int i = 0; i < workThreadArray.length; ++i) {
            workThreadArray[i] = new WorkThread("Thread-" + i, dataList.subList(i * 3, (i + 1) * 3));
            workThreadArray[i].start();
        }

        // Synchronize: wait for every worker to finish
        for (WorkThread aWorkThread : workThreadArray) {
            try {
                aWorkThread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        // Merge: the sublists are views, so dataList already holds the results
        System.out.println("Merged result: " + dataList);
        System.out.println("Elapsed (ms): " + (System.currentTimeMillis() - start));
    }

    static class WorkThread extends Thread {
        private List<Integer> workDataList;

        WorkThread(String name, List<Integer> workDataList) {
            super(name);
            this.workDataList = workDataList;
        }

        @Override
        public void run() {
            System.out.println(getName() + " started processing " + workDataList);
            for (int i = 0; i < workDataList.size(); ++i) {
                workDataList.set(i, workDataList.get(i) + 1);
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            System.out.println(getName() + " finished processing " + workDataList);
        }

        public List<Integer> getResult() {
            return workDataList;
        }
    }
}
Output
Full dataset: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Thread-0 started processing [0, 1, 2]
Thread-1 started processing [3, 4, 5]
Thread-2 started processing [6, 7, 8]
Thread-3 started processing [9, 10, 11]
Thread-3 finished processing [10, 11, 12]
Thread-0 finished processing [1, 2, 3]
Thread-1 finished processing [4, 5, 6]
Thread-2 finished processing [7, 8, 9]
Merged result: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Elapsed (ms): 3002
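I'd recommend taking a look at CountDownLatch.
It probably just looks a bit more elegant; in the end they do the same thing.
I'd recommend taking a look at CountDownLatch.
Thanks for the tip. I took a look, and code written with CountDownLatch does look cleaner. Does it have any other advantages? Is its overhead lower?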