
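My recent NLP programs have to process more than a hundred thousand records, and a single thread can no longer keep up. This post builds a small playground that demonstrates splitting a task, synchronizing multiple threads, and merging the results.
Goal
Suppose there are 12 numbers, and performing one addition on each number takes 1 second. With 4 threads running, the whole job should finish in about 3 seconds.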
List<Integer> dataList = new ArrayList<Integer>();
for (int i = 0; i < 12; ++i)
{
    dataList.add(i);
}
System.out.println("Full data set: " + dataList);
Full data set: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
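The 3-second target is simple arithmetic: 12 items / 4 threads = 3 items per thread, at 1 second per item. The snippet below is a minimal sketch of how the chunk boundaries could be computed when the size does not divide evenly. ChunkBounds and split are hypothetical names used only for illustration; they are not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkBounds
{
    /** Returns [from, to) index pairs that split `size` items into `parts` near-equal chunks. */
    static List<int[]> split(int size, int parts)
    {
        List<int[]> bounds = new ArrayList<>();
        int base = size / parts;   // minimum chunk length
        int extra = size % parts;  // the first `extra` chunks get one more item
        int from = 0;
        for (int i = 0; i < parts; ++i)
        {
            int to = from + base + (i < extra ? 1 : 0);
            bounds.add(new int[]{from, to});
            from = to;
        }
        return bounds;
    }

    public static void main(String[] args)
    {
        // 12 items over 4 threads: [0, 3) [3, 6) [6, 9) [9, 12)
        for (int[] b : split(12, 4))
        {
            System.out.println("[" + b[0] + ", " + b[1] + ")");
        }
    }
}
```

Each pair can be passed straight to List.subList(from, to), since subList also takes a half-open range.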
Worker thread
static class WorkThread extends Thread
{
    // The slice of the data set this thread is responsible for
    private final List<Integer> workDataList;

    WorkThread(String name, List<Integer> workDataList)
    {
        super(name);
        this.workDataList = workDataList;
    }

    @Override
    public void run()
    {
        System.out.println(getName() + " starts processing " + workDataList);
        for (int i = 0; i < workDataList.size(); ++i)
        {
            workDataList.set(i, workDataList.get(i) + 1);
            try
            {
                // Simulate 1 second of work per item
                Thread.sleep(1000);
            }
            catch (InterruptedException e)
            {
                // Restore the interrupt flag and stop instead of swallowing the interrupt
                Thread.currentThread().interrupt();
                return;
            }
        }
        System.out.println(getName() + " finished " + workDataList);
    }

    public List<Integer> getResult()
    {
        return workDataList;
    }
}
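Splitting the task
Use List.subList to split the data set into four parts. Note an interesting property of subList: it returns a view backed by the original list, not a copy. Any change made through the sublist is reflected in the original list, and non-structural changes to the original list are likewise visible through the sublist. A structural change is one that alters the size of the list; once the original list is structurally modified by any route other than the sublist itself, the sublist's behavior is undefined. This is why the workers can write their results in place and the main thread can read them back from the original list.
WorkThread[] workThreadArray = new WorkThread[4];
for (int i = 0; i < workThreadArray.length; ++i)
{
    // Each thread gets a view of 3 consecutive elements
    workThreadArray[i] = new WorkThread("Thread-" + i, dataList.subList(i * 3, (i + 1) * 3));
    workThreadArray[i].start();
}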
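The view semantics of subList described above can be checked with a small sketch. SubListViewDemo and isViewValid are hypothetical names introduced for illustration. The ConcurrentModificationException shown in the last step is what the standard java.util.ArrayList does in practice; the List contract itself only says the sublist's behavior becomes undefined.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class SubListViewDemo
{
    /** Returns false once the backing ArrayList has been structurally modified behind the view. */
    static boolean isViewValid(List<Integer> view)
    {
        try
        {
            view.size();
            return true;
        }
        catch (ConcurrentModificationException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        List<Integer> parent = new ArrayList<>();
        for (int i = 0; i < 6; ++i)
        {
            parent.add(i);
        }
        List<Integer> view = parent.subList(0, 3);

        // A write through the view lands in the parent list.
        view.set(0, 100);
        System.out.println(parent.get(0)); // 100

        // A non-structural write to the parent is visible through the view.
        parent.set(1, 200);
        System.out.println(view.get(1)); // 200

        // A structural change to the parent (here, growing it) invalidates the view.
        parent.add(6);
        System.out.println(isViewValid(view)); // false
    }
}
```

In the playground this is not a problem, because the worker threads only call set and never change the size of the list.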
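Thread synchronization
The main thread waits until every worker thread has finished its task, then aggregates the results and displays them.
for (WorkThread aWorkThread : workThreadArray)
{
    try
    {
        // Blocks until this worker has terminated
        aWorkThread.join();
    }
    catch (InterruptedException e)
    {
        // Restore the interrupt flag so callers can see that the wait was cut short
        Thread.currentThread().interrupt();
    }
}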
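The reader comments at the end of this post suggest CountDownLatch as an alternative to join(). The following is a minimal sketch of what that could look like, not the author's code: LatchDemo and incrementAll are hypothetical names, and the sketch assumes the list size divides evenly by the thread count, as it does in the playground.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class LatchDemo
{
    /** Adds 1 to every element using `threadCount` threads, sleeping `millisPerItem` per item. */
    static List<Integer> incrementAll(List<Integer> dataList, int threadCount, long millisPerItem)
    {
        CountDownLatch done = new CountDownLatch(threadCount);
        int chunk = dataList.size() / threadCount; // assumption: size divides evenly
        for (int i = 0; i < threadCount; ++i)
        {
            List<Integer> part = dataList.subList(i * chunk, (i + 1) * chunk);
            new Thread(() ->
            {
                try
                {
                    for (int j = 0; j < part.size(); ++j)
                    {
                        part.set(j, part.get(j) + 1);
                        Thread.sleep(millisPerItem);
                    }
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                }
                finally
                {
                    done.countDown(); // always signal, even if the worker was interrupted
                }
            }, "Thread-" + i).start();
        }
        try
        {
            done.await(); // blocks until every worker has counted down
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return dataList;
    }

    public static void main(String[] args)
    {
        List<Integer> dataList = new ArrayList<>();
        for (int i = 0; i < 12; ++i)
        {
            dataList.add(i);
        }
        System.out.println(incrementAll(dataList, 4, 1000L));
    }
}
```

One practical difference from join(): the waiting code only needs the latch, not a reference to each Thread, so the same pattern also works when the tasks run on a thread pool.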
Complete code
package com.hankcs;

import java.util.ArrayList;
import java.util.List;

public class Main
{
    public static void main(String[] args)
    {
        List<Integer> dataList = new ArrayList<Integer>();
        for (int i = 0; i < 12; ++i)
        {
            dataList.add(i);
        }
        System.out.println("Full data set: " + dataList);
        long start = System.currentTimeMillis();

        // Split the task: each thread works on a 3-element view of dataList
        WorkThread[] workThreadArray = new WorkThread[4];
        for (int i = 0; i < workThreadArray.length; ++i)
        {
            workThreadArray[i] = new WorkThread("Thread-" + i, dataList.subList(i * 3, (i + 1) * 3));
            workThreadArray[i].start();
        }

        // Synchronize: wait for every worker to finish
        for (WorkThread aWorkThread : workThreadArray)
        {
            try
            {
                aWorkThread.join();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        }

        // Merge: the workers wrote through their views, so dataList already holds the results
        System.out.println("Merged result: " + dataList);
        System.out.println("Elapsed (ms): " + (System.currentTimeMillis() - start));
    }

    static class WorkThread extends Thread
    {
        private final List<Integer> workDataList;

        WorkThread(String name, List<Integer> workDataList)
        {
            super(name);
            this.workDataList = workDataList;
        }

        @Override
        public void run()
        {
            System.out.println(getName() + " starts processing " + workDataList);
            for (int i = 0; i < workDataList.size(); ++i)
            {
                workDataList.set(i, workDataList.get(i) + 1);
                try
                {
                    // Simulate 1 second of work per item
                    Thread.sleep(1000);
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            System.out.println(getName() + " finished " + workDataList);
        }

        public List<Integer> getResult()
        {
            return workDataList;
        }
    }
}
Output
Full data set: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Thread-0 starts processing [0, 1, 2]
Thread-1 starts processing [3, 4, 5]
Thread-2 starts processing [6, 7, 8]
Thread-3 starts processing [9, 10, 11]
Thread-3 finished [10, 11, 12]
Thread-0 finished [1, 2, 3]
Thread-1 finished [4, 5, 6]
Thread-2 finished [7, 8, 9]
Merged result: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Elapsed (ms): 3002
码农场
I recommend taking a look at CountDownLatch.
It probably just looks a bit more elegant; in the end they amount to the same thing.
Thanks for the tip. I looked at CountDownLatch and the code does read better. Does it have any other advantages? Is the overhead lower?