This post walks through the official documentation of the SDP format using a worked example. Take the sentence:
More than a few CEOs say the red-carpet treatment tempts them to return to a heartland city for future meetings .
Its semantic dependency graph under the PSD representation is written in plain-text SDP format as follows:
#20010019
1 More more JJR - + _ _ _ ACT-arg _ _ _ _ _
2 than than IN - - _ _ _ _ _ _ _ _ _
3 a a DT - - _ _ _ _ _ _ _ _ _
4 few few JJ - - _ _ RSTR _ _ _ _ _ _
5 CEOs ceo NNS - + _ CPR _ _ _ _ _ _ _
6 say say VBP + + ev-w2833f1 _ _ _ _ _ _ _ _
7 the the DT - - _ _ _ _ _ _ _ _ _
8 red-carpet red-carpet JJ - - _ _ _ _ MANN _ _ _ _
9 treatment treatment NN - + _ _ _ _ _ ACT-arg _ _ _
10 tempts tempt VBZ - + ev-w3350f1 _ _ EFF-arg _ _ _ _ _
11 them #PersPron_#Cor PRP - - _ _ _ _ _ ADDR-arg ACT-arg _ _
12 to to TO - - _ _ _ _ _ _ _ _ _
13 return return VB - + ev-w2734f1 _ _ _ _ PAT-arg _ _ _
14 to to TO - - _ _ _ _ _ _ _ _ _
15 a a DT - - _ _ _ _ _ _ _ _ _
16 heartland heartland NN - - _ _ _ _ _ _ _ LOC _
17 city city NN - + _ _ _ _ _ _ DIR3-arg _ _
18 for for IN - - _ _ _ _ _ _ _ _ _
19 future future JJ - - _ _ _ _ _ _ _ _ TWHEN
20 meetings meeting NNS - + _ _ _ _ _ _ AIM _ _
21 . . . - - _ _ _ _ _ _ _ _ _
For readability, the same data is laid out as a table:
| id | form | lemma | pos | top | pred | arg1 | arg2[1] | arg2[2] | arg2[3] | arg2[4] | arg2[5] | arg2[6] | arg2[7] | arg2[8] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | More | more | JJR | - | + | _ | _ | _ | ACT-arg | _ | _ | _ | _ | _ |
| 2 | than | than | IN | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 3 | a | a | DT | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 4 | few | few | JJ | - | - | _ | _ | RSTR | _ | _ | _ | _ | _ | _ |
| 5 | CEOs | ceo | NNS | - | + | _ | CPR | _ | _ | _ | _ | _ | _ | _ |
| 6 | say | say | VBP | + | + | ev-w2833f1 | _ | _ | _ | _ | _ | _ | _ | _ |
| 7 | the | the | DT | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 8 | red-carpet | red-carpet | JJ | - | - | _ | _ | _ | _ | MANN | _ | _ | _ | _ |
| 9 | treatment | treatment | NN | - | + | _ | _ | _ | _ | _ | ACT-arg | _ | _ | _ |
| 10 | tempts | tempt | VBZ | - | + | ev-w3350f1 | _ | _ | EFF-arg | _ | _ | _ | _ | _ |
| 11 | them | #PersPron_#Cor | PRP | - | - | _ | _ | _ | _ | _ | ADDR-arg | ACT-arg | _ | _ |
| 12 | to | to | TO | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 13 | return | return | VB | - | + | ev-w2734f1 | _ | _ | _ | _ | PAT-arg | _ | _ | _ |
| 14 | to | to | TO | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 15 | a | a | DT | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 16 | heartland | heartland | NN | - | - | _ | _ | _ | _ | _ | _ | _ | LOC | _ |
| 17 | city | city | NN | - | + | _ | _ | _ | _ | _ | _ | DIR3-arg | _ | _ |
| 18 | for | for | IN | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 19 | future | future | JJ | - | - | _ | _ | _ | _ | _ | _ | _ | _ | TWHEN |
| 20 | meetings | meeting | NNS | - | + | _ | _ | _ | _ | _ | _ | AIM | _ | _ |
| 21 | . | . | . | - | - | _ | _ | _ | _ | _ | _ | _ | _ | _ |
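The first four columns are the token ID, the word form, the lemma, and the part-of-speech tag; they need no further explanation.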
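top marks whether the token is a root of the whole graph. A semantic dependency graph usually has very few roots, in most cases exactly one, because a sentence usually has a single main predicate. In this sentence, say is the only root.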
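pred marks whether the token is a head. To link the nodes together, first collect every token whose pred is positive (+), in sentence order, into an array heads, with heads[0] = ROOT by convention. In this example, heads comes out as: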
heads = [ROOT, More, CEOs, say, treatment, tempts, return, city, meetings]
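arg1 and arg2 are the columns after pred. The column labeled arg1 here carries no arc information in PSD: in this data it only holds identifiers such as ev-w2833f1 on some verbs, and it appears to correspond to the frame (sense) column of the SDP 2015 file format. The part that matters is arg2, which consists of several columns, one per head: a label in the i-th column means that the current token is a dependent of heads[i]. For example, the 3rd arg2 column of "More" is ACT-arg, and heads[3] = say, so the graph should contain the following directed edge: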
More-[ACT-arg]->say
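The counting procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of any official SDP tooling: the function names `parse_sdp`, `collect_heads`, and `collect_arcs` are made up for this example, and the fields are assumed to be whitespace-separated with no spaces inside a field (real SDP files use tabs).

```python
SDP = """\
#20010019
1 More more JJR - + _ _ _ ACT-arg _ _ _ _ _
2 than than IN - - _ _ _ _ _ _ _ _ _
3 a a DT - - _ _ _ _ _ _ _ _ _
4 few few JJ - - _ _ RSTR _ _ _ _ _ _
5 CEOs ceo NNS - + _ CPR _ _ _ _ _ _ _
6 say say VBP + + ev-w2833f1 _ _ _ _ _ _ _ _
7 the the DT - - _ _ _ _ _ _ _ _ _
8 red-carpet red-carpet JJ - - _ _ _ _ MANN _ _ _ _
9 treatment treatment NN - + _ _ _ _ _ ACT-arg _ _ _
10 tempts tempt VBZ - + ev-w3350f1 _ _ EFF-arg _ _ _ _ _
11 them #PersPron_#Cor PRP - - _ _ _ _ _ ADDR-arg ACT-arg _ _
12 to to TO - - _ _ _ _ _ _ _ _ _
13 return return VB - + ev-w2734f1 _ _ _ _ PAT-arg _ _ _
14 to to TO - - _ _ _ _ _ _ _ _ _
15 a a DT - - _ _ _ _ _ _ _ _ _
16 heartland heartland NN - - _ _ _ _ _ _ _ LOC _
17 city city NN - + _ _ _ _ _ _ DIR3-arg _ _
18 for for IN - - _ _ _ _ _ _ _ _ _
19 future future JJ - - _ _ _ _ _ _ _ _ TWHEN
20 meetings meeting NNS - + _ _ _ _ _ _ AIM _ _
21 . . . - - _ _ _ _ _ _ _ _ _
"""


def parse_sdp(text):
    """Split each token line into fields, skipping blank lines and the '#id' line.

    Assumption: fields are whitespace-separated and contain no spaces.
    """
    return [line.split() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


def collect_heads(rows):
    """heads[0] is the virtual ROOT; heads[i] is the i-th token whose pred is '+'."""
    return ["ROOT"] + [row[1] for row in rows if row[5] == "+"]


def collect_arcs(rows):
    """Return (dependent, label, head) triples read off the arg2 columns."""
    heads = collect_heads(rows)
    arcs = []
    for row in rows:
        # row[6] is the column the text calls arg1; the arg2 columns start at row[7].
        for i, label in enumerate(row[7:], start=1):
            if label != "_":
                arcs.append((row[1], label, heads[i]))
    return arcs


rows = parse_sdp(SDP)
heads = collect_heads(rows)
arcs = collect_arcs(rows)
for dep, label, head in arcs:
    print(f"{dep}-[{label}]->{head}")
```

Each printed line uses the same dependent-to-head notation as the edge above, so a token with several labels in its arg2 columns, such as them, simply produces several lines.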
Drawing all of these edges as a directed graph yields the semantic dependency graph of the sentence.
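The SDP format is somewhat more complex than CoNLL and is not friendly to human readers: to recover a dependency arc you first have to count the heads and then count the arg2 columns. We can use semstr to convert the SDP format into CoNLL format: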
# sent_id = 20010019
1 More more JJR JJR _ 6 ACT-arg _ _
2 than than IN IN _ 0 ROOT _ _
3 a a DT DT _ 0 ROOT _ _
4 few few JJ JJ _ 5 RSTR _ _
5 CEOs ceo NNS NNS _ 1 CPR _ _
6 say say VBP VBP _ 0 ROOT _ _
7 the the DT DT _ 0 ROOT _ _
8 red-carpet red-carpet JJ JJ _ 9 MANN _ _
9 treatment treatment NN NN _ 10 ACT-arg _ _
10 tempts tempt VBZ VBZ _ 6 EFF-arg _ _
11 them #PersPron_#Cor PRP PRP _ 10 ADDR-arg _ _
11 them #PersPron_#Cor PRP PRP _ 13 ACT-arg _ _
12 to to TO TO _ 0 ROOT _ _
13 return return VB VB _ 10 PAT-arg _ _
14 to to TO TO _ 0 ROOT _ _
15 a a DT DT _ 0 ROOT _ _
16 heartland heartland NN NN _ 17 LOC _ _
17 city city NN NN _ 13 DIR3-arg _ _
18 for for IN IN _ 0 ROOT _ _
19 future future JJ JJ _ 20 TWHEN _ _
20 meetings meeting NNS NNS _ 13 AIM _ _
21 . . . . _ 0 ROOT _ _
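In CoNLL format, the first line immediately shows that the head of More is token 6, i.e. say. It is just as easy to see that them has two heads, because it appears on two lines. Finally, every token without a head has its head pointed at the virtual ROOT (ID 0).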
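To make the correspondence between the two formats concrete, here is a small hand-written sketch of the conversion logic. It is not semstr's implementation and does not call semstr; the function name `sdp_to_conll` is hypothetical, the ten-column layout simply mirrors the listing above, and the input is assumed to be whitespace-separated.

```python
SDP = """\
#20010019
1 More more JJR - + _ _ _ ACT-arg _ _ _ _ _
2 than than IN - - _ _ _ _ _ _ _ _ _
3 a a DT - - _ _ _ _ _ _ _ _ _
4 few few JJ - - _ _ RSTR _ _ _ _ _ _
5 CEOs ceo NNS - + _ CPR _ _ _ _ _ _ _
6 say say VBP + + ev-w2833f1 _ _ _ _ _ _ _ _
7 the the DT - - _ _ _ _ _ _ _ _ _
8 red-carpet red-carpet JJ - - _ _ _ _ MANN _ _ _ _
9 treatment treatment NN - + _ _ _ _ _ ACT-arg _ _ _
10 tempts tempt VBZ - + ev-w3350f1 _ _ EFF-arg _ _ _ _ _
11 them #PersPron_#Cor PRP - - _ _ _ _ _ ADDR-arg ACT-arg _ _
12 to to TO - - _ _ _ _ _ _ _ _ _
13 return return VB - + ev-w2734f1 _ _ _ _ PAT-arg _ _ _
14 to to TO - - _ _ _ _ _ _ _ _ _
15 a a DT - - _ _ _ _ _ _ _ _ _
16 heartland heartland NN - - _ _ _ _ _ _ _ LOC _
17 city city NN - + _ _ _ _ _ _ DIR3-arg _ _
18 for for IN - - _ _ _ _ _ _ _ _ _
19 future future JJ - - _ _ _ _ _ _ _ _ TWHEN
20 meetings meeting NNS - + _ _ _ _ _ _ AIM _ _
21 . . . - - _ _ _ _ _ _ _ _ _
"""


def sdp_to_conll(text):
    """Turn SDP token lines into CoNLL-style rows (hypothetical helper).

    A token with several heads yields one row per head; a token with no
    head gets head 0 and the label ROOT, as in the listing above.
    """
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    # IDs of the tokens flagged pred == '+'; index 0 stands for the virtual ROOT.
    head_ids = ["0"] + [row[0] for row in rows if row[5] == "+"]
    out = []
    for row in rows:
        tok_id, form, lemma, pos = row[:4]
        # The arc labels start at row[7]; the i-th one refers to head_ids[i].
        arcs = [(head_ids[i], label)
                for i, label in enumerate(row[7:], start=1) if label != "_"]
        if not arcs:
            arcs = [("0", "ROOT")]
        for head, label in arcs:
            out.append([tok_id, form, lemma, pos, pos, "_", head, label, "_", "_"])
    return out


conll = sdp_to_conll(SDP)
for fields in conll:
    print("\t".join(fields))
```

The head column is resolved once per sentence through `head_ids`, which is exactly the counting step a human reader has to do by hand in the SDP file.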