
Candidate Generation

Candidate ordering

Candidates are pairs of dialogue units (DUs) from the same dialogue between which an attachment is possible; they are the units of data for which labels are predicted. The labeling functions (LFs) use "local" information to decide whether the two DUs in a candidate are attached, such as DU type, speaker identity, raw text, dialogue and speech acts, and the distance between the units. The LFs also have access to some "global" contextual information, such as the candidate's position in the dialogue, what has been said so far, and which attachments have already been predicted in the dialogue up to that point.

Example: given dialogue X with DUs A, B, C and D, appearing in that order, the set of possible candidates Y is {(A,B), (A,C), (A,D), (B,C), (B,D), (C,D)}.
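The candidate set is every pair of distinct units in which the source occurs before the target. A minimal sketch, assuming the units are given as a list already in dialogue order (the name `candidate_set` is illustrative, not from the project code):

```python
from itertools import combinations

def candidate_set(units):
    """Return every (source, target) pair where source occurs before target.

    `units` must already be in dialogue order; combinations() keeps that
    order, so the earlier unit is always first in each pair.
    """
    return list(combinations(units, 2))

Y = candidate_set(["A", "B", "C", "D"])
print(Y)
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```

For a dialogue of n units this yields n(n-1)/2 candidates, so the set grows quadratically with dialogue length.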

We order the candidates so that the most recent, and presumably most salient, preceding context comes first. For dialogue X, the code below yields the following ordered list of candidates O:
[(A,B), (B,C), (A,C), (C,D), (B,D), (A,D)]

As an LF is applied to O, it sees the candidates target by target, in chronological order. For each target unit, it first sees the immediately preceding source unit, followed by source units at increasing distances from the target. Chronologically speaking, an LF considers the immediate present before moving back into the dialogue's past, going back at most 20 units before the target (the window used in the code below).
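A toy sketch of how an LF can combine local information (the distance between units) with global information (which targets already have a predicted attachment) when it is applied to O in order. The label values, the `attach_nearest` rule and the `attached_targets` bookkeeping are hypothetical, for illustration only; they are not one of the project's LFs.

```python
ATTACH, NO_ATTACH, ABSTAIN = 1, 0, -1  # hypothetical label values

def attach_nearest(source, target, position, attached_targets):
    """Toy LF: attach each target to its immediately preceding unit.

    Once a target has a predicted attachment, any later candidate with that
    target gets NO_ATTACH; otherwise the LF abstains.
    """
    if target in attached_targets:
        return NO_ATTACH  # global context: target already attached
    if position[target] - position[source] == 1:
        return ATTACH  # local context: adjacent units
    return ABSTAIN

units = ["A", "B", "C", "D"]
position = {u: i for i, u in enumerate(units)}
O = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D"), ("A", "D")]

attached_targets = set()  # updated as predictions are made
labels = []
for source, target in O:
    label = attach_nearest(source, target, position, attached_targets)
    if label == ATTACH:
        attached_targets.add(target)
    labels.append(label)

print(labels)  # [1, 1, 0, 1, 0, 0]
```

If the candidates were processed in the order of Y instead, (A,C) would be seen before (B,C), while C has no attachment yet, so it would get ABSTAIN rather than NO_ATTACH.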

get_seg_list takes all of the source and target DUs in a dialogue, removes duplicates, and orders them chronologically by the end offset of their text span (source_span_end / target_span_end).

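def get_seg_list(table):
    # Collect (unit id, span end offset) tuples for every unit used as a source or a target
    # (DataFrame.get_values() was removed in pandas 1.0; itertuples works across versions)
    sources = list(table[['source_id', 'source_span_end']].drop_duplicates().itertuples(index=False, name=None))
    targets = list(table[['target_id', 'target_span_end']].drop_duplicates().itertuples(index=False, name=None))
    # A unit can be both a source and a target; the set removes the duplicates
    combined = list(set(sources + targets))
    # Sort by span end offset so units appear in the order they occur in the dialogue
    combined.sort(key=lambda tup: tup[1])
    all_segs = [c[0] for c in combined]
    return all_segs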

For each dialogue, once the DUs are ordered, they are converted into an ordered list of candidates:

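from tqdm import tqdm

MAX_DISTANCE = 20  # pair each target with at most this many preceding units

# cands: DataFrame of candidates for all dialogues (one row per source/target pair)
dialogues = cands.dialogue_num.drop_duplicates()
ordered_pairs = {}  # dialogue number -> ordered list of (source, target) pairs

for d in tqdm(dialogues):
    # Get the dialogue's units in chronological order
    seg_list = get_seg_list(cands[cands.dialogue_num == d])

    # Build the ordered candidate pairs: targets in chronological order and,
    # for each target, sources from the nearest preceding unit backwards
    seg_pairs = []
    for t in range(1, len(seg_list)):
        for n in range(t - 1, max(t - MAX_DISTANCE - 1, -1), -1):
            seg_pairs.append((seg_list[n], seg_list[t]))
    ordered_pairs[d] = seg_pairs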
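The pair-building step can be checked without pandas. Below is a minimal sketch of the same ordering as a standalone function over a list of unit ids already in chronological order; the name `order_candidates` and the `max_distance` parameter are ours, with the default of 20 matching the window in the loop above. For dialogue X it reproduces O, and when a dialogue is shorter than the window, the ordered list is a reordering of the full candidate set Y.

```python
from itertools import combinations

def order_candidates(seg_list, max_distance=20):
    """Order (source, target) pairs target by target, nearest source first.

    seg_list holds unit ids in chronological order. Each target is paired
    with at most `max_distance` preceding units.
    """
    pairs = []
    for t in range(1, len(seg_list)):
        # Walk backwards from the unit just before the target
        for n in range(t - 1, max(t - max_distance - 1, -1), -1):
            pairs.append((seg_list[n], seg_list[t]))
    return pairs

units = ["A", "B", "C", "D"]
O = order_candidates(units)
print(O)
# [('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D'), ('B', 'D'), ('A', 'D')]

# With no distance cap in effect, O contains exactly the candidates in Y
assert sorted(O) == sorted(combinations(units, 2))
```

In longer dialogues the window drops distant pairs: with 30 units, 390 of the 435 possible candidates are kept.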