What if all goes wrong? Diagnostics and repairs for error correction inside FCG processing

Katrien Beuls


How are anomalous processing results detected and handled in Fluid Construction Grammar? First of all, FCG does not have a built-in notion of grammaticality. Rather, the FCG engine either finds one or more solutions, that is, a meaning in parsing or an utterance in production, or returns NIL. Solutions are retrieved at the end of the construction application process, which at every search node expands the information held by the transient structure. This search process can be terminated in a number of ways. First, the maximum depth of the search tree can be set to a fixed number of nodes, after which the process is automatically terminated and the solution of the final node is returned. Second, goal tests can specify concrete stopping criteria for the search process. These tests are called after every expansion of the transient structure, and only when all of them are satisfied is a solution (a meaning or an utterance) extracted.

The goal tests in the default grammar settings are listed below, split up into production goal tests and parsing goal tests. Goal tests are executed in the order in which they are listed; for efficiency reasons, the strictest goal test should be tried first.

(:production-goal-tests :no-applicable-cxns :no-meaning-in-root :render-phon-cat)

(:parse-goal-tests :no-applicable-cxns :no-stems/suffixes-in-root)

The test no-applicable-cxns is shared by both sets and succeeds as soon as the constructions that can expand the transient structure (or merely match it) have been exhausted and no further construction can apply. The additional production test no-meaning-in-root verifies that no meaning predicates remain in the root node of the transient structure. If there were, the production process would not have been able to express the full meaning that was intended. Similarly, the parsing goal test no-stems/suffixes-in-root checks whether any form is left unexpressed at the end of the parsing process (i.e. when no further constructions can be applied). Finally, the production goal test render-phon-cat takes care of the stem and suffix changes that certain constructions might have introduced. As these constructions typically only modify the phon-cat feature, the form feature also needs to be updated before the final utterance is rendered.
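The interplay between the search loop and the goal tests can be sketched as follows. This is a simplified stand-in, not the actual FCG engine API: the node representation and the test implementations are invented here, and render-phon-cat is omitted for brevity.

```python
# Toy versions of two of the goal tests above; a search node is
# represented as a plain dictionary (an assumption of this sketch).

def no_applicable_cxns(node):
    # Succeeds once no further construction can expand this node.
    return not node["applicable-cxns"]

def no_meaning_in_root(node):
    # Succeeds when no meaning predicates remain in the root unit,
    # i.e. the full intended meaning has been expressed.
    return not node["root"]["meaning"]

PRODUCTION_GOAL_TESTS = [no_applicable_cxns, no_meaning_in_root]

def is_solution(node, goal_tests):
    # A node counts as a solution only when *all* goal tests succeed;
    # putting the strictest test first lets the check fail fast.
    return all(test(node) for test in goal_tests)

node = {"applicable-cxns": [], "root": {"meaning": []}}
print(is_solution(node, PRODUCTION_GOAL_TESTS))  # True
```

A node with leftover meaning in the root, or with constructions still applicable, would fail the conjunction and the search would continue.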

Yet, what happens if a goal test is not met or if other issues emerge during construction application, even before goal tests are run?

The FCG package can be hooked into a general-purpose learning engine that allows you to insert learning notifications at points where processing goes wrong (when expanding the transient structure, at the end of parsing/production, etc.). Such opportunities for learning can then be taken up by diagnostics and repairs, which run predefined scripts to monitor frequent learning problems and repair them in a particular way. This web demonstration shows how typical learner errors (taken from the Spanish Language Learning Online Corpus, SPLLOC) can be repaired on the fly during processing.
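A minimal sketch of this notification protocol is given below. The trigger names follow the text, but the registries, the state representation and the toy diagnostic/repair pair are invented for this illustration and do not reflect the real FCG implementation.

```python
# Diagnostics are registered per trigger; repairs per problem type.
DIAGNOSTICS = {}   # maps a trigger name to a list of diagnostics
REPAIRS = {}       # maps a problem type to a repair function

def notify(trigger, state):
    # The engine would call this at its learning hooks (after a failed
    # construction application, at the end of parsing/production, ...).
    for diagnose in DIAGNOSTICS.get(trigger, []):
        problem = diagnose(state)
        if problem is not None:
            # The matching repair may patch the state and, for
            # instance, ask the engine to restart the search.
            return REPAIRS[problem["type"]](state, problem)
    return None

# Toy diagnostic/repair pair for the no-solution case of section 2.
DIAGNOSTICS["NO-SOLUTION-FOUND"] = [
    lambda state: {"type": "unknown-stem"} if state["solution"] is None else None]
REPAIRS["unknown-stem"] = lambda state, problem: "restart"

print(notify("NO-SOLUTION-FOUND", {"solution": None}))  # restart
```

When no registered diagnostic reports a problem, the notification simply falls through and processing ends as it normally would.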

For a more in-depth technical analysis of the learning operators available in FCG grammars, please consult the following publication:

Beuls, K., van Trijp, R., & Wellens, P. (2012). Diagnostics and repairs in Fluid Construction Grammar. In Steels, L., & Hild, M. (Eds.), Language Grounding in Robots. Berlin: Springer.


1. Parsing a verb form containing a verb class error

A very frequent (beginner) error in the conjugation of past tense verbs in Spanish is mixing verb stems and endings that belong to different verb classes. Like Latin, Spanish has three verb classes: verbs ending in -ar ( hablar , 'to speak'), -er ( comer , 'to eat') and -ir ( vivir , 'to live'). The first verb class is the most frequent in use, and new verbs entering the language typically follow this class. The second and third verb classes, in order to remain strong against the productive first class, share many of their suffixes. These frequency distributions play a role when the learner has to reconstruct the verb class of a particular verb stem or suffix.

In the next example the learner incorrectly said cantía 'I/he/she sang' (impf.) instead of cantaba . The -ía suffix is used with stems of the second or third verb class, whereas cantar 'to sing' belongs to the first verb class. The parsing process of this ungrammatical form is shown below, together with the meaning that could be extracted after parsing.

Parsing *cantía
Parsed meaning
((cantar ?event-4024 ?context-2536))

What went wrong here? There is a second-merge-failed result when the -ía suffix construction is applied. The reason for this mismatch is that the verb class features of the stem and the suffix differ. Such a failed construction application can be turned into an opportunity for adapting the search so that a solution can still be found. This process is summarized below.

The trigger for an intervention is: FAILED-APPLICATION

The diagnostic that can handle this trigger is: DETECT-FEATURE-MISMATCH. This diagnostic returns a feature mismatch problem.

The repair that was used to solve this problem is: RESTORE-FEATURE-MISMATCH. This repair temporarily overwrites the verb class feature of the stem unit with the "corrupted" feature value, so that the parsing process can be restarted and lead to an acceptable solution.
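The effect of this repair on the transient structure can be sketched as follows, assuming units are simple feature dictionaries. The real FCG representation is considerably richer; the unit and feature names below are assumptions made for this illustration.

```python
# Toy transient structure for *cantía: a first-class stem combined
# with a second/third-class suffix.
transient_structure = {
    "cant-unit": {"base-stem": "cant", "verb-class": 1},  # cantar, class 1
    "ia-unit":   {"suffix": "ía", "verb-class": 2},       # -ía, class 2/3
}

def restore_feature_mismatch(ts, stem_unit, suffix_unit, feature):
    # Temporarily overwrite the stem's feature with the suffix's
    # ("corrupted") value, on a copy, so the restarted parse can
    # succeed while the grammar itself stays untouched.
    repaired = {name: dict(unit) for name, unit in ts.items()}
    repaired[stem_unit][feature] = repaired[suffix_unit][feature]
    return repaired

restarted = restore_feature_mismatch(
    transient_structure, "cant-unit", "ia-unit", "verb-class")
print(restarted["cant-unit"]["verb-class"])  # 2
```

Working on a copy mirrors the "temporarily" in the repair's description: the overwrite only serves the restarted parse and does not change the stored construction inventory.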

Restart transient structure
Solution after repair
Parsed meaning
((1sg-agent ?agent-754 ?event-2931) (cantar ?event-2931 ?context-1835) (profile-event ?event-2931 middle) (relation ?event-2931 ?reference-time-1105 simultaneous) (time-point ?event-2931 ?reference-time-1105 recalled-point))

2. Parsing a verb form with an ill-formed stem

Let us look at another example of a parsing process that is interrupted. Sometimes learners have not yet fully assimilated certain verb stems. When parsing such forms, the FCG engine does not have a lexical construction in the current construction inventory that covers the stem. Extragrammatical forms can typically be handled in two ways. First, when handling utterances that were not produced by learners, such forms can lead to an extension of the construction inventory with a new verb. Second, in the case of learner utterances, an option is to look in the inventory for verb stems that are closely related to the extragrammatical stem, either through the use of meaning predicates or through similar forms. An example of the latter is included in this demo.

Parsing preperé
Parsed meaning
nil

Preperé cannot be parsed (and results in an empty meaning solution) because the stem preper- is not known to the grammar. In this example, we do not have access to the meaning that the speaker intended. This time, the trigger for intervention is not a failed application result as before, but the absence of a solution in parsing.

The diagnostic that can handle the NO-SOLUTION-FOUND trigger is: DETECT-UNFAMILIAR-STEM

The repair that was used to solve this problem is: RETRY-WITH-CLOSEST-MATCH. The closest match found in the grammar is prepar . The initial transient structure is therefore modified by replacing the base-stem feature that contained the ungrammatical string. You can compare the restart transient structure with the one in the initial node of the first parsing process above. After this intervention, a complete parsed meaning is found. Of course, the learner may not have meant preparé but yet another verb form, such as preveré .
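One plausible way to implement the form-based variant of this lookup is a smallest-edit-distance search over the known stems. This is a sketch, not the actual FCG implementation, and the stem inventory below is invented for illustration.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def closest_stem(unknown, inventory):
    # Pick the known stem with the fewest edits from the unknown one.
    return min(inventory, key=lambda stem: edit_distance(unknown, stem))

print(closest_stem("preper", ["prepar", "prever", "cant", "com"]))  # prepar
```

Note that "prever" is also only one edit away from "preper"; a tie like this is exactly why the repaired parse may still not match the learner's intention, as the text observes for preveré.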

Restart transient structure
Solution after repair
Parsed meaning
((1sg-agent ?agent-869 ?event-3409) (preparar ?event-3409 ?context-2171) (profile-event ?event-3409 complete) (relation ?event-3409 ?reference-time-1294 simultaneous) (time-point ?event-3409 ?reference-time-1294 recalled-point))

3. Parsing a regular verb form that should be irregular

Spanish has many semi-regular verbs that only display irregularities in certain parts of their paradigm. For beginning learners, these irregularities are hard to master. They therefore often rely on the regular conjugation where they should have introduced a stem change or suffix change. The last part of this demonstration contains an example of the former, again taken from the SPLLOC (see above).

The verb fregar 'to scrub' is such a semi-regular verb. When its stem vowel receives primary stress, it undergoes a process of diphthongization into -ie- . Yet, when a learner produces the erroneous frega for 'he/she scrubs', the FCG engine simply parses this utterance to a complete meaning as if it were a regular form. I have included the construction application process and the parsed meaning below.

Parsing frega
Parsed meaning
((3sg-agent ?agent-2083 ?event-8491) (fregar ?event-8491 ?context-5203) (relation ?event-8491 ?reference-time-3311 simultaneous) (time-point ?event-8491 ?reference-time-3311 present-point))

How can we detect such an overregularization error? The most straightforward way is to have the FCG engine produce from the parsed meaning and verify whether the resulting utterance is the same. If there is a difference, a diagnostic can signal a problem. Here, the repair is already performed by the production process, and the resulting correction can be passed on as feedback to the learner.
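This re-production check can be sketched as follows. The produce lookup table below is a toy stand-in for actually running the expert grammar in production; the meaning representation is likewise an assumption of this sketch.

```python
def produce(meaning):
    # Stand-in for running the expert grammar in production over a
    # parsed meaning; here a single hand-coded entry suffices.
    expert_forms = {("fregar", "3sg", "present"): "friega"}
    return expert_forms[meaning]

def detect_deviating_form(utterance, parsed_meaning):
    # Re-produce from the parsed meaning and compare with the input.
    correction = produce(parsed_meaning)
    if correction != utterance:
        return correction   # the correction becomes learner feedback
    return None

print(detect_deviating_form("frega", ("fregar", "3sg", "present")))  # friega
```

For a grammatical input the two forms coincide and the diagnostic stays silent; only a deviation yields a correction to pass back.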

The diagnostic that can handle the PARSING-SOLUTION-FOUND trigger is: DETECT-DEVIATING-FORM

The repair that was used to solve this problem is: RESTORE-DEVIATING-FORM. The correction that the expert grammar has produced is returned by this repair: friega .