Aron Culotta, Trausti Kristjansson, Andrew McCallum, Paul Viola
To successfully embed statistical machine learning models in real world applications, two post-deployment capabilities must be provided: (1) the ability to solicit user corrections and (2) the ability to update the model from these corrections. We refer to the former capability as corrective feedback and the latter as persistent learning. While these capabilities have a natural implementation for simple classification tasks such as spam filtering, we argue that a more careful design is required for structured classification tasks. One example of a structured classification task is information extraction, in which raw text is analyzed to automatically populate a database. In this work, we augment a probabilistic information extraction system with corrective feedback and persistent learning components to assist the user in building, correcting, and updating the extraction model. We describe methods of guiding the user to incorrect predictions, suggesting the most informative fields to correct, and incorporating corrections into the inference algorithm. We also present an active learning framework that minimizes not only how many examples a user must label, but also how difficult each example is to label. We empirically validate each of the technical components in simulation and quantify the user effort saved. We conclude that more efficient corrective feedback mechanisms lead to more effective persistent learning.