Objective
The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain depends, in part, on its cue lexicon.

Negation and uncertainty detection has been the focus of several recent shared tasks, including the 2010 i2b2/VA Challenge, CoNLL-2010, and BioNLP 2009, in clinical, biomedical, and biological texts, respectively. Participants in these tasks commonly applied rule-based and machine-learning approaches. The shared task closest to clinical negation and uncertainty classification for disorder mentions is the 2010 i2b2/VA Challenge, in which participants developed systems for asserting whether a disorder was present, absent, or possible, among other assertion labels. The highest-performing systems used custom dictionaries and the output of rule-based systems as feature sources for machine-learning algorithms. The best assertion classifier, a multi-class support vector machine, achieved an F-score of 94%.

Researchers have also developed a number of negation and uncertainty detection systems independently of shared tasks. For instance, NegEx, NegFinder, and NegExpander achieve high performance in detecting negated disorders using cue lexicons and heuristics. For uncertainty, NLP tools such as StAC, CARAFE, pyConTextNLP, and others achieve moderate to high performance in asserting the uncertainty level of disorders using rule-based and machine-learning approaches. Traditionally, assertion classification consists of two processing steps: (1) detecting an assertion cue (e.g., "not" and "denies" for negation; "most likely" and "possibility of" for uncertainty) and (2) predicting its scope.

2.2. Negation cue detection and scope
Researchers have used both rule-based and machine-learning approaches to study negation cues and their scope [6, 20–22].
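A minimal rule-based instance of the two-step approach (cue detection followed by scope prediction) can be sketched in Python in the style of NegEx. The cue lists, termination terms, the 5-token forward window, and all function names are illustrative assumptions, not the lexicons or parameters of NegEx or any other system cited here.

```python
import re

# Illustrative toy lexicons (assumption): real systems such as NegEx use
# much larger, curated cue lists.
NEGATION_CUES = {"no", "not", "denies", "denied", "without"}
UNCERTAINTY_CUES = {"possible", "probable", "suspected"}
# Hypothetical termination terms that end a cue's scope early.
TERMINATION_TERMS = {"but", "however", "although"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_cues(tokens, lexicon):
    """Step 1: return the indices of tokens that are single-word cues."""
    return [i for i, tok in enumerate(tokens) if tok in lexicon]


def in_scope(tokens, cue_idx, target_idx, window=5):
    """Step 2: the target is in scope if it follows the cue within `window`
    tokens and no termination term occurs between them."""
    if not cue_idx < target_idx <= cue_idx + window:
        return False
    return not any(tok in TERMINATION_TERMS for tok in tokens[cue_idx + 1:target_idx])


def assert_status(sentence, disorder):
    """Label a disorder mention as 'negated', 'uncertain', or 'affirmed'."""
    tokens = tokenize(sentence)
    head = tokenize(disorder)[0]  # anchor on the mention's first token
    if head not in tokens:
        raise ValueError(f"{disorder!r} not found in sentence")
    target_idx = tokens.index(head)
    for lexicon, label in ((NEGATION_CUES, "negated"), (UNCERTAINTY_CUES, "uncertain")):
        if any(in_scope(tokens, c, target_idx) for c in find_cues(tokens, lexicon)):
            return label
    return "affirmed"


print(assert_status("Patient denies chest pain.", "chest pain"))  # negated
print(assert_status("No fever but reports cough.", "cough"))  # affirmed
print(assert_status("Findings suggest possible pneumonia.", "pneumonia"))  # uncertain
```

In the second example, "but" terminates the scope of "no", so "cough" is not negated. Checking negation before uncertainty is an arbitrary precedence choice in this sketch; a fixed token window like this one is also the kind of token distance-based method that learned scope models are compared against.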
Goldin and Chapman compare naive Bayes and decision trees for learning scope patterns for the frequent cue "not" and achieve 81% and 88% precision, respectively, compared with 60% for a token distance-based method. Morante et al. used a supervised inductive algorithm based on k-nearest neighbors to predict whether a token is a negation term and to learn its scope in biomedical texts, using features of the token (morphological and syntactic information) and of its context (morphological and syntactic information of the three preceding and three succeeding tokens). They observed that correctly identifying negation terms beyond "no" and "not" can improve the detection of negation signals by 7 F-score points. Agarwal and Yu describe a systematic analysis of negation terms and scope for predicting the negation status of disorder mentions in the NegEx test data using their systems NegCue and NegScope. The major source of false negatives for NegCue was entities preceded by "denied" or "denies"; these negation cues did not occur in the BioScope training corpus. After these cues were incorporated, the system's F-score increased by about 7 points. The authors found that most of the remaining errors were scope errors rather than errors due to missing cues.

2.3. Uncertainty cue detection and scope
Understanding the coverage and scope of uncertainty cues is perhaps more complex because of the varying degrees of uncertainty and the lexicosyntactic patterns used to express them. Several studies have described the effect of the coverage and scope of uncertainty cues on uncertainty classification tasks [9, 11, 18, 19]. Uzuner et al. developed StAC, a support vector machine trained with lexical and syntactic features, to assert the status of a disorder. They compared StAC's performance against ENegEx, an extended version of NegEx. Some of ENegEx's most frequent mistakes resulted from an incomplete lexicon, for example, missing uncertainty cues such as "most likely".
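The effect of lexicon coverage reported by Agarwal and Yu and by Uzuner et al. can be illustrated with a small cue matcher that handles multi-word cues. The lexicons below are hypothetical toy lists, and left-to-right longest-match scanning is an assumption on our part, not the matching strategy of NegCue or ENegEx.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_cues(text, lexicon):
    """Return the cues from `lexicon` (single- or multi-word) found in `text`,
    scanning left to right and preferring the longest cue at each position."""
    tokens = tokenize(text)
    cues = sorted({tuple(tokenize(c)) for c in lexicon if tokenize(c)}, key=len, reverse=True)
    found, i = [], 0
    while i < len(tokens):
        for cue in cues:
            if tuple(tokens[i:i + len(cue)]) == cue:
                found.append(" ".join(cue))
                i += len(cue)
                break
        else:  # no cue starts at this token
            i += 1
    return found


# Hypothetical lexicons: the "extended" versions add the cues that the
# studies above report as missing ("denies"/"denied", "most likely").
BASE_NEGATION = {"no", "not", "without"}
EXTENDED_NEGATION = BASE_NEGATION | {"denies", "denied"}
BASE_UNCERTAINTY = {"possible", "probable", "suspicious for"}
EXTENDED_UNCERTAINTY = BASE_UNCERTAINTY | {"most likely"}

print(match_cues("The patient denies nausea.", BASE_NEGATION))  # []
print(match_cues("The patient denies nausea.", EXTENDED_NEGATION))  # ['denies']
print(match_cues("Most likely viral pneumonia.", BASE_UNCERTAINTY))  # []
print(match_cues("Most likely viral pneumonia.", EXTENDED_UNCERTAINTY))  # ['most likely']
```

Trying longer cues first means that a lexicon containing both "likely" and "most likely" reports the full phrase rather than its last word.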
Using a 4-word window and section headings to address cue scope, StAC outperformed ENegEx. The authors also observed that a window of two syntactic links reduces the number of false positives created when a negation cue such as "no" modifies the head noun phrase rather than the adjectival or prepositional noun phrase, as in "no intervention due to cardiovascular disease". Clark et al. integrated CARAFE, a conditional random fields model trained to detect negation and uncertainty cues and their scopes, with a rule-based module to assert the status of a disorder. Word features such as unigrams within the disorder mention and words within a 3-token window of the disorder contributed most to assertion performance, yielding an F-score of 91%. Many assertions were classified incorrectly because of missing uncertainty cues. For example, the uncertainty cue "chance for" was not among the uncertainty cues found in the BioScope corpus. In other cases, the scope of a cue was not terminated, so a problem was asserted incorrectly because both "no" and "certain evidence" were identified as its modifiers.
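Context-window features of the kind credited to CARAFE (unigrams within the disorder and words within a 3-token window) can be sketched as a bag-of-words extractor. This is not CARAFE's implementation: the `in:`/`left:`/`right:` feature prefixes, the tokenizer, and the function name are assumptions, and the resulting counts would still need to be vectorized before training a classifier.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def window_features(sentence, disorder, k=3):
    """Bag-of-words features: unigrams inside the disorder mention plus
    unigrams within k tokens to its left and right."""
    tokens, target = tokenize(sentence), tokenize(disorder)
    n = len(target)
    # Locate the first occurrence of the disorder's token span.
    start = next((i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target), None)
    if start is None:
        raise ValueError(f"{disorder!r} not found in sentence")
    end = start + n
    feats = Counter("in:" + t for t in tokens[start:end])
    feats.update("left:" + t for t in tokens[max(0, start - k):start])
    feats.update("right:" + t for t in tokens[end:end + k])
    return feats


sentence = "There is no evidence of acute pneumonia today."
# ['in:pneumonia', 'left:acute', 'left:evidence', 'left:of', 'right:today']
print(sorted(window_features(sentence, "pneumonia", k=3)))
print("left:no" in window_features(sentence, "pneumonia", k=4))  # True
```

Here the cue "no" lies four tokens before "pneumonia", so it falls outside a 3-token window but inside a 4-token one, which shows how the window size determines whether a distant cue can influence the assertion.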