Translation from narrative text to standard codes variables with Stata

Federico Belotti, Domenico Depalo

September 2010

Abstract

In this article, we describe screening, a new Stata command for data management that can be used to examine the content of complex narrative-text variables to identify one or more user-defined keywords. The command is useful when dealing with string data contaminated with abbreviations, typos, or mistakes. A rich set of options allows a direct translation from the original narrative string to a user-defined standard coding scheme. Moreover, screening is flexible enough to facilitate the merging of information from different sources and to extract or reorganize the content of string variables.

Type

Report

Publication

Stata Journal

Keywords

screening; keyword matching; narrative-text variables; standard coding schemes