Methods & Infrastructure

Making Words Count: Computational Linguistics in Management Research

Aligning Theory, Language, and Computational Measurement

Joe Simpson, Richard A. Hunt, Judy Rady, Elham Asgari, Mohammed Rady, Daniel Beal

Journal of Management

Forthcoming

Key Finding

The credibility of computational-linguistics research does not rest on the sophistication of the tool. It rests on alignment among three things: the theoretical construct, the linguistic signal through which that construct surfaces in text, and the method used to read it. When they fall out of register — when a theory implies meaning that depends on context but the method only counts words — even a powerful model returns results that look precise and may mean very little.

Overview

Forthcoming at the Journal of Management, this review takes stock of how management scholars analyze the language that organizations produce. Drawing on 353 studies published across fifteen leading journals between 2013 and 2025, work that collectively analyzed more than 2.3 billion text documents, the review pursues one question: when does computational linguistics (CL) actually measure what a theory claims to measure? Its answer is that credibility does not come from the sophistication of the tool but from alignment — the theoretical construct, the linguistic signal through which that construct surfaces in text, and the computational method used to read it must match. Where they do not, even a powerful model can return measures that look precise and carry little theoretical weight.

The paper organizes the field's methods by the kind of linguistic signal each can read rather than by how recently it was invented. Lexical methods operate on word presence and frequency; distributional methods capture statistical regularities in how words co-occur; compositional methods recompute a word's meaning from the surrounding context, so the same word form can be represented differently from one passage to the next. The distinction matters because the appropriate method depends on the mechanism a theory implies, not on the novelty of the algorithm — a point the review then turns into four working principles.
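To make the lexical/distributional distinction concrete, here is a minimal Python sketch on a hypothetical three-sentence corpus. `lexical_score` counts dictionary hits, which is a word-presence signal. `cooccurrence_counts` tallies how often word pairs share a document, which is a crude distributional signal. Real distributional methods, such as topic models or word embeddings, build on richer versions of these counts. The dictionary, corpus, and function names are illustrative assumptions and are not drawn from the review. Compositional methods are left out because they require a contextual encoder such as a transformer, which a short standard-library sketch cannot show.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, for illustration only.
DOCS = [
    "we expect strong growth next year",
    "growth remains uncertain next year",
    "we expect uncertain demand",
]

# Hypothetical one-word dictionary for an "uncertainty" construct.
UNCERTAINTY_TERMS = {"uncertain"}


def lexical_score(doc: str, terms: set[str]) -> int:
    """Lexical signal: number of whitespace tokens that appear in the dictionary."""
    return sum(1 for tok in doc.split() if tok in terms)


def cooccurrence_counts(docs: list[str]) -> Counter:
    """Distributional signal: how many documents each unordered word pair shares."""
    pairs: Counter = Counter()
    for doc in docs:
        vocab = sorted(set(doc.split()))  # sorted so each pair has one canonical order
        for a, b in combinations(vocab, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    print([lexical_score(d, UNCERTAINTY_TERMS) for d in DOCS])  # [0, 1, 1]
    print(cooccurrence_counts(DOCS)[("next", "year")])  # 2
```

The lexical score treats every document that mentions "uncertain" the same way. The pair counts instead record which words travel together, and that is the raw material a distributional method works from.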

Contribution to the Research Program

This paper is the methodological backbone of the lab's empirical work on language under uncertainty. Much of that program reads organizational and entrepreneurial reality through text: the rhetoric ventures use to mobilize resources, the way founders and firms describe futures they cannot yet calculate, the signals investors and stakeholders parse. The review supplies the standards that make such measurement trustworthy — and it does so at a moment when those standards matter more than usual. As large language models become the instruments through which scholars read text, the configuration choices behind them — prompt wording, model version, sampling temperature — quietly shape what gets measured. The paper treats those choices as design parameters that demand sensitivity analysis and disclosure, not as background implementation details.
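One way to sketch a sensitivity analysis of this kind is as a grid over configuration choices. In the Python sketch below, the prompt labels, model identifiers, and temperatures are hypothetical placeholders. `fake_score` is a toy stand-in and does not call any real model. In practice `score` would wrap whatever LLM client a study uses. The function returns every run, which serves disclosure, and the range of the aggregate measure across configurations, which serves robustness.

```python
import itertools
import json
import statistics
from typing import Callable

# Hypothetical configuration grid; a real study would list its own
# prompt variants, model identifiers, and sampling temperatures.
PROMPTS = ["prompt_a", "prompt_b"]
MODELS = ["model_v1", "model_v2"]
TEMPERATURES = [0.0, 0.7]


def sensitivity_report(
    docs: list[str],
    score: Callable[[str, str, str, float], float],
) -> dict:
    """Score every document under every configuration and summarize the spread.

    `score(doc, prompt, model, temperature)` is supplied by the caller.
    """
    runs = []
    for prompt, model, temp in itertools.product(PROMPTS, MODELS, TEMPERATURES):
        mean_score = statistics.mean(score(d, prompt, model, temp) for d in docs)
        runs.append({"prompt": prompt, "model": model,
                     "temperature": temp, "mean_score": mean_score})
    means = [r["mean_score"] for r in runs]
    return {
        "runs": runs,                      # every configuration, for disclosure
        "min": min(means),
        "max": max(means),
        "range": max(means) - min(means),  # how far the measure moves across configurations
    }


def fake_score(doc: str, prompt: str, model: str, temperature: float) -> float:
    """Toy stand-in for an LLM rating (NOT a real model call; temperature unused)."""
    base = len(doc) / 100
    return base + (0.1 if prompt == "prompt_b" else 0.0) + (0.05 if model == "model_v2" else 0.0)


if __name__ == "__main__":
    report = sensitivity_report(["short text", "a somewhat longer text"], fake_score)
    print(json.dumps(report, indent=2))
```

A wide range means the measure depends heavily on configuration choices. Reporting the full `runs` list lets readers judge that for themselves.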

A second theme connects directly to the program's interest in how actors behave once they know they are being read. The review argues that what an organization leaves unsaid can be as consequential as what it says, and that AI sharpens this dynamic: as firms anticipate how their words will be parsed by algorithms, regulators, and automated sentiment tools, they adjust not only what they disclose but what they strategically withhold. Silence becomes a theoretically meaningful object — and one that most CL methods, built to read presence rather than absence, are not yet designed to detect. That blind spot is a measurement problem today and a moving target tomorrow, as the systems doing the reading reshape the language being written.
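Most CL methods read presence. One illustrative way to make silence measurable is to define it against an expectation, for example the topics that most comparable documents address. The Python sketch below illustrates that idea and is not the review's procedure. `TOPIC_TERMS` is a hypothetical lexicon, and the peer-majority threshold is arbitrary. Tokenization is plain whitespace splitting with no punctuation handling. The topic matching is itself lexical, so it inherits every limitation of word counting.

```python
# Hypothetical topic lexicon: topic -> indicative terms.
TOPIC_TERMS = {
    "litigation": {"lawsuit", "litigation"},
    "supply_chain": {"supplier", "shortage"},
    "cyber": {"breach", "cyberattack"},
}


def topics_mentioned(doc: str) -> set[str]:
    """Topics with at least one indicative term among the document's whitespace tokens."""
    tokens = set(doc.lower().split())
    return {topic for topic, terms in TOPIC_TERMS.items() if tokens & terms}


def conspicuous_silences(focal: str, peers: list[str], threshold: float = 0.5) -> set[str]:
    """Topics mentioned by more than `threshold` of peer documents but absent from `focal`."""
    if not peers:
        return set()
    peer_topics = [topics_mentioned(p) for p in peers]
    rates = {t: sum(t in pt for pt in peer_topics) / len(peers) for t in TOPIC_TERMS}
    expected = {t for t, rate in rates.items() if rate > threshold}
    return expected - topics_mentioned(focal)


if __name__ == "__main__":
    peers = [
        "a supplier shortage hit margins",
        "we disclose a pending lawsuit and a supplier shortage",
        "a data breach and supplier delays",
    ]
    print(conspicuous_silences("we disclose a pending lawsuit", peers))  # {'supply_chain'}
```

In this sketch, a missing topic counts as a silence only because the peer baseline makes it expected. Without some such baseline, the absence looks exactly like missing data.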

Key Insights

Three method families divide by the linguistic signal each can read — lexical (word presence and frequency), distributional (co-occurrence patterns), and compositional (context-recomputed meaning) — and the right choice follows from the mechanism a theory implies, not from which tool is newest.
Minimal sufficient complexity cuts both ways. Over-engineering — a transformer where a dictionary would do — obscures the path from construct to measure; under-engineering — word counts for meaning that lives in negation, contrast, or sequencing — misses the construct entirely.
Two teams studying the same construct on the same corpus can reach different conclusions by making different but equally defensible choices of prompt, model version, and temperature. Robustness across those choices has to be demonstrated and reported, not assumed.
Disclosure does not create validity; it determines whether validity can be assessed at all. The review documents how rarely current practice clears that bar — 47.6% of studies offered limited or no construct validation, 74% omitted adequate preprocessing detail, and 54% shared neither coding schema nor data.
The field's most common corpora are convenience samples with distortions built in: social media (the single largest source, at 30%) and corporate filings carry platform skews, legal filtering, and investor-facing incentives that reshape the very language under study.
Absence is not always missing data. When a construct is partly constituted by what goes unsaid, ignoring silence conflates observed language with the thing itself — a blind spot that grows as actors learn to write for the machines reading them.
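A few lines of Python show the under-engineering failure from the second insight. The positive-tone dictionary, the negator list, and the three-token window are hypothetical. A plain word count scores "we are not confident the unit will be profitable" as doubly positive. A window-based negation rule catches "not confident" but still counts "profitable", because the negation's scope reaches past the window. In a case like this, the construct's meaning depends on how words compose rather than on which words are present.

```python
# Hypothetical positive-tone dictionary and negator list.
POSITIVE = {"profitable", "strong", "confident"}
NEGATORS = {"not", "no", "never"}


def naive_positive_count(text: str) -> int:
    """Lexical count: every dictionary hit counts, regardless of context."""
    return sum(tok in POSITIVE for tok in text.lower().split())


def negation_aware_count(text: str, window: int = 3) -> int:
    """Count positive terms, skipping any preceded by a negator within `window` tokens."""
    tokens = text.lower().split()
    count = 0
    for i, tok in enumerate(tokens):
        preceding = tokens[max(0, i - window):i]
        if tok in POSITIVE and not any(t in NEGATORS for t in preceding):
            count += 1
    return count


if __name__ == "__main__":
    s = "we are not confident the unit will be profitable"
    print(naive_positive_count(s))  # 2
    print(negation_aware_count(s))  # 1 ("profitable" is still counted)
```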