Man and Machine in Financial Reports: The Importance in the Era of Natural Language Processing Technology

In today’s digital age, the rapid development of technology has changed every aspect of the business world. Among these changes, the widespread application of Natural Language Processing (NLP) technology in the finance and accounting sectors has become a hot topic. This article examines how NLP technology affects the financial reports of listed companies, and calls on corporate managers to pay attention to this trend so that their company’s financial information interacts well with “machines.”

The Emergence of NLP Technology

In finance and accounting, NLP technology is applied primarily to analyzing large volumes of financial reports, news articles, and social media content. The goal is to read, parse, and summarize this text and to extract key information about companies and markets. Investors’ growing spending on NLP technology reflects this trend. For instance, China’s leading fund company, ChinaAMC, collaborated with Beijing Lanzhou Technology, a leader in the NLP/AI field, to establish a joint laboratory for financial natural language processing, aiming to build sentiment NLP capabilities based on ChinaAMC’s investment research logic[1]. In 2023, Bloomberg released BloombergGPT, the financial sector’s first generative large language model[2]. Trained on the large volume of carefully edited text that Bloomberg has accumulated, such as financial news and reports, BloombergGPT was designed to capture domain knowledge of financial markets and to provide investors with real-time market dynamics and insights. Automated analysis of this kind can give investors faster and more accurate information, helping them make better-informed investment decisions.

NLP in Textual Analysis

To write a financial report with machine readers in mind, it’s essential to understand how machines analyze such reports. Typically, text analysis can be divided into three steps: text representation, model prediction, and subsequent analysis, as shown in Figure 1.

Figure 1. Flowchart for Textual Analysis

As a simple example, suppose investors use listed companies’ financial reports to predict stock returns for the next quarter. The first step is to convert the text in the financial reports into numbers. The second step is to build a machine learning prediction model (such as a neural network) to forecast the stock returns. The input to the model is the text representation obtained in the first step, and the output is the predicted next-quarter stock return of the respective listed company. Finally, investors can use the predictions for subsequent analyses, such as comparing the predicted next-quarter returns of different listed companies (cross-section), or analyzing how a particular company’s predicted returns have fluctuated over time (time-series).
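
The three steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the tickers, report snippets, vocabulary, and model weights below are all hypothetical, and the hand-picked linear scorer merely stands in for a trained model such as a neural network.

```python
from collections import Counter

# Hypothetical report snippets keyed by ticker; all text is illustrative.
reports = {
    "AAA": "revenue growth strong profit growth",
    "BBB": "revenue decline loss impairment",
    "CCC": "stable revenue modest profit",
}

def represent(text, vocabulary):
    """Step 1: turn text into a numeric vector of word counts."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def predict(features, weights, bias=0.0):
    """Step 2: stand-in linear model; a real model would be trained on past returns."""
    return bias + sum(w * x for w, x in zip(weights, features))

vocabulary = ["growth", "profit", "decline", "loss", "impairment"]
weights = [0.02, 0.01, -0.02, -0.03, -0.02]  # assumed values, not estimated from data

predicted = {
    ticker: predict(represent(text, vocabulary), weights)
    for ticker, text in reports.items()
}

# Step 3: cross-sectional comparison, highest predicted return first.
ranking = sorted(predicted, key=predicted.get, reverse=True)
print(ranking)
```

A time-series analysis would follow the same pattern, applying `represent` and `predict` to one company’s reports from successive quarters instead of to different companies in the same quarter.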

Table 1. Chinese Sentiment Dictionary: Formal Language (columns: Negative Sentiment, Positive Sentiment)

Table 2. Chinese Sentiment Dictionary: Informal Language (columns: Negative Sentiment, Positive Sentiment)

Currently, the mainstream method for text representation is the bag-of-words model. The approach is relatively simple: it builds a “bag” vector by counting how often each word occurs in the text, or by recording a binary value for each word’s presence or absence. Investors then often apply predefined, scientifically validated sentiment dictionaries to score the sentiment conveyed by the financial report text, and use that score to predict the investment return for the next quarter. Tables 1 and 2 list typical Chinese sentiment words in formal and informal language, respectively[3]. Figure 2 shows a sentiment analysis of Warren Buffett’s annual letters to shareholders, covering negative, positive, uncertainty, and litigious sentiments.
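
As a rough sketch of the bag-of-words and dictionary approach described above, the following Python snippet builds count-based and binary bags and computes a net tone score. The English word lists are tiny hypothetical stand-ins for a real sentiment dictionary such as the one in [3], and the net tone formula (positive minus negative hits, divided by their sum) is one common convention assumed here, not necessarily the measure used in any particular study.

```python
from collections import Counter

# Tiny illustrative word lists; a real sentiment dictionary is far larger.
POSITIVE = {"growth", "improve", "strong"}
NEGATIVE = {"downturn", "loss", "decline"}

def bag_of_words(text, binary=False):
    """Map each word to its count, or to 1 if only presence is recorded."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t)
    if binary:
        return {word: 1 for word in counts}
    return dict(counts)

def net_tone(text):
    """(positive - negative) / (positive + negative); 0.0 if no sentiment words occur."""
    bag = bag_of_words(text)
    pos = sum(n for word, n in bag.items() if word in POSITIVE)
    neg = sum(n for word, n in bag.items() if word in NEGATIVE)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

sentence = "Strong growth offset the loss; growth should improve."
print(bag_of_words(sentence))
print(bag_of_words(sentence, binary=True))
print(net_tone(sentence))
```

Note that word order is discarded entirely: the bag records which words occur and how often, nothing more.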

Figure 2. Sentiment Fluctuations in Warren Buffett’s Letters to Shareholders

Implications of New Technologies for Financial Reporting

From the perspective of financial report writing, authors can primarily influence the first step of text analysis: how to write a financial report so that the machine can represent the text well, and so that this representation helps generate more favorable predictions for the company in the second step. In this context, how should authors write financial reports?

The recommendation is for listed companies to organize dedicated teams to compile and maintain dictionaries of commonly used positive and negative sentiment words. When writing financial reports, they should use positive sentiment words wherever possible, provided semantic accuracy is preserved. For example, in the phrase “the consumer market emerges from the downturn of the pandemic period,” “downturn” is a typical negative word. Although the sentence reads as positive in context, dictionary-based bag-of-words methods ignore context, so the phrase might receive a negative sentiment prediction. A possible rewrite is: “the consumer market begins to warm up after the pandemic.” Such wording helps the machine generate a positive prediction while keeping the meaning accurate.
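
To see why the two phrasings are treated differently, consider a minimal sketch of context-free dictionary matching. The two word lists are hypothetical; in particular, treating “warm” as a positive dictionary entry is an assumption made for illustration.

```python
# Hypothetical mini-dictionaries, for illustration only.
NEGATIVE_WORDS = {"downturn", "decline", "loss"}
POSITIVE_WORDS = {"warm", "recover", "growth"}

def flag_sentiment_words(sentence):
    """Return (positive_hits, negative_hits) found by context-free word matching."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in sentence.split()]
    positive = [t for t in tokens if t in POSITIVE_WORDS]
    negative = [t for t in tokens if t in NEGATIVE_WORDS]
    return positive, negative

original = "the consumer market emerges from the downturn of the pandemic period"
rewrite = "the consumer market begins to warm up after the pandemic"

original_hits = flag_sentiment_words(original)
rewrite_hits = flag_sentiment_words(rewrite)
print(original_hits)
print(rewrite_hits)
```

The matcher flags “downturn” in the first sentence even though “emerges from” reverses its meaning, which is the behavior a report-writing team could screen for before publication.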

Thoughts on Large Language Models

Large language models are generative models that, by learning from vast amounts of text data, can generate grammatically correct and semantically coherent natural language text. They developed rapidly in 2023, bringing new considerations for the writing of financial reports. In this context, the author believes that a quick way to check the machine’s impression of a financial report is to feed the written text into a large language model and ask it to summarize the main points. Note that, given the high sensitivity of financial reports, preference should be given to open-source models that can be deployed locally, such as Llama 2 developed by Meta[4].
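
One way such a check might be organized is sketched below. The prompt wording is an assumption, and `generate` is a hypothetical caller-supplied function that wraps whatever locally deployed model a company chooses; no specific model API is implied. A stand-in function is used so the sketch runs on its own.

```python
def build_summary_prompt(report_text, max_points=3):
    """Assemble an illustrative summarization prompt; the wording is an assumption."""
    return (
        f"Summarize the following financial report excerpt in at most "
        f"{max_points} bullet points, and state whether the overall tone "
        f"is positive, neutral, or negative.\n\n"
        f"Report:\n{report_text.strip()}"
    )

def machine_impression(report_text, generate):
    """Send the prompt to `generate`, a caller-supplied wrapper around a local model."""
    return generate(build_summary_prompt(report_text))

# Stand-in for a locally deployed model, used only so the sketch runs end to end.
def echo_model(prompt):
    return f"[model output for a prompt of {len(prompt)} characters]"

impression = machine_impression("Revenue rose 8% while costs were flat.", echo_model)
print(impression)
```

Keeping the model behind a plain function argument means the report text never has to leave the company’s own infrastructure, in line with the sensitivity concern above.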

Challenges and Outlook

In this digital and automated era, machines are increasingly the primary readers of financial reports, and corporate managers need to pay attention to this shift in audience. Embracing the trend towards NLP technology will help companies better cater to the needs of machines, thereby serving the requirements of investors, analysts, and regulatory bodies more effectively and maintaining a competitive edge in the digital age. This shift does not reduce the importance of human involvement. Instead, it underscores the need for a symbiotic relationship between machines and humans, working together to produce financial reports that are more accurate, comprehensive, and impactful. Listed companies must proactively adapt to this transformation, engaging in continuous learning and innovation to secure their leadership position in a competitive market.


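[1] ChinaAMC and Lanzhou Technology Establish a Joint Financial NLP Laboratory to Promote FinTech Innovation (in Chinese).

[2] Bloomberg Launches BloombergGPT: A 50-Billion-Parameter Large Language Model Built from Scratch for the Financial Industry (in Chinese).

[3] Yao Jiaquan, Feng Xu, Wang Zanjun, Ji Rongrong, and Zhang Wei. (2021). Tone, Sentiment and Market Impact: Based on a Financial Sentiment Dictionary (in Chinese). Journal of Management Sciences in China, 24(5).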

[4] Florian Reifschneider. (2023). Leveraging Open-Source LLMs for Data Privacy and Compliance in Corporate Use Cases.

The work described in this article was supported by InnoHK initiative, The Government of the HKSAR, and Laboratory for AI-Powered Financial Technologies (AIFT).
(AIFT strives for, but cannot guarantee, the accuracy and reliability of the content, and will not be responsible for any loss or damage caused by any inaccuracy or omission.)

Copyright © 2024 Laboratory for AI-Powered Financial Technologies Ltd. All Rights Reserved.