Paper Review

[Paper Review] Analysis of CBDC narrative by central banks using large language models

by hyeonjins 2024. 5. 16.

https://www.sciencedirect.com/science/article/pii/S1544612323010152

 

 

Why I chose this paper

 

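KakaoBank completes the Bank of Korea's CBDC pilot test

[Seoul=Newspim] Reporter Hong Bo-young = KakaoBank announced on the 15th that it has successfully carried out the Bank of Korea's digital currency (CBDC, Central Bank Digital Currency) pilot research project. Since August of last year, for a total of 10 months, KakaoBank ...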

newspim.com

 

Abstract

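  • To understand central banks' stance on CBDC, the paper applies natural language processing techniques to central bank speeches and analyzes their sentiment
  • ChatGPT shows the best performance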

1. Introduction

The emergence of central bank digital currency (CBDC) and an exploration of its potential value

  • Central Bank Digital Currency (CBDC), a new type of money that exists only in digital form.
  • The implementation of a CBDC could enable central banks to engage in large-scale intermediation for retail deposits, wholesale deposits, or both.
    • Pros: it can serve various goals such as payment system integrity, promoting financial inclusion, and fostering innovation
  • But its introduction could have unwanted effects on the financial system, such as a flight from commercial deposits, destabilizing financial intermediation, anonymity issues or privacy concerns
    • Cons: flight from commercial deposits, unstable financial intermediation, anonymity issues, privacy concerns, etc.

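Because the level of interest differs across jurisdictions, there is a high degree of uncertainty about the future form of CBDC. Analyzing central banks' sentiment toward CBDC through their speeches can provide insight into possible policy directions and improve transparency and market expectations.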

💡 We quantify central bank sentiment towards CBDCs using Natural Language Processing (NLP) techniques, specifically two large language models (LLMs), such as ChatGPT (OpenAI, 2023) and BERT (Devlin et al., 2019), and traditional dictionary methods.
  • This dataset contains 332 speeches on the topic, and the authors include expert opinions on the sentiment displayed by each speech towards CBDC.
  • The sentiment towards CBDCs expressed by central banks is increasingly positive, and the sentiment obtained with ChatGPT is the most reliable, since it is the most similar to the sentiment labeled by experts.
  • A regression analysis examines the possible determinants of sentiment towards CBDCs.
  • Speeches, particularly those from the Eurozone, exhibited greater sensitivity to Facebook's Libra announcement, a prominent private digital currency initiative.

Related Works

  • Hansen and Kazinnik (2023) use GPT models to decipher communication from the Federal Reserve.
  • Burlon et al. (2022) study the impact of CBDC communication on the real economy and financial markets.
  • Wang et al. (2022) find that CBDC attention and uncertainty indices built from financial news have a positive relationship with cryptocurrency, foreign exchange, and bond markets.
  • Scharnowski (2022) uses the CBDC speeches from Auer et al. (2020) to study the market reaction to central banks' speeches on CBDC.
  • Tian et al. (2023) also rely on the texts compiled by Auer et al. (2020) and find that different cybersecurity risks can affect the posture of central banks towards CBDCs.

Contributions

  • First, while ChatGPT has been previously used for financial sentiment (Hansen and Kazinnik, 2023; Wang et al., 2023; Zhang et al., 2023), to the best of our knowledge we are the first to apply ChatGPT and other LLMs specifically to the topic of CBDC.
  • Second, we can shed light on the determinants of CBDC sentiment, because we now have a continuous and reliable sentiment measure.

 

2. Dataset

  • A collection of central bank speeches on CBDCs compiled by Auer et al. (2020).
  • From a database of more than 16,000 central bank speeches and documents, the authors select 332 texts that explicitly mention CBDC, during the period 2016 to 2022.
  • Auer et al. (2020) also analyze the sentiment of those speeches using human expert knowledge from the Bank for International Settlements, which allows us to compare human expert opinions with those of NLP techniques.

 

3. Methodology

  1. A dictionary-based method, which can be considered as a benchmark.
    1. calculate the sentiment of a text by computing its polarity: $Polarity = \frac{Positives-Negatives}{Positives+Negatives}$
  2. FinBERT
    1. a financial domain-specific language model based on BERT, pre-trained using a large scale of financial communication corpora.
    2. The sentiment output is a probability distribution over negative and positive classes, which can be transformed to a -1 to 1 range.
💡 FinBERT: Financial Sentiment Analysis with Pre-trained Language Models (2019)
Dogu Araci
https://arxiv.org/abs/1908.10063

- Financial sentiment analysis is hard because of the specialized language and the lack of labeled data, and general-purpose models are not effective because of the specialized language used in the domain.
- The paper hypothesizes that pre-trained models can help with this problem, since they need less labeled data and can be trained on domain-specific corpora.
- It proposes FinBERT, a BERT-based language model, for handling NLP tasks in the financial domain.
- Datasets: TRC2-financial, Financial PhraseBank, FiQA Sentiment
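
The two scoring rules described for the dictionary method and for FinBERT can be sketched in a few lines. This is only an illustration: the word lists are hypothetical (the review does not give the paper's dictionary), returning 0.0 when no dictionary word is found is my own choice, and mapping the class probabilities to [-1, 1] as P(positive) - P(negative) is one plausible transform, not necessarily the one the authors used.

```python
import re

# Hypothetical toy word lists; the paper's actual dictionary is not given here.
POSITIVE_WORDS = {"benefit", "innovation", "inclusion", "efficient"}
NEGATIVE_WORDS = {"risk", "concern", "instability", "threat"}


def polarity(text: str) -> float:
    """Polarity = (Positives - Negatives) / (Positives + Negatives)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    total = positives + negatives
    # Assumption: a text with no dictionary hits is treated as neutral.
    return (positives - negatives) / total if total else 0.0


def probabilities_to_score(p_negative: float, p_positive: float) -> float:
    """Assumed mapping of class probabilities to the -1..1 range."""
    return p_positive - p_negative
```

With these toy lists, a sentence containing two positive words and one negative word gets a polarity of 1/3.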

 

3. ChatGPT

  1. Trained on a much larger corpus (45 terabytes, compared with 3 terabytes for BERT)
  2. Also used frequently in finance research (Dowling and Lucey, 2023; Hansen and Kazinnik, 2023)
  3. Prompt engineering matters
    1. Our benchmark prompt is: Compute the sentiment score towards central bank digital currencies, measured between -1 and 1, of a given text. The response should be just a float number, no text. The text is as follows: [...].
    2. We have performed robustness analysis with different prompts.
    3. The best version available in ChatGPT API was GPT-3.5, with a limitation of 4,000 tokens per prompt.
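
As a rough sketch of how the benchmark prompt could be assembled and its reply turned into a number, the snippet below builds the prompt string and parses a float from the response. It deliberately makes no API call; `build_prompt` and `parse_score` are hypothetical helper names, the prompt is written with the -1 to 1 range that the review uses for scores, and clamping out-of-range replies to [-1, 1] is my assumption, not something the paper states.

```python
import re
from typing import Optional

PROMPT_TEMPLATE = (
    "Compute the sentiment score towards central bank digital currencies, "
    "measured between -1 and 1, of a given text. "
    "The response should be just a float number, no text. "
    "The text is as follows: {text}"
)

# Matches an optionally signed integer or decimal number.
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def build_prompt(paragraph: str) -> str:
    """Fill the benchmark prompt with one paragraph of a speech."""
    return PROMPT_TEMPLATE.format(text=paragraph)


def parse_score(reply: str) -> Optional[float]:
    """Extract the first number from the model reply, or None if absent."""
    match = _NUMBER.search(reply)
    if match is None:
        return None
    value = float(match.group())
    # Assumption: clamp anything outside the requested range.
    return max(-1.0, min(1.0, value))
```

Parsing defensively is useful because a chat model may ignore the "no text" instruction and wrap the number in a sentence.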

 

3.1. Workflow

  1. While dictionary-based methods can be applied to texts of any size, BERT and ChatGPT have input limits of 500 and 4,000 tokens, respectively.
  2. Split the documents into paragraphs.
  3. Select only the relevant paragraphs, where a number of keywords are mentioned.
  4. Calculate the sentiment with the three NLP methods.
  5. The overall sentiment of the document is then computed by averaging the sentiments of its paragraphs.
  6. Repeat this process for all the documents.
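
The steps above can be sketched as a small pipeline. The keyword list is hypothetical (the review does not list the paper's keywords), paragraphs are assumed to be separated by blank lines, a paragraph counts as relevant if it contains any keyword, and `score_paragraph` stands in for whichever of the three methods is being applied.

```python
from statistics import mean
from typing import Callable, List, Optional

# Hypothetical keyword list, for illustration only.
KEYWORDS = ("cbdc", "central bank digital currency", "digital euro")


def split_paragraphs(document: str) -> List[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def is_relevant(paragraph: str) -> bool:
    """Keep a paragraph if it mentions at least one keyword."""
    lowered = paragraph.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def document_sentiment(
    document: str, score_paragraph: Callable[[str], float]
) -> Optional[float]:
    """Average the scores of the relevant paragraphs of one document."""
    relevant = [p for p in split_paragraphs(document) if is_relevant(p)]
    if not relevant:
        return None  # no relevant paragraph, so no document score
    return mean(score_paragraph(p) for p in relevant)
```

Passing the scorer as a function keeps the same workflow usable for the dictionary method, BERT, and ChatGPT.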

 

4. Sentiment results

4.1. Evolution of sentiment

  • Average sentiment is highest for ChatGPT (0.33), followed by BERT (0.18) and Polarity (0.07).
  • Polarity is much more volatile, with higher variance and extreme values.

Because the number of documents varies greatly by quarter and by year, a moving average analysis is used to track the change in sentiment

  • The figure confirms that sentiment towards CBDC appears to be more positive as time goes by.
  • Lower sentiment on CBDC at the outset is in line with the idea that central banks began to talk cautiously about the issue.
  • But over time, the rising trend in sentiment reflects that they have become more open to exploring the possibility of issuing a CBDC.
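💡 Moving Average Analysis
- An average that is recomputed as it slides along a series that moves in some direction
- Simple moving average at the n-th data point = arithmetic mean of the m data points to its left, including the n-th one
- window: 20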
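
A minimal sketch of the trailing simple moving average described in the note, with the window of 20 as the default. How the first points are handled is not stated in the review; here I assume they are averaged over however many values are available so far.

```python
from typing import List, Sequence


def trailing_moving_average(values: Sequence[float], window: int = 20) -> List[float]:
    """Mean of each value and up to window - 1 values to its left."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_sum = 0.0
    for index, value in enumerate(values):
        running_sum += value
        if index >= window:
            # Drop the value that just left the window.
            running_sum -= values[index - window]
        count = min(index + 1, window)  # partial window at the start
        averages.append(running_sum / count)
    return averages
```

For example, with `window=2` the series 1, 2, 3, 4 becomes 1.0, 1.5, 2.5, 3.5.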

 

Grouping by quarter or by year does not change the results much

 

4.2. Comparing with humans and labeled data

  • The NLP techniques are compared with the sentiment labels of the human experts in Auer et al. (2020), who assigned a score of -1, 0 or +1 depending on the document's sentiment towards CBDC.

Comparing all texts with the larger texts that have two or more relevant paragraphs, the sentiment is captured better in the larger texts

  • All correlations are positive and significant, which indicates that the techniques capture, to some extent, the sentiment expressed by humans. But the LLMs measure that sentiment better, especially ChatGPT.
  • The difference between ChatGPT and the other techniques is bigger in larger texts, where the correlation between ChatGPT and BIS labeled data is significantly higher than the others.

  • Fig. 7 shows the correlation between ChatGPT and human labels in speeches with more than one relevant paragraph, where this relationship can be visually appreciated.
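
To make the comparison with the expert labels concrete, here is a small sketch of a correlation between a method's document scores and the -1/0/+1 labels. The review does not say which correlation coefficient the paper uses; Pearson's is assumed here purely for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)
```

Calling it once per method (Polarity, BERT, ChatGPT) against the same label vector gives the kind of side-by-side comparison the section describes.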

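Because the ChatGPT scores are continuous between -1 and 1, they are made comparable with the categorical BIS labels by converting the bottom 10% of scores to -1, the top 50% to 1, and the remaining scores to 0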
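
The conversion described above can be sketched with a rank-based rule. The function name and the handling of ties (broken by position in the list) and of counts that do not divide evenly (rounded down) are my assumptions; the review only gives the 10% and 50% shares.

```python
from typing import List, Sequence


def discretize_by_rank(
    scores: Sequence[float], bottom_share: float = 0.10, top_share: float = 0.50
) -> List[int]:
    """Label the lowest scores -1, the highest scores 1, and the rest 0."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, ascending score
    n_bottom = int(n * bottom_share)
    n_top = int(n * top_share)
    labels = [0] * n
    for i in order[:n_bottom]:
        labels[i] = -1
    for i in order[n - n_top:]:
        labels[i] = 1
    return labels
```

With ten scores, exactly one document is labeled -1, five are labeled 1, and four are labeled 0.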

  • We reconstructed the graph made in Auer et al. (2020), in which they plot the cumulative sum of sentiment month by month.
  • The trend changes are very similar, which is remarkable, especially considering that the prompt we used was quite generic, and it could be further tailored to capture BIS preferences.
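
The cumulative plot mentioned above boils down to summing the labels per month and then taking a running total. A minimal sketch, assuming each speech is given as a hypothetical `("YYYY-MM", label)` pair and that months without speeches are simply skipped:

```python
from collections import defaultdict
from itertools import accumulate
from typing import Iterable, List, Tuple


def cumulative_monthly_sentiment(
    records: Iterable[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Return (month, running total of sentiment) in chronological order."""
    monthly = defaultdict(int)
    for month, label in records:
        monthly[month] += label
    months = sorted(monthly)  # "YYYY-MM" strings sort chronologically
    totals = accumulate(monthly[month] for month in months)
    return list(zip(months, totals))
```

The same function works for the expert labels and for the discretized ChatGPT scores, which is what makes the two curves comparable.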

 

5. Determinants of sentiment

A regression estimated by Ordinary Least Squares is used to analyze the potential factors that affect the continuous sentiment score computed by ChatGPT

$$ y_{i,t}=\beta_0+\beta_1CentralBank_i+\beta_2Size_i+\beta_3GoogleTrends_t+\beta_4EPU_t+\beta_5Bitcoin_t+\beta_6Libra_t+\sum_{k=7}^{12}\beta_kTime_t+u_{i,t} $$

  • $y_{i,t}$ represents the sentiment of speech i at time t
  • $CentralBank_i$ is the central bank that gave speech i
  • $Time_t$ is a time dummy indicating the year in which speech i took place
  • $Size_i$ is the number of words in speech i
  • $GoogleTrends_t$ indicates the volume of searches for the term CBDC in Google Trends at the time of speech i
  • $EPU_t$ indicates the Economic Policy Uncertainty index at the time of the speech
  • $Bitcoin_t$ indicates the price of Bitcoin at the time of the speech
  • $Libra_t$ is a dummy equal to 1 after the announcement of Libra (June 2019) and 0 otherwise
💡 OLS (ordinary least squares): the goal is to find the 𝑏0 and 𝑏1 that minimize the sum of squared errors
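
To illustrate the OLS note and the Libra dummy, the sketch below fits a one-regressor line (intercept b0, slope b1) by the closed-form least-squares solution. It is a simplification of the full regression, which has many regressors. The exact cutoff day is my assumption (the review only says June 2019), and a speech given on the cutoff day itself is counted as 1.

```python
from datetime import date
from typing import Sequence, Tuple

# Assumption: day chosen for illustration; the review only gives "June 2019".
LIBRA_ANNOUNCEMENT = date(2019, 6, 18)


def libra_dummy(speech_date: date) -> int:
    """1 for speeches on or after the assumed announcement date, else 0."""
    return 1 if speech_date >= LIBRA_ANNOUNCEMENT else 0


def simple_ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (b0, b1) minimizing the sum of squared errors of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1
```

With a 0/1 regressor such as the dummy, b0 is the mean sentiment before the announcement and b1 is the difference between the after and before means, which is how a positive Libra coefficient can be read.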

 

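  1. The dummies for the US, Japan, the ECB, and the UK have significant negative coefficients: these institutions' attitude toward CBDC is more cautious than that of Singapore, Hong Kong, and other countries
  2. When focusing on the Eurozone, the Libra variable becomes more important: the private initiative may have been a catalyst for launching the digital euro project
  3. Size appears as a significant variable in both datasets
    1. LLMs may be sensitive to text length
  4. The effect of variables such as GoogleTrends and Bitcoin differs depending on the dataset
  • Results (all speeches):
    • JPN, ECB, UK, USA: speeches from each of these have a negative effect on the sentiment score, all statistically significant (**).
    • Size: the longer the speech, the lower the sentiment score; highly significant (**).
    • GoogleTrends: positive effect, but with low significance (*).
    • Libra: positive effect, significant (*).
  • Results (EU only):
    • ECB: within the EU, ECB speeches have almost no effect on the sentiment score (not statistically significant).
    • Size: within the EU, speech length also has a negative effect on the sentiment score, though smaller than in the full sample (*).
    • Libra: positive effect on the sentiment score after the Libra announcement; highly significant within the EU (*).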

 

6. Conclusion and further work

The paper performs a sentiment analysis of central banks' stance on CBDC using LLMs (BERT, ChatGPT)

  1. According to all techniques, the sentiment towards CBDC seems to increase from 2017 onwards.
  2. ChatGPT's sentiment towards CBDC is the closest to the labels assigned by human experts.
  3. The sentiment of the speeches became more positive following the launch of Facebook's Libra, a private digital currency, especially in the Eurozone.

However, the size of LLMs may give rise to new risks

  1. interpretability of the results, which is a challenge under scrutiny by regulators, with the aim of identifying which parts of an LLM are responsible for its behaviors
  2. concerns about third-party dependencies and the potential electrical and environmental cost of keeping these models online for everyone to access

Future work

  1. It might be interesting to assess the importance of prompt engineering when defining the task for ChatGPT, for example by changing the prompt's content or length.
  2. Extend the analysis to other LLMs, such as GPT-4, XLNet, LLaMA, or T5.

 

Appendix

A.1. Robustness analysis. Prompts for crypto

  • To test whether ChatGPT is more optimistic than BERT or Polarity by default, we replicate the workflow from Fig. 3 on paragraphs with the following keywords: crypto(s), crypto asset(s), crypto-asset(s), using the prompt: Compute the sentiment score towards crypto assets, measured between -1 and 1, of a given text. The response should be just a float number, no text. The text is as follows: [...]

The moving average analysis shows that sentiment toward crypto assets does not follow an upward trajectory

  • Polarity is the most volatile and, although on average it is below BERT and ChatGPT, it occasionally surpasses the LLMs.
  • ChatGPT does not always exhibit a higher sentiment than its counterparts.
  • Therefore, the fact that ChatGPT captures more sentiment towards CBDC seems to be genuine, dispelling doubts about a possible bias towards positive sentiment.

A.2. Robustness analysis. Prompts without asking specifically about CBDC

When the prompt does not ask specifically about CBDC, ChatGPT behaves similarly to BERT

  • We can see that BERT and ChatGPT without CBDC are more similar to each other, supporting the greater positive effect found by ChatGPT when asked about CBDC.