Bank of America, National Association Customer Complaint Analysis

Author

Hue Nguyen Denke

Published

October 2, 2025

1 Introduction

This project analyzes consumer complaint narratives filed against Bank of America to examine the relationship between the emotions expressed in a complaint and the complaint dispute rate. I compare the emotional content of disputed vs. non-disputed complaints to identify emotional patterns that might predict complaint resolution difficulty.

Source: https://www.consumerfinance.gov/data-research/consumer-complaints/#download-the-data

2 Data Dictionary

Our dataset includes the following columns:

Date.received: Date that the complaint was received

Product: Product the complaint is about (e.g., Debt Collection, Mortgage)

Sub.product: Subcategory of the product (e.g., Credit Card, Debit Card)

Issue: Issue category

Sub.issue: Sub issue category

Consumer.complaint.narrative: Consumer explanation on the issue

Company.public.response: Company public response via website or social media

Company: Name of the company. Note: The dataset includes all US companies, but for the scope of this project I keep only Bank of America

State: The state of the mailing address provided by the consumer

ZIP.code: The mailing ZIP code provided by the consumer

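Tags: Flags that identify complaints submitted by or on behalf of particular consumer groups (e.g., Older American, Servicemember)

Consumer.consent.provided.: Whether the consumer consented to publication of the complaint narrative

Submitted.via: Channel through which the complaint was submitted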

Date.sent.to.company: Date that the issue was sent to the bank

Company.response.to.consumer: Company resolution to the issue

Timely.response: Whether the company responded in a timely manner

Consumer.disputed.: Whether the consumer disputed the company's resolution

Complaint.ID: The unique identifier of each complaint

3 Data Cleaning Methodology To Ensure Tidy Data

Convert all date columns from character to Date format

# Parse the year-month-day strings into Date objects
data$Date.received <- as.Date(data$Date.received,
                              format = "%Y-%m-%d")

data$Date.sent.to.company <- as.Date(data$Date.sent.to.company,
                                     format = "%Y-%m-%d")

Standardize empty narrative cells to NA

data$Consumer.complaint.narrative[which(data$Consumer.complaint.narrative == "")] <- NA

4 Data Summary

Our dataset spans nearly 14 years (2011-12-01 to 2025-09-24) and contains 11,109,951 complaints.

These complaints come from 7,756 US companies, including banks and credit reporting agencies.

The top 5 companies with the most complaints are:

  1. Equifax, Inc. 

  2. Transunion Intermediate Holdings, Inc.

  3. Experian Information Solutions, Inc.

  4. Bank of America

  5. Wells Fargo & Company
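A ranking like the one above can be produced by counting rows per company. The following is a minimal sketch on a small made-up data frame: the object name `complaints` and its values are illustrative stand-ins for the real table, and only the `Company` column name comes from the data dictionary.

```r
# Toy stand-in for the full complaints table (values are illustrative only)
complaints <- data.frame(
  Company = c("A Bank", "B Bureau", "A Bank", "C Lender", "B Bureau", "A Bank"),
  stringsAsFactors = FALSE
)

# Count complaints per company and sort from most to fewest
company_counts <- sort(table(complaints$Company), decreasing = TRUE)

# Keep the top entries (5 in the report; the toy data has only 3 companies)
top_companies <- head(company_counts, 5)
print(top_companies)

# Number of distinct companies in the data
n_companies <- length(unique(complaints$Company))
```

On the full dataset the same two calls would be applied to the `Company` column of the cleaned table.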

The remaining analysis focuses on Bank of America only.
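One way to restrict the table to a single company is a pattern match on the `Company` column. This is a sketch under assumptions: the exact spelling of the company name in the raw data is not shown in this report, so the example uses a case-insensitive match, and the toy rows below are invented for illustration.

```r
# Toy stand-in for the full table; the company spellings are assumptions
complaints <- data.frame(
  Company = c("BANK OF AMERICA, NATIONAL ASSOCIATION", "EQUIFAX, INC.",
              "WELLS FARGO & COMPANY", "Bank of America, National Association"),
  Complaint.ID = 1:4,
  stringsAsFactors = FALSE
)

# Case-insensitive match so differently cased spellings are all kept
is_boa <- grepl("bank of america", complaints$Company, ignore.case = TRUE)
boa <- complaints[is_boa, ]

print(nrow(boa))
```

A pattern match is more forgiving than an exact comparison, but it is worth checking `unique(boa$Company)` afterwards to confirm that no unrelated company was picked up.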

5 Key Findings

5.1 High-level view of the customer complaint

The most common problems appear to be related to fraud, debt, or denials.

5.3 Comparative Analysis using NRC sentiment

  • Method: Compare the emotional content of disputed vs. non-disputed complaints

  • Goal: Identify emotional patterns that might predict complaint resolution difficulty

  • Result: The largest dispute ratios fall within the trust and positive categories
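The comparison described above can be illustrated with a small base R sketch. Everything in it is an assumption made for illustration: the three-emotion `lexicon` is a hypothetical stand-in for the much larger NRC lexicon (which packages such as syuzhet provide), the narratives and dispute flags are invented, and "dispute ratio" is taken here to mean the share of an emotion's word hits that come from disputed complaints, which may differ from the definition used in the actual script.

```r
# Hypothetical mini-lexicon mapping words to emotion categories.
# The real NRC lexicon is far larger; these entries are illustrative only.
lexicon <- list(
  anger   = c("angry", "furious"),
  trust   = c("promised", "assured"),
  sadness = c("lost", "disappointed")
)

# Toy complaints with a made-up dispute flag
complaints <- data.frame(
  narrative = c("I was assured the fee would be refunded",
                "I am furious and disappointed",
                "They promised a callback and I lost my deposit",
                "I am angry about the charge"),
  disputed = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Split each narrative into lowercase words
tokens <- strsplit(tolower(complaints$narrative), "[^a-z]+")

# Count lexicon hits per complaint and emotion (rows = complaints)
scores <- sapply(lexicon, function(words) {
  sapply(tokens, function(tok) sum(tok %in% words))
})

# Total emotion words by dispute status, then the disputed share per emotion
totals <- rowsum(scores, group = complaints$disputed)
dispute_ratio <- totals["Yes", ] / colSums(totals)
print(round(dispute_ratio, 2))
```

The per-complaint `scores` matrix is also the natural input for a regression of dispute status on emotion counts.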

5.4 Perform statistical analysis to find correlation between emotion and dispute rate

Run model

Significant predictors are:

joy (p = 0.000157): negative relationship

sadness (p = 0.002610): positive relationship

trust (p = 1.15e-08): positive relationship

surprise (p = 0.010196): negative relationship

anticipation (p = 0.004157): positive relationship
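The report does not name the model, so the sketch below assumes a logistic regression of dispute status on emotion scores, fitted with `glm`. The data are simulated and the variable names are illustrative, so the output does not reproduce the p-values listed above. It only shows where coefficient signs and p-values of this kind come from.

```r
set.seed(1)
n <- 500

# Simulated emotion scores per complaint (made-up data, not the real results)
emotions <- data.frame(
  joy     = rpois(n, 1),
  trust   = rpois(n, 2),
  sadness = rpois(n, 1)
)

# Simulated outcome: trust raises and joy lowers the dispute probability
p <- plogis(-1 + 0.5 * emotions$trust - 0.5 * emotions$joy)
emotions$disputed <- rbinom(n, size = 1, prob = p)

# Logistic regression of dispute status on the emotion scores
fit <- glm(disputed ~ joy + trust + sadness, data = emotions, family = binomial)

# Coefficient table: the sign of Estimate gives the direction of the
# relationship, and Pr(>|z|) is the Wald test p-value for each predictor
coef_table <- summary(fit)$coefficients
print(round(coef_table, 4))
```

A positive estimate means higher scores on that emotion are associated with higher odds of a dispute, holding the other emotions fixed. A negative estimate means the opposite.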

5.5 Validate model with Chi-Squared Test

Significant predictors are:

Anger (p = 2.700e-11)

Fear (p = 3.797e-05)

Sadness (p = 1.841e-06)

Trust (p = 1.888e-11)

Surprise (p = 0.029413)

Anticipation (p = 0.004549)

Joy is significant in the coefficient test but not in the sequential test, suggesting it may share explanatory power with variables added earlier.

Anger is significant in the sequential test but not in the coefficient test, which may indicate that its effect is absorbed by other emotions once all predictors are included.
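The difference between the two kinds of test can be seen on simulated data. The sketch below assumes the "sequential test" is an analysis of deviance from `anova(..., test = "Chisq")` and the "coefficient test" is the Wald test from `summary()`. The report does not confirm either, and the data and effect sizes are invented.

```r
set.seed(2)
n <- 500

# Two deliberately correlated simulated predictors (made-up data)
anger    <- rpois(n, 2)
sadness  <- anger + rpois(n, 1)
disputed <- rbinom(n, size = 1, prob = plogis(-1.5 + 0.3 * sadness))

fit <- glm(disputed ~ anger + sadness, family = binomial)

# Coefficient (Wald) tests: each predictor adjusted for all the others
wald_p <- summary(fit)$coefficients[, "Pr(>|z|)"]

# Sequential analysis of deviance: each term is tested given only the
# terms listed before it, so the result depends on the order in the formula
seq_table <- anova(fit, test = "Chisq")
seq_p <- seq_table[["Pr(>Chi)"]]

print(round(wald_p, 4))
print(seq_table)
```

Because `anger` enters the formula first and is correlated with `sadness`, it can look important in the sequential table while contributing little once `sadness` is in the model. Reordering the formula terms is a quick way to check how much a sequential result depends on entry order.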

6 Final Suggestions

  • Focus on sadness, trust, surprise and anticipation as your primary findings since they are significant in both tests

  • Acknowledge joy as potentially important since it’s significant when controlling for all variables

  • Consider whether to include anger based on your research question and theoretical framework

7 R script

Link