Bert and earnings: Predicting bottleneck stages by pretraining large language models

Collin Guo

Thesis

Open access

California State University, Sacramento

Master of Science (MS), California State University, Sacramento

12/15/2025

Handle:

https://hdl.handle.net/20.500.12741/rep:13793

Abstract

Keywords: BERT; LLM; Business bottleneck; Large language model

Statement of Problem

Bottlenecks in business are crucial turning points that can make or break the development of an industry or firm. We introduce BottleneckBERT, built by pretraining and finetuning BusinessBERT and bert-uncased, for the task of classifying text by its associated bottleneck stage. We adapt this model to the highly specific target domain of bottlenecks in streaming media by using trade publications in the field.

Sources of Data

We initially gathered and preprocessed data from two primary business trade publications: Multichannel News and Streaming Media. Streaming Media is one of the best-known trade publications in the digital media industry. Led by a team of recognized experts, it serves the streaming ecosystem by providing industry news, trade publications, research reports, and case studies. Similarly, Multichannel News covers the cable television and telecommunications industries. It provides timely news and analysis on programming, technology, business strategies, and policy developments affecting cable TV networks, satellite providers, telcos, and streaming services.

Conclusions Reached

BottleneckBERT substantially improves on simple finetuning of existing models and also offers a methodology for producing such improvements in any domain. Starting from the initial BERT-only approach at 0.80 macro F1 and the business-based approach at 0.72 macro F1, we combined our BERT-based architecture, the business-based features, and the state-of-the-art BusinessBERT to achieve 0.89 macro F1. We then pretrained BusinessBERT further to create BottleneckBERT, which outperformed the state of the art with 0.91 macro F1. We also achieved 0.89 macro F1 by pretraining naive BERT, giving nearly identical performance with 70 MB of unclean data instead of the 12+ GB that BusinessBERT uses. We introduce this model as BottleneckBERT-lite.
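All results above are reported as macro F1, which averages the per-class F1 scores so that every bottleneck stage counts equally regardless of how many examples it has. The following is a minimal sketch of that metric in plain Python, not the evaluation code used in the project. The stage names (`early`, `mid`, `late`) and the sample predictions are hypothetical placeholders, since the record does not list the actual stage labels.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stage labels, for illustration only.
y_true = ["early", "early", "mid", "mid", "late", "late"]
y_pred = ["early", "mid", "mid", "mid", "late", "early"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Because each class contributes equally to the average, a model that performs poorly on a rare stage is penalized as much as one that performs poorly on a common stage, which is presumably why the metric suits a classification task of this kind.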


Details

Title: Bert and earnings: Predicting bottleneck stages by pretraining large language models
Creators: Collin Guo
Contributors: Jung Yoon Jang (Committee Member); Haiquan Chen (Advisor)
Academic Unit: Computer Science Department
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 04/30/2025; 2025
Publisher: California State University, Sacramento
Publication Details: 12/15/2025
Identifiers: 99258253266401671; https://hdl.handle.net/20.500.12741/rep:13793
Resource Type: Masters Project
Language: English
Number of pages: 52
Comment: The accessibility of this document has been verified by Sacramento State University Library. For questions, please contact lib-508Accessibility@csus.edu.