Abstract
Statement of Problem Bottlenecks in business are crucial turning points that can make or break the development of an industry or firm. We introduce BottleneckBERT, built by further pretraining BusinessBERT and bert-uncased on domain text and then finetuning them to classify text by the associated bottleneck stage. We adapt the model to the highly specific target domain of bottlenecks in streaming media by using trade publications from the field.
Sources of Data We initially gathered and preprocessed data from two primary business trade publications: Multichannel News and Streaming Media. Streaming Media is one of the best-known trade publications in the digital media industry. Led by a team of recognized experts, it serves the streaming ecosystem by providing industry news, trade publications, research reports, and case studies. Similarly, Multichannel News covers the cable television and telecommunications industries. It provides timely news and analysis on programming, technology, business strategies, and policy developments affecting cable TV networks, satellite providers, telcos, and streaming services.
Conclusions Reached BottleneckBERT yields substantial improvements over simple finetuning of existing models and also offers a methodology for producing such improvements in any domain. Starting from the initial BERT-only approach at 0.80 macro F1 and the business-based approach at 0.72 macro F1, we combined our BERT-based architecture, the business-based features, and the state-of-the-art BusinessBERT to achieve 0.89 macro F1. We then pretrained BusinessBERT further to create BottleneckBERT, which outperformed the state of the art with 0.91 macro F1. We also achieved 0.89 macro F1 by further pretraining naive BERT, nearly matching that performance with 70 MB of uncleaned data rather than the 12+ GB that BusinessBERT uses. We introduce this model as BottleneckBERT-lite.
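Because every result above is reported as macro F1, a minimal sketch of that metric may help in reading the numbers. It uses the standard definition (the unweighted mean of per-class F1 scores). The function name `macro_f1` and the stage labels in the usage example are hypothetical placeholders, not the label set or evaluation code used in this work.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an instance of the true class
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stage labels, for illustration only.
y_true = ["stage_a", "stage_a", "stage_b", "stage_b", "stage_c", "stage_c"]
y_pred = ["stage_a", "stage_b", "stage_b", "stage_b", "stage_c", "stage_a"]
print(round(macro_f1(y_true, y_pred), 3))
```

Macro averaging weights every bottleneck stage equally, so a model cannot reach a high score by doing well only on the most frequent stage.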