Cracking the Human-Language Code of NLP in Financial Services

In summary: Natural Language Processing (NLP) is revolutionizing the financial industry by automating human-language tasks, enhancing content processes, and addressing the challenges of scale, speed, and complexity.

Data Revolution in Finance: Financial services benefit from data analytics across trading, risk management, etc. But human language information remains less automated.
Advanced NLP with Large Language Models: ChatGPT (initially built on GPT-3.5) and other models demonstrate the ability to answer questions and generate human-like content.
Applications & Challenges: NLP can optimize various content processes in finance, from knowledge gathering to analytics. However, challenges include data security, skillset availability, system accuracy, and emerging AI regulation.
A tech-focused future: At TOPPAN Digital Language, we became early adopters of Neural Machine Translation (NMT, a sub-field of NLP), incorporating it into our toolkit, and we will continue to pioneer the exploration of NLP use cases.

The financial services industry has already benefited from a data revolution across many domains, such as trading, risk management, and fraud detection, to name just a few. However, the industry also deals with huge amounts of human language information. To date, there has been much less automation in producing and analyzing such information, resulting in a heavy reliance on humans for these tasks.

The arrival of Artificial Intelligence in the form of Natural Language Processing (NLP) will produce a wave of innovation and a paradigm shift in the speed, scale, and complexity of processing human-language information. Examples of tasks that NLP can address include:

Information gathering in complex topics across multiple languages.
Immediate analysis of large quantities of newsflow.
Automated report writing and content repurposing.

What is Natural Language Processing?

NLP is just one field of AI, but it has long captured the imagination. Alan Turing’s ‘imitation game’ (also known as the Turing Test) focused precisely on this: the potential future ability of machines to imitate human conversation and answer questions so well that a human could not tell they were talking to a machine. However, such a capability was beyond reach with traditional computer programming methods.

This was highly evident in the NLP sub-field of Machine Translation (MT). Until the late 2010s, MT (first rule-based and then statistical) was relatively poor, to the extent that the only significant use case was the trawling of foreign-language information by intelligence agencies.

Statistical MT improved only incrementally each year and could barely handle some language pairs at all if the grammatical structures were too different from each other.

However, the advent of neural networks and machine learning revolutionized MT. Trained on large quantities of source and translated material, Neural Machine Translation (NMT) engines became good enough for practical use. In fact, at TOPPAN Digital Language, we recognized early on the potential of NMT and actively incorporated it into our toolkit, witnessing its growth and potential to greatly enhance translation accuracy and efficiency for our clients in financial services.

Today, MT is a firmly established technology used in the translation process. It still has its limitations, but those limitations diminish every year.

Solving Machine Translation was hard, but to meet the Turing Test, there were further challenges to solve: Question Answering and Natural Language Generation (NLG), which is a computer’s ability to generate content that humans easily understand.

To achieve this, more advanced machine learning would have to be applied to much larger quantities of data. When ChatGPT, built on the GPT-3.5 Large Language Model (“LLM”), was released, it surprised not just ordinary users but many in the NLP world. Its scale, question-answering capability, and ability to generate well-structured, fluent text were seriously impressive.

Since then, there have been further releases of LLMs that have delivered even more stunning capabilities.

How can enterprises harness the power of NLP?

To start to answer this question, it is helpful to understand the types of problems that NLP can solve. These include keyword and named entity recognition, tagging, text classification, and parsing, through to more complex actions, such as translation, summarization or paraphrasing, and the significantly more difficult tasks of question-answering and natural language generation.

Furthermore, NLP is increasingly combined with Automatic Speech Recognition and synthetic voice for text-to-speech, speech-to-text, and speech-to-speech.
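
To make the simpler end of that spectrum concrete, here is a minimal Python sketch of keyword-based text classification and very rough entity spotting. It is an illustration only: the topic lexicon, the function names, and the "$TICKER" convention are all assumptions made for this example, and a production system would more likely rely on trained models than on hand-written rules.

```python
import re

# Hypothetical topic lexicon: illustrative only, not taken from any real taxonomy.
TOPIC_KEYWORDS = {
    "earnings": {"revenue", "profit", "earnings", "guidance"},
    "m&a": {"acquisition", "merger", "takeover", "bid"},
    "regulation": {"regulator", "fine", "compliance", "sanction"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text):
    """Return the topic whose keywords occur most often, or 'other' if none match."""
    tokens = tokenize(text)
    scores = {
        topic: sum(1 for token in tokens if token in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best_topic = max(scores, key=scores.get)
    return best_topic if scores[best_topic] > 0 else "other"


def find_tickers(text):
    """Rough entity spotting: treat $-prefixed uppercase codes as tickers (an assumed convention)."""
    return re.findall(r"\$([A-Z]{1,5})\b", text)
```

Calling classify on a sentence about a regulator's fine would tag it as "regulation", while find_tickers pulls out any "$"-prefixed codes. The more difficult tasks, such as summarization, question answering, and generation, are where Large Language Models come in.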

This broad set of problem-solving capabilities can be applied in many ways to assist with or fully automate existing content processes. Consider the end-to-end content value chain. At a high level, the content value chain comprises the following steps:

Knowledge gathering
Content authoring
Content transformation, repurposing and updating
Content formatting
Content management
Content translation
Content publication
Content performance analytics

Each step can be enhanced or transformed with NLP at a new scale, speed and complexity. Let’s think of some more specific examples pertinent to financial services. Here are just some ideas:

Automatically generate a draft ESG report based on access to the company’s information systems, benchmark information from third-party reports and application of the relevant taxonomy
Automatically generate an analyst’s recommendation report from the analysis of a company announcement, combined with input from previous announcements, meeting transcripts, peer group news and the analyst’s previous notes
The trading desk receives an automatic report synthesizing all the newsflow in the market that day. Based on sentiment analysis, it creates a heat map of negative and positive newsflow. Based on the same analysis, it also creates a predictive model for share price moves.
A fund manager creates a report summarizing and synthesizing all of the analysts’ reports in the market within a particular sector or across the market that week. The report is converted into audio for them to listen to on their way home.
An investment analyst instantly analyzes an IPO document, picks out the key points, risks, and anomalies, and disseminates them to their fund management teams in multiple countries in their native languages.
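
The newsflow heat map in the trading-desk example above can be illustrated with a deliberately simple Python sketch. It scores each headline against a small word list and averages the scores by sector, so that each sector becomes one cell of the heat map. The word lists, sector labels, and sample headlines are assumptions invented for this illustration; a real system would more likely use a trained sentiment model and a live news feed.

```python
import re
from collections import defaultdict

# Toy sentiment lexicon: an assumption for illustration, not a validated financial word list.
POSITIVE = {"beats", "upgrade", "growth", "record", "surge"}
NEGATIVE = {"misses", "downgrade", "loss", "probe", "plunge"}


def sentiment_score(headline):
    """Return (positive word hits - negative word hits) for one headline."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    positive_hits = sum(1 for token in tokens if token in POSITIVE)
    negative_hits = sum(1 for token in tokens if token in NEGATIVE)
    return positive_hits - negative_hits


def sector_heat(news_items):
    """Average the headline scores per sector; each entry is one heat-map cell."""
    scores_by_sector = defaultdict(list)
    for sector, headline in news_items:
        scores_by_sector[sector].append(sentiment_score(headline))
    return {
        sector: sum(scores) / len(scores)
        for sector, scores in scores_by_sector.items()
    }


# Hypothetical sample newsflow as (sector, headline) pairs.
sample = [
    ("banks", "Lender beats forecasts on record growth"),
    ("banks", "Regulator opens probe into lender"),
    ("energy", "Oil major misses estimates, shares plunge"),
]
heat = sector_heat(sample)
```

Positive averages would be shaded as favorable newsflow and negative ones as unfavorable. Any predictive model for share prices would need far more than this, including the quantitative data mentioned below.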

Of course, there are many more examples, and they become even more powerful when combined with quantitative data.

So what are the challenges?

The first challenge is information security: confidential information should not be entered into any public systems, particularly not into public Large Language Models, because the information may become discoverable by others. This means that organizations must create their own secure environments for their models.

Second is finding the skill sets to build models and applications with these novel technologies, which requires a mix of industry subject matter expertise, NLP data science, and more traditional IT capabilities.

Third is accuracy and quality: humans will still have to be “in the loop” until the reliability of these systems can be accurately predicted. This might even include partnering with a reputable third-party provider for some parts of the content value chain – especially when translating content using Neural Machine Translation.

And finally, it is likely that regulation of AI will start to emerge quite soon, requiring a further overlay of compliance which could differ region by region.

There is a long road of innovation ahead

The current wave of innovation is here to stay, and its pace is accelerating markedly, enabled by AI itself. Now is the time to educate and familiarize your business with NLP, how it works and its potential applications. It is also a great time to start identifying the use cases where NLP can add significant value to your existing processes or enable whole new capabilities.