NTT DATA Starts Verification of Natural Language Processing Technology for Financial Industry

On July 10, NTT DATA announced that it will recruit financial institutions such as banks and securities companies and, in sequence from July, begin proof-of-concept verification of natural language processing technology using a financial version of BERT (Bidirectional Encoder Representations from Transformers: Google’s natural language processing model, which has attracted attention for exceeding the accuracy of conventional models on various benchmarks in the field).

The financial version of BERT is a language model that NTT DATA has specialized for the financial industry. It eliminates the need to train a language model each time documents containing financial jargon or a specific context are analyzed, making it possible to obtain highly accurate results while shortening the training process.

[Figure: Financial version BERT image]

BERT can analyze text based on context, which was difficult with conventional natural language processing technology. Applying the language model is expected to improve the accuracy of various tasks that require natural language processing, such as assigning FAQ answers at financial industry call centers and extracting information from daily business reports.

In recent years, research toward business use of natural language processing technology has progressed, and in the financial industry it is being used to improve customer support through chatbots and to improve operational efficiency through screening support. However, financial industry texts contain many highly specialized terms and phrases unique to the industry, so applying natural language processing technology took labor and time, for example to maintain dictionaries and build large numbers of rules.

In addition, applying BERT to Japanese financial documents first requires a BERT model for Japanese, but there were issues such as the lack of Japanese models trained on a large-scale corpus (linguistic material in which texts and utterances are collected on a large scale and organized into a database).

To address these issues, the company has decided to develop a financial version of BERT using financial documents it has collected, based on the NTT version of BERT trained on a large-scale corpus, and to start proof-of-concept verification of its business application.

While the BERT model announced by Google was trained on a corpus of 13GB or more, most of the published BERT models for Japanese were trained on the Japanese Wikipedia corpus (about 3GB). NTT Media Intelligence Laboratories used a large-scale corpus (12.7GB) collected from news sites and blogs in addition to Japanese Wikipedia to develop a BERT model (NTT version BERT) trained on Japan’s largest corpus.

The financial version BERT is a model obtained by additionally training the NTT version BERT on financial documents collected by NTT DATA. It has been reported that a BERT model trained on a corpus from a specific field achieves higher accuracy on tasks in that field than a BERT model trained on a general corpus.

To evaluate the performance of the financial version BERT model in natural language processing of financial documents, the company conducted two preliminary verifications: “evaluation of accuracy of word prediction in financial documents” and “comparison of scores in financial qualification tests”. The accuracy of word prediction is evaluated by masking words in financial documents collected by the company and measuring how accurately the model predicts the original words.
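The masked-word evaluation described above can be illustrated with a minimal sketch. It is not NTT DATA's evaluation code: the function names, the one-mask-per-sentence setup, and the toy predictor standing in for a language model are all assumptions made for illustration, and the example sentences are invented.

```python
from typing import Callable, List, Sequence

MASK = "[MASK]"  # placeholder token; assumed here, matching BERT's usual convention


def masked_prediction_accuracy(
    sentences: Sequence[List[str]],
    mask_positions: Sequence[int],
    predict: Callable[[List[str], int], str],
) -> float:
    """Return the fraction of masked tokens that the predictor restores exactly."""
    correct = 0
    for tokens, pos in zip(sentences, mask_positions):
        original = tokens[pos]
        # Hide the original word, then ask the model to fill it back in.
        masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
        if predict(masked, pos) == original:
            correct += 1
    return correct / len(sentences) if sentences else 0.0


def toy_predict(masked_tokens: List[str], pos: int) -> str:
    """Hypothetical stand-in for a language model: always guesses 'rate'."""
    return "rate"


# Invented example sentences, already split into tokens.
sentences = [
    ["the", "interest", "rate", "rose"],
    ["the", "bond", "yield", "fell"],
]
accuracy = masked_prediction_accuracy(sentences, [2, 2], toy_predict)
print(accuracy)  # 0.5: "rate" is restored, "yield" is not
```

In a real comparison, `predict` would wrap each model under test (for example a general-corpus model and a finance-adapted one), and the same masked positions would be used for every model so that the accuracies are comparable.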

In the score comparison on the financial qualification test, the company treated the test as a task that requires financial knowledge: it developed a BERT-based test-answering AI that answers a mock test, created by a teaching material production company, for a type of sales representative qualification test, and compared the scores achieved with each model. In both verifications, it was confirmed that the large-scale training of the NTT version BERT is effective and that the financial version BERT is a suitable model for financial documents.
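As a rough illustration of comparing models by exam score, the sketch below scores several models on the same set of questions. The article does not describe the test format or the answering method, so the multiple-choice format, the `Question` type, the scorer interface, and both toy "models" are assumptions; the questions and the resulting scores are invented and are not the reported results.

```python
from typing import Callable, Dict, List, NamedTuple


class Question(NamedTuple):
    text: str
    choices: List[str]
    answer: int  # index of the correct choice


# A scorer rates how plausible a choice is for a question (higher is better).
Scorer = Callable[[str, str], float]


def exam_score(questions: List[Question], scorer: Scorer) -> int:
    """Count the questions for which the top-scoring choice is the correct one."""
    correct = 0
    for q in questions:
        scores = [scorer(q.text, choice) for choice in q.choices]
        chosen = scores.index(max(scores))  # the first choice wins ties
        correct += int(chosen == q.answer)
    return correct


def compare_models(questions: List[Question], models: Dict[str, Scorer]) -> Dict[str, int]:
    """Score every model on the same question set."""
    return {name: exam_score(questions, scorer) for name, scorer in models.items()}


def baseline_scorer(question: str, choice: str) -> float:
    """Hypothetical model with no domain knowledge: rates every choice equally."""
    return 0.0


def keyword_scorer(question: str, choice: str) -> float:
    """Hypothetical model with a little domain knowledge, encoded as keyword pairs."""
    known = {("bond", "coupon"), ("stock", "dividend")}
    return 1.0 if any(k in question.lower() and choice == v for k, v in known) else 0.0


# Invented mock questions.
questions = [
    Question("A bond pays periodic interest called a", ["coupon", "dividend"], 0),
    Question("A share of stock may pay a", ["coupon", "dividend"], 1),
]
results = compare_models(questions, {"baseline": baseline_scorer, "keyword": keyword_scorer})
print(results)  # {'baseline': 1, 'keyword': 2}
```

Keeping the question set fixed and swapping only the scorer is what makes the per-model scores directly comparable.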

It is envisioned that the financial version of BERT will be used to extract information from daily reports, check the contents of approval documents, extract risks from financial information, automatically assign FAQ answers, and respond to inquiries using chatbots.

In the future, the company plans to carry out five proof-of-concept verifications of the financial version BERT during FY2020, and aims to start providing a financial version BERT service in FY2021 by utilizing the results of those verifications.

In addition, in order to utilize the company’s unique know-how and technology related to natural language processing and promote the application of the financial version BERT model to actual business, the company is recruiting partner companies for the verification and will accept applications from financial institutions until the end of September 2020. After applications are received, the financial institution and NTT DATA will agree on a verification plan, the company will measure and analyze the effects, and the verification will clarify the effectiveness of the technology and the issues and countermeasures for introducing it into business operations.