Investors in Structured Credit products receive trustee reports as PDF documents every month, and are well aware of the hard manual work required to extract necessary data points.
We are automating the extraction of these values with a suite of Artificial Intelligence models. In contrast to financial spreading, we mostly deal with multiple data points per header, which calls for different models. In addition, some values, such as company names, must be classified into classes that are unknown at training time.
Interpreting graphically presented information with a Convolutional Neural Network is essential. Much of the information is conveyed by merged cells and indentation, and capturing it is necessary to reach human levels of accuracy.
After graphical interpretation, we transform images to text using the LSTM model of Tesseract, enhanced by domain-specific training data. Tesseract has good accuracy out of the box, but by enhancing it with a financial library we have increased its accuracy considerably.
Each Trustee Report contains many named entities, such as company names, that need to be matched across different reports. The list of names is constantly changing. We are building an Artificial Intelligence model that groups name variants together and assigns them to entities.
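To make the name-grouping task concrete, here is a minimal sketch in Python. It is not our production model: it swaps in plain string similarity from the standard library for the learned model, and the suffix list, the 0.85 threshold and the company names are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "lp", "plc"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_names(names, threshold=0.85):
    """Greedily put each name into the first group whose representative is similar enough.

    The threshold is an illustrative assumption, not a tuned value.
    """
    groups = []  # list of (normalized representative, [original names])
    for name in names:
        key = normalize(name)
        for representative, members in groups:
            if SequenceMatcher(None, representative, key).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing group matched, so this name starts a new entity.
            groups.append((key, [name]))
    return [members for _, members in groups]


# Made-up names, including a spelling variant and an OCR-style typo.
names = [
    "Acme Holdings, Inc.",
    "ACME Holdings Inc",
    "Acme Holdngs LLC",
    "Borealis Energy Ltd.",
]
groups = group_names(names)
```

Because a name that matches no existing group simply opens a new one, the sketch copes with names it has never seen, which is the property the constantly changing list of names requires. A greedy pass like this depends on input order and would mis-group genuinely similar but distinct companies, which is one reason a learned model is preferable.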
Normally, Trustee Reports group information by test rather than by entity. Consequently, the data has to be unified to be useful to investors. With robust entity building in place, we can readily unify the useful data.
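The regrouping from tests to entities can be sketched as follows. This is only an illustration under stated assumptions: the row layout, the field names, the test names and the values are hypothetical and do not come from an actual Trustee Report, and the sketch presumes entity names have already been resolved to one canonical form.

```python
from collections import defaultdict

# Hypothetical extracted rows: each row belongs to one test and mentions one entity.
rows = [
    {"test": "Concentration Limit", "entity": "Acme Holdings",
     "field": "par_amount", "value": 1_500_000.0},
    {"test": "Rating Test", "entity": "Acme Holdings",
     "field": "rating", "value": "B2"},
    {"test": "Concentration Limit", "entity": "Borealis Energy",
     "field": "par_amount", "value": 2_250_000.0},
]


def unify_by_entity(rows):
    """Collect every field reported under any test into one record per entity."""
    by_entity = defaultdict(dict)
    for row in rows:
        by_entity[row["entity"]][row["field"]] = row["value"]
    return dict(by_entity)


unified = unify_by_entity(rows)
```

Here `unified["Acme Holdings"]` holds both the par amount and the rating, although the two values were reported under different tests. If two tests reported the same field for one entity, the later row would overwrite the earlier one, so a real pipeline would need a reconciliation rule at that point.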
We are currently working on processing Trustee Reports of Collateralized Loan Obligations, with very promising results. The threshold for successful processing is quite high, as we are aiming to remove the need for human involvement completely. Human accuracy is greater than 99% because the data elements are highly distinguishable. We can currently match that for many of the data items and are working relentlessly to achieve the same results for all items.
Drop us a note and we will get in touch with you.