Fast development relies heavily on reusable building blocks. This is evident in the physical world, where bricks are the building blocks of a house, but it is equally true for software.
At cognaize, we have developed several horizontally scalable microservices that serve as building blocks, allowing us to produce solutions to bespoke problems more quickly.
Because we understand the specific regulatory constraints of the financial industry very well, we have built our microservices with security in mind. Even internally, all communication is encrypted in transit. Even caches are encrypted at rest, and all microservices can be deployed locally.
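The text does not say how internal traffic is encrypted. As a minimal sketch, assuming services call each other over TLS, a client-side context that insists on certificate verification, hostname checking and a modern protocol version could look like this in Python's standard `ssl` module. The helper name `build_internal_tls_context` and its parameters are hypothetical, not part of any cognaize service.

```python
import ssl
from typing import Optional


def build_internal_tls_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Hypothetical helper: TLS context for calls between internal services."""
    # create_default_context() turns on certificate verification and hostname checks;
    # ca_file can point at a private CA that signs internal service certificates
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse legacy protocol versions, even on the internal network
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Optionally present a client certificate so the server can authenticate us (mutual TLS)
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The resulting context can be passed to any standard-library client that accepts one, for example `http.client.HTTPSConnection(host, context=ctx)`.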
Artificial Intelligence builds on concepts that have existed for decades but has seen tremendous growth in recent years. Machine Learning is an application of Artificial Intelligence that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. Deep Learning is a Machine Learning technique based on learning data representations, as opposed to task-specific algorithms. Deep Learning is the main driver of the recent explosion of Artificial Intelligence, and its results keep improving as the amount of training data grows. Consequently, handling Big Data and scaling horizontally are mandatory for good results. At cognaize, we natively embrace the concepts of Big Data. This is also reflected in our microservices, all of which are fully horizontally scalable.
Below are a few examples of our most important microservices.
Most financial documents include tables. To process the information contained in such tables automatically, their layout must be interpreted graphically. Indentation, font size and text weight all provide important hints for understanding the context.
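To illustrate how indentation alone can carry context, here is a minimal sketch, assuming each table row label has already been extracted together with the x-offset of its left edge. It treats a row as a child of the nearest preceding row with a smaller indent and builds fully qualified labels. The function `label_paths` and the `tolerance` parameter are hypothetical; font size and weight, which the text also mentions, are not used here.

```python
from typing import List, Tuple


def label_paths(rows: List[Tuple[float, str]], tolerance: float = 2.0) -> List[str]:
    """Turn (indent, label) rows into qualified labels such as 'Revenue > Products'.

    `indent` is the x-offset of the label's left edge; indents within
    `tolerance` of each other are treated as the same level.
    """
    stack: List[Tuple[float, str]] = []  # currently open ancestors as (indent, label)
    paths = []
    for indent, label in rows:
        # Close every open row that is indented as far as (or further than) this one
        while stack and stack[-1][0] >= indent - tolerance:
            stack.pop()
        stack.append((indent, label))
        paths.append(" > ".join(name for _, name in stack))
    return paths
```

For example, an indented "Products" row directly below "Revenue" becomes "Revenue > Products", so a downstream step can tell it apart from a "Products" row under a different parent.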
Using a Convolutional Neural Network, we detect tables within a document. The borders are then adjusted with a custom-built Optical Character Recognition (OCR) engine to make sure the table boundary does not cut through text. Finally, using a sentence prediction model specific to the financial industry, we adjust for merged cells. The result is near-perfect table detection that can be used for any kind of financial document.
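The border-adjustment step can be pictured geometrically. The following is a minimal sketch, not cognaize's custom OCR: it assumes the detector returns a table bounding box and that OCR has produced a bounding box per word, then grows the table box until no word is only partially inside it. `Box` and `snap_to_words` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        return self.x0 < other.x1 and other.x0 < self.x1 and self.y0 < other.y1 and other.y0 < self.y1

    def contains(self, other: "Box") -> bool:
        return self.x0 <= other.x0 and self.y0 <= other.y0 and other.x1 <= self.x1 and other.y1 <= self.y1

    def union(self, other: "Box") -> "Box":
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


def snap_to_words(table: Box, words: Iterable[Box]) -> Box:
    """Grow a detected table box until it no longer cuts through any word box."""
    words = list(words)
    changed = True
    while changed:  # repeat, because growing the box can make it cut new words
        changed = False
        for word in words:
            # A word that overlaps the table but is not fully inside it is being cut
            if table.intersects(word) and not table.contains(word):
                table = table.union(word)
                changed = True
    return table
```

The loop always terminates: each expansion fully absorbs one word, and because the box only grows, an absorbed word never triggers another expansion.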
Before text inside images can be processed, it has to be converted into machine-readable characters. For this OCR step, we use the LSTM-based engine of Tesseract, an open source project whose development has long been sponsored by Google. Tesseract has good accuracy out of the box, but at cognaize we have enhanced it with a financial library. The result is a considerable gain in accuracy.
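The text does not describe how the financial library is wired into Tesseract. One common way to apply domain vocabulary, shown here as a swapped-in illustration rather than cognaize's method, is to post-correct OCR output by snapping near-miss tokens to known terms. Tesseract itself is not invoked; the input is assumed to be OCR text already. `FINANCE_TERMS` and `correct_tokens` are hypothetical, and the 0.8 cutoff is an arbitrary choice.

```python
import difflib
import re
from typing import Iterable

# Hypothetical finance vocabulary; a real one would be far larger
FINANCE_TERMS = ["notional", "principal", "amortization", "debenture", "coupon", "EBITDA"]


def correct_tokens(text: str, vocabulary: Iterable[str], cutoff: float = 0.8) -> str:
    """Replace OCR tokens that closely resemble a known finance term with that term."""
    canonical = {term.lower(): term for term in vocabulary}

    def fix(match: re.Match) -> str:
        token = match.group(0)
        if token.lower() in canonical:
            return token  # already a known term, keep the original spelling
        candidates = difflib.get_close_matches(token.lower(), list(canonical), n=1, cutoff=cutoff)
        # Use the vocabulary's canonical spelling for the best match, if any
        return canonical[candidates[0]] if candidates else token

    # Only alphabetic runs are checked, so figures such as 5,000,000 are left untouched
    return re.sub(r"[A-Za-z]+", fix, text)
```

A typical OCR confusion such as "l" for "i" turns "notlonal" into "notional", while ordinary words like "amount" stay below the cutoff and are left alone.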
Word embeddings are among the most popular representations of document vocabulary. They capture the context of a word in a document as well as its semantic and syntactic similarity to, and relation with, other words. Although open source word embeddings are available, they are trained on generic text and therefore miss many relationships that are specific to finance.
At cognaize, we have built a word embedding model specifically for the financial industry. Our model understands the relationship between 'notional' and 'principal' or 'venture capital' and 'series A funding'. These examples clearly show the necessity of such a model for successful processing of financial documents.
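Similarity between embedded words is usually measured with cosine similarity. The sketch below shows only that measurement: the 4-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions and are learned from a corpus) and are not taken from cognaize's model. `EMBEDDINGS` and `most_similar` are hypothetical names.

```python
import math
from typing import Dict, List

# Toy vectors invented for illustration only, not learned values
EMBEDDINGS: Dict[str, List[float]] = {
    "notional": [0.9, 0.1, 0.3, 0.0],
    "principal": [0.8, 0.2, 0.4, 0.1],
    "weather": [0.0, 0.9, 0.1, 0.7],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word: str, embeddings: Dict[str, List[float]]) -> str:
    """Return the other vocabulary word whose vector is closest to `word`."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))
```

With these toy vectors, `most_similar("notional", EMBEDDINGS)` returns `"principal"`. A domain-specific model is useful precisely when it places such pairs close together, which a model trained on generic text may not do.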
Drop us a note and we will get in touch with you.