Karakun @ SwissText 2022
From June 8 to 10, 2022, text analysis experts from industry and academia meet at SwissText 2022 at SUPSI in Lugano. In addition to our commitment as a Gold Sponsor, we will be present with a booth at the exhibition. We also actively contribute to the top-class conference program with the following talk:
Integrating ML-based Classifiers into an Enterprise Search System
HIBU is a proprietary software platform that we use to build customer solutions around enterprise search and multilingual text analysis. Its architecture provides two analysis pipelines: the first performs basic NLP steps, selected according to the detected document language, to preprocess the document's content; the second contains a sequence of high-level annotators that discover information in the document. Examples include extracting entities such as persons, places and organizations from the text, and identifying paragraphs that contain confidential information.
Both pipelines use the Apache UIMA framework to combine the annotators that are relevant for the target application. Each annotator can be adapted and switched on or off by configuration. Moreover, the framework allows us to add new annotators based on the individual customer's needs.
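The idea of annotators that are combined into a pipeline and enabled or disabled by configuration can be illustrated with a minimal sketch. It is written in plain Python and does not use the Apache UIMA API or any HIBU code; the names `Document`, `REGISTRY`, `build_pipeline` and the three toy annotators are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """A document plus the annotations that annotators attach to it."""
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


# An annotator reads the document and adds annotations in place.
Annotator = Callable[[Document], None]


def detect_language(doc: Document) -> None:
    # Toy stand-in: a real system would run a language detector here.
    doc.annotations["language"] = ["en"]


def tokenize(doc: Document) -> None:
    # Toy stand-in for a basic NLP step: whitespace tokenization.
    doc.annotations["tokens"] = doc.text.split()


def find_capitalized(doc: Document) -> None:
    # Toy stand-in for a high-level annotator: capitalized tokens
    # are treated as entity candidates.
    tokens = doc.annotations.get("tokens", [])
    doc.annotations["entities"] = [t for t in tokens if t[:1].isupper()]


# Hypothetical registry mapping configuration keys to annotators.
REGISTRY: Dict[str, Annotator] = {
    "detect_language": detect_language,
    "tokenize": tokenize,
    "find_capitalized": find_capitalized,
}


def build_pipeline(config: Dict[str, bool]) -> List[Annotator]:
    """Keep only the annotators that the configuration switches on."""
    return [REGISTRY[name] for name, enabled in config.items() if enabled]


def run(pipeline: List[Annotator], doc: Document) -> Document:
    """Apply each annotator in order to the same document."""
    for annotate in pipeline:
        annotate(doc)
    return doc


if __name__ == "__main__":
    config = {"detect_language": True, "tokenize": True, "find_capitalized": True}
    result = run(build_pipeline(config), Document("Karakun presents HIBU in Lugano"))
    print(result.annotations)
```

Setting an entry in `config` to `False` removes that annotator from the pipeline without touching the others, which mirrors the configuration-driven switching described above in a much simplified form.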
In this context, we recently integrated some new ML-based annotators as part of an Innosuisse project carried out in collaboration with SUPSI and DSwiss (“EXTRA”, presented separately, leveraging a fine-tuned version of the pre-trained BERT model and other ML technologies). These annotators allow us to provide scalable document classification, as well as customized information extraction, to be used by applications for further workflow-based functionalities.
In this demo we will show how we wrap the new functionalities into the base platform, and how these are integrated to further enrich the final results.
At our booth in the exhibition area, interested visitors can learn more about our HIBU platform. HIBU is a flexible software platform for the cost-effective development of customer solutions, especially in the areas of enterprise search, business intelligence and workflow automation.
