PolyAnalyst adds support for automatically identifying and removing duplicate documents
With the release of version 6.0.920, PolyAnalyst provides a new node, Distinct Texts. The Distinct Texts node is a data cleansing operation that examines a set of documents and looks for duplicated content. If any two documents are “similar” enough, one of them can be filtered from the set. The node produces a new set of documents from which the duplicates have been removed, along…
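
To make the idea of filtering "similar enough" documents concrete, here is a minimal Python sketch of near-duplicate removal. It is not PolyAnalyst's implementation, which the announcement does not describe. The similarity measure (Jaccard overlap of three-word shingles), the 0.8 threshold, and the function names `shingles`, `jaccard`, and `filter_near_duplicates` are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_near_duplicates(docs, threshold=0.8, k=3):
    """Greedily keep each document unless it is too similar to one already kept.

    Hypothetical helper: threshold and k are illustrative values, not
    PolyAnalyst settings. Returns (kept, removed), both in input order.
    """
    kept, kept_shingles, removed = [], [], []
    for doc in docs:
        s = shingles(doc, k)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            removed.append(doc)
        else:
            kept.append(doc)
            kept_shingles.append(s)
    return kept, removed
```

In this sketch the first document of each group of near-duplicates is the one that survives, and the filtered documents are returned separately so they can be reviewed. The pairwise comparison is quadratic in the number of documents, so it suits small sets only.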

