Document classification and text analysis in .NET

Analyze the text of PDF, DOC, DOCX, ODT, RTF, TXT, and other documents in your C# classification apps. Organize file contents with different taxonomies such as the IAB-2, document, and sentiment analysis taxonomies.


Automate text and document classification in .NET

Text classification is the process of organizing text-based data into predefined classes or categories. It helps reduce large volumes of text to more manageable sets, and it is also useful in language processing, sentiment analysis, and document classification. A taxonomy, on the other hand, is a structure for classifying items into distinct categories, and different taxonomies serve different purposes. For instance, the IAB-2 taxonomy offers a hierarchical structure for classifying digital content, the sentiment analysis taxonomy classifies text as positive, negative, or neutral, and a document taxonomy sorts documents into categories such as legal documents, medical records, and financial reports so that they are easy to retrieve.

GroupDocs.Classification for .NET API is a great choice for .NET developers who need to analyze and classify text. It enables you to build text classifier applications for PDF, Word, OpenDocument, RTF, and text files, and it supports the IAB-2, sentiment analysis, and document taxonomies, allowing you to organize document content according to your text classification requirements.

Getting Started

To use GroupDocs.Classification for .NET API, set it up on your system as described below.

GroupDocs.Classification for .NET installation

Download the DLLs or the MSI installer from the downloads section, or install the API from NuGet using the Package Manager Console:
PM> Install-Package GroupDocs.Classification

Document classification and analysis use cases

Once GroupDocs.Classification for .NET API is set up, let's look at some popular use cases for text analysis and classification.

Learn to classify text and documents using IAB-2 taxonomy

The IAB-2 (Interactive Advertising Bureau) taxonomy is a system for classifying digital content into organized categories. Many businesses, including publishers, advertisers, and marketers, use it to manage digital content and make it easier to find and use. It is a two-level classification system designed to help businesses arrange digital content into specific categories quickly and accurately, which in turn makes it easier for marketers to target their messages to the right audiences. GroupDocs.Classification for .NET API supports the IAB-2 taxonomy, so you can use it to classify both plain text and PDF, Microsoft Word, OpenDocument, RTF, and text documents.
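To make the idea of a hierarchical taxonomy such as the two-level structure described above more concrete, here is a minimal, self-contained C# sketch that stores top-level categories mapped to their subcategories and resolves a subcategory to its full "Parent > Child" path. It does not use GroupDocs.Classification, and the category names, as well as the FindParent and FullPath helpers, are hypothetical placeholders rather than the actual IAB-2 category list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical two-level taxonomy: top-level category -> its subcategories.
// These names are illustrative placeholders, not the official IAB-2 list.
var taxonomy = new Dictionary<string, string[]>
{
    ["Health"] = new[] { "Medicine", "Fitness" },
    ["Technology"] = new[] { "Software", "Hardware" },
};

// Return the top-level category that contains the subcategory, or null if none does.
string? FindParent(string subcategory) =>
    taxonomy.FirstOrDefault(kv => kv.Value.Contains(subcategory)).Key;

// Build the full "Parent > Child" path for a subcategory, or null if it is unknown.
string? FullPath(string subcategory) =>
    FindParent(subcategory) is string parent ? $"{parent} > {subcategory}" : null;

Console.WriteLine(FullPath("Medicine"));               // prints "Health > Medicine"
Console.WriteLine(FullPath("Unknown") ?? "not found"); // prints "not found"
```

Keeping the hierarchy as a simple dictionary makes it easy to roll a fine-grained class up to its broader parent category when reporting or filtering results.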


Develop text classification apps with IAB-2 taxonomy in .NET

To classify a text statement in .NET, use the following C# code. It prints the three most probable IAB-2 classes along with their probabilities:
// Text classifier with IAB-2 taxonomy using C#
using System;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO;

Classifier classifier = new Classifier();
string statement = "Medicine is an important part of our lives";

// Ask for the 3 most probable IAB-2 classes
var response = classifier.Classify(statement, 3, Taxonomy.Iab2);

// Print each returned class with its probability
foreach (var bestResult in response.BestResults)
    Console.WriteLine($"Class: {bestResult.Name}, \tProbability: {bestResult.Probability}");
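Often you only want the classes the classifier is reasonably confident about. The sketch below shows one way to post-process results: it keeps classes whose probability reaches a minimum threshold and sorts them from most to least probable. To stay self-contained and runnable without the library, it works on plain (Name, Probability) tuples standing in for the Name and Probability values printed above. The FilterByProbability helper, the 0.2 threshold, and the sample values are hypothetical, and the sketch assumes probabilities on a 0 to 1 scale; adjust the threshold if your results use a different scale.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keep only classes whose probability is at least minProbability,
// most probable first. The tuples stand in for the Name/Probability pairs in BestResults.
List<(string Name, double Probability)> FilterByProbability(
    IEnumerable<(string Name, double Probability)> results, double minProbability) =>
    results.Where(r => r.Probability >= minProbability)
           .OrderByDescending(r => r.Probability)
           .ToList();

// Made-up sample values for illustration only (assumes a 0-1 probability scale).
var sample = new List<(string Name, double Probability)>
{
    ("Class A", 0.15),
    ("Class B", 0.62),
    ("Class C", 0.23),
};

// Prints Class B first, then Class C; Class A falls below the threshold and is dropped.
foreach (var (name, probability) in FilterByProbability(sample, 0.2))
    Console.WriteLine($"Class: {name}, \tProbability: {probability}");
```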

Perform document classification in .NET using IAB-2 taxonomy

The following C# code classifies a PDF document with the IAB-2 taxonomy and prints the four most probable classes:
// Document analysis and classification (PDF, Word, ODT, RTF, TXT) with IAB-2 taxonomy using C#
using System;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO;

Classifier classifier = new Classifier();
var filename = "document.pdf";
// Arguments: file name, path to the folder that contains it (empty here), number of best classes, taxonomy
var response = classifier.Classify(filename, "", 4, Taxonomy.Iab2);

foreach (var bestResult in response.BestResults)
    Console.WriteLine($"Class: {bestResult.Name}, \tProbability: {bestResult.Probability}");

Classification of text and data using document taxonomy in .NET

A document taxonomy organizes different types of documents into meaningful categories. It is used to classify digital content so that users can search for and retrieve the information they need quickly and accurately. With it, you can sort text and files into categories such as financial and legal documents, reports, emails, forms, and other digital media. A document taxonomy can also provide context for search engine optimization (SEO) efforts. Using GroupDocs.Classification for .NET API, you can classify popular file formats including PDF, DOC, DOCX, DOCM, DOT, DOTX, RTF, ODT, OTT, and TXT.
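When classifying many files, it can help to pick out only the formats listed above before handing them to the classifier. Here is a minimal sketch that checks a file's extension against that list, ignoring case. The IsSupported helper and the sample file names are hypothetical, and the classification call itself is left as a comment because running it requires the GroupDocs.Classification package.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Extensions taken from the supported-format list above, compared case-insensitively.
var supportedExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".pdf", ".doc", ".docx", ".docm", ".dot", ".dotx", ".rtf", ".odt", ".ott", ".txt"
};

// Hypothetical helper: true if the file's extension is in the supported list.
bool IsSupported(string path) => supportedExtensions.Contains(Path.GetExtension(path));

// Made-up file names; in practice these might come from Directory.GetFiles(folder).
var files = new[] { "contract.DOCX", "report.pdf", "photo.png", "notes.txt", "archive" };

foreach (var file in files.Where(IsSupported))
{
    // Here you would call classifier.Classify(...) for each supported file.
    Console.WriteLine($"Would classify: {file}");
}
```

Filtering up front avoids sending unsupported files such as images or extension-less files to the classifier.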


How to perform content classification using document taxonomy in .NET?

The following C# code classifies a text statement using the document taxonomy and prints the two most probable classes:
// Classify text with the Documents taxonomy using C#
using System;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO;

Classifier classifier = new Classifier();
string statement = "Sooner or later technology will overcome labor work";

// Ask for the 2 most probable document classes
var response = classifier.Classify(statement, 2, Taxonomy.Documents);
foreach (var bestResult in response.BestResults)
    Console.WriteLine($"Class: {bestResult.Name}, \tProbability: {bestResult.Probability}");

Perform document analysis using document taxonomy in the .NET platform

Use the following C# code to classify a PDF document with the document taxonomy:
// Document analysis and classification (PDF, Word, ODT, RTF, TXT) with the Documents taxonomy using C#
using System;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO;

Classifier classifier = new Classifier();
var filename = "document.pdf";
// Arguments: file name, path to the folder that contains it (empty here), number of best classes, taxonomy
var response = classifier.Classify(filename, "", 4, Taxonomy.Documents);

foreach (var bestResult in response.BestResults)
    Console.WriteLine($"Class: {bestResult.Name}, \tProbability: {bestResult.Probability}");
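Because the point of a document taxonomy is quick retrieval, a natural next step is to index files by their best class. The following sketch groups (file, class) pairs into a lookup from class to files. It assumes you have already taken the top class name for each file from BestResults; the BuildIndex helper, the file names, and the class names are made-up placeholders, not actual Documents-taxonomy output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up (file, best class) pairs standing in for per-file classification results.
var classified = new List<(string File, string BestClass)>
{
    ("invoice-01.pdf", "Finance"),
    ("nda.docx", "Legal"),
    ("invoice-02.pdf", "Finance"),
};

// Hypothetical helper: group files by their best class so they can be retrieved by category.
Dictionary<string, List<string>> BuildIndex(IEnumerable<(string File, string BestClass)> items) =>
    items.GroupBy(i => i.BestClass)
         .ToDictionary(g => g.Key, g => g.Select(i => i.File).ToList());

var index = BuildIndex(classified);
foreach (var (category, files) in index)
    Console.WriteLine($"{category}: {string.Join(", ", files)}");
```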
     

Using sentiment analysis taxonomy for classification of text in .NET

Sentiment analysis is a method of classifying digital content and other data to measure the sentiment of customers and other stakeholders. It is used to analyze and quantify the attitude of a person or group toward a particular topic or issue. By sorting sentiment into positive, negative, and neutral categories, you can identify trends and patterns in the sentiment of digital content, which helps you monitor customer feedback and behavior, evaluate customer satisfaction, and make informed decisions about product development and marketing strategies. You can use the sentiment analysis taxonomy when classifying documents with GroupDocs.Classification for .NET API to automate text analysis.


Analyze text with sentiment analysis in .NET

To evaluate the tone of a text with sentiment analysis, use this C# code:
// Analyze the positivity of text using the sentiment classifier in C#
using System;
using GroupDocs.Classification;

var sentiment = "Experience is simply the name we give our mistakes";
var sentimentClassifier = new SentimentClassifier();

// PositiveProbability returns the probability that the text is positive
var positiveProbability = sentimentClassifier.PositiveProbability(sentiment);
Console.WriteLine($"Positive probability of the sentiment: {positiveProbability}");
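The sentiment analysis taxonomy sorts text into positive, negative, and neutral groups, while PositiveProbability returns a single number. One simple way to turn that number into a label is to apply two cut-off values. This is a sketch under stated assumptions: the ToSentimentLabel helper and the 0.4 and 0.6 thresholds are hypothetical and not part of GroupDocs.Classification, and the probability is assumed to lie between 0 and 1.

```csharp
using System;

// Hypothetical thresholds: below 0.4 is negative, above 0.6 is positive, anything between is neutral.
const double NegativeBelow = 0.4;
const double PositiveAbove = 0.6;

// Map a positive probability (assumed to be in 0..1) to a three-way sentiment label.
string ToSentimentLabel(double positiveProbability) =>
    positiveProbability > PositiveAbove ? "positive"
    : positiveProbability < NegativeBelow ? "negative"
    : "neutral";

Console.WriteLine(ToSentimentLabel(0.85)); // prints "positive"
Console.WriteLine(ToSentimentLabel(0.50)); // prints "neutral"
Console.WriteLine(ToSentimentLabel(0.10)); // prints "negative"
```

Widening or narrowing the gap between the two thresholds controls how much text ends up in the neutral group.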
      

Classifying a group of texts with sentiment analysis

To classify several comments at once, use the following code snippet:
using System;
using System.Linq;
using GroupDocs.Classification;

// Insert the comments to classify in a string array
var sentiments = new string[] {
  "Now that is out of the way, this thing is a beast. It is fast and runs cool.",
  "Experience is simply the name we give our mistakes",
  "When I used compressed air a cloud of dust bellowed out from the card (small scuffs and scratches).",
  "This is Pathetic."
};
var classifier = new SentimentClassifier();
// Compute the positive probability of each comment
var sentimentPositivity = sentiments.Select(x => classifier.PositiveProbability(x)).ToArray();
// Print one probability per line ("\\n" would print a literal backslash and n)
Console.WriteLine(string.Join(Environment.NewLine, sentimentPositivity));
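To spot trends in customer feedback, the per-comment probabilities can be summarized. The sketch below computes the average positive probability and the share of comments above a cut-off. The Summarize helper, the 0.5 cut-off, and the sample values are hypothetical; the sample array stands in for the sentimentPositivity values produced by the snippet above.

```csharp
using System;
using System.Linq;

// Made-up positive probabilities standing in for the sentimentPositivity array.
var positivity = new[] { 0.9, 0.5, 0.2, 0.1 };

// Hypothetical helper: mean positive probability and fraction of comments above the cut-off.
// An empty batch returns (0, 0) instead of throwing.
(double Average, double PositiveShare) Summarize(double[] probabilities, double cutoff) =>
    probabilities.Length == 0
        ? (0.0, 0.0)
        : (probabilities.Average(), probabilities.Count(p => p > cutoff) / (double)probabilities.Length);

var (average, positiveShare) = Summarize(positivity, 0.5);
Console.WriteLine($"Average positive probability: {average:F2}");
Console.WriteLine($"Share of positive comments: {positiveShare:P0}");
```

Tracking these two numbers over successive batches of feedback is one simple way to see whether overall sentiment is improving or declining.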
   

Feel free to explore our fully working API code examples hosted on GitHub. If you want to classify text and PDF, DOCX, DOC, ODT, DOTX, DOCM, RTF, and TXT files from your mobile, tablet, or desktop PC, use our Free Online Document and Text Classification Apps.

Independently automate your document and image processing tasks

Why choose GroupDocs?

Unmatched file formats support

  • All popular file formats are supported, including documents, images, audio, videos, and ebooks.
  • PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, PUB, PNG, PSD, ODT, MSG, EML, MP3, MP4, and many more.

Extensively programmable libraries

  • Use GroupDocs APIs to build fully customizable .NET and Java apps.
  • Manipulate your business documents, spreadsheets, presentations, and images any way you like.

Hundreds of supported features

  • Convert Word or Excel to PDF, annotate PDFs, edit DOC, DOCX, or watermark files.
  • Work with e-signatures, tables, mail-merge, attachments, shapes, and much more.

Tailored to your needs

  • Free trials and different paid licensing options to choose from.
  • Well-suited to individual users, startups, as well as small and large enterprises.

APIs for Developers

  • Programmatically process your digital documents and images in .NET and Java platforms.
  • Document APIs designed specifically for .NET and Java application developers.

Trusted by users globally

  • Preferred by developers and businesses alike, our libraries are used globally.
  • Generate optimized documents easily in standalone and distributed environments.

Do more with your documents and images

  • Create, render, edit, convert, compare, digitally sign, watermark, and export your files.
  • Experience endless possibilities by creating multi-functional, high-performance apps.

Simple integration and convenient application

  • Enjoy greater flexibility by integrating with your existing software applications.
  • Get up and running using a few lines of code with our super-fast and reliable APIs.

Multiple support channels

  • Need help? Look no further than one of our developer-led support options.
  • Explore the API structure and documentation, or dive into the knowledge base.

Ready to get started?

Download Free Trial