Search and index documents and images in .NET and Java

Search PDF, DOCX, XLSX, PPTX, ONE, ODT, RTF, EPUB, TXT, VSD, MPP, HTML, XML, EML, MSG, MP3, AVI, ZIP, PNG, JPG, PSD, and many other files using full-text, fuzzy, boolean, and RegEx search. Index your secure documents in .NET and Java.

Efficiently search and index your documents and images

As the amount of data stored in digital documents continues to grow, it is essential to have an easier and faster way to index documents and then search for information from these documents. Document search and indexing is a key component of digital information management, enabling users to efficiently find the information they need. It allows users and businesses to search through large collections of files and quickly and accurately retrieve the required data to help make better decisions.

For users looking to index and search PDF, Word documents, Excel spreadsheets, PowerPoint presentations, OneNote and OpenDocument files, eBooks, audio and video files, drawings, projects, emails, ZIP archives, and image files, GroupDocs.Search APIs are an ideal choice. These .NET and Java APIs let you develop document search apps that support many query types, such as fuzzy, boolean, regular expression, synonym, wildcard, and full-text search. You can also create image finder applications and use the data indexing features of GroupDocs.Search for .NET and Java.

Getting Started

Please install the required version of GroupDocs.Search API (for .NET or Java) and all other prerequisites to have a smooth user experience.

GroupDocs.Search for .NET installation

Please download the MSI installer (or DLLs) from the downloads section. Or, you can install the API via NuGet.
PM> Install-Package GroupDocs.Search 

GroupDocs.Search for Java installation

You can download the JAR file from the downloads section or use the latest repository and dependency configurations for your Maven-based Java apps.
<repositories>
    <repository>
        <id>GroupDocsJavaAPI</id>
        <name>GroupDocs Java API</name>
        <url>http://repository.groupdocs.com/repo/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-search</artifactId>
        <version>21.3</version>
    </dependency>
</dependencies>
   

Document search and indexing use cases

After setting up the desired version of the GroupDocs.Search API, we can look at a few popular document search and indexing use cases. Note that GroupDocs.Search APIs (for .NET and Java) use a two-step approach: indexing first, then searching. You will see this same approach in all the code samples in the following sections.
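
To illustrate why indexing and searching are separate steps, here is a minimal conceptual sketch in plain Java. It is a toy in-memory inverted index, not the GroupDocs.Search implementation; the class name TwoStepSketch and the sample file names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only: a toy inverted index, not how GroupDocs.Search works internally.
class TwoStepSketch {
    // Maps each word to the names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Step 1 (indexing): split the text into lowercase words and record where each word occurs
    void add(String documentName, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(documentName);
            }
        }
    }

    // Step 2 (searching): a lookup in the prepared map, without re-reading the documents
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        TwoStepSketch index = new TwoStepSketch();
        index.add("a.txt", "Einstein published the theory of relativity");
        index.add("b.txt", "A theory of everything");
        System.out.println(index.search("theory"));     // [a.txt, b.txt]
        System.out.println(index.search("relativity")); // [a.txt]
    }
}
```

The cost of reading the documents is paid once during indexing, so each later query is only a map lookup.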

How to use boolean search to look up documents in .NET and Java

Boolean search is a powerful and versatile tool for searching for documents of different formats. It uses the AND, OR, and NOT logical operators to define the parameters of a search query and combine search terms. Boolean search enables you to create complex queries with multiple layers of criteria to find the exact information you are looking for. You can search PDF, Word documents, Excel spreadsheets, PowerPoint presentations, and many other data files with GroupDocs.Search APIs for .NET and Java using this search type.
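
Conceptually, the three operators correspond to set operations on the lists of documents that match each term. The sketch below shows this in plain Java with hard-coded, hypothetical result sets; it does not use or reflect the GroupDocs.Search internals.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only: boolean operators expressed as set operations on matching documents.
class BooleanSketch {
    // AND: documents that appear in both result sets (intersection)
    static Set<String> and(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.retainAll(right);
        return result;
    }

    // OR: documents that appear in either result set (union)
    static Set<String> or(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.addAll(right);
        return result;
    }

    // AND NOT: documents in the left set that are absent from the excluded set (difference)
    static Set<String> andNot(Set<String> left, Set<String> excluded) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical per-term results
        Set<String> theory = new TreeSet<>(Arrays.asList("a.pdf", "b.docx", "c.xlsx"));
        Set<String> relativity = new TreeSet<>(Arrays.asList("b.docx", "d.pptx"));

        System.out.println(and(theory, relativity));    // [b.docx]
        System.out.println(or(theory, relativity));     // [a.pdf, b.docx, c.xlsx, d.pptx]
        System.out.println(andNot(theory, relativity)); // [a.pdf, c.xlsx]
    }
}
```

Nesting these calls gives the layered queries described above, for example and(or(x, y), z).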

Using the AND operator in a boolean search in .NET

The following C# code snippet performs a boolean search with the AND operator:
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
 
// Creating an index in the specified folder
Index index = new Index(indexFolder);
 
// Indexing documents from the specified folder
index.Add(documentsFolder);
 
// Search with a text query
SearchResult result1 = index.Search("theory AND relativity");
 
// Search with object query
SearchQuery wordQuery1 = SearchQuery.CreateWordQuery("theory");
SearchQuery wordQuery2 = SearchQuery.CreateWordQuery("relativity");
SearchQuery andQuery = SearchQuery.CreateAndQuery(wordQuery1, wordQuery2);
SearchResult result2 = index.Search(andQuery); 
You can also use the OR and NOT boolean operators, or combine all three to form complex search queries. Please refer to the GroupDocs.Search for .NET documentation for more details.

Perform a boolean search with the OR operator in Java

Please use the following Java code snippet to execute a Boolean search with the OR operator:
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";
 
// Creating an index in the specified folder
Index index = new Index(indexFolder);
 
// Indexing documents from the specified folder
index.add(documentsFolder);
 
// Search with a text query
SearchResult result1 = index.search("Einstein OR relativity");
 
// Search with object query
SearchQuery wordQuery1 = SearchQuery.createWordQuery("Einstein");
SearchQuery wordQuery2 = SearchQuery.createWordQuery("relativity");
SearchQuery orQuery = SearchQuery.createOrQuery(wordQuery1, wordQuery2);
SearchResult result2 = index.search(orQuery);
Additionally, you can use the AND and NOT operators, or apply all operators together in a combined boolean search query. Please refer to the GroupDocs.Search for Java documentation for more help.

Search PDF, Word, Excel, PowerPoint, RTF, Visio, Project, and image files using fuzzy search in .NET and Java

Fuzzy search is a type of search algorithm that finds results similar to what you are looking for, even when the exact words or phrases do not match. It is becoming increasingly popular in information retrieval because users can find information without knowing the exact search words. Consider trying to look up information in text that may contain typos or other errors. Fuzzy search is supported in the GroupDocs.Search APIs for .NET and Java and lets you work around the limitations of exact-match searching across your PDF, DOCX, XLSX, PPTX, RTF, VSD, MPP, PNG, JPEG, and many other types of files.
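
To make the idea of "similar but not exact" concrete, here is a minimal sketch in plain Java. It counts mistakes as Levenshtein edits (insertions, deletions, substitutions), which is an assumption about what a "mistake" means, and it uses the thresholds stated in the comments of the samples in this section: 0 mistakes for words of 1 to 4 characters, 1 for 5 to 9, 2 for 10 to 14, and so on. Basing the threshold on the query word's length is also an assumption. The class FuzzySketch is hypothetical and is not part of GroupDocs.Search.

```java
import java.util.Locale;

// Conceptual sketch only: fuzzy matching by edit distance with a length-based mistake limit.
class FuzzySketch {
    // Thresholds from the sample comments: 1-4 chars -> 0, 5-9 -> 1, 10-14 -> 2, ...
    static int maxMistakes(int wordLength) {
        return wordLength / 5;
    }

    // Classic Levenshtein distance using two rolling rows
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Case-insensitive match allowing up to maxMistakes(query length) edits
    static boolean fuzzyMatches(String queryWord, String indexedWord) {
        String q = queryWord.toLowerCase(Locale.ROOT);
        String w = indexedWord.toLowerCase(Locale.ROOT);
        return editDistance(q, w) <= maxMistakes(q.length());
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("Einstein", "Einstain"));   // true: 1 edit, limit is 1
        System.out.println(fuzzyMatches("Einstein", "Einstien"));   // false: 2 edits, limit is 1
        System.out.println(fuzzyMatches("relativity", "relatvty")); // true: 2 edits, limit is 2
        System.out.println(fuzzyMatches("cat", "cut"));             // false: short words allow 0 edits
    }
}
```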

Apply fuzzy search to your documents and images in .NET

For using the fuzzy search method in .NET, please use the following code:
    string indexFolder = @"c:\MyIndex\";
    string documentsFolder = @"c:\MyDocuments\";
    string query = "Einstein";
     
    // Creating an index in the specified folder
    Index index = new Index(indexFolder);
     
    // Indexing documents from the specified folder
    index.Add(documentsFolder);
     
    SearchOptions options = new SearchOptions();
    options.FuzzySearch.Enabled = true; // Enabling the fuzzy search
    options.FuzzySearch.FuzzyAlgorithm = new SimilarityLevel(0.8); // Creating the fuzzy search algorithm
    // This function specifies 0 as the maximum number of mistakes for words from 1 to 4 characters.
    // It specifies 1 as the maximum number of mistakes for words from 5 to 9 characters.
    // It specifies 2 as the maximum number of mistakes for words from 10 to 14 characters. And so on.
     
    // Search in index
    SearchResult result = index.Search(query, options);    

Use fuzzy search for looking up your documents in Java

The following sample code shows how to use fuzzy search in Java:
    String indexFolder = "c:\\MyIndex\\";
    String documentsFolder = "c:\\MyDocuments\\";
    String query = "Einstein";
     
    // Creating an index in the specified folder
    Index index = new Index(indexFolder);
     
    // Indexing documents from the specified folder
    index.add(documentsFolder);
     
    SearchOptions options = new SearchOptions();
    options.getFuzzySearch().setEnabled(true); // Enabling the fuzzy search
    options.getFuzzySearch().setFuzzyAlgorithm(new SimilarityLevel(0.8)); // Creating the fuzzy search algorithm
    // This function specifies 0 as the maximum number of mistakes for words from 1 to 4 characters.
    // It specifies 1 as the maximum number of mistakes for words from 5 to 9 characters.
    // It specifies 2 as the maximum number of mistakes for words from 10 to 14 characters. And so on.
     
    // Search in index
    SearchResult result = index.search(query, options);
     

Learn to use a regular expression (RegEx) search for documents in .NET and Java

A regular expression (RegEx) search is a text-matching tool that lets users search for patterns in text rather than exact strings of characters. RegEx search is used for a variety of purposes, including data validation, data extraction, and data mining. While this type of query is very useful, it has limitations: for example, running it against large datasets can degrade performance. GroupDocs.Search APIs for .NET and Java let you search PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, ODT, EPUB, MSG, EML, TXT, RTF, and many other types of files using RegEx search queries.
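
To see what a pattern-based query matches, the sketch below applies the pattern from the object-form query in this section's samples, ^(.)\1{1,}, to individual words using the standard java.util.regex package. It only demonstrates the pattern itself; applying it word by word is an assumption for illustration, and the class RegexSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Conceptual sketch only: shows which words the sample pattern matches, using java.util.regex.
class RegexSketch {
    // Two or more identical characters at the beginning of a word
    static final Pattern DOUBLED_START = Pattern.compile("^(.)\\1{1,}");

    // Returns the whitespace-separated words whose beginning matches the pattern
    static List<String> matchingWords(String text) {
        List<String> result = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (DOUBLED_START.matcher(word).find()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "An aardvark and a llama saw an eel near the apple tree";
        System.out.println(matchingWords(text)); // [aardvark, llama, eel]
    }
}
```

Note that "apple" and "tree" are not matched: their doubled letters are not at the start of the word.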

Find information in .NET documents using regular expression search

The following C# code will help you look up information in your documents in .NET using RegEx search:
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
 
// Creating an index in the specified folder
Index index = new Index(indexFolder);
 
// Indexing documents from the specified folder
index.Add(documentsFolder);
 
// Search for the phrase in text form
string query1 = "^^(.)\\1{1,}"; // The first caret character at the beginning indicates that this is a regular expression search query
SearchResult result1 = index.Search(query1); // Search for two or more identical characters at the beginning of a word
 
// Search for the phrase in object form
SearchQuery query2 = SearchQuery.CreateRegexQuery("^(.)\\1{1,}"); // Search for two or more identical characters at the beginning of a word
SearchResult result2 = index.Search(query2);
      

Search documents in Java with the help of regular expression search query

To look up information in your documents in Java with RegEx search, please use this sample code:
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";
 
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.add(documentsFolder);
 
// Search for the phrase in text form
String query1 = "^^(.)\\1{1,}"; // The first caret character at the beginning indicates that this is a regular expression search query
SearchResult result1 = index.search(query1); // Search for two or more identical characters at the beginning of a word
 
// Search for the phrase in object form
SearchQuery query2 = SearchQuery.createRegexQuery("^(.)\\1{1,}"); // Search for two or more identical characters at the beginning of a word
SearchResult result2 = index.search(query2);
   

How to use a reverse image search query to build image finder apps in .NET and Java?

Reverse image search is a technique for looking up information using images. It uses an image as the search query, instead of text, to find related images. Along with the photo search functionality, it also helps in finding other image-related content. Reverse image search has become more popular over the years and is an effective tool for businesses, marketers, and researchers who need to perform photo searches, locate information about an image or image-based content, or simply search by image. If you are looking to develop full-featured image finder apps in .NET or Java, you can rely on GroupDocs.Search APIs and incorporate image search functionality into your applications for PNG, JPG, BMP, WEBP, TIFF, and GIF image files.
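
One common way to compare images is to reduce each image to a compact hash and count how many bits differ between two hashes. The samples in this section set a HashDifferences option, which suggests a threshold of this kind, but the article does not describe the hashing scheme, so the sketch below is an assumption for illustration only. The 64-bit hash values and the class HashDifferenceSketch are hypothetical.

```java
// Conceptual sketch only: comparing hypothetical 64-bit image hashes by counting differing bits.
class HashDifferenceSketch {
    // Number of bit positions in which the two hashes differ (Hamming distance)
    static int hashDifferences(long hashA, long hashB) {
        return Long.bitCount(hashA ^ hashB);
    }

    // Two images count as similar when their hashes differ in at most maxDifferences bits
    static boolean isSimilar(long hashA, long hashB, int maxDifferences) {
        return hashDifferences(hashA, hashB) <= maxDifferences;
    }

    public static void main(String[] args) {
        long reference = 0b1011_0110L; // hypothetical hash of the query image
        long candidate = 0b1001_0111L; // hypothetical hash of an indexed image

        System.out.println(hashDifferences(reference, candidate)); // 2
        System.out.println(isSimilar(reference, candidate, 10));   // true
        System.out.println(isSimilar(reference, ~reference, 10));  // false: all 64 bits differ
    }
}
```

Under this reading, a lower threshold returns only near-identical images, while a higher one tolerates more visual variation.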

How to search with images in the .NET platform?

The code snippet below shows how to search by image in .NET:
string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";

// Creating an index
Index index = new Index(indexFolder);
// Setting the image indexing options
IndexingOptions indexingOptions = new IndexingOptions();
indexingOptions.ImageIndexingOptions.EnabledForContainerItemImages = true;
indexingOptions.ImageIndexingOptions.EnabledForEmbeddedImages = true;
indexingOptions.ImageIndexingOptions.EnabledForSeparateImages = true;

// Indexing documents in a document folder
index.Add(documentFolder, indexingOptions);

// Setting the image search options
ImageSearchOptions imageSearchOptions = new ImageSearchOptions();
imageSearchOptions.HashDifferences = 10;
imageSearchOptions.MaxResultCount = 100;
imageSearchOptions.SearchDocumentFilter = SearchDocumentFilter.CreateFileExtension(".zip", ".png", ".jpg");

// Creating a reference image for search
SearchImage searchImage = SearchImage.Create(@"c:\MyDocuments\image.png");

// Searching in the index
ImageSearchResult result = index.Search(searchImage, imageSearchOptions);

Console.WriteLine("Images found: " + result.ImageCount);
for (int i = 0; i < result.ImageCount; i++)
{
    FoundImageFrame image = result.GetFoundImage(i);
    Console.WriteLine(image.DocumentInfo.ToString());
}
  

Building image finder applications in the Java platform

Please use the following sample code if you wish to search by image in Java:
        String indexFolder = "c:\\MyIndex";
        String documentsFolder = "c:\\MyDocuments";
        
        // Creating an index
        Index index = new Index(indexFolder);
        
        // Setting the image indexing options
        IndexingOptions indexingOptions = new IndexingOptions();
        indexingOptions.getImageIndexingOptions().setEnabledForContainerItemImages(true);
        indexingOptions.getImageIndexingOptions().setEnabledForEmbeddedImages(true);
        indexingOptions.getImageIndexingOptions().setEnabledForSeparateImages(true);
        
        // Indexing documents in a document folder
        index.add(documentsFolder, indexingOptions);
        
        // Setting the image search options
        ImageSearchOptions imageSearchOptions = new ImageSearchOptions();
        imageSearchOptions.setHashDifferences(10);
        imageSearchOptions.setMaxResultCount(10000);
        imageSearchOptions.setSearchDocumentFilter(SearchDocumentFilter.createFileExtension(".zip", ".png", ".jpg"));
        
        // Creating a reference image for search
        SearchImage searchImage = SearchImage.create("c:\\MyDocuments\\image.png");
        
        // Searching in the index
        ImageSearchResult result = index.search(searchImage, imageSearchOptions);
        
        // println keeps each result on its own line, matching Console.WriteLine in the .NET sample
        System.out.println("Images found: " + result.getImageCount());
        for (int i = 0; i < result.getImageCount(); i++) {
            FoundImageFrame image = result.getFoundImage(i);
            System.out.println(image.getDocumentInfo().toString());
        }

In addition to the search options above, GroupDocs.Search APIs provide many other document search features, including date range search, full-text search, faceted search, case-sensitive search, numeric search, synonym search, and wildcard search. Please visit the .NET and Java sections of the GroupDocs.Search API documentation for more help and information on the available search options.

Learn to index your protected documents in .NET and Java

Indexing in simplest terms is the process of organizing information in documents in such a way that it can be easily and quickly retrieved. It is an integral part of the process of searching for information in documents. Indexing improves the searchability of files and it would be impossible to efficiently locate the required information from within a document without it. GroupDocs.Search APIs for .NET and Java provide users with several indexing options including the indexing of password-protected files.

Index your secure documents easily in .NET

The sample C# code shown below allows you to index your protected PDF documents:
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
 
// Creating an index
Index index = new Index(indexFolder);
 
// Adding document passwords to the dictionary
string key = Path.GetFullPath(@"C:\MyDocuments\ProtectedDocument.pdf");
index.Dictionaries.DocumentPasswords.Add(key, "123456");
// ...
 
// Indexing documents from the specified folder
// Passwords will be automatically retrieved from the dictionary when necessary
index.Add(documentsFolder);
To view more .NET indexing options, such as updating and merging indexes, please visit the indexing section of the GroupDocs.Search for .NET documentation.

Password-protected documents indexing in Java

You can use the following code snippet to index your secure PDF files in Java:
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";
 
// Creating an index
Index index = new Index(indexFolder);
 
// Adding document passwords to the dictionary
String path = new File("C:\\MyDocuments\\ProtectedDocument.pdf").getAbsolutePath();
index.getDictionaries().getDocumentPasswords().add(path, "123456");
// ...
 
// Indexing documents from the specified folder
// Passwords will be automatically retrieved from the dictionary when necessary
index.add(documentsFolder);
 
Several other indexing options are available in Java, including index merging and index updating. Please refer to the GroupDocs.Search for Java documentation to learn more.

We provide GitHub hosted examples for users to easily review API features. If you are also interested, please check GroupDocs.Search API code examples that are available for both .NET and Java platforms.

Are you looking to perform document search and indexing on the fly using your mobile or tablet? If yes, please feel free to try out our Online Free Apps which let you easily search PDF, Word, Excel, PowerPoint, OneNote, OpenDocument, Visio, Project, eBooks, Emails, Web, Audio, Video, and image files.

Independently automate your document and image processing tasks

Why choose GroupDocs?

Unmatched file formats support

  • All popular file formats supported including documents, images, audio, videos, and ebooks.
  • PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, PUB, PNG, PSD, ODT, MSG, EML, MP3, MP4, and many more.

Extensively programmable libraries

  • Use GroupDocs APIs to build fully customizable .NET and Java apps.
  • Manipulate your business documents, spreadsheets, presentations, and images any way you like.

Hundreds of supported features

  • Convert Word or Excel to PDF, annotate PDFs, edit DOC and DOCX files, or watermark documents.
  • Work with e-signatures, tables, mail-merge, attachments, shapes, and much more.

Tailored to your needs

  • Free trials and different paid licensing options to choose from.
  • Well-suited to individual users, startups, as well as small and large enterprises.

APIs for Developers

  • Programmatically process your digital documents and images in .NET and Java platforms.
  • Document APIs designed specifically for .NET and Java application developers.

Trusted by users globally

  • Preferred by developers and businesses alike, our libraries are used globally.
  • Generate optimised documents easily in standalone and distributed environments.

Do more with your documents and images

  • Create, render, edit, convert, compare, digitally sign, watermark, and export your files.
  • Experience endless possibilities by creating multi-functional, high-performance apps.

Simple integration and convenient application

  • Enjoy greater flexibility by integrating with your existing software applications.
  • Get up and running using a few lines of code with our super-fast and reliable APIs.

Multiple support channels

  • Need help? Look no further than one of our developer-led support options.
  • Explore the API structure and documentation, or dive into the knowledge base.
