As the amount of data stored in digital documents continues to grow, users need faster and easier ways to index documents and search the information they contain. Document search and indexing is a key component of digital information management. It lets users and businesses search large collections of files and retrieve the data they need quickly and accurately, which helps them make better decisions.
If you need to index and search PDF files, Word documents, Excel spreadsheets, PowerPoint presentations, OneNote and OpenDocument files, eBooks, audio and video files, drawings, projects, emails, ZIP archives, and images, GroupDocs.Search APIs are an ideal choice. These .NET and Java APIs let you build document search apps that support many types of search queries, such as fuzzy, Boolean, regular expression, synonym, wildcard, and full-text search. With GroupDocs.Search for .NET and Java, you can also build image finder applications and use the data indexing features.
Before you start, install the required version of the GroupDocs.Search API (for .NET or Java) and all other prerequisites.
Once GroupDocs.Search is set up, let's look at a few popular document search and indexing use cases. Note that the GroupDocs.Search APIs (for .NET and Java) follow a two-step approach: first indexing, then searching. You will see this approach in all the code samples in the following sections.
Boolean search is a powerful and versatile way to search documents of different formats. It uses the logical operators AND, OR, and NOT to combine search terms and define the parameters of a query, so you can build complex queries with multiple layers of criteria to find exactly the information you are looking for. With GroupDocs.Search APIs for .NET and Java, you can use this search type on PDF files, Word documents, Excel spreadsheets, PowerPoint presentations, and many other file types. The C# example below uses the AND operator and the Java example uses OR; each runs the same query twice, once as a text query and once as a query object.
```csharp
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.Add(documentsFolder);
// Search with a text query
SearchResult result1 = index.Search("theory AND relativity");
// Search with an object query
SearchQuery wordQuery1 = SearchQuery.CreateWordQuery("theory");
SearchQuery wordQuery2 = SearchQuery.CreateWordQuery("relativity");
SearchQuery andQuery = SearchQuery.CreateAndQuery(wordQuery1, wordQuery2);
SearchResult result2 = index.Search(andQuery);
```
```java
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.add(documentsFolder);
// Search with a text query
SearchResult result1 = index.search("Einstein OR relativity");
// Search with an object query
SearchQuery wordQuery1 = SearchQuery.createWordQuery("Einstein");
SearchQuery wordQuery2 = SearchQuery.createWordQuery("relativity");
SearchQuery orQuery = SearchQuery.createOrQuery(wordQuery1, wordQuery2);
SearchResult result2 = index.search(orQuery);
```
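To see what the AND, OR, and NOT operators do with the documents in an index, consider the minimal Java sketch below. It is not the GroupDocs.Search implementation: the class `BooleanSearchSketch` and its methods are hypothetical, and we assume a simple inverted index that maps each lowercase word to the set of IDs of the documents containing it. Under that assumption, AND becomes set intersection, OR becomes set union, and NOT becomes the complement against all indexed documents.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index: each word maps to the sorted set of document IDs containing it.
// This only illustrates Boolean logic; it is not how GroupDocs.Search stores its index.
class BooleanSearchSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> allDocs = new TreeSet<>();

    // Indexing step: split the text into lowercase words and record the document ID for each word.
    void add(int docId, String text) {
        allDocs.add(docId);
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Documents containing a single word (empty set if the word was never indexed).
    Set<Integer> word(String w) {
        return new TreeSet<>(postings.getOrDefault(w.toLowerCase(Locale.ROOT), Set.of()));
    }

    // AND: documents present in both result sets (intersection).
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR: documents present in either result set (union).
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // NOT: every indexed document except those in the operand (complement).
    Set<Integer> not(Set<Integer> a) {
        Set<Integer> result = new TreeSet<>(allDocs);
        result.removeAll(a);
        return result;
    }
}
```

With three sample documents ("Einstein published the theory of relativity.", "Relativity changed physics.", and "Einstein played the violin." as IDs 1 to 3), `theory AND relativity` yields {1}, `Einstein OR relativity` yields {1, 2, 3}, and `NOT Einstein` yields {2}.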
Fuzzy search is a type of search that finds results similar to what you are looking for, even when they do not contain the exact words or phrases you searched for. It is becoming increasingly popular in information retrieval because it lets users find information without knowing the exact search terms, which is especially useful when the text being searched may contain typos or other errors. GroupDocs.Search for .NET and Java supports fuzzy search, helping you work around the limitations of exact-match searching in PDF, DOCX, XLSX, PPTX, RTF, VSD, MPP, PNG, JPEG, and many other file types.
```csharp
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
string query = "Einstein";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.Add(documentsFolder);
SearchOptions options = new SearchOptions();
options.FuzzySearch.Enabled = true; // Enabling the fuzzy search
options.FuzzySearch.FuzzyAlgorithm = new SimilarityLevel(0.8); // Creating the fuzzy search algorithm
// With a similarity level of 0.8, no mistakes are allowed in words of 1 to 4 characters,
// 1 mistake is allowed in words of 5 to 9 characters,
// 2 mistakes are allowed in words of 10 to 14 characters, and so on.
// Search in index
SearchResult result = index.Search(query, options);
```
```java
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";
String query = "Einstein";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.add(documentsFolder);
SearchOptions options = new SearchOptions();
options.getFuzzySearch().setEnabled(true); // Enabling the fuzzy search
options.getFuzzySearch().setFuzzyAlgorithm(new SimilarityLevel(0.8)); // Creating the fuzzy search algorithm
// With a similarity level of 0.8, no mistakes are allowed in words of 1 to 4 characters,
// 1 mistake is allowed in words of 5 to 9 characters,
// 2 mistakes are allowed in words of 10 to 14 characters, and so on.
// Search in index
SearchResult result = index.search(query, options);
```
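The comments in the fuzzy search examples describe how a similarity level of 0.8 translates into a number of allowed mistakes per word length. The Java sketch below reproduces that table under an assumption of our own, namely that the allowed number of mistakes is floor(length × (1 − level)), and it counts mistakes with plain Levenshtein edit distance. GroupDocs.Search may count mistakes differently; the class `FuzzySearchSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of similarity-level fuzzy matching; not the GroupDocs.Search implementation.
class FuzzySearchSketch {
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b (two rolling rows).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed rule: allowed mistakes = floor(length * (1 - similarityLevel)).
    // The tiny epsilon compensates for binary rounding (1 - 0.8 is slightly less than 0.2 as a double).
    static int maxMistakes(int wordLength, double similarityLevel) {
        return (int) Math.floor(wordLength * (1.0 - similarityLevel) + 1e-9);
    }

    // Returns the indexed words whose distance to the query is within the allowed number of mistakes.
    static List<String> search(String query, List<String> indexedWords, double similarityLevel) {
        String q = query.toLowerCase(Locale.ROOT);
        int limit = maxMistakes(q.length(), similarityLevel);
        List<String> hits = new ArrayList<>();
        for (String word : indexedWords) {
            if (levenshtein(q, word.toLowerCase(Locale.ROOT)) <= limit) {
                hits.add(word);
            }
        }
        return hits;
    }
}
```

For example, `search("Einstein", List.of("Einstein", "Einsteim", "Einstien", "relativity"), 0.8)` returns `[Einstein, Einsteim]`: an 8-character query allows one mistake, and the swapped letters in "Einstien" count as two substitutions under plain Levenshtein distance.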
A regular expression (RegEx) search matches patterns in text rather than exact strings of characters. RegEx search is used for many purposes, including data validation, data extraction, and data mining. Although this type of query is very useful, it has limitations: for example, running it over large datasets can noticeably slow down searches. GroupDocs.Search APIs for .NET and Java let you search PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, ODT, EPUB, MSG, EML, TXT, RTF, and many other file types using RegEx queries.
```csharp
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.Add(documentsFolder);
// Search with a regular expression in text form
string query1 = "^^(.)\\1{1,}"; // The first caret character at the beginning indicates that this is a regular expression search query
SearchResult result1 = index.Search(query1); // Search for two or more identical characters at the beginning of a word
// Search with a regular expression in object form
SearchQuery query2 = SearchQuery.CreateRegexQuery("^(.)\\1{1,}"); // Search for two or more identical characters at the beginning of a word
SearchResult result2 = index.Search(query2);
```
```java
String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Indexing documents from the specified folder
index.add(documentsFolder);
// Search with a regular expression in text form
String query1 = "^^(.)\\1{1,}"; // The first caret character at the beginning indicates that this is a regular expression search query
SearchResult result1 = index.search(query1); // Search for two or more identical characters at the beginning of a word
// Search with a regular expression in object form
SearchQuery query2 = SearchQuery.createRegexQuery("^(.)\\1{1,}"); // Search for two or more identical characters at the beginning of a word
SearchResult result2 = index.search(query2);
```
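The pattern `^(.)\1{1,}` in the examples captures the first character of a word and then requires at least one more copy of it. The Java sketch below shows that pattern at work using the standard `java.util.regex` package. We assume, for illustration only, that a RegEx query is tested against each indexed word, which matches the "beginning of a word" wording in the code comments and also explains the performance note above: every word has to be checked. The class `RegexSearchSketch` is hypothetical, and the extra leading `^` that marks a text query as RegEx in GroupDocs.Search is not needed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: test a regular expression against every indexed word.
class RegexSearchSketch {
    static List<String> search(String regex, List<String> indexedWords) {
        Pattern pattern = Pattern.compile(regex); // compile once, reuse for every word
        List<String> hits = new ArrayList<>();
        for (String word : indexedWords) {
            // Every word has to be tested, so the cost grows with the size of the index.
            if (pattern.matcher(word).find()) {
                hits.add(word);
            }
        }
        return hits;
    }
}
```

For example, `search("^(.)\\1{1,}", List.of("aardvark", "book", "llama", "apple", "eel"))` returns `[aardvark, llama, eel]`, the words that start with two identical characters.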
Reverse image search is a technique for looking up information in images. Instead of text, it uses an image as the search query to find related images, and it also helps you find other image-related content. Reverse image search has grown in popularity over the years and is an effective tool for businesses, marketers, and researchers who need to find photos, locate information about an image or image-based content, or simply search by image. If you are looking to develop full-featured image finder apps in .NET or Java, you can rely on GroupDocs.Search APIs to add image search functionality for PNG, JPG, BMP, WEBP, TIFF, and GIF files to your applications.
```csharp
string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";
// Creating an index
Index index = new Index(indexFolder);
// Setting the image indexing options
IndexingOptions indexingOptions = new IndexingOptions();
indexingOptions.ImageIndexingOptions.EnabledForContainerItemImages = true;
indexingOptions.ImageIndexingOptions.EnabledForEmbeddedImages = true;
indexingOptions.ImageIndexingOptions.EnabledForSeparateImages = true;
// Indexing documents in a document folder
index.Add(documentFolder, indexingOptions);
// Setting the image search options
ImageSearchOptions imageSearchOptions = new ImageSearchOptions();
imageSearchOptions.HashDifferences = 10;
imageSearchOptions.MaxResultCount = 100;
imageSearchOptions.SearchDocumentFilter = SearchDocumentFilter.CreateFileExtension(".zip", ".png", ".jpg");
// Creating a reference image for search
SearchImage searchImage = SearchImage.Create(@"c:\MyDocuments\image.png");
// Searching in the index
ImageSearchResult result = index.Search(searchImage, imageSearchOptions);
Console.WriteLine("Images found: " + result.ImageCount);
for (int i = 0; i < result.ImageCount; i++)
{
    FoundImageFrame image = result.GetFoundImage(i);
    Console.WriteLine(image.DocumentInfo.ToString());
}
```
```java
String indexFolder = "c:\\MyIndex";
String documentsFolder = "c:\\MyDocuments";
// Creating an index
Index index = new Index(indexFolder);
// Setting the image indexing options
IndexingOptions indexingOptions = new IndexingOptions();
indexingOptions.getImageIndexingOptions().setEnabledForContainerItemImages(true);
indexingOptions.getImageIndexingOptions().setEnabledForEmbeddedImages(true);
indexingOptions.getImageIndexingOptions().setEnabledForSeparateImages(true);
// Indexing documents in a document folder
index.add(documentsFolder, indexingOptions);
// Setting the image search options
ImageSearchOptions imageSearchOptions = new ImageSearchOptions();
imageSearchOptions.setHashDifferences(10);
imageSearchOptions.setMaxResultCount(10000);
imageSearchOptions.setSearchDocumentFilter(SearchDocumentFilter.createFileExtension(".zip", ".png", ".jpg"));
// Creating a reference image for search
SearchImage searchImage = SearchImage.create("c:\\MyDocuments\\image.png");
// Searching in the index
ImageSearchResult result = index.search(searchImage, imageSearchOptions);
// println puts each result on its own line
System.out.println("Images found: " + result.getImageCount());
for (int i = 0; i < result.getImageCount(); i++) {
    FoundImageFrame image = result.getFoundImage(i);
    System.out.println(image.getDocumentInfo().toString());
}
```

In addition to the search options covered above, GroupDocs.Search APIs provide many other document search features, including date range search, full-text search, faceted search, case-sensitive search, numeric search, synonym search, and wildcard search. For more information on the available search options, see the .NET and Java sections of the GroupDocs.Search API documentation.
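Both reverse image search examples earlier set a hash-difference threshold of 10. This article does not describe how GroupDocs.Search hashes images, but the option name suggests a perceptual-hash comparison in which two images match when their hashes differ in at most that many bits. The Java sketch below illustrates that general idea with a simple "average hash" over an 8×8 grayscale thumbnail; the choice of average hashing is our assumption, and the class `ImageHashSketch` is hypothetical.

```java
// Hypothetical sketch of perceptual-hash matching; GroupDocs.Search's own hashing may differ.
class ImageHashSketch {
    // "Average hash": one bit per pixel of an 8x8 grayscale thumbnail,
    // set when that pixel is brighter than the mean brightness.
    static long averageHash(int[] gray8x8) {
        if (gray8x8.length != 64) {
            throw new IllegalArgumentException("expected 64 grayscale pixels");
        }
        long sum = 0;
        for (int p : gray8x8) {
            sum += p;
        }
        double mean = sum / 64.0;
        long hash = 0L;
        for (int i = 0; i < 64; i++) {
            if (gray8x8[i] > mean) {
                hash |= 1L << i;
            }
        }
        return hash;
    }

    // Hash difference = number of bits that differ (Hamming distance).
    static int hashDifference(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Two images count as similar when their hashes differ in at most maxDifferences bits.
    static boolean isSimilar(long a, long b, int maxDifferences) {
        return hashDifference(a, b) <= maxDifferences;
    }
}
```

For example, a left-to-right gradient (pixel i has brightness 4 × i) and a copy with the first pixel set to 255 differ in 2 bits, well within a threshold of 10, while a brightness-inverted copy of the gradient differs in all 64 bits.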
In simple terms, indexing is the process of organizing the information in documents so that it can be retrieved quickly and easily. It is an integral part of searching for information in documents: without an index, every search would have to scan the full content of every file, which quickly becomes impractical for large document collections. GroupDocs.Search APIs for .NET and Java provide several indexing options, including indexing of password-protected files. The following examples add a document's password to the index's password dictionary before indexing the folder that contains it.
```csharp
using System.IO; // for Path.GetFullPath

string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";
// Creating an index
Index index = new Index(indexFolder);
// Adding document passwords to the dictionary
string key = Path.GetFullPath(@"C:\MyDocuments\ProtectedDocument.pdf");
index.Dictionaries.DocumentPasswords.Add(key, "123456");
// ...
// Indexing documents from the specified folder
// Passwords will be automatically retrieved from the dictionary when necessary
index.Add(documentsFolder);
```
```java
import java.io.File; // for File.getAbsolutePath()

String indexFolder = "c:\\MyIndex\\";
String documentsFolder = "c:\\MyDocuments\\";
// Creating an index
Index index = new Index(indexFolder);
// Adding document passwords to the dictionary
String path = new File("C:\\MyDocuments\\ProtectedDocument.pdf").getAbsolutePath();
index.getDictionaries().getDocumentPasswords().add(path, "123456");
// ...
// Indexing documents from the specified folder
// Passwords will be automatically retrieved from the dictionary when necessary
index.add(documentsFolder);
```
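Both password examples build the dictionary key from the document's full path. A plausible reason, which we assume here rather than confirm from the GroupDocs.Search documentation, is that the indexer looks up passwords by absolute path, so the stored key must match however the path was originally written. The Java sketch below models that idea with a hypothetical `PasswordDictionarySketch` class. It also normalizes `..` segments with `java.nio.file.Path.normalize()`, which `java.io.File.getAbsolutePath()` on its own does not do.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a password dictionary keyed by normalized absolute file paths.
class PasswordDictionarySketch {
    private final Map<Path, String> passwords = new HashMap<>();

    // Normalize the key so different spellings of the same file
    // ("docs/../docs/a.pdf" and "docs/a.pdf") map to the same entry.
    private static Path key(String path) {
        return Paths.get(path).toAbsolutePath().normalize();
    }

    void add(String path, String password) {
        passwords.put(key(path), password);
    }

    // What an indexer might call when it meets an encrypted document.
    Optional<String> lookup(String path) {
        return Optional.ofNullable(passwords.get(key(path)));
    }
}
```

For example, after `add("docs/../docs/ProtectedDocument.pdf", "123456")`, a lookup of `"docs/ProtectedDocument.pdf"` finds the password, while any other file returns an empty `Optional`.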
We provide GitHub-hosted examples so you can easily explore the API features. If you are interested, check out the GroupDocs.Search API code examples, which are available for both .NET and Java.
Want to search and index documents on the fly from your mobile phone or tablet? Try our free online apps, which let you easily search PDF, Word, Excel, PowerPoint, OneNote, OpenDocument, Visio, Project, eBook, email, web, audio, video, and image files.