Extract Text & Images with Powerful .NET Document Parsing API

GroupDocs.Parser for .NET Installation

To install the library, download the MSI installer or the DLLs from the downloads section. Alternatively, install the NuGet package. Run the following command in the Visual Studio Package Manager Console:

PM > Install-Package GroupDocs.Parser

Effortlessly Extract Images and Metadata from PDF and Word Documents in .NET

GroupDocs.Parser for .NET extracts text, metadata, and images from the supported document formats. Automating document parsing replaces manual data retrieval and makes document-driven processes more efficient. In this section, we will learn to programmatically retrieve images from PDFs and metadata from Microsoft Word documents.

Extract Images from PDF Documents in .NET

PDF files can contain text, images, forms, tables, and other types of information, and they render consistently across operating systems and devices. The following C# example shows how to extract images from a PDF using the .NET document parsing API.

Load the input PDF document using the Parser class.
Use the GetImages method to extract the document’s images.
Fetch and save each image in the collection with the Save method.


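// Learn to extract images from PDFs using C#
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

using (Parser parser = new Parser("path/document.pdf"))
{
    // Extract images from the document
    IEnumerable<PageImageArea> images = parser.GetImages();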
    // Check if image extraction is supported
    if (images == null) 
    {
        Console.WriteLine("Images extraction isn't supported");
        return;
    }
    
    ImageOptions options = new ImageOptions(ImageFormat.Jpeg);
    int imageNumber = 0;
    
    // Iterate over retrieved images
    foreach (PageImageArea image in images)
    {
        // Save Images
        image.Save("imageFilePath/image-" + imageNumber.ToString() + ".jpeg", options);
        imageNumber++;
    }
}
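
Before saving anything, it can be useful to list what the parser found. The following is a minimal sketch rather than a definitive implementation. It assumes that each PageImageArea exposes Page.Index and FileType, as described in the GroupDocs.Parser documentation. The ImageReport class, its Describe helper, and the pdfPath parameter are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

static class ImageReport
{
    // Hypothetical helper: builds a one-line description of an image.
    public static string Describe(int pageIndex, string fileType)
    {
        return "Page index " + pageIndex + ": " + fileType + " image";
    }

    // Prints one line per image found in the document at pdfPath.
    public static void Print(string pdfPath)
    {
        using (Parser parser = new Parser(pdfPath))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();

            // GetImages returns null when the format doesn't support image extraction
            if (images == null)
            {
                Console.WriteLine("Images extraction isn't supported");
                return;
            }

            foreach (PageImageArea image in images)
            {
                // Page.Index and FileType are assumed to be available on each image area
                Console.WriteLine(Describe(image.Page.Index, image.FileType.ToString()));
            }
        }
    }
}
```

Calling ImageReport.Print("path/document.pdf") lists the images without writing any files, which makes it a cheap check before running the save loop above.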

Extract Metadata from Word Files in C#

Microsoft Word is a popular word processor whose DOC and DOCX files are easy to store, share, print, and export. The following code example demonstrates how to extract metadata from Word documents in .NET data extraction applications.

Load the input DOCX file using the Parser class instance.
Obtain the metadata collection using the GetMetadata method.
Get the name/value of metadata by iterating over the collection.


// Extract metadata from Word documents in .NET
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Create an instance of the Parser class
using (Parser parser = new Parser("sample.docx"))
{
    // Extract metadata from the document
    IEnumerable<MetadataItem> metadata = parser.GetMetadata();
    // Check if metadata extraction is supported
    if (metadata == null)
    {
        Console.WriteLine("Metadata extraction isn't supported");
        // Stop here so the loop below never iterates over null
        return;
    }

    // Iterate over metadata items
    foreach (MetadataItem item in metadata)
    {
        // Print an item name and value
        Console.WriteLine(string.Format("{0}: {1}", item.Name, item.Value));
    }
}
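
Text extraction, the third feature mentioned at the start of this section, follows the same load-extract-check pattern. The sketch below assumes that Parser.GetText returns a TextReader, or null when the format doesn't support text extraction, as described in the GroupDocs.Parser documentation. The TextExtraction class and the ReadAllText method are hypothetical names used only for this example.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

static class TextExtraction
{
    // Returns the document's full text, or null if text extraction isn't supported.
    public static string ReadAllText(string path)
    {
        using (Parser parser = new Parser(path))
        using (TextReader reader = parser.GetText())
        {
            // A using statement tolerates a null resource, so only the read needs the guard
            return reader == null ? null : reader.ReadToEnd();
        }
    }
}
```

A caller might write string text = TextExtraction.ReadAllText("sample.docx"); and treat a null result as "not supported for this format".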

Please find more coding examples on the GroupDocs.Parser for .NET GitHub examples page. If you are looking to parse documents and extract text, images, or attachments from Word, PDF, Excel, PowerPoint, emails & many other files on the fly, please check out our Free Online Document Parsing and Data Extraction Apps.

Unmatched Cross-platform Parsing and Text Extraction with GroupDocs

GroupDocs.Parser for .NET requires minimal code to start parsing multi-format documents. The API runs across a diverse set of popular operating systems and .NET frameworks, and it does not rely on third-party software installations. With the API, developers can build platform-independent applications that cater to distinct organizational requirements and improve the performance of document management solutions.

FAQ

1. What is the GroupDocs.Parser for .NET API?

GroupDocs.Parser for .NET is a powerful API for extracting text, metadata, images, and other information from many well-known document formats. Developers can integrate the advanced API functionality into apps to extract data from files for analysis, storage, and further processing.

2. What document formats does the API support?

The API supports PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, ODT, OTT, ONE, MSG, OST, PST, EML, XML, HTML, MHTML, EPUB, ZIP, and numerous other data file formats to extract data and perform document parsing operations on the .NET platform.
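
Not every extraction feature applies to every format, so it can help to check what a loaded document supports before extracting. The sketch below assumes that the Parser instance exposes a Features property with boolean Text, Images, and Metadata flags, as described in the GroupDocs.Parser documentation. Verify these property names against the API reference for your version. The FeatureCheck class and its Print method are hypothetical names.

```csharp
using System;
using GroupDocs.Parser;

static class FeatureCheck
{
    // Prints which extraction features the document at the given path supports.
    public static void Print(string path)
    {
        using (Parser parser = new Parser(path))
        {
            // Features is assumed to expose one boolean flag per capability
            Console.WriteLine("Text: " + parser.Features.Text);
            Console.WriteLine("Images: " + parser.Features.Images);
            Console.WriteLine("Metadata: " + parser.Features.Metadata);
        }
    }
}
```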

3. Are there code examples to help learn about parsing and extracting data using the .NET API?

Yes, the GitHub examples page of the .NET API includes many coding examples for developers to get up to speed quickly. These code examples help users greatly during the initial learning curve, and they can start extracting data by parsing documents with ease.

4. Is the API secure for sensitive documents and data?

Yes, we fully understand the sensitivity surrounding the security of your data. Therefore, we ensure the safety and integrity of your data while using GroupDocs.Parser for .NET API and take all necessary measures to provide you with a secure user experience.

5. Is GroupDocs.Parser for .NET compatible with different .NET frameworks?

Absolutely! The .NET document parsing API is compatible with multiple .NET frameworks and runs smoothly on different operating systems. It doesn’t require additional third-party software installations, and you can feel confident when using the API to build high-performance document parsing applications.

Parsing and Data Extraction Tools

Extract Images from PDF
Extract Images from DOCX
Extract Images from MSG
Extract Images from RTF
Extract Hyperlinks from TXT
Extract Hyperlinks from ODT
Extract Hyperlinks from XLS
Extract Hyperlinks from XML
Extract Tables from MD
Extract Tables from EML
Extract Tables from PDF
Extract Tables from DOC

.NET Document Parsing API to Extract Text & Metadata

Seamlessly integrate data extraction features into your solutions and easily parse PDFs, DOCX, XLSX, PPTX, MSG, EPUB, HTML, ODT, and more formats.
