Welcome to the Resume Craft parser playground, where you can see our intelligent parsing system in action. Explore different PDF examples below to see how our parser extracts key information.

Want to test your own resume? Drop it in the box below to see how well it performs with Applicant Tracking Systems (ATS). A higher parsing accuracy suggests better resume formatting and readability. For optimal results, ensure your name and email are clearly formatted for accurate detection.

Resume Parsing Results

Profile
Name
Email
Phone
Location
Link
Summary
Education
School
Degree
GPA
Date
Descriptions
Work Experience
Company
Job Title
Date
Descriptions
Skills
Descriptions

Resume Parser Algorithm Deep Dive

Curious about the technology behind Resume Craft? This section explains our parsing algorithm as a 4-step process. (Currently optimized for English-language resumes with single-column layouts.)

Step 1. Read the text items from a PDF file

PDF documents follow the ISO 32000 specification. When viewed in a text editor, PDF content appears as encoded data that's not human-readable. To access the actual content, Resume Craft's parser needs to first decode this information into a format we can process.

Rather than building a PDF decoder from scratch using the ISO 32000 specification, Resume Craft integrates with pdf.js, Mozilla's powerful open-source library, to extract text elements efficiently and reliably.

Below you'll find 0 text elements extracted from your uploaded resume. Each element contains both the actual text and important metadata, including coordinates (x,y positions), font properties (like bold styling), and line break information. (Note: Coordinates are measured from the bottom-left corner of the page, marked as position 0,0)

#	Text Content	Metadata
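The element shape described above can be sketched as a small conversion from a pdf.js text item. This is only an illustration: it assumes the item fields returned by pdf.js's `getTextContent()` (`str`, `transform`, `width`, `fontName`, `hasEOL`), while the `toTextItem` helper, the `fontNameLookup` map, and the rule "a font is bold if its name contains Bold" are assumptions made for this example, not Resume Craft's confirmed implementation.

```javascript
// Hypothetical helper: convert one pdf.js text item into a flat element.
// `fontNameLookup` (internal font id -> real font name) is an assumption
// of this sketch and must be supplied by the caller.
function toTextItem(pdfItem, fontNameLookup) {
  // pdf.js stores position in transform = [a, b, c, d, e, f];
  // e and f are the x/y offsets from the bottom-left corner of the page.
  const [, , , , x, y] = pdfItem.transform;
  const realFontName = fontNameLookup.get(pdfItem.fontName) ?? pdfItem.fontName;
  return {
    text: pdfItem.str,
    x,
    y,
    width: pdfItem.width,
    // Assumption: bold fonts carry "Bold" somewhere in their name.
    bold: /bold/i.test(realFontName),
    // hasEOL marks an item that ends a line.
    newLine: Boolean(pdfItem.hasEOL),
  };
}

// Made-up sample item in the pdf.js shape.
const sample = toTextItem(
  {
    str: "JANE DOE",
    transform: [12, 0, 0, 12, 72, 720],
    width: 64,
    fontName: "g_d0_f1",
    hasEOL: true,
  },
  new Map([["g_d0_f1", "Helvetica-Bold"]])
);
console.log(sample);
```

Keeping the conversion in one pure function means the later steps only depend on the flat `{ text, x, y, width, bold, newLine }` shape, not on pdf.js itself.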

Step 2. Intelligent Text Line Formation

Before processing the parsed text elements, we need to address two critical challenges:

Challenge 1: Text Fragmentation
During the initial extraction process, Resume Craft may detect single text elements split into multiple fragments. For instance, contact information like a phone number "555-123-4567" might be separated into distinct elements: "555", "-", "123", "-", "4567".

Our Approach: Resume Craft employs a text clustering algorithm that merges adjacent text elements based on spatial proximity. The system calculates the gap between two neighboring elements as $Gap = Element2StartX - Element1EndX$. To decide whether a gap is small enough to merge, Resume Craft calculates a baseline character width by analyzing the document's text metrics. This calculation excludes bold text and line breaks to ensure accurate measurements and natural text flow.
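The merging idea can be sketched as follows. The field names (`text`, `x`, `width`, `bold`), the function names, and the exact merge rule (merge when the gap is no larger than the baseline character width) are assumptions chosen for illustration; the source only states that a gap and a baseline width are computed.

```javascript
// Items are assumed to sit on the same line, sorted left to right.

// Baseline character width: total width divided by total characters,
// skipping bold items and whitespace-only items.
function typicalCharWidth(items) {
  let totalWidth = 0;
  let totalChars = 0;
  for (const item of items) {
    if (item.bold || item.text.trim() === "") continue;
    totalWidth += item.width;
    totalChars += item.text.length;
  }
  return totalChars === 0 ? 0 : totalWidth / totalChars;
}

// Merge each item into its left neighbor when the gap between them is small.
function mergeAdjacentItems(items, charWidth) {
  const merged = [];
  for (const item of items) {
    const prev = merged[merged.length - 1];
    // Gap = Element2StartX - Element1EndX
    const gap = prev ? item.x - (prev.x + prev.width) : Infinity;
    if (prev && gap <= charWidth) {
      prev.text += item.text;
      prev.width = item.x + item.width - prev.x;
    } else {
      merged.push({ ...item });
    }
  }
  return merged;
}

// Made-up fragments of one line: a split phone number, then an email.
const fragments = [
  { text: "555", x: 100, width: 18, bold: false },
  { text: "-", x: 118, width: 4, bold: false },
  { text: "123", x: 122, width: 18, bold: false },
  { text: "-", x: 140, width: 4, bold: false },
  { text: "4567", x: 144, width: 24, bold: false },
  { text: "jane@example.com", x: 200, width: 96, bold: false },
];
const mergedItems = mergeAdjacentItems(fragments, typicalCharWidth(fragments));
console.log(mergedItems.map((item) => item.text));
```

In this sample the phone fragments touch each other (gap 0) and collapse into one element, while the email stays separate because it starts 32 units after the phone number ends.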

Challenge 2: Missing Contextual Relationships
When reviewing resumes, our eyes naturally follow a top-to-bottom flow, where visual elements like text weight and spacing help us understand how information is connected. Traditional text extraction strips away these vital visual relationships, leaving behind disconnected pieces of information that lose their original meaning and hierarchy.

Our Approach: Resume Craft's intelligent parser rebuilds these relationships by mimicking human reading patterns. First, it organizes extracted text into coherent lines, then combines these lines into logical sections - a process we'll explore in detail in the next phase.

In this second phase, Resume Craft's parser has identified 0 distinct lines from your uploaded PDF resume, as detailed in the table below. This line-by-line presentation makes the content significantly more comprehensible. (Note: Some lines contain multiple text elements, separated by a blue vertical marker | )

Lines	Line Content
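Line formation can be illustrated with a short sketch. It assumes each element carries a `newLine` flag that is true on the last element of a line (mirroring the line break metadata from Step 1); the function name and the flag name are hypothetical.

```javascript
// Group a flat list of elements into lines, closing a line whenever an
// element is flagged as ending it.
function groupIntoLines(items) {
  const lines = [];
  let current = [];
  for (const item of items) {
    current.push(item);
    if (item.newLine) {
      lines.push(current);
      current = [];
    }
  }
  // Keep any trailing elements that never saw a line break.
  if (current.length > 0) lines.push(current);
  return lines;
}

// Made-up elements for demonstration.
const lineItems = [
  { text: "Jane Doe", newLine: true },
  { text: "jane@example.com", newLine: false },
  { text: "555-123-4567", newLine: true },
  { text: "EDUCATION", newLine: false },
];
const groupedLines = groupIntoLines(lineItems);
// Render each line the way the table does, with | between elements.
const renderedLines = groupedLines.map((line) => line.map((item) => item.text).join(" | "));
console.log(renderedLines);
```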

Step 3. Understanding Section Organization

Resume Craft's intelligent parser works in stages to analyze your document. After organizing individual elements into lines in Step 2, the system moves on to Step 3, where it intelligently identifies and groups these lines into meaningful sections.

A key characteristic of professional resumes is that each major section begins with a distinct header line. This organizational principle, common across professional documents, enables Resume Craft to accurately identify and categorize content by mapping text blocks to their corresponding section headers.

Our advanced parsing system employs specific criteria to identify section headers. A line must meet all three of these requirements:
1. It must be standalone (no other text in the line)
2. The text must be bold formatted
3. All characters must be in CAPITAL LETTERS

This combination of bold formatting and capitalization is a standard practice in professional resume design. While exceptions may exist, using both emphasis techniques typically indicates a section header, and deviating from this convention might not represent optimal resume formatting.

Resume Craft includes a sophisticated backup detection system for cases where the primary criteria aren't met. This secondary system utilizes smart keyword recognition to match common section header terminology found in professional resumes.
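The header rules and the keyword fallback can be sketched like this. The keyword list, the three-word limit on the fallback, the `profile` bucket for lines before the first header, and all function names are assumptions made for the example.

```javascript
// Assumed keyword list for the fallback check; illustrative only.
const SECTION_KEYWORDS = ["experience", "education", "project", "skill", "summary", "objective"];

// A line is an array of elements shaped like { text, bold }.
function isSectionHeader(line) {
  // Requirement 1: the line is standalone (exactly one element).
  if (line.length !== 1) return false;
  const { text, bold } = line[0];
  // Requirements 2 and 3: bold, and every letter is upper case.
  const hasLetter = /[a-zA-Z]/.test(text);
  if (bold && hasLetter && text === text.toUpperCase()) return true;
  // Fallback (assumption): a short line containing a known keyword.
  const wordCount = text.trim().split(/\s+/).length;
  const lower = text.toLowerCase();
  return wordCount <= 3 && SECTION_KEYWORDS.some((keyword) => lower.includes(keyword));
}

// Assign every line to the most recent header seen above it.
function groupIntoSections(lines) {
  const sections = new Map([["profile", []]]);
  let currentName = "profile";
  for (const line of lines) {
    if (isSectionHeader(line)) {
      currentName = line[0].text.trim();
      sections.set(currentName, []);
    } else {
      sections.get(currentName).push(line);
    }
  }
  return sections;
}

// Made-up lines for demonstration.
const sampleLines = [
  [{ text: "Jane Doe", bold: true }],
  [{ text: "jane@example.com", bold: false }, { text: "555-123-4567", bold: false }],
  [{ text: "EDUCATION", bold: true }],
  [{ text: "Example University", bold: true }, { text: "2020", bold: false }],
  [{ text: "Work Experience", bold: false }],
  [{ text: "Example Corp", bold: true }],
];
const sectionMap = groupIntoSections(sampleLines);
console.log([...sectionMap.keys()]);
```

Here "EDUCATION" passes the three primary rules, while "Work Experience" is only caught by the keyword fallback because it is neither bold nor fully capitalized.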

Upon completing Step 3, Resume Craft successfully maps out the document's structure, as demonstrated in the table below. For clarity, section headers appear in bold and related content is highlighted with consistent color coding.

Lines	Line Content

Step 4. Resume Information Extraction

The final phase in Resume Craft's parsing workflow is the extraction step - the cornerstone of our intelligent resume analysis system. This crucial step identifies and extracts key information from previously identified sections.

Intelligent Scoring Algorithm

At the heart of Resume Craft's extraction engine lies our sophisticated scoring algorithm. Each resume element is evaluated using specialized feature sets, combining pattern matching functions with weighted scoring metrics. These scores can be either positive or negative, reflecting the likelihood of a match. The system processes each text element through multiple feature sets, aggregating scores to determine the best match. Within each section, the text element achieving the highest cumulative score is selected as the target information.

To illustrate this process, examine the following table showing how three key profile elements are scored in a sample resume.

Resume Attribute	Text (Highest Feature Score)	Feature Scores of Other Texts
Name
Email
Phone
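A minimal sketch of the scoring loop described above: every text is run through each feature in a feature set, the scores are summed, and the highest total wins. The function names and the email feature weights below are invented for the example and are not Resume Craft's actual values.

```javascript
// A feature set is a list of [matcher, score] pairs.
function scoreText(text, featureSet) {
  let total = 0;
  for (const [matches, score] of featureSet) {
    if (matches(text)) total += score;
  }
  return total;
}

// Return the text with the highest cumulative score (first one wins on ties).
function pickBest(texts, featureSet) {
  let best = { text: "", score: -Infinity };
  for (const text of texts) {
    const score = scoreText(text, featureSet);
    if (score > best.score) best = { text, score };
  }
  return best;
}

// Hypothetical email feature set: one positive and one negative feature.
const emailFeatures = [
  [(text) => /\S+@\S+\.\S+/.test(text), 4], // looks like an email
  [(text) => /\s/.test(text), -2], // contains whitespace, unlikely for an email
];

const profileTexts = ["Jane Doe", "jane@example.com", "555-123-4567"];
const bestEmail = pickBest(profileTexts, emailFeatures);
console.log(bestEmail);
```

Negative scores matter because they let one attribute's features push away texts that clearly belong to another attribute.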

Feature Set Architecture

Resume Craft's feature sets are built on two fundamental principles:
1. Feature sets are designed with contextual awareness, considering all possible elements within their respective sections.
2. Each feature set is carefully engineered based on data patterns and statistical likelihood of occurrence.

Below is an example of feature sets used for name detection. The system employs both positive patterns that identify name characteristics and negative patterns that help exclude non-name elements.

Name Feature Sets
Feature Function	Feature Matching Score
Contains only letters, spaces or periods	+3
Is bolded	+2
Contains all uppercase letters	+2
Contains @	-4 (match email)
Contains number	-4 (match phone)
Contains ,	-4 (match address)
Contains /	-4 (match url)
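The name feature table above translates fairly directly into code. The sketch below uses the scores from the table; the element shape (`text`, `bold`), the function names, and the reading of "all uppercase" as "has at least one letter and no lower-case letters" are assumptions.

```javascript
// Name feature set: [matcher, score] pairs taken from the table above.
const nameFeatures = [
  [(item) => /^[a-zA-Z\s.]+$/.test(item.text), 3], // only letters, spaces or periods
  [(item) => item.bold, 2], // is bolded
  [(item) => /[a-zA-Z]/.test(item.text) && item.text === item.text.toUpperCase(), 2], // all upper case
  [(item) => item.text.includes("@"), -4], // looks like an email
  [(item) => /\d/.test(item.text), -4], // looks like a phone number
  [(item) => item.text.includes(","), -4], // looks like an address
  [(item) => item.text.includes("/"), -4], // looks like a url
];

// Sum the scores of all matching features.
function scoreName(item) {
  return nameFeatures.reduce((sum, [matches, score]) => sum + (matches(item) ? score : 0), 0);
}

// Made-up profile elements for demonstration.
const candidates = [
  { text: "JANE DOE", bold: true },
  { text: "jane@example.com", bold: false },
  { text: "555-123-4567", bold: false },
  { text: "Springfield, IL", bold: false },
  { text: "linkedin.com/in/janedoe", bold: false },
];
const nameScores = candidates.map((item) => ({ text: item.text, score: scoreName(item) }));
console.log(nameScores);
```

With this sample, only "JANE DOE" collects all three positive features, and each of the other elements trips exactly one negative feature.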

Feature Detection System

Each component of your resume is analyzed through multiple detection algorithms. While the complete set of detection features can be found in our extract-resume-from-sections directory, we'll focus on the primary detection methods that make our system uniquely effective at identifying key resume elements.

Resume Attribute	Core Feature Function	Regex
Name	Contains only letters, spaces or periods	/^[a-zA-Z\s\.]+$/
Email	Match email format xxx@xxx.xxx (xxx can be anything except a space)	/\S+@\S+\.\S+/
Phone	Match phone format (xxx)-xxx-xxxx; () and - are optional	/\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}/
Location	Match city and state format City, ST	/[A-Z][a-zA-Z\s]+, [A-Z]{2}/
Url	Match url format xxx.xxx/xxx	/\S+\.[a-z]+\/\S+/
School	Contains a school keyword, e.g. College, University, School
Degree	Contains a degree keyword, e.g. Associate, Bachelor, Master
GPA	Match GPA format x.xx	/[0-4]\.\d{1,2}/
Date	Contains date keyword related to year, month, seasons or the word Present	Year: /(?:19|20)\d{2}/
Job Title	Contains a job title keyword, e.g. Analyst, Engineer, Intern
Company	Is bolded or doesn't match job title & date
Project	Is bolded or doesn't match date
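To see how the regex-based features behave, the patterns can be tried against a few sample strings. The phone pattern is written with escaped parentheses and the year pattern with a plain alternation, following the formats described in the table; the sample strings and the `patterns` and `samples` names are made up for the example.

```javascript
// Core patterns from the table, as JavaScript regex literals.
const patterns = {
  name: /^[a-zA-Z\s.]+$/,
  email: /\S+@\S+\.\S+/,
  phone: /\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}/,
  location: /[A-Z][a-zA-Z\s]+, [A-Z]{2}/,
  url: /\S+\.[a-z]+\/\S+/,
  gpa: /[0-4]\.\d{1,2}/,
  year: /(?:19|20)\d{2}/,
};

// One made-up sample per attribute.
const samples = {
  name: "Jane Doe",
  email: "jane@example.com",
  phone: "(555) 123-4567",
  location: "Springfield, IL",
  url: "linkedin.com/in/janedoe",
  gpa: "3.85",
  year: "Jan 2021 - Present",
};

for (const [attribute, pattern] of Object.entries(patterns)) {
  console.log(attribute, pattern.test(samples[attribute]));
}
```

These patterns are deliberately loose: most are unanchored, so they report whether a matching substring exists anywhere in the text. That is why they are combined with scores rather than used alone.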

Advanced Section Analysis

One of our system's sophisticated capabilities is handling complex section structures. While profile sections can be processed as single units, sections like work experience and education require intelligent subdivision. Our system automatically identifies and processes multiple entries within these sections, ensuring accurate data extraction for each individual experience or qualification.

Resume Craft employs smart recognition algorithms to identify section breaks. Our primary method analyzes vertical spacing patterns, detecting when the gap between lines exceeds the standard spacing by 40%. This works particularly well for professionally formatted resumes. As a backup method, we also examine text formatting characteristics, such as bold styling, to ensure reliable section detection even in non-standard layouts.
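The spacing rule can be sketched as below. It assumes that "standard spacing" means the most common gap between consecutive lines in the section and that `y` is measured from the bottom of the page (so it decreases down the page). The bold-based backup shown here, which starts a new entry at a bold line that follows a non-bold line, is one possible reading of the text; all names are hypothetical.

```javascript
// lines: [{ text, y, bold }], ordered top to bottom.
function typicalLineGap(lines) {
  const counts = new Map();
  for (let i = 1; i < lines.length; i++) {
    const gap = Math.round(lines[i - 1].y - lines[i].y);
    counts.set(gap, (counts.get(gap) ?? 0) + 1);
  }
  let mostCommonGap = 0;
  let highestCount = 0;
  for (const [gap, count] of counts) {
    if (count > highestCount) {
      mostCommonGap = gap;
      highestCount = count;
    }
  }
  return mostCommonGap;
}

// Split lines into groups, starting a new group wherever startsNew(i) is true.
function splitWhere(lines, startsNew) {
  const groups = [[lines[0]]];
  for (let i = 1; i < lines.length; i++) {
    if (startsNew(i)) groups.push([lines[i]]);
    else groups[groups.length - 1].push(lines[i]);
  }
  return groups;
}

function splitIntoSubsections(lines) {
  if (lines.length === 0) return [];
  // Primary method: a gap more than 40% above the typical gap starts a new entry.
  const threshold = typicalLineGap(lines) * 1.4;
  const bySpacing = splitWhere(lines, (i) => lines[i - 1].y - lines[i].y > threshold);
  if (bySpacing.length > 1) return bySpacing;
  // Backup method (assumption): a bold line after a non-bold line starts a new entry.
  return splitWhere(lines, (i) => lines[i].bold && !lines[i - 1].bold);
}

// Made-up work experience lines: two entries separated by a larger gap.
const workLines = [
  { text: "Example Corp", y: 700, bold: true },
  { text: "Software Engineer", y: 686, bold: false },
  { text: "Built internal tools", y: 672, bold: false },
  { text: "Another Inc", y: 650, bold: true },
  { text: "Intern", y: 636, bold: false },
  { text: "Wrote unit tests", y: 622, bold: false },
];
const entries = splitIntoSubsections(workLines);
console.log(entries.map((entry) => entry.map((line) => line.text)));
```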

This comprehensive approach makes Resume Craft's parsing technology both robust and reliable!

Inspired by Xitang

Resume Parser Playground

Resume Example 1

Resume Example 2