Construction and engineering firms face a recurring challenge that consumes significant professional time while providing limited strategic value. When clients request proposals or when projects move from design to execution, someone needs to extract detailed Bills of Quantities and Bills of Materials from engineering drawings. This process traditionally requires experienced engineers to manually review drawings, identify every component, measure quantities, and compile comprehensive lists of materials needed for construction. A single project might require days of this meticulous work, and any mistakes can lead to cost overruns or project delays when materials are under-ordered.
A construction technology company wanted to understand whether artificial intelligence could automate this extraction process without sacrificing the accuracy that makes manual extraction reliable. The challenge extended beyond simple optical character recognition or pattern matching. Engineering drawings contain complex visual information that requires understanding construction conventions, interpreting symbols and notations, recognizing components at various scales, and reasoning about relationships between elements. The solution needed to handle the variety and ambiguity that characterize real-world engineering drawings rather than only working with perfectly standardized documents.
Building Agentic AI Workflows
We designed the system using an agentic architecture, which means the AI doesn't simply follow a fixed sequence of steps but rather makes decisions about how to approach each drawing based on what it observes. This architectural choice proved crucial because engineering drawings vary enormously in their organization, level of detail, and presentation style. Some drawings place different building systems on separate sheets, while others overlay electrical, plumbing, and structural information on the same diagram. Some use standardized symbols while others include custom notation that requires interpretation from context.
An agentic system begins by analyzing the drawing to understand its structure and identify what types of information it contains. It might recognize that a particular sheet shows a floor plan with architectural elements, while another shows electrical layout, and yet another provides detail views of specific connections. Based on this initial analysis, the system decides which AI models and processing techniques to apply to each portion of the drawing, much like an experienced engineer would approach the task by first orienting themselves to understand what they're looking at.
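This first orientation-and-routing step can be sketched in a few lines. The keyword rules and pipeline names below are purely illustrative assumptions, not the production system's logic, which used learned models rather than keyword matching:

```python
# Sketch of the agent's first step: guess each sheet's discipline, then
# route it to an appropriate set of specialized models. All keywords and
# pipeline names here are hypothetical placeholders.

SHEET_KEYWORDS = {
    "electrical": ["lighting", "power", "panel schedule", "circuit"],
    "plumbing": ["sanitary", "water supply", "drainage", "fixture"],
    "structural": ["beam", "column", "rebar", "foundation"],
    "architectural": ["floor plan", "elevation", "partition"],
}

def classify_sheet(title_block_text: str) -> str:
    """Guess a sheet's discipline from its title-block text."""
    text = title_block_text.lower()
    scores = {
        discipline: sum(kw in text for kw in keywords)
        for discipline, keywords in SHEET_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def route(sheet_type: str) -> list[str]:
    """Pick which specialized models to run for a given sheet type."""
    pipelines = {
        "electrical": ["symbol_detector", "circuit_annotator"],
        "plumbing": ["fixture_detector", "pipe_tracer"],
        "structural": ["member_detector", "dimension_reader"],
        "architectural": ["wall_detector", "room_labeler"],
    }
    # Unrecognized sheets fall back to a generic pass plus human review.
    return pipelines.get(sheet_type, ["generic_detector", "human_review"])
```

A sheet titled "Second Floor Lighting and Power Layout" would route to the electrical pipeline, while an unclassifiable sheet falls through to the human-review path.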
The agent then orchestrates a series of specialized models that each handle different aspects of the extraction task. Computer vision models identify visual elements like walls, doors, fixtures, and equipment. Language models interpret text annotations, dimensions, and specifications. The agent coordinates between these different models, using information extracted by one model to inform how other models should interpret related portions of the drawing. If the vision model identifies a fixture symbol, the language model helps interpret nearby text to determine specifications for that fixture.
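The hand-off described above, where a vision detection pulls in nearby text for the language model to interpret, can be illustrated with a minimal sketch. The data classes, the fixed proximity radius, and the prompt wording are assumptions for illustration only:

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    """A component found by the vision model (hypothetical shape)."""
    label: str
    center: tuple[float, float]  # position on the sheet, in drawing units
    confidence: float

@dataclass
class Annotation:
    """A piece of text found near the drawing content."""
    text: str
    center: tuple[float, float]

def nearest_annotations(det: Detection, annotations: list[Annotation],
                        radius: float = 80.0) -> list[str]:
    """Collect annotation text close enough to plausibly describe the component."""
    def dist(a: tuple[float, float], b: tuple[float, float]) -> float:
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return [a.text for a in annotations if dist(det.center, a.center) <= radius]

def specification_query(det: Detection, annotations: list[Annotation]) -> str:
    """Assemble the context a language model would use to interpret a detection."""
    context = "; ".join(nearest_annotations(det, annotations)) or "no nearby text"
    return (f"Component '{det.label}' detected "
            f"(confidence {det.confidence:.2f}). Nearby annotations: {context}. "
            f"What are the likely specifications?")
```

The point of the sketch is the coordination pattern: the vision model's output (a labeled location) determines which text the language model is asked to reason about.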
Combining Language Models with Deep Learning Vision
The system employed large language models not just for text interpretation but as a reasoning engine that could make sense of the entire extraction process. When the vision models identified components in a drawing, the language model would reason about whether those identifications made sense given the type of drawing and the context of surrounding elements. If the vision model tentatively identified something as a light fixture but the language model recognized from annotations that this section showed plumbing layout, it could flag this inconsistency for further analysis or correction.
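A rule-based stand-in makes the consistency check above concrete. In the real system a language model performed this reasoning from context; the fixed discipline-to-component table here is an assumed simplification:

```python
# Hypothetical mapping of sheet disciplines to components one would
# expect on them; the production system reasoned this out with an LLM.
DISCIPLINE_COMPONENTS = {
    "electrical": {"light_fixture", "outlet", "switch", "panel"},
    "plumbing": {"sink", "toilet", "valve", "pipe"},
}

def check_consistency(component_label: str, sheet_discipline: str) -> dict:
    """Flag detections that don't fit the discipline of the sheet."""
    expected = DISCIPLINE_COMPONENTS.get(sheet_discipline, set())
    consistent = component_label in expected
    return {
        "component": component_label,
        "sheet": sheet_discipline,
        "consistent": consistent,
        "action": "accept" if consistent else "flag_for_review",
    }
```

A "light_fixture" detection on a plumbing sheet would come back with `action == "flag_for_review"`, mirroring the inconsistency-flagging behavior described above.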
Deep learning vision models handled the challenge of recognizing components across varying drawing styles, scales, and quality levels. These models were trained on large sets of engineering drawings to learn the visual patterns that indicate different types of building components. Unlike traditional computer vision that looks for exact matches to predefined templates, these models could recognize a component even when drawn with slight variations in style or oriented differently on the page.
The vision models also needed to understand spatial relationships and layering in drawings. Engineering drawings use conventions like line weights, colors, and patterns to indicate different information layers. A thick solid line might represent a wall, while a dashed line indicates something hidden or proposed rather than existing. The vision models learned these conventions so they could correctly interpret what they were seeing rather than simply identifying visual patterns without understanding their meaning.
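A hard-coded table illustrates the kind of convention knowledge the models learned; in practice these mappings vary across firms and drawing standards, which is exactly why they were learned rather than fixed like this:

```python
# Illustrative line-style semantics, following the conventions mentioned
# above (thick solid = wall, dashed = hidden/proposed). Real drawings
# vary, so the production models learned these mappings from data.
LINE_CONVENTIONS = {
    ("solid", "thick"): "wall",
    ("dashed", "thin"): "hidden or proposed element",
    ("dash_dot", "thin"): "centerline",
    ("solid", "thin"): "fixture outline",
}

def interpret_line(style: str, weight: str) -> str:
    """Map a line's style and weight to its conventional meaning."""
    return LINE_CONVENTIONS.get((style, weight), "unclassified")
```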
Handling Ambiguity and Edge Cases
Real engineering drawings contain ambiguities that even experienced engineers sometimes need to resolve through additional research or questions to the designer. The agentic system needed strategies for handling these situations rather than simply failing or making guesses. When the system encountered elements it couldn't identify with high confidence, it would flag them for human review rather than including uncertain extractions in the final output.
The system used confidence scoring throughout the extraction process. Each identified component received a confidence score based on how clearly it could be recognized and how well it matched expected patterns. Components with low confidence scores were either analyzed with additional techniques or escalated to human reviewers. This approach meant the system could operate autonomously for clear-cut situations while appropriately involving humans for genuinely ambiguous cases.
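The triage logic above reduces to a simple threshold policy. The specific thresholds below are assumptions for illustration; the actual cutoffs would be tuned against validation data:

```python
def triage(confidence: float,
           auto_accept: float = 0.9,
           reanalyze: float = 0.6) -> str:
    """Route an extracted component by confidence score:
    high -> accept autonomously, middle -> apply additional analysis,
    low -> escalate to a human reviewer. Thresholds are illustrative."""
    if confidence >= auto_accept:
        return "accept"
    if confidence >= reanalyze:
        return "secondary_analysis"
    return "human_review"

def partition(components: list[dict]) -> dict[str, list[dict]]:
    """Split a batch of extractions into the three handling buckets."""
    buckets: dict[str, list[dict]] = {
        "accept": [], "secondary_analysis": [], "human_review": []
    }
    for comp in components:
        buckets[triage(comp["confidence"])].append(comp)
    return buckets
```

This is what lets the system run autonomously on clear-cut extractions while reserving human attention for the genuinely ambiguous ones.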
The language model component played a crucial role in reasoning about ambiguity. When vision models proposed multiple possible interpretations for a visual element, the language model would consider contextual information from annotations, dimensions, and nearby elements to determine which interpretation made most sense. This reasoning capability helped the system handle the kinds of context-dependent judgments that make engineering document interpretation a knowledge-intensive task rather than just a visual recognition problem.
From Hours to Minutes with Comparable Quality
The transformation in workflow proved dramatic. Tasks that previously required engineers to spend hours carefully reviewing drawings and compiling lists were now completed in minutes. The system would process a set of engineering drawings, extract all relevant quantities and materials, and generate structured Bills of Quantities and Bills of Materials that engineers could review and refine rather than creating from scratch.
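The structured output stage can be sketched as an aggregation over extracted items. The data shape below is an assumed simplification of a real Bill of Quantities, which would also carry rates, work sections, and measurement-standard references:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ExtractedItem:
    """One component pulled from a drawing (illustrative schema)."""
    description: str
    unit: str                  # e.g. "no." for count, "m" for length
    quantity: float
    needs_review: bool = False # set when extraction confidence was low

def build_boq(items: list[ExtractedItem]) -> list[dict]:
    """Aggregate extracted components into Bill of Quantities rows,
    summing quantities for identical items and carrying review flags
    through so engineers can see which rows to double-check."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    flagged: set[tuple[str, str]] = set()
    for item in items:
        key = (item.description, item.unit)
        totals[key] += item.quantity
        if item.needs_review:
            flagged.add(key)
    return [
        {"description": d, "unit": u, "quantity": q, "review": (d, u) in flagged}
        for (d, u), q in sorted(totals.items())
    ]
```

The review flag is the hand-off point between automation and human oversight: engineers refine flagged rows rather than rebuilding the whole bill from scratch.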
The quality of automated extraction approached human accuracy for straightforward drawings and common building components. For more complex or unusual situations, the system appropriately flagged items for human review rather than confidently extracting incorrect information. This balance between automation and human oversight meant that engineering teams could trust the output while still applying their expertise to validate and refine results.
The business impact extended beyond time savings. Engineering firms could respond to client requests for proposals more quickly, giving them competitive advantages in situations where speed mattered. They could handle larger volumes of proposal work without proportionally increasing engineering staff. Most importantly, engineers shifted their time from tedious extraction work to higher-value activities like design optimization, constructability review, and client consultation.
The project demonstrated that successful AI automation of professional knowledge work requires more than applying machine learning models to documents. It demands architectural approaches that can orchestrate multiple AI capabilities, handle the ambiguity inherent in real-world documents, and appropriately involve human expertise where it remains essential. The agentic approach we developed for this construction application established patterns that could be applied to document automation in other professional domains facing similar challenges.