Data Extraction Architect
PST.AG
Specification-Driven Extraction Engineering:
- Design and maintain declarative extraction specifications (using Pydantic models, JSON schemas, or domain-specific languages) that describe exactly which fields to capture, their types, and their validation rules.
- Implement pipelines that translate these specifications into executable extraction plans, leveraging both classical (Scrapy, Playwright) and AI-augmented (LLM-based semantic parsing) backends.
- Build reusable specification libraries for recurring data types (product prices, tariff codes, regulatory texts) to accelerate onboarding of new sources.

Autonomous & Self-Healing Systems:
- Deploy self-healing spiders that automatically detect website layout changes and repair themselves using Model Context Protocol (MCP) servers (e.g., Scrapy MCP Server, Playwright MCP).
- Integrate semantic extraction (Scrapy-LLM, custom LLM pipelines) to eliminate selector brittleness: spiders rely on field descriptions, not fragile XPaths.
- Hands-on experience building AI agents and orchestration systems.
- Orchestrate complex, multi-step browsing workflows with agentic frameworks (BMAD/TEA, AutoGPT-like agents) that reason about page state, adapt to anti-bot measures, and correct their own behaviour in real time.

Platform Thinking & Reusability:
- Move beyond one-off scrapers: build a component-based extraction platform where selectors, login handlers, and pagination logic are shared, versioned, and tested.
- Implement monitoring, alerting, and automatic rollback for failed extraction runs.
- Champion ethical crawling by design: rate limiting, robots.txt respect, and compliance with GDPR/CCPA are built into the specification layer, not retrofitted.

Collaboration & Continuous Innovation:
- Partner with data scientists and domain experts to refine extraction specifications for complex, unstructured domains (e.g., legal texts, tariff classifications).
- Evaluate and pilot emerging tools to push automation coverage beyond 90%.
- Document and evangelise specification-driven best practices across the engineering organisation.

Qualification:
- Bachelor's degree in Computer Science
- 3+ years of experience in web scraping or data extraction

Required Skills:
- Proficiency with Python
- Experience with specification-driven extraction
- Hands-on use of Scrapy-LLM, Scrapy MCP Server, or similar systems that decouple field definitions from page structure
- Familiarity with frameworks that give LLMs browser control (Playwright + MCP, BMAD/TEA) to handle complex, non-deterministic crawling tasks
- Ability to design and implement autonomous data extraction agents that can make decisions about source selection, retry logic, and parsing strategies
- Classical scraping fundamentals
- Data Validation & Storage: ability to define validation rules within specifications and land clean data into SQL/NoSQL databases or data lakes
- Basic API integration and authentication flows
- DOM, XPath, CSS

Nice to Haves:
- Contributions to open-source scraping or AI-automation projects.
- Familiarity with data privacy engineering (GDPR, CCPA) baked into specification design.
- DevOps light: Docker, CI/CD for testing extraction specifications.

Mindset & Approach (Non-Negotiable):
- Strong belief that the future of scraping is declarative, not imperative. You'd rather write a schema that says "extract the price" than debug an XPath when a website redesigns.
- Looking to shift from "code that scrapes" to "systems that understand extraction".
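For a sense of what "a schema that says extract the price" can look like in practice, here is a minimal sketch, not PST.AG's actual tooling. It uses only standard-library dataclasses in place of Pydantic to stay dependency-free, and every name in it (`FieldSpec`, `PRODUCT_SPEC`, `parse_price`, `apply_spec`) is hypothetical. It assumes some backend has already returned raw strings keyed by field name; the specification alone decides how they are typed and validated.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Any, Callable
import re


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical declarative description of one field to capture."""
    name: str
    description: str                  # what a semantic backend would be asked to find
    parse: Callable[[str], Any]       # converts raw text to the target type
    validate: Callable[[Any], bool] = lambda value: True


def parse_price(raw: str) -> Decimal:
    """Drop currency symbols and thousands separators, then parse as Decimal."""
    cleaned = re.sub(r"[^\d.,-]", "", raw).replace(",", "")
    return Decimal(cleaned)


# The specification: which fields, what type, which rules. No selectors here.
PRODUCT_SPEC = [
    FieldSpec("title", "the product name", str.strip, lambda v: len(v) > 0),
    FieldSpec("price", "the current selling price", parse_price, lambda v: v > 0),
]


def apply_spec(spec, raw_record):
    """Type and validate a raw record; return (clean_fields, errors_by_field)."""
    clean, errors = {}, {}
    for field in spec:
        raw = raw_record.get(field.name)
        if raw is None:
            errors[field.name] = "missing"
            continue
        try:
            value = field.parse(raw)
        except (InvalidOperation, ValueError):
            errors[field.name] = "unparseable"
            continue
        if field.validate(value):
            clean[field.name] = value
        else:
            errors[field.name] = "failed validation"
    return clean, errors


if __name__ == "__main__":
    # Raw strings as a Scrapy, Playwright, or LLM backend might hand them over.
    print(apply_spec(PRODUCT_SPEC, {"title": "  Widget ", "price": "₱1,299.00"}))
    print(apply_spec(PRODUCT_SPEC, {"title": "Widget", "price": "call us"}))
```

Because the specification carries no selectors, the same `PRODUCT_SPEC` could in principle be reused when a site's layout changes or when the extraction backend is swapped; only the step that produces the raw strings would need to change.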

