Job Description:
Data Scientist / Data Warehouse Engineer (Unstructured Data Extraction & Processing)
Position Overview
Parsons Corporation is seeking a Data Scientist / Data Warehouse Engineer with a strong focus on unstructured data extraction and processing. The ideal candidate will design, develop, and maintain scalable data pipelines that integrate structured and unstructured data from various sources. This role requires technical expertise in data engineering tools and best practices, as well as excellent communication and collaboration skills to work cross-functionally with data analysts, scientists, and stakeholders. Join a dedicated, distributed team of scientists, software architects, and software engineers responsible for developing a Generative Artificial Intelligence (GenAI)-enabled capability to expedite the design of infrastructure projects such as highways and bridges.
Key Responsibilities
Unstructured Data Processing
- Extract, cleanse, and process unstructured data (e.g., text, logs, images) for use in analytics and machine learning.
- Develop and optimize custom ETL/ELT pipelines to handle complex data formats and large data volumes.
Data Pipeline Development
- Build robust and scalable data pipelines using Apache Spark, Hadoop, or Apache Beam.
- Automate workflows and schedule data processes using orchestration tools such as Apache Airflow, Prefect, or Luigi.
Data Warehousing & Storage
- Design, implement, and maintain modern data warehouse solutions (e.g., Databricks, Snowflake, Redshift, BigQuery).
- Manage both relational (SQL) and NoSQL databases for structured and unstructured data storage.
Cloud Integration
- Deploy and optimize data solutions on cloud platforms (Azure, AWS, or GCP).
- Leverage services like Azure Data Factory, AWS Glue, or Google Cloud Dataflow for seamless data ingestion and transformation.
Performance Optimization & Troubleshooting
- Monitor, diagnose, and improve data system performance and reliability.
- Collaborate with other teams to refine database queries, optimize ETL processes, and ensure data integrity.
Data Governance & Security
- Implement data quality checks, versioning, and security protocols in compliance with regulations (GDPR, CCPA).
- Ensure robust access controls and encryption measures for sensitive information.
Collaboration & Documentation
- Work closely with cross-functional teams to understand data requirements and deliver solutions.
- Document workflows, system designs, and troubleshooting procedures to support knowledge sharing and future maintenance.
Required Technical Skills
Programming
- Proficiency in Python for data processing and automation.
- Experience with shell scripting (e.g., Bash) is a plus.
Data Processing Frameworks
- Hands-on experience with Apache Spark, Hadoop, or Apache Beam.
- Familiarity with ETL/ELT processes and best practices.
Database & Querying
- Strong understanding of SQL with experience in PostgreSQL, MySQL, or Oracle.
- Exposure to NoSQL databases like MongoDB, Cassandra, or DynamoDB.
Cloud Platforms
- Working knowledge of Azure (e.g., Data Factory, Synapse, Data Lake), AWS (e.g., S3, Redshift, Glue), or GCP (BigQuery, Dataflow).
Data Warehousing
- Experience with Databricks, Snowflake, Redshift, or BigQuery.
Data Pipelines & Orchestration
- Familiarity with workflow orchestration tools (Airflow, Prefect, Luigi).
Big Data Tools
- Proficiency working with distributed data systems like HDFS or cloud-native equivalents.
Version Control
- Skilled in Git for collaborative development and code versioning.
Experience
- Years of Experience: Minimum 4 years in data engineering, data warehousing, or a related field.
- Project Exposure: Demonstrated ability to build and optimize scalable data pipelines for both batch and real-time processing.
- Debugging & Optimization: Proven track record of diagnosing performance issues and optimizing data systems.
- Data Governance & Security: Experience implementing compliance with data privacy regulations and applying best practices in data quality and access controls.
Soft Skills
Problem-Solving
- Capable of independently troubleshooting complex data and system issues.
Communication
- Strong ability to collaborate with data analysts, scientists, and other engineers to translate business requirements into effective data solutions.
Documentation
- Competent in documenting data workflows, system designs, and troubleshooting steps clearly and concisely.
Team Collaboration
- Experience working in globally distributed, cross-functional teams, ideally using Agile or similar methodologies.
Education
- Bachelor’s or Master’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Equivalent practical experience can compensate for formal education in some cases.
Certifications (Optional but Valuable)
- AWS Certified Data Analytics – Specialty
- Google Professional Data Engineer
- Microsoft Azure Data Engineer Associate
- Databricks Certified Data Engineer Associate
Additional Considerations
- Analytical & Statistical Skills: A background in data analysis or data science is highly beneficial for designing effective data models and understanding business insights.
- Machine Learning Integration: Exposure to integrating machine learning pipelines, especially GenAI technology, into data workflows is a plus.
- Innovative Mindset: Enthusiasm for exploring new tools, frameworks, and methodologies to continually optimize data solutions.
Why Join Us
- Impactful Role: Shape the architecture and strategy for unstructured data management and analytics, influencing key decisions and driving business value.
- Collaborative Environment: Work alongside a dynamic team of data professionals, leveraging cutting-edge technologies to solve real-world challenges.
- Professional Growth: Expand your technical acumen and leadership capabilities in a role that offers continuous learning and development opportunities.
Minimum Clearance Required to Start:
Not Applicable/None
Parsons is an equal opportunity employer committed to diversity in the workplace. Minority/Female/Disabled/Protected Veteran.