About the Role
We are seeking a skilled and motivated Data Engineer to join our growing technology team. The role involves building and maintaining scalable, reliable, and secure data infrastructure to support analytics, data-driven decision-making, and AI/ML pipelines. You’ll work with diverse data types and modern data platforms to design efficient data pipelines and ensure smooth data flow across systems.
Key Responsibilities:
- Design, develop, and maintain robust ETL/ELT pipelines for structured and unstructured data using tools like Apache NiFi, Airflow, or Dagster.
- Build streaming and event-driven data pipelines using Kafka, RabbitMQ, or similar systems.
- Design and manage scalable data lakes built on open table formats (e.g., Apache Hudi, Apache Iceberg, Delta Lake) over Amazon S3 or MinIO.
- Implement and optimize distributed databases such as Cassandra, MongoDB, ClickHouse, and Elasticsearch.
- Ensure data quality, monitoring, and observability across all data pipeline components.
- Work with query engines like Trino for federated data access.
- Manage data versioning and reproducibility using tools like DVC.
- Perform data migrations, query optimization, and system performance tuning.
- Collaborate with analytics, product, and AI teams to provide clean and well-structured datasets.
Must-Have Skills & Experience:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- 1–2 years of experience as a Data Engineer or in a similar role.
- Strong proficiency in Python and SQL.
- Hands-on experience with ETL orchestration tools (Airflow, NiFi, Dagster).
- Familiarity with data lakes, streaming platforms, and distributed databases.
- Experience working with cloud/object storage (Amazon S3, MinIO).
- Knowledge of data governance, security, and pipeline observability.
Good-to-Have Skills:
- Experience with time-series databases (InfluxDB, TimescaleDB, QuestDB).
- Familiarity with graph databases (Neo4j, OrientDB, or RavenDB).
- Understanding of MLOps, feature stores, or data lifecycle automation.
- Exposure to Elasticsearch for indexing and search use cases.
- Experience in query performance tuning and data migration strategies.
Tech Stack in Use
| Category | Tools & Technologies |
|---|---|
| Programming & Scripting | Python, SQL |
| Orchestration & ETL | Apache Airflow, Dagster, Apache NiFi |
| Streaming & Messaging | Apache Kafka, RabbitMQ |
| Data Lakes & Storage | Apache Hudi, Iceberg, Delta Lake, Amazon S3, MinIO |
| Data Versioning | DVC |
| Query Engines | Trino |
| Distributed Databases | Cassandra, ClickHouse, MongoDB, Elasticsearch, Redis, HBase |
| SQL/Relational Databases | PostgreSQL, MySQL, MariaDB |
| Database Change Management | Bytebase |
| Time-Series Databases | InfluxDB, TimescaleDB |
| Graph Databases | OrientDB, RavenDB, Neo4j |
| Distributed Processing | Apache Spark |
Information
- Job Title: Data Engineer
- Company: NUVO AI
- Location: Vapi, Gujarat
- Joining Date: Immediate
- Salary Range: INR 4–8 LPA for freshers; based on interview performance for experienced candidates
- Job Type: Full-Time
- Education: B.E./B.Tech/M.Tech in Computer Science, AI, ML & Data Science (2024, 2025, or earlier graduates).