
BAJAJ TECHNOLOGY SERVICES

Website Indexers and Crons for a leading E-commerce Platform

This case study provides an in-depth look at how the indexers and crons work, covering their business purpose, technical details, and performance metrics.
Oct 22, 2024

Introduction

An e-commerce website uses a complex system of indexers and crons to manage product data, update the catalog, and provide a seamless user experience. The primary objective of the indexers and crons is to improve the browsing experience for customers and keep the website up to date with the latest products and offers. They also automate repetitive tasks, making the website's operations more efficient.

Business Challenge

The e-commerce website faced a significant challenge in managing its vast product catalog and ensuring data accuracy. The website hosted the following:

120+ product categories

92,000+ product catalogues

550+ brands

28,000+ local sellers

It received data from various channels, including the catalog team, which created product categories and catalogues on Magento along with the respective scheme-dealer mapping. The major challenge was to extract, transform, and load this huge dataset into the database while accounting for the frequency of changes and the need for a scalable, stable solution.

Solution

To keep the website up to date with the latest product information, we designed a solution built around a centralized AWS S3 bucket that receives dealer details, catalog data, inventory, and dealer master data from the business team. This data is received in ETL format three times a day. The daily files are processed through an automated ETL pipeline that transforms the data and loads it into the database, after which it is processed by the indexers and cron jobs listed below.
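As a rough illustration of the extract, transform, and load steps described above, the following minimal sketch processes one small dealer inventory file. The column names, the sample rows, and the in-memory dictionary standing in for the database are all assumptions made for this example; the case study does not describe the real file layout or storage schema, and reading the file from S3 is left out.

```python
import csv
import io

# Hypothetical column layout for a dealer inventory file; the real file
# format is not described in this case study.
SAMPLE_FILE = """sku,dealer_id,city,stock
TV-001, D10 ,pune,5
TV-001,D11,Mumbai,0
,D12,Delhi,3
"""


def extract(text):
    """Parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace, normalise city names, and drop rows without a SKU."""
    cleaned = []
    for row in rows:
        sku = row["sku"].strip()
        if not sku:
            continue  # skip records that cannot be keyed
        cleaned.append({
            "sku": sku,
            "dealer_id": row["dealer_id"].strip(),
            "city": row["city"].strip().title(),
            "stock": int(row["stock"]),
        })
    return cleaned


def load(records, store):
    """Upsert records into a dict standing in for the database."""
    for record in records:
        store[(record["sku"], record["dealer_id"])] = record
    return store


database = load(transform(extract(SAMPLE_FILE)), {})
```

Keying the store on the SKU and dealer pair means a file that is delivered again overwrites earlier rows instead of duplicating them, which is one simple way to keep repeated daily loads safe.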

Indexers

  • Bulk Indexer: Responsible for precooking product data, including SKU, model, scheme, dealer, and city details, and storing it in MongoDB.
  • Express Bulk Indexer: Processes incremental data, updating the catalog with new or modified products.
  • Elastic Search Indexer: Pushes data from MongoDB and Magento DB to Elasticsearch, enabling fast and accurate search results.
  • Doc Indexer: Validates and updates offer data from CSV files, ensuring data consistency and accuracy.
  • Offer Indexer: Activates or deactivates products based on offer data, updating the catalog accordingly.
  • Configurable Product Indexer: Maps similar products based on group identifiers, enabling product variations.
  • Special Offer Indexer: Pushes special offers to MongoDB, enhancing the user experience.
  • Two-wheeler Indexer: Updates two-wheeler product data, ensuring accurate and up-to-date information.
  • Delete Offer Indexer: Deletes duplicate records from the indexer_seller_offers table.
  • Dealer Coordinate Indexer: Pushes logistic details to PM Mongo, enhancing the delivery experience.
  • Logistic Category Indexer: Updates logistic data in PM Mongo DB, ensuring accurate delivery information.
  • FAQ Indexer: Updates FAQ data in Elasticsearch, improving the user experience.
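To make the idea of incremental indexing concrete, here is a minimal sketch of how an indexer such as the Express Bulk Indexer might write only new or modified products. The case study does not say how changes are detected; comparing a content hash per SKU is an assumption chosen for this example, and the plain dictionaries stand in for the document store and its change-tracking metadata.

```python
import hashlib
import json


def fingerprint(product):
    """Return a stable hash of a product record's contents."""
    payload = json.dumps(product, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def incremental_index(products, index, fingerprints):
    """Write only new or modified products into the index.

    `index` and `fingerprints` are plain dicts standing in for the
    document store and its change-tracking metadata (both hypothetical).
    Returns the list of SKUs that were written.
    """
    changed = []
    for product in products:
        sku = product["sku"]
        digest = fingerprint(product)
        if fingerprints.get(sku) != digest:
            index[sku] = product
            fingerprints[sku] = digest
            changed.append(sku)
    return changed


index, fingerprints = {}, {}

# First run: both products are new, so both are indexed.
first_run = incremental_index(
    [{"sku": "TV-001", "price": 30000}, {"sku": "AC-002", "price": 42000}],
    index, fingerprints,
)

# Second run: only the product whose price changed is re-indexed.
second_run = incremental_index(
    [{"sku": "TV-001", "price": 29500}, {"sku": "AC-002", "price": 42000}],
    index, fingerprints,
)
```

Skipping unchanged records keeps each incremental run proportional to the number of changes, not to the size of the full catalog.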

Cron Jobs

  • Catalog Cron Magento: Creates a JSON file of the entire SKU catalog, pushing it to AWS S3.
  • Catalog Cron AEM: Creates PDP pages based on the catalog.json file, ensuring accurate and up-to-date product information.
  • Category Cron Magento: Creates a JSON file of categories, pushing it to AWS S3.
  • Category Cron AEM: Deletes and recreates PLP pages based on the category.json file.
  • Breadcrumb Cron Magento: Generates breadcrumbs for each PDP page, enhancing navigation.
  • Order Dump Cron: Generates order-related data in xlsx format for reporting purposes.
  • Catalog Product CSV Cron: Generates a complete catalog dump, including all product details.
  • One Minute Cron: Pushes order data from SFDC, ensuring timely order processing.
  • Online DP PG Push Cron: Pushes online down payment cases to SFDC, ensuring accurate payment processing.
  • PG Retry Cron: Adds a RETRY CTA to the "my order" section, enhancing the user experience.
  • Google Feed Cron: Generates a JSON file for indexing on Google, improving search engine optimization.
  • Search Feed Cron: Generates a large JSON file for a third-party search provider, enabling fast and accurate search results.
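The export step performed by a job like Catalog Cron Magento can be sketched as follows. This is only an illustration under stated assumptions: the product fields, the shape of the JSON document, and the helper names `build_catalog` and `write_catalog` are hypothetical, and the upload to AWS S3 is left out so that the example runs without credentials.

```python
import json
import tempfile
from pathlib import Path


def build_catalog(products):
    """Shape product records into one JSON-serialisable catalog document."""
    return {
        "count": len(products),
        "products": sorted(products, key=lambda p: p["sku"]),
    }


def write_catalog(products, directory):
    """Write catalog.json into `directory` and return its path."""
    path = Path(directory) / "catalog.json"
    path.write_text(
        json.dumps(build_catalog(products), indent=2),
        encoding="utf-8",
    )
    return path


# Hypothetical product records; the real catalog schema is not documented here.
products = [
    {"sku": "TV-001", "name": "LED TV", "category": "Televisions"},
    {"sku": "AC-002", "name": "Split AC", "category": "Air Conditioners"},
]

with tempfile.TemporaryDirectory() as tmp:
    catalog_path = write_catalog(products, tmp)
    # Read the file back, as a downstream consumer of catalog.json would.
    exported = json.loads(catalog_path.read_text(encoding="utf-8"))
```

Sorting by SKU makes successive exports easy to diff, which can help when a downstream job needs to tell what changed between two runs.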

Specific Actions Taken

Set Up Data Sharing Process: We established a process for the business team to share data in a specific format, and aligned how often they send data with the schedule of our ETL pipeline.

Aligned Business Team: We worked closely with the business team to ensure they understood the data sharing process and frequency, minimizing manual intervention and ensuring a smooth process.

Impact

Improved Data Syncing: The website's database is now synced with the latest data, ensuring accuracy and reducing manual intervention.

Increased Efficiency: Automated processing of large datasets has improved efficiency, reducing the time and effort required to update the website.

Enhanced User Experience: With the latest data available, users can now search and find products more easily, enhancing their overall experience on the website.

Business KPIs

Monthly Indexer Performance Report: Tracks the performance of each indexer, including success and failure rates.

Sample indexers running status

Indexer Dashboard: Provides a detailed report of all indexers, enabling performance analysis and optimization.
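The success and failure rates tracked in the monthly report could be computed along the following lines. The run log below is invented sample data and the report fields are assumptions; the case study does not describe how runs are actually recorded.

```python
from collections import Counter

# Hypothetical run log: (indexer name, outcome) pairs for one month.
RUNS = [
    ("Bulk Indexer", "success"),
    ("Bulk Indexer", "success"),
    ("Bulk Indexer", "failure"),
    ("Bulk Indexer", "success"),
    ("Offer Indexer", "success"),
    ("Offer Indexer", "failure"),
]


def performance_report(runs):
    """Return per-indexer run counts with success and failure rates in percent."""
    totals = Counter(name for name, _ in runs)
    successes = Counter(name for name, outcome in runs if outcome == "success")
    report = {}
    for name, total in totals.items():
        success_rate = 100.0 * successes[name] / total
        report[name] = {
            "runs": total,
            "success_rate": success_rate,
            "failure_rate": 100.0 - success_rate,
        }
    return report


report = performance_report(RUNS)
```

With this sample log, the Bulk Indexer shows a 75% success rate over four runs and the Offer Indexer 50% over two.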


Written by

Dhiraj Jha
Head - commerce & experience