Introduction
An e-commerce website uses a complex system of indexers and cron jobs to manage product data, update the catalog, and provide a seamless user experience. The primary objective of the indexers and cron jobs is to enhance the browsing experience for customers and keep the website up to date with the latest products and offers. They also automate repetitive tasks, making the website's operations more efficient.
Business Challenge
The website faced a significant challenge in managing its vast product catalog and ensuring data accuracy. It hosted the following:
120+ product categories
92,000+ product catalogues
550+ brands
28,000+ local sellers
It received data from various channels, including the catalog team, which created product categories and catalogues on Magento along with the respective scheme-dealer mappings. The major challenge was to extract, transform, and load this huge dataset into the database while accounting for the frequency of changes and the need for a scalable, stable solution.
Solution
To keep the website up to date with the latest product information, we designed a solution built around a centralized AWS S3 bucket that receives dealer details, catalog data, inventory, and dealer master data from the business team. The data arrives in ETL format three times a day. An automated ETL pipeline transforms each day's files, loads them into the database, and then hands the data to the indexers and cron jobs listed below.
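The extract, transform, and load steps described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the CSV layout, the column names (`sku`, `dealer_id`, `stock`), and the in-memory dictionary standing in for the database. The actual file format and schema are not described in this case study.

```python
import csv
import io


def extract(raw_csv: str) -> list[dict]:
    """Parse one delivered file (assumed to be CSV) into row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[dict]:
    """Normalize fields and drop rows that have no SKU."""
    cleaned = []
    for row in rows:
        sku = (row.get("sku") or "").strip().upper()
        if not sku:
            continue  # a row without a SKU cannot be keyed, so skip it
        cleaned.append({
            "sku": sku,
            "dealer_id": (row.get("dealer_id") or "").strip(),
            "stock": int(row.get("stock") or 0),
        })
    return cleaned


def load(records: list[dict], store: dict) -> int:
    """Upsert records into a dict that stands in for the database."""
    for rec in records:
        store[(rec["sku"], rec["dealer_id"])] = rec
    return len(records)


def run_pipeline(raw_csv: str, store: dict) -> int:
    """Run extract, transform, and load for one file; return rows loaded."""
    return load(transform(extract(raw_csv)), store)


# Hypothetical sample file: one row lacks a SKU, one SKU/dealer pair repeats.
sample = "sku,dealer_id,stock\n tv-100 ,D1,5\n,D2,3\ntv-100,D1,7\n"
database: dict = {}
loaded = run_pipeline(sample, database)
```

Keying the store on the SKU and dealer pair makes the load step an upsert, so re-processing the same file several times a day would overwrite records rather than duplicate them.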
Indexers
- Bulk Indexer: Responsible for precooking product data, including SKU, model, scheme, dealer, and city details, and storing it in MongoDB.
- Express Bulk Indexer: Processes incremental data, updating the catalog with new or modified products.
- Elasticsearch Indexer: Pushes data from MongoDB and Magento DB to Elasticsearch, enabling fast and accurate search results.
- Doc Indexer: Validates and updates offer data from CSV files, ensuring data consistency and accuracy.
- Offer Indexer: Activates or deactivates products based on offer data, updating the catalog accordingly.
- Configurable Product Indexer: Maps similar products based on group identifiers, enabling product variations.
- Special Offer Indexer: Pushes special offers to MongoDB, enhancing the user experience.
- Two-wheeler Indexer: Updates two-wheeler product data, ensuring accurate and up-to-date information.
- Delete Offer Indexer: Deletes duplicate records from the indexer_seller_offers table.
- Dealer Coordinate Indexer: Pushes logistic details to PM Mongo, enhancing the delivery experience.
- Logistic Category Indexer: Updates logistic data in PM Mongo DB, ensuring accurate delivery information.
- FAQ Indexer: Updates FAQ data in Elasticsearch, improving the user experience.
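Two of the indexers above lend themselves to a short illustration: the Configurable Product Indexer, which maps similar products by group identifier, and the Delete Offer Indexer, which removes duplicate offer records. The sketch below is not the production logic. The field names (`group_id`, `sku`, `dealer_id`, `scheme`) and the rule for what counts as a duplicate are assumptions chosen for the example.

```python
from collections import defaultdict


def group_variants(products: list[dict]) -> dict[str, list[str]]:
    """Map each group identifier to the SKUs that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for product in products:
        groups[product["group_id"]].append(product["sku"])
    return dict(groups)


def drop_duplicate_offers(offers: list[dict]) -> list[dict]:
    """Keep the first occurrence of each (sku, dealer_id, scheme) combination."""
    seen = set()
    unique = []
    for offer in offers:
        key = (offer["sku"], offer["dealer_id"], offer["scheme"])
        if key not in seen:
            seen.add(key)
            unique.append(offer)
    return unique


# Hypothetical records for illustration only.
products = [
    {"sku": "PHONE-64GB", "group_id": "PHONE"},
    {"sku": "PHONE-128GB", "group_id": "PHONE"},
    {"sku": "FRIDGE-1", "group_id": "FRIDGE"},
]
offers = [
    {"sku": "PHONE-64GB", "dealer_id": "D1", "scheme": "S1"},
    {"sku": "PHONE-64GB", "dealer_id": "D1", "scheme": "S1"},
    {"sku": "PHONE-64GB", "dealer_id": "D2", "scheme": "S1"},
]

variants = group_variants(products)
unique_offers = drop_duplicate_offers(offers)
```

Grouping yields one entry per configurable product with its variant SKUs, and the de-duplication pass preserves the original order of the offers it keeps.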
Cron Jobs
- Catalog Cron Magento: Creates a JSON file of the entire SKU catalog, pushing it to AWS S3.
- Catalog Cron AEM: Creates PDP pages based on the catalog.json file, ensuring accurate and up-to-date product information.
- Category Cron Magento: Creates a JSON file of categories, pushing it to AWS S3.
- Category Cron AEM: Deletes and recreates PLP pages based on the category.json file.
- Breadcrumb Cron Magento: Generates breadcrumbs for each PDP page, enhancing navigation.
- Order Dump Cron: Generates order-related data in xlsx format for reporting purposes.
- Catalog Product CSV Cron: Generates a complete catalog dump, including all product details.
- One Minute Cron: Pushes order data from SFDC, ensuring timely order processing.
- Online DP PG Push Cron: Pushes online down payment cases to SFDC, ensuring accurate payment processing.
- PG Retry Cron: Adds a RETRY CTA to the "my order" section, enhancing the user experience.
- Google Feed Cron: Generates a JSON file for indexing on Google, improving search engine optimization.
- Search Feed Cron: Generates a large JSON file for a third-party search provider, enabling fast and accurate search results.
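As a rough illustration of what the catalog and breadcrumb crons produce, the sketch below serializes a small SKU catalog to JSON and builds a breadcrumb trail for a product page. The JSON layout, the field names, and the "Home" root label are assumptions, and the upload to AWS S3 is left out so the example stays self-contained.

```python
import json


def build_catalog_json(products: list[dict]) -> str:
    """Serialize the SKU catalog to JSON, sorted by SKU for stable output."""
    ordered = sorted(products, key=lambda p: p["sku"])
    return json.dumps({"products": ordered}, indent=2)


def build_breadcrumb(category_path: list[str], product_name: str) -> list[str]:
    """Return the breadcrumb trail for a PDP: root, categories, then product."""
    return ["Home", *category_path, product_name]


# Hypothetical catalog entries for illustration only.
catalog = [
    {"sku": "TV-200", "name": "55 inch TV", "category_path": ["Electronics", "Televisions"]},
    {"sku": "AC-100", "name": "1.5 Ton AC", "category_path": ["Appliances", "Air Conditioners"]},
]

catalog_json = build_catalog_json(catalog)
breadcrumb = build_breadcrumb(catalog[0]["category_path"], catalog[0]["name"])
```

Sorting by SKU before serializing keeps successive catalog files easy to diff, which can help when checking what changed between runs.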
Specific Actions Taken
Set Up Data Sharing Process: We established a process for the business team to share data in a specific format, and aligned how often they send data with the schedule of our ETL pipeline.
Aligned Business Team: We worked closely with the business team to ensure they understood the data sharing process and frequency, minimizing manual intervention and ensuring a smooth process.
Impact
Improved Data Syncing: The website's database is now synced with the latest data, ensuring accuracy and reducing manual intervention.
Increased Efficiency: Automated processing of large datasets has improved efficiency, reducing the time and effort required to update the website.
Enhanced User Experience: With the latest data available, users can now search and find products more easily, enhancing their overall experience on the website.
Business KPIs
Monthly Indexer Performance Report: Tracks the performance of each indexer, including success and failure rates.
Indexer Dashboard: Provides a detailed report of all indexers, enabling performance analysis and optimization.
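The success and failure rates tracked in the monthly report could be computed along the following lines. This is a sketch under assumptions: the run log is represented as simple (indexer name, succeeded) pairs, and the report layout is invented for the example. The real report's data source and fields are not described here.

```python
from collections import Counter


def indexer_rates(runs: list[tuple[str, bool]]) -> dict[str, dict]:
    """Compute run count, success rate, and failure rate per indexer."""
    totals: Counter = Counter()
    successes: Counter = Counter()
    for name, succeeded in runs:
        totals[name] += 1
        if succeeded:
            successes[name] += 1

    report = {}
    for name, total in totals.items():
        success_rate = successes[name] / total
        report[name] = {
            "runs": total,
            "success_rate": round(success_rate, 4),
            "failure_rate": round(1 - success_rate, 4),
        }
    return report


# Hypothetical run log for illustration only.
run_log = [
    ("bulk", True),
    ("bulk", True),
    ("bulk", False),
    ("bulk", True),
    ("faq", True),
]
monthly_report = indexer_rates(run_log)
```

With this sample log, the bulk indexer shows three successes out of four runs, so its success rate is 0.75 and its failure rate is 0.25.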