Project Overview
FindGlo is a vertical search engine designed to disrupt the "Brand Tax" in the beauty industry. While marketing often dictates price, the chemical reality of skincare is defined by INCI (International Nomenclature of Cosmetic Ingredients) lists.
This platform bridges that gap. By treating skincare formulas as structured data, FindGlo scrapes, normalizes, and indexes thousands of SKUs to mathematically identify "dupes": products with highly similar ingredient lists but drastically different price points.
Technical Architecture
1. The Data Pipeline (Python & ETL)
The core of FindGlo is a robust Extract, Transform, Load (ETL) pipeline designed to handle the messy reality of e-commerce data.
- Distributed Scraping: A custom Python engine crawls manufacturer sites and authorized retailers, bypassing anti-bot measures to gather verified SKU-level data.
- INCI Normalization: Raw ingredient strings are parsed and mapped to a canonical chemical database (e.g., treating "Water," "Aqua," and "Eau" as a single entity).
- Automated Validation: Cross-referencing logic ensures that formulation changes or discontinuations are reflected in real time.
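The normalization step can be illustrated with a minimal Python sketch. The synonym table `CANONICAL_NAMES` and the function `normalize_inci` are hypothetical names chosen for this example, not the project's actual code; a real pipeline would presumably load its mappings from the canonical chemical database rather than a hard-coded dictionary.

```python
import re

# Hypothetical synonym table for illustration only.
CANONICAL_NAMES = {
    "water": "AQUA",
    "aqua": "AQUA",
    "eau": "AQUA",
    "vitamin e": "TOCOPHEROL",
    "tocopherol": "TOCOPHEROL",
}


def normalize_inci(raw: str) -> list[str]:
    """Split a raw ingredient string and map each entry to a canonical name.

    Simplification: entries are split on commas, so names that contain a
    comma themselves (e.g. "1,2-Hexanediol") are not handled here.
    """
    canonical: list[str] = []
    for token in raw.split(","):
        # "Aqua/Water/Eau" and "Water (Aqua)" both list aliases of one ingredient.
        aliases = [a.strip().lower() for a in re.split(r"[/()]", token) if a.strip()]
        if not aliases:
            continue
        # Use the first alias found in the table; otherwise keep the name as written.
        name = next(
            (CANONICAL_NAMES[a] for a in aliases if a in CANONICAL_NAMES),
            aliases[0].upper(),
        )
        if name not in canonical:  # de-duplicate while preserving list order
            canonical.append(name)
    return canonical


print(normalize_inci("Water (Aqua), Glycerin, Eau, Tocopherol"))
```

Preserving the original order matters because INCI lists are conventionally ordered by concentration, so later stages can still use position as a signal.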
2. The Similarity Algorithm
How do we define a "dupe"? The platform uses a weighted vector approach:
- Ingredient Vectorization: Each product is represented as a high-dimensional vector based on its ingredient list.
- Active Weighting: "Active" ingredients (Retinol, Vitamin C, Acids) are weighted more heavily than solvents or stabilizers.
- Cosine Similarity: We calculate the cosine similarity between product vectors to generate a "Match Score," allowing users to find mathematical equivalents, not just subjective alternatives.
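As a rough sketch of the weighted vector approach described above, the following Python computes a cosine-based Match Score over sparse ingredient vectors. The weights (`ACTIVE_WEIGHT`, `BASE_WEIGHT`), the `ACTIVES` set, and the function names are assumptions made for this example; the actual weighting scheme is not specified here.

```python
import math

# Hypothetical weights: actives count more than solvents or stabilizers.
ACTIVE_WEIGHT = 3.0
BASE_WEIGHT = 1.0
ACTIVES = {"RETINOL", "ASCORBIC ACID", "SALICYLIC ACID"}


def vectorize(ingredients: list[str]) -> dict[str, float]:
    """Represent a product as a sparse vector: ingredient name -> weight."""
    return {
        name: (ACTIVE_WEIGHT if name in ACTIVES else BASE_WEIGHT)
        for name in ingredients
    }


def match_score(a: list[str], b: list[str]) -> float:
    """Cosine similarity between the weighted ingredient vectors of two products."""
    va, vb = vectorize(a), vectorize(b)
    # Only ingredients present in both products contribute to the dot product.
    dot = sum(w * vb[name] for name, w in va.items() if name in vb)
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty ingredient list matches nothing
    return dot / (norm_a * norm_b)


cream_a = ["AQUA", "GLYCERIN", "RETINOL"]
cream_b = ["AQUA", "RETINOL", "SQUALANE"]
print(round(match_score(cream_a, cream_b), 3))
```

In this simplified version, two products that share an active score higher than two that share only a solvent. It ignores ingredient order and concentration, which a production scoring function would likely need to account for.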
3. Frontend Performance
Built on Next.js and Tailwind CSS, the frontend is optimized for instant search and SEO.
- SSG for Product Pages: Individual product pages are built with Static Site Generation (SSG) to ensure instant loading and maximum indexability by Google.
- Edge Caching: Search results for popular queries are cached at the edge, delivering sub-second response times for complex database queries.
Key Features
The "Brand Tax" Calculator
FindGlo exposes price disparities by normalizing cost against active ingredients.
- Price-Per-Active: A custom metric that calculates value based on the concentration of key ingredients rather than volume.
- Visual Comparison: Side-by-side "diff" views highlight exactly which ingredients differ between a $300 luxury cream and its $50 alternative.
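One way to picture both features is the short Python sketch below. It assumes the concentration of the key active is known (`active_pct`), which is often not disclosed on real labels, and it defines Price-Per-Active as price per millilitre of active. That definition, the `Product` fields, and the function names are all illustrative assumptions rather than FindGlo's actual metric.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price_usd: float
    volume_ml: float
    active_pct: float  # assumed known concentration of the key active, in percent
    ingredients: list[str]


def price_per_active(p: Product) -> float:
    """Price in USD per millilitre of active ingredient (illustrative definition)."""
    active_ml = p.volume_ml * p.active_pct / 100.0
    if active_ml <= 0:
        raise ValueError("active amount must be positive")
    return p.price_usd / active_ml


def ingredient_diff(a: Product, b: Product) -> dict[str, list[str]]:
    """Group ingredients into shared / only-in-a / only-in-b, keeping list order."""
    set_a, set_b = set(a.ingredients), set(b.ingredients)
    return {
        "shared": [i for i in a.ingredients if i in set_b],
        "only_a": [i for i in a.ingredients if i not in set_b],
        "only_b": [i for i in b.ingredients if i not in set_a],
    }


# Made-up products for demonstration; not real catalog data.
luxury = Product("Luxury Cream", 300.0, 50.0, 1.0, ["AQUA", "RETINOL", "PARFUM"])
budget = Product("Budget Cream", 50.0, 50.0, 1.0, ["AQUA", "RETINOL", "GLYCERIN"])
print(price_per_active(luxury), price_per_active(budget))
print(ingredient_diff(luxury, budget))
```

With equal volume and concentration, the metric reduces to the raw price ratio. It becomes more informative when a cheaper product is also more dilute.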
Ingredient Decoder
Transforming scientific nomenclature into consumer insights.
- Safety Flags: Automatic detection of common allergens, irritants, and comedogenic ingredients.
- Function Mapping: Hovering over any obscure chemical name (e.g., Tocopherol) instantly explains its function (Vitamin E / Antioxidant).
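Both decoder features amount to lookups against a reference table keyed by canonical ingredient name. The sketch below assumes such a table exists; `INGREDIENT_INFO`, its entries, and the flag labels are placeholder data for illustration and have not been vetted as dermatological guidance.

```python
# Hypothetical reference data; entries and flags are illustrative only.
INGREDIENT_INFO = {
    "TOCOPHEROL": {"function": "Vitamin E / Antioxidant", "flags": []},
    "PARFUM": {"function": "Fragrance", "flags": ["allergen"]},
    "ISOPROPYL MYRISTATE": {"function": "Emollient", "flags": ["comedogenic"]},
    "ALCOHOL DENAT.": {"function": "Solvent", "flags": ["irritant"]},
}

UNKNOWN = {"function": "Unknown", "flags": []}


def decode(ingredients: list[str]) -> list[dict]:
    """Attach a plain-English function and any safety flags to each ingredient."""
    decoded = []
    for name in ingredients:
        info = INGREDIENT_INFO.get(name.upper(), UNKNOWN)
        decoded.append(
            {"name": name, "function": info["function"], "flags": list(info["flags"])}
        )
    return decoded


def safety_flags(ingredients: list[str]) -> dict[str, list[str]]:
    """Map each flag (e.g. "allergen") to the ingredients that carry it."""
    flagged: dict[str, list[str]] = {}
    for entry in decode(ingredients):
        for flag in entry["flags"]:
            flagged.setdefault(flag, []).append(entry["name"])
    return flagged


print(decode(["Tocopherol"])[0]["function"])
print(safety_flags(["Aqua", "Parfum", "Isopropyl Myristate"]))
```

Ingredients missing from the table fall back to "Unknown" rather than raising an error, so a single unmapped name does not block the rest of the list from being decoded.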
User Impact
| User Segment | Problem | Solution |
|---|---|---|
| "Skintellectuals" | Overwhelmed by marketing jargon. | Ingredient Decoder: Plain English explanations for complex chemicals. |
| Budget Shoppers | Cannot afford luxury formulations. | Dupe Finder: Finds 95% chemical matches at 20% of the price. |
| Dermatologists | Need quick ingredient verification. | INCI Index: Rapid lookup of potential patient allergens. |
Results & Scalability
FindGlo currently indexes thousands of SKUs, and the database grows daily. The normalized data structure allows for future expansion into "Personalized AI Dermatologist" features, where recommendation engines can suggest routines based on chemical compatibility rather than brand loyalty.
![FindGlo [Coming soon]](/_next/image?url=%2Fprojects%2Ffind-glo.webp&w=3840&q=75)