Karthik Kandaswamy

Facial Emotion Recognition (Assistive Tech)

Compared three approaches (CNN on pixels, MLP on pixels, and MLP on landmark-distance features) to classify five emotions in 48×48 face crops. Built to understand accuracy vs. robustness tradeoffs for real-time, accessible use cases.

TensorFlow/Keras • CNNs • Feature Engineering • Evaluation • Assistive Tech
Timeline: Spring 2023 – Spring 2024 • Presented at Synopsys Science Fair (Spring 2024)
Task: 5-class emotion (ANGRY / HAPPY / SAD / SURPRISE / NEUTRAL)
Inputs: 48×48 grayscale pixels + landmark-derived distance features
Models: CNN + 2× MLP, compared across pipelines

Interactive case study

Model: CNN on pixels. Best at capturing texture cues; typically strongest accuracy with enough data.
Input: FER-style grayscale faces (48×48), labeled ANGRY / HAPPY / SAD / SURPRISE / NEUTRAL.
Reported performance: ~70% validation accuracy (from project writeup / prior runs).
Architecture (high-level): 48×48 pixels → Conv + Pool (× N) → Flatten → Dense → Softmax (5 classes)
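The pipeline above can be sketched as a single forward pass. This is a minimal NumPy sketch of the shape flow only; the layer sizes (8 filters, 3×3 kernels, one conv+pool stage) and random weights are illustrative assumptions, not the project's actual hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, kernels):
    """Naive valid convolution: x is (H, W), kernels is (K, kh, kw)."""
    K, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[k])
    return out

def max_pool2(x):
    """2x2 max pooling over the trailing two axes (even dims assumed)."""
    K, H, W = x.shape
    return x.reshape(K, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# 48x48 grayscale face crop (random stand-in for a real image)
img = rng.standard_normal((48, 48))

feat = np.maximum(conv2d_valid(img, rng.standard_normal((8, 3, 3))), 0)  # conv + ReLU -> (8, 46, 46)
feat = max_pool2(feat)                                                   # pool        -> (8, 23, 23)
flat = feat.reshape(-1)                                                  # flatten     -> (4232,)
logits = rng.standard_normal((5, flat.size)) @ flat                      # dense       -> (5,)
probs = softmax(logits)                                                  # 5-class probability distribution
```

In the actual project this stack would be a few Keras layers; the point here is just how a 48×48 crop turns into five class probabilities.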
This is a visual explanation of the pipeline (not a live inference demo yet).
Speed vs. robustness: the CNN is strongest on consistent data, but landmark features can help if domain shift is expected.
What worked well
  • Learns spatial features automatically (edges → patterns)
  • Good performance when data is consistent
  • Works well for end-to-end pipelines
Limitations / gotchas
  • More sensitive to lighting/background/domain shift
  • Heavier compute than a small MLP
Confusion matrix (figure)
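The confusion-matrix reading emphasized in this project (which classes get mistaken for which) takes only a few lines of NumPy. The class labels below match the project's five emotions; the predictions are made-up illustrative data in which angry faces are often misread as sad.

```python
import numpy as np

LABELS = ["ANGRY", "HAPPY", "SAD", "SURPRISE", "NEUTRAL"]

def confusion_matrix(y_true, y_pred, n_classes=5):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, pred) pair
    return cm

# Made-up predictions: two ANGRY samples predicted as SAD
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 3, 4])
y_pred = np.array([0, 2, 2, 0, 1, 1, 2, 2, 3, 4])

cm = confusion_matrix(y_true, y_pred)
per_class_recall = cm.diagonal() / cm.sum(axis=1)

# Most-confused off-diagonal pair, ignoring correct predictions
off = cm.copy()
np.fill_diagonal(off, 0)
i, j = np.unravel_index(off.argmax(), off.shape)
print(f"{LABELS[i]} most often predicted as {LABELS[j]}")  # ANGRY most often predicted as SAD
```

Per-class recall and the worst off-diagonal cell surface exactly the failure mode overall accuracy hides: here the model scores 80% overall while recalling only half of the ANGRY samples.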

Approach

  • Baseline: MLP on flattened pixels to set a simple performance floor.
  • CNN: learn spatial features directly from 48×48 images (stronger representation).
  • Landmarks: feature-engineered distances to reduce sensitivity to background/lighting changes.
  • Evaluation mindset: focus on confusion patterns (e.g., angry vs sad), not just overall accuracy.
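One way the landmark pipeline's features could be computed, as a sketch under assumptions: the project's exact landmark set and distance choices aren't specified here, so the five named points below are hypothetical. The idea is to take pairwise distances between key points and normalize by interocular distance so the features are invariant to face size in the crop.

```python
import numpy as np
from itertools import combinations

# Hypothetical landmark set: (x, y) points in pixel coordinates.
# Names and positions are illustrative, not the project's actual landmarks.
landmarks = {
    "left_eye":    (14.0, 18.0),
    "right_eye":   (34.0, 18.0),
    "nose_tip":    (24.0, 28.0),
    "mouth_left":  (16.0, 36.0),
    "mouth_right": (32.0, 36.0),
}

def distance_features(pts):
    """All pairwise distances, normalized by interocular distance
    so the features don't depend on how large the face is in the crop."""
    names = sorted(pts)
    coords = {n: np.asarray(pts[n]) for n in names}
    iod = np.linalg.norm(coords["left_eye"] - coords["right_eye"])
    feats = {}
    for a, b in combinations(names, 2):
        feats[f"{a}-{b}"] = np.linalg.norm(coords[a] - coords[b]) / iod
    return feats

feats = distance_features(landmarks)
# 5 points -> 10 distance features, versus 48*48 = 2304 raw pixel inputs:
# a much smaller MLP input that ignores background and lighting entirely.
```

Because the features depend only on point geometry, this is the pipeline that trades some accuracy for robustness to lighting and background shift.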