Learn Creative Coding (#95) - Face Mesh and Expression

1 day ago

Two episodes ago we tracked the body as a skeleton -- 17 keypoints from nose to ankles. Last episode we zoomed into the hands -- 21 keypoints per hand, enough to detect pinches, count fingers, classify gestures. Each step brought finer resolution. 17 points for the whole body. 21 points for one hand. Now we go finer still: 468 points on the face.

Face mesh is a different kind of tracking. The body gives you posture. Hands give you gestures. The face gives you expression. Raised eyebrows. A smile spreading. Lips parting in surprise. A jaw clenching. The face is the most communicative surface on the human body -- we read faces constantly, unconsciously, millisecond by millisecond. And now we can read them computationally. 468 landmarks tracing every contour: jawline, cheekbones, nose ridge, eye outlines, eyebrow arcs, lip boundaries, forehead. Enough detail to reconstruct a 3D wireframe of the face, deform it in real time as expressions change, and map those changes to visual output.

ml5 wraps MediaPipe's FaceMesh model in the same API pattern we've used for bodyPose and handPose. Load model, feed video, get results. The difference is scale -- instead of 17 or 21 keypoints you get 468, each with x, y, and z coordinates. And the face mesh comes with a triangulation map that tells you how to connect those points into a surface. It's not just dots on a face. It's a full mesh -- geometry you can render, deform, texture, and project onto.

Setting up face mesh

Same setup pattern as the last two episodes. Load the model in preload, capture video, start continuous detection. The API call is ml5.faceMesh.

let video;
let faceMesh;
let faces = [];

function preload() {
  faceMesh = ml5.faceMesh({ flipped: true });
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO, { flipped: true });
  video.size(640, 480);
  video.hide();

  faceMesh.detectStart(video, function(results) {
    faces = results;
  });
}

function draw() {
  image(video, 0, 0);

  if (faces.length === 0) return;

  // draw all 468 keypoints
  const face = faces[0];
  for (const kp of face.keypoints) {
    noStroke();
    fill(180, 120, 220, 120);
    circle(kp.x, kp.y, 3);
  }
}

Run this and look at the camera. Your face fills with a cloud of purple dots -- 468 of them, tracing every surface contour. Move your head and they follow. Raise your eyebrows and the dots above your eyes shift upward. Open your mouth and the lip dots spread apart. It's denser than anything we've seen from the previous models. Where pose detection gave us a stick figure and hand tracking gave us a skeleton, face mesh gives us a surface. The dots are so close together they almost form a continuous field.

One thing to note: face mesh is the heaviest model we've used so far. Expect 8-15 fps for detection depending on your hardware. If you're also running complex visuals, you'll want to keep your render loop efficient. Same optimization tricks from episode 92 apply -- keep the video input small if you need more headroom.

Understanding the 468 landmarks

The 468 keypoints aren't random dots on the face. They follow a specific layout defined by the MediaPipe face mesh model. Certain ranges of indices map to specific facial features:

// key landmark regions (approximate index ranges)
// these are the most useful groups for creative coding

const FACE_REGIONS = {
  // lip contours
  lipsOuter: [61, 146, 91, 181, 84, 17, 314, 405, 321, 375, 291,
              409, 270, 269, 267, 0, 37, 39, 40, 185],
  lipsInner: [78, 191, 80, 81, 82, 13, 312, 311, 310, 415,
              308, 324, 318, 402, 317, 14, 87, 178, 88, 95],

  // left eye contour
  leftEye: [33, 7, 163, 144, 145, 153, 154, 155, 133,
            173, 157, 158, 159, 160, 161, 246],

  // right eye contour
  rightEye: [362, 382, 381, 380, 374, 373, 390, 249,
             263, 466, 388, 387, 386, 385, 384, 398],

  // left eyebrow
  leftBrow: [70, 63, 105, 66, 107, 55, 65, 52, 53, 46],

  // right eyebrow
  rightBrow: [300, 293, 334, 296, 336, 285, 295, 282, 283, 276],

  // nose bridge and tip
  nose: [168, 6, 197, 195, 5, 4, 1, 19, 94, 2],

  // jawline
  jaw: [10, 338, 297, 332, 284, 251, 389, 356, 454,
        323, 361, 288, 397, 365, 379, 378, 400,
        377, 152, 148, 176, 149, 150, 136, 172,
        58, 132, 93, 234, 127, 162, 21, 54, 103, 67, 109]
};

These indices come from MediaPipe's documentation. You don't need to memorize them -- just know that specific groups of landmarks trace specific facial features. The lip landmarks form closed loops around the mouth. The eye landmarks trace the eye opening. The list I call jaw is really the full face oval: it starts at the top of the forehead (landmark 10), runs down one side of the face, around the chin (landmark 152), and back up the other side.

Let me draw the feature contours instead of raw dots:

function drawFaceContour(face, indices, col) {
  stroke(col[0], col[1], col[2], 150);
  strokeWeight(1.5);
  noFill();

  beginShape();
  for (const idx of indices) {
    const kp = face.keypoints[idx];
    vertex(kp.x, kp.y);
  }
  endShape(CLOSE);
}

function draw() {
  image(video, 0, 0);

  if (faces.length === 0) return;

  const face = faces[0];

  drawFaceContour(face, FACE_REGIONS.lipsOuter, [220, 100, 120]);
  drawFaceContour(face, FACE_REGIONS.lipsInner, [255, 130, 150]);
  drawFaceContour(face, FACE_REGIONS.leftEye, [100, 180, 220]);
  drawFaceContour(face, FACE_REGIONS.rightEye, [100, 180, 220]);
  drawFaceContour(face, FACE_REGIONS.leftBrow, [180, 200, 140]);
  drawFaceContour(face, FACE_REGIONS.rightBrow, [180, 200, 140]);
  drawFaceContour(face, FACE_REGIONS.nose, [200, 180, 150]);
  drawFaceContour(face, FACE_REGIONS.jaw, [160, 160, 180]);
}

Now instead of a dot cloud you get colored outlines tracing your facial features -- pink lips, blue eyes, green eyebrows, a beige nose line, a gray jawline. Move your face and the contours follow. Smile and the lip contour widens and curves upward. Raise your eyebrows and the green arcs shift. It's a computational line drawing of your face, updating in real time. Already kind of beautiful as-is.
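Since a region is just a list of indices, you can also reduce it to a single point, for example to anchor a label or an emitter on the mouth. Here is a minimal sketch of that idea in plain JavaScript. The name regionCentroid is my own (not an ml5 function), and it assumes keypoints are { x, y } objects like the ones in face.keypoints.

```javascript
// Hypothetical helper: average position of the landmarks listed in `indices`.
// `keypoints` is assumed to be an array of { x, y } objects, like face.keypoints.
function regionCentroid(keypoints, indices) {
  let sumX = 0;
  let sumY = 0;
  let count = 0;
  for (const idx of indices) {
    const kp = keypoints[idx];
    if (!kp) continue; // skip indices the model did not return
    sumX += kp.x;
    sumY += kp.y;
    count++;
  }
  if (count === 0) return null;
  return { x: sumX / count, y: sumY / count };
}

// stand-in data: four landmarks forming a 10 x 10 square
const demoKeypoints = [
  { x: 0, y: 0 }, { x: 10, y: 0 }, { x: 10, y: 10 }, { x: 0, y: 10 }
];
const demoCenter = regionCentroid(demoKeypoints, [0, 1, 2, 3]);
console.log(demoCenter); // { x: 5, y: 5 }
```

In a sketch you would call it as regionCentroid(face.keypoints, FACE_REGIONS.lipsOuter) and draw at the returned x and y.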

The triangulation: from dots to mesh

The 468 points become a true mesh when you connect them with triangles. MediaPipe provides a triangulation array -- a long list of index triplets where each triplet defines one triangle. The triangulation covers the entire face surface.

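// MediaPipe's TRIANGULATION is a flat array of landmark indices
// every 3 consecutive values form one triangle
// example: [0, 1, 2, 0, 2, 3, ...] means triangle(0,1,2), triangle(0,2,3), etc.
// roughly 900 triangles, so a few thousand index entries in total

// ml5 hands out the triangle list through faceMesh.getTriangles()

function drawFaceMesh(face) {
  // depending on the ml5 version the list may come back as [a, b, c] triplets
  // rather than one flat array; flat() leaves a flat array unchanged,
  // so the stride-3 loop below works either way
  const tri = faceMesh.getTriangles().flat();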

  stroke(120, 180, 200, 60);
  strokeWeight(0.5);
  noFill();

  for (let i = 0; i < tri.length; i += 3) {
    const a = face.keypoints[tri[i]];
    const b = face.keypoints[tri[i + 1]];
    const c = face.keypoints[tri[i + 2]];

    triangle(a.x, a.y, b.x, b.y, c.x, c.y);
  }
}

function draw() {
  background(12, 15, 22);

  if (faces.length === 0) return;

  drawFaceMesh(faces[0]);
}

The result is a wireframe face. Hundreds of tiny triangles covering your entire face, deforming in real time as you move. Turn your head and the mesh rotates. Open your mouth and the triangles around your lips stretch and spread. Raise your eyebrows and the forehead triangles compress. It's like a digital mask that follows every micro-movement of your face. The mesh is denser around the eyes and mouth (where expressions happen) and sparser on the cheeks and forehead (where less detail is needed). That's deliberate -- the model allocates more resolution where it matters most.

This wireframe is the foundation for everything else. Once you have a mesh, you can fill the triangles with color, project textures onto them, distort them, exaggerate them, map data onto them. The face becomes a canvas with 900+ individually addressable surfaces.
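If you want to address triangles one by one, it helps to have them as triplets instead of a flat list. The sketch below is one way to get there. toTriplets is a hypothetical helper name, and the only assumption is that the input is an array of landmark indices, either flat or already grouped in threes.

```javascript
// Hypothetical helper: turn a triangle list into [[a, b, c], ...] triplets.
// Accepts either a flat index array or an array that is already triplets.
function toTriplets(triangles) {
  const flat = triangles.flat();
  const out = [];
  for (let i = 0; i + 2 < flat.length; i += 3) {
    out.push([flat[i], flat[i + 1], flat[i + 2]]);
  }
  return out;
}

console.log(toTriplets([0, 1, 2, 0, 2, 3]));     // [ [ 0, 1, 2 ], [ 0, 2, 3 ] ]
console.log(toTriplets([[0, 1, 2], [0, 2, 3]])); // same result
```

With triplets, triangle number k is simply the k-th entry, which makes it easy to color, skip, or animate individual faces of the mesh.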

Expression detection: measuring the face

The real power of face mesh isn't just tracking where the face is -- it's tracking what the face is doing. Expressions change the distances between landmarks. A smile pulls the mouth corners apart. Surprise opens the mouth and raises the eyebrows. A frown pulls the eyebrow inner ends together and down. You detect expressions by measuring landmark distances and comparing them to neutral baselines.

function getFaceDist(face, idxA, idxB) {
  const a = face.keypoints[idxA];
  const b = face.keypoints[idxB];
  return dist(a.x, a.y, b.x, b.y);
}

function detectExpressions(face) {
  // mouth openness: distance between upper and lower inner lips
  // landmark 13 = upper inner lip center
  // landmark 14 = lower inner lip center
  const mouthOpen = getFaceDist(face, 13, 14);

  // mouth width: distance between mouth corners
  // landmark 61 = left mouth corner
  // landmark 291 = right mouth corner
  const mouthWidth = getFaceDist(face, 61, 291);

  // normalize mouth open by face height for distance independence
  // landmark 10 = top of forehead
  // landmark 152 = bottom of chin
  const faceHeight = getFaceDist(face, 10, 152);
  const mouthOpenNorm = mouthOpen / faceHeight;

  // eyebrow raise: distance from eyebrow to eye
  // left: brow landmark 65, eye landmark 159
  // right: brow landmark 295, eye landmark 386
  const leftBrowRaise = getFaceDist(face, 65, 159) / faceHeight;
  const rightBrowRaise = getFaceDist(face, 295, 386) / faceHeight;
  const browRaise = (leftBrowRaise + rightBrowRaise) / 2;

  // eye openness: distance between upper and lower eyelid
  // left eye: 159 (upper), 145 (lower)
  // right eye: 386 (upper), 374 (lower)
  const leftEyeOpen = getFaceDist(face, 159, 145) / faceHeight;
  const rightEyeOpen = getFaceDist(face, 386, 374) / faceHeight;
  const eyeOpen = (leftEyeOpen + rightEyeOpen) / 2;

  // smile detection: mouth width relative to neutral
  // wider mouth + slight upward curve = smile
  const smileRatio = mouthWidth / faceHeight;

  return {
    mouthOpen: mouthOpenNorm,
    browRaise: browRaise,
    eyeOpen: eyeOpen,
    smile: smileRatio,
    mouthWidth: mouthWidth
  };
}

All measurements are normalized by face height. This is the same principle as normalizing arm spread by shoulder width in episode 93 -- it makes the values independent of how close you are to the camera. A mouth that's 10 pixels open at arm's length is the same ratio as a mouth that's 30 pixels open up close. Without normalization, every expression threshold would break when you lean forward or back.
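To make that concrete, here is the same arithmetic on two made-up faces, one three times larger than the other. This is a plain-JavaScript sketch: distance2D, mouthOpenRatio and fakeFace are illustration names of mine, and the coordinates are invented, not model output.

```javascript
// plain-JS distance so the example runs outside p5 (p5's dist() does the same)
function distance2D(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Hypothetical helper: mouth opening divided by face height.
// Same landmark numbers as detectExpressions(): 13/14 lips, 10/152 face height.
function mouthOpenRatio(keypoints) {
  const mouthOpen = distance2D(keypoints[13], keypoints[14]);
  const faceHeight = distance2D(keypoints[10], keypoints[152]);
  return faceHeight > 0 ? mouthOpen / faceHeight : 0;
}

// sparse stand-in face: only the four landmarks we need, scaled by `s`
function fakeFace(s) {
  const kp = [];
  kp[10] = { x: 100 * s, y: 20 * s };   // top of forehead
  kp[152] = { x: 100 * s, y: 220 * s }; // bottom of chin
  kp[13] = { x: 100 * s, y: 150 * s };  // upper inner lip
  kp[14] = { x: 100 * s, y: 160 * s };  // lower inner lip
  return kp;
}

const far = mouthOpenRatio(fakeFace(1));  // mouth 10 px open, face 200 px tall
const near = mouthOpenRatio(fakeFace(3)); // mouth 30 px open, face 600 px tall
console.log(far, near); // both 0.05
```

The pixel distances triple, the ratio stays put. That is the whole trick.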

Let's visualize these values:

function draw() {
  background(12, 15, 22);
  image(video, 0, 0, 320, 240);

  if (faces.length === 0) return;

  const face = faces[0];
  const expr = detectExpressions(face);

  // draw expression meters
  const labels = ['mouth open', 'brow raise', 'eye open', 'smile'];
  const values = [expr.mouthOpen, expr.browRaise, expr.eyeOpen, expr.smile];
  const ranges = [[0, 0.15], [0.04, 0.09], [0.01, 0.06], [0.25, 0.45]];

  for (let i = 0; i < labels.length; i++) {
    const x = 350;
    const y = 30 + i * 55;

    // normalize to 0-1 for display
    const norm = map(values[i], ranges[i][0], ranges[i][1], 0, 1, true);

    // label
    fill(160);
    noStroke();
    textSize(10);
    textFont('monospace');
    textAlign(LEFT);
    text(labels[i], x, y);

    // bar background
    fill(30, 35, 45);
    rect(x, y + 5, 250, 16, 3);

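    // bar fill: switch to HSB so the first argument really is a hue
    // (in the default RGB mode it would be read as the red channel)
    const hue = map(norm, 0, 1, 200, 0);
    colorMode(HSB, 360, 100, 100);
    fill(hue, 60, 70);
    colorMode(RGB);
    rect(x, y + 5, norm * 250, 16, 3);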

    // raw value
    fill(100);
    textSize(8);
    text(values[i].toFixed(4), x + 255, y + 17);
  }
}

Open your mouth and the "mouth open" bar fills up. Raise your eyebrows and the "brow raise" bar jumps. Widen your eyes and "eye open" increases. Smile and the "smile" bar moves right. Each expression is a continuous value, not a binary state. You can half-smile. You can slightly raise one eyebrow. The face mesh captures all of it as smooth, floating-point data. That continuity is what makes it such a rich source for creative coding. It's not yes/no. It's a slider.
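The meters above use hard-coded ranges, which will not fit every face. One way to get the neutral baseline mentioned earlier is to record a short stretch of relaxed face and measure everything relative to that. The following is only a sketch under that assumption: createBaseline and its methods are names I made up, and it expects the kind of object detectExpressions() returns.

```javascript
// Hypothetical helper: collects neutral-face samples, then reports how far
// later readings deviate from that personal baseline.
function createBaseline(sampleCount) {
  const sums = {};
  let collected = 0;

  return {
    // feed one expression object per frame while the face is neutral
    addSample(expr) {
      if (collected >= sampleCount) return;
      for (const key of Object.keys(expr)) {
        sums[key] = (sums[key] || 0) + expr[key];
      }
      collected++;
    },
    isReady() {
      return collected >= sampleCount;
    },
    // relative change per channel: 0 = neutral, 0.5 = 50% above neutral
    relative(expr) {
      const out = {};
      for (const key of Object.keys(expr)) {
        const mean = sums[key] / collected;
        out[key] = mean > 0 ? (expr[key] - mean) / mean : 0;
      }
      return out;
    }
  };
}

const baseline = createBaseline(2);
baseline.addSample({ smile: 0.30, browRaise: 0.05 });
baseline.addSample({ smile: 0.32, browRaise: 0.05 });
console.log(baseline.relative({ smile: 0.372, browRaise: 0.05 }));
// smile comes out at about 0.2 (20% wider than neutral), browRaise at 0
```

In a sketch you would call addSample(detectExpressions(face)) for the first second or two, then map the relative values instead of the raw ones.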

Mapping expressions to visuals

Once you have expression values, mapping them to visuals follows the same pattern as everything else in this series. Expression values are just numbers. Numbers map to colors, sizes, positions, particle counts, noise parameters. Your face becomes a multi-channel controller.

let particles = [];

function draw() {
  background(8, 10, 16, 25);

  if (faces.length === 0) return;

  const face = faces[0];
  const expr = detectExpressions(face);

  // mouth open = particle emission rate
  const emitCount = Math.floor(map(expr.mouthOpen, 0.01, 0.12, 0, 15, true));

  // smile = warm colors, neutral/frown = cool colors
  const baseHue = map(expr.smile, 0.28, 0.42, 220, 20, true);

  // brow raise = particle speed
  const speed = map(expr.browRaise, 0.04, 0.08, 0.5, 4, true);

  // eye openness = particle size
  const pSize = map(expr.eyeOpen, 0.015, 0.05, 2, 14, true);

  // emit from nose tip (landmark 1)
  const noseTip = face.keypoints[1];

  for (let i = 0; i < emitCount; i++) {
    particles.push({
      x: noseTip.x,
      y: noseTip.y,
      vx: random(-1, 1) * speed,
      vy: random(-2, -0.3) * speed,
      hue: baseHue + random(-25, 25),
      size: pSize + random(-2, 2),
      life: 1.0
    });
  }

  // update and draw
  colorMode(HSB, 360, 100, 100, 100);

  for (let i = particles.length - 1; i >= 0; i--) {
    const p = particles[i];
    p.x += p.vx;
    p.vy += 0.02;
    p.y += p.vy;
    p.life -= 0.008;

    if (p.life <= 0) {
      particles.splice(i, 1);
      continue;
    }

    noStroke();
    // p.hue can dip just below 0; wrap it into the 0-360 range
    fill((p.hue + 360) % 360, 55, 65, p.life * 50);
    circle(p.x, p.y, p.size * p.life);
  }

  colorMode(RGB);
}

Smile and warm orange-red particles bloom from your nose. Go neutral and the particles shift to cool blue. Open your mouth wide and the emission rate spikes -- particles flood outward. Raise your eyebrows and the particles accelerate. Widen your eyes and they grow larger. Close everything down -- relaxed face, mouth closed, neutral expression -- and the emission slows to a trickle of small, cool dots. Your facial expressions are directly driving a particle synthesizer. Each muscle movement changes a parameter.

Face as canvas: projecting onto the mesh

Here's where face mesh gets genuinely distinctive from the other tracking models. Because you have a full triangulated mesh, you can project visual patterns onto the face surface. The triangles deform with your facial movements, so the projected pattern follows your expressions. It's projection mapping on a face, in a browser, from a webcam.

function drawTexturedMesh(face) {
  // flat() gives a flat index list whether ml5 returns triplets or a flat array
  const tri = faceMesh.getTriangles().flat();

  // set the color mode once instead of toggling it for every triangle
  colorMode(HSB, 360, 100, 100, 100);
  noStroke();

  for (let i = 0; i < tri.length; i += 3) {
    const a = face.keypoints[tri[i]];
    const b = face.keypoints[tri[i + 1]];
    const c = face.keypoints[tri[i + 2]];

    // center of triangle
    const cx = (a.x + b.x + c.x) / 3;
    const cy = (a.y + b.y + c.y) / 3;

    // use noise at the triangle center for coloring
    const n = noise(cx * 0.01, cy * 0.01, frameCount * 0.02);
    const hue = n * 360;

    fill(hue, 60, 55, 50);
    triangle(a.x, a.y, b.x, b.y, c.x, c.y);
  }

  colorMode(RGB);
}

function draw() {
  background(10, 12, 18);

  if (faces.length === 0) return;

  drawTexturedMesh(faces[0]);
}

Your face becomes a noise-colored mask. The Perlin noise pattern (episode 12) flows across the triangulated surface, shifting colors over time. Tilt your head and the pattern deforms with your face geometry. Open your mouth and the triangles around your lips stretch, warping the noise pattern. Raise an eyebrow and the forehead triangles shift. The face is simultaneously the geometry and the canvas. The pattern isn't painted onto a flat rectangle -- it's mapped onto a living 3D surface.

You can swap the noise for anything. Stripe patterns. Checkerboards. Data-driven colors. Audio-reactive hues (episode 19). The mesh gives you the geometry. What you put on it is up to you.
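As a rough illustration of swapping the pattern, here are two tiny functions that return a hue for a triangle center: horizontal stripes and a checkerboard. Both names (stripeHue, checkerHue) and the parameter choices are mine, just a sketch of the idea.

```javascript
// Hypothetical pattern functions: take a triangle center, return a hue (0-360).
function stripeHue(cx, cy, stripeHeight, hueA, hueB) {
  const band = Math.floor(cy / stripeHeight); // which horizontal band we are in
  return band % 2 === 0 ? hueA : hueB;
}

function checkerHue(cx, cy, cellSize, hueA, hueB) {
  const col = Math.floor(cx / cellSize);
  const row = Math.floor(cy / cellSize);
  return (col + row) % 2 === 0 ? hueA : hueB;
}

console.log(stripeHue(0, 5, 20, 30, 200));   // 30  (first band)
console.log(stripeHue(0, 25, 20, 30, 200));  // 200 (second band)
console.log(checkerHue(5, 5, 20, 30, 200));  // 30
console.log(checkerHue(25, 5, 20, 30, 200)); // 200
```

To use one, replace the noise line in drawTexturedMesh with something like const hue = checkerHue(cx, cy, 24, 30, 200). Note that these patterns, like the noise version, are keyed on screen position, so the face moves through the pattern. If you want a pattern that sticks to the skin, key the color on the triangle's index instead.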

Stylized face drawing: computational portraiture

468 landmarks contain enough information to draw a recognizable face without the video. Strip away the webcam image and draw only the landmarks, and you get a minimal computational portrait. The face is recognizable from its geometry alone.

function drawStylizedFace(face) {
  background(15, 18, 25);

  // jawline as a smooth curve
  stroke(180, 170, 190, 100);
  strokeWeight(2);
  noFill();
  beginShape();
  for (const idx of FACE_REGIONS.jaw) {
    const kp = face.keypoints[idx];
    curveVertex(kp.x, kp.y);
  }
  endShape();

  // eyes as filled shapes
  fill(60, 90, 120, 80);
  noStroke();
  beginShape();
  for (const idx of FACE_REGIONS.leftEye) {
    const kp = face.keypoints[idx];
    vertex(kp.x, kp.y);
  }
  endShape(CLOSE);

  beginShape();
  for (const idx of FACE_REGIONS.rightEye) {
    const kp = face.keypoints[idx];
    vertex(kp.x, kp.y);
  }
  endShape(CLOSE);

  // eyebrows as thick strokes
  stroke(140, 130, 150, 120);
  strokeWeight(3);
  noFill();
  beginShape();
  for (const idx of FACE_REGIONS.leftBrow) {
    const kp = face.keypoints[idx];
    curveVertex(kp.x, kp.y);
  }
  endShape();

  beginShape();
  for (const idx of FACE_REGIONS.rightBrow) {
    const kp = face.keypoints[idx];
    curveVertex(kp.x, kp.y);
  }
  endShape();

  // lips
  fill(160, 90, 100, 70);
  noStroke();
  beginShape();
  for (const idx of FACE_REGIONS.lipsOuter) {
    const kp = face.keypoints[idx];
    vertex(kp.x, kp.y);
  }
  endShape(CLOSE);

  // nose as a simple line
  stroke(160, 150, 170, 80);
  strokeWeight(1.5);
  noFill();
  beginShape();
  for (const idx of FACE_REGIONS.nose) {
    const kp = face.keypoints[idx];
    curveVertex(kp.x, kp.y);
  }
  endShape();
}

No video. Just lines and shapes derived from the 468 landmarks. But it looks like a face -- YOUR face. The jawline curves with your jaw shape. The eyes track your eye shape and position. The lips follow your lip shape. Smile and the portrait smiles. Tilt your head and the portrait tilts. It's a live computational sketch that reduces you to essential contours. Abstraction through data reduction. 468 points is enough for recognition but abstract enough to feel like art rather than surveillance.

You could push the abstraction further. Use only the eye and lip contours -- skip the jaw and nose. Use circles at landmark positions instead of connected lines. Randomize positions slightly for a sketchy, hand-drawn feel. Each level of reduction creates a different aesthetic relationship between the person and their computational portrait. Pretty cool, right? :-)
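The sketchy, hand-drawn variant could look something like this. jitterPoints is a hypothetical helper of mine. It takes the random source as a parameter so you can plug in Math.random, p5's random, or something fixed for repeatable results.

```javascript
// Hypothetical helper: returns a copy of the points, each nudged by up to
// `amount` pixels in x and y. `rand` must return values between 0 and 1.
function jitterPoints(points, amount, rand = Math.random) {
  return points.map((p) => ({
    x: p.x + (rand() * 2 - 1) * amount,
    y: p.y + (rand() * 2 - 1) * amount
  }));
}

const contour = [{ x: 100, y: 100 }, { x: 120, y: 105 }];
const wobbly = jitterPoints(contour, 2);
console.log(wobbly.length); // 2, same number of points, slightly displaced
```

Draw the same contour two or three times with fresh jitter each time and the overlapping lines start to read as pencil strokes. The original points are left untouched, so smoothing and expression detection are not affected.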

Multi-face interaction

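Face mesh can detect multiple faces, but as far as I can tell ml5's faceMesh only tracks one by default (the maxFaces option is 1). Ask for more when you load the model in preload, for example ml5.faceMesh({ maxFaces: 2, flipped: true }). The faces array then holds one entry per detected face, each with its own set of 468 landmarks. Two people in front of the camera means 936 tracked points. This opens up collaborative face art.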

function draw() {
  background(10, 12, 18, 20);

  if (faces.length < 2) {
    fill(80);
    textSize(11);
    textFont('monospace');
    text('need 2 faces', 20, height - 20);
    return;
  }

  const faceA = faces[0];
  const faceB = faces[1];

  // draw connections between corresponding landmarks
  stroke(150, 180, 220, 25);
  strokeWeight(0.5);

  // connect every 10th landmark to keep it manageable
  for (let i = 0; i < 468; i += 10) {
    const a = faceA.keypoints[i];
    const b = faceB.keypoints[i];
    line(a.x, a.y, b.x, b.y);
  }

  // draw both face contours
  stroke(220, 130, 120, 80);
  strokeWeight(1);
  noFill();
  drawContourOnly(faceA, FACE_REGIONS.jaw);
  drawContourOnly(faceA, FACE_REGIONS.lipsOuter);

  stroke(120, 180, 220, 80);
  drawContourOnly(faceB, FACE_REGIONS.jaw);
  drawContourOnly(faceB, FACE_REGIONS.lipsOuter);

  // midpoint face: average of corresponding landmarks
  noStroke();
  fill(180, 160, 200, 40);
  for (let i = 0; i < 468; i += 5) {
    const a = faceA.keypoints[i];
    const b = faceB.keypoints[i];
    const mx = (a.x + b.x) / 2;
    const my = (a.y + b.y) / 2;
    circle(mx, my, 2);
  }
}

function drawContourOnly(face, indices) {
  beginShape();
  for (const idx of indices) {
    const kp = face.keypoints[idx];
    vertex(kp.x, kp.y);
  }
  endShape(CLOSE);
}

Two people sit in front of the camera. Thin lines connect corresponding landmarks between the two faces -- left eye to left eye, nose tip to nose tip, chin to chin. The connecting lines form a web between the two faces. And in the middle, a ghost face appears: the average of both faces, a blended identity rendered as a faint dot cloud. Hold your heads at the same angle and distance from the camera and the ghost reads as a clean face sitting halfway between you. Tilt or lean differently and it skews and smears into abstraction. The piece visualizes the space between two faces -- literally.
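The midpoint is just one stop on a slider. A small sketch of the general case, where t = 0 gives face A, t = 1 gives face B and t = 0.5 gives the ghost face from above. blendFaces is my own helper name, and it assumes both keypoint arrays use the same landmark order, which they do when they come from the same model.

```javascript
// Hypothetical helper: linear blend between two sets of landmarks.
// t = 0 returns face A's positions, t = 1 returns face B's, 0.5 the midpoint.
function blendFaces(keypointsA, keypointsB, t) {
  const count = Math.min(keypointsA.length, keypointsB.length);
  const blended = [];
  for (let i = 0; i < count; i++) {
    const a = keypointsA[i];
    const b = keypointsB[i];
    blended.push({
      x: a.x + (b.x - a.x) * t,
      y: a.y + (b.y - a.y) * t
    });
  }
  return blended;
}

const faceAPoints = [{ x: 0, y: 0 }, { x: 10, y: 0 }];
const faceBPoints = [{ x: 100, y: 40 }, { x: 110, y: 40 }];
console.log(blendFaces(faceAPoints, faceBPoints, 0.5));
// [ { x: 50, y: 20 }, { x: 60, y: 20 } ]
```

Drive t with something slow, a sine of frameCount for example, and the ghost drifts back and forth between the two people.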

Smoothing face landmarks

Face mesh is even jitterier than hand tracking. 468 points on a relatively small area means tiny model errors translate to visible vibrations, especially around the lips and eyebrows where the landmarks are dense. Smoothing is not optional here -- it's a requirement for anything that looks intentional rather than glitchy.

let smoothedFace = {};

function smoothFaceLandmarks(face) {
  const amt = 0.3;

  for (let i = 0; i < face.keypoints.length; i++) {
    const kp = face.keypoints[i];
    const key = 'kp' + i;

    if (!smoothedFace[key]) {
      smoothedFace[key] = { x: kp.x, y: kp.y, z: kp.z || 0 };
    }

    smoothedFace[key].x = lerp(smoothedFace[key].x, kp.x, amt);
    smoothedFace[key].y = lerp(smoothedFace[key].y, kp.y, amt);
    if (kp.z !== undefined) {
      smoothedFace[key].z = lerp(smoothedFace[key].z, kp.z, amt);
    }
  }

  return smoothedFace;
}

function getSmoothed(idx) {
  return smoothedFace['kp' + idx];
}

Same lerp approach as episodes 93 and 94. A smoothing amount of 0.3 damps most jitter while staying responsive. For the stylized portrait drawing, go lower (0.2) -- the slower response actually looks more like a hand drawing that follows the face with a slight delay. For expression detection where you need quick response to a smile or brow raise, go higher (0.4-0.5). The expression meters from earlier look much better with smoothed input -- the bars move steadily instead of jittering.
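To put numbers on "responsive": each lerp step closes a fixed fraction of the remaining gap, so after n steps the leftover is (1 - amt)^n of the original jump. The sketch below counts how many steps it takes until only 5% of a jump is left. framesToSettle is a hypothetical helper, and 5% is an arbitrary cutoff I picked for illustration.

```javascript
// After n lerp steps the leftover gap is (1 - amt)^n of the original jump.
// Solve (1 - amt)^n <= remaining for n and round up.
function framesToSettle(amt, remaining = 0.05) {
  return Math.ceil(Math.log(remaining) / Math.log(1 - amt));
}

console.log(framesToSettle(0.2)); // 14
console.log(framesToSettle(0.3)); // 9
console.log(framesToSettle(0.5)); // 5
```

So at 0.2 a sudden smile takes roughly 14 smoothing steps to arrive, at 0.5 only 5. If you call the smoothing once per draw() frame, those are draw frames, not detection frames.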

Z-depth on the face

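Each face landmark includes a z coordinate. It's a relative depth, not a measured distance from the camera: MediaPipe estimates it relative to the center of the head, on roughly the same scale as x, with smaller (more negative) values closer to the camera. For a face, z-depth is more meaningful than for hands. The nose sticks out further than the cheeks. The eye sockets are recessed. This z variation gives you a 3D surface even from a 2D camera.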

function draw() {
  background(10, 12, 18);

  if (faces.length === 0) return;

  const face = faces[0];
  const sm = smoothFaceLandmarks(face);

  for (let i = 0; i < 468; i++) {
    const raw = face.keypoints[i];
    const s = getSmoothed(i);
    const z = raw.z || 0;

    // z is negative (closer to camera = more negative)
    // normalize to 0-1 range
    const depth = map(z, -50, 10, 1, 0, true);

    // closer = bigger, brighter. further = smaller, dimmer
    const size = lerp(1, 6, depth);
    const alpha = lerp(20, 180, depth);

    noStroke();
    fill(160, 180, 220, alpha);
    circle(s.x, s.y, size);
  }
}

The nose landmarks are bright and large. The ear-area landmarks are dim and small. The eye socket landmarks sit somewhere in between. You get a 3D-feeling face from dots alone -- the depth encoding creates a natural sense of volume. The nose pops forward. The temples recede. Turn your head slightly and the depth profile changes -- the near cheek brightens while the far cheek dims. It's essentially a depth map rendered as a point cloud.
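The -50 to 10 range in that map() call is a guess that happens to work at one camera distance. A sketch of an alternative: measure the z range of the current frame and normalize against that. normalizedDepths is my own helper name, and it assumes keypoints carry a numeric z, treating a missing z as 0.

```javascript
// Hypothetical helper: per-frame depth normalization.
// Returns one value per landmark: 1 = closest to the camera, 0 = farthest.
function normalizedDepths(keypoints) {
  let minZ = Infinity;
  let maxZ = -Infinity;
  for (const kp of keypoints) {
    const z = kp.z || 0;
    if (z < minZ) minZ = z;
    if (z > maxZ) maxZ = z;
  }
  const span = maxZ - minZ;
  return keypoints.map((kp) => {
    if (span === 0) return 0.5; // flat data: no depth information
    return (maxZ - (kp.z || 0)) / span; // smaller z = closer = higher value
  });
}

const depthDemo = normalizedDepths([{ z: -40 }, { z: -10 }, { z: 20 }]);
console.log(depthDemo); // [ 1, 0.5, 0 ]
```

The trade-off is that the brightest dot is always the closest landmark in that frame, so the overall brightness no longer changes when you lean toward the camera.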

Privacy and consent

Face tracking deserves extra care compared to body or hand tracking. A face is identity. Body pose data is far less identifying -- a stick figure looks much the same for anyone of similar build. But 468 facial landmarks encode the specific geometry of a specific person's face. The distances between your eyes, the shape of your jawline, the proportions of your nose -- these are biometric data that can identify you.

ml5 runs locally. Nothing leaves the browser. That's good. But if you're building a face-mesh installation in a public space, the considerations from episode 92 apply with even more urgency. People are rightfully sensitive about face tracking. It's associated with surveillance, with cameras that follow and identify, with systems that have documented biases around skin color, age, and gender. Even if your art is entirely benign -- a pretty generative mask, a face-driven particle system -- the act of face-tracking people without clear consent crosses a line.

For public installations: display clear signage. Process locally. Never store face data. Let people opt out by walking away. Better yet, make the face tracking visible -- show the mesh on screen so people understand exactly what the system is seeing. Transparency reduces the surveillance anxiety. For your own creative experiments in your own room, go wild. But build the awareness now, because this matters when your work eventually faces an audience.

The creative exercise: living mask

Allez, let's build something that ties it all together. A living generative mask that projects onto your face, with patterns that respond to your expressions. Smile and the mask warms in color. Open your mouth and particles stream out from the lips. Raise your eyebrows and the noise pattern accelerates. Widen your eyes and the mask grows more opaque. A neutral face shows slow, calm noise drift. Your face is both the canvas and the controller.

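// needs FACE_REGIONS, detectExpressions(), smoothFaceLandmarks() and
// getSmoothed() (plus its smoothedFace store) from the earlier sections
let video, faceMesh;
let faces = [];
let particles = [];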

function preload() {
  faceMesh = ml5.faceMesh({ flipped: true });
}

function setup() {
  createCanvas(640, 480);
  colorMode(HSB, 360, 100, 100, 100);
  video = createCapture(VIDEO, { flipped: true });
  video.size(640, 480);
  video.hide();
  faceMesh.detectStart(video, function(r) { faces = r; });
}

function draw() {
  background(0, 0, 5, 30);

  if (faces.length === 0) return;

  const face = faces[0];
  const sm = smoothFaceLandmarks(face);
  const expr = detectExpressions(face);

  // smile drives hue: smile = warm, neutral = cool
  const baseHue = map(expr.smile, 0.28, 0.42, 220, 30, true);

  // brow raise drives noise speed
  const noiseSpeed = map(expr.browRaise, 0.04, 0.08, 0.005, 0.04, true);

  // eye openness drives triangle opacity
  const meshAlpha = map(expr.eyeOpen, 0.015, 0.05, 15, 55, true);

  // draw the face mesh with expression-driven coloring
  // flat() gives a flat index list whether ml5 returns triplets or a flat array
  const tri = faceMesh.getTriangles().flat();
  noStroke();

  for (let i = 0; i < tri.length; i += 3) {
    const a = getSmoothed(tri[i]);
    const b = getSmoothed(tri[i + 1]);
    const c = getSmoothed(tri[i + 2]);

    if (!a || !b || !c) continue;

    const cx = (a.x + b.x + c.x) / 3;
    const cy = (a.y + b.y + c.y) / 3;

    const n = noise(cx * 0.008, cy * 0.008, frameCount * noiseSpeed);
    const hue = (baseHue + n * 120) % 360;

    fill(hue, 55, 60, meshAlpha);
    triangle(a.x, a.y, b.x, b.y, c.x, c.y);
  }

  // mouth open = emit particles from lip center
  if (expr.mouthOpen > 0.03) {
    const upperLip = face.keypoints[13];
    const lowerLip = face.keypoints[14];
    const lipCx = (upperLip.x + lowerLip.x) / 2;
    const lipCy = (upperLip.y + lowerLip.y) / 2;
    const emitRate = Math.floor(map(expr.mouthOpen, 0.03, 0.12, 1, 8, true));

    for (let i = 0; i < emitRate; i++) {
      particles.push({
        x: lipCx,
        y: lipCy,
        vx: random(-2, 2),
        vy: random(1, 3),
        hue: baseHue + random(-30, 30),
        size: random(3, 8),
        life: 1.0
      });
    }
  }

  // draw and update particles
  for (let i = particles.length - 1; i >= 0; i--) {
    const p = particles[i];
    p.x += p.vx;
    p.y += p.vy;
    p.life -= 0.01;

    if (p.life <= 0) {
      particles.splice(i, 1);
      continue;
    }

    fill(p.hue % 360, 50, 70, p.life * 40);
    circle(p.x, p.y, p.size * p.life);
  }

  // draw face contours on top as thin lines
  stroke(0, 0, 80, 20);
  strokeWeight(0.5);
  noFill();

  drawSmoothedContour(FACE_REGIONS.leftEye);
  drawSmoothedContour(FACE_REGIONS.rightEye);
  drawSmoothedContour(FACE_REGIONS.lipsOuter);
}

function drawSmoothedContour(indices) {
  beginShape();
  for (const idx of indices) {
    const s = getSmoothed(idx);
    if (s) vertex(s.x, s.y);
  }
  endShape(CLOSE);
}

Put your face in front of the camera. A generative mask appears -- noise-colored triangles covering your face, shifting in hue with your expression. Smile and the mask warms toward oranges and yellows. Go neutral and it cools to blues and purples, drifting slowly. Raise your eyebrows and the noise drifts faster. Open your mouth and particles stream downward from your lips like you're breathing colored mist. Wide eyes make the mesh more opaque, more present. Squint and it fades to a ghost. The mask is alive because your face is alive. Every micro-expression changes the visual. It's not a static filter plastered on top of video -- it's a generative system that uses your face as both its geometry and its input.

Practical notes

Performance budget. Face mesh is the heaviest ml5 model. 468 landmarks per face, triangulated into 900+ triangles. On top of the detection cost, rendering 900+ filled triangles every frame is non-trivial. If you need headroom, render every other triangle, or use a subset of landmarks. Or drop the video feed and only show the mesh -- that eliminates the image(video) call which itself costs something.

Lighting matters more than ever. The model needs to see your face clearly. Side lighting that casts half the face into shadow confuses the landmark placement on the dark side. Even, frontal lighting gives the best results. A ring light or desk lamp facing you works well.

Glasses and occlusion. Thick-rimmed glasses can throw off the eye landmarks. The model might place the eye contour on the glasses frame rather than the actual eye. Hats that cover the forehead reduce brow landmark accuracy. The model handles these cases better than you'd expect, but not perfectly.

Head rotation limits. Face mesh works best for faces roughly facing the camera. Turn your head past about 45 degrees and the model starts losing landmarks on the far side of the face. Extreme profile views (full side face) produce unreliable results. Design your interactions for front-facing or slightly angled positions.
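If you want your sketch to notice when someone turns too far, a crude 2D heuristic can help: compare how far the nose tip sits from each side of the face outline. This is only a sketch, not a calibrated head-pose estimate. turnAmount is a made-up helper, and it assumes landmark 1 is the nose tip and that 234 and 454 (both in the jaw list above) sit on opposite sides of the face.

```javascript
// Hypothetical helper: rough left/right turn estimate from 2D landmarks.
// Compares the horizontal distance from the nose tip (1) to each side of
// the face outline (234 and 454).
// Returns about 0 when facing the camera, approaching 1 when turned far.
function turnAmount(keypoints) {
  const nose = keypoints[1];
  const sideA = keypoints[234];
  const sideB = keypoints[454];
  const dA = Math.abs(nose.x - sideA.x);
  const dB = Math.abs(nose.x - sideB.x);
  const total = dA + dB;
  if (total === 0) return 0;
  return Math.abs(dA - dB) / total;
}

// stand-in data: nose centered between the two sides
const frontal = [];
frontal[1] = { x: 100, y: 120 };
frontal[234] = { x: 40, y: 110 };
frontal[454] = { x: 160, y: 110 };
console.log(turnAmount(frontal)); // 0
```

You could then fade the mask with something like 1 - turnAmount(face.keypoints) as an opacity multiplier, so the visuals bow out gracefully before the landmarks get unreliable.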

What it comes down to...

ml5's faceMesh model detects 468 landmarks on the face in real time: jawline, eyes, eyebrows, nose, lips, cheeks, forehead. Each landmark has x, y coordinates (and z for depth). Setup follows the same pattern as bodyPose and handPose -- load model, feed video, get results in a callback. The heaviest ml5 model at 8-15 fps
The 468 landmarks are grouped by facial feature: lip contours (inner and outer loops), eye contours, eyebrow arcs, nose bridge, jawline. Specific indices map to specific features. You don't memorize the indices -- you define region arrays and use them as lookup tables
The triangulation turns 468 points into a mesh surface. MediaPipe provides a triangulation array (index triplets) that connects the landmarks into roughly 900 triangles. This mesh deforms in real time with facial movement. It's denser around the eyes and mouth where expression detail matters most
Expression detection measures distances between landmarks: upper lip to lower lip for mouth openness, mouth corner to mouth corner for smile width, eyebrow to eyelid for brow raise, upper eyelid to lower eyelid for eye openness. All measurements normalized by face height for distance independence
Expressions are continuous values, not binary states. You can half-smile. You can slightly raise one eyebrow. Each expression is a floating-point parameter that maps to visual properties using the same map() approach from every previous episode. Face as multi-channel controller
Face as canvas: project patterns onto the triangulated mesh surface. Noise patterns, stripes, data-driven colors -- anything you draw into the mesh triangles deforms with the face. The pattern follows the geometry. Tilt your head and the pattern tilts. Smile and the lip triangles stretch. It's projection mapping in the browser
Stylized face drawing: render only the feature contours (jaw, eyes, eyebrows, lips, nose) without the video feed. The 468 landmarks contain enough geometric information for a recognizable portrait. Abstract the face by reducing which landmarks you draw -- fewer landmarks means more abstraction while maintaining identity
Multi-face detection: the faces array holds one entry per detected face. Connect corresponding landmarks between faces to visualize the space between them. Average corresponding landmarks to create a blended "ghost face" between two people
Smoothing is critical for face mesh. 468 landmarks on a small surface means visible jitter on every frame. Lerp at 0.3 for most uses. Lower (0.2) for stylized drawing. Higher (0.4-0.5) for expression detection. Unsmoothed face mesh looks like a vibrating mess
Z-depth on the face is more meaningful than on hands. The nose protrudes, eye sockets are recessed, temples are set back. Map z to dot size and brightness to get a 3D point-cloud feel from a 2D camera. Turn your head slightly and the depth profile changes visibly
Face tracking raises stronger privacy concerns than body or hand tracking. A face is identity. 468 landmarks encode biometric data unique to an individual. For installations: clear signage, local processing only, no face data storage, opt-out by walking away, and make the tracking visible so people understand what the system sees
The living mask pattern combines it all: face mesh as geometry, expressions as input, noise as texture, particles as response. Smile changes colors. Open mouth emits particles. Brow raise controls speed. The face is simultaneously the canvas, the controller, and the subject

Four episodes into the ML arc. We went from flat category labels (episode 92) to body skeletons (93) to finger joints (94) to a full face surface with 468 points of expression data. Each model gave us finer resolution and richer data. The pattern holds: model produces structured data, you map it to creative output using the techniques from the first ninety episodes. The body parts we've tracked -- torso, hands, face -- are the three pillars of human nonverbal communication. And we haven't even gotten to what happens when you train models on your own categories.

Sallukes! Thanks for reading.

@femdev

stem stemsocial steemstem programming creativecoding
