Learn Creative Coding (#97) - Style Transfer: Painting with Neural Networks

Last episode we got deep into image classification -- not just the flat "what is it" label, but the full confidence distribution, spatial scanning, feature extraction, transfer learning, and multi-model layering. We turned MobileNet into a creative instrument where each recognized object drove a different visual world. Classification reads the image and tells you what it sees. But it doesn't change the image. It labels, it doesn't transform.
Style transfer does the opposite. It takes an image and fundamentally rewrites how it looks. The content stays -- the objects, the shapes, the composition -- but the rendering changes. Feed it a webcam frame and a Monet painting, and your webcam looks like Monet painted it. Feed it a Van Gogh and everything becomes swirling impasto brushstrokes. The neural network separates what's in the image from how it's rendered, and recombines them. Content from source A, style from source B, output as something that never existed before. It's one of the most visually striking things neural networks can do, and it runs in the browser.
The math behind it is surprisingly elegant. A pre-trained convolutional neural network (usually VGG-19, trained on ImageNet for classification) has layers that detect increasingly abstract features. The early layers see edges, textures, small patterns -- low-level visual stuff. The later layers see objects, faces, scenes -- high-level semantic stuff. Roughly speaking, "style" lives in the early layers and "content" lives in the later layers. Style transfer optimizes a new image so that its early-layer feature statistics match the style image and its late-layer activations match the content image. The result is an image with the content of one picture rendered in the visual style of another. A photo of your room, painted like Starry Night.
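To make that objective a bit more concrete, here is a minimal sketch, not the code that ml5 or the original paper uses. It treats a layer's activations as a flat array of numbers, measures how far the generated image is from the content image, and adds a style term with a weight. The names `meanSquaredDiff` and `totalLoss`, the default weights, and the toy numbers are all made up for illustration; the style term is simply passed in as a number here.

```javascript
// Mean squared difference between two equally long activation arrays.
function meanSquaredDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error('activation arrays must have the same length');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum / a.length;
}

// Weighted sum of the two terms. The weights are hypothetical knobs:
// a larger styleWeight pushes the optimized image toward the painting.
function totalLoss(contentTerm, styleTerm, contentWeight = 1, styleWeight = 10) {
  return contentWeight * contentTerm + styleWeight * styleTerm;
}

// Toy numbers standing in for late-layer activations.
const contentActs = [0.5, 1.0, 0.0, 2.0];
const generatedActs = [0.5, 0.0, 0.0, 2.0];

const contentTerm = meanSquaredDiff(generatedActs, contentActs); // 0.25
const loss = totalLoss(contentTerm, 0.1); // 0.25 + 10 * 0.1
```

The optimizer's job is to nudge the generated image until this number stops shrinking. Raising the style weight trades recognizability for brushwork.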
ml5 style transfer: the fast path
ml5 provides a pre-trained style transfer model that works in real time on video. The original style transfer algorithm (Gatys et al., 2015) was slow -- it optimized a single image iteratively, taking minutes. The fast approach (Johnson et al., 2016) trains a feedforward network to apply one specific style instantly. That's what ml5 uses. Each model is trained on one painting, and it can transform any input frame to that style in a single forward pass.
let video;
let styleModel;
let styledImage;
function setup() {
createCanvas(640, 480);
video = createCapture(VIDEO);
video.size(640, 480);
video.hide();
// load a pre-trained style model
// ml5 hosts several: wave, udnie, scream, la_muse, rain_princess, wreck
styleModel = ml5.styleTransfer('wave', video, function() {
console.log('style model loaded');
transferLoop();
});
}
function transferLoop() {
styleModel.transfer(function(err, result) {
styledImage = result;
transferLoop();
});
}
function draw() {
background(10, 12, 18);
if (styledImage) {
image(styledImage, 0, 0, width, height);
} else {
// show raw video while model loads
image(video, 0, 0, width, height);
fill(180);
noStroke();
textSize(11);
textFont('monospace');
text('loading style model...', 10, height - 10);
}
}
The 'wave' model applies the style of Hokusai's Great Wave. Your webcam feed transforms into a Japanese woodblock print in real time. Move your hand -- it moves in the painting. Walk around -- the scene updates, every frame repainted in Hokusai's brushwork. The model loads once (might take a few seconds depending on connection) and then runs continuously.
The other built-in models each apply a different painting style. 'udnie' uses Francis Picabia's cubist painting Udnie. 'scream' applies Munch's The Scream. 'la_muse' uses Picasso's La Muse. 'rain_princess' applies the saturated, rain-soaked look of Leonid Afremov's Rain Princess. Each transforms the same webcam feed into a completely different visual world.
Side-by-side comparison
One model at a time is nice, but comparing styles side by side shows how dramatically the same content changes across different renderings.
let video;
let models = {};
let styledImages = {};
let styleNames = ['wave', 'udnie', 'la_muse'];
let loadedCount = 0;
function setup() {
createCanvas(960, 480);
video = createCapture(VIDEO);
video.size(320, 240);
video.hide();
for (const name of styleNames) {
models[name] = ml5.styleTransfer(name, video, function() {
loadedCount++;
console.log(name + ' loaded (' + loadedCount + '/' + styleNames.length + ')');
if (loadedCount === styleNames.length) {
startAllTransfers();
}
});
}
}
function startAllTransfers() {
for (const name of styleNames) {
transferStyle(name);
}
}
function transferStyle(name) {
models[name].transfer(function(err, result) {
styledImages[name] = result;
transferStyle(name);
});
}
function draw() {
background(10, 12, 18);
const cellW = width / styleNames.length;
for (let i = 0; i < styleNames.length; i++) {
const name = styleNames[i];
const x = i * cellW;
if (styledImages[name]) {
image(styledImages[name], x, 40, cellW, cellW * 0.75);
}
// label
fill(160, 170, 190);
noStroke();
textSize(10);
textFont('monospace');
textAlign(CENTER);
text(name, x + cellW / 2, 30);
}
}
Three style models running on the same video feed simultaneously. The same room, the same lighting, the same objects -- but the wave panel looks like a woodblock, the udnie panel looks like cubist abstraction, and the la_muse panel fractures everything into angular Picasso forms. Moving your hand creates three simultaneous hand-paintings in three radically different styles. The visual gap between them -- same content, completely different rendering -- is what makes style transfer so fascinating. It makes "style" tangible. You can SEE what the word means when three paintings of the same scene sit next to each other.
Performance drops with multiple models. Three simultaneous transfers might get 3-5 fps total. For a live demo that's rough, but for generating static comparisons or slow-moving generative work it's fine.
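One way to soften that cost, sketched here as an assumption rather than something ml5 does for you, is to keep a single transfer in flight and rotate through the styles so each panel updates in turn. `makeRoundRobin` is a hypothetical helper, not part of ml5 or p5.

```javascript
// Returns a function that hands out the style names in rotation.
function makeRoundRobin(names) {
  if (names.length === 0) {
    throw new Error('need at least one style name');
  }
  let index = 0;
  return function next() {
    const name = names[index];
    index = (index + 1) % names.length;
    return name;
  };
}

const nextStyle = makeRoundRobin(['wave', 'udnie', 'la_muse']);
const order = [nextStyle(), nextStyle(), nextStyle(), nextStyle()];
// order is ['wave', 'udnie', 'la_muse', 'wave']
```

In the sketch above, the transfer callback would call `transferStyle(nextStyle())` instead of re-queueing its own name, and `startAllTransfers` would start just one transfer. Each panel refreshes less often, but the models no longer compete for the GPU at the same moment.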
Style strength: the creative slider
The raw style transfer output can be overwhelming -- every pixel gets fully repainted in the style, and the original content can be hard to read. Blending the styled frame with the original video lets you control how much style to apply. Think of it as a dial from "photograph" to "full painting" with everything in between.
let styleAmount = 0.7;
function draw() {
background(10, 12, 18);
if (!styledImage) return;
// draw the original video fully opaque as the base layer
image(video, 0, 0, width, height);
// draw the styled image on top at styleAmount opacity; this gives a
// true crossfade without the dark background showing through
tint(255, 255 * styleAmount);
image(styledImage, 0, 0, width, height);
// reset tint
noTint();
// slider label
fill(160);
noStroke();
textSize(10);
textFont('monospace');
text('style: ' + (styleAmount * 100).toFixed(0) + '%', 10, height - 10);
text('mouse X to adjust', 10, height - 24);
}
function mouseMoved() {
styleAmount = map(mouseX, 0, width, 0, 1, true);
}
Move the mouse left and the image is mostly photograph with a subtle painted texture -- like an Instagram filter. Move right and the painting takes over completely, the original barely recognizable. The sweet spot depends on the style. Hokusai's wave works well at 60-70% because the bold lines are readable even partially applied. Udnie needs higher amounts (80%+) because the cubist fragmentation is subtle at low levels. Finding the right balance for each style is a creative decision.
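Under the hood, the blend is just a per-channel linear mix of two pixels. The sketch below spells that out on plain RGBA arrays. `blendPixels` is a hypothetical helper for illustration; in a p5 sketch you would normally let `tint()` and the canvas do this work for you.

```javascript
// Linear crossfade of two RGBA pixel arrays of equal length.
// amount = 0 returns the original, amount = 1 returns the styled pixels.
function blendPixels(original, styled, amount) {
  if (original.length !== styled.length) {
    throw new Error('pixel arrays must have the same length');
  }
  const t = Math.min(1, Math.max(0, amount));
  const out = new Uint8ClampedArray(original.length);
  for (let i = 0; i < original.length; i++) {
    out[i] = original[i] * (1 - t) + styled[i] * t;
  }
  return out;
}

const photo = [200, 100, 0, 255];   // one orange-ish pixel
const painted = [0, 100, 200, 255]; // the same pixel after styling
const half = blendPixels(photo, painted, 0.5);
// half holds 100, 100, 100, 255
```

At 0.5 the two pixels meet in the middle. Every other slider position is the same formula with a different `t`.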
You can also animate the style amount -- smoothly transitioning from photograph to painting and back. Map it to audio amplitude (episode 19) and the image "paints itself" on the beat. Map it to the confidence score from a classifier (episode 96) and the style intensity depends on what the camera sees. Lots of possibilities.
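For the simplest version of that animation, a cosine wave swings the amount from photograph to painting and back. This is a sketch with a made-up helper name (`oscillate`) and a made-up period. In p5 you would feed it `frameCount`.

```javascript
// Returns a value that glides 0 -> 1 -> 0 once every `periodFrames` frames.
function oscillate(frame, periodFrames) {
  const phase = (2 * Math.PI * frame) / periodFrames;
  return (1 - Math.cos(phase)) / 2;
}

// One full photo -> painting -> photo cycle every 120 frames (hypothetical).
const atStart = oscillate(0, 120);    // photograph
const atHalf = oscillate(60, 120);    // full painting
const atQuarter = oscillate(30, 120); // halfway between
```

Inside `draw()` that becomes `styleAmount = oscillate(frameCount, 120);`. Swap the frame counter for an amplitude or a confidence value and the same 0-to-1 range drives the blend.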
Video style consistency: fighting flicker
Frame-by-frame style transfer has a problem: temporal inconsistency. Each frame is styled independently, and the result jitters between frames. A pixel that was blue-swirl in frame N might be green-dot in frame N+1 even though nothing in the scene changed. It's like watching a painting vibrate. For static images it's fine. For video it's distracting.
The simplest fix is to blend with the previous styled frame:
let prevStyledCanvas;
let blendAmount = 0.6;
function setup() {
createCanvas(640, 480);
prevStyledCanvas = createGraphics(640, 480);
video = createCapture(VIDEO);
video.size(640, 480);
video.hide();
styleModel = ml5.styleTransfer('wave', video, function() {
transferLoop();
});
}
function transferLoop() {
styleModel.transfer(function(err, result) {
styledImage = result;
transferLoop();
});
}
function draw() {
if (!styledImage) return;
// blend current styled frame with previous:
// the buffer still holds the previous output, so drawing the new frame
// at (1 - blendAmount) opacity leaves blendAmount of the old frame visible
prevStyledCanvas.tint(255, 255 * (1 - blendAmount));
prevStyledCanvas.image(styledImage, 0, 0, width, height);
prevStyledCanvas.noTint();
image(prevStyledCanvas, 0, 0, width, height);
}
The blend damps the flicker. A blendAmount of 0.6 means 60% previous frame + 40% new frame, which smooths out most jitter while still tracking movement. Higher values (0.8) give a very smooth, ghostly look where motion leaves painted trails. Lower values (0.3) show more of the raw styled output but with less stabilization. The tradeoff is smoothness vs responsiveness -- high blend = smooth but laggy, low blend = responsive but flickery.
This isn't proper optical-flow-based stabilization (that would warp the previous frame to match current motion, which is much more computationally expensive). It's just alpha blending. But for most creative applications it works well enough. The slight ghosting from high blend values actually looks good -- it creates a painted-trail effect that wouldn't exist in the raw output.
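The smoothness-vs-lag tradeoff can be put in numbers. Each blend step keeps `blendAmount` of the old picture, so after n steps an old frame still contributes `blendAmount` to the power n. The sketch below (hypothetical helper names, and a 5% visibility threshold picked arbitrarily) counts how many steps it takes for a frame to fade out.

```javascript
// One step of the blend for a single channel value.
function temporalBlend(prev, next, blendAmount) {
  return prev * blendAmount + next * (1 - blendAmount);
}

// Number of blend steps until an old frame's share drops below `threshold`.
function framesToFade(blendAmount, threshold = 0.05) {
  if (blendAmount <= 0) return 1;
  if (blendAmount >= 1) return Infinity;
  return Math.ceil(Math.log(threshold) / Math.log(blendAmount));
}

const fadeAt03 = framesToFade(0.3); // 3 steps
const fadeAt06 = framesToFade(0.6); // 6 steps
const fadeAt08 = framesToFade(0.8); // 14 steps
```

Going from 0.6 to 0.8 more than doubles how long a ghost lingers, which is why the trails get so pronounced at the high end.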
Applying style to generated art
Here's the part that connects style transfer back to everything else in this series. You don't have to style transfer from a webcam. You can generate an image with p5 -- particles, noise, geometry, whatever -- capture it, and apply style transfer. The generative output becomes the source material for neural painting.
let sourceCanvas;
let styleModel;
let styledResult;
let applyStyle = false;
function setup() {
createCanvas(640, 480);
sourceCanvas = createGraphics(640, 480);
// load style model using the source canvas as input
styleModel = ml5.styleTransfer('la_muse', sourceCanvas.canvas, function() {
console.log('ready');
});
}
function draw() {
// generate art on the source canvas
sourceCanvas.background(15, 18, 25, 8);
for (let i = 0; i < 5; i++) {
const x = width / 2 + cos(frameCount * 0.02 + i * 1.2) * 200;
const y = height / 2 + sin(frameCount * 0.03 + i * 0.8) * 150;
const size = noise(i * 0.3, frameCount * 0.01) * 60 + 10;
sourceCanvas.noStroke();
sourceCanvas.fill(
120 + sin(frameCount * 0.01 + i) * 60,
150 + cos(frameCount * 0.015 + i * 2) * 50,
200 + sin(frameCount * 0.02 + i * 3) * 55,
30
);
sourceCanvas.circle(x, y, size);
}
if (applyStyle && styledResult) {
image(styledResult, 0, 0, width, height);
} else {
image(sourceCanvas, 0, 0, width, height);
}
}
function keyPressed() {
if (key === 's' || key === 'S') {
applyStyle = !applyStyle;
if (applyStyle) {
styleTransferLoop();
}
}
}
function styleTransferLoop() {
if (!applyStyle) return;
styleModel.transfer(function(err, result) {
styledResult = result;
styleTransferLoop();
});
}
Press S to toggle style transfer. The generative circles and trails you see on screen suddenly get re-rendered as a Picasso painting. The shapes are the same, the motion is the same, but the visual treatment is completely different. Toggle back and it's smooth, clean vector circles again. Toggle forward and it's fragmented cubist geometry. Code becomes art becomes painting. Each layer of transformation adds another creative dimension.
You could chain this with the data art techniques from episodes 79-91. Visualize a dataset as particles, apply style transfer, and the data portrait becomes a painted data portrait. Or feed a classification result into a generative sketch, style-transfer the output, and you have three layers of neural interpretation stacked on top of each other.
The Gram matrix: what IS style, mathematically?
Allez, let's go a bit deeper into the math, because this is one of the most elegant ideas in deep learning and it's worth understanding even if you never implement it from scratch.
When a CNN processes an image, each convolutional layer produces a set of feature maps -- 2D grids where each cell contains the activation (response) of a specific filter at that position. Early layers have filters for edges, lines, gradients. Later layers have filters for eyes, wheels, fur textures. A feature map tells you "how much of pattern X is at position Y."
Style is captured by the correlations between these feature maps. If the "horizontal line" filter and the "blue color" filter tend to activate together, that's a stylistic property -- the image has blue horizontal lines. The Gram matrix computes all pairwise correlations between filter activations in a layer:
// conceptual -- not runnable in ml5, but shows the math
// featureMaps: array of 2D arrays, one per filter
// each map is height x width
function computeGramMatrix(featureMaps) {
const numFilters = featureMaps.length;
const gram = [];
for (let i = 0; i < numFilters; i++) {
gram[i] = [];
for (let j = 0; j < numFilters; j++) {
// dot product of flattened feature maps i and j
let sum = 0;
const flatI = featureMaps[i].flat();
const flatJ = featureMaps[j].flat();
for (let k = 0; k < flatI.length; k++) {
sum += flatI[k] * flatJ[k];
}
gram[i][j] = sum;
}
}
return gram;
}
The Gram matrix is a square matrix where entry (i, j) tells you how much filter i and filter j co-activate across the whole image. High values mean they activate together (the image has patterns that trigger both). Low values mean they're independent. This matrix captures the texture, the color relationships, the brushwork patterns -- everything we'd call "style" -- without any information about WHERE things are in the image. It's purely about WHAT patterns co-occur.
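A tiny experiment makes the "no WHERE" part visible. The sketch below uses two toy feature maps with invented values, moves their positions around with the same permutation, and gets exactly the same Gram matrix. `gramMatrix` here takes already-flattened maps, and `permutePositions` is a hypothetical helper for the demo.

```javascript
// Gram matrix of already-flattened feature maps (one array per filter).
function gramMatrix(flatMaps) {
  return flatMaps.map((a) =>
    flatMaps.map((b) => a.reduce((sum, v, k) => sum + v * b[k], 0))
  );
}

// Reorder every map with the same permutation of positions,
// i.e. move the "pixels" around without changing their values.
function permutePositions(flatMaps, order) {
  return flatMaps.map((m) => order.map((k) => m[k]));
}

// Two toy filters over a 2x2 image, flattened to length 4.
const maps = [
  [1, 0, 2, 0], // e.g. "horizontal line" responses
  [0, 3, 1, 0], // e.g. "blue" responses
];
const shuffled = permutePositions(maps, [3, 2, 1, 0]);

const gramOriginal = gramMatrix(maps);       // [[5, 2], [2, 10]]
const gramRearranged = gramMatrix(shuffled); // [[5, 2], [2, 10]]
```

The layout changed, the co-activation statistics did not. That is exactly the information the Gram matrix keeps and the information it throws away.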
Two images with similar Gram matrices at a given layer look stylistically similar at that scale. Match the Gram matrix at a fine-grained layer and you match the texture. Match it at a coarser layer and you match the larger-scale patterns. Style transfer matches across multiple layers simultaneously, capturing style at every scale from individual brushstrokes to the overall color composition.
The fact that this works at all is kind of remarkable. Nobody designed these features to capture "style." The network was trained to classify images -- to tell cats from dogs. But the features it learned for classification happen to decompose images into content (what) and style (how) in a way that's separable and recombinable. That's an emergent property of convolutional networks, not something anyone planned for. And it's why a network trained to recognize 1000 object categories can be repurposed to paint pictures.
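Matching style across several layers can be sketched as a weighted sum of per-layer Gram differences. This is an illustration only: the helper names and layer weights are invented, and the original paper uses a different normalization constant than the plain mean squared difference used here.

```javascript
// Mean squared difference between two equally sized matrices.
function gramDistance(a, b) {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < a[i].length; j++) {
      const d = a[i][j] - b[i][j];
      sum += d * d;
      count++;
    }
  }
  return sum / count;
}

// Weighted sum of per-layer Gram distances.
// layerWeights is a hypothetical knob: one weight per layer.
function styleLoss(gramsA, gramsB, layerWeights) {
  let total = 0;
  for (let l = 0; l < gramsA.length; l++) {
    total += layerWeights[l] * gramDistance(gramsA[l], gramsB[l]);
  }
  return total;
}

// Toy Gram matrices for two layers (fine texture, coarser pattern).
const generatedGrams = [[[5, 2], [2, 10]], [[1, 0], [0, 1]]];
const paintingGrams = [[[5, 2], [2, 10]], [[3, 0], [0, 1]]];

const styleTerm = styleLoss(generatedGrams, paintingGrams, [0.5, 0.5]);
// layer 0 matches exactly, layer 1 differs in one entry: 0.5 * 0 + 0.5 * 1
```

Shifting weight toward the early layers favors fine brush texture. Shifting it toward later layers favors the larger-scale patterns.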
Real-time style with the webcam: practical setup
Let me put together a more complete webcam style transfer setup with the controls you'd actually want for a creative session or a live performance.
let video;
let models = {};
let currentStyle = 'wave';
let styledImage = null;
let styleAmount = 0.85;
let showOriginal = false;
let availableStyles = ['wave', 'udnie', 'la_muse', 'scream', 'rain_princess'];
let loadedStyles = [];
function setup() {
createCanvas(640, 480);
video = createCapture(VIDEO);
video.size(640, 480);
video.hide();
// load all style models
for (const name of availableStyles) {
models[name] = ml5.styleTransfer(name, video, function() {
loadedStyles.push(name);
console.log(name + ' loaded');
if (name === currentStyle) {
startTransfer();
}
});
}
}
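let transferBusy = false;
function startTransfer() {
const name = currentStyle;
// keep one request in flight, and only for a model that has finished loading;
// without this guard every key press would start another parallel loop
if (transferBusy || loadedStyles.indexOf(name) === -1) return;
transferBusy = true;
models[name].transfer(function(err, result) {
transferBusy = false;
// drop results that belong to a style the user already switched away from
if (!err && name === currentStyle) styledImage = result;
startTransfer();
});
}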
function draw() {
background(10, 12, 18);
if (showOriginal || !styledImage) {
image(video, 0, 0, width, height);
} else {
// blend original and styled: opaque video underneath,
// styled frame on top at styleAmount opacity
image(video, 0, 0, width, height);
tint(255, 255 * styleAmount);
image(styledImage, 0, 0, width, height);
noTint();
}
// HUD overlay
fill(0, 0, 0, 140);
noStroke();
rect(0, height - 55, 300, 55);
fill(160, 170, 190);
textSize(10);
textFont('monospace');
textAlign(LEFT);
text('style: ' + currentStyle, 10, height - 38);
text('amount: ' + (styleAmount * 100).toFixed(0) + '%', 10, height - 24);
text('keys: 1-5 style | up/dn amount | O original', 10, height - 10);
// loading indicator
if (loadedStyles.length < availableStyles.length) {
fill(200, 150, 80);
text('loading models... ' + loadedStyles.length + '/' +
availableStyles.length, 10, 20);
}
}
function keyPressed() {
if (key >= '1' && key <= '5') {
const idx = parseInt(key) - 1;
if (idx < availableStyles.length) {
currentStyle = availableStyles[idx];
styledImage = null;
if (loadedStyles.indexOf(currentStyle) !== -1) {
startTransfer();
}
}
}
if (key === 'o' || key === 'O') {
showOriginal = !showOriginal;
}
if (keyCode === UP_ARROW) {
styleAmount = min(styleAmount + 0.1, 1.0);
}
if (keyCode === DOWN_ARROW) {
styleAmount = max(styleAmount - 0.1, 0.0);
}
}
Press 1-5 to switch between styles. Up/down arrows to adjust style intensity. O to toggle back to the original video for comparison. All five models pre-load in the background so switching is instant once they're ready. The HUD shows you which style is active and the current blend amount.
In a live performance context you'd control this with MIDI rather than keyboard -- map a fader to styleAmount and buttons to style selection. But the principle is the same. The performer's job is choosing which neural lens to look through and how strongly to apply it. The painting happens automatically.
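As a rough idea of what that mapping could look like, here is a sketch that interprets raw MIDI bytes. The controller number, the base note, and all function names are assumptions for illustration; your hardware will send different numbers.

```javascript
// Map a 7-bit MIDI value (0..127) to a 0..1 style amount.
function ccToAmount(value) {
  return Math.min(127, Math.max(0, value)) / 127;
}

// Map a note number to a style index, or -1 if it is out of range.
// Hypothetical layout: baseNote selects style 0, the next note style 1, ...
function noteToStyleIndex(note, baseNote, styleCount) {
  const idx = note - baseNote;
  return idx >= 0 && idx < styleCount ? idx : -1;
}

// Interpret one raw MIDI message [status, data1, data2].
function handleMidiMessage(data, state) {
  const kind = data[0] & 0xf0; // strip the channel nibble
  if (kind === 0xb0 && data[1] === state.faderCC) {
    // control change from the chosen fader
    state.styleAmount = ccToAmount(data[2]);
  } else if (kind === 0x90 && data[2] > 0) {
    // note on with non-zero velocity
    const idx = noteToStyleIndex(data[1], state.baseNote, state.styles.length);
    if (idx !== -1) state.currentStyle = state.styles[idx];
  }
  return state;
}

const midiState = {
  faderCC: 1,   // assumed controller number
  baseNote: 60, // assumed first pad note
  styles: ['wave', 'udnie', 'la_muse', 'scream', 'rain_princess'],
  styleAmount: 0.85,
  currentStyle: 'wave',
};
handleMidiMessage([0xb0, 1, 127], midiState);  // fader all the way up
handleMidiMessage([0x90, 62, 100], midiState); // third pad pressed
```

In the browser, the Web MIDI API (`navigator.requestMIDIAccess()`) delivers these bytes as `event.data` on each input's `onmidimessage` handler, so the handler can forward them straight into `handleMidiMessage`.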
Limitations and what works well
Style transfer isn't magic. Some styles transfer beautifully and others don't, and understanding why helps you choose the right source paintings.
Textures transfer well. Van Gogh's swirly brushstrokes, Monet's dappled light, pointillist dots -- these are textural patterns that the Gram matrix captures perfectly. The model reproduces the brushwork across the entire image regardless of content. These styles work with almost any input.
Color palettes transfer well. A painting with a dominant blue-and-gold palette will push those colors into your webcam feed. Color is essentially a low-level feature that's easy for the network to match. Mondrian's primaries, Rothko's deep reds, Klimt's golds -- all transfer reliably.
Geometric structure transfers poorly. Escher's impossible geometries, Mondrian's precise rectangles, geometric patterns with specific shapes -- the model can't replicate these because they require spatial precision that the Gram matrix (which is position-independent) doesn't capture. You get the colors and textures of Escher but not the tessellation. The Mondrian output has Mondrian-ish colors but blobby shapes instead of clean rectangles.
Faces and familiar objects get weird. Style transfer doesn't understand semantics. It'll happily paint an eye with the same swirl pattern as a cloud. For portraiture this creates interesting effects -- your face rendered as a Van Gogh self-portrait has the right color palette and brushwork but the features might get distorted. Sometimes that's the point. Sometimes it's just ugly.
// tip: styles with strong, consistent textures work best
// for live webcam use. avoid styles with important
// geometric structure.
//
// good for live: wave, starry night, monet water lilies,
// pointillism, impressionist landscapes
//
// bad for live: escher, mondrian, technical drawings,
// geometric patterns, line art
Combining style transfer with ML models
The previous five episodes gave us body tracking, hand tracking, face mesh, and classification. Style transfer can layer on top of any of those. Here's one combination -- pose detection driving which style gets applied:
let video, bodyPose, styleModels;
let poses = [];
let currentStyleName = 'wave';
let styledImage = null;
let styleNames = ['wave', 'udnie', 'la_muse'];
function preload() {
bodyPose = ml5.bodyPose('MoveNet', { flipped: true });
}
function setup() {
createCanvas(640, 480);
video = createCapture(VIDEO, { flipped: true });
video.size(640, 480);
video.hide();
bodyPose.detectStart(video, function(r) { poses = r; });
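// load style models and remember which ones have finished loading
styleModels = {};
for (const name of styleNames) {
styleModels[name] = ml5.styleTransfer(name, video, function() {
styleReady[name] = true;
console.log(name + ' loaded');
});
}
transferLoop();
}
// set to true per style once its load callback has fired
let styleReady = {};
function transferLoop() {
// the model object exists as soon as it is created, so check the
// loaded flag rather than the object before calling transfer()
if (styleReady[currentStyleName]) {
styleModels[currentStyleName].transfer(function(err, result) {
if (!err) styledImage = result;
transferLoop();
});
} else {
setTimeout(transferLoop, 200);
}
}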
function draw() {
if (styledImage) {
image(styledImage, 0, 0, width, height);
} else {
image(video, 0, 0, width, height);
}
if (poses.length > 0) {
const pose = poses[0];
const leftWrist = pose.keypoints.find(function(kp) {
return kp.name === 'left_wrist';
});
const rightWrist = pose.keypoints.find(function(kp) {
return kp.name === 'right_wrist';
});
if (leftWrist && rightWrist) {
// hand height selects style
// both hands low = wave (calm)
// left hand high = udnie (energetic)
// right hand high = la_muse (angular)
const leftHigh = leftWrist.y < 200;
const rightHigh = rightWrist.y < 200;
if (leftHigh && !rightHigh) {
currentStyleName = 'udnie';
} else if (rightHigh && !leftHigh) {
currentStyleName = 'la_muse';
} else {
currentStyleName = 'wave';
}
}
}
// info
fill(0, 0, 0, 140);
noStroke();
rect(0, height - 30, 200, 30);
fill(170, 180, 200);
textSize(10);
textFont('monospace');
text('style: ' + currentStyleName, 10, height - 12);
}
Raise your left hand above your head and the scene shifts from calm Japanese waves to energetic cubist fragmentation. Raise your right hand and it becomes angular Picasso. Both hands down returns to Hokusai. Your body posture controls which neural painter is active. Dance and the style changes with your movement. That's a performable piece -- the audience sees a painted world that responds to the performer's gestures. The performer is simultaneously the subject, the canvas, and the controller.
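One practical wrinkle: a wrist hovering right around the threshold can make the style flip back and forth. A possible remedy, sketched here with hypothetical helper names and an arbitrary hold of 3 frames, is to accept a new style only after it has been requested for several consecutive pose updates.

```javascript
// Same decision as in the sketch: wrist above the threshold line counts as "high".
function pickStyle(leftY, rightY, threshold) {
  const leftHigh = leftY < threshold;
  const rightHigh = rightY < threshold;
  if (leftHigh && !rightHigh) return 'udnie';
  if (rightHigh && !leftHigh) return 'la_muse';
  return 'wave';
}

// Only switch after the same new style was requested `holdFrames` times in a row.
function makeStableSelector(initial, holdFrames) {
  let active = initial;
  let candidate = initial;
  let streak = 0;
  return function update(requested) {
    if (requested === active) {
      candidate = active;
      streak = 0;
      return active;
    }
    if (requested === candidate) {
      streak++;
    } else {
      candidate = requested;
      streak = 1;
    }
    if (streak >= holdFrames) {
      active = candidate;
      streak = 0;
    }
    return active;
  };
}

const select = makeStableSelector('wave', 3);
// [leftWristY, rightWristY] per frame: one jittery frame, then a held pose
const wristFrames = [[150, 300], [250, 300], [150, 300], [150, 300], [150, 300]];
const chosen = wristFrames.map(([l, r]) => select(pickStyle(l, r, 200)));
// chosen is ['wave', 'wave', 'wave', 'wave', 'udnie']
```

The single raised-hand frame at the start gets ignored. Only the deliberate, held gesture at the end switches the painter.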
Performance is heavy here. Pose detection + style transfer running together might get 3-5 fps. But as we discussed in episode 96, the slow update rate creates a dreamlike quality where style changes drift in rather than snap. For some pieces, that's actually better than real-time.
Capturing styled frames
Style transfer produces beautiful still images, not just video. Capturing individual styled frames gives you high-quality prints, postcards, social media posts -- each one a unique neural painting of whatever was in front of your camera.
function keyPressed() {
if (key === 'c' && styledImage) {
// save current styled frame
save(styledImage, 'styled-' + currentStyle + '-' + frameCount + '.png');
console.log('frame captured');
}
}
Press C to save the current styled frame as a PNG. Each capture is a one-of-a-kind image -- the specific moment, the specific lighting, the specific camera angle, all run through the neural style. Collect a series of them, print them, and you have a physical gallery of neural paintings derived from your webcam. The camera-as-paintbrush metaphor becomes literal.
The creative exercise: neural painting studio
Allez, time to build the full thing. A webcam painting studio where you cycle through styles, control intensity with mouse position, blend between original and styled in real time, and capture your favorite frames. Your room becomes the canvas, the neural network becomes the painter, and you're the art director choosing which moments to preserve.
let video;
let models = {};
let styleNames = ['wave', 'udnie', 'la_muse', 'scream', 'rain_princess'];
let currentIdx = 0;
let styledImage = null;
let styleAmount = 0.9;
let blendCanvas;
let prevBlend;
let temporalSmooth = 0.5;
let captureCount = 0;
let loaded = {};
function setup() {
createCanvas(800, 600);
blendCanvas = createGraphics(800, 600);
prevBlend = createGraphics(800, 600);
prevBlend.background(0);
video = createCapture(VIDEO);
video.size(320, 240);
video.hide();
for (const name of styleNames) {
loaded[name] = false;
models[name] = ml5.styleTransfer(name, video, function() {
loaded[name] = true;
console.log(name + ' ready');
if (name === styleNames[currentIdx]) {
doTransfer();
}
});
}
}
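let transferBusy = false;
function doTransfer() {
const name = styleNames[currentIdx];
// one request in flight at a time, and only for a model that has loaded;
// otherwise every arrow press would add another parallel loop
if (transferBusy || !loaded[name]) return;
transferBusy = true;
models[name].transfer(function(err, result) {
transferBusy = false;
// ignore results for a style the user has already switched away from
if (!err && name === styleNames[currentIdx]) styledImage = result;
doTransfer();
});
}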
function draw() {
background(10, 12, 18);
// style amount from mouse Y
styleAmount = map(mouseY, 0, height, 1.0, 0.0, true);
// temporal smoothing from mouse X
temporalSmooth = map(mouseX, 0, width, 0.1, 0.9, true);
if (styledImage) {
// blend styled + original based on styleAmount:
// opaque video underneath, styled frame on top at styleAmount opacity
blendCanvas.clear();
blendCanvas.image(video, 0, 0, width, height);
blendCanvas.tint(255, 255 * styleAmount);
blendCanvas.image(styledImage, 0, 0, width, height);
blendCanvas.noTint();
// temporal blend with previous frame: prevBlend still holds the last
// output, so drawing the new blend at (1 - temporalSmooth) opacity
// keeps temporalSmooth of the old picture
prevBlend.tint(255, 255 * (1 - temporalSmooth));
prevBlend.image(blendCanvas, 0, 0);
prevBlend.noTint();
image(prevBlend, 0, 0);
} else {
image(video, 0, 0, width, height);
}
// controls overlay
fill(0, 0, 0, 150);
noStroke();
rect(0, 0, 260, 90);
fill(170, 180, 200);
textSize(10);
textFont('monospace');
textAlign(LEFT);
const styleName = styleNames[currentIdx];
text('style: ' + styleName + (loaded[styleName] ? '' : ' (loading)'), 10, 18);
text('intensity: ' + (styleAmount * 100).toFixed(0) + '% (mouse Y)', 10, 34);
text('smoothing: ' + (temporalSmooth * 100).toFixed(0) + '% (mouse X)', 10, 50);
text('left/right: switch | C: capture', 10, 66);
text('captures: ' + captureCount, 10, 82);
}
function keyPressed() {
if (keyCode === LEFT_ARROW) {
currentIdx = (currentIdx - 1 + styleNames.length) % styleNames.length;
styledImage = null;
doTransfer();
}
if (keyCode === RIGHT_ARROW) {
currentIdx = (currentIdx + 1) % styleNames.length;
styledImage = null;
doTransfer();
}
if (key === 'c' || key === 'C') {
saveCanvas('neural-painting-' + styleNames[currentIdx] + '-' + captureCount, 'png');
captureCount++;
console.log('captured frame ' + captureCount);
}
}
Mouse Y controls how much painting vs photograph you see. Mouse X controls how much temporal smoothing is applied. Left/right arrows cycle through styles. C captures the current frame. Move the mouse to the top-right corner: full painting, heavy trail effects, a slow, dreamy, heavily processed image. Bottom-left: mostly photograph with minimal smoothing, almost real-time video with a hint of painted texture. Every position on the canvas gives a different balance.
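To see what a given mouse position does, the two `map()` calls from `draw()` can be pulled into a plain function. This is a sketch: `mapClamped` is a stand-in for p5's `map()` with clamping enabled, and `controlsFromMouse` is a hypothetical helper.

```javascript
// Linear remap with clamping, standing in for p5's map(v, a, b, c, d, true).
function mapClamped(value, inMin, inMax, outMin, outMax) {
  const t = Math.min(1, Math.max(0, (value - inMin) / (inMax - inMin)));
  return outMin + (outMax - outMin) * t;
}

// Same ranges as the studio sketch: Y drives intensity, X drives smoothing.
function controlsFromMouse(mx, my, w, h) {
  return {
    styleAmount: mapClamped(my, 0, h, 1.0, 0.0),
    temporalSmooth: mapClamped(mx, 0, w, 0.1, 0.9),
  };
}

const topRight = controlsFromMouse(800, 0, 800, 600);   // full style, heavy smoothing
const bottomLeft = controlsFromMouse(0, 600, 800, 600); // no style, light smoothing
const center = controlsFromMouse(400, 300, 800, 600);   // halfway on both axes
```

Dragging diagonally between those two corners sweeps from raw video to the dreamiest version of the painting.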
The captured frames are your portfolio. Each one is a unique neural painting -- your room, your face, your objects, all reinterpreted through a different artistic lens. Print the best ones. They look genuinely good on paper.
Where does this lead?
Style transfer is the first technique where a neural network actively transforms images rather than just analyzing them. Classification reads. Feature extraction summarizes. Style transfer rewrites. That's a different category of operation, and it opens a door. If a network can paint in Van Gogh's style from a photo, what else can it generate? Can it turn a sketch into a photograph? Can it generate images from text? Can it hallucinate entirely new images?
The answer to all of those is yes, and the techniques that make them possible (GANs, diffusion models, pix2pix) build on the same foundation we explored here -- convolutional features, learned representations, the separation of content from style. We've seen how a pre-trained classification network (VGG/MobileNet) develops internal representations that capture both what's in an image and how it's rendered. Style transfer exploits that. The next techniques exploit it further.
It comes down to this...
- Style transfer separates the content of an image (what's in it) from the style (how it's rendered) and recombines them. Feed it your webcam and a Monet painting and your webcam looks like Monet. The neural network rewrites the rendering while preserving the structure. It's one of the most visually striking ML applications and it runs in the browser through ml5.js
- The fast style transfer approach (Johnson et al., 2016) trains a feedforward network to apply one specific style in a single forward pass, fast enough for real-time video. ml5 provides pre-trained models for several styles: wave (Hokusai), udnie (Picabia), la_muse (Picasso), scream (Munch), rain_princess. Each transforms the same input into a completely different visual world
- Style strength is controlled by blending the styled output with the original video. Low blend = photograph with subtle painted texture. High blend = full neural painting. Mouse position, audio amplitude, classification confidence -- anything can drive this parameter. The sweet spot depends on the style: bold styles (wave) read well at 60-70%, subtle styles (udnie) need 80%+
- Frame-by-frame style transfer flickers because each frame is styled independently. Temporal smoothing (alpha blending with the previous styled frame) damps the jitter. A blend of 0.5-0.7 smooths most flicker while tracking motion. High blend values create painted ghost trails. The tradeoff is smoothness vs responsiveness
- Style transfer works on generated art too, not just webcam video. Generate a p5 sketch (particles, noise, geometry), feed it to the style model, and your code output gets repainted as cubist geometry or impressionist landscape. Code becomes art becomes painting -- each layer of transformation adds a creative dimension
- The math: a CNN trained for classification develops layers that capture low-level features (edges, textures = style) and high-level features (objects, scenes = content). The Gram matrix computes correlations between filter activations at each layer, capturing which visual patterns co-occur without position information. Matching Gram matrices across layers transfers style at every scale from individual brushstrokes to overall color composition
- Textures and color palettes transfer well (Van Gogh swirls, Monet light, pointillist dots). Geometric structure transfers poorly (Escher tessellation, Mondrian rectangles) because the Gram matrix is position-independent. This guides style image selection -- choose paintings with strong, consistent textures for best results
- Combining style transfer with ML models from previous episodes: pose detection selects which style is active (raise left hand = cubist Udnie, right hand = angular Picasso). Classification confidence drives style intensity. Face mesh could drive style parameters. Each combination creates a performable piece where the body controls the neural painter
- The neural painting studio: mouse Y controls style intensity, mouse X controls temporal smoothing, arrow keys cycle styles, C captures frames. Every position on the interaction space gives a different balance of photograph vs painting, smooth vs responsive. Captured frames are printable neural paintings -- your room through different artistic lenses
- Style transfer opens the door from analysis to generation. Classification reads images. Feature extraction summarizes them. Style transfer rewrites them. The techniques that come next -- sketch-to-photo, image generation from text, GANs -- build on the same convolutional features and learned representations
Six episodes into this ML arc and the models keep getting more powerful. We went from labeling (episode 92) to spatial tracking (93-95) to deep classification (96) to neural image transformation. Each episode the network's role shifts from observer to creator. Classification watches and reports. Style transfer watches and paints. And we haven't gotten to the models that generate entirely new images from nothing but a sketch.
Sallukes! Thanks for reading.