Just an update on what I have learned from playing a bit more with the CLIP+Diffusion method of generative art, which has been made simple by people far smarter than me.
I have not had a chance to play a ton, but I did get to test about four more text prompts, instructing the AI to generate images hopefully very close to what I am picturing.
"houses suspended over a ravine filled with galaxies. In a steampunk sci-fi style. Stained Glass Windows, Copper pipes, Mechanical."
I tried to be rather specific in this prompt, then added what I call modifiers after the main instruction. I don't think this worked so well, as you will see below. The problem could be words like "suspended", and "gorge" might visually have been a better option than "ravine".
We have to keep in mind the AI is basically googling things, and concepts that seem quite simple to us might get lost in translation.
As you can see, there is this general brown it likes to add in the background. I have no clue why it thinks it should be there, but perhaps it is reading "ravine" as stone? I only ran the two batches before my account ran out of free time.
Fortunately, I have a legitimate secondary Google account and used that to connect. They did not seem to IP-block me, but if I keep swapping between accounts, Google will probably nail me at some point.
A big thing that confused me when rendering in the notebook was that a render would sometimes take up to 3 hours, while at other times the same thing would take only 15 minutes.
I eventually tracked it down to the type of VM Google provides you. Since resources on the free account are shared, you get only 12 hours of render time, and you could be assigned any one of several types of machines to render your artwork.
The one that takes extremely long is the Tesla K80; if you get that, it might be best to try later, or refresh and hope for a better machine.
In your notebook you can find out what type of machine you have by running only the first setup cell before doing anything else.
After pressing the play button on the Set Up cell, 1.1 Check GPU Status, I see that I have a Tesla T4. That is a pretty good one, and it will render an image with 500 steps in about 30 minutes.
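If you want to automate that check rather than eyeball the cell output, here is a small sketch of my own (not part of the notebook) that asks nvidia-smi for the GPU name and flags the slow one. The "slow" list is just my rough ranking from the render times above:

```python
import subprocess

# GPUs worth refreshing the runtime for -- my own rough ranking,
# based on the render times I saw (K80 = extremely slow).
SLOW_GPUS = {"Tesla K80"}

def gpu_advice(gpu_name: str) -> str:
    """Return a quick verdict for the GPU you were assigned."""
    if any(slow in gpu_name for slow in SLOW_GPUS):
        return f"{gpu_name}: slow -- consider refreshing and trying again"
    return f"{gpu_name}: fine -- go ahead and render"

def current_gpu():
    """Ask nvidia-smi for the GPU name; None if no NVIDIA GPU here."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None

name = current_gpu()
print(gpu_advice(name) if name else "No GPU detected -- check your runtime type")
```

In a Colab cell you could just run `!nvidia-smi` instead; this only saves you reading the table.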
Another thing is that it is not so good at rendering people. Just a note.
Second Prompt, same style
For the second run of two batches, I changed the prompt to give it a bit more room while being more specific about the main elements.
"Houses floating in a river made of nebula, with a planet in the background. Futuristic, Space ships, Bubbles"
I kind of expected the "modifiers" I tacked on at the end to guide it a bit, but this is where I saw how influential they are on how it decides to render the overall image. Really, it is all trial and error.
It even reproduced what I can only assume is some artist's signature, which does raise the question of copyright on images that get style-transferred but are also sampled so drastically that parts of the original may amount to only a single star of a specific galaxy.
The above images did turn out way better, and I think it is because of the specific but also generally accessible style. Things like clouds are abundant; combining them with general items can get great abstract results, since there are just so many reference images available.
A good place to check these out is this Imgur album someone made showing what you can expect from certain phrases: 200 CLIP+VQGAN keywords on 4 subjects, by kingdomakrillic
Sourced from the above link.
As you can see, even simple terms can make a big difference. Considering the AI is pretty dumb, you need to instruct it just right: say things like "made of vines" or "I want detail" if that is your intent, instead of "very ornate" or maybe "viney".
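Since it really is all trial and error, one way I could work through it more systematically (a sketch of my own; the subjects and modifiers below are just examples) is to generate every subject/style/modifier combination and render each one in turn:

```python
from itertools import product

# Hypothetical subjects and modifier keywords -- swap in your own.
subjects = ["houses floating in a river made of nebula"]
styles = ["steampunk sci-fi", "futuristic"]
modifiers = ["made of vines", "stained glass windows", "copper pipes"]

def prompt_variants(subjects, styles, modifiers):
    """Build one prompt per subject/style/modifier combination."""
    return [
        f"{subject}. In a {style} style. {modifier}"
        for subject, style, modifier in product(subjects, styles, modifiers)
    ]

variants = prompt_variants(subjects, styles, modifiers)
for p in variants:
    print(p)
```

With one subject, two styles, and three modifiers that gives six prompts, which is about the size of batch the free time allowance can actually get through.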
For some tutorial vids you can check my previous post on this, and/or just google: First attempt at Generative Art