I’m experimenting with one of ChatGPT’s image generators. Here are some results. I’ll be writing more about this.

To begin with, it’s not anything close to human intelligence. It’s kind of dumb, actually, and prone to getting itself stuck in blind alleys. It also has quirks that make it obvious it’s an LLM. Many times I’d ask it to remove an unwanted part of an image, and it would say it had done so when it clearly hadn’t.

It’s very plain I’m interacting with just another computer program: a powerful one, of course, but still just a program. It quickly becomes clear that my input is the most important part of the interaction. It takes a lot of thought, and careful noting of what the program does with each part of the prompt.

I went into this experiment to explore ChatGPT’s creative or artistic possibilities. I found it worked well if I simplified a great deal. This isn’t surprising, and it fits my usual aesthetic or design approach of simplify, simplify, simplify. The images here are all defined by the prompt “image of an animal, 20th century linoleum cut style, contained within a simple circular frame, with lots of white space around the circle, muted colors, only three or four colors maximum, in a square screensize”.

Defining the basic style, such as engraving, lithograph, or oil painting, gives the program a narrower set of things to choose from when it constructs an image. I’ve been avoiding anything with words. Oddly, graphical LLMs have a very hard time with words and letters, and can be quite stubborn about insisting they’re spelling or forming letters correctly. It’s also good to remember there are humans behind the programming of ChatGPT, who make human decisions based on all kinds of considerations, such as intellectual property.

Simplification removes some of the quirkiness of LLMs, or at least makes the quirks more acceptable. The more complex and realistic renderings tend to have weirdness in the odd place, signs of a non-human presence. I’m also not attracted to the discernible AI style. It’s quite possible I’m just not putting the right words into my prompts.

The program can create some very attractive visual elements, impressively so. Frames, strokes and flourishes are often aesthetically pleasing.

I’m also narrowing down the subject matter, envisioning an “A is for Aardvark” type of sequence. These images appear to be well suited to book or picture form, as well as the web. Children will like them. And everyone likes critters.
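For anyone who wants to keep the style fixed while swapping in a different animal for each letter, here is a minimal Python sketch. It only assembles the prompt text and does not talk to ChatGPT at all. The style wording is the prompt quoted earlier in this post; the function names (`build_prompt`, `caption`, `alphabet_pages`) are made up for illustration, and the a/an rule is a rough first-letter guess, not proper grammar.

```python
# Style wording is copied from the prompt quoted in this post.
# Everything else (names, structure) is a hypothetical helper for illustration.
STYLE = (
    "20th century linoleum cut style, contained within a simple circular frame, "
    "with lots of white space around the circle, muted colors, "
    "only three or four colors maximum, in a square screensize"
)


def article(word: str) -> str:
    """Rough a/an choice based only on the first letter (an assumption, not real grammar)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def build_prompt(animal: str) -> str:
    """Return the full image prompt for one animal, keeping the style fixed."""
    name = animal.lower()
    return f"image of {article(name)} {name}, {STYLE}"


def caption(animal: str) -> str:
    """Return an alphabet-book caption such as 'P is for Pig'."""
    return f"{animal[0].upper()} is for {animal}"


def alphabet_pages(animals: list[str]) -> list[tuple[str, str]]:
    """Pair each caption with its prompt, sorted alphabetically by animal name."""
    return [(caption(a), build_prompt(a)) for a in sorted(animals)]


# The animals mentioned in this post.
ANIMALS = ["Pig", "Quetzal", "Gerbil", "Cat", "Aardvark"]

if __name__ == "__main__":
    for page_caption, prompt in alphabet_pages(ANIMALS):
        print(page_caption)
        print("  " + prompt)
```

Keeping the style in one place means any tweak to the wording, say a different print style, carries over to every letter at once.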

More later.


Later: I’m not going into the social or economic aspects of graphical LLM programs. That’s a vast issue by itself, and we’re at the very early stages. We don’t know what real-life impact LLMs will have on artists’ fortunes. I’ll write about it at some point, but for the moment I want to look at the technical and aesthetic aspects of this new thing.

I’m a technologist who’s earned a living in information technology since 1982, so the technical puzzle appeals to my curiosity and I want to see if I can put it to use. I have friends who are very upset about graphical LLMs, speaking as arts professionals. I’m no arts professional, and can be a bit of a money-grubber at times, so it doesn’t feel as dire to me.

P is for Pig

Q is for Quetzal

G is for Gerbil

C is for Cat