Chances are in the past year you’ve read or seen something that was not created by a human being. Artificial intelligence (AI) has become increasingly prevalent in all walks of life, including the creation of text, images and videos. The use of AI has drastically increased since the launch of OpenAI’s ChatGPT in November 2022 and shows no signs of slowing down. According to a report from the Europol Innovation Lab, part of the European Union’s law enforcement agency, as much as 90 percent of online content may be synthetically generated by 2026.
With such a staggering projected increase in AI-generated content, it is more important than ever to be aware of what content is synthetically created, what that means and what’s next.
How to identify text created by AI
Ohio University Assistant Professor of English and AI Expert Dr. Paul Shovlin says detecting text generated by AI can be tricky, especially across different kinds of writing. Faculty might identify a student’s writing assignment as being AI-generated because it doesn’t exhibit the kinds of specificity and word choice they are accustomed to from a particular student. This can become difficult when the writing isn’t as personalized and doesn’t have as much voice.
“The issue is that the characteristics [a professor] may be using to intuit aren’t necessarily stable in different kinds of writing,” said Shovlin. “A scientific report isn’t going to have an identifiable, eccentric personal voice in it, for example.”
At the same time, there are instances of writers who did not use AI having their work flagged anyway.
“There have been reports of the writing of some neurodivergent writers as being flagged as likely AI-generated when these individuals did not use any AI-assistance, at all,” emphasized Shovlin.
Large language models (LLMs), the AI systems that specialize in analyzing, generating and understanding text, can have a “tell” at times. LLMs often function by predicting the best next word to use. This can result in certain “tell” words that are overrepresented in the training data but not used in colloquial speech, says Dr. Chad Mourning, an Ohio University assistant professor of computer science and expert in AI and machine learning.
“One that shows up a lot, particularly in the academic setting is ‘delve,’” explained Mourning. “I see many student papers using that word, but they don't say that out loud. Makes one suspicious.”
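To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not how ChatGPT or any real LLM is built; the function names and the tiny training text are made up for illustration. It only shows how a word that is overrepresented in the training data ends up being the favored prediction.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text in which "delve" is overrepresented.
corpus = [
    "we delve into the data",
    "we delve into the results",
    "we delve into the methods",
    "we look at the data",
]
model = train_bigrams(corpus)
print(predict_next(model, "we"))  # prints "delve": it follows "we" 3 times out of 4
```

Because "delve" follows "we" more often than any other word in this made-up training text, the toy model always chooses it, even though a person speaking casually might say "look" instead.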
Mourning added that earlier LLMs tended to ramble and didn’t seem to know when they were done. Newer models, however, can add to the confusion as they tend to do a better job replicating organically created text.
“Advanced prompt engineering and bot programming can lead to AI-generated writing that looks more like ‘organically created text,’ than the general ChatGPT model many people use as a go to solution for AI-generated text,” said Shovlin.
How to identify images created by AI
When it comes to images, AI often struggles to generate uniquely human features like faces and fingers. A quick method for identifying images that may have been synthetically created is counting the fingers of the people or seeing if their faces appear to be distorted.
Even if an image doesn’t include people, additional steps may need to be taken to distinguish an image as AI-generated. Any sort of distortion or proportions that look extremely out of place can be red flags. For a more objective approach, applications and even AI itself can be used to detect images created by AI.
“In theory, any image generated with an AI can be detected by an AI, but there's a lot more effort going into generation than detection,” said Mourning. “In fact, this task is, itself, a type of technique we call Generative Adversarial Networks (GANs). You train a generator, then tell it which ones are fake to make a discriminator, then exclude the ones the discriminator detects to train a better generator, which can be used to train a better detector.”
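The back-and-forth Mourning describes can be sketched in a few lines of Python. This is a deliberately simplified toy, not a real GAN: real GANs train two neural networks with gradient descent, while this version works on single numbers and uses a simple "which average is it closer to" rule as the detector. The function name, parameters and data are all assumptions made for illustration.

```python
import random
import statistics


def adversarial_rounds(real_samples, rounds=5, n_fakes=2000, spread=2.0, seed=0):
    """Toy generate/detect/filter loop on single numbers (not a real GAN)."""
    rng = random.Random(seed)
    real_mean = statistics.fmean(real_samples)
    gen_mean = 0.0  # the generator starts far from the real data
    history = [gen_mean]
    for _ in range(rounds):
        # Generator: produce fake samples around its current guess.
        fakes = [rng.gauss(gen_mean, spread) for _ in range(n_fakes)]
        fake_mean = statistics.fmean(fakes)

        # Discriminator: call a sample "real" if it is closer to the
        # real average than to the fake average.
        def looks_real(x):
            return abs(x - real_mean) < abs(x - fake_mean)

        # Keep only the fakes that fooled the discriminator ...
        survivors = [x for x in fakes if looks_real(x)]
        # ... and retrain the generator on them.
        if survivors:
            gen_mean = statistics.fmean(survivors)
        history.append(gen_mean)
    return history


# Hypothetical "real" data clustered around 10.
data_rng = random.Random(1)
real = [data_rng.gauss(10.0, 1.0) for _ in range(500)]
history = adversarial_rounds(real)
print([round(m, 2) for m in history])
```

Each round, the detector is rebuilt against the generator's latest fakes, and the generator learns only from the fakes that slipped past it, so its output drifts toward the real data. In this toy it then hovers near the real values rather than settling exactly on them.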
How data and the internet influence what AI generates
Artificial intelligence and LLMs are strongly influenced by the content they are trained with. Mourning says much of the growth we have seen in AI is based on training data.
“Most of these generational algorithms are basically weighted combinations of things from the training data, a millionth of this, a millionth of that,” explained Mourning. “If every picture labelled butterfly had a certain kind of symmetry, it will ensure that the generated image of a butterfly does too.”
Since an LLM like ChatGPT is reliant on the data it is trained with, if the training content is biased or problematic, the resulting content will likely be the same. User-generated content that contains accidental misinformation or intentional disinformation can also pose an issue.
“If there is enough deliberate disinformation that makes its way into the training models, it will show up in the output,” emphasized Mourning. “There have been AI generated search result suggestions telling people they should chew rocks to cure some ailment, based on a humorous response in a Reddit thread. I don't think it was a real danger, but there might be some cases that weren't so obvious.”
Shovlin says there are ways to avoid some of this disinformation and misinformation when utilizing AI to generate content.
“You can prompt ChatGPT and other AI tools to focus on specific texts you feed into them and only those texts,” he said. “In the case of a programmed bot with rules to not access the greater web, you may be reasonably assured that the responses it generates are from the specific sources you loaded into it.”
Is it ethical to use AI?
The short answer is that it depends entirely on the context. Mourning and Shovlin agree that there is nothing inherently unethical about using generative AI, but aspects of deception and privacy can present a more complex grey area. Shovlin encourages users of generative AI to use rhetorical awareness, meaning critical thinking related to the text they are composing and the audience they are composing it for.
“One question to ask one’s self, is: ‘What would my audience think if they knew I was generating this text with AI,’” said Shovlin. “Another question is ‘what are the expectations of my organization regarding privacy, copyright, and artificially generated vs. human generated text.’”
How is AI regulated?
Mourning believes that the big ethical questions are related to deception and the unauthorized use of training data. The deception aspect could be easily remedied by adding disclosures, but the data portion is a bit more complex. Some LLMs have been trained using YouTube transcripts, something that creators didn’t necessarily sign off on.
If companies were made to disclose all of their data, their methods would become public knowledge, but disclosing only where they gathered the data could be a good compromise.
“If you make people disclose the actual training data, that's like forcing disclosure of trade secrets,” Mourning explained. “But in aggregate, if you have to list where you got the data from, people can at least see if their rights were violated—whether it's an artist's copyright or YouTube's terms of service.”
Shovlin is more pessimistic about AI regulation and doesn’t think there will be meaningful regulation of generative tools.
“The companies are very powerful, the technology prolific and profuse, and politicians seem to be generally technologically ignorant, based on their responses, for example, to social media controversies,” Shovlin emphasized. “There is a powerful point of view that AI regulation gets in the way of innovation and that given the extreme potential of AI, politicians may be hesitant to develop guidelines for it.”
Will AI replace writers, other creative industries?
AI is already replacing some writers to an extent, says Shovlin. ESPN, for example, came under fire for using AI-generated stories in place of reporting by human journalists for some “underserved” sports.
“While times change and jobs change, it’s important that we carefully consider how AI is affecting the workforce and remember that we have a voice and can use it when it’s merited,” he said.
Creatives are already being replaced, and AI is only going to get better, but some creatives may be able to leverage the new technology, according to Mourning.
“There will always be room for some creatives, but it's going to be fewer of them,” he said. “Existing writers may make the best of the inaugural class of ‘prompt engineers’ though. It's a transition, not an extinction.”