- If the quality of the output is so questionable, how did these LLMs become so popular, reaching almost superstar status in the public sphere?
- Nigel Bosch: The quality is questionable in terms of factual correctness, but quite good in terms of syntax, self-consistency, and flexibility. LLMs are, if nothing else, entertaining.
- In the future, ChatGPT will be embodied and have a voice. For children growing up with ChatGPT, could their behavior toward ChatGPT carry over to real people, if the dominant metaphor is “assistant”?
- Mike Twidale: Yes, this is a real risk. Do we want to teach our students how to be little aristocrats imperiously ordering their robot servants what to do? And will they unconsciously carry over how they talk to computer assistants to human assistants? Even worse, what happens when the ChatGPT assistant makes a mistake and the child gets cross and shouts at it? Will that transfer? Will children make the necessary distinction that this is not OK to do with a human who seems to talk the same way? Most children can distinguish between playfully telling off a ‘naughty teddy bear’ or a real-life dog that fails to understand, and scolding a person. But anthropomorphism of something that seems to have human-like intelligence is hard to avoid. So we will need to be very careful about the metaphors we use, consciously and unconsciously, and about the distinctions that we make.
RESEARCH & DESIGN:
- Can ChatGPT be used as a prototype to develop an AI that pulls only factual information, or that rivals university-approved databases such as JSTOR?
Roxana Girju: Potentially, yes (but not currently). Although ChatGPT has generated such a frenzy, the focus should not be on this particular LLM. More and better such tools will surface and be made available, so that we can train them on more specialized, higher-quality datasets - which is another way of controlling output quality. However, it is very important to know the limitations of such tools (by design) and to let the research community (at large) identify case studies and applications that can benefit from such models. By design, the focus should be on improving task efficiency (like a smart text editor that generates a first or second draft), but this usually requires validation and post-editing by the user.
- Are any of you currently using (or plan on using) LLMs in your research? Are you willing to share how?
Nigel Bosch: A couple of graduate students I work with are using LLMs in research. One is interested in how they might be used to generate narrative math problems, answers to them, and interactive explanations for students. Another student is interested in the models themselves, and in particular in how to assess the ways in which LLMs capture and reiterate stereotypes about people, especially in situations where the inner workings of the LLMs are not accessible or interpretable.
Roxana Girju: I am interested in case studies that give us a better understanding of their strengths and limitations (upper and lower bounds of performance). Specifically, I will be working on the human-AI cognitive space of affordances.
- Can you comment about the role of what is now termed “prompt engineering”? Where I am coming from with this question: We’ve just completed a research intervention comparing human (peer) review with AI review (connecting to ChatGPT via API), then having the students critically evaluate the differences. We ran through their written works multiple times based on 10 different prompts. Preliminary finding: students find AI provides more useful feedback this way, and quite different from human.
After all, most good research consists of crafting the right kind of question: one that is useful, insightful, and can lead in a productive direction. Also, it is not a one-shot process; iterative refinement is everything. But sadly, high-stakes testing can train students to think there is a trick to getting the right answer on the first try.
So too with interacting with people: asking good questions is an art, but one that people can get better at with practice, once they realize it is a learnable skill, and one worth improving.
- Roxana Girju: The next (big) wave of research papers will be on case studies like yours, which show the input-output correspondence. Prompt engineering iterations do help to get to the desired answer (to some extent), but in reality it will become evident that this is a limitation of the model design, and not really of how good people are at formulating their natural-language questions (as in daily human-human interactions).
- Audience Comments:
I've found "prompt engineering" to be critical to using many of these models effectively. A poorly constructed question will yield poor results. The types of prompts can also be idiosyncratic to the model.
Sounds like a lemma to Cunningham's Law.
We’re exploring the use of rubrics for this purpose through the API connection.
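The rubric-through-the-API workflow described above could be sketched roughly as follows. This is a hypothetical illustration, not the study's actual code: the function name, rubric criteria, and prompt wording are all assumptions, and the resulting messages list is in the generic chat-completion format that APIs such as OpenAI's accept.

```python
# Hypothetical sketch of sending a grading rubric plus a student essay
# to a chat-completion-style API for AI review. Names are illustrative.

def build_review_messages(rubric: list[str], essay: str) -> list[dict]:
    """Assemble a chat-style prompt asking the model to review an essay
    against each rubric criterion in turn."""
    criteria = "\n".join(f"- {c}" for c in rubric)
    system = (
        "You are a writing reviewer. Give specific, constructive feedback "
        "on the student's essay, addressing each rubric criterion separately."
    )
    user = f"Rubric criteria:\n{criteria}\n\nEssay:\n{essay}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# One of several prompt variants; in a study like the one described,
# each variant would pair a different rubric or instruction with the
# same student text before being sent through the API.
messages = build_review_messages(
    rubric=["Clarity of argument", "Use of evidence", "Organization"],
    essay="Sample student essay text...",
)
```

Iterating over several such prompt variants, as the questioner describes, then lets students compare the AI's rubric-anchored feedback against human peer review.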
TEACHING & INSTRUCTION:
- Re writing instruction, I am surprised no one mentioned style or voice.
Twidale: No time to cover everything. Yes, it can be really useful in teaching to get students to use ChatGPT to generate multiple essays on the same topic in multiple styles and voices. That is a useful skill to develop, but painful and laborious to do yourself, and students may perceive it as pointless busywork. But exploring multiple voices can really help in understanding both topic and rhetoric, and how they are different but composable.
- The two predominant responses I have seen regarding ChatGPT and education are "this will ruin students' ability to reason and express their ideas and therefore should be rejected because we have no other method for praxis," and "if pedagogy doesn't react to technological shifts, we will leave students in the dust." What is your response to those two positions? Does it shift based on subject matter (statistics versus English), or the student's academic level (undergrad versus graduate)?
Bosch: Historically, technological innovations have usually had less impact on education than expected, or at least more slowly than expected. I expect large language models will not be too different, as educators and researchers gradually explore ways to use such technologies to augment learning rather than replace it. Learning technology often also requires self-regulation skills that older students are more likely to have, but which can be taught and may become increasingly important if large language models eventually lead to more student-driven learning processes.
- Has any college, or the university writ large, developed some language that can be included in syllabi about proper use, citation expectations, etc.?
- Elizabeth Niswander: We’re not aware that this university, or any of its units, has yet established any formal policies on the use of ChatGPT in instructional settings. The Center for Innovation in Teaching & Learning (CiTL) is endeavoring to follow ChatGPT-related teaching practices and make recommendations for instructional use.
- A question on educators raising their game: is it now more urgent than ever before to ask learners to provide references for their essays?
- Mike Twidale: Yes, but not just splat a few refs at the end of the essay. The argument structure needs to use the sources in the body of the text. Plus, large language models will start to emulate that structure, using the huge number of papers that follow it. So refs are necessary but not sufficient.
- Chatbots are mostly about re-summarizing information; how do chatbots do with critical thinking skills? For example, taking an application case study and making decisions using foundational concepts.
- Julia Hockenmaier: LLMs like ChatGPT have no critical thinking skills, even if their output gives the appearance that they might.
- How do chatbots affect the copyright of manuscripts under subscription, or other intellectual property issues?
Mike Twidale: AFAIK, LLMs work by vacuuming up truly enormous amounts of text from the web and any other databases they can access, in order to create a vector space that allows them to autogenerate plausible sentences. Is that fair use? I don’t know; IANAL. I presume they do not / cannot / should not access text protected by paywalls. It is interesting to speculate about the copyright status of the texts they generate!
- Have journals set any guidelines for using ChatGPT in writing journal articles that we could teach our graduate students about? Our plagiarism-detection software, Turnitin, now has AI detection built in, suggesting that AI-written journal manuscripts might be rejected on that basis.
Julia Hockenmaier: Yes. Interestingly, the International Baccalaureate has just declared that students can use ChatGPT as long as they cite it, while the Association for Computational Linguistics advises using LLMs only for minor editorial purposes, and also requires their use to be disclosed. https://2023.aclweb.org/blog/ACL-2023-policy/