- If the quality of the output is so questionable, how did these LLMs become so popular, reaching almost superstar status in the public sphere?
- Nigel Bosch: The quality is questionable in terms of factual correctness, but quite good in terms of syntax, self-consistency, and flexibility. LLMs are, if nothing else, entertaining.
- In the future, ChatGPT will be embodied and have a voice. For children growing up with ChatGPT, could their behavior toward ChatGPT carry over to real people, if the dominant metaphor is “assistant”?
- Mike Twidale: Yes, this is a real risk. Do we want to teach our students how to be little aristocrats imperiously ordering their robot servants what to do? And will they unconsciously carry over how they talk to computer assistants to human assistants? Even worse, what happens when the ChatGPT assistant makes a mistake and the child gets cross and shouts at it? Will that transfer? Will children make the necessary distinction that this is not OK to do with a human who seems to talk the same way? Most children can distinguish between playfully telling off a ‘naughty teddy bear’ or a real-life dog that fails to understand, and scolding a person. But anthropomorphism of something that seems to have human-like intelligence is hard to avoid. So we will need to be very careful about the metaphors we use, consciously and unconsciously, and about the distinctions that we make.
RESEARCH & DESIGN:
- Can ChatGPT be used as a prototype to develop an AI that pulls only factual information, or that rivals university-approved databases such as JSTOR?
Roxana Girju: Potentially, yes (but not currently). Although ChatGPT has generated such a frenzy, the focus should not be on this particular LLM. More and better such tools will surface and be made available, so that we can train them on more specialized, higher-quality datasets - which is another way of controlling output quality. However, it is very important to know the limitations of such tools (by design) and to let the research community (at large) identify case studies and applications that can benefit from such models. By design, the focus should be on improving task efficiency (like a smart text editor that generates a first or second draft), but this usually requires validation and post-editing by the user.
- Are any of you currently using (or plan on using) LLMs in your research? Are you willing to share how?
Nigel Bosch: A couple of graduate students I work with are using LLMs in research. One is interested in how they might be used to generate narrative math problems, answers to them, and interactive explanations for students. Another student is interested in the models themselves, and in particular in how to assess the ways in which LLMs capture and reiterate stereotypes about people, especially in situations where the inner workings of the LLMs are not accessible or interpretable.
Roxana Girju: I am interested in case studies that give us a better understanding of their strengths and limitations (upper and lower bounds of performance). Specifically, I will be working on the human-AI cognitive space of affordances.
- Can you comment about the role of what is now termed “prompt engineering”? Where I am coming from with this question: We’ve just completed a research intervention comparing human (peer) review with AI review (connecting to ChatGPT via API), then having the students critically evaluate the differences. We ran through their written works multiple times based on 10 different prompts. Preliminary finding: students find AI provides more useful feedback this way, and quite different from human.
After all, most good research consists of crafting the right kind of question: one that is useful, insightful, and can lead in a productive direction. Also, it is not a one-shot process; iterative refinement is everything. But sadly, high-stakes testing can train students to think there is a trick to getting the right answer on the first try.
So too with interacting with people: asking good questions is an art, but one that people can get better at with practice, once they realize it is a learnable skill, and one worth improving.
- Roxana Girju: The next (big) wave of research papers will be on case studies like yours, which show the input-output correspondence. Prompt engineering iterations do help to get to the desired answer (to some extent), but in reality it will become evident that this is a limitation of the model design, and not really of how good people are at formulating their natural-language questions (as in daily human-human interactions).
- Audience Comments:
I've found "prompt engineering" to be critical to using many of these models effectively. A poorly constructed question will yield poor results. The types of prompts can also be idiosyncratic to the model.
Sounds like a lemma to Cunningham's Law.
We’re exploring the use of rubrics for this purpose through the API connection.
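The rubric-through-the-API workflow described above could be sketched roughly as follows. This is a hypothetical illustration, not the study's actual code: the function name, rubric criteria, and prompt wording are all assumptions, and the resulting messages list is in the generic chat-completion format that APIs such as OpenAI's accept.

```python
# Hypothetical sketch of sending a grading rubric plus a student essay
# to a chat-completion-style API for AI review. Names are illustrative.

def build_review_messages(rubric: list[str], essay: str) -> list[dict]:
    """Assemble a chat-style prompt asking the model to review an essay
    against each rubric criterion in turn."""
    criteria = "\n".join(f"- {c}" for c in rubric)
    system = (
        "You are a writing reviewer. Give specific, constructive feedback "
        "on the student's essay, addressing each rubric criterion separately."
    )
    user = f"Rubric criteria:\n{criteria}\n\nEssay:\n{essay}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# One of several prompt variants; in a study like the one described,
# each variant would pair a different rubric or instruction with the
# same student text before being sent through the API.
messages = build_review_messages(
    rubric=["Clarity of argument", "Use of evidence", "Organization"],
    essay="Sample student essay text...",
)
```

Iterating over several such prompt variants, as the questioner describes, then lets students compare the AI's rubric-anchored feedback against human peer review.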
TEACHING & INSTRUCTION:
- Re writing instruction, I am surprised no one mentioned style or voice.
Twidale: No time to cover everything. Yes, it can be really useful in teaching to get students to use ChatGPT to generate multiple essays on the same topic in multiple styles and voices. That is a useful skill to develop, but painful and laborious to do yourself, and students may perceive it as pointless busywork. But exploring multiple voices can really help in understanding both topic and rhetoric, and how they are different but composable.
- The two predominant responses I have seen regarding ChatGPT and education are "this will ruin students' ability to reason and express their ideas and therefore should be rejected because we have no other method for praxis," and "if pedagogy doesn't react to technological shifts, we will leave students in the dust." What is your response to those two positions? Does it shift based on subject matter (statistics versus English), or the student's academic level (undergrad versus graduate)?
Bosch: Historically, technological innovations have usually had less impact on education than expected, or at least more slowly than expected. I expect large language models will not be too different, as educators and researchers gradually explore ways to use such technologies to augment learning rather than replace it. Learning technology often also requires self-regulation skills that older students are more likely to have, but which can be taught and may become increasingly important if large language models eventually lead to more student-driven learning processes.
- Has any college, or the university writ large, developed some language that can be included in syllabi about proper use, citation expectations, etc.?
- Elizabeth Niswander: We’re not aware that this university, or any of its units, has yet established any formal policies on the use of ChatGPT in instructional settings. The Center for Innovation in Teaching & Learning (CiTL) is endeavoring to follow ChatGPT-related teaching practices and make recommendations for instructional use.
- A question on educators raising their game: is it now more urgent than ever before to ask learners to provide references for their essays?
- Mike Twidale: Yes, but not just splat a few refs at the end of the essay. The argument structure needs to use the sources in the body of the text. Plus, large language models will start to emulate that structure, using the huge number of papers that follow it. So refs are necessary but not sufficient.
- Chatbots are mostly about re-summarizing information; how do chatbots do with critical thinking skills? For example, taking an application case study and making decisions using foundational concepts.
- Julia Hockenmaier: LLMs like ChatGPT have no critical thinking skills, even if their output gives the appearance that they might.
- How do chatbots affect the copyright of manuscripts under subscription, or other intellectual property issues?
Mike Twidale: AFAIK, LLMs work by vacuuming up truly enormous amounts of text from the web and any other databases they can access, in order to create a vector space that allows them to autogenerate plausible sentences. Is that fair use? I don’t know; IANAL. I presume they do not / cannot / should not access text protected by paywalls. It is interesting to speculate about the copyright status of the texts they generate!
- Have journals set any guidelines for using ChatGPT in writing journal articles that we could teach our graduate students about? Our plagiarism-detection software, Turnitin, now has AI detection built in, suggesting that AI-written journal manuscripts might be rejected on that basis.
Julia Hockenmaier: Yes. Interestingly, the International Baccalaureate has just declared that students can use ChatGPT as long as they cite it, while the Association for Computational Linguistics advises using LLMs only for minor editorial purposes, and also requires their use to be disclosed. https://2023.aclweb.org/blog/ACL-2023-policy/