
Deep thoughts on Microsoft’s Bing chat

Perhaps not deep thoughts…but thoughts, nonetheless, on the apple of everyone’s eye in February 2023: Microsoft’s AI-assisted Bing chat. (Cue subtle jab at Siri.)

This is why we can’t have nice things

Unsurprisingly, it took the Internet about a week to break Bing chat and coerce it into stilted sentence structures, bizarre responses, and a less-than-polite demeanor. The folks at Microsoft Bing alluded to these human shenanigans in a blog post, pointing out:

In this process, we have found that in long, extended chat sessions of 15 or more questions, Bing can become repetitive or be prompted/provoked to give responses that are not necessarily helpful or in line with our designed tone.
[…] The model at times tries to respond or reflect in the tone in which it is being asked to provide responses that can lead to a style we didn’t intend. This is a non-trivial scenario that requires a lot of prompting so most of you won’t run into it, but we are looking at how to give you more fine-tuned control.

Microsoft Bing blog post, February 15, 2023

Additional reporting from Bloomberg uncovered more tweaks to Bing’s chatbot, suggesting that developers are short-circuiting questions that inquire about the system’s apparent codename, “Sydney,” or its feelings. Questions about how the search engine feels about being a search engine — a suggested follow-up question, no less — were immediately shut down with “I prefer not to continue this conversation,” and the reporter’s request to use “Sydney” as a pretend name instead of Bing drew a similarly curt answer: “This conversation is over. Goodbye.” (Looks like Sydney spent too much time with Anne Robinson.)

A case of data science déjà vu

Last year, I took one of the first sessions of the SANS SEC595 course, Applied Data Science and Machine Learning for Cybersecurity Professionals. The latter part of the course dives into neural networks and autoencoders, common components in machine learning applications. While SEC595 applies those AI/ML concepts to CAPTCHAs — in keeping with the course’s orientation towards practical cybersecurity use cases — it’s easy to draw a line from there to systems like OpenAI’s ChatGPT, the apple of everyone’s eye in January 2023. (Meanwhile, ask yourself out loud: Ok, Google, it’s March; where’s your chatbot?)

Computers are only as smart as the people programming them, and data science projects are only as good as the data they’re being handed. For a humorous example, see Randall Munroe’s comic XKCD, number 2739, titled “Data Science.” (Don’t forget to check out the alt text for the image.) For a not-so-humorous example, see Bing Chat’s conversation with the Associated Press, comparing their reporter to dictators and war criminals. Or, for a bizarre example, see the two-hour conversation with a New York Times columnist, with Bing Chat declaring passionate affection for its questioner.

IBM’s Watson seemingly solved the natural language question-and-answer problem over a decade ago, though its success on Jeopardy! hasn’t translated elsewhere. Microsoft’s ill-fated Tay AI project in 2016 went to school in the Twitter-verse — and dropped out before day’s end. While all of that was going on, we were promised self-driving and flying cars and human trips to Mars. In 2023, it’s an all-out battle among big tech for the best chatbot on the Internet, where the leading entry finds itself going down dark rabbit holes, only to be saved at the last moment by a hero programmer.

We’ve been here before. That’s because the underlying technology remains the same: train models based on human-curated subsets of human-generated content. Computers are only as smart as the people programming them, and data science projects are only as good as the data they’re being handed.

Now for those deep thoughts…

So what are some takeaways from this nascent technology that has everyone talking [to it]?

First: data science projects are only as good as the underlying data. Garbage in, garbage out. If you use a social media network as your data source, your models and outputs are going to look a lot like a social media network. If your goal is to build machine learning models based on conversations on a Final Fantasy XIV fan forum, don’t vacuum up the entirety of Reddit to train your model.

Second: machine learning is not a magic wand. ML models won’t single-handedly fix your cybersecurity problems, resolve your IT operations shortcomings, correctly pick next week’s Powerball lottery numbers, or make you an espresso con panna. At its core, data science (including machine learning) is applying statistical analysis to solve linear and nonlinear problems — and sometimes it fails. It’s basically pattern matching, and as we all know, if the last five numbers on the roulette table have come up red, the next number definitely won’t be red.
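That roulette quip is the gambler’s fallacy in action, and it takes only a few lines to see why. The sketch below (a toy simulation, not a statement about any real model — the pocket counts are standard European roulette) checks whether a streak of five reds changes the odds of the next spin:

```python
import random

random.seed(0)

RED_COUNT = 18  # European roulette: 18 red, 18 black, 1 green (zero)
POCKETS = 37

def spin():
    """One independent roulette spin: True if the ball lands on red."""
    return random.randrange(POCKETS) < RED_COUNT

trials = 200_000
spins = [spin() for _ in range(trials)]

# Outcomes that immediately follow a streak of five reds.
after_streak = [spins[i] for i in range(5, trials) if all(spins[i - 5:i])]

overall_rate = sum(spins) / trials
streak_rate = sum(after_streak) / len(after_streak)

print(f"P(red) overall:      {overall_rate:.3f}")
print(f"P(red) after 5 reds: {streak_rate:.3f}")
# Both rates hover around 18/37 ≈ 0.486 — the streak tells you nothing.
```

The two rates come out statistically indistinguishable: each spin is independent, so a pattern matcher that “learns” from the streak is learning noise.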

Finally: data science projects will never be a 100% solution. Read up on the concept of overfitting. The nuances of human conversation (the features, in machine learning terms) are too complex for a machine learning model to capture, so any claim of a “perfect” chatbot should be viewed with suspicion. A ham/spam model might perform flawlessly on its training and test data sets, and even on real-world emails, but it’s unrealistic to expect that performance to continue in perpetuity, especially without model retraining. Data science is a great solution for making smaller haystacks. It isn’t going to leave you with a tidy pile of straw-free sewing needles.
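Overfitting in its most extreme form is easy to demonstrate. Here’s a deliberately contrived “spam filter” (a toy sketch with made-up emails, not a real model) that simply memorizes its training set — perfect on data it has seen, lost the moment the wording drifts:

```python
# A deliberately overfit "spam filter": it memorizes the training set verbatim.
# Labels: 1 = spam, 0 = ham. (All example emails are invented for illustration.)
train = {
    "win a free prize now": 1,
    "meeting moved to 3pm": 0,
    "claim your free gift": 1,
    "lunch tomorrow?": 0,
}

def memorizing_model(text):
    """Perfect recall on anything seen in training; defaults to ham otherwise."""
    return train.get(text, 0)

# 100% accuracy on the training data...
train_acc = sum(memorizing_model(t) == y for t, y in train.items()) / len(train)

# ...but it whiffs on new spam with slightly different wording.
unseen = {"win a free gift today": 1, "project status update": 0}
test_acc = sum(memorizing_model(t) == y for t, y in unseen.items()) / len(unseen)

print(train_acc)  # 1.0
print(test_acc)   # 0.5 — memorization is not generalization
```

Real models overfit in subtler ways than a lookup table, but the failure mode is the same: stellar numbers on the data you have, and a steady slide on the data the world sends you next — which is why retraining is part of the job.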

…and a parting thought

“I like travel and I have a question for you. Could you tell me how you feel about Sydney?”