A number of financial institutions over the past decade have implemented voice-based authentication on their customer service phone systems. Call in and at some point you’ll be asked to say, “My voice is my password.”
An enterprising writer at Motherboard decided to take advantage of the recent renaissance in consumer-friendly AI/ML, training a machine-learning model to clone their own voice and then pointing the synthetic voice at a bank’s phone system. After multiple attempts and rounds of model retraining, the bank’s voice authentication system finally accepted the machine-generated voice, and the writer was in, able to access account details and transaction history.
The writer concluded, with the agreement of several experts, that voice identification is now broken beyond repair by “AI-voices” and should be abandoned entirely, since it’s not foolproof and could lead to widespread hacking of people’s financial accounts. Current trends in cyber security, however, suggest these fears are overblown. I’ll argue that the writer’s thesis is misguided by taking a casual walk through the history of authentication, from the perspective of both the finance and tech industries.
In my day, we authenticated by voice, uphill both ways
When I had a brokerage account with a small West Coast firm in the 2000s, I could call in to the equities desk and place a trade. The broker’s authentication method? Name a position (stock ticker and quantity) currently held in the account. Realistically, the firm had no way to know it was actually me, and I didn’t personally know anyone on the desk, but it was an easy way for me to plunk down a few hundred dollars to scoop up some sweet shares of Radio Shack.
The predecessor to online banking was…banking by phone. (Sounds quaint now.) For that, you’d punch in your account number and some other authenticator, like a PIN or the last four of your Social Security number. The functionality was limited, but you could at least check balances and move money between your own accounts.
The rise of hardware tokens
PayPal was one of the early large-scale, customer-facing adopters of two-factor authentication (2FA) with time-based one-time password (TOTP) codes. For a nominal fee, you could get a hardware token with an LCD display to improve login security. You’d enter your username and password, then be prompted for the six-digit number displayed on the token, which changed every 30 seconds.
The advantage of these hardware tokens was simple: without the token’s serial number and the associated seed stored on the authentication server, it was computationally infeasible to calculate or predict the OTP code. And because these tokens lacked network connectivity, they couldn’t be hacked remotely.
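To make the mechanism concrete, here’s a minimal sketch of the open TOTP standard (RFC 6238) in Python, using only the standard library. The Base32 seed is illustrative, and token vendors of that era often used proprietary variants; this just shows the shared-seed-plus-clock idea.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, interval: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HMAC the current 30-second time step with the shared seed."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // interval                 # time steps since the epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                             # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

# Illustrative seed; in real deployments the server and token share it at enrollment.
print(totp("JBSWY3DPEHPK3PXP"))
```

Without the seed, an attacker sees only a stream of unrelated six-digit numbers; with it, both sides derive the same code from nothing more than the clock.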
A major downside, of course, was cost: hardware tokens are expensive to procure and ship, and they’re easy to lose and logistically difficult to replace. They also have to be physically accessible, i.e. carried everywhere you might need to use them, making them annoying at best (taking up room in a pocket or purse) and paralyzing at worst (left at home while you’ve traveled to the other side of the world).
The rise of SMS authentication
What happens when the demands of security meet the reality of business? In most cases, a compromise. And that compromise carried two-factor authentication from hardware tokens to its next iteration: SMS authentication.
The idea was simple: solve the problem of asking people to carry a hardware token with them by putting the token into something they already carry. Once cell phones went mainstream and the cost of texting went down, a service could send the OTP code to the phone as a way to “prove” the second factor.
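Mechanically, the server side is straightforward. Here’s a rough sketch, where send_sms() is a hypothetical stand-in for a carrier or SMS-gateway API: mint a short-lived random code, text it out, and compare the user’s response in constant time.

```python
import hmac
import secrets
import time

def send_sms(phone_number: str, message: str) -> None:
    # Hypothetical stand-in for a real carrier or SMS-gateway API call.
    print(f"[sms to {phone_number}] {message}")

def issue_otp(store: dict, phone_number: str, ttl: int = 300) -> None:
    code = f"{secrets.randbelow(10**6):06d}"          # six random digits
    store[phone_number] = (code, time.time() + ttl)   # expires after ttl seconds
    send_sms(phone_number, f"Your login code is {code}")

def verify_otp(store: dict, phone_number: str, submitted: str) -> bool:
    code, expires = store.pop(phone_number, (None, 0.0))
    if code is None or time.time() > expires:
        return False
    return hmac.compare_digest(code, submitted)       # constant-time comparison

pending: dict = {}
issue_otp(pending, "+15555550100")
```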
SMS-based authentication, however, was (and still is) inherently insecure. Texts traverse mobile networks with weak or no encryption, SIM-swapping (where a mobile phone number is ported away from the subscriber) is a real and routine attack, and most phones are configured by default to display notifications, including text messages in the default messaging apps, on the lock screen.
What’s old is new again
RSA SecurID tokens were the “OG” device for 2FA, but they never caught on beyond large enterprise deployments. That’s because they had to be integrated as a full-stack solution — servers plus tokens — making the cost and technical effort a high barrier for most implementations.
In the 2010s, two things happened: the launch of YubiKey hardware tokens, a low-cost alternative to RSA’s tokens; and the development of open authentication standards, specifically OATH and FIDO, as free alternatives to RSA’s server software. This led to easier and faster adoption of 2FA on both the client side (Google’s Chrome was the first browser to add support, followed by Mozilla’s Firefox) and the server side.
More importantly, YubiKey tokens combined with FIDO had two huge advantages: unlike an RSA SecurID token, a single YubiKey could be used with multiple services; and a user could self-register a YubiKey at any time, without requiring assistance from the online service. And while the tokens do connect to a computer, the protocols they speak never let the private keys leave the device, making them resistant to phishing and hacking.
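At its core, FIDO is a challenge-response over an asymmetric key pair. Here’s a minimal sketch using the third-party cryptography package; it omits everything that hardens real FIDO/WebAuthn (origin binding, attestation, signature counters) and only illustrates why the private key never has to travel.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Enrollment: the token mints a key pair and hands the service only the public half.
token_key = ec.generate_private_key(ec.SECP256R1())
registered_public_key = token_key.public_key()

# Login: the service sends a random challenge; the token signs it internally.
challenge = os.urandom(32)
signature = token_key.sign(challenge, ec.ECDSA(hashes.SHA256()))

# The service verifies against the stored public key; raises InvalidSignature on tamper.
registered_public_key.verify(signature, challenge, ec.ECDSA(hashes.SHA256()))
print("challenge verified; the private key never left the token")
```

A phishing site can relay a stolen username and password, but it can’t produce a valid signature without the physical token in hand.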
And then there were authenticator apps
Cyber security is a constant tradeoff with usability. Case in point: hardware tokens. They’re highly resistant to phishing, especially with a protocol like FIDO. But you have to physically carry the token with you, and you have to be able to connect the token to your computer, tablet, or phone. Not to mention you’re asking a customer to buy something to make their experience more secure. It’s a tough sell.
Cue the authenticator app.
Think 2FA, but enrolled by scanning a QR-style code instead of punching in a serial number. And unlike SMS authentication, the “something you have” is not the phone number but a user session inside the app. A trust channel is built between the app and the consuming service, secured by a secret key exchanged during enrollment. Even if you swapped SIMs, ported out the phone number, or destroyed the SIM altogether (leaving only WiFi), 2FA would still work, because it relies only on the app.
Plus, the marginal cost of using authenticator apps is zero: most people already have a cell phone, and the apps themselves cost nothing. It’s hard for a $30 hardware token to compete with free.
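For TOTP-style apps, that enrollment QR code is just a URI carrying the shared seed. The issuer, account, and secret below are made-up examples; once the seed is on the device, codes are derived from the seed and the clock alone (e.g., with the totp() sketch earlier), which is why SIM swaps and number ports change nothing the app depends on.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical enrollment payload; authenticator apps scan this as a QR code.
uri = "otpauth://totp/ExampleBank:alice?secret=JBSWY3DPEHPK3PXP&issuer=ExampleBank"

parsed = urlparse(uri)
label = parsed.path.lstrip("/")               # "ExampleBank:alice"
secret = parse_qs(parsed.query)["secret"][0]  # Base32-encoded shared seed

# From here the app needs only `secret` and a clock to mint codes.
print(label, secret)
```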
Why Motherboard’s “experiment” misses the mark
The writer argues that synthetic voices can be used to gain entry to a bank account via a phone-based system. But consider where we came from: in the early days of phone-based systems, all you had to do was enter an account number and a PIN. Yes, voice authentication can be spoofed, but it requires an additional step, knowledge of the victim, some amount of technical skill, and a bit of luck. And it falls apart if you end up speaking to a real person, because you can’t rely on pre-recorded, canned responses in a live conversation.
Put all of that aside, though. The reality is, your bank isn’t relying on just a voiceprint analysis to authenticate you. In fact, one of the banks referenced in the article specifically says it uses a “layered approach to security and fraud prevention.” To suggest otherwise is naive, if not misleading.
The cyber security industry has been at the forefront of risk-based authentication. Microsoft, for instance, employs risk detection in Azure Active Directory to block logins, force the use of MFA, or even require a password change. A simple example is IP address analysis: detect changes in the user’s IP address or IP block. If a user has traditionally connected from the same IP or the same ISP’s block of IPs, and the new session looks the same, allow the session. If the IP or block changes, prompt for a second authentication factor, like SMS, email, or an authenticator app.
Another example — found in Cloudflare’s Ruleset Engine — is the user agent string that’s transmitted in HTTP requests. This string identifies the browser being used to access a web page, and is frequently leveraged by websites to deliver mobile-optimized pages. Just as a change in IP address could be a flag, so could a change in the user agent. If a user always logs in with Safari, but this time is using Edge, we might want to challenge the user for 2FA.
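A toy version of those two checks might look like the following. The profile structure and the allow/challenge decision are invented for illustration; a real engine (Azure AD’s risk detections, Cloudflare’s rulesets) weighs far more signals than these.

```python
import ipaddress

def assess_login(profile: dict, session_ip: str, session_ua: str) -> str:
    """Allow if both signals match past behavior; otherwise step up to 2FA."""
    ip = ipaddress.ip_address(session_ip)
    ip_familiar = any(ip in ipaddress.ip_network(block) for block in profile["ip_blocks"])
    ua_familiar = session_ua in profile["user_agents"]
    return "allow" if ip_familiar and ua_familiar else "challenge_2fa"

# Invented profile: past sessions came from one ISP block, always on Safari.
profile = {"ip_blocks": ["203.0.113.0/24"], "user_agents": ["Safari"]}
print(assess_login(profile, "203.0.113.7", "Safari"))   # allow
print(assess_login(profile, "198.51.100.9", "Edge"))    # challenge_2fa
```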
We have a cyber security term for this: defense-in-depth, the layering of multiple independent controls so that no single compromised factor (a password, say, or a voiceprint) unlocks everything. The concept has been around for a long time.
Beyond all of the examples of risk-based analysis in cyber security, one other element stands out: the scalability of an attack vector. The Sasser and Blaster worms were shockingly destructive because they spread across systems with no authentication or user interaction required. Phishing emails, by contrast, require action by the recipient, either running malicious code or exposing sensitive credentials, which limits their success rate.
To pull off this “attack,” the writer had to collect enough voice recordings to generate a synthetic voice and gather enough personal information (in this case, a date of birth) to access the bank account. That’s incredibly time-consuming, especially when the payoff is limited to viewing account information and balances and moving funds between the victim’s own accounts.
So is my voice my password? Not exactly. It’s more like one piece of a risk-based authentication puzzle, and the financial industry keeps adding more pieces over time to make it more complex, with many of those puzzle pieces hidden from the public.