AI Voices Could Upend Economics of K-Pop Production

Sejin Kim 2024.05.22 14:22 PDT
(Source: provided by Supertone)

Interview with Lee Kyo-gu, CEO of Supertone
AI's 'Infinite Vocal Ranges' Open the Door to Virtual YouTubers, Virtual Idols, Micro-Celebrities and Human-AI Collabs

A new AI-powered voice conversion tool is threatening to upend traditional models of content production across streaming, film, music, podcasting, and more. Supertone, a subsidiary of HYBE, a major South Korean entertainment company known for managing global K-pop sensation BTS, has unveiled the beta version of its AI voice conversion service called "Shift."

The implications of Shift's capabilities are raising eyebrows. The technology stands to 'democratize' content creation by letting individuals produce a range of voices and narratives on their own, reducing the need for teams of voice actors, audio engineers, and other creators.

"YouTubers often don't reveal their dual identities. Shift caters to that audience by enabling immersive role-play and disguised personas," Lee Kyo-gu, Supertone's CEO, emphasized in an interview with The Miilk.

At its core, Shift marks a significant step toward a future in which the marginal cost of creating content approaches zero: once the model exists, adding another voice or character costs little, whereas each new voice once meant hiring and recording another performer. AI voice synthesis like Supertone's radically reduces the production costs and resources previously required. The technology could transform content production across K-pop, movies, YouTube, and beyond, ushering in an era of mass production free of traditional constraints.

(Source: Supertone)

The Handcrafted Approach's Curtain Call

Historically, hits like BTS and Squid Game emerged from an intensely handcrafted process. Cultivating a K-pop idol group demanded years of heavy investment in scouting trainees, rigorous training, and choreographed debuts. Filmmakers likewise spent months or years assembling large teams of actors, directors, crews, and developers, and securing locations, to create a movie, drama, or game.

Artificial intelligence is poised to upend these established norms and democratize content creation. Imagining and crafting diverse characters and narratives is becoming increasingly accessible: a single creator can now bring a whole cast of characters and storylines to life virtually. Supertone, whose majority shareholder is HYBE, exemplifies how AI lowers production barriers through its voice conversion service, Shift.


"YouTubers and streamers now often command fan bases rivaling those of artists. Some reveal their identities; others don't. We're catering to the latter," explains Lee Kyo-gu, Supertone's CEO, who studied electrical engineering, music technology, and computer music at Seoul National University, NYU, and Stanford. His postdoctoral research in machine learning and audio signal processing led him to found Supertone in 2020.

Released in beta on May 15, Shift lets users choose from 10 character voices and converts their speech into the chosen persona's timbre in real time. The real-time voice conversion service has found an audience among virtual YouTubers (vTubers), live streamers, and podcasters drawn to the appeal of "borrowed identities" and immersive role-play.

The vTuber phenomenon is growing rapidly: content consumption rose from 970 million hours in 2022 to 1.31 billion hours in 2023, an increase of about 35%. Within days of its beta launch, Shift had already attracted over 10,000 users in 100 countries, with Japan accounting for over a third of downloads.

In the Shift beta, users can choose one of 10 characters to convert their voice into. (Source: Shift, screenshot by Sejin Kim)

Navigating Ethical Quandaries: Profit-Sharing and Consent

As AI-driven content tools like Shift disrupt traditional production models, concerns are growing about job displacement for voice actors, audio engineers, and creators. Copyright conflicts, profit distribution, data privacy breaches, and impersonation risks are also legitimate worries.

To address these ethical challenges, Lee has proposed a "voice rights" system that shares profits with voice actors. Under this model, Shift users would pay to use a voice actor's vocals, and a predetermined percentage would go to the actor as ongoing royalties, replacing the traditional one-time recording fee.
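To see why an ongoing royalty differs from a one-time fee, consider a rough sketch. Supertone has not published Shift's prices or the actor's share, so every number below is invented, and the function names (one_time_fee_earnings, royalty_earnings, break_even_uses) are hypothetical illustrations rather than anything Supertone uses.

```python
# Hypothetical comparison of two ways to pay a voice actor.
# All figures are invented for illustration; Shift's real pricing is not public.


def one_time_fee_earnings(recording_fee: float) -> float:
    """Traditional model: the actor is paid once, however often the voice is used."""
    return recording_fee


def royalty_earnings(uses: int, price_per_use: float, royalty_share: float) -> float:
    """Voice-rights model: a fixed share of every paid use goes to the actor."""
    if not 0.0 <= royalty_share <= 1.0:
        raise ValueError("royalty_share must be between 0 and 1")
    return uses * price_per_use * royalty_share


def break_even_uses(recording_fee: float, price_per_use: float, royalty_share: float) -> float:
    """Number of uses at which royalties equal the one-time fee."""
    return recording_fee / (price_per_use * royalty_share)


if __name__ == "__main__":
    fee = one_time_fee_earnings(1_000.0)
    for uses in (1_000, 10_000, 100_000):
        royalty = royalty_earnings(uses, price_per_use=0.50, royalty_share=0.30)
        print(f"{uses:>7} uses: one-time fee {fee:,.2f} vs royalties {royalty:,.2f}")
```

With these made-up numbers, the actor comes out ahead under royalties once a voice is used more than about 6,667 times (1,000 / (0.50 × 0.30)); below that, the flat fee pays more.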


"We're already securing unique voices and outstanding acting talent from indie voice actors through top-down contracts, and a draft contract has been prepared," Lee says. "While Shift's pricing model is still being finalized, we plan to introduce profit-sharing when it officially launches as a paid service."

Supertone's proprietary speech synthesis foundation model, NANSY (Neural Analysis & Synthesis), was primarily trained on data from the Korean government's AI Hub. However, Lee has drawn a firm line against using real individuals' voices for now to avoid impersonation risks.

"Even if a voice is converted to the same character, the original speaker's style and intonation remain distinct," Lee explains, pointing to Shift's ability to adjust the blend ratio between the user's voice and the character's voice and to apply watermarking for traceability.
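Supertone has not disclosed how Shift's blend control works internally. One simple way a voice conversion system could expose such a control, assumed here purely for illustration, is to interpolate linearly between a vector representing the user's voice and one representing the character's voice. The function name blend_voice_embeddings and the toy two-number "embeddings" are hypothetical.

```python
from typing import Sequence


def blend_voice_embeddings(
    user: Sequence[float], character: Sequence[float], ratio: float
) -> list[float]:
    """Hypothetical blend control: linear interpolation of two voice vectors.

    ratio = 0.0 returns the user's vector unchanged; ratio = 1.0 returns the
    character's vector. Values in between mix the two proportionally.
    """
    if len(user) != len(character):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Weighted average, element by element.
    return [(1.0 - ratio) * u + ratio * c for u, c in zip(user, character)]


# A 25% character blend keeps most of the user's own vector.
print(blend_voice_embeddings([0.0, 1.0], [1.0, 0.0], 0.25))  # [0.25, 0.75]
```

Because a low ratio leaves most of the user's vector in place, a sketch like this is one way to picture Lee's point that two users converted to the same character can still sound distinct.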

Lee clarified that, because of copyright complexities, Shift primarily targets creators in general rather than specific HYBE artists. Supertone is also exploring collaborations with production companies to create character voices, such as de-aging actor Choi Min-sik's voice for Disney+'s Casino or voicing the transforming protagonist in Netflix's Mask Girl.

Lee has proposed a "voice rights" system that shares profits with voice actors. (Source: provided by Supertone)

Sejin's View: A Pivotal Juncture for Entertech

The Korean entertainment landscape is shifting with AI. "We empower creators to craft content unshackled by physical constraints," Lee emphasizes, and tools like Shift are accelerating the transition to a mass production paradigm. If Shift achieves commercial success, it could reorganize the K-content marketplace spanning K-pop, films, dramas, and gaming. Significant opportunities await virtual idols, local micro-idols, and even human idols and creators.

However, given ongoing copyright conflicts and fan sentiment about using AI for artists, creators, and producers, widespread adoption may be gradual. Supertone's Shift shows how AI-powered tools are enabling mass content production while raising vital questions about the future of established creative roles. As this frontier opens up, balanced models that protect and compensate creative contributors will be essential.

In April, "Way 4 LUV," the title track of virtual idol group PLAVE, took first place on a Korean music show, beating popular human idols such as Le Sserafim. "We help creators transcend physical limits through the boundless possibilities of virtuality," Lee reiterates. "This is just the beginning."


This is a familiar pattern. It mirrors the multi-label approach to K-pop artist development taken by Supertone's parent company, HYBE, which enables rapid, high-volume idol group debuts. Since 2019, HYBE has launched nine idol groups, breaking with the traditional cycle of 5-to-7-year trainee periods followed by sporadic debuts every two to five years. Recently, the girl group ILLIT, produced by BELIFT LAB, one of HYBE's labels, debuted less than two years after NewJeans, produced by another HYBE label, ADOR. ADOR CEO Min Hee-jin has accused HYBE of making ILLIT a copy of NewJeans.

This significant shift has pushed established entertainment giants such as SM (the agency behind Aespa), JYP (Twice), and YG (Blackpink) to move beyond their one-man producer systems and adopt more corporate structures.

Lee added, "Whether it's music or films, now anyone with an idea can create something with technology. On platforms like Netflix and Spotify, there are already millions of offerings, many of which may never have been played. This trend of overproduction is clearly ongoing. Content will continue to increase exponentially, and ultimately, it will be the consumers (viewers) who determine what succeeds."

HYBE's multi-label approach to K-pop artist development enables rapid, high-volume idol group debuts. (Source: HYBE)

Note: This article is a slightly modified translation, produced with generative artificial intelligence (AI), of the original Korean articles by reporter Sejin Kim. The Miilk not only covers generative AI but also experiments with using generative AI solutions to create actual content.

We look forward to hearing your thoughts on this emerging technology and its potential impact. Please drop us a line at
