Thoughts on Convergence and Divergence in Vocaloid Culture (and Beyond)


Since the early 2000s, the concept of convergence has been discussed actively in the fields of media, fan, and popular music studies. Scholars in these fields have developed this concept to explain the fluidity of digital media content across different online platforms, cooperation among media industries, and the active and participatory nature of end users (Galbraith and Karlin, eds. 2016; Jenkins 2006; Meikle and Young 2012). Their works commonly highlight the collective power of users in the web 2.0 world. Convergence, however, leaves a fundamental question unanswered: How does it further lead us to think about and understand the individuality, skillfulness, competence, and agency of creators of digital media content and musical works?[1]

In this essay, I discuss divergence as a conceptual frame to pursue this inquiry. Divergence, I argue, can lead us to a deeper appreciation of an individual user’s creativity, the new types of cultural works she or he invents, and their sociocultural impacts. As a case study, I discuss recent Vocaloid music scenes in Japan and examine musical works created by several individual Vocaloid producers.[2]


Vocaloid Culture and VocaloPs

Vocaloid is a singing synthesizer technology invented by Yamaha. In March 2000, Yamaha initiated the project in collaboration with Universitat Pompeu Fabra in Barcelona, Spain, with the aim of creating synthesizer software that can produce “human” singing voices. The company released the first version of Vocaloid in 2004 and produced its fourth version in 2014. Successive versions have refined detailed vocal nuances and articulation, the techniques applied in actual singing, the range of languages that Vocaloid covers (such as English, Korean, Spanish, Chinese, and Japanese), and overall user-friendliness, among other things. Yamaha licensed Vocaloid to third-party companies, so that these companies could produce software based on the singing synthesizer technology. These companies include Zero-G in the United Kingdom, Power FX in Sweden, and Crypton Future Media (hereafter, Crypton) in Japan.

Crypton’s voice synthesis software, Hatsune Miku (2007), is probably the most significant of the Vocaloid-based software products that contributed to the initial growth of Vocaloid culture in Japan. Hatsune Miku incorporates the 2D anime character Hatsune Miku, originally designed by Japanese illustrator KEI, and the voice samples of Japanese voice actress Fujita Saki. This software enables its users to create Vocaloid songs with their personal computers, and thus to manipulate the virtual pop star Miku into singing songs in the ways they like. This aspect has attracted attention not only from music creators but also from anime fans.[3] By the end of October 2007, Crypton had sold more than 20,000 units, which was certainly a hit in the sound-making software market.

At the same time, the wider availability of “distribution infrastructure” (Tamagawa 2012) has provided amateur music makers with greater opportunities to circulate their Vocaloid works among users and fans of Vocaloid, as well as among followers of Japanese pop culture in general. A distribution infrastructure includes online social spaces, such as YouTube and Niconico, where users can show their own musical works and/or illustrations of Vocaloid characters such as Miku, exchange information, and collaborate with each other.[4] In particular, Niconico is an important distribution infrastructure for Vocaloid producers––often called “VocaloPs” in the musical communities. Niconico offers a video-sharing platform. Unlike YouTube, Niconico allows its users to overlay their comments on video clips, and subsequent viewers can see these comments scrolling across the screen.

After Crypton released Hatsune Miku in August 2007, VocaloPs began to upload their Vocaloid works onto the Niconico site. One of the most famous VocaloPs in Vocaloid fandom is “ika,” who uploaded his original Vocaloid song entitled “Miku Miku ni shite ageru” (Let’s All Miku Miku) onto Niconico in September 2007.

This song was ika’s runaway success; as of January 2017, the original video clip had received more than 12,500,000 views and 1,790 comments. Vocaloid songs that have reached more than ten million views are classified as shinwa (“myths”) in the communities. In addition to ika’s “Miku Miku ni shite ageru,” only two other Vocaloid songs have become shinwa: ryo’s “Melt,” uploaded on December 7, 2007; and KurousaP’s “Senbon zakura” (One Thousand Cherry Trees), uploaded on September 17, 2011. These Vocaloid producers later played key roles in the entertainment industry. After their music videos became well known, a number of amateur VocaloPs signed contracts with major record labels, and their original compositions appeared in popular Vocaloid products, including Vocaloid-related compilation albums and a video game series called Hatsune Miku: Project Diva, created by Sega and Crypton.[5]

Here the question of what social and cultural contexts contributed to the visibility of VocaloPs in Japanese media culture––and how we can explicate these contexts––remains to be answered.


Vocaloid in Convergence Culture

Convergence can be a useful concept in explaining how collective grassroots activities of VocaloPs gain their social and cultural significance. In Convergence Culture: Where Old and New Media Collide (2006), Henry Jenkins explains media convergence as a cultural process in which different types of media content flow across multiple platforms, and in which media audiences and users help circulate that content across those platforms; he thereby sheds light on the participatory nature of media and popular culture. Convergence illustrates “technological, industrial, cultural, and social changes,” and in such conditions, “every important story gets told, every brand gets sold, and every consumer gets courted across multiple media platforms” (2006:3). It involves “[t]he informal and sometimes unauthorized flow of media content when it becomes easy for consumers to archive, annotate, appropriate, and recirculate media content” (ibid.:326). This concept, therefore, provides us with a lens through which to recognize the participatory and fluid nature of Vocaloid culture; this culture has been sustained through fans’ active rereading and reproduction of original works (Yamada forthcoming).[6]

Media convergence pertains to the Hatsune Miku phenomenon, prompting researchers to inquire how fans circulate Vocaloid works originally created by VocaloPs, including ika, ryo, and KurousaP, among many others. As Leavitt et al. note, “much of Miku’s success is based on fans recreating, remixing and recirculating her character in [a] bottom-up way” (2016:223). This process is often called “niji sōsaku” (“secondary creation”). Different kinds of secondary creation help vitalize Vocaloid fandom. For instance, arranger and Niconico user KM created an orchestral arrangement of ika’s “Miku Miku ni shite ageru” and posted it onto both Niconico and YouTube on December 9, 2012.

Although this arrangement does not include Miku’s singing voice, it received a large number of positive comments along with 1,164 thumbs-ups and 18 thumbs-downs as of January 12, 2016. Another kind of secondary creation is “utattemita” (“I tried to sing it”), in which fans sing along with existing Vocaloid songs composed by famous VocaloPs. Searching “Miku miku ni shite ageru utattemita” on YouTube yields about 37,400 results as of January 2017. “Odottemita” (“I tried to dance it”) videos constitute another kind of secondary creation; in odottemita videos, fans dance along with these Vocaloid songs. Looking at instances of secondary creation in Vocaloid fandom can be a way to realize the convergent nature––as well as performative aspects––of contemporary media culture; fans enhance their social and cultural visibility and become active participants in the construction of media culture––no longer passive consumers or “dupes of mass deception” (Adorno 1991 [1938]).

Although the concept of convergence, together with active audiences and participatory culture, has offered a perspective from which to recognize the collective power of mass audiences, how do we then take into account and explain the individuality and specificity of active users of Vocaloid, as well as moments where different kinds of cultural creativity emerge from the Vocaloid phenomenon––aside from the collective signification of the phenomenon?


Divergence in the Contemporary Moment

In this section, I discuss the work of an individual producer by using divergence as an alternative conceptual frame. I apply this concept in ways distinct from those who use it to view linkages between different media industries, consumers across multiple media platforms, and media content as “extension from one medium to another” (Fujiki 2016:62–63), and from those who focus on macro-structural aspects of the information and communication industry in which “divergent technology and infrastructure [lead] to divergent media production, distribution and consumption” (Galbraith and Karlin 2016:14–15). Jenkins also uses the term to refer to “the diversification of media channels and delivery mechanisms” (2006:324). Instead, divergence here refers to moments where new cultural possibilities emerge and spread out. Such creative, tangential spaces are constructed following experimental cooperation and collaboration between traditional and modern media-based cultural practices––through interventions of creative and purposive individual subjects, or agents (see Yamada 2017). I therefore develop divergence to direct our attention to individual VocaloPs’ creativity, artistry, new types of cultural products they contrive, and their idiosyncratic outcomes in convergence culture. This project also aims to further facilitate a creator-centered approach to the study of current web 2.0 phenomena in general.[7]

Indeed, innovative users find different ways of utilizing Vocaloid technology, producing Vocaloid works, and, more important, broadening the cultural horizon in which Vocaloid plays important roles in Japanese performing arts. In the summer of 2014, librettist, lyricist, and composer Tamawari Hiroshi created a 30-minute opera film entitled Vocaloid Opera AOI with Bunraku Puppets, in which the Vocaloid singing synthesizer technology is used for the music in the play, and all the actresses are bunraku puppets.

Bunraku is a traditional Japanese performing art that has existed on the Japanese mainland for more than three centuries. The opera film is thus a hybrid of cultural elements derived from traditional Japanese performing arts and those derived from the new pop culture phenomenon at the contemporary moment. This Vocaloid opera premiered at Hyper Japan 2014––the United Kingdom’s largest Japanese culture festival, held by the London-based Japanese media company Cross Media. Vocaloid Opera AOI with Bunraku Puppets has been screened in different countries since then; for example, it was shown during Hinode, a Japanese pop culture festival in Moscow, in April 2016.

Such an innovative attempt pushes the edge of Vocaloid culture. What anthropologist Paul Rabinow calls “the contemporary” is a concept that “provides an orientation that seeks out and takes up practices, terms, concepts, forms, and the like from traditional sources but seeks to do different things with them from the things they were forged to do originally or how they have been understood more recently” (2011:110). In this sense, Tamawari’s project of creating the Vocaloid opera can be situated in contemporary moments where old cultural elements and traditions gain new meanings, which are often distinct from their previous ones––that is, where cultural divergence takes place. Vocaloid technology has enabled Tamawari not only to create this new type of Vocaloid work but also to introduce bunraku puppets into Japanese pop culture events.



Current scholarship on the Vocaloid phenomenon has tended to focus mostly on visual and performative aspects of the virtual pop idol Hatsune Miku and “social energy arising from a collective interest in Miku” (Condry 2013:63) and other Vocaloid characters that Crypton has created (see, for example, Black 2012; Leavitt, Knight, and Yoshiba 2016; Zaborowski 2016). Such research has been done mostly in the domain of cultural and media studies.[8] Therefore, in this context, a rethinking of divergence in convergence culture leads not only to shedding new light on individual VocaloPs’ creativity, artistry, and actual techniques used to manipulate Vocaloid technology in the web 2.0 era, but also, in a broader sense, to offering fresh theoretical perspectives and insights into the fluidity and the dynamic nature of today’s media culture. This attempt may also contribute to the construction of new venues for active cross-disciplinary conversations across the fields of media, fan, cultural studies, and (ethno)musicology.



Adorno, Theodor W. 1991 [1938]. The Culture Industry: Selected Essays on Mass Culture. Edited by J. M. Bernstein. New York: Routledge.

Black, Daniel. 2012. “The Virtual Idol: Producing and Consuming Digital Femininity.” In Idols and Celebrity in Japanese Media Culture, edited by Patrick W. Galbraith and Jason G. Karlin, 209-28. Basingstoke, UK: Palgrave Macmillan.

Cardeno, Sean. 2014. “Miku Expo 2014 Report: Miku’s Popularity is Strong in the US Too.” Tokyo Otaku Mode. Accessed January 14, 2017.

Condry, Ian. 2013. The Soul of Anime: Collaborative Creativity and Japan’s Media Success Story. Durham, NC: Duke University Press.

Duffett, Mark, ed. 2014. Popular Music Fandom: Identities, Roles and Practices. New York: Routledge.

Fujiki, Hideaki. 2016. “Networking Citizens through Film Screenings: Cinema and Media in Post-3/11 Social Movements.” In Media Convergence in Japan, edited by Patrick W. Galbraith and Jason G. Karlin, 60-87. Kinema Club.

Galbraith, Patrick W. and Jason G. Karlin. 2016. “Introduction: At the Crossroads of Media Convergence in Japan.” In Media Convergence in Japan, edited by Patrick W. Galbraith and Jason G. Karlin, 1-28. Kinema Club.

Galbraith, Patrick W. and Jason G. Karlin, eds. 2012. Idols and Celebrity in Japanese Media Culture. Basingstoke, UK: Palgrave Macmillan.

_______. 2016. Media Convergence in Japan. Kinema Club.

Ito, Mizuko, Daisuke Okabe, and Izumi Tsuji, eds. 2012. Fandom Unbound: Otaku Culture in a Connected World. New Haven: Yale University Press.

Jenkins, Henry. 2006. Convergence Culture: Where Old and New Media Collide. New York: New York University Press.

Kinsella, Sharon. 1998. “Japanese Subculture in the 1990s: Otaku and the Amateur Manga Movement.” Journal of Japanese Studies 24(2):289-316.

Leavitt, Alex, Tara Knight, and Alex Yoshiba. 2016. “Producing Hatsune Miku: Concerts, Commercialization, and the Politics of Peer Production.” In Media Convergence in Japan, edited by Patrick W. Galbraith and Jason G. Karlin, 200-29. Kinema Club.

Meikle, Graham and Sherman Young. 2012. Media Convergence: Networked Digital Media in Everyday Life. Basingstoke, UK: Palgrave Macmillan.

Rabinow, Paul. 2011. The Accompaniment: Assembling the Contemporary. Chicago: University of Chicago Press.

Rice, Timothy. 2003. “Time, Place, and Metaphor in Musical Experience and Ethnography.” Ethnomusicology 47(2):151-79.

Ruskin, Jesse D. and Timothy Rice. 2012. “The Individual in Musical Ethnography.” Ethnomusicology 56(2):299-327.

Stock, Jonathan P. J. 2001. “Toward an Ethnomusicology of the Individual, or Biographical Writing in Ethnomusicology.” The World of Music 43(1):5-19.

Tamagawa, Hiroaki. 2012. “Comic Market as Space for Self-Expression in Otaku Culture.” In Fandom Unbound: Otaku Culture in a Connected World, edited by Mizuko Ito, Daisuke Okabe, and Izumi Tsuji, 107-32. New Haven: Yale University Press.

Yamada, Keisuke. Forthcoming. Supercell’s Supercell featuring Hatsune Miku. New York: Bloomsbury.

_______. 2017. “Rethinking Iemoto: Theorizing Individual Agency in the Tsugaru Shamisen Oyama-ryū.” Asian Music 48(1):28-57.

Zaborowski, Rafal. 2016. “Hatsune Miku and Japanese Virtual Idols.” In The Oxford Handbook of Music and Virtuality, edited by Sheila Whiteley and Shara Rambarran, 111-28. Oxford: Oxford University Press.



[1] This inquiry pertains, especially, to the field of ethnomusicology, in which subject-centered ethnographic approaches have long been valued (see Ruskin and Rice 2012).

[2] I have a forthcoming book about Japanese creator group Supercell’s eponymous first album, which features Hatsune Miku, released in Japan in March 2009 by Sony Music Japan (see Yamada forthcoming). This book is part of Bloomsbury Publishing’s new 33-1/3 Japan Series.

[3] Hatsune Miku software was successfully marketed to anime fans, as Japan has large anime fan bases (see, for example, Condry 2013; Ito et al. 2012; Kinsella 1998).

[4] Besides Hatsune Miku, Vocaloids include female anime characters such as Meiko, Kagamine Rin, and Megurine Luka; and male characters such as Kaito and Kagamine Len. Following Miku, these characters became famous.

[5] These pieces have also been performed by Miku, who is displayed as a 3D hologram in live stage shows taking place in many countries, including China, Taiwan, Mexico, Brazil, Canada, the US, and Japan, among others. These live performances are often part of an event called “Hatsune Miku Expo.” This event usually attracts a large number of fans. For instance, in Hatsune Miku Expo 2014 in Los Angeles on October 11 and 12, approximately 15,000 fans were in attendance (Cardeno 2014).

[6] Convergence remains an important subject of discussion in media studies. For instance, Patrick Galbraith and Jason Karlin (2016) recently edited a collection of essays on contemporary media culture in Japan, which is entitled Media Convergence in Japan. Contributors to this collection seem to be more or less following Jenkins’s mission of understanding media culture through the lens of convergence. Nevertheless, the editors claim that Jenkins’s work is narrowly limited to a US context; in the introductory chapter, Galbraith and Karlin argue that “because of [Jenkins’s] position in the US and consideration of media convergence in primarily that national context, there is still room to consider how the coming together of different technologies, industries, markets, genres and audiences plays out in different places and times” (2016:12).

[7] We can draw on subject-centered approaches developed within the field of ethnomusicology, including, for example, Jonathan Stock’s “ethnomusicology of the individual” (2001) and Timothy Rice’s “subject-centered musical ethnography” (2003). See also Ruskin and Rice (2012).

[8] I recognize that there has been a lack of cross-disciplinary conversations between cultural and media studies and ethnomusicology in general. For instance, although a number of edited collections that cover different kinds of user/fan creativity, performativity, and productivity in popular music culture have recently been published (e.g., Duffett 2014; Galbraith and Karlin 2012; Ito et al. 2012), ethnomusicologists have rarely been part of these cross-disciplinary projects.



Keisuke Yamada is a PhD student in ethnomusicology at the University of Pennsylvania. He is currently working on a long-term ethnographic project involving the shamisen (Japanese three-stringed instrument) in East Asia, Southeast Asia, and North America, especially focusing on the transnational circulation of the materials from which the instrument is made. He has published an article in Asian Music and a review in the Journal of World Popular Music. Also, he has written a book for Bloomsbury Publishing’s new 33-1/3 Japan Series, which is entitled Supercell’s Supercell featuring Hatsune Miku (forthcoming 2017).



University of Pennsylvania, Department of Music, 201 South 34th Street, Philadelphia, PA 19104-6313, USA.



"Sounding Board" is intended as a space for scholars to publish thoughts and observations about their current work. These postings are not peer reviewed and do not reflect the opinion of Ethnomusicology Review. We support the expression of controversial opinions, and welcome civil discussion about them. We do not, however, tolerate overt discrimination based on race, sex, gender, sexual orientation, or religion, and reserve the right to remove posts that we feel might offend our readers.