The process of turning web novels into webtoons and data science
Keith Lee
Professor of AI/Data Science @SIAI
Senior Research Fellow @GIAI Council
Head of GIAI Asia
Web novel to webtoon conversion is not decided by 'profitability' alone. If the novel's author has money or bargaining power, 'webtoonization' may be nothing more than a marketing tool for the web novel, and data science models built on market variables cannot capture such cases.
A student in SIAI's MBA AI/BigData program, struggling with her thesis, chose as her topic the conditions under which a web novel gets turned into a webtoon. Most people would simply assume that if a web novel's view counts are high and its sales volume is large, a follow-on contract with a webtoon studio comes much more easily. She brought in a few data science papers as references, but they only looked at publicly available information. What if the conversion was the web novel author's own choice? What if the author simply wanted to spend more marketing budget by adding a webtoon to his line-up?
The literature mostly stacks hierarchical 'deep learning' structures or runs 'SVM's, tasks that rely on nothing but computer calculation, and grinds through every combination a Python library offers. Sorry to put it this way, but such calculations are nothing more than a waste of computer resources. It also has to be pointed out that the crude reports of such researchers are still registered as academic papers.
Put all the crawled data into 'AI', and it will wave a magic wand?
Converting a web novel into a webtoon can be seen as turning a written storybook into an illustrated storybook. Professor Daeyoung Lee, Dean of the Graduate School of Arts at Chung-Ang University, explained that the move to OTT is, in the same sense, a move to the video storybook.
The reason this transition is not easy is that the conversion costs are high. Domestic webtoon studios employ teams of anywhere from five to several dozen designers, and the market has become differentiated to the point where even a small character image or pattern that looks simple to our eyes must be purchased before it can be used. After paying all the labor costs and the purchase costs for characters, patterns, and the like, turning a web novel into a webtoon still costs serious money.
It is the mindset of the typical 'business expert' to assume that, since investment money is at stake and new commercialization challenges must be taken on, manpower and funds will naturally be concentrated on the web novels that look most likely to succeed as webtoons.
However, the market does not run on the logic of capital alone, and 'plans' built on the logic of capital often go wrong precisely because they fail to read the market. In other words, even if you collect the view counts, comments, and purchase figures the platforms provide and build a model for the probability of webtoonization and the webtoon's subsequent success, it is unlikely to actually be right.
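To make the criticism concrete, the sketch below shows the kind of model this thinking produces. It is a minimal illustration only; the file name and the columns (views, comments, purchases, became_webtoon) are hypothetical, not fields from any real platform.

```python
# Minimal sketch of the naive approach: predict webtoonization from publicly
# visible platform metrics alone. The dataset and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("web_novels.csv")          # hypothetical crawled dataset
X = df[["views", "comments", "purchases"]]  # metrics the platform exposes
y = df["became_webtoon"]                    # 1 if a webtoon adaptation exists

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

Whatever number this prints, it only describes how the observed metrics co-move with past conversions; it says nothing about why those conversions happened.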
One thing to point out here is that although many of the errors come from market uncertainty, a significant number also come from the model itself being wrong.
Wrong data, wrong model
For those who simply assume 'deep learning' or 'artificial intelligence' will take care of everything, 'building the model wrong' means no more than picking a less suitable algorithm when one of the 'deep learning' algorithms is said to fit better, or, at worst, using a lesser artificial intelligence when a good one should have been used.
However, which 'deep learning' or 'artificial intelligence' fits well and which does not is a question of lower priority. What really matters is how accurately you capture the market structure hidden in the data, which means you must be able to verify that the model fits not just by luck on the data selected today, but consistently on the data that arrives later. Unfortunately, we have seen for a long time that most 'artificial intelligence' papers published in Korea deliberately pick and compare data from time points where the fit happens to look good, that professors' research capability is judged by nothing more than their count of K-SCI papers, and that, under the Ministry of Education's crude regulations on which journals count as good ones, proper verification is simply never carried out.
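A first, bare-minimum discipline is to validate out of time instead of on a convenient random split: fit the model on an earlier period and score it on a later one. The sketch below assumes the same hypothetical columns as before, plus a hypothetical contract_date field.

```python
# Out-of-time validation: train on an earlier period, evaluate on a later one,
# so a fit that only holds "by chance today" is exposed. Columns are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("web_novels.csv", parse_dates=["contract_date"])
train = df[df["contract_date"] < "2022-01-01"]
test = df[df["contract_date"] >= "2022-01-01"]

features = ["views", "comments", "purchases"]
model = LogisticRegression(max_iter=1000).fit(train[features], train["became_webtoon"])

auc_in = roc_auc_score(train["became_webtoon"],
                       model.predict_proba(train[features])[:, 1])
auc_out = roc_auc_score(test["became_webtoon"],
                        model.predict_proba(test[features])[:, 1])
print(f"in-period AUC: {auc_in:.2f}, out-of-period AUC: {auc_out:.2f}")
```

A large gap between the two numbers is the signature of a model tuned to the time points where it happened to match.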
The calculation known as 'deep learning' is simply one of the graph models, one that finds nonlinear patterns in a more computation-heavy way. For natural language, which has to follow grammar, or computer games, which have to follow rules, the error rate in the data itself is close to 0%, so there may be no great problem in using it. The webtoonization process above is not like that: the market leaves problems that never resolve cleanly, and the actual decision-making behind webtoonization is likely to look quite different from what an outsider sees.
Simply put, the barriers facing writers who already have a successful 'track record' are completely different from the barriers facing new writers. Kang Full, a writer who recently achieved great success with 'Moving', explained in an interview that he held the intellectual property rights of the webtoon from the beginning and made the major decisions himself during the transition to OTT. That is a situation ordinary web novel and webtoon writers cannot even imagine, because most web novel and webtoon platforms only let writers sell content on the platform through contracts under which the platform keeps the rights to secondary works.
In what share of cases can an author decide, by his or her own will, whether a webtoon or an OTT adaptation gets made? And if that share grows, what conclusion will the 'deep learning' model above produce?
The general public's way of thinking leaves no room for cases where webtoon and OTT adaptations happen at the author's will. The 'artificial intelligence' models above can only tell us what fraction of the 'logic of capital' operating inside the web novel and webtoon platforms is correct. Yet as soon as the share of cases driven by the 'author's will' rather than the 'logic of capital' rises, the model will estimate the effects of the variables we expected to matter as much weaker, and, conversely, the effects of variables we never expected will look stronger. In reality this happens simply because we failed to include an important variable, the 'author's will', that should have been in the model; and because we never even considered it, all we end up with is an absurd story under the absurd title of 'the webtoonization process as explained by artificial intelligence'.
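The mechanism is the textbook omitted-variable problem. The simulation below uses entirely made-up numbers to show how leaving out a driver that is correlated with the observed metrics (here standing in for the 'author's will') distorts the estimated effect of the market variable:

```python
# Simulated illustration of omitted-variable bias, with made-up coefficients:
# the outcome depends on an observed market metric AND on the author's will,
# and the author's will is itself correlated with the market metric.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
popularity = rng.normal(size=n)                      # observed market variable
author_will = 0.8 * popularity + rng.normal(size=n)  # unobserved, correlated driver
outcome = 1.0 * popularity + 2.0 * author_will + rng.normal(size=n)

# Model that includes both drivers recovers the true effect of popularity (~1.0)
X_full = np.column_stack([np.ones(n), popularity, author_will])
beta_full = np.linalg.lstsq(X_full, outcome, rcond=None)[0]

# Model that omits the author's will attributes its effect to popularity (~2.6)
X_naive = np.column_stack([np.ones(n), popularity])
beta_naive = np.linalg.lstsq(X_naive, outcome, rcond=None)[0]

print("popularity effect, author's will included:", round(beta_full[1], 2))
print("popularity effect, author's will omitted: ", round(beta_naive[1], 2))
```

Depending on the sign of the correlation, the omission can push the estimated effects of the expected variables up or down; either way, the coefficients stop describing the mechanism that actually generated the data.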
Before data collection, understand the market first
It has now been two months since the student first brought me that model. For those two months I have been asking her to understand the market properly and find the missing pieces of the webtoonization process.
From my experience with business, I have seen companies convinced they could take on an interesting challenge because they had enough data, yet unable to proceed because the 'Chairman's will' was missing. On the other hand, I have seen countless companies that were completely unprepared, without even the necessary manpower, push absurd project ideas 'because that is what the Chairman said', insist on proceeding 'as usual', hire only IT developers with no data science experts, and endlessly repeat the work of copying open-source libraries from overseas.
Considering the capital and the market conditions the webtoonization process requires, a significant share of webtoons are most likely written into web novel writers' new-work contracts as a 'bundle', included as a matter of course to attract already successful writers and generate profits. Writers who want the upper hand over the webtoon studio will tend to contract with a studio themselves, start serializing the webtoon once the first 100 or 300 episodes of the web novel are out, and only then sign with a webtoon platform. And writers who have already experienced how a webtoon boosts the web novel's own sales through the extra promotion sometimes treat the webtoon product as just another promotional strategy for selling their intellectual property (IP) at a higher price.
To the general public this 'author's will' may look like an exception, but once such cases account for even 30% of web novels converted to webtoons, it becomes impossible to explain webtoonization with data collected under the conventional view. With market factors already making accuracy hard to raise, and with more than 30% of cases driven by variables like the 'author's will' rather than 'market logic', how could data collected under that conventional view ever yield a meaningful explanation?
Data science is not about learning 'deep learning' but about building an appropriate model
In the end, it comes back to the point I always make to students: understand the reality, and find a model that fits that reality. In plain English, that becomes 'find a model that fits the Data Generating Process (DGP)', and the explanatory model for webtoonization above does not take the DGP into consideration at all. If scholars had to sit through a presentation of it, complaints like 'who on earth selected the presenters?' would surface, and many would simply walk out even at the risk of being called rude, because such a presentation is itself disrespectful to the audience.
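One way to write down what is going wrong, purely as an illustration with notation introduced here (X for the observable market variables, W for the unobserved 'author's will'):

```latex
% True data generating process: conversion depends on market variables X
% and on the author's will W, which the crawled data never records.
\[
  \Pr(\text{webtoon}_i = 1) = f\!\left(\beta^\top X_i + \gamma W_i\right),
  \qquad W_i \not\perp X_i .
\]
% Model actually estimated from the crawled data: W is omitted, so the
% estimated coefficients no longer mean what the analyst thinks they mean.
\[
  \Pr(\text{webtoon}_i = 1) = f\!\left(\tilde{\beta}^\top X_i\right),
  \qquad \tilde{\beta} \neq \beta \ \text{whenever } \gamma \neq 0 \text{ and } W \text{ is correlated with } X .
\]
```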
To build a model that does take the DGP into account in this setting, you need a great deal of background knowledge about the web novel and webtoon markets. A model that ignores how web novel writers on the major platforms communicate with platform managers, what the market relationship between writers and platforms looks like, and to what extent and in what ways the government intervenes, and simply ingests material scraped from the Internet, is pointless; there is no value in merely 'putting data into' the models that appear in 'artificial intelligence' textbooks. If an understanding of the market could be drawn out of that data, it would be attractive data work, but, as I keep saying, unless the data takes the form of natural language that follows a grammar or a game that follows rules, it is nothing but a meaningless waste of computer resources.
I don't know whether that student will show up at next month's meeting with market research that demolishes my counterargument, whether she will revise the detailed structure of her model based on an understanding of the market, or, worse, whether she will change her topic altogether. What is certain is that a 'paper' that merely dumps collected data into a coding library and calls itself 'data' work ends up as nothing more than a tangle of code wrapped around its author's delusions, a 'novel filled with nothing but text'.
Microsoft Unveils Its First Custom, In-house Chips
Microsoft developed two new chips, including an AI accelerator, to better control the cost and performance of its Azure hardware stack.
At a Glance
Microsoft unveiled two new chips, one specially made for intense AI workloads, to better control its infrastructure stack.
Maia is a new line of AI accelerators while Cobalt is its Arm-based CPU meant for general purpose cloud workloads.
Microsoft said developing its own chips makes it easy to control the cost and performance of its cloud hardware stack.
Microsoft today unveiled its first two custom, in-house chips, including an AI accelerator designed specifically for large language models. The tech giant said developing its own chips would let it "offer more choice in price and performance for its customers."
At the company's Ignite event, CEO Satya Nadella showed off Maia, its first internally developed AI accelerator chip, and Cobalt, its first custom, in-house CPU meant for general purpose cloud workloads. Both chips are set to be available to customers in 2024.
Alan Priestley, vice president analyst at Gartner, said it makes sense for Microsoft to join other hyperscalers who have developed their own AI chips. "Deploying large scale infrastructure to host large language models like ChatGPT is expensive and hyperscalers like Microsoft can leverage their own custom-designed chips, optimized for these applications, to lower operational costs, reducing cost to consumers and businesses that want to use these large language models."
Maia, the AI accelerator
The Maia 100 AI Accelerator is designed to power internal AI workloads running on Azure. Microsoft enlisted the help of OpenAI, its strategic partner and maker of ChatGPT, to provide feedback on how its large language models would run on the new hardware.
Sam Altman, CEO of OpenAI, said in a blog post: "We were excited when Microsoft first shared their designs for the Maia chip, and we've worked together to refine and test it with our models."
Microsoft had to build racks specifically for the Maia 100 server boards. These racks (pictured below) are wider than what typically sits in the company's data centers. The company claims that the expanded design "provides ample space for both power and networking cables, essential for the unique demands of AI workloads."
Next to the Maia racks are "sidekicks" that supply cold liquid to cold plates that are attached to the surface of Maia 100 chips, to remove heat.
"We've designed Maia 100 as an end-to-end rack for AI," Nadella said at the event. "AI power demands require infrastructure that is dramatically different from other clouds. The compute workloads require a lot more cooling as well network density."
Microsoft is already working on the next generation of Maia AI chips. Pat Stemen, partner program manager on the Microsoft AHSI team, said in a blog post: "Microsoft innovation is going further down in the stack with this silicon work to ensure the future of our customers' workloads on Azure, prioritizing performance, power efficiency and cost."
Cobalt CPUs to power general purpose workloads
Cobalt CPUs are built on Arm architecture and are optimized for greater efficiency and performance in cloud native offerings. These chips are already powering servers inside Microsoft's data center in Quincy, Washington (pictured below). Each chip has 128 cores and is designed to use less energy.
The company is using Cobalt for general purpose compute workloads, like Microsoft Teams and SQL servers, but is also planning to expand its scope to virtual machine applications. At Ignite, Microsoft highlighted virtual machines from AMD that are optimized for AI workloads. The Azure ND MI300x v5 Virtual Machine features AMD's Instinct MI300X and is designed to support AI innovation for enterprises, including AI model training and generative inferencing.
The goal of making custom chips
Rani Borkar, corporate vice president for Azure Hardware Systems and Infrastructure (AHSI), said in a blog post that "the end goal is an Azure hardware system that offers maximum flexibility and can also be optimized for power, performance, sustainability or cost."
AI workloads can be expensive to run. Building its own custom chips lets Microsoft ensure they perform optimally on its most important workloads, testing different frequency, temperature and power conditions. "By controlling every facet, from the low-power ethos of the Cobalt 100 chip to the intricacies of data center cooling, Microsoft can orchestrate a harmonious interplay between each component," the company said.
Microsoft already builds its own servers and racks to drive down costs and give customers a "consistent" experience. Chips were the final missing piece. Prior to 2016, Microsoft had bought most layers of its cloud hardware off the shelf.