For the assignment, please turn the press release into a self-produced article, as described below. To help you understand the format, a follow-up article is also attached. Internally, a managing editor picks the 2-3 talking points that become subheadings, and separate staff handle editing and infographic design. The purpose of this assignment is to judge whether you can develop a coherent argument around each assigned talking point.
Article writing guide
Press release summary
Press release link: 'Half of new Disney+ subscribers choose the ad-supported plan' - ZDNet Korea
Lead-in: So this many people are now on the ad-supported plan. When the price falls, demand naturally rises, but cutting the price also hurts profitability, so Disney must be weighing its options. The shortfall has to be made up with advertising revenue, yet in an era when data-driven ad targeting has been made illegal, will the ads actually be profitable?
Pull each of our talking points into its own subheading, and turn the original press release into an article with a lead-in and 3-4 subheadings. Under each subheading, develop the argument over roughly three paragraphs to fill in what the press release leaves out. This is how we actually work.
Almost no applicant manages the core task of quickly reading and digesting the points we hand over and turning the press release into a polished article with the added information. The reasons fall into two groups:
1. not understanding the content, and
2. not being able to write in article form.
Most cannot write the article at all because they do not understand the content, and even with time and effort the understanding does not come, so this test was created to check whether you can understand the material quickly.
In addition, since the goal is an article rather than a blog post, whether you can write in a journalistic style is also under review.
The vast majority run into trouble at point 1 and leave the reader baffled, and recently applications from people who struggle with point 2 have also risen sharply. Please read a few articles from our outlets and pay extra attention to point 2.
Pace of actual work
Once the actual work begins, writers in the adjustment period typically need 3-4 hours per article, but the time gradually drops to under 2 hours. The fastest writers produce one article every 20-30 minutes.
We used to pay hourly, but since the system settled we pay per article. The base rate is 25,000 KRW per article; in practice, because we only publish articles that meet the quality bar, the effective rate is 30,000 KRW including a 5,000 KRW bonus.
Not the quality of teaching, but the way it operates
Easier admission and graduation bars are applied to online degrees
Studies show that higher quality draws more passion from students
Although much of the prejudice against online education faded during the COVID-19 period, the belief that online education is lower in quality than offline education remains strong. Teaching online myself, I find that the lecture content differs little between recording a video and lecturing in person, but there is a gap in communication with students, and unless a new video is produced each time, past material is harder to convey. That gap is where problems can arise.
On the other hand, I often hear that having videos is far better because students can replay the lectures. Since the course I teach is an artificial intelligence course built on mathematics and statistics, students who have forgotten or never learned the mathematical terminology and statistical theory tell me they replay the video several times and look up the related concepts in textbooks or through Google searches. The prejudice that online education is lower-level remains strong, but precisely because online lectures can be replayed, advanced concepts can be taught with more confidence, which is an advantage.
Is online inferior to offline?
While running a degree program online, I have wondered why the prejudice about the gap between offline and online persists. The conclusion I have reached from experience is that although the lecture content is the same, the way the programs are operated is different. How exactly?
The biggest difference is that, unlike offline universities, universities running online degree programs rarely establish fierce admission competition and often leave the door wide open. Online education is perceived as a supplement to a degree or a way to fill required credits; it is extremely rare for an online degree to be run at a difficulty that makes it feel like the demanding challenge of a professional degree.
Another difference lies in the interaction between professors and students, and among students themselves. While pursuing a graduate degree in a major overseas city such as London or Boston, the time and money needed to live there were a burden, but the bond with fellow students became very dense. That intimacy goes beyond knowing faces and connecting on social media: it grew out of shared experiences such as comparing exam questions, working through difficult material together, and resolving frustrating issues while writing a thesis. Experiences like these may be why offline education is seen as more valuable.
Korea's Open University and major overseas online universities also put considerable effort into building points of contact among students, for example by holding exams on-site instead of online or arranging study groups, precisely to address this problem of bonding and intimacy.
The final conclusion I drew from these cases is that what distinguishes offline from online universities is the difficulty of admission, the difficulty of the learning content, the effort required to keep up, and a similar level of understanding among enrolled students, none of which had been present at online universities so far.
Would making up for the gap with an online degree make a difference?
First of all, I raised the level of education beyond anything found at domestic universities. Most of the lecture content was based on what I and friends around me had studied at prestigious global universities, and the exam questions were set at a level that even students at those universities would find challenging. Many students from prestigious domestic universities, including holders of domestic master's and doctoral degrees, assumed it would be a light degree because it was an online university, and fled in shock. It even prompted community posts, and once word spread that an online university was operating this way, it caused quite a stir in the English-speaking community.
I have definitely gained the experience of seeing that when you raise the difficulty of the education, the tendency to take it lightly just because it is online largely disappears. So, can there still be a significant difference between online and offline in terms of student achievement?
The table above is excerpted from a study examining whether the test-score gap between students who took classes online and those who took them offline was significant. Our school has never run offline lectures, but we reached a similar conclusion from the difference in grades between students who frequently visited in person to ask questions and those who did not.
First, in (1) - OLS above, students who took classes online scored about 4.91 points lower than students who took classes offline. But this is a simple analysis that accounts for nothing: the students' levels may differ, and some students may simply not have studied hard. If students who take classes only online skip school out of laziness, their lack of passion for learning feeds directly into their test scores, yet the naive estimate does not account for this at all.
To address this, (2) - IV uses the distance between the offline classroom and each student's residence as an instrumental variable to strip out the confounding factor of laziness: the closer a student lives, the easier it is to attend offline. Even with this factor removed, online students' test scores were still 2.08 points lower. At this point one might conclude that online education lowers students' academic achievement.
However, a question arose: could we capture students' passion for studying with something beyond simple distance? Searching through candidate variables, I judged that the number of library visits could serve as an indicator of passion, since passionate students can be expected to visit the library more often. The specification in (3) - IV showed that students who attended the library diligently scored 0.91 points higher, and the score penalty attributed to online education shrank to only 0.56 points.
Another question then arises: how close is the library to the students' residences? Just as proximity to the offline classroom was used as a key variable, proximity to the library likely affects the number of library visits.
So in (4) - IV, the analysis was restricted to students assigned to dormitories by random draw. After confirming that for this group the distance to the classroom had no direct effect on test scores, the frequency of library visits within the group was used to recalculate the test-score gap attributable to taking courses online.
As shown in (5) - IV, with the distance variable fully removed, visiting the library raised test scores by 2.09 points, and taking online courses actually raised test scores by 6.09 points.
As the example shows, the naive analysis in (1) leads to the misleading conclusion that online lectures lower students' academic achievement, while the specification in (5), after untangling the relationships between the variables, shows that students who engaged carefully with the online lectures achieved higher levels.
This matches actual teaching experience: students who do not just watch a video lecture once, but watch it repeatedly and continuously look up related material, show higher academic achievement. In particular, students who replayed sections and paused dozens of times during playback performed more than 1% better than students who mainly skipped through the lecture. After removing the effects of variables such as study-group membership, the average score and score distribution of fellow study-group members, and academic background before entering the program, the video-viewing pattern accounted for not a marginal difference of a few points but a gap large enough to determine pass or fail.
Not because it is online, but because of differences in students’ attitudes and school management
The conclusion I can confidently draw from actual data and various studies is that there is no platform-based reason for online education to be valued below offline education. The difference arises because universities have run online programs as money-making lifelong-education centers, and because online education has been operated so lightly for decades, students approach it with prejudice.
In fact, by providing high-quality education and organizing the program so that students who did not study passionately naturally failed, the gap with offline programs shrank greatly, and the student's own passion emerged as the most important determinant of academic achievement.
Nevertheless, fully non-face-to-face education does little to strengthen the bond between professors and students, and without eye contact it is hard for professors to gauge individual students' progress. Asian students in particular rarely ask questions, so I have experienced how difficult it is to tell whether students are really following along when no questions come.
A supplementary system would likely include periodic quizzes and careful grading of assignments; if the online lecture is held live, calling on students by name and asking them questions is also a good idea.
Can a graduate degree program in artificial intelligence actually help increase wages?
By Keith Lee
Asian companies convert degrees into years of work experience
Without adding extra value, an AI degree does not help much with salary
'Dummification' of variables is required to avoid wrong conclusions
In every new group I join, I hide the fact that I studied up to the PhD level, but a moment always comes when I have no choice but to make a professional remark. When I end up revealing that my academic career is longer than most, the questions follow. People sense from a brief conversation that I am an educated guy, but the real question is whether the market actually values that more highly.
When I ask myself the same question, it seems that in Asia degrees are usually sold on 'name value' alone, while in the Western hemisphere employers run a fairly thorough evaluation of whether you have actually studied more, know more, and are therefore more useful in corporate work.
Typical Asian companies
I have met many Asian companies, but I have hardly ever seen one with a reasonable internal standard for measuring ability beyond counting years of schooling as years of work experience. Given that some degrees demand far more effort and skill than others, it is easy to see how such a rigid approach misrepresents true ability.
For degree education to actually raise wages, a decent evaluation model is required. Suppose we build a data-based model to determine whether an AI degree actually helps increase wages. Imagine, for example, a young company that has grown a bit and is now actively recruiting highly educated talent. It has a vague sense that salaries for these hires should be set at a different level from its existing staff, but only very superficial figures to tell it what that level should be.
Asian companies usually end up looking only at comparative information, such as what large corporations in the same industry pay. Rather than judging specifically what was studied during the degree and how it helps the company, salary is set by a simple split into Bachelor's, Master's, or PhD. Since most Asian universities have lower graduate-school standards, companies further split graduate degrees into US/Europe and Asia, create a salary table for each group, place employees into the table, and set salaries accordingly.
The salary structures I have seen at large Asian companies count a master's as 2 years and a doctorate as 5 years, applying the salary table as if those years had been worked at the company. For example, if a student who entered an integrated master's-doctoral program at Harvard University straight after graduating from an Asian university finishes after 6 hard years and joins an Asian company, the HR team counts the doctorate as 5 years, so the salary lands at the level of an employee with 5 years of experience. A graduate of a prestigious university may expect more through bonuses and the like, but since the 'salary table' structure at Asian companies has remained unchanged for decades, it is difficult for them to treat a PhD holder from a prestigious university any differently from an employee with 6 years of experience.
I get a lot of absurd questions asking whether one could find the answer simply by gathering 100 people with bachelor's, master's, and doctoral degrees, collecting their salaries, and running an 'artificial intelligence' analysis. If the case above is true, then no matter what calculation method is used, whether a compute-hungry recent method or simple linear regression, as long as salary is set by annualizing the degree, the analysis will never conclude that a degree program helps. Some PhD programs require more than 6 years of study, yet your salary at an Asian company will match that of an employee with 5 years of experience after a bachelor's.
Harmful effects of a simple salary calculation method
Now imagine a very smart person who understands this situation. Talent with exceptional capabilities is unlikely to settle for a table-determined salary, so such people may simply lose interest in the large company. Companies hunting for talent in major technology industries such as artificial intelligence and semiconductors therefore face deeper concerns about salary: they risk a hiring failure by bringing in people who merely hold a degree but lack skill.
In fact, research labs run by some passionate professors at Seoul National University operate in the Western style: students must write a decent dissertation to graduate, no matter how many years it takes. This draws heavy criticism from students who want jobs at Korean companies; you can find complaints about these professors on websites such as Dr. Kim's Net, which compiles evaluations of domestic researchers. Simple annualization is preventing the growth of proper researchers.
In the end, because of a salary structure created for convenience by Asian companies that lack the capacity for complex decisions, the people they hire are mainly those who finished a degree in the expected 2 or 5 years, with the quality of the thesis ignored.
A salary model where pay is based on competency
Let's step away from the frustrating Asian cases and assume degrees are earned through competency. We will build a data analysis in line with the Western standard, where a degree can serve as a genuine indicator of competency.
First, add a dummy variable for whether the person holds a degree as an explanatory variable. Next, the salary growth rate becomes another important variable, since growth rates may differ by degree. Lastly, to capture the interaction between the degree dummy and the growth rate, add a variable that multiplies the two. This last variable lets us distinguish salary growth without a degree from salary growth with one. To separate master's and doctoral degrees, set up two dummy variables and add each one multiplied by the salary growth rate.
What if you want to distinguish those with an AI-related degree from those without? Just add a dummy variable marking an AI-related degree, plus its product with the salary growth rate, in the same manner as above. Of course, this need not be limited to AI; the same pattern can be applied to many other distinctions.
One question arises here: every school has a different reputation, and its graduates' actual abilities probably differ too, so is there a way to distinguish them? Just as with the AI-degree condition above, simply add one more dummy variable. For example, you can create dummies for whether someone graduated from a top-5 university or whether the thesis was published in a high-quality journal.
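One way to sketch this specification, with entirely made-up data and hypothetical variable names (`master_d`, `phd_d`, and `years` standing in for the growth dimension), is an OLS on log salary with degree dummies plus degree-by-growth interaction terms:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000

degree = rng.choice(["bachelor", "master", "phd"], size=n)
df = pd.DataFrame({
    "years": rng.integers(0, 15, size=n).astype(float),  # growth/experience proxy
    "master_d": (degree == "master").astype(float),      # degree level dummies
    "phd_d": (degree == "phd").astype(float),
})

# Simulated log salary: each degree shifts both the level and the growth slope.
df["log_salary"] = (10.0 + 0.03 * df["years"]
                    + 0.10 * df["master_d"] + 0.02 * df["master_d"] * df["years"]
                    + 0.25 * df["phd_d"] + 0.05 * df["phd_d"] * df["years"]
                    + rng.normal(scale=0.05, size=n))

# Level dummies plus degree-by-growth interaction terms, as described above.
model = smf.ols("log_salary ~ years + master_d + phd_d"
                " + master_d:years + phd_d:years", data=df).fit()
print(model.params.round(3))
```

Because each degree gets its own dummy and its own interaction term, the master's and doctoral level premiums and growth premiums are all estimated separately rather than being forced onto one shared scale.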
If you use an 'artificial intelligence calculation method', is there really no need to create dummy variables?
The biggest reason the overseas-style salary model above is hard to apply in Asia is that the research methodology of advanced degree programs is only very rarely put into practice, and it is just as rare for its value to translate into company profits.
In the example above, if you run the analysis by simply designating a categorical variable without creating dummy variables, the computer code internally transforms the categories into dummies anyway; in machine learning this step is called 'one-hot encoding'. But if 'Bachelor's - Master's - Doctoral' is instead coded as '1-2-3' or '0-1-2', the model forces the doctoral salary weight to be exactly 1.5 times the master's weight (the 2:3 ratio) or exactly 2 times it (the 1:2 ratio), which is an error. Master's and doctoral degrees must be entered as independent dummy variables so that each salary effect is estimated separately. With the wrong coding, the '0-1-2' scheme can make the doctoral salary premium appear to be about half the master's effect, and the '1-2-3' scheme makes a similar mistake, mis-estimating the doctoral salary effect 50% or 67% away from its actual size.
Since 'artificial intelligence calculation methods' are essentially non-linear extensions of statistical regression analysis, it is very rare that you can skip the data preprocessing needed to separate the effects of each variable in a regression. The widely used data-function sets (libraries) in popular languages such as Python do not handle all of these cases, and depending on the data at hand they produce conclusions at the level of a non-specialist.
Even without naming specific media articles or the papers they cite, you have probably often seen claims that a degree program does not significantly raise salary. Whenever I read such papers, I check for basic errors like the ones above. Unfortunately, it is not easy to find papers in Asia that pay such meticulous attention to variable selection and transformation.
Drawing wrong conclusions from a poor grasp of variable selection, separation, and cleaning is not limited to Korean engineering graduates. While recruiting developers at Amazon, I once heard that the string length (bytes) of code posted on GitHub, one of the platforms where developers commonly share code, was used as one of the screening variables. Rather than a good measure of competency, I would call it at best a measure of how much care went into presentation.
Many engineering students openly admit that they simply copied and pasted code from similar cases found through Google searches to analyze their data. In parts of the IT industry, development done that way may cause no major problems. But as in the cases above, in areas where data transformation tailored to the research topic is essential, statistical knowledge at least at the undergraduate level is indispensable, so let's avoid situations where sophisticated data is collected only for flawed analysis to produce flawed conclusions.
Did Hongdae's hip culture attract young people? Or did young people create 'Hongdae style'?
The relationship between a commercial district and the concentration of consumers of a specific generation is mostly not a causal effect
Simultaneity often requires instrumental variables
Real cases also end up mis-specified due to endogeneity
In data-science projects, causality errors are a common issue. In quite a few cases the variable thought to be the cause is actually the result, and conversely the variable thought to be the result is the cause. In data science this error is called 'simultaneity'. The research on it began in econometrics, where it is counted among the three major sources of endogeneity, together with omitted variables (loss of important data) and measurement error (data inaccuracy).
As a real-life example, let me bring in the thesis of an MBA student at SIAI. Judging that the commercial area in front of Hongik University in Korea would have attracted young people in their 20s and 30s, the student hypothesized that by finding the main variables that attract young people, one could identify the variables that make up a commercial district where young people gather. If the assumptions hold, future analysts could easily borrow the model, and commercial-district analysis could serve not only small-store owners choosing locations but also areas such as promotional marketing for consumer-goods companies and street marketing for credit-card companies.
Simultaneity error
Unfortunately, however, it may not be the commercial area in front of Hongdae that attracts young people in their 20s and 30s, but the cluster of schools, Hongik University and nearby Yonsei University, Ewha Womans University, and Sogang University, along with the subway station, one of Seoul's transportation hubs. The commercial area in front of Hongdae, thought to be the cause, is actually the result, and the young people thought to be the result may be the cause. Under such simultaneity, regression analysis or the recently popular non-linear regression models (e.g. deep learning, tree models) will likely either exaggerate or understate the explanatory variables' influence.
Econometrics long ago introduced the concept of the 'instrumental variable' to handle such cases. It can be seen as a data-preprocessing task that removes the problematic parts, including tangled causal relationships, in any of the three major endogeneity situations. Data science, being a young field, has borrowed methodologies from neighboring disciplines, but since this one's starting point is economics, it remains unfamiliar to engineering majors.
In particular, people trained in methodologies that demand perfect accuracy, such as mathematics and statistics, often dismiss instruments as 'fake variables', but real-world data carries all kinds of errors and correlations, so the technique is unavoidable in research using real data.
From data preprocessing to instrumental variables
Returning to the commercial district in front of Hongik University, I asked the student: "Among the tangled causal relationships between the two, can you find a variable that is directly related to the endogenous variable (the relevance condition) but has no significant relationship with the other variable (the orthogonality condition)?" One can look for variables that affect the growth of the Hongdae commercial district but have no direct effect on the gathering of young people, or variables that directly affect the gathering of young people but are not directly related to the Hongdae commercial district.
First of all, the nearby universities play a decisive role in attracting young people in their 20s and 30s. The most direct way to test whether the universities helped the young population without being directly tied to the Hongdae commercial area would be to remove each school one by one and watch the youth density, but unfortunately the schools cannot be separated individually. A more reasonable instrumental variable is to consider how the Hongdae commercial district functioned during the COVID-19 period, when the number of students visiting the school area plummeted under non-face-to-face study.
It is also worth comparing the areas in front of Hongik University and Sinchon Station (one station to the east and another symbol of hipster culture) to distinguish the characteristics of the stores that make up each commercial district, despite their common traits such as transportation hubs and dense student crowds. Since the general perception is that the area in front of Hongdae is full of unique stores found nowhere else, the number of unique stores can also serve as a variable for separating the tangled causal relationships.
How does the actual calculation work?
The most frustrating habit I see from engineers is the calculation style of inserting every variable and all the data, with blind faith that 'artificial intelligence' will automatically find the answer. One such method, 'stepwise regression', repeatedly inserts and removes variables. Despite warnings from the statistical community that it must be used with caution, many engineers without proper statistical training reach for it; too often I have seen it applied haphazardly and without thought.
As pointed out above, when linear or non-linear regression is run without first eliminating the 'simultaneity error' that hides tangled causal relationships, the effects of variables are bound to be over- or understated. In such cases, data preprocessing must come first.
Data preprocessing with instrumental variables is called 'two-stage least squares (2SLS)' in data science. In the first stage, the tangled causal relationships are removed and reduced to a simple causal relationship; in the second stage, the familiar linear or non-linear regression is run.
In the first stage, the endogenous explanatory variable is regressed on one or more of the instrumental variables selected above. Returning to the Hongdae example, the number of young people is the explanatory variable we want to use, and we instrument it with university-related variables that are likely related to young people but not expected to relate directly to the commercial district. If you code the COVID-19 pandemic period as 1 and the period before it as 0 and regress the number of young people on that, you extract only the part of the young-people variable that is explained by the universities. Using this extracted variable, the relationship between the Hongdae commercial area and young people can be identified through a simple causal channel rather than the tangled one above.
Failure cases of actual companies in the field
Without the actual data it is hard to say anything definitive, but judging from the 'simultaneity error' cases I have seen, if all the data had simply been inserted without the 2SLS step and a linear or non-linear regression calculated, most of the weight would land on the simplistic conclusion that the commercial district in front of Hongdae expanded because there are many young people, while the other variables, monthly rents in nearby residential and commercial areas, the presence of unique stores, accessibility to subway and bus stops, and so on, would come out largely insignificant. The tangled interaction between the two would have absorbed the explanatory power that should have been assigned to the other variables.
Many engineering students who never received proper training in Korea claim a 'conclusion found by artificial intelligence' by throwing multiple variables into tree models and deep learning in the spirit of stepwise analysis, but the explanatory structure between the variables differs only in being linear or non-linear. The variables' explanatory power shifts somewhat, but the conclusion remains the same.
The case above matches almost exactly the mistake made when a credit-card company and a telecommunications company jointly analyzed the commercial district in the Mapo-gu area. An official who took part in the study used the expression 'gathering young people is the answer', but, as expected, showed no understanding of the need for 'instrumental variables'. He thought of data preprocessing as nothing more than discarding missing data.
In fact, the elements that make up not only Hongdae but all major commercial districts in Seoul are very complex. Young people mostly gather because the district's complex components combine into an attractive whole, and that answer cannot be found through simple 'artificial intelligence calculations' like the above. In pointing out the errors in the data analysis currently done in the market I singled out the 'simultaneity error', but the work also involves omitted-variable bias from missing important variables and attenuation bias from measurement error in the collected variables; it demands fairly advanced modeling that weighs all of these factors together.
We hope that students now receiving misguided machine learning, deep learning, and artificial intelligence education will learn the concepts above and become capable of rational, systematic modeling.
One-variable analysis can lead to serious errors, so you must always grasp the complex relationships among multiple variables. Data science is the study of models that find such complex relationships. Obsessing over a single variable is an outdated way of thinking; it must be upgraded for the era of big data.
Whether giving data science talks, correcting employees who come in with wrong conclusions, or lecturing externally, the point I always emphasize is: do not run a 'one-variable regression.'
To give the simplest examples: there are conclusions with a reversed causal relationship, such as 'whenever I buy a stock, it falls,' and hasty single-cause conclusions, such as 'women are paid less than men' or 'immigrants are paid less than native citizens.' The problem is not solved simply by applying a calculation method labeled 'artificial intelligence'; you must have a rational thinking structure that distinguishes cause from effect in order to avoid such errors.
Do heavy SNS users end up with lower wages?
Among recent examples, the common belief that heavy social media use causes salaries to fall continues to bother me. On the contrary, a company that uses SNS well can save on promotional costs, so the salaries of professional SNS marketers should if anything be higher; I cannot understand why a story that applies only to high-school seniors cramming for exams is being applied to the salaries of ordinary office workers.
Salary is influenced by many factors: one's own capabilities, the degree to which the company uses them, the value added produced through them, and pay levels in similar occupations. Leave all of those variables out and run a 'one-variable regression,' and you arrive at the hasty conclusion that anyone who wants a high-paying job should quit social media.
People may think, 'So does analyzing with artificial intelligence only lead to wrong conclusions?'
Is that really so? Below is a structured analysis of this illusion.
Problems with one-variable analysis
A total of five regression analyses were conducted, each adding one or two more of the variables listed on the left. The first variable is whether the person uses SNS; the second, whether the person is a woman who uses SNS; the third, whether the person is a woman; the fourth, age; the fifth, age squared; and the sixth, the number of SNS friends.
The first regression, column (1), is a representative example of the one-variable regression discussed above; its conclusion is that using SNS raises salary by 1%. Someone who saw that conclusion and recognized the problem with one-variable regression asked whether women, who use SNS relatively more, are simply paid less. Column (2) therefore distinguished women who use SNS from those who are not women and use SNS: the salary of non-women who use SNS was 1% higher and that of women who use SNS was 2% higher, while women's wages, conversely, were 18.2% lower.
Readers who have come this far may be thinking, 'As expected, discrimination against women is this severe in Korean society.' Others may want to separate whether salaries fell simply because the person was a woman, or because she used SNS.
That calculation is performed in column (3). Those who were not women but used SNS earned 13.8% more, women who used SNS earned only 1.5% more, and women's salaries were 13.5% lower. The conclusion: being a woman who uses SNS is a variable with little meaning, while receiving a lower salary for being a woman is highly significant.
At this point one may ask whether age is an important variable; when age was added in column (4), it turned out not to be significant. The square of age was included because people around me who wanted to study 'artificial intelligence' asked whether using an 'artificial intelligence' calculation would make a difference. Data such as SNS use and male/female are simple 0/1 indicators, so the result cannot change no matter which model is used; age, however, is not a 0/1 value, so its square was added to check for a nonlinear relationship between the explanatory variable and the outcome, 'artificial intelligence' calculations being, at bottom, calculations that extract nonlinear relationships as far as possible.
Even with the nonlinear term, age squared, added, it does not come out significant. In other words, age affects salary neither linearly nor nonlinearly.
Finally, when the number of SNS friends was added in column (5), the conclusion was that having many friends lowered salary by 5%, while simply using SNS had no effect.
Through this step-by-step calculation we can confirm that using SNS does not in itself reduce salary; what matters is using SNS so heavily that one invests more in online friendships, and even that accounts for only a 5% reduction overall. The bigger problem, in fact, is the other aspect of the employment relationship expressed by gender.
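The stepwise logic above can be illustrated with a minimal simulation; the true effects here are made-up numbers for illustration and do not reproduce the coefficients quoted in the text. A one-variable regression attributes to SNS use an effect that actually belongs to the omitted gender variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Illustrative simulation: women use SNS more often and are paid 15% less,
# but SNS use itself has no effect on wages.
female = rng.binomial(1, 0.5, n)
sns = rng.binomial(1, 0.3 + 0.4 * female)  # P(SNS) = 0.3 for men, 0.7 for women
log_wage = 3.0 - 0.15 * female + rng.normal(scale=0.3, size=n)

def fit(y, *cols):
    """OLS coefficients after the intercept, via least squares."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

# One-variable regression: SNS looks harmful because it proxies for gender.
b_sns, = fit(log_wage, sns)

# Adding the gender control: the spurious SNS effect vanishes.
b_sns_ctrl, b_female = fit(log_wage, sns, female)

print(f"SNS alone     : {b_sns:+.3f}")       # spuriously negative
print(f"SNS controlled: {b_sns_ctrl:+.3f}")  # near zero
print(f"female        : {b_female:+.3f}")    # near the true -0.15
```

The single-variable fit blames SNS for a wage gap it did not cause; adding the omitted control moves the explanatory power back where the simulation actually put it.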
Numerous one-variable analyses encountered in everyday life
When I meet a friend at an investment bank, I sometimes hear the expression, 'The U.S. Federal Reserve raised interest rates, so stock prices plummeted'; from a friend in the VC industry, 'The VC industry is struggling these days because fund-of-funds money has shrunk.'
On one hand this is true: the central bank's rate hikes and a reduced supply of policy funds do significantly affect stock prices and market contraction. On the other hand, the conversation never establishes how large the impact actually was, or whether the policy variables alone mattered while nothing else had any effect. Between friends this hardly matters, but if the same one-variable analysis is used among those who make policy decisions, it is no longer a simple problem: assume a simple causal relationship and prescribe a solution in a situation where numerous other factors must be weighed, and unexpected problems are bound to arise.
U.S. President Truman reportedly said he hoped someday to meet a one-armed economist, because the economists hired as his advisors would always offer one interpretation of an event 'on the one hand' while offering another interpretation and the policies it required 'on the other hand.'
From a data science perspective, President Truman was demanding a one-variable analysis, and the consulting economists were providing at least a two-variable analysis. Nor does this happen only with Truman: conversations with countless non-expert decision makers involve the same struggle of being asked for a one-variable answer while trying to convey the second variable as gently as possible. Each time I experience it, I wish the decision maker were sharp enough to weigh multiple variables, and I think that if I were the decision maker, knowing more would let me make more rational choices.
Risks of one-variable analysis
It was about two years ago. A new manager at a company we had done outsourced work for asked me to explain, one more time, the model we had previously delivered. It was a graph model based on network theory, explaining how the many words connected to a given keyword related to one another and how they were intertwined: a model useful for reading public opinion through keyword analysis and for helping a company or organization devise an appropriate marketing strategy.
The new manager listening to the explanation looked quite displeased and demanded to be told, in a single number, whether the evaluation of their main keyword was good or bad. I suggested an alternative: few words capture such likes and dislikes cleanly, but there is a variety of related words the manager could use to gauge the phenomenon, along with information identifying how those words relate to the key keyword, so those should be put to use.
He insisted to the end on his single number, so I explained that if we threw away all the related words and applied only dictionary lists of insults and words of praise, we would be using less than 5% of the total data, and that judging likes and dislikes from that sliver is a very crude calculation.
By that point I had already concluded that this person was looking for the one-armed economist and had no interest at all in data-based understanding, so I was eager to end the meeting and tidy up the situation. I was quite shocked to hear from a colleague that he had previously been in charge of data analysis at a very important organization.
Perhaps the work he had done for ten years was to hand superiors the output of a one-variable machine that compresses information into 'positive/negative.' Perhaps he understood that the positive/negative distinction was a crude dictionary-based analysis, yet was frustrated enough to insist that I reach the same conclusion. In the end I produced a simple pie chart from the dictionary's positive and negative words, but the fact that people who analyze one variable this way have spent ten years as data experts at major organizations seems to show the reality of the 'AI industry.' It was a painful experience. The world has changed a great deal in those ten years; I hope they can adapt to the changing times.
High accuracy with 'Yes/No' isn't always the best model
Keith Lee
With high variance, '0/1' targets hardly yield a decent model, let alone on a new set of data
What is marketed as 'interpretable' AI is no more than basic statistics
'AI' = 'advanced' = 'perfect' is nothing more than misperception, if not myth
Five years ago, not long after a simple 'artificial intelligence' tutorial spread through social media, one that used data on residential areas in Boston to predict house prices or monthly rents from information such as room size and number of rooms, an institution claiming to study AI seriously, with members from all kinds of data engineering and data analysis backgrounds, asked me to give a talk on online targeted-advertising models in data science.
I was shocked for a moment to learn that such a low-level presentation meeting was being sponsored by a large, well-known company. I had seen an SNS post saying that the data had been fed into various 'artificial intelligence' models and that the best-fitting one was a 'deep learning' model; the poster showed this off and boasted that the group had great skills.
Then as now, exercises such as feeding the models introduced in textbooks into the various calculation libraries Python provides and finding which one fits best are treated as a simple code-running task, not research. I was shocked then, but I have since seen similar papers not only from engineering researchers but also from medical researchers, and even from researchers in mass communication and sociology. It is one of the things that shows how shockingly most degree programs in data science are run.
Just because it fits ‘yes/no’ data well doesn’t necessarily mean it’s a good model
A calculation that matches dichotomous outcomes classified as 'yes/no' or '0/1' must undergo robustness verification: the question is whether the model fits repeatedly well on similar data, not merely how accurate it is on the given data.
In machine learning, this robustness check is performed by separating 'test data' from 'training data.' The method is not wrong, but it carries a limitation: it only works in cases where the similarity of the data keeps repeating.
To give an example, stock price data is the textbook case of data that loses similarity. Take the past year of data, use the first six months as training data, and apply the best-fitting model to the remaining months: even that best-fitting model will find it very hard to reach the same accuracy in the following year, or on earlier data. Professional researchers have a standing joke dismissing such calculations as meaningless, and it helps to see how pointless it is to hunt for a model that merely fits '0/1' well when the data's similarity does not keep repeating.
The information commonly used as an indicator of data similarity is periodicity, as used in the analysis of frequency data; in high-school mathematics, these are functions such as sine and cosine. Unless the data repeats itself periodically in a similar way, you should not expect a model to handle new external data well just because it distinguishes '0/1' well on the verification data.
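The point can be shown with a deliberately extreme sketch on synthetic data: the target is pure noise standing in for a series whose similarity does not repeat, so by construction nothing can generalize, yet a flexible model still looks accurate on the half it was fit to.

```python
import numpy as np

rng = np.random.default_rng(2)

# The target is pure noise, a stand-in for high-noise data whose
# "similarity" does not repeat; by construction nothing can generalize.
x = np.linspace(0, 3, 60)
y = rng.normal(size=60)
x_train, y_train = x[:30], y[:30]   # the "first six months"
x_test, y_test = x[30:], y[30:]     # the "remaining months"

# A very flexible model fit only on the training half.
coefs = np.polyfit(x_train, y_train, 15)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_train = r2(y_train, np.polyval(coefs, x_train))
r2_test = r2(y_test, np.polyval(coefs, x_test))

print(f"in-sample R^2    : {r2_train:.2f}")  # looks respectable
print(f"out-of-sample R^2: {r2_test:.2f}")   # collapses on the held-out half
```

Fitting '0/1' or any other target well in-sample says nothing once the data stops resembling what the model was trained on; here the held-out fit collapses because there was never any repeating structure to learn.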
Such low-repeatability data is called 'high-noise data' in data science, and instead of models such as deep learning, known as 'artificial intelligence' and carrying enormous computational cost, an ordinary linear regression model is used to explain the relationships in the data. In particular, if the data follows a distribution well known to researchers, such as the normal, Poisson, or beta distribution, a linear regression or similar formula-based model achieves high accuracy without paying the computational cost. This has been common sense in the statistics community since the 1930s, when the concept of regression analysis was established.
High- and low-variance data call for different calculation methods
The reason many engineering researchers in Korea do not know this, and mistakenly believe that an 'advanced' calculation called 'deep learning' yields better conclusions, is that the data used in engineering is mostly 'low-variance data' in the form of frequencies, so degree programs never teach how to handle high-variance data.
Moreover, because machine learning models are specialized for identifying the nonlinear structures that appear repeatedly in low-variance data, the question of generalization beyond '0/1' accuracy is dropped. For example, among the methods that appear in machine learning textbooks, none except 'logistic regression' can use the distribution-based verification methods the statistics community uses for model validation, because the variance of the model cannot be computed in the first place. Academics express this by saying that 'first moment' models cannot undergo 'second moment'-based verification; variance and covariance are the commonly known second moments.
Another big problem with such 'first moment'-based calculations is that they cannot give a reasonable explanation of the relationships between the variables.
The equation above is a simple regression built to determine how much college GPA (UGPA) is influenced by high school GPA (HGPA), CSAT scores (SAT), and attendance (SK). Setting aside the problems among the variables and assuming the equation was estimated reasonably, it shows that high school GPA influences undergraduate GPA by as much as 41.2%, while CSAT scores influence it by only 15%.
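The regression referred to here appeared as an image in the original; judging from the coefficients quoted in the text, it likely took roughly the following form, where the intercept and the SK coefficient are left symbolic because they are not quoted:

```latex
\widehat{\mathrm{UGPA}} = \beta_0 + 0.412\,\mathrm{HGPA} + 0.150\,\mathrm{SAT} + \beta_3\,\mathrm{SK}
```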
As a result, machine learning calculations based on the 'first moment' focus only on how well college grades are matched; checking how much influence each variable has requires additional model transformation, and sometimes must be given up entirely. Even the 'second moment'-based statistical verification that would confirm the accuracy of the calculation is impossible. Statistical verification based on the Student-t distribution learned in high school shows that the coefficients in the model above are both reasonable figures, but no comparable verification is possible for machine learning calculations.
Why the expression ‘interpretable artificial intelligence’ appears
You may have seen the expression 'interpretable artificial intelligence' appearing frequently in the media and in bookstores. The expression exists because machine learning models have the blind spot of delivering only 'first moment' values, which makes interpretation impossible: as seen in the example above, they cannot give answers as reliable as existing statistical methodology on questions such as how deep the relationship between variables runs, whether the estimated relationship can be trusted, and whether it will appear similarly in new data.
Going back to the data group, sponsored by a large company, that built a website under the title 'How much Boston house price data have you used?': if even one of them had known that machine learning models carry the problems above, could they have confidently said on social media that they had tried several models and found 'deep learning' the best among them, and emailed me claiming to be experts because they could run that much code?
As we all know, real estate prices are heavily influenced by government policy, as well as by the surrounding educational environment and transportation accessibility. This is not only true of Korea; from my experience living abroad, major overseas cities are not much different. To be specific to Korea, the apartment's brand seems to be an even more influential variable here.
House size, the number of rooms, and the like are meaningful only when other conditions are equal, and other important variables include whether the windows face south, southeast, or southwest, the floor-plan type, and so on. The Boston house price data circulating on the Internet at the time had lost all such core variables; it was simply example data for checking whether your code ran.
If you use artificial intelligence, wouldn't 99% or 100% accuracy be possible?
Another expression I often heard was, 'Even if statistics can't raise the accuracy, couldn't 99% or 100% be achieved with artificial intelligence?' Presumably the 'artificial intelligence' the questioner had in mind was what is generally known as 'deep learning' or 'neural network' models of the same family.
First of all, the explanatory power of the simple regression above is 45.8%; you can check that its R-squared value is 0.458. The question was whether some other 'complex' 'artificial intelligence' model could raise this to 99% or 100%. The calculation determines how strongly the change in monthly rents near a school is related to population change, change in income per household, and change in the proportion of students. Knowing, as explained above, that real estate prices are affected by countless variables including government policy, education, and transportation, the only surefire way to fit the model with 100% accuracy is to 'predict' monthly rent with monthly rent itself. Finding X by inserting X is something anyone can do.
Beyond that, it is common sense that the numerous variables affecting monthly rent decisions cannot be matched perfectly in any simple way, so no further explanation is needed. The domain where 99% or 100% accuracy can even be attempted is not social science data but data that repeatedly produces standardized results in the laboratory, or, in the terms used above, 'low-variance data.' Typical examples are language data that must follow a grammar, image data with bizarre pictures excluded, and rule-based strategy games like Go. Although it is obviously impossible to match 99% or 100% of the high-variance data we encounter in daily life, at one time the baseline requirements for every artificial intelligence project commissioned by the government were 'must use deep learning' and 'must achieve 100% accuracy.'
Returning to the equation above, the student population growth rate and the overall population growth rate do not significantly affect the monthly rent growth rate, while income growth has a very large effect on rent growth, as much as 50%. Moreover, when the overall population growth rate is checked against the Student-t distribution learned in high school, the statistic is only about 1.65, so the hypothesis that the coefficient is no different from 0 cannot be rejected: it is a statistically insignificant variable. The student population growth rate, by contrast, is different from 0 and thus significant, but its actual effect on the rent growth rate is very small, only 0.56%.
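The t-statistic logic used above can be sketched on synthetic data; the coefficients here are illustrative, not the rent study's. The statistic is the estimated coefficient divided by its standard error, and a value below roughly 1.96 means the hypothesis of a zero coefficient cannot be rejected at the 5% level.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

# Illustrative "growth rate" regressors: income matters, population does not.
income_growth = rng.normal(size=n)
pop_growth = rng.normal(size=n)
rent_growth = 0.5 * income_growth + rng.normal(size=n)

# Classical OLS: coefficients, residual variance, standard errors, t-statistics.
X = np.column_stack([np.ones(n), income_growth, pop_growth])
beta, *_ = np.linalg.lstsq(X, rent_growth, rcond=None)
resid = rent_growth - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])   # unbiased residual variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the estimates (a second moment)
t_stats = beta / np.sqrt(np.diag(cov_beta))

print(f"income: beta={beta[1]:+.3f}, t={t_stats[1]:+.1f}")  # large |t|: significant
print(f"pop   : beta={beta[2]:+.3f}, t={t_stats[2]:+.1f}")  # small |t|: cannot reject 0
```

In practice a package such as statsmodels reports these t-values directly; the point of the sketch is that they come from the second moment, which is exactly what pure point-prediction models do not supply.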
Interpretation of this kind is, in principle, impossible with the 'artificial intelligence' calculations known as 'deep learning'; a comparable analysis requires enormous computational costs and advanced data science methods. Nor does paying that computational cost mean the explanatory power of 45.8% can be raised much: since the data has already been converted to logarithms and focuses only on rates of change, the nonlinear relationships in the data are already internalized in the simple regression model.
Because of a misunderstanding of the model known as 'deep learning,' industry has made the shameful mistake of paying a very high learning cost and pouring manpower and resources into the wrong research. Through the simple regression-based example above, we hope readers will recognize the limits of the calculation method known as 'artificial intelligence' and avoid repeating the mistakes researchers have made over the past six years.
[email protected]
세상에 알려야 할 수많은 이야기 가운데 독자와 소통할 수 있는 소식을 전하겠습니다. 정보는 물론 재미와 인사이트까지 골고루 갖춘 균형 잡힌 기사로 전달하겠습니다.
입력
수정
Seven out of ten small and medium-sized enterprises (SMEs) were found to have plans to hire new staff this year. Compared with last year's survey, the share of companies considering hiring fell slightly, but the average number of planned hires actually rose. Demand for labor was highest in manufacturing production jobs, suggesting that polarization is emerging within the SME labor market since the pandemic.
Companies with hiring plans down 5.6 percentage points from last year
On the 14th, the Korea Federation of SMEs announced the results of its '2023 Hiring Trends Survey,' conducted in April among 1,031 SMEs listed on its 'Truly Decent SMEs' platform. 71.0% of respondents said they plan to hire new staff. Compared with last year's survey, in which 76.6% reported hiring plans, the share of companies considering hiring has fallen slightly.
Average hires per company, however, moved in the opposite direction. This year's planned hiring averages 6.6 people, 2.3 more than last year's 4.3. More companies said they would expand hiring (27.4%) than shrink it (9.7%), while 62.9% said it would be similar to last year.
Meanwhile, 37.6% of companies with hiring plans preferred experienced workers, and a high 41.4% said they require no particular qualifications. In a survey on this year's staffing, a majority (55.7%) of SMEs said their staffing level is adequate. The ratio of current to needed staff averaged 90.9%, up 8 percentage points from a year earlier, and the share of companies that had filled 100% or more of needed positions also rose to 49.9% from last year's 29.3%. This is read as a gradual recovery of employment as COVID-19 quarantine measures eased.
Polarization in the labor market recovering from the pandemic
Although the labor market as a whole is recovering, polarization is deepening within the SME sector, with larger companies notably more likely to hire new staff.
By company size, firms with 300 or more employees were the most likely to have hiring plans, at 82.6%, followed by firms with 100-299 employees (82.6%), 50-99 employees (74.4%), 10-49 employees (67.4%), and fewer than 10 employees (52.6%). This contrasts with last year's survey, in which the share of companies with new hiring plans exceeded 70% regardless of size.
By job category, production jobs had the highest share of hiring plans at 44.7%, suggesting active hiring centered on manufacturing, which had sharply cut employment during the pandemic. R&D and production management (32.8%), other roles (20.8%), and domestic/overseas sales and marketing (20.1%) followed.
Some analysts also attribute the expansion of SME hiring to youth-employment support policies from central and local governments. The government overhauled its youth support programs this year and expanded its youth jobs support schemes; representative examples include the 'Youth Challenge Support Program,' which pays job-seeking allowances and incentives, and the 'Youth Job Leap Incentive,' designed to encourage companies to hire more young people.
Youth unemployment improving, but mostly in precarious jobs
Youth unemployment has improved markedly this year. According to Statistics Korea on the 2nd of last month, the unemployment rate for those aged 15-29 in the first quarter was 6.7% (279,000 unemployed out of an economically active youth population of 4.17 million). This is the lowest first-quarter figure since June 1999, and it has improved every quarter since 2021, during the COVID-19 period.
Despite this improvement, however, the employment stability of young people is actually falling. By industry, 'accommodation and food services' accounted for a large share of the increase in young workers in the first quarter: youth employment in the sector rose by 90,000, from 553,000 in March last year to 643,000 this March. By contrast, manufacturing and wholesale/retail, considered relatively good jobs, fell by 50,000 and 76,000, respectively, from March last year.
Contract durations also show deteriorating job quality. In March this year, young regular workers with contracts of one year or longer (2,493,000) fell by 45,000 from a year earlier, while young temporary workers with contracts of one month to under a year (1,068,000) and young day laborers with contracts under one month (138,000) rose by 13,000 and 10,000, respectively.
Furthermore, the number of young people 'just resting,' who are not counted in unemployment statistics, is surging. In the first quarter, 455,000 young people described their status as 'resting,' up 5.1% year on year and a record first-quarter high. An official at the Korea Labor Institute emphasized, 'In the past, the resting population was dominated by older people who had retired or had health problems, but now the share of young people is rising sharply. More active support is needed for employment policy and for policies to improve the quality of youth jobs.'
한세호, reporter
[email protected]
세상에 알려야 할 수많은 이야기 가운데 독자와 소통할 수 있는 소식을 전하겠습니다. 정보는 물론 재미와 인사이트까지 골고루 갖춘 균형 잡힌 기사로 전달하겠습니다.
The Economy Korea news portal is the Korean umbrella service for Financial, Tech, Bio, and Policy Economy. The global headquarters, The Economy, is an AI/Data Science-based economic analysis institution; the Global Institute of Artificial Intelligence (GIAI) and the global education journal EduTimes handle the research arm and the media operations, respectively.
Research activities include economic policy analysis, sector-by-sector corporate rankings, and applied AI/Data Science research; the media outlets, which began for publicity purposes, are conducting research to improve the accuracy of translating English content into other languages.
The Korean edition is operated by GIAI's Korean subsidiary (GIAI Korea, https://kr.giai.org) under content and technology partnership with the global service.
Articles at the media outlets we operate in Korea are written as follows.
1. Securing basic sources
A reporter could go out and gather news, but these days press releases are distributed in most cases. Most press releases, however, show only what their authors want shown. Here is an example taken from the government's policy briefing site.
2. Questioning the press release
This is a press release boasting that the Korean venture industry has created as many as 22 unicorn companies, but among Korea's unicorns there are very few startups genuinely treated as unicorns by the market thanks to real technology, or to having succeeded at challenges others would not attempt.
Press release summary
ㄴYesterday (the 9th), the Ministry of SMEs and Startups released a status report saying there are 22 unicorn companies, but there is no substance to it at all. Let's take it apart.
Talking Point 1. The companies on the list are highly controversial
ㄴYellow Mobile is effectively a failed company; rumor has it that its former CEO, Lee Sang-hyuk, is hiding out somewhere on Jeju Island
ㄴTmon was sold off cheap to Qoo10 last September for roughly 200 billion won, and not even in cash but through a share swap
ㄴSocar supposedly graduated via IPO, but at yesterday's share price its market cap is a mere 702.6 billion won, far from the 1-trillion-won club that defines a unicorn
ㄴOasis, slated to list early this year, is currently valued at 698.9 billion won on Seoul Exchange's unlisted market. Let's search the companies mentioned on Seoul Exchange (unlisted) and add some screenshots; I have added one for Oasis for now
3. Rather than going around promoting such overvalued companies, the Ministry of SMEs and Startups should be helping improve the market through structural reform so that rational valuations prevail. There is endless talk of labor reform and government reform, yet reform of the startup industry is just as essential.
ㄴhttps://www.sedaily.com/NewsView/1Z451UBMWF Weak share prices after listing are nothing new: Kakao group affiliates, Krafton, Socar, and so on. The Ministry ought to be leading the creation of market regulations that keep retail investors from being toyed with by false valuations; instead it is out promoting fake valuations.
When necessary, images must be produced as well. Of course, the writer is not expected to do the image work directly; a design staff member is assigned.
We request an appropriate image from the design team as above. If all goes well, the result is an article with a properly produced image, as below.
6. Additional editing
However carefully an article is written, minor problems can arise: typos, image issues, and so on. The editing team then takes over. Moreover, when there is a problem with the facts, we even run a 'fact check.'
Notes from our hiring experience
Extracting Talking Points like these and pulling together the related articles and explanations is, admittedly, tedious; it feels like dashing off a term paper in school. But shouldn't anyone who managed to graduate from university be able to do that much research on their own? Extraction takes five minutes if done quickly, about twenty if done thoroughly, and twenty minutes is in fact enough time for a professional journalist or a brokerage research analyst to write an article or report. It is hard not to wonder whether we really need to explain this kindly what to write, at the cost of salary and Talking Point-extraction time. Nevertheless, believing that people can only be developed if someone sets the direction, which materials to consult for a given event and how to refine one's thinking, we let go of the staff who at best could write fiction rather than articles and changed how the Korean subsidiary operates.
Having let the ordinary reporters go and changed the article-writing system, we recruited again, thinking, 'Surely anyone can manage this much.' Unfortunately, it was not easy to find even people who could turn requests at this level into a normal newspaper article.
(As of December 2022) While taking in 88 applications, we asked applicants to write in their applications the names of the media outlets we operate; more than half got them wrong. One man in his sixties even called to say he did not understand what the question meant, a man claiming 20 years of experience as a reporter. He must have spent years seeing outlet names at the bottom of every homepage... if you cannot even look up the names of the outlets run by the company you are applying to, how do you expect to do the job?
When we emailed out the Talking Points extracted as above, 13 people actually wrote and submitted articles. Most were shockingly crude in sentence construction, but conceding that some might learn with teaching, we kept 5 of them. We gave them two days to read the training materials: a PDF manual, a OneNote link viewable on the web, and access to our internal board where announcements and staff conversations can be browsed, telling them to read through and prepare at a steady pace.
From the very first day of work, the editing team suddenly erupted, saying they were drowning in articles to edit. Wondering how people who could not construct basic sentences had passed the document screening, we let them go one by one; within a week exactly 3 remained.
Even in ordinary times, the share of people who can turn the Talking Points we provide into comfortable reading within a reasonable time has been only about the 3/88 = 3.41% above. This is the current state of those in Korean society who say they want to make a living by writing. The articles we produce amount to, at most, a one-or-two-page humanities elective term paper; it is hard to understand planning to earn money with words while being unable to manage even this.
Among those who failed were people from well-known major domestic newspapers. One former reporter from a certain paper even asked whether we were a brokerage research house rather than a newspaper. It turns out most domestic newspapers do no such research: they loosely copy the press releases companies send over and, where that falls short, keep a 'beat' at the company and write up what they hear over the phone. That is how the 'reporters' at Korean newspapers have been working.
Reporting on foot? If you cannot even report properly by Googling, how can you call yourself a reporter?
어떤 조직의 구성원이라는 사실이 자랑스러우려면 그 조직이 역량 측면에서 글로벌 최상위권 조직이어야 할 겁니다. 역량 측면에서 글로벌 최상위권 조직이라는 인정을 받으려면 만들어내는 상품이 글로벌 최상위권 수준이어야 합니다. 지식 상품으로 글로벌 최상위권 상품을 만들어 내는 방법은 크게 2가지 입니다. 노벨상을 도전해볼만한 연구 논문처럼 천재들만 도전할 수 있고, 천재가 아니면 기적이 일어나야 고급 논문을 쓰는 방식이 그 중 하나입니다. 다른 하나는 매우 뛰어나지는 않지만 열정과 능력을 갖춘 인재들이 자신들만의 강점을 협업과 분업으로 결합해서 1명의 천재가 만들어낸 것과 유사한 수준의 고급 콘텐츠를 만들어내는 것입니다. 협업과 분업으로 노벨상은 버겁겠지만, 기업의 고급 제품을 만들어내는 것 정도는 충분히 가능하다는 것이 이미 산업화가 시작된 1700년대부터 인류에게 상식이 되어 있습니다.
Work on the level of a liberal-arts term paper gives no reason to aim for a world-class product, and no reason to bring in a genius. We are an organization that runs on the second method, raising the quality of content through cooperation and division of labor. We extract detailed article directions under the name Talking Points, we have a design team for graphics, we run editing, and we even run fact-checking. We built an IT system that lets writers work comfortably, and the polish of our website design is quite high. Compare our website's scores on Google PageSpeed (https://pagespeed.web.dev) with the homepages of Naver and Daum, Korea's top IT companies, and you can see for yourself how far we have raised the quality of our site.
After several years of trial and error, we have completed a production process capable of mass-producing a polished product called the 'article.' The remaining blank is the work of tying that support together and writing the 'premium article' itself. We want to work with people who have the capability to fill that blank and produce premium articles, and who want to feel the thrill of accomplishment in the process.
(Added July 2024) We posted openings for in-house article writing and for translating external specialist content. Over one week we received 33 applications in total, but only five submitted the assignment that the posting explicitly required. The posting stated that the translation work would be difficult for anyone without considerable expertise in financial journalism, yet applicants were plentiful and assignments were almost nonexistent. Writing articles seemed hard and translation seemed easy, apparently, but the real lesson is that they did not read the posting carefully.
Those who submitted the assignment were, by that fact alone, already several steps ahead, and we wanted to hire them if at all possible. But setting comprehension aside, every assignment we received was awkward even at the level of basic Korean sentences. Some postings added a few screening questions and stated that we would only review applications that answered them properly; even so, many applicants simply tossed in their applications without reading. On postings where we cannot insert that intermediate step of screening questions, the assignment-submission rate drops below one in ten.
Three behavioral patterns can be seen in the above:
1. A great many applicants read only the title of the posting and none of the details.
2. Those who did read did not read carefully, so they almost certainly never even tried reading one of our articles to gauge the difficulty.
3. They were confident enough to ignore a posting that said assignments, however laborious, would not be reviewed without proper answers to the screening questions.
Of the five who submitted assignments, about one met the minimum requirement of not angering the reader. Even this person will need a great deal of study to write articles, and watching them browse our internal systems to prepare ahead of the final review stage, I began to think it would not be easy. We have clearly documented where and how to find and verify things, yet you can see them hitting the brakes again and again because they cannot find what they need. Writing articles means reading and understanding many texts quickly and pulling the key information out of them on the spot. Will this person survive?
In web design, a site must be built so simply that it can be understood intuitively, and a great deal of thought goes into minimizing QA, the work of tracing back the incidents where a 'fool' comes along and makes a mistake. Even with such design we still spend heavily on QA, but we exclude people who simply do not read, do not look, and do not listen. It is the same logic as refusing to debate people who do not listen. If someone wants to earn money by writing, cannot write at a professional author's level, and does not read either, can they really grow? There should be no need to invoke the three 'manys' of good writing, read much, write much, ponder much, whose first step is reading many good sentences.
Former reporters complained that we were forcing them to write articles only a top-tier journalist could write, and they left the company. We have no intention of building an organization that writes the same articles as everyone else; rather, knowing the organization cannot grow that way, we deliberated at length over how to raise article quality and built the current division-of-labor system. The Talking Points may need to come from talent in the top 0.01% of the country, but the writers who turn them into prose are not people with dazzling credentials. Even so, when we meet people from domestic companies, we often hear the backhanded compliments 'you seem like a research institute' and 'your staff must be extremely high-caliber.' Before the division of labor we were treated as a third-rate tabloid; seeing those same people's attitudes turn 180 degrees, we feel we have finally taken a first step. Our global headquarters is an AI research institute and an economics research institute, so there is also a sigh of relief that we have at least kept up appearances.
In English-language markets too, when a posting requires applicants to find a specific word, sentence, or expression somewhere on our website, fewer than one in ten applicants from India or the Arab world submit the answer. One might object that English is not their mother tongue, but among countries where English is not the mother tongue, applicants from the Philippines, Taiwan, and several African countries submit the correct answer almost without exception. The level of education in reading and comprehending text differs by country, and that difference is reflected in applicant behavior.
Korean pay levels are four to five times higher than in the countries mentioned above, in some cases more than ten times higher. With the division-of-labor system in place, the difficulty of the work is also on the low side. The global team apparently judged our Korean setup an efficient system and is working hard to benchmark it. Yet when we explain that the realities of Korea forced us to compromise on hiring and operations this way, they are quite surprised. They had assumed Korea was the most education-obsessed country in the global market, the country with the most talent per capita. Once you reflect on how large the gap is between their assumptions and the attitude you bring as applicants, I think it will go a long way toward helping you find a workplace that matches your standards, whether at our company or elsewhere.
To add a bit more background: universities are required to hold both educational assets and revenue-generating assets. Educational assets are things like the buildings, athletic fields, and libraries used to teach students. Revenue-generating assets exist because foundations are told to cover their operating costs from those assets rather than from student tuition, since tuition may be spent only on educating students. The problem is that almost no Korean universities hold revenue-generating assets at the level the Ministry of Education requires.
For four-year universities the requirement is 30 billion won, and how many actually hold that much? Even then, most of what they do hold is something like a mountainside slope in a remote valley, assets with no real value. Inje University has no reason to keep holding an unprofitable hospital on land in the middle of downtown Seoul, so the plan is to wind it down like that and convert it into a revenue-generating asset (though I cannot say what the real intent is...)
2. The Seoul Metropolitan Government says it will designate the site as an urban planning facility
- This is from the Seoul city press release issued just now (see the downloaded file)
- As the release makes clear, to block conversion into a revenue-generating asset, the city intends to forcibly designate the site as a public health and sanitation facility, one category of urban planning facility (https://m.blog.naver.com/seog11111/221373718301)
- Once designated, there is no way out: the site must remain a hospital and cannot be used for any other business (our company also owns land in the provinces, sob)
- In effect: 'So Paik Hospital wants to shut down? Oh really? Tough luck!'