Class 6. Recurrent Neural Network

A Recurrent Neural Network (RNN) is a neural network model that applies the same process repeatedly under certain conditions. These conditions are often termed 'memory', and depending on how long the memory is kept and how heavily it is relied upon, there can be countless variations of RNN. However, whatever underlying data structure it is fitted to, the RNN model is simply a non-linear, multivariable extension of the Kalman filter.

Given that NN models are just an extension of Factor Analysis to non-linear, multivariable cases with a network structure, the step from the Kalman filter to the RNN follows the same logic. The Kalman filter updates the previous state's variables after each observation and its potential error. Say you predict a car to move from position A to B, but in reality you find the car at position D. The error, e = D - B, should be used to correct the model's next-stage prediction. Assume you give 50% weight to the error correction, because the error is not always that large. Then the updated model gives the expected position C = B + 0.5e. In other words, at every stage the previous stage's error helps the correction, so that we can hopefully minimize errors at later stages.
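A minimal sketch of this scalar update (the numbers and function name are illustrative, with the 0.5 weight standing in for the Kalman gain):

```python
# Scalar Kalman-style error correction, as in the car example above.
def corrected_prediction(predicted, observed, gain=0.5):
    """Blend a model prediction with an observation using a fixed gain."""
    error = observed - predicted          # e = D - B
    return predicted + gain * error       # C = B + 0.5 * e

B, D = 10.0, 14.0                         # predicted and observed positions
C = corrected_prediction(B, D)            # 12.0, halfway between prediction and observation
print(C)
```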

[Lecture slide: COM503 Class 6, p. 4]

We can then see that two aspects of the RNN are just another combination of traditional statistical models. From autoregressive processes we get the idea that memory is preserved. This does not mean that memory is preserved completely (as in LSTM, a variation of RNN), but an ARMA process does carry memory some distance into the future.

The error correction through the state variable is similar to the Kalman filter. The RNN uses the state variable to decide whether to keep the memory or skip it. When memory is kept, the model reflects a proportional amount of it, corrected by the new input, depending on the choice of weights. In the Kalman filter this is called weighted error correction, just as we ended up with C instead of B or D. In an RNN, it is simply feed forward and back propagation.

[Lecture slide: COM503 Class 6, p. 2]

The reason an RNN can outperform many time-series processes with memory is its power to fit non-linear and multivariable cases. Below is a comparison of the equational forms of the Kalman filter and the RNN, which illustrates the striking similarity in functional form, apart from non-linear transformations such as the activation function.
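As a rough sketch of that comparison in generic notation (these symbols are illustrative, not necessarily the lecture's), the Kalman filter's state update and a vanilla RNN's hidden-state update are

$$\hat{x}_t = A\hat{x}_{t-1} + B u_t + K_t\left(z_t - H(A\hat{x}_{t-1} + B u_t)\right), \qquad h_t = \tanh(W_h h_{t-1} + W_x x_t + b),$$

where the Kalman gain $K_t$ plays the role of the fixed 0.5 weight in the car example, while the RNN replaces the linear update with learned weight matrices and a non-linear activation.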

One can construct a non-linear Kalman filter and even include more than one variable, but that requires VAR (Vector AutoRegressive) models with a fixed functional form. The RNN, on the other hand, relies more on data. Though this dependence on data creates the same kinds of problems we have seen with other NN models, the additional computational cost for certain data processes can be decently compensated by the better fit and flexibility.

[Lecture slide: COM503 Class 6, p. 12]

Class 5. Image recognition

As shown by the autoencoder versions of the RBM, a well-designed neural network can generally perform better than PCA when it comes to finding hidden factors. This is why image recognition relies on neural networks.

Images are first converted to a matrix whose entries are RGB color values, (R, G, B); the matrix records the RGB value at each pixel coordinate. For a three-dimensional image you need a tensor of RGB entries. In the sample below, I took the average of the RGB entries for easier understanding. In fact, for a black-and-white image, a plain matrix is enough, because each (R, G, B) triple can be translated into a single value between 0 and 1.
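A minimal sketch of that conversion with NumPy, assuming a toy 2x2 RGB array (the values are illustrative):

```python
import numpy as np

# A toy 2x2 RGB image: shape (height, width, 3), values in [0, 255].
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=float)

# Average the three channels and rescale to a single 0~1 value per pixel,
# turning the (H, W, 3) tensor into an (H, W) matrix.
gray = rgb.mean(axis=-1) / 255.0
print(gray)   # roughly [[0.33, 0.33], [0.33, 1.0]]
```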

[Lecture slide: COM503 Class 5, p. 2]

When a modeler feeds image data to the model, it relies on sliding for various reasons. The slider is also known as a filter, and filters are widely used for photo processing. Social media images, for example, are frequently modified; the filter you find in a photo app or a social media app is basically the same as the slider we use in image recognition.

[Lecture slide: COM503 Class 5, p. 3]

Depending on your choice of filter, the output matrix differs. It may give you a black-and-white effect or sharpening, and there are thousands of different filters.

One of the key reasons we rely on the slider in image recognition is that, after sliding, the fed data becomes smaller. The higher the resolution of the image, the more data the neural network has to process. Given that the neural network is one of the most computationally expensive methods, reducing the image data size is strictly preferred. At the same time, one does not want to lose the important features of the data, just as with PCA. This is why the choice of filter is a key component of successful image recognition: for some images, filter A can perform far better than filter B.
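A minimal sketch of such a sliding filter in NumPy; the 3x3 sharpening kernel and the 6x6 toy image are illustrative choices, and the point to notice is that the output is smaller than the input:

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide a kernel over the image (no padding, stride 1) and return the filtered matrix."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

image = np.random.rand(6, 6)              # toy 6x6 grayscale image
print(slide_filter(image, sharpen).shape) # (4, 4): smaller than the input
```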

[Lecture slide: COM503 Class 5, p. 7]

Another technique we rely on for image recognition is convolution. With the slided, or scanned, image data, a fully connected neural network forces us to find a weight for every link; convolution avoids that overlap by sharing the same small set of kernel weights across positions, which further reduces computational cost. For example, fully connecting a 28x28 input to a 26x26 layer needs $784 \times 676$ separate weights, whereas a 3x3 convolution producing the same 26x26 output shares just 9 kernel weights.

[Lecture slide: COM503 Class 5, p. 14]

To help you see that a CNN-based image recognition model is still an extension of factor analysis, let's talk a little about Generative Adversarial Networks (GAN).

The model captures a couple of latent features of the image at the beginning. Just like the first stage of an autoencoder, the choice of latent nodes is key to building a winning model in accuracy and speed. From the latent components, the CNN's convolution helps us further speed up finding the weights. The GAN model itself is just another variation of MCMC sampling: you create a lot of images with some random error from the same latent space, and the error-imposed images are treated as fakes. Your discriminator is supposed to exclude the fakes, so it becomes a simple classification problem. With the artificially created fake image data, the model can learn from more data. As discussed in COM501: Scientific Programming, simulation can help us find the true population in that $I_M \rightarrow I$ as $M \rightarrow \infty$.
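A minimal sketch of that 'real versus fake' classification step, with a toy two-dimensional latent space and a plain logistic classifier standing in for the discriminator (all sizes and the use of scikit-learn are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Real" samples drawn from the latent space, and "fakes" made by imposing random error.
real = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
fake = real + rng.normal(loc=2.0, scale=1.0, size=(500, 2))

X = np.vstack([real, fake])
y = np.concatenate([np.ones(500), np.zeros(500)])   # 1 = real, 0 = fake

# The discriminator reduces to a simple binary classification problem.
disc = LogisticRegression().fit(X, y)
print(disc.score(X, y))
```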

Class 4. Boltzmann machine

Constructing an autoencoder model looks like an art, if not just computationally heavy work. A lot of untrained data engineers rely on coding libraries and a graphics card (one that supports 'AI' computation), hoping the computer will find an ideal neural network. As discussed in the previous section, that process is highly exposed to overfitting, local maxima, and enormous computational cost. There must be a more elegant, more reasonable, and more scientific way to do it.

Let's think about a multi-layered neural network model. Through the eyes of Factor Analysis, each layer is one round of factor analysis. Factor analysis is in fact a re-construction of the vector space, as was discussed with PCA. In sum, the multi-layered neural network is a series of re-constructions of the vector space. What does the re-construction do? PCA orthogonalizes the data's dimensions, and factor analysis in general changes the key axes of the vector space. In statistical terms, it is a transformation of one density function into another. Both processes preserve the data's hidden information: the vector space as a whole is the same, only the axes are different; the density functions differ, but the information in the data set stays the same.

[Lecture slide: COM503 Class 4, p. 2]

Since each node is a marginal density function and the combination of them on a layer is a joint density, moving from one layer to the next is a transformation of one joint density into another. If the nodes are independent, the joint density is no more than a multiplication of the marginals. Between two layers, however, we know the deep learning structure carries dependence. Assuming the first layer is the data input, the second layer depends on how much weight is assigned to each link, and depending on the structure and weights of the second layer, the third layer is affected in turn. Once the feed forward pass is done, back propagation is the opposite process. Though the chain rule spares us painstaking calculations at each step, the dependence between layers nonetheless remains.

This is where we need MCMC, or more specifically a Gibbs-sampling type of approximation. Note that Gibbs sampling assumes a distribution for the input data and predicts what the outcome's density will be. Once done, the outcome's density becomes the new input, and we use that information to re-construct the outcome, which is the original input. By running back and forth, the process is expected to converge. Although correlation between nodes can be a bothersome issue, over- or under-estimating key weights, such irregularity can be handled by Metropolis-Hastings type corrections. Gibbs-class sampling is, in short, a sampling process over two dependent groups instead of a single group. This Bayesian technique fits our autoencoder problem precisely.
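A minimal sketch of that back-and-forth sampling for a toy binary RBM, alternating between the visible and hidden groups (the sizes, sigmoid link, and random weights are illustrative assumptions, not the lecture's specification):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3

# Random weights and biases for a toy restricted Boltzmann machine.
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)
b_h = np.zeros(n_hidden)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

v = rng.integers(0, 2, size=n_visible).astype(float)   # start from a random visible vector
for _ in range(100):
    # Visible -> hidden: sample the hidden group given the visible group.
    h = (sigmoid(v @ W + b_h) > rng.random(n_hidden)).astype(float)
    # Hidden -> visible: re-construct the visible group given the hidden group.
    v = (sigmoid(h @ W.T + b_v) > rng.random(n_visible)).astype(float)

print(v, h)   # one joint sample after the back-and-forth Gibbs chain
```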

[Lecture slide: COM503 Class 4, p. 5]

Note that constructing a belief network for a rational model has to deal with multiple intrinsic problems (mentioned in the screen-captured lecture note above). All of them can be successfully handled by a Gibbs-type autoencoder.

The structure is known as 'Restricted Boltzmann Machine (RBM)'.

[Lecture slide: COM503 Class 4, p. 13]

As is illustrated in the above lecture note, the model can capture key hidden factor components better than PCA.

Class 3. MCMC and Bayesian extensions

Bayesian estimation tactics can be used to replace the arbitrary construction of a deep learning model's hidden layers. In a sense, it replicates Factor Analysis in the construction of every layer, except that now a change in one layer's values affects the other layers, and this propagates from one layer to all layers. What makes the job more demanding is that we are still unsure whether the next stage's number of nodes (or hidden factors) is right, precisely as we are unsure about the feeding layer's node count. In fact, everything here is uncertain and dependent on everything else. This is where Bayesian methods help us in a less arbitrary manner.

Let's think about what we learn in basic Bayesian courses. From a handful of sample data, we assume probable distributions A, B, and C as candidates for the entire population, and say we give 1/3 probability to each. Bayesians call this the 'Prior'. We then find more sample data, which acts like a new weight on the distributions; in the Bayesian world this is the 'Likelihood'. Combining the 'Prior' and the 'Likelihood', we obtain the 'Posterior'. With another set of sample data we can repeat the process, placing the 'Posterior' as the new 'Prior'. After doing this N times, at some point the 'Posterior' is hardly affected by the 'Likelihood' any more. The final 'Posterior' is the probability assignment we were looking for in the first place. The same process works with distributions A, B, C, and D, or with even more candidates that fit the sample data.
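A minimal sketch of that updating loop for three candidate coin-bias models (the candidate values and the simulated data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three candidate models for a coin's probability of heads.
candidates = np.array([0.3, 0.5, 0.7])
prior = np.array([1/3, 1/3, 1/3])            # equal prior weight on A, B, C

true_p = 0.7
for _ in range(50):                           # repeated rounds of new sample data
    x = rng.binomial(1, true_p)               # one new observation
    likelihood = candidates**x * (1 - candidates)**(1 - x)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    prior = posterior                         # yesterday's posterior is today's prior

print(prior)   # the mass concentrates on the candidate closest to the data
```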

[Lecture slide: COM503 Lecture Note 3, MCMC sampling]

The structure of Bayesian 'learning' is in fact similar to what we do with feed forward and back propagation over multiple loops. This is where Bayesian statistics meets deep learning.

MCMC (Monte-Carlo + Markov-Chain) simulation

The term MCMC is closely related to Bayesian model building: it creates plausibly simulated data and lets the model learn by itself.

The first MC, Monte Carlo, means simulation under a prior assumption about the data set's distribution. Recall the basics from COM501: Scientific Programming: by the LLN (Law of Large Numbers), the Monte Carlo estimate satisfies $I_M \rightarrow I$ as $M \rightarrow \infty$. In Bayesian terms, more data helps us closely approximate the true underlying distribution when the population density is unknown. Remember that we are unsure about the number of nodes in each layer of the autoencoder, so as long as we can construct a convergence path, the outcome will be the best-fitted model representing the data's hidden structure.
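A minimal sketch of that convergence, estimating $I = E[X^2]$ for a standard normal, whose true value is 1, by the sample average $I_M$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate I_M of I = E[X^2] for X ~ N(0, 1); the true value is 1.
for M in (100, 10_000, 1_000_000):
    draws = rng.normal(size=M)
    I_M = np.mean(draws**2)
    print(M, I_M)   # the estimate approaches 1 as M grows (LLN)
```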

The second MC, Markov-Chain, concerns each simulation's independence from the other simulated samples. In other words, we assume i.i.d. draws, which ensures unbiased convergence of the Monte Carlo process. This helps us when our data is itself i.i.d. For example, in image recognition each image's digit is independent of the others: it is not as if I will get the digit 6 on the next draw just because I have the digit 5 now. And when we feed images to the model, we have already preprocessed them with sliding windows, which will be revisited in the image recognition class.

Overall, MCMC simulation greatly helps us construct an ideal autoencoder model without arbitrarily experimenting with all possible combinations of the data. One may still rely on stepwise-type searching, but given the risk of overfitting, local maxima, and exponentially increasing computational cost, it is always wiser to rely on more scientific tools like MCMC.

In the lecture, Gibbs sampling, the best-known MCMC technique, is presented for a sample construction of the autoencoder. One can also rely on Metropolis-Hastings if tampering with the marginal distribution, for example by truncation, is necessary.
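As a minimal illustration of Gibbs sampling itself (a toy bivariate normal with correlation 0.8, not the lecture's autoencoder setting), each coordinate is drawn from its conditional distribution given the other:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                       # correlation of a toy bivariate standard normal
cond_sd = np.sqrt(1 - rho**2)   # sd of x | y and of y | x

x, y = 0.0, 0.0
samples = []
for _ in range(5000):
    x = rng.normal(rho * y, cond_sd)   # draw x from p(x | y)
    y = rng.normal(rho * x, cond_sd)   # draw y from p(y | x)
    samples.append((x, y))

samples = np.array(samples[500:])      # drop burn-in draws
print(np.corrcoef(samples.T)[0, 1])    # close to 0.8
```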

Class 1. Introduction to deep learning

As was discussed in [COM502] Machine Learning, the introduction to deep learning begins with the history of computational methods as early as 1943, when the concept of the neural network first emerged. Starting from the departure from regression toward graph models, the major building blocks of neural networks, such as the perceptron, the XOR problem, multi-layering, SVM, and pretraining, are briefly discussed.

As an introduction, logistic regression is revisited: whatever the input values are, the outcome is converted to a number between 0 and 1. To find the best fit, the binary loss function is introduced in the form below.

$\mathcal{L} (\beta_0, \beta_1) = \sum_{i} \left[ y_i \log{p_i} + (1-y_i) \log{(1-p_i)} \right]$

The function handles both the $y=0$ and $y=1$ cases: maximizing this log-likelihood (equivalently, minimizing its negative) gives the best fit. The limitation of logistic regression, and in fact of any regression-style functional form, is that without further assumptions about the non-linear shape, the estimation ends up linear in form. Even if one introduces non-linear features like $X^2$, it is still limited to a pre-specified functional form. The Support Vector Machine (SVM) departs from that limitation, but it still has to be pre-formatted.

The neural network partly solves this limitation in that it allows researchers to bypass equational assumptions and lets the function find the best fit to the data. Though it still depends on how the neural network is structured, which activation functions are used, and how much relevant data is fed to the network, it is a jump away from the functionally limited estimation strategies that had been predominant in computational methods.

In essence, the shape of the loss function is the same, but deep learning deals with the optimization by leveraging the chain rule in partial derivatives, thereby speeding up the computationally heavy calculation.
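A minimal sketch of that chain-rule bookkeeping for a single sigmoid unit under the binary loss above (one weight, one observation, illustrative values):

```python
import numpy as np

# One observation and one weight for a single sigmoid unit.
x, y, w = 2.0, 1.0, 0.3

z = w * x
p = 1.0 / (1.0 + np.exp(-z))              # forward pass: sigmoid output
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Backward pass: chain rule dL/dw = dL/dp * dp/dz * dz/dw.
dL_dp = -(y / p - (1 - y) / (1 - p))
dp_dz = p * (1 - p)
dz_dw = x
dL_dw = dL_dp * dp_dz * dz_dw             # simplifies to (p - y) * x

print(loss, dL_dw)
```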

[Lecture slides: COM503 Class 1, pp. 14-16]

With the benefit of decomposition and chained partial differentiation, back propagation achieves computational speed by needing only a single higher-order matrix (or tensor) operation, instead of feeding n x k columns one by one. An example is illustrated below.

[Lecture slide: COM503 Class 1, p. 19]

To further speed up the computation, one can modify the gradient approximation with varying weights, with disproportional learning rates along different directions, or even with a correction based on a dynamically changing second-moment (variance-covariance) structure. In fact, any optimization scheme is possible, depending on the data structure. In general, the Adam optimizer performs marginally faster than RMSProp, as it incorporates both the first and second moments. The advantage most likely disappears if the fed data set does not have dynamically changing variance, because then the second moment carries no more information than the base case.
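For reference, these are the standard Adam updates for a gradient $g_t$, which keep running estimates of both moments (RMSProp keeps only the second-moment term $v_t$):

$$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2, \qquad \theta_{t+1} = \theta_t - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon},$$

with bias corrections $\hat{m}_t = m_t/(1-\beta_1^t)$ and $\hat{v}_t = v_t/(1-\beta_2^t)$.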

[Lecture slide: COM503 Class 1, p. 23]

Class 2. Autoencoder

Feed forward and back propagation have significant advantages in speed of calculation and error correction, but that does not mean the errors can be eliminated. In fact, the error grows if the fed data pushes the model off the convergence path. The more layers there are, the more computational resources are required, and the more prone the model is to error mis-correction, due to the serial correction stages in every layer.

The reason we rely on the neural network model is that we need to find the nested non-linear structure of the data set, which cannot be done with other computational models such as SVM. Is there, then, a way to fix the mis-correction? Partly it can be done with drop-out, but one has to accept a loss of information, and the structure of the drop-out is itself arbitrary.

[Lecture slide: COM503 Class 2, p. 2]

What can be done instead is to build an autoencoder model that replicates Factor Analysis in network form. Recall from Machine Learning that Factor Analysis is a general form of Principal Component Analysis (PCA), though PCA is limited to linear combinations of variables when re-constructing the data's vector space from the variance-covariance matrix. Given that all computational models in one way or another re-structure the original data set into our desired form, PCA can help us find the key hidden components along the dimensions of variance; i.e., the target of all regression-based models, including the neural network, is to maximize explanatory power in terms of matching variability.

In the vector space of second moments, the combination of explanatory variables does not have to be linear. This is why we moved away from linear regression and have been exploring non-linear models. The same holds for hidden factors: the combination of variables that matches the hidden factors does not, again, have to be linear. In Factor Analysis we explored a bit how, depending on the data's underlying distribution, the factor combination can be non-linear.

[Lecture slide: COM503 Class 2, p. 13]

Here comes the benefit of the autoencoder: we can construct the neural network model with the concept of Factor Analysis. With the right number of hidden factors, the re-designed neural network not only becomes more robust to changes in the data set, but also less prone to error mis-correction and/or over-fitting. If one has more hidden factors than needed, the model is likely to overfit. The most frequent remedy is drop-out, but the result, as mentioned earlier, is not robust, and is sometimes no more than an improvisation for that specific learning run. Too small a number of hidden factors in a particular layer obviously results in insufficient learning.

To help you follow the autoencoder's logic of network building, let's talk about matching the digit shapes 0 to 9 in image recognition. With PCA, as discussed in Machine Learning, one tries to find PCs from the transformed image. Some PC may help us differentiate 0 from 3, but that particular PC is not the strongest direction we usually look for in PCA-based regressions (PCR). Together with other similar digits like 9, the upper-right parts of the images will give us one PC due to their commonality. In image recognition we need PCs from the common parts for rough guessing, and at the same time we need PCs from the uncommon parts to differentiate 0, 3, and 9. There could be over 100 PCs initially, but eventually we need 10 hidden factors to identify the digits 0 to 9.

The same thought experiment can be constructed with a rather large neural network. Your final layer will have 10 nodes, but at the beginning you may need over 100 nodes, just like PCA. The stack of hidden layers in the middle, coined the 'Encoder', should be carefully designed to pick up separable PCs at first and then exclude the PCs unnecessary for the end goal. As we witnessed with tree-based models, creating a non-linear function requires a number of different layers.

[Lecture slide: COM503 Class 2, p. 14]

If the 'Encoder' is designed well, then for any data that fits your preprocessing for the digit images 0 to 9, it should not be that difficult to unwind the process and recover the original data (or something very close to it). In short, it is a network version of Factor Analysis that combines 'from' and 'to'.
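A minimal sketch of such an encoder-decoder stack in Keras; the 784-pixel input and the layer sizes, narrowing to a 10-node bottleneck, are illustrative assumptions rather than the lecture's exact design:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: squeeze 28x28 = 784 pixels down to 10 hidden factors.
encoder = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="relu"),      # bottleneck: one node per digit-like factor
])

# Decoder: unwind the process and try to recover the original 784 pixels.
decoder = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(X_train, X_train, epochs=10)  # train to reproduce the inputs themselves
```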

This process is called an autoencoder, which can be used not only for image recognition but also for non-linear simulation such as GAN (Generative Adversarial Network).

To further optimize the painstaking encoder construction, we can borrow Bayesian estimation tactics.

AI/Data Science Lecture Notes

This document is a Korean translation of selected lecture notes from the Swiss Institute of Artificial Intelligence (SIAI), which operates under GIAI.

The original English text and the full version are available through the link below.

The notes translated below are excerpted from the undergraduate/pre-master's programs and the AI MBA program.

We thank 김광재 (MBA AI/BigData, 2023) and 전웅 (MBA AI/BigData, 2023), who took on the translation despite their busy studies.

