32nd IEEE International Conference on Computer Design_Best Paper Award

Awardee
Jaehyeong Sim, Jun-Seok Park, Seungwook Paek, Lee-Sup Kim
Paper Title
Timing Error Masking by Exploiting Operand Value Locality in SIMD Architecture
Abstract
A significant amount of energy is consumed by a voltage guardband to ensure error-free operations under the worsening PVT variations in modern processors. Circuit-level timing speculation has become a popular approach that increases energy efficiency by removing such a guardband and tolerating occasional timing errors. However, SIMD processors suffer from a large throughput and energy efficiency loss induced by a conventional error correction mechanism, which requires several extra cycles for each timing error. In this paper, we present an error masking scheme to eliminate the chances of performing the error correction. The error masking is done by allowing potentially erroneous addition instructions to reuse the partial results of previous operations. We show that reuse can be applied to a large number of addition instructions by exploiting the observations that SIMD applications exhibit high levels of temporal operand value locality and operand value locality across SIMD lanes. Our implementation of the proposed masking scheme is augmented with conventional pipeline logic. Simulation results verify that our scheme achieves up to 5.1% improvement in energy efficiency and 30% improvement in EDP (Energy-Delay-Product) over the baseline design.
 

2014 IDEC SoC Congress Chip Design Contest_Best Design Award

Awardee
Yong-Hun Kim, Young-Ju Kim, Lee-Sup Kim
Paper Title
A 21Gb/s, 1.63pJ/bit Adaptive CTLE and 1-tap DFE with Single Loop Spectrum Balancing Method in 65nm CMOS
Abstract
.
 

2014 IEEE International Symposium on High-Performance Computer Architecture

Awardee
Wongyu Shin, Jeongmin Yang, Jungwhan Choi, Lee-Sup Kim
Paper Title
NUAT: A Non-Uniform Access Time Memory Controller
Abstract
With the rapid development of microprocessors, off-chip memory access has become a system bottleneck. DRAM, the main memory in most computers, has for decades concentrated only on capacity and bandwidth to achieve high-performance computing. However, DRAM access latency should also be considered to keep up the development trend in the multi-core era. Therefore, we propose NUAT, a new memory controller that focuses on reducing memory access latency without any modification of the existing DRAM structure. We exploit only an intrinsic DRAM phenomenon: electric charge variation in DRAM cell capacitors. Given the cost-sensitive DRAM market, this is a big advantage in terms of actual implementation. NUAT gives a score to every memory access request, and the request with the highest score obtains priority. For scoring, we introduce two new concepts: Partitioned Bank Rotation (PBR) and PBR Page Mode (PPM). First, PBR is a mechanism that derives access-speed information from refresh timing and position; a request with a faster access speed gains a higher score. Second, PPM selects the better page mode between open- and close-page modes based on the information from PBR. Evaluations show that NUAT decreases memory access latency significantly in various environments.
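The score-then-prioritize idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula, so the rule below (a more recently refreshed row scores higher) is only an assumption, and the names `Request`, `cycles_since_refresh`, `score`, and `pick_next` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """A pending memory request (hypothetical fields, for illustration only)."""
    name: str
    cycles_since_refresh: int  # time since the target row was last refreshed


def score(req: Request, refresh_interval: int) -> float:
    """Assumed scoring rule: a more recently refreshed row scores higher.

    Returns a value in [0, 1]; 1.0 means the row was just refreshed.
    """
    elapsed = min(req.cycles_since_refresh, refresh_interval)
    return 1.0 - elapsed / refresh_interval


def pick_next(queue: List[Request], refresh_interval: int) -> Request:
    """Return the request with the highest score (the first one wins ties)."""
    return max(queue, key=lambda r: score(r, refresh_interval))


queue = [Request("A", 900), Request("B", 100), Request("C", 500)]
chosen = pick_next(queue, refresh_interval=1000)
print(chosen.name)
```

In this toy queue, request B targets the most recently refreshed row and is therefore served first; a real controller would also have to respect DRAM timing constraints and the page-mode decision, which this sketch ignores.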
 

Undergraduate Research Participation(URP) Program_Encouragement Award

Awardee
Minhye Kim, Young-Ju Kim, Lee-Sup Kim
Paper Title
Fibonacci Codes for Crosstalk Avoidance
Abstract
.
 

2013 International SoC Design Conference

Awardee
Young-Ju Kim, Lee-Sup Kim
Paper Title
A 12Gb/s 0.92mW/Gb/s Forwarded Clock Receiver based on ILO with 60MHz Jitter Tracking Bandwidth Variation Using Duty Cycle Adjuster in 65nm CMOS
Abstract
This paper presents a quarter-rate forwarded clock (FC) receiver based on an injection-locked oscillator (ILO), which exploits the phenomenon that the phases of the output clock are shifted by the duty cycle of the injection clock. To utilize this phase-shifting phenomenon, a simple duty cycle adjuster (DCA) is proposed. By using the DCA, the proposed FC receiver achieves a wide jitter tracking bandwidth (JTB) of 760MHz while consuming 11mW. Furthermore, it has only 60MHz of JTB variation, which is reduced by 74% compared to the conventional ILO in spite of clock deskew. The test chip achieves a 12Gb/s data rate with 0.92mW/Gb/s in a 1V 65nm CMOS process.
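The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that the 11mW figure refers to the whole receiver running at 12Gb/s; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
data_rate_gbps = 12.0           # Gb/s
efficiency_mw_per_gbps = 0.92   # mW/Gb/s

# Power = efficiency x data rate; should be close to the quoted 11mW.
power_mw = efficiency_mw_per_gbps * data_rate_gbps

# Unit identity: 1 mW/(Gb/s) = 1e-3 W / 1e9 bit/s = 1e-12 J/bit = 1 pJ/bit.
energy_per_bit_pj = efficiency_mw_per_gbps

print(f"power = {power_mw:.2f} mW, energy = {energy_per_bit_pj:.2f} pJ/bit")
```

The product comes out near 11mW, which agrees with the power stated in the abstract, and the same efficiency can be read as 0.92pJ/bit.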
 

2012 International SoC Design Conference

Awardee
Yong-Hun Kim, Lee-Sup Kim
Paper Title
An 8Gb/s 1-tap Feed Forward Equalizer and 1-tap Decision Feed Forward Equalizer in 65nm CMOS
Abstract
This work presents an equalization unit for 8Gb/s serial links that compensates for -23.5dB attenuation. It consumes 33mW from a 1V supply with a BER of 10^-12 for an 8% UI horizontal eye opening in the output data after 1:4 demux.
 

2011 International SoC Design Conference

Awardee
Seungwook Paek, Young-Jun Kim, Lee-Sup Kim
Paper Title
Homogeneous Stream Processors and Embedded Special Function Units for Mobile Applications
Abstract
Recent mobile devices tend to embed a high-performance graphics processing unit (GPU). Unlike desktop GPUs, mobile GPUs are integrated in an application processor (AP). This imposes a tight area constraint, since many other functional blocks such as the central processing unit (CPU), memory controllers, and several application-specific blocks are integrated together in a single chip. Since a sufficient number of cores cannot be used, it is important to increase the utilization of the limited processing resources. In the conventional architecture, known as single instruction, multiple data (SIMD) with a 4-way configuration, a large portion of instructions do not use the whole data path. These instructions cause a large amount of idle time in the processing elements and result in severe performance degradation. To solve these problems, we present a GPU architecture for mobile applications with homogeneous stream processors (SP) and embedded special function units (SFU).
 

18th Korean Conference on Semiconductors Chip Design Contest_Best Design Award_Special Award Category, SSCS Chapter Award

Awardee
Won-Young Lee, Lee-Sup Kim
Paper Title
5.4Gb/s Clock and Data Recovery Circuit for DisplayPort version 1.2
Abstract
A dual-mode binary phase detector enables multi-rate operation of the CDR circuit. The recovered 1.35 GHz clock shows a peak-to-peak jitter of 29.9 ps and an rms jitter of 3.215 ps for a 5.4 Gb/s input. The power consumption is 147.6 mW at 5.4 Gb/s from a 1.2 V supply.
 

3rd Dongbu HiTek IP Design Contest_Best Design Award

Awardee
Won-Young Lee, Kyu-Dong Hwang, Lee-Sup Kim
Paper Title
5.4Gb/s Transceiver for DisplayPort version 1.2
Abstract
This paper presents a 5.4Gb/s transceiver for DisplayPort version 1.2. The proposed transceiver consists of an adaptive pre-emphasis circuit and a clock and data recovery (CDR) circuit using the seamless loop transition scheme. The adaptive pre-emphasis effectively compensates for channel loss across various channel lengths. The CDR circuit achieves a phase noise reduction of 22.5dBc/Hz at 10MHz offset. A test chip is manufactured using a 0.13um CMOS process.
 

17th Korean Conference on Semiconductors Chip Design Contest_Best Design Award

Awardee
Won-Young Lee, Lee-Sup Kim
Paper Title
An Adaptive Equalizer for Display Interface
Abstract
An adaptive equalizer with an active source degeneration capacitor has been implemented. The proposed equalizing filter consists of MIM capacitors and a sub-amplifier. The equalizer satisfies the specification of DisplayPort version 1.1a. The prototype chip has been tested using an FR4 PCB trace. The core area is 286×380 μm² and the power consumption is 22.3mW at 2.7Gb/s and 1.8V.
 

2nd Dongbu HiTek IP Design Contest_Best Design Award

Awardee
Seok-Hoon Kim, Hong-Yun Kim, Young-Jun Kim, Kyusik Chung, Lee-Sup Kim
Paper Title
A 116fps 74mW 3D Display Processor with adaptive power management for mobile application
Abstract
A 3D display processor is designed that supports all 3D display contents by combining a 3D display IP with a stereo video decoder and a 3D graphics IP. For mobile environments, adaptive power management saves up to 165mW of power. The proposed modulo operators synthesize 3D images at a maximum of 116fps, 17 times faster than a previous work, with adaptive quality configuration. An IEEE 754 compliant floating point vector unit reduces critical latency by 30% compared to previous works.
 

11th Humantech Paper Award_Bronze Prize

Awardee
Chiyeon Kim, Lee-Sup Kim
Paper Title
A Low Power Hybrid Adder using Single-stage Multiplexer Circuits
Abstract
This paper presents a hybrid adder that simultaneously achieves high-performance operation, low power consumption, and small area. A new conditional sum architecture (CSA) composed of single-stage multiplexer circuits is proposed. This architecture can reduce the load capacitance in the critical path. With the proposed CSA, the proposed hybrid adder achieves low power consumption without sacrificing performance. To compare the proposed hybrid adder with a conventional hybrid adder, the two adders are simulated using 0.18um CMOS process parameters. The proposed hybrid adder results in about 4% less maximum delay, about 19% less average power consumption, about 7% less layout area, and about 9% less transistor count than the conventional hybrid adder. In addition, to verify the operation and measure the critical path delay of the two adders, they are fabricated in a 0.18um 1-poly 6-metal CMOS process.
 

IEEE SSCS/EDS Seoul Chapter Paper Award_Top Excellence Award

Awardee
Changhyo Yu, Lee-Sup Kim
Paper Title
An Adaptive Spatial Depth Filter for 3D Rendering IP
Abstract
In this paper, we present a new method for early depth testing in a 3D rendering engine. We add a filter stage to the rasterizer in the 3D rendering engine, in an attempt to identify and avoid the occluded pixels. This filtering block determines if a pixel is hidden by a certain plane. If a pixel is hidden by the plane, it can be removed. The simulation results show that the filter reduces the number of pixels passed to the next stage by up to 71.7%. As a result, 67% of memory bandwidth is saved with simple extra hardware.
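The filtering step can be sketched as follows. The abstract does not say how the filter plane is chosen or adapted, so this sketch assumes a single constant-depth plane that is already known to be covered by opaque geometry, and a convention where smaller depth means closer to the viewer; `Pixel`, `depth_filter`, and `plane_depth` are hypothetical names.

```python
from typing import List, Tuple

# (x, y, depth); a smaller depth means closer to the viewer (assumed convention).
Pixel = Tuple[int, int, float]


def depth_filter(pixels: List[Pixel], plane_depth: float) -> List[Pixel]:
    """Keep only the pixels that are not behind the filter plane.

    Pixels strictly behind the plane are treated as occluded and dropped
    before they reach the later stages of the pipeline.
    """
    return [p for p in pixels if p[2] <= plane_depth]


pixels = [(0, 0, 0.2), (1, 0, 0.9), (2, 0, 0.5), (3, 0, 0.7)]
survivors = depth_filter(pixels, plane_depth=0.5)
culled_fraction = 1 - len(survivors) / len(pixels)
print(survivors, culled_fraction)
```

Every pixel dropped here skips the later depth-buffer read and write, which is where the memory bandwidth saving comes from; the ratio in this toy input is arbitrary and unrelated to the reported results.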
 

9th Humantech Paper Award_Bronze Prize

Awardee
Kyusik Chung, Lee-Sup Kim
Paper Title
A Study on an Adaptive Tessellation Technique for PN Triangles Using a Modified Bresenham Algorithm
Abstract
Reducing the required memory bandwidth is a main issue in 3D computer graphics. The PN triangle solves the memory bandwidth problem by using curved surface representation and tessellation. It reconstructs a smooth and detailed 3D model from a blocky one on graphics hardware and thereby reduces the bandwidth consumption required for data transmission. However, the existing PN triangle hardware tessellates a curved surface according to a user-defined, fixed Level Of Detail (LOD), so redundant geometric operations can be executed. In this paper, we introduce an adaptive LOD concept into the PN triangle and propose several schemes for implementing it and reducing visual artifacts. Simulation results show a reduced operation count and improved visual quality. Additionally, we propose a hardware architecture for a PN triangle generation unit using adaptive LOD. The hardware cost required for the PN triangle generation unit is not a significant overhead for the overall 3D graphics hardware.
 

2001 IDEC Conference_Excellent Paper Award

Awardee
Byungdo Yang, Lee-Sup Kim
Paper Title
High-Speed and Low-Swing On-Chip Bus Interface Using Threshold Voltage Swing Driver and Dual Sense Amplifier Receiver
Abstract
A new high-speed and low-swing on-chip bus interface using a Threshold Voltage swing (Vt) driver and a Dual Sense Amplifier (DSA) receiver is proposed. The Vt-driver reduces the rise time on the bus to 30% of that of the full CMOS inverter, and the DSA-receiver doubles the throughput of conventional reduced-swing buses using sense amplifiers. With the Vt-driver and DSA-receiver combined, approximately 60% speed improvement and 75% power reduction are achieved in the proposed scheme compared to the conventional full CMOS inverter for the on-chip bus interface.
 

6th Humantech Paper Award_Silver Prize

Awardee
Jinaeon Lee, Lee-Sup Kim
Paper Title
Implementation of a Single-Pass Antialiased Rasterization Processor
Abstract
Antialiasing is one of the challenging problems to be solved for high-fidelity image synthesis in 3D graphics. In this paper, a rasterization processor capable of single-pass full-screen antialiasing is presented. To implement a H/W-accelerated single-pass antialiased rasterization processor at a reasonable H/W cost and with minimal degradation of processing performance, our work focuses mainly on the efficient H/W implementation of a modified version of the A-buffer algorithm. For the efficient handling of partial pixels, a partial-pixel-merging scheme and a simple and efficient new dynamic memory management scheme are proposed. For the final blending of partial pixels without loss of generality, a parallel subpixel blender is introduced. To study the feasibility of the proposed rasterization processor as a practical rasterization processor, a prototype processor has been designed using a 0.35um EML technology. It operates at 100MHz@3.3V and has a rendering performance of 25M to 80M pixel-fragments/sec, depending on the scene complexity.