Introduction to UCIe™ Webinar: Q&A Recap

May 3, 2023

UCIe Consortium Chair Dr. Debendra Das Sharma recently presented a webinar exploring the industry demands and developments that resulted in the need for a UCIe specification. He also shared ways in which end-users can easily mix and match chiplet components developed by a multi-vendor ecosystem for System-on-Chip (SoC) construction — including customized SoC.

The webinar recording is now available for public viewing on our YouTube channel and can be downloaded from the UCIe Consortium website. The following are responses to some of the compelling questions we were unable to address during the live webinar. We hope you find this information helpful! If you have additional questions you’d like to pose to UCIe representatives, please let us know and we will get in touch.

General Questions

Q: Is the UCIe specification available for download?

A: Yes, the UCIe 1.0 specification is available for download by request here.

Q: Where can I watch a recording of this webinar?

A: The “Introduction to UCIe” webinar recording is available via YouTube or for download on the UCIe Consortium website.

Q: Is the webinar slide deck available for the attendees?

A: Yes, the webinar presentation can be viewed here.

Industry Verticals

Q: Many SoCs for cellular infrastructure require not just low but deterministic latency between when a bit is transmitted from one chiplet across the interface to another chiplet and when it is effectively transmitted from the radio. The system must know, to very tight tolerances, when a bit reaches the analog transmitter. Does UCIe plan to include any provisions for this?

A: UCIe as an interface can be designed to have deterministic latency across the stack.

Q: Does UCIe support the automotive market, or will it be the case in future revisions?

A: UCIe’s membership is comprehensive and is exploring all verticals where solutions are needed. Stay tuned for more on how our members plan to approach each market.

Compliance Testing

Q: As most of the interface pins for the chiplet cannot be probed any more (mechanically and electrically), how is test access envisioned? Will this be up to the customer (sacrificial pads at probe)? How will testing be conducted in the assembled (packaged) state? Will it need to be persistent and routed through the carrier/interposer, too?

A: We have a set of well-defined configuration register based access mechanisms that can be accessed through other ports for test and debug purposes. We are working on additional testability and debug enhancements to the existing specification. Stay tuned for more details to emerge.

Q: What methods are expected for validating at the electrical level?

A: The compliance methodology has tests for this. The initial link training reports the margin. We also have mechanisms such as CRC error logging and periodic parity injection to assess link health during normal operation. We are working on some enhancements in this area that we will be publishing shortly.

Power Requirements

Q: Does UCIe need additional power aside from VCCIO and VCCAON?

A: It's implementation specific. As an interface, no additional power is needed. It is possible for the same power rail to be used for other purposes in the chiplets. In some implementations, a lower voltage can be used for the last-stage driver to save power.

Q: Does the power efficiency include all FEC?

A: The power efficiency does include the D2D adapter, which encompasses CRC and retry. Given that the BER is 10^-15 (or 10^-27), no FEC is needed. This is similar to external interconnects such as PCIe, CXL, and USB, where FEC was deployed only when the BER degraded to 10^-6 or worse.
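
As a rough illustration of why CRC and retry can be sufficient at these error rates, the sketch below estimates the mean time between bit errors on a link. The x16 width and 16 GT/s rate are example values, the function name is hypothetical, and the model assumes independent errors with one bit per lane per transfer.

```python
def mean_seconds_between_bit_errors(ber, lanes, gt_per_s):
    """Estimate the mean time between bit errors on a fully loaded link.

    Assumes independent bit errors and one bit per lane per transfer.
    """
    bits_per_second = lanes * gt_per_s * 1e9
    return 1.0 / (ber * bits_per_second)


# Example values only: a x16 module running at 16 GT/s.
for ber in (1e-15, 1e-27):
    seconds = mean_seconds_between_bit_errors(ber, lanes=16, gt_per_s=16)
    print(f"BER {ber:g}: about {seconds:.3g} s between bit errors")
```

Under these assumptions, a BER of 10^-15 works out to roughly one bit error per hour of continuous traffic, which an occasional retry can absorb.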

Q: What is the function of VCCFWDIO?

A: VCCFWDIO is used for forwarding the supply voltage from the Transmitter chiplet to the Receiving chiplet when the PHYs are configured in tightly coupled mode. This, along with a larger RX eye mask requirement, is expected to allow for lower power operation.

Q: Will the UCIe specification evolve to cover other voltage domains besides VCCIO and VCCAON?

A: At this point, we do not have any plans though we may revisit that if the need arises in the future.

Physical Questions

Q: Which interface is used to configure DIE2DIE and PHY Layer registers?

A: These are accessed through FDI and RDI. They can also be accessed through sideband if the device provisions for it.

Q: Why do we need a sideband at PHY layer?

A: Sideband functionality is critical for any open standard interconnect. The sideband is used for link training, debugging, manageability, and firmware download/upgrades. We have provided a very high-performance interface in an always-on voltage domain. All external interconnects, for example PCIe, have sideband. Lack of a specified sideband has caused these to have proprietary solutions that we want to avoid in UCIe. In the bump map, we've depopulated VSS pads for sideband so there is no shoreline penalty. Most people will route these signals in the backside package layers, so no critical routing issues are expected. Our compelling key performance indicators (bandwidth density, power, etc.) do include the sideband.

Q: What is the number of spare lanes? How was this number decided?

A: We have 2 spare lanes per 32 data lanes and a spare lane for clock, track, and valid in the advanced package. The determination comes from the industry’s rich experience in deploying different advanced packaging solutions in volume products.

Q: If I need only 2.5GB/s (PCIe Gen1), do I use the UCIe standard x16(64 GB/s) – is this the lowest available?

A: It looks like the question meant Gb/s rather than GB/s. The lowest configuration is a x16 module at the lowest frequency of 4G, which provides 64 Gb/s/direction (or 8 GB/s/direction). If one desires to have 64 GB/s/direction, then there are a few options such as a x16 module at 32G in standard, 2 x16 modules at 16G in standard, x64 module at 4G in advanced, etc.
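
The arithmetic for the x16 configurations in this answer can be checked with a small sketch. It assumes one bit per data lane per transfer, so raw bandwidth is lanes times transfer rate times module count; the function name is illustrative only.

```python
def module_bandwidth(lanes, gt_per_s, modules=1):
    """Return raw per-direction bandwidth as (Gb/s, GB/s).

    Assumes one bit per data lane per transfer and ignores protocol overhead.
    """
    gbits = lanes * gt_per_s * modules
    return gbits, gbits / 8


print(module_bandwidth(16, 4))              # x16 at 4G
print(module_bandwidth(16, 32))             # x16 at 32G
print(module_bandwidth(16, 16, modules=2))  # two x16 modules at 16G
```

The first call gives 64 Gb/s (8 GB/s) per direction, and the other two each give 512 Gb/s (64 GB/s) per direction.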

Q: Why is x16 lanes the standard?

A: People mostly design to a power of 2 in their internal data paths, and with external standards like PCIe/CXL being x16, that seems like a natural choice. Also, there is a tradeoff between module width and overhead (clocking, valid, track, sideband). A larger width reduces the overhead but makes clock distribution and lane-to-lane skew more challenging, whereas a smaller width means the overheads become dominant. From that perspective, x16 was at the sweet spot.

Q: Why can’t UCIe PHY use x1 lane and scale it as needed?

A: The overheads (clocking, valid, track, sideband) make the x1 unattractive.

Q: What is the difference between logical lane & physical lane?

A: It has to do with mapping. Physical numbering is what we have at the bump level. Imagine we have lane reversal on a x16 Link. Then bit 0 will be sent on the Transmitter physical lane #15, so physical lane #15 is logical Lane 0 on the Transmitter.
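
The mapping in this example can be sketched as a simple function. This is only an illustration of the x16 reversal case described in the answer; the function and parameter names are hypothetical and not taken from the specification.

```python
def physical_lane(logical_lane, width=16, reversed_lanes=False):
    """Map a logical lane number to a physical (bump-level) lane number.

    With lane reversal, logical lane 0 lands on the highest physical lane.
    """
    if not 0 <= logical_lane < width:
        raise ValueError(f"logical lane must be in 0..{width - 1}")
    return width - 1 - logical_lane if reversed_lanes else logical_lane


print(physical_lane(0, reversed_lanes=True))   # logical 0 -> physical 15
print(physical_lane(15, reversed_lanes=True))  # logical 15 -> physical 0
print(physical_lane(3))                        # no reversal: unchanged
```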

Q: Any plans to expand the data lanes to PAM-3, PAM-4 in the physical layer?

A: Currently there are no plans. We have excellent bandwidth density that we expect to meet the needs that we see. We do expect the bump pitches to go down over time for advanced package. When that happens, the bandwidth density will increase exponentially, and people will reduce the frequency to save even more power.

Security Questions

Q: What are the security aspects of UCIe?

A: Each protocol (e.g., PCIe/CXL) has its own security flows that they will leverage. We are also exploring if there are other aspects we should address.

Q: Can UCIe leverage CXL security?

A: Yes – if CXL protocol is used, we should use that security which should already be included in the protocol layer.

Q: What are the security measurements that interconnects must enable to protect the SoC against a reverse engineering attack?

A: Each protocol (e.g., PCIe/CXL) has its own security flows that they will leverage. We are also exploring if there are other aspects we should address.

Miscellaneous

Q: Can we achieve repeatable/known latency through the UCIe link across power cycles? If not, will this be addressed in the future?

A: UCIe has a common clock, so the expectation is that repeatability should be possible on the interface. While events like replay (when the link BER is 10^-15) or retrain (e.g., those initiated by software) make repeatability a challenge, known techniques of introducing delay prior to data forwarding can be deployed, as they have been in other external standards. The broader issue is that a transaction sent across the link goes beyond the interface, as it does on other external interconnect standards. For example, a memory read may even go across an external link. So applications that rely on repeatability need to deploy the mechanisms they already use for external interconnect standards like PCIe or CXL. This aspect is beyond the scope of any interconnect specification.

Q: How much latency variation can be expected with the UCIe 1.0 specification?

A: We have provided the latency targets (2ns for round-trip from FDI to bump) assuming a 16G link. This is provided as guidance for designs to target. It is expected that designs will land in that ballpark, though some variation may occur depending on factors such as the frequency of operation, the process node, and the placement of logic.

Q: Is the 2-nanosecond latency related to frequency?

A: The 2ns calculation is an estimate based on 16G frequency and a 2G internal clock. We expect the values to be in the ballpark of 2ns for other frequencies.

Q: Does lower frequency mean longer latencies?

A: Lower frequency means longer accumulation time for a Flit. However, pipelining techniques may be deployed to hide that latency and do some processing with known techniques such as late cancel on a CRC error. Performance critical bits (such as Flit Header) appear early on in the Flit to enable pipelined operations.
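
To make the accumulation-time effect concrete, the sketch below computes how long it takes for a whole Flit to arrive. It assumes every data lane carries Flit bits at one bit per lane per transfer, uses the 68B and 256B Flit sizes mentioned elsewhere in this recap, and picks a x16 module as an example; the function name is hypothetical.

```python
def flit_accumulation_ns(flit_bytes, lanes, gt_per_s):
    """Time in nanoseconds to receive one complete Flit.

    Assumes one bit per lane per transfer and no other overhead.
    """
    return flit_bytes * 8 / (lanes * gt_per_s)


for rate in (16, 4):
    for flit_bytes in (68, 256):
        ns = flit_accumulation_ns(flit_bytes, lanes=16, gt_per_s=rate)
        print(f"{flit_bytes}B Flit, x16 at {rate}G: {ns} ns")
```

Under these assumptions, a 256B Flit takes 8 ns to accumulate at 16G and 32 ns at 4G, which is the gap that pipelining on early Flit bytes is meant to hide.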

Q: What is difference between differential and pseudo-differential implementation of the clock receiver?

A: In the UCIe spec, these two terms are used interchangeably (the spec does not mandate a specific design). In analog design, differential refers to a fully differential buffer/amplifier implementation with two (+ and –) inputs. Pseudo-differential implementations have two separate circuits operating on opposite phases.

Q: How does UCIe repair clock and track for differential & pseudo-differential receiver?

A: The details are provided in Section 4.3.5 of the UCIe 1.0 specification.

Q: Can we do a Precision Time Measurement (PTM) on UCIe with streaming?

A: That depends on the support of PTM in the streaming protocol.

Q: If we have an advanced package implementation with 0.4V unterminated swing, how do we interoperate with another advanced package implementation with 0.8V unterminated swing?

A: The Rx has the capability of accepting the common mode range of 0.4V to 0.85V swing (the common mode can be setup during initialization and training).

Q: Does UCIe contain any provisions to mitigate the problem of supply noise and coupling to sensitive radios?

A: Data transmitted over a UCIe Link is scrambled. The details of the scrambling/descrambling polynomial are provided in the specification.
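
For readers unfamiliar with scrambling, the sketch below shows the general idea: data bits are XORed with a pseudo-random sequence from a linear-feedback shift register (LFSR), so repetitive data does not produce a repetitive pattern on the wire, and the receiver recovers the data by applying the same XOR. The 16-bit polynomial and seed here are a common textbook choice used purely for illustration; they are not the UCIe polynomial, which is defined in the specification.

```python
def lfsr_keystream(seed, nbits):
    """Yield nbits pseudo-random bits from a 16-bit Fibonacci LFSR.

    Illustrative polynomial: x^16 + x^14 + x^13 + x^11 + 1 (not from UCIe).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    for _ in range(nbits):
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        yield state & 1
        state = (state >> 1) | (feedback << 15)


def scramble(bits, seed=0xACE1):
    """XOR a list of bits with the LFSR sequence; applying it twice descrambles."""
    return [b ^ k for b, k in zip(bits, lfsr_keystream(seed, len(bits)))]


data = [0] * 64                 # a long run of identical bits
on_wire = scramble(data)        # looks pseudo-random on the wire
recovered = scramble(on_wire)   # same seed and polynomial restore the data
print(recovered == data)
```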

Q: Does UCIe envision integrating photonics in a 2.5D interposer for future die-to-die (chiplet to chiplet) interconnect?

A: We have defined a UCIe Retimer for communication across chiplets in different packages, which may be used with media such as optical.

Q: How does UCIe work with DRAM standards such as HBM3, JEDEC, and JESD238?

A: UCIe uses CXL.Mem for memory semantics to connect to any type of memory device or memory controller.

Q: Will UCIe propose how to ship compatibility data for each die?

A: UCIe defines link initialization as an integral part of the specification to negotiate the link operating parameters such as width, lane numbering, frequency, protocol support etc.
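
As a loose illustration of negotiating operating parameters, the sketch below settles on the highest width and frequency both sides support and on the protocols they have in common. This is a simplified assumption for illustration only: the class, field names, and resolution rule are hypothetical, and the actual initialization flow is defined in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkCapabilities:
    """Hypothetical summary of what one die advertises."""
    max_width: int          # number of data lanes
    max_rate_gt: int        # maximum transfer rate in GT/s
    protocols: frozenset    # supported protocol names


def negotiate(a, b):
    """Return parameters both dies can support (assumed rule: highest common)."""
    common = a.protocols & b.protocols
    if not common:
        raise ValueError("no protocol in common")
    return LinkCapabilities(
        max_width=min(a.max_width, b.max_width),
        max_rate_gt=min(a.max_rate_gt, b.max_rate_gt),
        protocols=common,
    )


die_a = LinkCapabilities(64, 32, frozenset({"PCIe", "CXL", "Streaming"}))
die_b = LinkCapabilities(16, 16, frozenset({"CXL", "Streaming"}))
print(negotiate(die_a, die_b))
```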

Q: How is manufacturing and ATM scaling up to test bigger and bigger package?

A: We have test and compliance mechanisms in the specification. We will be enhancing that in the next revision of the specification.

Q: Does the ACK/NAK flit contain data? If so, what data does it carry?

A: A Flit carries Ack/Nak as a part of the 2B Header inside each Flit (68B or 256B). The other bytes carry information (e.g., data) along with CRC etc.