<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<rfc
      xmlns:xi="http://www.w3.org/2001/XInclude"
      category="info"
      docName="draft-xiong-rtgwg-use-cases-hp-wan-00"
      ipr="trust200902"
      obsoletes=""
      updates=""
      submissionType="IETF"
      xml:lang="en"
      tocInclude="true"
      tocDepth="4"
      symRefs="true"
      sortRefs="true"
      version="3">

 <!-- ***** FRONT MATTER ***** -->

 <front>

   <title abbrev="Use Cases for High-performance Wide Area Network">Use Cases for High-performance Wide Area Network</title>
    <seriesInfo name="Internet-Draft" value="draft-xiong-rtgwg-use-cases-hp-wan-00"/>
   
   <author fullname="Quan Xiong" initials="Q" surname="Xiong">
      <organization>ZTE Corporation</organization>
      <address>
        <postal>
          <street/>
         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>xiong.quan@zte.com.cn</email>
     </address>
    </author>
	
 <author fullname="Zongpeng Du" initials="Z" surname="Du">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street></street>
          
          <city></city>
          
          <region></region>
  
          <code></code>

          <country>China</country>
        </postal>

        <phone></phone>

        <email>duzongpeng@chinamobile.com</email>
      </address>
    </author>

	 <author fullname="Tao He" initials="T" surname="He">
      <organization>China Unicom</organization>

      <address>
        <postal>
          <street></street>
          
          <city></city>
          
          <region></region>
  
          <code></code>

          <country>China</country>
        </postal>

        <phone></phone>

        <email>het21@chinaunicom.cn</email>
      </address>
    </author>	
    
	 <author fullname="Huiyue Zhang" initials="H" surname="Zhang">
      <organization>China Telecom</organization>

      <address>
        <postal>
          <street></street>
          
          <city></city>
          
          <region></region>
  
          <code></code>

          <country>China</country>
        </postal>

        <phone></phone>

        <email>zhanghy30@chinatelecom.cn</email>
      </address>
    </author>	

    <author fullname="Junfeng Zhao" initials="J" surname="Zhao">
      <organization>CAICT</organization>

      <address>
        <postal>
          <street></street>
          
          <city></city>
          
          <region></region>
  
          <code></code>

          <country>China</country>
        </postal>

        <phone></phone>

        <email>zhaojunfeng@caict.ac.cn</email>
      </address>
    </author>

   <area>Routing</area>
    <workgroup>RTGWG</workgroup>
   <keyword></keyword>
   
   <abstract>
   
    <t>Big data and intelligent computing are widely adopted and developing
	rapidly, and many applications demand massive data transmission
	with higher performance in wide area networks and metropolitan area
	networks. This document describes use cases for High-performance
	Wide Area Networks (HP-WAN).</t>
	  
    </abstract>
  </front>
  <middle>
    <section numbered="true" toc="default"> <name>Introduction</name>
	
	<t>With the rapid development of big data and intelligent computing,
	many applications require data transmission between data
	centers (DCs), such as cloud storage and backup of industrial Internet data,
	digital twin modeling, Artificial Intelligence Generated Content (AIGC),
	multimedia content production, distributed training, and High Performance
	Computing (HPC) for scientific research. Long-distance
	connectivity and massive data transmission between intelligent computing
	centers have become key factors affecting application performance.
	Increasingly, HPC connectivity must ensure data integrity and provide
	stable and efficient transmission services in Wide Area Networks (WANs) and
	Metropolitan Area Networks (MANs). </t>
	
	<t>Compared with ordinary networks, a High-performance Wide Area
	Network (HP-WAN) places higher performance requirements on the network,
	such as ultra-high bandwidth utilization and an ultra-low packet
	loss ratio, to ensure effective high-throughput transmission.
	This document describes key use cases for HP-WAN.</t>
	    
      <section numbered="true" toc="default"><name>Requirements Language</name>
	  
	 <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
       "OPTIONAL" in this document are to be interpreted as described in BCP
       14 <xref target="RFC2119" pageno="false" format="default"/> 
	   <xref target="RFC8174" pageno="false" format="default"/> when, and only when, 
	   they appear in all capitals, as shown here.</t>
	   
      </section>
    </section>
	
    <section anchor="Terminology" numbered="true" toc="default"> <name>Terminology</name>
	<t>The following terminology is used in this document.</t>

    <t>High-performance Wide Area Network (HP-WAN): a WAN
	or MAN that places higher performance requirements on the network, such as
	ultra-high bandwidth utilization and an ultra-low packet loss ratio,
	to ensure effective high-throughput transmission.</t>
	
	<t>This document also uses the following abbreviations:</t>
	   
	    <dl newline="false" spacing="normal" indent="15" pn="section-2-3">
		<dt>DC: </dt>
		<dd>Data Center</dd>	
	    <dt>DCI: </dt>
	    <dd>Data Center Interconnection</dd>
	    <dt>HPC: </dt>
	    <dd>High Performance Computing</dd>
	    <dt>WAN: </dt>
	    <dd>Wide Area Networks</dd>
	    <dt>MAN: </dt>
	    <dd>Metropolitan Area Networks</dd>  
		</dl>
	
    </section>
	
    <section numbered="true" toc="default"><name>Use Cases</name>
	
	<t>This section documents use cases for
	scenarios that require high-performance data transmission
	in wide area networks, including:</t>
	
   <ul spacing="normal">

   <li>High Performance Computing (HPC): uses computing clusters to 
   perform complex scientific computing and data analysis tasks. 
   The network needs to support large-scale parallel processing, 
   high-speed data transmission, and low-latency communication 
   to achieve effective collaboration between computing nodes.</li>
   
   <li>Distributed Storage: provides distributed data storage in 
   different physical locations. It is necessary to ensure
   efficient and reliable data transmission between different
   storage sites.</li>
   
   <li>Data Migration: refers to the process of transferring data 
   from one system or storage location to another while ensuring 
   the integrity, consistency, and usability of the data.</li> 
   
   <li>Collaborative Training across Multiple DCs: refers to
   distributed machine learning training across 
   multiple data centers connected by Data Center Interconnection (DCI). 
   The network should provide sufficient bandwidth, low latency, and high 
   reliability for communication between data centers.</li>
   
   <li>Cloud Computing Services: provide computing resources
   to users over the Internet. Data synchronization between
   data centers needs to be optimized.</li>
   
   <li>Autonomous Driving: generates large volumes of vehicle data
   that need to be managed and transmitted from in-vehicle systems 
   to data centers.</li>
   </ul>
 
    <section numbered="true" toc="default"> <name>High Performance Computing (HPC)</name>
	
	<t>High Performance Computing (HPC) uses computing clusters to 
    perform complex scientific computing and data analysis tasks. 
	HPC is critical to solving complex problems
	in fields such as scientific research, engineering, 
	finance, and data analysis.</t>
	
	<t>For example, large science and engineering projects carried out
	in cooperation among many research institutions require 
	long-term archiving of about 50 to 300 PB of research data every year.
	The PSII protein process generates 30 to 120 high-resolution images 
	per second during experiments. This results in 60 to 100 GB of data
	every five minutes, which must be transmitted from one laboratory
	to another for analysis. Another example is FAST astronomical 
	data calculation, with over 200 observations for each project, 
	a single project generating observation data at the TB to PB level,
	and annual data production of about 15 PB.</t>
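
	<t>As a rough, non-normative illustration, the volumes quoted above
	can be converted into average sustained rates. The sketch below
	assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and a
	365-day year. The helper name average_rate_gbps is hypothetical and
	is not part of any HP-WAN specification.</t>

<sourcecode type="python"><![CDATA[
```python
# Illustrative only: average rate implied by a data volume
# produced over a time interval. Decimal units are assumed.
GB = 10**9   # bytes
PB = 10**15  # bytes


def average_rate_gbps(volume_bytes, interval_seconds):
    """Return the average rate in Gbit/s."""
    return volume_bytes * 8 / interval_seconds / 1e9


five_minutes = 5 * 60
year = 365 * 24 * 3600

# 60 to 100 GB every five minutes (PSII example)
psii_low = average_rate_gbps(60 * GB, five_minutes)
psii_high = average_rate_gbps(100 * GB, five_minutes)

# About 15 PB per year (FAST example)
fast_avg = average_rate_gbps(15 * PB, year)

print(f"PSII: {psii_low:.2f} to {psii_high:.2f} Gbit/s")
print(f"FAST: {fast_avg:.2f} Gbit/s averaged over a year")
```
]]></sourcecode>

	<t>Under these assumptions, the PSII example corresponds to an
	average of roughly 1.6 to 2.7 Gbit/s, and the FAST example to roughly
	3.8 Gbit/s averaged over a full year. Peak rates may be considerably
	higher than these averages.</t>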
	    
	<t>HPC requires a high-bandwidth, high-speed network to facilitate 
	rapid data exchange between processing units. It also requires
	high-capacity and high-throughput storage solutions to handle the
	vast amounts of data generated by simulations and computations.
	The network needs to support large-scale parallel processing, 
    high-speed data transmission, and low-latency communication 
    to achieve effective collaboration between computing nodes.</t>
	</section>
	
    <section numbered="true" toc="default"> <name>Distributed Storage</name> 
	
	<t>Distributed storage is a method of storing data across 
	multiple physical or virtual devices, which can be spread 
	across different locations. This method is designed to 
	enhance data availability, improve performance, and provide 
	redundancy. In a big data environment, data size and 
	complexity often grow very rapidly, so the storage system
	must be highly scalable. Because a big data storage system 
	is very large and its node failure rate is high, it needs
	adaptive management functions. The system must be able to estimate 
	the required number of nodes based on the amount of data 
	and the computational workload, and dynamically migrate data
	between nodes and systems. Data also needs to be moved from 
	one storage system to another for reasons such as 
	upgrading to a new storage system, consolidating storage, 
	or moving to a cloud-based solution. The migration process 
	needs to maintain data consistency 
	across the distributed storage systems.</t>
	
	</section>
	
   <section numbered="true" toc="default"> <name>Data Migration</name> 	
	
	<t>Data migration is the process of transferring data from 
	one system or storage location to another while ensuring the
	integrity, consistency, and usability of the data. This can 
	be necessary for various reasons, such as system upgrades, 
	consolidation, or moving the data to another platform or storage 
	site. </t>
	
	<t>For example, with the development of new media technologies such as 
	4K/8K, 5G, AI, VR/AR, and short video, large amounts of audio
	and video data need to be transmitted between data centers 
	or different storage sites. For AR/VR video, terminal
	output at 1080P image quality requires 40M per user. This traffic
	is characterized by massive data scale and large bursts.
	For multimedia content 
	production, the raw material data of a large-scale variety
	show or film and television program is at the PB level, with
	a single transmission of data in the range of 10 TB to 100 TB.</t>
	
	<t>Another data migration example is a P2P data express service, 
	which requires task-based data transmission, a point-to-point model, 
	network resource pooling, and high resilience and throughput, with 
	a single transfer ranging from the TB level to 100 TB. For the
	migration of backup data in an IT cloud resource pool, the data
	volume is at the TB level and the working and backup data centers
	are built in different locations, so disaster recovery requires
	long-distance, massive data transmission.  </t>
	
    <t>Traditional data migration solutions include high-speed
	dedicated connectivity, which is expensive, and manual transportation
	of physical copies of the data, which can take several days for each
	transfer. It is necessary to ensure efficient and reliable
	data transmission between different storage sites.</t>
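
	<t>To put the transfer sizes in this section in perspective, the
	following non-normative sketch estimates how long a single transfer
	would take over a network link. The link rates (10 and 100 Gbit/s)
	and the utilization values (30% and 90%) are assumptions chosen
	purely for illustration and are not figures from this document.
	Decimal units are assumed (1 TB = 10^12 bytes), and the helper name
	transfer_hours is hypothetical.</t>

<sourcecode type="python"><![CDATA[
```python
# Illustrative only: time to move a data set over a link when a
# given fraction of the link rate is achieved as goodput.
TB = 10**12  # bytes, decimal units assumed


def transfer_hours(volume_bytes, link_gbps, utilization):
    """Return the transfer time in hours."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    goodput_bps = link_gbps * 1e9 * utilization
    return volume_bytes * 8 / goodput_bps / 3600


# Hypothetical link rates and utilization levels.
for volume_tb in (10, 100):
    for link_gbps in (10, 100):
        for utilization in (0.3, 0.9):
            hours = transfer_hours(volume_tb * TB, link_gbps,
                                   utilization)
            print(f"{volume_tb:>3} TB, {link_gbps:>3} Gbit/s, "
                  f"{utilization:.0%} utilization: "
                  f"{hours:6.1f} h")
```
]]></sourcecode>

	<t>Under these assumed values, a 100 TB transfer over a 10 Gbit/s
	link takes about 74 hours at 30% utilization and about 25 hours at
	90% utilization, which illustrates why high bandwidth utilization
	matters for this use case.</t>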
	
	</section>
	
    <section numbered="true" toc="default"> <name>Collaborative Training across Multiple DCs</name>
	
	<t>With the increasing demand for computing power in 
	large-scale AI model training, the scale of a single data
	center is limited by factors such as power supply.
	AI training clusters are therefore expanding from a single data center
	to multiple DCs. Collaborative training across multiple DCs
	typically refers to distributed machine 
	learning training across multiple data centers. </t>
	
	<t>For example, in one deep learning training process
	the training data has reached 3.05 TB. Uploading large model 
	training templates requires uploading TB- to PB-level data to the 
	data center. Each training session has fewer data flows, each with 
	larger bandwidth. In addition, 20% of the current network's services 
	account for 80% of the traffic, which results in elephant flows.
	Collaborative training can improve computational 
	efficiency, accelerate model training, and utilize more
	data resources. It distributes different parts of the model
	to different data centers; each data center is responsible 
	for calculating a portion of the model and then synchronously 
	updates the model parameters. Because data centers need to 
	exchange information, communication efficiency
	is crucial for collaborative training. </t>
	
	<t>Compared with traditional DCI scenarios, parameter exchange
	significantly increases the amount of data transmitted across
	DCs, typically from tens to hundreds of TBPS. In addition, the
	requirements on network latency and stability are stricter. The
	network must provide on-demand task allocation to different clusters,
	sufficient bandwidth, low latency, high throughput, and extremely high
	availability and reliability for communication between data centers. </t>
	
	</section>
	
   	<section numbered="true" toc="default"> <name>Cloud Computing Services</name> 
	
	<t>Cloud computing services represent a model in which computing
	resources and data storage are provided to users over the Internet.
	The main types of cloud computing services include Infrastructure
	as a Service (IaaS), Platform as a Service (PaaS), Database as 
	a Service (DBaaS), Network as a Service (NaaS), and Management
	as a Service (MaaS). For example, MaaS is a cloud 
	computing service model in which service providers offer 
	professional management services and tools over the
	Internet to help customers manage their IT infrastructure, 
	applications, or business processes more effectively. 
	MaaS allows users to purchase services in the cloud,
	connect to multi-cloud computing resources, and achieve a high
	quality of experience.</t>
	
	<t>Cloud computing services must provide identity authentication and
	access management to ensure that only authorized users can access cloud
	services. User information also needs to be synchronized
	between cloud storage services and local user data directories 
	(such as Active Directory) for data backup, disaster recovery, 
	and remote access. Data synchronization
	between data centers therefore needs to be optimized.</t>
	</section>
   
   <section numbered="true" toc="default"> <name>Autonomous Driving</name> 
	
	<t>Autonomous driving refers to technology that enables vehicles
	to navigate without human input. It 
	uses machine learning and AI algorithms to analyze
	sensor data and make decisions, such as identifying other
	vehicles, pedestrians, and traffic signals. On the road, vehicles
	record data from 4K HD cameras, laser scanners, and radars. 
	Each vehicle can generate 80 TB of data per day, which makes
	big data management for autonomous driving challenging.
	Autonomous driving technology is categorized into 
	levels of automation, typically ranging from Level 0 (no automation) 
	to Level 5 (full automation). The amount of data that needs to 
	be collected grows geometrically with the automation
	level. For example, a Level 2 autonomous vehicle needs 4 to 10 PB
	of data, Level 3 needs 50 to 100 PB, and Level 5 needs more than 3 EB.
	The recorded vehicle data needs to be transmitted from the in-vehicle 
	systems to the data center.</t> 
	
	</section>


   </section>

   <section  numbered="true" toc="default"> <name>Security Considerations</name>
   <t>This document covers a number of representative applications and
   network scenarios that are expected to make use of HP-WAN
   technologies.  The potential use cases do not in themselves raise
   any security concerns or issues, but each may have security 
   considerations from both a use-specific perspective and
   a technology-specific perspective.</t>
   </section>
   <section numbered="true" toc="default"> <name>IANA Considerations</name>
   <t>This document makes no requests for IANA action.</t>
   </section>
	
   <section numbered="true" toc="default"> <name>Acknowledgements</name>
   <t>The authors would like to acknowledge Zheng Zhang, Yao Liu and Bin Tan for 
   their thorough review and very helpful comments.</t>
   </section> 
   
  </middle>
  
  <!--  *****BACK MATTER ***** -->

 <back>
 
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8664.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9232.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7424.xml"/>	
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3168.xml"/>
		
      </references>
    </references>
 
 </back>
</rfc>
