Related Work
Distributed Computing
Distributed computing is a computing paradigm that achieves efficient data processing and analysis by distributing large-scale computing tasks across different computing nodes. The four commonly recognized architectures for distributed computing are the client-server architecture, the three-tier architecture, the N-tier architecture, and the peer-to-peer architecture. Client-server is the most common way of organizing software on distributed systems. Its functional components are clients and servers: the client runs the application locally and accesses data held by the server, which is how almost all applications run today. The client-server architecture offers security and ease of ongoing management, since protection efforts only need to focus on the server and any change to the database system requires changes only on the server. The three-tier architecture further divides the server side into two categories: application servers, which still receive client requests, and database servers, which form the third tier, store and manage the data, and are responsible for data retrieval and data consistency. The N-tier architecture comprises several different client-server systems that communicate with each other to solve the same problem; most modern distributed systems use the N-tier architecture, where different enterprise applications work together as one system in the background. Peer-to-peer distributed systems assign equal responsibilities to all networked computers: there is no distinction between client and server computers, and any computer can perform all duties. Peer-to-peer architectures have become popular for content sharing, file streaming, and blockchain networks.
The first three architectures are commonly used on the Internet today, and their flaws are obvious: centralized servers control almost all the data and dictate the rules of operation for all clients, which is the data and computing power monopoly problem we hope to solve. The core of distributed computing is the processing of data. To achieve efficient and reliable distributed data processing, we first consider parallelization and failure resilience. Parallelization involves data management, task scheduling, load balancing, consistency, communication, and so on, while failure resilience requires consideration of fault tolerance, recovery, security, privacy, and so on.
The earliest distributed computing solutions date back to the 1970s and 1980s. Although these early systems differed significantly from modern distributed computing, they laid the groundwork for today's technologies. They include pervasive computing at Xerox PARC, HTCondor developed at the University of Wisconsin-Madison, and the Parallel Virtual Machine (PVM) developed at Oak Ridge National Laboratory, all of which are still in use today and form the basis of distributed computing systems.
HTCondor is used to deal with problems in High Throughput Computing (HTC), where throughput refers to the ability to schedule computing resources. The idea of HTC is to split a large, compute-intensive job into sub-tasks and distribute them to the computers in a cluster. HTCondor provides the following functions (a minimal matchmaking sketch follows the list):
Publishing tasks: tasks are published to computers in the cluster according to preset conditions on the cluster's computing resources;
Scheduling tasks: a task can be sent to a computer that satisfies its conditions or migrated to another computer;
Monitoring tasks: the running status of tasks and the state of computing resources can be monitored at any time.
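The matchmaking idea behind these functions can be illustrated with a small, hypothetical sketch: a task carries a requirement predicate, and the scheduler publishes it to the first idle machine that satisfies the predicate. The Task and Machine classes below are purely illustrative and are not HTCondor's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical data structures for illustration; not HTCondor's actual API.
@dataclass
class Machine:
    name: str
    cpus: int
    memory_gb: int
    busy: bool = False

@dataclass
class Task:
    name: str
    # A requirement predicate, analogous to a job's resource conditions.
    requirements: Callable[[Machine], bool]

def match(task: Task, pool: list) -> Optional[Machine]:
    """Publish/schedule step: find an idle machine that satisfies the task's conditions."""
    for m in pool:
        if not m.busy and task.requirements(m):
            m.busy = True            # claim the machine for this task
            return m
    return None                      # no match yet; the task stays queued

pool = [Machine("node-a", cpus=4, memory_gb=8), Machine("node-b", cpus=16, memory_gb=64)]
task = Task("train-small-model", requirements=lambda m: m.cpus >= 8 and m.memory_gb >= 32)
print(match(task, pool).name)        # -> node-b
```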
HTCondor is currently used mainly for academic experiments, but subsequent distributed computing architectures largely follow the same functional process. The core idea of MapReduce, developed by Google, is to divide a computing task into two stages, a Map stage and a Reduce stage, each processed in parallel, with scheduling and monitoring handled by a master node. MapReduce is a low-level data processing model whose input jobs can come directly from local data sources. If we combine it with blockchain, use consensus algorithms, and design a reasonable economic model for task incentives, we can remove the master node dedicated to centralized scheduling and monitoring.
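As a minimal, single-machine sketch of the Map and Reduce stages, the classic word-count example is shown below; it deliberately omits the master, cross-node shuffling, and fault tolerance that a real MapReduce framework provides.

```python
from collections import defaultdict
from itertools import chain

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map stage: each input record is mapped independently to (key, value) pairs,
# so this step can run in parallel across workers.
def map_phase(doc: str):
    return [(word, 1) for word in doc.split()]

mapped = list(chain.from_iterable(map_phase(d) for d in documents))

# Shuffle: group intermediate pairs by key (normally done by the framework).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce stage: combine all values for a key; this is also parallel per key.
def reduce_phase(key, values):
    return key, sum(values)

print(dict(reduce_phase(k, v) for k, v in groups.items()))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```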
Microservices and Pooling
Microservices is an architectural style in which software is built from autonomous components that encapsulate fine-grained business functions and communicate with each other through standardized interfaces.
Traditional application building approaches focus on monolithic architectures. In a monolithic architecture, all functions and services within an application are locked together and operate as a single unit. The architecture becomes more and more complex as the application is extended or improved. This makes it difficult to optimize any single function without taking apart the entire application. It also means that if one process in the application needs to be scaled, the entire application must be scaled as well. In a microservices architecture, each core function of the application runs independently. This allows the development team to build and update components to meet changing business needs without disrupting the entire application.
Microservices are implemented by partitioning business processes into individual services that execute autonomously and coordinating them on a distributed infrastructure through virtualization technology (typically containerization, such as Docker), forming mutually isolated autonomous units, each with its own processing logic and database. Microservices are already used in many business environments, and we hope to apply them to high-concurrency scenarios such as artificial intelligence and blockchain, drawing on three ideas: asynchrony, caching, and pooling. Asynchronous processing means that when a program must wait for an operation (e.g., an I/O operation or a network request) to complete, it does not block the current thread but continues executing other tasks until the operation finishes. Caching reduces the number of accesses to back-end services such as databases by storing frequently accessed data in a cache, thereby improving the response speed and throughput of the system.
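A minimal sketch of asynchrony combined with caching is given below. The fetch_profile call stands in for a slow backend request and the in-process dictionary cache is purely illustrative; a production service would typically use a shared cache such as Redis with expiry.

```python
import asyncio

_cache: dict = {}                   # simple in-process cache; real systems use Redis, etc.

async def fetch_profile(user_id: str) -> dict:
    """Stands in for a slow backend/database call (hypothetical)."""
    await asyncio.sleep(0.1)        # the await yields control instead of blocking the thread
    return {"id": user_id, "name": f"user-{user_id}"}

async def get_profile(user_id: str) -> dict:
    if user_id in _cache:           # cache hit: skip the backend entirely
        return _cache[user_id]
    profile = await fetch_profile(user_id)
    _cache[user_id] = profile
    return profile

async def main():
    # The three requests run concurrently; repeated ids are then served from the cache.
    await asyncio.gather(get_profile("1"), get_profile("2"), get_profile("3"))
    print(await get_profile("1"))   # served from cache, no backend call

asyncio.run(main())
```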
Asynchronous execution and caching are not only solutions for handling high concurrency in microservice architectures; they can be used in any distributed architecture. Pooling in a microservice architecture means caching objects that are expensive to create, avoiding frequent creation and destruction. This reduces the consumption of system resources and improves system stability and performance. We would like to consider more complex pooling in distributed scenarios. In the current context of rapid AI and blockchain development, the growing number of applications has increased the demand for computing resources. Many applications exclusively occupy a single processor during operation while using only a small share of its resources, which tends to lead to low processor utilization. The major pooling technologies today include NIC pooling, storage pooling, memory pooling, CPU pooling, and GPU pooling; we mainly consider GPU pooling in decentralized AI scenarios. GPU pooling is a software-defined approach to physical GPUs that combines capabilities such as GPU virtualization, multi-card aggregation, remote invocation, and dynamic release. Through pooling, AI applications can invoke GPUs of any size according to load demand and can even aggregate GPUs from multiple physical nodes; the number and size of virtual GPUs can be adjusted after the container or virtual machine is created. When an AI application stops, its GPU resources are immediately released back to the GPU resource pool, realizing efficient resource invocation and full utilization. There are already many mature GPU pooling technologies for production environments, including Nvidia's remote GPU technology; Fungible (acquired by Microsoft), which connects GPUs over a network via DPUs and a PCI vSwitch; rCUDA, developed by the Parallel Architectures Group of the Universidad Politécnica de Valencia in Spain; Bitfusion, acquired by VMware; and OrionX, developed by VirtAITech in China.
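Reduced to its essentials, the pooling idea can be sketched as a pool of virtual GPU slices that callers borrow and immediately return when done. The GPUPool class and slice naming below are purely illustrative and do not represent any of the real pooling products listed above, which rely on virtualization and remote invocation layers.

```python
import threading
from contextlib import contextmanager
from queue import Queue

class GPUPool:
    """Illustrative pool of virtual GPU slices; not a real pooling implementation."""
    def __init__(self, total_slices: int):
        self._free = Queue()
        for i in range(total_slices):
            self._free.put(f"vgpu-{i}")

    @contextmanager
    def acquire(self, timeout: float = 30.0):
        slice_id = self._free.get(timeout=timeout)   # block until a slice is free
        try:
            yield slice_id                           # caller uses the slice
        finally:
            self._free.put(slice_id)                 # returned to the pool immediately

pool = GPUPool(total_slices=4)

def run_inference(job: int):
    with pool.acquire() as vgpu:
        print(f"job {job} running on {vgpu}")

# Eight jobs share four slices; each job releases its slice as soon as it finishes.
threads = [threading.Thread(target=run_inference, args=(j,)) for j in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```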
Colocation
In cloud computing, jobs can be categorized into online and offline jobs according to their computing tasks. Online jobs usually take the form of services that process user requests and perform computational tasks, such as web search, online gaming, transaction services, and AI inference; they have high real-time and stability requirements. Offline jobs are usually computationally intensive batch jobs, such as MapReduce or Spark jobs and AI training. Deploying a mix of online and offline jobs in the same cluster is called colocation. Online jobs are real-time, delay-sensitive, and have low resource consumption, while offline jobs are delay-insensitive and have high resource consumption. What colocation needs to do is fill each time slot with offline jobs so that they utilize the resources left free by online services; when an online job needs resources, the offline jobs return the occupied resources in a timely manner, and the execution of offline tasks must not cause significant interference to online tasks. Overall, the most important goal of colocation is to maximize the utilization of single-machine resources while guaranteeing the SLAs (Service Level Agreements) of online services and offline jobs.
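A toy sketch of this colocation policy is shown below, assuming per-time-slot online utilization numbers that are purely illustrative: offline work fills whatever capacity the online service leaves free, and its quota collapses as soon as online demand spikes.

```python
MACHINE_CAPACITY = 100          # arbitrary resource units for one machine
RESERVE = 5                     # headroom kept to protect the online SLA

# Hypothetical online-service utilization per time slot (illustrative numbers).
online_usage = [30, 45, 80, 95, 60, 20]

for slot, online in enumerate(online_usage):
    # Offline jobs may only use what the online service leaves free, minus headroom.
    offline_quota = max(MACHINE_CAPACITY - online - RESERVE, 0)
    print(f"slot {slot}: online={online:3d}  offline quota={offline_quota:3d}")

# When online usage spikes (e.g. 95), the offline quota drops to 0,
# i.e. offline jobs return their resources immediately.
```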