Network virtualization, management silos and missed opportunities

狂欢‰一夜 posted on 2015-11-5 15:10:18

Conway's law states that "organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations."

Figure 1: Management silos

The article Management silos described how the organization of operations teams into functional silos (network, storage and server groups) creates an inflexible management structure that makes it hard to deal with highly dynamic cloud architectures.

Figure 2: OpenStack Quantum Intro

The OpenStack architecture shown in Figure 2 bears a strong resemblance to existing organizational silos. Is this really the best way to architect next-generation cloud systems, or is it simply a demonstration of Conway's law in action?

The OpenStack compute scheduler documentation describes the factors that can be included in deciding which compute node to use when starting a virtual machine. Notable by their absence is any mention of storage or network location. In contrast, the Hadoop scheduler is both storage and network topology aware, allowing it to place compute tasks close to storage and to replicate data within racks for increased performance and across racks for availability. A previous article, System boundary, discussed the importance of including all the tightly coupled network, storage, and compute resources within an integrated control system, and NUMA discussed the importance of location awareness for optimal performance.
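As a rough illustration of what "topology aware" placement means, the sketch below picks the compute node that is closest to a replica of the data a task needs. It borrows the distance convention commonly described for Hadoop (0 for the same node, 2 for the same rack, 4 for a different rack in one data center); the `Node` type, the `place_task` helper and the cluster layout are all hypothetical and are not taken from Hadoop or OpenStack code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str


def network_distance(a: Node, b: Node) -> int:
    """Hadoop-style distance: 0 same node, 2 same rack, 4 different rack."""
    if a.name == b.name:
        return 0
    if a.rack == b.rack:
        return 2
    return 4


def place_task(candidates, replicas):
    """Pick the candidate compute node closest to any replica of the data."""
    return min(
        candidates,
        key=lambda node: min(network_distance(node, r) for r in replicas),
    )


# Hypothetical cluster: two storage replicas and three candidate nodes.
storage = [Node("s1", "rack-a"), Node("s2", "rack-b")]
compute = [Node("c1", "rack-c"), Node("c2", "rack-b"), Node("s1", "rack-a")]

best = place_task(compute, storage)
print(best.name)
```

A scheduler that cannot see rack membership has no basis for preferring one of these candidates over another, which is the gap the paragraph above describes.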

Note: OpenStack was selected as a representative example to demonstrate architectural features that are common to many cloud stacks. This article shouldn't be seen as a specific criticism of OpenStack, but as a general discussion of cloud architectures.
Figure 3: OpenStack Quantum Intro

OpenStack is still in active development, so one might hope that future schedulers will be enhanced to be more location aware as the network service matures. However, looking at Figure 3, it appears that this will not be possible, since the APIs being developed to access the network service do not expose network topology or performance information to the scheduler.

Figure 4: VMware NSX Network Virtualization

Figure 4 shows how the situation becomes even worse as additional layers are added, further removing the scheduler from the information it needs to be location aware. In fact, the lack of location awareness is touted as an advantage, providing "A complete and feature rich virtual network can be defined at liberty from any constraints in physical switching infrastructure features, topologies or resources."

Each orchestration layer kicks the problem of network resource management down to lower layers, until you are left selecting from a range of vendor-specific fabrics, which also hide the network topology and present the abstraction of a single switch.

Figure 5: Juniper QFabric Architecture

On a slightly different tack, consider whether the organizational divisions in cloud orchestration systems are being justified based on one or more Fallacies of Distributed Computing:

[*]The network is reliable
[*]Latency is zero
[*]Bandwidth is infinite
[*]The network is secure
[*]Topology doesn't change
[*]There is one administrator
[*]Transport cost is zero
[*]The network is homogeneous
A corollary to Conway's law is that flexible organizations are willing and able to reorganize to produce optimal designs. The DevOps movement is breaking down the silos between application development and operations teams in order to improve the agility and reliability of cloud-based applications. The standards for cloud computing are just starting to emerge, and it would be tragic if the opportunity to deliver agile, robust, efficient and scalable cloud systems were lost because of an inability to create the flexible, cross-disciplinary design groups needed to re-imagine the relationship between networking, storage and computing and produce new architectures.

It is easy to be complacent based on the buzz around cloud computing, software defined networking and the software defined data center. However, if these architectures don't deliver on their promise, there is competition waiting in the wings - see Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon. The difference is that these alternative architectures are being developed by flexible organizations that are prepared to consider all aspects of their stack in order to make disruptive improvements.

The unified visibility across all network, server, storage and application resources provided by the multi-vendor sFlow standard offers a solution. Piercing through the layers of abstraction and architectural silos delivers the comprehensive real-time analytics and location awareness needed for efficient scheduling.
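To make the idea concrete, here is a minimal sketch of a placement decision that uses measured network load. It assumes that per-rack uplink utilization is already available as a simple mapping, for example derived from the interface counters that sFlow agents export; the `pick_host` function, the host inventory, the utilization figures and the 0.8 threshold are hypothetical and only illustrate the principle.

```python
def pick_host(hosts, uplink_utilization, max_utilization=0.8):
    """Return the candidate host whose rack uplink is least loaded.

    hosts: mapping of host name -> rack name (hypothetical inventory)
    uplink_utilization: mapping of rack name -> fraction of uplink in use
    Racks at or above max_utilization are excluded; returns None if
    no host is eligible.
    """
    eligible = {
        host: uplink_utilization[rack]
        for host, rack in hosts.items()
        if uplink_utilization[rack] < max_utilization
    }
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Hypothetical inventory and measurements.
hosts = {"h1": "rack-a", "h2": "rack-b", "h3": "rack-c"}
utilization = {"rack-a": 0.92, "rack-b": 0.35, "rack-c": 0.60}

choice = pick_host(hosts, utilization)
print(choice)
```

The point is less the selection rule than the inputs: a scheduler can only make this kind of choice if topology and utilization data reach it through the layers of abstraction.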
