Road Map
==
Scala DSL
--
While YAYA already provides a powerful Java DSL, Scala is a far more expressive and feature-rich language for building DSLs. Since Scala is a JVM-based language that integrates nicely with Java, a Scala-based DSL should be attractive not only to Scala developers but to Java developers as well.
Spring Bindings
--
Spring is a popular general-purpose application framework with one of the largest communities out there. It would only be natural to attract that community by providing a variety of Spring bindings, in the form of both namespace support and annotations.
Configurable Load Balancers
--
For long-lived Application Containers, the YarnApplication.launch() call returns an array of addressable ContainerDelegates.
Developers can already choose which container to interact with (based on its address or other logic), but the task would be greatly simplified by a load-balancing strategy that internally keeps track of which Application Containers to interact with. For example, a HostBasedFirstAvailableLoadBalancingStrategy, as the name suggests, would apply an internally defined host filter (e.g., use only host 192.168.2.3). If 5 of your 10 Application Containers run on that host, some of them may be busy while others are available; the strategy would take care of both the filtering and the distribution of messages.
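The host-based, first-available strategy described above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not YAYA's API: ContainerHandle, LoadBalancingStrategy and SimpleContainer are hypothetical stand-ins (the real ContainerDelegate methods are not described here), and "available" is modelled as a simple busy flag.

```java
import java.util.List;
import java.util.Optional;

class LoadBalancerDemo {
    public static void main(String[] args) {
        List<ContainerHandle> containers = List.of(
                new SimpleContainer("c1", "192.168.2.4", false),
                new SimpleContainer("c2", "192.168.2.3", true),
                new SimpleContainer("c3", "192.168.2.3", false));
        LoadBalancingStrategy strategy =
                new HostBasedFirstAvailableLoadBalancingStrategy("192.168.2.3");
        // c1 is on another host and c2 is busy, so c3 is selected
        strategy.select(containers).ifPresent(System.out::println);
    }
}

// Hypothetical view of an Application Container (not YAYA's actual ContainerDelegate API).
interface ContainerHandle {
    String host();
    boolean isBusy();
}

// Hypothetical strategy contract: pick the container that should receive the next message.
interface LoadBalancingStrategy {
    Optional<ContainerHandle> select(List<ContainerHandle> containers);
}

// Keeps only containers on the configured host and returns the first one that is not busy.
final class HostBasedFirstAvailableLoadBalancingStrategy implements LoadBalancingStrategy {
    private final String allowedHost;

    HostBasedFirstAvailableLoadBalancingStrategy(String allowedHost) {
        this.allowedHost = allowedHost;
    }

    @Override
    public Optional<ContainerHandle> select(List<ContainerHandle> containers) {
        return containers.stream()
                .filter(c -> allowedHost.equals(c.host())) // host filter
                .filter(c -> !c.isBusy())                  // first available
                .findFirst();
    }
}

// Simple immutable container description used for the example.
record SimpleContainer(String id, String host, boolean isBusy) implements ContainerHandle {}
```

Returning an Optional makes the "every matching container is busy" case explicit, so the caller can decide whether to queue, retry or fall back to another strategy.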
Configurable Executors
--
Similar to the Configurable Load Balancers, another strategy would be to wrap ContainerDelegates in some kind of configurable Executor abstraction that controls and distributes work among the available Application Containers, effectively making it a type of load balancer. These two items will very likely merge into one once they get past the POC stage.
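A minimal sketch of the Executor idea, assuming each Application Container can be addressed through something that behaves like a java.util.concurrent.Executor. Here local single-threaded ExecutorServices stand in for real containers, and ContainerDistributingExecutor is a hypothetical name. It distributes work in plain round-robin order; a load-aware selection, such as the host-based strategy from the previous item, could replace the index calculation.

```java
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ContainerExecutorDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-ins for three Application Containers.
        List<ExecutorService> containers = List.of(
                Executors.newSingleThreadExecutor(),
                Executors.newSingleThreadExecutor(),
                Executors.newSingleThreadExecutor());
        Executor executor = new ContainerDistributingExecutor(containers);
        for (int i = 0; i < 6; i++) {
            int task = i;
            executor.execute(() -> System.out.println(
                    "task " + task + " on " + Thread.currentThread().getName()));
        }
        for (ExecutorService c : containers) {
            c.shutdown();
            c.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}

// Hypothetical Executor that hands each submitted task to the next "container" in turn.
final class ContainerDistributingExecutor implements Executor {
    private final List<Executor> containers;
    private final AtomicInteger next = new AtomicInteger();

    ContainerDistributingExecutor(List<? extends Executor> containers) {
        if (containers.isEmpty()) {
            throw new IllegalArgumentException("at least one container is required");
        }
        this.containers = List.copyOf(containers);
    }

    @Override
    public void execute(Runnable command) {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(next.getAndIncrement(), containers.size());
        containers.get(index).execute(command);
    }
}
```

Because the wrapper implements the standard Executor interface, code already written against Executor could hand its work to Application Containers without changes.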
Load Sharing Across Applications and Clusters
--
One of the core requirements of any distributed system is its ability to share the load, which implies delegating work among the available workers. In YARN, such workers are represented by Application Containers. However, while performing work, an Application Container may decide (based on a variety of factors) that the load is too high to handle on its own and delegate part of it to another YARN Application, which may or may not be running in the same YARN cluster. What further complicates things is that in certain cases it is hard to determine in advance how many Application Containers are needed to process the load adequately. Take a reverse Map/Reduce paradigm (e.g., a Monte Carlo simulation): the input data is rather small, but the computation performed on it produces larger amounts of data that may need to be analyzed in real time, and that analysis may in turn produce more data to analyze. The result is a non-deterministic work tree whose size and growth are controlled by each branch spinning off (or not) another branch based on some condition.
Given such uncertainty, it would be very difficult, if not impossible, to maintain a consistent system load within a single cluster while still expecting timely results from such computations. In other words, we may need to start borrowing additional compute and I/O resources from other clusters (stand-by clusters).
While this is already possible by simply creating and launching a new YARN Application from within an Application Container, the goal of this road-map item is to simplify such distribution through a higher-level strategy that can be exposed as a simple method call.
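The sketch below illustrates the idea on a local, simulated work tree, under several assumptions: Task, OverflowDelegatingWorker and the standby callback are hypothetical names, the "stand-by" destination is just a Consumer<Task> (in practice it might launch or message another YARN Application), and the random branching rule merely stands in for a real Monte Carlo condition. The worker processes tasks locally up to a fixed capacity and hands every remaining task to the standby through a single method call; the descendants of handed-off tasks are never expanded locally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;
import java.util.function.Function;

class WorkTreeDemo {
    public static void main(String[] args) {
        Random random = new Random(42);
        int[] delegated = {0};
        // Hypothetical stand-by: here it only counts the tasks it would receive.
        OverflowDelegatingWorker worker =
                new OverflowDelegatingWorker(100, task -> delegated[0]++);
        int local = worker.run(new Task(0), t -> spawnChildren(t, random));
        System.out.println("processed locally: " + local + ", delegated: " + delegated[0]);
    }

    // Illustrative branching rule: each task spawns 0-3 children, decided at run time,
    // so the final size of the tree is not known in advance.
    static List<Task> spawnChildren(Task parent, Random random) {
        List<Task> children = new ArrayList<>();
        if (parent.depth() < 8) {
            int count = random.nextInt(4);
            for (int i = 0; i < count; i++) {
                children.add(new Task(parent.depth() + 1));
            }
        }
        return children;
    }
}

// A unit of work; depth records how far down the tree it was spawned.
record Task(int depth) {}

// Processes a work tree locally until capacity is reached, then hands off the rest.
final class OverflowDelegatingWorker {
    private final int localCapacity;
    private final Consumer<Task> standby;

    OverflowDelegatingWorker(int localCapacity, Consumer<Task> standby) {
        this.localCapacity = localCapacity;
        this.standby = standby;
    }

    /** Returns the number of tasks executed locally; processing a task may yield child tasks. */
    int run(Task root, Function<Task, List<Task>> process) {
        Deque<Task> pending = new ArrayDeque<>();
        pending.add(root);
        int local = 0;
        while (!pending.isEmpty()) {
            Task next = pending.poll();
            if (local < localCapacity) {
                local++;
                pending.addAll(process.apply(next)); // results may create more work
            } else {
                standby.accept(next); // over capacity: delegate; its children are not expanded here
            }
        }
        return local;
    }
}
```

A fixed capacity is the simplest overflow rule; a real strategy would more likely react to measured load, but the calling code would still see a single run(...) call.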
This road-map is a living document and will be updated as needed. New items will be added and implemented items will be removed, so keep watching.