Getting Started
Prerequisites: Java 1.8+, ant, bash
Binary: Download the latest stable binary from the releases page.
Source:
git clone https://github.com/MobilityFirst/gigapaxos
cd gigapaxos
ant jar
GigaPaxos is a group-scalable replicated state machine (RSM) system, i.e., it allows applications to easily create and manage a very large number of separate RSMs. Clients can associate each service with its own RSM hosted on a subset of a pool of server machines. Thus, different services may be replicated on different sets of machines in accordance with their fault tolerance or performance requirements. The underlying consensus protocol for each RSM is Paxos; however, it is carefully engineered so as to be extremely lightweight and fast. For example, each RSM uses only a few hundred bytes of memory when it is idle (i.e., not actively processing requests), so commodity machines can participate in millions of different RSMs. When actively processing requests, the message overhead per request is similar to Paxos, but automatic batching of requests and Paxos messages significantly improves throughput by reducing message overhead, especially when the number of different RSM groups is small. The lightweight API for creating and interacting with different RSMs allows applications to “carelessly” create consensus groups on the fly even for small shared objects, e.g., a simple counter or a lightweight stateful servlet. GigaPaxos also has extensive support for reconfiguration, i.e., the membership of different RSMs can be programmatically changed by application developers by writing their own reconfiguration policies.
GigaPaxos has a simple Replicable wrapper API (Replicable.java) that can be written for any "black-box" application in order for GigaPaxos to automatically replicate and reconfigure it in accordance with an application-specified policy. This API requires three methods to be implemented:
boolean execute(Request request)
String checkpoint(String name)
boolean restore(String name, String state)
to respectively execute a request, obtain a state checkpoint, or roll back the state of a service named name. GigaPaxos ensures that applications implementing the Replicable interface are also automatically reconfigurable, i.e., their replica locations are automatically changed in accordance with an application-specified policy. GigaPaxos only requires the Replicable wrapper to be written in Java; the application itself may be in any language. GigaPaxos works with unmodified legacy applications even if the application was originally developed as a standalone application without any fault-tolerance, consistency, or replica-coordination mechanisms.
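For concreteness, below is a minimal sketch of what these three methods look like for a stateless application (along the lines of the NoopPaxosApp used shortly). The class name is illustrative, and a real application would declare implements Replicable and also provide the request-parsing methods inherited from the parent Application interface, which are omitted here.

```java
import edu.umass.cs.gigapaxos.interfaces.Request;

// Illustrative sketch of the three Replicable methods for a stateless app.
// A real implementation would declare "implements Replicable" and also
// implement the request-parsing methods of the parent Application interface.
public class NoopLikeApp {

    // Apply a (totally ordered) request to this replica's state.
    public boolean execute(Request request) {
        return true; // nothing to do for a stateless app; true means "executed"
    }

    // Return a string snapshot of the current state of the service `name`.
    public String checkpoint(String name) {
        return ""; // no state to checkpoint
    }

    // Replace the state of the service `name` with the supplied checkpoint.
    public boolean restore(String name, String state) {
        return true; // nothing to restore
    }
}
```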
In this tutorial, we will learn how to use GigaPaxos' Replicable API on a single physical machine by creating virtual application server replicas listening on different ports on the loopback interface.
Using GigaPaxos requires a gigapaxos.properties file with the constraints described in Configuration Properties. The default gigapaxos.properties file conf/gigapaxos.properties has two sets of entries respectively for "active" replica servers and "reconfigurator" servers. Every line in the former starts with the string active. followed by an alphanumeric string that is the name of that active server (e.g., AR0, AR1, or AR2 below), followed by the separator '=' and a host:port listening address for that server. Likewise, every line in the latter starts with the string reconfigurator. followed by the separator and its host:port information.
The APPLICATION parameter below specifies the name of the application, or equivalently the Replicable wrapper, we will be using. The default is edu.umass.cs.reconfiguration.examples.noopsimple.NoopApp, so uncomment the APPLICATION line below (by removing the leading # if needed) as we will be using this simpler stateless NoopPaxosApp application in this first tutorial. Alternatively, copy over conf/examples/nooppaxos.properties to conf/gigapaxos.properties.
#APPLICATION=edu.umass.cs.gigapaxos.examples.noop.NoopPaxosApp
active.AR0=127.0.0.1:2000
active.AR1=127.0.0.1:2001
active.AR2=127.0.0.1:2002
reconfigurator.RC0=127.0.0.1:3100
reconfigurator.RC1=127.0.0.1:3101
reconfigurator.RC2=127.0.0.1:3102
Reconfigurators form the "control plane" of gigapaxos and are responsible for providing a directory service to direct client requests to active replicas in an application-agnostic manner while active replicas form the application "service plane" and are responsible for executing client requests. At least one active server is needed to use gigapaxos. Three or more active servers are needed in order to tolerate a single active server failure. At least one reconfigurator is needed in order to be able to reconfigure RSMs on active servers, and three or more for tolerating reconfigurator server failures. Both actives and reconfigurators use consensus, so they need at least 2f+1 replicas in order to make progress despite up to f failures.
For the single-machine local test, except for setting APPLICATION to NoopPaxosApp, you can leave the default gigapaxos.properties file as above unchanged with 3 actives and 3 reconfigurators even though we won't really be using the reconfigurators at all.
Run the servers as follows from the top-level directory:
bin/gpServer.sh start all
If any actives or reconfigurators or other servers are already listening on those ports, you will see errors in the log file (/tmp/gigapaxos.log by default). To make sure that no servers are already running, do
bin/gpServer.sh stop all
To start or stop a specific active or reconfigurator, replace all above with the name of an active (e.g., AR0) or reconfigurator (e.g., RC1). To stop and restart a running server, use the restart option. Running gpServer.sh without any options displays a manual of supported options.
Wait until you see all servers ready on the console before starting any clients.
Then, start the default client as follows from the top-level directory:
bin/gpClient.sh
The client will by default use NoopPaxosAppClient if the application is NoopPaxosApp, and will use NoopAppClient if the application is the default NoopApp. As we are using the former app in this tutorial, running the above script will launch NoopPaxosAppClient.
For any application, a default paxos group called <app_name>0 will be automatically created by the servers, so in this example, our (only) paxos group will be called NoopPaxosApp0.
The NoopPaxosAppClient client will simply send a few requests to the servers, wait for the responses, and print them on the console. The client is really simple and illustrates how to send callback-based requests. You can view its source here: NoopPaxosAppClient.java
NoopPaxosApp is a trivial instantiation of Replicable and its source is here: NoopPaxosApp.java
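To give a flavor of the callback-based API, here is a hedged sketch of how a client extending PaxosClientAsync might send a request to the default group and print the response asynchronously. The package location, the constructor, and the particular sendRequest overload used below are assumptions to be checked against the actual NoopPaxosAppClient and PaxosClientAsync sources.

```java
import java.io.IOException;

import edu.umass.cs.gigapaxos.PaxosConfig;
import edu.umass.cs.gigapaxos.interfaces.Request;
import edu.umass.cs.gigapaxos.interfaces.RequestCallback;

// Hedged sketch: the PaxosClientAsync package, its no-arg constructor, and the
// sendRequest(String, String, RequestCallback) overload are assumptions; see
// NoopPaxosAppClient.java and PaxosClientAsync.java for the real API.
public class MyNoopPaxosClient extends edu.umass.cs.gigapaxos.PaxosClientAsync {

    public MyNoopPaxosClient() throws IOException {
        super();
    }

    public static void main(String[] args) throws IOException {
        MyNoopPaxosClient client = new MyNoopPaxosClient();
        // Send a request to the default paxos group <app_name>0 and print the
        // response whenever it arrives.
        client.sendRequest(PaxosConfig.getDefaultServiceName(), "hello_world",
                new RequestCallback() {
                    @Override
                    public void handleResponse(Request response) {
                        System.out.println("Received response: " + response);
                    }
                });
    }
}
```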
You can verify that stopping one of the actives as follows will not affect the system's liveness, however, any requests going to the failed server will of course not get responses. The sendRequest method in NoopPaxosAppClient by default sends each request to a random replica, so roughly a third of the requests will be lost with a single failure.
bin/gpServer.sh stop AR1
Next, do the following by yourself:
- Browse the methods in NoopPaxosAppClient's parent PaxosClientAsync.java and use one of the sendRequest methods therein to direct all requests to a specific active replica.
- Verify that all requests succeed despite a single active failure if requests are sent to an alive server.
- Verify that with two active failures, no requests succeed.
Stateless applications (like NoopPaxosApp above) are generally not good candidates for employing RSM-based fault tolerance, so we will next walk through a stateful application and make it Replicable. This toy application, called StatefulAdderApp (StatefulAdderApp.java), expects incoming requests to contain an integer; the service itself maintains a single variable, total, that is the sum of the integers in all executed requests, and it returns this total in response to each executed request.
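A simplified sketch of the adder's state management (not the actual StatefulAdderApp source) is below; extractInteger() is a hypothetical helper standing in for however the app parses the integer payload out of a request.

```java
import edu.umass.cs.gigapaxos.interfaces.Request;

// Simplified sketch of adder-style state management; see StatefulAdderApp.java
// for the real implementation. extractInteger() is a hypothetical helper.
public class AdderSketch {

    private int total = 0;

    public boolean execute(Request request) {
        total += extractInteger(request); // a payload of 0 acts as a read
        // ... the app would also attach `total` as the response to the request ...
        return true;
    }

    public String checkpoint(String name) {
        // The entire service state is just the running total, serialized as a string.
        return String.valueOf(total);
    }

    public boolean restore(String name, String state) {
        // A null state wipes the service's state; otherwise parse the checkpoint.
        total = (state == null) ? 0 : Integer.parseInt(state);
        return true;
    }

    private int extractInteger(Request request) {
        // Placeholder parsing logic; the real app defines its own request format.
        return Integer.parseInt(request.toString().trim());
    }
}
```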
The configuration properties file used in this example is conf/examples/adder.properties
and, as above, we continue to run all active replica servers on a single physical machine. Start the servers and, after the servers are ready, start the client as follows:
bin/gpServer.sh -DgigapaxosConfig=conf/examples/adder.properties restart all
bin/gpClient.sh -DgigapaxosConfig=conf/examples/adder.properties edu.umass.cs.gigapaxos.examples.adder.StatefulAdderAppClient
The client StatefulAdderAppClient
(StatefulAdderAppClient.java) above sends a random integer in each request. Note that a request with the value 0 is implicitly a read request for the current total. Next, do the following by yourself:
- Change the client to send all requests to a single server; stop one of the other servers; and verify that all requests continue to succeed.
- Restart the stopped server; modify the client to send a read request to that server; and verify that it returns the same total as that most recently returned by the other two servers.
The above examples required the application to serialize all requests into strings and enforced linearizability by requiring every request to be coordinated by the RSM. Applications may often require only some types of requests to be linearized via replica coordination while allowing others to be executed locally or through some other application-specific replica coordination mechanism. The example in this tutorial, LinWritesLocReadsApp
(LinWritesLocReadsApp.java) is similar in spirit to StatefulAdderApp
above but enforces a total order only on write operations while allowing replicas to serve reads locally.
To this end, we have to move away from the simpler PaxosClientAsync client we used above and instead extend the more flexible ReconfigurableAppClientAsync in the corresponding client LinWritesLocReadsAppClient (LinWritesLocReadsAppClient.java) that supports multiple request types and a number of other developer-friendly mechanisms.
Run the LinWritesLocReadsApp servers and the LinWritesLocReadsAppClient client as follows:
bin/gpServer.sh -DgigapaxosConfig=conf/examples/linwrites.properties restart all
bin/gpClient.sh -DgigapaxosConfig=conf/examples/linwrites.properties edu.umass.cs.reconfiguration.examples.linwrites.LinWritesLocReadsAppClient
In order to support multiple request types in this example, we used a class SimpleAppRequest with two request types, COORDINATED_WRITE and LOCAL_READ (see the source here: SimpleAppRequest.java). The method getRequestTypes in the Replicable application LinWritesLocReadsApp tells GigaPaxos what request types the application expects. The method needsCoordination() in the ReplicableRequest interface implemented by SimpleAppRequest tells GigaPaxos whether or not the request should be coordinated.
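The sketch below shows how these two hooks typically fit together; the enum values mirror the two request types named above, but the surrounding class, the package location of IntegerPacketType, and the method bodies are illustrative rather than copied from the actual sources.

```java
import java.util.HashSet;
import java.util.Set;

import edu.umass.cs.nio.interfaces.IntegerPacketType; // package location assumed

// Illustrative sketch only; refer to SimpleAppRequest.java and
// LinWritesLocReadsApp.java for the actual implementations.
public class CoordinationSketch {

    // The two request types used in this example.
    enum PacketType implements IntegerPacketType {
        COORDINATED_WRITE(1), LOCAL_READ(2);
        private final int number;
        PacketType(int number) { this.number = number; }
        public int getInt() { return this.number; }
    }

    // What the app's getRequestTypes() conveys to gigapaxos: the set of
    // request types it knows how to execute.
    public Set<IntegerPacketType> getRequestTypes() {
        Set<IntegerPacketType> types = new HashSet<>();
        types.add(PacketType.COORDINATED_WRITE);
        types.add(PacketType.LOCAL_READ);
        return types;
    }

    // What a request's needsCoordination() conveys: writes go through paxos,
    // reads are served locally by the receiving replica.
    public boolean needsCoordination(PacketType type) {
        return type == PacketType.COORDINATED_WRITE;
    }
}
```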
Next, do the following by yourself:
- Verify first that local reads above are faster than coordinated writes, and then change SimpleAppRequest to make reads also coordinated and verify that reads become as slow as writes.
- Stop one active server, say, AR1; run the provided LinWritesLocReadsAppClient client again; start back AR1; and use a modified version of LinWritesLocReadsAppClient to send a single read request to AR1. Does that read request see the result of the most recent write coordinated by the other two servers while it was stopped? Explain your observation.
In this tutorial, we will exercise GigaPaxos' abilities to create many different RSMs on the fly as well as to reconfigure them as desired.
For this tutorial, we will use the default conf/gigapaxos.properties configuration file that runs a default application called NoopApp. If you changed the APPLICATION or made any other changes to the default conf/gigapaxos.properties in previous tutorials, you need to undo them (or restore the file from conf/examples/noop.properties). If so, it is best to use a fresh gigapaxos install, as the APPLICATION cannot be changed midway in an existing gigapaxos directory; doing so will lead to errors as gigapaxos will try to feed requests to the application that the application will fail to parse. Alternatively, you can first wipe out all logs as follows:
bin/gpServer.sh clear all
Or you can manually remove the logs as follows from their default location (or from their non-default locations if you set GIGAPAXOS_DATA_DIR in gigapaxos.properties):
rm -rf ./paxos_logs ./reconfiguration_DB
Next, run the servers and clients exactly as before. You will see console output showing that NoopAppClient creates a few names and successfully sends a few requests to them. A Reconfigurable application must implement slightly different semantics from just a Replicable application; you can browse through the source of NoopApp and NoopAppClient and the documentation therein.
Step 1: Repeat the same failure scenario as above and verify that the actives exhibit the same liveness properties as before.
Step 2: Set the property RECONFIGURE_IN_PLACE=true in gigapaxos.properties in order to enable trivial reconfiguration, which means reconfiguring a replica group to the same replica group while going through all of the motions of the three-phase reconfiguration protocol (i.e., STOP the previous epoch at the old replica group, START the new epoch in the new replica group after having them fetch the final epoch state from the old epoch's replica group, and finally having the old replica group DROP all state from the previous epoch).
The default reconfiguration policy trivially reconfigures the replica group after every request. This policy is clearly overkill, as the overhead of reconfiguration will typically be much higher than that of processing a single application request (but it allows us to potentially create a new replica near every new location from which even a single client request originates). Our goal here is just to test a proof of concept and understand how to implement other, more practical policies.
Step 3: Run NoopAppClient by simply invoking the client command like before:
bin/gpClient.sh
NoopApp should print console output upon every reconfiguration: its restore method is first called with a null argument to wipe out the state corresponding to the current epoch, and is called again immediately after to initialize it with the state corresponding to the next epoch for the service name being reconfigured.
Step 4: Inspect the default reconfiguration policy in DemandProfile.java and the abstract class AbstractDemandProfile.java (doc) that any application-specific reconfiguration policy is expected to extend in order to achieve its reconfiguration goals.
Change the default reconfiguration policy in DemandProfile so that the default service name NoopApp0 is reconfigured less often. For example, you can set MIN_REQUESTS_BEFORE_RECONFIGURATION and/or MIN_RECONFIGURATION_INTERVAL to higher values. There are two ways to do this: (i) the quick and dirty way is to change DemandProfile.java directly and recompile gigapaxos from source; (ii) the cleaner and recommended way is to write your own policy implementation, say MyDemandProfile, that either extends DemandProfile or extends AbstractDemandProfile directly and specify it in gigapaxos.properties by setting the DEMAND_PROFILE_TYPE property by uncommenting the corresponding line and replacing the value with the canonical class name of your demand profile implementation as shown below. With this latter approach, you just need the gigapaxos binaries and don't have to recompile it from source. You do need to compile and generate the class file(s) for your policy implementation.
#DEMAND_PROFILE_TYPE=edu.umass.cs.reconfiguration.reconfigurationutils.DemandProfile
If all goes well, with the above changes, you should see NoopApp reconfiguring itself less frequently as per the specification in your reconfiguration policy!
Troubleshooting tips: If you run into errors:
(1) Make sure the canonical class name of your policy class is correctly specified in gigapaxos.properties and the class exists in your classpath. If the simple policy change above works as expected by directly modifying the default DemandProfile implementation and recompiling gigapaxos from source, but with your own demand profile implementation you get ClassNotFoundException or other runtime errors, the most likely reason is that the JVM can not find your policy class.
(2) Make sure that all three constructors of DemandProfile that respectively take a DemandProfile, String, and JSONObject are overridden with the corresponding default implementation that simply invokes super(arg); all three constructors are necessary for gigapaxos' reflection-based demand profile instance creation to work correctly.
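For reference, a skeleton that satisfies the constructor requirement might look as follows; your actual reconfiguration logic would go into overrides of the relevant DemandProfile/AbstractDemandProfile methods, which are omitted here, and whether the JSONObject constructor needs a throws clause depends on the superclass signature.

```java
import org.json.JSONException;
import org.json.JSONObject;

import edu.umass.cs.reconfiguration.reconfigurationutils.DemandProfile;

// Skeleton only: the three pass-through constructors below are needed for
// gigapaxos' reflection-based instantiation of the demand profile; the actual
// reconfiguration decisions would be made by overriding DemandProfile methods.
public class MyDemandProfile extends DemandProfile {

    public MyDemandProfile(DemandProfile dp) {
        super(dp);
    }

    public MyDemandProfile(String name) {
        super(name);
    }

    public MyDemandProfile(JSONObject json) throws JSONException {
        super(json);
    }
}
```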
Step 5: Inspect the code in NoopAppClient to see how it creates a service name by sending a CREATE_SERVICE_NAME request. A service name corresponds to an RSM, but note that there is no API to specify the set of active replicas that should manage the RSM for the name being created. This is because gigapaxos randomly chooses the initial replica group for each service at creation time. Applications are expected to reconfigure the replica group as needed after creation by using a policy class as described above.
Once a service has been created, application requests can be sent to it also using one of the sendRequest methods as exemplified in NoopAppClient.
Deleting a service is as simple as issuing a DELETE_SERVICE_NAME request using the same sendRequest API as CREATE_SERVICE_NAME above.
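As a rough sketch of this request flow (the packet constructors, the sendRequest overload, and the callback type below are assumptions to be verified against the actual NoopAppClient source):

```java
import java.io.IOException;

import edu.umass.cs.gigapaxos.interfaces.Request;
import edu.umass.cs.gigapaxos.interfaces.RequestCallback;
import edu.umass.cs.reconfiguration.ReconfigurableAppClientAsync;
import edu.umass.cs.reconfiguration.reconfigurationpackets.CreateServiceName;
import edu.umass.cs.reconfiguration.reconfigurationpackets.DeleteServiceName;

// Hedged sketch: assumes a ReconfigurableAppClientAsync subclass like
// NoopAppClient and that sendRequest accepts these reconfiguration packets
// with a RequestCallback; check the actual API before relying on this.
public class CreateDeleteSketch {

    @SuppressWarnings("rawtypes") // raw client type used for brevity
    static void createAndDelete(ReconfigurableAppClientAsync client, String name)
            throws IOException {
        // CREATE_SERVICE_NAME: ask the reconfigurators to create an RSM for
        // `name` with some initial state; the replica group is chosen randomly.
        client.sendRequest(new CreateServiceName(name, "initial_state"),
                new RequestCallback() {
                    public void handleResponse(Request response) {
                        System.out.println("created: " + response);
                    }
                });

        // ... application requests to `name` would be sent here via sendRequest ...

        // DELETE_SERVICE_NAME: delete the service and all of its replicated state.
        client.sendRequest(new DeleteServiceName(name),
                new RequestCallback() {
                    public void handleResponse(Request response) {
                        System.out.println("deleted: " + response);
                    }
                });
    }
}
```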
Note that unlike NoopPaxosAppClient above, NoopAppClient as well as the corresponding app, NoopApp, use a different request type called AppRequest as opposed to the default RequestPacket packet type. Reconfigurable gigapaxos applications can define their own request types as needed for different kinds of requests. The set of request types that an application processes is conveyed to gigapaxos via the Replicable.getRequestTypes() method that the application needs to implement.
Applications can also specify whether a request should be paxos-coordinated or served locally by an active replica. By default, all requests are executed locally unless the request is of type ReplicableRequest and its needsCoordination method returns true (as is the case for AppRequest by default).
Verify that you can create, send application requests to, and delete a new service using the methods above.
The relevant classes for Tutorial 4 above are listed below for convenience: