
Example 1 with AppEvent

Use of com.tencent.angel.master.app.AppEvent in project angel by Tencent.

The class PSAgentManager, method psAgentKilled.

@SuppressWarnings("unchecked")
private void psAgentKilled(PSAgentManagerEvent event) {
    killedPSAgentMap.put(event.getPsAgentId(), psAgentMap.get(event.getPsAgentId()));
    context.getEventHandler().handle(new AppEvent(context.getApplicationId(), AppEventType.KILL));
}
Also used : AppEvent(com.tencent.angel.master.app.AppEvent)
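The pattern in Example 1 (record the killed agent, then escalate to an application-level KILL event through an event handler) can be sketched in isolation. This is only a minimal illustration: `SketchEventType`, `SketchAppEvent`, `SketchEventHandler`, and `SketchAgentManager` are hypothetical stand-ins invented here and are not the Angel classes, whose real constructors and fields may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the Angel API.
enum SketchEventType { KILL, COMMIT }

final class SketchAppEvent {
    private final String appId;
    private final SketchEventType type;

    SketchAppEvent(String appId, SketchEventType type) {
        this.appId = appId;
        this.type = type;
    }

    String getAppId() { return appId; }
    SketchEventType getType() { return type; }
}

// Single-method handler, so a lambda or method reference can implement it.
interface SketchEventHandler {
    void handle(SketchAppEvent event);
}

final class SketchAgentManager {
    private final Map<Integer, String> agents = new HashMap<>();
    private final Map<Integer, String> killedAgents = new HashMap<>();
    private final String appId;
    private final SketchEventHandler handler;

    SketchAgentManager(String appId, SketchEventHandler handler) {
        this.appId = appId;
        this.handler = handler;
    }

    void register(int agentId, String description) {
        agents.put(agentId, description);
    }

    // Mirrors the shape of psAgentKilled: remember the killed agent,
    // then notify the application-level handler with a KILL event.
    void agentKilled(int agentId) {
        killedAgents.put(agentId, agents.get(agentId));
        handler.handle(new SketchAppEvent(appId, SketchEventType.KILL));
    }

    Map<Integer, String> getKilledAgents() {
        return killedAgents;
    }
}
```

In a test, the handler can simply be a list collector, for example `new SketchAgentManager("app_1", events::add)` where `events` is a `List<SketchAppEvent>`, which makes the dispatched events easy to inspect.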

Example 2 with AppEvent

Use of com.tencent.angel.master.app.AppEvent in project angel by Tencent.

The class PSManagerTest, method testPSDone.

@SuppressWarnings("unchecked")
@Test
public void testPSDone() throws Exception {
    try {
        AngelApplicationMaster angelAppMaster = LocalClusterContext.get().getMaster().getAppMaster();
        ParameterServer ps = LocalClusterContext.get().getPS(psAttempt0Id).getPS();
        Location masterLoc = ps.getMasterLocation();
        TConnection connection = TConnectionManager.getConnection(ps.getConf());
        MasterProtocol master = connection.getMasterService(masterLoc.getIp(), masterLoc.getPort());
        WorkerDoneRequest workerRequest = WorkerDoneRequest.newBuilder().setWorkerAttemptId(ProtobufUtil.convertToIdProto(worker0Attempt0Id)).build();
        WorkerDoneResponse workerResponse = master.workerDone(null, workerRequest);
        assertEquals(workerResponse.getCommand(), WorkerCommandProto.W_SUCCESS);
        Thread.sleep(5000);
        angelAppMaster.getAppContext().getEventHandler().handle(new AppEvent(AppEventType.COMMIT));
        PSDoneRequest request = PSDoneRequest.newBuilder().setPsAttemptId(ProtobufUtil.convertToIdProto(psAttempt0Id)).build();
        master.psDone(null, request);
        Thread.sleep(5000);
        ParameterServerManager psManager = angelAppMaster.getAppContext().getParameterServerManager();
        AMParameterServer amPs = psManager.getParameterServer(psId);
        PSAttempt psAttempt = amPs.getPSAttempt(psAttempt0Id);
        assertEquals(psAttempt.getInternalState(), PSAttemptStateInternal.SUCCESS);
        assertTrue(amPs.getState() == AMParameterServerState.SUCCESS);
        assertEquals(amPs.getNextAttemptNumber(), 1);
        assertNull(amPs.getRunningAttemptId());
        assertEquals(amPs.getSuccessAttemptId(), psAttempt0Id);
        assertEquals(amPs.getPSAttempts().size(), 1);
    } catch (Exception x) {
        LOG.error("run testPSDone failed ", x);
        throw x;
    }
}
Also used : WorkerDoneRequest(com.tencent.angel.protobuf.generated.WorkerMasterServiceProtos.WorkerDoneRequest) AMParameterServer(com.tencent.angel.master.ps.ps.AMParameterServer) AngelException(com.tencent.angel.exception.AngelException) ParameterServer(com.tencent.angel.ps.impl.ParameterServer) AppEvent(com.tencent.angel.master.app.AppEvent) TConnection(com.tencent.angel.ipc.TConnection) PSAttempt(com.tencent.angel.master.ps.attempt.PSAttempt) ParameterServerManager(com.tencent.angel.master.ps.ParameterServerManager) WorkerDoneResponse(com.tencent.angel.protobuf.generated.WorkerMasterServiceProtos.WorkerDoneResponse) Location(com.tencent.angel.common.location.Location) Test(org.junit.Test)

Example 3 with AppEvent

Use of com.tencent.angel.master.app.AppEvent in project angel by Tencent.

The class MasterService, method save.

/**
 * Save model to files.
 *
 * @param controller rpc controller of protobuf
 * @param request save request that contains all matrices need save
 * @throws ServiceException some matrices do not exist or save operation is interrupted
 */
@SuppressWarnings("unchecked")
@Override
public SaveResponse save(RpcController controller, SaveRequest request) throws ServiceException {
    List<String> needSaveMatrices = request.getMatrixNamesList();
    List<Integer> matrixIds = new ArrayList<Integer>(needSaveMatrices.size());
    AMMatrixMetaManager matrixMetaManager = context.getMatrixMetaManager();
    int size = needSaveMatrices.size();
    for (int i = 0; i < size; i++) {
        MatrixMeta matrixMeta = matrixMetaManager.getMatrix(needSaveMatrices.get(i));
        if (matrixMeta == null) {
            throw new ServiceException("matrix " + needSaveMatrices.get(i) + " does not exist");
        }
        LOG.info("Need save matrix " + matrixMeta.getName());
        matrixIds.add(matrixMeta.getId());
    }
    context.getEventHandler().handle(new CommitEvent(matrixIds));
    context.getEventHandler().handle(new AppEvent(AppEventType.COMMIT));
    return SaveResponse.newBuilder().build();
}
Also used : AppEvent(com.tencent.angel.master.app.AppEvent) ServiceException(com.google.protobuf.ServiceException) AMMatrixMetaManager(com.tencent.angel.master.matrixmeta.AMMatrixMetaManager) MatrixMeta(com.tencent.angel.ml.matrix.MatrixMeta) CommitEvent(com.tencent.angel.master.ps.CommitEvent)
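Example 3 resolves every requested matrix name to an id before dispatching any event, so an unknown name aborts the request before the commit is triggered. A minimal sketch of that validate-first step follows, under stated assumptions: `SaveSketch`, `resolveMatrixIds`, and the `Map`-based registry are hypothetical names used for illustration, and `IllegalArgumentException` stands in for the `ServiceException` thrown in the original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; NOT part of Angel.
final class SaveSketch {
    private SaveSketch() {}

    // Resolve every name to an id, failing on the first unknown name.
    // Nothing is returned (and so nothing can be dispatched) unless
    // all names are known.
    static List<Integer> resolveMatrixIds(List<String> names,
                                          Map<String, Integer> registry) {
        List<Integer> ids = new ArrayList<>(names.size());
        for (String name : names) {
            Integer id = registry.get(name);
            if (id == null) {
                throw new IllegalArgumentException(
                    "matrix " + name + " does not exist");
            }
            ids.add(id);
        }
        return ids;
    }
}
```

A caller would dispatch its commit events only after `resolveMatrixIds` returns normally, which matches the order of operations in the `save` method above.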

Aggregations

AppEvent (com.tencent.angel.master.app.AppEvent): 3
ServiceException (com.google.protobuf.ServiceException): 1
Location (com.tencent.angel.common.location.Location): 1
AngelException (com.tencent.angel.exception.AngelException): 1
TConnection (com.tencent.angel.ipc.TConnection): 1
AMMatrixMetaManager (com.tencent.angel.master.matrixmeta.AMMatrixMetaManager): 1
CommitEvent (com.tencent.angel.master.ps.CommitEvent): 1
ParameterServerManager (com.tencent.angel.master.ps.ParameterServerManager): 1
PSAttempt (com.tencent.angel.master.ps.attempt.PSAttempt): 1
AMParameterServer (com.tencent.angel.master.ps.ps.AMParameterServer): 1
MatrixMeta (com.tencent.angel.ml.matrix.MatrixMeta): 1
WorkerDoneRequest (com.tencent.angel.protobuf.generated.WorkerMasterServiceProtos.WorkerDoneRequest): 1
WorkerDoneResponse (com.tencent.angel.protobuf.generated.WorkerMasterServiceProtos.WorkerDoneResponse): 1
ParameterServer (com.tencent.angel.ps.impl.ParameterServer): 1
Test (org.junit.Test): 1