Example 1 with TaskStoreException

Use of cz.metacentrum.perun.taskslib.exceptions.TaskStoreException in project perun by CESNET.

The class TaskStoreImpl, method addTask.

@Override
public Task addTask(Task task) throws TaskStoreException {
    if (task.getService() == null) {
        log.error("Tried to insert Task {} with no Service", task);
        throw new IllegalArgumentException("Tasks Service not set.");
    } else if (task.getFacility() == null) {
        log.error("Tried to insert Task {} with no Facility", task);
        throw new IllegalArgumentException("Tasks Facility not set.");
    }
    Task idAdded;
    Task otherAdded;
    synchronized (this) {
        idAdded = tasksById.put(task.getId(), task);
        otherAdded = tasksByFacilityAndService.put(new Pair<>(task.getFacility().getId(), task.getService().getId()), task);
    }
    if (idAdded != otherAdded) {
        log.error("Task returned from both Maps after insert differ. taskById {}, taskByFacilityAndService {}", idAdded, otherAdded);
        throw new TaskStoreException("Tasks returned after insert into both Maps differ.");
    } else {
        return idAdded;
    }
}
Also used : Task(cz.metacentrum.perun.taskslib.model.Task) TaskStoreException(cz.metacentrum.perun.taskslib.exceptions.TaskStoreException) Pair(cz.metacentrum.perun.core.api.Pair)

Example 2 with TaskStoreException

Use of cz.metacentrum.perun.taskslib.exceptions.TaskStoreException in project perun by CESNET.

The class TaskStoreImpl, method removeTask.

@Override
public Task removeTask(Task task) throws TaskStoreException {
    Task idRemoved;
    Task otherRemoved;
    synchronized (this) {
        idRemoved = tasksById.remove(task.getId());
        otherRemoved = tasksByFacilityAndService.remove(new Pair<>(task.getFacility().getId(), task.getService().getId()));
    }
    if (idRemoved != otherRemoved) {
        log.error("Inconsistent state occurred after removing Task {} from TaskStore", task);
        throw new TaskStoreException("Unable to remove Task properly.");
    }
    return idRemoved;
}
Also used : Task(cz.metacentrum.perun.taskslib.model.Task) TaskStoreException(cz.metacentrum.perun.taskslib.exceptions.TaskStoreException) Pair(cz.metacentrum.perun.core.api.Pair)
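
Examples 1 and 2 both keep two maps in step and treat any disagreement between them as an error. The following is a minimal sketch of that pattern, not the perun implementation: `TwoIndexStore`, `Item`, and the use of `IllegalStateException` are hypothetical stand-ins for `TaskStoreImpl`, `Task`, and `TaskStoreException`, and `SimpleImmutableEntry` is assumed here as a substitute for `Pair`.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified two-index store; an illustration only, not perun's TaskStoreImpl.
final class TwoIndexStore {

    // Hypothetical item carrying an id plus the two parts of the composite key.
    static final class Item {
        final int id;
        final int facilityId;
        final int serviceId;

        Item(int id, int facilityId, int serviceId) {
            this.id = id;
            this.facilityId = facilityId;
            this.serviceId = serviceId;
        }
    }

    private final Map<Integer, Item> byId = new HashMap<>();
    private final Map<Map.Entry<Integer, Integer>, Item> byKey = new HashMap<>();

    private static Map.Entry<Integer, Integer> key(Item item) {
        // SimpleImmutableEntry defines equals/hashCode over both parts, so it works as a map key.
        return new SimpleImmutableEntry<>(item.facilityId, item.serviceId);
    }

    // Puts the item into both maps under one lock and returns the item it replaced (or null).
    synchronized Item add(Item item) {
        Item previousById = byId.put(item.id, item);
        Item previousByKey = byKey.put(key(item), item);
        if (previousById != previousByKey) {
            // Reference comparison on purpose: both maps must have held the very same object.
            throw new IllegalStateException("Indexes disagree after insert");
        }
        return previousById;
    }

    // Removes the item from both maps under one lock and returns what was removed (or null).
    synchronized Item remove(Item item) {
        Item removedById = byId.remove(item.id);
        Item removedByKey = byKey.remove(key(item));
        if (removedById != removedByKey) {
            throw new IllegalStateException("Indexes disagree after remove");
        }
        return removedById;
    }

    synchronized int size() {
        return byId.size();
    }
}
```

As in the examples above, the consistency check runs after both maps have already been modified, so a thrown exception reports an inconsistency rather than preventing one.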

Example 3 with TaskStoreException

Use of cz.metacentrum.perun.taskslib.exceptions.TaskStoreException in project perun by CESNET.

The class PropagationMaintainerImpl, method endStuckTasks.

// ----- methods ------------------------------
public void endStuckTasks() {
    // handle stuck GEN tasks
    for (Map.Entry<Future<Task>, Task> generatingTask : generatingTasks.getRunningTasks().entrySet()) {
        Task task = generatingTask.getValue();
        Future<Task> future = generatingTask.getKey();
        LocalDateTime startTime = task.getGenStartTime();
        long howManyMinutesAgo = 0;
        if (startTime != null) {
            howManyMinutesAgo = ChronoUnit.MINUTES.between(startTime, LocalDateTime.now());
        }
        if (startTime == null) {
            // by implementation can't happen, we set time before adding to the generatingTasksMap
            log.error("[{}] Task in generatingTasks has no start time. Shouldn't happen by implementation.", task.getId());
        } else if (howManyMinutesAgo >= rescheduleTime) {
            if (!future.isCancelled()) {
                // Cancel running GEN Task - we expect that it will be picked by GenCollector
                // and removed from the Engine.
                log.debug("[{}] Cancelling stuck generating Future<Task>.", task.getId());
                future.cancel(true);
            } else {
                // We cancelled Task in previous run, but it wasn't picked by GenCollector
                // GenCollector probably doesn't run -> abort task manually
                log.debug("[{}] Cancelled stuck generating Future<Task> was not picked by GenCollector, forcefully removing from Engine.", task.getId());
                // to release semaphore
                generatingTasks.removeStuckTask(future);
                abortTask(task, TaskStatus.GENERROR);
            }
        }
    }
    // handle stuck SEND tasks
    for (Map.Entry<Future<SendTask>, SendTask> sendingSendTask : sendingSendTasks.getRunningTasks().entrySet()) {
        SendTask sendTask = sendingSendTask.getValue();
        Future<SendTask> future = sendingSendTask.getKey();
        Task task = sendTask.getTask();
        Date startTime = sendTask.getStartTime();
        int howManyMinutesAgo = 0;
        if (startTime != null) {
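            // divide in long arithmetic first; casting the millisecond difference to int overflows after about 24.8 days
            howManyMinutesAgo = (int) ((System.currentTimeMillis() - startTime.getTime()) / 1000 / 60);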
        }
        if (startTime == null) {
            // by implementation can't happen, we set time before adding to sendingSendTasks
            log.error("[{}] SendTask in sendingSendTask has no start time for Destination {}. Shouldn't happen by implementation.", task.getId(), sendTask.getDestination());
        } else if (howManyMinutesAgo >= rescheduleTime) {
            sendTask.setStatus(SendTaskStatus.ERROR);
            if (!future.isCancelled()) {
                // Cancel running Send Task - we expect that it will be picked by SendCollector
                // and removed from the Engine if all SendTasks are done
                log.debug("[{}] Cancelling stuck sending Future<SendTask> for Destination: {}.", task.getId(), sendTask.getDestination());
                future.cancel(true);
            } else {
                log.debug("[{}] Cancelled stuck sending Future<SendTask> for Destination: {} was not picked by SendCollector, forcefully removing from Engine.", task.getId(), sendTask.getDestination());
                // We cancelled Task in previous run, but it wasn't picked by SendCollector
                // SendCollector probably doesn't run
                // to release semaphore
                sendingSendTasks.removeStuckTask(future);
                // make sure Task is switched to SENDERROR
                task.setSendEndTime(LocalDateTime.now());
                task.setStatus(TaskStatus.SENDERROR);
                // report result
                TaskResult taskResult = null;
                try {
                    taskResult = schedulingPool.createTaskResult(task.getId(), sendTask.getDestination().getId(), sendTask.getStderr(), sendTask.getStdout(), sendTask.getReturnCode(), task.getService());
                    jmsQueueManager.reportTaskResult(taskResult);
                } catch (JMSException | InterruptedException e) {
                    log.error("[{}] Error trying to reportTaskResult {} of {} to Dispatcher: {}", task.getId(), taskResult, task, e);
                }
                // lower counter for stuck SendTask if count <= 1 remove from Engine
                try {
                    schedulingPool.decreaseSendTaskCount(task, 1);
                } catch (TaskStoreException e) {
                    log.error("[{}] Task {} could not be removed from SchedulingPool: {}", task.getId(), task, e);
                }
            }
        }
    }
    // check all known Tasks
    Collection<Task> allTasks = schedulingPool.getAllTasks();
    if (allTasks == null) {
        return;
    }
    for (Task task : allTasks) {
        switch(task.getStatus()) {
            case WAITING:
                /*
                 Such Tasks should never be in Engine (only in Dispatcher), since when they are sent to Engine,
                 status is set to PLANNED in both components. If they are already present in SchedulingPool
                 (Engine), then adding of new (same) Task is skipped and previous processing is finished first.
                 => just remove such nonsense from SchedulingPool and don't spam Dispatcher
                 */
                try {
                    // TODO - can such Task be in any structure like generating/sending/newTasks/generatedTasks ?
                    schedulingPool.removeTask(task.getId());
                    log.warn("[{}] Task in WAITING state shouldn't be in Engine at all, silently removing from SchedulingPool.", task.getId());
                } catch (TaskStoreException ex) {
                    log.error("[{}] Failed during removal of WAITING Task from SchedulingPool. Such Task shouldn't be in Engine at all: {}", task.getId(), ex);
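                }
                // without this break, control falls through to PLANNED, which may re-add the Task just removed
                break;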
            case PLANNED:
                /*
                 Check Tasks that should be put into the scheduling pool by EventProcessorImpl and taken by GenPlanner.
                 Tasks might end up like that because adding to the BlockingDeque is limited by Integer#MAX_VALUE
                 (while EventProcessorImpl adds the Task to the scheduling pool).
                 Also, if the GenPlanner implementation fails, it might take a Task from the BlockingDeque without
                 changing its status or putting it among generatingTasks.
                 */
                BlockingDeque<Task> newTasks = schedulingPool.getNewTasksQueue();
                if (!newTasks.contains(task)) {
                    try {
                        log.debug("[{}] Re-adding PLANNED Task back to pool and newTasks queue. Probably GenPlanner failed.", task.getId());
                        schedulingPool.addTask(task);
                    } catch (TaskStoreException e) {
                        log.error("Could not save Task {} into Engine SchedulingPool because of {}, setting to ERROR", task, e);
                        abortTask(task, TaskStatus.ERROR);
                    }
                }
                break;
            case GENERATING:
                /*
                 This is basically the same check as for the GENERATING Tasks above,
                 but now for Tasks missing in "generatingTasks".
                 !! We can't abort GENERATING Tasks with startTime=NULL here,
                 because they are waiting to be started at genCompletionService#blockingSubmit() !!
                 */
                LocalDateTime startTime = task.getGenStartTime();
                long howManyMinutesAgo = 0;
                if (startTime != null) {
                    howManyMinutesAgo = ChronoUnit.MINUTES.between(startTime, LocalDateTime.now());
                }
                // somebody probably wrongly manipulated the structure
                if (howManyMinutesAgo >= rescheduleTime && !generatingTasks.getRunningTasks().values().contains(task)) {
                    // probably GenCollector failed to pick task -> abort
                    abortTask(task, TaskStatus.GENERROR);
                }
                break;
            case GENERROR:
            case GENERATED:
                /*
                 Check Tasks which should be processed by GenCollector and taken by SendPlanner, or reported as GENERROR to Dispatcher.
                 A Task must have endTime set by GenWorker, otherwise it failed completely and should be reported as an error.
                 If either GenCollector or SendPlanner fails to process a generated Task, it is missing from generatedTasksQueue.
                 */
                LocalDateTime genEndTime = task.getGenEndTime();
                howManyMinutesAgo = 0;
                if (genEndTime != null) {
                    howManyMinutesAgo = ChronoUnit.MINUTES.between(genEndTime, LocalDateTime.now());
                }
                // If too much time has passed for the Task and it's not present in generatedTasksQueue, something is broken
                if ((genEndTime == null || howManyMinutesAgo >= rescheduleTime) && !schedulingPool.getGeneratedTasksQueue().contains(task)) {
                    abortTask(task, TaskStatus.GENERROR);
                }
                break;
            case SENDING:
                // TODO   since Task is switched to SENDING before blockingSubmit() of any SendWorker.
                break;
            case WARNING:
            case SENDERROR:
                LocalDateTime endTime = task.getSendEndTime();
                howManyMinutesAgo = 0;
                if (endTime != null) {
                    howManyMinutesAgo = ChronoUnit.MINUTES.between(endTime, LocalDateTime.now());
                }
                // If too much time has passed something is broken
                if (endTime == null || howManyMinutesAgo >= rescheduleTime) {
                    abortTask(task, TaskStatus.SENDERROR);
                }
                break;
            case ERROR:
                break;
            case DONE:
            default:
                // unknown state
                log.debug("[{}] Failing to default, status was: {}", task.getId(), task.getStatus());
                abortTask(task, TaskStatus.ERROR);
        }
    }
}
Also used : LocalDateTime(java.time.LocalDateTime) Task(cz.metacentrum.perun.taskslib.model.Task) SendTask(cz.metacentrum.perun.taskslib.model.SendTask) TaskStoreException(cz.metacentrum.perun.taskslib.exceptions.TaskStoreException) Date(java.util.Date) Future(java.util.concurrent.Future) TaskResult(cz.metacentrum.perun.taskslib.model.TaskResult) Map(java.util.Map)
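
Example 3 measures elapsed minutes in two ways: with `ChronoUnit.MINUTES.between` for `LocalDateTime` values and with manual millisecond arithmetic for `java.util.Date`. In the manual form, the position of the `int` cast matters, because in Java a cast binds more tightly than division. The sketch below illustrates the difference; the class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo of cast precedence when converting elapsed milliseconds to minutes.
final class ElapsedMinutes {

    // The cast applies to the long difference BEFORE the divisions, so large differences overflow.
    static int castThenDivide(long startMs, long nowMs) {
        return (int) (nowMs - startMs) / 1000 / 60;
    }

    // All arithmetic stays in long; TimeUnit performs the millisecond-to-minute conversion.
    static long divideInLong(long startMs, long nowMs) {
        return TimeUnit.MILLISECONDS.toMinutes(nowMs - startMs);
    }

    public static void main(String[] args) {
        long thirtyDaysMs = TimeUnit.DAYS.toMillis(30);
        System.out.println(castThenDivide(0L, thirtyDaysMs)); // a negative number: the int cast overflowed
        System.out.println(divideInLong(0L, thirtyDaysMs));   // 43200
    }
}
```

For short intervals both variants agree; they diverge once the difference exceeds `Integer.MAX_VALUE` milliseconds, which is roughly 24.8 days.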

Example 4 with TaskStoreException

Use of cz.metacentrum.perun.taskslib.exceptions.TaskStoreException in project perun by CESNET.

The class GenCollector, method run.

@Override
public void run() {
    BlockingDeque<Task> generatedTasks = schedulingPool.getGeneratedTasksQueue();
    while (!shouldStop()) {
        try {
            Task task = genCompletionService.blockingTake();
            // set ok status immediately
            task.setStatus(Task.TaskStatus.GENERATED);
            // report to Dispatcher
            try {
                jmsQueueManager.reportTaskStatus(task.getId(), task.getStatus(), task.getGenEndTime().atZone(ZoneId.systemDefault()).toInstant().toEpochMilli());
            } catch (JMSException e) {
                jmsErrorLog(task.getId(), task.getStatus());
            }
            // push Task to generated
            if (task.isPropagationForced()) {
                generatedTasks.putFirst(task);
            } else {
                generatedTasks.put(task);
            }
        } catch (InterruptedException e) {
            String errorStr = "Thread collecting generated Tasks was interrupted.";
            log.error(errorStr);
            throw new RuntimeException(errorStr, e);
        } catch (TaskExecutionException e) {
            // GEN Task failed
            Task task = e.getTask();
            if (task == null) {
                log.error("GEN Task failed, but TaskExecutionException doesn't contain a Task object! Tasks will be cleaned by PropagationMaintainer#endStuckTasks()");
            } else {
                task.setStatus(GENERROR);
                for (Destination dest : task.getDestinations()) {
                    try {
                        jmsQueueManager.reportTaskResult(schedulingPool.createTaskResult(task.getId(), dest.getId(), e.getStderr(), e.getStdout(), e.getReturnCode(), task.getService()));
                    } catch (JMSException | InterruptedException ex) {
                        log.error("[{}] Error trying to reportTaskResult for Destination: {} to Dispatcher: {}", task.getId(), dest, ex);
                    }
                }
                try {
                    jmsQueueManager.reportTaskStatus(task.getId(), GENERROR, System.currentTimeMillis());
                } catch (JMSException | InterruptedException e1) {
                    jmsErrorLog(task.getId(), task.getStatus());
                }
                try {
                    schedulingPool.removeTask(task.getId());
                } catch (TaskStoreException e1) {
                    log.error("[{}] Could not remove error GEN Task from SchedulingPool: {}", task.getId(), e1);
                }
            }
        } catch (Throwable ex) {
            log.error("Unexpected exception in GenCollector thread. Stuck Tasks will be cleaned by PropagationMaintainer#endStuckTasks() later.", ex);
        }
    }
}
Also used : TaskExecutionException(cz.metacentrum.perun.engine.exceptions.TaskExecutionException) Destination(cz.metacentrum.perun.core.api.Destination) Task(cz.metacentrum.perun.taskslib.model.Task) JMSException(javax.jms.JMSException) TaskStoreException(cz.metacentrum.perun.taskslib.exceptions.TaskStoreException)
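
Example 4 uses `putFirst` for Tasks with forced propagation and `put` for all others, so forced Tasks are consumed before anything already waiting. A minimal sketch of that ordering, using plain strings instead of `Task` objects; `ForcedFirstQueue` and its methods are hypothetical names, and `LinkedBlockingDeque` is assumed as the queue implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical helper illustrating head-vs-tail insertion on a BlockingDeque.
final class ForcedFirstQueue {

    // Forced items go to the head (putFirst); everything else goes to the tail (put == putLast).
    static <T> void enqueue(BlockingDeque<T> queue, T item, boolean forced) throws InterruptedException {
        if (forced) {
            queue.putFirst(item);
        } else {
            queue.put(item);
        }
    }

    // Returns the order in which a consumer taking from the head would see the items.
    static List<String> consumptionOrder() {
        BlockingDeque<String> queue = new LinkedBlockingDeque<>();
        try {
            enqueue(queue, "normal-1", false);
            enqueue(queue, "normal-2", false);
            enqueue(queue, "forced", true);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            throw new IllegalStateException("Interrupted while enqueueing", e);
        }
        List<String> order = new ArrayList<>();
        queue.drainTo(order); // drains from the head, i.e. in take() order
        return order;
    }

    public static void main(String[] args) {
        System.out.println(consumptionOrder()); // [forced, normal-1, normal-2]
    }
}
```

Note that several forced items enqueued in a row are consumed in reverse order of arrival, since each one is placed ahead of the previous head.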

Example 5 with TaskStoreException

Use of cz.metacentrum.perun.taskslib.exceptions.TaskStoreException in project perun by CESNET.

The class SendCollector, method run.

@Override
public void run() {
    while (!shouldStop()) {
        SendTask sendTask = null;
        Task task = null;
        Service service = null;
        Destination destination = null;
        String stderr;
        String stdout;
        int returnCode;
        // FIXME - doesn't provide nice output and clogs the log
        log.debug(schedulingPool.getReport());
        try {
            sendTask = sendCompletionService.blockingTake();
            task = sendTask.getTask();
            /*
             Set Task "sendEndTime" immediately for each done SendTask, so it's not considered stuck
             by PropagationMaintainer#endStuckTasks().
             This way propagation may take at most "rescheduleTime" per Destination rather than for
             all Destinations (whole Task). Default rescheduleTime is 3 hours * no. of destinations.
             */
            task.setSendEndTime(LocalDateTime.now());
            // XXX: why is this necessary? Rewriting status with every completed destination?
            if (!Objects.equals(task.getStatus(), Task.TaskStatus.SENDERROR) && !Objects.equals(task.getStatus(), Task.TaskStatus.WARNING) && !Objects.equals(sendTask.getStatus(), SendTaskStatus.WARNING)) {
                // keep SENDING status only if task previously hasn't failed
                task.setStatus(Task.TaskStatus.SENDING);
            } else if (!Objects.equals(task.getStatus(), Task.TaskStatus.SENDERROR) && sendTask.getStatus() == SendTaskStatus.WARNING) {
                task.setStatus(Task.TaskStatus.WARNING);
            }
            destination = sendTask.getDestination();
            stderr = sendTask.getStderr();
            stdout = sendTask.getStdout();
            returnCode = sendTask.getReturnCode();
            service = sendTask.getTask().getService();
        } catch (InterruptedException e) {
            String errorStr = "Thread collecting sent SendTasks was interrupted.";
            log.error("{}: {}", errorStr, e);
            throw new RuntimeException(errorStr, e);
        } catch (TaskExecutionException e) {
            task = e.getTask();
            /*
             Set Task "sendEndTime" immediately for each done SendTask, so it's not considered stuck
             by PropagationMaintainer#endStuckTasks().
             This way propagation may take at most "rescheduleTime" per Destination rather than for
             all Destinations (whole Task). Default rescheduleTime is 3 hours * no. of destinations.
             */
            task.setSendEndTime(LocalDateTime.now());
            // set SENDERROR status immediately as first SendTask (Destination) fails
            task.setStatus(Task.TaskStatus.SENDERROR);
            destination = e.getDestination();
            stderr = e.getStderr();
            stdout = e.getStdout();
            returnCode = e.getReturnCode();
            service = task.getService();
            log.error("[{}] Error occurred while sending Task to destination {}", task.getId(), e.getDestination());
        } catch (Throwable ex) {
            log.error("Unexpected exception in SendCollector thread. Stuck Tasks will be cleaned by PropagationMaintainer#endStuckTasks() later.", ex);
            continue;
        }
        // this is just interesting cross-check
        if (schedulingPool.getTask(task.getId()) == null) {
            log.warn("[{}] Task retrieved from SendTask is no longer in SchedulingPool. Probably cleaning thread removed it before completion. " + "This might create possibility of running GEN and SEND of same Task together!", task.getId());
        }
        try {
            // report TaskResult to Dispatcher for this SendTask (Destination)
            jmsQueueManager.reportTaskResult(schedulingPool.createTaskResult(task.getId(), destination.getId(), stderr, stdout, returnCode, service));
        } catch (JMSException | InterruptedException e1) {
            log.error("[{}] Error trying to reportTaskResult for Destination: {} to Dispatcher: {}", task.getId(), destination, e1);
        }
        try {
            // Decrease SendTasks count for Task
            // Consequently, if count is <=1, Task is reported to Dispatcher
            // as DONE/SENDERROR and removed from SchedulingPool (Engine).
            schedulingPool.decreaseSendTaskCount(task, 1);
        } catch (TaskStoreException e) {
            log.error("[{}] Task {} could not be removed from SchedulingPool: {}", task.getId(), task, e);
        }
    }
}
Also used : Destination(cz.metacentrum.perun.core.api.Destination) Task(cz.metacentrum.perun.taskslib.model.Task) SendTask(cz.metacentrum.perun.taskslib.model.SendTask) BlockingSendExecutorCompletionService(cz.metacentrum.perun.engine.scheduling.impl.BlockingSendExecutorCompletionService) Service(cz.metacentrum.perun.core.api.Service) JMSException(javax.jms.JMSException) TaskStoreException(cz.metacentrum.perun.taskslib.exceptions.TaskStoreException) TaskExecutionException(cz.metacentrum.perun.engine.exceptions.TaskExecutionException)
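
The status update at the top of the loop in Example 5 can be read as a small decision table: an error status is never overwritten, a warning from any Destination sticks, and otherwise the Task stays in SENDING. The sketch below restates those conditions as a pure function. The enums are hypothetical, cut-down stand-ins for the perun types, and the function is one reading of the example's conditions, not code from the project.

```java
// Hypothetical restatement of the status update in Example 5 as a pure function.
final class SendStatusRules {

    // Cut-down stand-ins for the perun enums; only the values needed here.
    enum TaskStatus { SENDING, WARNING, SENDERROR }
    enum SendTaskStatus { SENT, WARNING, ERROR }

    // Returns the Task status after one SendTask has completed without throwing.
    static TaskStatus next(TaskStatus current, SendTaskStatus finished) {
        if (current == TaskStatus.SENDERROR) {
            return current; // a previous failure is never downgraded
        }
        if (finished == SendTaskStatus.WARNING) {
            return TaskStatus.WARNING; // a warning from any Destination marks the whole Task
        }
        if (current == TaskStatus.WARNING) {
            return current; // an earlier warning is kept
        }
        return TaskStatus.SENDING; // nothing has gone wrong so far
    }

    public static void main(String[] args) {
        System.out.println(next(TaskStatus.SENDING, SendTaskStatus.SENT));      // SENDING
        System.out.println(next(TaskStatus.SENDING, SendTaskStatus.WARNING));   // WARNING
        System.out.println(next(TaskStatus.SENDERROR, SendTaskStatus.WARNING)); // SENDERROR
    }
}
```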

Aggregations

TaskStoreException (cz.metacentrum.perun.taskslib.exceptions.TaskStoreException): 11
Task (cz.metacentrum.perun.taskslib.model.Task): 10
Destination (cz.metacentrum.perun.core.api.Destination): 5
SendTask (cz.metacentrum.perun.taskslib.model.SendTask): 4
JMSException (javax.jms.JMSException): 4
Service (cz.metacentrum.perun.core.api.Service): 3
InternalErrorException (cz.metacentrum.perun.core.api.exceptions.InternalErrorException): 3
Facility (cz.metacentrum.perun.core.api.Facility): 2
Pair (cz.metacentrum.perun.core.api.Pair): 2
FacilityNotExistsException (cz.metacentrum.perun.core.api.exceptions.FacilityNotExistsException): 2
PrivilegeException (cz.metacentrum.perun.core.api.exceptions.PrivilegeException): 2
ServiceNotExistsException (cz.metacentrum.perun.core.api.exceptions.ServiceNotExistsException): 2
TaskExecutionException (cz.metacentrum.perun.engine.exceptions.TaskExecutionException): 2
PerunClient (cz.metacentrum.perun.core.api.PerunClient): 1
PerunPrincipal (cz.metacentrum.perun.core.api.PerunPrincipal): 1
PerunBl (cz.metacentrum.perun.core.bl.PerunBl): 1
EngineMessageProducer (cz.metacentrum.perun.dispatcher.jms.EngineMessageProducer): 1
InvalidEventMessageException (cz.metacentrum.perun.engine.exceptions.InvalidEventMessageException): 1
SendWorker (cz.metacentrum.perun.engine.scheduling.SendWorker): 1
BlockingSendExecutorCompletionService (cz.metacentrum.perun.engine.scheduling.impl.BlockingSendExecutorCompletionService): 1