Examples with Instances - weka.core.Instances

Example 31 with Instances

Use of weka.core.Instances in project iobserve-analysis by research-iobserve.

From the class UserGroupExtraction, method extractUserGroups.

/**
 * Function to extract user groups.
 */
public void extractUserGroups() {
    final ClusteringPrePostProcessing clusteringProcessing = new ClusteringPrePostProcessing();
    final XMeansClustering xMeansClustering = new XMeansClustering();
    ClusteringResults xMeansClusteringResults;
    /**
     * 1. Extraction of distinct system operations. Creates a list of the distinct operation
     * signatures occurring within the entryCallSequenceModel. It is required to transform each
     * user session to counts of its called operations. The counts are used to determine the
     * similarity between the user sessions
     */
    final List<String> listOfDistinctOperationSignatures = clusteringProcessing.getListOfDistinctOperationSignatures(this.entryCallSequenceModel.getUserSessions());
    /**
     * 2. Transformation to the call count model. Transforms the call sequences of the user
     * sessions to a list of counts of calls that state the number of calls of each distinct
     * operation signature for each user session
     */
    final List<UserSessionAsCountsOfCalls> callCountModel = clusteringProcessing.getCallCountModel(this.entryCallSequenceModel.getUserSessions(), listOfDistinctOperationSignatures);
    /**
     * 3. Clustering of user sessions. Clustering of the user sessions whose behavior is
     * represented as counts of their called operation signatures to obtain user groups
     */
    final Instances instances = xMeansClustering.createInstances(callCountModel, listOfDistinctOperationSignatures);
    /*
     * The clustering is performed 5 times and the best result is kept. The quality of a
     * clustering result is measured by the sum of squared errors (SSE) of the clustering:
     * the lower the SSE, the better the result.
     */
    for (int i = 0; i < 5; i++) {
        xMeansClusteringResults = xMeansClustering.clusterSessionsWithXMeans(instances, this.numberOfUserGroupsFromInputUsageModel, this.varianceOfUserGroups, i);
        if (this.clusteringResults == null) {
            this.clusteringResults = xMeansClusteringResults;
        } else if (xMeansClusteringResults.getClusteringMetrics().getSumOfSquaredErrors() < this.clusteringResults.getClusteringMetrics().getSumOfSquaredErrors()) {
            this.clusteringResults = xMeansClusteringResults;
        }
    }
    /**
     * 4. Obtaining the user groups' call sequence models. Creates, for each cluster (that is,
     * each user group), its own entry call sequence model that contains only the user sessions
     * assigned to it
     */
    final List<EntryCallSequenceModel> entryCallSequenceModelsOfXMeansClustering = clusteringProcessing.getForEachUserGroupAnEntryCallSequenceModel(this.clusteringResults, this.entryCallSequenceModel);
    /**
     * 5. Obtaining the user groups' workload intensity. Calculates and sets for each user group
     * its specific workload intensity parameters
     */
    clusteringProcessing.setTheWorkloadIntensityForTheEntryCallSequenceModels(entryCallSequenceModelsOfXMeansClustering, this.isClosedWorkload);
    /**
     * Sets the resulting entryCallSequenceModels that can be retrieved via the getter method
     */
    this.entryCallSequenceModelsOfUserGroups = entryCallSequenceModelsOfXMeansClustering;
}
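
Steps 1 and 2 of the method above (distinct operation signatures, then per-session call counts) can be sketched in plain Java. This is a minimal illustration, not the iobserve-analysis implementation: the class CallCountSketch, its method names, and the representation of a session as a list of signature strings are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper class; names are illustrative and not taken from iobserve-analysis.
final class CallCountSketch {

    /** Collects the distinct operation signatures in first-seen order. */
    public static List<String> distinctSignatures(List<List<String>> sessions) {
        LinkedHashSet<String> distinct = new LinkedHashSet<>();
        for (List<String> session : sessions) {
            distinct.addAll(session);
        }
        return new ArrayList<>(distinct);
    }

    /** Counts, for each session, how often each distinct signature was called. */
    public static List<int[]> callCounts(List<List<String>> sessions, List<String> signatures) {
        List<int[]> counts = new ArrayList<>();
        for (List<String> session : sessions) {
            int[] row = new int[signatures.size()];
            for (String call : session) {
                int index = signatures.indexOf(call);
                if (index >= 0) { // ignore calls that are not in the signature list
                    row[index]++;
                }
            }
            counts.add(row);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: a session is modelled as the ordered list of its called signatures.
        List<List<String>> sessions = Arrays.asList(
                Arrays.asList("login", "search", "search", "buy"),
                Arrays.asList("login", "search"));
        List<String> signatures = distinctSignatures(sessions);
        for (int[] row : callCounts(sessions, signatures)) {
            System.out.println(signatures + " -> " + Arrays.toString(row));
        }
    }
}
```

Each count row has one entry per distinct signature, so every session becomes a fixed-length numeric vector. That is the shape a clustering step needs: one row per session, one numeric attribute per operation signature.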

Also used: Instances (weka.core.Instances), UserSessionAsCountsOfCalls (org.iobserve.analysis.userbehavior.data.UserSessionAsCountsOfCalls), ClusteringResults (org.iobserve.analysis.userbehavior.data.ClusteringResults), EntryCallSequenceModel (org.iobserve.analysis.data.EntryCallSequenceModel)
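
The example's loop keeps the clustering result with the lowest sum of squared errors (SSE) out of 5 runs. A minimal sketch of that selection pattern follows, independent of Weka. The class BestOfRunsSketch and its nested Result type are hypothetical, and treating the loop index as a seed is an assumption about the last argument of clusterSessionsWithXMeans.

```java
import java.util.function.IntFunction;

// Hypothetical sketch of "run several times, keep the lowest SSE"; not project code.
final class BestOfRunsSketch {

    /** Illustrative result holder; only the SSE matters for the selection. */
    public static final class Result {
        public final int seed;
        public final double sumOfSquaredErrors;

        public Result(int seed, double sumOfSquaredErrors) {
            this.seed = seed;
            this.sumOfSquaredErrors = sumOfSquaredErrors;
        }
    }

    /** Runs the clustering once per seed in [0, runs) and returns the lowest-SSE result. */
    public static Result bestOf(int runs, IntFunction<Result> clusterWithSeed) {
        Result best = null; // local variable, so no result from an earlier call is compared
        for (int seed = 0; seed < runs; seed++) {
            Result candidate = clusterWithSeed.apply(seed);
            if (best == null || candidate.sumOfSquaredErrors < best.sumOfSquaredErrors) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up SSE values standing in for five clustering runs.
        final double[] fakeSse = {12.5, 9.0, 14.2, 9.0, 10.1};
        Result best = bestOf(fakeSse.length, seed -> new Result(seed, fakeSse[seed]));
        System.out.println("best seed=" + best.seed + " sse=" + best.sumOfSquaredErrors);
    }
}
```

The sketch tracks the best result in a local variable. The original method stores it in the field clusteringResults instead, so if that field already held a result from an earlier call, the new runs would be compared against it. Whether that can happen depends on how the class is used, which the excerpt does not show.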

Aggregations

Instances (weka.core.Instances): 31
Attribute (weka.core.Attribute): 12
ArrayList (java.util.ArrayList): 9
File (java.io.File): 8
Instance (org.dkpro.tc.api.features.Instance): 8
Test (org.junit.Test): 8
MultiLabelInstances (mulan.data.MultiLabelInstances): 7
IOException (java.io.IOException): 5
DenseInstance (weka.core.DenseInstance): 5
Instance (weka.core.Instance): 5
ArffSaver (weka.core.converters.ArffSaver): 5
Feature (org.dkpro.tc.api.features.Feature): 4
Classifier (weka.classifiers.Classifier): 3
FastVector (weka.core.FastVector): 3
SparseInstance (weka.core.SparseInstance): 3
HashMap (java.util.HashMap): 2
Result (meka.core.Result): 2
AnalysisEngineProcessException (org.apache.uima.analysis_engine.AnalysisEngineProcessException): 2
TextClassificationException (org.dkpro.tc.api.exception.TextClassificationException): 2
FeatureType (org.dkpro.tc.api.features.FeatureType): 2