With all the AI hype, I wanted to see how AI could be applied to xAPI data to generate some meaningful results. I am in no way an AI expert….. but I am incredibly curious.

xAPI Reports, in my view, has not been easy or readily available in Learning Record Stores. Whilst there are plenty of LRS’s with great reporting tools for charts (Veracity LRS), I was still curious as to how AI can work with xAPI and what I can do with it.

So, I got to thinking about how I can achieve this. My knowledge of AI in a programming sense is pretty limited, in fact it’s zero!. And so begins the journey of discovery.

This article will focus on my initial exploration. I’ve a long way to go, so as I learn more, I’ll share updates.

I know there are heaps of examples out there, because I used them to figure it all out. This article is about how I did it and lessons learnt along the way. Thanks to all the sites I visited, too many to name and give credit to, just know you’re appreciated 🙂

Where do I start?

Whilst I have a technical background, I’ve never delved into the world of AI. Here I was thinking you just ‘upload’ the data you want and voila – you get the report you wanted! Well, it’s not quite like that.

My exposure to AI, like most of us, is through ChatGPT. I am using this in Remote Reviewer to generate checklists based on a prompt. This works quite well, but I am not ‘uploading’ or ‘providing’ additional data. With Remote Reviewer, I started looking at how I could provide some xAPI data for a Task and also generate a report. This was also pretty simple. Just connect to the API, pass your API key, the model and the prompt and away you go. What I did learn from this was that the data from xAPI needed to be transformed to make the prompt more meaningful.

As an example, below is the text output for a task would look something like:

“Assessor is Julian Davis Test Reviewer and conducted the review of Checking the task Demonstrate how to use a Cordless Drill on 26-10-2023 23:38:52. The person being reviewed is Jon B. and the instructions for the task are <p></p><b>Tasks to be completed: </b><ul><li>Check for wear on drill </li><li>Battery is charged </li><li>Correct PPE is worn </li></ul>.The following comments were made on the review using Remote Reviewer: <p>Correct PPE was used</p> at 4.783 seconds into the video”

Using this prompt (given it’s just an example) I was able to derive a pretty simple report.

Where does the data come from – an LRS? Database?

This is the next challenge. I want to feed xAPI data into the Large Language Model (LLM). Getting xAPI data from a Learning Record Store (LRS) is very well documented. For the testing and experimenting I wanted to do, I had direct access to the MySQL database that is part of the TraxLRS. This meant I could develop some queries to pull the statements out I needed.

The particular LRS has over 4 million xAPI Statements – this is massive! Using an xAPI Wrapper and only pulling 100 records at a time would be very time consuming, hence the use of a direct database connection and query. I’ve not ruled this out yet, but for the initial experiment, it would work.

The structure of the database meant that there is one column in a table that contains the entire JSON for the xAPI statement. With this, I created a simple PHP page that will check any passed variables (Course and or Verb) to run a query. Given the enormous amount of the data, I added a date range and a limit. I set a default limit as well as passing the variables.

The final queries ended up as:

//Set default date range to 7 days

$date = strtotime('-7 days', time());

$startdate = date('Y-m-d',$date);
$enddate = date('Y-m-d');
$num_records = "";
$course = "";


if(isset($_REQUEST['course'])){
    $course = ' and JSON_EXTRACT(data, "$.object.definition.name.en") = "'.$_REQUEST['course'].'"';
    $prompt .=  " on the course titled ".$_REQUEST['course'];
}

//Set default query limits
$limit = 0;
$Ulimit = 10000;

if(isset($_REQUEST['l'])){
    $limit = $_REQUEST['l'];
}
if(isset($_REQUEST['u'])){
    $Ulimit = $_REQUEST['u'];
}

$verb = "";
$activity = "";

    $conn = new mysqli($dbhost2, $dbuser, $dbpass,$dbname) or die("Connect failed: %s\n". $conn -> error);
   
    //get the data but check the requests
    $verb = $_REQUEST['verb'];
    if($verb = 'all'){
        $verb = "";
    }
    $activity = str_replace('%20','+',$_REQUEST['activity']);

    if(isset($_REQUEST['startdate'])){
        $startdate = $_REQUEST['startdate'];
    }
    if(isset($_REQUEST['enddate'])){
        $enddate = $_REQUEST['enddate'];
    }
    if($activity != '' && $verb != ''){
        $sql = "SELECT data FROM trax_xapi_statements where created_at >='".$startdate."' and created_at <= '".$enddate."' and JSON_EXTRACT(data, '$.object.definition.name') like '%".$activity."%' and and JSON_EXTRACT(data, '$.verb.name') = '".$verb."' ".$course. " limit ".$limit.",".$Ulimit;
        }
    else if($verb != ''){
        $sql = "SELECT data FROM trax_xapi_statements where created_at >='".$startdate."' and created_at <= '".$enddate."' and JSON_EXTRACT(data, '$.verb.name') = '".$verb."' ".$course. " limit ".$limit.",".$Ulimit;
    }else if($activity != ''){
        $sql = "SELECT data FROM trax_xapi_statements where created_at >='".$startdate."' and created_at <= '".$enddate."' and JSON_EXTRACT(data, '$.object.definition.name') like '%".$activity."%' ".$course. " limit ".$limit.",".$Ulimit;
    }else{
        $sql = "SELECT data FROM trax_xapi_statements where created_at >='".$startdate."' and created_at <= '".$enddate."' ".$course. " limit ".$limit.",".$Ulimit;
   
    }

Now that I have the data from the database, I needed to format each statement into a string that the OpenAI API can interpret. So to do this I wrote a function that takes the xAPI Object and transforms it into a string. This is appended to a variable which is what is passed to the OpenAI API query.

A note here is that I know the data and that the definitions are in EN and UND

function formatXapiData($data) {

    $formattedData = "";
    $checkObject = "";
    $checkVerb = "";

        $actor = isset($data->actor->name) ? $data->actor->name : "Unknown Actor";

        $verb = $data->verb->display;
        $Varray = (array) $verb ;

            if($Varray['und']){
                $checkVerb=  $Varray['und'];
            }else{
               $checkVerb =  $Varray['en'];
            }

        $verb = isset($checkVerb) ? $checkVerb : "Unknown Verb"
        $object = $data->object->definition->name;
        $Oarray = (array) $object ;

            if($Oarray['und']){
                $checkObject=  $Oarray['und'];
            }else{
                $checkObject =  $Oarray['en'];
            }

        $object = isset($checkObject) ? $checkObject : "Unknown Object";
        //check if a question has been answered and format results

        if($checkVerb === 'answered'){

            $choices  =" Question Choices: \n";
            $answers = explode('[,]',$data->definition->correctResponsesPattern);
            $question = isset($checkObject) ? $checkObject . ' with the id '. $data->object->id : null;

            foreach($data->object->definition->choices as $choice){

                $isCorrect = ' is not ';
                if(in_array($choice->id,$answers) ){
                    $isCorrect = ' is ';
                }
                $choices .= $choice->description .' and '. $isCorrect ." the correct response\n";
            }

        }

        $result = isset($data->result->success) ? $data->result->success : null;
        $resultDuration = isset($data->result->duration) ? ISO8601ToSeconds($data->result->duration) : null;

        $formattedData .= "Actor: " . $actor . "\n";
        $formattedData .= "Verb: " . $verb . "\n";
        $formattedData .= "Object: " . $object . "\n\n";

        if($question){

        $formattedData .= "Question: " . $question . "\n\n";
        $formattedData .= "Question Choices: " . $choices . "\n\n";
        $formattedData .= "Correct Answer: " . $question . "\n\n";
        $formattedData .= "Answer Provided: " . $question . "\n\n";
        $formattedData .= "Result: " . $result . "\n\n";
        $formattedData .= "ResultDuration: " . $resultDuration . "\n\n";

        }

        //Get and format when the experience occurred

        $date = date_create($data->timestamp);
        $formattedData .= "When: " . date_format($date,'d-m-Y h:i'). "\n\n";

    return $formattedData;

}

Once this was done, I then use cURL to query the OpenAI API with the key (you will need to get this from the OpenAI Platform page and will cost you) to let the OpenAI do its magic:

$data = [

    "model" => "gpt-3.5-turbo-0125",
    "messages" => [
        [
            "role" => "system",
            "content" => $prompt . '. The results must include duration in seconds if specified.' ],
        [
            "role" => "user",
            "content" => $output
        ],
    ],
];

$headers = [
    "Content-Type: application/json",
    "Authorization: Bearer $openaiApiKey",
];

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $openaiUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$response = curl_exec($ch);

if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
}
curl_close($ch);

$report = json_decode($response, true);
if($report['error']){
    $message = 'I was able to find '. $num_records .' xAPI records. But there was an error :( ';
    echo $message . '<hr><p>';
    echo 'Error Message:<br/>'.$report['error']['message'] .'</p>';
}else{
    echo 'I was able to find '. $num_records .' xAPI records.  The results are:<br/>';
    echo nl2br($report['choices'][0]['message']['content']);
}

}catch(Exception $ex){

    $message = 'I was able to find '. $num_records .' xAPI records.';
    echo $message . ', but there was an Error: ' . $ex->getMessage();

}

The final result is something like:

screencapture-lrs-isq-qld-edu-au-xapiAI-2024-05-28-20_12_24-1024x768 Using AI to report on xAPI - an Experiment

Now, I’m pretty happy with this..BUT, there is still a looooonnngggg way to go to get meaningful data back.

What I have found so far:

  1. I need to improve the prompts
  2. I need to improve the formatted data
  3. Size (ie Tokens) is an issue. The ChatGPT gpt-3.5-turbo-0125 model only allows for 16,385. This is the length of data I am sending to be queried and reported back on
  4. Look at using a free LLM. I am looking at Ollama on a Mac to serve up local LLM
  5. The way I’ve approached it may not even be the right way
  6. There has to be a better way
  7. Keep being curious

Stay tuned as I continue the discovery. Be really interested to know if you’ve tried anything and would like to share.

Leave a Reply

Your email address will not be published. Required fields are marked *

© The Digital Learning Guy | xapi.com.au
ABN 364 4183 4283