SCIP 2020 - lrd: An R Package and Shiny Application for Lexical Response Data

Nov 30, 2020 17:00 · 1136 words · 6 minute read

Hi everyone. So today I’ll be talking about lrd, which is an R package that can be used to quickly process lexical response data. So recall testing is commonly used to gauge memory retrieval. However, coding the responses from these tests can be time-intensive and is often prone to errors. So lrd has been designed to automate the scoring of recall data. Specifically, this package can handle data from cued recall studies, free recall studies, and it can even process sentence responses. So this package consists of three core functions, one for each of those three types of data, and the package also contains several plotting functions.

00:43 - These can be used to create serial position curves, compute the probability of first response, etc. So the package is currently available via GitHub, and to install it you can get it using the devtools package. So starting with cued recall, cued recall data is handled via the prop_correct_cued function. So this function takes all of its input as a data frame. This data frame will need to contain a column with the participant responses and an answer key column. The answer key column will also need an accompanying column with item numbers.

01:26 - You’ll need a column containing unique participant identifiers. You’ll need a column containing item numbers for the study pairs. You will also need to specify a cutoff value used for scoring. So lrd uses the Levenshtein distance for scoring. This value can range between 0 and 5, with a cutoff of 0 being the strictest, so everything would have to match, and a cutoff of 5 being the most lenient. Additionally, this function gives you the option to flag any outliers. So, it’ll flag any participant whose proportion correct is more than three standard deviations above or below the mean, and finally, you have the option to group the output by experimental condition. So when you run this function, the first thing it’s going to give you is the trial level data. So, you’ll see each participant, what they typed for each trial, what they were supposed to type, and whether or not it was scored as correct. Next, you’ll get their proportions of correct responses.
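The cutoff and outlier flag described above can be sketched in a few lines. This is a minimal Python illustration, not lrd's actual R code: the names `levenshtein`, `score_cued`, and `flag_outliers` and the sample trials are hypothetical, and the sketch assumes that a response counts as correct when its edit distance from the key is at or below the cutoff, and that the flag uses the sample standard deviation.

```python
from statistics import mean, stdev


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def score_cued(trials, cutoff=0):
    """trials: (participant_id, response, key) tuples.

    Assumption: a trial is correct when distance <= cutoff.
    Returns the trial-level scores and each participant's proportion correct.
    """
    scored = [(pid, resp, key, int(levenshtein(resp, key) <= cutoff))
              for pid, resp, key in trials]
    by_pid = {}
    for pid, _, _, correct in scored:
        by_pid.setdefault(pid, []).append(correct)
    proportions = {pid: sum(v) / len(v) for pid, v in by_pid.items()}
    return scored, proportions


def flag_outliers(proportions, threshold=3.0):
    """Flag participants whose z score is beyond +/- threshold (sample SD assumed)."""
    values = list(proportions.values())
    if len(values) < 2:
        return {pid: False for pid in proportions}
    m, sd = mean(values), stdev(values)
    if sd == 0:
        return {pid: False for pid in proportions}
    return {pid: abs((p - m) / sd) > threshold for pid, p in proportions.items()}


# Hypothetical sample data: (participant, typed response, answer key)
trials = [
    ("p1", "apple", "apple"),  # distance 0
    ("p1", "tabel", "table"),  # distance 2
    ("p2", "aple", "apple"),   # distance 1
    ("p2", "", "chair"),       # distance 5
]
scored, proportions = score_cued(trials, cutoff=1)
print(proportions)  # {'p1': 0.5, 'p2': 0.5}
```

With a cutoff of 0, the one-letter misspelling "aple" would no longer count, which is the sense in which 0 is the strictest setting.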

02:34 - So, in addition to these proportions, you also get a z score for each participant relative to other members of their experimental condition, if you specified one, and also relative to all of the other participants in the sample, collapsed across experimental condition. Alright, the next function is for scoring free recall. So, prop_correct_free is structured very similarly to the cued recall function. One thing to note here is that for free recall, the data is going to need to be structured in long format, and you can easily convert your data from wide format to long format using the arrange_data function that is included with this package.
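As a rough picture of the wide-to-long step, here is a small Python sketch. It is not the package's own function: `wide_to_long` and the field names are hypothetical, and the sketch assumes that "wide" means each participant has one row whose responses sit in a single delimited string, while "long" means one row per response.

```python
def wide_to_long(rows, sep=","):
    """rows: (participant_id, delimited_responses) tuples.

    Assumption: responses are separated by `sep`; empty entries are dropped.
    Returns one dict per response, with its output position for that participant.
    """
    long_rows = []
    for pid, cell in rows:
        position = 0
        for response in cell.split(sep):
            response = response.strip()
            if not response:
                continue  # skip blanks left by doubled or trailing separators
            position += 1
            long_rows.append({"id": pid, "position": position, "response": response})
    return long_rows


# Hypothetical wide-format data: one row per participant
wide = [("p1", "dog, cat,bird"), ("p2", "cat, ,fish")]
for row in wide_to_long(wide):
    print(row)
```

Keeping the output position in the long format is what makes later analyses such as serial position curves possible.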

03:22 - Once you run prop_correct_free, you’ll again first get the trial level data. And then again, you’ll get each participant’s proportion of correct responses. Finally, you can use prop_correct_sentence to score sentence data. This function is again set up like the other two. The main difference here is that you will also need to specify the token split. All this is is just the character that separates the words in each sentence. Generally speaking, this is going to be a space. Once you run this function, you’ll again get the trial level output first. This is formatted mostly like the others, but it has a few additional outputs. So, this output will also give you the shared items between the participant’s response and the answer key. It’ll also show you any items that the participant left out of the sentence in the omitted items column, and the extra items column will show anything extra that the participant typed.
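The shared, omitted, and extra items can be illustrated with a short Python sketch. This is not the package's implementation: `score_sentence` is a hypothetical name, the sketch matches tokens exactly rather than with a distance cutoff, and it assumes the proportion correct is the number of shared tokens divided by the number of tokens in the key.

```python
def score_sentence(response, key, token_split=" "):
    """Compare a typed sentence with the answer key, token by token.

    Assumptions: case-insensitive exact matching; proportion = shared / key length.
    """
    resp_tokens = [t for t in response.lower().split(token_split) if t]
    key_tokens = [t for t in key.lower().split(token_split) if t]

    remaining = list(resp_tokens)  # response tokens not yet matched to the key
    shared, omitted = [], []
    for token in key_tokens:
        if token in remaining:
            shared.append(token)
            remaining.remove(token)  # each response token can match only once
        else:
            omitted.append(token)

    proportion = len(shared) / len(key_tokens) if key_tokens else 0.0
    return {"shared": shared, "omitted": omitted,
            "extra": remaining, "proportion": proportion}


# Hypothetical example sentence
result = score_sentence("the quick fox jumped high", "the quick brown fox jumped")
print(result["shared"])   # ['the', 'quick', 'fox', 'jumped']
print(result["omitted"])  # ['brown']
print(result["extra"])    # ['high']
```

Changing `token_split` to another character, such as a comma, is all it takes to handle responses whose words are not separated by spaces.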

04:26 - And again, this function will give you each participant’s proportion of correct responses, and the corresponding z scores. Alright, so in addition to being an R package, lrd is also available as a Shiny application. This is available via GitHub. And for the next part of this talk, I’m going to do a very quick demonstration of the Shiny application. Alright, so when you first open up the Shiny application, you’ll be on the information page. Over on the left-hand side, you will see four other tabs.

05:02 - Three of these correspond to each of the scoring functions, and the top one is the arrange data tab, where you can convert your wide format data into long format. The three scoring tabs are structured somewhat similarly. So for purposes of this demonstration, I’m going to focus primarily on the cued recall one. So when you click on this tab, you’ll first be presented at the top with an area where you can upload your data files. So, you’ll need to upload your data file that contains the participant responses, along with trial numbers and the participant identifiers. That’ll go at the top, and you also need to upload a data file containing the answer key.

05:46 - Now the participant data and the answer key can be in the same data file. All you’ll need to do is just upload the same file twice. Once you upload these, the check your data area will populate with the data frames. You can just look through here. This is just to make sure that everything imported correctly. After you check your data, you’ll then go down here to the scoring setup area.

06:11 - So in this section, these will auto-populate with the column headings from your data frame. Just click on these and make sure to select the right ones. So, for this one, I picked the response column as my response, key for the answer key, etc. After selecting the column headings, you can choose a column to group your data by, so I am going to group by condition. Once you select that, you then choose your scoring cutoff. I’m going to use a Levenshtein distance of one for my scoring. And then finally you can check the box in here to flag outliers. After setting everything up, go ahead and click score your data. Once it scores, everything’s going to appear down here at the bottom, and you’ll have the trial level data, the proportions correct, and you’ll get a graph based on the condition level results. So in conclusion, lrd allows researchers to quickly score several types of recall data, and overall the goal of this package is to provide a standardized open-source method for processing lexical output across psychological studies.

07:34 - Alright, at this time I will be able to take any questions.