Malaterre – Topic-Modeling of Multilingual Non-Parallel Corpora – DS² 2021
Mar 17, 2021 08:25 · 12590 words · 60 minute read
Hey, and we're live. Let me let the settings finish loading up, and wait just a few seconds for any attendees who see the notification bell. I'll talk a little extra long introducing you to give people time to drift in here at the start.
00:22 - But okay, I think with that we're rolling. Welcome back, everybody, very much appreciated to see you. As I've heard people say at events like this before, I'm glad to see that you decided that day one wasn't a waste of your time and you're back for day two. So thanks very much. We have another really exciting day's worth of talks lined up for you here today, commencing with — I'm getting an echo of my own audio.
Sure, it'll go away, it's fine. I'll keep talking a little extra long while people are getting here.
01:00 - I'm getting a bad echo. But okay, I think with that we're rolling. Sorry, one second, I do not know where this echo is coming from. Oh, Christophe, do you have Crowdcast open on your computer? I think you're watching your own broadcast on a tape delay. Let me see.
Let me disconnect here. How about that? There we go. Okay.
01:41 - That is what it was. Cool. All right. So yes, we'll get the screen sharing back. Good. Okay, here we are. So, commencing with our second keynote speaker. We'll kick off every day with a keynote, as you've seen on the schedule, and we have with us this morning or afternoon, depending on your timezone, Christophe Malaterre of the Université du Québec à Montréal, who is the Canada Research Chair in Philosophy of the Life Sciences and a professor in the Department of Philosophy. He has done some fantastic work over the last few years — really a pioneering group working there at UQAM on the use of topic modeling for purposes of history of philosophy, philosophy of science, history of philosophy of science, etc.
And he'll be talking to us today on the topic modeling of multilingual, non-parallel corpora — so, how we might think about applying machine translation to a philosophy of science corpus.
02:43 - And so with that, I'll give you the floor. Thanks so much.
02:48 - Thank you, thank you, Charles and Luca, for having me with you. This has been a great conference so far, so I'm really happy to be with you today. My talk will be about exploring multilingual non-parallel corpora, especially with topic modeling, and with a special focus here on a philosophy of science corpus. This is actually a problem that we stumbled upon when we targeted our corpus of eight philosophy of science journals.
We were not expecting to have to face multilingual corpora, but we did, so we had to find solutions. And what I'm going to share with you today are some solutions that we implemented. So my talk is probably going to be quite heavy on the methodological side, but I will also share with you some of the results that we got. So the background question is: how can we map disciplines through time? In particular, how can we identify the research topics of a discipline such as philosophy of science, if we're interested in philosophy of science and in the history of philosophy of science? And how can we investigate changes in this discipline through time? Of course, we can use expert analysis.
And this is the usual methodology of close reading, in which we look at texts in detail and, with expert knowledge, make sense of all this literature, and are able to reconstruct a history of the discipline with its main research themes opening and closing through time. But of course, this is very time consuming, especially as the corpus becomes larger and larger. So another approach is to use computational text-mining methods.
This is something that we also saw yesterday, for instance in Cody's presentation of their tool, in which different approaches were being used to investigate a very large corpus. One of the possibilities is to use topic modeling algorithms over very large corpora as a form of distant reading, to somehow make emerge the main topics that are present in a corpus and see how they evolve through time.
So one of the advantages of these methods is that they can tackle very large corpora, a huge amount of text. Another advantage is that they're bottom-up, data-driven somehow, so they may help set an empirical basis for what might otherwise be informal claims. And this is what we intended to do on the full-text content of eight major philosophy of science journals, from the 1930s up until 2017.
But sometimes, maybe even more often than we think, corpora are multilingual.
06:09 - In our particular case, we found out that 6% of the articles were actually published in German, Dutch or French. So that was not so much as a percentage of the whole corpus, but the non-English articles still represented 44% of articles before World War Two. So the first option, when you have such a corpus, could be to exclude all non-English articles and focus just on the English articles, and this is what we did in a first study.
But then we thought that we were probably missing something in the pre-World War Two period, in which we had so many non-English texts, and we wished to include all the articles, including these non-English articles, in a subsequent study. But how can we do this? When you tackle multilingual corpora, there are actually two types of multilingual corpora to be aware of, and which we found in the literature: what are called parallel corpora and non-parallel corpora.
So parallel corpora include expert translations: texts are available in at least two languages, or several languages, and they're typically gold-standard translations of one another.
07:33 - An example is the proceedings of the European Parliament, which are available in several languages and are perfect translations of one another. So we find a lot of work on multilingual corpora that uses such corpora as the proceedings of the European Parliament. Another form of parallel text includes comparable texts that are not expert translations: texts in different languages that are supposed to roughly exhibit the same distributions of themes or subjects, though they're not exact translations of one another.
And this is what you find in Wikipedia, for instance, in which articles or entries in French and English talk about the same things, but are not necessarily exact translations of one another. And then there is a whole other class of multilingual corpora, the class of non-parallel corpora, which include texts that are not aligned. One typical example is articles from journals that accept publications in different languages; in this case, the articles are never translated into the other languages — you simply have articles in different languages. So in our case we had to tackle non-parallel corpora.
So what do you do if you want to do some topic modeling? What types of solutions are there in the literature for multilingual corpora? Generally speaking, we've seen that a lot has been developed in particular for parallel corpora: topic modeling algorithms have been developed to carry out topic modeling across parallel texts, by aligning the topics at a sentence or document level. And we were somehow puzzled to find out that the interest in doing this type of topic modeling is actually to help improve machine translation through these kinds of
topic-matching parallel topic models. So this is something that we had to be aware of, but it is not something that could help us with our problem. For non-parallel corpora, you need specific language-bridging solutions to be somehow implemented. So a first solution is to use what we can call advanced topic modeling algorithms that include such language-bridging solutions. In that particular case, several algorithms have been developed or proposed: some include multilingual dictionaries or other lexical resources directly inside themselves.
Others use concept trees, for instance inferred from WordNet or based on user input. Still others use a combination of partial text alignment and lexical resources — quite sophisticated stuff, indeed. The objective is to be able to carry out a topic model on multilingual non-parallel texts without having any translation in between, but using bridges, dictionaries and so on to make the connections between the topics in one language and the topics in another language.
Another solution is, the other way around, to use what might be called advanced machine translation,
11:14 - together with monolingual, very simple or vanilla topic modeling tools. This has also been investigated in the literature, and there we found two options, two ways of doing it. One is to do the machine translation on just the specific terms that are typically present in a term-document matrix. In other words, you do not machine-translate the entire corpus; you just filter out the words and send only those words for translation, once they've been aggregated across your whole corpus.
But another option that has been investigated more recently is to do machine translation of the complete texts and have a full-text translation. This has been assessed recently, in the context of parallel texts, by De Vries and colleagues, looking specifically at the European Parliament corpus. It has also been assessed for studies based on linguistic inquiry word counts by some other teams. And we found this a quite attractive solution to implement.
Our case, though, is that we did not have parallel texts to check the translation against, so we had to devise ways of checking the quality of the translation somehow, even without having a gold translation to check against. We opted for the second solution, especially because it provided an advantage the other solutions did not: it gave us access to the article content in English.
13:05 - And I don't know about you, but I can read French, that's no problem; my German is very rough, and my Dutch is non-existent. So having access to the full text in English was a great advantage for making sense of some of the results of the analysis. This is one of the main advantages that we see in this solution. We used machine translation with Google Translate, following what had been done by some colleagues, also for consistency, and to show that the solution they tested on a parallel corpus could also be implemented on a non-parallel corpus.
And we used a very simple, plain-vanilla LDA topic model, the most basic form, which is very well known and very robust. And then, on top of that, we did further analysis with ad hoc code, and I'll go into that as well.
14:05 - So, remember that our corpus is non-parallel. Contrary to what had been done before with multilingual corpora and machine translation, we did not have the possibility of checking the quality of the translation algorithmically against a reference, so we had to devise a new way of assessing this translation quality problem.
14:33 - So in this talk — and I hope you understand now — there are two intertwined questions: a methodological question that we stumbled upon when we wanted to answer a history of philosophy question. The methodological question is: how can one tackle non-parallel multilingual corpora and evaluate the machine translation quality? And this is where we devised what we call a semantic topology preservation test.
15:02 - And then there is the history of philosophy question, the content question, so to speak: what are the research topics investigated in the philosophy of science, and how have they changed through time? So here, our main result is a complete topic model of the complete corpus, bringing new insights from the non-English texts, notably pre-World War Two.
15:31 - So the structure of my talk will be very classical: I'll present the data, then the methods, then the results, and then we'll go to a discussion period, depending on how much time we have left, and then I'll be happy to answer questions. So the data itself: as I said, it's a philosophy of science corpus that includes eight major journals of philosophy of science — the BJPS, Erkenntnis, the European Journal for Philosophy of Science, International Studies in the Philosophy of Science, the Journal for General Philosophy of Science, Philosophy of Science, Studies in History and Philosophy of Science Part A, and Synthese.
So you can see here the publication periods and the number of articles in the different languages. Most articles, about 15,900, are in English, but there were still quite a lot in German, some in Dutch, and some in French. The German articles were mostly from Erkenntnis, pre-World War Two.
16:38 - And from the Journal for General Philosophy of Science, from the 1970s through the 1990s. Journal articles in Dutch and French were mostly from Synthese, typically before the 1940s or 1960s. All in all, this is a corpus that includes nearly 17,000 full-text articles. This is how the number of articles evolved over time: you see, in shades of orange, the English articles of the different journals — these are numbers of articles — and you see how Synthese came to be one of the journals with the highest number of publications today; but you also see, in blue, the non-English articles, and how significant they were before World War Two.
And then how some of them appeared also in the 1970s and 90s; these were articles from the Journal for General Philosophy of Science. So before the war there were essentially non-English articles, quite a lot of them in Synthese and Erkenntnis, and then later on in the JGPS. So that's the corpus itself. Now, the methods. We implemented here a research design following two main stages: stage A concerns the translation, and stage B concerns the topic modeling itself and the content of the corpus.
So stage A includes all the translation of the non-English articles into English and the assessment of the translation quality, especially for the purposes of bag-of-words textual analysis. Stage B is the topic modeling of the entire corpus and its analysis, both from a synchronic and a diachronic
18:40 - perspective, in comparison with the previous topic modeling that we had done on the English-only portion of the corpus. So this is the overall research design: in orange, you see all the different steps involved in the translation and translation quality assessment; in blue, all the topic modeling work; and in gray, the methodology followed for a previous topic model. So we'll go into some of the details here step by step, just to make sure that you're fully aware of what went on behind the scenes, so to speak.
So what did we do for the machine translation itself? We took all journal articles and their metadata, which we had organized into a data frame. We used automatic language detection to detect the article language. We then split the corpus into four sub-corpora — English, German, Dutch and French — and the non-English sub-corpora were sent to Google Translate in chunks of about 25,000 characters, as required by Google Translate. The translation results were then reassembled into articles, and the whole thing assembled back into a data frame. A rough sketch of such a pipeline is shown below.
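A minimal Python sketch of that pipeline, for illustration only: it assumes a pandas data frame with a `text` column, uses the langdetect package for language detection, and leaves the actual call to the translation service as a stub (the study used Google Translate; the helper names and the paragraph-based chunking are assumptions of the sketch, not the study's code).

```python
import pandas as pd
from langdetect import detect  # simple automatic language detection

MAX_CHUNK = 25_000  # approximate character limit per translation request

def split_into_chunks(text, max_chars=MAX_CHUNK):
    """Split an article into chunks below the character limit, on paragraph boundaries."""
    chunks, current = [], ""
    for para in text.split("\n"):
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def translate_chunk(chunk, src_lang):
    """Stub for a call to the machine-translation service (Google Translate in the study)."""
    raise NotImplementedError

def translate_corpus(df):
    """Detect each article's language and translate non-English articles to English."""
    df = df.copy()
    df["lang"] = df["text"].apply(detect)
    texts_en = []
    for _, row in df.iterrows():
        if row["lang"] == "en":
            texts_en.append(row["text"])
        else:
            chunks = split_into_chunks(row["text"])
            texts_en.append(" ".join(translate_chunk(c, row["lang"]) for c in chunks))
    df["text_en"] = texts_en
    return df
```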
We carried out a manual quality assessment. In this case, for each non-English language sub-corpus, 10 texts were randomly selected, and for each we inspected the first 500 words of the original document and of its translation. In particular, we combed the texts for three types of problems that might have a possible impact on computational textual analysis. We found that there were spelling issues in the original texts, most of them resulting from OCR and encoding issues.
And some of these spelling issues in the original text might also have induced issues in the translation, so they were present in both. We also looked for inaccurate terms that were introduced by the translation —
20:52 - translation mistakes, so to speak. And we looked for OCR and encoding issues that were present in the original text and that were corrected through machine translation — improvements in text quality through machine translation. But that was done only on sample texts, and we wanted some form of quality assessment over the entire corpus, something that could be implemented algorithmically. So, to provide a systematic translation assessment over the whole non-English corpus, we chose to compare the relative distances between documents before translation and after translation.
The rationale is that documents that are close to one another in the word vector space before translation should also remain close to one another after translation. If this is the case, then bag-of-words algorithms should provide similar types of results whether they are run on the original texts or on the translated texts. And this is what we call the topology preservation test, because in other words it means measuring how similar the structure, or topology, of the document-term spaces is in the original corpus and in the translated corpus.
So we devised this topology preservation test. More specifically, the test consists in constructing the document-term matrices of the three sub-corpora in their original languages — Dutch, German and French — and doing the same for the three translations. We constructed the document-term matrices directly, without any pre-processing of the text; therefore the dimensionality was extremely high. To reduce the dimensionality of all six document-term matrices, we used singular value decomposition. We then calculated the document-document distance matrices within their respective word vector spaces for all six document-term matrices — we used Euclidean distance here — and then we measured the similarity of the original and the translated distance matrices for each sub-corpus.
And here we used similarity measures such as the Mantel coefficient, the Procrustes coefficient and the RV coefficient, and we used these coefficients as indicators of the translation accuracy for bag-of-words analysis. A minimal sketch of this test is shown below.
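Here is a minimal sketch of such a topology preservation test, assuming two aligned lists of documents (originals and their machine translations). The number of SVD dimensions is a placeholder, the Mantel statistic is computed without a permutation test, and the Procrustes coefficient (which could be computed on the reduced coordinates, e.g. with scipy.spatial.procrustes) is omitted for brevity; none of this is the study's own code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr

def reduced_distance_matrix(docs, n_dims=100):
    """Document-term matrix -> SVD-reduced space -> Euclidean document-document distances."""
    dtm = CountVectorizer().fit_transform(docs)   # no pre-processing, raw tokens
    n_dims = min(n_dims, dtm.shape[0] - 1, dtm.shape[1] - 1)
    reduced = TruncatedSVD(n_components=n_dims).fit_transform(dtm)
    return squareform(pdist(reduced, metric="euclidean"))

def mantel_coefficient(d1, d2):
    """Pearson correlation between the upper triangles of two distance matrices."""
    iu = np.triu_indices_from(d1, k=1)
    return pearsonr(d1[iu], d2[iu])[0]

def rv_coefficient(d1, d2):
    """RV coefficient between the two distance configurations (Gower double-centring)."""
    def center(d):
        j = np.eye(len(d)) - np.ones_like(d) / len(d)
        return -0.5 * j @ (d ** 2) @ j
    s1, s2 = center(d1), center(d2)
    return np.trace(s1 @ s2) / np.sqrt(np.trace(s1 @ s1) * np.trace(s2 @ s2))

# originals, translations = ... two lists of strings, same document order
# d_orig = reduced_distance_matrix(originals)
# d_trans = reduced_distance_matrix(translations)
# print(mantel_coefficient(d_orig, d_trans), rv_coefficient(d_orig, d_trans))
```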
23:47 - Now, the methods for the topic modeling itself; here things are more straightforward. The first stage, typically, is corpus pre-processing, once you have assembled all the English and all the translated sub-corpora together in a data frame. We did word tokenization with part-of-speech tagging and lemmatization to reduce the number of word variants, using TreeTagger with the Penn Treebank tagset. We added word filtering here, removing stop words and also filtering based on frequencies, and that resulted in a reduced lexicon of about 24,000 terms across all articles of the corpus.
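An illustrative pre-processing sketch follows. The study used TreeTagger with the Penn Treebank tagset; here spaCy stands in for tokenization, POS tagging and lemmatization, and the frequency cut-offs are placeholder values rather than the ones actually used.

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def lemmatize(text):
    """Tokenize, POS-tag, lemmatize, drop stop words and non-alphabetic tokens."""
    return [tok.lemma_.lower() for tok in nlp(text)
            if tok.is_alpha and not tok.is_stop]

def build_lexicon(docs, min_doc_freq=5, max_doc_prop=0.5):
    """Frequency-based filtering: keep lemmas that are neither too rare nor too common."""
    tokenized = [lemmatize(d) for d in docs]
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))
    n = len(docs)
    lexicon = {w for w, f in doc_freq.items()
               if f >= min_doc_freq and f / n <= max_doc_prop}
    return [[w for w in toks if w in lexicon] for toks in tokenized], sorted(lexicon)
```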
After the pre-processing, we implemented the topic modeling itself. As I said, we used the LDA algorithm from Blei and colleagues — one of the classical Python packages, with Gibbs sampling. And we did this with a number of topics of 25, as in the previous topic modeling that had been done. The advantage is that this is a fairly coarse-grained view, which we found quite suited to describing very general trends in the discipline over nearly a century.
But of course, much more detailed topic modeling can be implemented. The results obtained from this topic modeling stage are the 25 topics, which are probability distributions over the lexicon, and the probability distributions of the topics over all articles in the corpus. Once we had this, we did a topic interpretation by examining the most probable words in each topic and by retrieving the articles in which a given topic was in turn the most probable.
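A minimal sketch of this topic-modeling step, assuming the tokenized documents from the pre-processing sketch above: it uses the `lda` Python package (one classical collapsed Gibbs-sampling implementation, not necessarily the exact package used in the study), and the iteration count is illustrative.

```python
import numpy as np
import lda
from sklearn.feature_extraction.text import CountVectorizer

def fit_lda(tokenized_docs, n_topics=25, n_iter=2000, seed=0):
    """Fit a plain LDA model on pre-tokenized documents; returns the model and vocabulary."""
    vec = CountVectorizer(analyzer=lambda toks: toks)   # documents are already token lists
    X = vec.fit_transform(tokenized_docs)
    model = lda.LDA(n_topics=n_topics, n_iter=n_iter, random_state=seed)
    model.fit(X)
    vocab = np.array(vec.get_feature_names_out())
    return model, vocab

def top_words(model, vocab, n=10):
    """Most probable words per topic (the topic labels in the talk were assigned manually)."""
    return [vocab[np.argsort(dist)[::-1][:n]].tolist() for dist in model.topic_word_]

# model, vocab = fit_lda(tokenized_docs)
# doc_topic = model.doc_topic_        # (n_docs, n_topics) article-topic probabilities
# topic_word = model.topic_word_      # (n_topics, n_vocab) topic-word probabilities
```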
We also grouped the topics into clusters on the basis of our own expert knowledge, so as to facilitate the handling of the topics, and we used topic correlation within corpus documents to help with this manual clustering. In a subsequent step, we compared this synchronic topic modeling with the previous topic model done only on the English texts. To do this, we assessed the similarity of the new topics with the previous topics, using the Euclidean pairwise distance between the respective probability distribution vectors, over all words shared between the two lexicons.
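A sketch of this comparison step: Euclidean distances between the new and the old topic-word distributions, restricted to the words shared by the two lexicons (the array layouts are assumptions of the sketch).

```python
import numpy as np
from scipy.spatial.distance import cdist

def topic_distance_matrix(topics_new, vocab_new, topics_old, vocab_old):
    """topics_*: (n_topics, n_vocab) word-probability arrays; vocab_*: word lists."""
    pos_new = {w: i for i, w in enumerate(vocab_new)}
    pos_old = {w: i for i, w in enumerate(vocab_old)}
    shared = sorted(set(vocab_new) & set(vocab_old))
    idx_new = [pos_new[w] for w in shared]
    idx_old = [pos_old[w] for w in shared]
    # rows: new topics, columns: old topics; small values indicate aligned topics
    return cdist(topics_new[:, idx_new], topics_old[:, idx_old], metric="euclidean")
```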
The result of this step is a distance matrix between the new topics and the previous topics, which shows the alignment or non-alignment of the topics and the effect of including the non-English corpora in the study. As a sixth step, we did the diachronic analysis, and also a journal-by-journal analysis. This was simply done by adding publication years and journal metadata to the model, and aggregating the topic probability distributions of articles either published in specific time periods or published by specific journals, or even both.
The result of this step is a diachronic topic model that shows how the topics evolve through time, as well as journal profiles, and even journal profiles evolving through time. With all of this, we also did further comparisons with the previous topic model, on the journal profiles and on the diachronic view; these are more qualitative comparisons, not algorithmic ones. But this led us to investigate in more detail the pre-World War Two period, for which the proportion of non-English articles was significant.
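A sketch of this aggregation step, assuming a document-topic matrix aligned row by row with a metadata frame (the column names and the five-year slicing are illustrative choices, not the study's).

```python
import pandas as pd

def aggregate_topics(doc_topic, meta, by):
    """Average document-topic distributions over articles grouped by `by`
    (e.g. a derived time-period column or the journal column in the metadata)."""
    cols = [f"topic_{k}" for k in range(doc_topic.shape[1])]
    df = pd.DataFrame(doc_topic, columns=cols)
    df[by] = meta[by].values
    return df.groupby(by)[cols].mean()

# e.g. meta["period"] = (meta["year"] // 5) * 5       # five-year time slices
# diachronic_view  = aggregate_topics(doc_topic, meta, by="period")
# journal_profiles = aggregate_topics(doc_topic, meta, by="journal")
```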
And I'll share some of those results with you here as well. Right, so that is it for the methods.
28:10 - So now the results. What do we get? First of all, the machine translation. Very simply, the output of the machine translation was the English translations of all three sub-corpora. The peculiarity was that we had a high number of OCR and encoding issues, especially in the pre-1960s documents. We found that the documents we had obtained from JSTOR were typically plagued with question marks and other OCR issues.
28:42 - [Here the speaker reads aloud an example of OCR-garbled text.] So what we found out is that, when we looked at the question marks, there was something that we could do algorithmically: we could count the number of question marks, either inside words or outside words, in the English corpus and in the non-English sub-corpora.
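A small sketch of such a count, distinguishing question marks that appear inside words (a typical OCR artefact) from the overall total; the regular expression is a rough heuristic of my own, not the study's.

```python
import re

def question_mark_counts(text):
    """Count question marks, separating those embedded inside words (e.g. 'quest?on')."""
    inside_words = len(re.findall(r"\w\?\w", text))
    return {"inside_words": inside_words, "total": text.count("?")}

def reduction_rate(originals, translations):
    """Share of question marks removed by machine translation, over a sub-corpus."""
    before = sum(t.count("?") for t in originals)
    after = sum(t.count("?") for t in translations)
    return 1 - after / before if before else 0.0
```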
And we found out — this is what is shown here — that the translation actually reduced, on average, 80% of the question marks found in the texts. But we didn't know exactly where those question marks were reduced, and this is why we looked at the texts with some close reading in the manual quality inspection. So this is an example of a comparison, on just a very simple paragraph of a text by Louis Rougier, "La relativité de la logique," published in Erkenntnis in 1939. This is the original text that we got from JSTOR.
And as you see, there are many issues here with question marks. And this is the machine translation. As you can see, there were issues present in the original text — issues that would have impacted the topic modeling if we had done the topic modeling just in French — that were corrected by machine translation. There were others, like here, that were not corrected and stayed. And there were still some terms, like "proposition" here, that were translated as "proposals," which is not really accurate.
This is what we call a translation mistake. So you have here the three types of anomalies that we identified: anomalies that were present and got corrected, anomalies that were present and stayed, and anomalies that were introduced by the translation. We quantified this over the sample texts. Over the three types of anomalies, what we found is that, on average, machine translation left uncorrected about 3 words of anomalies per 100 words in each corpus, introduced about 1.4 words per 100 words of mistakes or distortions, and corrected about 9.4 words per 100 words. So all in all it's a net benefit, if you will, of about 8% improvement in the quality of the text from machine translation. So we were really positively surprised by this machine translation step, at least from this qualitative perspective. Then we looked at the topology preservation test itself.
These are the results of the different matrix similarity coefficients; here you see the Mantel, Procrustes and RV coefficients for the three languages. As you can see, the results are extremely good — over 0.98 for French, and in the range of 0.9 for Dutch and German. So not only did we find, on the sample texts that we studied, very good translations and even improvements in the quality of the text,
but overall it also seems that the machine translations preserved extremely well the ways the documents are grouped together in their respective vector spaces.
33:06 - So that gave us a very good level of confidence in this machine translation step. Now, the results of the topic modeling itself. Once we were confident about the machine translation, the first set of results was the 25 topics found by the model itself. A topic is nothing but a probability distribution over the lexicon, so here you see the top 10 words, which are the words that are the most likely or the most probable.
For this particular topic, the first topic, the name here is the name that we assigned to it once we interpreted the topic on the basis of its set of words and on the basis of the top articles in which the topic was most likely. So here you see the 25 labels with the 25 bags of words of top 10 words — but don't forget that, in the LDA topic model, a topic is not just a bag of words, it's a probability distribution over these words.
So some of these words are actually much more likely than others to be found in given topics.
34:33 - As we said, another way to look at the topics is to look at how topics relate to one another in documents. So this graph here, this network, represents the topics: nodes are topics, and the thickness of the edges is proportional to the correlation of topics within documents. You see that some topics tend to be found together often in specific documents throughout the corpus. So here we see a first cluster, let's say, of topics about formal themes — language, mathematics, truth, sentence — and we can imagine that this is about philosophy of language, logic, and typically philosophy of mathematics.
And this is exactly what we find when we look at the texts in which these topics are most present. Another set of topics is more epistemology-oriented, because it concerns knowledge, argument, or scientific theory. There is a third cluster here that concerns confirmation, the problem of induction, experiment, and probability. Somehow in between, there is a single topic here which is more about agents — game theory, agents, decision, these types of things.
36:12 - Another cluster here — I'm jumping through them in order, A, B, C, D, E — is about biology, philosophy of biology, with a topic evolution, and then philosophy of mind and the neurosciences with three different topics: one about perception, one about the mind, and another one specifically about the neurosciences. We have three topics somewhere in the middle here that are maybe, some would say, the core topics in the philosophy of science: explanation, causation,
and property — and here in the topic property there are a lot of articles about, for instance, supervenience, emergence, these types of things. Then we have a little cluster here that is more about philosophy of physics, with something which is more philosophy of quantum physics but also some relativity found in there, as well as another cluster which is more about atoms and chemistry, which we named particles. And then finally, a cluster of a more social-historical nature.
That includes the topic classics, in which one finds investigations of classical authors such as Galileo, Descartes or Newton, and then history and philosophy, in which you have articles that are philosophy of science but involve classical authors such as Kant or Descartes. And then there is a topic social, which involves more social aspects of science, typically. So this is one way of looking at the topics — I'm going through them very quickly.
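A sketch of how a topic-correlation network of this kind can be built from the document-topic matrix, here with networkx; the correlation threshold is an arbitrary illustrative value, and this is not the study's own code.

```python
import numpy as np
import networkx as nx

def topic_network(doc_topic, labels, threshold=0.1):
    """Nodes are topics; edge weights are correlations of topic proportions across documents."""
    corr = np.corrcoef(doc_topic.T)          # (n_topics, n_topics) correlation matrix
    g = nx.Graph()
    g.add_nodes_from(labels)
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if corr[i, j] > threshold:
                g.add_edge(labels[i], labels[j], weight=float(corr[i, j]))
    return g
```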
What we've built is a topic-map visualization tool that makes it possible to look at all the details, because there is just so much information here that I cannot convey the whole of it in a presentation. This is a map here — a scatterplot of all the documents with their dominant topic — and once you click on a particular document within a specific cluster, you get the details of the article, with its probability distribution. You can also select which of the topics you want to display in this scatterplot, and then you can also look at topic details.
So if you choose one topic here, you see a word cloud of that particular topic and its top words. The top words are again presented here, but the other topics in which each word is also present are listed as well. So the word "selection," for instance, is one of the top words of the topic evolution, but it's also present, with a much smaller probability, in the topics experiment, probability, confirmation, or agent decision, for instance. So this is something that can be used to really look at the details of the topic modeling results.
Now, just remember, we wanted to see how our previous topic modeling would be improved by the addition of the non-English texts, and so one of the core questions we had is how the new topics compare to the previous topics. This is a heat map based on the distances we measured between the old topics here and the new topics; the topics are still arranged by cluster, and the color shades are also representative of the clusters.
And as can be seen — the smallest distances are in red here — there is a fairly good alignment of the new topics with the old topics, but some differences too. So the introduction of the 6% of non-English texts did change the topic model a bit. There was a good alignment with the old topics, except one new topic here, for explanation, that did not really match the previous explanation topic properly; on the other hand, that particular new topic matched better
this other old topic here about game theory, probably because of the models being involved there — we would have to investigate that in more detail. But on the other hand, the previous explanation topic was a better fit for this new one than for any other here. So if you look from the new to the old, you don't get a good pairing; if you look from the old to the new, the pairing is maybe a bit better. What we also see is that the mapping is not exactly the same, in the sense that, for instance, concerning the cluster of confirmation and probability, we previously had two topics and the new topic model includes three topics, but there is a good matching of those two previous topics to the new topics here.
Same thing for, let's say, philosophy of mind and the neurosciences: there were two topics and now there are three topics. Philosophy of physics, on the other hand, somehow shrank a bit in the representation, meaning that the added non-English texts did not have much philosophy of physics in them. And on the other hand, they increased the number of social-historical types of topics, because the new topic model includes four such topics, whereas there were only three in the previous topic model.
So, a fairly good alignment, but still some changes. Now, what about the topic evolution itself? If we look at the diachronic view of the topic model, you see how the topics evolved through time. On the x-axis are the different time periods, from 1932 to 2017; the left axis is the probability of the topics being found in articles of the given time period; and the right axis is the number of articles per time period, shown by the dotted lines — the dark dotted line is the total number of articles and the gray dotted line is the number of English-only articles.
So you see that it was really pre-World War Two that there were some major additions, as well as later through the Journal for General Philosophy of Science. So we can analyze the trends even at this large scale. There was a relative significance of topics related to history and philosophy up until the 1960s, with a decreasing trend, and then some ups and downs for social aspects of science. Concerning philosophy of language, there was a strong decrease.
So the language topic that can be seen here — its significance and then its decrease — is linked to the significant decline of logical empiricism that we also see in the corpus. Topics such as confirmation were really significant in the 1960s: there was a slight increase of the topic probability up to the 1960s and then a stagnation. We find an increase in epistemology-related topics, seen here in all this area, in particular through the topics argument and knowledge — we would need to look in more detail at which types of articles were behind that.
In terms of philosophy of biology, there was a presence of philosophy of biology before the 1940s — actually before the 1950s — and of philosophy of mind here, then a decline, and then more developments, especially from the 1970s onward, especially in the philosophy of biology and the neurosciences. And there is a relative constancy here of that little single topic, agent decision.
45:23 - All throughout the corpus, though slightly increasing from the 1940s.
45:30 - And concerning the other two clusters, there is a relative significance of topics related to philosophy of physics, yet with a slight decreasing trend in that same cluster since the 1970s. Things have been different depending on which topic you look at, but these are the general trends that we observed. And there is a regular increase of themes related to causation, explanation and property all throughout the corpus.
So these are the broad trends that we can get through the topic model. We can explore this in detail — also in the web interface that we've built — looking at the different time slices and at specific topics in detail, so as to gather more insights, because this is again extremely rich in terms of data. Using the journal metadata, we also calculated the topical journal profiles, and this is where we see the topical distribution across the different journals, the journals being here on the x-axis.
So Erkenntnis, Synthese, the BJPS, Philosophy of Science, the EJPS, ISPS, the JGPS and SHPS Part A — all eight. And we see that the profiles are different: some, like Erkenntnis and Synthese, are quite heavy on the formal side — philosophy of language, philosophy of logic — and also on epistemology; at the opposite end, journals such as ISPS, the JGPS and SHPS have a much higher proportion of the social-historical types of topics. And looking even further into the details here, you see that some journals tend to place different emphases on some topics more than on others.
As we said, we also wanted to compare things before and after adding the non-English corpora. So this is a comparison of the topic model on the complete corpus, in the diachronic view, with the previous topic model on the English-only corpus. Of course, one of the main things was the addition of earlier publications that were not present before; we also see a slight shift here of the social-historical topics, whose level increased in that earlier time period, and also here in the 70s.
Probably this is due to the inclusion of the non-English texts, which had a somewhat different nature than the average English-only document. We did similar comparisons on a journal basis. So here you see the journal diachronic views on the complete corpus — this is the first row here. We see Erkenntnis, Synthese and the JGPS by the time periods when they were published. Erkenntnis, as you may know, was published before World War Two, was interrupted by the war and only recovered in the 1970s; Synthese was published before World War Two, got interrupted, was republished and interrupted again; and the JGPS only started in the 1970s.
What we see is that we can compare the new topic model with the previous one at the level of journals — first with the addition of new time periods, but we also see that some of the topics actually changed: for Erkenntnis, the topics on philosophy of logic and language increased here compared to where they were before. On the other hand, the trend went the other way with Synthese, in which there were few social-historical topics present before,
and there are more of these topics in the subsequent modeling. And then we also observed some changes for the JGPS.
50:08 - But we thought it was very interesting to look at the pre-World War Two period, especially in Erkenntnis and Synthese, as we saw that some of the impacts of adding the new non-English texts were strongest before World War Two.
50:26 - So we looked at author publications, and knowing the authors, we were able to impute the average contributions of authors during specific time periods. In this particular case, this is for all articles published before World War Two, up until 1941. So this depicts the contribution of each author to a given topic for that period, through his or her articles. For instance, what is really noticeable here, for Erkenntnis, is the importance — in blueish, on the right — of the formal topics, philosophy of language and mathematics and logic, with strong contributions by Reichenbach, Carnap, Neurath, Frank and Schlick, but also by the Polish logician Ajdukiewicz. This is what we see here.
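A rough sketch of how such author contributions can be imputed from the document-topic matrix and the article metadata (the column names and the single-author-per-article simplification are assumptions of the sketch, not the study's implementation).

```python
import pandas as pd

def author_contributions(doc_topic, meta, start_year, end_year):
    """Average each author's document-topic distributions over articles in a time window."""
    cols = [f"topic_{k}" for k in range(doc_topic.shape[1])]
    df = pd.DataFrame(doc_topic, columns=cols)
    df["author"] = meta["author"].values
    df["year"] = meta["year"].values
    window = df[(df["year"] >= start_year) & (df["year"] <= end_year)]
    return window.groupby("author")[cols].mean()

# e.g. contributions to the pre-war period: author_contributions(doc_topic, meta, 1930, 1941)
```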
We also see the strong contributions of the founders of the Vienna Circle, typically. So it is not surprising, somehow, that we get this out of the data. What is maybe a bit surprising is that some of these contributing authors also contributed to the more general topics called philosophy and history. This is not so much the case for Carnap, but an author like Reichenbach is somehow all over the place — for instance, he appears here in philosophy and a little bit here in history — and Neurath also contributed strongly to philosophy and to history.
To get more details on that, one needs to look at the articles as well, and this is what we did. Here, for instance, is a sample of 10 articles by some of the main contributors to Erkenntnis during this pre-World War Two period. They are just listed in alphabetical order here; these are the contributions, and these are the topics and their probability distributions. We see, for instance, that the logician Ajdukiewicz, in an article entitled "Language and Meaning," has a high proportion of the topic language here.
And on the other hand, we can see, for instance, for Neurath, that we have three articles by him here, and two of them — these two here — are very heavy on philosophy of language.
53:13 - They are about protocol sentences, and about radical physicalism and the real world. On the other hand, this other article, about the ways of the scientific world-conception, is really not heavy on these formal topics, but much more on history and philosophy, so the general types of topics. Looking at Synthese brings a totally different picture, with a totally different set of authors — much, much heavier on the philosophy and historical general types of topics, a little bit on the formal topics, but definitely not the same type of picture as Erkenntnis in this pre-World War Two period, and a radically different set of authors. We see authors such as Schoenmaekers and others.
They were actually quite prolific authors who contributed much to the topics philosophy and history, with very diverse articles on matter and thinking, on cosmogony, but also on beauty and on sin, among other things. One of the early collaborators of Synthese — not the founder, but an early collaborator — Schoenmaekers was a mathematician. But he was also a member of the Theosophical Society, like the founder of the journal, and actually many of the collaborators of Synthese at that time
were also members, as I found out was the case for the Dutch philosopher and mathematician Gerrit Mannoury, who is here, and who also contributed the most to some of the formal topics.
55:01 - But that was also the case for the German biologist and philosopher Hans Driesch, who is here, as you can see, and also for other contributors like Kruseman — I'm looking at my notes here — Kruseman was also a biologist, an admirer of Driesch.
55:26 - And he too was a member of these societies, including the International Society for Significs, which gathered several of these other contributors. So, a very different set of authors, probably with a different mindset, a different approach to philosophy of science, is depicted here when we look at the details of the pre-World War Two Synthese. Again, the same type of exercise can be done by looking at some of the articles and at how these articles contributed to the different topics.
As I said, for instance, Schoenmaekers contributed a very diverse range of articles — on thought, on beauty, on sin — at that time; Kruseman contributed articles, for instance, on organisms and society, which contribute a little bit to the biology topics here, and so on and so forth. Moving on now to the third journal that we thought was interesting to look at in the pre-war period, which was the third journal published at that time: Philosophy of Science, published in English.
And here, again, a different picture, with a radically different set of authors. We see a picture that is still dominated by these historical-philosophical general types of topics, but with a bit more of the formal philosophy of language and logic topics — though definitely not as much as in Erkenntnis. The main authors included Malisoff, here.
57:14 - So Malisoff was the founder of, and the first editor of, Philosophy of Science, and he himself contributed a lot to the early Philosophy of Science. This is something different from what we saw in Synthese, where the founder did not contribute to the early issues. So Malisoff is present here across many topics: he wrote about philosophy, about physics, about many different things — he's somehow all over the place.
So he addressed a very broad range of questions — see also the topic particles and even the topic of quantum mechanics. A few other figures also emerge if you look at some of the strongest contributors: for instance, David Miller, who contributed much to the topic philosophy, as well as Charles Hartshorne — sorry for the pronunciation — a known philosopher of religion and metaphysician of the time.
On the other hand, logicians such as Louis Kattsoff, here, or Henry Smith contributed to philosophy of language and logic related topics. As you can see, the contributions are maybe a bit more scattered here, spread across the different topics. So all in all, the same exercise can be carried out here, looking at the different articles and how these articles contributed to specific topics at that particular time, to see the richness behind the model itself.
So all in all, the addition of the non-English texts shows a pre-World War Two philosophy of science that is somehow much more nuanced than what we had when we just looked at the English texts. It also shows very strongly the specificity of the three journals — Erkenntnis, Synthese and Philosophy of Science — and their very different sets of authors throughout the period. So I'll just take a few minutes, maybe, to wave in the direction of some discussion and bring up some topics for discussion, especially concerning the methods, but also the results.
59:58 - What do we get out of this regarding the usefulness of machine translation for bag-of-words analysis? Both the manual inspection and the topology preservation test provide good reasons to trust machine translation for bag-of-words analysis. We had chosen the Google Translate service for consistency with previous studies by De Vries and colleagues, but other machine translation services may offer equally valid solutions. We also found that machine translation is of great help for fixing OCR-related or encoding issues.
Yeah, this is a free benefit of the machine translation. But interestingly, it also preserves word ordering, so it should be adequate for other analyses, not just bag-of-words analysis — analyses that rely on word ordering in texts, such as collocations, co-occurrences, sentiment analysis, or even word embeddings. The topology preservation test that we propose is of course only a necessary condition for reliable translations, not sufficient per se.
So it does not guarantee that the translation worked, but it provides something comforting; it somehow increases our level of confidence that the translations worked well. And this is actually one of the only things you can get if you have no reference translation to check the translations against, which is the case for non-parallel multilingual corpora.
61:32 - So we believe this is an interesting new way of systematically checking whether the translation went well or not, without having to look at all the details of the corpus itself. What we got out of it is more accurate topic modeling, and that was the motivation for including the non-English texts. Of course, a comment that can be made is that we only looked at eight journals.
And philosophy of science is published in many other places besides these major journals — sometimes in more specialized ones, depending on the scientific discipline: philosophy of physics, also philosophy of biology, especially as time went on and more and more journals got founded. There's also a lot of philosophy of science published in numerous monographs and edited volumes. So all in all — this is just a warning, of course — the results should be interpreted in light of this corpus-related limitation: we only looked at eight journals.
But we believe that the representativeness of the selected journals, which are among the most central journals publishing general philosophy of science, lends confidence that the topical trends that were observed did capture meaningful disciplinary patterns, at least at this level. But of course, comments can be made on the corpus itself. On the topic model, different comments can be made as well: we could have chosen different algorithms, but we don't believe this would have changed much.
One of the things that does change the models a lot is the number of topics, so choosing the number of topics is something really crucial. Here we chose 25 topics so as to facilitate comparisons with the previous English-only topic model, and as I said, this has the advantage of offering a fairly coarse-grained view, which suits the purpose of sketching a discipline's portrait over the course of more or less eight or nine decades.
But in the previous topic model, we had also tested different types of models. We did, by trial and error, runs of topic models with different values for the number of topics — some smaller, like 12 or 15 topics, up to 50, 100, 150 or 200 topics — and in the end we settled on 25. Of course, fine-grained topic models, like topic models with 100, 150 or 200 topics, would offer much more detail; this is what we did in a previous study, the one that we published in HOPOS. But here, the choice of the number of topics is a choice of granularity.
Ultimately, it depends on your research questions. What is also key is being able to interpret the topics, and of course one should always bear in mind that, even though the LDA topic model is a generative model that builds up topics from the corpora, expert knowledge is always needed to interpret the topics and the results of the algorithm. Then, a final comment about philosophy of science itself: the translated texts, as we saw, most affected the early decades of the diachronic topic model, and somehow the topical profiles of the three journals Erkenntnis, Synthese and the JGPS.
As we saw, these texts amounted to 6% of the total corpus, but the share rose to 54% before World War Two. Of course, topic modeling only provides a descriptive view of the topical content of a corpus; it cannot explain the observed facts. This is again an area that the researcher has to fill in with specific knowledge of the field, or that can lend itself to further investigations, because there is a wealth of ways in which the data can be further investigated.
And, you know, as possibilities here, the changes themselves may be due to different factors: they can be researcher-driven, they can be driven by disciplinary dynamics, they can be driven by extra-disciplinary dynamics still within science, or they can be driven by extra-scientific factors, including funding policies or broader historical or sociological factors. So understanding the whys behind the changes in topic probabilities is something that definitely requires further investigation, beyond the topic model.
So just to conclude now, as my time's up: what can we say more generally about computational text-mining approaches in philosophy? We believe they're extremely powerful for studying large corpora, and here we wanted to show that they can even be implemented on non-parallel multilingual corpora with the help of machine translation. Many different types of analysis are possible — topic analyses are one, but you can also do many analyses depending on the metadata that you add to the topic analysis: diachronic analyses, journal analyses, author analyses; you can also do conceptual analyses by focusing on specific words; you can do author network analyses, and so forth.
They're useful in a descriptive way — they describe what is in the corpus. They're also useful in a heuristic way, potentially, by pointing to areas worth further investigation through classical close reading methods, for instance. I think they can also have a justificatory usefulness, in the sense that they provide an empirical grounding to claims that may otherwise be quite informal. So with that, let me thank my co-authors on this work, the master's student who designed the website, and two other students who helped with the German and Dutch texts, as well as the publishers, institutions and funding agencies.
Fantastic, thanks so much. Questions have been absolutely pouring in, so I'm going to get right to it. There are 11 in the Q&A box, so I'll even preemptively ask you to try to be a little quick, so we can see if we can get through them all.
68:36 - Sure, I'll try — quick answers should be fast enough, though that is tough.
68:39 - A couple of questions from me first. So just to be clear, those error and improvement rate estimates that you were mentioning — that's just you had people manually sit and read through a selection of these articles? Right.
68:49 - Right, okay. And someone else — Travis — also mentioned this in a comment on that question: how, in some cases, were you evaluating? I mean, the question of translation error can be kind of difficult in a philosophical context, right — knowing what the right translation is. Were they all pretty obvious, or were there some judgment calls? No, no, they were not all obvious. And, you know, it's more to get an order of magnitude of where we were, because with some code we were able to assess the total number of question marks that were eliminated through the machine translation.
So that gave us an indication that, yes, machine translation eliminated a lot of the question marks — but did it really improve the text? This is where we had to look at some of the details somehow. And what was really easy was to track problems in the original text that were corrected — a question mark inserted in the middle of a word because of an encoding issue, and eliminated in the translation; most of those were not ambiguous at all.
The translation mistakes were more difficult. For the one I showed, someone could say, well, yes, it's not exactly the same word, so therefore the meaning is going to be different. But we were, I would say, conservative, so we probably attributed more translation errors to Google Translate than would be meaningful. But the key point is that, by looking at these randomly chosen excerpts, we were able to really understand where the machine translation made improvements and where it potentially introduced some mistakes.
But the numbers that we measured manually were just made on some sample texts, and they're just, I would say, orders of magnitude. Sure, sure.
70:58 - All right. Next question, from Stefan, who asks: he's curious whether there were any papers containing none of, or at least not highly representing any of, the 25 topics — for example, wondering about the apparent absence of chemistry in the topic model.
71:16 - Sorry, what about chemistry? So there's not really a topic for chemistry — so is there a way to look at papers? Were there some papers that seemed not to carry a very strong signal for any of the topics in the topic model? Did you see any like that? Yes, but it's true that the topic distribution per article can be a bit flat; typically, though, the LDA tends to arrange things so that some topics are more represented in articles.
So for instance, chemistry would be in the particles where we call particles, because there is talk about atoms that molecules. So looking at looking at the details with that particular topic, you would find articles on in chemistry, but we did not look systematically at all articles would have rather flat distribution of topics.
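To make the "flat distribution" idea concrete, here is a minimal sketch of flagging articles with no dominant topic via the normalized entropy of their topic mixture. It assumes one already has a document-topic matrix from an LDA run; the data below are synthetic stand-ins, and the threshold is arbitrary.

```python
# Sketch: flag articles whose topic mixture is "flat" (no dominant topic),
# using the normalized entropy of the per-document topic distribution.
import numpy as np

def flatness(doc_topic: np.ndarray) -> np.ndarray:
    """Normalized entropy per document: 1.0 = perfectly flat, 0.0 = single topic."""
    p = np.clip(doc_topic, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return entropy / np.log(doc_topic.shape[1])

# Toy stand-in for a real (n_docs x 25) document-topic matrix from LDA.
rng = np.random.default_rng(0)
doc_topic = rng.dirichlet(alpha=np.full(25, 0.1), size=1000)

scores = flatness(doc_topic)
flat_articles = np.where(scores > 0.9)[0]   # threshold chosen for illustration
print(f"{len(flat_articles)} articles with a rather flat topic mixture")
```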
72:21 - Okay, next question coming in. So, have you evaluated other methods? Why the choice to use traditional bag-of-words representations instead of, for example, transformer embeddings, which might seem a bit more context-aware? That would be kind of in line with some of the other ideas that are in the talk.
72:41 - Yeah, the LDA works very well, so we didn't see the need to implement a more sophisticated word-embedding, machine-learning type of tool here. I'd say the advantage of vanilla LDA is that it's very simple. Well, it's complex somehow, but compared to others it's still a very simple algorithm, and it's very well proven; it's been used in many different studies. So in terms of acceptance, it's extremely well accepted by other scholars. The point also is that we had previously done a study using the original LDA, so we wanted to be able to compare. And the other point is that the earlier study that tested machine translation and topic modeling on parallel corpora also used the original vanilla LDA, and we wanted to be able to compare our results to theirs, to contribute to the same type of work here.

73:49 - So this is where we didn't look further. I mean, we did test different types of models, different ways of doing topic modeling, even the diachronic ones, on our machines. There are some improvements, but I don't think they're very significant. Maybe something like word embeddings could be used for other studies, but we chose not to here, for the reasons I just mentioned.
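By way of illustration, a vanilla LDA run of the kind discussed can be set up in a few lines with gensim. The toy documents and parameter values below are placeholders, not the authors' actual corpus or settings.

```python
# Minimal sketch of a vanilla LDA topic model with gensim.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# 'documents' stands in for the lemmatized, machine-translated articles.
documents = [
    ["explanation", "causal", "model", "mechanism"],
    ["probability", "induction", "confirmation", "evidence"],
    ["species", "selection", "fitness", "gene"],
]

dictionary = Dictionary(documents)
bow_corpus = [dictionary.doc2bow(doc) for doc in documents]

lda = LdaModel(corpus=bow_corpus, id2word=dictionary,
               num_topics=25, passes=10, random_state=42)

# Inspect the top words of a few topics.
for topic_id, words in lda.show_topics(num_topics=5, num_words=8, formatted=False):
    print(topic_id, [w for w, _ in words])
```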
74:27 - Sure. Next up, a really great question from Susan Hudson, who will be our keynote speaker at the start of the day tomorrow: how would you assess the potentially negative impact on the discipline of philosophy of science of the move toward publishing only or mainly in English? Is there a way to approach that question with this kind of analysis?

This is not something that is shown in the topic model. The topic model shows certain trends, but the trends that we observe are not necessarily caused by the shift to English only. So we have to be careful here: we only observe certain distributions of probabilities, we do not have access to the causes behind the shifts that we see. That is something that would need to be investigated.
75:34 - And the field is actually much more complex than the eight journals that we selected, so there is only so much we can say about diachronic changes. Probably what we can say is that there was a style of doing philosophy pre-World War II, or even up to the 1950s or '60s, that changed afterwards. I'm not sure it was only because of the shift to English; we would have to see whether similar changes would show up in a French-only corpus or a German-only corpus. But it did change, and other authors, like Richardson for instance, or Giere, have proposed that the field professionalized itself and therefore went through a significant transformation phase. That was probably a significant factor in the change we observed in the topic distributions, together with editorial policies, let's be clear, because editorial policies have a strong impact, one that we see not on the overall diachronic picture but on the journal profiles.
77:04 - Great, thanks. Now, back to some more technical questions. Eugenio Petrovich, actually our next speaker, asks: did you experiment with other distance measures in addition to Euclidean distance, cosine distance for example?

Yes, we did, with similar results.

Cool. All right.

We used the Euclidean distance here for consistency throughout, but the results were similar, especially for the topic-to-topic similarity measures, previous topic to new topic, where we used the Euclidean distance on the topic word vectors. We also did this on the topic distribution patterns over articles, and the results were also similar.

77:56 - Not exactly the same, but very similar. So that didn't change anything.
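A minimal sketch of the kind of comparison being described: computing both Euclidean and cosine distances between two topics' word-probability vectors. The vectors here are random stand-ins, not actual topics from the study.

```python
# Sketch: compare Euclidean and cosine distances between two topics'
# word-probability vectors (rows of a topic-word matrix).
import numpy as np
from scipy.spatial.distance import euclidean, cosine

rng = np.random.default_rng(1)
topic_previous = rng.dirichlet(np.full(5000, 0.01))  # stand-in: topic from the earlier model
topic_new = rng.dirichlet(np.full(5000, 0.01))       # stand-in: topic from the new model

print("Euclidean:", euclidean(topic_previous, topic_new))
print("Cosine:   ", cosine(topic_previous, topic_new))
```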
78:01 - Great, thanks. Question from Luca Rivelli. Oh, this is cool: what about measuring topology preservation on machine-translated text, on round-trip machine-translated text, from the original language out and back? Did you try that at all?

No, we didn't do that, but we could. We thought about taking the English texts through translation into another language and back, especially because we found out that some of the earlier publications in English also included some question marks and OCR or encoding issues, though much fewer than in German or in French. But we didn't do it. What we thought would be really interesting would be to do the topology preservation test on a parallel corpus, say on the European Parliament corpus, because there you have a gold-standard human translation and also a machine translation. But to implement the topology preservation test we would need access to the entire raw corpus, and we didn't have access, we didn't ask for access. It would be interesting, though, because in that particular case we would be able to have the results of the topology preservation test together with the similarity metrics that the earlier study implemented. But the similarity metric they implemented was between the machine translation and the human translation in English: they were comparing English to English, machine translation to expert translation, but they were not comparing the machine translation or the expert translation to the original text, as the topology preservation test does.
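One plausible way such a topology-preservation comparison could be set up, not necessarily the exact procedure used in the study: fit topic models separately on the original and the translated (or round-tripped) versions of the same documents, and check whether pairwise document-to-document distances are preserved. The data below are synthetic stand-ins.

```python
# Sketch: correlate document-document distance structures from two topic
# models over the same set of documents (e.g. originals vs. translations).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
doc_topic_a = rng.dirichlet(np.full(25, 0.1), size=200)  # stand-in: model on original texts
doc_topic_b = rng.dirichlet(np.full(25, 0.1), size=200)  # stand-in: model on translated texts

dist_a = pdist(doc_topic_a, metric="euclidean")
dist_b = pdist(doc_topic_b, metric="euclidean")

rho, _ = spearmanr(dist_a, dist_b)
print(f"Rank correlation of pairwise distances: {rho:.3f}")
```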
80:03 - Sure, sure. Okay, next question, about cleaning, from another attendee, who asks: did you look into some of the newer methods, I was actually googling this during the talk, machine-learning methods like the Python library autocorrect, to parse through some of these OCR errors as typos, as a pre-processing step?

80:26 - We looked at them a posteriori, but we did not implement them. It was a bit unfortunate, but we only found out about the errors, about the amount of errors, somehow afterwards, through the translation, and discovered that the machine translation corrected a lot of them. Of course, there are many tools available to patch these issues. We thought it was something interesting to point out: it was somehow our initial mistake not to have found out about this earlier, but on the other hand we found that machine translation was able to fix it quite significantly, and in ways that would actually be very satisfying for bag-of-words approaches, or even for more than bag-of-words approaches. But yes, thank you for pointing this out; there are other methods for correcting those issues.
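For reference, the library the questioner mentions can be dropped into a pre-processing step along these lines; this is a minimal sketch for illustration only, not something used in the study.

```python
# Sketch of word-level typo correction with the 'autocorrect' library
# as a pre-processing step.
from autocorrect import Speller

spell = Speller(lang="en")   # models for several other languages also exist
noisy = "The theroy of explanaton was critizised by several authors."
print(spell(noisy))          # corrects simple word-level typos
```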
81:28 - That’s That’s really funny. One of these cases of serendipitous discovery from from these tools. Yeah. Question from Stefan linguist who asks some curious about the methodological question of how to settle on a number of topics. I know that interpretability is important, but are there advantages, perhaps to presenting one’s results at multiple levels of grain? Yes, you could, you could do that. And this is what we’re thinking about doing with the tool that we showed online, you know, the web based visualization, the thing is, the topic model is already extremely rich in the amount of information.
And if you show two topic models at different scales, then you know, you need to explain even much more, and potentially, you need to investigate much more, or you need them to do a high level, very fine grained topic model to explore only some very specific topics. And this is something that we’ve done, for instance, when we we did a topic model of over 100 topics of just the general philosophy of science, and when you do this, you see some, you know, very fine grained topics that are really about for instance, some some models of explanation.
So you see the topic about for instance, the how the dn model handles the model developed in the 50s 60s. And then its popularity went down and then there were there is another topic about much more general causal modeling, and that that goes up and then goes a bit down. And then there is another one that goes up all the mechanistic model of explanation that goes up the start in the late 1990s 2000, you know, and still on the on the rise. So, you see is a state of details.
So, I would recommend to do a fine grained topic modeling, if you want that particular type of detail. If you want to investigate, for instance, what were the different research topics in philosophy of science concerning, you know, say, explanations or models or concerning causation, for instance. And then which were the some of the key articles that appeared at at rich, which point in time that you can, you can really do but presenting simultaneously, you know, fine grain and the coarse grain is also possible, but you’d need a mapping tool somehow to be able to investigate, you know, see all the details and find out what is interesting in each one of these models.
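As a sketch of how one might weigh different grain levels in practice, a common heuristic is to fit LDA at several topic counts and compare topic coherence. The toy documents and topic counts below are placeholders, not the study's settings.

```python
# Sketch: fit LDA at several grain levels and compare topic coherence.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Toy documents standing in for the pre-processed, translated articles.
documents = [
    ["explanation", "causal", "model", "mechanism"],
    ["probability", "induction", "confirmation", "evidence"],
    ["species", "selection", "fitness", "gene"],
]
dictionary = Dictionary(documents)
bow = [dictionary.doc2bow(d) for d in documents]

for k in (25, 50, 100):
    lda = LdaModel(bow, id2word=dictionary, num_topics=k, passes=10, random_state=0)
    cm = CoherenceModel(model=lda, corpus=bow, dictionary=dictionary,
                        coherence="u_mass")
    print(f"k={k:>3}  coherence={cm.get_coherence():.3f}")
```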
So such models do not lend themselves to publication in traditional articles, really.

That I can imagine, and I have some experience there, yes. Question coming in from Arlie Beliveau, who writes: do you have any plans to test this with non-European languages? Do you know of any corpora that might be available for that?

We haven't planned to do it; I'd be very happy to learn that others would like to do it. I mean, we know that some multilingual topic modeling has also been done on non-Occidental languages, but we haven't done it ourselves.
85:05 - Okay. Another question that was upvoted, from Petrovich again, who asks: did you check the overlap between the sets of authors for each journal, some kind of way to measure that as a quantitative analysis?

No, and this is a good point: we did not implement a metric to do this systematically. But some things can already be seen in these hierarchical diagrams here. For instance, you see Carnap appears in Philosophy of Science along these lines, and this is the late Carnap, in the 1930s, after he emigrated to the United States; before that, he was over here. So you see some of the overlaps, but you see them qualitatively. We did not implement a metric, but that would be something good to do. You have to be aware that working on authors still requires a lot of curation work, because authors may be spelled differently in different articles, and you also have the problem of multi-authorship. So yeah, there's a lot of work.
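A simple quantitative version of that check, once the author names have been cleaned, is the Jaccard overlap between per-journal author sets; here is a minimal sketch with placeholder names.

```python
# Sketch: quantify author overlap between journals with Jaccard similarity
# over cleaned author-name sets (names below are placeholders).
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

authors_by_journal = {
    "Erkenntnis": {"carnap r", "reichenbach h", "hempel c"},
    "Philosophy of Science": {"carnap r", "nagel e", "hempel c"},
}

journals = list(authors_by_journal)
for i, j1 in enumerate(journals):
    for j2 in journals[i + 1:]:
        sim = jaccard(authors_by_journal[j1], authors_by_journal[j2])
        print(f"{j1} vs {j2}: {sim:.2f}")
```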
86:25 - Sure, sure. Actually, hang on, I'm going to piggyback there with a clarification question. For these diagrams, did you actually sit down, I guess that's why the time period is a little limited as well, did you actually sit down and clean these author lists by hand?

Yeah.

Oof. Yeah.
86:40 - Okay. Wow. Yeah, that's a lot. One last question, from Stefan, and we might actually have time for one more if somebody wants to add one into the box. So, regarding monographs versus journal articles: do you think there's some way to produce an estimate of how both forms of publication might have developed during the period in question?

How would you get access? I mean, for journals there are ways of doing it, but having access to the monographs or edited volumes, that would be, I think, quite awkward; I don't spontaneously see any easy solution to this. Something like Google Scholar maybe, or a search on a book-finder database, because they're multilingual, to be able to retrieve the titles. But then there would be a massive amount of work to not only retrieve the titles of all the publications but also retrieve their content, to be able to sift through it. Computationally as well, that would be something quite significant. But it's true, you would also want a view of what is happening in the book portion of philosophy of science. Unfortunately I don't have an easy solution, so if you have one, I'd be really happy to hear about it.
88:32 - Yeah, yeah. There's always a data access question here. With that, let me let the broadcast catch up a little bit and see if anyone has one final question. If you have a quick one, we can get it in.
88:48 - We have a minute left in the time slot; this was actually quite nice, we've lined up very well. Seeing that, let me go ahead and thank everybody; I think we'll call it there. Very nice on timing. Thanks very much, this was a fantastic talk. You can go back and look at the chat later.
89:14 - That’s been it’s been active in everybody is a passing on many thanks. So fantastic, fantastic stuff. And I’m looking forward to being able to play with that website at some point too. So that’s gonna be that’s gonna be really fun.
89:27 - Thank you again, Charles and Luca.

Oh, you're very welcome. We're very happy that everything's been going so well. We'll be back in five minutes with our next talk. Thanks very much.