Keynote: R Spatial
Aug 8, 2021 03:00 · 9965 words · 47 minute read
Edzer Pebesma: …of the Spatial data and how to handle them.
00:13 - So yeah, I will talk quite a bit about how maps are plotted, and about the question of what straight lines mean when you are facing a sphere.
00:27 - I will say something about handling large spatial data sets with R, say a little bit about the lifecycle of R Spatial packages, and say something about the R Spatial community and how we interact with R communities.
00:39 - Here is the link to the slides, and I will try to copy that into the chat for everyone.
00:54 - I don’t think there is an option to… share it with everyone, but to the panelists, maybe they could forward it to somewhere else.
01:03 - From there, you will find basically the…
01:10 - the slides themselves and also the R Markdown, with the R code that was used to generate most of the pictures, although now and then I cheated here and there.
01:21 - Yeah. So how do we plot maps? This is an interesting question.
01:24 - The problem is that unless you’re actually in the business of creating globes, basically everything you do is a two-dimensional plot of Earth-bound data, and that involves projection, right? The earth is round. It’s a sphere, and anything that you look at, even the little animation on the bottom left, is a projection.
01:50 - And that’s a 2D thing of something that is not 2D.
01:54 - So that is a problem. And so, you quickly run into this problem.
01:59 - So I have an example here of, you know, a fairly arbitrary, but well-known and respected data scientist active on Twitter, trying to make maps and running into the problem of…
02:14 - of having spatial data and having to make a map, and the problem is like: okay, what am I going to do here? It was a nice thread, with follow-up advice and so on.
02:24 - It was very helpful. It is good, but anyone doing this basically runs into, you know, creating something and then basically you ask what to do and you think, oh, well, there might be a plot command.
02:35 - So let’s try this, let’s try to plot this map, right? World map and we plot it. And then we see this.
02:42 - And then you think, oh my luck, you know, I’m done, right? Here’s a world map and everyone recognizes this as a world map.
02:50 - Of course, there are a couple of issues with it.
02:53 - I mean, besides that it is flat. The thing I learned in primary school, actually, looking at world maps, is that this island here, the island of New Guinea, is kind of half the size of Greenland, right? And if that is not the case, then something is wrong with that map, right? So you see that there is a strong distortion here. Of course you also see it if you look at Antarctica, which is the largest continent on this map, which it isn’t, right? Everyone knows that.
03:22 - So, there are some weird distortions going on.
03:25 - And essentially, where this came from, why we do it this way, I tried to dig that up.
03:30 - And, you know, I always thought that I came up with it myself, but of course I didn’t.
03:35 - So there is this package maps and the package maps is really an ancient R package.
03:40 - It says something like, you know, 2003 for the first version, but it’s probably older.
03:45 - This is probably where the CRAN archive starts.
03:48 - And it says based on older work by – I think…
03:53 - Richard Becker and Allan Wilks, or something like that in S plus.
03:58 - So this is really from the old S plus days.
04:01 - And they made maps like this: you would say map("world"), and you got this map.
04:07 - And it had an argument projection. Then it had possibilities to use the mapproj library to do projections.
04:14 - It said this: the default is to use a rectangular projection with the aspect ratio chosen so that longitude and latitude scales are equivalent at the center of the picture, right? That is what we see here: basically, longitude and latitude are mapped to X and Y.
04:30 - And so on the equator, one kilometer east is as long as one kilometer north, and off the equator this is not true, right? Things are stretched, really stretched, in the east-west direction.
04:42 - And so you see that the United States, for instance, has a very different, much more elongated shape here than here, because for this map, again, the aspect ratio is chosen such that in the center of the map one mile north will be as long as one mile east, right? So we have at least some kind of, you know, useful scale for limited areas.
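The east-west stretching described here can be put into numbers. What follows is only a minimal sketch, not code from the talk: it assumes a spherical earth, uses Python purely for illustration, and the function name is hypothetical. On a map where longitude and latitude are plotted directly as X and Y, one degree of longitude always gets the same width on paper, while on the ground it shrinks with the cosine of the latitude.

```python
import math

def east_west_stretch(lat_deg):
    """Factor by which a lon/lat-as-x/y map stretches east-west distances
    relative to north-south distances at the given latitude.
    Hypothetical helper; assumes a spherical earth."""
    return 1.0 / math.cos(math.radians(lat_deg))

# The factor is 1 on the equator and grows without bound towards the poles.
for lat in (0, 40, 60, 80):
    print(f"latitude {lat:2d}: east-west stretch x{east_west_stretch(lat):.2f}")
```

Choosing the aspect ratio so that scales match at the center of the picture amounts to dividing this factor out at one latitude only; away from that latitude the mismatch comes back.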
05:06 - And for the shape, we have to see whether this is, you know, this is a good way of displaying things.
05:13 - In any case, this is where it came from and this is now what we do.
05:17 - And it’s just one of the many things you can do.
05:18 - There is a very nice xkcd, where Randall Munroe comments on the…
05:24 - on this particular projection, which we call Plate Carree.
05:27 - You think this one is fine. You like how X and Y map to latitude and longitude.
05:31 - The other projections over-complicate things.
05:34 - You want me to stop asking about maps so you can enjoy dinner.
05:37 - And this was probably, you know, the authors of this maps package 40 years ago, and me almost 20 years ago, thinking: oh yeah, this is easy, this is cool, and so on.
05:48 - Of course you can go for a globe, but Munroe was not so positive about that either, right? Being sarcastic there.
05:57 - So there is a, you know, a couple of things with projections.
06:00 - The thing is that for small areas this is usually not such a big issue, because you have pretty much flat space anyway.
06:07 - You don’t notice that the earth is round. And projections cannot preserve distances. Yeah.
06:13 - So you lose that in any case, and then a projection can preserve something else.
06:16 - You can either preserve area or shape or directions, or some compromise of these things.
06:23 - And another aspect is that there are so many projections.
06:26 - And, you know, even nowadays people create new ones.
06:31 - There’s also no need for the north to be up or for Europe to be in the middle, right? These are arbitrary choices.
06:37 - But doing, you know, a random rotation of the earth and then sort of unfolding it into a projection is often hard to read.
06:44 - It’s a nice exercise, but it’s not… easy to communicate things like that.
06:49 - So here are a couple of alternatives. This is the one that we looked at, which is the default you get from sf, from sp, and from geom_sf in ggplot2.
06:58 - This is also one that everyone knows; it’s the Web Mercator.
07:02 - And it’s the one used in leaflet and mapview, or any kind of web interface that uses these tile backgrounds.
07:11 - Google Maps has this as one of the options that is basically where it came from.
07:16 - And you can see there’s much more distortion here.
07:19 - You see it if you look at the relative size of this island, Greenland. But if you zoom in and you look at local areas, it’s pretty good because it preserves shapes, right? It’s not like here, where everything becomes very flattened.
07:32 - So for the purpose of doing local analysis, Web Mercator is not even such a bad idea.
07:39 - But if we’re looking at creating global maps, creating political maps, for instance, it is of course terrible, because Greenland looks like the largest thing, and Antarctica, you can’t even draw it. It kind of disappears because it gets too large.
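Why Web Mercator keeps local shapes but inflates areas can be sketched with the standard spherical Mercator formulas. This is an illustrative sketch only: the function names are hypothetical and the code is not taken from the talk. The projection scales distances by 1/cos(latitude) in both directions, so shapes survive locally while areas grow with the square of that factor.

```python
import math

R = 6378137.0  # sphere radius in metres, as conventionally used for Web Mercator

def web_mercator(lon_deg, lat_deg):
    """Spherical Mercator forward projection; returns (x, y) in metres."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))

def area_inflation(lat_deg):
    """Local area exaggeration relative to the equator: (1 / cos(lat))^2."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

for lat in (0, 45, 60, 75, 85):
    print(f"latitude {lat:2d}: areas drawn x{area_inflation(lat):.1f} too large")
```

The y coordinate diverges at the poles, which is why Antarctica cannot be drawn completely and web maps cut off at roughly 85 degrees latitude.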
07:53 - So this doesn’t work for global maps. There are alternatives, for instance the Equal Earth projection and Eckert IV, which is used by the tmap package.
08:03 - And these are some of the very good alternatives that in any case preserve area, as you can see here if you add these Tissot indicatrices, which are basically circles that have an equal shape and size when you would draw them on the globe, right? And here they are projected, and then you can see the deformation.
08:26 - You can see that they get elongated on the Plate Carree, and that they keep the same shape but get much, much larger when you look at Web Mercator.
08:35 - If we look closely, you see that they get different shapes, but they keep the same area on Equal Earth and pretty much also on Eckert IV.
08:45 - So these are – these would be good alternatives.
08:48 - Looking at more regional maps: you know, we don’t always make global maps, but projections are also a problem if we do this for large regions, for instance for a continent, and then of course the most extreme example would be Antarctica, right? If we plot it in equidistant cylindrical, we get something like this, where we have this entire line, which is supposed to be the South Pole, right? Anything else, Lambert equal area or Orthographic, which is basically the globe view but then centered on Antarctica, would give something like this.
09:18 - And for North America, we see something like this, where Greenland, again, is incredibly exaggerated, and the Lambert equal area or Orthographic projections actually look much nicer in the sense that the area proportions are more realistic. Yeah. So we get much better views with these.
09:39 - So then the question comes: what is a straight line? How do we deal with straight lines in spatial data? The thing is that with simple features, which is basically the way we nowadays handle all spatial data, like points, lines, and polygons, in any case lines and polygons, we say things are features.
09:58 - Features means there’s an abstraction of a real-world phenomenon that can be anything like a house or a parcel or a country or a region.
10:05 - And it has in any case geometrical properties, and for simple features, “simple” really means that we describe the geometric attributes piecewise…
10:16 - by straight lines or planar interpolation between sets of points. So we have a line – we have a curve essentially that we approximate by…
10:25 - piecewise, by sections of straight lines, right? Otherwise, we can’t handle it.
10:30 - That is essentially what it comes back to. So why is this such a big deal? Well, it’s a big deal because straight lines after reprojection are no longer straight lines.
10:38 - So you have to ask, you have to wonder where, in which projection are they straight, because if you do something else, they are no longer straight.
10:46 - And then there is, for instance, the GeoJSON IETF standard, which prescribes how GeoJSON, which is a JSON format popular with web developers, should handle spatial data.
10:59 - And they say a line between two positions is a straight Cartesian line, the shortest line between those two points in the coordinate reference system and in the next line or in the next section or something like that.
11:10 - They say the coordinate reference system is the geographic coordinate reference systems, meaning degrees, latitude, longitude using WGS 84 as the datum.
11:19 - Yeah. So, these are latitude, longitude degrees, yeah, but assuming straight Cartesian lines essentially in this space, right, in the space of this projection.
11:31 - And that is an interesting finding. And the question is whether, for instance, GeoJSON users realize it, because of course you often, you know, have a database and you’re comfortable with it, you take the data out and you use basically something as an intermediate format and then push it into some web application.
11:54 - Then, you know, that’s the question, whether everyone realizes that the standard basically assumes that.
12:00 - So looking at a figure here, a very contrived example: if we have a straight line between two points on Antarctica in this particular projection and we project that line, then that line should actually look like this, which is basically more than a half circle around the South Pole, right? It looks like a straight line here and looks like a curved line here.
12:19 - And then the other way around, if we have a straight line on this Lambert equal area projection, or if we have a great circle, which is basically the shortest distance over the sphere between these two points, and we project it back to here, it looks like this, and you see that it crosses the antimeridian.
12:37 - So it goes basically from one half to the other half and continues here.
12:42 - So how do we do these kinds of actions, right? If we would just say, we take these two points and we say, it’s a line, yeah and we project that line.
12:51 - We say, we basically project these two points and we get this line out, right? Which is an entirely different thing, right? We want to get this one out. So we do that by adding points on that straight line. Yeah. And then basically assuming, short straight lines between these points and then transforming all these points.
13:09 - And then we have a new shape. We have a curved line here, which is a sequence of small straight pieces where, you know, there should be a curve.
13:17 - But if we take those as straight, this still works out, right? And the other way around, we basically add nodes here and we get something out there.
13:26 - Yeah. So we can always add nodes and we can also remove them.
13:29 - We could simplify this one to, you know, these two end points. But then…
13:35 - if we transform that again, then of course that leads to confusion.
13:39 - You don’t end up with the line that you basically have in mind, it’s moderate approach…
13:44 - Yeah. So these are things to do, to be aware of and to take care of.
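The effect of adding nodes before projecting can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two points are made up, the function names are hypothetical, and a South Pole centred azimuthal equidistant projection on a unit sphere stands in for the projections shown on the slides. The straight longitude/latitude line between the two points follows the 80°S parallel through longitude 0; projecting only the end points gives a chord that passes on the other side of the pole.

```python
import math

def densify(p, q, n):
    """Return n + 1 evenly spaced nodes on the straight lon/lat line from p to q."""
    return [(p[0] + (q[0] - p[0]) * i / n,
             p[1] + (q[1] - p[1]) * i / n) for i in range(n + 1)]

def south_polar(lon_deg, lat_deg):
    """Azimuthal equidistant projection centred on the South Pole (unit sphere)."""
    r = math.pi / 2 + math.radians(lat_deg)  # angular distance from the pole
    lam = math.radians(lon_deg)
    return r * math.sin(lam), r * math.cos(lam)

a, b = (-100.0, -80.0), (100.0, -80.0)  # made-up end points

# Naive: project only the two end points and join them with a straight chord.
pa, pb = south_polar(*a), south_polar(*b)
chord_mid = ((pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2)

# Noded: add points along the lon/lat line first, then project every point.
noded = [south_polar(*p) for p in densify(a, b, 20)]
arc_mid = noded[len(noded) // 2]

print("midpoint of naive chord:", chord_mid)
print("midpoint of noded line: ", arc_mid)
```

The two midpoints end up on opposite sides of the pole (the origin), which is the confusion that arises when a line is simplified to its end points and then transformed again.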
13:49 - One of the bigger things that I’ve been involved with in…
13:53 - in R Spatial for the last one and a half years is that of spherical geometry, right? So since about a month ago or so, we have this package sf, which is used by a lot of other packages to basically represent spatial data as points, lines, and polygons, or vector data, with coordinate reference systems.
14:20 - Essentially, we now have the case that if these data are represented by ellipsoidal coordinates, so expressed as longitude, latitude degrees, we use spherical geometry.
14:35 - So we basically assume that they are on the sphere rather than in a flat plane, right? And that sounds like a crazy idea, like, why wouldn’t you do that, right? Sane people would say, obviously you would do that.
14:47 - Well, the thing is that for 50 or more years, we have not done that. We have basically assumed that these live in a flat space, just like GeoJSON assumes that, and writes literally on paper that we should assume it’s like that…
15:03 - And now we don’t do that anymore. And now a lot of things actually run much better, in the sense that we can do buffering on the sphere. We can do geometric predicates on the sphere.
15:11 - We can do distances on the sphere as well. So by doing that, you don’t have to worry any more about going to a particular projection, and about the effect that choosing this projection has on what you do.
15:23 - You just do things on the sphere. Of course, ideally you would do things on an ellipsoid, which is an even better approximation of the earth, but the difference between the sphere and the ellipsoid is really very small, and sort of incomparable to the difference between something that is flat and an ellipsoid.
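The difference between treating longitude/latitude degrees as flat coordinates and doing the computation on the sphere shows up even in a simple distance calculation. The sketch below is illustrative only; it is not how sf or s2 implement distances. It assumes a spherical earth with a mean radius of 6371 km and uses the haversine formula, with hypothetical function names.

```python
import math

R_EARTH_KM = 6371.0  # mean earth radius; a spherical approximation

def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance over the sphere, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R_EARTH_KM * math.asin(math.sqrt(h))

def flat_degrees(lon1, lat1, lon2, lat2):
    """Naive 'distance' that treats degrees as planar x/y coordinates."""
    return math.hypot(lon2 - lon1, lat2 - lat1)

cases = {
    "10 degrees of longitude on the equator": (0, 0, 10, 0),
    "10 degrees of longitude at 60N": (0, 60, 10, 60),
    "across the antimeridian": (179, 0, -179, 0),
}
for name, c in cases.items():
    print(f"{name}: flat {flat_degrees(*c):.0f} degrees, "
          f"sphere {great_circle_km(*c):.0f} km")
```

The flat calculation reports the same length for the first two cases and a huge one for the third, while on the sphere the 60°N case is about half as long and the antimeridian case is a short hop.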
15:40 - So one can go back to the sort of pre-sf…
15:45 - 1.0 behavior by setting a couple of flags, and you get the old behavior.
15:52 - And there’s more discussion on this issue actually in the upcoming book on spatial data science that Roger Bivand and I are finishing up.
16:01 - Then you also need to look at the work that Dewey Dunnington did. Dewey Dunnington mostly wrote the s2 package, which is basically what underlies the sf package for doing all these spherical geometrical operations.
16:13 - So we have now two engines, basically one spherical engine and one flat engine, depending on whether you have unprojected data, ellipsoidal coordinates, or projected data, which are handled by the flat space geometrical library.
16:30 - Right. Another issue that comes up that is worth discussing is that of handling large spatial data sets with R. You know, in general, handling large data sets with R is an interesting topic, and different groups, or lines of thought, have done that in different ways.
16:57 - If you look at the tidyverse, it is much more an interface to databases where the data might live, yeah.
17:03 - Your data might be in a… BigQuery, SQL, a Google database.
17:10 - And you basically write your code in the tidyverse and have an interface to the database that carries out the hard work on the large data sets.
17:18 - Basically, you look at the reduced results of operations on that.
17:23 - With spatial data, it is a little bit different in the sense that a lot of spatial data doesn’t live in databases, doesn’t live in tables, or the tables don’t work so well, although for instance for vector data you could use Google BigQuery GIS, which is a way to do that, or other spatial databases that are there to do similar things.
17:47 - And there have been reports of successfully doing that, actually, with the dbplyr interface, which is very interesting.
17:54 - So you can essentially have the case that all your data can be held in memory. Yeah.
18:00 - I always sort of buy laptops with the maximum amount of memory that I can afford, or my Institute can afford.
18:06 - So I can do experiments with like, you know, up to whatever 48 gigabytes of RAM or something like that. Yeah, so…
18:13 - most spatial packages basically hold everything in memory, and some of them go further and say, okay, now I assume my data is on local storage, on a hard drive, and they will not load everything in memory.
18:29 - So raster and terra, and to some extent stars also, are packages that work with raster data mostly.
18:35 - So image data, and those are mostly larger, and these packages basically allow you to make expressions. And then if you want to compute things, it’s going to go through all this imagery without trying to load everything in memory, because that will not work anyway.
18:50 - And then there’s a third category, and that is basically the big category, where you have data that you’re not going to download, right? So there’s a lot of data now available for free, like weather data, that’s ERA5, the reanalysis data, or the climate model intercomparison project, CMIP6, and an enormous amount of earth observation data. That is all in principle for free, and you can usually download sections of it, but you’re not going to download everything, simply because the network is not going to allow you.
19:23 - We think we have fast networks, but if the problem is suddenly that you have to download three petabytes, even if you had the local storage to, you know, to hold that, it’s going to take, you know, years or so to move this data.
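The claim about moving three petabytes can be checked with back-of-the-envelope arithmetic. The link speeds below are assumptions chosen for illustration and are not figures from the talk; the sketch also assumes a perfectly sustained transfer with no protocol overhead, and the function name is hypothetical.

```python
def transfer_days(n_bytes, bits_per_second):
    """Days needed to move n_bytes over a link with the given sustained rate."""
    seconds = n_bytes * 8 / bits_per_second
    return seconds / 86400

THREE_PB = 3e15  # three petabytes, decimal units

# Assumed sustained link speeds (illustrative only).
for label, rate in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)):
    days = transfer_days(THREE_PB, rate)
    print(f"{label}: {days:,.0f} days ({days / 365:.1f} years)")
```

Under these assumptions a sustained 1 Gbit/s link needs about 278 days, and a 100 Mbit/s link about 7.6 years, which is in line with the "years or so" estimate.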
19:38 - So this is really a thing where the network bandwidth hasn’t kept up with the volumes of the data that we collect.
19:46 - Nevertheless, all these data are fairly relevant for, you know, questions related to sustainability and for the effect of climate impact or weather extremes and so on, or, for emergencies, satellite data.
20:01 - So this is data we would like to be able to use much more easily than we can.
20:05 - A platform that can do that is, for instance, Google Earth Engine, and there are a couple of other platforms in a similar fashion that allow you to work with large data sets in the cloud through interfaces that are user-friendly, but they make it very hard to reproduce analyses independently, to scrutinize your computations, and also to basically run your own R scripts or your own R time series model or something like that.
20:30 - But there are a couple of projects, openEO and openEO Platform, funded by the European Commission and now the second one by the European Space Agency, that are part of a larger initiative for allowing reproducible, meaning open-source and vendor-independent, computing on large cloud-based data archives.
20:50 - And we are also involving R there in the sense that we want to be able to basically run R scripts, run R code on the pixel – on the pixel level of these datasets.
21:01 - And these projects have all contributed to something that is also interesting, which is STAC (SpatioTemporal Asset Catalog), which is basically a formal description of a catalog that allows you to find imagery. And on the question of, you know, finding data, how do I find data? You go to a cloud and ask for a directory listing.
21:20 - Yeah. But if there are like 50 million images in this directory, right, because it is a document store, then this is not going to work, right? So you wait endlessly before you get it, then you get the directory listing, and then the directory listing is like 50 gigabytes.
21:35 - Then you think, what am I going to do with that, right? So STAC is a very lightweight and simple, but modern approach to basically finding images, finding image collections.
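To give an idea of what searching a catalog looks like instead of listing a directory, here is a minimal sketch of the JSON body that a STAC API item search accepts. The field names (`collections`, `bbox`, `datetime`, `limit`) follow the STAC API item search specification; the collection id, bounding box, and dates are placeholders, the helper function is hypothetical, and no request is actually sent because the talk does not name a catalog endpoint.

```python
import json

def stac_search_body(collection, bbox, start, end, limit=10):
    """Build the JSON body for a STAC API item search (to be POSTed to /search).
    Hypothetical helper; the values passed in below are placeholders."""
    return {
        "collections": [collection],
        "bbox": list(bbox),            # [west, south, east, north] in lon/lat degrees
        "datetime": f"{start}/{end}",  # closed interval of RFC 3339 timestamps
        "limit": limit,                # maximum number of items per page
    }

body = stac_search_body(
    "example-collection",              # placeholder collection id
    (7.5, 51.9, 7.7, 52.0),
    "2021-06-01T00:00:00Z",
    "2021-06-30T23:59:59Z",
)
print(json.dumps(body, indent=2))
```

The server answers with only the matching items and links to their assets, so the client never has to enumerate millions of files.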
21:46 - The idea of openEO is basically this: we have all these kinds of cloud platforms, cloud storage ways of doing things, and then we have the software layers on top of them, and here Open Data Cube is a popular, recent, modern one, all these software layers that have their own interface for allowing you to work with large imagery data.
22:07 - And that API basically gets you a uniform front end, where you can work from different clients: Quantum GIS, or R, or Python, or web interfaces.
22:16 - Basically, you can access any back-end through a single client.
22:20 - Basically, carry out the same analysis, use the same script to run an analysis on this cloud or on that cloud and then see if they give you the same answer.
22:31 - It is a project, it is actually, you know, I was one of the initiators, but it’s much larger.
22:35 - It involves, it’s like 10 programmers or so, top 10 software engineers and a lot of institutional support of organizations that actually run these clouds and trying to make this data in these clouds available to a wider user group.
22:54 - But the nice thing is that these organizations actually see the need and see the benefit of using this, and are actually, while we are developing it, using this in production.
23:07 - So that it’s basically, not like a proof of concept or a prototype or something, but it is something that they think might be viable if it starts to really work.
23:22 - And right now, we are at the stage, we’re close to the probably in a couple of months or in one month or so, we are – we will be able to offer public access to these kinds of systems.
23:32 - And then of course you need to think about when people are starting to do massive computations, that there is a cost to that, right? Cloud computing is not something that’s for free.
23:41 - And the next thing I’m going to talk about is that of the…
23:44 - lifecycle of spatial packages. You know, R packages have a lifecycle.
23:51 - This is, I think, something that the RStudio community came up with, like: this is experimental, this is mature, this is retired.
24:01 - And if you are, you know, somewhat familiar with the R Spatial community and you follow the R-sig-geo mailing list, and you know, not everyone these days follows mailing lists, but this is a mailing list that has been around for nearly 20 years,
24:16 - and that Roger Bivand has actually managed all the time,
24:21 - then you would have been able to see in his email signatures that he is now an Emeritus Professor.
24:28 - Yes, that – so that means essentially that Roger retired from his job.
24:34 - And people who retire, they actually deserve to enjoy their retirement and to take on the good things in life that might be answering questions on our mailing list and so on.
24:50 - But some of the harder things in life, you are allowed, you’re entitled to actually drop those.
24:56 - So, reacting to this announcement of my keynote,
24:59 - at least when the abstract also came up, I commented on that. The abstract says when rgdal and rgeos retire: in 2024.
25:10 - So that was a bit of an announcement that I made. I coordinated this with Roger, and actually we started talking about this five years ago. rgdal and rgeos, together with sp, basically formed the first foundations for R Spatial packages.
25:31 - rgdal did the I/O, the reading of vector and raster data, and rgeos did all the geometry, of course in two dimensions.
25:38 - But basically, there with that, you had the components that, would build you, that would give you a GIS.
25:45 - And I said, anyone volunteering to take over maintenance should contact Roger.
25:50 - And then Roger answered: I’m not sure that taking over maintenance is a sensible use of effort, and add maptools to that list.
25:57 - So that is a clear sign. We have actually been working hard on replacements, on more modern incarnations of the same ideas of sp and rgdal, and this is basically sf and stars, and there is now the terra package.
26:13 - The terra package, from Robert Hijmans, is a replacement for the raster package.
26:18 - raster uses rgdal for reading and writing, and terra directly links to the GDAL library.
26:28 - So it does not use rgdal, the R package, for that.
26:32 - This is the same thing that the sf package does.
26:36 - And, so this is basically a signal, right, that everyone should take seriously.
26:42 - R Spatial is, is really a very open ecosystem that is…
26:51 - relatively complex in the sense that, as I mentioned already, for doing these geometrical operations, like intersections or unions or buffers and so on of geometries, but also for reading and writing data and for handling coordinate reference systems and so on,
27:14 - we use a lot of tools, a lot of infrastructure that is essentially used by a much wider community, the open-source spatial community, I would say.
27:26 - And we do that on purpose here. You could write your own projection library. I mentioned earlier this mapproj library, and that was another library that had projections in it.
27:40 - Of course, some projections are simple, and you can write them as a five-liner or a ten-liner in an R package.
27:46 - But the thing is: does it keep up with all the other changes in the world?
27:52 - It is very convenient that all of the people working with open-source software for spatial data essentially look at a limited number of libraries and focus their effort on making them good, and agreeing on what they do and what they should do and how things can be improved.
28:13 - And that is basically depicted in this image, which just sketches the dependencies of the sf package.
28:20 - And for the terra package this would be similar.
28:22 - It would also link to GEOS, PROJ, and GDAL.
28:25 - And so, these three: GDAL for I/O, for reading and writing vector and raster data, PROJ for handling coordinate reference systems and for computing transformations.
28:37 - So doing projections, but also datum transformations, which are just one level more complicated.
28:42 - That is basically going from one model of the ellipsoid of the earth to another model of the ellipsoid, which is an approximate operation, as opposed to projection, which is basically a mathematical formula.
28:56 - PROJ and the GEOS library, which does two-dimensional geometry: those are the main, sort of, workhorses.
29:05 - If you would use PostGIS or Quantum GIS, or if you would use a Python package that does anything spatial, like GeoPandas or pyproj or Rasterio, all these Python packages use exactly the same libraries.
29:19 - And so they all look at the same mailing lists and communicate with the same set of developers that work on that.
29:28 - Recently the GDAL project has actually secured a lot of structural funding through, I think, NumFOCUS, the organization that also does NumPy, and they managed to secure funds.
29:43 - Because there is so much infrastructure in this world, also all these cloud platforms like Earth Engine and the Microsoft variety, that all basically lean on using GDAL for reading and writing cloud-optimized GeoTIFFs and so on.
29:56 - Nobody’s going to duplicate those efforts. Other libraries are NetCDF for array data and UDUNITS for handling units and S2 geometry, which is basically the spherical geometry library, that is a contribution, like open source contribution by Google.
30:13 - So, that essentially powers also the – it’s the geometry engine behind Google maps, Google earth, Google earth engine, and so on, and Google BigQuery GIS.
30:25 - GDAL is a fairly complicated dependency in the sense that it’s like a meta library that uses something like 100 other libraries for actually reading things: there’s TIFF and GeoTIFF and so on,
30:38 - then SQLite3 for reading and writing GeoPackages, et cetera.
30:42 - So this is a complicated thing also for package managers, if you directly link to that, and there is very valuable work from, for instance, Simon Urbanek on realizing this for the OS X binaries, and also Brian Ripley helps a lot in looking at new versions of these libraries.
31:04 - And Jeroen Ooms does great work on the Windows build things.
31:09 - So making it easier for packages to basically link to a very complicated dependencies like GDAL.
31:17 - So that essentially brings me to the end of my talk, to the conclusions. Summarizing: many data scientists will someday run into challenges with spatial data.
31:27 - One of the earliest challenges is then that of projections: how to deal with that? R Spatial is an open and friendly community of people using the R package ecosystem for handling and analyzing spatial data.
31:41 - And there are a number of people in the R community that I just mentioned, but in particular also the CRAN team, who have been very much instrumental in making this thing succeed.
31:52 - It has taken a lot of effort from them and still takes a lot of effort to get these packages all the time running with new versions of everything.
32:02 - And we are successful, I think because we use and we interface a lot of software that is used by much larger community.
32:09 - So we basically we use it. We can talk about the same thing.
32:12 - A large part of that is the OSGeo foundation, the Open Source Geospatial Foundation.
32:17 - And we are trying now with R Spatial, with a number of key spatial packages, to become a community project in the OSGeo organization.
32:26 - So we develop closer contact with them.
32:31 - Robin Lovelace has been very instrumental in setting that up.
32:34 - And we are also having an R Spatial panel session at FOSS4G, the Free and Open Source Software for Geospatial world conference, which is held this fall, online, in Argentina.
32:47 - As I mentioned, the sf package, which is the new, I would say, central holder, reader and writer of geometrical, of vector data, now uses spherical geometry, which is a new thing.
33:01 - So we need to think about straight lines. Things may need noding at some stage; we may want to automate it at some point, or not. We have to figure that out.
33:11 - And you can do simplifying, but only after projecting in your target projection.
33:17 - And we may want to automate is noting, as I said, at some stage.
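To make the point about straight lines on a sphere concrete, here is a minimal sketch (not from the talk) that adds intermediate points along the great-circle arc between two longitude/latitude points. It is plain Python on a perfect sphere and does not use sf or s2; the function names `to_xyz`, `to_lonlat` and `densify_great_circle` are made up for this illustration.

```python
import math

def to_xyz(lon_deg, lat_deg):
    """Convert lon/lat in degrees to a unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def to_lonlat(x, y, z):
    """Convert a 3D vector back to lon/lat in degrees."""
    return (math.degrees(math.atan2(y, x)),
            math.degrees(math.atan2(z, math.hypot(x, y))))

def densify_great_circle(p, q, n):
    """Return n + 1 lon/lat points from p to q along the great-circle arc."""
    a, b = to_xyz(*p), to_xyz(*q)
    dot = max(-1.0, min(1.0, sum(i * j for i, j in zip(a, b))))
    omega = math.acos(dot)            # angle between the two points
    if omega < 1e-12:                 # identical points: nothing to interpolate
        return [p] * (n + 1)
    if math.sin(omega) < 1e-12:       # antipodal points: the arc is not unique
        raise ValueError("great circle between antipodal points is ambiguous")
    points = []
    for k in range(n + 1):
        t = k / n
        wa = math.sin((1 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        points.append(to_lonlat(*(wa * i + wb * j for i, j in zip(a, b))))
    return points

# The "straight line" from 90°W to 90°E at 60°N does not follow the
# 60°N parallel: on the sphere its midpoint is the North Pole.
for lon, lat in densify_great_circle((-90.0, 60.0), (90.0, 60.0), 4):
    print(round(lon, 1), round(lat, 1))
```

On a plate carrée map the same two points would be joined along the 60°N parallel, which is why a line defined on the sphere may need extra points before it is projected.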
33:22 - As I implied already a little bit, I think we should really reconsider the way we plot data now.
33:29 - If data is unprojected, if it is in latitude/longitude degrees, we still choose some projection.
33:35 - We choose plate carrée, which is a bad thing, I think.
33:38 - So we should get rid of it and do other things.
33:40 - And also for smaller regions do different things, probably an orthographic projection, as the s2plot library by Dewey Dunnington already does.
33:48 - After all, stringsAsFactors is also no longer TRUE. That took 25 years, but it’s never too late to reconsider. Anyway, the spherical geometry was a big step and, I think, a large improvement.
34:01 - Analyzing large spatial datasets is and will remain a challenge, because there is the whole cloud administration thing involved and there are datasets that become larger all the time.
34:14 - And we have this retirement not only of Roger Bivand, but also of the rgdal and rgeos packages, which will happen in 2024, and it has strong consequences.
34:23 - We have been working on this; good alternatives are there.
34:27 - For these three packages, and there might be others, users and developers will have to migrate to these new packages.
34:38 - And there is a large number of packages at this moment still depending on maptools, rgdal and rgeos.
34:47 - So there’s a lot of work to do there, which we will be happy to help with.
34:51 - So that was – that brings me to the end of my talk.
34:55 - Thank you, Edzer, for the very interesting keynote, taking us through projections, large datasets and the lifecycle of spatial packages.
35:07 - We have a lot of interesting questions coming up, and I would encourage attendees to keep posting their questions in the Q&A and upvoting any questions that you might want asked. And probably the first question to ask is one that I have.
35:24 - You have given an interesting discussion on projections.
35:29 - I wonder whether the R Spatial community takes great care in choosing what projection to use when doing a specific type of analysis.
35:39 - Maybe analysis A is good for this projection and analysis B is good for that projection.
35:45 - And if not, why might that not be the case, taking into consideration that I should be using a certain type of projection for my R Spatial analysis?
35:56 - Yeah, that is a good question, Peter. I’m also not the projection expert.
36:04 - I just have been sort of hiding all the time and saying, well we do this because we always did that right? And that is basically the case now.
36:12 - So I think the situation we have now for default projections, the plot defaults for unprojected data, is very unlucky.
36:22 - And if we can improve there, I think anything that’s equal area is a much better idea than what we do now, which is very non-equal area.
36:30 - Because for a lot of larger-area plots, even if you are doing global predictions of above-ground biomass or something like that, or maps of forest coverage or something like that,
36:41 - equal area is always better, because it represents equal areas as equal.
36:45 - So you’re not blowing up one part of the world, where it looks like you have very low predictions, and shrinking another part.
36:57 - So I think, even if it’s not about political data, equal area is much better; even if it’s land mass or ocean coverage or something like that, it is just a much better idea.
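The equal-area argument can be checked with a few lines of arithmetic. The sketch below (an illustration, not code from the talk) compares the true area of a 1° by 1° cell on a sphere with the area it occupies on a plate carrée map and on a cylindrical equal-area map. It assumes a spherical Earth with a mean radius of 6371 km, and all function names are made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical approximation

def cell_area_km2(lat_south, lat_north, dlon_deg=1.0):
    """True area on the sphere of a cell bounded by two parallels and two meridians."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * band

def plate_carree(lon, lat):
    """Plate carrée: longitude and latitude are used directly as x and y."""
    return (lon, lat)

def cylindrical_equal_area(lon, lat):
    """Lambert cylindrical equal-area: x = lon (radians), y = sin(lat)."""
    return (math.radians(lon), math.sin(math.radians(lat)))

def map_area(project, lat_south, lat_north, dlon_deg=1.0):
    """Area of the cell on the map; in both projections it maps to a rectangle."""
    x0, y0 = project(0.0, lat_south)
    x1, y1 = project(dlon_deg, lat_north)
    return (x1 - x0) * (y1 - y0)

# Ratio of a cell at 60-61°N to a cell at 0-1°N: on the ground, and on each map.
print(cell_area_km2(60, 61) / cell_area_km2(0, 1))
print(map_area(plate_carree, 60, 61) / map_area(plate_carree, 0, 1))
print(map_area(cylindrical_equal_area, 60, 61) / map_area(cylindrical_equal_area, 0, 1))
```

The first and third ratios agree (the northern cell is roughly half the size of the equatorial one), while plate carrée draws both cells at the same size, which is the blowing up of high latitudes described above.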
37:10 - Thank you and then I’ll go direct to the Q&A.
37:14 - There is a question that has been upvoted here by many attendees, and the question is: learning R Spatial is a huge challenge for most students.
37:24 - And then Miko is asking, could you suggest two to three skills that are most important to spatial analysis, that aid later in your career or job? Yeah. So let me think about that.
37:43 - I think that understanding geometries and what you can do with them is a very useful approach.
37:56 - We do that in the introductory chapters of the upcoming book, basically thinking of measures, right? Area, length, distance between objects, and what does it mean? What does a polygon mean, or a set of polygons? What are hexagons and polygons, and how are they represented? Representation is not so very important, but what are the implications? And then how do two geometries relate, right? What are the possible relations? Do they touch? Do they overlap? What is intersecting? Are they disjoint? Which words do you use for these kinds of concepts? And the next step is, how are you going to use these things in analysis? I think those skills are useful.
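As a small illustration of that vocabulary, the sketch below works out a measure (area) and a named relation for pairs of axis-aligned rectangles. This is a deliberately simplified stand-in for general polygons: the `Rect` class and `relation` function are hypothetical and not part of any spatial package, and rectangles are assumed to have positive width and height.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    """Axis-aligned rectangle, assumed to have positive width and height."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

def relation(a, b):
    """Name the relation of rectangle a to rectangle b."""
    # Width and height of the region the two rectangles share.
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    if w < 0 or h < 0:
        return "disjoint"      # no point in common
    if w == 0 or h == 0:
        return "touches"       # only boundary points in common
    if a == b:
        return "equals"
    if a.xmin <= b.xmin and a.ymin <= b.ymin and b.xmax <= a.xmax and b.ymax <= a.ymax:
        return "contains"      # b lies entirely inside a
    if b.xmin <= a.xmin and b.ymin <= a.ymin and a.xmax <= b.xmax and a.ymax <= b.ymax:
        return "within"        # a lies entirely inside b
    return "overlaps"          # interiors intersect, neither contains the other

a = Rect(0, 0, 2, 2)
print(a.area())                        # 4
print(relation(a, Rect(3, 3, 4, 4)))   # disjoint
print(relation(a, Rect(2, 0, 3, 2)))   # touches
print(relation(a, Rect(1, 1, 3, 3)))   # overlaps
```

Everything except "disjoint" counts as intersecting, so "intersects" is simply the negation of "disjoint".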
38:39 - That is sort of one angle, and the other angle is obviously raster analysis.
38:43 - So handling raster data in a sensible way and doing operations on that.
38:52 - Thanks for the insights. More questions are coming. A very upvoted question here.
38:59 - What is the relationship between stars and terra, now and in the future, since both seem replacements for raster? Yeah, they are not entirely replacements for raster.
39:10 - So I think terra is really meant, is written, as a replacement for raster, because it has the same author.
39:17 - And he also moved parts of the code base from raster to terra, for obvious reasons.
39:23 - And stars has a little bit of a different idea.
39:30 - Yeah. That is more the idea that we have array data, and array data is a somewhat more generic concept than raster data, because we could also have, like, time series of polygons, or time associated with points, right? You can’t put them in raster data, but you can put them in stars objects.
39:49 - So we basically have a spatial dimension and a temporal dimension, and then how are you going to do that? Are you going to put columns next to each other, the wide form, or use the long form? Both are very inconvenient.
40:00 - The logical thing is basically an array where you have one dimension time and one dimension spatial features, and these are the stars objects. The stars model is more that of arrays, and of higher-dimensional arrays, more than three-dimensional arrays.
40:16 - And so if you have like a time series of multi-spectral images, then you have the X and Y dimensions of your images, of your layers, right? And you have a spectral dimension, and then you have a time dimension.
40:29 - So it’s more meant to do those kinds of things, and terra is more directly aiming at raster stacks.
40:40 - Although it now also includes its own classes for vector data.
40:43 - Yeah. So it’s basically a one-fits-all package, right.
40:49 - So Robert and I have somewhat different views on that.
40:52 - Obviously stars probably fits closer to sf and to the tidyverse that I like to work with and to implement, and what he does in terra more closely matches what he did in raster, in this sort of new and more performant iteration of that.
41:12 - So there are overlaps and there are differences.
41:15 - I cannot say anything about, well, you can look at the number of lines of code in terra.
41:22 - It is like five times the number in stars, probably.
41:24 - So that’s much more, but then, you know, stars reuses a couple of things very cleverly.
41:30 - And it is just different; you just need to see what is best for your purpose, I think.
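The array model described above, one dimension for time and one for spatial features, can be sketched with nothing more than a flat list and named dimensions. This is only an illustration of the idea, not how stars is implemented; the `Cube` class and its methods are hypothetical.

```python
from itertools import product

class Cube:
    """A tiny data cube: named dimensions with labels, values in one flat list."""

    def __init__(self, dims, fill=0.0):
        self.dims = dims                      # dict: dimension name -> list of labels
        size = 1
        for labels in dims.values():
            size *= len(labels)
        self.values = [fill] * size

    def _offset(self, **labels):
        # Row-major position of one cell, given one label per dimension.
        offset = 0
        for name, dim_labels in self.dims.items():
            offset = offset * len(dim_labels) + dim_labels.index(labels[name])
        return offset

    def put(self, value, **labels):
        self.values[self._offset(**labels)] = value

    def get(self, **labels):
        return self.values[self._offset(**labels)]

    def series(self, along, **fixed):
        """All values along one dimension, with the other dimensions fixed."""
        return [self.get(**{along: label}, **fixed) for label in self.dims[along]]

# One temporal dimension and one dimension of spatial features (stations).
cube = Cube({"time": ["2021-01", "2021-02", "2021-03"], "station": ["A", "B"]})
cube.put(4.2, time="2021-01", station="A")
cube.put(5.1, time="2021-02", station="A")
cube.put(3.9, time="2021-03", station="A")
print(cube.series("time", station="A"))       # the time series for station A

# The long form repeats the time and station labels on every row.
long_form = [(t, s, cube.get(time=t, station=s))
             for t, s in product(cube.dims["time"], cube.dims["station"])]
print(len(long_form))
```

Adding a spectral band or X and Y dimensions only means adding entries to `dims`, whereas the wide and long table forms both need restructuring.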
41:35 - And a slight follow-up to that question: Lee is asking, in which life cycle stage is the current stars package? Oh, this is a very good question.
41:46 - Yeah, because it’s kind of a zero-point-something version, 0.5.
41:53 - So I think it can be used for serious work.
41:56 - There are just some things that don’t work that easily, right? Some of the things work really well; for smaller rasters that are kept in memory, I think it works great.
42:06 - For larger rasters, for things that really have to be kept on disk because they’re way too large to be handled in memory,
42:16 - a lot of things work, but not everything works, right? There are a lot of things where the idea is right, but then you run into the implementation and it basically has to be fixed.
42:24 - It doesn’t always work, but that is sometimes the case with terra as well, maybe, I don’t know.
42:31 - It’s hard, yeah. So if people run into problems, then please respond with issues on GitHub or on the mailing list; that works really well.
42:42 - So help us progress both things. A lot of interesting questions keep coming.
42:50 - You mentioned the Spatial Data Science book, and a lot of people are wondering when it will be released. Yeah, right. So the things that we are writing are, with a slight delay, already available online, and they will remain available online.
43:13 - But we are basically finishing up the first complete text that we are going to submit in the next weeks.
43:22 - So that would be, you know, then it needs to go to review and it needs to go editing again, and then it needs to go in print.
43:28 - So print versions will not be there within six months, I expect.
43:33 - That always takes a lot of time. Thank you, and thanks for leaving the book online for people who like to see the soft copy before the hard copy.
43:42 - Another question is, could you elaborate a bit on the challenge of linking R with QGIS and GEE?
43:50 - Are these very doable? Linking R with Quantum GIS, yes, I think there are different attempts to do this.
44:03 - One way would be to use QGIS and essentially use R as a processing engine.
44:08 - The other way would be the other way around, where you use R as your client, basically, and then call Quantum GIS processes, right? That might involve other GIS software, like SAGA GIS or something like that.
44:21 - So there are a number of packages, and I don’t know how stable they are.
44:26 - There is this new thing, again by Dewey Dunnington, I think, called qgisprocess.
44:33 - And I myself am not a Quantum GIS user. So I try to do everything with R and see where things break down.
44:40 - For GEE there’s also a package called rgee, which I think is kind of an R interface to the Python interface to Google Earth Engine.
44:51 - I think it uses reticulate to translate instructions and to obtain objects back.
45:01 - And I hear good stories about it.
45:05 - Yeah, that is very useful. Yeah. Thanks.
45:08 - I think that question by Mita is an interesting one, given that, like me, I started from ArcMap and QGIS, and when we tend to migrate to R, we always wonder, do we have something that can link us up instead of having like a baptism by fire and going direct to the other one, having started with graphical user interface kinds of software? Right. Yeah. Yeah, they are important.
45:32 - And there are a number of things that I can imagine that you really want to do with Quantum GIS.
45:37 - And you want to keep doing them with Quantum GIS.
45:39 - It’s good software. It is also complicated software, right? It brings together a lot of things, and then combining it with R is a challenge. Yeah.
45:52 - Yeah. And then Luca is asking, are there any notes about sf plus data.table, is that stable? Yeah, that’s a good question.
46:02 - There are one or two issues, some of which might still be open or not, on the sf GitHub site and also on the data.table site.
46:11 - I heard Tim Appelhans, the author of mapview, was successfully doing these things. I think in data.tables you can handle sfc, so you can handle geometry list columns, right? So that basically means you can also work with data.tables and have a geometry list column in them; they’re not sf objects, but of course they carry their geometry.
46:35 - You can work with that, right, with data.tables themselves.
46:38 - So that is similar. The thing where it breaks down is that sf tries to basically take over a number of methods from data.table and does it not entirely the way data.table likes to do it.
46:49 - So I think there are some conflicts if you want to work with sf objects that basically are data.table objects.
46:58 - I think there are some conflicts there. I’m not entirely sure whether we can resolve that.
47:04 - I also haven’t seen people putting much effort into that, right? And I didn’t do it, so simply somebody has to sort it out.
47:13 - I have a feeling it can be done, but, you know, because it’s R, anything can be done.
47:20 - -Yeah. -Yeah. Who will do it, right? This is the question. -Who will do it? -Yeah.
47:26 - And we have a few minutes to the end of the session.
47:28 - We have about 10 minutes to go. And I see some concerns, people asking about access to Slack, where the conversation can go on.
47:37 - I know Luci will probably mention to us at the end how people can access it, and whether the email has been sent out.
47:44 - I found a question here by Gabriela that I have also been wondering about, and he says: I’ve played with R Spatial for a few years, not many, and have found interesting packages of great value that grew on their own.
47:59 - Is it on the roadmap for R Spatial to integrate all of them into a broader ecosystem? Yeah. This is an interesting question.
48:07 - And the question depends on what an ecosystem is, right? In an ecosystem you have collaboration, but you also have competition.
48:17 - And competition can also be very healthy, right? It can be a very good thing.
48:20 - And things in ecosystems grow and they have success and are big, and then, at some stage, they retire or they die or something like that, or they get killed or whatever, you know: something happened, and somebody found something on CRAN and you never could solve it, and then it falls apart, right? So if I think of RStudio and the tidyverse packages as a successful ecosystem, you have to think that there are like a hundred software engineers, a large number of people who are very highly skilled and who can basically put all their effort into doing that, right? And if you look at what R Spatial is, who we are, and the amount of time that we have for package development, then we are much smaller.
So it is actually a miracle, I think, that we are now where we are.
49:20 - And, in that sense, you have to think also about capacity.
49:23 - And of course, we occasionally get some seed money from the R Consortium for certain things, but we don’t have capacity in the sense of software engineers working consistently or constructively on things.
49:36 - Yeah. So, of the competition, I already mentioned the stars and terra thing, where there is a certain overlap of things.
49:44 - I think that’s good. And the thing is that Robert and I work very differently, and Robert is really somebody who focuses on software and then makes something that is brilliant and that does everything, right? But he’s not constantly communicating with everyone: how shall we do this? How should we do this and that kind of thing, right? That is a different way of doing things.
50:07 - And that creates brilliant products that are then also there, and there’s an alternative.
50:13 - So it would maybe be easier for users if we were clear, if all the developers would think, you should do this with that, and this with that, and this with that; that’s now not the case, right? But I think that is in general a characteristic of the R community, of many R communities, because a lot of packages are very much individual contributions, right? Then you do what you were born for and how you think you should do it.
50:43 - And then take it or leave it. Right? So, that is one of the struggles that you have to deal with.
50:49 - I think that it’s fairly similar in the Python world where things are even much less coherent, I think, than in the R world.
50:58 - Thanks for the reflection and insights. Also, somebody here is having some reflection and saying it is humbling to be reminded how difficult it is to define a straight line on a sphere. Then he’s asking, given all the existing knowledge of geometry and calculus we have, does it fascinate you sometimes that we are still grappling with the problem of figuring out how to correctly represent data on a sphere? No, I don’t.
I think I’m very optimistic. I look at it from a different side.
51:28 - The reason that I struggled so much in explaining this is that it’s hard to comprehend, to think on spheres, unless you are a mathematician, have grown up with or have studied geometry or something like that.
51:43 - But I think it’s the legacy. I think it’s the 50 years that we’ve worked with 2D.
51:49 - with global data, essentially on two-dimensional flat screens, right? And now we see, no, of course, Google started 15 years ago not doing that.
51:59 - They didn’t sort of look at what the GIS community had done.
52:02 - They just said, oh, we’re going to solve this problem right. And then they said, oh, well, here we have this library anyone can use.
52:08 - So I think it’s a matter of catching up. And the difficulties are really coming from the legacy, from how we’ve done it all the time.
52:16 - And from these things written in the OGC, ISO standard, where you think, why on earth? Right? Why on earth would somebody write that down that way? Maybe, yeah.
52:26 - That’s a good idea. Let’s see. We just have to see sort of how things develop.
52:30 - And I think they’re getting better.
52:33 - -And probably here I am taking the last question before we come to talk about Slack, so we can know things are working now.
52:43 - Coleen is asking: any thinking around support for discrete global grid systems in R? -Yeah, basically the S2 library that we are using essentially has a grid index.
52:59 - It has a six-face cube, and then quadtrees on these, sort of space-filling curves, on these cube faces; that is essentially an indexing structure.
53:10 - So it has an index that works effectively on the sphere.
53:16 - Other systems are H3; there are several H3 R packages, linking to the JavaScript library or linking to the C library, that appeared recently.
53:26 - Those do hexagonal grids. There is also the dggridR package; it has now been archived, because there were problems in the C++ libraries it used.
53:41 - It was written by Richard Barnes and is now in the archive, but you can still get it from there and install it on your computer.
53:51 - So there are several packages actually doing this, for various purposes and with various interfaces.
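The indexing structure described for S2, a cube with six faces and a quadtree on each face, can be sketched as follows. This is a simplified illustration and not the S2 algorithm itself: it assumes a plain central projection onto the cube and orders quadtree children in Z-order, whereas S2 applies an additional transform and uses a Hilbert curve. The face numbering and the `cell_id` string format are also assumptions of this sketch.

```python
import math

def cube_face_uv(lon_deg, lat_deg):
    """Project a lon/lat point onto a cube face; return (face, u, v) with u, v in [-1, 1]."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    p = (math.cos(lat) * math.cos(lon),
         math.cos(lat) * math.sin(lon),
         math.sin(lat))
    # The face is chosen by the coordinate axis with the largest absolute value.
    axis = max(range(3), key=lambda i: abs(p[i]))
    face = axis if p[axis] > 0 else axis + 3      # assumed numbering: +x, +y, +z, -x, -y, -z
    u, v = (p[i] / abs(p[axis]) for i in range(3) if i != axis)
    return face, u, v

def cell_id(lon_deg, lat_deg, level):
    """Face number plus one quadtree digit (0-3) per level, e.g. '2/3102'."""
    face, u, v = cube_face_uv(lon_deg, lat_deg)
    s, t = (u + 1) / 2, (v + 1) / 2               # rescale to the unit square
    digits = []
    for _ in range(level):
        s, t = s * 2, t * 2
        i, j = min(int(s), 1), min(int(t), 1)     # which quadrant, clamped at the edge
        digits.append(str(2 * j + i))
        s, t = s - i, t - j
    return f"{face}/" + "".join(digits)

# A deeper cell id always starts with the id of its parent cell,
# so a prefix comparison answers "is this point inside that cell?".
print(cell_id(7.6, 51.9, 4))
print(cell_id(7.6, 51.9, 8))
```

Because the cells are defined on the cube faces rather than on a longitude/latitude grid, there is no special behaviour at the poles or the antimeridian, which is what makes such an index work on the sphere.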
54:00 - -Yes. -Thank you. I’ll stop taking the questions at this moment.
54:05 - And I’d like to take this opportunity to thank you for taking us through and answering all the questions that we had.
54:13 - Those that have not been answered, I hope people can continue on Slack.
54:19 - And at the moment, I would wish to welcome Roziya.
54:23 - To tell us about Slack, and, if people have not been able to access it, how to go about it.
54:28 - And again, thank you to our sponsors for the day, Appsilon, and as projected you can see the upcoming sessions for your information.
54:36 - Over to you Roziya. Thank you. Thank you very much, Edzer, for this great talk and, Peter, for being such an amazing Chair.
54:49 - Just for the people who missed the messages at the beginning, we have sent all of the participants an e-mail invitation to Slack.
55:00 - Please check your spam in case you have not seen that.
55:05 - And if you have missed it anyway, for any reason, probably your fault. Please communicate, send us an e-mail, through comfortable or you know any e-mail that you should be able to communicate with us.
55:27 - Many people have already entered the new slack space.
55:30 - So please it is very similar to what we had at the lounge.
55:36 - So, there’s one channel per session. And at the beginning, you’re not going to see the whole list of channels, but you’re going to see – you’re going to be able to click on plus and then see the whole list there.
55:51 - So, see you there. Thank you for your patience.
55:54 - And yeah, see you soon.