PHP Internals News: Episode 47: Attributes v2

PHP Internals News: Episode 47: Attributes v2

In this episode of "PHP Internals News" I chat with Benjamin Eberlei (Twitter, GitHub, Website) about an RFC that he wrote, that would add Attributes to PHP.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 47. Today I'm talking with Benjamin Eberlei about the attributes version 2 RFC. Hello, Benjamin, would you please introduce yourself?

Benjamin Eberlei 0:34

Hello, I'm Benjamin. I started contributing to PHP in more detail last year with my RFC on the extension to DOM. And I felt that the attributes thing was the next great or bigger thing that I should tackle because I would really like to work on this and I've been working on this sort of scope for a long time.

Derick Rethans 0:58

Although RFC startled attribute version two. There was actually never an attribute version one. What's happening there?

Benjamin Eberlei 1:05

There was an attributes version one.

Derick Rethans 1:07

No, it was called annotations?

Benjamin Eberlei 1:08

No, it was called attributes. There were two RFCs. One was called annotations, I think it was from 2012 or 2013. And then in 2016, Dmitri had an RFC that was called the attributes, original attributes RFC.

Derick Rethans 1:25

So this is the version two. What is the difference between attributes and annotations?

Benjamin Eberlei 1:30

It's just a naming. So essentially, different languages have this feature, which we probably explain in a bit. But different languages have this. And in Java, it's called annotations. In languages that are maybe more closer home to PHP, so C#, C++, Rust, and Hack. It's called attributes. And then Python and JavaScript also have it, that works a bit differently. And it's called decorators there.

Derick Rethans 1:58

What are these attributes or annotations to begin with?

Benjamin Eberlei 2:01

They are a way to declare structured metadata on declarations of the language. So in PHP or in my RFC, this would be classes, class properties, class constants and regular functions. You could declare additional metadata there that sort of tags those declarations with specific additional machine readable information.

Derick Rethans 2:27

This is something that other languages have. And surely people that use PHP will have done something similar already anyway?

Benjamin Eberlei 2:35

PHP has this concept of doc block comments, which you can access through an API at runtime. They were originally I guess, added as part or of like sort of to support the PHP doc project which existed at that point to declare types on functions and everything. So this goes way back to the time when PHP didn't have type hints and everything had to be documented everywhere so that you at least have roughly have an idea of what types would flow in and out of functions.

Derick Rethans 3:07

Why is that now no longer good enough?

Benjamin Eberlei 3:09

Essentially, user land developers use doc blocks to put metadata in there, and you could access them through an API. We had two sort of standards, or we still have two standards that use this. The documentation standard coming from the PHP documentor community. And then mostly runtime use case that exists now is covered by the doctrine annotations library, which, incidentally, I have also worked on a lot. It is used, for example, by the Symfony community, by the Drupal community, and by a few other communities as well that are smaller that wanted to go into the direction of using annotations in this case or attributes.

Derick Rethans 3:53

What would doctrine use an annotation for?

Benjamin Eberlei 3:55

I said before that annotations, add metadata to declarations. So let's say you have in your code, for example, classes that you want to store in the database. So you need to map PHP classes to database tables and back. Usually, you would do that using some kind of configuration. And configuration can be many folds. So the easiest way would be to write this in PHP, say, this is the column name, this is the field name, this is the class name and then store and use this information. And then you can go and store this in ini files, yaml files, XML files. The problem with this kind of approach is often that you have the configuration file and you have the class, and they are totally separate from each other, usually in very different places of the codebase. This is not some kind of configuration that is fluid. It's very, very static configuration that depends on the class. And it will not really change unless the class also changes. So changes are usually done together. In this case, it might make sense to put the configuration on to the class. Because then you see the declaration, you see it's configured in some way. And then you can more easily understand that changes affect each other in some way. And it leads to less mistakes, in my opinion. And it makes it a little bit more obvious that the class is used in some configured way.

Derick Rethans 5:26

We've had a quick look at what annotations are. The RFC introduces them in a different way, the attributes that you're not proposing, how are they different from the doc block comments?

Benjamin Eberlei 5:37

The idea is that we introduce a new syntax that is independent of the doc block comments. Essentially, before each declaration, you can use the lesser than symbol twice, then the attribute declaration, and then the greater than sign twice. This is the syntax I've used from the previous attributes RFC. And Dmitri at that point used the syntax from Hack. And it makes sense to reuse this not because Hack and PHP are going in the same direction any more. But because Hack at that point they introduced it that they had the same problems with which symbols are actually still easy to use. And we do have a problem in PHP a little bit with the kind of sort of free symbols that we can still use at certain places. And lesser than and greater than at this point are easy to parse. There are a bunch of alternatives and one thing that I will probably propose is an alternative syntax where we start with a percentage sign, then the square bracket open and then a square bracket close. This is more in line with how Rust declares attributes. While Rust uses the sort of the hash symbol, which we can't use because it's a comment in PHP.

Derick Rethans 6:54

And you don't want to use emojis.

Benjamin Eberlei 6:55

Some crazy people propose to use emojis which would easily work in PHP, but I guess it would be hard to remember the number to get the Unicode sign.

Derick Rethans 7:06

Within the two opening lesser than signs and two greater than signs to close it. What's in the middle?

Benjamin Eberlei 7:12

You declare an attribute name. And then you sort of have a parenthesis open, parentheses close, to pass optional arguments. You don't have to use them. So you can only use the attribute name. If you sort of want to tag something: just this is a validator, or this is an event listener, whatever you come up with, to use attributes for. But if you need to configure something in addition, then you can use. The syntax sort of looks like if you would construct a new class, except that you don't have to put the new keyword in front of it.

Derick Rethans 7:45

It looks like function arguments pretty much.

Benjamin Eberlei 7:47

Yes, exactly. Yeah.

Derick Rethans 7:48

What kind of values can you use in the optional arguments to the attributes?

Benjamin Eberlei 7:53

The attributes are not really runnable code in a way. Since they are declarations, they don't allow arbitrary PHP code to run there. What is obviously allowed a simple literal values, so a number, or a fixed string, a fixed array declaration, and all this kind of things are possible. What is also possible is exactly the same expressions that you can also declare in class constants. So, in the class constants, you can do simple mathematical expressions, you can reference other constants. So, this is something that will be very interesting for attributes to do reference class names for example.

Derick Rethans 8:34

What happens if you define an attribute on a declaration element?

Benjamin Eberlei 8:38

What happens is that while the PHP script gets compiled, it will see that there are attributes declared and it will parse the attributes and similar to the doc block store them on the internal structure for future reference. Attributes are parsed in my current proposal in a way that you can have every attribute just once. This is something that is still under heavy discussion, because there are a few good ideas why you would need two, or multiple. Essentially similar to how a doc block is a string, we then store an array, which represents the attributes belonging to the class or the function or the constant. And this is something that the engine stores and also stores it in OPCache.

Derick Rethans 9:27

How would you access these attributes?

Benjamin Eberlei 9:28

Attributes are accessed through the reflection API. The reflection API also allows access to doc blocks. For attributes that would be a new function called getAttributes(). And it returns a list of all attributes using a new reflection class called ReflectionAttribute. There you can access what name does this attribute have? What are the arguments that are passed? And then this goes into one of the next features of this RFC proposal. You can also ask it to return this attribute as an object instance.

Derick Rethans 10:05

An object instance of which class though?

Benjamin Eberlei 10:07

Attributes, and this is something that is different to the initial version, the version one attributes RFC is, attributes names resolve to class names. That means if you declare an attribute, for example, Foo, and you have an import for our class, MyApplication/Foo, then during passing the attribute will be resolved to my attribute view name. It uses the same mechanism for class resolving that is used in every script. It reflects the use statements that are declared in the file. And you can use namespaces, namespace operators to reference the attributes as well.

Derick Rethans 10:49

These are attributes not classes, so I don't quite see all the link between the attribute names in the classes is?

Benjamin Eberlei 10:55

One problem with the original doc block based system was that there are conflicts between attributes of different systems. One library would have a type annotation, or a var annotation, and some other library would also use it. This could lead to conflict if the syntax for them was slightly different. So this would lead to problems when multiple parses would use the same attribute. And they would parse them differently. And this could lead to errors. One problem that was mentioned in the initial attributes RFC and that, I think, if you vote us all so used as a reason for voting no is that there was no namespacing, which means that different libraries could clash and their use of attributes. My idea was we already have classes, we have namespacing. We can resolve this by using this mechanism. You declare an attribute and an attribute always resolves to a class. In the best case scenario, you would also declare this class in your code. Essentially, the attribute is not an attribute, but it's a special class that represents an attribute. This is also shown in the code that by having an additional interface, or a sort of a marker interface, that attributes can implement to make it obvious that they they are used as an attribute.

Derick Rethans 12:19

You mentioned that you could access the attributes through reflection API, and you can get them out as an object?

Benjamin Eberlei 12:25

Yes, this is why I mentioned before that the syntax sort of looks like constructing a new object, but without the new keyword. When you access the objects through the reflection API, it would essentially instantiate the class, and all the arguments that you put into the attribute declaration are passed into the constructor of the object. And this is why the connection is there between a class and an attribute. It directly goes to instantiating the attributes as an object using the arguments and giving the developer access to them.

Derick Rethans 13:00

Does it only do something like this when you use the getObject() on the reflection arguments? Or is it also possible that I don't care about these classes things whatsoever, and I can just get a list of attributes and their optional values that are associated with them?

Benjamin Eberlei 13:16

You don't have to have a class, and the class name resolving in PHP is independent of classes actually existing. The attributes RFC respect that. You can just import anything that is not a class and use an import statement to shorten the attribute usage, or you can use the absolute namespace syntax to put a fully qualified attribute name into your code. And it wouldn't fail. The fail would only happen when you call the method on ReflectionAttribute to get the attribute as an object. So this is something the RFC is also in flux with and about to change it. The first version mentioned that attributes will always be auto loaded when they are declared at compile time. This would essentially treat attributes similar to base classes or interfaces, in a way that they are always resolved, they're always checked. However, this is a little bit overkill for userland attributes. And a lot of feedback was related to this should only happen when the reflection API is used. So I'm going to change this. One thing that we do need to handle in a way is a built in attributes. One reason why I want to add this RFC as well is that there are a few use cases coming up in PHP itself, that could benefit a lot if we had built in attributes. Since we don't have a clear path forward there. But Nikita has published his ideas on editions. So there's some paths forward to having PHP code work slightly differently depending on what developers want. Attributes could be helpful there. Other things for example, the JIT. JIT has features where you can at the moment use doc block comments to declare methods as always JIT-able or never JIT-able. Dmitri used doc block comments to check for JIT or no JIT tag in there. This is essentially something that attributes should be used for because should be machine readable. Then there's a lot of other stuff that for example, Rust also put forward that PHP is struggling with: conditional declarations of functions. For example, Symfony has a polyfill library that adds functions that are in higher languages, re implements them in a way that they're also available in lower versions where they don't exist in core. There are a lot of hacks around the sort of conditional declaration of functions and classes and stuff that make it difficult for OPCache to actually cache the files. I believe there are also even more problems if you use these kind of fights with pre loading. Essentially what could be done with attributes would be something like conditionally declared as function only if it's on PHP 7.3 and lower something like this.

Derick Rethans 16:13

You just mentioned using JIT or no JIT as an annotation. Does that also mean that extensions have easy access to these attributes?

Benjamin Eberlei 16:21

OPCache's not a PHP core functionality. It's still its own extension. The idea is that extensions have access to attributes in a very simple way. So there will be a Zend API, sort of an internal name for an API that the Zend engine provides to extensions and extensions will be able to access attributes and make decisions based on this. Extensions can already hook into the compile step of PHP and there's a hook called zend_ast_process. During AST processing, you can do stuff. That would be one way to, for extensions to look at attributes and maybe change code if they want. Then the engine obviously has tonnes of other hooks where the declarations are available in the data structure that the Zend engine provides. So there's zend_class_entry, for example, where you could look into the attributes as an extension and make decisions.

Derick Rethans 17:20

This is a pretty new RFC, and hence there're always going to be few open issues. Because we like to argue about stuff. What are the open issues on this RFC?

Benjamin Eberlei 17:29

This is the seventh RFC on this topic. So there has been a lot of discussion. I guess this feature is, in a way quite controversial because of the implementation details. A lot of my work now will be to find the best implementation that can actually make this feature part of core by getting enough votes for it. And so I gathered a lot of feedback from the community; also talked a lot to contributors. Changes that I will be probably doing is allowing multiple attributes. What I said before, the auto loading has to be clarified. There has to be some distinction between internal attributes and user land attributes in a way that doesn't require auto loading. Hack, for example, has __ as a magic prefix, which I want to avoid, because it puts up all this magic methods, sort of argument back on the table. We need to have something to make a distinction between userland and internal attributes, because the internal attributes need to be validated very strictly at compile time. And the userland attributes need to be validated only when you call the getAsObject() method on the reflection API.

Derick Rethans 18:42

How long do you think there'll be before you put this RFC up for a vote?

Benjamin Eberlei 18:46

It's a bit tricky because this issue is so controversial. I don't want to invest month of work and then get a no vote. And so I do want to have some feedback quite quick enough. I do realise that the first draft needs some work and clarifications that would otherwise lead to no votes from contributors. So I hope to get this done in, let's say, two to four weeks of additional work.

Derick Rethans 19:09

All right, Benjamin. That was a great explanation of the attributes version two RFC.

Benjamin Eberlei 19:16

Thank you for having me, and I really appreciate it again.

Derick Rethans 19:21

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 46: str_contains()

PHP Internals News: Episode 46: str_contains()

In this episode of "PHP Internals News" I chat with Philipp Tanlak (GitHub, Xing) about his str_contains() RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 46. Today I'm talking with Phillipp Tanlak, about an RFC that he's made titled str_contains. Phillipp, would you please introduce yourself.

Philipp Tanlak 0:35

Hey, Derick. My name is Philipp. I'm 25 years old and I live in Germany. I work for an IT service company, which does mainly development and maintenance of IT projects. We specialise in the maintenance of e-commerce website and create enterprise applications.

Derick Rethans 0:52

How long have you been using PHP for?

Philipp Tanlak 0:54

I've been using PHP for quite a long time now that might be six years I guess.

Derick Rethans 0:58

What brought to you creating an RFC?

Philipp Tanlak 1:02

The main reason I've created this RFC was out of necessity and interest, mainly to scratch my own itch.

Derick Rethans 1:08

That is how most things make it into PHP in the end isn't it?

Philipp Tanlak 1:11

Yeah, I guess.

Derick Rethans 1:12

The RFC is titled str_contains, that tells me something that is about strings and containing things. How do we currently find a string in a string?

Philipp Tanlak 1:22

The current approach to find the string in a string is to use the strpos() function or the strstr() function. But on Reddit, I found someone also use preg_match which I find kind of interesting.

Derick Rethans 1:35

There are multiple amount of different methods in use, what are the general problems with these approaches that people have made?

Philipp Tanlak 1:41

So the current approach which I find is not very intuitive, and mainly because of the return values of these functions. For example, the strpos() returns either the position where the string is found, or a false value if the string is not found, but there has to be a check with a !== operation, and the strstr() function just returns a string. So you have to convert that to a boolean to check if the string is found or not.

Derick Rethans 2:11

Because with strpos(), if you wouldn't use the === or !== operator. Of course, if it would find it at the first position of the string, it'd be zero position, and it would return false, even though it's sfound it.

Philipp Tanlak 2:26

Yeah.

Derick Rethans 2:27

So there's a few different problems with these things. Also, I don't think it's particularly vary intuitive to do because you sort of need to come up with like a whole construct to see whether it's part of a string.

Philipp Tanlak 2:37

Correct. I don't think it's intuitive for a beginner. So if someone is learning PHP for the first time, then he has to search through the documentation, what are the exact return values for these functions, and has to remember that so I thought, string or str_contains() might be a better fit for that to just return a true or false value.

Derick Rethans 2:58

We've mentioned str_contains() a few times now, I guess the RFC is producing to add this function. How would this function differ from what PHP already has?

Philipp Tanlak 3:07

So this function does not differ in a lot of ways. It's basically the same implementation of the strpos() function. But instead of returning the position of the found string, it just simply returns it as a boolean value. So either true or false.

Derick Rethans 3:23

I can imagine some people will say, well, you can just do this in your own wrapper function, right? Because pretty much what it deos is converting the results from strpos() to a boolean. But you must have a good reason of why to want to add an extra function here.

Philipp Tanlak 3:38

The reason for this function, and maybe someone might disagree is, mainly a user experience for the developer. So this is just out of necessity which I found, and I've been using this function quite a lot. So I thought this might be a valid add to the PHP language. So I tried to implement it and it got some great reviews. So I thought that wasn't a very bad idea I had.

Derick Rethans 4:04

Is the RFC suggesting just out a single function: str_contains().

Philipp Tanlak 4:09

Yes, the RFC is currently adding just a single function, which is the str_contains(). When I first submitted the discussion about this RFC, there were quite a few people asking why is there no case insensitivity or multibyte versions for these, and I did not think of those at first. But in the discussion, it became clear that the multibyte version did not seem to be very necessary because the comparison is going to be byte by byte. Unlike strpos(), the position of the found string is not relevant. So it doesn't matter if there is any difference in encoding.

Derick Rethans 4:47

I remember in last year, there was another RFC related to strings functions they were the string_starts_with() and a string_ends_with(). Those are two functions and there were also variants for both case insensitivity, ss well as multibyte. Which made eight different functions to be added to pretty much do a single thing. That RFC failed, potentially because there are so many things being added.

Philipp Tanlak 5:11

Yeah, that was also the main reason, I think the case insensitivity of this function, or the variant of it was not so relevant. So I did not include it into the RFC just because of this case you mentioned. So instead of polluting the global space with more functions, someone suggested to just advance PHP incrementally and add in case sensitivity for this function just if it is necessary.

Derick Rethans 5:37

This is a common recurring subject. Most of the people I spoke with in the last few episodes are all adding things to PHP bit by bit instead of coming up with big RFCs which I think is a good way of going forwards. When reading the RFC, I had a quick look at which argument the function would accept. PHP of course this weakly typed strings in most of time. Is this str_contains() function handling distinct different from what strpos() does for function arguments.

Philipp Tanlak 6:10

So the str_contains() function uses the same internal function, which is php_memnstr(), if I recall correctly. It tries to interpret it as a string. And if it's not a string, it either throws a warning or notice, but I've just run some checks and it seems like in the next PHP version, non string values which are passed into the string functions will be interpreted as a string, and if that is not the case, it will throw an error or usually return false.

Derick Rethans 6:43

So it doesn't do any special magic, and just relies on the PHP tends to do for parsing arguments and weak and strict typing.

Philipp Tanlak 6:51

Yes, that's correct.

Derick Rethans 6:53

Most RFCs they come with a patch, as does yours. How did you find it getting started with writing things for PHP instead of using PHP.

Philipp Tanlak 7:02

So basically, I've looked at the PHP source code in the past, just to see how things are implemented. And I had some basic background in C. So I thought that this was not very hard for me. Most of the functions or things I had to do to include this patch, were already there. So basically, I just copied the strpos() function and remove the, when the string is found, use the position to calculate a new string and just remove that code and return the boolean value from the found position.

Derick Rethans 7:35

Because it is not a very different function from strpos(), it's just pretty much a different return type. It's a lot easier to do.

Philipp Tanlak 7:44

Yeah.

Derick Rethans 7:45

When looking at feedback, what were the main criticisms of this?

Philipp Tanlak 7:48

The main criticism of this was basically just the variants of these functions. So mainly the multibyte variant or the in case sensitivity. Other than that, the response was very, very nice and, and also very rewarding for me. So I thought I did a good job on this. And many people wanted to have this function in PHP, but either did not have the time to implement it or it was too easy. I'm not sure how that went. But I think the response from the devs and the overall PHP community was very nice.

Derick Rethans 8:23

The RFC is already in voting, so I'm I'm a bit late to talk about them. Usually I'm and things are still in discussion. And at the moment, it looks like it is passing because the votes are 43 to 6 with another weeks ago, then.

Philipp Tanlak 8:37

Yeah.

Derick Rethans 8:37

Do you think this will be your last RFC? Or do you have something else in mind?

Philipp Tanlak 8:41

At the time of this recording I don't have anything else in mind, but maybe if I find something. Since I'm working with PHP on a daily basis, which I think is worth adding to PHP I might create a new RFC.

Derick Rethans 8:54

That's how I started and see what happens now. Thank you for taking the time to talk to me today Phillipp, I hope you enjoyed this.

Philipp Tanlak 9:01

Yeah, thanks for having me Derick.

Derick Rethans 9:05

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 45: Language Evolution Overview Proposal

PHP Internals News: Episode 45: Language Evolution Overview Proposal

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the Language Evolution Overview Proposal RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 45. Today I'm talking with Nikita Popov yet again about a non technical RFC that he's produced titled language evolution overview. Somewhere last year, there was a big discussion about P++, an alternative ID of how to deal with improving PHP as a language but also still think about how some other people already use PHP and I don't really want to change how they currently use PHP. Like then I didn't really have an episode about that because I'd like to keep politics out of this podcast, or definitely PHP's internals politics. I do think that we realised at that moment that something did have to happen, because there's not really policy about when we can add things, when we can remove things, and so on. So I was quite pleased to see that you have come up with a quite wordy RFC, not talking about anything technical, but more looking forward of were will see PHP in the near or medium future, I would say. What are your thoughts about making this RFC to start with?

Nikita Popov 1:29

As you mentioned we had some pretty, let's say heated discussions last year, concerning especially backwards incompatible changes. So there were a number of very, very contentious RFCs. One of them was the short opentags removal, and another one was the classification of undefined variable warnings. So whether those should throw or not throw, and well basic contention is this that PHP is a by now pretty old language, 25 years old. And we can all admit that it's not the language with the best design. So it has evolved relatively organically with quite a few words, and the famous inconsistencies. And now we have this problem where we would like to resolve some of these long standing issues. Many of them are genuine problems that are introducing bugs in code, that reduce developer productivity. But at the same time, we have a huge amount of legacy code. So there are probably many hundreds of millions of lines of PHP code. And every time we do a backwards compatibility break, that code has to be updated, or more realistically, that code does not get updated and keeps hitting on old PHP version that, at some point also drops out of security support. And now the question is how can we fix the problems that PHP has, while still allowing this legacy code to update their PHP version. The general idea of how to fix this is to make certain backwards compatibility breaks opt in. By default, you just get the old behaviour, but you can specify in some way, exactly how it's done doesn't really matter at this point, that you want to opt into some kind of change or improvement.

Derick Rethans 3:34

As one example being the strict types that have been introduced in PHP that you need to turn on with a switch with a declare switch.

Nikita Popov 3:42

Strict types is really a great example because it has the important characteristic that has done per file. So you can turn on the strict types in one file and not affect any other code, at least in theory. So there are some edge cases, but I think like mostly you can just enable strict types in your library and you don't affect any other library that the project uses. We would like to extend this concept. It should be possible that libraries can update to your language, well, it's called language dialect without forcing other libraries or without forcing the using codes to update as well. Because this is what we have to do right now, though, before you can update your project to PHP eight, let's say, you first have to wait that all the libraries you're using update to PHP eight. And maybe there are libraries that are going to update but also say that: Okay, now actually PHP eight is required. And then you kind of get these complex dependencies with libraries supporting these versions and not supporting those versions, and doing updates becomes pretty hard. As I said, the idea is to make the these backwards incompatible changes opt in some way, and there are multiple general models. So as you mentioned, P++ is the most radical approach. It's more or less a separate language but sharing the same implementation. And as the name suggests that this is inspired by C and C++. So those are usually implemented in the same compiler. And they can be interoperable in a limited way, mostly in that you can use C code inside C++ easily. Using C++ code inside C code tends to be much harder. Yeah, P++ is, I think the option we are pretty unlikely to take for a couple of reasons, because it's this kind of one time huge break which first means that we only have one chance to get it right, and given all the track record, we should maybe not rely on that. Also means that the upgrade becomes especially hard because you have to do everything at once. It's not spread out over a longer time.

Derick Rethans 5:54

You say that we need to get it right in one go, but that is hard to say because you don't know, in the future what else we want to add? Like the RFC mentions a few few other cases, like, for example, things like forbidding dynamic Object Properties, we'd have to do right away now as well, if he'd go with the two languages one implementation phase, right? I mean, if we hadn't thought about it, nobody would have thought about it after the split as we made, we'd still not be able to do it.

Nikita Popov 6:20

That's true. So P++ is, one time, one time solution. It doesn't really scale over time. I mean, there are also other concerns. And I think like in the end, one of the big ones is just that we don't have the resources for it anyway. So we have only maybe three full time developers on PHP. And I don't think we want to start focusing on this huge separate language more or less. Now we're just going to take a couple of years. Next to having this entirely separate language, there are two other ways to approach the problem. One is editions, which is a concept used by the rust programming language. The idea there is that next to the version, which is more or less than implementation version, you also have this edition, which is a completely orthogonal concept. Basically, we will say: okay right now we are for example at edition zero. And then in addition one you opt into some kind of set of backwards incompatible changes. Then in addition two, there are more backwards incompatible changes, and so on. Each edition is essentially a superset of the previous one.

Derick Rethans 7:32

Would it also mean you couldn't get new features in a new edition or is it purely about making backwards incompatible changes?

Nikita Popov 7:40

So, this is purely about backwards compatibility. So, if a new feature can be added without breakage then should always be available. The editions switch would only control the backwards incompatible parts. This is to contrast with the second approach, which is to have fine grained declare statements. As you already mentioned, we have the existing strict types directive and we could continue down the same path. So, we could add new declare for no dynamic Object Properties equals one, and then for a strict operators equals one, and for whatever else equals one. And then you would have this long list of possible declares, with which you could enable or disable some particular bit of language behaviour.

Derick Rethans 8:26

Then I can imagine that in another five years, that list might be 20 options long.

Nikita Popov 8:31

Right. So, the concern there is of course, one part is maintenance, because we have to support basically an exponential combination of different options. And the other is from the programmer perspective, that the like mental model becomes more complicated because you have to keep in mind like which exact set of declares am I using right now? I should say, though, that this model is actually used by Python. Because Python has this import or use from future feature. So there is basically this magic module __future from which you can import language features that will become the default in newer Python versions. For example, you can import the new integer division behaviour inside an older version. This is more or less the same as doing the declares, the fine grained declares, just with a different syntax and with the I think, stronger focus that the behaviour is going to become the default in the future version.

Derick Rethans 9:38

So basically, you're opting into experimental functions really?

Nikita Popov 9:41

Could be either experimental functions, or it could be really functions from newer versions. In particular Python, also for a while had parallel development of Python 2 and Python 3, in which context this probably makes more sense.

Derick Rethans 9:56

There's pretty much three options that the RFC mentions: a new language common implementation or the PHP / P++ option, the editions, and the fine grained declares. These are all still going to be based per file?

Nikita Popov 10:12

So that's the second large question, what is the general model? And the second one is where we declare it. The approach I was initially pursuing was to have this declare it at the package level. So for a whole library or for for a whole project.

Derick Rethans 10:32

How would you define what a package is?

Nikita Popov 10:33

We have namespaces. And there is a somewhat loose coupling between namespaces and packages. So I have an old RFC for a namespace scope declares, where you could, for example, specify strict types for whole namespace, which is, I think, maybe the most natural way to treat packages right now, because this is the closest thing to a package we have. Fortunately, it does have a few issues. One of them is that this namespace package mapping is not always there. So there are packages that have some somewhat odd nesting of name spaces. And I've also heard that some people, for example, define their models inside the Doctrine name space, because they're, you know, extend their classes. So they also put them the namespace. Of course, you shouldn't do that. But it's things that could happen, because we don't really have this enforcement that the namespace really is a package. And then there are also technical concerns, because right now, namespaces are really just a compile time thing to handle name resolution, and now they kind of turn into a feature that also has some kind of runtime impact. And you have to consider things like what happens if you have multiple namespaces in the same file, and also other considerations, like what happens if the names namespace is first used, and you issue some namespace scope declares afterwards. All that can be resolved, but it makes the model somewhat more complicated.

Derick Rethans 11:53

And I guess you end up having to declare these namespace scope declares maybe in a separate file or something like that?

Nikita Popov 12:14

At least what I have in mind that is that you would declare them in composer.json, and Composer would then take care of registering them with PHP itself. Of course, you could also do that manually, which are not using Composer but that at least was the 95% use case.

Derick Rethans 12:31

In applications that make use of Composer, it is very likely that Composer knows about all the libraries that a specific application uses, and hence will be able to construct an array, where it can tell PHP by calling a function declaring all the different options or editions of whatever that end's up being.

Nikita Popov 12:49

So that's one of the approaches. There are also some alternatives. One is to instead introduce an actual package concept. One of the possibilities is to basically: add an extra line to each file, which says package and the package name. So that really removes any and all ambiguities. But you do have to add that extra line, which serves some very limited purpose. And basically only for these package scope declares, could maybe also be used for some extra features, like, package private symbols.

Derick Rethans 13:23

But it would also instantly make that code base non-parsable with older PHP versions.

Nikita Popov 13:28

That's also true, right. But that's a general problem that most approaches I think, would have. So namespace scope declares is one that doesn't have it, but even the per file approach would have this problem because if you write for example, declare edition, then you would right now on PHP seven get the warning that the edition declare is not known. Yeah, last variant that I'm discussing here is to make packages based on the file system, which is something many other languages do. So you have some kind of magic file somewhere that says okay, this directory and all the sub directories are part of the package. In PHP, this kind of file system based approach is somewhat problematic, because our include mechanism is not really based on the file system but on fairly general stream abstraction. You can include from the file system, you can include, if you're really crazy from HTTP, but you can also include from Phar files, from an input stream, or from some kind of custom defined stream. These file system based packages require some additional operations to be well defined. So they have to have a notion of path canonicalization so you can determine whether a file is inside the directory, even if there are things like symlinks or the file system is case insensitive. Which does exist for the file system. So we have the real path syscall, but doesn't exist for streams right now. And a similar problem is that we need to be able to walk up from a path to the directories. And that's also something that doesn't exist for streams. And like more generally, not all streams really have a well defined concept of a directory. For example, if you are reading a file from stdin, so the stdin or the input stream, then there is no directory and like, which package is that going to be in?

Derick Rethans 15:31

I think it would be hard to end up debugging at some point. So why some things don't actually end up being in a package where you expect them to be, for example. And then on top of that, you also need to define: Well, how do I call this file and things like that, right? I mean, a PHP script wouldn't be just a single file, for example, would be a single file and this extra definition file. And that's the concept of course that we don't have in PHP at all. Everything is on profile pretty much.

Nikita Popov 15:56

Which is why at least to right now. I think, like the immediate way forward, is to use per file declares. So if we don't use the fine grained declare approach, and instead have a single edition, then it's not really a problem to put the declare edition inside every file, because this is already what we do for strict types. It's like not super ergonomic. But I think it's also not a huge problem. And it does have the one very big advantage that files are and remain self contained. So you don't have to consult an external definition that may be hard to locate to figure out how to process.

Derick Rethans 16:36

And every IDE or tool would have to implement that same logic and make sure that it's all consistent with each other as well.

Nikita Popov 16:43

I wouldn't say it's really hard, but it might be somewhat fragile, especially when it comes to convention. I said if we put things in composer.json, there's probably something tooling can easily deal with. But if you then encounter a project that doesn't use Composer and uses as some other way to register the package declares, then you might run into problems.

Derick Rethans 17:09

Lots of things to talk about and discuss at some point. As you submitted this RFC to the mailing list some time ago now, what is sort of the feedback that you're getting on this?

Nikita Popov 17:19

So I think the general direction, at least this pretty clear. Most of the discussion is focused on the addition concept, not the finger in declaratives, or the P++. I think for now, we would also go with the per file approach. Now, the main two points that remain contentious is: first, how does the support timeline look like? So basically, the concept of editions just enables different libraries to upgrade independently. That's the core premise. But at least in Rust additionally editions of are also guaranteed to be supported forever. So you can leave your old code running on the old edition, and you do not have to ever update it.

Derick Rethans 18:10

How often do they make new editions? Every three years?

Nikita Popov 18:13

Yeah, it's not quite clear yet, but probably it's going to be every three years. And now for us, the question is, well, do we want to support old editions forever? Or do we want to give them a finite lifetime? Say we introduced a new edition in PHP eight, and then we supported until PHP nine. That means code can take its time to do the necessary updates, but it does have to do the updates at some point.

Derick Rethans 18:37

But you'd have five years?

Nikita Popov 18:39

It's more of the general question of if it's forever or if it's limited. So I think based on the discussion, there is a pretty strong preference to not support them forever.

Derick Rethans 18:51

But for how long then? I mean, it must be longer than what we support a normal PHP version for, right?

Nikita Popov 18:56

Yeah, would expect it to be something like a major version cycle. The second question is related to the strict types, as you said, strict types is like an existing example of a mechanism that works like this. And now we're introducing a second mechanism with the same basic characteristics. Are we going to merge them or not? Would we say that, in the new edition that strict types is enabled by default, or even always enabled? If we do that, and we say that additions have limited support life, that means that strict types is going to become the only option in the future at some point, at least. You can imagine that this is somewhat contentious because there are quite a lot of people who consider weak types to still be the superior option.

Derick Rethans 19:49

Whenever I go speak at conferences or user groups, that's not the case. One question is, which keeps recurring always is: Why isn't this the default in PHP eight? I think there's an expectation that strict title at some point is going to be turned on by default.

Nikita Popov 20:04

Yeah, and the thing, this is where people disagree whether this expectation is this or not. So there are plenty of people in the discussion thread, well, by plenty I mean, at least two, who strongly think that strict types should remain an option. I mean, PHP of deals with often deals with input coming from HTTP or from a database which is usually coming in as a string. And they think that the typecast you have to do to make that work with strict types actually kind of weaken the type safety guarantees, because if you perform an explicit cast, then that cast is performed basically without any checks. So you can like take a completely non numeric string cast it to integer and you will get zero without any warning or whatever. While even in weak typing mode, that would still result in an error.

Derick Rethans 20:58

It's a curious thing actually when you mention databases because, of course databases, you've defined very strict types for your data in them. It's just that it's interesting that PHP's interface to most of these old SQL databases, just decided to always turn into a string.

Nikita Popov 21:14

It's it does actually support returning things in they're like native type.

Derick Rethans 21:20

With PDO, yes.

Nikita Popov 21:21

But under options, and I think it's also like dependent on whether you do emulation or not, and stuff like that. And you have all these different drivers that have differing support for that. But yeah, to get back to strict types, but one of the options is to really keep editions and strict types separate, and also evolve the strict and the non strict mode independently. So you could say that in the new edition, the strict typing mode becomes stricter, for example, by also extending to operators, arithmetic operators, not just to function arguments, but that of course doesn't mean that: Yeah, we saying strict types of states exist forever as a separate track of language.

Derick Rethans 22:06

Yeah, that's an interesting one. I'm not sure how to get to a conclusion there actually. Because there's always going to be people on each side side.

Nikita Popov 22:13

Yeah.

Derick Rethans 22:13

Would you think that this language evolution overview proposal would have been decided on which way to go by the time feature freeze for PHP eight comes around?

Nikita Popov 22:23

I think it would be pretty good to have this for PHP eight, because well, it's new major version and the time to introduce this kind of concept. I should say, though, that we already have quite a few backwards incompatible changes in PHP eight, and at least some of them are, like, we are definitely not going to retrofit them into the editions concept. So there are already certainly going to be breaking changes there.

Derick Rethans 22:52

Why wouldn't you retrofit them? I mean, if we end up deciding a PHP eight will have these editions, would they not be part of that or would they always end up breaking anyway? Because it seems like a sort of an ideal place to then do it.

Nikita Popov 23:05

And yeah, problem is just that the there are some quite extensive changes, especially when it comes to warnings versus exceptions, and will just be like a lot of efforts to get this under an edition flag and to support both behaviours there. Maybe some of the existing changes could be moved into there, with not a huge amount of effort. But I think there are definitely going to be some like hard edition independent breaking changes.

Derick Rethans 23:37

New major PHP versions still might have some backward breaking changes independently from when we do the editions or not, or more declares or not?

Nikita Popov 23:46

Yeah, that's like one more question, what exactly is the scope of editions? What goes into the edition, what doesn't go into there? I mean, there is always a cost to ending something with this mechanism. One is just maintenance for us. And of course that like user has to consider more different versions of the language. And I think one particularly large aspect that would likely never fall under edition concept is changes to the standard library. So additions work well for language changes, but I don't think they really make sense for a standard library changes. So everything that involves depreciations, or functions with eventual removal would not be covered for that.

Derick Rethans 24:31

Do you have an example of such a change in the standard library that PHP eight might have?

Nikita Popov 24:36

What I just said might as the general that, usually in every PHP version, we deprecate a bunch of functions and are going to remove them at some point. And these deprecations are like going to apply independently of what edition you set. Actual changes in terms of like real behaviour changes of the standard library I think that's something we quite rarely do. Actual changes to the standard library where the behaviour of a function is changed. That's something we generally try to avoid. Specifically because this causes relatively subtle backwards compatibility breaks. So usually we will either do changes by introducing a new flag or a new function, or by deprecating the functionality entirely. Even when it comes to language changes, there is like I know one example. And the discussion was, well, if we had the edition concept, and we wanted to introduce something like traits, the trait functionality in general is not backwards compatibility breaking. But the trait feature does introduce two new reserved keywords, which is trait and insteadof. So there is technically a backwards compatibility break even though it's finer. And now you have the trade off. Do you introduce traits in the new edition and only reserve the keywords there, thus removing any backwards compatibility break. Or do you you introduce it always, which means that everyone can benefit from it, even if they haven't updated the code to the new edition yet. But it does introduce the small backwards compatibility break. And then you get this trade off and the discussion what you should be doing about that.

Derick Rethans 26:17

I think making that kind of decisions will have to be done based on evidence. And I think in the past you've used the top thousand projects on GitHub and see whether things break or not to make a decision. For example, having the nested, or the triple, quadruple nested ternary. Anytime people use it, it's pretty much a bug in the code.

Nikita Popov 26:36

Yeah, so to give one example, in PHP 7.4, we introduced the short closure syntax with the fn keyword, and they're the source code analysis showed that basically, fn is not used outside of tests, apart from one library, which is my own. Which does have quite a few dependencies. And that library was indeed broken essentially completely by that change. So in that case, I think there might have been an argument that this feature should be introduced under an edition, because there is like evidence of actual breakage in the wild.

Derick Rethans 27:14

This is one of us trying to get it right. We now have evidence for it.

Nikita Popov 27:18

And probably like the insteadof keyword for traits, that there's much less problematic.

Derick Rethans 27:24

Again, as I say, it's the data that speaks that there right? That was quite a bit to go through. I'm curious to see where those discussions ends up going. Hopefully, we get to a conclusion somewhere in the next few months and ready for PHP 8.0. Who knows? Maybe we have another podcast episode where we introduce a new editions concept.

Nikita Popov 27:43

So this is probably my most vague RFC, with a somewhat unclear goal and the somewhat unclear discussion outcome.

Derick Rethans 27:53

Do you have anything else to add to this discussion that we've missed?

Nikita Popov 27:55

I think there is just one thing maybe worth mentioning, which Rust uses pretty extensively, which has automatic upgrades. So they have some tooling to do that, which is mostly reliable. And I think it would be pretty nice if in PHP, we had something similar. In PHP, we can't really make this reliable because language is just way too dynamic. And we actually do have some tooling in the form of the rector library. But we might want to think about providing something under the PHP project umbrella that is more geared towards like doing updates that are as safe as possible. So you can run them without thinking but still reduce your loads some what.

Derick Rethans 28:40

And that is something that is definitely for the future. Thanks for talking to me about the language evolution overview proposal.

Nikita Popov 28:46

Thanks for having me, Derick.

Derick Rethans 28:53

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP line. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 44: Write Once Properties

PHP Internals News: Episode 44: Write Once Properties

In this episode of "PHP Internals News" I chat with Máté Kocsis (Twitter, GitHub, LinkedIn) about the Write Once Properties RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 44. Today I'm talking with Máté Kocsis about an RFC that he produced called write only properties. Hello, Máté. How's it going?

Máté Kocsis 0:34

Yeah, fine. Thanks.

Derick Rethans 0:36

Would you mind introducing yourself a moment?

Máté Kocsis 0:38

My name is Máté Kocsis and I'm a software engineer at LogMeIn. I've been using PHP for 15 years now. And after having followed the mailing list for quite some time, I started contributing to the project last October, and now Write Once properties is my first RFC.

Derick Rethans 0:58

What is the concept of Write Once Properties?

Máté Kocsis 1:00

Write Once Properties can only be initialised, but not modified afterwards. So you can either define a default value for them, or assign them a value, but you can't modify them later. So any other attempts to modify, unset, increment, or decrements them, would cause an exception to be thrown. Basically, this RFC would bring Java's final properties, or C#'s, read only properties to PHP. However, contrary how these languages work, this RFC would allow lazy initialization. It means that these properties don't necessarily have to be initialised until the object construction ends, so you can do that later in the object's life cycle.

Derick Rethans 1:48

PHP already has constants, which are pretty much write only properties as long as they're being defined in a class definition. How does differ?

Máté Kocsis 1:58

Yeah, it's it's the difference because, so you can assign these properties value in the constructor or anywhere. You don't don't have to define them a default value.

Derick Rethans 2:12

Okay, and of course constants have the other problem is that you can only set its values to constants, not necessarily to any sort of expressions, or the result of other method calls.

Unknown Speaker 2:22

So you can use objects, resources, any kind of property value here.

Derick Rethans 2:28 You mentioned C#'s read only properties. And you sort of mentioned them in the same breath as write ones properties for PHP. These seem like opposite things

Máté Kocsis 2:39

Not quite opposite, but there's some distinction between the two. C sharp requires these properties to be initialised until the object construction ends. And this is very difficult to achieve in PHP. And now I'm using Nikita's words: Object construction is a fuzzy term and you can be sure if, if the contractor is involved at all. For example, if you are using Doctrine or proxy manager, so we decided to allow lazy initialization, which means that you don't have to assign these properties a value, you are free to do anytime when you want.

Derick Rethans 3:22

What happens if you read them without them having being set yet?

Máté Kocsis 3:27

Initially, when I started working on this proposal, I faced the problem because untyped properties have an implicit default value in the absence of an explicit default value. That's why you just can't really use them with the write once properties. Either you have a default value or you can do anything with them. That's why we we had to only allow typed properties with the write once properties and typed properties are in an uninitialised state by default. You can't read them until you first assign them a value.

Derick Rethans 4:04

Because in PHP 7.4 that will throw a type error. So that actually ties in really nicely with PHP 7.4's initialise concept for the type hinted properties.

Máté Kocsis 4:14

Yes.

Derick Rethans 4:15

One thing that is slightly skipped over is which keyword does the RFC produced, because you mentioned final for Java and read only for C sharp, which one of you picked for PHP?

Máté Kocsis 4:25

So there were plenty of possibilities considered. The first one was the final keyword. At first, it seemed to be the obvious choice for me, but after thinking about it, I turned out that it's not not the right candidate because currently it affects inheritance rules in PHP. And now we are talking about mutability rules. We had sealed which comes from C sharp and the problem is the same because it also affects inheritance rules, so we shouldn't reuse it for different purposes. We also consider immutable. It's one I like. But it might be a little bit misleading because the usage of immutable data structures, like objects or resources are not restricted at all. Then there's locked, which is a bit too abstract or vague name.

We also have writeonce as well. And technically, it's the most accurate term. But from the user's point of view, it could be a bit confusing because they are not expected to write them at all, only the read these properties. And now we have readonly and probably this keyword get the most traction so far. And it's good. It's a good name because it refers to what users should generally do with these properties. However, there's also a slight problem that users can, or in some circumstances can, write these properties too. But that's not the general use case.

Derick Rethans 6:10

It's a curious thing. I remember we had a PHP developers meeting back in 2000, let's say 2008. But it could as well have been 2005, where we also actually spoke about read only properties, but I'm going to have to dig up the notes for that to see what it said there. Maybe you find it interesting to read to see what the history said about this.

Unknown Speaker 6:31

I'm curious. The question is open, so I plan to put it to vote.

Derick Rethans 6:37

When do you think you're putting it up for a vote?

Unknown Speaker 6:39

I think it should be close now. I will answer the mail, which came from Nicholas. I don't know if there is no more problems than we could do it this week or early next week.

Derick Rethans 6:54

As the properties are write once, how will she implement lazy loading with that? In order to do the lazy loading, you need to first figure out whether the property is already set. How will you know that it's already set? How can you check for that?

Máté Kocsis 7:07

I think generally you don't have to worry whether a property's write once or or not. Since mainly, we are talking about private or protected properties in the most cases. However, if you need this information, then you will be able to use reflection. I've already added support for method in in ReflectionProperty for this purpose.

Derick Rethans 7:31

Let me ask a little bit more about that. You mentioned that this is meant for lazy loading. I understand lazy loading is something that you do well, you're executing and all the methods. For example, on an object, you do get something and that needs to fetch things from a database. Because those write once properties are private or protected, most of the time, the code that fetches the things from the database that does the lazy loading still needs to know whether the properties already been written to. Because if it would attempt it again, you'd potentially get an exception. So how would it know it's already been written to?

Máté Kocsis 8:03

Good question. I was talking with with Marco Pivetta. His use case with proxy manager is to unset these properties in advance and then it can use the get or set or I don't know which magic methods.

Derick Rethans 8:28

I saw that the RFC mentioned a few other alternative approaches for this feature. And the headlines in the RFC say: read only semantics, write before construction semantics, and property accessors. Would you mind explaining these and why they haven't made the final RFC?

Máté Kocsis 8:44

The first one was to follow Java and C sharp, and require all write once properties to be initialised until the object construction ends. And this is what we talked about before. The counter arguments were that it's not easy to implement in PHP. This approach is unnecessarily strict. The other possibility is to let our limited writes to these properties until object construction ends and then do not allow any writes. But positive effect of this solution is that it plays well with bigger class hierarchies, where possibly multiple constructors are involved, but it still has the same problems as the previous approach. Finally, the property accessors could be an alternative to write once properties, although in my opinion, these two features are not really related to each other. But some say that property accessors could alone prevent some unintended changes from the outside and they say that maybe it might be enough. I don't share this sentiment. So in my opinion, unintended changes can come from the inside, so from the private or protected scope. And it's really easy to circumvent visibility rules in PHP. There are quite some possibilities. That's why it's a good way to protect our invariants.

Derick Rethans 10:15

What was the most criticism you got on the mailing list about his proposal?

Máté Kocsis 10:18

As far as I remember, the property accessor. The biggest criticism was that we don't really need this term, but we could use property accessors.

Derick Rethans 10:29

We have spoken a little bit about what this feature is. We went into a few use cases with lazy loading. What would other use cases for this be?

Máté Kocsis 10:38

I think it's really suitable for domain driven design, or working with value objects, and I'm a great fan of DDD. The problem is PHP can't guarantee any immutability for our objects. Just one example. You can invoke the object constructor as many times as you wish, which overrides all your properties.

Derick Rethans 11:04

I had not thought about that you can actually call the constructor yourself. And of course you can.

Máté Kocsis 11:08

Yes, me neither. I just saw somewhere probably in a previous discussion about immutable objects. That's the advantage of having write once properties. You could by using write once properties, yeah, you can prevent accidental modifications from the outside or from the inside too. And that's the main purpose.

Derick Rethans 11:32

Your main purpose wasn't lazy loading but more immutable value objects.

Máté Kocsis 11:36

Yes, yes. Right. I proposed right fans properties first, to pave the road for immutable objects because this is my main goal.

Derick Rethans 11:46

Okay, but you're going step by step. I think that's actually a wise way and Nikita have said something similar that it is nicer to take things little by little so that it is easier to convince people that this is a good feature or not.

Unknown Speaker 11:59

Actually it was Nikita's idea to split the two proposals.

Derick Rethans 12:03

That make sense. Okay, Máté, thank you for taking the time this morning to talk to me.

Máté Kocsis 12:08

Thank you for having me.

Derick Rethans 12:11

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 43: Syntax Tweaks

PHP Internals News: Episode 43: Syntax Tweaks

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the RFCs. One on abstract methods in traits, and one about an improvement to the tokenizer.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 43. Today I'm talking with Nikita Popov yet again about a few RFCs that he's produced for PHP 8. Good morning, Nikita. How are you doing?

Nikita 0:34

Good morning, Derick. I'm doing great.

Derick Rethans 0:37

I've given up on introducing you because we've done this so many times. Now, you don't need an introduction any more. The first RFC I wanted to talk about a little bit this morning is the abstract trait methods validation RFC. What are traits?

Nikita 0:51

We usually talk about traits as compiler assisted copy and paste. Basically, we just take all the methods and properties from a trait and copy them into the class that's using the trait. That's a bit over simplified, in particular, you can use multiple traits in the single class. And those traits might be defining the same method, in which case you have to resolve the conflict in some way. So that's where you have these insteadof or use annotations to specify precedents and aliases.

Derick Rethans 1:23

Traits has been in PHP for quite a long time. What is now the problem that you're trying to solve through this RFC?

Nikita 1:29

The problem is that traits are sometimes not self contained. So to give a specific example, we have in the logger PSR, we have a trait called logger trait, which has a bunch of methods like warning, error, info, notice, and so on. So just simple helper methods, which all called the log method with a specific log level and this trait only specified these helper methods but still requires the actual class to implement the log method. The way you'll usually indicate that is by adding an abstract method to the trait. You have all the methods you actually want to provide by the trait. And you have a number of abstract methods that the trait itself requires to work. This already works fine, but the problem is just that these methods are not actually validated, or they are only inconsistently validated. Even though the trait specifies this abstract methods, you could implement it in the class with a completely different signature.

Derick Rethans 2:30

Okay, just like any signature?

Nikita 2:32

Just like any signature right. The method still has to be present in some way. But the signature can be completely different. Could also be like different method type, like a static method, or an instance method.

Derick Rethans 2:43

Just basically checks for the name is what you're saying?

Nikita 2:46

Yeah, it only checks with the name.

Derick Rethans 2:49

Is this the only place, is this the only time where these abstract methods are not being validated. Or are there other situations where that could happen as well?

Nikita 2:57

No, I think this is the only place.

Derick Rethans 3:00

Are all the situations where these abstract methods in the trait will get validated. And also on signature?

Nikita 3:07

As I mentioned, it's not like the signatures are completely unvalidated. They are just inconsistently validated. It depends a lot on exactly how you use the trait. If you just use the trait and specify the methods of the same class, it doesn't get validated right now. If instead of the method is provided by the parent class, so it's inherited, then it does get validated. If you don't implement the method that makes the class abstract instead, then it's also going to get validated in the child class. It kind of already works halfway. And this RFC just tries to make it work always.

Derick Rethans 3:44

Okay, that seems like a reasonably good addition to almost a no brainer.

Nikita 3:48

I would say it's basically, a bug. Especially if you look at the implementation, there is clearly some validation code there. The conditions are just a little bit off, but so we do have to go through the proposal, because this is a backwards compatibility break.

Derick Rethans 4:02

Yes, I was about to ask if it's a bug fix, why bother with an RFC? But if it's a BC break then yeah, we still need to do it of course. I doubt there be many controversies about is?

Nikita 4:12

Actually there is one contentious point. Um, so something I didn't mention yet is that the RFC also allows you to define private abstract methods in traits. Normally private abstract is like a contradiction in terms because private means only visible in the same class. And abstract means it has to be implemented in the child class, you can't really have both. You can't have both with traits, because traits can see the private members in the class. I think that by itself is like not controversial. That's a reasonable thing to have a trait. The part that is controversial is what you do with existing visibility modifiers. This pattern already exists. So people already define abstract methods in traits but because right now private abstract is forbidden, the lowest they can use is actually protected abstract, even though they don't actually want that method to be publicly exposed, or even protectively exposed. So there is an argument there that we should maybe ignore the normal visibility validation that we do, and allow even implementing a protected abstract method from a trait with a private method inside the class, simply for backwards compatibility reasons.

Derick Rethans 5:21

Because if you wouldn't allow that then, how would it break things?

Nikita 5:26

It would break things because there is existing code, using these abstract protected methods simply because we don't support abstract private yet. So those code would start throwing visibility error, and I mean, could be fixed by just dropping the abstract method, but there's also not ideal.

Derick Rethans 5:45

Because people use it to make sure that, I mean it's there in the class that implements the trait pretty much. Do you have any idea when this is going to for vote?

Nikita 5:53

I think it can already go up for vote? Mainly I need to resolve that question about the visibility first.

Derick Rethans 5:59

I'm looking forward to seeing that showing up sometime soon then.

How do you call your second RFC?

Nikita 6:05

Object based token get alternative?

Derick Rethans 6:07

I think that's a great title. There's a few words in there that we might have to explain first. What are these tokens you're talking about?

Nikita 6:14

So the token_get_all function, which we already have, exposes a part of the PHP compiler infrastructure. PHP compilation generally has three steps. The first is the tokenization. The second part is the parser, and then the compiler. So the tokenizer converts the raw character stream into tokens, which encode higher level concepts, for example, that like the sequence of FUNC and so on is actually a function keyword, or that double code followed by characters is actually a string. So this part only recognises like not larger structures, like whole functions but at least the the atoms that make up language.

Derick Rethans 7:00

Would you say these are the words that make up the sentences?

Nikita 7:03

Yeah, that's that's the right analogy.

Derick Rethans 7:06

Why would you want to have access to them?

Nikita 7:08

For example, I have a PHP parser library, which converts these tokens into an actual syntax tree. And then on top of that, you can easily analyse PHP source code. So this is what all these static analyzers, like PHPStan or Psalm are based on.

Derick Rethans 7:27

Do they all use the tokens?

Nikita 7:29

Those two, in particular, use my PHP parser library, and that one uses the tokens internally. There is also other tooling that's more directly based on tokens, for example, code formatters or code style inspection tools like PHPCS. Those all directly operate on the tokens instead.

Derick Rethans 7:47

But as you say, these tokens only are words and they don't really provide a structure. How would these tools then convert that into a structure?

Nikita 7:54

If you're looking for, if you're looking just at formatting, then you may not really need a lot of structure. So you probably do need to write like that of extra code to recognise that, okay, the function token followed by white space, followed by an identifier, that's function declaration. For the more complicated tooling that builds a syntax tree, you need to implement a parser, either based in code generation, or based on recursive descent approach.

Derick Rethans 8:26

Why would you not want to have direct access to PHP's AST instead because that already provides a structure for you?

Nikita 8:33

We do have direct access to the AST through the AST PECL extension, which is not part of core yet. I don't know if there are plans in that direction.

Derick Rethans 8:43

Well you wrote it so you surely can make these plans.

Nikita 8:46

Yes, I can make them but I don't know if I should make them.

Derick Rethans 8:50

I think you should.

Nikita 8:51

I mean, the nominal advantage of the AST extension is that it's always up to date with PHP. In practice that really isn't an issue, because some of the userland tooling is also pretty quickly updated. The more practical advantage is that the extension is a lot faster than implementing this in userland code. Well, I mean, this is really one of the areas where C code is faster than PHP code. The AST extension only exposes the structure that PHP itself needs. PHP is not interested in like precise formatting, and things like that at all. So it throws away quite a few things. You can, for example, get accurate on position information. Like, where, exactly not just which line but of which column, something is defined. And that's something you're quite often interested in.

Derick Rethans 9:46

Also, from what I've known, it throws away all the comments unless they are doc bloc comments. How does the tokenizer currently return information about the tokens? I've played with this in the past and I didn't think it was the prettiest format to get back out of it.

Nikita 10:02

token_get_all returns an array of tokens. And there are generally two types of tokens. One is single character tokens, like a semicolon, or a comma, or whatever, which are just returned as a string. So it's a single character string. And then there are complex tokens, like the function keyword, like white space, like strings, which are returned as an array where the first element is the token ID, which is an integer. And we have constants defined for these integers. The second element is the actual string content of the token. So for the function keyword, that's always going to be function, but it could be written in different ways because the keyword is case insensitive, so it could be all lowercase, or uppercase, hopefully it's all lowercase.

Derick Rethans 10:52

You'll get the odd situation where the first letter is the capital, I suppose, but that's about it, hopefully.

Nikita 10:57

And finally, the last element is the line number. So the starting line number.

Derick Rethans 11:02

So if you want to look at the position on the line, you'd have to calculate it yourself?

Nikita 11:08

Right you would have to track that yourself. I mean, there are two problems. One is just that you have these single character tokens and the complex tokens using different structure. So all the codes using them as to always switch back between those; check if it's an array or a string, or a test to do some kind of normalisation itself. And the second problem is that arrays in PHP are fairly memory inefficient when it comes to storing a fixed amount of data. Storing three elements inside an array always means allocating an array for eight elements. Because its minimum array size, you have to use space to store the key, and so on. Generally, if you have a fixed structure, it's much much more efficient to store it inside an object. Using a class that has declared properties. So this makes a very large difference in some cases, especially if your array only has like two or three elements, you can save a lot of memory with it.

Derick Rethans 12:12

Have you done any benchmarks to see how much memory you'd actually save some likes some some particular scripts that you've parsed with how to tokenizer doesn't matter and how you proposing to do it?

Nikita 12:22

Yeah, I have here in the RFC, some numbers for some particular script that goes down from 14 megabytes to eight megabytes. So that's nearly half the memory usage. Well, actually, maybe I should first actually say what the RFC proposes. The RFCe proposes to instead return objects, an array of objects. And these objects have four properties. So first is again, the ID of the token, then the textual content, the line number, and also the starting position of the token in the string.

Derick Rethans 12:54

Is this something that the tokenizer extension and tracks for you?

Nikita 12:58

I mean, that's something that can easily do, so we can just as will expose it. And these objects are always used. So we no longer make the distinction between single character tokens and complex tokens. So we always return the uniform array of tokens, of token objects. Despite doing that, removing this optimization for a single character tokens, the end result is still that we use half as much memory, simply because objects are that much more efficient than arrays.

Derick Rethans 13:27

That's a clever trick. I'm sure people like that, that using less memory, at least I know I would. Is it also faster or doesn't particularly matter much?

Nikita 13:35

It's also faster, like maybe 30% or something, because memory usage and performance tend to be pretty heavily correlated. So if you use less memory, you also are faster.

Derick Rethans 13:46

That makes sense. Are you thinking of other things that you can add to the tokenizer extension to make working with them even easier?

Nikita 13:52

The way this new functionality is implemented is, we have a PHP token class and on it we have a static method getAll. So instead of calling the token_get_all function, you call PHPToken::getAll(). And one nice thing this allows you to do is to extend this token class. So you can say, MyPHPToken extends PHPToken, and then you call MyPHPToken::getAll() and then we will actually construct your extension class. That means that you can add whatever methods you like, in addition to what we provide by default.

Derick Rethans 14:29

Is that a pattern that we have in other places in PHP as well? Because I don't usually think that even if you'd call an inherited static method, why wouldn't suddenly return the inherited classes object? wDo we did it in other places?

Nikita 14:42

So this is somewhat uncommon in PHP internals. I think it's a pretty common pattern for userland where generally if you return new objects from static methods, you always use new static, not new self. This is essentially late static binding, which we did discuss quite recently. So, there is one limitation here namely that the constructor of the PHPToken class is final. So, you can extend the class and you can add extra methods, but you cannot modify the construction behaviour, because we would like to internally construct these tokens very efficiently by more or less directly writing the values into the right slot in memory and not doing slow constructor calls, becouse this functionality tends to be very performance sensitive. And the same trick where you can extend the class but not change the constructor is also used by the SimpleXML extension. Does exist but not very common in, generally where internal code is concerned, we usually do not really plan for extension. I think nowadays we mark nearly all internal all new internal classes as final simply because extension is such a pain to deal with. And for old classes who usually wish that we had marked them as final. I mean, this is also a general recommendation for userland that, like you should mark things final as much as you can get away with it. But it's much bigger concern for internals because dealing with userland extensions that do unexpected things is much harder for us.

Derick Rethans 16:23

You even need to make sure that your internal structures are properly constructed by the parent's constructor being called from inherited classes but in PHP, there's no such requirement that you do. Pretty sure I've had problems with that for the Date extension a long, long time ago, where people would extend from it, not call the constructor. And then because he didn't think of it, nothing is defined and everything just falls down.

Nikita 16:44

Yeah, so this is one of the common problems. And the other one is that internal classes often define custom object handlers. So that's something only internal classes can do. Just to give one example, they can define debug info handler that modifies the output of var_dump, but nowadays we also have the user land magic methods on get you back into and I think pretty much all internal classes are just going to ignore that, and always return their own internal debug information even if this method has been overwritten, simply because no internal class actually checks for that. And this kind of problem also exists for a lot of other magic, and generally no one takes it into account, and things are just more or less softly broken.

Derick Rethans 17:31

Very recently there was a pull request for Xdebug to change that as well because in Xdebug's debugging output get sent to IDEs. For internal classes always uses internal get debug handler, and for userland classes it uses whatever is userland defined; I mean if there's a magic method we'll use that. The pull request wanted to change Xdebug in such a way that it would also use the get debug info magic method for internal classes, whenever overridden. After some discussion about this, we figured out, this is probably a bad idea to do, and hence, we haven't merged that. Although we end up fixing some other things that the developer also found. That's a curious situation to be in. We would like us to be sort of work the same. But at the same time, you sometimes really want to see the internal information from the classes without developers having hidden the information behind it, right.

Nikita 18:20

Yeah, that's true.

Derick Rethans 18:21

And that is just from a from a debug perspective. And even from, let's make sure things don't crash perspective. I see that the RFC also rejected a few features that aren't part of the current iteration yet or might make sense to add it later. And one of them is about a lazy token stream. What would that be and what sort of different interface would it have?

Nikita 18:43

The lazy token stream basically just means that instead of returning an array of tokens, we return an iterator of tokens, which means that we do not have to store the full array in memory, which, like for the example, I used. The memory usage for the whole token array was eight megabytes, even after these memory usage improvements, which wasn't a fairly large file, but definitely not the largest file. You can encounter especially when it comes to generated files. So there is an advantage of processing tokens one by one as a stream, because then your memory usage is going to be basically O(1), not O(n). The problem is, I mean, the PHP lexer does indeed work one token at a time, so it can support it. The problem is that it has a lot of internal state. And in order to implement this iterator, we would have to backup and restore the state on each produced token to make sure that it's still possible to for example, include and compile other files in the meantime. So this is something that can be improved; we can make that cheaper, but that would be a larger effort. And I'm not really sure it's worthwhile because, while you can process one token at a time. And this is, for example, what the PHP parser does internally. Many practical applications in userland will generally want to have all tokens as an array. Because it makes it simply, makes things easier if you can always look ahead and look back. And I think it would be fairly hard to rewrite the existing libraries in terms of the latest tree. It may be a nice to have, but I'm not the most useful thing for it now.

Derick Rethans 20:32

What has been the feedback for this RFC?

Nikita 20:35

I think pretty good. This is something that we've already discussed years ago. Last time the discussion kind of got a bit got a bit sidetracked. Yeah, one of the dangerous when you start introducing object oriented interfaces. Well, actually, I just call this RFC object-based intentionally, because when you do object oriented then people would like to have their tokens, and their token streams, and their token stream factories, and the token stream managers. And this is basically held this the whole time. But generally everyone who is working on tokens, which is not a lot of people, but those who are working with them know that memory usage is a problem. And the current, current inconsistent structure is a problem, which is why most of them implement their own token objects, and basically do the same thing we propose here just themselves.

Derick Rethans 21:30

When it's this one going up for a vote at the same time?

Nikita 21:32

Soon.

Derick Rethans 21:33

Both of these RFCs that we spoken about today are both targeted to a PHP eight, I suppose?

Nikita 21:37

Yeah. So right now, I think all RFCs are targeted at PHP 8.

Derick Rethans 21:42

Thank you for taking the time with me today, Nikita to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we've started voting on them and see what happens to them.

Nikita 21:52

Thanks for having me once again.

Derick Rethans 21:56

Thanks for listening to this instalment of PHP internals news. The weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 42: Userspace Operator Overloading

PHP Internals News: Episode 42: Userspace Operator Overloading

In this episode of "PHP Internals News" I chat with Jan Böhmer (GitHub, LinkedIn) about the Userspace Operator Overloading RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 42. Today I'm talking with Jan Böhmer about Userspace Operator Overloading. Jan, would you please introduce yourself?

Jan Böhmer 0:33

Hi, my name is Jan Böhmer. I'm a physics student from Germany. And I'm the author of the Operator Overloading RFC.

Derick Rethans 0:40

What brought you to writing this RFC?

Jan Böhmer 0:42

Mostly because I have worked with monetary objects in the past. And it was a bit tedious to work when it comes to calculating. And whenever you have to want to calculate something, you have to call functions on objects. This is not possible to call, just use operators like with normal values like floats or integers.

Derick Rethans 1:06

Because the monetary objects themselves had multiple things embedded in there or something like that?

Jan Böhmer 1:11

Yes, they describe mostly a value and a currency. And together they are saved in an object.

Derick Rethans 1:18

Okay that that seems like a reasonable thing to do, right? I mean, other times people say the same thing about doing complex numbers or something like vectors. The RFC is called Userspace Operator Overloading. What is operator overloading?

Jan Böhmer 1:31

Yeah. Basically, is the idea that you can define operators, like addition or subtraction, or the string concatenation for objects

Derick Rethans 1:43

Does PHP already have something like this?

Jan Böhmer 1:45

Actually, yes. Objects can have something that calls do operation handler. This is called whenever PHP encounters an object, but if used with an operator. The problem is that this handler is only available for PHP internally. So if you want to use it, you have to write an extension.

Derick Rethans 2:06

So it will be possible to have in an extension a Monetary class with its own operators already defined on it.

Jan Böhmer 2:14

Exactly PHP extension GMP uses this as already. The problem is that it's not very flexible, you already have to know, be familiar with C, you have to be able to compile that. You have to contribute it to whatever system you want to use it. Since we have the foreign function interface since PHP 7.4 we can implement many things without to actually have an extension. But this operator overloading is something that's not possible yet inside from PHP.

Derick Rethans 2:47

So it wouldn't have been possible to write the GMP extension which is of course a wrap around libgmp with FFI, because there's no operator overloading available in PHP.

Jan Böhmer 2:59

Not in that comfortable way. You could use this way with functions but it would a bit more tedious then using just operators.

Derick Rethans 3:09

You've mentioned the Monetary object as a good use case. What other use cases can you think of?

Jan Böhmer 3:15

Higher mathematical objects like complex numbers, vectors, or something like tensors, maybe something like the string component of Symfony. That's you can simply concat this string objects with a normal string using the concat operator and doesn't have to use the function to call that, because basically, this should behave similar to a basic string variable, and not, like something completely different.

Derick Rethans 3:45

What is the syntax you're proposing for implementing this?

Jan Böhmer 3:49

My idea was similar to Python to use special metric function, the methods for every operator you can overload. So if you want to overload the addition operator, you would implement function called, a static function called __add, for example. This offer this function takes both operands, the left hand operand and the right hand operand. So you can decide if your current, this object is on the right or the left hand. That is important to determine something like one divided for zero, or one divided through two, or two divided through zero. There are two complete different cases and you have to be able to differentiate between the two cases.

Derick Rethans 4:39

And wouldn't that not be possible to do in non static functions?

Jan Böhmer 4:43

Another problem with non static functions such as possible access to this variable. If you modify an object from inside an operator handler, this can lead to very, very strange behaviour. Because normally operations doesn't change the object itself, but rather you should return a new value. The problems such as asked us to this, it is very easy to accidentally change the this object. If you only pass both objects like via a static methods, it is a bit more clear that you have to create an all new object

Derick Rethans 5:24

Would a type hint enforce that you return a object to the same class?

Jan Böhmer 5:29

Not an all case you want to return an object of the same class. For example, takes a dots product of vectors. So you take two vectors, multiply it in some way as you return to normal float value.

Derick Rethans 5:43

Of course, yes.

Jan Böhmer 5:44

If you were to enforce that, but would always to be the same types as those limits the use cases, in my opinion too much.

Derick Rethans 5:52

But you could of course type hint the __add operator yourself?

Jan Böhmer 5:57

It's always typehints in arguments, in my observation are used as a hint which type are supported for the operand handler. If you for example, vector plus an integer, and your operator handler only declares vector vector as a parameter types, then this operator will not be called, and it will tried to be called on the second object.

Derick Rethans 6:24

So it won't the called and instead it falls back to the second object to be called on.

Jan Böhmer 6:29

Yes, the idea behind it, is that only one of the objects have to know about both classes. So if you want to combine, for example, two objects from different libraries, and library A doesn't know about library B then only objects of the second library have to know about object A. In C++ you can define supported type outside of classes. So you can define combinations between arbitrary objects. The problem is in PHP this was a bit complicated. And the best way to implement this handler in types or classes. So the class has to know about each other objects, it could be interact with possibly.

Derick Rethans 7:14

That makes sense to me. What happens if neither of the classes, or if one of them is a class, and the other one is just a scalar type, if none of the add methods fit, what would happen then?

Jan Böhmer 7:24

The operators implement an handler, then those doesn't support them, then an error would be thrown.

Derick Rethans 7:32

And that is a type error like you'd normally get?

Jan Böhmer 7:34

If the object doesn't implement operator at all, then a notice would be triggered. The idea's that in the moment, it is possible to write something like object plus one, this would be a fine expression in PHP, in the current PHP versions, the object could be interpreted as a one and just a notice would be thrown. For compatibility reasons, my RFC does the same behaviour if no operators are overloaded on objects.

Derick Rethans 8:05

That seems like a reasonable compromise there. I remember from in the past, I think it was Sara Golemon that wrote an extension for using operator overloading. And I remember from the time that there is a problem with using the lesser than or greater than operators, because I think one of them gets flipped around automatically in the engine is being changed in PHP already, or are you running into the same problem?

Jan Böhmer 8:28

I'm not sure about this. My RFC doesn't mention comparison operators like greater or less at all. Because comparison, handled differently internally of PHP. This doesn't work about this. This is mentioned do operator handler. It would be a bit other implementation to do this. Also, the comparison is a bit complicated on its own terms. Maybe it's more useful to use interfaces for, to implements this overloading, or to use. Also, there are some problems. Maybe we should only allow something like an compare operator that's resolved either, minus one, one, or zero. If object's lesser or equal, so that everything is defined together at once. So it's not possible to define an object that has maybe, for example, the lesser, but not the greater operator.

Derick Rethans 9:32

But this sounds like that's for a different RFC.

Jan Böhmer 9:35

Exactly. That's a bit complicated. If the current operator overloading RFC gets passed, then maybe a comparison operator overloading RFC would make sense.

Derick Rethans 9:46

From reading the RFC, I've noticed that you also won't be able to use a shorthand assignment operator. So for example, plus equals. What is the reason for that?

Jan Böhmer 9:56

So every shorthand operator becomes currently an assignment of A plus B. The do operation handler cannot decide if an shorthand operator or normal operator was called. Allowing to overloads the shorthand operators, would maybe allow some benefits for objects terms of memory optimization. If you call a short hand operator you can mutate the object itself doesn't have to create a new object which takes more memory, but I think with the garbage collector of PHP that is not such a big problem. And if that is really needed feature in the future, this could be edited in other, later version of PHP.

Derick Rethans 10:41

Okay.

Jan Böhmer 10:42

Many other languages doesn't allow to otherwise shorthand operators so I don't think that as too much need for.

Derick Rethans 10:49

Operator overloading sometime has criticisms directed at it. What are some of the criticisms you've heard about it?

Jan Böhmer 10:56

First of all, there are some criticisms about the operator overloading idea in general. So there's also some criticism could be abused for doing some very weird things with operator overloading. So as mentioned C++ there is a shift, left shift operator, is used for output in a stream to the console. Or you could do whatever you want inside this handler, so if somebody would want to save files or modify the file in inside operator overloaded handler, it would be possible, and it's in the most cases function would be more clear what it does.

Derick Rethans 11:35

Of course, in a function add(), if you implemented yourself, nothing stops you of course on writing to a file either.

Jan Böhmer 11:41

Operator overload issues, in my opinion only be used for things that's related to maths or creating custom types that behave similar to the built-in types.

Derick Rethans 11:52

Like complex numbers, or vectors, or monetary numbers. So far, we have been discussing this RFC for a few weeks now. What do you think the chances are of it being passing?

Jan Böhmer 12:05

I'm not sure. I think the idea of operator overloading in general is accepted in the community, but doesn't hear so much backlash. There was some time discussion about how to do it. Some people think it's maybe better if you would implement operator overloading with interfaces, like with ArrayAccess, or to introduce some completely new keywords, like in other languages. In C++, or C#, there are a special keyword operator, that's marks an operator overloading function. So it is clear that is not a real function but special handled way.

Derick Rethans 12:49

Instead of using the underscore underscore in front of method names. When do you think you'll be ready to put this up or vote?

Jan Böhmer 12:56

Wasn't it busy last days, I will do some revises to my RFC, and polish my implementation.

Derick Rethans 13:06

Okay, thank you very much this morning for taking the time to talk to me Jan.

Jan Böhmer 13:10

Thank you very much for inviting me.

Derick Rethans 13:13

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language, I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 41: __toArray()

PHP Internals News: Episode 41: __toArray()

In this episode of "PHP Internals News" I chat with Steven Wade (Twitter, GitHub, Website) about the __toArray() RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Hi, this is Episode 41. Today I'm talking with Stephen Wade about an RFC that he's produced, called __toArray(). Hi, Steven, would you please introduce yourself?

Steven Wade 0:35

Hi, my name is Steven Wade. I'm a software engineer for a company called follow up boss. I've been using PHP since 2007. And I love the language. So I wanted to be able to give back to it with this RFC.

Derick Rethans 0:48

What brought you to the point of introducing this RFC?

Steven Wade 0:50

This is a feature that I've I've kind of wish would have been in the language for years, and talking with a few people who encouraged it's kind of like the rule of starting a user group right? If there's not one and you have the desire, then you're the person to do it. A few people encouraged and say: Well, why don't you go out and write it. So I've spent the last two years kind of trying to work up the courage or research it enough or make sure I write the RFC the proper way, and then also actually have the time to commit to writing it and following up with any of the discussions as well.

Derick Rethans 1:18

Okay, so we've mentioned the word RFC a few times. But we haven't actually spoken about what it is about. What are you wanting to introduce into PHP?

Steven Wade 1:25

I want to introduce a new magic method. The as he said, the name of the RFC is the __toArray(). And so the idea is that you can cast an object, if your class implements this method, just like it would toString(). If you cast it manually to array then that method will be called if it's implemented. Or as, as I said, in the RFC, array functions will it can it can automatically cast that if you're not using strict types.

Derick Rethans 1:49

Oh, so only if it's not strictly typed. So if its weakly typed would call the toArray() method if the function's argument or type hint array.

Steven Wade 1:58

Yes, and that is actually something that came up during the discussion period, which is something again, this is why we have discussions, right? Is to kind of solicit feedback on things we don't think about it, we may overlook or, and so someone did point out that it is, you know, it would not function that way, or you would not expect it to be automatically cast for you, if you're using strict types.

Derick Rethans 2:17

Okay.

Steven Wade 2:18

The RFC has been updated to reflect that as well.

Derick Rethans 2:20

So now the RFC says it won't be automatically called just for type hint.

Steven Wade 2:24

Correct.

Derick Rethans 2:24

Not everybody is particularly fond of magic methods. What would you say about the criticism that introducing even more of them would be sort of counterproductive, because you'll end up not necessarily knowing what happens if you start calling a method, when you do a cost, for example.

Steven Wade 2:38

The beauty of PHP is in its simplicity. And so adding more and more interfaces, kind of expands class declarations enforcement's and in my opinion, can lead to a lot of clutter. And so I think PHP is already very magical. And the precedent has been set to add more magic to it with 7.4 with the introduction of serialize and unserialize magic methods, and so for me it's just kind of a, it's a tool. I don't think that it's necessarily a bad thing or a good thing. It's just another option for the developer to use.

Derick Rethans 3:06

Two episodes ago, I spoke with Nicolas Grekas about a Stringable interface that he suggested to introduce, which is a little bit similar to sort of the casting with toArray(). And hence, do you think it would have make sense to have an __toArray() also happen if the class implements a interface with a typed function argument?

Steven Wade 3:29

I think that would be two separate RFCs. I think the first one to kind of get it on par with what's what we have now in PHP would be to introduce the toArray(). And then a separate one would be if we wanted to follow suit with an arrayable interface.

Derick Rethans 3:43

And which is the same thing that happens with the Stringable interface, right? We have had toString() for how many years, decades? But from what I understand, if you have a typed property "string", it would also call the toString() method when it's defined on an object that's being passed in, or do I misunderstand that, there are misremember that?

Steven Wade 4:00

I haven't followed that one too closely. I've kind of been catching up on some of the discussion today. But and yeah, I don't know off the top of my head what that would do.

Derick Rethans 4:07

I didn't mean with the ori.. with the newly suggested Stringable interface with adults we currently have.

Steven Wade 4:12

I'm not sure how that would work.

Derick Rethans 4:13

I don't know, either. That's what I'm asking you.

Steven Wade 4:15

With the array and with the typed properties? That's a good question. That's again some feedback, we kind of need to that I need to think through

Derick Rethans 4:21

Because I think it would make sense to at least behave the same and I don't particularly mind which way it goes. Me that's, that's a personal opinion here.

Steven Wade 4:28

And that's a great idea I need to haven't played with 7.4 too much, I need to pull it down and try and just see what the behaviour of string is because that's the main goal of this is to try and just get this on a parity, functionality parity with with what's toString() will do. And so if that is how it handles it with typed properties and I would want to implement that as well.

Derick Rethans 4:47

In a similar way. I don't also know what happens if if you have toString() available in a class and you pass it in as an argument that is typed as string.

Steven Wade 4:54

Even though at least when my test was weak types, it will actually cast that for you. If you have that. String argument type hint, it will cast it and then that will be a copy. So it will actually just be the result of that cast to string. I do not think I think it throws an error if you have a strict type set.

No, I think it'd be very similar, right. It's just how you want to use it in user land, you know, the __toArray() is you're going to you could cast it yourself ,or you can with weak types PHP could cast for you in the appropriate circumstances. If you want the same functionality. In some for now, you would need to call, you know, the __serialize() yourself with the toArray(). In the future, you could implement the toArray() and then your serialize could actually just cast this object to array, and then that should actually convert that for you. And then serialise will then return array so you're not duplicating how you want that object represented when it's an array.

Derick Rethans 6:00

So the RFC mentions that when you do a print_r of person is called __toArray(). But that's not particularly a cast. So why would it do it here, but not for method arguments, for example?

Steven Wade 6:11

That is a product of this being my first time and that was a mistake that was thankfully pointed out during the discussion period and has been corrected.

Derick Rethans 6:19

I read this RFC a week or two ago or so. And I haven't.. I should have reread it this morning that. I did not so my apologies for not being fully up to date here. There's some array functions in PHP like sort() that operate on an array as a reference right? That can't particularly work if you first have to cast to an array, which is what your current RFC now just. I mean, toArray() only gets called when you cast to an array or when it's a weakly typed argument. But how would it work for methods or functions that accept an array by reference?

Steven Wade 6:49

At least the way I proposed it, they would throw an error as it currently does. Again for my test and trying to keep this within parity with the toString. I don't believe there are many functions that will operate on toString on, on a string by reference, as there are with arrays. From what I can recall is that it would throw an error. If you try to operate by reference on an object that implements toString, it will throw an error.

Derick Rethans 7:10

And it wouldn't just fall back to using an object because that'd be very strange behaviour in that case, I suppose.

Steven Wade 7:15

Basically, if it's if it's not something that can be cast or converted to an array through this method, and it's just going to be the same functionality you have in current PHP, which will be throw an error.

Derick Rethans 7:24

Going to go for the principle of least astonishment or something.

Steven Wade 7:27

Yeah, I don't want to introduce too many changes to it. I just want to be able to cast.

Derick Rethans 7:31

I think that is a great idea. Actually, I mean, the same thing I've spoken with Nikita about, that introducing features step by step makes it a lot easier for people to comprehend what you actually end up doing. And there's also less, less chance of people getting bogged down in liking a specific aspect of the RFC but not of the other RFC parts. And we end up not merging the whole thing with the sub part of it.

Steven Wade 7:54

And that's why I was very purposeful and not including any kind of write. You write, you cannot write to a class that implements toArray(). You know, as you will with array ArrayObject, because that we have that for a reason. So this is different functionality, we just wanted to keep it small, and just have this little helper

Derick Rethans 8:11

I read in the RFC, something called get_mangled_object_vars(), but I didn't quite understand what it was.

Steven Wade 8:16

So that was actually a function introduced in 7.4, as a direct result of my original proposal trying to see what people thought in the internals and in the community of this feature. Sometime in spring, last year 2019, I began this discussion, and there was some initial feedback with folks saying that it would cause some breaking changes in their libraries or their code, because they are overloading the casting. Right now, if you cast an object, I guess you get insight into the object's internals without any side effects. And so I think that's how Symfony's var dumper works. And that's how they're able to display some of that information. So that was concern by introducing this, that functionality would break. And so to introduce a method that would give you the same benefits without overloading the casting, the get_mangled_object_vars() was introduced and accepted and implement in 7.4.

Derick Rethans 9:04

And that returns the object properties with their special characters in place. Because PHP internally, if you have a private method, the name for both methods and property is done by doing a null character, the name of the class, a null character then the property name. So that's what that would return, I suppose.

Steven Wade 9:22

I believe so.

Derick Rethans 9:22

I ran into a similar issue in Xdebug, because in some cases, you want to call get_debug_info, which is what people implement for getting debug info for their objects. But in other cases, you don't want to do it because you want to see everything that happens internally, or you want to see all the properties that exist. So there's kind of a tricky one. And I think at some point with toArray also happening, I might actually end up adding the output of both toArray() and get_debug_info separate sort of fake properties into the Xdebug output. But of course that only works if toArray() has no side effects. I don't think there's any way of preventing that in the toArray method that you can now implement that it doesn't change any information in normal properties, for example, right?

Steven Wade 10:12

And that's kind of some of the internals of it that I'm not fully familiar with. With it, I'm hoping to kind of, you know, the discussion period will help eliminate some of that.

Derick Rethans 10:20

I don't think you'd be able to actually.

Steven Wade 10:22

Just recently, we were able to throw an exception from the toString. I don't know if you can actually do any kind of operations, write operations on the object within the toString? I do? That's a good question. And I do look that up. And whatever that behaviour is, we'd want to mimic here as well.

Derick Rethans 10:34

I believe you can. It's normal PHP code, right? And if you don't want to do it, you need to clone it first, which is something you could choose as an implementation, right? You could first clone the class and then call the toArray method on the cloned object. I don't think we have any protection for that. The RFC is currently in the discussion phase. At the time of recording, we're talking about the discussion period. When I sort of thinking of ending that and going for vote?

Steven Wade 10:58

I think this is actually going to be probably a longer period of discussion. And I think most RFC is most fleshed out just because of the nature of it. I am a full time employee full time, father, husband, and also student, as well. And so I don't have a lot of time to do this. And I want to do it right. I want to be able to respond to this. And so the discussion opened up a week ago, and this morning is the first time I've had to be able to respond to that and update the RFC. And so I because I really care about this and would love this feature to go in. I want to continue to solicit discussion and advice and questions and to be able to answer them all and do that. So however long it takes. Ideally, I would love it to be closed, voted on, accepted and implemented in time to be able to get in for the feature freeze for 8.0.

Derick Rethans 11:40

For that you have about four months. Would you have anything else to add that I forgot? Or you want to add that you think it's interesting to know about this RFC?

Steven Wade 11:50

Yeah, the only thing I would add is I've seen discussion, someone posted the RFC on Reddit and I've seen discussions with people like it, people hate it. They want to move one way or the other again, it's just It's a small feature, it's a helper. It's a tool that you can use. Is it perfect? No. Is it going to satisfy everybody? No. You've got the people who are want more functional and procedural you got people who want more OOP. I think it's just another helpful tool that could be in your tool belt. If you use it great. If you don't, you don't have to touch it.

Derick Rethans 12:19

Very well. Thank you, Steven, for taking the time to talk to me this afternoon. I'm looking forwards on this coming to vote at some point.

Steven Wade 12:27

Thank you for having me on the show. And let me explain the purpose and the reasoning behind this RFC. And thank you very much for giving a voice to those looking to improve the language.

Derick Rethans 12:35

You're most welcome. Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 40: Syntax Tweaks

PHP Internals News: Episode 40: Syntax Tweaks

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about a bunch of smaller RFCs.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 40. Again, I'm talking with Nikita. Perhaps we should rename this podcast to the Derick and Nikita Show at some point in the future. This time we're going to talk about a bunch of smaller RFC that he produced related to tweaking PHP syntax for PHP 8. Nikita, would you please introduce yourself?

Nikita Popov 0:42

Hi, I'm Nikita and I do PHP core developement on behalf of JetBrains. We have a couple of new and not very exciting RFCs to discuss.

Derick Rethans 0:53

Sometimes non not exciting is also good to talk about. Anyway, the first one that caught my eye was a RFC called static return type. So we have had return types for well, but what is special about static?

Nikita Popov 1:07

So PHP has three magic special class names that's self, referring to the current class, parent referring to the well parent class, and static, which is the late static binding class name. And that's very similar to self. If no inheritance is involved, then static is the same as self introducing refers to the current class. However, if the method is inherited, and you call this method on the child class, then self is still going to refer to the original class, so the parent. While static is going to refer to the class on which the method was actually called.

Derick Rethans 1:51

Even though the method wasn't overloaded

Nikita Popov 1:54

Exactly. In the way one can think of static as: You can more or less replace static with self. But then you would have to actually copy this method inside every class where.

Derick Rethans 2:09

You have not explained the difference between self and static. Why would you want to use static as a return type instead of self?

Nikita Popov 2:17

There are a couple of use cases. I think the three ones mentioned in the RFC are. The first one is named constructors. So usually in PHP, we just use the construct method. Well, if we had to give this method, a type, a return type, then the return type will be static. Because of course, the constructor always returns while the class you're actually constructing, not some kinda parent class. And named constructors are just a pattern where you use a static method instead of a constructor, for example, because you have multiple different ways of constructing an object and you want to distinguish them by name.

Derick Rethans 2:57

Could we also call those factory methods?

Nikita Popov 3:00

Yeah, that's also related pattern. So for named constructors, you usually also want to return the object that it is actually called on.

Derick Rethans 3:09

It makes sense attached there because of that then creates a contract that you know that is named constructor is going to return that same class and not something else. Because there's no requirements that would otherwise require that same class, like you'd have to construct.

Nikita Popov 3:22

Exactly, yeah. The other pattern. These I think maybe that popularised by PSR, maybe 7 or something, the HTTP request object interface, the object is actually immutable. And the way you change it is by calling it with something method. And this method is going to return you a new object with this particular bit of information replaced. And again, usually for these kinds of API's, you also want to, you want to return the class that the methods actually call them. So if you extend this kind of API, you don't want to get objects of the parent class back. Right the third way, and I think the like by a pretty large margin, the most common one, is just normal fluid mthods. So where each method returns this. This is always an instance of static. So if you extend the class then this is going to be the extending class of the parent class. For this particular case, in PHP there is also a different convention where instead of returning static, you actually need right, return $this. So you use $this as a one word type, a special types indicates this type of method. So static would be a slightly weaker form of that. But we might still add the special $this case in the future.

Derick Rethans 4:51

Because static would only enforce it's the same class but not to the same object.

Nikita Popov 4:56

Exactly.

Derick Rethans 4:56

Are you intending to add that to this RFC?

Nikita Popov 4:59

I like to keep RFCs like different issues separated.

Derick Rethans 5:03

It makes it easy to talk about them and get them accepted or not.

Nikita Popov 5:06

I'm not totally convinced on the $this thing, because static is in the end. I mean, we already allow self return types, we allow parent. It make sense to allow static. But this is not really a type or some kind of extra contact contract on top of the type system. And I'm not sure it makes sense to open this position.

Derick Rethans 5:28

Okay, in which position would you be able to use the static keyword? You've already mentioned the return types, there are other places as well?

Nikita Popov 5:36

No, you can only use it in return types, it would simply not be sound. So it would violate the liskov substitution principle in any other place. The reason why you can use static in return types, is that static is basically a restriction on each inheriting class. So in your original class, static is the same as self. Then in the inheriting class, static is again the same as self, but in the inheriting class. And the inheriting class is a subclass or a subtype of the parent class. So this is allowed by the liskov substitution principle or by our variance rules. If you do the same things for parameters, you would also go from having a parameter for the parent class to parameter for the child class. So you would restrict the amount of inputs that are allowed in this parameter. And that's invalid. And the same argument also goes for properties.

Derick Rethans 6:37

The RFC also talks a little bit about variance and subtyping. How is static considered here differently from self, or if you just explained exactly that?

Nikita Popov 6:46

static is considered a sub type of self. If you have a parent method that uses a self return type, you can have a child method that uses a static return type, because static is ta further restriction. So self allows, still allows you to return the parent class, while static does not allow it. So you restrict the amount of return values and that's valid. While going the other direction. So replacing the static type and the parent method ,with the self type in the child method that would not be valid. Because, you make the amount of low values larger.

Derick Rethans 7:21

And that is exactly the same as the other variance rules that we have since PHP seven for of course. The the last thing the RFC mentions or actually don't quite remember whether it mentions is, is whether you can also use static as part of a union type.

Nikita Popov 7:35

So yes, you can.

Derick Rethans 7:37

Okay, that's the simple answer. I like simple answers.

Nikita Popov 7:40

Together with the other restrictions. So that union type has to be in the return type position. But apart from that, you can.

Derick Rethans 7:47

That's good to hear.

Nikita Popov 7:48

There is actually one more tricky thing regarding the property types. Without a lot of static and property types because as I mentioned, it would violate our variance rules. But unfortunately we have the extra issue that we also have static properties. So if you write public static foobar, then is that static for a static property or for a static type?

Derick Rethans 8:14

Right, because we don't enforce that a static goes or goes before or behind public, private, or protected.

Nikita Popov 8:21

Yeah.

Derick Rethans 8:21

At least not in the syntax. I mean, I think coding standards actually do most of the time require the static to be before.

Nikita Popov 8:27

Even the coding standards they would require you to write it as public static, not this static public.

Derick Rethans 8:35

Oh, really? Okay. I thought was the other way around. Yeah, that is difficult. Because then you don't know which static is meant here.

Nikita Popov 8:41

Yes, and we just allow on the, disallow it on the grammar level. It's actually a bit ugly, because we have to like duplicate the whole type grammar two times, once to include static, once to not include it, just to deal with this ugly of conflict.

Derick Rethans 8:56

That's what happens when you come with something clever. You need clever workarounds.

Nikita Popov 9:00

So it's unfortunate that the static keyword has like three or maybe four completely different meanings in PHP. Simply I think, simply because people wanted to re use a keyword, instead of introducing a new one

Derick Rethans 9:15

Because introducing new keywords might end up meaning breaking people's code.

Nikita Popov 9:19

On the downside, reusing keywords makes code confusing, because well, at least I got the impression that some people find the use of static for late static binding somewhat confusing. And I can also see if you see methods that has signature public static, whatever and return static, and you're not like super familiar with what all of that means.

Derick Rethans 9:46

And that is quite a common pattern because this named constructors are static methods that return static. Let's move on to the next one, which is a tiny RFC that you came up with, which is the Class Name Literal on objects. What does this do?

Nikita Popov 10:03

The syntax where you write a class name, then the double colon class. And that just returns you the fully qualified class name. For example, have a use statement for that class, you get back the full name instead of the short name. I think we've had this since PHP 5.5. And it's a great feature because it's like makes it clear where you're referencing the class and not just some random string. And that means, for example, that that IDE refactorings could work better and so on.

Derick Rethans 10:35

Okay.

Nikita Popov 10:36

The actual RFC is very simple. Currently, the class syntax is only allowed on like literal class names, but you can take an object variable and get the class of that object using the syntax.

Derick Rethans 11:48

However, PHP has a function for that already, which is called get_class() right?

Nikita Popov 10:52

Exactly. This is essentially just syntax sugar for get_class(). The reason why we want to have the syntax sugar is really not so much that writing get_class() is particularly hard, but just that people expect it to be there. This class syntax looks a lot like a constant access, like a class constant access. So it looks like every class has a magic constant called class. Usually you are able to access class constants on objects. So you can write object, the double colon, and the constant thing. And that works. So in that case, we just take the class of the object and access it on that class. For consistency reasons, it only makes sense that you can do the same with this particular magic concept as well. There's really all the motivation

Derick Rethans 11:43

Originally the class literal colon colon class is resolved at compile time. Of course, that can't happen on object colon colon class. Is that still true or no longer?

Nikita Popov 11:53

So it really was true in the first place. For normal class names of cours is resolved at compile time. Actually one of the like gotchas with the syntax is that some people expected to validate that the class actually exists. So they expect that this gets auto loaded and they get an error if it doesn't exist, doesn't happen. So you can reference some non existing class with this syntax just fine. The usually your IDE is going to show a warning for that. I mean, as we just discussed, we also have a couple of magic class names. So we have self, parent, and static. The static one in particular, also always has to be resolved at runtime, because we don't know what the what class the method is actually going to be called on. Actually, self and parent also sometimes have to be resolved at runtime. And there are two cases where that can happen. One is if you use traits, because in that case self refers to the using class, not to the trait. So in closures the self class, refers to the bound scope. The bind to method, there is like the last argument on, is the scope you're using. So in those cases, it's already dynamically resolved.

Derick Rethans 13:09

Okay. The RFC mentions one specific area where you can't use colon colon class. In which situation can you still not use colon colon class on objects?

Nikita Popov 13:20

You can always use it on an object. I think what you're referring to is that normally, for normal class constants, you can also put the class name inside the string. I mean, put the string class name inside the variable and then access the constant on that variable.

Derick Rethans 13:38

Oh, right. Yes.

Nikita Popov 13:39

For the double colon class syntax, we don't want to allow that. Because, well, first this is kinda useless, because it will just return you back the same string you gave it. And I think in that case, the fact that the class name is not validated, this is especially confusing.

Derick Rethans 14:00

Okay, that makes sense. So you can only call colon colon class on literal class names that you already could, as well as on variables that contain an object?

Nikita Popov 14:09

That's right? Yeah.

Derick Rethans 14:10

That sounds great. Does it show up differently in reflection?

Nikita Popov 14:13

This magic class constant actually doesn't show up in reflection at all. It looks like a constant both it's really a special syntax that just happens to share the look with constants.

Derick Rethans 14:24

Do you expect any controversies about this?

Nikita Popov 14:27

I don't think so.

Derick Rethans 14:28

I don't think so either. I can't really see anything that people could complain about too much. I think. I however, do think that for the next RFC that you came up with the variable syntax tweaks, there will be a little bit of haggling about whether this is good idea to do. In PHP seven, zero, we got this uniform variable syntax. Could you give a brief reminder of what it was about?

Nikita Popov 14:48

That was about, well fixing a couple of syntax inconsistencies when it comes to variables syntax. So variable syntax in PHP is extremely, extremely magic. Like our expression, syntax nice and regular. But the variable syntax is a huge assortment of special rules and that RFC made those rules little bit less special at more regular.

Derick Rethans 15:17

From what I understood we missed a few inconsistencies that we probably also should have addressed in that RFC. And that is what you know, trying to tweak again?

Nikita Popov 15:24

All of these remaining consistencies are like really, really minor things and edge cases. But weirdly, all or at least most of them are something that someone at some point ran into, and either open the bug or wrote me an email or pinged me on Twitter. So people somehow managed to still run into these things.

Derick Rethans 15:52

The RFC mentions four specific things that we've missed. What is the first one?

Nikita Popov 15:57

Yeah, so it's probably going to be somewhat hard to talk about some of these examples.

Derick Rethans 16:02

I know because I think some of them make no sense whatsoever.

Nikita Popov 16:05

Yeah.

Derick Rethans 16:06

Because how do you call a method on the string?

Nikita Popov 16:07

Context for this one is I have a nice little extension called scalar objects, which allows you to more or less define methods on strings, on integers, on arrays and so on. In with the uniform variable syntax, we have allowed calling methods on string literals. That actually makes no real sense with baseline PHP. But if you're using scalar objects, then this is a useful feature because you can do something like take a string that rule and call length on it, while otherwise we'll have to wrap it in brackets.

Derick Rethans 16:45

So it's just a syntax change pretty much.

Nikita Popov 16:48

Well, what this one particular is about that right now, this works if it's a literal string, but if you have any variables inside it than suddenly stops to work, which is just a very.

Derick Rethans 16:59

So it is the interpolated strings inside double quotes, the dollar variable name syntax. That's the problem that?

Nikita Popov 17:05

Yeah.

Derick Rethans 17:06

The second one is called constant derefenceability, which is a word I can't pronounce. And my text edit says it's not a word. So what do you mean by it?

Nikita Popov 17:14

That's a good question. I think the term is more or less picked up from C, where we have pointers. And we can dereference pointers to access what the pointer points to. So that's the star operator in C. In PHP, we use the term dereference to also access some kind of structure in some way. For example, to access an array element, so array dereference, or to access an object properties, object reference and so on. That particular one is, I think two things. One is that you can, for example, access the first character of a constant. So read the constant name then brackets zero. Well, maybe even not the first time I can think of a better example. Um, you haven't the constant that contains an array, and you want to access a specific key on that area. That's something you can do already you right now. The same syntax does not work if the constant is in magic constant. What also doesn't work is if you use the our alternative array access syntax. So we have the square brackets, that's what people should use. And we have the curly braces, which is the alternative way to access arrays and which is actually deprecated as of 7.4. I'm not totally sure that that's going to be removed in PHP 8 or not. If it's going to be removed, then this part is a moot point. But yeah, this is again, I think, from a practical perspective, not really interesting. The only situation where I think this is useful is again, of course scalar objects, because it means you can call the methods on the constant

Derick Rethans 18:57

Okay, which in syntax is the grammar currently disallowed doing that.

Nikita Popov 19:00

Currently it's disallowed and that would allow it.

Derick Rethans 19:03

A third one is related. I think it's a class constant dereferenceability.

Nikita Popov 19:07

So someone complained about this one on Twitter. I don't know how they ended up trying to do this. Something you can do right now is, you can access a static property. And then you can interpret the content of that static property as a class name, and access another static property on that. So you can change chain these static property accesses. For some reason, the same does not work with class constant access. So static property accesses can be chained. But class constant accesses can't be. Again, for no particular reason, this change would allow that to happen as well.

Derick Rethans 19:41

This is even a change that makes sense without having to use scalar objects.

Nikita Popov 19:45

That one is. I wouldn't write that kind of code, but it logically makes sense.

Derick Rethans 19:50

And then the last one is, in the RFC is called arbitrary expression support for new and instanceof.

Nikita Popov 19:56

Yeah, so this is probably the only one that's actually useful for something. PHP has well, a bunch of places where usually you have to place either an identifier or a namespace name, but class name, method name, or a property name, and so on, or even the variable name. For all of these places, we usually support some kind of special syntax to instead use a general expression. For example, some of the variable with a static name, you can use curly braces to use a dynamic name instead.

Derick Rethans 20:26

I think for new, we did it at some point already.

Nikita Popov 20:29

For new, this doesn't exist yet. So you can use a variable as class name, but you can't actually compute the class name as part of the expression.

Derick Rethans 20:40

I think what I was referring to, is you can use braces around the whole new class extension, so you can call methods. So that's that's what I meant, but this is specifically using an expression behind new.

Nikita Popov 20:51

Yeah, so these are like two things. One is whether you use an expression for the new class name, and the others for the use the new itself as an expression. And yeah, the same, so yeah, right now, we don't support that for new. And we also don't support it for instanceof, so the the right hand side, which consists of that as the class new. The RFC just proposes to allow an expression and parenthesis in there. And this kind of stuff is, again, not well not particularly useful. But it is useful for things like code generation, where you may have to insert arbitrary expressions sets up your coins. And there are actually some nice hacks that you can use right now. So you can use a variable with a complex expression inside it, where you assign to the variable itself and then return its name.

Derick Rethans 21:42

I don't think I understand this. You're saying you can construct a string with a complex expression in it.

Nikita Popov 21:48

Not a string. You you write something like new variable, but with a curly brace syntax, and in there you return you start off with the string containing some kind of dummy variable name, and then you concatenate that with an empty string. But that empty string is computed by doing the assignments to the variable name that you're actually going to return.

Derick Rethans 22:12

I still don't understand this. You know, what I'm going to do is I'm just going to link to a example for this in the show notes.

Nikita Popov 22:19

It's not really important. You can just cut off this part.

Derick Rethans 22:22

Yep sure, I can do that too-perfectly fine.

Nikita Popov 22:24

Nice hack.

Derick Rethans 22:24

But let's not teach too many hacks to people such I think. Thank you for taking the time with me today, Nikita to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we've started voting on them and we'll see what happens to them.

Nikita Popov 22:37

Thanks for having me once again.

Derick Rethans 22:41

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 39: Stringable Interface

PHP Internals News: Episode 39: Stringable Interface

In this episode of "PHP Internals News" I chat with Nicolas Grekas (Twitter, GitHub, LinkedIn, Symfony Connect) about the new "Stringable Interface" that Nicolas is proposing, as well as about voting rights (on RFCs).

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Hello, this is Episode 39. Today I'm talking with Nicholas Grekas about an RFC that he's produced called stringable interface. I already spoke with Nicholas last year about the work that Symfony does the new PHP versions come out to look at deprecations and to make sure that versions of Symfony work with new versions of PHP. But this time Nicholas came up with his own RFC called the stringable interface. Nicholas, could you explain what streamable is?

Nicolas Grekas 0:54

Hello, and Stringable is an interface that people could use to declare that they implement some the magic toString() method.

Derick Rethans 1:02

Because currently there's not necessary to implement an interface, and PHP's internals will always use toString if it is available in a class, right?

Nicolas Grekas 1:10

Yeah, absolutely.

Derick Rethans 1:11

What is true reason why you would want to have a stringable interface.

Nicolas Grekas 1:16

So the reason is to be able to benefit from union type in PHP 8. Right now, if you want to accept a string as an argument, it's pretty easy. You just add the string type, right? Let's say now you want to accept a string or a stringable object, stringable an object being something that implements this method. If you want to do that, you can not express the type using types today.

Derick Rethans 1:42

Because if you choose string, and then the name of an object that would only do that specific object.

Nicolas Grekas 1:47

Yes, there are some cases in Symfony especially because this is where work and I do open source. Where we do want to not call toString method until the very latest moment. after example is in the code: one is from Drupal. Drupal computes some constraint validation messages, lazyly, and it's pretty important to them because computing the message itself is pretty costly. They don't need to compute it all the time. Actually, we added the type, the string type in Symfony five, before it was released and Drupal came and say: Oh, this is breaking our code and our features, what should we do now? And we removed the type and we replaced it by some annotation saying: Okay, this is a string or a stringable object. So in the future, when will add up PHP 6 would like to be able to express that using a type of real one,

Derick Rethans 2:41

PHP 6?

Nicolas Grekas 2:42

No, PHP 8, that's true. Strings and PHP 6.

Derick Rethans 2:49

Yay.

Nicolas Grekas 2:51

Another example is also is pretty similar, actually. It's in the symfony auto wiring system. We have services that we wire and sometime we can not; the auto wiring logic is broken doesn't work because some class cannot be at a wet. So in this case, we have a lazy message, because sometime of service while it's not auto wireable, it's going to be removed later on because we removed, Symfony removes, unused services. So instead of computing ahead of time and error message that is heavy to compute, and that we might just trash because the service is going to be removed. We have this lazy thing because yeah, it's heavy to cook with that. So real world use cases.

Derick Rethans 3:32

I think the intention by by having a stringable interface actually makes sense. What are the concerns for for adding this to your own code, are issues with backwards compatibility, for example?

Nicolas Grekas 3:43

That's another goal of the RFC. The way I have designed it, is that I think the actual current code should be able to express the type right now, using annotations of course. So what I mean is that the interface, the proposal, the stringable is very easily polyfilled. So we just create this interface into global namespace, the declarative method, and done. So we can do that now. We can improve the typings now, and then in the future, we'll be able to turn that into an actual union type.

Derick Rethans 4:16

You'd be able to do that almost immediately. Well, you would be able to do that in PHP 8.

Nicolas Grekas 4:21

Yeah.

Derick Rethans 4:21

Without it being a problem. And of course, in that case, you can remove to polyfilled stringable interface.

Nicolas Grekas 4:27

Yeah, absolutely.

Derick Rethans 4:28

This is going to impact extensions, as well, because extensions, I mean, PHP, internal functions, they often accept strings. I don't actually remember but if you use a scaler type hint string for an internal method than PHP or internal function, this is actually called a toString interface on objects. Like if you would call strlen() on an object that implements toString would actually call toString and return the length to that result.

Nicolas Grekas 4:53

Yes, absolutely.

Derick Rethans 4:54

So that wouldn't impact that specific case then.

Nicolas Grekas 4:57

About extension because that's the current state of the implementation of extension, there was a discussion we're going to talk a bit later about, I think. The current state of the art say is that the interface declares the method that just run right, it declares the written type. It's colon string. So the declaration is public function "toString : string". The very first version didn't have the written type, because it's easier for backward compatibility. Because the current code doesn't need the written type. So by not adding it to the interface, we don't break backward compatibility, which is another critical lighting designer feature that I want at least to have. And so feedback came on the first pull request and said okay, we need the written type. So, the way I implemented that is that now in the RFC actually, the written type is implicit. toString, if you declare it, whether you type ": string" or not, it's there. If you do some reflection later on an instance of something that that then the reflection will tell: Yes, there is a written type and it's string,

Derick Rethans 6:01

Whether you have defined it or not in your class. So that's a little bit of magic that gets added on.

Nicolas Grekas 6:07

So it doesn't break any semantics because the written type is already in force: you cannot return anything else than the string right now.

Derick Rethans 6:14

Yeah, that's true. So that means that automatically toString methods will in return type hints require string to be returned.

Nicolas Grekas 6:21

Yes.

Derick Rethans 6:22

And that tweak was necessary to make sure that an older backward compatibility was being broken.

Nicolas Grekas 6:27

Yes

Derick Rethans 6:28

Does that also extends to extension that no part that are not part of the PHP core distribution, do they need to be changed as well?

Nicolas Grekas 6:35

So right now, in the current implementation, yes, they need to be changed. If they declare the toString method, they need to change the type basically, to declare that they return the string explicitly in the C code. So that the current state it's pretty easy on the implementation, implementation side to ask that to the extension authors, right? I think it is doable, but Nikita today posted proposal to improve and go to the next level of the RFC. And the next level would be to have the same magic for the declaration of the interface itself. So it would mean if you declare a toString method, then you implement the stringable interface without having to explicitly declare it in the class.

Derick Rethans 7:22

I think that actually makes quite a bit of sense because that is pretty much how toString is used already. Anyway, the PHP engine enforces it has to be a string that's being returned.

Nicolas Grekas 7:31

Yeah, that's very interesting in that would make the type as a typehint much more useful because any pre existing code would just work with the type and pass the type into the written type and so on. So that would be great. So the link with the extension is that maybe we should have the same automatic declaration implicit declaration applied to extensions. So then extension to boodle have to do to do anything and done. That would declare both the written type and the interface.

Derick Rethans 8:03

That makes sense. You mentioned that Nikita just suggested something to tweak this RFC. I reckon this RFC is still open for discussion and voting hasn't started on it yet.

Nicolas Grekas 8:13

Yes.

Derick Rethans 8:13

Do you have any sort of idea for a timeframe where you think this will be finished?

Nicolas Grekas 8:17

The earliest is on February 6, because we know we need to wait two weeks. So I opened that so we go. I don't know how to write the last part of what we discussed. So Nikita's suggestion. So I'm asking him to some help. As soon as it's ready. I think it can be open for voting. So it can be 10 days. So it didn't trigger much discussions on internals, which I don't know. Maybe it's a very, it's a good point. Or maybe it's like people will vote against without expressing why, I don't know. I hope it's a good thing.

Derick Rethans 8:50

Sometimes people just start paying attention and there's a new vote.

Nicolas Grekas 8:53

Yeah.

Derick Rethans 8:54

So there wasn't a lot of controversy about stringable as you just said, but there was some controversy about you actually apply for voting rights, I remember what happened there?

Nicolas Grekas 9:03

So yes, I applied for voting. Because of my implication, I think I'm an active PHP contributor to internals in not on not on the C-side, but Okay, so since I wanted to open this RFC, I said: Okay, now it's time to do the bureaucratic steps to get a vote, right?

Derick Rethans 9:23

Yep.

Nicolas Grekas 9:24

And I think I'm the first person to actually get through some process for getting votes in itself. I mean, I think most people or maybe all people that have a vote, a vote as a side effect of of something else.

Derick Rethans 9:38

Yeah, usually about contributing patches, either PHP itself, documentation or extensions.

Nicolas Grekas 9:43

So I think there's there has been some confusion, but it's been sorted out pretty quickly. I think I'm going to be able to vote on the next RFC. I'll report back if I can.

Derick Rethans 9:54

Okay, fair enough. Currently, we don't really have a process for this at all. I mean, you get to vote when you have a GIT account. Pretty much, or a PHP commit access in some form. And I don't think we've ever really thought about handing that out to people that have been contributing a lot. Right. So that's kind of an interesting thing to see. What we have seen in the past, is people wanting just saying: Yeah, I'd like to vote, or in other cases, or yeah can I have a php.net email address, right. So that also happened because that is a side effect of getting commit access.

Nicolas Grekas 10:23

Okay.

Derick Rethans 10:24

At the moment I what happened when you did it, it got immediately shut down. Probably a bit quicker than was nice without any discussion. But I think in the future, we do need to come with, come up with a plan and perhaps even think about how to approach voting for features for RFCs the first place because we don't really have a set guideline on who gets to do this and who doesn't get to do it and stuff like that.

Nicolas Grekas 10:49

Yeah, it's pretty interesting. Nikita just after the or during the discussion at, he posted some stats on the number of people who can vote and I think the number is like 1900

Derick Rethans 10:59

Yeah. There's quite a lot here.

Nicolas Grekas 11:01

It's bit strange. And most people don't vote, I think, because they think they shouldn't. I don't know, something like that. But it's true. It's pretty strange. What I like about this situation is that it doesn't draw a strong line between people that contribute C code and people that write PHP code. And it's nice for PHP. I really think it's nice for PHP to have people that vote that don't do C code. But I think, of course, people that do C code must have the strongest voice, because at some point, the implementation decides.

Derick Rethans 11:35

Well, that is a different right, the votes are usually on the idea, not on the implementation. But sometimes the implementation is so complicated that it's nearly impossible to implement, like, I've very briefly spoken with Nikita about generics. I'm sure we'll talk about that at some point, where I'm pretty sure that generics is an idea that simple, I mean, people will vote for it, but as an implementation it might not be that simple to do.

Nicolas Grekas 12:01

Yeah.

Derick Rethans 12:02

So what happens if you vote for the feature, but you can't come up with a good implementation?

Nicolas Grekas 12:06

So I'm inside of thinking that people should vote on the implementations. I mean, people shouldn't be able to vote only on an idea. If there is an idea, it will be supported by an implementation that proves that we are talking about something real, no, just a fancy idea that might not work in black. So that's my opinion.

Derick Rethans 12:24

That's a good point. But as you said, from the 1900 people, or or 1900 people plus, that's controlled, most of them are not familiar with a PHP internals whatsoever, because they tend to be contributions to the documentation. This is also very valuable, but it doesn't mean you know, and you don't necessarily know PHP internals,

Nicolas Grekas 12:40

Yeah, sure.

Derick Rethans 12:41

The oher way can be true as well right? You might know a lot about PHPs internals, but never really use PHP in real life, in your job, or anything like that.

Nicolas Grekas 12:48

So it's also good to be able to team up with someone that knows how to code the C part, the internal part. So you have the idea you're you're the supporter part of the team and then someone - being able convince someone to do the implementation or to help you do it, is also proof of kind of interest. So starting small and bringing more people in the boat and making it happen as a thought.

Derick Rethans 13:12

Yeah, and we saw some of that happening last year. I can't quite remember what feature it was or or exactly what it was. But I agree with you. I think that is important to do that you can at least somebody convinced to implement the feature before just voting on the idea. Thank you for taking the time with me this morning, Nicholas.

Nicolas Grekas 13:30

And thank you Derick for having me again.

Derick Rethans 13:32

It it continues like this I'm sure we'll speak again at some point in the future.

Nicolas Grekas 13:35

Okay.

Derick Rethans 13:39

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 38: Preloading and WeakMaps

PHP Internals News: Episode 38: Preloading and WeakMaps

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about PHP 7.4 preloading mishaps, and his WeakMaps RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weeklish podcast dedicated to demystifying the development of the PHP language. This is Episode 38. I'm talking with Nikita Popov about a few things that have happened over the holidays. Nikita, How were your holidays?

Nikita Popov 0:34

My holidays days were great.

Derick Rethans 0:36

I thought I'd start with something else then I did last year. In any case, and wanting to talk to you this morning about something that happens to PHP seven four over the holidays. And that is issues with preloading on Windows with PHP seven four. I have no idea what the problem is here. Would you try to explain this to me?

Nikita Popov 0:56

So there were actually quite a few issues with preloading in early PHP 7.4 releases. The feature definitely did not get enough testing. Most of the issues have been fixed in 7.4.2. But if you're using preload-user, what you have to use if you're running on the root, then you will probably still see crashes and that's going to be fixed in the next release.

Derick Rethans 1:20

In 7.4.3.

Nikita Popov 1:22

Right. But to get back to Windows, Windows has a well very different process architecture than Linux. In particular, on Linux, or BSD we have fork. Which basically just takes a process and copies its entire memory state to create a new process. This is a lot cheaper than it sounds because it's all like reuses memory until it's actually changed.

Derick Rethans 1:48

Its copy on write.

Nikita Popov 1:49

Copy on write exactly. The same functionality does not exist on Windows, or at least it's not publicly exposed. So on Windows, you can only create new processes from scratch, that look, we use our memory from the previous one. And for OPcache, this is a problem because OPcache would really like to reference internal classes as defined by PHP. But because we store things in shared memory, which is shared between multiple processes, we now have the problem that these internal classes can reside at different addresses, in these different processes. On Linux, it's always going to be the same address because we are forking and that keeps the address. On Windows each process could have a different address. And especially because Windows since I think Windows Vista, uses address space layout randomization. This is actually pretty much always going to be a different address.

Derick Rethans 2:51

Because that's a security feature?

Nikita Popov 2:52

Exactly. It's a security feature.

Derick Rethans 2:54

Would it also be a problem on Linux if you'd start a process instead of forking it?

Nikita Popov 2:59

Yes, it would be a problem. The difference's just that on on Unix, we don't do that. OPcache has quite a different architecture on Windows. On Linux, we do not allow to attach to an existing OPcache from a separate process. So the only way to share an OPcache is to use fork. On Windows because of this restriction that we don't have fork, we do though this kind of attachments and that's where we have we have to deal with these kind of issues. So that's actually a general problem, not just for preloading on differences, just that normally, we can just: Hey, okay, we do not allow any references to internal classes from shared memory on Windows. It's like a slight hit to optimization, but it's not super important. While we're preloading, we have to link the entire class graph during preloading. And if you have any classes that for example, extend from an internal class, like extend from Exception. Or in some cases, you can just use an internal class as a type hint, then we would not be able to store these kinds of references in shared memory on Windows. And because for preloading, it's pretty much inevitable that you run into the situation you just can't realistically do preloading on Windows,

Derick Rethans 4:18

Hence, the decision being made just turning it off, instead of trying to end and always failing pretty much.

Nikita Popov 4:24

Yeah, I mean, it kind of did work before, it just got a bunch of warnings that these classes haven't been preloaded. And if people try that, oh, it's like with a simple example there, we'll see you great, preloading is working. But once they move to their actual complex application that uses internal classes at various points, it turns out that: Actually, no, it doesn't really work in practice. And so the way that we just disabled entirely

Derick Rethans 4:51

That seems like a reasonable solution to this, do you think at some point this can be fixable in another clever way?

Nikita Popov 4:58

Well, main way in which can be fixed is to avoid this kind of multi process attachments on Windows. The alternative to having multiple processes is to have multiple threads, which do share an address space. Basically same as fork just with threads then. But that, of course, depends on what kind of web server you're using and what kind of SAPI you're using. And I think nowadays, on Windows on threaded web servers are somewhat more popular than on Linux, it's still not the majority deployment strategy.

Derick Rethans 5:34

I think it used to be that threaded process models on Windows were a lot more common when PHP just came out for Windows, because it was an ISAPI module which was always threaded. From what I remember the original reason why we had ZTS, in the first place. Yeah, at some point that started moving to PHP FPM kind of models because it didn't use threading and it was, tended to be a lot safer to use it that way.

Nikita Popov 5:57

Right. I mean, threading has issues in particular because things like locales are per process, not per thread. So processes are usually safer to use

Derick Rethans 6:08

Anything else interesting that happened that went wrong with a preloading, or do you not want to mention?

Nikita Popov 6:12

The rest is mostly just that we have two different ways of doing preloading. One is using OPcache compile file, and others using require or include, and the difference between them is that OPcache compile file combines the file but does not executed. In that case, the way we perform preloading is that we first collect all classes and then we, like gradually, link them, actually register them, always making sure that all the dependencies have already be linked. And this is the mode that that I think mostly work well at the release of PHP seven point four. And the other one, they require approach is where we, well require directly executes the code and registers the classes. And in that case, basically, if it turns out that some kind of dependency cannot be preloaded for some reason, we simply have to abort preloading, because we cannot recover from that. This abortion was missing. And it that turns out that, in the end, the way people actually use preloading is using the require approach, not using the OPcache compile file approach.

Derick Rethans 7:26

Although that's the one you see most of the examples that I've seen, and in the documentation.

Nikita Popov 7:30

Right, it has some advantages you some require.

Derick Rethans 7:34

Something else that happened over the holidays is that you've worked on several RFCs there're too many to talk about at all in this episode. But one of the earlier ones, was a WeakMap, or WeakMaps RFC, which sort of builds on top of the weak references that we already got in PHP seven four. What's wrong with the weak references, and why do we now need weak maps?

Nikita Popov 7:58

There's nothing wrong with weak references. As a reminder what weak references are both, they allow you to reference an object without preventing it from being garbage collected. So if the object is unset, then you're just left with a dangling reference. And if you try to access it, you get back knowledge of the object. Now, the probably most common use case for any kind of weak data structure is a map or an associative array, where you have objects and want to associate some kind of data with them. Typical use cases are caches or other memoise data structures. And the reason why it's important for this to be weak is that you do not well, if you want to cache some data with the object, and then nobody else is using that object. You don't really want to keep around that cache data because no one has ever going to use it again. And it's just going to take up memory usage. And this is what the weak map does. So you use objects as keys, use some kind of data as the value. And if the object is no longer used outside this map, then is also removed from the map as well.

Derick Rethans 9:16

So you mentioned objects as keys. Is that something new? Because I don't think currently PHP supports that.

Nikita Popov 9:22

I mean, you can't use objects as keys in normal arrays. That doesn't work. For example, the array access interface and the traversable interface, they don't really care what your types are. So you can use anything.

Derick Rethans 9:37

I glanced over that that point, yes. But weak map is something that then implements array access.

Nikita Popov 9:44

That's right

Derick Rethans 9:45

How does the interface of a weak map look like? How would you interact with it?

Nikita Popov 9:49

Yeah, actually, it just implements all the magic interfaces in PHP. So ArrayAccess, you can access the roadmap by key, where the key's object. Traversable, that is you can iterate over the weak map and get both the keys and values, and of course Countable, so you can count how many elements there are in there. And that's it.

Derick Rethans 10:12

All the methods, there's plenty of em then, there should be nine or 10 or so right?

Nikita Popov 10:17

Five.

Derick Rethans 10:18

No there's the six of iterator.

Nikita Popov 10:20

Right, yeah, there is this little detail where when you implement Traversable, internal classes, you don't actually have to implement iterator methods. That's why there is a few, a few less.

Derick Rethans 10:33

Who's going to benefit from this new feature?

Nikita Popov 10:35

Like one of the users for weak maps are things like ORMs. Where, well, database records are represented as object, and there is data storage related to these objects. And I think it's a, well, well known issue that if you're using ORMs you can sometimes run into Memory Usage issues. And the absence of weak structures is one of the reasons why that can happen. So that they just keep holding onto information even though the application actually doesn't use it anymore.

Derick Rethans 11:12

Did a specific ORM request this feature?

Nikita Popov 11:15

I don't think so.

Derick Rethans 11:16

Because weak maps are something done as an internal class in PHP, how are these things implemented? Is there something interesting because I remember talking to Joe about weak references last year, there is some functionality where it would automatically do something on the destructor or rather of the objects. Is this something that also happens with weak maps.

Nikita Popov 11:37

So yeah, the mechanism how weak references and maps work is basically the same. So there is a flag on each object, that can be set to indicate that it has a weak reference or weak map. If the object is destroyed, and has this nice flag, then we execute a callbeck that is going to remove the object from the Weak Reference or from the weak map, or from multiple maps.

Derick Rethans 12:05

Is it because there are some kind of registry that links an object?

Nikita Popov 12:08

So when we store all the weak references, weak maps, and the object as part of, so we can efficiently remove it.

Derick Rethans 12:16

When I was reading the RFC, I saw something like SPL object ID mentioned, which is a way how to basically identify a specific object. Is this something related to weak references or weak maps? Or is this something else no longer used, or people should no longer use pretty much, because I guess this was a way previously how to identify an object and then associated extra data with it. Like you mentioned that ORMs were due for cache.

Nikita Popov 12:44

Right. So it's kind of related, but I'm also not. So one is not a replacement for the other, just different use cases. We used to have SPL object hash for a very long time. And I think, somebody went around PHP 7.0, or maybe later SPL object ID was introduced, which this the same just because an integer and because because of that is more efficient. But in the end, what these functions do is return a unique identifier for an object. But this identifier is only unique as long as the object is alive. So these object IDs are reused when objects are destroyed.

Derick Rethans 13:30

And that makes them not usable for associating cache data with a specific object?

Nikita Popov 13:35

That makes them usable for associating cache data. But you also have to store the object to make sure it does not get destroyed in the meantime. So that's how you get around the restriction that you cannot use objects as array keys. That's what you need the ID for. But you still have to store the like a strong reference to the object to make sure it's not garbage collected. And this ID starts referencing some kind of other objects.

Derick Rethans 14:04

When you say Strong Reference, that is what PHP references are traditionally?

Nikita Popov 14:08

That's the normal reference.

Derick Rethans 14:10

Well, because it's been quite some time since it's got introduced from what I understood this has been accepted?

Nikita Popov 14:16

It is accepted: 25, zero

Derick Rethans 14:18

25, zero. That doesn't happen very often.

Nikita Popov 14:22

Most RFCs are maybe not anonymous, but usually either they are 95% accepted, or they rejected really hard. There is not a lot of middle ground.

Derick Rethans 14:34

That's pretty good, though. In any case, we will see this in PHP 8, I suppose, coming out later in the year.

Nikita Popov 14:39

That's right. Yes.

Derick Rethans 14:41

Well, thank you for taking the time today to talk to me about weak references and preloading especially on Windows. Thank you for taking the time.

Nikita Popov 14:50

Thanks for having me Derick

Derick Rethans 14:52

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.