PHP Internals News: Episode 41: __toArray()

In this episode of "PHP Internals News" I chat with Steven Wade (Twitter, GitHub, Website) about the __toArray() RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Hi, this is Episode 41. Today I'm talking with Steven Wade about an RFC that he's produced, called __toArray(). Hi, Steven, would you please introduce yourself?

Steven Wade 0:35

Hi, my name is Steven Wade. I'm a software engineer for a company called Follow Up Boss. I've been using PHP since 2007. And I love the language. So I wanted to be able to give back to it with this RFC.

Derick Rethans 0:48

What brought you to the point of introducing this RFC?

Steven Wade 0:50

This is a feature that I've kind of wished would have been in the language for years. Talking with a few people who encouraged it, it's kind of like the rule of starting a user group, right? If there's not one and you have the desire, then you're the person to do it. A few people encouraged me and said: Well, why don't you go out and write it? So I've spent the last two years kind of trying to work up the courage, or research it enough, or make sure I write the RFC the proper way, and then also actually have the time to commit to writing it and following up with any of the discussions as well.

Derick Rethans 1:18

Okay, so we've mentioned the word RFC a few times. But we haven't actually spoken about what it is about. What are you wanting to introduce into PHP?

Steven Wade 1:25

I want to introduce a new magic method. As you said, the name of the RFC is __toArray(). And so the idea is that you can cast an object, if your class implements this method, just like it would with __toString(). If you cast it manually to array, then that method will be called if it's implemented. Or, as I said in the RFC, array functions can automatically cast it if you're not using strict types.
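As a minimal sketch of the idea, assuming a hypothetical Person class: a method named __toArray() can already be declared and called explicitly, but in PHP versions without the RFC the (array) cast ignores it and exposes the raw properties instead. Under the proposal, the cast would return the method's result.

```php
<?php
// Hypothetical example class. __toArray() is only a proposal in the RFC;
// PHP versions without it treat this as an ordinary method.
final class Person
{
    private string $first;
    private string $last;

    public function __construct(string $first, string $last)
    {
        $this->first = $first;
        $this->last = $last;
    }

    // The array representation the class author wants to expose.
    public function __toArray(): array
    {
        return ['name' => $this->first . ' ' . $this->last];
    }
}

$person = new Person('Ada', 'Lovelace');

// Calling the method explicitly works regardless of the RFC.
$explicit = $person->__toArray();

// Without the RFC, the cast does not call __toArray(): it returns the
// object's properties, keyed by their (mangled) names.
$cast = (array) $person;
```

With the RFC, the intent is that `(array) $person` would give the same array as the explicit call.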

Derick Rethans 1:49

Oh, so only if it's not strictly typed. So if it's weakly typed, it would call the toArray() method if the function's argument has an array type hint.

Steven Wade 1:58

Yes, and that is actually something that came up during the discussion period. Again, this is why we have discussions, right? To solicit feedback on things we don't think about or may overlook. And so someone did point out that it would not function that way, or you would not expect it to be automatically cast for you, if you're using strict types.

Derick Rethans 2:17

Okay.

Steven Wade 2:18

The RFC has been updated to reflect that as well.

Derick Rethans 2:20

So now the RFC says it won't be automatically called just for a type hint.

Steven Wade 2:24

Correct.

Derick Rethans 2:24

Not everybody is particularly fond of magic methods. What would you say about the criticism that introducing even more of them would be sort of counterproductive, because you'll end up not necessarily knowing what happens if a method gets called when you do a cast, for example?

Steven Wade 2:38

The beauty of PHP is in its simplicity. And so adding more and more interfaces kind of expands class declarations and enforcements, and in my opinion can lead to a lot of clutter. And so I think PHP is already very magical. And the precedent has been set to add more magic to it with 7.4, with the introduction of the __serialize() and __unserialize() magic methods. And so for me it's just a tool. I don't think that it's necessarily a bad thing or a good thing. It's just another option for the developer to use.

Derick Rethans 3:06

Two episodes ago, I spoke with Nicolas Grekas about a Stringable interface that he suggested to introduce, which is a little bit similar to the casting with toArray(). Hence, do you think it would make sense to have __toArray() also happen if the class implements an interface, with a typed function argument?

Steven Wade 3:29

I think that would be two separate RFCs. I think the first one, to kind of get it on par with what we have now in PHP, would be to introduce toArray(). And then a separate one would be if we wanted to follow suit with an Arrayable interface.

Derick Rethans 3:43

Which is the same thing that happens with the Stringable interface, right? We have had toString() for how many years, decades? But from what I understand, if you have a typed property "string", it would also call the toString() method when it's defined on an object that's being passed in. Or do I misunderstand or misremember that?

Steven Wade 4:00

I haven't followed that one too closely. I've kind of been catching up on some of the discussion today. But yeah, I don't know off the top of my head what that would do.

Derick Rethans 4:07

I didn't mean with the newly suggested Stringable interface, but with what we currently have.

Steven Wade 4:12

I'm not sure how that would work.

Derick Rethans 4:13

I don't know, either. That's what I'm asking you.

Steven Wade 4:15

With the array and with the typed properties? That's a good question. That's again some feedback that I need to think through.

Derick Rethans 4:21

Because I think it would make sense to at least behave the same, and I don't particularly mind which way it goes. That's a personal opinion here.

Steven Wade 4:28

And that's a great idea. I haven't played with 7.4 too much. I need to pull it down and just see what the behaviour of string is, because the main goal of this is to get functionality parity with what toString() will do. And so if that is how it handles it with typed properties, then I would want to implement that as well.

Derick Rethans 4:47

In a similar way, I also don't know what happens if you have toString() available in a class and you pass it in as an argument that is typed as string.

Steven Wade 4:54

At least in my tests with weak types, it will actually cast that for you. If you have that string argument type hint, it will cast it, and then that will be a copy. So it will actually just be the result of that cast to string. I think it throws an error if you have strict types set.
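The existing __toString() behaviour Steven describes can be sketched as follows. The Label class and shout() function are hypothetical names for illustration. The file has no declare(strict_types=1), so calls run in weak (coercive) mode.

```php
<?php
// No declare(strict_types=1) in this file, so calls use weak typing.

// Hypothetical class that can represent itself as a string.
final class Label
{
    public function __toString(): string
    {
        return 'hello';
    }
}

// Hypothetical function with a string parameter type.
function shout(string $text): string
{
    return strtoupper($text);
}

$label = new Label();

// In weak mode the object is coerced through __toString() and the
// function receives the resulting string; $label itself stays an object.
$result = shout($label);

// With declare(strict_types=1) in the calling file, the same call
// would throw a TypeError instead of coercing.
```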

No, I think it'd be very similar, right? It's just how you want to use it in userland. With __toArray(), you could cast it yourself, or with weak types PHP could cast for you in the appropriate circumstances. If you want the same functionality for now, you would need to call, you know, toArray() yourself within __serialize(). In the future, you could implement toArray(), and then your serialize could actually just cast this object to array, and that should convert it for you. And then serialize will return an array, so you're not duplicating how you want that object represented when it's an array.

Derick Rethans 6:00

So the RFC mentions that when you do a print_r of a person, __toArray() is called. But that's not particularly a cast. So why would it do it here, but not for method arguments, for example?

Steven Wade 6:11

That is a product of this being my first time and that was a mistake that was thankfully pointed out during the discussion period and has been corrected.

Derick Rethans 6:19

I read this RFC a week or two ago. I should have reread it this morning; I did not, so my apologies for not being fully up to date here. There are some array functions in PHP like sort() that operate on an array by reference, right? That can't particularly work if you first have to cast to an array, which is what your current RFC now does. I mean, toArray() only gets called when you cast to an array or when it's a weakly typed argument. But how would it work for methods or functions that accept an array by reference?

Steven Wade 6:49

At least the way I proposed it, they would throw an error, as they currently do. Again, from my tests, and trying to keep this in parity with toString: I don't believe there are as many functions that operate on a string by reference as there are with arrays. From what I can recall, if you try to operate by reference on an object that implements toString, it will throw an error.

Derick Rethans 7:10

And it wouldn't just fall back to using an object because that'd be very strange behaviour in that case, I suppose.

Steven Wade 7:15

Basically, if it's not something that can be cast or converted to an array through this method, then it's just going to be the same functionality you have in current PHP, which is to throw an error.

Derick Rethans 7:24

Going to go for the principle of least astonishment or something.

Steven Wade 7:27

Yeah, I don't want to introduce too many changes to it. I just want to be able to cast.

Derick Rethans 7:31

I think that is a great idea. Actually, it's the same thing I've spoken with Nikita about: introducing features step by step makes it a lot easier for people to comprehend what you actually end up doing. And there's also less chance of people getting bogged down in liking a specific aspect of the RFC but not the other parts, so that we end up not merging the whole thing because of a sub part of it.

Steven Wade 7:54

And that's why I was very purposeful in not including any kind of write. You cannot write to a class that implements toArray(), as you would with ArrayObject, because we have that for a reason. So this is different functionality. We just wanted to keep it small, and just have this little helper.

Derick Rethans 8:11

I read in the RFC, something called get_mangled_object_vars(), but I didn't quite understand what it was.

Steven Wade 8:16

So that was actually a function introduced in 7.4, as a direct result of my original proposal, when I was trying to see what people on internals and in the community thought of this feature. Sometime in spring last year, 2019, I began this discussion, and there was some initial feedback from folks saying that it would cause some breaking changes in their libraries or their code, because they are overloading the casting. Right now, if you cast an object, you get insight into the object's internals without any side effects. I think that's how Symfony's VarDumper works, and that's how they're able to display some of that information. So the concern was that by introducing this, that functionality would break. And so, to introduce a function that would give you the same benefits without overloading the casting, get_mangled_object_vars() was introduced, accepted, and implemented in 7.4.

Derick Rethans 9:04

And that returns the object properties with their special characters in place. Because internally in PHP, if you have a private property, the name is built from a null character, the name of the class, a null character, and then the property name. So that's what that would return, I suppose.
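A short sketch of what get_mangled_object_vars() (available since PHP 7.4) returns, using a hypothetical Account class. Private properties are keyed with a null byte, the class name, and another null byte before the property name; protected properties use an asterisk in place of the class name.

```php
<?php
// Hypothetical class with one property of each visibility.
final class Account
{
    public $id = 1;
    protected $owner = 'ada';
    private $secret = 'x';
}

// Returns all properties keyed by their internal (mangled) names,
// without invoking any cast handler on the object.
$vars = get_mangled_object_vars(new Account());
$keys = array_keys($vars);

// $keys is ['id', "\0*\0owner", "\0Account\0secret"]:
// public names are plain, protected names carry "\0*\0",
// private names carry "\0" . class name . "\0".
```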

Steven Wade 9:22

I believe so.

Derick Rethans 9:22

I ran into a similar issue in Xdebug, because in some cases you want to call get_debug_info, which is what people implement for getting debug info for their objects. But in other cases you don't want to do it, because you want to see everything that happens internally, or you want to see all the properties that exist. So that's kind of a tricky one. And I think at some point, with toArray also happening, I might actually end up adding the output of both toArray() and get_debug_info as separate, sort of fake properties in the Xdebug output. But of course that only works if toArray() has no side effects. I don't think there's any way of ensuring that the toArray method you can now implement doesn't change any information in normal properties, for example, right?

Steven Wade 10:12

And that's kind of some of the internals of it that I'm not fully familiar with. I'm hoping that, you know, the discussion period will help illuminate some of that.

Derick Rethans 10:20

I don't think you'd be able to actually.

Steven Wade 10:22

Just recently, we became able to throw an exception from toString. I don't know if you can actually do any kind of write operations on the object within toString. That's a good question, and I'll look that up. And whatever that behaviour is, we'd want to mimic it here as well.

Derick Rethans 10:34

I believe you can. It's normal PHP code, right? And if you don't want it to, you need to clone it first, which is something you could choose as an implementation, right? You could first clone the object and then call the toArray method on the cloned object. I don't think we have any protection for that. The RFC is currently in the discussion phase. At the time of recording, we're talking about the discussion period. When are you thinking of ending that and going for a vote?
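The clone-first idea Derick mentions could look roughly like this. Counter and toArrayWithoutSideEffects() are hypothetical names, and the __toArray() method is called explicitly because the proposed cast is not assumed to exist. Note that clone makes a shallow copy, so objects nested inside properties would still be shared.

```php
<?php
// Hypothetical class whose __toArray() deliberately has a side effect.
final class Counter
{
    public int $reads = 0;

    public function __toArray(): array
    {
        $this->reads++; // mutates the object on every conversion
        return ['reads' => $this->reads];
    }
}

// Hypothetical helper: convert a copy so the original is left untouched.
function toArrayWithoutSideEffects(object $object): array
{
    $copy = clone $object; // shallow copy; nested objects are still shared
    return $copy->__toArray();
}

$counter = new Counter();
$array = toArrayWithoutSideEffects($counter);

// The side effect landed on the clone: $counter->reads is still 0,
// while $array is ['reads' => 1].
```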

Steven Wade 10:58

I think this is actually going to be a longer period of discussion than for most RFCs, just because of the nature of it. I am a full-time employee, full-time father, husband, and also a student. And so I don't have a lot of time to do this. And I want to do it right. I want to be able to respond to this. The discussion opened up a week ago, and this morning is the first time I've been able to respond to that and update the RFC. And because I really care about this and would love this feature to go in, I want to continue to solicit discussion and advice and questions, and to be able to answer them all. So however long it takes. Ideally, I would love it to be closed, voted on, accepted and implemented in time to get in for the feature freeze for 8.0.

Derick Rethans 11:40

For that you have about four months. Would you have anything else to add that I forgot? Or anything you want to add that you think is interesting to know about this RFC?

Steven Wade 11:50

Yeah, the only thing I would add is that someone posted the RFC on Reddit, and I've seen discussions where some people like it and some people hate it. They want to move one way or the other. Again, it's a small feature, it's a helper. It's a tool that you can use. Is it perfect? No. Is it going to satisfy everybody? No. You've got the people who want more functional and procedural, and you've got people who want more OOP. I think it's just another helpful tool that could be in your tool belt. If you use it, great. If you don't, you don't have to touch it.

Derick Rethans 12:19

Very well. Thank you, Steven, for taking the time to talk to me this afternoon. I'm looking forward to this coming to a vote at some point.

Steven Wade 12:27

Thank you for having me on the show and letting me explain the purpose and the reasoning behind this RFC. And thank you very much for giving a voice to those looking to improve the language.

Derick Rethans 12:35

You're most welcome. Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 40: Syntax Tweaks

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about a bunch of smaller RFCs.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 40. Again, I'm talking with Nikita. Perhaps we should rename this podcast to the Derick and Nikita Show at some point in the future. This time we're going to talk about a bunch of smaller RFCs that he produced related to tweaking PHP syntax for PHP 8. Nikita, would you please introduce yourself?

Nikita Popov 0:42

Hi, I'm Nikita and I do PHP core development on behalf of JetBrains. We have a couple of new and not very exciting RFCs to discuss.

Derick Rethans 0:53

Sometimes not exciting is also good to talk about. Anyway, the first one that caught my eye was an RFC called static return type. So we have had return types for a while, but what is special about static?

Nikita Popov 1:07

So PHP has three magic special class names: self, referring to the current class; parent, referring to the, well, parent class; and static, which is the late static binding class name. And that's very similar to self. If no inheritance is involved, then static is the same as self, in that it refers to the current class. However, if the method is inherited, and you call this method on the child class, then self is still going to refer to the original class, so the parent, while static is going to refer to the class on which the method was actually called.

Derick Rethans 1:51

Even though the method wasn't overloaded.

Nikita Popov 1:54

Exactly. In a way, one can think of static as: you can more or less replace static with self, but then you would have to actually copy this method into every class that inherits it.
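The difference between self and static under inheritance can be sketched like this, with hypothetical Model and User classes. Both factory methods are inherited unchanged by User, but only the one using static follows the class the call was made on.

```php
<?php
// Hypothetical parent class with two inherited factory methods.
class Model
{
    public static function viaSelf()
    {
        return new self();   // always the class where this method is written
    }

    public static function viaStatic()
    {
        return new static(); // late static binding: the class actually called
    }
}

// The child does not override either method.
class User extends Model
{
}

$fromSelf = User::viaSelf();     // a Model instance
$fromStatic = User::viaStatic(); // a User instance
```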

Derick Rethans 2:09

You have now explained the difference between self and static. Why would you want to use static as a return type instead of self?

Nikita Popov 2:17

There are a couple of use cases. I think there are three mentioned in the RFC. The first one is named constructors. Usually in PHP, we just use the __construct method. Well, if we had to give this method a return type, then the return type would be static. Because, of course, the constructor always returns, well, the class you're actually constructing, not some kind of parent class. And named constructors are just a pattern where you use a static method instead of a constructor, for example because you have multiple different ways of constructing an object and you want to distinguish them by name.

Derick Rethans 2:57

Could we also call those factory methods?

Nikita Popov 3:00

Yeah, that's also a related pattern. So for named constructors, you usually also want to return an object of the class that it is actually called on.

Derick Rethans 3:09

It makes sense to attach it there, because that then creates a contract: you know that this named constructor is going to return that same class and not something else. Because there's no requirement that would otherwise enforce that same class, like you'd have with a constructor.

Nikita Popov 3:22

Exactly, yeah. The other pattern is one that I think was maybe popularised by PSR-7 or something, the HTTP request object interface. The object is actually immutable, and the way you change it is by calling a withSomething method. This method is going to return you a new object with this particular bit of information replaced. And again, usually for these kinds of APIs, you want to return the class that the method was actually called on. So if you extend this kind of API, you don't want to get objects of the parent class back. The third one, and I think by a pretty large margin the most common one, is just normal fluent methods, so where each method returns $this. $this is always an instance of static. So if you extend the class, then $this is going to be the extending class, not the parent class. For this particular case, in PHPDoc there is also a different convention where instead of returning static, you actually write return $this. So you use $this as a special type that indicates this type of method. So static would be a slightly weaker form of that. But we might still add the special $this case in the future.
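The three use cases can be sketched in one hypothetical class, assuming PHP 8.0 or later where the static return type is available. Request, JsonRequest, and the method names are illustrative only, and a real class would usually pick either the immutable or the fluent style rather than mixing them.

```php
<?php
// Hypothetical base class showing the three patterns from the RFC.
class Request
{
    private array $headers = [];

    // 1. Named constructor: returns the class it was called on.
    public static function create(): static
    {
        return new static();
    }

    // 2. Immutable "wither": returns a modified copy of the same class.
    public function withHeader(string $name, string $value): static
    {
        $copy = clone $this;
        $copy->headers[$name] = $value;
        return $copy;
    }

    // 3. Fluent method: returns $this, which is always an instance of static.
    public function setHeader(string $name, string $value): static
    {
        $this->headers[$name] = $value;
        return $this;
    }

    public function headers(): array
    {
        return $this->headers;
    }
}

class JsonRequest extends Request
{
}

$original = JsonRequest::create();
$changed = $original->withHeader('Accept', 'application/json');
$same = $changed->setHeader('X-Test', '1');
```

All three calls return JsonRequest instances. The static type guarantees the class, but as the conversation goes on to say, it does not guarantee that a fluent method returns the very same object.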

Derick Rethans 4:51

Because static would only enforce that it's the same class, but not the same object.

Nikita Popov 4:56

Exactly.

Derick Rethans 4:56

Are you intending to add that to this RFC?

Nikita Popov 4:59

I like to keep RFCs like different issues separated.

Derick Rethans 5:03

It makes it easy to talk about them and get them accepted or not.

Nikita Popov 5:06

I'm not totally convinced on the $this thing. I mean, we already allow self return types, we allow parent, so it makes sense to allow static. But $this is not really a type; it's more some kind of extra contract on top of the type system. And I'm not sure it makes sense to open this position.

Derick Rethans 5:28

Okay, in which positions would you be able to use the static keyword? You've already mentioned return types. Are there other places as well?

Nikita Popov 5:36

No, you can only use it in return types. In any other place it would simply not be sound, so it would violate the Liskov substitution principle. The reason why you can use static in return types is that static is basically a restriction on each inheriting class. So in your original class, static is the same as self. Then in the inheriting class, static is again the same as self, but in the inheriting class. And the inheriting class is a subclass, or a subtype, of the parent class. So this is allowed by the Liskov substitution principle, or by our variance rules. If you did the same thing for parameters, you would also go from having a parameter of the parent class to a parameter of the child class. So you would restrict the set of inputs that are allowed for this parameter. And that's invalid. And the same argument also goes for properties.

Derick Rethans 6:37

The RFC also talks a little bit about variance and subtyping. How is static considered differently from self here, or have you just explained exactly that?

Nikita Popov 6:46

static is considered a subtype of self. If you have a parent method that uses a self return type, you can have a child method that uses a static return type, because static is a further restriction. So self still allows you to return the parent class, while static does not allow it. So you restrict the set of return values, and that's valid. Going the other direction, so replacing the static type in the parent method with the self type in the child method, would not be valid, because you make the set of allowed values larger.
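A minimal sketch of that variance rule, assuming PHP 8.0 or later and hypothetical Shape and Circle classes: the child narrows the parent's self return type to static.

```php
<?php
// Hypothetical parent: self here means Shape.
class Shape
{
    public function copy(): self
    {
        return new self();
    }
}

// The child narrows the return type from self (Shape) to static,
// which is allowed because static is treated as a subtype of self.
class Circle extends Shape
{
    public function copy(): static
    {
        return new static();
    }
}

$copy = (new Circle())->copy();

// The opposite direction (parent declares static, child declares self)
// would be rejected when the child class is declared, because it would
// widen the set of allowed return values.
```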

Derick Rethans 7:21

And that is exactly the same as the other variance rules that we have had since PHP 7.4, of course. The last thing the RFC mentions, or actually I don't quite remember whether it mentions it, is whether you can also use static as part of a union type.

Nikita Popov 7:35

So yes, you can.

Derick Rethans 7:37

Okay, that's the simple answer. I like simple answers.

Nikita Popov 7:40

Together with the other restrictions. So that union type has to be in the return type position. But apart from that, you can.

Derick Rethans 7:47

That's good to hear.

Nikita Popov 7:48

There is actually one more tricky thing regarding property types. We don't allow static in property types because, as I mentioned, it would violate our variance rules. But unfortunately we have the extra issue that we also have static properties. So if you write public static foobar, then is that static for a static property or for a static type?

Derick Rethans 8:14

Right, because we don't enforce whether static goes before or behind public, private, or protected.

Nikita Popov 8:21

Yeah.

Derick Rethans 8:21

At least not in the syntax. I mean, I think coding standards actually do most of the time require the static to be before.

Nikita Popov 8:27

Even the coding standards would require you to write it as public static, not as static public.

Derick Rethans 8:35

Oh, really? Okay. I thought it was the other way around. Yeah, that is difficult, because then you don't know which static is meant here.

Nikita Popov 8:41

Yes, and we just disallow it on the grammar level. It's actually a bit ugly, because we have to duplicate the whole type grammar, once to include static and once to not include it, just to deal with this ugly conflict.

Derick Rethans 8:56

That's what happens when you come up with something clever. You need clever workarounds.

Nikita Popov 9:00

So it's unfortunate that the static keyword has like three or maybe four completely different meanings in PHP. I think simply because people wanted to reuse a keyword instead of introducing a new one.

Derick Rethans 9:15

Because introducing new keywords might end up meaning breaking people's code.

Nikita Popov 9:19

On the downside, reusing keywords makes code confusing. At least I got the impression that some people find the use of static for late static binding somewhat confusing. And I can also see that happening if you see a method that has the signature public static whatever and returns static, and you're not super familiar with what all of that means.

Derick Rethans 9:46

And that is quite a common pattern, because these named constructors are static methods that return static. Let's move on to the next one, which is a tiny RFC that you came up with, which is the Class Name Literal on objects. What does this do?

Nikita Popov 10:03

There is a syntax where you write a class name, then double colon class. And that just returns you the fully qualified class name. For example, if you have a use statement for that class, you get back the full name instead of the short name. I think we've had this since PHP 5.5. And it's a great feature, because it makes it clear that you're referencing a class and not just some random string. And that means, for example, that IDE refactorings can work better and so on.

Derick Rethans 10:35

Okay.

Nikita Popov 10:36

The actual RFC is very simple. Currently, the class syntax is only allowed on literal class names. With this RFC, you can also take an object variable and get the class of that object using the same syntax.

Derick Rethans 10:48

However, PHP has a function for that already, which is called get_class() right?

Nikita Popov 10:52

Exactly. This is essentially just syntax sugar for get_class(). The reason why we want to have the syntax sugar is really not so much that writing get_class() is particularly hard, but just that people expect it to be there. This class syntax looks a lot like a class constant access. So it looks like every class has a magic constant called class. Usually you are able to access class constants on objects. So you can write the object, the double colon, and the constant name, and that works. In that case, we just take the class of the object and access it on that class. For consistency reasons, it only makes sense that you can do the same with this particular magic constant as well. That's really all the motivation.
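A small sketch of the three equivalent spellings, assuming PHP 8.0 or later for the object form. The Invoice class is a hypothetical example.

```php
<?php
// Hypothetical class used only for illustration.
final class Invoice
{
}

$invoice = new Invoice();

$fromLiteral = Invoice::class;        // allowed since PHP 5.5
$fromObject = $invoice::class;        // the object form, PHP 8.0+
$fromFunction = get_class($invoice);  // the function it is sugar for
```

All three variables hold the same class name string.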

Derick Rethans 11:43

Originally the class literal colon colon class is resolved at compile time. Of course, that can't happen on object colon colon class. Is that still true or no longer?

Nikita Popov 11:53

So it wasn't really true in the first place. For normal class names, of course, it is resolved at compile time. Actually, one of the gotchas with the syntax is that some people expect it to validate that the class actually exists. So they expect that this gets autoloaded and that they get an error if it doesn't exist. That doesn't happen. You can reference some non-existing class with this syntax just fine. Usually your IDE is going to show a warning for that. As we just discussed, we also have a couple of magic class names: self, parent, and static. The static one in particular always has to be resolved at runtime, because we don't know what class the method is actually going to be called on. Actually, self and parent also sometimes have to be resolved at runtime. There are two cases where that can happen. One is if you use traits, because in that case self refers to the using class, not to the trait. The other is closures, where self refers to the bound scope: the last argument of the bindTo method is the scope you're using. So in those cases, it's already dynamically resolved.
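The "not validated" gotcha can be sketched as follows. Vendor\Missing\Widget is a hypothetical class name that is assumed not to exist anywhere, and the counting autoloader is only there to show that the literal does not trigger autoloading.

```php
<?php
// Count autoloader invocations to see when PHP tries to load a class.
$autoloadCalls = 0;
spl_autoload_register(function (string $class) use (&$autoloadCalls): void {
    $autoloadCalls++; // a real autoloader would require a file here
});

// Hypothetical, non-existent class: ::class is resolved as a plain name,
// so no autoloading and no error happens at this point.
$name = \Vendor\Missing\Widget::class;
$callsAfterLiteral = $autoloadCalls;

// Asking whether the class exists is what actually triggers the autoloader.
$exists = class_exists($name);
```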

Derick Rethans 13:09

Okay. The RFC mentions one specific area where you can't use colon colon class. In which situation can you still not use colon colon class on objects?

Nikita Popov 13:20

You can always use it on an object. I think what you're referring to is that normally, for normal class constants, you can also put the class name inside the string. I mean, put the string class name inside the variable and then access the constant on that variable.

Derick Rethans 13:38

Oh, right. Yes.

Nikita Popov 13:39

For the double colon class syntax, we don't want to allow that. Because, well, first this is kind of useless, because it will just return you back the same string you gave it. And I think in that case, the fact that the class name is not validated is especially confusing.
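To illustrate the distinction, here is a sketch assuming PHP 8.0 or later and a hypothetical Order class: a string variable can be used to reach an ordinary class constant, but applying ::class to that string throws a TypeError.

```php
<?php
// Hypothetical class with an ordinary class constant.
final class Order
{
    const STATUS = 'open';
}

$className = 'Order';

// Ordinary constants can be accessed through a string holding a class name.
$status = $className::STATUS;

// ::class on a string is rejected: it would only hand back the same,
// unvalidated string.
$error = null;
try {
    $unused = $className::class;
} catch (\TypeError $e) {
    $error = $e;
}
```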

Derick Rethans 14:00

Okay, that makes sense. So you can only call colon colon class on literal class names that you already could, as well as on variables that contain an object?

Nikita Popov 14:09

That's right, yeah.

Derick Rethans 14:10

That sounds great. Does it show up differently in reflection?

Nikita Popov 14:13

This magic class constant actually doesn't show up in reflection at all. It looks like a constant, but it's really a special syntax that just happens to share the look with constants.

Derick Rethans 14:24

Do you expect any controversies about this?

Nikita Popov 14:27

I don't think so.

Derick Rethans 14:28

I don't think so either. I can't really see anything that people could complain about too much. I do think, however, that for the next RFC that you came up with, the variable syntax tweaks, there will be a little bit of haggling about whether this is a good idea to do. In PHP 7.0, we got this uniform variable syntax. Could you give a brief reminder of what it was about?

Nikita Popov 14:48

That was about, well, fixing a couple of syntax inconsistencies when it comes to variable syntax. Variable syntax in PHP is extremely, extremely magic. Our expression syntax is nice and regular, but the variable syntax is a huge assortment of special rules, and that RFC made those rules a little bit less special and more regular.

Derick Rethans 15:17

From what I understood, we missed a few inconsistencies that we probably also should have addressed in that RFC. And that is what you're now trying to tweak again?

Nikita Popov 15:24

All of these remaining inconsistencies are really, really minor things and edge cases. But weirdly, all or at least most of them are something that someone at some point ran into, and either opened a bug or wrote me an email or pinged me on Twitter. So people somehow still manage to run into these things.

Derick Rethans 15:52

The RFC mentions four specific things that we've missed. What is the first one?

Nikita Popov 15:57

Yeah, so it's probably going to be somewhat hard to talk about some of these examples.

Derick Rethans 16:02

I know because I think some of them make no sense whatsoever.

Nikita Popov 16:05

Yeah.

Derick Rethans 16:06

Because how do you call a method on a string?

Nikita Popov 16:07

Context for this one is that I have a nice little extension called scalar objects, which allows you to more or less define methods on strings, on integers, on arrays and so on. With the uniform variable syntax, we have allowed calling methods on string literals. That actually makes no real sense with baseline PHP. But if you're using scalar objects, then this is a useful feature, because you can do something like take a string literal and call length on it, while otherwise you would have to wrap it in brackets.

Derick Rethans 16:45

So it's just a syntax change pretty much.

Nikita Popov 16:48

Well, what this one in particular is about is that right now, this works if it's a literal string, but if you have any variables inside it, then it suddenly stops working, which is just a very...

Derick Rethans 16:59

So it is the interpolated strings inside double quotes, the dollar variable name syntax. That's the problem there?

Nikita Popov 17:05

Yeah.
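
A minimal sketch of the difference being discussed, assuming a PHP version that implements the RFC (PHP 8.0 or later). Method calls on strings need the scalar objects extension, so the sketch uses array access instead; the variable names are illustrative only.

```php
<?php
$name = 'world';

// Dereferencing a plain string literal already works since PHP 7.0.
$first = 'hello'[0];

// Dereferencing an interpolated string is what the RFC adds; on versions
// without the tweak this line is a parse error.
$firstInterpolated = "hello $name"[0];
$lastInterpolated = "hello $name"[-1];
```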

Derick Rethans 17:06

The second one is called constant dereferenceability, which is a word I can't pronounce. And my text editor says it's not a word. So what do you mean by it?

Nikita Popov 17:14

That's a good question. I think the term is more or less picked up from C, where we have pointers, and we can dereference pointers to access what the pointer points to. So that's the star operator in C. In PHP, we use the term dereference to also access some kind of structure in some way. For example, to access an array element, so array dereference, or to access an object property, object dereference, and so on. That particular one is, I think, two things. One is that you can, for example, access the first character of a constant. So write the constant name, then brackets, zero. Well, maybe that's not the best one, I can think of a better example. You have a constant that contains an array, and you want to access a specific key in that array. That's something you can already do right now. The same syntax does not work if the constant is a magic constant. What also doesn't work is if you use our alternative array access syntax. So we have the square brackets, that's what people should use. And we have the curly braces, which is the alternative way to access arrays and which is actually deprecated as of 7.4. I'm not totally sure whether that's going to be removed in PHP 8 or not. If it's going to be removed, then this part is a moot point. But yeah, this is again, I think, from a practical perspective, not really interesting. The only situation where I think this is useful is again, of course, scalar objects, because it means you can call methods on the constant.

Derick Rethans 18:57

Okay, so the grammar currently disallows doing that.

Nikita Popov 19:00

Currently it's disallowed and that would allow it.
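
As a rough illustration of the two cases, assuming PHP 8.0 or later for the magic constant part; `LIMITS` and `currentFunctionInitial()` are made-up names for this sketch.

```php
<?php
const LIMITS = ['min' => 1, 'max' => 10];

// Array access on a constant that holds an array already works.
$max = LIMITS['max'];

function currentFunctionInitial(): string
{
    // Array access on a magic constant is what the RFC adds.
    return __FUNCTION__[0];
}

$initial = currentFunctionInitial();
```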

Derick Rethans 19:03

A third one is related. I think it's class constant dereferenceability.

Nikita Popov 19:07

So someone complained about this one on Twitter. I don't know how they ended up trying to do this. Something you can do right now is, you can access a static property. And then you can interpret the content of that static property as a class name, and access another static property on that. So you can chain these static property accesses. For some reason, the same does not work with class constant access. So static property accesses can be chained, but class constant accesses can't be, again for no particular reason. This change would allow that to happen as well.

Derick Rethans 19:41

This is even a change that makes sense without having to use scalar objects.

Nikita Popov 19:45

That one is. I wouldn't write that kind of code, but it logically makes sense.
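
A small sketch of the chaining described above, assuming PHP 8.0 or later for the class constant case. `Inner` and `Outer` are hypothetical classes invented for the example.

```php
<?php
class Inner
{
    const LABEL = 'inner label';
    public static $value = 'inner value';
}

class Outer
{
    const TARGET = Inner::class;
    public static $target = Inner::class;
}

// Chained static property access already works: the content of
// Outer::$target is interpreted as a class name.
$viaProperty = Outer::$target::$value;

// Chained class constant access is what the RFC adds.
$viaConstant = Outer::TARGET::LABEL;
```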

Derick Rethans 19:50

And then the last one, in the RFC, is called arbitrary expression support for new and instanceof.

Nikita Popov 19:56

Yeah, so this is probably the only one that's actually useful for something. PHP has, well, a bunch of places where usually you have to place either an identifier or a namespaced name: a class name, a method name, a property name and so on, or even a variable name. For all of these places, we usually support some kind of special syntax to instead use a general expression. For example, instead of a variable with a static name, you can use curly braces to use a dynamic name.
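
For readers who have not seen the curly brace forms, here is a short sketch of dynamic names in existing PHP syntax. All variable names are illustrative.

```php
<?php
$greeting = 'hello';
$suffix = 'ing';

// Dynamic variable name: the expression yields 'greeting'.
$viaVariable = ${'greet' . $suffix};

$object = new ArrayObject([1, 2, 3]);

// Dynamic method name: resolves to $object->count().
$viaMethod = $object->{'cou' . 'nt'}();

$point = new stdClass();
$point->x = 5;

// Dynamic property name: resolves to $point->x.
$viaProperty = $point->{'x'};
```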

Derick Rethans 20:26

I think for new, we did it at some point already.

Nikita Popov 20:29

For new, this doesn't exist yet. So you can use a variable as class name, but you can't actually compute the class name as part of the expression.

Derick Rethans 20:40

I think what I was referring to is that you can use braces around the whole new class expression, so you can call methods. So that's what I meant, but this is specifically using an expression behind new.

Nikita Popov 20:51

Yeah, so these are like two things. One is whether you use an expression for the new class name, and the other is whether you use the new itself as an expression. And yeah, right now, we don't support that for new. And we also don't support it for instanceof, so the right-hand side, which is also a class name. The RFC just proposes to allow an expression in parentheses there. And this kind of stuff is, again, well, not particularly useful. But it is useful for things like code generation, where you may have to insert arbitrary expressions at such points. And there are actually some nice hacks that you can use right now. So you can use a variable with a complex expression inside it, where you assign to the variable itself and then return its name.
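
A minimal sketch of the proposed syntax, assuming PHP 8.0 or later. `ArrayObject` is used only because it is a built-in class that is always available.

```php
<?php
$prefix = 'Array';

// A variable as the class name already works.
$className = $prefix . 'Object';
$viaVariable = new $className();

// With the RFC, the expression can be written inline in parentheses.
$viaExpression = new ($prefix . 'Object')();

// The same applies to the right-hand side of instanceof.
$isMatch = $viaExpression instanceof ($prefix . 'Object');
```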

Derick Rethans 21:42

I don't think I understand this. You're saying you can construct a string with a complex expression in it.

Nikita Popov 21:48

Not a string. You write something like new variable, but with the curly brace syntax, and in there you start off with a string containing some kind of dummy variable name, and then you concatenate that with an empty string. But that empty string is computed by doing the assignment to the variable name that you're actually going to return.
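
One possible reading of that hack, written out as a sketch. This is a guess at its shape and not code taken from the RFC; `$tmpClass` is a hypothetical dummy variable.

```php
<?php
$prefix = 'Array';

// The braces hold an expression that must yield a variable name. The
// assignment to $tmpClass runs while that name is being computed and
// contributes only an empty string, so the name is still 'tmpClass'.
$object = new ${'tmpClass' . (($tmpClass = $prefix . 'Object') ? '' : '')}();
```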

Derick Rethans 22:12

I still don't understand this. You know, what I'm going to do is I'm just going to link to an example for this in the show notes.

Nikita Popov 22:19

It's not really important. You can just cut off this part.

Derick Rethans 22:22

Yep, sure, I can do that too, perfectly fine.

Nikita Popov 22:24

Nice hack.

Derick Rethans 22:24

But let's not teach too many hacks to people, I think. Thank you for taking the time with me today, Nikita, to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we've started voting on them and we'll see what happens to them.

Nikita Popov 22:37

Thanks for having me once again.

Derick Rethans 22:41

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 39: Stringable Interface

In this episode of "PHP Internals News" I chat with Nicolas Grekas (Twitter, GitHub, LinkedIn, Symfony Connect) about the new "Stringable Interface" that Nicolas is proposing, as well as about voting rights (on RFCs).

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Hello, this is Episode 39. Today I'm talking with Nicolas Grekas about an RFC that he's produced called Stringable interface. I already spoke with Nicolas last year about the work that Symfony does when new PHP versions come out, to look at deprecations and to make sure that versions of Symfony work with new versions of PHP. But this time Nicolas came up with his own RFC called the Stringable interface. Nicolas, could you explain what Stringable is?

Nicolas Grekas 0:54

Hello. Stringable is an interface that people could use to declare that they implement the magic __toString() method.

Derick Rethans 1:02

Because currently it's not necessary to implement an interface, and PHP's internals will always use toString if it is available in a class, right?

Nicolas Grekas 1:10

Yeah, absolutely.

Derick Rethans 1:11

What is the reason why you would want to have a Stringable interface?

Nicolas Grekas 1:16

So the reason is to be able to benefit from union types in PHP 8. Right now, if you want to accept a string as an argument, it's pretty easy. You just add the string type, right? Let's say now you want to accept a string or a stringable object, a stringable object being something that implements this method. If you want to do that, you cannot express the type using types today.
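
A sketch of the kind of signature this is aiming for, assuming PHP 8.0 with union types and the proposed `Stringable` interface. `Title` and `shout()` are hypothetical names.

```php
<?php
final class Title implements Stringable
{
    private string $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function __toString(): string
    {
        return $this->text;
    }
}

// Accepts either a plain string or any object that declares Stringable.
function shout(string|Stringable $message): string
{
    return strtoupper((string) $message) . '!';
}
```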

Derick Rethans 1:42

Because if you choose string, and then the name of a class, that would only accept objects of that specific class.

Nicolas Grekas 1:47

Yes, there are some cases, in Symfony especially, because this is where I work and do open source, where we do want to not call the toString method until the very last moment. There are examples in the code: one is from Drupal. Drupal computes some constraint validation messages lazily, and it's pretty important to them because computing the message itself is pretty costly. They don't need to compute it all the time. Actually, we added the type, the string type, in Symfony 5 before it was released, and Drupal came and said: Oh, this is breaking our code and our features, what should we do now? And we removed the type and we replaced it by some annotation saying: Okay, this is a string or a stringable object. So in the future, when we adopt PHP 6, we would like to be able to express that using a type, a real one.
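
To make the lazy message idea concrete, here is a sketch assuming PHP 8.0. `LazyMessage` is a hypothetical class for illustration, not Drupal's or Symfony's actual implementation.

```php
<?php
final class LazyMessage implements Stringable
{
    private Closure $factory;
    public int $computations = 0;

    public function __construct(Closure $factory)
    {
        $this->factory = $factory;
    }

    public function __toString(): string
    {
        // The costly work only happens when the object is used as a string.
        $this->computations++;
        return ($this->factory)();
    }
}

$message = new LazyMessage(fn () => 'Value must not be ' . 'empty');
$computedBeforeCast = $message->computations;

$rendered = (string) $message;
```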

Derick Rethans 2:41

PHP 6?

Nicolas Grekas 2:42

No, PHP 8, that's true. Strings and PHP 6.

Derick Rethans 2:49

Yay.

Nicolas Grekas 2:51

Another example is pretty similar, actually. It's in the Symfony autowiring system. We have services that we wire, and sometimes we cannot; the autowiring logic doesn't work because some class cannot be autowired. So in this case, we have a lazy message, because sometimes, while a service is not autowirable, it's going to be removed later on, because Symfony removes unused services. So instead of computing ahead of time an error message that is heavy to compute, and that we might just trash because the service is going to be removed, we have this lazy thing, because yeah, it's heavy to compute. So, real-world use cases.

Derick Rethans 3:32

I think the intention of having a Stringable interface actually makes sense. What are the concerns for adding this to your own code? Are there issues with backwards compatibility, for example?

Nicolas Grekas 3:43

That's another goal of the RFC. The way I have designed it is that I think current code should be able to express the type right now, using annotations of course. So what I mean is that the interface, the proposed Stringable, is very easily polyfilled. So we just create this interface in the global namespace, declare the method, and done. So we can do that now. We can improve the typings now, and then in the future, we'll be able to turn that into an actual union type.
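
A minimal sketch of such a polyfill, with the type expressed as an annotation for now. `Slug` and `render()` are hypothetical names, and the guard condition is one possible choice.

```php
<?php
// Only declare the interface when the running PHP does not provide it.
if (!interface_exists('Stringable')) {
    interface Stringable
    {
        /**
         * @return string
         */
        public function __toString();
    }
}

final class Slug implements Stringable
{
    private $value;

    public function __construct($value)
    {
        $this->value = $value;
    }

    public function __toString()
    {
        return $this->value;
    }
}

/**
 * @param string|Stringable $value
 */
function render($value)
{
    return (string) $value;
}
```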

Derick Rethans 4:16

You'd be able to do that almost immediately. Well, you would be able to do that in PHP 8.

Nicolas Grekas 4:21

Yeah.

Derick Rethans 4:21

Without it being a problem. And of course, in that case, you can remove the polyfilled Stringable interface.

Nicolas Grekas 4:27

Yeah, absolutely.

Derick Rethans 4:28

This is going to impact extensions as well, because extensions, I mean, PHP internal functions, often accept strings. I don't actually remember, but if you use a scalar type hint string for an internal method or internal function, PHP actually calls toString on objects. Like, if you would call strlen() on an object that implements toString, it would actually call toString and return the length of that result.

Nicolas Grekas 4:53

Yes, absolutely.

Derick Rethans 4:54

So that wouldn't impact that specific case then.
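
The strlen() behaviour mentioned above can be sketched as follows. It relies on the default (non-strict) typing mode; `Word` is a hypothetical class.

```php
<?php
// No declare(strict_types=1) here: the coercion below only happens in
// the default typing mode.
final class Word
{
    private $text;

    public function __construct($text)
    {
        $this->text = $text;
    }

    public function __toString()
    {
        return $this->text;
    }
}

// strlen() expects a string, so PHP calls __toString() on the object.
$length = strlen(new Word('internals'));
```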

Nicolas Grekas 4:57

About extensions, because that's the current state of the implementation, there was a discussion that we're going to talk about a bit later, I think. The current state of the art, say, is that the interface declares the method, and it declares the return type. It's colon string. So the declaration is "public function __toString(): string". The very first version didn't have the return type, because it's easier for backward compatibility, because current code doesn't need the return type. So by not adding it to the interface, we don't break backward compatibility, which is another critical design feature that I want to have. And so feedback came on the first pull request and said: okay, we need the return type. So the way I implemented that is that now, in the RFC actually, the return type is implicit. toString, if you declare it, whether you type ": string" or not, it's there. If you do some reflection later on an instance of something like that, then the reflection will tell you: Yes, there is a return type and it's string.

Derick Rethans 6:01

Whether you have defined it or not in your class. So that's a little bit of magic that gets added on.

Nicolas Grekas 6:07

So it doesn't break any semantics because the return type is already enforced: you cannot return anything else than a string right now.

Derick Rethans 6:14

Yeah, that's true. So that means that toString methods will automatically, as a return type hint, require a string to be returned.
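
A sketch of what reflection reports, assuming the implicit return type described here as it exists in PHP 8.0. `Token` is a hypothetical class.

```php
<?php
final class Token
{
    // Deliberately declared without ": string".
    public function __toString()
    {
        return 'token';
    }
}

$returnType = (new ReflectionMethod(Token::class, '__toString'))->getReturnType();

// With the implicit return type, reflection reports "string" even though
// the class did not spell it out.
$returnTypeName = $returnType === null ? null : $returnType->getName();
```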

Nicolas Grekas 6:21

Yes.

Derick Rethans 6:22

And that tweak was necessary to make sure that no backward compatibility was being broken.

Nicolas Grekas 6:27

Yes

Derick Rethans 6:28

Does that also extend to extensions that are not part of the PHP core distribution? Do they need to be changed as well?

Nicolas Grekas 6:35

So right now, in the current implementation, yes, they need to be changed. If they declare the toString method, they need to change the type, basically, to declare that they return a string explicitly in the C code. So that's the current state; it's pretty easy on the implementation side to ask that of the extension authors, right? I think it is doable, but Nikita today posted a proposal to improve it and go to the next level of the RFC. And the next level would be to have the same magic for the declaration of the interface itself. So it would mean that if you declare a toString method, then you implement the Stringable interface without having to explicitly declare it in the class.

Derick Rethans 7:22

I think that actually makes quite a bit of sense because that is pretty much how toString is used already. Anyway, the PHP engine enforces it has to be a string that's being returned.

Nicolas Grekas 7:31

Yeah, that's very interesting, and that would make the type much more useful as a type hint, because any pre-existing code would just work with the type, and pass the type, and get the return type, and so on. So that would be great. So the link with extensions is that maybe we should have the same automatic declaration, the implicit declaration, applied to extensions. So then extensions wouldn't have to do anything, and done. That would declare both the return type and the interface.
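
A sketch of the implicit interface, assuming the behaviour suggested here as it exists in PHP 8.0. `Money` is a hypothetical class that never mentions Stringable.

```php
<?php
final class Money
{
    public function __toString(): string
    {
        return '10 EUR';
    }
}

// The class only declares __toString(), yet it counts as Stringable.
$isStringable = (new Money()) instanceof Stringable;
$interfaces = class_implements(Money::class);
```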

Derick Rethans 8:03

That makes sense. You mentioned that Nikita just suggested something to tweak this RFC. I reckon this RFC is still open for discussion and voting hasn't started on it yet.

Nicolas Grekas 8:13

Yes.

Derick Rethans 8:13

Do you have any sort of idea for a timeframe where you think this will be finished?

Nicolas Grekas 8:17

The earliest is on February 6, because we know we need to wait two weeks. So I opened that so we go. I don't know how to write the last part of what we discussed, so Nikita's suggestion. So I'm asking him for some help. As soon as it's ready, I think it can be opened for voting. So it can be 10 days. It didn't trigger much discussion on internals, which, I don't know, maybe it's a good point. Or maybe it's like people will vote against without expressing why, I don't know. I hope it's a good thing.

Derick Rethans 8:50

Sometimes people just start paying attention when there's a new vote.

Nicolas Grekas 8:53

Yeah.

Derick Rethans 8:54

So there wasn't a lot of controversy about Stringable, as you just said, but there was some controversy about you actually applying for voting rights. Do you remember what happened there?

Nicolas Grekas 9:03

So yes, I applied for voting because of my involvement. I think I'm an active PHP contributor to internals, not on the C side, but okay. So since I wanted to open this RFC, I said: Okay, now it's time to do the bureaucratic steps to get a vote, right?

Derick Rethans 9:23

Yep.

Nicolas Grekas 9:24

And I think I'm the first person to actually go through some process for getting a vote in itself. I mean, I think most people, or maybe all people, that have a vote got the vote as a side effect of something else.

Derick Rethans 9:38

Yeah, usually by contributing patches, either to PHP itself, documentation or extensions.

Nicolas Grekas 9:43

So I think there has been some confusion, but it's been sorted out pretty quickly. I think I'm going to be able to vote on the next RFC. I'll report back if I can.

Derick Rethans 9:54

Okay, fair enough. Currently, we don't really have a process for this at all. I mean, you get to vote when you have a Git account, pretty much, or PHP commit access in some form. And I don't think we've ever really thought about handing that out to people that have been contributing a lot, right? So that's kind of an interesting thing to see. What we have seen in the past is people just saying: Yeah, I'd like to vote, or in other cases: Can I have a php.net email address? So that also happened, because that is a side effect of getting commit access.

Nicolas Grekas 10:23

Okay.

Derick Rethans 10:24

What happened when you did it is that it got immediately shut down. Probably a bit quicker than was nice, without any discussion. But I think in the future, we do need to come up with a plan and perhaps even think about how to approach voting for features, for RFCs, in the first place, because we don't really have a set guideline on who gets to do this and who doesn't get to do it and stuff like that.

Nicolas Grekas 10:49

Yeah, it's pretty interesting. Nikita, just after or during the discussion, posted some stats on the number of people who can vote, and I think the number is like 1900.

Derick Rethans 10:59

Yeah. There's quite a lot here.

Nicolas Grekas 11:01

It's a bit strange. And most people don't vote, I think, because they think they shouldn't. I don't know, something like that. But it's true. It's pretty strange. What I like about this situation is that it doesn't draw a strong line between people that contribute C code and people that write PHP code. And it's nice for PHP. I really think it's nice for PHP to have people that vote that don't do C code. But I think, of course, people that do C code must have the strongest voice, because at some point, the implementation decides.

Derick Rethans 11:35

Well, that is different, right? The votes are usually on the idea, not on the implementation. But sometimes the implementation is so complicated that it's nearly impossible to implement. Like, I've very briefly spoken with Nikita about generics. I'm sure we'll talk about that at some point. I'm pretty sure that generics is an idea that's simple, I mean, people will vote for it, but as an implementation it might not be that simple to do.

Nicolas Grekas 12:01

Yeah.

Derick Rethans 12:02

So what happens if you vote for the feature, but you can't come up with a good implementation?

Nicolas Grekas 12:06

So I'm on the side of thinking that people should vote on the implementations. I mean, people shouldn't be able to vote only on an idea. If there is an idea, it should be supported by an implementation that proves that we are talking about something real, not just a fancy idea that might not work in practice. So that's my opinion.

Derick Rethans 12:24

That's a good point. But as you said, of the 1900 people, or 1900 people plus, most of them are not familiar with PHP internals whatsoever, because they tend to be contributors to the documentation. This is also very valuable, but it doesn't mean that you necessarily know PHP internals.

Nicolas Grekas 12:40

Yeah, sure.

Derick Rethans 12:41

The other way can be true as well, right? You might know a lot about PHP's internals, but never really use PHP in real life, in your job, or anything like that.

Nicolas Grekas 12:48

So it's also good to be able to team up with someone that knows how to code the C part, the internal part. So you have the idea, you're the supporter part of the team, and then being able to convince someone to do the implementation, or to help you do it, is also proof of some kind of interest. So starting small and bringing more people on board and making it happen, as a thought.

Derick Rethans 13:12

Yeah, and we saw some of that happening last year. I can't quite remember what feature it was, or exactly what it was. But I agree with you. I think it is important that you can at least get somebody convinced to implement the feature before just voting on the idea. Thank you for taking the time with me this morning, Nicolas.

Nicolas Grekas 13:30

And thank you Derick for having me again.

Derick Rethans 13:32

If it continues like this, I'm sure we'll speak again at some point in the future.

Nicolas Grekas 13:35

Okay.

Derick Rethans 13:39

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 38: Preloading and WeakMaps

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about PHP 7.4 preloading mishaps, and his WeakMaps RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weeklish podcast dedicated to demystifying the development of the PHP language. This is Episode 38. I'm talking with Nikita Popov about a few things that have happened over the holidays. Nikita, How were your holidays?

Nikita Popov 0:34

My holidays were great.

Derick Rethans 0:36

I thought I'd start with something else than I did last year. In any case, I wanted to talk to you this morning about something that happened to PHP 7.4 over the holidays. And that is issues with preloading on Windows with PHP 7.4. I have no idea what the problem is here. Would you try to explain this to me?

Nikita Popov 0:56

So there were actually quite a few issues with preloading in early PHP 7.4 releases. The feature definitely did not get enough testing. Most of the issues have been fixed in 7.4.2. But if you're using opcache.preload_user, which you have to use if you're running as root, then you will probably still see crashes, and that's going to be fixed in the next release.

Derick Rethans 1:20

In 7.4.3.

Nikita Popov 1:22

Right. But to get back to Windows: Windows has a, well, very different process architecture than Linux. In particular, on Linux or BSD we have fork, which basically just takes a process and copies its entire memory state to create a new process. This is a lot cheaper than it sounds, because it all, like, reuses memory until it's actually changed.

Derick Rethans 1:48

It's copy-on-write.

Nikita Popov 1:49

Copy-on-write, exactly. The same functionality does not exist on Windows, or at least it's not publicly exposed. So on Windows, you can only create new processes from scratch, which don't reuse the memory of the previous one. And for OPcache, this is a problem, because OPcache would really like to reference internal classes as defined by PHP. But because we store things in shared memory, which is shared between multiple processes, we now have the problem that these internal classes can reside at different addresses in these different processes. On Linux, it's always going to be the same address, because we are forking and that keeps the address. On Windows, each process could have a different address. And especially because Windows, since I think Windows Vista, uses address space layout randomization, this is actually pretty much always going to be a different address.

Derick Rethans 2:51

Because that's a security feature?

Nikita Popov 2:52

Exactly. It's a security feature.

Derick Rethans 2:54

Would it also be a problem on Linux if you'd start a process instead of forking it?

Nikita Popov 2:59

Yes, it would be a problem. The difference is just that on Unix, we don't do that. OPcache has quite a different architecture on Windows. On Linux, we do not allow attaching to an existing OPcache from a separate process. So the only way to share an OPcache is to use fork. On Windows, because of this restriction that we don't have fork, we do allow this kind of attachment, and that's where we have to deal with these kinds of issues. So that's actually a general problem, not just for preloading. The difference is just that normally, we can just say: Hey, okay, we do not allow any references to internal classes from shared memory on Windows. It's like a slight hit to optimization, but it's not super important. While with preloading, we have to link the entire class graph during preloading. And if you have any classes that, for example, extend from an internal class, like extend from Exception, or in some cases just use an internal class as a type hint, then we would not be able to store these kinds of references in shared memory on Windows. And because for preloading it's pretty much inevitable that you run into this situation, you just can't realistically do preloading on Windows.

Derick Rethans 4:18

Hence the decision being made to just turn it off, instead of trying and always failing, pretty much.

Nikita Popov 4:24

Yeah, I mean, it kind of did work before, you just got a bunch of warnings that these classes haven't been preloaded. And if people try that with a simple example, they'll see: great, preloading is working. But once they move to their actual complex application that uses internal classes at various points, it turns out that: Actually, no, it doesn't really work in practice. And so we just disabled it entirely.

Derick Rethans 4:51

That seems like a reasonable solution to this. Do you think at some point this can be fixed in another clever way?

Nikita Popov 4:58

Well, the main way in which this can be fixed is to avoid this kind of multi-process attachment on Windows. The alternative to having multiple processes is to have multiple threads, which do share an address space. Basically the same as fork, just with threads then. But that, of course, depends on what kind of web server you're using and what kind of SAPI you're using. And I think nowadays, while on Windows threaded web servers are somewhat more popular than on Linux, it's still not the majority deployment strategy.

Derick Rethans 5:34

I think it used to be that threaded process models on Windows were a lot more common when PHP just came out for Windows, because it was an ISAPI module, which was always threaded. From what I remember, that was the original reason why we had ZTS in the first place. Yeah, at some point that started moving to PHP-FPM kinds of models, because it didn't use threading and it tended to be a lot safer to use it that way.

Nikita Popov 5:57

Right. I mean, threading has issues in particular because things like locales are per process, not per thread. So processes are usually safer to use

Derick Rethans 6:08

Anything else interesting that happened or went wrong with preloading, or that you do not want to mention?

Nikita Popov 6:12

The rest is mostly just that we have two different ways of doing preloading. One is using opcache_compile_file(), and the other is using require or include, and the difference between them is that opcache_compile_file() compiles the file but does not execute it. In that case, the way we perform preloading is that we first collect all classes and then we, like, gradually link them, actually register them, always making sure that all the dependencies have already been linked. And this is the mode that, I think, mostly worked well at the release of PHP 7.4. And the other one, the require approach, is where, well, require directly executes the code and registers the classes. And in that case, basically, if it turns out that some kind of dependency cannot be preloaded for some reason, we simply have to abort preloading, because we cannot recover from that. This abort was missing. And it turns out that, in the end, the way people actually use preloading is using the require approach, not using the opcache_compile_file() approach.
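
The two approaches can be sketched in one hypothetical helper for a preload script (the file named by the `opcache.preload` INI setting). `preloadFiles()` and its `$execute` flag are invented for this illustration; only `require_once` and `opcache_compile_file()` are real PHP.

```php
<?php
function preloadFiles(array $files, bool $execute): void
{
    foreach ($files as $file) {
        if ($execute) {
            // Runs the file: classes are registered as the code executes.
            require_once $file;
        } else {
            // Compiles the file without running it (needs OPcache enabled);
            // the collected classes are linked afterwards.
            opcache_compile_file($file);
        }
    }
}
```

A preload script would call this once with the list of files it wants to keep in shared memory.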

Derick Rethans 7:26

Although that's the one used in most of the examples that I've seen, and in the documentation.

Nikita Popov 7:30

Right, it has some advantages to use require.

Derick Rethans 7:34

Something else that happened over the holidays is that you've worked on several RFCs; there are too many to talk about all of them in this episode. But one of the earlier ones was the WeakMap, or WeakMaps, RFC, which sort of builds on top of the weak references that we already got in PHP 7.4. What's wrong with the weak references, and why do we now need weak maps?

Nikita Popov 7:58

There's nothing wrong with weak references. As a reminder of what weak references are about: they allow you to reference an object without preventing it from being garbage collected. So if the object is unset, then you're just left with a dangling reference. And if you try to access it, you get back null instead of the object. Now, probably the most common use case for any kind of weak data structure is a map or an associative array, where you have objects and want to associate some kind of data with them. Typical use cases are caches or other memoizing data structures. And the reason why it's important for this to be weak is that, well, if you want to cache some data with the object, and then nobody else is using that object, you don't really want to keep around that cached data, because no one is ever going to use it again. And it's just going to take up memory. And this is what the weak map does. So you use objects as keys, and use some kind of data as the value. And if the object is no longer used outside this map, then it is removed from the map as well.
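
A small sketch of both behaviours, assuming PHP 8.0 or later where `WeakMap` is available (`WeakReference` exists since PHP 7.4). The variable names and the cached value are illustrative.

```php
<?php
$user = new stdClass();

$reference = WeakReference::create($user);

$cache = new WeakMap();
$cache[$user] = ['expensive' => 'result'];

$countBefore = count($cache);
$aliveBefore = $reference->get() === $user;

// Drop the only strong reference to the object.
unset($user);

// The weak reference now yields null, and the map entry is gone too.
$valueAfter = $reference->get();
$countAfter = count($cache);
```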

Derick Rethans 9:16

So you mentioned objects as keys. Is that something new? Because I don't think currently PHP supports that.

Nikita Popov 9:22

I mean, you can't use objects as keys in normal arrays. That doesn't work. But, for example, the ArrayAccess interface and the Traversable interface don't really care what your types are. So you can use anything.
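
The contrast can be shown with an existing class; `SplObjectStorage` is used here only because it already implements ArrayAccess with object keys. On PHP 8 the plain array write throws a TypeError, while PHP 7 only raised a warning.

```php
<?php
$key = new stdClass();

// A normal array rejects an object as a key.
$plain = [];
try {
    $plain[$key] = 'data';
} catch (TypeError $error) {
    // Reached on PHP 8.
}

// A class implementing ArrayAccess is free to accept objects as offsets.
$storage = new SplObjectStorage();
$storage[$key] = 'data';
$fromStorage = $storage[$key];
```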

Derick Rethans 9:37

I glossed over that point, yes. But a weak map is something that then implements ArrayAccess.

Nikita Popov 9:44

That's right

Derick Rethans 9:45

What does the interface of a weak map look like? How would you interact with it?

Nikita Popov 9:49

Yeah, actually, it just implements all the magic interfaces in PHP. So ArrayAccess: you can access the weak map by key, where the key is an object. Traversable: that is, you can iterate over the weak map and get both the keys and values. And of course Countable, so you can count how many elements there are in there. And that's it.
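
The three interfaces mentioned here can be sketched as follows, again assuming PHP 8.0 or later. The variable names are illustrative only:

```php
<?php
$map = new WeakMap();

$a = new stdClass();
$b = new stdClass();

// ArrayAccess: write, isset and unset by object key.
$map[$a] = 'data for a';
$map[$b] = 'data for b';
$hasA = isset($map[$a]);
unset($map[$b]);

// Countable: only the entry for $a is left.
$size = count($map);

// Traversable: the keys you get back are the objects themselves.
$seen = [];
foreach ($map as $object => $value) {
    $seen[] = [$object === $a, $value];
}
```

Object keys in foreach are something plain arrays cannot offer; they work here because WeakMap is a Traversable object rather than an array.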

Derick Rethans 10:12

All the methods, there's plenty of 'em then. There should be nine or ten or so, right?

Nikita Popov 10:17

Five.

Derick Rethans 10:18

No there's the six of iterator.

Nikita Popov 10:20

Right, yeah, there is this little detail where, when internal classes implement Traversable, they don't actually have to implement the Iterator methods. That's why there are a few less.

Derick Rethans 10:33

Who's going to benefit from this new feature?

Nikita Popov 10:35

Like, one of the users for weak maps are things like ORMs, where, well, database records are represented as objects, and there is data storage related to these objects. And I think it's a well-known issue that if you're using ORMs you can sometimes run into memory usage issues. And the absence of weak structures is one of the reasons why that can happen: they just keep holding onto information even though the application actually doesn't use it anymore.

Derick Rethans 11:12

Did a specific ORM request this feature?

Nikita Popov 11:15

I don't think so.

Derick Rethans 11:16

Because weak maps are something done as an internal class in PHP, how are these things implemented? Is there something interesting? Because I remember talking to Joe about weak references last year: there is some functionality where it would automatically do something on the destructor, or rather on the destruction, of the objects. Is this something that also happens with weak maps?

Nikita Popov 11:37

So yeah, the mechanism by which weak references and weak maps work is basically the same. So there is a flag on each object that can be set to indicate that it has a weak reference or weak map. If the object is destroyed and has this nice flag, then we execute a callback that is going to remove the object from the weak reference or from the weak map, or from multiple maps.

Derick Rethans 12:05

Is that because there is some kind of registry that links to an object?

Nikita Popov 12:08

So we store all the weak references and weak maps that the object is part of, so we can efficiently remove it.

Derick Rethans 12:16

When I was reading the RFC, I saw something like spl_object_id() mentioned, which is basically a way to identify a specific object. Is this something related to weak references or weak maps? Or is this something else that is no longer used, or that people should pretty much no longer use? Because I guess this was previously a way to identify an object and then associate extra data with it, like you mentioned that ORMs would do for caches.

Nikita Popov 12:44

Right. So it's kind of related, but also not. So one is not a replacement for the other, they're just different use cases. We have had spl_object_hash() for a very long time. And I think somewhere around PHP 7.0, or maybe later, spl_object_id() was introduced, which is the same, just as an integer, and because of that it is more efficient. But in the end, what these functions do is return a unique identifier for an object. But this identifier is only unique as long as the object is alive. So these object IDs are reused when objects are destroyed.

Derick Rethans 13:30

And that makes them not usable for associating cache data with a specific object?

Nikita Popov 13:35

That makes them usable for associating cache data. But you also have to store the object to make sure it does not get destroyed in the meantime. So that's how you get around the restriction that you cannot use objects as array keys. That's what you need the ID for. But you still have to store a strong reference to the object to make sure it's not garbage collected and the ID doesn't start referencing some other object.
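
A rough sketch of the ID-based approach described here, assuming PHP 8.0 or later for the type declarations. The IdKeyedCache class and its method names are hypothetical and not taken from any library:

```php
<?php
// Hypothetical cache keyed by spl_object_id(); all names are illustrative only.
final class IdKeyedCache
{
    /** @var array<int, array{0: object, 1: mixed}> */
    private array $entries = [];

    public function set(object $key, mixed $value): void
    {
        // Store the object next to the value: this strong reference keeps it
        // alive, so its ID cannot be handed out to another object.
        $this->entries[spl_object_id($key)] = [$key, $value];
    }

    public function get(object $key): mixed
    {
        return $this->entries[spl_object_id($key)][1] ?? null;
    }

    public function size(): int
    {
        return count($this->entries);
    }
}

$cache = new IdKeyedCache();
$object = new stdClass();
$cache->set($object, 'cached value');
$valueWhileAlive = $cache->get($object);

// The cache still holds the object, so nothing is released here.
unset($object);
$sizeAfterUnset = $cache->size(); // still 1
```

The entry stays until it is removed by hand, which is the memory cost that a weak map avoids.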

Derick Rethans 14:04

When you say Strong Reference, that is what PHP references are traditionally?

Nikita Popov 14:08

That's the normal reference.

Derick Rethans 14:10

Well, because it's been quite some time since it got introduced, from what I understood this has been accepted?

Nikita Popov 14:16

It is accepted: 25, zero

Derick Rethans 14:18

25, zero. That doesn't happen very often.

Nikita Popov 14:22

Most RFCs are maybe not unanimous, but usually they are either 95% accepted, or they are rejected really hard. There is not a lot of middle ground.

Derick Rethans 14:34

That's pretty good, though. In any case, we will see this in PHP 8, I suppose, coming out later in the year.

Nikita Popov 14:39

That's right. Yes.

Derick Rethans 14:41

Well, thank you for taking the time today to talk to me about weak references and preloading especially on Windows. Thank you for taking the time.

Nikita Popov 14:50

Thanks for having me Derick

Derick Rethans 14:52

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.

