PHP Internals News: Episode 54: Magic Method Signatures

In this episode of "PHP Internals News" I chat with Gabriel Caruso (Twitter, GitHub, LinkedIn) about the "Ensure correct signatures of magic methods" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick, and this is PHP Internals News, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 54. Today I'm talking with Gabriel Caruso about his Ensure Correct Signatures of Magic Methods RFC. Hello Gabriel, would you please introduce yourself?

Gabriel Caruso 0:37

Hello Derick, and hello to everyone as well. My name is Gabriel. I'm from Brazil, but I'm currently in the Netherlands. I'm working at a company called Usabilla, which is basically a feedback company. Yeah, let's talk about this new RFC for PHP 8.

Derick Rethans 0:52

Yes, well, starting off with PHP 8: somebody told me that you also have some other roles to play with PHP 8.

Gabriel Caruso 0:59

Yeah, I think last week I received the news that I'm going to be the new release manager, together with Sara. We're basically going to take care of PHP 8, ensuring that we have new versions every month, stable versions every month, free of bugs, although we know that's not going to happen.

Derick Rethans 1:17

That's why there's a release cycle with alphas and betas.

Gabriel Caruso 1:20

Yeah.

Derick Rethans 1:21

I went through this exactly a year earlier, of course, because I'm doing the 7.4 releases.

Gabriel Caruso 1:25

Oh, nice. Yeah. So I'm gonna have a lot of questions for you.

Derick Rethans 1:29

Oh, that's, that's fine. It's also the role of the most recent release manager to kickstart the process of getting the PHP, in this case the PHP 8, release managers elected. Previously, only very few people wanted to do it; for the 7.4 releases it was Peter and me. But in your case, four people wanted to do it, which meant that for the first time I can remember we actually had to hold some form of election. That didn't go as planned, because we ended up with a tie twice, which was interesting. So we had to run a run-off election for the second spot between you and Ben Ramsey. That's going to keep you busy for the next three and a half years, likely.

Gabriel Caruso 2:11

Yep.

Derick Rethans 2:12

So good luck with that.

Gabriel Caruso 2:13

Thank you. Thank you very much.

Derick Rethans 2:15

In any case, let's get back to the RFC that we actually wanted to talk about today, which is the Ensure Correct Signatures of Magic Methods RFC. What are these magic methods?

Gabriel Caruso 2:24

So PHP, let's say out of the box, gives the user some magic methods that every single class can have. We can use those methods for anything, but basically, magic methods are just methods that PHP calls when a given action happens to the class. So for example, if an object is being constructed, then the __construct magic method is going to be called. If I'm calling the serialize function, then the __serialize magic method is called, as of PHP 7.4 or PHP 8, I don't remember. So this is basically what magic methods are: methods that PHP hooks into the classes, and once a certain action happens with the class, PHP calls those magic methods and something magic, so to speak, happens.
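To make that description concrete, here is a minimal PHP sketch (assuming PHP 7.4 or later; the Greeting class is a hypothetical example, not taken from the RFC) in which PHP itself invokes __construct, __toString, __serialize and __unserialize in response to ordinary actions on an object:

```php
<?php
// Hypothetical example class: user code never calls these methods directly,
// PHP calls them when the corresponding action happens to the object.
final class Greeting
{
    private string $name;

    // Called by PHP when `new Greeting(...)` runs.
    public function __construct(string $name)
    {
        $this->name = $name;
    }

    // Called by PHP when the object is used in a string context.
    public function __toString(): string
    {
        return "Hello, {$this->name}!";
    }

    // Called by serialize() (PHP 7.4+).
    public function __serialize(): array
    {
        return ['name' => $this->name];
    }

    // Called by unserialize() (PHP 7.4+); the constructor is not run.
    public function __unserialize(array $data): void
    {
        $this->name = $data['name'];
    }
}

$greeting = new Greeting('Gabriel');        // triggers __construct()
echo $greeting, PHP_EOL;                    // triggers __toString(): Hello, Gabriel!
$copy = unserialize(serialize($greeting));  // triggers __serialize(), then __unserialize()
echo $copy, PHP_EOL;                        // Hello, Gabriel!
```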

Derick Rethans 3:13

And others are, like, __get and __set.

Gabriel Caruso 3:17

We have, we have a lot.

Derick Rethans 3:19

Exactly. What do people tend to use these magic methods for?

Gabriel Caruso 3:22

So that's something interesting. As magic methods are called on a number of actions, we can use them, for example... let's take the example of an ORM, for example Doctrine or Eloquent or whichever one. Let's say I'm a maintainer of that library. I don't know what fields you have in your database. So when I'm doing the mapping, what I can do is map all the columns and values that I have in the database into a property. And then when you instantiate your entity and you try to access a property that does not exist, we go to a magic method, in this case __get, as I said, and I'm going to say: okay, this is not set in the class, but it is mapped in the entity that I have. So this is one case. We also have the case of testing: you have, for example, the famous PHPUnit test framework, where every time a test case is called, with all those methods starting with test, the __call magic method is invoked, and then you can perform whatever action you want. You also have middlewares, and the examples go even further.
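The ORM case can be sketched roughly as follows. This is a hypothetical, heavily simplified entity: the Entity class and its $columns array are illustrative names, not Doctrine's or Eloquent's actual API, and the sketch assumes PHP 8.0 for the mixed return type.

```php
<?php
// Hypothetical ORM-style entity: columns loaded from the database are kept
// in an array, and undeclared property reads are routed through __get().
class Entity
{
    /** @var array<string, mixed> column name => value, as loaded from the database */
    private array $columns;

    public function __construct(array $columns)
    {
        $this->columns = $columns;
    }

    // Called by PHP when reading a property that is not declared on the class.
    public function __get(string $name): mixed
    {
        if (!array_key_exists($name, $this->columns)) {
            throw new OutOfBoundsException("Unknown column: $name");
        }
        return $this->columns[$name];
    }

    // Called by PHP for isset()/empty() on an undeclared property.
    public function __isset(string $name): bool
    {
        return isset($this->columns[$name]);
    }
}

$user = new Entity(['id' => 42, 'name' => 'Gabriel']);
echo $user->name, PHP_EOL;       // resolved through __get()
var_dump(isset($user->email));   // bool(false), resolved through __isset()
```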

Derick Rethans 4:32

In the title of the RFC you have the word signature. What is a signature?

Gabriel Caruso 4:37

All the attributes that a method can have. So for example, the name of a method is part of its signature. What does it return? What parameters does it take? And also, what modifiers does it have: for example, is it static or not? Is it public, private, or protected? All this information together is usually one line in PHP. So for example: private static myMethod, which receives a string and returns a Boolean. There you go. This is the signature of my method.

Derick Rethans 5:06

Because some of these magic methods have been in PHP for a long, long time, back from the time when we didn't have argument types or return types, or perhaps not even static. Historically, PHP hasn't really done anything with their signatures, because these simply didn't exist. At the moment, which signature checks does PHP already do?

Gabriel Caruso 5:26

I don't remember by which RFC exactly, but I think it was introduced together with the scalar types RFC. Until PHP 7.4, only constructors and destructors, only those two magic methods, were being checked: they must have no return type, not even void, just no return type. But in PHP 8, we're going to have the new Stringable interface, and then every single __toString magic method, if it is typed, and this is very important, if it is typed, needs to return a string. So of the 17 magic methods that we have, only three are being checked in PHP 8.

Derick Rethans 6:01

In PHP 7.4?

Gabriel Caruso 6:02

Yeah, in PHP 7.4 only two, and then in PHP 8 we have the new __toString check.
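A small sketch of the checks that already existed before this RFC: constructors and destructors may not declare a return type, and, with PHP 8.0's Stringable change, a typed __toString() must return string, while any class declaring __toString() implements Stringable automatically. Money is a hypothetical class; the rejected variants appear only as comments, because they would stop the file from compiling.

```php
<?php
// Hypothetical value object; requires PHP 8.0 for Stringable.
class Money
{
    private int $cents;

    // Declaring `: void` (or any return type) here is a compile-time fatal error.
    public function __construct(int $cents)
    {
        $this->cents = $cents;
    }

    // When typed, only `: string` is accepted; `: int` would be rejected.
    public function __toString(): string
    {
        return sprintf('%d.%02d', intdiv($this->cents, 100), $this->cents % 100);
    }
}

$price = new Money(1999);
echo $price, PHP_EOL;                     // 19.99
var_dump($price instanceof Stringable);   // bool(true), implemented implicitly
```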

Derick Rethans 6:07

But this RFC suggests changing that, of course.

Gabriel Caruso 6:10

Yeah.

Derick Rethans 6:11

What's the reason why you want to extend these checks to the other magic methods?

Gabriel Caruso 6:14

That brings me back to how I figured this out. I was looking at some bugs, because we have https://bugs.php.net, where we centralize all the bugs of PHP. There is a bug report explaining, and complaining, exactly about that. Like: I can type my magic methods however I want. Back in the day I could say, for example, that my __toString method is going to return an integer or a Boolean. That makes no sense. And then I was like, yeah, that makes no sense, we need to fix that. And then I started to search: how do we type that? What types do we have? And then I was like: we can, in PHP 8, because this is going to be a new major version, so we are allowed to at least vote to do that. We can check if someone is using types, and we can check those types. We are not going to force, we are not going to require, we're not going to evaluate or even run static analysis. Nope, we're simply going to check. Okay, are you saying that this __get magic method is going to return anything? Okay, that's okay. Oh, but you want to narrow it so that it specifically returns a string? That's also okay. How do you pronounce that, the Liskov... principle, right?

Derick Rethans 6:36

The Liskov substitution principle.

Gabriel Caruso 7:26

Yeah. And so this is basically what we're going to do with this RFC, if it passes the vote. We're simply going to check if you're using the right types because, in my opinion, magic methods are a foundation of PHP. As we have these methods across different code bases, across different projects, with different behaviours, at least when I'm looking at that code I can say: okay, I'm looking at this magic method, I know what parameters it takes, I know what it returns. That's one less step when debugging, when trying to understand what is happening. Because today maybe I'm debugging a __toString method that returns an integer, and I'm like, okay, this is the bug, it's supposed to return a string. But once we ensure all those signatures, that's one less bug that we're going to have in production.
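The rule as it eventually shipped in PHP 8.0 can be sketched like this: types on magic methods stay optional, but when present they must be compatible with the documented signature. Narrowing the return type of __get() from mixed to string is allowed, as the Liskov substitution principle permits, while changing the parameter type is not. Config is a hypothetical class, and the rejected variant is shown only as a comment.

```php
<?php
// Hypothetical class; requires PHP 8.0.
class Config
{
    /** @var array<string, string> */
    private array $values = ['env' => 'prod'];

    // Documented signature: __get(string $name): mixed
    // Narrowing the return type to string is accepted.
    public function __get(string $name): string
    {
        return $this->values[$name] ?? '';
    }

    // Untyped magic methods remain valid: nothing is forced.
    public function __isset($name)
    {
        return isset($this->values[$name]);
    }

    // Rejected with a compile-time fatal error, for example:
    // public function __get(int $name): string { ... }
}

$config = new Config();
echo $config->env, PHP_EOL;   // prod
```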

Derick Rethans 8:17

When are these signatures being ensured?

Gabriel Caruso 8:19

It's not at compile time in the sense of a compiled language, because PHP does not have a separate compile step. But it's when the Zend Engine is compiling the code: we have a very specific function that checks all the modifiers. So, for example, the signature that we mentioned before: all the magic methods need to be public. That is being checked. And, for example, the __callStatic magic method needs to be static; that is also being checked. And then I'm extending those checks to parameter types and also to return types. So it happens during compilation, in the Zend VM.

Derick Rethans 8:52

Taking __callStatic in the RFC as an example, I see that the name has to be a string and the arguments have to be an array. What happens if you use a different type there?

Gabriel Caruso 9:01

So nowadays, if you use a different type, that's allowed. If you say that you're going to receive an integer and a string, that is allowed today. And this is what I mentioned about debugging or analysing different code bases: you're going to be like, why does the documentation say that we need to receive a string and an array, while this specific code base receives a string and an integer? That's the kind of mismatch I want to avoid. Of course, only when using types, because we also know that some PHP projects do not use types, and that's perfectly fine. If you're not using types, I'm not going to tell you, hey, you need to type those magic methods. What I'm going to do is: okay, you're using types, so I need to make sure you're using them right, otherwise this is going to be a mess.
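Here is a sketch of a __callStatic() declared with the signature from the RFC (public static, string $name, array $arguments). Str is a hypothetical helper class, and the sketch assumes PHP 8.0 for mixed, match and throw expressions.

```php
<?php
// Hypothetical static facade that forwards unknown static calls.
final class Str
{
    // Declaring `int $arguments` instead of `array $arguments` is the kind of
    // mismatch that becomes a compile-time fatal error under this RFC.
    public static function __callStatic(string $name, array $arguments): mixed
    {
        // Forward e.g. Str::upper('x') to strtoupper('x').
        return match ($name) {
            'upper' => strtoupper(...$arguments),
            'lower' => strtolower(...$arguments),
            default => throw new BadMethodCallException("Str::$name() does not exist"),
        };
    }
}

echo Str::upper('php'), PHP_EOL;   // PHP
```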

Derick Rethans 9:47

If you type it, say you use an integer for the name argument of __get, will that give you a warning, a compile error, or a parse error? What kind of feedback would you get back from that?

Gabriel Caruso 9:59

While you are running your code, as soon as that class gets referenced, we're going to check. It's not when it is instantiated, it's not when it is called: as soon as, I think, the autoloader detects that class, it's going to be parsed and then compiled, and during the compile step that we mentioned, we're going to identify that. So it's going to be early in the stages. As soon as you run something, you're going to get that feedback saying: hey, this is not compatible with what we are expecting.

Derick Rethans 10:32

Is that a warning or a type error?

Gabriel Caruso 10:34

It's going to be a fatal error, because that is what we currently do with destructors and constructors.

Derick Rethans 10:41

Yeah, we alluded to mixed already a little bit, and the RFC mentions mixed a few times. Of course, mixed isn't a type in PHP yet. So what do you want to do about that?

Gabriel Caruso 10:51

Today is the 11th of May, 2020. Right now there is an RFC in voting to introduce the mixed type. I'm not going to say if I agree or disagree; it's being voted on. If that RFC gets accepted, and I have already talked with the authors of that RFC, I'm going to wait until they merge it into master. Then I'm going to rebase and re-adapt my RFC to use the mixed type. And there we go: PHP 8 will probably have mixed, and probably also the usage of mixed in the magic methods. So I'm going to need to wait for the end of their RFC. If it's approved, there we go, I need to rebase my PR. Otherwise, we are going to keep it as comments, because we can't enforce that at compile time in the VM.

Derick Rethans 11:41

At the moment, it looks like that vote will end on May 21. The current votes are 35 to six in favour, so it looks like it will go through.

Gabriel Caruso 11:50

And then I need to rush, because we have the upcoming feature freeze of PHP 8. So I need to make sure that I start the vote and implement my RFC before that time.

Derick Rethans 12:00

Feature freeze should be by the end of July, so I think you have plenty of time for that. And of course, as a release manager, you can make an exception. That's how that works. Usually, adding extra checks will have an impact on existing code. Is there much impact on existing code here as well?

Gabriel Caruso 12:18

That was the interesting question that I asked myself. Okay, I'm going to touch the magic methods of PHP. I'm going to break some code, and I need to identify those breaking changes and map them in the RFC. How do I map that across many projects, many libraries, all the PHP code out there? How do I do that? I remembered that Nikita, back in his RFC about nested ternaries without parentheses, like how do we handle this ordering and yada yada yada, made a script where he went through, I think, the top 1,000 or top 10,000 packages on Packagist, the official Composer package repository, and he identified everything. I asked myself how he did that, and actually it was very easy. He just cloned all the repositories and instantiated his PHP-Parser, that magic parser which is behind PHPStan, behind Psalm, behind Infection, behind a lot of big projects where you analyse code. So you have a code base that you can analyse and ask: do I have magic methods typed wrongly? Then I ran this script and identified, I think, six or seven types that were not perfect. Three of them I have already fixed with a pull request, because they were in PHPUnit, and I said to Sebastian: hey, this actually is not right, because I'm proposing this RFC. He was like: okay, perfect, let's merge it. And the other cases are the cases that I mentioned, for example with __get. __get needs to return mixed, but by the LSP you can narrow it down to an integer or a string. So there you go: at least in the top 10,000 Composer packages, it's not going to be a breaking change. But of course, it's going to be a breaking change for people whose code I can't map. So this is why the RFC mentions that if you're using types with magic methods wrongly, we're going to warn you.

Derick Rethans 14:13

But at least it's an easy thing to check for, because even running all your files through php -l should catch it.

Gabriel Caruso 14:20

Yeah, there you go.

Derick Rethans 14:22

So it's a very easy thing to check for. You provided a link to Nikita's script where he checks for those ternaries. Do you have a version of your own script available as well?

Gabriel Caruso 14:33

That's interesting. I thought the RFC was updated. So I'm going to update the RFC, because I do have the script locally.

Derick Rethans 14:39

Then I can link to it for the podcast as well.

Gabriel Caruso 14:41

Okay, perfect.

Derick Rethans 14:42

In the future, are you thinking of extending checks to a few more things?

Gabriel Caruso 14:46

So this is something that I thought about with this RFC: how much do I want to break and explode people's code? And I think starting with checking types in the signature is the first step. The next step is to actually check the returned value. We do that with __toString. So for example, although you have typed it right, maybe some logic is wrong and you're returning an integer. There is a check, even without the type, saying: you're supposed to return a string, but you're returning an integer. There is a check specifically for this magic method saying it was supposed to return a string. I think doing that everywhere is going to break even more code, because then it's something that I can't measure. So I was like: okay, let's first start with types, and then we can take the next step, which is: okay, inside this method, what is being returned? Is it something different from the signature? Explode: you're returning something that you were not supposed to return. But this is not a fight that I'm going to pick, so I leave it up to the next major version of PHP or whatever.

Derick Rethans 15:49

Wouldn't PHP's strict versus weak type mechanism already catch these things? So for __debugInfo, if you typed it as returning an array, and you then end up returning an object, which is not necessarily wrong, just not what you expected, PHP's return type checking mechanism should already catch that for you.

Gabriel Caruso 16:13

If your magic method is typed, yes; if it's not typed, no. So we can say that some of them already have that check, and later we're going to expand it to the cases where we don't have types in the signature.

Derick Rethans 16:24

That's clear now. Do you have anything else to add?

Gabriel Caruso 16:27

The only thing that I want to add is that I have created another RFC, and this is something that I always tell everyone: it is easy to do; it's not impossible. Anyone can go there, identify a bug or pick up a bug report, and then try to fix it. And this is what I'm doing. Although I'm a release manager of PHP 8, I'm also fixing bugs, improving documentation, and everything else. This is something that I try to do and share with everyone, so everyone can also be the next contributor to PHP and its evolution.

Derick Rethans 16:57

This RFC isn't out for voting yet. You said you want to sort of wait until mixed gets passed or not. What's the reception been so far?

Gabriel Caruso 17:05

So I asked a couple of key members of the PHP community, both internal and external people. They agreed; they said that the right approach is to first check the signature, because if someone is already using types, that project is type-friendly, so we can at least play with that. But if someone is not typing, then this is a bigger fight, and we're going to talk about that in the future.

Derick Rethans 17:29

Thank you, Gabriel, for taking the time this morning to talk to me. I've learned a few more things about this RFC, so that's always good to know. And again, congratulations on being the PHP 8 release manager together with Sara.

Gabriel Caruso 17:41

Thank you very much. Also, thank you for inviting me to this podcast; it's amazing. I always listen to all these famous people of PHP that talked with you, and I'm like: whoa, Derick has invited me, this is going to be so much fun. Thank you very much.

Derick Rethans 17:55

Thanks for listening to this installment of PHP Internals News, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 53: Constructor Property Promotion

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the Constructor Property Promotion RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP Internals News, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 53. Today I'm talking with Nikita Popov about a few RFCs that he's made in the last few weeks. Let's start with the constructor property promotion RFC. Hello Nikita, would you please introduce yourself?

Nikita Popov 0:36

Hi, Derick. I am Nikita, and I am doing PHP internals work at JetBrains. The constructor promotion, constructor property promotion RFC is the result of some discussion about how we can improve object ergonomics in PHP.

Derick Rethans 0:56

Object ergonomics. It's something that I spoke with Larry Garfield about two episodes ago, where we discussed Larry's proposal, or overview, of what can be improved with object ergonomics in PHP. And I think we mentioned that you had just landed this RFC that we're now talking about. What part of the object ergonomics proposal is this RFC trying to solve?

Nikita Popov 1:20

I mean, the basic problem we have right now is that it's a bit more inconvenient than it really should be to use simple value objects in PHP. And there are two sides to that problem: one is on the side of writing the class declaration, and the other is on the side of instantiating the object. This RFC tries to make the class declaration simpler, shorter, and less redundant.

Derick Rethans 1:50

At the moment, what would a typical class constructor look like?

Nikita Popov 1:55

Right now, if we take the simple example from the RFC, we have a class Point, which has three properties, x, y, and z, and each of those has a float type. And that's really all the class is; ideally, this is all we would have to write. But of course, to make this object actually usable, we also have to provide a constructor. And the constructor is going to repeat that: yes, we want to accept three floating point numbers, x, y, and z, as parameters. And then in the body, we have to repeat again that each of those parameters needs to be assigned to a property. So we have to write $this->x = $x, $this->y = $y, $this->z = $z. I think for the Point class this is still not a particularly large burden, because we only have three properties. The names are nice and short, and the types are really short, so we don't have to write a lot of code. But if you have larger classes with more properties, with more constructor arguments, with longer and more descriptive names, and also longer and more descriptive type names, then this makes up for quite a bit of boilerplate code.
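For reference, the long-hand class Nikita describes looks roughly like this. It is a sketch modelled on the RFC's Point example, without its default values, and assumes PHP 7.4 or later for typed properties.

```php
<?php
// Each property name appears four times (declaration, parameter, and twice
// in the assignment), and each type appears twice.
class Point
{
    public float $x;
    public float $y;
    public float $z;

    public function __construct(float $x, float $y, float $z)
    {
        $this->x = $x;
        $this->y = $y;
        $this->z = $z;
    }
}

$p = new Point(1.0, 2.0, 3.0);
```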

Derick Rethans 3:16

Because you pretty much have the property names in there three times.

Nikita Popov 3:20

Four times, even. One in the property declaration, one in the parameter, and then the assignment has to repeat it twice.

Derick Rethans 3:30

You're repeating the property names four times, and the types twice.

Nikita Popov 3:34

Right.

Derick Rethans 3:36

What is the syntax that you're proposing to improve this?

Nikita Popov 3:39

The syntax is to merge the constructor and the property declarations. So you only declare the constructor, and you add an extra visibility keyword in front of the normal parameter name. So instead of accepting float $x in the constructor, you accept public float $x. And what this shorthand syntax does is to also generate the corresponding property, so you're declaring a property public float $x, and to also implicitly perform the assignment in the constructor body, so $this->x = $x. And this is really all it does. It's just syntax sugar, a simple syntactic transformation. But it reduces the amount of boilerplate code you have to write, for value objects in particular, because for those you commonly don't need much more than your properties and the constructor.
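The same Point class with the proposed syntax, as it was eventually released in PHP 8.0: the visibility keyword on each constructor parameter both declares the property and performs the assignment. A minimal sketch, assuming PHP 8.0 or later.

```php
<?php
// Each promoted parameter is desugared into a property declaration plus
// a `$this->x = $x;` style assignment at the start of the constructor.
class Point
{
    public function __construct(public float $x, public float $y, public float $z)
    {
    }
}

$p = new Point(1.5, 2.5, 3.5);
echo $p->z, PHP_EOL;   // 3.5
```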

Derick Rethans 4:40

Besides public, I suppose you can also use protected and private there as well.

Nikita Popov 4:45

That's right. So you can use all the visibility modifiers: public, protected, private. Static does not really make sense. But if we add other modifiers in the future, then those could be used there as well. For example, if we add support for read-only properties, then of course you could also write public readonly float $x or something.

Derick Rethans 5:09

The RFC talks about desugaring. How's this implemented? Is this a transformation in the AST, or is it done in another way?

Nikita Popov 5:17

This is not an AST transformation, but I would say close enough: we just generate the corresponding property declarations and assignments in the compiler. If you inspect the AST with an extension like php-ast, you will see the code as written, so with the public in front of the parameter name. But if you inspect the code with reflection, then it will look as if you had declared the property explicitly.
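A short sketch of what reflection reports after desugaring, assuming PHP 8.0, which also added isPromoted() to both ReflectionProperty and ReflectionParameter so the origin of the property can still be told apart:

```php
<?php
class Point
{
    public function __construct(public float $x, public float $y, public float $z)
    {
    }
}

// The generated property looks like an ordinary declared property...
$property  = new ReflectionProperty(Point::class, 'x');
// ...and the constructor parameter is still an ordinary parameter.
$parameter = (new ReflectionMethod(Point::class, '__construct'))->getParameters()[0];

var_dump($property->isPublic());            // bool(true)
var_dump($property->getType()->getName());  // string(5) "float"
var_dump($property->isPromoted());          // bool(true)
var_dump($parameter->isPromoted());         // bool(true)
```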

Derick Rethans 5:48

So the RFC talks about a few constraints on what you can and cannot do with those promoted properties. One of the things it talks about is nullability.

Nikita Popov 5:58

Well, we have two different nullability semantics in PHP, for historical reasons. One is for parameters: if you use a type that is not explicitly nullable but you have a null default value, then we make the type implicitly nullable. For property types, which are newer, we no longer have this implicit behaviour, so if you want a nullable property, you do need to explicitly mark it as nullable. Just using a null default value on its own will result in an error. And the handling is the same here: if you want a nullable promoted property, you have to mark it as nullable.

Derick Rethans 6:43

And you cannot just rely on setting the default to null?

Nikita Popov 6:46

Exactly. But I think it's a detail, and really this could go either way. I just prefer the explicit nullability, because this seems like the direction we are going in for the future. I don't know if we will ever remove this implicit behaviour, maybe not, but I think nowadays the explicit one is preferred.
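A sketch of the nullability rule for promoted properties, assuming PHP 8.0: the question mark has to be written out, and a null default on its own does not make the type nullable. Label is a hypothetical class, and the rejected form is shown only as a comment.

```php
<?php
class Label
{
    // Rejected at compile time if written instead of the ?string version below,
    // because a null default does not make a promoted property nullable:
    //     public string $tooltip = null
    public function __construct(
        public string $text,
        public ?string $tooltip = null
    ) {
    }
}

$label = new Label('Save');
var_dump($label->tooltip);   // NULL
```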

Derick Rethans 7:10

Less magic is better.

Nikita Popov 7:11

Less magic, exactly.

Derick Rethans 7:13

The RFC also lists some constraints. You can also define a constructor in traits and abstract classes. Can you use constructor property promotion there as well?

Nikita Popov 7:23

In traits? Yes. I mean, in traits, using it will be a little bit weird, but there is no reason why it can't work. After all, traits can have a constructor that will be used in the using class, and traits can also have properties that get imported, so the same mechanism works there as well. It does not work for abstract constructors or constructors in interfaces. The syntax implies that you have some assignments inside the body of the constructor, and if we have an abstract constructor, then we could not emit these assignments anywhere. We could support it as a special case, like saying that it only declares the properties but skips those assignments. But I don't know how often you've used abstract constructors; I have probably used them maybe once or twice in all my time working with PHP. So I don't see a real need for extra support in that area.
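A sketch of the trait case, assuming PHP 8.0: the trait's constructor uses promotion, and the class using the trait receives both the constructor and the generated property. HasId and Order are hypothetical names; the disallowed abstract form is shown only as a comment.

```php
<?php
trait HasId
{
    // Promotion inside a trait constructor is allowed.
    public function __construct(protected int $id)
    {
    }

    public function id(): int
    {
        return $this->id;
    }
}

class Order
{
    use HasId;   // imports the constructor and the $id property
}

$order = new Order(7);
echo $order->id(), PHP_EOL;   // 7

// Not allowed: an abstract (or interface) constructor has no body in which
// the assignments could be emitted, e.g.
//     abstract public function __construct(public int $id);
```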

Derick Rethans 8:25

It would also then introduce an inconsistency, where promoted properties in abstract class constructors, if that's a thing, would behave differently from normal constructor property promotion. How does inheritance work? Does it work in the same way, or is there some specific difference?

Nikita Popov 8:44

Based on the discussion feedback, I think inheritance is the largest point of confusion with this syntax. The thing is that it does not really have any special interaction with inheritance. You can just follow the syntactic transformation it does, which does not have any impact on inheritance. But if you just look at the code, and you see the parent class defining a constructor and the child class defining a constructor, then you may wonder: is there some kind of connection between the promoted parameters declared in one constructor and in the other one? And the answer is simply: no, there isn't. Those have nothing to do with each other. And even more generally, constructors are a bit of a special case where inheritance is concerned. Usually, we say that methods always have to be compatible with the parent method: the signature has to be compatible, the return type has to, well, not match, but be covariant, and the argument types have to be contravariant. But this rule does not apply to the constructor. The constructor really belongs to a single class, and the constructors of a parent and a child class do not have to be compatible in any way.
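A sketch of inheritance with promotion, assuming PHP 8.0: each constructor promotes only its own parameters, the child still has to call parent::__construct() explicitly, and because constructors are exempt from signature compatibility, the two parameter lists can differ freely. Vehicle and Car are hypothetical classes.

```php
<?php
class Vehicle
{
    public function __construct(public string $make)
    {
    }
}

class Car extends Vehicle
{
    // $make is a plain parameter here; only $doors is promoted in Car.
    public function __construct(string $make, public int $doors)
    {
        parent::__construct($make);   // not implied by promotion
    }
}

$car = new Car('Volvo', 4);
echo "{$car->make}, {$car->doors} doors", PHP_EOL;   // Volvo, 4 doors
```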

Derick Rethans 10:09

Are there any types that you can't use for constructor property promotion?

Nikita Popov 10:14

Just callable, because callable is not a valid property type. Well, there is one more thing: you can't use a variadic argument. If you write a variadic argument, you write something like int ...$whatever. But the type you're actually writing is int, because that's the type of each individual argument, while all of them get collected into an array. So the type of the corresponding property would have to be array, and we would have to do an extra transformation that's maybe not super obvious. So I've left this part out.
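A sketch of the two exclusions and the usual workaround, assuming PHP 8.0: a variadic parameter (or a callable one) cannot be promoted, so the property is declared as array and assigned by hand. IntBag is a hypothetical class.

```php
<?php
class IntBag
{
    /** @var int[] */
    public array $values;

    // Not allowed:
    //     public function __construct(public int ...$values) {}
    //     public function __construct(public callable $callback) {}

    public function __construct(int ...$values)
    {
        $this->values = $values;   // the variadic arrives as an array of ints
    }
}

$bag = new IntBag(1, 2, 3);
echo array_sum($bag->values), PHP_EOL;   // 6
```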

Derick Rethans 10:50

And also, PHP's type system doesn't support declaring an array of integers; it only supports describing an array. At the time we're talking, at the end of April, this hasn't gone up for a vote yet. When do you think this will happen?

Nikita Popov 11:05

The RFC will need one small adjustment, because the attributes RFC is currently in voting, and it very much looks like it's going to be accepted. We will need to also consider support for attributes on promoted properties. I think the only small question there is: what do the attributes apply to? Because they could apply to the parameter, or to the property, or both.

Derick Rethans 11:34

How would you actually set these attributes? Because, from what I understand, like docblocks, you can only use them in front of a method or a property declaration. How would you define a different attribute for each of the promoted properties?

Nikita Popov 11:48

I believe that the attributes RFC already supports attributes on parameters, so that shouldn't be a problem.
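A sketch of how this ended up working, using the #[...] attribute syntax that PHP 8.0 eventually shipped (a different syntax was under discussion at the time of this recording). In PHP 8.0, an attribute placed on a promoted parameter is applied to both the parameter and the generated property. Column and User are hypothetical classes.

```php
<?php
#[Attribute]
final class Column
{
    public function __construct(public string $name)
    {
    }
}

class User
{
    public function __construct(
        #[Column('user_name')] public string $name
    ) {
    }
}

$property  = new ReflectionProperty(User::class, 'name');
$parameter = (new ReflectionMethod(User::class, '__construct'))->getParameters()[0];

// The attribute is visible on the property...
echo $property->getAttributes()[0]->newInstance()->name, PHP_EOL;   // user_name
// ...and on the parameter.
echo count($parameter->getAttributes()), PHP_EOL;                  // 1
```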

Derick Rethans 11:55

So it allows for setting a specific attribute for each of the arguments coming into the constructor. But that didn't quite answer the question: when do you think we'll be voting on this?

Nikita Popov 12:05

Maybe in a week or so.

Derick Rethans 12:06

By the time this podcast comes out?

Nikita Popov 12:09

Well, we have had a lot of activity recently in PHP internals. So I guess we are one of the few places that benefit from the Coronavirus, because people now have time to work on PHP.

Derick Rethans 12:24

Yeah, I mean, I'm looking at so much extra code now. Interestingly, going back to the RFC, as a side note, it mentions somewhere that when defining more properties, the lines get too long, because you now have this extra keyword in there, and that could benefit from splitting the constructor arguments over multiple lines. And that raises the point that you can use a trailing comma in arrays and when you call functions, but not in parameter lists. And I saw that you've also made another RFC for allowing trailing commas in parameter lists.

Nikita Popov 12:58

So that's a super simple RFC: just allow that extra comma. This has actually already been discussed a couple of times in the past, and was declined at that point.

Derick Rethans 13:13

I'm just having a quick look, because this RFC is already in voting, to see what the current votes are: it's 58 for and one against.

Nikita Popov 13:21

I think the main counter argument people have against this kind of trailing comma stuff is: well, doesn't that encourage writing methods with a lot of parameters, which is bad style? I don't think it does. Even if you don't have a lot of parameters, it's fairly easy to run into line length limitations, because nowadays we like to use long, expressive parameter names and long, expressive type names. So even without adding an extra protected in front of all of that, you can really easily get signatures that split across multiple lines. In that case having the trailing comma is nice, mainly because we already write it everywhere else.
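For reference, the trailing comma in parameter lists was accepted for PHP 8.0, and trailing commas in function calls had already been allowed since PHP 7.3. A minimal sketch, assuming PHP 8.0 or later; createInvoice() is a hypothetical function with deliberately long, expressive parameter names.

```php
<?php
// PHP 8.0+: a trailing comma is allowed after the last parameter of a declaration.
function createInvoice(
    string $customerReference,
    int $amountInCents,
    string $currencyCode = 'EUR',
): string {
    return sprintf('%s: %d %s', $customerReference, $amountInCents, $currencyCode);
}

// The call site may end with a trailing comma too (allowed since PHP 7.3).
echo createInvoice(
    'ACME-0042',
    1999,
), "\n"; // ACME-0042: 1999 EUR
```

Adding or reordering a parameter in this layout touches only the one line for that parameter, which is the diff-friendliness argument made for trailing commas elsewhere.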

Derick Rethans 14:12

Except in the parameter lists of methods, because there you can't.

Nikita Popov 14:17

Well, there are also a couple of other places where you can't. For example, if you have a class that implements many interfaces, then you can't put a trailing comma after the last interface. This is something we could also allow, but I think the relevant distinction there is that this is kind of a freestanding list. It's not wrapped inside brackets or parentheses, so it looks a little bit weird if you have a trailing comma there, which is possibly also why the previous RFC that simply allowed trailing commas everywhere did not pass.

Derick Rethans 14:58

As I said, it looks likely that this will pass.

Nikita Popov 15:01

Yes, I think it's unlikely that we're going to get 13 new no votes.

Derick Rethans 15:07

What I also find interesting is that the attributes RFC, which you mentioned earlier in the episode, is going to pass as well. At the moment, there's only one no vote there as well, which surprised me, because the last time attributes were discussed, it was very much not going to pass whatsoever.

Nikita Popov 15:27

Yeah, this is an interesting effect. It's hard to say why it happens. Part of the reason is probably just that issues that were raised on previous proposals have been addressed. For example, the last one by Dmitry had the very controversial aspect that it exposed the AST, the abstract syntax tree representation of the attributes. That is gone from this one, which removes one of the contentious issues. But I think another part is just that sometimes it takes multiple proposals to really get an idea through internals. We pretty commonly have the situation that the first RFC fails, the second RFC fails, and then the third one does pass.

Derick Rethans 16:18

It's also taken five years or so, and people's opinions might just change about these things.

Nikita Popov 16:23

Exactly. The previous proposals might just have been before their time.

Derick Rethans 16:29

I saw you had made one other tiny RFC, which is the stricter type checks for arithmetic/bitwise operators. What is that about?

Nikita Popov 16:40

Very simple. So if you write, well, x minus y, and x is an array and y is a resource, what do you expect the outcome to be? There is really no reasonable way that can work. So this RFC proposes to make the arithmetic and bitwise operators simply throw an exception when working on arrays, objects, and resources. The motivation for that was the operator overloading RFC, which has in the meantime been declined. Still, a concern raised there was that while you can overload operators for objects, you still get pretty weird behaviour if an overloaded operator is missing, because we currently handle that with just a notice and by assuming that the object is equal to one, which is usually not a useful or desired behaviour.
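This RFC was accepted for PHP 8.0, where the error thrown is specifically a TypeError (an Error rather than an Exception). The sketch below assumes PHP 8.0 or later; describe() is a hypothetical helper that runs an operation and reports either its result or the TypeError message.

```php
<?php
// Runs $op and returns its exported result, or the TypeError it throws.
function describe(callable $op): string
{
    try {
        return var_export($op(), true);
    } catch (TypeError $e) {
        return 'TypeError: ' . $e->getMessage();
    }
}

echo describe(fn () => [1, 2] - 1), "\n";
// TypeError: Unsupported operand types: array - int

// Objects without operator overloading also throw instead of acting like 1.
echo describe(fn () => new stdClass() % 3), "\n";

// Scalars are unaffected, including booleans: true / 17 still works.
$stillFine = true / 17;
```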

Derick Rethans 17:39

There is of course, one exception where you can still use an arithmetic operator, which is the plus between arrays.

Nikita Popov 17:46

That's right, yeah. So array plus array is similar to an array merge operation. That one is, of course, well defined and remains supported.
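Strictly speaking, array plus array is a key-based union rather than a merge: it keeps every key from the left operand and only adds keys from the right operand that are missing on the left. A small sketch with made-up config values shows where that differs from array_merge().

```php
<?php
$defaults  = ['host' => 'localhost', 'port' => 3306];
$overrides = ['port' => 5432];

// Keys already present on the left win; missing ones are taken from the right.
$config = $overrides + $defaults;
// ['port' => 5432, 'host' => 'localhost']

// With integer keys the difference from array_merge() becomes visible:
$union  = [1, 2] + [3, 4, 5];             // [1, 2, 5]: keys 0 and 1 already exist
$merged = array_merge([1, 2], [3, 4, 5]); // [1, 2, 3, 4, 5]: values appended, renumbered
```

This is why the union form is handy for applying defaults, while array_merge() is the one to reach for when concatenating lists.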

Derick Rethans 17:55

Whereas things like true divided by 17, although not sensible, will continue to work.

Nikita Popov 18:00

Right, that also. Yeah, because whether implicitly treating true as one is a good idea or not is simply a much more contentious issue. Personally, I know I have written code where I, for example, add up booleans, just as a count of how often something is true. Maybe, style-wise, it would be better to write an explicit integer cast, but the code is also not really wrong. That may be a discussion for another time.
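The "adding up booleans" pattern mentioned here looks roughly like this; the temperature data is invented for illustration, and the second loop shows the explicit integer cast alternative.

```php
<?php
$temperatures = [12, 25, 31, 8, 27];

// Implicit: each comparison yields a bool, and += treats true as 1.
$hotDays = 0;
foreach ($temperatures as $t) {
    $hotDays += $t > 20;
}

// Explicit: the same count with an integer cast, arguably clearer to read.
$hotDaysExplicit = 0;
foreach ($temperatures as $t) {
    $hotDaysExplicit += (int) ($t > 20);
}
```

Both loops produce the same integer, which is why the stricter operator checks left bool arithmetic alone.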

Derick Rethans 18:33

As we've said before, the smaller the RFCs, the easier it is to get them passed as well. Alright, Nikita, thanks for taking the time this morning to talk to me about the constructor property promotion RFC, and a few others. We'll see whether they get passed for PHP eight.

Nikita Popov 18:48

Thanks for having me Derick, once again.

Derick Rethans 18:52

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 52: Floats and Locales

In this episode of "PHP Internals News" I talk with George Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed together with Máté Kocsis (Twitter, GitHub, LinkedIn) to make PHP's float to string logic no longer use locales.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 52. Today I'm talking with George Banyard about an RFC that he's made together with Máté Kocsis. This RFC is titled locale-independent float to string cast. Hello, George, would you please introduce yourself?

George Banyard 0:39

Hello, I'm George Peter Banyard. I'm a student at Imperial College and I work on PHP in my free time.

Derick Rethans 0:47

All right, so we're talking about locale independent floats. What is the problem here?

George Banyard 0:52

Currently, when you do a float to string conversion, so either casting or displaying a float, the conversion will depend on the current locale. So instead of always using the dot as the decimal separator, if you have, for example, the German or the French locale enabled, it will use a comma to separate the decimals.

Derick Rethans 1:14

Okay, I can understand that that could be a bit confusing. What are these locales exactly?

George Banyard 1:20

So locales, which are more or less C locales that PHP exposes to userland, are a way to change a bunch of rules on how strings and other things get displayed at the C level. One of the issues with them is that they're global. For example, if you use the thread safe PHP version, setlocale() is not thread safe, so it will just impact other threads where you're using it.

Derick Rethans 1:50

So a locale is a set of rules to format specific things, with floating point numbers being one of them. In which situations does the locale influence the display of floating point numbers? In every situation in PHP, or only in some?

George Banyard 2:06

Yes, it only impacts certain aspects, which is quite surprising. So a string cast, the strval() function, var_dump(), and debug_zval_dump() are all affected by the locale's decimal separator, and so is printf() with the lowercase %f, but that's expected, because it's locale aware, in contrast to the capital F modifier.

Derick Rethans 2:32

But it doesn't, for example, have the same problem in the serialize() function or, say, var_export().

George Banyard 2:37

Yeah, and json_encode() also doesn't do that. PDO has special code which also handles this, so that all the PDO drivers get a consistent float string, because otherwise that could impact the databases.

Derick Rethans 2:53

How is it a problem that, with some locales enabled, PHP uses a comma instead of the decimal point? How can this cause bugs in PHP applications?

George Banyard 3:02

One trivial example is if you take a float, cast it to string, and then cast it back to float. If you're on a locale which uses the dot as decimal separator, you will get back the original float. However, if you have a locale which changes the decimal separator, like the German one, you'll get the string "3,14", and then when you convert it back to float, you will only get 3, because PHP doesn't recognise the comma as a decimal separator in its string to float conversion, and so it loses the decimal information.
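The round trip described here can be sketched as follows, assuming PHP 8.0 or later (where this RFC was adopted, so the cast ignores LC_NUMERIC). The de_DE locale names are assumptions; setlocale() simply returns false if they are not installed, and the result is the same either way.

```php
<?php
// Try to switch numeric formatting to German; may fail if not installed.
setlocale(LC_NUMERIC, 'de_DE.UTF-8', 'de_DE');

$pi = 3.14;

// On PHP 8.0+ the string cast always uses a dot, whatever LC_NUMERIC says.
$asString  = (string) $pi;        // "3.14"
$roundTrip = (float) $asString;   // 3.14 again

// What a comma-formatted string does on the way back: numeric parsing
// stops at the first character that isn't part of a number.
$lossy = (float) '3,14';          // 3.0, the decimals are gone

setlocale(LC_NUMERIC, 'C');       // restore the default
```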

Derick Rethans 3:39

That doesn't seem particularly very useful as a feature. So my question here is we talked about floating point numbers and, and I think floating point numbers have other issues as well. Not sure whether we want to go into the details of how floating point numbers and computers work, but we can if you want to.

George Banyard 3:56

The easy way to explain floating points is to use the scientific exponential notation, which most people will know from engineering or physics, where you usually have one significant digit, a decimal point, a couple more digits, and then an exponent, usually a power of 10, which gives you an order of magnitude. Floating point is basically that, but in base two.

Derick Rethans 4:26

Positions have magnitudes attached to them. They're all powers of two.

George Banyard 4:30

Yeah.

Derick Rethans 4:31

And of course, when we use numbers in decimal, like pi being a bad example.

George Banyard 4:36

Once said.

Derick Rethans 4:37

I was going to say: if you divide 10 by three, you get 3.33333, which never ends, right? And I reckon if you have a specific number in decimal, like 3.14, then you can't necessarily always represent it exactly in binary.

George Banyard 4:55

Yeah, one common example would be one tenth, which has a perfect representation in decimal, but in binary is a never-ending repeating sequence. When you try to display 0.1 the way it's actually stored in floating point, it looks really weird, and you get these rounding errors which can propagate.
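A short sketch of the one-tenth problem, assuming PHP 7.1 or later with the default serialize_precision of -1 (which controls how var_dump() prints floats). The tolerance comparison at the end is one common workaround, not the only one.

```php
<?php
// 0.1 has no finite binary expansion, so the stored double is only close to it.
printf("%.20F\n", 0.1);          // 0.10000000000000000555

// The tiny representation errors show up in ordinary arithmetic:
$sum = 0.1 + 0.2;
var_dump($sum);                  // float(0.30000000000000004)
var_dump($sum == 0.3);           // bool(false)

// Comparing with a tolerance instead of exact equality:
var_dump(abs($sum - 0.3) < PHP_FLOAT_EPSILON); // bool(true)
```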

Derick Rethans 5:15

And hence you often hear people recommend never to use floats for things like monetary values, but to use cents for that instead, right?

George Banyard 5:23

Yeah, put everything in integers and work with integers and just like format it afterwards.
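A minimal sketch of "integers first, format afterwards", where amounts are stored as whole cents and only turned into a decimal string for display. formatCents() is a hypothetical helper, and it does not handle PHP_INT_MIN, whose absolute value does not fit in an int.

```php
<?php
// Format an integer number of cents as "units.cents" for display only.
function formatCents(int $cents): string
{
    $sign = $cents < 0 ? '-' : '';
    $abs  = abs($cents); // PHP_INT_MIN would overflow to float here
    return sprintf('%s%d.%02d', $sign, intdiv($abs, 100), $abs % 100);
}

$prices = [1999, 250, 1];       // 19.99, 2.50 and 0.01, stored as cents
$total  = array_sum($prices);   // 2250, exact integer arithmetic
echo formatCents($total), "\n"; // 22.50
```

Because the formatting uses %d and string concatenation, its output also doesn't depend on the locale's decimal separator.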

Derick Rethans 5:29

So let's get back to what you and Máté are actually suggesting to change. What are the changes that you want to make through this RFC?

George Banyard 5:36

The change is, more or less, to always make the conversion from float to string the same, so locale independent, so that it always uses the dot as decimal separator, with the exception of printf() with the lowercase F modifier, because that one is, as previously said, locale aware, and explicitly documented as such.

Derick Rethans 5:56

Doesn't printf also have other floating point format specifiers? I believe there's an E and a G as well, and an uppercase F. What is the difference between these?

George Banyard 6:06

Lowercase F is just floating point printing with locale awareness. Capital F is the same as lowercase, but it's not locale aware, so it always uses the dot as decimal separator. Lowercase E is, as I've learned recently, also locale aware, and uses the exponential notation with a lowercase e. Uppercase E is the same as lowercase E, but with an uppercase E in the printed format. Lowercase G has some complicated rules for deciding which format to choose between lowercase F and lowercase E, depending on how big the number is and how many significant digits there are after the comma, or the dot. And uppercase G is the same, but using uppercase F and uppercase E instead of lowercase F and lowercase E.
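The specifiers can be compared side by side with sprintf(). This sketch assumes the default "C" numeric locale, in which %f and %F print the same thing; the exact exponent formatting of %e and %g is deliberately not spelled out here.

```php
<?php
$n = 1234.5;

// %f follows LC_NUMERIC; %F always uses a dot.
echo sprintf('%.2f', $n), "\n";   // 1234.50 in the C locale
echo sprintf('%.2F', $n), "\n";   // 1234.50 in any locale

// %e / %E: exponential notation with a lowercase or uppercase "e".
echo sprintf('%e', $n), "\n";
echo sprintf('%E', $n), "\n";

// %g / %G: choose between plain and exponential notation by magnitude.
echo sprintf('%g', $n), "\n";
echo sprintf('%G', 0.00001234), "\n";
```

Code that must always emit a dot, for example when building machine-readable output, can use %F regardless of what setlocale() was called with.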

Derick Rethans 6:58

And all of them can be locale dependent then except for uppercase F.

George Banyard 7:02

Yeah.

Derick Rethans 7:02

Do you think this is going to impact people's applications, if you change the default of normal casts to be locale independent?

George Banyard 7:10

I would have expected it to not be that significant, and that it would only affect displaying floating points. So if you're in Germany, instead of seeing a comma, you would now see a dot, which can be annoying, but I wouldn't imagine it's the biggest problem for your end users. But apparently, people made tooling to work around the locale awareness, and that could break when parsing stuff. I suppose that happens because PHP is 25 years old, and that behaviour has been there forever, so people worked around it or worked with it.

Derick Rethans 7:49

Is this going to be purely a displaying change or something else as well?

George Banyard 7:54

For example, if you send a float to an API via HTTP, you would usually already need code to work around the locale awareness, either by resetting setlocale() or by using number_format() or sprintf() or something like that, because most other APIs you would contact expect the float to use a decimal point. In PHP, if you do the string to float conversion again on a string which doesn't use a point, then you basically only get an integer.

Derick Rethans 8:27

Because PHP's parser stops once it no longer recognises digits, which in this case is at the comma.

George Banyard 8:33

Yeah, and that would make the code nicer. The main reason why Máté and I decided to propose this RFC is because most APIs, and also databases and everything, expect numbers to be formatted in a standard way. Currently, if for whatever reason you use a locale, then they're not. But apparently people worked around that, maybe when they were extracting values from displayed HTML or whatever, because that got raised on the list quite recently.

Derick Rethans 9:06

This change does not necessarily remove the ability to use locales for formatting numbers, because PHP still has the lowercase F as format specifier for printf(), sprintf() and friends. Does PHP have other ways of rendering numbers according to locales?

George Banyard 9:24

According to locales? I don't think so. You can format it manually with something like number_format(), or use the NumberFormatter class from the intl extension.
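A brief sketch of both options. number_format() takes the separators explicitly, so nothing depends on setlocale(); the NumberFormatter part only runs if the intl extension is loaded, and its exact output comes from ICU locale data, so it is not shown.

```php
<?php
$amount = 1234567.891;

// Arguments: number, decimals, decimal point, thousands separator.
echo number_format($amount, 2, ',', '.'), "\n"; // 1.234.567,89
echo number_format($amount, 2), "\n";           // 1,234,567.89

// intl's NumberFormatter uses ICU/Unicode locale data, not the OS C locales.
if (class_exists(NumberFormatter::class)) {
    $fmt = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);
    echo $fmt->format($amount), "\n";
}
```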

Derick Rethans 9:35

Yeah, from what I understand, with number_format() you have to do it all by yourself. And the intl extension doesn't support the POSIX or C locales from the operating system, right? It uses its own locale rule set from the Unicode project. The RFC lists some alternative approaches. Would you mind touching a little bit on these as well?

George Banyard 9:58

One of the alternative approaches is to deprecate setlocale() altogether, because as a byproduct that just fixes the issue: you can't define any locale anymore, so everything will always be locale independent. This was discussed back in 2016, mostly because of the non-thread-safe behaviour, because it affects global state and everything. But at the time, because HHVM had done a patch making setlocale() thread safe, the conclusion was to mimic this patch and implement it in PHP, which hasn't been done yet. Another one that we thought about was to deprecate the behaviour and raise a deprecation notice, but that would happen on basically every float to string conversion, and the performance penalty seemed pretty strong. One other thing Máté and I considered was to deprecate the current behaviour in some way. However, emitting a deprecation notice on basically every float to string conversion didn't seem ideal: it would just flood the log output, and also bring a performance penalty, because outputting warnings isn't the most friendly thing to do performance-wise.

Derick Rethans 11:21

What has the feedback been so far?

George Banyard 11:24

Feedback currently has been that like most people, well, one person because there hasn't been that much feedback.

Derick Rethans 11:30

There hasn't been that much feedback because you've only just proposed it?

George Banyard 11:33

Some of the feedback we got appreciates the change. However, they have concerns about modifying this behaviour for locales in every case without having any upgrade path. In some sense, it's just: oh, you have the change, and then you need to execute your code and see what breaks. We're currently considering ways to address that, maybe by adding a temporary ini setting which would be a kind of debug mode: when you enable it, it would emit notices wherever this conversion would previously have happened, so you'd notice: Oh, this is not happening anymore, I need to be aware of this change in behaviour.

Derick Rethans 12:17

Did we not used to have E_STRICT for this at some point or E_DEPRECATED?

George Banyard 12:24

E_DEPRECATED is still a thing. E_STRICT got mostly removed with PHP seven. There were a couple of remaining notices, two or three, which I got rid of or turned into normal E_WARNINGs or E_NOTICEs in PHP seven point four. But yeah, that's one way to approach it: implementing a debug ini setting which would only be used for development, because there, if you get warnings and everything, you don't really care about the performance impact. And then in production, you would disable it and the warnings wouldn't pop up.

Derick Rethans 12:56

How would that setting be any different from just putting it behind an E_DEPRECATED warning?

George Banyard 13:00

So with an E_DEPRECATED warning, we would need to keep this behaviour, and we could only change it in PHP nine. If we do it with a debug setting, we could change it in PHP 8.

Derick Rethans 13:13

That's a bit of cheating, isn't it?

George Banyard 13:15

You could say so.

Derick Rethans 13:16

I'm interested to see how this ends up going. Do you have any timeframe of when you want to put it for a vote?

George Banyard 13:23

Currently, we've only started this discussion. And until we figure out whether we get an upgrade path, or multiple upgrade paths that we could then put into a secondary vote, I wouldn't expect it to go to voting that soon. Maybe end of April would be nice.

Derick Rethans 13:41

So around the time when this podcast comes out?

George Banyard 13:44

Ah! For once!

Derick Rethans 13:46

For once I got my timing right.

George Banyard 13:49

Yes. Didn't you have the str_contains() one, which just came out?

Derick Rethans 13:53

Yes.

George Banyard 13:54

And that vote closed last week.

Derick Rethans 13:57

Yeah, it's really tricky, because there are so many small ones now that I can't keep up.

George Banyard 14:02

Yeah, Mark also did like his debug.

Derick Rethans 14:04

Yeah. And there are two or three more tiny ones that I would quite like to talk about, but by the time there's an opening in the schedule, they're pretty much irrelevant. So I'm trying to see whether I can wrap a few of the smaller ones into one episode, because there's the throw expression, the is literal check, and typecasting in array destructuring expressions, and they all showed up in the last three days.

George Banyard 14:26

I suppose people have lots of time now. The is literal one is basically a taint checker. I know there was a paper by Facebook, six or eight years ago, which talks about how they tried to implement that in their static analyzer, but a static analyzer doesn't need to be something in the engine. That's what I don't really get.

Derick Rethans 14:45

Thank you, George, for taking the time this afternoon to talk to me about the locale independent float to string RFC.

George Banyard 14:53

Thanks for having me on the podcast again, Derick.

Derick Rethans 14:55

You're most welcome. Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 51: Object Ergonomics

In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a blog post that he has written related to PHP's Object Ergonomics.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 51. Today I'm talking with Larry Garfield, not about an RFC for once, but about a blog post that he's written called Object Ergonomics. Larry, would you please introduce yourself?

Larry Garfield 0:38

Hello World. My name is Larry Garfield, also Crell, CRELL, on various social medias. I work at platform.sh in developer relations. We're a continuous deployment cloud hosting company. I've been writing PHP for 21 years and have been an active gadfly and nudge for at least 15 of those.

Derick Rethans 1:01

In the last couple of months, we have seen quite a lot of smaller RFCs about all kinds of little features here and there, to do with making the object oriented model of PHP a little bit better. I reckon this is also the nudge behind you writing a slightly longer blog post titled "Improving PHP object ergonomics".

Larry Garfield 1:26

If by slightly longer you mean 14 pages? Yes.

Derick Rethans 1:29

Yes, exactly. Yeah, it took me a while to read through. What made you write this document?

Larry Garfield 1:34

As you said, there's been a lot of discussion around improving PHP's general user experience of working with objects, where there's definitely room for improvement, no question. I found a lot of these proposals useful in their own right, but also very narrow, in ways that solve the immediate problem but could get in the way of solving larger problems later on down the line. So I went into this with an attitude of: Okay, we can piecemeal attack certain parts of the problem space, or we can take a step back, look at the big picture and say: Alright, here are all the pain points we have. What can we do that would solve not just this one pain point, but let us solve multiple pain points with a single change? Or which two changes together would solve this other pain point as well? Or, you know, how can we do this in a way that is not going to interfere with later development that we've talked about, that we know we want to do, but that hasn't been done yet? So how do we not paint ourselves into a corner by thinking too narrowly?

Derick Rethans 2:41

It's a curious thing, because a more narrow RFC is likely easier to get accepted, because it doesn't pull in a whole set of other problems as well. But of course, as you say, if the whole idea hasn't been thought through, then some of these things might not actually end up being beneficial. Because it can be combined with some other things to directly address the problems that we're trying to solve, right?

Larry Garfield 3:07

Yeah, it comes down to what are the smallest changes we can make that taken together have the largest impact. That kind of broad picture thinking is something that is hard to do in PHP, just given the way it's structured. So I took a stab at that.

Derick Rethans 3:21

What are the main problems that we should address?

Larry Garfield 3:24

So the ones I identified that people have been talking about are the following. One is that constructors are just way too verbose. If you look at almost any PHP class, in almost any framework, the most common pattern is: you start with a class, you declare three to five properties that are private or protected. Then you have a constructor that takes three to five parameters and assigns each of those to those properties. Usually the names match all the way through, and the types match all the way through. All it's doing is shoving those parameters into properties. Right now, you have to repeat each property name four times in total. It's just way too verbose; it's more typing than we should be doing. And so there have been various proposals for ways to type less to do that.
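To make the "four times" concrete, here is the classic pattern next to the constructor property promotion syntax that PHP 8.0 later adopted. A sketch assuming PHP 8.0 or later; Logger, Mailer and the SignupService classes are hypothetical names.

```php
<?php
// Hypothetical dependencies, declared only so the types exist.
interface Logger {}
interface Mailer {}

// Classic pattern: each name appears in the property, the parameter,
// and twice in the assignment.
final class SignupServiceClassic
{
    private Logger $logger;
    private Mailer $mailer;

    public function __construct(Logger $logger, Mailer $mailer)
    {
        $this->logger = $logger;
        $this->mailer = $mailer;
    }
}

// Constructor promotion (PHP 8.0+): declaring the visibility on the
// parameter also declares and assigns the property.
final class SignupService
{
    public function __construct(
        private Logger $logger,
        private Mailer $mailer,
    ) {}
}
```

Both classes end up with the same two private typed properties; only the amount of repetition differs.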

Derick Rethans 4:11

We'll get to the solutions in a moment, I'm sure.

Larry Garfield 4:14

The next one is what I've called the bean problem, in reference to Java beans. For those who have not worked with Java before, and I haven't worked with it in a long time, but when I last did, this was standard: you'd have what's called a Java bean, which is just a Java class that has a bunch of properties that are private, and then a getter and a setter for every single one of those properties. In PHP, you see the same pattern a lot, especially in ORMs. Largely that comes down to the fact that this makes serialisation and deserialisation straightforward, because you can access properties through a method, you know the names, automatic naming and so on. But that's, again, an awful lot of typing to bypass the private and protected keywords. So how can we reduce the mental overhead of that, and just have access to what we need with less work? A lot of the reasons for that relate to immutable objects. It's been increasingly popular in PHP in recent years to have objects that, even though the language doesn't support immutability, are effectively immutable, in that the object doesn't give you a way to change its properties. But it gives you a way to create a new object that is the same, but with certain changes. Think DateTimeImmutable in PHP core, which has a modify() method that doesn't change the object in place. If you take a DateTimeImmutable object and call its modify() method with a parameter of plus one week, you get back a new DateTimeImmutable object with a timestamp one week later. That pattern is increasingly common. PSR-7, the HTTP messages spec, uses it a lot, and other packages have started doing it. The way that usually ends up working is these wither methods: withSomeValue(), with some property name and so on, similar to a setter, but returning a new object, and there's a common pattern for that now. Another problem is materialised values, where you have something that conceptually is a property.
And to an outside caller, it really should just be a property, but you don't want it to be a full property itself. The canonical example I use is: you have a first name property and a last name property, and you want to format a full name property. There are a lot of cases like that. Right now, you do that as a method, and you have some kind of static cache internally, which works. It's just: Can we make that better? And can we not make it worse with any of these other changes? A lot of this comes down to how we don't make any of these problems worse. Another problem is, for lack of a better term, what I call the documented property problem, where if you have a large constructor, you're going to pass in a bunch of different values because they all map to properties, but you need to keep track of: Okay, which one of these is which? This especially comes up for value objects, rather than service objects, which in C, or Rust, or Go would just be a bare struct, essentially, which PHP doesn't have. And we can get to why I think it's okay that we don't have that. But for objects where you really just have a combination of properties, you still need to keep track of them, and you want to be able to create an object that has only some of them set. If you have eight optional properties and you want to set just the last one, right now you have a bunch of nulls, or question marks, or empty quotes, or zeros, or whatever default value, and again, it's just very cumbersome. And so the question I was looking at is: how can we make all of these better and not make any of them worse? That's kind of the problem space. I think most people can relate to at least most of these.
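Two of the patterns described above, the "wither" on an effectively immutable object and the materialised full name, might look like this today. A sketch assuming PHP 8.0 or later; Person is a hypothetical class, and it caches the derived value in an instance property rather than a static cache, which is one of several possible choices.

```php
<?php
final class Person
{
    // Lazily computed cache for the derived value (null until first use).
    private ?string $fullName = null;

    public function __construct(
        private string $firstName,
        private string $lastName,
    ) {}

    // Wither: returns a modified copy and leaves $this untouched.
    public function withLastName(string $lastName): self
    {
        $copy = clone $this;
        $copy->lastName = $lastName;
        $copy->fullName = null; // the cached derived value is now stale
        return $copy;
    }

    // Materialised value: computed from other properties on first access.
    public function fullName(): string
    {
        return $this->fullName ??= $this->firstName . ' ' . $this->lastName;
    }
}

$ada = new Person('Ada', 'Byron');
$renamed = $ada->withLastName('Lovelace');

// The same idea in core: DateTimeImmutable::modify() returns a new object.
$start = new DateTimeImmutable('2020-04-01');
$nextWeek = $start->modify('+1 week');
```

Note that the wither has to remember to reset the cache; forgetting that kind of detail is part of why these patterns are considered boilerplate-heavy.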

Derick Rethans 7:46

I would think so too, certainly in some of my code, where that's been the case. Hopefully, those were all the problems you found.

Larry Garfield 7:53

I think I got all of them.

Derick Rethans 7:55

As I alluded to in the introduction, there have already been quite a few smaller RFCs to address some of the problems you just mentioned, which you list, as well as other things that you have found that multiple people already do. Should we have a quick look at what these things are?

Larry Garfield 8:15

One of the proposals that I looked at was write-once properties. As we are recording this, there's an RFC for that in voting, although it looks like it's probably not going to pass if the vote stays where it is. The idea there is to allow typed properties to have a read only marker on them, just like the type or public or private, and then they can only be written to once: if they're uninitialised you can write to them, and after that they're just stuck that way. The advantage is that this would make them safe to expose publicly, so you can have a property that you expose to the world, and people can just access it, without you being concerned about someone changing it out from under you. The downside mainly comes down to the evolvable immutable object, where the with method becomes a lot harder, because you can't say: clone this object and change this one property, because, well, you can't change this one property; you'd have to fully construct a new object. There are also two different proposals that have been floated recently for compact object property assignment. I think they are different names for the same basic idea: if an object has public properties, you can write to those in one shot, in a code block, along with the constructor, in a named fashion. There's a common pattern now where you pass an associative array with a bunch of named properties to a function, so you can put them in whatever order you want, and then the function dissects those and maps them to properties internally. It's essentially taking that idea and baking it into the syntax, which does help when you have a lot of properties that are optional: if you have a lot of properties or parameters defined, it makes it a lot easier to select them piecemeal. The downside is that all of those proposals to date only work on public properties, which have a long list of challenges with them.
It also means you're bypassing any kind of validation, like: this property is only valid if that property is set, or this property has to be less than that property, and so on. Those are too limiting, but they're definitely trying to solve a real pain point.

Derick Rethans 10:19

Nor can you enforce types through that, of course.

Larry Garfield 10:21

Some of them, I think, might be able to.

Derick Rethans 10:23

I meant associative arrays.

Larry Garfield 10:25

Yeah, the associative array approach you can do now, and really the only thing I can say in its favour is that it works today. Type enforcement isn't there, and it's poor for documentation. Please don't do that. All of these are dancing around named parameters, which is a different language feature that's been discussed on and off for many, many years. I don't know of any current RFCs on the table for this one, but it's come up many times. A number of languages have this; Python has it, for example, where, give or take the syntax, instead of saying call this function with parameters one, seven, and 19, and then having to guess what those numbers mean, you can call a function with count equals one, order equals ASC, whatever. And then you can change the order around. It's essentially the same idea, but for function parameters rather than object properties. Again, there are implementation challenges there, but certainly there are languages that do it successfully. Another problem space people have been looking at is access control. We mentioned the read-only property. In the discussion for that, Nicolas Grekas suggested that instead of having a read-only flag, we allow the access control on a property to be different for read and write. So you could have a property that is publicly readable, but only writable from private, or private and protected, scope. That gives you many of the same benefits as the read-only flag would have, but without breaking some of the current patterns we have around cheap cloning of objects and so forth.

Derick Rethans 11:58

Because, of course, PHP's object-oriented system is based on classes, not on objects. You can read and write private properties of other objects, as long as they are of the same class.

Larry Garfield 12:10

Correct. And that's something that we take advantage of a lot in cloning; the whole wither method style is based on that. If that feature of PHP went away, it would break an awful lot of code. So don't change that. Other things have been on the table. People have talked in the past about constructor promotion, which is a feature that a couple of languages have, including Hack, which is Facebook's PHP fork. The basic idea there is that instead of repeating properties once for their declaration, once in the constructor, and then twice in an assignment, you just declare them as part of the constructor, and it essentially becomes a macro that expands out to the same original code. Hack already has a syntax for that. This one actually has been proposed for PHP before, and it didn't pass.

Derick Rethans 12:57

Was it proposed with the exact same syntax as Hack? I don't believe so, because Hack had types at the time, and PHP did not.

Larry Garfield 13:05

The earlier syntax (I was just looking at that RFC earlier today) used public function construct, this arrow foo, comma, this arrow bar. And then you still had to declare the properties independently, so it only solved half the problem, and the syntax looked kind of weird. The Hack syntax just lets you put the entire property declaration in place of the parameter in the constructor line, and it fills in all of the other pieces. You have public function construct, parentheses, private int a number, private Bar some bar object, and so on. And it would automatically create that property on the class, take the parameter, promote it, and do the assignment for you. So that's what Hack does. I believe TypeScript has something similar, although I haven't worked with it. It's again just simplifying that common case. Another non-PHP place I looked for inspiration is Rust, because Rust does immutable objects very well. And so I figured, all right, let's look at what other languages are doing. Rust has objects that are more bare than PHP's, much like Go, where it's really a struct to which you can attach methods rather than an enclosed object, but it lets you create a new object where the object constructor syntax is essentially named parameters already: you're providing a JSON-like block of this property has this value, that property should have that value, similar to the object constructor proposals. But you can then say dot dot, some other object of the same type, which Rust reads as: fill in anything I haven't specified with the values from this other object. The upshot of that is that making a new object that is the same as this other object, except for this one change, is really easy. Could we do something like that, either using Rust syntax or something else, just conceptually? Would that make the with-style methods easier? Would it possibly help bypass the problems with a read-only flag, and so on?
Finally, kind of the granddaddy of them all is a proposal in PHP from a couple of years ago: property accessor methods. This was a very contentious RFC; it didn't pass, mostly for performance reasons, as I understand it. But the idea here was that you could declare a property to have a dedicated getter and setter method, and then when you try to read or write that property, the method gets called transparently in the background. It's essentially the same idea as the magic get and magic set methods on objects, but specifically for each property, which can then eliminate a lot of if we're talking about this property, if we're talking about that property, and gives you a lot more flexibility. Because those are methods, it also allows you to control their access separately for get and set, so you can have a public getter and a private setter method. A number of other languages have this; Python does, JavaScript does. So I included it: okay, this has been a proposal on the table before, and I personally really like it. The only downside is the performance impact: since the engine can't really know in advance whether a property it's accessing is guarded by methods like this or not, every property access has an extra if statement around it in the engine. And the performance impact of that, while small individually, really adds up when you're talking about tens of thousands of property accesses. As I understand it, that was the main reason that it didn't pass before. I don't have a good solution for the performance issue, unfortunately. It would be delightful if, you know, the typing system would let us do that, or if the JIT could do something there. I have no idea; that's well out of my wheelhouse.

Derick Rethans 16:34

That's a lot of solutions that people have come up with in the past, some of which haven't had RFCs made for them yet. Solving them all one by one, as you mentioned, isn't a particularly useful thing to do, because, as you say, you end up with a jumbled mess of things. Your article continues with an analysis section about all the different aspects of the problems and solutions that we've just mentioned here. What's your thinking on how to join up all the dots?

Larry Garfield 17:00

My goal was: all right, as I said, what's the minimum amount of change we can make that gets us the maximum benefit and solves as many problems as possible without making anything worse? Is there a way that we can make some problems not their own problem, but the result of some other problem? Can we make one a degenerate case of another and thereby kill multiple birds with one stone, essentially? What I came up with was: one, constructor promotion on its own is, I think, very useful. Let's do that. Two, named parameters on their own are very useful; let's do that. The combination of constructor promotion and named parameters together gives us the equivalent of an object initialization syntax. The specific symbology in the syntax may look slightly different, but essentially you get the same net effect, where you could say, hey, new Product object, pass it a series of key-value pairs, and you're done. And the object itself is defined as just a bunch of key-value pairs in the constructor, with no body, and those still get promoted. So we end up with struct-like or record-like objects with relatively little syntax, as kind of a side effect of these two other changes that have good arguments for them on their own.

Derick Rethans 18:14

And also without introducing a new concept such as a struct.

Larry Garfield 18:18

Exactly. There's also been discussion about whether we should just introduce a separate language construct for a struct or a record: something that is just its properties, possibly some validation, and that passes by value instead of by reference, which makes it easier to design those for immutability. I've toyed with that idea in the past. And every time, I come down to: eventually I'm going to want to do everything that classes do anyway. Or if they do something special, I'm going to want to do that in classes too, except for the way they pass. Legitimately, there are cases where we would want to have a value object that passes in a more by-value style, instead of the pseudo-reference way that objects pass today. There are use cases for that, but that's really the only difference. Everything else is essentially the same in both cases, so it's more work than is needed to try and create a whole separate construct there. Instead, let's make this one construct flexible enough that we can use it either way, for whatever use case makes sense. I think those two changes together give us the most bang for the buck and don't harm anything else.

Derick Rethans 19:16

Both of these proposals help to solve the first problem that you outlined, which is the problem with constructing objects. The other problem that we spoke about is value objects and access to properties, for example. Have you worked out which proposals would help solve that problem as well?

Larry Garfield 19:36

My proposal on that front, based on what's available, is this. I like Nicolas's idea of separate access control for read and write. Okay, now what syntax can we use for that, that is going to be self-explanatory and readable, and not block property accessors if we ever get to the point of figuring out how to do those performantly? I don't think we can go all the way to property accessors right now. I would love to, but I don't think that's feasible. Instead, we can borrow some of the syntax from that proposal and let you declare (this is hard to explain in verbal format) something like: string name, curly brace, public get, private set, curly brace. That is essentially the syntax that the property accessors RFC had, but with the method bodies removed, which that RFC actually supported anyway. And what that gives us is a syntax to say: this property has different visibility for reading and writing, for get and for set, in a way where it's natural to add getter and setter method functionality to it later, if we figure out how to do it. There are probably other syntaxes that could do the same. I'm flexible. I think the key here is some sort of syntax that gives us that split visibility in a way that opens itself to future extension, rather than just throwing more keywords before a property and hoping it works out for the best. And once you've done that, then I think it's worth considering: could we do some kind of Rust-like cloning or Rust-like creation process? I don't know. It could be a variant on cloning. People have proposed a clone this with, and then a list of properties, and that essentially desugars into creating that new object and then calling a bunch of property set commands. Maybe that's viable. Maybe it's not; I'm not sure. Maybe using a syntax closer to what Rust has, so that certain parameter lists can get auto-populated. I don't know.
But I think that's an area worth exploring, and it would be a nice add-on to these others, but it's not a prerequisite. The thing I like about what I'm proposing here is that each of these individual pieces carries value on its own. There's a good reason to vote for each of these on their own, but they dovetail together so that the whole is greater than the sum of the parts. And I think that's the mark of good design: you don't solve each individual problem; you have tools that together solve several problems. It just kind of falls out of the design.

Derick Rethans 22:06

Of course, at the moment you wrote this blog post, none of these proposals had more to them than the description in your article.

Larry Garfield 22:15

Some of them had old RFCs that had been proposed and either didn't make it to a vote, or the vote had gone slightly negative for various reasons. But yeah, I did not have any patches. My C skills are still extraordinarily limited. This was a discussion starter, not a here's-an-RFC-with-code.

Derick Rethans 22:32

Of course, we are now a day and a half or two days later, and now there is, of course, an RFC for one of them: constructor promotion, which, pretty much as we spoke about earlier, picks up Hacklang's syntax and ports it to PHP.

Larry Garfield 22:47

Yes, I've concluded that my primary role in PHP internals is inspiring Nikita to go write things.

Derick Rethans 22:53

And you were successful in this case.

Larry Garfield 22:56

A year ago, I was on this podcast with you talking about comprehensions, when I was pushing for those, and those never happened. But out of that discussion, Nikita noticed: oh yeah, short lambdas, I should go finish those, and then he went and finished that RFC. My role is convincing Nikita that he should do things. So I consider that a worthwhile contribution.

Derick Rethans 23:13

Fair enough. I agree. Anyhow, it will be interesting to see where this ends up going. We are about, what, three months away from PHP 8.0's feature freeze, so there's plenty of time to look at the other three proposals that you concluded would be great to have all together.

Larry Garfield 23:32

I'm happy to work with anyone who actually does know how to work on internals, on any of these. Personally, I think asymmetric visibility is the next one after constructor promotion. That's straightforward to do. I know Levi Morrison on the lists has suggested that named parameters have a lot of other gotchas around them that I didn't get into here, and that is very likely. There may very well be implementation reasons why these are harder than I present them as. I fully acknowledge that. But again, I think any of these individually still moves the language forward in a way that doesn't close off future avenues.

Derick Rethans 24:07

Do you think you'll end up learning some C to be able to work on this yourself?

Larry Garfield 24:11

So I used to work in C briefly, 16 years ago. I had a very, very short career writing software for Palm OS.

Derick Rethans 24:18

And I remember us talking about it when we recorded the episode last year.

Larry Garfield 24:22

And I did some C again, just recently, while playing with FFI. As we've discussed before, the PHP engine is not written in C, it's written in a macro language that is written in C. There's a learning curve there that I have yet to scale.

Derick Rethans 24:34

Fair enough.

Larry Garfield 24:35

If someone wants to mentor me in that while we work on one of these, I am very open to that. So putting that out there.

Derick Rethans 24:40

You might be inundated by messages now, you never know.

Larry Garfield 24:43

Better that than getting ignored.

Derick Rethans 24:45

Do you have anything else to add?

Larry Garfield 24:46

I think it's beneficial for PHP collectively to take this broader approach of asking not just, okay, what can solve this immediate problem in front of us, how can we scratch this one itch, but what are all the itches that we have that need to get scratched? And how can we solve all of those in a way that is going to give the best bang for the buck, let us do the least amount of work with the least amount of syntax and the least amount of conceptual overhead, and yet give us the most flexibility? There's been a lot of talk, anytime we're talking about the PHP type system, of: we eventually want generics, generics are hard, but let's make sure that whatever we do doesn't make generics even harder. I think it's good that we have this goal in mind, and that we ask: all right, what iterative steps get us closer to that without locking us in, without painting us into a corner? And that's kind of what I'm trying to do here. I would very much encourage everyone working on PHP to take that approach: don't just solve the immediate problem; look at the broader picture, at what will solve multiple problems, what will dovetail nicely with something else, and what kind of big-picture plan and architecture we can look at that ends up making the language better, rather than just looking at our feet.

Derick Rethans 25:57

Well, thanks for taking the time this afternoon to come and talk about object ergonomics. We'll see how much of it ends up in PHP eight.

Larry Garfield 26:05

Fingers crossed.

Derick Rethans 26:07

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 50: The RFC Process

In this episode of "PHP Internals News", Henrik Gemal (LinkedIn, Website) asks me how PHP's RFC process works, and I try to answer all of his questions.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 50. Today I'm talking with Henrik Gemal, after he reached out with a question. You might know that at the end of every podcast, I say: if you have any questions, feel free to email me. And Henrik was the first person to actually do so in a year and a half. For fun, I thought that instead of me asking the questions, I'd let Henrik ask the questions today, because he suggested that we should do a podcast about how the RFC process actually works. Henrik, would you please introduce yourself?

Henrik Gemal 0:52

Yeah, my name is Henrik Gemal. I live in Denmark. I'm the CTO of DinnerBooking, which does reservation systems for restaurants. I've been doing PHP development for more than 10 years, but I'm not coding so much now; now I'm managing a big team of PHP developers. I've also been involved in the open source development of Mozilla Firefox.

Derick Rethans 1:19

So usually I prepare the questions, but in this case, Henrik has prepared the questions. So I'll hand over to him to get started with them. And I'll try to do my best to answer the questions.

Henrik Gemal 1:27

I've heard a lot about these RFCs, and I was interested in the process. So I'm just starting right off here: who can actually do an RFC? Is it anybody on the internet?

Derick Rethans 1:38

Yeah, pretty much. In order to do an RFC, you need to have an idea. And then you need access to our wiki system to be able to actually start writing it, well, not to write it, but to publish it. The RFC process is open to everybody. In the last year and a half or so, some of the podcasts that I've done have been with people that have been contributing to PHP for a long time. But in other cases, it's people like yourself that come up with an idea, work together with somebody on a patch, and then create an RFC out of that. That then goes through the whole process, and sometimes they get accepted, and sometimes they don't.

Henrik Gemal 2:16

How technical are the RFCs? Is it like coding? Or is it more like the idea in general?

Derick Rethans 2:23

The idea needs to be there; it needs to be thought out. It needs to have a good reason for why we want to add or change something in PHP. The motivation is almost as important as what the change or addition is actually about. Now, that doesn't always get adhered to very well, in my opinion, but it is an important thing. Along with the idea, we need to talk about what effects it has on the rest of the ecosystem: whether there are backward compatibility breaks, how it affects extensions, or sometimes how it affects OPCache. Sometimes considerations have to be taken for that, because it's something quite important in the PHP ecosystem. And it is recommended that it comes with a patch, because it's often a lot easier to talk about an implementation than to talk about the idea. But that is not a necessity. There have been quite a few RFCs where the idea was there, but there wasn't a patch yet. It is less likely that these RFCs will get accepted, because in order to get something into PHP, there not only needs to be a good idea, there also needs to be a good implementation of it. If you have been a long-term contributor to PHP, then you should know how to write a patch yourself. In other cases, you'll see people that have an idea try to find somebody else to work on the implementation together with them. But for all RFCs, even if they get accepted, it's always pending a good implementation.

Henrik Gemal 3:52

How is an RFC actually done? Is there like a template you fill out, or is it like a website, or how does it work?

Derick Rethans 3:59

Our wiki (I will add a link to that in the show notes) has a template for how to create an RFC. It has a set of sections. There's always an introduction that basically lays out what it is about, or why this change is being made. Then there is often a proposal of what the change actually is. And then there are a few sections that are sometimes empty and sometimes filled in, such as backward incompatible changes, which PHP version is being targeted, and what the impact is on all the parts of the PHP ecosystem. But these things are not always necessary, because they don't always make sense, right? If you want to add a new syntax to PHP, then that almost never influences existing extensions, but it will influence OPCache, for example. And then there are also often things like open issues: things we haven't quite thought through yet. The discussion bits get filled in after people on the PHP internals list, which I'm sure we'll get to in a moment, come up with better ideas or sometimes alternatives, and then things like future scope are also part of the template. We don't really require a very rigid approach to this, but we do appreciate it if all the sections are filled in, or at least thought about, whether or not there turns out to be information to put in them. And then at the end, there's often a proposed voting choice. Everything at the moment needs to pass by a two-thirds majority before it gets accepted. So yeah, those are the things in the template itself. The template is important, and you do need to fill it in if you want to propose an RFC.

Henrik Gemal 5:33

Are all RFCs public, or do you have, like, private RFCs?

Derick Rethans 5:38

All RFCs have to be public, otherwise they can't be voted on. But some RFCs start out as just a conversation between a few developers coming up with an idea. In the last few months, some more complicated RFCs have started out in a Git repository on GitHub, as a pull request that never gets merged anywhere, because GitHub makes it much easier to comment on specific sections and to incorporate feedback. That's instead of having large discussions on the PHP internals mailing list, where comments sometimes just get lost because there's too much text in there. These RFCs start out sort of public, but nobody knows about them. In the end, they always have to be public, otherwise there won't be any voting done on them, and they won't get accepted.

Henrik Gemal 6:27

Where is the RFC sent to, and who's kind of in charge of the RFC? Is it the one that makes the RFC, or is there like an RFC commander?

Derick Rethans 6:37

The person that makes the RFC is responsible for guiding it through the whole process that we have. Once they are finished, they are required to email the PHP internals list with a specific prefix, which I think is RFC in square brackets. That then starts a minimum discussion period of two weeks. That discussion period might end up longer in cases where there are lots of things to talk about or discuss, or lots of disagreements, but the discussion period has to be a minimum of two weeks on the PHP internals mailing list.

Henrik Gemal 7:09

I was wondering a little bit about the priority of RFCs, because I see RFCs as a little bit like feature requests. So I'm wondering, who actually decides on the priority of an RFC?

Derick Rethans 7:23

Nobody really decides on the priority. Multiple RFCs can go through the process at the same time, and there's no ranking of which one is more important than the others. So yeah, there's nothing really there for it.

Henrik Gemal 7:35

I was just wondering if it's done like a normal project; you know, there might be many RFCs at the same time. I'm wondering how many RFCs there are at the moment. Are we talking 10, or are we talking thousands?

Derick Rethans 7:50

This depends a bit on where in PHP's release cycle we are. PHP should get released at the end of November or the start of December, and in all PHP seven releases that has actually happened. Usually, in the period between December and March, there will be maybe one or two a week, which is great because that makes it possible for me to pick the right one to make a podcast episode out of. At the moment, there are 10 outstanding RFCs. That means there are so many that I don't actually have enough time to talk about all of them on the podcast. However, there are often more just before we go into feature freeze, which happens at the end of June, and there are still two months to go. But you also see that over the last two years, there have been a lot more small RFCs than big RFCs. Big RFCs, like union types, tend to come early in a release cycle. As an example of a smaller RFC, there's currently one that there is no episode about, which suggests stricter type checks for arithmetic and bitwise operators. Those are tiny, tiny changes. And in the last two years, there have been more and more small RFCs rather than big RFCs, because they tend to limit the number of things people can disagree with, and hence often make it easier to get accepted. That is a change that I've been seeing over the years. But no, there are no thousands; for each PHP version, I would say on average there's about one a week, so about 50.

Henrik Gemal 9:19

I want to get a little bit into the voting part, because that sounds kind of interesting. Who can actually vote?

Derick Rethans 9:28

After the two-week minimum discussion period on the PHP internals mailing list is over, an RFC author can decide to put the RFC up for a vote. That also requires you to send an email to the PHP internals mailing list, prefixing your subject with the word vote in capital letters. Now, at this moment, you unfortunately see that people start paying attention to the RFC, instead of doing that during the discussion period. Once a vote gets called, you shouldn't really change the RFC unless it's for typos or minor clarifications; you can't really change the syntax in it, for example. The people that can vote are people with PHP commit access, and that includes internals developers, documentation contributors, and people that do things in the infrastructure: everybody that has a PHP VCS account (VCS meaning version control system, which used to be CVS, then SVN, and now Git), as well as people that have proposed RFCs. So the group that technically could vote is over 1,000 people, but the number of people that actually vote is very much under 50 most of the time. We don't really have any criteria beyond having an account to be able to vote on PHP RFCs.

Henrik Gemal 10:43

How is the voting actually done?

Derick Rethans 10:47

Since about last year, each RFC needs to be accepted with a two-thirds majority. On each RFC on the wiki, once a vote gets called, you as an RFC author need to include a small code snippet that then creates a poll. Very often it's: do we want this, or do we not want this? So it's a yes or no question. But sometimes there are optional votes on whether we want to do it one specific way or another. Sometimes that allows you to select between different syntaxes. I don't think that is necessarily a good idea to have; I think the RFC author should be opinionated enough to pick a specific syntax. It is probably better to have a secondary vote, as we call those. Those secondary votes don't need a two-thirds majority; it's often just which one of the options wins. But the main RFC won't get accepted unless there's a two-thirds majority in the poll on the wiki.

Henrik Gemal 11:46

What happens after the vote, both if it's yes and if it's no?

Derick Rethans 11:53

I'll start with the easy case, the no case. If it's a no, then the RFC gets rejected. Sometimes an RFC fails for a very specific reason: maybe some people didn't like the syntax, or there was one tweak where it would behave in the wrong way, or something like that. But there is a rule that says that you cannot put the same subject back up for discussion for six months, unless there are substantial changes. Now, this has happened with scalar type hints, for example, and a few other big ones. If an RFC gets accepted, then, depending on whether there is an implementation, the implementation will get set up as a pull request to the PHP project on GitHub, and then the discussion about the implementation starts. If the implementation doesn't get to the point where it is actually good enough, or if it turns out it can't be implemented without impacting performance, it still might end up failing, or might not get merged. In some cases, it means that a feature will get added at some point, but not necessarily in the PHP version that it was targeted for. I don't actually have an example of that right now. If the implementation is already good and already discussed, it can get merged pretty much instantly, and then it will be part of the next PHP version.

Henrik Gemal 13:08

How many RFCs are voted on every year? And what proportion are voted yes or no?

Derick Rethans 13:16

I don't have the stats for that. But there is a website called RFC Watch, where you can see which RFCs have gone through and which ones have been accepted or not, in a nice kind of graph. I will add a link in the show notes for that. I would guess that during a year, about 50 RFCs are voted on, and I would think that about half of them pass. But that's a guess; I don't have the stats.

Henrik Gemal 13:42

Thank you very much for the answers. It brought me closer to the whole process of PHP development. Do you have any other things to add?

Derick Rethans 13:52

I don't think so at the moment. I think what we should be a bit careful about is that we're getting closer and closer to feature freeze at the end of June. We have just elected the new PHP eight zero release managers, but I'll keep the names secret, because this podcast is recorded in the past. They are going to be responsible now for doing all the organisational work for PHP eight zero. And that also means that feature freeze will happen somewhere at the end of June. And I expect to see a bunch of RFCs coming up with just enough time to make it into PHP eight zero, or not. So that's going to be interesting to see what comes up there. But other than that, I think we have explained most things in the RFC process now. And I thought it was a fun thing for once, somebody else asking the questions and me giving the answers. In the future, I think I would like to do a Q&A session where I have multiple people asking questions about the PHP process. I also thought this was a good experiment, and thanks for taking the time to ask me all these questions today.

Henrik Gemal 15:00

No problem. I love your podcast. I listen to it whenever I bike to work. It's nice to get some insights into the PHP development.

Derick Rethans 15:10

Yeah, and that is exactly why I started it. Thank you Henrik for taking the time this morning to ask me the questions. And I hope you enjoyed it.

Henrik Gemal 15:18

Thank you very much for having me on the show.

Derick Rethans 15:22

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 49: COPA

In this episode of "PHP Internals News" I converse with Jakob Givoni (LinkedIn) about the "Compact Object Property Assignment", or COPA for short, RFC that he is proposing for inclusion in PHP 8.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 49. Today I'm talking with Jakob Givoni about an RFC with a very long name, the compact object property assignment RFC, or COPA for short. Jakob, would you please introduce yourself?

Jakob Givoni 0:39

Yes, my name is Jakob. I'm from Denmark, and I've been programming in PHP for 20 years now. I work as a software engineer for a company in Barcelona that's called Vendo. I got inspired to get involved in PHP internals after I saw you, as well as Rasmus and Nikita, at a PHP conference in Barcelona last November.

Derick Rethans 1:00

That was a good conference; I always like going there. Hopefully, they will run it this year as well. What I'd like to talk to you about today is the COPA RFC that you've made. What is the problem that this is trying to solve?

Jakob Givoni 1:14

Yes, I was puzzled for a long time why PHP didn't have object literals. And I looked into it, and I saw that it was not for lack of trying. Eventually, I decided to give it a go with a different approach. The basic problem is simply to be able to construct, populate, and send an object in one single expression in a block, also called inline. It can be an alternative to an associative array. It gives the data a well-defined structure, because the signature of the data is all documented in the class.

Derick Rethans 1:47

Of course, people abuse associative arrays for these things at the moment, right? Why are you particularly interested in addressing this deficiency, as you see it?

Jakob Givoni 1:57

Well, I think it's a common task. It's something I've been missing, as I said, inline objects, or object literals, for a long time, and I think a lot of people have been looking for something like this. And also, it seemed like an opportunity that was fairly simple to grasp.

Derick Rethans 2:14

What kind of solutions do people use currently, instead?

Jakob Givoni 2:18

I think a very popular one is the associative array, where you define key-value pairs as an array. The problem with that is that you don't get any help on the names of the indexes, nor the types of the values.
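To make the associative-array problem concrete, here is a small PHP sketch; the Address class is a hypothetical example, not taken from either RFC. A misspelled array key is accepted on write and the intended key just reads back as null, while a typed property (PHP 7.4 or later) rejects a value of the wrong type. Note that a misspelled property name on an ordinary class is still accepted at runtime as a dynamic property, so help with names mostly comes from IDEs and static analysers.

```php
<?php
// Hypothetical value object; typed properties need PHP 7.4 or later.
final class Address
{
    public string $street = '';
    public int $number = 0;
}

// Associative array: the misspelled key 'nubmer' is accepted without complaint...
$arr = ['street' => 'Main St', 'nubmer' => 12];
// ...and the intended key is simply missing, so a lookup falls back to null.
$hasNumber = array_key_exists('number', $arr); // false
$number = $arr['number'] ?? null;              // null

// Typed property: assigning a value of the wrong type throws a TypeError.
$addr = new Address();
$addr->street = 'Main St';
try {
    $addr->number = 'twelve'; // a non-numeric string cannot become an int
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;         // the property keeps its previous value, 0
}
```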

Derick Rethans 2:33

I mean, it's easy to make a typo in the name, right? And then it either suddenly exists in the array, if you set it, or you just get a null value back. As you said, there's no way of enforcing the type here, of course. COPA, compact object property assignment, is a mouthful, and it is a new bit of syntax for the PHP language. What is this new syntax going to look like?

Jakob Givoni 2:55

Well, it looks just like when you assign a value to a property, but here you can add several comma-separated lines of property name equals value inside a square bracket block, which comes after the object and the arrow operator. The syntax shouldn't really conflict with anything else we have at the moment.

Derick Rethans 3:17

Because that's becoming more and more of a problem, right? Finding new bits of characters to use for new syntax. It is something that came up with annotations or attributes as well.

Jakob Givoni 3:27

And then people start talking about: does this look like typical PHP? Do you just like this syntax, or do you hate it? It becomes a taste-based thing. For me, the important thing is: if it works, and if it's fairly trivial to implement, I don't have a problem with it.

Derick Rethans 3:43

There was a related RFC earlier in the year, which was called the object initializer RFC. How is your proposal different from that one?

Jakob Givoni 3:51

The object initializer is a new concept. Mine is different in that I didn't want to introduce any new concepts. My approach was focused on pragmatism. In that other RFC, the initialization is done at construction time, and you can kind of do it without even having to define your constructor. And one of the most important aspects of that one was to enforce that all the mandatory properties have been initialised. Because you can have typed properties in PHP 7.4, if they don't have a value, then there is this new state of uninitialized properties. And the author of that RFC wanted to make sure that once the object was fully constructed, it would validate that there was nothing missing there. So it has like six out of seven characteristics in common with mine, and one characteristic that is different. I looked into this issue of mandatory properties, and I didn't find a simple or an obvious way to handle it. I have one idea if COPA should pass, and I have another idea if it fails. I didn't want to include that; it was not part of my main goals.

Derick Rethans 5:01

I'm looking at the syntax here for a bit. And it seems that the way you write this COPA block is: if you have an object, you use the arrow, which is dash greater-than sign, then square brackets, and then the list of properties that you want to assign values to. And the RFC shows that to be equivalent to doing each line manually yourself. Does that mean that it only works for public properties?

Jakob Givoni 5:31

No, it would work also, for what do you call it, virtual properties that don't actually exist, or if they're private, it would just invoke the magic set method in that case. The same thing would happen as if you were to do the assignment line by line as in the example.

Derick Rethans 5:48

Without there being an underscore underscore set method, it means that you can only really set the public properties in that case.

Jakob Givoni 5:56

You won't be able to set private or protected properties directly unless the magic method does that.
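The equivalence described here can be sketched in PHP as it exists today. The AppEvent class is hypothetical, and the COPA form in the comment is only an approximation from the description in this episode; no released PHP version parses it. The point is that each line of the block behaves like an ordinary assignment, so undeclared (or inaccessible) properties go through __set().

```php
<?php
// Hypothetical class: a public typed property, plus __set() for
// properties that are undeclared or inaccessible from the calling scope.
class AppEvent
{
    public string $name = '';
    private array $extra = [];

    public function __set(string $prop, $value): void
    {
        // Anything that is not directly writable ends up in $extra.
        $this->extra[$prop] = $value;
    }

    public function extra(): array
    {
        return $this->extra;
    }
}

$event = new AppEvent();

// Approximate COPA form (not valid PHP):
//   $event->[
//       name = 'user.created',
//       priority = 5
//   ];
//
// Per the RFC, that is equivalent to these assignments, one per line:
$event->name = 'user.created'; // public property: assigned directly
$event->priority = 5;          // undeclared property: routed through __set()
```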

Derick Rethans 6:03

So does that mean that it is pretty much only something that happens in syntax, and it doesn't have any other side effects or any other functionality that you wouldn't already be able to do?

Jakob Givoni 6:15

Yeah, it's just a new syntax for that. The emphasis here was pragmatism. So not introducing any new concepts.

Derick Rethans 6:23

What would use cases for this be?

Jakob Givoni 6:25

Typically, as I mentioned, data transfer objects and value objects. Those simple associative arrays that are sometimes used as argument bags to constructors when you create objects. Some people have given some examples where they would like to use this to dispatch events or commands to different handlers. And whenever you want to create and populate and use the object in one go, COPA should help you.

Derick Rethans 6:58

I suppose COPA would also work for standard class objects?

Jakob Givoni 7:02

It's an object just like anything else. So yeah, yes, there shouldn't be any surprises.

Derick Rethans 7:07

But of course, it doesn't really make a lot of sense to use standard class, because then, of course, you don't have the benefits of checking your property names or types. Are there other use cases you can think of?

Jakob Givoni 7:19

I don't have anything else in mind.

Derick Rethans 7:22

I remember this from quite a long time ago, because this is a subject that comes up quite a bit: people who write PHP code abuse associative arrays so much. The object initializer RFC, as well as your COPA RFC, tries to use objects in a different way, to prevent developers from abusing associative arrays, pretty much as stricter data types. In languages like C, there's a distinct datatype for this, called a struct. Do you think it would make sense, instead of trying to overload our object semantics, to introduce something like the struct concept that C or other statically typed languages have?

Jakob Givoni 8:10

As I understand it, a struct is basically the same thing as what I'm talking about: a structured set of data. However, I'm not sure if it's worth it to introduce a new concept. I don't know if it's necessary, if it's possible to reuse the things that we already have and are familiar with. I think I would prefer what you call overloading the object. I don't see a lot of problems with having an object that is simply a list of properties with values. It's a very basic object. An object doesn't need to have any methods; it's possible to use it like that. Every time we add a new concept, like a struct would be, I feel that it leads to a combinatorial explosion of implications that you later need to assess every time you want to make another change. I haven't seen any RFCs that have specifically mentioned structs, but it is a very related concept.

Derick Rethans 9:08

I'm just asking because I spent a lot of time in C where we have structs. But we don't really have objects or classes to begin with. It's more familiar for me to use that. And the other reason why I was asking is that perhaps it would be possible to create like a slightly more natural syntax, because, in my opinion, I think the one that you currently have chosen isn't particularly the most friendly one, but that's my own opinion here.

Jakob Givoni 9:33

There might be a window of opportunity, because curly brackets after the variable are deprecated as a way of doing array access. So maybe that could be used: just curly brackets, dropping the arrow itself. That would look a lot more like an object, I think, and it would also be shorter.

Right. I mean, PHP 7.4 deprecated these.

So the question is just how soon can we remove it and replace it to mean something else completely?

Derick Rethans 10:03

Yeah, that's a good question. I don't think I have the answer either. I guess it can be introduced as long as syntax that existed previously would now not do something different. And I think you would actually be okay here.

Jakob Givoni 10:15

I'm pretty sure it would throw a syntax error if you try to run this code in a previous version.

Derick Rethans 10:21

I meant if you would reuse the curly braces, because, as you said, they have been deprecated in PHP 7.4.

Jakob Givoni 10:28

I mean, if someone were not to follow the deprecation notice that is now in place, and kept their code as it is, and then we change the implementation, it's better to get a clear fatal error than to have something really spurious happen.

Derick Rethans 10:45

Yes, absolutely, I definitely agree. That's sort of what I was trying to get at, but you explained it more eloquently than I did. The RFC lists a few special cases. It talks about execution order and exceptions. I think somebody brought up what happens if we're trying to set multiple properties through COPA and, say, the second out of three throws an exception. What would be the end state of the object, for example? Could you talk a little bit through that?

Jakob Givoni 11:11

Regarding exceptions being thrown in any of those expressions where you are assigning, it's important to understand that the block of code that is COPA is not an atomic operation. Anything that happened before the exception will still have happened, and anything that would happen after won't happen. Exactly like what you would expect if you were doing it line by line, or if you were using method chaining to do several things on an object. I think it's going to do what you would expect, although for some, I think it might be unintuitive that it's not an atomic operation. It's just important to keep that in mind. That's why I listed it under special cases. And there's something similar with the execution order, in that you can list the properties in any order you like, but it doesn't necessarily mean that you're going to get the same result if you change the order, because you will be able to use the value of a previous assignment in the next one. Again, not 100% intuitive, but I think it might be worth the trade-off in implementation and flexibility.
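Both special cases can be checked with the line-by-line equivalent the RFC describes, since no released PHP version accepts the COPA syntax itself. The Point class is a hypothetical example: a failing second assignment leaves the object half-updated, and a later assignment can read the result of an earlier one, so order matters.

```php
<?php
// Hypothetical class with typed properties (PHP 7.4+).
class Point
{
    public int $x = 0;
    public int $y = 0;
    public int $z = 0;
}

// 1) Not atomic: the second assignment throws and nothing is rolled back.
$p = new Point();
try {
    $p->x = 1;      // succeeds
    $p->y = 'oops'; // TypeError: a non-numeric string is not an int
    $p->z = 3;      // never reached
} catch (TypeError $e) {
    // $p is left half-updated: x is 1, y and z keep their defaults.
}

// 2) Order matters: a later assignment can read an earlier one.
$q = new Point();
$q->x = 2;
$q->y = $q->x * 10; // sees x = 2, so y becomes 20

$r = new Point();
$r->y = $r->x * 10; // x is still 0 here, so y becomes 0
$r->x = 2;
```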

Derick Rethans 12:19

As you mentioned, there are no new semantics in there. Talking a little bit about implementation here: as there is no patch available, is this something that you'd be interested in developing yourself? Or are you looking for somebody else to help you out on that?

Jakob Givoni 12:32

I actually haven't contributed any code before. I'm not familiar with C. But one reason that I chose this RFC and this approach is also that if I can't get any volunteers, I might be able to learn and do it myself. Since it seems like it's mostly a parser syntax thing, I probably should be able to pick that up.

Derick Rethans 12:53

I would also think, because there are no new semantics in here, that it would be something in between: probably just the lexer that we have, the parser, and then constructing an equivalent abstract syntax tree, or AST, segment out of that.

Jakob Givoni 13:12

I would be thrilled to collaborate with someone to do some pair programming in order to get started if anyone is up for it.

Derick Rethans 13:18

So if you're listening to this episode and you want to help Jakob out, why not get in touch with him? His contact details will be in the show notes for sure. The RFC also lists a few things that you have thought about, but that you have decided either not to pick up in the RFC or don't think are in scope. Would you talk about that a little bit?

Jakob Givoni 13:36

There are some special things that you can do at the moment when you assign a value to a property. Things like using a variable to specify the property name, or generating the property name from an expression using curly brackets after the arrow. There's also array access directly on the properties, or increment, decrement, or nested object accesses. I don't think that these things are really essential. I've decided to leave them out of scope for now, unless it's trivial. If it's trivial to implement that as well, it's okay with me. It's not a deal breaker. But you have to do a cost-benefit analysis. And I'm thinking that it could be future scope. If there's a demand, this can be addressed in a later RFC.
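For reference, these are the assignment forms Jakob lists as out of scope for COPA, written in ordinary PHP that works today. The Inner and Outer classes are hypothetical, chosen only to give each form something to act on.

```php
<?php
// Hypothetical classes, only to demonstrate the assignment forms.
class Inner
{
    public int $count = 0;
}

class Outer
{
    public string $title = '';
    public array $tags = [];
    public int $hits = 0;
    public ?Inner $inner = null;
}

$o = new Outer();
$o->inner = new Inner();

$prop = 'title';
$o->$prop = 'Hello';            // variable property name
$o->{'ti' . 'tle'} .= ' world'; // property name built from an expression
$o->tags[] = 'php';             // array access on a property
$o->hits++;                     // increment
$o->inner->count = 3;           // nested object access
```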

Derick Rethans 14:23

The RFC also talks about nested COPA. But it looks so complicated to me that I'm not sure whether it is actually something that we even should add to begin with.

Jakob Givoni 14:34

I don't think it's as complicated as it looks. You can already do nested COPA if you create a new object inline, as of course you can assign it to a property in the outer scope of the COPA. But if you want to set just one property of a nested object, then you cannot do that directly. Well, you can actually do it if you access the previous one, because you have access to the current property when you do the assignments. So you can see in my example that you can do it. But there might be a better syntax for doing that.

Derick Rethans 15:11

I'm happy to see that there's no backward incompatible changes. So that's always a win. What has been the feedback so far?

Jakob Givoni 15:17

Yeah, the feedback has been a mixed bag, so to say. There's some recognition that this has potential to be a useful feature. There is critique of the syntax, as you also mentioned, and then of the missing functionality, like the mandatory properties and atomic operations. And then of course, named parameters always come up. The PHP internals list: it's a tough crowd. I really enjoy being engaged in this project, so I don't mind; it's part of it. I also really like the side discussion that we're having currently about ways to improve the way that we collaborate and make progress, especially on tough issues.

Derick Rethans 15:58

That has definitely improved over the last five years to a decade, but it can always be improved more, I would say. What is your end goal with this RFC? I guess you would like to see this added to PHP at some point. Are you targeting it for PHP eight?

Jakob Givoni 16:13

I would be extremely proud to see this added to PHP at some point. And if it can make it into PHP eight in the first release, that would be awesome. That's at least what I'm going for, for now.

Derick Rethans 16:25

The PHP project is looking for release managers for PHP eight zero, with feature freeze happening somewhere at the end of June. So there's less and less time available for doing these things. So I'm curious to see where this ends up.

Jakob Givoni 16:39

It's a race against time at the moment.

Derick Rethans 16:42

But that's always the case, isn't it? I think it'll be interesting to see if somebody wants to help out with the implementation of this, or rather, I'd be interested to see whether you'd be able to pick that up yourself, actually. We can always do with more people that work on the PHP language. Do you have anything else to add yourself?

Jakob Givoni 17:00

I'd say that I spent a lot of effort researching and writing this. And I just hope that people will study the RFC properly and keep an open mind. I know it's probably going to be a hard sell, and that's okay. I just wanted to give it a go. And this is just the beginning of my contributions, I hope.

Derick Rethans 17:19

I spoke with Mate a little bit a few episodes ago. He was getting worried about it not getting accepted at some point. And I pointed out to him that scalar type hints took about a decade and seven attempts to finally make it into PHP. So it helps to just persist, I would say, at times.

Jakob Givoni 17:37

Times change and also you get new ideas and you evolve.

Derick Rethans 17:42

The language continues to improve and that's how I like it. Thanks, Jakob for taking the time to talk to me today. It was interesting to see what you're up to.

Jakob Givoni 17:51

My pleasure. Thank you so much Derick for having me.

Derick Rethans 17:56

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 48: PHP 8, JIT, and complexity

In this episode of "PHP Internals News" I discuss PHP 8's JIT engine with Sara Golemon (GitHub).

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 48. Today I'm talking with Sara Golemon about PHP 8 and JIT. Sara, would you please introduce yourself?

Sara Golemon 0:33

Hi there, everybody listening to the PHP internals podcast. I'm Sara. I've been on this podcast before, but in case you're just getting here for the first time, welcome to the podcast. You have a nice backlog to go through. I am a lapsed web developer, come database security engineer by day, and an opinionated open source dev slash PHP 7.2 release manager by night, and also day. I've been involved with the project for about 20 years now, off and on. Somehow I just keep coming back for more punishment.

Derick Rethans 1:03

We're leading up to PHP 8, with lots of new features being added. But one of the biggest things in PHP 8, which I've spoken about on the podcast before, all the way back last year in Episode 7, is that PHP eight is going to get a JIT engine. Would you care to explain what a JIT engine does again?

Sara Golemon 1:20

Well, I'm going to give you the short, you-can-look-this-up-on-Wikipedia-in-two-seconds definition of JIT: it means just-in-time compilation. That doesn't really tell you much, unless you look at it next to the other half of that, AOT, or ahead-of-time compilation. AOT is what you expect from tools like GCC: you've got C or C++ source code, and you build an application out of it; that's ahead of time. JIT is saying, well, let's take the source for an application, and let's just run with it. Let's just start executing it as fast as we can. And eventually we're going to get down to some compiled code that's going to run a little bit quicker than the initial stuff did. PHP already has this nice little virtual machine built into it. We call it the Zend engine. That takes your script and immediately just says: All right, well, what does this say in computer terms? Well, in computer-readable terms, it's a series of these opcodes, which are also called bytecodes in other languages, that give you instructions like: run this type of instruction at this time and get something done. The PHP runtime interprets that one instruction at a time, basically pretending to be a CPU. This works quite well; it runs quite efficiently. But there's still this sort of bottleneck in the middle there, of a program pretending to be a CPU running on top of a CPU in order to run other code. The idea of JIT is that this thing sitting in the middle is going to gradually figure out what your program really is trying to do and how it's intended to run, and it's going to take those PHP instructions and turn them all the way down into CPU instructions, so that it can get out of the way and let the CPU run your code natively, as if it had been written in a compiled, AOT kind of language. What that actually means for execution of PHP code in PHP 8 is still sort of a question that's left to be answered here. I listened to your interview with Zeev. 
Episode 7 is a good episode for getting some good information on that. We definitely agree on what the status of the JIT within PHP is right now. There are objective facts, like how much work has been done, largely by Dmitry, and where we can kind of expect to see the best gains come from. I personally think I might be a little bit more pessimistic than him in terms of the actual performance impact we get out of it. I think we both recognise we're not going to see the two-to-one kind of improvements we saw from five to seven. Nobody's realistically expecting that. But if you look at the demo that Zeev ran a few months ago, where he shows the Mandelbrot set being generated in two different PHP requests and then sent over a WebSocket to a nice pretty display, it's a very visceral reaction, because you can see one Mandelbrot set being calculated much, much faster than the other. He acknowledges, though, that this is not realistic PHP code; nobody's writing the Mandelbrot calculation in PHP. We can see that under certain workloads it's definitely getting faster. But for PHP's core mission, which is web serving, I mean, we both know that it's not going to be massively faster. I think it's going to be almost imperceptibly faster.
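The "program pretending to be a CPU" idea can be sketched as a toy stack-based interpreter in PHP: a loop that fetches one instruction at a time and dispatches on it. The opcode names and the run_toy_vm() function are hypothetical and far simpler than the Zend engine's real opcodes; a JIT's job is to turn hot sequences of such instructions into native machine code so that this dispatch loop gets out of the way.

```php
<?php
// A toy stack-based bytecode interpreter. The opcodes are hypothetical
// and much simpler than the Zend engine's.
function run_toy_vm(array $program): int
{
    $stack = [];
    foreach ($program as [$op, $arg]) {
        // Fetch the next instruction and dispatch on its opcode.
        switch ($op) {
            case 'PUSH':
                $stack[] = $arg;
                break;
            case 'ADD':
            case 'MUL':
                $b = array_pop($stack);
                $a = array_pop($stack);
                $stack[] = $op === 'ADD' ? $a + $b : $a * $b;
                break;
            default:
                throw new InvalidArgumentException("Unknown opcode: $op");
        }
    }
    return array_pop($stack);
}

// Bytecode for (2 + 3) * 4.
$result = run_toy_vm([
    ['PUSH', 2],
    ['PUSH', 3],
    ['ADD', null],
    ['PUSH', 4],
    ['MUL', null],
]);
```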

Derick Rethans 4:41

One question from my side: the Mandelbrot set, the implementation of that is all in a specific function, right? And it's all CPU-heavy code, not IO.

Sara Golemon 4:51

Yes.

Derick Rethans 4:52

And it's all in the same function.

Sara Golemon 4:54

Yes.

Derick Rethans 4:55

Now, what I was thinking of the other day is how this interacts with calling standard library functions, because the JIT engine is going to have to step out of running things directly on the CPU and call things that are implemented in C to begin with.

Sara Golemon 5:10

So you're asking that question because you already know some of the pitfalls of JIT, and you're leading me into it. And that's fine. When a JIT emitter is taking the language that it's emitting, so PHP, as long as it remains within the scope of PHP, it can sort of keep track of where it's at. It's like: Okay, I know this variable's an integer, because I saw it get set. I know that this is going on here. I know that's going on there. And it can carry those assumptions around as it's emitting code, and emit very efficient code that doesn't need a whole bunch of double-check guards like: Wait, is this still an integer? Wait, is that still a string? All of these sort of escape hatches for when things go wrong. Anytime you cross over into, I will say, C-land, or internals land, or ahead-of-time compiled land, it's basically calling into what it sees as a black box. And it just says: Okay, here's some data, I know the types going in, have fun with it. And something, air quotes, happens with that code, and the black box spits out an answer. Well, by the time the black box has spit out the answer, the JIT that has taken that PHP code no longer knows if any of its assumptions are true or not. It just has to say: Well, time to start from scratch, time to keep track of where we are from here, build up a new set of assumptions. So we get this speed bump in the road of executing code. And it turns out most PHP applications are using a whole lot of those internal APIs, because they're quite useful. There is a kitchen sink in PHP, and it does stuff. So you have these repeated hits of this road bump happening, and that's not great. If we want to compare this to other JIT languages that are out there, I might suggest we compare this to HHVM, because of course HHVM, at least in the beginning, implemented a fairly close cousin of the PHP dialect. It has since diverged much more and become hacklang. 
But it was doing the same thing: taking PHP code, running it natively on the CPU, and occasionally having to cross over into its own version of internals, where it was running C++ code. One of the ways they reduced the number of those jumps is that they took a lot of those internal functions, the ones that didn't need to do anything particularly internals-ish, and just rewrote them in PHP code. And if you look at the HHVM source code right now, there is a big directory called systemlib, and that's a whole bunch of hacklang code, read it as PHP code, that implements a lot of these very common, quote unquote, internal functions. We just had an RFC for a function called str_contains(). That is a function that could a hundred percent have been written just as PHP code. Somebody could have thrown that onto Packagist. For the record, I voted against it because of exactly that. I think you should put that on Packagist and just add it to your composer.json. It's okay. It's gonna pass anyway; it got a lot of votes. That aside, that is the sort of function that, if we were putting it into an 8.x version of PHP where we did have our own type of systemlib, we would probably have just said: let's write that as PHP code. That way the JIT, when it enters that function, can keep all those assumptions intact, and potentially even inline some of those instructions and avoid the function call entirely. That's basically taking all of the instructions that are part of, in this case, the str_contains() function, and implementing them within the scope of the function that was calling it. So you skip that entire function call overhead, which a lot of people know is still one of PHP's weaker points in terms of where the fat to trim is. As Zeev said in Episode Seven, we still have some parts of PHP that are a bit slow, irrespective of a JIT.
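To illustrate Sara's point, str_contains() really can be written in a few lines of plain PHP. The name userland_str_contains() is hypothetical, picked so it does not clash with the built-in function that ships with PHP 8.0. The explicit empty-needle check matches the built-in, which returns true for an empty needle, and it also sidesteps strpos() warning about an empty needle on PHP 7.x.

```php
<?php
// A userland equivalent of str_contains(), in plain PHP.
function userland_str_contains(string $haystack, string $needle): bool
{
    // Every string contains the empty string.
    if ($needle === '') {
        return true;
    }
    // strpos() returns an int offset (possibly 0) or false when not found.
    return strpos($haystack, $needle) !== false;
}
```

The strict !== false comparison matters: a match at offset 0 would otherwise be treated as "not found".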

Derick Rethans 8:50

There are actually a few functions that have been inlined into opcodes now. strlen() is an example of this: instead of being a function call, it's now directly an opcode, because it is a function that is used so much, and that actually gained a bit of performance there.

Sara Golemon 9:05

Yeah, I think all of these functions as well are just a single opcode for a type check. Yeah.

Derick Rethans 9:10

There's a whole bunch of them, for sure. I saw earlier this morning that Dmitry proposed another branch in which he implemented a tracing JIT instead of the JIT that we already have, and I have no idea what the difference is between a normal JIT engine and a tracing JIT engine.

Sara Golemon 9:25

Ultimately, the distinction is not that important to end users. It's going to function the same, but it is a sort of internal implementation detail. HHVM's, by the way, is a tracing JIT. It basically looks at any given unit of work that it needs to translate, let's say a function, and it asks: what are the pieces that are non-branching? Let me look at each of the non-branching pieces, and let me create a version of that translation based on the types that I expect to be going in there. If the types fail, I'm gonna have to create a new version of that piece. But then that piece can plug into this sort of chain of tracelets to create a full function. Most of the time, especially if you've written code that is well type-hinted, you've got, you know, strict types turned on, you've got all of your types on the function parameters set, it's very easy for the JIT to infer the types out of what you've put into your function. You're only ever going to need to create a single tracelet for any given section, and your full trace is going to be a single, unbroken chain of: do this, do this, maybe do a jump to another spot, just keep doing this, doing this, doing this. If you have, let's say, slightly messier code, maybe you're not using any kind of type hinting, it becomes very difficult to infer any of the types, because there are lots of different call sites that are doing lots of different things. We may end up having some functions that have multiple tracelets per body section, which get built into a giant bush of interconnected edges. That's less ideal in terms of maximising performance, but it still at least functions.
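The "one version per set of types, with a guard in front" idea can be modelled in plain PHP, purely as an analogy. TraceletCache and its methods are hypothetical; nothing here reflects how HHVM or PHP's JIT are actually implemented, and no machine code is generated. The argument types act as the guard: a known type signature reuses an existing specialised version, and an unseen one triggers a new "translation".

```php
<?php
// Conceptual model only: one "specialised version" of an addition per
// combination of argument types, selected by a type guard.
final class TraceletCache
{
    /** @var array<string, Closure> specialised versions keyed by type signature */
    private array $versions = [];
    public int $translations = 0;

    public function add($a, $b)
    {
        // The guard: describe the types actually seen at this call.
        $key = gettype($a) . ',' . gettype($b);
        if (!isset($this->versions[$key])) {
            // Guard miss: "translate" a new version for these types.
            $this->versions[$key] = $this->specialise($key);
            $this->translations++;
        }
        return ($this->versions[$key])($a, $b);
    }

    private function specialise(string $key): Closure
    {
        switch ($key) {
            case 'integer,integer':
                return static fn (int $a, int $b) => $a + $b;
            case 'double,double':
                return static fn (float $a, float $b) => $a + $b;
            default:
                // No type assumptions at all: the generic, slower path.
                return static fn ($a, $b) => $a + $b;
        }
    }
}

$cache = new TraceletCache();
$r1 = $cache->add(1, 2);      // int,int seen for the first time: builds a version
$r2 = $cache->add(5, 7);      // int,int again: reuses it
$r3 = $cache->add(1.5, 2.25); // float,float: new types, builds another version
```

Well-typed call sites keep hitting the same entry, which is the analogue of Sara's single unbroken chain of tracelets; call sites with mixed types keep adding entries.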

Derick Rethans 11:06

We have spoken a little bit about what a JIT engine is and sort of how it works. It sounds quite complex and complicated.

Sara Golemon 11:14

It is definitely complicated. And I feel like that's another leading question, so I'll just run with it.

Derick Rethans 11:19

I've also got to say my next leading question... Maybe I should actually ask the question?

Sara Golemon 11:24

Well, let's actually take a step back from the JIT for a second, and look at where the engine is right now. The engine is basically two very large pieces: the extension library of all of the runtime functions, everything you see exposed in user space, and the actual scripting engine. There are some other smaller pieces, but those are the two really big ones. A whole lot of people pay a whole lot of attention to the extension piece, because that's the flashy bit. That's the part that gives you some bit of binding that you didn't have before, or some bit of functionality that can be delivered out of the box as part of that kitchen sink. And that definitely needs attention; I'm glad that it continues to evolve. But the scripting engine is the piece that defines syntax and how code is actually going to run.

Derick Rethans 12:09

Reading extension code is a whole lot easier than reading the engine code.

Sara Golemon 12:13

And that's where I was going to go with that, yes. If you look at the code that's under ext, you can even come into that code without knowing any C at all, and you can actually make pretty good sense of a lot of it, because PHP uses a whole lot of macros. So every function is literally defined with a macro that says PHP_FUNCTION, like: right here, PHP function. Every class method: PHP_METHOD, here's the class name, here's the method name. These are pretty clear sort of APIs, very small bite-sized pieces for the most part. The bits that involve defining a class and how it does its memory management get a little bit more complicated, but I think on the whole extension code is far more accessible. If you go and look at the engine, particularly the runtime pieces of the engine, although the compiler is complex as well, you have to do a lot of digging before you even get to a point where you can see how the pieces maybe start to fit together. You and I have spent enough time in the engine code that we know where to look for a particular thing. Let's say that opcode you mentioned that implements strlen(): we know that zend_vm_def.h has got the definition for that. We also know that that file is not real code; it gets pre-processed into the code that is actually built later on. Somebody coming to that blind is not going to see a lot of those pieces. So there's already this big ramp-up just to get into the engine as it exists now in 7.4. Now add JIT on top of that. You've got code that is doing control flow graphs, and static single assignment analysis, and finding these tracelets, and making sense of the code at a higher level than a single instruction at a time, and then distilling that down into instructions that the CPU is going to recognise. And CPU instructions are these packed, complex things that deal with immediates, and indirects, and indirects of indirects, and registers.
And the x86 call ABI is a ridiculous thing that nobody should ever have to look at. So you add all this complexity, which, by the way, sits in ext/opcache. It's all isolated to this one extension that reaches into the engine and fiddles around with things to make all this JIT magic happen. You're going to take your reduced set of developers who know how to work on the Zend engine, and you're going to reduce that further. I think at the moment it's still only about three or four people who actually understand how PHP's JIT is put together well enough that they can do any effective work on it. That worries me for sure. I don't think that's an insurmountable hill to climb, especially if we can start getting some documentation written about it, at least from a high-level point of view: hey, you know, look over here to find this stuff, look over here to find that stuff. Something to get started, so the people who have at least a basic understanding of how the VM part of the Zend engine works can sort of upgrade their knowledge to get into the JIT. I only think that's worth it if we actually get a real performance boost out of the JIT. If we actually turn the JIT on, and we see that for PHP's core workload, which is web serving, we're only seeing a one to 2% gain, for me that's not enough. It may be enough for others. But for me, I would call that experiment not a failure, but a non-success at that point. Certainly there are people out there who are still going to want to use it, because they're doing command line applications, and they're doing complex math. And I'm not saying we can't have it. I'm just saying it takes less of a forward stage at that point.

Derick Rethans 15:43

Somebody mentioned earlier in the chat room that it's also another set of potential bugs, right?

Sara Golemon 15:48

It is definitely another set of potential bugs.

Derick Rethans 15:51

It's pretty much another implementation of the syntax bits of PHP.

Sara Golemon 15:57

So if you run an application and you get behaviour you don't expect, where is that behaviour actually coming from? You can spend a lot of time looking in the Zend engine because you're thinking: Oh, well, this is the thing that executes opcodes. And when I run it on the command line, it's definitely going through this bit of code, and it works in a single command-line run. But at the twentieth request on my web server, it's not working. Why is that happening? Well, it turns out it's happening because that's when the JIT has finally kicked in, because it has enough information, and it's running through this tracelet that was just a little bit wrong. And well, crap. You mentioned, I think, at one point when we were talking in Miami just a couple of months ago, that you're just gonna have to turn the JIT off entirely when Xdebug is running.

Derick Rethans 16:41

Just like I already turn OPCache optimizations off, because they're just too confusing for people.

Sara Golemon 16:46

It's confusing and complex, but it also may not even be 100% possible, because we are right down at the bare metal of running CPU instructions. There's not a lot of opportunity to just say: Oh, hold on Mr. CPU, let me just take a look at your registers right now. Okay, this is okay, let's go ahead and keep going now. The VM that we have now in Zend lends itself 100% to those kinds of activities; the CPU does not. What that means is that what we experience in development mode with Xdebug running is not going to be exactly the same thing that we experience in real runtime code. And I don't know if we have a solution for that.

Derick Rethans 17:23

As far as I know, there's no solution for it at all.

Sara Golemon 17:26

I was trying to hedge it in the hope that maybe we could someday have a solution for it.

Derick Rethans 17:30

It'd be lovely, but I can't see that happening, to be honest. I think it's going to be important to find out how much this actually benefits real-life code. How does it benefit your Laravel project or your Symfony project or anything like that? I think it's going to be hard now to make a case for not shipping PHP 8 with a JIT; I think that'd be a bit unfair. But on the other side, if, as you say, it only really gives you one or 2%, is it worth having the additional complexity, the additional maintenance burden, as well as another opportunity for bugs that are a lot harder to reproduce? Is it actually worth having it at all?

Sara Golemon 18:11

I definitely don't want to poopoo on the JIT effort.

Derick Rethans 18:14

Oh, no, absolutely not.

Sara Golemon 18:15

I think this is an important experiment to run. And I think if 8.0 as a whole winds up being a sort of public beta experiment of it, that will definitely give us a lot of good information. And I am super hopeful that we see better percentages, that we see 5, 10, maybe even 15%.

Derick Rethans 18:31

Absolutely.

Sara Golemon 18:32

I want to be guarded in how I talk about it on a podcast like this, because I don't want anybody to say: Oh, 8's gonna be great, our code is gonna run 10 times as fast as it was running before. No, that's not gonna happen. 2x is not gonna happen. We're talking much lower numbers than that. Be guarded, be hopeful, but 8.0 is going to be, as I said, that sort of public beta experiment.

Derick Rethans 18:55

I think that's great. I think it's good to run this experiment again, because a similar experiment was, of course, run during the PHP 5.6 days, in the run-up to PHP 7. Originally, what became PHP 7 was PHP with a JIT engine. And then Dmitri and others found out that there were so many other things that could be done to make PHP run pretty much twice as fast.

Sara Golemon 19:16

Yeah, there was a lot of really low hanging fruit.

Derick Rethans 19:19

Yep. And that was great to see. I am apprehensive about people thinking that the JIT engine in PHP eight is going to give a similar performance boost.

Sara Golemon 19:29

We'll see. Nothing to say about it, but then: we'll see.

Derick Rethans 19:32

What I would suggest is that if you're interested in seeing what this can do for your projects, you should go try it out. Download PHP's master branch, enable it, and see how it goes.

Sara Golemon 19:41

And of course, make sure you are running on x86 hardware. I doubt very much that he's bothered to put more than one back end on this.

Derick Rethans 19:48

I don't actually know.

Sara Golemon 19:49

I haven't looked. He might be using some helper library for it. So it's possible that we're hitting multiple backends. But this is probably going to be an x86 only thing and possibly a Linux thing. I should find out the answer to that question.

Derick Rethans 20:00

So should I. Okay, Sara, thanks for taking the time this morning to have a chat with me about PHP 8's JIT efforts.

Sara Golemon 20:08

It's fun as always, I always love to speak with you Derick. You bring a bright Corona of sunlight to my day.

Derick Rethans 20:16

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 47: Attributes v2

In this episode of "PHP Internals News" I chat with Benjamin Eberlei (Twitter, GitHub, Website) about an RFC that he wrote, that would add Attributes to PHP.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 47. Today I'm talking with Benjamin Eberlei about the attributes version 2 RFC. Hello, Benjamin, would you please introduce yourself?

Benjamin Eberlei 0:34

Hello, I'm Benjamin. I started contributing to PHP in more detail last year with my RFC on the DOM extension. And I felt that attributes were the next great, or bigger, thing that I should tackle, because I would really like to work on this, and I've been working in this sort of scope for a long time.

Derick Rethans 0:58

Although the RFC is titled attributes version two, there was actually never an attributes version one. What's happening there?

Benjamin Eberlei 1:05

There was an attributes version one.

Derick Rethans 1:07

No, it was called annotations?

Benjamin Eberlei 1:08

No, it was called attributes. There were two RFCs. One was called annotations, I think it was from 2012 or 2013. And then in 2016, Dmitri had an RFC that was called the attributes, original attributes RFC.

Derick Rethans 1:25

So this is the version two. What is the difference between attributes and annotations?

Benjamin Eberlei 1:30

It's just naming. Essentially, different languages have this feature, which we'll probably explain in a bit. In Java, it's called annotations. In languages that are maybe closer to home for PHP, so C#, C++, Rust, and Hack, it's called attributes. And then Python and JavaScript also have it, but it works a bit differently, and it's called decorators there.

Derick Rethans 1:58

What are these attributes or annotations to begin with?

Benjamin Eberlei 2:01

They are a way to declare structured metadata on declarations of the language. So in PHP or in my RFC, this would be classes, class properties, class constants and regular functions. You could declare additional metadata there that sort of tags those declarations with specific additional machine readable information.

Derick Rethans 2:27

This is something that other languages have. And surely people that use PHP will have done something similar already anyway?

Benjamin Eberlei 2:35

PHP has this concept of doc block comments, which you can access through an API at runtime. They were originally, I guess, added to support the PHP doc project, which existed at that point, to declare types on functions and everything. So this goes way back to the time when PHP didn't have type hints, and everything had to be documented everywhere so that you at least roughly have an idea of what types would flow in and out of functions.

Derick Rethans 3:07

Why is that now no longer good enough?

Benjamin Eberlei 3:09

Essentially, userland developers put metadata in doc blocks, and you could access them through an API. We had, or we still have, two standards that use this: the documentation standard coming from the phpDocumentor community, and then the mostly runtime use case that exists now, which is covered by the Doctrine Annotations library, which, incidentally, I have also worked on a lot. It is used, for example, by the Symfony community, by the Drupal community, and by a few other, smaller communities that wanted to go in the direction of using annotations, or in this case attributes.

Derick Rethans 3:53

What would Doctrine use an annotation for?

Benjamin Eberlei 3:55

I said before that annotations add metadata to declarations. So let's say you have in your code, for example, classes that you want to store in the database. So you need to map PHP classes to database tables and back. Usually, you would do that using some kind of configuration, and configuration can take many forms. The easiest way would be to write this in PHP: say, this is the column name, this is the field name, this is the class name, and then store and use this information. Or you can store this in ini files, YAML files, or XML files. The problem with this kind of approach is often that you have the configuration file and you have the class, and they are totally separate from each other, usually in very different places of the codebase. This is not some kind of configuration that is fluid. It's very, very static configuration that depends on the class, and it will not really change unless the class also changes. So changes are usually done together. In this case, it might make sense to put the configuration onto the class, because then you see the declaration, and you see it's configured in some way. Then you can more easily understand that changes affect each other, and it leads to fewer mistakes, in my opinion. It also makes it a little bit more obvious that the class is used in some configured way.
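For readers who haven't used the doc-block approach, here is a minimal sketch of it, assuming a Doctrine-style `@Table(name="...")` tag: the mapping metadata sits in a comment on the class and is read back at runtime with `ReflectionClass::getDocComment()`. `User` and `readTableName()` are hypothetical names, and the regular expression is a deliberately naive stand-in for the full parser that a library such as Doctrine Annotations provides.

```php
<?php
// Doc-block metadata, read back through the Reflection API.
// With OPcache enabled, doc comments are only kept while
// opcache.save_comments is on (its default).

/**
 * @Table(name="users")
 */
class User
{
}

// Hypothetical, deliberately naive reader: extracts a single
// @Table(name="...") tag with a regular expression. Real libraries
// such as Doctrine Annotations use a proper parser instead.
function readTableName(string $className): ?string
{
    $docComment = (new ReflectionClass($className))->getDocComment();
    if ($docComment === false) {
        return null; // the class has no doc comment at all
    }
    if (preg_match('/@Table\(name="([^"]+)"\)/', $docComment, $matches) === 1) {
        return $matches[1];
    }
    return null; // doc comment present, but no @Table tag in it
}

echo readTableName(User::class), "\n"; // users
```

Everything here is just a string convention, which is exactly why two libraries can read the same tag differently.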

Derick Rethans 5:26

We've had a quick look at what annotations are. The RFC introduces them in a different way. How are the attributes that you're now proposing different from the doc block comments?

Benjamin Eberlei 5:37

The idea is that we introduce a new syntax that is independent of the doc block comments. Essentially, before each declaration, you can use the less-than symbol twice, then the attribute declaration, and then the greater-than symbol twice. This is the syntax I've reused from the previous attributes RFC, and Dmitri at that point used the syntax from Hack. It makes sense to reuse this not because Hack and PHP are going in the same direction any more, but because when Hack introduced it, they had the same problem of which symbols are actually still easy to use. We do have a bit of a problem in PHP with the kind of free symbols that we can still use in certain places, and less-than and greater-than are easy to parse at this point. There are a bunch of alternatives, and one thing that I will probably propose is an alternative syntax where we start with a percentage sign, then an opening square bracket, and then a closing square bracket. This is more in line with how Rust declares attributes, although Rust uses the hash symbol, which we can't use because it starts a comment in PHP.

Derick Rethans 6:54

And you don't want to use emojis.

Benjamin Eberlei 6:55

Some crazy people proposed using emojis, which would easily work in PHP, but I guess it would be hard to remember the number to get the Unicode character.

Derick Rethans 7:06

Between the two opening less-than signs and the two greater-than signs that close it, what's in the middle?

Benjamin Eberlei 7:12

You declare an attribute name, and then you sort of have an opening and a closing parenthesis to pass optional arguments. You don't have to use them, so you can use only the attribute name if you sort of want to tag something: this is a validator, or this is an event listener, whatever you come up with to use attributes for. But if you need to configure something in addition, then you can use them. The syntax sort of looks like you would construct a new class, except that you don't have to put the new keyword in front of it.

Derick Rethans 7:45

It looks like function arguments pretty much.

Benjamin Eberlei 7:47

Yes, exactly. Yeah.

Derick Rethans 7:48

What kind of values can you use in the optional arguments to the attributes?

Benjamin Eberlei 7:53

The attributes are not really runnable code in a way. Since they are declarations, they don't allow arbitrary PHP code to run there. What is obviously allowed are simple literal values, so a number, a fixed string, a fixed array declaration, and all these kinds of things. What is also possible is exactly the same expressions that you can declare in class constants. So, in class constants, you can do simple mathematical expressions, and you can reference other constants. This is something that will be very interesting for attributes, to reference class names, for example.

Derick Rethans 8:34

What happens if you define an attribute on a declaration element?

Benjamin Eberlei 8:38

What happens is that while the PHP script gets compiled, it will see that there are attributes declared, and it will parse the attributes and, similar to the doc block, store them on the internal structure for future reference. In my current proposal, attributes are parsed in a way that you can have every attribute just once. This is something that is still under heavy discussion, because there are a few good ideas for why you would need two, or multiple. Essentially, similar to how a doc block is stored as a string, we then store an array, which represents the attributes belonging to the class or the function or the constant. This is something that the engine stores, and it also stores it in OPCache.

Derick Rethans 9:27

How would you access these attributes?

Benjamin Eberlei 9:28

Attributes are accessed through the Reflection API, which also allows access to doc blocks. For attributes, there would be a new method called getAttributes(). It returns a list of all attributes using a new reflection class called ReflectionAttribute. There you can access what name this attribute has and what arguments are passed. And this goes into one of the next features of this RFC proposal: you can also ask it to return this attribute as an object instance.

Derick Rethans 10:05

An object instance of which class though?

Benjamin Eberlei 10:07

Attributes, and this is something that is different from the version one attributes RFC, resolve their names to class names. That means if you declare an attribute, for example Foo, and you have an import for a class MyApplication\Foo, then during parsing the attribute will be resolved to MyApplication\Foo. It uses the same mechanism for class resolving that is used in every script. It respects the use statements that are declared in the file, and you can use namespaces and namespace operators to reference the attributes as well.

Derick Rethans 10:49

These are attributes, not classes, so I don't quite see what the link between the attribute names and the classes is?

Benjamin Eberlei 10:55

One problem with the original doc block based system was that there are conflicts between attributes of different systems. One library would have a type annotation, or a var annotation, and some other library would also use it. This could lead to conflicts if the syntax for them was slightly different: multiple parsers would use the same attribute and parse it differently, and this could lead to errors. One problem that was mentioned for the initial attributes RFC, and that I think some voters also used as a reason for voting no, is that there was no namespacing, which means that different libraries could clash in their use of attributes. My idea was: we already have classes, we have namespacing, we can resolve this by using this mechanism. You declare an attribute, and an attribute always resolves to a class. In the best-case scenario, you would also declare this class in your code. Essentially, the attribute is not just an attribute, but a special class that represents an attribute. This is also shown in the code by having an additional interface, a sort of marker interface, that attributes can implement to make it obvious that they are used as an attribute.

Derick Rethans 12:19

You mentioned that you could access the attributes through reflection API, and you can get them out as an object?

Benjamin Eberlei 12:25

Yes, this is why I mentioned before that the syntax sort of looks like constructing a new object, but without the new keyword. When you access the objects through the Reflection API, it would essentially instantiate the class, and all the arguments that you put into the attribute declaration are passed into the constructor of the object. This is why the connection is there between a class and an attribute: it directly goes to instantiating the attribute as an object using the arguments, and giving the developer access to them.
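To make the class/attribute link concrete, here is a small sketch. Note that it does not use the syntax discussed in this episode: the RFC at the time proposed `<<...>>` and a `getAsObject()` method, whereas PHP 8.0 as released uses `#[...]`, names the method `ReflectionAttribute::newInstance()`, and requires the attribute class itself to be marked with `#[Attribute]`. The sketch uses the released PHP 8.0 form; `Table` and `User` are hypothetical names.

```php
<?php
// Requires PHP 8.0+ (released #[...] attribute syntax, constructor promotion).

// Hypothetical attribute class. #[Attribute] marks it as usable as an
// attribute; its constructor receives the arguments from the declaration.
#[Attribute]
class Table
{
    public function __construct(public string $name)
    {
    }
}

// The argument is a constant expression, just like a class constant initialiser.
#[Table('app_' . 'users')]
class User
{
}

$reflection = new ReflectionClass(User::class);

foreach ($reflection->getAttributes() as $attribute) {
    // Name and arguments are available without instantiating anything.
    echo $attribute->getName(), "\n";                     // Table
    echo implode(', ', $attribute->getArguments()), "\n"; // app_users

    // Only here is the attribute class instantiated: new Table('app_users').
    $table = $attribute->newInstance();
    echo $table->name, "\n";                              // app_users
}
```

Passing a class name, as in `getAttributes(Table::class)`, restricts the result to attributes of that class.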

Derick Rethans 13:00

Does it only do something like this when you use getObject() on the reflection attribute? Or is it also possible that I don't care about these class things whatsoever, and I can just get a list of attributes and the optional values that are associated with them?

Benjamin Eberlei 13:16

You don't have to have a class, and class name resolving in PHP is independent of classes actually existing. The attributes RFC respects that. You can just import something that is not a class and use an import statement to shorten the attribute usage, or you can use the absolute namespace syntax to put a fully qualified attribute name into your code, and it wouldn't fail. The failure would only happen when you call the method on ReflectionAttribute to get the attribute as an object. This is something where the RFC is also in flux and about to change. The first version mentioned that attributes will always be autoloaded when they are declared, at compile time. This would essentially treat attributes similar to base classes or interfaces, in that they are always resolved and always checked. However, this is a little bit overkill for userland attributes, and a lot of feedback was that this should only happen when the Reflection API is used. So I'm going to change this. One thing that we do need to handle in some way is built-in attributes. One reason why I want to add this RFC is that there are a few use cases coming up in PHP itself that could benefit a lot if we had built-in attributes. We don't have a clear path forward there yet, but Nikita has published his ideas on editions, so there are some paths forward to having PHP code work slightly differently depending on what developers want. Attributes could be helpful there. Another example is the JIT: it has features where at the moment you can use doc block comments to declare methods as always JIT-able or never JIT-able. Dmitri used doc block comments to check for a JIT or no-JIT tag in there. This is essentially something that attributes should be used for, because it should be machine readable. Then there's a lot of other stuff, which for example Rust also put forward, that PHP is struggling with: conditional declarations of functions.
For example, Symfony has a polyfill library that adds functions that exist in higher PHP versions, re-implementing them so that they're also available in lower versions where they don't exist in core. There are a lot of hacks around the conditional declaration of functions and classes that make it difficult for OPCache to actually cache the files. I believe there are even more problems if you use these kinds of hacks with preloading. Essentially, what could be done with attributes would be something like: conditionally declare this function only if it's on PHP 7.3 or lower.
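The lazy failure Benjamin describes is also how PHP 8.0 ended up behaving: an attribute name is resolved like any other class name, honouring `use` statements, but nothing has to exist until you ask for an instance. In this sketch `Acme\Meta\Cached` is a hypothetical name that is never declared anywhere, and the released `#[...]` syntax and `newInstance()` method are used rather than the draft syntax discussed in the episode.

```php
<?php
// Requires PHP 8.0+. Acme\Meta\Cached is a hypothetical, never-declared name.
namespace App;

use Acme\Meta\Cached;

#[Cached]
function expensive(): int
{
    return 42;
}

$attribute = (new \ReflectionFunction('App\expensive'))->getAttributes()[0];

// Name resolution honours the use statement; no class needs to exist for this.
echo $attribute->getName(), "\n"; // Acme\Meta\Cached

// Only instantiation needs the class, so this is where it fails.
try {
    $attribute->newInstance();
} catch (\Error $e) {
    echo get_class($e), ': ', $e->getMessage(), "\n";
}
```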

Derick Rethans 16:13

You just mentioned using JIT or no JIT as an annotation. Does that also mean that extensions have easy access to these attributes?

Benjamin Eberlei 16:21

OPCache is not PHP core functionality; it's still its own extension. The idea is that extensions have access to attributes in a very simple way. So there will be a Zend API, the internal name for an API that the Zend engine provides to extensions, and extensions will be able to access attributes and make decisions based on them. Extensions can already hook into the compile step of PHP, and there's a hook called zend_ast_process. During AST processing, you can do stuff. That would be one way for extensions to look at attributes and maybe change code if they want. Then the engine obviously has tonnes of other hooks where the declarations are available in the data structures that the Zend engine provides. There's zend_class_entry, for example, where you could look into the attributes as an extension and make decisions.

Derick Rethans 17:20

This is a pretty new RFC, and hence there are always going to be a few open issues, because we like to argue about stuff. What are the open issues on this RFC?

Benjamin Eberlei 17:29

This is the seventh RFC on this topic, so there has been a lot of discussion. I guess this feature is, in a way, quite controversial because of the implementation details. A lot of my work now will be to find the best implementation that can actually make this feature part of core by getting enough votes for it. So I gathered a lot of feedback from the community, and also talked a lot to contributors. Changes that I will probably be making are allowing multiple attributes and, as I said before, clarifying the autoloading. There has to be some distinction between internal attributes and userland attributes, in a way that doesn't require autoloading. Hack, for example, has __ as a magic prefix, which I want to avoid, because it puts the whole magic methods argument back on the table. We need something to make a distinction between userland and internal attributes, because the internal attributes need to be validated very strictly at compile time, and the userland attributes need to be validated only when you call the getAsObject() method on the Reflection API.

Derick Rethans 18:42

How long do you think there'll be before you put this RFC up for a vote?

Benjamin Eberlei 18:46

It's a bit tricky because this issue is so controversial. I don't want to invest months of work and then get a no vote, so I do want to get some feedback quickly. I do realise that the first draft needs some work and clarifications that would otherwise lead to no votes from contributors. So I hope to get this done in, let's say, two to four weeks of additional work.

Derick Rethans 19:09

All right, Benjamin. That was a great explanation of the attributes version two RFC.

Benjamin Eberlei 19:16

Thank you for having me, and I really appreciate it again.

Derick Rethans 19:21

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 46: str_contains()

In this episode of "PHP Internals News" I chat with Philipp Tanlak (GitHub, Xing) about his str_contains() RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 46. Today I'm talking with Philipp Tanlak about an RFC that he's made, titled str_contains. Philipp, would you please introduce yourself?

Philipp Tanlak 0:35

Hey, Derick. My name is Philipp. I'm 25 years old and I live in Germany. I work for an IT service company, which does mainly development and maintenance of IT projects. We specialise in the maintenance of e-commerce websites and in creating enterprise applications.

Derick Rethans 0:52

How long have you been using PHP for?

Philipp Tanlak 0:54

I've been using PHP for quite a long time now that might be six years I guess.

Derick Rethans 0:58

What brought you to creating an RFC?

Philipp Tanlak 1:02

The main reason I've created this RFC was out of necessity and interest, mainly to scratch my own itch.

Derick Rethans 1:08

That is how most things make it into PHP in the end isn't it?

Philipp Tanlak 1:11

Yeah, I guess.

Derick Rethans 1:12

The RFC is titled str_contains, which tells me it's something about strings and containing things. How do we currently find a string in a string?

Philipp Tanlak 1:22

The current approach to find a string in a string is to use the strpos() function or the strstr() function. But on Reddit, I found that someone also uses preg_match(), which I find kind of interesting.

Derick Rethans 1:35

There are multiple different methods in use. What are the general problems with these approaches?

Philipp Tanlak 1:41

I find the current approach not very intuitive, mainly because of the return values of these functions. For example, strpos() returns either the position where the string is found, or false if the string is not found, so there has to be a check with the !== operator. And the strstr() function returns the matching part of the string, or false, so you have to convert that to a boolean to check if the string is found or not.

Derick Rethans 2:11

Because with strpos(), if you didn't use the === or !== operator and it found the string at the very first position, it would return position zero, which a loose check treats as false, even though it has found it.

Philipp Tanlak 2:26

Yeah.
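
A minimal sketch of the pitfall described above, assuming PHP 8 or later; the haystack and needle values are arbitrary examples. strpos() reports position 0 when the needle sits at the very start of the haystack, so a loose truthiness check treats a real match as a miss, and only the strict comparison against false gets it right.

```php
<?php
// Arbitrary example values: the needle is at the very start of the haystack.
$haystack = "PHP internals";
$needle = "PHP";

$pos = strpos($haystack, $needle);
var_dump($pos);            // int(0): found at position zero
var_dump((bool) $pos);     // bool(false): a loose check wrongly reports "not found"
var_dump($pos !== false);  // bool(true): the strict comparison is what's required
```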

Derick Rethans 2:27

So there are a few different problems with these things. Also, I don't think it's particularly very intuitive, because you sort of need to come up with a whole construct to see whether something is part of a string.

Philipp Tanlak 2:37

Correct. I don't think it's intuitive for a beginner. If someone is learning PHP for the first time, they have to search through the documentation for the exact return values of these functions, and have to remember them. So I thought str_contains() might be a better fit, as it just returns a true or false value.

Derick Rethans 2:58

We've mentioned str_contains() a few times now; I guess the RFC is proposing to add this function. How would this function differ from what PHP already has?

Philipp Tanlak 3:07

So this function does not differ in a lot of ways. It's basically the same implementation as the strpos() function. But instead of returning the position of the found string, it simply returns a boolean value, so either true or false.

Derick Rethans 3:23

I can imagine some people will say, well, you can just do this in your own wrapper function, right? Because pretty much what it does is convert the result from strpos() to a boolean. But you must have a good reason for wanting to add an extra function here.

Philipp Tanlak 3:38

The reason for this function, and maybe someone might disagree, is mainly the developer experience. This came out of a necessity I found myself, and I've been using this function quite a lot. So I thought this might be a valid addition to the PHP language. So I tried to implement it, and it got some great reviews, so I thought it wasn't a bad idea after all.
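
The userland approach Derick mentions, converting the strpos() result to a boolean, can be sketched as below. This assumes PHP 8.0 or later, where str_contains() did end up shipping as a built-in; my_str_contains() is a hypothetical name chosen only so it does not clash with that built-in.

```php
<?php
// Hypothetical userland wrapper: strpos() returns int|false, so compare
// strictly against false to turn it into a plain boolean.
function my_str_contains(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

var_dump(my_str_contains("PHP internals", "PHP"));   // bool(true)
var_dump(my_str_contains("PHP internals", "Rust"));  // bool(false)

// The built-in from PHP 8.0 returns the boolean directly.
var_dump(str_contains("PHP internals", "PHP"));      // bool(true)
```

The strict !== comparison is the whole point of the wrapper: it is the check that beginners otherwise have to remember at every call site.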

Derick Rethans 4:04

Is the RFC suggesting just a single function, str_contains()?

Philipp Tanlak 4:09

Yes, the RFC is currently adding just a single function, which is str_contains(). When I first started the discussion about this RFC, there were quite a few people asking why there were no case-insensitive or multibyte versions, and I did not think of those at first. But in the discussion, it became clear that a multibyte version did not seem to be very necessary, because the comparison is done byte by byte. Unlike with strpos(), the position of the found string is not relevant, so it doesn't matter if there is any difference in encoding.

Derick Rethans 4:47

I remember that last year there was another RFC related to string functions: string_starts_with() and string_ends_with(). Those are two functions, and there were also variants of both for case insensitivity as well as multibyte, which made eight different functions to be added to pretty much do a single thing. That RFC failed, potentially because there were so many things being added.

Philipp Tanlak 5:11

Yeah, that was also the main reason. I think the case-insensitive variant of this function was not so relevant, so I did not include it in the RFC, just because of the case you mentioned. Instead of polluting the global namespace with more functions, someone suggested to just advance PHP incrementally, and add case insensitivity for this function only if it becomes necessary.

Derick Rethans 5:37

This is a common recurring subject. Most of the people I spoke with in the last few episodes are all adding things to PHP bit by bit instead of coming up with big RFCs, which I think is a good way of going forwards. When reading the RFC, I had a quick look at which arguments the function would accept. PHP, of course, is weakly typed most of the time. Does str_contains() handle its function arguments any differently from what strpos() does?

Philipp Tanlak 6:10

So the str_contains() function uses the same internal function, which is php_memnstr(), if I recall correctly. It tries to interpret the argument as a string, and if it's not a string, it throws either a warning or a notice. But I've just run some checks, and it seems like in the next PHP version, non-string values passed into the string functions will be interpreted as a string, and if that is not possible, it will throw an error or usually return false.

Derick Rethans 6:43

So it doesn't do any special magic, and just relies on what PHP tends to do for parsing arguments, with weak and strict typing.

Philipp Tanlak 6:51

Yes, that's correct.

Derick Rethans 6:53

Most RFCs come with a patch, as does yours. How did you find getting started with writing things for PHP instead of with PHP?

Philipp Tanlak 7:02

So basically, I've looked at the PHP source code in the past, just to see how things are implemented. And I had some basic background in C, so this was not very hard for me. Most of the functions or things I needed for this patch were already there. So basically, I just copied the strpos() function, removed the code that uses the found position to calculate a new string, and returned a boolean value based on whether a position was found.

Derick Rethans 7:35

Because it is not very different from strpos(), and pretty much just has a different return type, it's a lot easier to do.

Philipp Tanlak 7:44

Yeah.

Derick Rethans 7:45

When looking at feedback, what were the main criticisms of this?

Philipp Tanlak 7:48

The main criticism was basically just about the variants of this function, so mainly the multibyte variant or the case-insensitive one. Other than that, the response was very, very nice, and also very rewarding for me. So I thought I did a good job on this. And many people wanted to have this function in PHP, but either did not have the time to implement it, or it was too easy. I'm not sure how that went. But I think the response from the devs and the overall PHP community was very nice.

Derick Rethans 8:23

The RFC is already in voting, so I'm a bit late to talk about it. Usually I talk about things while they are still in discussion. At the moment, it looks like it is passing, as the votes are 43 to 6, with another week to go.

Philipp Tanlak 8:37

Yeah.

Derick Rethans 8:37

Do you think this will be your last RFC? Or do you have something else in mind?

Philipp Tanlak 8:41

At the time of this recording, I don't have anything else in mind. But since I'm working with PHP on a daily basis, if I find something which I think is worth adding to PHP, I might create a new RFC.

Derick Rethans 8:54

That's how I started, and see what happens now. Thank you for taking the time to talk to me today, Philipp. I hope you enjoyed this.

Philipp Tanlak 9:01

Yeah, thanks for having me, Derick.

Derick Rethans 9:05

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 45: Language Evolution Overview Proposal

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the Language Evolution Overview Proposal RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 45. Today I'm talking with Nikita Popov yet again, about a non-technical RFC that he's produced titled Language Evolution Overview. Somewhere last year, there was a big discussion about P++, an alternative idea of how to deal with improving PHP as a language, while still taking into account how some people already use PHP and don't really want to change how they currently use it. Back then, I didn't really do an episode about that, because I'd like to keep politics out of this podcast, or definitely PHP's internals politics. I do think that we realised at that moment that something did have to happen, because there's not really a policy about when we can add things, when we can remove things, and so on. So I was quite pleased to see that you have come up with quite a wordy RFC, not talking about anything technical, but looking forward to where we will see PHP in the near or medium future, I would say. What are your thoughts about making this RFC to start with?

Nikita Popov 1:29

As you mentioned, we had some pretty, let's say, heated discussions last year, especially concerning backwards incompatible changes. There were a number of very, very contentious RFCs. One of them was the short open tags removal, and another one was the reclassification of undefined variable warnings, so whether those should throw or not throw. And, well, the basic contention is that PHP is by now a pretty old language, 25 years old. And we can all admit that it's not the language with the best design. It has evolved relatively organically, with quite a few warts and the famous inconsistencies. And now we have this problem where we would like to resolve some of these long standing issues. Many of them are genuine problems that introduce bugs in code and reduce developer productivity. But at the same time, we have a huge amount of legacy code; there are probably many hundreds of millions of lines of PHP code. And every time we do a backwards compatibility break, that code has to be updated, or more realistically, that code does not get updated and keeps sitting on an old PHP version that at some point also drops out of security support. And now the question is how we can fix the problems that PHP has, while still allowing this legacy code to update its PHP version. The general idea of how to fix this is to make certain backwards compatibility breaks opt-in. By default, you just get the old behaviour, but you can specify in some way (exactly how it's done doesn't really matter at this point) that you want to opt into some kind of change or improvement.

Derick Rethans 3:34

One example being strict types, which have been introduced in PHP and which you need to turn on with a declare switch.

Nikita Popov 3:42

Strict types is really a great example, because it has the important characteristic that it is done per file. So you can turn on strict types in one file and not affect any other code, at least in theory. There are some edge cases, but I think mostly you can just enable strict types in your library and you don't affect any other library that the project uses. We would like to extend this concept. It should be possible for libraries to update to a new language, well, let's call it a language dialect, without forcing other libraries, or the code using them, to update as well. Because this is what we have to do right now: before you can update your project to PHP eight, let's say, you first have to wait until all the libraries you're using update to PHP eight. And maybe there are libraries that are going to update but also say: okay, now PHP eight is actually required. And then you get these complex dependencies, with libraries supporting some versions and not supporting others, and doing updates becomes pretty hard. As I said, the idea is to make these backwards incompatible changes opt-in in some way, and there are multiple general models. As you mentioned, P++ is the most radical approach. It's more or less a separate language, but sharing the same implementation. And as the name suggests, this is inspired by C and C++. Those are usually implemented in the same compiler, and they can be interoperable in a limited way, mostly in that you can use C code inside C++ easily. Using C++ code inside C code tends to be much harder. P++ is, I think, the option we are pretty unlikely to take, for a couple of reasons. It's this kind of one-time huge break, which first means that we only have one chance to get it right, and given our track record, we should maybe not rely on that. It also means that the upgrade becomes especially hard, because you have to do everything at once. It's not spread out over a longer time.

Derick Rethans 5:54

You say that we need to get it right in one go, but that is hard, because you don't know what else we will want to add in the future. The RFC mentions a few other cases, for example forbidding dynamic object properties, which we'd have to do right away as well if we'd go with the two-languages-one-implementation approach, right? I mean, if nobody had thought about it before the split was made, we'd still not be able to do it.

Nikita Popov 6:20

That's true. So P++ is a one-time solution; it doesn't really scale over time. There are also other concerns, and I think in the end one of the big ones is just that we don't have the resources for it anyway. We have only maybe three full-time developers on PHP, and I don't think we want to start focusing on this huge, more or less separate language, which is just going to take a couple of years. Next to having this entirely separate language, there are two other ways to approach the problem. One is editions, which is a concept used by the Rust programming language. The idea there is that next to the version, which is more or less the implementation version, you also have this edition, which is a completely orthogonal concept. Basically, we would say: okay, right now we are, for example, at edition zero. Then in edition one you opt into some set of backwards incompatible changes. Then in edition two, there are more backwards incompatible changes, and so on. Each edition is essentially a superset of the previous one.

Derick Rethans 7:32

Would it also mean you could get new features in a new edition, or is it purely about making backwards incompatible changes?

Nikita Popov 7:40

So, this is purely about backwards compatibility. If a new feature can be added without breakage, then it should always be available. The editions switch would only control the backwards incompatible parts. This is in contrast with the second approach, which is to have fine-grained declare statements. As you already mentioned, we have the existing strict types directive, and we could continue down the same path. So, we could add a new declare for no dynamic object properties equals one, and then one for strict operators equals one, and for whatever else equals one. And then you would have this long list of possible declares, with which you could enable or disable some particular bit of language behaviour.

Derick Rethans 8:26

Then I can imagine that in another five years, that list might be 20 options long.

Nikita Popov 8:31

Right. So, the concern there is, of course, partly maintenance, because we have to support basically an exponential number of combinations of different options. And the other is from the programmer's perspective: the mental model becomes more complicated, because you have to keep in mind which exact set of declares you are using right now. I should say, though, that this model is actually used by Python, because Python has this from __future__ import feature. There is basically this magic module __future__ from which you can import language features that will become the default in newer Python versions. For example, you can import the new integer division behaviour inside an older version. This is more or less the same as doing the fine-grained declares, just with a different syntax and with, I think, a stronger focus on the behaviour becoming the default in a future version.

Derick Rethans 9:38

So basically, you're opting into experimental functions really?

Nikita Popov 9:41

It could be either experimental functions, or it could really be functions from newer versions. In particular, Python also had parallel development of Python 2 and Python 3 for a while, in which context this probably makes more sense.

Derick Rethans 9:56

There are pretty much three options that the RFC mentions: a new language with a common implementation, which is the PHP / P++ option; the editions; and the fine-grained declares. Are these all still going to be per file?

Nikita Popov 10:12

So that's the second large question. The first is what the general model is, and the second one is where we declare it. The approach I was initially pursuing was to declare it at the package level, so for a whole library or for a whole project.

Derick Rethans 10:32

How would you define what a package is?

Nikita Popov 10:33

We have namespaces, and there is a somewhat loose coupling between namespaces and packages. I have an old RFC for namespace-scoped declares, where you could, for example, specify strict types for a whole namespace, which is, I think, maybe the most natural way to treat packages right now, because this is the closest thing to a package we have. Unfortunately, it does have a few issues. One of them is that this namespace to package mapping is not always there. There are packages that have a somewhat odd nesting of namespaces. And I've also heard that some people, for example, define their models inside the Doctrine namespace, because they, you know, extend its classes, so they also put them in that namespace. Of course, you shouldn't do that, but it's something that could happen, because we don't really have any enforcement that a namespace really is a package. And then there are also technical concerns, because right now namespaces are really just a compile-time thing to handle name resolution, and now they would kind of turn into a feature that also has some kind of runtime impact. You have to consider things like what happens if you have multiple namespaces in the same file, and also other considerations, like what happens if a namespace is first used and you issue some namespace-scoped declares afterwards. All that can be resolved, but it makes the model somewhat more complicated.

Derick Rethans 11:53

And I guess you end up having to declare these namespace-scoped declares maybe in a separate file or something like that?

Nikita Popov 12:14

At least what I have in mind is that you would declare them in composer.json, and Composer would then take care of registering them with PHP itself. Of course, you could also do that manually if you're not using Composer, but that at least would be the 95% use case.

Derick Rethans 12:31

In applications that make use of Composer, it is very likely that Composer knows about all the libraries that a specific application uses, and hence will be able to construct an array with which it can tell PHP, by calling a function, all the different options or editions, or whatever that ends up being.

Nikita Popov 12:49

So that's one of the approaches. There are also some alternatives. One is to instead introduce an actual package concept. One of the possibilities is to basically add an extra line to each file, which says package and the package name. That really removes any and all ambiguities. But you do have to add that extra line, which serves a very limited purpose: basically only these package-scoped declares, though it could maybe also be used for some extra features, like package-private symbols.

Derick Rethans 13:23

But it would also instantly make that code base non-parsable with older PHP versions.

Nikita Popov 13:28

That's also true, right. But that's a general problem that most approaches would have, I think. Namespace-scoped declares is one that doesn't have it, but even the per-file approach would have this problem, because if you write, for example, declare edition, then right now on PHP seven you would get the warning that the edition declare is not known. The last variant that I'm discussing here is to make packages based on the file system, which is something many other languages do. So you have some kind of magic file somewhere that says: okay, this directory and all the subdirectories are part of the package. In PHP, this kind of file system based approach is somewhat problematic, because our include mechanism is not really based on the file system, but on a fairly general stream abstraction. You can include from the file system; you can include from HTTP, if you're really crazy; but you can also include from Phar files, from an input stream, or from some kind of custom defined stream. These file system based packages require some additional operations to be well defined. They have to have a notion of path canonicalization, so you can determine whether a file is inside the directory, even if there are things like symlinks, or the file system is case insensitive. That does exist for the file system, as we have the realpath syscall, but it doesn't exist for streams right now. A similar problem is that we need to be able to walk up from a path to its parent directories, and that's also something that doesn't exist for streams. And more generally, not all streams really have a well defined concept of a directory. For example, if you are reading a file from stdin or the input stream, then there is no directory, so which package is that going to be in?

Derick Rethans 15:31

I think it would end up being hard to debug at some point, for example, why some things don't actually end up being in the package you expect them to be in. And then on top of that, you also need to define what this file is called, and things like that, right? I mean, a PHP script wouldn't be just a single file any more; it would be a single file plus this extra definition file. And that's a concept, of course, that we don't have in PHP at all. Everything is per file, pretty much.

Nikita Popov 15:56

Which is why, at least right now, I think the immediate way forward is to use per-file declares. If we don't use the fine-grained declare approach, and instead have a single edition, then it's not really a problem to put the declare edition inside every file, because this is already what we do for strict types. It's not super ergonomic, but I think it's also not a huge problem. And it does have the one very big advantage that files are, and remain, self-contained. You don't have to consult an external definition, which may be hard to locate, to figure out how to process a file.

Derick Rethans 16:36

And every IDE or tool would have to implement that same logic and make sure that it's all consistent with each other as well.

Nikita Popov 16:43

I wouldn't say it's really hard, but it might be somewhat fragile, especially when it comes to conventions. As I said, if we put things in composer.json, that's probably something tooling can easily deal with. But if you then encounter a project that doesn't use Composer and uses some other way to register the package declares, then you might run into problems.

Derick Rethans 17:09

Lots of things to talk about and discuss at some point. As you submitted this RFC to the mailing list some time ago now, what is sort of the feedback that you're getting on this?

Nikita Popov 17:19

So I think the general direction, at least, is pretty clear. Most of the discussion is focused on the edition concept, not the fine-grained declares or P++. I think for now, we would also go with the per-file approach. Now, the two main points that remain contentious are: first, what does the support timeline look like? Basically, the concept of editions just enables different libraries to upgrade independently. That's the core premise. But at least in Rust, editions are additionally guaranteed to be supported forever. So you can leave your old code running on the old edition, and you do not ever have to update it.

Derick Rethans 18:10

How often do they make new editions? Every three years?

Nikita Popov 18:13

Yeah, it's not quite clear yet, but probably it's going to be every three years. And now for us, the question is: do we want to support old editions forever, or do we want to give them a finite lifetime? Say we introduced a new edition in PHP eight, and then supported the old edition until PHP nine. That means code can take its time to do the necessary updates, but it does have to do the updates at some point.

Derick Rethans 18:37

But you'd have five years?

Nikita Popov 18:39

It's more the general question of whether it's forever or whether it's limited. Based on the discussion, I think there is a pretty strong preference to not support them forever.

Derick Rethans 18:51

But for how long then? I mean, it must be longer than what we support a normal PHP version for, right?

Nikita Popov 18:56

Yeah, I would expect it to be something like a major version cycle. The second question is related to strict types. As you said, strict types is an existing example of a mechanism that works like this, and now we're introducing a second mechanism with the same basic characteristics. Are we going to merge them or not? Would we say that in the new edition strict types is enabled by default, or even always enabled? If we do that, and we say that editions have a limited support life, that means that strict types is going to become the only option at some point in the future. You can imagine that this is somewhat contentious, because there are quite a lot of people who consider weak types to still be the superior option.

Derick Rethans 19:49

Whenever I go speak at conferences or user groups, that's not the case. One question which always keeps recurring is: why isn't this the default in PHP eight? I think there's an expectation that strict types are at some point going to be turned on by default.

Nikita Popov 20:04

Yeah, and this is where people disagree on whether this expectation is justified or not. There are plenty of people in the discussion thread, well, by plenty I mean at least two, who strongly think that strict types should remain an option. I mean, PHP often deals with input coming from HTTP or from a database, which usually comes in as a string. And they think that the typecasts you have to do to make that work with strict types actually weaken the type safety guarantees, because if you perform an explicit cast, then that cast is performed basically without any checks. So you can take a completely non-numeric string, cast it to integer, and you will get zero without any warning or whatever, while even in weak typing mode, that would still result in an error.
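
A minimal sketch of the cast-versus-parameter point made above, assuming a file without declare(strict_types=1), so the default weak (coercive) mode applies; takesInt() is a hypothetical helper. An explicit (int) cast silently turns a non-numeric string into 0, while passing the same string to an int parameter throws a TypeError even in weak mode.

```php
<?php
// No declare(strict_types=1) here, so this file uses weak (coercive) typing.

// Hypothetical helper with an int parameter.
function takesInt(int $n): int
{
    return $n;
}

// An explicit cast performs no validation: "abc" silently becomes 0.
var_dump((int) "abc");        // int(0)

// Passing the same string to an int parameter is rejected, even in weak mode.
try {
    takesInt("abc");
} catch (TypeError $e) {
    echo "TypeError: ", $e->getMessage(), "\n";
}

// A numeric string is accepted and coerced to int in weak mode.
var_dump(takesInt("42"));     // int(42)
```

This is the trade-off the weak-typing camp points to: under strict types you would write the cast yourself, and the cast is the one place where no check happens.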

Derick Rethans 20:58

It's a curious thing actually, when you mention databases, because of course in databases you've defined very strict types for your data. It's just interesting that PHP's interface to most of these old SQL databases decided to always turn everything into a string.

Nikita Popov 21:14

It does actually support returning things in their native type.

Derick Rethans 21:20

With PDO, yes.

Nikita Popov 21:21

But only under options, and I think it's also dependent on whether you use emulation or not, and stuff like that. And you have all these different drivers that have differing support for that. But yeah, to get back to strict types, one of the options is to really keep editions and strict types separate, and also evolve the strict and the non-strict mode independently. So you could say that in the new edition, the strict typing mode becomes stricter, for example by also extending to arithmetic operators, not just to function arguments. But that of course does mean that, yeah, we're saying strict types stays, and exists forever as a separate track of the language.

Derick Rethans 22:06

Yeah, that's an interesting one. I'm not sure how to get to a conclusion there actually, because there are always going to be people on each side.

Nikita Popov 22:13

Yeah.

Derick Rethans 22:13

Do you think a decision on which way this Language Evolution Overview proposal goes will have been made by the time feature freeze for PHP eight comes around?

Nikita Popov 22:23

I think it would be pretty good to have this for PHP eight, because, well, it's a new major version and the time to introduce this kind of concept. I should say, though, that we already have quite a few backwards incompatible changes in PHP eight, and at least some of them we are definitely not going to retrofit into the editions concept. So there are certainly going to be breaking changes there already.

Derick Rethans 22:52

Why wouldn't you retrofit them? I mean, if we end up deciding that PHP eight will have these editions, would they not be part of that, or would they always end up breaking anyway? Because it seems like sort of an ideal place to then do it.

Nikita Popov 23:05

Yeah, the problem is just that there are some quite extensive changes, especially when it comes to warnings versus exceptions, and it will just be a lot of effort to get this under an edition flag and to support both behaviours there. Maybe some of the existing changes could be moved in there without a huge amount of effort. But I think there are definitely going to be some hard, edition-independent breaking changes.

Derick Rethans 23:37

So new major PHP versions might still have some backwards breaking changes, independently of whether we do editions or not, or more declares or not?

Nikita Popov 23:46

Yeah, that's one more question: what exactly is the scope of editions? What goes into an edition, and what doesn't? I mean, there is always a cost to handling something with this mechanism. One is just maintenance for us. And of course the user has to consider more different versions of the language. And I think one particularly large aspect that would likely never fall under the edition concept is changes to the standard library. Editions work well for language changes, but I don't think they really make sense for standard library changes. So everything that involves deprecations, or functions with eventual removal, would not be covered by that.

Derick Rethans 24:31

Do you have an example of such a change in the standard library that PHP eight might have?

Nikita Popov 24:36

What I just said was more of a general point: usually in every PHP version, we deprecate a bunch of functions and are going to remove them at some point. And these deprecations are going to apply independently of what edition you set. Actual changes to the standard library, where the behaviour of a function is changed, are something we quite rarely do, and generally try to avoid, specifically because they cause relatively subtle backwards compatibility breaks. So usually we will either make changes by introducing a new flag or a new function, or by deprecating the functionality entirely. Even when it comes to language changes, there is one example I know of from the discussion: well, if we had the edition concept, and we wanted to introduce something like traits. The trait functionality in general is not backwards compatibility breaking, but the trait feature does introduce two new reserved keywords, which are trait and insteadof. So there is technically a backwards compatibility break, even though it's minor. And now you have the trade-off. Do you introduce traits in the new edition and only reserve the keywords there, thus removing any backwards compatibility break? Or do you introduce it always, which means that everyone can benefit from it, even if they haven't updated their code to the new edition yet, but it does introduce the small backwards compatibility break? And then you get this trade-off, and the discussion about what you should be doing about that.

Derick Rethans 26:17

I think those kinds of decisions will have to be made based on evidence. And I think in the past you've looked at the top thousand projects on GitHub to see whether things break or not before making a decision. For example, with nested ternaries, the triple or quadruple nested ternary: any time people used it, it was pretty much a bug in the code.

Nikita Popov 26:36

Yeah, so to give one example, in PHP 7.4, we introduced the short closure syntax with the fn keyword, and there the source code analysis showed that basically, fn is not used outside of tests, apart from one library, which is my own, and which does have quite a few dependents. And that library was indeed broken essentially completely by that change. So in that case, I think there might have been an argument that this feature should be introduced under an edition, because there is evidence of actual breakage in the wild.

Derick Rethans 27:14

This is one of the things we're trying to get right. We now have evidence for it.

Nikita Popov 27:18

And probably something like the insteadof keyword for traits is much less problematic.

Derick Rethans 27:24

Again, as I said, it's the data that speaks there, right? That was quite a bit to go through. I'm curious to see where those discussions end up going. Hopefully, we get to a conclusion somewhere in the next few months, in time for PHP 8.0. Who knows? Maybe we'll have another podcast episode where we introduce a new editions concept.

Nikita Popov 27:43

So this is probably my vaguest RFC, with a somewhat unclear goal and a somewhat unclear discussion outcome.

Derick Rethans 27:53

Do you have anything else to add to this discussion that we've missed?

Nikita Popov 27:55

I think there is just one thing maybe worth mentioning, which Rust uses pretty extensively: automatic upgrades. They have some tooling to do that, which is mostly reliable. And I think it would be pretty nice if we had something similar in PHP. In PHP, we can't really make this reliable, because the language is just way too dynamic. And we actually do have some tooling in the form of the Rector library. But we might want to think about providing something under the PHP project umbrella that is more geared towards doing updates that are as safe as possible, so you can run them without thinking but still reduce your workload somewhat.

Derick Rethans 28:40

And that is something that is definitely for the future. Thanks for talking to me about the language evolution overview proposal.

Nikita Popov 28:46

Thanks for having me, Derick.

Derick Rethans 28:53

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.